ooboonto reminder: July 2011

Since I found running POM (the Princeton Ocean Model), and building all the libraries it needs before it will even attempt to run, quite challenging, I figured I'd dump it all here for future reference.

Considering I'm an average Ubuntu user (which essentially means a human being), building and configuring custom Linux libraries beyond apt-get isn't really my strong suit; still, this was a good opportunity to develop some of those skills.

Downloading the POM files

For a start, POM comes in quite a number of versions. These include versions with additional meteorological models, ice melting models, etc. Since I am interested only in ocean flow modelling, they are all essentially the same to me model-wise; however, there are some additional features I would like to use, namely:

- parallel computing

- NetCDF output.

Because of this, I chose to download and try out:

- pom2k, the latest 'basic POM' version with NetCDF support

- sbPOM, a parallelized (MPI-based) version of the code, which also has (parallel) NetCDF support.

POM also comes with an exhaustive user manual, which seems to cover the model code quite thoroughly, but I haven't gotten to that part yet because installing the dependencies took me a long time to figure out. While 'researching' this problem I came across many people wandering around wondering what to do, and the cause is that all the old, outdated manuals are still out there on the web, ready to misguide you.

So here are the...

Prerequisites to run sbPOM and pom2k on Ubuntu (Oneiric Ocelot in this case):

the curl library (in the repos)
the bison parser (in the repos)
szip compression libraries (right here)
the zlib library
the flex or lex parsers (in the repos)
some fortran compiler (I used gfortran)
openmpi, mpich or something like that (in the repos)
the HDF5 1.8.7 library (right here)
the parallel-netcdf 1.2.0 library (right here)
NetCDF 3.6.3 (grab the source code from here)
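
Before building anything, it helps to see which of the repo-provided tools are already on the PATH. Here is a minimal shell sketch of that check. The function name missing_tools is my own invention, and the command names in the example line (curl-config, bison, flex, gfortran, mpicc) are assumptions about what the packages install, so adjust them to your system.

```shell
#!/bin/sh
# Print every command name given as an argument that is NOT on the PATH.
# missing_tools is a hypothetical helper, not part of POM or Ubuntu.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Example: list whatever still has to be installed from the repos.
# The command names here are assumptions; change them as needed.
missing_tools curl-config bison flex gfortran mpicc
```

Anything it prints still needs to be installed; if it prints nothing, all of the listed commands were found.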

To clarify what needs to be done here: in order to run pom2k you need only NetCDF 3.6.3, installed without HDF5 / NetCDF4 / parallel support. For this, the only library you need is curl. I haven't succeeded in running POM with NetCDF 4.1.3, no matter how I compiled it, so this will have to do. If properly compiled, however, 3.6.3 is quite enough, as POM doesn't (so far) use any of the NetCDF 4 features.

However, in order to run sbPOM, you will need (Open)MPI and parallel NetCDF, which does require HDF5 support, which in turn requires bison and (f)lex. And in order to compile parallel NetCDF you will need the zlib library.

To clarify one more thing: parallel NetCDF is not NetCDF compiled in parallel mode. It is in fact a completely different piece of software, currently at version 1.2.0, unlike the 'real' NetCDF, currently at version 4.1.3. So, to make things clearer, I will refer to it as parallel-netcdf from now on.

(The versions I mention are the latest versions at the time of writing, and they are what I actually used. I cannot say it will work with newer versions; however, do not use lower versions, especially HDF5 below 1.7, because that definitely won't work.)

Most of this stuff is in the repositories; however, you will need to actually build it yourself, otherwise there's no way it's going to just work (with one exception). The reason for that is the Fortran compiler, which I will discuss in detail a bit further down.

In order to run sbPOM you will also need:

matlab
matlab MEXCDF library (here)
matlab library M_MAP (here)
2 extra matlab functions

The order of installation makes all the difference, of course. Since I opted to use both sbPOM and pom2k, I needed all of these components and, since I was going to compile hdf5 and parallel-netcdf anyway, I also wanted to compile netcdf itself in parallel.

The order hdf5, parallel-netcdf, netcdf is also the proper order of installation because of the dependencies between them, and the other libraries get installed as they are needed (although you could install all of those up front as well; the system wouldn't be bothered). That is, you need hdf5 to build parallel-netcdf, and you need parallel-netcdf in order to build netcdf in parallel.
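
The dependency chain described above can be written down as pairs and sorted with tsort, which is handy if you later add more libraries to the list. This is only a sketch: build_order is a hypothetical function name, and the pairs contain just the two dependencies stated in this post.

```shell
#!/bin/sh
# Each line reads "A B", meaning A must be built before B.
# The pairs are the dependencies described in the text above.
build_order() {
    printf '%s\n' \
        'hdf5 parallel-netcdf' \
        'parallel-netcdf netcdf' | tsort
}

# Prints one library per line, in the order they should be built.
build_order
```

For this simple chain the result is hdf5, then parallel-netcdf, then netcdf.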

So let us go step by step into this slightly irritating business...

1. Installing HDF5 libraries

The first thing you need to do is download the hdf5 library (right here). The link I supplied contains the universal version of hdf5, which I used myself. Their official site now offers the same package; however, their other (!?) official site offers the latest code in distinct packages meant for static OR shared linking.

If you decide to use one of these, make sure you choose the static ones, because shared linking is not supported in parallel mode. (I didn't realize that at first, which brought me all kinds of trouble, since for the sake of convenience I wanted to use shared linking as well.)

In order to install hdf5, you will require the following libraries that are available from the repos:

zlib (zlib-1.1.2 or higher)
szip

Hdf5 can work without those, but you will need them for other libraries, so it is best to install them right away and to link them with hdf5 properly.

Since I was going to build hdf5 in parallel, I also needed an MPI compiler with MPI-IO support, for which I chose (also from the repositories):

openmpi, mpicc (mpich)

I actually used mpicc to compile the whole thing.

Hdf5 has a decent installation manual, which you should read before attempting to install.

However, the manual only mentions that shared libraries are 'problematic' in its IBM section. This problem is not IBM-specific, so no matter what configuration you're using, when compiling the parallel version, disable shared libraries in order to avoid compilation and build issues.

Note: one thing that I omitted in my installation is this part of the manual:

make sure that your installation of MPICH was configured with the following

configuration command-line option:

-cflags="-D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64"

This allows for more than 2GB sized files on Linux systems.

I skipped this because I do not plan to work with such large files (considering the hardware I'm using). Enabling this option requires you to manually build mpich / openmpi or whatever other MPI implementation you have chosen. If you plan to use large files, make sure you do this step. If you have already installed an MPI compiler through the repos, remove it first to avoid conflicts, and make sure the appropriate environment variables are set before continuing.

In order to build hdf5 for POM use, make sure you pass --enable-fortran to ./configure, even if it's turned on by default: for some reason, it didn't work for me when I built the libraries without explicitly stating this option. I also added --enable-f77 and --enable-f90.
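
Since a configure script may only warn about options it does not know rather than stop, it can be worth checking which options it actually lists before starting a long build. Here is a rough sketch of that check; has_configure_option is a hypothetical helper name, and it simply searches the output of configure --help, so treat a "not listed" result as a hint to read the manual rather than as proof.

```shell
#!/bin/sh
# Succeed if the configure script in directory $1 lists option $2 in --help.
# has_configure_option is a hypothetical helper name.
has_configure_option() {
    dir=$1
    opt=$2
    (cd "$dir" && sh ./configure --help) 2>/dev/null | grep -q -e "$opt"
}

# Example, meant to be run from the unpacked hdf5 source tree:
for opt in --enable-fortran --enable-parallel --enable-f77 --enable-f90; do
    if has_configure_option . "$opt"; then
        echo "$opt: listed"
    else
        echo "$opt: not listed"
    fi
done
```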

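Here is what I did to build the hdf5 library. (The FFLAGS line is taken from the IBM section of the manual; as far as I can tell those are IBM XL Fortran options, so it can probably be left out when building with gfortran.)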

export FFLAGS="-q64 -qxlf90=autodealloc"

CC=mpicc ./configure --prefix=/usr/local/hdf5 --enable-fortran --enable-parallel --disable-shared --enable-f77 --enable-f90

make

make check

make install

Note: I chose to install hdf5 to the /usr/local/hdf5 folder. This copy of hdf5 will be used solely for building parallel-netcdf.
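
To double-check that the installed copy really is a parallel build, one can look at the build summary that hdf5 leaves behind. The sketch below assumes that the summary is installed as lib/libhdf5.settings under the prefix and that it contains a "Parallel HDF5: yes" line; I have not verified this for every hdf5 version, so check your own install if it reports nothing. hdf5_is_parallel is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed if the HDF5 install under prefix $1 reports a parallel build.
# Assumption: the build summary is at $1/lib/libhdf5.settings and contains
# a "Parallel HDF5: yes" line; adjust the path if your install differs.
hdf5_is_parallel() {
    settings="$1/lib/libhdf5.settings"
    [ -f "$settings" ] || return 2
    grep -i 'Parallel HDF5' "$settings" | grep -qi 'yes'
}

if hdf5_is_parallel /usr/local/hdf5; then
    echo "parallel HDF5 found"
else
    echo "no parallel HDF5 under /usr/local/hdf5"
fi
```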

Also note: make check will probably take forever. Make sure the tests are being 'passed' and that all your processor cores are being used during the tests (just fire up the system monitor, or use a screenlet, or, more conveniently, htop in the terminal; sudo apt-get install htop if you don't have it). Once you're satisfied, close the terminal and kill the processes to continue, unless you want to grow a beard... :)
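
Instead of watching the test output scroll by, the log can be saved to a file and summarized from another terminal. This is just a sketch: summarize_check is a hypothetical helper, and it assumes the test output marks results with the words PASSED and FAILED, which may not hold for every test in the suite.

```shell
#!/bin/sh
# Count PASSED and FAILED lines in a saved test log.
# Assumption: results are marked with the words PASSED / FAILED.
summarize_check() {
    log=$1
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    passed=$(grep -c 'PASSED' "$log" || true)
    failed=$(grep -c 'FAILED' "$log" || true)
    echo "passed=$passed failed=$failed"
}

# Usage, from the hdf5 source tree:
#   make check > check.log 2>&1 &
#   summarize_check check.log
```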

2. Installing parallel-netcdf libraries

To install the parallel-netcdf 1.2.0 library you will need to download the code from right here. I will note again: it is a different library from netcdf built in parallel mode, and it is also required to build netcdf in parallel mode.

In order to compile parallel-netcdf, you will require the following libraries:

either yacc or bison
either lex or flex

I have used bison and flex (from the repos). Note, you need only bison, not bison++ or other versions.

In order to compile properly, you will need to set the appropriate environment variables (bash syntax here) and point the compiler to the previously built hdf5 library and the zlib library; this lets the installation complete and gets rid of the numerous build-time errors that will otherwise occur.

In order to use the szip library, you need to download it from here, then compile and install it. It's an easy procedure; here's what I did:

./configure --prefix=/usr/local/szip

make

make check

make install

After szip is installed to the /usr/local/szip folder, one may proceed to install the parallel-netcdf libraries:

export CC="gcc"

export CXX="g++"

export FC="gfortran"

export F90="gfortran"

export MPICC="mpicc"

export MPICXX="mpicxx"

export MPIF77="mpif77"

export MPIF90="mpif90"

FC=mpif90 CC=mpicc CXX=mpicxx ./configure --enable-netcdf-4 --with-hdf5=/usr/local/hdf5/ --with-zlib=/usr/include/ --enable-parallel-tests --enable-parallel --prefix=/usr/local/ncdfp

If you are going to copy-paste my code, make sure all of the compilers (gcc, g++, gfortran...) are installed. (The fastest way to check is using locate from the terminal, e.g. locate gfortran.) If not, install them first from the repos, or change the environment variable definitions if you are using different compilers.
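
A quick way to confirm that the exported variables actually point at installed compilers is to loop over them and look each value up on the PATH. Here is a minimal sketch of that idea; check_compiler_vars is a hypothetical helper name, and it assumes each variable holds a bare command name (a value with extra flags, such as "gcc -m64", would be reported as missing).

```shell
#!/bin/sh
# For each variable name given, report whether its value is a command on PATH.
# Returns non-zero if any variable is unset or names a missing command.
# Assumption: each variable holds a bare command name without extra flags.
check_compiler_vars() {
    status=0
    for var in "$@"; do
        eval "value=\${$var:-}"
        if [ -n "$value" ] && command -v "$value" >/dev/null 2>&1; then
            echo "$var=$value: ok"
        else
            echo "$var=$value: MISSING"
            status=1
        fi
    done
    return $status
}

# Check the variables exported above.
check_compiler_vars CC CXX FC F90 MPICC MPICXX MPIF77 MPIF90 || true
```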

Pointing the mpicc compiler to the hdf5 and zlib libraries is done with the --with-hdf5=/usr/local/hdf5/ --with-zlib=/usr/include/ options, and for parallel-netcdf this works perfectly.

After installing parallel-netcdf, we have the environment needed to run the sbPOM code, the parallel POM model code. All one needs to do now is point the sbPOM code to the parallel-netcdf library file, pnetcdf, and it should function all right.
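
Pointing a build at the library usually comes down to an include path, a library path and a -l flag. The sketch below prints those for a given prefix; pnetcdf_flags is a hypothetical helper, and it assumes the install placed libpnetcdf.a in lib/ and pnetcdf.inc in include/ under the prefix. Where exactly these flags go in the sbPOM makefile is not shown here, so check the makefile that ships with the code.

```shell
#!/bin/sh
# Print include and link flags for a parallel-netcdf install under prefix $1.
# Assumption: libpnetcdf.a is in lib/ and pnetcdf.inc is in include/.
pnetcdf_flags() {
    prefix=${1%/}
    if [ -f "$prefix/lib/libpnetcdf.a" ] && [ -f "$prefix/include/pnetcdf.inc" ]; then
        echo "-I$prefix/include -L$prefix/lib -lpnetcdf"
    else
        echo "parallel-netcdf not found under $prefix" >&2
        return 1
    fi
}

# The prefix used in the configure line above.
pnetcdf_flags /usr/local/ncdfp || true
```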

A different version of NetCDF is required for pom2k, and this will be discussed in the following post... :)

The remaining part of using sbPOM is installing the proper matlab libraries for data preparation.

3. Installing matlab libraries in order to use sbPOM model

The sbPOM model comes with matlab scripts to set up the input and visualize the output data. In order to run these scripts, you will need the following matlab libraries added to your basic matlab installation:

matlab
matlab MEXCDF library (here)
matlab library M_MAP (here)
2 extra matlab functions (will add links later)

I suggest trying to run the scripts first, as described in the sbPOM readme file, to check whether your matlab installation already has these libraries. I used matlab R2010a for unix, which didn't have them, so I had to install them manually. This is a very easy procedure; here's how I did it:

3.1 The MEXCDF library

The mexcdf library is the netcdf support library for matlab. It differs from matlab's built-in netcdf support, but it was used to create the .m files shipped with sbPOM, so it is required.

What you need to do is go to this site and download SNCTOOLS and netcdf-java version 4.1. The download will give you the mexcdf.r3628.zip file and the netcdfAll-4.1.jar file.

There are installation instructions on the download website, but a better way to do this is to add it to the matlab toolboxes permanently, as follows.

First, copy these files into your $MATLAB/toolbox/ folder, each in its own folder, and extract them. The oddly named zip file will extract to mexcdf (with two subfolders), and the jar file to netcdfAll-4.1.

Then, add these folders to the list defined in $MATLAB/toolbox/local/pathdef.m, like this:

matlabroot,'/toolbox/mexcdf;', ...

matlabroot,'/toolbox/mexcdf/mexnc;', ...

matlabroot,'/toolbox/mexcdf/snctools;', ...

matlabroot,'/toolbox/netcdfAll-4.1.jar;', ...

matlabroot,'/toolbox/netcdfAll-4.1;', ...

And restart matlab. (You can also use rehash toolboxcache in matlab command line, but for some reason this does not work for me).
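
If you end up adding several toolbox folders, typing the pathdef.m entries by hand gets error-prone (a missing semicolon or quote is enough to break the path). Here is a small sketch that prints the entries in the same form as above, ready to paste in; pathdef_entries is a hypothetical helper, and it takes folder names relative to the toolbox directory.

```shell
#!/bin/sh
# Print one pathdef.m entry per toolbox subfolder given as an argument.
# pathdef_entries is a hypothetical helper; paste its output into pathdef.m.
pathdef_entries() {
    for folder in "$@"; do
        printf "matlabroot,'/toolbox/%s;', ...\n" "$folder"
    done
}

pathdef_entries mexcdf mexcdf/mexnc mexcdf/snctools
```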

3.2 The M_MAP library

Download the m_map library from this site. Again, unpack the files into $MATLAB/toolbox/m_map, add that directory to the list defined in $MATLAB/toolbox/local/pathdef.m, and restart matlab.

To see an example map, try this in matlab:

m_proj('oblique mercator');

m_coast;

m_grid;

3.3 Additional functions

In order to actually run the sbPOM matlab code, you will need two additional matlab functions; one is psliceuv and the other is arrows. They're available online, but you have to clean the code of stupid line numbering, so to spare you this, I will copy-paste the cleaned versions here. (Just copy-paste them into separate .m files and save them in your sbPOM/ecoast/prep folder.)

function h=psliceuv(x,y,w,isub,sca,color)

% PSLICEUV plots a horizontal matrix of

% velocity from ECOMSI using arrows

% USAGE: h=psliceuv(x,y,w,isub,sca,color)

% x is array of x points

% y is array of y points

% w is array of velocities

% isub is number to subsample

% sca is scale factor for arrows

% color is color for arrows

% EXAMPLE: psliceuv(x,y,w,3,20,'white');

if(~exist('isub')),

isub=2;

end

if(~exist('sca')),

sca=1e4;

end

if(~exist('color')),

color='white';

end

[m,n]=size(w);

w=w([isub:isub:m],[isub:isub:n]);

x=x([isub:isub:m],[isub:isub:n]);

y=y([isub:isub:m],[isub:isub:n]);

ig=find(isfinite(w));

h=arrows(x(ig),y(ig),w(ig),sca,color);

function [h]=arrows(x,y,w,fac,color)

% function [h]=arrows(x,y,w,fac,color)

% Draws arrows with their tails at each point corresponding

% to identical indices in the matrices x,y. The complex matrix w

% (u+i*v) holds the components of the vector to be represented.

% fac is the scaling factor

% Geometry of arrowheads (choosing HEADA and HEADL):

% If the arrow is defined by the points A B C B D where A is the base of

% the arrow, B is the head, and C and D are the corners of the arrowhead, then

% HEADA is the angle BAC (or BAD), and HEADL is the ratio of distances AC/AB.

% revision 3/20/97 to use nans for line breaks

% much more efficient and returns only a single handle

HEADA=10*pi/180; HEADL=.75;

z=x(:)+i*y(:);

if nargin < 5,color='red';end

if nargin < 4,help arrows,end

w=w(:)*fac;

r=w*HEADL;

wr1=r*exp(+i*HEADA);

wr2=r*exp(-i*HEADA);

wplot=ones(length(z),6);

wplot(:,1)=z;

wplot(:,[2,4])=(z+w)*ones(1,2);

wplot(:,3)=z+wr1; wplot(:,5)=z+wr2;

wplot(:,6)=z*nan;

wplot=wplot.';

wplot=wplot(:);

%z=eps*ones(size(wplot));

%h=line(real(wplot),imag(wplot),z,'color',color);

h=line(real(wplot),imag(wplot),'color',color);

set(h(1),'userdata',fac);

And there it is... now what remains for me is to figure out how to run sbPOM on a 4-core computer, and how to fix up the pom.n file and rid it of bugs so that pom2k actually creates .nc results! :)

Saturday, 30 July 2011.

Building the hdf5, parallel-netcdf, netcdf and other requirements in order to use the POM model
