Trying to run KohGPI but needed to find SPRANNLIB

I was trying to run KohGPI but needed SPRANNLIB, which is no longer available from what I can see. I did find a GNU-i-fied version in the clibs directory of

This implementation also depends on the GNU Scientific Library (GSL), which you’ll need to build and install beforehand.

Then, to build SPRANNLIB, download and untar the rlab archive. Go to ‘clibs/sprannlib/src’ and type make. If things are in order, it will build the library in ../lib (libsprann.a spr_$ARCH.a, where $ARCH is your architecture: x86_64, etc.).

The .a files can be deployed wherever you normally put manual (non-RPM/DEB) installations (/usr/local/pkg/sprannlib for me, with symlinks to the libraries in /usr/local/lib). Then go back and build kohgpi. You may need to update its Makefile so that the library paths point to your sprannlib installation location.
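
The symlinking step can be scripted. Here is a minimal sketch in Python, assuming the built .a files have already been copied into a package directory; the function name `link_libraries` and the `lib` subdirectory in the usage comment are my own illustration, not part of SPRANNLIB or rlab.

```python
from pathlib import Path


def link_libraries(pkg_lib_dir, target_dir):
    """Symlink every .a file found in pkg_lib_dir into target_dir.

    Returns the list of links that were created. Names that already
    exist in target_dir are left untouched.
    """
    pkg_lib_dir = Path(pkg_lib_dir)
    target_dir = Path(target_dir)
    created = []
    for lib in sorted(pkg_lib_dir.glob("*.a")):
        link = target_dir / lib.name
        if link.is_symlink() or link.exists():
            # Do not clobber something that is already installed.
            continue
        link.symlink_to(lib.resolve())
        created.append(link)
    return created


# Hypothetical usage (paths are assumptions; writing to /usr/local/lib
# usually needs root):
# link_libraries("/usr/local/pkg/sprannlib/lib", "/usr/local/lib")
```

Skipping existing names is a deliberate choice here so that re-running the script is harmless.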

Yah! Teragrid

I’ve been able to make the transition from the 1000-node Duke cluster to a smaller one here at Berkeley using Teragrid. What’s great about Teragrid is that it has heterogeneous compute clusters, from big SMP machines to 5000-node blade clusters. So I can run big-memory apps or long-running, CPU-intensive jobs without really having to mess around too much.

Not to say that it is all easy. Each system has its own filesystem and, in some cases, its own queuing system. So you have to be able to deal with PBS, LSF, and I think SGE. Since I’m a cluster scavenger anyways, I guess you deal with what you can get.

I haven’t quite figured out how to deal with Globus for running these types of jobs, mostly because lots of the analysis requires coordinating too many large datafiles, and it is easier to stage them on a particular site’s cluster.

I don’t know if they are ready for large-scale informatics though (or at least for me distributing jobs across a whole cluster). I can only have 40 jobs in the queue on the TACC system, for example, so if you need to run 10-20K you have to chunk things a little differently.
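
One way to do that chunking is to pack many small tasks into each queued job. Below is a minimal Python sketch, assuming the tasks are independent and can simply be run back to back inside one job; `chunk_tasks` is a name I made up for illustration, and 40 is the queue limit mentioned above.

```python
def chunk_tasks(tasks, max_jobs):
    """Split tasks into at most max_jobs near-equal batches, keeping order.

    Each batch is meant to be run sequentially inside one queued job.
    """
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")
    tasks = list(tasks)
    n_batches = min(max_jobs, len(tasks))
    if n_batches == 0:
        return []
    # The first `extra` batches get one additional task each.
    base, extra = divmod(len(tasks), n_batches)
    batches = []
    start = 0
    for i in range(n_batches):
        size = base + (1 if i < extra else 0)
        batches.append(tasks[start:start + size])
        start += size
    return batches


# 20,000 tasks under a 40-job limit gives 40 batches of 500 tasks each.
batches = chunk_tasks(range(20000), 40)
```

The trade-off is that each job now runs much longer, so it has to fit within whatever wall-clock limit the queue imposes.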

All in all I am pleased. We’ll see what happens if I start trying to run annotation pipelines again, since the CPU time is allocated and I’ve already eaten up 1/3 of it on my first foray here. There are larger allocations to apply for, so maybe that will be the way to go.

Philosophically, I am not sure if general-purpose clusters are the way to go for all of bioinformatics computing. It seems like there is always a variety of job types: independent and parallel jobs, jobs with dependencies, large-memory jobs, long-running jobs, many many short-running jobs…