Trying to run KohGPI but needed to find SPRANNLIB

Trying to run KohGPI but needed to find SPRANNLIB, which is not available as far as I can see – but I found a GNU-ified version in the clibs directory of the rlab archive.

This implementation also depends on the GNU Scientific Library (GSL) which you’ll need to build and install beforehand.

Then, to build SPRANNLIB, download and untar the rlab archive, go to ‘clibs/sprannlib/src’, and type make. If things are in order it will build libraries in ../lib: libsprann.a and spr_$ARCH.a, where $ARCH is your architecture (x86_64, etc.).

The .a files can be deployed in your normal place for manual (non-RPM/DEB) installations (/usr/local/pkg/sprannlib for me, with symlinks to the libraries in /usr/local/lib). Then go back and build KohGPI – you may need to update the Makefile so that the library paths point to your sprannlib installation location.
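Putting the steps above together, here's a rough transcript – the archive name and the /usr/local/pkg prefix are just my conventions, so adjust for your setup:

```shell
# Build SPRANNLIB from the rlab sources (archive name is a guess;
# use whatever version you actually downloaded).
tar xzf rlab-*.tar.gz
cd rlab-*/clibs/sprannlib/src
make

# Deploy the resulting .a files and symlink them where the linker
# will find them (/usr/local/pkg/sprannlib is just my convention).
sudo mkdir -p /usr/local/pkg/sprannlib
sudo cp ../lib/*.a /usr/local/pkg/sprannlib/
sudo ln -s /usr/local/pkg/sprannlib/*.a /usr/local/lib/
```

After that, pointing KohGPI's Makefile at the install location should be a matter of adding `-L/usr/local/pkg/sprannlib -lsprann` (or wherever you put the libraries).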


Lamenting Grid access

Thomas makes some good points about his experiences and the still greater need for GRID computing. I am all for people writing interfaces on the GRID, but there doesn’t seem to be a very easy system. Presumably things like myExperiment, MOBY, and other approaches will make this an easier prospect, but I have yet to see a system with the flexibility to use custom code and large datasets that need to be local to the compute, while remaining usable by non-computer scientists (computationally savvy biology-types, though).

more filesystems on Mac

Have a server you ssh to and would like to interact with it as if it were a locally mounted volume? TUAW has a nice link about this. You can also look at the MacFUSE system for a more comprehensive tutorial on FUSE.

Turns out that fink has the GMAILfs and FUSE plugins, so you can mount all kinds of things. Your GMAIL account can be a filesystem.

Gmail Filesystem provides a mountable filesystem which uses your Gmail
account as its storage medium. Gmail Filesystem is a Python
application and uses the FUSE userland filesystem infrastructure to
help provide the filesystem, and libgmail to communicate with Gmail.
GmailFS supports most file operations such as read, write, open,
close, stat, symlink, link, unlink, truncate and rename. This means
that you can use all your favourite unix command line tools to operate
on files stored on Gmail (e.g. cp, ls, mv, rm, ln, grep etc. etc.).
Usage Notes:
Copy gmailfs.conf from the doc directory to ~/.gmailfs.conf (and edit it).
Then run “gmailfs /path/to/mountpoint”
Web site:

Now, (how many free GMAIL account invites you have) × (how much space per account) means you could have a lot of free storage if you wanted…

Yah! Teragrid

I’ve been able to make the transition from the 1000-node Duke cluster to a smaller one here at Berkeley using Teragrid. What’s great about Teragrid is that there are heterogeneous compute clusters, with big SMP machines and 5000-node blade clusters. So I can run big-memory apps or long-running CPU-intensive jobs without really having to mess around too much.

Not to say that it is all easy. Each system has its own filesystem and, in some cases, its own queuing system. So you have to be able to deal with PBS, LSF, and I think SGE. Since I’m a cluster scavenger anyway, I guess you deal with what you can get.
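Since every site speaks a slightly different batch dialect, here's a sketch of the kind of minimal PBS script I end up writing (the job name, queue name, and walltime are made up – every site has its own limits and queue names):

```shell
#!/bin/sh
#PBS -N analysis_job          # job name (hypothetical)
#PBS -l nodes=1:ppn=1         # one CPU on one node
#PBS -l walltime=04:00:00     # guessing at a limit; sites differ
#PBS -q normal                # queue name varies per site

# PBS starts jobs in $HOME, so move to the submission directory first.
cd "$PBS_O_WORKDIR"
./run_analysis input.fa > output.txt
```

The LSF and SGE versions carry the same information with different directives (`#BSUB` and `#$` respectively), which is most of the pain of moving between sites.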

I haven’t quite figured out how to deal with globus for running these types of jobs, mostly because a lot of the analysis requires coordinating too many large datafiles, and it is easier to stage them on a particular site’s cluster.

I don’t know if they are ready for large-scale informatics though (or at least for me distributing jobs across a whole cluster). I can only have 40 jobs in the queue on the TACC system, for example, so if you need to run 10-20K you have to chunk things a little differently.
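The chunking itself is simple enough – group the task list so only 40 batch jobs ever hit the queue, and have each job loop over its own share. A sketch (the task IDs here are just numbers standing in for input files):

```shell
#!/bin/sh
# Wrap a big task set so it fits a 40-jobs-in-queue limit like TACC's.
# Pretend we have 200 tasks; scale the idea up to 10-20K.
CHUNK=40
seq 1 200 | xargs -n "$CHUNK" > chunks.txt

# Each line of chunks.txt now holds up to 40 task IDs; submit one
# batch job per line, and each job works through its line's tasks.
wc -l < chunks.txt   # 200 tasks / 40 per line = 5 lines
```

With 20K tasks and a chunk size chosen so you end up with at most 40 lines, each batch job just runs its 500-odd tasks serially.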

All in all I am pleased; we’ll see what happens if I start trying to run annotation pipelines again, since the CPU time is allocated and I’ve already eaten up 1/3 of mine on my first foray here. There are larger allocations to apply for, so maybe that will be the way to go.

Philosophically I am not sure if general-purpose clusters are the way to go for all of bioinformatic computing. It seems like there are always a variety of types of jobs: independent and parallel jobs, jobs with dependencies, large-memory jobs, long-running jobs, many many short-running jobs…