I’ve been able to make the transition from the 1000 node Duke cluster to a smaller one here at Berkeley using TeraGrid. What’s great about TeraGrid is there are heterogeneous compute clusters with big SMP machines and 5000 node blade clusters. So I can run big memory apps or long running CPU intensive jobs without really having to mess around too much.
Not to say that it is all easy. Each system has its own filesystem and in some cases, its own queuing system. So you have to be able to deal with PBS, LSF, and I think SGE. Since I’m a cluster scavenger anyway, I guess you deal with what you can get.
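To give a flavor of what “dealing with” three schedulers means in practice, here is a rough sketch of the per-scheduler differences I mean. This is a hypothetical helper of my own, and exact flags vary by site configuration; these are just the common defaults.

```python
# Hypothetical sketch: the same 12-hour batch job expressed for each
# scheduler. The table and helper are mine, not part of any site's tooling.
SCHEDULERS = {
    # PBS: directives start with #PBS, walltime via -l walltime=HH:MM:SS
    "PBS": {"submit": "qsub", "directive": "#PBS", "walltime": "-l walltime=12:00:00"},
    # SGE: directives start with #$, hard runtime limit via -l h_rt=HH:MM:SS
    "SGE": {"submit": "qsub", "directive": "#$", "walltime": "-l h_rt=12:00:00"},
    # LSF: directives start with #BSUB, walltime via -W HH:MM
    # (bsub typically reads the script on stdin so it can parse directives)
    "LSF": {"submit": "bsub", "directive": "#BSUB", "walltime": "-W 12:00"},
}

def submit_command(scheduler, script="job.sh"):
    """Build the submission command line for a given scheduler."""
    return f"{SCHEDULERS[scheduler]['submit']} {script}"
```

So the same pipeline script ends up wrapped three slightly different ways depending on which site’s queue you landed in.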
I haven’t quite figured out how to deal with Globus for running these types of jobs, mostly because lots of the analysis requires coordinating too many large data files and it is easier to stage them on a particular site’s cluster.
I don’t know if they are ready for large scale informatics though (or at least for me distributing jobs across a whole cluster). I can only have 40 jobs in the queue on the TACC system for example, so if you need to run 10-20K you have to chunk things a little differently.
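Chunking here just means folding many small tasks into a few queued jobs, each of which loops over its share serially. A minimal sketch of what I mean (the function name and numbers are mine for illustration):

```python
# Hypothetical sketch: split N independent tasks across at most
# max_queue_slots batches, so the queue never holds more jobs
# than the site allows.
def chunk(tasks, max_queue_slots):
    """Deal tasks round-robin into up to max_queue_slots batches."""
    tasks = list(tasks)
    n_batches = min(max_queue_slots, len(tasks)) or 1
    batches = [[] for _ in range(n_batches)]
    for i, task in enumerate(tasks):
        batches[i % n_batches].append(task)
    return batches

# 10,000 independent tasks, but only 40 queue slots allowed:
batches = chunk(range(10_000), 40)
# Each queued job then works through its ~250 tasks one after another.
```

The annoying part is that the right chunk size depends on each system’s queue limits and walltime caps, so the batching logic ends up site-specific too.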
All in all I am pleased; we’ll see what happens if I start trying to run annotation pipelines again, since the CPU time is allocated and I’ve already eaten up 1/3 on my first foray here. There are larger allocations to apply for, so maybe that will be the way to go.
Philosophically I am not sure if general purpose clusters are the way to go for all of bioinformatic computing. It seems like there are always a variety of types of jobs: independent and parallel jobs, jobs with dependencies, large memory jobs, long running jobs, many many short running jobs…