Brainstorming ideas

After a trip to Cornell, where I visited the Plant Pathology & Plant-Microbe Biology department, the Pseudomonas syringae groups, and SGN, and a breakfast with Tim Hubbard when he was in Berkeley, I had a few ideas.

  • We need to be able to put the power of annotation in the hands of more people. Community-assisted annotation at the level of just function, linking to articles, and general curation should be accessible à la Wikipedia.
  • For genome annotation, though, there is a more specialized need to incorporate data from different sources. A Git-like repository for genome annotation (in GFF) could be served up to GBrowse, with edits saved to one's own branch. (All of this assumes the same reference genome assembly, which is about the level I'm comfortable worrying about, though some of the genome projector type tools would seem to make it easy to lift annotation from one assembly to another.)
  • This would probably necessitate a GenomeAnnotationDiff tool. This might already be accomplished by tools that the Yandell lab has produced, described in the publication by Eilbeck et al.
  • Gene pages with community annotation tools at SGN are ready to go, and they have VMs to avoid having to install all the software. I even saw a cool on-the-fly QTL calculation. The challenge I see in our data is always linking the data from one context to another and working out how we make this useful. I will have to try to do a transformation of some of the different data we have here.
  • The SGN approach is to use aspects of Chado for the schema that deals with ontologies/controlled vocabularies, but to also have domain-specific databases for annotation and related info rather than the giant “everything is a feature” design that is the Chado way and doesn’t seem to scale.
  • It is about time to try out Hadoop/MapReduce on our big datasets and to start running the automated all-vs-all ortholog prediction scripts on our genomes in earnest. There are just too many times when it seems important to have an updated dataset. This is something to deploy on the new hardware environment this summer.
  • No one has figured out how to interface with NCBI/GenBank/EMBL to deal with the updating of genomes in a sensible way. Basically, all the really complicated systems keep the bulk of the data in their own domain-specific databases and at some appointed times feed that data back in, but often this is a huge process and only works where there is a real effort from both NCBI/GenBank/EMBL and the group. E.g., Ensembl has the CCDS and RefSeq projects that can take the output from Ensembl and feed that back into the system.
    What would a comparative reannotation of X fungal genomes system be able to do with the data?
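
To make the GenomeAnnotationDiff idea above a bit more concrete, here is a minimal sketch in Python of what comparing two GFF3 annotation sets on the same assembly might look like. Everything in it is an assumption for illustration: the function names `parse_gff3` and `diff_annotations` are hypothetical, features are matched only by their `ID` attribute, and this is not the approach of the Yandell lab tools mentioned above.

```python
from collections import namedtuple

# Hypothetical minimal record: only the fields needed to compare features.
Feature = namedtuple("Feature", "seqid type start end strand attributes")


def parse_gff3(text):
    """Parse GFF3 text into a dict keyed by each feature's ID attribute.

    Assumption: features without an ID attribute are skipped, since this
    sketch has no other way to match them between two annotation sets.
    """
    features = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        attrs = dict(
            pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair
        )
        fid = attrs.get("ID")
        if fid is None:
            continue
        features[fid] = Feature(
            cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6], attrs
        )
    return features


def diff_annotations(old, new):
    """Report feature IDs that were added, removed, or changed."""
    old_ids, new_ids = set(old), set(new)
    return {
        "added": sorted(new_ids - old_ids),
        "removed": sorted(old_ids - new_ids),
        "changed": sorted(
            fid for fid in old_ids & new_ids if old[fid] != new[fid]
        ),
    }
```

Calling `diff_annotations(parse_gff3(reference_text), parse_gff3(branch_text))` would then give a per-branch summary of edits. A real tool would also need to handle features that were renamed, split, or merged, which ID matching alone cannot detect.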

On the Plant Path & fungal side of discussions

  • Looking at multiple genotypes of both the host and pathogen seems like a really smart way to start exploring the effects of mutations. With so many more tools now available in both systems, this seems like the next logical experimental design.
  • I really need to get some movies made of Bd (chytrid) zoospores swimming around; they would make for better introductions to talks. I had to settle for showing oomycete zoospores, which are cool but not the same.
  • There need to be new and better population genetics tools for systems where the populations are clearly not in Hardy-Weinberg equilibrium, such as newly introduced pathogens.
  • Close-up pictures of fungi are really cool, especially through the borescope.
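
As a small illustration of the Hardy-Weinberg point above, here is a sketch of the textbook chi-square check for departure from equilibrium at a single biallelic locus in a diploid. The function name and the genotype counts are made up for illustration, and a one-locus test like this is only a starting point, not the kind of new tool the bullet is asking for.

```python
def hardy_weinberg_chi2(n_AA, n_Aa, n_aa):
    """Chi-square statistic for departure from Hardy-Weinberg proportions.

    Takes observed genotype counts at one biallelic locus and returns
    (frequency of allele A, expected genotype counts, chi-square statistic).
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele a
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum(
        (o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0
    )
    return p, expected, chi2


# Made-up counts: a sample with no heterozygotes at all.
p, expected, chi2 = hardy_weinberg_chi2(50, 0, 50)
print(p, expected, chi2)
```

With one degree of freedom, a statistic above roughly 3.84 rejects equilibrium at the 5% level, so a sample like the made-up one here (a complete absence of heterozygotes) departs from it very strongly.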

Phycomyces genome now available

The JGI has released the Phycomyces blakesleeanus genome. This is the second Zygomycete genome sequence to be released, following Rhizopus oryzae, which was released by the Broad Institute last year. We are now getting a better look at the basal fungal genomes, including the Chytrids and Zygomycetes. Much more on the specifics of Phycomyces biology and history is on this site run by the group organizing the genome analysis.

I find one of the most interesting things about P. blakesleeanus to be its phototropism. We know light sensing is controlled in part by the gene white-collar 1. A homolog of this gene in Neurospora crassa is involved in the circadian rhythm oscillator. Of course, many more genes are involved in light-sensing pathways, including some really old proteins like phytochromes.

There will be a lot of cool analyses to do with this genome beyond phototropism. I am looking forward to seeing what gene families are unique and expanded in this species relative to the other zygomycete. It also looks like it is quite intron-rich, much like the Basidiomycetes, further supporting the idea that fungi had intron-rich ancestors.