Great visit with our visiting speaker today. So many great ideas. Now to find people to help work on this in the lab.
After a visit to Cornell, where I met with the Plant Pathology & Plant-Microbe Biology department, the Pseudomonas syringae groups, and SGN, and after a breakfast with Tim Hubbard when he was in Berkeley, I had a few ideas.
- We need to put the power of annotation in the hands of more people. Community-assisted annotation of gene function, linking to articles, and general curation should be accessible, à la Wikipedia.
- For genome annotation, though, there is a more specialized need: incorporating data from different sources. One option is a Git-like repository for genome annotation (in GFF) that can be served up to GBrowse, with edits saved to one's own branch. (All of this assumes the same reference genome assembly, which is about the level I’m comfortable worrying about, though some of the genome projector-type tools would seem to make it easy to lift annotation from one assembly to another.)
- This would probably require a GenomeAnnotationDiff tool. The Yandell lab may already have produced tools that do this, described in a publication by Eilbeck et al.
- The gene pages with community annotation tools at SGN are ready to go, and they provide VMs so you don't have to install all the software. I even saw a cool on-the-fly QTL calculation. The challenge I always see with our data is linking it from one context to another and making that useful. I will have to try transforming some of the different data we have here.
- The SGN approach is to use the parts of the Chado schema that deal with ontologies/controlled vocabularies, but to keep domain-specific databases for annotation and related information rather than the giant “everything is a feature” model that is the Chado way and doesn’t seem to scale.
- It is about time to try Hadoop/MapReduce on our big datasets and to start earnestly running the automated all-vs-all ortholog prediction scripts on our genomes; too often it seems important to have an updated dataset. This is something to deploy on the new hardware environment this summer.
- No one has figured out a sensible way to interface with NCBI/GenBank/EMBL for updating genomes. Basically, all the really complicated systems keep the bulk of their data in their own domain-specific databases and feed it back in at appointed times, but this is often a huge process and only works where there is a real effort from both NCBI/GenBank/EMBL and the group. E.g. for Ensembl, the CCDS and RefSeq projects can take the output from Ensembl and feed it back into the system.
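A GenomeAnnotationDiff tool for the branch-per-curator idea could start very simply. Below is a minimal sketch, not the Yandell lab's method, assuming both annotation versions are GFF3 on the same assembly and that every feature has a unique ID attribute. The helper names parse_gff3 and diff_annotations are hypothetical. It reports which features were added, removed, or changed between a base version and a branch.

```python
from typing import Dict, List, Tuple

# A GFF3 feature reduced to the columns compared here (hypothetical choice):
# (seqid, type, start, end, strand, attributes). Source, score and phase
# are ignored, so edits to only those columns are not reported.
Feature = Tuple[str, str, int, int, str, str]


def parse_gff3(text: str) -> Dict[str, Feature]:
    """Index GFF3 features by ID (assumes every feature line has ID=...)."""
    features: Dict[str, Feature] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns: {line!r}")
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
        attr_map = dict(
            kv.split("=", 1) for kv in attrs.strip(";").split(";") if "=" in kv
        )
        features[attr_map["ID"]] = (seqid, ftype, int(start), int(end), strand, attrs)
    return features


def diff_annotations(base: str, branch: str) -> Dict[str, List[str]]:
    """Compare two GFF3 versions of the same assembly, keyed by feature ID."""
    old, new = parse_gff3(base), parse_gff3(branch)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(fid for fid in old.keys() & new.keys() if old[fid] != new[fid]),
    }


# Toy example: a curator's branch extends gene1, drops gene2, adds gene3.
base = (
    "##gff-version 3\n"
    "chr1\tmaker\tgene\t100\t900\t.\t+\t.\tID=gene1\n"
    "chr1\tmaker\tgene\t2000\t3000\t.\t-\t.\tID=gene2\n"
)
branch = (
    "##gff-version 3\n"
    "chr1\tmaker\tgene\t100\t950\t.\t+\t.\tID=gene1\n"
    "chr1\tcurator\tgene\t5000\t5600\t.\t+\t.\tID=gene3\n"
)
print(diff_annotations(base, branch))
# {'added': ['gene3'], 'removed': ['gene2'], 'changed': ['gene1']}
```

Keying on IDs keeps the diff cheap and makes merging branches a matter of reconciling per-feature records, but it misses cases where a curator deletes a gene and re-adds it under a new ID; an overlap-based comparison would be needed for that.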
What would a system for comparative reannotation of X fungal genomes be able to do with the data?
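Any comparative reannotation system would lean on the all-vs-all ortholog predictions mentioned above. As a minimal sketch, assuming the all-vs-all similarity searches have already been reduced to (query, subject, bit score) tuples, reciprocal best hits (RBH) are one simple way to call putative ortholog pairs between two genomes. This is a stand-in technique, not necessarily what our scripts do, and the function names and gene IDs are hypothetical toy values.

```python
from typing import Dict, Iterable, List, Tuple

# (query_id, subject_id, bit_score) from a similarity search
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Each query's single highest-scoring subject (ties: first one seen)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]
) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


# Toy scores between genome A (geneA*) and genome B (geneB*)
a_vs_b = [
    ("geneA1", "geneB1", 250.0),
    ("geneA1", "geneB2", 90.0),
    ("geneA2", "geneB2", 180.0),
    ("geneA3", "geneB2", 200.0),
]
b_vs_a = [
    ("geneB1", "geneA1", 245.0),
    ("geneB2", "geneA3", 205.0),
    ("geneB2", "geneA2", 175.0),
]
print(reciprocal_best_hits(a_vs_b, b_vs_a))
# [('geneA1', 'geneB1'), ('geneA3', 'geneB2')]
```

best_hits is essentially a reduce keyed by query: mappers could emit (query, (subject, score)) records and reducers keep the maximum, which is one reason the all-vs-all step looks like a good first Hadoop/MapReduce job. RBH alone misses in-paralogs (geneA2 above is dropped), so clustering methods are usually preferred across many genomes.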
On the Plant Path & fungal side of the discussions:
- Looking at multiple genotypes of both the host and the pathogen seems like a really smart way to start exploring the effects of mutations. With so many more tools now available in both systems, this seems like the next logical step in arraying experimental designs.
- I really need to get some movies made of Bd (chytrid) zoospores swimming around; they would make better introductions to talks. I had to settle for showing oomycete zoospores, which are cool but not the same.
- There need to be new/better population genetics tools for systems whose populations are clearly not in Hardy-Weinberg equilibrium, such as newly introduced pathogens.
- Close-up pictures of fungi are really cool, especially through the borescope.
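To make the Hardy-Weinberg point concrete: the usual first check is a goodness-of-fit test of observed genotype counts against HWE expectations. The sketch below covers one biallelic locus in a diploid using the classic chi-square test with 1 degree of freedom (an exact test is often preferred for small samples). The genotype counts are made-up toy numbers, not data. Strong deviations like the second example are the situation where methods built on the HWE assumption become hard to interpret.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Biallelic locus in a diploid with genotype counts AA, AB, BB.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("locus must be polymorphic")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 df the statistic is a squared standard normal, so
    # P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


print(hwe_chi_square(25, 50, 25))  # (0.0, 1.0): exactly the HWE proportions
chi2, p_val = hwe_chi_square(40, 20, 40)  # toy locus with a heterozygote deficit
print(f"chi2 = {chi2:.1f}, p = {p_val:.2e}")  # chi2 is 36.0; p is far below 0.05
```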