This recent paper, Pandey et al. (2014), caught my interest because I’m in the business of finding genes for traits, and have spent quite some time looking at lists of gene names and annotation database output. One is tempted to look for the ”outstanding candidates” that ”make biological sense” (quotes intended as scare quotes), but the truth is probably that no one knows what genes and functions we should expect to be affected by genetic variation in, for instance, behaviour. This paper tries to make the case for the unknown parts of the brain transcriptome; they use data about gene expression, protein domains, paralogs and literature to argue that the unknown genes are unknown for no good reason and that they might be just as important as genes that happen to be well-known.
They found genes that had a high ratio of expression in brain to average expression in other tissues of C57BL/6J and DBA/2J mice, and searched PubMed for these genes in combination with neuroscience-related keywords. Some of them have few citations, and these are their selectively expressed but little studied genes. They then make a series of comparisons between these and well-studied genes. It turns out the only major difference is that well-studied genes were discovered (entered into GenBank) earlier.
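To make the selection step concrete, here is a minimal sketch in Python. It is not the authors' actual pipeline: the gene names, expression values, citation counts and both thresholds (`min_ratio`, `max_citations`) are made up for illustration, and I assume expression values are on a linear scale.

```python
from statistics import mean


def brain_selectivity(expression, brain="brain"):
    """Ratio of brain expression to mean expression in all other tissues.

    `expression` maps tissue name -> expression value (linear scale assumed).
    """
    others = [value for tissue, value in expression.items() if tissue != brain]
    if not others:
        raise ValueError("need at least one non-brain tissue")
    background = mean(others)
    if background == 0:
        # No expression elsewhere: treat any brain expression as fully selective.
        return float("inf") if expression[brain] > 0 else 0.0
    return expression[brain] / background


def little_studied_selective(expression_by_gene, citations,
                             min_ratio=10.0, max_citations=5):
    """Genes that are brain-selective but have few literature hits.

    Both thresholds are hypothetical, not values taken from the paper.
    """
    return sorted(
        gene
        for gene, expression in expression_by_gene.items()
        if brain_selectivity(expression) >= min_ratio
        and citations.get(gene, 0) <= max_citations
    )


# Toy data with invented gene names and numbers.
toy_expression = {
    "geneA": {"brain": 120.0, "liver": 6.0, "kidney": 6.0},   # selective, well studied
    "geneB": {"brain": 90.0, "liver": 3.0, "kidney": 6.0},    # selective, little studied
    "geneC": {"brain": 50.0, "liver": 40.0, "kidney": 60.0},  # not selective
}
toy_citations = {"geneA": 250, "geneB": 2, "geneC": 1}

candidates = little_studied_selective(toy_expression, toy_citations)
print(candidates)
```

In this toy example, only the gene that passes both filters ends up in `candidates`; in a real analysis the citation counts would come from a literature search such as the PubMed query described above.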
I don’t know to what extent these results are surprising. I was not surprised by their main conclusion, but then again, maybe my opinion was mostly prejudice. There is a literature on biases in the functional genomics literature, but I don’t know much about it. And apparently neither did the authors, initially, as Robert Williams writes in a comment on the PLOS ONE website:
We did not rediscover the lovely work of Robert Hoffmann (now head of WikiGene) until the paper had been submitted in succession to six higher profile journals … Hoffmann and colleagues showed that social factors account for much of the annotation imbalance for genes.
I love the idea of authors writing an informal comment about the background of the paper like this.
The coexpression network results show that some of the little known genes are just as connected as known important genes. This suggests some of the unknown genes might be important too, if we can trust that coexpression hub genes are likely to be important (for various values of ”important”). Maybe this is a scientific opportunity for some neuroscientist. Several people I’ve talked with have imagined future Big Science initiatives to describe the function of unknown genes (”divide them up between labs and characterise them!”), and some initiatives exist, such as the IMPC. On the other hand, how do we know that we really find the most important and interesting functions of a gene? The skeptic in me thinks that going bottom up, from gene to phenotype, will miss the most interesting and surprising phenotypes.
I think ”ignorome” is one of those unnecessary bad omics words, which is why I’ve avoided using it.
Their PubMed query was restricted to mouse, human and rat. I wonder why. Maybe there could be something useful from fruit flies or roundworms?
Overall, a fun paper that I recommend reading over a few cups of coffee!
Pandey AK, Lu L, Wang X, Homayouni R, Williams RW (2014) Functionally Enigmatic Genes: A Case Study of the Brain Ignorome. PLoS ONE 9(2): e88889. doi:10.1371/journal.pone.0088889