Andrew Gelman sometimes writes that in genetics it might make sense to have a null hypothesis of zero effect, but that in social science nothing is ever exactly zero (and interactions abound). I wonder whether that is actually true even for genetics. Think about pleiotropy: be it universal or modular, I think the evidence still points in the direction that we should expect any genetic variant to affect lots of traits, albeit often with very small effects. And think of gene expression, where genes always show lots of correlation structure: do we expect transcripts from the same cells to ever be independent of each other? It doesn’t seem to me that the null can be strictly true here. Most of these effects must be too small for us to model in practice, though — and maybe the small ones are so far below the detection limit that we can pretend they could be zero. (Note: I’m not trying to criticise anybody’s statistical method or view of effect sizes here, just thinking aloud about the ‘no true null effect’ argument.)
A while ago I wrote a bit about the recent paper on epigenetic inheritance of acetophenone sensitivity and odorant receptor expression. I spent most of that post talking about potential problems, but actually I’m not that negative. There is quite a literature building up about these transgenerational effects, which is quite inspiring, if a little overhyped. I for one do not think epigenetic inheritance is particularly outrageous or disruptive to genetics and evolution as we know them. Take this paper: even if it means inheritance of an acquired trait, that inheritance is probably not very stable over the generations, and it is nothing like a general Lamarckian transmission mechanism that could work for any trait. It is probably very specific to odorant receptors. It might allow for genetic assimilation of fear of odours, though, which would be cool, but probably not at all easy to demonstrate. But no-one knows how it works, if it does — there are even multiple unknown steps. How does fear conditioning translate to DNA methylation differences in sperm, and how do those differences translate to olfactory receptor expression in the brain of the offspring?
A while after the transgenerational effects paper I saw this one in PNAS: Rare event of histone demethylation can initiate singular gene expression of olfactory receptors (Tan, Zong & Xie 2013). I had no idea olfactory receptor expression was that fascinating! (As is often the case when you scratch the surface of another problem in biology, there turns out to be interesting stuff there …) Mice have lots and lots of odorant receptor genes, but each olfactory neuron expresses only one of them. Apparently the expression is regulated by histone 3 lysine 9 (H3K9) methylation. The genes start out methylated and suppressed, but once one of them is expressed it keeps all the others down by downregulating a histone demethylase. This is a modelling paper showing that if random demethylation happens slowly enough, and the feedback that shuts down further demethylation is fast enough, these two steps are sufficient to explain the specificity of expression. There are also connections between histone methylation and DNA methylation: it seems that methylated DNA binds proteins that bring histone methylases to the gene (reviewed in Cedar & Bergman 2009). Dias & Ressler saw hypomethylation near the olfactory receptor gene in question, Olfr151. Maybe that difference, if it survives through to the developing brain of the offspring, can make demethylation of the locus more likely and give Olfr151 a head start in the race to become the first expressed receptor gene.
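The logic of the model is simple enough to caricature in a few lines of code. This is not Tan, Zong & Xie’s actual model — the number of genes, the rates and the feedback delay below are all made up — just a sketch of why slow demethylation plus fast feedback yields singular expression:

```python
import random

def expressed_genes(n_genes=100, demeth_rate=1e-4, feedback_delay=1.0,
                    t_max=1e5, seed=None):
    """Toy sketch: each silent receptor gene is demethylated at a slow
    random rate; the first gene to be demethylated is expressed and,
    after a short feedback delay, shuts down all further demethylation.
    Every gene demethylated before the shutdown ends up expressed."""
    rng = random.Random(seed)
    # exponential waiting time until each gene happens to be demethylated
    times = sorted(rng.expovariate(demeth_rate) for _ in range(n_genes))
    if times[0] > t_max:
        return 0  # no gene got expressed within the cell's time window
    return sum(1 for t in times if t <= times[0] + feedback_delay)

cells = [expressed_genes(seed=s) for s in range(1000)]
print(sum(c == 1 for c in cells) / len(cells))  # fraction with singular expression
```

With demethylation this slow, nearly all simulated cells express exactly one gene. Crank `demeth_rate` up and the feedback is no longer fast relative to demethylation, so several genes slip through before the shutdown.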
Brian G Dias & Kerry J Ressler (2013) Parental olfactory experience influences behavior and neural structure in subsequent generations. Nature Neuroscience doi:10.1038/nn.3594
Longzhi Tan, Chenghang Zong & X. Sunney Xie (2013) Rare event of histone demethylation can initiate singular gene expression of olfactory receptors. PNAS doi:10.1073/pnas.1321511111
Howard Cedar & Yehudit Bergman (2009) Linking DNA methylation and histone modification: patterns and paradigms. Nature Reviews Genetics doi:10.1038/nrg2540
Who still uses gene expression microarrays? I do, and lots of other people do. Even though it’s pretty clear that RNA-seq is better, as long as it is more expensive — and it probably still is for many combinations of microarray and sequencing platforms — the trade-off between technical variability and sample size should still favour microarrays: the cheaper platform buys more samples for the same money. But the break-even point is probably arriving right about now, and I’m looking forward to seeing lots of sequencing-based genetical genomics with splice-eQTL, antisense RNA-eQTL and what not! Then again, the same might happen to RNA-seq in a few years: I hope people stick with current-generation massively parallel sequencing long enough to get decent sample sizes, instead of jumping to small-N studies with the next technology.
In the simplest terms, pleiotropy means genetic side-effects: a pleiotropic gene is a gene that does several things, and a pleiotropic variant is a variant that makes its carrier different from carriers of other variants in more than one trait. It’s just that the words ‘gene’, ‘trait’ and ‘different’ are somewhat ambiguous. Paaby & Rockman (2013) have written a nice analytical review about the meaning of pleiotropy. In their terminology, molecular gene pleiotropy is when the product of a gene is involved in more than one biological process. Developmental pleiotropy, on the other hand, deals with genetic variants: a variant is developmentally pleiotropic if it affects more than one trait. This is the sense of the word I’d normally think of. Third, selectional pleiotropy deals with variants that affect several aspects of fitness, possibly differently for different individuals.
Imagine that we have found a variant associated with two variables. Have we got a pleiotropic variant on our hands? If the variables are just different measures of the same thing, clearly we’re dealing with one trait. But imagine that the variables are actually driven by largely different factors. They might respond to different environmental stimuli and have mostly separate genetic architectures. If so, we have two different traits and a pleiotropic variant affecting both. My point is that it depends on the actual functional relationship between the traits. Without knowing something about how the organism works we can’t count traits. With that in mind, it seems very bold to say things about variants in general and traits in general. Paaby & Rockman’s conclusion seems to be that genetic mapping is not the way to go, because of low power to detect variants of small effect, and instead they bring up alternative statistical and quantitative genetics methods to demonstrate pleiotropy on a large scale. I agree that these results reinforce that pleiotropy must be important, in some sense of the word. But I think the opposite approach still has value: the way to figure out how important pleiotropy is for any given suite of traits is to study them mechanistically.
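A toy simulation of that scenario — with invented effect sizes and sample size — makes the point concrete: one variant, two traits otherwise driven by independent factors, and the only thing tying the traits together is the shared variant:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

genotype = rng.binomial(2, 0.5, n)  # allele count at a single biallelic variant
env1 = rng.normal(size=n)           # separate, independent 'environmental' drivers
env2 = rng.normal(size=n)
trait1 = 0.3 * genotype + env1      # the variant affects both traits...
trait2 = 0.2 * genotype + env2      # ...but the traits share nothing else

print(np.corrcoef(genotype, trait1)[0, 1])  # clear association with trait 1
print(np.corrcoef(genotype, trait2)[0, 1])  # clear association with trait 2
print(np.corrcoef(trait1, trait2)[0, 1])    # the traits themselves barely correlate
```

The variant is associated with both variables either way; it is the (here, near-zero) correlation structure between the traits, and ultimately the biology behind them, that tells us we are looking at two genuinely distinct traits rather than two measures of one.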
(Zombie kitty by Anna Nygren.)
Valen Johnson recently published a paper in PNAS about Bayes factors and p-values. In null hypothesis testing, the p-value measures the probability of seeing data this extreme or more extreme if the null hypothesis is true. Bayes factors are related to the ratio of the posterior probability of the alternative hypothesis to that of the null. The words ‘probability of the hypothesis’ tell us we’re in Bayes land, and that posterior probability comes from combining the prior probability with the marginal likelihood — the probability of generating the data under the hypothesis. The Bayes factor itself is the ratio of those marginal likelihoods, so that posterior odds equal prior odds times the Bayes factor. Either way, the Bayes factor considers not only what happens if the null is true, but also what happens if the alternative is true. That is one source of discrepancies between the two measures. Johnson has found a way to construct Bayes factors so that they correspond to certain common hypothesis tests (including an approximation for the t-test, so there goes most of biology), and found that in many realistic test situations a p-value of 0.05 corresponds to pretty weak support in terms of Bayes factors. Therefore, he suggests that the alpha level of hypothesis tests should be lowered to 0.005, or even further. I don’t know enough about Bayes factors to really appreciate Johnson’s analysis. However, I do know that some responses to the paper make things seem a bit too easy. Johnson writes:
Of course, there are costs associated with raising the bar for statistical significance. To achieve 80% power in detecting a standardized effect size of 0.3 on a normal mean, for instance, decreasing the threshold for significance from 0.05 to 0.005 requires an increase in sample size from 69 to 130 in experimental designs. To obtain a highly significant result, the sample size of a design must be increased from 112 to 172.
If one does not also increase the sample size to preserve — or, I guess, preferably improve — power, just reducing the alpha level to 0.005 will only make matters worse. With low power comes, as Andrew Gelman likes to put it, a high Type M, or magnitude, error rate. That is, if power is bad enough, not only will there be few significant findings, but those findings will tend to be overestimates.
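Johnson’s sample sizes can be reproduced with a standard power calculation — assuming a one-sided test on a normal mean with known unit variance, which is what matches his numbers — and a quick simulation illustrates Gelman’s Type M point:

```python
import math
import numpy as np
from scipy.stats import norm

def n_required(effect, alpha, power):
    # sample size for a one-sided z-test of a normal mean (sd = 1)
    return math.ceil(((norm.ppf(1 - alpha) + norm.ppf(power)) / effect) ** 2)

print(n_required(0.3, alpha=0.05, power=0.8))   # 69, as in the quote
print(n_required(0.3, alpha=0.005, power=0.8))  # 130

# Type M error: small true effect, small sample, many replicate 'studies'.
rng = np.random.default_rng(0)
true_effect, n = 0.1, 30                 # power is low here (about 14 %)
se = 1 / math.sqrt(n)
estimates = rng.normal(true_effect, se, size=100_000)
significant = estimates[estimates > norm.ppf(0.95) * se]
print(significant.mean() / true_effect)  # exaggeration factor, well above 1
```

In this setup the exaggeration factor comes out close to four: conditioning on significance at such low power roughly quadruples the apparent effect.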
(Note: ‘Morning coffee’ will be short musings about science-related topics.)
I don’t like tables. Or, more precisely: I don’t like tables that I have to read, but I love telling my computer to read tables for me. Tables made for human eyes tend to have certain features — I don’t know whether they really facilitate human understanding, but people seem to think they do — such as merged cells, omission of repeated values, footnotes indicated by superscript symbols and sometimes colouring that conveys meaning. There is also a conflict between keeping the number of columns small enough to be readable and putting in all the statistics that readers want: someone might want the coefficient of determination, while someone of an information-theoretic persuasion wants the AIC. It is more convenient for the human reader to see the table close to the text, while the computer user would probably like it in a text file. Some journals do this almost right: right below the table there is a link to download it as comma-separated values. Ideally, I think, any data would be presented as a summary table — or, even better, a graph! — and the underlying computer-readable data would be a click of a link away.
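To illustrate what ‘a table my computer can read’ means: one header row, one record per line, no merged cells, no footnote symbols, repeated values written out on every row. The data here are made up, and pandas is just one convenient reader:

```python
import io
import pandas as pd

# A computer-readable table (toy numbers, for illustration only).
csv = """gene,tissue,log2_fold_change,p_value
Olfr151,olfactory epithelium,1.8,0.001
Olfr151,brain,0.2,0.40
Olfr6,olfactory epithelium,-0.5,0.03
"""

table = pd.read_csv(io.StringIO(csv))
# Summaries that would be fiddly to pull out of a merged-cell layout:
print(table.groupby("tissue")["log2_fold_change"].mean())
```

From a file like this, any reader — human or script — can rebuild the pretty summary table; going the other way, from the typeset table back to the data, is where the pain is.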