Interactions between genetic and epigenetic

More speculation about epigenetics and the ways that epigenetic mechanisms of gene regulation could contribute to differences between individuals. Many cases, in both plants and animals, have to do with transposable elements, which makes a lot of sense since DNA methylation is involved in silencing the expression of transposable elements. Think about genetical genomics studies such as Gibbs & al (2010), where gene expression and DNA methylation are mapped to genomic regions. First, when an expression QTL and a methylation QTL coincide, it might be a good idea to start looking for transposable element insertions. Finding them is not as easy as finding SNPs, but hopefully there will be SNPs tagging the actual variant, and DNA methylation will spread outside of the inserted element to CpGs that are being typed. The element itself could of course work as a promoter, but it could also spread methylation into regulatory sequences of the gene, suppressing expression, or increase expression by changing the effect of an insulator.

Second, apparently the DNA methylation of transposable elements can sometimes be variable. This is the case with axin fused, Cabp-IAP and the agouti epialleles (Druker & al 2004; Vasicek & al 1997; Morgan & al 1999); among mice that carry the insertion there is DNA methylation variation causing phenotypic differences. This means that in populations where the insertion segregates, there should be a DNA methylation by gene interaction in the effect on the phenotype. I think that is fun, and I’d like to see someone find that in a mapping study. It might make things more difficult, though. The methylation–gene expression association might be hard to detect because it only exists in one of the alleles.
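To make the interaction idea a bit more concrete, here is a small simulated sketch in R. Everything in it is made up for illustration: the variable names (insertion, methylation, expr_level), the effect size and the noise level are assumptions, not estimates from any of the studies cited here. It only shows what a methylation-by-genotype interaction could look like in a linear model, and how the methylation association gets diluted when genotype is ignored.

```r
set.seed(1)
n <- 200

# Hypothetical setup: carrier status for the insertion (0/1) and a
# methylation level between 0 and 1, drawn independently.
insertion <- rbinom(n, 1, 0.5)
methylation <- runif(n)

# Simulated trait: methylation lowers the value only in insertion carriers.
expr_level <- 10 - 3 * insertion * methylation + rnorm(n, sd = 0.5)

sim <- data.frame(insertion, methylation, expr_level)

# Linear model with a methylation-by-genotype interaction term.
fit <- lm(expr_level ~ insertion * methylation, data = sim)
coef(summary(fit))

# Correlation between methylation and the trait, first ignoring genotype,
# then within carriers only.
cor_all <- cor(sim$methylation, sim$expr_level)
cor_carriers <- with(subset(sim, insertion == 1),
                     cor(methylation, expr_level))
c(all = cor_all, carriers = cor_carriers)
```

In this toy setup the association is carried entirely by the carriers, so the correlation across all individuals should come out weaker than the one within carriers.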

Third, maybe that is actually how a DNA methylation variant might escape reprogramming. Some transposable elements are among the sequences that are not demethylated after fertilisation. If that effect also applies to a newly inserted copy of the transposable element, our hypothetical regulatory methylation difference might be preserved through meiosis that way.


Gibbs, J. R., van der Brug, M. P., Hernandez, D. G., Traynor, B. J., Nalls, M. A., Lai, S. L., … & Singleton, A. B. (2010). Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain. PLoS Genetics, 6(5), e1000952.

Morgan, H. D., Sutherland, H. G., Martin, D. I., & Whitelaw, E. (1999). Epigenetic inheritance at the agouti locus in the mouse. Nature Genetics, 23(3), 314-318.

Vasicek, T. J., Zeng, L. I., Guan, X. J., Zhang, T., Costantini, F., & Tilghman, S. M. (1997). Two dominant mutations in the mouse fused gene are the result of transposon insertions. Genetics, 147(2), 777-786.

Druker, R., Bruxner, T. J., Lehrbach, N. J., & Whitelaw, E. (2004). Complex patterns of transcription at the insertion site of a retrotransposon in the mouse. Nucleic Acids Research, 32(19), 5800-5808.

A note on using R: Residuals from a linear model with missing values

(This is something I might do now and then: post a small practical note about something I’ve found while working with some tool, usually a computer-based one. It is the kind of thing I would have wanted to find when I googled the problem, and an attempt to help a fellow user who’s trying to google his or her way to a solution.)

Occasionally when analysing data, you feel the need to pull out the residuals from a linear model — e.g. when trying to control for a bunch of covariates. In R, you can do this very easily with the residuals() function. This works fine with no NAs:

> data(BOD)
> BOD
   Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8
> residuals(lm(demand ~ Time, data=BOD))
         1          2          3          4          5          6
-1.9428571 -1.6642857  5.3142857  0.5928571 -1.5285714 -0.7714286

However, there’s a slight difficulty when there are NAs in the data. If you assume the residuals will have the same dimensions and order of elements as the input data, your stuff might break.

> BOD$demand[5] <- NA
> residuals(lm(demand ~ Time, data=BOD))
         1          2          3          4          6
-1.9716981 -1.8084906  5.0547170  0.2179245 -1.4924528
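As a small sketch of how this can bite, here is what happens if you try to put the shortened residual vector back into the data frame. It assumes R's default options, where lm() drops incomplete rows; the names fit, res and assign_result are only there for the illustration.

```r
# Reload the built-in BOD data and blank out one value.
data(BOD)
BOD$demand[5] <- NA

# With default options, lm() drops the incomplete row before fitting.
fit <- lm(demand ~ Time, data = BOD)
res <- residuals(fit)

length(res)   # number of residuals
nrow(BOD)     # number of rows in the data

# Assigning the shorter vector as a new column raises an error;
# it is caught here so the script keeps running.
assign_result <- tryCatch({
  BOD$res <- res
  "assigned"
}, error = function(e) "error")
assign_result
```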

I used to rely on a small work-around based on the fact that residuals() keeps the row names of the original data as names in the residual vector. Then I found that you can get the desired behaviour (what I usually want is a vector of the same length as the input, where NA data points give NA residual values) by simply putting in an na.action argument, like so:

> residuals(lm(demand ~ Time, data=BOD, na.action=na.exclude))
         1          2          3          4          5          6
-1.9716981 -1.8084906  5.0547170  0.2179245         NA -1.4924528
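For comparison, the old work-around mentioned above could look something like the sketch below. This is one way to do it, not necessarily the only or best one: it matches residuals to rows by name, and then checks that the result agrees with what na.exclude gives. The object names (fit_omit, padded, fit_excl and so on) are made up for the example.

```r
# Reload the built-in BOD data and blank out one value.
data(BOD)
BOD$demand[5] <- NA

# Work-around: fit with the default na.action, then place the residuals
# into an NA-filled vector by matching their names to the row names.
fit_omit <- lm(demand ~ Time, data = BOD)
res_omit <- residuals(fit_omit)
padded <- setNames(rep(NA_real_, nrow(BOD)), rownames(BOD))
padded[names(res_omit)] <- res_omit

# Built-in route: na.exclude pads the residuals with NA for us.
fit_excl <- lm(demand ~ Time, data = BOD, na.action = na.exclude)
res_excl <- residuals(fit_excl)

# Compare the two results, ignoring names.
all.equal(unname(padded), unname(res_excl))

# With one residual per row, storing them as a column works.
BOD$res <- res_excl
```

The name matching relies on the row names of the data being unique, which is the case for an ordinary data frame.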

Sometimes, R is pretty neat.