Morning coffee: multilevel drift


There is an abstract account of natural selection (Lewontin 1970) which observes that any population of entities, whatever they may be, will evolve through natural selection if there is (1) variation that (2) affects reproductive success and (3) is heritable.

I don’t know how I missed this before, but it recently occurred to me that there must be a similarly abstract account of drift, where a population will evolve through drift if there is (1) variation, (2) that is heritable, and (3) sampling due to finite population size.

Drift may not be negligible, especially since population sizes should be smaller at higher levels of organization, making natural selection relatively less efficient there.
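The conditions above can be illustrated with a minimal Wright–Fisher-style sketch (the population sizes are made-up example values): a neutral allele in a finite population changes frequency purely because each generation is a finite sample of the previous one, and the smaller the population, the stronger the sampling effect.

```python
# Minimal sketch of drift: a neutral allele in a finite population,
# where the only evolutionary force is binomial sampling of the
# 2n gene copies each generation. No selection, no mutation.
import random

def drift(p0, n, generations, seed=1):
    """Track allele frequency under pure drift."""
    random.seed(seed)
    p = p0
    freqs = [p]
    for _ in range(generations):
        # Each of the 2n gene copies in the next generation is drawn
        # independently from the current allele frequency.
        copies = sum(1 for _ in range(2 * n) if random.random() < p)
        p = copies / (2 * n)
        freqs.append(p)
    return freqs

small = drift(0.5, n=10, generations=100)     # drifts quickly
large = drift(0.5, n=10000, generations=100)  # stays near 0.5 much longer
```

Running this with different seeds shows the qualitative point: the small population wanders (and often fixes or loses the allele), while the large one barely moves.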

Morning coffee: against validation and optimization


It appears that I’m accumulating pet peeves at an alarming rate. In all probability, I am guilty of most of them myself, but that is no reason not to complain about them on the internet. For example: spend some time in a genetics lab, and you will probably hear talk of ”validation” and ”optimization”. But those things rarely happen in a lab.

According to a dictionary, to ”optimize” means to make something as good as possible. That is almost never possible, nor desirable. What we really do is change things until they work according to some accepted standard. That is not optimization; that is tweaking.

To ”validate” means to confirm that something is true, which is rarely possible. Occasionally we have something to compare to that we are really sure about, so that if a method agrees with it, we can be pretty certain that it works. But a lot of the time, we don’t know the answer. The best we can do is to gather additional evidence.

Additional evidence, ideally from some other method with very different assumptions, is great. So is adjusting a protocol until it performs sufficiently well. So why not just say what we mean?

”You keep using that word. I do not think that it means what you think it means.”

Morning coffee: reviewing


(It was a long time since I did one of these posts. I’d better get going!)

One fun thing that happened after I received my PhD is that I started getting requests to review papers, four so far. Four papers (plus re-reviews of revised versions) in about a year probably isn’t that much, but it is strictly greater than zero. I’m sure the entertainment value in reviewing wears off quite fast, but so far it’s been fun, and feels good to pay off some of the sizeable review debt I’ve accumulated while publishing papers from my PhD. Maybe I’m just too naïve and haven’t seen the worst parts of the system yet, but I don’t feel that I’ve had any upsetting revelations from seeing the process from the reviewer’s perspective.

Of course, peer review, like any human endeavour, has components of politics, ego and irrationality. Maybe one could do more to quell those tendencies. I note that different journals have quite different instructions to reviewers. Some provide detailed directions, laying out things that the reviewer should and shouldn’t do, while others just tell you how to use their web form. I’m sure editorial practices also differ.

One thing that did surprise me was when an editor changed the text of a review I wrote. It was nothing major, not a case of removing something inappropriate, but rewording a recommendation to make it stronger. I don’t mind, but I feel that the edit changed the tone of the review. I’ve also heard that this particular kind of comment (when a reviewer states that something is required for a paper to be acceptable for publication) rubs some people the wrong way, because that is up to the editor to decide. In this case, the editor must have felt that a more strongly worded review was the best way to get the author to pay attention, or something like that. I wonder how often this happens. That may be a reason to be even more apprehensive about signing reviews (I did not sign).

So far, I’ve never experienced anything other than single-blind review, but I would be curious to review double-blind. I doubt the process would differ much: I haven’t reviewed any papers from people I know, and I haven’t spent any time trying to learn more about the authors, except in some cases checking out previous work that they’ve referenced. I don’t expect that I’d feel any urge to undertake search-engine detective work to figure out who the authors were.

Sometimes there is a tendency among scientists and non-scientists alike to elevate peer review to something more than a couple of colleagues reading your paper and commenting on it. I’m pretty convinced peer review and editorial comments improve papers. As such, the fact that a paper has been accepted by an editor after being reviewed is some evidence of quality. But peer review cannot be a guarantee of correctness. I’m sure I’ve missed and misunderstood things. Still, I promise that I’ll do my best, and my conscience will not let me turn down a request for peer review for a long time. So if you need a reviewer for a paper on domestication, genetic mapping, chickens or related topics, keep me in mind.

Morning coffee: cost per genome

I recently heard this graph referred to as ”the most overused slide in genomics” (David Klevebring). It might be: it shows an estimate of the cost of sequencing a human genome over time, and how that cost plummets around 2008. Before that point, the curve reflects Sanger sequencing; after it, second-generation sequencing (454, Illumina and SOLiD).


The source is the US National Human Genome Research Institute, and they’ve put some thought into how to estimate costs so that machines, reagents, analysis and people to do the work are included and that the different platforms are somewhat comparable. One must first point out that downstream analysis to make any sense of the data (assembly and variant calling) isn’t included. But the most important thing that this graph hides, even if the estimates of the cost would be perfect, is that to ”sequence a genome” means something completely different in 2001 and 2015. (Well, with third generation sequencers that give long reads coming up, the old meaning might come back.)

For data since January 2008 (representing data generated using ‘second-generation’ sequencing platforms), the ”Cost per Genome” graph reflects projects involving the ‘re-sequencing’ of the human genome, where an available reference human genome sequence is available to serve as a backbone for downstream data analyses.

The human genome project was of course about sequencing and assembling the genome into high-quality sequences. Very few of the millions of human genomes resequenced since are anywhere close. As people in the sequencing loop know, resequencing with short reads doesn’t give you a genome sequence (and neither does trying to assemble a messy eukaryote genome with short reads only). It gives you a list of variants compared to the reference sequence. The usual short-read business has no way of detecting anything but single nucleotide variants and small indels. (And the latter depends … Also, you can detect copy number variants, but large-scale structural variants are mostly off the table.) Of course, you can use these edits to reconstruct a consensus sequence from the reference, but it would be a total lie.
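To make the point concrete, here is a toy sketch of what ”reconstructing a consensus” from resequencing amounts to: the reference with a list of edits applied. The (position, ref, alt) tuples are a made-up minimal stand-in for a variant call format, not a real VCF parser, and positions are 0-based for simplicity.

```python
# Toy consensus reconstruction: apply a list of SNVs and small indels
# (the only variant classes the usual short-read pipeline reliably
# detects) to a reference sequence.
def apply_variants(reference, variants):
    """Apply (position, ref_allele, alt_allele) edits, 0-based positions."""
    pieces = []
    i = 0
    # Sort by position so edits are applied left to right.
    for pos, ref, alt in sorted(variants):
        pieces.append(reference[i:pos])  # unchanged stretch
        assert reference[pos:pos + len(ref)] == ref, "ref allele mismatch"
        pieces.append(alt)               # substituted / inserted bases
        i = pos + len(ref)
    pieces.append(reference[i:])
    return "".join(pieces)

reference = "ACGTACGTAC"
variants = [(1, "C", "T"),     # SNV: C -> T at position 1
            (4, "A", "AGG")]   # small insertion: GG after the A
consensus = apply_variants(reference, variants)
# consensus == "ATGTAGGCGTAC"
```

Everything the short reads couldn’t see, i.e. most structural variation, is silently inherited from the reference, which is exactly why calling the result ”a genome sequence” is a stretch.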

Again, none of this is news for people who deal with sequencing, and I’m not knocking second-generation sequencing. It’s very useful and has made a lot of new things possible. It’s just something I think about every time I see that slide.

Morning coffee: ”epigenetics” is also ambiguous


I believe there is an analogy between the dual meaning of the word ”gene” and two senses of ”epigenetics”; the distinction is easy to get wrong, and it contributes to the confusion about the meaning of epigenetics. ”Gene” can mean a sequence that has a name and a function, or it can mean a genetic variant. I sometimes, half-jokingly, call these genetics(1) and genetics(2). The order is wrong from a historical perspective, since the study of heritable variation predates the discovery of molecular genes. The first deals with the function of sequences and their products; the second deals with differences between individuals carrying different variants.

The same can be said about epigenetics. On one hand there is epigenetics(1), aiming to understand the normal function of certain molecular features, i.e. gene regulatory states that can be passed on through cell division. On the other hand, epigenetics(2) aims to explain variation between individuals that differ not in their DNA sequence but in other kinds of heritable states. And the recurring reader knows that I think that, since a lot of genetics(2) makes no assumptions about the molecular nature of the variation it studies, it will mostly work even if some of these states turn out to be epigenetic. In that sense, epigenetics(2) is a part of genetics.

Also: the spectre of epigenetic inheritance

What is it that is so scandalous about epigenetic inheritance? Not much, in my opinion. Some of the points on the spectrum clearly happen in the wild: stable and fluctuating epigenetic inheritance in plants, parental effects in animals and genomic imprinting in both. Widespread epigenetic inheritance in animals would change a lot of things, of course, but even if epigenetic inheritance turns out to be really important and common, genetics and evolution as we know them will not break. The tools to study and understand them are there.

Looking back at the post from yesterday, there are different flavours of epigenetic inheritance. At the most heritable end of the spectrum, epigenetic variants behave pretty much like genetic variants. Because quantitative genetics is agnostic to the molecular nature of the variants, as long as they behave like an inheritance system, most high-level genetic analysis will work the same. It’s just that on the molecular level, one would have to look to epigenetic marks, not to sequence changes, for the causal variant. Even if a substantial proportion of the genetic variance is caused by epigenetic variants rather than DNA sequence variants, this would not be a revolution that changes genetics or evolution into something incommensurable with previous thought.

The most revolutionary potential lies somewhere in the middle of the scale, in parental effects with really high fidelity of transmission that are potentially responsive to the environment, but in principle these things can still be dealt with by the same theoretical tools. Most people just didn’t think they were that important. How about soft inheritance? It seems dramatic, but all examples deal with specific programmed mechanisms: soft inheritance of the sensitivity to a particular odour or of the DNA methylation and expression state of a particular locus. No-one has yet suggested a generalised Lamarckian mechanism; that is still out of the question. DNA mutations are still unable to pass from somatic cells to gametes. Whatever tricks transgenerational mechanisms use to skip over the soma–germline distinction, they must be pretty exceptional. Discoveries of widespread soft inheritance in nature would be surprising, a cause for rethinking certain things and great fun. But conceptually, it is parental effects writ large. We can understand that. We have the technology.

Morning coffee: the spectrum of epigenetic inheritance


Let us think aloud about the different possible meanings of epigenetic inheritance. I don’t want to contribute to unnecessary proliferation of terminology — people have already coined molar/molecular epigenetics (Crews 2009), intergenerational/transgenerational effects (Heard & Martienssen 2014), and probably several more dichotomies. But I thought it could be instructive to try to think about epigenetic inheritance in terms of the contribution it could make to variance components of a quantitative genetic model. After all, quantitative genetics is mostly agnostic about the molecular nature of the heritable variation.

At one end of the spectrum we find molecular epigenetic marks such as DNA methylation, as they feature in the normal development of the organism. Regardless of how faithfully they are transmitted through mitosis, or even if they pass through meiosis, they only contribute to individual variation if they are perturbed in different ways between individuals. If they do vary between individuals, though, in a fashion that is not passed on to the offspring, they will end up in the environmental variance component.

What about transmissible variation? There are multiple non-genetic ways for information to be passed on for a single generation: maternal or paternal effects need not be epigenetic in the molecular sense. They could be, like genomic imprinting, but they could also be caused by some biomolecule in the sperm, something that passes the blood–placenta barrier or something deposited by the mother into the egg. Transgenerational effects of this kind make related individuals more similar, so they will affect the genetic variance component unless they are controlled for. In the best possible world of experimental design, parental effects can be controlled and modelled, and we can in principle separate out the maternal, paternal and genetic components. Think of effects like those in Weaver & al (2004) that are perpetuated by maternal behaviour. If the behavioural transmission is strong enough, they might form a pretty stable heritable effect that would appear in the genetic variance component if it’s not broken up by cross-fostering.

However, if the variation behaves like germ-line variation it will be irreversible by cross-fostering, inseparable from the genetic variance component, and it will have the potential to form a genuine parallel inheritance system. The question is: how stable will it be? Animals seem to be very good at resetting the epigenetic germline each generation. The most provocative suggestion is probably some type of variation that is both faithfully transmitted and sometimes responsive to the environment. Responsiveness means less fidelity of transmission, though, and it seems (Slatkin 2009) like epigenetic variants need to be stable for many generations to make any lasting impact on heritability. Then, at the heritable end of the spectrum, we find epigenetic variants that arise from some type of random mutation event and are transmitted faithfully through the germline. If they exist, they will behave just like any genetic variants and even have a genomic locus.
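The stability argument can be put in back-of-the-envelope form. Assuming an epiallele reverts (is reset) with some fixed probability per generation, the chance that it is transmitted intact over many generations shrinks geometrically; the reset rates below are made-up example values, not estimates from any study.

```python
# Back-of-the-envelope persistence of an epiallele: if it resets with
# probability v per generation, it survives t generations unreset with
# probability (1 - v)**t.
def persistence(reset_rate, generations):
    """Probability an epiallele survives unreset for t generations."""
    return (1.0 - reset_rate) ** generations

# A very stable epiallele mostly survives 100 generations...
stable = persistence(0.001, 100)   # about 0.90
# ...while a responsive, frequently reset one is almost always erased.
labile = persistence(0.1, 100)     # well under 0.001
```

This is the tension in a nutshell: responsiveness to the environment means a high reset rate, and a high reset rate means the variant is too short-lived to contribute much lasting heritability.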

Morning coffee: short papers


I’m going to quote in full the ”methods summary” from the latest FTO/IRX3 Nature paper (Smemo & al 2014); I think the paper is great and I’m only using it as an example of the format of Nature letters:

For 4C-seq, chromatin was digested with DpnII and Csp6I. Captured DNA was amplified using promoter-specific primers and deep sequenced.
For 3C, nuclei were digested with HindIII. Primer quality was assessed using serial dilutions of BACs encompassing the regions of interest (RP23-268O10, RP23-96F3). The average of four independent experiments is represented graphically (Extended data Fig 2c).

For anyone who hasn’t read the paper: it is not only about chromatin capture. The results rely on gene expression, reporter experiments in mice, phenotyping of knockout mice etc etc etc.

I read quite a lot of Nature and Science papers. Yes, one should be critical of the role of glamorous journals, but they publish a lot of things that are either interesting to me or get a lot of media attention. But the papers are not really papers, are they? They are too short to fit all the details, even with the additional online methods part. What does it say about a journal that it forces authors to cut out the methods, the most important section for judging the reliability of the results? What does it say about me as a reader that I often don’t bother going to the supplementary material to read them? It’s not very flattering for any of us, I’m sure. If you suffer from such a shortage of paper that you have to remove a section of each article and hide it online, it should be the discussion, not the methods.

Literature

Smemo & al (2014) Obesity-associated variants within FTO form long-range functional connections with IRX3. Nature.

Morning coffee: scripting language

Several people have asked: what scripting language should biologists learn if they are interested in doing a little larger-scale data analysis and have never programmed before? I’m not an expert, but these are the kinds of things I tend to say:

The language is not so important; the same principles apply everywhere. Use what your friends and colleagues use so you can get help from them. I believe most people would answer Python. I would answer R. Don’t believe people who tell you that R is not a serious language. You’re already familiar with analysing small datasets in a statistics program. You can do that in R too, and then the step to writing code and handling larger projects is actually very short. Your data will very likely come in tables, and R is very good at that. You’ll also want pretty graphs, and R is very good at that too. Regardless, have a look at the other common languages as well. Practice working from a terminal.

Morning coffee: the selfish gene versus the world


The distinction between ”gene” in the sense of an allele at some locus and ”gene” in the sense of a DNA sequence with a name and some function seems easy enough, but it still causes a lot of confusion, both in the popular and the scientific literature.

This was very clear a few months ago when science journalist David Dobbs published his ”Die, selfish gene, die” and a few weeks of debate broke out. In my opinion it’s not a particularly good piece, but I agree with Dobbs that the ”selfish gene” metaphor sometimes invites misunderstandings. The article itself displays a few of them, when it suggests that evolution and genetics as understood before the age of microarrays are somehow at odds with the importance of gene regulation or phenotypic plasticity. I suspect that many of these problems stem from the double meaning of the word ”gene”. Other examples are found in headlines claiming that researchers have found the gene for something, or in the confusion about the word pleiotropy (Paaby & Rockman 2012).

When Dawkins wrote about the selfish gene, he did not mean the selfish DNA sequence encoding a protein; he meant the selfish genetic variant causing differences in fitness between individuals. (Or rather, a set of genetic variants in sufficiently close linkage to seldom be separated by recombination.) The book is not about molecular genes. As anyone who actually read it knows, it deals mostly with behaviour using game theory approaches. This does not mean that Dawkins denied that there are actual molecular genes doing the mechanistic work, but that he analysed the situation mostly on a different level. And had he chosen to write only about known sequence variants with adaptive effects on behaviour, it would have been a very short book.

Of course the word ”selfish”, while I agree that it is the proper word in the sense that Dawkins intended, is great for those who want to point to instances where people are horrible to each other and tell you that it’s all because of evolution. But I think that is a bigger issue that will not be solved by tweaking popular science metaphors. By the way, that is completely contrary to Dawkins’ intentions, which were to popularise the evolutionary models that explain why animals are not always horrible to each other, even though their behaviour is shaped by natural selection.