Adrian Bird on genome ecology

I recently read this essay by Adrian Bird on ”The Selfishness of Law-Abiding Genes”. That is a colourful title in itself, but it doesn’t stop there; this is an extremely metaphor-rich piece. In terms of the theoretical content, there is not much new under the sun. Properties of the organism like complexity, redundancy, and all those exquisite networks of developmental gene regulation may be the result of non-adaptive processes, like constructive neutral evolution and intragenomic conflict. As the title suggests, Bird argues that this kind of thinking is generally accepted about things like transposable elements (”selfish DNA”), but that the same logic applies to regular ”law-abiding” genes. They may also be driven by other evolutionary forces than a net fitness gain at the organismal level.

He gives a couple of possible examples: toxin–antitoxin gene pairs, RNA editing and MeCP2 (probably Bird’s favourite protein, and one that he has done a lot of work on). He gives this possible description of MeCP2 evolution:

Loss of MeCP2 via mutation in humans leads to serious defects in the brain, which might suggest that MeCP2 is a fundamental regulator of nervous system development. Evolutionary considerations question this view, however, as most animals have nervous systems, but only vertebrates, which account for a small proportion of the animal kingdom, have MeCP2. This protein therefore appears to be a late arrival in evolutionary terms, rather than being a core ancestral component of brain assembly. A conventional view of MeCP2 function is that by exerting global transcriptional restraint it tunes gene expression in neurons to optimize their identity, but it is also possible to devise a scenario based on self-interest. Initially, the argument goes, MeCP2 was present at low levels, as it is in non-neuronal tissues, and therefore played little or no role in creating an optimal nervous system. Because DNA methylation is sparse in the great majority of the genome, sporadic mutations that led to mildly increased MeCP2 expression would have had a minimal dampening effect on transcription that may initially have been selectively neutral. If not eliminated by drift, further chance increases might have followed, with neuronal development incrementally adjusting to each minor hike in MeCP2-mediated repression through compensatory mutations in other genes. Mechanisms that lead to ‘constructive neutral evolution’ of this kind have been proposed. Gradually, brain development would accommodate the encroachment of MeCP2 until it became an essential feature. So, in response to the question ‘why do brains need MeCP2?’, the answer under this speculative scenario would be: ‘they do not; MeCP2 has made itself indispensable by stealth’.

I think this is a great passage, and it can be read both as a metaphorical reinterpretation and as a substantive hypothesis. The empirical question, ”Did MeCP2 offer an important innovation to vertebrate brains as it arose?”, is a bit hard to answer with data, though. On the other hand, if we just consider the metaphor, can’t you say the same about every functional protein? Sure, it’s nice to think of p53 as the Guardian of the Genome, but can’t it also be viewed as a gangster extracting protection money from the organism? ”Replicate me, or you might get cancer later …”

The piece argues for a gene-centric view that thinks in terms of molecules and the evolutionary pressures they face. This doesn’t seem to be the fashionable view (sorry, extended synthesists!) but Bird argues that it would be healthy for molecular cell biologists to think more about the alternative, non-adaptive, bottom-up perspective. I don’t think the point is to advocate that way of thinking to the exclusion of all others. To me, the piece reads more like an invitation to use a broader set of metaphors and verbal models to aid hypothesis generation.

There are too many good quotes in this essay, so I’ll just quote one more from the end, where we’ve jumped from the idea of selfish law-abiding genes, over ”genome ecology” (not in the sense of using genomics in ecology, but in the sense of thinking of the genome as some kind of population of agents with different niches and interactions, I guess) to ”Genetics Meets Sociology?”

Biologists often invoke parallels between molecular processes of life and computer logic, but a gene-centered approach suggests that economics or social science may be a more appropriate model …

I feel like there is a circle of reinforcing metaphors here. Sometimes when we have to explain how something came to be, for example a document, a piece of computer code or the way we do things in an organisation, we say ”it grew organically” or ”it evolved”. Sometimes we talk about the genome as a computer program, and sometimes we talk about our messy computer program code as an organism. Like viruses are just like computer viruses, only biological.

Literature

Bird, Adrian. ”The Selfishness of Law-Abiding Genes.” Trends in Genetics 36.1 (2020): 8-13.

Journal club of one: ”Evolutionary stalling and a limit on the power of natural selection to improve a cellular module”

This is a relatively recent preprint on how correlations between genetic variants can limit the response to selection, with experimental evolution in bacteria.

Experimental evolution and selection experiments live on the gradient from modelling to observations in the wild. Experimental evolution researchers can design the environments and the genotypes to pose problems for evolution, and then watch in real time as organisms solve them. With sequencing, they can also watch how the genome responds to selection.

In this case, the problem posed is how to improve a particular cellular function (”module”). The researchers started out with engineered Escherichia coli that had one component of their translation machinery manipulated: the bacteria carried only one copy of an elongation factor gene (where E. coli normally has two), which could be either from another species, a reconstructed ancestral form, or the regular E. coli gene as a control.

Then, they sequenced samples from replicate populations over time, and looked for potentially adaptive variants: that is, operationally, variants that had large changes in frequency (>20%) and occurred in genes that had more than one candidate adaptive variant.
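To make the operational definition concrete, here is a minimal sketch of such a two-step filter. It is not the authors' pipeline: the record layout (the `gene` and `freqs` keys), the toy data, and the reading of ”change in frequency” as the maximum minus the minimum of a trajectory are all assumptions made for illustration.

```python
from collections import Counter

def candidate_adaptive_variants(variants, min_change=0.20):
    """Return the variants that pass both operational criteria.

    `variants` is a list of dicts with hypothetical keys:
    'gene' (str) and 'freqs' (frequencies over time, between 0 and 1).
    """
    # Criterion 1: a large change in frequency over the time course.
    # Using max - min as the "change" is an assumption of this sketch.
    large = [v for v in variants
             if max(v["freqs"]) - min(v["freqs"]) > min_change]
    # Criterion 2: the gene carries more than one such candidate variant.
    per_gene = Counter(v["gene"] for v in large)
    return [v for v in large if per_gene[v["gene"]] > 1]

# Toy data, invented for illustration only.
toy = [
    {"id": "v1", "gene": "geneA", "freqs": [0.0, 0.3, 0.9]},
    {"id": "v2", "gene": "geneA", "freqs": [0.0, 0.1, 0.5]},
    {"id": "v3", "gene": "geneB", "freqs": [0.0, 0.4, 1.0]},   # only candidate in geneB
    {"id": "v4", "gene": "geneC", "freqs": [0.0, 0.05, 0.1]},  # change too small
    {"id": "v5", "gene": "geneC", "freqs": [0.0, 0.02, 0.6]},
]
selected_ids = [v["id"] for v in candidate_adaptive_variants(toy)]
print(selected_ids)
```

The second criterion works as a crude guard against hitchhikers: a single variant that sweeps in one gene could be a passenger, while repeated hits in the same gene are harder to explain without selection.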

Finally, they looked at what genes these variants occurred in: were they related to translation (”TM-specific” as they call it) or not (”generic”)? That gave them trajectories of potentially adaptive variants like this. The horizontal axis is time and the vertical axis is the frequency of the variant. The letters are populations of different origin, and the numbers replicates thereof. The colour shows the classification of variants. (”fimD” and ”trkH” in the figure are genes in the ”generic” category that are of special interest for other reasons. The orange–brown shading signifies structural variation at the elongation factor gene.)

This figure shows their main observations:

  • V, A and P populations had more adaptive variants in translation genes, and also worse fitness at the start of the experiment. This goes together with improving more during the experiment. If a population has poor translation, a variant in a translation gene might help. If it has decent translation efficiency already, there is less scope for improvement, and adaptive variants in other kinds of genes happen more often.

    We found that populations whose TMs were initially mildly perturbed (incurring ≲ 3% fitness cost) adapted by acquiring mutations that did not directly affect the TM. Populations whose TM had a moderately severe defect (incurring ~19% fitness cost) discovered TM-specific mutations, but clonal interference often prevented their fixation. Populations whose TMs were initially severely perturbed (incurring ~35% fitness cost) rapidly discovered and fixed TM-specific beneficial mutations.

  • Adaptive variants in translation genes tended to increase fast and early during the experiment and often got fixed, suggesting that they have larger effects than generic variants. Again, if your translation capability is badly broken, a large-effect variant in a translation gene might help.

    Out of the 14 TM-specific mutations that eventually fixed, 12 (86%) did so in the first selective sweep. As a result, an average TM-specific beneficial mutation reached fixation by generation 300 ± 52, and only one (7%) reached fixation after generation 600 … In contrast, the average fixation time of generic mutations in the V, A and P populations was 600 ± 72 generations, and 9 of them (56%) fixed after the first selective sweep

  • After one adaptive variant in a translation gene, adaptation in translation seems to stop at that.

The question is: when there aren’t further adaptive variants in translation genes, is that because it’s impossible to improve translation any further, or because of interference from other variants? They use the term ”evolutionary stalling”, a kind of asexual linked selection. Because variants occur together, selection acts on the net effect of all the variants in an individual. Adaptation in a certain process (in this case translation) might stall, if there are large-effect adaptive variants in other, potentially completely unrelated processes, that swamp the effect on translation.
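The swamping effect can be sketched with a deterministic toy model of competing asexual lineages. This is not the model from the preprint: the lineage names, starting frequencies and fitness values are hypothetical, and the sketch ignores drift and new mutations. One lineage carries a small-effect translation variant, another a large-effect variant in an unrelated gene, and each lineage changes in proportion to its fitness relative to the population mean.

```python
def compete(freqs, fitness, generations):
    """Deterministic selection among asexual lineages (no drift, no new mutations)."""
    history = [dict(freqs)]
    for _ in range(generations):
        # Mean fitness of the population in this generation.
        mean_w = sum(freqs[k] * fitness[k] for k in freqs)
        # Each lineage changes in proportion to its relative fitness.
        freqs = {k: freqs[k] * fitness[k] / mean_w for k in freqs}
        history.append(freqs)
    return history

# Hypothetical values, chosen only for illustration.
start = {"ancestor": 0.98, "tm_small": 0.01, "generic_large": 0.01}
fitness = {"ancestor": 1.00, "tm_small": 1.02, "generic_large": 1.10}

history = compete(start, fitness, generations=300)
tm_peak = max(h["tm_small"] for h in history)
final = history[-1]
print(f"tm_small peak: {tm_peak:.4f}, final: {final['tm_small']:.2e}")
print(f"generic_large final: {final['generic_large']:.4f}")
```

In this toy setting the translation variant is beneficial and rises at first, but once the large-effect lineage pushes mean fitness above 1.02 it declines and is lost, even though translation could still have been improved.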

They argue for three kinds of indirect evidence that the adaptation in translation has stalled in at least some of the populations:

  1. Some of the replicate populations of V didn’t fix adaptive translation variants.
  2. In some populations, there was a second adaptive translation variant, not yet fixed.
  3. There have been adaptive translation mutations in the Long Term Evolution Experiment, which is based on E. coli with unmanipulated translation machinery.

Stalling depends on large-effect variants, but after they have fixed, adaptation might resume. They use the metaphor of natural selection ”shifting focus”. The two non-translation genes singled out in the above figure might be examples of that:

While we did not observe resumption of adaptive evolution in [translation] during the duration of this experiment, we find evidence for a transition from stalling to adaptation in trkH and fimD genes. Mutations in these two genes appear to be beneficial in all our genetic backgrounds (Figure 4). These mutations are among the earliest to arise and fix in E, S and Y populations where the TM does not adapt … In contrast, mutations in trkH and fimD arise in A and P populations much later, typically following fixations of TM-specific mutations … In other words, natural selection in these populations is initially largely focused on improving the TM, while adaptation in trkH and fimD is stalled. After a TM-specific mutation is fixed, the focus of natural selection shifts away from the TM to other modules, including trkH and fimD.

This is all rather indirect, but interesting. Both ”the focus of natural selection shifting” and ”coupling of modules by the emergent neutrality threshold” are inspiring ways to think about the evolution of genetic architecture, and new to me.

Literature

Venkataram, Sandeep, et al. ”Evolutionary Stalling and a Limit on the Power of Natural Selection to Improve a Cellular Module.” bioRxiv (2019): 850644.

Genes do not form networks

As a wide-eyed PhD student, I read a lot of papers about gene expression networks and was mightily impressed by their power. You can see where this is going, can’t you?

Someone on Twitter talked about their doubts about gene networks: how networks ”must” be how biology works, but that they weren’t sure that network methods actually had helped genetics that much, how there are compelling annotation term enrichments, and individual results that ”make sense”, but not many hard predictions. I promise I’m not trying to gossip about them behind their back, but I couldn’t find the tweets again. If you think about it, however, I don’t think genes must form networks at all, quite the opposite. But there are probably reasons why the network idea is so attractive.

(Edit: Here is the tweet I was talking about by Jeffrey Barrett! Thanks to Guillaume Devailly for pointing me to it.)

First, network representations are handy! There are all kinds of things about genes that can be represented as networks: coexpression, protein interactions, being mentioned in the same PubMed abstract, working on the same substrate, being annotated by the same GO term, being linked in a database such as STRING which tries to combine all kinds of protein–protein interactions understood broadly (Szklarczyk & al 2018), differential coexpression, co-differential expression (Hudson, Reverter & Dalrymple 2009), … There are all kinds of ways of building networks between genes: correlations, mutual information, Bayesian networks, structural equation models … Sometimes one of them will make an interesting biological phenomenon stand out and become striking to the eye, or to one of the many ways to cluster nodes and calculate their centrality.
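As a minimal sketch of the simplest of these constructions, here is a correlation-based coexpression network: compute the Pearson correlation for every pair of genes and draw an edge when its absolute value passes a threshold. The gene names, expression values and thresholds are invented for illustration, and the function names are hypothetical rather than taken from any package.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_edges(expr, threshold=0.8):
    """Return (gene1, gene2, r) for gene pairs with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges

# Toy expression values across five samples, invented for illustration.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],  # tracks geneA closely
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],  # weakly related to both
}
strict = coexpression_edges(expr, threshold=0.8)
lenient = coexpression_edges(expr, threshold=0.25)
print([(g1, g2) for g1, g2, _ in strict])
print([(g1, g2) for g1, g2, _ in lenient])
```

With the strict threshold only the geneA–geneB pair is connected; with the lenient one, all three genes are. The same data give different graphs depending on an analysis choice, which is part of why the representation is so flexible.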

Second, networks are appealing. Brigitte Nerlich has this great blog post, ”On books, circuits and life”, about metaphors for gene editing (the book of life, writing, erasing, cutting and editing) and systems biology (genetic engineering, circuits, wiring, the genetic program). Maybe the view of gene networks fits into the latter category, if we imagine that the extremely dated analogy with cybernetics (Peluffo 2015) has been replaced with the only slightly dated idea of a universal network science. After the Internet and Albert, Jeong & Barabási (1999), what could be more apt than understanding genes as forming networks?

I think it’s fair to say that for genes to form networks, the system needs to be reasonably well described by a graph of nodes and edges. If you look at systems of genes that are really well understood, like the gap gene ”network”, you will see that they do not look like this at all. Look at Fig 3 in Jaeger (2011). Here, there is dynamic and spatial information not captured by the network topology that needs to be overlaid for the network view to make sense.

Or look at insulin signalling, in Fig 1 of Nyman et al (2014). Here, there are modified versions of proteins, non-gene products such as glucose and the plasma membrane, and again, dynamics, including both RNA and protein synthesis themselves. There is no justification for assuming that any of that will be captured by any topology or any weighting of genes with edges between them.

We are free to name biological processes networks if we want to; there’s nothing wrong with calling a certain process and group of related genes ”the gap gene network”. And we are free to use any network representation we want when it is useful or visually pleasing, if that’s what we’re going for. However, genes do not actually form networks.

Literature

Szklarczyk, D, et al. (2018) STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic acids research.

Hudson, N. J., Reverter, A., & Dalrymple, B. P. (2009). A differential wiring analysis of expression data correctly identifies the gene containing the causal mutation. PLoS computational biology, 5(5), e1000382.

Peluffo, A. E. (2015). The ”Genetic Program”: behind the genesis of an influential metaphor. Genetics, 200(3), 685-696.

Albert, R., Jeong, H., & Barabási, A. L. (1999). Diameter of the world-wide web. Nature, 401(6749), 130.

Jaeger, J. (2011). The gap gene network. Cellular and Molecular Life Sciences, 68(2), 243-274.

Nyman, E., Rajan, M. R., Fagerholm, S., Brännmark, C., Cedersund, G., & Strålfors, P. (2014). A single mechanism can explain network-wide insulin resistance in adipocytes from obese patients with type 2 diabetes. Journal of Biological Chemistry, 289(48), 33215-33230.