# Peer review glossary

‘Misleading’ — not exactly as I would have written it

‘Somewhat confusing’ — using terminology from adjacent sub-subfield

‘Confusing’ — completely illegible

‘Poorly structured’ — not exactly as I would have written it

‘Conversational’ — in need of adjectives

‘Descriptive’ — using technology that isn’t fashionable anymore

‘Potentially’ — definitely

‘by a native English speaker’ — by the Microsoft Word spell checker

‘due to insufficient enthusiasm’ — because it’s trite

‘gratefully’ — begrudgingly

‘constructive’ — fairly polite

# ‘Hard cash paid down, over and over again’

The whole subject of inheritance is wonderful. When a new character arises, whatever its nature may be, it generally tends to be inherited, at least in a temporary and sometimes in a most persistent manner. What can be more wonderful than that some trifling peculiarity, not primordially attached to the species, should be transmitted through the male or female sexual cells, which are so minute as not to be visible to the naked eye, and afterwards through the incessant changes of a long course of development, undergone either in the womb or in the egg, and ultimately appear in the offspring when mature, or even when quite old, as in the case of certain diseases? Or again, what can be more wonderful than the well-ascertained fact that the minute ovule of a good milking cow will produce a male, from whom a cell, in union with an ovule, will produce a female, and she, when mature, will have large mammary glands, yielding an abundant supply of milk, and even milk of a particular quality?

Today is Charles Darwin’s birthday. I’m not such a serious Darwin reader, but it’s fun how it seems like you can open a Darwin book at almost any chapter and find something interesting or amusing. This is from The Variation of Animals And Plants Under Domestication, chapter twelve, ‘Inheritance’. Here we find Darwin overflowing with enthusiasm when trying to convince a sceptic about the importance of inheritance. In true Darwin style he launches into a long list of examples:

Some writers, who have not attended to natural history, have attempted to show that the force of inheritance has been much exaggerated. The breeders of animals would smile at such simplicity; and if they condescended to make any answer, might ask what would be the chance of winning a prize if two inferior animals were paired together? They might ask whether the half-wild Arabs were led by theoretical notions to keep pedigrees of their horses? Why have pedigrees been scrupulously kept and published of the Shorthorn cattle, and more recently of the Hereford breed? Is it an illusion that these recently improved animals safely transmit their excellent qualities even when crossed with other breeds? have the Shorthorns, without good reason, been purchased at immense prices and exported to almost every quarter of the globe, a thousand guineas having been given for a bull? With greyhounds pedigrees have likewise been kept, and the names of such dogs, as Snowball, Major, &c., are as well known to coursers as those of Eclipse and Herod on the turf. Even with the Gamecock, pedigrees of famous strains were formerly kept, and extended back for a century. With pigs, the Yorkshire and Cumberland breeders ”preserve and print pedigrees;” and to show how such highly-bred animals are valued, I may mention that Mr. Brown, who won all the first prizes for small breeds at Birmingham in 1850, sold a young sow and boar of his breed to Lord Ducie for 43 guineas; the sow alone was afterwards sold to the Rev. F. Thursby for 65 guineas; who writes, ”She paid me very well, having sold her produce for 300l., and having now four breeding sows from her.” Hard cash paid down, over and over again, is an excellent test of inherited superiority. In fact, the whole art of breeding, from which such great results have been attained during the present century, depends on the inheritance of each small detail of structure. 
But inheritance is not certain; for if it were, the breeder’s art would be reduced to a certainty, and there would be little scope left for that wonderful skill and perseverance shown by the men who have left an enduring monument of their success in the present state of our domesticated animals.

For the rest of the chapter, he will go on to talk about humans, again with long lists of examples, and then mixing in domestic animals and plants again. A lot of these examples of heredity surely hold up, and others seem like anecdotes. Here and even more in the following chapters–with subtitles including ‘reversion or atavism’, ‘prepotency’ and ‘on the good effects of crossing, and the evil effects of close interbreeding’–Darwin is trying hard to make sense of heredity. Why are certain features heritable? Why do they sometimes go away in the offspring but reappear in later generations? Why are offspring sometimes more like one parent than the other? In chapter 27, he will present his ‘provisional hypothesis of pangenesis’.

Literature

Darwin. 1875. The variation of animals and plants under domestication.

# ‘We have reached peak gene, and passed it’

Ken Richardson recently published an opinion piece about genetics titled ‘It’s the end of the gene as we know it’. And I feel annoyed.

The overarching point of the piece is that there have been ‘radical revisions of the gene concept’ and that they ‘need to reach the general public soon—before past social policy mistakes are repeated’. He argues, among other things, that:

• headlines like ‘being rich and successful is in your DNA’ are silly;
• polygenic scores for complex traits have limited predictive power and problems with population structure;
• the classical concept of what a ‘gene’ is has been undermined by molecular biology, which means that genetic mapping and genomic prediction are conceptually flawed.

You may be able to guess which of these arguments make me cheer and which make me annoyed.

There is a risk when you write a long list of arguments: if you make some good points and some weak points, no-one will remember anything but the weak points. Let us look at what I think are some good points, and the main weak one.

Gene-as-variant versus gene-as-sequence

I think Richardson is right that there is a difference between how classical genetics, including quantitative genetics, conceives of a ‘gene’ and what a gene is to molecular biology. This is the same distinction that Griffiths & Stotz (2013), Portin & Wilkins (2017), and I’m sure many others have written about. (Personally, I used to call it ‘gene(1)’ and ‘gene(2)’, but that is useless; even I can’t keep track of which is supposed to be one and two. Thankfully, that terminology didn’t make it to the blog.)

In classical terms, the ‘gene’ is a unit of inheritance. It’s something that causes inherited differences between individuals, and it’s only observed indirectly through crosses and differences between relatives. In molecular terms, a ‘gene’ is a piece of DNA that has a name and, optionally, some function. These two things are not the same. The classical gene fulfills a different function in genetics than the molecular gene. Classical genes are explained by molecular mechanisms, but they are not reducible to molecular genes.

That is, you can’t just take statements in classical genetics and substitute ‘piece of DNA’ for ‘gene’ and expect to get anything meaningful. Unfortunately, this seems to be what Richardson wants to do, and this inability to appreciate classical genes for what they are is why the piece goes astray. But we’ll return to that in a minute.

A gene for hardwiring in your DNA

I also agree that a lot of the language that we use around genetics, casually and in the media, is inappropriate. Sometimes it’s silly (when reacting positively to animals, believing in God, or whatever is supposed to be ‘hard-wired in our DNA’) and sometimes it’s scary (like when a genetic variant was dubbed ‘The Warrior Gene’ on flimsy grounds and tied to speculations about Maori genetics). Even serious geneticists who should know better will put out press releases where this or that is ‘in your DNA’, and the literature is full of ‘genes for’ complex traits that have at best small effects. This is an area where both researchers and communicators should shape up.

Genomic prediction is hard

Polygenic scores are one form of genomic prediction, that is: one way to predict individuals’ trait values from their DNA. It goes something like this: you collect trait values and perform DNA tests on some reference population, then fit a statistical model that tells you which genetic variants differ between individuals with high and low trait values. Then you take that model and apply it to some other individuals, whose values you want to predict. There are a lot of different ways to do this, but they all amount to estimating how much each variant contributes to the trait, and somehow adding that up.
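To make the ‘estimate each variant’s contribution and add it up’ idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the effect sizes are made up, the variants are simulated as independent with allele frequency 0.5, and the effects are estimated with one simple regression per variant. Real methods fit the variants jointly and deal with linkage and population structure, which this toy ignores.

```python
import random

rng = random.Random(1)  # fixed seed so the toy example is repeatable


def simulate(n, true_effects, allele_freq=0.5, noise_sd=1.0):
    """Simulate allele counts (0, 1 or 2) and trait values for n individuals."""
    genotypes, traits = [], []
    for _ in range(n):
        # Two independent allele draws per variant give a count of 0, 1 or 2.
        g = [sum(rng.random() < allele_freq for _ in range(2)) for _ in true_effects]
        # Trait = sum of variant effects + everything that is not DNA (noise).
        y = sum(b * x for b, x in zip(true_effects, g)) + rng.gauss(0.0, noise_sd)
        genotypes.append(g)
        traits.append(y)
    return genotypes, traits


def estimate_effects(genotypes, traits):
    """Regress the trait on each variant separately (one slope per variant)."""
    n = len(traits)
    mean_y = sum(traits) / n
    effects = []
    for j in range(len(genotypes[0])):
        x = [g[j] for g in genotypes]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, traits))
        effects.append(sxy / sxx if sxx > 0 else 0.0)
    return effects


def polygenic_score(genotype, effects):
    """Add up the estimated contribution of each variant."""
    return sum(b * x for b, x in zip(effects, genotype))


def pearson(a, b):
    """Pearson correlation between two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


true_effects = [0.5, -0.3, 0.2, 0.0, 0.4]  # made-up effect sizes

# Step 1: reference population with both trait values and genotypes.
ref_genotypes, ref_traits = simulate(2000, true_effects)
estimated_effects = estimate_effects(ref_genotypes, ref_traits)

# Step 2: apply the fitted model to other individuals.
target_genotypes, target_traits = simulate(500, true_effects)
scores = [polygenic_score(g, estimated_effects) for g in target_genotypes]
accuracy = pearson(scores, target_traits)

print("estimated effects:", [round(b, 2) for b in estimated_effects])
print("correlation between score and trait:", round(accuracy, 2))
```

Even in this friendly setting, where the model is exactly right, the score only correlates moderately with the trait, because most of the simulated trait variation is noise rather than DNA.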

If you have had any exposure to animal breeding, you will recognise this as genomic selection, a technology that has been a boon to animal breeding in dairy cattle, pigs, chickens, and to a lesser extent other industries in the last ten years or so (see review by Georges, Charlier & Hayes 2018). It’s only natural that human medical geneticists want to use the same idea to improve prediction of diseases. Unfortunately, it’s a bit harder to get genomic prediction to be useful for humans, for several reasons.

The piece touches on two important problems with genomic prediction in humans: First, DNA isn’t everything, so the polygenic scores will likely have to be combined with other risk factors in a joint model. It still seems to be an open question how useful genomic prediction will be for what diseases and in what contexts. Second, there are problems with population structure. Ken Richardson explains with an IQ example, but the broader point is that it is hard for the statistical models geneticists use to identify the causal effects in the flurry of spurious associations that are bound to exist in real data.

[A]ll modern societies have resulted from waves of migration by people whose genetic backgrounds are different in ways that are functionally irrelevant. Different waves have tended to enter the class structure at randomly different levels, creating what is called genetic population stratification. But different social classes also experience differences in learning opportunities, and much about the design of IQ tests, education, and so on, reflects those differences, irrespective of differences in learning ability as such. So some spurious correlations are, again, inevitable.

So, it may be really hard to get genomic predictors that predict accurately. This is especially pressing for studies of adaptation, where researchers might use polygenic scores estimated in European populations to compare other populations, for example. Methods to get good estimates in the face of population structure are a big research topic in human, animal, and plant genetics. I wouldn’t be surprised if good genomic prediction in humans would require both new method development and big genome-wide association studies that cover people from all over the world.
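Here is a toy simulation of how population structure can create a spurious association. It is a sketch under made-up assumptions: two subpopulations that differ both in allele frequency at one variant and in average environment, and a variant that has no effect on the trait at all. The numbers (allele frequencies 0.2 and 0.8, a one-unit environmental difference) are invented for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the toy example is repeatable


def simulate_population(n, allele_freq, env_mean):
    """Return (allele count, trait) pairs; the variant has NO effect on the trait."""
    rows = []
    for _ in range(n):
        genotype = sum(rng.random() < allele_freq for _ in range(2))
        trait = env_mean + rng.gauss(0.0, 1.0)  # environment and noise only
        rows.append((genotype, trait))
    return rows


def slope(rows):
    """Simple regression slope of trait on allele count."""
    n = len(rows)
    mean_g = sum(g for g, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sgg = sum((g - mean_g) ** 2 for g, _ in rows)
    sgy = sum((g - mean_g) * (y - mean_y) for g, y in rows)
    return sgy / sgg


def centre(rows):
    """Subtract the population means, a crude stand-in for a structure correction."""
    n = len(rows)
    mean_g = sum(g for g, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    return [(g - mean_g, y - mean_y) for g, y in rows]


pop_a = simulate_population(2000, allele_freq=0.2, env_mean=0.0)
pop_b = simulate_population(2000, allele_freq=0.8, env_mean=1.0)

# Naive analysis: pool everyone and ignore where they come from.
pooled_slope = slope(pop_a + pop_b)

# Adjusted analysis: compare individuals only within their own population.
adjusted_slope = slope(centre(pop_a) + centre(pop_b))

print("pooled 'effect' of a variant that does nothing:", round(pooled_slope, 2))
print("after centring within populations:", round(adjusted_slope, 2))
```

In real data the populations are not neatly labelled, which is part of why correcting for structure is a research topic and not a one-liner.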

These problems are empirical research problems. Polygenic scores may be useful or not. They will probably need huge studies with lots of participants and new methods with smart statistical tricks. However, they are not threatened by conceptual problems with the word ‘gene’.

Richardson’s criticism is timely. We’d all like to think that anyone who uses polygenic scores would be responsible, pay attention to the literature about sensitivity to population structure, and not try to over-interpret average polygenic scores as some way to detect genetic differences between populations. But just the other week, an evolutionary psychology journal published a paper that did just that. There are ill-intentioned researchers around, and they enjoy wielding the credibility of fancy-sounding modern methods like polygenic scores.

Genetic variants can be causal, though

Now on to where I think the piece goes astray. Here is a description of genetic causation and how that is more complicated than it first seems:

Of course, it’s easy to see how the impression of direct genetic instructions arose. Parents “pass on” their physical characteristics up to a point: hair and eye color, height, facial features, and so on; things that ”run in the family.” And there are hundreds of diseases statistically associated with mutations to single genes. Known for decades, these surely reflect inherited codes pre-determining development and individual differences?

But it’s not so simple. Consider Mendel’s sweet peas. Some flowers were either purple or white, and patterns of inheritance seemed to reflect variation in a single ”hereditary unit,” as mentioned above. It is not dependent on a single gene, however. The statistical relation obscures several streams of chemical synthesis of the dye (anthocyanin), controlled and regulated by the cell as a whole, including the products of many genes. A tiny alteration in one component (a ”transcription factor”) disrupts this orchestration. In its absence the flower is white.

So far so good. This is one of the central ideas of quantitative genetics: most traits that we care about are complex, in that an individual’s trait value is affected by lots of genes of individually small effects, and to a large extent by environmental factors (that are presumably also many and subtle in their individual effects). Even relatively simple traits tend to be more complicated when you look closely. For example, almost none of the popular textbook examples of single gene traits in humans are truly influenced by variants at only one gene (Myths of human genetics). Most of the time they’re either unstudied or more complicated than that. And even Mendelian rare genetic diseases are often collections of different mutations in different genes that have similar effects.

This is what quantitative geneticists have been saying since the early 1900s (setting aside the details about the transcription factors, which are interesting in their own right, but not a crucial part of the quantitative genetic account). This is why genome-wide association studies and polygenic scores are useful, and why single-gene studies of ‘candidate genes’ picked based on their a priori plausible function are a thing of the past. But let’s continue:

This is a good illustration of what Noble calls ”passive causation.” A similar perspective applies to many ”genetic diseases,” as well as what runs in families. But more evolved functions—and associated diseases—depend upon the vast regulatory networks mentioned above, and thousands of genes. Far from acting as single-minded executives, genes are typically flanked, on the DNA sequence, by a dozen or more ”regulatory” sequences used by wider cell signals and their dynamics to control genetic transcription.

This is where it happens. We get a straw biochemist’s view of the molecular gene, where everything is due only to protein-coding genes that encode one single protein that has one single function, and then he enumerates a lot of different exceptions to this view that are supposed to make us reject the gene concept: regulatory DNA (as in the quote above), dynamic gene regulation during development, alternative splicing that allows the same gene to make multiple protein isoforms, noncoding RNA genes that act without being turned into protein, somatic rearrangements in DNA, and even that similar genes may perform different functions in different species … However, the classical concept of a gene used in quantitative genetics is not the same as the molecular gene. Just because molecular biology and classical genetics both use the word ‘gene’, users of genome-wide association studies are not forced to commit to any particular view about alternative splicing.

It is true that there are ‘vast regulatory networks’ and interplay at the level of ‘the cell as a whole’, but that does not prevent some (or many) of the genes involved in the network from being affected by genetic variants that cause differences between individuals. That builds up to form genetic effects on traits, through pathways that are genuinely causal, ‘passive’ or not. There are many genetic variants and complicated indirect mechanisms involved. The causal variants are notoriously hard to find. They are still genuine causes. You can become a bit taller because you had great nutrition as a child rather than poor nutrition. You can become a bit taller because you carry certain genetic variants rather than others.

# Paper: ‘Sequence variation, evolutionary constraint, and selection at the CD163 gene in pigs’

This paper is sort of a preview of what is going to be a large series of empirical papers on pig genomics from a lot of people in our group.

The humble CD163 gene has become quite important, because the PRRS virus exploits it to enter macrophages when it infects a pig. It turns out that if you inactivate it (and there are several ways to go about that; a new one was even published right after this paper, by Chen et al. 2019) you get a PRRSV-resistant pig. For obvious reasons, PRRSV-resistant pigs would be great for pig farmers.

In this paper, we wanted to figure out 1) if there were any natural knockout variants in CD163, and 2) if there was anything special about CD163 if you compare it to the rest of the genes in the pig genome. In short, we found no convincing knockout variants, and that CD163 seemed moderately variant intolerant, under positive selection in the lineage leading up to the pig, and that there was no evidence of a selective sweep at CD163.

You can read the whole thing in GSE.

Figure 1, showing sequence variants detected in the CD163 gene.

If you are so inclined, this might lead on to the interesting but not very well defined open question of how we combine these different perspectives on selection in the genome, and how they go together with other genome features like mutation rate and recombination rate variation. There are some disparate threads to bring together there.

Johnsson, Martin, et al. Sequence variation, evolutionary constraint, and selection at the CD163 gene in pigs. Genetics Selection Evolution 50.1 (2018): 69.

# Paper: ‘Integrating selection mapping with genetic mapping and functional genomics’

If you’re the kind of geneticist who wants to know about causative variants that affect selected traits, you have probably thought about how to combine genome scans for signatures of selection with genome-wide association studies. There is one simple problem: Unfortunately, once you’ve found a selective sweep, the association signal is gone, because the causative variant is fixed (or close to). So you need some tricks.
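The vanishing association signal can be put in numbers with a back-of-the-envelope sketch. It assumes a simple regression of the trait on allele count, Hardy-Weinberg genotype frequencies, and a made-up residual standard deviation of 1; the function name is hypothetical. Under those assumptions the standard error of the estimated effect is roughly the residual standard deviation divided by the square root of n times 2p(1 - p), so it blows up as the allele frequency p approaches fixation.

```python
import math


def effect_standard_error(n, allele_freq, residual_sd=1.0):
    """Approximate standard error of a variant's estimated effect.

    Assumes simple regression on allele count and Hardy-Weinberg proportions,
    so the genotype variance is 2p(1 - p).
    """
    genotype_variance = 2 * allele_freq * (1 - allele_freq)
    if genotype_variance == 0:
        # A fixed variant does not vary, so there is nothing to associate with.
        return math.inf
    return residual_sd / math.sqrt(n * genotype_variance)


# The same sample size gives less and less precision as the sweep completes.
for p in (0.5, 0.9, 0.99, 1.0):
    print("allele frequency", p, "-> standard error", effect_standard_error(1000, p))
```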

This is a short review that I wrote for a research topic on the genomics of adaptation. It occurred to me that one can divide the ways to combine selection mapping and genetic mapping into three categories. The review contains examples from the literature of how people have done it, and this mock genome-browser style figure to illustrate them.

You can read the whole thing in Frontiers in Genetics.

Johnsson, Martin. Integrating selection mapping with genetic mapping and functional genomics. Frontiers in Genetics 9 (2018): 603.

# ‘Genes affect’ this and that

It has been a long time since I wrote a post like this one, but once upon a time the blog consisted almost entirely of griping about the lack of references in news articles about science. In part it was a way to add references to the news articles, because if a blog post linked to an article in, for example, DN, they responded with a link on the article. It feels like those were more innocent times, when newspapers thought it was reasonable to link automatically to blogs that wrote about them.

Anyway. It starts like this: a friend sends a link to this article on the SVT Nyheter Uppsala website: ‘Dina gener påverkar hur ditt fett lägger sig’ (‘Your genes affect where your fat settles’). It is a short news item prompted by a new scientific paper from researchers in Uppsala. It even has a little video. It says:

A new study carried out at Uppsala University shows that your genes affect where on your body your fat ends up.

360,000 people took part in the study, and the study can show that it is mainly women who are affected by their genetics.

‘We know that women and men tend to store fat in different parts of the body. Women more easily store fat on the hips and legs, while men to a greater extent store fat in the abdomen,’ says Mathias Rask-Andersen at the department of genetics at Uppsala University.

And not much more. My friend writes, roughly: But surely this is already known, that there can be some genetic effect on how fat is distributed on the body? There must be something more behind the research that got lost in the news article. And of course there is.

Now we need to find the original paper. There is no reference in the news article, but at least they have helpfully mentioned one of the researchers by name, so we have a bit more information than that it is someone connected to Uppsala. I start by searching for Mathias Rask-Andersen. First I check his Google Scholar page, but the paper isn’t there yet. Brand new papers tend to take a while to make it into literature databases. Then his and the research group’s pages at Uppsala University, but of course they haven’t been updated yet either. Since the news article mentioned 360,000 individuals, we can guess that they probably used data from the UK Biobank, so we can look at their publications page too. There are almost ridiculously many papers there that have already been published in 2019, but not this one.

Only after that does it occur to me to look at Uppsala University’s press page for the full press release. Bingo. It contains a reference to the paper in Nature Communications. Here it is: Rask-Andersen et al. (2019) Genome-wide association study of body fat distribution identifies adiposity loci and sex-specific genetic effects.

‘Genome-wide association study’, it says. So this is an association study, that is, a study that tries to link fat distribution to particular genetic variants. You DNA-test lots of people and see which genetic variants go together with having fat in a certain place on the body. (Here is a very old blog post that tries to describe this.)

So this is not research that tries to test whether fat distribution has a genetic basis or not, but research that, given that fat distribution on the body has some genetic basis, tries to find out which genes and genetic variants affect it. The news article has thus got what the study is about completely backwards, and this is what it usually looks like when association studies are presented in the media. They are portrayed as something that tests whether ‘genes affect’ something or not. Why is that?

I suspect that association studies are too hard to describe briefly in a press release. It is easier to say that the study shows ‘that genes affect’ than that it ‘tries to find the particular gene variants that affect’, and so that is what the researcher or the communications officer at the university writes in the press release. Then the reporter cuts the press release down to a manageable length, and most of the details, along with the reference to the original paper, disappear.

That is how news articles about new association studies come to give completely misleading descriptions of what they are about.

# Journal club of one: ‘Splendor and misery of adaptation, or the importance of neutral null for understanding evolution’

In this paper from a couple of years ago, Eugene Koonin takes on naïve adaptationism, in the style of The Spandrels of San Marco and the Panglossian paradigm (Gould & Lewontin 1979). The spandrels paper is one of those classics that divide people. One of its problems was that it is easy to point out what one shouldn’t do (tell adaptive stories without justification), but harder to say what one should do. But anti-adaptationism has moved forward since the Spandrels, and the current paper has a prescription.

Spandrels contained a list of possible alternatives to adaptation, which I think breaks down into two categories: population genetic alternatives (including neutral or deleterious fixations due to drift and runaway selection driving destructive features rather than fit to the environment), and physiological or physical alternatives (features that arise due to selection on something else, which are the metaphorical spandrels of the title, and fit to the environment that happens due to natural laws unrelated to biological evolution).

Eugene Koonin elaborates on the population genetic part, concentrating more on chance and less on constraint. He brings up examples of molecular structures that may have arisen through neutral evolution. The main idea is that when a feature has fixed, it doesn’t go away so easily, and there can be a ratchet-like process of increasing complexity. Evolution doesn’t Haussmannise, but patches, pieces, and cobbles together what is already there.

As a theoretical example, Michael Lynch (2007) used population genetic models to derive conditions for when molecular networks can extend and become complex by neutral means. (Spoiler: it’s when transcription factor binding motifs arise often in the weakly constrained DNA around genes.) Eugene Koonin thinks that the thing to do with this insight is to use it as a null model:

A simplified and arguably the most realistic approach is to assume a neutral null model and then seek evidence of selection that could falsify it. Null models are standard in physics but apparently not in biology. However, if biology is to evolve into a ”hard” science, with a solid theoretical core, it must be based on null models, no other path is known.

I disagree with this for two reasons. I’m not at all convinced that biology must be based on setting up null models and rejecting them … or that physics is. In some statistical approaches, inference proceeds by setting up a null hypothesis (and model), and trying to shoot it down. But those hypotheses are different from substantial scientific hypotheses. I would suspect that biology spends too much time rejecting nulls, not too little.

Bausman & Halina (2018) summarise the argument against null hypotheses like this in their recent paper in Biology & Philosophy:

The pseudo-null strategy is an attempt to move hypotheses away from parity by shifting the burden of disproving the null to the alternative hypotheses on the authority of statistics. As we have argued, there is no clear justification for this strategy, however, so the hypotheses should be treated on a par.

That is, they reject the analogy between statistical testing and scientific reasoning. They take their examples from ecology and psychology, but there is the same tendency in molecular evolution.

Also, constructive neutral evolution is a pretty elaborate process. Just like adaptation should not be assumed as a default model without positive supporting evidence, neither should constructive neutral evolution. The default alternative for some elaborate feature of an organism need not be ‘constructive neutral evolution’, but ‘we don’t know how it came about’.

On the other hand, maybe the paper shouldn’t be read as an attempt to set constructive neutral evolution up as the default, but, like Spandrels, to repeat that adaptation isn’t everything:

It is important to realize that this changed paradigm by no means denies the importance of adaptation, only requires that it is not taken for granted. As discussed above, adaptation is common even in the weak selection regime where non-adaptive processes dominate. But the adaptive processes change their character as manifested in the switch from local to global evolutionary solutions, CNE, and pervasive (broadly understood) exaptation.

Naïve adaptationism is certainly not dead, but just whisper $\frac {1}{N_e s}$ and the ghost goes away. I would have been more interested in an attack on sophisticated adaptationism. How about the organismal level? Do ratchet-like neutral processes bias or direct the evolution of form and behaviour of say animals and plants?

Literature

Bausman W & Halina M (2018) Not null enough: pseudo-null hypotheses in community ecology and comparative psychology Biology & Philosophy.

Gould SJ & Lewontin R (1979) The spandrels of San Marco and the Panglossian paradigm: a critique of the adaptationist programme Proceedings of the Royal Society B.

Koonin EV (2016) Splendor and misery of adaptation, or the importance of neutral null for understanding evolution BMC Biology.

Lynch M (2007) The evolution of genetic networks by non-adaptive processes Nature Reviews Genetics.