Journal club of one: ‘Biological relevance of computationally predicted pathogenicity of noncoding variants’

Wouldn’t it be great if we had a way to tell genetic variants that do something to gene function and regulation from those that don’t? This is a Really Hard Problem, especially for variants that fall outside of protein-coding regions, and thus may or may not do something to gene regulation.

There is a host of bioinformatic methods to tackle the problem. They use different combinations of evolutionary analysis (how often the position of the variant differs between or within species), functional genomics (what histone modifications, chromatin accessibility and so on look like at the location of the variant), and statistics (comparing known functional variants to other variants).

When a new method is published, it’s always accompanied by a receiver operating characteristic (ROC) curve showing that it predicts held-out data well, plus some combination of comparisons to other methods and analyses of other datasets of known or presumed functional variants. However, one wonders how these methods will do when we use them to evaluate unknown variants in the lab, or eventually in the clinic.

This is what this paper, Liu et al. (2019) ‘Biological relevance of computationally predicted pathogenicity of noncoding variants’, tries to do. They construct three test cases that are supposed to be more realistic (pessimistic) test beds for six noncoding variant effect predictors.

The tasks are:

  1. Find out which allele of a variant is the deleterious one. The presumed deleterious test alleles here are ones that don’t occur in any species of a large multiple genome alignment.
  2. Find a causative variant among a set of linked variants. The test alleles are causative variants from the Human Gene Mutation Database and some variants close to them.
  3. Enrich for causative variants among increasingly large sets of non-functional variants.

In summary, the methods don’t do too well. In the authors’ words, they show ‘underwhelming performance’. That isn’t happy news, but I don’t think it’s much of a surprise. Noncoding variant prediction is universally acknowledged to be tricky. In particular, looking at Task 3, the predictors are bound to look much less impressive in the face of class imbalance than in those ROC curves. Then again, class imbalance is going to be a fact of life when we go out to apply these methods to our long lists of candidate variants.
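
To see why class imbalance bites, here is a minimal back-of-the-envelope sketch (mine, not from the paper): hold a hypothetical predictor’s sensitivity and specificity fixed, which fixes its ROC curve, and watch what happens to the positive predictive value as truly functional variants become rarer.

    # Positive predictive value of a hypothetical predictor with fixed
    # sensitivity and specificity, as functional variants become rarer.
    sensitivity = 0.80   # fraction of functional variants correctly flagged
    specificity = 0.90   # fraction of non-functional variants correctly dismissed

    for prevalence in [0.5, 0.1, 0.01, 0.001]:
        true_pos = sensitivity * prevalence
        false_pos = (1 - specificity) * (1 - prevalence)
        ppv = true_pos / (true_pos + false_pos)
        print(f"functional fraction {prevalence}: PPV = {ppv:.2f}")

At one functional variant in a thousand, most of what this predictor flags are false positives, which is roughly the situation we face with a long list of candidate variants.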

Task 1 isn’t that well suited to the tools, and the way it’s presented is a bit silly. After describing how they compiled their evolution-based test variant set, the authors write:

Our expectation was that a pathogenic allele would receive a significantly higher impact score (as defined for each of the six tested methods) than a non-pathogenic allele at the same position. Instead, we found that these methods were unsuccessful at this task. In fact, four of them (LINSIGHT, EIGEN, GWAVA, and CATO) reported identical scores for all alternative alleles at every position as they were not designed for allelic contrasts …

Sure, it’s hard to solve this problem with a program that only produces one score per site, but you knew that when you started writing this paragraph, didn’t you?

The whole paper is useful, but to me, the most interesting insight is that variants close to each other tend to have correlated features, meaning that there is little power to tell them apart (Task 2). This might be obvious if you think about it (e.g., if two variants fall in the same enhancer, how different can their chromatin state and histone modifications really be?), but I guess I haven’t thought that hard about it before. This high correlation is unfortunate, because that means that methods for finding causative variants (association and variant effect prediction) have poor spatial resolution. We might need something else to solve the fine mapping problem.

Figure 4 from Liu et al., showing correlation between features of linked variants.

Finally, shout-out to Reviewer 1 whose comment gave rise to these sentences:

An alternative approach is to develop a composite score that may improve upon individual methods. We examined one such method, namely PRVCS, which unfortunately had poor performance (Supplementary Figure 11).

I thought this read like something prompted by an eager beaver reviewer, and thanks to Nature Communications’ open review policy, we can confirm my suspicions. So don’t say that open review is useless.

Comment R1.d. Line 85: It would be interesting to see if a combination of the examined scores would better distinguish between pathogenic and non-pathogenic non-coding regions. Although we suspect there to be high correlation between features this will test the hypothesis that each score may not be sufficient on its own to make any distinction between pathogenic and non-pathogenic ncSNVs. However, a combined model might provide more discriminating power than individual scores, suggesting that each score captures part of the underlying information with regards to a region’s pathogenicity propensity.

Literature

Liu, L., Sanderford, M. D., Patel, R., Chandrashekar, P., Gibson, G., & Kumar, S. (2019). Biological relevance of computationally predicted pathogenicity of noncoding variants. Nature Communications, 10(1), 330.

Journal club of one: ‘The heritability fallacy’

Public debate about genetics often seems to centre on heritability and on psychiatric and mental traits, maybe because we really care about our minds, and because for a long time heritability was all human geneticists studying quantitative traits could estimate. Here is an anti-heritability paper that I think articulates many of the common grievances: Moore & Shenk (2016) The heritability fallacy. The abstract gives a snappy summary of the argument:

The term ‘heritability,’ as it is used today in human behavioral genetics, is one of the most misleading in the history of science. Contrary to popular belief, the measurable heritability of a trait does not tell us how ‘genetically inheritable’ that trait is. Further, it does not inform us about what causes a trait, the relative influence of genes in the development of a trait, or the relative influence of the environment in the development of a trait. Because we already know that genetic factors have significant influence on the development of all human traits, measures of heritability are of little value, except in very rare cases. We, therefore, suggest that continued use of the term does enormous damage to the public understanding of how human beings develop their individual traits and identities.

At first glance, this should be a paper for me. I tend to agree that heritability estimates of human traits aren’t very useful. I also agree that geneticists need to care about the interpretations of their claims beyond the purely scientific domain. But the more I read, the less excited I became. The paper is a list of complaints about heritability coefficients, some more convincing than others. For example, I find it hard to worry too much about the ‘equal environments assumption’ in twin studies. But sure, it’s hard to identify variance components, and in practice, researchers sometimes resort to designs that are a lot iffier than twin studies.

But I think the main thrust of the paper is this huge overstatement:

Most important of all is a deep flaw in an assumption that many people make about biology: That genetic influences on trait development can be separated from their environmental context. However, contemporary biology has demonstrated beyond any doubt that traits are produced by interactions between genetic and nongenetic factors that occur in each moment of developmental time … That is to say, there are simply no such things as gene-only influences.

There certainly is such a thing as additive genetic variance as well as additive gene action. This passage only makes sense to me if ‘interaction’ is interpreted not as a statistical term but as describing a causal interplay. If so, it is perfectly true that all traits are the outcomes of interplay between genes and environment. It doesn’t follow that genetic variants in populations will interact with variable environments to the degree that quantitative genetic models are ‘nonsensical in most circumstances’.
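
For reference, the textbook decomposition behind that claim looks something like this (standard quantitative genetics notation, not anything specific to this paper):

    % Phenotypic variance split into additive, dominance, epistatic (interaction)
    % and environmental components; narrow-sense heritability is the additive share.
    V_P = V_A + V_D + V_I + V_E, \qquad h^2 = \frac{V_A}{V_P}

The additive term is defined at the level of variation between individuals in a population, which is exactly the level of analysis the rest of this post argues quantitative genetics is about.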

They illustrate with this parable: Billy and Suzy are filling a bucket. Suzy is holding the hose and Billy turns on the tap. How much of the water is due to Billy and how much is due to Suzy? The answer is supposed to be that the question makes no sense, because they are both filling the bucket through a causal interplay. Well. If they’re filling a dozen buckets, and halfway through, Billy opens the tap half a turn more, and Suzy starts moving faster between buckets because she’s tired of this and wants lunch, then it suddenly does make sense to ask how much of the variation between buckets is due to Billy and how much is due to Suzy. The correct level of analysis for the quantitative bucketist isn’t Billy, Suzy and the hose. It is the half-turn of the tap and Suzy’s moving of the nozzle.

The point is that quantitative genetic models describe variation between individuals. The authors know this, of course, but they write as if genetic analysis of variance is some kind of sleight of hand, as if quantitative genetics ought to be about development, and the fact that it isn’t is a deliberate obfuscation. Here is how they describe Jay Lush’s understanding of heritability:

The intention was ‘to quantify the level of predictability of passage of a biologically interesting phenotype from parent to offspring’. In this way, the new technical use of ‘heritability’ accurately reflected that period’s understanding of genetic determinism. Still, it was a curious appropriation of the term, because—even by the admission of its proponents—it was meant only to represent how variation in DNA relates to variation in traits across a population, not to be a measure of the actual influence of genes on the development of any given trait.

I have no idea what position Lush took on genetic determinism. But we can find the context of heritability by looking at the very page before in Animal breeding plans. The definition of the heritability coefficient occurs on page 87. This is how Lush starts the chapter on page 86:

In the strictest sense of the word, the question of whether a characteristic is hereditary or environmental has no meaning. Every characteristic is both hereditary and environmental, since it is the end result of a long chain of interactions of the genes with each other, with the environment and with the intermediate products at each stage of development. The genes cannot develop the characteristic unless they have the proper environment, and no amount of attention to the environment will cause the characteristic to develop unless the necessary genes are present. If either the genes or the environment are changed, the characteristic which results from their interactions may be changed.

I don’t know — maybe the way quantitative genetics has been used in human behavioural and psychiatric genetics invites genetic determinism. Or maybe genetic determinism is one of those false common-sense views that are really hard to unlearn. In any case, I don’t think it’s reasonable to put the blame on the concept of heritability for not being some general ‘measure of the biological inheritability of complex traits’ — something that it was never intended to be, and cannot possibly be.

My guess is that new debates will be about polygenic scores and genomic prediction. I hope that will be more useful.

Literature

Moore, D. S., & Shenk, D. (2016). The heritability fallacy. Wiley Interdisciplinary Reviews: Cognitive Science.

Lush, J. L. Animal breeding plans. Online at: https://archive.org/details/animalbreedingpl032391mbp/page/n99

How not to respond to CRISPR babies

In December, after He Jiankui’s alleged experiment with human genome-editing, a Nature editorial said:

It has not yet been independently confirmed that the Chinese genome-editing researcher He Jiankui altered the DNA of embryos using a gene-editing technique and then implanted them in a woman, as he claims. Such a step would be significant and controversial because it would make a permanent change to the germ line that could be passed on to future generations. (This distinguishes germline editing from the use of gene-editing tools as therapies that correct genetic alterations in somatic cells in blood and other tissues.)

I think that this passage, like a lot of other discourse among scientists on this topic, fails to acknowledge, or at least emphasise, the real damage in this case.

When we insist on the germline–soma distinction as The Barrier for genome editing, and on crossing The Barrier as the primary problem, we prioritise The Barrier over the actual people involved. The damage is not primarily to ‘the genome’, ‘the gene pool’, or ‘future generations’, but to the children born of the procedure, and their parents. The genome, on the other hand, is fine. It’s being fuzzed by random mutation every generation anyway.

Imagine this was instead a somatic gene ‘therapy’ experiment, with similarly vague potential benefits against similarly unknown and unchecked potential harms. Would it be fine? Of course not. It might be slightly less bad, because the women wouldn’t have to worry that their children would inherit the potential complications. That the variants are (may be) heritable is not unimportant, but it shouldn’t be the main concern.

‘We have reached peak gene, and passed it’

Ken Richardson recently published an opinion piece about genetics titled ‘It’s the end of the gene as we know it’. And I feel annoyed.

The overarching point of the piece is that there have been ‘radical revisions of the gene concept’ and that they ‘need to reach the general public soon—before past social policy mistakes are repeated’. He argues, among other things, that:

  • headlines like ‘being rich and successful is in your DNA’ are silly;
  • polygenic scores for complex traits have limited predictive power and problems with population structure;
  • the classical concept of what a ‘gene’ is has been undermined by molecular biology, which means that genetic mapping and genomic prediction are conceptually flawed.

You may be able to guess which of these arguments make me cheer and which make me annoyed.

There is a risk when you write a long list of arguments that, if you make some good points and some weak points, no-one will remember anything but the weak points. Let us look at what I think are some good points, and the main weak one.

Gene-as-variant versus gene-as-sequence

I think Richardson is right that there is a difference between how classical genetics, including quantitative genetics, conceives of a ‘gene’ and what a gene is to molecular biology. This is the same distinction as Griffiths & Stotz (2013), Portin & Wilkins (2017), and I’m sure many others have written about. (Personally, I used to call it ‘gene(1)’ and ‘gene(2)’, but that is useless; even I can’t keep track of which is supposed to be one and two. Thankfully, that terminology didn’t make it to the blog.)

In classical terms, the ‘gene’ is a unit of inheritance. It’s something that causes inherited differences between individuals, and it’s only observed indirectly through crosses and differences between relatives. In molecular terms, a ‘gene’ is a piece of DNA that has a name and, optionally, some function. These two things are not the same. The classical gene fulfills a different function in genetics than the molecular gene. Classical genes are explained by molecular mechanisms, but they are not reducible to molecular genes.

That is, you can’t just take statements in classical genetics and substitute ‘piece of DNA’ for ‘gene’ and expect to get anything meaningful. Unfortunately, this seems to be what Richardson wants to do, and this inability to appreciate classical genes for what they are is why the piece goes astray. But we’ll return to that in a minute.

A gene for hardwiring in your DNA

I also agree that a lot of the language that we use around genetics, casually and in the media, is inappropriate. Sometimes it’s silly (when reacting positively to animals, believing in God, or whatever is supposed to be ‘hard-wired in our DNA’) and sometimes it’s scary (like when a genetic variant was dubbed ‘The Warrior Gene’ on flimsy grounds and tied to speculations about Maori genetics). Even serious geneticists who should know better will put out press releases where this or that is ‘in your DNA’, and the literature is full of ‘genes for’ complex traits that have at best small effects. This is an area where both researchers and communicators should shape up.

Genomic prediction is hard

Polygenic scores are one form of genomic prediction, that is: one way to predict individuals’ trait values from their DNA. It goes something like this: you collect trait values and perform DNA tests on some reference population, then fit a statistical model that tells you which genetic variants differ between individuals with high and low trait values. Then you take that model and apply it to some other individuals, whose values you want to predict. There are a lot of different ways to do this, but they all amount to estimating how much each variant contributes to the trait, and somehow adding that up.
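
As a toy sketch of that last ‘adding up’ step (my illustration; real methods differ mainly in how the weights are estimated), a polygenic score is just a weighted sum of allele counts:

    import numpy as np

    # Hypothetical per-variant effect estimates from the reference population
    # (real scores use thousands to millions of variants)
    effects = np.array([0.12, -0.05, 0.30, 0.00, 0.07])

    # Genotypes of three new individuals, coded as the number of copies
    # (0, 1 or 2) of the trait-increasing allele at each variant
    genotypes = np.array([
        [0, 1, 2, 1, 0],
        [2, 2, 0, 1, 1],
        [1, 0, 1, 2, 2],
    ])

    # Polygenic score: each individual's allele counts, weighted by the
    # estimated effects and summed
    scores = genotypes @ effects
    print(scores)

The hard parts are estimating those weights well in the first place, and making sure they still hold for people who differ from the reference population, which is where this post is headed.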

If you have had any exposure to animal breeding, you will recognise this as genomic selection, a technology that has been a boon to animal breeding in dairy cattle, pig, chicken, and to a lesser extent other industries over the last ten years or so (see the review by Georges, Charlier & Hayes 2018). It’s only natural that human medical geneticists want to use the same idea to improve prediction of diseases. Unfortunately, it’s a bit harder to get genomic prediction to be useful for humans, for several reasons.

The piece touches on two important problems with genomic prediction in humans: First, DNA isn’t everything, so the polygenic scores will likely have to be combined with other risk factors in a joint model. It still seems to be an open question how useful genomic prediction will be for what diseases and in what contexts. Second, there are problems with population structure. Ken Richardson explains with an IQ example, but the broader point is that it is hard for the statistical models geneticists use to identify the causal effects in the flurry of spurious associations that are bound to exist in real data.

[A]ll modern societies have resulted from waves of migration by people whose genetic backgrounds are different in ways that are functionally irrelevant. Different waves have tended to enter the class structure at randomly different levels, creating what is called genetic population stratification. But different social classes also experience differences in learning opportunities, and much about the design of IQ tests, education, and so on, reflects those differences, irrespective of differences in learning ability as such. So some spurious correlations are, again, inevitable.

So, it may be really hard to get genomic predictors that predict accurately. This is especially pressing for studies of adaptation, where researchers might, for example, use polygenic scores estimated in European populations to compare other populations. Methods to get good estimates in the face of population structure are a big research topic in human, animal, and plant genetics. I wouldn’t be surprised if good genomic prediction in humans required both new method development and big genome-wide association studies that cover people from all over the world.

These problems are empirical research problems. Polygenic scores may be useful or not. They will probably need huge studies with lots of participants and new methods with smart statistical tricks. However, they are not threatened by conceptual problems with the word ‘gene’.

Richardson’s criticism is timely. We’d all like to think that anyone who uses polygenic scores would be responsible, pay attention to the literature about sensitivity to population structure, and not try to over-interpret average polygenic scores as some way to detect genetic differences between populations. But just the other week, an evolutionary psychology journal published a paper that did just that. There are ill-intentioned researchers around, and they enjoy wielding the credibility of fancy-sounding modern methods like polygenic scores.

Genetic variants can be causal, though

Now on to where I think the piece goes astray. Here is a description of genetic causation and how that is more complicated than it first seems:

Of course, it’s easy to see how the impression of direct genetic instructions arose. Parents “pass on” their physical characteristics up to a point: hair and eye color, height, facial features, and so on; things that ”run in the family.” And there are hundreds of diseases statistically associated with mutations to single genes. Known for decades, these surely reflect inherited codes pre-determining development and individual differences?

But it’s not so simple. Consider Mendel’s sweet peas. Some flowers were either purple or white, and patterns of inheritance seemed to reflect variation in a single ”hereditary unit,” as mentioned above. It is not dependent on a single gene, however. The statistical relation obscures several streams of chemical synthesis of the dye (anthocyanin), controlled and regulated by the cell as a whole, including the products of many genes. A tiny alteration in one component (a ”transcription factor”) disrupts this orchestration. In its absence the flower is white.

So far so good. This is one of the central ideas of quantitative genetics: most traits that we care about are complex, in that an individual’s trait value is affected by lots of genes of individually small effects, and to a large extent on environmental factors (that are presumably also many and subtle in their individual effects). Even relatively simple traits tend to be more complicated when you look closely. For example, almost none of the popular textbook examples of single gene traits in humans are truly influenced by variants at only one gene (Myths of human genetics). Most of the time they’re either unstudied or more complicated than that. And even Mendelian rare genetic diseases are often collections of different mutations in different genes that have similar effects.

This is what quantitative geneticists have been saying since the early 1900s (setting aside the details about the transcription factors, which are interesting in their own right, but not a crucial part of the quantitative genetic account). This is why genome-wide association studies and polygenic scores are useful, and why single-gene studies of ‘candidate genes’ picked based on their a priori plausible function are a thing of the past. But let’s continue:

This is a good illustration of what Noble calls ”passive causation.” A similar perspective applies to many ”genetic diseases,” as well as what runs in families. But more evolved functions—and associated diseases—depend upon the vast regulatory networks mentioned above, and thousands of genes. Far from acting as single-minded executives, genes are typically flanked, on the DNA sequence, by a dozen or more ”regulatory” sequences used by wider cell signals and their dynamics to control genetic transcription.

This is where it happens. We get a straw biochemist’s view of the molecular gene, where everything is due only to protein-coding genes that each encode one single protein with one single function, and then he enumerates a lot of exceptions to this view that are supposed to make us reject the gene concept: regulatory DNA (as in the quote above), dynamic gene regulation during development, alternative splicing that allows the same gene to make multiple protein isoforms, noncoding RNA genes that act without being turned into protein, somatic rearrangements of DNA, and even the fact that similar genes may perform different functions in different species … However, the classical concept of a gene used in quantitative genetics is not the same as the molecular gene. Just because molecular biology and classical genetics both use the word ‘gene’, users of genome-wide association studies are not forced to commit to any particular view about alternative splicing.

It is true that there are ‘vast regulatory networks’ and interplay at the level of ‘the cell as a whole’, but that does not prevent some (or many) of the genes involved in the network from being affected by genetic variants that cause differences between individuals. That builds up to form genetic effects on traits, through pathways that are genuinely causal, ‘passive’ or not. There are many genetic variants and complicated indirect mechanisms involved. The causal variants are notoriously hard to find. They are still genuine causes. You can become a bit taller because you had great nutrition as a child rather than poor nutrition. You can become a bit taller because you carry certain genetic variants rather than others.

Paper: ‘Sequence variation, evolutionary constraint, and selection at the CD163 gene in pigs’

This paper is sort of a preview of what is going to be a large series of empirical papers on pig genomics from a lot of people in our group.

The humble CD163 gene has become quite important, because the PRRS virus exploits it to enter macrophages when it infects a pig. It turns out that if you inactivate it, you get a PRRSV-resistant pig. There are several ways to go about that; a new one was even published around the same time as this paper (Chen et al. 2019). For obvious reasons, PRRSV-resistant pigs would be great for pig farmers.

In this paper, we wanted to figure out 1) whether there were any natural knockout variants in CD163, and 2) whether there was anything special about CD163 compared to the rest of the genes in the pig genome. In short, we found no convincing knockout variants, and that CD163 seemed moderately variant intolerant, under positive selection in the lineage leading up to the pig, and without evidence of a selective sweep at CD163.

You can read the whole thing in GSE.

Figure 1, showing sequence variants detected in the CD163 gene.

If you are so inclined, this might lead on to the interesting but not very well defined open question of how we combine these different perspectives on selection in the genome, and how they go together with other genome features like mutation rate and recombination rate variation. There are some disparate threads to bring together there.

Johnsson, Martin, et al. Sequence variation, evolutionary constraint, and selection at the CD163 gene in pigs. Genetics Selection Evolution 50.1 (2018): 69.

Paper: ‘Integrating selection mapping with genetic mapping and functional genomics’

If you’re the kind of geneticist who wants to know about causative variants that affect selected traits, you have probably thought about how to combine genome scans for signatures of selection with genome-wide association studies. There is one simple problem: once you’ve found a selective sweep, the association signal is gone, because the causative variant is fixed (or close to fixed). So you need some tricks.

This is a short review that I wrote for a research topic on the genomics of adaptation. It occurred to me that one can divide the ways to combine selection mapping and genetic mapping into three categories. The review contains examples from the literature of how people have done it, and a mock genome-browser-style figure to illustrate them.

You can read the whole thing in Frontiers in Genetics.

Johnsson, Martin. Integrating selection mapping with genetic mapping and functional genomics. Frontiers in Genetics 9 (2018): 603.

‘Genes affect’ this and that

It’s been a long time since I wrote a post like this one, but once upon a time this blog consisted almost entirely of griping about missing references in news articles about science. Partly, it was a way to add the references to the news articles, because if a blog post linked to an article in, say, DN, they would respond with a link from the article. It feels like those were more innocent times, when newspapers thought it reasonable to automatically link to blogs that wrote about them.

Anyway. It starts like this: a friend sends me a link to this article on the SVT Nyheter Uppsala website: ‘Dina gener påverkar hur ditt fett lägger sig’ (‘Your genes affect how your fat settles’). It is a short news item about a new scientific paper from researchers in Uppsala. It even has a little video. It says:

A new study from Uppsala University shows that your genes affect where fat ends up on your body.

360,000 people took part in the study, and the study shows that it is mainly women who are affected by their genetics.

‘We know that women and men tend to store fat in different parts of the body. Women more readily store fat on the hips and legs, while men to a greater extent store fat on the abdomen’, says Mathias Rask-Andersen at the department of genetics at Uppsala University.

And not much more. My friend writes, roughly: but surely we already know this, that there can be some genetic effect on how fat is distributed over the body? There must be something more to the research that has got lost in the news article. And of course there is.

Now we need to find the original paper. There is no reference in the news article, but at least they have helpfully named one of the researchers, so we have a little more to go on than ‘someone connected to Uppsala’. I start by searching for Mathias Rask-Andersen. First I check his Google Scholar page, but the paper isn’t there yet. Brand new papers usually take a while to make it into the literature databases. Then his and the research group’s pages at Uppsala University, but of course they haven’t been updated yet either. Since the news article mentioned 360,000 individuals, we can guess that they probably used data from UK Biobank, so we can look at their publications page too. It lists an almost ridiculous number of papers already published in 2019, but not this one.

Only then does it occur to me to check Uppsala University’s press page for the full press release. Bingo. It contains a reference to the paper in Nature Communications. Here it is: Rask-Andersen et al. (2019) Genome-wide association study of body fat distribution identifies adiposity loci and sex-specific genetic effects.

‘Genome-wide association study’, it says. So this is an association study, that is, a study that tries to link fat distribution to particular genetic variants. You DNA test a lot of people and see which genetic variants go together with carrying fat in a particular place on the body. (Here is a very old blog post that tries to describe this.)
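
To make ‘seeing which variants go together with the trait’ a little more concrete, here is a minimal sketch with made-up data (my illustration, not the method of the Rask-Andersen paper): for each variant, fit a simple model of the trait on the genotype and ask whether the estimated effect is bigger than chance would produce.

    import numpy as np

    rng = np.random.default_rng(1)

    n = 1000
    genotype = rng.binomial(2, 0.3, size=n)      # 0, 1 or 2 copies of one allele
    trait = 0.2 * genotype + rng.normal(size=n)  # trait with a small genetic effect

    # Regression of trait on genotype: effect estimate, standard error,
    # and a rough test statistic for whether the association could be chance
    X = np.column_stack([np.ones(n), genotype])
    beta, _, _, _ = np.linalg.lstsq(X, trait, rcond=None)
    residuals = trait - X @ beta
    sigma2 = residuals @ residuals / (n - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    print(beta[1], se, beta[1] / se)

A real genome-wide association study repeats something like this for millions of variants, with covariates and corrections for population structure and multiple testing.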

So this is not research that tries to test whether fat distribution has a genetic basis or not, but research that, given that fat distribution on the body has some genetic basis, tries to find out which genes and genetic variants are involved. In other words, the news article has got what the study is about completely backwards, and this is what it usually looks like when association studies are presented in the media. They are portrayed as tests of whether ‘genes affect’ something or not. How come?

I suspect that association studies are too hard to describe briefly in a press release. It is easier to say that the study shows ‘that genes affect’ something than that it ‘tries to find the particular genetic variants that affect it’, and so that is what the researcher or the university press officer writes in the press release. Then the reporter cuts the press release down to a manageable length, and most of the details disappear, along with the reference to the original paper.

And that is how news articles about new association studies end up giving completely misleading descriptions of what they are about.

‘All domestic animals and plants are genetically modified already’

There is an argument, made by people who, like yours truly, support (or at least are not in principle against) applications of genetic modification in plant and animal breeding, that ‘all domestic animals and plants are genetically modified already’ because of domestication and breeding. See for example Food Evolution or this little video from Sonal Katyal.

This is true in one sense, but it’s not very helpful, for two reasons.

First, it makes it look as if the safety and efficacy of genome editing turn on a definition. I don’t know what the people who pull out this idea in discussion expect the response to be: that the people who worry about genetic modification as some kind of threat will go ‘oh, that was a clever turn of phrase; I guess it’s no problem then’? Again, I think the honest thing to say is that genetic modification (be it mutagenesis, transgenics, or genome editing) is a different thing than classic breeding, but that it’s still okay.

Second, I also fear that it promotes the misunderstanding that selective breeding is somehow outdated and unimportant. The video mentioned above is an example (and I don’t mean to bash the video; I think what’s said in it is true, but it’s not the whole story). Yes, genome editing allows us to introduce certain genetic changes precisely and independently of the surrounding region, as opposed to introducing a certain variant by crossing, when other undesired genetic variants will follow along. However, we need to know what to edit and what to put instead, so until knowledge of causative variants is near perfect (spoiler: never), selection will still play a role.

Genome editing in EU law

The European Court of Justice recently produced a judgement (Case C-528/16) which means that genome-edited organisms will be regarded as genetically modified and subject to EU directive 2001/18 on genetically modified organisms. This is bad news for anyone who wants to use genome editing for plant or animal breeding in Europe.

The judgement is in legalese, but I actually found it more clear and readable than the press coverage about it. The court does not seem conceptually confused: it knows what genome editing is, and makes reasonable distinctions. It’s just that it’s bound by the 2001 directive, and if we want genome editing to be useful, we need something better than that.

First, let’s talk about what ‘genetic modification’, ‘transgenics’, ‘mutagenesis’, and ‘genome editing’ are. This is how I understand the terms.

  • A genetically modified organism, the directive says, is ‘an organism, with the exception of human beings, in which the genetic material has been altered in a way that does not occur naturally by mating and/or natural recombination’. The directive goes on to clarify with some examples that count as genetic modification, and some that don’t, including in vitro fertilisation as well as bacterial and viral processes of horizontal gene transfer. As far as I can tell, this is sensible. The definition isn’t unassailable, of course, because a lot hinges on what counts as a natural process, but no definition in biology ever is.
  • Transgenics are organisms that have had new DNA sequences introduced into them, for example from a different species. As such, their DNA is different in a way that is very unlikely to happen by spontaneous mutation. For technical reasons, this kind of genetic modification, even if it may seem more dramatic than changing a few base pairs, is easier to achieve than genome editing. This is the old, ‘classic’ genetic modification that the directive was written to deal with.
  • Mutagenesis is when you do something to an organism to change the rate of spontaneous mutation, e.g. treat it with some mutagenic chemical or radiation. With mutagenesis, you don’t control what change will happen (but you may be able to affect the probability of causing a certain type of mutation, because mutagens have different properties).
  • Finally, genome editing means changing one genetic variant into another. These are changes that could probably be introduced by mutagenesis or crossing, but they can be made more quickly and precisely with editing techniques. This is what people often envisage when we talk about using Crispr/Cas9 in breeding or medicine.

On these definitions, Crispr/Cas9 (and related systems) can be used for transgenics, mutagenesis, or editing. You could use it for mutagenesis: generate targeted cuts and let the cell repair them by non-homologous end joining, which introduces deletions or rearrangements. This is how Crispr/Cas9 is used in a lot of molecular biology research, to knock out genes by directing disruptive mutations to them. You could also use it to make transgenics by introducing a foreign DNA sequence; for example, this is what happens when Crispr/Cas9 is used to create artificial gene drive systems. Or you could edit by replacing alleles with other naturally occurring alleles.

Looking back at what is in the directive, it defines genetically modified organisms, and then it goes on to make a few exceptions — means of genetic modification that are exempted from the directive because they’re considered safe and accepted. The top one is mutagenesis, which was already old hat in 2001. And that takes us to the main question that the judgment answers: Should genome editing methods be slotted in there, with chemical and radiation mutagenesis, which are exempt from the directive even if they’re actually a kind of genetic modification, or should they be subject to the full regulatory weight of the directive, like transgenics? Unfortunately, the court found the latter. They write:

[T]he precautionary principle was taken into account in the drafting of the directive and must also be taken into account in its implementation. … In those circumstances, Article 3(1) of Directive 2001/18, read in conjunction with point 1 of Annex I B to that directive [these passages are where the exemption happens — MJ], cannot be interpreted as excluding, from the scope of the directive, organisms obtained by means of new techniques/methods of mutagenesis which have appeared or have been mostly developed since Directive 2001/18 was adopted. Such an interpretation would fail to have regard to the intention of the EU legislature … to exclude from the scope of the directive only organisms obtained by means of techniques/methods which have conventionally been used in a number of applications and have a long safety record.

My opinion is this: Crispr/Cas9, whether used for genome editing, targeted mutagenesis, or even to make transgenics is genetic modification, but genetic modification can be just as safe as old mutagenesis methods. So what do we need instead of the current genetic modification directive?

First, one could include genome-edited and targeted-mutagenesis products among the exclusions to the directive. There is no reason to think they’d be any less safe than varieties developed by traditional mutagenesis or by crossing. In fact, the new techniques will give you fewer unexpected variants as side effects. However, EU law does not seem to acknowledge that kind of argument. There would need to be a new law that isn’t based on the precautionary principle.

Second, one could reform the entire directive to something less draconian. It’s not obvious how to do that, though. On the one hand, the directive is based on perceived risks to human health and the environment of genetic modification itself that have little basis in fact. Maybe starting from the precautionary principle was a reasonable position when the directive was written, but now we know that transgenic organisms in themselves are not a threat to human health, and there is no reason to demand each product be individually evaluated to establish that. On the other hand, one can see the need for some risk assessment of transgenic systems. Say for instance that synthetic gene drives become a reality. We really would want to see some kind of environmental risk assessment before they were used outside of the lab.

Skype a scientist

Skype a scientist is a programme that connects classrooms to scientists for question and answer sessions. I have done it a few times now, and from the scientist’s perspective, it has a lot of reward for not that much work.

It works like this: the Skype a scientist team makes matches based on what kind of scientist the teacher asks for; the scientist writes a letter (or it could be a video or something else) about what they work on; the students prepare questions; and the scientist tries to answer.

One thing I like about the format is how it is driven by student questions, turning the conversation to things students actually want to know, and not just what the scientist (me) believes there’s a need to ‘explain’ (scare quotes used to imply scepticism). Of course, the framing as a classroom exercise, the priming by the letter, and the fact that the questions pass through the teacher influence the content, but still. I also like how some students ask questions that I suspect are not entirely serious, but that still turn out to be interesting. Something I like less is how each session is still kind of a monologue with little interactivity.

I think it has gone reasonably well. I hope my answers will get more polished with time. Another thing I need to get better at is extracting useful feedback from the teachers to improve what I do. They’ve all said positive things (of course, how else could they respond?), but I’m sure there are all kinds of things I could improve.

Here, enjoy some of the questions I’ve gotten! I won’t answer them here; you will have to sign up your classroom for that. I have organised them into categories that I think reflect the most common types of questions.

Pig and chicken genetics

What are some mutations in pigs that you see?

Have you ever encountered a chicken that had something about it that surprised you?

What kinds of chickens live the longest?

What is significant about the DNA of pigs and chickens?

What is the most pervasive genetic disorders in pigs and chickens?

Which genes have the highest demand from industry?

Evolution

If certain traits are dominant and humans have been around for 6 million years, how do we not have all those dominant traits?

What came first, the chicken or the egg?

Does the DNA of chickens and pigs have any similarity to humans — if so, what percent is common?

When were pigs domesticated and what were they domesticated from?

Hard questions

Are science and religion compatible?

Can genetic engineering lead to the creation of a super-race?

Do you think that, if extra-terrestrial life was found, a breeding program between humans and aliens could exist to create hybrids?

Do you think you could genetically modify pigs to create the perfect bacon?

Can you genetically modify an organism to make it more clever?

Will we be able to genetically modify humans with features from other organisms such as gills, not just single gene traits?

What do you think is the next big genetically modified breakthrough on the horizon?

How far away are we from being able to clone a human (like Dolly)?

Have you researched genes designed to protect chickens or pigs from super bacteria resistant to antibiotics?

Personal stuff

Do you ever get to dissect anything?

What is the most exciting part of your job?

What is your favourite complex trait?

Have you always been interested in science?

What makes your job so important that you are willing to move countries?

Why did you choose to study genetics?

Do you prefer group or solo work?

Are you under intense pressure in your job?

What are you looking forward to working on in the future?

The practice of science

What materials do you use in your research?

Who decides what you research?

How do you use computers to research genes and DNA?

What kind of technology/equipment do you use?

Why do you research pigs and chickens?