Paper: ‘Removal of alleles by genome editing (RAGE) against deleterious load’

Our new paper is about using predicted deleterious variants in animal breeding. We use simulation to look at the potential to improve livestock fitness by either selecting on detected deleterious variants or removing deleterious alleles by genome editing.

Summary

Deleterious variants occur when errors in DNA replication disrupt the function of a gene. Such errors are frequent enough that all organisms carry mildly deleterious variants. Geneticists describe this as a deleterious load, which causes organisms to be less healthy and fit than they could have been if these errors didn’t happen. Load is especially pertinent to livestock populations, because of their relatively small population sizes and inbreeding.

Historically, it has not been possible to observe deleterious variants directly, but as genome sequencing becomes cheaper and new bioinformatic methods are being developed, we can now sequence livestock and detect variants that are likely to be deleterious.

In this study, we used computer simulation to see how future breeding strategies involving selection or genome editing could be used to reduce deleterious load. We tested selection against deleterious load and a genome editing strategy we call RAGE (Removal of Alleles by Genome Editing) in simulated livestock populations to see how much they improved fitness. The simulations suggest that selecting on deleterious variants identified from genome sequencing may help improve fitness of livestock populations, and that genome editing to remove deleterious variants could improve them even more.

For these strategies to be effective, it is important that detection of deleterious variants is accurate, and genome editing of more than one variant per animal would need to become possible without damaging side-effects. Future research on how to measure deleterious load in large sequence datasets from livestock animals, and how to perform genome editing safely and effectively will be important.

Figure 2 from the paper, showing the average fitness of simulated populations (y-axis) over the generations of breeding (x-axis) with different types of future breeding against deleterious variants.

‘RAGE against …’, what’s with the acronym?

We are very happy with the acronym. In addition to making at least two pop culture references, it’s also a nod to Promotion of Alleles by Genome Editing (PAGE) from Jenko et al. (2015). I like that the acronyms, both PAGE and RAGE, emphasise that we’re dealing with alleles that already exist within a population. We propose using genome editing as a way to promote alleles we like and remove alleles we don’t like in addition to classical breeding. The fancy new biotechnology does not replace selection, but supplements it.

Do you really think one should genome edit farm animals?

Yes, if all the bio- and reproductive technology can be made to work! Currently, genome editing methods like CRISPR/Cas9 require many attempts to get precise editing to the desired allele at one place, and they don’t scale to multiple edits in the same animal … Not yet. But lots of smart people are competing to make it happen.

Genome editing of farm animals would also need a lot of reproductive technology that currently isn’t really there (but probably more so for cattle than for other species). Again, lots of clever people work on it.

If it can be made to work, genome editing could be a useful breeding method.

What about the ethics of genome editing?

We don’t discuss ethics much in the paper. In one simple sense, that is because ethics isn’t our expertise. I also think a discussion of the ethics of RAGE, much like an informed discussion about the economics of it, requires empirical knowledge that we don’t have yet.

I am not of the opinion that there is a dignity or integrity to the genome that would prohibit genome editing as a rule. So the question is not ‘genome editing or not’, but ‘under what circumstances and for what applications is genome editing useful and justified?’ and ‘are the benefits of RAGE, PAGE, or whatever -GE, enough to outweigh the risks and costs?’. There is room for uncertainty and disagreement about those questions.

For a good discussion of the ethics of genome editing that is likely to raise more questions than it answers, see Eriksson et al. (2018). Among other things, they make the point that advanced reproductive technologies are a precondition for genome editing, but kind of slip out of the discussion sometimes. I think the most pressing question, both from the ethical and the economic perspective, is whether the benefits of genome editing are enough to justify widespread use of reproductive technologies (in species where that isn’t already commonplace). I also like how they make the point that one needs to look at the specific applications of genome editing, in context, when evaluating them.

The simulation looks nifty! I want to simulate breeding programs like that!

You can! The simulations used the quantitative genetic simulation R package AlphaSimR with some modifications for simulating the fitness traits. There is code with the paper. Here are also the slides from when I talked about the paper at the Edinburgh R user group.

You make a ton of assumptions!

We do. Some of them are extremely uncontroversial (the basic framework of segregation and recombination during inheritance), some we can get some idea about by looking at the population genetics literature (we’ve taken inspiration from estimates of deleterious mutation rates and effect distributions estimated from humans), and some we don’t have much knowledge about at all (how does load of deleterious variants relate to the production, reproduction and health traits that are important to breeding? The only way to know is to measure). If you read the paper, don’t skip that part of the Discussion.

Would this work in plants?

Yes, probably! Plant breeding programs are a bit different, so I guess one should simulate them to really know. RAGE would be a part of the ‘Breeding 4.0’ logic of Wallace, Rodgers-Melnick & Buckler (2018). In many ways the problems with plants are smaller, with less unknown reproductive technology that needs to be invented first, and an easier time field testing edited individuals.

Literature

Johnsson M, Gaynor RC, Jenko J, Gorjanc G, de Koning D-J, Hickey JM. (2019) Removal of alleles by genome editing (RAGE) against deleterious load. Genetics Selection Evolution.

Jenko J, Gorjanc G, Cleveland MA, Varshney RK, Whitelaw CBA, Woolliams JA, Hickey JM. (2015). Potential of promotion of alleles by genome editing to improve quantitative traits in livestock breeding programs. Genetics Selection Evolution.

Eriksson, S., Jonas, E., Rydhmer, L., & Röcklinsberg, H. (2018). Invited review: Breeding and ethical perspectives on genetically modified and genome edited cattle. Journal of dairy science, 101(1), 1-17.

Wallace, J. G., Rodgers-Melnick, E., & Buckler, E. S. (2018). On the road to Breeding 4.0: unraveling the good, the bad, and the boring of crop quantitative genomics. Annual review of genetics, 52, 421-444.

Greek in biology

This is a fun essay about biological terms borrowed from or inspired by Greek, written by a group of (I presume) Greek speakers: Iliopoulos & al (2019), Hypothesis, analysis and synthesis, it’s all Greek to me.

We hope that this contribution will encourage scientists to think about the terminology used in modern science, technology and medicine (Wulff, 2004), and to be more careful when seeking to introduce new words and phrases into our vocabulary.

First, I like how they celebrate the value of knowing more than one language. I feel like bi- and multilingualism in science is most often discussed as a problem: Either we non-native speakers have problems catching up with the native speakers, or we’re burdening them with our poor writing. Here, the authors seem to argue that knowing another language (Greek) helps both your understanding of scientific language, and the style and grace with which you use it.

I think this is the central argument:

Non-Greek speakers will, we are sure, be surprised by the richness and structure of the Greek language, despite its often inept naturalization in English or other languages, and as a result be better able to understand their own areas of science (Snell, 1960; Montgomery, 2004). Our favorite example is the word ‘analysis’: everyone uses it, but few fully understand it. ‘Lysis’ means ‘breaking up’, while ‘ana-‘ means ‘from bottom to top’ but also ‘again/repetitively’: the subtle yet ingenious latter meaning of the term implies that if you break up something once, you might not know how it works; however, if you break up something twice, you must have reconstructed it, so you must understand the inner workings of the system.

I’m sure it is true that some of the use of Greek-inspired terms in scientific English is inept, and would benefit from checking by someone who knows Greek. However, this passage invites two objections.

First, why would anyone think that the Greek language has less richness and structure than English? Then again, if I learned Greek, it is possible that I would find the richness to be even greater than I expected.

Second, does knowing Greek mean that you have a deeper appreciation for the nuances of a concept like analysis? Maybe ‘analysis’ as understood without those double meanings of the ‘ana-‘ prefix is less exciting, but if it is true that most people don’t know about this subtlety, this can’t be what they mean by ‘analysis’. So, if that etymological understanding isn’t part of how most people use the word, do we really understand it better by learning that story? It sounds like they think that the word is supposed to have a true meaning separate from how it is used, and I’m not sure that is helpful.

So what are some less inept uses of Greek? They like the term ‘epigenomics’, writing that it is being ‘introduced in a thoughtful and meaningful way’. To me, this seems like an unfortunate example, because I can think of few terms in genomics that cause more confusion. ‘Epigenomics’ is the upgraded version of ‘epigenetics’, a word which was, unfortunately, coined at least twice with different meanings. And now, epigenetics is this two-headed beast that feeds on geneticists’ energy as they try to understand what on earth other geneticists are saying.

First, Conrad Waddington glued ‘epigenesis’ and ‘genetics’ together to define epigenetics as ‘the branch of biology that studies the causal interactions between genes and their products which bring the phenotype into being’ (Waddington 1942, quoted in Deans & Maggert 2015). That is, it is what we today might call developmental genetics. Later, David Nanney connected it to gene regulatory mechanisms that are stable through cell division, and we get the modern view of epigenetics as a layer of regulatory mechanisms on top of the DNA sequence. I would be interested to know which of these two intertwined meanings it is that the authors like.

Judging by the affiliations of the authors, the classification of the paper (by the way, how is this ‘computational and systems biology, genetics and genomics’, eLife?), and the citations (16 of 27 to medicine and science journals, a lot of which seem to be similar opinion pieces), this feels like a missed opportunity to connect with language scholarship. I’m no better myself–I’m not a scholar of language, and I haven’t tried to invite one to co-write this blog post with me … But there must be scholarship and expertise outside biomedicine relevant to this topic, and language sources richer than an etymological online dictionary?

Finally, the table of new Greek-inspired terms that ‘might be useful’ is a fun thought exercise, and if it serves as inspiration for someone to have a eureka moment about a concept they need to investigate, great (‘… but what is a katagenome, really? Oh, maybe …’). But I think that telling scientists to coin new words is inviting catastrophe. I’d much rather take the lesson that we need fewer new tortured terms borrowed from Greek, rather than more of them. It’s as if I, driven by the nuance and richness I recognise in my own first language, set out to coin övergenome, undergenome and pågenome.

Kauai field trip 2018

Let’s keep the tradition of delayed travel posts going!

In August last year, I joined Dom Wright, Rie Henriksen, and Robin Abbey-Lee, as part of Dom’s FERALGEN project, on their field work on Kauai. I did some of my dissertation work on the Kauai feral chickens, but I never saw them live until now. Our collaborator Eben Gering was also on the islands, but the closest we got to each other was Skyping between the islands. It all went smoothly until the end of the trip, when a hurricane came uncomfortably close to the island for a while. Here are some pictures. In time, I promise to blog about the actual research too.

Look! Chickens by the sea, chickens on parking lots, a sign telling people not to feed the chickens on a sidewalk in central Kapaa! Lots of chickens.

I’m not kidding: lots of chickens.

Links

An old Nature News feature from a previous field trip (without me)

My post about our 2016 paper on Kauai feralisation genomics

‘Any distinction in principle between qualitative and quantitative characters disappeared long ago’

Any distinction in principle between qualitative and quantitative characters disappeared long ago, although in the early days of Mendelism it was often conjectured that they might be inherited according to fundamentally different laws.

If it is still convenient to call some characters qualitative and others quantitative, it is only to denote that the former naturally have a discontinuous and the latter a continuous distribution, or that the former are not easily measured on a familiar metrical scale. Colors are an example. Differences between colors can be measured in terms of length of light waves, hue, brilliance etc., but most of us find it difficult to compare those measurements with our own visual impressions.

Most quantitative characters are affected by many pairs of genes and also importantly by environmental variations. It is rarely possible to identify the pertinent genes in a Mendelian way or to map the chromosomal position of any of them. Fortunately this inability to identify and describe the genes individually is almost no handicap to the breeder of economic plants or animals. What he would actually do if he knew the details about all the genes which affect a quantitative character in that population differs little from what he will do if he merely knows how heritable it is and whether much of the hereditary variance comes from dominance or overdominance, and from epistatic interactions between the genes.

(That last part might not always be true anymore, but it still remained on point for more than half the time that genetics as a discipline has existed.)

Jay L Lush (1949) Heritability of quantitative characters in farm animals

Using R: plotting the genome on a line

Imagine you want to make a Manhattan-style plot or anything else where you want a series of intervals laid out on one axis after one another. If it’s actually a Manhattan plot you may have a friendly R package that does it for you, but here is how to cobble the plot together ourselves with ggplot2.

We start by making some fake data. Here, we have three contigs (this could be your chromosomes, your genomic intervals or whatever) divided into three, two and one windows, respectively. Each window has a value that we’ll put on the y-axis.

library(dplyr)
library(ggplot2)

## One row per window; tibble() replaces the deprecated data_frame()
data <- tibble(contig = c("a", "a", "a", "b", "b", "c"),
               start = c(0, 500, 1000, 0, 500, 0),
               end = c(500, 1000, 1500, 500, 1000, 200),
               value = c(0.5, 0.2, 0.4, 0.5, 0.3, 0.1))

We will need to know how long each contig is. In this case, if we assume that the windows cover the whole thing, we can get this from the data. If not, say if the windows don’t go up to the end of the chromosome, we will have to get this data from elsewhere (often some genome assembly metadata). This is also where we can decide in what order we want the contigs.

contig_lengths <- summarise(group_by(data, contig), length = max(end))
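
If the lengths come from elsewhere, the same table can be built by hand, which is also a natural place to set the plotting order. Here is a minimal sketch in base R; the table `assembly_lengths`, its numbers, and the vector `contig_order` are made up for illustration (in practice the lengths might come from something like a FASTA index file):

```r
## Hypothetical contig lengths, as if read from genome assembly metadata,
## listed in whatever order the file happened to have
assembly_lengths <- data.frame(contig = c("c", "a", "b"),
                               length = c(250, 1600, 1000),
                               stringsAsFactors = FALSE)

## The order we want the contigs to appear in along the x-axis
contig_order <- c("a", "b", "c")

## Reorder the rows of the table to follow the chosen order
contig_lengths <- assembly_lengths[match(contig_order,
                                         assembly_lengths$contig), ]
rownames(contig_lengths) <- NULL
```

The row order of `contig_lengths` is what decides the layout later on, because the offsets are accumulated row by row.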

Now, we need to transform the coordinates on each contig to coordinates on our new axis, where we lay the contigs after one another. What we need to do is to add an offset to each point, where the offset is the sum of the lengths of the contigs we’ve laid down before this one. We make a function that takes three arguments: two vectors containing the contig of each point and the position of each point, and also the table of lengths we just made.

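flatten_coordinates <- function(contig, coord, contig_lengths) {
    coord_flat <- coord
    ## Running total of the lengths of the contigs laid down so far
    offset <- 0

    ## seq_len() also behaves when the table of lengths is empty
    for (contig_ix in seq_len(nrow(contig_lengths))) {
        ## Shift every point on this contig by the current offset
        on_contig <- contig == contig_lengths$contig[contig_ix]
        coord_flat[on_contig] <- coord[on_contig] + offset
        offset <- offset + contig_lengths$length[contig_ix]
    }

    coord_flat
}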

Now, we use this to transform the start and end of each window. We also transform the vector of the length of the contigs, so we can use it to add vertical lines between the contigs.

data$start_flat <- flatten_coordinates(data$contig,
                                       data$start,
                                       contig_lengths)
data$end_flat <- flatten_coordinates(data$contig,
                                     data$end,
                                     contig_lengths)
contig_lengths$length_flat <- flatten_coordinates(contig_lengths$contig,
                                                  contig_lengths$length,
                                                  contig_lengths)
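
As a sanity check on the offset logic, the same transformation can be written without a loop. This is a sketch of an alternative in base R, not a replacement for the function above; `flatten_vectorised` and `offsets` are names made up for this example, and the contig lengths are typed in by hand to keep the block self-contained.

```r
## Lengths of the three fake contigs, in plotting order
contig_lengths <- data.frame(contig = c("a", "b", "c"),
                             length = c(1500, 1000, 200),
                             stringsAsFactors = FALSE)

## Offset of each contig: total length of the contigs before it
offsets <- c(0, cumsum(contig_lengths$length)[-nrow(contig_lengths)])
names(offsets) <- contig_lengths$contig

## Look up the offset of each point by contig name and add it
flatten_vectorised <- function(contig, coord, offsets) {
    coord + unname(offsets[as.character(contig)])
}

## Window starts from the fake data
contig <- c("a", "a", "a", "b", "b", "c")
start <- c(0, 500, 1000, 0, 500, 0)

start_flat <- flatten_vectorised(contig, start, offsets)
## start_flat is 0, 500, 1000, 1500, 2000, 2500
```

The windows on contig b are shifted by 1500 (the length of a), and the window on contig c by 2500 (the lengths of a and b together), which is what the loop version should give as well.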

It would be nice to label the x-axis with contig names. One way to do this is to take the coordinates we just made for the vertical lines, add a zero, and shift them one position, like so:

axis_coord <- c(0, contig_lengths$length_flat[-nrow(contig_lengths)])
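
To see what the shift does, and as one possible variation, here is a small self-contained sketch with the fake contig lengths typed in by hand. It also computes label positions at the middle of each contig rather than at its left edge; `axis_mid` is a name made up for this example, and whether centred labels look better is a matter of taste.

```r
## Contig lengths from the fake data, typed in by hand for this sketch
contig_lengths <- data.frame(contig = c("a", "b", "c"),
                             length = c(1500, 1000, 200),
                             stringsAsFactors = FALSE)

## End of each contig on the flattened axis (the vertical lines)
length_flat <- cumsum(contig_lengths$length)

## Start of each contig: put a zero in front and drop the last end point
axis_coord <- c(0, length_flat[-nrow(contig_lengths)])

## Hypothetical alternative: the midpoint of each contig
axis_mid <- axis_coord + contig_lengths$length / 2
```

With these numbers, the contigs start at 0, 1500 and 2500, and their midpoints are at 750, 2000 and 2600. Passing `axis_mid` instead of `axis_coord` as the breaks of the x-axis scale should centre each label under its contig.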

Now it’s time to plot! We add one layer of points for the values on the y-axis, where each point is centered on the middle of the window, followed by a layer of vertical lines at the borders between contigs. Finally, we add our custom x-axis, and also some window dressing.

plot_genome <- ggplot() +
    geom_point(aes(x = (start_flat + end_flat)/2,
                   y = value),
               data = data) +
    geom_vline(aes(xintercept = length_flat),
               data = contig_lengths) +
    scale_x_continuous(breaks = axis_coord,
                       labels = contig_lengths$contig,
                       limits = c(0, max(contig_lengths$length_flat))) +
    xlab("Contig") + ylim(0, 1) + theme_bw()

And this is what we get:

I’m sure your plot will look more impressive, but you get the idea.

Neutral citation again

Here is a piece of advice about citation:

Rule 4: Cite transparently, not neutrally

Citing, even in accordance with content, requires context. This is especially important when it happens as part of the article’s argument. Not all citations are a part of an article’s argument. Citations to data, resources, materials, and established methods require less, if any, context. As part of the argument, however, the mere inclusion of a citation, even when in the right spot, does not convey the value of the reference and, accordingly, the rationale for including it. In a recent editorial, the Nature Genetics editors argued against so-called neutral citation. This citation practice, they argue, appears neutral or procedural yet lacks required displays of context of the cited source or rationale for including [11]. Rather, citations should mention assessments of value, worth, relevance, or significance in the context of whether findings support or oppose reported data or conclusions.

This flows from the realisation that citations are political, even though that term is rarely used in this context. Researchers can use them to accurately represent, inflate, or deflate contributions, based on (1) whether they are included and (2) whether their contributions are qualified. Context or rationale can be qualified by using the right verbs. The contribution of a specific reference can be inflated or deflated through the absence of or use of the wrong qualifying term (‘the authors suggest’ versus ‘the authors establish’; ‘this excellent study shows’ versus ‘this pilot study shows’). If intentional, it is a form of deception, rewriting the content of scientific canon. If unintentional, it is the result of sloppy writing. Ask yourself why you are citing prior work and which value you are attributing to it, and whether the answers to these questions are accessible to your readers.

When Nature Genetics had an editorial condemning neutral citation, I took it to be a demand that authors show that they’ve read and thought about the papers they cite.

This piece of advice seems to ask something different: that authors be honest about their opinions about a work they cite. That is a radical suggestion, because if people were, I believe readers would get offended. That is, if the paper wasn’t held back by offended peer reviewers before it reached any readers. Honestly, as a reviewer, I would probably complain if I saw a value-laden and vacuous statement like ‘this excellent study’ in front of a citation. It would seem to me a rude attempt to tell the reader what to think.

So how are we to cite a study? On the one hand, we can’t just drop the citation in a sentence, but are obliged to ‘mention assessments of value, worth, relevance or significance’. On the other hand, we must make sure that they are ‘qualified by using the right verbs’. And if citation is political, then whether a study ‘suggests’ or ‘establishes’ conclusions is also political.

Disclaimer: I don’t like the 10 simple rules format at all. I find that they belong on someone’s personal blog and not in a scientific journal, given that their evidence for their assertions usually amounts to nothing more than my own meandering experience … This one is an exception, because Bart Penders does research on how scientists collaborate and communicate (even if he cites no research in this particular part of the text).

Penders B (2018) Ten simple rules for responsible referencing. PLoS Computational Biology