Sequencing-based methods called Dart

Some years ago James Hadfield at Enseqlopedia made a spreadsheet of acronyms for sequencing-based methods with some 50 rows. I can only imagine how long it would be today.

The overloading of acronyms is becoming a bit ridiculous. I recently saw a paper about DART-seq, a method for detecting N6-methyladenosine in RNA (Meyer 2019), and thought, ”wait a minute, isn’t DART-seq a reduced representation genotyping method?” It is, only stylised as DArTseq (seriously). Apparently, it’s also a droplet RNA-sequencing method (Saikia et al. 2018).

What are these methods doing?

  • DArT, diversity array technology, is a way to enrich for a part of a genome. It was originally developed with array technology in mind (Jaccoud et al. 2001). They take some DNA, cut it with restriction enzymes, add adapters and amplify regions close to the cut. Then they clone the resulting DNA, and then attach it to a slide, and that gives a custom microarray of anonymous fragments from the genome. For the Dart-seq version, it seems they make a sequencing library instead of going on to cloning (Ren et al. 2015). It falls in the same family as GBS and RAD-seq methods.
  • DART-seq, droplet-assisted RNA targeting, builds on Drop-seq, where they put single cells and beads that carry primers into the same oil droplet. As cells lyse, the RNA sticks to the primer. The beads also have a barcode so they can be identified in sequencing. Then they break the emulsion, reverse transcribe the RNA attached to beads, amplify and sequence. That is cool. However, because they capture the RNA with oligo-dT primers, they sequence from the 3′ end of the RNA. The Dart method adds primers to the beads, so they can target some specific RNAs and amplify more of them. It’s the super-high-tech version of gene-specific primers for reverse transcription..
  • DART-seq, deamination adjacent to RNA modification targets, uses a synthetic fusion protein that combines APOBEC1, which deaminates cytidines, with a protein domain from YTHDF2 which binds N6-methyladenosine. If an RNA has N6-methyladenosine, cytidines close to it will be deaminated to uracil, and this modification is common at bases right next to a cytidine. After RNA-sequencing, this will look like Cs next to As turning into Ts. It’s a little bit like bisulfite sequencing of methylated DNA, but with RNA.

On the one hand: Don’t people search the internet before they name their methods, or do they not care? On the other hand, realistically, the genotyping method Dart and the single cell RNA-seq method Dart are unlikely to show up in the same work. If you can call your groups ”treatment” and ”control” for the purpose of a paper, maybe you can call your method ”Dart”, and no-one gets too confused.

Using R: Animal model with simulated data

Last week’s post just happened to use MCMCglmm as an example of an R package that can get confused by tibble-style data frames. To make that example, I simulated some pedigree and trait data. Just for fun, let’s look at the simulation code, and use MCMCglmm and AnimalINLA to get heritability estimates.

First, here is some AlphaSimR code that creates a small random mating population, and collects trait and pedigree:


## Founder population
FOUNDERPOP <- runMacs(nInd = 100,
                      nChr = 20,
                      inbred = FALSE,
                      species = "GENERIC")

## Simulation parameters 
SIMPARAM$addTraitA(nQtlPerChr = 100,
                   mean = 100,
                   var = 10)
SIMPARAM$setVarE(h2 = 0.3)
## Random mating for 9 more generations
generations <- vector(mode = "list", length = 10) 
generations[[1]] <- newPop(FOUNDERPOP,
                           simParam = SIMPARAM)

for (gen in 2:10) {

    generations[[gen]] <- randCross(generations[[gen - 1]],
                                    nCrosses = 10,
                                    nProgeny = 10,
                                    simParam = SIMPARAM)


## Put them all together
combined <- Reduce(c, generations)

## Extract phentoypes
pheno <- data.frame(animal = combined@id,
                    pheno = combined@pheno[,1])

## Extract pedigree
ped <- data.frame(id = combined@id,
                  dam = combined@mother,
                  sire =combined@father)
ped$dam[ped$dam == 0] <- NA
ped$sire[ped$sire == 0] <- NA

## Write out the files
          file = "sim_pheno.csv",
          row.names = FALSE,
          quote = FALSE)

          file = "sim_ped.csv",
          row.names = FALSE,
          quote = FALSE)

In turn, we:

  1. Set up a founder population with a AlphaSimR’s generic livestock-like population history, and 20 chromosomes.
  2. Choose simulation parameters: we have an organism with separate sexes, a quantitative trait with an additive polygenic architecture, and we want an environmental variance to give us a heritability of 0.3.
  3. We store away the founders as the first generation, then run a loop to give us nine additional generations of random mating.
  4. Combine the resulting generations into one population.
  5. Extract phenotypes and pedigree into their own data frames.
  6. Optionally, save the latter data frames to files (for the last post).

Now that we have some data, we can fit a quantitative genetic pedigree model (”animal model”) to estimate genetic parameters. We’re going to try two methods to fit it: Markov Chain Monte Carlo and (the unfortunately named) Integrated Nested Laplace Approximation. MCMC explores the posterior distribution by sampling; I’m not sure where I heard it described as ”exploring a mountain by random teleportation”. INLA makes approximations to the posterior that can be integrated numerically; I guess it’s more like building a sculpture of the mountain.

First, a Gaussian animal model in MCMCglmm:


## Gamma priors for variances
prior_gamma <- list(R = list(V = 1, nu = 1),
                    G = list(G1 = list(V = 1, nu = 1)))
## Fit the model
model_mcmc  <- MCMCglmm(scaled ~ 1,
                        random = ~ animal,
                        family = "gaussian",
                        prior = prior_gamma,
                        pedigree = ped,
                        data = pheno,
                        nitt = 100000,
                        burnin = 10000,
                        thin = 10)

## Calculate heritability for heritability from variance components
h2_mcmc_object  <- model_mcmc$VCV[, "animal"] /
    (model_mcmc$VCV[, "animal"] + model_mcmc$VCV[, "units"])
## Summarise results from that posterior
h2_mcmc  <- data.frame(mean = mean(h2_mcmc_object),
                       lower = quantile(h2_mcmc_object, 0.025),
                       upper = quantile(h2_mcmc_object, 0.975),
                       method = "MCMC",
                       stringsAsFactors = FALSE)

And here is a similar animal model in AnimalINLA:


## Format pedigree to AnimalINLA's tastes
ped_inla <- ped
ped_inla$id  <- as.numeric(ped_inla$id)
ped_inla$dam  <- as.numeric(ped_inla$dam)
ped_inla$dam[$dam)] <- 0
ped_inla$sire  <- as.numeric(ped_inla$sire)
ped_inla$sire[$sire)] <- 0
## Turn to relationship matrix
A_inv <- compute.Ainverse(ped_inla)
## Fit the model
model_inla  <- animal.inla(response = scaled,
                           genetic = "animal",
                           Ainverse = A_inv,
                  = "gaussian",
                           data = pheno,
                           verbose = TRUE)

## Pull out summaries from the model object
summary_inla  <- summary(model_inla)

## Summarise results
h2_inla  <- data.frame(mean = summary_inla$summary.hyperparam["Heritability", "mean"],
                       lower = summary_inla$summary.hyperparam["Heritability", "0.025quant"],
                       upper = summary_inla$summary.hyperparam["Heritability", "0.975quant"],
                       method = "INLA",
                       stringsAsFactors = FALSE)

If we wrap this all in a loop, we can see how the estimation methods do on replicate data (full script on GitHub). Here are estimates and intervals from ten replicates (black dots show the actual heritability in the first generation):

As you can see, the MCMC and INLA estimates agree pretty well and mostly hit the mark. In the one replicate dataset where they falter, they falter together.

Using R: When weird errors occur in packages that used to work, check that you’re not feeding them a tibble

There are some things that are great about the tidyverse family of R packages and the style they encourage. There are also a few gotchas. Here’s a reminder to myself about this phenomenon: tidyverse-style data frames (”tibbles”) do not simplify to vectors upon extracting a single column with hard bracket indexing.

Because some packages rely on specific data.frame behaviours that tibbles don’t show, functions that work nicely with data frames, and normally have nice interpretable error messages, may mysteriously collapse in all kinds of ways when fed a tibble.

Here’s an example with MCMCglmm. This is not to pick on MCMCglmm; it just happened to be one of the handful of packages where I’ve run into this issue. Here, we use readr, the tidyverse alternative to the read.table family of functions to read some simulated data. The base function is called read.csv, and the readr alternative is read_csv.

Reading in tabular data is a surprisingly hard problem: tables can be formatted in any variety of obnoxious ways, and the reading function also needs to be fast enough to deal with large files. Using readr certainly isn’t always painless, but it reduces the friction a lot compared to read.table. One of the improvements is that read_csv will return a data.frame with the class tbl_df, affectionately called ”tibble

After reading the data, we centre and scale the trait, set up some priors and run an animal model. Unfortunately, MCMCglmm will choke on the tibble, and deliver a confusing error message.


ped <- read_csv("sim_ped.csv")
pheno <- read_csv("sim_pheno.csv")

pheno$scaled <- scale(pheno$pheno)

prior_gamma <- list(R = list(V = 1, nu = 1),
                    G = list(G1 = list(V = 1, nu = 1)))

model <- MCMCglmm(scaled ~ 1,
                  random = ~ animal,
                  family = "gaussian",
                  prior = prior_gamma,
                  pedigree = ped,
                  data = pheno,
                  nitt = 100000,
                  burnin = 10000,
                  thin = 10)

Error in inverseA(pedigree = pedigree, scale = scale, nodes = nodes) : 
  individuals appearing as dams but not in pedigree
In addition: Warning message:
In if (attr(pedigree, "class") == "phylo") { :
  the condition has length > 1 and only the first element will be used

In this pedigree, it is not the case that there are individuals appearing as dams but not listed. If we turn the data and pedigree into vanilla data frames instead, it will work:

ped <-
pheno <-

model <- MCMCglmm(scaled ~ 1,
                  random = ~ animal,
                  family = "gaussian",
                  prior = prior_gamma,
                  pedigree = ped,
                  data = pheno,
                  nitt = 100000,
                  burnin = 10000,
                  thin = 10)

                       MCMC iteration = 0

                       MCMC iteration = 1000

                       MCMC iteration = 2000

Genes do not form networks

As a wide-eyed PhD student, I read a lot of papers about gene expression networks and was mightily impressed by their power. You can see where this is going, can’t you?

Someone on Twitter talked about their doubts about gene networks: how networks ”must” be how biology works, but that they weren’t sure that network methods actually had helped genetics that much, how there are compelling annotation term enrichments, and individual results that ”make sense”, but not many hard predictions. I promise I’m not trying to gossip about them behind their back, but I couldn’t find the tweets again. If you think about it, however, I don’t think genes must form networks at all, quite the opposite. But there are probably reasons why the network idea is so attractive.

(Edit: Here is the tweet I was talking about by Jeffrey Barrett! Thanks to Guillaume Devailly for pointing me to it.)

First, network representations are handy! There are all kinds of things about genes that can be represented as networks: coexpression, protein interactions, being mentioned in the same PubMed abstract, working on the same substrate, being annotated by the same GO term, being linked in a database such as STRING which tries to combine all kinds of protein–protein interactions understood broadly (Szklarczyk & al 2018), differential coexpression, co-differential expression (Hudson, Reverter & Dalrymple 2009), … There are all kinds of ways of building networks between genes: correlations, mutual information, Bayesian networks, structural equations models … Sometimes one of them will make an interesting biological phenomena stand out and become striking to the eye, or to one of the many ways to cluster nodes and calculate their centrality.

Second, networks are appealing. Birgitte Nerlich has this great blog post–On books, circuits and life–about metaphors for gene editing (the book of life, writing, erasing, cutting and editing) and systems biology (genetic engineering, circuits, wiring, the genetic program). Maybe the view of gene networks fits into the latter category, if we imagine that the extremely dated analogy with cybernetics (Peluffo 2015) has been replaced with the only slightly dated idea of a universal network science. After Internet and Albert, Jeong & Barabási (1999), what could be more apt than understanding genes as forming networks?

I think it’s fair to say that for genes to form networks, the system needs to be reasonably well described by a graph of nodes and edges. If you look at systems of genes that are really well understood, like the gap gene ”network”, you will see that they do not look like this at all. Look at Fig 3 in Jaeger (2011). Here, there is dynamic and spatial information not captured by the network topology that needs to be overlaid for the network view to make sense.

Or look at insulin signalling, in Fig 1 of Nyman et al (2014). Here, there are modified versions of proteins, non-gene products such as glucose and the plasma membrane, and again, dynamics, including both RNA and protein synthesis themselves. There is no justification for assuming that any of that will be captured by any topology or any weighting of genes with edges between them.

We are free to name biological processes networks if we want to; there’s nothing wrong with calling a certain process and group of related genes ”the gap gene network”. And we are free to use any network representation we want when it is useful or visually pleasing, if that’s what we’re going for. However, genes do not actually form networks.


Szklarczyk, D, et al. (2018) STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic acids research.

Hudson, N. J., Reverter, A., & Dalrymple, B. P. (2009). A differential wiring analysis of expression data correctly identifies the gene containing the causal mutation. PLoS computational biology, 5(5), e1000382.

Peluffo, A. E. (2015). The ”Genetic Program”: behind the genesis of an influential metaphor. Genetics, 200(3), 685-696.

Albert, R., Jeong, H., & Barabási, A. L. (1999). Diameter of the world-wide web. Nature, 401(6749), 130.

Jaeger, J. (2011). The gap gene network. Cellular and Molecular Life Sciences, 68(2), 243-274.

Nyman, E., Rajan, M. R., Fagerholm, S., Brännmark, C., Cedersund, G., & Strålfors, P. (2014). A single mechanism can explain network-wide insulin resistance in adipocytes from obese patients with type 2 diabetes. Journal of Biological Chemistry, 289(48), 33215-33230.

Journal club: ”Template plasmid integration in germline genome-edited cattle”

(This time it’s not just a Journal Club of One, because this post is based on a presentation given at the Hickey group journal club.)

The backstory goes like this: Polled cattle lack horns, and it would be safer and more convenient if more cattle were born polled. Unfortunately, not all breeds have a lot of polled cattle, and that means that breeding hornless cattle is difficult. Gene editing could help (see Bastiaansen et al. (2018) for a model).

In 2013, Tan et al. reported taking cells from horned cattle and editing them to carry the polled allele. In 2016, Carlson et al. cloned bulls based on a couple of these cell lines. The plan was to use the bulls, now grown, to breed polled cattle in Brazil (Molteni 2019). But a few weeks ago, FDA scientists (Norris et al 2019) posted a preprint that found inadvertent plasmid insertion in the bulls, using the public sequence data from 2016. Recombinetics, the company making the edited bulls, conceded that they’d missed the insertion.

”We weren’t looking for plasmid integrations,” says Tad Sonstegard, CEO of Recombinetics’ agriculture subsidiary, Acceligen, which was running the research with a Brazilian consulting partner. ”We should have.”


For context: To gene edit a cell, one needs to bring both the editing machinery (proteins in the case of TALENS, the method used here; proteins and RNA in the case of CRISPR) and the template DNA into the cell. The template DNA is the DNA you want to put in instead of the piece that you’re changing. There are different ways to get the components into the cell. In this case, the template was delivered as part of a plasmid, which is a bacterially-derived circular DNA.

The idea is that the editing machinery should find a specific place in the genome (where the variant that causes polledness is located), make a cut in the DNA, and the cell, in its efforts to repair the cut, will incorporate the template. Crucially, it’s supposed to incorporate only the template, and not the rest of the plasmid. But in this case, the plasmid DNA snuck in too, and became part of the edited chromosome. Biological accidents happen.

How did they miss that, and how did the FDA team detect it? Both the 2016 and 2019 paper are short letters where a lot of the action is relegated to the supplementary materials. Here are pertinent excerpts from Carlson & al 2016:

A first PCR assay was performed using (btHP-F1: 5’- GAAGGCGGCACTATCTTGATGGAA; btHP-R2- 5’- GGCAGAGATGTTGGTCTTGGGTGT) … The PCR creates a 591 bp product for Pc compared to the 389 bp product from the horned allele.

Secondly, clones were analyzed by PCR using the flanking F1 and R1 primers (HP1748-F1- 5’- GGGCAAGTTGCTCAGCTGTTTTTG; HP1594_1748-R1- 5’-TCCGCATGGTTTAGCAGGATTCA) … The PCR creates a 1,748 bp product for Pc compared to the 1,546 bp product from the horned allele.

All PCR products were TOPO cloned and sequenced.

Thus, they checked that the animals were homozygotes for the polled allele (called ”Pc”) by amplifying two diagnostic regions and sequenced them to check the edit. This shows that the target DNA is there.

Then, they used whole-genome short read sequencing to check for off-target edits:

Samples were sequenced to an average 20X coverage on the Illumina HiSeq 2500 high output mode with paired end 125 bp reads were compared to the bovine reference sequence (UMD3.1).

Structural variations were called using CLC probabilistic variant detection tools, and those with >7 reads were further considered even though this coverage provides only a 27.5% probability of accurately detecting heterozygosity.

Upon indel calls for the original non-edited cell lines and 2 of the edited animals, we screened for de novo indels in edited animal RCI-001, which are not in the progenitor cell-line, 2120.

We then applied PROGNOS4 with reference bovine genome build UMD3.1 to compute all potential off-targets likely caused by the TALENs pair.

For all matching sequences computed, we extract their corresponding information for comparison with de novo indels of RCI-001 and RCI-002. BEDTools was adopted to find de novo indels within 20 bp distance of predicted potential targets for the edited animal.

Only our intended edit mapped to within 10 bp of any of the identified degenerate targets, revealing that our animals are free of off-target events and further supporting the high specificity of TALENs, particularly for this locus.

That means, they sequenced the animals’ genomes in short fragment, puzzled it together by aligning it to the cow reference genome, and looked for insertions and deletions in regions that look similar enough that they might also be targeted by their TALENs and cut. And because they didn’t find any insertions or deletions close to these potential off-target sites, they concluded that the edits were fine.

The problem is that short read sequencing is notoriously bad at detecting larger insertions and deletions, especially of sequences that are not in the reference genome. In this case, the plasmid is not normally part of a cattle genome, and thus not in the reference genome. That means that short reads deriving from the inserted plasmid sequence would probably not be aligned anywhere, but thrown away in the alignment process. The irony is that with short reads, the bigger something is, the harder it is to detect. If you want to see a plasmid insertion, you have to make special efforts to look for it.

Tan et al. (2013) were aware of the risk of plasmid insertion, though, at least when concerned with the plasmid delivering the TALEN. Here is a quote:

In addition, after finding that one pair of TALENs delivered as mRNA had similar activity as plasmid DNA (SI Appendix, Fig. S2), we chose to deliver TALENs as mRNA to eliminate the possible genomic integration of TALEN expression plasmids. (my emphasis)

As a sidenote, the variant calling method used to look for off-target effects (CLC Probabilistic variant detection) doesn’t even seem that well suited to the task. The manual for the software says:

The size of insertions and deletions that can be found depend on how the reads are mapped: Only indels that are spanned by reads will be detected. This means that the reads have to align both before and after the indel. In order to detect larger insertions and deletions, please use the InDels and Structural Variation tool instead.

The CLC InDels and Structural Variation tool looks at the unaligned (soft-clipped) ends of short sequence reads, which is one way to get at structural variation with short read sequences. However, it might not have worked either; structural variation calling is a hard task, and the tool does not seem to be built for this kind of task.

What did Norris & al (2019) do differently? They took the published sequence data and aligned it to a cattle reference genome with the plasmid sequence added. Then, they loaded the alignment into the trusty Integrative Genomics Viewer and manually looked for reads aligning to the plasmid and reads supporting junctions between plasmid, template DNA and genome. This bespoken analysis is targeted to find plasmid insertions. The FDA authors must have gone ”nope, we don’t buy this” and decided to look for the plasmid.

Here is what they claim happened (Fig 1): The template DNA is there, as evidenced by the PCR genotyping, but it inserted twice, with the rest of the plasmid in-between.


Here is the evidence (Supplementary figs 1 and 2): These are two annotated screenshots from IGV. The first shows alignments of reads from the calves and the unedited cell lines to the plasmid sequence. In the unedited cells, there are only stray reads, probably misplaced, but in the edited calves, ther are reads covering the plasmid throughout. Unless somehow else contaminated, this shows that the plasmid is somewhere in their genomes.


Where is it then? This second supplementary figure shows alignments to expected junctions: where template DNA and genome are supposed to join. The colourful letters are mismatches, showing where unexpected DNA shows up. This is the evidence for where the plasmid integrated and what kind of complex rearrangement of template, plasmid and genome happened at the cut site. This must have been found by looking at alignments, hypothesising an insertion, and looking for the junctions supporting it.


Why didn’t the PCR and targeted sequencing find this? As this third supplementary figure shows, the PCRs used could, theoretically, produce longer products including plasmid sequence. But they are way too long for regular PCR.


Looking at this picture, I wonder if there were a few attempts to make a primer pair that went from insert into the downstream sequence, that failed and got blamed on bad primer design or PCR conditions.

In summary, the 2019 preprint finds indirect evidence of the plasmid insertion by looking hard at short read alignments. Targeted sequencing or long read sequencing could give better evidence by observing he whole insertion. Recombinetics have acknowledged the problem, which makes me think that they’ve gone back to the DNA samples and checked.

Where does that leave us with quality control of gene editing? There are three kinds of problems to worry about:

  • Off-target edits in similar places in other parts of the genome; this seems to be what people used to worry about the most, and what Carlson & al checked for
  • Complex rearrangements around cut site (probably due to repeated cutting; this became a big concern after Kosicki & al (2018), and should apply both to on- and off-target cuts
  • Insertion of plasmid or mutated target; this is what happened in here

The ways people check gene edits (targeted Sanger sequencing and short read sequencing) doesn’t detect any of them particularly well, at least not without bespoke analysis. Maybe the kind of analysis that Norris & al do could be automated to some extent, but currently, the state of the art seems to be to manually look closely at alignments. If I was reviewing the preprint, I would have liked it if the manuscript had given a fuller description of how they arrived at this picture, and exactly what the evidence for this particular complex rearrangement is. This is a bit hard to follow.

Finally, is this embarrassing? On the one hand, this is important stuff, plasmid integration is a known problem, so the original researchers probably should have looked harder for it. On the other hand, the cell lines were edited and the clones born before a lot of the discussion and research of off-target edits and on-target rearrangements that came out of CRISPR being widely applied, and when long read sequencing was a lot less common. Maybe it was easier to think that the sort read off-target analysis was enough then. In any case, we need a solid way to quality check edits.


Molteni M. (2019) Brazil’s plan for gene edited-cows got scrapped–here’s why. Wired.

Carlson DF, et al. (2016) Production of hornless dairy cattle from genome-edited cell lines. Nature Biotechnology.

Norris AL, et al. (2019) Template plasmid integration in germline genome-edited cattle. BioRxiv.

Tan W, et al. (2013) Efficient nonmeiotic allele introgression in livestock using custom endonucleases. Proceedings of the National Academy of Sciences.

Bastiaansen JWM, et al. (2018) The impact of genome editing on the introduction of monogenic traits in livestock. Genetics Selection Evolution.

Kosicki M, Tomberg K & Bradley A. (2018) Repair of double-strand breaks induced by CRISPR–Cas9 leads to large deletions and complex rearrangements. Nature Biotechnology.

Computational Genetics Discussion Cookies

The Computational Genetics Discussion Group is an informal seminar series on anything quantitative genetics, genomics and breeding run by the Hickey group at the Roslin Institute. Over the last year and a half or so, I’ve been the one emailing people and bringing biscuits, and at some point, I got fed up with the biscuits available at my local Tesco. Here is my recipe for computational genetics discussion cookies.

Makes ca 50 cookies

1. Melt and brown 250 g butter.

2. Mix 100 g white sugar, 100 g Demerara sugar, 65 g of golden syrup, and 2 teaspoons of vanilla extract.

3. Add the melted butter and two eggs and whisk together.

4. Mix 375 g of flour and 0.75 teaspoons of bicarbonate. Add this to the butter, egg and sugar mix.

5. Split the batter into two halves. To each half, add one of:

  • 300 g chopped chocolate
  • 5 crushed digestive biscuits and 2 teaspoons of ground cinnamon
  • 50 g of crushed mini pretzels and 200 g of chopped fruit jellies
  • 50 g of oats and 120 g of raisins (weigh the raisins dry and then soak in hot water)
  • 75 g of desiccated coconut and 120 g of raisins
  • 125 g of granola mix

6. Bake for 7.5 minutes at 200 degrees Celsius.

7. Let rest for at least two minutes before moving off of the tray.

Temple Grandin at Roslin: optimisation and overselection

A couple of months ago (16 May to be precise), I listened to a talk by Temple Grandin at the Roslin Institute.

Grandin is a captivating speaker, and as an animal scientist (of some kind), I’m happy to have heard her talk at least once. The lecture contained a mix of:

  • practical experiences from a career of troubleshooting livestock management and systems,
  • how thinking differently (visually) helps in working with animal behaviour,
  • terrific personal anecdotes, among other things about starting up her business as a livestock management consultant from a student room,
  • a recurring theme, throughout the talk, of unintended side-effects in animal breeding, framed as a risk of ”overselecting” for any one trait, uncertainty about ”what is optimal”, and the importance of measuring and soberly evaluating many different things about animals and systems.

This latter point interests me, because it concerns genetics and animal breeding. Judging by the question in the Q&A, it also especially interested rest of the audience, mostly composed of vet students.

Grandin repeatedly cautioned against ”overselecting”. She argued that if you take one trait, any trait, and apply strong directional selection, bad side-effects will emerge. As a loosely worded biological principle, and taken to extremes, this seems likely to be true. If we assume that traits are polygenic, that means both that variants are likely to be pleiotropic (because there are many causal variants and a limited number of genes; this one argument for the omnigenic model) and that variants are likely to be linked to other variants that affect other traits. So changing one trait a lot is likely to affect other traits. And if we assume that the animal was in a pretty well-functioning state before selection, we should expect that if some trait that we’re not consciously selecting on changes far enough from that state, that is likely to cause problems.

We can also safely assume that there are always more traits that we care about than we can actually measure, either because they haven’t become a problem yet, or because we don’t have a good way to measure them. Taken together, this sound like a case for being cautious, measuring a lot of things about animal performance and welfare, and continuously re-evaluating what one is doing. Grandin emphasised the importance of measurement, drumming in that: ”you will manage what you measure”, ”this happens gradually”, and therefore, there is a risk that ”the bad becomes the new normal” if one does not keep tabs on the situation by recording hard quantitative data.

Doesn’t this sound a lot like the conventional view of mainstream animal breeding? I guess it depends: breeding is a big field, covering a lot of experiences and views, from individual farmers’ decisions, through private and public breeding organisations, to the relative Castalia of academic research. However, the impression from my view of the field, is Grandin and mainstream animal breeders are in agreement about the importance of:

  1. recording lots of traits about all aspects of the performance and functioning of the animal,
  2. optimising them with good performance on the farm as the goal,
  3. constantly re-evaluating practice and updating the breeding goals and management to keep everything on track.

To me, what Grandin presented as if it was a radical message (and maybe it was, some time ago, or maybe it still is, in some places) sounded much like singing the praises of economic selection indices. I had expected something more controversial. Then again, that depends on what assumptions are built into words like ”good performance”, ”on track”, ”functioning of the animal” etc. For example, she talked a bit about the strand of animal welfare research that aims to quantify positive emotions in animals; one could take the radical stance that we should measure positive emotions and include that in the breeding goal.

”Overselection” as a term also carries connotations that I don’t agree with, because I don’t think that the framing as biological overload is helpful. To talk about overload and ”overselection” makes one think of selection as a force that strains the animal in itself, and the alternative as ”backing off” (an expression term Grandin repeatedly used in the talk). But if the breeding goal is off the mark, in the sense that it doesn’t get towards what’s actually optimal for the animal on the farm, breeding less efficiently is not getting you to a better outcome; it only gets towards the same, suboptimal, outcome more slowly. The problem isn’t efficiency in itself, but misspecification, and uncertainty about what the goal should be.

Grandin expands on this idea in the introductory chapter to ”Are we pushing animals to their biological limits? Welfare and ethical applications” (Grandin & Whiting 2018, eds). (I don’t know much about the pig case used as illustration, but I can think of a couple of other examples that illustrate the same point.) It ends with this great metaphor about genomic power tools, that I will borrow for later:

We must be careful not to repeat the mistakes that were made with conventional breeding where bad traits were linked with desirable traits. One of the best ways to prevent this is for both animal and plant breeders to do what I did in the 1980s and 1990s: I observed many different pigs from many places and the behaviour problems became obvious. This enabled me to compare animals from different lines in the same environment. Today, both animal and plant breeders have ‘genomic power tools’ for changing an organism’s genetics. Power tools are good things, but they must be used carefully because changes gan be made more quickly. A circular saw can chop your hand off much more easily than a hand saw. It has to be used with more care.