Temple Grandin at Roslin: optimisation and overselection

A couple of months ago (16 May to be precise), I listened to a talk by Temple Grandin at the Roslin Institute.

Grandin is a captivating speaker, and as an animal scientist (of some kind), I’m happy to have heard her talk at least once. The lecture contained a mix of:

  • practical experiences from a career of troubleshooting livestock management and systems,
  • how thinking differently (visually) helps in working with animal behaviour,
  • terrific personal anecdotes, among other things about starting up her business as a livestock management consultant from a student room,
  • a recurring theme, throughout the talk, of unintended side-effects in animal breeding, framed as a risk of ”overselecting” for any one trait, uncertainty about ”what is optimal”, and the importance of measuring and soberly evaluating many different things about animals and systems.

This latter point interests me, because it concerns genetics and animal breeding. Judging by the question in the Q&A, it also especially interested rest of the audience, mostly composed of vet students.

Grandin repeatedly cautioned against ”overselecting”. She argued that if you take one trait, any trait, and apply strong directional selection, bad side-effects will emerge. As a loosely worded biological principle, and taken to extremes, this seems likely to be true. If we assume that traits are polygenic, that means both that variants are likely to be pleiotropic (because there are many causal variants and a limited number of genes; this one argument for the omnigenic model) and that variants are likely to be linked to other variants that affect other traits. So changing one trait a lot is likely to affect other traits. And if we assume that the animal was in a pretty well-functioning state before selection, we should expect that if some trait that we’re not consciously selecting on changes far enough from that state, that is likely to cause problems.

We can also safely assume that there are always more traits that we care about than we can actually measure, either because they haven’t become a problem yet, or because we don’t have a good way to measure them. Taken together, this sound like a case for being cautious, measuring a lot of things about animal performance and welfare, and continuously re-evaluating what one is doing. Grandin emphasised the importance of measurement, drumming in that: ”you will manage what you measure”, ”this happens gradually”, and therefore, there is a risk that ”the bad becomes the new normal” if one does not keep tabs on the situation by recording hard quantitative data.

Doesn’t this sound a lot like the conventional view of mainstream animal breeding? I guess it depends: breeding is a big field, covering a lot of experiences and views, from individual farmers’ decisions, through private and public breeding organisations, to the relative Castalia of academic research. However, the impression from my view of the field, is Grandin and mainstream animal breeders are in agreement about the importance of:

  1. recording lots of traits about all aspects of the performance and functioning of the animal,
  2. optimising them with good performance on the farm as the goal,
  3. constantly re-evaluating practice and updating the breeding goals and management to keep everything on track.

To me, what Grandin presented as if it was a radical message (and maybe it was, some time ago, or maybe it still is, in some places) sounded much like singing the praises of economic selection indices. I had expected something more controversial. Then again, that depends on what assumptions are built into words like ”good performance”, ”on track”, ”functioning of the animal” etc. For example, she talked a bit about the strand of animal welfare research that aims to quantify positive emotions in animals; one could take the radical stance that we should measure positive emotions and include that in the breeding goal.

”Overselection” as a term also carries connotations that I don’t agree with, because I don’t think that the framing as biological overload is helpful. To talk about overload and ”overselection” makes one think of selection as a force that strains the animal in itself, and the alternative as ”backing off” (an expression term Grandin repeatedly used in the talk). But if the breeding goal is off the mark, in the sense that it doesn’t get towards what’s actually optimal for the animal on the farm, breeding less efficiently is not getting you to a better outcome; it only gets towards the same, suboptimal, outcome more slowly. The problem isn’t efficiency in itself, but misspecification, and uncertainty about what the goal should be.

Grandin expands on this idea in the introductory chapter to ”Are we pushing animals to their biological limits? Welfare and ethical applications” (Grandin & Whiting 2018, eds). (I don’t know much about the pig case used as illustration, but I can think of a couple of other examples that illustrate the same point.) It ends with this great metaphor about genomic power tools, that I will borrow for later:

We must be careful not to repeat the mistakes that were made with conventional breeding where bad traits were linked with desirable traits. One of the best ways to prevent this is for both animal and plant breeders to do what I did in the 1980s and 1990s: I observed many different pigs from many places and the behaviour problems became obvious. This enabled me to compare animals from different lines in the same environment. Today, both animal and plant breeders have ‘genomic power tools’ for changing an organism’s genetics. Power tools are good things, but they must be used carefully because changes gan be made more quickly. A circular saw can chop your hand off much more easily than a hand saw. It has to be used with more care.

Jennifer Doudna & Samuel Sternberg ”A Crack In Creation”

While the blog is on a relaxed summer schedule, you can read my book review of Jennifer Doudna’s and Samuel Sternberg’s CRISPR-related autobiography and discussion of the future of genome editing in University of Edinburgh’s Science Magazine EUSci issue #24.

The book is called A Crack in Creation, subtitled The New Power to Control Evolution, or depending on edition, Gene Editing and the Unthinkable Power to Control Evolution. There are a couple of dramatic titles for you. The book starts out similarly dramatic, with Jennifer Doudna dreaming of a beach in Hawaii, where she comes from, imagining a wave rising from the ocean to crash down on her. The wave is CRISPR/Cas9 genome editing, the technological force of nature that Doudna and colleagues have let loose on the world.

I like the book, talk about some notable omissions, and take issue with the bombastic and inaccurate title(s). Read the rest here.

Links

Pivotal CRISPR patent battle won by Broad Institute. Nature News. 2018.

Sharon Begley & Andrew Joseph. The CRISPR shocker: How genome-editing scientist He Jiankui rose from obscurity to stun the world. Stat News. 2018.

A Crack in Creation. The book’s website.

‘Simulating genetic data with R: an example with deleterious variants (and a pun)’

A few weeks ago, I gave a talk at the Edinburgh R users group EdinbR on the RAGE paper. Since this is an R meetup, the talk concentrated on the mechanics of genetic data simulation and with the paper as a case study. I showed off some of what Chris Gaynor’s AlphaSimR can do, and how we built on that to make the specifics of this simulation study. The slides are on the EdinbR Github.

Genetic simulation is useful for all kinds of things. Sure, they’re only as good as the theory that underpins them, but the willingness to try things out in simulations is one of the things I always liked about breeding research.

This is my description of the logic of genetic simulation: we think of the genome as a large table of genotypes, drawn from some distribution of allele frequencies.

To make an utterly minimal simulation, we could draw allele frequencies from some distribution (like a Beta distribution), and then draw the genotypes from a binomial distribution. Done!

However, there is a ton of nuance we would like to have: chromosomes, linkage between variants, sexes, mating, selection …

AlphaSimR addresses all of this, and allows you to throw individuals and populations around to build pretty complicated designs. Here is the small example simulation I used in the talk.


library(AlphaSimR)
library(ggplot2)

## Generate founder chromsomes

FOUNDERPOP <- runMacs(nInd = 1000,
                      nChr = 10,
                      segSites = 5000,
                      inbred = FALSE,
                      species = "GENERIC")


## Simulation parameters

SIMPARAM <- SimParam$new(FOUNDERPOP)
SIMPARAM$addTraitA(nQtlPerChr = 100,
                   mean = 100,
                   var = 10)
SIMPARAM$addSnpChip(nSnpPerChr = 1000)
SIMPARAM$setGender("yes_sys")


## Founding population

pop <- newPop(FOUNDERPOP,
              simParam = SIMPARAM)

pop <- setPheno(pop,
                varE = 20,
                simParam = SIMPARAM)


## Breeding

print("Breeding")
breeding <- vector(length = 11, mode = "list")
breeding[[1]] <- pop

for (i in 2:11) {
    print(i)
    sires <- selectInd(pop = breeding[[i - 1]],
                       gender = "M",
                       nInd = 25,
                       trait = 1,
                       use = "pheno",
                       simParam = SIMPARAM)

    dams <- selectInd(pop = breeding[[i - 1]],
                      nInd = 500,
                      gender = "F",
                      trait = 1,
                      use = "pheno",
                      simParam = SIMPARAM)

    breeding[[i]] <- randCross2(males = sires,
                                females = dams,
                                nCrosses = 500,
                                nProgeny = 10,
                                simParam = SIMPARAM)
    breeding[[i]] <- setPheno(breeding[[i]],
                              varE = 20,
                              simParam = SIMPARAM)
}



## Look at genetic gain and shift in causative variant allele frequency

mean_g <- unlist(lapply(breeding, meanG))
sd_g <- sqrt(unlist(lapply(breeding, varG)))

plot_gain <- qplot(x = 1:11,
                   y = mean_g,
                   ymin = mean_g - sd_g,
                   ymax = mean_g + sd_g,
                   geom = "pointrange",
                   main = "Genetic mean and standard deviation",
                   xlab = "Generation", ylab = "Genetic mean")

start_geno <- pullQtlGeno(breeding[[1]], simParam = SIMPARAM)
start_freq <- colSums(start_geno)/(2 * nrow(start_geno))

end_geno <- pullQtlGeno(breeding[[11]], simParam = SIMPARAM)
end_freq <- colSums(end_geno)/(2 * nrow(end_geno))

plot_freq_before <- qplot(start_freq, main = "Causative variant frequency before") 
plot_freq_after <- qplot(end_freq, main = "Causative variant frequency after") 

This code builds a small livestock population, breeds it for ten generations, and looks at the resulting selection response in the form of a shift of the genetic mean, and the changes in the underlying distribution of causative variants. Here are the resulting plots:

‘Approaches to genetics for livestock research’ at IASH, University of Edinburgh

A couple of weeks ago, I was at a symposium on the history of genetics in animal breeding at the Institute of Advanced Studies in the Humanities, organized by Cheryl Lancaster. There were talks by two geneticists and two historians, and ample time for discussion.

First geneticists:

Gregor Gorjanc presented the very essence of quantitative genetics: the pedigree-based model. He illustrated this with graphs (in the sense of edges and vertices) and by predicting his own breeding value for height from trait values, and from his personal genomics results.

Then, yours truly gave this talk: ‘Genomics in animal breeding from the perspectives of matrices and molecules’. Here are the slides (only slightly mangled by Slideshare). This is the talk I was preparing for when I collected the quotes I posted a couple of weeks ago.

I talked about how there are two perspectives on genomics: you can think of genomes either as large matrices of ancestry indicators (statistical perspective) or as long strings of bases (sequence perspective). Both are useful, and give animal breeders and breeding researchers different tools (genomic selection, reference genomes). I also talked about potential future breeding strategies that use causative variants, and how they’re not about stopping breeding and designing the perfect animal in a lab, but about supplementing genomic selection in different ways.

Then, historians:

Cheryl Lancaster told the story of how ABGRO, the Animal Breeding and Genetics Research Organisation in Edinburgh, lost its G. The organisation was split up in the 1950s, separating fundamental genetics research and animal breeding. She said that she had expected this split to be do to scientific, methodological or conceptual differences, but instead found when going through the archives, that it all was due to personal conflicts. She also got into how the ABGRO researchers justified their work, framing it as ”fundamental research”, and aspired to do long term research projects.

Jim Lowe talked about the pig genome sequencing and mapping efforts, how it was different from the human genome project in organisation, and how it used comparisons to the human genome a lot. Here he’s showing a photo of Alan Archibald using the gEVAL genome browser to quality-check the pig genome. He also argued that the infrastructural outcomes of a project like the human genome project, such as making it possible for pig genome scientists to use the human genome for comparisons, are more important and less predictable than usually assumed.

The discussion included comments by some of the people who were there (Chris Haley, Bill Hill), discussion about the breed concept, and what scientists can learn from history.

What is a breed? Is it a genetical thing, defined by grouping individuals based on their relatedness, a historical thing, based on what people think a certain kind of animal is supposed to look like, or a marketing tool, naming animals that come from a certain system? It is probably a bit of everything. (I talked with Jim Lowe during lunch; he had noticed how I referred to Griffith & Stotz for gene concepts, but omitted the ”post-genomic” gene concept they actually favour. This is because I didn’t find it useful for understanding how animal breeding researchers think. It is striking how comfortable biologists are with using fuzzy concepts that can’t be defined in a way that cover all corner cases, because biology doesn’t work that way. If the nominal gene concept is broken by trans-splicing, practicing genomicists will probably think of that more as a practical issue with designing gene databases than a something that invalidates talking about genes in principle.)

What would researchers like to learn from history? Probably how to succeed with large research endeavors and how to get funding for them. Can one learn that from history? Maybe not, but there might be lessons about thinking of research as ”basic”, ”fundamental”, ”applied” etc, and about what the long term effects of research might be.

What single step does with relationship

We had a journal club about the single step GBLUP method for genomic evaluation a few weeks ago. In this post, we’ll make a few graphs of how the single step method models relatedness between individuals.

Imagine you want to use genomic selection in a breeding program that already has a bunch of historical pedigree and trait information. You could use some so-called multistep evaluation that uses one model for the classical pedigree + trait quantitative genetics and one model for the genotype + trait genomic evaluation, and then mix the predictions from them together. Or you could use the single-step method, which combines pedigree, genotypes and traits into one model. It does this by combining the relationship estimates from pedigree and genotypes. That matrix can then go into your mixed model.

We’ll illustrate this with a tiny simulated population: five generations of 100 individuals per generation, where ten random pairings produce the next generation, with families of ten individuals. (The R code is on Github and uses AlphaSimR for simulation and AGHmatrix for matrices). Here is a heatmap of the pedigree-based additive relationship matrix for the population:

What do we see? In the lower left corner are the founders, and not knowing anything about their heritage, the matrix has them down as unrelated. The squares of high relatedness along the diagonal are the families in each generation. As we go upwards and to the right, relationship is building up.

Now, imagine the last generation of the population also has been genotyped with a SNP chip. Here is a heatmap of their genomic relationship matrix:

Genomic relationship is more detailed. We can still discern the ten families within the last generation, but no longer are all the siblings equally related to each other and to their ancestors. The genotyping helps track segregation within families, pointing out to us when relatives are more or less related than the average that we get from the pedigree.

Enter the single-step relationship matrix. The idea is to put in the genomic relationships for the genotyped individuals into the big pedigree-based relationship matrix, and then adjust the rest of the matrix to propagate that extra information we now have from the genotyped individuals to their ungenotyped relatives. Here is the resulting heatmap:

You can find the matrix equations in Legarra, Aguilar & Misztal (2009). The matrix, called H, is broken down into four partitions called H11, H12, H21, and H22. H22 is the part that pertains to the genotyped animals, and it’s equal to the genomic relationship matrix G (after some rescaling). The others are transformations of G and the corresponding parts of the additive relationship matrix, spreading G onto A.

To show what is going on, here is the difference between the additive relationship matrix and the single-step relationship matrix, with lines delineating the genotyped animals and breaking the matrix into the four partitions:

What do we see? In the top right corner, we have a lot of difference, where the genomic relationship matrix has been plugged in. Then, fading as we go from top to bottom and from right to left, we see the influence of the genomic relationship on relatives, diminishing the further we get from the genotyped individuals.

Literature

Legarra, Andres, I. Aguilar, and I. Misztal. ”A relationship matrix including full pedigree and genomic information.” Journal of dairy science 92.9 (2009): 4656-4663.

Excerpts about genomics in animal breeding

Here are some good quotes I’ve come across while working on something.

Artificial selection on the phenotypes of domesticated species has been practiced consciously or unconsciously for millennia, with dramatic results. Recently, advances in molecular genetic engineering have promised to revolutionize agricultural practices. There are, however, several reasons why molecular genetics can never replace traditional methods of agricultural improvement, but instead they should be integrated to obtain the maximum improvement in economic value of domesticated populations.

Lande R & Thompson R (1990) Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics.

Smith and Smith suggested that the way to proceed is to map QTL to low resolution using standard mapping methods and then to increase the resolution of the map in these regions in order to locate more closely linked markers. In fact, future developments should make this approach unnecessary and make possible high resolution maps of the whole genome, even, perhaps, to the level of the DNA sequence. In addition to easing the application of selection on loci with appreciable individual effects, we argue further that the level of genomic information available will have an impact on infinitesimal models. Relationship information derived from marker information will replace the standard relationship matrix; thus, the average relationship coefficients that this represents will be replaced by actual relationships. Ultimately, we can envisage that current models combining few selected QTL with selection on polygenic or infinitesimal effects will be replaced with a unified model in which different regions of the genome are given weights appropriate to the variance they explain.

Haley C & Visscher P. (1998) Strategies to utilize marker–quantitative trait loci associations. Journal of Dairy Science.

Instead, since the late 1990s, DNA marker genotypes were included into the conventional BLUP analyses following Fernando and Grossman (1989): add the marker genotype (0, 1, or 2, for an animal) as a fixed effect to the statistical model for a trait, obtain the BLUP solutions for the additive polygenic effect as before, and also obtain the properly adjusted BLUE solution for the marker’s allele substitution effect; multiply this BLUE by 0, 1, or 2 (specic for the animal) and add the result to the animal’s BLUP to obtain its marker-enhanced EBV. A logical next step was to treat the marker genotypes as semi-random effects, making use of several different shrinkage strategies all based on the marker heritability (e.g., Tsuruta et al., 2001); by 2007, breeding value estimation packages such as PEST (Neumaier and Groeneveld, 1998) supported this strategy as part of their internal calculations. At that time, a typical genetic evaluation run for a production trait would involve up to 30 markers.

Knol EF, Nielsen B, Knap PW. (2016) Genomic selection in commercial pig breeding. Animal Frontiers.

Although it has not caught the media and public imagination as much as transgenics and cloning, genomics will, I believe, have just as great a long-term impact. Because of the availability of information from genetically well-researched species (humans and mice), genomics in farm animals has been established in an atypical way. We can now see it as progressing in four phases: (i) making a broad sweep map (~20 cM) with both highly informative (microsatellite) and evolutionary conserved (gene) markers; (ii) using the informative markers to identify regions of chromosomes containing quantitative trait loci (QTL) controlling commercially important traits–this requires complex pedigrees or crosses between phenotypically anc genetically divergent strains; (iii) progressing from the informative markers into the QTL and identifying trait genes(s) themselves either by complex pedigrees or back-crossing experiments, and/or using the conserved markers to identify candidate genes from their position in the gene-rich species; (iv) functional analysis of the trait genes to link the genome through physiology to the trait–the ‘phenotype gap’.

Bulfield G. (2000) Biotechnology: advances and impact. Journal of the Science of Food and Agriculture.

I believe animal breeding in the post-genomic era will be dramatically different to what it is today. There will be massive research effort to discover the function of genes including the effect of DNA polymorphisms on phenotype. Breeding programmes will utilize a large number of DNA-based tests for specific genes combined with new reproductive techniques and transgenes to increase the rate of genetic improvement and to produce for, or allocate animals to, the product line to which they are best suited. However, this stage will not be reached for some years by which time many of the early investors will have given up, disappointed with the early benefits.

Goddard M. (2003). Animal breeding in the (post-) genomic era. Animal Science.

Genetics is a quantitative subject. It deals with ratios, with measurements, and with the geometrical relationships of chromosomes. Unlike most sciences that are based largely on mathematical techniques, it makes use of its own system of units. Physics, chemistry, astronomy, and physiology all deal with atoms, molecules, electrons, centimeters, seconds, grams–their measuring systems are all reducible to these common units. Genetics has none of these as a recognizable component in its fundamental units, yet it is a mathematically formulated subject that is logically complete and self-contained.

Sturtevant AH & Beadle GW. (1939) An introduction to genetics. W.B. Saunders company, Philadelphia & London.

We begin by asking why genes on nonhomologous chromosomes assort independently. The simple cytological story rehearsed above answers the questions. That story generates further questions. For example, we might ask why nonhomologous chromosomes are distributed independently at meiosis. To answer this question we could describe the formation of the spindle and the migration of chromosomes to the poles of the spindle just before meiotic division. Once again, the narrative would generate yet further questions. Why do the chromosomes ”condense” at prophase? How is the spindle formed? Perhaps in answering these questions we would begin to introduce the chemical details of the process. Yet simply plugging a molecular account into the narratives offered at the previous stages would decrease the explanatory power of those narratives.

Kitcher, P. (1984) 1953 and all that. A tale of two sciences. Philosophical Review.

And, of course, this great quote by Jay Lush.

There is no breeder’s equation for environmental change

This post is about why heritability coefficients of human traits can’t tell us what to do. Yes, it is pretty much an elaborate subtweet.

Let us begin in a different place, where heritability coefficients are useful, if only a little. Imagine there is selection going on. It can be natural or artificial, but it’s selection the old-fashioned way: there is some trait of an individual that makes it more or less likely to successfully reproduce. We’re looking at one generation of selection: there is one parent generation, some of which reproduce and give rise to the offspring generation.

Then, if we have a well-behaved quantitative trait, no systematic difference between the environments that the two generations experience (also, no previous selection; this is one reason I said ‘if only a little’), we can get an estimate of the response to selection, that is how the mean of the trait will change between the generations:

R = h^2S

R is the response. S, the selection differential, is the difference between the mean all of the parental generation and the selected parents, and thus measures the strength of the selection. h2 is the infamous heritability, which measures the accuracy of the selection.

That is, the heritability coefficient tells you how well the selection of parents reflect the offspring traits. A heritability coefficient of 1 would mean that selection is perfect; you can just look at the parental individuals, pick the ones you like, and get the whole selection differential as a response. A heritability coefficient of 0 means that looking at the parents tells you nothing about what their offspring will be like, and selection thus does nothing.

Conceptually, the power of the breeder’s equation comes from the mathematical properties of selection, and the quantitative genetic assumptions of a linear parent–offspring relationship. (If you’re a true connoisseur of theoretical genetics or a glutton for punishment, you can derive it from the Price equation; see Walsh & Lynch (2018).) It allows you to look (one generation at a time) into the future only because we understand what selection does and assume reasonable things about inheritance.

We don’t have that kind machinery for environmental change.

Now, another way to phrase the meaning of the heritability coefficient is that it is a ratio of variances, namely the additive genetic variance (which measures the trait variation that runs in families) divided by the total variance (which measures the total variation in the population, duh). This is equally valid, more confusing, and also more relevant when we’re talking about something like a population of humans, where no breeding program is going on.

Thus, the heritability coefficient is telling us, in a specific highly geeky sense, how much of trait variation is due to inheritance. Anything we can measure about a population will have a heritability coefficient associated with it. What does this tell us? Say, if drug-related crime has yay big heritability, does that tell us anything about preventing drug-related crime? If heritability is high, does that mean interventions are useless?

The answers should be evident from the way I phrased those rhetorical questions and from the above discussion: There is no theoretical genetics machinery that allows us to predict the future if the environment changes. We are not doing selection on environments, so the mathematics of selection don’t help us. Environments are not inherited according to the rules of quantitative genetics. Nothing prevents a trait from being eminently heritable and respond even stronger to changes in the environment, or vice versa.

(There is also the argument that quantitative genetic modelling of human traits matters because it helps control for genetic differences when estimating other factors. One has to be more sympathetic towards that, because who can argue against accurate measurement? But ought implies can. For quantitative genetic models to be better, they need to solve the problems of identifying variance components and overcoming population stratification.)

Much criticism of heritability in humans concern estimation problems. These criticisms may be valid (estimation is hard) or silly (of course, lots of human traits have substantial heritabilities), but I think they miss the point. Even if accurately estimated, heritabilities don’t do us much good. They don’t help us with the genetic component, because we’re not doing breeding. They don’t help us with the environmental component, because there is no breeder’s equation for environmental change.