Using R: Animal model with simulated data

Last week’s post just happened to use MCMCglmm as an example of an R package that can get confused by tibble-style data frames. To make that example, I simulated some pedigree and trait data. Just for fun, let’s look at the simulation code, and use MCMCglmm and AnimalINLA to get heritability estimates.

First, here is some AlphaSimR code that creates a small random mating population, and collects trait and pedigree:

library(AlphaSimR)

## Founder population
FOUNDERPOP <- runMacs(nInd = 100,
                      nChr = 20,
                      inbred = FALSE,
                      species = "GENERIC")

## Simulation parameters 
SIMPARAM <- SimParam$new(FOUNDERPOP)
SIMPARAM$addTraitA(nQtlPerChr = 100,
                   mean = 100,
                   var = 10)
SIMPARAM$setGender("yes_sys")
SIMPARAM$setVarE(h2 = 0.3)
 
## Random mating for 9 more generations
generations <- vector(mode = "list", length = 10) 
generations[[1]] <- newPop(FOUNDERPOP,
                           simParam = SIMPARAM)


for (gen in 2:10) {

    generations[[gen]] <- randCross(generations[[gen - 1]],
                                    nCrosses = 10,
                                    nProgeny = 10,
                                    simParam = SIMPARAM)

}

## Put them all together
combined <- Reduce(c, generations)


## Extract phentoypes
pheno <- data.frame(animal = combined@id,
                    pheno = combined@pheno[,1])

## Extract pedigree
ped <- data.frame(id = combined@id,
                  dam = combined@mother,
                  sire =combined@father)
ped$dam[ped$dam == 0] <- NA
ped$sire[ped$sire == 0] <- NA

## Write out the files
write.csv(pheno,
          file = "sim_pheno.csv",
          row.names = FALSE,
          quote = FALSE)

write.csv(ped,
          file = "sim_ped.csv",
          row.names = FALSE,
          quote = FALSE)

In turn, we:

  1. Set up a founder population with a AlphaSimR’s generic livestock-like population history, and 20 chromosomes.
  2. Choose simulation parameters: we have an organism with separate sexes, a quantitative trait with an additive polygenic architecture, and we want an environmental variance to give us a heritability of 0.3.
  3. We store away the founders as the first generation, then run a loop to give us nine additional generations of random mating.
  4. Combine the resulting generations into one population.
  5. Extract phenotypes and pedigree into their own data frames.
  6. Optionally, save the latter data frames to files (for the last post).

Now that we have some data, we can fit a quantitative genetic pedigree model (”animal model”) to estimate genetic parameters. We’re going to try two methods to fit it: Markov Chain Monte Carlo and (the unfortunately named) Integrated Nested Laplace Approximation. MCMC explores the posterior distribution by sampling; I’m not sure where I heard it described as ”exploring a mountain by random teleportation”. INLA makes approximations to the posterior that can be integrated numerically; I guess it’s more like building a sculpture of the mountain.

First, a Gaussian animal model in MCMCglmm:

library(MCMCglmm)

## Gamma priors for variances
prior_gamma <- list(R = list(V = 1, nu = 1),
                    G = list(G1 = list(V = 1, nu = 1)))
    
## Fit the model
model_mcmc  <- MCMCglmm(scaled ~ 1,
                        random = ~ animal,
                        family = "gaussian",
                        prior = prior_gamma,
                        pedigree = ped,
                        data = pheno,
                        nitt = 100000,
                        burnin = 10000,
                        thin = 10)

## Calculate heritability for heritability from variance components
h2_mcmc_object  <- model_mcmc$VCV[, "animal"] /
    (model_mcmc$VCV[, "animal"] + model_mcmc$VCV[, "units"])
 
## Summarise results from that posterior
h2_mcmc  <- data.frame(mean = mean(h2_mcmc_object),
                       lower = quantile(h2_mcmc_object, 0.025),
                       upper = quantile(h2_mcmc_object, 0.975),
                       method = "MCMC",
                       stringsAsFactors = FALSE)

And here is a similar animal model in AnimalINLA:

library(AnimalINLA)

## Format pedigree to AnimalINLA's tastes
ped_inla <- ped
ped_inla$id  <- as.numeric(ped_inla$id)
ped_inla$dam  <- as.numeric(ped_inla$dam)
ped_inla$dam[is.na(ped_inla$dam)] <- 0
ped_inla$sire  <- as.numeric(ped_inla$sire)
ped_inla$sire[is.na(ped_inla$sire)] <- 0
    
## Turn to relationship matrix
A_inv <- compute.Ainverse(ped_inla)
    
## Fit the model
model_inla  <- animal.inla(response = scaled,
                           genetic = "animal",
                           Ainverse = A_inv,
                           type.data = "gaussian",
                           data = pheno,
                           verbose = TRUE)

## Pull out summaries from the model object
summary_inla  <- summary(model_inla)

## Summarise results
h2_inla  <- data.frame(mean = summary_inla$summary.hyperparam["Heritability", "mean"],
                       lower = summary_inla$summary.hyperparam["Heritability", "0.025quant"],
                       upper = summary_inla$summary.hyperparam["Heritability", "0.975quant"],
                       method = "INLA",
                       stringsAsFactors = FALSE)

If we wrap this all in a loop, we can see how the estimation methods do on replicate data (full script on GitHub). Here are estimates and intervals from ten replicates (black dots show the actual heritability in the first generation):

As you can see, the MCMC and INLA estimates agree pretty well and mostly hit the mark. In the one replicate dataset where they falter, they falter together.

There is no breeder’s equation for environmental change

This post is about why heritability coefficients of human traits can’t tell us what to do. Yes, it is pretty much an elaborate subtweet.

Let us begin in a different place, where heritability coefficients are useful, if only a little. Imagine there is selection going on. It can be natural or artificial, but it’s selection the old-fashioned way: there is some trait of an individual that makes it more or less likely to successfully reproduce. We’re looking at one generation of selection: there is one parent generation, some of which reproduce and give rise to the offspring generation.

Then, if we have a well-behaved quantitative trait, no systematic difference between the environments that the two generations experience (also, no previous selection; this is one reason I said ‘if only a little’), we can get an estimate of the response to selection, that is how the mean of the trait will change between the generations:

R = h^2S

R is the response. S, the selection differential, is the difference between the mean all of the parental generation and the selected parents, and thus measures the strength of the selection. h2 is the infamous heritability, which measures the accuracy of the selection.

That is, the heritability coefficient tells you how well the selection of parents reflect the offspring traits. A heritability coefficient of 1 would mean that selection is perfect; you can just look at the parental individuals, pick the ones you like, and get the whole selection differential as a response. A heritability coefficient of 0 means that looking at the parents tells you nothing about what their offspring will be like, and selection thus does nothing.

Conceptually, the power of the breeder’s equation comes from the mathematical properties of selection, and the quantitative genetic assumptions of a linear parent–offspring relationship. (If you’re a true connoisseur of theoretical genetics or a glutton for punishment, you can derive it from the Price equation; see Walsh & Lynch (2018).) It allows you to look (one generation at a time) into the future only because we understand what selection does and assume reasonable things about inheritance.

We don’t have that kind machinery for environmental change.

Now, another way to phrase the meaning of the heritability coefficient is that it is a ratio of variances, namely the additive genetic variance (which measures the trait variation that runs in families) divided by the total variance (which measures the total variation in the population, duh). This is equally valid, more confusing, and also more relevant when we’re talking about something like a population of humans, where no breeding program is going on.

Thus, the heritability coefficient is telling us, in a specific highly geeky sense, how much of trait variation is due to inheritance. Anything we can measure about a population will have a heritability coefficient associated with it. What does this tell us? Say, if drug-related crime has yay big heritability, does that tell us anything about preventing drug-related crime? If heritability is high, does that mean interventions are useless?

The answers should be evident from the way I phrased those rhetorical questions and from the above discussion: There is no theoretical genetics machinery that allows us to predict the future if the environment changes. We are not doing selection on environments, so the mathematics of selection don’t help us. Environments are not inherited according to the rules of quantitative genetics. Nothing prevents a trait from being eminently heritable and respond even stronger to changes in the environment, or vice versa.

(There is also the argument that quantitative genetic modelling of human traits matters because it helps control for genetic differences when estimating other factors. One has to be more sympathetic towards that, because who can argue against accurate measurement? But ought implies can. For quantitative genetic models to be better, they need to solve the problems of identifying variance components and overcoming population stratification.)

Much criticism of heritability in humans concern estimation problems. These criticisms may be valid (estimation is hard) or silly (of course, lots of human traits have substantial heritabilities), but I think they miss the point. Even if accurately estimated, heritabilities don’t do us much good. They don’t help us with the genetic component, because we’re not doing breeding. They don’t help us with the environmental component, because there is no breeder’s equation for environmental change.

Journal club of one: ‘The heritability fallacy’

Public debate about genetics often seems to centre on heritability and on psychiatric and mental traits, maybe because we really care about our minds, and because for a long time heritability was all human geneticists studying quantitative traits could estimate. Here is an anti-heritabililty paper that I think articulates many of the common grievances: Moore & Shenk (2016) The heritability fallacy. The abstract gives a snappy summary of the argument:

The term ‘heritability,’ as it is used today in human behavioral genetics, is one of the most misleading in the history of science. Contrary to popular belief, the measurable heritability of a trait does not tell us how ‘genetically inheritable’ that trait is. Further, it does not inform us about what causes a trait, the relative influence of genes in the development of a trait, or the relative influence of the environment in the development of a trait. Because we already know that genetic factors have significant influence on the development of all human traits, measures of heritability are of little value, except in very rare cases. We, therefore, suggest that continued use of the term does enormous damage to the public understanding of how human beings develop their individual traits and identities.

At first glance, this paper should be a paper for me. I tend to agree that heritability estimates of human traits aren’t very useful. I also agree that geneticists need to care about the interpretations of their claims beyond the purely scientific domain. But the more I read, the less excited I became. The paper is a list of complaints about heritability coefficients. Some are more or less convincing. For example, I find it hard to worry too much about the ‘equal environments assumption’ in twin studies. But sure, it’s hard to identify variance components, and in practice, researchers sometimes restort to designs that are a lot iffier than twin studies.

But I think the main thrust of the paper is this huge overstatement:

Most important of all is a deep flaw in an assumption that many people make about biology: That genetic influences on trait development can be separated from their environmental context. However, contemporary biology has demonstrated beyond any doubt that traits are produced by interactions between genetic and nongenetic factors that occur in each moment of developmental time … That is to say, there are simply no such things as gene-only influences.

There certainly is such a thing as additive genetic variance as well as additive gene action. This passage only makes sense to me if ‘interaction’ is interpreted not as a statistical term but as describing a causal interplay. If so, it is perfectly true that all traits are the outcomes of interplay between genes and environment. It doesn’t follow that genetic variants in populations will interact with variable environments to the degree that quantitative genetic models are ‘nonsensical in most circumstances’.

They illustrate with this parable: Billy and Suzy are filling a bucket. Suzy is holding the hose and Billy turns on the tap. How much of the water is due to Billy and how much is due to Suzy? The answer is supposed to be that the question makes no sense, because they are both filling the bucket through a causal interplay. Well. If they’re filling a dozen buckets, and halfway through, Billy opens the tap half a turn more, and Suzy starts moving faster between buckets, because she’s tired of this and wants lunch … The correct level of analysis for the quantitative bucketist isn’t Billy, Suzy and the hose. It is the half-turn of the tap and Suzy’s moving of the nozzle.

The point is that quantitative genetic models describe variation between individuals. The authors know this, of course, but they write as if genetic analysis of variance is some kind of sleight of hand, as if quantitative genetics ought to be about development, and the fact that it isn’t is a deliberate obfuscation. Here is how they describe Jay Lush’s understanding of heritability:

The intention was ‘to quantify the level of predictability of passage of a biologically interesting phenotype from parent to offspring’. In this way, the new technical use of ‘heritability’ accurately reflected that period’s understanding of genetic determinism. Still, it was a curious appropriation of the term, because—even by the admission of its proponents—it was meant only to represent how variation in DNA relates to variation in traits across a population, not to be a measure of the actual influence of genes on the development of any given trait.

I have no idea what position Lush took on genetic determinism. But we can find the context of heritability by looking at the very page before in Animal breeding plans. The definition of the heritability coefficient occurs on page 87. This is how Lush starts the chapter on page 86:

In the strictest sense of the word, the question of whether a characteristic is hereditary or environmental has no meaning. Every characteristic is both hereditary and environmental, since it is the end result of a long chain of interactions of the genes with each other, with the environment and with the intermediate products at each stage of development. The genes cannot develop the characteristic unless they have the proper environment, and no amount of attention to the environment will cause the characteristc to develop unless the necessary genes are present. If either the genes or the environment are changed, the characteristic which results from their interactions may be changed.

I don’t know — maybe the way quantitative genetics has been used in human behavioural and psychiatric genetics invites genetic determinism. Or maybe genetic determinism is one of those false common-sense views that are really hard to unlearn. In any case, I don’t think it’s reasonable to put the blame on the concept of heritability for not being some general ‘measure of the biological inheritability of complex traits’ — something that it was never intended to be, and cannot possibly be.

My guess is that new debates will be about polygenic scores and genomic prediction. I hope that will be more useful.

Literature

David S. Moore & David Shenk (2016) The heritability fallacy

Jay Lush Animal breeding plans. Online at: https://archive.org/details/animalbreedingpl032391mbp/page/n99

Journal club of one: ”Maternal and additive gentic effects contribute to variation in offspring traits in a lizard”

The posts this week have been about epigenetics. However, let’s step back from the molecular mechanisms and what not to look at the bigger picture. This recent paper by Noble, McFarlane, Keogh and Whiting (2014) looks at maternal effects and additive genetic effects on fitness-related traits in a lizard. Now we are in quantitative genetics territory where one uses pedigrees and phenotypes to look at the determinants of a trait while abstracting away the mechanistic details. Nowadays, quantitative genetics is also equipped with Bayesian animal models and the ability to do parentage assignment with molecular methods.

The authors measured at size, body mass, and growth and as well as the speed and endurance when running. The fun part is that while only endurance had a substantial heritability (0.4), the other traits had maternal components in the 0.2-0.5 range. So for most of the traits there’s little heritability while a big chunk of the trait variance is explained by maternal effects.

Comments:

I like the idea to include maternal traits to see look at what causes the maternal effect. Clutch size, maternal size and condition seem matter for some trait or another. In two cases the maternal effect is entirely explained away: the effect on growth by birth date and clutch size, and sprint speed by birth date.

The inferences come from an animal model that include a maternal effect. Something I’m curious about is how heritability would be overestimated if the maternal component was not accounted for. That is beside the point of the paper, though.

Another interesting point: I think everyone who deals with animals in some type of controlled environment wonder about how much our measurements differ from what would’ve been measured in a more natural environment. In this case, the authors measured offspring growth both in the test environment and in an enclosure. They find a maternal effect in the test environment, while the interval for the heritability goes from almost zero to 0.5. In the wilder environment they estimate very little genetic and maternal variance, as well as a larger residual variance. I don’t know if this is just because of increased noise, or because maternal effects actually interact with condition.

Also, I love figure 1 (the one figure). If more papers had caterpillar plots of most important estimated quantities, the world would be a better place.

Literature

Noble, D. W., McFarlane, S. E., Keogh, J. S., & Whiting, M. J. (2014). Maternal and additive genetic effects contribute to variation in offspring traits in a lizard. Behavioral Ecology, aru032.