Virtual animal breeding journal club: ”Structural equation models to disentangle the biological relationship between microbiota and complex traits …”

The other day was the first Virtual breeding and genetics journal club organised by John Cole. This was the first online journal club I’ve attended (shocking, given how many video calls I’ve been on for other sciencey reasons), so I thought I’d write a little about it: both the format and the paper. You can look at the slide deck from the journal club here (pptx file).

The medium

We used Zoom, and that seemed to work, as I’m sure anything else would if everyone just mutes their microphone when they aren’t speaking. As John said, the key feature of Zoom seems to be the ability for the host to mute everyone else. During the call, I think we were at most 29 or so people, but only a handful spoke. Turn-taking will probably get more intense if more people want to speak.

The format

John started the journal club with a code of conduct, which I expect helped to set what I felt was a good atmosphere. In most journal clubs I’ve been in, I feel like the atmosphere has been pretty good, but I think we’ve all heard stories about hyper-critical and hostile journal clubs, and that doesn’t sound particularly fun or useful. On that note, one of the authors, Oscar González-Recio, was on the call and answered some questions.

The paper

Saborío‐Montero, Alejandro, et al. ”Structural equation models to disentangle the biological relationship between microbiota and complex traits: Methane production in dairy cattle as a case of study.” Journal of Animal Breeding and Genetics 137.1 (2020): 36-48.

The authors measured methane emissions (by analysing breath with an infrared gas monitor) and the abundance of different microbes in the rumen (with Nanopore sequencing) in dairy cows. They also genotyped the animals to estimate relatedness.

They analysed the genetic relationship between breath methane and abundance of each taxon of microbe, individually, with either:

  • a bivariate animal model;
  • a structural equations model that allows for a causal effect of abundance on methane, capturing the assumption that the abundance of a taxon can affect the methane emission, but not the other way around.

They used these models to estimate the heritability of each abundance and the genetic correlation between methane and each abundance. In the case of the structural model, they also estimated the effect of that taxon’s abundance on methane, conditional on the assumed causal model.
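To make the difference between the two models a bit more concrete, here is a small simulation sketch. It assumes a simple recursive formulation in which methane depends on abundance through a structural coefficient (called `lam` here) plus its own direct genetic effect. All parameter values are hypothetical and not taken from the paper. Under that assumption, the genetic covariance that a bivariate model sees between abundance and methane mixes the causal path (`lam` times the genetic variance of abundance) with the covariance between the direct genetic effects. The recursive model tries to separate these two parts.

```python
import math
import random


def simulate_genetic_values(n, lam, var_a, var_m, cov_am, seed=1):
    """Simulate genetic values under a hypothetical recursive model.

    u_a: genetic value for taxon abundance
    m_direct: direct genetic value for methane (not acting via abundance)
    Total genetic value for methane = lam * u_a + m_direct.
    """
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 covariance matrix of the direct genetic effects
    l11 = math.sqrt(var_a)
    l21 = cov_am / l11
    l22 = math.sqrt(var_m - l21 ** 2)
    u_a, total_m = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        a = l11 * z1
        m_direct = l21 * z1 + l22 * z2
        u_a.append(a)
        total_m.append(lam * a + m_direct)
    return u_a, total_m


def covariance(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical values: abundance lowers methane (lam < 0), while the direct
# genetic effects are mildly positively correlated.
lam, var_a, var_m, cov_am = -0.5, 1.0, 1.0, 0.3
u_a, total_m = simulate_genetic_values(100_000, lam, var_a, var_m, cov_am)

# The total genetic covariance, which is what a bivariate model targets
expected_cov = lam * var_a + cov_am
print(round(covariance(u_a, total_m), 2), expected_cov)
print("direct genetic correlation:", cov_am / math.sqrt(var_a * var_m))
print("total genetic correlation:", round(correlation(u_a, total_m), 2))
```

With these made-up numbers, the direct genetic correlation is positive but the total one is negative. This is one way the two models could, in principle, give estimates of different sign.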

My thoughts

It’s cool how there’s a literature building up on genetic influences on the microbiome, with some consistency across studies. These intense high-tech studies on relatively few cattle might build up to finding new traits and proxies that can go into larger scale phenotyping for breeding.

As the title suggests, the paper advocates for using the structural equations model: ”Genetic correlation estimates revealed differences according to the usage of non‐recursive and recursive models, with a more biologically supported result for the recursive model estimation.” (Conclusions)

While I agree that, a priori, it makes sense to assume a structural equations model with a causal structure, I don’t think the results provide much evidence that it’s better. The estimates of heritabilities and genetic correlations from the two models are nearly indistinguishable. Here is the key figure, Figure 4, comparing genetic correlation estimates:

[Figure 4 from Saborío‐Montero & al (2020): genetic correlation estimates from the non-recursive and recursive models]

As you can see, there are a couple of genetic correlations where the point estimate switches sign. For one of them (Succinivibrio sp.), the credible intervals don’t overlap. ”Recursive” is the structural equations model, and the error bars are 95% credible intervals. This is not strong evidence of anything, and the authors are appropriately cautious and don’t try to interpret the difference. But let us speculate! They write:

All genera in this case, excepting Succinivibrio sp. from the Proteobacteria phylum, resulted in overlapped genetic correlations between the non‐recursive bivariate model and the recursive model. However, high differences were observed. Succinivibrio sp. showed the largest disagreement changing from positively correlated (0.08) in the non‐recursive bivariate model to negatively correlated (−0.20) in the recursive model.

Succinivibrio is also the taxon with the largest estimated inhibitory effect on methane (from the structural equations model):

While some taxa, such as ciliate protozoa or Methanobrevibacter sp., increased the CH4 emissions …, others such as Succinivibrio sp. from Proteobacteria phylum decreased it

The paper that first described these bacteria (Bryant & Small 1956) shows that Succinivibrio were originally isolated from the cattle rumen. Their name comes from the fact that ”they ferment glucose with the production of a large amount of succinic acid”. Bryant & Small did a fermentation experiment to see what came out, and it seems that the bacteria don’t produce methane:

[Table 2 from Bryant & Small (1956): fermentation products of Succinivibrio]

This is also in line with an rRNA sequencing study of high- and low-methane-emitting cows (Wallace & al 2015). That study found lower Succinivibrio abundance in high methane emitters.

We may speculate that Succinivibrio species could be involved in diverting energy away from methanogens, and thus reducing methane emissions. If that is true, then the structural equations model estimate might be better than the one from the animal model. That estimate is a larger negative genetic correlation between Succinivibrio abundance and methane.

Finally, I’m on board with the a priori argument for using a structural equations model. Still, as with other applications of causal modelling (gene networks, Mendelian randomisation etc), it might be dangerous to consider only parts of the system independently when the microbes are likely to have causal effects on each other.

Literature

Saborío‐Montero, Alejandro, et al. ”Structural equation models to disentangle the biological relationship between microbiota and complex traits: Methane production in dairy cattle as a case of study.” Journal of Animal Breeding and Genetics 137.1 (2020): 36-48.

Wallace, R. John, et al. ”The rumen microbial metagenome associated with high methane production in cattle.” BMC genomics 16.1 (2015): 839.

Bryant, Marvin P., and Nola Small. ”Characteristics of two new genera of anaerobic curved rods isolated from the rumen of cattle.” Journal of bacteriology 72.1 (1956): 22.

Journal club of one: ”Genome-wide association of foraging behavior in Drosophila melanogaster fails to support large-effect alleles at the foraging gene” (preprint)

This preprint was posted on bioRxiv and Haldane’s sieve. It tells the story of one of the best known genetic variants affecting behaviour, the foraging gene in Drosophila melanogaster. for is still a nice example of a large-effect variant causing (developmentally) pleiotropic effects. However, Turner & al present evidence questioning whether for has any substantial effect in natural populations of flies. I think it’s self-evident why I’m interested.

They look at previous evidence for foraging as a quantitative trait gene in flies sampled from natural populations. They also perform genome-wide association and population genetic tests with 35 DGRP lines, finding nothing at the for locus.

Comments:

(Since this is a preprint, I will feel free to suggest what I think could be improvements to the manuscript. Obviously, these are just my opinions.)

I’m not convinced one can really distinguish a unimodal from a bimodal distribution with 36 data points. Below are a few histograms simulated from a mixture of two normal distributions, where 25 samples are ”rovers” and 11 are ”sitters”.

[Histograms of data simulated from a mixture of two normal distributions: 25 ”rovers” and 11 ”sitters”]

For fun, I also tested for normality with the Shapiro-Wilk test, as the authors did, and about half of 1000 tests reject. My histograms should not be overinterpreted. I just generated two normal distributions with means log10(2.66) and log10(1.3) and standard deviations of 0.1, and I don’t know the actual standard deviations of the forS and forR reference strains. Of course, when the standard deviation is small enough, the distributions clearly separate and the Shapiro-Wilk test will reject.
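For anyone who wants to repeat the exercise, here is a sketch of the simulation in Python. It isn’t necessarily the code behind the histograms above. It draws 25 ”rovers” and 11 ”sitters” from normal distributions with means log10(2.66) and log10(1.3) and a standard deviation of 0.1, as described. It then counts how often `scipy.stats.shapiro` rejects normality at the 5% level. It assumes NumPy and SciPy are installed, and the seed and number of replicates are arbitrary choices.

```python
import numpy as np
from scipy import stats


def simulate_mixture(rng, n_rover=25, n_sitter=11, sd=0.1):
    # Two normal components on the log10 scale, as described in the text
    rovers = rng.normal(np.log10(2.66), sd, n_rover)
    sitters = rng.normal(np.log10(1.3), sd, n_sitter)
    return np.concatenate([rovers, sitters])


def shapiro_rejection_rate(n_tests=1000, alpha=0.05, sd=0.1, seed=2014):
    """Fraction of simulated samples where Shapiro-Wilk rejects normality."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_tests):
        sample = simulate_mixture(rng, sd=sd)
        _, p_value = stats.shapiro(sample)
        if p_value < alpha:
            rejections += 1
    return rejections / n_tests


print(shapiro_rejection_rate())          # sd = 0.1, as above
print(shapiro_rejection_rate(sd=0.05))   # tighter components separate clearly
```

Shrinking `sd` separates the two components, and the rejection rate climbs towards one.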

Power is difficult, but in this case the authors are looking at a well-known effect. Given the literature and the difference between the reference strains, they should be able to postulate some reasonable effect sizes and make sure that they’re actually powered to detect them. 35 individuals is not much for a GWAS. They may still have good power to detect an effect of the size expected at for, at least in the single-point test, but it would be nice to demonstrate it. Power feels particularly pertinent because the authors claim to find evidence of absence. The same should apply to the population genetic tests, though it’s probably harder to know what effects to expect there.
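As an illustration of the kind of check I mean, here is a back-of-the-envelope power calculation for a single-marker test that compares mean phenotype between lines carrying the two alleles. It uses a normal approximation, and a t-test would be somewhat more conservative with so few lines. The allele split, the effect sizes and the genome-wide threshold below are hypothetical placeholders. They are not estimates from the preprint or the foraging literature.

```python
from statistics import NormalDist


def two_group_power(effect_sd, n1, n2, alpha):
    """Approximate power of a two-sided two-sample z-test.

    effect_sd: difference in group means, in phenotypic standard deviations
    n1, n2: number of lines carrying each allele
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    # Standardised mean difference divided by its standard error
    ncp = effect_sd * (n1 * n2 / (n1 + n2)) ** 0.5
    return z.cdf(ncp - crit) + z.cdf(-ncp - crit)


# Hypothetical: 35 lines split 20/15 between the two alleles
for effect in (0.5, 1.0, 2.0):
    single = two_group_power(effect, 20, 15, alpha=0.05)
    genome_wide = two_group_power(effect, 20, 15, alpha=5e-8)
    print(f"effect {effect} SD: power {single:.2f} (alpha 0.05), "
          f"{genome_wide:.2f} (alpha 5e-8)")
```

Plugging in effect sizes based on the difference between the reference strains would show whether a null result at for is informative.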

The authors discuss alternative interpretations, and mention that in their hands the reference strains did not travel nearly as far as in previous experiments. How likely is it, though, that the variant is segregating in the populations previously sampled but not in Raleigh?

Literature

Thomas Turner, Christopher C Giauque, Daniel R Schrider, Andrew D Kern. (2014) Genome-wide association of foraging behavior in Drosophila melanogaster fails to support large-effect alleles at the foraging gene. Preprint on bioRxiv. doi: 10.1101/004325

Journal club of one: ”Functionally enigmatic genes: a case study of the brain ignorome”

This recent paper, Pandey & al (2014), made me interested because I’m in the business of finding genes for traits, and have spent quite some time looking at lists of gene names and annotation database output. One is tempted to look for the ”outstanding candidates” that ”make biological sense” (quotes intended as scare quotes), but the truth is probably that no-one knows what genes and functions we should expect to be affected by genetic variation in, for instance, behaviour. This paper tries to make the case for the unknown parts of the brain transcriptome; they use data about gene expression, protein domains, paralogs and literature to argue that the unknown genes are unknown for no good reason and that they might be just as important as genes that happen to be well-known.

They found genes that had a high ratio of expression in brain to average expression in other tissues of C57BL/6J and DBA/2J mice, and searched PubMed for these genes in combination with neuroscience-related keywords. Those with few citations make up their set of selectively expressed but little-studied genes. They then make a series of comparisons between these and well-studied genes. It turns out that the only major difference is that well-studied genes were discovered (entered into GenBank) earlier.
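The selection step can be sketched in a few lines. This is my own simplified reading and not the authors’ pipeline. The gene names, expression values, citation counts and both thresholds are made up, and ”other tissues” is just a list of values per gene.

```python
from statistics import mean

# Hypothetical expression values (arbitrary units) and PubMed hit counts
expression = {
    "GeneA": {"brain": 40.0, "other": [2.0, 4.0, 3.0]},
    "GeneB": {"brain": 10.0, "other": [8.0, 12.0, 10.0]},
    "GeneC": {"brain": 25.0, "other": [1.0, 2.0, 2.0]},
}
citations = {"GeneA": 350, "GeneB": 5, "GeneC": 2}

RATIO_THRESHOLD = 5.0    # hypothetical cut-off for "brain-selective"
CITATION_THRESHOLD = 10  # hypothetical cut-off for "little studied"


def brain_ratio(values):
    # Ratio of brain expression to mean expression in the other tissues
    return values["brain"] / mean(values["other"])


selective = {g for g, v in expression.items() if brain_ratio(v) >= RATIO_THRESHOLD}
little_studied = sorted(g for g in selective if citations[g] < CITATION_THRESHOLD)
well_studied = sorted(g for g in selective if citations[g] >= CITATION_THRESHOLD)
print(little_studied, well_studied)
```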

Comments:

I don’t know to what extent these results are surprising. I was not surprised by their main conclusion, but then again, maybe my opinion was mostly prejudice. There is research on biases in the functional genomics literature, but I don’t know much about it. Apparently neither did the authors, initially, as Robert Williams writes in a comment on the PLOS ONE website:

We did not rediscover the lovely work of Robert Hoffmann (now head of WikiGene) until the paper had been submitted in succession to six higher profile journals … Hoffmann and colleagues showed that social factors account for much of the annotation imbalance for genes.

I love the idea of authors writing an informal comment about the background of the paper like this.

The coexpression network results show that some of the little-known genes are just as connected as known important genes. This suggests that some of the unknown genes might be important too, if we can trust that coexpression hub genes are likely to be important (for various values of ”important”). Maybe this is a scientific opportunity for some neuroscientist. Several people I’ve talked with have imagined future Big Science initiatives to describe the function of unknown genes (”divide them up between labs and characterise them!”), and some initiatives exist, such as the IMPC. On the other hand, how do we know that we really find the most important and interesting functions of a gene? The skeptic in me thinks that going bottom-up, from gene to phenotype, will miss the most interesting and surprising phenotypes.
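One simple way to define ”connectedness” is the degree of each gene in a thresholded coexpression network. This is the number of other genes whose expression correlates with it above some cut-off. The sketch below does exactly that on made-up expression profiles. The threshold is arbitrary, and I’m not claiming it matches the network construction used in the paper.

```python
import math
from itertools import combinations


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpression_degree(profiles, threshold=0.8):
    """For each gene, count the genes with |correlation| above threshold."""
    degree = {gene: 0 for gene in profiles}
    for g1, g2 in combinations(profiles, 2):
        if abs(pearson(profiles[g1], profiles[g2])) > threshold:
            degree[g1] += 1
            degree[g2] += 1
    return degree


# Hypothetical expression profiles across five samples
profiles = {
    "known_hub": [1, 2, 3, 4, 5],
    "little_known": [2, 4, 6, 8, 10],
    "anticorrelated": [5, 4, 3, 2, 1],
    "noise": [3, 1, 4, 1, 5],
}
print(coexpression_degree(profiles))
```

Here the hypothetical little-known gene ends up with the same degree as the hypothetical known hub. This is the kind of comparison the argument rests on.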

I think ”ignorome” is one of those unnecessary bad omics words, which is why I’ve avoided using it.

Their PubMed query was restricted to mouse, human and rat. I wonder why. Maybe there could be something useful from fruit flies or roundworms?

Overall, a fun paper that I recommend reading over a few cups of coffee!

Literature

Pandey AK, Lu L, Wang X, Homayouni R, Williams RW (2014) Functionally Enigmatic Genes: A Case Study of the Brain Ignorome. PLoS ONE 9(2): e88889. doi:10.1371/journal.pone.0088889

Journal club of one: ”Parental olfactory experience influences behavior and neural structure in subsequent generations”

Okay, neither chickens nor genetics, really, but a little epigenetic inheritance. Dias & Ressler in Nature Neuroscience:

When an odor (acetophenone) that activates a known odorant receptor (Olfr151) was used to condition F0 mice, the behavioral sensitivity of the F1 and F2 generations to acetophenone was complemented by an enhanced neuroanatomical representation of the Olfr151 pathway.

Meaning that the offspring of conditioned mice score higher in an odour potentiated startle test (more about that below) and avoid the odour at a lower concentration in an aversion test. They also have more neurons expressing that odorant receptor in their olfactory epithelium and bulb. These were counted by beta-galactosidase staining in transgenic mice expressing M71, the product of Olfr151, coupled to LacZ.

Furthermore,

Bisulfite sequencing of sperm DNA from conditioned F0 males and F1 naive offspring revealed CpG hypomethylation in the Olfr151 gene. In addition, in vitro fertilization, F2 inheritance and cross-fostering revealed that these transgenerational effects are inherited via parental gametes.

That is, they detect a difference in methylation in one CpG dinucleotide in the 3′ region of the gene.

Comments:

First, I love how the journal does exactly the thing I like to see with figures: below each figure is a link that leads to a data file with the underlying data!

Olfactory behaviour is not my thing, so the tests are new to me, but I’m a bit puzzled by the way they calculate the results from the odour potentiated startle tests. The point is to test whether the presence of the odour makes the mice react more strongly to a noise. After playing the sound 15 times without odour, they perform ten trials with odour plus sound and ten trials with sound only. To calculate the score, however, they use only the difference between the first odour-plus-sound trial and the last sound-only trial. They divide this by the mouse’s response to the last of the first 15 sounds. Maybe this is standard, but why throw away the trials in between?
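To make the comparison concrete, here is a sketch of the score as I read the description, next to an alternative that averages all ten trials of each kind. The startle amplitudes are invented and the function names are my own. I may also be misreading the exact definition in the methods.

```python
from statistics import mean


def single_trial_score(habituation, odour_trials, sound_trials):
    """Score as described: (first odour+sound trial - last sound-only trial),
    divided by the response to the last of the habituation sounds."""
    return (odour_trials[0] - sound_trials[-1]) / habituation[-1]


def all_trials_score(habituation, odour_trials, sound_trials):
    """Alternative: use the mean of all ten trials of each kind."""
    return (mean(odour_trials) - mean(sound_trials)) / habituation[-1]


# Invented startle amplitudes for one mouse (arbitrary units)
habituation = [100.0] * 14 + [80.0]  # 15 sound-only habituation trials
odour_trials = [120.0, 95.0, 90.0, 100.0, 85.0, 92.0, 88.0, 94.0, 91.0, 87.0]
sound_trials = [90.0, 85.0, 88.0, 84.0, 86.0, 83.0, 87.0, 82.0, 85.0, 70.0]

print(single_trial_score(habituation, odour_trials, sound_trials))
print(all_trials_score(habituation, odour_trials, sound_trials))
```

With these invented numbers, a strong first odour trial and a weak last sound-only trial make the single-trial score roughly five times larger than the averaged one. This is why discarding the intermediate trials seems risky.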

It is not only the odour potentiated startle and sensitivity tests, but also the staining results. Again, this is not my area, but the results all seem to point to increased sensitivity in the offspring of the treated animals. They react more strongly in the startle test and at a lower concentration in the avoidance test. They (in this case, the transgenic mice) also have more neurons expressing M71. The cross-fostering, and the fact that the males were treated but not the females, point to genuine inheritance. So, how does the treatment get into the germline? It has to cross that boundary and enter the sperm somehow. Unless there is some mysterious way for information from the central nervous system to travel to the testis, acetophenone must affect spermatogenesis as well as the olfactory neurons.

All this is very hypothetical, so a little skepticism is not surprising. Gonzalo Otazu wrote in a comment on the Nature news webpage:

The statistical tests in the paper, both for the behavioral measurements as well as for the size of the M71 glomeruli , use as n, number of samples, the number of F1 and F2 individuals. This would be fine if the individuals were actually independent samples. However, they arise from a presumably small number of FO males. The numbers of FO males are not given in the paper. This is a major concern given that there is a lot of variability in the levels of expression of olfactory receptors in these mice that might be inheritable …

I think this is a good point, but it will not be solved by adjusting the degrees of freedom of the test, as the comment later suggests. From the F1 generation onward, any genetic differences between the treatment groups will show up as bias. That is, there is a systematic difference that might be bigger or smaller than the treatment effect, and go in the same or the opposite direction; we don’t know. However, the bias should not be there every time, and not always in the same direction. It therefore strengthens the authors’ case that they’ve done the treatment at least twice (with C57BL/6J and with M71-LacZ mice, if not more times).
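A quick simulation shows why the unit of replication matters here. It is a sketch under made-up assumptions. There are three F0 males per group, each with ten offspring, a heritable sire effect as large as the within-family variation, and no treatment effect at all. Treating offspring as independent and using a simple z-test (an approximation chosen for brevity) then gives far more ”significant” differences than the nominal 5%.

```python
import random
from statistics import NormalDist, mean, variance


def simulate_group(rng, n_sires, n_offspring, sire_sd, within_sd):
    # Each offspring inherits its sire's (random) genetic value
    values = []
    for _ in range(n_sires):
        sire_effect = rng.gauss(0, sire_sd)
        values.extend(sire_effect + rng.gauss(0, within_sd) for _ in range(n_offspring))
    return values


def naive_false_positive_rate(n_reps=1000, n_sires=3, n_offspring=10,
                              sire_sd=1.0, within_sd=1.0, alpha=0.05, seed=13):
    """Rejection rate of a naive z-test when there is no treatment effect."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_reps):
        # No treatment effect: both groups come from the same distribution
        treated = simulate_group(rng, n_sires, n_offspring, sire_sd, within_sd)
        control = simulate_group(rng, n_sires, n_offspring, sire_sd, within_sd)
        se = (variance(treated) / len(treated) + variance(control) / len(control)) ** 0.5
        if abs(mean(treated) - mean(control)) / se > crit:
            hits += 1
    return hits / n_reps


print(naive_false_positive_rate())             # well above 0.05
print(naive_false_positive_rate(sire_sd=0.0))  # close to 0.05 without family effects
```

Within any single simulated experiment, the sire effects act as a fixed offset between the groups. They only average out across independent replicates with new founders.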

Maybe my preference for genetics is showing, but I feel the big unaddressed alternative hypothesis in most transgenerational effects experiments is cryptic heritability. If you divide individuals into two groups, treat one of them and look for treatment effects in the offspring, you need to be sure that there are no genetic differences between the founders of the two groups. In the subsequent generations, genetic and non-genetic inheritance will be confounded by design.

Again, randomisation and replication will help, but to be really sure, maybe one could use founders of known relatedness to create a mixed population. For example, take founders from full-sibships and split them equally between treatment groups, allowing segregation to randomise the genotypes of the next generation. The methods don’t say, and the authors might even have done something like this. One could even use a genetic mixed model that includes relatedness to estimate treatment effects in the presence of a genetic effect. I suspect this experiment would require a much larger sample size, which means more time, work and animals. But I also believe that many would find confounding genetic variation more plausible than transgenerational epigenetic effects of unknown mechanism.
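The allocation step of such a design is simple enough to sketch. This assumes hypothetical full-sib families of founders, with even sizes in the example. Each family is shuffled and split evenly between the treatment and control groups, so any family-level genetic effect is balanced across groups by construction. The function and family names are illustrative only.

```python
import random


def split_full_sibships(families, seed=None):
    """Split each full-sib family equally between treatment and control.

    families: dict mapping a family id to a list of founder ids.
    In families with an odd number of members, the extra animal goes to a
    randomly chosen group.
    """
    rng = random.Random(seed)
    groups = {"treatment": [], "control": []}
    for members in families.values():
        members = list(members)
        rng.shuffle(members)
        half = len(members) // 2
        if rng.random() < 0.5:
            first, second = "treatment", "control"
        else:
            first, second = "control", "treatment"
        groups[first].extend(members[:half])
        groups[second].extend(members[half:])
    return groups


# Hypothetical founder families
families = {
    "fam1": ["m1", "m2", "m3", "m4"],
    "fam2": ["m5", "m6"],
    "fam3": ["m7", "m8", "m9", "m10"],
}
print(split_full_sibships(families, seed=1))
```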

Literature

Brian G Dias & Kerry J Ressler (2013) Parental olfactory experience influences behavior and neural structure in subsequent generations. Nature Neuroscience. doi:10.1038/nn.3594

Journal club of one: ”Short copy number variations potentially associated with tonic immobility response in newly hatched chicks”

(‘Journal club of one’ will be quick notes on papers, probably mostly about my favourite topics — genetics and the noble chicken.)

Abe, Nagao & Inoue-Murayama (2013) recently published this paper in PLOS ONE about copy number variants and tonic immobility in two kinds of domestic chicken. This obviously interests me for several reasons. First, I’m working on the genetic basis of some traits in the chicken. Second, tonic immobility is a fun and strange behaviour: how it works and whether it has any adaptive importance is pretty much unknown, but it is a classic from the chicken literature. Third, the authors use QTL regions derived directly from the F2 generation of the cross that I’m working on (we’ve published one paper so far on the F8 generation).

Results: They use arrays and qPCR to search for copy number variants in three regions on chromosome 1 in two breeds (White Leghorn and Nagoya, a Japanese breed). After quite a bit of filtering, they end up with a few variants that differ between the breeds. The breeds also differ in their tonic immobility behaviour. Leghorns go into tonic immobility after three attempts on average and lie still for 75 s, while Nagoya take 4.5 attempts and lie still for 100 s on average. But the copy number variants were not associated with tonic immobility attempts or duration within breeds, so there is not really any evidence that they affect tonic immobility behaviour.

Comments:

Apart from the issue that the regions (more than 60 Mb) will contain lots of other variants, we do not know whether these regions affect tonic immobility behaviour in these breeds in the first place. The intercross that the QTL come from is a wild by domestic Red Junglefowl x White Leghorn cross. Nagoya seems a very interesting breed that is distant from White Leghorn, but it is not junglefowl. As for the Leghorn side of the experiments, I wouldn’t be surprised if White Leghorns bred at a Swedish research institute and at a Japanese research institute differed quite a bit. The breed differences in tonic immobility are not necessarily due to the genetic variants identified in this particular cross. This is especially so since behaviour is probably very polygenic, and an F2 QTL study by necessity only scratches the surface.

In the discussion, the authors bring up power. There were 71 Nagoya and 39 White Leghorn individuals, and the experiment might be unable to reliably detect associations within the breeds. That does seem likely, but making a good informed guess about the expected effect is not so easy. A hint could come from looking at the effect sizes in the QTL study, but there is no guarantee that genetic background will not affect them. I don’t really know where this calculation comes from: ”Sample sizes would need to be increased more than 20-fold over the current study design”. Maybe it is 11 tested copy number variants times two breeds? To me, that seems both overly optimistic and overly pessimistic. It is optimistic because it assumes that the entire breed difference would be due to these three QTL on chromosome 1. It is pessimistic because it assumes that the three QTL would fractionate into 11 variants.
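The scaling behind such an estimate is easy to sketch. Consider a test of a single variant explaining a fraction r2 of phenotypic variance. A standard approximation for the required sample size is n ≈ (z_alpha + z_power)^2 × (1 − r2) / r2, where z_alpha and z_power are standard normal quantiles. Splitting the same total effect evenly across k variants therefore multiplies the required sample size by roughly k. The variance explained, alpha and power below are hypothetical placeholders, multiple testing is ignored, and this is not a reconstruction of the authors’ actual calculation.

```python
from statistics import NormalDist


def required_n(r2, alpha=0.05, power=0.8):
    """Approximate sample size to detect a variant explaining r2 of variance."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) ** 2 * (1 - r2) / r2


total_r2 = 0.05  # hypothetical variance explained by all variants together
for k in (1, 3, 11):
    n = required_n(total_r2 / k)
    print(f"{k:>2} variant(s): about {n:.0f} animals per test")
```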

Finally, with all the diversity in the chicken, there’s certainly a place for both within- and between-population studies of various chickens with all kinds of genomic methods! Comparing breeds with different selection histories should be very interesting for distinguishing early ‘domestication QTL’ from ‘productivity QTL’ selected under modern chicken breeding. And I wish somebody would figure out a little more about how tonic immobility works.

Literature

Abe H, Nagao K, Inoue-Murayama M (2013) Short Copy Number Variations Potentially Associated with Tonic Immobility Responses in Newly Hatched Chicks. PLoS ONE 8(11): e80205. doi:10.1371/journal.pone.0080205