What is a locus, anyway?

”Locus” is one of those confusing genetics terms (its meaning, not just its pronunciation). We can probably all agree with a dictionary and with Wikipedia that it means a place in the genome, but a place of what, and in what sense? We also use place-related words like ”site” and ”region” that one might think were synonymous with it, but don’t seem to be.

For example, we can look at this relatively recent preprint (Chebib & Guillaume 2020) about a model of the causes of genetic correlation. They have pairs of linked loci in which each locus affects one trait (that’s the tight linkage condition), a set of loci that affect both traits (the pleiotropic condition), correlated Gaussian stabilising selection, and different levels of mutation, migration and recombination between the linked pairs. In the model, a mutation means adding a number to the effect of an allele.

This means that loci in this model can have a large number of alleles with quantitatively different effects. The alleles at a locus share a distribution of mutation effects, which can be either two-dimensional (with pleiotropy) or one-dimensional. They also share the same recombination rate with every other locus, and that rate is constant.
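To make this kind of locus concrete, here is a minimal sketch of a continuum-of-alleles mutation step of the sort described. It assumes Gaussian mutation effects, and the names MUTATION_SD, MUTATIONAL_CORR, mutate_pleiotropic and mutate_single_trait are hypothetical illustrations, not taken from the preprint. The point is only that each mutation produces a new, quantitatively different allele, so the number of possible alleles is unbounded.

```python
import numpy as np

# Hypothetical parameters chosen for illustration only; they are not the
# values used by Chebib & Guillaume.
MUTATION_SD = 0.1        # standard deviation of a mutation's effect on a trait
MUTATIONAL_CORR = 0.5    # correlation between a pleiotropic mutation's effects

rng = np.random.default_rng(1)


def mutate_pleiotropic(effects, rng):
    """Return a new allele: old effects on two traits plus a bivariate Gaussian draw."""
    cov = MUTATION_SD**2 * np.array([[1.0, MUTATIONAL_CORR],
                                     [MUTATIONAL_CORR, 1.0]])
    return effects + rng.multivariate_normal(np.zeros(2), cov)


def mutate_single_trait(effect, rng):
    """Return a new allele: old effect on one trait plus a univariate Gaussian draw."""
    return effect + rng.normal(0.0, MUTATION_SD)


# Mutate the same ancestral allele (effects 0, 0) 1000 independent times.
alleles = np.array([mutate_pleiotropic(np.zeros(2), rng) for _ in range(1000)])

# Every new allele has a distinct effect, so the number of alleles is unbounded.
print(len(np.unique(alleles[:, 0])))
# The effects on the two traits are correlated, roughly at MUTATIONAL_CORR.
print(np.corrcoef(alleles[:, 0], alleles[:, 1])[0, 1])
```

In the tight linkage condition, a pair would instead consist of two single-trait loci, each mutated with something like mutate_single_trait, so any correlation between the traits has to come from linkage disequilibrium within the pair rather than from the mutations themselves.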

What kind of DNA sequence can have these properties? Single-nucleotide sites are out of the question, as they can have four alleles, or maybe five if you count a deletion. Larger structural variants, such as inversions or allelic series of indels, might work. A protein-coding gene taken as a unit could have a huge number of different alleles, but mutations at different sites within it would probably have different distributions of effects, and the sites would differ (if only slightly) in their genetic distance to other loci.

It seems to me that we’re talking about an abstract group of potential alleles that have sufficiently similar effects and that are sufficiently closely linked. This is fine; I’m not saying this to criticise the model, but to explore how strange a locus really is.

They find that linkage gives less genetic correlation than pleiotropy unless the mutation rate is high, which leads to a discussion of mutation rates. Their reasoning about the mutation rate of a locus illustrates the issue:

A high rate of mutation (10⁻³) allows for multiple mutations in both loci in a tightly linked pair to accumulate and maintain levels of genetic covariance near to that of mutations in a single pleiotropic locus, but empirical estimations of mutation rates from varied species like bacteria and humans suggests that per-nucleotide mutation rates are in the order of 10⁻⁸ to 10⁻⁹ … If a polygenic locus consists of hundreds or thousands of nucleotides, as in the case of many quantitative trait loci (QTLs), then per-locus mutation rates may be as high as 10⁻⁵, but the larger the locus the higher the chance of recombination between within-locus variants that are contributing to genetic correlation. This leads us to believe that with empirically estimated levels of mutation and recombination, strong genetic correlation between traits are more likely to be maintained if there is an underlying pleiotropic architecture affecting them than will be maintained due to tight linkage.
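The arithmetic in that quote is easy to check. If each of n nucleotides mutates independently at rate μ, the chance that a locus gets at least one new mutation is 1 − (1 − μ)ⁿ ≈ nμ, so 1,000 nucleotides at 10⁻⁸ gives about 10⁻⁵. A minimal sketch, assuming a recombination rate of 10⁻⁸ per base pair per meiosis (roughly 1 cM/Mb, an illustrative value only; real rates vary a lot between species and regions), shows that the chance of a crossover inside the locus grows with its length in the same way. The function name prob_at_least_one is hypothetical.

```python
import math

# Per-nucleotide mutation rates from the quote above.
MUTATION_RATES = (1e-9, 1e-8)
# ASSUMED recombination probability per base pair per meiosis (about 1 cM/Mb),
# for illustration only; real rates vary between species and genomic regions.
RECOMBINATION_PER_BP = 1e-8


def prob_at_least_one(rate_per_unit, n_units):
    """Probability of at least one event among n_units independent units.

    Computes 1 - (1 - rate)^n in a numerically stable way; for small rates
    this is close to n * rate.
    """
    return -math.expm1(n_units * math.log1p(-rate_per_unit))


for length in (100, 1_000, 10_000):
    mutation = [prob_at_least_one(mu, length) for mu in MUTATION_RATES]
    # At least one crossover somewhere between the first and last nucleotide.
    crossover = prob_at_least_one(RECOMBINATION_PER_BP, length - 1)
    print(f"{length:>6} bp: per-locus mutation rate {mutation[0]:.1e} to "
          f"{mutation[1]:.1e}, within-locus recombination {crossover:.1e}")
```

Under these assumptions both quantities scale roughly linearly with length, which is the trade-off the authors point to: a bigger locus mutates more often but also holds together less well.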

I don’t know if it’s me or the authors who are conceptually confused here. If they are referring to QTL mapping, it is true that the quantitative trait loci we detect in mapping studies are often huge. ”Thousands of nucleotides” is being generous to mapping studies: in many cases, we’re talking millions of them. But the size of a QTL region from a mapping experiment doesn’t tell us how many of its nucleotides matter to the trait. It reflects our poor resolution in delineating the causative variant or variants that give rise to the association signal. That being said, it might be possible to use tricks like saturation mutagenesis to figure out which mutations within a relevant region could affect a trait. Then we could actually observe a locus in the above sense.
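One way to see why mapped regions come out so wide: in a simple F2 cross between two lines fixed for different alleles (an assumption made for the sake of the example), the allele count at a marker is correlated with the allele count at a nearby causative variant with correlation 1 − 2r, where r is the recombination fraction between them. With Haldane’s map function, r = ½(1 − e^(−2d)) for a distance of d Morgans, so a marker 10 cM away still has a correlation of about 0.82 with the causative variant and picks up much of the same signal. The sketch tabulates this for a few arbitrary distances; the function names are hypothetical.

```python
import math


def haldane_r(distance_morgans):
    """Haldane map function: recombination fraction from map distance, no interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def f2_correlation(distance_morgans):
    """Correlation between allele counts at a marker and a causative variant in an F2.

    Assumes the two founder lines are fixed for different alleles at both loci,
    in which case the correlation is 1 - 2r.
    """
    return 1.0 - 2.0 * haldane_r(distance_morgans)


for cm in (1, 5, 10, 20, 40):
    print(f"marker {cm:>2} cM from the causative variant: "
          f"correlation {f2_correlation(cm / 100):.2f}")
```

The width of the association peak is thus set by how few crossovers the mapping population has seen, not by how many nucleotides in the region actually affect the trait.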

Another recent theoretical preprint (Chantepie & Chevin 2020) phrases it like this:

[N]ote that the nature of loci is not explicit in this model, but in any case these do not represent single nucleotides or even genes. Rather, they represent large stretches of effectively non-recombining portions of the genome, which may influence the traits by mutation. Since free recombination is also assumed across these loci (consistent with most previous studies), the latter can even be thought of as small chromosomes, for which mutation rates of the order to 10⁻² seem reasonable.
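Taking the per-nucleotide rates of 10⁻⁸ to 10⁻⁹ from the first quote at face value, a per-locus mutation rate of 10⁻² corresponds to something like one to ten million nucleotides, which fits the description of these loci as small chromosomes. This is back-of-the-envelope arithmetic using the small-rate approximation U ≈ nμ, and it ignores any variation in mutation rate along the sequence; sites_needed is a hypothetical helper name.

```python
def sites_needed(locus_rate, per_site_rate):
    """Approximate number of sites whose combined mutation rate is locus_rate.

    Uses the small-rate approximation locus_rate ≈ n_sites * per_site_rate.
    """
    return locus_rate / per_site_rate


for mu in (1e-9, 1e-8):
    print(f"per-nucleotide rate {mu:g}: about {sites_needed(1e-2, mu):,.0f} nucleotides")
```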


Chebib and Guillaume. ”Pleiotropy or linkage? Their relative contributions to the genetic correlation of quantitative traits and detection by multi-trait GWA studies.” bioRxiv (2019): 656413.

Chantepie and Chevin. ”How does the strength of selection influence genetic correlations?” bioRxiv (2020).

Popular/scientific talk about chicken combs tomorrow

I completely forgot to advertise this, but tomorrow at four o’clock I’m giving a short talk about the chicken comb as part of Linköping University Library’s Fängslande forskning på femton minuter (”Captivating research in fifteen minutes”). I’ll use the comb, which is a sexual ornament in chickens, as an example to explain how we try to untangle the genetic basis of differences between domestic and wild chickens. The words ”Even Charles Darwin …” will be uttered. There will also be environmental technology, medical technology and thin-film physics. I assume it will all be fun, but I know that Anette Karlsson’s research on muscles in the MRI scanner alone would be worth a visit.


From my halftime seminar

A couple of weeks ago I presented my halftime seminar at IFM Biology, Linköping University. The halftime at our department isn’t a particularly dramatic event, but it means that after you’ve been going for two and a half years (a typical Swedish PhD programme is four years of research plus 20% teaching, five years in total), you get to talk about what you’ve been up to and discuss it with an invited opponent. I talked about combining genetic mapping and gene expression to search for quantitative trait genes for chicken domestication traits, and about the work done so far, particularly on relative comb mass. To give my esteemed readers an overview of what my project is about, here are a few of my slides about the mapping work, which is described in detail in Johnsson et al. (2012). Yes, it does feel very good to write that; shout-outs to all the coauthors! This is partly what I said at the seminar and partly digression more suited to the blog format. Enjoy!

Slide 4 (Photo: Dominic Wright)

The common theme of my PhD project is genetic mapping and genetical genomics in an experimental intercross of wild and domestic chickens. The photo shows some of them as chicks. Since plumage colour is one of the things that segregate in this cross, their feathers actually make a very nice illustration of what is going on. We’re interested in traits that differ between wild and domestic chickens, so we use a cross based on a Red Junglefowl male and three domestic White Leghorn females. Their offspring have been mated with each other for several generations, giving rise to what is called an advanced intercross line. Genetic variants that cause differences between White Leghorn and Red Junglefowl chickens will segregate among the birds of the cross, and are mixed by recombination at meiosis. At a given part of the genome, some of the birds carry the Red Junglefowl variant and some carry the White Leghorn variant. By measuring traits that vary in the cross, and genotyping the birds for a map of genetic markers, we can find chromosomal chunks that are associated with particular traits, i.e. regions of the genome that we’re reasonably confident harbour a variant affecting the trait. These chromosomal chunks tend to be rather large, though, and contain several genes. My job is to use gene expression measurements from the cross to help zero in on the right genes.
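Why keep intercrossing for several generations instead of mapping in the F2? Every generation adds meioses, so linked chunks are broken into smaller pieces. Here is a sketch of the standard calculation under simplifying assumptions that do not hold exactly for this cross (founders fixed for different alleles, random mating in a large population, no selection): after the F2, linkage disequilibrium between two loci with recombination fraction r shrinks by a factor 1 − r per generation, giving an accumulated recombination fraction r_t = ½[1 − (1 − 2r)(1 − r)^(t − 2)] in generation F_t. The 5 cM distance and the generation numbers are hypothetical examples, not properties of the actual intercross.

```python
import math


def haldane_r(distance_morgans):
    """Haldane map function: recombination fraction from map distance."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def ail_recombination(r, generation):
    """Accumulated recombination fraction between two loci in generation F_t.

    Assumes founder lines fixed for different alleles, random mating in a
    large population and no selection. Linkage disequilibrium then shrinks by
    a factor (1 - r) each generation after the F2.
    """
    if generation < 2:
        raise ValueError("defined from the F2 generation onwards")
    return 0.5 * (1.0 - (1.0 - 2.0 * r) * (1.0 - r) ** (generation - 2))


r = haldane_r(0.05)  # two loci 5 cM apart: a hypothetical example distance
for t in (2, 4, 8, 16):
    r_t = ail_recombination(r, t)
    print(f"F{t}: recombination fraction {r_t:.3f}, "
          f"marker-variant correlation {1 - 2 * r_t:.2f}")
```

The faster the marker-variant correlation falls off with distance, the more sharply the association signal is concentrated around the causative variant, which is why an advanced intercross line gives narrower regions than an F2.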

The post continues below the fold!