Theory in genetics

A couple of years ago, Brian Charlesworth published this essay about the value of theory in Heredity. He liked the same Sturtevant & Beadle quote that I liked.

Two outstanding geneticists, Alfred Sturtevant and George Beadle, started their splendid 1939 textbook of genetics (Sturtevant and Beadle 1939) with the remark ‘Genetics is a quantitative subject. It deals with ratios, and with the geometrical relationships of chromosomes. Unlike most sciences that are based largely on mathematical techniques, it makes use of its own system of units. Physics, chemistry, astronomy, and physiology all deal with atoms, molecules, electrons, centimeters, seconds, grams—their measuring systems are all reducible to these common units. Genetics has none of these as a recognizable component in its fundamental units, yet it is a mathematically formulated subject that is logically complete and self contained’.

This statement may surprise the large number of contemporary workers in genetics, who use high-tech methods to analyse the functions of genes by means of qualitative experiments, and think in terms of the molecular mechanisms underlying the cellular or developmental processes, in which they are interested. However, for those who work on transmission genetics, analyse the genetics of complex traits, or study genetic aspects of evolution, the core importance of mathematical approaches is obvious.

Maybe this comes as a surprise to some molecularly minded biologists; I doubt those working adjacent to a field called ”biophysics” or trying to understand what on Earth a ”t-distributed stochastic neighbor embedding” does to turn single-cell sequences into colourful blobs will have missed that there are quantitative aspects to genetics.

Anyways, Sturtevant & Beadle (and Charlesworth) are thinking of another kind of quantitation: they don’t just mean that maths is useful to geneticists; they are thinking of genetics as a particular kind of abstract science with its own concepts. It’s the distinction between viewing genetics as chemistry and genetics as symbols. In this vein, Charlesworth makes the distinction between statistical estimation and mathematical modelling in genetics, and goes on to give examples of the latter through an anecdotal history of models of genetic variation, eventually going deeper into linkage disequilibrium. It’s a fun read, but it doesn’t really live up to the title by spelling out actual arguments for mathematical models, other than the observation that they have been useful in population genetics.

The hypothetical recurring reader will know this blog’s position on theory in genetics: it is useful, not just for theoreticians. Consequently, I agree with Charlesworth that formal modelling in genetics is a good thing, and that there is (and ought to be more of) constructive interplay between data and theory. I like that he suggests that mathematical models don’t even have to be that sophisticated to be useful; even if you’re not a mathematician, you can sometimes improve your understanding by doing some sums. He then takes that back a little by telling a joke about how John Maynard Smith’s paper on hitch-hiking was so difficult that only two researchers in the country could be smart enough to understand it. The point still stands. I would add that this applies to even simpler models than I suspect that Charlesworth had in mind. Speaking from experience, a few pseudo-random draws from a binomial distribution can sometimes clear your head about a genetic phenomenon, and while this probably won’t amount to any great advances in the field, it might save you days of fruitless faffing.
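As a minimal sketch of what such head-clearing binomial draws can look like, here is genetic drift at one locus, simulated by repeated binomial sampling of allele copies. The function name `drift`, the population sizes and the starting frequency are all made up for illustration; it assumes a diploid population of constant size with no selection or mutation.

```python
import random

def drift(p0, n_individuals, n_generations, rng):
    """Track an allele frequency under pure drift by binomial sampling."""
    n_copies = 2 * n_individuals  # diploid: two allele copies per individual
    p = p0
    trajectory = [p]
    for _ in range(n_generations):
        # One binomial draw: each of the 2N copies in the next generation
        # is the focal allele with probability p.
        count = sum(rng.random() < p for _ in range(n_copies))
        p = count / n_copies
        trajectory.append(p)
    return trajectory

rng = random.Random(1)  # fixed seed so a run can be repeated
for n in (10, 1000):
    # Final frequencies of five replicate populations after 20 generations
    finals = [drift(0.5, n, 20, rng)[-1] for _ in range(5)]
    print(n, [round(p, 2) for p in finals])
```

Comparing the replicates for the small and the large population gives a feel for how much allele frequencies can wander by chance alone, which is often enough to judge whether a pattern needs any further explanation.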

As it happens, I also recently read this paper (Robinaugh et al. 2020) about the value of formal theory in psychology, and in many ways, it makes explicit some things that Charlesworth’s essay doesn’t spell out, but I think implies: We want our scientific theories to explain ”robust, generalisable features of the world” and represent the components of the world that give rise to those phenomena. Formal models, expressed in precise languages like maths and computational models, are preferable to verbal models, which express the structure of a theory in words, because these precise languages make it easier to deduce what behaviour of the target system the model implies. Charlesworth and Robinaugh et al. don’t perfectly agree. For one thing, Robinaugh et al. seem to suggest that a good formal model should be able to generate fake data that can be compared to empirical data summaries and give explanations of computational models, while Charlesworth seems to view simulation as an approximation one sometimes has to resort to.

However, something that occurred to me while reading Charlesworth’s essay was the negative framing of why theory is useful. This is how Charlesworth recommends mathematical modelling in population genetic theory, by approvingly repeating this James Crow quote:

I hope to have provided evidence that the mathematical modelling of population genetic processes is crucial for a proper understanding of how evolution works, although there is of course much scope for intuition and verbal arguments when carefully handled (The Genetical Theory of Natural Selection is full of examples of these). There are many situations in which biological complexity means that detailed population genetic models are intractable, and where we have to resort to computer simulations, or approximate representations of the evolutionary process such as game theory to produce useful results, but these are based on the same underlying principles. Over the past 20 years or so, the field has moved steadily away from modelling evolutionary processes to developing statistical tools for estimating relevant parameters from large datasets (see Walsh and Lynch 2017 for a comprehensive review). Nonetheless, there is still plenty of work to be done on improving our understanding of the properties of the basic processes of evolution.

The late, greatly loved, James Crow used to say that he had no objection to graduate students in his department not taking his course on population genetics, but that he would like them to sign a statement that they would not make any pronouncements about evolution. There are still many papers published with confused ideas about evolution, suggesting that we need a ‘Crow’s Law’, requiring authors who discuss evolution to have acquired a knowledge of basic population genetics.

This is one of the things I prefer about Robinaugh et al.’s account: To them, theory is not mainly about clearing up confusion and wrongness, but about developing ideas by checking their consistency with data, and exploring how they can be modified to be less wrong. And when we follow Charlesworth’s anecdotal history of linked selection, it can be read as sketching a similar path. It’s not a story about some people knowing ”basic population genetics” and being in the right, and others not knowing it and being confused (even if that surely happens also); it’s about a refinement of models in the face of data, and probably vice versa.

If you listen to someone talking about music theory, or literary theory, they will often defend themselves against the charge that theory drains their domain of the joy and creativity. Instead, they will argue that theory helps you appreciate the richness of music, and gives you tools to invent new and interesting music. You stay ignorant of theory at your own peril, not because you risk doing things wrong, but because you risk doing uninteresting rehashes, not even knowing what you’re missing. Or something like that. Adam Neely (”Why you should learn music theory”, YouTube video) said it better. Now, the analogy is not perfect, because the relationship between empirical data and theory in genetics is such that the theory really does try to say true or false things about the genetics in a way that music theory (at least as practiced by music theory YouTubers) does not. I still think there is something to be said for theory as a tool for creativity and enjoyment in genetics.


Charlesworth, B. (2019). In defence of doing sums in genetics. Heredity, 123(1), 44-49.

Robinaugh, D., Haslbeck, J., Ryan, O., Fried, E. I., & Waldorp, L. (2020). Invisible hands and fine calipers: A call to use formal theory as a toolkit for theory construction. Paper has since been published in a journal, but I read the preprint.

The word ”genome”

The sources I’ve seen attribute the coinage of ”genome” to botanist Hans Winkler (1920, p. 166).

The pertinent passage goes:

Ich schlage vor, für den haploiden Chromosomensatz, der im Verein mit dem zugehörigen Protoplasma die materielle Grundlage der systematischen Einheit darstellt den Ausdruck: das Genom zu verwenden …

I propose using the expression ”the genome” for the haploid set of chromosomes, which together with the protoplasm it belongs with makes up the material basis of the systematic unit …

That’s good, but why did Winkler need this term in the first place? In this chapter, he is dealing with the relationship between chromosome number and mode of reproduction. Of course, he’s going to talk about hybridization and ploidy, and he needs some terms to bring order to the mess. He goes on to coin a couple of other concepts that I had never heard of:

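… und Kerne, Zellen und Organismen, in denen ein gleichartiges Genom mehr als einmal in jedem Kern vorhanden ist, homogenomatisch zu nennen, solche dagegen, die verschiedenartige Genome im Kern führen, heterogenomatisch.

… and to call nuclei, cells and organisms in which a genome of the same kind is present more than once in each nucleus homogenomatic, and those that instead carry genomes of different kinds in the nucleus heterogenomatic.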

So, a homogenomic organism has more than one copy of the same genome in its nuclei, while a heterogenomic organism has genomes of different kinds. He also suggests you could count the genomes, di-, tri- up to polygenomic organisms. He says that this is a different thing from polyploidy, which is when an organism has multiples of a haploid chromosome set. Winkler’s example: A hybrid between a diploid species with 10 chromosomes and another diploid species with 16 chromosomes might have 13 chromosomes (5 from one parent plus 8 from the other) and be polygenomic but not polyploid.

These terms don’t seem to have stuck as much, but I found them used here and there, for example in papers on bananas (Arvanitoyannis et al. 2008) and cotton (Brown & Menzel 1952); cooking bananas are heterogenomic.

This only really makes sense in cases with recent hybridisation, where you can trace different chromosomes to origins in different species. You need to be able to trace parts of the hybrid genome of the banana to genomes of other species. Otherwise, the genome of the banana is just the genome of the banana.

Analogously, we also find polygenomes in this cancer paper (Navin et al. 2010):

We applied our methods to 20 primary ductal breast carcinomas, which enable us to classify them according to whether they appear as either monogenomic (nine tumors) or polygenomic (11 tumors). We define ”monogenomic” tumors to be those consisting of an apparently homogeneous population of tumor cells with highly similar genome profiles throughout the tumor mass. We define ”polygenomic” tumors as those containing multiple tumor subpopulations that can be distinguished and grouped by similar genome structure.

This makes sense; if a tumour has clones of cells in it with a sufficiently rearranged genome, maybe it is fair to describe it as a tumour with different genomes. It raises the question of what counts as ”sufficiently” different for something to be a different genome.

How much difference can there be between sequences that are supposed to count as the same genome? In everything above, we have taken a kind of typological view: there is a genome of an individual, or a clone of cells, that can be thought of as one entity, despite the fact that every copy of it, in every different cell, is likely to have subtle differences. Philosopher John Dupré (2010), in ”The Polygenomic Organism”, questions what we mean by ”the genome” of an organism. How can we talk about an organism having one genome or another, when in fact, every cell in the body goes through mutation (actually, Dupré spends surprisingly little time on somatic mutation but more on epigenetics, but makes a similar point), sometimes chimerism, sometimes programmed genome rearrangements?

The genome is related to types of organism by attempts to find within it the essence of a species or other biological kind. This is a natural, if perhaps naïve, interpretation of the idea of the species ‘barcode’, the use of particular bits of DNA sequence to define or identify species membership. But in this paper I am interested rather in the relation sometimes thought to hold between genomes of a certain type and an individual organism. This need not be an explicitly essentialist thesis, merely the simple factual belief that the cells that make up an organism all, as a matter of fact, have in common the inclusion of a genome, and the genomes in these cells are, barring the odd collision with a cosmic ray or other unusual accident, identical.

Dupré’s answer is that there probably isn’t a universally correct way to divide living things into individuals, and what concept of individuality one should use really depends on what one wants to do with it. I take this to mean that it is perfectly fine to gloss over real biological details, but that we need to keep in mind that they might unexpectedly start to matter. For example, when tracing X chromosomes through pedigrees, it might be fine to ignore that X-inactivation makes female mammals functionally mosaic–until you start looking at the expression of X-linked traits.

Photo of calico cat in Amsterdam by SpanishSnake (CC0 1.0). See, I found a reason to put in a cat picture!

Finally, the genome exists not just in the organism, but also in the computer, as sequences, maps and obscure bioinformatics file formats. Arguably, keeping the discussion above in mind, the genome only exists in the computer, as a scientific model of a much messier biology. Szymanski, Vermeulen & Wong (2019) investigate what the genome is by looking at how researchers talk about it. ”The genome” turns out to be many things to researchers. Here they are writing about what happened when the yeast genetics community created a reference genome.

If the digital genome is not assumed to be solely a representation of a physical genome, we might instead see ”the genome” as a discursive entity moving from the cell to the database but without ever removing ”the genome” from the cell, aggregating rather than excluding. This move and its inherent multiplying has consequences for the shape of the community that continues to participate in constructing the genome as a digital text. It also has consequences for the work the genome can perform. As Chadarevian (2004) observes for the C. elegans genome sequence, moving the genome from cell to database enables it to become a new kind of mapping tool …


Consequently, the informational genome can be used to manufacture coherence across knowledge generated by disparate labs by making it possible to line up textual results – often quite literally, in the case of genome sequences as alphabetic texts — and read across them.


Prior to the availability of the reference genome, such coherence across the yeast community was generated by strain sharing practices and standard protocols and notation for documenting variation from the reference strain, S288C, authoritatively embodied in living cells housed at Mortimer’s stock center. After the sequencing project, part of that work was transferred to the informational, textual yeast genome, making the practice of lining up and making the same available to those who worked with the digital text as well as those who worked with the physical cell.

And that brings us back to Winkler: What does the genome have in common? That it makes up the basis for the systematic unit, that it belongs to organisms that we recognize as closely related enough to form a systematic unit.


Winkler H. (1920) Verbreitung und Ursache der Parthenogenesis im Pflanzen- und Tierreiche.

Arvanitoyannis, Ioannis S., et al. ”Banana: cultivars, biotechnological approaches and genetic transformation.” International journal of food science & technology 43.10 (2008): 1871-1879.

Navin, Nicholas, et al. ”Inferring tumor progression from genomic heterogeneity.” Genome research 20.1 (2010): 68-80.

Brown, Meta S., and Margaret Y. Menzel. ”Polygenomic hybrids in Gossypium. I. Cytology of hexaploids, pentaploids and hexaploid combinations.” Genetics 37.3 (1952): 242.

Dupré, John. ”The polygenomic organism.” The Sociological Review 58.1_suppl (2010): 19-31.

Szymanski, Erika, Niki Vermeulen, and Mark Wong. ”Yeast: one cell, one reference sequence, many genomes?.” New Genetics and Society 38.4 (2019): 430-450.

Robertson on genetic correlation and loss of variation

It’s not too uncommon to see animal breeding papers citing a paper by Alan Robertson (1959) to support a genetic correlation of 0.8 as a cut-off point for what is a meaningful difference. What is that based on?

The paper is called ”The sampling variance of the genetic correlation coefficient” and, as the name suggests, it is about methods for estimating genetic correlations. It contains a section about the genetic correlation between environments as a way to measure gene-by-environment interaction. There, Robertson discusses experimental designs for detecting gene-by-environment interaction–that is, estimating whether a genetic correlation between different environments is less than one. He finds that you need much larger samples than for estimating heritabilities. It is in this context that the 0.8 number comes up. Here is the whole paragraph:

No interaction means a genetic correlation of unity. How much must the correlation fall before it has biological or agricultural importance? I would suggest that this figure is around 0.8 and that no experiment on genotype-environment interaction would have been worth doing unless it could have detected, as a significant deviation from unity, a genetic correlation of 0.6. In the first instance, I propose to argue from the standpoint of a standard error of 0.2 as an absolute minimum.

That is, in the context of trying to make study design recommendations for detecting genotype-by-environment interactions, Robertson suggests that a genetic correlation of 0.8 might be a meaningful difference from 1. The paper does not deal with designing breeding programs for multiple environments or the definition of traits, and it has no data on any of that. It seems to be a little bit like Fisher’s p < 0.05: Suggest a rule of thumb, and risk it having a life of its own in the future.
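The numbers in Robertson’s paragraph hang together through a simple piece of arithmetic: with a standard error of 0.2, a correlation of 0.6 sits two standard errors below unity, while 0.8 sits only one. The sketch below spells that out. The helper names are made up, and the two-sided normal p-value is an assumption for illustration only; the sampling distribution of a genetic correlation estimate near 1 need not be normal, and this is not Robertson’s own calculation.

```python
import math

def z_from_unity(r_g, standard_error):
    """Distance of a genetic correlation from 1, in standard errors."""
    return (1.0 - r_g) / standard_error

def two_sided_p(z):
    """Two-sided p-value under a normal approximation (an assumption)."""
    return math.erfc(abs(z) / math.sqrt(2))

for r_g in (0.8, 0.6):
    z = z_from_unity(r_g, 0.2)
    print(r_g, round(z, 2), round(two_sided_p(z), 3))
```

Under those assumptions, a deviation of two standard errors falls just past the conventional 5% threshold, while one standard error does not come close.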

In the process of looking up this quote, I also found this little gem, from ”The effect of selection on the estimation of genetic parameters” (Robertson 1977). It talks about the problems that arise with estimating genetic parameters in populations under selection, when many quantitative genetic results, in one way or another, depend on random mating. Here is how it ends:

This perhaps points the moral of this paper. The individuals of one generation are the parents of the next — if they are accurately evaluated and selected in the first generation, the variation between families will be reduced in the next. You cannot have your cake and eat it.


Robertson, A. ”The sampling variance of the genetic correlation coefficient.” Biometrics 15.3 (1959): 469-485.

Robertson, A. ”The effect of selection on the estimation of genetic parameters.” Zeitschrift für Tierzüchtung und Züchtungsbiologie 94.1‐4 (1977): 131-135.

Excerpts about genomics in animal breeding

Here are some good quotes I’ve come across while working on something.

Artificial selection on the phenotypes of domesticated species has been practiced consciously or unconsciously for millennia, with dramatic results. Recently, advances in molecular genetic engineering have promised to revolutionize agricultural practices. There are, however, several reasons why molecular genetics can never replace traditional methods of agricultural improvement, but instead they should be integrated to obtain the maximum improvement in economic value of domesticated populations.

Lande R & Thompson R (1990) Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics.

Smith and Smith suggested that the way to proceed is to map QTL to low resolution using standard mapping methods and then to increase the resolution of the map in these regions in order to locate more closely linked markers. In fact, future developments should make this approach unnecessary and make possible high resolution maps of the whole genome, even, perhaps, to the level of the DNA sequence. In addition to easing the application of selection on loci with appreciable individual effects, we argue further that the level of genomic information available will have an impact on infinitesimal models. Relationship information derived from marker information will replace the standard relationship matrix; thus, the average relationship coefficients that this represents will be replaced by actual relationships. Ultimately, we can envisage that current models combining few selected QTL with selection on polygenic or infinitesimal effects will be replaced with a unified model in which different regions of the genome are given weights appropriate to the variance they explain.

Haley C & Visscher P. (1998) Strategies to utilize marker–quantitative trait loci associations. Journal of Dairy Science.

Instead, since the late 1990s, DNA marker genotypes were included into the conventional BLUP analyses following Fernando and Grossman (1989): add the marker genotype (0, 1, or 2, for an animal) as a fixed effect to the statistical model for a trait, obtain the BLUP solutions for the additive polygenic effect as before, and also obtain the properly adjusted BLUE solution for the marker’s allele substitution effect; multiply this BLUE by 0, 1, or 2 (specific for the animal) and add the result to the animal’s BLUP to obtain its marker-enhanced EBV. A logical next step was to treat the marker genotypes as semi-random effects, making use of several different shrinkage strategies all based on the marker heritability (e.g., Tsuruta et al., 2001); by 2007, breeding value estimation packages such as PEST (Neumaier and Groeneveld, 1998) supported this strategy as part of their internal calculations. At that time, a typical genetic evaluation run for a production trait would involve up to 30 markers.

Knol EF, Nielsen B, Knap PW. (2016) Genomic selection in commercial pig breeding. Animal Frontiers.
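The final step of the recipe Knol et al. describe, combining the polygenic BLUP with the marker’s BLUE, is a one-line sum. Here is a minimal sketch of just that step; the function name and all numbers are hypothetical, and the BLUP and BLUE solutions are taken as given rather than estimated from a mixed model.

```python
def marker_enhanced_ebv(polygenic_blup, allele_substitution_blue, genotype):
    """Add the marker contribution to an animal's polygenic breeding value."""
    if genotype not in (0, 1, 2):
        raise ValueError("genotype must be coded as 0, 1 or 2 allele copies")
    # Marker contribution: allele count times the allele substitution effect
    return polygenic_blup + genotype * allele_substitution_blue

# Hypothetical animal with polygenic BLUP 1.5 and a marker effect of 0.25
for genotype in (0, 1, 2):
    print(genotype, marker_enhanced_ebv(1.5, 0.25, genotype))
```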

Although it has not caught the media and public imagination as much as transgenics and cloning, genomics will, I believe, have just as great a long-term impact. Because of the availability of information from genetically well-researched species (humans and mice), genomics in farm animals has been established in an atypical way. We can now see it as progressing in four phases: (i) making a broad sweep map (~20 cM) with both highly informative (microsatellite) and evolutionary conserved (gene) markers; (ii) using the informative markers to identify regions of chromosomes containing quantitative trait loci (QTL) controlling commercially important traits–this requires complex pedigrees or crosses between phenotypically and genetically divergent strains; (iii) progressing from the informative markers into the QTL and identifying trait gene(s) themselves either by complex pedigrees or back-crossing experiments, and/or using the conserved markers to identify candidate genes from their position in the gene-rich species; (iv) functional analysis of the trait genes to link the genome through physiology to the trait–the ‘phenotype gap’.

Bulfield G. (2000) Biotechnology: advances and impact. Journal of the Science of Food and Agriculture.

I believe animal breeding in the post-genomic era will be dramatically different to what it is today. There will be massive research effort to discover the function of genes including the effect of DNA polymorphisms on phenotype. Breeding programmes will utilize a large number of DNA-based tests for specific genes combined with new reproductive techniques and transgenes to increase the rate of genetic improvement and to produce for, or allocate animals to, the product line to which they are best suited. However, this stage will not be reached for some years by which time many of the early investors will have given up, disappointed with the early benefits.

Goddard M. (2003). Animal breeding in the (post-) genomic era. Animal Science.

Genetics is a quantitative subject. It deals with ratios, with measurements, and with the geometrical relationships of chromosomes. Unlike most sciences that are based largely on mathematical techniques, it makes use of its own system of units. Physics, chemistry, astronomy, and physiology all deal with atoms, molecules, electrons, centimeters, seconds, grams–their measuring systems are all reducible to these common units. Genetics has none of these as a recognizable component in its fundamental units, yet it is a mathematically formulated subject that is logically complete and self-contained.

Sturtevant AH & Beadle GW. (1939) An introduction to genetics. W.B. Saunders company, Philadelphia & London.

We begin by asking why genes on nonhomologous chromosomes assort independently. The simple cytological story rehearsed above answers the questions. That story generates further questions. For example, we might ask why nonhomologous chromosomes are distributed independently at meiosis. To answer this question we could describe the formation of the spindle and the migration of chromosomes to the poles of the spindle just before meiotic division. Once again, the narrative would generate yet further questions. Why do the chromosomes ”condense” at prophase? How is the spindle formed? Perhaps in answering these questions we would begin to introduce the chemical details of the process. Yet simply plugging a molecular account into the narratives offered at the previous stages would decrease the explanatory power of those narratives.

Kitcher, P. (1984) 1953 and all that. A tale of two sciences. Philosophical Review.

And, of course, this great quote by Jay Lush.

‘Any distinction in principle between qualitative and quantitative characters disappeared long ago’

Any distinction in principle between qualitative and quantitative characters disappeared long ago, although in the early days of Mendelism it was often conjectured that they might be inherited according to fundamentally different laws.

If it is still convenient to call some characters qualitative and others quantitative, it is only to denote that the former naturally have a discontinuous and the latter a continuous distribution, or that the former are not easily measured on a familiar metrical scale. Colors are an example. Differences between colors can be measured in terms of length of light waves, hue, brilliance etc., but most of us find it difficult to compare those measurements with our own visual impressions.

Most quantitative characters are affected by many pairs of genes and also importantly by environmental variations. It is rarely possible to identify the pertinent genes in a Mendelian way or to map the chromosomal position of any of them. Fortunately this inability to identify and describe the genes individually is almost no handicap to the breeder of economic plants or animals. What he would actually do if he knew the details about all the genes which affect a quantitative character in that population differs little from what he will do if he merely knows how heritable it is and whether much of the hereditary variance comes from dominance or overdominance, and from epistatic interactions between the genes.

(That last part might not always be true anymore, but it still remained on point for more than half the time that genetics as a discipline has existed.)

Jay L Lush (1949) Heritability of quantitative characters in farm animals

‘Hard cash paid down, over and over again’

The whole subject of inheritance is wonderful. When a new character arises, whatever its nature may be, it generally tends to be inherited, at least in a temporary and sometimes in a most persistent manner. What can be more wonderful than that some trifling peculiarity, not primordially attached to the species, should be transmitted through the male or female sexual cells, which are so minute as not to be visible to the naked eye, and afterwards through the incessant changes of a long course of development, undergone either in the womb or in the egg, and ultimately appear in the offspring when mature, or even when quite old, as in the case of certain diseases? Or again, what can be more wonderful than the well-ascertained fact that the minute ovule of a good milking cow will produce a male, from whom a cell, in union with an ovule, will produce a female, and she, when mature, will have large mammary glands, yielding an abundant supply of milk, and even milk of a particular quality?

Today is Charles Darwin’s birthday. I’m not such a serious Darwin reader, but it’s fun how it seems like you can open a Darwin book at almost any chapter and find something interesting or amusing. This is from The Variation of Animals And Plants Under Domestication, chapter twelve, ‘Inheritance’. Here we find Darwin overflowing with enthusiasm when trying to convince a sceptic about the importance of inheritance. In true Darwin style he launches into a long list of examples:

Some writers, who have not attended to natural history, have attempted to show that the force of inheritance has been much exaggerated. The breeders of animals would smile at such simplicity; and if they condescended to make any answer, might ask what would be the chance of winning a prize if two inferior animals were paired together? They might ask whether the half-wild Arabs were led by theoretical notions to keep pedigrees of their horses? Why have pedigrees been scrupulously kept and published of the Shorthorn cattle, and more recently of the Hereford breed? Is it an illusion that these recently improved animals safely transmit their excellent qualities even when crossed with other breeds? have the Shorthorns, without good reason, been purchased at immense prices and exported to almost every quarter of the globe, a thousand guineas having been given for a bull? With greyhounds pedigrees have likewise been kept, and the names of such dogs, as Snowball, Major, &c., are as well known to coursers as those of Eclipse and Herod on the turf. Even with the Gamecock, pedigrees of famous strains were formerly kept, and extended back for a century. With pigs, the Yorkshire and Cumberland breeders ”preserve and print pedigrees;” and to show how such highly-bred animals are valued, I may mention that Mr. Brown, who won all the first prizes for small breeds at Birmingham in 1850, sold a young sow and boar of his breed to Lord Ducie for 43 guineas; the sow alone was afterwards sold to the Rev. F. Thursby for 65 guineas; who writes, ”She paid me very well, having sold her produce for 300l., and having now four breeding sows from her.” Hard cash paid down, over and over again, is an excellent test of inherited superiority. In fact, the whole art of breeding, from which such great results have been attained during the present century, depends on the inheritance of each small detail of structure. 
But inheritance is not certain; for if it were, the breeder’s art would be reduced to a certainty, and there would be little scope left for that wonderful skill and perseverance shown by the men who have left an enduring monument of their success in the present state of our domesticated animals.

For the rest of the chapter, he will go on to talk about humans, again with long lists of examples, and then mixing in domestic animals and plants again. A lot of these examples of heredity surely hold up, and others seem like anecdotes. Here and even more in the following chapters–with subtitles including ‘reversion to atavism’, ‘prepotency’ and ‘on the good effects of crossing, and the evil effects of close interbreeding’–Darwin is trying hard to make sense of heredity. Why are certain features heritable? Why do they sometimes go away in the offspring but reappear in later generations? Why are offspring sometimes more like one parent than the other? In chapter 27, he will present his ‘provisional hypothesis of pangenesis’.


Darwin. 1875. The variation of animals and plants under domestication.

”These are all fairly obvious” (says Sewall Wright)

I was checking a quote from Sewall Wright, and it turned out that the whole passage was delightful. Here it is, from volume 1 of Genetics and the Evolution of Populations (pages 59-60):

There are a number of broad generalizations that follow from this netlike relationship between genome and complex characters. These are all fairly obvious but it may be well to state them explicitly.

1) The variations of most characters are affected by a great many loci (the multiple factor hypothesis).

2) In general, each gene replacement has effects on many characters (the principle of universal pleiotropy).

3) Each of the innumerable possible alleles at any locus has a unique array of differential effects on taking account of pleiotropy (uniqueness of alleles).

4) The dominance relation of two alleles is not an attribute of them but of the whole genome and of the environment. Dominance may differ for each pleiotropic effect and is in general easily modifiable (relativity of dominance).

5) The effects of multiple loci on a character in general involve much nonadditive interaction (universality of interaction effects).

6) Both ontogenetic and phylogenetic homology depend on calling into play similar chains of gene-controlled reactions under similar developmental conditions (homology).

7) The contributions of measurable characters to overall selective value usually involve interaction effects of the most extreme sort because of the usually intermediate position of the optimum grade, a situation that implies the existence of innumerable different selective peaks (multiple selective peaks).

What can we say about this?

It seems point one is true. People may argue about whether the variants behind complex traits are many, relatively common, with tiny individual effects or many, relatively rare, and with larger effects that average out to tiny effects when measured in the whole population. In any case, there are many causative variants, alright.

Point two, now also known as the omnigenic model, hinges on how you read ”in general”, I guess. In some sense, universal pleiotropy follows from genome crowding. If there are enough causative variants and a limited number of genes, eventually every gene will be associated with every trait.
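To put rough numbers on the crowding argument, here is a minimal sketch. It assumes, unrealistically, that causative variants for a trait land on genes uniformly at random and independently of each other; the function name and the gene and variant counts are made up for illustration and are not taken from Wright or from any dataset.

```python
def fraction_of_genes_hit(n_genes, n_variants):
    """Expected fraction of genes carrying at least one causative variant
    for a trait, if each variant picks a gene uniformly at random
    (an assumption made only for this sketch)."""
    # Probability that a given gene is missed by all n_variants variants.
    p_missed = (1.0 - 1.0 / n_genes) ** n_variants
    return 1.0 - p_missed


if __name__ == "__main__":
    n_genes = 20_000  # hypothetical round number of genes
    for n_variants in (1_000, 10_000, 100_000):
        frac = fraction_of_genes_hit(n_genes, n_variants)
        print(f"{n_variants:>7} variants: {frac:.3f} of genes hit")
```

Under these assumptions the fraction of genes touched by a trait creeps towards one as the number of causative variants grows, which is all the crowding argument needs. Real variants are of course not placed uniformly, so this is only an illustration of the logic.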

I don’t think that point three is true. I would assume that many loss of function mutations to protein coding genes, for example, would be interchangeable.

I don’t really understand points six and seven, about homology and fitness landscapes, that well. The later section about homology reads to me as if it could be part of a debate going on at the time. Number seven describes Wright’s view of natural selection as a kind of fitness whack-a-mole, where if a genotype is fit in one dimension, it probably loses in some other. The hypothesis and the metaphor have been extremely influential — I think largely because many people thought that it was wrong in many different ways.

Points four and five are related and, I imagine, the most controversial of the list. Why does Wright say that there is universal epistasis? Because of physiological genetics. Or, in modern parlance, maybe because of gene networks and systems biology. On page 71, he puts it like this:

Interaction effects necessarily occur with respect to the ultimate products of chains of metabolic processes in which each step is controlled by a different locus. This carries with it the implication that interaction effects are universal in the more complex characters that trace such processes.

The argument seems to persist to this day, and I think it is true. On the other hand, there is the question how much this matters to the variants that actually segregate in a given population and affect a given trait.

This is often framed as a question of variance. It turns out that even with epistatic gene action, in many cases, most of the genetic variance is still additive (Mäki-Tanila & Hill 2014, Huang & Mackay 2016). But something similar must apply to the effects that you will see from a locus. They also depend on the allele frequencies at other loci. An interaction does nothing when one of the interaction partners is fixed. If they are nearly fixed, it will do nearly nothing. If they’re all at intermediate frequency, things become more interesting.
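The dependence on allele frequency can be sketched with a toy two-locus model. This is my own illustration, not the analysis in either of the cited papers: it assumes two biallelic loci in Hardy-Weinberg proportions and linkage equilibrium, and a purely multiplicative genotypic value (the product of the allele counts at the two loci), so that gene action is entirely interactive. The function names are hypothetical.

```python
from itertools import product


def hwe(p):
    """Hardy-Weinberg genotype frequencies for 0, 1, 2 copies of an allele."""
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]


def variance_components(p1, p2, value=lambda n1, n2: n1 * n2):
    """Return (additive, non-additive) genetic variance for two loci.

    Assumes Hardy-Weinberg proportions and linkage equilibrium, so the
    two-locus genotype frequencies are products of single-locus ones.
    The default genotypic value n1 * n2 is a toy interaction model.
    """
    f1, f2 = hwe(p1), hwe(p2)
    cells = [(n1, n2, f1[n1] * f2[n2])
             for n1, n2 in product(range(3), repeat=2)]
    mean = sum(w * value(n1, n2) for n1, n2, w in cells)
    total = sum(w * (value(n1, n2) - mean) ** 2 for n1, n2, w in cells)

    additive = 0.0
    for locus, p in enumerate((p1, p2)):
        var_count = 2.0 * p * (1.0 - p)  # variance of the allele count
        if var_count == 0.0:
            continue  # a fixed locus contributes no variance
        # Covariance between allele count at this locus and genotypic value;
        # cov / var_count is the average effect of an allele substitution.
        cov = sum(w * ((n1, n2)[locus] - 2.0 * p) * (value(n1, n2) - mean)
                  for n1, n2, w in cells)
        additive += cov ** 2 / var_count
    return additive, total - additive


if __name__ == "__main__":
    for p2 in (0.5, 0.9, 0.99, 1.0):
        add, nonadd = variance_components(0.5, p2)
        print(f"p2 = {p2}: additive {add:.4f}, non-additive {nonadd:.4f}")
```

In this toy model, with both loci at frequency 0.5 the additive variance is 1.0 and the interaction variance 0.25, so most of the variance is additive even though the gene action is pure interaction. As the second locus approaches fixation, the interaction variance shrinks towards zero and the first locus looks like it has a constant effect.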

Wright’s principle of universal interaction is also grounded in his empirical work. A lot of space in this book is devoted to results from pigmentation genetics in guinea pigs, which includes lots of dominance and interaction. It could be that Wright was too quick to generalise from guinea pig coat colours to other traits. It could be that working in a system consisting of inbred lines draws your attention to nonlinearities that are rare and marginal in the source populations. On the other hand, it’s in these systems we can get a good handle on the dominance and interaction that may be missed elsewhere.

Study of effects in combination indicates a complicated network of interacting processes with numerous pleiotropic effects. There is no reason to suppose that a similar analysis of any character as complicated as melanin pigmentation would reveal a simpler genetic system. The inadequacy of any evolutionary theory that treats genes as if they had constant effects, favourable or unfavourable, irrespective of the rest of the genome, seems clear. (p. 88)

I’m not that well versed in pigmentation genetics, but I hope that someone is working on this. In an era where we can identify the molecular basis of classical genetic variants, I hope that someone keeps track of all these A, C, P, Q etc, and to what extent they’ve been mapped.


Wright, Sewall. ”Genetics and the Evolution of Populations” Volume 1 (1968).

Mäki-Tanila, Asko, and William G. Hill. ”Influence of gene interaction on complex trait variation with multilocus models.” Genetics 198.1 (2014): 355-367.

Huang, Wen, and Trudy FC Mackay. ”The genetic architecture of quantitative traits cannot be inferred from variance component analysis.” PLoS genetics 12.11 (2016): e1006421.


Yours truly outside the library on Thomas Bayes’ road, incredibly happy with having found the book.

”Forskaren är fri” (”The researcher is free”)

Political ideologies
the philosophy of misery
the cliques of the establishment
but the researcher is free
dogmatic religious sects
the sorcery of science
the effects of materialism
but the researcher is free

Kjell Höglund, Forskaren är fri

You don’t really even need to know that Kjell Höglund has written books with some sort of esoteric content. Listening to the lyrics is enough to understand that the researcher in this case is not an academic researcher. But still.


”Made obvious by our use of contraceptives”

I recently reread part of The Selfish Gene. The introduction to the 30th anniversary edition is great fun. For one thing, Dawkins expresses doubts about the word ”selfish” in the title, and ponders whether he should have called it the Immortal or Cooperative gene instead. That feels very ironic, and I for one think that he made the right choice. It also contains this nugget:

Our brains have evolved to a point where we are capable of rebelling against our selfish genes. The fact that we can do so is made obvious by our use of contraceptives. The same principle can and should work on a larger scale.

E. O. Wilson and B. F. Skinner

E.O. Wilson: This is going to be a conversation that I will have with B.F. Skinner. This is Ed Wilson. He invited me to talk about sociobiology. Our relations have always been very friendly and I look forward to it. This should be an interesting talk this Thursday morning.
B.F. Skinner: We will start with a basic statement. I assume that you are what I call a behaviorist. You would accept that an organism is a biophysical and biochemical system, a product of evolution.
E.O. Wilson: I am.

B.F. Skinner: That would include not only genetic behavior, but also the kinds of behavior that can be learned because of genetic processes. Of course it (behavior) always goes back to genetics.

Naour, P (2009) E. O. Wilson and B. F. Skinner. A dialogue between sociobiology and radical behaviorism. New York: Springer.