Philip Ball, who is a knowledgeable and thoughtful science writer, published a piece in the Guardian a couple of months ago about the misunderstood legacy of the human genome project: ”20 years after the human genome was first sequenced, dangerous gene myths abound”.
The human genome project published the draft reference genome for the human species 20 years ago. Ball argues, in short, that the project was oversold with promises that it couldn’t keep, and consequently has not delivered on them. Instead, the genome project was good for other things that had more to do with technology development and scientific infrastructure. The sequencing of the human genome was the platform for modern genome science, but it didn’t, for example, cure cancer or uncover a complete set of instructions for building a human.
He also argues that the rhetoric of human genome hype, which did not end with the promotion of the human genome project (see the ENCODE robot punching cancer in the face, for example), is harmful. It is scientifically harmful because it oversimplifies modern genetics, and it is politically harmful because it aligns well with genetic determinism and scientific racism.
I believe Ball is entirely right about this.
The breathless hype around the human genome project was embarrassing. Ball quotes some fragments, but you can go to the current human genome project site and enjoy quotes like ”it’s a transformative textbook of medicine, with insights that will give health care providers immense new powers to treat, prevent and cure disease”. This image has some metonymical truth to it (human genomics is helping medical science in different ways), but even as a metaphor, it is obviously false. You can go look at the human reference genome if you want, and you will discover that the ”text”, such as it is, looks more like this than a medical textbook:
This is a human Alu element from chromosome 17. It’s also in an intron of a gene, flanking a promoter, a few hundred base pairs away from an insulator (see the Ensembl genome browser) … None of that can be read from the sequence alone. You might be able to tell that it’s an Alu element if you’re an Alu genius or run sequence recognition software, but there is no way to read the other contextual genomic information, and there is no way you can tell anything about human health by reading it.
I think Ball is right that this is part of a simplistic genetics that doesn’t appreciate the complexity of either quantitative or molecular genetics. In short, quantitative genetics, as a framework, says that inheritance of traits between relatives is due to thousands and thousands of genetic differences, each with a tiny effect. Molecular genetics says that each of those genetic differences may operate through any of a dizzying selection of Rube Goldberg-esque molecular mechanisms, to the point where understanding one of them might be a lifetime of laboratory investigation.
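To make the quantitative-genetics picture a bit more concrete, here is a toy additive model in Python. It is a sketch under deliberately crude assumptions of my own: unlinked biallelic loci in Hardy–Weinberg equilibrium, allele frequencies drawn uniformly, effects drawn from a normal distribution, and no dominance, epistasis or environment. Under those assumptions a locus with allele frequency p and effect a contributes 2p(1 − p)a² to the additive genetic variance. The function name `simulate_polygenic_trait` and all parameter values are hypothetical, chosen for illustration.

```python
import random


def simulate_polygenic_trait(n_loci=1000, n_individuals=500, seed=1):
    """Toy additive model (illustrative assumptions, not a real genetic architecture).

    Returns the genetic values of simulated individuals and the share of the
    additive genetic variance contributed by the single largest locus.
    """
    rng = random.Random(seed)
    # Assumption: allele frequencies uniform on [0.05, 0.95], effects normal
    # and scaled so that the total variance does not grow with n_loci.
    freqs = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]
    effects = [rng.gauss(0.0, 1.0 / n_loci ** 0.5) for _ in range(n_loci)]

    # Expected variance contributed by each locus under Hardy-Weinberg: 2p(1 - p)a^2
    locus_var = [2 * p * (1 - p) * a * a for p, a in zip(freqs, effects)]
    largest_share = max(locus_var) / sum(locus_var)

    values = []
    for _ in range(n_individuals):
        g = 0.0
        for p, a in zip(freqs, effects):
            # Number of copies (0, 1 or 2) of the effect allele, one draw per gamete
            copies = (rng.random() < p) + (rng.random() < p)
            g += copies * a
        values.append(g)
    return values, largest_share


values, share = simulate_polygenic_trait()
print(f"largest single-locus share of variance: {share:.3f}")
print(f"distinct genetic values among {len(values)} individuals: {len(set(values))}")
```

With a thousand loci, even the biggest contributor explains only a sliver of the variance, and the genetic values form a smooth continuum. Setting `n_loci=1` recovers the Mendelian special case: one locus explains all of the genetic variance, and there are only three possible genetic values.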
Simple inheritance is essentially a fiction, or, put more politely, a simple model that is useful as a step towards building a better picture of inheritance. This is not new; the knowledge that everything of note is complex has been around since the beginning of genetics. Even rare genetic diseases understood as monogenic are caused by sometimes thousands of different variants that occur in a particular small subset of the genome. Really simple traits, in the sense of one variant–one phenotype, are rare in large, mixing and migrating populations like humans; they may occur in crosses constructed in the lab, in extremely structured populations like dog breeds, or possibly under balancing selection.
Can you market thick sequencing?
Ball is also right about what the human genome project was most useful for: it enabled research at scale into human genetic variation, and it stimulated the development of methods for both generating and using DNA sequence. Lowe (2018) talks about ”thick” sequencing, a notion of sequencing that includes associated activities like assembly, annotation and distribution to a community of researchers, on top of ”thin” sequencing as the determination of sequences of base pairs. Thick sequencing better captures how genome sequencing is used and how it stimulates other research, and it fits how sequencing is an iterative process, where reference genomes are successively refined and updated in the face of new data, expert knowledge and quality checking.
It is hard to imagine gene editing like CRISPR being applied in any human cell without a good genome sequence to help find out what to cut out and what to put in its place. It is hard to imagine the developments in functional genomics, which all use short-read sequencing as a read-out, without a good genome sequence to anchor the reads on. It is possible to imagine genome-wide association based only on very dense linkage maps, but it is a bit far-fetched. And so on.
Now, this raises a painful but interesting question: Would the genome project ever have gotten funded on reasonable promises and reasonable uncertainties? If not, how do we feel about the genome hype — necessary evil, unforgivable deception, something in-between? Ball seems to think that gene hype was an honest mistake and that scientists were surprised that genomes turned out to be more complicated than anticipated. Unlike him, I do not believe that most researchers honestly believed the hype — they must have known that they were overselling like crazy. They were no fools.
An example of this is the story about how many genes humans have. Ball writes:
All the same, scientists thought genes and traits could be readily matched, like those children’s puzzles in which you trace convoluted links between two sets of items. That misconception explains why most geneticists overestimated the total number of human genes by a factor of several-fold – an error typically presented now with a grinning “Oops!” rather than as a sign of a fundamental error about what genes are and how they work.
This is a complicated history. Estimates of gene number have varied, but enjoy this passage from Lewontin in 1977:
The number of genes is not large
While higher organisms have enough DNA to specify from 100,000 to 1,000,000 proteins of average size, it appears that the actual number of cistrons does not exceed a few thousand. Thus, saturation lethal mapping of the fourth chromosome (Hochman, 1971) and the X chromosome (Judd, Shen and Kaufman, 1972) of Drosophila melanogaster make it appear that there is one cistron per salivary chromosome band, of which there are 5,000 in this species. Whether 5,000 is a large or small number of total genes depends, of course, on the degree of interaction of various cistrons in influencing various traits. Nevertheless, it is apparent that either a given trait is strongly influenced by only a small number of genes, or else there is a high order of gene interactions among developmental systems. With 5,000 genes we cannot maintain a view that different parts of the organism are both independent genetically and each influenced by large number of gene loci.
I don’t know if underestimating by a few fold is worse than overestimating by a few fold (D. melanogaster has 15,000 protein-coding genes or so), but the point is that knowledgeable geneticists did not go around believing that there was a simple 1-to-1 mapping between genes and traits, or even between genes and proteins, at this time. I know Lewontin is a population geneticist, and in the popular mythology population geneticists are nothing but single-minded bean counters who do not appreciate the complexity of molecular biology … but you know, they were no fools.
The selfish cistron
One thing Ball gets wrong is evolutionary genetics, where he mixes up concepts that really have very little to do with each other, despite sounding superficially similar.
Yet plenty remain happy to propagate the misleading idea that we are “gene machines” and our DNA is our “blueprint”. It is no wonder that public understanding of genetics is so blighted by notions of genetic determinism – not to mention the now ludicrous (but lucrative) idea that DNA genealogy tells you which percentage of you is “Scots”, “sub-Saharan African” or “Neanderthal”.
This passage smushes together two very different parts of genetics that don’t belong together, and that have nothing to do with the preceding questions about how many genes there are or whether DNA is a blueprint. One is the gene-centric view of adaptation, a way of thinking about natural selection where you imagine genetic variants (not organisms, not genomes, not populations or species) competing for reproduction. The other is genetic genealogy and ancestry models, where you describe how individuals are related based on the genetic variation they carry. The gene-centric view is about adaptation, while genetic genealogy works because of effectively neutral genetic variation that just floats around, giving each of us a unique individual barcode through sheer combinatorics.
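The barcode point really is just combinatorics, and a short Python sketch shows how quickly it kicks in. The assumptions are mine and deliberately crude: unlinked biallelic SNPs, the same allele frequency (0.3) at every SNP, genotypes drawn independently, and no relatives in the sample. Each SNP has three possible genotypes, so k SNPs give 3^k combinations. The function names are hypothetical, and nothing here involves selection; the variants don’t have to do anything at all.

```python
import random


def n_possible_genotypes(n_snps):
    # Each biallelic SNP has three genotypes: 0, 1 or 2 copies of the alternative allele
    return 3 ** n_snps


def min_snps_to_exceed(n):
    """Smallest number of SNPs whose genotype combinations outnumber n."""
    k = 0
    while n_possible_genotypes(k) <= n:
        k += 1
    return k


def count_unique_barcodes(n_people=5000, n_snps=100, freq=0.3, seed=1):
    """Draw random genotypes for unrelated people and count distinct ones.

    Assumption: every SNP is independent with the same allele frequency,
    which real genomes are not.
    """
    rng = random.Random(seed)
    barcodes = set()
    for _ in range(n_people):
        barcode = tuple(
            (rng.random() < freq) + (rng.random() < freq) for _ in range(n_snps)
        )
        barcodes.add(barcode)
    return len(barcodes)


print(min_snps_to_exceed(10 ** 10))  # 21, since 3**21 is about 1.05e10
print(count_unique_barcodes())
```

Twenty-one neutral SNPs already allow more than ten billion genotype combinations, and with a hundred of them, randomly drawn people essentially never share a genotype. Real genetic genealogy uses far more markers and models relatedness and population structure explicitly; this only illustrates why individuals end up distinguishable.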
He doesn’t elaborate on the gene machines, but the article links to a paper (Ridley 1984) on Williams’ and Dawkins’ ”selfish gene” or ”gene-centric perspective”. I’ve been on about this before, but when evolutionary geneticists say ”selfish gene”, they don’t mean ”the selfish protein-coding DNA element”; they mean something closer to ”the selfish allele”. They are not committed to any view that the genome is a blueprint, that only protein-coding genes matter to adaptation, or that there is a 1-to-1 correspondence between genetic variants and traits.
This is the problem with correcting misconceptions in genetics: it’s easy to chide others for being confused about the parts you know well, and then make a hash of some other parts that you don’t know very well yourself. Maybe when researchers say ”gene” in a context that doesn’t sound right to you, they have a different use of the word in mind … or they’re conceptually confused fools, who knows.
Lewontin, R. C. (1977). The relevance of molecular biology to plant and animal breeding. In International Conference on Quantitative Genetics. Ames, Iowa (USA). 16-21 Aug 1976.
Lowe, J. W. (2018). Sequencing through thick and thin: Historiographical and philosophical implications. Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences, 72, 10-27.