Johan Frostegård ”Evolutionen och jag”

I read Johan Frostegård's book about evolution and humans over Christmas. Frostegård is widely read and writes pleasantly about a bit of everything: human prehistory, open evolutionary questions such as sexual reproduction, altruism and typically human traits, two chapters on syphilis, plus the author's views on philosophy of science, philosophy of mind and moral philosophy. And God and Bob Dylan. It's fun to read a book about evolution with so many literary quotations. The best chapter is probably chapter 18, ”Immunology, evolution and I”, which deals with his own research.

But I have a couple of objections. It moves too fast. I can't keep up. The book never stays long on any one subject. There is, however, an overarching theme: that various fields (medicine, morality, economics, the humanities) would benefit from an evolutionary analysis. Unfortunately, the evolutionary analysis in the book is sometimes not very good. Here are two examples in detail.

This is what it says on page 89 about colour vision:

Just think of colour blindness, which is much more common in men than in women, and where a fairly reasonable explanation may be that it gives an advantage in seeing over long distances, since the colour-blind person is thought to be better at distinguishing contrasts, something that has been exploited even in modern armies. Statistically, its prevalence in many places is roughly as if one member of every hunting party were colour-blind.

What is the problem here?

It is not out of the question that red–green colour blindness comes with some advantages and could, under some circumstances, be subject to natural selection in humans. Indeed, there is research suggesting that seeing two colours and seeing three colours each have their advantages and disadvantages. And apparently it is not that unusual for primates to vary in colour vision within a species (Surridge, Osorio & Mundy 2003).

But the question is: if it is better (note, hypothetically) to see two colours rather than three, why aren't all men colour-blind? There are several situations in which natural selection keeps more than one variant of a gene in a population. That is, several variants of the gene stay around after the new variant has arisen by mutation. This happens when a variant is good sometimes and bad sometimes, and it is called balancing selection.

It may be that a genetic variant has both positive and negative effects, so that individuals who carry one copy of it (who are heterozygous) get the best balance of advantages and disadvantages. Another possibility is that a genetic variant is advantageous when it is rare in the population, but bad when many others carry it.
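As a worked illustration of the first mechanism, here is the standard one-locus textbook model of heterozygote advantage. It is not taken from the book, and it assumes a diploid autosomal gene for simplicity; the colour vision genes in question sit on the X chromosome, which changes the details. Let the genotypes A1A1, A1A2 and A2A2 have relative fitnesses 1 − s, 1 and 1 − t, with s, t > 0, and let p be the frequency of A1 and q = 1 − p. The two alleles do equally well on average when

$$
p(1-s) + q = p + q(1-t) \quad\Longrightarrow\quad ps = qt \quad\Longrightarrow\quad \hat{p} = \frac{t}{s+t}
$$

With made-up values s = 0.1 and t = 0.3, the equilibrium is p̂ = 0.3/0.4 = 0.75. Neither variant is lost, even though neither homozygote is the best genotype.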

But it is also possible that colour blindness arises through mutation fairly often and is not particularly harmful, and is common for that reason.
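This third possibility can be sketched with the simplest model of mutation–selection balance, again a textbook approximation assumed here for illustration rather than anything from the book. Suppose a variant arises by mutation at rate μ per generation, and carriers in whom it is fully expressed (as in a haploid, or roughly as in hemizygous males for an X-linked gene) have relative fitness 1 − s. Its frequency then settles near

$$
\hat{q} \approx \frac{\mu}{s}
$$

so the weaker the selection against the variant, the more common it can become. With made-up values μ = 10⁻⁴ and s = 0.01, q̂ ≈ 0.01. X linkage and different mutation rates in the two sexes change the details, but not the logic.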

Which of these is actually the case is an empirical question. An idea of how something could be an advantage is not enough to make a good evolutionary hypothesis. What does a reader take away from this reasoning if they don't already know what balancing selection is? A common but misleading kind of speculation, offered without further data or evidence: if there is heritable variation in trait X, maybe that is because it has an evolutionary advantage.

Example 2: There are a few passages on the evolution of altruism and the debate about kin selection and group selection.

E.O. Wilson describes the social capacity of the human lineage, called eusociality, as a central trait, and even puts forward group selection as an underlying mechanism, the latter something that has been much questioned. [38, 53] Group selection means that competition in nature, which is the engine of natural selection, takes place not only at the level of the individual but also at the level of the group. (p. 91)

/…/

But a smaller group argues for the theory, with the doyen of sociobiology, E.O. Wilson, as a prominent name. In the prestigious journal Nature he published an article in which he and two co-authors used mathematical models to describe group selection as an explanation for cooperation in social animals such as humans [38].

The study was immediately debated and harshly criticised, among others by Richard Dawkins, who argues that the theory of group selection ignores that genes are at the centre of evolution, by virtue of being replicators. Wilson does not deny this either. However, the last word has not been said, and my guess is that Wilson's view will gain ground [37, 256]. (p. 307)

Yes, altruism nerds, reference number 38 is none other than Nowak, Tarnita & Wilson (2010). Number 256 is the reply that 140 evolutionary biologists wrote in the same journal. And no, it is not exactly common for a scientific paper to be followed by a letter of protest in the same journal. (Number 37 is Dawkins's review of one of Wilson's books.)

This is not an easy debate to summarise, and as we can see, it goes somewhat deeper than an exchange between Wilson and Dawkins. Nor is Nowak, Tarnita & Wilson (2010) an easy paper to read. That is probably not only the authors' fault, but also a result of the journal's space constraints. The paper consists of six pages of ”article” and 43 pages of ”supplementary materials” with all the details. The mathematical model gets just over half a page in the article itself, with neither results nor a description of the method.

What can we say about it?

First: ”eusociality” is not really a word for ”the special social nature of humans”. It is the particular social system in which animals live in colonies where only a minority reproduce and the rest are sterile. Think beehives, ant colonies and colonies of naked mole-rats. The authors evidently think that eusociality has enough in common with the division of labour among humans to make an interesting analogy, but this is all they write about human social evolution in the paper:

We have not addressed the evolution of human social behavior here, but parallels with the scenarios of animal eusocial evolution exist, and they are, we believe, well worth examining.

Second: this is a debate about mathematical models. There is nothing wrong with that. Mathematical models and theoretical research are excellent, especially if you want to study something that cannot be observed; in this case, how a certain behaviour arose in long-extinct ancestors of a species. But a discussion about the best way to build a mathematical model of a hypothetical scenario easily becomes a bit … theoretical.

If we want to build mathematical models of how altruism arose, there are a few different ways to do the accounting. Think of the worker bees in a hive. Why have they lost the ability to lay eggs? One way is to calculate how many offspring they can have indirectly, through the queen, that is, their mother, laying eggs. If their work makes the queen lay enough eggs, that can be a more effective way for them to spread their genes than going out into the world and laying eggs on their own. This is kin selection (Frostegård describes it on p. 304), and this way of counting is called ”inclusive fitness”. ”Fitness” means reproductive success, and ”inclusive fitness” is reproductive success with the contribution of relatives counted in.
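The usual shorthand for this kind of accounting is Hamilton's rule. It is added here as the standard textbook statement, not as something from Frostegård's book or from Nowak, Tarnita & Wilson (2010). A gene for helping can spread if

$$
r\,b > c
$$

where r is the genetic relatedness between the helper and the individuals it helps, b is the extra reproduction the help gives them, and c is the reproduction the helper gives up. For a sterile worker, c is its entire direct reproduction, so the rule says that this loss must be outweighed by the relatedness-weighted gain in the queen's offspring. This is a simplified form, and the assumptions needed for such a simple rule to hold are one of the points of contention in the debate discussed here.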

Third: Nowak, Tarnita & Wilson (2010) is not about group selection. Not directly, at least. The paper is an attack on kin selection as an explanation for eusociality. The authors argue that their own model, which does not calculate the workers' inclusive fitness but instead describes how a mutation that makes workers stay in the nest spreads through a population, is more realistic. But above all, they seem to think it is more elegant. This is what they write in the paper:

By formulating a mathematical model of population genetics and family structure, we see that there is no need for inclusive fitness theory. The competition between the eusocial and the solitary allele is described by a standard selection equation. There is no paradoxical altruism, no payoff matrix, no evolutionary game. A ”gene-centered” approach for the evolution of eusociality makes inclusive fitness theory unnecessary.

And later, in comments on the Nowak group's website:

Our paper does not study group selection, and it does not compare group selection and inclusive fitness. But given the limitations of inclusive fitness it is clear that many models of group selection cannot be analyzed in terms of inclusive fitness. Also note that our model for the evolution of eusociality is not a group selection model; instead it describes selection operating at the level of genes.

As I said, this debate is quite technical and, in plain English, a bloody mess. I understand not wanting to go into the details in a popular science book on the subject. I don't want to go into the details either. But once again one may ask whether a reader who is not already familiar with the subject comes away any wiser. What do we take away, apart from the mistaken impression that eusociality is ”the social capacity of the human lineage” and an argument from authority for group selection?

Literature

Frostegård, Johan. (2017) Evolutionen och jag. Volante, Stockholm.

Nowak, Martin A., Corina E. Tarnita, Edward O. Wilson. (2010) ”The evolution of eusociality.” Nature 466.7310

Abbot, Patrick, et al. (2011) ”Inclusive fitness theory and eusociality.” Nature 471.7339

Surridge, Alison K., Daniel Osorio, and Nicholas I. Mundy. (2003) ”Evolution and selection of trichromatic vision in primates.” Trends in Ecology & Evolution 18.4

Nessa Carey ”Junk DNA”

I read two popular science books over Christmas. The other one was in Swedish, so I wrote about that one in Swedish.

Nessa Carey’s ”Junk DNA: A Journey Through the Dark Matter of the Genome” is about noncoding DNA in the human genome. ”Coding” in this context means that the DNA serves as a template for proteins. ”Noncoding” is all the rest of the genome, 98% or so.

The book is full of fun molecular genetics: X-inactivation, rather in-depth discussion of telomeres and centromeres, the mechanism of noncoding microsatellite disease mutations, splicing. Some of this is rarely discussed at such length and with such clarity. It gives the reader a good look at how messy genomics can be. It has wonderful metaphors, for example two baseball bats with magnetic paint and velcro. It even has an amusing account of the ENCODE debate. I wonder if it’s true that evolutionary biologists are more emotional than other biologists?

But it really suffers from being framed as a story about how noncoding DNA used to be dismissed as pointless and now, surprisingly, turns out to have regulatory functions. This makes me a bit hesitant to recommend the book; you may come away from reading it with a lot of neat details, but misled about the big picture. In particular, you may come to believe a false history (”all this was thought to be junk; look how wrong they were in the 70s”) and the very dubious view that most of the human genome is important for our health.

On the first page of the book, junk DNA is defined like this:

Anything that doesn’t code for protein will be described as junk, as it originally was in the old days (second half of the twentieth century). Purists will scream, and that’s OK.

We should scream, or at least shake our heads, because this definition leads, for example, to describing ribosomes and transfer-RNA as ”junk” (chapter 11), even if both of them have been known to be noncoding and functional since at least the 60s. I guess the term ”junk” sticks, and that is why the book uses it, and why biologists love to argue about it. You couldn’t call the book something unspeakably dry like ”Noncoding DNA”.

So, this is a fun popular science book about genomics. Read it, but keep in mind that if you want to define ”junk DNA” for any purpose other than immediately shooting it down, the definition should be something like this:

For most of the 50 years since Ohno’s article, many of us accepted that most of our genome is ”junk”, by which we would loosely have meant DNA that is neither protein-coding nor involved in regulating the expression of DNA that is. (Doolittle & Brunet 2017)

The point of the term is not to dismiss everything that is not coding for a protein. The point is that the bulk of DNA in the genome is neither protein coding nor regulatory. This is part of why molecular genetics is so tricky: it is hard to find the important parts among all the rest. Researchers have become much better at sifting through the noncoding parts of the genome to find the sequences that are interesting and useful. Think of lots of tricky puzzles being solved, rather than of a paradigm being overthrown by revolution.

Literature

Carey, Nessa. (2015) Junk DNA: A Journey Through the Dark Matter of the Genome. Icon Books, London.

Doolittle, W. Ford, and Tyler DP Brunet. (2017) ”On causal roles and selected effects: our genome is mostly junk.” BMC Biology.

Boring meta-post of the year

Really, it’s the second boring meta-post of the year, since I’ve already posted this one.

There were some rumours recently that the Scienceblogs blog network would shut down the site. It appears to still be up, and there are still blogs active there, so I don’t know about that, but this reminded me that Scienceblogs existed. I don’t think I’ve read anything on Scienceblogs in years, but it was one of my inspirations when I started blogging. It’s not that I wanted to be a science writer, but Scienceblogs and the now defunct ResearchBlogging RSS feed (Fausto & al 2012) made me realise that blogging about science was a thing people did.

Slowly, this thing took shape and became a ”science community blog”, in the terminology of Saunders & al (2017). That is, this blog is not so much about outreach or popular science, but ”aimed at the academic community”. I think of it as part of a conversation about genetics, even if it may be largely a conversation with myself.

So what is the state of the blog now? In September 2016, I decided to try to post once or twice a month (and also to make sure that both posts weren’t pointless filler posts). This panned out pretty well up until October 2017, when I ran out of steam for a while. Probably unrelated to that, 2017 was also the year my blog traffic suddenly increased by more than a factor of two. I don’t know for sure why, but looking at the numbers of individual posts, it seems the increase is because a lot of R users are looking for tidyverse-related things. If I went by viewer statistics, I would post less about genetics and more about hip R packages.

Instead, in 2018 I will:

  • Attempt to keep up the pace of writing one or two things every month. Some, but not all, of them will be pointless fillers.
  • Hopefully produce a couple of posts about papers, if those things get out of the pipeline eventually. The problem with this, as anyone who writes papers knows, is that once something is out of the pipeline, one has grown so enormously tired of it.
  • Write a few more posts about other scientific papers I read. I’ve heard that there is limited interest in that sort of thing, but I enjoy it, and writing should make me think harder about what I read.

Using R: reshape2 to tidyr

Tidy data — it’s one of those terms that tend to confuse people, and certainly confused me. It’s Codd’s third normal form, but you can’t go around telling that to people and expect to be understood. One form is ”long”, the other is ”wide”. One form is ”melted”, another ”cast”. One form is ”gathered”, the other ”spread”. To make matters worse, I often botch the explanation and mix up at least two of the terms.

The word is also associated, somewhat loosely, with the tidyverse suite of R packages. But you don’t need to write in tidyverse style (including the %>%s and all) to enjoy tidy data.

Hadley Wickham’s definition, though, is straightforward:

In tidy data:
1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.

In practice, I don’t think people always take their data frames all the way to tidy. For example, to make a scatterplot, it is convenient to keep a couple of variables as different columns. The key is that we need to move between different forms rapidly (brain time-rapidly, more than computer time-rapidly, I might add).

And not everything should be organized this way. If you’re a geneticist, genotypes are notoriously inconvenient in normalized form. Better keep that individual-by-marker matrix.

The first serious piece of R code I wrote for someone else was a function to turn data into long form for plotting. I suspect plotting is often the gateway to tidy data. The function was like what you’d expect from R code written by a beginner who comes from C-style languages: It reinvented the wheel, and I bet it had nested for loops, a bunch of hard bracket indices, and so on. Then I discovered reshape2.

library(reshape2)
fake_data <- data.frame(id = 1:20,
                        variable1 = runif(20, 0, 1),
                        variable2 = rnorm(20))
melted <- melt(fake_data, id.vars = "id")

The id.vars argument is to tell the function that the id column is the key, a column that tells us which individual each observation comes from. As the name suggests, id.vars can name multiple columns in a vector.
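To illustrate several key columns, here is a small sketch with a hypothetical data frame in which each observation is identified by both an individual and a time point. The time column and the name fake_data2 are made up for this example; only melt and its id.vars argument come from reshape2.

```r
library(reshape2)

## Hypothetical data: five individuals measured at two time points,
## so an observation is identified by id and time together
fake_data2 <- data.frame(id = rep(1:5, times = 2),
                         time = rep(c(1, 2), each = 5),
                         variable1 = runif(10, 0, 1),
                         variable2 = rnorm(10))

## Both key columns are kept with every melted observation
melted2 <- melt(fake_data2, id.vars = c("id", "time"))
```

The result has 20 rows (two variables times ten combinations of id and time) and the columns id, time, variable and value.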

So this is the data before:

  id   variable1    variable2
1  1 0.938173781  0.852098580
2  2 0.408216233  0.261269134
3  3 0.341325188  1.796235963
4  4 0.958889279 -0.356218000

And this is after. We go from 20 rows to 40: two variables times 20 individuals.

  id  variable       value
1  1 variable1 0.938173781
2  2 variable1 0.408216233
3  3 variable1 0.341325188
4  4 variable1 0.958889279

And now: tidyr. tidyr is the new tidyverse package for rearranging data like this.

The tidyr equivalent of the melt function is called gather. There are two important differences that messed with my mind at first.

The melt and gather functions take the opposite default assumption about what columns should be treated as keys and what columns should be treated as containing values. In melt, as we saw above, we need to list the keys to keep them with each observation. In gather, we need to list the value columns, and the rest will be treated as keys.

Also, the second and third arguments (which would be the first and second if you piped something into the function) are the variable names that will be used in the long form data. In this case, to get a data frame that looks the same as the one melt produced above, we will stick with ”variable” and ”value”.

Here are five different ways to get the same long form data frame as above:

library(tidyr)
melted <- gather(fake_data, variable, value, 2:3)

## Column names instead of indices
melted <- gather(fake_data, variable, value, variable1, variable2)

## Excluding instead of including
melted <- gather(fake_data, variable, value, -1)

## Excluding using column name
melted <- gather(fake_data, variable, value, -id)

## With pipe
melted <- fake_data %>% gather(variable, value, -id)

Usually, this is the transformation we need: wide to long. If we need to go the other way, we can use reshape2’s cast functions and tidyr’s spread. This code recovers the original data frame:

## reshape2
dcast(melted, id ~ variable)

## tidyr
spread(melted, variable, value)

Peerage of Science Reviewer Prize 2017

I won a prize! Hurrah! I’m obviously very happy.

If you want to hear me answer a couple of questions and see the Peerage of Science crew engaged in some amusing video editing, look at the interview.

How did that happen? After being told about a year ago to check out the peer review platform Peerage of Science, I decided to keep reviewing manuscripts that showed up and were relevant to my interests. Reading and commenting on unpublished manuscripts is stimulating, and I thought it would help improve my reviewing and, maybe, my writing.

Maybe this is a testament to the power of gamification. I admit that I’ve occasionally been checking my profile to see what the score is even without thinking of any reviewer prize.

Griffin & Nesseth ”The science of Orphan Black: the official companion”

I didn’t know that the science fiction series Orphan Black actually had a real Cosima: Cosima Herter, science consultant. After reading this interview and finishing season 5, I realised that there was also a new book I needed to read: The science of Orphan Black: The official companion by Casey Griffin, PhD candidate in development, stem cells and regenerative medicine, and science communicator Nina Nesseth, with a foreword by Cosima Herter.

(Warning: This post contains serious spoilers for Orphan Black, and a conceptual spoiler for GATTACA.)

One thing about science fiction struck me when I was watching the last episodes of Orphan Black: sometimes it makes a lot more sense if we don’t believe everything the fictional scientists tell us. Like real scientists, they may be wrong, or they may be exaggerating. The genetically segregated future of GATTACA becomes no less chilling when you realise that the absurdly high predictive accuracies claimed are likely just propaganda from an oppressive society. And once you realise that the dying P.T. Westmorland is an imposter, you can break your suspension of disbelief about LIN28A as a fountain of youth gene … Of course, genetics is a little more complicated than that, and he is just another rich dude who wants science to make him live forever.

However, it wouldn’t be Orphan Black if there weren’t a basis in reality: there are several single gene mutations in model animals (e.g. Kenyon & al 1993) that can make them live a lot longer than normal, and LIN28A is involved in ageing (reviewed by Jun-Hao & al 2016). It’s not out of the question that an engineered single gene disruption could substantially increase longevity in humans. Not practical, and not necessarily without unpleasant side effects, but not out of the question.

Orphan Black was part slightly scary adventure, part festival of ideas about science and society, part character-driven web of relationships, and part, sadly, bricolage of clichés. I found when watching season five that I’d forgotten most of the plots of seasons two through four, and I will probably never make the effort to sit through them again. The first and last seasons make up for it, though.

The series seems to have been set on squeezing as many different biological concepts as possible in there, so the book has to try to do the same. It has not just clones and transgenes, but also gene therapy, stem cells, prion disease, telomeres, dopamine, ancient DNA, stem cells in cosmetics and so on. Two chapters try valiantly to make sense of the clone disease and the cure. It shows that the authors have encyclopedic knowledge of life science, with a special interest in development and stem cells.

But I think they slightly oversell how accurate the show is. Like when Cosima tells Scott to ”run a PCR on these samples, see if there are any genetic markers” and ”can you sequence for cytochrome c?”, and Scott replies ”the barcode gene? that’s the one we use for species differentiation” … That’s what screen science is like. The right words, but not always in the right order.

Cosima and Scott sciencing at university, before everything went pear-shaped.

One of the good things about Orphan Black was the scientist characters. There were a ton of them! The good ones, geniuses with sparse resources and self-experimentation; the evil ones, well funded and deeply unethical; and Delphine. This scene is an exception in that it plays the cringe-inducing nerd angle. Cosima and Scott grew beyond this.

There are some scientific oddities. They must be impossible to avoid. For example, the section on epigenetics treats it as a completely new field, sort of missing the history of the subfield. DNA methylation research was going on already in the 1970s (Gitschier 2009). Genomic imprinting, arguably the only solid example of transgenerational epigenetic effects in humans, and X inactivation were both being discovered during the 70s and 80s (reviewed by Ferguson-Smith 2011). The book also makes a hash of genome sequencing, which is a shame but understandable. It would have taken a lot of effort to disentangle how sequencing worked when the fictional clone experiment started, and how it developed into what Cosima uses in season five, when she runs Nanopore sequencing.

The idea of human cloning is evocative. Orphan Black flipped it on its head by making the main clone characters strikingly different. It also cleverly acknowledged that human cloning is a somewhat dated 20th century idea, and that the cutting edge of life science has moved on. But I wish the book had been harder on the premise of the clone experiment:

By cloning the human genome and fostering a set of experimental subjects from birth, the scientists behind the project would gain many insights into the inner workings of the human body, from the relay of genetic code into observable traits (called phenotypes), to the viability of manipulated DNA as a potential therapeutic tool, to the effects of environmental factors on genetics. It’s a scientifically beautiful setup to learn myriad things about ourselves as humans, and the doctors at Dyad were quick to jump at that opportunity. (Chapter 1)

This is the very problem. Of course, sometimes ethically atrocious fictional science would, in principle, generate useful knowledge. But when fictional science is near useless, let’s not pretend that it would produce a lot of valuable knowledge. When it comes to genetics and complex traits like human health, small sample studies of this kind (even with clones) would be utterly useless. Worse than useless, they would likely be biased and misleading.

Researchers still float the idea of a ”baseline”, though, in the form of a cell line, where it makes more sense. See the (Human) Genome Project-write (Boeke & al 2016), which suggests constructing an ideal baseline cell line for understanding human genome function:

Additional pilot projects being considered include … developing a homozygous reference genome bearing the most common pan-human allele (or allele ancestral to a given human population) at each position to develop cells powered by ”baseline” human genomes. Comparison with this baseline will aid in dissecting complex phenotypes, such as disease susceptibility.

In the end, the most important part of science in science fiction isn’t to be factually correct, nor to be a coherent prediction about the future. If Orphan Black has raised interest in science, and I’m sure it has, that is great. And if it has stimulated discussions about the relationship between biological science, culture and ethics, that is even better.

The timeline of when relevant scientific discoveries happened in the real world and in Orphan Black is great. The book has a partial bibliography. The ”Clone Club Q&A” boxes range from silly fun to great open questions.

Orphan Black was probably the best genetics TV show around, and this book is a wonderful companion piece.

Plaque at the Roslin Institute to the sheep that haunts Orphan Black. ”Baa.”

Literature

Boeke, JD et al (2016) The genome project-write. Science.

Ferguson-Smith, AC (2011) Genomic imprinting: the emergence of an epigenetic paradigm. Nature reviews Genetics.

Gitschier, J. (2009). On the track of DNA methylation: An interview with Adrian Bird. PLOS Genetics.

Jun-Hao, E. T., Gupta, R. R., & Shyh-Chang, N. (2016). Lin28 and let-7 in the Metabolic Physiology of Aging. Trends in Endocrinology & Metabolism.

Kenyon, C., Chang, J., Gensch, E., Rudner, A., & Tabtiang, R. (1993). A C. elegans mutant that lives twice as long as wild type. Nature, 366(6454), 461-464.