Various positions

What use is there in keeping a blog if you can’t post your arbitrary idiosyncratic opinions as if you were an authority? Here is a list of opinions about life in the scientific community.

Social media for scientists

People who promote social media for scientists by humblebragging about how they got a glam journal paper because of Twitter should stop. An unknown PhD student from the middle of nowhere must be a lot more likely to get into trouble than get on a paper because of Twitter.

Speaking of that, who thinks that writing an angry letter to someone’s boss is the appropriate response to disagreeing with someone on Twitter? Please stop with that.

Poster sessions

Poster sessions are a pain. Not only do you suffer the humiliation of not being cool enough to give a talk, you also get to haul a poster tube to the conference. The trouble is that we can’t do away with poster sessions, because they fulfill the important function of letting a lot of people contribute to the conference so that they have a reason to go there.

Now cue comments of this kind: ”That’s not true! I’ve had some of my best conference conversations at poster sessions. Maybe you just don’t know how to make a poster …” It is true that I don’t know how to make a good poster. Regardless, my ad hoc hypothesis for why people say things like this is that they’re already known and connected enough to have good conversations anywhere at the conference, and that the poster served as a signpost for their colleagues to find them.

How can one make a poster session as good as possible? Try to make lots of space so people won’t have to elbow each other. Try to find a room that won’t be incredibly noisy and full of echoes. Try to avoid having some posters hidden behind pillars and in corners.

Also, don’t organize a poster competition unless you also organize a keynote competition.

Theory

There is way way way too little theory in biology education, as far as I can tell. Much like computer programming — a little of which is recognized as a useful skill to have even for empirically minded biologists who are not going to be programming experts — it is very useful to be able to read a paper without skipping the equations, or tell whether a paper is throwing dust when it states that ”[unspecified] Theory predicts …” this or that. But somehow, materials for theory manage to be even more threatening than computer documentation, which is an impressive feat. If you disagree, please post in the comments a reference to an introduction to coalescent theory that is accessible for, say, a biology PhD student who hasn’t taken a probability course in a few years.
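In the meantime, here is a minimal sketch in R, my own illustration rather than anything from a textbook, of the kind of small theoretical result I have in mind: in a Wright-Fisher population of constant diploid size N, two lineages coalesce in any given generation with probability 1/(2N), so the expected time back to their common ancestor is about 2N generations, which a quick simulation can confirm.

## Simulate the number of generations until two lineages coalesce in a
## Wright-Fisher population of constant diploid size N; each generation,
## the pair coalesces with probability 1/(2 * N)
simulate_pairwise_coalescence <- function(N) {
    rgeom(1, prob = 1 / (2 * N)) + 1
}

N <- 1000
coalescence_times <- replicate(10000, simulate_pairwise_coalescence(N))

## The simulated mean should be close to the theoretical expectation 2 * N
mean(coalescence_times)
2 * N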

Language corrections

That thing where reviewers suggest that a paper be checked by a native English speaker, when what they mean is that it needs language editing, is rude. Find a way of phrasing it that won’t offend that one native English speaker who is invariably on the paper but doesn’t have an English-enough name and affiliation for you!

Using R: the best thing I’ve changed about my code in years

Hopefully, one’s coding habits are constantly improving. If you feel any doubt about yourself, I suggest looking back at something you wrote in 2011.

One thing I’ve changed recently that made my life so much better is a simple, silly thing: meaningful names for index and counter variables.

Take a look at these two pieces of fake code, which both loop over a matrix of hypothetical data (say, genotypes) to pass each value to a function (that does something):

## First attempt
for (i in 1:ncol(geno)) {
    for (j in 1:nrow(geno)) {
        output[i, j] <- do_something(geno[j, i])
    }
}

## Second attempt
n_markers <- ncol(geno)
n_ind <- nrow(geno)
for (marker_ix in 1:n_markers) {
    for (individual_ix in 1:n_ind) {
        output[individual_ix, marker_ix] <-
            do_something(geno[individual_ix, marker_ix])
    }
}

Isn’t that much nicer? The second version explicitly states what is what: we are iterating over markers and individuals, where each row is an individual and each column a marker. It even helps us spot errors such as the one in the first version. You would marvel at how many years it took me to realise that there is no law that says that the loop variable must be called i.

(Yes, nested for loops and hard-bracket indexing look uglier than a split-apply-combine solution, and using an apply-family function would do away with any risk of mixing up the indices. However, loops do look less arcane to the uninitiated, and sometimes, in more complicated cases, we really need that loop variable for something else.)
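For the record, here is a sketch of what that index-free alternative might look like, assuming the same hypothetical geno matrix (individuals in rows, markers in columns) and the same do_something function that takes a single value:

## Index-free sketch: apply do_something to every element of geno;
## the output has the same dimensions, so there are no indices to mix up
output <- apply(geno, c(1, 2), do_something)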

The open access ‘Plan S’ is decisive action to do the wrong thing

Plan S (as Wikipedia puts it: ‘not to be confused with S-plan’) is a plan by the European Research Council and other European research funders to promote open access publishing. They say that the key idea is:

After 1 January 2020 scientific publications on the results from research funded by public grants provided by national and European research councils and funding bodies must be published in compliant Open Access Journals or on compliant Open Access Platforms.

What Plan S is doing right, in my opinion:

  • Research funders have realised that they have weight they can throw around. They can force change on publishers by telling researchers what to do and by deciding what they will and will not pay for.
  • They emphasise copyright and strong licensing (i.e. CC BY) that give readers the rights to use and reproduce the work.
  • They want publishing costs to be covered by funders, and to be capped at some reasonable level.

What Plan S is doing wrong, in my opinion, can be summarised by quoting their ninth principle:

The ‘hybrid’ model of publishing is not compliant with the above principles

First, let us talk terminology. ‘Gold’ open access is where the journal is exclusively open access. ‘Green’ is when the journal may not be open access normally, but allows you to put up an accessible copy of the paper somewhere else, for example in your friendly institutional repository. These labels are unhelpful. They aren’t natural mnemonics, and as you might expect, they are used inconsistently. More importantly, author-pays full open access is not some higher form of publishing, so I wouldn’t call it ‘gold’.

‘Hybrid’ publishing is when only some papers in a journal are open access. This would not be allowed under Plan S, which would prevent publishing in Science, Nature and Cell. Depending on your stance on publishing, that may be upsetting or encouraging. Of course, it would also disqualify a set of society journals like Genetics, Heredity, and Evolution.

Why, one might ask? Is it important that open access publishing happens only in exclusively open access journals? I guess the idea is to prevent library-pays journals from getting paid twice by also charging authors for some papers.

I think that is confusing a means with an end. The goal should be to get the most accessible papers with the least amount of effort, and to push scientific publishing in a positive direction. I am not sure that monolithic author-pays publishers are all that much better than monolithic library-pays publishers, so why should we give them a particular advantage?

In my opinion, a better option would be along these lines: We should accept preprints as a form of cheap open access, make sure to format our preprints a bit nicer (I’ve sinned against this by uploading double-spaced manuscripts with the figures at the end), and pressure subscription journals to accept preprinting of the final text without delay. Maybe one could even get journals to accept the preprints to be distributed under permissive licenses. This may be a tall order, but maybe no less realistic than trying to dictate the size of the publishing fee.

We could have the best of both: scientific communities could keep publishing in those quality society journals that all of our colleagues read, and everyone would get free and convenient access to papers. The problem of unreasonable subscription fees will remain, and that needs other plans for joint library and university action. For those of us that have a bit of an iconoclastic streak, it would also leave the field open for new ideas in publishing, rather than prescribing certain journals with a particular business model.

I’m looking at a life unfold
Dreaming of the green and gold
Just like the ancient stone
Every sunrise I know
Those eyes you gave to me
That let me see
Where I come from
(Lianne La Havas)

Journal club of one: ‘Sacred text as cultural genome: an inheritance mechanism and method for studying cultural evolution’

This is a fun paper about something I don’t know much about: Hartberg & Sloan Wilson (2017) ‘Sacred text as cultural genome: an inheritance mechanism and method for studying cultural evolution’. It does exactly what it says on the package: it takes an image from genome science, that of genomic DNA and gene expression, and uses it as a metaphor for how pastors in Christian churches use the Bible. So, the Bible is the genome, churches are cells, and citing Bible passages in a sermon is gene expression, or at least something along those lines.

The authors use a quantitative analysis analogous to differential gene expression to compare the Bible passages cited in sermons from six Protestant churches in the US with different political leanings (three conservative and three progressive; coincidentally, N = 3 is kind of the stereotypical sample size of an early 2000s gene expression study). The main message is that the churches use the Bible differently, that the conservative churches use more of the text, and that even when they draw on the same book, they use different verses.

They exemplify with Figure 3, which shows a ‘Heat map showing the frequency with which two churches, one highly conservative (C1) and one highly progressive (P1), cite specific verses within chapter 3 of the Gospel According to John in their Sunday sermons.’ I will not reproduce it for copyright reasons, but it pretty clearly shows how P1 often cites the first half of the chapter but doesn’t use the second half at all. C1, instead, uses verses from the whole chapter, but its three most used verses are all in the latter half, that is, the block that P1 doesn’t use at all. What are these verses? The paper doesn’t quote them, except for 3:16, ‘For God so loved the world, that he gave his one and only Son, that whoever believes in him should not perish, but have eternal life’, which is the exception to the pattern — it’s the most common verse in both churches (and generally, a very famous passage).

Chapter 3 of the Gospel of John is the story of how Jesus teaches Nicodemus. Here is John 3:1-17:

1 Now there was a man of the Pharisees named Nicodemus, a ruler of the Jews. 2 The same came to him by night, and said to him, ”Rabbi, we know that you are a teacher come from God, for no one can do these signs that you do, unless God is with him.”
3 Jesus answered him, ”Most certainly, I tell you, unless one is born anew, he can’t see God’s Kingdom.”
4 Nicodemus said to him, ”How can a man be born when he is old? Can he enter a second time into his mother’s womb, and be born?”
5 Jesus answered, ”Most certainly I tell you, unless one is born of water and spirit, he can’t enter into God’s Kingdom. 6 That which is born of the flesh is flesh. That which is born of the Spirit is spirit. 7 Don’t marvel that I said to you, ‘You must be born anew.’ 8 The wind blows where it wants to, and you hear its sound, but don’t know where it comes from and where it is going. So is everyone who is born of the Spirit.”
9 Nicodemus answered him, ”How can these things be?”
10 Jesus answered him, ”Are you the teacher of Israel, and don’t understand these things? 11 Most certainly I tell you, we speak that which we know, and testify of that which we have seen, and you don’t receive our witness. 12 If I told you earthly things and you don’t believe, how will you believe if I tell you heavenly things? 13 No one has ascended into heaven but he who descended out of heaven, the Son of Man, who is in heaven. 14 As Moses lifted up the serpent in the wilderness, even so must the Son of Man be lifted up, 15 that whoever believes in him should not perish, but have eternal life. 16 For God so loved the world, that he gave his one and only Son, that whoever believes in him should not perish, but have eternal life. 17 For God didn’t send his Son into the world to judge the world, but that the world should be saved through him.”

This is the passage that P1 uses a lot, but they break before they get to the verses that come right after: John 3:18-21. The conservative church uses them the most out of this chapter.

18 Whoever believes in him is not condemned, but whoever does not believe stands condemned already because they have not believed in the name of God’s one and only Son. 19 This is the verdict: Light has come into the world, but people loved darkness instead of light because their deeds were evil. 20 Everyone who does evil hates the light, and will not come into the light for fear that their deeds will be exposed. 21 But whoever lives by the truth comes into the light, so that it may be seen plainly that what they have done has been done in the sight of God.

So this is consistent with the idea of the paper: In the progressive church, the pastor emphasises the story about doubt and the possibility of salvation, where Nicodemus comes to ask Jesus for explanations, and Jesus talks about being born again. It also has some beautiful perplexing Jesus-style imagery with the spirit being like the wind. In the conservative church, the part about condemnation and evildoers hating the light gets more traction.

I’m not sure that the analogy works. The metaphors are mixed, and it’s not obvious what the unit of inheritance is. For example, when the paper talks about ‘fitness-enhancing information’, does that refer to the fitness of the church, the members of the church, or the Bible itself? The paper sometimes talks as if the Bible were passed on from generation to generation, for instance here in the introduction:

Any mechanism of inheritance must transmit information across generations with high fidelity and translate this information into phenotypic expression during each generation. In this article we argue that sacred texts have these properties and therefore qualify as important inheritance mechanisms in cultural evolution.

But the sacred text isn’t passed on from generation to generation. The Bible is literally a book that is transmitted by printing. What may be passed on is the way pastors interpret it and, in the authors’ words, ‘cherry pick’ verses to cite. But clearly, that is not stored in the Bible ‘genome’ but somehow in the culture of churches and the institutions of learning that pastors attend.

If we want to stick to the idea of the Bible as a genome, I think this story makes just as much sense: Don’t think about how this plasticity of interpretation may be adaptive for humans. Instead, take a sacred text-centric perspective, analogous to the gene-centric perspective. Think of the plasticity in interpretation as preserving the fitness of the Bible by making it fit community values. Because the Bible can serve as source material for churches with otherwise different values, it survives as one of the most important and widely read books in the world.

Literature

Hartberg, Yasha M., and David Sloan Wilson. ”Sacred text as cultural genome: an inheritance mechanism and method for studying cultural evolution.” Religion, Brain & Behavior 7.3 (2017): 178-190.

The Bible quotes are from the World English Bible translation.

Reading Strunk & White

I don’t remember who, but someone wrote that people who hand out the advice to read Strunk & White’s Elements of Style probably haven’t read it—the implication being that it’s not as good as its reputation. The names of famous works and authors often work as metonymies for common wisdom more than as references to their content: not just Strunk & White, but Machiavelli, The Modern Synthesis, and so on. The journal GENETICS, to which I recently sent a couple of papers, has Strunk & White as part of its guidelines for authors, so I decided to read Strunk & White.

This is from the part that the GENETICS guidelines refer to, namely the famous ”Omit needless words” section:

Vigorous writing is concise. A sentence should contain no unnecessary words, a paragraph no unnecessary sentences, for the same reason that a drawing should have no unnecessary lines and a machine no unnecessary parts. This requires not that the writer make all sentences short, or avoid all detail and treat subjects only in outline, but that every word tell.

This is some good advice that is apt for scientific writing. All the rules that deal with vigorousness and conciseness would make life easier for all us readers of scientific papers:

15. Put statements in positive form.
16. Use definite, specific, concrete language.
17. Omit needless words.
18. Avoid a succession of loose sentences.
19. Express coordinate ideas in similar form.
20. Keep related words together.

Scientific writing is often too roundabout and guarded. I think that happens not because we’re trying to worm our way out of criticism, but because we have misguided notions about style that we get from reading heaps of badly written papers, and can’t shake without effort.

But there is also a ton of advice that is tedious and arbitrary. A problem with learning from authorities is that we end up reifying their pet peeves, and The Elements of Style is full of them. There’s this thing you do when you state your opinion: you try to express it forcefully (forcibly, I guess, according to Strunk), but you are not Kantian about it. You don’t expect it to be elevated to a universal law. If it were, especially stripped of the conditions that applied to the original opinion, you might be unhappy with the results. Yes, we should omit unnecessary words, but for some authors, it may be more pressing to look for words to add: to elaborate, explain, and exemplify.

The same is true for graph-making (where should the y-axis end? what is the best way to show proportions?), the proper way to email university teachers as a student, the placement of figures in manuscripts to be reviewed, … In the absence of evidence, we substitute opinion, loudly spoken.

‘All domestic animals and plants are genetically modified already’

Among people who, like yours truly, support (or at least are not in principle against) applications of genetic modification in plant and animal breeding, there is an argument that ‘all domestic animals and plants are genetically modified already’ because of domestication and breeding. See for example Food Evolution or this little video from Sonal Katyal.

This is true in one sense, but it’s not very helpful, for two reasons.

First, it makes it look as if the safety and efficacy of genome editing turns on a definition. I don’t know what the people who pull out this idea in discussion expect the response to be — that the people who worry about genetic modification as some kind of threat will go ‘oh, that was a clever turn of phrase; I guess it’s no problem then’. Again, I think the honest thing to say is that genetic modification (be it mutagenesis, transgenics, or genome editing) is a different thing than classic breeding, but that it’s still okay.

Second, I also fear that it promotes the misunderstanding that selective breeding is somehow outdated and unimportant. This video is an example (and I don’t mean to bash on the video; I think what’s said in it is true, but not the whole story). Yes, genome editing allows us to introduce certain genetic changes precisely and independently of the surrounding region. This is as opposed to introducing a certain variant by crossing, when other undesired genetic variants will follow along. However, we need to know what to edit and what to put instead, so until knowledge of causative variants is near perfect (spoiler: never), selection will still play a role.

Genome editing in EU law

The European Court of Justice recently produced a judgement (Case C-528/16) which means that genome-edited organisms will be regarded as genetically modified and subject to EU directive 2001/18 on genetically modified organisms. This is bad news for anyone who wants to use genome editing to do anything with plant or animal breeding in Europe.

The judgement is in legalese, but I actually found it clearer and more readable than the press coverage of it. The court does not seem conceptually confused: it knows what genome editing is, and makes reasonable distinctions. It’s just that it’s bound by the 2001 directive, and if we want genome editing to be useful, we need something better than that.

First, let’s talk about what ‘genetic modification’, ‘transgenics’, ‘mutagenesis’, and ‘genome editing’ are. This is how I understand the terms.

  • A genetically modified organism, the directive says, is ‘an organism, with the exception of human beings, in which the genetic material has been altered in a way that does not occur naturally by mating and/or natural recombination’. The directive goes on to clarify with some examples that count as genetic modification, and some that don’t, including in vitro fertilisation as well as bacterial and viral processes of horizontal gene transfer. As far as I can tell, this is sensible. The definition isn’t unassailable, of course, because a lot hinges on what counts as a natural process, but no definition in biology ever is.
  • Transgenics are organisms that have had new DNA sequences introduced into them, for example from a different species. As such, their DNA is different in a way that is very unlikely to happen by spontaneous mutation. For technical reasons, this kind of genetic modification, even if it may seem more dramatic than changing a few basepairs, is easier to achieve than genome editing. This is the old, ‘classic’ genetic modification that the directive was written to deal with.
  • Mutagenesis is when you do something to an organism to change the rate of spontaneous mutation, e.g. treat it with some mutagenic chemical or radiation. With mutagenesis, you don’t control what change will happen (but you may be able to affect the probability of causing a certain type of mutation, because mutagens have different properties).
  • Finally, genome editing means changing one genetic variant into another. These are changes that could probably be introduced by mutagenesis or crossing, but they can be made more quickly and precisely with editing techniques. This is what people often envisage when we talk about using Crispr/Cas9 in breeding or medicine.

On these definitions, Crispr/Cas9 (and related systems) can be used for transgenics, mutagenesis, or editing. You could use it for mutagenesis to generate targeted cuts, and let the cell repair them by non-homologous end joining, which introduces deletions or rearrangements. This is how Crispr/Cas9 is used in a lot of molecular biology research: to knock out genes by directing disruptive mutations to them. You could also use it to make transgenics by introducing a foreign DNA sequence. For example, this is what happens when Crispr/Cas9 is used to create artificial gene drive systems. Or, you could edit by replacing alleles with other naturally occurring alleles.

Looking back at what is in the directive, it defines genetically modified organisms, and then it goes on to make a few exceptions — means of genetic modification that are exempted from the directive because they’re considered safe and accepted. The top one is mutagenesis, which was already old hat in 2001. And that takes us to the main question that the judgement answers: Should genome editing methods be slotted in there, with chemical and radiation mutagenesis, which are exempt from the directive even if they’re actually a kind of genetic modification, or should they be subject to the full regulatory weight of the directive, like transgenics? Unfortunately, the court found the latter. They write:

[T]he precautionary principle was taken into account in the drafting of the directive and must also be taken into account in its implementation. … In those circumstances, Article 3(1) of Directive 2001/18, read in conjunction with point 1 of Annex I B to that directive [these passages are where the exemption happens — MJ], cannot be interpreted as excluding, from the scope of the directive, organisms obtained by means of new techniques/methods of mutagenesis which have appeared or have been mostly developed since Directive 2001/18 was adopted. Such an interpretation would fail to have regard to the intention of the EU legislature … to exclude from the scope of the directive only organisms obtained by means of techniques/methods which have conventionally been used in a number of applications and have a long safety record.

My opinion is this: Crispr/Cas9, whether used for genome editing, targeted mutagenesis, or even to make transgenics, is genetic modification, but genetic modification can be just as safe as old mutagenesis methods. So what do we need instead of the current genetic modification directive?

First, one could include genome-edited and targeted mutagenesis products among the exclusions to the directive. There is no reason to think they’d be any less safe than varieties developed by traditional mutagenesis or by crossing. In fact, the new techniques will give you fewer unexpected variants as side effects. However, EU law does not seem to acknowledge that kind of argument. There would need to be a new law that isn’t based on the precautionary principle.

Second, one could reform the entire directive to something less draconian. It’s not obvious how to do that, though. On the one hand, the directive is based on perceived risks to human health and the environment of genetic modification itself that have little basis in fact. Maybe starting from the precautionary principle was a reasonable position when the directive was written, but now we know that transgenic organisms in themselves are not a threat to human health, and there is no reason to demand each product be individually evaluated to establish that. On the other hand, one can see the need for some risk assessment of transgenic systems. Say for instance that synthetic gene drives become a reality. We really would want to see some kind of environmental risk assessment before they were used outside of the lab.