Researchers in ecology and evolution don’t use Platt’s strong inference, and that’s okay

This paper (Betts et al. 2021) came out about a month ago investigating whether ecology and evolution papers explicitly state mechanistic hypotheses, and arguing that they ought to, preferably multiple alternative hypotheses. It advocates the particular flavour of hypothetico–deductivism expressed by Platt (1964) as ”strong inference”.

The key idea in Platt’s (1964) account of strong inference, that distinguishes it from garden variety accounts of scientific reasoning, is his emphasis on multiple alternative hypotheses and experiments that distinguish between them. He describes science progressing like a decision tree, with experiments as branching points — a ”conditional inductive tree”. He also emphasises theory construction, as he approvingly quotes biologists on the need to think hard about what possibilities there are in order to make the most informative experiments.

The empirical part of Betts et al. (2021) consists of a literature survey, where the authors read 268 empirical articles from ecology, evolution and glam journals published 1991-2018 to check whether they explicitly stated hypotheses (that is, proposed explanations or causes, regardless of whether they used the actual word ”hypothesis”), whether these were mechanistic, and whether there were multiple working hypotheses contrasted against each other. They estimated the slope over time, and the association between hypothesis use and journal impact factor and whether the research was funded by a major grant.

The results suggest that papers with explicit hypotheses are in a minority, that there was no significant change over time, and little association with impact factor or grants. The prevalence of mechanistic hypotheses was 26% and of multiple working hypotheses 6.7%. There were no significant time trends in hypothesis use. There was a significant difference in journal impact factor in one of the comparisons, where papers with mechanistic hypotheses were published in journals with 0.3 higher impact factor on average. There was no association with grants.

The authors go on to discuss how strong inference is still useful both to the scientific community and to individual researchers, arguing that they might not get more grants or fancier papers, but they will feel better and their research will be of higher quality. How to interpret the lack of clear increase or decrease over time depends on one’s level of optimism I guess. An optimistic take could be that the authors’ fear that machine learning and large datasets are turning researchers away from explanation seems not to be a major concern. A pessimistic take could be, like they suggest in the Discussion, that decades of admonitions to do hypothetico–deductive science have not had much effect.

Thinking about causality is a good thing

I wholeheartedly agree with the authors that thinking about explanations, causality and mechanism is a useful thing to do, and probably something we ought to do more. It is probably useful to spend more time than we do (for me, to spend more time than I do) thinking about how theories map to testable hypotheses, how those hypotheses map to quantities that can be estimated, and how well the methods and data at hand manage to perform that estimation. Some of my best lessons from science over the last years have come from that sort of thing.

I also agree with them that causality is often what scientists are after — even in many cases where we think that the goal is prediction, the most trustworthy explanation for any ability of a prediction model to generalise is going to be an explanation in terms of mechanisms. They don’t go into this too much, but the caption of Figure 1 gives an example of how even when we are interested in prediction, explanations can be handy.

To take an example from my field: genomic prediction, when we fit statistical models to DNA data to predict trait values for breeding, seems like a pure prediction problem. And animal breeders are pragmatic enough to use anything that worked; if tea leaves worked well for breeding value prediction, they would use them (I am sure I have heard or read some animal breeding researcher make that joke, but I can’t find the source now). But why don’t tea leaves work, while single nucleotide markers spaced somewhat evenly across the genome do? Because we have a fairly well established theory for how genetic variants cause trait variation between individuals in a fairly predictable way. That doesn’t automatically mean that the statistical associations and predictions will transfer between situations — in fact they don’t. But there is theory that helps explain why genomic predictions generalise more or less well.

I also like that they, when they define what a hypothesis is (a proposal of a mechanism or cause of a phenomenon), make very clear that statistical hypotheses and null hypotheses don’t count as scientific hypotheses. There is more to explore here about the relationship between statistical inference and scientific hypotheses, and about the rhetorical move to declare something the null or default model, but that is for another day.

If scientists don’t use strong inference, maybe the problem isn’t with the scientists

Given the mostly negative results, the discussion starts as follows:

Overall, the prevalence of hypothesis use in the ecological and evolutionary literature is strikingly low and has been so for the past 25 years despite repeated calls to reverse this pattern […]. Why is this the case?

They don’t really have an answer to this question. They consider whether most work is descriptive fact finding, or purely about making prediction models, but argue that it is unlikely that 75% of ecology and evolution research is like that, and I agree. They consider a lack of individual incentives for formulating hypotheses, and that might be true; there was no striking association between hypotheses, grants or glamorous publications (unless you consider 0.3 journal impact factor units a compelling individual-level incentive). They suggest that there are costs to hypothesising: it ”can feel like a daunting hurdle”. However, they do not consider the option that their proposed model of science isn’t actually a useful method.

To think about that, we should discuss some of the criticisms of strong inference.

O’Donohue & Buchanan (2001) criticise the strong inference model by arguing that there are problems with each step of the method, and that the history of science anecdotes that Platt uses to illustrate it actually show little evidence of being based on strong inference.

Specifically, Platt’s first step, devising alternative hypotheses, is problematic both because one might lack the background knowledge to devise many alternative hypotheses, and because there is no sure way to enumerate the plausible alternative hypotheses.

The second step, devising crucial experiments, is problematic because of the Duhem–Quine problem, namely that experiments are never conclusive; even when the data are inconsistent with a hypothesis, we do not know whether the problem is with the hypothesis or with any number of, sometimes implicit, auxiliary assumptions. (By the way, I love that Betts et al. cite two ecologists called Quinn and Dunham (1983) who wrote about problems with conclusively testing hypotheses in ecology and evolution. I wish they got together to write it just because the names are so perfect for the topic.)

The third step, conducting crucial experiments, is problematic because experimental results may not cleanly separate hypotheses. Then again, would Platt not just reply that one ought to devise a better experiment then? This objection seems weak. Science is hard and it seems perfectly possible that there are lots of plausible alternative hypotheses that can’t be told apart, at least with data that can be realistically gathered.

Finally, O’Donohue & Buchanan (2001) go through some of Platt’s examples given of supposed strong inference, and suggest that Platt did not represent them accurately. And Platt’s paper really reads as a series of hero-worshipping anecdotes about great scientists, who were very successful and therefore must have employed strong inference. It is not convincing history of science.

Betts et al. (2021) instead give two examples of science that they suppose could have been helped by strong inference. The first example is Lamarck, who supposedly might have come up with evolution by natural selection if he had entertained multiple working hypotheses. The second is psychologist Amy Cuddy’s power pose work, which supposedly could have been more reproducible had it considered more causal mechanisms. They give no analysis of Lamarck’s scientific method or argument for how strong inference might have helped him. The evidence that strong inference could have helped Amy Cuddy is that she said in an interview that she should have considered the psychological mechanisms behind power posing more.

The claim, inherited from Platt, that multiple working hypotheses reduce confirmation bias really cries out for evidence. As far as I can tell, neither Platt nor Betts et al. provide any, beyond the intuition that you get less attached to one hypothesis if you entertain more than one. That doesn’t seem unreasonable to me, but it just shoves the problem to the next step. Now I have several plausible hypotheses, and I need to decide on one of them, that will advance my decision tree of experiments to the next branching point and provide the headline result for my next paper. That choice seems to me to be just as ripe for confirmation bias and perverse incentives as the choice to call the result for or against a particular hypothesis. In cases where there are only two hypotheses that are taken to be mutually exclusive, the distinction seems only rhetorical.

How Betts et al. (2021) themselves use hypotheses

Let us look at how Betts et al. (2021) themselves use hypotheses and whether they successfully use strong inference for the empirical part of the paper.

The abstract states two hypotheses: that the number of papers with explicit hypotheses could have decreased because of a perceived rise in descriptive big data research, and that explicit hypotheses could have increased because of hypotheses being promoted by journals and funders. Neither turns out to be consistent with the data, which show a steady low prevalence of explicitly stated hypotheses.

One should note that in no way are these two mechanistic accounts mutually exclusive. If the slope of the line had been positive, that would have no logical force to compel us to believe that the rise of machine learning in research did not lead some researchers to abandon hypothesis-driven research; at most, we could conclude that the quantitative effects of the factors that promote and discourage explicit hypotheses balance towards the former.

Thus, we see two of the objections to Platt’s strong inference paradigm in action: the set of alternative hypotheses is by no means covering the whole range of possibilities, and the study in question is not a conclusive test that allows us to exclude any of them.

In the second set of analyses, measuring whether explicitly stating a hypothesis was associated with journal ranking, citations, or funding, the authors predict that hypotheses ought to be associated with these things if they confer academic success. This conforms to their ”if–then” pattern for a research hypothesis, so presumably it is a hypothesis. In this case, there is no alternative hypothesis. This illustrates a third problem with Platt’s strong inference, namely that it is seldom actually applied in real research, even by its proponents, presumably because it is difficult to do so.

If we look at these two sets of analyses (considering change over time in explicit hypothesis use and association between hypothesis use and individual-researcher incentives) and the main message of the paper, which is that strong inference is useful and needs to be encouraged, there is a disconnect. The two sets of hypotheses, whether they are examples of strong inference or not, do not in any way test the theory that strong inference is a useful scientific method, or the normative claim that it therefore should be incentivised — rather, they illustrate them. We can ask Platt’s diagnostic question from the 1964 paper about the idea that strong inference is a method that needs to be encouraged — what would disprove this view? Some kind of data, surely, but nothing that was analysed in this paper.

I hypothesise that this is common in scientific papers. A lot of the reasoning goes on at a higher level than the hypothesis (theories, frameworks, normative stances), and whether individual hypotheses stand or fall has little bearing on these larger structures. This is not necessarily bad or unscientific, even if it does not conform to Platt’s strong inference.

Method angst

Finally, the paper starts out with a strange anecdote: the claim that there is in the beginning of most scientists’ careers a period of ”hypothesis angst” where the student questions the hypothetico–deductive method. This is stated without evidence, and without following through on the cliff-hanger by explaining how the angst resolves. How are early career scientists convinced to come back into the fold? The anecdote becomes even stranger once you realise that, according to their data, explicit hypothesis use isn’t very common. If most research doesn’t use explicit hypotheses, it seems more likely that students, who have just sat through courses on scientific method, would feel cognitive dissonance, annoyance or angst over the fact that researchers around them don’t state explicit hypotheses or follow the simple schema of hypothetico–deductivism.

Literature

Betts, MG, Hadley, AS, Frey, DW, et al. When are hypotheses useful in ecology and evolution? Ecol Evol. 2021; 00: 1-15.

O’Donohue, W., & Buchanan, J. A. (2001). The weaknesses of strong inference. Behavior and philosophy, 1-20.

Platt, JR. (1964) Strong Inference: Certain systematic methods of scientific thinking may produce much more rapid progress than others. Science 146 (3642), 347-353.

Research politics

How is science political? As a working scientist, but not a political scientist or a scholar of science and technology studies, I can immediately think of three categories of social relations that are important to science, and can be called ”politics”.

First, there is politics going on within the scientific community. We sometimes talk about ”the politics within a department” etc, and that seems like not just a metaphor but an accurate description. Who has money, who gets a position, who publishes where? This probably happens at different levels and sizes of micro-cultures, and we don’t have to imagine that it’s an altogether Machiavellian cloak-and-dagger affair. But we can ask ourselves simple questions like: Who in here is a big shot? Who is feared? Who do you turn to when you need to get something done? Who do you turn to when you need a name?

To the extent that scientists are humans living in a society, the politics within science is probably not all too dissimilar to politics outside. And to the extent that ideas attach themselves to people, this matters to the content of science, not just the people who do it. Sometimes, science changes in a process of refinement of models that looks relatively rational and driven by theories and data. Sometimes, it changes, or doesn’t change, by bickering and animosity. Sometimes it changes because the proponents of certain ideas have resources and others don’t. Maybe we can imagine a scenario where parallel invention happens so often that, on average, it doesn’t matter who is in or out and the good ideas prevail. I doubt that is generally the case, though.

Second, there is politics in the sense of policy: government policy, international organisation policy, funding agency priorities, the strategy of a non-governmental organisation etc. Such organisations obviously have power over what research gets done and how, as they should in a democratic society — and as they certainly will make sure to, in any kind of society. To the extent that scientists respond to economic incentives and follow rules, that puts science in connection with politics. Certainly, any scientist involved in the process of applying for funding spends a lot of time thinking about how science aligns with policy and how it is useful.

Because, third, science is useful, which makes it political in the same sense that it is ethical or unethical — research responds to and has effects, even if often modest, on issues in the world. I would argue that science almost always aspires to do something useful, even if indirectly, even in basic science and obscure topics. Scientists are striving to make a difference, because they know how their topic can make a difference, when this isn’t common knowledge. Who knew that it would be important to study the molecular biology of emerging coronaviruses? Well, researchers who studied emerging coronaviruses, of course.

But even if researchers didn’t strive to do any good, all those grant applications were completely insincere and Hardy’s Mathematician’s Apology were right that researchers are chiefly driven by curiosity, pride and ambition … Almost all research would still have some, if modest, political ramifications. If there were no conceivable, even indirect, ways that some research affects any decisions taken by anyone — I’d say it’s either a case of very odd research indeed or very poor imagination.

This post is inspired by this tweet by John Cole, in turn replying to Hilary Agro. I don’t know who these scientists who don’t think that science has political elements are, but I’ll just agree and say that they are thoroughly mistaken.

Two books about academic writing

The university gave us gift cards for books for Christmas, and I spent them on academic self-help books. I expect that reading them will make me completely insufferable and, I hope, teach me something. Two of these books deal with how to write, but in very different ways: ”How to write a lot” by Paul Silvia and ”How to take smart notes” by Sönke Ahrens. In some ways, they have diametrically opposite views of what academic writing is, but they still agree on the main practical recommendation.


”How to write a lot” by Paul Silvia

In line with the subtitle — ”a practical guide to productive academic writing” — and the publication with the American Psychological Association brand ”LifeTools”, this is an extremely practical little book. It contains one single message that can be stated simply, a few chapters of elaboration on it, and a few chapters of padding in the form of advice on style, writing grant applications, and navigating the peer review process.

The message can be summarised like this: In order to write a lot, schedule writing time every day (in the morning, or in the afternoon if you are an afternoon person) and treat it like a class you’re teaching, in the sense that you won’t cancel or schedule something else over it unless absolutely necessary. In order to use that time productively, make a list of concrete next steps that will advance your writing projects (that may include other tasks, such as data analysis or background reading, that make the writing possible), and keep track of your progress. You might consider starting or joining a writing group for motivation and accountability.

If that summary was enough to convince you that keeping a writing schedule is a good thing, and give you an idea of how to do it, there isn’t that much else for you in the book. You might still want to read it, though, because it is short and quite funny. Also, the chapters I called padding contain sensible advice: carefully read the instructions for the grant you want to apply for, address all the reviewer comments either by changing something or providing a good argument not to change it, and so on. There is value in writing these things down; the book has potential as something to put in the hands of new researchers. The chapter on style is fine, I guess. I like that it recommends semicolons and discourages acronyms. But what is wrong with the word ”individuals”? Nothing, really, it’s just another academic advice-giver strunkwhiting their pet peeves.

However, if you aren’t convinced about the main message, the book provides a few sections trying to counter common counterarguments to scheduling writing time (”specious barriers” according to the author) and cites some empirical evidence. That evidence consists of the one (1) publication about writing habits, which itself is a book with the word ”self-help” in the title (Boice 1990). The data are re-drawn as a bar chart without sample sizes or uncertainty indicators. Uh-oh. I couldn’t get a hold of the book itself, but I did find this criticism of it (Sword 2016):

The admonition ‘Write every day!’ echoes like a mantra through recent books, manuals, and online resources on academic development and research productivity … [long list of citations including the first edition of Silvia’s book]. Ironically, however, this research-boosting advice is seldom backed up by the independent research of those who advocate it. Instead, proponents of the ‘write every day’ credo tend to base their recommendations mostly on anecdotal sources such as their own personal practice, the experiences of their students and colleagues, and the autobiographical accounts of full-time professional authors such as Stephen King, Annie Lamott, Maya Angelou, and bell hooks (see King, 2000, p. 148; Lamott, 1994, pp. xxii, 232; Charney, 2013; hooks, 1999, p. 15). Those who do seek to bolster their advice with research evidence almost inevitably cite the published findings of behavioural psychologist Robert Boice, whose famous intervention studies with ‘blocked’ writers took place more than two decades ago, were limited in demographic scope, and have never been replicated. Boice himself laced his empirical studies with the language of religious faith, referring to his write-every-day crusade as ‘missionary work’ and encouraging those who benefitted from his teachings to go forth and recruit new ‘disciples’ (Boice, 1990, p. 128). Remove Boice from the equation, and the existing literature on scholarly writing offers little or no conclusive evidence that academics who write every day are any more prolific, productive, or otherwise successful than those who do not.

You can tell from the tone where that is going. Sword goes on to give observational evidence that many academics don’t write every day and still do well enough to make it into an interview study of people considered ”exemplary writers”. Then again, maybe they would do even better if they did block out an hour of writing every morning. Sword ends by saying that she still recommends scheduled writing, keeping track of progress, etc, for the same reason as Silvia does — because they have worked for her. At any rate, the empirical backing seems relatively weak. As usual with academic advice, we are in anecdote country.

This book assumes that you have a backlog of writing to do and that academic writing is a matter of applying body to chair, hands to keyboard. You know what to do, now go do the work. This seems to often be true in natural science, and probably also in Silvia’s field of psychology: when we write a typical journal article, we know what we did, what the results were, and we have a fair idea about what about them is worth discussing. I’m not saying it’s necessarily easy, fun or painless to express that in stylish writing, but it doesn’t require much deep thought or new ideas. Sure, the research takes place in a larger framework of theory and ideas, but each paper moves that frame only very slightly, if at all. Silvia has this great quote that I think gets the metaphor right:

Novelists and poets are the landscape artists and portrait painters; academic writers are the people with big paint sprayers who repaint your basement.

Now on to a book about academic writing that actually does aspire to tell you how to have deep thoughts and new ideas.

”How to take smart notes” by Sönke Ahrens

”How to take smart notes”, instead, is a book that de-emphasises writing as a means to produce a text, and emphasises writing as a tool for thinking. It explains and advocates for a particular method for writing and organising research notes (about the literature and one’s own ideas), arguing that it can make researchers both more productive and creative. One could view the two books as dealing with two different steps of writing, with Ahrens’ book presenting a method for coming up with ideas and Silvia’s book presenting a method for turning those ideas into manuscripts, but Ahrens actually seems to suggest that ”How to take smart notes” provides a workflow that goes all the way to finished product, and as such, it paints a very different picture of the writing process.

The method is called Zettelkasten, which is German for a filing box for index cards (or ”slip-box”, but I refuse to call it that), and metonymically a note-taking method that uses one. That is, you use a personal index card system for research notes. In short, the point of the method is that when you read about or come up with an interesting idea (fact, hypothesis, conjecture etc) that you want to save, you write it on a single note, give it a number, and stick it in your archive. You also pull out other notes that relate to the idea, and add links between this new note and what’s already in your system. The box is nowadays metaphorical and replaced by software. Ahrens is certainly not the sole advocate; the idea even has its own little movement with hashtags, a subreddit and everything.
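
To make the mechanics concrete, here is a minimal sketch in Python of the bookkeeping described above: each note gets a number, goes into the archive, and is linked to related notes. This is only an illustration under my own assumptions; the names `Note`, `Zettelkasten`, `add`, `link` and `linked` are made up for the example and are not taken from Ahrens’ book or from any particular software.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    number: int                              # the number the note is filed under
    text: str                                # one idea per note
    links: set = field(default_factory=set)  # numbers of related notes


class Zettelkasten:
    """A toy archive of numbered, linked notes (hypothetical design)."""

    def __init__(self):
        self.notes = {}
        self._next_number = 1

    def add(self, text, related=()):
        """File a new note and link it to the notes it relates to."""
        number = self._next_number
        self._next_number += 1
        self.notes[number] = Note(number, text)
        for other in related:
            self.link(number, other)
        return number

    def link(self, a, b):
        """Links go both ways, so either note leads to the other."""
        self.notes[a].links.add(b)
        self.notes[b].links.add(a)

    def linked(self, number):
        """Pull out the notes linked to a given note, in number order."""
        return [self.notes[n] for n in sorted(self.notes[number].links)]


box = Zettelkasten()
growth = box.add("Structure note: growth in chickens")
study_a = box.add("Literature note: summary of one growth study", related=[growth])
study_b = box.add("Literature note: summary of another growth study", related=[growth])

for note in box.linked(growth):
    print(note.number, note.text)
```

Sequential integers are used for numbering only because they are the simplest choice here; any scheme that gives every note a stable, unique identifier would serve the same purpose.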

It is fun to compare ”How to take smart notes” to ”How to write a lot”, because early in the book, Ahrens criticises writing handbooks for missing the point by starting too late in the writing process (that is, when you already know what kind of text you are going to write) and neglecting the part that Ahrens thinks is most important: how you get the ideas in place to know what to write. He argues that the way to know what to write is to read widely, take good notes and make connections between those notes, and then the ideas for things to write will eventually emerge from the resulting structure. Then, at some point, you take the relevant notes out of the system, arrange them in order into a manuscript, and then edit over them until the text is finished. So, we never really sit down to write. We write parts of our texts every day as notes, and then we edit them into shape. That is a pretty controversial suggestion, but it’s also charming. Overall, this book is delightfully contrarian, asking the reader not to plan their writing but be guided by their professional intuition; not to brainstorm ideas, because the good ideas will be in their notes and not in their brain; not to worry about forgetting what they read because forgetting is actually a good thing; to work only on things they find interesting, and so on.

This book is not practical, but an attempt to justify the method and turn it into a writing philosophy. I like that choice, because that is much more interesting than a simple guide. The explanation for how to practically implement a Zettelkasten system takes up less than three pages of the book, and does not include any meaningful practical information about how to set it up on a computer. All of that can be found on the internet in much greater detail. I’ve heard Ahrens say in an interview that the reason he didn’t go into the technology too much is that he isn’t convinced there is a satisfying software solution yet; I agree. The section on writing a paper, Zettelkasten-style, is less than five pages. The rest of the book is trying to connect the methods to observations from pedagogy and psychology, and to lots of anecdotes. Investor Charlie Munger said something about knowledge? You bet it can be read as an endorsement of Zettelkasten!

Books about writing can reveal something about how they were written, and both these books do. In ”How to write a lot”, Silvia talks about his own writing schedule and even includes a photo of his workspace to illustrate the point that you don’t need fancy equipment. In ”How to take smart notes”, Ahrens gives an example of how this note-taking method led him to an idea:

This book is also written with the help of a slip-box. It was for example a note on ”technology, acceptance problems” that pointed out to me that the answer to the question why some people struggle to implement the slip-box could be found in a book on the history of the shipping container. I certainly would not have looked for that intentionally — doing research for a book on effective writing! This is just one of many ideas the slip-box pointed out to me.

If we can learn something from what kinds of text this method produces by looking at ”How to take smart notes”, it seems that the method might help make connections between different topics and gather illustrative anecdotes, because the book is full of those. This also seems to be something Ahrens values in a text. On the other hand, it also seems that the method might lead to disorganised text, because the book also is full of that. It is divided into four principles and six steps, but I can neither remember what the steps and principles are nor how they relate to each other. The principle of organisation seems to be free form elaboration and variation, rather than disposition. Maybe it would work well as a hypertext, preserving some of the underlying network structure.

But we don’t know what the direction of causality is here. Maybe Ahrens just writes in this style and likes this method. Maybe with different style choices or editing, a Zettelkasten-composed text will look just like any other academic text. It must be possible to write plain old IMRAD journal articles with this system too. Imagine I needed to write an introductory paragraph on genetic effects on growth in chickens and were storing my notes in a Zettelkasten; I go to a structure note about growth in chickens, pull out all my linked literature notes about different studies, all accompanied by my own short summaries of what they found. Seems like this could be pretty neat, even for such a modest intellectual task.

Finally, what is that one main practical recommendation that both books, despite their utterly different perspectives on writing, agree on? To make it a habit to write every day.

Literature

Sword, H. (2016). ‘Write every day!’: a mantra dismantled. International Journal for Academic Development, 21(4), 312-322.

Silvia, P. J. (2019). How to write a lot: A practical guide to productive academic writing. Second edition. American Psychological Association.

Ahrens, S. (2017). How to take smart notes: One simple technique to boost writing, learning and thinking. North Charleston, SC: CreateSpace Independent Publishing Platform.

Significance+novelty

I’ve had reasons to read and think more about research grant applications over the last years; I’ve written some, exchanged feedback with colleagues, and I was on one of the review panels for Vetenskapsrådet last year. As a general observation, it appears that I’m not the only one who struggles with explaining the ”Significance and novelty” of my work. That’s a pretty banal observation. But why is that?

It’s easy to imagine that this difficulty is just because of the curse of knowledge, that researchers are so deeply invested in our research topics that it is hard for us to imagine anyone not intuitively understanding what topic X is about, and how vital this is to humanity. Oh, those lofty scientists levitating in their mushroom towers! I am sure that is partially right; the curse of knowledge is a big problem when writing about your science, but there is a bigger problem.

If we look at statements of significance (for example in my own early drafts), it is pretty common to see significance and novelty established in a disembodied way:

This work is significant because topic X is a Big Problem.

This establishes that the sub-field encompassing the work is important in a general way.

This work is novel because, despite sustained research, no-one has yet done experiment Y in species Z with approach Å.

This establishes that there is a particular gap in the sub-sub-field where this research fits.

What these sentences fail to establish is the causal chain that the reader cares about: Will performing this research, at this time, make a worthwhile contribution to solving the Big Problem?

And there might be a simple explanation: The kind of reasoning required here is unique to the grant application. When writing papers, it is sufficient to establish that the area around the work is important and that the work ”… offers insights …” in some manner. After all, the insights are offered right there in the paper. The reader can look at them and figure out the value for themselves. The reader of a grant application can’t, because the insights have not materialised yet.

When planning new work and convincing your immediate collaborators that the work is worthwhile pursuing, you also don’t have to employ these kinds of arguments. The colleagues are likely motivated by other factors, like the direct implications for their work (and cv), how fun the new project will be, or how much they’d like to work with you. Again, the reader of the grant application needs another kind of convincing.

Thankfully, the funders help out. Here are some of the questions the VR peer review handbook (pdf) lists, that pertain to significance and novelty:

To what extent does the proposed project define new, interesting scientific questions?

To what extent does the proposed project use new ways and methods to address important scientific questions?

When applicable, is the proposed development of methods or techniques of high scientific significance? Does the proposed development allow new scientific questions to be addressed?

Maybe that helps. See you around, I need to go practice explaining how my work leads to new scientific questions.

The Fulsome Principle

”If it be aught to the old tune, my lord,
It is as fat and fulsome to mine ear
As howling after music.”
(Shakespeare, Twelfth Night)

There are problematic words that can mean opposite things, I presume either because two once different expressions meandered in the space of meaning until they were nigh indistinguishable, or because something that already had a literal meaning went and became ironic. We can think of our favourites, like a particular Swedish expression that either means that you will get paid or not, or ”fulsome”. Is ”fulsome praise” a good thing (Merriam Webster Usage Note)?

Better avoid the ambiguity. I think this is a fitting name for an observation about language use.

The Fulsome Principle: smart people will gladly ridicule others for breaking supposed rules that are in fact poorly justified.

That is, if we take any strong position about what ”fulsome” means that doesn’t acknowledge the ambiguity, we are following a rule that is poorly justified. If we make fun of anyone for getting the rule wrong, condemning them for misusing and degrading the English language, we are embarrassingly wrong. We are also in the company of many other smart people who snicker before checking the dictionary. It could also be called the Strunk & White principle.

This is related to:

The Them Principle: If you think something sounds like a novel misuse and degradation of language, chances are it’s in Shakespeare.

This has everything to do with language use in science. How many times have you heard geneticists or evolutionary biologists haranguing some outsider to their field, science writer or student for misusing ”gene”, ”fitness”, ”adaptation” or similar? I would suspect: Many. How many times was the usage, in fact, in line with how the word is used in an adjacent sub-subfield? I would suspect: Also, many.

In ”A stylistic note” at the beginning of his book The Limits of Kindness (Hare 2013), philosopher Caspar Hare writes:

I suspect that quite often, when professional philosophers use specialized terms, they have subtly different senses of those terms in mind.

One example, involving not-so-subtly-different senses of a not-very-specialized term: We talk a great deal about biting the bullet. For example, ”I confronted David Lewis with my damning objection to modal realism, and he bit the bullet.” I have asked a number of philosophers about what, precisely, this means, and received a startling range of replies.

Around 70% say that the metaphor has to do with surgery. … So, in philosophy, for you to bite the bullet is for you to grimly accept seemingly absurd consequences of the theory you endorse. This is, I think, the most widely understood sense of the term.
/…/

Some others say that the metaphor has to do with injury … So in philosophy, for you to acknowledge that you are biting the bullet is for you to acknowledge that an objection has gravely wounded your theory.

/…/

One philosopher said to me that the metaphor has to do with magic. To bite a bullet is to catch a bullet, Houdini-style, in your teeth. So, in philosophy, for you to bite the bullet is for you to elegantly intercept a seemingly lethal objection and render it benign.

/…/

I conclude from my highly unscientific survey that, more than 30 percent of the time, when a philosopher claims to be ”biting the bullet,” a small gap opens up between what he or she means and what his or her reader or listener understands him or her to mean.

And I guess these small gaps in understanding are more common than we normally think.

Please, don’t go back to my blog archive and look for cases of me railing against someone’s improper use of scientific language, because I’m sure I’ve done it too many times. Mea maxima culpa.

The next notebook of work

Dear diary,

The last post was about my attempt to use the Getting Things Done method to bring some more order to research, work, and everything. This post will contain some more details about my system, at a little less than a year into the process, on the off chance that anyone wants to know. This post will use some Getting Things Done jargon without explaining it. There are many useful guides online, plus of course the book itself.

Medium

Most of my system lives in paper notebooks. The main notebook contains my action list, projects list, waiting for list and agendas plus a section for notes. I quickly learned that the someday/maybe lists won’t fit, so I now have a separate (bigger) notebook for those. My calendar is digital. I also use a note taking app for project support material, and as an extra inbox for notes I jot down on my phone. Thus, I guess it’s a paper/digital hybrid.

Contexts

I have five contexts: email/messaging, work computer, writing, office and home. There were more in the beginning, but I gradually took out the ones I didn’t use. They need to be few enough and map cleanly to situations, so that I remember to look at them. I added the writing context because I tend to treat, and schedule, writing tasks separately from other work tasks. The writing context also includes writing-adjacent support tasks such as updating figures, going through reviewer comments or searching for references.

Inboxes

I have a total of nine inboxes, if you include all the email accounts and messenger services where people might contact me about things I need to do. That sounds excessive, but only three of those are where I put things for myself (physical inbox, notes section of notebook, and notes app), and so far they’re all getting checked regularly.

Capture

I do most of my capture in the notes app on my phone (when not at a desk) or on a piece of paper (when at my desk). When I get back to having in-person meetings, I assume more notes are going to end up in the physical notebook, because it’s nicer to take meeting notes on paper than on a phone.

Agendas

The biggest thing I changed in the new notebook was to dedicate much more space to agendas, but it’s already almost full! It turns out there are lots of things ”I should talk to X about the next time we’re speaking”, rather than send X an email immediately. Who knew?

Waiting for

This is probably my favourite. It is useful to have a list of who has said they will get back to me, when, and about what. That little date next to their name helps me not feel like a nag when I ask them again after a reasonable time, and makes me appreciate them more when they respond quickly.

Weekly review

I already had the habit of scheduling an appointment with myself on Fridays (or otherwise towards the end of the week) to go over some recurring items. I’ve expanded this appointment to do a weekly review of the notebook, calendar, someday/maybe list, and some other bespoke checklist items. I bribe myself with sweets to support this habit.

Things I’d like to improve

Here are some of the things I want to improve:

  • The project list. A project sensu Getting Things Done can be anything from purchasing new shoes to taking over the world. The project list is supposed to keep track of what you’ve undertaken to do, and make sure you have come up with actions that progress them. My project list isn’t very complete, and doesn’t spark new actions very often.
  • Project backlogs. On the other hand, I have some things on the project list that are projects in a greater sense, and will have literally thousands of actions, both from me and others. These obviously need planning ahead beyond the next thing to do. I haven’t yet figured out the best way to keep a backlog of future things to do in a project, potentially with dependencies, and feed them into my list of things to do when they become current.
  • Notes. I have a strong note taking habit, but a weak note reading habit. Essentially, many of my notes are write-only; this feels like a waste. I’ve started my attempts to improve the situation with meeting notes: trying to take five minutes right after a meeting (if possible) to go over the notes, extract any calendar items, actions and waiting-fors, and decide whether I need to save the note or if I can throw it away. What to do about research notes from reading and from seminars is another matter.

One notebook’s worth of work

Image: an Aviagen sponsored notebook from the 100 Years of Genetics meeting in Edinburgh, with post-its sticking out, next to a blue Ballograf pen

Dear diary,

”If I could just spend more time doing stuff instead of worrying about it …” (Me, at several points over the years.)

I started this notebook in spring last year and recently filled it up. It contains my first implementation of the system called ”Getting Things Done” (see the book by David Allen with the same name). Let me tell you a little about how it’s going.

The way I organised my work, with to-do lists, calendar, work journal, and routines for dealing with email had pretty much grown organically up until the beginning of this year. I’d gotten some advice, I’d read the odd blog post and column about email and calendar blocking, but beyond some courses in project management (which are a topic for another day), I’d gotten myself very little instruction on how to do any of this. How does one actually keep a good to-do list? Are there principles and best practices? I was aware that Getting Things Done was a thing, and last spring, a mention in passing on the Teaching in Higher Ed podcast prompted me to give it a try.

I read up a little. The book was right there in the university library, unsurprisingly. I also used a blog post by Alberto Taiuti about doing Getting Things Done in a notebook, and read some other writing by researchers about how they use the method (Robert Talbert and Veronika Cheplygina).

There is enough out there about this already that I won’t make my own attempt to explain the method in full, but here are some of the interesting particulars:

You are supposed to be careful about how you organise your to-do lists. You’re supposed to make sure everything on the list is a clear, unambiguous next action that you can start doing when you see it. Everything else that needs thinking, deciding, mulling over, reflecting etc, goes somewhere else, not on your list of things to do. This means that you can easily pick something off your list and start work on it.

You are supposed to be careful about your calendar. You’re supposed to only put things in there that have a fixed date and time attached, not random reminders or aspirational scheduling of things you would like to do. This means that you can easily look at your calendar and know what your day, week and month look like.

You are supposed to be careful to record everything you think about that matters. You’re supposed to take a note as soon as you have a potentially important thought and put it in a dedicated place that you will check and go through regularly. This means that you don’t have to keep things in your head.

This sounds pretty straightforward, doesn’t it? Well, despite having to-do lists, calendars and a habit of note-taking for years, I’ve not been very disciplined about any of this before. My to-do list items have often been vague, too big tasks that are hard to get started on. My calendar has often contained aspirational planning entries that didn’t survive contact with the realities of the workday. I often delude myself that I’ll remember an idea or a decision, only to have it quietly slip out of my mind.

Have I become more productive, or less stressed? The honest answer is that I don’t know. I don’t have a reliable way to track either productivity or stress levels, and even if I did: the last year has not really been comparable to the year before, for several reasons. However, I feel like thinking more about how I organise my work makes a difference, and I’ve felt a certain joy working on the process, as well as a certain dread when looking at it all organised in one place. Let’s keep going and see where this takes us.

Against question and answer time

Here is a semi-serious suggestion: Let’s do away with questions and answers after talks.

I’ll preface with two examples:

First, a scientist I respect highly had just given a talk. As we were chatting away afterwards, I referred to someone who had asked a question during the talk. The answer: ”I didn’t pay attention. I don’t listen when people talk at me like that.”

Second, Swedish author Göran Hägg had this little joke about question and answer time. I paraphrase from memory: Question time is useless because no reasonable person who has a useful contribution will be socially uninhibited enough to ask a question in a public forum (at least not in Sweden). To phrase it more nicely: Having a useful contribution and feeling comfortable speaking up might not be that well correlated.

I have two intuitions about this. On the one hand, there’s the idea that science thrives on vigorous criticism. I have been at talks where people bounce questions at the speaker, even during the talk and even with pretty serious criticisms, and it works just fine. I presume it has to do both with respect, skill at asking and answering, and the power and knowledge differentials between interlocutors.

On the other hand, we would prefer to have a good conversation and productive arguments, and I’m sure everyone has been in seminar rooms where that wasn’t the case. It’s not a good conversation if, say, questions and answers turn into old established guys (sic) shouting down students. In some cases, it seems the asker is not after a productive argument, nor indeed any honest attempt to answer the question. (You might be able to tell by them barking a new question before the respondent has finished.)

Personally, I’ve turned to asking fewer questions. If it’s something I’ve misunderstood, it’s unlikely that I will get the explanation I need without conversation and interaction. If I have a criticism, it’s unlikely that I will get the best possible answer from the speaker on the spot. If I didn’t like the seminar, am upset with the speaker’s advisor, hate it when people mangle the definition of ”epigenetics” or when someone shows a cartoon of left-handed DNA, it’s my problem and not something I need to share with the audience.

I think questions and answers is one thing that has actually benefitted from the move to digital seminars at a distance, where questions are often written in chat. This might be because of a difference in tone between writing a question down and asking it verbally, or thanks to the filtering capabilities of moderators.

Various positions II

Again, what good is a blog if you can’t post your arbitrary idiosyncratic opinions as if you were an authority?

Don’t make a conference app

I get it, you can’t print a full-blown paper program book: it is too much, no one reads it, and it feels wasteful. But please, please, for the love of everything holy, don’t make an app. Put the text, straight up, on a website in plaintext. It loads quickly, it’s searchable, it can be automatically generated. The conference app will be cloddy, take up space on the phone, eat bandwidth on some strained mobile contract, and invariably freeze.

Posters, still bad in 2020

Don’t believe the lies: a once folded canvas poster will never look good again. You haven’t had fun at a conference until you’ve tried ironing a poster on a hostel floor with an iron that belongs in a museum.

Poster sessions are bad by necessity. For them to have the space and time to be anything other than a crowded mess, the conference would have to accept substantially fewer posters. That means fewer participants, probably especially earlier career participants, and the value of having them outweighs the value of a somewhat better poster session.

Gene accession numbers

PLOS Genetics has a great policy in their submission guidelines that doesn’t seem to get followed very much in papers they actually publish. This should be the norm in every genetics paper. I feel bad that it’s not the case in all my papers.

As much as possible, please provide accession numbers or identifiers for all entities such as genes, proteins, mutants, diseases, etc., for which there is an entry in a public database, for example:

Ensembl
Entrez Gene
FlyBase
InterPro
Mouse Genome Database (MGD)
Online Mendelian Inheritance in Man (OMIM)
PubChem

Identifiers should be provided in parentheses after the entity on first use.

In the future, with the right ontologies and repositories in place, I hope this will be the case with traits, methods and so on as well.

UK Biobank and dbGAP are not open data

And that is fine.

Stop it with the work-life balance tweets

No-one should tweet about work-life balance; whether you write about how much you work or how diligent you are about your hours, it comes off as bragging.

Tenses

Write your papers in the past or present tense, whichever you prefer. In the context of a scientific paper, the difference between past and present communicates nothing. I suppose you’re not supposed to mix tenses, but that doesn’t matter either. Most readers probably won’t notice. If you ask me about my stylistic opinion: present tense for everything. But again, it doesn’t matter.

A partial success

In 2010, Poliseno & co published some results on the regulation of a gene by a transcript from a pseudogene. Now, Kerwin & co have published a replication study, the protocol for which came out in 2015 (Khan et al). An editor summarises it like this in an accompanying commentary (Calin 2020):

The partial success of a study to reproduce experiments that linked pseudogenes and cancer proves that understanding RNA networks is more complicated than expected.

I guess he means ”partial success” in the sense that they partially succeeded in performing the replication experiments they wanted. These experiments did not reproduce the gene regulation results from 2010.

Seen from the outside — I have no insight into what is going on here or who the people involved are — something is not working here. If it takes five years from paper to replication effort, and then another five years to replication study accompanied by an editorial commentary that subtly undermines it, we can’t expect replication studies to update the literature, can we?

Communication

What’s the moral of the story, according to Calin?

What are the take-home messages from this Replication Study? One is the importance of fruitful communication between the laboratory that did the initial experiments and the lab trying to repeat them. The lack of such communication – which should extend to the exchange of protocols and reagents – was the reason why the experiments involving microRNAs could not be reproduced. The original paper did not give catalogue numbers for these reagents, so the wrong microRNA reagents were used in the Replication Study. The introduction of reporting standards at many journals means that this is less likely to be an issue for more recent papers.

There is something right and something wrong about this. On the one hand, talking to your colleagues in the field obviously makes life easier. We would like researchers to put all pertinent information in writing, and we would like there to be good communication channels in cases where the information turns out not to be what the reader needed. On the other hand, we don’t want science to be esoteric. We would like experiments to be reproducible without the special artifact or secret sauce. If nothing else, because people’s time and willingness to provide tech support for their old papers might be limited. Of course, this is hard, in a world where the reproducibility of an experiment might depend on the length of digestion (Hines et al 2014) or that little plastic thingamajig you need for the washing step.

Another take-home message is that it is finally time for the research community to make raw data obtained with quantitative real-time PCR openly available for papers that rely on such data. This would be of great benefit to any group exploring the expression of the same gene/pseudogene/non-coding RNA in the same cell line or tissue type.

This is true. You know how doctored, or just poor, Western blots are a notorious issue in the literature? I don’t think that’s because Western blot as a technique is exceptionally bad, but because there is a culture of showing the raw data (the gel), so people can notice problems. However, even if I’m all for showing real-time PCR amplification curves (as well as melting curves, standard curves, and the actual batch and plate information from the runs), I doubt that it’s going to be possible to trouble-shoot PCR retrospectively from those curves. Maybe sometimes one would be able to spot a PCR that looks iffy, but beyond that, I’m not sure what we would learn. PCR issues are likely to have to do with subtle things like primer design, reaction conditions and handling that can only really be tackled in the lab.

The world is messy, alright

Both the commentary and the replication study (Kerwin et al 2020) are cautious when presenting their results. I think it reads as if the authors themselves either don’t truly believe their failure to replicate or are bending over backwards to acknowledge everything that could have gone wrong.

The original study reported that overexpression of PTEN 3’UTR increased PTENP1 levels in DU145 cells (Figure 4A), whereas the Replication Study reports that it does not. …

However, the original study and the Replication Study both found that overexpression of PTEN 3’UTR led to a statistically significant decrease in the proliferation of DU145 cells compared to controls.

In the original study Poliseno et al. reported that two microRNAs – miR-19b and miR-20a – suppress the transcription of both PTEN and PTENP1 in DU145 prostate cancer cells (Figure 1D), and that the depletion of PTEN or PTENP1 led to a statistically significant reduction in the corresponding pseudogene or gene (Figure 2G). Neither of these effects were seen in the Replication Study. There are many possible explanations for this. For example, although both studies used DU145 prostate cancer cells, they did not come from the same batch, so there could be significant genetic differences between them: see Andor et al. (2020) for more on cell lines acquiring mutations during cell cultures. Furthermore, one of the techniques used in both studies – quantitative real-time PCR – depends strongly on the reagents and operating procedures used in the experiments. Indeed, there are no widely accepted standard operating procedures for this technique, despite over a decade of efforts to establish such procedures (Willems et al., 2008; Schwarzenbach et al., 2015).

That is, both the commentary and the replication study seem to subscribe to a view of the world where biology is so rich and complex that both might be right, conditional on unobserved moderating variables. This is true, but it throws us into a discussion of generalisability. If a result only holds in some genotypes of DU145 prostate cancer cells, which might very well be the case, does it generalise enough to be useful for cancer research?

Power underwhelming

There is another possible view of the world, though … Indeed, biology is rich and complicated, but in the absence of accurate estimates, we don’t know which of all these potential moderating variables actually do anything. The first order of business, before we start imagining scenarios that might explain the discrepancy, is to get a really good estimate of it. How do we do that? It’s hard, but how about starting with a cell size greater than N = 5?

The registered report contains power calculations, which is commendable. As far as I can see, it does not describe how they arrived at the assumed effect sizes. Power estimates for a study design depend on the assumed effect sizes. Small studies tend to exaggerate effect sizes (because if an estimate is small, the difference can’t be significant). This means that taking the estimates from a small study as starting effect sizes might leave you with a design that is still unable to detect a true effect of reasonable size.
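To make the exaggeration concrete, here is a minimal simulation sketch in Python. Everything in it is made up for illustration: a true effect of 0.5 standard deviations, normally distributed measurements, N = 5 per cell and a plain two-sample t-test. None of the numbers come from the registered report or the replication study, and the function name is my own. It compares the average estimate over all simulated experiments with the average over only those that reached significance.

```python
import random
import statistics


def simulate_estimates(true_effect, n_per_group, n_sims, t_crit, seed=1):
    """Simulate two-group experiments with unit-variance normal data.

    Returns (all estimates, estimates from significant experiments only).
    """
    rng = random.Random(seed)
    all_est, sig_est = [], []
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        diff = statistics.mean(treated) - statistics.mean(control)
        # Pooled-variance two-sample t statistic (equal group sizes)
        pooled_var = (statistics.variance(control) + statistics.variance(treated)) / 2
        se = (2 * pooled_var / n_per_group) ** 0.5
        all_est.append(diff)
        if abs(diff / se) > t_crit:
            sig_est.append(diff)
    return all_est, sig_est


# Assumed scenario: true effect of 0.5 SD, N = 5 per cell.
# 2.306 is the two-sided 5% critical value of t with 8 degrees of freedom.
all_est, sig_est = simulate_estimates(0.5, 5, 20000, 2.306)
power = len(sig_est) / len(all_est)

print(f"power: {power:.2f}")
print(f"mean estimate, all experiments: {statistics.mean(all_est):.2f}")
print(f"mean estimate, significant experiments only: {statistics.mean(sig_est):.2f}")
```

Under these assumptions, the average over all experiments should land close to the true effect, while the average over the significant ones should be clearly larger. Plugging that inflated number into a power calculation for the next study would make the design look better powered than it is.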

I don’t know what effect sizes one should expect in these kinds of experiments, but my intuition would be that even if you think that you can get good power with a handful of samples per cell, can’t you please run a couple more? We are all limited by resources and time, but if you’re running something like a qPCR, the cost per sample must be much smaller than the cost for doing one run of the experiment in the first place. It’s really not as simple as adding one row on a plate, but almost.

Literature

Calin, George A. ”Reproducibility in Cancer Biology: Pseudogenes, RNAs and new reproducibility norms.” eLife 9 (2020): e56397.

Hines, William C., et al. ”Sorting out the FACS: a devil in the details.” Cell Reports 6.5 (2014): 779-781.

Kerwin, John, and Israr Khan. ”Replication Study: A coding-independent function of gene and pseudogene mRNAs regulates tumour biology.” eLife 9 (2020): e51019.

Khan, Israr, et al. ”Registered report: a coding-independent function of gene and pseudogene mRNAs regulates tumour biology.” eLife 4 (2015): e08245.

Poliseno, Laura, et al. ”A coding-independent function of gene and pseudogene mRNAs regulates tumour biology.” Nature 465.7301 (2010): 1033-1038.