Better posters are nice, but we need better poster session experiences

Fear and loathing in the conference centre lobby

Let me start from a negative place, because my attitude to poster sessions is negative. Poster sessions are neither a good way to communicate science nor a good way to network at conferences. Moreover, they are unpleasant.

The experience of going to a poster session, as an attendee or a presenter, goes something like this: You have to stand in a crowded room that is too loud and try to either read technical language or hold a conversation about a difficult topic. Even without anxiety, mobility, or hearing difficulties, a poster session is unlikely to be enjoyable or efficient.

Poster sessions are bad because of the necessities of conference organisation: we want to invite many people, but we can’t fit in many talks, so we get crowded poster sessions.

They are made worse by efforts to make them better, such as requiring presenters to stand by their posters, in some cases on pain of some sanction from the organisers, or having the presenters act as dispensers of alcohol. If you need to threaten or drug people into participating in an activity, that might be a sign.

They are made not worse, but a bit silly, by assertions that poster sessions are of the utmost importance to the conference. Merely stating that the poster session is vibrant and inspiring, or that you want to emphasise the poster as an important form of communication, sadly does not make it so if the poster sessions are still business as usual.

Mike Morrison’s ”Better Scientific Poster” design

As you can see above, my diagnosis of the poster session problem is partly that you’re forced to read walls of text or listen to mini-lectures, and partly that it happens in an overcrowded space. The walls of text and mini-lectures might be improved by poster design.

Enter the Better Scientific Poster. I suggest clicking on that link and looking at the poster templates directly. I waited too long to look at the actual template files, because I expected a bunch of confusing designer stuff. They’re not that at all: they contain their own documentation and examples.

There is also a video on YouTube expanding on the thinking behind the design, but I think this conversation on the Everything Hertz podcast is the best introduction, if you need an introduction beyond the template. The YouTube video doesn’t go into enough detail, and it is also a bit outdated; the poster template has gone through improvements since.

If you want to hear the criticisms of the design, here’s a blog post summarising some of them. In short, it is unscientific and intellectually arrogant to put a take-home message in too large a font, and it would be boring if all posters used the same template. Okay.

The caveats

I am not a designer, which should be abundantly clear to everyone. I don’t really know what good graphic design principles for a poster are.

There is also no way to satisfy everyone. Some people will think you’ve put too little on the poster unless it ”tells the full story” and has a self-contained description of the methods with all caveats. Some people, like me, will think you’ve put way too much on it long before that.

What I like, however, is that Morrison’s design is based on an analysis of the poster session experience that aligns with mine, and on a goal for the poster that makes sense. The features of the design flow from that goal. If you listen to the video or the Hertz episode, it is clear that Morrison has thought about the purpose of the poster.

He’s not just expressing some wisdom his PhD supervisor told him in a stern voice, or what his gut feeling tells him, which I suspect are the two sources that scientists’ advice on communication is usually based on. We all think that poster sessions are bad, because we’ve been to poster sessions. We usually don’t have thought-through ideas about what to do better.

Back to a place of negativity

For those reasons, I think the better poster is likely to be an improvement. I was surprised that I didn’t see it sweep through the poster sessions at the conferences I went to last summer, but there were a few. I was going to try it for TAGGC 2020 (here is my poster about the genetics of recombination rate in the pig), but that conference moved online, which made poster presentations a little different.

However, changing up the poster layout can only get you so far. Unless someone has a stroke of genius that improves the poster viewing experience or changes the economics of poster attendance, there is no bright future for the poster session. Individually, the rational course of action isn’t to fiddle with the design and spend time squeezing marginal improvements out of our posters. It is to spend as little time as possible on posters, ignoring our colleagues’ helpful advice on how to make them prettier and more scientific, and lowering our expectations.

Virtual animal breeding journal club: ”Structural equation models to disentangle the biological relationship between microbiota and complex traits …”

The other day saw the first Virtual breeding and genetics journal club, organised by John Cole. This was the first online journal club I’ve attended (shocking, given how many video calls I’ve been on for other sciencey reasons), so I thought I’d write a little about it: both the format and the paper. You can find the slide deck from the journal club here (pptx file).

The medium

We used Zoom, and that seemed to work, as I’m sure anything else would, if everyone just mutes their microphone when they aren’t speaking. As John said, the key feature of Zoom seems to be the ability for the host to mute everyone else. During the call, I think we were at most 29 or so people, but only a handful spoke. Turn-taking will probably get more intense if more people want to speak.

The format

John started the journal club with a code of conduct, which I expect helped to set what I felt was a good atmosphere. In most journal clubs I’ve been in, I feel like the atmosphere has been pretty good, but I think we’ve all heard stories about hyper-critical and hostile journal clubs, and that doesn’t sound particularly fun or useful. On that note, one of the authors, Oscar González-Recio, was on the call and answered some questions.

The paper

Saborío‐Montero, Alejandro, et al. ”Structural equation models to disentangle the biological relationship between microbiota and complex traits: Methane production in dairy cattle as a case of study.” Journal of Animal Breeding and Genetics 137.1 (2020): 36-48.

The authors measured methane emissions (by analysing breath with an infrared gas monitor) and the abundance of different microbes in the rumen (with Nanopore sequencing) from dairy cows. They also genotyped the animals to estimate relatedness.

They analysed the genetic relationship between breath methane and abundance of each taxon of microbe, individually, with either:

  • a bivariate animal model;
  • a structural equations model that allows for a causal effect of abundance on methane, capturing the assumption that the abundance of a taxon can affect the methane emission, but not the other way around.

They used these models to estimate the heritabilities of the abundances and the genetic correlations between methane and the abundances, and, in the case of the structural model, the effect of each taxon’s abundance on methane, conditional on the assumed causal structure.
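For orientation, here is a minimal sketch of the two kinds of models, written for one taxon at a time. This is my own shorthand, not necessarily the authors’ exact notation or parameterisation: y_a is the (transformed) abundance, y_m is methane, and the recursive model adds a regression of methane on abundance.

% Bivariate animal model for one taxon's abundance (y_a) and methane (y_m):
\begin{align}
\mathbf{y}_a &= \mathbf{X}_a \mathbf{b}_a + \mathbf{Z}_a \mathbf{u}_a + \mathbf{e}_a \\
\mathbf{y}_m &= \mathbf{X}_m \mathbf{b}_m + \mathbf{Z}_m \mathbf{u}_m + \mathbf{e}_m
\end{align}
% where the additive genetic values are correlated between the two traits:
% (\mathbf{u}_a, \mathbf{u}_m) \sim N(\mathbf{0}, \mathbf{G}_0 \otimes \mathbf{A}),
% with \mathbf{A} the relationship matrix and \mathbf{G}_0 the 2 x 2 genetic
% (co)variance matrix that, together with the residual (co)variances, gives
% the heritabilities and the genetic correlation.

% Recursive (structural equations) model: the same, except that methane also
% gets a direct path from abundance, with structural coefficient \lambda:
\begin{equation}
\mathbf{y}_m = \lambda \mathbf{y}_a + \mathbf{X}_m \mathbf{b}_m + \mathbf{Z}_m \mathbf{u}_m + \mathbf{e}_m
\end{equation}

Conditional on the assumed direction (abundance can affect methane, but not the other way around), \lambda is the estimated effect of that taxon’s abundance on methane; as I understand it, the genetic parameters for methane in the recursive model then refer to effects that are not mediated through abundance.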

My thoughts

It’s cool how there’s a literature building up on genetic influences on the microbiome, with some consistency across studies. These intense high-tech studies on relatively few cattle might build up to finding new traits and proxies that can go into larger scale phenotyping for breeding.

As the title suggests, the paper advocates for using the structural equations model: ”Genetic correlation estimates revealed differences according to the usage of non‐recursive and recursive models, with a more biologically supported result for the recursive model estimation.” (Conclusions)

While I agree that, a priori, it makes sense to assume a structural equations model with a causal structure, I don’t think the results provide much evidence that it’s better. The estimates of heritabilities and genetic correlations from the two models are nearly indistinguishable. Here is the key figure, Figure 4, comparing genetic correlation estimates:

[Figure 4 from Saborío-Montero et al. (2020): genetic correlation estimates between methane and the abundance of each taxon, from the non-recursive bivariate model and the recursive model, with 95% credible intervals.]

As you can see, there are a couple of examples of genetic correlations where the point estimate switches sign, and one of them (Succinivibrio sp.) where the credible intervals don’t overlap. ”Recursive” is the structural equations model. The error bars are 95% credible intervals. This is not strong evidence of anything; the authors are responsible about it and don’t go into interpreting this difference. But let us speculate! They write:

All genera in this case, excepting Succinivibrio sp. from the Proteobacteria phylum, resulted in overlapped genetic correlations between the non‐recursive bivariate model and the recursive model. However, high differences were observed. Succinivibrio sp. showed the largest disagreement changing from positively correlated (0.08) in the non‐recursive bivariate model to negatively correlated (−0.20) in the recursive model.

Succinivibrio is also the taxon with the largest estimated inhibitory effect on methane (from the structural equations model).

While some taxa, such as ciliate protozoa or Methanobrevibacter sp., increased the CH4 emissions …, others such as Succinivibrio sp. from Proteobacteria phylum decreased it

Looking at the paper that first described these bacteria (Bryant & Small 1956), I see that Succinivibrio were originally isolated from the cattle rumen, and that their name comes from the fact that ”they ferment glucose with the production of a large amount of succinic acid”. Bryant & Small ran a fermentation experiment to see what came out, and it seems that the bacteria don’t produce methane:

[Table 2 from Bryant & Small (1956): fermentation products of the Succinivibrio strains, showing no methane production.]

This is also in line with an rRNA sequencing study of high- and low-methane-emitting cows (Wallace et al. 2015) that found lower Succinivibrio abundance in high methane emitters.

We may speculate that Succinivibrio species could be involved in diverting energy away from methanogens, and thus reducing methane emissions. If that is true, then the structural equations model estimate (a larger negative genetic correlation between Succinivibrio abundance and methane) might be better than the one from the animal model.

Finally, while I’m on board with the a priori argument for using a structural equations model, as with other applications of causal modelling (gene networks, Mendelian randomisation etc.), it might be dangerous to consider parts of the system in isolation when the microbes are likely to have causal effects on each other.

Literature

Saborío‐Montero, Alejandro, et al. ”Structural equation models to disentangle the biological relationship between microbiota and complex traits: Methane production in dairy cattle as a case of study.” Journal of Animal Breeding and Genetics 137.1 (2020): 36-48.

Wallace, R. John, et al. ”The rumen microbial metagenome associated with high methane production in cattle.” BMC Genomics 16.1 (2015): 839.

Bryant, Marvin P., and Nola Small. ”Characteristics of two new genera of anaerobic curved rods isolated from the rumen of cattle.” Journal of Bacteriology 72.1 (1956): 22.

Things that really don’t matter: megabase or megabasepair

Should we talk about physical distance in genetics as number of base pairs (kbp, Mbp, and so on) or bases (kb, Mb)?

I got into a discussion about this recently, and I said I’d continue the struggle on my blog. Here it is. Let me first say that I don’t think this matters at all, and if you make a big deal out of this (or whether ”data” can be singular, or any of those inconsequential matters of taste we argue about for amusement), you shouldn’t. See this blog post as an exorcism, helping me not to trouble my colleagues with my issues.

What I’m objecting to is mostly the inconsistency of talking about long stretches of nucleotides as ”kilobase” and ”megabase” but talking about short stretches as ”base pairs”. I don’t think it’s very common to call a 100-nucleotide stretch ”a 100 b sequence”; I would expect ”100 bp”. For example, if we look at Ensembl, they might describe a large region as 1 Mb, but if you zoom in a lot, they give lengths in bp. My impression is that this is common practice. However, if you consistently use ”bases” and ”megabase”, more power to you.
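As a toy illustration of the consistency I would prefer, here is a throwaway snippet (entirely my own, not something Ensembl or any other tool does) that reports lengths in base pair units at every scale:

def format_length_bp(length: int) -> str:
    """Format a sequence length consistently in base pair units."""
    if length >= 1_000_000:
        return f"{length / 1_000_000:g} Mbp"
    if length >= 1_000:
        return f"{length / 1_000:g} kbp"
    return f"{length} bp"

print(format_length_bp(100))        # 100 bp
print(format_length_bp(1_500))      # 1.5 kbp
print(format_length_bp(2_300_000))  # 2.3 Mbp

The point is only that the unit stays the same whether you are zoomed in or out; swap ”bp” for ”b” throughout if you prefer bases.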

Unless you’re writing a very specific kind of bioinformatics paper, the risk of confusion with the computer storage unit isn’t a problem. But there are some biological arguments.

A biological argument for ”base” might be that we care about the identity of the base, not the base pairing. We write down only one nucleotide per position in a nucleic acid sequence. The base pair is a different thing: the base bound to its partner on the other strand, or, if the DNA or RNA is single-stranded, not paired at all.

Conversely, a biochemical argument for ”base pair” might be that in a double-stranded molecule, the base pair is the relevant informational unit. We may write only one base of each pair in our nucleotide sequence for convenience, but because of the rules of base pairing, we know its complement. In this case, maybe we should reserve ”base” for single-stranded molecules.

If we consult two more or less trustworthy sources, the Encyclopedia of Life Sciences and Wiktionary, they both seem to take this view.

eLS says:

A megabase pair, abbreviated Mbp, is a unit of length of nucleic acids, equal to one million base pairs. The term ‘megabase’ (or Mb) is commonly used interchangeably, although strictly this would refer to a single-stranded nucleic acid.

Wiktionary says:

A length of nucleic acid containing one million nucleotides (bases if single-stranded, base pairs if double-stranded)

Please return next week for the correct pronunciation of ”loci”.

Literature

Dear, P.H. (2006). Megabase Pair (Mbp). eLS.

If research is learning, how should researchers learn?

I’m taking a course on university pedagogy to, hopefully, become a better teacher. While reading about students’ learning and what teachers ought to do to facilitate it, I couldn’t help thinking about researchers’ learning, and what we ought to do to give ourselves a good learning environment.

Research is, largely, learning. First, a large part of any research work is learning what is already known (known to someone, that is, if not to me in particular); it’s a direct continuation of the learning that takes place in courses. While doing any research project, we learn the concepts other researchers use in this specific sub-subfield, and the relations between them: first to the extent that we can orient ourselves, and eventually well enough to make a contribution that is intelligible to others who work there. We also learn their priorities, attitudes and platitudes. (Seriously, I suspect you learn a lot about a sub-subfield by trying to make jokes about it.) And we learn to do new things: perform a laboratory procedure, a calculation, or something like that.

But more importantly, research is learning about things no-one knows yet. The idea of constructivist learning theory seems apt: We are constructing new knowledge, building on pre-existing structures. We don’t go out and read the book of nature; we take the concepts and relations of our sub-subfield of choice, and graft, modify and rearrange them into our new model of the subject.

If there is something to this, it means that old clichéd phrases like ”institution of higher learning”, scientists as ”students of X”, and so on, name a deeper analogy than they might seem to. It also suggests that innovations in student learning might be good building blocks for research group management. Should we be concept mapping with our colleagues to figure out where we disagree about the definition of ”developmental pleiotropy”? It also makes one wonder why meetings and departmental seminars so often take the form of sage-on-the-stage lectures.

Two distinguishing traits of science are that there are errors all the time and that almost no-one can reproduce anything

I got annoyed and tweeted:

”If you can’t reproduce a result, it isn’t science” … so we’re at that stage now, when we write things that sound righteous but are nonsense.

Hashtag subtweet, I guess. But it doesn’t matter who first wrote the sentence I was complaining about; they won’t care what I think, and I’m not out to debate them. I only think the quoted sentence makes sense if you take ”science” to mean ”the truth”. The relationship between science and reproducibility is messier than that.

The first clause could mean a few different things:

You have previously produced a result, but now, you can’t reproduce it when you try …

Then you might have done something wrong the first time, or the second time. This is an everyday occurrence in any type of research, one that probably happens to every postdoc every week. Not even purely theoretical results are safe. If a simulation is stochastic, one might have been interpreting noise. If there is an analytical result, one might have made an odd number of sign errors. In fact, it is a distinguishing trait of science that when we try to learn new things, there are errors all the time.
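As a toy illustration of the stochastic case (a made-up example, not anyone’s real analysis), here is a simulation where the true effect is zero; run the same analysis twice without fixing the random seed, and the numbers will differ from run to run, sometimes enough to change the apparent conclusion:

import numpy as np
from scipy import stats

def run_experiment(seed=None):
    """Simulate two groups with no true difference and return the t-test p-value."""
    rng = np.random.default_rng(seed)
    control = rng.normal(0, 1, size=20)
    treatment = rng.normal(0, 1, size=20)  # same distribution: the true effect is zero
    return stats.ttest_ind(control, treatment).pvalue

print(run_experiment())        # varies every run
print(run_experiment())        # ... and again
print(run_experiment(seed=1))  # fixing the seed makes the run repeatable
print(run_experiment(seed=1))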

If that previous result is something that has been published, circulated to peers, and interpreted as if it was a useful finding, then that is unfortunate. The hypothetical you should probably make some effort to figure out why, and communicate that to peers. But it seems like a bad idea to suggest that because there was an error, you’re not doing science.

You personally can’t reproduce a result because you don’t have the expertise or resources …

Science takes a lot of skill and a lot of specialised technical stuff. I probably can’t reproduce even a simple organic chemistry experiment. In fact, it is a distinguishing trait of science that almost no-one can reproduce any of it, because it takes both expertise and special equipment.

No-one can ever reproduce a certain result even in principle …

It might still be science. The 1918 influenza epidemic will, by the nature of time, never happen again. Still, there is science about it.

You can’t reproduce someone else’s results when you try with a reasonably similar setup …

Of course, this is what the original authors of the sentence meant. When this turns out to be a common occurrence, as people systematically try to reproduce findings, there is clearly something wrong with the research methods scientists use: The original report may be the outcome of a meandering process of researcher degrees of freedom that produced a striking result that is unlikely to happen when the procedure is repeated, even with high fidelity. However, I would say that we’re dealing with bad science, rather than non-science. Reproducibility is not a demarcation criterion.
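To make that meandering process concrete, here is a hypothetical sketch (mine, not a description of any particular study): analyse several outcomes measured on pure noise, report the most striking one, and you will fairly often have an impressive-looking result that a faithful repetition of the final, chosen comparison is unlikely to give you again:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1234)

def smallest_p_of_many(n_outcomes=10, n=20):
    """Test many outcomes with no true effect and keep only the smallest p-value."""
    pvalues = [
        stats.ttest_ind(rng.normal(0, 1, size=n), rng.normal(0, 1, size=n)).pvalue
        for _ in range(n_outcomes)
    ]
    return min(pvalues)

# With ten tries on pure noise, the chance that at least one comparison reaches
# p < 0.05 is about 40%, even though nothing is going on.
print(smallest_p_of_many())

# A repetition of just the chosen comparison has only the nominal 5% chance.
print(stats.ttest_ind(rng.normal(0, 1, size=20), rng.normal(0, 1, size=20)).pvalue)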

(Note: Some people reserve ”reproducibility” for the computational reproducibility of re-running someone’s analysis code and getting the same results. This was not the case with the sentence quoted above.)

Interpreting genome scans, with wisdom

Eric Fauman is a scientist at Pfizer who also tweets out interpretations of genome-wide association scans.

Background: There is a GWASbot Twitter account which posts Manhattan plots, with links, for various traits from the UK Biobank. The bot was made by the Genetic Epidemiology lab at the Finnish Institute for Molecular Medicine and Harvard. The source of the results is these genome scans (probably; it’s a little bit opaque); the bot also links to heritability and genetic correlation databases. There is also an EnrichrBot that replies with the enrichment of chromatin marks (Chen et al. 2013). Fauman comments on some of the genome scans on his Twitter account.

Here are a couple of recent ones:

And here is his list of these threads as a Google Document.

This makes me think of three things, two good and one bad.

1. The ephemeral nature of genome scans

Isn’t it great that we’re now at a stage where a genome scan can be something to be tweeted or put en masse in a database, instead of being published one paper per scan with lots of boilerplate? The researchers behind the genome scans say as much in their 2017 blog post on the first release:

To further enhance the value of this resource, we have performed a basic association test on ~337,000 unrelated individuals of British ancestry for over 2,000 of the available phenotypes. We’re making these results available for browsing through several portals, including the Global Biobank Engine where they will appear soon. They are also available for download here.

We have decided not to write a scientific article for publication based on these analyses. Rather, we have described the data processing in a detailed blog post linked to the underlying code repositories. The decision to eschew scientific publication for the basic association analysis is rooted in our view that we will continue to work on and analyze these data and, as a result, writing a paper would not reflect the current state of the scientific work we are performing. Our goal here is to make these results available as quickly as possible, for any geneticist, biologist or curious citizen to explore. This is not to suggest that we will not write any papers on these data, but rather only write papers for those activities that involve novel method development or more complex analytic approaches. A univariate genome-wide association analysis is now a relatively well-established activity, and while the scale of this is a bit grander than before, that in and of itself is a relatively perfunctory activity. [emphasis mine] Simply put, let the data be free.

That being said, when I started writing this post, I first found myself missing a paper. It was pretty frustrating to track down a detailed description of the methods: after circling back and forth between the different pages that link to each other, I landed on the original methods post, which is informative and written in a light conversational style. On the internet, one fears that these links may rot and die eventually, whereas a paper would probably (but not necessarily …) be longer-lasting.

2. Everything is a genome scan, if you’re brave enough

Another thing that the GWAS bot drives home is that you can map anything that you can measure. The results are not always straightforward. On the other hand, even if the trait in question seems a bit silly, the results are not necessarily nonsense either.

There is a risk, for geneticists and non-geneticists alike, of reifying traits based on their genetic parameters. If we can measure the heritability coefficient of something, and localise it in the genome with a genome-wide association study, it had better be a real and important thing, right? No. The truth is that geneticists choose traits to measure the same way all researchers choose things to measure. Sometimes for great reasons, with serious validation and considerations about usefulness. Sometimes just because. The GWAS bot also helpfully links to the UK Biobank website that describes the traits.

Look at that bread intake genome scan above. Here, ”bread intake” is the self-reported number of slices of bread eaten per week, as entered by participants on a touchscreen questionnaire at a UK Biobank assessment centre. I think we can be sure that this number doesn’t reveal any particularly deep truth about bread and its significance to humanity. It’s a limited, noisy, context-bound number, measured, I bet, because once you ask a battery of lifestyle questions, you’ll ask about bread too. Still, the strongest association is at a region that contains olfactory receptor genes and that also shows up in two other scans of food traits (fruit and ice cream). The bread intake scan hits upon a nugget of genetic knowledge about human food preference. A small, local truth, but still.

Now substitute bread intake for some more socially relevant trait, also imperfectly measured.

3. Lost, like tweets in rain

Genome scan interpretation is just that: interpretation. It means pulling together quantitative data, knowledge of biology and previous literature, and writing an unstructured text, such as a Discussion section or a Twitter thread. That makes interpretations harder to organise, store and build on than the genome scans themselves. Sure, Fauman’s Twitter threads are linked from the Google Document above, and our Discussion sections are available from the library. But they’re spread out in different places, they mix (as they should) evidence with evaluation and speculation, and it’s not like we have a structured vocabulary for describing the genetic mechanisms of quantitative trait loci, or the levels of evidence for them. Maybe we could, with genome-wide association study ontologies and wikis.

You’re not funny, but even if you were

Here is a kind of humour that is all too common in scientific communication; I’ll just show you the caricature, and I think you’ll recognise the shape of it:

Some slogan about how a married man is a slave or a prisoner kneeling and holding a credit card. Some joke where the denouement relies on: the perception that blondes are dumb, male preference for breast size, perceived associations between promiscuity and nationality, or anything involving genital size. Pretty much any one-panel cartoon taken from the Internet.

Should you find any of this in your own talk, here is a message to you: That may be funny to you; that isn’t the problem. To a fair number of the people who are listening, it’s likely to be trite, sad and annoying.

Humour totally has a place in academic speech and writing—probably more than one place. There is the laughter that is there to relieve tension. That is okay sometimes. There are jokes that are obviously put-downs. Those are probably only a good idea in private company, or in public forums where the object of derision is powerful enough that you’re not punching down, but powerless enough to not punch you back. Say, the ever-revered and long dead founder of your field—they may deserve a potshot at their bad manners and despicable views on eugenics.

Then there is that elusive ‘sudden perception of the incongruity between a concept and the real objects which have been thought through it in some relation’ (Schopenhauer, quoted in the Stanford Encyclopedia of Philosophy). When humour is used right, a serious lecturer talking about serious issues has all kinds of opportunities to amuse the listener with incongruities between expectations and how things really are. So please don’t reveal yourself to be predictably trite.