From Evolution in Sweden 2014, Uppsala

Dear diary,

A couple of weeks ago I attended the Evolution in Sweden meeting in Uppsala, as expected a very nice meeting with lots of interesting things. My last conference was ESEB last summer, which was great because it was a huge conference with so much to see and so many people. Evolution in Sweden was great because it wasn’t huge, so that it was very possible to see everything, recognise familiar faces and talk with people. I had a poster on the behaviour genetics of chicken domestication (of course!).

Here are some of my personal highlights, in no particular order:

Kerstin Johannesson’s talk, an ”advertisement for marine organisms”, was probably the most fun and engaging. I was very convinced that evolutionary research in the Baltic Sea is a great idea! Among other things she mentioned salinity gradients, the sexual and asexual reproduction of Fucus brown algae, Littorina saxatilis of course and the IMAGO project to sequence and assemble reference genomes for eight different species from the Baltic.

We have a great infrastructure for evolutionary research: the Baltic Sea. [quoted from memory]

Claudia Köhler spoke about why triploids in Arabidopsis thaliana fail, an interesting story involving genomic imprinting and the endosperm, which in a triploid seed turns out tetraploid. They screened for mutants able to form triploid seeds and found a paternally expressed imprinted gene that is dosage-sensitive and causes the failure of triploid seeds (Kradolfer & al 2013).

Anna Qvarnström and Hans Ellegren talked about different flycatcher projects. I don’t have much that is clever to say about this right now, except that both projects are really fascinating and impressive. Everyone who cares about genomics in the wild should keep an eye on this.

There were two talks from Umeå Plant Science Centre: Stefan Jansson’s about association mapping in aspen (SwAsp), which sounds fun but difficult with tons of genetic variation, and Pär K. Ingvarsson’s about the Norway spruce genome (Nystedt & al 2013). An interesting observation from the latter was that its gigantic genome size (~20 Gb) apparently isn’t due to whole-genome duplications, but to unchecked transposable element activity. A nice nugget to remember: about half of the sequence, or three to four human genomes, consists of LTR-type repeats.

I’m afraid you will never read very much from me about theory talks. I am an engineer after all, so I don’t fear the equations that much, but most of the time I don’t have the necessary context to have any clue where this particular model fits into the grand scheme of things. However, Jessica Abbott gave a fun talk presenting a model for sexual conflict in hermaphrodites that deserves a special mention.

I did see quite a few genomic plots of Fst outliers, and I believe the question that needs answering about them is: What do they really mean? One can do comparisons of comparisons (like in Roger Butlin’s talk and their paper on parallel evolution of morphs in Littorina; Butlin & al 2013), but when it comes to picking out the most differentiated loci on a genome-wide level, are they really the most interesting loci? Are the loci of highest differentiation the loci of adaptation; are they the loci of speciation? (Ellegren’s talk and the flycatcher genome paper; Ellegren & al 2012). It’s a bit like the problem faced by QTL mappers (”now that we’ve got a few genomic regions, what do we do with them?”), with the added complication that we don’t have a phenotype associated with them.

Genetic architecture wasn’t an explicit theme of the meeting, but it always comes up, doesn’t it? Will traits be massively polygenic, dooming researchers to a lifetime search for missing heritability, or relatively simple with a handful of loci? And under what circumstances will either architecture occur? Jon Ågren talked about the fantastic Arabidopsis thaliana in situ QTL mapping experiment. I think it is best illustrated with the video he showed last time I heard him talk about this — Lost in transplantation:

Folmer Bokma used Lego dinosaurs to great effect to illustrate developmental constraints. A large part of the talk also consisted of quotes from different famous evolutionary biologists. Very memorable, but I’m not sure I understood where he was heading. I was expecting him to start talking about the need for G matrix methods any moment. My lack of understanding is of course my fault as well, not just the speaker’s, and there were a few graphs of gene duplications and gene expression data in primates, but I don’t feel that he showed ”how phylogenetic analyses of genomic data can shed new light on these ideas”, as promised in the abstract.

Possibly the best expression of the meeting: Erik Svensson’s ”next generation fieldwork”. I’m not a fan of the inflation of words ending in -omics (and I sometimes feel ”genomics” should just be ”genetics”), but if we have genomics and proteomics, phenomics is also justified, I guess. As a tongue-in-cheek version, ”next generation fieldwork” is spot on. And very true: clever phenotyping strategies in natural populations and natural settings are even more important than rapid sequencing and genotyping. By the way, Erik Svensson, Jessica Abbott, Maren Wellenreuther and their groups have a lab blog which seems nice and active.

And finally, the thing that wasn’t so great, and coincidentally the same thing that wasn’t so great at ESEB, was the gender balance: only 7 out of 28 speakers were women. I don’t know to what extent that ratio reflects the gender ratio of Swedish evolutionary biology, but regardless it is too low.

It’s been a while since mid-January, but I’ve been busy (with some fun things — will tell you more later). And maybe we’ll see each other at the next Evolution in Sweden in Lund.


Fall is the data analysis season


Dear diary,

I spent a lot of my summer in the lab, and my fall has been mostly data analysis, with a little writing and a couple of courses thrown in there. Data analysis means writing code, and nowadays I do most of my work with the help of R. R has even replaced python and perl for most ad hoc scripting. Case in point: I recently wrote an R script to generate and run a (long) series of tar commands for me. It might sound weird, but R can do these silly tasks just as well as any scripting language and even when its statistical functions play no role, its tabular data structures often come in handy.
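The tar-generating script itself isn't shown, so here is only a minimal sketch of the idea of building a series of commands from a small table. It is written in Python rather than R, and the sample names and directory paths are hypothetical placeholders, not the ones actually used.

```python
import shlex

# Hypothetical table of archives to create: one row per sample directory.
samples = [
    {"sample": "batch_01", "directory": "data/batch 01"},
    {"sample": "batch_02", "directory": "data/batch_02"},
]

def tar_command(row):
    """Build one 'tar -czf <archive> <directory>' command as a string."""
    archive = f"{row['sample']}.tar.gz"
    # shlex.quote protects names that contain spaces or other shell-special characters.
    return " ".join(["tar", "-czf", shlex.quote(archive), shlex.quote(row["directory"])])

commands = [tar_command(row) for row in samples]
for command in commands:
    print(command)  # print instead of running, so the list can be inspected first
```

Printing the commands before running them is a deliberate choice in this sketch: a long generated series is easier to check by eye than to undo.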

Working on multiple similar but not identical projects also means I’ve got to reread and rework some old scripts, and I often find that when I return to reuse some piece of code, I’ve learned enough to rewrite it in a better way. Inspired by this paper, I’m trying to slowly improve my programming practices. The assertthat package is a new friend, and the next step is getting better testing routines going, probably with the aid of testthat. (Speaking of learning R, did you know that you get the underscore sign in ESS by double tapping the key? Just pressing it once makes an assignment arrow. I didn’t realise until the other day and I feel very stupid for it.)
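To illustrate the kind of practice meant by assertions and tests, here is a minimal sketch in Python rather than R. The function `mean_weight` and its checks are hypothetical examples; they only mimic the spirit of an assertthat-style input check followed by a testthat-style expectation.

```python
def mean_weight(weights):
    """Return the mean of a non-empty list of non-negative body weights."""
    # Fail early and loudly on bad input instead of returning a silent wrong answer.
    assert len(weights) > 0, "weights must not be empty"
    assert all(isinstance(w, (int, float)) for w in weights), "weights must be numeric"
    assert all(w >= 0 for w in weights), "weights must be non-negative"
    return sum(weights) / len(weights)

def test_mean_weight():
    """A tiny test: one known answer and one input that must be rejected."""
    assert mean_weight([1.0, 2.0, 3.0]) == 2.0
    try:
        mean_weight([])
    except AssertionError:
        pass  # rejection of empty input is the expected behaviour
    else:
        raise RuntimeError("empty input should have been rejected")

test_mean_weight()
```

The point is less the particular checks than the habit: the assumptions a script makes about its input are written down where they can fail visibly.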

We’ve been running a second season of the introduction to R seminars with the lab, also including some gene expression and massively parallel resequencing data. (The latter not so much with R, though.) I’ve learned quite a bit, and hopefully refined my R teaching skills a little. I have the impression that doing lots of in-seminar exercises has been helpful, and this time around I put a lot more emphasis on organising analysis code into scripts.

I’ve also gotten to play a bit more with quantitative genetics models with MCMCglmm, which is great fun. Speaking of MCMC, Gelman & co’s Bayesian Data Analysis 3rd edition has come out! My copy is on its way, and I’ve also bought Dirk Eddelbuettel’s Rcpp book. Looking forward to that.

During November, my blog hits set a new record (almost doubling the previous most visited month), thanks to links from Matt Asher’s Probability and statistics blog and Sam Clifford’s blog. It’s very flattering to be linked by two statistics bloggers that I’ve read, one of whom was already in my RSS reader.

By the way, I will be at the Evolution in Sweden meeting in Uppsala in January. If you’re there, say hi!

Morning coffee: alpha level 0.005


Valen Johnson recently published a paper in PNAS about Bayes factors and p-values. In null hypothesis testing, p-values measure the probability of seeing data this extreme or more extreme if the null hypothesis is true. Bayes factors measure the ratio between the probability of the data under the alternative hypothesis and the probability of the data under the null hypothesis. Multiplied by the prior odds, the Bayes factor gives the ratio of the posterior probabilities of the two hypotheses, and the two ratios coincide when the hypotheses are equally probable a priori. The words ‘probability of the hypothesis’ tell us we’re in Bayes land, but of course, that posterior probability comes from combining the prior probability with the likelihood, which is the probability of generating the data under the hypothesis. So the Bayes factor considers not only what happens if the null is true, but also what happens if the alternative is true. That is one source of discrepancies between them. Johnson has found a way to construct Bayes factors so that they correspond to certain common hypothesis tests (including an approximation for the t-test, so there goes most of biology), and found that for many realistic test situations a p-value of 0.05 corresponds to pretty weak support in terms of Bayes factors. Therefore, he suggests the alpha level of hypothesis tests should be reduced to 0.005 or lower. I don’t know enough about Bayes factors to really appreciate Johnson’s analysis. However, I do know that some responses to the paper make things seem a bit too easy. Johnson writes:

Of course, there are costs associated with raising the bar for statistical significance. To achieve 80% power in detecting a standardized effect size of 0.3 on a normal mean, for instance, decreasing the threshold for significance from 0.05 to 0.005 requires an increase in sample size from 69 to 130 in experimental designs. To obtain a highly significant result, the sample size of a design must be increased from 112 to 172.
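The sample sizes in the quote can be checked with a textbook calculation. This is a minimal sketch under my own assumptions, not taken from the paper: a one-sided z-test of a normal mean with known unit variance, and 0.01 versus 0.001 as the thresholds for a "highly significant" result. The function name `sample_size` is made up for the example.

```python
from math import ceil
from statistics import NormalDist

def sample_size(alpha, power=0.8, effect_size=0.3):
    """Smallest n for a one-sided z-test of a normal mean with known unit variance."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)      # quantile matching the target power
    # Standard formula: n = ((z_alpha + z_power) / effect size)^2, rounded up.
    return ceil(((z_alpha + z_power) / effect_size) ** 2)

for alpha in (0.05, 0.005, 0.01, 0.001):
    print(alpha, sample_size(alpha))
```

Under these assumptions the calculation gives 69 and 130 for thresholds of 0.05 and 0.005, and 112 and 172 for 0.01 and 0.001, which matches the figures in the quote.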

If one does not also increase the sample sizes to preserve (or, I guess, preferably improve) power, just reducing the alpha level to 0.005 will only make matters worse. With low power comes, as Andrew Gelman likes to put it, a high Type M or magnitude error rate. That is, if power is bad enough, not only will there be few significant findings, but all of them will be overestimates.
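The overestimation argument can be made concrete with a small calculation. This is a sketch under assumptions of my own choosing: a one-sided z-test with known unit variance, a true standardized effect of 0.3, and a fixed sample size of 20 that is deliberately too small. The function name `power_and_exaggeration` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_and_exaggeration(alpha, n, effect_size=0.3):
    """Power of a one-sided z-test and the mean overestimation among significant results."""
    se = 1 / sqrt(n)                                # standard error of the estimated mean
    threshold = STD_NORMAL.inv_cdf(1 - alpha) * se  # smallest estimate called significant
    shift = (threshold - effect_size) / se          # threshold in standard errors from the truth
    power = 1 - STD_NORMAL.cdf(shift)
    # Mean of a normal distribution truncated below at the significance threshold.
    mean_if_significant = effect_size + se * STD_NORMAL.pdf(shift) / power
    return power, mean_if_significant / effect_size

for alpha in (0.05, 0.005):
    power, ratio = power_and_exaggeration(alpha, n=20)
    print(f"alpha={alpha}: power={power:.2f}, exaggeration ratio={ratio:.2f}")
```

With the sample size held fixed, the stricter threshold gives both lower power and a larger exaggeration ratio, because only estimates above the threshold are ever reported as significant, and in this setting the threshold already lies above the true effect.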

Morning coffee: tables

(Note: ‘Morning coffee’ will be short musings about science-related topics.)


I don’t like tables. Or, more precisely: I don’t like tables that I have to read, but I love telling my computer to read tables for me. Tables made for human eyes tend to have certain features (I don’t know whether they really help facilitate human understanding, but people seem to think they do) such as merged cells or omission of repeated values, footnotes indicated by superscript symbols and sometimes colouring that conveys meaning. There is a conflict between keeping the number of columns small enough to be readable and putting in all the statistics that readers want. Someone might want the coefficient of determination, while someone of an information-theoretic persuasion wants the AIC. It is more convenient for the human reader to see the table close to the text, while the computer user would probably like it in a text file. Some journals do this almost right: right below the table there is a link to download it as comma-separated values. I think ideally any data would be presented as a summary table (or even better, a graph!) and the underlying computer-readable data would be the click of a link away.
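As a small illustration of the gap between the two kinds of table, here is a sketch in Python that turns a table with omitted repeated values into comma-separated values a program can read. The trait names, models and numbers are invented placeholders, and `fill_down` is a hypothetical helper written for this example.

```python
import csv
import io

# Made-up table laid out for human eyes: the repeated trait name is left blank.
human_table = [
    ["trait", "model", "R2", "AIC"],
    ["comb mass", "additive", "0.12", "340.1"],
    ["", "additive + dominance", "0.15", "338.7"],
    ["body mass", "additive", "0.30", "512.4"],
]

def fill_down(rows, column=0):
    """Copy the last seen value into blank cells so that every row stands on its own."""
    filled, last = [], ""
    for row in rows:
        row = list(row)
        if row[column] == "":
            row[column] = last
        last = row[column]
        filled.append(row)
    return filled

header, *body = human_table
tidy_rows = fill_down(body)

# Write to an in-memory buffer; a real script would open a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(tidy_rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Merged cells, footnote symbols and meaningful colouring would each need their own explicit column in the same way, which is why the computer-readable version tends to be wider and uglier than the printed one.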

From Lisbon, part 2

ESEB 2013 is over. I’ve had a great time, met with a lot of cool people and actually coped reasonably well with the outdoor temperature. As a wimpy Swede, I find anything above 30 degrees Celsius rather unpleasant. There have been too many talks and posters to mention all the good stuff, but here are a few more highlights:

Trudy Mackay’s plenary on epistasis in quantitative traits in D. melanogaster: Starting with the Drosophila Genetic Reference Panel and moving on to the Flyland advanced intercross population, Mackay’s group found what appeared to be extensive epistasis in several quantitative traits. Robert Anholt spoke later the same day about similar results in olfactory behaviour. While most of the genetic variance on the population level is still effectively additive, there seems to be a lot of interaction at the level of gene action, and it hinders QTL detection. The variants that did show up appeared to be involved in common networks. Again, we ask ourselves how big these networks are and how conserved they might be among different species.

How did all this epistasis come about then? Mackay’s answer is phenotypic buffering or canalisation (as we say in the Nordic countries: a beloved child has many names). That is, the organism has a certain buffering capacity against mutations, and the effects of many of them are only revealed on a certain genetic background where buffering has been broken. See their paper: Huang et al (2012). Mackay mentioned some examples in answer to a question: potentially damaging exonic mutations travelled together with compensatory mutations that possibly made them less damaging. It would be really fun to see an investigation of the molecular basis of some examples.

(Being a domestication genetics person, this immediately brings me to Belyaev’s hypothesis about domestication. Belyaev started the famous farm fox domestication experiment, selecting foxes for reduced fear of humans. And pretty quickly, the foxes started to become in many respects similar to dogs. Belyaev’s hypothesis is that ‘destabilising selection’ for tameness changed some regulatory system (probably in the hypothalamus–pituitary–adrenal axis) that exposed other kinds of variation. I think it’s essentially a hypothesis about buffering.)

Laurent Excoffier about detecting recent polygenic adaptation in humans. Very impressive! The first part of the talk presented an Fst outlier test applied to whole pathways together instead of individual loci. This seems to me analogous to gene set enrichment tests that calculate some expression statistic on predefined gene sets, instead of calculating the statistic individually and then applying term enrichment tests. In both cases, the point is to detect more subtle changes on the pathway as a whole. As with many other enrichment methods, particularly in humans, it is not that obvious what to do next with the list of annotation terms, even when the list makes good biological sense. (Really, is there a gene list that wouldn’t seem to make at least a bit of biological sense?) The results do (again) imply epistasis in human immune traits, and that is something that could potentially be tested. Though it would be a heroic amount of work, I hope someone will use these kinds of methods in some organism where it is actually possible to test the function and compare locally adapted populations.

Alison Wright’s talk on Z chromosome evolution. She works with Judith Mank, and I’ve heard a bit about it before, but sex chromosomes and the idea that you can trace the ‘strata’ of chromosome evolution are always fascinating. Wright also presented some interesting differences in the male hypermethylated region between birds with different mating systems.

William Jeffery on blind cavefish: I’ve been thinking for ages that I should blog about the blind cavefish (for popular science and in Swedish, that is), because it’s such a beautiful example. The case for eye regression as an adaptive trait rather than just the loss of an unnecessary structure seems pretty convincing! Making an eye regress at the molecular level seems at once rather simple (removal of the lens, by apoptosis in the blind cavefish, seems to be all that is needed) and complex (it’s polygenic and apparently not achieved the same way in all blind cavefish populations).

Virpi Lummaa’s plenary about using parish records from preindustrial Finland to investigate hypotheses about reproduction, longevity and menopause. I heard about the Grandmother hypothesis ages ago, so I knew about it, but I didn’t know to what extent there was empirical support for it. Unfortunately, many of the cases where I’ve heard a nice hypothesis but don’t know the empirical support turn out to be disappointments. Not this time, however! On top of all the good stuff in the talk, Lummaa had very pretty slides with old photos and paintings by Albert Edelfelt. The visual qualities were surpassed only by Rich FitzJohn’s slides.


(Larin Paraske by Albert Edelfelt)

Two things that were not so great:

The poster sessions. Now my poster session on Friday turned out very well for me, but many others weren’t so lucky. I don’t know why half of the posters were hung facing the wall with hardly enough space for people to walk by the poster board, but it was a terrible idea and must have stopped a lot of people from seeing more posters.

The gender balance. According to Julia Schroeder only 27% of invited speakers were women. I don’t know how it worked behind the scenes and what the instructions to symposium organisers were, and there might not be an easy fix, but this urgently needs fixing.

Of course, there were many more good talks and posters than the handful I’ve mentioned, and apart from them, the twitter feed and tweetup, the social gatherings and the fact that there were actually several interesting people that came to my poster to chat were highlights for me. I come home with a long list of papers to read and several pages of things to try. Good times!


From Lisbon

Dear diary,

I’m at the Congress of the European Society for Evolutionary Biology in Lisbon. It’s great, of course, and I expected nothing less, but there is so much of it! Every session at ESEB has nine symposia running in parallel, so there are many paths through the conference programme. Mine contains a lot of genomics for obvious reasons.

Some highlights so far:

Juliette de Meaux’s plenary: while talking about the molecular basis of adaptations in Arabidopsis thaliana (one study based on a candidate gene and one on a large-effect QTL), de Meaux brought up two fun concepts that would recur in Thomas Mitchell-Olds’ talk and elsewhere:

1) The ‘mutational target’ and how many genes there are that could possibly be perturbed to change a trait in question. The size of the mutational target and the knowledge of the mechanisms underlying the trait of course affect whether it is fruitful to try any candidate gene approaches. My intuition is to be skeptical of candidate gene studies for complex traits, but as in the case of plant pathogen defense (or melanin synthesis for pigmentation, another example that got a lot of attention in several talks), if there is only one enzyme pathway to synthesise a compound and only one step that controls the rate of the reaction, there will be very few genes that can physically be altered to affect the trait.

2) Some of both de Meaux’s and Mitchell-Olds’ work exemplifies the mapping of intermediate molecular phenotypes to get at small-effect variants for organismal traits. The idea is that while there might be many loci and large environmental effects on the organismal traits, they will act through different molecular intermediates and the intermediate traits will be simpler. The intermediate traits might be flagellin binding, flux through an enzymatic pathway or maybe transcript abundance; this is a similar line of thinking to the motivations for using genetical genomics and eQTL mapping.

The ”Do QTN generally exist?” symposium: my favourite symposium so far. (Note: QTN stands for Quantitative Trait Nucleotide, and it means nothing more than a known causal sequence variant for some quantitative trait. Very few actual QTN featured in the session, so maybe it should’ve been called ”Do QTG generally exist?” Whatever.) I’ve heard both Matt Rockman and Annalise Paaby present their RNA interference experiments revealing cryptic genetic variation in C. elegans before, but Rockman also talked about some conceptual points (”things we all know but sometimes forget” [I’m paraphrasing from memory]): adaptation does not require fixation; standing variation matters; effect-size is not an intrinsic feature of an allele. There was also a very memorable question at the end, asking whether the answer to the questions Rockman posed at the beginning, ”What number of loci contribute to adaptive evolution?” and ”What is the effect-size distribution?” should be ”any number of loci” and ”any distribution” … To which Rockman answered that those were pretty much his views.

In the same symposium, Luisa Pallares showed some really nice genome-wide association results for craniofacial morphology from natural hybrid mice. As someone who works on an experimental cross of animals, I found the idea very exciting, and of course I immediately started dreaming about hybrid genetical genomics.

Dieter Ebert’s plenary: how they with lots of work seem to have found actual live Red Queen dynamics with Daphnia magna and Pasteuria ramosa.

Larry Young and Hanna Kokko: Young and Kokko had two very different invited talks back to back in the sex role symposium, Young about the neurological basis of pair-bonding in the famous monogamous voles, and Kokko about models of evolution of some aspects of sex roles.

Susan Johnston’s talk: about how heterozygote advantage maintains variation at a horn locus in the Soay sheep of St Kilda. Simply awesome presentation and results. Published yesterday!

On to our stuff! Dominic Wright had a talk presenting our chicken comb work in the QTN session, and on Friday I will have a poster on display about the behaviour side of the project. There’s actually quite a few of us from the AVIAN group here, most of them also presenting posters on Friday (Anna-Carin, Johan, Amir, Magnus, Hanne, Rie). And (though misspelled) my name is on the abstract of Per Jensen’s talk as well, making this my personal record for conference contributions.

The poster sessions are very crowded and a lot of the posters are hung facing the wall with very little space for walking past, and some of them behind pillars. In all probability there’s a greater than 0.5 chance that my poster will be in a horrible spot. But if you happen to be curious feel free to grab me anywhere you see me, or tweet at me.

I look like this when posing with statues or when I’m visibly troubled by the sunlight. If you’re into genetical genomics for QTG identification, domestication and that kind of stuff, this is the hairy beast you should talk to.


Summer in the lab

Dear diary,

The best thing about summer in the lab is that one can blast the Sweeney Todd and Rocky Horror Picture Show soundtracks as loud as one pleases *. Blogging has been and will be a bit sparse, but I’m having a fun summer so far in the lab finishing up the lab work of my second slightly bigger project. We’re also working through our last paper that got some really knowledgeable, thorough, and quite critical review comments …

My halftime seminar will be in August. That’s a pretty scary thought. Well, the seminar itself is decidedly non-scary; it’s rather the fact that I’ve been going for two and a half years. I’m tempted to write it up as a blog post, but I think I’ll wait and write something when the next paper comes out!

Also, I’m attending the Congress of the European Society for Evolutionary Biology in Lisbon in August. If you’re there, stop by my poster for domestication genomics goodness and to say hello to me and my crocheted chicken mascot!


(Made and photographed by Anna Nygren.)

*) That is, I’m not alone here, but everyone here this week has good taste.

From Uppsala

On a personal note, I had a great time at the Genetics of Adaptations symposium in Uppsala last Saturday. Pretty much everything was interesting, and I particularly enjoyed the following:

  • Bruce Walsh himself explaining G-matrices and how they can put constraints on evolution. The G-matrix is one of those things I’d very much like to understand, and listening to someone like Bruce Walsh certainly helps. (See e.g. this paper by Walsh & Blows)
  • Matt Rockman talked about some serious QTN work on awesomely weird phenotypes: worms depositing copulatory plugs on each other’s and their own heads. (See e.g. Palopoli et al.)
  • Saunak Sen spoke about mapping of function-valued traits, probably the most interesting talk to me. He concentrated on traits that are functions of one variable, namely time. (See Xiong et al.) However, the most interesting to me (as a gene expression enthusiast) would be traits that are, as he put it, ”massively multivariate”, like eQTL data. In that case, there’s not really an obvious analogue of time, i.e. something that the values from one individual are a function of. I eagerly await what people might come up with in that regard.

It was a really fun time, and Uppsala is always nice. I’ll have to make sure to be there when the evolution museum is open some time. I always get the feeling that I should be better at, you know, networking at these things, but I had a couple of interesting conversations about making sense of gene expression results (incidentally, something that I’m very likely to blog about in the near future).