This week, I’m tweeting from the @sweden account. It is a rotating account with a new Swede every week. I honestly have no idea who could have nominated me, but I’m flattered and happy. So far I think it’s going well. As I wrote on curatorsofsweden.com:
I’m unlikely to present any great insights about the nature and meaning of Swedishness, but I hope I may be able to give you a new appreciation for the chicken comb.
I think I could probably just keep the week going by answering questions and comments, because there have been many good ones! We’ve been talking about domestication (of course), programming languages for data analysis, the bright but possibly distant future when quantitative genetics and systems biology come together, common misconceptions about genetics, what to say to your creationist friend etc.
I went to the Stockholm R useR group meetup on R in genomics at the Stockholm node of SciLifeLab. It was nice. If I worked a bit closer, I would attend meetups all the time. I even got to be pretentious with my notebook while waiting for the train.
The speakers were:
Jakub Orzechowski Westholm on R and genomics in general. He demonstrated genome browser-style tracks with Gviz, some GenomicRanges, and a couple of common plots of gene expression data. I have been on the fence about what package I should use for drawing genes and variants along the genome. I should play with Gviz.
Daniel Klevebring on clinical sequencing and how he uses R (not that much) in sequencing pipelines aimed at targeting the right therapy to patients based on the mutations in their cancer cells. He mentioned some getopt snippets for getting R to play nicely on the command line, which is something I should definitely try more!
Finally, Arvind Singh Mer on predictive modelling for clinical genomics (like the abovementioned ClinSeq data). He showed the caret package for machine learning, with an elastic net regression.
I don’t know the rest of the audience, so maybe the choice to gear talks towards the non-bio* person was spot on, but that made things a bit less interesting for me. For instance, in Jakub’s talk about gene expression, I would’ve preferred more about the messy stuff: how to make that nice gene-by-sample matrix in the first place, and whether R can be of any help in that process; also, at the other end, what models one would use after that first pass of visualisation. But this isn’t a criticism of the presenters; time and complexity constraints apply. (If I were asked to present how I use R, any demos would be toy analyses of clean datasets. That is the way these things go.)
We also heard repeated praise for and recommendations of the hadleyverse and data.table. I’m not a data.tabler myself, but I probably should be. And I completely agree about the value of dplyr: there’s this one analysis where a couple of lines with dplyr changed it from ”argh, do I have to rewrite this in C?” to being workable. I think we also saw all three plotting systems in action: base graphics, ggplot2 and lattice.
I’m attending a course about scientific publishing, and the other day there was a lecture about bibliometrics by Lovisa Österlund and David Lawrence from the Linköping University library. I don’t think I know anyone who particularly likes bibliometrics, but I guess it makes sense that if one needs to evaluate research without trying to understand what it is about, there are only citations, the reputation of the publication channel and the CV of the researcher to look at. I imagine it’s a bit like reviewing a novel in a language one doesn’t know. A couple of things occurred to me, though.
What to do when different instruments of evaluation give different results? Take the two papers (so far) published during my PhD: they both deal with the genetics of chicken comb size; one is published in PLOS Genetics and one in Molecular Ecology. If we look at journal impact factors (and we shouldn’t, but say that we do), PLOS Genetics comes out ahead with an impact factor of 8.5 against 6.3. For those who do not know, the journal impact factor is the mean number of citations in a given year to the papers the journal published in the preceding two years, calculated by Thomson Reuters in their own secret way. However, Linköping University has for some reason decided to use the Norwegian index for evaluating publication channels. I don’t know why, and I don’t think it matters that much for me personally, since the system will change soon and I will finish in about a year and a half. In the Norwegian system journals are ranked as level one or two, where two is better and is supposed to represent the top 20% of that subject area. According to their database, Molecular Ecology is level 2, while PLOS Genetics is level 1. The source of the discrepancy is probably that PLOS Genetics is counted as biomedicine, while Molecular Ecology is biology, according to the Norwegian database.
They also mentioned Altmetrics, and I don’t know what to make of it. On one hand, I guess it’s good to keep tabs on social media. On the other hand, what do numbers of tweets really tell you, except that one of the authors has a Twitter account? One of the examples in the lecture was the metrics page for this paper that I happen to be a contributor to. It is actually pretty strange. It shows three tweets or 11 tweets, depending on where on the page you look. Also, when I accessed this page earlier today it linked a blog. Now it doesn’t. That says something about the ephemeral nature of internet media. Regardless, when I first saw the page I thought perhaps the metrics page had picked up on my post about the paper, but that was not the case. I don’t know how altmetric.com define a ”science blog”, maybe the blog has to be listed on some aggregation site or another, and I’m not pretending my post is particularly insightful or important. Still it’s a little strange that the altmetrics page doesn’t list a post by one of the authors about the paper, but listed a post that referred to the paper with only two sentences and was mistaken about the conclusion.
I’ve had this blog since 2010, but it was not until last year that I started writing anything other than popular/science in Swedish. There is lots of discussion on academic blogs about whether PhD students, or any academics, should write on blogs or not, and also quite a bit of fear, uncertainty and doubt going around. This is what I think: I don’t think my blog is such a big deal. It’s just a small hobby project that makes me happy. And while I hope it doesn’t hurt my research or my chances to continue doing science, I don’t think it helps them much either.
Do I have a target audience? There was recently a small survey to find what academics blog about and why; they found that most blogs were directed at peers, not for outreach. I’m not surprised. As I’ve already mentioned, my posts in Swedish are more popular/science, less technical and sometimes deal with things published in Swedish media. I think the target audience is still geeks of some kind, but not necessarily genetics geeks. My posts in English are more directed at academic things, either related to my research and work as a PhD student or about the R language. So my posts are a mix of languages and themes. Is that a problem? From a popularity or readership perspective, probably yes. I can see little reason not to split the posts into two blogs, each concentrated on one theme, except that I don’t feel like running two blogs.
Does blogging hurt me because it hurts my work? I hardly think so. First, blogging is not part of my duties at the university, and I don’t do it instead of writing, working in the lab or analysing data. I do it in the evening after work, or in the case of some posts in the morning before. I’m not convinced blogging makes me in any way a better scientist, but it can hardly make me worse. Thinking about science or how to explain it for another hour now and then can’t hurt. And yes, the time spent blogging could theoretically be spent writing papers or something, but so could theoretically the time spent at the gym, with family or friends. If we grant that academics do other things, blogging could be one of those activities. My blog is not completely disconnected from my work, but I think it’s disconnected enough to be regarded as a fun pastime.
Does blogging hurt my reputation because people might read my blog and disapprove? I don’t think that many people read my blog; actually, I know that not many people do. Still, it is certainly possible that some of the readers might be important to my career and that they don’t like what they see. It will be found when people look me up with a search engine. Maybe someone thinks that I’m wasting my time, or maybe I’ve written something controversial — or more likely, something stupid. I think and say things that are mistaken all the time, and some of those mistakes might end up in a blog post. The point is, though, that expressing my opinion about things I care about is not something I do because I think it’ll further my career. I do it because I want to. If my writing is successful, the things on my blog will be the kinds of things I honestly know, think and believe about science.
A couple of weeks ago I attended the Evolution in Sweden meeting in Uppsala, as expected a very nice meeting with lots of interesting things. My last conference was ESEB last summer, which was great because it was a huge conference with so much to see and so many people. Evolution in Sweden was great because it wasn’t huge, so that it was very possible to see everything, recognise familiar faces and talk with people. I had a poster on the behaviour genetics of chicken domestication (of course!).
Here are some of my personal highlights, in no particular order:
Kerstin Johannesson’s talk, an ”advertisement for marine organisms”, was probably the most fun and engaging. I was very convinced that evolutionary research in the Baltic Sea is a great idea! Among other things, she mentioned salinity gradients, the sexual and asexual reproduction of Fucus brown algae, Littorina saxatilis of course, and the IMAGO project to sequence and assemble reference genomes for eight different species from the Baltic.
We have a great infrastructure for evolutionary research: the Baltic Sea. [quoted from memory]
Claudia Köhler spoke about why triploids in Arabidopsis thaliana fail, which is an interesting story involving the endosperm, which in a triploid seed turns out tetraploid, and genomic imprinting. They screened for mutants able to form triploid seeds and found a paternally imprinted gene that is dosage-sensitive and causes the failure of triploid seeds (Kradolfer & al 2013).
Anna Qvarnström and Hans Ellegren talked about different flycatcher projects. I don’t have much that is clever to say about this right now, except that both projects are really fascinating and impressive. Everyone who cares about genomics in the wild should keep an eye on this.
There were two talks from Umeå Plant Science Centre: Stefan Jansson’s about association mapping in aspen (SwAsp), which sounds fun but difficult with tons of genetic variation, and Pär K. Ingvarsson’s about the Norway spruce genome (Nystedt & al 2013). An interesting observation from the latter was that its gigantic genome size (~20 Gb) apparently isn’t due to whole-genome duplications, but to unchecked transposable element activity. A nice nugget to remember: about half of the sequence, or three to four human genomes, consists of LTR-type repeats.
I’m afraid you will never read very much from me about theory talks. I am an engineer after all, so I don’t fear the equations that much, but most of the time I don’t have the necessary context to have any clue where a particular model fits into the grand scheme of things. However, Jessica Abbott gave a fun talk presenting a model for sexual conflict in hermaphrodites that deserves a special mention.
I did see quite a few genomic plots of Fst outliers, and I believe the question that needs answering about them is: What do they really mean? One can do comparisons of comparisons (like in Roger Butlin’s talk and their paper on parallel evolution of morphs in Littorina; Butlin & al 2013), but when it comes to picking out the most differentiated loci on a genome-wide level, are they really the most interesting loci? Are the loci of highest differentiation the loci of adaptation; are they the loci of speciation? (Ellegren’s talk and the flycatcher genome paper; Ellegren & al 2012). It’s a bit like the problem faced by QTL mappers (”now that we’ve got a few genomic regions, what do we do with them?”), with the added complication that we don’t have a phenotype associated with them.
Genetic architecture wasn’t an explicit theme of the meeting, but it always comes up, doesn’t it? Will traits be massively polygenic, dooming researchers to a lifetime search for missing heritability, or relatively simple with a handful of loci? And under what circumstances will either architecture occur? Jon Ågren talked about the fantastic Arabidopsis thaliana in situ QTL mapping experiment. I think it is best illustrated with the video he showed last time I heard him talk about this — Lost in transplantation:
Folmer Bokma used Lego dinosaurs to great effect to illustrate developmental constraints. Also, a large part of the talk was quotes from different famous evolutionary biologists. Very memorable, but I’m not sure I understood where he was heading. I was expecting him to start talking about the need for G matrix methods any moment. My lack of understanding is of course my fault as well, not just the speaker’s, and there were a few graphs of gene duplications and gene expression data in primates, but I don’t feel that he showed ”how phylogenetic analyses of genomic data can shed new light on these ideas”, as promised in the abstract.
Possibly the best expression of the meeting: Erik Svensson’s ”next generation fieldwork”. I’m not a fan of the inflation of words ending in -omics (and I sometimes feel ”genomics” should just be ”genetics”), but if we have genomics and proteomics, phenomics is also justified, I guess. As a tongue-in-cheek version, ”next generation fieldwork” is spot on. And very true: clever phenotyping strategies in natural populations and natural settings are even more important than rapid sequencing and genotyping. By the way, Erik Svensson, Jessica Abbott, Maren Wellenreuther and their groups have a lab blog which seems nice and active.
And finally, the thing that wasn’t so great, coincidentally the same thing that wasn’t so great at ESEB: the gender balance. Only 7 out of 28 speakers were women. I don’t know to what extent that ratio reflects the gender ratio of Swedish evolutionary biology, but regardless, it is too low.
It’s been a while since mid-January, but I’ve been busy (with some fun things — will tell you more later). And maybe we’ll see each other at the next Evolution in Sweden in Lund.
I spent a lot of my summer in the lab, and my fall has been mostly data analysis, with a little writing and a couple of courses thrown in there. Data analysis means writing code, and nowadays I do most of my work with the help of R. R has even replaced Python and Perl for most ad hoc scripting. Case in point: I recently wrote an R script to generate and run a (long) series of tar commands for me. It might sound weird, but R can do these silly tasks just as well as any scripting language, and even when its statistical functions play no role, its tabular data structures often come in handy.
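The R script itself isn’t shown here, so as a rough illustration of the "generate a series of commands, then run them" idea, here is a minimal sketch in Python rather than R. The directory names, the archive folder and the two helper functions are all made up for the example; only the general approach comes from the paragraph above.

```python
import shlex
import subprocess
from pathlib import Path


def build_tar_commands(directories, archive_dir="archives"):
    """Return one tar argument list per directory (hypothetical layout)."""
    commands = []
    for directory in directories:
        name = Path(directory).name
        archive = str(Path(archive_dir) / f"{name}.tar.gz")
        # -c create, -z gzip, -f archive file name
        commands.append(["tar", "-czf", archive, str(directory)])
    return commands


def run_commands(commands, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)


# Made-up directory names, purely for illustration.
commands = build_tar_commands(["data/run_01", "data/run_02", "data/run_03"])
run_commands(commands)  # dry run: prints the commands without running tar
```

Keeping the commands as a plain list before running them makes it easy to inspect the whole series first, which is most of the appeal of scripting this kind of chore.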
Working on multiple similar but not identical projects also means I’ve got to reread and rework some old scripts, and I often find that when I return to reuse some piece of code, I’ve learned enough to rewrite it in a better way. Inspired by this paper, I’m trying to slowly improve my programming practices. The assertthat package is a new friend, and the next step is getting better testing routines going, probably with the aid of testthat. (Speaking of learning R, did you know that you get the underscore sign in ESS by double tapping the key? Just pressing it once makes an assignment arrow. I didn’t realise until the other day and I feel very stupid for it.)
We’ve been running a second season of the introduction to R seminars with the lab, also including some gene expression and massively parallel resequencing data. (The latter not so much with R, though.) I’ve learned quite a bit, and hopefully refined my R teaching skills a little. I have the impression that doing lots of in-seminar exercises has been helpful, and this time around I put a lot more emphasis on organising analysis code into scripts.
I’ve also gotten to play a bit more with quantitative genetics models with MCMCglmm, which is great fun. Speaking of MCMC, Gelman & co’s Bayesian Data Analysis 3rd edition has come out! My copy is on its way, and I’ve also bought Dirk Eddelbuettel’s Rcpp book. Looking forward to that.
During November, my blog hits set a new record (almost doubling the previous most visited month), thanks to links from Matt Asher’s Probability and statistics blog and Sam Clifford’s blog. It’s very flattering to be linked by two statistics bloggers that I’ve read, one of whom was already in my RSS reader.
By the way, I will be at the Evolution in Sweden meeting in Uppsala in January. If you’re there, say hi!
Valen Johnson recently published a paper in PNAS about Bayes factors and p-values. In null hypothesis testing, p-values measure the probability of seeing data this extreme or more extreme, if the null hypothesis is true. A Bayes factor is the ratio of the probability of the data under the alternative hypothesis to the probability of the data under the null hypothesis; when the two hypotheses are equally probable a priori, it equals the ratio of the posterior probability of the alternative to the posterior probability of the null. The words ‘probability of the hypothesis’ tell us we’re in Bayes land, but of course, that posterior probability comes from combining the prior probability with the likelihood, which is the probability of generating the data under the hypothesis. So the Bayes factor considers not only what happens if the null is true, but what happens if the alternative is true. That is one source of discrepancies between them. Johnson has found a way to construct Bayes factors so that they correspond to certain common hypothesis tests (including an approximation for the t-test, so there goes most of biology), and found that for many realistic test situations a p-value of 0.05 corresponds to pretty weak support in terms of Bayes factors. Therefore, he suggests the alpha level of hypothesis tests should be reduced to 0.005 or lower. I don’t know enough about Bayes factors to really appreciate Johnson’s analysis. However, I do know that some responses to the paper make things seem a bit too easy. Johnson writes:
Of course, there are costs associated with raising the bar for statistical significance. To achieve 80% power in detecting a standardized effect size of 0.3 on a normal mean, for instance, decreasing the threshold for significance from 0.05 to 0.005 requires an increase in sample size from 69 to 130 in experimental designs. To obtain a highly significant result, the sample size of a design must be increased from 112 to 172.
If one does not also increase the sample sizes to preserve (or, I guess, preferably improve) power, just reducing the alpha level to 0.005 will only make matters worse. With low power comes, as Andrew Gelman likes to put it, a high Type M, or magnitude, error rate. That is, if power is bad enough, not only will there be few significant findings, but those that are significant will tend to be overestimates.
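To make the trade-off concrete, here is a minimal sketch in Python. It assumes a one-sided z-test on a normal mean with known unit variance; that this is exactly the calculation behind Johnson’s numbers is my assumption, although it does give the sample sizes in the quote (69 and 130, and 112 and 172 if ”highly significant” means going from 0.01 to 0.001). The second function is also just an illustration: it computes how much a significant estimate overstates the true effect on average, using the mean of a truncated normal distribution. The function names are made up.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def sample_size(effect_size, alpha, power=0.8):
    """Smallest n for a one-sided z-test (unit variance assumed)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_power = STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / effect_size) ** 2)


def exaggeration_ratio(effect_size, n, alpha):
    """Mean significant estimate divided by the true effect size."""
    se = 1 / math.sqrt(n)                         # standard error of the mean
    cutoff = STD_NORMAL.inv_cdf(1 - alpha) * se   # smallest significant estimate
    z = (cutoff - effect_size) / se
    # Mean of a normal distribution truncated from below at the cutoff.
    mean_if_significant = effect_size + se * STD_NORMAL.pdf(z) / (1 - STD_NORMAL.cdf(z))
    return mean_if_significant / effect_size


# Sample sizes for 80% power at a standardized effect size of 0.3.
for alpha in (0.05, 0.005, 0.01, 0.001):
    print(alpha, sample_size(0.3, alpha))

# Keep n at 69 but lower alpha: significant estimates get more inflated.
for alpha in (0.05, 0.005):
    print(alpha, round(exaggeration_ratio(0.3, 69, alpha), 2))
```

The second loop is the point made above: if the threshold is lowered without a matching increase in sample size, the estimates that do pass it overstate the effect more, on average, than they did before.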
(Note: ‘Morning coffee’ will be short musings about science-related topics.)
I don’t like tables. Or, more precisely: I don’t like tables that I have to read, but I love telling my computer to read tables for me. Tables made for human eyes tend to have certain features (I don’t know whether they really help facilitate human understanding, but people seem to think they do) such as merged cells or omission of repeated values, footnotes indicated by superscript symbols and sometimes colouring that conveys meaning. There is a conflict between keeping the number of columns small enough to be readable and putting in all the statistics that readers want. Someone might want the coefficient of determination while someone of an information-theoretic persuasion wants the AIC. It is more convenient for the human reader to see the table close to the text, while the computer user would probably like it in a text file. Some journals do this almost right: right below the table there is a link to download it as comma separated values. I think ideally any data would be presented as a summary table (or even better, a graph!) and the underlying computer-readable data would be the click of a link away.
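As a small illustration of what ”telling my computer to read tables” can involve, here is a sketch in Python of undoing one of those human-friendly features, the omission of repeated values. The table contents, the column names and the fill_down helper are all invented for the example.

```python
import csv
import io

# A made-up table in the human-friendly style: the trait name is only
# written on the first row of each group and left blank after that.
human_table = """trait,locus,effect
comb mass,QTL1,0.5
,QTL2,0.3
body mass,QTL1,1.2
,QTL3,0.8
"""


def fill_down(rows, columns):
    """Replace blank cells in the given columns with the last value seen."""
    filled = []
    previous = {}
    for row in rows:
        row = dict(row)  # copy, so the input rows are left untouched
        for column in columns:
            if row[column] == "":
                row[column] = previous.get(column, "")
            else:
                previous[column] = row[column]
        filled.append(row)
    return filled


rows = list(csv.DictReader(io.StringIO(human_table)))
tidy = fill_down(rows, ["trait"])

# Write the tidy version back out as comma separated values.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["trait", "locus", "effect"], lineterminator="\n")
writer.writeheader()
writer.writerows(tidy)
print(out.getvalue())
```

Merged cells, footnote symbols and meaningful colouring each need their own clean-up step along the same lines, which is why a plain downloadable file next to the pretty table saves so much work.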
ESEB 2013 is over. I’ve had a great time, met with a lot of cool people and actually coped reasonably well with the outdoor temperature. As a wimpy Swede, I find anything above 30 degrees Celsius rather unpleasant. There have been too many talks and posters to mention all the good stuff, but here are a few more highlights:
Trudy Mackay’s plenary on epistasis in quantitative traits in D. melanogaster: Starting with the Drosophila Genetic Reference Panel and moving on to the Flyland advanced intercross population, Mackay’s group found what appeared to be extensive epistasis in several quantitative traits. Robert Anholt spoke later the same day about similar results in olfactory behaviour. While most of the genetic variance on the population level is still effectively additive, there seems to be a lot of interaction at the level of gene action, and it hinders QTL detection. The variants that did show up appeared to be involved in common networks. Again, we ask ourselves how big these networks are and how conserved they might be among different species.
How did all this epistasis come about then? Mackay’s answer is phenotypic buffering or canalisation (as we say in the Nordic countries: a beloved child has many names). That is, the organism has a certain buffering capacity against mutations, and the effects of many of them are only revealed on a certain genetic background where buffering has been broken. See their paper: Huang et al (2012). Mackay mentioned some examples in answer to a question: potentially damaging exonic mutations travelled together with compensatory mutations that possibly made them less damaging. It would be really fun to see an investigation of the molecular basis of some examples.
(Being a domestication genetics person, this immediately brings me to Belyaev’s hypothesis about domestication. Belyaev started the famous farm fox domestication experiment, selecting foxes for reduced fear of humans. And pretty quickly, the foxes started to become in many respects similar to dogs. Belyaev’s hypothesis is that ‘destabilising selection’ for tameness changed some regulatory system (probably in the hypothalamus–pituitary–adrenal axis) that exposed other kinds of variation. I think it’s essentially a hypothesis about buffering.)
Laurent Excoffier about detecting recent polygenic adaptation in humans. Very impressive! The first part of the talk presented an Fst outlier test applied to whole pathways together instead of individual loci. This seems to me analogous to gene set enrichment tests that calculate some expression statistic on predefined gene sets, instead of calculating the statistic individually and then applying term enrichment tests. In both cases, the point is to detect more subtle changes on the pathway as a whole. As with many other enrichment methods, particularly in humans, it is not that obvious what to do next with the list of annotation terms, even when the list makes good biological sense (and really, is there a gene list that wouldn’t seem to make at least a bit of biological sense?). The results do (again) imply epistasis in human immune traits, and that is something that could potentially be tested. Though it would be a heroic amount of work, I hope someone will use these kinds of methods in some organism where it is actually possible to test the function and compare locally adapted populations.
Alison Wright’s talk on Z chromosome evolution. She works with Judith Mank, and I’ve heard a bit about it before, but sex chromosomes and the idea that you can trace the ‘strata’ of chromosome evolution are always fascinating. Wright also presented some interesting differences in the male hypermethylated region between birds with different mating systems.
William Jeffery on blind cavefish: I’ve been thinking for ages that I should blog about the blind cavefish (for popular/science and in Swedish, that is), because it’s such a beautiful example. The case for eye regression as an adaptive trait rather than just the loss of an unnecessary structure seems pretty convincing! Making an eye regress at the molecular level seems at once rather simple — removal of the lens (by apoptosis in the blind cavefish) seems to be all that is needed — and complex (it’s polygenic and apparently not achieved the same way in all blind cavefish populations).
Virpi Lummaa’s plenary about using parish records from preindustrial Finland to investigate hypotheses about reproduction, longevity and menopause. I heard about the Grandmother hypothesis ages ago, so I knew about it, but I didn’t know to what extent there was empirical support for it. Unfortunately, many of the cases where I’ve heard a nice hypothesis but don’t know the empirical support turn out to be disappointments. Not this time, however! On top of all the good stuff in the talk, Lummaa had very pretty slides with old photos and paintings by Albert Edelfelt. The visual qualities were surpassed only by Rich FitzJohn’s slides.
The poster sessions. Now my poster session on Friday turned out very well for me, but many others weren’t so lucky. I don’t know why half of the posters were hung facing the wall with hardly enough space for people to walk by the poster board, but it was a terrible idea and must have stopped a lot of people from seeing more posters.
The gender balance. According to Julia Schroeder only 27% of invited speakers were women. I don’t know how it worked behind the scenes and what the instructions to symposium organisers were, and there might not be an easy fix, but this urgently needs fixing.
Of course, there were many more good talks and posters than the handful I’ve mentioned, and apart from them, the twitter feed and tweetup, the social gatherings and the fact that there were actually several interesting people that came to my poster to chat were highlights for me. I come home with a long list of papers to read and several pages of things to try. Good times!
I’m at the Congress of the European Society for Evolutionary Biology in Lisbon. It’s great, of course and I expected nothing less, but there is so much of it! Every session at ESEB has nine symposia running in parallel, so there are many paths through the conference programme. Mine contains a lot of genomics for obvious reasons.
Some highlights so far:
Juliette de Meaux’s plenary: while talking about the molecular basis of adaptations in Arabidopsis thaliana (one study based on a candidate gene and one on a large-effect QTL), de Meaux brought up two fun concepts that would recur in Thomas Mitchell-Olds’ talk and elsewhere:
1) The ‘mutational target’ and how many genes there are that could possibly be perturbed to change a trait in question. The size of the mutational target and the knowledge of the mechanisms underlying the trait of course affect whether it is fruitful to try any candidate gene approaches. My intuition is to be skeptical of candidate gene studies for complex traits, but as in the case of plant pathogen defense (or melanin synthesis for pigmentation, another example that got a lot of attention in several talks), if there is only one enzyme pathway to synthesise a compound and only one step that controls the rate of the reaction, there will be very few genes that can physically be altered to affect the trait.
2) Some of both de Meaux’s and Mitchell-Olds’ work exemplifies the mapping of intermediate molecular phenotypes to get at small-effect variants for organismal traits. The idea is that while there might be many loci and large environmental effects on the organismal traits, they will act through different molecular intermediates and the intermediate traits will be simpler. The intermediate traits might be flagellin binding, flux through an enzymatic pathway or maybe transcript abundance; this is a similar line of thinking to the motivations for using genetical genomics and eQTL mapping.
The ”Do QTN generally exist?” symposium: my favourite symposium so far. (Note: QTN stands for Quantitative Trait Nucleotide, and it means nothing more than a known causal sequence variant for some quantitative trait. Very few actual QTN featured in the session, so maybe it should’ve been called ”Do QTG generally exist?” Whatever.) I’ve heard both Matt Rockman and Annalise Paaby present their RNA interference experiments revealing cryptic genetic variation in C. elegans before, but Rockman also talked about some conceptual points (”things we all know but sometimes forget” [I’m paraphrasing from memory]): adaptation does not require fixation; standing variation matters; effect-size is not an intrinsic feature of an allele. There was also a very memorable question at the end, asking whether the answer to the questions Rockman posed at the beginning, ”What number of loci contribute to adaptive evolution?” and ”What is the effect-size distribution?” should be ”any number of loci” and ”any distribution” … To which Rockman answered that those were pretty much his views.
In the same symposium, Luisa Pallares showed some really nice genome-wide association results for craniofacial morphology from natural hybrid mice. As someone who works on an experimental cross of animals, I found the idea very exciting, and of course I immediately started dreaming about hybrid genetical genomics.
Dieter Ebert’s plenary: how they with lots of work seem to have found actual live Red Queen dynamics with Daphnia magna and Pasteuria ramosa.
Larry Young and Hanna Kokko: Young and Kokko had two very different invited talks back to back in the sex role symposium, Young about the neurological basis of pair-bonding in the famous monogamous voles, and Kokko about models of evolution of some aspects of sex roles.
Susan Johnston’s talk: about how heterozygote advantage maintains variation at a horn locus in the Soay sheep of St Kilda. Simply awesome presentation and results. Published yesterday!
On to our stuff! Dominic Wright had a talk presenting our chicken comb work in the QTN session, and on Friday I will have a poster on display about the behaviour side of the project. There are actually quite a few of us from the AVIAN group here, most of them also presenting posters on Friday (Anna-Carin, Johan, Amir, Magnus, Hanne, Rie). And (though misspelled) my name is on the abstract of Per Jensen’s talk as well, making this my personal record for conference contributions.
The poster sessions are very crowded and a lot of the posters are hung facing the wall with very little space for walking past, and some of them behind pillars. In all probability there’s a greater than 0.5 chance that my poster will be in a horrible spot. But if you happen to be curious feel free to grab me anywhere you see me, or tweet at me.
I look like this when posing with statues or when I’m visibly troubled by the sunlight. If you’re into genetical genomics for QTG identification, domestication and that kind of stuff, this is the hairy beast you should talk to.