Fall is the data analysis season


Dear diary,

I spent a lot of my summer in the lab, and my fall has been mostly data analysis, with a little writing and a couple of courses thrown in there. Data analysis means writing code, and nowadays I do most of my work with the help of R. R has even replaced python and perl for most ad hoc scripting. Case in point: I recently wrote an R script to generate and run a (long) series of tar commands for me. It might sound weird, but R can do these silly tasks just as well as any scripting language and even when its statistical functions play no role, its tabular data structures often come in handy.

Working on multiple similar but not identical projects also means I’ve got to reread and rework some old scripts, and I often find that when return to reuse some piece code, I’ve learned enough to rewrite it in a better way. Inspired by this paper, I’m trying to slowly improve my programming practices. The assertthat package is a new friend, and the next step is getting better testing routines going, probably with the aid of testthat. (Speaking of learning R, did you know that you get the underscore sign in ESS by double tapping the key? Just pressing it once makes an assignment arrow. I didn’t realise until the other day and I feel very stupid for it.)

We’ve been running a second season of the introduction to R seminars with the lab, also including some gene expression and massively parallel resequencing data. (The latter not so much with R, though.) I’ve learned quite a bit, and hopefully refined my R teaching skills a little. I have the impression that doing lots of in-seminar exercises has been helpful, and this time around I put a lot more emphasis on organising analysis code into scripts.

I’ve also gotten to play a bit more with quantitative genetics models with MCMCglmm, which is great fun. Speaking of MCMC, Gelman & co’s Bayesian Data Analysis 3rd edition has come out! My copy is on its way, and I’ve also bought Dirk Edelbuettel’s Rcpp book. Looking forward to that.

During November, my blog hits set a new record (almost doubling the previous most visited month), thanks to links from Matt Asher’s Probability and statistics blog and Sam Clifford’s blog . It’s very flattering to be linked by two statistics bloggers that I’ve read, one of which was already in my RSS reader.

By the way, I will be at the Evolution in Sweden meeting in Uppsala in January. If you’re there, say hi!