# Using R: Correlation heatmap with ggplot2

Just a short post to celebrate that I learned today how incredibly easy it is to make a heatmap of correlations with ggplot2 (and reshape2, of course).

```
data(attitude)
library(ggplot2)
library(reshape2)
qplot(x=Var1, y=Var2, data=melt(cor(attitude)), fill=value, geom="tile")
```

So, what is going on in that short passage? cor makes a correlation matrix with all the pairwise correlations between variables (each pair appears twice, plus a diagonal of ones). melt takes the matrix and creates a data frame in long form, each row consisting of the id variables Var1 and Var2 and a single value. We then plot with the tile geometry, mapping the id variables to rows and columns, and value (i.e. the correlations) to the fill colour.
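To see what melt hands to the plotting call, it can help to look at the long-form data frame directly. The sketch below also spells out the same plot with ggplot() and geom_tile(), since qplot() is deprecated in recent ggplot2 releases (3.4.0 onwards). The object names `cor_long` and `p` are not from the post and are only there for illustration.

```r
library(ggplot2)
library(reshape2)

data(attitude)

# Long form: one row per pair of variables, with columns Var1, Var2 and value
cor_long <- melt(cor(attitude))
head(cor_long)
dim(cor_long)  # attitude has 7 variables, so 7 * 7 = 49 rows

# The same heatmap as the qplot call, written with ggplot()
p <- ggplot(cor_long, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile()
print(p)
```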

## 14 responses to “Using R: Correlation heatmap with ggplot2”

1. Marius

Very nice. I knew there would be a quick way to get a correlation table out of ggplot2, but I hadn’t pursued it. Adding in the value of each correlation is pretty simple, starting from the base you’ve provided:

    cor_melt = melt(cor(attitude))

    ggplot(cor_melt, aes(Var1, Var2, fill=value, label=round(value, 2))) +
      geom_tile() +
      geom_text()

2. I just started to think about how to plot correlations with ggplot, too. 🙂 An alternative approach might be points that indicate the correlation strength:

    # add point size, by multiplying the correlation value
    corr = cbind(corr, psize=c(exp(abs(corr$value))*20))
    corr$psize[which(corr$value>=0.999)] <- 0

    ggplot(data=corr, aes(x=Var1, y=Var2, fill=value)) +
      geom_point(aes(fill=value), shape=21, size=corr$psize) +
      geom_text(aes(x=Var1, y=Var2), label=c(round(corr$value,2)), colour="white")

A question still remains: how to deal with negative correlations? Would be nice to have, e.g., red to black for correlations from -1 to 0 and black to blue for positive correlations from 0 to 1. So, the darker the color, the weaker the correlation, and red/blue indicating negative or positive correlations.

• Seems like some lines of code were not accepted:

    # add point size to data frame, by multiplying the correlation value
    corr = cbind(corr, psize=c(exp(abs(corr$value))*20))
    # use this if you want to hide the diagonal 1-correlations
    corr$psize[which(corr$value>=0.999)] = 0

• Ok, got a solution for the negative value thing:

    ggplot(data=corr, aes(x=Var1, y=Var2, fill=value)) +
      geom_point(shape=21, size=corr$psize) +
      scale_fill_gradientn(colours=c("#ff9999", "#ff6666", "#cc4444", "black", "#3355cc", "#4488ff", "#6699ff"), limits=c(-1,1)) +
      geom_text(label=c(round(corr$value,2)), colour="white")

The colour gradient is not optimal and could be better. The limits argument makes sure that the colour range always runs from -1 to +1, independent of the lowest and highest correlation coefficients in the data.
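A diverging scale centred on zero is another way to handle the sign, sketched here on the tile version of the plot. scale_fill_gradient2() is part of ggplot2. The particular colours are arbitrary choices for illustration, and this sketch uses a light midpoint rather than the black midpoint proposed in the comment above. It also assumes corr is the melted correlation matrix of attitude, which the comment does not state.

```r
library(ggplot2)
library(reshape2)

data(attitude)
# Assumption: corr is the melted correlation matrix used elsewhere in the thread
corr <- melt(cor(attitude))

# Diverging fill: red for negative, white near zero, blue for positive.
# limits = c(-1, 1) pins the scale to the full range of possible correlations.
p <- ggplot(corr, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  geom_text(aes(label = round(value, 2)), size = 3) +
  scale_fill_gradient2(low = "red", mid = "white", high = "blue",
                       midpoint = 0, limits = c(-1, 1))
print(p)
```

With midpoint = 0 the neutral colour always sits at zero correlation, whatever the observed minimum and maximum happen to be.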

3. Hi!

Thank you for your contributions! In the above I didn’t think a lot about the presentation, so I haven’t changed any of the default theme settings. Adding the correlation in text is very useful though, even for the first exploratory graphs you make for yourself.

In my opinion, mapping numbers to the area of something is often a bit iffy, so I think I prefer the heatmap style. But I’m no graphics whiz, and opinions differ 🙂

Cheers,

m.

4. Dhaval A.

Hi, I really appreciate your code, it will be very helpful to my research. Quick question: do you know how to remove ‘var1’ and ‘var2’ from the plot please?

5. Dhaval A.

This is how you remove Var1 and Var2 from the plot. Add this to your plot code:

    + theme(axis.title=element_blank())

thank you again!

• Hi!

Yes, there are plenty of options that you can set depending on how you like your plots. See ggplot2 documentation at http://docs.ggplot2.org/current/

Cheers,

m.
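Putting the suggestions from this thread together, a labelled heatmap without axis titles might look like the sketch below. It only combines calls that already appear above (geom_tile, geom_text, and theme(axis.title=element_blank())); the object name `p` is just for illustration.

```r
library(ggplot2)
library(reshape2)

data(attitude)
cor_melt <- melt(cor(attitude))

# Heatmap with the rounded correlation printed in each tile
# and the Var1/Var2 axis titles removed
p <- ggplot(cor_melt, aes(Var1, Var2, fill = value, label = round(value, 2))) +
  geom_tile() +
  geom_text() +
  theme(axis.title = element_blank())
print(p)
```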
