Using R: tibbles and the t.test function

A participant in the R course I’m teaching showed me a case where a tbl_df (the new flavour of data frame provided by the tibble package; standard in new RStudio versions) interacts badly with the t.test function. I had not seen this happen before. The reason is this:

Interacting with legacy code
A handful of functions don’t work with tibbles because they expect df[, 1] to return a vector, not a data frame. If you encounter one of these functions, use as.data.frame() to turn a tibble back to a data frame. (tibble announcement on RStudio blog)
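The difference described in the quote can be checked directly. This is a minimal sketch, assuming the tibble package is installed and using the built-in chickwts data; the names df and tb are only for illustration.

```r
library(tibble)

df <- chickwts              # plain data frame shipped with R
tb <- as_tibble(chickwts)   # the same data as a tibble

# Single-column indexing drops to a numeric vector for a data frame ...
is.numeric(df[, 1])

# ... but stays a one-column tibble for a tibble
inherits(tb[, 1], "tbl_df")
ncol(tb[, 1])
```

Any function that assumes the first form will be handed a one-column tibble instead when it gets the second.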

Here is code that reproduces the situation (tibble version 1.2):

library(tibble)

chick_tibble <- as_tibble(chickwts)
casein <- subset(chickwts, feed == "casein")            ## plain data frame
sunflower <- subset(chick_tibble, feed == "sunflower")  ## tibble

t.test(sunflower$weight, casein$weight) ## this works
t.test(as.data.frame(sunflower[, 1]), as.data.frame(casein[, 1])) ## this works too
t.test(sunflower[, 1], casein[, 1]) ## this doesn't

Error: Unsupported use of matrix or array for column indexing

I did not know that. The solution, which they found themselves, is to use as.data.frame.
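There are a few ways to hand t.test a plain vector. This is a sketch under the same assumptions as the code above (tibble installed, chickwts data); the result names res_df, res_vec and res_drop are made up for illustration. These calls sidestep the problem whether or not a given tibble version raises the error.

```r
library(tibble)

chick_tibble <- as_tibble(chickwts)
casein <- chick_tibble[chick_tibble$feed == "casein", ]
sunflower <- chick_tibble[chick_tibble$feed == "sunflower", ]

# Option 1: convert back to a data frame, so that [, 1] drops to a vector
res_df <- t.test(as.data.frame(sunflower)[, 1], as.data.frame(casein)[, 1])

# Option 2: extract the column as a vector with [[ ]]
res_vec <- t.test(sunflower[["weight"]], casein[["weight"]])

# Option 3: ask the tibble to drop explicitly
res_drop <- t.test(sunflower[, 1, drop = TRUE], casein[, 1, drop = TRUE])

# All three pass the same numeric vectors to t.test
all.equal(res_df$statistic, res_vec$statistic)
all.equal(res_vec$statistic, res_drop$statistic)
```

Which option reads best is a matter of taste; extracting by name with [[ ]] or $ has the advantage of not depending on column order.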

I can see why not dropping to a vector makes sense. I’m sure you’ve at some point expected a data frame and got an “$ operator is invalid for atomic vectors” error. But it’s an unfortunate fact that the number of weird little thingamajigs to remember is always strictly increasing as the language evolves. And it’s a bit annoying that the standard RStudio setup breaks code that uses an old stats function, even if it’s in a somewhat non-obvious way.
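For reference, this is the opposite surprise with a plain data frame, sketched in base R only; one_col, msg and kept are illustrative names.

```r
# With a plain data frame, picking one column silently drops to a vector
one_col <- chickwts[, 1]

# Code that still expects a data frame then fails on `$`
msg <- tryCatch(one_col$weight, error = function(e) conditionMessage(e))
msg

# drop = FALSE keeps the data frame, which is what tibbles do by default
kept <- chickwts[, 1, drop = FALSE]
class(kept)
```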

4 responses to “Using R: tibbles and the t.test function”

  1. Though a newbie in R
    I think 1 valid point of view is that package writers should validate class
    If you send a flower (tibble) when a leaf (data.frame) is expected be prepared for traps & pitfalls

    Other 2 traps I faced
    Package writers use ellipsis functions and do not validate parameters
    So if you have a mistake in case or spelling things fail silently

    There is no namespace
    So 1 package happily “replaces” functions of another

    Release 4 of R should iron out traps and pitfalls
    and also have a strict QA of the 10000 packages in CRAN for robustness

    • I think it’s a reasonable choice not to drop to a vector. I think the developers take a somewhat prescriptivist view — having tibbles be consistent is more important than maintaining perfect backward compatibility. Fair enough.

      And yes, I would also love it if R could be a bit more strict and yell at me more often when I make typos and so on. But I’m sure that is easier said than done.

      And I don’t think it would be possible (or desirable, either) to demand that the CRAN maintainers adjudicate every naming conflict between packages. But hey, that’s just my opinion.

  2. But you are explicitly converting your data to a tibble – this is an intentional and documented effect of tibbles (do not drop a 1-column df to a vector).

    • I know. I don’t think this is a bad choice by the tibble developers. It’s reasonable not to drop to a vector.

      I write about it not to complain, but to help my future self, and people like me, when we google this error message in the future.

      (And yes, the above code explicitly converts to a tibble. The course participant did it implicitly — because readr generates tibbles.)
