Frequently Asked Questions about data.table
Beginner FAQs
  Why do DT[ , 5] and DT[2, 5] return a 1-column data.table rather than vectors like data.frame?
  Why does DT[,"region"] return a 1-column data.table rather than a vector?
  Why does DT[, region] return a vector for the "region" column? I'd like a 1-column data.table.
  Why does DT[ , x, y, z] not work? I wanted the 3 columns x,y and z.
  I assigned a variable mycol="x" but then DT[, mycol] returns an error. How do I get it to look up the column name contained in the mycol variable?
  What are the benefits of being able to use column names as if they are variables inside DT[...]?
  OK, I'm starting to see what data.table is about, but why didn't you just enhance data.frame in R? Why does it have to be a new package?
  Why are the defaults the way they are? Why does it work the way it does?
  Isn't this already done by with() and subset() in base?
  Why does X[Y] return all the columns from Y too? Shouldn't it return a subset of X?
  What is the difference between X[Y] and merge(X, Y)?
  Anything else about X[Y, sum(foo*bar)]?
  That's nice. How did you manage to change it given that users depended on the old behaviour?

General Syntax
  How can I avoid writing a really long j expression? You've said that I should use the column names, but I've got a lot of columns.
  Why is the default for mult now "all"?
  I'm using c() in j and getting strange results.
  I have built up a complex table with many columns. I want to use it as a template for a new table; i.e., create a new table with no rows, but with the column names and types copied from my table. Can I do that easily?
  Is a null data.table the same as DT[0]?
  Why has the DT() alias been removed?
  But my code uses j = DT(...) and it works. The previous FAQ says that DT() has been removed.
  What are the scoping rules for j expressions?
  Can I trace the j expression as it runs through the groups?
  Inside each group, why are the group variables length-1?
  Only the first 10 rows are printed, how do I print more?
  With an X[Y] join, what if X contains a column called "Y"?
  X[Z[Y]] is failing because X contains a column "Y". I'd like it to use the table Y in calling scope.
  Can you explain further why data.table is inspired by A[B] syntax in base?
  Can base be changed to do this then, rather than a new package?
  I've heard that data.table syntax is analogous to SQL.
  What are the smaller syntax differences between data.frame and data.table?
  I'm using j for its side effect only, but I'm still getting data returned. How do I stop that?
  Why does [.data.table now have a drop argument from v1.5?
  Rolling joins are cool and very fast! Was that hard to program?
  Why does DT[i, col := value] return the whole of DT? I expected either no visible value (consistent with <-), or a message or return value containing how many rows were updated. It isn't obvious that the data has indeed been updated by reference.
  OK, thanks. What was so difficult about the result of DT[i, col := value] being returned invisibly?
  Why do I have to type DT sometimes twice after using := to print the result to console?
  I've noticed that base::cbind.data.frame (and base::rbind.data.frame) appear to be changed by data.table. How is this possible? Why?
  I've read about method dispatch (e.g. merge may or may not dispatch to merge.data.table) but how does R know how to dispatch? Are dots significant or special? How on earth does R know which function to dispatch and when?
  Why do T and F behave differently from TRUE and FALSE in some data.table queries?

Questions relating to compute time
  I have 20 columns and a large number of rows. Why is an expression of one column so quick?
  I don't have a key on a large table, but grouping is still really quick. Why is that?
  Why is grouping by columns in the key faster than an ad hoc by?
  What are primary and secondary indexes in data.table?

Error messages
  "Could not find function DT"
  "unused argument(s) (MySum = sum(v))"
  "translateCharUTF8 must be called on a CHARSXP"
  cbind(DT, DF) returns a strange format, e.g. Integer,5
  "cannot change value of locked binding for .SD"
  "cannot change value of locked binding for .N"

Warning messages
  "The following object(s) are masked from package:base: cbind, rbind"
  "Coerced numeric RHS to integer to match the column's type"
  Reading data.table from RDS or RData file

General questions about the package
  v1.3 appears to be missing from the CRAN archive?
  Is data.table compatible with S-plus?
  Is it available for Linux, Mac and Windows?
  I think it's great. What can I do?
  I think it's not great. How do I warn others about my experience?
  I have a question. I know the r-help posting guide tells me to contact the maintainer (not r-help), but is there a larger group of people I can ask?
  Where are the datatable-help archives?
  I'd prefer not to post on the Issues page, can I mail just one or two people privately?
  I have created a package that uses data.table. How do I ensure my package is data.table-aware so that inheritance from data.frame works?
data.table 1.18.4 | Tyson Barrett | datatable-faq.Rmd