data.table does not behave like a data.frame #5529

Closed
ja-ortiz-uniandes opened this issue Nov 16, 2022 · 8 comments

@ja-ortiz-uniandes

My understanding was that data.tables are meant to be fully compatible with data.frames, though I might be mistaken. If they are, I have found something that data.frames can do and data.tables apparently cannot, which would be a bug.

I'm talking about appending a row of data at the end of the data.thing. A reproducible example is below.

If data.tables are not meant to be fully compatible with data.frames, then it would be nice to have an easy way to append (and ideally also insert) a row of data, so this is also a feature request. I know one can currently use rbind() or rbindlist(). However, I am referring to something that uses the i and j arguments. For example, dt[3.5, `:=`(c(col1, col2, ...)) ] would insert a new row between rows 3 and 4, with the new data given in column order in the vector c(col1, col2, ...) or in a named list list("col1" = val1, "col2" = val2, ...). It would also be cool if using the . wrapper instead of the := operator made the row insertion non-permanent.

I couldn't find any reference to a data.table way* of inserting/appending a new row, nor to any other cases where a data.table behaves differently from a data.frame.

  • By data.table way I'm referring to a fast, efficient way of performing an operation inside [].

Thanks to everyone developing data.table! This is my No. 1 package for R! <3

Minimal reproducible example

library(data.table)

# With a data.frame
df <- data.frame(var1 = c(4, 13, 7, 8),
                  var2 = c(15, 9, 9, 13),
                  var3 = c(12, 12, 7, 5))
df
#>   var1 var2 var3
#> 1    4   15   12
#> 2   13    9   12
#> 3    7    9    7
#> 4    8   13    5


# append row to end of data.frame 
df[nrow(df) + 1, ] <- c(5, 5, 3)
df
#>   var1 var2 var3
#> 1    4   15   12
#> 2   13    9   12
#> 3    7    9    7
#> 4    8   13    5
#> 5    5    5    3


# With a data.table
dt <- as.data.table(df)
dt
#>    var1 var2 var3
#> 1:    4   15   12
#> 2:   13    9   12
#> 3:    7    9    7
#> 4:    8   13    5
#> 5:    5    5    3


# append row to end of data.table 
dt[nrow(dt) + 1, ] <- c(5, 5, 3)
dt
#>    var1 var2 var3
#> 1:    4   15   12
#> 2:   13    9   12
#> 3:    7    9    7
#> 4:    8   13    5
#> 5:    5    5    3

Created on 2022-11-16 with reprex v2.0.2

@jangorecki
Member

Thanks for the long report. I suggest investing more time in reading the documentation, especially the FAQ and the introduction. The feature you requested has been requested before, so your request is a duplicate. In one issue I elaborated quite deeply on why inserting/deleting rows is not as simple as inserting/deleting columns.

@jangorecki
Member

Actually, using i=3.5 to insert a row between the 3rd and 4th rows is something new that hasn't been requested before, and it is quite an interesting idea.

@jangorecki
Member

jangorecki commented Nov 17, 2022

I meant this explanation: #4345 (comment)
So inserting by reference at the end of the table is still on the roadmap, but inserting between rows is out of scope, due to the C memory layout of R vectors, as explained in that link.
Of course, we could still provide a function to insert rows that doesn't operate by reference and just does rbindlist under the hood.
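
For illustration, a minimal sketch of what such a non-by-reference helper might look like. The name insert_row and its arguments are hypothetical and are not part of data.table; the sketch only assumes that the table is split at the insertion point and the pieces are stacked with rbindlist(), so the result is a new table and the input is left unchanged.

```r
library(data.table)

# Hypothetical helper, NOT part of data.table: returns a new table with
# `new_row` (a named list) placed after row `after` of `dt`.
# `dt` itself is not modified, so the result must be assigned.
insert_row <- function(dt, new_row, after = nrow(dt)) {
  after <- as.integer(after)
  stopifnot(is.data.table(dt), after >= 0L, after <= nrow(dt))
  head_idx <- seq_len(after)                     # rows 1..after
  tail_idx <- seq_len(nrow(dt) - after) + after  # rows after+1..nrow(dt)
  rbindlist(
    list(dt[head_idx], as.data.table(new_row), dt[tail_idx]),
    use.names = TRUE                             # match columns by name
  )
}

dt <- data.table(var1 = c(4, 13, 7, 8),
                 var2 = c(15, 9, 9, 13),
                 var3 = c(12, 12, 7, 5))

# Insert a row between rows 3 and 4; omit `after` to append at the end.
dt2 <- insert_row(dt, list(var1 = 5, var2 = 5, var3 = 3), after = 3)
dt2
```

Because every call copies the whole table, a helper like this is convenient rather than fast; for many rows it is usually better to collect them and bind once.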

@ja-ortiz-uniandes
Author

ja-ortiz-uniandes commented Nov 17, 2022

Thank you for your response. Here are my thoughts:

Thank you for sharing your comments; I humbly accept all your suggestions. Even though I am quite an avid user of data.table, there is always more to learn.

I am happy you found the idea for i = 3.5 interesting.

Now that you mention it, I understand why adding rows by reference is not as simple as adding columns. Perhaps, then, the := operator is not appropriate.

I am also happy that appending a row is on the roadmap, and I hope to take this opportunity to piggyback row insertion onto it too.

Also, I think the under-the-hood solution of calling rbindlist() would be great!

As for the difference in behavior between data.tables and data.frames, do you have any comments? What would the intended functionality be here?

@jangorecki
Member

As of now, only rbindlist/rbind.
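
As a sketch of that currently supported route, the reprex's append could be written with rbind() or rbindlist() as follows. The variable names new_row, appended_rbind and appended_rbindlist are illustrative only; both calls return a new table, so the result has to be assigned.

```r
library(data.table)

dt <- data.table(var1 = c(4, 13, 7, 8),
                 var2 = c(15, 9, 9, 13),
                 var3 = c(12, 12, 7, 5))

# The row to append, as a one-row data.table with matching column names.
new_row <- data.table(var1 = 5, var2 = 5, var3 = 3)

# rbind() on data.tables returns a new table; `dt` is untouched
# until the result is assigned back.
appended_rbind <- rbind(dt, new_row)

# rbindlist() takes a list of tables and does the same job.
appended_rbindlist <- rbindlist(list(dt, new_row), use.names = TRUE)

appended_rbind
```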

@jangorecki
Member

I will close this issue as a duplicate of #660.

@ja-ortiz-uniandes
Author

ja-ortiz-uniandes commented Nov 17, 2022

I assume this means that data.tables are not meant to function as data.frames.

@jangorecki
Member

Not 100%; the syntax you presented here is currently not supported.
