set() adds new cols when rows aren't updated #6204
Generated via commit a04231d
Download link for the artifact containing the test results: ↓ atime-results.zip
Time taken to finish the standard R installation steps: 11 minutes and 34 seconds
Time taken to run
Oops, I tried to resolve a merge conflict on tests.rraw and screwed up the diff. Will try to fix it now. Git is hard.
great thanks!
Is this behavior documented? If not please update docs.
Please add a NEWS item.
related docs I found are ?set
to me these docs imply that set only accepts integer i, but should be consistent with
I also find "docs" about what i is allowed to be in the error message below
> DT=data.table(x=1)
> set(DT, i=-1L, j="b", value=5)
Error in set(DT, i = -1L, j = "b", value = 5) :
i[1] is -1 which is out of range [1,nrow=1]
> set(DT, i=0L, j="b", value=5)
> set(DT, i=NA_integer_, j="b", value=5)
> DT
x
<num>
1: 1
The error message above implies that the only valid values for i are [1,nrow] (but missing/NA and 0 in i are not mentioned on ?set). It seems to me that allowing 0L and NA_integer_ is inconsistent with this error message. Also below we see that
> DT[-1L, b := 5]
> DT[NA_integer_, c := 5]
> DT[2L, d := 5]
Error in `[.data.table`(DT, 2L, `:=`(c, 5)) :
i[1] is 2 which is out of range [1,nrow=1]
> DT
x b c
<num> <num> <num>
1: 1 NA NA
So to summarise, we have in current master
For consistency and backward compatibility, we should probably change this to
and change the error message to or just error everywhere (but that would be a breaking change, maybe there will be revdep issues) |
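For anyone who wants to reproduce the comparison locally, here is a minimal sketch that only records whether each call errors on a one-row table. It does not say which outcome is the right one. The outcome() helper and the idx list are made up for illustration and are not part of data.table.

```r
library(data.table)

# Hypothetical helper (not part of data.table): run a zero-argument function
# and report "ok" or "error", without judging which outcome is correct.
outcome <- function(f) tryCatch({ f(); "ok" }, error = function(e) "error")

# The i values discussed in this thread, tried against a one-row table.
idx <- list("-1L" = -1L, "0L" = 0L, "NA_integer_" = NA_integer_, "2L" = 2L)

set_res <- sapply(idx, function(i) {
  DT <- data.table(x = 1)  # fresh table per call so calls cannot interact
  outcome(function() set(DT, i = i, j = "b", value = 5))
})
assign_res <- sapply(idx, function(i) {
  DT <- data.table(x = 1)
  outcome(function() DT[i, b := 5])
})

# One row per i value, one column per interface.
print(data.frame(set = set_res, assign = assign_res))
```

Printing the two columns side by side makes the asymmetry easy to spot, for example i=-1L being rejected by set() but accepted by :=.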
The only test case I see which specifies the behavior for weird
test(2005.02, set(DT, 4L, "b", NA), error="i[1] is 4 which is out of range [1,nrow=3]")
So i=-1, 0 and NA_integer_ are unspecified behavior in tests as well as docs.
wdyt we should do about this @joshhwuu @MichaelChirico @mb706 @ben-schwen ? |
I'd vote for your proposal making
Maybe another testcase to consider would be
set(DT, i=integer(0), j="d", value=numeric(0))
This was also ignored previously and adds a new column as of this PR, which I also think is the right way.
you wrote set(DT, i=integer(0), j="d", value=numeric(0))
I also prefer that we bring |
Apart from news, LGTM Might want to add mb706's |
Co-authored-by: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com>
@tdhock WDYT about changing the error to |
yes please add docs: missing values and non-positive values are ignored, positive values greater than nrow cause error. |
@tdhock upon further investigation, it seems that negative indexing values allowed in
DT = data.table(x = 1:5)
DT[-1L, b := 1]
DT
x b
<int> <char>
1: 1 <NA>
2: 2 1
3: 3 1
4: 4 1
5: 5 1
Which I don't believe
I believe the correct way to go about this is to just mention missing values and 0L being ignored in the docs. LMK if you think otherwise.
yes that is moving in the right direction.
> DT=data.table(x=1:2)
> set(DT,-1,"x",0)
Error in set(DT, -1, "x", 0) :
i[1] is -1 which is out of range [1,nrow=2]
In addition: Warning message:
In set(DT, -1, "x", 0) :
Coerced i from numeric to integer. Please pass integer for efficiency; e.g., 2L rather than 2
> DT[NA, x := 0]
> DT
x
<int>
1: 1
2: 2
> DT[c(-1,1), x := 0]
Error in `[.data.table`(DT, c(-1, 1), `:=`(x, 0)) :
Item 1 of i is -1 and item 2 is 1. Cannot mix positives and negatives.
> DT[c(-1,NA), x := 0]
Error in `[.data.table`(DT, c(-1, NA), `:=`(x, 0)) :
Item 1 of i is -1 and item 2 is NA. Cannot mix negatives and NA.
> DT[c(NA,1), x := 0]
> DT
x
<int>
1: 0
2: 2
do you think any of this should be documented? I guess we can leave it un-documented for now.
It would be difficult/time-consuming finding each of these cases and writing good concise documentation for them, so I agree that we can leave the documentation as-is for these niche cases |
Thanks!
Closes #5409
Brings set() behavior closer to :=, so that set() doesn't return early even if no rows are updated.
Question here: I noticed that isString() is used inside SEXP assign to differentiate between calls from set() and from :=, therefore this change worked. Although I just want to confirm that this is intended, as it isn't immediately obvious to me why cols from set() is always passed as a string while not from :=.
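To make the intent concrete, here is a small sketch of the case this PR targets. The := line reflects console output shown in the conversation. The set() line is what the PR aims for and depends on the installed data.table including this change, so the result is printed rather than asserted.

```r
library(data.table)

DT <- data.table(x = 1)

# := with an i that selects no rows still creates the column (filled with NA),
# as shown in the console output in the conversation.
DT[NA_integer_, b := 5]

# set() with the same i: with this PR the column is meant to be created too;
# before it, set() returned early and left DT unchanged.
set(DT, i = NA_integer_, j = "c", value = 5)

# "c" is listed only if the installed data.table includes this change.
print(names(DT))
```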