Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix tests for ties.method='last' to work on R 3.1 #4243

Merged
merged 4 commits into from
Feb 15, 2020
Merged

Conversation

MichaelChirico
Copy link
Member

@MichaelChirico MichaelChirico commented Feb 14, 2020

Follow up to #4242

ties.method='last' only introduced on R 3.3 < 3.1, so the tests comparing base::rank to frank fail there.

This approach shuts of ties.method='last' testing for R 3.1 (easiest solution).

cc @jangorecki

PS anyone remember why we apparently shut off ties.method='random' testing? It was failing for me here, not sure why.

@codecov
Copy link

codecov bot commented Feb 14, 2020

Codecov Report

Merging #4243 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4243      +/-   ##
==========================================
+ Coverage   99.61%   99.61%   +<.01%     
==========================================
  Files          72       72              
  Lines       13873    13874       +1     
==========================================
+ Hits        13819    13820       +1     
  Misses         54       54
Impacted Files Coverage Δ
R/frank.R 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 230cfcf...518f39a. Read the comment docs.

@mattdowle mattdowle added this to the 1.12.9 milestone Feb 15, 2020
@mattdowle
Copy link
Member

mattdowle commented Feb 15, 2020

Very nice solution using setdiff ... formals. This way if any more are added to R-devel in future we'll know straight away in dev. Nice.

I loooked at history and I couldn't find ties.method='random' test being turned off. Rather it just seems it wasn't tested from the start: 1ff5e1e#diff-d322b46c285376a06f948ecef50b6763
frank was later moved from setkey.R to frank.R on March 9, 2016, and random wasn't tested then either.
Then you caught it and added coverage in test 1962.027.
So the question remains why random can't be added in this loop.
Including random works for me locally. Turning it on in this branch to see if it passes CI.
When you ran locally did you run through test.data.table() which sets the RNG to R's old default for test purposes?

@mattdowle
Copy link
Member

Ok I get a test fail too. Just in the second test (1369) with na.last=NA. So I left "random" on in 1368 and added a comment to 1369 so this branch should pass for now.
Will have another look later to see why there's a difference with na.last=NA ...

@MichaelChirico
Copy link
Member Author

MichaelChirico commented Feb 15, 2020

I did look at it briefly and yes I think NA is the issue

# base::rank
random = sort.list(order(x, stats::runif(sum(!nas))))
# frankv
if (ties.method == "random") {
    v = stats::runif(nrow(x))
    if (is.na(na.last)) {
      idx = which_(nas, FALSE)
      set(x, idx, '..stats_runif..', v[idx])
    } else set(x, NULL, '..stats_runif..', v)
    order = if (length(order) == 1L) c(rep(order, length(cols)), 1L) else c(order, 1L)
    cols = c(cols, ncol(x))
  }

runif is called potentially a different number of times in the two functions which would throw off the seeds?

@mattdowle
Copy link
Member

But the test sets the seed before the calls. Isn't it runif(sum(!nas))) vs runif(nrow(x)) i.e. frank including the NA positions to remove them afterwards could result in a different ordering than the base way. Both are right I guess, just different.

@MichaelChirico
Copy link
Member Author

say the first element of the input is NA

then frank will use the first random draw on an NA, while base::rank will not take any draws, so the random ordering will end up different IINM.

I don't think we need to force exact consistency between rank and frank here -- random ordering should be random after all

@mattdowle
Copy link
Member

mattdowle commented Feb 15, 2020

Yes both are random. But we may as well match base unless there's a reason not to. In this case it was easier to match base and turn the test on with 'random' included (thanks to your spot) so it could all be put to bed. Otherwise we'd have had to find another way to test random with na.last=NA.

@mattdowle mattdowle merged commit 29ad52f into master Feb 15, 2020
@mattdowle mattdowle deleted the frank-last-3.1 branch February 15, 2020 02:32
@MichaelChirico
Copy link
Member Author

nice! I see the difference was only a few lines so yes that's preferred

@jangorecki jangorecki modified the milestones: 1.12.11, 1.12.9 May 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants