Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bye, bye {V8}; Hello, {Rcpp}!! #4

Merged
merged 8 commits into from
Jul 12, 2019
Merged

Conversation

hrbrmstr
Copy link
Contributor

I think I dotted all the i's and crossed all the t's.

  • Removed all V8 bits in src and tests
  • Added in Rcpp dep
  • Added in R script to data-raw which will fetch the latest darksky stringified tree and munge it into something that can be used in Rcpp and auto-generates src/generated-vars.h.
  • Modified one test to use NA_character_ instead of NA
  • Re-built Rmd
  • Bumped up version
  • Added myself to DESCRIPTION and changed a teensy bit of text in there and README.Rmd to note the change to Rcpp.
  • Ran all tests
  • Ran CRAN checks

@hrbrmstr
Copy link
Contributor Author

now we'll see how Travis like this mod

@hrbrmstr
Copy link
Contributor Author

Lemme port https://github.com/darkskyapp/tz-lookup/blob/master/test.js over to here and run the full checks. (later this week)

@hrbrmstr
Copy link
Contributor Author

OK, so the mocha tests all pass for the darksky JS lib but some of them fail (i did a way too quick port of them, so it cld be my js array data wrangling) for the Rcpp port so i def have to take more time on this but will also run them against the accurate functions as well. shld be a robust test suite!

@ateucher
Copy link
Owner

@hrbrmstr this is absolutely brilliant, thank you so much! I definitely will take it. Thinking I'll take this now and when/if you get the time for another PR with the mocha tests that would be great. Do you have a start somewhere I can look at?

README.Rmd Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@hrbrmstr
Copy link
Contributor Author

I am liking that the fam sleeps late on holiday :-)

Running:

$ mocha test.js

in the tz-lookup repo runs 1,028 tests designed to catch what look like edge cases.

I made most of them into a CSV file: tz-cases.csv.gz and ran both 'fast' and 'accurate' tests. 'fast' has 193 test errors and 'accurate' has 126:

library(lutz)
library(tidyverse)

Read in the cases

cases <- read_csv("~/Data/tz-cases.csv.gz", col_types="ddc")

Do tests with ‘fast’

fast <- tz_lookup_coords(cases$lat, cases$lng, "fast", FALSE)
fast_err <- which(!(cases$val == fast))

Do tests with ‘accurate’

accurate <- tz_lookup_coords(cases$lat, cases$lng, "accurate", FALSE)
accurate_err <- which(!(cases$val == accurate))

Gather all the errors

bind_rows(
  cases[fast_err,] %>%
    mutate(case_num = fast_err, method = "fast") %>%
    select(case_num, method, everything()),
  cases[accurate_err,] %>%
    mutate(case_num = accurate_err, method = "accurate") %>%
    select(case_num, method, everything())
) -> tz_errs

See how many for each test

count(tz_errs, method)
## # A tibble: 2 x 2
##   method       n
##   <chr>    <int>
## 1 accurate   126
## 2 fast       193

See if there are common errors

count(tz_errs, case_num, sort = TRUE)
## # A tibble: 315 x 2
##    case_num     n
##       <int> <int>
##  1      573     2
##  2      596     2
##  3      846     2
##  4      940     2
##  5        4     1
##  6        7     1
##  7        9     1
##  8       12     1
##  9       14     1
## 10       15     1
## # … with 305 more rows

I'll poke at the C++ code a bit to see what I'm not translating from javascript properly.

@hrbrmstr
Copy link
Contributor Author

hrbrmstr commented Jun 25, 2019

One reason 'accurate' results differ is some return GMT offsets vs what the test functions expect.

However, some are errors like: Antarctica/Davis which shld be +7 unless I'm reading the local timezone db wrong.

# A tibble: 126 x 6
   case_num method     lat     lng val                err_val   
      <int> <chr>    <dbl>   <dbl> <chr>              <chr>     
 1       77 accurate  90    -90    Etc/GMT            Etc/GMT+6 
 2       79 accurate  90     90    Etc/GMT            Etc/GMT-6 
 3      525 accurate -79.0   81.4  Antarctica/Davis   Etc/GMT-5 
 4      526 accurate -73.1  107.   Antarctica/Vostok  Etc/GMT-7 
 5      528 accurate -73.4    5.44 Antarctica/Troll   Etc/GMT   
 6      529 accurate -66.5   64.1  Antarctica/Mawson  Etc/GMT-4 
 7      534 accurate -81.8   20.3  Antarctica/Troll   Etc/GMT-1 
 8      535 accurate -89.7  -12.8  Antarctica/McMurdo Etc/GMT+1 
 9      539 accurate -66.3  -77.8  Antarctica/Rothera Etc/GMT+5 
10      549 accurate -80.9 -167.   Antarctica/McMurdo Etc/GMT+11
# … with 116 more rows

This does bring up another question: do both methods need to return the exact same string for the exact same input vals? GMT offsets are fine timezone specifiers IMO but it'll make testing and comparing a tad more grungy. We may want to note the diff vals in docs somewhere.

@hrbrmstr
Copy link
Contributor Author

Looks like (re: 'fast' errs) might have something to do with the double math ops. I'll take a look at this again on the morrow.

@hrbrmstr
Copy link
Contributor Author

Yep. truncated a double prematurely.

✔ | 1020       | tz-lookup js edge cases [0.7 s]

Will update the PR with these then we can discuss the other bits abt tz strings being returned by each method.

@hrbrmstr
Copy link
Contributor Author

I rly cld not stand the line lengths for the generated data so i added a call to fold (linux/macOS) and also a {stringi} wrapper to make the generated file much less icky.

I need to add in a test for invalid input or a codecov noop comment to sget the codecov % back to passing.

@hrbrmstr
Copy link
Contributor Author

7408a20 shld say "fold" vs "find". I felt icky having something that cld not run on all platforms so rewrote fold in inline Rcpp in the script that generates the header.

@ateucher
Copy link
Owner

ateucher commented Jun 27, 2019

@hrbrmstr thanks again and sorry for the radio silence. I should be able to look at this more in depth next week, but I left a couple of inline comments - certainly not expecting you to do the work, but wanted to capture some thoughts, and at least contribute a little to your awesome work! I love the script to auto-generate src/generated-vars.h, especially the fold function 😄 .

Regarding perfect alignment between the "fast" and "accurate" methods, I'm also not too fussed about some being named, and some being GMT offsets... I think a lot will be resolved if we get them both using the latest version of the timezone boundaries, as I mentioned in my comment. I have started using the latest boundary file for the "accurate" method in the update-2019a branch, but ran into an issue with overlapping timezones which I haven't yet decided how to deal with (I don't think these are an issue (yet) in the darksky boundaries - see https://github.com/darkskyapp/tz-lookup/issues/34).

@ateucher ateucher merged commit 73bacf4 into ateucher:master Jul 12, 2019
ateucher added a commit that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants