Bye, bye {V8}; Hello, {Rcpp}!! #4

hrbrmstr · 2019-06-23T22:16:05Z

I think I dotted all the i's and crossed all the t's.

Removed all V8 bits in src and tests
Added in Rcpp dep
Added in R script to data-raw which will fetch the latest darksky stringified tree and munge it into something that can be used in Rcpp and auto-generates src/generated-vars.h.
Modified one test to use NA_character_ instead of NA
Re-built Rmd
Bumped up version
Added myself to DESCRIPTION and changed a teensy bit of text in there and README.Rmd to note the change to Rcpp.
Ran all tests
Ran CRAN checks

hrbrmstr · 2019-06-23T22:16:25Z

now we'll see how Travis like this mod

hrbrmstr · 2019-06-24T13:49:33Z

Lemme port https://github.com/darkskyapp/tz-lookup/blob/master/test.js over to here and run the full checks. (later this week)

hrbrmstr · 2019-06-24T15:19:12Z

OK, so the mocha tests all pass for the darksky JS lib but some of them fail (i did a way too quick port of them, so it cld be my js array data wrangling) for the Rcpp port so i def have to take more time on this but will also run them against the accurate functions as well. shld be a robust test suite!

ateucher · 2019-06-24T16:51:21Z

@hrbrmstr this is absolutely brilliant, thank you so much! I definitely will take it. Thinking I'll take this now and when/if you get the time for another PR with the mocha tests that would be great. Do you have a start somewhere I can look at?

README.Rmd

README.md

hrbrmstr · 2019-06-25T13:06:35Z

I am liking that the fam sleeps late on holiday :-)

Running:

$ mocha test.js

in the tz-lookup repo runs 1,028 tests designed to catch what look like edge cases.

I made most of them into a CSV file: tz-cases.csv.gz and ran both 'fast' and 'accurate' tests. 'fast' has 193 test errors and 'accurate' has 126:

library(lutz)
library(tidyverse)

Read in the cases

cases <- read_csv("~/Data/tz-cases.csv.gz", col_types="ddc")

Do tests with ‘fast’

fast <- tz_lookup_coords(cases$lat, cases$lng, "fast", FALSE)
fast_err <- which(!(cases$val == fast))

Do tests with ‘accurate’

accurate <- tz_lookup_coords(cases$lat, cases$lng, "accurate", FALSE)
accurate_err <- which(!(cases$val == accurate))

Gather all the errors

bind_rows(
  cases[fast_err,] %>%
    mutate(case_num = fast_err, method = "fast") %>%
    select(case_num, method, everything()),
  cases[accurate_err,] %>%
    mutate(case_num = accurate_err, method = "accurate") %>%
    select(case_num, method, everything())
) -> tz_errs

See how many for each test

count(tz_errs, method)
## # A tibble: 2 x 2
##   method       n
##   <chr>    <int>
## 1 accurate   126
## 2 fast       193

See if there are common errors

count(tz_errs, case_num, sort = TRUE)
## # A tibble: 315 x 2
##    case_num     n
##       <int> <int>
##  1      573     2
##  2      596     2
##  3      846     2
##  4      940     2
##  5        4     1
##  6        7     1
##  7        9     1
##  8       12     1
##  9       14     1
## 10       15     1
## # … with 305 more rows

I'll poke at the C++ code a bit to see what I'm not translating from javascript properly.

hrbrmstr · 2019-06-25T13:35:26Z

One reason 'accurate' results differ is some return GMT offsets vs what the test functions expect.

However, some are errors like: Antarctica/Davis which shld be +7 unless I'm reading the local timezone db wrong.

# A tibble: 126 x 6
   case_num method     lat     lng val                err_val   
      <int> <chr>    <dbl>   <dbl> <chr>              <chr>     
 1       77 accurate  90    -90    Etc/GMT            Etc/GMT+6 
 2       79 accurate  90     90    Etc/GMT            Etc/GMT-6 
 3      525 accurate -79.0   81.4  Antarctica/Davis   Etc/GMT-5 
 4      526 accurate -73.1  107.   Antarctica/Vostok  Etc/GMT-7 
 5      528 accurate -73.4    5.44 Antarctica/Troll   Etc/GMT   
 6      529 accurate -66.5   64.1  Antarctica/Mawson  Etc/GMT-4 
 7      534 accurate -81.8   20.3  Antarctica/Troll   Etc/GMT-1 
 8      535 accurate -89.7  -12.8  Antarctica/McMurdo Etc/GMT+1 
 9      539 accurate -66.3  -77.8  Antarctica/Rothera Etc/GMT+5 
10      549 accurate -80.9 -167.   Antarctica/McMurdo Etc/GMT+11
# … with 116 more rows

This does bring up another question: do both methods need to return the exact same string for the exact same input vals? GMT offsets are fine timezone specifiers IMO but it'll make testing and comparing a tad more grungy. We may want to note the diff vals in docs somewhere.

hrbrmstr · 2019-06-25T13:47:32Z

Looks like (re: 'fast' errs) might have something to do with the double math ops. I'll take a look at this again on the morrow.

hrbrmstr · 2019-06-25T15:08:07Z

Yep. truncated a double prematurely.

✔ | 1020       | tz-lookup js edge cases [0.7 s]

Will update the PR with these then we can discuss the other bits abt tz strings being returned by each method.

hrbrmstr · 2019-06-26T02:46:01Z

I rly cld not stand the line lengths for the generated data so i added a call to fold (linux/macOS) and also a {stringi} wrapper to make the generated file much less icky.

I need to add in a test for invalid input or a codecov noop comment to sget the codecov % back to passing.

hrbrmstr · 2019-06-26T03:16:41Z

7408a20 shld say "fold" vs "find". I felt icky having something that cld not run on all platforms so rewrote fold in inline Rcpp in the script that generates the header.

tests/testthat/test-tz-lookup-edge.R

data-raw/generate-vars.R

ateucher · 2019-06-27T19:21:38Z

@hrbrmstr thanks again and sorry for the radio silence. I should be able to look at this more in depth next week, but I left a couple of inline comments - certainly not expecting you to do the work, but wanted to capture some thoughts, and at least contribute a little to your awesome work! I love the script to auto-generate src/generated-vars.h, especially the fold function 😄 .

Regarding perfect alignment between the "fast" and "accurate" methods, I'm also not too fussed about some being named, and some being GMT offsets... I think a lot will be resolved if we get them both using the latest version of the timezone boundaries, as I mentioned in my comment. I have started using the latest boundary file for the "accurate" method in the update-2019a branch, but ran into an issue with overlapping timezones which I haven't yet decided how to deal with (I don't think these are an issue (yet) in the darksky boundaries - see https://github.com/darkskyapp/tz-lookup/issues/34).

hrbrmstr added 2 commits June 23, 2019 18:11

Surgically removed V8 bits and replaced with Rcpp bits

b482638

mebbe spell things right @hrbrmstr

9c3d1e2

hrbrmstr mentioned this pull request Jun 23, 2019

Rcpp (or I can convert it to plan C) version #3

Closed

fixing travis complaints

370ac28

ateucher reviewed Jun 24, 2019

View reviewed changes

README.Rmd Outdated Show resolved Hide resolved

ateucher reviewed Jun 24, 2019

View reviewed changes

README.md Outdated Show resolved Hide resolved

hrbrmstr added 3 commits June 25, 2019 11:11

fixed double math; added more tests; fixed grammar

89a8f19

just slide NA into the vector index vs invalidate the entire lookup

26e5a09

Saner line lengths

c52815e

removed find binary dep (uses Rcpp function now) and tweaked codecov

7408a20

code cov

73bacf4

ateucher reviewed Jun 27, 2019

View reviewed changes

tests/testthat/test-tz-lookup-edge.R Show resolved Hide resolved

ateucher reviewed Jun 27, 2019

View reviewed changes

data-raw/generate-vars.R Show resolved Hide resolved

ateucher merged commit 73bacf4 into ateucher:master Jul 12, 2019

ateucher added a commit that referenced this pull request Jul 12, 2019

merge #4 into master

96b2660

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bye, bye {V8}; Hello, {Rcpp}!! #4

Bye, bye {V8}; Hello, {Rcpp}!! #4

hrbrmstr commented Jun 23, 2019

hrbrmstr commented Jun 23, 2019

hrbrmstr commented Jun 24, 2019

hrbrmstr commented Jun 24, 2019

ateucher commented Jun 24, 2019

hrbrmstr commented Jun 25, 2019

hrbrmstr commented Jun 25, 2019 •

edited

Loading

hrbrmstr commented Jun 25, 2019

hrbrmstr commented Jun 25, 2019

hrbrmstr commented Jun 26, 2019

hrbrmstr commented Jun 26, 2019

ateucher commented Jun 27, 2019 •

edited

Loading

Bye, bye {V8}; Hello, {Rcpp}!! #4

Bye, bye {V8}; Hello, {Rcpp}!! #4

Conversation

hrbrmstr commented Jun 23, 2019

hrbrmstr commented Jun 23, 2019

hrbrmstr commented Jun 24, 2019

hrbrmstr commented Jun 24, 2019

ateucher commented Jun 24, 2019

hrbrmstr commented Jun 25, 2019

hrbrmstr commented Jun 25, 2019 • edited Loading

hrbrmstr commented Jun 25, 2019

hrbrmstr commented Jun 25, 2019

hrbrmstr commented Jun 26, 2019

hrbrmstr commented Jun 26, 2019

ateucher commented Jun 27, 2019 • edited Loading

hrbrmstr commented Jun 25, 2019 •

edited

Loading

ateucher commented Jun 27, 2019 •

edited

Loading