ARROW-15659 [R] strptime should return NA (not error) with format mismatch #12402

dragosmg · 2022-02-11T12:58:29Z

This PR aligns arrow's strptime() binding with base::strptime() when the value passed to the format argument does not match the data. Currently arrow raises an error where it should return NA.

library(lubridate)
library(arrow)
library(dplyr)

df <- tibble(x = "2022-02-11")

df %>% 
  mutate(z = strptime(x, format = "%Y-%m %d"))
#> # A tibble: 1 × 2
#>   x          z     
#>   <chr>      <dttm>
#> 1 2022-02-11 NA

df %>% 
  record_batch() %>% 
  mutate(z = strptime(x, format = "%Y-%m %d")) %>% 
  collect()
#> Error: Invalid: Failed to parse string: '2022-02-11' as a scalar of type timestamp[ms]

Created on 2022-02-11 by the reprex package (v2.0.1)

github-actions · 2022-02-11T12:58:48Z

https://issues.apache.org/jira/browse/ARROW-15659

github-actions · 2022-02-11T12:58:49Z

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

dragosmg · 2022-03-25T08:30:05Z

I think several things are needed:

- new unit tests to show similar behaviour (returning NA in the same circumstances)
- updates to the existing unit tests. These do not use compare_dplyr_binding(), so they do not catch the fact that Arrow's strptime binding returns an object of a different class (POSIXct, a double vector) than base::strptime() (POSIXlt, a list)

waldo::compare() does ignore class when ignore_attr = TRUE, but it can't ignore typeof. I'm not sure we want to change the type of object Arrow's strptime binding returns.
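The class and type difference described above can be reproduced with base R alone. This is a minimal sketch rather than code from this PR; the variable names and the explicit tz = "UTC" are illustrative choices.

```r
# base::strptime() returns a POSIXlt object, which is a list underneath.
lt <- strptime("2022-02-11", format = "%Y-%m-%d", tz = "UTC")
class(lt)   # "POSIXlt" "POSIXt"
typeof(lt)  # "list"

# Converting to POSIXct gives an atomic double vector, the kind of
# object a timestamp column becomes when collected into R.
ct <- as.POSIXct(lt)
class(ct)   # "POSIXct" "POSIXt"
typeof(ct)  # "double"

# With a format that does not match the data, base R returns NA
# instead of raising an error.
bad <- strptime("2022-02-11", format = "%Y-%m %d", tz = "UTC")
is.na(bad)  # TRUE
```

Because typeof() differs ("list" versus "double"), a comparison that only ignores attributes still reports a mismatch between the two objects.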

dragosmg · 2022-03-28T13:00:03Z

@jonkeane what do you think? Is the type of the object returned by strptime of concern? base::strptime() returns a list (POSIXlt), while the Arrow one returns an atomic vector. We hadn't caught this before as the unit tests for strptime did not compare the {dplyr} and {arrow} pipelines.

jonkeane · 2022-03-28T13:59:23Z

There are a few comments in the tests that reference this, so I don't think it was unknown. POSIXlt is pretty unique to R (I'm sure other languages have similar types, but they aren't widespread), and Arrow doesn't support it directly (though you can see in the R package that we make a (pseudo-)extension class for it).
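To illustrate why POSIXlt is awkward to map onto a single timestamp column, the sketch below inspects its components in base R. It is only an illustration with made-up variable names, not part of the PR.

```r
# A POSIXlt value stores broken-down calendar fields in a list.
lt <- strptime("2022-02-11", format = "%Y-%m-%d", tz = "UTC")
fields <- unclass(lt)

names(fields)  # includes "sec", "min", "hour", "mday", "mon", "year", ...
fields$year    # 122: years since 1900
fields$mon     # 1: months are zero-based, so February is 1
fields$mday    # 11

# POSIXct, by contrast, is a single number: seconds since the Unix epoch.
ct <- as.POSIXct(lt)
length(unclass(ct))  # 1
```

A timestamp type that stores one number per value lines up with POSIXct, while POSIXlt would need one field per component.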

dragosmg · 2022-03-28T15:26:55Z

This needs a rebase once #12732 gets merged.

r/R/dplyr-funcs-datetime.R

jonkeane

A few questions about the tests

r/tests/testthat/test-dplyr-funcs-datetime.R

ursabot · 2022-03-30T14:02:00Z

Benchmark runs are scheduled for baseline = 64560af and contender = ba04e7f. ba04e7f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.21% ⬆️0.04%] test-mac-arm
[Failed ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.04% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/419| ba04e7f8 ec2-t3-xlarge-us-east-2>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/405| ba04e7f8 test-mac-arm>
[Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/405| ba04e7f8 ursa-i9-9960x>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/415| ba04e7f8 ursa-thinkcentre-m75q>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/418| 64560af6 ec2-t3-xlarge-us-east-2>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/404| 64560af6 test-mac-arm>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/404| 64560af6 ursa-i9-9960x>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/414| 64560af6 ursa-thinkcentre-m75q>
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

github-actions bot added the Component: R label Feb 11, 2022

dragosmg force-pushed the strptime_return_na_vs_error branch from 7a2700d to d7fefa7 Compare March 24, 2022 16:00

github-actions bot added the Component: C++ label Mar 28, 2022

dragosmg commented Mar 28, 2022

r/R/dplyr-funcs-datetime.R (outdated)

dragosmg force-pushed the strptime_return_na_vs_error branch from 0a1b35f to 727efbd Compare March 29, 2022 09:24

dragosmg marked this pull request as ready for review March 29, 2022 12:21

jonkeane requested changes Mar 29, 2022

r/tests/testthat/test-dplyr-funcs-datetime.R (outdated)

r/tests/testthat/test-dplyr-funcs-datetime.R (outdated)

dragosmg requested a review from jonkeane March 29, 2022 19:52

dragosmg added 13 commits March 30, 2022 12:33

add unit test to show the different returns from strptime with mistmatched format

1ead9c5

strptime returns NA / NULL instead of error

962ca1b

add failing unit test

48eee0a

added a C++ unit test to try and replicate the weird behavior i noticed in R

4894e8b

extend the C++ test

10708d7

unit test for the "%m/%d/%Y %z" format

3d56e03

test

96108a5

cleaned-up tests

870ca53

use build_expr() to support regular R objects too

cb4ecd5

added unit test for R objects

c696fb7

removed the C++ unit tests

e10bb1e

testing with NA

5d19668

updated tests to be more compare_dplyr-like

f669be7

dragosmg force-pushed the strptime_return_na_vs_error branch from 25863ef to f669be7 Compare March 30, 2022 11:33

jonkeane closed this in ba04e7f Mar 30, 2022

dragosmg deleted the strptime_return_na_vs_error branch April 26, 2022 09:34

asfimport mentioned this pull request Apr 2, 2022

[R] strptime should return NA (not error) with format mismatch #31114

Closed
