read_xpt() fails on misconstructed dates #747

Open
DanChaltiel opened this issue Dec 18, 2023 · 3 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@DanChaltiel

Hi,

In some third-party-generated XPT files, Dates may be malformed, with the number of days encoded as a string.

This doesn't cause any trouble when using SAS or other software (e.g. https://stattransfer.com/), but it causes instant failure when using {haven}.

Currently, the whole read fails with an error if a single column is corrupt, whereas IMHO it should only emit a warning for that column and return it raw.
This way, one could try to fix the issue manually.
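
To illustrate what such a manual fix might look like if the column were returned raw, here is a minimal base R sketch. `fix_date()` is a hypothetical helper, not part of haven, and the `1970-01-01` origin is an assumption that matches the R-side day counts used in this issue; raw SAS day counts would use `1960-01-01` instead.

```r
# A malformed Date as described above: class "Date", but character storage.
x <- structure(c("20424", "20487"), label = "Date", class = "Date")

# Hypothetical helper (not part of haven): rebuild a proper numeric Date.
fix_date <- function(col, origin = "1970-01-01") {
  days <- as.numeric(unclass(col))          # drop the class, parse the day counts
  out <- as.Date(days, origin = origin)     # numeric-backed Date
  attr(out, "label") <- attr(col, "label")  # keep the variable label
  out
}

fixed <- fix_date(x)
typeof(fixed)
#> [1] "double"
format(fixed)
#> [1] "2025-12-02" "2026-02-03"
```

This only helps once the data is in memory, which is why returning the raw column with a warning would be enough to unblock users.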

Here is a reprex:

x = structure(c("20424", "20487"), label = "Date", class = "Date")
a = data.frame(id=1:2, date=x) #would not work with tibble()
a
#>   id       date
#> 1  1 2025-12-02
#> 2  2 2026-02-03

haven::write_xpt(a, "test.xpt")
haven::read_xpt("test.xpt")
#> Error in `date_validate()`:
#> ! Corrupt `Date` with unknown type character.
#> ℹ In file 'type-date-time.c' at line 344.
#> ℹ This is an internal error that was detected in the vctrs package.
#>   Please report it at <https://github.com/r-lib/vctrs/issues> with a reprex (<https://tidyverse.org/help/>) and the full backtrace.
#> Backtrace:
#>      ▆
#>   1. ├─haven::read_xpt("test.xpt")
#>   2. │ └─haven:::df_parse_xpt_file(spec, cols_skip, n_max, skip, name_repair = .name_repair)
#>   3. ├─tibble (local) `<fn>`(`<named list>`, .rows = 2L, .name_repair = "unique")
#>   4. ├─tibble:::as_tibble.list(`<named list>`, .rows = 2L, .name_repair = "unique")
#>   5. │ └─tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
#>   6. │   └─tibble:::check_valid_cols(x, call = call)
#>   7. │     ├─base::which(!map_lgl(x, is_valid_col))
#>   8. │     └─tibble:::map_lgl(x, is_valid_col)
#>   9. │       └─tibble:::map_mold(.x, .f, logical(1), ...)
#>  10. │         └─base::vapply(.x, .f, .mold, ..., USE.NAMES = FALSE)
#>  11. │           └─tibble (local) FUN(X[[i]], ...)
#>  12. │             └─vctrs::vec_is(x)
#>  13. │               └─vctrs::obj_is_vector(x)
#>  14. │                 └─vctrs (local) `<fn>`()
#>  15. │                   └─vctrs::vec_proxy(x = x)
#>  16. │                     └─vctrs:::date_validate(x)
#>  17. └─rlang:::stop_internal_c_lib(...)
#>  18.   └─rlang::abort(message, call = call, .internal = TRUE, .frame = frame)

Created on 2023-12-18 with reprex v2.0.2

This issue is related to #536, but with a reprex this time :-)

@botsp

botsp commented Dec 29, 2023

Hi, Dan
It seems that there is an issue with the "test.xpt" file that was created using the {write_xpt} function.

Since "$Date." is not a recognized format for a character variable in SAS/XPT, it should be "Date.". Maybe this affects how the file is created?

@DanChaltiel
Author

Hi Kevin,
Yes, the "test.xpt" file has the same problem as the output of a 3rd party software which generates flawed XPT files in some settings.
Here, my object x is misconstructed as it holds a character vector of class Date while Dates should always be numeric.

The present issue is about error management in read_xpt() so that one can overcome such flawed XPT files.
Most R users cannot correct XPT files, so if haven does not let us read them, we are unfortunately helpless.
I'm not sure write_xpt() should be corrected for that matter, as this flawed R object x should never occur naturally.
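
As a rough sketch of how such flawed columns could be spotted once they are in R, the check below flags anything that has class Date without numeric storage. `is_malformed_date()` is a hypothetical helper, not an existing haven or vctrs function.

```r
# Hypothetical helper: TRUE when a column has class "Date" but is not stored
# as a number. unclass() is needed because is.numeric() is always FALSE for
# Date objects, even well-formed ones.
is_malformed_date <- function(col) {
  inherits(col, "Date") && !is.numeric(unclass(col))
}

cols <- list(
  id   = 1:2,
  good = as.Date(c("2025-12-02", "2026-02-03")),
  bad  = structure(c("20424", "20487"), class = "Date")
)

flagged <- names(cols)[vapply(cols, is_malformed_date, logical(1))]
flagged
#> [1] "bad"
```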

@botsp

botsp commented Dec 31, 2023

Agreed, it seems the current conversion tool doesn't work very well.

@gorcha gorcha added the bug an unexpected problem or unintended behavior label Jan 31, 2024