Error in read_sas using catalog file #680

Closed
ValValetl opened this issue May 6, 2022 · 15 comments · Fixed by #713
Labels: bug (an unexpected problem or unintended behavior), readstat

Comments

@ValValetl

Hi, I am getting the error message "Error: Failed to parse formats.sas7bcat: Invalid file, or file has unsupported features." when importing SAS data with a catalog file. This is the same error as in the closed issue #34. Importing the data without the catalog file works.

I am using the latest haven release (2.5.0) and also tested with the development version on GitHub.

fpath <- "path/to/sas/data/file"
catalog  <- "formats.sas7bcat"
sas_data <- haven::read_sas(fpath, catalog_file = catalog)

Error: Failed to parse formats.sas7bcat: Invalid file, or file has unsupported features.
@gorcha
Member

gorcha commented May 6, 2022

Hi @ValValetl, thanks for the bug report.

Can you please share the catalog file and also some example data if possible?
Without the catalog file it's not possible to track down the error.

@ValValetl
Author

Hi @gorcha
Unfortunately, this is not possible as it is non-public data. I thought the issue report might still be of interest, as issue #34 was closed a while ago without any resolution.

@gorcha
Member

gorcha commented May 6, 2022

Even if not the data, are you able to share the catalog file?

@ValValetl
Author

I need to check with the owner. I will get back to you later. Thanks for your quick responses!

@gorcha added the reprex (needs a minimal reproducible example) label on May 21, 2022
@ValValetl
Author

Sorry for the long delay. Here is the catalog file that produces the error message: sas_catalog_file.zip

@gorcha
Member

gorcha commented Jul 14, 2022

No worries at all, thanks!

@joshuaborn

Was this ever diagnosed? I'm running into the same issue.

@gorcha added the readstat and bug (an unexpected problem or unintended behavior) labels and removed the reprex (needs a minimal reproducible example) label on Sep 1, 2022
@gorcha
Member

gorcha commented Sep 1, 2022

Hi @joshuaborn, I haven't had a chance to look at this yet unfortunately but hopefully will over the next few weeks.

There's no guarantee that this is the same issue affecting you. Would you be able to provide an example file that I can test by any chance?

@joshuaborn

Hi, @gorcha. The particular file I first encountered the issue with was a restricted-use file, but I've seen it with at least one other data set since then. I should have some time this weekend to try it out with public-use data files, and if I can replicate it, I'll share.

@gorcha
Member

gorcha commented Sep 2, 2022

Thanks @joshuaborn, much appreciated!

@joshuaborn

joshuaborn commented Feb 20, 2023

NSFG_example.zip

I neglected to follow up on this back in September, but I was using haven today and found a good example of this issue with public-use data. Attached are four files from the National Survey of Family Growth 2017-2019 public-use data. The d2017_2019femresp.sas7bdat and d2017_2019femresp.sas7bcat pair loads with read_sas just fine, but using read_sas with the d2017_2019fempreg.sas7bdat and d2017_2019fempreg.sas7bcat pair leads to an error message of the form

Error: Failed to parse .../d2017_2019fempreg.sas7bcat: Invalid file, or file has unsupported features.

Using read_sas on just d2017_2019fempreg.sas7bdat without the catalog file works.

I'm using R version 4.2.2 on Windows 11 with Haven version 2.5.1.

The interesting thing about this example is that the pregnancy data table (d2017_2019fempreg) is ultimately derived from the female respondents table (d2017_2019femresp). I tried examining the two catalog files in SAS using PROC CATALOG, but didn't see anything obvious that was present in one and not the other.

As an aside, since these parse errors seem to happen with catalog files more than with regular SAS data files, maybe it would be worth adding to Haven the ability to side-load value labels from a sas7bdat file or even a CSV file? It seems pretty straightforward to load another table and call labelled as needed, and SAS can export its value labels to a regular data table easily with PROC CONTENTS, etc. I would be willing to work on this, since it would save me time in the long run.
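
For illustration, the side-loading idea above could look roughly like the following. This is only a sketch, not an existing haven feature: the helper apply_value_labels and the lookup-table columns (variable, value, label) are hypothetical names, and the small data frames stand in for a table exported from SAS and for the result of read_sas() called without catalog_file. The only haven call used is labelled().

```r
library(haven)

# Hypothetical lookup table of value labels, e.g. exported from SAS to CSV
# and read with read.csv(). The column names are assumptions for this sketch.
label_table <- data.frame(
  variable = c("sex", "sex", "smoker", "smoker"),
  value    = c(1, 2, 0, 1),
  label    = c("Male", "Female", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Stand-in for the data returned by read_sas() without a catalog file.
dat <- data.frame(sex = c(1, 2, 2), smoker = c(0, 1, 0))

# Hypothetical helper: for every column that has entries in the lookup
# table, wrap it in haven::labelled() with the matching value labels.
apply_value_labels <- function(data, label_table) {
  for (var in intersect(names(data), unique(label_table$variable))) {
    rows <- label_table[label_table$variable == var, ]
    data[[var]] <- labelled(
      data[[var]],
      labels = setNames(rows$value, rows$label)
    )
  }
  data
}

dat <- apply_value_labels(dat, label_table)
```

After this, as_factor(dat$sex) converts the codes to their labels in the usual way. Note that labelled() requires the label values to have the same type as the column, so character columns would need character values in the lookup table.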

@gorcha
Member

gorcha commented Feb 20, 2023

Hi @joshuaborn, thanks for the extra example file - there have been a few recent updates in the dev version of ReadStat for catalog file reading that might resolve these issues, I'll check it out.

I suspect this is a little different from the initial problem in this issue (which was specifically a problem with Unix 64-bit file formats), but some other bugs have been fixed that might affect this one.

@gorcha
Member

gorcha commented Feb 20, 2023

Hi @joshuaborn, can confirm that the recent ReadStat changes have fixed the issue with this file. They've just released an update over there so these should be in haven shortly!

@joshuaborn

Hi, @gorcha. Thanks for confirming that! And my apologies for resurrecting the wrong issue thread.

gorcha added a commit that referenced this issue Feb 21, 2023
Maintains iconv hack from c1f9f19 and solaris hack from 4a878a1.

* Fix various SAS catalog file reading bugs (fix #529, fix #653, fix #680, fix #696, fix #705).
* Increase maximum SAS page file size to 16MB (fix #697).
* Ignore invalid SAV timestamp strings (fix #683).
* Fix compiler warnings (fix #707).
@gorcha
Member

gorcha commented Feb 21, 2023

No worries at all!
