Error: Failed to parse [...].sav: Invalid file, or file has unsupported features when using haven package to read .sav file #287

Closed
deschen1 opened this issue Dec 2, 2022 · 3 comments

Comments

deschen1 commented Dec 2, 2022

I've come across this problem several times now with different data sets that can't be read with the haven package. However, I'm posting it here because I assume the problem lies in the underlying parsing done by ReadStat.

Using the attached data set once it is unpacked (sorry for the file size!) and reading it in with haven in R:

library(haven)
df <- read_sav(file.choose())

I'm getting the following error:

Parsed 12 of 312 bytes. Remaining bytes: 	Q5_990=4000	Q10_90=4000	Q14_90=4000	Q15_90=4000	Q21_90=4000	Q23_90=4000	Q24_90=4000	Q28_90=4000	HQUAL0=2000	HQUAL8=2000	HQUALG=2000	HQUALO=2000	HQUALW=2000	HQUAL14=2000	HQUAL1C=2000	HQUAL1K=2000	HQUAL1S=2000	HQUAL20=2000	HQUAL28=2000	HQUAL2G=2000	HQUAL2O=2000	HQUAL2W=2000	HQUAL34=2000	HQUAL3C=2000	
Error: Failed to parse [...].sav: Invalid file, or file has unsupported features.

[731886-data-Rtest3.zip](https://github.com/WizardMac/ReadStat/files/10140850/731886-data-Rtest3.zip)

The problem is probably again with some weird or super-long character variables (maybe it's also connected to this multibyte problem?) that SPSS handles without issues, but haven (or other tools) do not.

BTW: When I open the file in SPSS, save it and then try to open it again with haven, it works just fine.

@evanmiller
Contributor

Hi, I've gotten a chance to investigate this issue. It looks like the file provided doesn't quite match the (unofficial) spec here:

https://www.gnu.org/software/pspp/pspp-dev/html_node/Very-Long-String-Record.html

According to that document, the key=value pairs in a Very Long String Record are delimited by a zero byte followed by a tab character. In this file, the zero byte is missing. Fortunately, this should be an easy fix.
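To illustrate the difference, here is a rough Python sketch of a tolerant parser for such a payload. It is not ReadStat's actual C code and not the fix itself: the function name and the sample byte strings are made up for illustration, and it assumes ASCII variable names. It splits on the tab character and strips any zero bytes around each pair, so it accepts both the documented `\0\t` delimiter and the bare `\t` seen in this file.

```python
def parse_very_long_string_record(payload: bytes) -> dict[str, int]:
    """Parse key=value pairs from a Very Long String Record payload.

    Hypothetical helper for illustration only. Accepts pairs delimited
    by a zero byte plus tab (as documented) or by a tab alone.
    """
    lengths: dict[str, int] = {}
    for chunk in payload.split(b"\t"):
        # Drop zero bytes left over from a "\0\t" delimiter or a trailing "\0".
        chunk = chunk.strip(b"\0")
        if not chunk:
            continue  # empty piece after a trailing delimiter
        key, sep, value = chunk.partition(b"=")
        if not sep:
            raise ValueError(f"missing '=' in pair: {chunk!r}")
        # Assumption: names and lengths are plain ASCII.
        lengths[key.decode("ascii")] = int(value.decode("ascii"))
    return lengths


# Delimited as the PSPP document describes (zero byte, then tab).
documented = b"Q5_990=04000\x00\tHQUAL0=02000\x00"
# Delimited as in the reported file (tab only, no zero byte).
reported = b"Q5_990=4000\tHQUAL0=2000\t"

print(parse_very_long_string_record(documented))
print(parse_very_long_string_record(reported))
```

Both sample payloads yield the same name-to-length mapping, which is the behavior a lenient reader would want here.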

@deschen1
Author

Great, thanks for adding the fix. So we probably need to wait until you officially release an update, and then I'll probably need to nudge the haven devs so that it finds its way into the R package.

@evanmiller
Contributor

@deschen1 Yes. You can also try patching locally, but it may be more trouble than it's worth. I'll aim to have an official ReadStat update out in about a month.
