Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incorrect ordering in StrL comparison functions #248

Merged
merged 1 commit into from
Jul 25, 2021

Conversation

gorcha
Copy link
Contributor

@gorcha gorcha commented Jul 25, 2021

Hi @evanmiller,

While investigating an unrelated issue with stata files crashing R in haven (tidyverse/haven#600) I noticed that readstat isn't implementing StrL ordering correctly.

In the dta spec it says:

  1. GSOs must appear in "ascending" order of (v,o). Ascending order is defined as the same order as they appeared in <data>...</data>: ascending v for o==1, followed by ascending v for o==2, ....

The comparison function used for searching the StrL array is currently assuming that it's ordered by v then o, i.e. ascending o for v == 1, followed by ascending o for v == 2 etc. As a result the bsearch thinks that a lot of StrL references don't exist so they're missing from the imported file.

The same assumption has been made for writing (as one would expect 🙂), so files written by readstat roundtrip successfully.

This PR fixes the comparison functions for reading and writing.

Test data

I was double checking results against another R library that has an independently implemented parser and noticed that haven/readstat was missing a bunch of string values in the imported file.
For reference, the file I was using for testing was linked by the issue creator, and can be found in this repo:
https://github.com/sjkiss/ces19/raw/main/2019%20Canadian%20Election%20Study%20-%20Online%20Survey%20v1.0.dta

@evanmiller
Copy link
Contributor

Nice catch, simple fix! Will merge.

@gorcha
Copy link
Contributor Author

gorcha commented Jul 31, 2021

No worries, always happy to help! 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants