Fix incorrect ordering in StrL comparison functions #248
Hi @evanmiller,
While investigating an unrelated issue with Stata files crashing R in haven (tidyverse/haven#600), I noticed that ReadStat doesn't implement StrL ordering correctly.
In the dta spec it says:
The comparison function used for searching the StrL array currently assumes that the array is ordered by v and then o, i.e. ascending o for v == 1, followed by ascending o for v == 2, and so on. As a result, the bsearch concludes that many StrL references don't exist, so their values are missing from the imported file.
The same assumption was made for writing (as one would expect 🙂), so files written by ReadStat still round-trip successfully.
This PR fixes the comparison functions for reading and writing.
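For illustration only, here is a minimal sketch of a comparison function that orders references by o first and then v, which is the opposite of the v-then-o assumption described above. The `strl_ref_t` struct, the field widths, and the function names are hypothetical and are not ReadStat's actual code; the sketch only shows the ordering that the `bsearch` lookup would need to agree with.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical StrL reference; not ReadStat's actual struct. */
typedef struct {
    uint32_t v; /* variable (column) number */
    uint64_t o; /* observation (row) number */
} strl_ref_t;

/* Order by o first, then by v. Explicit comparisons are used instead of
 * subtraction so that wide or unsigned values cannot overflow or wrap. */
static int strl_ref_compare(const void *a, const void *b) {
    const strl_ref_t *x = a;
    const strl_ref_t *y = b;
    if (x->o != y->o)
        return x->o < y->o ? -1 : 1;
    if (x->v != y->v)
        return x->v < y->v ? -1 : 1;
    return 0;
}

/* Returns the index of (v, o) in a table sorted with strl_ref_compare,
 * or -1 if the reference is not present. */
static long strl_ref_find(const strl_ref_t *table, size_t n,
                          uint32_t v, uint64_t o) {
    assert(table != NULL || n == 0);
    if (n == 0)
        return -1;
    strl_ref_t key = { v, o };
    const strl_ref_t *hit =
        bsearch(&key, table, n, sizeof *table, strl_ref_compare);
    return hit ? (long)(hit - table) : -1;
}
```

With a table laid out as (v=1, o=1), (v=2, o=1), (v=1, o=2), (v=2, o=2), looking up (v=2, o=1) lands on the second entry. A comparator that sorts by v first would steer the binary search in the wrong direction on the same data and could report that reference as missing.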
Test data
I was double-checking results against another R library that has an independently implemented parser, and noticed that haven/ReadStat was missing a number of string values in the imported file.
For reference, the file I was using for testing was linked by the issue creator, and can be found in this repo:
https://github.com/sjkiss/ces19/raw/main/2019%20Canadian%20Election%20Study%20-%20Online%20Survey%20v1.0.dta