Add unordered_map for string references #584
And version 13 or later. And document.
It wouldn't leak memory, but I guess you could end up with dangling pointers if you free the string data that the map's pointers point to without also removing those entries from the map.
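To make the ownership point concrete, here's a minimal sketch (not haven's actual code) of a string-ref map whose keys own their bytes. `StringRef` is a hypothetical stand-in for `readstat_string_ref_t`, and `intern` is a hypothetical helper. Because the keys are `std::string` copies, freeing the original string data can't leave a dangling key. The values are still non-owning pointers, so whatever owns the refs has to outlive the map.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for readstat_string_ref_t (the real struct is defined by ReadStat).
// Stata identifies each strL by a (variable, observation) pair.
struct StringRef {
  std::int64_t v;  // variable number
  std::int64_t o;  // observation number
};

// Keys are owning std::string copies, so freeing the caller's string data
// cannot leave a dangling key. Values are non-owning pointers: whoever owns
// the StringRef objects must keep them alive for as long as the map is used.
using StringRefMap = std::unordered_map<std::string, StringRef*>;

// Returns the ref already recorded for `text`, or records `candidate` for it.
StringRef* intern(StringRefMap& refs, const std::string& text, StringRef* candidate) {
  auto inserted = refs.emplace(text, candidate);
  return inserted.first->second;  // the existing value if the key was already present
}
```

If the key type were `std::string_view` (or a raw `const char*`) instead, freeing the source buffer would leave the map holding a view into freed memory, which is exactly the dangling-pointer case.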
@gorcha do you feel comfortable reviewing/finishing this off?
Yeah for sure, shall have a look!
Hey @hadley, the writing code on the haven side looks fine, but it looks like there are a couple of bugs in ReadStat.
I'll follow up with Evan over at ReadStat.
After poking around a bit more I've found a workaround for the map issue: the string refs need to be added before we start writing data so that the offsets in the map are written out correctly. I've opened a PR for the off-by-one error (WizardMac/ReadStat#270). Without that fix we can still round-trip strLs successfully, but Stata will likely have trouble reading the output files because the indexing is off.
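The ordering constraint can be sketched as a two-pass write. This is a hypothetical model, not ReadStat's API: `collect_refs` registers a ref for every distinct long string before any row is emitted, and `lookup_ref` is all the row-writing pass is allowed to do. The 1-based (variable, observation) numbering is an assumption about Stata's convention. Converting 0-based loop indices into those numbers is the kind of place an off-by-one error can easily creep in.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model of the ordering constraint; not ReadStat's API.
struct StrlRef {
  std::int64_t v;  // 1-based variable number (assumed Stata convention)
  std::int64_t o;  // 1-based observation number (assumed Stata convention)
};

using Column = std::vector<std::string>;
using RefMap = std::unordered_map<std::string, StrlRef>;

// Pass 1: register a ref for every distinct long string before any data is written,
// so the refs (and anything derived from them, such as offsets) are final up front.
RefMap collect_refs(const std::vector<Column>& columns, std::size_t threshold) {
  RefMap refs;
  for (std::size_t j = 0; j < columns.size(); ++j) {
    for (std::size_t i = 0; i < columns[j].size(); ++i) {
      const std::string& s = columns[j][i];
      if (s.size() > threshold) {
        // Convert 0-based loop indices to 1-based (v, o).
        // emplace keeps the first occurrence seen in this traversal order.
        refs.emplace(s, StrlRef{static_cast<std::int64_t>(j + 1),
                                static_cast<std::int64_t>(i + 1)});
      }
    }
  }
  return refs;
}

// Pass 2: while writing rows, long strings are only looked up, never created.
const StrlRef* lookup_ref(const RefMap& refs, const std::string& s) {
  auto it = refs.find(s);
  return it == refs.end() ? nullptr : &it->second;
}
```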
@gorcha thanks for looking into this!
No worries! I've just added the fix for the off-by-one issue as well. The PR has been merged into ReadStat, and since it's a small change that only affects string ref writing, I figured there's no need to wait for it to come through in a ReadStat release.
Sounds good. Do you want to merge it?
One final thought: should we make the strL threshold user-controllable rather than hard-coding it to 500?
It'd be great if you wanted to do that!
Yep, no worries! Shall do in the next couple of days.
Done. I've set it to 2045 by default, since that mimics Stata's default behaviour.
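A minimal sketch of the threshold decision follows. It assumes a character column becomes a strL when its longest value is strictly longer than the threshold (in bytes), and only when writing Stata 13 or later files, since older dta formats have no strL type. `needs_strl` and its parameters are hypothetical names; on the R side the setting surfaces as the `strl_threshold` argument of `write_dta()`.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Stata's widest fixed-width string type (13+) is str2045, hence the default.
constexpr std::size_t kDefaultStrlThreshold = 2045;

// Hypothetical helper: decide whether a character column should be written as strL.
// Assumes a column becomes strL when its longest value exceeds the threshold
// (in bytes) and the target file format is Stata 13 or later.
bool needs_strl(const std::vector<std::string>& column, int stata_version,
                std::size_t threshold = kDefaultStrlThreshold) {
  if (stata_version < 13) {
    return false;  // strLs don't exist in pre-13 dta formats
  }
  std::size_t max_len = 0;
  for (const std::string& s : column) {
    max_len = std::max(max_len, s.size());  // std::string::size() counts bytes
  }
  return max_len > threshold;
}
```

Lowering the threshold (for example to the previously hard-coded 500) pushes more columns into strLs.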
Do you think it's worth doing a release now? Or are there other changes you want to wait on?
I think so. The other big change in progress is fixing the encoding issue in #615, but the PR over at ReadStat isn't quite ready, so I feel like it'll still be a while (we still need to actually implement the new ReadStat functionality in haven). There are a couple of minor things left to do (adding a warning about conflicting labels for #667 and adding a cleanup script for #668), but I can sort them out today.
No rush, just ping me when I should start the release process.
@jimhester could you please take a look at this? I can't remember if this is ok or if it will leak memory.
`readstat_string_ref_t` is: