Inconsistent formatting information in SPSS metadata? #77
Comments
Hi, your issue looks very much like WizardMac/ReadStat#210. Try updating to pyreadstat 1.0.2 and see if that fixes the issue.
I agree it does look similar, but I'm afraid this is happening on 1.0.2.
Okay, it looks like each variable has several "widths" that need to be distinguished.
Then there is the [...]
It looks like there is a bug in ReadStat similar to #210 that affects [...]
While ReadStat is successfully reading the special record that indicates the length of a long string, it's only using that information to determine the storage width. It should be using that information to override the format width as well.
All of this is to say, I think I know what's going on, and a fix should find its way through the pipes before long. Thanks for the detailed report!
Great to hear, and thanks so much for the swift attention! :)
The issue will be solved in the next version, as it has already been fixed in ReadStat.
Solved in pyreadstat version 1.0.6.
Hi! I've been using this package for a good while now, and love it immensely - it is the centerpiece of several advanced applications that I have written for organizing and modifying SPSS files, and it's made a real difference to my organization and clients. I can't thank you enough for providing it.
I detected this issue a while back but have so far just been working around it. I'm not sure how to classify it, and I'm hoping to get some information about how the metadata is gathered.
Describe the issue
The basic problem is that there is a difference between these three things:
Here's what we see in variable view of SPSS:
Here's the original_variable_types:
{'ResponseId': 'A18', 'StartDate': 'A255', 'Duration__in_seconds_': 'F40.2', 'Finished': 'F1.0'}
...and here's the variable_storage_width:
{'ResponseId': 24, 'StartDate': 1024, 'Duration__in_seconds_': 8, 'Finished': 8}
Look at the two text variables: ResponseId reads A18 'correctly', but StartDate shows A255 when it should show A1024. If variable_storage_width were always the reliable source, I could use it to overwrite the format, but for ResponseId that would give A24, which would be incorrect. Note that the numeric variables do report the correct formats; I left those in for visibility and comparison.
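One way to spot this kind of mismatch programmatically is sketched below. It is only a heuristic, not a description of how pyreadstat or ReadStat work internally: it assumes that string storage is the format width rounded up to a multiple of 8 bytes, which matches ResponseId (A18, stored as 24) but is not confirmed anywhere in this thread. The helper names `parse_format` and `suspicious_string_formats` are hypothetical.

```python
import re

# Values quoted in the report above.
original_variable_types = {
    "ResponseId": "A18",
    "StartDate": "A255",
    "Duration__in_seconds_": "F40.2",
    "Finished": "F1.0",
}
variable_storage_width = {
    "ResponseId": 24,
    "StartDate": 1024,
    "Duration__in_seconds_": 8,
    "Finished": 8,
}


def parse_format(fmt):
    """Split an SPSS format such as 'A255' or 'F40.2' into (type, width, decimals)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)(?:\.(\d+))?", fmt)
    if match is None:
        raise ValueError(f"unrecognised format: {fmt!r}")
    kind, width, decimals = match.groups()
    return kind.upper(), int(width), int(decimals or 0)


def suspicious_string_formats(formats, storage_widths):
    """Return string variables whose format width does not explain the storage width.

    Assumption (not confirmed by the library): storage for a string variable
    is its format width rounded up to the next multiple of 8 bytes.
    """
    flagged = {}
    for name, fmt in formats.items():
        kind, width, _ = parse_format(fmt)
        if kind != "A":
            continue  # numeric formats are reported correctly in this report
        expected_storage = -(-width // 8) * 8  # ceiling to a multiple of 8
        if storage_widths[name] != expected_storage:
            flagged[name] = (fmt, storage_widths[name])
    return flagged


print(suspicious_string_formats(original_variable_types, variable_storage_width))
# prints {'StartDate': ('A255', 1024)}
```

With the values above, only StartDate is flagged: A18 rounds up to 24 and matches its storage width, while A255 rounds up to 256 and does not match 1024.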
So I guess the question is, how does original_variable_types gather its data, and is there a way that I can predict which one of these items is the one that SPSS will expect, so that I can reliably hold the 'real' format? Or is this a bug, and the A255 is showing because it's hitting some kind of small-string limit? Thinking about it as I'm writing all of this out, I suppose 255 is a very suspicious number for that to insert...
To Reproduce
This isn't really a code issue, but here's the simple code I ran to produce those, nothing out of the ordinary:
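A minimal sketch of how these two dictionaries can be read with pyreadstat follows. The file name `test_width.sav` is an assumption about the contents of the attached archive, and the helpers `width_metadata` and `report` are hypothetical names used for illustration only.

```python
def width_metadata(meta):
    """Pair each variable's reported format with its storage width.

    `meta` is expected to be the metadata object returned by
    pyreadstat.read_sav, which exposes `original_variable_types`
    and `variable_storage_width` as dictionaries keyed by variable name.
    """
    formats = meta.original_variable_types
    storage = meta.variable_storage_width
    return {name: (formats[name], storage[name]) for name in formats}


def report(path="test_width.sav"):
    """Read only the metadata of an SPSS file and print format/storage pairs.

    The default path is an assumed file name, not taken from the report.
    """
    import pyreadstat  # third-party; imported here so the helper above stays dependency-free

    _, meta = pyreadstat.read_sav(path, metadataonly=True)
    for name, (fmt, width) in width_metadata(meta).items():
        print(f"{name}: format={fmt}, storage_width={width}")
```

Calling `report()` on the affected file should print one line per variable, which makes the A255 versus 1024 discrepancy easy to see side by side.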
File example
test_width.zip
Expected behavior
I guess what I'm really after is how do I reliably recreate the 'actual' format as shown in the variable view, so that I can write syntax against it that refers to the correct size.
Setup Information:
How did you install pyreadstat? (pip)
Platform (windows, 64 bit)
Python Version (3.7)
Python Distribution (plain python)
Using Virtualenv or condaenv? (No)