Slightly better Typechecking when exporting to SQL #174
Conversation
@s-kuberski should take a closer look at this; just a quick comment: maybe
Sure, that could be good. I was not sure how to do that in SQL and JSON at the same time, but since the function is originally for JSON, maybe that is the better idea.
I'll have a look. It is certainly possible to use either
Hi, I think I handled this in the way you meant; if not, could you point me to the lines/files that do not work as you expect?
pyerrors/input/json.py
```diff
@@ -479,8 +481,10 @@ def import_json_string(json_string, verbose=True, full_output=False):
     result : dict
         if full_output=True
     """
     return _parse_json_dict(json.loads(json_string), verbose, full_output)
     if json_string != "NONE":
```
In the current version, this always evaluates to True, right?
Yes, that is true. Do you think we should return None, or leave it as NaN, as it currently is?
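For reference, a minimal sketch of what returning None for the placeholder could look like. `import_entry` is a hypothetical wrapper, not part of pyerrors; the `"NONE"` token is the one appearing in the diff above:

```python
import json

def import_entry(json_string):
    # Hypothetical wrapper around the real import routine: map the
    # "NONE" placeholder back to None instead of leaving it as NaN.
    if json_string == "NONE":
        return None
    return json.loads(json_string)
```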
Actually, I don't really understand why changes in `json.py` are necessary at all, as the routines work if the passed parameters have the correct form (a list of `Obs`, `list`, `numpy.ndarray`, or `Corr` for `create_json_string`, and a JSON string with data for `import_json_string`). In my opinion it would be better to call the functions only when the parameters have the correct format, and to handle the other cases where they appear (in this case: when applying the functions to the data frames) by simply not calling the functions. This would circumvent putting these hacks into the `json` functions.
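The pattern suggested here can be sketched as follows. `serialize` stands in for the real serializer (e.g. `create_json_string`) and is an assumption for illustration, not the pyerrors API:

```python
import pandas as pd

def serialize(obj):
    # Stand-in for the real serializer; any callable that only
    # accepts well-formed input would do here.
    return "json:" + repr(obj)

df = pd.DataFrame({"obs": [1.5, None, 2.5]}, dtype=object)

# Call the serializer only for entries that actually hold data and
# leave missing cells alone, instead of special-casing None inside
# the json routines themselves.
out = df["obs"].apply(lambda x: None if pd.isna(x) else serialize(x))
```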
tests/pandas_test.py
```python
assert np.all(reconstructed_df.loc[1]) == np.all(my_df.loc[1])
assert np.all(reconstructed_df.loc[3:]) == np.all(my_df.loc[3:])
```
Wouldn't we rather like to check that each of the entries matches between `my_df` and `reconstructed_df`?
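The concern is valid: `np.all(a) == np.all(b)` collapses each side to a single boolean before comparing, so mismatched entries slip through. A minimal illustration:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([3.0, 2.0, 1.0])  # same truthiness, different entries

# Each array is collapsed to one bool first (True == True), so this
# "passes" even though the arrays differ:
assert np.all(a) == np.all(b)

# Comparing entry-wise catches the mismatch:
assert not np.all(a == b)
```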
Hi,
the question is a bit what kind of entries you want to allow and how to handle these cases. When I adapt your code so that I can export entries that contain
Thank you for testing this. You are bringing up a good question that I did not think about.
I thought about this a little more; the alternative to only keeping one would again be to use some kind of token for the other case, so e.g. either
Hi.
So, I think everything now works as it should. There is only one inconsistency remaining that I can think of at the moment:
The test fails because you confused input with output, I guess. The way it should work is that
Hi, I'm not sure if I understand what you mean. Concerning your answer: you are right, but I think your way only works if the cell is not zipped. Otherwise there is an error in pandas.py, line 151, as None cannot be encoded in utf-8.
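A hedged sketch of the zipping issue: None has no utf-8 encoding, so a placeholder string (the PR's `"NONE"` token) has to be substituted before compressing. `compress_entry`/`decompress_entry` are illustrative names, not the pyerrors API:

```python
import gzip

def compress_entry(entry):
    # None cannot be utf-8 encoded, so substitute the placeholder
    # token before compressing (mirrors the "NONE" token in the PR).
    text = "NONE" if entry is None else entry
    return gzip.compress(text.encode("utf-8"))

def decompress_entry(blob):
    # Reverse the mapping on import: the token becomes None again.
    text = gzip.decompress(blob).decode("utf-8")
    return None if text == "NONE" else text
```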
Thanks for fixing the remaining issues! The failure of the Windows test seems to be related to a third-party library.
Hi @fjosw, thanks for having another look at this. There was in fact an oversight where, in some cases, the zipping was not performed. This should be fixed now. The reason this didn't show up in the tests was because
Is this ready to be merged or are you still testing the changes?
Hi, I found one last bug; I think this should now be safe to merge into develop. I'll pull the develop branch into my analysis workflow and play around with it there. For now I cannot think of any test that would further improve the quality of the code.
Thanks!
Hi,
I had a problem where one of my pandas DataFrames was not exported correctly: the datatype of a table column was determined from the first entry in the column, which in my case was None. I think @PiaLJP had a similar issue a few weeks ago.
With this change, a cell in the first row can at least be empty and the DataFrame will still be exported.
Maybe we can find an even better way to do this?
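One possible fix for the type-detection issue described above, sketched as a helper. `column_type` is a hypothetical name for illustration, not the pyerrors implementation:

```python
def column_type(column):
    # Infer the entry type from the first non-None entry instead of
    # blindly using column[0], so a missing cell in the first row
    # does not mask the real datatype.
    for entry in column:
        if entry is not None:
            return type(entry)
    return type(None)  # entirely empty column
```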
Specifically, I don't like my changes to input/json.py yet, but I am not sure how to improve them.
I'd be glad if someone could have a look at this.
I have also NOT tested the implications for the JSON export yet, as the tests so far do not provide the utilities for this.