-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: Deprecate compact_ints and use_unsigned in read_csv #13323
API: Deprecate compact_ints and use_unsigned in read_csv #13323
Conversation
4924f90
to
0d56c09
Compare
FYI: the |
Current coverage is 84.22%@@ master #13323 diff @@
==========================================
Files 138 138
Lines 50713 50671 -42
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
- Hits 42715 42676 -39
+ Misses 7998 7995 -3
Partials 0 0
|
This is fine to document these. In reality the parser should not be doing this, and these options just add to an already huge API. Better to have this an an option to I get that this makes things more 'automatic'. So I think that we should deprecate these for now. |
@jreback : No issue with simplifying the API, though it did seem like it was a somewhat decent memory-conscientious option. If it is deprecated, telling them to refer to |
@gfyoung you could simply add them as options to |
How would they be toggled from the parser API exactly? |
@gfyoung they wouldn't that why we deprecate :> though you could very easily. Move these to a post-processing step and simply call something like
and simply move the code This is indepedent of deprecation or not though. (e.g. you can fix this and can separately deprecate). |
Ah, I see. Will do that then. |
0d56c09
to
dff6a69
Compare
Actually, it's not as straightforward as it seems. |
dff6a69
to
a42eb7c
Compare
you can just do this is maybe_convert_numeric or equivalent |
a753b45
to
c0ff67e
Compare
@jreback : Refactored |
I don't think this works if the arr has any na_values in it |
@jreback : I literally moved the function from one location to another. If that's an issue, then it would have appeared in the parser as well. Will add a test though for that. |
@gfyoung no, if its by efinition int64 coming in, then |
@jreback : |
1d19e36
to
a1263d4
Compare
@gfyoung you are missing my point. int arrays don't support missing values (currently), so if you actually have a missing value then this must be converted to float which is impossible in the current impl. so they ARE superfluous. |
@jreback : That's not the point of this function at all. If it has missing values (i.e. |
@gfyoung then prove it by showing an example |
@jreback : I think my test cases speak for themselves. |
it's kind of like a fancy astype |
7467dda
to
48ab4db
Compare
Hmm...not sure why my test for the deprecation is failing on just one machine. What am I doing wrong here? It couldn't be this difficult to add, should it? |
@@ -76,6 +76,7 @@ Other enhancements | |||
|
|||
- The ``pd.read_csv()`` with ``engine='python'`` has gained support for the ``decimal`` option (:issue:`12933`) | |||
- The ``pd.read_csv()`` with ``engine='python'`` has gained support for the ``na_filter`` option (:issue:`13321`) | |||
- The ``pd.read_csv()`` with ``engine='python'`` has gained support for the ``compact_ints`` and ``use_unsigned`` options (:issue:`13323`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit strange to announce this as a new feature, while it has been deprecated at the same time ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a fair point, but I did add support though. The exact direction of this PR is still somewhat in flux, and I'll adjust the whatsnew
once things have settled down a bit. More importantly, I would like to figure out why Travis is unhappy with me at the moment, as the tests pass on my machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, certainly fine to change things here in the end (and not sure why travis is failing, I also do not see something obvious)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, I agree here. Since we are deprecating this, I think that no need to announce it for python engine support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. Will change as soon as I placate the Travis.
@jreback : any thoughts on why Travis is unhappy on one machine? Seems like me and @jorisvandenbossche are stumped at the moment |
Regarding the deprecation, I would be fine with that (I don't think many people already have heard of these keywords (in any case, I don't), although it would be nice if we could check this somewhere), but, it would be nice if there is then an alternative available (like the pd.to_numeric idea). I am only wondering if it is needed to add the documentation to the docstring when we are deprecating it at the same time (so we don't actually want to encourage people to use it, and the docstring is already really long) |
@jorisvandenbossche : yeah, me neither. I hadn't really heard of it until I started looking into what it did in the CParser. Surprising just how many options are undocumented for a function that works quite well for most everyday purposes such as my own. 😄 The |
}) | ||
|
||
# default behaviour for 'use_unsigned' | ||
out = self.read_csv(StringIO(data), compact_ints=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
every one of these needs a tm.assert_produces_warning
block. The warning on travis is because of below when you DO try to catch the warning it has already been triggered (and hence doesn't show). These are somewhat hard to isolated. Usually just keep running smaller and smaller blocks of code to figure out where it is triggering the uncaught warning, then inspect the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! I did not know that. Fixed.
I am ok with doc-stringing this (always helpful) and deprecating is fine with me. Whenever we add this to |
48ab4db
to
c51c5b6
Compare
c51c5b6
to
95f7ba8
Compare
@jreback @jorisvandenbossche : Travis is happy now, and I made the requested doc changes. Ready to merge if there are no other concerns. |
It still feels a bit strange to add (to the python parser) and doc a feature that we are deprecating directly :-), but OK to merge for me |
@jorisvandenbossche though this will prob be deprecated for a while... |
@gfyoung ty, pls also create an issue to add to |
Deprecated in v0.19.0 xref pandas-devgh-13323
Deprecated in v0.19.0 xref pandas-devgh-13323
* CLN: Drop compact_ints/use_unsigned from read_csv Deprecated in v0.19.0 xref gh-13323 * CLN: Remove downcast_int64 from inference.pyx It was only being used for the compact_ints and use_unsigned parameters in read_csv.
Title is self-explanatory.
xref #12686 - I don't quite understand why these are marked (if at all) as internal to the C engine only, as the benefits for having these options accepted for the Python engine is quite clear based on the documentation I added as well.
Implementation simply just calls the already-written function in
pandas/parsers.pyx
- as it isn't specific to theTextReader
class, crossing over to grab this function from Cython (instead of duplicating in pure Python) seems reasonable while maintaining that separation between the C and Python engines.