Deprecate read_feather nthreads argument + update feather-format to pyarrow.feather #23112
Hello @ingwinlu! Thanks for updating the PR.
Comment last updated on October 15, 2018 at 05:42 Hours UTC |
this would need a whatsnew note |
Codecov Report
@@ Coverage Diff @@
## master #23112 +/- ##
==========================================
+ Coverage 92.21% 92.22% +<.01%
==========================================
Files 161 161
Lines 51187 51191 +4
==========================================
+ Hits 47202 47210 +8
+ Misses 3985 3981 -4
Continue to review full report at Codecov.
|
I now test that the warning is issued. I also rebased, followed the commit message guidelines, and added a whatsnew entry. |
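For readers following along, checking that a deprecation warning is issued can be sketched with the standard library alone. This is an illustration, not the test added in this PR: `read_feather_stub` and `collect_warnings` are hypothetical names, and the pandas test suite would normally use its own warning-assertion helper instead.

```python
import warnings


def read_feather_stub(path, nthreads=None, use_threads=True):
    """Hypothetical stand-in for a reader with a deprecated keyword."""
    if nthreads is not None:
        warnings.warn(
            "the 'nthreads' keyword is deprecated, use 'use_threads' instead",
            FutureWarning,
            stacklevel=2,
        )
        use_threads = bool(nthreads)
    return path, use_threads


def collect_warnings(func, *args, **kwargs):
    """Call func and return (result, list of warning categories raised)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        result = func(*args, **kwargs)
    return result, [w.category for w in caught]
```

A test then asserts both directions: the old keyword produces exactly one `FutureWarning`, and the new keyword produces none.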
Shouldn't it be replaced with |
this would require pinning pyarrow > 0.10.0 as a dependency |
doc/source/whatsnew/v0.23.5.txt
Outdated
@@ -33,6 +33,8 @@ Fixed Regressions | |||
Development | |||
~~~~~~~~~~~ | |||
- The minimum required pytest version has been increased to 3.6 (:issue:`22319`) | |||
- Deprecated the `nthreads` keyword of `pandas.read_feather()` in favor of |
use the :func: reference here
pandas/io/feather_format.py
Outdated
@@ -96,6 +101,11 @@ def read_feather(path, nthreads=1): | |||
Number of CPU threads to use when reading to pandas.DataFrame | |||
|
|||
.. versionadded 0.21.0 | |||
.. deprecated 0.23.5 |
0.24.0
@jreback can you have another look? |
pandas/io/feather_format.py
Outdated
path = _stringify_path(path) | ||
|
||
if LooseVersion(feather.__version__) < LooseVersion('0.4.0'): | ||
if LooseVersion(feather.__version__) < LooseVersion('0.4.0') or \ |
Style nit: start the line with parentheses instead of using a backslash to continue the line.
Do you have a suggestion for the alternative? It does not line up as nicely (since it would require an additional indent of the second line), which in my opinion makes it harder to read.
if (LooseVersion(feather.__version__) < LooseVersion('0.4.0') or
        LooseVersion(pyarrow.__version__) < LooseVersion('0.11.0')):
    return feather.read_dataframe(path)
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -979,3 +979,5 @@ Other | |||
- :meth:`~pandas.io.formats.style.Styler.bar` now also supports tablewise application (in addition to rowwise and columnwise) with ``axis=None`` and setting clipping range with ``vmin`` and ``vmax`` (:issue:`21548` and :issue:`21526`). ``NaN`` values are also handled properly. | |||
- Logical operations ``&, |, ^`` between :class:`Series` and :class:`Index` will no longer raise ``ValueError`` (:issue:`22092`) | |||
- Bug in :meth:`DataFrame.combine_first` in which column types were unexpectedly converted to float (:issue:`20699`) | |||
- Deprecated the `nthreads` keyword of :func:`pandas.read_feather()` in favor of |
Don't need the parentheses in read_feather()
does this emit any warnings in the tests?
I would actually be ok with removing support for feather < 0.4.0 if it makes it easier here. you may actually need to do this as importing an older feather doesn't have pyarrow as a dep so the importer might fail.
None, but CI does not test with feather 0.3.1 AFAIK. @jreback can we directly depend on pyarrow without getting there via feather? That would make it simpler. I am not sure that going over feather + pyarrow combinations and checking how to call each of them is the way to go.
feather-format 0.3.1: no pyarrow, does not support any args.
So the current implementation with import pyarrow should not be merged, as it would not play nicely if someone is still on feather-format 0.3.1.
//edit: maybe ask feather to release a new version that pins to a higher pyarrow version... |
@ingwinlu actually I will revise the thought above. I am ok with dropping support for feather entirely and just using pyarrow here. Can you revise? |
Sure. Will probably work on it later today. |
@jreback do you want to depend on pyarrow 0.4.0 or go for 0.11.0 directly? |
If you can, use our min version of pyarrow (the one we use for parquet). We could also bump that slightly, but not past 0.8.0 |
@jreback done. don't think the windows test fail is related to the changes in the PR. |
This is still missing the removal of feather-format refs in ci configs as well as some doc entry in source/io I missed. |
If dropping feather, will also close #21639 |
I did not rewrite the io.feather section of the docs. I feel like the currently linked repository (feather) provides more information. If you feel like it is necessary we can add a point to the caveats listed where we express that we directly require the upstream pyarrow library. |
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -1004,3 +1004,7 @@ Other | |||
- :meth:`~pandas.io.formats.style.Styler.bar` now also supports tablewise application (in addition to rowwise and columnwise) with ``axis=None`` and setting clipping range with ``vmin`` and ``vmax`` (:issue:`21548` and :issue:`21526`). ``NaN`` values are also handled properly. | |||
- Logical operations ``&, |, ^`` between :class:`Series` and :class:`Index` will no longer raise ``ValueError`` (:issue:`22092`) | |||
- Bug in :meth:`DataFrame.combine_first` in which column types were unexpectedly converted to float (:issue:`20699`) | |||
- Deprecated the `nthreads` keyword of :func:`pandas.read_feather` in favor of | |||
`use_threads` to reflect the changes in pyarrow 0.11.0. (:issue:`23053`) | |||
- Drop `feather-format` as a dependency for feather based storage and use |
move this to api-breaking changes section
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -233,6 +233,10 @@ If installed, we now require: | |||
| scipy | 0.18.1 | | | |||
+-----------------+-----------------+----------+ | |||
|
|||
|
|||
Additionally we no longer depend on `feather-format` for feather based storage | |||
and replaced it with references to `pyarrow`. |
add the issue reference here
going to merge after passing: #23281, which you will need to revert the xfails. |
can you rebase this and check changes in #23281 |
Rebased and reactivated part of the disabled tests from #23281. Did not check if the rest could also be resolved by the changes in this PR (parquet + gbq). |
Looks good. Let's just add a pyarrow version from before the change (0.10.0) to the test matrix; ping on green.
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -235,6 +235,11 @@ If installed, we now require: | |||
+-----------------+-----------------+----------+ | |||
| scipy | 0.18.1 | | | |||
+-----------------+-----------------+----------+ | |||
| pyarrow | 0.4.1 | | |
This is currently already the minimal supported version no?
yes, but not if you don't have feather
@ingwinlu actually maybe take out pyarrow from the table, your comment below is fine
pandas/io/feather_format.py
Outdated
if LooseVersion(feather.__version__) < LooseVersion('0.4.0'): | ||
return feather.read_dataframe(path) | ||
if LooseVersion(pyarrow.__version__) < LooseVersion('0.11.0'): | ||
return feather.read_feather(path) |
we should still pass nthreads here?
@ingwinlu can you check this comment?
nthreads is not available anymore after the conversion in the wrapper function.
Then we need to do it in some other way, but not passing it here means breaking the functionality for people having pyarrow < 0.11 (which we still support)
@ingwinlu what you could do is remove the mapping part of the deprecation; then you can easily just pass the argument here as read_feather(path, nthreads=int(use_threads)). I think that would work. Then on the other branch, just pass read_feather(path, use_threads=bool(use_threads))
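The suggestion above amounts to choosing the keyword by pyarrow version. A minimal sketch of that decision, kept free of any pyarrow import so it runs anywhere: `parse_version` and `feather_read_kwargs` are hypothetical helpers, not part of pandas or pyarrow, and the 0.11.0 cut-off is taken from this thread.

```python
def parse_version(text):
    """Turn '0.10.0' into (0, 10, 0); assumes purely numeric components."""
    return tuple(int(part) for part in text.split("."))


def feather_read_kwargs(pyarrow_version, use_threads=True):
    """Pick the keyword that the installed pyarrow is assumed to accept."""
    if parse_version(pyarrow_version) < (0, 11, 0):
        # Older releases took an integer thread count.
        return {"nthreads": int(use_threads)}
    # 0.11.0 and later take a boolean flag instead.
    return {"use_threads": bool(use_threads)}


# Plain string comparison orders these two versions the wrong way round,
# which is why the versions are parsed before comparing.
assert not ("0.9.0" < "0.11.0")
assert parse_version("0.9.0") < parse_version("0.11.0")
```

The returned dict would be splatted into the real reader call on each branch. Note that `int(False)` is 0 on the old branch, so a real implementation may want to decide explicitly what a zero thread count should mean.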
can you rebase and fixup? |
I can not replicate the
issues locally. Any pointers? |
Maybe a different version of pyarrow? https://travis-ci.org/pandas-dev/pandas/jobs/446626253#L2219 |
I replicated the conda environment of that test run. parquet tests run correctly but feather tests are not run at all due to an import error on pyarrow:
I was not using boost from conda-forge and hence had a mismatch in ABIs. The other issue with compat seems to be related to apache/arrow#2634, which I will test now. |
@jreback should be good now. |
nthreads=0 makes some fun results >.> |
@ingwinlu thanks for the responsiveness. @jorisvandenbossche over to you. |
can you rebase once more. |
done |
The nthreads argument is no longer supported since pyarrow 0.11.0 and was replaced with use_threads. Hence we deprecate the argument now as well so we can remove it in the future.
This commit also:
- removes feather-format as a dependency and replaces it with direct usage of pyarrow.
- sets CI dependencies to respect the changes above.
We test backwards compatibility with pyarrow 0.9.0, as conda does not provide a pyarrow 0.10.0 and the conda-forge version has compatibility issues with the rest of the installed packages.
Resolves #23053. Resolves #21639.
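As an illustration of the deprecation described in the commit message, a keyword rename can be handled by a small decorator. pandas has an internal `deprecate_kwarg` decorator for this purpose; the sketch below is not that decorator, and `rename_kwarg` and `read_feather_sketch` are hypothetical names used only to show the mechanism.

```python
import functools
import warnings


def rename_kwarg(old, new, convert=None):
    """Hypothetical helper: accept `old` as a deprecated alias for `new`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old in kwargs:
                if new in kwargs:
                    raise TypeError(f"pass only one of {old!r} and {new!r}")
                value = kwargs.pop(old)
                warnings.warn(
                    f"the {old!r} keyword is deprecated, use {new!r} instead",
                    FutureWarning,
                    stacklevel=2,
                )
                # Optionally translate the old value into the new type.
                kwargs[new] = convert(value) if convert else value
            return func(*args, **kwargs)
        return wrapper
    return decorator


@rename_kwarg("nthreads", "use_threads", convert=bool)
def read_feather_sketch(path, use_threads=True):
    """Stand-in reader that only reports what it would do."""
    return {"path": path, "use_threads": use_threads}
```

Callers that still pass `nthreads=4` keep working and see a `FutureWarning`; callers that pass both keywords get a `TypeError` rather than a silent override.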
|
||
return feather.read_dataframe(path, nthreads=nthreads) | ||
return feather.read_feather(path, use_threads=bool(use_threads)) |
I don't think this is fully correct. If someone previously passed nthreads=1 (which meant: no additional threads), this will be translated into use_threads=True.
Although, maybe that is not really a problem since the default in pyarrow also actually changed from nthreads=1 to use_threads=True
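To make the concern concrete, here is a small comparison of the conversion under discussion with a stricter alternative. Both functions are hypothetical names for illustration; the PR itself uses the plain `bool(...)` conversion.

```python
def naive_mapping(nthreads):
    """The conversion under review: any non-zero count enables threading."""
    return bool(nthreads)


def strict_mapping(nthreads):
    """Hypothetical alternative: enable threading only for more than one thread."""
    return nthreads > 1


# Print the two mappings side by side for a few thread counts.
for n in (0, 1, 4):
    print(f"nthreads={n}: naive={naive_mapping(n)}, strict={strict_mapping(n)}")
```

The two mappings differ only at `nthreads=1`, which is exactly the case raised above: the naive conversion turns a single-threaded request into `use_threads=True`.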
@ingwinlu Thanks a lot! |
git diff upstream/master -u -- "*.py" | flake8 --diff