Deprecate read_feather nthreads argument + update feather-format to pyarrow.feather #23112

ingwinlu · 2018-10-12T15:56:18Z

- closes #23053 (pandas/io/feather_format.py should call use_threads instead of nthreads to prevent breakage in pyarrow 0.11.0), closes #21639 (DEPS/DEPR: Allow import of feather through pyarrow)
- tests added / passed
- passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
- whatsnew entry

pep8speaks · 2018-10-12T15:56:20Z

Hello @ingwinlu! Thanks for updating the PR.

There are no PEP8 issues in the file pandas/io/feather_format.py !
There are no PEP8 issues in the file pandas/tests/io/test_feather.py !

Comment last updated on October 15, 2018 at 05:42 Hours UTC

jreback · 2018-10-12T20:00:11Z

this would need a whatsnew note
and i believe this is tested so pls update the tests and catch the warning

codecov · 2018-10-13T04:08:22Z

Codecov Report

Merging #23112 into master will increase coverage by <.01%.
The diff coverage is 83.33%.

@@            Coverage Diff             @@
##           master   #23112      +/-   ##
==========================================
+ Coverage   92.21%   92.22%   +<.01%     
==========================================
  Files         161      161              
  Lines       51187    51191       +4     
==========================================
+ Hits        47202    47210       +8     
+ Misses       3985     3981       -4

Flag	Coverage Δ
#multiple	`90.6% <16.66%> (-0.05%)`	⬇️
#single	`42.26% <83.33%> (+0.04%)`	⬆️

Impacted Files	Coverage Δ
pandas/io/feather_format.py	`89.74% <83.33%> (+12.6%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9019582...e7d576d.

ingwinlu · 2018-10-13T04:21:26Z

I test that the warning is issued now. Also rebased and followed the commit message guidelines. Added a whatsnew entry as well.
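The check described here, that passing the deprecated keyword triggers a warning, can be sketched with the standard library alone. `read_feather_stub` is a hypothetical stand-in rather than the pandas function, and the choice of `FutureWarning` is an assumption made for illustration.

```python
import warnings


def read_feather_stub(path, nthreads=None, use_threads=True):
    """Hypothetical stand-in for a reader with a deprecated keyword."""
    if nthreads is not None:
        # Assumed warning category; the thread does not name one.
        warnings.warn(
            "the 'nthreads' keyword is deprecated, use 'use_threads' instead",
            FutureWarning,
            stacklevel=2,
        )
        use_threads = nthreads
    return path, use_threads


def warns_on_nthreads():
    """Return True if calling the stub with nthreads emits a FutureWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is suppressed
        read_feather_stub("data.feather", nthreads=2)
    return any(issubclass(w.category, FutureWarning) for w in caught)
```

In a pytest suite the same check is usually written with `pytest.warns(FutureWarning)` around the call.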

jorisvandenbossche · 2018-10-13T08:40:39Z

Shouldn't it be replaced with use_threads ?

ingwinlu · 2018-10-13T10:58:01Z

this would require pinning pyarrow > 0.10.0 as a dependency

jreback · 2018-10-14T14:06:20Z

doc/source/whatsnew/v0.23.5.txt

@@ -33,6 +33,8 @@ Fixed Regressions
 Development
 ~~~~~~~~~~~
 - The minimum required pytest version has been increased to 3.6 (:issue:`22319`)
+- Deprecated the `nthreads` keyword of `pandas.read_feather()` in favor of


use the :func: reference here

jreback · 2018-10-14T14:06:38Z

pandas/io/feather_format.py

@@ -96,6 +101,11 @@ def read_feather(path, nthreads=1):
        Number of CPU threads to use when reading to pandas.DataFrame

       .. versionadded 0.21.0
+       .. deprecated 0.23.5


pandas/io/feather_format.py

ingwinlu · 2018-10-15T10:25:54Z

@jreback can you have another look?

TomAugspurger · 2018-10-15T10:52:41Z

pandas/io/feather_format.py

    path = _stringify_path(path)

-    if LooseVersion(feather.__version__) < LooseVersion('0.4.0'):
+    if LooseVersion(feather.__version__) < LooseVersion('0.4.0') or \


Style nit: start the line with parentheses instead of using a backslash to continue the line.

Do you have a suggestion for the alternative? It does not line up as nicely (since it would require an additional indent of the second line), which in my opinion makes it harder to read.

    if (LooseVersion(feather.__version__) < LooseVersion('0.4.0') or
            LooseVersion(pyarrow.__version__) < LooseVersion('0.11.0')):
        return feather.read_dataframe(path)

TomAugspurger · 2018-10-15T10:54:16Z

doc/source/whatsnew/v0.24.0.txt

@@ -979,3 +979,5 @@ Other
 - :meth:`~pandas.io.formats.style.Styler.bar` now also supports tablewise application (in addition to rowwise and columnwise) with ``axis=None`` and setting clipping range with ``vmin`` and ``vmax`` (:issue:`21548` and :issue:`21526`). ``NaN`` values are also handled properly.
 - Logical operations ``&, |, ^`` between :class:`Series` and :class:`Index` will no longer raise ``ValueError`` (:issue:`22092`)
 - Bug in :meth:`DataFrame.combine_first` in which column types were unexpectedly converted to float (:issue:`20699`)
+- Deprecated the `nthreads` keyword of :func:`pandas.read_feather()` in favor of


Don't need the parentheses in read_feather()

jreback

does this emit any warnings in the tests?

I would actually be ok with removing support for feather < 0.4.0 if it makes it easier here. you may actually need to do this as importing an older feather doesn't have pyarrow as a dep so the importer might fail.

ingwinlu · 2018-10-15T16:22:06Z

> does this emit any warnings in the tests?

None, but ci does not test with feather 0.3.1 AFAIK.

@jreback can we directly depend on pyarrow without getting there via feather? That would make it simpler.

I am not sure if going over feather + pyarrow combinations and checking them all for how to call them is the way to go.

feather-format, 0.3.1 - no pyarrow - does not support any args
feather-format, 0.4.0 - pyarrow > 0.4.0 - does support nthreads
feather-format, 0.4.0 - pyarrow >= 0.11.0 - does not support nthreads, supports use_threads.

So the current implementation with import pyarrow should not be merged as it would not play nice in case someone is still on feather-format 0.3.1.
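The three combinations listed above can be sketched as a small lookup. `threading_keyword` and `_version_tuple` are hypothetical names, and the parsing only handles plain numeric versions such as `'0.11.0'`; this illustrates the matrix and is not the code in the PR.

```python
def _version_tuple(version):
    """Turn '0.11.0' into (0, 11, 0). Hypothetical helper; numeric parts only."""
    return tuple(int(part) for part in version.split("."))


def threading_keyword(pyarrow_version):
    """Name the threading keyword the reader accepts, or None if it has none.

    pyarrow_version is None when feather-format 0.3.1 is used without pyarrow.
    """
    if pyarrow_version is None:
        # feather-format 0.3.1: the reader does not support any args
        return None
    if _version_tuple(pyarrow_version) < (0, 11, 0):
        # pyarrow before 0.11.0: integer thread count
        return "nthreads"
    # pyarrow 0.11.0 and later: boolean switch
    return "use_threads"
```

Every additional supported combination adds a branch to a helper like this, which is the maintenance cost the comment above is pointing at.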

//edit: maybe ask feather to release a new version that pins to a higher pyarrow version...

jreback · 2018-10-17T13:00:00Z

@ingwinlu actually I will revise the thought above. Am ok with dropping support for feather entirely and just using pyarrow here. Can you revise?

ingwinlu · 2018-10-17T13:08:37Z

Sure. Will probably work on it later today.

ingwinlu · 2018-10-17T16:34:34Z

@jreback do you want to depend on pyarrow 0.4.0 or go for 0.11.0 directly?

jreback · 2018-10-17T17:04:28Z

if u can use our min version of pyarrow (we use for parquet)

we could bump that slightly also, but not past 0.8.0

ingwinlu · 2018-10-17T20:13:06Z

@jreback done. don't think the windows test fail is related to the changes in the PR.

ingwinlu · 2018-10-18T07:18:24Z

This is still missing the removal of feather-format refs in ci configs as well as some doc entry in source/io I missed.

h-vetinari · 2018-10-18T07:48:34Z

If dropping feather, will also close #21639

ingwinlu · 2018-10-18T13:58:07Z

I did not rewrite the io.feather section of the docs. I feel like the currently linked repository (feather) provides more information.

If you feel like it is necessary we can add a point to the caveats listed where we express that we directly require the upstream pyarrow library.

jreback · 2018-10-18T14:01:50Z

doc/source/whatsnew/v0.24.0.txt

@@ -1004,3 +1004,7 @@ Other
 - :meth:`~pandas.io.formats.style.Styler.bar` now also supports tablewise application (in addition to rowwise and columnwise) with ``axis=None`` and setting clipping range with ``vmin`` and ``vmax`` (:issue:`21548` and :issue:`21526`). ``NaN`` values are also handled properly.
 - Logical operations ``&, |, ^`` between :class:`Series` and :class:`Index` will no longer raise ``ValueError`` (:issue:`22092`)
 - Bug in :meth:`DataFrame.combine_first` in which column types were unexpectedly converted to float (:issue:`20699`)
+- Deprecated the `nthreads` keyword of :func:`pandas.read_feather` in favor of
+  `use_threads` to reflect the changes in pyarrow 0.11.0. (:issue:`23053`)
+- Drop `feather-format` as a dependency for feather based storage and use


move this to api-breaking changes section

jreback · 2018-10-18T14:41:52Z

doc/source/whatsnew/v0.24.0.txt

@@ -233,6 +233,10 @@ If installed, we now require:
 | scipy           | 0.18.1          |          |
 +-----------------+-----------------+----------+

+
+Additionally we no longer depend on `feather-format` for feather based storage
+and replaced it with references to `pyarrow`.


add the issue reference here

jreback · 2018-10-22T18:07:20Z

going to merge #23281 once it passes; you will then need to revert its xfails here.

jreback · 2018-10-23T03:11:25Z

can you rebase this and check changes in #23281

ingwinlu · 2018-10-23T05:25:36Z

Rebased and reactivated part of the disabled tests from #23281. Did not check if the rest could also be resolved by the changes in this PR (parquet + gbq).

jreback

looks good. let's just add a pre-change pyarrow version (0.10.0) to test against, ping on green.

doc/source/whatsnew/v0.24.0.txt

ci/travis-36.yaml

jorisvandenbossche · 2018-10-23T21:54:28Z

doc/source/whatsnew/v0.24.0.txt

@@ -235,6 +235,11 @@ If installed, we now require:
 +-----------------+-----------------+----------+
 | scipy           | 0.18.1          |          |
 +-----------------+-----------------+----------+
+| pyarrow         | 0.4.1           |          |


This is currently already the minimal supported version no?

yes, but not if you don't have feather

@ingwinlu actually maybe take out pyarrow from the table, your comment below is fine

jorisvandenbossche · 2018-10-23T21:58:09Z

pandas/io/feather_format.py

-    if LooseVersion(feather.__version__) < LooseVersion('0.4.0'):
-        return feather.read_dataframe(path)
+    if LooseVersion(pyarrow.__version__) < LooseVersion('0.11.0'):
+        return feather.read_feather(path)


we should still pass nthreads here?

@ingwinlu can you check this comment?

nthreads is not available anymore after the conversion in the wrapper function.

Then we need to do it in some other way, but not passing it here means breaking the functionality for people having pyarrow < 0.11 (which we still support)

@ingwinlu what you could do is remove the mapping part of the deprecation, then you can easily just pass the argument here as read_feather(path, nthreads=int(use_threads)) I think would work, then on the other branch, just pass read_feather(path, use_threads=bool(use_threads))

jreback · 2018-10-26T02:09:40Z

can you rebase and fixup?

ingwinlu · 2018-10-26T11:37:57Z

I can not replicate the

>   import pyarrow.compat as compat
E   AttributeError: module 'pyarrow' has no attribute 'compat'

issues locally. Any pointers?

TomAugspurger · 2018-10-26T20:51:22Z

Maybe a different version of pyarrow? https://travis-ci.org/pandas-dev/pandas/jobs/446626253#L2219

ingwinlu · 2018-10-27T06:05:56Z

I replicated the conda environment of that test run. parquet tests run correctly but feather tests are not run at all due to an import error on pyarrow:

Python 3.6.6 |Anaconda, Inc.| (default, Oct  9 2018, 12:34:16) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/winlu/miniconda3/envs/pandas/lib/python3.6/site-packages/pyarrow/__init__.py", line 60, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: /home/winlu/miniconda3/envs/pandas/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.1: undefined symbol: _ZN5boost13match_resultsIN9__gnu_cxx17__normal_iteratorIPKcSsEESaINS_9sub_matchIS5_EEEE12maybe_assignERKS9_
>>>

~~Could that be some caching issue?~~

I was not using boost from conda-forge and hence had a mismatch in ABIs.

The other issue with compat seems to be related to apache/arrow#2634 which I will test now.

ingwinlu · 2018-10-27T08:50:03Z

@jreback should be good now.

jorisvandenbossche · 2018-10-27T09:00:22Z

pandas/io/feather_format.py

-    if LooseVersion(feather.__version__) < LooseVersion('0.4.0'):
-        return feather.read_dataframe(path)
+    if LooseVersion(pyarrow.__version__) < LooseVersion('0.11.0'):
+        return feather.read_feather(path)


@ingwinlu can you check this comment?

jreback · 2018-10-28T03:32:21Z

doc/source/whatsnew/v0.24.0.txt

@@ -235,6 +235,11 @@ If installed, we now require:
 +-----------------+-----------------+----------+
 | scipy           | 0.18.1          |          |
 +-----------------+-----------------+----------+
+| pyarrow         | 0.4.1           |          |


@ingwinlu actually maybe take out pyarrow from the table, your comment below is fine

jreback · 2018-10-28T03:35:35Z

pandas/io/feather_format.py

-    if LooseVersion(feather.__version__) < LooseVersion('0.4.0'):
-        return feather.read_dataframe(path)
+    if LooseVersion(pyarrow.__version__) < LooseVersion('0.11.0'):
+        return feather.read_feather(path)


@ingwinlu what you could do is remove the mapping part of the deprecation, then you can easily just pass the argument here as read_feather(path, nthreads=int(use_threads)) I think would work, then on the other branch, just pass read_feather(path, use_threads=bool(use_threads))

ingwinlu · 2018-10-28T06:26:48Z

nthreads=0 makes some fun results >.>
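One way to avoid handing `nthreads=0` to an older reader is to clamp the converted value to at least 1. The helper name `legacy_nthreads` is hypothetical and the clamp is an assumption about a reasonable fix, sketched here without importing pyarrow.

```python
def legacy_nthreads(use_threads):
    """Convert a use_threads value into a thread count of at least 1."""
    # int(False) == 0, the value reported as problematic in this thread
    return max(int(use_threads), 1)
```

With this, `use_threads=False` and `use_threads=True` both map to a single thread on the legacy branch, while an explicit integer such as 4 is passed through unchanged.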

jreback · 2018-10-28T13:47:26Z

@ingwinlu thanks for the responsiveness. @jorisvandenbossche over to you.

jreback · 2018-11-01T01:17:48Z

can you rebase once more.

ingwinlu · 2018-11-01T06:19:47Z

done

The nthreads argument is no longer supported since pyarrow 0.11.0 and was replaced with use_threads. Hence we deprecate the argument now as well so we can remove it in the future.

This commit also:
- removes feather-format as a dependency and replaces it with usage of pyarrow directly.
- sets CI dependencies to respect the changes above.

We test backwards compatibility with pyarrow 0.9.0 as conda does not provide a pyarrow 0.10.0 and the conda-forge version has compatibility issues with the rest of the installed packages.

Resolves #23053. Resolves #21639.

jorisvandenbossche · 2018-11-01T07:53:36Z

pandas/io/feather_format.py


-    return feather.read_dataframe(path, nthreads=nthreads)
+    return feather.read_feather(path, use_threads=bool(use_threads))


I don't think this is fully correct. If someone did before nthreads=1 (which meant: no additional threads), this will be translated into use_threads=True.

Although, maybe that is not really a problem since the default in pyarrow also actually changed from nthreads=1 to use_threads=True
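The concern can be checked in plain Python: any non-zero thread count, including the old default of 1, becomes `True` under `bool()`. The helper name below is hypothetical and only mirrors the conversion in the diff.

```python
def to_use_threads(nthreads):
    """Mirror the bool() conversion discussed above (illustrative helper)."""
    return bool(nthreads)


# The old default nthreads=1 meant "no additional threads" ...
single = to_use_threads(1)
# ... but it converts to the same value as an explicit multi-thread request.
multi = to_use_threads(4)
```

Both `single` and `multi` end up as `True`, so the distinction between one thread and several is lost in the translation.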

jorisvandenbossche · 2018-11-01T12:02:50Z

@ingwinlu Thanks a lot!

jreback requested changes Oct 14, 2018

TomAugspurger reviewed Oct 15, 2018

jreback requested changes Oct 15, 2018

jreback added the Deprecate and IO Data labels Oct 15, 2018

jorisvandenbossche changed the title ~~Deprecate nthreads argument~~ Deprecate read_feather nthreads argument + update feather-format to pyarrow.feather Oct 18, 2018

jreback requested changes Oct 18, 2018

jreback added this to the 0.24.0 milestone Oct 18, 2018

jreback requested changes Oct 18, 2018

jbrockmendel mentioned this pull request Oct 21, 2018

CI: Fix pyarrow compat #23266 (closed)

jreback requested changes Oct 23, 2018

jorisvandenbossche reviewed Oct 23, 2018

jorisvandenbossche requested changes Oct 27, 2018

jreback requested changes Oct 28, 2018

jreback approved these changes Oct 28, 2018

jorisvandenbossche requested changes Nov 1, 2018

jorisvandenbossche approved these changes Nov 1, 2018

jorisvandenbossche merged commit 6b9318c into pandas-dev:master Nov 1, 2018

andersrmr mentioned this pull request Dec 4, 2018

pandas/io/feather_format.py should call use_threads instead of nthreads to prevent breakage in pyarrow 0.11.0 #23053 (closed)

ingwinlu deleted the deprecate_nthreads branch April 27, 2021 14:29

