diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 21df1a3aacd59..faff68b636109 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -8,16 +8,16 @@ Our main contributing guide can be found [in this repo](https://github.com/panda If you are looking to contribute to the *pandas* codebase, the best place to start is the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues). This is also a great place for filing bug reports and making suggestions for ways in which we can improve the code and documentation. -If you have additional questions, feel free to ask them on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Gitter](https://gitter.im/pydata/pandas). Further information can also be found in the "[Where to start?](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#where-to-start)" section. +If you have additional questions, feel free to ask them on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Gitter](https://gitter.im/pydata/pandas). Further information can also be found in the "[Where to start?](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#where-to-start)" section. ## Filing Issues -If you notice a bug in the code or documentation, or have suggestions for how we can improve either, feel free to create an issue on the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) using [GitHub's "issue" form](https://github.com/pandas-dev/pandas/issues/new). The form contains some questions that will help us best address your issue. For more information regarding how to file issues against *pandas*, please refer to the "[Bug reports and enhancement requests](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#bug-reports-and-enhancement-requests)" section. +If you notice a bug in the code or documentation, or have suggestions for how we can improve either, feel free to create an issue on the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) using [GitHub's "issue" form](https://github.com/pandas-dev/pandas/issues/new). The form contains some questions that will help us best address your issue. For more information regarding how to file issues against *pandas*, please refer to the "[Bug reports and enhancement requests](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#bug-reports-and-enhancement-requests)" section. ## Contributing to the Codebase -The code is hosted on [GitHub](https://www.github.com/pandas-dev/pandas), so you will need to use [Git](http://git-scm.com/) to clone the project and make changes to the codebase. Once you have obtained a copy of the code, you should create a development environment that is separate from your existing Python environment so that you can make and test changes without compromising your own work environment. For more information, please refer to the "[Working with the code](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#working-with-the-code)" section. +The code is hosted on [GitHub](https://www.github.com/pandas-dev/pandas), so you will need to use [Git](http://git-scm.com/) to clone the project and make changes to the codebase. 
Once you have obtained a copy of the code, you should create a development environment that is separate from your existing Python environment so that you can make and test changes without compromising your own work environment. For more information, please refer to the "[Working with the code](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#working-with-the-code)" section. -Before submitting your changes for review, make sure to check that your changes do not break any tests. You can find more information about our test suites in the "[Test-driven development/code writing](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#test-driven-development-code-writing)" section. We also have guidelines regarding coding style that will be enforced during testing, which can be found in the "[Code standards](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#code-standards)" section. +Before submitting your changes for review, make sure to check that your changes do not break any tests. You can find more information about our test suites in the "[Test-driven development/code writing](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#test-driven-development-code-writing)" section. We also have guidelines regarding coding style that will be enforced during testing, which can be found in the "[Code standards](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#code-standards)" section. -Once your changes are ready to be submitted, make sure to push your changes to GitHub before creating a pull request. Details about how to do that can be found in the "[Contributing your changes to pandas](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#contributing-your-changes-to-pandas)" section. We will review your changes, and you will most likely be asked to make additional changes before it is finally ready to merge. However, once it's ready, we will merge it, and you will have successfully contributed to the codebase!
+Once your changes are ready to be submitted, make sure to push your changes to GitHub before creating a pull request. Details about how to do that can be found in the "[Contributing your changes to pandas](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#contributing-your-changes-to-pandas)" section. We will review your changes, and you will most likely be asked to make additional changes before it is finally ready to merge. However, once it's ready, we will merge it, and you will have successfully contributed to the codebase!
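A quick way to verify the development environment described above before opening a pull request. This is a minimal sketch, not part of the diff itself, assuming a working development install with `pytest>=4.0.2` and `hypothesis` available (`pd.test()` is pandas' bundled test runner):

```python
import pandas as pd

pd.show_versions()  # print pandas and dependency versions, as suggested for bug reports
pd.test()           # run the test suite through pytest
```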
diff --git a/Makefile b/Makefile index d2bd067950fd0..956ff52338839 100644 --- a/Makefile +++ b/Makefile @@ -23,4 +23,3 @@ doc: cd doc; \ python make.py clean; \ python make.py html - python make.py spellcheck diff --git a/asv_bench/benchmarks/__init__.py b/asv_bench/benchmarks/__init__.py index e69de29bb2d1d..eada147852fe1 100644 --- a/asv_bench/benchmarks/__init__.py +++ b/asv_bench/benchmarks/__init__.py @@ -0,0 +1 @@ +"""Pandas benchmarks.""" diff --git a/asv_bench/benchmarks/algorithms.py b/asv_bench/benchmarks/algorithms.py index 34fb161e5afcb..74849d330f2bc 100644 --- a/asv_bench/benchmarks/algorithms.py +++ b/asv_bench/benchmarks/algorithms.py @@ -5,7 +5,6 @@ import pandas as pd from pandas.util import testing as tm - for imp in ['pandas.util', 'pandas.tools.hashing']: try: hashing = import_module(imp) @@ -142,4 +141,4 @@ def time_quantile(self, quantile, interpolation, dtype): self.idx.quantile(quantile, interpolation=interpolation) -from .pandas_vb_common import setup # noqa: F401 +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/groupby.py b/asv_bench/benchmarks/groupby.py index 59e43ee22afde..27d279bb90a31 100644 --- a/asv_bench/benchmarks/groupby.py +++ b/asv_bench/benchmarks/groupby.py @@ -14,7 +14,7 @@ method_blacklist = { 'object': {'median', 'prod', 'sem', 'cumsum', 'sum', 'cummin', 'mean', 'max', 'skew', 'cumprod', 'cummax', 'rank', 'pct_change', 'min', - 'var', 'mad', 'describe', 'std'}, + 'var', 'mad', 'describe', 'std', 'quantile'}, 'datetime': {'median', 'prod', 'sem', 'cumsum', 'sum', 'mean', 'skew', 'cumprod', 'cummax', 'pct_change', 'var', 'mad', 'describe', 'std'} @@ -316,8 +316,9 @@ class GroupByMethods(object): ['all', 'any', 'bfill', 'count', 'cumcount', 'cummax', 'cummin', 'cumprod', 'cumsum', 'describe', 'ffill', 'first', 'head', 'last', 'mad', 'max', 'min', 'median', 'mean', 'nunique', - 'pct_change', 'prod', 'rank', 'sem', 'shift', 'size', 'skew', - 'std', 'sum', 'tail', 'unique', 'value_counts', 'var'], + 'pct_change', 'prod', 'quantile', 'rank', 'sem', 'shift', + 'size', 'skew', 'std', 'sum', 'tail', 'unique', 'value_counts', + 'var'], ['direct', 'transformation']] def setup(self, dtype, method, application): diff --git a/asv_bench/benchmarks/io/hdf.py b/asv_bench/benchmarks/io/hdf.py index f08904ba70a5f..a5dc28eb9508c 100644 --- a/asv_bench/benchmarks/io/hdf.py +++ b/asv_bench/benchmarks/io/hdf.py @@ -1,7 +1,5 @@ -import warnings - import numpy as np -from pandas import DataFrame, Panel, date_range, HDFStore, read_hdf +from pandas import DataFrame, date_range, HDFStore, read_hdf import pandas.util.testing as tm from ..pandas_vb_common import BaseIO @@ -99,31 +97,6 @@ def time_store_info(self): self.store.info() -class HDFStorePanel(BaseIO): - - def setup(self): - self.fname = '__test__.h5' - with warnings.catch_warnings(record=True): - self.p = Panel(np.random.randn(20, 1000, 25), - items=['Item%03d' % i for i in range(20)], - major_axis=date_range('1/1/2000', periods=1000), - minor_axis=['E%03d' % i for i in range(25)]) - self.store = HDFStore(self.fname) - self.store.append('p1', self.p) - - def teardown(self): - self.store.close() - self.remove(self.fname) - - def time_read_store_table_panel(self): - with warnings.catch_warnings(record=True): - self.store.select('p1') - - def time_write_store_table_panel(self): - with warnings.catch_warnings(record=True): - self.store.append('p2', self.p) - - class HDF(BaseIO): params = ['table', 'fixed'] diff --git a/asv_bench/benchmarks/series_methods.py 
b/asv_bench/benchmarks/series_methods.py index f7d0083b86a01..3303483c50e20 100644 --- a/asv_bench/benchmarks/series_methods.py +++ b/asv_bench/benchmarks/series_methods.py @@ -124,6 +124,25 @@ def time_dropna(self, dtype): self.s.dropna() +class SearchSorted(object): + + goal_time = 0.2 + params = ['int8', 'int16', 'int32', 'int64', + 'uint8', 'uint16', 'uint32', 'uint64', + 'float16', 'float32', 'float64', + 'str'] + param_names = ['dtype'] + + def setup(self, dtype): + N = 10**5 + data = np.array([1] * N + [2] * N + [3] * N).astype(dtype) + self.s = Series(data) + + def time_searchsorted(self, dtype): + key = '2' if dtype == 'str' else 2 + self.s.searchsorted(key) + + class Map(object): params = ['dict', 'Series'] diff --git a/asv_bench/benchmarks/strings.py b/asv_bench/benchmarks/strings.py index e9f2727f64e15..b5b2c955f0133 100644 --- a/asv_bench/benchmarks/strings.py +++ b/asv_bench/benchmarks/strings.py @@ -102,10 +102,10 @@ def setup(self, repeats): N = 10**5 self.s = Series(tm.makeStringIndex(N)) repeat = {'int': 1, 'array': np.random.randint(1, 3, N)} - self.repeat = repeat[repeats] + self.values = repeat[repeats] def time_repeat(self, repeats): - self.s.str.repeat(self.repeat) + self.s.str.repeat(self.values) class Cat(object): diff --git a/ci/code_checks.sh b/ci/code_checks.sh index c8bfc564e7573..c4840f1e836c4 100755 --- a/ci/code_checks.sh +++ b/ci/code_checks.sh @@ -93,7 +93,7 @@ if [[ -z "$CHECK" || "$CHECK" == "lint" ]]; then # this particular codebase (e.g. src/headers, src/klib, src/msgpack). However, # we can lint all header files since they aren't "generated" like C files are. MSG='Linting .c and .h' ; echo $MSG - cpplint --quiet --extensions=c,h --headers=h --recursive --filter=-readability/casting,-runtime/int,-build/include_subdir pandas/_libs/src/*.h pandas/_libs/src/parser pandas/_libs/ujson pandas/_libs/tslibs/src/datetime + cpplint --quiet --extensions=c,h --headers=h --recursive --filter=-readability/casting,-runtime/int,-build/include_subdir pandas/_libs/src/*.h pandas/_libs/src/parser pandas/_libs/ujson pandas/_libs/tslibs/src/datetime pandas/io/msgpack pandas/_libs/*.cpp pandas/util RET=$(($RET + $?)) ; echo $MSG "DONE" echo "isort --version-number" @@ -174,9 +174,10 @@ if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then MSG='Check that no file in the repo contains tailing whitespaces' ; echo $MSG set -o pipefail if [[ "$AZURE" == "true" ]]; then - ! grep -n --exclude="*.svg" -RI "\s$" * | awk -F ":" '{print "##vso[task.logissue type=error;sourcepath=" $1 ";linenumber=" $2 ";] Tailing whitespaces found: " $3}' + # we exclude all c/cpp files as the c/cpp files of pandas code base are tested when Linting .c and .h files + ! grep -n '--exclude=*.'{svg,c,cpp,html} -RI "\s$" * | awk -F ":" '{print "##vso[task.logissue type=error;sourcepath=" $1 ";linenumber=" $2 ";] Tailing whitespaces found: " $3}' else - ! grep -n --exclude="*.svg" -RI "\s$" * | awk -F ":" '{print $1 ":" $2 ":Tailing whitespaces found: " $3}' + ! 
grep -n '--exclude=*.'{svg,c,cpp,html} -RI "\s$" * | awk -F ":" '{print $1 ":" $2 ":Tailing whitespaces found: " $3}' fi RET=$(($RET + $?)) ; echo $MSG "DONE" fi @@ -206,7 +207,7 @@ if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then MSG='Doctests frame.py' ; echo $MSG pytest -q --doctest-modules pandas/core/frame.py \ - -k"-axes -combine -itertuples -join -pivot_table -query -reindex -reindex_axis -round" + -k" -itertuples -join -reindex -reindex_axis -round" RET=$(($RET + $?)) ; echo $MSG "DONE" MSG='Doctests series.py' ; echo $MSG @@ -240,8 +241,8 @@ fi ### DOCSTRINGS ### if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then - MSG='Validate docstrings (GL06, GL07, GL09, SS04, PR03, PR05, EX04)' ; echo $MSG - $BASE_DIR/scripts/validate_docstrings.py --format=azure --errors=GL06,GL07,GL09,SS04,PR03,PR05,EX04 + MSG='Validate docstrings (GL06, GL07, GL09, SS04, SS05, PR03, PR04, PR05, PR10, EX04, RT04, RT05, SA05)' ; echo $MSG + $BASE_DIR/scripts/validate_docstrings.py --format=azure --errors=GL06,GL07,GL09,SS04,SS05,PR03,PR04,PR05,PR10,EX04,RT04,RT05,SA05 RET=$(($RET + $?)) ; echo $MSG "DONE" fi diff --git a/ci/deps/azure-27-compat.yaml b/ci/deps/azure-27-compat.yaml index 8899e22bdf6cf..a7784f17d1956 100644 --- a/ci/deps/azure-27-compat.yaml +++ b/ci/deps/azure-27-compat.yaml @@ -18,8 +18,10 @@ dependencies: - xlsxwriter=0.5.2 - xlwt=0.7.5 # universal - - pytest + - pytest>=4.0.2 - pytest-xdist + - pytest-mock + - isort - pip: - html5lib==1.0b2 - beautifulsoup4==4.2.1 diff --git a/ci/deps/azure-27-locale.yaml b/ci/deps/azure-27-locale.yaml index 0846ef5e8264e..8636a63d02fed 100644 --- a/ci/deps/azure-27-locale.yaml +++ b/ci/deps/azure-27-locale.yaml @@ -20,9 +20,11 @@ dependencies: - xlsxwriter=0.5.2 - xlwt=0.7.5 # universal - - pytest + - pytest>=4.0.2 - pytest-xdist + - pytest-mock - hypothesis>=3.58.0 + - isort - pip: - html5lib==1.0b2 - beautifulsoup4==4.2.1 diff --git a/ci/deps/azure-36-locale_slow.yaml b/ci/deps/azure-36-locale_slow.yaml index c7d2334623501..3f788e5ddcf39 100644 --- a/ci/deps/azure-36-locale_slow.yaml +++ b/ci/deps/azure-36-locale_slow.yaml @@ -26,8 +26,10 @@ dependencies: - xlsxwriter - xlwt # universal - - pytest + - pytest>=4.0.2 - pytest-xdist + - pytest-mock - moto + - isort - pip: - hypothesis>=3.58.0 diff --git a/ci/deps/azure-37-locale.yaml b/ci/deps/azure-37-locale.yaml index b5a05c49b8083..9d598cddce91a 100644 --- a/ci/deps/azure-37-locale.yaml +++ b/ci/deps/azure-37-locale.yaml @@ -25,8 +25,10 @@ dependencies: - xlsxwriter - xlwt # universal - - pytest + - pytest>=4.0.2 - pytest-xdist + - pytest-mock + - isort - pip: - hypothesis>=3.58.0 - moto # latest moto in conda-forge fails with 3.7, move to conda dependencies when this is fixed diff --git a/ci/deps/azure-37-numpydev.yaml b/ci/deps/azure-37-numpydev.yaml index 99ae228f25de3..e58c1f599279c 100644 --- a/ci/deps/azure-37-numpydev.yaml +++ b/ci/deps/azure-37-numpydev.yaml @@ -6,9 +6,11 @@ dependencies: - pytz - Cython>=0.28.2 # universal - - pytest + - pytest>=4.0.2 - pytest-xdist + - pytest-mock - hypothesis>=3.58.0 + - isort - pip: - "git+git://github.com/dateutil/dateutil.git" - "-f https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com" diff --git a/ci/deps/azure-macos-35.yaml b/ci/deps/azure-macos-35.yaml index 58abbabce3d86..2326e8092cc85 100644 --- a/ci/deps/azure-macos-35.yaml +++ b/ci/deps/azure-macos-35.yaml @@ -21,9 +21,11 @@ dependencies: - xlrd - xlsxwriter - xlwt - # universal - - pytest - - pytest-xdist + - isort - pip: - python-dateutil==2.5.3 + # universal + 
- pytest>=4.0.2 + - pytest-xdist + - pytest-mock - hypothesis>=3.58.0 diff --git a/ci/deps/azure-windows-27.yaml b/ci/deps/azure-windows-27.yaml index b1533b071fa74..f40efdfca3cbd 100644 --- a/ci/deps/azure-windows-27.yaml +++ b/ci/deps/azure-windows-27.yaml @@ -25,7 +25,9 @@ dependencies: - xlwt # universal - cython>=0.28.2 - - pytest + - pytest>=4.0.2 - pytest-xdist + - pytest-mock - moto - hypothesis>=3.58.0 + - isort diff --git a/ci/deps/azure-windows-36.yaml b/ci/deps/azure-windows-36.yaml index 7b132a134c44e..8517d340f2ba8 100644 --- a/ci/deps/azure-windows-36.yaml +++ b/ci/deps/azure-windows-36.yaml @@ -23,6 +23,8 @@ dependencies: - xlwt # universal - cython>=0.28.2 - - pytest + - pytest>=4.0.2 - pytest-xdist + - pytest-mock - hypothesis>=3.58.0 + - isort diff --git a/ci/deps/travis-27.yaml b/ci/deps/travis-27.yaml index 2624797b24fa1..a910af36a6b10 100644 --- a/ci/deps/travis-27.yaml +++ b/ci/deps/travis-27.yaml @@ -39,10 +39,12 @@ dependencies: - xlsxwriter=0.5.2 - xlwt=0.7.5 # universal - - pytest + - pytest>=4.0.2 - pytest-xdist + - pytest-mock - moto==1.3.4 - hypothesis>=3.58.0 + - isort - pip: - backports.lzma - pandas-gbq diff --git a/ci/deps/travis-36-doc.yaml b/ci/deps/travis-36-doc.yaml index 26f3a17432ab2..6f33bc58a8b21 100644 --- a/ci/deps/travis-36-doc.yaml +++ b/ci/deps/travis-36-doc.yaml @@ -41,5 +41,6 @@ dependencies: - xlsxwriter - xlwt # universal - - pytest + - pytest>=4.0.2 - pytest-xdist + - isort diff --git a/ci/deps/travis-36-locale.yaml b/ci/deps/travis-36-locale.yaml index 2b38465c04512..34b289e6c0c2f 100644 --- a/ci/deps/travis-36-locale.yaml +++ b/ci/deps/travis-36-locale.yaml @@ -28,8 +28,10 @@ dependencies: - xlsxwriter - xlwt # universal - - pytest + - pytest>=4.0.2 - pytest-xdist + - pytest-mock - moto + - isort - pip: - hypothesis>=3.58.0 diff --git a/ci/deps/travis-36-slow.yaml b/ci/deps/travis-36-slow.yaml index a6ffdb95e5e7c..46875d59411d9 100644 --- a/ci/deps/travis-36-slow.yaml +++ b/ci/deps/travis-36-slow.yaml @@ -25,7 +25,9 @@ dependencies: - xlsxwriter - xlwt # universal - - pytest + - pytest>=4.0.2 - pytest-xdist + - pytest-mock - moto - hypothesis>=3.58.0 + - isort diff --git a/ci/deps/travis-36.yaml b/ci/deps/travis-36.yaml index 74db888d588f4..06fc0d76a3d16 100644 --- a/ci/deps/travis-36.yaml +++ b/ci/deps/travis-36.yaml @@ -33,10 +33,12 @@ dependencies: - xlsxwriter - xlwt # universal - - pytest + - pytest>=4.0.2 - pytest-xdist - pytest-cov + - pytest-mock - hypothesis>=3.58.0 + - isort - pip: - brotlipy - coverage diff --git a/ci/deps/travis-37.yaml b/ci/deps/travis-37.yaml index c503124d8cd26..f71d29fe13378 100644 --- a/ci/deps/travis-37.yaml +++ b/ci/deps/travis-37.yaml @@ -12,9 +12,11 @@ dependencies: - nomkl - pyarrow - pytz - - pytest + - pytest>=4.0.2 - pytest-xdist + - pytest-mock - hypothesis>=3.58.0 - s3fs + - isort - pip: - moto diff --git a/doc/cheatsheet/Pandas_Cheat_Sheet.pdf b/doc/cheatsheet/Pandas_Cheat_Sheet.pdf index 696ed288cf7a6..48da05d053b96 100644 Binary files a/doc/cheatsheet/Pandas_Cheat_Sheet.pdf and b/doc/cheatsheet/Pandas_Cheat_Sheet.pdf differ diff --git a/doc/cheatsheet/Pandas_Cheat_Sheet.pptx b/doc/cheatsheet/Pandas_Cheat_Sheet.pptx index f8b98a6f1f8e4..039b3898fa301 100644 Binary files a/doc/cheatsheet/Pandas_Cheat_Sheet.pptx and b/doc/cheatsheet/Pandas_Cheat_Sheet.pptx differ diff --git a/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pdf b/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pdf index daa65a944e68a..cf1e40e627f33 100644 Binary files a/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pdf and 
b/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pdf differ diff --git a/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pptx b/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pptx index 6270a71e20ee8..564d92ddbb56a 100644 Binary files a/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pptx and b/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pptx differ diff --git a/doc/make.py b/doc/make.py index 438c4a04a3f08..6ffbd3ef86e68 100755 --- a/doc/make.py +++ b/doc/make.py @@ -294,14 +294,16 @@ def main(): help='number of jobs used by sphinx-build') argparser.add_argument('--no-api', default=False, - help='ommit api and autosummary', + help='omit api and autosummary', action='store_true') argparser.add_argument('--single', metavar='FILENAME', type=str, default=None, - help=('filename of section or method name to ' - 'compile, e.g. "indexing", "DataFrame.join"')) + help=('filename (relative to the "source" folder)' + ' of section or method name to compile, e.g. ' + '"development/contributing.rst",' + ' "ecosystem.rst", "pandas.DataFrame.join"')) argparser.add_argument('--python-path', type=str, default=os.path.dirname(DOC_PATH), @@ -323,7 +325,7 @@ def main(): # the import of `python_path` correctly. The latter is used to resolve # the import within the module, injecting it into the global namespace os.environ['PYTHONPATH'] = args.python_path - sys.path.append(args.python_path) + sys.path.insert(0, args.python_path) globals()['pandas'] = importlib.import_module('pandas') # Set the matplotlib backend to the non-interactive Agg backend for all diff --git a/doc/source/conf.py b/doc/source/conf.py index 776b1bfa7bdd7..c59d28a6dc3ea 100644 --- a/doc/source/conf.py +++ b/doc/source/conf.py @@ -98,9 +98,9 @@ if (fname == 'index.rst' and os.path.abspath(dirname) == source_path): continue - elif pattern == '-api' and dirname == 'api': + elif pattern == '-api' and dirname == 'reference': exclude_patterns.append(fname) - elif fname != pattern: + elif pattern != '-api' and fname != pattern: exclude_patterns.append(fname) with open(os.path.join(source_path, 'index.rst.template')) as f: diff --git a/doc/source/development/contributing.rst b/doc/source/development/contributing.rst index c9d6845107dfc..434df772ae9d1 100644 --- a/doc/source/development/contributing.rst +++ b/doc/source/development/contributing.rst @@ -54,7 +54,7 @@ Bug reports must: ... ``` -#. Include the full version string of *pandas* and its dependencies. You can use the built in function:: +#. Include the full version string of *pandas* and its dependencies. You can use the built-in function:: >>> import pandas as pd >>> pd.show_versions() @@ -178,6 +178,7 @@ We'll now kick off a three-step process: # Create and activate the build environment conda env create -f environment.yml conda activate pandas-dev + conda uninstall --force pandas # or with older versions of Anaconda: source activate pandas-dev @@ -211,7 +212,7 @@ See the full conda docs `here `__. Creating a Python Environment (pip) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -If you aren't using conda for you development environment, follow these instructions. +If you aren't using conda for your development environment, follow these instructions. You'll need to have at least python3.5 installed on your system. .. code-block:: none @@ -428,14 +429,14 @@ reducing the turn-around time for checking your changes. python make.py clean python make.py --no-api - # compile the docs with only a single - # section, that which is in indexing.rst + # compile the docs with only a single section, relative to the "source" folder. 
+ # For example, compiling only this guide (doc/source/development/contributing.rst) python make.py clean - python make.py --single indexing + python make.py --single development/contributing.rst # compile the reference docs for a single function python make.py clean - python make.py --single DataFrame.join + python make.py --single pandas.DataFrame.join For comparison, a full documentation build may take 15 minutes, but a single section may take 15 seconds. Subsequent builds, which only process portions @@ -484,7 +485,7 @@ contributing them to the project:: ./ci/code_checks.sh -The script verify the linting of code files, it looks for common mistake patterns +The script verifies the linting of code files. It looks for common mistake patterns (like missing spaces around sphinx directives that make the documentation not being rendered properly) and it also validates the doctests. It is possible to run the checks independently by using the parameters ``lint``, ``patterns`` and @@ -675,7 +676,7 @@ Otherwise, you need to do it manually: You'll also need to -1. write a new test that asserts a warning is issued when calling with the deprecated argument +1. Write a new test that asserts a warning is issued when calling with the deprecated argument 2. Update all of pandas existing tests and code to use the new argument See :ref:`contributing.warnings` for more. @@ -731,7 +732,7 @@ extensions in `numpy.testing .. note:: - The earliest supported pytest version is 3.6.0. + The earliest supported pytest version is 4.0.2. Writing tests ~~~~~~~~~~~~~ diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index e6928d9efde06..9e5034f6d3db0 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -33,8 +33,9 @@ decorate a class, providing the name of attribute to add. The class's @staticmethod def _validate(obj): - if 'lat' not in obj.columns or 'lon' not in obj.columns: - raise AttributeError("Must have 'lat' and 'lon'.") + # verify there is a column latitude and a column longitude + if 'latitude' not in obj.columns or 'longitude' not in obj.columns: + raise AttributeError("Must have 'latitude' and 'longitude'.") @property def center(self): diff --git a/doc/source/getting_started/basics.rst b/doc/source/getting_started/basics.rst index 02cbc7e2c3b6d..bbec7b5de1d2e 100644 --- a/doc/source/getting_started/basics.rst +++ b/doc/source/getting_started/basics.rst @@ -505,7 +505,7 @@ So, for instance, to reproduce :meth:`~DataFrame.combine_first` as above: .. ipython:: python def combiner(x, y): - np.where(pd.isna(x), y, x) + return np.where(pd.isna(x), y, x) df1.combine(df2, combiner) .. _basics.stats: diff --git a/doc/source/install.rst b/doc/source/install.rst index 92364fcc9ebd2..5310667c403e5 100644 --- a/doc/source/install.rst +++ b/doc/source/install.rst @@ -202,7 +202,7 @@ pandas is equipped with an exhaustive set of unit tests, covering about 97% of the code base as of this writing.
To run it on your machine to verify that everything is working (and that you have all of the dependencies, soft and hard, installed), make sure you have `pytest -`__ >= 3.6 and `Hypothesis +`__ >= 4.0.2 and `Hypothesis `__ >= 3.58, then run: :: diff --git a/doc/source/reference/arrays.rst b/doc/source/reference/arrays.rst index 1dc74ad83b7e6..a129b75636536 100644 --- a/doc/source/reference/arrays.rst +++ b/doc/source/reference/arrays.rst @@ -120,6 +120,7 @@ Methods Timestamp.timetuple Timestamp.timetz Timestamp.to_datetime64 + Timestamp.to_numpy Timestamp.to_julian_date Timestamp.to_period Timestamp.to_pydatetime @@ -191,6 +192,7 @@ Methods Timedelta.round Timedelta.to_pytimedelta Timedelta.to_timedelta64 + Timedelta.to_numpy Timedelta.total_seconds A collection of timedeltas may be stored in a :class:`TimedeltaArray`. diff --git a/doc/source/reference/groupby.rst b/doc/source/reference/groupby.rst index 6ed85ff2fac43..c7f9113b53c22 100644 --- a/doc/source/reference/groupby.rst +++ b/doc/source/reference/groupby.rst @@ -99,6 +99,7 @@ application to columns of a specific data type. DataFrameGroupBy.idxmax DataFrameGroupBy.idxmin DataFrameGroupBy.mad + DataFrameGroupBy.nunique DataFrameGroupBy.pct_change DataFrameGroupBy.plot DataFrameGroupBy.quantile diff --git a/doc/source/reference/series.rst b/doc/source/reference/series.rst index a6ac40b5203bf..b406893e3414a 100644 --- a/doc/source/reference/series.rst +++ b/doc/source/reference/series.rst @@ -409,6 +409,7 @@ strings and apply several methods to it. These can be accessed like :template: autosummary/accessor_method.rst Series.str.capitalize + Series.str.casefold Series.str.cat Series.str.center Series.str.contains diff --git a/doc/source/styled.xlsx b/doc/source/styled.xlsx new file mode 100644 index 0000000000000..1233ff2b8692b Binary files /dev/null and b/doc/source/styled.xlsx differ diff --git a/doc/source/user_guide/groupby.rst b/doc/source/user_guide/groupby.rst index 953f40d1afebe..e4dd82afcdf65 100644 --- a/doc/source/user_guide/groupby.rst +++ b/doc/source/user_guide/groupby.rst @@ -15,7 +15,7 @@ steps: Out of these, the split step is the most straightforward. In fact, in many situations we may wish to split the data set into groups and do something with -those groups. In the apply step, we might wish to one of the +those groups. In the apply step, we might wish to do one of the following: * **Aggregation**: compute a summary statistic (or statistics) for each @@ -1317,7 +1317,7 @@ arbitrary function, for example: df.groupby(['Store', 'Product']).pipe(mean) where ``mean`` takes a GroupBy object and finds the mean of the Revenue and Quantity -columns repectively for each Store-Product combination. The ``mean`` function can +columns respectively for each Store-Product combination. The ``mean`` function can be any function that takes in a GroupBy object; the ``.pipe`` will pass the GroupBy object as a parameter into the function you specify. diff --git a/doc/source/user_guide/indexing.rst b/doc/source/user_guide/indexing.rst index be1745e2664a1..00d4dc9efc8cc 100644 --- a/doc/source/user_guide/indexing.rst +++ b/doc/source/user_guide/indexing.rst @@ -435,7 +435,7 @@ Selection By Position This is sometimes called ``chained assignment`` and should be avoided. See :ref:`Returning a View versus Copy `. -Pandas provides a suite of methods in order to get **purely integer based indexing**. The semantics follow closely Python and NumPy slicing. These are ``0-based`` indexing. 
When slicing, the start bounds is *included*, while the upper bound is *excluded*. Trying to use a non-integer, even a **valid** label will raise an ``IndexError``. +Pandas provides a suite of methods in order to get **purely integer based indexing**. The semantics follow closely Python and NumPy slicing. These are ``0-based`` indexing. When slicing, the start bound is *included*, while the upper bound is *excluded*. Trying to use a non-integer, even a **valid** label will raise an ``IndexError``. The ``.iloc`` attribute is the primary access method. The following are valid inputs: @@ -545,7 +545,7 @@ Selection By Callable .. versionadded:: 0.18.1 ``.loc``, ``.iloc``, and also ``[]`` indexing can accept a ``callable`` as indexer. -The ``callable`` must be a function with one argument (the calling Series, DataFrame or Panel) and that returns valid output for indexing. +The ``callable`` must be a function with one argument (the calling Series, DataFrame or Panel) that returns valid output for indexing. .. ipython:: python @@ -569,7 +569,7 @@ You can use callable indexing in ``Series``. df1.A.loc[lambda s: s > 0] Using these methods / indexers, you can chain data selection operations -without using temporary variable. +without using a temporary variable. .. ipython:: python @@ -907,7 +907,7 @@ of the DataFrame): df[df['A'] > 0] -List comprehensions and ``map`` method of Series can also be used to produce +List comprehensions and the ``map`` method of Series can also be used to produce more complex criteria: .. ipython:: python @@ -1556,7 +1556,7 @@ See :ref:`Advanced Indexing ` for usage of MultiIndexes. ind ``set_names``, ``set_levels``, and ``set_codes`` also take an optional -`level`` argument +``level`` argument .. ipython:: python diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst index 58e1b2370c7c8..b23a0f10e9e2b 100644 --- a/doc/source/user_guide/io.rst +++ b/doc/source/user_guide/io.rst @@ -989,6 +989,36 @@ a single date rather than the entire array. os.remove('tmp.csv') + +.. _io.csv.mixed_timezones: + +Parsing a CSV with mixed Timezones +++++++++++++++++++++++++++++++++++ + +Pandas cannot natively represent a column or index with mixed timezones. If your CSV +file contains columns with a mixture of timezones, the default result will be +an object-dtype column with strings, even with ``parse_dates``. + + +.. ipython:: python + + content = """\ + a + 2000-01-01T00:00:00+05:00 + 2000-01-01T00:00:00+06:00""" + df = pd.read_csv(StringIO(content), parse_dates=['a']) + df['a'] + +To parse the mixed-timezone values as a datetime column, pass a partially-applied +:func:`to_datetime` with ``utc=True`` as the ``date_parser``. + +.. ipython:: python + + df = pd.read_csv(StringIO(content), parse_dates=['a'], + date_parser=lambda col: pd.to_datetime(col, utc=True)) + df['a'] + + .. _io.dayfirst: diff --git a/doc/source/user_guide/missing_data.rst b/doc/source/user_guide/missing_data.rst index a462f01dcd14f..7883814e91c94 100644 --- a/doc/source/user_guide/missing_data.rst +++ b/doc/source/user_guide/missing_data.rst @@ -335,7 +335,7 @@ examined :ref:`in the API `. Interpolation ~~~~~~~~~~~~~ -.. versionadded:: 0.21.0 +.. versionadded:: 0.23.0 The ``limit_area`` keyword argument was added. 
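As a concrete illustration of the ``limit_area`` keyword referenced in the corrected ``versionadded`` note above, a minimal sketch, assuming pandas 0.23.0 or later (the sample data is illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, 5, np.nan, np.nan, np.nan, 13, np.nan])

# limit_area='inside' only fills NaNs surrounded by valid values,
# so the leading and trailing NaNs are left untouched
s.interpolate(limit_area='inside')
# 0     NaN
# 1     NaN
# 2     5.0
# 3     7.0
# 4     9.0
# 5    11.0
# 6    13.0
# 7     NaN
# dtype: float64
```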
diff --git a/doc/source/user_guide/text.rst b/doc/source/user_guide/text.rst index e4f60a761750d..6f21a7d9beb36 100644 --- a/doc/source/user_guide/text.rst +++ b/doc/source/user_guide/text.rst @@ -600,6 +600,7 @@ Method Summary :meth:`~Series.str.partition`;Equivalent to ``str.partition`` :meth:`~Series.str.rpartition`;Equivalent to ``str.rpartition`` :meth:`~Series.str.lower`;Equivalent to ``str.lower`` + :meth:`~Series.str.casefold`;Equivalent to ``str.casefold`` :meth:`~Series.str.upper`;Equivalent to ``str.upper`` :meth:`~Series.str.find`;Equivalent to ``str.find`` :meth:`~Series.str.rfind`;Equivalent to ``str.rfind`` diff --git a/doc/source/user_guide/timeseries.rst b/doc/source/user_guide/timeseries.rst index f56ad710973dd..4e2c428415926 100644 --- a/doc/source/user_guide/timeseries.rst +++ b/doc/source/user_guide/timeseries.rst @@ -321,6 +321,15 @@ which can be specified. These are computed from the starting point specified by pd.to_datetime([1349720105100, 1349720105200, 1349720105300, 1349720105400, 1349720105500], unit='ms') +Constructing a :class:`Timestamp` or :class:`DatetimeIndex` with an epoch timestamp +with the ``tz`` argument specified will localize the epoch timestamps to UTC +first then convert the result to the specified time zone. + +.. ipython:: python + + pd.Timestamp(1262347200000000000, tz='US/Pacific') + pd.DatetimeIndex([1262347200000000000], tz='US/Pacific') + .. note:: Epoch times will be rounded to the nearest nanosecond. @@ -624,6 +633,16 @@ We are stopping on the included end-point as it is part of the index: dft2 = dft2.swaplevel(0, 1).sort_index() dft2.loc[idx[:, '2013-01-05'], :] +.. versionadded:: 0.25.0 + +Slicing with string indexing also honors UTC offset. + +.. ipython:: python + + df = pd.DataFrame([0], index=pd.DatetimeIndex(['2019-01-01'], tz='US/Pacific')) + df + df['2019-01-01 12:00:00+04:00':'2019-01-01 13:00:00+04:00'] + .. _timeseries.slice_vs_exact_match: Slice vs. Exact Match @@ -2129,11 +2148,13 @@ These can easily be converted to a ``PeriodIndex``: Time Zone Handling ------------------ -Pandas provides rich support for working with timestamps in different time -zones using ``pytz`` and ``dateutil`` libraries. ``dateutil`` currently is only -supported for fixed offset and tzfile zones. The default library is ``pytz``. -Support for ``dateutil`` is provided for compatibility with other -applications e.g. if you use ``dateutil`` in other Python packages. +pandas provides rich support for working with timestamps in different time +zones using the ``pytz`` and ``dateutil`` libraries. + +.. note:: + + pandas does not yet support ``datetime.timezone`` objects from the standard + library. Working with Time Zones ~~~~~~~~~~~~~~~~~~~~~~~ @@ -2145,13 +2166,16 @@ By default, pandas objects are time zone unaware: rng = pd.date_range('3/6/2012 00:00', periods=15, freq='D') rng.tz is None -To supply the time zone, you can use the ``tz`` keyword to ``date_range`` and -other functions. Dateutil time zone strings are distinguished from ``pytz`` -time zones by starting with ``dateutil/``. +To localize these dates to a time zone (assign a particular time zone to a naive date), +you can use the ``tz_localize`` method or the ``tz`` keyword argument in +:func:`date_range`, :class:`Timestamp`, or :class:`DatetimeIndex`. +You can either pass ``pytz`` or ``dateutil`` time zone objects or Olson time zone database strings. +Olson time zone strings will return ``pytz`` time zone objects by default. 
+To return ``dateutil`` time zone objects, append ``dateutil/`` before the string. * In ``pytz`` you can find a list of common (and less common) time zones using ``from pytz import common_timezones, all_timezones``. -* ``dateutil`` uses the OS timezones so there isn't a fixed list available. For +* ``dateutil`` uses the OS time zones so there isn't a fixed list available. For common zones, the names are the same as ``pytz``. .. ipython:: python @@ -2159,23 +2183,23 @@ time zones by starting with ``dateutil/``. import dateutil # pytz - rng_pytz = pd.date_range('3/6/2012 00:00', periods=10, freq='D', + rng_pytz = pd.date_range('3/6/2012 00:00', periods=3, freq='D', tz='Europe/London') rng_pytz.tz # dateutil - rng_dateutil = pd.date_range('3/6/2012 00:00', periods=10, freq='D', - tz='dateutil/Europe/London') + rng_dateutil = pd.date_range('3/6/2012 00:00', periods=3, freq='D') + rng_dateutil = rng_dateutil.tz_localize('dateutil/Europe/London') rng_dateutil.tz # dateutil - utc special case - rng_utc = pd.date_range('3/6/2012 00:00', periods=10, freq='D', + rng_utc = pd.date_range('3/6/2012 00:00', periods=3, freq='D', tz=dateutil.tz.tzutc()) rng_utc.tz -Note that the ``UTC`` timezone is a special case in ``dateutil`` and should be constructed explicitly -as an instance of ``dateutil.tz.tzutc``. You can also construct other timezones explicitly first, -which gives you more control over which time zone is used: +Note that the ``UTC`` time zone is a special case in ``dateutil`` and should be constructed explicitly +as an instance of ``dateutil.tz.tzutc``. You can also construct other time +zones objects explicitly first. .. ipython:: python @@ -2183,56 +2207,61 @@ which gives you more control over which time zone is used: # pytz tz_pytz = pytz.timezone('Europe/London') - rng_pytz = pd.date_range('3/6/2012 00:00', periods=10, freq='D', - tz=tz_pytz) + rng_pytz = pd.date_range('3/6/2012 00:00', periods=3, freq='D') + rng_pytz = rng_pytz.tz_localize(tz_pytz) rng_pytz.tz == tz_pytz # dateutil tz_dateutil = dateutil.tz.gettz('Europe/London') - rng_dateutil = pd.date_range('3/6/2012 00:00', periods=10, freq='D', + rng_dateutil = pd.date_range('3/6/2012 00:00', periods=3, freq='D', tz=tz_dateutil) rng_dateutil.tz == tz_dateutil -Timestamps, like Python's ``datetime.datetime`` object can be either time zone -naive or time zone aware. Naive time series and ``DatetimeIndex`` objects can be -*localized* using ``tz_localize``: +To convert a time zone aware pandas object from one time zone to another, +you can use the ``tz_convert`` method. .. ipython:: python - ts = pd.Series(np.random.randn(len(rng)), rng) + rng_pytz.tz_convert('US/Eastern') - ts_utc = ts.tz_localize('UTC') - ts_utc +.. note:: -Again, you can explicitly construct the timezone object first. -You can use the ``tz_convert`` method to convert pandas objects to convert -tz-aware data to another time zone: + When using ``pytz`` time zones, :class:`DatetimeIndex` will construct a different + time zone object than a :class:`Timestamp` for the same time zone input. A :class:`DatetimeIndex` + can hold a collection of :class:`Timestamp` objects that may have different UTC offsets and cannot be + succinctly represented by one ``pytz`` time zone instance while one :class:`Timestamp` + represents one point in time with a specific UTC offset. -.. ipython:: python + .. 
ipython:: python - ts_utc.tz_convert('US/Eastern') + dti = pd.date_range('2019-01-01', periods=3, freq='D', tz='US/Pacific') + dti.tz + ts = pd.Timestamp('2019-01-01', tz='US/Pacific') + ts.tz .. warning:: - Be wary of conversions between libraries. For some zones ``pytz`` and ``dateutil`` have different - definitions of the zone. This is more of a problem for unusual timezones than for + Be wary of conversions between libraries. For some time zones, ``pytz`` and ``dateutil`` have different + definitions of the zone. This is more of a problem for unusual time zones than for 'standard' zones like ``US/Eastern``. .. warning:: - Be aware that a timezone definition across versions of timezone libraries may not - be considered equal. This may cause problems when working with stored data that - is localized using one version and operated on with a different version. - See :ref:`here` for how to handle such a situation. + Be aware that a time zone definition across versions of time zone libraries may not + be considered equal. This may cause problems when working with stored data that + is localized using one version and operated on with a different version. + See :ref:`here` for how to handle such a situation. .. warning:: - It is incorrect to pass a timezone directly into the ``datetime.datetime`` constructor (e.g., - ``datetime.datetime(2011, 1, 1, tz=timezone('US/Eastern'))``. Instead, the datetime - needs to be localized using the localize method on the timezone. + For ``pytz`` time zones, it is incorrect to pass a time zone object directly into + the ``datetime.datetime`` constructor + (e.g., ``datetime.datetime(2011, 1, 1, tz=pytz.timezone('US/Eastern'))``. + Instead, the datetime needs to be localized using the ``localize`` method + on the ``pytz`` time zone object. -Under the hood, all timestamps are stored in UTC. Scalar values from a -``DatetimeIndex`` with a time zone will have their fields (day, hour, minute) +Under the hood, all timestamps are stored in UTC. Values from a time zone aware +:class:`DatetimeIndex` or :class:`Timestamp` will have their fields (day, hour, minute, etc.) localized to the time zone. However, timestamps with the same UTC value are still considered to be equal even if they are in different time zones: @@ -2241,51 +2270,35 @@ still considered to be equal even if they are in different time zones: rng_eastern = rng_utc.tz_convert('US/Eastern') rng_berlin = rng_utc.tz_convert('Europe/Berlin') - rng_eastern[5] - rng_berlin[5] - rng_eastern[5] == rng_berlin[5] - -Like ``Series``, ``DataFrame``, and ``DatetimeIndex``; ``Timestamp`` objects -can be converted to other time zones using ``tz_convert``: - -.. ipython:: python - - rng_eastern[5] - rng_berlin[5] - rng_eastern[5].tz_convert('Europe/Berlin') + rng_eastern[2] + rng_berlin[2] + rng_eastern[2] == rng_berlin[2] -Localization of ``Timestamp`` functions just like ``DatetimeIndex`` and ``Series``: - -.. ipython:: python - - rng[5] - rng[5].tz_localize('Asia/Shanghai') - - -Operations between ``Series`` in different time zones will yield UTC -``Series``, aligning the data on the UTC timestamps: +Operations between :class:`Series` in different time zones will yield UTC +:class:`Series`, aligning the data on the UTC timestamps: .. 
ipython:: python + ts_utc = pd.Series(range(3), pd.date_range('20130101', periods=3, tz='UTC')) eastern = ts_utc.tz_convert('US/Eastern') berlin = ts_utc.tz_convert('Europe/Berlin') result = eastern + berlin result result.index -To remove timezone from tz-aware ``DatetimeIndex``, use ``tz_localize(None)`` or ``tz_convert(None)``. -``tz_localize(None)`` will remove timezone holding local time representations. -``tz_convert(None)`` will remove timezone after converting to UTC time. +To remove time zone information, use ``tz_localize(None)`` or ``tz_convert(None)``. +``tz_localize(None)`` will remove the time zone yielding the local time representation. +``tz_convert(None)`` will remove the time zone after converting to UTC time. .. ipython:: python didx = pd.date_range(start='2014-08-01 09:00', freq='H', - periods=10, tz='US/Eastern') + periods=3, tz='US/Eastern') didx didx.tz_localize(None) didx.tz_convert(None) - # tz_convert(None) is identical with tz_convert('UTC').tz_localize(None) + # tz_convert(None) is identical to tz_convert('UTC').tz_localize(None) didx.tz_convert('UTC').tz_localize(None) .. _timeseries.timezone_ambiguous: @@ -2293,54 +2306,34 @@ To remove timezone from tz-aware ``DatetimeIndex``, use ``tz_localize(None)`` or Ambiguous Times when Localizing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -In some cases, localize cannot determine the DST and non-DST hours when there are -duplicates. This often happens when reading files or database records that simply -duplicate the hours. Passing ``ambiguous='infer'`` into ``tz_localize`` will -attempt to determine the right offset. Below the top example will fail as it -contains ambiguous times and the bottom will infer the right offset. +``tz_localize`` may not be able to determine the UTC offset of a timestamp +because daylight savings time (DST) in a local time zone causes some times to occur +twice within one day ("clocks fall back"). The following options are available: + +* ``'raise'``: Raises a ``pytz.AmbiguousTimeError`` (the default behavior) +* ``'infer'``: Attempt to determine the correct offset base on the monotonicity of the timestamps +* ``'NaT'``: Replaces ambiguous times with ``NaT`` +* ``bool``: ``True`` represents a DST time, ``False`` represents non-DST time. An array-like of ``bool`` values is supported for a sequence of times. .. ipython:: python rng_hourly = pd.DatetimeIndex(['11/06/2011 00:00', '11/06/2011 01:00', - '11/06/2011 01:00', '11/06/2011 02:00', - '11/06/2011 03:00']) + '11/06/2011 01:00', '11/06/2011 02:00']) -This will fail as there are ambiguous times +This will fail as there are ambiguous times (``'11/06/2011 01:00'``) .. code-block:: ipython In [2]: rng_hourly.tz_localize('US/Eastern') AmbiguousTimeError: Cannot infer dst time from Timestamp('2011-11-06 01:00:00'), try using the 'ambiguous' argument -Infer the ambiguous times - -.. ipython:: python - - rng_hourly_eastern = rng_hourly.tz_localize('US/Eastern', ambiguous='infer') - rng_hourly_eastern.to_list() - -In addition to 'infer', there are several other arguments supported. Passing -an array-like of bools or 0s/1s where True represents a DST hour and False a -non-DST hour, allows for distinguishing more than one DST -transition (e.g., if you have multiple records in a database each with their -own DST transition). Or passing 'NaT' will fill in transition times -with not-a-time values. These methods are available in the ``DatetimeIndex`` -constructor as well as ``tz_localize``. +Handle these ambiguous times by specifying the following. .. 
ipython:: python - rng_hourly_dst = np.array([1, 1, 0, 0, 0]) - rng_hourly.tz_localize('US/Eastern', ambiguous=rng_hourly_dst).to_list() - rng_hourly.tz_localize('US/Eastern', ambiguous='NaT').to_list() - - didx = pd.date_range(start='2014-08-01 09:00', freq='H', - periods=10, tz='US/Eastern') - didx - didx.tz_localize(None) - didx.tz_convert(None) - - # tz_convert(None) is identical with tz_convert('UTC').tz_localize(None) - didx.tz_convert('UCT').tz_localize(None) + rng_hourly.tz_localize('US/Eastern', ambiguous='infer') + rng_hourly.tz_localize('US/Eastern', ambiguous='NaT') + rng_hourly.tz_localize('US/Eastern', ambiguous=[True, True, False, False]) .. _timeseries.timezone_nonexistent: @@ -2348,7 +2341,7 @@ Nonexistent Times when Localizing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A DST transition may also shift the local time ahead by 1 hour creating nonexistent -local times. The behavior of localizing a timeseries with nonexistent times +local times ("clocks spring forward"). The behavior of localizing a timeseries with nonexistent times can be controlled by the ``nonexistent`` argument. The following options are available: * ``'raise'``: Raises a ``pytz.NonExistentTimeError`` (the default behavior) @@ -2382,58 +2375,61 @@ Transform nonexistent times to ``NaT`` or shift the times. .. _timeseries.timezone_series: -TZ Aware Dtypes -~~~~~~~~~~~~~~~ +Time Zone Series Operations +~~~~~~~~~~~~~~~~~~~~~~~~~~~ -``Series/DatetimeIndex`` with a timezone **naive** value are represented with a dtype of ``datetime64[ns]``. +A :class:`Series` with time zone **naive** values is +represented with a dtype of ``datetime64[ns]``. .. ipython:: python s_naive = pd.Series(pd.date_range('20130101', periods=3)) s_naive -``Series/DatetimeIndex`` with a timezone **aware** value are represented with a dtype of ``datetime64[ns, tz]``. +A :class:`Series` with a time zone **aware** values is +represented with a dtype of ``datetime64[ns, tz]`` where ``tz`` is the time zone .. ipython:: python s_aware = pd.Series(pd.date_range('20130101', periods=3, tz='US/Eastern')) s_aware -Both of these ``Series`` can be manipulated via the ``.dt`` accessor, see :ref:`here `. +Both of these :class:`Series` time zone information +can be manipulated via the ``.dt`` accessor, see :ref:`the dt accessor section `. -For example, to localize and convert a naive stamp to timezone aware. +For example, to localize and convert a naive stamp to time zone aware. .. ipython:: python s_naive.dt.tz_localize('UTC').dt.tz_convert('US/Eastern') - -Further more you can ``.astype(...)`` timezone aware (and naive). This operation is effectively a localize AND convert on a naive stamp, and -a convert on an aware stamp. +Time zone information can also be manipulated using the ``astype`` method. +This method can localize and convert time zone naive timestamps or +convert time zone aware timestamps. .. ipython:: python - # localize and convert a naive timezone + # localize and convert a naive time zone s_naive.astype('datetime64[ns, US/Eastern]') # make an aware tz naive s_aware.astype('datetime64[ns]') - # convert to a new timezone + # convert to a new time zone s_aware.astype('datetime64[ns, CET]') .. note:: Using :meth:`Series.to_numpy` on a ``Series``, returns a NumPy array of the data. 
- NumPy does not currently support timezones (even though it is *printing* in the local timezone!), - therefore an object array of Timestamps is returned for timezone aware data: + NumPy does not currently support time zones (even though it is *printing* in the local time zone!), + therefore an object array of Timestamps is returned for time zone aware data: .. ipython:: python s_naive.to_numpy() s_aware.to_numpy() - By converting to an object array of Timestamps, it preserves the timezone + By converting to an object array of Timestamps, it preserves the time zone information. For example, when converting back to a Series: .. ipython:: python diff --git a/doc/source/whatsnew/v0.10.0.rst b/doc/source/whatsnew/v0.10.0.rst index bc2a4918bc27b..2d6550bb6888d 100644 --- a/doc/source/whatsnew/v0.10.0.rst +++ b/doc/source/whatsnew/v0.10.0.rst @@ -370,7 +370,7 @@ Updated PyTables Support df1.get_dtype_counts() - performance improvements on table writing -- support for arbitrarly indexed dimensions +- support for arbitrarily indexed dimensions - ``SparseSeries`` now has a ``density`` property (:issue:`2384`) - enable ``Series.str.strip/lstrip/rstrip`` methods to take an input argument to strip arbitrary characters (:issue:`2411`) diff --git a/doc/source/whatsnew/v0.16.1.rst b/doc/source/whatsnew/v0.16.1.rst index 7621cb9c1e27c..cbcb23e356577 100644 --- a/doc/source/whatsnew/v0.16.1.rst +++ b/doc/source/whatsnew/v0.16.1.rst @@ -136,7 +136,7 @@ groupby operations on the index will preserve the index nature as well reindexing operations, will return a resulting index based on the type of the passed indexer, meaning that passing a list will return a plain-old-``Index``; indexing with a ``Categorical`` will return a ``CategoricalIndex``, indexed according to the categories -of the PASSED ``Categorical`` dtype. This allows one to arbitrarly index these even with +of the PASSED ``Categorical`` dtype. This allows one to arbitrarily index these even with values NOT in the categories, similarly to how you can reindex ANY pandas index. .. code-block:: ipython diff --git a/doc/source/whatsnew/v0.24.0.rst b/doc/source/whatsnew/v0.24.0.rst index 16319a3b83ca4..a49ea2cf493a6 100644 --- a/doc/source/whatsnew/v0.24.0.rst +++ b/doc/source/whatsnew/v0.24.0.rst @@ -648,6 +648,52 @@ that the dates have been converted to UTC pd.to_datetime(["2015-11-18 15:30:00+05:30", "2015-11-18 16:30:00+06:30"], utc=True) + +.. _whatsnew_0240.api_breaking.read_csv_mixed_tz: + +Parsing mixed-timezones with :func:`read_csv` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +:func:`read_csv` no longer silently converts mixed-timezone columns to UTC (:issue:`24987`). + +*Previous Behavior* + +.. code-block:: python + + >>> import io + >>> content = """\ + ... a + ... 2000-01-01T00:00:00+05:00 + ... 2000-01-01T00:00:00+06:00""" + >>> df = pd.read_csv(io.StringIO(content), parse_dates=['a']) + >>> df.a + 0 1999-12-31 19:00:00 + 1 1999-12-31 18:00:00 + Name: a, dtype: datetime64[ns] + +*New Behavior* + +.. ipython:: python + + import io + content = """\ + a + 2000-01-01T00:00:00+05:00 + 2000-01-01T00:00:00+06:00""" + df = pd.read_csv(io.StringIO(content), parse_dates=['a']) + df.a + +As can be seen, the ``dtype`` is object; each value in the column is a string. +To convert the strings to an array of datetimes, pass a partially-applied :func:`to_datetime` with ``utc=True`` as the ``date_parser`` argument: + +..
ipython:: python + + df = pd.read_csv(io.StringIO(content), parse_dates=['a'], + date_parser=lambda col: pd.to_datetime(col, utc=True)) + df.a + +See :ref:`whatsnew_0240.api.timezone_offset_parsing` for more. + .. _whatsnew_0240.api_breaking.period_end_time: Time values in ``dt.end_time`` and ``to_timestamp(how='end')`` diff --git a/doc/source/whatsnew/v0.24.1.rst b/doc/source/whatsnew/v0.24.1.rst index 3ac2ed73ea53f..be0a2eb682e87 100644 --- a/doc/source/whatsnew/v0.24.1.rst +++ b/doc/source/whatsnew/v0.24.1.rst @@ -2,8 +2,8 @@ .. _whatsnew_0241: -Whats New in 0.24.1 (February XX, 2019) ---------------------------------------- +Whats New in 0.24.1 (February 3, 2019) +-------------------------------------- .. warning:: @@ -13,64 +13,69 @@ Whats New in 0.24.1 (February XX, 2019) {{ header }} These are the changes in pandas 0.24.1. See :ref:`release` for a full changelog -including other versions of pandas. +including other versions of pandas. See :ref:`whatsnew_0240` for the 0.24.0 changelog. +.. _whatsnew_0241.api: -.. _whatsnew_0241.enhancements: +API Changes +~~~~~~~~~~~ -Enhancements -^^^^^^^^^^^^ +Changing the ``sort`` parameter for :class:`Index` set operations +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +The default ``sort`` value for :meth:`Index.union` has changed from ``True`` to ``None`` (:issue:`24959`). +The default *behavior*, however, remains the same: the result is sorted, unless -.. _whatsnew_0241.bug_fixes: - -Bug Fixes -~~~~~~~~~ - -**Conversion** +1. ``self`` and ``other`` are identical +2. ``self`` or ``other`` is empty +3. ``self`` or ``other`` contain values that cannot be compared (a ``RuntimeWarning`` is raised). -- -- -- +This change will allow ``sort=True`` to mean "always sort" in a future release. -**Indexing** +The same change applies to :meth:`Index.difference` and :meth:`Index.symmetric_difference`, which +would not sort the result when the values could not be compared. -- -- -- +The ``sort`` option for :meth:`Index.intersection` has changed in three ways. -**I/O** +1. The default has changed from ``True`` to ``False``, to restore the + pandas 0.23.4 and earlier behavior of not sorting by default. +2. The behavior of ``sort=True`` can now be obtained with ``sort=None``. + This will sort the result only if the values in ``self`` and ``other`` + are not identical. +3. The value ``sort=True`` is no longer allowed. A future version of pandas + will properly support ``sort=True`` meaning "always sort". -- -- -- +.. _whatsnew_0241.regressions: -**Categorical** +Fixed Regressions +~~~~~~~~~~~~~~~~~ -- -- -- +- Fixed regression in :meth:`DataFrame.to_dict` with ``records`` orient raising an + ``AttributeError`` when the ``DataFrame`` contained more than 255 columns, or + wrongly converting column names that were not valid python identifiers (:issue:`24939`, :issue:`24940`). +- Fixed regression in :func:`read_sql` when passing certain queries with MySQL/pymysql (:issue:`24988`). +- Fixed regression in :class:`Index.intersection` incorrectly sorting the values by default (:issue:`24959`). +- Fixed regression in :func:`merge` when merging an empty ``DataFrame`` with multiple timezone-aware columns on one of the timezone-aware columns (:issue:`25014`).
+- Fixed regression in :meth:`Series.rename_axis` and :meth:`DataFrame.rename_axis` where passing ``None`` failed to remove the axis name (:issue:`25034`) +- Fixed regression in :func:`to_timedelta` with ``box=False`` incorrectly returning a ``datetime64`` object instead of a ``timedelta64`` object (:issue:`24961`) +- Fixed regression where custom hashable types could not be used as column keys in :meth:`DataFrame.set_index` (:issue:`24969`) -**Timezones** +.. _whatsnew_0241.bug_fixes: -- -- -- +Bug Fixes +~~~~~~~~~ -**Timedelta** +**Reshaping** -- -- -- +- Bug in :meth:`DataFrame.groupby` with :class:`Grouper` when there is a time change (DST) and grouping frequency is ``'1d'`` (:issue:`24972`) -**Reshaping** +**Visualization** -- Bug in :func:`merge` when merging by index name would sometimes result in an incorrectly numbered index (:issue:`24212`) +- Fixed the warning for implicitly registered matplotlib converters not showing. See :ref:`whatsnew_0211.converters` for more (:issue:`24963`). **Other** -- -- +- Fixed AttributeError when printing a DataFrame's HTML repr after accessing the IPython config object (:issue:`25036`) .. _whatsnew_0.241.contributors: diff --git a/doc/source/whatsnew/v0.24.2.rst b/doc/source/whatsnew/v0.24.2.rst new file mode 100644 index 0000000000000..e80b1060e867d --- /dev/null +++ b/doc/source/whatsnew/v0.24.2.rst @@ -0,0 +1,111 @@ +:orphan: + +.. _whatsnew_0242: + +Whats New in 0.24.2 (February XX, 2019) +--------------------------------------- + +.. warning:: + + The 0.24.x series of releases will be the last to support Python 2. Future feature + releases will support Python 3 only. See :ref:`install.dropping-27` for more. + +{{ header }} + +These are the changes in pandas 0.24.2. See :ref:`release` for a full changelog +including other versions of pandas. + +.. _whatsnew_0242.regressions: + +Fixed Regressions +^^^^^^^^^^^^^^^^^ + +- Fixed regression in :meth:`DataFrame.all` and :meth:`DataFrame.any` where ``bool_only=True`` was ignored (:issue:`25101`) +- Fixed issue in ``DataFrame`` construction where passing a mixed list of mixed types could segfault. (:issue:`25075`) +- Fixed regression in :meth:`DataFrame.apply` causing ``RecursionError`` when ``dict``-like classes were passed as argument. (:issue:`25196`) +- Fixed regression in :meth:`DataFrame.replace` where ``regex=True`` was only replacing patterns matching the start of the string (:issue:`25259`) + +- Fixed regression in :meth:`DataFrame.duplicated()`, where an empty dataframe was not returning a boolean dtyped Series. (:issue:`25184`) +- Fixed regression in :meth:`Series.min` and :meth:`Series.max` where ``numeric_only=True`` was ignored when the ``Series`` contained ``Categorical`` data (:issue:`25299`) +- Fixed regression in subtraction between :class:`Series` objects with ``datetime64[ns]`` dtype incorrectly raising ``OverflowError`` when the ``Series`` on the right contains null values (:issue:`25317`) +- Fixed regression in :class:`TimedeltaIndex` where ``np.sum(index)`` incorrectly returned a zero-dimensional object instead of a scalar (:issue:`25282`) +- Fixed regression in ``IntervalDtype`` construction where passing an incorrect string with 'Interval' as a prefix could result in a ``RecursionError``. (:issue:`25338`) + +- Fixed regression in :class:`Categorical`, where constructing it from a categorical ``Series`` and an explicit ``categories=`` that differed from that in the ``Series`` created an invalid object which could trigger segfaults. (:issue:`25318`) + +..
+.. _whatsnew_0242.enhancements:
+
+Enhancements
+^^^^^^^^^^^^
+
+-
+-
+
+.. _whatsnew_0242.bug_fixes:
+
+Bug Fixes
+~~~~~~~~~
+
+**Conversion**
+
+-
+-
+-
+
+**Indexing**
+
+-
+-
+-
+
+**I/O**
+
+- Better handling of terminal printing when the terminal dimensions are not known (:issue:`25080`)
+- Bug when reading, in Python 3, an HDF5 table-format ``DataFrame`` that was created in Python 2 (:issue:`24925`)
+- Bug in reading a JSON with ``orient='table'`` generated by :meth:`DataFrame.to_json` with ``index=False`` (:issue:`25170`)
+- Bug where float indexes could have misaligned values when printing (:issue:`25061`)
+-
+
+**Categorical**
+
+-
+-
+-
+
+**Timezones**
+
+-
+-
+-
+
+**Timedelta**
+
+-
+-
+-
+
+**Reshaping**
+
+- Bug in :meth:`pandas.core.groupby.GroupBy.transform` where applying a function to a timezone aware column would return a timezone naive result (:issue:`24198`)
+- Bug in :func:`DataFrame.join` when joining on a timezone aware :class:`DatetimeIndex` (:issue:`23931`)
+-
+
+**Visualization**
+
+-
+-
+-
+
+**Other**
+
+- Bug in :meth:`Series.is_unique` where single occurrences of ``NaN`` were not considered unique (:issue:`25180`)
+- Bug in :func:`merge` when merging an empty ``DataFrame`` with an ``Int64`` column or a non-empty ``DataFrame`` with an ``Int64`` column that is all ``NaN`` (:issue:`25183`)
+- Bug in ``IntervalTree`` where a ``RecursionError`` occurs upon construction due to an overflow when adding endpoints, which also causes :class:`IntervalIndex` to crash during indexing operations (:issue:`25485`)
+-
+
+.. _whatsnew_0.242.contributors:
+
+Contributors
+~~~~~~~~~~~~
+
+.. contributors:: v0.24.1..v0.24.2
diff --git a/doc/source/whatsnew/v0.25.0.rst b/doc/source/whatsnew/v0.25.0.rst
index 5129449e4fdf3..ce6f259f97062 100644
--- a/doc/source/whatsnew/v0.25.0.rst
+++ b/doc/source/whatsnew/v0.25.0.rst
@@ -19,23 +19,73 @@ including other versions of pandas.

 Other Enhancements
 ^^^^^^^^^^^^^^^^^^

+- Indexing of ``DataFrame`` and ``Series`` now accepts zero-dimensional ``np.ndarray`` (:issue:`24919`)
+- :meth:`Timestamp.replace` now supports the ``fold`` argument to disambiguate DST transition times (:issue:`25017`)
+- :meth:`DataFrame.at_time` and :meth:`Series.at_time` now support :meth:`datetime.time` objects with timezones (:issue:`24043`)
+- ``Series.str`` has gained the :meth:`Series.str.casefold` method to remove all case distinctions present in a string (:issue:`25405`)
+- :meth:`DataFrame.set_index` now works for instances of ``abc.Iterator``, provided their output is of the same length as the calling frame (:issue:`22484`, :issue:`24984`)
+- :meth:`DatetimeIndex.union` now supports the ``sort`` argument. The behaviour of the ``sort`` parameter matches that of :meth:`Index.union` (:issue:`24994`)
-
--
--
-
 .. _whatsnew_0250.api_breaking:

 Backwards incompatible API changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+.. _whatsnew_0250.api_breaking.utc_offset_indexing:
+
+Indexing with date strings with UTC offsets
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Indexing a :class:`DataFrame` or :class:`Series` with a :class:`DatetimeIndex` with a
+date string with a UTC offset would previously ignore the UTC offset. Now, the UTC offset
+is respected in indexing. (:issue:`24076`, :issue:`16785`)
+
+*Previous Behavior*:
+
+.. code-block:: ipython
+
+   In [1]: df = pd.DataFrame([0], index=pd.DatetimeIndex(['2019-01-01'], tz='US/Pacific'))
+
+   In [2]: df
+   Out[2]:
+                              0
+   2019-01-01 00:00:00-08:00  0
+
+   In [3]: df['2019-01-01 00:00:00+04:00':'2019-01-01 01:00:00+04:00']
+   Out[3]:
+                              0
+   2019-01-01 00:00:00-08:00  0
+
+*New Behavior*:
+
+.. ipython:: python
+
+   df = pd.DataFrame([0], index=pd.DatetimeIndex(['2019-01-01'], tz='US/Pacific'))
+   df['2019-01-01 12:00:00+04:00':'2019-01-01 13:00:00+04:00']
+
+.. _whatsnew_0250.api_breaking.deps:
+
+Increased minimum versions for dependencies
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+We have updated our minimum supported versions of dependencies (:issue:`23519`).
+If installed, we now require:
+
++-----------------+-----------------+----------+
+| Package         | Minimum Version | Required |
++=================+=================+==========+
+| pytest (dev)    | 4.0.2           |          |
++-----------------+-----------------+----------+
+
 .. _whatsnew_0250.api.other:

 Other API Changes
 ^^^^^^^^^^^^^^^^^

--
--
+- :class:`DatetimeTZDtype` will now standardize pytz timezones to a common timezone instance (:issue:`24713`)
+- ``Timestamp`` and ``Timedelta`` scalars now implement the :meth:`to_numpy` method as aliases to :meth:`Timestamp.to_datetime64` and :meth:`Timedelta.to_timedelta64`, respectively. (:issue:`24653`)
+- :meth:`Timestamp.strptime` will now raise a ``NotImplementedError`` (:issue:`25016`)
-

 .. _whatsnew_0250.deprecations:

@@ -43,16 +93,13 @@ Other API Changes
 Deprecations
 ~~~~~~~~~~~~

--
--
--
-
+- Deprecated the ``M (months)`` and ``Y (year)`` ``units`` parameters of :func:`pandas.to_timedelta`, :func:`pandas.Timedelta` and :func:`pandas.TimedeltaIndex` (:issue:`16344`)

 .. _whatsnew_0250.prior_deprecations:

 Removal of prior version deprecations/changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-
+- Removed (parts of) :class:`Panel` (:issue:`25047`, :issue:`25191`, :issue:`25231`)
-
-
-

@@ -62,15 +109,20 @@ Removal of prior version deprecations/changes
 Performance Improvements
 ~~~~~~~~~~~~~~~~~~~~~~~~

--
--
--
+- Significant speedup in ``SparseArray`` initialization that benefits most operations, fixing performance regression introduced in v0.20.0 (:issue:`24985`)
+- ``DataFrame.to_stata()`` is now faster when outputting data with any string or non-native endian columns (:issue:`25045`)
+- Improved performance of :meth:`Series.searchsorted`. The speedup is especially large when the dtype is
+  int8/int16/int32 and the searched key is within the integer bounds for the dtype (:issue:`22034`)
+- Improved performance of :meth:`pandas.core.groupby.GroupBy.quantile` (:issue:`20405`)
+
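The last entry reflects the new Cython ``group_quantile`` routine added later in this
diff; user-facing behavior is unchanged. A quick sketch (hypothetical data, assuming a
pandas 0.25.0 development build):

.. code-block:: python

   import pandas as pd

   df = pd.DataFrame({"key": ["a", "a", "b", "b"],
                      "val": [1.0, 3.0, 2.0, 4.0]})

   # Per-group median now runs through the faster Cython path
   df.groupby("key")["val"].quantile(0.5)
   # key
   # a    2.0
   # b    3.0
   # Name: val, dtype: float64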
.. _whatsnew_0250.bug_fixes:

 Bug Fixes
 ~~~~~~~~~

+- Bug in :func:`to_datetime` which would raise an (incorrect) ``ValueError`` when called with a date far into the future and the ``format`` argument specified instead of raising ``OutOfBoundsDatetime`` (:issue:`23830`)
+-
+-

 Categorical
 ^^^^^^^^^^^

@@ -82,7 +134,7 @@ Categorical
 Datetimelike
 ^^^^^^^^^^^^

--
+- Added support for ISO week year format ('%G-%V-%u') when parsing datetimes using :meth:`to_datetime` (:issue:`16607`)
 -
 -

@@ -96,13 +148,15 @@ Timedelta
 Timezones
 ^^^^^^^^^

--
--
+- Bug in :func:`to_datetime` with ``utc=True`` and datetime strings that would apply previously parsed UTC offsets to subsequent arguments (:issue:`24992`)
+- Bug in :func:`Timestamp.tz_localize` and :func:`Timestamp.tz_convert` not propagating ``freq`` (:issue:`25241`)
 -

 Numeric
 ^^^^^^^

+- Bug in :meth:`to_numeric` in which large negative numbers were being improperly handled (:issue:`24910`)
+- Bug in :meth:`to_numeric` in which numbers were being coerced to float, even though ``errors`` was not ``coerce`` (:issue:`24910`)
 -
 -
 -

@@ -141,21 +195,25 @@ Indexing
 Missing
 ^^^^^^^

--
+- Fixed misleading exception message in :meth:`Series.interpolate` if argument ``order`` is required, but omitted (:issue:`10633`, :issue:`24014`).
 -
 -

 MultiIndex
 ^^^^^^^^^^

+- Bug in which an incorrect exception was raised by :class:`Timedelta` when testing the membership of :class:`MultiIndex` (:issue:`24570`)
 -
 -
--
 -

 I/O
 ^^^

+- Bug in :func:`DataFrame.to_html()` where values were truncated using display options instead of outputting the full content (:issue:`17004`)
+- Fixed bug in missing text when using :meth:`to_clipboard` if copying utf-16 characters in Python 3 on Windows (:issue:`25040`)
+- Bug in :func:`read_json` for ``orient='table'`` when it tries to infer dtypes by default, which is not applicable as dtypes are already defined in the JSON schema (:issue:`21345`)
+- Bug in :func:`read_json` for ``orient='table'`` and float index, as it infers index dtype by default, which is not applicable because index dtype is already defined in the JSON schema (:issue:`25433`)
+- Bug in :func:`read_json` for ``orient='table'`` and string of float column names, as it makes a column name type conversion to Timestamp, which is not applicable because column names are already defined in the JSON schema (:issue:`25435`)
 -
 -
 -

@@ -171,24 +229,25 @@ Plotting
 Groupby/Resample/Rolling
 ^^^^^^^^^^^^^^^^^^^^^^^^

--
--
--
+- Bug in :meth:`pandas.core.resample.Resampler.agg` with a timezone aware index where an ``OverflowError`` was raised when passing a list of functions (:issue:`22660`)
+- Bug in :meth:`pandas.core.groupby.DataFrameGroupBy.nunique` in which the names of column levels were lost (:issue:`23222`)
+- Bug in :func:`pandas.core.groupby.GroupBy.agg` when applying an aggregation function to timezone aware data (:issue:`23683`)
+- Bug in :func:`pandas.core.groupby.GroupBy.first` and :func:`pandas.core.groupby.GroupBy.last` where timezone information would be dropped (:issue:`21603`)

 Reshaping
 ^^^^^^^^^

--
--
--
+- Bug in :func:`pandas.merge` where the string ``'None'`` was appended to the column name if ``None`` was assigned in ``suffixes``, instead of keeping the column name as-is (:issue:`24782`).
+- Bug in :func:`merge` when merging by index name would sometimes result in an incorrectly numbered index (:issue:`24212`)
+
+- :func:`to_records` now accepts dtypes in its ``column_dtypes`` parameter (:issue:`24895`)

 Sparse
 ^^^^^^

--
--
+- Significant speedup in ``SparseArray`` initialization that benefits most operations, fixing performance regression introduced in v0.20.0 (:issue:`24985`)
+- Bug in :class:`SparseFrame` constructor where passing ``None`` as the data would cause ``default_fill_value`` to be ignored (:issue:`16807`)
 -

@@ -206,4 +265,3 @@ Contributors
 ~~~~~~~~~~~~

 .. contributors:: v0.24.x..HEAD
-
diff --git a/environment.yml b/environment.yml
index 47fe8e4c2a640..c1669c9f49017 100644
--- a/environment.yml
+++ b/environment.yml
@@ -19,7 +19,8 @@ dependencies:
   - hypothesis>=3.82
   - isort
   - moto
-  - pytest>=4.0
+  - pytest>=4.0.2
+  - pytest-mock
   - sphinx
   - numpydoc
diff --git a/pandas/_libs/groupby.pxd b/pandas/_libs/groupby.pxd
new file mode 100644
index 0000000000000..70ad8a62871e9
--- /dev/null
+++ b/pandas/_libs/groupby.pxd
@@ -0,0 +1,6 @@
+cdef enum InterpolationEnumType:
+    INTERPOLATION_LINEAR,
+    INTERPOLATION_LOWER,
+    INTERPOLATION_HIGHER,
+    INTERPOLATION_NEAREST,
+    INTERPOLATION_MIDPOINT
diff --git a/pandas/_libs/groupby.pyx b/pandas/_libs/groupby.pyx
index e6036654c71c3..71e25c3955a6d 100644
--- a/pandas/_libs/groupby.pyx
+++ b/pandas/_libs/groupby.pyx
@@ -2,6 +2,7 @@ import cython
 from cython import Py_ssize_t
+from cython cimport floating

 from libc.stdlib cimport malloc, free

@@ -381,6 +382,368 @@ def group_any_all(uint8_t[:] out,
             if values[i] == flag_val:
                 out[lab] = flag_val

+# ----------------------------------------------------------------------
+# group_add, group_prod, group_var, group_mean, group_ohlc
+# ----------------------------------------------------------------------
+
+
+@cython.wraparound(False)
+@cython.boundscheck(False)
+def _group_add(floating[:, :] out,
+               int64_t[:] counts,
+               floating[:, :] values,
+               const int64_t[:] labels,
+               Py_ssize_t min_count=0):
+    """
+    Only aggregates on axis=0
+    """
+    cdef:
+        Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
+        floating val, count
+        floating[:, :] sumx, nobs
+
+    if len(values) != len(labels):
+        raise AssertionError("len(index) != len(labels)")
+
+    nobs = np.zeros_like(out)
+    sumx = np.zeros_like(out)
+
+    N, K = (<object>values).shape
+
+    with nogil:
+        for i in range(N):
+            lab = labels[i]
+            if lab < 0:
+                continue
+
+            counts[lab] += 1
+            for j in range(K):
+                val = values[i, j]
+
+                # not nan
+                if val == val:
+                    nobs[lab, j] += 1
+                    sumx[lab, j] += val
+
+        for i in range(ncounts):
+            for j in range(K):
+                if nobs[i, j] < min_count:
+                    out[i, j] = NAN
+                else:
+                    out[i, j] = sumx[i, j]
+
+
+group_add_float32 = _group_add['float']
+group_add_float64 = _group_add['double']
+
+
+@cython.wraparound(False)
+@cython.boundscheck(False)
+def _group_prod(floating[:, :] out,
+                int64_t[:] counts,
+                floating[:, :] values,
+                const int64_t[:] labels,
+                Py_ssize_t min_count=0):
+    """
+    Only aggregates on axis=0
+    """
+    cdef:
+        Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
+        floating val, count
+        floating[:, :] prodx, nobs
+
+    if not len(values) == len(labels):
+        raise AssertionError("len(index) != len(labels)")
+
+    nobs = np.zeros_like(out)
+    prodx = np.ones_like(out)
+
+    N, K = (<object>values).shape
+
+    with nogil:
+        for i in range(N):
+            lab = labels[i]
+            if lab < 0:
+                continue
+
+            counts[lab] += 1
+            for j in range(K):
+                val = values[i, j]
+
+                # not nan
+                if val == val:
+                    nobs[lab, j] += 1
+                    prodx[lab, j] *= val
+
+        for i in range(ncounts):
+            for j in range(K):
+                if nobs[i, j] < min_count:
+                    out[i, j] = NAN
+                else:
+                    out[i, j] = prodx[i, j]
+
+
+group_prod_float32 = _group_prod['float']
+group_prod_float64 = _group_prod['double']
+
+
+@cython.wraparound(False)
+@cython.boundscheck(False)
+@cython.cdivision(True)
+def _group_var(floating[:, :] out,
+               int64_t[:] counts,
+               floating[:, :] values,
+               const int64_t[:] labels,
+               Py_ssize_t min_count=-1):
+    cdef:
+        Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
+        floating val, ct, oldmean
+        floating[:, :] nobs, mean
+
+    assert min_count == -1, "'min_count' only used in add and prod"
+
+    if not len(values) == len(labels):
+        raise AssertionError("len(index) != len(labels)")
+
+    nobs = np.zeros_like(out)
+    mean = np.zeros_like(out)
+
+    N, K = (<object>values).shape
+
+    out[:, :] = 0.0
+
+    with nogil:
+        for i in range(N):
+            lab = labels[i]
+            if lab < 0:
+                continue
+
+            counts[lab] += 1
+
+            for j in range(K):
+                val = values[i, j]
+
+                # not nan
+                if val == val:
+                    nobs[lab, j] += 1
+                    oldmean = mean[lab, j]
+                    mean[lab, j] += (val - oldmean) / nobs[lab, j]
+                    out[lab, j] += (val - mean[lab, j]) * (val - oldmean)
+
+        for i in range(ncounts):
+            for j in range(K):
+                ct = nobs[i, j]
+                if ct < 2:
+                    out[i, j] = NAN
+                else:
+                    out[i, j] /= (ct - 1)
+
+
+group_var_float32 = _group_var['float']
+group_var_float64 = _group_var['double']
+
+
+@cython.wraparound(False)
+@cython.boundscheck(False)
+def _group_mean(floating[:, :] out,
+                int64_t[:] counts,
+                floating[:, :] values,
+                const int64_t[:] labels,
+                Py_ssize_t min_count=-1):
+    cdef:
+        Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
+        floating val, count
+        floating[:, :] sumx, nobs
+
+    assert min_count == -1, "'min_count' only used in add and prod"
+
+    if not len(values) == len(labels):
+        raise AssertionError("len(index) != len(labels)")
+
+    nobs = np.zeros_like(out)
+    sumx = np.zeros_like(out)
+
+    N, K = (<object>values).shape
+
+    with nogil:
+        for i in range(N):
+            lab = labels[i]
+            if lab < 0:
+                continue
+
+            counts[lab] += 1
+            for j in range(K):
+                val = values[i, j]
+                # not nan
+                if val == val:
+                    nobs[lab, j] += 1
+                    sumx[lab, j] += val
+
+        for i in range(ncounts):
+            for j in range(K):
+                count = nobs[i, j]
+                if nobs[i, j] == 0:
+                    out[i, j] = NAN
+                else:
+                    out[i, j] = sumx[i, j] / count
+
+
+group_mean_float32 = _group_mean['float']
+group_mean_float64 = _group_mean['double']
+
+
+@cython.wraparound(False)
+@cython.boundscheck(False)
+def _group_ohlc(floating[:, :] out,
+                int64_t[:] counts,
+                floating[:, :] values,
+                const int64_t[:] labels,
+                Py_ssize_t min_count=-1):
+    """
+    Only aggregates on axis=0
+    """
+    cdef:
+        Py_ssize_t i, j, N, K, lab
+        floating val, count
+        Py_ssize_t ngroups = len(counts)
+
+    assert min_count == -1, "'min_count' only used in add and prod"
+
+    if len(labels) == 0:
+        return
+
+    N, K = (<object>values).shape
+
+    if out.shape[1] != 4:
+        raise ValueError('Output array must have 4 columns')
+
+    if K > 1:
+        raise NotImplementedError("Argument 'values' must have only "
+                                  "one dimension")
+    out[:] = np.nan
+
+    with nogil:
+        for i in range(N):
+            lab = labels[i]
+            if lab == -1:
+                continue
+
+            counts[lab] += 1
+            val = values[i, 0]
+            if val != val:
+                continue
+
+            if out[lab, 0] != out[lab, 0]:
+                out[lab, 0] = out[lab, 1] = out[lab, 2] = out[lab, 3] = val
+            else:
+                out[lab, 1] = max(out[lab, 1], val)
+                out[lab, 2] = min(out[lab, 2], val)
+                out[lab, 3] = val
+
+
+group_ohlc_float32 = _group_ohlc['float']
+group_ohlc_float64 = _group_ohlc['double']
+
+
+@cython.boundscheck(False)
+@cython.wraparound(False)
+def group_quantile(ndarray[float64_t] out,
+                   ndarray[int64_t] labels,
+                   numeric[:] values,
+                   ndarray[uint8_t] mask,
+                   float64_t q,
+                   object interpolation):
+    """
+    Calculate the quantile per group.
+
+    Parameters
+    ----------
+    out : ndarray
+        Array of aggregated values that will be written to.
+    labels : ndarray
+        Array containing the unique group labels.
+    values : ndarray
+        Array containing the values to apply the function against.
+    mask : ndarray
+        Array of 1/0 flags marking missing (NA) entries in ``values``.
+    q : float
+        The quantile value to search for.
+    interpolation : str
+        One of 'linear', 'lower', 'higher', 'nearest', 'midpoint'.
+
+    Notes
+    -----
+    Rather than explicitly returning a value, this function modifies the
+    provided `out` parameter.
+    """
+    cdef:
+        Py_ssize_t i, N=len(labels), ngroups, grp_sz, non_na_sz
+        Py_ssize_t grp_start=0, idx=0
+        int64_t lab
+        uint8_t interp
+        float64_t q_idx, frac, val, next_val
+        ndarray[int64_t] counts, non_na_counts, sort_arr
+
+    assert values.shape[0] == N
+    inter_methods = {
+        'linear': INTERPOLATION_LINEAR,
+        'lower': INTERPOLATION_LOWER,
+        'higher': INTERPOLATION_HIGHER,
+        'nearest': INTERPOLATION_NEAREST,
+        'midpoint': INTERPOLATION_MIDPOINT,
+    }
+    interp = inter_methods[interpolation]
+
+    counts = np.zeros_like(out, dtype=np.int64)
+    non_na_counts = np.zeros_like(out, dtype=np.int64)
+    ngroups = len(counts)
+
+    # First figure out the size of every group
+    with nogil:
+        for i in range(N):
+            lab = labels[i]
+            counts[lab] += 1
+            if not mask[i]:
+                non_na_counts[lab] += 1
+
+    # Get an index of values sorted by labels and then values
+    order = (values, labels)
+    sort_arr = np.lexsort(order).astype(np.int64, copy=False)
+
+    with nogil:
+        for i in range(ngroups):
+            # Figure out how many group elements there are
+            grp_sz = counts[i]
+            non_na_sz = non_na_counts[i]
+
+            if non_na_sz == 0:
+                out[i] = NaN
+            else:
+                # Calculate where to retrieve the desired value
+                # Casting to int will intentionally truncate result
+                idx = grp_start + <int64_t>(q * <float64_t>(non_na_sz - 1))
+
+                val = values[sort_arr[idx]]
+                # If requested quantile falls evenly on a particular index
+                # then write that index's value out. Otherwise interpolate
+                q_idx = q * (non_na_sz - 1)
+                frac = q_idx % 1
+
+                if frac == 0.0 or interp == INTERPOLATION_LOWER:
+                    out[i] = val
+                else:
+                    next_val = values[sort_arr[idx + 1]]
+                    if interp == INTERPOLATION_LINEAR:
+                        out[i] = val + (next_val - val) * frac
+                    elif interp == INTERPOLATION_HIGHER:
+                        out[i] = next_val
+                    elif interp == INTERPOLATION_MIDPOINT:
+                        out[i] = (val + next_val) / 2.0
+                    elif interp == INTERPOLATION_NEAREST:
+                        if frac > .5 or (frac == .5 and q > .5):  # Always OK?
+ out[i] = next_val + else: + out[i] = val + + # Increment the index reference in sorted_arr for the next group + grp_start += grp_sz + # generated from template include "groupby_helper.pxi" diff --git a/pandas/_libs/groupby_helper.pxi.in b/pandas/_libs/groupby_helper.pxi.in index 858039f038d02..63cd4d6ac6ff2 100644 --- a/pandas/_libs/groupby_helper.pxi.in +++ b/pandas/_libs/groupby_helper.pxi.in @@ -8,266 +8,6 @@ cdef extern from "numpy/npy_math.h": float64_t NAN "NPY_NAN" _int64_max = np.iinfo(np.int64).max -# ---------------------------------------------------------------------- -# group_add, group_prod, group_var, group_mean, group_ohlc -# ---------------------------------------------------------------------- - -{{py: - -# name, c_type -dtypes = [('float64', 'float64_t'), - ('float32', 'float32_t')] - -def get_dispatch(dtypes): - - for name, c_type in dtypes: - yield name, c_type -}} - -{{for name, c_type in get_dispatch(dtypes)}} - - -@cython.wraparound(False) -@cython.boundscheck(False) -def group_add_{{name}}({{c_type}}[:, :] out, - int64_t[:] counts, - {{c_type}}[:, :] values, - const int64_t[:] labels, - Py_ssize_t min_count=0): - """ - Only aggregates on axis=0 - """ - cdef: - Py_ssize_t i, j, N, K, lab, ncounts = len(counts) - {{c_type}} val, count - ndarray[{{c_type}}, ndim=2] sumx, nobs - - if not len(values) == len(labels): - raise AssertionError("len(index) != len(labels)") - - nobs = np.zeros_like(out) - sumx = np.zeros_like(out) - - N, K = (values).shape - - with nogil: - - for i in range(N): - lab = labels[i] - if lab < 0: - continue - - counts[lab] += 1 - for j in range(K): - val = values[i, j] - - # not nan - if val == val: - nobs[lab, j] += 1 - sumx[lab, j] += val - - for i in range(ncounts): - for j in range(K): - if nobs[i, j] < min_count: - out[i, j] = NAN - else: - out[i, j] = sumx[i, j] - - -@cython.wraparound(False) -@cython.boundscheck(False) -def group_prod_{{name}}({{c_type}}[:, :] out, - int64_t[:] counts, - {{c_type}}[:, :] values, - const int64_t[:] labels, - Py_ssize_t min_count=0): - """ - Only aggregates on axis=0 - """ - cdef: - Py_ssize_t i, j, N, K, lab, ncounts = len(counts) - {{c_type}} val, count - ndarray[{{c_type}}, ndim=2] prodx, nobs - - if not len(values) == len(labels): - raise AssertionError("len(index) != len(labels)") - - nobs = np.zeros_like(out) - prodx = np.ones_like(out) - - N, K = (values).shape - - with nogil: - for i in range(N): - lab = labels[i] - if lab < 0: - continue - - counts[lab] += 1 - for j in range(K): - val = values[i, j] - - # not nan - if val == val: - nobs[lab, j] += 1 - prodx[lab, j] *= val - - for i in range(ncounts): - for j in range(K): - if nobs[i, j] < min_count: - out[i, j] = NAN - else: - out[i, j] = prodx[i, j] - - -@cython.wraparound(False) -@cython.boundscheck(False) -@cython.cdivision(True) -def group_var_{{name}}({{c_type}}[:, :] out, - int64_t[:] counts, - {{c_type}}[:, :] values, - const int64_t[:] labels, - Py_ssize_t min_count=-1): - cdef: - Py_ssize_t i, j, N, K, lab, ncounts = len(counts) - {{c_type}} val, ct, oldmean - ndarray[{{c_type}}, ndim=2] nobs, mean - - assert min_count == -1, "'min_count' only used in add and prod" - - if not len(values) == len(labels): - raise AssertionError("len(index) != len(labels)") - - nobs = np.zeros_like(out) - mean = np.zeros_like(out) - - N, K = (values).shape - - out[:, :] = 0.0 - - with nogil: - for i in range(N): - lab = labels[i] - if lab < 0: - continue - - counts[lab] += 1 - - for j in range(K): - val = values[i, j] - - # not nan - if val == val: - 
nobs[lab, j] += 1 - oldmean = mean[lab, j] - mean[lab, j] += (val - oldmean) / nobs[lab, j] - out[lab, j] += (val - mean[lab, j]) * (val - oldmean) - - for i in range(ncounts): - for j in range(K): - ct = nobs[i, j] - if ct < 2: - out[i, j] = NAN - else: - out[i, j] /= (ct - 1) -# add passing bin edges, instead of labels - - -@cython.wraparound(False) -@cython.boundscheck(False) -def group_mean_{{name}}({{c_type}}[:, :] out, - int64_t[:] counts, - {{c_type}}[:, :] values, - const int64_t[:] labels, - Py_ssize_t min_count=-1): - cdef: - Py_ssize_t i, j, N, K, lab, ncounts = len(counts) - {{c_type}} val, count - ndarray[{{c_type}}, ndim=2] sumx, nobs - - assert min_count == -1, "'min_count' only used in add and prod" - - if not len(values) == len(labels): - raise AssertionError("len(index) != len(labels)") - - nobs = np.zeros_like(out) - sumx = np.zeros_like(out) - - N, K = (values).shape - - with nogil: - for i in range(N): - lab = labels[i] - if lab < 0: - continue - - counts[lab] += 1 - for j in range(K): - val = values[i, j] - # not nan - if val == val: - nobs[lab, j] += 1 - sumx[lab, j] += val - - for i in range(ncounts): - for j in range(K): - count = nobs[i, j] - if nobs[i, j] == 0: - out[i, j] = NAN - else: - out[i, j] = sumx[i, j] / count - - -@cython.wraparound(False) -@cython.boundscheck(False) -def group_ohlc_{{name}}({{c_type}}[:, :] out, - int64_t[:] counts, - {{c_type}}[:, :] values, - const int64_t[:] labels, - Py_ssize_t min_count=-1): - """ - Only aggregates on axis=0 - """ - cdef: - Py_ssize_t i, j, N, K, lab - {{c_type}} val, count - Py_ssize_t ngroups = len(counts) - - assert min_count == -1, "'min_count' only used in add and prod" - - if len(labels) == 0: - return - - N, K = (values).shape - - if out.shape[1] != 4: - raise ValueError('Output array must have 4 columns') - - if K > 1: - raise NotImplementedError("Argument 'values' must have only " - "one dimension") - out[:] = np.nan - - with nogil: - for i in range(N): - lab = labels[i] - if lab == -1: - continue - - counts[lab] += 1 - val = values[i, 0] - if val != val: - continue - - if out[lab, 0] != out[lab, 0]: - out[lab, 0] = out[lab, 1] = out[lab, 2] = out[lab, 3] = val - else: - out[lab, 1] = max(out[lab, 1], val) - out[lab, 2] = min(out[lab, 2], val) - out[lab, 3] = val - -{{endfor}} - # ---------------------------------------------------------------------- # group_nth, group_last, group_rank # ---------------------------------------------------------------------- diff --git a/pandas/_libs/interval.pyx b/pandas/_libs/interval.pyx index 3147f36dcc835..e86b692e9915e 100644 --- a/pandas/_libs/interval.pyx +++ b/pandas/_libs/interval.pyx @@ -18,7 +18,6 @@ cnp.import_array() cimport pandas._libs.util as util -util.import_array() from pandas._libs.hashtable cimport Int64Vector, Int64VectorData @@ -151,9 +150,6 @@ cdef class Interval(IntervalMixin): Left bound for the interval. right : orderable scalar Right bound for the interval. - closed : {'left', 'right', 'both', 'neither'}, default 'right' - Whether the interval is closed on the left-side, right-side, both or - neither. closed : {'right', 'left', 'both', 'neither'}, default 'right' Whether the interval is closed on the left-side, right-side, both or neither. See the Notes for more detailed explanation. 
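The ``interval.pyx`` hunk above removes a duplicated description of ``closed`` from the
:class:`Interval` docstring. For reference, a minimal sketch of what that parameter
controls (outputs noted in comments; assumes pandas 0.24+):

.. code-block:: python

   import pandas as pd

   # closed='right' (the default): the right endpoint is a member, the left is not
   iv = pd.Interval(0, 5, closed='right')
   5 in iv   # True
   0 in iv   # False

   # closed='both' includes both endpoints
   0 in pd.Interval(0, 5, closed='both')   # True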
diff --git a/pandas/_libs/intervaltree.pxi.in b/pandas/_libs/intervaltree.pxi.in index fb6f30c030f11..196841f35ed8d 100644 --- a/pandas/_libs/intervaltree.pxi.in +++ b/pandas/_libs/intervaltree.pxi.in @@ -284,7 +284,7 @@ cdef class {{dtype_title}}Closed{{closed_title}}IntervalNode: else: # calculate a pivot so we can create child nodes self.is_leaf_node = False - self.pivot = np.median(left + right) / 2 + self.pivot = np.median(left / 2 + right / 2) left_set, right_set, center_set = self.classify_intervals( left, right) diff --git a/pandas/_libs/lib.pyx b/pandas/_libs/lib.pyx index 4745916eb0ce2..34ceeb20e260e 100644 --- a/pandas/_libs/lib.pyx +++ b/pandas/_libs/lib.pyx @@ -233,10 +233,11 @@ def fast_unique_multiple(list arrays, sort: bool=True): if val not in table: table[val] = stub uniques.append(val) - if sort: + if sort is None: try: uniques.sort() except Exception: + # TODO: RuntimeWarning? pass return uniques @@ -938,6 +939,7 @@ _TYPE_MAP = { 'float32': 'floating', 'float64': 'floating', 'f': 'floating', + 'complex64': 'complex', 'complex128': 'complex', 'c': 'complex', 'string': 'string' if PY2 else 'bytes', @@ -1304,6 +1306,9 @@ def infer_dtype(value: object, skipna: object=None) -> str: elif is_decimal(val): return 'decimal' + elif is_complex(val): + return 'complex' + elif util.is_float_object(val): if is_float_array(values): return 'floating' @@ -1828,7 +1833,7 @@ def maybe_convert_numeric(ndarray[object] values, set na_values, except (ValueError, OverflowError, TypeError): pass - # otherwise, iterate and do full infererence + # Otherwise, iterate and do full inference. cdef: int status, maybe_int Py_ssize_t i, n = values.size @@ -1865,10 +1870,10 @@ def maybe_convert_numeric(ndarray[object] values, set na_values, else: seen.float_ = True - if val <= oINT64_MAX: + if oINT64_MIN <= val <= oINT64_MAX: ints[i] = val - if seen.sint_ and seen.uint_: + if val < oINT64_MIN or (seen.sint_ and seen.uint_): seen.float_ = True elif util.is_bool_object(val): @@ -1910,23 +1915,28 @@ def maybe_convert_numeric(ndarray[object] values, set na_values, else: seen.saw_int(as_int) - if not (seen.float_ or as_int in na_values): + if as_int not in na_values: if as_int < oINT64_MIN or as_int > oUINT64_MAX: - raise ValueError('Integer out of range.') + if seen.coerce_numeric: + seen.float_ = True + else: + raise ValueError("Integer out of range.") + else: + if as_int >= 0: + uints[i] = as_int - if as_int >= 0: - uints[i] = as_int - if as_int <= oINT64_MAX: - ints[i] = as_int + if as_int <= oINT64_MAX: + ints[i] = as_int seen.float_ = seen.float_ or (seen.uint_ and seen.sint_) else: seen.float_ = True except (TypeError, ValueError) as e: if not seen.coerce_numeric: - raise type(e)(str(e) + ' at position {pos}'.format(pos=i)) + raise type(e)(str(e) + " at position {pos}".format(pos=i)) elif "uint64" in str(e): # Exception from check functions. 
raise + seen.saw_null() floats[i] = NaN @@ -2271,7 +2281,7 @@ def to_object_array(rows: object, int min_width=0): result = np.empty((n, k), dtype=object) for i in range(n): - row = input_rows[i] + row = list(input_rows[i]) for j in range(len(row)): result[i, j] = row[j] diff --git a/pandas/_libs/reduction.pyx b/pandas/_libs/reduction.pyx index 507567cf480d7..517d59c399179 100644 --- a/pandas/_libs/reduction.pyx +++ b/pandas/_libs/reduction.pyx @@ -342,7 +342,9 @@ cdef class SeriesGrouper: index = None else: values = dummy.values - if dummy.dtype != self.arr.dtype: + # GH 23683: datetimetz types are equivalent to datetime types here + if (dummy.dtype != self.arr.dtype + and values.dtype != self.arr.dtype): raise ValueError('Dummy array must be same dtype') if not values.flags.contiguous: values = values.copy() diff --git a/pandas/_libs/sparse.pyx b/pandas/_libs/sparse.pyx index f5980998f6db4..5471c8184e458 100644 --- a/pandas/_libs/sparse.pyx +++ b/pandas/_libs/sparse.pyx @@ -72,9 +72,6 @@ cdef class IntIndex(SparseIndex): A ValueError is raised if any of these conditions is violated. """ - cdef: - int32_t index, prev = -1 - if self.npoints > self.length: msg = ("Too many indices. Expected " "{exp} but found {act}").format( @@ -86,17 +83,15 @@ cdef class IntIndex(SparseIndex): if self.npoints == 0: return - if min(self.indices) < 0: + if self.indices.min() < 0: raise ValueError("No index can be less than zero") - if max(self.indices) >= self.length: + if self.indices.max() >= self.length: raise ValueError("All indices must be less than the length") - for index in self.indices: - if prev != -1 and index <= prev: - raise ValueError("Indices must be strictly increasing") - - prev = index + monotonic = np.all(self.indices[:-1] < self.indices[1:]) + if not monotonic: + raise ValueError("Indices must be strictly increasing") def equals(self, other): if not isinstance(other, IntIndex): diff --git a/pandas/_libs/src/compat_helper.h b/pandas/_libs/src/compat_helper.h index 462f53392adee..078069fb48af2 100644 --- a/pandas/_libs/src/compat_helper.h +++ b/pandas/_libs/src/compat_helper.h @@ -29,8 +29,8 @@ the macro, which restores compat. #ifndef PYPY_VERSION # if PY_VERSION_HEX < 0x03070000 && defined(PySlice_GetIndicesEx) # undef PySlice_GetIndicesEx -# endif -#endif +# endif // PY_VERSION_HEX +#endif // PYPY_VERSION PANDAS_INLINE int slice_get_indices(PyObject *s, Py_ssize_t length, @@ -44,7 +44,7 @@ PANDAS_INLINE int slice_get_indices(PyObject *s, #else return PySlice_GetIndicesEx((PySliceObject *)s, length, start, stop, step, slicelength); -#endif +#endif // PY_VERSION_HEX } #endif // PANDAS__LIBS_SRC_COMPAT_HELPER_H_ diff --git a/pandas/_libs/src/inline_helper.h b/pandas/_libs/src/inline_helper.h index 397ec8e7b2cb8..e203a05d2eb56 100644 --- a/pandas/_libs/src/inline_helper.h +++ b/pandas/_libs/src/inline_helper.h @@ -19,7 +19,7 @@ The full license is in the LICENSE file, distributed with this software. 
#define PANDAS_INLINE static inline #else #define PANDAS_INLINE - #endif -#endif + #endif // __GNUC__ +#endif // PANDAS_INLINE #endif // PANDAS__LIBS_SRC_INLINE_HELPER_H_ diff --git a/pandas/_libs/src/parse_helper.h b/pandas/_libs/src/parse_helper.h index b71131bee7008..6fcd2ed0a9ea0 100644 --- a/pandas/_libs/src/parse_helper.h +++ b/pandas/_libs/src/parse_helper.h @@ -30,7 +30,7 @@ int to_double(char *item, double *p_value, char sci, char decimal, #if PY_VERSION_HEX < 0x02060000 #define PyBytes_Check PyString_Check #define PyBytes_AS_STRING PyString_AS_STRING -#endif +#endif // PY_VERSION_HEX int floatify(PyObject *str, double *result, int *maybe_int) { int status; diff --git a/pandas/_libs/src/parser/io.c b/pandas/_libs/src/parser/io.c index 19271c78501ba..f578ce138e274 100644 --- a/pandas/_libs/src/parser/io.c +++ b/pandas/_libs/src/parser/io.c @@ -15,7 +15,7 @@ The full license is in the LICENSE file, distributed with this software. #ifndef O_BINARY #define O_BINARY 0 -#endif /* O_BINARY */ +#endif // O_BINARY /* On-disk FILE, uncompressed @@ -277,4 +277,4 @@ void *buffer_mmap_bytes(void *source, size_t nbytes, size_t *bytes_read, return NULL; } -#endif +#endif // HAVE_MMAP diff --git a/pandas/_libs/src/parser/io.h b/pandas/_libs/src/parser/io.h index d22e8ddaea88d..074322c7bdf78 100644 --- a/pandas/_libs/src/parser/io.h +++ b/pandas/_libs/src/parser/io.h @@ -25,7 +25,7 @@ typedef struct _file_source { #if !defined(_WIN32) && !defined(HAVE_MMAP) #define HAVE_MMAP -#endif +#endif // HAVE_MMAP typedef struct _memory_map { int fd; diff --git a/pandas/_libs/src/parser/tokenizer.c b/pandas/_libs/src/parser/tokenizer.c index a86af7c5416de..6acf3c3de0c91 100644 --- a/pandas/_libs/src/parser/tokenizer.c +++ b/pandas/_libs/src/parser/tokenizer.c @@ -1480,7 +1480,7 @@ int main(int argc, char *argv[]) { return 0; } -#endif +#endif // TEST // --------------------------------------------------------------------------- // Implementation of xstrtod diff --git a/pandas/_libs/src/parser/tokenizer.h b/pandas/_libs/src/parser/tokenizer.h index c32c061c7fa89..ce9dd39b16222 100644 --- a/pandas/_libs/src/parser/tokenizer.h +++ b/pandas/_libs/src/parser/tokenizer.h @@ -42,7 +42,7 @@ See LICENSE for the license #if defined(_MSC_VER) #define strtoll _strtoi64 -#endif +#endif // _MSC_VER /* @@ -75,7 +75,7 @@ See LICENSE for the license #define TRACE(X) printf X; #else #define TRACE(X) -#endif +#endif // VERBOSE #define PARSER_OUT_OF_MEMORY -1 diff --git a/pandas/_libs/tslib.pyx b/pandas/_libs/tslib.pyx index 798e338d5581b..624872c1c56c6 100644 --- a/pandas/_libs/tslib.pyx +++ b/pandas/_libs/tslib.pyx @@ -645,6 +645,8 @@ cpdef array_to_datetime(ndarray[object] values, str errors='raise', out_tzoffset_vals.add(out_tzoffset * 60.) tz = pytz.FixedOffset(out_tzoffset) value = tz_convert_single(value, tz, UTC) + out_local = 0 + out_tzoffset = 0 else: # Add a marker for naive string, to track if we are # parsing mixed naive and aware strings @@ -668,9 +670,11 @@ cpdef array_to_datetime(ndarray[object] values, str errors='raise', # dateutil parser will return incorrect result because # it will ignore nanoseconds if is_raise: - raise ValueError("time data {val} doesn't " - "match format specified" - .format(val=val)) + + # Still raise OutOfBoundsDatetime, + # as error message is informative. 
+ raise + assert is_ignore return values, tz_out raise diff --git a/pandas/_libs/tslibs/nattype.pyx b/pandas/_libs/tslibs/nattype.pyx index a55d15a7c4e85..79e2e256c501d 100644 --- a/pandas/_libs/tslibs/nattype.pyx +++ b/pandas/_libs/tslibs/nattype.pyx @@ -183,9 +183,31 @@ cdef class _NaT(datetime): return np.datetime64(NPY_NAT, 'ns') def to_datetime64(self): - """ Returns a numpy.datetime64 object with 'ns' precision """ + """ + Return a numpy.datetime64 object with 'ns' precision. + """ return np.datetime64('NaT', 'ns') + def to_numpy(self, dtype=None, copy=False): + """ + Convert the Timestamp to a NumPy datetime64. + + .. versionadded:: 0.25.0 + + This is an alias method for `Timestamp.to_datetime64()`. The dtype and + copy parameters are available here only for compatibility. Their values + will not affect the return value. + + Returns + ------- + numpy.datetime64 + + See Also + -------- + DatetimeIndex.to_numpy : Similar method for DatetimeIndex. + """ + return self.to_datetime64() + def __repr__(self): return 'NaT' @@ -352,7 +374,6 @@ class NaTType(_NaT): utctimetuple = _make_error_func('utctimetuple', datetime) timetz = _make_error_func('timetz', datetime) timetuple = _make_error_func('timetuple', datetime) - strptime = _make_error_func('strptime', datetime) strftime = _make_error_func('strftime', datetime) isocalendar = _make_error_func('isocalendar', datetime) dst = _make_error_func('dst', datetime) @@ -366,6 +387,14 @@ class NaTType(_NaT): # The remaining methods have docstrings copy/pasted from the analogous # Timestamp methods. + strptime = _make_error_func('strptime', # noqa:E128 + """ + Timestamp.strptime(string, format) + + Function is not implemented. Use pd.to_datetime(). + """ + ) + utcfromtimestamp = _make_error_func('utcfromtimestamp', # noqa:E128 """ Timestamp.utcfromtimestamp(ts) @@ -382,7 +411,7 @@ class NaTType(_NaT): ) combine = _make_error_func('combine', # noqa:E128 """ - Timsetamp.combine(date, time) + Timestamp.combine(date, time) date, time -> datetime with same date and time fields """ @@ -448,7 +477,7 @@ class NaTType(_NaT): """ Timestamp.now(tz=None) - Returns new Timestamp object representing current time local to + Return new Timestamp object representing current time local to tz. 
Parameters @@ -669,7 +698,6 @@ class NaTType(_NaT): nanosecond : int, optional tzinfo : tz-convertible, optional fold : int, optional, default is 0 - added in 3.6, NotImplemented Returns ------- diff --git a/pandas/_libs/tslibs/offsets.pyx b/pandas/_libs/tslibs/offsets.pyx index 856aa52f82cf5..e28462f7103b9 100644 --- a/pandas/_libs/tslibs/offsets.pyx +++ b/pandas/_libs/tslibs/offsets.pyx @@ -18,6 +18,7 @@ from numpy cimport int64_t cnp.import_array() +from pandas._libs.tslibs cimport util from pandas._libs.tslibs.util cimport is_string_object, is_integer_object from pandas._libs.tslibs.ccalendar import MONTHS, DAYS @@ -408,6 +409,10 @@ class _BaseOffset(object): return self.apply(other) def __mul__(self, other): + if hasattr(other, "_typ"): + return NotImplemented + if util.is_array(other): + return np.array([self * x for x in other]) return type(self)(n=other * self.n, normalize=self.normalize, **self.kwds) @@ -458,6 +463,9 @@ class _BaseOffset(object): TypeError if `int(n)` raises ValueError if n != int(n) """ + if util.is_timedelta64_object(n): + raise TypeError('`n` argument must be an integer, ' + 'got {ntype}'.format(ntype=type(n))) try: nint = int(n) except (ValueError, TypeError): @@ -533,12 +541,20 @@ class _Tick(object): can do isinstance checks on _Tick and avoid importing tseries.offsets """ + # ensure that reversed-ops with numpy scalars return NotImplemented + __array_priority__ = 1000 + def __truediv__(self, other): result = self.delta.__truediv__(other) return _wrap_timedelta_result(result) + def __rtruediv__(self, other): + result = self.delta.__rtruediv__(other) + return _wrap_timedelta_result(result) + if PY2: __div__ = __truediv__ + __rdiv__ = __rtruediv__ # ---------------------------------------------------------------------- diff --git a/pandas/_libs/tslibs/period.pyx b/pandas/_libs/tslibs/period.pyx index e38e9a1ca5df6..a5a50ea59753d 100644 --- a/pandas/_libs/tslibs/period.pyx +++ b/pandas/_libs/tslibs/period.pyx @@ -138,11 +138,11 @@ cdef int64_t get_daytime_conversion_factor(int from_index, int to_index) nogil: return daytime_conversion_factor_matrix[row - 6][col - 6] -cdef int64_t nofunc(int64_t ordinal, asfreq_info *af_info): - return np.iinfo(np.int32).min +cdef int64_t nofunc(int64_t ordinal, asfreq_info *af_info) nogil: + return INT32_MIN -cdef int64_t no_op(int64_t ordinal, asfreq_info *af_info): +cdef int64_t no_op(int64_t ordinal, asfreq_info *af_info) nogil: return ordinal @@ -270,7 +270,8 @@ cdef int64_t DtoB_weekday(int64_t unix_date) nogil: return ((unix_date + 4) // 7) * 5 + ((unix_date + 4) % 7) - 4 -cdef int64_t DtoB(npy_datetimestruct *dts, int roll_back, int64_t unix_date): +cdef int64_t DtoB(npy_datetimestruct *dts, int roll_back, + int64_t unix_date) nogil: cdef: int day_of_week = dayofweek(dts.year, dts.month, dts.day) @@ -286,21 +287,23 @@ cdef int64_t DtoB(npy_datetimestruct *dts, int roll_back, int64_t unix_date): return DtoB_weekday(unix_date) -cdef inline int64_t upsample_daytime(int64_t ordinal, asfreq_info *af_info): +cdef inline int64_t upsample_daytime(int64_t ordinal, + asfreq_info *af_info) nogil: if (af_info.is_end): return (ordinal + 1) * af_info.intraday_conversion_factor - 1 else: return ordinal * af_info.intraday_conversion_factor -cdef inline int64_t downsample_daytime(int64_t ordinal, asfreq_info *af_info): +cdef inline int64_t downsample_daytime(int64_t ordinal, + asfreq_info *af_info) nogil: return ordinal // (af_info.intraday_conversion_factor) cdef inline int64_t transform_via_day(int64_t ordinal, asfreq_info *af_info, 
freq_conv_func first_func, - freq_conv_func second_func): + freq_conv_func second_func) nogil: cdef: int64_t result @@ -313,7 +316,7 @@ cdef inline int64_t transform_via_day(int64_t ordinal, # Conversion _to_ Daily Freq cdef void AtoD_ym(int64_t ordinal, int64_t *year, - int *month, asfreq_info *af_info): + int *month, asfreq_info *af_info) nogil: year[0] = ordinal + 1970 month[0] = 1 @@ -327,7 +330,7 @@ cdef void AtoD_ym(int64_t ordinal, int64_t *year, year[0] -= 1 -cdef int64_t asfreq_AtoDT(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_AtoDT(int64_t ordinal, asfreq_info *af_info) nogil: cdef: int64_t unix_date, year int month @@ -341,7 +344,7 @@ cdef int64_t asfreq_AtoDT(int64_t ordinal, asfreq_info *af_info): cdef void QtoD_ym(int64_t ordinal, int *year, - int *month, asfreq_info *af_info): + int *month, asfreq_info *af_info) nogil: year[0] = ordinal // 4 + 1970 month[0] = (ordinal % 4) * 3 + 1 @@ -353,7 +356,7 @@ cdef void QtoD_ym(int64_t ordinal, int *year, year[0] -= 1 -cdef int64_t asfreq_QtoDT(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_QtoDT(int64_t ordinal, asfreq_info *af_info) nogil: cdef: int64_t unix_date int year, month @@ -366,12 +369,12 @@ cdef int64_t asfreq_QtoDT(int64_t ordinal, asfreq_info *af_info): return upsample_daytime(unix_date, af_info) -cdef void MtoD_ym(int64_t ordinal, int *year, int *month): +cdef void MtoD_ym(int64_t ordinal, int *year, int *month) nogil: year[0] = ordinal // 12 + 1970 month[0] = ordinal % 12 + 1 -cdef int64_t asfreq_MtoDT(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_MtoDT(int64_t ordinal, asfreq_info *af_info) nogil: cdef: int64_t unix_date int year, month @@ -384,7 +387,7 @@ cdef int64_t asfreq_MtoDT(int64_t ordinal, asfreq_info *af_info): return upsample_daytime(unix_date, af_info) -cdef int64_t asfreq_WtoDT(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_WtoDT(int64_t ordinal, asfreq_info *af_info) nogil: ordinal = (ordinal * 7 + af_info.from_end - 4 + (7 - 1) * (af_info.is_end - 1)) return upsample_daytime(ordinal, af_info) @@ -393,7 +396,7 @@ cdef int64_t asfreq_WtoDT(int64_t ordinal, asfreq_info *af_info): # -------------------------------------------------------------------- # Conversion _to_ BusinessDay Freq -cdef int64_t asfreq_AtoB(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_AtoB(int64_t ordinal, asfreq_info *af_info) nogil: cdef: int roll_back npy_datetimestruct dts @@ -404,7 +407,7 @@ cdef int64_t asfreq_AtoB(int64_t ordinal, asfreq_info *af_info): return DtoB(&dts, roll_back, unix_date) -cdef int64_t asfreq_QtoB(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_QtoB(int64_t ordinal, asfreq_info *af_info) nogil: cdef: int roll_back npy_datetimestruct dts @@ -415,7 +418,7 @@ cdef int64_t asfreq_QtoB(int64_t ordinal, asfreq_info *af_info): return DtoB(&dts, roll_back, unix_date) -cdef int64_t asfreq_MtoB(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_MtoB(int64_t ordinal, asfreq_info *af_info) nogil: cdef: int roll_back npy_datetimestruct dts @@ -426,7 +429,7 @@ cdef int64_t asfreq_MtoB(int64_t ordinal, asfreq_info *af_info): return DtoB(&dts, roll_back, unix_date) -cdef int64_t asfreq_WtoB(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_WtoB(int64_t ordinal, asfreq_info *af_info) nogil: cdef: int roll_back npy_datetimestruct dts @@ -437,7 +440,7 @@ cdef int64_t asfreq_WtoB(int64_t ordinal, asfreq_info *af_info): return DtoB(&dts, roll_back, unix_date) -cdef int64_t asfreq_DTtoB(int64_t ordinal, asfreq_info 
*af_info): +cdef int64_t asfreq_DTtoB(int64_t ordinal, asfreq_info *af_info) nogil: cdef: int roll_back npy_datetimestruct dts @@ -452,7 +455,7 @@ cdef int64_t asfreq_DTtoB(int64_t ordinal, asfreq_info *af_info): # ---------------------------------------------------------------------- # Conversion _from_ Daily Freq -cdef int64_t asfreq_DTtoA(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_DTtoA(int64_t ordinal, asfreq_info *af_info) nogil: cdef: npy_datetimestruct dts @@ -464,7 +467,7 @@ cdef int64_t asfreq_DTtoA(int64_t ordinal, asfreq_info *af_info): return (dts.year - 1970) -cdef int DtoQ_yq(int64_t ordinal, asfreq_info *af_info, int *year): +cdef int DtoQ_yq(int64_t ordinal, asfreq_info *af_info, int *year) nogil: cdef: npy_datetimestruct dts int quarter @@ -485,7 +488,7 @@ cdef int DtoQ_yq(int64_t ordinal, asfreq_info *af_info, int *year): return quarter -cdef int64_t asfreq_DTtoQ(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_DTtoQ(int64_t ordinal, asfreq_info *af_info) nogil: cdef: int year, quarter @@ -495,7 +498,7 @@ cdef int64_t asfreq_DTtoQ(int64_t ordinal, asfreq_info *af_info): return ((year - 1970) * 4 + quarter - 1) -cdef int64_t asfreq_DTtoM(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_DTtoM(int64_t ordinal, asfreq_info *af_info) nogil: cdef: npy_datetimestruct dts @@ -504,7 +507,7 @@ cdef int64_t asfreq_DTtoM(int64_t ordinal, asfreq_info *af_info): return ((dts.year - 1970) * 12 + dts.month - 1) -cdef int64_t asfreq_DTtoW(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_DTtoW(int64_t ordinal, asfreq_info *af_info) nogil: ordinal = downsample_daytime(ordinal, af_info) return (ordinal + 3 - af_info.to_end) // 7 + 1 @@ -512,30 +515,30 @@ cdef int64_t asfreq_DTtoW(int64_t ordinal, asfreq_info *af_info): # -------------------------------------------------------------------- # Conversion _from_ BusinessDay Freq -cdef int64_t asfreq_BtoDT(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_BtoDT(int64_t ordinal, asfreq_info *af_info) nogil: ordinal = ((ordinal + 3) // 5) * 7 + (ordinal + 3) % 5 -3 return upsample_daytime(ordinal, af_info) -cdef int64_t asfreq_BtoA(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_BtoA(int64_t ordinal, asfreq_info *af_info) nogil: return transform_via_day(ordinal, af_info, asfreq_BtoDT, asfreq_DTtoA) -cdef int64_t asfreq_BtoQ(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_BtoQ(int64_t ordinal, asfreq_info *af_info) nogil: return transform_via_day(ordinal, af_info, asfreq_BtoDT, asfreq_DTtoQ) -cdef int64_t asfreq_BtoM(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_BtoM(int64_t ordinal, asfreq_info *af_info) nogil: return transform_via_day(ordinal, af_info, asfreq_BtoDT, asfreq_DTtoM) -cdef int64_t asfreq_BtoW(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_BtoW(int64_t ordinal, asfreq_info *af_info) nogil: return transform_via_day(ordinal, af_info, asfreq_BtoDT, asfreq_DTtoW) @@ -544,25 +547,25 @@ cdef int64_t asfreq_BtoW(int64_t ordinal, asfreq_info *af_info): # ---------------------------------------------------------------------- # Conversion _from_ Annual Freq -cdef int64_t asfreq_AtoA(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_AtoA(int64_t ordinal, asfreq_info *af_info) nogil: return transform_via_day(ordinal, af_info, asfreq_AtoDT, asfreq_DTtoA) -cdef int64_t asfreq_AtoQ(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_AtoQ(int64_t ordinal, asfreq_info *af_info) nogil: return 
transform_via_day(ordinal, af_info, asfreq_AtoDT, asfreq_DTtoQ) -cdef int64_t asfreq_AtoM(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_AtoM(int64_t ordinal, asfreq_info *af_info) nogil: return transform_via_day(ordinal, af_info, asfreq_AtoDT, asfreq_DTtoM) -cdef int64_t asfreq_AtoW(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_AtoW(int64_t ordinal, asfreq_info *af_info) nogil: return transform_via_day(ordinal, af_info, asfreq_AtoDT, asfreq_DTtoW) @@ -571,25 +574,25 @@ cdef int64_t asfreq_AtoW(int64_t ordinal, asfreq_info *af_info): # ---------------------------------------------------------------------- # Conversion _from_ Quarterly Freq -cdef int64_t asfreq_QtoQ(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_QtoQ(int64_t ordinal, asfreq_info *af_info) nogil: return transform_via_day(ordinal, af_info, asfreq_QtoDT, asfreq_DTtoQ) -cdef int64_t asfreq_QtoA(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_QtoA(int64_t ordinal, asfreq_info *af_info) nogil: return transform_via_day(ordinal, af_info, asfreq_QtoDT, asfreq_DTtoA) -cdef int64_t asfreq_QtoM(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_QtoM(int64_t ordinal, asfreq_info *af_info) nogil: return transform_via_day(ordinal, af_info, asfreq_QtoDT, asfreq_DTtoM) -cdef int64_t asfreq_QtoW(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_QtoW(int64_t ordinal, asfreq_info *af_info) nogil: return transform_via_day(ordinal, af_info, asfreq_QtoDT, asfreq_DTtoW) @@ -598,19 +601,19 @@ cdef int64_t asfreq_QtoW(int64_t ordinal, asfreq_info *af_info): # ---------------------------------------------------------------------- # Conversion _from_ Monthly Freq -cdef int64_t asfreq_MtoA(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_MtoA(int64_t ordinal, asfreq_info *af_info) nogil: return transform_via_day(ordinal, af_info, asfreq_MtoDT, asfreq_DTtoA) -cdef int64_t asfreq_MtoQ(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_MtoQ(int64_t ordinal, asfreq_info *af_info) nogil: return transform_via_day(ordinal, af_info, asfreq_MtoDT, asfreq_DTtoQ) -cdef int64_t asfreq_MtoW(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_MtoW(int64_t ordinal, asfreq_info *af_info) nogil: return transform_via_day(ordinal, af_info, asfreq_MtoDT, asfreq_DTtoW) @@ -619,25 +622,25 @@ cdef int64_t asfreq_MtoW(int64_t ordinal, asfreq_info *af_info): # ---------------------------------------------------------------------- # Conversion _from_ Weekly Freq -cdef int64_t asfreq_WtoA(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_WtoA(int64_t ordinal, asfreq_info *af_info) nogil: return transform_via_day(ordinal, af_info, asfreq_WtoDT, asfreq_DTtoA) -cdef int64_t asfreq_WtoQ(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_WtoQ(int64_t ordinal, asfreq_info *af_info) nogil: return transform_via_day(ordinal, af_info, asfreq_WtoDT, asfreq_DTtoQ) -cdef int64_t asfreq_WtoM(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_WtoM(int64_t ordinal, asfreq_info *af_info) nogil: return transform_via_day(ordinal, af_info, asfreq_WtoDT, asfreq_DTtoM) -cdef int64_t asfreq_WtoW(int64_t ordinal, asfreq_info *af_info): +cdef int64_t asfreq_WtoW(int64_t ordinal, asfreq_info *af_info) nogil: return transform_via_day(ordinal, af_info, asfreq_WtoDT, asfreq_DTtoW) @@ -971,7 +974,7 @@ cdef int get_yq(int64_t ordinal, int freq, int *quarter, int *year): return qtr_freq -cdef inline int month_to_quarter(int month): +cdef inline int month_to_quarter(int 
month) nogil: return (month - 1) // 3 + 1 @@ -1024,9 +1027,6 @@ def periodarr_to_dt64arr(int64_t[:] periodarr, int freq): with nogil: for i in range(l): - if periodarr[i] == NPY_NAT: - out[i] = NPY_NAT - continue out[i] = period_ordinal_to_dt64(periodarr[i], freq) return out.base # .base to access underlying np.ndarray diff --git a/pandas/_libs/tslibs/src/datetime/np_datetime.c b/pandas/_libs/tslibs/src/datetime/np_datetime.c index 866c9ca9d3ac7..87866d804503e 100644 --- a/pandas/_libs/tslibs/src/datetime/np_datetime.c +++ b/pandas/_libs/tslibs/src/datetime/np_datetime.c @@ -30,7 +30,7 @@ This file is derived from NumPy 1.7. See NUMPY_LICENSE.txt #if PY_MAJOR_VERSION >= 3 #define PyInt_AsLong PyLong_AsLong -#endif +#endif // PyInt_AsLong const npy_datetimestruct _NS_MIN_DTS = { 1677, 9, 21, 0, 12, 43, 145225, 0, 0}; diff --git a/pandas/_libs/tslibs/src/datetime/np_datetime_strings.c b/pandas/_libs/tslibs/src/datetime/np_datetime_strings.c index 05ccdd13598fb..207da4b8f8340 100644 --- a/pandas/_libs/tslibs/src/datetime/np_datetime_strings.c +++ b/pandas/_libs/tslibs/src/datetime/np_datetime_strings.c @@ -609,7 +609,7 @@ int make_iso_8601_datetime(npy_datetimestruct *dts, char *outstr, int outlen, tmplen = _snprintf(substr, sublen, "%04" NPY_INT64_FMT, dts->year); #else tmplen = snprintf(substr, sublen, "%04" NPY_INT64_FMT, dts->year); -#endif +#endif // _WIN32 /* If it ran out of space or there isn't space for the NULL terminator */ if (tmplen < 0 || tmplen > sublen) { goto string_too_short; diff --git a/pandas/_libs/tslibs/strptime.pyx b/pandas/_libs/tslibs/strptime.pyx index 87658ae92175e..e7674c5c364a1 100644 --- a/pandas/_libs/tslibs/strptime.pyx +++ b/pandas/_libs/tslibs/strptime.pyx @@ -54,7 +54,10 @@ cdef dict _parse_code_table = {'y': 0, 'W': 16, 'Z': 17, 'p': 18, # an additional key, only with I - 'z': 19} + 'z': 19, + 'G': 20, + 'V': 21, + 'u': 22} def array_strptime(object[:] values, object fmt, @@ -77,6 +80,7 @@ def array_strptime(object[:] values, object fmt, object[:] result_timezone int year, month, day, minute, hour, second, weekday, julian int week_of_year, week_of_year_start, parse_code, ordinal + int iso_week, iso_year int64_t us, ns object val, group_key, ampm, found, timezone dict found_key @@ -169,13 +173,14 @@ def array_strptime(object[:] values, object fmt, raise ValueError("time data %r does not match format " "%r (search)" % (values[i], fmt)) + iso_year = -1 year = 1900 month = day = 1 hour = minute = second = ns = us = 0 timezone = None # Default to -1 to signify that values not known; not critical to have, # though - week_of_year = -1 + iso_week = week_of_year = -1 week_of_year_start = -1 # weekday and julian defaulted to -1 so as to signal need to calculate # values @@ -265,13 +270,44 @@ def array_strptime(object[:] values, object fmt, timezone = pytz.timezone(found_dict['Z']) elif parse_code == 19: timezone = parse_timezone_directive(found_dict['z']) + elif parse_code == 20: + iso_year = int(found_dict['G']) + elif parse_code == 21: + iso_week = int(found_dict['V']) + elif parse_code == 22: + weekday = int(found_dict['u']) + weekday -= 1 + + # don't assume default values for ISO week/year + if iso_year != -1: + if iso_week == -1 or weekday == -1: + raise ValueError("ISO year directive '%G' must be used with " + "the ISO week directive '%V' and a weekday " + "directive '%A', '%a', '%w', or '%u'.") + if julian != -1: + raise ValueError("Day of the year directive '%j' is not " + "compatible with ISO year directive '%G'. 
" + "Use '%Y' instead.") + elif year != -1 and week_of_year == -1 and iso_week != -1: + if weekday == -1: + raise ValueError("ISO week directive '%V' must be used with " + "the ISO year directive '%G' and a weekday " + "directive '%A', '%a', '%w', or '%u'.") + else: + raise ValueError("ISO week directive '%V' is incompatible with" + " the year directive '%Y'. Use the ISO year " + "'%G' instead.") # If we know the wk of the year and what day of that wk, we can figure # out the Julian day of the year. - if julian == -1 and week_of_year != -1 and weekday != -1: - week_starts_Mon = True if week_of_year_start == 0 else False - julian = _calc_julian_from_U_or_W(year, week_of_year, weekday, - week_starts_Mon) + if julian == -1 and weekday != -1: + if week_of_year != -1: + week_starts_Mon = week_of_year_start == 0 + julian = _calc_julian_from_U_or_W(year, week_of_year, weekday, + week_starts_Mon) + elif iso_year != -1 and iso_week != -1: + year, julian = _calc_julian_from_V(iso_year, iso_week, + weekday + 1) # Cannot pre-calculate datetime_date() since can change in Julian # calculation and thus could have different value for the day of the wk # calculation. @@ -511,6 +547,7 @@ class TimeRE(dict): # The " \d" part of the regex is to make %c from ANSI C work 'd': r"(?P3[0-1]|[1-2]\d|0[1-9]|[1-9]| [1-9])", 'f': r"(?P[0-9]{1,9})", + 'G': r"(?P\d\d\d\d)", 'H': r"(?P2[0-3]|[0-1]\d|\d)", 'I': r"(?P1[0-2]|0[1-9]|[1-9])", 'j': (r"(?P36[0-6]|3[0-5]\d|[1-2]\d\d|0[1-9]\d|00[1-9]|" @@ -518,7 +555,9 @@ class TimeRE(dict): 'm': r"(?P1[0-2]|0[1-9]|[1-9])", 'M': r"(?P[0-5]\d|\d)", 'S': r"(?P6[0-1]|[0-5]\d|\d)", + 'u': r"(?P[1-7])", 'U': r"(?P5[0-3]|[0-4]\d|\d)", + 'V': r"(?P5[0-3]|0[1-9]|[1-4]\d|\d)", 'w': r"(?P[0-6])", # W is set below by using 'U' 'y': r"(?P\d\d)", @@ -597,7 +636,16 @@ cdef _calc_julian_from_U_or_W(int year, int week_of_year, int day_of_week, int week_starts_Mon): """Calculate the Julian day based on the year, week of the year, and day of the week, with week_start_day representing whether the week of the year - assumes the week starts on Sunday or Monday (6 or 0).""" + assumes the week starts on Sunday or Monday (6 or 0). + + :param year: the year + :param week_of_year: week taken from format U or W + :param day_of_week: weekday + :param week_starts_Mon: represent whether the week of the year + assumes the week starts on Sunday or Monday (6 or 0) + :returns: converted julian day. + :rtype: int + """ cdef: int first_weekday, week_0_length, days_to_week @@ -620,6 +668,31 @@ cdef _calc_julian_from_U_or_W(int year, int week_of_year, return 1 + days_to_week + day_of_week +cdef _calc_julian_from_V(int iso_year, int iso_week, int iso_weekday): + """Calculate the Julian day based on the ISO 8601 year, week, and weekday. + ISO weeks start on Mondays, with week 01 being the week containing 4 Jan. + ISO week days range from 1 (Monday) to 7 (Sunday). 
+ + :param iso_year: the year taken from format %G + :param iso_week: the week taken from format %V + :param iso_weekday: weekday taken from format %u + :returns: the year (adjusted into the previous calendar year when needed) and the Gregorian ordinal day within that year + :rtype: (int, int) + """ + + cdef: + int correction, ordinal + correction = datetime_date(iso_year, 1, 4).isoweekday() + 3 + ordinal = (iso_week * 7) + iso_weekday - correction + # ordinal may be negative or 0 now, which means the date is in the previous + # calendar year + if ordinal < 1: + ordinal += datetime_date(iso_year, 1, 1).toordinal() + iso_year -= 1 + ordinal -= datetime_date(iso_year, 1, 1).toordinal() + return iso_year, ordinal + + cdef parse_timezone_directive(object z): """ Parse the '%z' directive and return a pytz.FixedOffset diff --git a/pandas/_libs/tslibs/timedeltas.pyx b/pandas/_libs/tslibs/timedeltas.pyx index 0a19d8749fc7c..6e40063fb925a 100644 --- a/pandas/_libs/tslibs/timedeltas.pyx +++ b/pandas/_libs/tslibs/timedeltas.pyx @@ -824,6 +824,26 @@ cdef class _Timedelta(timedelta): """ Returns a numpy.timedelta64 object with 'ns' precision """ return np.timedelta64(self.value, 'ns') + def to_numpy(self, dtype=None, copy=False): + """ + Convert the Timedelta to a NumPy timedelta64. + + .. versionadded:: 0.25.0 + + This is an alias method for `Timedelta.to_timedelta64()`. The dtype and + copy parameters are available here only for compatibility. Their values + will not affect the return value. + + Returns + ------- + numpy.timedelta64 + + See Also + -------- + Series.to_numpy : Similar method for Series. + """ + return self.to_timedelta64() + def total_seconds(self): """ Total duration of timedelta in seconds (to ns precision) @@ -1127,10 +1147,11 @@ class Timedelta(_Timedelta): 'ms', 'milliseconds', 'millisecond', 'milli', 'millis', 'L', 'us', 'microseconds', 'microsecond', 'micro', 'micros', 'U', 'ns', 'nanoseconds', 'nano', 'nanos', 'nanosecond', 'N'} - days, seconds, microseconds, - milliseconds, minutes, hours, weeks : numeric, optional + **kwargs + Available kwargs: {days, seconds, microseconds, + milliseconds, minutes, hours, weeks}. Values for construction in compat with datetime.timedelta. - np ints and floats will be coerced to python ints and floats. + Numpy ints and floats will be coerced to python ints and floats. Notes ----- @@ -1158,6 +1179,11 @@ class Timedelta(_Timedelta): "[weeks, days, hours, minutes, seconds, " "milliseconds, microseconds, nanoseconds]") + if unit in {'Y', 'y', 'M'}: + warnings.warn("M and Y units are deprecated and " + "will be removed in a future version.", + FutureWarning, stacklevel=1) + if isinstance(value, Timedelta): value = value.value elif is_string_object(value): diff --git a/pandas/_libs/tslibs/timestamps.pyx b/pandas/_libs/tslibs/timestamps.pyx index fe0564cb62c30..8d825e0a6179e 100644 --- a/pandas/_libs/tslibs/timestamps.pyx +++ b/pandas/_libs/tslibs/timestamps.pyx @@ -1,4 +1,5 @@ # -*- coding: utf-8 -*- +import sys import warnings from cpython cimport (PyObject_RichCompareBool, PyObject_RichCompare, @@ -43,10 +44,11 @@ from pandas._libs.tslibs.timezones import UTC # Constants _zero_time = datetime_time(0, 0) _no_input = object() - +PY36 = sys.version_info >= (3, 6) # ---------------------------------------------------------------------- + def maybe_integer_op_deprecated(obj): # GH#22535 add/sub of integers and int-arrays is deprecated if obj.freq is not None: @@ -197,7 +199,7 @@ def round_nsint64(values, mode, freq): # This is PITA. 
Because we inherit from datetime, which has very specific # construction requirements, we need to do object instantiation in python -# (see Timestamp class above). This will serve as a C extension type that +# (see Timestamp class below). This will serve as a C extension type that # shadows the python class, where we do any heavy lifting. cdef class _Timestamp(datetime): @@ -338,9 +340,31 @@ cdef class _Timestamp(datetime): self.microsecond, self.tzinfo) cpdef to_datetime64(self): - """ Returns a numpy.datetime64 object with 'ns' precision """ + """ + Return a numpy.datetime64 object with 'ns' precision. + """ return np.datetime64(self.value, 'ns') + def to_numpy(self, dtype=None, copy=False): + """ + Convert the Timestamp to a NumPy datetime64. + + .. versionadded:: 0.25.0 + + This is an alias method for `Timestamp.to_datetime64()`. The dtype and + copy parameters are available here only for compatibility. Their values + will not affect the return value. + + Returns + ------- + numpy.datetime64 + + See Also + -------- + DatetimeIndex.to_numpy : Similar method for DatetimeIndex. + """ + return self.to_datetime64() + def __add__(self, other): cdef: int64_t other_int, nanos @@ -500,6 +524,9 @@ cdef class _Timestamp(datetime): @property def asm8(self): + """ + Return numpy datetime64 format in nanoseconds. + """ return np.datetime64(self.value, 'ns') @property @@ -566,15 +593,18 @@ class Timestamp(_Timestamp): Using the primary calling convention: This converts a datetime-like string + >>> pd.Timestamp('2017-01-01T12') Timestamp('2017-01-01 12:00:00') This converts a float representing a Unix epoch in units of seconds + >>> pd.Timestamp(1513393355.5, unit='s') Timestamp('2017-12-16 03:02:35.500000') This converts an int representing a Unix-epoch in units of seconds and for a particular timezone + >>> pd.Timestamp(1513393355, unit='s', tz='US/Pacific') Timestamp('2017-12-15 19:02:35-0800', tz='US/Pacific') @@ -612,7 +642,7 @@ class Timestamp(_Timestamp): """ Timestamp.now(tz=None) - Returns new Timestamp object representing current time local to + Return new Timestamp object representing current time local to tz. Parameters @@ -667,10 +697,21 @@ class Timestamp(_Timestamp): """ return cls(datetime.fromtimestamp(ts)) + # Issue 25016. + @classmethod + def strptime(cls, date_string, format): + """ + Timestamp.strptime(string, format) + + Function is not implemented. Use pd.to_datetime(). + """ + raise NotImplementedError("Timestamp.strptime() is not implemented. " + "Use to_datetime() to parse date strings.") + @classmethod def combine(cls, date, time): """ - Timsetamp.combine(date, time) + Timestamp.combine(date, time) date, time -> datetime with same date and time fields """ @@ -930,6 +971,9 @@ class Timestamp(_Timestamp): @property def dayofweek(self): + """ + Return day of the week. + """ return self.weekday() def day_name(self, locale=None): @@ -979,30 +1023,48 @@ class Timestamp(_Timestamp): @property def dayofyear(self): + """ + Return the day of the year. + """ return ccalendar.get_day_of_year(self.year, self.month, self.day) @property def week(self): + """ + Return the week number of the year. + """ return ccalendar.get_week_of_year(self.year, self.month, self.day) weekofyear = week @property def quarter(self): + """ + Return the quarter of the year. + """ return ((self.month - 1) // 3) + 1 @property def days_in_month(self): + """ + Return the number of days in the month. 
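The two `to_numpy` aliases introduced above (for `Timestamp` here and for `Timedelta` in the earlier hunk) simply defer to the existing converters; a small usage sketch:

```python
import pandas as pd

ts = pd.Timestamp("2019-03-14 15:32:52")
td = pd.Timedelta(days=1, seconds=30)

# dtype/copy are accepted only for interface compatibility and ignored.
ts.to_numpy()  # numpy.datetime64('2019-03-14T15:32:52.000000000')
td.to_numpy()  # numpy.timedelta64(86430000000000,'ns')
```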
+ """ return ccalendar.get_days_in_month(self.year, self.month) daysinmonth = days_in_month @property def freqstr(self): + """ + Return the total number of days in the month. + """ return getattr(self.freq, 'freqstr', self.freq) @property def is_month_start(self): + """ + Return True if date is first day of month. + """ if self.freq is None: # fast-path for non-business frequencies return self.day == 1 @@ -1010,6 +1072,9 @@ class Timestamp(_Timestamp): @property def is_month_end(self): + """ + Return True if date is last day of month. + """ if self.freq is None: # fast-path for non-business frequencies return self.day == self.days_in_month @@ -1017,6 +1082,9 @@ class Timestamp(_Timestamp): @property def is_quarter_start(self): + """ + Return True if date is first day of the quarter. + """ if self.freq is None: # fast-path for non-business frequencies return self.day == 1 and self.month % 3 == 1 @@ -1024,6 +1092,9 @@ class Timestamp(_Timestamp): @property def is_quarter_end(self): + """ + Return True if date is last day of the quarter. + """ if self.freq is None: # fast-path for non-business frequencies return (self.month % 3) == 0 and self.day == self.days_in_month @@ -1031,6 +1102,9 @@ class Timestamp(_Timestamp): @property def is_year_start(self): + """ + Return True if date is first day of the year. + """ if self.freq is None: # fast-path for non-business frequencies return self.day == self.month == 1 @@ -1038,6 +1112,9 @@ class Timestamp(_Timestamp): @property def is_year_end(self): + """ + Return True if date is last day of the year. + """ if self.freq is None: # fast-path for non-business frequencies return self.month == 12 and self.day == 31 @@ -1045,6 +1122,9 @@ class Timestamp(_Timestamp): @property def is_leap_year(self): + """ + Return True if year is a leap year. 
+ """ return bool(ccalendar.is_leapyear(self.year)) def tz_localize(self, tz, ambiguous='raise', nonexistent='raise', @@ -1138,12 +1218,12 @@ class Timestamp(_Timestamp): value = tz_localize_to_utc(np.array([self.value], dtype='i8'), tz, ambiguous=ambiguous, nonexistent=nonexistent)[0] - return Timestamp(value, tz=tz) + return Timestamp(value, tz=tz, freq=self.freq) else: if tz is None: # reset tz value = tz_convert_single(self.value, UTC, self.tz) - return Timestamp(value, tz=None) + return Timestamp(value, tz=tz, freq=self.freq) else: raise TypeError('Cannot localize tz-aware Timestamp, use ' 'tz_convert for conversions') @@ -1173,7 +1253,7 @@ class Timestamp(_Timestamp): 'tz_localize to localize') else: # Same UTC timestamp, different time zone - return Timestamp(self.value, tz=tz) + return Timestamp(self.value, tz=tz, freq=self.freq) astimezone = tz_convert @@ -1195,7 +1275,6 @@ class Timestamp(_Timestamp): nanosecond : int, optional tzinfo : tz-convertible, optional fold : int, optional, default is 0 - added in 3.6, NotImplemented Returns ------- @@ -1252,12 +1331,16 @@ class Timestamp(_Timestamp): # see GH#18319 ts_input = _tzinfo.localize(datetime(dts.year, dts.month, dts.day, dts.hour, dts.min, dts.sec, - dts.us)) + dts.us), + is_dst=not bool(fold)) _tzinfo = ts_input.tzinfo else: - ts_input = datetime(dts.year, dts.month, dts.day, - dts.hour, dts.min, dts.sec, dts.us, - tzinfo=_tzinfo) + kwargs = {'year': dts.year, 'month': dts.month, 'day': dts.day, + 'hour': dts.hour, 'minute': dts.min, 'second': dts.sec, + 'microsecond': dts.us, 'tzinfo': _tzinfo} + if PY36: + kwargs['fold'] = fold + ts_input = datetime(**kwargs) ts = convert_datetime_to_tsobject(ts_input, _tzinfo) value = ts.value + (dts.ps // 1000) diff --git a/pandas/compat/__init__.py b/pandas/compat/__init__.py index f9c659106a516..4036af85b7212 100644 --- a/pandas/compat/__init__.py +++ b/pandas/compat/__init__.py @@ -9,7 +9,6 @@ * lists: lrange(), lmap(), lzip(), lfilter() * unicode: u() [no unicode builtin in Python 3] * longs: long (int in Python 3) -* callable * iterable method compatibility: iteritems, iterkeys, itervalues * Uses the original method if available, otherwise uses items, keys, values. * types: @@ -138,6 +137,7 @@ def lfilter(*args, **kwargs): reload = reload Hashable = collections.abc.Hashable Iterable = collections.abc.Iterable + Iterator = collections.abc.Iterator Mapping = collections.abc.Mapping MutableMapping = collections.abc.MutableMapping Sequence = collections.abc.Sequence @@ -200,6 +200,7 @@ def get_range_parameters(data): Hashable = collections.Hashable Iterable = collections.Iterable + Iterator = collections.Iterator Mapping = collections.Mapping MutableMapping = collections.MutableMapping Sequence = collections.Sequence @@ -378,14 +379,6 @@ class ResourceWarning(Warning): string_and_binary_types = string_types + (binary_type,) -try: - # callable reintroduced in later versions of Python - callable = callable -except NameError: - def callable(obj): - return any("__call__" in klass.__dict__ for klass in type(obj).__mro__) - - if PY2: # In PY2 functools.wraps doesn't provide metadata pytest needs to generate # decorated tests using parametrization. 
See pytest GH issue #2782 @@ -411,8 +404,6 @@ def wrapper(cls): return metaclass(cls.__name__, cls.__bases__, orig_vars) return wrapper -from collections import OrderedDict, Counter - if PY3: def raise_with_traceback(exc, traceback=Ellipsis): if traceback == Ellipsis: diff --git a/pandas/compat/numpy/__init__.py b/pandas/compat/numpy/__init__.py index 5e67cf2ee2837..6e9f768d8bd68 100644 --- a/pandas/compat/numpy/__init__.py +++ b/pandas/compat/numpy/__init__.py @@ -12,6 +12,8 @@ _np_version_under1p13 = _nlv < LooseVersion('1.13') _np_version_under1p14 = _nlv < LooseVersion('1.14') _np_version_under1p15 = _nlv < LooseVersion('1.15') +_np_version_under1p16 = _nlv < LooseVersion('1.16') +_np_version_under1p17 = _nlv < LooseVersion('1.17') if _nlv < '1.12': @@ -64,5 +66,7 @@ def np_array_datetime64_compat(arr, *args, **kwargs): __all__ = ['np', '_np_version_under1p13', '_np_version_under1p14', - '_np_version_under1p15' + '_np_version_under1p15', + '_np_version_under1p16', + '_np_version_under1p17' ] diff --git a/pandas/compat/numpy/function.py b/pandas/compat/numpy/function.py index 417ddd0d8af17..f15783ad642b4 100644 --- a/pandas/compat/numpy/function.py +++ b/pandas/compat/numpy/function.py @@ -17,10 +17,10 @@ and methods that are spread throughout the codebase. This module will make it easier to adjust to future upstream changes in the analogous numpy signatures. """ +from collections import OrderedDict from numpy import ndarray -from pandas.compat import OrderedDict from pandas.errors import UnsupportedFunctionCall from pandas.util._validators import ( validate_args, validate_args_and_kwargs, validate_kwargs) diff --git a/pandas/compat/pickle_compat.py b/pandas/compat/pickle_compat.py index 61295b8249f58..8f16f8154b952 100644 --- a/pandas/compat/pickle_compat.py +++ b/pandas/compat/pickle_compat.py @@ -201,7 +201,7 @@ def load_newobj_ex(self): pass -def load(fh, encoding=None, compat=False, is_verbose=False): +def load(fh, encoding=None, is_verbose=False): """load a pickle, with a provided encoding if compat is True: @@ -212,7 +212,6 @@ def load(fh, encoding=None, compat=False, is_verbose=False): ---------- fh : a filelike object encoding : an optional encoding - compat : provide Series compatibility mode, boolean, default False is_verbose : show exception output """ diff --git a/pandas/core/accessor.py b/pandas/core/accessor.py index 961488ff12e58..050749741e7bd 100644 --- a/pandas/core/accessor.py +++ b/pandas/core/accessor.py @@ -16,11 +16,15 @@ class DirNamesMixin(object): ['asobject', 'base', 'data', 'flags', 'itemsize', 'strides']) def _dir_deletions(self): - """ delete unwanted __dir__ for this object """ + """ + Delete unwanted __dir__ for this object. + """ return self._accessors | self._deprecations def _dir_additions(self): - """ add additional __dir__ for this object """ + """ + Add additional __dir__ for this object. + """ rv = set() for accessor in self._accessors: try: @@ -33,7 +37,7 @@ def _dir_additions(self): def __dir__(self): """ Provide method name lookup and completion - Only provide 'public' methods + Only provide 'public' methods. """ rv = set(dir(type(self))) rv = (rv - self._dir_deletions()) | self._dir_additions() @@ -42,7 +46,7 @@ def __dir__(self): class PandasDelegate(object): """ - an abstract base class for delegating methods/properties + An abstract base class for delegating methods/properties. 
""" def _delegate_property_get(self, name, *args, **kwargs): @@ -65,10 +69,10 @@ def _add_delegate_accessors(cls, delegate, accessors, typ, ---------- cls : the class to add the methods/properties to delegate : the class to get methods/properties & doc-strings - acccessors : string list of accessors to add + accessors : string list of accessors to add typ : 'property' or 'method' overwrite : boolean, default False - overwrite the method/property in the target class if it exists + overwrite the method/property in the target class if it exists. """ def _create_delegator_property(name): @@ -117,7 +121,7 @@ def delegate_names(delegate, accessors, typ, overwrite=False): ---------- delegate : object the class to get methods/properties & doc-strings - acccessors : Sequence[str] + accessors : Sequence[str] List of accessor to add typ : {'property', 'method'} overwrite : boolean, default False diff --git a/pandas/core/algorithms.py b/pandas/core/algorithms.py index b473a7aef929e..4a71951e2435e 100644 --- a/pandas/core/algorithms.py +++ b/pandas/core/algorithms.py @@ -19,7 +19,7 @@ ensure_float64, ensure_int64, ensure_object, ensure_platform_int, ensure_uint64, is_array_like, is_bool_dtype, is_categorical_dtype, is_complex_dtype, is_datetime64_any_dtype, is_datetime64tz_dtype, - is_datetimelike, is_extension_array_dtype, is_float_dtype, + is_datetimelike, is_extension_array_dtype, is_float_dtype, is_integer, is_integer_dtype, is_interval_dtype, is_list_like, is_numeric_dtype, is_object_dtype, is_period_dtype, is_scalar, is_signed_integer_dtype, is_sparse, is_timedelta64_dtype, is_unsigned_integer_dtype, @@ -288,15 +288,20 @@ def unique(values): Returns ------- - unique values. - - If the input is an Index, the return is an Index - - If the input is a Categorical dtype, the return is a Categorical - - If the input is a Series/ndarray, the return will be an ndarray + numpy.ndarray or ExtensionArray + + The return can be: + + * Index : when the input is an Index + * Categorical : when the input is a Categorical dtype + * ndarray : when the input is a Series/ndarray + + Return numpy.ndarray or ExtensionArray. See Also -------- - pandas.Index.unique - pandas.Series.unique + Index.unique + Series.unique Examples -------- @@ -563,7 +568,7 @@ def _factorize_array(values, na_sentinel=-1, size_hint=None, coerced to ndarrays before factorization. """), order=dedent("""\ - order + order : None .. deprecated:: 0.23.0 This parameter has no effect and is deprecated. @@ -1724,6 +1729,89 @@ def func(arr, indexer, out, fill_value=np.nan): return out +# ------------ # +# searchsorted # +# ------------ # + +def searchsorted(arr, value, side="left", sorter=None): + """ + Find indices where elements should be inserted to maintain order. + + .. versionadded:: 0.25.0 + + Find the indices into a sorted array `arr` (a) such that, if the + corresponding elements in `value` were inserted before the indices, + the order of `arr` would be preserved. + + Assuming that `arr` is sorted: + + ====== ================================ + `side` returned index `i` satisfies + ====== ================================ + left ``arr[i-1] < value <= self[i]`` + right ``arr[i-1] <= value < self[i]`` + ====== ================================ + + Parameters + ---------- + arr: array-like + Input array. If `sorter` is None, then it must be sorted in + ascending order, otherwise `sorter` must be an array of indices + that sort it. + value : array_like + Values to insert into `arr`. 
+ side : {'left', 'right'}, optional + If 'left', the index of the first suitable location found is given. + If 'right', return the last such index. If there is no suitable + index, return either 0 or N (where N is the length of `arr`). + sorter : 1-D array_like, optional + Optional array of integer indices that sort `arr` into ascending + order. They are typically the result of argsort. + + Returns + ------- + array of ints + Array of insertion points with the same shape as `value`. + + See Also + -------- + numpy.searchsorted : Similar method from NumPy. + """ + if sorter is not None: + sorter = ensure_platform_int(sorter) + + if isinstance(arr, np.ndarray) and is_integer_dtype(arr) and ( + is_integer(value) or is_integer_dtype(value)): + from .arrays.array_ import array + # if `arr` and `value` have different dtypes, `arr` would be + # recast by numpy, causing a slow search. + # Before searching below, we therefore try to give `value` the + # same dtype as `arr`, while guarding against integer overflows. + iinfo = np.iinfo(arr.dtype.type) + value_arr = np.array([value]) if is_scalar(value) else np.array(value) + if (value_arr >= iinfo.min).all() and (value_arr <= iinfo.max).all(): + # value within bounds, so no overflow, so can convert value dtype + # to dtype of arr + dtype = arr.dtype + else: + dtype = value_arr.dtype + + if is_scalar(value): + value = dtype.type(value) + else: + value = array(value, dtype=dtype) + elif not (is_object_dtype(arr) or is_numeric_dtype(arr) or + is_categorical_dtype(arr)): + from pandas.core.series import Series + # E.g. if `arr` is an array with dtype='datetime64[ns]' + # and `value` is a pd.Timestamp, we may need to convert value + value_ser = Series(value)._values + value = value_ser[0] if is_scalar(value) else value_ser + + result = arr.searchsorted(value, side=side, sorter=sorter) + return result + + # ---- # # diff # # ---- # diff --git a/pandas/core/arrays/base.py b/pandas/core/arrays/base.py index 7aaefef3d03e5..e770281596134 100644 --- a/pandas/core/arrays/base.py +++ b/pandas/core/arrays/base.py @@ -555,17 +555,17 @@ def searchsorted(self, value, side="left", sorter=None): .. versionadded:: 0.24.0 Find the indices into a sorted array `self` (a) such that, if the - corresponding elements in `v` were inserted before the indices, the - order of `self` would be preserved. + corresponding elements in `value` were inserted before the indices, + the order of `self` would be preserved. - Assuming that `a` is sorted: + Assuming that `self` is sorted: - ====== ============================ + ====== ================================ `side` returned index `i` satisfies - ====== ============================ - left ``self[i-1] < v <= self[i]`` - right ``self[i-1] <= v < self[i]`` - ====== ============================ + ====== ================================ + left ``self[i-1] < value <= self[i]`` + right ``self[i-1] <= value < self[i]`` + ====== ================================ Parameters ---------- @@ -581,7 +581,7 @@ def searchsorted(self, value, side="left", sorter=None): Returns ------- - indices : array of ints + array of ints Array of insertion points with the same shape as `value`. 
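A sketch of how the new top-level helper is reached through `Series.searchsorted`, including the integer fast path described in the comments above:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3])
ser.searchsorted(4)       # insertion point 3 (append at the end)
ser.searchsorted([0, 4])  # array([0, 3])

# The fast path: a scalar that fits the array's dtype is cast to it,
# so NumPy does not have to upcast the whole int32 array to search.
arr = np.arange(1000000, dtype=np.int32)
pd.Series(arr).searchsorted(500000)  # 500000
```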
See Also diff --git a/pandas/core/arrays/categorical.py b/pandas/core/arrays/categorical.py index 35b662eaae9a5..7f77a5dcce613 100644 --- a/pandas/core/arrays/categorical.py +++ b/pandas/core/arrays/categorical.py @@ -214,7 +214,7 @@ def contains(cat, key, container): class Categorical(ExtensionArray, PandasObject): """ - Represents a categorical variable in classic R / S-plus fashion + Represent a categorical variable in classic R / S-plus fashion. `Categoricals` can only take on only a limited, and usually fixed, number of possible values (`categories`). In contrast to statistical categorical @@ -235,7 +235,7 @@ class Categorical(ExtensionArray, PandasObject): The unique categories for this categorical. If not given, the categories are assumed to be the unique values of `values` (sorted, if possible, otherwise in the order in which they appear). - ordered : boolean, (default False) + ordered : bool, default False Whether or not this categorical is treated as a ordered categorical. If True, the resulting categorical will be ordered. An ordered categorical respects, when sorted, the order of its @@ -253,7 +253,7 @@ class Categorical(ExtensionArray, PandasObject): codes : ndarray The codes (integer positions, which point to the categories) of this categorical, read only. - ordered : boolean + ordered : bool Whether or not this Categorical is ordered. dtype : CategoricalDtype The instance of ``CategoricalDtype`` storing the ``categories`` @@ -276,7 +276,7 @@ class Categorical(ExtensionArray, PandasObject): See Also -------- - pandas.api.types.CategoricalDtype : Type for categorical data. + api.types.CategoricalDtype : Type for categorical data. CategoricalIndex : An Index with an underlying ``Categorical``. Notes @@ -297,7 +297,7 @@ class Categorical(ExtensionArray, PandasObject): Ordered `Categoricals` can be sorted according to the custom order of the categories and can have a min and max value. - >>> c = pd.Categorical(['a','b','c','a','b','c'], ordered=True, + >>> c = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c'], ordered=True, ... categories=['c', 'b', 'a']) >>> c [a, b, c, a, b, c] @@ -323,14 +323,6 @@ def __init__(self, values, categories=None, ordered=None, dtype=None, # we may have dtype.categories be None, and we need to # infer categories in a factorization step futher below - if is_categorical(values): - # GH23814, for perf, if values._values already an instance of - # Categorical, set values to codes, and run fastpath - if (isinstance(values, (ABCSeries, ABCIndexClass)) and - isinstance(values._values, type(self))): - values = values._values.codes.copy() - fastpath = True - if fastpath: self._codes = coerce_indexer_dtype(values, dtype.categories) self._dtype = self._dtype.update_dtype(dtype) @@ -382,7 +374,7 @@ def __init__(self, values, categories=None, ordered=None, dtype=None, dtype = CategoricalDtype(categories, dtype.ordered) elif is_categorical_dtype(values): - old_codes = (values.cat.codes if isinstance(values, ABCSeries) + old_codes = (values._values.codes if isinstance(values, ABCSeries) else values.codes) codes = _recode_for_categories(old_codes, values.dtype.categories, dtype.categories) @@ -618,7 +610,7 @@ def from_codes(cls, codes, categories=None, ordered=None, dtype=None): ---------- codes : array-like, integers An integer array, where each integer points to a category in - categories or dtype.categories, or else is -1 for NaN + categories or dtype.categories, or else is -1 for NaN. categories : index-like, optional The categories for the categorical. 
Items need to be unique. If the categories are not given here, then they must be provided @@ -700,7 +692,7 @@ def _set_categories(self, categories, fastpath=False): Parameters ---------- - fastpath : boolean (default: False) + fastpath : bool, default False Don't perform validation of the categories for uniqueness or nulls Examples @@ -747,15 +739,15 @@ def _set_dtype(self, dtype): def set_ordered(self, value, inplace=False): """ - Sets the ordered attribute to the boolean value + Set the ordered attribute to the boolean value. Parameters ---------- - value : boolean to set whether this categorical is ordered (True) or - not (False) - inplace : boolean (default: False) - Whether or not to set the ordered attribute inplace or return a copy - of this categorical with ordered set to the value + value : bool + Set whether this categorical is ordered (True) or not (False). + inplace : bool, default False + Whether or not to set the ordered attribute in-place or return + a copy of this categorical with ordered set to the value. """ inplace = validate_bool_kwarg(inplace, 'inplace') new_dtype = CategoricalDtype(self.categories, ordered=value) @@ -770,9 +762,9 @@ def as_ordered(self, inplace=False): Parameters ---------- - inplace : boolean (default: False) - Whether or not to set the ordered attribute inplace or return a copy - of this categorical with ordered set to True + inplace : bool, default False + Whether or not to set the ordered attribute in-place or return + a copy of this categorical with ordered set to True. """ inplace = validate_bool_kwarg(inplace, 'inplace') return self.set_ordered(True, inplace=inplace) @@ -783,9 +775,9 @@ def as_unordered(self, inplace=False): Parameters ---------- - inplace : boolean (default: False) - Whether or not to set the ordered attribute inplace or return a copy - of this categorical with ordered set to False + inplace : bool, default False + Whether or not to set the ordered attribute in-place or return + a copy of this categorical with ordered set to False. """ inplace = validate_bool_kwarg(inplace, 'inplace') return self.set_ordered(False, inplace=inplace) @@ -793,7 +785,7 @@ def as_unordered(self, inplace=False): def set_categories(self, new_categories, ordered=None, rename=False, inplace=False): """ - Sets the categories to the specified new_categories. + Set the categories to the specified new_categories. `new_categories` can include new categories (which will result in unused categories) or remove old categories (which results in values @@ -815,19 +807,19 @@ def set_categories(self, new_categories, ordered=None, rename=False, ---------- new_categories : Index-like The categories in new order. - ordered : boolean, (default: False) + ordered : bool, default False Whether or not the categorical is treated as a ordered categorical. If not given, do not change the ordered information. - rename : boolean (default: False) + rename : bool, default False Whether or not the new_categories should be considered as a rename of the old categories or as reordered categories. - inplace : boolean (default: False) - Whether or not to reorder the categories inplace or return a copy of - this categorical with reordered categories. + inplace : bool, default False + Whether or not to reorder the categories in-place or return a copy + of this categorical with reordered categories. Returns ------- - cat : Categorical with reordered categories or None if inplace. + Categorical with reordered categories or None if inplace. 
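For reference, `set_categories` with `ordered=True` gives exactly the behaviour the docstring above describes; a brief sketch:

```python
import pandas as pd

c = pd.Categorical(['a', 'b', 'a'])
c = c.set_categories(['b', 'a', 'c'], ordered=True)
c.categories  # Index(['b', 'a', 'c'], dtype='object')
c.min()       # 'b' -- the smallest value under the custom order b < a < c
```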
Raises ------ @@ -864,7 +856,7 @@ def set_categories(self, new_categories, ordered=None, rename=False, def rename_categories(self, new_categories, inplace=False): """ - Renames categories. + Rename categories. Parameters ---------- @@ -890,7 +882,7 @@ def rename_categories(self, new_categories, inplace=False): Currently, Series are considered list like. In a future version of pandas they'll be considered dict-like. - inplace : boolean (default: False) + inplace : bool, default False Whether or not to rename the categories inplace or return a copy of this categorical with renamed categories. @@ -958,7 +950,7 @@ def rename_categories(self, new_categories, inplace=False): def reorder_categories(self, new_categories, ordered=None, inplace=False): """ - Reorders categories as specified in new_categories. + Reorder categories as specified in new_categories. `new_categories` need to include all old categories and no new category items. @@ -967,10 +959,10 @@ def reorder_categories(self, new_categories, ordered=None, inplace=False): ---------- new_categories : Index-like The categories in new order. - ordered : boolean, optional + ordered : bool, optional Whether or not the categorical is treated as a ordered categorical. If not given, do not change the ordered information. - inplace : boolean (default: False) + inplace : bool, default False Whether or not to reorder the categories inplace or return a copy of this categorical with reordered categories. @@ -1010,7 +1002,7 @@ def add_categories(self, new_categories, inplace=False): ---------- new_categories : category or list-like of category The new categories to be included. - inplace : boolean (default: False) + inplace : bool, default False Whether or not to add the categories inplace or return a copy of this categorical with added categories. @@ -1051,7 +1043,7 @@ def add_categories(self, new_categories, inplace=False): def remove_categories(self, removals, inplace=False): """ - Removes the specified categories. + Remove the specified categories. `removals` must be included in the old categories. Values which were in the removed categories will be set to NaN @@ -1060,7 +1052,7 @@ def remove_categories(self, removals, inplace=False): ---------- removals : category or list of categories The categories which should be removed. - inplace : boolean (default: False) + inplace : bool, default False Whether or not to remove the categories inplace or return a copy of this categorical with removed categories. @@ -1104,11 +1096,11 @@ def remove_categories(self, removals, inplace=False): def remove_unused_categories(self, inplace=False): """ - Removes categories which are not used. + Remove categories which are not used. Parameters ---------- - inplace : boolean (default: False) + inplace : bool, default False Whether or not to drop unused categories inplace or return a copy of this categorical with unused categories dropped. @@ -1289,10 +1281,10 @@ def __array__(self, dtype=None): Returns ------- - values : numpy array + numpy.array A numpy array of either the specified dtype or, if dtype==None (default), the same dtype as - categorical.categories.dtype + categorical.categories.dtype. """ ret = take_1d(self.categories.values, self._codes) if dtype and not is_dtype_equal(dtype, self.categories.dtype): @@ -1454,13 +1446,13 @@ def dropna(self): def value_counts(self, dropna=True): """ - Returns a Series containing counts of each category. + Return a Series containing counts of each category. Every category will have an entry, even those with a count of 0. 
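The category-mutation methods being retitled here compose naturally; a short sketch:

```python
import pandas as pd

c = pd.Categorical(['a', 'a', 'b'])
c.rename_categories({'a': 'A'})  # dict-like renamer: [A, A, b]
# add an unused category, then drop it again:
c.add_categories(['c']).remove_unused_categories()  # categories ['a', 'b']
```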
Parameters ---------- - dropna : boolean, default True + dropna : bool, default True Don't include counts of NaN. Returns @@ -1499,9 +1491,9 @@ def get_values(self): Returns ------- - values : numpy array + numpy.array A numpy array of the same dtype as categorical.categories.dtype or - Index if datetime / periods + Index if datetime / periods. """ # if we are a datetime and period index, return Index to keep metadata if is_datetimelike(self.categories): @@ -1540,7 +1532,7 @@ def argsort(self, *args, **kwargs): Returns ------- - argsorted : numpy array + numpy.array See Also -------- @@ -1570,7 +1562,7 @@ def argsort(self, *args, **kwargs): def sort_values(self, inplace=False, ascending=True, na_position='last'): """ - Sorts the Categorical by category value returning a new + Sort the Categorical by category value returning a new Categorical by default. While an ordering is applied to the category values, sorting in this @@ -1581,9 +1573,9 @@ def sort_values(self, inplace=False, ascending=True, na_position='last'): Parameters ---------- - inplace : boolean, default False + inplace : bool, default False Do operation in place. - ascending : boolean, default True + ascending : bool, default True Order ascending. Passing False orders descending. The ordering parameter provides the method by which the category values are organized. @@ -1593,7 +1585,7 @@ def sort_values(self, inplace=False, ascending=True, na_position='last'): Returns ------- - y : Categorical or None + Categorical or None See Also -------- @@ -1667,7 +1659,7 @@ def _values_for_rank(self): Returns ------- - numpy array + numpy.array """ from pandas import Series @@ -1695,7 +1687,7 @@ def ravel(self, order='C'): Returns ------- - raveled : numpy array + numpy.array """ return np.array(self) @@ -2167,13 +2159,12 @@ def _reverse_indexer(self): r, counts = libalgos.groupsort_indexer(self.codes.astype('int64'), categories.size) counts = counts.cumsum() - result = [r[counts[indexer]:counts[indexer + 1]] - for indexer in range(len(counts) - 1)] + result = (r[start:end] for start, end in zip(counts, counts[1:])) result = dict(zip(categories, result)) return result # reduction ops # - def _reduce(self, name, axis=0, skipna=True, **kwargs): + def _reduce(self, name, axis=0, **kwargs): func = getattr(self, name, None) if func is None: msg = 'Categorical cannot perform the operation {op}' @@ -2240,7 +2231,7 @@ def mode(self, dropna=True): Parameters ---------- - dropna : boolean, default True + dropna : bool, default True Don't consider counts of NaN/NaT. .. 
versionadded:: 0.24.0 @@ -2321,8 +2312,7 @@ def _values_for_factorize(self): @classmethod def _from_factorized(cls, uniques, original): return original._constructor(original.categories.take(uniques), - categories=original.categories, - ordered=original.ordered) + dtype=original.dtype) def equals(self, other): """ @@ -2334,7 +2324,7 @@ def equals(self, other): Returns ------- - are_equal : boolean + bool """ if self.is_dtype_equal(other): if self.categories.equals(other.categories): @@ -2358,7 +2348,7 @@ def is_dtype_equal(self, other): Returns ------- - are_equal : boolean + bool """ try: @@ -2627,6 +2617,9 @@ def _recode_for_categories(codes, old_categories, new_categories): if len(old_categories) == 0: # All null anyway, so just retain the nulls return codes.copy() + elif new_categories.equals(old_categories): + # Same categories, so no need to actually recode + return codes.copy() indexer = coerce_indexer_dtype(new_categories.get_indexer(old_categories), new_categories) new_codes = take_1d(indexer, codes.copy(), fill_value=-1) @@ -2674,9 +2667,7 @@ def _factorize_from_iterable(values): if is_categorical(values): if isinstance(values, (ABCCategoricalIndex, ABCSeries)): values = values._values - categories = CategoricalIndex(values.categories, - categories=values.categories, - ordered=values.ordered) + categories = CategoricalIndex(values.categories, dtype=values.dtype) codes = values.codes else: # The value of ordered is irrelevant since we don't use cat as such, diff --git a/pandas/core/arrays/datetimelike.py b/pandas/core/arrays/datetimelike.py index 73e799f9e0a36..94668c74c1693 100644 --- a/pandas/core/arrays/datetimelike.py +++ b/pandas/core/arrays/datetimelike.py @@ -144,7 +144,7 @@ def strftime(self, date_format): Return an Index of formatted strings specified by date_format, which supports the same string format as the python standard library. Details of the string format can be found in `python string format - doc <%(URL)s>`__ + doc <%(URL)s>`__. Parameters ---------- @@ -154,7 +154,7 @@ def strftime(self, date_format): Returns ------- Index - Index of formatted strings + Index of formatted strings. See Also -------- @@ -748,7 +748,7 @@ def _maybe_mask_results(self, result, fill_value=iNaT, convert=None): mask the result if needed, convert to the provided dtype if its not None - This is an internal routine + This is an internal routine. """ if self._hasnans: @@ -1047,7 +1047,7 @@ def _sub_period_array(self, other): Returns ------- result : np.ndarray[object] - Array of DateOffset objects; nulls represented by NaT + Array of DateOffset objects; nulls represented by NaT. 
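The `strftime` docstring touched above is easiest to read next to an example:

```python
import pandas as pd

idx = pd.date_range('2019-01-01', periods=2, freq='D')
idx.strftime('%B %d, %Y')
# Index(['January 01, 2019', 'January 02, 2019'], dtype='object')
```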
""" if not is_period_dtype(self): raise TypeError("cannot subtract {dtype}-dtype from {cls}" diff --git a/pandas/core/arrays/datetimes.py b/pandas/core/arrays/datetimes.py index d7a8417a71be2..75cf658423210 100644 --- a/pandas/core/arrays/datetimes.py +++ b/pandas/core/arrays/datetimes.py @@ -128,7 +128,7 @@ def _dt_array_cmp(cls, op): Wrap comparison operations to convert datetime-like to datetime64 """ opname = '__{name}__'.format(name=op.__name__) - nat_result = True if opname == '__ne__' else False + nat_result = opname == '__ne__' def wrapper(self, other): if isinstance(other, (ABCDataFrame, ABCSeries, ABCIndexClass)): @@ -720,11 +720,11 @@ def _sub_datetime_arraylike(self, other): self_i8 = self.asi8 other_i8 = other.asi8 + arr_mask = self._isnan | other._isnan new_values = checked_add_with_arr(self_i8, -other_i8, - arr_mask=self._isnan) + arr_mask=arr_mask) if self._hasnans or other._hasnans: - mask = (self._isnan) | (other._isnan) - new_values[mask] = iNaT + new_values[arr_mask] = iNaT return new_values.view('timedelta64[ns]') def _add_offset(self, offset): @@ -799,14 +799,14 @@ def tz_convert(self, tz): Parameters ---------- - tz : string, pytz.timezone, dateutil.tz.tzfile or None + tz : str, pytz.timezone, dateutil.tz.tzfile or None Time zone for time. Corresponding timestamps would be converted to this time zone of the Datetime Array/Index. A `tz` of None will convert to UTC and remove the timezone information. Returns ------- - normalized : same type as self + Array or Index Raises ------ @@ -842,7 +842,7 @@ def tz_convert(self, tz): With the ``tz=None``, we can remove the timezone (after converting to UTC if necessary): - >>> dti = pd.date_range(start='2014-08-01 09:00',freq='H', + >>> dti = pd.date_range(start='2014-08-01 09:00', freq='H', ... periods=3, tz='Europe/Berlin') >>> dti @@ -882,7 +882,7 @@ def tz_localize(self, tz, ambiguous='raise', nonexistent='raise', Parameters ---------- - tz : string, pytz.timezone, dateutil.tz.tzfile or None + tz : str, pytz.timezone, dateutil.tz.tzfile or None Time zone to convert timestamps to. Passing ``None`` will remove the time zone information preserving local time. ambiguous : 'infer', 'NaT', bool array, default 'raise' @@ -930,7 +930,7 @@ def tz_localize(self, tz, ambiguous='raise', nonexistent='raise', Returns ------- - result : same type as self + Same type as self Array/Index converted to the specified time zone. Raises @@ -970,43 +970,39 @@ def tz_localize(self, tz, ambiguous='raise', nonexistent='raise', Be careful with DST changes. When there is sequential data, pandas can infer the DST time: - >>> s = pd.to_datetime(pd.Series([ - ... '2018-10-28 01:30:00', - ... '2018-10-28 02:00:00', - ... '2018-10-28 02:30:00', - ... '2018-10-28 02:00:00', - ... '2018-10-28 02:30:00', - ... '2018-10-28 03:00:00', - ... '2018-10-28 03:30:00'])) + >>> s = pd.to_datetime(pd.Series(['2018-10-28 01:30:00', + ... '2018-10-28 02:00:00', + ... '2018-10-28 02:30:00', + ... '2018-10-28 02:00:00', + ... '2018-10-28 02:30:00', + ... '2018-10-28 03:00:00', + ... 
'2018-10-28 03:30:00'])) >>> s.dt.tz_localize('CET', ambiguous='infer') - 2018-10-28 01:30:00+02:00 0 - 2018-10-28 02:00:00+02:00 1 - 2018-10-28 02:30:00+02:00 2 - 2018-10-28 02:00:00+01:00 3 - 2018-10-28 02:30:00+01:00 4 - 2018-10-28 03:00:00+01:00 5 - 2018-10-28 03:30:00+01:00 6 - dtype: int64 + 0 2018-10-28 01:30:00+02:00 + 1 2018-10-28 02:00:00+02:00 + 2 2018-10-28 02:30:00+02:00 + 3 2018-10-28 02:00:00+01:00 + 4 2018-10-28 02:30:00+01:00 + 5 2018-10-28 03:00:00+01:00 + 6 2018-10-28 03:30:00+01:00 + dtype: datetime64[ns, CET] In some cases, inferring the DST is impossible. In such cases, you can pass an ndarray to the ambiguous parameter to set the DST explicitly - >>> s = pd.to_datetime(pd.Series([ - ... '2018-10-28 01:20:00', - ... '2018-10-28 02:36:00', - ... '2018-10-28 03:46:00'])) + >>> s = pd.to_datetime(pd.Series(['2018-10-28 01:20:00', + ... '2018-10-28 02:36:00', + ... '2018-10-28 03:46:00'])) >>> s.dt.tz_localize('CET', ambiguous=np.array([True, True, False])) - 0 2018-10-28 01:20:00+02:00 - 1 2018-10-28 02:36:00+02:00 - 2 2018-10-28 03:46:00+01:00 - dtype: datetime64[ns, CET] + 0 2018-10-28 01:20:00+02:00 + 1 2018-10-28 02:36:00+02:00 + 2 2018-10-28 03:46:00+01:00 + dtype: datetime64[ns, CET] If the DST transition causes nonexistent times, you can shift these dates forward or backwards with a timedelta object or `'shift_forward'` or `'shift_backwards'`. - >>> s = pd.to_datetime(pd.Series([ - ... '2015-03-29 02:30:00', - ... '2015-03-29 03:30:00'])) + >>> s = pd.to_datetime(pd.Series(['2015-03-29 02:30:00', + ... '2015-03-29 03:30:00'])) >>> s.dt.tz_localize('Europe/Warsaw', nonexistent='shift_forward') 0 2015-03-29 03:00:00+02:00 1 2015-03-29 03:30:00+02:00 @@ -1129,7 +1125,7 @@ def to_period(self, freq=None): Parameters ---------- - freq : string or Offset, optional + freq : str or Offset, optional One of pandas' :ref:`offset strings ` or an Offset object. Will be inferred by default. @@ -1150,7 +1146,7 @@ Examples -------- - >>> df = pd.DataFrame({"y": [1,2,3]}, + >>> df = pd.DataFrame({"y": [1, 2, 3]}, ... index=pd.to_datetime(["2000-03-31 00:00:00", ... "2000-05-31 00:00:00", ... "2000-08-31 00:00:00"])) @@ -2058,7 +2054,7 @@ def validate_tz_from_dtype(dtype, tz): # tz-naive dtype (i.e. datetime64[ns]) if tz is not None and not timezones.tz_compare(tz, dtz): raise ValueError("cannot supply both a tz and a " - "timezone-naive dtype (i.e. datetime64[ns]" + "timezone-naive dtype (i.e. 
datetime64[ns])") return tz diff --git a/pandas/core/arrays/integer.py b/pandas/core/arrays/integer.py index a6a4a49d3a939..fd90aec3b5e8c 100644 --- a/pandas/core/arrays/integer.py +++ b/pandas/core/arrays/integer.py @@ -561,7 +561,7 @@ def cmp_method(self, other): else: mask = self._mask | mask - result[mask] = True if op_name == 'ne' else False + result[mask] = op_name == 'ne' return result name = '__{name}__'.format(name=op.__name__) diff --git a/pandas/core/arrays/numpy_.py b/pandas/core/arrays/numpy_.py index 47517782e2bbf..8e2ab586cacb6 100644 --- a/pandas/core/arrays/numpy_.py +++ b/pandas/core/arrays/numpy_.py @@ -4,6 +4,7 @@ from pandas._libs import lib from pandas.compat.numpy import function as nv +from pandas.util._decorators import Appender from pandas.util._validators import validate_fillna_kwargs from pandas.core.dtypes.dtypes import ExtensionDtype @@ -12,6 +13,7 @@ from pandas import compat from pandas.core import nanops +from pandas.core.algorithms import searchsorted from pandas.core.missing import backfill_1d, pad_1d from .base import ExtensionArray, ExtensionOpsMixin @@ -222,7 +224,7 @@ def __getitem__(self, item): item = item._ndarray result = self._ndarray[item] - if not lib.is_scalar(result): + if not lib.is_scalar(item): result = type(self)(result) return result @@ -423,6 +425,11 @@ def to_numpy(self, dtype=None, copy=False): return result + @Appender(ExtensionArray.searchsorted.__doc__) + def searchsorted(self, value, side='left', sorter=None): + return searchsorted(self.to_numpy(), value, + side=side, sorter=sorter) + # ------------------------------------------------------------------------ # Ops diff --git a/pandas/core/arrays/period.py b/pandas/core/arrays/period.py index e0c71b5609096..3ddceb8c2839d 100644 --- a/pandas/core/arrays/period.py +++ b/pandas/core/arrays/period.py @@ -46,7 +46,7 @@ def _period_array_cmp(cls, op): Wrap comparison operations to convert Period-like to PeriodDtype """ opname = '__{name}__'.format(name=op.__name__) - nat_result = True if opname == '__ne__' else False + nat_result = opname == '__ne__' def wrapper(self, other): op = getattr(self.asi8, opname) diff --git a/pandas/core/arrays/timedeltas.py b/pandas/core/arrays/timedeltas.py index 4f0c96f7927da..74fe8072e6924 100644 --- a/pandas/core/arrays/timedeltas.py +++ b/pandas/core/arrays/timedeltas.py @@ -62,7 +62,7 @@ def _td_array_cmp(cls, op): Wrap comparison operations to convert timedelta-like to timedelta64 """ opname = '__{name}__'.format(name=op.__name__) - nat_result = True if opname == '__ne__' else False + nat_result = opname == '__ne__' def wrapper(self, other): if isinstance(other, (ABCDataFrame, ABCSeries, ABCIndexClass)): @@ -190,6 +190,8 @@ def __init__(self, values, dtype=_TD_DTYPE, freq=None, copy=False): "ndarray, or Series or Index containing one of those." 
) raise ValueError(msg.format(type(values).__name__)) + if values.ndim != 1: + raise ValueError("Only 1-dimensional input arrays are supported.") if values.dtype == 'i8': # for compat with datetime/timedelta/period shared methods, @@ -945,6 +947,9 @@ def sequence_to_td64ns(data, copy=False, unit="ns", errors="raise"): .format(dtype=data.dtype)) data = np.array(data, copy=copy) + if data.ndim != 1: + raise ValueError("Only 1-dimensional input arrays are supported.") + assert data.dtype == 'm8[ns]', data return data, inferred_freq diff --git a/pandas/core/base.py b/pandas/core/base.py index c02ba88ea7fda..f896596dd5216 100644 --- a/pandas/core/base.py +++ b/pandas/core/base.py @@ -1,6 +1,7 @@ """ Base and utility classes for pandas objects. """ +from collections import OrderedDict import textwrap import warnings @@ -8,7 +9,7 @@ import pandas._libs.lib as lib import pandas.compat as compat -from pandas.compat import PYPY, OrderedDict, builtins, map, range +from pandas.compat import PYPY, builtins, map, range from pandas.compat.numpy import function as nv from pandas.errors import AbstractMethodError from pandas.util._decorators import Appender, Substitution, cache_readonly @@ -376,7 +377,7 @@ def nested_renaming_depr(level=4): # eg. {'A' : ['mean']}, normalize all to # be list-likes if any(is_aggregator(x) for x in compat.itervalues(arg)): - new_arg = compat.OrderedDict() + new_arg = OrderedDict() for k, v in compat.iteritems(arg): if not isinstance(v, (tuple, list, dict)): new_arg[k] = [v] @@ -444,14 +445,14 @@ def _agg(arg, func): run the aggregations over the arg with func return an OrderedDict """ - result = compat.OrderedDict() + result = OrderedDict() for fname, agg_how in compat.iteritems(arg): result[fname] = func(fname, agg_how) return result # set the final keys keys = list(compat.iterkeys(arg)) - result = compat.OrderedDict() + result = OrderedDict() # nested renamer if is_nested_renamer: @@ -459,7 +460,7 @@ def _agg(arg, func): if all(isinstance(r, dict) for r in result): - result, results = compat.OrderedDict(), result + result, results = OrderedDict(), result for r in results: result.update(r) keys = list(compat.iterkeys(result)) @@ -793,7 +794,7 @@ def array(self): Returns ------- - array : ExtensionArray + ExtensionArray An ExtensionArray of the values stored within. For extension types, this is the actual array. For NumPy native types, this is a thin (no copy) wrapper around :class:`numpy.ndarray`. @@ -1021,7 +1022,7 @@ def max(self, axis=None, skipna=True): def argmax(self, axis=None, skipna=True): """ - Return a ndarray of the maximum argument indexer. + Return an ndarray of the maximum argument indexer. Parameters ---------- @@ -1086,6 +1087,10 @@ def argmin(self, axis=None, skipna=True): Dummy argument for consistency with Series skipna : bool, default True + Returns + ------- + numpy.ndarray + See Also -------- numpy.ndarray.argmin @@ -1101,6 +1106,10 @@ def tolist(self): (for str, int, float) or a pandas scalar (for Timestamp/Timedelta/Interval/Period) + Returns + ------- + list + See Also -------- numpy.ndarray.tolist @@ -1161,7 +1170,7 @@ def _map_values(self, mapper, na_action=None): Returns ------- - applied : Union[Index, MultiIndex], inferred + Union[Index, MultiIndex], inferred The output of the mapping function applied to the index. If the function returns a tuple with more than one element a MultiIndex will be returned. 
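The new `ndim` guards surface like this (a sketch against the `pandas.arrays.TimedeltaArray` constructor, an internal-leaning entry point; exact construction paths may differ):

```python
import numpy as np
import pandas as pd

data = np.arange(4, dtype='m8[ns]').reshape(2, 2)
try:
    pd.arrays.TimedeltaArray(data)  # rejected up front, per this patch
except ValueError as err:
    print(err)  # Only 1-dimensional input arrays are supported.
```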
@@ -1234,7 +1243,7 @@ def value_counts(self, normalize=False, sort=True, ascending=False, If True then the object returned will contain the relative frequencies of the unique values. sort : boolean, default True - Sort by values. + Sort by frequencies. ascending : boolean, default False Sort in ascending order. bins : integer, optional @@ -1245,7 +1254,7 @@ def value_counts(self, normalize=False, sort=True, ascending=False, Returns ------- - counts : Series + Series See Also -------- @@ -1323,12 +1332,31 @@ def nunique(self, dropna=True): Parameters ---------- - dropna : boolean, default True + dropna : bool, default True Don't include NaN in the count. Returns ------- - nunique : int + int + + See Also + -------- + DataFrame.nunique: Method nunique for DataFrame. + Series.count: Count non-NA/null observations in the Series. + + Examples + -------- + >>> s = pd.Series([1, 3, 5, 7, 7]) + >>> s + 0 1 + 1 3 + 2 5 + 3 7 + 4 7 + dtype: int64 + + >>> s.nunique() + 4 """ uniqs = self.unique() n = len(uniqs) @@ -1343,9 +1371,9 @@ def is_unique(self): Returns ------- - is_unique : boolean + bool """ - return self.nunique() == len(self) + return self.nunique(dropna=False) == len(self) @property def is_monotonic(self): @@ -1357,7 +1385,7 @@ def is_monotonic(self): Returns ------- - is_monotonic : boolean + bool """ from pandas import Index return Index(self).is_monotonic @@ -1374,7 +1402,7 @@ def is_monotonic_decreasing(self): Returns ------- - is_monotonic_decreasing : boolean + bool """ from pandas import Index return Index(self).is_monotonic_decreasing @@ -1494,11 +1522,11 @@ def factorize(self, sort=False, na_sentinel=-1): array([3]) """) - @Substitution(klass='IndexOpsMixin') + @Substitution(klass='Index') @Appender(_shared_docs['searchsorted']) def searchsorted(self, value, side='left', sorter=None): - # needs coercion on the key (DatetimeIndex does already) - return self._values.searchsorted(value, side=side, sorter=sorter) + return algorithms.searchsorted(self._values, value, + side=side, sorter=sorter) def drop_duplicates(self, keep='first', inplace=False): inplace = validate_bool_kwarg(inplace, 'inplace') diff --git a/pandas/core/common.py b/pandas/core/common.py index b4de0daa13b16..5b83cb344b1e7 100644 --- a/pandas/core/common.py +++ b/pandas/core/common.py @@ -5,6 +5,7 @@ """ import collections +from collections import OrderedDict from datetime import datetime, timedelta from functools import partial import inspect @@ -13,7 +14,7 @@ from pandas._libs import lib, tslibs import pandas.compat as compat -from pandas.compat import PY36, OrderedDict, iteritems +from pandas.compat import PY36, iteritems from pandas.core.dtypes.cast import construct_1d_object_array_from_listlike from pandas.core.dtypes.common import ( @@ -32,7 +33,8 @@ class SettingWithCopyWarning(Warning): def flatten(l): - """Flatten an arbitrarily nested sequence. + """ + Flatten an arbitrarily nested sequence. Parameters ---------- @@ -160,12 +162,16 @@ def cast_scalar_indexer(val): def _not_none(*args): - """Returns a generator consisting of the arguments that are not None""" + """ + Returns a generator consisting of the arguments that are not None. + """ return (arg for arg in args if arg is not None) def _any_none(*args): - """Returns a boolean indicating if any argument is None""" + """ + Returns a boolean indicating if any argument is None. 
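The `is_unique` change above is subtle: by counting NaNs (`dropna=False`), the check no longer mislabels a Series whose values are all distinct but include a missing value. A sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan])
s.nunique()              # 1 -- NaN is dropped by default
s.nunique(dropna=False)  # 2
s.is_unique              # True with this fix; previously False
```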
+ """ for arg in args: if arg is None: return True @@ -173,7 +179,9 @@ def _any_none(*args): def _all_none(*args): - """Returns a boolean indicating if all arguments are None""" + """ + Returns a boolean indicating if all arguments are None. + """ for arg in args: if arg is not None: return False @@ -181,7 +189,9 @@ def _all_none(*args): def _any_not_none(*args): - """Returns a boolean indicating if any argument is not None""" + """ + Returns a boolean indicating if any argument is not None. + """ for arg in args: if arg is not None: return True @@ -189,7 +199,9 @@ def _any_not_none(*args): def _all_not_none(*args): - """Returns a boolean indicating if all arguments are not None""" + """ + Returns a boolean indicating if all arguments are not None. + """ for arg in args: if arg is None: return False @@ -197,7 +209,9 @@ def _all_not_none(*args): def count_not_none(*args): - """Returns the count of arguments that are not None""" + """ + Returns the count of arguments that are not None. + """ return sum(x is not None for x in args) @@ -277,7 +291,9 @@ def maybe_make_list(obj): def is_null_slice(obj): - """ we have a null slice """ + """ + We have a null slice. + """ return (isinstance(obj, slice) and obj.start is None and obj.stop is None and obj.step is None) @@ -291,7 +307,9 @@ def is_true_slices(l): # TODO: used only once in indexing; belongs elsewhere? def is_full_slice(obj, l): - """ we have a full length slice """ + """ + We have a full length slice. + """ return (isinstance(obj, slice) and obj.start == 0 and obj.stop == l and obj.step is None) @@ -316,7 +334,7 @@ def get_callable_name(obj): def apply_if_callable(maybe_callable, obj, **kwargs): """ Evaluate possibly callable input using obj and kwargs if it is callable, - otherwise return as it is + otherwise return as it is. Parameters ---------- @@ -333,7 +351,8 @@ def apply_if_callable(maybe_callable, obj, **kwargs): def dict_compat(d): """ - Helper function to convert datetimelike-keyed dicts to Timestamp-keyed dict + Helper function to convert datetimelike-keyed dicts + to Timestamp-keyed dict. Parameters ---------- @@ -383,13 +402,6 @@ def standardize_mapping(into): return into -def sentinel_factory(): - class Sentinel(object): - pass - - return Sentinel() - - def random_state(state=None): """ Helper function for processing random_state arguments. diff --git a/pandas/core/computation/eval.py b/pandas/core/computation/eval.py index b768ed6df303e..23c3e0eaace81 100644 --- a/pandas/core/computation/eval.py +++ b/pandas/core/computation/eval.py @@ -205,7 +205,7 @@ def eval(expr, parser='pandas', engine=None, truediv=True, A list of objects implementing the ``__getitem__`` special method that you can use to inject an additional collection of namespaces to use for variable lookup. For example, this is used in the - :meth:`~pandas.DataFrame.query` method to inject the + :meth:`~DataFrame.query` method to inject the ``DataFrame.index`` and ``DataFrame.columns`` variables that refer to their respective :class:`~pandas.DataFrame` instance attributes. 
@@ -248,8 +248,8 @@ def eval(expr, parser='pandas', engine=None, truediv=True, See Also -------- - pandas.DataFrame.query - pandas.DataFrame.eval + DataFrame.query + DataFrame.eval Notes ----- diff --git a/pandas/core/computation/ops.py b/pandas/core/computation/ops.py index 8c3218a976b6b..5c70255982e54 100644 --- a/pandas/core/computation/ops.py +++ b/pandas/core/computation/ops.py @@ -8,11 +8,11 @@ import numpy as np +from pandas._libs.tslibs import Timestamp from pandas.compat import PY3, string_types, text_type from pandas.core.dtypes.common import is_list_like, is_scalar -import pandas as pd from pandas.core.base import StringMixin import pandas.core.common as com from pandas.core.computation.common import _ensure_decoded, _result_type_many @@ -399,8 +399,9 @@ def evaluate(self, env, engine, parser, term_type, eval_in_python): if self.op in eval_in_python: res = self.func(left.value, right.value) else: - res = pd.eval(self, local_dict=env, engine=engine, - parser=parser) + from pandas.core.computation.eval import eval + res = eval(self, local_dict=env, engine=engine, + parser=parser) name = env.add_tmp(res) return term_type(name, env=env) @@ -422,7 +423,7 @@ def stringify(value): v = rhs.value if isinstance(v, (int, float)): v = stringify(v) - v = pd.Timestamp(_ensure_decoded(v)) + v = Timestamp(_ensure_decoded(v)) if v.tz is not None: v = v.tz_convert('UTC') self.rhs.update(v) @@ -431,7 +432,7 @@ def stringify(value): v = lhs.value if isinstance(v, (int, float)): v = stringify(v) - v = pd.Timestamp(_ensure_decoded(v)) + v = Timestamp(_ensure_decoded(v)) if v.tz is not None: v = v.tz_convert('UTC') self.lhs.update(v) diff --git a/pandas/core/computation/pytables.py b/pandas/core/computation/pytables.py index 00de29b07c75d..18f13e17c046e 100644 --- a/pandas/core/computation/pytables.py +++ b/pandas/core/computation/pytables.py @@ -5,6 +5,7 @@ import numpy as np +from pandas._libs.tslibs import Timedelta, Timestamp from pandas.compat import DeepChainMap, string_types, u from pandas.core.dtypes.common import is_list_like @@ -185,12 +186,12 @@ def stringify(value): if isinstance(v, (int, float)): v = stringify(v) v = _ensure_decoded(v) - v = pd.Timestamp(v) + v = Timestamp(v) if v.tz is not None: v = v.tz_convert('UTC') return TermValue(v, v.value, kind) elif kind == u('timedelta64') or kind == u('timedelta'): - v = pd.Timedelta(v, unit='s').value + v = Timedelta(v, unit='s').value return TermValue(int(v), v, kind) elif meta == u('category'): metadata = com.values_from_object(self.metadata) @@ -251,7 +252,7 @@ def evaluate(self): .format(slf=self)) rhs = self.conform(self.rhs) - values = [TermValue(v, v, self.kind) for v in rhs] + values = [TermValue(v, v, self.kind).value for v in rhs] if self.is_in_table: @@ -262,7 +263,7 @@ def evaluate(self): self.filter = ( self.lhs, filter_op, - pd.Index([v.value for v in values])) + pd.Index(values)) return self return None @@ -274,7 +275,7 @@ def evaluate(self): self.filter = ( self.lhs, filter_op, - pd.Index([v.value for v in values])) + pd.Index(values)) else: raise TypeError("passing a filterable condition to a non-table " diff --git a/pandas/core/computation/scope.py b/pandas/core/computation/scope.py index 33c5a1c2e0f0a..e158bc8c568eb 100644 --- a/pandas/core/computation/scope.py +++ b/pandas/core/computation/scope.py @@ -11,9 +11,9 @@ import numpy as np +from pandas._libs.tslibs import Timestamp from pandas.compat import DeepChainMap, StringIO, map -import pandas as pd # noqa from pandas.core.base import StringMixin import 
pandas.core.computation as compu @@ -48,7 +48,7 @@ def _raw_hex_id(obj): _DEFAULT_GLOBALS = { - 'Timestamp': pd._libs.tslib.Timestamp, + 'Timestamp': Timestamp, 'datetime': datetime.datetime, 'True': True, 'False': False, diff --git a/pandas/core/config.py b/pandas/core/config.py index 0f43ca65d187a..01664fffb1e27 100644 --- a/pandas/core/config.py +++ b/pandas/core/config.py @@ -282,8 +282,8 @@ def __doc__(self): Note: partial matches are supported for convenience, but unless you use the full option name (e.g. x.y.z.option_name), your code may break in future versions if new options with similar names are introduced. -value : - new value of option. +value : object + New value of option. Returns ------- diff --git a/pandas/core/dtypes/base.py b/pandas/core/dtypes/base.py index ab1cb9cf2499a..88bbdcf342d66 100644 --- a/pandas/core/dtypes/base.py +++ b/pandas/core/dtypes/base.py @@ -153,8 +153,8 @@ class ExtensionDtype(_DtypeOpsMixin): See Also -------- - pandas.api.extensions.register_extension_dtype - pandas.api.extensions.ExtensionArray + extensions.register_extension_dtype + extensions.ExtensionArray Notes ----- @@ -173,7 +173,7 @@ class ExtensionDtype(_DtypeOpsMixin): Optionally one can override construct_array_type for construction with the name of this dtype via the Registry. See - :meth:`pandas.api.extensions.register_extension_dtype`. + :meth:`extensions.register_extension_dtype`. * construct_array_type diff --git a/pandas/core/dtypes/cast.py b/pandas/core/dtypes/cast.py index ad62146dda268..f6561948df99a 100644 --- a/pandas/core/dtypes/cast.py +++ b/pandas/core/dtypes/cast.py @@ -1111,11 +1111,9 @@ def find_common_type(types): # this is different from numpy, which casts bool with float/int as int has_bools = any(is_bool_dtype(t) for t in types) if has_bools: - has_ints = any(is_integer_dtype(t) for t in types) - has_floats = any(is_float_dtype(t) for t in types) - has_complex = any(is_complex_dtype(t) for t in types) - if has_ints or has_floats or has_complex: - return np.object + for t in types: + if is_integer_dtype(t) or is_float_dtype(t) or is_complex_dtype(t): + return np.object return np.find_common_type(types, []) diff --git a/pandas/core/dtypes/common.py b/pandas/core/dtypes/common.py index e9bf0f87088db..4be7eb8ddb890 100644 --- a/pandas/core/dtypes/common.py +++ b/pandas/core/dtypes/common.py @@ -139,7 +139,8 @@ def is_object_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array-like or dtype is of the object dtype. + boolean + Whether or not the array-like or dtype is of the object dtype. Examples -------- @@ -230,8 +231,8 @@ def is_scipy_sparse(arr): Returns ------- - boolean : Whether or not the array-like is a - scipy.sparse.spmatrix instance. + boolean + Whether or not the array-like is a scipy.sparse.spmatrix instance. Notes ----- @@ -270,7 +271,8 @@ def is_categorical(arr): Returns ------- - boolean : Whether or not the array-like is of a Categorical instance. + boolean + Whether or not the array-like is of a Categorical instance. Examples -------- @@ -305,8 +307,9 @@ def is_datetimetz(arr): Returns ------- - boolean : Whether or not the array-like is a datetime array-like with - a timezone component in its dtype. + boolean + Whether or not the array-like is a datetime array-like with a + timezone component in its dtype. 
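The find_common_type hunk above only restructures the bool check, but the rule it implements is easy to miss: unlike NumPy, pandas treats a mix of bool and numeric dtypes as having no common type and falls back to object. A standalone sketch of that rule (an assumed equivalent written for illustration, not the actual pandas code):

    import numpy as np

    def find_common_type_sketch(types):
        # If bools are mixed with ints/floats/complex, short-circuit to
        # object instead of letting NumPy cast bool to a numeric dtype.
        if any(np.issubdtype(t, np.bool_) for t in types):
            if any(np.issubdtype(t, np.number) for t in types):
                return np.dtype(object)
        return np.find_common_type(types, [])

    print(find_common_type_sketch([np.dtype(bool), np.dtype('int64')]))      # object
    print(find_common_type_sketch([np.dtype(bool), np.dtype(bool)]))         # bool
    print(find_common_type_sketch([np.dtype('int32'), np.dtype('float64')])) # float64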
Examples -------- @@ -347,7 +350,8 @@ def is_offsetlike(arr_or_obj): Returns ------- - boolean : Whether the object is a DateOffset or listlike of DatetOffsets + boolean + Whether the object is a DateOffset or listlike of DateOffsets. Examples -------- @@ -381,7 +385,8 @@ def is_period(arr): Returns ------- - boolean : Whether or not the array-like is a periodical index. + boolean + Whether or not the array-like is a periodical index. Examples -------- @@ -411,8 +416,8 @@ def is_datetime64_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array-like or dtype is of - the datetime64 dtype. + boolean + Whether or not the array-like or dtype is of the datetime64 dtype. Examples -------- @@ -442,8 +447,8 @@ def is_datetime64tz_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array-like or dtype is of - a DatetimeTZDtype dtype. + boolean + Whether or not the array-like or dtype is of a DatetimeTZDtype dtype. Examples -------- @@ -480,8 +485,8 @@ def is_timedelta64_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array-like or dtype is - of the timedelta64 dtype. + boolean + Whether or not the array-like or dtype is of the timedelta64 dtype. Examples -------- @@ -511,7 +516,8 @@ def is_period_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array-like or dtype is of the Period dtype. + boolean + Whether or not the array-like or dtype is of the Period dtype. Examples -------- @@ -544,8 +550,8 @@ def is_interval_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array-like or dtype is - of the Interval dtype. + boolean + Whether or not the array-like or dtype is of the Interval dtype. Examples -------- @@ -580,8 +586,8 @@ def is_categorical_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array-like or dtype is - of the Categorical dtype. + boolean + Whether or not the array-like or dtype is of the Categorical dtype. Examples -------- @@ -613,7 +619,8 @@ def is_string_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of the string dtype. + boolean + Whether or not the array or dtype is of the string dtype. Examples -------- @@ -647,8 +654,9 @@ def is_period_arraylike(arr): Returns ------- - boolean : Whether or not the array-like is a periodical - array-like or PeriodIndex instance. + boolean + Whether or not the array-like is a periodical array-like or + PeriodIndex instance. Examples -------- @@ -678,8 +686,9 @@ def is_datetime_arraylike(arr): Returns ------- - boolean : Whether or not the array-like is a datetime - array-like or DatetimeIndex. + boolean + Whether or not the array-like is a datetime array-like or + DatetimeIndex. Examples -------- @@ -713,7 +722,8 @@ def is_datetimelike(arr): Returns ------- - boolean : Whether or not the array-like is a datetime-like array-like. + boolean + Whether or not the array-like is a datetime-like array-like. Examples -------- @@ -754,7 +764,8 @@ def is_dtype_equal(source, target): Returns ---------- - boolean : Whether or not the two dtypes are equal. + boolean + Whether or not the two dtypes are equal. Examples -------- @@ -794,7 +805,8 @@ def is_dtype_union_equal(source, target): Returns ---------- - boolean : Whether or not the two dtypes are equal. + boolean + Whether or not the two dtypes are equal. >>> is_dtype_equal("int", int) True @@ -835,7 +847,8 @@ def is_any_int_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of an integer dtype.
+ boolean + Whether or not the array or dtype is of an integer dtype. Examples -------- @@ -883,8 +896,9 @@ def is_integer_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of an integer dtype - and not an instance of timedelta64. + boolean + Whether or not the array or dtype is of an integer dtype and + not an instance of timedelta64. Examples -------- @@ -938,8 +952,9 @@ def is_signed_integer_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of a signed integer dtype - and not an instance of timedelta64. + boolean + Whether or not the array or dtype is of a signed integer dtype + and not an instance of timedelta64. Examples -------- @@ -993,8 +1008,8 @@ def is_unsigned_integer_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of an - unsigned integer dtype. + boolean + Whether or not the array or dtype is of an unsigned integer dtype. Examples -------- @@ -1036,7 +1051,8 @@ def is_int64_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of the int64 dtype. + boolean + Whether or not the array or dtype is of the int64 dtype. Notes ----- @@ -1086,7 +1102,8 @@ def is_datetime64_any_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of the datetime64 dtype. + boolean + Whether or not the array or dtype is of the datetime64 dtype. Examples -------- @@ -1126,7 +1143,8 @@ def is_datetime64_ns_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of the datetime64[ns] dtype. + boolean + Whether or not the array or dtype is of the datetime64[ns] dtype. Examples -------- @@ -1178,8 +1196,8 @@ def is_timedelta64_ns_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of the - timedelta64[ns] dtype. + boolean + Whether or not the array or dtype is of the timedelta64[ns] dtype. Examples -------- @@ -1207,8 +1225,9 @@ def is_datetime_or_timedelta_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of a - timedelta64, or datetime64 dtype. + boolean + Whether or not the array or dtype is of a timedelta64, + or datetime64 dtype. Examples -------- @@ -1248,7 +1267,8 @@ def _is_unorderable_exception(e): Returns ------- - boolean : Whether or not the exception raised is an unorderable exception. + boolean + Whether or not the exception raised is an unorderable exception. """ if PY36: @@ -1275,8 +1295,8 @@ def is_numeric_v_string_like(a, b): Returns ------- - boolean : Whether we return a comparing a string-like - object to a numeric array. + boolean + Whether we are comparing a string-like object to a numeric array. Examples -------- @@ -1332,8 +1352,8 @@ def is_datetimelike_v_numeric(a, b): Returns ------- - boolean : Whether we return a comparing a datetime-like - to a numeric object. + boolean + Whether we are comparing a datetime-like to a numeric object. Examples -------- @@ -1388,8 +1408,8 @@ def is_datetimelike_v_object(a, b): Returns ------- - boolean : Whether we return a comparing a datetime-like - to an object instance. + boolean + Whether we are comparing a datetime-like to an object instance. Examples -------- @@ -1442,7 +1462,8 @@ def needs_i8_conversion(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype should be converted to int64. + boolean + Whether or not the array or dtype should be converted to int64.
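The predicates whose Returns sections are reflowed above are all importable from the public pandas.api.types namespace, so the documented behavior can be smoke-tested in script form (the docstrings themselves use doctest style):

    import numpy as np
    import pandas as pd
    from pandas.api.types import (
        is_integer_dtype, is_datetime64_any_dtype, is_timedelta64_ns_dtype)

    print(is_integer_dtype(np.array([1, 2])))      # True
    print(is_integer_dtype(np.array([1.0, 2.0])))  # False
    print(is_datetime64_any_dtype(pd.date_range('2000', periods=2)))  # True
    print(is_timedelta64_ns_dtype(pd.to_timedelta(['1s', '2s'])))     # True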
Examples -------- @@ -1480,7 +1501,8 @@ def is_numeric_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of a numeric dtype. + boolean + Whether or not the array or dtype is of a numeric dtype. Examples -------- @@ -1524,7 +1546,8 @@ def is_string_like_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of the string dtype. + boolean + Whether or not the array or dtype is of the string dtype. Examples -------- @@ -1555,7 +1578,8 @@ def is_float_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of a float dtype. + boolean + Whether or not the array or dtype is of a float dtype. Examples -------- @@ -1586,7 +1610,8 @@ def is_bool_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of a boolean dtype. + boolean + Whether or not the array or dtype is of a boolean dtype. Notes ----- @@ -1655,8 +1680,8 @@ def is_extension_type(arr): Returns ------- - boolean : Whether or not the array-like is of a pandas - extension class instance. + boolean + Whether or not the array-like is of a pandas extension class instance. Examples -------- @@ -1760,7 +1785,8 @@ def is_complex_dtype(arr_or_dtype): Returns ------- - boolean : Whether or not the array or dtype is of a compex dtype. + boolean + Whether or not the array or dtype is of a complex dtype. Examples -------- @@ -1980,7 +2006,7 @@ def _validate_date_like_dtype(dtype): def pandas_dtype(dtype): """ - Converts input into a pandas only dtype object or a numpy dtype object. + Convert input into a pandas only dtype object or a numpy dtype object. Parameters ---------- diff --git a/pandas/core/dtypes/concat.py b/pandas/core/dtypes/concat.py index aada777decaa7..10e903acbe538 100644 --- a/pandas/core/dtypes/concat.py +++ b/pandas/core/dtypes/concat.py @@ -123,8 +123,6 @@ def is_nonempty(x): except Exception: return True - nonempty = [x for x in to_concat if is_nonempty(x)] - # If all arrays are empty, there's nothing to convert, just short-cut to # the concatenation, #3121. # @@ -148,11 +146,11 @@ def is_nonempty(x): elif 'sparse' in typs: return _concat_sparse(to_concat, axis=axis, typs=typs) - extensions = [is_extension_array_dtype(x) for x in to_concat] - if any(extensions) and axis == 1: + all_empty = all(not is_nonempty(x) for x in to_concat) + if any(is_extension_array_dtype(x) for x in to_concat) and axis == 1: to_concat = [np.atleast_2d(x.astype('object')) for x in to_concat] - if not nonempty: + if all_empty: # we have all empties, but may need to coerce the result dtype to # object if we have non-numeric type operands (numpy would otherwise # cast this to float) diff --git a/pandas/core/dtypes/dtypes.py b/pandas/core/dtypes/dtypes.py index f84471c3b04e8..11a132c4d14ee 100644 --- a/pandas/core/dtypes/dtypes.py +++ b/pandas/core/dtypes/dtypes.py @@ -8,7 +8,8 @@ from pandas._libs.interval import Interval from pandas._libs.tslibs import NaT, Period, Timestamp, timezones -from pandas.core.dtypes.generic import ABCCategoricalIndex, ABCIndexClass +from pandas.core.dtypes.generic import ( + ABCCategoricalIndex, ABCDateOffset, ABCIndexClass) from pandas import compat @@ -17,7 +18,8 @@ def register_extension_dtype(cls): - """Class decorator to register an ExtensionType with pandas. + """ + Register an ExtensionType with pandas as a class decorator. ..
versionadded:: 0.24.0 @@ -194,7 +196,7 @@ class CategoricalDtype(PandasExtensionDtype, ExtensionDtype): See Also -------- - pandas.Categorical + Categorical Notes ----- @@ -413,8 +415,7 @@ def _hash_categories(categories, ordered=True): cat_array = hash_tuples(categories) else: if categories.dtype == 'O': - types = [type(x) for x in categories] - if not len(set(types)) == 1: + if len({type(x) for x in categories}) != 1: # TODO: hash_array doesn't handle mixed types. It casts # everything to a str first, which means we treat # {'1', '2'} the same as {'1', 2} @@ -639,6 +640,7 @@ def __init__(self, unit="ns", tz=None): if tz: tz = timezones.maybe_get_tz(tz) + tz = timezones.tz_standardize(tz) elif tz is not None: raise pytz.UnknownTimeZoneError(tz) elif tz is None: @@ -757,8 +759,7 @@ def __new__(cls, freq=None): # empty constructor for pickle compat return object.__new__(cls) - from pandas.tseries.offsets import DateOffset - if not isinstance(freq, DateOffset): + if not isinstance(freq, ABCDateOffset): freq = cls._parse_dtype_strict(freq) try: @@ -789,12 +790,10 @@ def construct_from_string(cls, string): Strict construction from a string, raise a TypeError if not possible """ - from pandas.tseries.offsets import DateOffset - if (isinstance(string, compat.string_types) and (string.startswith('period[') or string.startswith('Period[')) or - isinstance(string, DateOffset)): + isinstance(string, ABCDateOffset)): # do not parse string like U as period[U] # avoid tuple to be regarded as freq try: @@ -932,13 +931,18 @@ def construct_from_string(cls, string): attempt to construct this type from a string, raise a TypeError if its not possible """ - if (isinstance(string, compat.string_types) and - (string.startswith('interval') or - string.startswith('Interval'))): - return cls(string) + if not isinstance(string, compat.string_types): + msg = "a string needs to be passed, got type {typ}" + raise TypeError(msg.format(typ=type(string))) - msg = "a string needs to be passed, got type {typ}" - raise TypeError(msg.format(typ=type(string))) + if (string.lower() == 'interval' or + cls._match.search(string) is not None): + return cls(string) + + msg = ('Incorrectly formatted string passed to constructor. ' + 'Valid formats include Interval or Interval[dtype] ' + 'where dtype is numeric, datetime, or timedelta') + raise TypeError(msg) @property def type(self): @@ -979,7 +983,7 @@ def is_dtype(cls, dtype): return True else: return False - except ValueError: + except (ValueError, TypeError): return False else: return False diff --git a/pandas/core/dtypes/inference.py b/pandas/core/dtypes/inference.py index b11542622451c..1a02623fa6072 100644 --- a/pandas/core/dtypes/inference.py +++ b/pandas/core/dtypes/inference.py @@ -44,7 +44,7 @@ def is_number(obj): See Also -------- - pandas.api.types.is_integer: Checks a subgroup of numbers. + api.types.is_integer: Checks a subgroup of numbers. 
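The rewritten IntervalDtype.construct_from_string above changes two observable behaviors: non-string input now raises TypeError up front, and a malformed string gets the new, more descriptive message. Assuming this patch is applied, the following should hold:

    from pandas.api.types import IntervalDtype

    print(IntervalDtype('interval[int64]'))                 # interval[int64]
    print(IntervalDtype.construct_from_string('interval'))  # generic 'interval'

    for bad in (0, 'not_an_interval'):
        try:
            IntervalDtype.construct_from_string(bad)
        except TypeError as err:
            # non-strings and malformed strings both raise TypeError now
            print('TypeError:', err)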
Examples -------- @@ -397,12 +397,15 @@ def is_dict_like(obj): True >>> is_dict_like([1, 2, 3]) False + >>> is_dict_like(dict) + False + >>> is_dict_like(dict()) + True """ - for attr in ("__getitem__", "keys", "__contains__"): - if not hasattr(obj, attr): - return False - - return True + dict_like_attrs = ("__getitem__", "keys", "__contains__") + return (all(hasattr(obj, attr) for attr in dict_like_attrs) + # [GH 25196] exclude classes + and not isinstance(obj, type)) def is_named_tuple(obj): diff --git a/pandas/core/dtypes/missing.py b/pandas/core/dtypes/missing.py index 3c6d3f212342b..697c58a365233 100644 --- a/pandas/core/dtypes/missing.py +++ b/pandas/core/dtypes/missing.py @@ -221,8 +221,8 @@ def _isna_ndarraylike(obj): # box if isinstance(obj, ABCSeries): - from pandas import Series - result = Series(result, index=obj.index, name=obj.name, copy=False) + result = obj._constructor( + result, index=obj.index, name=obj.name, copy=False) return result @@ -250,8 +250,8 @@ def _isna_ndarraylike_old(obj): # box if isinstance(obj, ABCSeries): - from pandas import Series - result = Series(result, index=obj.index, name=obj.name, copy=False) + result = obj._constructor( + result, index=obj.index, name=obj.name, copy=False) return result diff --git a/pandas/core/frame.py b/pandas/core/frame.py index b4f79bda25517..6b4d95055d06d 100644 --- a/pandas/core/frame.py +++ b/pandas/core/frame.py @@ -13,6 +13,7 @@ from __future__ import division import collections +from collections import OrderedDict import functools import itertools import sys @@ -32,7 +33,7 @@ from pandas import compat from pandas.compat import (range, map, zip, lmap, lzip, StringIO, u, - OrderedDict, PY36, raise_with_traceback, + PY36, raise_with_traceback, Iterator, string_and_binary_types) from pandas.compat.numpy import function as nv from pandas.core.dtypes.cast import ( @@ -318,7 +319,7 @@ class DataFrame(NDFrame): DataFrame.from_records : Constructor from tuples, also record arrays. DataFrame.from_dict : From dicts of Series, arrays, or dicts. DataFrame.from_items : From sequence of (key, value) pairs - pandas.read_csv, pandas.read_table, pandas.read_clipboard. + read_csv, pandas.read_table, pandas.read_clipboard. Examples -------- @@ -482,7 +483,7 @@ def axes(self): -------- >>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]}) >>> df.axes - [RangeIndex(start=0, stop=2, step=1), Index(['coll', 'col2'], + [RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='object')] """ return [self.index, self.columns] @@ -640,16 +641,6 @@ def _repr_html_(self): Mainly for IPython notebook. """ - # qtconsole doesn't report its line width, and also - # behaves badly when outputting an HTML table - # that doesn't fit the window, so disable it. - # XXX: In IPython 3.x and above, the Qt console will not attempt to - # display HTML, so this check can be removed when support for - # IPython 2.x is no longer needed. - if console.in_qtconsole(): - # 'HTML output is disabled in QtConsole' - return None - if self._info_repr(): buf = StringIO(u("")) self.info(buf=buf) @@ -727,7 +718,7 @@ def style(self): See Also -------- - pandas.io.formats.style.Styler + io.formats.style.Styler """ from pandas.io.formats.style import Styler return Styler(self) @@ -847,7 +838,7 @@ def itertuples(self, index=True, name="Pandas"): ---------- index : bool, default True If True, return the index as the first element of the tuple. 
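The is_dict_like change above (GH 25196) is behavioral, not cosmetic: a class that merely defines the mapping protocol no longer counts, only instances do. With the patch applied:

    from pandas.api.types import is_dict_like

    print(is_dict_like({'a': 1}))   # True  -- a dict instance
    print(is_dict_like(dict))       # False -- the type itself is now excluded
    print(is_dict_like([1, 2, 3]))  # False -- lists lack keys()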
- name : str, default "Pandas" + name : str or None, default "Pandas" The name of the returned namedtuples or None to return regular tuples. @@ -1074,7 +1065,7 @@ def from_dict(cls, data, orient='columns', dtype=None, columns=None): Returns ------- - pandas.DataFrame + DataFrame See Also -------- @@ -1154,7 +1145,7 @@ def to_numpy(self, dtype=None, copy=False): Returns ------- - array : numpy.ndarray + numpy.ndarray See Also -------- @@ -1290,23 +1281,26 @@ def to_dict(self, orient='dict', into=dict): ('columns', self.columns.tolist()), ('data', [ list(map(com.maybe_box_datetimelike, t)) - for t in self.itertuples(index=False)] - ))) + for t in self.itertuples(index=False, name=None) + ]))) elif orient.lower().startswith('s'): return into_c((k, com.maybe_box_datetimelike(v)) for k, v in compat.iteritems(self)) elif orient.lower().startswith('r'): + columns = self.columns.tolist() + rows = (dict(zip(columns, row)) + for row in self.itertuples(index=False, name=None)) return [ into_c((k, com.maybe_box_datetimelike(v)) - for k, v in compat.iteritems(row._asdict())) - for row in self.itertuples(index=False)] + for k, v in compat.iteritems(row)) + for row in rows] elif orient.lower().startswith('i'): if not self.index.is_unique: raise ValueError( "DataFrame index must be unique for orient='index'." ) return into_c((t[0], dict(zip(self.columns, t[1:]))) - for t in self.itertuples()) + for t in self.itertuples(name=None)) else: raise ValueError("orient '{o}' not understood".format(o=orient)) @@ -1406,7 +1400,7 @@ def to_gbq(self, destination_table, project_id=None, chunksize=None, See Also -------- pandas_gbq.to_gbq : This function in the pandas-gbq library. - pandas.read_gbq : Read a DataFrame from Google BigQuery. + read_gbq : Read a DataFrame from Google BigQuery. """ from pandas.io import gbq return gbq.to_gbq( @@ -1445,7 +1439,7 @@ def from_records(cls, data, index=None, exclude=None, columns=None, Returns ------- - df : DataFrame + DataFrame """ # Make a copy of the input columns so we can modify it @@ -1524,8 +1518,8 @@ def from_records(cls, data, index=None, exclude=None, columns=None, result_index = Index([], name=index) else: try: - to_remove = [arr_columns.get_loc(field) for field in index] - index_data = [arrays[i] for i in to_remove] + index_data = [arrays[arr_columns.get_loc(field)] + for field in index] result_index = ensure_index_from_sequences(index_data, names=index) @@ -1716,7 +1710,8 @@ def to_records(self, index=True, convert_datetime64=None, # string naming a type. if dtype_mapping is None: formats.append(v.dtype) - elif isinstance(dtype_mapping, (type, compat.string_types)): + elif isinstance(dtype_mapping, (type, np.dtype, + compat.string_types)): formats.append(dtype_mapping) else: element = "row" if i < index_len else "column" @@ -1760,7 +1755,7 @@ def from_items(cls, items, columns=None, orient='columns'): Returns ------- - frame : DataFrame + DataFrame """ warnings.warn("from_items is deprecated. Please use " @@ -1831,14 +1826,14 @@ def from_csv(cls, path, header=0, sep=',', index_col=0, parse_dates=True, Read CSV file. .. deprecated:: 0.21.0 - Use :func:`pandas.read_csv` instead. + Use :func:`read_csv` instead. - It is preferable to use the more powerful :func:`pandas.read_csv` + It is preferable to use the more powerful :func:`read_csv` for most general purposes, but ``from_csv`` makes for an easy roundtrip to and from a file (the exact counterpart of ``to_csv``), especially with a DataFrame of time series data. 
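The to_dict refactor above leans on itertuples(index=False, name=None), which yields plain tuples and so avoids the namedtuple _asdict() round-trip; the public output of to_dict is unchanged. For example:

    import pandas as pd

    df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
    print(list(df.itertuples(index=False, name=None)))  # [(1, 3), (2, 4)]
    print(df.to_dict(orient='records'))  # [{'a': 1, 'b': 3}, {'a': 2, 'b': 4}]
    print(df.to_dict(orient='list'))     # {'a': [1, 2], 'b': [3, 4]}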
- This method only differs from the preferred :func:`pandas.read_csv` + This method only differs from the preferred :func:`read_csv` in some defaults: - `index_col` is ``0`` instead of ``None`` (take first column as index @@ -1871,11 +1866,11 @@ def from_csv(cls, path, header=0, sep=',', index_col=0, parse_dates=True, Returns ------- - y : DataFrame + DataFrame See Also -------- - pandas.read_csv + read_csv """ warnings.warn("from_csv is deprecated. Please use read_csv(...) " @@ -1961,47 +1956,9 @@ def to_panel(self): Returns ------- - panel : Panel + Panel """ - # only support this kind for now - if (not isinstance(self.index, MultiIndex) or # pragma: no cover - len(self.index.levels) != 2): - raise NotImplementedError('Only 2-level MultiIndex are supported.') - - if not self.index.is_unique: - raise ValueError("Can't convert non-uniquely indexed " - "DataFrame to Panel") - - self._consolidate_inplace() - - # minor axis must be sorted - if self.index.lexsort_depth < 2: - selfsorted = self.sort_index(level=0) - else: - selfsorted = self - - major_axis, minor_axis = selfsorted.index.levels - major_codes, minor_codes = selfsorted.index.codes - shape = len(major_axis), len(minor_axis) - - # preserve names, if any - major_axis = major_axis.copy() - major_axis.name = self.index.names[0] - - minor_axis = minor_axis.copy() - minor_axis.name = self.index.names[1] - - # create new axes - new_axes = [selfsorted.columns, major_axis, minor_axis] - - # create new manager - new_mgr = selfsorted._data.reshape_nd(axes=new_axes, - labels=[major_codes, - minor_codes], - shape=shape, - ref_items=selfsorted.columns) - - return self._constructor_expanddim(new_mgr) + raise NotImplementedError("Panel is being removed in pandas 0.25.0.") @deprecate_kwarg(old_arg_name='encoding', new_arg_name=None) def to_stata(self, fname, convert_dates=None, write_index=True, @@ -2521,7 +2478,7 @@ def memory_usage(self, index=True, deep=False): Returns ------- - sizes : Series + Series A Series whose index is the original column names and whose values is the memory usage of each column in bytes. @@ -2530,7 +2487,7 @@ def memory_usage(self, index=True, deep=False): numpy.ndarray.nbytes : Total bytes consumed by the elements of an ndarray. Series.memory_usage : Bytes consumed by a Series. - pandas.Categorical : Memory-efficient array for string values with + Categorical : Memory-efficient array for string values with many repeated values. DataFrame.info : Concise summary of a DataFrame. @@ -2739,7 +2696,7 @@ def get_value(self, index, col, takeable=False): Returns ------- - value : scalar value + scalar """ warnings.warn("get_value is deprecated and will be removed " @@ -2779,14 +2736,14 @@ def set_value(self, index, col, value, takeable=False): ---------- index : row label col : column label - value : scalar value + value : scalar takeable : interpret the index/col as indexers, default False Returns ------- - frame : DataFrame + DataFrame If label pair is contained, will be reference to calling DataFrame, - otherwise a new object + otherwise a new object. """ warnings.warn("set_value is deprecated and will be removed " "in a future release. Please use " @@ -2881,6 +2838,7 @@ def _ixs(self, i, axis=0): return result def __getitem__(self, key): + key = lib.item_from_zerodim(key) key = com.apply_if_callable(key, self) # shortcut if the key is in columns @@ -3005,28 +2963,30 @@ def query(self, expr, inplace=False, **kwargs): Parameters ---------- - expr : string + expr : str The query string to evaluate. 
You can refer to variables in the environment by prefixing them with an '@' character like ``@a + b``. inplace : bool Whether the query should modify the data in place or return - a modified copy + a modified copy. + **kwargs + See the documentation for :func:`eval` for complete details + on the keyword arguments accepted by :meth:`DataFrame.query`. .. versionadded:: 0.18.0 - kwargs : dict - See the documentation for :func:`pandas.eval` for complete details - on the keyword arguments accepted by :meth:`DataFrame.query`. - Returns ------- - q : DataFrame + DataFrame + DataFrame resulting from the provided query expression. See Also -------- - pandas.eval - DataFrame.eval + eval : Evaluate a string describing operations on + DataFrame columns. + DataFrame.eval : Evaluate a string describing operations on + DataFrame columns. Notes ----- @@ -3035,7 +2995,7 @@ def query(self, expr, inplace=False, **kwargs): multidimensional key (e.g., a DataFrame) then the result will be passed to :meth:`DataFrame.__getitem__`. - This method uses the top-level :func:`pandas.eval` function to + This method uses the top-level :func:`eval` function to evaluate the passed query. The :meth:`~pandas.DataFrame.query` method uses a slightly @@ -3065,9 +3025,23 @@ def query(self, expr, inplace=False, **kwargs): Examples -------- - >>> df = pd.DataFrame(np.random.randn(10, 2), columns=list('ab')) - >>> df.query('a > b') - >>> df[df.a > df.b] # same result as the previous expression + >>> df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)}) + >>> df + A B + 0 1 10 + 1 2 8 + 2 3 6 + 3 4 4 + 4 5 2 + >>> df.query('A > B') + A B + 4 5 2 + + The previous expression is equivalent to + + >>> df[df.A > df.B] + A B + 4 5 2 """ inplace = validate_bool_kwarg(inplace, 'inplace') if not isinstance(expr, compat.string_types): @@ -3108,7 +3082,7 @@ def eval(self, expr, inplace=False, **kwargs): .. versionadded:: 0.18.0. kwargs : dict - See the documentation for :func:`~pandas.eval` for complete details + See the documentation for :func:`eval` for complete details on the keyword arguments accepted by :meth:`~pandas.DataFrame.query`. @@ -3123,12 +3097,12 @@ def eval(self, expr, inplace=False, **kwargs): of a frame. DataFrame.assign : Can evaluate an expression or function to create new values for a column. - pandas.eval : Evaluate a Python expression as a string using various + eval : Evaluate a Python expression as a string using various backends. Notes ----- - For more details see the API documentation for :func:`~pandas.eval`. + For more details see the API documentation for :func:`~eval`. For detailed examples see :ref:`enhancing performance with eval `. @@ -3204,7 +3178,7 @@ def select_dtypes(self, include=None, exclude=None): Returns ------- - subset : DataFrame + DataFrame The subset of the frame including the dtypes in ``include`` and excluding the dtypes in ``exclude``. @@ -3569,7 +3543,7 @@ def _sanitize_column(self, key, value, broadcast=True): Returns ------- - sanitized_column : numpy-array + numpy.ndarray """ def reindexer(value): @@ -3823,7 +3797,12 @@ def drop(self, labels=None, axis=0, index=None, columns=None, axis : {0 or 'index', 1 or 'columns'}, default 0 Whether to drop labels from the index (0 or 'index') or columns (1 or 'columns'). - index, columns : single label or list-like + index : single label or list-like + Alternative to specifying axis (``labels, axis=0`` + is equivalent to ``index=labels``). + + .. 
versionadded:: 0.21.0 + columns : single label or list-like Alternative to specifying axis (``labels, axis=1`` is equivalent to ``columns=labels``). @@ -3838,12 +3817,13 @@ def drop(self, labels=None, axis=0, index=None, columns=None, Returns ------- - dropped : pandas.DataFrame + DataFrame + DataFrame without the removed index or column labels. Raises ------ KeyError - If none of the labels are found in the selected axis + If any of the labels is not found in the selected axis. See Also -------- @@ -3856,7 +3836,7 @@ def drop(self, labels=None, axis=0, index=None, columns=None, Examples -------- - >>> df = pd.DataFrame(np.arange(12).reshape(3,4), + >>> df = pd.DataFrame(np.arange(12).reshape(3, 4), ... columns=['A', 'B', 'C', 'D']) >>> df A B C D @@ -3893,7 +3873,7 @@ def drop(self, labels=None, axis=0, index=None, columns=None, >>> df = pd.DataFrame(index=midx, columns=['big', 'small'], ... data=[[45, 30], [200, 100], [1.5, 1], [30, 20], ... [250, 150], [1.5, 0.8], [320, 250], - ... [1, 0.8], [0.3,0.2]]) + ... [1, 0.8], [0.3, 0.2]]) >>> df big small lama speed 45.0 30.0 @@ -3963,11 +3943,11 @@ def rename(self, *args, **kwargs): Returns ------- - renamed : DataFrame + DataFrame See Also -------- - pandas.DataFrame.rename_axis + DataFrame.rename_axis Examples -------- @@ -4051,7 +4031,8 @@ def set_index(self, keys, drop=True, append=False, inplace=False, This parameter can be either a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays. Here, "array" - encompasses :class:`Series`, :class:`Index` and ``np.ndarray``. + encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and + instances of :class:`abc.Iterator`. drop : bool, default True Delete columns to be used as the new index. append : bool, default False @@ -4127,33 +4108,34 @@ def set_index(self, keys, drop=True, append=False, inplace=False, 4 16 10 2014 31 """ inplace = validate_bool_kwarg(inplace, 'inplace') + if not isinstance(keys, list): + keys = [keys] err_msg = ('The parameter "keys" may be a column key, one-dimensional ' 'array, or a list containing only valid column keys and ' 'one-dimensional arrays.') - if (is_scalar(keys) or isinstance(keys, tuple) - or isinstance(keys, (ABCIndexClass, ABCSeries, np.ndarray))): - # make sure we have a container of keys/arrays we can iterate over - # tuples can appear as valid column keys! 
- keys = [keys] - elif not isinstance(keys, list): - raise ValueError(err_msg) - missing = [] for col in keys: - if (is_scalar(col) or isinstance(col, tuple)): - # if col is a valid column key, everything is fine - # tuples are always considered keys, never as list-likes - if col not in self: - missing.append(col) - elif (not isinstance(col, (ABCIndexClass, ABCSeries, - np.ndarray, list)) - or getattr(col, 'ndim', 1) > 1): - raise ValueError(err_msg) + if isinstance(col, (ABCIndexClass, ABCSeries, np.ndarray, + list, Iterator)): + # arrays are fine as long as they are one-dimensional + # iterators get converted to list below + if getattr(col, 'ndim', 1) != 1: + raise ValueError(err_msg) + else: + # everything else gets tried as a key; see GH 24969 + try: + found = col in self.columns + except TypeError: + raise TypeError(err_msg + ' Received column of ' + 'type {}'.format(type(col))) + else: + if not found: + missing.append(col) if missing: - raise KeyError('{}'.format(missing)) + raise KeyError('None of {} are in the columns'.format(missing)) if inplace: frame = self @@ -4183,6 +4165,9 @@ def set_index(self, keys, drop=True, append=False, inplace=False, elif isinstance(col, (list, np.ndarray)): arrays.append(col) names.append(None) + elif isinstance(col, Iterator): + arrays.append(list(col)) + names.append(None) # from here, col can only be a column label else: arrays.append(frame[col]._values) @@ -4190,6 +4175,15 @@ def set_index(self, keys, drop=True, append=False, inplace=False, if drop: to_remove.append(col) + if len(arrays[-1]) != len(self): + # check newest element against length of calling frame, since + # ensure_index_from_sequences would not raise for append=False. + raise ValueError('Length mismatch: Expected {len_self} rows, ' + 'received array of length {len_col}'.format( + len_self=len(self), + len_col=len(arrays[-1]) + )) + index = ensure_index_from_sequences(arrays, names) if verify_integrity and not index.is_unique: @@ -4614,7 +4608,8 @@ def dropna(self, axis=0, how='any', thresh=None, subset=None, def drop_duplicates(self, subset=None, keep='first', inplace=False): """ Return DataFrame with duplicate rows removed, optionally only - considering certain columns. + considering certain columns. Indexes, including time indexes + are ignored. Parameters ---------- @@ -4630,7 +4625,7 @@ def drop_duplicates(self, subset=None, keep='first', inplace=False): Returns ------- - deduplicated : DataFrame + DataFrame """ if self.empty: return self.copy() @@ -4664,13 +4659,13 @@ def duplicated(self, subset=None, keep='first'): Returns ------- - duplicated : Series + Series """ from pandas.core.sorting import get_group_index from pandas._libs.hashtable import duplicated_int64, _SIZE_HINT_LIMIT if self.empty: - return Series() + return Series(dtype=bool) def f(vals): labels, shape = algorithms.factorize( @@ -5032,7 +5027,7 @@ def swaplevel(self, i=-2, j=-1, axis=0): Returns ------- - swapped : same type as caller (new object) + DataFrame .. versionchanged:: 0.18.1 @@ -5130,8 +5125,7 @@ def _combine_const(self, other, func): def combine(self, other, func, fill_value=None, overwrite=True): """ - Perform column-wise combine with another DataFrame based on a - passed function. + Perform column-wise combine with another DataFrame. Combines a DataFrame with `other` DataFrame using `func` to element-wise combine columns. 
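The rewritten set_index validation above accepts column keys, one-dimensional arrays, and (newly) plain iterators, and it rejects length mismatches with an explicit message. Assuming the patch is applied, both paths are easy to exercise:

    import pandas as pd

    df = pd.DataFrame({'year': [2013, 2014], 'sale': [55, 40]})

    # A column key mixed with an iterator; the iterator is materialized
    # with list() internally.
    print(df.set_index(['year', iter(['a', 'b'])]).index.names)  # ['year', None]

    try:
        df.set_index([[1, 2, 3]])  # three labels for a two-row frame
    except ValueError as err:
        print(err)  # Length mismatch: Expected 2 rows, received array of length 3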
The row and column indexes of the @@ -5147,13 +5141,14 @@ def combine(self, other, func, fill_value=None, overwrite=True): fill_value : scalar value, default None The value to fill NaNs with prior to passing any column to the merge func. - overwrite : boolean, default True + overwrite : bool, default True If True, columns in `self` that do not exist in `other` will be overwritten with NaNs. Returns ------- - result : DataFrame + DataFrame + Combination of the provided DataFrames. See Also -------- @@ -5197,15 +5192,15 @@ def combine(self, other, func, fill_value=None, overwrite=True): >>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]}) >>> df2 = pd.DataFrame({'A': [1, 1], 'B': [None, 3]}) >>> df1.combine(df2, take_smaller, fill_value=-5) - A B - 0 0 NaN + A B + 0 0 -5.0 1 0 3.0 Example that demonstrates the use of `overwrite` and behavior when the axis differ between the dataframes. >>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]}) - >>> df2 = pd.DataFrame({'B': [3, 3], 'C': [-10, 1],}, index=[1, 2]) + >>> df2 = pd.DataFrame({'B': [3, 3], 'C': [-10, 1], }, index=[1, 2]) >>> df1.combine(df2, take_smaller) A B C 0 NaN NaN NaN @@ -5220,7 +5215,7 @@ def combine(self, other, func, fill_value=None, overwrite=True): Demonstrating the preference of the passed in dataframe. - >>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1],}, index=[1, 2]) + >>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1], }, index=[1, 2]) >>> df2.combine(df1, take_smaller) A B C 0 0.0 NaN NaN @@ -5311,7 +5306,7 @@ def combine_first(self, other): Returns ------- - combined : DataFrame + DataFrame See Also -------- @@ -5672,7 +5667,7 @@ def pivot(self, index=None, columns=None, values=None): Returns ------- - table : DataFrame + DataFrame See Also -------- @@ -5704,19 +5699,19 @@ def pivot(self, index=None, columns=None, values=None): This first example aggregates values by taking the sum. - >>> table = pivot_table(df, values='D', index=['A', 'B'], + >>> table = pd.pivot_table(df, values='D', index=['A', 'B'], ... columns=['C'], aggfunc=np.sum) >>> table C large small A B - bar one 4 5 - two 7 6 - foo one 4 1 - two NaN 6 + bar one 4.0 5.0 + two 7.0 6.0 + foo one 4.0 1.0 + two NaN 6.0 We can also fill missing values using the `fill_value` parameter. - >>> table = pivot_table(df, values='D', index=['A', 'B'], + >>> table = pd.pivot_table(df, values='D', index=['A', 'B'], ... columns=['C'], aggfunc=np.sum, fill_value=0) >>> table C large small @@ -5728,12 +5723,11 @@ def pivot(self, index=None, columns=None, values=None): The next example aggregates by taking the mean across multiple columns. - >>> table = pivot_table(df, values=['D', 'E'], index=['A', 'C'], + >>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'], ... aggfunc={'D': np.mean, ... 'E': np.mean}) >>> table - D E - mean mean + D E A C bar large 5.500000 7.500000 small 5.500000 8.500000 @@ -5743,17 +5737,17 @@ def pivot(self, index=None, columns=None, values=None): We can also calculate multiple types of aggregations for any given value column. - >>> table = pivot_table(df, values=['D', 'E'], index=['A', 'C'], + >>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'], ... aggfunc={'D': np.mean, ... 
'E': [min, max, np.mean]}) >>> table - D E - mean max mean min + D E + mean max mean min A C - bar large 5.500000 9 7.500000 6 - small 5.500000 9 8.500000 8 - foo large 2.000000 5 4.500000 4 - small 2.333333 6 4.333333 2 + bar large 5.500000 9.0 7.500000 6.0 + small 5.500000 9.0 8.500000 8.0 + foo large 2.000000 5.0 4.500000 4.0 + small 2.333333 6.0 4.333333 2.0 """ @Substitution('') @@ -5813,9 +5807,9 @@ def stack(self, level=-1, dropna=True): Notes ----- The function is named by analogy with a collection of books - being re-organised from being side by side on a horizontal + being reorganized from being side by side on a horizontal position (the columns of the dataframe) to being stacked - vertically on top of of each other (in the index of the + vertically on top of each other (in the index of the dataframe). Examples @@ -5959,7 +5953,7 @@ def unstack(self, level=-1, fill_value=None): Returns ------- - unstacked : DataFrame or Series + Series or DataFrame See Also -------- @@ -6001,7 +5995,7 @@ def unstack(self, level=-1, fill_value=None): return unstack(self, level, fill_value) _shared_docs['melt'] = (""" - Unpivots a DataFrame from wide format to long format, optionally + Unpivot a DataFrame from wide format to long format, optionally leaving identifier variables set. This function is useful to massage a DataFrame into a format where one @@ -6125,7 +6119,7 @@ def diff(self, periods=1, axis=0): Returns ------- - diffed : DataFrame + DataFrame See Also -------- @@ -6238,11 +6232,11 @@ def _gotitem(self, -------- DataFrame.apply : Perform any type of operations. DataFrame.transform : Perform transformation type operations. - pandas.core.groupby.GroupBy : Perform operations over groups. - pandas.core.resample.Resampler : Perform operations over resampled bins. - pandas.core.window.Rolling : Perform operations over rolling window. - pandas.core.window.Expanding : Perform operations over expanding window. - pandas.core.window.EWM : Perform operation over exponential weighted + core.groupby.GroupBy : Perform operations over groups. + core.resample.Resampler : Perform operations over resampled bins. + core.window.Rolling : Perform operations over rolling window. + core.window.Expanding : Perform operations over expanding window. + core.window.EWM : Perform operation over exponential weighted window. """) @@ -6397,7 +6391,9 @@ def apply(self, func, axis=0, broadcast=None, raw=False, reduce=None, Returns ------- - applied : Series or DataFrame + Series or DataFrame + Result of applying ``func`` along the given axis of the + DataFrame. See Also -------- @@ -6416,7 +6412,7 @@ def apply(self, func, axis=0, broadcast=None, raw=False, reduce=None, Examples -------- - >>> df = pd.DataFrame([[4, 9],] * 3, columns=['A', 'B']) + >>> df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B']) >>> df A B 0 4 9 @@ -6590,11 +6586,11 @@ def append(self, other, ignore_index=False, Returns ------- - appended : DataFrame + DataFrame See Also -------- - pandas.concat : General function to concatenate DataFrame, Series + concat : General function to concatenate DataFrame, Series or Panel objects. Notes @@ -6891,41 +6887,67 @@ def round(self, decimals=0, *args, **kwargs): columns not included in `decimals` will be left as is. Elements of `decimals` which are not columns of the input will be ignored. + *args + Additional keywords have no effect but might be accepted for + compatibility with numpy. + **kwargs + Additional keywords have no effect but might be accepted for + compatibility with numpy. 
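As a script-form companion to the corrected apply doctests above (the [[4, 9]] * 3 frame), note how the axis argument selects which dimension gets reduced:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
    print(df.apply(np.sqrt))         # elementwise ufunc: 2.0 and 3.0 in every row
    print(df.apply(np.sum, axis=0))  # A 12, B 27 -- reduce down each column
    print(df.apply(np.sum, axis=1))  # 13 per row -- reduce across each row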
Returns ------- DataFrame + A DataFrame with the affected columns rounded to the specified + number of decimal places. See Also -------- - numpy.around - Series.round + numpy.around : Round a numpy array to the given number of decimals. + Series.round : Round a Series to the given number of decimals. Examples -------- - >>> df = pd.DataFrame(np.random.random([3, 3]), - ... columns=['A', 'B', 'C'], index=['first', 'second', 'third']) + >>> df = pd.DataFrame([(.21, .32), (.01, .67), (.66, .03), (.21, .18)], + ... columns=['dogs', 'cats']) >>> df - A B C - first 0.028208 0.992815 0.173891 - second 0.038683 0.645646 0.577595 - third 0.877076 0.149370 0.491027 - >>> df.round(2) - A B C - first 0.03 0.99 0.17 - second 0.04 0.65 0.58 - third 0.88 0.15 0.49 - >>> df.round({'A': 1, 'C': 2}) - A B C - first 0.0 0.992815 0.17 - second 0.0 0.645646 0.58 - third 0.9 0.149370 0.49 - >>> decimals = pd.Series([1, 0, 2], index=['A', 'B', 'C']) + dogs cats + 0 0.21 0.32 + 1 0.01 0.67 + 2 0.66 0.03 + 3 0.21 0.18 + + By providing an integer, each column is rounded to the same number + of decimal places. + + >>> df.round(1) + dogs cats + 0 0.2 0.3 + 1 0.0 0.7 + 2 0.7 0.0 + 3 0.2 0.2 + + With a dict, the number of places for specific columns can be + specified with the column names as key and the number of decimal + places as value. + + >>> df.round({'dogs': 1, 'cats': 0}) + dogs cats + 0 0.2 0.0 + 1 0.0 1.0 + 2 0.7 0.0 + 3 0.2 0.0 + + Using a Series, the number of places for specific columns can be + specified with the column names as index and the number of + decimal places as value. + + >>> decimals = pd.Series([0, 1], index=['cats', 'dogs']) >>> df.round(decimals) - A B C - first 0.0 1 0.17 - second 0.0 1 0.58 - third 0.9 0 0.49 + dogs cats + 0 0.2 0.0 + 1 0.0 1.0 + 2 0.7 0.0 + 3 0.2 0.0 """ from pandas.core.reshape.concat import concat @@ -6978,16 +7000,18 @@ def corr(self, method='pearson', min_periods=1): * spearman : Spearman rank correlation * callable: callable with input two 1d ndarrays and returning a float + .. versionadded:: 0.24.0 min_periods : int, optional Minimum number of observations required per pair of columns - to have a valid result. Currently only available for pearson - and spearman correlation + to have a valid result. Currently only available for Pearson + and Spearman correlation. Returns ------- - y : DataFrame + DataFrame + Correlation matrix. See Also -------- @@ -6996,14 +7020,15 @@ def corr(self, method='pearson', min_periods=1): Examples -------- - >>> histogram_intersection = lambda a, b: np.minimum(a, b - ... ).sum().round(decimals=1) + >>> def histogram_intersection(a, b): + ... v = np.minimum(a, b).sum().round(decimals=1) + ... return v >>> df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)], ... columns=['dogs', 'cats']) >>> df.corr(method=histogram_intersection) - dogs cats - dogs 1.0 0.3 - cats 0.3 1.0 + dogs cats + dogs 1.0 0.3 + cats 0.3 1.0 """ numeric_df = self._get_numeric_data() cols = numeric_df.columns @@ -7078,10 +7103,10 @@ def cov(self, min_periods=None): See Also -------- - pandas.Series.cov : Compute covariance with another Series. - pandas.core.window.EWM.cov: Exponential weighted sample covariance. - pandas.core.window.Expanding.cov : Expanding sample covariance.
+ core.window.Rolling.cov : Rolling sample covariance. Notes ----- @@ -7166,10 +7191,11 @@ def corrwith(self, other, axis=0, drop=False, method='pearson'): Parameters ---------- other : DataFrame, Series + Object with which to compute correlations. axis : {0 or 'index', 1 or 'columns'}, default 0 - 0 or 'index' to compute column-wise, 1 or 'columns' for row-wise - drop : boolean, default False - Drop missing indices from result + 0 or 'index' to compute column-wise, 1 or 'columns' for row-wise. + drop : bool, default False + Drop missing indices from result. method : {'pearson', 'kendall', 'spearman'} or callable * pearson : standard correlation coefficient * kendall : Kendall Tau correlation coefficient @@ -7181,7 +7207,8 @@ def corrwith(self, other, axis=0, drop=False, method='pearson'): Returns ------- - correls : Series + Series + Pairwise correlations. See Also ------- @@ -7262,7 +7289,7 @@ def count(self, axis=0, level=None, numeric_only=False): If the axis is a `MultiIndex` (hierarchical), count along a particular `level`, collapsing into a `DataFrame`. A `str` specifies the level name. - numeric_only : boolean, default False + numeric_only : bool, default False Include only `float`, `int` or `boolean` data. Returns @@ -7464,7 +7491,8 @@ def f(x): if filter_type is None or filter_type == 'numeric': data = self._get_numeric_data() elif filter_type == 'bool': - data = self + # GH 25101, # GH 24434 + data = self._get_bool_data() if axis == 0 else self else: # pragma: no cover msg = ("Generating numeric_only data with filter_type {f}" "not supported.".format(f=filter_type)) @@ -7510,7 +7538,7 @@ def nunique(self, axis=0, dropna=True): Returns ------- - nunique : Series + Series See Also -------- @@ -7548,7 +7576,8 @@ def idxmin(self, axis=0, skipna=True): Returns ------- - idxmin : Series + Series + Indexes of minima along the specified axis. Raises ------ @@ -7584,7 +7613,8 @@ def idxmax(self, axis=0, skipna=True): Returns ------- - idxmax : Series + Series + Indexes of maxima along the specified axis. Raises ------ @@ -7731,12 +7761,12 @@ def quantile(self, q=0.5, axis=0, numeric_only=True, Returns ------- - quantiles : Series or DataFrame + Series or DataFrame - - If ``q`` is an array, a DataFrame will be returned where the + If ``q`` is an array, a DataFrame will be returned where the index is ``q``, the columns are the columns of self, and the values are the quantiles. - - If ``q`` is a float, a Series will be returned where the + If ``q`` is a float, a Series will be returned where the index is the columns of self and the values are the quantiles. See Also @@ -7801,19 +7831,19 @@ def to_timestamp(self, freq=None, how='start', axis=0, copy=True): Parameters ---------- - freq : string, default frequency of PeriodIndex - Desired frequency + freq : str, default frequency of PeriodIndex + Desired frequency. how : {'s', 'e', 'start', 'end'} Convention for converting period to timestamp; start of period - vs. end + vs. end. axis : {0 or 'index', 1 or 'columns'}, default 0 - The axis to convert (the index by default) - copy : boolean, default True - If false then underlying input data is not copied + The axis to convert (the index by default). + copy : bool, default True + If False then underlying input data is not copied. 
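The clarified Returns sections for idxmin/idxmax above describe indexes of minima and maxima, i.e. index labels rather than integer positions, and quantile with a float q yields a Series keyed by column. A quick illustration:

    import pandas as pd

    df = pd.DataFrame({'a': [3, 1, 2], 'b': [9, 7, 8]}, index=['x', 'y', 'z'])
    print(df.idxmin())        # a -> 'y', b -> 'y' (labels, not positions)
    print(df.idxmax(axis=1))  # 'b' for every row
    print(df.quantile(0.5))   # a 2.0, b 8.0 -- column medians as a Series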
Returns ------- - df : DataFrame with DatetimeIndex + DataFrame with DatetimeIndex """ new_data = self._data if copy: @@ -7837,15 +7867,16 @@ def to_period(self, freq=None, axis=0, copy=True): Parameters ---------- - freq : string, default + freq : str, default + Frequency of the PeriodIndex. axis : {0 or 'index', 1 or 'columns'}, default 0 - The axis to convert (the index by default) - copy : boolean, default True - If False then underlying input data is not copied + The axis to convert (the index by default). + copy : bool, default True + If False then underlying input data is not copied. Returns ------- - ts : TimeSeries with PeriodIndex + TimeSeries with PeriodIndex """ new_data = self._data if copy: @@ -7918,7 +7949,7 @@ def isin(self, values): match. Note that 'falcon' does not match based on the number of legs in df2. - >>> other = pd.DataFrame({'num_legs': [8, 2],'num_wings': [0, 2]}, + >>> other = pd.DataFrame({'num_legs': [8, 2], 'num_wings': [0, 2]}, ... index=['spider', 'falcon']) >>> df.isin(other) num_legs num_wings diff --git a/pandas/core/generic.py b/pandas/core/generic.py index a351233a77465..ee8f9cba951b3 100644 --- a/pandas/core/generic.py +++ b/pandas/core/generic.py @@ -61,6 +61,10 @@ by : str or list of str Name or list of names to sort by""") +# sentinel value to use as kwarg in place of None when None has special meaning +# and needs to be distinguished from a user explicitly passing None. +sentinel = object() + def _single_replace(self, to_replace, method, inplace, limit): """ @@ -290,11 +294,16 @@ def _construct_axes_dict_for_slice(self, axes=None, **kwargs): d.update(kwargs) return d - def _construct_axes_from_arguments(self, args, kwargs, require_all=False): + def _construct_axes_from_arguments( + self, args, kwargs, require_all=False, sentinel=None): """Construct and returns axes if supplied in args/kwargs. If require_all, raise if all axis arguments are not supplied return a tuple of (axes, kwargs). + + sentinel specifies the default parameter when an axis is not + supplied; useful to distinguish when a user explicitly passes None + in scenarios where None has special meaning. """ # construct the args @@ -322,7 +331,7 @@ def _construct_axes_from_arguments(self, args, kwargs, require_all=False): raise TypeError("not enough/duplicate arguments " "specified!") - axes = {a: kwargs.pop(a, None) for a in self._AXIS_ORDERS} + axes = {a: kwargs.pop(a, sentinel) for a in self._AXIS_ORDERS} return axes, kwargs @classmethod @@ -639,6 +648,8 @@ def transpose(self, *args, **kwargs): copy : boolean, default False Make a copy of the underlying data. Mixed-dtype data will always result in a copy + **kwargs + Additional keyword arguments will be passed to the function. Returns ------- @@ -763,18 +774,18 @@ def pop(self, item): Parameters ---------- item : str - Column label to be popped + Label of column to be popped. Returns ------- - popped : Series + Series Examples -------- - >>> df = pd.DataFrame([('falcon', 'bird', 389.0), - ... ('parrot', 'bird', 24.0), - ... ('lion', 'mammal', 80.5), - ... ('monkey', 'mammal', np.nan)], + >>> df = pd.DataFrame([('falcon', 'bird', 389.0), + ... ('parrot', 'bird', 24.0), + ... ('lion', 'mammal', 80.5), + ... ('monkey','mammal', np.nan)], ... columns=('name', 'class', 'max_speed')) >>> df name class max_speed @@ -926,7 +937,7 @@ def swaplevel(self, i=-2, j=-1, axis=0): Parameters ---------- - i, j : int, string (can be mixed) + i, j : int, str (can be mixed) Level of index to be swapped. Can pass level name as string. 
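The module-level sentinel = object() added to generic.py above exists so the axes machinery can tell "argument omitted" apart from "caller explicitly passed None". A reduced sketch of the pattern (hypothetical function, for illustration only):

    sentinel = object()

    def rename_axis_sketch(mapper=sentinel):
        # `is` comparisons against a unique object() cannot collide with
        # any value a caller could legitimately pass in, including None.
        if mapper is sentinel:
            return 'no mapper supplied'
        if mapper is None:
            return 'caller explicitly passed None (clear the name)'
        return 'rename using {!r}'.format(mapper)

    print(rename_axis_sketch())       # no mapper supplied
    print(rename_axis_sketch(None))   # caller explicitly passed None (clear the name)
    print(rename_axis_sketch('idx'))  # rename using 'idx'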
Returns @@ -962,9 +973,9 @@ def rename(self, *args, **kwargs): and raise on DataFrame or Panel. dict-like or functions are transformations to apply to that axis' values - copy : boolean, default True - Also copy underlying data - inplace : boolean, default False + copy : bool, default True + Also copy underlying data. + inplace : bool, default False Whether to return a new %(klass)s. If True then value of copy is ignored. level : int or level name, default None @@ -977,7 +988,7 @@ def rename(self, *args, **kwargs): See Also -------- - pandas.NDFrame.rename_axis + NDFrame.rename_axis Examples -------- @@ -1089,7 +1100,7 @@ def rename(self, *args, **kwargs): @rewrite_axis_style_signature('mapper', [('copy', True), ('inplace', False)]) - def rename_axis(self, mapper=None, **kwargs): + def rename_axis(self, mapper=sentinel, **kwargs): """ Set the name of the axis for the index or columns. @@ -1218,7 +1229,8 @@ class name cat 4 0 monkey 2 2 """ - axes, kwargs = self._construct_axes_from_arguments((), kwargs) + axes, kwargs = self._construct_axes_from_arguments( + (), kwargs, sentinel=sentinel) copy = kwargs.pop('copy', True) inplace = kwargs.pop('inplace', False) axis = kwargs.pop('axis', 0) @@ -1231,7 +1243,7 @@ class name inplace = validate_bool_kwarg(inplace, 'inplace') - if (mapper is not None): + if (mapper is not sentinel): # Use v0.23 behavior if a scalar or list non_mapper = is_scalar(mapper) or (is_list_like(mapper) and not is_dict_like(mapper)) @@ -1254,7 +1266,7 @@ class name for axis in lrange(self._AXIS_LEN): v = axes.get(self._AXIS_NAMES[axis]) - if v is None: + if v is sentinel: continue non_mapper = is_scalar(v) or (is_list_like(v) and not is_dict_like(v)) @@ -1321,7 +1333,6 @@ def _set_axis_name(self, name, axis=0, inplace=False): cat 4 monkey 2 """ - pd.MultiIndex.from_product([["mammal"], ['dog', 'cat', 'monkey']]) axis = self._get_axis_number(axis) idx = self._get_axis(axis).set_names(name) @@ -1554,14 +1565,14 @@ def _is_label_reference(self, key, axis=0): ------- is_label: bool """ - axis = self._get_axis_number(axis) - other_axes = [ax for ax in range(self._AXIS_LEN) if ax != axis] - if self.ndim > 2: raise NotImplementedError( "_is_label_reference is not implemented for {type}" .format(type=type(self))) + axis = self._get_axis_number(axis) + other_axes = (ax for ax in range(self._AXIS_LEN) if ax != axis) + return (key is not None and is_hashable(key) and any(key in self.axes[ax] for ax in other_axes)) @@ -1613,15 +1624,14 @@ def _check_label_or_level_ambiguity(self, key, axis=0): ------ ValueError: `key` is ambiguous """ - - axis = self._get_axis_number(axis) - other_axes = [ax for ax in range(self._AXIS_LEN) if ax != axis] - if self.ndim > 2: raise NotImplementedError( "_check_label_or_level_ambiguity is not implemented for {type}" .format(type=type(self))) + axis = self._get_axis_number(axis) + other_axes = (ax for ax in range(self._AXIS_LEN) if ax != axis) + if (key is not None and is_hashable(key) and key in self.axes[axis].names and @@ -1679,15 +1689,14 @@ def _get_label_or_level_values(self, key, axis=0): if `key` is ambiguous. 
This will become an ambiguity error in a future version """ - - axis = self._get_axis_number(axis) - other_axes = [ax for ax in range(self._AXIS_LEN) if ax != axis] - if self.ndim > 2: raise NotImplementedError( "_get_label_or_level_values is not implemented for {type}" .format(type=type(self))) + axis = self._get_axis_number(axis) + other_axes = [ax for ax in range(self._AXIS_LEN) if ax != axis] + if self._is_label_reference(key, axis=axis): self._check_label_or_level_ambiguity(key, axis=axis) values = self.xs(key, axis=other_axes[0])._values @@ -1743,14 +1752,13 @@ def _drop_labels_or_levels(self, keys, axis=0): ValueError if any `keys` match neither a label nor a level """ - - axis = self._get_axis_number(axis) - if self.ndim > 2: raise NotImplementedError( "_drop_labels_or_levels is not implemented for {type}" .format(type=type(self))) + axis = self._get_axis_number(axis) + # Validate keys keys = com.maybe_make_list(keys) invalid_keys = [k for k in keys if not @@ -1807,7 +1815,7 @@ def __hash__(self): ' hashed'.format(self.__class__.__name__)) def __iter__(self): - """Iterate over infor axis""" + """Iterate over info axis""" return iter(self._info_axis) # can we get a better explanation of this? @@ -1851,8 +1859,8 @@ def empty(self): See Also -------- - pandas.Series.dropna - pandas.DataFrame.dropna + Series.dropna + DataFrame.dropna Notes ----- @@ -2799,14 +2807,17 @@ def to_latex(self, buf=None, columns=None, col_space=None, header=True, defaults to 'ascii' on Python 2 and 'utf-8' on Python 3. decimal : str, default '.' Character recognized as decimal separator, e.g. ',' in Europe. + .. versionadded:: 0.18.0 multicolumn : bool, default True Use \multicolumn to enhance MultiIndex columns. The default will be read from the config module. + .. versionadded:: 0.20.0 multicolumn_format : str, default 'l' The alignment for multicolumns, similar to `column_format` The default will be read from the config module. + .. versionadded:: 0.20.0 multirow : bool, default False Use \multirow to enhance MultiIndex rows. Requires adding a @@ -2814,6 +2825,7 @@ def to_latex(self, buf=None, columns=None, col_space=None, header=True, centered labels (instead of top-aligned) across the contained rows, separating groups via clines. The default will be read from the pandas config module. + .. versionadded:: 0.20.0 Returns @@ -2938,7 +2950,7 @@ def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None, will treat them as non-numeric. quotechar : str, default '\"' String of length 1. Character used to quote fields. - line_terminator : string, optional + line_terminator : str, optional The newline character or character sequence to use in the output file. Defaults to `os.linesep`, which depends on the OS in which this method is called ('\n' for linux, '\r\n' for Windows, i.e.). @@ -4940,11 +4952,15 @@ def pipe(self, func, *args, **kwargs): Returns ------- - DataFrame, Series or scalar - if DataFrame.agg is called with a single function, returns a Series - if DataFrame.agg is called with several functions, returns a DataFrame - if Series.agg is called with single function, returns a scalar - if Series.agg is called with several functions, returns a Series + scalar, Series or DataFrame + + The return can be: + + * scalar : when Series.agg is called with single function + * Series : when DataFrame.agg is called with a single function + * DataFrame : when DataFrame.agg is called with several functions + + Return scalar, Series or DataFrame. 
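The reworked Returns section for agg above enumerates three cases, which are easy to verify:

    import pandas as pd

    s = pd.Series([1, 2, 3])
    df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

    print(s.agg('sum'))            # scalar: 6
    print(s.agg(['sum', 'min']))   # Series indexed by function name
    print(df.agg('sum'))           # Series: one value per column
    print(df.agg(['sum', 'min']))  # DataFrame: functions x columns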
%(see_also)s @@ -5262,8 +5278,8 @@ def values(self): See Also -------- DataFrame.to_numpy : Recommended alternative to this method. - pandas.DataFrame.index : Retrieve the index labels. - pandas.DataFrame.columns : Retrieving the column names. + DataFrame.index : Retrieve the index labels. + DataFrame.columns : Retrieving the column names. Notes ----- @@ -5334,18 +5350,18 @@ def get_values(self): Return an ndarray after converting sparse values to dense. This is the same as ``.values`` for non-sparse data. For sparse - data contained in a `pandas.SparseArray`, the data are first + data contained in a `SparseArray`, the data are first converted to a dense representation. Returns ------- numpy.ndarray - Numpy representation of DataFrame + Numpy representation of DataFrame. See Also -------- values : Numpy representation of DataFrame. - pandas.SparseArray : Container for sparse data. + SparseArray : Container for sparse data. Examples -------- @@ -5419,7 +5435,7 @@ def get_ftype_counts(self): ------- dtype : Series Series with the count of columns with each type and - sparsity (dense/sparse) + sparsity (dense/sparse). See Also -------- @@ -5466,7 +5482,7 @@ def dtypes(self): See Also -------- - pandas.DataFrame.ftypes : Dtype and sparsity information. + DataFrame.ftypes : Dtype and sparsity information. Examples -------- @@ -5502,8 +5518,8 @@ def ftypes(self): See Also -------- - pandas.DataFrame.dtypes: Series with just dtype information. - pandas.SparseDataFrame : Container for sparse tabular data. + DataFrame.dtypes: Series with just dtype information. + SparseDataFrame : Container for sparse tabular data. Notes ----- @@ -5950,17 +5966,18 @@ def fillna(self, value=None, method=None, axis=None, inplace=False, value : scalar, dict, Series, or DataFrame Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for - each index (for a Series) or column (for a DataFrame). (values not - in the dict/Series/DataFrame will not be filled). This value cannot + each index (for a Series) or column (for a DataFrame). Values not + in the dict/Series/DataFrame will not be filled. This value cannot be a list. method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None Method to use for filling holes in reindexed Series pad / ffill: propagate last valid observation forward to next valid - backfill / bfill: use NEXT valid observation to fill gap + backfill / bfill: use next valid observation to fill gap. axis : %(axes_single_arg)s - inplace : boolean, default False - If True, fill in place. Note: this will modify any - other views on this object, (e.g. a no-copy slice for a column in a + Axis along which to fill missing values. + inplace : bool, default False + If True, fill in-place. Note: this will modify any + other views on this object (e.g., a no-copy slice for a column in a DataFrame). limit : int, default None If method is specified, this is the maximum number of consecutive @@ -5970,18 +5987,20 @@ def fillna(self, value=None, method=None, axis=None, inplace=False, maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None. downcast : dict, default is None - a dict of item->dtype of what to downcast if possible, + A dict of item->dtype of what to downcast if possible, or the string 'infer' which will try to downcast to an appropriate - equal type (e.g. float64 to int64 if possible) + equal type (e.g. float64 to int64 if possible). 
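Before the Returns block resumes below, the clarified ``fillna`` parameter text (dict keys select columns, unlisted columns stay untouched; `limit` caps consecutive fills) can be seen in a small illustrative example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan, np.nan, 4.0],
                   'B': [np.nan, 2.0, np.nan, np.nan]})
# Only column 'A' appears in the dict, so 'B' keeps its NaNs.
print(df.fillna(value={'A': 0.0}))
# ffill with limit=1 fills at most one consecutive NaN per gap.
print(df.fillna(method='ffill', limit=1))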
Returns ------- - filled : %(klass)s + %(klass)s + Object with missing values filled. See Also -------- interpolate : Fill NaN values using interpolation. - reindex, asfreq + reindex : Conform object to new index. + asfreq : Convert TimeSeries to specified frequency. Examples -------- @@ -5989,7 +6008,7 @@ def fillna(self, value=None, method=None, axis=None, inplace=False, ... [3, 4, np.nan, 1], ... [np.nan, np.nan, np.nan, 5], ... [np.nan, 3, np.nan, 4]], - ... columns=list('ABCD')) + ... columns=list('ABCD')) >>> df A B C D 0 NaN 2.0 NaN 0 @@ -6599,10 +6618,10 @@ def replace(self, to_replace=None, value=None, inplace=False, limit=None, * 'pad': Fill in NaNs using existing values. * 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'spline', 'barycentric', 'polynomial': Passed to - `scipy.interpolate.interp1d`. Both 'polynomial' and 'spline' - require that you also specify an `order` (int), - e.g. ``df.interpolate(method='polynomial', order=4)``. - These use the numerical values of the index. + `scipy.interpolate.interp1d`. These methods use the numerical + values of the index. Both 'polynomial' and 'spline' require that + you also specify an `order` (int), e.g. + ``df.interpolate(method='polynomial', order=5)``. * 'krogh', 'piecewise_polynomial', 'spline', 'pchip', 'akima': Wrappers around the SciPy interpolation methods of similar names. See `Notes`. @@ -6637,7 +6656,7 @@ def replace(self, to_replace=None, value=None, inplace=False, limit=None, (interpolate). * 'outside': Only fill NaNs outside valid values (extrapolate). - .. versionadded:: 0.21.0 + .. versionadded:: 0.23.0 downcast : optional, 'infer' or None, defaults to None Downcast dtypes if possible. @@ -6648,7 +6667,7 @@ def replace(self, to_replace=None, value=None, inplace=False, limit=None, ------- Series or DataFrame Returns the same object type as the caller, interpolated at - some or all ``NaN`` values + some or all ``NaN`` values. See Also -------- @@ -6743,7 +6762,7 @@ def replace(self, to_replace=None, value=None, inplace=False, limit=None, Note how the first entry in column 'b' remains ``NaN``, because there is no entry before it to use for interpolation. - >>> df = pd.DataFrame([(0.0, np.nan, -1.0, 1.0), + >>> df = pd.DataFrame([(0.0, np.nan, -1.0, 1.0), ... (np.nan, 2.0, np.nan, np.nan), ... (2.0, 3.0, np.nan, 9.0), ... (np.nan, 4.0, -4.0, 16.0)], @@ -6868,11 +6887,15 @@ def asof(self, where, subset=None): ------- scalar, Series, or DataFrame - * scalar : when `self` is a Series and `where` is a scalar - * Series: when `self` is a Series and `where` is an array-like, or when `self` is a DataFrame and `where` is a scalar - * DataFrame : when `self` is a DataFrame and `where` is an - array-like + The return can be: + + * scalar : when `self` is a Series and `where` is a scalar + * Series: when `self` is a Series and `where` is an array-like, + or when `self` is a DataFrame and `where` is a scalar + * DataFrame : when `self` is a DataFrame and `where` is an + array-like + + Return scalar, Series, or DataFrame. See Also -------- @@ -7212,9 +7235,9 @@ def clip(self, lower=None, upper=None, axis=None, inplace=False, upper : float or array_like, default None Maximum threshold value. All values above this threshold will be set to it. - axis : int or string axis name, optional + axis : int or str axis name, optional Align object with lower and upper along the given axis. - inplace : boolean, default False + inplace : bool, default False Whether to perform the operation in place on the data. ..
versionadded:: 0.21.0 @@ -7226,7 +7249,7 @@ def clip(self, lower=None, upper=None, axis=None, inplace=False, ------- Series or DataFrame Same type as calling object with the values outside the - clip boundaries replaced + clip boundaries replaced. Examples -------- @@ -7336,7 +7359,7 @@ def clip_upper(self, threshold, axis=None, inplace=False): axis : {0 or 'index', 1 or 'columns'}, default 0 Align object with `threshold` along the given axis. - inplace : boolean, default False + inplace : bool, default False Whether to perform the operation in place on the data. .. versionadded:: 0.21.0 @@ -7417,7 +7440,7 @@ def clip_lower(self, threshold, axis=None, inplace=False): axis : {0 or 'index', 1 or 'columns'}, default 0 Align `self` with `threshold` along the given axis. - inplace : boolean, default False + inplace : bool, default False Whether to perform the operation in place on the data. .. versionadded:: 0.21.0 @@ -7574,9 +7597,9 @@ def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True, Examples -------- - >>> df = pd.DataFrame({'Animal' : ['Falcon', 'Falcon', - ... 'Parrot', 'Parrot'], - ... 'Max Speed' : [380., 370., 24., 26.]}) + >>> df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', + ... 'Parrot', 'Parrot'], + ... 'Max Speed': [380., 370., 24., 26.]}) >>> df Animal Max Speed 0 Falcon 380.0 @@ -7595,16 +7618,16 @@ def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True, using the `level` parameter: >>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'], - ... ['Capitve', 'Wild', 'Capitve', 'Wild']] + ... ['Captive', 'Wild', 'Captive', 'Wild']] >>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type')) - >>> df = pd.DataFrame({'Max Speed' : [390., 350., 30., 20.]}, - ... index=index) + >>> df = pd.DataFrame({'Max Speed': [390., 350., 30., 20.]}, + ... index=index) >>> df Max Speed Animal Type - Falcon Capitve 390.0 + Falcon Captive 390.0 Wild 350.0 - Parrot Capitve 30.0 + Parrot Captive 30.0 Wild 20.0 >>> df.groupby(level=0).mean() Max Speed @@ -7614,7 +7637,7 @@ def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True, >>> df.groupby(level=1).mean() Max Speed Type - Capitve 210.0 + Captive 210.0 Wild 185.0 """ from pandas.core.groupby.groupby import groupby @@ -7731,14 +7754,14 @@ def at_time(self, time, asof=False, axis=None): Parameters ---------- - time : datetime.time or string + time : datetime.time or str axis : {0 or 'index', 1 or 'columns'}, default 0 .. versionadded:: 0.24.0 Returns ------- - values_at_time : same type as caller + Series or DataFrame Raises ------ @@ -7756,7 +7779,7 @@ def at_time(self, time, asof=False, axis=None): Examples -------- >>> i = pd.date_range('2018-04-09', periods=4, freq='12H') - >>> ts = pd.DataFrame({'A': [1,2,3,4]}, index=i) + >>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i) >>> ts A 2018-04-09 00:00:00 1 @@ -7791,17 +7814,17 @@ def between_time(self, start_time, end_time, include_start=True, Parameters ---------- - start_time : datetime.time or string - end_time : datetime.time or string - include_start : boolean, default True - include_end : boolean, default True + start_time : datetime.time or str + end_time : datetime.time or str + include_start : bool, default True + include_end : bool, default True axis : {0 or 'index', 1 or 'columns'}, default 0 .. 
versionadded:: 0.24.0 Returns ------- - values_between_time : same type as caller + Series or DataFrame Raises ------ @@ -7819,7 +7842,7 @@ def between_time(self, start_time, end_time, include_start=True, Examples -------- >>> i = pd.date_range('2018-04-09', periods=4, freq='1D20min') - >>> ts = pd.DataFrame({'A': [1,2,3,4]}, index=i) + >>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i) >>> ts A 2018-04-09 00:00:00 1 @@ -8377,7 +8400,7 @@ def ranker(data): Returns ------- (left, right) : (%(klass)s, type of other) - Aligned objects + Aligned objects. """) @Appender(_shared_docs['align'] % _shared_doc_kwargs) @@ -8569,7 +8592,7 @@ def _where(self, cond, other=np.nan, inplace=False, axis=None, level=None, cond = self._constructor(cond, **self._construct_axes_dict()) # make sure we are boolean - fill_value = True if inplace else False + fill_value = bool(inplace) cond = cond.fillna(fill_value) msg = "Boolean array expected for the condition, not {dtype}" @@ -9984,8 +10007,7 @@ def _add_numeric_operations(cls): cls, 'all', name, name2, axis_descr, _all_desc, nanops.nanall, _all_see_also, _all_examples, empty_value=True) - @Substitution(outname='mad', - desc="Return the mean absolute deviation of the values " + @Substitution(desc="Return the mean absolute deviation of the values " "for the requested axis.", name1=name, name2=name2, axis_descr=axis_descr, min_count='', see_also='', examples='') @@ -10026,8 +10048,7 @@ def mad(self, axis=None, skipna=None, level=None): "ddof argument", nanops.nanstd) - @Substitution(outname='compounded', - desc="Return the compound percentage of the values for " + @Substitution(desc="Return the compound percentage of the values for " "the requested axis.", name1=name, name2=name2, axis_descr=axis_descr, min_count='', see_also='', examples='') @@ -10117,7 +10138,7 @@ def nanptp(values, axis=0, skipna=True): cls.ptp = _make_stat_function( cls, 'ptp', name, name2, axis_descr, - """Returns the difference between the maximum value and the + """Return the difference between the maximum value and the minimum value in the object. This is the equivalent of the ``numpy.ndarray`` method ``ptp``.\n\n.. deprecated:: 0.24.0 Use numpy.ptp instead""", @@ -10235,8 +10256,8 @@ def last_valid_index(self): def _doc_parms(cls): """Return a tuple of the doc parms.""" - axis_descr = "{%s}" % ', '.join(["{0} ({1})".format(a, i) - for i, a in enumerate(cls._AXIS_ORDERS)]) + axis_descr = "{%s}" % ', '.join("{0} ({1})".format(a, i) + for i, a in enumerate(cls._AXIS_ORDERS)) name = (cls._constructor_sliced.__name__ if cls._AXIS_LEN > 1 else 'scalar') name2 = cls.__name__ @@ -10264,7 +10285,7 @@ def _doc_parms(cls): Returns ------- -%(outname)s : %(name1)s or %(name2)s (if level specified) +%(name1)s or %(name2)s (if level specified) %(see_also)s %(examples)s\ """ @@ -10275,7 +10296,7 @@ def _doc_parms(cls): Parameters ---------- axis : %(axis_descr)s -skipna : boolean, default True +skipna : bool, default True Exclude NA/null values. If an entire row/column is NA, the result will be NA level : int or level name, default None @@ -10284,13 +10305,13 @@ def _doc_parms(cls): ddof : int, default 1 Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. -numeric_only : boolean, default None +numeric_only : bool, default None Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series. 
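The shared stat docstring above documents a `level` argument; on this era's API a level-aware reduction collapses one level of a MultiIndex, equivalent to a ``groupby(level=...)`` call (a hedged aside, not part of the patch):

import pandas as pd

idx = pd.MultiIndex.from_arrays([['a', 'a', 'b'], [1, 2, 1]],
                                names=['grp', 'n'])
s = pd.Series([10, 20, 30], index=idx)
print(s.sum(level='grp'))            # collapses the 'grp' level: a 30, b 30
print(s.groupby(level='grp').sum())  # the equivalent spelling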
Returns ------- -%(outname)s : %(name1)s or %(name2)s (if level specified)\n""" +%(name1)s or %(name2)s (if level specified)\n""" _bool_doc = """ %(desc)s @@ -10409,7 +10430,7 @@ def _doc_parms(cls): Returns ------- -%(outname)s : %(name1)s or %(name2)s\n +%(name1)s or %(name2)s\n See Also -------- core.window.Expanding.%(accum_func_name)s : Similar functionality @@ -10857,7 +10878,7 @@ def _doc_parms(cls): Series.max : Return the maximum. Series.idxmin : Return the index of the minimum. Series.idxmax : Return the index of the maximum. -DataFrame.min : Return the sum over the requested axis. +DataFrame.sum : Return the sum over the requested axis. DataFrame.min : Return the minimum over the requested axis. DataFrame.max : Return the maximum over the requested axis. DataFrame.idxmin : Return the index of the minimum over the requested axis. @@ -10902,7 +10923,7 @@ def _doc_parms(cls): def _make_min_count_stat_function(cls, name, name1, name2, axis_descr, desc, f, see_also='', examples=''): - @Substitution(outname=name, desc=desc, name1=name1, name2=name2, + @Substitution(desc=desc, name1=name1, name2=name2, axis_descr=axis_descr, min_count=_min_count_stub, see_also=see_also, examples=examples) @Appender(_num_doc) @@ -10930,7 +10951,7 @@ def stat_func(self, axis=None, skipna=None, level=None, numeric_only=None, def _make_stat_function(cls, name, name1, name2, axis_descr, desc, f, see_also='', examples=''): - @Substitution(outname=name, desc=desc, name1=name1, name2=name2, + @Substitution(desc=desc, name1=name1, name2=name2, axis_descr=axis_descr, min_count='', see_also=see_also, examples=examples) @Appender(_num_doc) @@ -10954,7 +10975,7 @@ def stat_func(self, axis=None, skipna=None, level=None, numeric_only=None, def _make_stat_function_ddof(cls, name, name1, name2, axis_descr, desc, f): - @Substitution(outname=name, desc=desc, name1=name1, name2=name2, + @Substitution(desc=desc, name1=name1, name2=name2, axis_descr=axis_descr) @Appender(_num_ddof_doc) def stat_func(self, axis=None, skipna=None, level=None, ddof=1, @@ -10975,7 +10996,7 @@ def stat_func(self, axis=None, skipna=None, level=None, ddof=1, def _make_cum_function(cls, name, name1, name2, axis_descr, desc, accum_func, accum_func_name, mask_a, mask_b, examples): - @Substitution(outname=name, desc=desc, name1=name1, name2=name2, + @Substitution(desc=desc, name1=name1, name2=name2, axis_descr=axis_descr, accum_func_name=accum_func_name, examples=examples) @Appender(_cnum_doc) @@ -11010,7 +11031,7 @@ def cum_func(self, axis=None, skipna=True, *args, **kwargs): def _make_logical_function(cls, name, name1, name2, axis_descr, desc, f, see_also, examples, empty_value): - @Substitution(outname=name, desc=desc, name1=name1, name2=name2, + @Substitution(desc=desc, name1=name1, name2=name2, axis_descr=axis_descr, see_also=see_also, examples=examples, empty_value=empty_value) @Appender(_bool_doc) diff --git a/pandas/core/groupby/__init__.py b/pandas/core/groupby/__init__.py index 9c15a5ebfe0f2..ac35f3825e5e8 100644 --- a/pandas/core/groupby/__init__.py +++ b/pandas/core/groupby/__init__.py @@ -1,4 +1,4 @@ from pandas.core.groupby.groupby import GroupBy # noqa: F401 from pandas.core.groupby.generic import ( # noqa: F401 - SeriesGroupBy, DataFrameGroupBy, PanelGroupBy) + SeriesGroupBy, DataFrameGroupBy) from pandas.core.groupby.grouper import Grouper # noqa: F401 diff --git a/pandas/core/groupby/generic.py b/pandas/core/groupby/generic.py index c5142a4ee98cc..683c21f7bd47a 100644 --- a/pandas/core/groupby/generic.py +++ 
b/pandas/core/groupby/generic.py @@ -1,5 +1,5 @@ """ -Define the SeriesGroupBy, DataFrameGroupBy, and PanelGroupBy +Define the SeriesGroupBy and DataFrameGroupBy classes that hold the groupby interfaces (and some implementations). These are user facing as the result of the ``df.groupby(...)`` operations, @@ -39,7 +39,6 @@ from pandas.core.index import CategoricalIndex, Index, MultiIndex import pandas.core.indexes.base as ibase from pandas.core.internals import BlockManager, make_block -from pandas.core.panel import Panel from pandas.core.series import Series from pandas.plotting._core import boxplot_frame_groupby @@ -965,7 +964,7 @@ def _transform_fast(self, func, func_nm): ids, _, ngroup = self.grouper.group_info cast = self._transform_should_cast(func_nm) - out = algorithms.take_1d(func().values, ids) + out = algorithms.take_1d(func()._values, ids) if cast: out = self._try_cast(out, self.obj) return Series(out, index=self.obj.index, name=self.obj.name) @@ -1021,7 +1020,9 @@ def true_and_notna(x, *args, **kwargs): return filtered def nunique(self, dropna=True): - """ Returns number of unique elements in the group """ + """ + Return number of unique elements in the group. + """ ids, _, _ = self.grouper.group_info val = self.obj.get_values() @@ -1461,8 +1462,8 @@ def _reindex_output(self, result): # reindex `result`, and then reset the in-axis grouper columns. # Select in-axis groupers - in_axis_grps = [(i, ping.name) for (i, ping) - in enumerate(groupings) if ping.in_axis] + in_axis_grps = ((i, ping.name) for (i, ping) + in enumerate(groupings) if ping.in_axis) g_nums, g_names = zip(*in_axis_grps) result = result.drop(labels=list(g_names), axis=1) @@ -1578,96 +1579,10 @@ def groupby_series(obj, col=None): from pandas.core.reshape.concat import concat results = [groupby_series(obj[col], col) for col in obj.columns] results = concat(results, axis=1) + results.columns.names = obj.columns.names if not self.as_index: results.index = ibase.default_index(len(results)) return results boxplot = boxplot_frame_groupby - - -class PanelGroupBy(NDFrameGroupBy): - - def aggregate(self, arg, *args, **kwargs): - return super(PanelGroupBy, self).aggregate(arg, *args, **kwargs) - - agg = aggregate - - def _iterate_slices(self): - if self.axis == 0: - # kludge - if self._selection is None: - slice_axis = self._selected_obj.items - else: - slice_axis = self._selection_list - slicer = lambda x: self._selected_obj[x] - else: - raise NotImplementedError("axis other than 0 is not supported") - - for val in slice_axis: - if val in self.exclusions: - continue - - yield val, slicer(val) - - def aggregate(self, arg, *args, **kwargs): - """ - Aggregate using input function or dict of {column -> function} - - Parameters - ---------- - arg : function or dict - Function to use for aggregating groups. If a function, must either - work when passed a Panel or when passed to Panel.apply. 
If - pass a dict, the keys must be DataFrame column names - - Returns - ------- - aggregated : Panel - """ - if isinstance(arg, compat.string_types): - return getattr(self, arg)(*args, **kwargs) - - return self._aggregate_generic(arg, *args, **kwargs) - - def _wrap_generic_output(self, result, obj): - if self.axis == 0: - new_axes = list(obj.axes) - new_axes[0] = self.grouper.result_index - elif self.axis == 1: - x, y, z = obj.axes - new_axes = [self.grouper.result_index, z, x] - else: - x, y, z = obj.axes - new_axes = [self.grouper.result_index, y, x] - - result = Panel._from_axes(result, new_axes) - - if self.axis == 1: - result = result.swapaxes(0, 1).swapaxes(0, 2) - elif self.axis == 2: - result = result.swapaxes(0, 2) - - return result - - def _aggregate_item_by_item(self, func, *args, **kwargs): - obj = self._obj_with_exclusions - result = {} - - if self.axis > 0: - for item in obj: - try: - itemg = DataFrameGroupBy(obj[item], - axis=self.axis - 1, - grouper=self.grouper) - result[item] = itemg.aggregate(func, *args, **kwargs) - except (ValueError, TypeError): - raise - new_axes = list(obj.axes) - new_axes[self.axis] = self.grouper.result_index - return Panel._from_axes(result, new_axes) - else: - raise ValueError("axis value must be greater than 0") - - def _wrap_aggregated_output(self, output, names=None): - raise AbstractMethodError(self) diff --git a/pandas/core/groupby/groupby.py b/pandas/core/groupby/groupby.py index 8766fdbc29755..926da40deaff2 100644 --- a/pandas/core/groupby/groupby.py +++ b/pandas/core/groupby/groupby.py @@ -18,7 +18,7 @@ class providing the base-class of operations. from pandas._libs import Timestamp, groupby as libgroupby import pandas.compat as compat -from pandas.compat import callable, range, set_function_name, zip +from pandas.compat import range, set_function_name, zip from pandas.compat.numpy import function as nv from pandas.errors import AbstractMethodError from pandas.util._decorators import Appender, Substitution, cache_readonly @@ -26,9 +26,12 @@ class providing the base-class of operations. from pandas.core.dtypes.cast import maybe_downcast_to_dtype from pandas.core.dtypes.common import ( - ensure_float, is_extension_array_dtype, is_numeric_dtype, is_scalar) + ensure_float, is_datetime64tz_dtype, is_extension_array_dtype, + is_numeric_dtype, is_scalar) from pandas.core.dtypes.missing import isna, notna +from pandas.api.types import ( + is_datetime64_dtype, is_integer_dtype, is_object_dtype) import pandas.core.algorithms as algorithms from pandas.core.base import ( DataError, GroupByError, PandasObject, SelectionMixin, SpecificationError) @@ -44,9 +47,9 @@ class providing the base-class of operations. _common_see_also = """ See Also -------- - pandas.Series.%(name)s - pandas.DataFrame.%(name)s - pandas.Panel.%(name)s + Series.%(name)s + DataFrame.%(name)s + Panel.%(name)s """ _apply_docs = dict( @@ -206,8 +209,8 @@ class providing the base-class of operations. See Also -------- -pandas.Series.pipe : Apply a function with arguments to a series. -pandas.DataFrame.pipe: Apply a function with arguments to a dataframe. +Series.pipe : Apply a function with arguments to a series. +DataFrame.pipe: Apply a function with arguments to a dataframe. apply : Apply function to each group instead of to the full %(klass)s object. 
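The See Also block above points ``pipe`` readers at the Series and DataFrame variants; a small illustrative example of piping a callable over a whole GroupBy object:

import pandas as pd

df = pd.DataFrame({'g': ['x', 'x', 'y'], 'v': [1, 2, 30]})
# The callable receives the GroupBy object itself, keeping chains flat.
print(df.groupby('g').pipe(lambda gb: gb.v.max() - gb.v.min()))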
@@ -443,12 +446,12 @@ def get_converter(s): raise ValueError(msg) converters = [get_converter(s) for s in index_sample] - names = [tuple(f(n) for f, n in zip(converters, name)) - for name in names] + names = (tuple(f(n) for f, n in zip(converters, name)) + for name in names) else: converter = get_converter(index_sample) - names = [converter(name) for name in names] + names = (converter(name) for name in names) return [self.indices.get(name, []) for name in names] @@ -625,7 +628,7 @@ def curried(x): def get_group(self, name, obj=None): """ - Constructs NDFrame from group with provided name. + Construct NDFrame from group with provided name. Parameters ---------- @@ -764,7 +767,21 @@ def _try_cast(self, result, obj, numeric_only=False): dtype = obj.dtype if not is_scalar(result): - if is_extension_array_dtype(dtype): + if is_datetime64tz_dtype(dtype): + # GH 23683 + # Prior results _may_ have been generated in UTC. + # Ensure we localize to UTC first before converting + # to the target timezone + try: + result = obj._values._from_sequence( + result, dtype='datetime64[ns, UTC]' + ) + result = result.astype(dtype) + except TypeError: + # _try_cast was called at a point where the result + # was already tz-aware + pass + elif is_extension_array_dtype(dtype): # The function can return something of any type, so check # if the type is compatible with the calling EA. try: @@ -1024,15 +1041,17 @@ def _bool_agg(self, val_test, skipna): """ def objs_to_bool(vals): - try: - vals = vals.astype(np.bool) - except ValueError: # for objects + # type: np.ndarray -> (np.ndarray, typing.Type) + if is_object_dtype(vals): vals = np.array([bool(x) for x in vals]) + else: + vals = vals.astype(np.bool) - return vals.view(np.uint8) + return vals.view(np.uint8), np.bool - def result_to_bool(result): - return result.astype(np.bool, copy=False) + def result_to_bool(result, inference): + # type: (np.ndarray, typing.Type) -> np.ndarray + return result.astype(inference, copy=False) return self._get_cythonized_result('group_any_all', self.grouper, aggregate=True, @@ -1047,7 +1066,7 @@ def result_to_bool(result): @Appender(_common_see_also) def any(self, skipna=True): """ - Returns True if any value in the group is truthful, else False. + Return True if any value in the group is truthful, else False. Parameters ---------- @@ -1060,7 +1079,7 @@ def any(self, skipna=True): @Appender(_common_see_also) def all(self, skipna=True): """ - Returns True if all values in the group are truthful, else False. + Return True if all values in the group are truthful, else False. Parameters ---------- @@ -1351,7 +1370,7 @@ def resample(self, rule, *args, **kwargs): See Also -------- - pandas.Grouper : Specify a frequency to resample with when + Grouper : Specify a frequency to resample with when grouping by a key. DatetimeIndex.resample : Frequency conversion and resampling of time series. @@ -1688,6 +1707,75 @@ def nth(self, n, dropna=None): return result + def quantile(self, q=0.5, interpolation='linear'): + """ + Return group values at the given quantile, a la numpy.percentile. + + Parameters + ---------- + q : float or array-like, default 0.5 (50% quantile) + Value(s) between 0 and 1 providing the quantile(s) to compute. + interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'} + Method to use when the desired quantile falls between two points. + + Returns + ------- + Series or DataFrame + Return type determined by caller of GroupBy object. + + See Also + -------- + Series.quantile : Similar method for Series. 
+ DataFrame.quantile : Similar method for DataFrame. + numpy.percentile : NumPy method to compute qth percentile. + + Examples + -------- + >>> df = pd.DataFrame([ + ... ['a', 1], ['a', 2], ['a', 3], + ... ['b', 1], ['b', 3], ['b', 5] + ... ], columns=['key', 'val']) + >>> df.groupby('key').quantile() + val + key + a 2.0 + b 3.0 + """ + + def pre_processor(vals): + # type: np.ndarray -> (np.ndarray, Optional[typing.Type]) + if is_object_dtype(vals): + raise TypeError("'quantile' cannot be performed against " + "'object' dtypes!") + + inference = None + if is_integer_dtype(vals): + inference = np.int64 + elif is_datetime64_dtype(vals): + inference = 'datetime64[ns]' + vals = vals.astype(np.float) + + return vals, inference + + def post_processor(vals, inference): + # type: (np.ndarray, Optional[typing.Type]) -> np.ndarray + if inference: + # Check for edge case + if not (is_integer_dtype(inference) and + interpolation in {'linear', 'midpoint'}): + vals = vals.astype(inference) + + return vals + + return self._get_cythonized_result('group_quantile', self.grouper, + aggregate=True, + needs_values=True, + needs_mask=True, + cython_dtype=np.float64, + pre_processing=pre_processor, + post_processing=post_processor, + q=q, interpolation=interpolation) + @Substitution(name='groupby') def ngroup(self, ascending=True): """ @@ -1813,7 +1901,7 @@ def cumcount(self, ascending=True): def rank(self, method='average', ascending=True, na_option='keep', pct=False, axis=0): """ - Provides the rank of values within each group. + Provide the rank of values within each group. Parameters ---------- @@ -1835,7 +1923,7 @@ def rank(self, method='average', ascending=True, na_option='keep', The axis of the object over which to compute the rank. Returns - ----- + ------- DataFrame with ranking of values within each group """ if na_option not in {'keep', 'top', 'bottom'}: @@ -1924,10 +2012,16 @@ def _get_cythonized_result(self, how, grouper, aggregate=False, Whether the result of the Cython operation is an index of values to be retrieved, instead of the actual values themselves pre_processing : function, default None - Function to be applied to `values` prior to passing to Cython - Raises if `needs_values` is False + Function to be applied to `values` prior to passing to Cython. + Function should return a tuple where the first element is the + values to be passed to Cython and the second element is an optional + type which the values should be converted to after being returned + by the Cython operation. Raises if `needs_values` is False. post_processing : function, default None - Function to be applied to result of Cython function + Function to be applied to result of Cython function. Should accept + an array of values as the first argument and type inferences as its + second argument, i.e. the signature should be + (ndarray, typing.Type). 
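The (values, inference) round-trip that ``quantile``'s pre- and post-processors implement can be sketched in plain NumPy; the names below are illustrative, not the pandas internals:

import numpy as np

def pre_process(vals):
    # Hand the kernel float64 data; remember the dtype to restore later.
    inference = vals.dtype if vals.dtype == np.int64 else None
    return vals.astype(np.float64), inference

def post_process(result, inference, interpolation='linear'):
    # Integer results are cast back only when the interpolation cannot
    # produce fractional values (anything but 'linear'/'midpoint').
    if inference is not None and interpolation not in {'linear', 'midpoint'}:
        return result.astype(inference)
    return result

vals, inference = pre_process(np.array([1, 2, 3, 4], dtype=np.int64))
median = np.array([np.percentile(vals, 50)])
print(post_process(median, inference))                         # [2.5]
print(post_process(median, inference, interpolation='lower'))  # [2]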
**kwargs : dict Extra arguments to be passed back to Cython funcs @@ -1963,10 +2057,12 @@ def _get_cythonized_result(self, how, grouper, aggregate=False, result = np.zeros(result_sz, dtype=cython_dtype) func = partial(base_func, result, labels) + inferences = None + if needs_values: vals = obj.values if pre_processing: - vals = pre_processing(vals) + vals, inferences = pre_processing(vals) func = partial(func, vals) if needs_mask: @@ -1982,7 +2078,7 @@ def _get_cythonized_result(self, how, grouper, aggregate=False, result = algorithms.take_nd(obj.values, result) if post_processing: - result = post_processing(result) + result = post_processing(result, inferences) output[name] = result @@ -2039,7 +2135,7 @@ def pct_change(self, periods=1, fill_method='pad', limit=None, freq=None, @Substitution(name='groupby', see_also=_common_see_also) def head(self, n=5): """ - Returns first n rows of each group. + Return first n rows of each group. Essentially equivalent to ``.apply(lambda x: x.head(n))``, except ignores as_index flag. @@ -2067,7 +2163,7 @@ def head(self, n=5): @Substitution(name='groupby', see_also=_common_see_also) def tail(self, n=5): """ - Returns last n rows of each group. + Return last n rows of each group. Essentially equivalent to ``.apply(lambda x: x.tail(n))``, except ignores as_index flag. diff --git a/pandas/core/groupby/grouper.py b/pandas/core/groupby/grouper.py index 260417bc0d598..d1ebb9cbe8ac4 100644 --- a/pandas/core/groupby/grouper.py +++ b/pandas/core/groupby/grouper.py @@ -8,7 +8,7 @@ import numpy as np import pandas.compat as compat -from pandas.compat import callable, zip +from pandas.compat import zip from pandas.util._decorators import cache_readonly from pandas.core.dtypes.common import ( @@ -54,14 +54,17 @@ class Grouper(object): axis : number/name of the axis, defaults to 0 sort : boolean, default to False whether to sort the resulting labels - - additional kwargs to control time-like groupers (when `freq` is passed) - - closed : closed end of interval; 'left' or 'right' - label : interval boundary to use for labeling; 'left' or 'right' + closed : {'left' or 'right'} + Closed end of interval. Only when `freq` parameter is passed. + label : {'left' or 'right'} + Interval boundary to use for labeling. + Only when `freq` parameter is passed. convention : {'start', 'end', 'e', 's'} - If grouper is PeriodIndex - base, loffset + If grouper is PeriodIndex and `freq` parameter is passed. + base : int, default 0 + Only when `freq` parameter is passed. + loffset : string, DateOffset, timedelta object + Only when `freq` parameter is passed. 
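Just before the Grouper Returns section resumes, the freq-dependent keywords documented above can be seen in use; a short illustrative grouping (not part of the patch):

import pandas as pd

df = pd.DataFrame({'ts': pd.date_range('2019-01-01', periods=6, freq='12H'),
                   'v': range(6)})
# label/closed/base are meaningful only because freq is given:
# daily bins, labelled by the right bin edge.
print(df.groupby(pd.Grouper(key='ts', freq='1D', label='right')).v.sum())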
Returns ------- @@ -195,9 +198,9 @@ def groups(self): return self.grouper.groups def __repr__(self): - attrs_list = ["{}={!r}".format(attr_name, getattr(self, attr_name)) + attrs_list = ("{}={!r}".format(attr_name, getattr(self, attr_name)) for attr_name in self._attributes - if getattr(self, attr_name) is not None] + if getattr(self, attr_name) is not None) attrs = ", ".join(attrs_list) cls_name = self.__class__.__name__ return "{}({})".format(cls_name, attrs) diff --git a/pandas/core/groupby/ops.py b/pandas/core/groupby/ops.py index 87f48d5a40554..78c9aa9187135 100644 --- a/pandas/core/groupby/ops.py +++ b/pandas/core/groupby/ops.py @@ -380,7 +380,7 @@ def get_func(fname): # otherwise find dtype-specific version, falling back to object for dt in [dtype_str, 'object']: f = getattr(libgroupby, "{fname}_{dtype_str}".format( - fname=fname, dtype_str=dtype_str), None) + fname=fname, dtype_str=dt), None) if f is not None: return f diff --git a/pandas/core/indexes/accessors.py b/pandas/core/indexes/accessors.py index c43469d3c3a81..602e11a08b4ed 100644 --- a/pandas/core/indexes/accessors.py +++ b/pandas/core/indexes/accessors.py @@ -140,7 +140,7 @@ def to_pydatetime(self): Returns ------- numpy.ndarray - object dtype array containing native Python datetime objects. + Object dtype array containing native Python datetime objects. See Also -------- @@ -208,7 +208,7 @@ def to_pytimedelta(self): Returns ------- a : numpy.ndarray - 1D array containing data with `datetime.timedelta` type. + One-dimensional array containing data with `datetime.timedelta` type. See Also -------- diff --git a/pandas/core/indexes/api.py b/pandas/core/indexes/api.py index 684a19c56c92f..6299fc482d0df 100644 --- a/pandas/core/indexes/api.py +++ b/pandas/core/indexes/api.py @@ -112,7 +112,7 @@ def _get_combined_index(indexes, intersect=False, sort=False): elif intersect: index = indexes[0] for other in indexes[1:]: - index = index.intersection(other, sort=sort) + index = index.intersection(other) else: index = _union_indexes(indexes, sort=sort) index = ensure_index(index) diff --git a/pandas/core/indexes/base.py b/pandas/core/indexes/base.py index 767da81c5c43a..dee181fc1c569 100644 --- a/pandas/core/indexes/base.py +++ b/pandas/core/indexes/base.py @@ -6,9 +6,10 @@ import numpy as np from pandas._libs import ( - Timedelta, algos as libalgos, index as libindex, join as libjoin, lib, - tslibs) + algos as libalgos, index as libindex, join as libjoin, lib) from pandas._libs.lib import is_datetime_array +from pandas._libs.tslibs import OutOfBoundsDatetime, Timedelta, Timestamp +from pandas._libs.tslibs.timezones import tz_compare import pandas.compat as compat from pandas.compat import range, set_function_name, u from pandas.compat.numpy import function as nv @@ -447,7 +448,7 @@ def __new__(cls, data=None, dtype=None, copy=False, name=None, try: return DatetimeIndex(subarr, copy=copy, name=name, **kwargs) - except tslibs.OutOfBoundsDatetime: + except OutOfBoundsDatetime: pass elif inferred.startswith('timedelta'): @@ -665,7 +666,8 @@ def __array_wrap__(self, result, context=None): """ Gets called after a ufunc. 
""" - if is_bool_dtype(result): + result = lib.item_from_zerodim(result) + if is_bool_dtype(result) or lib.is_scalar(result): return result attrs = self._get_attributes_dict() @@ -1441,7 +1443,7 @@ def sortlevel(self, level=None, ascending=True, sort_remaining=None): Returns ------- - sorted_index : Index + Index """ return self.sort_values(return_indexer=True, ascending=ascending) @@ -1459,7 +1461,7 @@ def _get_level_values(self, level): Returns ------- - values : Index + Index Calling object, as there is only one level in the Index. See Also @@ -1504,7 +1506,7 @@ def droplevel(self, level=0): Returns ------- - index : Index or MultiIndex + Index or MultiIndex """ if not isinstance(level, (tuple, list)): level = [level] @@ -1556,11 +1558,11 @@ def droplevel(self, level=0): Returns ------- grouper : Index - Index of values to group on + Index of values to group on. labels : ndarray of int or None - Array of locations in level_index + Array of locations in level_index. uniques : Index or None - Index of unique values for level + Index of unique values for level. """ @Appender(_index_shared_docs['_get_grouper_for_level']) @@ -1828,13 +1830,13 @@ def isna(self): Returns ------- numpy.ndarray - A boolean array of whether my values are NA + A boolean array of whether my values are NA. See Also -------- - pandas.Index.notna : Boolean inverse of isna. - pandas.Index.dropna : Omit entries with missing values. - pandas.isna : Top-level isna. + Index.notna : Boolean inverse of isna. + Index.dropna : Omit entries with missing values. + isna : Top-level isna. Series.isna : Detect missing values in Series object. Examples @@ -1892,7 +1894,7 @@ def notna(self): -------- Index.notnull : Alias of notna. Index.isna: Inverse of notna. - pandas.notna : Top-level notna. + notna : Top-level notna. Examples -------- @@ -2074,9 +2076,9 @@ def duplicated(self, keep='first'): See Also -------- - pandas.Series.duplicated : Equivalent method on pandas.Series. - pandas.DataFrame.duplicated : Equivalent method on pandas.DataFrame. - pandas.Index.drop_duplicates : Remove duplicate values from Index. + Series.duplicated : Equivalent method on pandas.Series. + DataFrame.duplicated : Equivalent method on pandas.DataFrame. + Index.drop_duplicates : Remove duplicate values from Index. Examples -------- @@ -2245,18 +2247,37 @@ def _get_reconciled_name_object(self, other): return self._shallow_copy(name=name) return self - def union(self, other, sort=True): + def _validate_sort_keyword(self, sort): + if sort not in [None, False]: + raise ValueError("The 'sort' keyword only takes the values of " + "None or False; {0} was passed.".format(sort)) + + def union(self, other, sort=None): """ Form the union of two Index objects. Parameters ---------- other : Index or array-like - sort : bool, default True - Sort the resulting index if possible + sort : bool or None, default None + Whether to sort the resulting Index. + + * None : Sort the result, except when + + 1. `self` and `other` are equal. + 2. `self` or `other` has length 0. + 3. Some values in `self` or `other` cannot be compared. + A RuntimeWarning is issued in this case. + + * False : do not sort the result. .. versionadded:: 0.24.0 + .. versionchanged:: 0.24.1 + + Changed the default value from ``True`` to ``None`` + (without change in behaviour). 
+ Returns ------- union : Index @@ -2269,6 +2290,7 @@ def union(self, other, sort=True): >>> idx1.union(idx2) Int64Index([1, 2, 3, 4, 5, 6], dtype='int64') """ + self._validate_sort_keyword(sort) self._assert_can_do_setop(other) other = ensure_index(other) @@ -2319,7 +2341,7 @@ def union(self, other, sort=True): else: result = lvals - if sort: + if sort is None: try: result = sorting.safe_sort(result) except TypeError as e: @@ -2333,7 +2355,7 @@ def union(self, other, sort=True): def _wrap_setop_result(self, other, result): return self._constructor(result, name=get_op_result_name(self, other)) - def intersection(self, other, sort=True): + def intersection(self, other, sort=False): """ Form the intersection of two Index objects. @@ -2342,11 +2364,20 @@ def intersection(self, other, sort=True): Parameters ---------- other : Index or array-like - sort : bool, default True - Sort the resulting index if possible + sort : False or None, default False + Whether to sort the resulting index. + + * False : do not sort the result. + * None : sort the result, except when `self` and `other` are equal + or when the values cannot be compared. .. versionadded:: 0.24.0 + .. versionchanged:: 0.24.1 + + Changed the default from ``True`` to ``False``, to match + the behaviour of 0.23.4 and earlier. + Returns ------- intersection : Index @@ -2359,6 +2390,7 @@ def intersection(self, other, sort=True): >>> idx1.intersection(idx2) Int64Index([3, 4], dtype='int64') """ + self._validate_sort_keyword(sort) self._assert_can_do_setop(other) other = ensure_index(other) @@ -2398,7 +2430,7 @@ def intersection(self, other, sort=True): taken = other.take(indexer) - if sort: + if sort is None: taken = sorting.safe_sort(taken.values) if self.name != other.name: name = None @@ -2411,7 +2443,7 @@ def intersection(self, other, sort=True): return taken - def difference(self, other, sort=True): + def difference(self, other, sort=None): """ Return a new Index with elements from the index that are not in `other`. @@ -2421,11 +2453,22 @@ def difference(self, other, sort=True): Parameters ---------- other : Index or array-like - sort : bool, default True - Sort the resulting index if possible + sort : False or None, default None + Whether to sort the resulting index. By default, the + values are attempted to be sorted, but any TypeError from + incomparable elements is caught by pandas. + + * None : Attempt to sort the result, but catch any TypeErrors + from comparing incomparable elements. + * False : Do not sort the result. .. versionadded:: 0.24.0 + .. versionchanged:: 0.24.1 + + Changed the default value from ``True`` to ``None`` + (without change in behaviour). + Returns ------- difference : Index @@ -2440,6 +2483,7 @@ def difference(self, other, sort=True): >>> idx1.difference(idx2, sort=False) Int64Index([2, 1], dtype='int64') """ + self._validate_sort_keyword(sort) self._assert_can_do_setop(other) if self.equals(other): @@ -2456,7 +2500,7 @@ def difference(self, other, sort=True): label_diff = np.setdiff1d(np.arange(this.size), indexer, assume_unique=True) the_diff = this.values.take(label_diff) - if sort: + if sort is None: try: the_diff = sorting.safe_sort(the_diff) except TypeError: @@ -2464,7 +2508,7 @@ def difference(self, other, sort=True): return this._shallow_copy(the_diff, name=result_name, freq=None) - def symmetric_difference(self, other, result_name=None, sort=True): + def symmetric_difference(self, other, result_name=None, sort=None): """ Compute the symmetric difference of two Index objects. 
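The `sort=None` / `sort=False` contract spelled out in the union, intersection and difference hunks above, exercised on a small example (0.24.1-era behaviour):

import pandas as pd

left = pd.Index([5, 3, 1])
right = pd.Index([4, 3, 2])
print(left.union(right))              # sort=None default: sorted result
print(left.union(right, sort=False))  # order of appearance is kept
print(left.intersection(right))       # sort=False is the new default here
try:
    left.union(right, sort=True)      # only None and False are accepted now
except ValueError as exc:
    print(exc)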
@@ -2472,11 +2516,22 @@ def symmetric_difference(self, other, result_name=None, sort=True): ---------- other : Index or array-like result_name : str - sort : bool, default True - Sort the resulting index if possible + sort : False or None, default None + Whether to sort the resulting index. By default, the + values are attempted to be sorted, but any TypeError from + incomparable elements is caught by pandas. + + * None : Attempt to sort the result, but catch any TypeErrors + from comparing incomparable elements. + * False : Do not sort the result. .. versionadded:: 0.24.0 + .. versionchanged:: 0.24.1 + + Changed the default value from ``True`` to ``None`` + (without change in behaviour). + Returns ------- symmetric_difference : Index @@ -2500,6 +2555,7 @@ def symmetric_difference(self, other, result_name=None, sort=True): >>> idx1 ^ idx2 Int64Index([1, 5], dtype='int64') """ + self._validate_sort_keyword(sort) self._assert_can_do_setop(other) other, result_name_update = self._convert_can_do_setop(other) if result_name is None: @@ -2520,7 +2576,7 @@ def symmetric_difference(self, other, result_name=None, sort=True): right_diff = other.values.take(right_indexer) the_diff = _concat._concat_compat([left_diff, right_diff]) - if sort: + if sort is None: try: the_diff = sorting.safe_sort(the_diff) except TypeError: @@ -2916,9 +2972,10 @@ def _convert_listlike_indexer(self, keyarr, kind=None): Returns ------- - tuple (indexer, keyarr) - indexer is an ndarray or None if cannot convert - keyarr are tuple-safe keys + indexer : numpy.ndarray or None + Return an ndarray or None if cannot convert. + keyarr : numpy.ndarray + Return tuple-safe keys. """ if isinstance(keyarr, Index): keyarr = self._convert_index_indexer(keyarr) @@ -3044,9 +3101,9 @@ def reindex(self, target, method=None, level=None, limit=None, Returns ------- new_index : pd.Index - Resulting index + Resulting index. indexer : np.ndarray or None - Indices of output values in original index + Indices of output values in original index. """ # GH6552: preserve names when reindexing to non-named target @@ -3102,9 +3159,9 @@ def _reindex_non_unique(self, target): Returns ------- new_index : pd.Index - Resulting index + Resulting index. indexer : np.ndarray or None - Indices of output values in original index + Indices of output values in original index. """ @@ -3995,7 +4052,7 @@ def putmask(self, mask, value): def equals(self, other): """ - Determines if two Index objects contain the same elements. + Determine if two Index objects contain the same elements. """ if self.is_(other): return True @@ -4090,7 +4147,7 @@ def asof(self, label): def asof_locs(self, where, mask): """ - Finds the locations (indices) of the labels from the index for + Find the locations (indices) of the labels from the index for every entry in the `where` argument. As in the `asof` function, if the label (a particular entry in @@ -4150,8 +4207,8 @@ def sort_values(self, return_indexer=False, ascending=True): See Also -------- - pandas.Series.sort_values : Sort values of a Series. - pandas.DataFrame.sort_values : Sort values in a DataFrame. + Series.sort_values : Sort values of a Series. + DataFrame.sort_values : Sort values in a DataFrame. Examples -------- @@ -4205,7 +4262,7 @@ def shift(self, periods=1, freq=None): Returns ------- pandas.Index - shifted index + Shifted index. See Also -------- @@ -4368,7 +4425,7 @@ def set_value(self, arr, key, value): in the target are marked by -1. missing : ndarray of int An indexer into the target of the values not found. 
- These correspond to the -1 in the indexer array + These correspond to the -1 in the indexer array. """ @Appender(_index_shared_docs['get_indexer_non_unique'] % _index_doc_kwargs) @@ -4812,6 +4869,20 @@ def slice_locs(self, start=None, end=None, step=None, kind=None): # If it's a reverse slice, temporarily swap bounds. start, end = end, start + # GH 16785: If start and end happen to be date strings with UTC offsets + # attempt to parse and check that the offsets are the same + if (isinstance(start, (compat.string_types, datetime)) + and isinstance(end, (compat.string_types, datetime))): + try: + ts_start = Timestamp(start) + ts_end = Timestamp(end) + except (ValueError, TypeError): + pass + else: + if not tz_compare(ts_start.tzinfo, ts_end.tzinfo): + raise ValueError("Both dates must have the " + "same UTC offset") + start_slice = None if start is not None: start_slice = self.get_slice_bound(start, 'left', kind) @@ -5123,9 +5194,9 @@ def _add_logical_methods(cls): See Also -------- - pandas.Index.any : Return whether any element in an Index is True. - pandas.Series.any : Return whether any element in a Series is True. - pandas.Series.all : Return whether all elements in a Series are True. + Index.any : Return whether any element in an Index is True. + Series.any : Return whether any element in a Series is True. + Series.all : Return whether all elements in a Series are True. Notes ----- @@ -5163,8 +5234,8 @@ def _add_logical_methods(cls): See Also -------- - pandas.Index.all : Return whether all elements are True. - pandas.Series.all : Return whether all elements are True. + Index.all : Return whether all elements are True. + Series.all : Return whether all elements are True. Notes ----- diff --git a/pandas/core/indexes/category.py b/pandas/core/indexes/category.py index e43b64827d02a..b494c41c3b58c 100644 --- a/pandas/core/indexes/category.py +++ b/pandas/core/indexes/category.py @@ -42,20 +42,35 @@ typ='method', overwrite=True) class CategoricalIndex(Index, accessor.PandasDelegate): """ - Immutable Index implementing an ordered, sliceable set. CategoricalIndex - represents a sparsely populated Index with an underlying Categorical. + Index based on an underlying :class:`Categorical`. + + CategoricalIndex, like Categorical, can only take on a limited, + and usually fixed, number of possible values (`categories`). Also, + like Categorical, it might have an order, but numerical operations + (additions, divisions, ...) are not possible. Parameters ---------- - data : array-like or Categorical, (1-dimensional) - categories : optional, array-like - categories for the CategoricalIndex - ordered : boolean, - designating if the categories are ordered - copy : bool - Make a copy of input ndarray - name : object - Name to be stored in the index + data : array-like (1-dimensional) + The values of the categorical. If `categories` are given, values not in + `categories` will be replaced with NaN. + categories : index-like, optional + The categories for the categorical. Items need to be unique. + If the categories are not given here (and also not in `dtype`), they + will be inferred from the `data`. + ordered : bool, optional + Whether or not this categorical is treated as an ordered + categorical. If not given here or in `dtype`, the resulting + categorical will be unordered. + dtype : CategoricalDtype or the string "category", optional + If :class:`CategoricalDtype`, cannot be used together with + `categories` or `ordered`. + + .. 
versionadded:: 0.21.0 + copy : bool, default False + Make a copy of input ndarray. + name : object, optional + Name to be stored in the index. Attributes ---------- @@ -75,9 +90,45 @@ class CategoricalIndex(Index, accessor.PandasDelegate): as_unordered map + Raises + ------ + ValueError + If the categories do not validate. + TypeError + If an explicit ``ordered=True`` is given but no `categories` and the + `values` are not sortable. + See Also -------- - Categorical, Index + Index : The base pandas Index type. + Categorical : A categorical array. + CategoricalDtype : Type for categorical data. + + Notes + ----- + See the `user guide + `_ + for more. + + Examples + -------- + >>> pd.CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c']) + CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'], categories=['a', 'b', 'c'], ordered=False, dtype='category') # noqa + + ``CategoricalIndex`` can also be instantiated from a ``Categorical``: + + >>> c = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c']) + >>> pd.CategoricalIndex(c) + CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'], categories=['a', 'b', 'c'], ordered=False, dtype='category') # noqa + + Ordered ``CategoricalIndex`` can have a min and max value. + + >>> ci = pd.CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'], ordered=True, + ... categories=['c', 'b', 'a']) + >>> ci + CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'], categories=['c', 'b', 'a'], ordered=True, dtype='category') # noqa + >>> ci.min() + 'c' """ _typ = 'categoricalindex' @@ -232,7 +283,7 @@ def _is_dtype_compat(self, other): def equals(self, other): """ - Determines if two CategorialIndex objects contain the same elements. + Determine if two CategoricalIndex objects contain the same elements. """ if self.is_(other): return True @@ -780,8 +831,8 @@ def _concat_same_dtype(self, to_concat, name): Concatenate to_concat which has the same class ValueError if other is not in the categories """ - to_concat = [self._is_dtype_compat(c) for c in to_concat] - codes = np.concatenate([c.codes for c in to_concat]) + codes = np.concatenate([self._is_dtype_compat(c).codes + for c in to_concat]) result = self._create_from_codes(codes, name=name) # if name is None, _create_from_codes sets self.name result.name = name diff --git a/pandas/core/indexes/datetimes.py b/pandas/core/indexes/datetimes.py index cc373c06efcc9..b8d052ce7be04 100644 --- a/pandas/core/indexes/datetimes.py +++ b/pandas/core/indexes/datetimes.py @@ -32,9 +32,8 @@ from pandas.core.ops import get_op_result_name import pandas.core.tools.datetimes as tools -from pandas.tseries import offsets from pandas.tseries.frequencies import Resolution, to_offset -from pandas.tseries.offsets import CDay, prefix_mapping +from pandas.tseries.offsets import CDay, Nano, prefix_mapping def _new_DatetimeIndex(cls, d): @@ -460,7 +459,7 @@ def _formatter_func(self): # -------------------------------------------------------------------- # Set Operation Methods - def union(self, other): + def union(self, other, sort=None): """ Specialized union for DatetimeIndex objects. If combine overlapping ranges with the same DateOffset, will be much @@ -469,15 +468,29 @@ Parameters ---------- other : DatetimeIndex or array-like + sort : bool or None, default None + Whether to sort the resulting Index. + + * None : Sort the result, except when + + 1. `self` and `other` are equal. + 2. `self` or `other` has length 0. + 3. Some values in `self` or `other` cannot be compared. + A RuntimeWarning is issued in this case. + + * False : do not sort the result + + .. 
versionadded:: 0.25.0 Returns ------- y : Index or DatetimeIndex """ + self._validate_sort_keyword(sort) self._assert_can_do_setop(other) if len(other) == 0 or self.equals(other) or len(self) == 0: - return super(DatetimeIndex, self).union(other) + return super(DatetimeIndex, self).union(other, sort=sort) if not isinstance(other, DatetimeIndex): try: @@ -488,9 +501,9 @@ def union(self, other): this, other = self._maybe_utc_convert(other) if this._can_fast_union(other): - return this._fast_union(other) + return this._fast_union(other, sort=sort) else: - result = Index.union(this, other) + result = Index.union(this, other, sort=sort) if isinstance(result, DatetimeIndex): # TODO: we shouldn't be setting attributes like this; # in all the tests this equality already holds @@ -563,16 +576,28 @@ def _can_fast_union(self, other): # this will raise return False - def _fast_union(self, other): + def _fast_union(self, other, sort=None): if len(other) == 0: return self.view(type(self)) if len(self) == 0: return other.view(type(self)) - # to make our life easier, "sort" the two ranges + # Both DTIs are monotonic. Check if they are already + # in the "correct" order if self[0] <= other[0]: left, right = self, other + # DTIs are not in the "correct" order and we don't want + # to sort but want to remove overlaps + elif sort is False: + left, right = self, other + left_start = left[0] + loc = right.searchsorted(left_start, side='left') + right_chunk = right.values[:loc] + dates = _concat._concat_compat((left.values, right_chunk)) + return self._shallow_copy(dates) + # DTIs are not in the "correct" order and we want + # to sort else: left, right = other, self @@ -594,7 +619,7 @@ def _wrap_setop_result(self, other, result): name = get_op_result_name(self, other) return self._shallow_copy(result, name=name, freq=None, tz=self.tz) - def intersection(self, other, sort=True): + def intersection(self, other, sort=False): """ Specialized intersection for DatetimeIndex objects. May be much faster than Index.intersection @@ -602,11 +627,21 @@ def intersection(self, other, sort=True): Parameters ---------- other : DatetimeIndex or array-like + sort : False or None, default False + Sort the resulting index if possible. + + .. versionadded:: 0.24.0 + + .. versionchanged:: 0.24.1 + + Changed the default to ``False`` to match the behaviour + from before 0.24.0. 
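A hedged sketch, in plain NumPy rather than the pandas code, of the `searchsorted` trick the `_fast_union` hunk above uses when `sort=False` and the two monotonic ranges overlap but start out of order:

import numpy as np

left = np.array(['2019-01-03', '2019-01-04'], dtype='datetime64[D]')
right = np.array(['2019-01-01', '2019-01-02', '2019-01-03'],
                 dtype='datetime64[D]')
# Keep `left` first, then append only the part of `right` that precedes
# left's start: the overlap is trimmed instead of sorted away.
loc = right.searchsorted(left[0], side='left')
print(np.concatenate([left, right[:loc]]))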
Returns ------- y : Index or DatetimeIndex """ + self._validate_sort_keyword(sort) self._assert_can_do_setop(other) if self.equals(other): @@ -816,54 +851,57 @@ def _parsed_string_to_bounds(self, reso, parsed): lower, upper: pd.Timestamp """ + valid_resos = {'year', 'month', 'quarter', 'day', 'hour', 'minute', + 'second', 'microsecond'} + if reso not in valid_resos: + raise KeyError if reso == 'year': - return (Timestamp(datetime(parsed.year, 1, 1), tz=self.tz), - Timestamp(datetime(parsed.year, 12, 31, 23, - 59, 59, 999999), tz=self.tz)) + start = Timestamp(parsed.year, 1, 1) + end = Timestamp(parsed.year, 12, 31, 23, 59, 59, 999999) elif reso == 'month': d = ccalendar.get_days_in_month(parsed.year, parsed.month) - return (Timestamp(datetime(parsed.year, parsed.month, 1), - tz=self.tz), - Timestamp(datetime(parsed.year, parsed.month, d, 23, - 59, 59, 999999), tz=self.tz)) + start = Timestamp(parsed.year, parsed.month, 1) + end = Timestamp(parsed.year, parsed.month, d, 23, 59, 59, 999999) elif reso == 'quarter': qe = (((parsed.month - 1) + 2) % 12) + 1 # two months ahead d = ccalendar.get_days_in_month(parsed.year, qe) # at end of month - return (Timestamp(datetime(parsed.year, parsed.month, 1), - tz=self.tz), - Timestamp(datetime(parsed.year, qe, d, 23, 59, - 59, 999999), tz=self.tz)) + start = Timestamp(parsed.year, parsed.month, 1) + end = Timestamp(parsed.year, qe, d, 23, 59, 59, 999999) elif reso == 'day': - st = datetime(parsed.year, parsed.month, parsed.day) - return (Timestamp(st, tz=self.tz), - Timestamp(Timestamp(st + offsets.Day(), - tz=self.tz).value - 1)) + start = Timestamp(parsed.year, parsed.month, parsed.day) + end = start + timedelta(days=1) - Nano(1) elif reso == 'hour': - st = datetime(parsed.year, parsed.month, parsed.day, - hour=parsed.hour) - return (Timestamp(st, tz=self.tz), - Timestamp(Timestamp(st + offsets.Hour(), - tz=self.tz).value - 1)) + start = Timestamp(parsed.year, parsed.month, parsed.day, + parsed.hour) + end = start + timedelta(hours=1) - Nano(1) elif reso == 'minute': - st = datetime(parsed.year, parsed.month, parsed.day, - hour=parsed.hour, minute=parsed.minute) - return (Timestamp(st, tz=self.tz), - Timestamp(Timestamp(st + offsets.Minute(), - tz=self.tz).value - 1)) + start = Timestamp(parsed.year, parsed.month, parsed.day, + parsed.hour, parsed.minute) + end = start + timedelta(minutes=1) - Nano(1) elif reso == 'second': - st = datetime(parsed.year, parsed.month, parsed.day, - hour=parsed.hour, minute=parsed.minute, - second=parsed.second) - return (Timestamp(st, tz=self.tz), - Timestamp(Timestamp(st + offsets.Second(), - tz=self.tz).value - 1)) + start = Timestamp(parsed.year, parsed.month, parsed.day, + parsed.hour, parsed.minute, parsed.second) + end = start + timedelta(seconds=1) - Nano(1) elif reso == 'microsecond': - st = datetime(parsed.year, parsed.month, parsed.day, - parsed.hour, parsed.minute, parsed.second, - parsed.microsecond) - return (Timestamp(st, tz=self.tz), Timestamp(st, tz=self.tz)) - else: - raise KeyError + start = Timestamp(parsed.year, parsed.month, parsed.day, + parsed.hour, parsed.minute, parsed.second, + parsed.microsecond) + end = start + timedelta(microseconds=1) - Nano(1) + # GH 24076 + # If an incoming date string contained a UTC offset, need to localize + # the parsed date to this offset first before aligning with the index's + # timezone + if parsed.tzinfo is not None: + if self.tz is None: + raise ValueError("The index must be timezone aware " + "when indexing with a date string with a " 
"UTC offset") + start = start.tz_localize(parsed.tzinfo).tz_convert(self.tz) + end = end.tz_localize(parsed.tzinfo).tz_convert(self.tz) + elif self.tz is not None: + start = start.tz_localize(self.tz) + end = end.tz_localize(self.tz) + return start, end def _partial_date_slice(self, reso, parsed, use_lhs=True, use_rhs=True): is_monotonic = self.is_monotonic @@ -1000,7 +1038,7 @@ def get_loc(self, key, method=None, tolerance=None): except (KeyError, ValueError, TypeError): try: return self._get_string_slice(key) - except (TypeError, KeyError, ValueError): + except (TypeError, KeyError, ValueError, OverflowError): pass try: @@ -1274,7 +1312,7 @@ def delete(self, loc): def indexer_at_time(self, time, asof=False): """ - Returns index locations of index values at particular time of day + Return index locations of index values at particular time of day (e.g. 9:30AM). Parameters @@ -1292,20 +1330,19 @@ def indexer_at_time(self, time, asof=False): -------- indexer_between_time, DataFrame.at_time """ - from dateutil.parser import parse - if asof: raise NotImplementedError("'asof' argument is not supported") if isinstance(time, compat.string_types): + from dateutil.parser import parse time = parse(time).time() if time.tzinfo: - # TODO - raise NotImplementedError("argument 'time' with timezone info is " - "not supported") - - time_micros = self._get_time_micros() + if self.tz is None: + raise ValueError("Index must be timezone aware.") + time_micros = self.tz_convert(time.tzinfo)._get_time_micros() + else: + time_micros = self._get_time_micros() micros = _time_to_micros(time) return (micros == time_micros).nonzero()[0] @@ -1403,10 +1440,10 @@ def date_range(start=None, end=None, periods=None, freq=None, tz=None, See Also -------- - pandas.DatetimeIndex : An immutable container for datetimes. - pandas.timedelta_range : Return a fixed frequency TimedeltaIndex. - pandas.period_range : Return a fixed frequency PeriodIndex. - pandas.interval_range : Return a fixed frequency IntervalIndex. + DatetimeIndex : An immutable container for datetimes. + timedelta_range : Return a fixed frequency TimedeltaIndex. + period_range : Return a fixed frequency PeriodIndex. + interval_range : Return a fixed frequency IntervalIndex. Notes ----- diff --git a/pandas/core/indexes/interval.py b/pandas/core/indexes/interval.py index 0210560aaa21f..2c63fe33c57fe 100644 --- a/pandas/core/indexes/interval.py +++ b/pandas/core/indexes/interval.py @@ -1093,8 +1093,8 @@ def equals(self, other): def overlaps(self, other): return self._data.overlaps(other) - def _setop(op_name): - def func(self, other, sort=True): + def _setop(op_name, sort=None): + def func(self, other, sort=sort): other = self._as_like_interval_index(other) # GH 19016: ensure set op will not return a prohibited dtype @@ -1128,7 +1128,7 @@ def is_all_dates(self): return False union = _setop('union') - intersection = _setop('intersection') + intersection = _setop('intersection', sort=False) difference = _setop('difference') symmetric_difference = _setop('symmetric_difference') diff --git a/pandas/core/indexes/multi.py b/pandas/core/indexes/multi.py index e4d01a40bd181..616c17cd16f9a 100644 --- a/pandas/core/indexes/multi.py +++ b/pandas/core/indexes/multi.py @@ -61,7 +61,7 @@ def _codes_to_ints(self, codes): Returns ------ int_keys : scalar or 1-dimensional array, of dtype uint64 - Integer(s) representing one combination (each) + Integer(s) representing one combination (each). 
""" # Shift the representation of each level by the pre-calculated number # of bits: @@ -101,7 +101,7 @@ def _codes_to_ints(self, codes): Returns ------ int_keys : int, or 1-dimensional array of dtype object - Integer(s) representing one combination (each) + Integer(s) representing one combination (each). """ # Shift the representation of each level by the pre-calculated number @@ -324,11 +324,17 @@ def from_arrays(cls, arrays, sortorder=None, names=None): codes=[[0, 0, 1, 1], [1, 0, 1, 0]], names=['number', 'color']) """ + error_msg = "Input must be a list / sequence of array-likes." if not is_list_like(arrays): - raise TypeError("Input must be a list / sequence of array-likes.") + raise TypeError(error_msg) elif is_iterator(arrays): arrays = list(arrays) + # Check if elements of array are list-like + for array in arrays: + if not is_list_like(array): + raise TypeError(error_msg) + # Check if lengths of all arrays are equal or not, # raise ValueError, if not for i in range(1, len(arrays)): @@ -840,7 +846,7 @@ def __contains__(self, key): try: self.get_loc(key) return True - except (LookupError, TypeError): + except (LookupError, TypeError, ValueError): return False contains = __contains__ @@ -1391,7 +1397,7 @@ def get_level_values(self, level): Returns ------- values : Index - ``values`` is a level of this MultiIndex converted to + Values is a level of this MultiIndex converted to a single :class:`Index` (or subclass thereof). Examples @@ -1956,7 +1962,7 @@ def swaplevel(self, i=-2, j=-1): Returns ------- MultiIndex - A new MultiIndex + A new MultiIndex. .. versionchanged:: 0.18.1 @@ -2053,9 +2059,9 @@ def sortlevel(self, level=0, ascending=True, sort_remaining=True): Returns ------- sorted_index : pd.MultiIndex - Resulting index + Resulting index. indexer : np.ndarray - Indices of output values in original index + Indices of output values in original index. """ from pandas.core.sorting import indexer_from_factorized @@ -2189,7 +2195,7 @@ def reindex(self, target, method=None, level=None, limit=None, new_index : pd.MultiIndex Resulting index indexer : np.ndarray or None - Indices of output values in original index + Indices of output values in original index. """ # GH6552: preserve names when reindexing to non-named target @@ -2879,30 +2885,47 @@ def equal_levels(self, other): return False return True - def union(self, other, sort=True): + def union(self, other, sort=None): """ Form the union of two MultiIndex objects Parameters ---------- other : MultiIndex or array / Index of tuples - sort : bool, default True - Sort the resulting MultiIndex if possible + sort : False or None, default None + Whether to sort the resulting Index. + + * None : Sort the result, except when + + 1. `self` and `other` are equal. + 2. `self` has length 0. + 3. Some values in `self` or `other` cannot be compared. + A RuntimeWarning is issued in this case. + + * False : do not sort the result. .. versionadded:: 0.24.0 + .. versionchanged:: 0.24.1 + + Changed the default value from ``True`` to ``None`` + (without change in behaviour). + Returns ------- Index >>> index.union(index2) """ + self._validate_sort_keyword(sort) self._assert_can_do_setop(other) other, result_names = self._convert_can_do_setop(other) if len(other) == 0 or self.equals(other): return self + # TODO: Index.union returns other when `len(self)` is 0. 
+ uniq_tuples = lib.fast_unique_multiple([self._ndarray_values, other._ndarray_values], sort=sort) @@ -2910,22 +2933,28 @@ def union(self, other, sort=True): return MultiIndex.from_arrays(lzip(*uniq_tuples), sortorder=0, names=result_names) - def intersection(self, other, sort=True): + def intersection(self, other, sort=False): """ Form the intersection of two MultiIndex objects. Parameters ---------- other : MultiIndex or array / Index of tuples - sort : bool, default True + sort : False or None, default False Sort the resulting MultiIndex if possible .. versionadded:: 0.24.0 + .. versionchanged:: 0.24.1 + + Changed the default from ``True`` to ``False``, to match + behaviour from before 0.24.0 + Returns ------- Index """ + self._validate_sort_keyword(sort) self._assert_can_do_setop(other) other, result_names = self._convert_can_do_setop(other) @@ -2936,7 +2965,7 @@ def intersection(self, other, sort=True): other_tuples = other._ndarray_values uniq_tuples = set(self_tuples) & set(other_tuples) - if sort: + if sort is None: uniq_tuples = sorted(uniq_tuples) if len(uniq_tuples) == 0: @@ -2947,22 +2976,28 @@ def intersection(self, other, sort=True): return MultiIndex.from_arrays(lzip(*uniq_tuples), sortorder=0, names=result_names) - def difference(self, other, sort=True): + def difference(self, other, sort=None): """ Compute set difference of two MultiIndex objects Parameters ---------- other : MultiIndex - sort : bool, default True + sort : False or None, default None Sort the resulting MultiIndex if possible .. versionadded:: 0.24.0 + .. versionchanged:: 0.24.1 + + Changed the default value from ``True`` to ``None`` + (without change in behaviour). + Returns ------- diff : MultiIndex """ + self._validate_sort_keyword(sort) self._assert_can_do_setop(other) other, result_names = self._convert_can_do_setop(other) @@ -2982,7 +3017,7 @@ def difference(self, other, sort=True): label_diff = np.setdiff1d(np.arange(this.size), indexer, assume_unique=True) difference = this.values.take(label_diff) - if sort: + if sort is None: difference = sorted(difference) if len(difference) == 0: diff --git a/pandas/core/indexes/range.py b/pandas/core/indexes/range.py index ebf5b279563cf..5aafe9734b6a0 100644 --- a/pandas/core/indexes/range.py +++ b/pandas/core/indexes/range.py @@ -343,22 +343,28 @@ def equals(self, other): return super(RangeIndex, self).equals(other) - def intersection(self, other, sort=True): + def intersection(self, other, sort=False): """ Form the intersection of two Index objects. Parameters ---------- other : Index or array-like - sort : bool, default True + sort : False or None, default False Sort the resulting index if possible .. versionadded:: 0.24.0 + .. versionchanged:: 0.24.1 + + Changed the default to ``False`` to match the behaviour + from before 0.24.0. 
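The same `sort` convention is applied to the MultiIndex and RangeIndex set operations in the hunks above; a brief sketch with tuples of our own choosing:

```python
import pandas as pd

mi1 = pd.MultiIndex.from_tuples([('b', 2), ('a', 1), ('c', 3)])
mi2 = pd.MultiIndex.from_tuples([('c', 3), ('b', 2)])

# sort=False (the new default): the common tuples are not explicitly sorted.
print(mi1.intersection(mi2))

# sort=None: the result is sorted, mirroring the `if sort is None` branch above.
print(mi1.intersection(mi2, sort=None))
```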
+ Returns ------- intersection : Index """ + self._validate_sort_keyword(sort) if self.equals(other): return self._get_reconciled_name_object(other) @@ -401,7 +407,7 @@ def intersection(self, other, sort=True): if (self._step < 0 and other._step < 0) is not (new_index._step < 0): new_index = new_index[::-1] - if sort: + if sort is None: new_index = new_index.sort_values() return new_index diff --git a/pandas/core/indexes/timedeltas.py b/pandas/core/indexes/timedeltas.py index cbe5ae198838f..830925535dab1 100644 --- a/pandas/core/indexes/timedeltas.py +++ b/pandas/core/indexes/timedeltas.py @@ -207,6 +207,11 @@ def __new__(cls, data=None, unit=None, freq=None, start=None, end=None, 'collection of some kind, {data} was passed' .format(cls=cls.__name__, data=repr(data))) + if unit in {'Y', 'y', 'M'}: + warnings.warn("M and Y units are deprecated and " + "will be removed in a future version.", + FutureWarning, stacklevel=2) + if isinstance(data, TimedeltaArray): if copy: data = data.copy() diff --git a/pandas/core/indexing.py b/pandas/core/indexing.py index bbcde8f3b3305..623a48acdd48b 100755 --- a/pandas/core/indexing.py +++ b/pandas/core/indexing.py @@ -5,6 +5,7 @@ import numpy as np from pandas._libs.indexing import _NDFrameIndexerBase +from pandas._libs.lib import item_from_zerodim import pandas.compat as compat from pandas.compat import range, zip from pandas.errors import AbstractMethodError @@ -347,10 +348,10 @@ def _setitem_with_indexer(self, indexer, value): # must have all defined axes if we have a scalar # or a list-like on the non-info axes if we have a # list-like - len_non_info_axes = [ + len_non_info_axes = ( len(_ax) for _i, _ax in enumerate(self.obj.axes) if _i != i - ] + ) if any(not l for l in len_non_info_axes): if not is_list_like_indexer(value): raise ValueError("cannot set a frame with no " @@ -1856,6 +1857,7 @@ def _getitem_axis(self, key, axis=None): if axis is None: axis = self.axis or 0 + key = item_from_zerodim(key) if is_iterator(key): key = list(key) @@ -2222,6 +2224,7 @@ def _getitem_axis(self, key, axis=None): # a single integer else: + key = item_from_zerodim(key) if not is_integer(key): raise TypeError("Cannot index by location index with a " "non-integer key") diff --git a/pandas/core/internals/__init__.py b/pandas/core/internals/__init__.py index 7878613a8b1b1..a662e1d3ae197 100644 --- a/pandas/core/internals/__init__.py +++ b/pandas/core/internals/__init__.py @@ -1,6 +1,6 @@ # -*- coding: utf-8 -*- from .blocks import ( # noqa:F401 - _block2d_to_blocknd, _factor_indexer, _block_shape, # io.pytables + _block_shape, # io.pytables _safe_reshape, # io.packers make_block, # io.pytables, io.packers FloatBlock, IntBlock, ComplexBlock, BoolBlock, ObjectBlock, diff --git a/pandas/core/internals/blocks.py b/pandas/core/internals/blocks.py index df764aa4ba666..4e2c04dba8b04 100644 --- a/pandas/core/internals/blocks.py +++ b/pandas/core/internals/blocks.py @@ -87,7 +87,8 @@ def __init__(self, values, placement, ndim=None): '{mgr}'.format(val=len(self.values), mgr=len(self.mgr_locs))) def _check_ndim(self, values, ndim): - """ndim inference and validation. + """ + ndim inference and validation. Infers ndim from 'values' if not provided to __init__. 
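The TimedeltaIndex hunk above deprecates the month- and year-based units; a quick check of the warning, assuming pandas at this patch level:

```python
import warnings

import pandas as pd

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    pd.TimedeltaIndex([1, 2], unit='M')  # deprecated month-based unit

# The construction itself still succeeds, but a FutureWarning is emitted.
assert any(issubclass(w.category, FutureWarning) for w in caught)
```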
Validates that values.ndim and ndim are consistent if and only if @@ -267,20 +268,6 @@ def _slice(self, slicer): """ return a slice of my values """ return self.values[slicer] - def reshape_nd(self, labels, shape, ref_items): - """ - Parameters - ---------- - labels : list of new axis labels - shape : new shape - ref_items : new ref_items - - return a new block that is transformed to a nd block - """ - return _block2d_to_blocknd(values=self.get_values().T, - placement=self.mgr_locs, shape=shape, - labels=labels, ref_items=ref_items) - def getitem_block(self, slicer, new_mgr_locs=None): """ Perform __getitem__-like, return result as block. @@ -1128,24 +1115,18 @@ def check_int_bool(self, inplace): fill_value=fill_value, coerce=coerce, downcast=downcast) - # try an interp method - try: - m = missing.clean_interp_method(method, **kwargs) - except ValueError: - m = None - - if m is not None: - r = check_int_bool(self, inplace) - if r is not None: - return r - return self._interpolate(method=m, index=index, values=values, - axis=axis, limit=limit, - limit_direction=limit_direction, - limit_area=limit_area, - fill_value=fill_value, inplace=inplace, - downcast=downcast, **kwargs) - - raise ValueError("invalid method '{0}' to interpolate.".format(method)) + # validate the interp method + m = missing.clean_interp_method(method, **kwargs) + + r = check_int_bool(self, inplace) + if r is not None: + return r + return self._interpolate(method=m, index=index, values=values, + axis=axis, limit=limit, + limit_direction=limit_direction, + limit_area=limit_area, + fill_value=fill_value, inplace=inplace, + downcast=downcast, **kwargs) def _interpolate_with_fill(self, method='pad', axis=0, inplace=False, limit=None, fill_value=None, coerce=False, @@ -2072,17 +2053,9 @@ def get_values(self, dtype=None): return object dtype as boxed values, such as Timestamps/Timedelta """ if is_object_dtype(dtype): - values = self.values - - if self.ndim > 1: - values = values.ravel() - - values = lib.map_infer(values, self._box_func) - - if self.ndim > 1: - values = values.reshape(self.values.shape) - - return values + values = self.values.ravel() + result = self._holder(values).astype(object) + return result.reshape(self.values.shape) return self.values @@ -3155,31 +3128,6 @@ def _merge_blocks(blocks, dtype=None, _can_consolidate=True): return blocks -def _block2d_to_blocknd(values, placement, shape, labels, ref_items): - """ pivot to the labels shape """ - panel_shape = (len(placement),) + shape - - # TODO: lexsort depth needs to be 2!! - - # Create observation selection vector using major and minor - # labels, for converting to panel format. 
- selector = _factor_indexer(shape[1:], labels) - mask = np.zeros(np.prod(shape), dtype=bool) - mask.put(selector, True) - - if mask.all(): - pvalues = np.empty(panel_shape, dtype=values.dtype) - else: - dtype, fill_value = maybe_promote(values.dtype) - pvalues = np.empty(panel_shape, dtype=dtype) - pvalues.fill(fill_value) - - for i in range(len(placement)): - pvalues[i].flat[mask] = values[:, i] - - return make_block(pvalues, placement=placement) - - def _safe_reshape(arr, new_shape): """ If possible, reshape `arr` to have shape `new_shape`, @@ -3202,16 +3150,6 @@ def _safe_reshape(arr, new_shape): return arr -def _factor_indexer(shape, labels): - """ - given a tuple of shape and a list of Categorical labels, return the - expanded label indexer - """ - mult = np.array(shape)[::-1].cumprod()[::-1] - return ensure_platform_int( - np.sum(np.array(labels).T * np.append(mult, [1]), axis=1).T) - - def _putmask_smart(v, m, n): """ Return a new ndarray, try to preserve dtype if possible. diff --git a/pandas/core/internals/concat.py b/pandas/core/internals/concat.py index 4a16707a376e9..cb98274962656 100644 --- a/pandas/core/internals/concat.py +++ b/pandas/core/internals/concat.py @@ -183,13 +183,15 @@ def get_reindexed_values(self, empty_dtype, upcasted_na): is_datetime64tz_dtype(empty_dtype)): if self.block is None: array = empty_dtype.construct_array_type() - return array(np.full(self.shape[1], fill_value), + return array(np.full(self.shape[1], fill_value.value), dtype=empty_dtype) pass elif getattr(self.block, 'is_categorical', False): pass elif getattr(self.block, 'is_sparse', False): pass + elif getattr(self.block, 'is_extension', False): + pass else: missing_arr = np.empty(self.shape, dtype=empty_dtype) missing_arr.fill(fill_value) @@ -335,8 +337,10 @@ def get_empty_dtype_and_na(join_units): elif 'category' in upcast_classes: return np.dtype(np.object_), np.nan elif 'datetimetz' in upcast_classes: + # GH-25014. We use NaT instead of iNaT, since this eventually + # ends up in DatetimeArray.take, which does not allow iNaT. 
dtype = upcast_classes['datetimetz'] - return dtype[0], tslibs.iNaT + return dtype[0], tslibs.NaT elif 'datetime' in upcast_classes: return np.dtype('M8[ns]'), tslibs.iNaT elif 'timedelta' in upcast_classes: diff --git a/pandas/core/internals/construction.py b/pandas/core/internals/construction.py index c05a9a0f8f3c7..7e97512682720 100644 --- a/pandas/core/internals/construction.py +++ b/pandas/core/internals/construction.py @@ -197,18 +197,12 @@ def init_dict(data, index, columns, dtype=None): arrays.loc[missing] = [val] * missing.sum() else: - - for key in data: - if (isinstance(data[key], ABCDatetimeIndex) and - data[key].tz is not None): - # GH#24096 need copy to be deep for datetime64tz case - # TODO: See if we can avoid these copies - data[key] = data[key].copy(deep=True) - keys = com.dict_keys_to_ordered_list(data) columns = data_names = Index(keys) - arrays = [data[k] for k in keys] - + # GH#24096 need copy to be deep for datetime64tz case + # TODO: See if we can avoid these copies + arrays = [data[k] if not is_datetime64tz_dtype(data[k]) else + data[k].copy(deep=True) for k in keys] return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype) diff --git a/pandas/core/internals/managers.py b/pandas/core/internals/managers.py index 050c3d3e87fc6..407db772d73e8 100644 --- a/pandas/core/internals/managers.py +++ b/pandas/core/internals/managers.py @@ -552,9 +552,9 @@ def comp(s, regex=False): if isna(s): return isna(values) if hasattr(s, 'asm8'): - return _compare_or_regex_match(maybe_convert_objects(values), - getattr(s, 'asm8'), regex) - return _compare_or_regex_match(values, s, regex) + return _compare_or_regex_search(maybe_convert_objects(values), + getattr(s, 'asm8'), regex) + return _compare_or_regex_search(values, s, regex) masks = [comp(s, regex) for i, s in enumerate(src_list)] @@ -584,10 +584,6 @@ def comp(s, regex=False): bm._consolidate_inplace() return bm - def reshape_nd(self, axes, **kwargs): - """ a 2d-nd reshape operation on a BlockManager """ - return self.apply('reshape_nd', axes=axes, **kwargs) - def is_consolidated(self): """ Return True if more than one block with the same dtype @@ -1901,11 +1897,11 @@ def _consolidate(blocks): return new_blocks -def _compare_or_regex_match(a, b, regex=False): +def _compare_or_regex_search(a, b, regex=False): """ Compare two array_like inputs of the same shape or two scalar values - Calls operator.eq or re.match, depending on regex argument. If regex is + Calls operator.eq or re.search, depending on regex argument. If regex is True, perform an element-wise regex matching. Parameters @@ -1921,7 +1917,7 @@ def _compare_or_regex_match(a, b, regex=False): if not regex: op = lambda x: operator.eq(x, b) else: - op = np.vectorize(lambda x: bool(re.match(b, x)) if isinstance(x, str) + op = np.vectorize(lambda x: bool(re.search(b, x)) if isinstance(x, str) else False) is_a_array = isinstance(a, np.ndarray) @@ -1971,16 +1967,28 @@ def items_overlap_with_suffix(left, lsuffix, right, rsuffix): raise ValueError('columns overlap but no suffix specified: ' '{rename}'.format(rename=to_rename)) - def lrenamer(x): - if x in to_rename: - return '{x}{lsuffix}'.format(x=x, lsuffix=lsuffix) - return x + def renamer(x, suffix): + """Rename the left and right indices. + + If there is overlap, and suffix is not None, add + suffix, otherwise, leave it as-is. 
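The rename from `_compare_or_regex_match` to `_compare_or_regex_search` above switches matching from anchored to unanchored; the standard-library difference it leans on is just this:

```python
import re

pattern = 'ba'
print(bool(re.match(pattern, 'foobar')))   # False: match anchors at position 0
print(bool(re.search(pattern, 'foobar')))  # True: search scans the whole string
```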
- def rrenamer(x): - if x in to_rename: - return '{x}{rsuffix}'.format(x=x, rsuffix=rsuffix) + Parameters + ---------- + x : original column name + suffix : str or None + + Returns + ------- + x : renamed column name + """ + if x in to_rename and suffix is not None: + return '{x}{suffix}'.format(x=x, suffix=suffix) return x + lrenamer = partial(renamer, suffix=lsuffix) + rrenamer = partial(renamer, suffix=rsuffix) + return (_transform_index(left, lrenamer), _transform_index(right, rrenamer)) diff --git a/pandas/core/missing.py b/pandas/core/missing.py index 15538b8196684..9acdb1a06b2d1 100644 --- a/pandas/core/missing.py +++ b/pandas/core/missing.py @@ -1,5 +1,5 @@ """ -Routines for filling missing data +Routines for filling missing data. """ from distutils.version import LooseVersion import operator @@ -116,7 +116,7 @@ def interpolate_1d(xvalues, yvalues, method='linear', limit=None, xvalues and yvalues will each be 1-d arrays of the same length. Bounds_error is currently hardcoded to False since non-scipy ones don't - take it as an argumnet. + take it as an argument. """ # Treat the original, non-scipy methods first. @@ -244,9 +244,9 @@ def interpolate_1d(xvalues, yvalues, method='linear', limit=None, def _interpolate_scipy_wrapper(x, y, new_x, method, fill_value=None, bounds_error=False, order=None, **kwargs): """ - passed off to scipy.interpolate.interp1d. method is scipy's kind. + Passed off to scipy.interpolate.interp1d. method is scipy's kind. Returns an array interpolated at new_x. Add any new methods to - the list in _clean_interp_method + the list in _clean_interp_method. """ try: from scipy import interpolate @@ -293,9 +293,10 @@ def _interpolate_scipy_wrapper(x, y, new_x, method, fill_value=None, bounds_error=bounds_error) new_y = terp(new_x) elif method == 'spline': - # GH #10633 - if not order: - raise ValueError("order needs to be specified and greater than 0") + # GH #10633, #24014 + if isna(order) or (order <= 0): + raise ValueError("order needs to be specified and greater than 0; " + "got order: {}".format(order)) terp = interpolate.UnivariateSpline(x, y, k=order, **kwargs) new_y = terp(new_x) else: @@ -314,7 +315,7 @@ def _interpolate_scipy_wrapper(x, y, new_x, method, fill_value=None, def _from_derivatives(xi, yi, x, order=None, der=0, extrapolate=False): """ - Convenience function for interpolate.BPoly.from_derivatives + Convenience function for interpolate.BPoly.from_derivatives. Construct a piecewise polynomial in the Bernstein basis, compatible with the specified values and derivatives at breakpoints. @@ -325,7 +326,7 @@ def _from_derivatives(xi, yi, x, order=None, der=0, extrapolate=False): sorted 1D array of x-coordinates yi : array_like or list of array-likes yi[i][j] is the j-th derivative known at xi[i] - orders : None or int or array_like of ints. Default: None. + order: None or int or array_like of ints. Default: None. Specifies the degree of local polynomials. If not None, some derivatives are ignored. der : int or list @@ -344,8 +345,7 @@ def _from_derivatives(xi, yi, x, order=None, der=0, extrapolate=False): Returns ------- y : scalar or array_like - The result, of length R or length M or M by R, - + The result, of length R or length M or M by R. 
""" import scipy from scipy import interpolate @@ -418,8 +418,9 @@ def _akima_interpolate(xi, yi, x, der=0, axis=0): def interpolate_2d(values, method='pad', axis=0, limit=None, fill_value=None, dtype=None): - """ perform an actual interpolation of values, values will be make 2-d if - needed fills inplace, returns the result + """ + Perform an actual interpolation of values, values will be make 2-d if + needed fills inplace, returns the result. """ transf = (lambda x: x) if axis == 0 else (lambda x: x.T) @@ -533,13 +534,13 @@ def clean_reindex_fill_method(method): def fill_zeros(result, x, y, name, fill): """ - if this is a reversed op, then flip x,y + If this is a reversed op, then flip x,y - if we have an integer value (or array in y) + If we have an integer value (or array in y) and we have 0's, fill them with the fill, - return the result + return the result. - mask the nan's from x + Mask the nan's from x. """ if fill is None or is_float_dtype(result): return result diff --git a/pandas/core/ops.py b/pandas/core/ops.py index 10cebc6f94b92..dbdabecafae3a 100644 --- a/pandas/core/ops.py +++ b/pandas/core/ops.py @@ -447,7 +447,7 @@ def _get_op_name(op, special): _op_descriptions[reverse_op]['reverse'] = key _flex_doc_SERIES = """ -{desc} of series and other, element-wise (binary operator `{op_name}`). +Return {desc} of series and other, element-wise (binary operator `{op_name}`). Equivalent to ``{equiv}``, but with support to substitute a fill_value for missing data in one of the inputs. @@ -459,14 +459,15 @@ def _get_op_name(op, special): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing - the result will be missing + the result will be missing. level : int or name Broadcast across a level, matching Index values on the - passed MultiIndex level + passed MultiIndex level. Returns ------- -result : Series +Series + The result of the operation. See Also -------- @@ -495,6 +496,27 @@ def _get_op_name(op, special): d 1.0 e NaN dtype: float64 +>>> a.subtract(b, fill_value=0) +a 0.0 +b 1.0 +c 1.0 +d -1.0 +e NaN +dtype: float64 +>>> a.multiply(b) +a 1.0 +b NaN +c NaN +d NaN +e NaN +dtype: float64 +>>> a.divide(b, fill_value=0) +a 1.0 +b inf +c inf +d 0.0 +e NaN +dtype: float64 """ _arith_doc_FRAME = """ @@ -525,7 +547,7 @@ def _get_op_name(op, special): """ _flex_doc_FRAME = """ -{desc} of dataframe and other, element-wise (binary operator `{op_name}`). +Get {desc} of dataframe and other, element-wise (binary operator `{op_name}`). Equivalent to ``{equiv}``, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, `{reverse}`. @@ -679,7 +701,7 @@ def _get_op_name(op, special): """ _flex_comp_doc_FRAME = """ -{desc} of dataframe and other, element-wise (binary operator `{op_name}`). +Get {desc} of dataframe and other, element-wise (binary operator `{op_name}`). Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison operators. @@ -825,7 +847,7 @@ def _get_op_name(op, special): """ _flex_doc_PANEL = """ -{desc} of series and other, element-wise (binary operator `{op_name}`). +Return {desc} of series and other, element-wise (binary operator `{op_name}`). Equivalent to ``{equiv}``. 
Parameters diff --git a/pandas/core/panel.py b/pandas/core/panel.py index 540192d1a592c..1555542079d80 100644 --- a/pandas/core/panel.py +++ b/pandas/core/panel.py @@ -4,12 +4,13 @@ # pylint: disable=E1103,W0231,W0212,W0621 from __future__ import division +from collections import OrderedDict import warnings import numpy as np import pandas.compat as compat -from pandas.compat import OrderedDict, map, range, u, zip +from pandas.compat import map, range, u, zip from pandas.compat.numpy import function as nv from pandas.util._decorators import Appender, Substitution, deprecate_kwarg from pandas.util._validators import validate_axis_style_args @@ -42,7 +43,7 @@ axes_single_arg="{0, 1, 2, 'items', 'major_axis', 'minor_axis'}", optional_mapper='', optional_axis='', optional_labels='') _shared_doc_kwargs['args_transpose'] = ( - "three positional arguments: each one of\n{ax_single}".format( + "{ax_single}\n\tThree positional arguments from given options.".format( ax_single=_shared_doc_kwargs['axes_single_arg'])) @@ -539,7 +540,7 @@ def set_value(self, *args, **kwargs): ------- panel : Panel If label combo is contained, will be reference to calling Panel, - otherwise a new object + otherwise a new object. """ warnings.warn("set_value is deprecated and will be removed " "in a future release. Please use " @@ -802,7 +803,7 @@ def major_xs(self, key): Returns ------- y : DataFrame - index -> minor axis, columns -> items + Index -> minor axis, columns -> items. Notes ----- @@ -826,7 +827,7 @@ def minor_xs(self, key): Returns ------- y : DataFrame - index -> major axis, columns -> items + Index -> major axis, columns -> items. Notes ----- @@ -917,9 +918,7 @@ def groupby(self, function, axis='major'): ------- grouped : PanelGroupBy """ - from pandas.core.groupby import PanelGroupBy - axis = self._get_axis_number(axis) - return PanelGroupBy(self, function, axis=axis) + raise NotImplementedError("Panel is removed in pandas 0.25.0") def to_frame(self, filter_observations=True): """ @@ -999,7 +998,7 @@ def construct_index_parts(idx, major=True): def apply(self, func, axis='major', **kwargs): """ - Applies function along axis (or axes) of the Panel. + Apply function along axis (or axes) of the Panel. Parameters ---------- @@ -1010,7 +1009,8 @@ def apply(self, func, axis='major', **kwargs): DataFrames of items & major axis will be passed axis : {'items', 'minor', 'major'}, or {0, 1, 2}, or a tuple with two axes - Additional keyword arguments will be passed as keywords to the function + **kwargs + Additional keyword arguments will be passed to the function. 
Returns ------- diff --git a/pandas/core/resample.py b/pandas/core/resample.py index 6822225273906..b3b28d7772713 100644 --- a/pandas/core/resample.py +++ b/pandas/core/resample.py @@ -20,7 +20,7 @@ import pandas.core.algorithms as algos from pandas.core.generic import _shared_docs from pandas.core.groupby.base import GroupByMixin -from pandas.core.groupby.generic import PanelGroupBy, SeriesGroupBy +from pandas.core.groupby.generic import SeriesGroupBy from pandas.core.groupby.groupby import ( GroupBy, _GroupBy, _pipe_template, groupby) from pandas.core.groupby.grouper import Grouper @@ -30,14 +30,12 @@ from pandas.core.indexes.timedeltas import TimedeltaIndex, timedelta_range from pandas.tseries.frequencies import to_offset -from pandas.tseries.offsets import ( - DateOffset, Day, Nano, Tick, delta_to_nanoseconds) +from pandas.tseries.offsets import DateOffset, Day, Nano, Tick _shared_docs_kwargs = dict() class Resampler(_GroupBy): - """ Class for resampling datetimelike data, a groupby-like operation. See aggregate, transform, and apply functions on this object. @@ -85,9 +83,9 @@ def __unicode__(self): """ Provide a nice str repr of our rolling object. """ - attrs = ["{k}={v}".format(k=k, v=getattr(self.groupby, k)) + attrs = ("{k}={v}".format(k=k, v=getattr(self.groupby, k)) for k in self._attributes if - getattr(self.groupby, k, None) is not None] + getattr(self.groupby, k, None) is not None) return "{klass} [{attrs}]".format(klass=self.__class__.__name__, attrs=', '.join(attrs)) @@ -108,7 +106,7 @@ def __iter__(self): Returns ------- Generator yielding sequence of (name, subsetted object) - for each group + for each group. See Also -------- @@ -215,9 +213,9 @@ def pipe(self, func, *args, **kwargs): _agg_see_also_doc = dedent(""" See Also -------- - pandas.DataFrame.groupby.aggregate - pandas.DataFrame.resample.transform - pandas.DataFrame.aggregate + DataFrame.groupby.aggregate + DataFrame.resample.transform + DataFrame.aggregate """) _agg_examples_doc = dedent(""" @@ -287,8 +285,8 @@ def transform(self, arg, *args, **kwargs): Parameters ---------- - func : function - To apply to each group. Should return a Series with the same index + arg : function + To apply to each group. Should return a Series with the same index. Returns ------- @@ -342,15 +340,10 @@ def _groupby_and_aggregate(self, how, grouper=None, *args, **kwargs): obj = self._selected_obj - try: - grouped = groupby(obj, by=None, grouper=grouper, axis=self.axis) - except TypeError: - - # panel grouper - grouped = PanelGroupBy(obj, grouper=grouper, axis=self.axis) + grouped = groupby(obj, by=None, grouper=grouper, axis=self.axis) try: - if isinstance(obj, ABCDataFrame) and compat.callable(how): + if isinstance(obj, ABCDataFrame) and callable(how): # Check if the function is reducing or not. result = grouped._aggregate_item_by_item(how, *args, **kwargs) else: @@ -424,7 +417,7 @@ def pad(self, limit=None): Returns ------- - an upsampled Series + An upsampled Series. See Also -------- @@ -524,9 +517,9 @@ def backfill(self, limit=None): 'backfill'. nearest : Fill NaN values with nearest neighbor starting from center. pad : Forward fill NaN values. - pandas.Series.fillna : Fill NaN values in the Series using the + Series.fillna : Fill NaN values in the Series using the specified method, which can be 'backfill'. - pandas.DataFrame.fillna : Fill NaN values in the DataFrame using the + DataFrame.fillna : Fill NaN values in the DataFrame using the specified method, which can be 'backfill'. 
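A short usage sketch for `Resampler.transform`, whose docstring the hunk above corrects (example data ours):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(4.0),
              index=pd.date_range('2019-01-01', periods=4, freq='12H'))

# Demean within each day; the result keeps the original 12H index.
print(s.resample('D').transform(lambda x: x - x.mean()))
```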
References @@ -637,9 +630,9 @@ def fillna(self, method, limit=None): nearest : Fill NaN values in the resampled data with nearest neighbor starting from center. interpolate : Fill NaN values using interpolation. - pandas.Series.fillna : Fill NaN values in the Series using the + Series.fillna : Fill NaN values in the Series using the specified method, which can be 'bfill' and 'ffill'. - pandas.DataFrame.fillna : Fill NaN values in the DataFrame using the + DataFrame.fillna : Fill NaN values in the DataFrame using the specified method, which can be 'bfill' and 'ffill'. References @@ -802,7 +795,7 @@ def std(self, ddof=1, *args, **kwargs): Parameters ---------- ddof : integer, default 1 - degrees of freedom + Degrees of freedom. """ nv.validate_resampler_func('std', args, kwargs) return self._downsample('std', ddof=ddof) @@ -1613,20 +1606,20 @@ def _get_timestamp_range_edges(first, last, offset, closed='left', base=0): A tuple of length 2, containing the adjusted pd.Timestamp objects. """ if isinstance(offset, Tick): - is_day = isinstance(offset, Day) - day_nanos = delta_to_nanoseconds(timedelta(1)) - - # #1165 and #24127 - if (is_day and not offset.nanos % day_nanos) or not is_day: - first, last = _adjust_dates_anchored(first, last, offset, - closed=closed, base=base) - if is_day and first.tz is not None: - # _adjust_dates_anchored assumes 'D' means 24H, but first/last - # might contain a DST transition (23H, 24H, or 25H). - # Ensure first/last snap to midnight. - first = first.normalize() - last = last.normalize() - return first, last + if isinstance(offset, Day): + # _adjust_dates_anchored assumes 'D' means 24H, but first/last + # might contain a DST transition (23H, 24H, or 25H). + # So "pretend" the dates are naive when adjusting the endpoints + tz = first.tz + first = first.tz_localize(None) + last = last.tz_localize(None) + + first, last = _adjust_dates_anchored(first, last, offset, + closed=closed, base=base) + if isinstance(offset, Day): + first = first.tz_localize(tz) + last = last.tz_localize(tz) + return first, last else: first = first.normalize() diff --git a/pandas/core/reshape/concat.py b/pandas/core/reshape/concat.py index 53671e00e88b4..a6c945ac2e464 100644 --- a/pandas/core/reshape/concat.py +++ b/pandas/core/reshape/concat.py @@ -38,15 +38,15 @@ def concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, If a dict is passed, the sorted keys will be used as the `keys` argument, unless it is passed, in which case the values will be selected (see below). Any None objects will be dropped silently unless - they are all None in which case a ValueError will be raised + they are all None in which case a ValueError will be raised. axis : {0/'index', 1/'columns'}, default 0 - The axis to concatenate along + The axis to concatenate along. join : {'inner', 'outer'}, default 'outer' - How to handle indexes on other axis(es) + How to handle indexes on other axis (or axes). join_axes : list of Index objects Specific indexes to use for the other n - 1 axes instead of performing - inner/outer set logic - ignore_index : boolean, default False + inner/outer set logic. + ignore_index : bool, default False If True, do not use the index values along the concatenation axis. The resulting axis will be labeled 0, ..., n - 1. This is useful if you are concatenating objects where the concatenation axis does not have @@ -54,16 +54,16 @@ def concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, axes are still respected in the join. 
keys : sequence, default None If multiple levels passed, should contain tuples. Construct - hierarchical index using the passed keys as the outermost level + hierarchical index using the passed keys as the outermost level. levels : list of sequences, default None Specific levels (unique values) to use for constructing a - MultiIndex. Otherwise they will be inferred from the keys + MultiIndex. Otherwise they will be inferred from the keys. names : list, default None - Names for the levels in the resulting hierarchical index - verify_integrity : boolean, default False + Names for the levels in the resulting hierarchical index. + verify_integrity : bool, default False Check whether the new concatenated axis contains duplicates. This can - be very expensive relative to the actual data concatenation - sort : boolean, default None + be very expensive relative to the actual data concatenation. + sort : bool, default None Sort non-concatenation axis if it is not already aligned when `join` is 'outer'. The current default of sorting is deprecated and will change to not-sorting in a future version of pandas. @@ -76,12 +76,12 @@ def concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, .. versionadded:: 0.23.0 - copy : boolean, default True - If False, do not copy data unnecessarily + copy : bool, default True + If False, do not copy data unnecessarily. Returns ------- - concatenated : object, type of objs + object, type of objs When concatenating all ``Series`` along the index (axis=0), a ``Series`` is returned. When ``objs`` contains at least one ``DataFrame``, a ``DataFrame`` is returned. When concatenating along @@ -89,10 +89,10 @@ def concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, See Also -------- - Series.append - DataFrame.append - DataFrame.join - DataFrame.merge + Series.append : Concatenate Series. + DataFrame.append : Concatenate DataFrames. + DataFrame.join : Join DataFrames using indexes. + DataFrame.merge : Merge DataFrames by indexes or columns. Notes ----- @@ -128,7 +128,7 @@ def concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, Add a hierarchical index at the outermost level of the data with the ``keys`` option. - >>> pd.concat([s1, s2], keys=['s1', 's2',]) + >>> pd.concat([s1, s2], keys=['s1', 's2']) s1 0 a 1 b s2 0 c diff --git a/pandas/core/reshape/melt.py b/pandas/core/reshape/melt.py index 312a108ad3380..0fa80de812c5f 100644 --- a/pandas/core/reshape/melt.py +++ b/pandas/core/reshape/melt.py @@ -230,7 +230,7 @@ def wide_to_long(df, stubnames, i, j, sep="", suffix=r'\d+'): ------- DataFrame A DataFrame that contains each stub name as a variable, with new index - (i, j) + (i, j). Notes ----- diff --git a/pandas/core/reshape/merge.py b/pandas/core/reshape/merge.py index 1dd19a7c1514e..fb50a3c60f705 100644 --- a/pandas/core/reshape/merge.py +++ b/pandas/core/reshape/merge.py @@ -159,9 +159,15 @@ def merge_ordered(left, right, on=None, left DataFrame fill_method : {'ffill', None}, default None Interpolation method for data - suffixes : 2-length sequence (tuple, list, ...) - Suffix to apply to overlapping column names in the left and right - side, respectively + suffixes : Sequence, default is ("_x", "_y") + A length-2 sequence where each element is optionally a string + indicating the suffix to add to overlapping column names in + `left` and `right` respectively. Pass a value of `None` instead + of a string to indicate that the column name from `left` or + `right` should be left as-is, with no suffix. 
At least one of the + values must not be None. + + .. versionchanged:: 0.25.0 how : {'left', 'right', 'outer', 'inner'}, default 'outer' * left: use only keys from left frame (SQL: left outer join) * right: use only keys from right frame (SQL: right outer join) @@ -903,7 +909,7 @@ def _get_merge_keys(self): in zip(self.right.index.levels, self.right.index.codes)] else: - right_keys = [self.right.index.values] + right_keys = [self.right.index._values] elif _any(self.right_on): for k in self.right_on: if is_rkey(k): diff --git a/pandas/core/reshape/pivot.py b/pandas/core/reshape/pivot.py index c7c447d18b6b1..8d7616c4b6b61 100644 --- a/pandas/core/reshape/pivot.py +++ b/pandas/core/reshape/pivot.py @@ -88,9 +88,9 @@ def pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', # the original values are ints # as we grouped with a NaN value # and then dropped, coercing to floats - for v in [v for v in values if v in data and v in agged]: - if (is_integer_dtype(data[v]) and - not is_integer_dtype(agged[v])): + for v in values: + if (v in data and is_integer_dtype(data[v]) and + v in agged and not is_integer_dtype(agged[v])): agged[v] = maybe_downcast_to_dtype(agged[v], data[v].dtype) table = agged @@ -392,36 +392,36 @@ def crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False): """ - Compute a simple cross-tabulation of two (or more) factors. By default + Compute a simple cross tabulation of two (or more) factors. By default computes a frequency table of the factors unless an array of values and an - aggregation function are passed + aggregation function are passed. Parameters ---------- index : array-like, Series, or list of arrays/Series - Values to group by in the rows + Values to group by in the rows. columns : array-like, Series, or list of arrays/Series - Values to group by in the columns + Values to group by in the columns. values : array-like, optional Array of values to aggregate according to the factors. Requires `aggfunc` be specified. rownames : sequence, default None - If passed, must match number of row arrays passed + If passed, must match number of row arrays passed. colnames : sequence, default None - If passed, must match number of column arrays passed + If passed, must match number of column arrays passed. aggfunc : function, optional - If specified, requires `values` be specified as well - margins : boolean, default False - Add row/column margins (subtotals) - margins_name : string, default 'All' - Name of the row / column that will contain the totals + If specified, requires `values` be specified as well. + margins : bool, default False + Add row/column margins (subtotals). + margins_name : str, default 'All' + Name of the row/column that will contain the totals when margins is True. .. versionadded:: 0.21.0 - dropna : boolean, default True - Do not include columns whose entries are all NaN - normalize : boolean, {'all', 'index', 'columns'}, or {0,1}, default False + dropna : bool, default True + Do not include columns whose entries are all NaN. + normalize : bool, {'all', 'index', 'columns'}, or {0,1}, default False Normalize by dividing all values by the sum of values. - If passed 'all' or `True`, will normalize over all values. @@ -433,7 +433,13 @@ def crosstab(index, columns, values=None, rownames=None, colnames=None, Returns ------- - crosstab : DataFrame + DataFrame + Cross tabulation of the data. 
+ + See Also + -------- + DataFrame.pivot : Reshape data based on column values. + pivot_table : Create a pivot table as a DataFrame. Notes ----- @@ -455,32 +461,26 @@ def crosstab(index, columns, values=None, rownames=None, colnames=None, ... "one", "two", "two", "two", "one"], dtype=object) >>> c = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny", ... "shiny", "dull", "shiny", "shiny", "shiny"], - ... dtype=object) - + ... dtype=object) >>> pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c']) - ... # doctest: +NORMALIZE_WHITESPACE b one two c dull shiny dull shiny a bar 1 2 1 0 foo 2 2 1 2 + Here 'c' and 'f' are not represented in the data and will not be + shown in the output because dropna is True by default. Set + dropna=False to preserve categories with no data. + >>> foo = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c']) >>> bar = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f']) - >>> crosstab(foo, bar) # 'c' and 'f' are not represented in the data, - # and will not be shown in the output because - # dropna is True by default. Set 'dropna=False' - # to preserve categories with no data - ... # doctest: +SKIP + >>> pd.crosstab(foo, bar) col_0 d e row_0 a 1 0 b 0 1 - - >>> crosstab(foo, bar, dropna=False) # 'c' and 'f' are not represented - # in the data, but they still will be counted - # and shown in the output - ... # doctest: +SKIP + >>> pd.crosstab(foo, bar, dropna=False) col_0 d e f row_0 a 1 0 0 diff --git a/pandas/core/reshape/reshape.py b/pandas/core/reshape/reshape.py index f436b3b92a359..6ba33301753d6 100644 --- a/pandas/core/reshape/reshape.py +++ b/pandas/core/reshape/reshape.py @@ -701,19 +701,20 @@ def _convert_level_number(level_num, columns): def get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None): """ - Convert categorical variable into dummy/indicator variables + Convert categorical variable into dummy/indicator variables. Parameters ---------- data : array-like, Series, or DataFrame - prefix : string, list of strings, or dict of strings, default None + Data of which to get dummy indicators. + prefix : str, list of str, or dict of str, default None String to append DataFrame column names. Pass a list with length equal to the number of columns when calling get_dummies on a DataFrame. Alternatively, `prefix` can be a dictionary mapping column names to prefixes. - prefix_sep : string, default '_' + prefix_sep : str, default '_' If appending prefix, separator/delimiter to use. Or pass a - list or dictionary as with `prefix.` + list or dictionary as with `prefix`. dummy_na : bool, default False Add a column to indicate NaNs, if False NaNs are ignored. columns : list-like, default None @@ -736,11 +737,12 @@ def get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, Returns ------- - dummies : DataFrame + DataFrame + Dummy-coded data. See Also -------- - Series.str.get_dummies + Series.str.get_dummies : Convert Series to dummy codes. Examples -------- diff --git a/pandas/core/reshape/tile.py b/pandas/core/reshape/tile.py index c107ed51226b0..f99fd9004bb31 100644 --- a/pandas/core/reshape/tile.py +++ b/pandas/core/reshape/tile.py @@ -35,7 +35,7 @@ def cut(x, bins, right=True, labels=None, retbins=False, precision=3, ---------- x : array-like The input array to be binned. Must be 1-dimensional. - bins : int, sequence of scalars, or pandas.IntervalIndex + bins : int, sequence of scalars, or IntervalIndex The criteria to bin by. 
* int : Defines the number of equal-width bins in the range of `x`. The @@ -70,16 +70,16 @@ def cut(x, bins, right=True, labels=None, retbins=False, precision=3, Returns ------- - out : pandas.Categorical, Series, or ndarray + out : Categorical, Series, or ndarray An array-like object representing the respective bin for each value of `x`. The type depends on the value of `labels`. * True (default) : returns a Series for Series `x` or a - pandas.Categorical for all other inputs. The values stored within + Categorical for all other inputs. The values stored within are Interval dtype. * sequence of scalars : returns a Series for Series `x` or a - pandas.Categorical for all other inputs. The values stored within + Categorical for all other inputs. The values stored within are whatever the type in the sequence is. * False : returns an ndarray of integers. @@ -94,16 +94,15 @@ def cut(x, bins, right=True, labels=None, retbins=False, precision=3, -------- qcut : Discretize variable into equal-sized buckets based on rank or based on sample quantiles. - pandas.Categorical : Array type for storing data that come from a + Categorical : Array type for storing data that come from a fixed set of values. Series : One-dimensional array with axis labels (including time series). - pandas.IntervalIndex : Immutable Index implementing an ordered, - sliceable set. + IntervalIndex : Immutable Index implementing an ordered, sliceable set. Notes ----- Any NA values will be NA in the result. Out of bounds values will be NA in - the resulting Series or pandas.Categorical object. + the resulting Series or Categorical object. Examples -------- @@ -164,7 +163,7 @@ def cut(x, bins, right=True, labels=None, retbins=False, precision=3, Use `drop` optional when bins is not unique >>> pd.cut(s, [0, 2, 4, 6, 10, 10], labels=False, retbins=True, - ... right=False, duplicates='drop') + ... right=False, duplicates='drop') ... # doctest: +ELLIPSIS (a 0.0 b 1.0 @@ -373,14 +372,6 @@ def _bins_to_cuts(x, bins, right=True, labels=None, return result, bins -def _trim_zeros(x): - while len(x) > 1 and x[-1] == '0': - x = x[:-1] - if len(x) > 1 and x[-1] == '.': - x = x[:-1] - return x - - def _coerce_to_type(x): """ if the passed data is of datetime/timedelta type, diff --git a/pandas/core/series.py index a25aa86a47927..f6598ed1ee614 100644 --- a/pandas/core/series.py +++ b/pandas/core/series.py @@ -3,6 +3,7 @@ """ from __future__ import division +from collections import OrderedDict from textwrap import dedent import warnings @@ -10,7 +11,7 @@ from pandas._libs import iNaT, index as libindex, lib, tslibs import pandas.compat as compat -from pandas.compat import PY36, OrderedDict, StringIO, u, zip +from pandas.compat import PY36, StringIO, u, zip from pandas.compat.numpy import function as nv from pandas.util._decorators import Appender, Substitution, deprecate from pandas.util._validators import validate_bool_kwarg @@ -129,7 +130,7 @@ class Series(base.IndexOpsMixin, generic.NDFrame): sequence are used, the index will override the keys found in the dict. dtype : str, numpy.dtype, or ExtensionDtype, optional - dtype for the output Series. If not specified, this will be + Data type for the output Series. If not specified, this will be inferred from `data`. See the :ref:`user guide <basics.dtypes>` for more usages.
copy : bool, default False @@ -444,7 +445,7 @@ def values(self): Returns ------- - arr : numpy.ndarray or ndarray-like + numpy.ndarray or ndarray-like See Also -------- @@ -513,6 +514,11 @@ def ravel(self, order='C'): """ Return the flattened underlying data as an ndarray. + Returns + ------- + numpy.ndarray or ndarray-like + Flattened data of the Series. + See Also -------- numpy.ndarray.ravel @@ -580,7 +586,7 @@ def nonzero(self): def put(self, *args, **kwargs): """ - Applies the `put` method to its `values` attribute if it has one. + Apply the `put` method to its `values` attribute if it has one. See Also -------- @@ -687,7 +693,7 @@ def __array__(self, dtype=None): See Also -------- - pandas.array : Create a new array from data. + array : Create a new array from data. Series.array : Zero-copy view to the array backing the Series. Series.to_numpy : Series method for similar behavior. @@ -830,7 +836,7 @@ def _ixs(self, i, axis=0): Returns ------- - value : scalar (int) or Series (slice, sequence) + scalar (int) or Series (slice, sequence) """ try: @@ -1120,7 +1126,7 @@ def repeat(self, repeats, axis=None): Returns ------- - repeated_series : Series + Series Newly created Series with repeated elements. See Also @@ -1173,7 +1179,7 @@ def get_value(self, label, takeable=False): Returns ------- - value : scalar value + scalar value """ warnings.warn("get_value is deprecated and will be removed " "in a future release. Please use " @@ -1207,9 +1213,9 @@ def set_value(self, label, value, takeable=False): Returns ------- - series : Series + Series If label is contained, will be reference to calling Series, - otherwise a new object + otherwise a new object. """ warnings.warn("set_value is deprecated and will be removed " "in a future release. Please use " @@ -1394,29 +1400,30 @@ def to_string(self, buf=None, na_rep='NaN', float_format=None, header=True, Parameters ---------- buf : StringIO-like, optional - buffer to write to - na_rep : string, optional - string representation of NAN to use, default 'NaN' + Buffer to write to. + na_rep : str, optional + String representation of NaN to use, default 'NaN'. float_format : one-parameter function, optional - formatter function to apply to columns' elements if they are floats - default None - header : boolean, default True - Add the Series header (index name) + Formatter function to apply to columns' elements if they are + floats, default None. + header : bool, default True + Add the Series header (index name). index : bool, optional - Add index (row) labels, default True - length : boolean, default False - Add the Series length - dtype : boolean, default False - Add the Series dtype - name : boolean, default False - Add the Series name if not None + Add index (row) labels, default True. + length : bool, default False + Add the Series length. + dtype : bool, default False + Add the Series dtype. + name : bool, default False + Add the Series name if not None. max_rows : int, optional Maximum number of rows to show before truncating. If None, show all. Returns ------- - formatted : string (if not buffer passed) + str or None + String representation of Series if ``buf=None``, otherwise None. """ formatter = fmt.SeriesFormatter(self, name=name, length=length, @@ -1456,7 +1463,7 @@ def iteritems(self): def keys(self): """ - Alias for index. + Return alias for index. """ return self.index @@ -1476,7 +1483,8 @@ def to_dict(self, into=dict): Returns ------- - value_dict : collections.Mapping + collections.Mapping + Key-value representation of Series. 
Examples -------- >>> s = pd.Series([1, 2, 3, 4]) >>> s.to_dict() {0: 1, 1: 2, 2: 3, 3: 4} >>> from collections import OrderedDict, defaultdict >>> s.to_dict(OrderedDict) OrderedDict([(0, 1), (1, 2), (2, 3), (3, 4)]) >>> dd = defaultdict(list) >>> s.to_dict(dd) - defaultdict(<type 'list'>, {0: 1, 1: 2, 2: 3, 3: 4}) + defaultdict(<class 'list'>, {0: 1, 1: 2, 2: 3, 3: 4}) """ # GH16122 into_c = com.standardize_mapping(into) @@ -1506,7 +1514,18 @@ def to_frame(self, name=None): Returns ------- - data_frame : DataFrame + DataFrame + DataFrame representation of Series. + + Examples + -------- + >>> s = pd.Series(["a", "b", "c"], + ... name="vals") + >>> s.to_frame() + vals + 0 a + 1 b + 2 c """ if name is None: df = self._constructor_expanddim(self) @@ -1521,12 +1540,14 @@ def to_sparse(self, kind='block', fill_value=None): Parameters ---------- - kind : {'block', 'integer'} + kind : {'block', 'integer'}, default 'block' fill_value : float, defaults to NaN (missing) + Value to use for filling NaN values. Returns ------- - sp : SparseSeries + SparseSeries + Sparse representation of the Series. """ # TODO: deprecate from pandas.core.sparse.series import SparseSeries @@ -1564,11 +1585,18 @@ def count(self, level=None): ---------- level : int or level name, default None If the axis is a MultiIndex (hierarchical), count along a - particular level, collapsing into a smaller Series + particular level, collapsing into a smaller Series. Returns ------- - nobs : int or Series (if level specified) + int or Series (if level specified) + Number of non-null values in the Series. + + Examples + -------- + >>> s = pd.Series([0.0, 1.0, np.nan]) + >>> s.count() + 2 """ if level is None: return notna(com.values_from_object(self)).sum() @@ -1597,14 +1625,15 @@ def mode(self, dropna=True): Parameters ---------- - dropna : boolean, default True + dropna : bool, default True Don't consider counts of NaN/NaT. .. versionadded:: 0.24.0 Returns ------- - modes : Series (sorted) + Series + Modes of the Series in sorted order. """ # TODO: Add option for bins like value_counts() return algorithms.mode(self, dropna=dropna) @@ -1619,10 +1648,19 @@ def unique(self): Returns ------- ndarray or ExtensionArray - The unique values returned as a NumPy array. In case of an - extension-array backed Series, a new - :class:`~api.extensions.ExtensionArray` of that type with just - the unique values is returned. This includes + The unique values returned as a NumPy array. See Notes. + + See Also + -------- + unique : Top-level unique method for any 1-d array-like object. + Index.unique : Return Index with unique values from an Index object. + + Notes + ----- + Returns the unique values as a NumPy array. In case of an + extension-array backed Series, a new + :class:`~api.extensions.ExtensionArray` of that type with just + the unique values is returned. This includes * Categorical * Period @@ -1631,10 +1669,7 @@ def unique(self): * Sparse * IntegerNA - See Also - -------- - unique : Top-level unique method for any 1-d array-like object. - Index.unique : Return Index with unique values from an Index object. + See Examples section. Examples -------- @@ -1677,12 +1712,13 @@ def drop_duplicates(self, keep='first', inplace=False): - 'first' : Drop duplicates except for the first occurrence. - 'last' : Drop duplicates except for the last occurrence. - ``False`` : Drop all duplicates. - inplace : boolean, default ``False`` + inplace : bool, default ``False`` If ``True``, performs operation inplace and returns None. Returns ------- - deduplicated : Series + Series + Series with duplicates dropped.
See Also -------- @@ -1759,7 +1795,9 @@ def duplicated(self, keep='first'): Returns ------- - pandas.core.series.Series + Series + Series indicating whether each value has occurred in the + preceding values. See Also -------- @@ -1823,7 +1861,7 @@ def idxmin(self, axis=0, skipna=True, *args, **kwargs): Parameters ---------- - skipna : boolean, default True + skipna : bool, default True Exclude NA/null values. If the entire Series is NA, the result will be NA. axis : int, default 0 @@ -1835,7 +1873,8 @@ def idxmin(self, axis=0, skipna=True, *args, **kwargs): Returns ------- - idxmin : Index of minimum of values. + Index + Label of the minimum value. Raises ------ @@ -1860,7 +1899,7 @@ def idxmin(self, axis=0, skipna=True, *args, **kwargs): Examples -------- >>> s = pd.Series(data=[1, None, 4, 1], - ... index=['A' ,'B' ,'C' ,'D']) + ... index=['A', 'B', 'C', 'D']) >>> s A 1.0 B NaN @@ -1892,7 +1931,7 @@ def idxmax(self, axis=0, skipna=True, *args, **kwargs): Parameters ---------- - skipna : boolean, default True + skipna : bool, default True Exclude NA/null values. If the entire Series is NA, the result will be NA. axis : int, default 0 @@ -1904,7 +1943,8 @@ def idxmax(self, axis=0, skipna=True, *args, **kwargs): Returns ------- - idxmax : Index of maximum of values. + Index + Label of the maximum value. Raises ------ @@ -1988,12 +2028,22 @@ def round(self, decimals=0, *args, **kwargs): Returns ------- - Series object + Series + Rounded values of the Series. See Also -------- - numpy.around - DataFrame.round + numpy.around : Round values of an np.array. + DataFrame.round : Round values of a DataFrame. + + Examples + -------- + >>> s = pd.Series([0.1, 1.3, 2.7]) + >>> s.round() + 0 0.0 + 1 1.0 + 2 3.0 + dtype: float64 """ nv.validate_round(args, kwargs) result = com.values_from_object(self).round(decimals) @@ -2008,7 +2058,7 @@ def quantile(self, q=0.5, interpolation='linear'): Parameters ---------- q : float or array-like, default 0.5 (50% quantile) - 0 <= q <= 1, the quantile(s) to compute + 0 <= q <= 1, the quantile(s) to compute. interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'} .. versionadded:: 0.18.0 @@ -2024,9 +2074,10 @@ def quantile(self, q=0.5, interpolation='linear'): Returns ------- - quantile : float or Series - if ``q`` is an array, a Series will be returned where the - index is ``q`` and the values are the quantiles. + float or Series + If ``q`` is an array, a Series will be returned where the + index is ``q`` and the values are the quantiles, otherwise + a float will be returned. See Also -------- @@ -2072,6 +2123,7 @@ def corr(self, other, method='pearson', min_periods=None): Parameters ---------- other : Series + Series with which to compute the correlation. method : {'pearson', 'kendall', 'spearman'} or callable * pearson : standard correlation coefficient * kendall : Kendall Tau correlation coefficient @@ -2081,16 +2133,18 @@ def corr(self, other, method='pearson', min_periods=None): .. versionadded:: 0.24.0 min_periods : int, optional - Minimum number of observations needed to have a valid result + Minimum number of observations needed to have a valid result. Returns ------- - correlation : float + float + Correlation with other. Examples -------- - >>> histogram_intersection = lambda a, b: np.minimum(a, b - ... ).sum().round(decimals=1) + >>> def histogram_intersection(a, b): + ... v = np.minimum(a, b).sum().round(decimals=1) + ... 
return v >>> s1 = pd.Series([.2, .0, .6, .2]) >>> s2 = pd.Series([.3, .6, .0, .1]) >>> s1.corr(s2, method=histogram_intersection) @@ -2115,14 +2169,22 @@ def cov(self, other, min_periods=None): Parameters ---------- other : Series + Series with which to compute the covariance. min_periods : int, optional - Minimum number of observations needed to have a valid result + Minimum number of observations needed to have a valid result. Returns ------- - covariance : float + float + Covariance between Series and other normalized by N-1 + (unbiased estimator). - Normalized by N-1 (unbiased estimator). + Examples + -------- + >>> s1 = pd.Series([0.90010907, 0.13484424, 0.62036035]) + >>> s2 = pd.Series([0.12528585, 0.26962463, 0.51111198]) + >>> s1.cov(s2) + -0.01685762652715874 """ this, other = self.align(other, join='inner', copy=False) if len(this) == 0: @@ -2145,7 +2207,8 @@ def diff(self, periods=1): Returns ------- - diffed : Series + Series + First differences of the Series. See Also -------- @@ -2279,7 +2342,7 @@ def dot(self, other): 8 >>> s @ other 8 - >>> df = pd.DataFrame([[0 ,1], [-2, 3], [4, -5], [6, 7]]) + >>> df = pd.DataFrame([[0, 1], [-2, 3], [4, -5], [6, 7]]) >>> s.dot(df) 0 24 1 14 @@ -2331,12 +2394,8 @@ def __rmatmul__(self, other): @Substitution(klass='Series') @Appender(base._shared_docs['searchsorted']) def searchsorted(self, value, side='left', sorter=None): - if sorter is not None: - sorter = ensure_platform_int(sorter) - result = self._values.searchsorted(Series(value)._values, - side=side, sorter=sorter) - - return result[0] if is_scalar(value) else result + return algorithms.searchsorted(self._values, value, + side=side, sorter=sorter) # ------------------------------------------------------------------- # Combination @@ -2348,17 +2407,19 @@ def append(self, to_append, ignore_index=False, verify_integrity=False): Parameters ---------- to_append : Series or list/tuple of Series - ignore_index : boolean, default False + Series to append with self. + ignore_index : bool, default False If True, do not use the index labels. .. versionadded:: 0.19.0 - verify_integrity : boolean, default False - If True, raise Exception on creating index with duplicates + verify_integrity : bool, default False + If True, raise Exception on creating index with duplicates. Returns ------- - appended : Series + Series + Concatenated Series. See Also -------- @@ -2376,7 +2437,7 @@ def append(self, to_append, ignore_index=False, verify_integrity=False): -------- >>> s1 = pd.Series([1, 2, 3]) >>> s2 = pd.Series([4, 5, 6]) - >>> s3 = pd.Series([4, 5, 6], index=[3,4,5]) + >>> s3 = pd.Series([4, 5, 6], index=[3, 4, 5]) >>> s1.append(s2) 0 1 1 2 @@ -2439,7 +2500,7 @@ def _binop(self, other, func, level=None, fill_value=None): Returns ------- - combined : Series + Series """ if not isinstance(other, Series): raise AssertionError('Other operand must be Series') @@ -2862,7 +2923,7 @@ def sort_index(self, axis=0, level=None, ascending=True, inplace=False, Returns ------- - pandas.Series + Series The original Series sorted by the labels. See Also @@ -2987,7 +3048,7 @@ def sort_index(self, axis=0, level=None, ascending=True, inplace=False, def argsort(self, axis=0, kind='quicksort', order=None): """ - Overrides ndarray.argsort. Argsorts the value, omitting NA/null values, + Override ndarray.argsort. Argsorts the value, omitting NA/null values, and places the result in the same locations as the non-NA values. 
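A quick illustration of the NA placement described in the reworded summary above; the expected output is based on the 0.24-era implementation and may differ in later versions:

    import numpy as np
    import pandas as pd

    s = pd.Series([3.0, np.nan, 1.0])
    print(s.argsort())
    # -1 marks the NaN slot; the remaining entries are sort positions
    # among the non-NA values:
    # 0    1
    # 1   -1
    # 2    0
    # dtype: int64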
Parameters @@ -3002,7 +3063,9 @@ def argsort(self, axis=0, kind='quicksort', order=None): Returns ------- - argsorted : Series, with -1 indicated where nan values are present + Series + Positions of values within the sort order with -1 indicating + nan values. See Also -------- @@ -3035,8 +3098,10 @@ def nlargest(self, n=5, keep='first'): When there are duplicate values that cannot all fit in a Series of `n` elements: - - ``first`` : take the first occurrences based on the index order - - ``last`` : take the last occurrences based on the index order + - ``first`` : return the first `n` occurrences in order + of appearance. + - ``last`` : return the last `n` occurrences in reverse + order of appearance. - ``all`` : keep all occurrences. This can result in a Series of size larger than `n`. @@ -3131,8 +3196,10 @@ def nsmallest(self, n=5, keep='first'): When there are duplicate values that cannot all fit in a Series of `n` elements: - - ``first`` : take the first occurrences based on the index order - - ``last`` : take the last occurrences based on the index order + - ``first`` : return the first `n` occurrences in order + of appearance. + - ``last`` : return the last `n` occurrences in reverse + order of appearance. - ``all`` : keep all occurrences. This can result in a Series of size larger than `n`. @@ -3173,7 +3240,7 @@ def nsmallest(self, n=5, keep='first'): Monserat 5200 dtype: int64 - The `n` largest elements where ``n=5`` by default. + The `n` smallest elements where ``n=5`` by default. >>> s.nsmallest() Monserat 5200 @@ -3220,12 +3287,13 @@ def swaplevel(self, i=-2, j=-1, copy=True): Parameters ---------- - i, j : int, string (can be mixed) + i, j : int, str (can be mixed) Level of index to be swapped. Can pass level name as string. Returns ------- - swapped : Series + Series + Series with levels swapped in MultiIndex. .. versionchanged:: 0.18.1 @@ -3265,21 +3333,23 @@ def unstack(self, level=-1, fill_value=None): Parameters ---------- - level : int, string, or list of these, default last level - Level(s) to unstack, can pass level name - fill_value : replace NaN with this value if the unstack produces - missing values + level : int, str, or list of these, default last level + Level(s) to unstack, can pass level name. + fill_value : scalar value, default None + Value to use when replacing NaN values. .. versionadded:: 0.18.0 Returns ------- - unstacked : DataFrame + DataFrame + Unstacked Series. Examples -------- >>> s = pd.Series([1, 2, 3, 4], - ... index=pd.MultiIndex.from_product([['one', 'two'], ['a', 'b']])) + ... index=pd.MultiIndex.from_product([['one', 'two'], + ... ['a', 'b']])) >>> s one a 1 b 2 @@ -3610,8 +3680,12 @@ def _reduce(self, op, name, axis=0, skipna=True, numeric_only=None, if axis is not None: self._get_axis_number(axis) - # dispatch to ExtensionArray interface - if isinstance(delegate, ExtensionArray): + if isinstance(delegate, Categorical): + # TODO deprecate numeric_only argument for Categorical and use + # skipna as well, see GH25303 + return delegate._reduce(name, numeric_only=numeric_only, **kwds) + elif isinstance(delegate, ExtensionArray): + # dispatch to ExtensionArray interface return delegate._reduce(name, skipna=skipna, **kwds) elif is_datetime64_dtype(delegate): # use DatetimeIndex implementation to handle skipna correctly @@ -3679,7 +3753,7 @@ def rename(self, index=None, **kwargs): Scalar or hashable sequence-like will alter the ``Series.name`` attribute. copy : bool, default True - Also copy underlying data + Whether to copy underlying data. 
inplace : bool, default False Whether to return a new Series. If True then value of copy is ignored. @@ -3689,11 +3763,12 @@ def rename(self, index=None, **kwargs): Returns ------- - renamed : Series (new object) + Series + Series with index labels or name altered. See Also -------- - Series.rename_axis + Series.rename_axis : Set the name of the axis. Examples -------- @@ -3703,7 +3778,7 @@ def rename(self, index=None, **kwargs): 1 2 2 3 dtype: int64 - >>> s.rename("my_name") # scalar, changes Series.name + >>> s.rename("my_name") # scalar, changes Series.name 0 1 1 2 2 3 @@ -3762,7 +3837,8 @@ def drop(self, labels=None, axis=0, index=None, columns=None, Returns ------- - dropped : pandas.Series + Series + Series with specified index labels removed. Raises ------ @@ -3778,7 +3854,7 @@ def drop(self, labels=None, axis=0, index=None, columns=None, Examples -------- - >>> s = pd.Series(data=np.arange(3), index=['A','B','C']) + >>> s = pd.Series(data=np.arange(3), index=['A', 'B', 'C']) >>> s A 0 B 1 @@ -3787,7 +3863,7 @@ def drop(self, labels=None, axis=0, index=None, columns=None, Drop labels B en C - >>> s.drop(labels=['B','C']) + >>> s.drop(labels=['B', 'C']) A 0 dtype: int64 @@ -3960,7 +4036,8 @@ def isin(self, values): Returns ------- - isin : Series (bool dtype) + Series + Series of booleans indicating if each element is in values. Raises ------ @@ -4019,7 +4096,8 @@ def between(self, left, right, inclusive=True): Returns ------- Series - Each element will be a boolean. + Series representing whether each element is between left and + right (inclusive). See Also -------- @@ -4101,27 +4179,27 @@ def from_csv(cls, path, sep=',', parse_dates=True, header=None, Parameters ---------- - path : string file path or file handle / StringIO - sep : string, default ',' - Field delimiter - parse_dates : boolean, default True - Parse dates. Different default from read_table + path : str, file path, or file handle / StringIO + sep : str, default ',' + Field delimiter. + parse_dates : bool, default True + Parse dates. Different default from read_table. header : int, default None - Row to use as header (skip prior rows) + Row to use as header (skip prior rows). index_col : int or sequence, default 0 Column to use for index. If a sequence is given, a MultiIndex - is used. Different default from read_table - encoding : string, optional - a string representing the encoding to use if the contents are - non-ascii, for python versions prior to 3 - infer_datetime_format : boolean, default False + is used. Different default from read_table. + encoding : str, optional + A string representing the encoding to use if the contents are + non-ascii, for python versions prior to 3. + infer_datetime_format : bool, default False If True and `parse_dates` is True for a column, try to infer the datetime format based on the first datetime string. If the format can be inferred, there often will be a large parsing speed-up. Returns ------- - y : Series + Series See Also -------- @@ -4322,19 +4400,21 @@ def valid(self, inplace=False, **kwargs): def to_timestamp(self, freq=None, how='start', copy=True): """ - Cast to datetimeindex of timestamps, at *beginning* of period. + Cast to DatetimeIndex of Timestamps, at *beginning* of period. Parameters ---------- - freq : string, default frequency of PeriodIndex - Desired frequency + freq : str, default frequency of PeriodIndex + Desired frequency. how : {'s', 'e', 'start', 'end'} Convention for converting period to timestamp; start of period - vs. end + vs. end. 
+        copy : bool, default True
+            Whether or not to return a copy.

        Returns
        -------
-        ts : Series with DatetimeIndex
+        Series with DatetimeIndex
        """
        new_values = self._values
        if copy:
@@ -4351,11 +4431,15 @@ def to_period(self, freq=None, copy=True):

        Parameters
        ----------
-        freq : string, default
+        freq : str, default None
+            Frequency associated with the PeriodIndex.
+        copy : bool, default True
+            Whether or not to return a copy.

        Returns
        -------
-        ts : Series with PeriodIndex
+        Series
+            Series with index converted to PeriodIndex.
        """
        new_values = self._values
        if copy:
diff --git a/pandas/core/sparse/frame.py b/pandas/core/sparse/frame.py
index 586193fe11850..2d54b82a3c844 100644
--- a/pandas/core/sparse/frame.py
+++ b/pandas/core/sparse/frame.py
@@ -124,8 +124,8 @@ def __init__(self, data=None, index=None, columns=None, default_kind=None,
                columns = Index([])
            else:
                for c in columns:
-                    data[c] = SparseArray(np.nan, index=index,
-                                          kind=self._default_kind,
+                    data[c] = SparseArray(self._default_fill_value,
+                                          index=index, kind=self._default_kind,
                                          fill_value=self._default_fill_value)
            mgr = to_manager(data, columns, index)
            if dtype is not None:
@@ -194,7 +194,9 @@ def sp_maker(x):
        return to_manager(sdict, columns, index)

    def _init_matrix(self, data, index, columns, dtype=None):
-        """ Init self from ndarray or list of lists """
+        """
+        Init self from ndarray or list of lists.
+        """
        data = prep_ndarray(data, copy=False)
        index, columns = self._prep_index(data, index, columns)
        data = {idx: data[:, i] for i, idx in enumerate(columns)}
@@ -202,7 +204,9 @@ def _init_matrix(self, data, index, columns, dtype=None):

    def _init_spmatrix(self, data, index, columns, dtype=None,
                       fill_value=None):
-        """ Init self from scipy.sparse matrix """
+        """
+        Init self from scipy.sparse matrix.
+        """
        index, columns = self._prep_index(data, index, columns)
        data = data.tocoo()
        N = len(index)
@@ -302,7 +306,9 @@ def __getstate__(self):
                    _default_kind=self._default_kind)

    def _unpickle_sparse_frame_compat(self, state):
-        """ original pickle format """
+        """
+        Original pickle format.
+        """
        series, cols, idx, fv, kind = state

        if not isinstance(cols, Index):  # pragma: no cover
@@ -338,7 +344,9 @@ def to_dense(self):
        return DataFrame(data, index=self.index, columns=self.columns)

    def _apply_columns(self, func):
-        """ get new SparseDataFrame applying func to each columns """
+        """
+        Get new SparseDataFrame applying func to each column.
+        """
        new_data = {col: func(series)
                    for col, series in compat.iteritems(self)}
diff --git a/pandas/core/sparse/scipy_sparse.py b/pandas/core/sparse/scipy_sparse.py
index 2d0ce2d5e5951..5a39a1529a33a 100644
--- a/pandas/core/sparse/scipy_sparse.py
+++ b/pandas/core/sparse/scipy_sparse.py
@@ -3,7 +3,9 @@
Currently only includes SparseSeries.to_coo helpers.
"""

-from pandas.compat import OrderedDict, lmap
+from collections import OrderedDict
+
+from pandas.compat import lmap

 from pandas.core.index import Index, MultiIndex
 from pandas.core.series import Series
@@ -90,7 +92,8 @@ def _get_index_subset_to_coord_dict(index, subset, sort_labels=False):

 def _sparse_series_to_coo(ss, row_levels=(0, ), column_levels=(1, ),
                           sort_labels=False):
-    """ Convert a SparseSeries to a scipy.sparse.coo_matrix using index
+    """
+    Convert a SparseSeries to a scipy.sparse.coo_matrix using index
     levels row_levels, column_levels as the row and column labels respectively.
     Returns the sparse_matrix, row and column labels.
""" @@ -116,7 +119,8 @@ def _sparse_series_to_coo(ss, row_levels=(0, ), column_levels=(1, ), def _coo_to_sparse_series(A, dense_index=False): - """ Convert a scipy.sparse.coo_matrix to a SparseSeries. + """ + Convert a scipy.sparse.coo_matrix to a SparseSeries. Use the defaults given in the SparseSeries constructor. """ s = Series(A.data, MultiIndex.from_arrays((A.row, A.col))) diff --git a/pandas/core/strings.py b/pandas/core/strings.py index ca79dcd9408d8..9577b07360f65 100644 --- a/pandas/core/strings.py +++ b/pandas/core/strings.py @@ -120,7 +120,7 @@ def str_count(arr, pat, flags=0): Returns ------- - counts : Series or Index + Series or Index Same type as the calling object containing the integer counts. See Also @@ -283,7 +283,7 @@ def str_contains(arr, pat, case=True, flags=0, na=np.nan, regex=True): return `True`. However, '.0' as a regex matches any character followed by a 0. - >>> s2 = pd.Series(['40','40.0','41','41.0','35']) + >>> s2 = pd.Series(['40', '40.0', '41', '41.0', '35']) >>> s2.str.contains('.0', regex=True) 0 True 1 True @@ -433,13 +433,13 @@ def str_replace(arr, pat, repl, n=-1, case=None, flags=0, regex=True): Parameters ---------- - pat : string or compiled regex + pat : str or compiled regex String can be a character sequence or regular expression. .. versionadded:: 0.20.0 `pat` also accepts a compiled regex. - repl : string or callable + repl : str or callable Replacement string or a callable. The callable is passed the regex match object and must return a replacement string to be used. See :func:`re.sub`. @@ -448,15 +448,15 @@ def str_replace(arr, pat, repl, n=-1, case=None, flags=0, regex=True): `repl` also accepts a callable. n : int, default -1 (all) - Number of replacements to make from start - case : boolean, default None + Number of replacements to make from start. + case : bool, default None - If True, case sensitive (the default if `pat` is a string) - Set to False for case insensitive - Cannot be set if `pat` is a compiled regex flags : int, default 0 (no flags) - re module flags, e.g. re.IGNORECASE - Cannot be set if `pat` is a compiled regex - regex : boolean, default True + regex : bool, default True - If True, assumes the passed-in pattern is a regular expression. - If False, treats the pattern as a literal string - Cannot be set to False if `pat` is a compiled regex or `repl` is @@ -537,6 +537,7 @@ def str_replace(arr, pat, repl, n=-1, case=None, flags=0, regex=True): Using a compiled regex with flags + >>> import re >>> regex_pat = re.compile(r'FUZ', flags=re.IGNORECASE) >>> pd.Series(['foo', 'fuz', np.nan]).str.replace(regex_pat, 'bar') 0 foo @@ -604,6 +605,7 @@ def str_repeat(arr, repeats): 0 a 1 b 2 c + dtype: object Single int repeats string in Series @@ -611,6 +613,7 @@ def str_repeat(arr, repeats): 0 aa 1 bb 2 cc + dtype: object Sequence of int repeats corresponding string in Series @@ -618,6 +621,7 @@ def str_repeat(arr, repeats): 0 a 1 bb 2 ccc + dtype: object """ if is_scalar(repeats): def rep(x): @@ -646,13 +650,14 @@ def str_match(arr, pat, case=True, flags=0, na=np.nan): Parameters ---------- - pat : string - Character sequence or regular expression - case : boolean, default True - If True, case sensitive + pat : str + Character sequence or regular expression. + case : bool, default True + If True, case sensitive. flags : int, default 0 (no flags) - re module flags, e.g. re.IGNORECASE - na : default NaN, fill value for missing values + re module flags, e.g. re.IGNORECASE. + na : default NaN + Fill value for missing values. 
Returns ------- @@ -768,7 +773,7 @@ def str_extract(arr, pat, flags=0, expand=True): Parameters ---------- - pat : string + pat : str Regular expression pattern with capturing groups. flags : int, default 0 (no flags) Flags from the ``re`` module, e.g. ``re.IGNORECASE``, that @@ -966,21 +971,23 @@ def str_extractall(arr, pat, flags=0): def str_get_dummies(arr, sep='|'): """ - Split each string in the Series by sep and return a frame of - dummy/indicator variables. + Split each string in the Series by sep and return a DataFrame + of dummy/indicator variables. Parameters ---------- - sep : string, default "|" + sep : str, default "|" String to split on. Returns ------- - dummies : DataFrame + DataFrame + Dummy variables corresponding to values of the Series. See Also -------- - get_dummies + get_dummies : Convert categorical variable into dummy/indicator + variables. Examples -------- @@ -1089,11 +1096,11 @@ def str_findall(arr, pat, flags=0): Parameters ---------- - pat : string + pat : str Pattern or regular expression. flags : int, default 0 - ``re`` module flags, e.g. `re.IGNORECASE` (default is 0, which means - no flags). + Flags from ``re`` module, e.g. `re.IGNORECASE` (default is 0, which + means no flags). Returns ------- @@ -1182,17 +1189,18 @@ def str_find(arr, sub, start=0, end=None, side='left'): Parameters ---------- sub : str - Substring being searched + Substring being searched. start : int - Left edge index + Left edge index. end : int - Right edge index + Right edge index. side : {'left', 'right'}, default 'left' - Specifies a starting side, equivalent to ``find`` or ``rfind`` + Specifies a starting side, equivalent to ``find`` or ``rfind``. Returns ------- - found : Series/Index of integer values + Series or Index + Indexes where substring is found. """ if not isinstance(sub, compat.string_types): @@ -1430,7 +1438,7 @@ def str_slice_replace(arr, start=None, stop=None, repl=None): Returns ------- - replaced : Series or Index + Series or Index Same type as the original object. See Also @@ -1513,7 +1521,7 @@ def str_strip(arr, to_strip=None, side='both'): Returns ------- - stripped : Series/Index of objects + Series or Index """ if side == 'both': f = lambda x: x.strip(to_strip) @@ -1537,30 +1545,30 @@ def str_wrap(arr, width, **kwargs): Parameters ---------- width : int - Maximum line-width + Maximum line width. expand_tabs : bool, optional - If true, tab characters will be expanded to spaces (default: True) + If True, tab characters will be expanded to spaces (default: True). replace_whitespace : bool, optional - If true, each whitespace character (as defined by string.whitespace) + If True, each whitespace character (as defined by string.whitespace) remaining after tab expansion will be replaced by a single space - (default: True) + (default: True). drop_whitespace : bool, optional - If true, whitespace that, after wrapping, happens to end up at the - beginning or end of a line is dropped (default: True) + If True, whitespace that, after wrapping, happens to end up at the + beginning or end of a line is dropped (default: True). break_long_words : bool, optional - If true, then words longer than width will be broken in order to ensure + If True, then words longer than width will be broken in order to ensure that no lines are longer than width. If it is false, long words will - not be broken, and some lines may be longer than width. (default: True) + not be broken, and some lines may be longer than width (default: True). 
break_on_hyphens : bool, optional - If true, wrapping will occur preferably on whitespace and right after + If True, wrapping will occur preferably on whitespace and right after hyphens in compound words, as it is customary in English. If false, only whitespaces will be considered as potentially good places for line breaks, but you need to set break_long_words to false if you want truly - insecable words. (default: True) + insecable words (default: True). Returns ------- - wrapped : Series/Index of objects + Series or Index Notes ----- @@ -1581,6 +1589,7 @@ def str_wrap(arr, width, **kwargs): >>> s.str.wrap(12) 0 line to be\nwrapped 1 another line\nto be\nwrapped + dtype: object """ kwargs['width'] = width @@ -1613,7 +1622,7 @@ def str_translate(arr, table, deletechars=None): Returns ------- - translated : Series/Index of objects + Series or Index """ if deletechars is None: f = lambda x: x.translate(table) @@ -1641,15 +1650,16 @@ def str_get(arr, i): Returns ------- - items : Series/Index of objects + Series or Index Examples -------- >>> s = pd.Series(["String", - (1, 2, 3), - ["a", "b", "c"], - 123, -456, - {1:"Hello", "2":"World"}]) + ... (1, 2, 3), + ... ["a", "b", "c"], + ... 123, + ... -456, + ... {1: "Hello", "2": "World"}]) >>> s 0 String 1 (1, 2, 3) @@ -1674,7 +1684,7 @@ def str_get(arr, i): 2 c 3 NaN 4 NaN - 5 NaN + 5 None dtype: object """ def f(x): @@ -1699,7 +1709,7 @@ def str_decode(arr, encoding, errors="strict"): Returns ------- - decoded : Series/Index of objects + Series or Index """ if encoding in _cpython_optimized_decoders: # CPython optimized implementation @@ -1872,7 +1882,7 @@ def _wrap_result(self, result, use_codes=True, if expand is None: # infer from ndim if expand is not specified - expand = False if result.ndim == 1 else True + expand = result.ndim != 1 elif expand is True and not isinstance(self._orig, Index): # required when expand=True is explicitly specified @@ -2091,7 +2101,7 @@ def cat(self, others=None, sep=None, na_rep=None, join=None): Returns ------- - concat : str or Series/Index of objects + str, Series or Index If `others` is None, `str` is returned, otherwise a `Series/Index` (same type as caller) of objects is returned. @@ -2869,7 +2879,7 @@ def rindex(self, sub, start=0, end=None): return self._wrap_result(result) _shared_docs['len'] = (""" - Computes the length of each element in the Series/Index. The element may be + Compute the length of each element in the Series/Index. The element may be a sequence (such as a string, tuple or list) or a collection (such as a dictionary). @@ -2916,7 +2926,7 @@ def rindex(self, sub, start=0, end=None): _shared_docs['casemethods'] = (""" Convert strings in the Series/Index to %(type)s. - + %(version)s Equivalent to :meth:`str.%(method)s`. Returns @@ -2933,6 +2943,7 @@ def rindex(self, sub, start=0, end=None): remaining to lowercase. Series.str.swapcase : Converts uppercase to lowercase and lowercase to uppercase. + Series.str.casefold: Removes all case distinctions in the string. 
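Since `casefold` is the new entry in this See Also block (added for 0.25.0 by this patch), a short sketch of how it differs from `lower`, assuming a pandas version that ships the method:

    import pandas as pd

    s = pd.Series(['Groß', 'HELLO'])
    print(s.str.lower())     # 'groß', 'hello': 'ß' is already lowercase
    print(s.str.casefold())  # 'gross', 'hello': casefolding expands 'ß' to 'ss'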
Examples -------- @@ -2979,12 +2990,15 @@ def rindex(self, sub, start=0, end=None): 3 sWaPcAsE dtype: object """) - _shared_docs['lower'] = dict(type='lowercase', method='lower') - _shared_docs['upper'] = dict(type='uppercase', method='upper') - _shared_docs['title'] = dict(type='titlecase', method='title') + _shared_docs['lower'] = dict(type='lowercase', method='lower', version='') + _shared_docs['upper'] = dict(type='uppercase', method='upper', version='') + _shared_docs['title'] = dict(type='titlecase', method='title', version='') _shared_docs['capitalize'] = dict(type='be capitalized', - method='capitalize') - _shared_docs['swapcase'] = dict(type='be swapcased', method='swapcase') + method='capitalize', version='') + _shared_docs['swapcase'] = dict(type='be swapcased', method='swapcase', + version='') + _shared_docs['casefold'] = dict(type='be casefolded', method='casefold', + version='\n .. versionadded:: 0.25.0\n') lower = _noarg_wrapper(lambda x: x.lower(), docstring=_shared_docs['casemethods'] % _shared_docs['lower']) @@ -3000,6 +3014,9 @@ def rindex(self, sub, start=0, end=None): swapcase = _noarg_wrapper(lambda x: x.swapcase(), docstring=_shared_docs['casemethods'] % _shared_docs['swapcase']) + casefold = _noarg_wrapper(lambda x: x.casefold(), + docstring=_shared_docs['casemethods'] % + _shared_docs['casefold']) _shared_docs['ismethods'] = (""" Check whether all characters in each string are %(type)s. diff --git a/pandas/core/tools/datetimes.py b/pandas/core/tools/datetimes.py index e6478da400d76..0c76ac6cd75ac 100644 --- a/pandas/core/tools/datetimes.py +++ b/pandas/core/tools/datetimes.py @@ -497,8 +497,8 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, See Also -------- - pandas.DataFrame.astype : Cast argument to a specified dtype. - pandas.to_timedelta : Convert argument to timedelta. + DataFrame.astype : Cast argument to a specified dtype. + to_timedelta : Convert argument to timedelta. Examples -------- @@ -588,9 +588,8 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, if not cache_array.empty: result = arg.map(cache_array) else: - from pandas import Series values = convert_listlike(arg._values, True, format) - result = Series(values, index=arg.index, name=arg.name) + result = arg._constructor(values, index=arg.index, name=arg.name) elif isinstance(arg, (ABCDataFrame, compat.MutableMapping)): result = _assemble_from_unit_mappings(arg, errors, box, tz) elif isinstance(arg, ABCIndexClass): @@ -827,7 +826,6 @@ def to_time(arg, format=None, infer_time_format=False, errors='raise'): ------- datetime.time """ - from pandas.core.series import Series def _convert_listlike(arg, format): @@ -892,9 +890,9 @@ def _convert_listlike(arg, format): return arg elif isinstance(arg, time): return arg - elif isinstance(arg, Series): + elif isinstance(arg, ABCSeries): values = _convert_listlike(arg._values, format) - return Series(values, index=arg.index, name=arg.name) + return arg._constructor(values, index=arg.index, name=arg.name) elif isinstance(arg, ABCIndexClass): return _convert_listlike(arg, format) elif is_list_like(arg): diff --git a/pandas/core/tools/numeric.py b/pandas/core/tools/numeric.py index 79d8ee38637f9..08ce649d8602c 100644 --- a/pandas/core/tools/numeric.py +++ b/pandas/core/tools/numeric.py @@ -19,6 +19,14 @@ def to_numeric(arg, errors='raise', downcast=None): depending on the data supplied. Use the `downcast` parameter to obtain other dtypes. 
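A brief sketch of the default dtype and the `downcast` behavior summarized above (values invented):

    import pandas as pd

    s = pd.Series(['1', '2', '3'])
    print(pd.to_numeric(s).dtype)                      # int64 by default
    print(pd.to_numeric(s, downcast='integer').dtype)  # int8, the smallest width that fits
    print(pd.to_numeric(s, downcast='float').dtype)    # float32
    print(pd.to_numeric(pd.Series(['1', 'spam']), errors='coerce'))  # unparseable -> NaN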
+    Please note that precision loss may occur if really large numbers
+    are passed in. Due to the internal limitations of `ndarray`, if
+    numbers smaller than `-9223372036854775808` (np.iinfo(np.int64).min)
+    or larger than `18446744073709551615` (np.iinfo(np.uint64).max) are
+    passed in, it is very likely they will be converted to float so that
+    they can be stored in an `ndarray`. These warnings apply similarly to
+    `Series` since it internally leverages `ndarray`.
+
    Parameters
    ----------
    arg : scalar, list, tuple, 1-d array, or Series
@@ -51,13 +59,13 @@ def to_numeric(arg, errors='raise', downcast=None):
    Returns
    -------
    ret : numeric if parsing succeeded.
-        Return type depends on input. Series if Series, otherwise ndarray
+        Return type depends on input. Series if Series, otherwise ndarray.

    See Also
    --------
-    pandas.DataFrame.astype : Cast argument to a specified dtype.
-    pandas.to_datetime : Convert argument to datetime.
-    pandas.to_timedelta : Convert argument to timedelta.
+    DataFrame.astype : Cast argument to a specified dtype.
+    to_datetime : Convert argument to datetime.
+    to_timedelta : Convert argument to timedelta.
    numpy.ndarray.astype : Cast a numpy array to a specified type.

    Examples
@@ -130,7 +138,7 @@ def to_numeric(arg, errors='raise', downcast=None):
            values = values.astype(np.int64)
        else:
            values = ensure_object(values)
-            coerce_numeric = False if errors in ('ignore', 'raise') else True
+            coerce_numeric = errors not in ('ignore', 'raise')

            values = lib.maybe_convert_numeric(values, set(),
                                               coerce_numeric=coerce_numeric)
diff --git a/pandas/core/tools/timedeltas.py b/pandas/core/tools/timedeltas.py
index e3428146b91d8..7ebaf3056e79e 100644
--- a/pandas/core/tools/timedeltas.py
+++ b/pandas/core/tools/timedeltas.py
@@ -2,14 +2,16 @@
timedelta support tools
"""

+import warnings
+
 import numpy as np

+from pandas._libs.tslibs import NaT
 from pandas._libs.tslibs.timedeltas import Timedelta, parse_timedelta_unit

 from pandas.core.dtypes.common import is_list_like
 from pandas.core.dtypes.generic import ABCIndexClass, ABCSeries

-import pandas as pd
 from pandas.core.arrays.timedeltas import sequence_to_td64ns
@@ -90,13 +92,17 @@ def to_timedelta(arg, unit='ns', box=True, errors='raise'):
        raise ValueError("errors must be one of 'ignore', "
                         "'raise', or 'coerce'}")

+    if unit in {'Y', 'y', 'M'}:
+        warnings.warn("M and Y units are deprecated and "
+                      "will be removed in a future version.",
+                      FutureWarning, stacklevel=2)
+
    if arg is None:
        return arg
    elif isinstance(arg, ABCSeries):
-        from pandas import Series
        values = _convert_listlike(arg._values, unit=unit,
                                   box=False, errors=errors)
-        return Series(values, index=arg.index, name=arg.name)
+        return arg._constructor(values, index=arg.index, name=arg.name)
    elif isinstance(arg, ABCIndexClass):
        return _convert_listlike(arg, unit=unit, box=box, errors=errors,
                                 name=arg.name)
@@ -120,7 +126,8 @@ def _coerce_scalar_to_timedelta_type(r, unit='ns', box=True, errors='raise'):
    try:
        result = Timedelta(r, unit)
        if not box:
-            result = result.asm8
+            # explicitly view as timedelta64 for case when result is pd.NaT
+            result = result.asm8.view('timedelta64[ns]')
    except ValueError:
        if errors == 'raise':
            raise
@@ -128,7 +135,7 @@ def _coerce_scalar_to_timedelta_type(r, unit='ns', box=True, errors='raise'):
            return r

        # coerce
-        result = pd.NaT
+        result = NaT

    return result
diff --git a/pandas/core/window.py b/pandas/core/window.py
index 5a9157b43ecd6..9e29fdb94c1e0 100644
--- a/pandas/core/window.py
+++ b/pandas/core/window.py
@@ -164,9 +164,9 @@ def __unicode__(self):
Provide a nice str repr of our rolling object.
        """
-        attrs = ["{k}={v}".format(k=k, v=getattr(self, k))
+        attrs = ("{k}={v}".format(k=k, v=getattr(self, k))
                 for k in self._attributes
-                 if getattr(self, k, None) is not None]
+                 if getattr(self, k, None) is not None)
        return "{klass} [{attrs}]".format(klass=self._window_type,
                                          attrs=','.join(attrs))
@@ -438,7 +438,7 @@ def aggregate(self, arg, *args, **kwargs):

 class Window(_Window):
    """
-    Provides rolling window calculations.
+    Provide rolling window calculations.

    .. versionadded:: 0.18.0
@@ -900,9 +900,9 @@ class _Rolling_and_Expanding(_Rolling):

    See Also
    --------
-    pandas.Series.%(name)s : Calling object with Series data.
-    pandas.DataFrame.%(name)s : Calling object with DataFrames.
-    pandas.DataFrame.count : Count of the full DataFrame.
+    Series.%(name)s : Calling object with Series data.
+    DataFrame.%(name)s : Calling object with DataFrames.
+    DataFrame.count : Count of the full DataFrame.

    Examples
    --------
@@ -1271,7 +1271,7 @@ def skew(self, **kwargs):
    -------
    Series or DataFrame
        Returned object type is determined by the caller of the %(name)s
-        calculation
+        calculation.

    See Also
    --------
@@ -1322,9 +1322,9 @@ def kurt(self, **kwargs):

    See Also
    --------
-    pandas.Series.quantile : Computes value at the given quantile over all data
+    Series.quantile : Computes value at the given quantile over all data
        in Series.
-    pandas.DataFrame.quantile : Computes values at the given quantile over
+    DataFrame.quantile : Computes values at the given quantile over
        requested axis in DataFrame.

    Examples
@@ -1626,8 +1626,8 @@ def _validate_freq(self):
    _agg_see_also_doc = dedent("""
    See Also
    --------
-    pandas.Series.rolling
-    pandas.DataFrame.rolling
+    Series.rolling
+    DataFrame.rolling
    """)

    _agg_examples_doc = dedent("""
@@ -1803,7 +1803,7 @@ def corr(self, other=None, pairwise=None, **kwargs):

 class RollingGroupby(_GroupByMixin, Rolling):
    """
-    Provides a rolling groupby implementation.
+    Provide a rolling groupby implementation.

    .. versionadded:: 0.18.1
@@ -1834,7 +1834,7 @@ def _validate_monotonic(self):

 class Expanding(_Rolling_and_Expanding):
    """
-    Provides expanding transformations.
+    Provide expanding transformations.

    .. versionadded:: 0.18.0
@@ -1916,9 +1916,9 @@ def _get_window(self, other=None):
    _agg_see_also_doc = dedent("""
    See Also
    --------
-    pandas.DataFrame.expanding.aggregate
-    pandas.DataFrame.rolling.aggregate
-    pandas.DataFrame.aggregate
+    DataFrame.expanding.aggregate
+    DataFrame.rolling.aggregate
+    DataFrame.aggregate
    """)

    _agg_examples_doc = dedent("""
@@ -2076,7 +2076,7 @@ def corr(self, other=None, pairwise=None, **kwargs):

 class ExpandingGroupby(_GroupByMixin, Expanding):
    """
-    Provides a expanding groupby implementation.
+    Provide an expanding groupby implementation.

    .. versionadded:: 0.18.1
@@ -2117,7 +2117,7 @@ def _constructor(self):

 class EWM(_Rolling):
    r"""
-    Provides exponential weighted functions.
+    Provide exponential weighted functions.

    .. versionadded:: 0.18.0
@@ -2125,16 +2125,17 @@ class EWM(_Rolling):
    ----------
    com : float, optional
        Specify decay in terms of center of mass,
-        :math:`\alpha = 1 / (1 + com),\text{ for } com \geq 0`
+        :math:`\alpha = 1 / (1 + com),\text{ for } com \geq 0`.
    span : float, optional
        Specify decay in terms of span,
-        :math:`\alpha = 2 / (span + 1),\text{ for } span \geq 1`
+        :math:`\alpha = 2 / (span + 1),\text{ for } span \geq 1`.
halflife : float, optional Specify decay in terms of half-life, - :math:`\alpha = 1 - exp(log(0.5) / halflife),\text{ for } halflife > 0` + :math:`\alpha = 1 - exp(log(0.5) / halflife),\text{ for } + halflife > 0`. alpha : float, optional Specify smoothing factor :math:`\alpha` directly, - :math:`0 < \alpha \leq 1` + :math:`0 < \alpha \leq 1`. .. versionadded:: 0.18.0 @@ -2143,14 +2144,19 @@ class EWM(_Rolling): (otherwise result is NA). adjust : bool, default True Divide by decaying adjustment factor in beginning periods to account - for imbalance in relative weightings (viewing EWMA as a moving average) + for imbalance in relative weightings + (viewing EWMA as a moving average). ignore_na : bool, default False Ignore missing values when calculating weights; - specify True to reproduce pre-0.15.0 behavior + specify True to reproduce pre-0.15.0 behavior. + axis : {0 or 'index', 1 or 'columns'}, default 0 + The axis to use. The value 0 identifies the rows, and 1 + identifies the columns. Returns ------- - a Window sub-classed for the particular operation + DataFrame + A Window sub-classed for the particular operation. See Also -------- @@ -2188,6 +2194,7 @@ class EWM(_Rolling): -------- >>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]}) + >>> df B 0 0.0 1 1.0 diff --git a/pandas/errors/__init__.py b/pandas/errors/__init__.py index eb6a4674a7497..7d5a7f1a99e41 100644 --- a/pandas/errors/__init__.py +++ b/pandas/errors/__init__.py @@ -9,10 +9,10 @@ class PerformanceWarning(Warning): """ - Warning raised when there is a possible - performance impact. + Warning raised when there is a possible performance impact. """ + class UnsupportedFunctionCall(ValueError): """ Exception raised when attempting to call a numpy function @@ -20,6 +20,7 @@ class UnsupportedFunctionCall(ValueError): the object e.g. ``np.cumsum(groupby_object)``. """ + class UnsortedIndexError(KeyError): """ Error raised when attempting to get a slice of a MultiIndex, @@ -31,7 +32,15 @@ class UnsortedIndexError(KeyError): class ParserError(ValueError): """ - Exception that is raised by an error encountered in `pd.read_csv`. + Exception that is raised by an error encountered in parsing file contents. + + This is a generic error raised for errors encountered when functions like + `read_csv` or `read_html` are parsing contents of a file. + + See Also + -------- + read_csv : Read CSV (comma-separated) file into a DataFrame. + read_html : Read HTML table into a DataFrame. """ @@ -45,8 +54,8 @@ class DtypeWarning(Warning): See Also -------- - pandas.read_csv : Read CSV (comma-separated) file into a DataFrame. - pandas.read_table : Read general delimited file into a DataFrame. + read_csv : Read CSV (comma-separated) file into a DataFrame. + read_table : Read general delimited file into a DataFrame. 
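To ground the warning described above: a column whose values change type partway through a file parses as mixed `object` data, and on files large enough to be read in several chunks that mix is what raises `DtypeWarning`. A small in-memory sketch (data invented; deliberately too small to trigger the warning itself):

    import io

    import pandas as pd

    csv = io.StringIO('a\n' + '1\n' * 5 + 'x\n' * 5)
    df = pd.read_csv(csv)
    print(df['a'].dtype)  # object: the column cannot be parsed as one numeric dtype
    # On a large, chunked read, per-chunk inference can leave genuinely mixed
    # types behind and emits DtypeWarning; passing an explicit dtype= or
    # low_memory=False to read_csv sidesteps that inference.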
Notes ----- @@ -180,4 +189,4 @@ def __str__(self): else: name = self.class_instance.__class__.__name__ msg = "This {methodtype} must be defined in the concrete class {name}" - return (msg.format(methodtype=self.methodtype, name=name)) + return msg.format(methodtype=self.methodtype, name=name) diff --git a/pandas/io/clipboard/windows.py b/pandas/io/clipboard/windows.py index 3d979a61b5f2d..4f5275af693b7 100644 --- a/pandas/io/clipboard/windows.py +++ b/pandas/io/clipboard/windows.py @@ -29,6 +29,7 @@ def init_windows_clipboard(): HINSTANCE, HMENU, BOOL, UINT, HANDLE) windll = ctypes.windll + msvcrt = ctypes.CDLL('msvcrt') safeCreateWindowExA = CheckedCall(windll.user32.CreateWindowExA) safeCreateWindowExA.argtypes = [DWORD, LPCSTR, LPCSTR, DWORD, INT, INT, @@ -71,6 +72,10 @@ def init_windows_clipboard(): safeGlobalUnlock.argtypes = [HGLOBAL] safeGlobalUnlock.restype = BOOL + wcslen = CheckedCall(msvcrt.wcslen) + wcslen.argtypes = [c_wchar_p] + wcslen.restype = UINT + GMEM_MOVEABLE = 0x0002 CF_UNICODETEXT = 13 @@ -129,13 +134,13 @@ def copy_windows(text): # If the hMem parameter identifies a memory object, # the object must have been allocated using the # function with the GMEM_MOVEABLE flag. - count = len(text) + 1 + count = wcslen(text) + 1 handle = safeGlobalAlloc(GMEM_MOVEABLE, count * sizeof(c_wchar)) locked_handle = safeGlobalLock(handle) - ctypes.memmove(c_wchar_p(locked_handle), - c_wchar_p(text), count * sizeof(c_wchar)) + ctypes.memmove(c_wchar_p(locked_handle), c_wchar_p(text), + count * sizeof(c_wchar)) safeGlobalUnlock(handle) safeSetClipboardData(CF_UNICODETEXT, handle) diff --git a/pandas/io/excel.py b/pandas/io/excel.py deleted file mode 100644 index 3d85ae7fd1f46..0000000000000 --- a/pandas/io/excel.py +++ /dev/null @@ -1,2027 +0,0 @@ -""" -Module parse to/from Excel -""" - -# --------------------------------------------------------------------- -# ExcelFile class -import abc -from datetime import date, datetime, time, timedelta -from distutils.version import LooseVersion -from io import UnsupportedOperation -import os -from textwrap import fill -import warnings - -import numpy as np - -import pandas._libs.json as json -import pandas.compat as compat -from pandas.compat import ( - OrderedDict, add_metaclass, lrange, map, range, string_types, u, zip) -from pandas.errors import EmptyDataError -from pandas.util._decorators import Appender, deprecate_kwarg - -from pandas.core.dtypes.common import ( - is_bool, is_float, is_integer, is_list_like) - -from pandas.core import config -from pandas.core.frame import DataFrame - -from pandas.io.common import ( - _NA_VALUES, _is_url, _stringify_path, _urlopen, _validate_header_arg, - get_filepath_or_buffer) -from pandas.io.formats.printing import pprint_thing -from pandas.io.parsers import TextParser - -__all__ = ["read_excel", "ExcelWriter", "ExcelFile"] - -_writer_extensions = ["xlsx", "xls", "xlsm"] -_writers = {} - -_read_excel_doc = """ -Read an Excel file into a pandas DataFrame. - -Support both `xls` and `xlsx` file extensions from a local filesystem or URL. -Support an option to read a single sheet or a list of sheets. - -Parameters ----------- -io : str, file descriptor, pathlib.Path, ExcelFile or xlrd.Book - The string could be a URL. Valid URL schemes include http, ftp, s3, - gcs, and file. For file URLs, a host is expected. For instance, a local - file could be /path/to/workbook.xlsx. -sheet_name : str, int, list, or None, default 0 - Strings are used for sheet names. Integers are used in zero-indexed - sheet positions. 
Lists of strings/integers are used to request - multiple sheets. Specify None to get all sheets. - - Available cases: - - * Defaults to ``0``: 1st sheet as a `DataFrame` - * ``1``: 2nd sheet as a `DataFrame` - * ``"Sheet1"``: Load sheet with name "Sheet1" - * ``[0, 1, "Sheet5"]``: Load first, second and sheet named "Sheet5" - as a dict of `DataFrame` - * None: All sheets. - -header : int, list of int, default 0 - Row (0-indexed) to use for the column labels of the parsed - DataFrame. If a list of integers is passed those row positions will - be combined into a ``MultiIndex``. Use None if there is no header. -names : array-like, default None - List of column names to use. If file contains no header row, - then you should explicitly pass header=None. -index_col : int, list of int, default None - Column (0-indexed) to use as the row labels of the DataFrame. - Pass None if there is no such column. If a list is passed, - those columns will be combined into a ``MultiIndex``. If a - subset of data is selected with ``usecols``, index_col - is based on the subset. -parse_cols : int or list, default None - Alias of `usecols`. - - .. deprecated:: 0.21.0 - Use `usecols` instead. - -usecols : int, str, list-like, or callable default None - Return a subset of the columns. - * If None, then parse all columns. - * If int, then indicates last column to be parsed. - - .. deprecated:: 0.24.0 - Pass in a list of int instead from 0 to `usecols` inclusive. - - * If str, then indicates comma separated list of Excel column letters - and column ranges (e.g. "A:E" or "A,C,E:F"). Ranges are inclusive of - both sides. - * If list of int, then indicates list of column numbers to be parsed. - * If list of string, then indicates list of column names to be parsed. - - .. versionadded:: 0.24.0 - - * If callable, then evaluate each column name against it and parse the - column if the callable returns ``True``. - - .. versionadded:: 0.24.0 - -squeeze : bool, default False - If the parsed data only contains one column then return a Series. -dtype : Type name or dict of column -> type, default None - Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32} - Use `object` to preserve data as stored in Excel and not interpret dtype. - If converters are specified, they will be applied INSTEAD - of dtype conversion. - - .. versionadded:: 0.20.0 - -engine : str, default None - If io is not a buffer or path, this must be set to identify io. - Acceptable values are None or xlrd. -converters : dict, default None - Dict of functions for converting values in certain columns. Keys can - either be integers or column labels, values are functions that take one - input argument, the Excel cell content, and return the transformed - content. -true_values : list, default None - Values to consider as True. - - .. versionadded:: 0.19.0 - -false_values : list, default None - Values to consider as False. - - .. versionadded:: 0.19.0 - -skiprows : list-like - Rows to skip at the beginning (0-indexed). -nrows : int, default None - Number of rows to parse. - - .. versionadded:: 0.23.0 - -na_values : scalar, str, list-like, or dict, default None - Additional strings to recognize as NA/NaN. If dict passed, specific - per-column NA values. By default the following values are interpreted - as NaN: '""" + fill("', '".join(sorted(_NA_VALUES)), 70, subsequent_indent=" ") + """'. -keep_default_na : bool, default True - If na_values are specified and keep_default_na is False the default NaN - values are overridden, otherwise they're appended to. 
-verbose : bool, default False - Indicate number of NA values placed in non-numeric columns. -parse_dates : bool, list-like, or dict, default False - The behavior is as follows: - - * bool. If True -> try parsing the index. - * list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 - each as a separate date column. - * list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as - a single date column. - * dict, e.g. {{'foo' : [1, 3]}} -> parse columns 1, 3 as date and call - result 'foo' - - If a column or index contains an unparseable date, the entire column or - index will be returned unaltered as an object data type. For non-standard - datetime parsing, use ``pd.to_datetime`` after ``pd.read_csv`` - - Note: A fast-path exists for iso8601-formatted dates. -date_parser : function, optional - Function to use for converting a sequence of string columns to an array of - datetime instances. The default uses ``dateutil.parser.parser`` to do the - conversion. Pandas will try to call `date_parser` in three different ways, - advancing to the next if an exception occurs: 1) Pass one or more arrays - (as defined by `parse_dates`) as arguments; 2) concatenate (row-wise) the - string values from the columns defined by `parse_dates` into a single array - and pass that; and 3) call `date_parser` once for each row using one or - more strings (corresponding to the columns defined by `parse_dates`) as - arguments. -thousands : str, default None - Thousands separator for parsing string columns to numeric. Note that - this parameter is only necessary for columns stored as TEXT in Excel, - any numeric columns will automatically be parsed, regardless of display - format. -comment : str, default None - Comments out remainder of line. Pass a character or characters to this - argument to indicate comments in the input file. Any data between the - comment string and the end of the current line is ignored. -skip_footer : int, default 0 - Alias of `skipfooter`. - - .. deprecated:: 0.23.0 - Use `skipfooter` instead. -skipfooter : int, default 0 - Rows at the end to skip (0-indexed). -convert_float : bool, default True - Convert integral floats to int (i.e., 1.0 --> 1). If False, all numeric - data will be read in as floats: Excel stores all numbers as floats - internally. -mangle_dupe_cols : bool, default True - Duplicate columns will be specified as 'X', 'X.1', ...'X.N', rather than - 'X'...'X'. Passing in False will cause data to be overwritten if there - are duplicate names in the columns. -**kwds : optional - Optional keyword arguments can be passed to ``TextFileReader``. - -Returns -------- -DataFrame or dict of DataFrames - DataFrame from the passed in Excel file. See notes in sheet_name - argument for more information on when a dict of DataFrames is returned. - -See Also --------- -to_excel : Write DataFrame to an Excel file. -to_csv : Write DataFrame to a comma-separated values (csv) file. -read_csv : Read a comma-separated values (csv) file into DataFrame. -read_fwf : Read a table of fixed-width formatted lines into DataFrame. - -Examples --------- -The file can be read using the file name as string or an open file object: - ->>> pd.read_excel('tmp.xlsx', index_col=0) # doctest: +SKIP - Name Value -0 string1 1 -1 string2 2 -2 #Comment 3 - ->>> pd.read_excel(open('tmp.xlsx', 'rb'), -... 
sheet_name='Sheet3') # doctest: +SKIP - Unnamed: 0 Name Value -0 0 string1 1 -1 1 string2 2 -2 2 #Comment 3 - -Index and header can be specified via the `index_col` and `header` arguments - ->>> pd.read_excel('tmp.xlsx', index_col=None, header=None) # doctest: +SKIP - 0 1 2 -0 NaN Name Value -1 0.0 string1 1 -2 1.0 string2 2 -3 2.0 #Comment 3 - -Column types are inferred but can be explicitly specified - ->>> pd.read_excel('tmp.xlsx', index_col=0, -... dtype={'Name': str, 'Value': float}) # doctest: +SKIP - Name Value -0 string1 1.0 -1 string2 2.0 -2 #Comment 3.0 - -True, False, and NA values, and thousands separators have defaults, -but can be explicitly specified, too. Supply the values you would like -as strings or lists of strings! - ->>> pd.read_excel('tmp.xlsx', index_col=0, -... na_values=['string1', 'string2']) # doctest: +SKIP - Name Value -0 NaN 1 -1 NaN 2 -2 #Comment 3 - -Comment lines in the excel input file can be skipped using the `comment` kwarg - ->>> pd.read_excel('tmp.xlsx', index_col=0, comment='#') # doctest: +SKIP - Name Value -0 string1 1.0 -1 string2 2.0 -2 None NaN -""" - - -def register_writer(klass): - """Adds engine to the excel writer registry. You must use this method to - integrate with ``to_excel``. Also adds config options for any new - ``supported_extensions`` defined on the writer.""" - if not compat.callable(klass): - raise ValueError("Can only register callables as engines") - engine_name = klass.engine - _writers[engine_name] = klass - for ext in klass.supported_extensions: - if ext.startswith('.'): - ext = ext[1:] - if ext not in _writer_extensions: - config.register_option("io.excel.{ext}.writer".format(ext=ext), - engine_name, validator=str) - _writer_extensions.append(ext) - - -def _get_default_writer(ext): - _default_writers = {'xlsx': 'openpyxl', 'xlsm': 'openpyxl', 'xls': 'xlwt'} - try: - import xlsxwriter # noqa - _default_writers['xlsx'] = 'xlsxwriter' - except ImportError: - pass - return _default_writers[ext] - - -def get_writer(engine_name): - try: - return _writers[engine_name] - except KeyError: - raise ValueError("No Excel writer '{engine}'" - .format(engine=engine_name)) - - -@Appender(_read_excel_doc) -@deprecate_kwarg("parse_cols", "usecols") -@deprecate_kwarg("skip_footer", "skipfooter") -def read_excel(io, - sheet_name=0, - header=0, - names=None, - index_col=None, - parse_cols=None, - usecols=None, - squeeze=False, - dtype=None, - engine=None, - converters=None, - true_values=None, - false_values=None, - skiprows=None, - nrows=None, - na_values=None, - keep_default_na=True, - verbose=False, - parse_dates=False, - date_parser=None, - thousands=None, - comment=None, - skip_footer=0, - skipfooter=0, - convert_float=True, - mangle_dupe_cols=True, - **kwds): - - # Can't use _deprecate_kwarg since sheetname=None has a special meaning - if is_integer(sheet_name) and sheet_name == 0 and 'sheetname' in kwds: - warnings.warn("The `sheetname` keyword is deprecated, use " - "`sheet_name` instead", FutureWarning, stacklevel=2) - sheet_name = kwds.pop("sheetname") - - if 'sheet' in kwds: - raise TypeError("read_excel() got an unexpected keyword argument " - "`sheet`") - - if not isinstance(io, ExcelFile): - io = ExcelFile(io, engine=engine) - - return io.parse( - sheet_name=sheet_name, - header=header, - names=names, - index_col=index_col, - usecols=usecols, - squeeze=squeeze, - dtype=dtype, - converters=converters, - true_values=true_values, - false_values=false_values, - skiprows=skiprows, - nrows=nrows, - na_values=na_values, - 
keep_default_na=keep_default_na, - verbose=verbose, - parse_dates=parse_dates, - date_parser=date_parser, - thousands=thousands, - comment=comment, - skipfooter=skipfooter, - convert_float=convert_float, - mangle_dupe_cols=mangle_dupe_cols, - **kwds) - - -@add_metaclass(abc.ABCMeta) -class _BaseExcelReader(object): - - @property - @abc.abstractmethod - def sheet_names(self): - pass - - @abc.abstractmethod - def get_sheet_by_name(self, name): - pass - - @abc.abstractmethod - def get_sheet_by_index(self, index): - pass - - @abc.abstractmethod - def get_sheet_data(self, sheet, convert_float): - pass - - def parse(self, - sheet_name=0, - header=0, - names=None, - index_col=None, - usecols=None, - squeeze=False, - dtype=None, - true_values=None, - false_values=None, - skiprows=None, - nrows=None, - na_values=None, - verbose=False, - parse_dates=False, - date_parser=None, - thousands=None, - comment=None, - skipfooter=0, - convert_float=True, - mangle_dupe_cols=True, - **kwds): - - _validate_header_arg(header) - - ret_dict = False - - # Keep sheetname to maintain backwards compatibility. - if isinstance(sheet_name, list): - sheets = sheet_name - ret_dict = True - elif sheet_name is None: - sheets = self.sheet_names - ret_dict = True - else: - sheets = [sheet_name] - - # handle same-type duplicates. - sheets = list(OrderedDict.fromkeys(sheets).keys()) - - output = OrderedDict() - - for asheetname in sheets: - if verbose: - print("Reading sheet {sheet}".format(sheet=asheetname)) - - if isinstance(asheetname, compat.string_types): - sheet = self.get_sheet_by_name(asheetname) - else: # assume an integer if not a string - sheet = self.get_sheet_by_index(asheetname) - - data = self.get_sheet_data(sheet, convert_float) - usecols = _maybe_convert_usecols(usecols) - - if sheet.nrows == 0: - output[asheetname] = DataFrame() - continue - - if is_list_like(header) and len(header) == 1: - header = header[0] - - # forward fill and pull out names for MultiIndex column - header_names = None - if header is not None and is_list_like(header): - header_names = [] - control_row = [True] * len(data[0]) - - for row in header: - if is_integer(skiprows): - row += skiprows - - data[row], control_row = _fill_mi_header(data[row], - control_row) - - if index_col is not None: - header_name, _ = _pop_header_name(data[row], index_col) - header_names.append(header_name) - - if is_list_like(index_col): - # Forward fill values for MultiIndex index. - if not is_list_like(header): - offset = 1 + header - else: - offset = 1 + max(header) - - # Check if we have an empty dataset - # before trying to collect data. 
- if offset < len(data): - for col in index_col: - last = data[offset][col] - - for row in range(offset + 1, len(data)): - if data[row][col] == '' or data[row][col] is None: - data[row][col] = last - else: - last = data[row][col] - - has_index_names = is_list_like(header) and len(header) > 1 - - # GH 12292 : error when read one empty column from excel file - try: - parser = TextParser(data, - names=names, - header=header, - index_col=index_col, - has_index_names=has_index_names, - squeeze=squeeze, - dtype=dtype, - true_values=true_values, - false_values=false_values, - skiprows=skiprows, - nrows=nrows, - na_values=na_values, - parse_dates=parse_dates, - date_parser=date_parser, - thousands=thousands, - comment=comment, - skipfooter=skipfooter, - usecols=usecols, - mangle_dupe_cols=mangle_dupe_cols, - **kwds) - - output[asheetname] = parser.read(nrows=nrows) - - if not squeeze or isinstance(output[asheetname], DataFrame): - if header_names: - output[asheetname].columns = output[ - asheetname].columns.set_names(header_names) - elif compat.PY2: - output[asheetname].columns = _maybe_convert_to_string( - output[asheetname].columns) - - except EmptyDataError: - # No Data, return an empty DataFrame - output[asheetname] = DataFrame() - - if ret_dict: - return output - else: - return output[asheetname] - - -class _XlrdReader(_BaseExcelReader): - - def __init__(self, filepath_or_buffer): - """Reader using xlrd engine. - - Parameters - ---------- - filepath_or_buffer : string, path object or Workbook - Object to be parsed. - """ - err_msg = "Install xlrd >= 1.0.0 for Excel support" - - try: - import xlrd - except ImportError: - raise ImportError(err_msg) - else: - if xlrd.__VERSION__ < LooseVersion("1.0.0"): - raise ImportError(err_msg + - ". Current version " + xlrd.__VERSION__) - - # If filepath_or_buffer is a url, want to keep the data as bytes so - # can't pass to get_filepath_or_buffer() - if _is_url(filepath_or_buffer): - filepath_or_buffer = _urlopen(filepath_or_buffer) - elif not isinstance(filepath_or_buffer, (ExcelFile, xlrd.Book)): - filepath_or_buffer, _, _, _ = get_filepath_or_buffer( - filepath_or_buffer) - - if isinstance(filepath_or_buffer, xlrd.Book): - self.book = filepath_or_buffer - elif hasattr(filepath_or_buffer, "read"): - # N.B. xlrd.Book has a read attribute too - if hasattr(filepath_or_buffer, 'seek'): - try: - # GH 19779 - filepath_or_buffer.seek(0) - except UnsupportedOperation: - # HTTPResponse does not support seek() - # GH 20434 - pass - - data = filepath_or_buffer.read() - self.book = xlrd.open_workbook(file_contents=data) - elif isinstance(filepath_or_buffer, compat.string_types): - self.book = xlrd.open_workbook(filepath_or_buffer) - else: - raise ValueError('Must explicitly set engine if not passing in' - ' buffer or path for io.') - - @property - def sheet_names(self): - return self.book.sheet_names() - - def get_sheet_by_name(self, name): - return self.book.sheet_by_name(name) - - def get_sheet_by_index(self, index): - return self.book.sheet_by_index(index) - - def get_sheet_data(self, sheet, convert_float): - from xlrd import (xldate, XL_CELL_DATE, - XL_CELL_ERROR, XL_CELL_BOOLEAN, - XL_CELL_NUMBER) - - epoch1904 = self.book.datemode - - def _parse_cell(cell_contents, cell_typ): - """converts the contents of the cell into a pandas - appropriate object""" - - if cell_typ == XL_CELL_DATE: - - # Use the newer xlrd datetime handling. 
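For context on the date handling that follows: Excel stores dates as floating-point day counts from an epoch (1899-12-31 in the 1900 system, 1904-01-01 in the 1904 system, selected by `Book.datemode`), which is why values that land exactly on the epoch are treated as pure times below. A small sketch, assuming xlrd is installed:

```python
from xlrd import xldate

datemode = 0  # 0 -> 1900 date system, 1 -> 1904 (this is Book.datemode)

# A serial date with a fractional day component:
print(xldate.xldate_as_datetime(43466.5, datemode))  # 2019-01-01 12:00:00

# A time-only value lands on the epoch date (1899-12-31 in this system),
# which the code below then narrows to a datetime.time.
print(xldate.xldate_as_datetime(0.75, datemode))     # 1899-12-31 18:00:00
```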
- try: - cell_contents = xldate.xldate_as_datetime( - cell_contents, epoch1904) - except OverflowError: - return cell_contents - - # Excel doesn't distinguish between dates and time, - # so we treat dates on the epoch as times only. - # Also, Excel supports 1900 and 1904 epochs. - year = (cell_contents.timetuple())[0:3] - if ((not epoch1904 and year == (1899, 12, 31)) or - (epoch1904 and year == (1904, 1, 1))): - cell_contents = time(cell_contents.hour, - cell_contents.minute, - cell_contents.second, - cell_contents.microsecond) - - elif cell_typ == XL_CELL_ERROR: - cell_contents = np.nan - elif cell_typ == XL_CELL_BOOLEAN: - cell_contents = bool(cell_contents) - elif convert_float and cell_typ == XL_CELL_NUMBER: - # GH5394 - Excel 'numbers' are always floats - # it's a minimal perf hit and less surprising - val = int(cell_contents) - if val == cell_contents: - cell_contents = val - return cell_contents - - data = [] - - for i in range(sheet.nrows): - row = [_parse_cell(value, typ) - for value, typ in zip(sheet.row_values(i), - sheet.row_types(i))] - data.append(row) - - return data - - -class ExcelFile(object): - """ - Class for parsing tabular excel sheets into DataFrame objects. - Uses xlrd. See read_excel for more documentation - - Parameters - ---------- - io : string, path object (pathlib.Path or py._path.local.LocalPath), - file-like object or xlrd workbook - If a string or path object, expected to be a path to xls or xlsx file. - engine : string, default None - If io is not a buffer or path, this must be set to identify io. - Acceptable values are None or ``xlrd``. - """ - - _engines = { - 'xlrd': _XlrdReader, - } - - def __init__(self, io, engine=None): - if engine is None: - engine = 'xlrd' - if engine not in self._engines: - raise ValueError("Unknown engine: {engine}".format(engine=engine)) - - # could be a str, ExcelFile, Book, etc. - self.io = io - # Always a string - self._io = _stringify_path(io) - - self._reader = self._engines[engine](self._io) - - def __fspath__(self): - return self._io - - def parse(self, - sheet_name=0, - header=0, - names=None, - index_col=None, - usecols=None, - squeeze=False, - converters=None, - true_values=None, - false_values=None, - skiprows=None, - nrows=None, - na_values=None, - parse_dates=False, - date_parser=None, - thousands=None, - comment=None, - skipfooter=0, - convert_float=True, - mangle_dupe_cols=True, - **kwds): - """ - Parse specified sheet(s) into a DataFrame - - Equivalent to read_excel(ExcelFile, ...) See the read_excel - docstring for more info on accepted parameters - """ - - # Can't use _deprecate_kwarg since sheetname=None has a special meaning - if is_integer(sheet_name) and sheet_name == 0 and 'sheetname' in kwds: - warnings.warn("The `sheetname` keyword is deprecated, use " - "`sheet_name` instead", FutureWarning, stacklevel=2) - sheet_name = kwds.pop("sheetname") - elif 'sheetname' in kwds: - raise TypeError("Cannot specify both `sheet_name` " - "and `sheetname`. 
Use just `sheet_name`") - - if 'chunksize' in kwds: - raise NotImplementedError("chunksize keyword of read_excel " - "is not implemented") - - return self._reader.parse(sheet_name=sheet_name, - header=header, - names=names, - index_col=index_col, - usecols=usecols, - squeeze=squeeze, - converters=converters, - true_values=true_values, - false_values=false_values, - skiprows=skiprows, - nrows=nrows, - na_values=na_values, - parse_dates=parse_dates, - date_parser=date_parser, - thousands=thousands, - comment=comment, - skipfooter=skipfooter, - convert_float=convert_float, - mangle_dupe_cols=mangle_dupe_cols, - **kwds) - - @property - def book(self): - return self._reader.book - - @property - def sheet_names(self): - return self._reader.sheet_names - - def close(self): - """close io if necessary""" - if hasattr(self.io, 'close'): - self.io.close() - - def __enter__(self): - return self - - def __exit__(self, exc_type, exc_value, traceback): - self.close() - - -def _excel2num(x): - """ - Convert Excel column name like 'AB' to 0-based column index. - - Parameters - ---------- - x : str - The Excel column name to convert to a 0-based column index. - - Returns - ------- - num : int - The column index corresponding to the name. - - Raises - ------ - ValueError - Part of the Excel column name was invalid. - """ - index = 0 - - for c in x.upper().strip(): - cp = ord(c) - - if cp < ord("A") or cp > ord("Z"): - raise ValueError("Invalid column name: {x}".format(x=x)) - - index = index * 26 + cp - ord("A") + 1 - - return index - 1 - - -def _range2cols(areas): - """ - Convert comma separated list of column names and ranges to indices. - - Parameters - ---------- - areas : str - A string containing a sequence of column ranges (or areas). - - Returns - ------- - cols : list - A list of 0-based column indices. - - Examples - -------- - >>> _range2cols('A:E') - [0, 1, 2, 3, 4] - >>> _range2cols('A,C,Z:AB') - [0, 2, 25, 26, 27] - """ - cols = [] - - for rng in areas.split(","): - if ":" in rng: - rng = rng.split(":") - cols.extend(lrange(_excel2num(rng[0]), _excel2num(rng[1]) + 1)) - else: - cols.append(_excel2num(rng)) - - return cols - - -def _maybe_convert_usecols(usecols): - """ - Convert `usecols` into a compatible format for parsing in `parsers.py`. - - Parameters - ---------- - usecols : object - The use-columns object to potentially convert. - - Returns - ------- - converted : object - The compatible format of `usecols`. - """ - if usecols is None: - return usecols - - if is_integer(usecols): - warnings.warn(("Passing in an integer for `usecols` has been " - "deprecated. Please pass in a list of int from " - "0 to `usecols` inclusive instead."), - FutureWarning, stacklevel=2) - return lrange(usecols + 1) - - if isinstance(usecols, compat.string_types): - return _range2cols(usecols) - - return usecols - - -def _validate_freeze_panes(freeze_panes): - if freeze_panes is not None: - if ( - len(freeze_panes) == 2 and - all(isinstance(item, int) for item in freeze_panes) - ): - return True - - raise ValueError("freeze_panes must be of form (row, column)" - " where row and column are integers") - - # freeze_panes wasn't specified, return False so it won't be applied - # to output sheet - return False - - -def _trim_excel_header(row): - # trim header row so auto-index inference works - # xlrd uses '' , openpyxl None - while len(row) > 0 and (row[0] == '' or row[0] is None): - row = row[1:] - return row - - -def _maybe_convert_to_string(row): - """ - Convert elements in a row to string from Unicode. 
- - This is purely a Python 2.x patch and is performed ONLY when all - elements of the row are string-like. - - Parameters - ---------- - row : array-like - The row of data to convert. - - Returns - ------- - converted : array-like - """ - if compat.PY2: - converted = [] - - for i in range(len(row)): - if isinstance(row[i], compat.string_types): - try: - converted.append(str(row[i])) - except UnicodeEncodeError: - break - else: - break - else: - row = converted - - return row - - -def _fill_mi_header(row, control_row): - """Forward fills blank entries in row, but only inside the same parent index - - Used for creating headers in Multiindex. - Parameters - ---------- - row : list - List of items in a single row. - control_row : list of bool - Helps to determine if particular column is in same parent index as the - previous value. Used to stop propagation of empty cells between - different indexes. - - Returns - ---------- - Returns changed row and control_row - """ - last = row[0] - for i in range(1, len(row)): - if not control_row[i]: - last = row[i] - - if row[i] == '' or row[i] is None: - row[i] = last - else: - control_row[i] = False - last = row[i] - - return _maybe_convert_to_string(row), control_row - -# fill blank if index_col not None - - -def _pop_header_name(row, index_col): - """ - Pop the header name for MultiIndex parsing. - - Parameters - ---------- - row : list - The data row to parse for the header name. - index_col : int, list - The index columns for our data. Assumed to be non-null. - - Returns - ------- - header_name : str - The extracted header name. - trimmed_row : list - The original data row with the header name removed. - """ - # Pop out header name and fill w/blank. - i = index_col if not is_list_like(index_col) else max(index_col) - - header_name = row[i] - header_name = None if header_name == "" else header_name - - return header_name, row[:i] + [''] + row[i + 1:] - - -@add_metaclass(abc.ABCMeta) -class ExcelWriter(object): - """ - Class for writing DataFrame objects into excel sheets, default is to use - xlwt for xls, openpyxl for xlsx. See DataFrame.to_excel for typical usage. - - Parameters - ---------- - path : string - Path to xls or xlsx file. - engine : string (optional) - Engine to use for writing. If None, defaults to - ``io.excel..writer``. NOTE: can only be passed as a keyword - argument. - date_format : string, default None - Format string for dates written into Excel files (e.g. 'YYYY-MM-DD') - datetime_format : string, default None - Format string for datetime objects written into Excel files - (e.g. 'YYYY-MM-DD HH:MM:SS') - mode : {'w' or 'a'}, default 'w' - File mode to use (write or append). - - .. versionadded:: 0.24.0 - - Attributes - ---------- - None - - Methods - ------- - None - - Notes - ----- - None of the methods and properties are considered public. - - For compatibility with CSV writers, ExcelWriter serializes lists - and dicts to strings before writing. - - Examples - -------- - Default usage: - - >>> with ExcelWriter('path_to_file.xlsx') as writer: - ... df.to_excel(writer) - - To write to separate sheets in a single file: - - >>> with ExcelWriter('path_to_file.xlsx') as writer: - ... df1.to_excel(writer, sheet_name='Sheet1') - ... df2.to_excel(writer, sheet_name='Sheet2') - - You can set the date format or datetime format: - - >>> with ExcelWriter('path_to_file.xlsx', - date_format='YYYY-MM-DD', - datetime_format='YYYY-MM-DD HH:MM:SS') as writer: - ... 
df.to_excel(writer) - - You can also append to an existing Excel file: - - >>> with ExcelWriter('path_to_file.xlsx', mode='a') as writer: - ... df.to_excel(writer, sheet_name='Sheet3') - """ - # Defining an ExcelWriter implementation (see abstract methods for more...) - - # - Mandatory - # - ``write_cells(self, cells, sheet_name=None, startrow=0, startcol=0)`` - # --> called to write additional DataFrames to disk - # - ``supported_extensions`` (tuple of supported extensions), used to - # check that engine supports the given extension. - # - ``engine`` - string that gives the engine name. Necessary to - # instantiate class directly and bypass ``ExcelWriterMeta`` engine - # lookup. - # - ``save(self)`` --> called to save file to disk - # - Mostly mandatory (i.e. should at least exist) - # - book, cur_sheet, path - - # - Optional: - # - ``__init__(self, path, engine=None, **kwargs)`` --> always called - # with path as first argument. - - # You also need to register the class with ``register_writer()``. - # Technically, ExcelWriter implementations don't need to subclass - # ExcelWriter. - def __new__(cls, path, engine=None, **kwargs): - # only switch class if generic(ExcelWriter) - - if issubclass(cls, ExcelWriter): - if engine is None or (isinstance(engine, string_types) and - engine == 'auto'): - if isinstance(path, string_types): - ext = os.path.splitext(path)[-1][1:] - else: - ext = 'xlsx' - - try: - engine = config.get_option('io.excel.{ext}.writer' - .format(ext=ext)) - if engine == 'auto': - engine = _get_default_writer(ext) - except KeyError: - error = ValueError("No engine for filetype: '{ext}'" - .format(ext=ext)) - raise error - cls = get_writer(engine) - - return object.__new__(cls) - - # declare external properties you can count on - book = None - curr_sheet = None - path = None - - @abc.abstractproperty - def supported_extensions(self): - "extensions that writer engine supports" - pass - - @abc.abstractproperty - def engine(self): - "name of engine" - pass - - @abc.abstractmethod - def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0, - freeze_panes=None): - """ - Write given formatted cells into Excel an excel sheet - - Parameters - ---------- - cells : generator - cell of formatted data to save to Excel sheet - sheet_name : string, default None - Name of Excel sheet, if None, then use self.cur_sheet - startrow : upper left cell row to dump data frame - startcol : upper left cell column to dump data frame - freeze_panes: integer tuple of length 2 - contains the bottom-most row and right-most column to freeze - """ - pass - - @abc.abstractmethod - def save(self): - """ - Save workbook to disk. 
- """ - pass - - def __init__(self, path, engine=None, - date_format=None, datetime_format=None, mode='w', - **engine_kwargs): - # validate that this engine can handle the extension - if isinstance(path, string_types): - ext = os.path.splitext(path)[-1] - else: - ext = 'xls' if engine == 'xlwt' else 'xlsx' - - self.check_extension(ext) - - self.path = path - self.sheets = {} - self.cur_sheet = None - - if date_format is None: - self.date_format = 'YYYY-MM-DD' - else: - self.date_format = date_format - if datetime_format is None: - self.datetime_format = 'YYYY-MM-DD HH:MM:SS' - else: - self.datetime_format = datetime_format - - self.mode = mode - - def __fspath__(self): - return _stringify_path(self.path) - - def _get_sheet_name(self, sheet_name): - if sheet_name is None: - sheet_name = self.cur_sheet - if sheet_name is None: # pragma: no cover - raise ValueError('Must pass explicit sheet_name or set ' - 'cur_sheet property') - return sheet_name - - def _value_with_fmt(self, val): - """Convert numpy types to Python types for the Excel writers. - - Parameters - ---------- - val : object - Value to be written into cells - - Returns - ------- - Tuple with the first element being the converted value and the second - being an optional format - """ - fmt = None - - if is_integer(val): - val = int(val) - elif is_float(val): - val = float(val) - elif is_bool(val): - val = bool(val) - elif isinstance(val, datetime): - fmt = self.datetime_format - elif isinstance(val, date): - fmt = self.date_format - elif isinstance(val, timedelta): - val = val.total_seconds() / float(86400) - fmt = '0' - else: - val = compat.to_str(val) - - return val, fmt - - @classmethod - def check_extension(cls, ext): - """checks that path's extension against the Writer's supported - extensions. If it isn't supported, raises UnsupportedFiletypeError.""" - if ext.startswith('.'): - ext = ext[1:] - if not any(ext in extension for extension in cls.supported_extensions): - msg = (u("Invalid extension for engine '{engine}': '{ext}'") - .format(engine=pprint_thing(cls.engine), - ext=pprint_thing(ext))) - raise ValueError(msg) - else: - return True - - # Allow use as a contextmanager - def __enter__(self): - return self - - def __exit__(self, exc_type, exc_value, traceback): - self.close() - - def close(self): - """synonym for save, to make it more file-like""" - return self.save() - - -class _OpenpyxlWriter(ExcelWriter): - engine = 'openpyxl' - supported_extensions = ('.xlsx', '.xlsm') - - def __init__(self, path, engine=None, mode='w', **engine_kwargs): - # Use the openpyxl module as the Excel writer. - from openpyxl.workbook import Workbook - - super(_OpenpyxlWriter, self).__init__(path, mode=mode, **engine_kwargs) - - if self.mode == 'a': # Load from existing workbook - from openpyxl import load_workbook - book = load_workbook(self.path) - self.book = book - else: - # Create workbook object with default optimized_write=True. - self.book = Workbook() - - if self.book.worksheets: - try: - self.book.remove(self.book.worksheets[0]) - except AttributeError: - - # compat - for openpyxl <= 2.4 - self.book.remove_sheet(self.book.worksheets[0]) - - def save(self): - """ - Save workbook to disk. 
- """ - return self.book.save(self.path) - - @classmethod - def _convert_to_style(cls, style_dict): - """ - converts a style_dict to an openpyxl style object - Parameters - ---------- - style_dict : style dictionary to convert - """ - - from openpyxl.style import Style - xls_style = Style() - for key, value in style_dict.items(): - for nk, nv in value.items(): - if key == "borders": - (xls_style.borders.__getattribute__(nk) - .__setattr__('border_style', nv)) - else: - xls_style.__getattribute__(key).__setattr__(nk, nv) - - return xls_style - - @classmethod - def _convert_to_style_kwargs(cls, style_dict): - """ - Convert a style_dict to a set of kwargs suitable for initializing - or updating-on-copy an openpyxl v2 style object - Parameters - ---------- - style_dict : dict - A dict with zero or more of the following keys (or their synonyms). - 'font' - 'fill' - 'border' ('borders') - 'alignment' - 'number_format' - 'protection' - Returns - ------- - style_kwargs : dict - A dict with the same, normalized keys as ``style_dict`` but each - value has been replaced with a native openpyxl style object of the - appropriate class. - """ - - _style_key_map = { - 'borders': 'border', - } - - style_kwargs = {} - for k, v in style_dict.items(): - if k in _style_key_map: - k = _style_key_map[k] - _conv_to_x = getattr(cls, '_convert_to_{k}'.format(k=k), - lambda x: None) - new_v = _conv_to_x(v) - if new_v: - style_kwargs[k] = new_v - - return style_kwargs - - @classmethod - def _convert_to_color(cls, color_spec): - """ - Convert ``color_spec`` to an openpyxl v2 Color object - Parameters - ---------- - color_spec : str, dict - A 32-bit ARGB hex string, or a dict with zero or more of the - following keys. - 'rgb' - 'indexed' - 'auto' - 'theme' - 'tint' - 'index' - 'type' - Returns - ------- - color : openpyxl.styles.Color - """ - - from openpyxl.styles import Color - - if isinstance(color_spec, str): - return Color(color_spec) - else: - return Color(**color_spec) - - @classmethod - def _convert_to_font(cls, font_dict): - """ - Convert ``font_dict`` to an openpyxl v2 Font object - Parameters - ---------- - font_dict : dict - A dict with zero or more of the following keys (or their synonyms). - 'name' - 'size' ('sz') - 'bold' ('b') - 'italic' ('i') - 'underline' ('u') - 'strikethrough' ('strike') - 'color' - 'vertAlign' ('vertalign') - 'charset' - 'scheme' - 'family' - 'outline' - 'shadow' - 'condense' - Returns - ------- - font : openpyxl.styles.Font - """ - - from openpyxl.styles import Font - - _font_key_map = { - 'sz': 'size', - 'b': 'bold', - 'i': 'italic', - 'u': 'underline', - 'strike': 'strikethrough', - 'vertalign': 'vertAlign', - } - - font_kwargs = {} - for k, v in font_dict.items(): - if k in _font_key_map: - k = _font_key_map[k] - if k == 'color': - v = cls._convert_to_color(v) - font_kwargs[k] = v - - return Font(**font_kwargs) - - @classmethod - def _convert_to_stop(cls, stop_seq): - """ - Convert ``stop_seq`` to a list of openpyxl v2 Color objects, - suitable for initializing the ``GradientFill`` ``stop`` parameter. - Parameters - ---------- - stop_seq : iterable - An iterable that yields objects suitable for consumption by - ``_convert_to_color``. 
- Returns - ------- - stop : list of openpyxl.styles.Color - """ - - return map(cls._convert_to_color, stop_seq) - - @classmethod - def _convert_to_fill(cls, fill_dict): - """ - Convert ``fill_dict`` to an openpyxl v2 Fill object - Parameters - ---------- - fill_dict : dict - A dict with one or more of the following keys (or their synonyms), - 'fill_type' ('patternType', 'patterntype') - 'start_color' ('fgColor', 'fgcolor') - 'end_color' ('bgColor', 'bgcolor') - or one or more of the following keys (or their synonyms). - 'type' ('fill_type') - 'degree' - 'left' - 'right' - 'top' - 'bottom' - 'stop' - Returns - ------- - fill : openpyxl.styles.Fill - """ - - from openpyxl.styles import PatternFill, GradientFill - - _pattern_fill_key_map = { - 'patternType': 'fill_type', - 'patterntype': 'fill_type', - 'fgColor': 'start_color', - 'fgcolor': 'start_color', - 'bgColor': 'end_color', - 'bgcolor': 'end_color', - } - - _gradient_fill_key_map = { - 'fill_type': 'type', - } - - pfill_kwargs = {} - gfill_kwargs = {} - for k, v in fill_dict.items(): - pk = gk = None - if k in _pattern_fill_key_map: - pk = _pattern_fill_key_map[k] - if k in _gradient_fill_key_map: - gk = _gradient_fill_key_map[k] - if pk in ['start_color', 'end_color']: - v = cls._convert_to_color(v) - if gk == 'stop': - v = cls._convert_to_stop(v) - if pk: - pfill_kwargs[pk] = v - elif gk: - gfill_kwargs[gk] = v - else: - pfill_kwargs[k] = v - gfill_kwargs[k] = v - - try: - return PatternFill(**pfill_kwargs) - except TypeError: - return GradientFill(**gfill_kwargs) - - @classmethod - def _convert_to_side(cls, side_spec): - """ - Convert ``side_spec`` to an openpyxl v2 Side object - Parameters - ---------- - side_spec : str, dict - A string specifying the border style, or a dict with zero or more - of the following keys (or their synonyms). - 'style' ('border_style') - 'color' - Returns - ------- - side : openpyxl.styles.Side - """ - - from openpyxl.styles import Side - - _side_key_map = { - 'border_style': 'style', - } - - if isinstance(side_spec, str): - return Side(style=side_spec) - - side_kwargs = {} - for k, v in side_spec.items(): - if k in _side_key_map: - k = _side_key_map[k] - if k == 'color': - v = cls._convert_to_color(v) - side_kwargs[k] = v - - return Side(**side_kwargs) - - @classmethod - def _convert_to_border(cls, border_dict): - """ - Convert ``border_dict`` to an openpyxl v2 Border object - Parameters - ---------- - border_dict : dict - A dict with zero or more of the following keys (or their synonyms). - 'left' - 'right' - 'top' - 'bottom' - 'diagonal' - 'diagonal_direction' - 'vertical' - 'horizontal' - 'diagonalUp' ('diagonalup') - 'diagonalDown' ('diagonaldown') - 'outline' - Returns - ------- - border : openpyxl.styles.Border - """ - - from openpyxl.styles import Border - - _border_key_map = { - 'diagonalup': 'diagonalUp', - 'diagonaldown': 'diagonalDown', - } - - border_kwargs = {} - for k, v in border_dict.items(): - if k in _border_key_map: - k = _border_key_map[k] - if k == 'color': - v = cls._convert_to_color(v) - if k in ['left', 'right', 'top', 'bottom', 'diagonal']: - v = cls._convert_to_side(v) - border_kwargs[k] = v - - return Border(**border_kwargs) - - @classmethod - def _convert_to_alignment(cls, alignment_dict): - """ - Convert ``alignment_dict`` to an openpyxl v2 Alignment object - Parameters - ---------- - alignment_dict : dict - A dict with zero or more of the following keys (or their synonyms). 
- 'horizontal' - 'vertical' - 'text_rotation' - 'wrap_text' - 'shrink_to_fit' - 'indent' - Returns - ------- - alignment : openpyxl.styles.Alignment - """ - - from openpyxl.styles import Alignment - - return Alignment(**alignment_dict) - - @classmethod - def _convert_to_number_format(cls, number_format_dict): - """ - Convert ``number_format_dict`` to an openpyxl v2.1.0 number format - initializer. - Parameters - ---------- - number_format_dict : dict - A dict with zero or more of the following keys. - 'format_code' : str - Returns - ------- - number_format : str - """ - return number_format_dict['format_code'] - - @classmethod - def _convert_to_protection(cls, protection_dict): - """ - Convert ``protection_dict`` to an openpyxl v2 Protection object. - Parameters - ---------- - protection_dict : dict - A dict with zero or more of the following keys. - 'locked' - 'hidden' - Returns - ------- - """ - - from openpyxl.styles import Protection - - return Protection(**protection_dict) - - def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0, - freeze_panes=None): - # Write the frame cells using openpyxl. - sheet_name = self._get_sheet_name(sheet_name) - - _style_cache = {} - - if sheet_name in self.sheets: - wks = self.sheets[sheet_name] - else: - wks = self.book.create_sheet() - wks.title = sheet_name - self.sheets[sheet_name] = wks - - if _validate_freeze_panes(freeze_panes): - wks.freeze_panes = wks.cell(row=freeze_panes[0] + 1, - column=freeze_panes[1] + 1) - - for cell in cells: - xcell = wks.cell( - row=startrow + cell.row + 1, - column=startcol + cell.col + 1 - ) - xcell.value, fmt = self._value_with_fmt(cell.val) - if fmt: - xcell.number_format = fmt - - style_kwargs = {} - if cell.style: - key = str(cell.style) - style_kwargs = _style_cache.get(key) - if style_kwargs is None: - style_kwargs = self._convert_to_style_kwargs(cell.style) - _style_cache[key] = style_kwargs - - if style_kwargs: - for k, v in style_kwargs.items(): - setattr(xcell, k, v) - - if cell.mergestart is not None and cell.mergeend is not None: - - wks.merge_cells( - start_row=startrow + cell.row + 1, - start_column=startcol + cell.col + 1, - end_column=startcol + cell.mergeend + 1, - end_row=startrow + cell.mergestart + 1 - ) - - # When cells are merged only the top-left cell is preserved - # The behaviour of the other cells in a merged range is - # undefined - if style_kwargs: - first_row = startrow + cell.row + 1 - last_row = startrow + cell.mergestart + 1 - first_col = startcol + cell.col + 1 - last_col = startcol + cell.mergeend + 1 - - for row in range(first_row, last_row + 1): - for col in range(first_col, last_col + 1): - if row == first_row and col == first_col: - # Ignore first cell. It is already handled. - continue - xcell = wks.cell(column=col, row=row) - for k, v in style_kwargs.items(): - setattr(xcell, k, v) - - -register_writer(_OpenpyxlWriter) - - -class _XlwtWriter(ExcelWriter): - engine = 'xlwt' - supported_extensions = ('.xls',) - - def __init__(self, path, engine=None, encoding=None, mode='w', - **engine_kwargs): - # Use the xlwt module as the Excel writer. 
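For reference, the `_XlwtWriter` below drives xlwt roughly as in this sketch (assumes xlwt is installed; the file name is illustrative). Note that xlwt only produces the legacy `.xls` format and, as the constructor enforces, append mode is not supported:

```python
import xlwt

book = xlwt.Workbook(encoding='ascii')
sheet = book.add_sheet('Sheet1')

# easyxf parses a style string; num_format_str handles dates/numbers.
bold = xlwt.easyxf('font: bold on;')
datefmt = xlwt.easyxf(num_format_str='YYYY-MM-DD')

sheet.write(0, 0, 'header', bold)   # (row, col, value, style)
sheet.write(1, 0, 42)
book.save('sketch.xls')
```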
- import xlwt - engine_kwargs['engine'] = engine - - if mode == 'a': - raise ValueError('Append mode is not supported with xlwt!') - - super(_XlwtWriter, self).__init__(path, mode=mode, **engine_kwargs) - - if encoding is None: - encoding = 'ascii' - self.book = xlwt.Workbook(encoding=encoding) - self.fm_datetime = xlwt.easyxf(num_format_str=self.datetime_format) - self.fm_date = xlwt.easyxf(num_format_str=self.date_format) - - def save(self): - """ - Save workbook to disk. - """ - return self.book.save(self.path) - - def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0, - freeze_panes=None): - # Write the frame cells using xlwt. - - sheet_name = self._get_sheet_name(sheet_name) - - if sheet_name in self.sheets: - wks = self.sheets[sheet_name] - else: - wks = self.book.add_sheet(sheet_name) - self.sheets[sheet_name] = wks - - if _validate_freeze_panes(freeze_panes): - wks.set_panes_frozen(True) - wks.set_horz_split_pos(freeze_panes[0]) - wks.set_vert_split_pos(freeze_panes[1]) - - style_dict = {} - - for cell in cells: - val, fmt = self._value_with_fmt(cell.val) - - stylekey = json.dumps(cell.style) - if fmt: - stylekey += fmt - - if stylekey in style_dict: - style = style_dict[stylekey] - else: - style = self._convert_to_style(cell.style, fmt) - style_dict[stylekey] = style - - if cell.mergestart is not None and cell.mergeend is not None: - wks.write_merge(startrow + cell.row, - startrow + cell.mergestart, - startcol + cell.col, - startcol + cell.mergeend, - val, style) - else: - wks.write(startrow + cell.row, - startcol + cell.col, - val, style) - - @classmethod - def _style_to_xlwt(cls, item, firstlevel=True, field_sep=',', - line_sep=';'): - """helper which recursively generate an xlwt easy style string - for example: - - hstyle = {"font": {"bold": True}, - "border": {"top": "thin", - "right": "thin", - "bottom": "thin", - "left": "thin"}, - "align": {"horiz": "center"}} - will be converted to - font: bold on; \ - border: top thin, right thin, bottom thin, left thin; \ - align: horiz center; - """ - if hasattr(item, 'items'): - if firstlevel: - it = ["{key}: {val}" - .format(key=key, val=cls._style_to_xlwt(value, False)) - for key, value in item.items()] - out = "{sep} ".format(sep=(line_sep).join(it)) - return out - else: - it = ["{key} {val}" - .format(key=key, val=cls._style_to_xlwt(value, False)) - for key, value in item.items()] - out = "{sep} ".format(sep=(field_sep).join(it)) - return out - else: - item = "{item}".format(item=item) - item = item.replace("True", "on") - item = item.replace("False", "off") - return item - - @classmethod - def _convert_to_style(cls, style_dict, num_format_str=None): - """ - converts a style_dict to an xlwt style object - Parameters - ---------- - style_dict : style dictionary to convert - num_format_str : optional number format string - """ - import xlwt - - if style_dict: - xlwt_stylestr = cls._style_to_xlwt(style_dict) - style = xlwt.easyxf(xlwt_stylestr, field_sep=',', line_sep=';') - else: - style = xlwt.XFStyle() - if num_format_str is not None: - style.num_format_str = num_format_str - - return style - - -register_writer(_XlwtWriter) - - -class _XlsxStyler(object): - # Map from openpyxl-oriented styles to flatter xlsxwriter representation - # Ordering necessary for both determinism and because some are keyed by - # prefixes of others. 
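Before the mapping table below, a word on what it encodes: styles arrive as nested, openpyxl-oriented dicts, while xlsxwriter wants flat keyword properties. Roughly, assuming a 'thin' border maps to xlsxwriter index 1 as in the lookup further down:

```python
# Nested, openpyxl-oriented input ...
nested = {'font': {'bold': True, 'color': {'rgb': 'FF2222'}},
          'border': {'top': {'style': 'thin'}}}

# ... flattens to xlsxwriter format properties like:
flat = {'bold': True, 'font_color': 'FF2222', 'top': 1}
```

The ordering caveat in the comment above is real: `('color', 'rgb')` must be tried before its prefix `('color',)`, otherwise the less specific path would match first and hand xlsxwriter a dict instead of a hex string.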
- STYLE_MAPPING = { - 'font': [ - (('name',), 'font_name'), - (('sz',), 'font_size'), - (('size',), 'font_size'), - (('color', 'rgb',), 'font_color'), - (('color',), 'font_color'), - (('b',), 'bold'), - (('bold',), 'bold'), - (('i',), 'italic'), - (('italic',), 'italic'), - (('u',), 'underline'), - (('underline',), 'underline'), - (('strike',), 'font_strikeout'), - (('vertAlign',), 'font_script'), - (('vertalign',), 'font_script'), - ], - 'number_format': [ - (('format_code',), 'num_format'), - ((), 'num_format',), - ], - 'protection': [ - (('locked',), 'locked'), - (('hidden',), 'hidden'), - ], - 'alignment': [ - (('horizontal',), 'align'), - (('vertical',), 'valign'), - (('text_rotation',), 'rotation'), - (('wrap_text',), 'text_wrap'), - (('indent',), 'indent'), - (('shrink_to_fit',), 'shrink'), - ], - 'fill': [ - (('patternType',), 'pattern'), - (('patterntype',), 'pattern'), - (('fill_type',), 'pattern'), - (('start_color', 'rgb',), 'fg_color'), - (('fgColor', 'rgb',), 'fg_color'), - (('fgcolor', 'rgb',), 'fg_color'), - (('start_color',), 'fg_color'), - (('fgColor',), 'fg_color'), - (('fgcolor',), 'fg_color'), - (('end_color', 'rgb',), 'bg_color'), - (('bgColor', 'rgb',), 'bg_color'), - (('bgcolor', 'rgb',), 'bg_color'), - (('end_color',), 'bg_color'), - (('bgColor',), 'bg_color'), - (('bgcolor',), 'bg_color'), - ], - 'border': [ - (('color', 'rgb',), 'border_color'), - (('color',), 'border_color'), - (('style',), 'border'), - (('top', 'color', 'rgb',), 'top_color'), - (('top', 'color',), 'top_color'), - (('top', 'style',), 'top'), - (('top',), 'top'), - (('right', 'color', 'rgb',), 'right_color'), - (('right', 'color',), 'right_color'), - (('right', 'style',), 'right'), - (('right',), 'right'), - (('bottom', 'color', 'rgb',), 'bottom_color'), - (('bottom', 'color',), 'bottom_color'), - (('bottom', 'style',), 'bottom'), - (('bottom',), 'bottom'), - (('left', 'color', 'rgb',), 'left_color'), - (('left', 'color',), 'left_color'), - (('left', 'style',), 'left'), - (('left',), 'left'), - ], - } - - @classmethod - def convert(cls, style_dict, num_format_str=None): - """ - converts a style_dict to an xlsxwriter format dict - - Parameters - ---------- - style_dict : style dictionary to convert - num_format_str : optional number format string - """ - - # Create a XlsxWriter format object. 
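The loop that follows walks each candidate key-path from `STYLE_MAPPING` into the nested style dict, keeping a value only when every hop succeeds (the `for`/`else` idiom) and letting the first path that fills a destination key win. A minimal sketch of that walk:

```python
def walk(style, path):
    """Follow a tuple of keys into a nested dict; None if a hop fails."""
    v = style
    for k in path:
        try:
            v = v[k]
        except (KeyError, TypeError):
            return None
    return v

style = {'font': {'color': {'rgb': 'FF2222'}}}
print(walk(style, ('font', 'color', 'rgb')))  # FF2222
print(walk(style, ('font', 'sz')))            # None (missing hop)
```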
- props = {} - - if num_format_str is not None: - props['num_format'] = num_format_str - - if style_dict is None: - return props - - if 'borders' in style_dict: - style_dict = style_dict.copy() - style_dict['border'] = style_dict.pop('borders') - - for style_group_key, style_group in style_dict.items(): - for src, dst in cls.STYLE_MAPPING.get(style_group_key, []): - # src is a sequence of keys into a nested dict - # dst is a flat key - if dst in props: - continue - v = style_group - for k in src: - try: - v = v[k] - except (KeyError, TypeError): - break - else: - props[dst] = v - - if isinstance(props.get('pattern'), string_types): - # TODO: support other fill patterns - props['pattern'] = 0 if props['pattern'] == 'none' else 1 - - for k in ['border', 'top', 'right', 'bottom', 'left']: - if isinstance(props.get(k), string_types): - try: - props[k] = ['none', 'thin', 'medium', 'dashed', 'dotted', - 'thick', 'double', 'hair', 'mediumDashed', - 'dashDot', 'mediumDashDot', 'dashDotDot', - 'mediumDashDotDot', - 'slantDashDot'].index(props[k]) - except ValueError: - props[k] = 2 - - if isinstance(props.get('font_script'), string_types): - props['font_script'] = ['baseline', 'superscript', - 'subscript'].index(props['font_script']) - - if isinstance(props.get('underline'), string_types): - props['underline'] = {'none': 0, 'single': 1, 'double': 2, - 'singleAccounting': 33, - 'doubleAccounting': 34}[props['underline']] - - return props - - -class _XlsxWriter(ExcelWriter): - engine = 'xlsxwriter' - supported_extensions = ('.xlsx',) - - def __init__(self, path, engine=None, - date_format=None, datetime_format=None, mode='w', - **engine_kwargs): - # Use the xlsxwriter module as the Excel writer. - import xlsxwriter - - if mode == 'a': - raise ValueError('Append mode is not supported with xlsxwriter!') - - super(_XlsxWriter, self).__init__(path, engine=engine, - date_format=date_format, - datetime_format=datetime_format, - mode=mode, - **engine_kwargs) - - self.book = xlsxwriter.Workbook(path, **engine_kwargs) - - def save(self): - """ - Save workbook to disk. - """ - - return self.book.close() - - def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0, - freeze_panes=None): - # Write the frame cells using xlsxwriter. 
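One detail worth calling out in the `write_cells` implementation below: xlsxwriter format objects are created once per distinct style and reused, with `json.dumps(cell.style)` (plus the number format) serving as the cache key. A sketch of that caching, assuming xlsxwriter is installed and an illustrative file name:

```python
import json
import xlsxwriter

book = xlsxwriter.Workbook('sketch.xlsx')
cache = {}

def get_format(style_dict, num_fmt=None):
    key = json.dumps(style_dict) + (num_fmt or '')
    if key not in cache:  # build each distinct format only once
        props = {'num_format': num_fmt} if num_fmt else {}
        cache[key] = book.add_format(props)
    return cache[key]

fmt = get_format({'font': {'bold': True}}, '0.00')
assert get_format({'font': {'bold': True}}, '0.00') is fmt  # cache hit
book.close()
```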
- sheet_name = self._get_sheet_name(sheet_name) - - if sheet_name in self.sheets: - wks = self.sheets[sheet_name] - else: - wks = self.book.add_worksheet(sheet_name) - self.sheets[sheet_name] = wks - - style_dict = {'null': None} - - if _validate_freeze_panes(freeze_panes): - wks.freeze_panes(*(freeze_panes)) - - for cell in cells: - val, fmt = self._value_with_fmt(cell.val) - - stylekey = json.dumps(cell.style) - if fmt: - stylekey += fmt - - if stylekey in style_dict: - style = style_dict[stylekey] - else: - style = self.book.add_format( - _XlsxStyler.convert(cell.style, fmt)) - style_dict[stylekey] = style - - if cell.mergestart is not None and cell.mergeend is not None: - wks.merge_range(startrow + cell.row, - startcol + cell.col, - startrow + cell.mergestart, - startcol + cell.mergeend, - cell.val, style) - else: - wks.write(startrow + cell.row, - startcol + cell.col, - val, style) - - -register_writer(_XlsxWriter) diff --git a/pandas/io/excel/__init__.py b/pandas/io/excel/__init__.py new file mode 100644 index 0000000000000..704789cb6061e --- /dev/null +++ b/pandas/io/excel/__init__.py @@ -0,0 +1,16 @@ +from pandas.io.excel._base import read_excel, ExcelWriter, ExcelFile +from pandas.io.excel._openpyxl import _OpenpyxlWriter +from pandas.io.excel._util import register_writer +from pandas.io.excel._xlsxwriter import _XlsxWriter +from pandas.io.excel._xlwt import _XlwtWriter + +__all__ = ["read_excel", "ExcelWriter", "ExcelFile"] + + +register_writer(_OpenpyxlWriter) + + +register_writer(_XlwtWriter) + + +register_writer(_XlsxWriter) diff --git a/pandas/io/excel/_base.py b/pandas/io/excel/_base.py new file mode 100644 index 0000000000000..c6d390692c789 --- /dev/null +++ b/pandas/io/excel/_base.py @@ -0,0 +1,852 @@ +import abc +from collections import OrderedDict +from datetime import date, datetime, timedelta +import os +from textwrap import fill +import warnings + +import pandas.compat as compat +from pandas.compat import add_metaclass, range, string_types, u +from pandas.errors import EmptyDataError +from pandas.util._decorators import Appender, deprecate_kwarg + +from pandas.core.dtypes.common import ( + is_bool, is_float, is_integer, is_list_like) + +from pandas.core import config +from pandas.core.frame import DataFrame + +from pandas.io.common import _NA_VALUES, _stringify_path, _validate_header_arg +from pandas.io.excel._util import ( + _fill_mi_header, _get_default_writer, _maybe_convert_to_string, + _maybe_convert_usecols, _pop_header_name, get_writer) +from pandas.io.formats.printing import pprint_thing +from pandas.io.parsers import TextParser + +_read_excel_doc = """ +Read an Excel file into a pandas DataFrame. + +Support both `xls` and `xlsx` file extensions from a local filesystem or URL. +Support an option to read a single sheet or a list of sheets. + +Parameters +---------- +io : str, file descriptor, pathlib.Path, ExcelFile or xlrd.Book + The string could be a URL. Valid URL schemes include http, ftp, s3, + gcs, and file. For file URLs, a host is expected. For instance, a local + file could be /path/to/workbook.xlsx. +sheet_name : str, int, list, or None, default 0 + Strings are used for sheet names. Integers are used in zero-indexed + sheet positions. Lists of strings/integers are used to request + multiple sheets. Specify None to get all sheets. 
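Stepping back to the new `pandas/io/excel/__init__.py` at the top of this hunk: splitting the module means the writer classes must now be registered explicitly at import time. A standalone sketch of the registration and default-engine idea (these names mirror, rather than import, the private pandas helpers):

```python
_writers = {}

def register(klass):
    if not callable(klass):
        raise ValueError('Can only register callables as engines')
    _writers[klass.engine] = klass

def default_engine(ext):
    defaults = {'xlsx': 'openpyxl', 'xlsm': 'openpyxl', 'xls': 'xlwt'}
    try:
        import xlsxwriter  # noqa: F401  # preferred for .xlsx when present
        defaults['xlsx'] = 'xlsxwriter'
    except ImportError:
        pass
    return defaults[ext]

class DummyWriter(object):  # stands in for e.g. _OpenpyxlWriter
    engine = 'dummy'

register(DummyWriter)
assert _writers['dummy'] is DummyWriter
```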
+ + Available cases: + + * Defaults to ``0``: 1st sheet as a `DataFrame` + * ``1``: 2nd sheet as a `DataFrame` + * ``"Sheet1"``: Load sheet with name "Sheet1" + * ``[0, 1, "Sheet5"]``: Load first, second and sheet named "Sheet5" + as a dict of `DataFrame` + * None: All sheets. + +header : int, list of int, default 0 + Row (0-indexed) to use for the column labels of the parsed + DataFrame. If a list of integers is passed those row positions will + be combined into a ``MultiIndex``. Use None if there is no header. +names : array-like, default None + List of column names to use. If file contains no header row, + then you should explicitly pass header=None. +index_col : int, list of int, default None + Column (0-indexed) to use as the row labels of the DataFrame. + Pass None if there is no such column. If a list is passed, + those columns will be combined into a ``MultiIndex``. If a + subset of data is selected with ``usecols``, index_col + is based on the subset. +parse_cols : int or list, default None + Alias of `usecols`. + + .. deprecated:: 0.21.0 + Use `usecols` instead. + +usecols : int, str, list-like, or callable default None + Return a subset of the columns. + * If None, then parse all columns. + * If int, then indicates last column to be parsed. + + .. deprecated:: 0.24.0 + Pass in a list of int instead from 0 to `usecols` inclusive. + + * If str, then indicates comma separated list of Excel column letters + and column ranges (e.g. "A:E" or "A,C,E:F"). Ranges are inclusive of + both sides. + * If list of int, then indicates list of column numbers to be parsed. + * If list of string, then indicates list of column names to be parsed. + + .. versionadded:: 0.24.0 + + * If callable, then evaluate each column name against it and parse the + column if the callable returns ``True``. + + .. versionadded:: 0.24.0 + +squeeze : bool, default False + If the parsed data only contains one column then return a Series. +dtype : Type name or dict of column -> type, default None + Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32} + Use `object` to preserve data as stored in Excel and not interpret dtype. + If converters are specified, they will be applied INSTEAD + of dtype conversion. + + .. versionadded:: 0.20.0 + +engine : str, default None + If io is not a buffer or path, this must be set to identify io. + Acceptable values are None or xlrd. +converters : dict, default None + Dict of functions for converting values in certain columns. Keys can + either be integers or column labels, values are functions that take one + input argument, the Excel cell content, and return the transformed + content. +true_values : list, default None + Values to consider as True. + + .. versionadded:: 0.19.0 + +false_values : list, default None + Values to consider as False. + + .. versionadded:: 0.19.0 + +skiprows : list-like + Rows to skip at the beginning (0-indexed). +nrows : int, default None + Number of rows to parse. + + .. versionadded:: 0.23.0 + +na_values : scalar, str, list-like, or dict, default None + Additional strings to recognize as NA/NaN. If dict passed, specific + per-column NA values. By default the following values are interpreted + as NaN: '""" + fill("', '".join( + sorted(_NA_VALUES)), 70, subsequent_indent=" ") + """'. +keep_default_na : bool, default True + If na_values are specified and keep_default_na is False the default NaN + values are overridden, otherwise they're appended to. +verbose : bool, default False + Indicate number of NA values placed in non-numeric columns. 
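An aside on the `usecols` forms documented above: string specs like `"A,C,Z:AB"` lean on the base-26 arithmetic of Excel column letters. This sketch mirrors the private `_excel2num` helper from the original module:

```python
def excel2num(name):
    """Convert an Excel column name like 'AB' to a 0-based index."""
    index = 0
    for c in name.upper().strip():
        index = index * 26 + ord(c) - ord('A') + 1  # base 26, 'A' == 1
    return index - 1

print([excel2num(c) for c in ('A', 'C', 'Z', 'AB')])  # [0, 2, 25, 27]
```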
+parse_dates : bool, list-like, or dict, default False + The behavior is as follows: + + * bool. If True -> try parsing the index. + * list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 + each as a separate date column. + * list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as + a single date column. + * dict, e.g. {{'foo' : [1, 3]}} -> parse columns 1, 3 as date and call + result 'foo' + + If a column or index contains an unparseable date, the entire column or + index will be returned unaltered as an object data type. For non-standard + datetime parsing, use ``pd.to_datetime`` after ``pd.read_csv`` + + Note: A fast-path exists for iso8601-formatted dates. +date_parser : function, optional + Function to use for converting a sequence of string columns to an array of + datetime instances. The default uses ``dateutil.parser.parser`` to do the + conversion. Pandas will try to call `date_parser` in three different ways, + advancing to the next if an exception occurs: 1) Pass one or more arrays + (as defined by `parse_dates`) as arguments; 2) concatenate (row-wise) the + string values from the columns defined by `parse_dates` into a single array + and pass that; and 3) call `date_parser` once for each row using one or + more strings (corresponding to the columns defined by `parse_dates`) as + arguments. +thousands : str, default None + Thousands separator for parsing string columns to numeric. Note that + this parameter is only necessary for columns stored as TEXT in Excel, + any numeric columns will automatically be parsed, regardless of display + format. +comment : str, default None + Comments out remainder of line. Pass a character or characters to this + argument to indicate comments in the input file. Any data between the + comment string and the end of the current line is ignored. +skip_footer : int, default 0 + Alias of `skipfooter`. + + .. deprecated:: 0.23.0 + Use `skipfooter` instead. +skipfooter : int, default 0 + Rows at the end to skip (0-indexed). +convert_float : bool, default True + Convert integral floats to int (i.e., 1.0 --> 1). If False, all numeric + data will be read in as floats: Excel stores all numbers as floats + internally. +mangle_dupe_cols : bool, default True + Duplicate columns will be specified as 'X', 'X.1', ...'X.N', rather than + 'X'...'X'. Passing in False will cause data to be overwritten if there + are duplicate names in the columns. +**kwds : optional + Optional keyword arguments can be passed to ``TextFileReader``. + +Returns +------- +DataFrame or dict of DataFrames + DataFrame from the passed in Excel file. See notes in sheet_name + argument for more information on when a dict of DataFrames is returned. + +See Also +-------- +to_excel : Write DataFrame to an Excel file. +to_csv : Write DataFrame to a comma-separated values (csv) file. +read_csv : Read a comma-separated values (csv) file into DataFrame. +read_fwf : Read a table of fixed-width formatted lines into DataFrame. + +Examples +-------- +The file can be read using the file name as string or an open file object: + +>>> pd.read_excel('tmp.xlsx', index_col=0) # doctest: +SKIP + Name Value +0 string1 1 +1 string2 2 +2 #Comment 3 + +>>> pd.read_excel(open('tmp.xlsx', 'rb'), +... 
sheet_name='Sheet3') # doctest: +SKIP + Unnamed: 0 Name Value +0 0 string1 1 +1 1 string2 2 +2 2 #Comment 3 + +Index and header can be specified via the `index_col` and `header` arguments + +>>> pd.read_excel('tmp.xlsx', index_col=None, header=None) # doctest: +SKIP + 0 1 2 +0 NaN Name Value +1 0.0 string1 1 +2 1.0 string2 2 +3 2.0 #Comment 3 + +Column types are inferred but can be explicitly specified + +>>> pd.read_excel('tmp.xlsx', index_col=0, +... dtype={'Name': str, 'Value': float}) # doctest: +SKIP + Name Value +0 string1 1.0 +1 string2 2.0 +2 #Comment 3.0 + +True, False, and NA values, and thousands separators have defaults, +but can be explicitly specified, too. Supply the values you would like +as strings or lists of strings! + +>>> pd.read_excel('tmp.xlsx', index_col=0, +... na_values=['string1', 'string2']) # doctest: +SKIP + Name Value +0 NaN 1 +1 NaN 2 +2 #Comment 3 + +Comment lines in the excel input file can be skipped using the `comment` kwarg + +>>> pd.read_excel('tmp.xlsx', index_col=0, comment='#') # doctest: +SKIP + Name Value +0 string1 1.0 +1 string2 2.0 +2 None NaN +""" + + +@Appender(_read_excel_doc) +@deprecate_kwarg("parse_cols", "usecols") +@deprecate_kwarg("skip_footer", "skipfooter") +def read_excel(io, + sheet_name=0, + header=0, + names=None, + index_col=None, + parse_cols=None, + usecols=None, + squeeze=False, + dtype=None, + engine=None, + converters=None, + true_values=None, + false_values=None, + skiprows=None, + nrows=None, + na_values=None, + keep_default_na=True, + verbose=False, + parse_dates=False, + date_parser=None, + thousands=None, + comment=None, + skip_footer=0, + skipfooter=0, + convert_float=True, + mangle_dupe_cols=True, + **kwds): + + # Can't use _deprecate_kwarg since sheetname=None has a special meaning + if is_integer(sheet_name) and sheet_name == 0 and 'sheetname' in kwds: + warnings.warn("The `sheetname` keyword is deprecated, use " + "`sheet_name` instead", FutureWarning, stacklevel=2) + sheet_name = kwds.pop("sheetname") + + if 'sheet' in kwds: + raise TypeError("read_excel() got an unexpected keyword argument " + "`sheet`") + + if not isinstance(io, ExcelFile): + io = ExcelFile(io, engine=engine) + + return io.parse( + sheet_name=sheet_name, + header=header, + names=names, + index_col=index_col, + usecols=usecols, + squeeze=squeeze, + dtype=dtype, + converters=converters, + true_values=true_values, + false_values=false_values, + skiprows=skiprows, + nrows=nrows, + na_values=na_values, + keep_default_na=keep_default_na, + verbose=verbose, + parse_dates=parse_dates, + date_parser=date_parser, + thousands=thousands, + comment=comment, + skipfooter=skipfooter, + convert_float=convert_float, + mangle_dupe_cols=mangle_dupe_cols, + **kwds) + + +@add_metaclass(abc.ABCMeta) +class _BaseExcelReader(object): + + @property + @abc.abstractmethod + def sheet_names(self): + pass + + @abc.abstractmethod + def get_sheet_by_name(self, name): + pass + + @abc.abstractmethod + def get_sheet_by_index(self, index): + pass + + @abc.abstractmethod + def get_sheet_data(self, sheet, convert_float): + pass + + def parse(self, + sheet_name=0, + header=0, + names=None, + index_col=None, + usecols=None, + squeeze=False, + dtype=None, + true_values=None, + false_values=None, + skiprows=None, + nrows=None, + na_values=None, + verbose=False, + parse_dates=False, + date_parser=None, + thousands=None, + comment=None, + skipfooter=0, + convert_float=True, + mangle_dupe_cols=True, + **kwds): + + _validate_header_arg(header) + + ret_dict = False + + # Keep sheetname to 
maintain backwards compatibility. + if isinstance(sheet_name, list): + sheets = sheet_name + ret_dict = True + elif sheet_name is None: + sheets = self.sheet_names + ret_dict = True + else: + sheets = [sheet_name] + + # handle same-type duplicates. + sheets = list(OrderedDict.fromkeys(sheets).keys()) + + output = OrderedDict() + + for asheetname in sheets: + if verbose: + print("Reading sheet {sheet}".format(sheet=asheetname)) + + if isinstance(asheetname, compat.string_types): + sheet = self.get_sheet_by_name(asheetname) + else: # assume an integer if not a string + sheet = self.get_sheet_by_index(asheetname) + + data = self.get_sheet_data(sheet, convert_float) + usecols = _maybe_convert_usecols(usecols) + + if sheet.nrows == 0: + output[asheetname] = DataFrame() + continue + + if is_list_like(header) and len(header) == 1: + header = header[0] + + # forward fill and pull out names for MultiIndex column + header_names = None + if header is not None and is_list_like(header): + header_names = [] + control_row = [True] * len(data[0]) + + for row in header: + if is_integer(skiprows): + row += skiprows + + data[row], control_row = _fill_mi_header(data[row], + control_row) + + if index_col is not None: + header_name, _ = _pop_header_name(data[row], index_col) + header_names.append(header_name) + + if is_list_like(index_col): + # Forward fill values for MultiIndex index. + if not is_list_like(header): + offset = 1 + header + else: + offset = 1 + max(header) + + # Check if we have an empty dataset + # before trying to collect data. + if offset < len(data): + for col in index_col: + last = data[offset][col] + + for row in range(offset + 1, len(data)): + if data[row][col] == '' or data[row][col] is None: + data[row][col] = last + else: + last = data[row][col] + + has_index_names = is_list_like(header) and len(header) > 1 + + # GH 12292 : error when read one empty column from excel file + try: + parser = TextParser(data, + names=names, + header=header, + index_col=index_col, + has_index_names=has_index_names, + squeeze=squeeze, + dtype=dtype, + true_values=true_values, + false_values=false_values, + skiprows=skiprows, + nrows=nrows, + na_values=na_values, + parse_dates=parse_dates, + date_parser=date_parser, + thousands=thousands, + comment=comment, + skipfooter=skipfooter, + usecols=usecols, + mangle_dupe_cols=mangle_dupe_cols, + **kwds) + + output[asheetname] = parser.read(nrows=nrows) + + if not squeeze or isinstance(output[asheetname], DataFrame): + if header_names: + output[asheetname].columns = output[ + asheetname].columns.set_names(header_names) + elif compat.PY2: + output[asheetname].columns = _maybe_convert_to_string( + output[asheetname].columns) + + except EmptyDataError: + # No Data, return an empty DataFrame + output[asheetname] = DataFrame() + + if ret_dict: + return output + else: + return output[asheetname] + + +@add_metaclass(abc.ABCMeta) +class ExcelWriter(object): + """ + Class for writing DataFrame objects into excel sheets, default is to use + xlwt for xls, openpyxl for xlsx. See DataFrame.to_excel for typical usage. + + Parameters + ---------- + path : string + Path to xls or xlsx file. + engine : string (optional) + Engine to use for writing. If None, defaults to + ``io.excel..writer``. NOTE: can only be passed as a keyword + argument. + date_format : string, default None + Format string for dates written into Excel files (e.g. 'YYYY-MM-DD') + datetime_format : string, default None + Format string for datetime objects written into Excel files + (e.g. 
'YYYY-MM-DD HH:MM:SS')
+    mode : {'w' or 'a'}, default 'w'
+        File mode to use (write or append).
+
+        .. versionadded:: 0.24.0
+
+    Attributes
+    ----------
+    None
+
+    Methods
+    -------
+    None
+
+    Notes
+    -----
+    None of the methods and properties are considered public.
+
+    For compatibility with CSV writers, ExcelWriter serializes lists
+    and dicts to strings before writing.
+
+    Examples
+    --------
+    Default usage:
+
+    >>> with ExcelWriter('path_to_file.xlsx') as writer:
+    ...     df.to_excel(writer)
+
+    To write to separate sheets in a single file:
+
+    >>> with ExcelWriter('path_to_file.xlsx') as writer:
+    ...     df1.to_excel(writer, sheet_name='Sheet1')
+    ...     df2.to_excel(writer, sheet_name='Sheet2')
+
+    You can set the date format or datetime format:
+
+    >>> with ExcelWriter('path_to_file.xlsx',
+    ...                  date_format='YYYY-MM-DD',
+    ...                  datetime_format='YYYY-MM-DD HH:MM:SS') as writer:
+    ...     df.to_excel(writer)
+
+    You can also append to an existing Excel file:
+
+    >>> with ExcelWriter('path_to_file.xlsx', mode='a') as writer:
+    ...     df.to_excel(writer, sheet_name='Sheet3')
+    """
+    # Defining an ExcelWriter implementation (see abstract methods for more...)
+
+    # - Mandatory
+    #   - ``write_cells(self, cells, sheet_name=None, startrow=0, startcol=0)``
+    #     --> called to write additional DataFrames to disk
+    #   - ``supported_extensions`` (tuple of supported extensions), used to
+    #     check that engine supports the given extension.
+    #   - ``engine`` - string that gives the engine name. Necessary to
+    #     instantiate class directly and bypass the engine lookup in
+    #     ``ExcelWriter.__new__``.
+    #   - ``save(self)`` --> called to save file to disk
+    # - Mostly mandatory (i.e. should at least exist)
+    #   - book, cur_sheet, path
+
+    # - Optional:
+    #   - ``__init__(self, path, engine=None, **kwargs)`` --> always called
+    #     with path as first argument.
+
+    # You also need to register the class with ``register_writer()``.
+    # Technically, ExcelWriter implementations don't need to subclass
+    # ExcelWriter.
+    def __new__(cls, path, engine=None, **kwargs):
+        # only switch class if generic(ExcelWriter)
+
+        if issubclass(cls, ExcelWriter):
+            if engine is None or (isinstance(engine, string_types) and
+                                  engine == 'auto'):
+                if isinstance(path, string_types):
+                    ext = os.path.splitext(path)[-1][1:]
+                else:
+                    ext = 'xlsx'
+
+                try:
+                    engine = config.get_option('io.excel.{ext}.writer'
+                                               .format(ext=ext))
+                    if engine == 'auto':
+                        engine = _get_default_writer(ext)
+                except KeyError:
+                    raise ValueError("No engine for filetype: '{ext}'"
+                                     .format(ext=ext))
+            cls = get_writer(engine)
+
+        return object.__new__(cls)
+
+    # declare external properties you can count on
+    book = None
+    curr_sheet = None
+    path = None
+
+    @abc.abstractproperty
+    def supported_extensions(self):
+        "extensions that writer engine supports"
+        pass
+
+    @abc.abstractproperty
+    def engine(self):
+        "name of engine"
+        pass
+
+    @abc.abstractmethod
+    def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0,
+                    freeze_panes=None):
+        """
+        Write given formatted cells into an Excel sheet.
+
+        Parameters
+        ----------
+        cells : generator
+            cells of formatted data to save to the Excel sheet
+        sheet_name : string, default None
+            Name of Excel sheet, if None, then use self.cur_sheet
+        startrow : upper left cell row to dump data frame
+        startcol : upper left cell column to dump data frame
+        freeze_panes : integer tuple of length 2
+            contains the bottom-most row and right-most column to freeze
+        """
+        pass
+
+    @abc.abstractmethod
+    def save(self):
+        """
+        Save workbook to disk.
+ """ + pass + + def __init__(self, path, engine=None, + date_format=None, datetime_format=None, mode='w', + **engine_kwargs): + # validate that this engine can handle the extension + if isinstance(path, string_types): + ext = os.path.splitext(path)[-1] + else: + ext = 'xls' if engine == 'xlwt' else 'xlsx' + + self.check_extension(ext) + + self.path = path + self.sheets = {} + self.cur_sheet = None + + if date_format is None: + self.date_format = 'YYYY-MM-DD' + else: + self.date_format = date_format + if datetime_format is None: + self.datetime_format = 'YYYY-MM-DD HH:MM:SS' + else: + self.datetime_format = datetime_format + + self.mode = mode + + def __fspath__(self): + return _stringify_path(self.path) + + def _get_sheet_name(self, sheet_name): + if sheet_name is None: + sheet_name = self.cur_sheet + if sheet_name is None: # pragma: no cover + raise ValueError('Must pass explicit sheet_name or set ' + 'cur_sheet property') + return sheet_name + + def _value_with_fmt(self, val): + """Convert numpy types to Python types for the Excel writers. + + Parameters + ---------- + val : object + Value to be written into cells + + Returns + ------- + Tuple with the first element being the converted value and the second + being an optional format + """ + fmt = None + + if is_integer(val): + val = int(val) + elif is_float(val): + val = float(val) + elif is_bool(val): + val = bool(val) + elif isinstance(val, datetime): + fmt = self.datetime_format + elif isinstance(val, date): + fmt = self.date_format + elif isinstance(val, timedelta): + val = val.total_seconds() / float(86400) + fmt = '0' + else: + val = compat.to_str(val) + + return val, fmt + + @classmethod + def check_extension(cls, ext): + """checks that path's extension against the Writer's supported + extensions. If it isn't supported, raises UnsupportedFiletypeError.""" + if ext.startswith('.'): + ext = ext[1:] + if not any(ext in extension for extension in cls.supported_extensions): + msg = (u("Invalid extension for engine '{engine}': '{ext}'") + .format(engine=pprint_thing(cls.engine), + ext=pprint_thing(ext))) + raise ValueError(msg) + else: + return True + + # Allow use as a contextmanager + def __enter__(self): + return self + + def __exit__(self, exc_type, exc_value, traceback): + self.close() + + def close(self): + """synonym for save, to make it more file-like""" + return self.save() + + +class ExcelFile(object): + """ + Class for parsing tabular excel sheets into DataFrame objects. + Uses xlrd. See read_excel for more documentation + + Parameters + ---------- + io : string, path object (pathlib.Path or py._path.local.LocalPath), + file-like object or xlrd workbook + If a string or path object, expected to be a path to xls or xlsx file. + engine : string, default None + If io is not a buffer or path, this must be set to identify io. + Acceptable values are None or ``xlrd``. + """ + + from pandas.io.excel._xlrd import _XlrdReader + + _engines = { + 'xlrd': _XlrdReader, + } + + def __init__(self, io, engine=None): + if engine is None: + engine = 'xlrd' + if engine not in self._engines: + raise ValueError("Unknown engine: {engine}".format(engine=engine)) + + # could be a str, ExcelFile, Book, etc. 
+ self.io = io + # Always a string + self._io = _stringify_path(io) + + self._reader = self._engines[engine](self._io) + + def __fspath__(self): + return self._io + + def parse(self, + sheet_name=0, + header=0, + names=None, + index_col=None, + usecols=None, + squeeze=False, + converters=None, + true_values=None, + false_values=None, + skiprows=None, + nrows=None, + na_values=None, + parse_dates=False, + date_parser=None, + thousands=None, + comment=None, + skipfooter=0, + convert_float=True, + mangle_dupe_cols=True, + **kwds): + """ + Parse specified sheet(s) into a DataFrame + + Equivalent to read_excel(ExcelFile, ...) See the read_excel + docstring for more info on accepted parameters + """ + + # Can't use _deprecate_kwarg since sheetname=None has a special meaning + if is_integer(sheet_name) and sheet_name == 0 and 'sheetname' in kwds: + warnings.warn("The `sheetname` keyword is deprecated, use " + "`sheet_name` instead", FutureWarning, stacklevel=2) + sheet_name = kwds.pop("sheetname") + elif 'sheetname' in kwds: + raise TypeError("Cannot specify both `sheet_name` " + "and `sheetname`. Use just `sheet_name`") + + if 'chunksize' in kwds: + raise NotImplementedError("chunksize keyword of read_excel " + "is not implemented") + + return self._reader.parse(sheet_name=sheet_name, + header=header, + names=names, + index_col=index_col, + usecols=usecols, + squeeze=squeeze, + converters=converters, + true_values=true_values, + false_values=false_values, + skiprows=skiprows, + nrows=nrows, + na_values=na_values, + parse_dates=parse_dates, + date_parser=date_parser, + thousands=thousands, + comment=comment, + skipfooter=skipfooter, + convert_float=convert_float, + mangle_dupe_cols=mangle_dupe_cols, + **kwds) + + @property + def book(self): + return self._reader.book + + @property + def sheet_names(self): + return self._reader.sheet_names + + def close(self): + """close io if necessary""" + if hasattr(self.io, 'close'): + self.io.close() + + def __enter__(self): + return self + + def __exit__(self, exc_type, exc_value, traceback): + self.close() diff --git a/pandas/io/excel/_openpyxl.py b/pandas/io/excel/_openpyxl.py new file mode 100644 index 0000000000000..8d79c13a65c97 --- /dev/null +++ b/pandas/io/excel/_openpyxl.py @@ -0,0 +1,453 @@ +from pandas.io.excel._base import ExcelWriter +from pandas.io.excel._util import _validate_freeze_panes + + +class _OpenpyxlWriter(ExcelWriter): + engine = 'openpyxl' + supported_extensions = ('.xlsx', '.xlsm') + + def __init__(self, path, engine=None, mode='w', **engine_kwargs): + # Use the openpyxl module as the Excel writer. + from openpyxl.workbook import Workbook + + super(_OpenpyxlWriter, self).__init__(path, mode=mode, **engine_kwargs) + + if self.mode == 'a': # Load from existing workbook + from openpyxl import load_workbook + book = load_workbook(self.path) + self.book = book + else: + # Create workbook object with default optimized_write=True. + self.book = Workbook() + + if self.book.worksheets: + try: + self.book.remove(self.book.worksheets[0]) + except AttributeError: + + # compat - for openpyxl <= 2.4 + self.book.remove_sheet(self.book.worksheets[0]) + + def save(self): + """ + Save workbook to disk. 
+ """ + return self.book.save(self.path) + + @classmethod + def _convert_to_style(cls, style_dict): + """ + converts a style_dict to an openpyxl style object + Parameters + ---------- + style_dict : style dictionary to convert + """ + + from openpyxl.style import Style + xls_style = Style() + for key, value in style_dict.items(): + for nk, nv in value.items(): + if key == "borders": + (xls_style.borders.__getattribute__(nk) + .__setattr__('border_style', nv)) + else: + xls_style.__getattribute__(key).__setattr__(nk, nv) + + return xls_style + + @classmethod + def _convert_to_style_kwargs(cls, style_dict): + """ + Convert a style_dict to a set of kwargs suitable for initializing + or updating-on-copy an openpyxl v2 style object + Parameters + ---------- + style_dict : dict + A dict with zero or more of the following keys (or their synonyms). + 'font' + 'fill' + 'border' ('borders') + 'alignment' + 'number_format' + 'protection' + Returns + ------- + style_kwargs : dict + A dict with the same, normalized keys as ``style_dict`` but each + value has been replaced with a native openpyxl style object of the + appropriate class. + """ + + _style_key_map = { + 'borders': 'border', + } + + style_kwargs = {} + for k, v in style_dict.items(): + if k in _style_key_map: + k = _style_key_map[k] + _conv_to_x = getattr(cls, '_convert_to_{k}'.format(k=k), + lambda x: None) + new_v = _conv_to_x(v) + if new_v: + style_kwargs[k] = new_v + + return style_kwargs + + @classmethod + def _convert_to_color(cls, color_spec): + """ + Convert ``color_spec`` to an openpyxl v2 Color object + Parameters + ---------- + color_spec : str, dict + A 32-bit ARGB hex string, or a dict with zero or more of the + following keys. + 'rgb' + 'indexed' + 'auto' + 'theme' + 'tint' + 'index' + 'type' + Returns + ------- + color : openpyxl.styles.Color + """ + + from openpyxl.styles import Color + + if isinstance(color_spec, str): + return Color(color_spec) + else: + return Color(**color_spec) + + @classmethod + def _convert_to_font(cls, font_dict): + """ + Convert ``font_dict`` to an openpyxl v2 Font object + Parameters + ---------- + font_dict : dict + A dict with zero or more of the following keys (or their synonyms). + 'name' + 'size' ('sz') + 'bold' ('b') + 'italic' ('i') + 'underline' ('u') + 'strikethrough' ('strike') + 'color' + 'vertAlign' ('vertalign') + 'charset' + 'scheme' + 'family' + 'outline' + 'shadow' + 'condense' + Returns + ------- + font : openpyxl.styles.Font + """ + + from openpyxl.styles import Font + + _font_key_map = { + 'sz': 'size', + 'b': 'bold', + 'i': 'italic', + 'u': 'underline', + 'strike': 'strikethrough', + 'vertalign': 'vertAlign', + } + + font_kwargs = {} + for k, v in font_dict.items(): + if k in _font_key_map: + k = _font_key_map[k] + if k == 'color': + v = cls._convert_to_color(v) + font_kwargs[k] = v + + return Font(**font_kwargs) + + @classmethod + def _convert_to_stop(cls, stop_seq): + """ + Convert ``stop_seq`` to a list of openpyxl v2 Color objects, + suitable for initializing the ``GradientFill`` ``stop`` parameter. + Parameters + ---------- + stop_seq : iterable + An iterable that yields objects suitable for consumption by + ``_convert_to_color``. 
+ Returns + ------- + stop : list of openpyxl.styles.Color + """ + + return map(cls._convert_to_color, stop_seq) + + @classmethod + def _convert_to_fill(cls, fill_dict): + """ + Convert ``fill_dict`` to an openpyxl v2 Fill object + Parameters + ---------- + fill_dict : dict + A dict with one or more of the following keys (or their synonyms), + 'fill_type' ('patternType', 'patterntype') + 'start_color' ('fgColor', 'fgcolor') + 'end_color' ('bgColor', 'bgcolor') + or one or more of the following keys (or their synonyms). + 'type' ('fill_type') + 'degree' + 'left' + 'right' + 'top' + 'bottom' + 'stop' + Returns + ------- + fill : openpyxl.styles.Fill + """ + + from openpyxl.styles import PatternFill, GradientFill + + _pattern_fill_key_map = { + 'patternType': 'fill_type', + 'patterntype': 'fill_type', + 'fgColor': 'start_color', + 'fgcolor': 'start_color', + 'bgColor': 'end_color', + 'bgcolor': 'end_color', + } + + _gradient_fill_key_map = { + 'fill_type': 'type', + } + + pfill_kwargs = {} + gfill_kwargs = {} + for k, v in fill_dict.items(): + pk = gk = None + if k in _pattern_fill_key_map: + pk = _pattern_fill_key_map[k] + if k in _gradient_fill_key_map: + gk = _gradient_fill_key_map[k] + if pk in ['start_color', 'end_color']: + v = cls._convert_to_color(v) + if gk == 'stop': + v = cls._convert_to_stop(v) + if pk: + pfill_kwargs[pk] = v + elif gk: + gfill_kwargs[gk] = v + else: + pfill_kwargs[k] = v + gfill_kwargs[k] = v + + try: + return PatternFill(**pfill_kwargs) + except TypeError: + return GradientFill(**gfill_kwargs) + + @classmethod + def _convert_to_side(cls, side_spec): + """ + Convert ``side_spec`` to an openpyxl v2 Side object + Parameters + ---------- + side_spec : str, dict + A string specifying the border style, or a dict with zero or more + of the following keys (or their synonyms). + 'style' ('border_style') + 'color' + Returns + ------- + side : openpyxl.styles.Side + """ + + from openpyxl.styles import Side + + _side_key_map = { + 'border_style': 'style', + } + + if isinstance(side_spec, str): + return Side(style=side_spec) + + side_kwargs = {} + for k, v in side_spec.items(): + if k in _side_key_map: + k = _side_key_map[k] + if k == 'color': + v = cls._convert_to_color(v) + side_kwargs[k] = v + + return Side(**side_kwargs) + + @classmethod + def _convert_to_border(cls, border_dict): + """ + Convert ``border_dict`` to an openpyxl v2 Border object + Parameters + ---------- + border_dict : dict + A dict with zero or more of the following keys (or their synonyms). + 'left' + 'right' + 'top' + 'bottom' + 'diagonal' + 'diagonal_direction' + 'vertical' + 'horizontal' + 'diagonalUp' ('diagonalup') + 'diagonalDown' ('diagonaldown') + 'outline' + Returns + ------- + border : openpyxl.styles.Border + """ + + from openpyxl.styles import Border + + _border_key_map = { + 'diagonalup': 'diagonalUp', + 'diagonaldown': 'diagonalDown', + } + + border_kwargs = {} + for k, v in border_dict.items(): + if k in _border_key_map: + k = _border_key_map[k] + if k == 'color': + v = cls._convert_to_color(v) + if k in ['left', 'right', 'top', 'bottom', 'diagonal']: + v = cls._convert_to_side(v) + border_kwargs[k] = v + + return Border(**border_kwargs) + + @classmethod + def _convert_to_alignment(cls, alignment_dict): + """ + Convert ``alignment_dict`` to an openpyxl v2 Alignment object + Parameters + ---------- + alignment_dict : dict + A dict with zero or more of the following keys (or their synonyms). 
+ 'horizontal' + 'vertical' + 'text_rotation' + 'wrap_text' + 'shrink_to_fit' + 'indent' + Returns + ------- + alignment : openpyxl.styles.Alignment + """ + + from openpyxl.styles import Alignment + + return Alignment(**alignment_dict) + + @classmethod + def _convert_to_number_format(cls, number_format_dict): + """ + Convert ``number_format_dict`` to an openpyxl v2.1.0 number format + initializer. + Parameters + ---------- + number_format_dict : dict + A dict with zero or more of the following keys. + 'format_code' : str + Returns + ------- + number_format : str + """ + return number_format_dict['format_code'] + + @classmethod + def _convert_to_protection(cls, protection_dict): + """ + Convert ``protection_dict`` to an openpyxl v2 Protection object. + Parameters + ---------- + protection_dict : dict + A dict with zero or more of the following keys. + 'locked' + 'hidden' + Returns + ------- + """ + + from openpyxl.styles import Protection + + return Protection(**protection_dict) + + def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0, + freeze_panes=None): + # Write the frame cells using openpyxl. + sheet_name = self._get_sheet_name(sheet_name) + + _style_cache = {} + + if sheet_name in self.sheets: + wks = self.sheets[sheet_name] + else: + wks = self.book.create_sheet() + wks.title = sheet_name + self.sheets[sheet_name] = wks + + if _validate_freeze_panes(freeze_panes): + wks.freeze_panes = wks.cell(row=freeze_panes[0] + 1, + column=freeze_panes[1] + 1) + + for cell in cells: + xcell = wks.cell( + row=startrow + cell.row + 1, + column=startcol + cell.col + 1 + ) + xcell.value, fmt = self._value_with_fmt(cell.val) + if fmt: + xcell.number_format = fmt + + style_kwargs = {} + if cell.style: + key = str(cell.style) + style_kwargs = _style_cache.get(key) + if style_kwargs is None: + style_kwargs = self._convert_to_style_kwargs(cell.style) + _style_cache[key] = style_kwargs + + if style_kwargs: + for k, v in style_kwargs.items(): + setattr(xcell, k, v) + + if cell.mergestart is not None and cell.mergeend is not None: + + wks.merge_cells( + start_row=startrow + cell.row + 1, + start_column=startcol + cell.col + 1, + end_column=startcol + cell.mergeend + 1, + end_row=startrow + cell.mergestart + 1 + ) + + # When cells are merged only the top-left cell is preserved + # The behaviour of the other cells in a merged range is + # undefined + if style_kwargs: + first_row = startrow + cell.row + 1 + last_row = startrow + cell.mergestart + 1 + first_col = startcol + cell.col + 1 + last_col = startcol + cell.mergeend + 1 + + for row in range(first_row, last_row + 1): + for col in range(first_col, last_col + 1): + if row == first_row and col == first_col: + # Ignore first cell. It is already handled. + continue + xcell = wks.cell(column=col, row=row) + for k, v in style_kwargs.items(): + setattr(xcell, k, v) diff --git a/pandas/io/excel/_util.py b/pandas/io/excel/_util.py new file mode 100644 index 0000000000000..49255d83d1cd3 --- /dev/null +++ b/pandas/io/excel/_util.py @@ -0,0 +1,265 @@ +import warnings + +import pandas.compat as compat +from pandas.compat import lrange, range + +from pandas.core.dtypes.common import is_integer, is_list_like + +_writers = {} + + +def register_writer(klass): + """ + Add engine to the excel writer registry.io.excel. + + You must use this method to integrate with ``to_excel``. 
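+
+    A minimal sketch of a custom engine (``MyWriter`` and its method
+    bodies are hypothetical; they only illustrate the required interface):
+
+    >>> class MyWriter(ExcelWriter):  # doctest: +SKIP
+    ...     engine = 'mywriter'  # hypothetical engine name
+    ...     supported_extensions = ('.xlsx',)
+    ...     def write_cells(self, cells, sheet_name=None, startrow=0,
+    ...                     startcol=0, freeze_panes=None):
+    ...         pass  # write each formatted cell to the workbook
+    ...     def save(self):
+    ...         pass  # flush the workbook to disk
+    >>> register_writer(MyWriter)  # doctest: +SKIP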
+ + Parameters + ---------- + klass : ExcelWriter + """ + if not callable(klass): + raise ValueError("Can only register callables as engines") + engine_name = klass.engine + _writers[engine_name] = klass + + +def _get_default_writer(ext): + """ + Return the default writer for the given extension. + + Parameters + ---------- + ext : str + The excel file extension for which to get the default engine. + + Returns + ------- + str + The default engine for the extension. + """ + _default_writers = {'xlsx': 'openpyxl', 'xlsm': 'openpyxl', 'xls': 'xlwt'} + try: + import xlsxwriter # noqa + _default_writers['xlsx'] = 'xlsxwriter' + except ImportError: + pass + return _default_writers[ext] + + +def get_writer(engine_name): + try: + return _writers[engine_name] + except KeyError: + raise ValueError("No Excel writer '{engine}'" + .format(engine=engine_name)) + + +def _excel2num(x): + """ + Convert Excel column name like 'AB' to 0-based column index. + + Parameters + ---------- + x : str + The Excel column name to convert to a 0-based column index. + + Returns + ------- + num : int + The column index corresponding to the name. + + Raises + ------ + ValueError + Part of the Excel column name was invalid. + """ + index = 0 + + for c in x.upper().strip(): + cp = ord(c) + + if cp < ord("A") or cp > ord("Z"): + raise ValueError("Invalid column name: {x}".format(x=x)) + + index = index * 26 + cp - ord("A") + 1 + + return index - 1 + + +def _range2cols(areas): + """ + Convert comma separated list of column names and ranges to indices. + + Parameters + ---------- + areas : str + A string containing a sequence of column ranges (or areas). + + Returns + ------- + cols : list + A list of 0-based column indices. + + Examples + -------- + >>> _range2cols('A:E') + [0, 1, 2, 3, 4] + >>> _range2cols('A,C,Z:AB') + [0, 2, 25, 26, 27] + """ + cols = [] + + for rng in areas.split(","): + if ":" in rng: + rng = rng.split(":") + cols.extend(lrange(_excel2num(rng[0]), _excel2num(rng[1]) + 1)) + else: + cols.append(_excel2num(rng)) + + return cols + + +def _maybe_convert_usecols(usecols): + """ + Convert `usecols` into a compatible format for parsing in `parsers.py`. + + Parameters + ---------- + usecols : object + The use-columns object to potentially convert. + + Returns + ------- + converted : object + The compatible format of `usecols`. + """ + if usecols is None: + return usecols + + if is_integer(usecols): + warnings.warn(("Passing in an integer for `usecols` has been " + "deprecated. Please pass in a list of int from " + "0 to `usecols` inclusive instead."), + FutureWarning, stacklevel=2) + return lrange(usecols + 1) + + if isinstance(usecols, compat.string_types): + return _range2cols(usecols) + + return usecols + + +def _validate_freeze_panes(freeze_panes): + if freeze_panes is not None: + if ( + len(freeze_panes) == 2 and + all(isinstance(item, int) for item in freeze_panes) + ): + return True + + raise ValueError("freeze_panes must be of form (row, column)" + " where row and column are integers") + + # freeze_panes wasn't specified, return False so it won't be applied + # to output sheet + return False + + +def _trim_excel_header(row): + # trim header row so auto-index inference works + # xlrd uses '' , openpyxl None + while len(row) > 0 and (row[0] == '' or row[0] is None): + row = row[1:] + return row + + +def _maybe_convert_to_string(row): + """ + Convert elements in a row to string from Unicode. + + This is purely a Python 2.x patch and is performed ONLY when all + elements of the row are string-like. 
+ + Parameters + ---------- + row : array-like + The row of data to convert. + + Returns + ------- + converted : array-like + """ + if compat.PY2: + converted = [] + + for i in range(len(row)): + if isinstance(row[i], compat.string_types): + try: + converted.append(str(row[i])) + except UnicodeEncodeError: + break + else: + break + else: + row = converted + + return row + + +def _fill_mi_header(row, control_row): + """Forward fill blank entries in row but only inside the same parent index. + + Used for creating headers in Multiindex. + Parameters + ---------- + row : list + List of items in a single row. + control_row : list of bool + Helps to determine if particular column is in same parent index as the + previous value. Used to stop propagation of empty cells between + different indexes. + + Returns + ---------- + Returns changed row and control_row + """ + last = row[0] + for i in range(1, len(row)): + if not control_row[i]: + last = row[i] + + if row[i] == '' or row[i] is None: + row[i] = last + else: + control_row[i] = False + last = row[i] + + return _maybe_convert_to_string(row), control_row + + +def _pop_header_name(row, index_col): + """ + Pop the header name for MultiIndex parsing. + + Parameters + ---------- + row : list + The data row to parse for the header name. + index_col : int, list + The index columns for our data. Assumed to be non-null. + + Returns + ------- + header_name : str + The extracted header name. + trimmed_row : list + The original data row with the header name removed. + """ + # Pop out header name and fill w/blank. + i = index_col if not is_list_like(index_col) else max(index_col) + + header_name = row[i] + header_name = None if header_name == "" else header_name + + return header_name, row[:i] + [''] + row[i + 1:] diff --git a/pandas/io/excel/_xlrd.py b/pandas/io/excel/_xlrd.py new file mode 100644 index 0000000000000..60f7d8f94a399 --- /dev/null +++ b/pandas/io/excel/_xlrd.py @@ -0,0 +1,126 @@ +from datetime import time +from distutils.version import LooseVersion +from io import UnsupportedOperation + +import numpy as np + +import pandas.compat as compat +from pandas.compat import range, zip + +from pandas.io.common import _is_url, _urlopen, get_filepath_or_buffer +from pandas.io.excel._base import _BaseExcelReader + + +class _XlrdReader(_BaseExcelReader): + + def __init__(self, filepath_or_buffer): + """Reader using xlrd engine. + + Parameters + ---------- + filepath_or_buffer : string, path object or Workbook + Object to be parsed. + """ + err_msg = "Install xlrd >= 1.0.0 for Excel support" + + try: + import xlrd + except ImportError: + raise ImportError(err_msg) + else: + if xlrd.__VERSION__ < LooseVersion("1.0.0"): + raise ImportError(err_msg + + ". Current version " + xlrd.__VERSION__) + + from pandas.io.excel._base import ExcelFile + # If filepath_or_buffer is a url, want to keep the data as bytes so + # can't pass to get_filepath_or_buffer() + if _is_url(filepath_or_buffer): + filepath_or_buffer = _urlopen(filepath_or_buffer) + elif not isinstance(filepath_or_buffer, (ExcelFile, xlrd.Book)): + filepath_or_buffer, _, _, _ = get_filepath_or_buffer( + filepath_or_buffer) + + if isinstance(filepath_or_buffer, xlrd.Book): + self.book = filepath_or_buffer + elif hasattr(filepath_or_buffer, "read"): + # N.B. 
xlrd.Book has a read attribute too + if hasattr(filepath_or_buffer, 'seek'): + try: + # GH 19779 + filepath_or_buffer.seek(0) + except UnsupportedOperation: + # HTTPResponse does not support seek() + # GH 20434 + pass + + data = filepath_or_buffer.read() + self.book = xlrd.open_workbook(file_contents=data) + elif isinstance(filepath_or_buffer, compat.string_types): + self.book = xlrd.open_workbook(filepath_or_buffer) + else: + raise ValueError('Must explicitly set engine if not passing in' + ' buffer or path for io.') + + @property + def sheet_names(self): + return self.book.sheet_names() + + def get_sheet_by_name(self, name): + return self.book.sheet_by_name(name) + + def get_sheet_by_index(self, index): + return self.book.sheet_by_index(index) + + def get_sheet_data(self, sheet, convert_float): + from xlrd import (xldate, XL_CELL_DATE, + XL_CELL_ERROR, XL_CELL_BOOLEAN, + XL_CELL_NUMBER) + + epoch1904 = self.book.datemode + + def _parse_cell(cell_contents, cell_typ): + """converts the contents of the cell into a pandas + appropriate object""" + + if cell_typ == XL_CELL_DATE: + + # Use the newer xlrd datetime handling. + try: + cell_contents = xldate.xldate_as_datetime( + cell_contents, epoch1904) + except OverflowError: + return cell_contents + + # Excel doesn't distinguish between dates and time, + # so we treat dates on the epoch as times only. + # Also, Excel supports 1900 and 1904 epochs. + year = (cell_contents.timetuple())[0:3] + if ((not epoch1904 and year == (1899, 12, 31)) or + (epoch1904 and year == (1904, 1, 1))): + cell_contents = time(cell_contents.hour, + cell_contents.minute, + cell_contents.second, + cell_contents.microsecond) + + elif cell_typ == XL_CELL_ERROR: + cell_contents = np.nan + elif cell_typ == XL_CELL_BOOLEAN: + cell_contents = bool(cell_contents) + elif convert_float and cell_typ == XL_CELL_NUMBER: + # GH5394 - Excel 'numbers' are always floats + # it's a minimal perf hit and less surprising + val = int(cell_contents) + if val == cell_contents: + cell_contents = val + return cell_contents + + data = [] + + for i in range(sheet.nrows): + row = [_parse_cell(value, typ) + for value, typ in zip(sheet.row_values(i), + sheet.row_types(i))] + data.append(row) + + return data diff --git a/pandas/io/excel/_xlsxwriter.py b/pandas/io/excel/_xlsxwriter.py new file mode 100644 index 0000000000000..531a3657cac6f --- /dev/null +++ b/pandas/io/excel/_xlsxwriter.py @@ -0,0 +1,218 @@ +import pandas._libs.json as json +from pandas.compat import string_types + +from pandas.io.excel._base import ExcelWriter +from pandas.io.excel._util import _validate_freeze_panes + + +class _XlsxStyler(object): + # Map from openpyxl-oriented styles to flatter xlsxwriter representation + # Ordering necessary for both determinism and because some are keyed by + # prefixes of others. 
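+    # Illustrative example (not part of the mapping): an openpyxl-style
+    # dict such as {'font': {'color': {'rgb': 'FF000000'}, 'b': True}}
+    # flattens to the xlsxwriter kwargs
+    # {'font_color': 'FF000000', 'bold': True}; the longer
+    # ('color', 'rgb') path must come before the shorter ('color',)
+    # prefix so that it is matched first.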
+ STYLE_MAPPING = { + 'font': [ + (('name',), 'font_name'), + (('sz',), 'font_size'), + (('size',), 'font_size'), + (('color', 'rgb',), 'font_color'), + (('color',), 'font_color'), + (('b',), 'bold'), + (('bold',), 'bold'), + (('i',), 'italic'), + (('italic',), 'italic'), + (('u',), 'underline'), + (('underline',), 'underline'), + (('strike',), 'font_strikeout'), + (('vertAlign',), 'font_script'), + (('vertalign',), 'font_script'), + ], + 'number_format': [ + (('format_code',), 'num_format'), + ((), 'num_format',), + ], + 'protection': [ + (('locked',), 'locked'), + (('hidden',), 'hidden'), + ], + 'alignment': [ + (('horizontal',), 'align'), + (('vertical',), 'valign'), + (('text_rotation',), 'rotation'), + (('wrap_text',), 'text_wrap'), + (('indent',), 'indent'), + (('shrink_to_fit',), 'shrink'), + ], + 'fill': [ + (('patternType',), 'pattern'), + (('patterntype',), 'pattern'), + (('fill_type',), 'pattern'), + (('start_color', 'rgb',), 'fg_color'), + (('fgColor', 'rgb',), 'fg_color'), + (('fgcolor', 'rgb',), 'fg_color'), + (('start_color',), 'fg_color'), + (('fgColor',), 'fg_color'), + (('fgcolor',), 'fg_color'), + (('end_color', 'rgb',), 'bg_color'), + (('bgColor', 'rgb',), 'bg_color'), + (('bgcolor', 'rgb',), 'bg_color'), + (('end_color',), 'bg_color'), + (('bgColor',), 'bg_color'), + (('bgcolor',), 'bg_color'), + ], + 'border': [ + (('color', 'rgb',), 'border_color'), + (('color',), 'border_color'), + (('style',), 'border'), + (('top', 'color', 'rgb',), 'top_color'), + (('top', 'color',), 'top_color'), + (('top', 'style',), 'top'), + (('top',), 'top'), + (('right', 'color', 'rgb',), 'right_color'), + (('right', 'color',), 'right_color'), + (('right', 'style',), 'right'), + (('right',), 'right'), + (('bottom', 'color', 'rgb',), 'bottom_color'), + (('bottom', 'color',), 'bottom_color'), + (('bottom', 'style',), 'bottom'), + (('bottom',), 'bottom'), + (('left', 'color', 'rgb',), 'left_color'), + (('left', 'color',), 'left_color'), + (('left', 'style',), 'left'), + (('left',), 'left'), + ], + } + + @classmethod + def convert(cls, style_dict, num_format_str=None): + """ + converts a style_dict to an xlsxwriter format dict + + Parameters + ---------- + style_dict : style dictionary to convert + num_format_str : optional number format string + """ + + # Create a XlsxWriter format object. 
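+        # Illustrative note: ``props`` is seeded with ``num_format_str``
+        # first, and existing keys are never overwritten below, so e.g.
+        # convert({'number_format': {'format_code': '0%'}}, '0.00')
+        # (assumed inputs) returns {'num_format': '0.00'}.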
+ props = {} + + if num_format_str is not None: + props['num_format'] = num_format_str + + if style_dict is None: + return props + + if 'borders' in style_dict: + style_dict = style_dict.copy() + style_dict['border'] = style_dict.pop('borders') + + for style_group_key, style_group in style_dict.items(): + for src, dst in cls.STYLE_MAPPING.get(style_group_key, []): + # src is a sequence of keys into a nested dict + # dst is a flat key + if dst in props: + continue + v = style_group + for k in src: + try: + v = v[k] + except (KeyError, TypeError): + break + else: + props[dst] = v + + if isinstance(props.get('pattern'), string_types): + # TODO: support other fill patterns + props['pattern'] = 0 if props['pattern'] == 'none' else 1 + + for k in ['border', 'top', 'right', 'bottom', 'left']: + if isinstance(props.get(k), string_types): + try: + props[k] = ['none', 'thin', 'medium', 'dashed', 'dotted', + 'thick', 'double', 'hair', 'mediumDashed', + 'dashDot', 'mediumDashDot', 'dashDotDot', + 'mediumDashDotDot', + 'slantDashDot'].index(props[k]) + except ValueError: + props[k] = 2 + + if isinstance(props.get('font_script'), string_types): + props['font_script'] = ['baseline', 'superscript', + 'subscript'].index(props['font_script']) + + if isinstance(props.get('underline'), string_types): + props['underline'] = {'none': 0, 'single': 1, 'double': 2, + 'singleAccounting': 33, + 'doubleAccounting': 34}[props['underline']] + + return props + + +class _XlsxWriter(ExcelWriter): + engine = 'xlsxwriter' + supported_extensions = ('.xlsx',) + + def __init__(self, path, engine=None, + date_format=None, datetime_format=None, mode='w', + **engine_kwargs): + # Use the xlsxwriter module as the Excel writer. + import xlsxwriter + + if mode == 'a': + raise ValueError('Append mode is not supported with xlsxwriter!') + + super(_XlsxWriter, self).__init__(path, engine=engine, + date_format=date_format, + datetime_format=datetime_format, + mode=mode, + **engine_kwargs) + + self.book = xlsxwriter.Workbook(path, **engine_kwargs) + + def save(self): + """ + Save workbook to disk. + """ + + return self.book.close() + + def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0, + freeze_panes=None): + # Write the frame cells using xlsxwriter. 
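+        # Formats are cached below, keyed by the JSON-serialized style plus
+        # the number format; the 'null' seed entry maps unstyled cells with
+        # no number format (json.dumps(None) == 'null') to no Format object.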
+ sheet_name = self._get_sheet_name(sheet_name) + + if sheet_name in self.sheets: + wks = self.sheets[sheet_name] + else: + wks = self.book.add_worksheet(sheet_name) + self.sheets[sheet_name] = wks + + style_dict = {'null': None} + + if _validate_freeze_panes(freeze_panes): + wks.freeze_panes(*(freeze_panes)) + + for cell in cells: + val, fmt = self._value_with_fmt(cell.val) + + stylekey = json.dumps(cell.style) + if fmt: + stylekey += fmt + + if stylekey in style_dict: + style = style_dict[stylekey] + else: + style = self.book.add_format( + _XlsxStyler.convert(cell.style, fmt)) + style_dict[stylekey] = style + + if cell.mergestart is not None and cell.mergeend is not None: + wks.merge_range(startrow + cell.row, + startcol + cell.col, + startrow + cell.mergestart, + startcol + cell.mergeend, + cell.val, style) + else: + wks.write(startrow + cell.row, + startcol + cell.col, + val, style) diff --git a/pandas/io/excel/_xlwt.py b/pandas/io/excel/_xlwt.py new file mode 100644 index 0000000000000..191fbe914b750 --- /dev/null +++ b/pandas/io/excel/_xlwt.py @@ -0,0 +1,132 @@ +import pandas._libs.json as json + +from pandas.io.excel._base import ExcelWriter +from pandas.io.excel._util import _validate_freeze_panes + + +class _XlwtWriter(ExcelWriter): + engine = 'xlwt' + supported_extensions = ('.xls',) + + def __init__(self, path, engine=None, encoding=None, mode='w', + **engine_kwargs): + # Use the xlwt module as the Excel writer. + import xlwt + engine_kwargs['engine'] = engine + + if mode == 'a': + raise ValueError('Append mode is not supported with xlwt!') + + super(_XlwtWriter, self).__init__(path, mode=mode, **engine_kwargs) + + if encoding is None: + encoding = 'ascii' + self.book = xlwt.Workbook(encoding=encoding) + self.fm_datetime = xlwt.easyxf(num_format_str=self.datetime_format) + self.fm_date = xlwt.easyxf(num_format_str=self.date_format) + + def save(self): + """ + Save workbook to disk. + """ + return self.book.save(self.path) + + def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0, + freeze_panes=None): + # Write the frame cells using xlwt. 
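+        # Note the argument order for merged cells below: xlwt's
+        # write_merge takes (r1, r2, c1, c2), unlike xlsxwriter's
+        # merge_range, which takes (r1, c1, r2, c2).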
+ + sheet_name = self._get_sheet_name(sheet_name) + + if sheet_name in self.sheets: + wks = self.sheets[sheet_name] + else: + wks = self.book.add_sheet(sheet_name) + self.sheets[sheet_name] = wks + + if _validate_freeze_panes(freeze_panes): + wks.set_panes_frozen(True) + wks.set_horz_split_pos(freeze_panes[0]) + wks.set_vert_split_pos(freeze_panes[1]) + + style_dict = {} + + for cell in cells: + val, fmt = self._value_with_fmt(cell.val) + + stylekey = json.dumps(cell.style) + if fmt: + stylekey += fmt + + if stylekey in style_dict: + style = style_dict[stylekey] + else: + style = self._convert_to_style(cell.style, fmt) + style_dict[stylekey] = style + + if cell.mergestart is not None and cell.mergeend is not None: + wks.write_merge(startrow + cell.row, + startrow + cell.mergestart, + startcol + cell.col, + startcol + cell.mergeend, + val, style) + else: + wks.write(startrow + cell.row, + startcol + cell.col, + val, style) + + @classmethod + def _style_to_xlwt(cls, item, firstlevel=True, field_sep=',', + line_sep=';'): + """helper which recursively generate an xlwt easy style string + for example: + + hstyle = {"font": {"bold": True}, + "border": {"top": "thin", + "right": "thin", + "bottom": "thin", + "left": "thin"}, + "align": {"horiz": "center"}} + will be converted to + font: bold on; \ + border: top thin, right thin, bottom thin, left thin; \ + align: horiz center; + """ + if hasattr(item, 'items'): + if firstlevel: + it = ["{key}: {val}" + .format(key=key, val=cls._style_to_xlwt(value, False)) + for key, value in item.items()] + out = "{sep} ".format(sep=(line_sep).join(it)) + return out + else: + it = ["{key} {val}" + .format(key=key, val=cls._style_to_xlwt(value, False)) + for key, value in item.items()] + out = "{sep} ".format(sep=(field_sep).join(it)) + return out + else: + item = "{item}".format(item=item) + item = item.replace("True", "on") + item = item.replace("False", "off") + return item + + @classmethod + def _convert_to_style(cls, style_dict, num_format_str=None): + """ + converts a style_dict to an xlwt style object + Parameters + ---------- + style_dict : style dictionary to convert + num_format_str : optional number format string + """ + import xlwt + + if style_dict: + xlwt_stylestr = cls._style_to_xlwt(style_dict) + style = xlwt.easyxf(xlwt_stylestr, field_sep=',', line_sep=';') + else: + style = xlwt.XFStyle() + if num_format_str is not None: + style.num_format_str = num_format_str + + return style diff --git a/pandas/io/formats/console.py b/pandas/io/formats/console.py index d5ef9f61bc132..ad63b3efdd832 100644 --- a/pandas/io/formats/console.py +++ b/pandas/io/formats/console.py @@ -108,44 +108,6 @@ def check_main(): return check_main() -def in_qtconsole(): - """ - check if we're inside an IPython qtconsole - - .. deprecated:: 0.14.1 - This is no longer needed, or working, in IPython 3 and above. - """ - try: - ip = get_ipython() # noqa - front_end = ( - ip.config.get('KernelApp', {}).get('parent_appname', "") or - ip.config.get('IPKernelApp', {}).get('parent_appname', "")) - if 'qtconsole' in front_end.lower(): - return True - except NameError: - return False - return False - - -def in_ipnb(): - """ - check if we're inside an IPython Notebook - - .. deprecated:: 0.14.1 - This is no longer needed, or working, in IPython 3 and above. 
- """ - try: - ip = get_ipython() # noqa - front_end = ( - ip.config.get('KernelApp', {}).get('parent_appname', "") or - ip.config.get('IPKernelApp', {}).get('parent_appname', "")) - if 'notebook' in front_end.lower(): - return True - except NameError: - return False - return False - - def in_ipython_frontend(): """ check if we're inside an an IPython zmq frontend diff --git a/pandas/io/formats/format.py b/pandas/io/formats/format.py index 62fa04e784072..f68ef2cc39006 100644 --- a/pandas/io/formats/format.py +++ b/pandas/io/formats/format.py @@ -1060,19 +1060,26 @@ def get_result_as_array(self): def format_values_with(float_format): formatter = self._value_formatter(float_format, threshold) + # default formatter leaves a space to the left when formatting + # floats, must be consistent for left-justifying NaNs (GH #25061) + if self.justify == 'left': + na_rep = ' ' + self.na_rep + else: + na_rep = self.na_rep + # separate the wheat from the chaff values = self.values mask = isna(values) if hasattr(values, 'to_dense'): # sparse numpy ndarray values = values.to_dense() values = np.array(values, dtype='object') - values[mask] = self.na_rep + values[mask] = na_rep imask = (~mask).ravel() values.flat[imask] = np.array([formatter(val) for val in values.ravel()[imask]]) if self.fixed_width: - return _trim_zeros(values, self.na_rep) + return _trim_zeros(values, na_rep) return values diff --git a/pandas/io/formats/html.py b/pandas/io/formats/html.py index f41749e0a7745..66d13bf2668f9 100644 --- a/pandas/io/formats/html.py +++ b/pandas/io/formats/html.py @@ -5,14 +5,14 @@ from __future__ import print_function +from collections import OrderedDict from textwrap import dedent -from pandas.compat import OrderedDict, lzip, map, range, u, unichr, zip +from pandas.compat import lzip, map, range, u, unichr, zip from pandas.core.dtypes.generic import ABCMultiIndex -from pandas import compat -import pandas.core.common as com +from pandas import compat, option_context from pandas.core.config import get_option from pandas.io.common import _is_url @@ -190,7 +190,7 @@ def _write_col_header(self, indent): if self.fmt.sparsify: # GH3547 - sentinel = com.sentinel_factory() + sentinel = object() else: sentinel = False levels = self.columns.format(sparsify=sentinel, adjoin=False, @@ -320,9 +320,15 @@ def _write_header(self, indent): self.write('', indent) + def _get_formatted_values(self): + with option_context('display.max_colwidth', 999999): + fmt_values = {i: self.fmt._format_col(i) + for i in range(self.ncols)} + return fmt_values + def _write_body(self, indent): self.write('', indent) - fmt_values = {i: self.fmt._format_col(i) for i in range(self.ncols)} + fmt_values = self._get_formatted_values() # write values if self.fmt.index and isinstance(self.frame.index, ABCMultiIndex): @@ -386,7 +392,7 @@ def _write_hierarchical_rows(self, fmt_values, indent): if self.fmt.sparsify: # GH3547 - sentinel = com.sentinel_factory() + sentinel = object() levels = frame.index.format(sparsify=sentinel, adjoin=False, names=False) @@ -486,6 +492,9 @@ class NotebookFormatter(HTMLFormatter): DataFrame._repr_html_() and DataFrame.to_html(notebook=True) """ + def _get_formatted_values(self): + return {i: self.fmt._format_col(i) for i in range(self.ncols)} + def write_style(self): # We use the "scoped" attribute here so that the desired # style properties for the data frame are not then applied diff --git a/pandas/io/formats/style.py b/pandas/io/formats/style.py index 598453eb92d25..c8b5dc6b9b7c0 100644 --- a/pandas/io/formats/style.py 
+++ b/pandas/io/formats/style.py @@ -81,7 +81,7 @@ class Styler(object): See Also -------- - pandas.DataFrame.style + DataFrame.style Notes ----- @@ -424,16 +424,18 @@ def render(self, **kwargs): Parameters ---------- - `**kwargs` : Any additional keyword arguments are passed through - to ``self.template.render``. This is useful when you need to provide - additional variables for a custom template. + **kwargs + Any additional keyword arguments are passed + through to ``self.template.render``. + This is useful when you need to provide + additional variables for a custom template. .. versionadded:: 0.20 Returns ------- rendered : str - the rendered HTML + The rendered HTML. Notes ----- @@ -1223,7 +1225,7 @@ def from_custom_template(cls, searchpath, name): Returns ------- MyStyler : subclass of Styler - has the correct ``env`` and ``template`` class attributes set. + Has the correct ``env`` and ``template`` class attributes set. """ loader = ChoiceLoader([ FileSystemLoader(searchpath), @@ -1322,7 +1324,7 @@ def _get_level_lengths(index, hidden_elements=None): Result is a dictionary of (level, inital_position): span """ - sentinel = com.sentinel_factory() + sentinel = object() levels = index.format(sparsify=sentinel, adjoin=False, names=False) if hidden_elements is None: diff --git a/pandas/io/formats/terminal.py b/pandas/io/formats/terminal.py index bb34259d710c7..cf2383955d593 100644 --- a/pandas/io/formats/terminal.py +++ b/pandas/io/formats/terminal.py @@ -15,6 +15,7 @@ import os import shutil +import subprocess from pandas.compat import PY3 @@ -94,22 +95,29 @@ def _get_terminal_size_tput(): # get terminal width # src: http://stackoverflow.com/questions/263890/how-do-i-find-the-width # -height-of-a-terminal-window + try: - import subprocess proc = subprocess.Popen(["tput", "cols"], stdin=subprocess.PIPE, stdout=subprocess.PIPE) - output = proc.communicate(input=None) - cols = int(output[0]) + output_cols = proc.communicate(input=None) proc = subprocess.Popen(["tput", "lines"], stdin=subprocess.PIPE, stdout=subprocess.PIPE) - output = proc.communicate(input=None) - rows = int(output[0]) - return (cols, rows) + output_rows = proc.communicate(input=None) except OSError: return None + try: + # Some terminals (e.g. spyder) may report a terminal size of '', + # making the `int` fail. + + cols = int(output_cols[0]) + rows = int(output_rows[0]) + return cols, rows + except (ValueError, IndexError): + return None + def _get_terminal_size_linux(): def ioctl_GWINSZ(fd): diff --git a/pandas/io/gbq.py b/pandas/io/gbq.py index 639b68d433ac6..a6cec7ea8fb16 100644 --- a/pandas/io/gbq.py +++ b/pandas/io/gbq.py @@ -127,7 +127,7 @@ def read_gbq(query, project_id=None, index_col=None, col_order=None, See Also -------- pandas_gbq.read_gbq : This function in the pandas-gbq library. - pandas.DataFrame.to_gbq : Write a DataFrame to Google BigQuery. + DataFrame.to_gbq : Write a DataFrame to Google BigQuery. """ pandas_gbq = _try_import() diff --git a/pandas/io/html.py b/pandas/io/html.py index 74934740a6957..347bb3eec54af 100644 --- a/pandas/io/html.py +++ b/pandas/io/html.py @@ -988,7 +988,7 @@ def read_html(io, match='.+', flavor=None, header=None, index_col=None, latest information on table attributes for the modern web. parse_dates : bool, optional - See :func:`~pandas.read_csv` for more details. + See :func:`~read_csv` for more details. 
tupleize_cols : bool, optional If ``False`` try to parse multiple header rows into a @@ -1043,7 +1043,7 @@ def read_html(io, match='.+', flavor=None, header=None, index_col=None, See Also -------- - pandas.read_csv + read_csv Notes ----- @@ -1066,7 +1066,7 @@ def read_html(io, match='.+', flavor=None, header=None, index_col=None, .. versionadded:: 0.21.0 - Similar to :func:`~pandas.read_csv` the `header` argument is applied + Similar to :func:`~read_csv` the `header` argument is applied **after** `skiprows` is applied. This function will *always* return a list of :class:`DataFrame` *or* diff --git a/pandas/io/json/json.py b/pandas/io/json/json.py index 4bbccc8339d7c..4bae067ee5196 100644 --- a/pandas/io/json/json.py +++ b/pandas/io/json/json.py @@ -226,8 +226,8 @@ def _write(self, obj, orient, double_precision, ensure_ascii, return serialized -def read_json(path_or_buf=None, orient=None, typ='frame', dtype=True, - convert_axes=True, convert_dates=True, keep_default_dates=True, +def read_json(path_or_buf=None, orient=None, typ='frame', dtype=None, + convert_axes=None, convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False, date_unit=None, encoding=None, lines=False, chunksize=None, compression='infer'): """ @@ -277,11 +277,25 @@ def read_json(path_or_buf=None, orient=None, typ='frame', dtype=True, 'table' as an allowed value for the ``orient`` argument typ : type of object to recover (series or frame), default 'frame' - dtype : boolean or dict, default True - If True, infer dtypes, if a dict of column to dtype, then use those, + dtype : boolean or dict, default None + If True, infer dtypes; if a dict of column to dtype, then use those; if False, then don't infer dtypes at all, applies only to the data. - convert_axes : boolean, default True + + For all ``orient`` values except ``'table'``, default is True. + + .. versionchanged:: 0.25.0 + + Not applicable for ``orient='table'``. + + convert_axes : boolean, default None Try to convert the axes to the proper dtypes. + + For all ``orient`` values except ``'table'``, default is True. + + .. versionchanged:: 0.25.0 + + Not applicable for ``orient='table'``. 
+ convert_dates : boolean, default True List of columns to parse for dates; If True, then try to parse datelike columns default is True; a column label is datelike if @@ -408,6 +422,16 @@ def read_json(path_or_buf=None, orient=None, typ='frame', dtype=True, {"index": "row 2", "col 1": "c", "col 2": "d"}]}' """ + if orient == 'table' and dtype: + raise ValueError("cannot pass both dtype and orient='table'") + if orient == 'table' and convert_axes: + raise ValueError("cannot pass both convert_axes and orient='table'") + + if dtype is None and orient != 'table': + dtype = True + if convert_axes is None and orient != 'table': + convert_axes = True + compression = _infer_compression(path_or_buf, compression) filepath_or_buffer, _, compression, should_close = get_filepath_or_buffer( path_or_buf, encoding=encoding, compression=compression, @@ -600,15 +624,15 @@ class Parser(object): 'us': long(31536000000000), 'ns': long(31536000000000000)} - def __init__(self, json, orient, dtype=True, convert_axes=True, + def __init__(self, json, orient, dtype=None, convert_axes=True, convert_dates=True, keep_default_dates=False, numpy=False, precise_float=False, date_unit=None): self.json = json if orient is None: orient = self._default_orient - self.orient = orient + self.dtype = dtype if orient == "split": @@ -680,7 +704,7 @@ def _try_convert_data(self, name, data, use_dtypes=True, # don't try to coerce, unless a force conversion if use_dtypes: - if self.dtype is False: + if not self.dtype: return data, False elif self.dtype is True: pass diff --git a/pandas/io/json/table_schema.py b/pandas/io/json/table_schema.py index 2bd93b19d4225..971386c91944e 100644 --- a/pandas/io/json/table_schema.py +++ b/pandas/io/json/table_schema.py @@ -314,12 +314,13 @@ def parse_table_schema(json, precise_float): df = df.astype(dtypes) - df = df.set_index(table['schema']['primaryKey']) - if len(df.index.names) == 1: - if df.index.name == 'index': - df.index.name = None - else: - df.index.names = [None if x.startswith('level_') else x for x in - df.index.names] + if 'primaryKey' in table['schema']: + df = df.set_index(table['schema']['primaryKey']) + if len(df.index.names) == 1: + if df.index.name == 'index': + df.index.name = None + else: + df.index.names = [None if x.startswith('level_') else x for x in + df.index.names] return df diff --git a/pandas/io/packers.py b/pandas/io/packers.py index efe4e3a91c69c..588d63d73515f 100644 --- a/pandas/io/packers.py +++ b/pandas/io/packers.py @@ -219,7 +219,7 @@ def read(fh): finally: if fh is not None: fh.close() - elif hasattr(path_or_buf, 'read') and compat.callable(path_or_buf.read): + elif hasattr(path_or_buf, 'read') and callable(path_or_buf.read): # treat as a buffer like return read(path_or_buf) diff --git a/pandas/io/parquet.py b/pandas/io/parquet.py index dada9000d901a..ba322f42c07c1 100644 --- a/pandas/io/parquet.py +++ b/pandas/io/parquet.py @@ -262,16 +262,17 @@ def read_parquet(path, engine='auto', columns=None, **kwargs): ---------- path : string File path - columns : list, default=None - If not None, only these columns will be read from the file. - - .. versionadded 0.21.1 engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto' Parquet library to use. If 'auto', then the option ``io.parquet.engine`` is used. The default ``io.parquet.engine`` behavior is to try 'pyarrow', falling back to 'fastparquet' if 'pyarrow' is unavailable. - kwargs are passed to the engine + columns : list, default=None + If not None, only these columns will be read from the file. + + .. 
versionadded 0.21.1 + **kwargs + Any additional kwargs are passed to the engine. Returns ------- diff --git a/pandas/io/parsers.py b/pandas/io/parsers.py index b31d3f665f47f..4163a571df800 100755 --- a/pandas/io/parsers.py +++ b/pandas/io/parsers.py @@ -203,9 +203,14 @@ * dict, e.g. {{'foo' : [1, 3]}} -> parse columns 1, 3 as date and call result 'foo' - If a column or index contains an unparseable date, the entire column or - index will be returned unaltered as an object data type. For non-standard - datetime parsing, use ``pd.to_datetime`` after ``pd.read_csv`` + If a column or index cannot be represented as an array of datetimes, + say because of an unparseable value or a mixture of timezones, the column + or index will be returned unaltered as an object data type. For + non-standard datetime parsing, use ``pd.to_datetime`` after + ``pd.read_csv``. To parse an index or column with a mixture of timezones, + specify ``date_parser`` to be a partially-applied + :func:`pandas.to_datetime` with ``utc=True``. See + :ref:`io.csv.mixed_timezones` for more. Note: A fast-path exists for iso8601-formatted dates. infer_datetime_format : bool, default False diff --git a/pandas/io/pickle.py b/pandas/io/pickle.py index 789f55a62dc58..ab4a266853a78 100644 --- a/pandas/io/pickle.py +++ b/pandas/io/pickle.py @@ -1,8 +1,7 @@ """ pickle compat """ import warnings -import numpy as np -from numpy.lib.format import read_array, write_array +from numpy.lib.format import read_array from pandas.compat import PY3, BytesIO, cPickle as pkl, pickle_compat as pc @@ -76,6 +75,7 @@ def to_pickle(obj, path, compression='infer', protocol=pkl.HIGHEST_PROTOCOL): try: f.write(pkl.dumps(obj, protocol=protocol)) finally: + f.close() for _f in fh: _f.close() @@ -138,63 +138,32 @@ def read_pickle(path, compression='infer'): >>> os.remove("./dummy.pkl") """ path = _stringify_path(path) + f, fh = _get_handle(path, 'rb', compression=compression, is_text=False) + + # 1) try with cPickle + # 2) try with the compat pickle to handle subclass changes + # 3) pass encoding only if its not None as py2 doesn't handle the param - def read_wrapper(func): - # wrapper file handle open/close operation - f, fh = _get_handle(path, 'rb', - compression=compression, - is_text=False) - try: - return func(f) - finally: - for _f in fh: - _f.close() - - def try_read(path, encoding=None): - # try with cPickle - # try with current pickle, if we have a Type Error then - # try with the compat pickle to handle subclass changes - # pass encoding only if its not None as py2 doesn't handle - # the param - - # cpickle - # GH 6899 - try: - with warnings.catch_warnings(record=True): - # We want to silence any warnings about, e.g. moved modules. - warnings.simplefilter("ignore", Warning) - return read_wrapper(lambda f: pkl.load(f)) - except Exception: # noqa: E722 - # reg/patched pickle - # compat not used in pandas/compat/pickle_compat.py::load - # TODO: remove except block OR modify pc.load to use compat - try: - return read_wrapper( - lambda f: pc.load(f, encoding=encoding, compat=False)) - # compat pickle - except Exception: # noqa: E722 - return read_wrapper( - lambda f: pc.load(f, encoding=encoding, compat=True)) try: - return try_read(path) + with warnings.catch_warnings(record=True): + # We want to silence any warnings about, e.g. moved modules. 
+ warnings.simplefilter("ignore", Warning) + return pkl.load(f) except Exception: # noqa: E722 - if PY3: - return try_read(path, encoding='latin1') - raise - + try: + return pc.load(f, encoding=None) + except Exception: # noqa: E722 + if PY3: + return pc.load(f, encoding='latin1') + raise + finally: + f.close() + for _f in fh: + _f.close() # compat with sparse pickle / unpickle -def _pickle_array(arr): - arr = arr.view(np.ndarray) - - buf = BytesIO() - write_array(buf, arr) - - return buf.getvalue() - - def _unpickle_array(bytes): arr = read_array(BytesIO(bytes)) diff --git a/pandas/io/pytables.py b/pandas/io/pytables.py index 4e103482f48a2..2ee8759b9bdd8 100644 --- a/pandas/io/pytables.py +++ b/pandas/io/pytables.py @@ -15,34 +15,29 @@ import numpy as np -from pandas._libs import algos, lib, writers as libwriters +from pandas._libs import lib, writers as libwriters from pandas._libs.tslibs import timezones from pandas.compat import PY3, filter, lrange, range, string_types from pandas.errors import PerformanceWarning from pandas.core.dtypes.common import ( - ensure_int64, ensure_object, ensure_platform_int, is_categorical_dtype, - is_datetime64_dtype, is_datetime64tz_dtype, is_list_like, - is_timedelta64_dtype) + ensure_object, is_categorical_dtype, is_datetime64_dtype, + is_datetime64tz_dtype, is_list_like, is_timedelta64_dtype) from pandas.core.dtypes.missing import array_equivalent from pandas import ( - DataFrame, DatetimeIndex, Index, Int64Index, MultiIndex, Panel, - PeriodIndex, Series, SparseDataFrame, SparseSeries, TimedeltaIndex, compat, - concat, isna, to_datetime) + DataFrame, DatetimeIndex, Index, Int64Index, MultiIndex, PeriodIndex, + Series, SparseDataFrame, SparseSeries, TimedeltaIndex, compat, concat, + isna, to_datetime) from pandas.core import config -from pandas.core.algorithms import match, unique -from pandas.core.arrays.categorical import ( - Categorical, _factorize_from_iterables) +from pandas.core.arrays.categorical import Categorical from pandas.core.arrays.sparse import BlockIndex, IntIndex from pandas.core.base import StringMixin import pandas.core.common as com from pandas.core.computation.pytables import Expr, maybe_expression from pandas.core.config import get_option from pandas.core.index import ensure_index -from pandas.core.internals import ( - BlockManager, _block2d_to_blocknd, _block_shape, _factor_indexer, - make_block) +from pandas.core.internals import BlockManager, _block_shape, make_block from pandas.io.common import _stringify_path from pandas.io.formats.printing import adjoin, pprint_thing @@ -175,7 +170,6 @@ class DuplicateWarning(Warning): SparseSeries: u'sparse_series', DataFrame: u'frame', SparseDataFrame: u'sparse_frame', - Panel: u'wide', } # storer class map @@ -187,7 +181,6 @@ class DuplicateWarning(Warning): u'sparse_series': 'SparseSeriesFixed', u'frame': 'FrameFixed', u'sparse_frame': 'SparseFrameFixed', - u'wide': 'PanelFixed', } # table class map @@ -197,16 +190,12 @@ class DuplicateWarning(Warning): u'appendable_multiseries': 'AppendableMultiSeriesTable', u'appendable_frame': 'AppendableFrameTable', u'appendable_multiframe': 'AppendableMultiFrameTable', - u'appendable_panel': 'AppendablePanelTable', u'worm': 'WORMTable', - u'legacy_frame': 'LegacyFrameTable', - u'legacy_panel': 'LegacyPanelTable', } # axes map _AXES_MAP = { DataFrame: [0], - Panel: [1, 2] } # register our configuration options @@ -326,8 +315,8 @@ def read_hdf(path_or_buf, key=None, mode='r', **kwargs): See Also -------- - pandas.DataFrame.to_hdf : Write a HDF file 
from a DataFrame. - pandas.HDFStore : Low-level access to HDF files. + DataFrame.to_hdf : Write a HDF file from a DataFrame. + HDFStore : Low-level access to HDF files. Examples -------- @@ -865,7 +854,7 @@ def put(self, key, value, format=None, append=False, **kwargs): Parameters ---------- key : object - value : {Series, DataFrame, Panel} + value : {Series, DataFrame} format : 'fixed(f)|table(t)', default is 'fixed' fixed(f) : Fixed format Fast writing/reading. Not-appendable, nor searchable @@ -947,7 +936,7 @@ def append(self, key, value, format=None, append=True, columns=None, Parameters ---------- key : object - value : {Series, DataFrame, Panel} + value : {Series, DataFrame} format : 'table' is the default table(t) : table format Write as a PyTables Table structure which may perform @@ -3028,16 +3017,6 @@ class FrameFixed(BlockManagerFixed): obj_type = DataFrame -class PanelFixed(BlockManagerFixed): - pandas_kind = u'wide' - obj_type = Panel - is_shape_reversed = True - - def write(self, obj, **kwargs): - obj._consolidate_inplace() - return super(PanelFixed, self).write(obj, **kwargs) - - class Table(Fixed): """ represent a table: @@ -3288,7 +3267,7 @@ def get_attrs(self): self.nan_rep = getattr(self.attrs, 'nan_rep', None) self.encoding = _ensure_encoding( getattr(self.attrs, 'encoding', None)) - self.errors = getattr(self.attrs, 'errors', 'strict') + self.errors = _ensure_decoded(getattr(self.attrs, 'errors', 'strict')) self.levels = getattr( self.attrs, 'levels', None) or [] self.index_axes = [ @@ -3900,107 +3879,11 @@ def read(self, where=None, columns=None, **kwargs): if not self.read_axes(where=where, **kwargs): return None - lst_vals = [a.values for a in self.index_axes] - labels, levels = _factorize_from_iterables(lst_vals) - # labels and levels are tuples but lists are expected - labels = list(labels) - levels = list(levels) - N = [len(lvl) for lvl in levels] - - # compute the key - key = _factor_indexer(N[1:], labels) - - objs = [] - if len(unique(key)) == len(key): - - sorter, _ = algos.groupsort_indexer( - ensure_int64(key), np.prod(N)) - sorter = ensure_platform_int(sorter) - - # create the objs - for c in self.values_axes: - - # the data need to be sorted - sorted_values = c.take_data().take(sorter, axis=0) - if sorted_values.ndim == 1: - sorted_values = sorted_values.reshape( - (sorted_values.shape[0], 1)) - - take_labels = [l.take(sorter) for l in labels] - items = Index(c.values) - block = _block2d_to_blocknd( - values=sorted_values, placement=np.arange(len(items)), - shape=tuple(N), labels=take_labels, ref_items=items) - - # create the object - mgr = BlockManager([block], [items] + levels) - obj = self.obj_type(mgr) - - # permute if needed - if self.is_transposed: - obj = obj.transpose( - *tuple(Series(self.data_orientation).argsort())) - - objs.append(obj) - - else: - warnings.warn(duplicate_doc, DuplicateWarning, stacklevel=5) - - # reconstruct - long_index = MultiIndex.from_arrays( - [i.values for i in self.index_axes]) - - for c in self.values_axes: - lp = DataFrame(c.data, index=long_index, columns=c.values) - - # need a better algorithm - tuple_index = long_index.values - - unique_tuples = unique(tuple_index) - unique_tuples = com.asarray_tuplesafe(unique_tuples) - - indexer = match(unique_tuples, tuple_index) - indexer = ensure_platform_int(indexer) - - new_index = long_index.take(indexer) - new_values = lp.values.take(indexer, axis=0) - - lp = DataFrame(new_values, index=new_index, columns=lp.columns) - objs.append(lp.to_panel()) - - # create the composite 
object
-        if len(objs) == 1:
-            wp = objs[0]
-        else:
-            wp = concat(objs, axis=0, verify_integrity=False)._consolidate()
-
-        # apply the selection filters & axis orderings
-        wp = self.process_axes(wp, columns=columns)
-
-        return wp
-
-
-class LegacyFrameTable(LegacyTable):
-
-    """ support the legacy frame table """
-    pandas_kind = u'frame_table'
-    table_type = u'legacy_frame'
-    obj_type = Panel
-
-    def read(self, *args, **kwargs):
-        return super(LegacyFrameTable, self).read(*args, **kwargs)['value']
-
-
-class LegacyPanelTable(LegacyTable):
-
-    """ support the legacy panel table """
-    table_type = u'legacy_panel'
-    obj_type = Panel
+        raise NotImplementedError("Panel is removed in pandas 0.25.0")


 class AppendableTable(LegacyTable):
-
-    """ suppor the new appendable table formats """
+    """ support the new appendable table formats """
     _indexables = None
     table_type = u'appendable'
@@ -4232,8 +4115,7 @@ def delete(self, where=None, start=None, stop=None, **kwargs):


 class AppendableFrameTable(AppendableTable):
-
-    """ suppor the new appendable table formats """
+    """ support the new appendable table formats """
     pandas_kind = u'frame_table'
     table_type = u'appendable_frame'
     ndim = 2
@@ -4442,24 +4324,6 @@ def read(self, **kwargs):
         return df


-class AppendablePanelTable(AppendableTable):
-
-    """ suppor the new appendable table formats """
-    table_type = u'appendable_panel'
-    ndim = 3
-    obj_type = Panel
-
-    def get_object(self, obj):
-        """ these are written transposed """
-        if self.is_transposed:
-            obj = obj.transpose(*self.data_orientation)
-        return obj
-
-    @property
-    def is_transposed(self):
-        return self.data_orientation != tuple(range(self.ndim))
-
-
 def _reindex_axis(obj, axis, labels, other=None):
     ax = obj._get_axis(axis)
     labels = ensure_index(labels)
@@ -4875,16 +4739,3 @@ def select_coords(self):
             return self.coordinates

         return np.arange(start, stop)
-
-# utilities ###
-
-
-def timeit(key, df, fn=None, remove=True, **kwargs):
-    if fn is None:
-        fn = 'timeit.h5'
-    store = HDFStore(fn, mode='w')
-    store.append(key, df, **kwargs)
-    store.close()
-
-    if remove:
-        os.remove(fn)
diff --git a/pandas/io/sql.py b/pandas/io/sql.py
index 5d1163b3e0024..02fba52eac7f7 100644
--- a/pandas/io/sql.py
+++ b/pandas/io/sql.py
@@ -182,26 +182,29 @@ def execute(sql, con, cur=None, params=None):

 def read_sql_table(table_name, con, schema=None, index_col=None,
                    coerce_float=True, parse_dates=None, columns=None,
                    chunksize=None):
-    """Read SQL database table into a DataFrame.
+    """
+    Read SQL database table into a DataFrame.

     Given a table name and a SQLAlchemy connectable, returns a DataFrame.
     This function does not support DBAPI connections.

     Parameters
     ----------
-    table_name : string
+    table_name : str
         Name of SQL table in database.
-    con : SQLAlchemy connectable (or database string URI)
+    con : SQLAlchemy connectable or str
+        A database URI could be provided as a str.
         SQLite DBAPI connection mode not supported.
-    schema : string, default None
+    schema : str, default None
         Name of SQL schema in database to query (if database flavor
         supports this). Uses default schema if None (default).
-    index_col : string or list of strings, optional, default: None
+    index_col : str or list of str, optional, default: None
         Column(s) to set as index(MultiIndex).
-    coerce_float : boolean, default True
+    coerce_float : bool, default True
         Attempts to convert values of non-string, non-numeric objects (like
         decimal.Decimal) to floating point. Can result in loss of precision.
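# A minimal, hedged sketch of the coerce_float behavior described above.
# It assumes SQLAlchemy is installed; the in-memory database, table name
# and column values are invented for illustration:
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine('sqlite://')  # throwaway in-memory DB
pd.DataFrame({'amount': [1.25, 2.5]}).to_sql('payments', engine, index=False)

# With coerce_float=True (the default), decimal-like values coming back
# from the driver are converted to float64 rather than left as objects.
df = pd.read_sql_table('payments', engine, coerce_float=True)
print(df['amount'].dtype)  # float64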
-    parse_dates : list or dict, default: None
+    parse_dates : list or dict, default None
+        The behavior is as follows:
+
         - List of column names to parse as dates.
         - Dict of ``{column_name: format string}`` where format string is
           strftime compatible in case of parsing string times or is one of
@@ -210,8 +213,8 @@ def read_sql_table(table_name, con, schema=None, index_col=None,
           to the keyword arguments of :func:`pandas.to_datetime`
           Especially useful with databases without native Datetime support,
           such as SQLite.
-    columns : list, default: None
-        List of column names to select from SQL table
+    columns : list, default None
+        List of column names to select from SQL table.
     chunksize : int, default None
         If specified, returns an iterator where `chunksize` is the number of
         rows to include in each chunk.
@@ -219,15 +222,21 @@ def read_sql_table(table_name, con, schema=None, index_col=None,
     Returns
     -------
     DataFrame
+        A SQL table is returned as a two-dimensional data structure with
+        labeled axes.

     See Also
     --------
     read_sql_query : Read SQL query into a DataFrame.
-    read_sql
+    read_sql : Read SQL query or database table into a DataFrame.

     Notes
     -----
     Any datetime values with time zone information will be converted to UTC.
+
+    Examples
+    --------
+    >>> pd.read_sql_table('table_name', 'postgres:///db_name')  # doctest:+SKIP
     """
     con = _engine_builder(con)
@@ -381,7 +390,8 @@ def read_sql(sql, con, index_col=None, coerce_float=True, params=None,

     try:
         _is_table_name = pandas_sql.has_table(sql)
-    except (ImportError, AttributeError):
+    except Exception:
+        # using generic exception to catch errors from sql drivers (GH24988)
         _is_table_name = False

     if _is_table_name:
diff --git a/pandas/io/stata.py b/pandas/io/stata.py
index 1b0660171ecac..62a9dbdc4657e 100644
--- a/pandas/io/stata.py
+++ b/pandas/io/stata.py
@@ -100,8 +100,8 @@

 See Also
 --------
-pandas.io.stata.StataReader : Low-level reader for Stata data files.
-pandas.DataFrame.to_stata: Export Stata data files.
+io.stata.StataReader : Low-level reader for Stata data files.
+DataFrame.to_stata : Export Stata data files.

 Examples
 --------
@@ -119,7 +119,7 @@
     _iterator_params)

 _data_method_doc = """\
-Reads observations from Stata file, converting them into a dataframe
+Read observations from Stata file, converting them into a DataFrame.

 .. deprecated::
     This is a legacy method.  Use `read` in new code.
@@ -1726,18 +1726,22 @@ def _do_convert_categoricals(self, data, value_label_dict, lbllist,
         return data

     def data_label(self):
-        """Returns data label of Stata file"""
+        """
+        Return data label of Stata file.
+        """
         return self.data_label

     def variable_labels(self):
-        """Returns variable labels as a dict, associating each variable name
-        with corresponding label
+        """
+        Return variable labels as a dict, associating each variable name
+        with its corresponding label.
         """
         return dict(zip(self.varlist, self._variable_labels))

     def value_labels(self):
-        """Returns a dict, associating each variable name a dict, associating
-        each value its corresponding label
+        """
+        Return a dict that associates each variable name with a dict mapping
+        each value to its corresponding label.
         """
         if not self._value_labels_read:
             self._read_value_labels()
@@ -1747,7 +1751,7 @@ def value_labels(self):

 def _open_file_binary_write(fname):
     """
-    Open a binary file or no-op if file-like
+    Open a binary file or no-op if file-like.
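# A hedged sketch of the StataReader metadata accessors documented above.
# The path 'survey.dta' and the printed labels are hypothetical; any Stata
# file would do:
import pandas as pd

reader = pd.io.stata.StataReader('survey.dta')
try:
    print(reader.variable_labels())  # e.g. {'age': 'Age in years', ...}
    print(reader.value_labels())     # e.g. {'sex': {0: 'male', 1: 'female'}}
finally:
    reader.close()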
Parameters ---------- @@ -1778,14 +1782,14 @@ def _set_endianness(endianness): def _pad_bytes(name, length): """ - Takes a char string and pads it with null bytes until it's length chars + Take a char string and pads it with null bytes until it's length chars. """ return name + "\x00" * (length - len(name)) def _convert_datetime_to_stata_type(fmt): """ - Converts from one of the stata date formats to a type in TYPE_MAP + Convert from one of the stata date formats to a type in TYPE_MAP. """ if fmt in ["tc", "%tc", "td", "%td", "tw", "%tw", "tm", "%tm", "tq", "%tq", "th", "%th", "ty", "%ty"]: @@ -1812,7 +1816,7 @@ def _maybe_convert_to_int_keys(convert_dates, varlist): def _dtype_to_stata_type(dtype, column): """ - Converts dtype types to stata types. Returns the byte of the given ordinal. + Convert dtype types to stata types. Returns the byte of the given ordinal. See TYPE_MAP and comments for an explanation. This is also explained in the dta spec. 1 - 244 are strings of this length @@ -1850,7 +1854,7 @@ def _dtype_to_stata_type(dtype, column): def _dtype_to_default_stata_fmt(dtype, column, dta_version=114, force_strl=False): """ - Maps numpy dtype to stata's default format for this type. Not terribly + Map numpy dtype to stata's default format for this type. Not terribly important since users can change this in Stata. Semantics are object -> "%DDs" where DD is the length of the string. If not a string, @@ -2385,32 +2389,22 @@ def _prepare_data(self): data = self._convert_strls(data) # 3. Convert bad string data to '' and pad to correct length - dtypes = [] - data_cols = [] - has_strings = False + dtypes = {} native_byteorder = self._byteorder == _set_endianness(sys.byteorder) for i, col in enumerate(data): typ = typlist[i] if typ <= self._max_string_length: - has_strings = True data[col] = data[col].fillna('').apply(_pad_bytes, args=(typ,)) stype = 'S{type}'.format(type=typ) - dtypes.append(('c' + str(i), stype)) - string = data[col].str.encode(self._encoding) - data_cols.append(string.values.astype(stype)) + dtypes[col] = stype + data[col] = data[col].str.encode(self._encoding).astype(stype) else: - values = data[col].values dtype = data[col].dtype if not native_byteorder: dtype = dtype.newbyteorder(self._byteorder) - dtypes.append(('c' + str(i), dtype)) - data_cols.append(values) - dtypes = np.dtype(dtypes) + dtypes[col] = dtype - if has_strings or not native_byteorder: - self.data = np.fromiter(zip(*data_cols), dtype=dtypes) - else: - self.data = data.to_records(index=False) + self.data = data.to_records(index=False, column_dtypes=dtypes) def _write_data(self): data = self.data diff --git a/pandas/plotting/_core.py b/pandas/plotting/_core.py index e543ab88f53b2..48d870bfc2e03 100644 --- a/pandas/plotting/_core.py +++ b/pandas/plotting/_core.py @@ -39,7 +39,7 @@ else: _HAS_MPL = True if get_option('plotting.matplotlib.register_converters'): - _converter.register(explicit=True) + _converter.register(explicit=False) def _raise_if_no_mpl(): @@ -1413,7 +1413,7 @@ def orientation(self): Returns ------- - axes : matplotlib.axes.Axes or numpy.ndarray of them + matplotlib.axes.Axes or numpy.ndarray of them See Also -------- @@ -1809,26 +1809,26 @@ def _plot(data, x=None, y=None, subplots=False, Allows plotting of one column versus another""" series_coord = "" -df_unique = """stacked : boolean, default False in line and +df_unique = """stacked : bool, default False in line and bar plots, and True in area plot. If True, create stacked plot. 
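# A short, hedged sketch of the ``stacked`` defaults described above
# (matplotlib required; the frame is made up):
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 3), columns=['a', 'b', 'c'])
df.plot.area()             # area plots stack by default (stacked=True)
df.plot.bar(stacked=True)  # line/bar plots need stacked=True explicitly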
- sort_columns : boolean, default False + sort_columns : bool, default False Sort column names to determine plot ordering - secondary_y : boolean or sequence, default False + secondary_y : bool or sequence, default False Whether to plot on the secondary y-axis If a list/tuple, which columns to plot on secondary y-axis""" series_unique = """label : label argument to provide to plot - secondary_y : boolean or sequence of ints, default False + secondary_y : bool or sequence of ints, default False If True then y-axis will be on the right""" df_ax = """ax : matplotlib axes object, default None - subplots : boolean, default False + subplots : bool, default False Make separate subplots for each column - sharex : boolean, default True if ax is None else False + sharex : bool, default True if ax is None else False In case subplots=True, share x axis and set some x axis labels to invisible; defaults to True if ax is None otherwise False if an ax is passed in; Be aware, that passing in both an ax and sharex=True will alter all x axis labels for all axis in a figure! - sharey : boolean, default False + sharey : bool, default False In case subplots=True, share y axis and set some y axis labels to invisible layout : tuple (optional) @@ -1882,23 +1882,23 @@ def _plot(data, x=None, y=None, subplots=False, %(klass_kind)s %(klass_ax)s figsize : a tuple (width, height) in inches - use_index : boolean, default True + use_index : bool, default True Use index as ticks for x axis title : string or list Title to use for the plot. If a string is passed, print the string at the top of the figure. If a list is passed and `subplots` is True, print each item in the list above the corresponding subplot. - grid : boolean, default None (matlab style default) + grid : bool, default None (matlab style default) Axis grid lines legend : False/True/'reverse' Place legend on axis subplots style : list or dict matplotlib line style per column - logx : boolean, default False + logx : bool, default False Use log scaling on x axis - logy : boolean, default False + logy : bool, default False Use log scaling on y axis - loglog : boolean, default False + loglog : bool, default False Use log scaling on both x and y axes xticks : sequence Values to use for the xticks @@ -1913,12 +1913,12 @@ def _plot(data, x=None, y=None, subplots=False, colormap : str or matplotlib colormap object, default None Colormap to select colors from. If string, load colormap with that name from matplotlib. - colorbar : boolean, optional + colorbar : bool, optional If True, plot colorbar (only relevant for 'scatter' and 'hexbin' plots) position : float Specify relative alignments for bar plot layout. From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5 (center) - table : boolean, Series or DataFrame, default False + table : bool, Series or DataFrame, default False If True, draw a table using the data in the DataFrame and the data will be transposed to meet matplotlib's default layout. If a Series or DataFrame is passed, use passed data to draw a table. @@ -1927,7 +1927,7 @@ def _plot(data, x=None, y=None, subplots=False, detail. xerr : same types as yerr. 
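# Hedged illustration of the ``yerr``/``xerr`` keywords listed above; the
# error values here are invented:
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])
s.plot(kind='bar', yerr=[0.1, 0.2, 0.15])  # per-point error bars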
%(klass_unique)s - mark_right : boolean, default True + mark_right : bool, default True When using a secondary_y axis, automatically mark the column labels with "(right)" in the legend `**kwds` : keywords @@ -1935,7 +1935,7 @@ def _plot(data, x=None, y=None, subplots=False, Returns ------- - axes : :class:`matplotlib.axes.Axes` or numpy.ndarray of them + :class:`matplotlib.axes.Axes` or numpy.ndarray of them Notes ----- @@ -2025,7 +2025,7 @@ def plot_series(data, kind='line', ax=None, # Series unique rot : int or float, default 0 The rotation angle of labels (in degrees) with respect to the screen coordinate system. - grid : boolean, default True + grid : bool, default True Setting this to True will show the grid. figsize : A tuple (width, height) in inches The size of the figure to create in matplotlib. @@ -2050,9 +2050,17 @@ def plot_series(data, kind='line', ax=None, # Series unique Returns ------- - result : + result + See Notes. + + See Also + -------- + Series.plot.hist: Make a histogram. + matplotlib.pyplot.boxplot : Matplotlib equivalent plot. - The return type depends on the `return_type` parameter: + Notes + ----- + The return type depends on the `return_type` parameter: * 'axes' : object of class matplotlib.axes.Axes * 'dict' : dict of matplotlib.lines.Line2D objects @@ -2062,14 +2070,8 @@ def plot_series(data, kind='line', ax=None, # Series unique * :class:`~pandas.Series` * :class:`~numpy.array` (for ``return_type = None``) + Return Series or numpy.array. - See Also - -------- - Series.plot.hist: Make a histogram. - matplotlib.pyplot.boxplot : Matplotlib equivalent plot. - - Notes - ----- Use ``return_type='dict'`` when you want to tweak the appearance of the lines after plotting. In this case a dict containing the Lines making up the boxes, caps, fliers, medians, and whiskers is returned. @@ -2271,7 +2273,7 @@ def scatter_plot(data, x, y, by=None, ax=None, figsize=None, grid=False, Returns ------- - fig : matplotlib.Figure + matplotlib.Figure """ import matplotlib.pyplot as plt @@ -2320,7 +2322,7 @@ def hist_frame(data, column=None, by=None, grid=True, xlabelsize=None, If passed, will be used to limit data to a subset of columns. by : object, optional If passed, then used to form histograms for separate groups. - grid : boolean, default True + grid : bool, default True Whether to show axis grid lines. xlabelsize : int, default None If specified changes the x-axis label size. @@ -2334,13 +2336,13 @@ def hist_frame(data, column=None, by=None, grid=True, xlabelsize=None, y labels rotated 90 degrees clockwise. ax : Matplotlib axes object, default None The axes to plot the histogram on. - sharex : boolean, default True if ax is None else False + sharex : bool, default True if ax is None else False In case subplots=True, share x axis and set some x axis labels to invisible; defaults to True if ax is None otherwise False if an ax is passed in. Note that passing in both an ax and sharex=True will alter all x axis labels for all subplots in a figure. - sharey : boolean, default False + sharey : bool, default False In case subplots=True, share y axis and set some y axis labels to invisible. 
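# A hedged sketch of the ``sharex``/``sharey`` behavior described above for
# DataFrame.hist (matplotlib required; data is made up):
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4), columns=list('abcd'))
# one subplot per column, with shared, mostly-hidden axis labels
df.hist(sharex=True, sharey=True, layout=(2, 2), figsize=(8, 6))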
figsize : tuple @@ -2359,7 +2361,7 @@ def hist_frame(data, column=None, by=None, grid=True, xlabelsize=None, Returns ------- - axes : matplotlib.AxesSubplot or numpy.ndarray of them + matplotlib.AxesSubplot or numpy.ndarray of them See Also -------- @@ -2427,7 +2429,7 @@ def hist_series(self, by=None, ax=None, grid=True, xlabelsize=None, If passed, then used to form histograms for separate groups ax : matplotlib axis object If not passed, uses gca() - grid : boolean, default True + grid : bool, default True Whether to show axis grid lines xlabelsize : int, default None If specified changes the x-axis label size @@ -2509,15 +2511,15 @@ def grouped_hist(data, column=None, by=None, ax=None, bins=50, figsize=None, bins : int, default 50 figsize : tuple, optional layout : optional - sharex : boolean, default False - sharey : boolean, default False + sharex : bool, default False + sharey : bool, default False rot : int, default 90 grid : bool, default True kwargs : dict, keyword arguments passed to matplotlib.Axes.hist Returns ------- - axes : collection of Matplotlib Axes + collection of Matplotlib Axes """ _raise_if_no_mpl() _converter._WARN = False @@ -2548,7 +2550,7 @@ def boxplot_frame_groupby(grouped, subplots=True, column=None, fontsize=None, Parameters ---------- grouped : Grouped DataFrame - subplots : + subplots : bool * ``False`` - no subplots will be used * ``True`` - create a subplot for each group column : column name or list of names, or vector @@ -2751,7 +2753,7 @@ def line(self, **kwds): Returns ------- - axes : :class:`matplotlib.axes.Axes` or numpy.ndarray of them + :class:`matplotlib.axes.Axes` or numpy.ndarray of them Examples -------- @@ -2776,7 +2778,7 @@ def bar(self, **kwds): Returns ------- - axes : :class:`matplotlib.axes.Axes` or numpy.ndarray of them + :class:`matplotlib.axes.Axes` or numpy.ndarray of them """ return self(kind='bar', **kwds) @@ -2792,7 +2794,7 @@ def barh(self, **kwds): Returns ------- - axes : :class:`matplotlib.axes.Axes` or numpy.ndarray of them + :class:`matplotlib.axes.Axes` or numpy.ndarray of them """ return self(kind='barh', **kwds) @@ -2808,7 +2810,7 @@ def box(self, **kwds): Returns ------- - axes : :class:`matplotlib.axes.Axes` or numpy.ndarray of them + :class:`matplotlib.axes.Axes` or numpy.ndarray of them """ return self(kind='box', **kwds) @@ -2826,7 +2828,7 @@ def hist(self, bins=10, **kwds): Returns ------- - axes : :class:`matplotlib.axes.Axes` or numpy.ndarray of them + :class:`matplotlib.axes.Axes` or numpy.ndarray of them """ return self(kind='hist', bins=bins, **kwds) @@ -2885,7 +2887,7 @@ def area(self, **kwds): Returns ------- - axes : :class:`matplotlib.axes.Axes` or numpy.ndarray of them + :class:`matplotlib.axes.Axes` or numpy.ndarray of them """ return self(kind='area', **kwds) @@ -2901,7 +2903,7 @@ def pie(self, **kwds): Returns ------- - axes : :class:`matplotlib.axes.Axes` or numpy.ndarray of them + :class:`matplotlib.axes.Axes` or numpy.ndarray of them """ return self(kind='pie', **kwds) @@ -2957,12 +2959,12 @@ def line(self, x=None, y=None, **kwds): Either the location or the label of the columns to be used. By default, it will use the remaining DataFrame numeric columns. **kwds - Keyword arguments to pass on to :meth:`pandas.DataFrame.plot`. + Keyword arguments to pass on to :meth:`DataFrame.plot`. Returns ------- - axes : :class:`matplotlib.axes.Axes` or :class:`numpy.ndarray` - Returns an ndarray when ``subplots=True``. 
+        :class:`matplotlib.axes.Axes` or :class:`numpy.ndarray`
+            Return an ndarray when ``subplots=True``.

         See Also
         --------
@@ -3022,18 +3024,18 @@ def bar(self, x=None, y=None, **kwds):
             all numerical columns are used.
         **kwds
             Additional keyword arguments are documented in
-            :meth:`pandas.DataFrame.plot`.
+            :meth:`DataFrame.plot`.

         Returns
         -------
-        axes : matplotlib.axes.Axes or np.ndarray of them
+        matplotlib.axes.Axes or np.ndarray of them
             An ndarray is returned with one :class:`matplotlib.axes.Axes`
             per column when ``subplots=True``.

         See Also
         --------
-        pandas.DataFrame.plot.barh : Horizontal bar plot.
-        pandas.DataFrame.plot : Make plots of a DataFrame.
+        DataFrame.plot.barh : Horizontal bar plot.
+        DataFrame.plot : Make plots of a DataFrame.
         matplotlib.pyplot.bar : Make a bar plot with matplotlib.

         Examples
@@ -3104,16 +3106,16 @@ def barh(self, x=None, y=None, **kwds):
         y : label or position, default All numeric columns in dataframe
             Columns to be plotted from the DataFrame.
         **kwds
-            Keyword arguments to pass on to :meth:`pandas.DataFrame.plot`.
+            Keyword arguments to pass on to :meth:`DataFrame.plot`.

         Returns
         -------
-        axes : :class:`matplotlib.axes.Axes` or numpy.ndarray of them.
+        :class:`matplotlib.axes.Axes` or numpy.ndarray of them

         See Also
         --------
-        pandas.DataFrame.plot.bar: Vertical bar plot.
-        pandas.DataFrame.plot : Make plots of DataFrame using matplotlib.
+        DataFrame.plot.bar : Vertical bar plot.
+        DataFrame.plot : Make plots of DataFrame using matplotlib.
         matplotlib.axes.Axes.bar : Plot a vertical bar plot using matplotlib.

         Examples
@@ -3191,16 +3193,16 @@ def box(self, by=None, **kwds):
             Column in the DataFrame to group by.
         **kwds : optional
             Additional keywords are documented in
-            :meth:`pandas.DataFrame.plot`.
+            :meth:`DataFrame.plot`.

         Returns
         -------
-        axes : :class:`matplotlib.axes.Axes` or numpy.ndarray of them
+        :class:`matplotlib.axes.Axes` or numpy.ndarray of them

         See Also
         --------
-        pandas.DataFrame.boxplot: Another method to draw a box plot.
-        pandas.Series.plot.box: Draw a box plot from a Series object.
+        DataFrame.boxplot : Another method to draw a box plot.
+        Series.plot.box : Draw a box plot from a Series object.
         matplotlib.pyplot.boxplot: Draw a box plot in matplotlib.

         Examples
@@ -3234,11 +3236,12 @@ def hist(self, by=None, bins=10, **kwds):
             Number of histogram bins to be used.
         **kwds
             Additional keyword arguments are documented in
-            :meth:`pandas.DataFrame.plot`.
+            :meth:`DataFrame.plot`.

         Returns
         -------
-        axes : matplotlib.AxesSubplot histogram.
+        :class:`matplotlib.AxesSubplot`
+            Return a histogram plot.

         See Also
         --------
@@ -3327,12 +3330,12 @@ def area(self, x=None, y=None, **kwds):
             unstacked plot.
         **kwds : optional
             Additional keyword arguments are documented in
-            :meth:`pandas.DataFrame.plot`.
+            :meth:`DataFrame.plot`.

         Returns
         -------
         matplotlib.axes.Axes or numpy.ndarray
-            Area plot, or array of area plots if subplots is True
+            Area plot, or array of area plots if subplots is True.

         See Also
         --------
@@ -3398,11 +3401,11 @@ def pie(self, y=None, **kwds):
             Label or position of the column to plot.
             If not provided, ``subplots=True`` argument must be passed.
         **kwds
-            Keyword arguments to pass on to :meth:`pandas.DataFrame.plot`.
+            Keyword arguments to pass on to :meth:`DataFrame.plot`.

         Returns
         -------
-        axes : matplotlib.axes.Axes or np.ndarray of them.
+        matplotlib.axes.Axes or np.ndarray of them
             A NumPy array is returned when `subplots` is True.

         See Also
@@ -3474,11 +3477,11 @@ def scatter(self, x, y, s=None, c=None, **kwds):
             marker points according to a colormap.
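# Hedged sketch of coloring scatter points by a column, per the ``c``
# description above (matplotlib required; the columns are invented):
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(30, 3), columns=['x', 'y', 'temp'])
df.plot.scatter(x='x', y='y', c='temp', colormap='viridis')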
        **kwds
-            Keyword arguments to pass on to :meth:`pandas.DataFrame.plot`.
+            Keyword arguments to pass on to :meth:`DataFrame.plot`.

         Returns
         -------
-        axes : :class:`matplotlib.axes.Axes` or numpy.ndarray of them
+        :class:`matplotlib.axes.Axes` or numpy.ndarray of them

         See Also
         --------
@@ -3548,7 +3551,7 @@ def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None,
             y-direction.
         **kwds
             Additional keyword arguments are documented in
-            :meth:`pandas.DataFrame.plot`.
+            :meth:`DataFrame.plot`.

         Returns
         -------
diff --git a/pandas/plotting/_misc.py b/pandas/plotting/_misc.py
index 1c69c03025e00..5171ea68fd497 100644
--- a/pandas/plotting/_misc.py
+++ b/pandas/plotting/_misc.py
@@ -178,11 +178,11 @@ def radviz(frame, class_column, ax=None, color=None, colormap=None, **kwds):

     Returns
     -------
-    axes : :class:`matplotlib.axes.Axes`
+    :class:`matplotlib.axes.Axes`

     See Also
     --------
-    pandas.plotting.andrews_curves : Plot clustering visualization.
+    plotting.andrews_curves : Plot clustering visualization.

     Examples
     --------
@@ -273,7 +273,7 @@ def normalize(series):
 def andrews_curves(frame, class_column, ax=None, samples=200, color=None,
                    colormap=None, **kwds):
     """
-    Generates a matplotlib plot of Andrews curves, for visualising clusters of
+    Generate a matplotlib plot of Andrews curves, for visualising clusters of
     multivariate data.

     Andrews curves have the functional form:
@@ -302,7 +302,7 @@ def andrews_curves(frame, class_column, ax=None, samples=200, color=None,

     Returns
     -------
-    ax : Matplotlib axis object
+    :class:`matplotlib.axes.Axes`
     """
     from math import sqrt, pi
@@ -389,13 +389,13 @@ def bootstrap_plot(series, fig=None, size=50, samples=500, **kwds):

     Returns
     -------
-    fig : matplotlib.figure.Figure
-        Matplotlib figure
+    matplotlib.figure.Figure
+        Matplotlib figure.

     See Also
     --------
-    pandas.DataFrame.plot : Basic plotting for DataFrame objects.
-    pandas.Series.plot : Basic plotting for Series objects.
+    DataFrame.plot : Basic plotting for DataFrame objects.
+    Series.plot : Basic plotting for Series objects.

     Examples
     --------
@@ -490,7 +490,7 @@ def parallel_coordinates(frame, class_column, cols=None, ax=None, color=None,

     Returns
     -------
-    ax: matplotlib axis object
+    :class:`matplotlib.axes.Axes`

     Examples
     --------
@@ -579,7 +579,7 @@ def lag_plot(series, lag=1, ax=None, **kwds):

     Returns
     -------
-    ax: Matplotlib axis object
+    :class:`matplotlib.axes.Axes`
     """
     import matplotlib.pyplot as plt

@@ -598,7 +598,8 @@ def autocorrelation_plot(series, ax=None, **kwds):
-    """Autocorrelation plot for time series.
+    """
+    Autocorrelation plot for time series.
     Parameters:
     -----------
@@ -609,7 +610,7 @@ def autocorrelation_plot(series, ax=None, **kwds):

     Returns:
     -----------
-    ax: Matplotlib axis object
+    :class:`matplotlib.axes.Axes`
     """
     import matplotlib.pyplot as plt
     n = len(series)
diff --git a/pandas/tests/arithmetic/test_datetime64.py b/pandas/tests/arithmetic/test_datetime64.py
index 405dc0805a285..c81a371f37dc1 100644
--- a/pandas/tests/arithmetic/test_datetime64.py
+++ b/pandas/tests/arithmetic/test_datetime64.py
@@ -1440,6 +1440,20 @@ def test_dt64arr_add_sub_offset_ndarray(self, tz_naive_fixture,
 class TestDatetime64OverflowHandling(object):
     # TODO: box + de-duplicate

+    def test_dt64_overflow_masking(self, box_with_array):
+        # GH#25317
+        left = Series([Timestamp('1969-12-31')])
+        right = Series([NaT])
+
+        left = tm.box_expected(left, box_with_array)
+        right = tm.box_expected(right, box_with_array)
+
+        expected = TimedeltaIndex([NaT])
+        expected = tm.box_expected(expected, box_with_array)
+
+        result = left - right
+        tm.assert_equal(result, expected)
+
     def test_dt64_series_arith_overflow(self):
         # GH#12534, fixed by GH#19024
         dt = pd.Timestamp('1700-01-31')
diff --git a/pandas/tests/arithmetic/test_timedelta64.py b/pandas/tests/arithmetic/test_timedelta64.py
index c31d7acad3111..0faed74d4a021 100644
--- a/pandas/tests/arithmetic/test_timedelta64.py
+++ b/pandas/tests/arithmetic/test_timedelta64.py
@@ -205,10 +205,20 @@ def test_subtraction_ops(self):
         td = Timedelta('1 days')
         dt = Timestamp('20130101')

-        pytest.raises(TypeError, lambda: tdi - dt)
-        pytest.raises(TypeError, lambda: tdi - dti)
-        pytest.raises(TypeError, lambda: td - dt)
-        pytest.raises(TypeError, lambda: td - dti)
+        msg = "cannot subtract a datelike from a TimedeltaArray"
+        with pytest.raises(TypeError, match=msg):
+            tdi - dt
+        with pytest.raises(TypeError, match=msg):
+            tdi - dti
+
+        msg = (r"descriptor '__sub__' requires a 'datetime\.datetime' object"
+               " but received a 'Timedelta'")
+        with pytest.raises(TypeError, match=msg):
+            td - dt
+
+        msg = "bad operand type for unary -: 'DatetimeArray'"
+        with pytest.raises(TypeError, match=msg):
+            td - dti

         result = dt - dti
         expected = TimedeltaIndex(['0 days', '-1 days', '-2 days'],
                                   name='bar')
@@ -265,19 +275,38 @@ def _check(result, expected):
         _check(result, expected)

         # tz mismatches
-        pytest.raises(TypeError, lambda: dt_tz - ts)
-        pytest.raises(TypeError, lambda: dt_tz - dt)
-        pytest.raises(TypeError, lambda: dt_tz - ts_tz2)
-        pytest.raises(TypeError, lambda: dt - dt_tz)
-        pytest.raises(TypeError, lambda: ts - dt_tz)
-        pytest.raises(TypeError, lambda: ts_tz2 - ts)
-        pytest.raises(TypeError, lambda: ts_tz2 - dt)
-        pytest.raises(TypeError, lambda: ts_tz - ts_tz2)
+        msg = ("Timestamp subtraction must have the same timezones or no"
+               " timezones")
+        with pytest.raises(TypeError, match=msg):
+            dt_tz - ts
+        msg = "can't subtract offset-naive and offset-aware datetimes"
+        with pytest.raises(TypeError, match=msg):
+            dt_tz - dt
+        msg = ("Timestamp subtraction must have the same timezones or no"
+               " timezones")
+        with pytest.raises(TypeError, match=msg):
+            dt_tz - ts_tz2
+        msg = "can't subtract offset-naive and offset-aware datetimes"
+        with pytest.raises(TypeError, match=msg):
+            dt - dt_tz
+        msg = ("Timestamp subtraction must have the same timezones or no"
+               " timezones")
+        with pytest.raises(TypeError, match=msg):
+            ts - dt_tz
+        with pytest.raises(TypeError, match=msg):
+            ts_tz2 - ts
+        with pytest.raises(TypeError, match=msg):
+            ts_tz2 - dt
+        with pytest.raises(TypeError, match=msg):
+            ts_tz - ts_tz2

         # with dti
-        pytest.raises(TypeError,
lambda: dti - ts_tz) - pytest.raises(TypeError, lambda: dti_tz - ts) - pytest.raises(TypeError, lambda: dti_tz - ts_tz2) + with pytest.raises(TypeError, match=msg): + dti - ts_tz + with pytest.raises(TypeError, match=msg): + dti_tz - ts + with pytest.raises(TypeError, match=msg): + dti_tz - ts_tz2 result = dti_tz - dt_tz expected = TimedeltaIndex(['0 days', '1 days', '2 days']) @@ -349,8 +378,11 @@ def test_addition_ops(self): tm.assert_index_equal(result, expected) # unequal length - pytest.raises(ValueError, lambda: tdi + dti[0:1]) - pytest.raises(ValueError, lambda: tdi[0:1] + dti) + msg = "cannot add indices of unequal length" + with pytest.raises(ValueError, match=msg): + tdi + dti[0:1] + with pytest.raises(ValueError, match=msg): + tdi[0:1] + dti # random indexes with pytest.raises(NullFrequencyError): diff --git a/pandas/tests/arrays/categorical/test_analytics.py b/pandas/tests/arrays/categorical/test_analytics.py index 5efcd527de8d8..7ce82d5bcdded 100644 --- a/pandas/tests/arrays/categorical/test_analytics.py +++ b/pandas/tests/arrays/categorical/test_analytics.py @@ -18,8 +18,11 @@ def test_min_max(self): # unordered cats have no min/max cat = Categorical(["a", "b", "c", "d"], ordered=False) - pytest.raises(TypeError, lambda: cat.min()) - pytest.raises(TypeError, lambda: cat.max()) + msg = "Categorical is not ordered for operation {}" + with pytest.raises(TypeError, match=msg.format('min')): + cat.min() + with pytest.raises(TypeError, match=msg.format('max')): + cat.max() cat = Categorical(["a", "b", "c", "d"], ordered=True) _min = cat.min() @@ -108,18 +111,24 @@ def test_searchsorted(self): tm.assert_numpy_array_equal(res_ser, exp) # Searching for a single value that is not from the Categorical - pytest.raises(KeyError, lambda: c1.searchsorted('cucumber')) - pytest.raises(KeyError, lambda: s1.searchsorted('cucumber')) + msg = r"Value\(s\) to be inserted must be in categories" + with pytest.raises(KeyError, match=msg): + c1.searchsorted('cucumber') + with pytest.raises(KeyError, match=msg): + s1.searchsorted('cucumber') # Searching for multiple values one of each is not from the Categorical - pytest.raises(KeyError, - lambda: c1.searchsorted(['bread', 'cucumber'])) - pytest.raises(KeyError, - lambda: s1.searchsorted(['bread', 'cucumber'])) + with pytest.raises(KeyError, match=msg): + c1.searchsorted(['bread', 'cucumber']) + with pytest.raises(KeyError, match=msg): + s1.searchsorted(['bread', 'cucumber']) # searchsorted call for unordered Categorical - pytest.raises(ValueError, lambda: c2.searchsorted('apple')) - pytest.raises(ValueError, lambda: s2.searchsorted('apple')) + msg = "Categorical not ordered" + with pytest.raises(ValueError, match=msg): + c2.searchsorted('apple') + with pytest.raises(ValueError, match=msg): + s2.searchsorted('apple') def test_unique(self): # categories are reordered based on value when ordered=False diff --git a/pandas/tests/arrays/categorical/test_constructors.py b/pandas/tests/arrays/categorical/test_constructors.py index 25c299692ceca..f07e3aba53cd4 100644 --- a/pandas/tests/arrays/categorical/test_constructors.py +++ b/pandas/tests/arrays/categorical/test_constructors.py @@ -212,6 +212,18 @@ def test_constructor(self): c = Categorical(np.array([], dtype='int64'), # noqa categories=[3, 2, 1], ordered=True) + def test_constructor_with_existing_categories(self): + # GH25318: constructing with pd.Series used to bogusly skip recoding + # categories + c0 = Categorical(["a", "b", "c", "a"]) + c1 = Categorical(["a", "b", "c", "a"], categories=["b", "c"]) + 
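# As an aside on the behavior this test exercises: passing explicit
# categories recodes the values, and any value absent from ``categories``
# becomes NaN. A standalone sketch (outside the test suite):
import pandas as pd

c = pd.Categorical(['a', 'b', 'c', 'a'], categories=['b', 'c'])
print(c)  # [NaN, b, c, NaN]; 'a' is not a category, so it is coded as NaN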
+ c2 = Categorical(c0, categories=c1.categories) + tm.assert_categorical_equal(c1, c2) + + c3 = Categorical(Series(c0), categories=c1.categories) + tm.assert_categorical_equal(c1, c3) + def test_constructor_not_sequence(self): # https://github.com/pandas-dev/pandas/issues/16022 msg = r"^Parameter 'categories' must be list-like, was" diff --git a/pandas/tests/arrays/categorical/test_operators.py b/pandas/tests/arrays/categorical/test_operators.py index b2965bbcc456a..e1264722aedcd 100644 --- a/pandas/tests/arrays/categorical/test_operators.py +++ b/pandas/tests/arrays/categorical/test_operators.py @@ -4,6 +4,8 @@ import numpy as np import pytest +from pandas.compat import PY2 + import pandas as pd from pandas import Categorical, DataFrame, Series, date_range from pandas.tests.arrays.categorical.common import TestCategorical @@ -17,6 +19,7 @@ def test_categories_none_comparisons(self): 'a', 'c', 'c', 'c'], ordered=True) tm.assert_categorical_equal(factor, self.factor) + @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails") def test_comparisons(self): result = self.factor[self.factor == 'a'] @@ -95,16 +98,24 @@ def test_comparisons(self): # comparison (in both directions) with Series will raise s = Series(["b", "b", "b"]) - pytest.raises(TypeError, lambda: cat > s) - pytest.raises(TypeError, lambda: cat_rev > s) - pytest.raises(TypeError, lambda: s < cat) - pytest.raises(TypeError, lambda: s < cat_rev) + msg = ("Cannot compare a Categorical for op __gt__ with type" + r" ") + with pytest.raises(TypeError, match=msg): + cat > s + with pytest.raises(TypeError, match=msg): + cat_rev > s + with pytest.raises(TypeError, match=msg): + s < cat + with pytest.raises(TypeError, match=msg): + s < cat_rev # comparison with numpy.array will raise in both direction, but only on # newer numpy versions a = np.array(["b", "b", "b"]) - pytest.raises(TypeError, lambda: cat > a) - pytest.raises(TypeError, lambda: cat_rev > a) + with pytest.raises(TypeError, match=msg): + cat > a + with pytest.raises(TypeError, match=msg): + cat_rev > a # Make sure that unequal comparison take the categories order in # account @@ -163,16 +174,23 @@ def test_comparison_with_unknown_scalars(self): # for unequal comps, but not for equal/not equal cat = Categorical([1, 2, 3], ordered=True) - pytest.raises(TypeError, lambda: cat < 4) - pytest.raises(TypeError, lambda: cat > 4) - pytest.raises(TypeError, lambda: 4 < cat) - pytest.raises(TypeError, lambda: 4 > cat) + msg = ("Cannot compare a Categorical for op __{}__ with a scalar," + " which is not a category") + with pytest.raises(TypeError, match=msg.format('lt')): + cat < 4 + with pytest.raises(TypeError, match=msg.format('gt')): + cat > 4 + with pytest.raises(TypeError, match=msg.format('gt')): + 4 < cat + with pytest.raises(TypeError, match=msg.format('lt')): + 4 > cat tm.assert_numpy_array_equal(cat == 4, np.array([False, False, False])) tm.assert_numpy_array_equal(cat != 4, np.array([True, True, True])) + @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails") @pytest.mark.parametrize('data,reverse,base', [ (list("abc"), list("cba"), list("bbb")), ([1, 2, 3], [3, 2, 1], [2, 2, 2])] @@ -219,16 +237,26 @@ def test_comparisons(self, data, reverse, base): # categorical cannot be compared to Series or numpy array, and also # not the other way around - pytest.raises(TypeError, lambda: cat > s) - pytest.raises(TypeError, lambda: cat_rev > s) - pytest.raises(TypeError, lambda: cat > a) - pytest.raises(TypeError, lambda: cat_rev > a) + msg = ("Cannot compare a 
Categorical for op __gt__ with type" + r" ") + with pytest.raises(TypeError, match=msg): + cat > s + with pytest.raises(TypeError, match=msg): + cat_rev > s + with pytest.raises(TypeError, match=msg): + cat > a + with pytest.raises(TypeError, match=msg): + cat_rev > a - pytest.raises(TypeError, lambda: s < cat) - pytest.raises(TypeError, lambda: s < cat_rev) + with pytest.raises(TypeError, match=msg): + s < cat + with pytest.raises(TypeError, match=msg): + s < cat_rev - pytest.raises(TypeError, lambda: a < cat) - pytest.raises(TypeError, lambda: a < cat_rev) + with pytest.raises(TypeError, match=msg): + a < cat + with pytest.raises(TypeError, match=msg): + a < cat_rev @pytest.mark.parametrize('ctor', [ lambda *args, **kwargs: Categorical(*args, **kwargs), @@ -287,16 +315,21 @@ def test_numeric_like_ops(self): right=False, labels=cat_labels) # numeric ops should not succeed - for op in ['__add__', '__sub__', '__mul__', '__truediv__']: - pytest.raises(TypeError, - lambda: getattr(df, op)(df)) + for op, str_rep in [('__add__', r'\+'), + ('__sub__', '-'), + ('__mul__', r'\*'), + ('__truediv__', '/')]: + msg = r"Series cannot perform the operation {}".format(str_rep) + with pytest.raises(TypeError, match=msg): + getattr(df, op)(df) # reduction ops should not succeed (unless specifically defined, e.g. # min/max) s = df['value_group'] for op in ['kurt', 'skew', 'var', 'std', 'mean', 'sum', 'median']: - pytest.raises(TypeError, - lambda: getattr(s, op)(numeric_only=False)) + msg = "Categorical cannot perform the operation {}".format(op) + with pytest.raises(TypeError, match=msg): + getattr(s, op)(numeric_only=False) # mad technically works because it takes always the numeric data @@ -306,8 +339,13 @@ def test_numeric_like_ops(self): np.sum(s) # numeric ops on a Series - for op in ['__add__', '__sub__', '__mul__', '__truediv__']: - pytest.raises(TypeError, lambda: getattr(s, op)(2)) + for op, str_rep in [('__add__', r'\+'), + ('__sub__', '-'), + ('__mul__', r'\*'), + ('__truediv__', '/')]: + msg = r"Series cannot perform the operation {}".format(str_rep) + with pytest.raises(TypeError, match=msg): + getattr(s, op)(2) # invalid ufunc with pytest.raises(TypeError): diff --git a/pandas/tests/arrays/sparse/test_libsparse.py b/pandas/tests/arrays/sparse/test_libsparse.py index 6e9d790bf85f3..2cbe7d9ea084c 100644 --- a/pandas/tests/arrays/sparse/test_libsparse.py +++ b/pandas/tests/arrays/sparse/test_libsparse.py @@ -449,11 +449,13 @@ def test_check_integrity(self): # also OK even though empty index = BlockIndex(1, locs, lengths) # noqa - # block extend beyond end - pytest.raises(Exception, BlockIndex, 10, [5], [10]) + msg = "Block 0 extends beyond end" + with pytest.raises(ValueError, match=msg): + BlockIndex(10, [5], [10]) - # block overlap - pytest.raises(Exception, BlockIndex, 10, [2, 5], [5, 3]) + msg = "Block 0 overlaps" + with pytest.raises(ValueError, match=msg): + BlockIndex(10, [2, 5], [5, 3]) def test_to_int_index(self): locs = [0, 10] diff --git a/pandas/tests/arrays/test_array.py b/pandas/tests/arrays/test_array.py index 9fea1989e46df..b68ec2bf348b4 100644 --- a/pandas/tests/arrays/test_array.py +++ b/pandas/tests/arrays/test_array.py @@ -9,6 +9,7 @@ import pandas as pd from pandas.api.extensions import register_extension_dtype +from pandas.api.types import is_scalar from pandas.core.arrays import PandasArray, integer_array, period_array from pandas.tests.extension.decimal import ( DecimalArray, DecimalDtype, to_decimal) @@ -254,3 +255,51 @@ def 
test_array_not_registered(registry_without_decimal): result = pd.array(data, dtype=DecimalDtype) expected = DecimalArray._from_sequence(data) tm.assert_equal(result, expected) + + +class TestArrayAnalytics(object): + def test_searchsorted(self, string_dtype): + arr = pd.array(['a', 'b', 'c'], dtype=string_dtype) + + result = arr.searchsorted('a', side='left') + assert is_scalar(result) + assert result == 0 + + result = arr.searchsorted('a', side='right') + assert is_scalar(result) + assert result == 1 + + def test_searchsorted_numeric_dtypes_scalar(self, any_real_dtype): + arr = pd.array([1, 3, 90], dtype=any_real_dtype) + result = arr.searchsorted(30) + assert is_scalar(result) + assert result == 2 + + result = arr.searchsorted([30]) + expected = np.array([2], dtype=np.intp) + tm.assert_numpy_array_equal(result, expected) + + def test_searchsorted_numeric_dtypes_vector(self, any_real_dtype): + arr = pd.array([1, 3, 90], dtype=any_real_dtype) + result = arr.searchsorted([2, 30]) + expected = np.array([1, 2], dtype=np.intp) + tm.assert_numpy_array_equal(result, expected) + + @pytest.mark.parametrize('arr, val', [ + [pd.date_range('20120101', periods=10, freq='2D'), + pd.Timestamp('20120102')], + [pd.date_range('20120101', periods=10, freq='2D', tz='Asia/Hong_Kong'), + pd.Timestamp('20120102', tz='Asia/Hong_Kong')], + [pd.timedelta_range(start='1 day', end='10 days', periods=10), + pd.Timedelta('2 days')]]) + def test_search_sorted_datetime64_scalar(self, arr, val): + arr = pd.array(arr) + result = arr.searchsorted(val) + assert is_scalar(result) + assert result == 1 + + def test_searchsorted_sorter(self, any_real_dtype): + arr = pd.array([3, 1, 2], dtype=any_real_dtype) + result = arr.searchsorted([0, 3], sorter=np.argsort(arr)) + expected = np.array([0, 2], dtype=np.intp) + tm.assert_numpy_array_equal(result, expected) diff --git a/pandas/tests/arrays/test_integer.py b/pandas/tests/arrays/test_integer.py index 09298bb5cd08d..67e7db5460e6d 100644 --- a/pandas/tests/arrays/test_integer.py +++ b/pandas/tests/arrays/test_integer.py @@ -339,7 +339,7 @@ def _compare_other(self, data, op_name, other): expected = pd.Series(op(data._data, other)) # fill the nan locations - expected[data._mask] = True if op_name == '__ne__' else False + expected[data._mask] = op_name == '__ne__' tm.assert_series_equal(result, expected) @@ -351,7 +351,7 @@ def _compare_other(self, data, op_name, other): expected = op(expected, other) # fill the nan locations - expected[data._mask] = True if op_name == '__ne__' else False + expected[data._mask] = op_name == '__ne__' tm.assert_series_equal(result, expected) diff --git a/pandas/tests/arrays/test_timedeltas.py b/pandas/tests/arrays/test_timedeltas.py index 6b4662ca02e80..1fec533a14a6f 100644 --- a/pandas/tests/arrays/test_timedeltas.py +++ b/pandas/tests/arrays/test_timedeltas.py @@ -9,6 +9,18 @@ class TestTimedeltaArrayConstructor(object): + def test_only_1dim_accepted(self): + # GH#25282 + arr = np.array([0, 1, 2, 3], dtype='m8[h]').astype('m8[ns]') + + with pytest.raises(ValueError, match="Only 1-dimensional"): + # 2-dim + TimedeltaArray(arr.reshape(2, 2)) + + with pytest.raises(ValueError, match="Only 1-dimensional"): + # 0-dim + TimedeltaArray(arr[[0]].squeeze()) + def test_freq_validation(self): # ensure that the public constructor cannot create an invalid instance arr = np.array([0, 0, 1], dtype=np.int64) * 3600 * 10**9 @@ -51,6 +63,16 @@ def test_copy(self): class TestTimedeltaArray(object): + def test_np_sum(self): + # GH#25282 + vals = np.arange(5, 
dtype=np.int64).view('m8[h]').astype('m8[ns]') + arr = TimedeltaArray(vals) + result = np.sum(arr) + assert result == vals.sum() + + result = np.sum(pd.TimedeltaIndex(arr)) + assert result == vals.sum() + def test_from_sequence_dtype(self): msg = "dtype .*object.* cannot be converted to timedelta64" with pytest.raises(ValueError, match=msg): diff --git a/pandas/tests/computation/test_eval.py b/pandas/tests/computation/test_eval.py index c1ba15f428eb7..a14d8e4471c23 100644 --- a/pandas/tests/computation/test_eval.py +++ b/pandas/tests/computation/test_eval.py @@ -285,10 +285,14 @@ def check_operands(left, right, cmp_op): def check_simple_cmp_op(self, lhs, cmp1, rhs): ex = 'lhs {0} rhs'.format(cmp1) + msg = (r"only list-like( or dict-like)? objects are allowed to be" + r" passed to (DataFrame\.)?isin\(\), you passed a" + r" (\[|')bool(\]|')|" + "argument of type 'bool' is not iterable") if cmp1 in ('in', 'not in') and not is_list_like(rhs): - pytest.raises(TypeError, pd.eval, ex, engine=self.engine, - parser=self.parser, local_dict={'lhs': lhs, - 'rhs': rhs}) + with pytest.raises(TypeError, match=msg): + pd.eval(ex, engine=self.engine, parser=self.parser, + local_dict={'lhs': lhs, 'rhs': rhs}) else: expected = _eval_single_bin(lhs, cmp1, rhs, self.engine) result = pd.eval(ex, engine=self.engine, parser=self.parser) @@ -341,9 +345,11 @@ def check_floor_division(self, lhs, arith1, rhs): expected = lhs // rhs self.check_equal(res, expected) else: - pytest.raises(TypeError, pd.eval, ex, - local_dict={'lhs': lhs, 'rhs': rhs}, - engine=self.engine, parser=self.parser) + msg = (r"unsupported operand type\(s\) for //: 'VariableNode' and" + " 'VariableNode'") + with pytest.raises(TypeError, match=msg): + pd.eval(ex, local_dict={'lhs': lhs, 'rhs': rhs}, + engine=self.engine, parser=self.parser) def get_expected_pow_result(self, lhs, rhs): try: @@ -396,10 +402,14 @@ def check_compound_invert_op(self, lhs, cmp1, rhs): skip_these = 'in', 'not in' ex = '~(lhs {0} rhs)'.format(cmp1) + msg = (r"only list-like( or dict-like)? 
objects are allowed to be" + r" passed to (DataFrame\.)?isin\(\), you passed a" + r" (\[|')float(\]|')|" + "argument of type 'float' is not iterable") if is_scalar(rhs) and cmp1 in skip_these: - pytest.raises(TypeError, pd.eval, ex, engine=self.engine, - parser=self.parser, local_dict={'lhs': lhs, - 'rhs': rhs}) + with pytest.raises(TypeError, match=msg): + pd.eval(ex, engine=self.engine, parser=self.parser, + local_dict={'lhs': lhs, 'rhs': rhs}) else: # compound if is_scalar(lhs) and is_scalar(rhs): @@ -1101,8 +1111,9 @@ def test_simple_arith_ops(self): ex3 = '1 {0} (x + 1)'.format(op) if op in ('in', 'not in'): - pytest.raises(TypeError, pd.eval, ex, - engine=self.engine, parser=self.parser) + msg = "argument of type 'int' is not iterable" + with pytest.raises(TypeError, match=msg): + pd.eval(ex, engine=self.engine, parser=self.parser) else: expec = _eval_single_bin(1, op, 1, self.engine) x = self.eval(ex, engine=self.engine, parser=self.parser) @@ -1236,19 +1247,25 @@ def test_assignment_fails(self): df = DataFrame(np.random.randn(5, 3), columns=list('abc')) df2 = DataFrame(np.random.randn(5, 3)) expr1 = 'df = df2' - pytest.raises(ValueError, self.eval, expr1, - local_dict={'df': df, 'df2': df2}) + msg = "cannot assign without a target object" + with pytest.raises(ValueError, match=msg): + self.eval(expr1, local_dict={'df': df, 'df2': df2}) def test_assignment_column(self): df = DataFrame(np.random.randn(5, 2), columns=list('ab')) orig_df = df.copy() # multiple assignees - pytest.raises(SyntaxError, df.eval, 'd c = a + b') + with pytest.raises(SyntaxError, match="invalid syntax"): + df.eval('d c = a + b') # invalid assignees - pytest.raises(SyntaxError, df.eval, 'd,c = a + b') - pytest.raises(SyntaxError, df.eval, 'Timestamp("20131001") = a + b') + msg = "left hand side of an assignment must be a single name" + with pytest.raises(SyntaxError, match=msg): + df.eval('d,c = a + b') + msg = "can't assign to function call" + with pytest.raises(SyntaxError, match=msg): + df.eval('Timestamp("20131001") = a + b') # single assignment - existing variable expected = orig_df.copy() @@ -1291,7 +1308,9 @@ def f(): # multiple assignment df = orig_df.copy() df.eval('c = a + b', inplace=True) - pytest.raises(SyntaxError, df.eval, 'c = a = b') + msg = "can only assign a single expression" + with pytest.raises(SyntaxError, match=msg): + df.eval('c = a = b') # explicit targets df = orig_df.copy() @@ -1545,21 +1564,24 @@ def test_check_many_exprs(self): def test_fails_and(self): df = DataFrame(np.random.randn(5, 3)) - pytest.raises(NotImplementedError, pd.eval, 'df > 2 and df > 3', - local_dict={'df': df}, parser=self.parser, - engine=self.engine) + msg = "'BoolOp' nodes are not implemented" + with pytest.raises(NotImplementedError, match=msg): + pd.eval('df > 2 and df > 3', local_dict={'df': df}, + parser=self.parser, engine=self.engine) def test_fails_or(self): df = DataFrame(np.random.randn(5, 3)) - pytest.raises(NotImplementedError, pd.eval, 'df > 2 or df > 3', - local_dict={'df': df}, parser=self.parser, - engine=self.engine) + msg = "'BoolOp' nodes are not implemented" + with pytest.raises(NotImplementedError, match=msg): + pd.eval('df > 2 or df > 3', local_dict={'df': df}, + parser=self.parser, engine=self.engine) def test_fails_not(self): df = DataFrame(np.random.randn(5, 3)) - pytest.raises(NotImplementedError, pd.eval, 'not df > 2', - local_dict={'df': df}, parser=self.parser, - engine=self.engine) + msg = "'Not' nodes are not implemented" + with pytest.raises(NotImplementedError, match=msg): + 
pd.eval('not df > 2', local_dict={'df': df}, parser=self.parser, + engine=self.engine) def test_fails_ampersand(self): df = DataFrame(np.random.randn(5, 3)) # noqa diff --git a/pandas/tests/dtypes/test_common.py b/pandas/tests/dtypes/test_common.py index 62e96fd39a759..5c1f6ff405b3b 100644 --- a/pandas/tests/dtypes/test_common.py +++ b/pandas/tests/dtypes/test_common.py @@ -607,13 +607,16 @@ def test__get_dtype(input_param, result): assert com._get_dtype(input_param) == result -@pytest.mark.parametrize('input_param', [None, - 1, 1.2, - 'random string', - pd.DataFrame([1, 2])]) -def test__get_dtype_fails(input_param): +@pytest.mark.parametrize('input_param,expected_error_message', [ + (None, "Cannot deduce dtype from null object"), + (1, "data type not understood"), + (1.2, "data type not understood"), + ('random string', "data type 'random string' not understood"), + (pd.DataFrame([1, 2]), "data type not understood")]) +def test__get_dtype_fails(input_param, expected_error_message): # python objects - pytest.raises(TypeError, com._get_dtype, input_param) + with pytest.raises(TypeError, match=expected_error_message): + com._get_dtype(input_param) @pytest.mark.parametrize('input_param,result', [ diff --git a/pandas/tests/dtypes/test_dtypes.py b/pandas/tests/dtypes/test_dtypes.py index 0fe0a845f5129..4366f610871ff 100644 --- a/pandas/tests/dtypes/test_dtypes.py +++ b/pandas/tests/dtypes/test_dtypes.py @@ -3,6 +3,7 @@ import numpy as np import pytest +import pytz from pandas.core.dtypes.common import ( is_bool_dtype, is_categorical, is_categorical_dtype, @@ -37,7 +38,8 @@ def test_equality_invalid(self): assert not is_dtype_equal(self.dtype, np.int64) def test_numpy_informed(self): - pytest.raises(TypeError, np.dtype, self.dtype) + with pytest.raises(TypeError, match="data type not understood"): + np.dtype(self.dtype) assert not self.dtype == np.str_ assert not np.str_ == self.dtype @@ -86,8 +88,9 @@ def test_equality(self): def test_construction_from_string(self): result = CategoricalDtype.construct_from_string('category') assert is_dtype_equal(self.dtype, result) - pytest.raises( - TypeError, lambda: CategoricalDtype.construct_from_string('foo')) + msg = "cannot construct a CategoricalDtype" + with pytest.raises(TypeError, match=msg): + CategoricalDtype.construct_from_string('foo') def test_constructor_invalid(self): msg = "Parameter 'categories' must be list-like" @@ -201,8 +204,9 @@ def test_hash_vs_equality(self): assert hash(dtype2) != hash(dtype4) def test_construction(self): - pytest.raises(ValueError, - lambda: DatetimeTZDtype('ms', 'US/Eastern')) + msg = "DatetimeTZDtype only supports ns units" + with pytest.raises(ValueError, match=msg): + DatetimeTZDtype('ms', 'US/Eastern') def test_subclass(self): a = DatetimeTZDtype.construct_from_string('datetime64[ns, US/Eastern]') @@ -225,8 +229,9 @@ def test_construction_from_string(self): result = DatetimeTZDtype.construct_from_string( 'datetime64[ns, US/Eastern]') assert is_dtype_equal(self.dtype, result) - pytest.raises(TypeError, - lambda: DatetimeTZDtype.construct_from_string('foo')) + msg = "Could not construct DatetimeTZDtype from 'foo'" + with pytest.raises(TypeError, match=msg): + DatetimeTZDtype.construct_from_string('foo') def test_construct_from_string_raises(self): with pytest.raises(TypeError, match="notatz"): @@ -302,6 +307,15 @@ def test_empty(self): with pytest.raises(TypeError, match="A 'tz' is required."): DatetimeTZDtype() + def test_tz_standardize(self): + # GH 24713 + tz = pytz.timezone('US/Eastern') + dr = 
date_range('2013-01-01', periods=3, tz='US/Eastern') + dtype = DatetimeTZDtype('ns', dr.tz) + assert dtype.tz == tz + dtype = DatetimeTZDtype('ns', dr[0].tz) + assert dtype.tz == tz + class TestPeriodDtype(Base): @@ -501,10 +515,11 @@ def test_construction_not_supported(self, subtype): with pytest.raises(TypeError, match=msg): IntervalDtype(subtype) - def test_construction_errors(self): + @pytest.mark.parametrize('subtype', ['xx', 'IntervalA', 'Interval[foo]']) + def test_construction_errors(self, subtype): msg = 'could not construct IntervalDtype' with pytest.raises(TypeError, match=msg): - IntervalDtype('xx') + IntervalDtype(subtype) def test_construction_from_string(self): result = IntervalDtype('interval[int64]') @@ -513,7 +528,7 @@ def test_construction_from_string(self): assert is_dtype_equal(self.dtype, result) @pytest.mark.parametrize('string', [ - 'foo', 'foo[int64]', 0, 3.14, ('a', 'b'), None]) + 0, 3.14, ('a', 'b'), None]) def test_construction_from_string_errors(self, string): # these are invalid entirely msg = 'a string needs to be passed, got type' @@ -522,10 +537,12 @@ def test_construction_from_string_errors(self, string): IntervalDtype.construct_from_string(string) @pytest.mark.parametrize('string', [ - 'interval[foo]']) + 'foo', 'foo[int64]', 'IntervalA']) def test_construction_from_string_error_subtype(self, string): # this is an invalid subtype - msg = 'could not construct IntervalDtype' + msg = ("Incorrectly formatted string passed to constructor. " + r"Valid formats include Interval or Interval\[dtype\] " + "where dtype is numeric, datetime, or timedelta") with pytest.raises(TypeError, match=msg): IntervalDtype.construct_from_string(string) @@ -549,6 +566,7 @@ def test_is_dtype(self): assert not IntervalDtype.is_dtype('U') assert not IntervalDtype.is_dtype('S') assert not IntervalDtype.is_dtype('foo') + assert not IntervalDtype.is_dtype('IntervalA') assert not IntervalDtype.is_dtype(np.object_) assert not IntervalDtype.is_dtype(np.int64) assert not IntervalDtype.is_dtype(np.float64) diff --git a/pandas/tests/dtypes/test_generic.py b/pandas/tests/dtypes/test_generic.py index 1622088d05f4d..2bb3559d56d61 100644 --- a/pandas/tests/dtypes/test_generic.py +++ b/pandas/tests/dtypes/test_generic.py @@ -1,6 +1,6 @@ # -*- coding: utf-8 -*- -from warnings import catch_warnings, simplefilter +from warnings import catch_warnings import numpy as np @@ -39,9 +39,6 @@ def test_abc_types(self): assert isinstance(pd.Int64Index([1, 2, 3]), gt.ABCIndexClass) assert isinstance(pd.Series([1, 2, 3]), gt.ABCSeries) assert isinstance(self.df, gt.ABCDataFrame) - with catch_warnings(record=True): - simplefilter('ignore', FutureWarning) - assert isinstance(self.df.to_panel(), gt.ABCPanel) assert isinstance(self.sparse_series, gt.ABCSparseSeries) assert isinstance(self.sparse_array, gt.ABCSparseArray) assert isinstance(self.sparse_frame, gt.ABCSparseDataFrame) diff --git a/pandas/tests/dtypes/test_inference.py b/pandas/tests/dtypes/test_inference.py index 89662b70a39ad..187b37d4f788e 100644 --- a/pandas/tests/dtypes/test_inference.py +++ b/pandas/tests/dtypes/test_inference.py @@ -159,13 +159,15 @@ def test_is_nested_list_like_fails(obj): @pytest.mark.parametrize( - "ll", [{}, {'A': 1}, Series([1])]) + "ll", [{}, {'A': 1}, Series([1]), collections.defaultdict()]) def test_is_dict_like_passes(ll): assert inference.is_dict_like(ll) -@pytest.mark.parametrize( - "ll", ['1', 1, [1, 2], (1, 2), range(2), Index([1])]) +@pytest.mark.parametrize("ll", [ + '1', 1, [1, 2], (1, 2), range(2), Index([1]), + 
dict, collections.defaultdict, Series +]) def test_is_dict_like_fails(ll): assert not inference.is_dict_like(ll) @@ -616,6 +618,37 @@ def test_decimals(self): result = lib.infer_dtype(arr, skipna=True) assert result == 'decimal' + # complex is compatible with nan, so skipna has no effect + @pytest.mark.parametrize('skipna', [True, False]) + def test_complex(self, skipna): + # gets cast to complex on array construction + arr = np.array([1.0, 2.0, 1 + 1j]) + result = lib.infer_dtype(arr, skipna=skipna) + assert result == 'complex' + + arr = np.array([1.0, 2.0, 1 + 1j], dtype='O') + result = lib.infer_dtype(arr, skipna=skipna) + assert result == 'mixed' + + # gets cast to complex on array construction + arr = np.array([1, np.nan, 1 + 1j]) + result = lib.infer_dtype(arr, skipna=skipna) + assert result == 'complex' + + arr = np.array([1.0, np.nan, 1 + 1j], dtype='O') + result = lib.infer_dtype(arr, skipna=skipna) + assert result == 'mixed' + + # complex with nans stays complex + arr = np.array([1 + 1j, np.nan, 3 + 3j], dtype='O') + result = lib.infer_dtype(arr, skipna=skipna) + assert result == 'complex' + + # test smaller complex dtype; will pass through _try_infer_map fastpath + arr = np.array([1 + 1j, np.nan, 3 + 3j], dtype=np.complex64) + result = lib.infer_dtype(arr, skipna=skipna) + assert result == 'complex' + def test_string(self): pass diff --git a/pandas/tests/dtypes/test_missing.py b/pandas/tests/dtypes/test_missing.py index d913d2ad299ce..7ca01e13a33a9 100644 --- a/pandas/tests/dtypes/test_missing.py +++ b/pandas/tests/dtypes/test_missing.py @@ -2,7 +2,7 @@ from datetime import datetime from decimal import Decimal -from warnings import catch_warnings, filterwarnings, simplefilter +from warnings import catch_warnings, filterwarnings import numpy as np import pytest @@ -94,15 +94,6 @@ def test_isna_isnull(self, isna_f): expected = df.apply(isna_f) tm.assert_frame_equal(result, expected) - # panel - with catch_warnings(record=True): - simplefilter("ignore", FutureWarning) - for p in [tm.makePanel(), tm.makePeriodPanel(), - tm.add_nans(tm.makePanel())]: - result = isna_f(p) - expected = p.apply(isna_f) - tm.assert_panel_equal(result, expected) - def test_isna_lists(self): result = isna([[False]]) exp = np.array([[False]]) diff --git a/pandas/tests/extension/base/groupby.py b/pandas/tests/extension/base/groupby.py index dd406ca0cd5ed..1929dad075695 100644 --- a/pandas/tests/extension/base/groupby.py +++ b/pandas/tests/extension/base/groupby.py @@ -55,19 +55,14 @@ def test_groupby_extension_transform(self, data_for_grouping): self.assert_series_equal(result, expected) - @pytest.mark.parametrize('op', [ - lambda x: 1, - lambda x: [1] * len(x), - lambda x: pd.Series([1] * len(x)), - lambda x: x, - ], ids=['scalar', 'list', 'series', 'object']) - def test_groupby_extension_apply(self, data_for_grouping, op): + def test_groupby_extension_apply( + self, data_for_grouping, groupby_apply_op): df = pd.DataFrame({"A": [1, 1, 2, 2, 3, 3, 1, 4], "B": data_for_grouping}) - df.groupby("B").apply(op) - df.groupby("B").A.apply(op) - df.groupby("A").apply(op) - df.groupby("A").B.apply(op) + df.groupby("B").apply(groupby_apply_op) + df.groupby("B").A.apply(groupby_apply_op) + df.groupby("A").apply(groupby_apply_op) + df.groupby("A").B.apply(groupby_apply_op) def test_in_numeric_groupby(self, data_for_grouping): df = pd.DataFrame({"A": [1, 1, 2, 2, 3, 3, 1, 4], diff --git a/pandas/tests/extension/base/methods.py b/pandas/tests/extension/base/methods.py index f64df7a84b7c0..1852edaa9e748 100644 --- 
a/pandas/tests/extension/base/methods.py +++ b/pandas/tests/extension/base/methods.py @@ -240,7 +240,6 @@ def test_shift_fill_value(self, data): expected = data.take([2, 3, 0, 0]) self.assert_extension_array_equal(result, expected) - @pytest.mark.parametrize("as_frame", [True, False]) def test_hash_pandas_object_works(self, data, as_frame): # https://github.com/pandas-dev/pandas/issues/23066 data = pd.Series(data) @@ -250,7 +249,6 @@ def test_hash_pandas_object_works(self, data, as_frame): b = pd.util.hash_pandas_object(data) self.assert_equal(a, b) - @pytest.mark.parametrize("as_series", [True, False]) def test_searchsorted(self, data_for_sorting, as_series): b, c, a = data_for_sorting arr = type(data_for_sorting)._from_sequence([a, b, c]) @@ -275,7 +273,6 @@ def test_searchsorted(self, data_for_sorting, as_series): sorter = np.array([1, 2, 0]) assert data_for_sorting.searchsorted(a, sorter=sorter) == 0 - @pytest.mark.parametrize("as_frame", [True, False]) def test_where_series(self, data, na_value, as_frame): assert data[0] != data[1] cls = type(data) @@ -309,8 +306,6 @@ def test_where_series(self, data, na_value, as_frame): expected = expected.to_frame(name='a') self.assert_equal(result, expected) - @pytest.mark.parametrize("use_numpy", [True, False]) - @pytest.mark.parametrize("as_series", [True, False]) @pytest.mark.parametrize("repeats", [0, 1, 2, [1, 2, 3]]) def test_repeat(self, data, repeats, as_series, use_numpy): arr = type(data)._from_sequence(data[:3], dtype=data.dtype) @@ -327,7 +322,6 @@ def test_repeat(self, data, repeats, as_series, use_numpy): self.assert_equal(result, expected) - @pytest.mark.parametrize("use_numpy", [True, False]) @pytest.mark.parametrize('repeats, kwargs, error, msg', [ (2, dict(axis=1), ValueError, "'axis"), (-1, dict(), ValueError, "negative"), diff --git a/pandas/tests/extension/base/missing.py b/pandas/tests/extension/base/missing.py index 2fe547e50a34b..834f49f0461f0 100644 --- a/pandas/tests/extension/base/missing.py +++ b/pandas/tests/extension/base/missing.py @@ -1,5 +1,4 @@ import numpy as np -import pytest import pandas as pd import pandas.util.testing as tm @@ -89,14 +88,13 @@ def test_fillna_series(self, data_missing): result = ser.fillna(ser) self.assert_series_equal(result, ser) - @pytest.mark.parametrize('method', ['ffill', 'bfill']) - def test_fillna_series_method(self, data_missing, method): + def test_fillna_series_method(self, data_missing, fillna_method): fill_value = data_missing[1] - if method == 'ffill': + if fillna_method == 'ffill': data_missing = data_missing[::-1] - result = pd.Series(data_missing).fillna(method=method) + result = pd.Series(data_missing).fillna(method=fillna_method) expected = pd.Series(data_missing._from_sequence( [fill_value, fill_value], dtype=data_missing.dtype)) diff --git a/pandas/tests/extension/base/setitem.py b/pandas/tests/extension/base/setitem.py index 42fda982f7339..db6328e39e6cc 100644 --- a/pandas/tests/extension/base/setitem.py +++ b/pandas/tests/extension/base/setitem.py @@ -24,7 +24,6 @@ def test_setitem_sequence(self, data, box_in_series): assert data[0] == original[1] assert data[1] == original[0] - @pytest.mark.parametrize('as_array', [True, False]) def test_setitem_sequence_mismatched_length_raises(self, data, as_array): ser = pd.Series(data) original = ser.copy() diff --git a/pandas/tests/extension/conftest.py b/pandas/tests/extension/conftest.py index 5349dd919f2a2..3cc2d313b09f5 100644 --- a/pandas/tests/extension/conftest.py +++ b/pandas/tests/extension/conftest.py @@ -2,6 +2,8 @@ 
import pytest +from pandas import Series + @pytest.fixture def dtype(): @@ -108,3 +110,58 @@ def data_for_grouping(): def box_in_series(request): """Whether to box the data in a Series""" return request.param + + +@pytest.fixture(params=[ + lambda x: 1, + lambda x: [1] * len(x), + lambda x: Series([1] * len(x)), + lambda x: x, +], ids=['scalar', 'list', 'series', 'object']) +def groupby_apply_op(request): + """ + Functions to test groupby.apply(). + """ + return request.param + + +@pytest.fixture(params=[True, False]) +def as_frame(request): + """ + Boolean fixture to support Series and Series.to_frame() comparison testing. + """ + return request.param + + +@pytest.fixture(params=[True, False]) +def as_series(request): + """ + Boolean fixture to support arr and Series(arr) comparison testing. + """ + return request.param + + +@pytest.fixture(params=[True, False]) +def use_numpy(request): + """ + Boolean fixture to support comparison testing of ExtensionDtype array + and numpy array. + """ + return request.param + + +@pytest.fixture(params=['ffill', 'bfill']) +def fillna_method(request): + """ + Parametrized fixture giving method parameters 'ffill' and 'bfill' for + Series.fillna(method=) testing. + """ + return request.param + + +@pytest.fixture(params=[True, False]) +def as_array(request): + """ + Boolean fixture to support ExtensionDtype _from_sequence method testing. + """ + return request.param diff --git a/pandas/tests/extension/test_numpy.py b/pandas/tests/extension/test_numpy.py index 7ca6882c7441b..41f5beb8c885d 100644 --- a/pandas/tests/extension/test_numpy.py +++ b/pandas/tests/extension/test_numpy.py @@ -1,6 +1,8 @@ import numpy as np import pytest +from pandas.compat.numpy import _np_version_under1p16 + import pandas as pd from pandas import compat from pandas.core.arrays.numpy_ import PandasArray, PandasDtype @@ -9,9 +11,9 @@ from . import base -@pytest.fixture -def dtype(): - return PandasDtype(np.dtype('float')) +@pytest.fixture(params=['float', 'object']) +def dtype(request): + return PandasDtype(np.dtype(request.param)) @pytest.fixture @@ -38,11 +40,19 @@ def allow_in_pandas(monkeypatch): @pytest.fixture def data(allow_in_pandas, dtype): + if dtype.numpy_dtype == 'object': + return pd.Series([(i,) for i in range(100)]).array return PandasArray(np.arange(1, 101, dtype=dtype._dtype)) @pytest.fixture -def data_missing(allow_in_pandas): +def data_missing(allow_in_pandas, dtype): + # For NumPy <1.16, np.array([np.nan, (1,)]) raises + # ValueError: setting an array element with a sequence. + if dtype.numpy_dtype == 'object': + if _np_version_under1p16: + raise pytest.skip("Skipping for NumPy <1.16") + return PandasArray(np.array([np.nan, (1,)])) return PandasArray(np.array([np.nan, 1.0])) @@ -59,49 +69,84 @@ def cmp(a, b): @pytest.fixture -def data_for_sorting(allow_in_pandas): +def data_for_sorting(allow_in_pandas, dtype): """Length-3 array with a known sort order. This should be three items [B, C, A] with A < B < C """ + if dtype.numpy_dtype == 'object': + # Use an empty tuple for first element, then remove, + # to disable np.array's shape inference. + return PandasArray( + np.array([(), (2,), (3,), (1,)])[1:] + ) return PandasArray( np.array([1, 2, 0]) ) @pytest.fixture -def data_missing_for_sorting(allow_in_pandas): +def data_missing_for_sorting(allow_in_pandas, dtype): """Length-3 array with a known sort order. This should be three items [B, NA, A] with A < B and NA missing. 
""" + if dtype.numpy_dtype == 'object': + return PandasArray( + np.array([(1,), np.nan, (0,)]) + ) return PandasArray( np.array([1, np.nan, 0]) ) @pytest.fixture -def data_for_grouping(allow_in_pandas): +def data_for_grouping(allow_in_pandas, dtype): """Data for factorization, grouping, and unique tests. Expected to be like [B, B, NA, NA, A, A, B, C] Where A < B < C and NA is missing """ - a, b, c = np.arange(3) + if dtype.numpy_dtype == 'object': + a, b, c = (1,), (2,), (3,) + else: + a, b, c = np.arange(3) return PandasArray(np.array( [b, b, np.nan, np.nan, a, a, b, c] )) +@pytest.fixture +def skip_numpy_object(dtype): + """ + Tests for PandasArray with nested data. Users typically won't create + these objects via `pd.array`, but they can show up through `.array` + on a Series with nested data. Many of the base tests fail, as they aren't + appropriate for nested data. + + This fixture allows these tests to be skipped when used as a usefixtures + marker to either an individual test or a test class. + """ + if dtype == 'object': + raise pytest.skip("Skipping for object dtype.") + + +skip_nested = pytest.mark.usefixtures('skip_numpy_object') + + class BaseNumPyTests(object): pass class TestCasting(BaseNumPyTests, base.BaseCastingTests): - pass + + @skip_nested + def test_astype_str(self, data): + # ValueError: setting an array element with a sequence + super(TestCasting, self).test_astype_str(data) class TestConstructors(BaseNumPyTests, base.BaseConstructorsTests): @@ -110,6 +155,11 @@ class TestConstructors(BaseNumPyTests, base.BaseConstructorsTests): def test_from_dtype(self, data): pass + @skip_nested + def test_array_from_scalars(self, data): + # ValueError: PandasArray must be 1-dimensional. + super(TestConstructors, self).test_array_from_scalars(data) + class TestDtype(BaseNumPyTests, base.BaseDtypeTests): @@ -120,15 +170,32 @@ def test_check_dtype(self, data): class TestGetitem(BaseNumPyTests, base.BaseGetitemTests): - pass + + @skip_nested + def test_getitem_scalar(self, data): + # AssertionError + super(TestGetitem, self).test_getitem_scalar(data) + + @skip_nested + def test_take_series(self, data): + # ValueError: PandasArray must be 1-dimensional. + super(TestGetitem, self).test_take_series(data) class TestGroupby(BaseNumPyTests, base.BaseGroupbyTests): - pass + @skip_nested + def test_groupby_extension_apply( + self, data_for_grouping, groupby_apply_op): + # ValueError: Names should be list-like for a MultiIndex + super(TestGroupby, self).test_groupby_extension_apply( + data_for_grouping, groupby_apply_op) class TestInterface(BaseNumPyTests, base.BaseInterfaceTests): - pass + @skip_nested + def test_array_interface(self, data): + # NumPy array shape inference + super(TestInterface, self).test_array_interface(data) class TestMethods(BaseNumPyTests, base.BaseMethodsTests): @@ -143,7 +210,57 @@ def test_value_counts(self, all_data, dropna): def test_combine_le(self, data_repeated): super(TestMethods, self).test_combine_le(data_repeated) - + @skip_nested + def test_combine_add(self, data_repeated): + # Not numeric + super(TestMethods, self).test_combine_add(data_repeated) + + @skip_nested + def test_shift_fill_value(self, data): + # np.array shape inference. Shift implementation fails. 
+ super(TestMethods, self).test_shift_fill_value(data) + + @skip_nested + @pytest.mark.parametrize('box', [pd.Series, lambda x: x]) + @pytest.mark.parametrize('method', [lambda x: x.unique(), pd.unique]) + def test_unique(self, data, box, method): + # Fails creating expected + super(TestMethods, self).test_unique(data, box, method) + + @skip_nested + def test_fillna_copy_frame(self, data_missing): + # The "scalar" for this array isn't a scalar. + super(TestMethods, self).test_fillna_copy_frame(data_missing) + + @skip_nested + def test_fillna_copy_series(self, data_missing): + # The "scalar" for this array isn't a scalar. + super(TestMethods, self).test_fillna_copy_series(data_missing) + + @skip_nested + def test_hash_pandas_object_works(self, data, as_frame): + # ndarray of tuples not hashable + super(TestMethods, self).test_hash_pandas_object_works(data, as_frame) + + @skip_nested + def test_searchsorted(self, data_for_sorting, as_series): + # Test setup fails. + super(TestMethods, self).test_searchsorted(data_for_sorting, as_series) + + @skip_nested + def test_where_series(self, data, na_value, as_frame): + # Test setup fails. + super(TestMethods, self).test_where_series(data, na_value, as_frame) + + @skip_nested + @pytest.mark.parametrize("repeats", [0, 1, 2, [1, 2, 3]]) + def test_repeat(self, data, repeats, as_series, use_numpy): + # Fails creating expected + super(TestMethods, self).test_repeat( + data, repeats, as_series, use_numpy) + + +@skip_nested class TestArithmetics(BaseNumPyTests, base.BaseArithmeticOpsTests): divmod_exc = None series_scalar_exc = None @@ -183,6 +300,7 @@ class TestPrinting(BaseNumPyTests, base.BasePrintingTests): pass +@skip_nested class TestNumericReduce(BaseNumPyTests, base.BaseNumericReduceTests): def check_reduce(self, s, op_name, skipna): @@ -192,12 +310,33 @@ def check_reduce(self, s, op_name, skipna): tm.assert_almost_equal(result, expected) +@skip_nested class TestBooleanReduce(BaseNumPyTests, base.BaseBooleanReduceTests): pass -class TestMising(BaseNumPyTests, base.BaseMissingTests): - pass +class TestMissing(BaseNumPyTests, base.BaseMissingTests): + + @skip_nested + def test_fillna_scalar(self, data_missing): + # Non-scalar "scalar" values. + super(TestMissing, self).test_fillna_scalar(data_missing) + + @skip_nested + def test_fillna_series_method(self, data_missing, fillna_method): + # Non-scalar "scalar" values. + super(TestMissing, self).test_fillna_series_method( + data_missing, fillna_method) + + @skip_nested + def test_fillna_series(self, data_missing): + # Non-scalar "scalar" values. + super(TestMissing, self).test_fillna_series(data_missing) + + @skip_nested + def test_fillna_frame(self, data_missing): + # Non-scalar "scalar" values. 
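+        # (here data_missing[1] is a tuple, which fillna treats as
+        # list-like rather than as a single fill value)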
+ super(TestMissing, self).test_fillna_frame(data_missing) class TestReshaping(BaseNumPyTests, base.BaseReshapingTests): @@ -207,10 +346,85 @@ class TestReshaping(BaseNumPyTests, base.BaseReshapingTests): def test_concat_mixed_dtypes(self, data): super(TestReshaping, self).test_concat_mixed_dtypes(data) + @skip_nested + def test_merge(self, data, na_value): + # Fails creating expected + super(TestReshaping, self).test_merge(data, na_value) -class TestSetitem(BaseNumPyTests, base.BaseSetitemTests): - pass + @skip_nested + def test_merge_on_extension_array(self, data): + # Fails creating expected + super(TestReshaping, self).test_merge_on_extension_array(data) + @skip_nested + def test_merge_on_extension_array_duplicates(self, data): + # Fails creating expected + super(TestReshaping, self).test_merge_on_extension_array_duplicates( + data) + + +class TestSetitem(BaseNumPyTests, base.BaseSetitemTests): + @skip_nested + def test_setitem_scalar_series(self, data, box_in_series): + # AssertionError + super(TestSetitem, self).test_setitem_scalar_series( + data, box_in_series) + + @skip_nested + def test_setitem_sequence(self, data, box_in_series): + # ValueError: shape mismatch: value array of shape (2,1) could not + # be broadcast to indexing result of shape (2,) + super(TestSetitem, self).test_setitem_sequence(data, box_in_series) + + @skip_nested + def test_setitem_sequence_mismatched_length_raises(self, data, as_array): + # ValueError: PandasArray must be 1-dimensional. + (super(TestSetitem, self). + test_setitem_sequence_mismatched_length_raises(data, as_array)) + + @skip_nested + def test_setitem_sequence_broadcasts(self, data, box_in_series): + # ValueError: cannot set using a list-like indexer with a different + # length than the value + super(TestSetitem, self).test_setitem_sequence_broadcasts( + data, box_in_series) + + @skip_nested + def test_setitem_loc_scalar_mixed(self, data): + # AssertionError + super(TestSetitem, self).test_setitem_loc_scalar_mixed(data) + + @skip_nested + def test_setitem_loc_scalar_multiple_homogoneous(self, data): + # AssertionError + super(TestSetitem, self).test_setitem_loc_scalar_multiple_homogoneous( + data) + + @skip_nested + def test_setitem_iloc_scalar_mixed(self, data): + # AssertionError + super(TestSetitem, self).test_setitem_iloc_scalar_mixed(data) + + @skip_nested + def test_setitem_iloc_scalar_multiple_homogoneous(self, data): + # AssertionError + super(TestSetitem, self).test_setitem_iloc_scalar_multiple_homogoneous( + data) + + @skip_nested + @pytest.mark.parametrize('setter', ['loc', None]) + def test_setitem_mask_broadcast(self, data, setter): + # ValueError: cannot set using a list-like indexer with a different + # length than the value + super(TestSetitem, self).test_setitem_mask_broadcast(data, setter) + + @skip_nested + def test_setitem_scalar_key_sequence_raise(self, data): + # Failed: DID NOT RAISE + super(TestSetitem, self).test_setitem_scalar_key_sequence_raise(data) + + +@skip_nested class TestParsing(BaseNumPyTests, base.BaseParsingTests): pass diff --git a/pandas/tests/extension/test_sparse.py b/pandas/tests/extension/test_sparse.py index 21dbf9524961c..146dea2b65d83 100644 --- a/pandas/tests/extension/test_sparse.py +++ b/pandas/tests/extension/test_sparse.py @@ -287,11 +287,10 @@ def test_combine_first(self, data): pytest.skip("TODO(SparseArray.__setitem__ will preserve dtype.") super(TestMethods, self).test_combine_first(data) - @pytest.mark.parametrize("as_series", [True, False]) def test_searchsorted(self, data_for_sorting, 
as_series): with tm.assert_produces_warning(PerformanceWarning): super(TestMethods, self).test_searchsorted(data_for_sorting, - as_series=as_series) + as_series) class TestCasting(BaseSparseTests, base.BaseCastingTests): diff --git a/pandas/tests/frame/common.py b/pandas/tests/frame/common.py index 2ea087c0510bf..5624f7c1303b6 100644 --- a/pandas/tests/frame/common.py +++ b/pandas/tests/frame/common.py @@ -85,7 +85,7 @@ def tzframe(self): @cache_readonly def empty(self): - return pd.DataFrame({}) + return pd.DataFrame() @cache_readonly def ts1(self): diff --git a/pandas/tests/frame/conftest.py b/pandas/tests/frame/conftest.py index 377e737a53158..fbe03325a3ad9 100644 --- a/pandas/tests/frame/conftest.py +++ b/pandas/tests/frame/conftest.py @@ -29,16 +29,6 @@ def float_frame_with_na(): return df -@pytest.fixture -def float_frame2(): - """ - Fixture for DataFrame of floats with index of unique strings - - Columns are ['D', 'C', 'B', 'A'] - """ - return DataFrame(tm.getSeriesData(), columns=['D', 'C', 'B', 'A']) - - @pytest.fixture def bool_frame_with_na(): """ @@ -104,21 +94,6 @@ def mixed_float_frame(): return df -@pytest.fixture -def mixed_float_frame2(): - """ - Fixture for DataFrame of different float types with index of unique strings - - Columns are ['A', 'B', 'C', 'D']. - """ - df = DataFrame(tm.getSeriesData()) - df.D = df.D.astype('float32') - df.C = df.C.astype('float32') - df.B = df.B.astype('float16') - df.D = df.D.astype('float64') - return df - - @pytest.fixture def mixed_int_frame(): """ @@ -135,19 +110,6 @@ def mixed_int_frame(): return df -@pytest.fixture -def mixed_type_frame(): - """ - Fixture for DataFrame of float/int/string columns with RangeIndex - - Columns are ['a', 'b', 'c', 'float32', 'int32']. - """ - return DataFrame({'a': 1., 'b': 2, 'c': 'foo', - 'float32': np.array([1.] 
* 10, dtype='float32'), - 'int32': np.array([1] * 10, dtype='int32')}, - index=np.arange(10)) - - @pytest.fixture def timezone_frame(): """ @@ -165,30 +127,6 @@ def timezone_frame(): return df -@pytest.fixture -def empty_frame(): - """ - Fixture for empty DataFrame - """ - return DataFrame({}) - - -@pytest.fixture -def datetime_series(): - """ - Fixture for Series of floats with DatetimeIndex - """ - return tm.makeTimeSeries(nper=30) - - -@pytest.fixture -def datetime_series_short(): - """ - Fixture for Series of floats with DatetimeIndex - """ - return tm.makeTimeSeries(nper=30)[5:] - - @pytest.fixture def simple_frame(): """ diff --git a/pandas/tests/frame/test_alter_axes.py b/pandas/tests/frame/test_alter_axes.py index c2355742199dc..f4a2a5f8032a0 100644 --- a/pandas/tests/frame/test_alter_axes.py +++ b/pandas/tests/frame/test_alter_axes.py @@ -178,10 +178,10 @@ def test_set_index_pass_arrays(self, frame_of_index_cols, # MultiIndex constructor does not work directly on Series -> lambda # We also emulate a "constructor" for the label -> lambda # also test index name if append=True (name is duplicate here for A) - @pytest.mark.parametrize('box2', [Series, Index, np.array, list, + @pytest.mark.parametrize('box2', [Series, Index, np.array, list, iter, lambda x: MultiIndex.from_arrays([x]), lambda x: x.name]) - @pytest.mark.parametrize('box1', [Series, Index, np.array, list, + @pytest.mark.parametrize('box1', [Series, Index, np.array, list, iter, lambda x: MultiIndex.from_arrays([x]), lambda x: x.name]) @pytest.mark.parametrize('append, index_name', [(True, None), @@ -195,6 +195,9 @@ def test_set_index_pass_arrays_duplicate(self, frame_of_index_cols, drop, keys = [box1(df['A']), box2(df['A'])] result = df.set_index(keys, drop=drop, append=append) + # if either box is iter, it has been consumed; re-read + keys = [box1(df['A']), box2(df['A'])] + # need to adapt first drop for case that both keys are 'A' -- # cannot drop the same column twice; # use "is" because == would give ambiguous Boolean error for containers @@ -255,21 +258,150 @@ def test_set_index_raise_keys(self, frame_of_index_cols, drop, append): @pytest.mark.parametrize('append', [True, False]) @pytest.mark.parametrize('drop', [True, False]) - @pytest.mark.parametrize('box', [set, iter]) + @pytest.mark.parametrize('box', [set], ids=['set']) def test_set_index_raise_on_type(self, frame_of_index_cols, box, drop, append): df = frame_of_index_cols msg = 'The parameter "keys" may be a column key, .*' - # forbidden type, e.g. set/tuple/iter - with pytest.raises(ValueError, match=msg): + # forbidden type, e.g. set + with pytest.raises(TypeError, match=msg): df.set_index(box(df['A']), drop=drop, append=append) - # forbidden type in list, e.g. set/tuple/iter - with pytest.raises(ValueError, match=msg): + # forbidden type in list, e.g. 
set
+        with pytest.raises(TypeError, match=msg):
             df.set_index(['A', df['A'], box(df['A'])],
                          drop=drop, append=append)
 
+    # MultiIndex constructor does not work directly on Series -> lambda
+    @pytest.mark.parametrize('box', [Series, Index, np.array, iter,
+                                     lambda x: MultiIndex.from_arrays([x])],
+                             ids=['Series', 'Index', 'np.array',
+                                  'iter', 'MultiIndex'])
+    @pytest.mark.parametrize('length', [4, 6], ids=['too_short', 'too_long'])
+    @pytest.mark.parametrize('append', [True, False])
+    @pytest.mark.parametrize('drop', [True, False])
+    def test_set_index_raise_on_len(self, frame_of_index_cols, box, length,
+                                    drop, append):
+        # GH 24984
+        df = frame_of_index_cols  # has length 5
+
+        values = np.random.randint(0, 10, (length,))
+
+        msg = 'Length mismatch: Expected 5 rows, received array of length.*'
+
+        # wrong length directly
+        with pytest.raises(ValueError, match=msg):
+            df.set_index(box(values), drop=drop, append=append)
+
+        # wrong length in list
+        with pytest.raises(ValueError, match=msg):
+            df.set_index(['A', df.A, box(values)], drop=drop, append=append)
+
+    def test_set_index_custom_label_type(self):
+        # GH 24969
+
+        class Thing(object):
+            def __init__(self, name, color):
+                self.name = name
+                self.color = color
+
+            def __str__(self):
+                return "<Thing %r>" % (self.name,)
+
+            # necessary for pretty KeyError
+            __repr__ = __str__
+
+        thing1 = Thing('One', 'red')
+        thing2 = Thing('Two', 'blue')
+        df = DataFrame({thing1: [0, 1], thing2: [2, 3]})
+        expected = DataFrame({thing1: [0, 1]},
+                             index=Index([2, 3], name=thing2))
+
+        # use custom label directly
+        result = df.set_index(thing2)
+        tm.assert_frame_equal(result, expected)
+
+        # custom label wrapped in list
+        result = df.set_index([thing2])
+        tm.assert_frame_equal(result, expected)
+
+        # missing key
+        thing3 = Thing('Three', 'pink')
+        msg = "<Thing 'Three'>"
+        with pytest.raises(KeyError, match=msg):
+            # missing label directly
+            df.set_index(thing3)
+
+        with pytest.raises(KeyError, match=msg):
+            # missing label in list
+            df.set_index([thing3])
+
+    def test_set_index_custom_label_hashable_iterable(self):
+        # GH 24969
+
+        # actual example discussed in GH 24984 was e.g. for shapely.geometry
+        # objects (e.g. a collection of Points) that can be both hashable and
+        # iterable; using frozenset as a stand-in for testing here
+
+        class Thing(frozenset):
+            # need to stabilize repr for KeyError (due to random order in sets)
+            def __repr__(self):
+                tmp = sorted(list(self))
+                # double curly brace prints one brace in format string
+                return "frozenset({{{}}})".format(', '.join(map(repr, tmp)))
+
+        thing1 = Thing(['One', 'red'])
+        thing2 = Thing(['Two', 'blue'])
+        df = DataFrame({thing1: [0, 1], thing2: [2, 3]})
+        expected = DataFrame({thing1: [0, 1]},
+                             index=Index([2, 3], name=thing2))
+
+        # use custom label directly
+        result = df.set_index(thing2)
+        tm.assert_frame_equal(result, expected)
+
+        # custom label wrapped in list
+        result = df.set_index([thing2])
+        tm.assert_frame_equal(result, expected)
+
+        # missing key
+        thing3 = Thing(['Three', 'pink'])
+        msg = r"frozenset\(\{'Three', 'pink'\}\)"
+        with pytest.raises(KeyError, match=msg):
+            # missing label directly
+            df.set_index(thing3)
+
+        with pytest.raises(KeyError, match=msg):
+            # missing label in list
+            df.set_index([thing3])
+
+    def test_set_index_custom_label_type_raises(self):
+        # GH 24969
+
+        # purposefully inherit from something unhashable
+        class Thing(set):
+            def __init__(self, name, color):
+                self.name = name
+                self.color = color
+
+            def __str__(self):
+                return "<Thing %r>" % (self.name,)
+
+        thing1 = Thing('One', 'red')
+        thing2 = Thing('Two', 'blue')
+        df = DataFrame([[0, 2], [1, 3]], columns=[thing1, thing2])
+
+        msg = 'The parameter "keys" may be a column key, .*'
+
+        with pytest.raises(TypeError, match=msg):
+            # use custom label directly
+            df.set_index(thing2)
+
+        with pytest.raises(TypeError, match=msg):
+            # custom label wrapped in list
+            df.set_index([thing2])
+
     def test_construction_with_categorical_index(self):
         ci = tm.makeCategoricalIndex(10)
         ci.name = 'B'
@@ -501,7 +633,8 @@ def test_rename(self, float_frame):
         tm.assert_index_equal(renamed.index, Index(['BAR', 'FOO']))
 
         # have to pass something
-        pytest.raises(TypeError, float_frame.rename)
+        with pytest.raises(TypeError, match="must pass an index to rename"):
+            float_frame.rename()
 
         # partial columns
         renamed = float_frame.rename(columns={'C': 'foo', 'D': 'bar'})
@@ -600,6 +733,26 @@ def test_rename_axis_mapper(self):
         with pytest.raises(TypeError, match='bogus'):
             df.rename_axis(bogus=None)
 
+    @pytest.mark.parametrize('kwargs, rename_index, rename_columns', [
+        ({'mapper': None, 'axis': 0}, True, False),
+        ({'mapper': None, 'axis': 1}, False, True),
+        ({'index': None}, True, False),
+        ({'columns': None}, False, True),
+        ({'index': None, 'columns': None}, True, True),
+        ({}, False, False)])
+    def test_rename_axis_none(self, kwargs, rename_index, rename_columns):
+        # GH 25034
+        index = Index(list('abc'), name='foo')
+        columns = Index(['col1', 'col2'], name='bar')
+        data = np.arange(6).reshape(3, 2)
+        df = DataFrame(data, index, columns)
+
+        result = df.rename_axis(**kwargs)
+        expected_index = index.rename(None) if rename_index else index
+        expected_columns = columns.rename(None) if rename_columns else columns
+        expected = DataFrame(data, expected_index, expected_columns)
+        tm.assert_frame_equal(result, expected)
+
     def test_rename_multiindex(self):
         tuples_index = [('foo1', 'bar1'), ('foo2', 'bar2')]
diff --git a/pandas/tests/frame/test_analytics.py b/pandas/tests/frame/test_analytics.py
index 386e5f57617cf..3363a45149fff 100644
--- a/pandas/tests/frame/test_analytics.py
+++ b/pandas/tests/frame/test_analytics.py
@@ -8,7 +8,7 @@
 import numpy as np
 import pytest
 
-from pandas.compat import PY35,
lrange +from pandas.compat import PY2, PY35, is_platform_windows, lrange import pandas.util._test_decorators as td import pandas as pd @@ -231,9 +231,9 @@ def assert_bool_op_api(opname, bool_frame_with_na, float_string_frame, getattr(bool_frame_with_na, opname)(axis=1, bool_only=False) -class TestDataFrameAnalytics(): +class TestDataFrameAnalytics(object): - # ---------------------------------------------------------------------= + # --------------------------------------------------------------------- # Correlation and covariance @td.skip_if_no_scipy @@ -502,6 +502,9 @@ def test_corrwith_kendall(self): expected = Series(np.ones(len(result))) tm.assert_series_equal(result, expected) + # --------------------------------------------------------------------- + # Describe + def test_bool_describe_in_mixed_frame(self): df = DataFrame({ 'string_data': ['a', 'b', 'c', 'd', 'e'], @@ -693,82 +696,113 @@ def test_describe_tz_values(self, tz_naive_fixture): result = df.describe(include='all') tm.assert_frame_equal(result, expected) - def test_reduce_mixed_frame(self): - # GH 6806 - df = DataFrame({ - 'bool_data': [True, True, False, False, False], - 'int_data': [10, 20, 30, 40, 50], - 'string_data': ['a', 'b', 'c', 'd', 'e'], - }) - df.reindex(columns=['bool_data', 'int_data', 'string_data']) - test = df.sum(axis=0) - tm.assert_numpy_array_equal(test.values, - np.array([2, 150, 'abcde'], dtype=object)) - tm.assert_series_equal(test, df.T.sum(axis=1)) + # --------------------------------------------------------------------- + # Reductions - def test_count(self, float_frame_with_na, float_frame, float_string_frame): - f = lambda s: notna(s).sum() - assert_stat_op_calc('count', f, float_frame_with_na, has_skipna=False, - check_dtype=False, check_dates=True) + def test_stat_op_api(self, float_frame, float_string_frame): assert_stat_op_api('count', float_frame, float_string_frame, has_numeric_only=True) + assert_stat_op_api('sum', float_frame, float_string_frame, + has_numeric_only=True) - # corner case - frame = DataFrame() - ct1 = frame.count(1) - assert isinstance(ct1, Series) + assert_stat_op_api('nunique', float_frame, float_string_frame) + assert_stat_op_api('mean', float_frame, float_string_frame) + assert_stat_op_api('product', float_frame, float_string_frame) + assert_stat_op_api('median', float_frame, float_string_frame) + assert_stat_op_api('min', float_frame, float_string_frame) + assert_stat_op_api('max', float_frame, float_string_frame) + assert_stat_op_api('mad', float_frame, float_string_frame) + assert_stat_op_api('var', float_frame, float_string_frame) + assert_stat_op_api('std', float_frame, float_string_frame) + assert_stat_op_api('sem', float_frame, float_string_frame) + assert_stat_op_api('median', float_frame, float_string_frame) - ct2 = frame.count(0) - assert isinstance(ct2, Series) + try: + from scipy.stats import skew, kurtosis # noqa:F401 + assert_stat_op_api('skew', float_frame, float_string_frame) + assert_stat_op_api('kurt', float_frame, float_string_frame) + except ImportError: + pass - # GH 423 - df = DataFrame(index=lrange(10)) - result = df.count(1) - expected = Series(0, index=df.index) - tm.assert_series_equal(result, expected) + def test_stat_op_calc(self, float_frame_with_na, mixed_float_frame): - df = DataFrame(columns=lrange(10)) - result = df.count(0) - expected = Series(0, index=df.columns) - tm.assert_series_equal(result, expected) + def count(s): + return notna(s).sum() - df = DataFrame() - result = df.count() - expected = Series(0, index=[]) - 
tm.assert_series_equal(result, expected)
+        def nunique(s):
+            return len(algorithms.unique1d(s.dropna()))
+
+        def mad(x):
+            return np.abs(x - x.mean()).mean()
 
-    def test_nunique(self, float_frame_with_na, float_frame,
-                     float_string_frame):
-        f = lambda s: len(algorithms.unique1d(s.dropna()))
-        assert_stat_op_calc('nunique', f, float_frame_with_na,
+        def var(x):
+            return np.var(x, ddof=1)
+
+        def std(x):
+            return np.std(x, ddof=1)
+
+        def sem(x):
+            return np.std(x, ddof=1) / np.sqrt(len(x))
+
+        def skewness(x):
+            from scipy.stats import skew  # noqa:F811
+            if len(x) < 3:
+                return np.nan
+            return skew(x, bias=False)
+
+        def kurt(x):
+            from scipy.stats import kurtosis  # noqa:F811
+            if len(x) < 4:
+                return np.nan
+            return kurtosis(x, bias=False)
+
+        assert_stat_op_calc('nunique', nunique, float_frame_with_na,
                             has_skipna=False, check_dtype=False,
                             check_dates=True)
-        assert_stat_op_api('nunique', float_frame, float_string_frame)
 
-        df = DataFrame({'A': [1, 1, 1],
-                        'B': [1, 2, 3],
-                        'C': [1, np.nan, 3]})
-        tm.assert_series_equal(df.nunique(), Series({'A': 1, 'B': 3, 'C': 2}))
-        tm.assert_series_equal(df.nunique(dropna=False),
-                               Series({'A': 1, 'B': 3, 'C': 3}))
-        tm.assert_series_equal(df.nunique(axis=1), Series({0: 1, 1: 2, 2: 2}))
-        tm.assert_series_equal(df.nunique(axis=1, dropna=False),
-                               Series({0: 1, 1: 3, 2: 2}))
-
-    def test_sum(self, float_frame_with_na, mixed_float_frame,
-                 float_frame, float_string_frame):
-        assert_stat_op_api('sum', float_frame, float_string_frame,
-                           has_numeric_only=True)
-        assert_stat_op_calc('sum', np.sum, float_frame_with_na,
-                            skipna_alternative=np.nansum)
         # mixed types (with upcasting happening)
         assert_stat_op_calc('sum', np.sum,
                             mixed_float_frame.astype('float32'),
                             check_dtype=False, check_less_precise=True)
 
+        assert_stat_op_calc('sum', np.sum, float_frame_with_na,
+                            skipna_alternative=np.nansum)
+        assert_stat_op_calc('mean', np.mean, float_frame_with_na,
+                            check_dates=True)
+        assert_stat_op_calc('product', np.prod, float_frame_with_na)
+
+        assert_stat_op_calc('mad', mad, float_frame_with_na)
+        assert_stat_op_calc('var', var, float_frame_with_na)
+        assert_stat_op_calc('std', std, float_frame_with_na)
+        assert_stat_op_calc('sem', sem, float_frame_with_na)
+
+        assert_stat_op_calc('count', count, float_frame_with_na,
+                            has_skipna=False, check_dtype=False,
+                            check_dates=True)
+
+        try:
+            from scipy.stats import skew, kurtosis  # noqa:F401
+            assert_stat_op_calc('skew', skewness, float_frame_with_na)
+            assert_stat_op_calc('kurt', kurt, float_frame_with_na)
+        except ImportError:
+            pass
+
+    # TODO: Ensure warning isn't emitted in the first place
+    @pytest.mark.filterwarnings("ignore:All-NaN:RuntimeWarning")
+    def test_median(self, float_frame_with_na, int_frame):
+        def wrapper(x):
+            if isna(x).any():
+                return np.nan
+            return np.median(x)
+
+        assert_stat_op_calc('median', wrapper, float_frame_with_na,
+                            check_dates=True)
+        assert_stat_op_calc('median', wrapper, int_frame, check_dtype=False,
+                            check_dates=True)
+
     @pytest.mark.parametrize('method', ['sum', 'mean', 'prod', 'var',
                                         'std', 'skew', 'min', 'max'])
     def test_stat_operators_attempt_obj_array(self, method):
-        # GH 676
+        # GH#676
         data = {
             'a': [-0.00049987540199591344, -0.0016467257772919831,
                   0.00067695870775883013],
@@ -789,10 +823,44 @@ def test_stat_operators_attempt_obj_array(self, method):
         if method in ['sum', 'prod']:
             tm.assert_series_equal(result, expected)
 
-    def test_mean(self, float_frame_with_na, float_frame, float_string_frame):
-        assert_stat_op_calc('mean', np.mean, float_frame_with_na,
-                            check_dates=True)
-
assert_stat_op_api('mean', float_frame, float_string_frame) + @pytest.mark.parametrize('op', ['mean', 'std', 'var', + 'skew', 'kurt', 'sem']) + def test_mixed_ops(self, op): + # GH#16116 + df = DataFrame({'int': [1, 2, 3, 4], + 'float': [1., 2., 3., 4.], + 'str': ['a', 'b', 'c', 'd']}) + + result = getattr(df, op)() + assert len(result) == 2 + + with pd.option_context('use_bottleneck', False): + result = getattr(df, op)() + assert len(result) == 2 + + def test_reduce_mixed_frame(self): + # GH 6806 + df = DataFrame({ + 'bool_data': [True, True, False, False, False], + 'int_data': [10, 20, 30, 40, 50], + 'string_data': ['a', 'b', 'c', 'd', 'e'], + }) + df.reindex(columns=['bool_data', 'int_data', 'string_data']) + test = df.sum(axis=0) + tm.assert_numpy_array_equal(test.values, + np.array([2, 150, 'abcde'], dtype=object)) + tm.assert_series_equal(test, df.T.sum(axis=1)) + + def test_nunique(self): + df = DataFrame({'A': [1, 1, 1], + 'B': [1, 2, 3], + 'C': [1, np.nan, 3]}) + tm.assert_series_equal(df.nunique(), Series({'A': 1, 'B': 3, 'C': 2})) + tm.assert_series_equal(df.nunique(dropna=False), + Series({'A': 1, 'B': 3, 'C': 3})) + tm.assert_series_equal(df.nunique(axis=1), Series({0: 1, 1: 2, 2: 2})) + tm.assert_series_equal(df.nunique(axis=1, dropna=False), + Series({0: 1, 1: 3, 2: 2})) @pytest.mark.parametrize('tz', [None, 'UTC']) def test_mean_mixed_datetime_numeric(self, tz): @@ -813,103 +881,7 @@ def test_mean_excludeds_datetimes(self, tz): expected = pd.Series() tm.assert_series_equal(result, expected) - def test_product(self, float_frame_with_na, float_frame, - float_string_frame): - assert_stat_op_calc('product', np.prod, float_frame_with_na) - assert_stat_op_api('product', float_frame, float_string_frame) - - # TODO: Ensure warning isn't emitted in the first place - @pytest.mark.filterwarnings("ignore:All-NaN:RuntimeWarning") - def test_median(self, float_frame_with_na, float_frame, - float_string_frame): - def wrapper(x): - if isna(x).any(): - return np.nan - return np.median(x) - - assert_stat_op_calc('median', wrapper, float_frame_with_na, - check_dates=True) - assert_stat_op_api('median', float_frame, float_string_frame) - - def test_min(self, float_frame_with_na, int_frame, - float_frame, float_string_frame): - with warnings.catch_warnings(record=True): - warnings.simplefilter("ignore", RuntimeWarning) - assert_stat_op_calc('min', np.min, float_frame_with_na, - check_dates=True) - assert_stat_op_calc('min', np.min, int_frame) - assert_stat_op_api('min', float_frame, float_string_frame) - - def test_cummin(self, datetime_frame): - datetime_frame.loc[5:10, 0] = np.nan - datetime_frame.loc[10:15, 1] = np.nan - datetime_frame.loc[15:, 2] = np.nan - - # axis = 0 - cummin = datetime_frame.cummin() - expected = datetime_frame.apply(Series.cummin) - tm.assert_frame_equal(cummin, expected) - - # axis = 1 - cummin = datetime_frame.cummin(axis=1) - expected = datetime_frame.apply(Series.cummin, axis=1) - tm.assert_frame_equal(cummin, expected) - - # it works - df = DataFrame({'A': np.arange(20)}, index=np.arange(20)) - result = df.cummin() # noqa - - # fix issue - cummin_xs = datetime_frame.cummin(axis=1) - assert np.shape(cummin_xs) == np.shape(datetime_frame) - - def test_cummax(self, datetime_frame): - datetime_frame.loc[5:10, 0] = np.nan - datetime_frame.loc[10:15, 1] = np.nan - datetime_frame.loc[15:, 2] = np.nan - - # axis = 0 - cummax = datetime_frame.cummax() - expected = datetime_frame.apply(Series.cummax) - tm.assert_frame_equal(cummax, expected) - - # axis = 1 - cummax = 
datetime_frame.cummax(axis=1) - expected = datetime_frame.apply(Series.cummax, axis=1) - tm.assert_frame_equal(cummax, expected) - - # it works - df = DataFrame({'A': np.arange(20)}, index=np.arange(20)) - result = df.cummax() # noqa - - # fix issue - cummax_xs = datetime_frame.cummax(axis=1) - assert np.shape(cummax_xs) == np.shape(datetime_frame) - - def test_max(self, float_frame_with_na, int_frame, - float_frame, float_string_frame): - with warnings.catch_warnings(record=True): - warnings.simplefilter("ignore", RuntimeWarning) - assert_stat_op_calc('max', np.max, float_frame_with_na, - check_dates=True) - assert_stat_op_calc('max', np.max, int_frame) - assert_stat_op_api('max', float_frame, float_string_frame) - - def test_mad(self, float_frame_with_na, float_frame, float_string_frame): - f = lambda x: np.abs(x - x.mean()).mean() - assert_stat_op_calc('mad', f, float_frame_with_na) - assert_stat_op_api('mad', float_frame, float_string_frame) - - def test_var_std(self, float_frame_with_na, datetime_frame, float_frame, - float_string_frame): - alt = lambda x: np.var(x, ddof=1) - assert_stat_op_calc('var', alt, float_frame_with_na) - assert_stat_op_api('var', float_frame, float_string_frame) - - alt = lambda x: np.std(x, ddof=1) - assert_stat_op_calc('std', alt, float_frame_with_na) - assert_stat_op_api('std', float_frame, float_string_frame) - + def test_var_std(self, datetime_frame): result = datetime_frame.std(ddof=4) expected = datetime_frame.apply(lambda x: x.std(ddof=4)) tm.assert_almost_equal(result, expected) @@ -926,6 +898,7 @@ def test_var_std(self, float_frame_with_na, datetime_frame, float_frame, result = nanops.nanvar(arr, axis=0) assert not (result < 0).any() + @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails") @pytest.mark.parametrize( "meth", ['sem', 'var', 'std']) def test_numeric_only_flag(self, meth): @@ -947,84 +920,14 @@ def test_numeric_only_flag(self, meth): tm.assert_series_equal(expected, result) # df1 has all numbers, df2 has a letter inside - pytest.raises(TypeError, lambda: getattr(df1, meth)( - axis=1, numeric_only=False)) - pytest.raises(TypeError, lambda: getattr(df2, meth)( - axis=1, numeric_only=False)) - - @pytest.mark.parametrize('op', ['mean', 'std', 'var', - 'skew', 'kurt', 'sem']) - def test_mixed_ops(self, op): - # GH 16116 - df = DataFrame({'int': [1, 2, 3, 4], - 'float': [1., 2., 3., 4.], - 'str': ['a', 'b', 'c', 'd']}) - - result = getattr(df, op)() - assert len(result) == 2 - - with pd.option_context('use_bottleneck', False): - result = getattr(df, op)() - assert len(result) == 2 - - def test_cumsum(self, datetime_frame): - datetime_frame.loc[5:10, 0] = np.nan - datetime_frame.loc[10:15, 1] = np.nan - datetime_frame.loc[15:, 2] = np.nan - - # axis = 0 - cumsum = datetime_frame.cumsum() - expected = datetime_frame.apply(Series.cumsum) - tm.assert_frame_equal(cumsum, expected) - - # axis = 1 - cumsum = datetime_frame.cumsum(axis=1) - expected = datetime_frame.apply(Series.cumsum, axis=1) - tm.assert_frame_equal(cumsum, expected) - - # works - df = DataFrame({'A': np.arange(20)}, index=np.arange(20)) - result = df.cumsum() # noqa - - # fix issue - cumsum_xs = datetime_frame.cumsum(axis=1) - assert np.shape(cumsum_xs) == np.shape(datetime_frame) - - def test_cumprod(self, datetime_frame): - datetime_frame.loc[5:10, 0] = np.nan - datetime_frame.loc[10:15, 1] = np.nan - datetime_frame.loc[15:, 2] = np.nan - - # axis = 0 - cumprod = datetime_frame.cumprod() - expected = datetime_frame.apply(Series.cumprod) - 
tm.assert_frame_equal(cumprod, expected) - - # axis = 1 - cumprod = datetime_frame.cumprod(axis=1) - expected = datetime_frame.apply(Series.cumprod, axis=1) - tm.assert_frame_equal(cumprod, expected) - - # fix issue - cumprod_xs = datetime_frame.cumprod(axis=1) - assert np.shape(cumprod_xs) == np.shape(datetime_frame) - - # ints - df = datetime_frame.fillna(0).astype(int) - df.cumprod(0) - df.cumprod(1) - - # ints32 - df = datetime_frame.fillna(0).astype(np.int32) - df.cumprod(0) - df.cumprod(1) - - def test_sem(self, float_frame_with_na, datetime_frame, - float_frame, float_string_frame): - alt = lambda x: np.std(x, ddof=1) / np.sqrt(len(x)) - assert_stat_op_calc('sem', alt, float_frame_with_na) - assert_stat_op_api('sem', float_frame, float_string_frame) - + msg = r"unsupported operand type\(s\) for -: 'float' and 'str'" + with pytest.raises(TypeError, match=msg): + getattr(df1, meth)(axis=1, numeric_only=False) + msg = "could not convert string to float: 'a'" + with pytest.raises(TypeError, match=msg): + getattr(df2, meth)(axis=1, numeric_only=False) + + def test_sem(self, datetime_frame): result = datetime_frame.sem(ddof=4) expected = datetime_frame.apply( lambda x: x.std(ddof=4) / np.sqrt(len(x))) @@ -1039,29 +942,7 @@ def test_sem(self, float_frame_with_na, datetime_frame, assert not (result < 0).any() @td.skip_if_no_scipy - def test_skew(self, float_frame_with_na, float_frame, float_string_frame): - from scipy.stats import skew - - def alt(x): - if len(x) < 3: - return np.nan - return skew(x, bias=False) - - assert_stat_op_calc('skew', alt, float_frame_with_na) - assert_stat_op_api('skew', float_frame, float_string_frame) - - @td.skip_if_no_scipy - def test_kurt(self, float_frame_with_na, float_frame, float_string_frame): - from scipy.stats import kurtosis - - def alt(x): - if len(x) < 4: - return np.nan - return kurtosis(x, bias=False) - - assert_stat_op_calc('kurt', alt, float_frame_with_na) - assert_stat_op_api('kurt', float_frame, float_string_frame) - + def test_kurt(self): index = MultiIndex(levels=[['bar'], ['one', 'two', 'three'], [0, 1]], codes=[[0, 0, 0, 0, 0, 0], [0, 1, 2, 0, 1, 2], @@ -1218,7 +1099,9 @@ def test_operators_timedelta64(self): assert df['off1'].dtype == 'timedelta64[ns]' assert df['off2'].dtype == 'timedelta64[ns]' - def test_sum_corner(self, empty_frame): + def test_sum_corner(self): + empty_frame = DataFrame() + axis0 = empty_frame.sum(0) axis1 = empty_frame.sum(1) assert isinstance(axis0, Series) @@ -1323,20 +1206,146 @@ def test_stats_mixed_type(self, float_string_frame): float_string_frame.mean(1) float_string_frame.skew(1) - # TODO: Ensure warning isn't emitted in the first place - @pytest.mark.filterwarnings("ignore:All-NaN:RuntimeWarning") - def test_median_corner(self, int_frame, float_frame, float_string_frame): - def wrapper(x): - if isna(x).any(): - return np.nan - return np.median(x) + def test_sum_bools(self): + df = DataFrame(index=lrange(1), columns=lrange(10)) + bools = isna(df) + assert bools.sum(axis=1)[0] == 10 - assert_stat_op_calc('median', wrapper, int_frame, check_dtype=False, - check_dates=True) - assert_stat_op_api('median', float_frame, float_string_frame) + # --------------------------------------------------------------------- + # Cumulative Reductions - cumsum, cummax, ... 
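+    # (each test below seeds NaN runs into datetime_frame, compares the
+    # frame-level op against a column-/row-wise apply of the matching
+    # Series method, and checks that axis=1 preserves the original shape)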
+ + def test_cumsum_corner(self): + dm = DataFrame(np.arange(20).reshape(4, 5), + index=lrange(4), columns=lrange(5)) + # ?(wesm) + result = dm.cumsum() # noqa + + def test_cumsum(self, datetime_frame): + datetime_frame.loc[5:10, 0] = np.nan + datetime_frame.loc[10:15, 1] = np.nan + datetime_frame.loc[15:, 2] = np.nan + + # axis = 0 + cumsum = datetime_frame.cumsum() + expected = datetime_frame.apply(Series.cumsum) + tm.assert_frame_equal(cumsum, expected) + + # axis = 1 + cumsum = datetime_frame.cumsum(axis=1) + expected = datetime_frame.apply(Series.cumsum, axis=1) + tm.assert_frame_equal(cumsum, expected) + # works + df = DataFrame({'A': np.arange(20)}, index=np.arange(20)) + result = df.cumsum() # noqa + + # fix issue + cumsum_xs = datetime_frame.cumsum(axis=1) + assert np.shape(cumsum_xs) == np.shape(datetime_frame) + + def test_cumprod(self, datetime_frame): + datetime_frame.loc[5:10, 0] = np.nan + datetime_frame.loc[10:15, 1] = np.nan + datetime_frame.loc[15:, 2] = np.nan + + # axis = 0 + cumprod = datetime_frame.cumprod() + expected = datetime_frame.apply(Series.cumprod) + tm.assert_frame_equal(cumprod, expected) + + # axis = 1 + cumprod = datetime_frame.cumprod(axis=1) + expected = datetime_frame.apply(Series.cumprod, axis=1) + tm.assert_frame_equal(cumprod, expected) + + # fix issue + cumprod_xs = datetime_frame.cumprod(axis=1) + assert np.shape(cumprod_xs) == np.shape(datetime_frame) + + # ints + df = datetime_frame.fillna(0).astype(int) + df.cumprod(0) + df.cumprod(1) + + # ints32 + df = datetime_frame.fillna(0).astype(np.int32) + df.cumprod(0) + df.cumprod(1) + + def test_cummin(self, datetime_frame): + datetime_frame.loc[5:10, 0] = np.nan + datetime_frame.loc[10:15, 1] = np.nan + datetime_frame.loc[15:, 2] = np.nan + + # axis = 0 + cummin = datetime_frame.cummin() + expected = datetime_frame.apply(Series.cummin) + tm.assert_frame_equal(cummin, expected) + + # axis = 1 + cummin = datetime_frame.cummin(axis=1) + expected = datetime_frame.apply(Series.cummin, axis=1) + tm.assert_frame_equal(cummin, expected) + + # it works + df = DataFrame({'A': np.arange(20)}, index=np.arange(20)) + result = df.cummin() # noqa + + # fix issue + cummin_xs = datetime_frame.cummin(axis=1) + assert np.shape(cummin_xs) == np.shape(datetime_frame) + + def test_cummax(self, datetime_frame): + datetime_frame.loc[5:10, 0] = np.nan + datetime_frame.loc[10:15, 1] = np.nan + datetime_frame.loc[15:, 2] = np.nan + + # axis = 0 + cummax = datetime_frame.cummax() + expected = datetime_frame.apply(Series.cummax) + tm.assert_frame_equal(cummax, expected) + + # axis = 1 + cummax = datetime_frame.cummax(axis=1) + expected = datetime_frame.apply(Series.cummax, axis=1) + tm.assert_frame_equal(cummax, expected) + + # it works + df = DataFrame({'A': np.arange(20)}, index=np.arange(20)) + result = df.cummax() # noqa + + # fix issue + cummax_xs = datetime_frame.cummax(axis=1) + assert np.shape(cummax_xs) == np.shape(datetime_frame) + + # --------------------------------------------------------------------- # Miscellanea + def test_count(self): + # corner case + frame = DataFrame() + ct1 = frame.count(1) + assert isinstance(ct1, Series) + + ct2 = frame.count(0) + assert isinstance(ct2, Series) + + # GH#423 + df = DataFrame(index=lrange(10)) + result = df.count(1) + expected = Series(0, index=df.index) + tm.assert_series_equal(result, expected) + + df = DataFrame(columns=lrange(10)) + result = df.count(0) + expected = Series(0, index=df.columns) + tm.assert_series_equal(result, expected) + + df = DataFrame() + result = 
df.count() + expected = Series(0, index=[]) + tm.assert_series_equal(result, expected) + def test_count_objects(self, float_string_frame): dm = DataFrame(float_string_frame._series) df = DataFrame(float_string_frame._series) @@ -1344,19 +1353,26 @@ def test_count_objects(self, float_string_frame): tm.assert_series_equal(dm.count(), df.count()) tm.assert_series_equal(dm.count(1), df.count(1)) - def test_cumsum_corner(self): - dm = DataFrame(np.arange(20).reshape(4, 5), - index=lrange(4), columns=lrange(5)) - # ?(wesm) - result = dm.cumsum() # noqa + def test_pct_change(self): + # GH#11150 + pnl = DataFrame([np.arange(0, 40, 10), + np.arange(0, 40, 10), + np.arange(0, 40, 10)]).astype(np.float64) + pnl.iat[1, 0] = np.nan + pnl.iat[1, 1] = np.nan + pnl.iat[2, 3] = 60 - def test_sum_bools(self): - df = DataFrame(index=lrange(1), columns=lrange(10)) - bools = isna(df) - assert bools.sum(axis=1)[0] == 10 + for axis in range(2): + expected = pnl.ffill(axis=axis) / pnl.ffill(axis=axis).shift( + axis=axis) - 1 + result = pnl.pct_change(axis=axis, fill_method='pad') + + tm.assert_frame_equal(result, expected) + # ---------------------------------------------------------------------- # Index of max / min + @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails") def test_idxmin(self, float_frame, int_frame): frame = float_frame frame.loc[5:10] = np.nan @@ -1369,8 +1385,11 @@ def test_idxmin(self, float_frame, int_frame): skipna=skipna) tm.assert_series_equal(result, expected) - pytest.raises(ValueError, frame.idxmin, axis=2) + msg = "No axis named 2 for object type " + with pytest.raises(ValueError, match=msg): + frame.idxmin(axis=2) + @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails") def test_idxmax(self, float_frame, int_frame): frame = float_frame frame.loc[5:10] = np.nan @@ -1383,7 +1402,9 @@ def test_idxmax(self, float_frame, int_frame): skipna=skipna) tm.assert_series_equal(result, expected) - pytest.raises(ValueError, frame.idxmax, axis=2) + msg = "No axis named 2 for object type " + with pytest.raises(ValueError, match=msg): + frame.idxmax(axis=2) # ---------------------------------------------------------------------- # Logical reductions @@ -1442,6 +1463,26 @@ def test_any_datetime(self): expected = Series([True, True, True, False]) tm.assert_series_equal(result, expected) + def test_any_all_bool_only(self): + + # GH 25101 + df = DataFrame({"col1": [1, 2, 3], + "col2": [4, 5, 6], + "col3": [None, None, None]}) + + result = df.all(bool_only=True) + expected = Series(dtype=np.bool) + tm.assert_series_equal(result, expected) + + df = DataFrame({"col1": [1, 2, 3], + "col2": [4, 5, 6], + "col3": [None, None, None], + "col4": [False, False, True]}) + + result = df.all(bool_only=True) + expected = Series({"col4": False}) + tm.assert_series_equal(result, expected) + @pytest.mark.parametrize('func, data, expected', [ (np.any, {}, False), (np.all, {}, True), @@ -1680,7 +1721,9 @@ def test_isin_empty_datetimelike(self): result = df1_td.isin(df3) tm.assert_frame_equal(result, expected) + # --------------------------------------------------------------------- # Rounding + def test_round(self): # GH 2665 @@ -1810,6 +1853,17 @@ def test_numpy_round(self): with pytest.raises(ValueError, match=msg): np.round(df, decimals=0, out=df) + @pytest.mark.xfail( + PY2 and is_platform_windows(), reason="numpy/numpy#7882", + raises=AssertionError, strict=True) + def test_numpy_round_nan(self): + # See gh-14197 + df = Series([1.53, np.nan, 0.06]).to_frame() + with 
tm.assert_produces_warning(None): + result = df.round() + expected = Series([2., np.nan, 0.]).to_frame() + tm.assert_frame_equal(result, expected) + def test_round_mixed_type(self): # GH 11885 df = DataFrame({'col1': [1.1, 2.2, 3.3, 4.4], @@ -1836,7 +1890,9 @@ def test_round_issue(self): tm.assert_index_equal(rounded.index, dfs.index) decimals = pd.Series([1, 0, 2], index=['A', 'B', 'A']) - pytest.raises(ValueError, df.round, decimals) + msg = "Index of decimals must be unique" + with pytest.raises(ValueError, match=msg): + df.round(decimals) def test_built_in_round(self): if not compat.PY3: @@ -1868,22 +1924,9 @@ def test_round_nonunique_categorical(self): tm.assert_frame_equal(result, expected) - def test_pct_change(self): - # GH 11150 - pnl = DataFrame([np.arange(0, 40, 10), np.arange(0, 40, 10), np.arange( - 0, 40, 10)]).astype(np.float64) - pnl.iat[1, 0] = np.nan - pnl.iat[1, 1] = np.nan - pnl.iat[2, 3] = 60 - - for axis in range(2): - expected = pnl.ffill(axis=axis) / pnl.ffill(axis=axis).shift( - axis=axis) - 1 - result = pnl.pct_change(axis=axis, fill_method='pad') - - tm.assert_frame_equal(result, expected) - + # --------------------------------------------------------------------- # Clip + def test_clip(self, float_frame): median = float_frame.median().median() original = float_frame.copy() @@ -2056,7 +2099,9 @@ def test_clip_with_na_args(self, float_frame): 'col_2': [np.nan, np.nan, np.nan]}) tm.assert_frame_equal(result, expected) + # --------------------------------------------------------------------- # Matrix-like + def test_dot(self): a = DataFrame(np.random.randn(3, 4), index=['a', 'b', 'c'], columns=['p', 'q', 'r', 's']) diff --git a/pandas/tests/frame/test_api.py b/pandas/tests/frame/test_api.py index 0934dd20638e4..118341276d799 100644 --- a/pandas/tests/frame/test_api.py +++ b/pandas/tests/frame/test_api.py @@ -9,7 +9,7 @@ import numpy as np import pytest -from pandas.compat import long, lrange, range +from pandas.compat import PY2, long, lrange, range import pandas as pd from pandas import ( @@ -142,10 +142,16 @@ def test_tab_completion(self): assert key not in dir(df) assert isinstance(df.__getitem__('A'), pd.DataFrame) - def test_not_hashable(self, empty_frame): + def test_not_hashable(self): + empty_frame = DataFrame() + df = self.klass([1]) - pytest.raises(TypeError, hash, df) - pytest.raises(TypeError, hash, empty_frame) + msg = ("'(Sparse)?DataFrame' objects are mutable, thus they cannot be" + " hashed") + with pytest.raises(TypeError, match=msg): + hash(df) + with pytest.raises(TypeError, match=msg): + hash(empty_frame) def test_new_empty_index(self): df1 = self.klass(np.random.randn(0, 3)) @@ -169,9 +175,12 @@ def test_get_agg_axis(self, float_frame): idx = float_frame._get_agg_axis(1) assert idx is float_frame.index - pytest.raises(ValueError, float_frame._get_agg_axis, 2) + msg = r"Axis must be 0 or 1 \(got 2\)" + with pytest.raises(ValueError, match=msg): + float_frame._get_agg_axis(2) - def test_nonzero(self, float_frame, float_string_frame, empty_frame): + def test_nonzero(self, float_frame, float_string_frame): + empty_frame = DataFrame() assert empty_frame.empty assert not float_frame.empty @@ -351,12 +360,15 @@ def test_transpose(self, float_frame): for col, s in compat.iteritems(mixed_T): assert s.dtype == np.object_ + @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails") def test_swapaxes(self): df = self.klass(np.random.randn(10, 5)) self._assert_frame_equal(df.T, df.swapaxes(0, 1)) self._assert_frame_equal(df.T, df.swapaxes(1, 0)) 
        self._assert_frame_equal(df, df.swapaxes(0, 0))
-        pytest.raises(ValueError, df.swapaxes, 2, 5)
+        msg = "No axis named 2 for object type <class 'type'>"
+        with pytest.raises(ValueError, match=msg):
+            df.swapaxes(2, 5)

    def test_axis_aliases(self, float_frame):
        f = float_frame
diff --git a/pandas/tests/frame/test_apply.py b/pandas/tests/frame/test_apply.py
index ade527a16c902..4d1e3e7ae1f38 100644
--- a/pandas/tests/frame/test_apply.py
+++ b/pandas/tests/frame/test_apply.py
@@ -74,8 +74,10 @@ def test_apply_mixed_datetimelike(self):
        result = df.apply(lambda x: x, axis=1)
        assert_frame_equal(result, df)

-    def test_apply_empty(self, float_frame, empty_frame):
+    def test_apply_empty(self, float_frame):
        # empty
+        empty_frame = DataFrame()
+
        applied = empty_frame.apply(np.sqrt)
        assert applied.empty

@@ -97,8 +99,10 @@ def test_apply_empty(self, float_frame, empty_frame):
        result = expected.apply(lambda x: x['a'], axis=1)
        assert_frame_equal(expected, result)

-    def test_apply_with_reduce_empty(self, empty_frame):
+    def test_apply_with_reduce_empty(self):
        # reduce with an empty DataFrame
+        empty_frame = DataFrame()
+
        x = []
        result = empty_frame.apply(x.append, axis=1, result_type='expand')
        assert_frame_equal(result, empty_frame)
@@ -116,7 +120,9 @@ def test_apply_with_reduce_empty(self):
        # Ensure that x.append hasn't been called
        assert x == []

-    def test_apply_deprecate_reduce(self, empty_frame):
+    def test_apply_deprecate_reduce(self):
+        empty_frame = DataFrame()
+
        x = []
        with tm.assert_produces_warning(FutureWarning):
            empty_frame.apply(x.append, axis=1, reduce=True)
@@ -318,6 +324,13 @@ def test_apply_reduce_Series(self, float_frame):
        result = float_frame.apply(np.mean, axis=1)
        assert_series_equal(result, expected)

+    def test_apply_reduce_rows_to_dict(self):
+        # GH 25196
+        data = pd.DataFrame([[1, 2], [3, 4]])
+        expected = pd.Series([{0: 1, 1: 3}, {0: 2, 1: 4}])
+        result = data.apply(dict)
+        assert_series_equal(result, expected)
+
    def test_apply_differently_indexed(self):
        df = DataFrame(np.random.randn(20, 10))

diff --git a/pandas/tests/frame/test_axis_select_reindex.py b/pandas/tests/frame/test_axis_select_reindex.py
index dea925dcde676..fb00776b33cbb 100644
--- a/pandas/tests/frame/test_axis_select_reindex.py
+++ b/pandas/tests/frame/test_axis_select_reindex.py
@@ -7,7 +7,7 @@
 import numpy as np
 import pytest

-from pandas.compat import lrange, lzip, u
+from pandas.compat import PY2, lrange, lzip, u
 from pandas.errors import PerformanceWarning

 import pandas as pd
@@ -38,8 +38,11 @@ def test_drop_names(self):
        assert obj.columns.name == 'second'
        assert list(df.columns) == ['d', 'e', 'f']

-        pytest.raises(KeyError, df.drop, ['g'])
-        pytest.raises(KeyError, df.drop, ['g'], 1)
+        msg = r"\['g'\] not found in axis"
+        with pytest.raises(KeyError, match=msg):
+            df.drop(['g'])
+        with pytest.raises(KeyError, match=msg):
+            df.drop(['g'], 1)

        # errors = 'ignore'
        dropped = df.drop(['g'], errors='ignore')
@@ -84,10 +87,14 @@ def test_drop(self):
        assert_frame_equal(simple.drop(
            [0, 3], axis='index'), simple.loc[[1, 2], :])

-        pytest.raises(KeyError, simple.drop, 5)
-        pytest.raises(KeyError, simple.drop, 'C', 1)
-        pytest.raises(KeyError, simple.drop, [1, 5])
-        pytest.raises(KeyError, simple.drop, ['A', 'C'], 1)
+        with pytest.raises(KeyError, match=r"\[5\] not found in axis"):
+            simple.drop(5)
+        with pytest.raises(KeyError, match=r"\['C'\] not found in axis"):
+            simple.drop('C', 1)
+        with pytest.raises(KeyError, match=r"\[5\] not found in axis"):
+            simple.drop([1, 5])
+        with pytest.raises(KeyError, match=r"\['C'\] not found in axis"):
+            simple.drop(['A', 'C'], 1)

        # errors = 'ignore'
        assert_frame_equal(simple.drop(5, errors='ignore'), simple)
@@ -444,7 +451,9 @@ def test_reindex_dups(self):
        assert_frame_equal(result, expected)

        # reindex fails
-        pytest.raises(ValueError, df.reindex, index=list(range(len(df))))
+        msg = "cannot reindex from a duplicate axis"
+        with pytest.raises(ValueError, match=msg):
+            df.reindex(index=list(range(len(df))))

    def test_reindex_axis_style(self):
        # https://github.com/pandas-dev/pandas/issues/12392
@@ -963,10 +972,15 @@ def test_take(self):
            assert_frame_equal(result, expected, check_names=False)

        # illegal indices
-        pytest.raises(IndexError, df.take, [3, 1, 2, 30], axis=0)
-        pytest.raises(IndexError, df.take, [3, 1, 2, -31], axis=0)
-        pytest.raises(IndexError, df.take, [3, 1, 2, 5], axis=1)
-        pytest.raises(IndexError, df.take, [3, 1, 2, -5], axis=1)
+        msg = "indices are out-of-bounds"
+        with pytest.raises(IndexError, match=msg):
+            df.take([3, 1, 2, 30], axis=0)
+        with pytest.raises(IndexError, match=msg):
+            df.take([3, 1, 2, -31], axis=0)
+        with pytest.raises(IndexError, match=msg):
+            df.take([3, 1, 2, 5], axis=1)
+        with pytest.raises(IndexError, match=msg):
+            df.take([3, 1, 2, -5], axis=1)

        # mixed-dtype
        order = [4, 1, 2, 0, 3]
@@ -1037,6 +1051,7 @@ def test_reindex_corner(self):
        smaller = self.intframe.reindex(columns=['A', 'B', 'E'])
        assert smaller['E'].dtype == np.float64

+    @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails")
    def test_reindex_axis(self):
        cols = ['A', 'B', 'E']
        with tm.assert_produces_warning(FutureWarning) as m:
@@ -1052,7 +1067,9 @@ def test_reindex_axis(self):
        reindexed2 = self.intframe.reindex(index=rows)
        assert_frame_equal(reindexed1, reindexed2)

-        pytest.raises(ValueError, self.intframe.reindex_axis, rows, axis=2)
+        msg = "No axis named 2 for object type <class 'type'>"
+        with pytest.raises(ValueError, match=msg):
+            self.intframe.reindex_axis(rows, axis=2)

        # no-op case
        cols = self.frame.columns.copy()
diff --git a/pandas/tests/frame/test_block_internals.py b/pandas/tests/frame/test_block_internals.py
index 5419f4d5127f6..4b06d2e35cdfc 100644
--- a/pandas/tests/frame/test_block_internals.py
+++ b/pandas/tests/frame/test_block_internals.py
@@ -274,10 +274,12 @@ def f(dtype):
                             columns=["A", "B", "C"],
                             dtype=dtype)

-        pytest.raises(NotImplementedError, f,
-                      [("A", "datetime64[h]"),
-                       ("B", "str"),
-                       ("C", "int32")])
+        msg = ("compound dtypes are not implemented in the DataFrame"
+               " constructor")
+        with pytest.raises(NotImplementedError, match=msg):
+            f([("A", "datetime64[h]"),
+               ("B", "str"),
+               ("C", "int32")])

        # these work (though results may be unexpected)
        f('int64')
@@ -347,7 +349,9 @@ def test_copy(self, float_frame, float_string_frame):
        copy = float_string_frame.copy()
        assert copy._data is not float_string_frame._data

-    def test_pickle(self, float_string_frame, empty_frame, timezone_frame):
+    def test_pickle(self, float_string_frame, timezone_frame):
+        empty_frame = DataFrame()
+
        unpickled = tm.round_trip_pickle(float_string_frame)
        assert_frame_equal(float_string_frame, unpickled)

diff --git a/pandas/tests/frame/test_combine_concat.py b/pandas/tests/frame/test_combine_concat.py
index 59497153c8524..c2364dc135a9a 100644
--- a/pandas/tests/frame/test_combine_concat.py
+++ b/pandas/tests/frame/test_combine_concat.py
@@ -504,6 +504,16 @@ def test_concat_numerical_names(self):
                                            names=[1, 2]))
        tm.assert_frame_equal(result, expected)

+    def test_concat_astype_dup_col(self):
+        # gh 23049
+        df = pd.DataFrame([{'a': 'b'}])
+        df = pd.concat([df, df], axis=1)
+
+
result = df.astype('category') + expected = pd.DataFrame(np.array(["b", "b"]).reshape(1, 2), + columns=["a", "a"]).astype("category") + tm.assert_frame_equal(result, expected) + class TestDataFrameCombineFirst(TestData): diff --git a/pandas/tests/frame/test_constructors.py b/pandas/tests/frame/test_constructors.py index 90ad48cac3a5f..fc642d211b30c 100644 --- a/pandas/tests/frame/test_constructors.py +++ b/pandas/tests/frame/test_constructors.py @@ -2,6 +2,7 @@ from __future__ import print_function +from collections import OrderedDict from datetime import datetime, timedelta import functools import itertools @@ -11,8 +12,8 @@ import pytest from pandas.compat import ( - PY3, PY36, OrderedDict, is_platform_little_endian, lmap, long, lrange, - lzip, range, zip) + PY2, PY3, PY36, is_platform_little_endian, lmap, long, lrange, lzip, range, + zip) from pandas.core.dtypes.cast import construct_1d_object_array_from_listlike from pandas.core.dtypes.common import is_integer_dtype @@ -58,8 +59,9 @@ def test_constructor_cast_failure(self): df['foo'] = np.ones((4, 2)).tolist() # this is not ok - pytest.raises(ValueError, df.__setitem__, tuple(['test']), - np.ones((4, 2))) + msg = "Wrong number of items passed 2, placement implies 1" + with pytest.raises(ValueError, match=msg): + df['test'] = np.ones((4, 2)) # this is ok df['foo2'] = np.ones((4, 2)).tolist() @@ -247,7 +249,7 @@ def test_constructor_dict(self): assert isna(frame['col3']).all() # Corner cases - assert len(DataFrame({})) == 0 + assert len(DataFrame()) == 0 # mix dict and array, wrong size - no spec for which error should raise # first @@ -1183,6 +1185,13 @@ def test_constructor_mixed_dict_and_Series(self): index=['a', 'b']) tm.assert_frame_equal(result, expected) + def test_constructor_mixed_type_rows(self): + # Issue 25075 + data = [[1, 2], (3, 4)] + result = DataFrame(data) + expected = DataFrame([[1, 2], [3, 4]]) + tm.assert_frame_equal(result, expected) + def test_constructor_tuples(self): result = DataFrame({'A': [(1, 2), (3, 4)]}) expected = DataFrame({'A': Series([(1, 2), (3, 4)])}) @@ -1252,7 +1261,9 @@ def test_constructor_Series_named(self): expected = DataFrame({0: s}) tm.assert_frame_equal(df, expected) - pytest.raises(ValueError, DataFrame, s, columns=[1, 2]) + msg = r"Shape of passed values is \(10, 1\), indices imply \(10, 2\)" + with pytest.raises(ValueError, match=msg): + DataFrame(s, columns=[1, 2]) # #2234 a = Series([], name='x') @@ -1426,8 +1437,10 @@ def test_constructor_column_duplicates(self): tm.assert_frame_equal(idf, edf) - pytest.raises(ValueError, DataFrame.from_dict, - OrderedDict([('b', 8), ('a', 5), ('a', 6)])) + msg = "If using all scalar values, you must pass an index" + with pytest.raises(ValueError, match=msg): + DataFrame.from_dict( + OrderedDict([('b', 8), ('a', 5), ('a', 6)])) def test_constructor_empty_with_string_dtype(self): # GH 9428 @@ -1458,8 +1471,11 @@ def test_constructor_single_value(self): dtype=object), index=[1, 2], columns=['a', 'c'])) - pytest.raises(ValueError, DataFrame, 'a', [1, 2]) - pytest.raises(ValueError, DataFrame, 'a', columns=['a', 'c']) + msg = "DataFrame constructor not properly called!" 
+        with pytest.raises(ValueError, match=msg):
+            DataFrame('a', [1, 2])
+        with pytest.raises(ValueError, match=msg):
+            DataFrame('a', columns=['a', 'c'])

        msg = 'incompatible data and dtype'
        with pytest.raises(TypeError, match=msg):
@@ -1685,6 +1701,7 @@ def test_constructor_series_copy(self):

        assert not (series['A'] == 5).all()

+    @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails")
    def test_constructor_with_nas(self):
        # GH 5016
        # na's in indices

@@ -1697,9 +1714,11 @@ def check(df):

            # No NaN found -> error
            if len(indexer) == 0:
-                def f():
+                msg = ("cannot do label indexing on"
+                       r" <class 'pandas\.core\.indexes\.range\.RangeIndex'>"
+                       r" with these indexers \[nan\] of <class 'float'>")
+                with pytest.raises(TypeError, match=msg):
                    df.loc[:, np.nan]
-                pytest.raises(TypeError, f)
            # single nan should result in Series
            elif len(indexer) == 1:
                tm.assert_series_equal(df.iloc[:, indexer[0]],
@@ -1775,13 +1794,15 @@ def test_constructor_categorical(self):
        tm.assert_frame_equal(df, expected)

        # invalid (shape)
-        pytest.raises(ValueError,
-                      lambda: DataFrame([Categorical(list('abc')),
-                                         Categorical(list('abdefg'))]))
+        msg = r"Shape of passed values is \(6, 2\), indices imply \(3, 2\)"
+        with pytest.raises(ValueError, match=msg):
+            DataFrame([Categorical(list('abc')),
+                       Categorical(list('abdefg'))])

        # ndim > 1
-        pytest.raises(NotImplementedError,
-                      lambda: Categorical(np.array([list('abcd')])))
+        msg = "> 1 ndim Categorical are not supported at this time"
+        with pytest.raises(NotImplementedError, match=msg):
+            Categorical(np.array([list('abcd')]))

    def test_constructor_categorical_series(self):

@@ -2157,8 +2178,11 @@ def test_from_records_bad_index_column(self):
        tm.assert_index_equal(df1.index, Index(df.C))

        # should fail
-        pytest.raises(ValueError, DataFrame.from_records, df, index=[2])
-        pytest.raises(KeyError, DataFrame.from_records, df, index=2)
+        msg = r"Shape of passed values is \(10, 3\), indices imply \(1, 3\)"
+        with pytest.raises(ValueError, match=msg):
+            DataFrame.from_records(df, index=[2])
+        with pytest.raises(KeyError, match=r"^2$"):
+            DataFrame.from_records(df, index=2)

    def test_from_records_non_tuple(self):
        class Record(object):
diff --git a/pandas/tests/frame/test_convert_to.py b/pandas/tests/frame/test_convert_to.py
index ddf85136126a1..db60fbf0f8563 100644
--- a/pandas/tests/frame/test_convert_to.py
+++ b/pandas/tests/frame/test_convert_to.py
@@ -10,7 +10,9 @@

 from pandas.compat import long

-from pandas import DataFrame, MultiIndex, Series, Timestamp, compat, date_range
+from pandas import (
+    CategoricalDtype, DataFrame, MultiIndex, Series, Timestamp, compat,
+    date_range)
 from pandas.tests.frame.common import TestData
 import pandas.util.testing as tm

@@ -73,11 +75,15 @@ def test_to_dict_index_not_unique_with_index_orient(self):
        # GH22801
        # Data loss when indexes are not unique. Raise ValueError.
df = DataFrame({'a': [1, 2], 'b': [0.5, 0.75]}, index=['A', 'A']) - pytest.raises(ValueError, df.to_dict, orient='index') + msg = "DataFrame index must be unique for orient='index'" + with pytest.raises(ValueError, match=msg): + df.to_dict(orient='index') def test_to_dict_invalid_orient(self): df = DataFrame({'A': [0, 1]}) - pytest.raises(ValueError, df.to_dict, orient='xinvalid') + msg = "orient 'xinvalid' not understood" + with pytest.raises(ValueError, match=msg): + df.to_dict(orient='xinvalid') def test_to_records_dt64(self): df = DataFrame([["one", "two", "three"], @@ -220,6 +226,12 @@ def test_to_records_with_categorical(self): dtype=[("index", " 0.5) + msg = "Cannot index with multidimensional key" + with pytest.raises(ValueError, match=msg): + f.ix[f > 0.5] def test_slice_floats(self): index = [52195.504153, 52196.303147, 52198.369883] @@ -865,6 +869,7 @@ def test_getitem_fancy_slice_integers_step(self): df.iloc[:8:2] = np.nan assert isna(df.iloc[:8:2]).values.all() + @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails") def test_getitem_setitem_integer_slice_keyerrors(self): df = DataFrame(np.random.randn(10, 5), index=lrange(0, 20, 2)) @@ -887,8 +892,10 @@ def test_getitem_setitem_integer_slice_keyerrors(self): # non-monotonic, raise KeyError df2 = df.iloc[lrange(5) + lrange(5, 10)[::-1]] - pytest.raises(KeyError, df2.loc.__getitem__, slice(3, 11)) - pytest.raises(KeyError, df2.loc.__setitem__, slice(3, 11), 0) + with pytest.raises(KeyError, match=r"^3$"): + df2.loc[3:11] + with pytest.raises(KeyError, match=r"^3$"): + df2.loc[3:11] = 0 def test_setitem_fancy_2d(self): @@ -1077,6 +1084,7 @@ def test_fancy_getitem_int_labels(self): expected = df[3] assert_series_equal(result, expected) + @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails") def test_fancy_index_int_labels_exceptions(self): df = DataFrame(np.random.randn(10, 5), index=np.arange(0, 20, 2)) @@ -1084,14 +1092,18 @@ def test_fancy_index_int_labels_exceptions(self): simplefilter("ignore", DeprecationWarning) # labels that aren't contained - pytest.raises(KeyError, df.ix.__setitem__, - ([0, 1, 2], [2, 3, 4]), 5) + with pytest.raises(KeyError, match=r"\[1\] not in index"): + df.ix[[0, 1, 2], [2, 3, 4]] = 5 # try to set indices not contained in frame - pytest.raises(KeyError, self.frame.ix.__setitem__, - ['foo', 'bar', 'baz'], 1) - pytest.raises(KeyError, self.frame.ix.__setitem__, - (slice(None, None), ['E']), 1) + msg = (r"None of \[Index\(\['foo', 'bar', 'baz'\]," + r" dtype='object'\)\] are in the \[index\]") + with pytest.raises(KeyError, match=msg): + self.frame.ix[['foo', 'bar', 'baz']] = 1 + msg = (r"None of \[Index\(\['E'\], dtype='object'\)\] are in the" + r" \[columns\]") + with pytest.raises(KeyError, match=msg): + self.frame.ix[:, ['E']] = 1 # partial setting now allows this GH2578 # pytest.raises(KeyError, self.frame.ix.__setitem__, @@ -1504,6 +1516,7 @@ def test_getitem_setitem_boolean_multi(self): expected.loc[[0, 2], [1]] = 5 assert_frame_equal(df, expected) + @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails") def test_getitem_setitem_float_labels(self): index = Index([1.5, 2, 3, 4, 5]) df = DataFrame(np.random.randn(5, 5), index=index) @@ -1537,7 +1550,11 @@ def test_getitem_setitem_float_labels(self): df = DataFrame(np.random.randn(5, 5), index=index) # positional slicing only via iloc! 
-        pytest.raises(TypeError, lambda: df.iloc[1.0:5])
+        msg = ("cannot do slice indexing on"
+               r" <class 'pandas\.core\.indexes\.numeric\.Float64Index'> with"
+               r" these indexers \[1.0\] of <class 'float'>")
+        with pytest.raises(TypeError, match=msg):
+            df.iloc[1.0:5]

        result = df.iloc[4:5]
        expected = df.reindex([5.0])
@@ -1744,11 +1761,16 @@ def test_getitem_setitem_ix_bool_keyerror(self):
        # #2199
        df = DataFrame({'a': [1, 2, 3]})

-        pytest.raises(KeyError, df.loc.__getitem__, False)
-        pytest.raises(KeyError, df.loc.__getitem__, True)
+        with pytest.raises(KeyError, match=r"^False$"):
+            df.loc[False]
+        with pytest.raises(KeyError, match=r"^True$"):
+            df.loc[True]

-        pytest.raises(KeyError, df.loc.__setitem__, False, 0)
-        pytest.raises(KeyError, df.loc.__setitem__, True, 0)
+        msg = "cannot use a single bool to index into setitem"
+        with pytest.raises(KeyError, match=msg):
+            df.loc[False] = 0
+        with pytest.raises(KeyError, match=msg):
+            df.loc[True] = 0

    def test_getitem_list_duplicates(self):
        # #1943
@@ -1813,6 +1835,7 @@ def test_set_value(self):
                self.frame.set_value(idx, col, 1)
                assert self.frame[col][idx] == 1

+    @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails")
    def test_set_value_resize(self):

        with tm.assert_produces_warning(FutureWarning,
@@ -1849,7 +1872,9 @@ def test_set_value_resize(self):
        assert isna(res3['baz'].drop(['foobar'])).all()
        with tm.assert_produces_warning(FutureWarning,
                                        check_stacklevel=False):
-            pytest.raises(ValueError, res3.set_value, 'foobar', 'baz', 'sam')
+            msg = "could not convert string to float: 'sam'"
+            with pytest.raises(ValueError, match=msg):
+                res3.set_value('foobar', 'baz', 'sam')

    def test_set_value_with_index_dtype_change(self):
        df_orig = DataFrame(np.random.randn(3, 3),
@@ -1888,7 +1913,8 @@ def test_get_set_value_no_partial_indexing(self):
        df = DataFrame(index=index, columns=lrange(4))
        with tm.assert_produces_warning(FutureWarning,
                                        check_stacklevel=False):
-            pytest.raises(KeyError, df.get_value, 0, 1)
+            with pytest.raises(KeyError, match=r"^0$"):
+                df.get_value(0, 1)

    def test_single_element_ix_dont_upcast(self):
        self.frame['E'] = 1
@@ -2158,10 +2184,15 @@ def test_non_monotonic_reindex_methods(self):
        df_rev = pd.DataFrame(data, index=dr[[3, 4, 5] + [0, 1, 2]],
                              columns=list('A'))
        # index is not monotonic increasing or decreasing
-        pytest.raises(ValueError, df_rev.reindex, df.index, method='pad')
-        pytest.raises(ValueError, df_rev.reindex, df.index, method='ffill')
-        pytest.raises(ValueError, df_rev.reindex, df.index, method='bfill')
-        pytest.raises(ValueError, df_rev.reindex, df.index, method='nearest')
+        msg = "index must be monotonic increasing or decreasing"
+        with pytest.raises(ValueError, match=msg):
+            df_rev.reindex(df.index, method='pad')
+        with pytest.raises(ValueError, match=msg):
+            df_rev.reindex(df.index, method='ffill')
+        with pytest.raises(ValueError, match=msg):
+            df_rev.reindex(df.index, method='bfill')
+        with pytest.raises(ValueError, match=msg):
+            df_rev.reindex(df.index, method='nearest')

    def test_reindex_level(self):
        from itertools import permutations
@@ -2669,14 +2700,20 @@ def _check_align(df, cond, other, check_dtypes=True):
        # invalid conditions
        df = default_frame
        err1 = (df + 1).values[0:2, :]
-        pytest.raises(ValueError, df.where, cond, err1)
+        msg = "other must be the same shape as self when an ndarray"
+        with pytest.raises(ValueError, match=msg):
+            df.where(cond, err1)

        err2 = cond.iloc[:2, :].values
        other1 = _safe_add(df)
-        pytest.raises(ValueError, df.where, err2, other1)
+        msg = "Array conditional must be same shape as self"
+        with pytest.raises(ValueError, match=msg):
+            df.where(err2, other1)

-        pytest.raises(ValueError, df.mask, True)
-        pytest.raises(ValueError, df.mask, 0)
+        with pytest.raises(ValueError, match=msg):
+            df.mask(True)
+        with pytest.raises(ValueError, match=msg):
+            df.mask(0)

        # where inplace
        def _check_set(df, cond, check_dtypes=True):
diff --git a/pandas/tests/frame/test_missing.py b/pandas/tests/frame/test_missing.py
index 77a3d4785d295..2f3b0a9f76de9 100644
--- a/pandas/tests/frame/test_missing.py
+++ b/pandas/tests/frame/test_missing.py
@@ -9,7 +9,7 @@
 import numpy as np
 import pytest

-from pandas.compat import lrange
+from pandas.compat import PY2, lrange
 import pandas.util._test_decorators as td

 import pandas as pd
@@ -83,6 +83,7 @@ def test_dropIncompleteRows(self):
        tm.assert_index_equal(samesize_frame.index, self.frame.index)
        tm.assert_index_equal(inp_frame2.index, self.frame.index)

+    @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails")
    def test_dropna(self):
        df = DataFrame(np.random.randn(6, 4))
        df[2][:2] = np.nan
@@ -139,7 +140,9 @@ def test_dropna(self):
        assert_frame_equal(dropped, expected)

        # bad input
-        pytest.raises(ValueError, df.dropna, axis=3)
+        msg = "No axis named 3 for object type <class 'type'>"
+        with pytest.raises(ValueError, match=msg):
+            df.dropna(axis=3)

    def test_drop_and_dropna_caching(self):
        # tst that cacher updates
@@ -158,10 +161,15 @@ def test_drop_and_dropna_caching(self):

    def test_dropna_corner(self):
        # bad input
-        pytest.raises(ValueError, self.frame.dropna, how='foo')
-        pytest.raises(TypeError, self.frame.dropna, how=None)
+        msg = "invalid how option: foo"
+        with pytest.raises(ValueError, match=msg):
+            self.frame.dropna(how='foo')
+        msg = "must specify how or thresh"
+        with pytest.raises(TypeError, match=msg):
+            self.frame.dropna(how=None)
        # non-existent column - 8303
-        pytest.raises(KeyError, self.frame.dropna, subset=['A', 'X'])
+        with pytest.raises(KeyError, match=r"^\['X'\]$"):
+            self.frame.dropna(subset=['A', 'X'])

    def test_dropna_multiple_axes(self):
        df = DataFrame([[1, np.nan, 2, 3],
@@ -226,8 +234,12 @@ def test_fillna(self):
        result = self.mixed_frame.fillna(value=0)
        result = self.mixed_frame.fillna(method='pad')

-        pytest.raises(ValueError, self.tsframe.fillna)
-        pytest.raises(ValueError, self.tsframe.fillna, 5, method='ffill')
+        msg = "Must specify a fill 'value' or 'method'"
+        with pytest.raises(ValueError, match=msg):
+            self.tsframe.fillna()
+        msg = "Cannot specify both 'value' and 'method'"
+        with pytest.raises(ValueError, match=msg):
+            self.tsframe.fillna(5, method='ffill')

        # mixed numeric (but no float16)
        mf = self.mixed_float.reindex(columns=['A', 'B', 'D'])
@@ -595,11 +607,18 @@ def test_fillna_invalid_method(self):

    def test_fillna_invalid_value(self):
        # list
-        pytest.raises(TypeError, self.frame.fillna, [1, 2])
+        msg = ("\"value\" parameter must be a scalar or dict, but you passed"
+               " a \"{}\"")
+        with pytest.raises(TypeError, match=msg.format('list')):
+            self.frame.fillna([1, 2])
        # tuple
-        pytest.raises(TypeError, self.frame.fillna, (1, 2))
+        with pytest.raises(TypeError, match=msg.format('tuple')):
+            self.frame.fillna((1, 2))
        # frame with series
-        pytest.raises(TypeError, self.frame.iloc[:, 0].fillna, self.frame)
+        msg = ("\"value\" parameter must be a scalar, dict or Series, but you"
+               " passed a \"DataFrame\"")
+        with pytest.raises(TypeError, match=msg):
+            self.frame.iloc[:, 0].fillna(self.frame)

    def test_fillna_col_reordering(self):
        cols = ["COL." + str(i) for i in range(5, 0, -1)]
diff --git a/pandas/tests/frame/test_mutate_columns.py b/pandas/tests/frame/test_mutate_columns.py
index 1f4da1bbb0470..6bef7e3f65b21 100644
--- a/pandas/tests/frame/test_mutate_columns.py
+++ b/pandas/tests/frame/test_mutate_columns.py
@@ -177,7 +177,9 @@ def test_insert(self):
        with pytest.raises(ValueError, match='already exists'):
            df.insert(1, 'a', df['b'])
-        pytest.raises(ValueError, df.insert, 1, 'c', df['b'])
+        msg = "cannot insert c, already exists"
+        with pytest.raises(ValueError, match=msg):
+            df.insert(1, 'c', df['b'])

        df.columns.name = 'some_name'
        # preserve columns name field
diff --git a/pandas/tests/frame/test_nonunique_indexes.py b/pandas/tests/frame/test_nonunique_indexes.py
index a5bed14cf06d2..799d548100b5e 100644
--- a/pandas/tests/frame/test_nonunique_indexes.py
+++ b/pandas/tests/frame/test_nonunique_indexes.py
@@ -187,8 +187,11 @@ def check(result, expected=None):
        # reindex is invalid!
        df = DataFrame([[1, 5, 7.], [1, 5, 7.], [1, 5, 7.]],
                       columns=['bar', 'a', 'a'])
-        pytest.raises(ValueError, df.reindex, columns=['bar'])
-        pytest.raises(ValueError, df.reindex, columns=['bar', 'foo'])
+        msg = "cannot reindex from a duplicate axis"
+        with pytest.raises(ValueError, match=msg):
+            df.reindex(columns=['bar'])
+        with pytest.raises(ValueError, match=msg):
+            df.reindex(columns=['bar', 'foo'])

        # drop
        df = DataFrame([[1, 5, 7.], [1, 5, 7.], [1, 5, 7.]],
@@ -306,7 +309,9 @@ def check(result, expected=None):
        # boolean with the duplicate raises
        df = DataFrame(np.arange(12).reshape(3, 4),
                       columns=dups, dtype='float64')
-        pytest.raises(ValueError, lambda: df[df.A > 6])
+        msg = "cannot reindex from a duplicate axis"
+        with pytest.raises(ValueError, match=msg):
+            df[df.A > 6]

        # dup aligining operations should work
        # GH 5185
@@ -323,7 +328,9 @@ def check(result, expected=None):
                        columns=['A', 'A'])

        # not-comparing like-labelled
-        pytest.raises(ValueError, lambda: df1 == df2)
+        msg = "Can only compare identically-labeled DataFrame objects"
+        with pytest.raises(ValueError, match=msg):
+            df1 == df2

        df1r = df1.reindex_like(df2)
        result = df1r == df2
diff --git a/pandas/tests/frame/test_quantile.py b/pandas/tests/frame/test_quantile.py
index d1f1299a5202e..19b6636978643 100644
--- a/pandas/tests/frame/test_quantile.py
+++ b/pandas/tests/frame/test_quantile.py
@@ -5,6 +5,8 @@
 import numpy as np
 import pytest

+from pandas.compat import PY2
+
 import pandas as pd
 from pandas import DataFrame, Series, Timestamp
 from pandas.tests.frame.common import TestData
@@ -71,6 +73,7 @@ def test_quantile_axis_mixed(self):
        with pytest.raises(TypeError):
            df.quantile(.5, axis=1, numeric_only=False)

+    @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails")
    def test_quantile_axis_parameter(self):
        # GH 9543/9544

@@ -92,8 +95,12 @@ def test_quantile_axis_parameter(self):
        result = df.quantile(.5, axis="columns")
        assert_series_equal(result, expected)

-        pytest.raises(ValueError, df.quantile, 0.1, axis=-1)
-        pytest.raises(ValueError, df.quantile, 0.1, axis="column")
+        msg = "No axis named -1 for object type <class 'type'>"
+        with pytest.raises(ValueError, match=msg):
+            df.quantile(0.1, axis=-1)
+        msg = "No axis named column for object type <class 'type'>"
+        with pytest.raises(ValueError, match=msg):
+            df.quantile(0.1, axis="column")

    def test_quantile_interpolation(self):
        # see gh-10174
diff --git a/pandas/tests/frame/test_query_eval.py b/pandas/tests/frame/test_query_eval.py
@@ -14,7 +14,6 @@ from pandas import DataFrame, Index, MultiIndex, Series, date_range from pandas.core.computation.check import _NUMEXPR_INSTALLED from pandas.tests.frame.common import TestData -import pandas.util.testing as tm from pandas.util.testing import ( assert_frame_equal, assert_series_equal, makeCustomDataframe as mkdf) @@ -79,10 +78,10 @@ def test_query_numexpr(self): result = df.eval('A+1', engine='numexpr') assert_series_equal(result, self.expected2, check_names=False) else: - pytest.raises(ImportError, - lambda: df.query('A>0', engine='numexpr')) - pytest.raises(ImportError, - lambda: df.eval('A+1', engine='numexpr')) + with pytest.raises(ImportError): + df.query('A>0', engine='numexpr') + with pytest.raises(ImportError): + df.eval('A+1', engine='numexpr') class TestDataFrameEval(TestData): @@ -355,13 +354,6 @@ def to_series(mi, level): else: raise AssertionError("object must be a Series or Index") - @pytest.mark.filterwarnings("ignore::FutureWarning") - def test_raise_on_panel_with_multiindex(self, parser, engine): - p = tm.makePanel(7) - p.items = tm.makeCustomIndex(len(p.items), nlevels=2) - with pytest.raises(NotImplementedError): - pd.eval('p + 1', parser=parser, engine=engine) - @td.skip_if_no_ne class TestDataFrameQueryNumExprPandas(object): @@ -860,9 +852,10 @@ def test_str_query_method(self, parser, engine): for lhs, op, rhs in zip(lhs, ops, rhs): ex = '{lhs} {op} {rhs}'.format(lhs=lhs, op=op, rhs=rhs) - pytest.raises(NotImplementedError, df.query, ex, - engine=engine, parser=parser, - local_dict={'strings': df.strings}) + msg = r"'(Not)?In' nodes are not implemented" + with pytest.raises(NotImplementedError, match=msg): + df.query(ex, engine=engine, parser=parser, + local_dict={'strings': df.strings}) else: res = df.query('"a" == strings', engine=engine, parser=parser) assert_frame_equal(res, expect) diff --git a/pandas/tests/frame/test_rank.py b/pandas/tests/frame/test_rank.py index 10c42e0d1a1cf..6bb9dea15d1ce 100644 --- a/pandas/tests/frame/test_rank.py +++ b/pandas/tests/frame/test_rank.py @@ -310,6 +310,7 @@ def test_rank_pct_true(self, method, exp): tm.assert_frame_equal(result, expected) @pytest.mark.single + @pytest.mark.high_memory def test_pct_max_many_rows(self): # GH 18271 df = DataFrame({'A': np.arange(2**24 + 1), diff --git a/pandas/tests/frame/test_replace.py b/pandas/tests/frame/test_replace.py index 219f7a1585fc2..50c66d3f8db00 100644 --- a/pandas/tests/frame/test_replace.py +++ b/pandas/tests/frame/test_replace.py @@ -466,6 +466,13 @@ def test_regex_replace_dict_nested(self): assert_frame_equal(res3, expec) assert_frame_equal(res4, expec) + def test_regex_replace_dict_nested_non_first_character(self): + # GH 25259 + df = pd.DataFrame({'first': ['abc', 'bca', 'cab']}) + expected = pd.DataFrame({'first': ['.bc', 'bc.', 'c.b']}) + result = df.replace({'a': '.'}, regex=True) + assert_frame_equal(result, expected) + def test_regex_replace_dict_nested_gh4115(self): df = pd.DataFrame({'Type': ['Q', 'T', 'Q', 'Q', 'T'], 'tmp': 2}) expected = DataFrame({'Type': [0, 1, 0, 0, 1], 'tmp': 2}) @@ -830,7 +837,9 @@ def test_replace_input_formats_listlike(self): expected.replace(to_rep[i], values[i], inplace=True) assert_frame_equal(result, expected) - pytest.raises(ValueError, df.replace, to_rep, values[1:]) + msg = r"Replacement lists must match in length\. 
Expecting 3 got 2" + with pytest.raises(ValueError, match=msg): + df.replace(to_rep, values[1:]) def test_replace_input_formats_scalar(self): df = DataFrame({'A': [np.nan, 0, np.inf], 'B': [0, 2, 5], @@ -843,7 +852,9 @@ def test_replace_input_formats_scalar(self): for k, v in compat.iteritems(df)} assert_frame_equal(filled, DataFrame(expected)) - pytest.raises(TypeError, df.replace, to_rep, [np.nan, 0, '']) + msg = "value argument must be scalar, dict, or Series" + with pytest.raises(TypeError, match=msg): + df.replace(to_rep, [np.nan, 0, '']) # list to scalar to_rep = [np.nan, 0, ''] diff --git a/pandas/tests/frame/test_reshape.py b/pandas/tests/frame/test_reshape.py index 28222a82945be..8abf3a6706886 100644 --- a/pandas/tests/frame/test_reshape.py +++ b/pandas/tests/frame/test_reshape.py @@ -4,7 +4,6 @@ from datetime import datetime import itertools -from warnings import catch_warnings, simplefilter import numpy as np import pytest @@ -49,14 +48,6 @@ def test_pivot(self): assert pivoted.index.name == 'index' assert pivoted.columns.names == (None, 'columns') - with catch_warnings(record=True): - # pivot multiple columns - simplefilter("ignore", FutureWarning) - wp = tm.makePanel() - lp = wp.to_frame() - df = lp.reset_index() - tm.assert_frame_equal(df.pivot('major', 'minor'), lp.unstack()) - def test_pivot_duplicates(self): data = DataFrame({'a': ['bar', 'bar', 'foo', 'foo', 'foo'], 'b': ['one', 'two', 'one', 'one', 'two'], @@ -67,7 +58,7 @@ def test_pivot_duplicates(self): def test_pivot_empty(self): df = DataFrame({}, columns=['a', 'b', 'c']) result = df.pivot('a', 'b', 'c') - expected = DataFrame({}) + expected = DataFrame() tm.assert_frame_equal(result, expected, check_names=False) def test_pivot_integer_bug(self): @@ -403,7 +394,10 @@ def test_stack_mixed_levels(self): # When mixed types are passed and the ints are not level # names, raise - pytest.raises(ValueError, df2.stack, level=['animal', 0]) + msg = ("level should contain all level names or all level numbers, not" + " a mixture of the two") + with pytest.raises(ValueError, match=msg): + df2.stack(level=['animal', 0]) # GH #8584: Having 0 in the level names could raise a # strange error about lexsort depth diff --git a/pandas/tests/frame/test_sorting.py b/pandas/tests/frame/test_sorting.py index 85e6373b384e4..8b29394bcab84 100644 --- a/pandas/tests/frame/test_sorting.py +++ b/pandas/tests/frame/test_sorting.py @@ -7,7 +7,7 @@ import numpy as np import pytest -from pandas.compat import lrange +from pandas.compat import PY2, lrange import pandas as pd from pandas import ( @@ -21,6 +21,7 @@ class TestDataFrameSorting(TestData): + @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails") def test_sort_values(self): frame = DataFrame([[1, 1, 2], [3, 1, 0], [4, 5, 6]], index=[1, 2, 3], columns=list('ABC')) @@ -54,8 +55,9 @@ def test_sort_values(self): sorted_df = frame.sort_values(by=['B', 'A'], ascending=[True, False]) assert_frame_equal(sorted_df, expected) - pytest.raises(ValueError, lambda: frame.sort_values( - by=['A', 'B'], axis=2, inplace=True)) + msg = "No axis named 2 for object type " + with pytest.raises(ValueError, match=msg): + frame.sort_values(by=['A', 'B'], axis=2, inplace=True) # by row (axis=1): GH 10806 sorted_df = frame.sort_values(by=3, axis=1) diff --git a/pandas/tests/frame/test_subclass.py b/pandas/tests/frame/test_subclass.py index 4f0747c0d6945..2e3696e7e04cc 100644 --- a/pandas/tests/frame/test_subclass.py +++ b/pandas/tests/frame/test_subclass.py @@ -6,7 +6,7 @@ import pytest import pandas as pd 
-from pandas import DataFrame, Index, MultiIndex, Panel, Series +from pandas import DataFrame, Index, MultiIndex, Series from pandas.tests.frame.common import TestData import pandas.util.testing as tm @@ -125,29 +125,6 @@ def test_indexing_sliced(self): tm.assert_series_equal(res, exp) assert isinstance(res, tm.SubclassedSeries) - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") - def test_to_panel_expanddim(self): - # GH 9762 - - class SubclassedFrame(DataFrame): - - @property - def _constructor_expanddim(self): - return SubclassedPanel - - class SubclassedPanel(Panel): - pass - - index = MultiIndex.from_tuples([(0, 0), (0, 1), (0, 2)]) - df = SubclassedFrame({'X': [1, 2, 3], 'Y': [4, 5, 6]}, index=index) - result = df.to_panel() - assert isinstance(result, SubclassedPanel) - expected = SubclassedPanel([[[1, 2, 3]], [[4, 5, 6]]], - items=['X', 'Y'], major_axis=[0], - minor_axis=[0, 1, 2], - dtype='int64') - tm.assert_panel_equal(result, expected) - def test_subclass_attr_err_propagation(self): # GH 11808 class A(DataFrame): diff --git a/pandas/tests/frame/test_timeseries.py b/pandas/tests/frame/test_timeseries.py index bc37317f72802..716a9e30e4cc3 100644 --- a/pandas/tests/frame/test_timeseries.py +++ b/pandas/tests/frame/test_timeseries.py @@ -6,8 +6,9 @@ import numpy as np import pytest +import pytz -from pandas.compat import product +from pandas.compat import PY2, product import pandas as pd from pandas import ( @@ -394,7 +395,9 @@ def test_tshift(self): assert_frame_equal(unshifted, inferred_ts) no_freq = self.tsframe.iloc[[0, 5, 7], :] - pytest.raises(ValueError, no_freq.tshift) + msg = "Freq was not given and was not set in the index" + with pytest.raises(ValueError, match=msg): + no_freq.tshift() def test_truncate(self): ts = self.tsframe[::3] @@ -435,9 +438,10 @@ def test_truncate(self): truncated = ts.truncate(after=end_missing) assert_frame_equal(truncated, expected) - pytest.raises(ValueError, ts.truncate, - before=ts.index[-1] - ts.index.freq, - after=ts.index[0] + ts.index.freq) + msg = "Truncate: 2000-01-06 00:00:00 must be after 2000-02-04 00:00:00" + with pytest.raises(ValueError, match=msg): + ts.truncate(before=ts.index[-1] - ts.index.freq, + after=ts.index[0] + ts.index.freq) def test_truncate_copy(self): index = self.tsframe.index @@ -647,6 +651,28 @@ def test_at_time(self): rs = ts.at_time('16:00') assert len(rs) == 0 + @pytest.mark.parametrize('hour', ['1:00', '1:00AM', time(1), + time(1, tzinfo=pytz.UTC)]) + def test_at_time_errors(self, hour): + # GH 24043 + dti = pd.date_range('2018', periods=3, freq='H') + df = pd.DataFrame(list(range(len(dti))), index=dti) + if getattr(hour, 'tzinfo', None) is None: + result = df.at_time(hour) + expected = df.iloc[1:2] + tm.assert_frame_equal(result, expected) + else: + with pytest.raises(ValueError, match="Index must be timezone"): + df.at_time(hour) + + def test_at_time_tz(self): + # GH 24043 + dti = pd.date_range('2018', periods=3, freq='H', tz='US/Pacific') + df = pd.DataFrame(list(range(len(dti))), index=dti) + result = df.at_time(time(4, tzinfo=pytz.timezone('US/Eastern'))) + expected = df.iloc[1:2] + tm.assert_frame_equal(result, expected) + def test_at_time_raises(self): # GH20725 df = pd.DataFrame([[1, 2, 3], [4, 5, 6]]) @@ -758,14 +784,18 @@ def test_between_time_axis_raises(self, axis): ts = DataFrame(rand_data, index=rng, columns=rng) stime, etime = ('08:00:00', '09:00:00') + msg = "Index must be DatetimeIndex" if axis in ['columns', 1]: ts.index = mask - pytest.raises(TypeError, ts.between_time, stime, 
etime)
-        pytest.raises(TypeError, ts.between_time, stime, etime, axis=0)
+        with pytest.raises(TypeError, match=msg):
+            ts.between_time(stime, etime)
+        with pytest.raises(TypeError, match=msg):
+            ts.between_time(stime, etime, axis=0)

        if axis in ['index', 0]:
            ts.columns = mask
-            pytest.raises(TypeError, ts.between_time, stime, etime, axis=1)
+            with pytest.raises(TypeError, match=msg):
+                ts.between_time(stime, etime, axis=1)

    def test_operation_on_NaT(self):
        # Both NaT and Timestamp are in DataFrame.
@@ -806,6 +836,7 @@ def test_datetime_assignment_with_NaT_and_diff_time_units(self):
                                 'new': [1e9, None]}, dtype='datetime64[ns]')
        tm.assert_frame_equal(result, expected)

+    @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails")
    def test_frame_to_period(self):
        K = 5

@@ -831,7 +862,9 @@ def test_frame_to_period(self):
        pts = df.to_period('M', axis=1)
        tm.assert_index_equal(pts.columns, exp.columns.asfreq('M'))

-        pytest.raises(ValueError, df.to_period, axis=2)
+        msg = "No axis named 2 for object type <class 'type'>"
+        with pytest.raises(ValueError, match=msg):
+            df.to_period(axis=2)

    @pytest.mark.parametrize("fn", ['tz_localize', 'tz_convert'])
    def test_tz_convert_and_localize(self, fn):
diff --git a/pandas/tests/frame/test_to_csv.py b/pandas/tests/frame/test_to_csv.py
index 61eefccede5dd..54a8712a9c645 100644
--- a/pandas/tests/frame/test_to_csv.py
+++ b/pandas/tests/frame/test_to_csv.py
@@ -109,8 +109,9 @@ def test_to_csv_from_csv2(self):
            xp.columns = col_aliases
            assert_frame_equal(xp, rs)

-            pytest.raises(ValueError, self.frame2.to_csv, path,
-                          header=['AA', 'X'])
+            msg = "Writing 4 cols but got 2 aliases"
+            with pytest.raises(ValueError, match=msg):
+                self.frame2.to_csv(path, header=['AA', 'X'])

    def test_to_csv_from_csv3(self):

diff --git a/pandas/tests/generic/test_generic.py b/pandas/tests/generic/test_generic.py
index 7183fea85a069..c2f6cbf4c564c 100644
--- a/pandas/tests/generic/test_generic.py
+++ b/pandas/tests/generic/test_generic.py
@@ -14,8 +14,7 @@
 import pandas as pd
 from pandas import DataFrame, MultiIndex, Panel, Series, date_range
 import pandas.util.testing as tm
-from pandas.util.testing import (
-    assert_frame_equal, assert_panel_equal, assert_series_equal)
+from pandas.util.testing import assert_frame_equal, assert_series_equal

 import pandas.io.formats.printing as printing

@@ -701,16 +700,9 @@ def test_sample(sel):
            assert_frame_equal(sample1, df[['colString']])

        # Test default axes
-        with catch_warnings(record=True):
-            simplefilter("ignore", FutureWarning)
-            p = Panel(items=['a', 'b', 'c'], major_axis=[2, 4, 6],
-                      minor_axis=[1, 3, 5])
-            assert_panel_equal(
-                p.sample(n=3, random_state=42), p.sample(n=3, axis=1,
-                                                         random_state=42))
-            assert_frame_equal(
-                df.sample(n=3, random_state=42), df.sample(n=3, axis=0,
-                                                           random_state=42))
+        assert_frame_equal(
+            df.sample(n=3, random_state=42), df.sample(n=3, axis=0,
+                                                       random_state=42))

        # Test that function aligns weights with frame
        df = DataFrame(
@@ -740,23 +732,11 @@ def test_squeeze(self):
            tm.assert_series_equal(s.squeeze(), s)
        for df in [tm.makeTimeDataFrame()]:
            tm.assert_frame_equal(df.squeeze(), df)
-        with catch_warnings(record=True):
-            simplefilter("ignore", FutureWarning)
-            for p in [tm.makePanel()]:
-                tm.assert_panel_equal(p.squeeze(), p)

        # squeezing
        df = tm.makeTimeDataFrame().reindex(columns=['A'])
        tm.assert_series_equal(df.squeeze(), df['A'])

-        with catch_warnings(record=True):
-            simplefilter("ignore", FutureWarning)
-            p = tm.makePanel().reindex(items=['ItemA'])
-            tm.assert_frame_equal(p.squeeze(), p['ItemA'])
-
-            p = 
tm.makePanel().reindex(items=['ItemA'], minor_axis=['A']) - tm.assert_series_equal(p.squeeze(), p.loc['ItemA', :, 'A']) - # don't fail with 0 length dimensions GH11229 & GH8999 empty_series = Series([], name='five') empty_frame = DataFrame([empty_series]) @@ -789,8 +769,6 @@ def test_numpy_squeeze(self): tm.assert_series_equal(np.squeeze(df), df['A']) def test_transpose(self): - msg = (r"transpose\(\) got multiple values for " - r"keyword argument 'axes'") for s in [tm.makeFloatSeries(), tm.makeStringSeries(), tm.makeObjectSeries()]: # calls implementation in pandas/core/base.py @@ -798,14 +776,6 @@ def test_transpose(self): for df in [tm.makeTimeDataFrame()]: tm.assert_frame_equal(df.transpose().transpose(), df) - with catch_warnings(record=True): - simplefilter("ignore", FutureWarning) - for p in [tm.makePanel()]: - tm.assert_panel_equal(p.transpose(2, 0, 1) - .transpose(1, 2, 0), p) - with pytest.raises(TypeError, match=msg): - p.transpose(2, 0, 1, axes=(2, 0, 1)) - def test_numpy_transpose(self): msg = "the 'axes' parameter is not supported" @@ -821,13 +791,6 @@ def test_numpy_transpose(self): with pytest.raises(ValueError, match=msg): np.transpose(df, axes=1) - with catch_warnings(record=True): - simplefilter("ignore", FutureWarning) - p = tm.makePanel() - tm.assert_panel_equal(np.transpose( - np.transpose(p, axes=(2, 0, 1)), - axes=(1, 2, 0)), p) - def test_take(self): indices = [1, 5, -2, 6, 3, -1] for s in [tm.makeFloatSeries(), tm.makeStringSeries(), @@ -843,27 +806,12 @@ def test_take(self): columns=df.columns) tm.assert_frame_equal(out, expected) - indices = [-3, 2, 0, 1] - with catch_warnings(record=True): - simplefilter("ignore", FutureWarning) - for p in [tm.makePanel()]: - out = p.take(indices) - expected = Panel(data=p.values.take(indices, axis=0), - items=p.items.take(indices), - major_axis=p.major_axis, - minor_axis=p.minor_axis) - tm.assert_panel_equal(out, expected) - def test_take_invalid_kwargs(self): indices = [-3, 2, 0, 1] s = tm.makeFloatSeries() df = tm.makeTimeDataFrame() - with catch_warnings(record=True): - simplefilter("ignore", FutureWarning) - p = tm.makePanel() - - for obj in (s, df, p): + for obj in (s, df): msg = r"take\(\) got an unexpected keyword argument 'foo'" with pytest.raises(TypeError, match=msg): obj.take(indices, foo=2) @@ -966,12 +914,6 @@ def test_equals(self): assert a.equals(e) assert e.equals(f) - def test_describe_raises(self): - with catch_warnings(record=True): - simplefilter("ignore", FutureWarning) - with pytest.raises(NotImplementedError): - tm.makePanel().describe() - def test_pipe(self): df = DataFrame({'A': [1, 2, 3]}) f = lambda x, y: x ** y @@ -1000,22 +942,6 @@ def test_pipe_tuple_error(self): with pytest.raises(ValueError): df.A.pipe((f, 'y'), x=1, y=0) - def test_pipe_panel(self): - with catch_warnings(record=True): - simplefilter("ignore", FutureWarning) - wp = Panel({'r1': DataFrame({"A": [1, 2, 3]})}) - f = lambda x, y: x + y - result = wp.pipe(f, 2) - expected = wp + 2 - assert_panel_equal(result, expected) - - result = wp.pipe((f, 'y'), x=1) - expected = wp + 1 - assert_panel_equal(result, expected) - - with pytest.raises(ValueError): - wp.pipe((f, 'y'), x=1, y=1) - @pytest.mark.parametrize('box', [pd.Series, pd.DataFrame]) def test_axis_classmethods(self, box): obj = box() diff --git a/pandas/tests/generic/test_panel.py b/pandas/tests/generic/test_panel.py deleted file mode 100644 index 8b090d951957e..0000000000000 --- a/pandas/tests/generic/test_panel.py +++ /dev/null @@ -1,59 +0,0 @@ -# -*- coding: utf-8 -*- -# 
pylint: disable-msg=E1101,W0612 - -from warnings import catch_warnings, simplefilter - -import pandas.util._test_decorators as td - -from pandas import Panel -import pandas.util.testing as tm -from pandas.util.testing import assert_almost_equal, assert_panel_equal - -from .test_generic import Generic - - -class TestPanel(Generic): - _typ = Panel - _comparator = lambda self, x, y: assert_panel_equal(x, y, by_blocks=True) - - @td.skip_if_no('xarray', min_version='0.7.0') - def test_to_xarray(self): - from xarray import DataArray - - with catch_warnings(record=True): - simplefilter("ignore", FutureWarning) - p = tm.makePanel() - - result = p.to_xarray() - assert isinstance(result, DataArray) - assert len(result.coords) == 3 - assert_almost_equal(list(result.coords.keys()), - ['items', 'major_axis', 'minor_axis']) - assert len(result.dims) == 3 - - # idempotency - assert_panel_equal(result.to_pandas(), p) - - -# run all the tests, but wrap each in a warning catcher -for t in ['test_rename', 'test_get_numeric_data', - 'test_get_default', 'test_nonzero', - 'test_downcast', 'test_constructor_compound_dtypes', - 'test_head_tail', - 'test_size_compat', 'test_split_compat', - 'test_unexpected_keyword', - 'test_stat_unexpected_keyword', 'test_api_compat', - 'test_stat_non_defaults_args', - 'test_truncate_out_of_bounds', - 'test_metadata_propagation', 'test_copy_and_deepcopy', - 'test_pct_change', 'test_sample']: - - def f(): - def tester(self): - f = getattr(super(TestPanel, self), t) - with catch_warnings(record=True): - simplefilter("ignore", FutureWarning) - f() - return tester - - setattr(TestPanel, t, f()) diff --git a/pandas/tests/groupby/aggregate/test_aggregate.py b/pandas/tests/groupby/aggregate/test_aggregate.py index 62ec0555f9033..0c2e74c0b735f 100644 --- a/pandas/tests/groupby/aggregate/test_aggregate.py +++ b/pandas/tests/groupby/aggregate/test_aggregate.py @@ -3,12 +3,11 @@ """ test .agg behavior / note that .apply is tested generally in test_groupby.py """ +from collections import OrderedDict import numpy as np import pytest -from pandas.compat import OrderedDict - import pandas as pd from pandas import DataFrame, Index, MultiIndex, Series, concat from pandas.core.base import SpecificationError @@ -287,3 +286,20 @@ def test_multi_function_flexible_mix(df): with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): result = grouped.aggregate(d) tm.assert_frame_equal(result, expected) + + +def test_groupby_agg_coercing_bools(): + # issue 14873 + dat = pd.DataFrame( + {'a': [1, 1, 2, 2], 'b': [0, 1, 2, 3], 'c': [None, None, 1, 1]}) + gp = dat.groupby('a') + + index = Index([1, 2], name='a') + + result = gp['b'].aggregate(lambda x: (x != 0).all()) + expected = Series([False, True], index=index, name='b') + tm.assert_series_equal(result, expected) + + result = gp['c'].aggregate(lambda x: x.isnull().all()) + expected = Series([True, False], index=index, name='c') + tm.assert_series_equal(result, expected) diff --git a/pandas/tests/groupby/aggregate/test_other.py b/pandas/tests/groupby/aggregate/test_other.py index b5214b11bddcc..cacfdb7694de1 100644 --- a/pandas/tests/groupby/aggregate/test_other.py +++ b/pandas/tests/groupby/aggregate/test_other.py @@ -512,3 +512,18 @@ def test_agg_list_like_func(): expected = pd.DataFrame({'A': [str(x) for x in range(3)], 'B': [[str(x)] for x in range(3)]}) tm.assert_frame_equal(result, expected) + + +def test_agg_lambda_with_timezone(): + # GH 23683 + df = pd.DataFrame({ + 'tag': [1, 1], + 'date': [ + pd.Timestamp('2018-01-01', tz='UTC'), + 
pd.Timestamp('2018-01-02', tz='UTC')] + }) + result = df.groupby('tag').agg({'date': lambda e: e.head(1)}) + expected = pd.DataFrame([pd.Timestamp('2018-01-01', tz='UTC')], + index=pd.Index([1], name='tag'), + columns=['date']) + tm.assert_frame_equal(result, expected) diff --git a/pandas/tests/groupby/test_function.py b/pandas/tests/groupby/test_function.py index a884a37840f8a..b5e328ef64424 100644 --- a/pandas/tests/groupby/test_function.py +++ b/pandas/tests/groupby/test_function.py @@ -897,6 +897,15 @@ def test_nunique_with_timegrouper(): tm.assert_series_equal(result, expected) +def test_nunique_preserves_column_level_names(): + # GH 23222 + test = pd.DataFrame([1, 2, 2], + columns=pd.Index(['A'], name="level_0")) + result = test.groupby([0, 0, 0]).nunique() + expected = pd.DataFrame([2], columns=test.columns) + tm.assert_frame_equal(result, expected) + + # count # -------------------------------- @@ -1060,6 +1069,55 @@ def test_size(df): tm.assert_series_equal(df.groupby('A').size(), out) +# quantile +# -------------------------------- +@pytest.mark.parametrize("interpolation", [ + "linear", "lower", "higher", "nearest", "midpoint"]) +@pytest.mark.parametrize("a_vals,b_vals", [ + # Ints + ([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]), + ([1, 2, 3, 4], [4, 3, 2, 1]), + ([1, 2, 3, 4, 5], [4, 3, 2, 1]), + # Floats + ([1., 2., 3., 4., 5.], [5., 4., 3., 2., 1.]), + # Missing data + ([1., np.nan, 3., np.nan, 5.], [5., np.nan, 3., np.nan, 1.]), + ([np.nan, 4., np.nan, 2., np.nan], [np.nan, 4., np.nan, 2., np.nan]), + # Timestamps + ([x for x in pd.date_range('1/1/18', freq='D', periods=5)], + [x for x in pd.date_range('1/1/18', freq='D', periods=5)][::-1]), + # All NA + ([np.nan] * 5, [np.nan] * 5), +]) +@pytest.mark.parametrize('q', [0, .25, .5, .75, 1]) +def test_quantile(interpolation, a_vals, b_vals, q): + if interpolation == 'nearest' and q == 0.5 and b_vals == [4, 3, 2, 1]: + pytest.skip("Unclear numpy expectation for nearest result with " + "equidistant data") + + a_expected = pd.Series(a_vals).quantile(q, interpolation=interpolation) + b_expected = pd.Series(b_vals).quantile(q, interpolation=interpolation) + + df = DataFrame({ + 'key': ['a'] * len(a_vals) + ['b'] * len(b_vals), + 'val': a_vals + b_vals}) + + expected = DataFrame([a_expected, b_expected], columns=['val'], + index=Index(['a', 'b'], name='key')) + result = df.groupby('key').quantile(q, interpolation=interpolation) + + tm.assert_frame_equal(result, expected) + + +def test_quantile_raises(): + df = pd.DataFrame([ + ['foo', 'a'], ['foo', 'b'], ['foo', 'c']], columns=['key', 'val']) + + with pytest.raises(TypeError, match="cannot be performed against " + "'object' dtypes"): + df.groupby('key').quantile() + + # pipe # -------------------------------- diff --git a/pandas/tests/groupby/test_groupby.py b/pandas/tests/groupby/test_groupby.py index 98c917a6eca3c..6a11f0ae9b44a 100644 --- a/pandas/tests/groupby/test_groupby.py +++ b/pandas/tests/groupby/test_groupby.py @@ -1,15 +1,14 @@ # -*- coding: utf-8 -*- from __future__ import print_function -from collections import defaultdict +from collections import OrderedDict, defaultdict from datetime import datetime from decimal import Decimal import numpy as np import pytest -from pandas.compat import ( - OrderedDict, StringIO, lmap, lrange, lzip, map, range, zip) +from pandas.compat import StringIO, lmap, lrange, lzip, map, range, zip from pandas.errors import PerformanceWarning import pandas as pd @@ -209,7 +208,7 @@ def f(x, q=None, axis=0): trans_expected = ts_grouped.transform(g) 
assert_series_equal(apply_result, agg_expected) - assert_series_equal(agg_result, agg_expected, check_names=False) + assert_series_equal(agg_result, agg_expected) assert_series_equal(trans_result, trans_expected) agg_result = ts_grouped.agg(f, q=80) @@ -224,13 +223,13 @@ def f(x, q=None, axis=0): agg_result = df_grouped.agg(np.percentile, 80, axis=0) apply_result = df_grouped.apply(DataFrame.quantile, .8) expected = df_grouped.quantile(.8) - assert_frame_equal(apply_result, expected) - assert_frame_equal(agg_result, expected, check_names=False) + assert_frame_equal(apply_result, expected, check_names=False) + assert_frame_equal(agg_result, expected) agg_result = df_grouped.agg(f, q=80) apply_result = df_grouped.apply(DataFrame.quantile, q=.8) - assert_frame_equal(agg_result, expected, check_names=False) - assert_frame_equal(apply_result, expected) + assert_frame_equal(agg_result, expected) + assert_frame_equal(apply_result, expected, check_names=False) def test_len(): @@ -1219,51 +1218,6 @@ def test_groupby_nat_exclude(): grouped.get_group(pd.NaT) -@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") -def test_sparse_friendly(df): - sdf = df[['C', 'D']].to_sparse() - panel = tm.makePanel() - tm.add_nans(panel) - - def _check_work(gp): - gp.mean() - gp.agg(np.mean) - dict(iter(gp)) - - # it works! - _check_work(sdf.groupby(lambda x: x // 2)) - _check_work(sdf['C'].groupby(lambda x: x // 2)) - _check_work(sdf.groupby(df['A'])) - - # do this someday - # _check_work(panel.groupby(lambda x: x.month, axis=1)) - - -@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") -def test_panel_groupby(): - panel = tm.makePanel() - tm.add_nans(panel) - grouped = panel.groupby({'ItemA': 0, 'ItemB': 0, 'ItemC': 1}, - axis='items') - agged = grouped.mean() - agged2 = grouped.agg(lambda x: x.mean('items')) - - tm.assert_panel_equal(agged, agged2) - - tm.assert_index_equal(agged.items, Index([0, 1])) - - grouped = panel.groupby(lambda x: x.month, axis='major') - agged = grouped.mean() - - exp = Index(sorted(list(set(panel.major_axis.month)))) - tm.assert_index_equal(agged.major_axis, exp) - - grouped = panel.groupby({'A': 0, 'B': 0, 'C': 1, 'D': 1}, - axis='minor') - agged = grouped.mean() - tm.assert_index_equal(agged.minor_axis, Index([0, 1])) - - def test_groupby_2d_malformed(): d = DataFrame(index=lrange(2)) d['group'] = ['g1', 'g2'] @@ -1744,3 +1698,19 @@ def test_groupby_agg_ohlc_non_first(): result = df.groupby(pd.Grouper(freq='D')).agg(['sum', 'ohlc']) tm.assert_frame_equal(result, expected) + + +def test_groupby_multiindex_nat(): + # GH 9236 + values = [ + (pd.NaT, 'a'), + (datetime(2012, 1, 2), 'a'), + (datetime(2012, 1, 2), 'b'), + (datetime(2012, 1, 3), 'a') + ] + mi = pd.MultiIndex.from_tuples(values, names=['date', None]) + ser = pd.Series([3, 2, 2.5, 4], index=mi) + + result = ser.groupby(level=1).mean() + expected = pd.Series([3., 2.5], index=["a", "b"]) + assert_series_equal(result, expected) diff --git a/pandas/tests/groupby/test_grouping.py b/pandas/tests/groupby/test_grouping.py index a509a7cb57c97..44b5bd5f13992 100644 --- a/pandas/tests/groupby/test_grouping.py +++ b/pandas/tests/groupby/test_grouping.py @@ -14,8 +14,7 @@ from pandas.core.groupby.grouper import Grouping import pandas.util.testing as tm from pandas.util.testing import ( - assert_almost_equal, assert_frame_equal, assert_panel_equal, - assert_series_equal) + assert_almost_equal, assert_frame_equal, assert_series_equal) # selection # -------------------------------- @@ -563,17 +562,7 @@ def 
test_list_grouper_with_nat(self):

 # --------------------------------

 class TestGetGroup():
-
-    @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning")
    def test_get_group(self):
-        wp = tm.makePanel()
-        grouped = wp.groupby(lambda x: x.month, axis='major')
-
-        gp = grouped.get_group(1)
-        expected = wp.reindex(
-            major=[x for x in wp.major_axis if x.month == 1])
-        assert_panel_equal(gp, expected)
-
        # GH 5267
        # be datelike friendly
        df = DataFrame({'DATE': pd.to_datetime(
@@ -755,19 +744,6 @@ def test_multi_iter_frame(self, three_group):
            for key, group in grouped:
                pass

-    @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning")
-    def test_multi_iter_panel(self):
-        wp = tm.makePanel()
-        grouped = wp.groupby([lambda x: x.month, lambda x: x.weekday()],
-                             axis=1)
-
-        for (month, wd), group in grouped:
-            exp_axis = [x
-                        for x in wp.major_axis
-                        if x.month == month and x.weekday() == wd]
-            expected = wp.reindex(major=exp_axis)
-            assert_panel_equal(group, expected)
-
    def test_dictify(self, df):
        dict(iter(df.groupby('A')))
        dict(iter(df.groupby(['A', 'B'])))
diff --git a/pandas/tests/groupby/test_nth.py b/pandas/tests/groupby/test_nth.py
index 255d9a8acf2d0..7a3d189d3020e 100644
--- a/pandas/tests/groupby/test_nth.py
+++ b/pandas/tests/groupby/test_nth.py
@@ -278,6 +278,26 @@ def test_first_last_tz(data, expected_first, expected_last):
    assert_frame_equal(result, expected[['id', 'time']])


+@pytest.mark.parametrize('method, ts, alpha', [
+    ['first', Timestamp('2013-01-01', tz='US/Eastern'), 'a'],
+    ['last', Timestamp('2013-01-02', tz='US/Eastern'), 'b']
+])
+def test_first_last_tz_multi_column(method, ts, alpha):
+    # GH 21603
+    df = pd.DataFrame({'group': [1, 1, 2],
+                       'category_string': pd.Series(list('abc')).astype(
+                           'category'),
+                       'datetimetz': pd.date_range('20130101', periods=3,
+                                                   tz='US/Eastern')})
+    result = getattr(df.groupby('group'), method)()
+    expected = pd.DataFrame({'category_string': [alpha, 'c'],
+                             'datetimetz': [ts,
+                                            Timestamp('2013-01-03',
+                                                      tz='US/Eastern')]},
+                            index=pd.Index([1, 2], name='group'))
+    assert_frame_equal(result, expected)
+
+
 def test_nth_multi_index_as_expected():
    # PR 9090, related to issue 8979
    # test nth on MultiIndex
diff --git a/pandas/tests/groupby/test_transform.py b/pandas/tests/groupby/test_transform.py
index f120402e6e8ca..b645073fcf72a 100644
--- a/pandas/tests/groupby/test_transform.py
+++ b/pandas/tests/groupby/test_transform.py
@@ -834,3 +834,14 @@ def demean_rename(x):
    tm.assert_frame_equal(result, expected)
    result_single = df.groupby('group').value.transform(demean_rename)
    tm.assert_series_equal(result_single, expected['value'])
+
+
+@pytest.mark.parametrize('func', [min, max, np.min, np.max, 'first', 'last'])
+def test_groupby_transform_timezone_column(func):
+    # GH 24198
+    ts = pd.to_datetime('now', utc=True).tz_convert('Asia/Singapore')
+    result = pd.DataFrame({'end_time': [ts], 'id': [1]})
+    result['max_end_time'] = result.groupby('id').end_time.transform(func)
+    expected = pd.DataFrame([[ts, 1, ts]], columns=['end_time', 'id',
+                                                    'max_end_time'])
+    tm.assert_frame_equal(result, expected)
diff --git a/pandas/tests/indexes/common.py b/pandas/tests/indexes/common.py
index 499f01f0e7f7b..6d29c147c4a4a 100644
--- a/pandas/tests/indexes/common.py
+++ b/pandas/tests/indexes/common.py
@@ -30,7 +30,12 @@ def setup_indices(self):

    def test_pickle_compat_construction(self):
        # need an object to create with
-        pytest.raises(TypeError, self._holder)
+        msg = (r"Index\(\.\.\.\) must be called with a collection of some"
+               r" kind, None was passed|"
+
r"__new__\(\) missing 1 required positional argument: 'data'|" + r"__new__\(\) takes at least 2 arguments \(1 given\)") + with pytest.raises(TypeError, match=msg): + self._holder() def test_to_series(self): # assert that we are creating a copy of the index @@ -84,8 +89,11 @@ def test_shift(self): # GH8083 test the base class for shift idx = self.create_index() - pytest.raises(NotImplementedError, idx.shift, 1) - pytest.raises(NotImplementedError, idx.shift, 1, 2) + msg = "Not supported for type {}".format(type(idx).__name__) + with pytest.raises(NotImplementedError, match=msg): + idx.shift(1) + with pytest.raises(NotImplementedError, match=msg): + idx.shift(1, 2) def test_create_index_existing_name(self): @@ -478,7 +486,7 @@ def test_union_base(self): with pytest.raises(TypeError, match=msg): first.union([1, 2, 3]) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_difference_base(self, sort): for name, idx in compat.iteritems(self.indices): first = idx[2:] @@ -905,3 +913,24 @@ def test_astype_category(self, copy, name, ordered): result = index.astype('category', copy=copy) expected = CategoricalIndex(index.values, name=name) tm.assert_index_equal(result, expected) + + def test_is_unique(self): + # initialize a unique index + index = self.create_index().drop_duplicates() + assert index.is_unique is True + + # empty index should be unique + index_empty = index[:0] + assert index_empty.is_unique is True + + # test basic dupes + index_dup = index.insert(0, index[0]) + assert index_dup.is_unique is False + + # single NA should be unique + index_na = index.insert(0, np.nan) + assert index_na.is_unique is True + + # multiple NA should not be unique + index_na_dup = index_na.insert(0, np.nan) + assert index_na_dup.is_unique is False diff --git a/pandas/tests/indexes/datetimes/test_construction.py b/pandas/tests/indexes/datetimes/test_construction.py index 7ebebbf6dee28..6893f635c82ac 100644 --- a/pandas/tests/indexes/datetimes/test_construction.py +++ b/pandas/tests/indexes/datetimes/test_construction.py @@ -135,8 +135,10 @@ def test_construction_with_alt_tz_localize(self, kwargs, tz_aware_fixture): tm.assert_index_equal(i2, expected) # incompat tz/dtype - pytest.raises(ValueError, lambda: DatetimeIndex( - i.tz_localize(None).asi8, dtype=i.dtype, tz='US/Pacific')) + msg = "cannot supply both a tz and a dtype with a tz" + with pytest.raises(ValueError, match=msg): + DatetimeIndex(i.tz_localize(None).asi8, + dtype=i.dtype, tz='US/Pacific') def test_construction_index_with_mixed_timezones(self): # gh-11488: no tz results in DatetimeIndex @@ -439,14 +441,19 @@ def test_constructor_coverage(self): tm.assert_index_equal(from_ints, expected) # non-conforming - pytest.raises(ValueError, DatetimeIndex, - ['2000-01-01', '2000-01-02', '2000-01-04'], freq='D') + msg = ("Inferred frequency None from passed values does not conform" + " to passed frequency D") + with pytest.raises(ValueError, match=msg): + DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-04'], freq='D') - pytest.raises(ValueError, date_range, start='2011-01-01', - freq='b') - pytest.raises(ValueError, date_range, end='2011-01-01', - freq='B') - pytest.raises(ValueError, date_range, periods=10, freq='D') + msg = ("Of the four parameters: start, end, periods, and freq, exactly" + " three must be specified") + with pytest.raises(ValueError, match=msg): + date_range(start='2011-01-01', freq='b') + with pytest.raises(ValueError, match=msg): + date_range(end='2011-01-01', freq='B') + with 
pytest.raises(ValueError, match=msg): + date_range(periods=10, freq='D') @pytest.mark.parametrize('freq', ['AS', 'W-SUN']) def test_constructor_datetime64_tzformat(self, freq): @@ -511,18 +518,20 @@ def test_constructor_dtype(self): idx = DatetimeIndex(['2013-01-01', '2013-01-02'], dtype='datetime64[ns, US/Eastern]') - pytest.raises(ValueError, - lambda: DatetimeIndex(idx, - dtype='datetime64[ns]')) + msg = ("cannot supply both a tz and a timezone-naive dtype" + r" \(i\.e\. datetime64\[ns\]\)") + with pytest.raises(ValueError, match=msg): + DatetimeIndex(idx, dtype='datetime64[ns]') # this is effectively trying to convert tz's - pytest.raises(TypeError, - lambda: DatetimeIndex(idx, - dtype='datetime64[ns, CET]')) - pytest.raises(ValueError, - lambda: DatetimeIndex( - idx, tz='CET', - dtype='datetime64[ns, US/Eastern]')) + msg = ("data is already tz-aware US/Eastern, unable to set specified" + " tz: CET") + with pytest.raises(TypeError, match=msg): + DatetimeIndex(idx, dtype='datetime64[ns, CET]') + msg = "cannot supply both a tz and a dtype with a tz" + with pytest.raises(ValueError, match=msg): + DatetimeIndex(idx, tz='CET', dtype='datetime64[ns, US/Eastern]') + result = DatetimeIndex(idx, dtype='datetime64[ns, US/Eastern]') tm.assert_index_equal(idx, result) @@ -732,7 +741,9 @@ def test_from_freq_recreate_from_data(self, freq): def test_datetimeindex_constructor_misc(self): arr = ['1/1/2005', '1/2/2005', 'Jn 3, 2005', '2005-01-04'] - pytest.raises(Exception, DatetimeIndex, arr) + msg = r"(\(u?')?Unknown string format(:', 'Jn 3, 2005'\))?" + with pytest.raises(ValueError, match=msg): + DatetimeIndex(arr) arr = ['1/1/2005', '1/2/2005', '1/3/2005', '2005-01-04'] idx1 = DatetimeIndex(arr) diff --git a/pandas/tests/indexes/datetimes/test_date_range.py b/pandas/tests/indexes/datetimes/test_date_range.py index a9bece248e9d0..a38ee264d362c 100644 --- a/pandas/tests/indexes/datetimes/test_date_range.py +++ b/pandas/tests/indexes/datetimes/test_date_range.py @@ -346,8 +346,10 @@ def test_compat_replace(self, f): def test_catch_infinite_loop(self): offset = offsets.DateOffset(minute=5) # blow up, don't loop forever - pytest.raises(Exception, date_range, datetime(2011, 11, 11), - datetime(2011, 11, 12), freq=offset) + msg = "Offset did not increment date" + with pytest.raises(ValueError, match=msg): + date_range(datetime(2011, 11, 11), datetime(2011, 11, 12), + freq=offset) @pytest.mark.parametrize('periods', (1, 2)) def test_wom_len(self, periods): diff --git a/pandas/tests/indexes/datetimes/test_datetime.py b/pandas/tests/indexes/datetimes/test_datetime.py index e1ba0e1708442..c7147e6fe7063 100644 --- a/pandas/tests/indexes/datetimes/test_datetime.py +++ b/pandas/tests/indexes/datetimes/test_datetime.py @@ -100,9 +100,8 @@ def test_hash_error(self): def test_stringified_slice_with_tz(self): # GH#2658 - import datetime - start = datetime.datetime.now() - idx = date_range(start=start, freq="1d", periods=10) + start = '2013-01-07' + idx = date_range(start=start, freq="1d", periods=10, tz='US/Eastern') df = DataFrame(lrange(10), index=idx) df["2013-01-14 23:44:34.437768-05:00":] # no exception here diff --git a/pandas/tests/indexes/datetimes/test_misc.py b/pandas/tests/indexes/datetimes/test_misc.py index cec181161fc11..fc6080e68a803 100644 --- a/pandas/tests/indexes/datetimes/test_misc.py +++ b/pandas/tests/indexes/datetimes/test_misc.py @@ -190,7 +190,9 @@ def test_datetimeindex_accessors(self): # Ensure is_start/end accessors throw ValueError for CustomBusinessDay, bday_egypt = 
offsets.CustomBusinessDay(weekmask='Sun Mon Tue Wed Thu') dti = date_range(datetime(2013, 4, 30), periods=5, freq=bday_egypt) - pytest.raises(ValueError, lambda: dti.is_month_start) + msg = "Custom business days is not supported by is_month_start" + with pytest.raises(ValueError, match=msg): + dti.is_month_start dti = DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03']) diff --git a/pandas/tests/indexes/datetimes/test_ops.py b/pandas/tests/indexes/datetimes/test_ops.py index 2a546af79931e..84085141fcf92 100644 --- a/pandas/tests/indexes/datetimes/test_ops.py +++ b/pandas/tests/indexes/datetimes/test_ops.py @@ -37,15 +37,19 @@ def test_ops_properties_basic(self): # sanity check that the behavior didn't change # GH#7206 + msg = "'Series' object has no attribute '{}'" for op in ['year', 'day', 'second', 'weekday']: - pytest.raises(TypeError, lambda x: getattr(self.dt_series, op)) + with pytest.raises(AttributeError, match=msg.format(op)): + getattr(self.dt_series, op) # attribute access should still work! s = Series(dict(year=2000, month=1, day=10)) assert s.year == 2000 assert s.month == 1 assert s.day == 10 - pytest.raises(AttributeError, lambda: s.weekday) + msg = "'Series' object has no attribute 'weekday'" + with pytest.raises(AttributeError, match=msg): + s.weekday def test_repeat_range(self, tz_naive_fixture): tz = tz_naive_fixture diff --git a/pandas/tests/indexes/datetimes/test_partial_slicing.py b/pandas/tests/indexes/datetimes/test_partial_slicing.py index 1b2aab9d370a3..64693324521b3 100644 --- a/pandas/tests/indexes/datetimes/test_partial_slicing.py +++ b/pandas/tests/indexes/datetimes/test_partial_slicing.py @@ -170,7 +170,8 @@ def test_partial_slice(self): result = s['2005-1-1'] assert result == s.iloc[0] - pytest.raises(Exception, s.__getitem__, '2004-12-31') + with pytest.raises(KeyError, match=r"^'2004-12-31'$"): + s['2004-12-31'] def test_partial_slice_daily(self): rng = date_range(freq='H', start=datetime(2005, 1, 31), periods=500) @@ -179,7 +180,8 @@ def test_partial_slice_daily(self): result = s['2005-1-31'] tm.assert_series_equal(result, s.iloc[:24]) - pytest.raises(Exception, s.__getitem__, '2004-12-31 00') + with pytest.raises(KeyError, match=r"^'2004-12-31 00'$"): + s['2004-12-31 00'] def test_partial_slice_hourly(self): rng = date_range(freq='T', start=datetime(2005, 1, 1, 20, 0, 0), @@ -193,7 +195,8 @@ def test_partial_slice_hourly(self): tm.assert_series_equal(result, s.iloc[:60]) assert s['2005-1-1 20:00'] == s.iloc[0] - pytest.raises(Exception, s.__getitem__, '2004-12-31 00:15') + with pytest.raises(KeyError, match=r"^'2004-12-31 00:15'$"): + s['2004-12-31 00:15'] def test_partial_slice_minutely(self): rng = date_range(freq='S', start=datetime(2005, 1, 1, 23, 59, 0), @@ -207,7 +210,8 @@ def test_partial_slice_minutely(self): tm.assert_series_equal(result, s.iloc[:60]) assert s[Timestamp('2005-1-1 23:59:00')] == s.iloc[0] - pytest.raises(Exception, s.__getitem__, '2004-12-31 00:00:00') + with pytest.raises(KeyError, match=r"^'2004-12-31 00:00:00'$"): + s['2004-12-31 00:00:00'] def test_partial_slice_second_precision(self): rng = date_range(start=datetime(2005, 1, 1, 0, 0, 59, @@ -255,7 +259,9 @@ def test_partial_slicing_dataframe(self): result = df['a'][ts_string] assert isinstance(result, np.int64) assert result == expected - pytest.raises(KeyError, df.__getitem__, ts_string) + msg = r"^'{}'$".format(ts_string) + with pytest.raises(KeyError, match=msg): + df[ts_string] # Timestamp with resolution less precise than index for fmt in formats[:rnum]: @@ -282,15 
+288,20 @@ def test_partial_slicing_dataframe(self): result = df['a'][ts_string] assert isinstance(result, np.int64) assert result == 2 - pytest.raises(KeyError, df.__getitem__, ts_string) + msg = r"^'{}'$".format(ts_string) + with pytest.raises(KeyError, match=msg): + df[ts_string] # Not compatible with existing key # Should raise KeyError for fmt, res in list(zip(formats, resolutions))[rnum + 1:]: ts = index[1] + Timedelta("1 " + res) ts_string = ts.strftime(fmt) - pytest.raises(KeyError, df['a'].__getitem__, ts_string) - pytest.raises(KeyError, df.__getitem__, ts_string) + msg = r"^'{}'$".format(ts_string) + with pytest.raises(KeyError, match=msg): + df['a'][ts_string] + with pytest.raises(KeyError, match=msg): + df[ts_string] def test_partial_slicing_with_multiindex(self): @@ -316,11 +327,10 @@ def test_partial_slicing_with_multiindex(self): # this is an IndexingError as we don't do partial string selection on # multi-levels. - def f(): + msg = "Too many indexers" + with pytest.raises(IndexingError, match=msg): df_multi.loc[('2013-06-19', 'ACCT1', 'ABC')] - pytest.raises(IndexingError, f) - # GH 4294 # partial slice on a series mi s = pd.DataFrame(np.random.rand(1000, 1000), index=pd.date_range( @@ -386,3 +396,30 @@ def test_selection_by_datetimelike(self, datetimelike, op, expected): result = op(df.A, datetimelike) expected = Series(expected, name='A') tm.assert_series_equal(result, expected) + + @pytest.mark.parametrize('start', [ + '2018-12-02 21:50:00+00:00', pd.Timestamp('2018-12-02 21:50:00+00:00'), + pd.Timestamp('2018-12-02 21:50:00+00:00').to_pydatetime() + ]) + @pytest.mark.parametrize('end', [ + '2018-12-02 21:52:00+00:00', pd.Timestamp('2018-12-02 21:52:00+00:00'), + pd.Timestamp('2018-12-02 21:52:00+00:00').to_pydatetime() + ]) + def test_getitem_with_datestring_with_UTC_offset(self, start, end): + # GH 24076 + idx = pd.date_range(start='2018-12-02 14:50:00-07:00', + end='2018-12-02 14:50:00-07:00', freq='1min') + df = pd.DataFrame(1, index=idx, columns=['A']) + result = df[start:end] + expected = df.iloc[0:3, :] + tm.assert_frame_equal(result, expected) + + # GH 16785 + start = str(start) + end = str(end) + with pytest.raises(ValueError, match="Both dates must"): + df[start:end[:-4] + '1:00'] + + with pytest.raises(ValueError, match="The index must be timezone"): + df = df.tz_localize(None) + df[start:end] diff --git a/pandas/tests/indexes/datetimes/test_scalar_compat.py b/pandas/tests/indexes/datetimes/test_scalar_compat.py index 680eddd27cf9f..42338a751e0fc 100644 --- a/pandas/tests/indexes/datetimes/test_scalar_compat.py +++ b/pandas/tests/indexes/datetimes/test_scalar_compat.py @@ -7,6 +7,8 @@ import numpy as np import pytest +from pandas._libs.tslibs.np_datetime import OutOfBoundsDatetime + import pandas as pd from pandas import DatetimeIndex, Timestamp, date_range import pandas.util.testing as tm @@ -27,10 +29,14 @@ def test_dti_date(self): expected = [t.date() for t in rng] assert (result == expected).all() - def test_dti_date_out_of_range(self): + @pytest.mark.parametrize('data', [ + ['1400-01-01'], + [datetime(1400, 1, 1)]]) + def test_dti_date_out_of_range(self, data): # GH#1475 - pytest.raises(ValueError, DatetimeIndex, ['1400-01-01']) - pytest.raises(ValueError, DatetimeIndex, [datetime(1400, 1, 1)]) + msg = "Out of bounds nanosecond timestamp: 1400-01-01 00:00:00" + with pytest.raises(OutOfBoundsDatetime, match=msg): + DatetimeIndex(data) @pytest.mark.parametrize('field', [ 'dayofweek', 'dayofyear', 'week', 'weekofyear', 'quarter', @@ -74,9 +80,15 @@ def 
test_round_daily(self): result = dti.round('s') tm.assert_index_equal(result, dti) - # invalid - for freq in ['Y', 'M', 'foobar']: - pytest.raises(ValueError, lambda: dti.round(freq)) + @pytest.mark.parametrize('freq, error_msg', [ + ('Y', ' is a non-fixed frequency'), + ('M', ' is a non-fixed frequency'), + ('foobar', 'Invalid frequency: foobar')]) + def test_round_invalid(self, freq, error_msg): + dti = date_range('20130101 09:10:11', periods=5) + dti = dti.tz_localize('UTC').tz_convert('US/Eastern') + with pytest.raises(ValueError, match=error_msg): + dti.round(freq) def test_round(self, tz_naive_fixture): tz = tz_naive_fixture diff --git a/pandas/tests/indexes/datetimes/test_setops.py b/pandas/tests/indexes/datetimes/test_setops.py index bd37cc815d0f7..cf1f75234ec62 100644 --- a/pandas/tests/indexes/datetimes/test_setops.py +++ b/pandas/tests/indexes/datetimes/test_setops.py @@ -21,83 +21,107 @@ class TestDatetimeIndexSetOps(object): 'dateutil/US/Pacific'] # TODO: moved from test_datetimelike; dedup with version below - def test_union2(self): + @pytest.mark.parametrize("sort", [None, False]) + def test_union2(self, sort): everything = tm.makeDateIndex(10) first = everything[:5] second = everything[5:] - union = first.union(second) - assert tm.equalContents(union, everything) + union = first.union(second, sort=sort) + tm.assert_index_equal(union, everything) # GH 10149 cases = [klass(second.values) for klass in [np.array, Series, list]] for case in cases: - result = first.union(case) - assert tm.equalContents(result, everything) + result = first.union(case, sort=sort) + tm.assert_index_equal(result, everything) @pytest.mark.parametrize("tz", tz) - def test_union(self, tz): + @pytest.mark.parametrize("sort", [None, False]) + def test_union(self, tz, sort): rng1 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) other1 = pd.date_range('1/6/2000', freq='D', periods=5, tz=tz) expected1 = pd.date_range('1/1/2000', freq='D', periods=10, tz=tz) + expected1_notsorted = pd.DatetimeIndex(list(other1) + list(rng1)) rng2 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) other2 = pd.date_range('1/4/2000', freq='D', periods=5, tz=tz) expected2 = pd.date_range('1/1/2000', freq='D', periods=8, tz=tz) + expected2_notsorted = pd.DatetimeIndex(list(other2) + list(rng2[:3])) rng3 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) other3 = pd.DatetimeIndex([], tz=tz) expected3 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) - - for rng, other, expected in [(rng1, other1, expected1), - (rng2, other2, expected2), - (rng3, other3, expected3)]: - - result_union = rng.union(other) - tm.assert_index_equal(result_union, expected) - - def test_union_coverage(self): + expected3_notsorted = rng3 + + for rng, other, exp, exp_notsorted in [(rng1, other1, expected1, + expected1_notsorted), + (rng2, other2, expected2, + expected2_notsorted), + (rng3, other3, expected3, + expected3_notsorted)]: + + result_union = rng.union(other, sort=sort) + tm.assert_index_equal(result_union, exp) + + result_union = other.union(rng, sort=sort) + if sort is None: + tm.assert_index_equal(result_union, exp) + else: + tm.assert_index_equal(result_union, exp_notsorted) + + @pytest.mark.parametrize("sort", [None, False]) + def test_union_coverage(self, sort): idx = DatetimeIndex(['2000-01-03', '2000-01-01', '2000-01-02']) ordered = DatetimeIndex(idx.sort_values(), freq='infer') - result = ordered.union(idx) + result = ordered.union(idx, sort=sort) tm.assert_index_equal(result, ordered) - result = 
ordered[:0].union(ordered) + result = ordered[:0].union(ordered, sort=sort) tm.assert_index_equal(result, ordered) assert result.freq == ordered.freq - def test_union_bug_1730(self): + @pytest.mark.parametrize("sort", [None, False]) + def test_union_bug_1730(self, sort): rng_a = date_range('1/1/2012', periods=4, freq='3H') rng_b = date_range('1/1/2012', periods=4, freq='4H') - result = rng_a.union(rng_b) + result = rng_a.union(rng_b, sort=sort) exp = DatetimeIndex(sorted(set(list(rng_a)) | set(list(rng_b)))) tm.assert_index_equal(result, exp) - def test_union_bug_1745(self): + @pytest.mark.parametrize("sort", [None, False]) + def test_union_bug_1745(self, sort): left = DatetimeIndex(['2012-05-11 15:19:49.695000']) right = DatetimeIndex(['2012-05-29 13:04:21.322000', '2012-05-11 15:27:24.873000', '2012-05-11 15:31:05.350000']) - result = left.union(right) - exp = DatetimeIndex(sorted(set(list(left)) | set(list(right)))) + result = left.union(right, sort=sort) + exp = DatetimeIndex(['2012-05-11 15:19:49.695000', + '2012-05-29 13:04:21.322000', + '2012-05-11 15:27:24.873000', + '2012-05-11 15:31:05.350000']) + if sort is None: + exp = exp.sort_values() tm.assert_index_equal(result, exp) - def test_union_bug_4564(self): + @pytest.mark.parametrize("sort", [None, False]) + def test_union_bug_4564(self, sort): from pandas import DateOffset left = date_range("2013-01-01", "2013-02-01") right = left + DateOffset(minutes=15) - result = left.union(right) + result = left.union(right, sort=sort) exp = DatetimeIndex(sorted(set(list(left)) | set(list(right)))) tm.assert_index_equal(result, exp) - def test_union_freq_both_none(self): + @pytest.mark.parametrize("sort", [None, False]) + def test_union_freq_both_none(self, sort): # GH11086 expected = bdate_range('20150101', periods=10) expected.freq = None - result = expected.union(expected) + result = expected.union(expected, sort=sort) tm.assert_index_equal(result, expected) assert result.freq is None @@ -112,11 +136,14 @@ def test_union_dataframe_index(self): exp = pd.date_range('1/1/1980', '1/1/2012', freq='MS') tm.assert_index_equal(df.index, exp) - def test_union_with_DatetimeIndex(self): + @pytest.mark.parametrize("sort", [None, False]) + def test_union_with_DatetimeIndex(self, sort): i1 = Int64Index(np.arange(0, 20, 2)) i2 = date_range(start='2012-01-03 00:00:00', periods=10, freq='D') - i1.union(i2) # Works - i2.union(i1) # Fails with "AttributeError: can't set attribute" + # Works + i1.union(i2, sort=sort) + # Fails with "AttributeError: can't set attribute" + i2.union(i1, sort=sort) # TODO: moved from test_datetimelike; de-duplicate with version below def test_intersection2(self): @@ -138,7 +165,7 @@ def test_intersection2(self): @pytest.mark.parametrize("tz", [None, 'Asia/Tokyo', 'US/Eastern', 'dateutil/US/Pacific']) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_intersection(self, tz, sort): # GH 4690 (with tz) base = date_range('6/1/2000', '6/30/2000', freq='D', name='idx') @@ -187,7 +214,7 @@ def test_intersection(self, tz, sort): for (rng, expected) in [(rng2, expected2), (rng3, expected3), (rng4, expected4)]: result = base.intersection(rng, sort=sort) - if sort: + if sort is None: expected = expected.sort_values() tm.assert_index_equal(result, expected) assert result.name == expected.name @@ -212,7 +239,7 @@ def test_intersection_bug_1708(self): assert len(result) == 0 @pytest.mark.parametrize("tz", tz) - @pytest.mark.parametrize("sort", [True, False]) + 
@pytest.mark.parametrize("sort", [None, False]) def test_difference(self, tz, sort): rng_dates = ['1/2/2000', '1/3/2000', '1/1/2000', '1/4/2000', '1/5/2000'] @@ -233,11 +260,11 @@ def test_difference(self, tz, sort): (rng2, other2, expected2), (rng3, other3, expected3)]: result_diff = rng.difference(other, sort) - if sort: + if sort is None: expected = expected.sort_values() tm.assert_index_equal(result_diff, expected) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_difference_freq(self, sort): # GH14323: difference of DatetimeIndex should not preserve frequency @@ -254,7 +281,7 @@ def test_difference_freq(self, sort): tm.assert_index_equal(idx_diff, expected) tm.assert_attr_equal('freq', idx_diff, expected) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_datetimeindex_diff(self, sort): dti1 = date_range(freq='Q-JAN', start=datetime(1997, 12, 31), periods=100) @@ -262,11 +289,12 @@ def test_datetimeindex_diff(self, sort): periods=98) assert len(dti1.difference(dti2, sort)) == 2 - def test_datetimeindex_union_join_empty(self): + @pytest.mark.parametrize("sort", [None, False]) + def test_datetimeindex_union_join_empty(self, sort): dti = date_range(start='1/1/2001', end='2/1/2001', freq='D') empty = Index([]) - result = dti.union(empty) + result = dti.union(empty, sort=sort) assert isinstance(result, DatetimeIndex) assert result is result @@ -287,35 +315,40 @@ class TestBusinessDatetimeIndex(object): def setup_method(self, method): self.rng = bdate_range(START, END) - def test_union(self): + @pytest.mark.parametrize("sort", [None, False]) + def test_union(self, sort): # overlapping left = self.rng[:10] right = self.rng[5:10] - the_union = left.union(right) + the_union = left.union(right, sort=sort) assert isinstance(the_union, DatetimeIndex) # non-overlapping, gap in middle left = self.rng[:5] right = self.rng[10:] - the_union = left.union(right) + the_union = left.union(right, sort=sort) assert isinstance(the_union, Index) # non-overlapping, no gap left = self.rng[:5] right = self.rng[5:10] - the_union = left.union(right) + the_union = left.union(right, sort=sort) assert isinstance(the_union, DatetimeIndex) # order does not matter - tm.assert_index_equal(right.union(left), the_union) + if sort is None: + tm.assert_index_equal(right.union(left, sort=sort), the_union) + else: + expected = pd.DatetimeIndex(list(right) + list(left)) + tm.assert_index_equal(right.union(left, sort=sort), expected) # overlapping, but different offset rng = date_range(START, END, freq=BMonthEnd()) - the_union = self.rng.union(rng) + the_union = self.rng.union(rng, sort=sort) assert isinstance(the_union, DatetimeIndex) def test_outer_join(self): @@ -350,16 +383,21 @@ def test_outer_join(self): assert isinstance(the_join, DatetimeIndex) assert the_join.freq is None - def test_union_not_cacheable(self): + @pytest.mark.parametrize("sort", [None, False]) + def test_union_not_cacheable(self, sort): rng = date_range('1/1/2000', periods=50, freq=Minute()) rng1 = rng[10:] rng2 = rng[:25] - the_union = rng1.union(rng2) - tm.assert_index_equal(the_union, rng) + the_union = rng1.union(rng2, sort=sort) + if sort is None: + tm.assert_index_equal(the_union, rng) + else: + expected = pd.DatetimeIndex(list(rng[10:]) + list(rng[:10])) + tm.assert_index_equal(the_union, expected) rng1 = rng[10:] rng2 = rng[15:35] - the_union = rng1.union(rng2) + the_union = rng1.union(rng2, sort=sort) expected = rng[10:] 
tm.assert_index_equal(the_union, expected) @@ -388,7 +426,8 @@ def test_intersection_bug(self): result = a.intersection(b) tm.assert_index_equal(result, b) - def test_month_range_union_tz_pytz(self): + @pytest.mark.parametrize("sort", [None, False]) + def test_month_range_union_tz_pytz(self, sort): from pytz import timezone tz = timezone('US/Eastern') @@ -403,10 +442,11 @@ def test_month_range_union_tz_pytz(self): late_dr = date_range(start=late_start, end=late_end, tz=tz, freq=MonthEnd()) - early_dr.union(late_dr) + early_dr.union(late_dr, sort=sort) @td.skip_if_windows_python_3 - def test_month_range_union_tz_dateutil(self): + @pytest.mark.parametrize("sort", [None, False]) + def test_month_range_union_tz_dateutil(self, sort): from pandas._libs.tslibs.timezones import dateutil_gettz tz = dateutil_gettz('US/Eastern') @@ -421,7 +461,7 @@ def test_month_range_union_tz_dateutil(self): late_dr = date_range(start=late_start, end=late_end, tz=tz, freq=MonthEnd()) - early_dr.union(late_dr) + early_dr.union(late_dr, sort=sort) class TestCustomDatetimeIndex(object): @@ -429,35 +469,37 @@ class TestCustomDatetimeIndex(object): def setup_method(self, method): self.rng = bdate_range(START, END, freq='C') - def test_union(self): + @pytest.mark.parametrize("sort", [None, False]) + def test_union(self, sort): # overlapping left = self.rng[:10] right = self.rng[5:10] - the_union = left.union(right) + the_union = left.union(right, sort=sort) assert isinstance(the_union, DatetimeIndex) # non-overlapping, gap in middle left = self.rng[:5] right = self.rng[10:] - the_union = left.union(right) + the_union = left.union(right, sort=sort) assert isinstance(the_union, Index) # non-overlapping, no gap left = self.rng[:5] right = self.rng[5:10] - the_union = left.union(right) + the_union = left.union(right, sort=sort) assert isinstance(the_union, DatetimeIndex) # order does not matter - tm.assert_index_equal(right.union(left), the_union) + if sort is None: + tm.assert_index_equal(right.union(left, sort=sort), the_union) # overlapping, but different offset rng = date_range(START, END, freq=BMonthEnd()) - the_union = self.rng.union(rng) + the_union = self.rng.union(rng, sort=sort) assert isinstance(the_union, DatetimeIndex) def test_outer_join(self): diff --git a/pandas/tests/indexes/datetimes/test_timezones.py b/pandas/tests/indexes/datetimes/test_timezones.py index 8bcc9296cb010..b25918417efcd 100644 --- a/pandas/tests/indexes/datetimes/test_timezones.py +++ b/pandas/tests/indexes/datetimes/test_timezones.py @@ -434,24 +434,19 @@ def test_dti_tz_localize_utc_conversion(self, tz): with pytest.raises(pytz.NonExistentTimeError): rng.tz_localize(tz) - @pytest.mark.parametrize('idx', [ - date_range(start='2014-01-01', end='2014-12-31', freq='M'), - date_range(start='2014-01-01', end='2014-12-31', freq='D'), - date_range(start='2014-01-01', end='2014-03-01', freq='H'), - date_range(start='2014-08-01', end='2014-10-31', freq='T') - ]) - def test_dti_tz_localize_roundtrip(self, tz_aware_fixture, idx): + def test_dti_tz_localize_roundtrip(self, tz_aware_fixture): + # note: this test checks that a tz-naive index can be localized + # and de-localized successfully, when there are no DST transitions + # in the range.
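Note on the `sort` values used in the set-operation tests above: for pandas index set operations, `sort=None` means "sort the result where possible", while `sort=False` keeps the left operand's values first, in their original order, followed by any values present only in the right operand. A minimal sketch, assuming a pandas version in which `union` accepts `sort`:

```python
import pandas as pd

left = pd.DatetimeIndex(['2012-05-29', '2012-05-11'])
right = pd.DatetimeIndex(['2012-05-01', '2012-05-11'])

# sort=None: the union is sorted when the values are sortable
print(left.union(right, sort=None))
# DatetimeIndex(['2012-05-01', '2012-05-11', '2012-05-29'], ...)

# sort=False: left's order is preserved, right-only values are appended
print(left.union(right, sort=False))
# DatetimeIndex(['2012-05-29', '2012-05-11', '2012-05-01'], ...)
```

This is why the assertions above compare against a sorted expectation only when `sort is None`.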
+ idx = date_range(start='2014-06-01', end='2014-08-30', freq='15T') tz = tz_aware_fixture localized = idx.tz_localize(tz) - expected = date_range(start=idx[0], end=idx[-1], freq=idx.freq, - tz=tz) - tm.assert_index_equal(localized, expected) + # can't localize a tz-aware object with pytest.raises(TypeError): localized.tz_localize(tz) reset = localized.tz_localize(None) - tm.assert_index_equal(reset, idx) assert reset.tzinfo is None + tm.assert_index_equal(reset, idx) def test_dti_tz_localize_naive(self): rng = date_range('1/1/2011', periods=100, freq='H') @@ -830,6 +825,13 @@ def test_dti_drop_dont_lose_tz(self): assert ind.tz is not None + def test_dti_tz_conversion_freq(self, tz_naive_fixture): + # GH25241 + t3 = DatetimeIndex(['2019-01-01 10:00'], freq='H') + assert t3.tz_localize(tz=tz_naive_fixture).freq == t3.freq + t4 = DatetimeIndex(['2019-01-02 12:00'], tz='UTC', freq='T') + assert t4.tz_convert(tz='UTC').freq == t4.freq + def test_drop_dst_boundary(self): # see gh-18031 tz = "Europe/Brussels" diff --git a/pandas/tests/indexes/datetimes/test_tools.py b/pandas/tests/indexes/datetimes/test_tools.py index bec2fa66c43cd..a72aacce2f86d 100644 --- a/pandas/tests/indexes/datetimes/test_tools.py +++ b/pandas/tests/indexes/datetimes/test_tools.py @@ -247,6 +247,55 @@ def test_to_datetime_parse_timezone_keeps_name(self): class TestToDatetime(object): + @pytest.mark.parametrize("s, _format, dt", [ + ['2015-1-1', '%G-%V-%u', datetime(2014, 12, 29, 0, 0)], + ['2015-1-4', '%G-%V-%u', datetime(2015, 1, 1, 0, 0)], + ['2015-1-7', '%G-%V-%u', datetime(2015, 1, 4, 0, 0)] + ]) + def test_to_datetime_iso_week_year_format(self, s, _format, dt): + # See GH#16607 + assert to_datetime(s, format=_format) == dt + + @pytest.mark.parametrize("msg, s, _format", [ + ["ISO week directive '%V' must be used with the ISO year directive " + "'%G' and a weekday directive '%A', '%a', '%w', or '%u'.", "1999 50", + "%Y %V"], + ["ISO year directive '%G' must be used with the ISO week directive " + "'%V' and a weekday directive '%A', '%a', '%w', or '%u'.", "1999 51", + "%G %V"], + ["ISO year directive '%G' must be used with the ISO week directive " + "'%V' and a weekday directive '%A', '%a', '%w', or '%u'.", "1999 " + "Monday", "%G %A"], + ["ISO year directive '%G' must be used with the ISO week directive " + "'%V' and a weekday directive '%A', '%a', '%w', or '%u'.", "1999 Mon", + "%G %a"], + ["ISO year directive '%G' must be used with the ISO week directive " + "'%V' and a weekday directive '%A', '%a', '%w', or '%u'.", "1999 6", + "%G %w"], + ["ISO year directive '%G' must be used with the ISO week directive " + "'%V' and a weekday directive '%A', '%a', '%w', or '%u'.", "1999 6", + "%G %u"], + ["ISO year directive '%G' must be used with the ISO week directive " + "'%V' and a weekday directive '%A', '%a', '%w', or '%u'.", "2051", + "%G"], + ["Day of the year directive '%j' is not compatible with ISO year " + "directive '%G'. Use '%Y' instead.", "1999 51 6 256", "%G %V %u %j"], + ["ISO week directive '%V' is incompatible with the year directive " + "'%Y'. Use the ISO year '%G' instead.", "1999 51 Sunday", "%Y %V %A"], + ["ISO week directive '%V' is incompatible with the year directive " + "'%Y'. Use the ISO year '%G' instead.", "1999 51 Sun", "%Y %V %a"], + ["ISO week directive '%V' is incompatible with the year directive " + "'%Y'. Use the ISO year '%G' instead.", "1999 51 1", "%Y %V %w"], + ["ISO week directive '%V' is incompatible with the year directive " + "'%Y'.
Use the ISO year '%G' instead.", "1999 51 1", "%Y %V %u"], + ["ISO week directive '%V' must be used with the ISO year directive " + "'%G' and a weekday directive '%A', '%a', '%w', or '%u'.", "20", "%V"] + ]) + def test_ValueError_iso_week_year(self, msg, s, _format): + # See GH#16607 + with pytest.raises(ValueError, match=msg): + to_datetime(s, format=_format) + @pytest.mark.parametrize('tz', [None, 'US/Central']) def test_to_datetime_dtarr(self, tz): # DatetimeArray @@ -346,12 +395,16 @@ def test_to_datetime_dt64s(self, cache): for dt in in_bound_dts: assert pd.to_datetime(dt, cache=cache) == Timestamp(dt) - oob_dts = [np.datetime64('1000-01-01'), np.datetime64('5000-01-02'), ] - - for dt in oob_dts: - pytest.raises(ValueError, pd.to_datetime, dt, errors='raise') - pytest.raises(ValueError, Timestamp, dt) - assert pd.to_datetime(dt, errors='coerce', cache=cache) is NaT + @pytest.mark.parametrize('dt', [np.datetime64('1000-01-01'), + np.datetime64('5000-01-02')]) + @pytest.mark.parametrize('cache', [True, False]) + def test_to_datetime_dt64s_out_of_bounds(self, cache, dt): + msg = "Out of bounds nanosecond timestamp: {}".format(dt) + with pytest.raises(OutOfBoundsDatetime, match=msg): + pd.to_datetime(dt, errors='raise') + with pytest.raises(OutOfBoundsDatetime, match=msg): + Timestamp(dt) + assert pd.to_datetime(dt, errors='coerce', cache=cache) is NaT @pytest.mark.parametrize('cache', [True, False]) def test_to_datetime_array_of_dt64s(self, cache): @@ -367,8 +420,9 @@ def test_to_datetime_array_of_dt64s(self, cache): # A list of datetimes where the last one is out of bounds dts_with_oob = dts + [np.datetime64('9999-01-01')] - pytest.raises(ValueError, pd.to_datetime, dts_with_oob, - errors='raise') + msg = "Out of bounds nanosecond timestamp: 9999-01-01 00:00:00" + with pytest.raises(OutOfBoundsDatetime, match=msg): + pd.to_datetime(dts_with_oob, errors='raise') tm.assert_numpy_array_equal( pd.to_datetime(dts_with_oob, box=False, errors='coerce', @@ -410,7 +464,10 @@ def test_to_datetime_tz(self, cache): # mixed tzs will raise arr = [pd.Timestamp('2013-01-01 13:00:00', tz='US/Pacific'), pd.Timestamp('2013-01-02 14:00:00', tz='US/Eastern')] - pytest.raises(ValueError, lambda: pd.to_datetime(arr, cache=cache)) + msg = ("Tz-aware datetime.datetime cannot be converted to datetime64" + " unless utc=True") + with pytest.raises(ValueError, match=msg): + pd.to_datetime(arr, cache=cache) @pytest.mark.parametrize('cache', [True, False]) def test_to_datetime_tz_pytz(self, cache): @@ -706,6 +763,29 @@ def test_iso_8601_strings_with_different_offsets(self): NaT], tz='UTC') tm.assert_index_equal(result, expected) + def test_iso8601_strings_mixed_offsets_with_naive(self): + # GH 24992 + result = pd.to_datetime([ + '2018-11-28T00:00:00', + '2018-11-28T00:00:00+12:00', + '2018-11-28T00:00:00', + '2018-11-28T00:00:00+06:00', + '2018-11-28T00:00:00' + ], utc=True) + expected = pd.to_datetime([ + '2018-11-28T00:00:00', + '2018-11-27T12:00:00', + '2018-11-28T00:00:00', + '2018-11-27T18:00:00', + '2018-11-28T00:00:00' + ], utc=True) + tm.assert_index_equal(result, expected) + + items = ['2018-11-28T00:00:00+12:00', '2018-11-28T00:00:00'] + result = pd.to_datetime(items, utc=True) + expected = pd.to_datetime(list(reversed(items)), utc=True)[::-1] + tm.assert_index_equal(result, expected) + def test_non_iso_strings_with_tz_offset(self): result = to_datetime(['March 1, 2018 12:00:00+0400'] * 2) expected = DatetimeIndex([datetime(2018, 3, 1, 12, @@ -1088,9 +1168,9 @@ def
test_to_datetime_on_datetime64_series(self, cache): def test_to_datetime_with_space_in_series(self, cache): # GH 6428 s = Series(['10/18/2006', '10/18/2008', ' ']) - pytest.raises(ValueError, lambda: to_datetime(s, - errors='raise', - cache=cache)) + msg = r"(\(u?')?String does not contain a date(:', ' '\))?" + with pytest.raises(ValueError, match=msg): + to_datetime(s, errors='raise', cache=cache) result_coerce = to_datetime(s, errors='coerce', cache=cache) expected_coerce = Series([datetime(2006, 10, 18), datetime(2008, 10, 18), @@ -1111,13 +1191,12 @@ def test_to_datetime_with_apply(self, cache): assert_series_equal(result, expected) td = pd.Series(['May 04', 'Jun 02', ''], index=[1, 2, 3]) - pytest.raises(ValueError, - lambda: pd.to_datetime(td, format='%b %y', - errors='raise', - cache=cache)) - pytest.raises(ValueError, - lambda: td.apply(pd.to_datetime, format='%b %y', - errors='raise', cache=cache)) + msg = r"time data '' does not match format '%b %y' \(match\)" + with pytest.raises(ValueError, match=msg): + pd.to_datetime(td, format='%b %y', errors='raise', cache=cache) + with pytest.raises(ValueError, match=msg): + td.apply(pd.to_datetime, format='%b %y', + errors='raise', cache=cache) expected = pd.to_datetime(td, format='%b %y', errors='coerce', cache=cache) @@ -1168,8 +1247,9 @@ def test_to_datetime_unprocessable_input(self, cache, box, klass): result = to_datetime([1, '1'], errors='ignore', cache=cache, box=box) expected = klass(np.array([1, '1'], dtype='O')) tm.assert_equal(result, expected) - pytest.raises(TypeError, to_datetime, [1, '1'], errors='raise', - cache=cache, box=box) + msg = "invalid string coercion to datetime" + with pytest.raises(TypeError, match=msg): + to_datetime([1, '1'], errors='raise', cache=cache, box=box) def test_to_datetime_other_datetime64_units(self): # 5/25/2012 @@ -1225,17 +1305,18 @@ def test_string_na_nat_conversion(self, cache): malformed = np.array(['1/100/2000', np.nan], dtype=object) # GH 10636, default is now 'raise' - pytest.raises(ValueError, - lambda: to_datetime(malformed, errors='raise', - cache=cache)) + msg = (r"\(u?'Unknown string format:', '1/100/2000'\)|" + "day is out of range for month") + with pytest.raises(ValueError, match=msg): + to_datetime(malformed, errors='raise', cache=cache) result = to_datetime(malformed, errors='ignore', cache=cache) # GH 21864 expected = Index(malformed) tm.assert_index_equal(result, expected) - pytest.raises(ValueError, to_datetime, malformed, errors='raise', - cache=cache) + with pytest.raises(ValueError, match=msg): + to_datetime(malformed, errors='raise', cache=cache) idx = ['a', 'b', 'c', 'd', 'e'] series = Series(['1/1/2000', np.nan, '1/3/2000', np.nan, @@ -1414,14 +1495,24 @@ def test_day_not_in_month_coerce(self, cache): @pytest.mark.parametrize('cache', [True, False]) def test_day_not_in_month_raise(self, cache): - pytest.raises(ValueError, to_datetime, '2015-02-29', - errors='raise', cache=cache) - pytest.raises(ValueError, to_datetime, '2015-02-29', - errors='raise', format="%Y-%m-%d", cache=cache) - pytest.raises(ValueError, to_datetime, '2015-02-32', - errors='raise', format="%Y-%m-%d", cache=cache) - pytest.raises(ValueError, to_datetime, '2015-04-31', - errors='raise', format="%Y-%m-%d", cache=cache) + msg = "day is out of range for month" + with pytest.raises(ValueError, match=msg): + to_datetime('2015-02-29', errors='raise', cache=cache) + + msg = "time data 2015-02-29 doesn't match format specified" + with pytest.raises(ValueError, match=msg): + to_datetime('2015-02-29', 
errors='raise', format="%Y-%m-%d", + cache=cache) + + msg = "time data 2015-02-32 doesn't match format specified" + with pytest.raises(ValueError, match=msg): + to_datetime('2015-02-32', errors='raise', format="%Y-%m-%d", + cache=cache) + + msg = "time data 2015-04-31 doesn't match format specified" + with pytest.raises(ValueError, match=msg): + to_datetime('2015-04-31', errors='raise', format="%Y-%m-%d", + cache=cache) @pytest.mark.parametrize('cache', [True, False]) def test_day_not_in_month_ignore(self, cache): @@ -1656,7 +1747,9 @@ def test_parsers_time(self): assert tools.to_time(time_string) == expected new_string = "14.15" - pytest.raises(ValueError, tools.to_time, new_string) + msg = r"Cannot convert arg \['14\.15'\] to a time" + with pytest.raises(ValueError, match=msg): + tools.to_time(new_string) assert tools.to_time(new_string, format="%H.%M") == expected arg = ["14:15", "20:20"] @@ -1824,6 +1917,15 @@ def test_invalid_origins_tzinfo(self): pd.to_datetime(1, unit='D', origin=datetime(2000, 1, 1, tzinfo=pytz.utc)) + @pytest.mark.parametrize("format", [ + None, "%Y-%m-%d %H:%M:%S" + ]) + def test_to_datetime_out_of_bounds_with_format_arg(self, format): + # see gh-23830 + msg = "Out of bounds nanosecond timestamp" + with pytest.raises(OutOfBoundsDatetime, match=msg): + to_datetime("2417-10-27 00:00:00", format=format) + def test_processing_order(self): # make sure we handle out-of-bounds *before* # constructing the dates diff --git a/pandas/tests/indexes/interval/test_interval.py b/pandas/tests/indexes/interval/test_interval.py index db69258c1d3d2..ba451da10573a 100644 --- a/pandas/tests/indexes/interval/test_interval.py +++ b/pandas/tests/indexes/interval/test_interval.py @@ -242,12 +242,10 @@ def test_take(self, closed): [0, 0, 1], [1, 1, 2], closed=closed) tm.assert_index_equal(result, expected) - def test_unique(self, closed): - # unique non-overlapping - idx = IntervalIndex.from_tuples( - [(0, 1), (2, 3), (4, 5)], closed=closed) - assert idx.is_unique is True - + def test_is_unique_interval(self, closed): + """ + Interval specific tests for is_unique in addition to base class tests + """ # unique overlapping - distinct endpoints idx = IntervalIndex.from_tuples([(0, 1), (0.5, 1.5)], closed=closed) assert idx.is_unique is True @@ -261,15 +259,6 @@ def test_unique(self, closed): idx = IntervalIndex.from_tuples([(-1, 1), (-2, 2)], closed=closed) assert idx.is_unique is True - # duplicate - idx = IntervalIndex.from_tuples( - [(0, 1), (0, 1), (2, 3)], closed=closed) - assert idx.is_unique is False - - # empty - idx = IntervalIndex([], closed=closed) - assert idx.is_unique is True - def test_monotonic(self, closed): # increasing non-overlapping idx = IntervalIndex.from_tuples( @@ -414,13 +403,16 @@ def test_get_item(self, closed): # To be removed, replaced by test_interval_new.py (see #16316, #16386) def test_get_loc_value(self): - pytest.raises(KeyError, self.index.get_loc, 0) + with pytest.raises(KeyError, match="^0$"): + self.index.get_loc(0) assert self.index.get_loc(0.5) == 0 assert self.index.get_loc(1) == 0 assert self.index.get_loc(1.5) == 1 assert self.index.get_loc(2) == 1 - pytest.raises(KeyError, self.index.get_loc, -1) - pytest.raises(KeyError, self.index.get_loc, 3) + with pytest.raises(KeyError, match="^-1$"): + self.index.get_loc(-1) + with pytest.raises(KeyError, match="^3$"): + self.index.get_loc(3) idx = IntervalIndex.from_tuples([(0, 2), (1, 3)]) assert idx.get_loc(0.5) == 0 @@ -430,10 +422,12 @@ def test_get_loc_value(self): 
tm.assert_numpy_array_equal(np.sort(idx.get_loc(2)), np.array([0, 1], dtype='intp')) assert idx.get_loc(3) == 1 - pytest.raises(KeyError, idx.get_loc, 3.5) + with pytest.raises(KeyError, match=r"^3\.5$"): + idx.get_loc(3.5) idx = IntervalIndex.from_arrays([0, 2], [1, 3]) - pytest.raises(KeyError, idx.get_loc, 1.5) + with pytest.raises(KeyError, match=r"^1\.5$"): + idx.get_loc(1.5) # To be removed, replaced by test_interval_new.py (see #16316, #16386) def slice_locs_cases(self, breaks): @@ -497,7 +491,9 @@ def test_slice_locs_decreasing_float64(self): # To be removed, replaced by test_interval_new.py (see #16316, #16386) def test_slice_locs_fails(self): index = IntervalIndex.from_tuples([(1, 2), (0, 1), (2, 3)]) - with pytest.raises(KeyError): + msg = ("'can only get slices from an IntervalIndex if bounds are" + " non-overlapping and all monotonic increasing or decreasing'") + with pytest.raises(KeyError, match=msg): index.slice_locs(1, 2) # To be removed, replaced by test_interval_new.py (see #16316, #16386) @@ -505,9 +501,12 @@ def test_get_loc_interval(self): assert self.index.get_loc(Interval(0, 1)) == 0 assert self.index.get_loc(Interval(0, 0.5)) == 0 assert self.index.get_loc(Interval(0, 1, 'left')) == 0 - pytest.raises(KeyError, self.index.get_loc, Interval(2, 3)) - pytest.raises(KeyError, self.index.get_loc, - Interval(-1, 0, 'left')) + msg = r"Interval\(2, 3, closed='right'\)" + with pytest.raises(KeyError, match=msg): + self.index.get_loc(Interval(2, 3)) + msg = r"Interval\(-1, 0, closed='left'\)" + with pytest.raises(KeyError, match=msg): + self.index.get_loc(Interval(-1, 0, 'left')) # Make consistent with test_interval_new.py (see #16316, #16386) @pytest.mark.parametrize('item', [3, Interval(1, 4)]) @@ -783,19 +782,19 @@ def test_non_contiguous(self, closed): assert 1.5 not in index - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_union(self, closed, sort): index = self.create_index(closed=closed) other = IntervalIndex.from_breaks(range(5, 13), closed=closed) expected = IntervalIndex.from_breaks(range(13), closed=closed) result = index[::-1].union(other, sort=sort) - if sort: + if sort is None: tm.assert_index_equal(result, expected) assert tm.equalContents(result, expected) result = other[::-1].union(index, sort=sort) - if sort: + if sort is None: tm.assert_index_equal(result, expected) assert tm.equalContents(result, expected) @@ -812,19 +811,19 @@ def test_union(self, closed, sort): result = index.union(other, sort=sort) tm.assert_index_equal(result, index) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_intersection(self, closed, sort): index = self.create_index(closed=closed) other = IntervalIndex.from_breaks(range(5, 13), closed=closed) expected = IntervalIndex.from_breaks(range(5, 11), closed=closed) result = index[::-1].intersection(other, sort=sort) - if sort: + if sort is None: tm.assert_index_equal(result, expected) assert tm.equalContents(result, expected) result = other[::-1].intersection(index, sort=sort) - if sort: + if sort is None: tm.assert_index_equal(result, expected) assert tm.equalContents(result, expected) @@ -842,14 +841,14 @@ def test_intersection(self, closed, sort): result = index.intersection(other, sort=sort) tm.assert_index_equal(result, expected) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_difference(self, closed, sort): index = IntervalIndex.from_arrays([1, 0, 3, 
2], [1, 2, 3, 4], closed=closed) result = index.difference(index[:1], sort=sort) expected = index[1:] - if sort: + if sort is None: expected = expected.sort_values() tm.assert_index_equal(result, expected) @@ -864,19 +863,19 @@ def test_difference(self, closed, sort): result = index.difference(other, sort=sort) tm.assert_index_equal(result, expected) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_symmetric_difference(self, closed, sort): index = self.create_index(closed=closed) result = index[1:].symmetric_difference(index[:-1], sort=sort) expected = IntervalIndex([index[0], index[-1]]) - if sort: + if sort is None: tm.assert_index_equal(result, expected) assert tm.equalContents(result, expected) # GH 19101: empty result, same dtype result = index.symmetric_difference(index, sort=sort) expected = IntervalIndex(np.array([], dtype='int64'), closed=closed) - if sort: + if sort is None: tm.assert_index_equal(result, expected) assert tm.equalContents(result, expected) @@ -888,7 +887,7 @@ def test_symmetric_difference(self, closed, sort): @pytest.mark.parametrize('op_name', [ 'union', 'intersection', 'difference', 'symmetric_difference']) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_set_operation_errors(self, closed, op_name, sort): index = self.create_index(closed=closed) set_op = getattr(index, op_name) @@ -992,9 +991,11 @@ def test_comparison(self): self.index > 0 with pytest.raises(TypeError, match='unorderable types'): self.index <= 0 - with pytest.raises(TypeError): + msg = r"unorderable types: Interval\(\) > int\(\)" + with pytest.raises(TypeError, match=msg): self.index > np.arange(2) - with pytest.raises(ValueError): + msg = "Lengths must match to compare" + with pytest.raises(ValueError, match=msg): self.index > np.arange(3) def test_missing_values(self, closed): @@ -1004,7 +1005,9 @@ def test_missing_values(self, closed): [np.nan, 0, 1], [np.nan, 1, 2], closed=closed) assert idx.equals(idx2) - with pytest.raises(ValueError): + msg = ("missing values must be missing in the same location both left" + " and right sides") + with pytest.raises(ValueError, match=msg): IntervalIndex.from_arrays( [np.nan, 0, 1], np.array([0, 1, 2]), closed=closed) diff --git a/pandas/tests/indexes/interval/test_interval_tree.py b/pandas/tests/indexes/interval/test_interval_tree.py index 90722e66d8d8c..46b2d12015a22 100644 --- a/pandas/tests/indexes/interval/test_interval_tree.py +++ b/pandas/tests/indexes/interval/test_interval_tree.py @@ -171,3 +171,13 @@ def test_is_overlapping_trivial(self, closed, left, right): # GH 23309 tree = IntervalTree(left, right, closed=closed) assert tree.is_overlapping is False + + def test_construction_overflow(self): + # GH 25485 + left, right = np.arange(101), [np.iinfo(np.int64).max] * 101 + tree = IntervalTree(left, right) + + # pivot should be average of left/right medians + result = tree.root.pivot + expected = (50 + np.iinfo(np.int64).max) / 2 + assert result == expected diff --git a/pandas/tests/indexes/multi/test_analytics.py b/pandas/tests/indexes/multi/test_analytics.py index dca6180f39664..d5a6e9acaa5f3 100644 --- a/pandas/tests/indexes/multi/test_analytics.py +++ b/pandas/tests/indexes/multi/test_analytics.py @@ -3,7 +3,8 @@ import numpy as np import pytest -from pandas.compat import lrange +from pandas.compat import PY2, lrange +from pandas.compat.numpy import _np_version_under1p17 import pandas as pd from pandas import Index, MultiIndex, 
date_range, period_range @@ -13,8 +14,11 @@ def test_shift(idx): # GH8083 test the base class for shift - pytest.raises(NotImplementedError, idx.shift, 1) - pytest.raises(NotImplementedError, idx.shift, 1, 2) + msg = "Not supported for type MultiIndex" + with pytest.raises(NotImplementedError, match=msg): + idx.shift(1) + with pytest.raises(NotImplementedError, match=msg): + idx.shift(1, 2) def test_groupby(idx): @@ -50,25 +54,26 @@ def test_truncate(): result = index.truncate(before=1, after=2) assert len(result.levels[0]) == 2 - # after < before - pytest.raises(ValueError, index.truncate, 3, 1) + msg = "after < before" + with pytest.raises(ValueError, match=msg): + index.truncate(3, 1) def test_where(): i = MultiIndex.from_tuples([('A', 1), ('A', 2)]) - with pytest.raises(NotImplementedError): + msg = r"\.where is not supported for MultiIndex operations" + with pytest.raises(NotImplementedError, match=msg): i.where(True) -def test_where_array_like(): +@pytest.mark.parametrize('klass', [list, tuple, np.array, pd.Series]) +def test_where_array_like(klass): i = MultiIndex.from_tuples([('A', 1), ('A', 2)]) - klasses = [list, tuple, np.array, pd.Series] cond = [False, True] - - for klass in klasses: - with pytest.raises(NotImplementedError): - i.where(klass(cond)) + msg = r"\.where is not supported for MultiIndex operations" + with pytest.raises(NotImplementedError, match=msg): + i.where(klass(cond)) # TODO: reshape @@ -141,7 +146,8 @@ def test_take(idx): # if not isinstance(idx, # (DatetimeIndex, PeriodIndex, TimedeltaIndex)): # GH 10791 - with pytest.raises(AttributeError): + msg = "'MultiIndex' object has no attribute 'freq'" + with pytest.raises(AttributeError, match=msg): idx.freq @@ -199,7 +205,8 @@ def test_take_fill_value(): with pytest.raises(ValueError, match=msg): idx.take(np.array([1, 0, -5]), fill_value=True) - with pytest.raises(IndexError): + msg = "index -5 is out of bounds for size 4" + with pytest.raises(IndexError, match=msg): idx.take(np.array([1, -5])) @@ -215,13 +222,15 @@ def test_sub(idx): first = idx # - now raises (previously was set op difference) - with pytest.raises(TypeError): + msg = "cannot perform __sub__ with this index type: MultiIndex" + with pytest.raises(TypeError, match=msg): first - idx[-3:] - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): idx[-3:] - first - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): idx[-3:] - first.tolist() - with pytest.raises(TypeError): + msg = "cannot perform __rsub__ with this index type: MultiIndex" + with pytest.raises(TypeError, match=msg): first.tolist() - idx[-3:] @@ -266,56 +275,35 @@ def test_map_dictlike(idx, mapper): tm.assert_index_equal(result, expected) +@pytest.mark.skipif(PY2, reason="pytest.raises match regex fails") @pytest.mark.parametrize('func', [ np.exp, np.exp2, np.expm1, np.log, np.log2, np.log10, np.log1p, np.sqrt, np.sin, np.cos, np.tan, np.arcsin, np.arccos, np.arctan, np.sinh, np.cosh, np.tanh, np.arcsinh, np.arccosh, np.arctanh, np.deg2rad, np.rad2deg -]) -def test_numpy_ufuncs(func): +], ids=lambda func: func.__name__) +def test_numpy_ufuncs(idx, func): # test ufuncs of numpy. see: # http://docs.scipy.org/doc/numpy/reference/ufuncs.html - # copy and paste from idx fixture as pytest doesn't support - # parameters and fixtures at the same time. 
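The comment removed in this hunk ("pytest doesn't support parameters and fixtures at the same time") no longer holds: pytest resolves fixture arguments and `@pytest.mark.parametrize` arguments independently, which is what lets the rewritten tests take both the `idx` fixture and a parametrized `func`. A minimal sketch of the pattern; the fixture body here is a hypothetical stand-in, not the conftest `idx`:

```python
import numpy as np
import pytest


@pytest.fixture
def idx():
    # hypothetical stand-in for the MultiIndex fixture used by these tests
    return np.array([1.0, 4.0, 9.0])


# `idx` is filled from the fixture, `func` from the parametrize list
@pytest.mark.parametrize('func', [np.sqrt, np.exp], ids=lambda f: f.__name__)
def test_ufunc_runs(idx, func):
    assert func(idx).shape == idx.shape
```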
- major_axis = Index(['foo', 'bar', 'baz', 'qux']) - minor_axis = Index(['one', 'two']) - major_codes = np.array([0, 0, 1, 2, 3, 3]) - minor_codes = np.array([0, 1, 0, 1, 0, 1]) - index_names = ['first', 'second'] - - idx = MultiIndex( - levels=[major_axis, minor_axis], - codes=[major_codes, minor_codes], - names=index_names, - verify_integrity=False - ) - - with pytest.raises(Exception): - with np.errstate(all='ignore'): - func(idx) + if _np_version_under1p17: + expected_exception = AttributeError + msg = "'tuple' object has no attribute '{}'".format(func.__name__) + else: + expected_exception = TypeError + msg = ("loop of ufunc does not support argument 0 of type tuple which" + " has no callable {} method").format(func.__name__) + with pytest.raises(expected_exception, match=msg): + func(idx) @pytest.mark.parametrize('func', [ np.isfinite, np.isinf, np.isnan, np.signbit -]) -def test_numpy_type_funcs(func): - # for func in [np.isfinite, np.isinf, np.isnan, np.signbit]: - # copy and paste from idx fixture as pytest doesn't support - # parameters and fixtures at the same time. - major_axis = Index(['foo', 'bar', 'baz', 'qux']) - minor_axis = Index(['one', 'two']) - major_codes = np.array([0, 0, 1, 2, 3, 3]) - minor_codes = np.array([0, 1, 0, 1, 0, 1]) - index_names = ['first', 'second'] - - idx = MultiIndex( - levels=[major_axis, minor_axis], - codes=[major_codes, minor_codes], - names=index_names, - verify_integrity=False - ) - - with pytest.raises(Exception): +], ids=lambda func: func.__name__) +def test_numpy_type_funcs(idx, func): + msg = ("ufunc '{}' not supported for the input types, and the inputs" + " could not be safely coerced to any supported types according to" + " the casting rule ''safe''").format(func.__name__) + with pytest.raises(TypeError, match=msg): func(idx) diff --git a/pandas/tests/indexes/multi/test_compat.py b/pandas/tests/indexes/multi/test_compat.py index f405fc659c709..89685b9feec27 100644 --- a/pandas/tests/indexes/multi/test_compat.py +++ b/pandas/tests/indexes/multi/test_compat.py @@ -124,8 +124,6 @@ def test_compat(indices): def test_pickle_compat_construction(holder): # this is testing for pickle compat - if holder is None: - return - # need an object to create with - pytest.raises(TypeError, holder) + with pytest.raises(TypeError, match="Must pass both levels and codes"): + holder() diff --git a/pandas/tests/indexes/multi/test_constructor.py b/pandas/tests/indexes/multi/test_constructor.py index e6678baf8a996..fe90e85cf93c8 100644 --- a/pandas/tests/indexes/multi/test_constructor.py +++ b/pandas/tests/indexes/multi/test_constructor.py @@ -1,7 +1,6 @@ # -*- coding: utf-8 -*- from collections import OrderedDict -import re import numpy as np import pytest @@ -30,10 +29,10 @@ def test_constructor_no_levels(): with pytest.raises(ValueError, match=msg): MultiIndex(levels=[], codes=[]) - both_re = re.compile('Must pass both levels and codes') - with pytest.raises(TypeError, match=both_re): + msg = "Must pass both levels and codes" + with pytest.raises(TypeError, match=msg): MultiIndex(levels=[]) - with pytest.raises(TypeError, match=both_re): + with pytest.raises(TypeError, match=msg): MultiIndex(codes=[]) @@ -42,8 +41,8 @@ def test_constructor_nonhashable_names(): levels = [[1, 2], [u'one', u'two']] codes = [[0, 0, 1, 1], [0, 1, 0, 1]] names = (['foo'], ['bar']) - message = "MultiIndex.name must be a hashable type" - with pytest.raises(TypeError, match=message): + msg = r"MultiIndex\.name must be a hashable type" + with pytest.raises(TypeError, match=msg): 
MultiIndex(levels=levels, codes=codes, names=names) # With .rename() @@ -51,11 +50,11 @@ def test_constructor_nonhashable_names(): codes=[[0, 0, 1, 1], [0, 1, 0, 1]], names=('foo', 'bar')) renamed = [['foor'], ['barr']] - with pytest.raises(TypeError, match=message): + with pytest.raises(TypeError, match=msg): mi.rename(names=renamed) # With .set_names() - with pytest.raises(TypeError, match=message): + with pytest.raises(TypeError, match=msg): mi.set_names(names=renamed) @@ -67,8 +66,9 @@ def test_constructor_mismatched_codes_levels(idx): with pytest.raises(ValueError, match=msg): MultiIndex(levels=levels, codes=codes) - length_error = re.compile('>= length of level') - label_error = re.compile(r'Unequal code lengths: \[4, 2\]') + length_error = (r"On level 0, code max \(3\) >= length of level \(1\)\." + " NOTE: this index is in an inconsistent state") + label_error = r"Unequal code lengths: \[4, 2\]" # important to check that it's looking at the right thing. with pytest.raises(ValueError, match=length_error): @@ -142,6 +142,15 @@ def test_from_arrays_iterator(idx): MultiIndex.from_arrays(0) +def test_from_arrays_tuples(idx): + arrays = tuple(tuple(np.asarray(lev).take(level_codes)) + for lev, level_codes in zip(idx.levels, idx.codes)) + + # tuple of tuples as input + result = MultiIndex.from_arrays(arrays, names=idx.names) + tm.assert_index_equal(result, idx) + + def test_from_arrays_index_series_datetimetz(): idx1 = pd.date_range('2015-01-01 10:00', freq='D', periods=3, tz='US/Eastern') @@ -253,21 +262,16 @@ def test_from_arrays_empty(): tm.assert_index_equal(result, expected) -@pytest.mark.parametrize('invalid_array', [ - (1), - ([1]), - ([1, 2]), - ([[1], 2]), - ('a'), - (['a']), - (['a', 'b']), - ([['a'], 'b']), +@pytest.mark.parametrize('invalid_sequence_of_arrays', [ + 1, [1], [1, 2], [[1], 2], [1, [2]], 'a', ['a'], ['a', 'b'], [['a'], 'b'], + (1,), (1, 2), ([1], 2), (1, [2]), 'a', ('a',), ('a', 'b'), (['a'], 'b'), + [(1,), 2], [1, (2,)], [('a',), 'b'], + ((1,), 2), (1, (2,)), (('a',), 'b') ]) -def test_from_arrays_invalid_input(invalid_array): - invalid_inputs = [1, [1], [1, 2], [[1], 2], - 'a', ['a'], ['a', 'b'], [['a'], 'b']] - for i in invalid_inputs: - pytest.raises(TypeError, MultiIndex.from_arrays, arrays=i) +def test_from_arrays_invalid_input(invalid_sequence_of_arrays): + msg = "Input must be a list / sequence of array-likes" + with pytest.raises(TypeError, match=msg): + MultiIndex.from_arrays(arrays=invalid_sequence_of_arrays) @pytest.mark.parametrize('idx1, idx2', [ @@ -332,9 +336,10 @@ def test_tuples_with_name_string(): # GH 15110 and GH 14848 li = [(0, 0, 1), (0, 1, 0), (1, 0, 0)] - with pytest.raises(ValueError): + msg = "Names should be list-like for a MultiIndex" + with pytest.raises(ValueError, match=msg): pd.Index(li, name='abc') - with pytest.raises(ValueError): + with pytest.raises(ValueError, match=msg): pd.Index(li, name='a') @@ -398,7 +403,10 @@ def test_from_product_empty_three_levels(N): [['a'], 'b'], ]) def test_from_product_invalid_input(invalid_input): - pytest.raises(TypeError, MultiIndex.from_product, iterables=invalid_input) + msg = (r"Input must be a list / sequence of iterables|" + "Input must be list-like") + with pytest.raises(TypeError, match=msg): + MultiIndex.from_product(iterables=invalid_input) def test_from_product_datetimeindex(): @@ -563,15 +571,15 @@ def test_from_frame_valid_names(names_in, names_out): assert mi.names == names_out -@pytest.mark.parametrize('names_in,names_out', [ - ('bad_input', ValueError("Names should be list-like for 
a MultiIndex")), - (['a', 'b', 'c'], ValueError("Length of names must match number of " - "levels in MultiIndex.")) +@pytest.mark.parametrize('names,expected_error_msg', [ + ('bad_input', "Names should be list-like for a MultiIndex"), + (['a', 'b', 'c'], + "Length of names must match number of levels in MultiIndex") ]) -def test_from_frame_invalid_names(names_in, names_out): +def test_from_frame_invalid_names(names, expected_error_msg): # GH 22420 df = pd.DataFrame([['a', 'a'], ['a', 'b'], ['b', 'a'], ['b', 'b']], columns=pd.MultiIndex.from_tuples([('L1', 'x'), ('L2', 'y')])) - with pytest.raises(type(names_out), match=names_out.args[0]): - pd.MultiIndex.from_frame(df, names=names_in) + with pytest.raises(ValueError, match=expected_error_msg): + pd.MultiIndex.from_frame(df, names=names) diff --git a/pandas/tests/indexes/multi/test_contains.py b/pandas/tests/indexes/multi/test_contains.py index b73ff11a4dd4e..56836b94a6b03 100644 --- a/pandas/tests/indexes/multi/test_contains.py +++ b/pandas/tests/indexes/multi/test_contains.py @@ -83,15 +83,24 @@ def test_isin_level_kwarg(): tm.assert_numpy_array_equal(expected, idx.isin(vals_1, level=1)) tm.assert_numpy_array_equal(expected, idx.isin(vals_1, level=-1)) - pytest.raises(IndexError, idx.isin, vals_0, level=5) - pytest.raises(IndexError, idx.isin, vals_0, level=-5) - - pytest.raises(KeyError, idx.isin, vals_0, level=1.0) - pytest.raises(KeyError, idx.isin, vals_1, level=-1.0) - pytest.raises(KeyError, idx.isin, vals_1, level='A') + msg = "Too many levels: Index has only 2 levels, not 6" + with pytest.raises(IndexError, match=msg): + idx.isin(vals_0, level=5) + msg = ("Too many levels: Index has only 2 levels, -5 is not a valid level" + " number") + with pytest.raises(IndexError, match=msg): + idx.isin(vals_0, level=-5) + + with pytest.raises(KeyError, match=r"'Level 1\.0 not found'"): + idx.isin(vals_0, level=1.0) + with pytest.raises(KeyError, match=r"'Level -1\.0 not found'"): + idx.isin(vals_1, level=-1.0) + with pytest.raises(KeyError, match="'Level A not found'"): + idx.isin(vals_1, level='A') idx.names = ['A', 'B'] tm.assert_numpy_array_equal(expected, idx.isin(vals_0, level='A')) tm.assert_numpy_array_equal(expected, idx.isin(vals_1, level='B')) - pytest.raises(KeyError, idx.isin, vals_1, level='C') + with pytest.raises(KeyError, match="'Level C not found'"): + idx.isin(vals_1, level='C') diff --git a/pandas/tests/indexes/multi/test_drop.py b/pandas/tests/indexes/multi/test_drop.py index 0cf73d3d752ad..ac167c126fd13 100644 --- a/pandas/tests/indexes/multi/test_drop.py +++ b/pandas/tests/indexes/multi/test_drop.py @@ -4,7 +4,7 @@ import numpy as np import pytest -from pandas.compat import lrange +from pandas.compat import PY2, lrange from pandas.errors import PerformanceWarning import pandas as pd @@ -12,6 +12,7 @@ import pandas.util.testing as tm +@pytest.mark.skipif(PY2, reason="pytest.raises match regex fails") def test_drop(idx): dropped = idx.drop([('foo', 'two'), ('qux', 'one')]) @@ -31,13 +32,17 @@ def test_drop(idx): tm.assert_index_equal(dropped, expected) index = MultiIndex.from_tuples([('bar', 'two')]) - pytest.raises(KeyError, idx.drop, [('bar', 'two')]) - pytest.raises(KeyError, idx.drop, index) - pytest.raises(KeyError, idx.drop, ['foo', 'two']) + with pytest.raises(KeyError, match=r"^10$"): + idx.drop([('bar', 'two')]) + with pytest.raises(KeyError, match=r"^10$"): + idx.drop(index) + with pytest.raises(KeyError, match=r"^'two'$"): + idx.drop(['foo', 'two']) # partially correct argument mixed_index = 
MultiIndex.from_tuples([('qux', 'one'), ('bar', 'two')]) - pytest.raises(KeyError, idx.drop, mixed_index) + with pytest.raises(KeyError, match=r"^10$"): + idx.drop(mixed_index) # error='ignore' dropped = idx.drop(index, errors='ignore') @@ -59,7 +64,8 @@ def test_drop(idx): # mixed partial / full drop / error='ignore' mixed_index = ['foo', ('qux', 'one'), 'two'] - pytest.raises(KeyError, idx.drop, mixed_index) + with pytest.raises(KeyError, match=r"^'two'$"): + idx.drop(mixed_index) dropped = idx.drop(mixed_index, errors='ignore') expected = idx[[2, 3, 5]] tm.assert_index_equal(dropped, expected) @@ -98,10 +104,12 @@ def test_droplevel_list(): expected = index[:2] assert dropped.equals(expected) - with pytest.raises(ValueError): + msg = ("Cannot remove 3 levels from an index with 3 levels: at least one" + " level must be left") + with pytest.raises(ValueError, match=msg): index[:2].droplevel(['one', 'two', 'three']) - with pytest.raises(KeyError): + with pytest.raises(KeyError, match="'Level four not found'"): index[:2].droplevel(['one', 'four']) diff --git a/pandas/tests/indexes/multi/test_duplicates.py b/pandas/tests/indexes/multi/test_duplicates.py index af15026de2b34..35034dc57b4b8 100644 --- a/pandas/tests/indexes/multi/test_duplicates.py +++ b/pandas/tests/indexes/multi/test_duplicates.py @@ -143,6 +143,18 @@ def test_has_duplicates(idx, idx_dup): assert mi.is_unique is False assert mi.has_duplicates is True + # single instance of NaN + mi_nan = MultiIndex(levels=[['a', 'b'], [0, 1]], + codes=[[-1, 0, 0, 1, 1], [-1, 0, 1, 0, 1]]) + assert mi_nan.is_unique is True + assert mi_nan.has_duplicates is False + + # multiple instances of NaN + mi_nan_dup = MultiIndex(levels=[['a', 'b'], [0, 1]], + codes=[[-1, -1, 0, 0, 1, 1], [-1, -1, 0, 1, 0, 1]]) + assert mi_nan_dup.is_unique is False + assert mi_nan_dup.has_duplicates is True + def test_has_duplicates_from_tuples(): # GH 9075 diff --git a/pandas/tests/indexes/multi/test_get_set.py b/pandas/tests/indexes/multi/test_get_set.py index d201cb2eb178b..62911c7032aca 100644 --- a/pandas/tests/indexes/multi/test_get_set.py +++ b/pandas/tests/indexes/multi/test_get_set.py @@ -25,7 +25,9 @@ def test_get_level_number_integer(idx): idx.names = [1, 0] assert idx._get_level_number(1) == 0 assert idx._get_level_number(0) == 1 - pytest.raises(IndexError, idx._get_level_number, 2) + msg = "Too many levels: Index has only 2 levels, not 3" + with pytest.raises(IndexError, match=msg): + idx._get_level_number(2) with pytest.raises(KeyError, match='Level fourth not found'): idx._get_level_number('fourth') @@ -62,7 +64,7 @@ def test_get_value_duplicates(): names=['tag', 'day']) assert index.get_loc('D') == slice(0, 3) - with pytest.raises(KeyError): + with pytest.raises(KeyError, match=r"^'D'$"): index._engine.get_value(np.array([]), 'D') @@ -125,7 +127,8 @@ def test_set_name_methods(idx, index_names): ind = idx.set_names(new_names) assert idx.names == index_names assert ind.names == new_names - with pytest.raises(ValueError, match="^Length"): + msg = "Length of names must match number of levels in MultiIndex" + with pytest.raises(ValueError, match=msg): ind.set_names(new_names + new_names) new_names2 = [name + "SUFFIX2" for name in new_names] res = ind.set_names(new_names2, inplace=True) @@ -163,10 +166,10 @@ def test_set_levels_codes_directly(idx): minor_codes = [(x + 1) % 1 for x in minor_codes] new_codes = [major_codes, minor_codes] - with pytest.raises(AttributeError): + msg = "can't set attribute" + with pytest.raises(AttributeError, match=msg): idx.levels 
= new_levels - - with pytest.raises(AttributeError): + with pytest.raises(AttributeError, match=msg): idx.codes = new_codes diff --git a/pandas/tests/indexes/multi/test_indexing.py b/pandas/tests/indexes/multi/test_indexing.py index c40ecd9e82a07..c2af3b2050d8d 100644 --- a/pandas/tests/indexes/multi/test_indexing.py +++ b/pandas/tests/indexes/multi/test_indexing.py @@ -6,7 +6,7 @@ import numpy as np import pytest -from pandas.compat import lrange +from pandas.compat import PY2, lrange import pandas as pd from pandas import ( @@ -112,13 +112,14 @@ def test_slice_locs_not_contained(): def test_putmask_with_wrong_mask(idx): # GH18368 - with pytest.raises(ValueError): + msg = "putmask: mask and data must be the same size" + with pytest.raises(ValueError, match=msg): idx.putmask(np.ones(len(idx) + 1, np.bool), 1) - with pytest.raises(ValueError): + with pytest.raises(ValueError, match=msg): idx.putmask(np.ones(len(idx) - 1, np.bool), 1) - with pytest.raises(ValueError): + with pytest.raises(ValueError, match=msg): idx.putmask('foo', 1) @@ -176,9 +177,12 @@ def test_get_indexer(): def test_get_indexer_nearest(): midx = MultiIndex.from_tuples([('a', 1), ('b', 2)]) - with pytest.raises(NotImplementedError): + msg = ("method='nearest' not implemented yet for MultiIndex; see GitHub" + " issue 9365") + with pytest.raises(NotImplementedError, match=msg): midx.get_indexer(['a'], method='nearest') - with pytest.raises(NotImplementedError): + msg = "tolerance not implemented yet for MultiIndex" + with pytest.raises(NotImplementedError, match=msg): midx.get_indexer(['a'], method='pad', tolerance=2) @@ -251,20 +255,26 @@ def test_getitem_bool_index_single(ind1, ind2): tm.assert_index_equal(idx[ind2], expected) +@pytest.mark.skipif(PY2, reason="pytest.raises match regex fails") def test_get_loc(idx): assert idx.get_loc(('foo', 'two')) == 1 assert idx.get_loc(('baz', 'two')) == 3 - pytest.raises(KeyError, idx.get_loc, ('bar', 'two')) - pytest.raises(KeyError, idx.get_loc, 'quux') + with pytest.raises(KeyError, match=r"^10$"): + idx.get_loc(('bar', 'two')) + with pytest.raises(KeyError, match=r"^'quux'$"): + idx.get_loc('quux') - pytest.raises(NotImplementedError, idx.get_loc, 'foo', - method='nearest') + msg = ("only the default get_loc method is currently supported for" + " MultiIndex") + with pytest.raises(NotImplementedError, match=msg): + idx.get_loc('foo', method='nearest') # 3 levels index = MultiIndex(levels=[Index(lrange(4)), Index(lrange(4)), Index( lrange(4))], codes=[np.array([0, 0, 1, 2, 2, 2, 3, 3]), np.array( [0, 1, 0, 0, 0, 1, 0, 1]), np.array([1, 0, 1, 1, 0, 0, 1, 0])]) - pytest.raises(KeyError, index.get_loc, (1, 1)) + with pytest.raises(KeyError, match=r"^\(1, 1\)$"): + index.get_loc((1, 1)) assert index.get_loc((2, 0)) == slice(3, 5) @@ -297,11 +307,14 @@ def test_get_loc_level(): assert loc == expected assert new_index is None - pytest.raises(KeyError, index.get_loc_level, (2, 2)) + with pytest.raises(KeyError, match=r"^\(2, 2\)$"): + index.get_loc_level((2, 2)) # GH 22221: unused label - pytest.raises(KeyError, index.drop(2).get_loc_level, 2) + with pytest.raises(KeyError, match=r"^2$"): + index.drop(2).get_loc_level(2) # Unused label on unsorted level: - pytest.raises(KeyError, index.drop(1, level=2).get_loc_level, 2, 2) + with pytest.raises(KeyError, match=r"^2$"): + index.drop(1, level=2).get_loc_level(2, level=2) index = MultiIndex(levels=[[2000], lrange(4)], codes=[np.array( [0, 0, 0, 0]), np.array([0, 1, 2, 3])]) @@ -342,8 +355,10 @@ def test_get_loc_cast_bool(): assert 
idx.get_loc((0, 1)) == 1 assert idx.get_loc((1, 0)) == 2 - pytest.raises(KeyError, idx.get_loc, (False, True)) - pytest.raises(KeyError, idx.get_loc, (True, False)) + with pytest.raises(KeyError, match=r"^\(False, True\)$"): + idx.get_loc((False, True)) + with pytest.raises(KeyError, match=r"^\(True, False\)$"): + idx.get_loc((True, False)) @pytest.mark.parametrize('level', [0, 1]) @@ -361,9 +376,12 @@ def test_get_loc_missing_nan(): # GH 8569 idx = MultiIndex.from_arrays([[1.0, 2.0], [3.0, 4.0]]) assert isinstance(idx.get_loc(1), slice) - pytest.raises(KeyError, idx.get_loc, 3) - pytest.raises(KeyError, idx.get_loc, np.nan) - pytest.raises(KeyError, idx.get_loc, [np.nan]) + with pytest.raises(KeyError, match=r"^3\.0$"): + idx.get_loc(3) + with pytest.raises(KeyError, match=r"^nan$"): + idx.get_loc(np.nan) + with pytest.raises(KeyError, match=r"^\[nan\]$"): + idx.get_loc([np.nan]) def test_get_indexer_categorical_time(): diff --git a/pandas/tests/indexes/multi/test_integrity.py b/pandas/tests/indexes/multi/test_integrity.py index c1638a9cde660..a7dc093147725 100644 --- a/pandas/tests/indexes/multi/test_integrity.py +++ b/pandas/tests/indexes/multi/test_integrity.py @@ -159,7 +159,8 @@ def test_isna_behavior(idx): # should not segfault GH5123 # NOTE: if MI representation changes, may make sense to allow # isna(MI) - with pytest.raises(NotImplementedError): + msg = "isna is not defined for MultiIndex" + with pytest.raises(NotImplementedError, match=msg): pd.isna(idx) @@ -168,16 +169,16 @@ def test_large_multiindex_error(): df_below_1000000 = pd.DataFrame( 1, index=pd.MultiIndex.from_product([[1, 2], range(499999)]), columns=['dest']) - with pytest.raises(KeyError): + with pytest.raises(KeyError, match=r"^\(-1, 0\)$"): df_below_1000000.loc[(-1, 0), 'dest'] - with pytest.raises(KeyError): + with pytest.raises(KeyError, match=r"^\(3, 0\)$"): df_below_1000000.loc[(3, 0), 'dest'] df_above_1000000 = pd.DataFrame( 1, index=pd.MultiIndex.from_product([[1, 2], range(500001)]), columns=['dest']) - with pytest.raises(KeyError): + with pytest.raises(KeyError, match=r"^\(-1, 0\)$"): df_above_1000000.loc[(-1, 0), 'dest'] - with pytest.raises(KeyError): + with pytest.raises(KeyError, match=r"^\(3, 0\)$"): df_above_1000000.loc[(3, 0), 'dest'] @@ -260,7 +261,9 @@ def test_hash_error(indices): def test_mutability(indices): if not len(indices): return - pytest.raises(TypeError, indices.__setitem__, 0, indices[0]) + msg = "Index does not support mutable operations" + with pytest.raises(TypeError, match=msg): + indices[0] = indices[0] def test_wrong_number_names(indices): diff --git a/pandas/tests/indexes/multi/test_set_ops.py b/pandas/tests/indexes/multi/test_set_ops.py index 208d6cf1c639f..41a0e1e59e8a5 100644 --- a/pandas/tests/indexes/multi/test_set_ops.py +++ b/pandas/tests/indexes/multi/test_set_ops.py @@ -9,7 +9,7 @@ @pytest.mark.parametrize("case", [0.5, "xxx"]) -@pytest.mark.parametrize("sort", [True, False]) +@pytest.mark.parametrize("sort", [None, False]) @pytest.mark.parametrize("method", ["intersection", "union", "difference", "symmetric_difference"]) def test_set_ops_error_cases(idx, case, sort, method): @@ -19,13 +19,13 @@ def test_set_ops_error_cases(idx, case, sort, method): getattr(idx, method)(case, sort=sort) -@pytest.mark.parametrize("sort", [True, False]) +@pytest.mark.parametrize("sort", [None, False]) def test_intersection_base(idx, sort): first = idx[:5] second = idx[:3] intersect = first.intersection(second, sort=sort) - if sort: + if sort is None: tm.assert_index_equal(intersect, 
second.sort_values()) assert tm.equalContents(intersect, second) @@ -34,7 +34,7 @@ def test_intersection_base(idx, sort): for klass in [np.array, Series, list]] for case in cases: result = first.intersection(case, sort=sort) - if sort: + if sort is None: tm.assert_index_equal(result, second.sort_values()) assert tm.equalContents(result, second) @@ -43,13 +43,13 @@ def test_intersection_base(idx, sort): first.intersection([1, 2, 3], sort=sort) -@pytest.mark.parametrize("sort", [True, False]) +@pytest.mark.parametrize("sort", [None, False]) def test_union_base(idx, sort): first = idx[3:] second = idx[:5] everything = idx union = first.union(second, sort=sort) - if sort: + if sort is None: tm.assert_index_equal(union, everything.sort_values()) assert tm.equalContents(union, everything) @@ -58,7 +58,7 @@ def test_union_base(idx, sort): for klass in [np.array, Series, list]] for case in cases: result = first.union(case, sort=sort) - if sort: + if sort is None: tm.assert_index_equal(result, everything.sort_values()) assert tm.equalContents(result, everything) @@ -67,13 +67,13 @@ def test_union_base(idx, sort): first.union([1, 2, 3], sort=sort) -@pytest.mark.parametrize("sort", [True, False]) +@pytest.mark.parametrize("sort", [None, False]) def test_difference_base(idx, sort): second = idx[4:] answer = idx[:4] result = idx.difference(second, sort=sort) - if sort: + if sort is None: answer = answer.sort_values() assert result.equals(answer) @@ -91,14 +91,14 @@ def test_difference_base(idx, sort): idx.difference([1, 2, 3], sort=sort) -@pytest.mark.parametrize("sort", [True, False]) +@pytest.mark.parametrize("sort", [None, False]) def test_symmetric_difference(idx, sort): first = idx[1:] second = idx[:-1] answer = idx[[-1, 0]] result = first.symmetric_difference(second, sort=sort) - if sort: + if sort is None: answer = answer.sort_values() tm.assert_index_equal(result, answer) @@ -121,14 +121,14 @@ def test_empty(idx): assert idx[:0].empty -@pytest.mark.parametrize("sort", [True, False]) +@pytest.mark.parametrize("sort", [None, False]) def test_difference(idx, sort): first = idx result = first.difference(idx[-3:], sort=sort) vals = idx[:-3].values - if sort: + if sort is None: vals = sorted(vals) expected = MultiIndex.from_tuples(vals, @@ -189,14 +189,62 @@ def test_difference(idx, sort): first.difference([1, 2, 3, 4, 5], sort=sort) -@pytest.mark.parametrize("sort", [True, False]) +def test_difference_sort_special(): + # GH-24959 + idx = pd.MultiIndex.from_product([[1, 0], ['a', 'b']]) + # sort=None, the default + result = idx.difference([]) + tm.assert_index_equal(result, idx) + + +@pytest.mark.xfail(reason="Not implemented.") +def test_difference_sort_special_true(): + # TODO decide on True behaviour + idx = pd.MultiIndex.from_product([[1, 0], ['a', 'b']]) + result = idx.difference([], sort=True) + expected = pd.MultiIndex.from_product([[0, 1], ['a', 'b']]) + tm.assert_index_equal(result, expected) + + +def test_difference_sort_incomparable(): + # GH-24959 + idx = pd.MultiIndex.from_product([[1, pd.Timestamp('2000'), 2], + ['a', 'b']]) + + other = pd.MultiIndex.from_product([[3, pd.Timestamp('2000'), 4], + ['c', 'd']]) + # sort=None, the default + # MultiIndex.difference deviates here from other difference + # implementations in not catching the TypeError + with pytest.raises(TypeError): + result = idx.difference(other) + + # sort=False + result = idx.difference(other, sort=False) + tm.assert_index_equal(result, idx) + + +@pytest.mark.xfail(reason="Not implemented.") +def 
test_difference_sort_incomparable_true(): + # TODO decide on True behaviour + # # sort=True, raises + idx = pd.MultiIndex.from_product([[1, pd.Timestamp('2000'), 2], + ['a', 'b']]) + other = pd.MultiIndex.from_product([[3, pd.Timestamp('2000'), 4], + ['c', 'd']]) + + with pytest.raises(TypeError): + idx.difference(other, sort=True) + + +@pytest.mark.parametrize("sort", [None, False]) def test_union(idx, sort): piece1 = idx[:5][::-1] piece2 = idx[3:] the_union = piece1.union(piece2, sort=sort) - if sort: + if sort is None: tm.assert_index_equal(the_union, idx.sort_values()) assert tm.equalContents(the_union, idx) @@ -225,14 +273,14 @@ def test_union(idx, sort): # assert result.equals(result2) -@pytest.mark.parametrize("sort", [True, False]) +@pytest.mark.parametrize("sort", [None, False]) def test_intersection(idx, sort): piece1 = idx[:5][::-1] piece2 = idx[3:] the_int = piece1.intersection(piece2, sort=sort) - if sort: + if sort is None: tm.assert_index_equal(the_int, idx[3:5]) assert tm.equalContents(the_int, idx[3:5]) @@ -249,3 +297,76 @@ def test_intersection(idx, sort): # tuples = _index.values # result = _index & tuples # assert result.equals(tuples) + + +def test_intersect_equal_sort(): + # GH-24959 + idx = pd.MultiIndex.from_product([[1, 0], ['a', 'b']]) + tm.assert_index_equal(idx.intersection(idx, sort=False), idx) + tm.assert_index_equal(idx.intersection(idx, sort=None), idx) + + +@pytest.mark.xfail(reason="Not implemented.") +def test_intersect_equal_sort_true(): + # TODO decide on True behaviour + idx = pd.MultiIndex.from_product([[1, 0], ['a', 'b']]) + sorted_ = pd.MultiIndex.from_product([[0, 1], ['a', 'b']]) + tm.assert_index_equal(idx.intersection(idx, sort=True), sorted_) + + +@pytest.mark.parametrize('slice_', [slice(None), slice(0)]) +def test_union_sort_other_empty(slice_): + # https://github.com/pandas-dev/pandas/issues/24959 + idx = pd.MultiIndex.from_product([[1, 0], ['a', 'b']]) + + # default, sort=None + other = idx[slice_] + tm.assert_index_equal(idx.union(other), idx) + # MultiIndex does not special case empty.union(idx) + # tm.assert_index_equal(other.union(idx), idx) + + # sort=False + tm.assert_index_equal(idx.union(other, sort=False), idx) + + +@pytest.mark.xfail(reason="Not implemented.") +def test_union_sort_other_empty_sort(slice_): + # TODO decide on True behaviour + # # sort=True + idx = pd.MultiIndex.from_product([[1, 0], ['a', 'b']]) + other = idx[:0] + result = idx.union(other, sort=True) + expected = pd.MultiIndex.from_product([[0, 1], ['a', 'b']]) + tm.assert_index_equal(result, expected) + + +def test_union_sort_other_incomparable(): + # https://github.com/pandas-dev/pandas/issues/24959 + idx = pd.MultiIndex.from_product([[1, pd.Timestamp('2000')], ['a', 'b']]) + + # default, sort=None + result = idx.union(idx[:1]) + tm.assert_index_equal(result, idx) + + # sort=False + result = idx.union(idx[:1], sort=False) + tm.assert_index_equal(result, idx) + + +@pytest.mark.xfail(reason="Not implemented.") +def test_union_sort_other_incomparable_sort(): + # TODO decide on True behaviour + # # sort=True + idx = pd.MultiIndex.from_product([[1, pd.Timestamp('2000')], ['a', 'b']]) + with pytest.raises(TypeError, match='Cannot compare'): + idx.union(idx[:1], sort=True) + + +@pytest.mark.parametrize("method", ['union', 'intersection', 'difference', + 'symmetric_difference']) +def test_setops_disallow_true(method): + idx1 = pd.MultiIndex.from_product([['a', 'b'], [1, 2]]) + idx2 = pd.MultiIndex.from_product([['b', 'c'], [1, 2]]) + + with pytest.raises(ValueError, 
match="The 'sort' keyword only takes"): + getattr(idx1, method)(idx2, sort=True) diff --git a/pandas/tests/indexes/period/test_asfreq.py b/pandas/tests/indexes/period/test_asfreq.py index 2dd49e7e0845e..30b416e3fe9dd 100644 --- a/pandas/tests/indexes/period/test_asfreq.py +++ b/pandas/tests/indexes/period/test_asfreq.py @@ -67,7 +67,9 @@ def test_asfreq(self): assert pi7.asfreq('H', 'S') == pi5 assert pi7.asfreq('Min', 'S') == pi6 - pytest.raises(ValueError, pi7.asfreq, 'T', 'foo') + msg = "How must be one of S or E" + with pytest.raises(ValueError, match=msg): + pi7.asfreq('T', 'foo') result1 = pi1.asfreq('3M') result2 = pi1.asfreq('M') expected = period_range(freq='M', start='2001-12', end='2001-12') diff --git a/pandas/tests/indexes/period/test_construction.py b/pandas/tests/indexes/period/test_construction.py index 916260c4cee7e..f1adeca7245f6 100644 --- a/pandas/tests/indexes/period/test_construction.py +++ b/pandas/tests/indexes/period/test_construction.py @@ -1,6 +1,7 @@ import numpy as np import pytest +from pandas._libs.tslibs.period import IncompatibleFrequency from pandas.compat import PY3, lmap, lrange, text_type from pandas.core.dtypes.dtypes import PeriodDtype @@ -66,12 +67,17 @@ def test_constructor_field_arrays(self): years = [2007, 2007, 2007] months = [1, 2] - pytest.raises(ValueError, PeriodIndex, year=years, month=months, - freq='M') - pytest.raises(ValueError, PeriodIndex, year=years, month=months, - freq='2M') - pytest.raises(ValueError, PeriodIndex, year=years, month=months, - freq='M', start=Period('2007-01', freq='M')) + + msg = "Mismatched Period array lengths" + with pytest.raises(ValueError, match=msg): + PeriodIndex(year=years, month=months, freq='M') + with pytest.raises(ValueError, match=msg): + PeriodIndex(year=years, month=months, freq='2M') + + msg = "Can either instantiate from fields or endpoints, but not both" + with pytest.raises(ValueError, match=msg): + PeriodIndex(year=years, month=months, freq='M', + start=Period('2007-01', freq='M')) years = [2007, 2007, 2007] months = [1, 2, 3] @@ -81,8 +87,8 @@ def test_constructor_field_arrays(self): def test_constructor_U(self): # U was used as undefined period - pytest.raises(ValueError, period_range, '2007-1-1', periods=500, - freq='X') + with pytest.raises(ValueError, match="Invalid frequency: X"): + period_range('2007-1-1', periods=500, freq='X') def test_constructor_nano(self): idx = period_range(start=Period(ordinal=1, freq='N'), @@ -103,17 +109,29 @@ def test_constructor_arrays_negative_year(self): tm.assert_index_equal(pindex.quarter, pd.Index(quarters)) def test_constructor_invalid_quarters(self): - pytest.raises(ValueError, PeriodIndex, year=lrange(2000, 2004), - quarter=lrange(4), freq='Q-DEC') + msg = "Quarter must be 1 <= q <= 4" + with pytest.raises(ValueError, match=msg): + PeriodIndex(year=lrange(2000, 2004), quarter=lrange(4), + freq='Q-DEC') def test_constructor_corner(self): - pytest.raises(ValueError, PeriodIndex, periods=10, freq='A') + msg = "Not enough parameters to construct Period range" + with pytest.raises(ValueError, match=msg): + PeriodIndex(periods=10, freq='A') start = Period('2007', freq='A-JUN') end = Period('2010', freq='A-DEC') - pytest.raises(ValueError, PeriodIndex, start=start, end=end) - pytest.raises(ValueError, PeriodIndex, start=start) - pytest.raises(ValueError, PeriodIndex, end=end) + + msg = "start and end must have same freq" + with pytest.raises(ValueError, match=msg): + PeriodIndex(start=start, end=end) + + msg = ("Of the three parameters: start, end, and 
periods, exactly two" + " must be specified") + with pytest.raises(ValueError, match=msg): + PeriodIndex(start=start) + with pytest.raises(ValueError, match=msg): + PeriodIndex(end=end) result = period_range('2007-01', periods=10.5, freq='M') exp = period_range('2007-01', periods=10, freq='M') @@ -126,10 +144,15 @@ def test_constructor_fromarraylike(self): tm.assert_index_equal(PeriodIndex(idx.values), idx) tm.assert_index_equal(PeriodIndex(list(idx.values)), idx) - pytest.raises(ValueError, PeriodIndex, idx._ndarray_values) - pytest.raises(ValueError, PeriodIndex, list(idx._ndarray_values)) - pytest.raises(TypeError, PeriodIndex, - data=Period('2007', freq='A')) + msg = "freq not specified and cannot be inferred" + with pytest.raises(ValueError, match=msg): + PeriodIndex(idx._ndarray_values) + with pytest.raises(ValueError, match=msg): + PeriodIndex(list(idx._ndarray_values)) + + msg = "'Period' object is not iterable" + with pytest.raises(TypeError, match=msg): + PeriodIndex(data=Period('2007', freq='A')) result = PeriodIndex(iter(idx)) tm.assert_index_equal(result, idx) @@ -160,7 +183,9 @@ def test_constructor_datetime64arr(self): vals = np.arange(100000, 100000 + 10000, 100, dtype=np.int64) vals = vals.view(np.dtype('M8[us]')) - pytest.raises(ValueError, PeriodIndex, vals, freq='D') + msg = r"Wrong dtype: datetime64\[us\]" + with pytest.raises(ValueError, match=msg): + PeriodIndex(vals, freq='D') @pytest.mark.parametrize('box', [None, 'series', 'index']) def test_constructor_datetime64arr_ok(self, box): @@ -300,17 +325,20 @@ def test_constructor_simple_new_empty(self): @pytest.mark.parametrize('floats', [[1.1, 2.1], np.array([1.1, 2.1])]) def test_constructor_floats(self, floats): - with pytest.raises(TypeError): + msg = r"PeriodIndex\._simple_new does not accept floats" + with pytest.raises(TypeError, match=msg): pd.PeriodIndex._simple_new(floats, freq='M') - with pytest.raises(TypeError): + msg = "PeriodIndex does not allow floating point in construction" + with pytest.raises(TypeError, match=msg): pd.PeriodIndex(floats, freq='M') def test_constructor_nat(self): - pytest.raises(ValueError, period_range, start='NaT', - end='2011-01-01', freq='M') - pytest.raises(ValueError, period_range, start='2011-01-01', - end='NaT', freq='M') + msg = "start and end must not be NaT" + with pytest.raises(ValueError, match=msg): + period_range(start='NaT', end='2011-01-01', freq='M') + with pytest.raises(ValueError, match=msg): + period_range(start='2011-01-01', end='NaT', freq='M') def test_constructor_year_and_quarter(self): year = pd.Series([2001, 2002, 2003]) @@ -455,9 +483,12 @@ def test_constructor(self): # Mixed freq should fail vals = [end_intv, Period('2006-12-31', 'w')] - pytest.raises(ValueError, PeriodIndex, vals) + msg = r"Input has different freq=W-SUN from PeriodIndex\(freq=B\)" + with pytest.raises(IncompatibleFrequency, match=msg): + PeriodIndex(vals) vals = np.array(vals) - pytest.raises(ValueError, PeriodIndex, vals) + with pytest.raises(IncompatibleFrequency, match=msg): + PeriodIndex(vals) def test_constructor_error(self): start = Period('02-Apr-2005', 'B') @@ -508,7 +539,8 @@ def setup_method(self, method): self.series = Series(period_range('2000-01-01', periods=10, freq='D')) def test_constructor_cant_cast_period(self): - with pytest.raises(TypeError): + msg = "Cannot cast PeriodArray to dtype float64" + with pytest.raises(TypeError, match=msg): Series(period_range('2000-01-01', periods=10, freq='D'), dtype=float) diff --git a/pandas/tests/indexes/period/test_indexing.py 
b/pandas/tests/indexes/period/test_indexing.py index 47c2edfd13395..fa8199b4e6163 100644 --- a/pandas/tests/indexes/period/test_indexing.py +++ b/pandas/tests/indexes/period/test_indexing.py @@ -84,7 +84,8 @@ def test_getitem_partial(self): rng = period_range('2007-01', periods=50, freq='M') ts = Series(np.random.randn(len(rng)), rng) - pytest.raises(KeyError, ts.__getitem__, '2006') + with pytest.raises(KeyError, match=r"^'2006'$"): + ts['2006'] result = ts['2008'] assert (result.index.year == 2008).all() @@ -326,7 +327,8 @@ def test_take_fill_value(self): with pytest.raises(ValueError, match=msg): idx.take(np.array([1, 0, -5]), fill_value=True) - with pytest.raises(IndexError): + msg = "index -5 is out of bounds for size 3" + with pytest.raises(IndexError, match=msg): idx.take(np.array([1, -5])) @@ -335,7 +337,8 @@ class TestIndexing(object): def test_get_loc_msg(self): idx = period_range('2000-1-1', freq='A', periods=10) bad_period = Period('2012', 'A') - pytest.raises(KeyError, idx.get_loc, bad_period) + with pytest.raises(KeyError, match=r"^Period\('2012', 'A-DEC'\)$"): + idx.get_loc(bad_period) try: idx.get_loc(bad_period) @@ -373,8 +376,13 @@ def test_get_loc(self): msg = "Cannot interpret 'foo' as period" with pytest.raises(KeyError, match=msg): idx0.get_loc('foo') - pytest.raises(KeyError, idx0.get_loc, 1.1) - pytest.raises(TypeError, idx0.get_loc, idx0) + with pytest.raises(KeyError, match=r"^1\.1$"): + idx0.get_loc(1.1) + + msg = (r"'PeriodIndex\(\['2017-09-01', '2017-09-02', '2017-09-03'\]," + r" dtype='period\[D\]', freq='D'\)' is an invalid key") + with pytest.raises(TypeError, match=msg): + idx0.get_loc(idx0) # get the location of p1/p2 from # monotonic increasing PeriodIndex with duplicate @@ -391,8 +399,13 @@ def test_get_loc(self): with pytest.raises(KeyError, match=msg): idx1.get_loc('foo') - pytest.raises(KeyError, idx1.get_loc, 1.1) - pytest.raises(TypeError, idx1.get_loc, idx1) + with pytest.raises(KeyError, match=r"^1\.1$"): + idx1.get_loc(1.1) + + msg = (r"'PeriodIndex\(\['2017-09-02', '2017-09-02', '2017-09-03'\]," + r" dtype='period\[D\]', freq='D'\)' is an invalid key") + with pytest.raises(TypeError, match=msg): + idx1.get_loc(idx1) # get the location of p1/p2 from # non-monotonic increasing/decreasing PeriodIndex with duplicate @@ -441,18 +454,6 @@ def test_is_monotonic_decreasing(self): assert idx_dec1.is_monotonic_decreasing is True assert idx.is_monotonic_decreasing is False - def test_is_unique(self): - # GH 17717 - p0 = pd.Period('2017-09-01') - p1 = pd.Period('2017-09-02') - p2 = pd.Period('2017-09-03') - - idx0 = pd.PeriodIndex([p0, p1, p2]) - assert idx0.is_unique is True - - idx1 = pd.PeriodIndex([p1, p1, p2]) - assert idx1.is_unique is False - def test_contains(self): # GH 17717 p0 = pd.Period('2017-09-01') @@ -581,7 +582,7 @@ def test_get_loc2(self): msg = 'Input has different freq=None from PeriodArray\\(freq=D\\)' with pytest.raises(ValueError, match=msg): idx.get_loc('2000-01-10', method='nearest', tolerance='1 hour') - with pytest.raises(KeyError): + with pytest.raises(KeyError, match=r"^Period\('2000-01-10', 'D'\)$"): idx.get_loc('2000-01-10', method='nearest', tolerance='1 day') with pytest.raises( ValueError, diff --git a/pandas/tests/indexes/period/test_period.py b/pandas/tests/indexes/period/test_period.py index 464ff7aa5d58d..89bcf56dbda71 100644 --- a/pandas/tests/indexes/period/test_period.py +++ b/pandas/tests/indexes/period/test_period.py @@ -71,13 +71,15 @@ def test_fillna_period(self): pd.Period('2011-01-01', freq='D')), exp) def 
test_no_millisecond_field(self): - with pytest.raises(AttributeError): + msg = "type object 'DatetimeIndex' has no attribute 'millisecond'" + with pytest.raises(AttributeError, match=msg): DatetimeIndex.millisecond - with pytest.raises(AttributeError): + msg = "'DatetimeIndex' object has no attribute 'millisecond'" + with pytest.raises(AttributeError, match=msg): DatetimeIndex([]).millisecond - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_difference_freq(self, sort): # GH14323: difference of Period MUST preserve frequency # but the ability to union results must be preserved @@ -98,8 +100,8 @@ def test_difference_freq(self, sort): def test_hash_error(self): index = period_range('20010101', periods=10) - with pytest.raises(TypeError, match=("unhashable type: %r" % - type(index).__name__)): + msg = "unhashable type: '{}'".format(type(index).__name__) + with pytest.raises(TypeError, match=msg): hash(index) def test_make_time_series(self): @@ -124,7 +126,8 @@ def test_shallow_copy_i8(self): def test_shallow_copy_changing_freq_raises(self): pi = period_range("2018-01-01", periods=3, freq="2D") - with pytest.raises(IncompatibleFrequency, match="are different"): + msg = "specified freq and dtype are different" + with pytest.raises(IncompatibleFrequency, match=msg): pi._shallow_copy(pi, freq="H") def test_dtype_str(self): @@ -214,21 +217,17 @@ def test_period_index_length(self): assert (i1 == i2).all() assert i1.freq == i2.freq - try: + msg = "start and end must have same freq" + with pytest.raises(ValueError, match=msg): period_range(start=start, end=end_intv) - raise AssertionError('Cannot allow mixed freq for start and end') - except ValueError: - pass end_intv = Period('2005-05-01', 'B') i1 = period_range(start=start, end=end_intv) - try: + msg = ("Of the three parameters: start, end, and periods, exactly two" + " must be specified") + with pytest.raises(ValueError, match=msg): period_range(start=start) - raise AssertionError( - 'Must specify periods if missing start or end') - except ValueError: - pass # infer freq from first element i2 = PeriodIndex([end_intv, Period('2005-05-05', 'B')]) @@ -241,9 +240,12 @@ def test_period_index_length(self): # Mixed freq should fail vals = [end_intv, Period('2006-12-31', 'w')] - pytest.raises(ValueError, PeriodIndex, vals) + msg = r"Input has different freq=W-SUN from PeriodIndex\(freq=B\)" + with pytest.raises(IncompatibleFrequency, match=msg): + PeriodIndex(vals) vals = np.array(vals) - pytest.raises(ValueError, PeriodIndex, vals) + with pytest.raises(ValueError, match=msg): + PeriodIndex(vals) def test_fields(self): # year, month, day, hour, minute @@ -381,7 +383,9 @@ def test_contains_nat(self): assert np.nan in idx def test_periods_number_check(self): - with pytest.raises(ValueError): + msg = ("Of the three parameters: start, end, and periods, exactly two" + " must be specified") + with pytest.raises(ValueError, match=msg): period_range('2011-1-1', '2012-1-1', 'B') def test_start_time(self): @@ -500,7 +504,8 @@ def test_is_full(self): assert index.is_full index = PeriodIndex([2006, 2005, 2005], freq='A') - pytest.raises(ValueError, getattr, index, 'is_full') + with pytest.raises(ValueError, match="Index is not monotonic"): + index.is_full assert index[:0].is_full @@ -574,5 +579,6 @@ def test_maybe_convert_timedelta(): assert pi._maybe_convert_timedelta(2) == 2 offset = offsets.BusinessDay() - with pytest.raises(ValueError, match='freq'): + msg = r"Input has different freq=B from 
PeriodIndex\(freq=D\)" + with pytest.raises(ValueError, match=msg): pi._maybe_convert_timedelta(offset) diff --git a/pandas/tests/indexes/period/test_setops.py b/pandas/tests/indexes/period/test_setops.py index a97ab47bcda16..bf29edad4841e 100644 --- a/pandas/tests/indexes/period/test_setops.py +++ b/pandas/tests/indexes/period/test_setops.py @@ -38,7 +38,7 @@ def test_join_does_not_recur(self): df.columns[0], df.columns[1]], object) tm.assert_index_equal(res, expected) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_union(self, sort): # union other1 = pd.period_range('1/1/2000', freq='D', periods=5) @@ -97,11 +97,11 @@ def test_union(self, sort): (rng8, other8, expected8)]: result_union = rng.union(other, sort=sort) - if sort: + if sort is None: expected = expected.sort_values() tm.assert_index_equal(result_union, expected) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_union_misc(self, sort): index = period_range('1/1/2000', '1/20/2000', freq='D') @@ -110,7 +110,7 @@ def test_union_misc(self, sort): # not in order result = _permute(index[:-5]).union(_permute(index[10:]), sort=sort) - if sort: + if sort is None: tm.assert_index_equal(result, index) assert tm.equalContents(result, index) @@ -139,7 +139,7 @@ def test_union_dataframe_index(self): exp = pd.period_range('1/1/1980', '1/1/2012', freq='M') tm.assert_index_equal(df.index, exp) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_intersection(self, sort): index = period_range('1/1/2000', '1/20/2000', freq='D') @@ -150,7 +150,7 @@ def test_intersection(self, sort): left = _permute(index[:-5]) right = _permute(index[10:]) result = left.intersection(right, sort=sort) - if sort: + if sort is None: tm.assert_index_equal(result, index[10:-5]) assert tm.equalContents(result, index[10:-5]) @@ -164,7 +164,7 @@ def test_intersection(self, sort): with pytest.raises(period.IncompatibleFrequency): index.intersection(index3, sort=sort) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_intersection_cases(self, sort): base = period_range('6/1/2000', '6/30/2000', freq='D', name='idx') @@ -210,7 +210,7 @@ def test_intersection_cases(self, sort): for (rng, expected) in [(rng2, expected2), (rng3, expected3), (rng4, expected4)]: result = base.intersection(rng, sort=sort) - if sort: + if sort is None: expected = expected.sort_values() tm.assert_index_equal(result, expected) assert result.name == expected.name @@ -224,7 +224,7 @@ def test_intersection_cases(self, sort): result = rng.intersection(rng[0:0]) assert len(result) == 0 - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_difference(self, sort): # diff period_rng = ['1/3/2000', '1/2/2000', '1/1/2000', '1/5/2000', @@ -276,6 +276,6 @@ def test_difference(self, sort): (rng6, other6, expected6), (rng7, other7, expected7), ]: result_difference = rng.difference(other, sort=sort) - if sort: + if sort is None: expected = expected.sort_values() tm.assert_index_equal(result_difference, expected) diff --git a/pandas/tests/indexes/test_base.py b/pandas/tests/indexes/test_base.py index f3e9d835c7391..26dcf7d6bc234 100644 --- a/pandas/tests/indexes/test_base.py +++ b/pandas/tests/indexes/test_base.py @@ -3,6 +3,8 @@ from collections import defaultdict from datetime import datetime, timedelta import math +import operator 
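The pattern applied mechanically throughout these test files is the same: a bare `pytest.raises(Error, callable, *args)` call becomes the context-manager form with a `match` regex, so each test also asserts the exact error message. A minimal standalone sketch of that idiom, separate from the patch itself (the `_divide` helper and its message are invented purely for illustration):

import re

import pytest


def _divide(a, b):
    # toy function, used only to demonstrate the match= idiom
    if b == 0:
        raise ZeroDivisionError("cannot divide {} by zero (b=0)".format(a))
    return a / b


def test_divide_by_zero_message():
    # match= is interpreted as a regular expression, so literal
    # metacharacters such as parentheses must be escaped; re.escape
    # does this safely for an exact-message check
    msg = re.escape("cannot divide 1 by zero (b=0)")
    with pytest.raises(ZeroDivisionError, match=msg):
        _divide(1, 0)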
+import re import sys import numpy as np @@ -106,7 +108,10 @@ def test_constructor_copy(self): def test_constructor_corner(self): # corner case - pytest.raises(TypeError, Index, 0) + msg = (r"Index\(\.\.\.\) must be called with a collection of some" + " kind, 0 was passed") + with pytest.raises(TypeError, match=msg): + Index(0) @pytest.mark.parametrize("index_vals", [ [('A', 1), 'B'], ['B', ('A', 1)]]) @@ -487,21 +492,22 @@ def test_constructor_cast(self): Index(["a", "b", "c"], dtype=float) def test_view_with_args(self): - restricted = ['unicodeIndex', 'strIndex', 'catIndex', 'boolIndex', 'empty'] - - for i in restricted: - ind = self.indices[i] - - # with arguments - pytest.raises(TypeError, lambda: ind.view('i8')) - - # these are ok for i in list(set(self.indices.keys()) - set(restricted)): ind = self.indices[i] + ind.view('i8') - # with arguments + @pytest.mark.parametrize('index_type', [ + 'unicodeIndex', + 'strIndex', + pytest.param('catIndex', marks=pytest.mark.xfail(reason="gh-25464")), + 'boolIndex', + 'empty']) + def test_view_with_args_object_array_raises(self, index_type): + ind = self.indices[index_type] + msg = "Cannot change data-type for object array" + with pytest.raises(TypeError, match=msg): ind.view('i8') def test_astype(self): @@ -564,8 +570,8 @@ def test_delete(self, pos, expected): def test_delete_raises(self): index = Index(['a', 'b', 'c', 'd'], name='index') - with pytest.raises((IndexError, ValueError)): - # either depending on numpy version + msg = "index 5 is out of bounds for axis 0 with size 4" + with pytest.raises(IndexError, match=msg): index.delete(5) def test_identical(self): @@ -682,14 +688,16 @@ def test_empty_fancy_raises(self, attr): assert index[[]].identical(empty_index) # np.ndarray only accepts ndarray of int & bool dtypes, so should Index - pytest.raises(IndexError, index.__getitem__, empty_farr) + msg = r"arrays used as indices must be of integer \(or boolean\) type" + with pytest.raises(IndexError, match=msg): + index[empty_farr] - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_intersection(self, sort): first = self.strIndex[:20] second = self.strIndex[:10] intersect = first.intersection(second, sort=sort) - if sort: + if sort is None: tm.assert_index_equal(intersect, second.sort_values()) assert tm.equalContents(intersect, second) @@ -701,7 +709,7 @@ def test_intersection(self, sort): (Index([3, 4, 5, 6, 7], name="index"), True), # preserve same name (Index([3, 4, 5, 6, 7], name="other"), False), # drop diff names (Index([3, 4, 5, 6, 7]), False)]) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_intersection_name_preservation(self, index2, keeps_name, sort): index1 = Index([1, 2, 3, 4, 5], name='index') expected = Index([3, 4, 5]) @@ -715,7 +723,7 @@ def test_intersection_name_preservation(self, index2, keeps_name, sort): @pytest.mark.parametrize("first_name,second_name,expected_name", [ ('A', 'A', 'A'), ('A', 'B', None), (None, 'B', None)]) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_intersection_name_preservation2(self, first_name, second_name, expected_name, sort): first = self.strIndex[5:20] @@ -728,7 +736,7 @@ def test_intersection_name_preservation2(self, first_name, second_name, @pytest.mark.parametrize("index2,keeps_name", [ (Index([4, 7, 6, 5, 3], name='index'), True), (Index([4, 7, 6, 5, 3], name='other'), False)]) - @pytest.mark.parametrize("sort", 
[True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_intersection_monotonic(self, index2, keeps_name, sort): index1 = Index([5, 3, 2, 4, 1], name='index') expected = Index([5, 3, 4]) @@ -737,25 +745,25 @@ def test_intersection_monotonic(self, index2, keeps_name, sort): expected.name = "index" result = index1.intersection(index2, sort=sort) - if sort: + if sort is None: expected = expected.sort_values() tm.assert_index_equal(result, expected) @pytest.mark.parametrize("index2,expected_arr", [ (Index(['B', 'D']), ['B']), (Index(['B', 'D', 'A']), ['A', 'B', 'A'])]) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_intersection_non_monotonic_non_unique(self, index2, expected_arr, sort): # non-monotonic non-unique index1 = Index(['A', 'B', 'A', 'C']) expected = Index(expected_arr, dtype='object') result = index1.intersection(index2, sort=sort) - if sort: + if sort is None: expected = expected.sort_values() tm.assert_index_equal(result, expected) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_intersect_str_dates(self, sort): dt_dates = [datetime(2012, 2, 9), datetime(2012, 2, 22)] @@ -765,7 +773,24 @@ def test_intersect_str_dates(self, sort): assert len(result) == 0 - @pytest.mark.parametrize("sort", [True, False]) + def test_intersect_nosort(self): + result = pd.Index(['c', 'b', 'a']).intersection(['b', 'a']) + expected = pd.Index(['b', 'a']) + tm.assert_index_equal(result, expected) + + def test_intersection_equal_sort(self): + idx = pd.Index(['c', 'a', 'b']) + tm.assert_index_equal(idx.intersection(idx, sort=False), idx) + tm.assert_index_equal(idx.intersection(idx, sort=None), idx) + + @pytest.mark.xfail(reason="Not implemented") + def test_intersection_equal_sort_true(self): + # TODO decide on True behaviour + idx = pd.Index(['c', 'a', 'b']) + sorted_ = pd.Index(['a', 'b', 'c']) + tm.assert_index_equal(idx.intersection(idx, sort=True), sorted_) + + @pytest.mark.parametrize("sort", [None, False]) def test_chained_union(self, sort): # Chained unions handles names correctly i1 = Index([1, 2], name='i1') @@ -782,7 +807,7 @@ def test_chained_union(self, sort): expected = j1.union(j2, sort=sort).union(j3, sort=sort) tm.assert_index_equal(union, expected) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_union(self, sort): # TODO: Replace with fixturesult first = self.strIndex[5:20] @@ -790,13 +815,65 @@ def test_union(self, sort): everything = self.strIndex[:20] union = first.union(second, sort=sort) - if sort: + if sort is None: tm.assert_index_equal(union, everything.sort_values()) assert tm.equalContents(union, everything) + @pytest.mark.parametrize('slice_', [slice(None), slice(0)]) + def test_union_sort_other_special(self, slice_): + # https://github.com/pandas-dev/pandas/issues/24959 + + idx = pd.Index([1, 0, 2]) + # default, sort=None + other = idx[slice_] + tm.assert_index_equal(idx.union(other), idx) + tm.assert_index_equal(other.union(idx), idx) + + # sort=False + tm.assert_index_equal(idx.union(other, sort=False), idx) + + @pytest.mark.xfail(reason="Not implemented") + @pytest.mark.parametrize('slice_', [slice(None), slice(0)]) + def test_union_sort_special_true(self, slice_): + # TODO decide on True behaviour + # sort=True + idx = pd.Index([1, 0, 2]) + # default, sort=None + other = idx[slice_] + + result = idx.union(other, sort=True) + expected = pd.Index([0, 1, 2]) + 
tm.assert_index_equal(result, expected) + + def test_union_sort_other_incomparable(self): + # https://github.com/pandas-dev/pandas/issues/24959 + idx = pd.Index([1, pd.Timestamp('2000')]) + # default (sort=None) + with tm.assert_produces_warning(RuntimeWarning): + result = idx.union(idx[:1]) + + tm.assert_index_equal(result, idx) + + # sort=None + with tm.assert_produces_warning(RuntimeWarning): + result = idx.union(idx[:1], sort=None) + tm.assert_index_equal(result, idx) + + # sort=False + result = idx.union(idx[:1], sort=False) + tm.assert_index_equal(result, idx) + + @pytest.mark.xfail(reason="Not implemented") + def test_union_sort_other_incomparable_true(self): + # TODO decide on True behaviour + # sort=True + idx = pd.Index([1, pd.Timestamp('2000')]) + with pytest.raises(TypeError, match='.*'): + idx.union(idx[:1], sort=True) + @pytest.mark.parametrize("klass", [ np.array, Series, list]) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_union_from_iterables(self, klass, sort): # GH 10149 # TODO: Replace with fixturesult @@ -806,29 +883,30 @@ def test_union_from_iterables(self, klass, sort): case = klass(second.values) result = first.union(case, sort=sort) - if sort: + if sort is None: tm.assert_index_equal(result, everything.sort_values()) assert tm.equalContents(result, everything) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_union_identity(self, sort): # TODO: replace with fixturesult first = self.strIndex[5:20] union = first.union(first, sort=sort) - assert union is first + # i.e. identity is not preserved when sort is True + assert (union is first) is (not sort) union = first.union([], sort=sort) - assert union is first + assert (union is first) is (not sort) union = Index([]).union(first, sort=sort) - assert union is first + assert (union is first) is (not sort) @pytest.mark.parametrize("first_list", [list('ba'), list()]) @pytest.mark.parametrize("second_list", [list('ab'), list()]) @pytest.mark.parametrize("first_name, second_name, expected_name", [ ('A', 'B', None), (None, 'B', None), ('A', None, None)]) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_union_name_preservation(self, first_list, second_list, first_name, second_name, expected_name, sort): first = Index(first_list, name=first_name) @@ -837,14 +915,14 @@ def test_union_name_preservation(self, first_list, second_list, first_name, vals = set(first_list).union(second_list) - if sort and len(first_list) > 0 and len(second_list) > 0: + if sort is None and len(first_list) > 0 and len(second_list) > 0: expected = Index(sorted(vals), name=expected_name) tm.assert_index_equal(union, expected) else: expected = Index(vals, name=expected_name) assert tm.equalContents(union, expected) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_union_dt_as_obj(self, sort): # TODO: Replace with fixturesult firstCat = self.strIndex.union(self.dateIndex) @@ -861,6 +939,15 @@ def test_union_dt_as_obj(self, sort): tm.assert_contains_all(self.strIndex, secondCat) tm.assert_contains_all(self.dateIndex, firstCat) + @pytest.mark.parametrize("method", ['union', 'intersection', 'difference', + 'symmetric_difference']) + def test_setops_disallow_true(self, method): + idx1 = pd.Index(['a', 'b']) + idx2 = pd.Index(['b', 'c']) + + with pytest.raises(ValueError, match="The 'sort' keyword only takes"): + getattr(idx1, 
method)(idx2, sort=True) + def test_map_identity_mapping(self): # GH 12766 # TODO: replace with fixture @@ -982,7 +1069,7 @@ def test_append_empty_preserve_name(self, name, expected): @pytest.mark.parametrize("second_name,expected", [ (None, None), ('name', 'name')]) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_difference_name_preservation(self, second_name, expected, sort): # TODO: replace with fixturesult first = self.strIndex[5:20] @@ -1000,7 +1087,7 @@ def test_difference_name_preservation(self, second_name, expected, sort): else: assert result.name == expected - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_difference_empty_arg(self, sort): first = self.strIndex[5:20] first.name == 'name' @@ -1009,7 +1096,7 @@ def test_difference_empty_arg(self, sort): assert tm.equalContents(result, first) assert result.name == first.name - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_difference_identity(self, sort): first = self.strIndex[5:20] first.name == 'name' @@ -1018,7 +1105,7 @@ def test_difference_identity(self, sort): assert len(result) == 0 assert result.name == first.name - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_difference_sort(self, sort): first = self.strIndex[5:20] second = self.strIndex[:10] @@ -1026,12 +1113,12 @@ def test_difference_sort(self, sort): result = first.difference(second, sort) expected = self.strIndex[10:20] - if sort: + if sort is None: expected = expected.sort_values() tm.assert_index_equal(result, expected) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_symmetric_difference(self, sort): # smoke index1 = Index([5, 2, 3, 4], name='index1') @@ -1040,7 +1127,7 @@ def test_symmetric_difference(self, sort): expected = Index([5, 1]) assert tm.equalContents(result, expected) assert result.name is None - if sort: + if sort is None: expected = expected.sort_values() tm.assert_index_equal(result, expected) @@ -1049,13 +1136,43 @@ def test_symmetric_difference(self, sort): assert tm.equalContents(result, expected) assert result.name is None - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize('opname', ['difference', 'symmetric_difference']) + def test_difference_incomparable(self, opname): + a = pd.Index([3, pd.Timestamp('2000'), 1]) + b = pd.Index([2, pd.Timestamp('1999'), 1]) + op = operator.methodcaller(opname, b) + + # sort=None, the default + result = op(a) + expected = pd.Index([3, pd.Timestamp('2000'), 2, pd.Timestamp('1999')]) + if opname == 'difference': + expected = expected[:2] + tm.assert_index_equal(result, expected) + + # sort=False + op = operator.methodcaller(opname, b, sort=False) + result = op(a) + tm.assert_index_equal(result, expected) + + @pytest.mark.xfail(reason="Not implemented") + @pytest.mark.parametrize('opname', ['difference', 'symmetric_difference']) + def test_difference_incomparable_true(self, opname): + # TODO decide on True behaviour + # # sort=True, raises + a = pd.Index([3, pd.Timestamp('2000'), 1]) + b = pd.Index([2, pd.Timestamp('1999'), 1]) + op = operator.methodcaller(opname, b, sort=True) + + with pytest.raises(TypeError, match='Cannot compare'): + op(a) + + @pytest.mark.parametrize("sort", [None, False]) def test_symmetric_difference_mi(self, sort): index1 = MultiIndex.from_tuples(self.tuples) index2 = 
MultiIndex.from_tuples([('foo', 1), ('bar', 3)]) result = index1.symmetric_difference(index2, sort=sort) expected = MultiIndex.from_tuples([('bar', 2), ('baz', 3), ('bar', 3)]) - if sort: + if sort is None: expected = expected.sort_values() tm.assert_index_equal(result, expected) assert tm.equalContents(result, expected) @@ -1063,18 +1180,18 @@ def test_symmetric_difference_mi(self, sort): @pytest.mark.parametrize("index2,expected", [ (Index([0, 1, np.nan]), Index([2.0, 3.0, 0.0])), (Index([0, 1]), Index([np.nan, 2.0, 3.0, 0.0]))]) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_symmetric_difference_missing(self, index2, expected, sort): # GH 13514 change: {nan} - {nan} == {} # (GH 6444, sorting of nans, is no longer an issue) index1 = Index([1, np.nan, 2, 3]) result = index1.symmetric_difference(index2, sort=sort) - if sort: + if sort is None: expected = expected.sort_values() tm.assert_index_equal(result, expected) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_symmetric_difference_non_index(self, sort): index1 = Index([1, 2, 3, 4], name='index1') index2 = np.array([2, 3, 4, 5]) @@ -1088,7 +1205,7 @@ def test_symmetric_difference_non_index(self, sort): assert tm.equalContents(result, expected) assert result.name == 'new_name' - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_difference_type(self, sort): # GH 20040 # If taking difference of a set and itself, it @@ -1099,7 +1216,7 @@ def test_difference_type(self, sort): expected = index.drop(index) tm.assert_index_equal(result, expected) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_intersection_difference(self, sort): # GH 20040 # Test that the intersection of an index with an @@ -1316,13 +1433,14 @@ def test_get_indexer_strings(self, method, expected): def test_get_indexer_strings_raises(self): index = pd.Index(['b', 'c']) - with pytest.raises(TypeError): + msg = r"unsupported operand type\(s\) for -: 'str' and 'str'" + with pytest.raises(TypeError, match=msg): index.get_indexer(['a', 'b', 'c', 'd'], method='nearest') - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): index.get_indexer(['a', 'b', 'c', 'd'], method='pad', tolerance=2) - with pytest.raises(TypeError): + with pytest.raises(TypeError, match=msg): index.get_indexer(['a', 'b', 'c', 'd'], method='pad', tolerance=[2, 2, 2, 2]) @@ -1431,8 +1549,9 @@ def test_slice_locs(self, dtype): assert index2.slice_locs(8, 2) == (2, 6) assert index2.slice_locs(7, 3) == (2, 5) - def test_slice_float_locs(self): - index = Index(np.array([0, 1, 2, 5, 6, 7, 9, 10], dtype=float)) + @pytest.mark.parametrize("dtype", [int, float]) + def test_slice_float_locs(self, dtype): + index = Index(np.array([0, 1, 2, 5, 6, 7, 9, 10], dtype=dtype)) n = len(index) assert index.slice_locs(5.0, 10.0) == (3, n) assert index.slice_locs(4.5, 10.5) == (3, 8) @@ -1441,24 +1560,6 @@ def test_slice_float_locs(self): assert index2.slice_locs(8.5, 1.5) == (2, 6) assert index2.slice_locs(10.5, -1) == (0, n) - @pytest.mark.xfail(reason="Assertions were not correct - see GH#20915") - def test_slice_ints_with_floats_raises(self): - # int slicing with floats - # GH 4892, these are all TypeErrors - index = Index(np.array([0, 1, 2, 5, 6, 7, 9, 10], dtype=int)) - n = len(index) - - pytest.raises(TypeError, - lambda: index.slice_locs(5.0, 10.0)) - 
pytest.raises(TypeError, - lambda: index.slice_locs(4.5, 10.5)) - - index2 = index[::-1] - pytest.raises(TypeError, - lambda: index2.slice_locs(8.5, 1.5), (2, 6)) - pytest.raises(TypeError, - lambda: index2.slice_locs(10.5, -1), (0, n)) - def test_slice_locs_dup(self): index = Index(['a', 'a', 'b', 'c', 'd', 'd']) assert index.slice_locs('a', 'd') == (0, 6) @@ -1592,23 +1693,33 @@ def test_drop_tuple(self, values, to_drop): tm.assert_index_equal(result, expected) removed = index.drop(to_drop[1]) + msg = r"\"\[{}\] not found in axis\"".format( + re.escape(to_drop[1].__repr__())) for drop_me in to_drop[1], [to_drop[1]]: - pytest.raises(KeyError, removed.drop, drop_me) + with pytest.raises(KeyError, match=msg): + removed.drop(drop_me) + + @pytest.mark.parametrize("method,expected,sort", [ + ('intersection', np.array([(1, 'A'), (2, 'A'), (1, 'B'), (2, 'B')], + dtype=[('num', int), ('let', 'a1')]), + False), - @pytest.mark.parametrize("method,expected", [ ('intersection', np.array([(1, 'A'), (1, 'B'), (2, 'A'), (2, 'B')], - dtype=[('num', int), ('let', 'a1')])), + dtype=[('num', int), ('let', 'a1')]), + None), + ('union', np.array([(1, 'A'), (1, 'B'), (1, 'C'), (2, 'A'), (2, 'B'), - (2, 'C')], dtype=[('num', int), ('let', 'a1')])) + (2, 'C')], dtype=[('num', int), ('let', 'a1')]), + None) ]) - def test_tuple_union_bug(self, method, expected): + def test_tuple_union_bug(self, method, expected, sort): index1 = Index(np.array([(1, 'A'), (2, 'A'), (1, 'B'), (2, 'B')], dtype=[('num', int), ('let', 'a1')])) index2 = Index(np.array([(1, 'A'), (2, 'A'), (1, 'B'), (2, 'B'), (1, 'C'), (2, 'C')], dtype=[('num', int), ('let', 'a1')])) - result = getattr(index1, method)(index2) + result = getattr(index1, method)(index2, sort=sort) assert result.ndim == 1 expected = Index(expected) @@ -2247,20 +2358,20 @@ def test_unique_na(self): result = idx.unique() tm.assert_index_equal(result, expected) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_intersection_base(self, sort): # (same results for py2 and py3 but sortedness not tested elsewhere) index = self.create_index() first = index[:5] second = index[:3] - expected = Index([0, 1, 'a']) if sort else Index([0, 'a', 1]) + expected = Index([0, 1, 'a']) if sort is None else Index([0, 'a', 1]) result = first.intersection(second, sort=sort) tm.assert_index_equal(result, expected) @pytest.mark.parametrize("klass", [ np.array, Series, list]) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_intersection_different_type_base(self, klass, sort): # GH 10149 index = self.create_index() @@ -2270,7 +2381,7 @@ def test_intersection_different_type_base(self, klass, sort): result = first.intersection(klass(second.values), sort=sort) assert tm.equalContents(result, second) - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_difference_base(self, sort): # (same results for py2 and py3 but sortedness not tested elsewhere) index = self.create_index() @@ -2279,7 +2390,7 @@ def test_difference_base(self, sort): result = first.difference(second, sort) expected = Index([0, 'a', 1]) - if sort: + if sort is None: expected = Index(safe_sort(expected)) tm.assert_index_equal(result, expected) diff --git a/pandas/tests/indexes/test_category.py b/pandas/tests/indexes/test_category.py index 582d466c6178e..95fac2f6ae05b 100644 --- a/pandas/tests/indexes/test_category.py +++ b/pandas/tests/indexes/test_category.py @@ -181,18 
+181,21 @@ def test_create_categorical(self): expected = Categorical(['a', 'b', 'c']) tm.assert_categorical_equal(result, expected) - def test_disallow_set_ops(self): - + @pytest.mark.parametrize('func,op_name', [ + (lambda idx: idx - idx, '__sub__'), + (lambda idx: idx + idx, '__add__'), + (lambda idx: idx - ['a', 'b'], '__sub__'), + (lambda idx: idx + ['a', 'b'], '__add__'), + (lambda idx: ['a', 'b'] - idx, '__rsub__'), + (lambda idx: ['a', 'b'] + idx, '__radd__'), + ]) + def test_disallow_set_ops(self, func, op_name): # GH 10039 # set ops (+/-) raise TypeError idx = pd.Index(pd.Categorical(['a', 'b'])) - - pytest.raises(TypeError, lambda: idx - idx) - pytest.raises(TypeError, lambda: idx + idx) - pytest.raises(TypeError, lambda: idx - ['a', 'b']) - pytest.raises(TypeError, lambda: idx + ['a', 'b']) - pytest.raises(TypeError, lambda: ['a', 'b'] - idx) - pytest.raises(TypeError, lambda: ['a', 'b'] + idx) + msg = "cannot perform {} with this index type: CategoricalIndex" + with pytest.raises(TypeError, match=msg.format(op_name)): + func(idx) def test_method_delegation(self): @@ -231,8 +234,9 @@ def test_method_delegation(self): list('aabbca'), categories=list('cabdef'), ordered=True)) # invalid - pytest.raises(ValueError, lambda: ci.set_categories( - list('cab'), inplace=True)) + msg = "cannot use inplace with CategoricalIndex" + with pytest.raises(ValueError, match=msg): + ci.set_categories(list('cab'), inplace=True) def test_contains(self): @@ -357,12 +361,11 @@ def test_append(self): tm.assert_index_equal(result, ci, exact=True) # appending with different categories or reordered is not ok - pytest.raises( - TypeError, - lambda: ci.append(ci.values.set_categories(list('abcd')))) - pytest.raises( - TypeError, - lambda: ci.append(ci.values.reorder_categories(list('abc')))) + msg = "all inputs must be Index" + with pytest.raises(TypeError, match=msg): + ci.append(ci.values.set_categories(list('abcd'))) + with pytest.raises(TypeError, match=msg): + ci.append(ci.values.reorder_categories(list('abc'))) # with objects result = ci.append(Index(['c', 'a'])) @@ -370,7 +373,9 @@ def test_append(self): tm.assert_index_equal(result, expected, exact=True) # invalid objects - pytest.raises(TypeError, lambda: ci.append(Index(['a', 'd']))) + msg = "cannot append a non-category item to a CategoricalIndex" + with pytest.raises(TypeError, match=msg): + ci.append(Index(['a', 'd'])) # GH14298 - if base object is not categorical -> coerce to object result = Index(['c', 'a']).append(ci) @@ -406,7 +411,10 @@ def test_insert(self): tm.assert_index_equal(result, expected, exact=True) # invalid - pytest.raises(TypeError, lambda: ci.insert(0, 'd')) + msg = ("cannot insert an item into a CategoricalIndex that is not" + " already an existing category") + with pytest.raises(TypeError, match=msg): + ci.insert(0, 'd') # GH 18295 (test missing) expected = CategoricalIndex(['a', np.nan, 'a', 'b', 'c', 'b']) @@ -611,15 +619,6 @@ def test_is_monotonic(self, data, non_lexsorted_data): assert c.is_monotonic_increasing is True assert c.is_monotonic_decreasing is False - @pytest.mark.parametrize('values, expected', [ - ([1, 2, 3], True), - ([1, 3, 1], False), - (list('abc'), True), - (list('aba'), False)]) - def test_is_unique(self, values, expected): - ci = CategoricalIndex(values) - assert ci.is_unique is expected - def test_has_duplicates(self): idx = CategoricalIndex([0, 0, 0], name='foo') @@ -642,12 +641,16 @@ def test_get_indexer(self): r1 = idx1.get_indexer(idx2) assert_almost_equal(r1, np.array([0, 1, 2, -1], 
-        pytest.raises(NotImplementedError,
-                      lambda: idx2.get_indexer(idx1, method='pad'))
-        pytest.raises(NotImplementedError,
-                      lambda: idx2.get_indexer(idx1, method='backfill'))
-        pytest.raises(NotImplementedError,
-                      lambda: idx2.get_indexer(idx1, method='nearest'))
+        msg = ("method='pad' and method='backfill' not implemented yet for"
+               " CategoricalIndex")
+        with pytest.raises(NotImplementedError, match=msg):
+            idx2.get_indexer(idx1, method='pad')
+        with pytest.raises(NotImplementedError, match=msg):
+            idx2.get_indexer(idx1, method='backfill')
+
+        msg = "method='nearest' not implemented yet for CategoricalIndex"
+        with pytest.raises(NotImplementedError, match=msg):
+            idx2.get_indexer(idx1, method='nearest')
 
     def test_get_loc(self):
         # GH 12531
@@ -785,12 +788,15 @@ def test_equals_categorical(self):
         # invalid comparisons
         with pytest.raises(ValueError, match="Lengths must match"):
             ci1 == Index(['a', 'b', 'c'])
-        pytest.raises(TypeError, lambda: ci1 == ci2)
-        pytest.raises(
-            TypeError, lambda: ci1 == Categorical(ci1.values, ordered=False))
-        pytest.raises(
-            TypeError,
-            lambda: ci1 == Categorical(ci1.values, categories=list('abc')))
+
+        msg = ("categorical index comparisons must have the same categories"
+               " and ordered attributes")
+        with pytest.raises(TypeError, match=msg):
+            ci1 == ci2
+        with pytest.raises(TypeError, match=msg):
+            ci1 == Categorical(ci1.values, ordered=False)
+        with pytest.raises(TypeError, match=msg):
+            ci1 == Categorical(ci1.values, categories=list('abc'))
 
         # tests
         # make sure that we are testing for category inclusion properly
diff --git a/pandas/tests/indexes/test_common.py b/pandas/tests/indexes/test_common.py
index fd356202a8ce5..03448129a48fc 100644
--- a/pandas/tests/indexes/test_common.py
+++ b/pandas/tests/indexes/test_common.py
@@ -3,6 +3,8 @@
 any index subclass. Makes use of the `indices` fixture defined
 in pandas/tests/indexes/conftest.py.
""" +import re + import numpy as np import pytest @@ -189,8 +191,14 @@ def test_unique(self, indices): result = indices.unique(level=level) tm.assert_index_equal(result, expected) - for level in 3, 'wrong': - pytest.raises((IndexError, KeyError), indices.unique, level=level) + msg = "Too many levels: Index has only 1 level, not 4" + with pytest.raises(IndexError, match=msg): + indices.unique(level=3) + + msg = r"Level wrong must be same as name \({}\)".format( + re.escape(indices.name.__repr__())) + with pytest.raises(KeyError, match=msg): + indices.unique(level='wrong') def test_get_unique_index(self, indices): # MultiIndex tested separately @@ -239,12 +247,16 @@ def test_get_unique_index(self, indices): tm.assert_index_equal(result, expected) def test_sort(self, indices): - pytest.raises(TypeError, indices.sort) + msg = "cannot sort an Index object in-place, use sort_values instead" + with pytest.raises(TypeError, match=msg): + indices.sort() def test_mutability(self, indices): if not len(indices): pytest.skip('Skip check for empty Index') - pytest.raises(TypeError, indices.__setitem__, 0, indices[0]) + msg = "Index does not support mutable operations" + with pytest.raises(TypeError, match=msg): + indices[0] = indices[0] def test_view(self, indices): assert indices.view().name == indices.name diff --git a/pandas/tests/indexes/test_numeric.py b/pandas/tests/indexes/test_numeric.py index a64340c02cd22..26413f4519eff 100644 --- a/pandas/tests/indexes/test_numeric.py +++ b/pandas/tests/indexes/test_numeric.py @@ -1,15 +1,17 @@ # -*- coding: utf-8 -*- from datetime import datetime +import re import numpy as np import pytest from pandas._libs.tslibs import Timestamp -from pandas.compat import range +from pandas.compat import PY2, range import pandas as pd from pandas import Float64Index, Index, Int64Index, Series, UInt64Index +from pandas.api.types import pandas_dtype from pandas.tests.indexes.common import Base import pandas.util.testing as tm @@ -153,12 +155,22 @@ def test_constructor(self): result = Index(np.array([np.nan])) assert pd.isna(result.values).all() + @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails") def test_constructor_invalid(self): # invalid - pytest.raises(TypeError, Float64Index, 0.) - pytest.raises(TypeError, Float64Index, ['a', 'b', 0.]) - pytest.raises(TypeError, Float64Index, [Timestamp('20130101')]) + msg = (r"Float64Index\(\.\.\.\) must be called with a collection of" + r" some kind, 0\.0 was passed") + with pytest.raises(TypeError, match=msg): + Float64Index(0.) 
+ msg = ("String dtype not supported, you may need to explicitly cast to" + " a numeric type") + with pytest.raises(TypeError, match=msg): + Float64Index(['a', 'b', 0.]) + msg = (r"float\(\) argument must be a string or a number, not" + " 'Timestamp'") + with pytest.raises(TypeError, match=msg): + Float64Index([Timestamp('20130101')]) def test_constructor_coerce(self): @@ -216,12 +228,17 @@ def test_astype(self): # invalid for dtype in ['M8[ns]', 'm8[ns]']: - pytest.raises(TypeError, lambda: i.astype(dtype)) + msg = ("Cannot convert Float64Index to dtype {}; integer values" + " are required for conversion").format(pandas_dtype(dtype)) + with pytest.raises(TypeError, match=re.escape(msg)): + i.astype(dtype) # GH 13149 for dtype in ['int16', 'int32', 'int64']: i = Float64Index([0, 1.1, np.NAN]) - pytest.raises(ValueError, lambda: i.astype(dtype)) + msg = "Cannot convert NA to integer" + with pytest.raises(ValueError, match=msg): + i.astype(dtype) def test_type_coercion_fail(self, any_int_dtype): # see gh-15832 @@ -275,12 +292,16 @@ def test_get_loc(self): assert idx.get_loc(1.1, method) == loc assert idx.get_loc(1.1, method, tolerance=0.9) == loc - pytest.raises(KeyError, idx.get_loc, 'foo') - pytest.raises(KeyError, idx.get_loc, 1.5) - pytest.raises(KeyError, idx.get_loc, 1.5, method='pad', - tolerance=0.1) - pytest.raises(KeyError, idx.get_loc, True) - pytest.raises(KeyError, idx.get_loc, False) + with pytest.raises(KeyError, match="^'foo'$"): + idx.get_loc('foo') + with pytest.raises(KeyError, match=r"^1\.5$"): + idx.get_loc(1.5) + with pytest.raises(KeyError, match=r"^1\.5$"): + idx.get_loc(1.5, method='pad', tolerance=0.1) + with pytest.raises(KeyError, match="^True$"): + idx.get_loc(True) + with pytest.raises(KeyError, match="^False$"): + idx.get_loc(False) with pytest.raises(ValueError, match='must be numeric'): idx.get_loc(1.4, method='nearest', tolerance='foo') @@ -310,15 +331,20 @@ def test_get_loc_na(self): # not representable by slice idx = Float64Index([np.nan, 1, np.nan, np.nan]) assert idx.get_loc(1) == 1 - pytest.raises(KeyError, idx.slice_locs, np.nan) + msg = "'Cannot get left slice bound for non-unique label: nan" + with pytest.raises(KeyError, match=msg): + idx.slice_locs(np.nan) def test_get_loc_missing_nan(self): # GH 8569 idx = Float64Index([1, 2]) assert idx.get_loc(1) == 0 - pytest.raises(KeyError, idx.get_loc, 3) - pytest.raises(KeyError, idx.get_loc, np.nan) - pytest.raises(KeyError, idx.get_loc, [np.nan]) + with pytest.raises(KeyError, match=r"^3\.0$"): + idx.get_loc(3) + with pytest.raises(KeyError, match="^nan$"): + idx.get_loc(np.nan) + with pytest.raises(KeyError, match=r"^\[nan\]$"): + idx.get_loc([np.nan]) def test_contains_nans(self): i = Float64Index([1.0, 2.0, np.nan]) @@ -499,13 +525,17 @@ def test_union_noncomparable(self): tm.assert_index_equal(result, expected) def test_cant_or_shouldnt_cast(self): + msg = ("String dtype not supported, you may need to explicitly cast to" + " a numeric type") # can't data = ['foo', 'bar', 'baz'] - pytest.raises(TypeError, self._holder, data) + with pytest.raises(TypeError, match=msg): + self._holder(data) # shouldn't data = ['0', '1', '2'] - pytest.raises(TypeError, self._holder, data) + with pytest.raises(TypeError, match=msg): + self._holder(data) def test_view_index(self): self.index.view(Index) @@ -576,7 +606,10 @@ def test_constructor(self): tm.assert_index_equal(index, expected) # scalar raise Exception - pytest.raises(TypeError, Int64Index, 5) + msg = (r"Int64Index\(\.\.\.\) must be called with a collection of 
some" + " kind, 5 was passed") + with pytest.raises(TypeError, match=msg): + Int64Index(5) # copy arr = self.index.values diff --git a/pandas/tests/indexes/test_range.py b/pandas/tests/indexes/test_range.py index bbd1e0ccc19b1..96cf83d477376 100644 --- a/pandas/tests/indexes/test_range.py +++ b/pandas/tests/indexes/test_range.py @@ -503,7 +503,7 @@ def test_join_self(self): joined = self.index.join(self.index, how=kind) assert self.index is joined - @pytest.mark.parametrize("sort", [True, False]) + @pytest.mark.parametrize("sort", [None, False]) def test_intersection(self, sort): # intersect with Int64Index other = Index(np.arange(1, 6)) diff --git a/pandas/tests/indexes/timedeltas/test_arithmetic.py b/pandas/tests/indexes/timedeltas/test_arithmetic.py index 04977023d7c62..3173252e174ab 100644 --- a/pandas/tests/indexes/timedeltas/test_arithmetic.py +++ b/pandas/tests/indexes/timedeltas/test_arithmetic.py @@ -198,20 +198,34 @@ def test_ops_ndarray(self): expected = pd.to_timedelta(['2 days']).values tm.assert_numpy_array_equal(td + other, expected) tm.assert_numpy_array_equal(other + td, expected) - pytest.raises(TypeError, lambda: td + np.array([1])) - pytest.raises(TypeError, lambda: np.array([1]) + td) + msg = r"unsupported operand type\(s\) for \+: 'Timedelta' and 'int'" + with pytest.raises(TypeError, match=msg): + td + np.array([1]) + msg = (r"unsupported operand type\(s\) for \+: 'numpy.ndarray' and" + " 'Timedelta'") + with pytest.raises(TypeError, match=msg): + np.array([1]) + td expected = pd.to_timedelta(['0 days']).values tm.assert_numpy_array_equal(td - other, expected) tm.assert_numpy_array_equal(-other + td, expected) - pytest.raises(TypeError, lambda: td - np.array([1])) - pytest.raises(TypeError, lambda: np.array([1]) - td) + msg = r"unsupported operand type\(s\) for -: 'Timedelta' and 'int'" + with pytest.raises(TypeError, match=msg): + td - np.array([1]) + msg = (r"unsupported operand type\(s\) for -: 'numpy.ndarray' and" + " 'Timedelta'") + with pytest.raises(TypeError, match=msg): + np.array([1]) - td expected = pd.to_timedelta(['2 days']).values tm.assert_numpy_array_equal(td * np.array([2]), expected) tm.assert_numpy_array_equal(np.array([2]) * td, expected) - pytest.raises(TypeError, lambda: td * other) - pytest.raises(TypeError, lambda: other * td) + msg = ("ufunc multiply cannot use operands with types" + r" dtype\(' 1]) + msg = "Unordered Categoricals can only compare equality or not" + with pytest.raises(TypeError, match=msg): + df4[df4.index < 2] + with pytest.raises(TypeError, match=msg): + df4[df4.index > 1] def test_indexing_with_category(self): diff --git a/pandas/tests/indexing/test_chaining_and_caching.py b/pandas/tests/indexing/test_chaining_and_caching.py index e38c1b16b3b60..6070edca075c2 100644 --- a/pandas/tests/indexing/test_chaining_and_caching.py +++ b/pandas/tests/indexing/test_chaining_and_caching.py @@ -302,11 +302,11 @@ def test_setting_with_copy_bug(self): 'c': ['a', 'b', np.nan, 'd']}) mask = pd.isna(df.c) - def f(): + msg = ("A value is trying to be set on a copy of a slice from a" + " DataFrame") + with pytest.raises(com.SettingWithCopyError, match=msg): df[['c']][mask] = df[['b']][mask] - pytest.raises(com.SettingWithCopyError, f) - # invalid warning as we are returning a new object # GH 8730 df1 = DataFrame({'x': Series(['a', 'b', 'c']), @@ -357,7 +357,6 @@ def check(result, expected): check(result4, expected) @pytest.mark.filterwarnings("ignore::DeprecationWarning") - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") def 
        # GH 4939, make sure to update the cache on setitem
 
@@ -367,12 +366,6 @@ def test_cache_updating(self):
         assert "Hello Friend" in df['A'].index
         assert "Hello Friend" in df['B'].index
 
-        panel = tm.makePanel()
-        panel.ix[0]  # get first item into cache
-        panel.ix[:, :, 'A+1'] = panel.ix[:, :, 'A'] + 1
-        assert "A+1" in panel.ix[0].columns
-        assert "A+1" in panel.ix[1].columns
-
         # 10264
         df = DataFrame(np.zeros((5, 5), dtype='int64'), columns=[
             'a', 'b', 'c', 'd', 'e'], index=range(5))
diff --git a/pandas/tests/indexing/test_floats.py b/pandas/tests/indexing/test_floats.py
index de91b8f4a796c..b9b47338c9de2 100644
--- a/pandas/tests/indexing/test_floats.py
+++ b/pandas/tests/indexing/test_floats.py
@@ -6,7 +6,7 @@
 import pytest
 
 from pandas import (
-    DataFrame, Float64Index, Index, Int64Index, RangeIndex, Series)
+    DataFrame, Float64Index, Index, Int64Index, RangeIndex, Series, compat)
 import pandas.util.testing as tm
 from pandas.util.testing import assert_almost_equal, assert_series_equal
 
@@ -54,9 +54,11 @@ def test_scalar_error(self):
             with pytest.raises(TypeError, match=msg):
                 s.iloc[3.0]
 
-            def f():
+            msg = ("cannot do positional indexing on {klass} with these "
+                   r"indexers \[3\.0\] of {kind}".format(
+                       klass=type(i), kind=str(float)))
+            with pytest.raises(TypeError, match=msg):
                 s.iloc[3.0] = 0
-            pytest.raises(TypeError, f)
 
     @ignore_ix
     def test_scalar_non_numeric(self):
@@ -82,35 +84,46 @@ def test_scalar_non_numeric(self):
                                   (lambda x: x.iloc, False),
                                   (lambda x: x, True)]:
 
-                def f():
-                    with catch_warnings(record=True):
-                        idxr(s)[3.0]
-
                 # getitem on a DataFrame is a KeyError as it is indexing
                 # via labels on the columns
                 if getitem and isinstance(s, DataFrame):
                     error = KeyError
+                    msg = r"^3(\.0)?$"
                 else:
                     error = TypeError
-                pytest.raises(error, f)
+                    msg = (r"cannot do (label|index|positional) indexing"
+                           r" on {klass} with these indexers \[3\.0\] of"
+                           r" {kind}|"
+                           "Cannot index by location index with a"
+                           " non-integer key"
+                           .format(klass=type(i), kind=str(float)))
+                with catch_warnings(record=True):
+                    with pytest.raises(error, match=msg):
+                        idxr(s)[3.0]
 
                 # label based can be a TypeError or KeyError
-                def f():
-                    s.loc[3.0]
-
                 if s.index.inferred_type in ['string', 'unicode', 'mixed']:
                     error = KeyError
+                    msg = r"^3$"
                 else:
                     error = TypeError
-                pytest.raises(error, f)
+                    msg = (r"cannot do (label|index) indexing"
+                           r" on {klass} with these indexers \[3\.0\] of"
+                           r" {kind}"
+                           .format(klass=type(i), kind=str(float)))
+                with pytest.raises(error, match=msg):
+                    s.loc[3.0]
 
                 # contains
                 assert 3.0 not in s
 
                 # setting with a float fails with iloc
-                def f():
+                msg = (r"cannot do (label|index|positional) indexing"
+                       r" on {klass} with these indexers \[3\.0\] of"
+                       r" {kind}"
+                       .format(klass=type(i), kind=str(float)))
+                with pytest.raises(TypeError, match=msg):
                     s.iloc[3.0] = 0
-                pytest.raises(TypeError, f)
 
                 # setting with an indexer
                 if s.index.inferred_type in ['categorical']:
@@ -145,7 +158,12 @@ def f():
                 # falls back to position selection, series only
                 s = Series(np.arange(len(i)), index=i)
                 s[3]
-                pytest.raises(TypeError, lambda: s[3.0])
+                msg = (r"cannot do (label|index) indexing"
+                       r" on {klass} with these indexers \[3\.0\] of"
+                       r" {kind}"
+                       .format(klass=type(i), kind=str(float)))
+                with pytest.raises(TypeError, match=msg):
+                    s[3.0]
 
     @ignore_ix
     def test_scalar_with_mixed(self):
@@ -153,19 +171,23 @@ def test_scalar_with_mixed(self):
 
         s2 = Series([1, 2, 3], index=['a', 'b', 'c'])
         s3 = Series([1, 2, 3], index=['a', 'b', 1.5])
 
         # lookup in a pure string index
         # with an invalid indexer
         for idxr in [lambda x: x.ix,
                      lambda x: x,
                      lambda x: x.iloc]:
 
-            def f():
-                with catch_warnings(record=True):
+            msg = (r"cannot do label indexing"
+                   r" on {klass} with these indexers \[1\.0\] of"
+                   r" {kind}|"
+                   "Cannot index by location index with a non-integer key"
+                   .format(klass=str(Index), kind=str(float)))
+            with catch_warnings(record=True):
+                with pytest.raises(TypeError, match=msg):
                     idxr(s2)[1.0]
-            pytest.raises(TypeError, f)
-
-        pytest.raises(KeyError, lambda: s2.loc[1.0])
+        with pytest.raises(KeyError, match=r"^1$"):
+            s2.loc[1.0]
 
         result = s2.loc['b']
         expected = 2
@@ -175,11 +197,13 @@ def f():
         # indexing
         for idxr in [lambda x: x]:
 
-            def f():
+            msg = (r"cannot do label indexing"
+                   r" on {klass} with these indexers \[1\.0\] of"
+                   r" {kind}"
+                   .format(klass=str(Index), kind=str(float)))
+            with pytest.raises(TypeError, match=msg):
                 idxr(s3)[1.0]
 
-            pytest.raises(TypeError, f)
-
             result = idxr(s3)[1]
             expected = 2
             assert result == expected
@@ -189,17 +213,22 @@ def f():
 
         for idxr in [lambda x: x.ix]:
             with catch_warnings(record=True):
 
-                def f():
+                msg = (r"cannot do label indexing"
+                       r" on {klass} with these indexers \[1\.0\] of"
+                       r" {kind}"
+                       .format(klass=str(Index), kind=str(float)))
+                with pytest.raises(TypeError, match=msg):
                     idxr(s3)[1.0]
 
-                pytest.raises(TypeError, f)
-
                 result = idxr(s3)[1]
                 expected = 2
                 assert result == expected
 
-        pytest.raises(TypeError, lambda: s3.iloc[1.0])
-        pytest.raises(KeyError, lambda: s3.loc[1.0])
+        msg = "Cannot index by location index with a non-integer key"
+        with pytest.raises(TypeError, match=msg):
+            s3.iloc[1.0]
+        with pytest.raises(KeyError, match=r"^1$"):
+            s3.loc[1.0]
 
         result = s3.loc[1.5]
         expected = 3
@@ -280,16 +309,14 @@ def test_scalar_float(self):
 
         # setting
         s2 = s.copy()
 
-        def f():
-            with catch_warnings(record=True):
-                idxr(s2)[indexer] = expected
         with catch_warnings(record=True):
             result = idxr(s2)[indexer]
             self.check(result, s, 3, getitem)
 
         # random integer is a KeyError
         with catch_warnings(record=True):
-            pytest.raises(KeyError, lambda: idxr(s)[3.5])
+            with pytest.raises(KeyError, match=r"^3\.5$"):
+                idxr(s)[3.5]
 
         # contains
         assert 3.0 in s
@@ -303,11 +330,16 @@ def f():
             self.check(result, s, 3, False)
 
         # iloc raises with a float
-        pytest.raises(TypeError, lambda: s.iloc[3.0])
+        msg = "Cannot index by location index with a non-integer key"
+        with pytest.raises(TypeError, match=msg):
+            s.iloc[3.0]
 
-        def g():
+        msg = (r"cannot do positional indexing"
+               r" on {klass} with these indexers \[3\.0\] of"
+               r" {kind}"
+               .format(klass=str(Float64Index), kind=str(float)))
+        with pytest.raises(TypeError, match=msg):
             s2.iloc[3.0] = 0
-        pytest.raises(TypeError, g)
 
     @ignore_ix
     def test_slice_non_numeric(self):
@@ -329,37 +361,55 @@ def test_slice_non_numeric(self):
                   slice(3, 4.0),
                   slice(3.0, 4.0)]:
 
-            def f():
+            msg = ("cannot do slice indexing"
+                   r" on {klass} with these indexers \[(3|4)\.0\] of"
+                   " {kind}"
+                   .format(klass=type(index), kind=str(float)))
+            with pytest.raises(TypeError, match=msg):
                 s.iloc[l]
-            pytest.raises(TypeError, f)
 
             for idxr in [lambda x: x.ix,
                          lambda x: x.loc,
                          lambda x: x.iloc,
                          lambda x: x]:
 
-                def f():
-                    with catch_warnings(record=True):
+                msg = ("cannot do slice indexing"
+                       r" on {klass} with these indexers"
+                       r" \[(3|4)(\.0)?\]"
+                       r" of ({kind_float}|{kind_int})"
+                       .format(klass=type(index),
+                               kind_float=str(float),
+                               kind_int=str(int)))
+                with catch_warnings(record=True):
+                    with pytest.raises(TypeError, match=msg):
                         idxr(s)[l]
-                pytest.raises(TypeError, f)
 
         # setitem
         for l in [slice(3.0, 4),
                   slice(3, 4.0),
                  slice(3.0, 4.0)]:
 
-            def f():
+            msg = ("cannot do slice indexing"
+                   r" on {klass} with these indexers \[(3|4)\.0\] of"
+                   " {kind}"
+                   .format(klass=type(index), kind=str(float)))
+            with pytest.raises(TypeError, match=msg):
                 s.iloc[l] = 0
-            pytest.raises(TypeError, f)
 
             for idxr in [lambda x: x.ix,
                          lambda x: x.loc,
                          lambda x: x.iloc,
                          lambda x: x]:
 
-                def f():
-                    with catch_warnings(record=True):
+                msg = ("cannot do slice indexing"
+                       r" on {klass} with these indexers"
+                       r" \[(3|4)(\.0)?\]"
+                       r" of ({kind_float}|{kind_int})"
+                       .format(klass=type(index),
+                               kind_float=str(float),
+                               kind_int=str(int)))
+                with catch_warnings(record=True):
+                    with pytest.raises(TypeError, match=msg):
                         idxr(s)[l] = 0
-                pytest.raises(TypeError, f)
 
     @ignore_ix
     def test_slice_integer(self):
@@ -396,11 +446,13 @@ def test_slice_integer(self):
                 self.check(result, s, indexer, False)
 
             # positional indexing
-            def f():
+            msg = ("cannot do slice indexing"
+                   r" on {klass} with these indexers \[(3|4)\.0\] of"
+                   " {kind}"
+                   .format(klass=type(index), kind=str(float)))
+            with pytest.raises(TypeError, match=msg):
                 s[l]
 
-            pytest.raises(TypeError, f)
-
         # getitem out-of-bounds
         for l in [slice(-6, 6),
                   slice(-6.0, 6.0)]:
@@ -420,11 +472,13 @@ def f():
             self.check(result, s, indexer, False)
 
         # positional indexing
-        def f():
+        msg = ("cannot do slice indexing"
+               r" on {klass} with these indexers \[-6\.0\] of"
+               " {kind}"
+               .format(klass=type(index), kind=str(float)))
+        with pytest.raises(TypeError, match=msg):
             s[slice(-6.0, 6.0)]
 
-        pytest.raises(TypeError, f)
-
         # getitem odd floats
         for l, res1 in [(slice(2.5, 4), slice(3, 5)),
                         (slice(2, 3.5), slice(2, 4)),
@@ -443,11 +497,13 @@ def f():
                 self.check(result, s, res, False)
 
             # positional indexing
-            def f():
+            msg = ("cannot do slice indexing"
+                   r" on {klass} with these indexers \[(2|3)\.5\] of"
+                   " {kind}"
+                   .format(klass=type(index), kind=str(float)))
+            with pytest.raises(TypeError, match=msg):
                 s[l]
 
-            pytest.raises(TypeError, f)
-
         # setitem
         for l in [slice(3.0, 4),
                   slice(3, 4.0),
@@ -462,11 +518,13 @@ def f():
                 assert (result == 0).all()
 
             # positional indexing
-            def f():
+            msg = ("cannot do slice indexing"
+                   r" on {klass} with these indexers \[(3|4)\.0\] of"
+                   " {kind}"
+                   .format(klass=type(index), kind=str(float)))
+            with pytest.raises(TypeError, match=msg):
                 s[l] = 0
 
-            pytest.raises(TypeError, f)
-
     def test_integer_positional_indexing(self):
         """ make sure that we are raising on positional indexing w.r.t.
        an integer index """
@@ -484,11 +542,17 @@ def test_integer_positional_indexing(self):
                   slice(2.0, 4),
                   slice(2.0, 4.0)]:
 
-            def f():
+            if compat.PY2:
+                klass = Int64Index
+            else:
+                klass = RangeIndex
+            msg = ("cannot do slice indexing"
+                   r" on {klass} with these indexers \[(2|4)\.0\] of"
+                   " {kind}"
+                   .format(klass=str(klass), kind=str(float)))
+            with pytest.raises(TypeError, match=msg):
                 idxr(s)[l]
 
-            pytest.raises(TypeError, f)
-
     @ignore_ix
     def test_slice_integer_frame_getitem(self):
 
@@ -509,11 +573,13 @@ def f(idxr):
             self.check(result, s, indexer, False)
 
             # positional indexing
-            def f():
+            msg = ("cannot do slice indexing"
+                   r" on {klass} with these indexers \[(0|1)\.0\] of"
+                   " {kind}"
+                   .format(klass=type(index), kind=str(float)))
+            with pytest.raises(TypeError, match=msg):
                 s[l]
 
-            pytest.raises(TypeError, f)
-
         # getitem out-of-bounds
         for l in [slice(-10, 10),
                   slice(-10.0, 10.0)]:
@@ -522,11 +588,13 @@ def f():
             self.check(result, s, slice(-10, 10), True)
 
         # positional indexing
-        def f():
+        msg = ("cannot do slice indexing"
+               r" on {klass} with these indexers \[-10\.0\] of"
+               " {kind}"
+               .format(klass=type(index), kind=str(float)))
+        with pytest.raises(TypeError, match=msg):
             s[slice(-10.0, 10.0)]
 
-        pytest.raises(TypeError, f)
-
         # getitem odd floats
         for l, res in [(slice(0.5, 1), slice(1, 2)),
                        (slice(0, 0.5), slice(0, 1)),
@@ -536,11 +604,13 @@ def f():
             self.check(result, s, res, False)
 
             # positional indexing
-            def f():
+            msg = ("cannot do slice indexing"
+                   r" on {klass} with these indexers \[0\.5\] of"
+                   " {kind}"
+                   .format(klass=type(index), kind=str(float)))
+            with pytest.raises(TypeError, match=msg):
                 s[l]
 
-            pytest.raises(TypeError, f)
-
         # setitem
         for l in [slice(3.0, 4),
                   slice(3, 4.0),
@@ -552,11 +622,13 @@ def f():
                 assert (result == 0).all()
 
             # positional indexing
-            def f():
+            msg = ("cannot do slice indexing"
+                   r" on {klass} with these indexers \[(3|4)\.0\] of"
+                   " {kind}"
+                   .format(klass=type(index), kind=str(float)))
+            with pytest.raises(TypeError, match=msg):
                 s[l] = 0
 
-            pytest.raises(TypeError, f)
-
         f(lambda x: x.loc)
         with catch_warnings(record=True):
             f(lambda x: x.ix)
@@ -632,9 +704,12 @@ def test_floating_misc(self):
 
         # value not found (and no fallbacking at all)
 
        # scalar integers
-        pytest.raises(KeyError, lambda: s.loc[4])
-        pytest.raises(KeyError, lambda: s.loc[4])
-        pytest.raises(KeyError, lambda: s[4])
+        with pytest.raises(KeyError, match=r"^4\.0$"):
+            s.loc[4]
+        with pytest.raises(KeyError, match=r"^4\.0$"):
+            s.loc[4]
+        with pytest.raises(KeyError, match=r"^4\.0$"):
+            s[4]
 
         # fancy floats/integers create the correct entry (as nan)
         # fancy tests
diff --git a/pandas/tests/indexing/test_iloc.py b/pandas/tests/indexing/test_iloc.py
index a867387db4b46..69ec6454e952a 100644
--- a/pandas/tests/indexing/test_iloc.py
+++ b/pandas/tests/indexing/test_iloc.py
@@ -26,26 +26,33 @@ def test_iloc_exceeds_bounds(self):
         msg = 'positional indexers are out-of-bounds'
         with pytest.raises(IndexError, match=msg):
             df.iloc[:, [0, 1, 2, 3, 4, 5]]
-        pytest.raises(IndexError, lambda: df.iloc[[1, 30]])
-        pytest.raises(IndexError, lambda: df.iloc[[1, -30]])
-        pytest.raises(IndexError, lambda: df.iloc[[100]])
+        with pytest.raises(IndexError, match=msg):
+            df.iloc[[1, 30]]
+        with pytest.raises(IndexError, match=msg):
+            df.iloc[[1, -30]]
+        with pytest.raises(IndexError, match=msg):
+            df.iloc[[100]]
 
         s = df['A']
-        pytest.raises(IndexError, lambda: s.iloc[[100]])
-        pytest.raises(IndexError, lambda: s.iloc[[-100]])
+        with pytest.raises(IndexError, match=msg):
+            s.iloc[[100]]
+        with pytest.raises(IndexError, match=msg):
+            s.iloc[[-100]]
 
         # still raise on a single indexer
         msg = 'single positional indexer is out-of-bounds'
         with pytest.raises(IndexError, match=msg):
             df.iloc[30]
-        pytest.raises(IndexError, lambda: df.iloc[-30])
+        with pytest.raises(IndexError, match=msg):
+            df.iloc[-30]
 
         # GH10779
         # single positive/negative indexer exceeding Series bounds should raise
         # an IndexError
         with pytest.raises(IndexError, match=msg):
             s.iloc[30]
-        pytest.raises(IndexError, lambda: s.iloc[-30])
+        with pytest.raises(IndexError, match=msg):
+            s.iloc[-30]
 
         # slices are ok
         result = df.iloc[:, 4:10]  # 0 < start < len < stop
@@ -104,8 +111,12 @@ def check(result, expected):
         check(dfl.iloc[:, 1:3], dfl.iloc[:, [1]])
         check(dfl.iloc[4:6], dfl.iloc[[4]])
 
-        pytest.raises(IndexError, lambda: dfl.iloc[[4, 5, 6]])
-        pytest.raises(IndexError, lambda: dfl.iloc[:, 4])
+        msg = "positional indexers are out-of-bounds"
+        with pytest.raises(IndexError, match=msg):
+            dfl.iloc[[4, 5, 6]]
+        msg = "single positional indexer is out-of-bounds"
+        with pytest.raises(IndexError, match=msg):
+            dfl.iloc[:, 4]
 
     def test_iloc_getitem_int(self):
 
@@ -437,10 +448,16 @@ def test_iloc_getitem_labelled_frame(self):
         assert result == exp
 
         # out-of-bounds exception
-        pytest.raises(IndexError, df.iloc.__getitem__, tuple([10, 5]))
+        msg = "single positional indexer is out-of-bounds"
+        with pytest.raises(IndexError, match=msg):
+            df.iloc[10, 5]
 
         # trying to use a label
-        pytest.raises(ValueError, df.iloc.__getitem__, tuple(['j', 'D']))
+        msg = (r"Location based indexing can only have \[integer, integer"
+               r" slice \(START point is INCLUDED, END point is EXCLUDED\),"
+               r" listlike of integers, boolean array\] types")
+        with pytest.raises(ValueError, match=msg):
+            df.iloc['j', 'D']
 
     def test_iloc_getitem_doc_issue(self):
 
@@ -555,10 +572,15 @@ def test_iloc_mask(self):
         # GH 3631, iloc with a mask (of a series) should raise
         df = DataFrame(lrange(5), list('ABCDE'), columns=['a'])
         mask = (df.a % 2 == 0)
-        pytest.raises(ValueError, df.iloc.__getitem__, tuple([mask]))
+        msg = ("iLocation based boolean indexing cannot use an indexable as"
+               " a mask")
+        with pytest.raises(ValueError, match=msg):
+            df.iloc[mask]
 
         mask.index = lrange(len(mask))
-        pytest.raises(NotImplementedError, df.iloc.__getitem__,
-                      tuple([mask]))
+        msg = ("iLocation based boolean indexing on an integer type is not"
+               " available")
+        with pytest.raises(NotImplementedError, match=msg):
+            df.iloc[mask]
 
         # ndarray ok
         result = df.iloc[np.array([True] * len(mask), dtype=bool)]
@@ -675,3 +697,16 @@ def test_identity_slice_returns_new_object(self):
         # should also be a shallow copy
         original_series[:3] = [7, 8, 9]
         assert all(sliced_series[:3] == [7, 8, 9])
+
+    def test_indexing_zerodim_np_array(self):
+        # GH24919
+        df = DataFrame([[1, 2], [3, 4]])
+        result = df.iloc[np.array(0)]
+        s = pd.Series([1, 2], name=0)
+        tm.assert_series_equal(result, s)
+
+    def test_series_indexing_zerodim_np_array(self):
+        # GH24919
+        s = Series([1, 2])
+        result = s.iloc[np.array(0)]
+        assert result == 1
diff --git a/pandas/tests/indexing/test_ix.py b/pandas/tests/indexing/test_ix.py
index 35805bce07705..fb4dfbb39ce94 100644
--- a/pandas/tests/indexing/test_ix.py
+++ b/pandas/tests/indexing/test_ix.py
@@ -102,7 +102,12 @@ def compare(result, expected):
             with catch_warnings(record=True):
                 df.ix[key]
 
-            pytest.raises(TypeError, lambda: df.loc[key])
+            msg = (r"cannot do slice indexing"
+                   r" on {klass} with these indexers \[(0|1)\] of"
+                   r" {kind}"
+                   .format(klass=type(df.index), kind=str(int)))
+            with pytest.raises(TypeError, match=msg):
+                df.loc[key]
 
         df = DataFrame(np.random.randn(5, 4), columns=list('ABCD'),
                        index=pd.date_range('2012-01-01', periods=5))
@@ -122,7 +127,8 @@ def compare(result, expected):
                 with catch_warnings(record=True):
                     expected = df.ix[key]
             except KeyError:
-                pytest.raises(KeyError, lambda: df.loc[key])
+                with pytest.raises(KeyError, match=r"^'2012-01-31'$"):
+                    df.loc[key]
                 continue
 
             result = df.loc[key]
@@ -279,14 +285,18 @@ def test_ix_setitem_out_of_bounds_axis_0(self):
             np.random.randn(2, 5),
             index=["row%s" % i for i in range(2)],
             columns=["col%s" % i for i in range(5)])
         with catch_warnings(record=True):
-            pytest.raises(ValueError, df.ix.__setitem__, (2, 0), 100)
+            msg = "cannot set by positional indexing with enlargement"
+            with pytest.raises(ValueError, match=msg):
+                df.ix[2, 0] = 100
 
     def test_ix_setitem_out_of_bounds_axis_1(self):
         df = DataFrame(
             np.random.randn(5, 2),
             index=["row%s" % i for i in range(5)],
             columns=["col%s" % i for i in range(2)])
         with catch_warnings(record=True):
-            pytest.raises(ValueError, df.ix.__setitem__, (0, 2), 100)
+            msg = "cannot set by positional indexing with enlargement"
+            with pytest.raises(ValueError, match=msg):
+                df.ix[0, 2] = 100
 
     def test_ix_empty_list_indexer_is_ok(self):
         with catch_warnings(record=True):
diff --git a/pandas/tests/indexing/test_loc.py b/pandas/tests/indexing/test_loc.py
index 17e107c7a1130..29f70929624fc 100644
--- a/pandas/tests/indexing/test_loc.py
+++ b/pandas/tests/indexing/test_loc.py
@@ -233,8 +233,10 @@ def test_loc_to_fail(self):
                        columns=['e', 'f', 'g'])
 
         # raise a KeyError?
-        pytest.raises(KeyError, df.loc.__getitem__,
-                      tuple([[1, 2], [1, 2]]))
+        msg = (r"\"None of \[Int64Index\(\[1, 2\], dtype='int64'\)\] are"
+               r" in the \[index\]\"")
+        with pytest.raises(KeyError, match=msg):
+            df.loc[[1, 2], [1, 2]]
 
         # GH 7496
         # loc should not fallback
@@ -243,10 +245,18 @@ def test_loc_to_fail(self):
         s.loc[1] = 1
         s.loc['a'] = 2
 
-        pytest.raises(KeyError, lambda: s.loc[-1])
-        pytest.raises(KeyError, lambda: s.loc[[-1, -2]])
+        with pytest.raises(KeyError, match=r"^-1$"):
+            s.loc[-1]
 
-        pytest.raises(KeyError, lambda: s.loc[['4']])
+        msg = (r"\"None of \[Int64Index\(\[-1, -2\], dtype='int64'\)\] are"
+               r" in the \[index\]\"")
+        with pytest.raises(KeyError, match=msg):
+            s.loc[[-1, -2]]
+
+        msg = (r"\"None of \[Index\(\[u?'4'\], dtype='object'\)\] are"
+               r" in the \[index\]\"")
+        with pytest.raises(KeyError, match=msg):
+            s.loc[['4']]
 
         s.loc[-1] = 3
         with tm.assert_produces_warning(FutureWarning,
@@ -256,29 +266,28 @@ def test_loc_to_fail(self):
         tm.assert_series_equal(result, expected)
 
         s['a'] = 2
-        pytest.raises(KeyError, lambda: s.loc[[-2]])
+        msg = (r"\"None of \[Int64Index\(\[-2\], dtype='int64'\)\] are"
+               r" in the \[index\]\"")
+        with pytest.raises(KeyError, match=msg):
+            s.loc[[-2]]
 
         del s['a']
 
-        def f():
+        with pytest.raises(KeyError, match=msg):
             s.loc[[-2]] = 0
 
-        pytest.raises(KeyError, f)
-
         # inconsistency between .loc[values] and .loc[values,:]
         # GH 7999
         df = DataFrame([['a'], ['b']], index=[1, 2], columns=['value'])
 
-        def f():
+        msg = (r"\"None of \[Int64Index\(\[3\], dtype='int64'\)\] are"
+               r" in the \[index\]\"")
+        with pytest.raises(KeyError, match=msg):
             df.loc[[3], :]
 
-        pytest.raises(KeyError, f)
-
-        def f():
+        with pytest.raises(KeyError, match=msg):
             df.loc[[3]]
 
-        pytest.raises(KeyError, f)
-
     def test_loc_getitem_list_with_fail(self):
         # 15747
         # should KeyError if *any* missing labels
@@ -600,11 +609,15 @@ def test_loc_non_unique(self):
         # these are going to raise because we are non monotonic
         df = DataFrame({'A': [1, 2, 3, 4, 5, 6],
                        'B': [3, 4, 5, 6, 7, 8]},
                       index=[0, 1, 0, 1, 2, 3])
-        pytest.raises(KeyError, df.loc.__getitem__,
-                      tuple([slice(1, None)]))
-        pytest.raises(KeyError, df.loc.__getitem__,
-                      tuple([slice(0, None)]))
-        pytest.raises(KeyError, df.loc.__getitem__, tuple([slice(1, 2)]))
+        msg = "'Cannot get left slice bound for non-unique label: 1'"
+        with pytest.raises(KeyError, match=msg):
+            df.loc[1:]
+        msg = "'Cannot get left slice bound for non-unique label: 0'"
+        with pytest.raises(KeyError, match=msg):
+            df.loc[0:]
+        msg = "'Cannot get left slice bound for non-unique label: 1'"
+        with pytest.raises(KeyError, match=msg):
+            df.loc[1:2]
 
         # monotonic are ok
         df = DataFrame({'A': [1, 2, 3, 4, 5, 6],
@@ -765,3 +778,16 @@ def test_loc_setitem_empty_append_raises(self):
         msg = "cannot copy sequence with size 2 to array axis with dimension 0"
         with pytest.raises(ValueError, match=msg):
             df.loc[0:2, 'x'] = data
+
+    def test_indexing_zerodim_np_array(self):
+        # GH24924
+        df = DataFrame([[1, 2], [3, 4]])
+        result = df.loc[np.array(0)]
+        s = pd.Series([1, 2], name=0)
+        tm.assert_series_equal(result, s)
+
+    def test_series_indexing_zerodim_np_array(self):
+        # GH24924
+        s = Series([1, 2])
+        result = s.loc[np.array(0)]
+        assert result == 1
diff --git a/pandas/tests/indexing/test_panel.py b/pandas/tests/indexing/test_panel.py
index 34708e1148c90..8033d19f330b3 100644
--- a/pandas/tests/indexing/test_panel.py
+++ b/pandas/tests/indexing/test_panel.py
@@ -3,7 +3,7 @@
 import numpy as np
 import pytest
 
-from pandas import DataFrame, Panel, date_range
+from pandas import Panel, date_range
 from pandas.util import testing as tm
 
@@ -31,30 +31,6 @@ def test_iloc_getitem_panel(self):
             expected = p.loc['B', 'b', 'two']
             assert result == expected
 
-            # slice
-            result = p.iloc[1:3]
-            expected = p.loc[['B', 'C']]
-            tm.assert_panel_equal(result, expected)
-
-            result = p.iloc[:, 0:2]
-            expected = p.loc[:, ['a', 'b']]
-            tm.assert_panel_equal(result, expected)
-
-            # list of integers
-            result = p.iloc[[0, 2]]
-            expected = p.loc[['A', 'C']]
-            tm.assert_panel_equal(result, expected)
-
-            # neg indices
-            result = p.iloc[[-1, 1], [-1, 1]]
-            expected = p.loc[['D', 'B'], ['c', 'b']]
-            tm.assert_panel_equal(result, expected)
-
-            # dups indices
-            result = p.iloc[[-1, -1, 1], [-1, 1]]
-            expected = p.loc[['D', 'D', 'B'], ['c', 'b']]
-            tm.assert_panel_equal(result, expected)
-
             # combined
             result = p.iloc[0, [True, True], [0, 1]]
             expected = p.loc['A', ['a', 'b'], ['one', 'two']]
@@ -110,40 +86,6 @@ def test_iloc_panel_issue(self):
 
     def test_panel_getitem(self):
 
         with catch_warnings(record=True):
-            # GH4016, date selection returns a frame when a partial string
-            # selection
-            ind = date_range(start="2000", freq="D", periods=1000)
-            df = DataFrame(
-                np.random.randn(
-                    len(ind), 5), index=ind, columns=list('ABCDE'))
-            panel = Panel({'frame_' + c: df for c in list('ABC')})
-
-            test2 = panel.loc[:, "2002":"2002-12-31"]
-            test1 = panel.loc[:, "2002"]
-            tm.assert_panel_equal(test1, test2)
-
-            # GH8710
-            # multi-element getting with a list
-            panel = tm.makePanel()
-
-            expected = panel.iloc[[0, 1]]
-
-            result = panel.loc[['ItemA', 'ItemB']]
-            tm.assert_panel_equal(result, expected)
-
-            result = panel.loc[['ItemA', 'ItemB'], :, :]
-            tm.assert_panel_equal(result, expected)
-
-            result = panel[['ItemA', 'ItemB']]
-            tm.assert_panel_equal(result, expected)
-
-            result = panel.loc['ItemA':'ItemB']
-            tm.assert_panel_equal(result, expected)
-
-            with catch_warnings(record=True):
-                result = panel.ix[['ItemA', 'ItemB']]
-                tm.assert_panel_equal(result, expected)
-
             # with an object-like
             # GH 9140
            class TestObject(object):
@@ -160,55 +102,3 @@ def __str__(self):
             expected = p.iloc[0]
             result = p[obj]
             tm.assert_frame_equal(result, expected)
-
-    def test_panel_setitem(self):
-
-        with catch_warnings(record=True):
-            # GH 7763
-            # loc and setitem have setting differences
-            np.random.seed(0)
-            index = range(3)
-            columns = list('abc')
-
-            panel = Panel({'A': DataFrame(np.random.randn(3, 3),
-                                          index=index, columns=columns),
-                           'B': DataFrame(np.random.randn(3, 3),
-                                          index=index, columns=columns),
-                           'C': DataFrame(np.random.randn(3, 3),
-                                          index=index, columns=columns)})
-
-            replace = DataFrame(np.eye(3, 3), index=range(3), columns=columns)
-            expected = Panel({'A': replace, 'B': replace, 'C': replace})
-
-            p = panel.copy()
-            for idx in list('ABC'):
-                p[idx] = replace
-            tm.assert_panel_equal(p, expected)
-
-            p = panel.copy()
-            for idx in list('ABC'):
-                p.loc[idx, :, :] = replace
-            tm.assert_panel_equal(p, expected)
-
-    def test_panel_assignment(self):
-
-        with catch_warnings(record=True):
-            # GH3777
-            wp = Panel(np.random.randn(2, 5, 4), items=['Item1', 'Item2'],
-                       major_axis=date_range('1/1/2000', periods=5),
-                       minor_axis=['A', 'B', 'C', 'D'])
-            wp2 = Panel(np.random.randn(2, 5, 4), items=['Item1', 'Item2'],
-                        major_axis=date_range('1/1/2000', periods=5),
-                        minor_axis=['A', 'B', 'C', 'D'])
-
-            # TODO: unused?
-            # expected = wp.loc[['Item1', 'Item2'], :, ['A', 'B']]
-
-            with pytest.raises(NotImplementedError):
-                wp.loc[['Item1', 'Item2'], :, ['A', 'B']] = wp2.loc[
-                    ['Item1', 'Item2'], :, ['A', 'B']]
-
-            # to_assign = wp2.loc[['Item1', 'Item2'], :, ['A', 'B']]
-            # wp.loc[['Item1', 'Item2'], :, ['A', 'B']] = to_assign
-            # result = wp.loc[['Item1', 'Item2'], :, ['A', 'B']]
-            # tm.assert_panel_equal(result,expected)
diff --git a/pandas/tests/indexing/test_partial.py b/pandas/tests/indexing/test_partial.py
index b863afe02c2e8..e8ce5bc4c36ef 100644
--- a/pandas/tests/indexing/test_partial.py
+++ b/pandas/tests/indexing/test_partial.py
@@ -10,13 +10,12 @@
 import pytest
 
 import pandas as pd
-from pandas import DataFrame, Index, Panel, Series, date_range
+from pandas import DataFrame, Index, Series, date_range
 from pandas.util import testing as tm
 
 
 class TestPartialSetting(object):
 
-    @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning")
     @pytest.mark.filterwarnings("ignore:\\n.ix:DeprecationWarning")
     def test_partial_setting(self):
 
@@ -116,35 +115,6 @@ def test_partial_setting(self):
             df.ix[:, 'C'] = df.ix[:, 'A']
         tm.assert_frame_equal(df, expected)
 
-        with catch_warnings(record=True):
-            # ## panel ##
-            p_orig = Panel(np.arange(16).reshape(2, 4, 2),
-                           items=['Item1', 'Item2'],
-                           major_axis=pd.date_range('2001/1/12', periods=4),
-                           minor_axis=['A', 'B'], dtype='float64')
-
-            # panel setting via item
-            p_orig = Panel(np.arange(16).reshape(2, 4, 2),
-                           items=['Item1', 'Item2'],
-                           major_axis=pd.date_range('2001/1/12', periods=4),
-                           minor_axis=['A', 'B'], dtype='float64')
-            expected = p_orig.copy()
-            expected['Item3'] = expected['Item1']
-            p = p_orig.copy()
-            p.loc['Item3'] = p['Item1']
-            tm.assert_panel_equal(p, expected)
-
-            # panel with aligned series
-            expected = p_orig.copy()
-            expected = expected.transpose(2, 1, 0)
-            expected['C'] = DataFrame({'Item1': [30, 30, 30, 30],
-                                       'Item2': [32, 32, 32, 32]},
-                                      index=p_orig.major_axis)
-            expected = expected.transpose(2, 1, 0)
-            p = p_orig.copy()
-            p.loc[:, :, 'C'] = Series([30, 32], index=p_orig.items)
-            tm.assert_panel_equal(p, expected)
-
         # GH 8473
         dates = date_range('1/1/2000', periods=8)
         df_orig = DataFrame(np.random.randn(8, 4), index=dates,
@@ -246,7 +216,10 @@ def test_series_partial_set(self):
         tm.assert_series_equal(result, expected, check_index_type=True)
 
         # raises as nothing is in the index
-        pytest.raises(KeyError, lambda: ser.loc[[3, 3, 3]])
+        msg = (r"\"None of \[Int64Index\(\[3, 3, 3\], dtype='int64'\)\] are"
+               r" in the \[index\]\"")
+        with pytest.raises(KeyError, match=msg):
+            ser.loc[[3, 3, 3]]
 
         expected = Series([0.2, 0.2, np.nan], index=[2, 2, 3])
         with tm.assert_produces_warning(FutureWarning,
                                         check_stacklevel=False):
@@ -342,7 +315,10 @@ def test_series_partial_set_with_name(self):
         tm.assert_series_equal(result, expected, check_index_type=True)
 
         # raises as nothing is in the index
-        pytest.raises(KeyError, lambda: ser.loc[[3, 3, 3]])
+        msg = (r"\"None of \[Int64Index\(\[3, 3, 3\], dtype='int64',"
+               r" name=u?'idx'\)\] are in the \[index\]\"")
+        with pytest.raises(KeyError, match=msg):
+            ser.loc[[3, 3, 3]]
 
         exp_idx = Index([2, 2, 3], dtype='int64', name='idx')
         expected = Series([0.2, 0.2, np.nan], index=exp_idx, name='s')
diff --git a/pandas/tests/indexing/test_scalar.py b/pandas/tests/indexing/test_scalar.py
index e4b8181a67514..0cd41562541d1 100644
--- a/pandas/tests/indexing/test_scalar.py
+++ b/pandas/tests/indexing/test_scalar.py
@@ -30,7 +30,9 @@ def _check(f, func, values=False):
 
         for f in [d['labels'], d['ts'], d['floats']]:
             if f is not None:
-                pytest.raises(ValueError, self.check_values, f, 'iat')
+                msg = "iAt based indexing can only have integer indexers"
+                with pytest.raises(ValueError, match=msg):
+                    self.check_values(f, 'iat')
 
         # at
         for f in [d['ints'], d['uints'], d['labels'],
@@ -57,7 +59,9 @@ def _check(f, func, values=False):
 
         for f in [d['labels'], d['ts'], d['floats']]:
             if f is not None:
-                pytest.raises(ValueError, _check, f, 'iat')
+                msg = "iAt based indexing can only have integer indexers"
+                with pytest.raises(ValueError, match=msg):
+                    _check(f, 'iat')
 
         # at
         for f in [d['ints'], d['uints'], d['labels'],
@@ -107,8 +111,12 @@ def test_imethods_with_dups(self):
         result = s.iat[2]
         assert result == 2
 
-        pytest.raises(IndexError, lambda: s.iat[10])
-        pytest.raises(IndexError, lambda: s.iat[-10])
+        msg = "index 10 is out of bounds for axis 0 with size 5"
+        with pytest.raises(IndexError, match=msg):
+            s.iat[10]
+        msg = "index -10 is out of bounds for axis 0 with size 5"
+        with pytest.raises(IndexError, match=msg):
+            s.iat[-10]
 
         result = s.iloc[[2, 3]]
         expected = Series([2, 3], [2, 2], dtype='int64')
@@ -128,22 +136,30 @@ def test_at_to_fail(self):
         s = Series([1, 2, 3], index=list('abc'))
         result = s.at['a']
         assert result == 1
-        pytest.raises(ValueError, lambda: s.at[0])
+        msg = ("At based indexing on an non-integer index can only have"
+               " non-integer indexers")
+        with pytest.raises(ValueError, match=msg):
+            s.at[0]
 
         df = DataFrame({'A': [1, 2, 3]}, index=list('abc'))
         result = df.at['a', 'A']
         assert result == 1
-        pytest.raises(ValueError, lambda: df.at['a', 0])
+        with pytest.raises(ValueError, match=msg):
+            df.at['a', 0]
 
         s = Series([1, 2, 3], index=[3, 2, 1])
         result = s.at[1]
         assert result == 3
-        pytest.raises(ValueError, lambda: s.at['a'])
+        msg = ("At based indexing on an integer index can only have integer"
+               " indexers")
+        with pytest.raises(ValueError, match=msg):
+            s.at['a']
 
         df = DataFrame({0: [1, 2, 3]}, index=[3, 2, 1])
         result = df.at[1, 0]
         assert result == 3
-        pytest.raises(ValueError, lambda: df.at['a', 0])
+        with pytest.raises(ValueError, match=msg):
+            df.at['a', 0]
 
         # GH 13822, incorrect error string with non-unique columns when missing
         # column is accessed
@@ -205,3 +221,16 @@ def test_iat_setter_incompatible_assignment(self):
         result.iat[0, 0] = None
         expected = DataFrame({"a": [None, 1], "b": [4, 5]})
         tm.assert_frame_equal(result, expected)
+
+    def test_getitem_zerodim_np_array(self):
+        # GH24924
+        # dataframe __getitem__
+        df = DataFrame([[1, 2], [3, 4]])
+        result = df[np.array(0)]
+        expected = Series([1, 3], name=0)
+        tm.assert_series_equal(result, expected)
+
+        # series __getitem__
+        s = Series([1, 2])
+        result = s[np.array(0)]
+        assert result == 1
diff --git a/pandas/tests/internals/test_internals.py b/pandas/tests/internals/test_internals.py
index fe0706efdc4f8..bda486411e01e 100644
--- a/pandas/tests/internals/test_internals.py
+++ b/pandas/tests/internals/test_internals.py
@@ -1,6 +1,6 @@
 # -*- coding: utf-8 -*-
 # pylint: disable=W0102
-
+from collections import OrderedDict
 from datetime import date, datetime
 from distutils.version import LooseVersion
 import itertools
@@ -12,7 +12,7 @@
 import pytest
 
 from pandas._libs.internals import BlockPlacement
-from pandas.compat import OrderedDict, lrange, u, zip
+from pandas.compat import lrange, u, zip
 
 import pandas as pd
 from pandas import (
diff --git a/pandas/tests/io/data/legacy_hdf/legacy_table.h5 b/pandas/tests/io/data/legacy_hdf/legacy_table.h5
deleted file mode 100644
index 1c90382d9125c..0000000000000
Binary files a/pandas/tests/io/data/legacy_hdf/legacy_table.h5 and /dev/null differ
diff --git a/pandas/tests/io/data/legacy_hdf/legacy_table_py2.h5 b/pandas/tests/io/data/legacy_hdf/legacy_table_py2.h5
new file mode 100644
index 0000000000000..3863d714a315b
Binary files /dev/null and b/pandas/tests/io/data/legacy_hdf/legacy_table_py2.h5 differ
diff --git a/pandas/tests/io/formats/test_console.py b/pandas/tests/io/formats/test_console.py
index 055763bf62d6e..a3e0e195f4864 100644
--- a/pandas/tests/io/formats/test_console.py
+++ b/pandas/tests/io/formats/test_console.py
@@ -1,6 +1,9 @@
+import subprocess  # noqa: F401
+
 import pytest
 
 from pandas.io.formats.console import detect_console_encoding
+from pandas.io.formats.terminal import _get_terminal_size_tput
 
 
 class MockEncoding(object):  # TODO(py27): replace with mock
@@ -72,3 +75,18 @@ def test_detect_console_encoding_fallback_to_default(monkeypatch, std, locale):
         context.setattr('sys.stdout', MockEncoding(std))
         context.setattr('sys.getdefaultencoding', lambda: 'sysDefaultEncoding')
     assert detect_console_encoding() == 'sysDefaultEncoding'
+
+
+@pytest.mark.parametrize("size", ['', ['']])
+def test_terminal_unknown_dimensions(monkeypatch, size, mocker):
+
+    def communicate(*args, **kwargs):
+        return size
+
+    monkeypatch.setattr('subprocess.Popen', mocker.Mock())
+    monkeypatch.setattr('subprocess.Popen.return_value.returncode', None)
+    monkeypatch.setattr(
+        'subprocess.Popen.return_value.communicate', communicate)
+    result = _get_terminal_size_tput()
+
+    assert result is None
diff --git a/pandas/tests/io/formats/test_format.py b/pandas/tests/io/formats/test_format.py
index 5d922ccaf1fd5..b0cf5a2f17609 100644
--- a/pandas/tests/io/formats/test_format.py
+++ b/pandas/tests/io/formats/test_format.py
@@ -12,6 +12,7 @@
 import os
 import re
 import sys
+import textwrap
 import warnings
 
 import dateutil
@@ -2777,3 +2778,17 @@ def test_format_percentiles():
         fmt.format_percentiles([2, 0.1, 0.5])
     with pytest.raises(ValueError, match=msg):
         fmt.format_percentiles([0.1, 0.5, 'a'])
+
+
+def test_repr_html_ipython_config(ip):
+    code = textwrap.dedent("""\
+    import pandas as pd
+    df = pd.DataFrame({"A": [1, 2]})
+    df._repr_html_()
+
+    cfg = get_ipython().config
+    cfg['IPKernelApp']['parent_appname']
+    df._repr_html_()
+    """)
+    result = ip.run_cell(code)
+    assert not result.error_in_exec
diff --git a/pandas/tests/io/formats/test_to_html.py b/pandas/tests/io/formats/test_to_html.py
index 554cfd306e2a7..428f1411a10a6 100644
--- a/pandas/tests/io/formats/test_to_html.py
+++ b/pandas/tests/io/formats/test_to_html.py
@@ -15,6 +15,15 @@
 
 import pandas.io.formats.format as fmt
 
+lorem_ipsum = (
+    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod"
+    " tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim"
+    " veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex"
+    " ea commodo consequat. Duis aute irure dolor in reprehenderit in"
+    " voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur"
+    " sint occaecat cupidatat non proident, sunt in culpa qui officia"
+    " deserunt mollit anim id est laborum.")
+
 
 def expected_html(datapath, name):
     """
@@ -600,3 +609,17 @@ def test_to_html_render_links(render_links, expected, datapath):
     result = df.to_html(render_links=render_links)
     expected = expected_html(datapath, expected)
     assert result == expected
+
+
+@pytest.mark.parametrize('method,expected', [
+    ('to_html', lambda x: lorem_ipsum),
+    ('_repr_html_', lambda x: lorem_ipsum[:x - 4] + '...')  # regression case
+])
+@pytest.mark.parametrize('max_colwidth', [10, 20, 50, 100])
+def test_ignore_display_max_colwidth(method, expected, max_colwidth):
+    # see gh-17004
+    df = DataFrame([lorem_ipsum])
+    with pd.option_context('display.max_colwidth', max_colwidth):
+        result = getattr(df, method)()
+    expected = expected(max_colwidth)
+    assert expected in result
diff --git a/pandas/tests/io/generate_legacy_storage_files.py b/pandas/tests/io/generate_legacy_storage_files.py
index 6774eac6d6c1a..6c6e28cb1c090 100755
--- a/pandas/tests/io/generate_legacy_storage_files.py
+++ b/pandas/tests/io/generate_legacy_storage_files.py
@@ -41,7 +41,6 @@
 import os
 import platform as pl
 import sys
-from warnings import catch_warnings, filterwarnings
 
 import numpy as np
 
@@ -49,7 +48,7 @@
 import pandas
 from pandas import (
-    Categorical, DataFrame, Index, MultiIndex, NaT, Panel, Period, Series,
+    Categorical, DataFrame, Index, MultiIndex, NaT, Period, Series,
     SparseDataFrame, SparseSeries, Timestamp, bdate_range, date_range,
     period_range, timedelta_range, to_msgpack)
 
@@ -187,18 +186,6 @@ def create_data():
                      u'C': Timestamp('20130603', tz='UTC')},
                 index=range(5))
     )
 
-    with catch_warnings(record=True):
-        filterwarnings("ignore", "\\nPanel", FutureWarning)
-        mixed_dup_panel = Panel({u'ItemA': frame[u'float'],
-                                 u'ItemB': frame[u'int']})
-        mixed_dup_panel.items = [u'ItemA', u'ItemA']
-        panel = dict(float=Panel({u'ItemA': frame[u'float'],
-                                  u'ItemB': frame[u'float'] + 1}),
-                     dup=Panel(
-                         np.arange(30).reshape(3, 5, 2).astype(np.float64),
-                         items=[u'A', u'B', u'A']),
-                     mixed_dup=mixed_dup_panel)
-
     cat = dict(int8=Categorical(list('abcdefg')),
                int16=Categorical(np.arange(1000)),
                int32=Categorical(np.arange(10000)))
@@ -241,7 +228,6 @@ def create_data():
 
     return dict(series=series,
                 frame=frame,
-                panel=panel,
                 index=index,
                 scalars=scalars,
                 mi=mi,
diff --git a/pandas/tests/io/json/test_json_table_schema.py b/pandas/tests/io/json/test_json_table_schema.py
index 6fa3b5b3b2ed4..351b495e5d8fc 100644
--- a/pandas/tests/io/json/test_json_table_schema.py
+++ b/pandas/tests/io/json/test_json_table_schema.py
@@ -502,12 +502,12 @@ class TestTableOrientReader(object):
 
     @pytest.mark.parametrize("vals", [
         {'ints': [1, 2, 3, 4]},
        {'objects': ['a', 'b', 'c', 'd']},
+        {'objects': ['1', '2', '3', '4']},
         {'date_ranges': pd.date_range('2016-01-01', freq='d', periods=4)},
         {'categoricals': pd.Series(pd.Categorical(['a', 'b', 'c', 'c']))},
         {'ordered_cats': pd.Series(pd.Categorical(['a', 'b', 'c', 'c'],
                                                   ordered=True))},
-        pytest.param({'floats': [1., 2., 3., 4.]},
-                     marks=pytest.mark.xfail),
+        {'floats': [1., 2., 3., 4.]},
         {'floats': [1.1, 2.2, 3.3, 4.4]},
         {'bools': [True, False, False, True]}])
     def test_read_json_table_orient(self, index_nm, vals, recwarn):
@@ -564,17 +564,10 @@ def test_multiindex(self, index_names):
         result = pd.read_json(out, orient="table")
         tm.assert_frame_equal(df, result)
 
-    @pytest.mark.parametrize("strict_check", [
-        pytest.param(True, marks=pytest.mark.xfail),
-        False
-    ])
-    def test_empty_frame_roundtrip(self, strict_check):
+    def test_empty_frame_roundtrip(self):
         # GH 21287
         df = pd.DataFrame([], columns=['a', 'b', 'c'])
         expected = df.copy()
         out = df.to_json(orient='table')
         result = pd.read_json(out, orient='table')
-        # TODO: When DF coercion issue (#21345) is resolved tighten type checks
-        tm.assert_frame_equal(expected, result,
-                              check_dtype=strict_check,
-                              check_index_type=strict_check)
+        tm.assert_frame_equal(expected, result)
diff --git a/pandas/tests/io/json/test_pandas.py b/pandas/tests/io/json/test_pandas.py
index 23c40276072d6..ed598b730d960 100644
--- a/pandas/tests/io/json/test_pandas.py
+++ b/pandas/tests/io/json/test_pandas.py
@@ -1,5 +1,6 @@
 # -*- coding: utf-8 -*-
 # pylint: disable-msg=W0612,E1101
+from collections import OrderedDict
 from datetime import timedelta
 import json
 import os
@@ -7,8 +8,7 @@
 import numpy as np
 import pytest
 
-from pandas.compat import (
-    OrderedDict, StringIO, is_platform_32bit, lrange, range)
+from pandas.compat import StringIO, is_platform_32bit, lrange, range
 import pandas.util._test_decorators as td
 
 import pandas as pd
@@ -194,7 +194,7 @@ def _check_orient(df, orient, dtype=None, numpy=False,
             else:
                 unser = unser.sort_index()
 
-            if dtype is False:
+            if not dtype:
                 check_dtype = False
 
             if not convert_axes and df.index.dtype.type == np.datetime64:
@@ -1202,6 +1202,40 @@ def test_data_frame_size_after_to_json(self):
 
         assert size_before == size_after
 
+    @pytest.mark.parametrize('index', [None, [1, 2], [1., 2.], ['a', 'b'],
+                                       ['1', '2'], ['1.', '2.']])
+    @pytest.mark.parametrize('columns', [['a', 'b'], ['1', '2'], ['1.', '2.']])
+    def test_from_json_to_json_table_index_and_columns(self, index, columns):
+        # GH25433 GH25435
+        expected = DataFrame([[1, 2], [3, 4]], index=index, columns=columns)
+        dfjson = expected.to_json(orient='table')
+        result = pd.read_json(dfjson, orient='table')
+        assert_frame_equal(result, expected)
+
+    def test_from_json_to_json_table_dtypes(self):
+        # GH21345
+        expected = pd.DataFrame({'a': [1, 2], 'b': [3., 4.], 'c': ['5', '6']})
+        dfjson = expected.to_json(orient='table')
+        result = pd.read_json(dfjson, orient='table')
+        assert_frame_equal(result, expected)
+
+    @pytest.mark.parametrize('dtype', [True, {'b': int, 'c': int}])
+    def test_read_json_table_dtype_raises(self, dtype):
+        # GH21345
+        df = pd.DataFrame({'a': [1, 2], 'b': [3., 4.], 'c': ['5', '6']})
+        dfjson = df.to_json(orient='table')
+        msg = "cannot pass both dtype and orient='table'"
+        with pytest.raises(ValueError, match=msg):
+            pd.read_json(dfjson, orient='table', dtype=dtype)
+
+    def test_read_json_table_convert_axes_raises(self):
+        # GH25433 GH25435
+        df = DataFrame([[1, 2], [3, 4]], index=[1., 2.], columns=['1.', '2.'])
+        dfjson = df.to_json(orient='table')
+        msg = "cannot pass both convert_axes and orient='table'"
orient='table'" + with pytest.raises(ValueError, match=msg): + pd.read_json(dfjson, orient='table', convert_axes=True) + @pytest.mark.parametrize('data, expected', [ (DataFrame([[1, 2], [4, 5]], columns=['a', 'b']), {'columns': ['a', 'b'], 'data': [[1, 2], [4, 5]]}), @@ -1262,3 +1296,13 @@ def test_index_false_error_to_json(self, orient): "'orient' is 'split' or 'table'") with pytest.raises(ValueError, match=msg): df.to_json(orient=orient, index=False) + + @pytest.mark.parametrize('orient', ['split', 'table']) + @pytest.mark.parametrize('index', [True, False]) + def test_index_false_from_json_to_json(self, orient, index): + # GH25170 + # Test index=False in from_json to_json + expected = DataFrame({'a': [1, 2], 'b': [3, 4]}) + dfjson = expected.to_json(orient=orient, index=index) + result = read_json(dfjson, orient=orient) + assert_frame_equal(result, expected) diff --git a/pandas/tests/io/msgpack/test_pack.py b/pandas/tests/io/msgpack/test_pack.py index 8c82d0d2cf870..078d9f4ceb649 100644 --- a/pandas/tests/io/msgpack/test_pack.py +++ b/pandas/tests/io/msgpack/test_pack.py @@ -1,10 +1,10 @@ # coding: utf-8 - +from collections import OrderedDict import struct import pytest -from pandas.compat import OrderedDict, u +from pandas.compat import u from pandas import compat diff --git a/pandas/tests/io/test_clipboard.py b/pandas/tests/io/test_clipboard.py index 8eb26d9f3dec5..565db92210b0a 100644 --- a/pandas/tests/io/test_clipboard.py +++ b/pandas/tests/io/test_clipboard.py @@ -12,6 +12,7 @@ from pandas.util import testing as tm from pandas.util.testing import makeCustomDataframe as mkdf +from pandas.io.clipboard import clipboard_get, clipboard_set from pandas.io.clipboard.exceptions import PyperclipException try: @@ -30,8 +31,8 @@ def build_kwargs(sep, excel): return kwargs -@pytest.fixture(params=['delims', 'utf8', 'string', 'long', 'nonascii', - 'colwidth', 'mixed', 'float', 'int']) +@pytest.fixture(params=['delims', 'utf8', 'utf16', 'string', 'long', + 'nonascii', 'colwidth', 'mixed', 'float', 'int']) def df(request): data_type = request.param @@ -41,6 +42,10 @@ def df(request): elif data_type == 'utf8': return pd.DataFrame({'a': ['µasd', 'Ωœ∑´'], 'b': ['øπ∆˚¬', 'œ∑´®']}) + elif data_type == 'utf16': + return pd.DataFrame({'a': ['\U0001f44d\U0001f44d', + '\U0001f44d\U0001f44d'], + 'b': ['abc', 'def']}) elif data_type == 'string': return mkdf(5, 3, c_idx_type='s', r_idx_type='i', c_idx_names=[None], r_idx_names=[None]) @@ -225,3 +230,14 @@ def test_invalid_encoding(self, df): @pytest.mark.parametrize('enc', ['UTF-8', 'utf-8', 'utf8']) def test_round_trip_valid_encodings(self, enc, df): self.check_round_trip_frame(df, encoding=enc) + + +@pytest.mark.single +@pytest.mark.clipboard +@pytest.mark.skipif(not _DEPS_INSTALLED, + reason="clipboard primitives not installed") +@pytest.mark.parametrize('data', [u'\U0001f44d...', u'Ωœ∑´...', 'abcd...']) +def test_raw_roundtrip(data): + # PR #25040 wide unicode wasn't copied correctly on PY3 on windows + clipboard_set(data) + assert data == clipboard_get() diff --git a/pandas/tests/io/test_excel.py b/pandas/tests/io/test_excel.py index 717e9bc23c6b1..04c9c58a326a4 100644 --- a/pandas/tests/io/test_excel.py +++ b/pandas/tests/io/test_excel.py @@ -5,7 +5,6 @@ from functools import partial import os import warnings -from warnings import catch_warnings import numpy as np from numpy import nan @@ -2360,7 +2359,7 @@ def test_register_writer(self): class DummyClass(ExcelWriter): called_save = False called_write_cells = False - supported_extensions = ['test', 
+            supported_extensions = ['xlsx', 'xls']
             engine = 'dummy'
 
             def save(self):
@@ -2378,19 +2377,13 @@ def check_called(func):
 
         with pd.option_context('io.excel.xlsx.writer', 'dummy'):
             register_writer(DummyClass)
-            writer = ExcelWriter('something.test')
+            writer = ExcelWriter('something.xlsx')
             assert isinstance(writer, DummyClass)
             df = tm.makeCustomDataframe(1, 1)
-
-            with catch_warnings(record=True):
-                panel = tm.makePanel()
-                func = lambda: df.to_excel('something.test')
-                check_called(func)
-                check_called(lambda: panel.to_excel('something.test'))
-                check_called(lambda: df.to_excel('something.xlsx'))
-                check_called(
-                    lambda: df.to_excel(
-                        'something.xls', engine='dummy'))
+            check_called(lambda: df.to_excel('something.xlsx'))
+            check_called(
+                lambda: df.to_excel(
+                    'something.xls', engine='dummy'))
 
 
 @pytest.mark.parametrize('engine', [
@@ -2417,7 +2410,10 @@ def style(df):
                          ['', '', '']],
                         index=df.index, columns=df.columns)
 
-    def assert_equal_style(cell1, cell2):
+    def assert_equal_style(cell1, cell2, engine):
+        if engine in ['xlsxwriter', 'openpyxl']:
+            pytest.xfail(reason=("GH25351: failing on some attribute "
+                                 "comparisons in {}".format(engine)))
         # XXX: should find a better way to check equality
         assert cell1.alignment.__dict__ == cell2.alignment.__dict__
         assert cell1.border.__dict__ == cell2.border.__dict__
@@ -2461,7 +2457,7 @@ def custom_converter(css):
             assert len(col1) == len(col2)
             for cell1, cell2 in zip(col1, col2):
                 assert cell1.value == cell2.value
-                assert_equal_style(cell1, cell2)
+                assert_equal_style(cell1, cell2, engine)
                 n_cells += 1
 
         # ensure iteration actually happened:
@@ -2519,7 +2515,7 @@ def custom_converter(css):
                     assert cell1.number_format == 'General'
                     assert cell2.number_format == '0%'
                 else:
-                    assert_equal_style(cell1, cell2)
+                    assert_equal_style(cell1, cell2, engine)
 
                 assert cell1.value == cell2.value
                 n_cells += 1
@@ -2537,7 +2533,7 @@ def custom_converter(css):
                     assert not cell1.font.bold
                     assert cell2.font.bold
                 else:
-                    assert_equal_style(cell1, cell2)
+                    assert_equal_style(cell1, cell2, engine)
 
                 assert cell1.value == cell2.value
                 n_cells += 1
diff --git a/pandas/tests/io/test_packers.py b/pandas/tests/io/test_packers.py
index 9eb6d327be025..375557c43a3ae 100644
--- a/pandas/tests/io/test_packers.py
+++ b/pandas/tests/io/test_packers.py
@@ -13,9 +13,8 @@
 
 import pandas
 from pandas import (
-    Categorical, DataFrame, Index, Interval, MultiIndex, NaT, Panel, Period,
-    Series, Timestamp, bdate_range, compat, date_range, period_range)
-from pandas.tests.test_panel import assert_panel_equal
+    Categorical, DataFrame, Index, Interval, MultiIndex, NaT, Period, Series,
+    Timestamp, bdate_range, compat, date_range, period_range)
 import pandas.util.testing as tm
 from pandas.util.testing import (
     assert_categorical_equal, assert_frame_equal, assert_index_equal,
@@ -62,8 +61,6 @@ def check_arbitrary(a, b):
         assert(len(a) == len(b))
         for a_, b_ in zip(a, b):
             check_arbitrary(a_, b_)
-    elif isinstance(a, Panel):
-        assert_panel_equal(a, b)
     elif isinstance(a, DataFrame):
         assert_frame_equal(a, b)
     elif isinstance(a, Series):
@@ -490,23 +487,12 @@ def setup_method(self, method):
             'int': DataFrame(dict(A=data['B'], B=Series(data['B']) + 1)),
             'mixed': DataFrame(data)}
 
-        self.panel = {
-            'float': Panel(dict(ItemA=self.frame['float'],
-                                ItemB=self.frame['float'] + 1))}
-
     def test_basic_frame(self):
 
         for s, i in self.frame.items():
             i_rec = self.encode_decode(i)
             assert_frame_equal(i, i_rec)
 
-    def test_basic_panel(self):
-
-        with catch_warnings(record=True):
-            for s, i in self.panel.items():
diff --git a/pandas/tests/io/test_packers.py b/pandas/tests/io/test_packers.py
index 9eb6d327be025..375557c43a3ae 100644
--- a/pandas/tests/io/test_packers.py
+++ b/pandas/tests/io/test_packers.py
@@ -13,9 +13,8 @@
 import pandas
 from pandas import (
-    Categorical, DataFrame, Index, Interval, MultiIndex, NaT, Panel, Period,
-    Series, Timestamp, bdate_range, compat, date_range, period_range)
-from pandas.tests.test_panel import assert_panel_equal
+    Categorical, DataFrame, Index, Interval, MultiIndex, NaT, Period, Series,
+    Timestamp, bdate_range, compat, date_range, period_range)
 import pandas.util.testing as tm
 from pandas.util.testing import (
     assert_categorical_equal, assert_frame_equal, assert_index_equal,
@@ -62,8 +61,6 @@ def check_arbitrary(a, b):
         assert(len(a) == len(b))
         for a_, b_ in zip(a, b):
             check_arbitrary(a_, b_)
-    elif isinstance(a, Panel):
-        assert_panel_equal(a, b)
     elif isinstance(a, DataFrame):
         assert_frame_equal(a, b)
     elif isinstance(a, Series):
@@ -490,23 +487,12 @@ def setup_method(self, method):
                       'int': DataFrame(dict(A=data['B'], B=Series(data['B']) + 1)),
                       'mixed': DataFrame(data)}

-        self.panel = {
-            'float': Panel(dict(ItemA=self.frame['float'],
-                                ItemB=self.frame['float'] + 1))}
-
     def test_basic_frame(self):

         for s, i in self.frame.items():
             i_rec = self.encode_decode(i)
             assert_frame_equal(i, i_rec)

-    def test_basic_panel(self):
-
-        with catch_warnings(record=True):
-            for s, i in self.panel.items():
-                i_rec = self.encode_decode(i)
-                assert_panel_equal(i, i_rec)
-
     def test_multi(self):

         i_rec = self.encode_decode(self.frame)
@@ -876,6 +862,10 @@ class TestMsgpack(object):

     def check_min_structure(self, data, version):
         for typ, v in self.minimum_structure.items():
+            if typ == "panel":
+                # FIXME: kludge; get this key out of the legacy file
+                continue
+
             assert typ in data, '"{0}" not found in unpacked data'.format(typ)
             for kind in v:
                 msg = '"{0}" not found in data["{1}"]'.format(kind, typ)
@@ -887,6 +877,11 @@ def compare(self, current_data, all_data, vf, version):
             data = read_msgpack(vf, encoding='latin-1')
         else:
             data = read_msgpack(vf)
+
+        if "panel" in data:
+            # FIXME: kludge; get the key out of the stored file
+            del data["panel"]
+
         self.check_min_structure(data, version)
         for typ, dv in data.items():
             assert typ in all_data, ('unpacked data contains '
diff --git a/pandas/tests/io/test_pickle.py b/pandas/tests/io/test_pickle.py
index 7f3fe1aa401ea..b4befadaddc42 100644
--- a/pandas/tests/io/test_pickle.py
+++ b/pandas/tests/io/test_pickle.py
@@ -75,6 +75,10 @@ def compare(data, vf, version):

     m = globals()
     for typ, dv in data.items():
+        if typ == "panel":
+            # FIXME: kludge; get this key out of the legacy file
+            continue
+
         for dt, result in dv.items():
             try:
                 expected = data[typ][dt]
diff --git a/pandas/tests/io/test_pytables.py b/pandas/tests/io/test_pytables.py
index 517a3e059469c..69ff32d1b728b 100644
--- a/pandas/tests/io/test_pytables.py
+++ b/pandas/tests/io/test_pytables.py
@@ -19,11 +19,11 @@
 import pandas as pd
 from pandas import (
     Categorical, DataFrame, DatetimeIndex, Index, Int64Index, MultiIndex,
-    Panel, RangeIndex, Series, Timestamp, bdate_range, compat, concat,
-    date_range, isna, timedelta_range)
+    RangeIndex, Series, Timestamp, bdate_range, compat, concat, date_range,
+    isna, timedelta_range)
 import pandas.util.testing as tm
 from pandas.util.testing import (
-    assert_frame_equal, assert_panel_equal, assert_series_equal, set_timezone)
+    assert_frame_equal, assert_series_equal, set_timezone)

 from pandas.io import pytables as pytables  # noqa:E402
 from pandas.io.formats.printing import pprint_thing
@@ -34,6 +34,15 @@

 tables = pytest.importorskip('tables')

+# TODO:
+# remove when gh-24839 is fixed; this affects numpy 1.16
+# and pytables 3.4.4
+xfail_non_writeable = pytest.mark.xfail(
+    LooseVersion(np.__version__) >= LooseVersion('1.16'),
+    reason=('gh-25511, gh-24839. pytables needs a '
+            'release beyond 3.4.4 to support numpy 1.16x'))
+
+
 _default_compressor = ('blosc' if LooseVersion(tables.__version__) >=
                        LooseVersion('2.2') else 'zlib')
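`xfail_non_writeable` is the module-level conditional-xfail idiom: evaluate the environment once at import time, build one marker, and decorate every affected test with it. A self-contained sketch of the pattern, under an illustrative version cutoff:

    from distutils.version import LooseVersion

    import numpy as np
    import pytest

    # build the marker once; apply it to as many tests as needed
    xfail_new_numpy = pytest.mark.xfail(
        LooseVersion(np.__version__) >= LooseVersion('1.16'),
        reason='illustrative: fails until an upstream release lands')

    @xfail_new_numpy
    def test_affected_roundtrip():
        assert LooseVersion(np.__version__) < LooseVersion('1.16')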
@@ -141,7 +150,6 @@ def teardown_method(self, method):


 @pytest.mark.single
-@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning")
 class TestHDFStore(Base):

     def test_format_kwarg_in_constructor(self):
@@ -185,11 +193,6 @@ def roundtrip(key, obj, **kwargs):
             o = tm.makeDataFrame()
             assert_frame_equal(o, roundtrip('frame', o))

-            with catch_warnings(record=True):
-
-                o = tm.makePanel()
-                assert_panel_equal(o, roundtrip('panel', o))
-
             # table
             df = DataFrame(dict(A=lrange(5), B=lrange(5)))
             df.to_hdf(path, 'table', append=True)
@@ -348,11 +351,9 @@ def test_keys(self):
             store['a'] = tm.makeTimeSeries()
             store['b'] = tm.makeStringSeries()
             store['c'] = tm.makeDataFrame()
-            with catch_warnings(record=True):
-                store['d'] = tm.makePanel()
-                store['foo/bar'] = tm.makePanel()
-            assert len(store) == 5
-            expected = {'/a', '/b', '/c', '/d', '/foo/bar'}
+
+            assert len(store) == 3
+            expected = {'/a', '/b', '/c'}
             assert set(store.keys()) == expected
             assert set(store) == expected
@@ -388,11 +389,6 @@ def test_repr(self):
             store['b'] = tm.makeStringSeries()
             store['c'] = tm.makeDataFrame()

-            with catch_warnings(record=True):
-                store['d'] = tm.makePanel()
-                store['foo/bar'] = tm.makePanel()
-                store.append('e', tm.makePanel())
-
             df = tm.makeDataFrame()
             df['obj1'] = 'foo'
             df['obj2'] = 'bar'
@@ -875,6 +871,7 @@ def test_put_integer(self):
         df = DataFrame(np.random.randn(50, 100))
         self._check_roundtrip(df, tm.assert_frame_equal)

+    @xfail_non_writeable
     def test_put_mixed_type(self):
         df = tm.makeTimeDataFrame()
         df['obj1'] = 'foo'
@@ -936,21 +933,6 @@ def test_append(self):
             store.append('/df3 foo', df[10:])
             tm.assert_frame_equal(store['df3 foo'], df)

-            # panel
-            wp = tm.makePanel()
-            _maybe_remove(store, 'wp1')
-            store.append('wp1', wp.iloc[:, :10, :])
-            store.append('wp1', wp.iloc[:, 10:, :])
-            assert_panel_equal(store['wp1'], wp)
-
-            # test using differt order of items on the non-index axes
-            _maybe_remove(store, 'wp1')
-            wp_append1 = wp.iloc[:, :10, :]
-            store.append('wp1', wp_append1)
-            wp_append2 = wp.iloc[:, 10:, :].reindex(items=wp.items[::-1])
-            store.append('wp1', wp_append2)
-            assert_panel_equal(store['wp1'], wp)
-
             # dtype issues - mizxed type in a single object column
             df = DataFrame(data=[[1, 2], [0, 1], [1, 2], [0, 0]])
             df['mixed_column'] = 'testing'
@@ -1254,22 +1236,6 @@ def test_append_all_nans(self):
             reloaded = read_hdf(path, 'df_with_missing')
             tm.assert_frame_equal(df_with_missing, reloaded)

-        matrix = [[[np.nan, np.nan, np.nan], [1, np.nan, np.nan]],
-                  [[np.nan, np.nan, np.nan], [np.nan, 5, 6]],
-                  [[np.nan, np.nan, np.nan], [np.nan, 3, np.nan]]]
-
-        with catch_warnings(record=True):
-            panel_with_missing = Panel(matrix,
-                                       items=['Item1', 'Item2', 'Item3'],
-                                       major_axis=[1, 2],
-                                       minor_axis=['A', 'B', 'C'])
-
-            with ensure_clean_path(self.path) as path:
-                panel_with_missing.to_hdf(
-                    path, 'panel_with_missing', format='table')
-                reloaded_panel = read_hdf(path, 'panel_with_missing')
-                tm.assert_panel_equal(panel_with_missing, reloaded_panel)
-
     def test_append_frame_column_oriented(self):

         with ensure_clean_store(self.path) as store:
@@ -1342,40 +1308,11 @@ def test_append_with_strings(self):

         with ensure_clean_store(self.path) as store:
             with catch_warnings(record=True):
-                simplefilter("ignore", FutureWarning)
-                wp = tm.makePanel()
-                wp2 = wp.rename(
-                    minor_axis={x: "%s_extra" % x for x in wp.minor_axis})

                 def check_col(key, name,
size): assert getattr(store.get_storer(key) .table.description, name).itemsize == size - store.append('s1', wp, min_itemsize=20) - store.append('s1', wp2) - expected = concat([wp, wp2], axis=2) - expected = expected.reindex( - minor_axis=sorted(expected.minor_axis)) - assert_panel_equal(store['s1'], expected) - check_col('s1', 'minor_axis', 20) - - # test dict format - store.append('s2', wp, min_itemsize={'minor_axis': 20}) - store.append('s2', wp2) - expected = concat([wp, wp2], axis=2) - expected = expected.reindex( - minor_axis=sorted(expected.minor_axis)) - assert_panel_equal(store['s2'], expected) - check_col('s2', 'minor_axis', 20) - - # apply the wrong field (similar to #1) - store.append('s3', wp, min_itemsize={'major_axis': 20}) - pytest.raises(ValueError, store.append, 's3', wp2) - - # test truncation of bigger strings - store.append('s4', wp) - pytest.raises(ValueError, store.append, 's4', wp2) - # avoid truncation on elements df = DataFrame([[123, 'asdqwerty'], [345, 'dggnhebbsdfbdfb']]) store.append('df_big', df) @@ -1511,7 +1448,10 @@ def test_to_hdf_with_min_itemsize(self): tm.assert_series_equal(pd.read_hdf(path, 'ss4'), pd.concat([df['B'], df2['B']])) - @pytest.mark.parametrize("format", ['fixed', 'table']) + @pytest.mark.parametrize( + "format", + [pytest.param('fixed', marks=xfail_non_writeable), + 'table']) def test_to_hdf_errors(self, format): data = ['\ud800foo'] @@ -1674,32 +1614,6 @@ def check_col(key, name, size): (df_dc.string == 'foo')] tm.assert_frame_equal(result, expected) - with ensure_clean_store(self.path) as store: - with catch_warnings(record=True): - # panel - # GH5717 not handling data_columns - np.random.seed(1234) - p = tm.makePanel() - - store.append('p1', p) - tm.assert_panel_equal(store.select('p1'), p) - - store.append('p2', p, data_columns=True) - tm.assert_panel_equal(store.select('p2'), p) - - result = store.select('p2', where='ItemA>0') - expected = p.to_frame() - expected = expected[expected['ItemA'] > 0] - tm.assert_frame_equal(result.to_frame(), expected) - - result = store.select( - 'p2', where='ItemA>0 & minor_axis=["A","B"]') - expected = p.to_frame() - expected = expected[expected['ItemA'] > 0] - expected = expected[expected.reset_index( - level=['major']).index.isin(['A', 'B'])] - tm.assert_frame_equal(result.to_frame(), expected) - def test_create_table_index(self): with ensure_clean_store(self.path) as store: @@ -1708,37 +1622,6 @@ def test_create_table_index(self): def col(t, column): return getattr(store.get_storer(t).table.cols, column) - # index=False - wp = tm.makePanel() - store.append('p5', wp, index=False) - store.create_table_index('p5', columns=['major_axis']) - assert(col('p5', 'major_axis').is_indexed is True) - assert(col('p5', 'minor_axis').is_indexed is False) - - # index=True - store.append('p5i', wp, index=True) - assert(col('p5i', 'major_axis').is_indexed is True) - assert(col('p5i', 'minor_axis').is_indexed is True) - - # default optlevels - store.get_storer('p5').create_index() - assert(col('p5', 'major_axis').index.optlevel == 6) - assert(col('p5', 'minor_axis').index.kind == 'medium') - - # let's change the indexing scheme - store.create_table_index('p5') - assert(col('p5', 'major_axis').index.optlevel == 6) - assert(col('p5', 'minor_axis').index.kind == 'medium') - store.create_table_index('p5', optlevel=9) - assert(col('p5', 'major_axis').index.optlevel == 9) - assert(col('p5', 'minor_axis').index.kind == 'medium') - store.create_table_index('p5', kind='full') - assert(col('p5', 'major_axis').index.optlevel == 
9) - assert(col('p5', 'minor_axis').index.kind == 'full') - store.create_table_index('p5', optlevel=1, kind='light') - assert(col('p5', 'major_axis').index.optlevel == 1) - assert(col('p5', 'minor_axis').index.kind == 'light') - # data columns df = tm.makeTimeDataFrame() df['string'] = 'foo' @@ -1761,19 +1644,6 @@ def col(t, column): store.put('f2', df) pytest.raises(TypeError, store.create_table_index, 'f2') - def test_append_diff_item_order(self): - - with catch_warnings(record=True): - wp = tm.makePanel() - wp1 = wp.iloc[:, :10, :] - wp2 = wp.iloc[wp.items.get_indexer(['ItemC', 'ItemB', 'ItemA']), - 10:, :] - - with ensure_clean_store(self.path) as store: - store.put('panel', wp1, format='table') - pytest.raises(ValueError, store.put, 'panel', wp2, - append=True) - def test_append_hierarchical(self): index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', 'three']], @@ -1958,6 +1828,7 @@ def test_pass_spec_to_storer(self): pytest.raises(TypeError, store.select, 'df', where=[('columns=A')]) + @xfail_non_writeable def test_append_misc(self): with ensure_clean_store(self.path) as store: @@ -1987,10 +1858,6 @@ def check(obj, comparator): df['time2'] = Timestamp('20130102') check(df, tm.assert_frame_equal) - with catch_warnings(record=True): - p = tm.makePanel() - check(p, assert_panel_equal) - # empty frame, GH4273 with ensure_clean_store(self.path) as store: @@ -2011,24 +1878,6 @@ def check(obj, comparator): store.put('df2', df) assert_frame_equal(store.select('df2'), df) - with catch_warnings(record=True): - - # 0 len - p_empty = Panel(items=list('ABC')) - store.append('p', p_empty) - pytest.raises(KeyError, store.select, 'p') - - # repeated append of 0/non-zero frames - p = Panel(np.random.randn(3, 4, 5), items=list('ABC')) - store.append('p', p) - assert_panel_equal(store.select('p'), p) - store.append('p', p_empty) - assert_panel_equal(store.select('p'), p) - - # store - store.put('p2', p_empty) - assert_panel_equal(store.select('p2'), p_empty) - def test_append_raise(self): with ensure_clean_store(self.path) as store: @@ -2143,24 +1992,6 @@ def test_table_mixed_dtypes(self): store.append('df1_mixed', df) tm.assert_frame_equal(store.select('df1_mixed'), df) - with catch_warnings(record=True): - - # panel - wp = tm.makePanel() - wp['obj1'] = 'foo' - wp['obj2'] = 'bar' - wp['bool1'] = wp['ItemA'] > 0 - wp['bool2'] = wp['ItemB'] > 0 - wp['int1'] = 1 - wp['int2'] = 2 - wp = wp._consolidate() - - with catch_warnings(record=True): - - with ensure_clean_store(self.path) as store: - store.append('p1_mixed', wp) - assert_panel_equal(store.select('p1_mixed'), wp) - def test_unimplemented_dtypes_table_columns(self): with ensure_clean_store(self.path) as store: @@ -2189,6 +2020,7 @@ def test_unimplemented_dtypes_table_columns(self): # this fails because we have a date in the object block...... 
pytest.raises(TypeError, store.append, 'df_unimplemented', df) + @xfail_non_writeable @pytest.mark.skipif( LooseVersion(np.__version__) == LooseVersion('1.15.0'), reason=("Skipping pytables test when numpy version is " @@ -2308,193 +2140,6 @@ def test_remove(self): del store['b'] assert len(store) == 0 - def test_remove_where(self): - - with ensure_clean_store(self.path) as store: - - with catch_warnings(record=True): - - # non-existance - crit1 = 'index>foo' - pytest.raises(KeyError, store.remove, 'a', [crit1]) - - # try to remove non-table (with crit) - # non-table ok (where = None) - wp = tm.makePanel(30) - store.put('wp', wp, format='table') - store.remove('wp', ["minor_axis=['A', 'D']"]) - rs = store.select('wp') - expected = wp.reindex(minor_axis=['B', 'C']) - assert_panel_equal(rs, expected) - - # empty where - _maybe_remove(store, 'wp') - store.put('wp', wp, format='table') - - # deleted number (entire table) - n = store.remove('wp', []) - assert n == 120 - - # non - empty where - _maybe_remove(store, 'wp') - store.put('wp', wp, format='table') - pytest.raises(ValueError, store.remove, - 'wp', ['foo']) - - def test_remove_startstop(self): - # GH #4835 and #6177 - - with ensure_clean_store(self.path) as store: - - with catch_warnings(record=True): - wp = tm.makePanel(30) - - # start - _maybe_remove(store, 'wp1') - store.put('wp1', wp, format='t') - n = store.remove('wp1', start=32) - assert n == 120 - 32 - result = store.select('wp1') - expected = wp.reindex(major_axis=wp.major_axis[:32 // 4]) - assert_panel_equal(result, expected) - - _maybe_remove(store, 'wp2') - store.put('wp2', wp, format='t') - n = store.remove('wp2', start=-32) - assert n == 32 - result = store.select('wp2') - expected = wp.reindex(major_axis=wp.major_axis[:-32 // 4]) - assert_panel_equal(result, expected) - - # stop - _maybe_remove(store, 'wp3') - store.put('wp3', wp, format='t') - n = store.remove('wp3', stop=32) - assert n == 32 - result = store.select('wp3') - expected = wp.reindex(major_axis=wp.major_axis[32 // 4:]) - assert_panel_equal(result, expected) - - _maybe_remove(store, 'wp4') - store.put('wp4', wp, format='t') - n = store.remove('wp4', stop=-32) - assert n == 120 - 32 - result = store.select('wp4') - expected = wp.reindex(major_axis=wp.major_axis[-32 // 4:]) - assert_panel_equal(result, expected) - - # start n stop - _maybe_remove(store, 'wp5') - store.put('wp5', wp, format='t') - n = store.remove('wp5', start=16, stop=-16) - assert n == 120 - 32 - result = store.select('wp5') - expected = wp.reindex( - major_axis=(wp.major_axis[:16 // 4] - .union(wp.major_axis[-16 // 4:]))) - assert_panel_equal(result, expected) - - _maybe_remove(store, 'wp6') - store.put('wp6', wp, format='t') - n = store.remove('wp6', start=16, stop=16) - assert n == 0 - result = store.select('wp6') - expected = wp.reindex(major_axis=wp.major_axis) - assert_panel_equal(result, expected) - - # with where - _maybe_remove(store, 'wp7') - - # TODO: unused? 
- date = wp.major_axis.take(np.arange(0, 30, 3)) # noqa - - crit = 'major_axis=date' - store.put('wp7', wp, format='t') - n = store.remove('wp7', where=[crit], stop=80) - assert n == 28 - result = store.select('wp7') - expected = wp.reindex(major_axis=wp.major_axis.difference( - wp.major_axis[np.arange(0, 20, 3)])) - assert_panel_equal(result, expected) - - def test_remove_crit(self): - - with ensure_clean_store(self.path) as store: - - with catch_warnings(record=True): - wp = tm.makePanel(30) - - # group row removal - _maybe_remove(store, 'wp3') - date4 = wp.major_axis.take([0, 1, 2, 4, 5, 6, 8, 9, 10]) - crit4 = 'major_axis=date4' - store.put('wp3', wp, format='t') - n = store.remove('wp3', where=[crit4]) - assert n == 36 - - result = store.select('wp3') - expected = wp.reindex( - major_axis=wp.major_axis.difference(date4)) - assert_panel_equal(result, expected) - - # upper half - _maybe_remove(store, 'wp') - store.put('wp', wp, format='table') - date = wp.major_axis[len(wp.major_axis) // 2] - - crit1 = 'major_axis>date' - crit2 = "minor_axis=['A', 'D']" - n = store.remove('wp', where=[crit1]) - assert n == 56 - - n = store.remove('wp', where=[crit2]) - assert n == 32 - - result = store['wp'] - expected = wp.truncate(after=date).reindex(minor=['B', 'C']) - assert_panel_equal(result, expected) - - # individual row elements - _maybe_remove(store, 'wp2') - store.put('wp2', wp, format='table') - - date1 = wp.major_axis[1:3] - crit1 = 'major_axis=date1' - store.remove('wp2', where=[crit1]) - result = store.select('wp2') - expected = wp.reindex( - major_axis=wp.major_axis.difference(date1)) - assert_panel_equal(result, expected) - - date2 = wp.major_axis[5] - crit2 = 'major_axis=date2' - store.remove('wp2', where=[crit2]) - result = store['wp2'] - expected = wp.reindex( - major_axis=(wp.major_axis - .difference(date1) - .difference(Index([date2])) - )) - assert_panel_equal(result, expected) - - date3 = [wp.major_axis[7], wp.major_axis[9]] - crit3 = 'major_axis=date3' - store.remove('wp2', where=[crit3]) - result = store['wp2'] - expected = wp.reindex(major_axis=wp.major_axis - .difference(date1) - .difference(Index([date2])) - .difference(Index(date3))) - assert_panel_equal(result, expected) - - # corners - _maybe_remove(store, 'wp4') - store.put('wp4', wp, format='table') - n = store.remove( - 'wp4', where="major_axis>wp.major_axis[-1]") - result = store.select('wp4') - assert_panel_equal(result, wp) - def test_invalid_terms(self): with ensure_clean_store(self.path) as store: @@ -2504,27 +2149,16 @@ def test_invalid_terms(self): df = tm.makeTimeDataFrame() df['string'] = 'foo' df.loc[0:4, 'string'] = 'bar' - wp = tm.makePanel() store.put('df', df, format='table') - store.put('wp', wp, format='table') # some invalid terms - pytest.raises(ValueError, store.select, - 'wp', "minor=['A', 'B']") - pytest.raises(ValueError, store.select, - 'wp', ["index=['20121114']"]) - pytest.raises(ValueError, store.select, 'wp', [ - "index=['20121114', '20121114']"]) pytest.raises(TypeError, Term) # more invalid pytest.raises( ValueError, store.select, 'df', 'df.index[3]') pytest.raises(SyntaxError, store.select, 'df', 'index>') - pytest.raises( - ValueError, store.select, 'wp', - "major_axis<'20000108' & minor_axis['A', 'B']") # from the docs with ensure_clean_path(self.path) as path: @@ -2546,127 +2180,6 @@ def test_invalid_terms(self): pytest.raises(ValueError, read_hdf, path, 'dfq', where="A>0 or C>0") - def test_terms(self): - - with ensure_clean_store(self.path) as store: - - with 
catch_warnings(record=True): - simplefilter("ignore", FutureWarning) - - wp = tm.makePanel() - wpneg = Panel.fromDict({-1: tm.makeDataFrame(), - 0: tm.makeDataFrame(), - 1: tm.makeDataFrame()}) - - store.put('wp', wp, format='table') - store.put('wpneg', wpneg, format='table') - - # panel - result = store.select( - 'wp', - "major_axis<'20000108' and minor_axis=['A', 'B']") - expected = wp.truncate( - after='20000108').reindex(minor=['A', 'B']) - assert_panel_equal(result, expected) - - # with deprecation - result = store.select( - 'wp', where=("major_axis<'20000108' " - "and minor_axis=['A', 'B']")) - expected = wp.truncate( - after='20000108').reindex(minor=['A', 'B']) - tm.assert_panel_equal(result, expected) - - with catch_warnings(record=True): - - # valid terms - terms = [('major_axis=20121114'), - ('major_axis>20121114'), - (("major_axis=['20121114', '20121114']"),), - ('major_axis=datetime.datetime(2012, 11, 14)'), - 'major_axis> 20121114', - 'major_axis >20121114', - 'major_axis > 20121114', - (("minor_axis=['A', 'B']"),), - (("minor_axis=['A', 'B']"),), - ((("minor_axis==['A', 'B']"),),), - (("items=['ItemA', 'ItemB']"),), - ('items=ItemA'), - ] - - for t in terms: - store.select('wp', t) - - with pytest.raises(TypeError, - match='Only named functions are supported'): - store.select( - 'wp', - 'major_axis == (lambda x: x)("20130101")') - - with catch_warnings(record=True): - # check USub node parsing - res = store.select('wpneg', 'items == -1') - expected = Panel({-1: wpneg[-1]}) - tm.assert_panel_equal(res, expected) - - msg = 'Unary addition not supported' - with pytest.raises(NotImplementedError, match=msg): - store.select('wpneg', 'items == +1') - - def test_term_compat(self): - with ensure_clean_store(self.path) as store: - - with catch_warnings(record=True): - wp = Panel(np.random.randn(2, 5, 4), items=['Item1', 'Item2'], - major_axis=date_range('1/1/2000', periods=5), - minor_axis=['A', 'B', 'C', 'D']) - store.append('wp', wp) - - result = store.select( - 'wp', where=("major_axis>20000102 " - "and minor_axis=['A', 'B']")) - expected = wp.loc[:, wp.major_axis > - Timestamp('20000102'), ['A', 'B']] - assert_panel_equal(result, expected) - - store.remove('wp', 'major_axis>20000103') - result = store.select('wp') - expected = wp.loc[:, wp.major_axis <= Timestamp('20000103'), :] - assert_panel_equal(result, expected) - - with ensure_clean_store(self.path) as store: - - with catch_warnings(record=True): - wp = Panel(np.random.randn(2, 5, 4), - items=['Item1', 'Item2'], - major_axis=date_range('1/1/2000', periods=5), - minor_axis=['A', 'B', 'C', 'D']) - store.append('wp', wp) - - # stringified datetimes - result = store.select( - 'wp', 'major_axis>datetime.datetime(2000, 1, 2)') - expected = wp.loc[:, wp.major_axis > Timestamp('20000102')] - assert_panel_equal(result, expected) - - result = store.select( - 'wp', 'major_axis>datetime.datetime(2000, 1, 2)') - expected = wp.loc[:, wp.major_axis > Timestamp('20000102')] - assert_panel_equal(result, expected) - - result = store.select( - 'wp', - "major_axis=[datetime.datetime(2000, 1, 2, 0, 0), " - "datetime.datetime(2000, 1, 3, 0, 0)]") - expected = wp.loc[:, [Timestamp('20000102'), - Timestamp('20000103')]] - assert_panel_equal(result, expected) - - result = store.select( - 'wp', "minor_axis=['A', 'B']") - expected = wp.loc[:, :, ['A', 'B']] - assert_panel_equal(result, expected) - def test_same_name_scoping(self): with ensure_clean_store(self.path) as store: @@ -2747,6 +2260,7 @@ def test_float_index(self): s = 
Series(np.random.randn(10), index=index) self._check_roundtrip(s, tm.assert_series_equal) + @xfail_non_writeable def test_tuple_index(self): # GH #492 @@ -2759,6 +2273,7 @@ def test_tuple_index(self): simplefilter("ignore", pd.errors.PerformanceWarning) self._check_roundtrip(DF, tm.assert_frame_equal) + @xfail_non_writeable @pytest.mark.filterwarnings("ignore::pandas.errors.PerformanceWarning") def test_index_types(self): @@ -2822,6 +2337,7 @@ def test_timeseries_preepoch(self): except OverflowError: pytest.skip('known failer on some windows platforms') + @xfail_non_writeable @pytest.mark.parametrize("compression", [ False, pytest.param(True, marks=td.skip_if_windows_python_3) ]) @@ -2852,6 +2368,7 @@ def test_frame(self, compression): # empty self._check_roundtrip(df[:0], tm.assert_frame_equal) + @xfail_non_writeable def test_empty_series_frame(self): s0 = Series() s1 = Series(name='myseries') @@ -2865,8 +2382,10 @@ def test_empty_series_frame(self): self._check_roundtrip(df1, tm.assert_frame_equal) self._check_roundtrip(df2, tm.assert_frame_equal) - def test_empty_series(self): - for dtype in [np.int64, np.float64, np.object, 'm8[ns]', 'M8[ns]']: + @xfail_non_writeable + @pytest.mark.parametrize( + 'dtype', [np.int64, np.float64, np.object, 'm8[ns]', 'M8[ns]']) + def test_empty_series(self, dtype): s = Series(dtype=dtype) self._check_roundtrip(s, tm.assert_series_equal) @@ -2947,6 +2466,7 @@ def test_store_series_name(self): recons = store['series'] tm.assert_series_equal(recons, series) + @xfail_non_writeable @pytest.mark.parametrize("compression", [ False, pytest.param(True, marks=td.skip_if_windows_python_3) ]) @@ -2982,12 +2502,6 @@ def _make_one(): self._check_roundtrip(df1['int1'], tm.assert_series_equal, compression=compression) - def test_wide(self): - - with catch_warnings(record=True): - wp = tm.makePanel() - self._check_roundtrip(wp, assert_panel_equal) - @pytest.mark.filterwarnings( "ignore:\\nduplicate:pandas.io.pytables.DuplicateWarning" ) @@ -3050,29 +2564,6 @@ def test_select_with_dups(self): result = store.select('df', columns=['B', 'A']) assert_frame_equal(result, expected, by_blocks=True) - @pytest.mark.filterwarnings( - "ignore:\\nduplicate:pandas.io.pytables.DuplicateWarning" - ) - def test_wide_table_dups(self): - with ensure_clean_store(self.path) as store: - with catch_warnings(record=True): - - wp = tm.makePanel() - store.put('panel', wp, format='table') - store.put('panel', wp, format='table', append=True) - - recons = store['panel'] - - assert_panel_equal(recons, wp) - - def test_long(self): - def _check(left, right): - assert_panel_equal(left.to_panel(), right.to_panel()) - - with catch_warnings(record=True): - wp = tm.makePanel() - self._check_roundtrip(wp.to_frame(), _check) - def test_overwrite_node(self): with ensure_clean_store(self.path) as store: @@ -3119,34 +2610,6 @@ def test_select(self): with ensure_clean_store(self.path) as store: with catch_warnings(record=True): - wp = tm.makePanel() - - # put/select ok - _maybe_remove(store, 'wp') - store.put('wp', wp, format='table') - store.select('wp') - - # non-table ok (where = None) - _maybe_remove(store, 'wp') - store.put('wp2', wp) - store.select('wp2') - - # selection on the non-indexable with a large number of columns - wp = Panel(np.random.randn(100, 100, 100), - items=['Item%03d' % i for i in range(100)], - major_axis=date_range('1/1/2000', periods=100), - minor_axis=['E%03d' % i for i in range(100)]) - - _maybe_remove(store, 'wp') - store.append('wp', wp) - items = ['Item%03d' % i for i in 
range(80)] - result = store.select('wp', 'items=items') - expected = wp.reindex(items=items) - assert_panel_equal(expected, result) - - # selectin non-table with a where - # pytest.raises(ValueError, store.select, - # 'wp2', ('column', ['A', 'D'])) # select with columns= df = tm.makeTimeDataFrame() @@ -3675,31 +3138,6 @@ def test_retain_index_attributes2(self): assert read_hdf(path, 'data').index.name is None - def test_panel_select(self): - - with ensure_clean_store(self.path) as store: - - with catch_warnings(record=True): - - wp = tm.makePanel() - - store.put('wp', wp, format='table') - date = wp.major_axis[len(wp.major_axis) // 2] - - crit1 = ('major_axis>=date') - crit2 = ("minor_axis=['A', 'D']") - - result = store.select('wp', [crit1, crit2]) - expected = wp.truncate(before=date).reindex(minor=['A', 'D']) - assert_panel_equal(result, expected) - - result = store.select( - 'wp', ['major_axis>="20000124"', - ("minor_axis=['A', 'B']")]) - expected = wp.truncate( - before='20000124').reindex(minor=['A', 'B']) - assert_panel_equal(result, expected) - def test_frame_select(self): df = tm.makeTimeDataFrame() @@ -4538,9 +3976,10 @@ def test_pytables_native2_read(self, datapath): d1 = store['detector'] assert isinstance(d1, DataFrame) + @xfail_non_writeable def test_legacy_table_fixed_format_read_py2(self, datapath): # GH 24510 - # legacy table with fixed format written en Python 2 + # legacy table with fixed format written in Python 2 with ensure_clean_store( datapath('io', 'data', 'legacy_hdf', 'legacy_table_fixed_py2.h5'), @@ -4552,29 +3991,20 @@ def test_legacy_table_fixed_format_read_py2(self, datapath): name='INDEX_NAME')) assert_frame_equal(expected, result) - def test_legacy_table_read(self, datapath): - # legacy table types + def test_legacy_table_read_py2(self, datapath): + # issue: 24925 + # legacy table written in Python 2 with ensure_clean_store( - datapath('io', 'data', 'legacy_hdf', 'legacy_table.h5'), + datapath('io', 'data', 'legacy_hdf', + 'legacy_table_py2.h5'), mode='r') as store: + result = store.select('table') - with catch_warnings(): - simplefilter("ignore", pd.io.pytables.IncompatibilityWarning) - store.select('df1') - store.select('df2') - store.select('wp1') - - # force the frame - store.select('df2', typ='legacy_frame') - - # old version warning - pytest.raises( - Exception, store.select, 'wp1', 'minor_axis=B') - - df2 = store.select('df2') - result = store.select('df2', 'index>df2.index[2]') - expected = df2[df2.index > df2.index[2]] - assert_frame_equal(expected, result) + expected = pd.DataFrame({ + "a": ["a", "b"], + "b": [2, 3] + }) + assert_frame_equal(expected, result) def test_copy(self): @@ -4710,6 +4140,7 @@ def test_unicode_longer_encoded(self): result = store.get('df') tm.assert_frame_equal(result, df) + @xfail_non_writeable def test_store_datetime_mixed(self): df = DataFrame( @@ -5270,6 +4701,7 @@ def test_complex_table(self): reread = read_hdf(path, 'df') assert_frame_equal(df, reread) + @xfail_non_writeable def test_complex_mixed_fixed(self): complex64 = np.array([1.0 + 1.0j, 1.0 + 1.0j, 1.0 + 1.0j, 1.0 + 1.0j], dtype=np.complex64) @@ -5308,35 +4740,30 @@ def test_complex_mixed_table(self): reread = read_hdf(path, 'df') assert_frame_equal(df, reread) - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") def test_complex_across_dimensions_fixed(self): with catch_warnings(record=True): complex128 = np.array( [1.0 + 1.0j, 1.0 + 1.0j, 1.0 + 1.0j, 1.0 + 1.0j]) s = Series(complex128, index=list('abcd')) df = DataFrame({'A': s, 'B': s}) - p = 
Panel({'One': df, 'Two': df}) - objs = [s, df, p] - comps = [tm.assert_series_equal, tm.assert_frame_equal, - tm.assert_panel_equal] + objs = [s, df] + comps = [tm.assert_series_equal, tm.assert_frame_equal] for obj, comp in zip(objs, comps): with ensure_clean_path(self.path) as path: obj.to_hdf(path, 'obj', format='fixed') reread = read_hdf(path, 'obj') comp(obj, reread) - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") def test_complex_across_dimensions(self): complex128 = np.array([1.0 + 1.0j, 1.0 + 1.0j, 1.0 + 1.0j, 1.0 + 1.0j]) s = Series(complex128, index=list('abcd')) df = DataFrame({'A': s, 'B': s}) with catch_warnings(record=True): - p = Panel({'One': df, 'Two': df}) - objs = [df, p] - comps = [tm.assert_frame_equal, tm.assert_panel_equal] + objs = [df] + comps = [tm.assert_frame_equal] for obj, comp in zip(objs, comps): with ensure_clean_path(self.path) as path: obj.to_hdf(path, 'obj', format='table') diff --git a/pandas/tests/io/test_sql.py b/pandas/tests/io/test_sql.py index 75a6d8d009083..9d0bce3b342b4 100644 --- a/pandas/tests/io/test_sql.py +++ b/pandas/tests/io/test_sql.py @@ -605,12 +605,6 @@ def test_to_sql_series(self): s2 = sql.read_sql_query("SELECT * FROM test_series", self.conn) tm.assert_frame_equal(s.to_frame(), s2) - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") - def test_to_sql_panel(self): - panel = tm.makePanel() - pytest.raises(NotImplementedError, sql.to_sql, panel, - 'test_panel', self.conn) - def test_roundtrip(self): sql.to_sql(self.test_frame1, 'test_frame_roundtrip', con=self.conn) diff --git a/pandas/tests/plotting/test_boxplot_method.py b/pandas/tests/plotting/test_boxplot_method.py index 7d721c7de3398..e6b9795aebe7c 100644 --- a/pandas/tests/plotting/test_boxplot_method.py +++ b/pandas/tests/plotting/test_boxplot_method.py @@ -267,13 +267,20 @@ def test_grouped_box_return_type(self): def test_grouped_box_layout(self): df = self.hist_df - pytest.raises(ValueError, df.boxplot, column=['weight', 'height'], - by=df.gender, layout=(1, 1)) - pytest.raises(ValueError, df.boxplot, - column=['height', 'weight', 'category'], - layout=(2, 1), return_type='dict') - pytest.raises(ValueError, df.boxplot, column=['weight', 'height'], - by=df.gender, layout=(-1, -1)) + msg = "Layout of 1x1 must be larger than required size 2" + with pytest.raises(ValueError, match=msg): + df.boxplot(column=['weight', 'height'], by=df.gender, + layout=(1, 1)) + + msg = "The 'layout' keyword is not supported when 'by' is None" + with pytest.raises(ValueError, match=msg): + df.boxplot(column=['height', 'weight', 'category'], + layout=(2, 1), return_type='dict') + + msg = "At least one dimension of layout must be positive" + with pytest.raises(ValueError, match=msg): + df.boxplot(column=['weight', 'height'], by=df.gender, + layout=(-1, -1)) # _check_plot_works adds an ax so catch warning. 
see GH #13188 with tm.assert_produces_warning(UserWarning): diff --git a/pandas/tests/plotting/test_datetimelike.py b/pandas/tests/plotting/test_datetimelike.py index ad79cc97f8b77..6702ad6cfb761 100644 --- a/pandas/tests/plotting/test_datetimelike.py +++ b/pandas/tests/plotting/test_datetimelike.py @@ -97,7 +97,9 @@ def test_nonnumeric_exclude(self): assert len(ax.get_lines()) == 1 # B was plotted self.plt.close(fig) - pytest.raises(TypeError, df['A'].plot) + msg = "Empty 'DataFrame': no numeric data to plot" + with pytest.raises(TypeError, match=msg): + df['A'].plot() def test_tsplot_deprecated(self): from pandas.tseries.plotting import tsplot @@ -140,10 +142,15 @@ def f(*args, **kwds): def test_both_style_and_color(self): ts = tm.makeTimeSeries() - pytest.raises(ValueError, ts.plot, style='b-', color='#000099') + msg = ("Cannot pass 'style' string with a color symbol and 'color' " + "keyword argument. Please use one or the other or pass 'style'" + " without a color symbol") + with pytest.raises(ValueError, match=msg): + ts.plot(style='b-', color='#000099') s = ts.reset_index(drop=True) - pytest.raises(ValueError, s.plot, style='b-', color='#000099') + with pytest.raises(ValueError, match=msg): + s.plot(style='b-', color='#000099') @pytest.mark.slow def test_high_freq(self): diff --git a/pandas/tests/plotting/test_hist_method.py b/pandas/tests/plotting/test_hist_method.py index 7bdbdac54f7a6..4f0bef52b5e15 100644 --- a/pandas/tests/plotting/test_hist_method.py +++ b/pandas/tests/plotting/test_hist_method.py @@ -332,12 +332,17 @@ def test_grouped_hist_legacy2(self): @pytest.mark.slow def test_grouped_hist_layout(self): df = self.hist_df - pytest.raises(ValueError, df.hist, column='weight', by=df.gender, - layout=(1, 1)) - pytest.raises(ValueError, df.hist, column='height', by=df.category, - layout=(1, 3)) - pytest.raises(ValueError, df.hist, column='height', by=df.category, - layout=(-1, -1)) + msg = "Layout of 1x1 must be larger than required size 2" + with pytest.raises(ValueError, match=msg): + df.hist(column='weight', by=df.gender, layout=(1, 1)) + + msg = "Layout of 1x3 must be larger than required size 4" + with pytest.raises(ValueError, match=msg): + df.hist(column='height', by=df.category, layout=(1, 3)) + + msg = "At least one dimension of layout must be positive" + with pytest.raises(ValueError, match=msg): + df.hist(column='height', by=df.category, layout=(-1, -1)) with tm.assert_produces_warning(UserWarning): axes = _check_plot_works(df.hist, column='height', by=df.gender, diff --git a/pandas/tests/plotting/test_misc.py b/pandas/tests/plotting/test_misc.py index 44b95f7d1b00b..98248586f3d27 100644 --- a/pandas/tests/plotting/test_misc.py +++ b/pandas/tests/plotting/test_misc.py @@ -278,14 +278,20 @@ def test_subplot_titles(self, iris): assert [p.get_title() for p in plot] == title # Case len(title) > len(df) - pytest.raises(ValueError, df.plot, subplots=True, - title=title + ["kittens > puppies"]) + msg = ("The length of `title` must equal the number of columns if" + " using `title` of type `list` and `subplots=True`") + with pytest.raises(ValueError, match=msg): + df.plot(subplots=True, title=title + ["kittens > puppies"]) # Case len(title) < len(df) - pytest.raises(ValueError, df.plot, subplots=True, title=title[:2]) + with pytest.raises(ValueError, match=msg): + df.plot(subplots=True, title=title[:2]) # Case subplots=False and title is of type list - pytest.raises(ValueError, df.plot, subplots=False, title=title) + msg = ("Using `title` of type `list` is not supported 
unless" + " `subplots=True` is passed") + with pytest.raises(ValueError, match=msg): + df.plot(subplots=False, title=title) # Case df with 3 numeric columns but layout of (2,2) plot = df.drop('SepalWidth', axis=1).plot(subplots=True, layout=(2, 2), diff --git a/pandas/tests/reductions/test_reductions.py b/pandas/tests/reductions/test_reductions.py index 173f719edd465..fbf7f610688ba 100644 --- a/pandas/tests/reductions/test_reductions.py +++ b/pandas/tests/reductions/test_reductions.py @@ -276,7 +276,9 @@ def test_timedelta_ops(self): # invalid ops for op in ['skew', 'kurt', 'sem', 'prod']: - pytest.raises(TypeError, getattr(td, op)) + msg = "reduction operation '{}' not allowed for this dtype" + with pytest.raises(TypeError, match=msg.format(op)): + getattr(td, op)() # GH#10040 # make sure NaT is properly handled by median() @@ -960,6 +962,27 @@ def test_min_max(self): assert np.isnan(_min) assert _max == 1 + def test_min_max_numeric_only(self): + # TODO deprecate numeric_only argument for Categorical and use + # skipna as well, see GH25303 + cat = Series(Categorical( + ["a", "b", np.nan, "a"], categories=['b', 'a'], ordered=True)) + + _min = cat.min() + _max = cat.max() + assert np.isnan(_min) + assert _max == "a" + + _min = cat.min(numeric_only=True) + _max = cat.max(numeric_only=True) + assert _min == "b" + assert _max == "a" + + _min = cat.min(numeric_only=False) + _max = cat.max(numeric_only=False) + assert np.isnan(_min) + assert _max == "a" + class TestSeriesMode(object): # Note: the name TestSeriesMode indicates these tests diff --git a/pandas/tests/resample/test_base.py b/pandas/tests/resample/test_base.py index 911cd990ab881..8f912ea5c524a 100644 --- a/pandas/tests/resample/test_base.py +++ b/pandas/tests/resample/test_base.py @@ -28,15 +28,10 @@ period_range, 'pi', datetime(2005, 1, 1), datetime(2005, 1, 10)) TIMEDELTA_RANGE = (timedelta_range, 'tdi', '1 day', '10 day') -ALL_TIMESERIES_INDEXES = [DATE_RANGE, PERIOD_RANGE, TIMEDELTA_RANGE] - - -def pytest_generate_tests(metafunc): - # called once per each test function - if metafunc.function.__name__.endswith('_all_ts'): - metafunc.parametrize( - '_index_factory,_series_name,_index_start,_index_end', - ALL_TIMESERIES_INDEXES) +all_ts = pytest.mark.parametrize( + '_index_factory,_series_name,_index_start,_index_end', + [DATE_RANGE, PERIOD_RANGE, TIMEDELTA_RANGE] +) @pytest.fixture @@ -84,7 +79,8 @@ def test_asfreq_fill_value(series, create_index): assert_frame_equal(result, expected) -def test_resample_interpolate_all_ts(frame): +@all_ts +def test_resample_interpolate(frame): # # 12925 df = frame assert_frame_equal( @@ -95,11 +91,15 @@ def test_resample_interpolate_all_ts(frame): def test_raises_on_non_datetimelike_index(): # this is a non datetimelike index xp = DataFrame() - pytest.raises(TypeError, lambda: xp.resample('A').mean()) + msg = ("Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex," + " but got an instance of 'Index'") + with pytest.raises(TypeError, match=msg): + xp.resample('A').mean() +@all_ts @pytest.mark.parametrize('freq', ['M', 'D', 'H']) -def test_resample_empty_series_all_ts(freq, empty_series, resample_method): +def test_resample_empty_series(freq, empty_series, resample_method): # GH12771 & GH12868 if resample_method == 'ohlc': @@ -118,8 +118,9 @@ def test_resample_empty_series_all_ts(freq, empty_series, resample_method): assert_series_equal(result, expected, check_dtype=False) +@all_ts @pytest.mark.parametrize('freq', ['M', 'D', 'H']) -def test_resample_empty_dataframe_all_ts(empty_frame, freq, 
resample_method): +def test_resample_empty_dataframe(empty_frame, freq, resample_method): # GH13212 df = empty_frame # count retains dimensions too @@ -159,7 +160,8 @@ def test_resample_empty_dtypes(index, dtype, resample_method): pass -def test_resample_loffset_arg_type_all_ts(frame, create_index): +@all_ts +def test_resample_loffset_arg_type(frame, create_index): # GH 13218, 15002 df = frame expected_means = [df.values[i:i + 2].mean() @@ -189,15 +191,18 @@ def test_resample_loffset_arg_type_all_ts(frame, create_index): # GH 13022, 7687 - TODO: fix resample w/ TimedeltaIndex if isinstance(expected.index, TimedeltaIndex): - with pytest.raises(AssertionError): + msg = "DataFrame are different" + with pytest.raises(AssertionError, match=msg): assert_frame_equal(result_agg, expected) + with pytest.raises(AssertionError, match=msg): assert_frame_equal(result_how, expected) else: assert_frame_equal(result_agg, expected) assert_frame_equal(result_how, expected) -def test_apply_to_empty_series_all_ts(empty_series): +@all_ts +def test_apply_to_empty_series(empty_series): # GH 14313 s = empty_series for freq in ['M', 'D', 'H']: @@ -207,7 +212,8 @@ def test_apply_to_empty_series_all_ts(empty_series): assert_series_equal(result, expected, check_dtype=False) -def test_resampler_is_iterable_all_ts(series): +@all_ts +def test_resampler_is_iterable(series): # GH 15314 freq = 'H' tg = TimeGrouper(freq, convention='start') @@ -218,7 +224,8 @@ def test_resampler_is_iterable_all_ts(series): assert_series_equal(rv, gv) -def test_resample_quantile_all_ts(series): +@all_ts +def test_resample_quantile(series): # GH 15023 s = series q = 0.75 diff --git a/pandas/tests/resample/test_datetime_index.py b/pandas/tests/resample/test_datetime_index.py index 73995cbe79ecd..ce675893d9907 100644 --- a/pandas/tests/resample/test_datetime_index.py +++ b/pandas/tests/resample/test_datetime_index.py @@ -1,6 +1,5 @@ from datetime import datetime, timedelta from functools import partial -from warnings import catch_warnings, simplefilter import numpy as np import pytest @@ -10,7 +9,7 @@ from pandas.errors import UnsupportedFunctionCall import pandas as pd -from pandas import DataFrame, Panel, Series, Timedelta, Timestamp, isna, notna +from pandas import DataFrame, Series, Timedelta, Timestamp, isna, notna from pandas.core.indexes.datetimes import date_range from pandas.core.indexes.period import Period, period_range from pandas.core.resample import ( @@ -113,16 +112,18 @@ def test_resample_basic_grouper(series): @pytest.mark.parametrize( '_index_start,_index_end,_index_name', [('1/1/2000 00:00:00', '1/1/2000 00:13:00', 'index')]) -@pytest.mark.parametrize('kwargs', [ - dict(label='righttt'), - dict(closed='righttt'), - dict(convention='starttt') +@pytest.mark.parametrize('keyword,value', [ + ('label', 'righttt'), + ('closed', 'righttt'), + ('convention', 'starttt') ]) -def test_resample_string_kwargs(series, kwargs): +def test_resample_string_kwargs(series, keyword, value): # see gh-19303 # Check that wrong keyword argument strings raise an error - with pytest.raises(ValueError, match='Unsupported value'): - series.resample('5min', **kwargs) + msg = "Unsupported value {value} for `{keyword}`".format( + value=value, keyword=keyword) + with pytest.raises(ValueError, match=msg): + series.resample('5min', **({keyword: value})) @pytest.mark.parametrize( @@ -676,7 +677,7 @@ def test_asfreq_non_unique(): ts = Series(np.random.randn(len(rng2)), index=rng2) msg = 'cannot reindex from a duplicate axis' - with pytest.raises(Exception, 
match=msg): + with pytest.raises(ValueError, match=msg): ts.asfreq('B') @@ -690,56 +691,6 @@ def test_resample_axis1(): tm.assert_frame_equal(result, expected) -def test_resample_panel(): - rng = date_range('1/1/2000', '6/30/2000') - n = len(rng) - - with catch_warnings(record=True): - simplefilter("ignore", FutureWarning) - panel = Panel(np.random.randn(3, n, 5), - items=['one', 'two', 'three'], - major_axis=rng, - minor_axis=['a', 'b', 'c', 'd', 'e']) - - result = panel.resample('M', axis=1).mean() - - def p_apply(panel, f): - result = {} - for item in panel.items: - result[item] = f(panel[item]) - return Panel(result, items=panel.items) - - expected = p_apply(panel, lambda x: x.resample('M').mean()) - tm.assert_panel_equal(result, expected) - - panel2 = panel.swapaxes(1, 2) - result = panel2.resample('M', axis=2).mean() - expected = p_apply(panel2, - lambda x: x.resample('M', axis=1).mean()) - tm.assert_panel_equal(result, expected) - - -@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") -def test_resample_panel_numpy(): - rng = date_range('1/1/2000', '6/30/2000') - n = len(rng) - - with catch_warnings(record=True): - panel = Panel(np.random.randn(3, n, 5), - items=['one', 'two', 'three'], - major_axis=rng, - minor_axis=['a', 'b', 'c', 'd', 'e']) - - result = panel.resample('M', axis=1).apply(lambda x: x.mean(1)) - expected = panel.resample('M', axis=1).mean() - tm.assert_panel_equal(result, expected) - - panel = panel.swapaxes(1, 2) - result = panel.resample('M', axis=2).apply(lambda x: x.mean(2)) - expected = panel.resample('M', axis=2).mean() - tm.assert_panel_equal(result, expected) - - def test_resample_anchored_ticks(): # If a fixed delta (5 minute, 4 hour) evenly divides a day, we should # "anchor" the origin at midnight so we get regular intervals rather @@ -1184,6 +1135,15 @@ def test_resample_nunique(): assert_series_equal(result, expected) +def test_resample_nunique_preserves_column_level_names(): + # see gh-23222 + df = tm.makeTimeDataFrame(freq="1D").abs() + df.columns = pd.MultiIndex.from_arrays([df.columns.tolist()] * 2, + names=["lev0", "lev1"]) + result = df.resample("1h").nunique() + tm.assert_index_equal(df.columns, result.columns) + + def test_resample_nunique_with_date_gap(): # GH 13453 index = pd.date_range('1-1-2000', '2-15-2000', freq='h') @@ -1209,9 +1169,13 @@ def test_resample_nunique_with_date_gap(): @pytest.mark.parametrize('k', [10, 100, 1000]) def test_resample_group_info(n, k): # GH10914 + + # use a fixed seed to always have the same uniques + prng = np.random.RandomState(1234) + dr = date_range(start='2015-08-27', periods=n // 10, freq='T') - ts = Series(np.random.randint(0, n // k, n).astype('int64'), - index=np.random.choice(dr, n)) + ts = Series(prng.randint(0, n // k, n).astype('int64'), + index=prng.choice(dr, n)) left = ts.resample('30T').nunique() ix = date_range(start=ts.index.min(), end=ts.index.max(), @@ -1276,6 +1240,21 @@ def test_resample_across_dst(): assert_frame_equal(result, expected) +def test_groupby_with_dst_time_change(): + # GH 24972 + index = pd.DatetimeIndex([1478064900001000000, 1480037118776792000], + tz='UTC').tz_convert('America/Chicago') + + df = pd.DataFrame([1, 2], index=index) + result = df.groupby(pd.Grouper(freq='1d')).last() + expected_index_values = pd.date_range('2016-11-02', '2016-11-24', + freq='d', tz='America/Chicago') + + index = pd.DatetimeIndex(expected_index_values) + expected = pd.DataFrame([1.0] + ([np.nan] * 21) + [2.0], index=index) + assert_frame_equal(result, expected) + + def 
test_resample_dst_anchor():
     # 5172
     dti = DatetimeIndex([datetime(2012, 11, 4, 23)], tz='US/Eastern')
diff --git a/pandas/tests/resample/test_period_index.py b/pandas/tests/resample/test_period_index.py
index c2fbb5bbb088c..8abdf9034527b 100644
--- a/pandas/tests/resample/test_period_index.py
+++ b/pandas/tests/resample/test_period_index.py
@@ -11,6 +11,7 @@
 import pandas as pd
 from pandas import DataFrame, Series, Timestamp
+from pandas.core.indexes.base import InvalidIndexError
 from pandas.core.indexes.datetimes import date_range
 from pandas.core.indexes.period import Period, PeriodIndex, period_range
 from pandas.core.resample import _get_period_range_edges
@@ -72,17 +73,19 @@ def test_asfreq_fill_value(self, series):

     @pytest.mark.parametrize('freq', ['H', '12H', '2D', 'W'])
     @pytest.mark.parametrize('kind', [None, 'period', 'timestamp'])
-    def test_selection(self, index, freq, kind):
+    @pytest.mark.parametrize('kwargs', [dict(on='date'), dict(level='d')])
+    def test_selection(self, index, freq, kind, kwargs):
         # This is a bug, these should be implemented
         # GH 14008
         rng = np.arange(len(index), dtype=np.int64)
         df = DataFrame({'date': index, 'a': rng},
                        index=pd.MultiIndex.from_arrays([rng, index],
                                                        names=['v', 'd']))
-        with pytest.raises(NotImplementedError):
-            df.resample(freq, on='date', kind=kind)
-        with pytest.raises(NotImplementedError):
-            df.resample(freq, level='d', kind=kind)
+        msg = ("Resampling from level= or on= selection with a PeriodIndex is"
+               r" not currently supported, use \.set_index\(\.\.\.\) to"
+               " explicitly set index")
+        with pytest.raises(NotImplementedError, match=msg):
+            df.resample(freq, kind=kind, **kwargs)

     @pytest.mark.parametrize('month', MONTHS)
     @pytest.mark.parametrize('meth', ['ffill', 'bfill'])
@@ -110,13 +113,20 @@ def test_basic_downsample(self, simple_period_range_series):
         assert_series_equal(ts.resample('a-dec').mean(), result)
         assert_series_equal(ts.resample('a').mean(), result)

-    def test_not_subperiod(self, simple_period_range_series):
+    @pytest.mark.parametrize('rule,expected_error_msg', [
+        ('a-dec', '<YearEnd: month=12>'),
+        ('q-mar', '<QuarterEnd: startingMonth=3>'),
+        ('M', '<MonthEnd>'),
+        ('w-thu', '<Week: weekday=3>')
+    ])
+    def test_not_subperiod(
+            self, simple_period_range_series, rule, expected_error_msg):
         # These are incompatible period rules for resampling
         ts = simple_period_range_series('1/1/1990', '6/30/1995', freq='w-wed')
-        pytest.raises(ValueError, lambda: ts.resample('a-dec').mean())
-        pytest.raises(ValueError, lambda: ts.resample('q-mar').mean())
-        pytest.raises(ValueError, lambda: ts.resample('M').mean())
-        pytest.raises(ValueError, lambda: ts.resample('w-thu').mean())
+        msg = ("Frequency <Week: weekday=2> cannot be resampled to {}, as they"
+               " are not sub or super periods").format(expected_error_msg)
+        with pytest.raises(IncompatibleFrequency, match=msg):
+            ts.resample(rule).mean()

     @pytest.mark.parametrize('freq', ['D', '2D'])
     def test_basic_upsample(self, freq, simple_period_range_series):
@@ -212,8 +222,9 @@ def test_resample_same_freq(self, resample_method):
         assert_series_equal(result, expected)

     def test_resample_incompat_freq(self):
-
-        with pytest.raises(IncompatibleFrequency):
+        msg = ("Frequency <MonthEnd> cannot be resampled to <Week: weekday=6>,"
+               " as they are not sub or super periods")
+        with pytest.raises(IncompatibleFrequency, match=msg):
             Series(range(3), index=pd.period_range(
                 start='2000', periods=3, freq='M')).resample('W').mean()
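The rewritten period tests pin both the exception type and its message. A runnable sketch of the pattern (the `IncompatibleFrequency` import path follows the 0.24-era internals the test module itself uses; the match string is a regex-safe fragment of the full message):

    import pandas as pd
    import pytest
    from pandas._libs.tslibs.period import IncompatibleFrequency

    s = pd.Series(range(3), index=pd.period_range('2000', periods=3, freq='M'))
    with pytest.raises(IncompatibleFrequency, match='not sub or super periods'):
        s.resample('W').mean()  # monthly periods cannot be resampled to weekly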
@@ -373,7 +384,9 @@ def test_resample_fill_missing(self):
     def test_cant_fill_missing_dups(self):
         rng = PeriodIndex([2000, 2005, 2005, 2007, 2007], freq='A')
         s = Series(np.random.randn(5), index=rng)
-        pytest.raises(Exception, lambda: s.resample('A').ffill())
+        msg = "Reindexing only valid with uniquely valued Index objects"
+        with pytest.raises(InvalidIndexError, match=msg):
+            s.resample('A').ffill()

     @pytest.mark.parametrize('freq', ['5min'])
     @pytest.mark.parametrize('kind', ['period', None, 'timestamp'])
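The `test_resample_api` changes that follow escape regex metacharacters in expected messages by hand (raw strings like r"\(bfill\)"); `re.escape` does the same mechanically. A small runnable sketch of why the escaping matters for `pytest.raises(match=...)`:

    import re

    import pytest

    msg = ("Invalid fill method. Expecting pad (ffill), backfill (bfill)"
           " or nearest. Got 0")
    # match= is a regex search; escape the parentheses and dots, or the
    # pattern will not match the literal message
    with pytest.raises(ValueError, match=re.escape(msg)):
        raise ValueError(msg)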
diff --git a/pandas/tests/resample/test_resample_api.py b/pandas/tests/resample/test_resample_api.py
index 69684daf05f3d..97f1e07380ef9 100644
--- a/pandas/tests/resample/test_resample_api.py
+++ b/pandas/tests/resample/test_resample_api.py
@@ -1,11 +1,12 @@
 # pylint: disable=E1101

+from collections import OrderedDict
 from datetime import datetime

 import numpy as np
 import pytest

-from pandas.compat import OrderedDict, range
+from pandas.compat import range

 import pandas as pd
 from pandas import DataFrame, Series
@@ -113,16 +114,14 @@ def test_getitem():
                           test_frame.columns[[0, 1]])


-def test_select_bad_cols():
-
+@pytest.mark.parametrize('key', [['D'], ['A', 'D']])
+def test_select_bad_cols(key):
     g = test_frame.resample('H')
-    pytest.raises(KeyError, g.__getitem__, ['D'])
-
-    pytest.raises(KeyError, g.__getitem__, ['A', 'D'])
-    with pytest.raises(KeyError, match='^[^A]+$'):
-        # A should not be referenced as a bad column...
-        # will have to rethink regex if you change message!
-        g[['A', 'D']]
+    # 'A' should not be referenced as a bad column...
+    # will have to rethink regex if you change message!
+    msg = r"^\"Columns not found: 'D'\"$"
+    with pytest.raises(KeyError, match=msg):
+        g[key]


 def test_attribute_access():
@@ -216,7 +215,9 @@ def test_fillna():
     result = r.fillna(method='bfill')
     assert_series_equal(result, expected)

-    with pytest.raises(ValueError):
+    msg = (r"Invalid fill method\. Expecting pad \(ffill\), backfill"
+           r" \(bfill\) or nearest\. Got 0")
+    with pytest.raises(ValueError, match=msg):
         r.fillna(0)


@@ -437,12 +438,11 @@ def test_agg_misc():

     # errors
     # invalid names in the agg specification
+    msg = "\"Column 'B' does not exist!\""
     for t in cases:
-        with pytest.raises(KeyError):
-            with tm.assert_produces_warning(FutureWarning,
-                                            check_stacklevel=False):
-                t[['A']].agg({'A': ['sum', 'std'],
-                              'B': ['mean', 'std']})
+        with pytest.raises(KeyError, match=msg):
+            t[['A']].agg({'A': ['sum', 'std'],
+                          'B': ['mean', 'std']})


 def test_agg_nested_dicts():
@@ -464,11 +464,11 @@ def test_agg_nested_dicts():
         df.groupby(pd.Grouper(freq='2D'))
     ]

+    msg = r"cannot perform renaming for r(1|2) with a nested dictionary"
     for t in cases:
-        def f():
+        with pytest.raises(pd.core.base.SpecificationError, match=msg):
             t.aggregate({'r1': {'A': ['mean', 'sum']},
                          'r2': {'B': ['mean', 'sum']}})
-        pytest.raises(ValueError, f)

     for t in cases:
         expected = pd.concat([t['A'].mean(), t['A'].std(), t['B'].mean(),
@@ -499,7 +499,8 @@ def test_try_aggregate_non_existing_column():
     df = DataFrame(data).set_index('dt')

     # Error as we don't have 'z' column
-    with pytest.raises(KeyError):
+    msg = "\"Column 'z' does not exist!\""
+    with pytest.raises(KeyError, match=msg):
         df.resample('30T').agg({'x': ['mean'],
                                 'y': ['median'],
                                 'z': ['sum']})
@@ -517,23 +518,29 @@ def test_selection_api_validation():
     df_exp = DataFrame({'a': rng}, index=index)

     # non DatetimeIndex
-    with pytest.raises(TypeError):
+    msg = ("Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex,"
+           " but got an instance of 'Int64Index'")
+    with pytest.raises(TypeError, match=msg):
         df.resample('2D', level='v')

-    with pytest.raises(ValueError):
+    msg = "The Grouper cannot specify both a key and a level!"
+    with pytest.raises(ValueError, match=msg):
         df.resample('2D', on='date', level='d')

-    with pytest.raises(TypeError):
+    msg = "unhashable type: 'list'"
+    with pytest.raises(TypeError, match=msg):
        df.resample('2D', on=['a', 'date'])

-    with pytest.raises(KeyError):
+    msg = r"\"Level \['a', 'date'\] not found\""
+    with pytest.raises(KeyError, match=msg):
         df.resample('2D', level=['a', 'date'])

     # upsampling not allowed
-    with pytest.raises(ValueError):
+    msg = ("Upsampling from level= or on= selection is not supported, use"
+           r" \.set_index\(\.\.\.\) to explicitly set index to datetime-like")
+    with pytest.raises(ValueError, match=msg):
         df.resample('2D', level='d').asfreq()
-
-    with pytest.raises(ValueError):
+    with pytest.raises(ValueError, match=msg):
         df.resample('2D', on='date').asfreq()

     exp = df_exp.resample('2D').sum()
@@ -542,3 +549,25 @@ def test_selection_api_validation():

     exp.index.name = 'd'
     assert_frame_equal(exp, df.resample('2D', level='d').sum())
+
+
+@pytest.mark.parametrize('col_name', ['t2', 't2x', 't2q', 'T_2M',
+                                      't2p', 't2m', 't2m1', 'T2M'])
+def test_agg_with_datetime_index_list_agg_func(col_name):
+    # GH 22660
+    # The parametrized column names would get converted to dates by our
+    # date parser. Some would result in OutOfBoundsError (ValueError) while
+    # others would result in OverflowError when passed into Timestamp.
+    # We catch these errors and move on to the correct branch.
+    df = pd.DataFrame(list(range(200)),
+                      index=pd.date_range(start='2017-01-01', freq='15min',
+                                          periods=200, tz='Europe/Berlin'),
+                      columns=[col_name])
+    result = df.resample('1d').aggregate(['mean'])
+    expected = pd.DataFrame([47.5, 143.5, 195.5],
+                            index=pd.date_range(start='2017-01-01', freq='D',
+                                                periods=3, tz='Europe/Berlin'),
+                            columns=pd.MultiIndex(levels=[[col_name],
+                                                          ['mean']],
+                                                  codes=[[0], [0]]))
+    assert_frame_equal(result, expected)
diff --git a/pandas/tests/resample/test_time_grouper.py b/pandas/tests/resample/test_time_grouper.py
index ec29b55ac9d67..2f330d1f2484b 100644
--- a/pandas/tests/resample/test_time_grouper.py
+++ b/pandas/tests/resample/test_time_grouper.py
@@ -5,7 +5,7 @@
 import pytest

 import pandas as pd
-from pandas import DataFrame, Panel, Series
+from pandas import DataFrame, Series
 from pandas.core.indexes.datetimes import date_range
 from pandas.core.resample import TimeGrouper
 import pandas.util.testing as tm
@@ -79,27 +79,6 @@ def f(df):
     tm.assert_index_equal(result.index, df.index)


-@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning")
-def test_panel_aggregation():
-    ind = pd.date_range('1/1/2000', periods=100)
-    data = np.random.randn(2, len(ind), 4)
-
-    wp = Panel(data, items=['Item1', 'Item2'], major_axis=ind,
-               minor_axis=['A', 'B', 'C', 'D'])
-
-    tg = TimeGrouper('M', axis=1)
-    _, grouper, _ = tg._get_grouper(wp)
-    bingrouped = wp.groupby(grouper)
-    binagg = bingrouped.mean()
-
-    def f(x):
-        assert (isinstance(x, Panel))
-        return x.mean(1)
-
-    result = bingrouped.agg(f)
-    tm.assert_panel_equal(result, binagg)
-
-
 @pytest.mark.parametrize('name, func', [
     ('Int64Index', tm.makeIntIndex),
     ('Index', tm.makeUnicodeIndex),
@@ -112,7 +91,7 @@ def test_fails_on_no_datetime_index(name, func):
     df = DataFrame({'a': np.random.randn(n)}, index=index)

     msg = ("Only valid with DatetimeIndex, TimedeltaIndex "
-           "or PeriodIndex, but got an instance of %r" % name)
+           "or PeriodIndex, but got an instance of '{}'".format(name))
     with pytest.raises(TypeError, match=msg):
         df.groupby(TimeGrouper('D'))
diff --git a/pandas/tests/reshape/merge/test_join.py
b/pandas/tests/reshape/merge/test_join.py index e21f9d0291afa..62c9047b17f3d 100644 --- a/pandas/tests/reshape/merge/test_join.py +++ b/pandas/tests/reshape/merge/test_join.py @@ -1,7 +1,5 @@ # pylint: disable=E1103 -from warnings import catch_warnings - import numpy as np from numpy.random import randn import pytest @@ -657,95 +655,6 @@ def test_join_dups(self): 'y_y', 'x_x', 'y_x', 'x_y', 'y_y'] assert_frame_equal(dta, expected) - def test_panel_join(self): - with catch_warnings(record=True): - panel = tm.makePanel() - tm.add_nans(panel) - - p1 = panel.iloc[:2, :10, :3] - p2 = panel.iloc[2:, 5:, 2:] - - # left join - result = p1.join(p2) - expected = p1.copy() - expected['ItemC'] = p2['ItemC'] - tm.assert_panel_equal(result, expected) - - # right join - result = p1.join(p2, how='right') - expected = p2.copy() - expected['ItemA'] = p1['ItemA'] - expected['ItemB'] = p1['ItemB'] - expected = expected.reindex(items=['ItemA', 'ItemB', 'ItemC']) - tm.assert_panel_equal(result, expected) - - # inner join - result = p1.join(p2, how='inner') - expected = panel.iloc[:, 5:10, 2:3] - tm.assert_panel_equal(result, expected) - - # outer join - result = p1.join(p2, how='outer') - expected = p1.reindex(major=panel.major_axis, - minor=panel.minor_axis) - expected = expected.join(p2.reindex(major=panel.major_axis, - minor=panel.minor_axis)) - tm.assert_panel_equal(result, expected) - - def test_panel_join_overlap(self): - with catch_warnings(record=True): - panel = tm.makePanel() - tm.add_nans(panel) - - p1 = panel.loc[['ItemA', 'ItemB', 'ItemC']] - p2 = panel.loc[['ItemB', 'ItemC']] - - # Expected index is - # - # ItemA, ItemB_p1, ItemC_p1, ItemB_p2, ItemC_p2 - joined = p1.join(p2, lsuffix='_p1', rsuffix='_p2') - p1_suf = p1.loc[['ItemB', 'ItemC']].add_suffix('_p1') - p2_suf = p2.loc[['ItemB', 'ItemC']].add_suffix('_p2') - no_overlap = panel.loc[['ItemA']] - expected = no_overlap.join(p1_suf.join(p2_suf)) - tm.assert_panel_equal(joined, expected) - - def test_panel_join_many(self): - with catch_warnings(record=True): - tm.K = 10 - panel = tm.makePanel() - tm.K = 4 - - panels = [panel.iloc[:2], panel.iloc[2:6], panel.iloc[6:]] - - joined = panels[0].join(panels[1:]) - tm.assert_panel_equal(joined, panel) - - panels = [panel.iloc[:2, :-5], - panel.iloc[2:6, 2:], - panel.iloc[6:, 5:-7]] - - data_dict = {} - for p in panels: - data_dict.update(p.iteritems()) - - joined = panels[0].join(panels[1:], how='inner') - expected = pd.Panel.from_dict(data_dict, intersect=True) - tm.assert_panel_equal(joined, expected) - - joined = panels[0].join(panels[1:], how='outer') - expected = pd.Panel.from_dict(data_dict, intersect=False) - tm.assert_panel_equal(joined, expected) - - # edge cases - msg = "Suffixes not supported when passing multiple panels" - with pytest.raises(ValueError, match=msg): - panels[0].join(panels[1:], how='outer', lsuffix='foo', - rsuffix='bar') - msg = "Right join not supported with multiple panels" - with pytest.raises(ValueError, match=msg): - panels[0].join(panels[1:], how='right') - def test_join_multi_to_multi(self, join_type): # GH 20475 leftindex = MultiIndex.from_product([list('abc'), list('xy'), [1, 2]], @@ -773,6 +682,28 @@ def test_join_multi_to_multi(self, join_type): with pytest.raises(ValueError, match=msg): right.join(left, on=['abc', 'xy'], how=join_type) + def test_join_on_tz_aware_datetimeindex(self): + # GH 23931 + df1 = pd.DataFrame( + { + 'date': pd.date_range(start='2018-01-01', periods=5, + tz='America/Chicago'), + 'vals': list('abcde') + } + ) + + df2 = pd.DataFrame( + { + 
'date': pd.date_range(start='2018-01-03', periods=5, + tz='America/Chicago'), + 'vals_2': list('tuvwx') + } + ) + result = df1.join(df2.set_index('date'), on='date') + expected = df1.copy() + expected['vals_2'] = pd.Series([np.nan] * len(expected), dtype=object) + assert_frame_equal(result, expected) + def _check_join(left, right, result, join_col, how='left', lsuffix='_x', rsuffix='_y'): diff --git a/pandas/tests/reshape/merge/test_merge.py b/pandas/tests/reshape/merge/test_merge.py index c17c301968269..7a97368504fd6 100644 --- a/pandas/tests/reshape/merge/test_merge.py +++ b/pandas/tests/reshape/merge/test_merge.py @@ -39,6 +39,54 @@ def get_test_data(ngroups=NGROUPS, n=N): return arr +def get_series(): + return [ + pd.Series([1], dtype='int64'), + pd.Series([1], dtype='Int64'), + pd.Series([1.23]), + pd.Series(['foo']), + pd.Series([True]), + pd.Series([pd.Timestamp('2018-01-01')]), + pd.Series([pd.Timestamp('2018-01-01', tz='US/Eastern')]), + ] + + +def get_series_na(): + return [ + pd.Series([np.nan], dtype='Int64'), + pd.Series([np.nan], dtype='float'), + pd.Series([np.nan], dtype='object'), + pd.Series([pd.NaT]), + ] + + +@pytest.fixture(params=get_series(), ids=lambda x: x.dtype.name) +def series_of_dtype(request): + """ + A parametrized fixture returning a variety of Series of different + dtypes + """ + return request.param + + +@pytest.fixture(params=get_series(), ids=lambda x: x.dtype.name) +def series_of_dtype2(request): + """ + A duplicate of the series_of_dtype fixture, so that it can be used + twice by a single function + """ + return request.param + + +@pytest.fixture(params=get_series_na(), ids=lambda x: x.dtype.name) +def series_of_dtype_all_na(request): + """ + A parametrized fixture returning a variety of Series with all NA + values + """ + return request.param + + class TestMerge(object): def setup_method(self, method): @@ -428,6 +476,36 @@ def check2(exp, kwarg): check1(exp_in, kwarg) check2(exp_out, kwarg) + def test_merge_empty_frame(self, series_of_dtype, series_of_dtype2): + # GH 25183 + df = pd.DataFrame({'key': series_of_dtype, 'value': series_of_dtype2}, + columns=['key', 'value']) + df_empty = df[:0] + expected = pd.DataFrame({ + 'value_x': pd.Series(dtype=df.dtypes['value']), + 'key': pd.Series(dtype=df.dtypes['key']), + 'value_y': pd.Series(dtype=df.dtypes['value']), + }, columns=['value_x', 'key', 'value_y']) + actual = df_empty.merge(df, on='key') + assert_frame_equal(actual, expected) + + def test_merge_all_na_column(self, series_of_dtype, + series_of_dtype_all_na): + # GH 25183 + df_left = pd.DataFrame( + {'key': series_of_dtype, 'value': series_of_dtype_all_na}, + columns=['key', 'value']) + df_right = pd.DataFrame( + {'key': series_of_dtype, 'value': series_of_dtype_all_na}, + columns=['key', 'value']) + expected = pd.DataFrame({ + 'key': series_of_dtype, + 'value_x': series_of_dtype_all_na, + 'value_y': series_of_dtype_all_na, + }, columns=['key', 'value_x', 'value_y']) + actual = df_left.merge(df_right, on='key') + assert_frame_equal(actual, expected) + def test_merge_nosort(self): # #2098, anything to do? 
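The edits in these test modules all follow one pattern: a bare `pytest.raises(Exception, func, *args)` call, which passes as long as *any* exception of that type escapes, is replaced by the context-manager form with a `match` argument, which also pins down *why* the error was raised. A minimal sketch of the two idioms, using a hypothetical `divide` helper rather than any pandas API:

    import pytest

    def divide(a, b):
        # stand-in for whatever callable is under test (hypothetical)
        return a / b

    # old idiom: only the exception type is checked, so an error raised
    # for the wrong reason still passes
    def test_divide_old():
        pytest.raises(ZeroDivisionError, divide, 1, 0)

    # new idiom: pytest applies `match` as a regular expression, via
    # re.search, against str(excinfo.value), so the message is checked too
    def test_divide_new():
        with pytest.raises(ZeroDivisionError, match="division by zero"):
            divide(1, 0)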
@@ -616,6 +694,24 @@ def test_merge_on_datetime64tz(self): assert result['value_x'].dtype == 'datetime64[ns, US/Eastern]' assert result['value_y'].dtype == 'datetime64[ns, US/Eastern]' + def test_merge_on_datetime64tz_empty(self): + # https://github.com/pandas-dev/pandas/issues/25014 + dtz = pd.DatetimeTZDtype(tz='UTC') + right = pd.DataFrame({'date': [pd.Timestamp('2018', tz=dtz.tz)], + 'value': [4.0], + 'date2': [pd.Timestamp('2019', tz=dtz.tz)]}, + columns=['date', 'value', 'date2']) + left = right[:0] + result = left.merge(right, on='date') + expected = pd.DataFrame({ + 'value_x': pd.Series(dtype=float), + 'date2_x': pd.Series(dtype=dtz), + 'date': pd.Series(dtype=dtz), + 'value_y': pd.Series(dtype=float), + 'date2_y': pd.Series(dtype=dtz), + }, columns=['value_x', 'date2_x', 'date', 'value_y', 'date2_y']) + tm.assert_frame_equal(result, expected) + def test_merge_datetime64tz_with_dst_transition(self): # GH 18885 df1 = pd.DataFrame(pd.date_range( @@ -1508,3 +1604,65 @@ def test_merge_series(on, left_on, right_on, left_index, right_index, nm): with pytest.raises(ValueError, match=msg): result = pd.merge(a, b, on=on, left_on=left_on, right_on=right_on, left_index=left_index, right_index=right_index) + + +@pytest.mark.parametrize("col1, col2, kwargs, expected_cols", [ + (0, 0, dict(suffixes=("", "_dup")), ["0", "0_dup"]), + (0, 0, dict(suffixes=(None, "_dup")), [0, "0_dup"]), + (0, 0, dict(suffixes=("_x", "_y")), ["0_x", "0_y"]), + ("a", 0, dict(suffixes=(None, "_y")), ["a", 0]), + (0.0, 0.0, dict(suffixes=("_x", None)), ["0.0_x", 0.0]), + ("b", "b", dict(suffixes=(None, "_y")), ["b", "b_y"]), + ("a", "a", dict(suffixes=("_x", None)), ["a_x", "a"]), + ("a", "b", dict(suffixes=("_x", None)), ["a", "b"]), + ("a", "a", dict(suffixes=[None, "_x"]), ["a", "a_x"]), + (0, 0, dict(suffixes=["_a", None]), ["0_a", 0]), + ("a", "a", dict(), ["a_x", "a_y"]), + (0, 0, dict(), ["0_x", "0_y"]) +]) +def test_merge_suffix(col1, col2, kwargs, expected_cols): + # issue: 24782 + a = pd.DataFrame({col1: [1, 2, 3]}) + b = pd.DataFrame({col2: [4, 5, 6]}) + + expected = pd.DataFrame([[1, 4], [2, 5], [3, 6]], + columns=expected_cols) + + result = a.merge(b, left_index=True, right_index=True, **kwargs) + tm.assert_frame_equal(result, expected) + + result = pd.merge(a, b, left_index=True, right_index=True, **kwargs) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("col1, col2, suffixes", [ + ("a", "a", [None, None]), + ("a", "a", (None, None)), + ("a", "a", ("", None)), + (0, 0, [None, None]), + (0, 0, (None, "")) +]) +def test_merge_suffix_error(col1, col2, suffixes): + # issue: 24782 + a = pd.DataFrame({col1: [1, 2, 3]}) + b = pd.DataFrame({col2: [3, 4, 5]}) + + # TODO: might reconsider current raise behaviour, see issue 24782 + msg = "columns overlap but no suffix specified" + with pytest.raises(ValueError, match=msg): + pd.merge(a, b, left_index=True, right_index=True, suffixes=suffixes) + + +@pytest.mark.parametrize("col1, col2, suffixes", [ + ("a", "a", None), + (0, 0, None) +]) +def test_merge_suffix_none_error(col1, col2, suffixes): + # issue: 24782 + a = pd.DataFrame({col1: [1, 2, 3]}) + b = pd.DataFrame({col2: [3, 4, 5]}) + + # TODO: might reconsider current raise behaviour, see GH24782 + msg = "iterable" + with pytest.raises(TypeError, match=msg): + pd.merge(a, b, left_index=True, right_index=True, suffixes=suffixes) diff --git a/pandas/tests/reshape/test_concat.py b/pandas/tests/reshape/test_concat.py index ec6123bae327e..a186d32ed8800 100644 --- 
a/pandas/tests/reshape/test_concat.py +++ b/pandas/tests/reshape/test_concat.py @@ -3,7 +3,7 @@ from datetime import datetime from decimal import Decimal from itertools import combinations -from warnings import catch_warnings, simplefilter +from warnings import catch_warnings import dateutil import numpy as np @@ -1499,15 +1499,6 @@ def test_concat_mixed_objs(self): result = concat([s1, df, s2], ignore_index=True) assert_frame_equal(result, expected) - # invalid concatente of mixed dims - with catch_warnings(record=True): - simplefilter("ignore", FutureWarning) - panel = tm.makePanel() - msg = ("cannot concatenate unaligned mixed dimensional NDFrame" - " objects") - with pytest.raises(ValueError, match=msg): - concat([panel, s1], axis=1) - def test_empty_dtype_coerce(self): # xref to #12411 @@ -1543,34 +1534,6 @@ def test_dtype_coerceion(self): result = concat([df.iloc[[0]], df.iloc[[1]]]) tm.assert_series_equal(result.dtypes, df.dtypes) - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") - def test_panel_concat_other_axes(self): - panel = tm.makePanel() - - p1 = panel.iloc[:, :5, :] - p2 = panel.iloc[:, 5:, :] - - result = concat([p1, p2], axis=1) - tm.assert_panel_equal(result, panel) - - p1 = panel.iloc[:, :, :2] - p2 = panel.iloc[:, :, 2:] - - result = concat([p1, p2], axis=2) - tm.assert_panel_equal(result, panel) - - # if things are a bit misbehaved - p1 = panel.iloc[:2, :, :2] - p2 = panel.iloc[:, :, 2:] - p1['ItemC'] = 'baz' - - result = concat([p1, p2], axis=2) - - expected = panel.copy() - expected['ItemC'] = expected['ItemC'].astype('O') - expected.loc['ItemC', :, :2] = 'baz' - tm.assert_panel_equal(result, expected) - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") # Panel.rename warning we don't care about @pytest.mark.filterwarnings("ignore:Using:FutureWarning") diff --git a/pandas/tests/reshape/test_reshape.py b/pandas/tests/reshape/test_reshape.py index 7b544b7981c1f..a5b6cffd1d86c 100644 --- a/pandas/tests/reshape/test_reshape.py +++ b/pandas/tests/reshape/test_reshape.py @@ -580,23 +580,28 @@ def test_get_dummies_duplicate_columns(self, df): class TestCategoricalReshape(object): - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") - def test_reshaping_panel_categorical(self): + def test_reshaping_multi_index_categorical(self): - p = tm.makePanel() - p['str'] = 'foo' - df = p.to_frame() + # construct a MultiIndexed DataFrame formerly created + # via `tm.makePanel().to_frame()` + cols = ['ItemA', 'ItemB', 'ItemC'] + data = {c: tm.makeTimeDataFrame() for c in cols} + df = pd.concat({c: data[c].stack() for c in data}, axis='columns') + df.index.names = ['major', 'minor'] + df['str'] = 'foo' + + dti = df.index.levels[0] df['category'] = df['str'].astype('category') result = df['category'].unstack() - c = Categorical(['foo'] * len(p.major_axis)) + c = Categorical(['foo'] * len(dti)) expected = DataFrame({'A': c.copy(), 'B': c.copy(), 'C': c.copy(), 'D': c.copy()}, columns=Index(list('ABCD'), name='minor'), - index=p.major_axis.set_names('major')) + index=dti) tm.assert_frame_equal(result, expected) diff --git a/pandas/tests/scalar/period/test_period.py b/pandas/tests/scalar/period/test_period.py index d0f87618ad3af..8ca19745055a3 100644 --- a/pandas/tests/scalar/period/test_period.py +++ b/pandas/tests/scalar/period/test_period.py @@ -8,6 +8,7 @@ from pandas._libs.tslibs.ccalendar import DAYS, MONTHS from pandas._libs.tslibs.frequencies import INVALID_FREQ_ERR_MSG from pandas._libs.tslibs.parsing import DateParseError +from 
pandas._libs.tslibs.period import IncompatibleFrequency from pandas._libs.tslibs.timezones import dateutil_gettz, maybe_get_tz from pandas.compat import iteritems, text_type from pandas.compat.numpy import np_datetime64_compat @@ -35,7 +36,9 @@ def test_construction(self): i4 = Period('2005', freq='M') i5 = Period('2005', freq='m') - pytest.raises(ValueError, i1.__ne__, i4) + msg = r"Input has different freq=M from Period\(freq=A-DEC\)" + with pytest.raises(IncompatibleFrequency, match=msg): + i1 != i4 assert i4 == i5 i1 = Period.now('Q') @@ -74,9 +77,12 @@ def test_construction(self): freq='U') assert i1 == expected - pytest.raises(ValueError, Period, ordinal=200701) + msg = "Must supply freq for ordinal value" + with pytest.raises(ValueError, match=msg): + Period(ordinal=200701) - pytest.raises(ValueError, Period, '2007-1-1', freq='X') + with pytest.raises(ValueError, match="Invalid frequency: X"): + Period('2007-1-1', freq='X') def test_construction_bday(self): @@ -233,10 +239,6 @@ def test_period_constructor_offsets(self): freq='U') assert i1 == expected - pytest.raises(ValueError, Period, ordinal=200701) - - pytest.raises(ValueError, Period, '2007-1-1', freq='X') - def test_invalid_arguments(self): with pytest.raises(ValueError): Period(datetime.now()) @@ -925,8 +927,9 @@ def test_properties_secondly(self): class TestPeriodField(object): def test_get_period_field_array_raises_on_out_of_range(self): - pytest.raises(ValueError, libperiod.get_period_field_arr, -1, - np.empty(1), 0) + msg = "Buffer dtype mismatch, expected 'int64_t' but got 'double'" + with pytest.raises(ValueError, match=msg): + libperiod.get_period_field_arr(-1, np.empty(1), 0) class TestComparisons(object): diff --git a/pandas/tests/scalar/test_nat.py b/pandas/tests/scalar/test_nat.py index abf95b276cda1..43747ea8621d9 100644 --- a/pandas/tests/scalar/test_nat.py +++ b/pandas/tests/scalar/test_nat.py @@ -9,7 +9,7 @@ from pandas import ( DatetimeIndex, Index, NaT, Period, Series, Timedelta, TimedeltaIndex, - Timestamp) + Timestamp, isna) from pandas.core.arrays import PeriodArray from pandas.util import testing as tm @@ -201,9 +201,10 @@ def _get_overlap_public_nat_methods(klass, as_tuple=False): "fromtimestamp", "isocalendar", "isoformat", "isoweekday", "month_name", "now", "replace", "round", "strftime", "strptime", "time", "timestamp", "timetuple", "timetz", - "to_datetime64", "to_pydatetime", "today", "toordinal", - "tz_convert", "tz_localize", "tzname", "utcfromtimestamp", - "utcnow", "utcoffset", "utctimetuple", "weekday"]), + "to_datetime64", "to_numpy", "to_pydatetime", "today", + "toordinal", "tz_convert", "tz_localize", "tzname", + "utcfromtimestamp", "utcnow", "utcoffset", "utctimetuple", + "weekday"]), (Timedelta, ["total_seconds"]) ]) def test_overlap_public_nat_methods(klass, expected): @@ -339,3 +340,11 @@ def test_nat_arithmetic_td64_vector(op_name, box): def test_nat_pinned_docstrings(): # see gh-17327 assert NaT.ctime.__doc__ == datetime.ctime.__doc__ + + +def test_to_numpy_alias(): + # GH 24653: alias .to_numpy() for scalars + expected = NaT.to_datetime64() + result = NaT.to_numpy() + + assert isna(expected) and isna(result) diff --git a/pandas/tests/scalar/timedelta/test_timedelta.py b/pandas/tests/scalar/timedelta/test_timedelta.py index 9b5fdfb06a9fa..ee2c2e9e1959c 100644 --- a/pandas/tests/scalar/timedelta/test_timedelta.py +++ b/pandas/tests/scalar/timedelta/test_timedelta.py @@ -1,5 +1,6 @@ """ test the scalar Timedelta """ from datetime import timedelta +import re import numpy as np import 
pytest @@ -249,9 +250,13 @@ def check(value): assert rng.microseconds == 0 assert rng.nanoseconds == 0 - pytest.raises(AttributeError, lambda: rng.hours) - pytest.raises(AttributeError, lambda: rng.minutes) - pytest.raises(AttributeError, lambda: rng.milliseconds) + msg = "'Timedelta' object has no attribute '{}'" + with pytest.raises(AttributeError, match=msg.format('hours')): + rng.hours + with pytest.raises(AttributeError, match=msg.format('minutes')): + rng.minutes + with pytest.raises(AttributeError, match=msg.format('milliseconds')): + rng.milliseconds # GH 10050 check(rng.days) @@ -271,9 +276,13 @@ def check(value): assert rng.seconds == 10 * 3600 + 11 * 60 + 12 assert rng.microseconds == 100 * 1000 + 123 assert rng.nanoseconds == 456 - pytest.raises(AttributeError, lambda: rng.hours) - pytest.raises(AttributeError, lambda: rng.minutes) - pytest.raises(AttributeError, lambda: rng.milliseconds) + msg = "'Timedelta' object has no attribute '{}'" + with pytest.raises(AttributeError, match=msg.format('hours')): + rng.hours + with pytest.raises(AttributeError, match=msg.format('minutes')): + rng.minutes + with pytest.raises(AttributeError, match=msg.format('milliseconds')): + rng.milliseconds # components tup = pd.to_timedelta(-1, 'us').components @@ -309,9 +318,15 @@ def test_iso_conversion(self): assert to_timedelta('P0DT0H0M1S') == expected def test_nat_converters(self): - assert to_timedelta('nat', box=False).astype('int64') == iNaT - assert to_timedelta('nan', box=False).astype('int64') == iNaT + result = to_timedelta('nat', box=False) + assert result.dtype.kind == 'm' + assert result.astype('int64') == iNaT + result = to_timedelta('nan', box=False) + assert result.dtype.kind == 'm' + assert result.astype('int64') == iNaT + + @pytest.mark.filterwarnings("ignore:M and Y units are deprecated") @pytest.mark.parametrize('units, np_unit', [(['Y', 'y'], 'Y'), (['M'], 'M'), @@ -371,6 +386,24 @@ def test_unit_parser(self, units, np_unit, wrapper): result = Timedelta('2{}'.format(unit)) assert result == expected + @pytest.mark.skipif(compat.PY2, reason="requires python3.5 or higher") + @pytest.mark.parametrize('unit', ['Y', 'y', 'M']) + def test_unit_m_y_deprecated(self, unit): + with tm.assert_produces_warning(FutureWarning) as w1: + Timedelta(10, unit) + msg = r'.* units are deprecated .*' + assert re.match(msg, str(w1[0].message)) + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False) as w2: + to_timedelta(10, unit) + msg = r'.* units are deprecated .*' + assert re.match(msg, str(w2[0].message)) + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False) as w3: + to_timedelta([1, 2], unit) + msg = r'.* units are deprecated .*' + assert re.match(msg, str(w3[0].message)) + def test_numeric_conversions(self): assert Timedelta(0) == np.timedelta64(0, 'ns') assert Timedelta(10) == np.timedelta64(10, 'ns') @@ -389,6 +422,11 @@ def test_timedelta_conversions(self): assert (Timedelta(timedelta(days=1)) == np.timedelta64(1, 'D').astype('m8[ns]')) + def test_to_numpy_alias(self): + # GH 24653: alias .to_numpy() for scalars + td = Timedelta('10m7s') + assert td.to_timedelta64() == td.to_numpy() + def test_round(self): t1 = Timedelta('1 days 02:34:56.789123456') @@ -419,8 +457,12 @@ def test_round(self): assert r2 == s2 # invalid - for freq in ['Y', 'M', 'foobar']: - pytest.raises(ValueError, lambda: t1.round(freq)) + for freq, msg in [ + ('Y', ' is a non-fixed frequency'), + ('M', ' is a non-fixed frequency'), + ('foobar', 'Invalid frequency: foobar')]: + with 
pytest.raises(ValueError, match=msg): + t1.round(freq) t1 = timedelta_range('1 days', periods=3, freq='1 min 2 s 3 us') t2 = -1 * t1 @@ -465,11 +507,15 @@ def test_round(self): r1 = t1.round(freq) tm.assert_index_equal(r1, s1) r2 = t2.round(freq) - tm.assert_index_equal(r2, s2) + tm.assert_index_equal(r2, s2) # invalid - for freq in ['Y', 'M', 'foobar']: - pytest.raises(ValueError, lambda: t1.round(freq)) + for freq, msg in [ + ('Y', ' is a non-fixed frequency'), + ('M', ' is a non-fixed frequency'), + ('foobar', 'Invalid frequency: foobar')]: + with pytest.raises(ValueError, match=msg): + t1.round(freq) def test_contains(self): # Checking for any NaT-like objects @@ -579,9 +625,12 @@ def test_overflow(self): assert np.allclose(result.value / 1000, expected.value / 1000) # sum - pytest.raises(ValueError, lambda: (s - s.min()).sum()) + msg = "overflow in timedelta operation" + with pytest.raises(ValueError, match=msg): + (s - s.min()).sum() s1 = s[0:10000] - pytest.raises(ValueError, lambda: (s1 - s1.min()).sum()) + with pytest.raises(ValueError, match=msg): + (s1 - s1.min()).sum() s2 = s[0:1000] result = (s2 - s2.min()).sum() diff --git a/pandas/tests/scalar/timestamp/test_timestamp.py b/pandas/tests/scalar/timestamp/test_timestamp.py index b2c05d1564a48..b55d00b44fd67 100644 --- a/pandas/tests/scalar/timestamp/test_timestamp.py +++ b/pandas/tests/scalar/timestamp/test_timestamp.py @@ -60,7 +60,9 @@ def check(value, equal): check(ts.hour, 9) check(ts.minute, 6) check(ts.second, 3) - pytest.raises(AttributeError, lambda: ts.millisecond) + msg = "'Timestamp' object has no attribute 'millisecond'" + with pytest.raises(AttributeError, match=msg): + ts.millisecond check(ts.microsecond, 100) check(ts.nanosecond, 1) check(ts.dayofweek, 6) @@ -78,7 +80,9 @@ def check(value, equal): check(ts.hour, 23) check(ts.minute, 59) check(ts.second, 0) - pytest.raises(AttributeError, lambda: ts.millisecond) + msg = "'Timestamp' object has no attribute 'millisecond'" + with pytest.raises(AttributeError, match=msg): + ts.millisecond check(ts.microsecond, 0) check(ts.nanosecond, 0) check(ts.dayofweek, 2) @@ -355,6 +359,14 @@ def test_constructor_invalid_tz(self): # interpreted as a `freq` Timestamp('2012-01-01', 'US/Pacific') + def test_constructor_strptime(self): + # GH25016 + # Test support for Timestamp.strptime + fmt = '%Y%m%d-%H%M%S-%f%z' + ts = '20190129-235348-000001+0000' + with pytest.raises(NotImplementedError): + Timestamp.strptime(ts, fmt) + def test_constructor_tz_or_tzinfo(self): # GH#17943, GH#17690, GH#5168 stamps = [Timestamp(year=2017, month=10, day=22, tz='UTC'), @@ -780,6 +792,13 @@ def test_hash_equivalent(self): stamp = Timestamp(datetime(2011, 1, 1)) assert d[stamp] == 5 + def test_tz_conversion_freq(self, tz_naive_fixture): + # GH25241 + t1 = Timestamp('2019-01-01 10:00', freq='H') + assert t1.tz_localize(tz=tz_naive_fixture).freq == t1.freq + t2 = Timestamp('2019-01-02 12:00', tz='UTC', freq='T') + assert t2.tz_convert(tz='UTC').freq == t2.freq + class TestTimestampNsOperations(object): @@ -962,3 +981,8 @@ def test_to_period_tz_warning(self): with tm.assert_produces_warning(UserWarning): # warning that timezone info will be lost ts.to_period('D') + + def test_to_numpy_alias(self): + # GH 24653: alias .to_numpy() for scalars + ts = Timestamp(datetime.now()) + assert ts.to_datetime64() == ts.to_numpy() diff --git a/pandas/tests/scalar/timestamp/test_unary_ops.py b/pandas/tests/scalar/timestamp/test_unary_ops.py index 3f9a30d254126..adcf66200a672 100644 --- 
a/pandas/tests/scalar/timestamp/test_unary_ops.py +++ b/pandas/tests/scalar/timestamp/test_unary_ops.py @@ -8,7 +8,7 @@ from pandas._libs.tslibs import conversion from pandas._libs.tslibs.frequencies import INVALID_FREQ_ERR_MSG -from pandas.compat import PY3 +from pandas.compat import PY3, PY36 import pandas.util._test_decorators as td from pandas import NaT, Timestamp @@ -329,6 +329,19 @@ def test_replace_dst_border(self): expected = Timestamp('2013-11-3 03:00:00', tz='America/Chicago') assert result == expected + @pytest.mark.skipif(not PY36, reason='Fold not available until PY3.6') + @pytest.mark.parametrize('fold', [0, 1]) + @pytest.mark.parametrize('tz', ['dateutil/Europe/London', 'Europe/London']) + def test_replace_dst_fold(self, fold, tz): + # GH 25017 + d = datetime(2019, 10, 27, 2, 30) + ts = Timestamp(d, tz=tz) + result = ts.replace(hour=1, fold=fold) + expected = Timestamp(datetime(2019, 10, 27, 1, 30)).tz_localize( + tz, ambiguous=not fold + ) + assert result == expected + # -------------------------------------------------------------- # Timestamp.normalize diff --git a/pandas/tests/series/conftest.py b/pandas/tests/series/conftest.py index 431aacb1c8d56..367e7a1baa7f3 100644 --- a/pandas/tests/series/conftest.py +++ b/pandas/tests/series/conftest.py @@ -1,6 +1,5 @@ import pytest -from pandas import Series import pandas.util.testing as tm @@ -32,11 +31,3 @@ def object_series(): s = tm.makeObjectSeries() s.name = 'objects' return s - - -@pytest.fixture -def empty_series(): - """ - Fixture for empty Series - """ - return Series([], index=[]) diff --git a/pandas/tests/series/indexing/test_indexing.py b/pandas/tests/series/indexing/test_indexing.py index a5855f68127f4..dbe667a166d0a 100644 --- a/pandas/tests/series/indexing/test_indexing.py +++ b/pandas/tests/series/indexing/test_indexing.py @@ -114,7 +114,8 @@ def test_getitem_get(test_data): # missing d = test_data.ts.index[0] - BDay() - with pytest.raises(KeyError, match=r"Timestamp\('1999-12-31 00:00:00'\)"): + msg = r"Timestamp\('1999-12-31 00:00:00', freq='B'\)" + with pytest.raises(KeyError, match=msg): test_data.ts[d] # None diff --git a/pandas/tests/series/test_alter_axes.py b/pandas/tests/series/test_alter_axes.py index 04c54bcf8c22c..73adc7d4bf82f 100644 --- a/pandas/tests/series/test_alter_axes.py +++ b/pandas/tests/series/test_alter_axes.py @@ -258,6 +258,17 @@ def test_rename_axis_inplace(self, datetime_series): assert no_return is None tm.assert_series_equal(result, expected) + @pytest.mark.parametrize('kwargs', [{'mapper': None}, {'index': None}, {}]) + def test_rename_axis_none(self, kwargs): + # GH 25034 + index = Index(list('abc'), name='foo') + df = Series([1, 2, 3], index=index) + + result = df.rename_axis(**kwargs) + expected_index = index.rename(None) if kwargs else index + expected = Series([1, 2, 3], index=expected_index) + tm.assert_series_equal(result, expected) + def test_set_axis_inplace_axes(self, axis_series): # GH14636 ser = Series(np.arange(4), index=[1, 3, 5, 7], dtype='int64') diff --git a/pandas/tests/series/test_analytics.py b/pandas/tests/series/test_analytics.py index 6811e370726b2..1f265d574da15 100644 --- a/pandas/tests/series/test_analytics.py +++ b/pandas/tests/series/test_analytics.py @@ -9,7 +9,7 @@ from numpy import nan import pytest -from pandas.compat import PY35, lrange, range +from pandas.compat import PY2, PY35, is_platform_windows, lrange, range import pandas.util._test_decorators as td import pandas as pd @@ -285,6 +285,17 @@ def test_numpy_round(self): with 
pytest.raises(ValueError, match=msg): np.round(s, decimals=0, out=s) + @pytest.mark.xfail( + PY2 and is_platform_windows(), reason="numpy/numpy#7882", + raises=AssertionError, strict=True) + def test_numpy_round_nan(self): + # See gh-14197 + s = Series([1.53, np.nan, 0.06]) + with tm.assert_produces_warning(None): + result = s.round() + expected = Series([2., np.nan, 0.]) + assert_series_equal(result, expected) + def test_built_in_round(self): if not compat.PY3: pytest.skip( diff --git a/pandas/tests/series/test_apply.py b/pandas/tests/series/test_apply.py index 90cf6916df0d1..162a27db34cb1 100644 --- a/pandas/tests/series/test_apply.py +++ b/pandas/tests/series/test_apply.py @@ -163,6 +163,18 @@ def test_apply_dict_depr(self): with tm.assert_produces_warning(FutureWarning): tsdf.A.agg({'foo': ['sum', 'mean']}) + @pytest.mark.parametrize('series', [ + ['1-1', '1-1', np.NaN], + ['1-1', '1-2', np.NaN]]) + def test_apply_categorical_with_nan_values(self, series): + # GH 20714 bug fixed in: GH 24275 + s = pd.Series(series, dtype='category') + result = s.apply(lambda x: x.split('-')[0]) + result = result.astype(object) + expected = pd.Series(['1', '1', np.NaN], dtype='category') + expected = expected.astype(object) + tm.assert_series_equal(result, expected) + class TestSeriesAggregate(): diff --git a/pandas/tests/series/test_constructors.py b/pandas/tests/series/test_constructors.py index d92ca48751d0a..8525b877618c9 100644 --- a/pandas/tests/series/test_constructors.py +++ b/pandas/tests/series/test_constructors.py @@ -47,7 +47,9 @@ def test_scalar_conversion(self): assert int(Series([1.])) == 1 assert long(Series([1.])) == 1 - def test_constructor(self, datetime_series, empty_series): + def test_constructor(self, datetime_series): + empty_series = Series() + assert datetime_series.index.is_all_dates # Pass in Series diff --git a/pandas/tests/series/test_dtypes.py b/pandas/tests/series/test_dtypes.py index e29974f56967f..d8046c4944afc 100644 --- a/pandas/tests/series/test_dtypes.py +++ b/pandas/tests/series/test_dtypes.py @@ -291,8 +291,8 @@ def test_astype_categorical_to_other(self): expected = s tm.assert_series_equal(s.astype('category'), expected) tm.assert_series_equal(s.astype(CategoricalDtype()), expected) - msg = (r"could not convert string to float: '(0 - 499|9500 - 9999)'|" - r"invalid literal for float\(\): (0 - 499|9500 - 9999)") + msg = (r"could not convert string to float|" + r"invalid literal for float\(\)") with pytest.raises(ValueError, match=msg): s.astype('float64') diff --git a/pandas/tests/series/test_duplicates.py b/pandas/tests/series/test_duplicates.py index fe47975711a17..a975edacc19c7 100644 --- a/pandas/tests/series/test_duplicates.py +++ b/pandas/tests/series/test_duplicates.py @@ -59,12 +59,18 @@ def test_unique_data_ownership(): Series(Series(["a", "c", "b"]).unique()).sort_values() -def test_is_unique(): - # GH11946 - s = Series(np.random.randint(0, 10, size=1000)) - assert s.is_unique is False - s = Series(np.arange(1000)) - assert s.is_unique is True +@pytest.mark.parametrize('data, expected', [ + (np.random.randint(0, 10, size=1000), False), + (np.arange(1000), True), + ([], True), + ([np.nan], True), + (['foo', 'bar', np.nan], True), + (['foo', 'foo', np.nan], False), + (['foo', 'bar', np.nan, np.nan], False)]) +def test_is_unique(data, expected): + # GH11946 / GH25180 + s = Series(data) + assert s.is_unique is expected def test_is_unique_class_ne(capsys): diff --git a/pandas/tests/series/test_missing.py b/pandas/tests/series/test_missing.py index 
985288c439917..f07dd1dfb5fda 100644 --- a/pandas/tests/series/test_missing.py +++ b/pandas/tests/series/test_missing.py @@ -854,8 +854,23 @@ def test_series_pad_backfill_limit(self): assert_series_equal(result, expected) -class TestSeriesInterpolateData(): +@pytest.fixture(params=['linear', 'index', 'values', 'nearest', 'slinear', + 'zero', 'quadratic', 'cubic', 'barycentric', 'krogh', + 'polynomial', 'spline', 'piecewise_polynomial', + 'from_derivatives', 'pchip', 'akima', ]) +def nontemporal_method(request): + """ Fixture that returns an (method name, required kwargs) pair. + + This fixture does not include method 'time' as a parameterization; that + method requires a Series with a DatetimeIndex, and is generally tested + separately from these non-temporal methods. + """ + method = request.param + kwargs = dict(order=1) if method in ('spline', 'polynomial') else dict() + return method, kwargs + +class TestSeriesInterpolateData(): def test_interpolate(self, datetime_series, string_series): ts = Series(np.arange(len(datetime_series), dtype=float), datetime_series.index) @@ -875,12 +890,12 @@ def test_interpolate(self, datetime_series, string_series): time_interp = ord_ts_copy.interpolate(method='time') tm.assert_series_equal(time_interp, ord_ts) - # try time interpolation on a non-TimeSeries - # Only raises ValueError if there are NaNs. - non_ts = string_series.copy() - non_ts[0] = np.NaN - msg = ("time-weighted interpolation only works on Series or DataFrames" - " with a DatetimeIndex") + def test_interpolate_time_raises_for_non_timeseries(self): + # When method='time' is used on a non-TimeSeries that contains a null + # value, a ValueError should be raised. + non_ts = Series([0, 1, 2, np.NaN]) + msg = ("time-weighted interpolation only works on Series.* " + "with a DatetimeIndex") with pytest.raises(ValueError, match=msg): non_ts.interpolate(method='time') @@ -1061,21 +1076,35 @@ def test_interp_limit(self): result = s.interpolate(method='linear', limit=2) assert_series_equal(result, expected) - # GH 9217, make sure limit is an int and greater than 0 - methods = ['linear', 'time', 'index', 'values', 'nearest', 'zero', - 'slinear', 'quadratic', 'cubic', 'barycentric', 'krogh', - 'polynomial', 'spline', 'piecewise_polynomial', None, - 'from_derivatives', 'pchip', 'akima'] - s = pd.Series([1, 2, np.nan, np.nan, 5]) - msg = (r"Limit must be greater than 0|" - "time-weighted interpolation only works on Series or" - r" DataFrames with a DatetimeIndex|" - r"invalid method '(polynomial|spline|None)' to interpolate|" - "Limit must be an integer") - for limit in [-1, 0, 1., 2.]: - for method in methods: - with pytest.raises(ValueError, match=msg): - s.interpolate(limit=limit, method=method) + @pytest.mark.parametrize("limit", [-1, 0]) + def test_interpolate_invalid_nonpositive_limit(self, nontemporal_method, + limit): + # GH 9217: make sure limit is greater than zero. + s = pd.Series([1, 2, np.nan, 4]) + method, kwargs = nontemporal_method + with pytest.raises(ValueError, match="Limit must be greater than 0"): + s.interpolate(limit=limit, method=method, **kwargs) + + def test_interpolate_invalid_float_limit(self, nontemporal_method): + # GH 9217: make sure limit is an integer. 
+ s = pd.Series([1, 2, np.nan, 4]) + method, kwargs = nontemporal_method + limit = 2.0 + with pytest.raises(ValueError, match="Limit must be an integer"): + s.interpolate(limit=limit, method=method, **kwargs) + + @pytest.mark.parametrize("invalid_method", [None, 'nonexistent_method']) + def test_interp_invalid_method(self, invalid_method): + s = Series([1, 3, np.nan, 12, np.nan, 25]) + + msg = "method must be one of.* Got '{}' instead".format(invalid_method) + with pytest.raises(ValueError, match=msg): + s.interpolate(method=invalid_method) + + # When an invalid method and invalid limit (such as -1) are + # provided, the error message reflects the invalid method. + with pytest.raises(ValueError, match=msg): + s.interpolate(method=invalid_method, limit=-1) def test_interp_limit_forward(self): s = Series([1, 3, np.nan, np.nan, np.nan, 11]) @@ -1276,11 +1305,20 @@ def test_interp_limit_no_nans(self): @td.skip_if_no_scipy @pytest.mark.parametrize("method", ['polynomial', 'spline']) def test_no_order(self, method): + # see GH-10633, GH-24014 s = Series([0, 1, np.nan, 3]) - msg = "invalid method '{}' to interpolate".format(method) + msg = "You must specify the order of the spline or polynomial" with pytest.raises(ValueError, match=msg): s.interpolate(method=method) + @td.skip_if_no_scipy + @pytest.mark.parametrize('order', [-1, -1.0, 0, 0.0, np.nan]) + def test_interpolate_spline_invalid_order(self, order): + s = Series([0, 1, np.nan, 3]) + msg = "order needs to be specified and greater than 0" + with pytest.raises(ValueError, match=msg): + s.interpolate(method='spline', order=order) + @td.skip_if_no_scipy def test_spline(self): s = Series([1, 2, np.nan, 4, 5, np.nan, 7]) @@ -1313,19 +1351,6 @@ def test_spline_interpolation(self): expected1 = s.interpolate(method='spline', order=1) assert_series_equal(result1, expected1) - @td.skip_if_no_scipy - def test_spline_error(self): - # see gh-10633 - s = pd.Series(np.arange(10) ** 2) - s[np.random.randint(0, 9, 3)] = np.nan - msg = "invalid method 'spline' to interpolate" - with pytest.raises(ValueError, match=msg): - s.interpolate(method='spline') - - msg = "order needs to be specified and greater than 0" - with pytest.raises(ValueError, match=msg): - s.interpolate(method='spline', order=0) - def test_interp_timedelta64(self): # GH 6424 df = Series([1, np.nan, 3], diff --git a/pandas/tests/series/test_operators.py b/pandas/tests/series/test_operators.py index 4d3c9926fc5ae..b2aac441db195 100644 --- a/pandas/tests/series/test_operators.py +++ b/pandas/tests/series/test_operators.py @@ -563,6 +563,13 @@ def test_comp_ops_df_compat(self): with pytest.raises(ValueError, match=msg): left.to_frame() < right.to_frame() + def test_compare_series_interval_keyword(self): + # GH 25338 + s = Series(['IntervalA', 'IntervalB', 'IntervalC']) + result = s == 'IntervalA' + expected = Series([True, False, False]) + assert_series_equal(result, expected) + class TestSeriesFlexComparisonOps(object): diff --git a/pandas/tests/series/test_rank.py b/pandas/tests/series/test_rank.py index 510a51e002918..dfcda889269ee 100644 --- a/pandas/tests/series/test_rank.py +++ b/pandas/tests/series/test_rank.py @@ -499,6 +499,7 @@ def test_rank_first_pct(dtype, ser, exp): @pytest.mark.single +@pytest.mark.high_memory def test_pct_max_many_rows(): # GH 18271 s = Series(np.arange(2**24 + 1)) diff --git a/pandas/tests/series/test_repr.py b/pandas/tests/series/test_repr.py index b4e7708e2456e..842207f2a572f 100644 --- a/pandas/tests/series/test_repr.py +++ b/pandas/tests/series/test_repr.py 
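The `nontemporal_method` fixture added to test_missing.py above is a reusable pytest pattern: a parametrized fixture fans every test that requests it out across all of its params, and it can bundle per-param keyword arguments together with the param itself. A minimal sketch of the same shape, with an illustrative fixture name and a trimmed method list rather than the full set used in the real fixture:

    import pytest

    @pytest.fixture(params=['linear', 'nearest', 'spline'])
    def interp_method(request):
        # pair each interpolation method with the kwargs it requires;
        # only 'spline' (and 'polynomial' in the real fixture) needs order=
        method = request.param
        kwargs = dict(order=1) if method == 'spline' else dict()
        return method, kwargs

    def test_runs_once_per_method(interp_method):
        # this test runs three times, once per fixture param
        method, kwargs = interp_method
        assert isinstance(method, str)
        assert isinstance(kwargs, dict)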
@@ -198,6 +198,14 @@ def test_latex_repr(self): assert s._repr_latex_() is None + def test_index_repr_in_frame_with_nan(self): + # see gh-25061 + i = Index([1, np.nan]) + s = Series([1, 2], index=i) + exp = """1.0 1\nNaN 2\ndtype: int64""" + + assert repr(s) == exp + class TestCategoricalRepr(object): diff --git a/pandas/tests/sparse/frame/test_frame.py b/pandas/tests/sparse/frame/test_frame.py index bfb5103c97adc..888d1fa1bfe45 100644 --- a/pandas/tests/sparse/frame/test_frame.py +++ b/pandas/tests/sparse/frame/test_frame.py @@ -7,7 +7,7 @@ import pytest from pandas._libs.sparse import BlockIndex, IntIndex -from pandas.compat import lrange +from pandas.compat import PY2, lrange from pandas.errors import PerformanceWarning import pandas as pd @@ -145,8 +145,9 @@ def test_constructor_ndarray(self, float_frame): tm.assert_sp_frame_equal(sp, float_frame.reindex(columns=['A'])) # raise on level argument - pytest.raises(TypeError, float_frame.reindex, columns=['A'], - level=1) + msg = "Reindex by level not supported for sparse" + with pytest.raises(TypeError, match=msg): + float_frame.reindex(columns=['A'], level=1) # wrong length index / columns with pytest.raises(ValueError, match="^Index length"): @@ -269,6 +270,19 @@ def test_type_coercion_at_construction(self): default_fill_value=0) tm.assert_sp_frame_equal(result, expected) + def test_default_dtype(self): + result = pd.SparseDataFrame(columns=list('ab'), index=range(2)) + expected = pd.SparseDataFrame([[np.nan, np.nan], [np.nan, np.nan]], + columns=list('ab'), index=range(2)) + tm.assert_sp_frame_equal(result, expected) + + def test_nan_data_with_int_dtype_raises_error(self): + sdf = pd.SparseDataFrame([[np.nan, np.nan], [np.nan, np.nan]], + columns=list('ab'), index=range(2)) + msg = "Cannot convert non-finite values" + with pytest.raises(ValueError, match=msg): + pd.SparseDataFrame(sdf, dtype=np.int64) + def test_dtypes(self): df = DataFrame(np.random.randn(10000, 4)) df.loc[:9998] = np.nan @@ -433,7 +447,8 @@ def test_getitem(self): exp = sdf.reindex(columns=['a', 'b']) tm.assert_sp_frame_equal(result, exp) - pytest.raises(Exception, sdf.__getitem__, ['a', 'd']) + with pytest.raises(KeyError, match=r"\['d'\] not in index"): + sdf[['a', 'd']] def test_iloc(self, float_frame): @@ -504,7 +519,9 @@ def test_getitem_overload(self, float_frame): subframe = float_frame[indexer] tm.assert_index_equal(subindex, subframe.index) - pytest.raises(Exception, float_frame.__getitem__, indexer[:-1]) + msg = "Item wrong length 9 instead of 10" + with pytest.raises(ValueError, match=msg): + float_frame[indexer[:-1]] def test_setitem(self, float_frame, float_frame_int_kind, float_frame_dense, @@ -551,8 +568,9 @@ def _check_frame(frame, orig): assert len(frame['I'].sp_values) == N // 2 # insert ndarray wrong size - pytest.raises(Exception, frame.__setitem__, 'foo', - np.random.randn(N - 1)) + msg = "Length of values does not match length of index" + with pytest.raises(AssertionError, match=msg): + frame['foo'] = np.random.randn(N - 1) # scalar value frame['J'] = 5 @@ -625,17 +643,22 @@ def test_delitem(self, float_frame): def test_set_columns(self, float_frame): float_frame.columns = float_frame.columns - pytest.raises(Exception, setattr, float_frame, 'columns', - float_frame.columns[:-1]) + msg = ("Length mismatch: Expected axis has 4 elements, new values have" + " 3 elements") + with pytest.raises(ValueError, match=msg): + float_frame.columns = float_frame.columns[:-1] def test_set_index(self, float_frame): float_frame.index = float_frame.index - 
pytest.raises(Exception, setattr, float_frame, 'index', - float_frame.index[:-1]) + msg = ("Length mismatch: Expected axis has 10 elements, new values" + " have 9 elements") + with pytest.raises(ValueError, match=msg): + float_frame.index = float_frame.index[:-1] def test_ctor_reindex(self): idx = pd.Index([0, 1, 2, 3]) - with pytest.raises(ValueError, match=''): + msg = "Length of passed values is 2, index implies 4" + with pytest.raises(ValueError, match=msg): pd.SparseDataFrame({"A": [1, 2]}, index=idx) def test_append(self, float_frame): @@ -858,6 +881,7 @@ def test_describe(self, float_frame): str(float_frame) desc = float_frame.describe() # noqa + @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails") def test_join(self, float_frame): left = float_frame.loc[:, ['A', 'B']] right = float_frame.loc[:, ['C', 'D']] @@ -865,7 +889,10 @@ def test_join(self, float_frame): tm.assert_sp_frame_equal(joined, float_frame, exact_indices=False) right = float_frame.loc[:, ['B', 'D']] - pytest.raises(Exception, left.join, right) + msg = (r"columns overlap but no suffix specified: Index\(\['B'\]," + r" dtype='object'\)") + with pytest.raises(ValueError, match=msg): + left.join(right) with pytest.raises(ValueError, match='Other Series must have a name'): float_frame.join(Series( @@ -1046,8 +1073,11 @@ def _check(frame): _check(float_frame_int_kind) # for now - pytest.raises(Exception, _check, float_frame_fill0) - pytest.raises(Exception, _check, float_frame_fill2) + msg = "This routine assumes NaN fill value" + with pytest.raises(TypeError, match=msg): + _check(float_frame_fill0) + with pytest.raises(TypeError, match=msg): + _check(float_frame_fill2) def test_transpose(self, float_frame, float_frame_int_kind, float_frame_dense, @@ -1246,6 +1276,14 @@ def test_notna(self): 'B': [True, False, True, True, False]}) tm.assert_frame_equal(res.to_dense(), exp) + def test_default_fill_value_with_no_data(self): + # GH 16807 + expected = pd.SparseDataFrame([[1.0, 1.0], [1.0, 1.0]], + columns=list('ab'), index=range(2)) + result = pd.SparseDataFrame(columns=list('ab'), index=range(2), + default_fill_value=1.0) + tm.assert_frame_equal(expected, result) + class TestSparseDataFrameArithmetic(object): diff --git a/pandas/tests/sparse/series/test_series.py b/pandas/tests/sparse/series/test_series.py index 7eed47d0de888..93cf629f20957 100644 --- a/pandas/tests/sparse/series/test_series.py +++ b/pandas/tests/sparse/series/test_series.py @@ -452,12 +452,13 @@ def _check_getitem(sp, dense): _check_getitem(self.ziseries, self.ziseries.to_dense()) # exception handling - pytest.raises(Exception, self.bseries.__getitem__, - len(self.bseries) + 1) + with pytest.raises(IndexError, match="Out of bounds access"): + self.bseries[len(self.bseries) + 1] # index not contained - pytest.raises(Exception, self.btseries.__getitem__, - self.btseries.index[-1] + BDay()) + msg = r"Timestamp\('2011-01-31 00:00:00', freq='B'\)" + with pytest.raises(KeyError, match=msg): + self.btseries[self.btseries.index[-1] + BDay()] def test_get_get_value(self): tm.assert_almost_equal(self.bseries.get(10), self.bseries[10]) @@ -523,8 +524,9 @@ def _compare(idx): self._check_all(_compare_with_dense) - pytest.raises(Exception, self.bseries.take, - [0, len(self.bseries) + 1]) + msg = "index 21 is out of bounds for size 20" + with pytest.raises(IndexError, match=msg): + self.bseries.take([0, len(self.bseries) + 1]) # Corner case # XXX: changed test. Why wsa this considered a corner case? 
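Note that `match` is interpreted as a regular expression, which is why the expected messages in these hunks escape every metacharacter (`r"\['d'\] not in index"`, `r"Timestamp\('2011-01-31 00:00:00', freq='B'\)"`, and so on); an unescaped bracket or parenthesis would quietly change what the pattern accepts. For longer literal messages, `re.escape` does the same job mechanically. A small sketch with a hypothetical helper:

    import re

    import pytest

    def lookup():
        # hypothetical stand-in that raises a message full of regex
        # metacharacters, like the indexing errors tested above
        raise KeyError("['d'] not in index")

    def test_match_is_a_regex():
        # equivalent to hand-escaping the pattern as r"\['d'\] not in index"
        with pytest.raises(KeyError, match=re.escape("['d'] not in index")):
            lookup()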
@@ -1138,25 +1140,35 @@ def test_to_coo_text_names_text_row_levels_nosort(self): def test_to_coo_bad_partition_nonnull_intersection(self): ss = self.sparse_series[0] - pytest.raises(ValueError, ss.to_coo, ['A', 'B', 'C'], ['C', 'D']) + msg = "Is not a partition because intersection is not null" + with pytest.raises(ValueError, match=msg): + ss.to_coo(['A', 'B', 'C'], ['C', 'D']) def test_to_coo_bad_partition_small_union(self): ss = self.sparse_series[0] - pytest.raises(ValueError, ss.to_coo, ['A'], ['C', 'D']) + msg = "Is not a partition because union is not the whole" + with pytest.raises(ValueError, match=msg): + ss.to_coo(['A'], ['C', 'D']) def test_to_coo_nlevels_less_than_two(self): ss = self.sparse_series[0] ss.index = np.arange(len(ss.index)) - pytest.raises(ValueError, ss.to_coo) + msg = "to_coo requires MultiIndex with nlevels > 2" + with pytest.raises(ValueError, match=msg): + ss.to_coo() def test_to_coo_bad_ilevel(self): ss = self.sparse_series[0] - pytest.raises(KeyError, ss.to_coo, ['A', 'B'], ['C', 'D', 'E']) + with pytest.raises(KeyError, match="Level E not found"): + ss.to_coo(['A', 'B'], ['C', 'D', 'E']) def test_to_coo_duplicate_index_entries(self): ss = pd.concat([self.sparse_series[0], self.sparse_series[0]]).to_sparse() - pytest.raises(ValueError, ss.to_coo, ['A', 'B'], ['C', 'D']) + msg = ("Duplicate index entries are not allowed in to_coo" + " transformation") + with pytest.raises(ValueError, match=msg): + ss.to_coo(['A', 'B'], ['C', 'D']) def test_from_coo_dense_index(self): ss = SparseSeries.from_coo(self.coo_matrices[0], dense_index=True) diff --git a/pandas/tests/test_algos.py b/pandas/tests/test_algos.py index 3d28b17750540..3f75c508d22f9 100644 --- a/pandas/tests/test_algos.py +++ b/pandas/tests/test_algos.py @@ -11,7 +11,7 @@ from pandas._libs import ( algos as libalgos, groupby as libgroupby, hashtable as ht) -from pandas.compat import lrange, range +from pandas.compat import PY2, lrange, range from pandas.compat.numpy import np_array_datetime64_compat import pandas.util._test_decorators as td @@ -224,11 +224,16 @@ def test_factorize_tuple_list(self, data, expected_label, expected_level): dtype=object) tm.assert_numpy_array_equal(result[1], expected_level_array) + @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails") def test_complex_sorting(self): # gh 12666 - check no segfault x17 = np.array([complex(i) for i in range(17)], dtype=object) - pytest.raises(TypeError, algos.factorize, x17[::-1], sort=True) + msg = (r"'(<|>)' not supported between instances of 'complex' and" + r" 'complex'|" + r"unorderable types: complex\(\) > complex\(\)") + with pytest.raises(TypeError, match=msg): + algos.factorize(x17[::-1], sort=True) def test_float64_factorize(self, writable): data = np.array([1.0, 1e8, 1.0, 1e-8, 1e8, 1.0], dtype=np.float64) @@ -589,9 +594,14 @@ class TestIsin(object): def test_invalid(self): - pytest.raises(TypeError, lambda: algos.isin(1, 1)) - pytest.raises(TypeError, lambda: algos.isin(1, [1])) - pytest.raises(TypeError, lambda: algos.isin([1], 1)) + msg = (r"only list-like objects are allowed to be passed to isin\(\)," + r" you passed a \[int\]") + with pytest.raises(TypeError, match=msg): + algos.isin(1, 1) + with pytest.raises(TypeError, match=msg): + algos.isin(1, [1]) + with pytest.raises(TypeError, match=msg): + algos.isin([1], 1) def test_basic(self): @@ -819,8 +829,9 @@ def test_value_counts_dtypes(self): result = algos.value_counts(Series([1, 1., '1'])) # object assert len(result) == 2 - pytest.raises(TypeError, lambda s: 
algos.value_counts(s, bins=1), - ['1', 1]) + msg = "bins argument only works with numeric data" + with pytest.raises(TypeError, match=msg): + algos.value_counts(['1', 1], bins=1) def test_value_counts_nat(self): td = Series([np.timedelta64(10000), pd.NaT], dtype='timedelta64[ns]') @@ -1222,7 +1233,7 @@ def test_group_var_constant(self): class TestGroupVarFloat64(GroupVarTestMixin): __test__ = True - algo = libgroupby.group_var_float64 + algo = staticmethod(libgroupby.group_var_float64) dtype = np.float64 rtol = 1e-5 @@ -1245,7 +1256,7 @@ def test_group_var_large_inputs(self): class TestGroupVarFloat32(GroupVarTestMixin): __test__ = True - algo = libgroupby.group_var_float32 + algo = staticmethod(libgroupby.group_var_float32) dtype = np.float32 rtol = 1e-2 @@ -1484,6 +1495,7 @@ def test_too_many_ndims(self): algos.rank(arr) @pytest.mark.single + @pytest.mark.high_memory @pytest.mark.parametrize('values', [ np.arange(2**24 + 1), np.arange(2**25 + 2).reshape(2**24 + 1, 2)], diff --git a/pandas/tests/test_config.py b/pandas/tests/test_config.py index 54db3887850ea..baca66e0361ad 100644 --- a/pandas/tests/test_config.py +++ b/pandas/tests/test_config.py @@ -3,7 +3,10 @@ import pytest +from pandas.compat import PY2 + import pandas as pd +from pandas.core.config import OptionError class TestConfig(object): @@ -48,26 +51,35 @@ def test_is_one_of_factory(self): v(12) v(None) - pytest.raises(ValueError, v, 1.1) + msg = r"Value must be one of None\|12" + with pytest.raises(ValueError, match=msg): + v(1.1) def test_register_option(self): self.cf.register_option('a', 1, 'doc') # can't register an already registered option - pytest.raises(KeyError, self.cf.register_option, 'a', 1, 'doc') + msg = "Option 'a' has already been registered" + with pytest.raises(OptionError, match=msg): + self.cf.register_option('a', 1, 'doc') # can't register an already registered option - pytest.raises(KeyError, self.cf.register_option, 'a.b.c.d1', 1, - 'doc') - pytest.raises(KeyError, self.cf.register_option, 'a.b.c.d2', 1, - 'doc') + msg = "Path prefix to option 'a' is already an option" + with pytest.raises(OptionError, match=msg): + self.cf.register_option('a.b.c.d1', 1, 'doc') + with pytest.raises(OptionError, match=msg): + self.cf.register_option('a.b.c.d2', 1, 'doc') # no python keywords - pytest.raises(ValueError, self.cf.register_option, 'for', 0) - pytest.raises(ValueError, self.cf.register_option, 'a.for.b', 0) + msg = "for is a python keyword" + with pytest.raises(ValueError, match=msg): + self.cf.register_option('for', 0) + with pytest.raises(ValueError, match=msg): + self.cf.register_option('a.for.b', 0) # must be valid identifier (ensure attribute access works) - pytest.raises(ValueError, self.cf.register_option, - 'Oh my Goddess!', 0) + msg = "oh my goddess! 
is not a valid identifier" + with pytest.raises(ValueError, match=msg): + self.cf.register_option('Oh my Goddess!', 0) # we can register options several levels deep # without predefining the intermediate steps @@ -90,7 +102,9 @@ def test_describe_option(self): self.cf.register_option('l', "foo") # non-existent keys raise KeyError - pytest.raises(KeyError, self.cf.describe_option, 'no.such.key') + msg = r"No such keys\(s\)" + with pytest.raises(OptionError, match=msg): + self.cf.describe_option('no.such.key') # we can get the description for any key we registered assert 'doc' in self.cf.describe_option('a', _print_desc=False) @@ -122,7 +136,9 @@ def test_case_insensitive(self): assert self.cf.get_option('kAnBaN') == 2 # gets of non-existent keys fail - pytest.raises(KeyError, self.cf.get_option, 'no_such_option') + msg = r"No such keys\(s\): 'no_such_option'" + with pytest.raises(OptionError, match=msg): + self.cf.get_option('no_such_option') self.cf.deprecate_option('KanBan') assert self.cf._is_deprecated('kAnBaN') @@ -138,7 +154,9 @@ def test_get_option(self): assert self.cf.get_option('b.b') is None # gets of non-existent keys fail - pytest.raises(KeyError, self.cf.get_option, 'no_such_option') + msg = r"No such keys\(s\): 'no_such_option'" + with pytest.raises(OptionError, match=msg): + self.cf.get_option('no_such_option') def test_set_option(self): self.cf.register_option('a', 1, 'doc') @@ -157,16 +175,24 @@ def test_set_option(self): assert self.cf.get_option('b.c') == 'wurld' assert self.cf.get_option('b.b') == 1.1 - pytest.raises(KeyError, self.cf.set_option, 'no.such.key', None) + msg = r"No such keys\(s\): 'no.such.key'" + with pytest.raises(OptionError, match=msg): + self.cf.set_option('no.such.key', None) def test_set_option_empty_args(self): - pytest.raises(ValueError, self.cf.set_option) + msg = "Must provide an even number of non-keyword arguments" + with pytest.raises(ValueError, match=msg): + self.cf.set_option() def test_set_option_uneven_args(self): - pytest.raises(ValueError, self.cf.set_option, 'a.b', 2, 'b.c') + msg = "Must provide an even number of non-keyword arguments" + with pytest.raises(ValueError, match=msg): + self.cf.set_option('a.b', 2, 'b.c') def test_set_option_invalid_single_argument_type(self): - pytest.raises(ValueError, self.cf.set_option, 2) + msg = "Must provide an even number of non-keyword arguments" + with pytest.raises(ValueError, match=msg): + self.cf.set_option(2) def test_set_option_multiple(self): self.cf.register_option('a', 1, 'doc') @@ -183,27 +209,36 @@ def test_set_option_multiple(self): assert self.cf.get_option('b.c') is None assert self.cf.get_option('b.b') == 10.0 + @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails") def test_validation(self): self.cf.register_option('a', 1, 'doc', validator=self.cf.is_int) self.cf.register_option('b.c', 'hullo', 'doc2', validator=self.cf.is_text) - pytest.raises(ValueError, self.cf.register_option, 'a.b.c.d2', - 'NO', 'doc', validator=self.cf.is_int) + msg = "Value must have type '<class 'int'>'" + with pytest.raises(ValueError, match=msg): + self.cf.register_option( + 'a.b.c.d2', 'NO', 'doc', validator=self.cf.is_int) self.cf.set_option('a', 2) # int is_int self.cf.set_option('b.c', 'wurld') # str is_str - pytest.raises( - ValueError, self.cf.set_option, 'a', None) # None not is_int - pytest.raises(ValueError, self.cf.set_option, 'a', 'ab') - pytest.raises(ValueError, self.cf.set_option, 'b.c', 1) + # None not is_int + with pytest.raises(ValueError, match=msg): + self.cf.set_option('a', None) +
with pytest.raises(ValueError, match=msg): + self.cf.set_option('a', 'ab') + + msg = r"Value must be an instance of <class 'str'>\|<class 'bytes'>" + with pytest.raises(ValueError, match=msg): + self.cf.set_option('b.c', 1) validator = self.cf.is_one_of_factory([None, self.cf.is_callable]) self.cf.register_option('b', lambda: None, 'doc', validator=validator) self.cf.set_option('b', '%.1f'.format) # Formatter is callable self.cf.set_option('b', None) # Formatter is none (default) - pytest.raises(ValueError, self.cf.set_option, 'b', '%.1f') + with pytest.raises(ValueError, match="Value must be a callable"): + self.cf.set_option('b', '%.1f') def test_reset_option(self): self.cf.register_option('a', 1, 'doc', validator=self.cf.is_int) @@ -267,8 +302,9 @@ def test_deprecate_option(self): assert 'eprecated' in str(w[-1]) # we get the default message assert 'nifty_ver' in str(w[-1]) # with the removal_ver quoted - pytest.raises( - KeyError, self.cf.deprecate_option, 'a') # can't depr. twice + msg = "Option 'a' has already been defined as deprecated" + with pytest.raises(OptionError, match=msg): + self.cf.deprecate_option('a') self.cf.deprecate_option('b.c', 'zounds!') with warnings.catch_warnings(record=True) as w: @@ -374,12 +410,6 @@ def eq(val): def test_attribute_access(self): holder = [] - def f(): - options.b = 1 - - def f2(): - options.display = 1 - def f3(key): holder.append(True) @@ -397,8 +427,11 @@ def f3(key): self.cf.reset_option("a") assert options.a == self.cf.get_option("a", 0) - pytest.raises(KeyError, f) - pytest.raises(KeyError, f2) + msg = "You can only set the value of existing options" + with pytest.raises(OptionError, match=msg): + options.b = 1 + with pytest.raises(OptionError, match=msg): + options.display = 1 # make sure callback kicks when using this form of setting options.c = 1 @@ -429,5 +462,6 @@ def test_option_context_scope(self): def test_dictwrapper_getattr(self): options = self.cf.options # GH 19789 - pytest.raises(self.cf.OptionError, getattr, options, 'bananas') + with pytest.raises(OptionError, match="No such option"): + options.bananas assert not hasattr(options, 'bananas') diff --git a/pandas/tests/test_downstream.py b/pandas/tests/test_downstream.py index e22b9a0ef25e3..92b4e5a99041a 100644 --- a/pandas/tests/test_downstream.py +++ b/pandas/tests/test_downstream.py @@ -9,7 +9,7 @@ import numpy as np # noqa import pytest -from pandas.compat import PY36 +from pandas.compat import PY2, PY36, is_platform_windows from pandas import DataFrame from pandas.util import testing as tm @@ -58,6 +58,8 @@ def test_xarray(df): assert df.to_xarray() is not None +@pytest.mark.skipif(is_platform_windows() and PY2, + reason="Broken on Windows / Py2") def test_oo_optimizable(): # GH 21071 subprocess.check_call([sys.executable, "-OO", "-c", "import pandas"]) diff --git a/pandas/tests/test_expressions.py b/pandas/tests/test_expressions.py index f5aa0b0b3c9c8..7a2680135ea80 100644 --- a/pandas/tests/test_expressions.py +++ b/pandas/tests/test_expressions.py @@ -3,19 +3,17 @@ import operator import re -from warnings import catch_warnings, simplefilter import numpy as np from numpy.random import randn import pytest from pandas import _np_version_under1p13, compat -from pandas.core.api import DataFrame, Panel +from pandas.core.api import DataFrame from pandas.core.computation import expressions as expr import pandas.util.testing as tm from pandas.util.testing import ( - assert_almost_equal, assert_frame_equal, assert_panel_equal, - assert_series_equal) + assert_almost_equal, assert_frame_equal,
assert_series_equal) from pandas.io.formats.printing import pprint_thing @@ -39,23 +37,6 @@ _integer2 = DataFrame(np.random.randint(1, 100, size=(101, 4)), columns=list('ABCD'), dtype='int64') -with catch_warnings(record=True): - simplefilter("ignore", FutureWarning) - _frame_panel = Panel(dict(ItemA=_frame.copy(), - ItemB=(_frame.copy() + 3), - ItemC=_frame.copy(), - ItemD=_frame.copy())) - _frame2_panel = Panel(dict(ItemA=_frame2.copy(), - ItemB=(_frame2.copy() + 3), - ItemC=_frame2.copy(), - ItemD=_frame2.copy())) - _integer_panel = Panel(dict(ItemA=_integer, - ItemB=(_integer + 34).astype('int64'))) - _integer2_panel = Panel(dict(ItemA=_integer2, - ItemB=(_integer2 + 34).astype('int64'))) - _mixed_panel = Panel(dict(ItemA=_mixed, ItemB=(_mixed + 3))) - _mixed2_panel = Panel(dict(ItemA=_mixed2, ItemB=(_mixed2 + 3))) - @pytest.mark.skipif(not expr._USE_NUMEXPR, reason='not using numexpr') class TestExpressions(object): @@ -173,42 +154,18 @@ def run_series(self, ser, other, binary_comp=None, **kwargs): # self.run_binary(ser, binary_comp, assert_frame_equal, # test_flex=True, **kwargs) - def run_panel(self, panel, other, binary_comp=None, run_binary=True, - assert_func=assert_panel_equal, **kwargs): - self.run_arithmetic(panel, other, assert_func, test_flex=False, - **kwargs) - self.run_arithmetic(panel, other, assert_func, test_flex=True, - **kwargs) - if run_binary: - if binary_comp is None: - binary_comp = other + 1 - self.run_binary(panel, binary_comp, assert_func, - test_flex=False, **kwargs) - self.run_binary(panel, binary_comp, assert_func, - test_flex=True, **kwargs) - def test_integer_arithmetic_frame(self): self.run_frame(self.integer, self.integer) def test_integer_arithmetic_series(self): self.run_series(self.integer.iloc[:, 0], self.integer.iloc[:, 0]) - @pytest.mark.slow - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") - def test_integer_panel(self): - self.run_panel(_integer2_panel, np.random.randint(1, 100)) - def test_float_arithemtic_frame(self): self.run_frame(self.frame2, self.frame2) def test_float_arithmetic_series(self): self.run_series(self.frame2.iloc[:, 0], self.frame2.iloc[:, 0]) - @pytest.mark.slow - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") - def test_float_panel(self): - self.run_panel(_frame2_panel, np.random.randn() + 0.1, binary_comp=0.8) - def test_mixed_arithmetic_frame(self): # TODO: FIGURE OUT HOW TO GET IT TO WORK... 
         # can't do arithmetic because comparison methods try to do *entire*
@@ -219,12 +176,6 @@ def test_mixed_arithmetic_series(self):
         for col in self.mixed2.columns:
             self.run_series(self.mixed2[col], self.mixed2[col], binary_comp=4)
 
-    @pytest.mark.slow
-    @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning")
-    def test_mixed_panel(self):
-        self.run_panel(_mixed2_panel, np.random.randint(1, 100),
-                       binary_comp=-2)
-
     def test_float_arithemtic(self):
         self.run_arithmetic(self.frame, self.frame, assert_frame_equal)
         self.run_arithmetic(self.frame.iloc[:, 0], self.frame.iloc[:, 0],
diff --git a/pandas/tests/test_multilevel.py b/pandas/tests/test_multilevel.py
index a7bbbbb5033ac..a9a59c6d95373 100644
--- a/pandas/tests/test_multilevel.py
+++ b/pandas/tests/test_multilevel.py
@@ -15,7 +15,7 @@
 from pandas.core.dtypes.common import is_float_dtype, is_integer_dtype
 
 import pandas as pd
-from pandas import DataFrame, Panel, Series, Timestamp, isna
+from pandas import DataFrame, Series, Timestamp, isna
 from pandas.core.index import Index, MultiIndex
 import pandas.util.testing as tm
 
@@ -818,18 +818,6 @@ def test_swaplevel(self):
         exp = self.frame.swaplevel('first', 'second').T
         tm.assert_frame_equal(swapped, exp)
 
-    def test_swaplevel_panel(self):
-        with catch_warnings(record=True):
-            simplefilter("ignore", FutureWarning)
-            panel = Panel({'ItemA': self.frame, 'ItemB': self.frame * 2})
-            expected = panel.copy()
-            expected.major_axis = expected.major_axis.swaplevel(0, 1)
-
-            for result in (panel.swaplevel(axis='major'),
-                           panel.swaplevel(0, axis='major'),
-                           panel.swaplevel(0, 1, axis='major')):
-                tm.assert_panel_equal(result, expected)
-
     def test_reorder_levels(self):
         result = self.ymd.reorder_levels(['month', 'day', 'year'])
         expected = self.ymd.swaplevel(0, 1).swaplevel(1, 2)
@@ -898,8 +886,11 @@ def test_count(self):
         tm.assert_series_equal(result, expect, check_names=False)
         assert result.index.name == 'a'
 
-        pytest.raises(KeyError, series.count, 'x')
-        pytest.raises(KeyError, frame.count, level='x')
+        msg = "Level x not found"
+        with pytest.raises(KeyError, match=msg):
+            series.count('x')
+        with pytest.raises(KeyError, match=msg):
+            frame.count(level='x')
 
     @pytest.mark.parametrize('op', AGG_FUNCTIONS)
     @pytest.mark.parametrize('level', [0, 1])
@@ -1131,7 +1122,8 @@ def test_level_with_tuples(self):
         tm.assert_series_equal(result, expected)
         tm.assert_series_equal(result2, expected)
 
-        pytest.raises(KeyError, series.__getitem__, (('foo', 'bar', 0), 2))
+        with pytest.raises(KeyError, match=r"^\(\('foo', 'bar', 0\), 2\)$"):
+            series[('foo', 'bar', 0), 2]
 
         result = frame.loc[('foo', 'bar', 0)]
         result2 = frame.xs(('foo', 'bar', 0))
diff --git a/pandas/tests/test_nanops.py b/pandas/tests/test_nanops.py
index cf5ef6cf15eca..d1893b7efbc41 100644
--- a/pandas/tests/test_nanops.py
+++ b/pandas/tests/test_nanops.py
@@ -7,6 +7,7 @@
 import numpy as np
 import pytest
 
+from pandas.compat import PY2
 from pandas.compat.numpy import _np_version_under1p13
 import pandas.util._test_decorators as td
 
@@ -728,6 +729,7 @@ def test_numeric_values(self):
         # Test complex
         assert nanops._ensure_numeric(1 + 2j) == 1 + 2j
 
+    @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails")
     def test_ndarray(self):
         # Test numeric ndarray
         values = np.array([1, 2, 3])
@@ -743,7 +745,9 @@ def test_ndarray(self):
 
         # Test non-convertible string ndarray
         s_values = np.array(['foo', 'bar', 'baz'], dtype=object)
-        pytest.raises(ValueError, lambda: nanops._ensure_numeric(s_values))
+        msg = r"could not convert string to float: '(foo|baz)'"
+        with pytest.raises(ValueError, match=msg):
+            nanops._ensure_numeric(s_values)
 
     def test_convertable_values(self):
         assert np.allclose(nanops._ensure_numeric('1'), 1.0)
@@ -751,9 +755,15 @@ def test_convertable_values(self):
         assert np.allclose(nanops._ensure_numeric('1+1j'), 1 + 1j)
 
     def test_non_convertable_values(self):
-        pytest.raises(TypeError, lambda: nanops._ensure_numeric('foo'))
-        pytest.raises(TypeError, lambda: nanops._ensure_numeric({}))
-        pytest.raises(TypeError, lambda: nanops._ensure_numeric([]))
+        msg = "Could not convert foo to numeric"
+        with pytest.raises(TypeError, match=msg):
+            nanops._ensure_numeric('foo')
+        msg = "Could not convert {} to numeric"
+        with pytest.raises(TypeError, match=msg):
+            nanops._ensure_numeric({})
+        msg = r"Could not convert \[\] to numeric"
+        with pytest.raises(TypeError, match=msg):
+            nanops._ensure_numeric([])
 
 
 class TestNanvarFixedValues(object):
diff --git a/pandas/tests/test_panel.py b/pandas/tests/test_panel.py
index ba0ad72e624f7..b418091de8d7f 100644
--- a/pandas/tests/test_panel.py
+++ b/pandas/tests/test_panel.py
@@ -1,57 +1,28 @@
 # -*- coding: utf-8 -*-
 # pylint: disable=W0612,E1101
-
+from collections import OrderedDict
 from datetime import datetime
-import operator
-from warnings import catch_warnings, simplefilter
 
 import numpy as np
 import pytest
 
-from pandas.compat import OrderedDict, StringIO, lrange, range, signature
-import pandas.util._test_decorators as td
-
-from pandas.core.dtypes.common import is_float_dtype
+from pandas.compat import lrange
 
-from pandas import (
-    DataFrame, Index, MultiIndex, Series, compat, date_range, isna, notna)
-from pandas.core.nanops import nanall, nanany
+from pandas import DataFrame, MultiIndex, Series, date_range, notna
 import pandas.core.panel as panelm
 from pandas.core.panel import Panel
 import pandas.util.testing as tm
 from pandas.util.testing import (
-    assert_almost_equal, assert_frame_equal, assert_panel_equal,
-    assert_series_equal, ensure_clean, makeCustomDataframe as mkdf,
-    makeMixedDataFrame)
-
-from pandas.io.formats.printing import pprint_thing
-from pandas.tseries.offsets import BDay, MonthEnd
+    assert_almost_equal, assert_frame_equal, assert_series_equal,
+    makeCustomDataframe as mkdf, makeMixedDataFrame)
-
-def make_test_panel():
-    with catch_warnings(record=True):
-        simplefilter("ignore", FutureWarning)
-        _panel = tm.makePanel()
-        tm.add_nans(_panel)
-        _panel = _panel.copy()
-    return _panel
+from pandas.tseries.offsets import MonthEnd
 
 
 @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning")
 class PanelTests(object):
     panel = None
 
-    def test_pickle(self):
-        unpickled = tm.round_trip_pickle(self.panel)
-        assert_frame_equal(unpickled['ItemA'], self.panel['ItemA'])
-
-    def test_rank(self):
-        pytest.raises(NotImplementedError, lambda: self.panel.rank())
-
-    def test_cumsum(self):
-        cumsum = self.panel.cumsum()
-        assert_frame_equal(cumsum['ItemA'], self.panel['ItemA'].cumsum())
-
     def not_hashable(self):
         c_empty = Panel()
         c = Panel(Panel([[[1]]]))
@@ -59,298 +30,9 @@ def not_hashable(self):
         pytest.raises(TypeError, hash, c)
 
 
-@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning")
-class SafeForLongAndSparse(object):
-
-    def test_repr(self):
-        repr(self.panel)
-
-    def test_copy_names(self):
-        for attr in ('major_axis', 'minor_axis'):
-            getattr(self.panel, attr).name = None
-            cp = self.panel.copy()
-            getattr(cp, attr).name = 'foo'
-            assert getattr(self.panel, attr).name is None
-
-    def test_iter(self):
-        tm.equalContents(list(self.panel), self.panel.items)
-
-    def test_count(self):
-        f = lambda s: notna(s).sum()
-        self._check_stat_op('count', f, obj=self.panel, has_skipna=False)
-
-    def test_sum(self):
-        self._check_stat_op('sum', np.sum, skipna_alternative=np.nansum)
-
-    def test_mean(self):
-        self._check_stat_op('mean', np.mean)
-
-    def test_prod(self):
-        self._check_stat_op('prod', np.prod, skipna_alternative=np.nanprod)
-
-    @pytest.mark.filterwarnings("ignore:Invalid value:RuntimeWarning")
-    @pytest.mark.filterwarnings("ignore:All-NaN:RuntimeWarning")
-    def test_median(self):
-        def wrapper(x):
-            if isna(x).any():
-                return np.nan
-            return np.median(x)
-
-        self._check_stat_op('median', wrapper)
-
-    @pytest.mark.filterwarnings("ignore:Invalid value:RuntimeWarning")
-    def test_min(self):
-        self._check_stat_op('min', np.min)
-
-    @pytest.mark.filterwarnings("ignore:Invalid value:RuntimeWarning")
-    def test_max(self):
-        self._check_stat_op('max', np.max)
-
-    @td.skip_if_no_scipy
-    def test_skew(self):
-        from scipy.stats import skew
-
-        def this_skew(x):
-            if len(x) < 3:
-                return np.nan
-            return skew(x, bias=False)
-
-        self._check_stat_op('skew', this_skew)
-
-    def test_var(self):
-        def alt(x):
-            if len(x) < 2:
-                return np.nan
-            return np.var(x, ddof=1)
-
-        self._check_stat_op('var', alt)
-
-    def test_std(self):
-        def alt(x):
-            if len(x) < 2:
-                return np.nan
-            return np.std(x, ddof=1)
-
-        self._check_stat_op('std', alt)
-
-    def test_sem(self):
-        def alt(x):
-            if len(x) < 2:
-                return np.nan
-            return np.std(x, ddof=1) / np.sqrt(len(x))
-
-        self._check_stat_op('sem', alt)
-
-    def _check_stat_op(self, name, alternative, obj=None, has_skipna=True,
-                       skipna_alternative=None):
-        if obj is None:
-            obj = self.panel
-
-        # # set some NAs
-        # obj.loc[5:10] = np.nan
-        # obj.loc[15:20, -2:] = np.nan
-
-        f = getattr(obj, name)
-
-        if has_skipna:
-
-            skipna_wrapper = tm._make_skipna_wrapper(alternative,
-                                                     skipna_alternative)
-
-            def wrapper(x):
-                return alternative(np.asarray(x))
-
-            for i in range(obj.ndim):
-                result = f(axis=i, skipna=False)
-                assert_frame_equal(result, obj.apply(wrapper, axis=i))
-        else:
-            skipna_wrapper = alternative
-            wrapper = alternative
-
-        for i in range(obj.ndim):
-            result = f(axis=i)
-            if name in ['sum', 'prod']:
-                assert_frame_equal(result, obj.apply(skipna_wrapper, axis=i))
-
-        pytest.raises(Exception, f, axis=obj.ndim)
-
-        # Unimplemented numeric_only parameter.
-        if 'numeric_only' in signature(f).args:
-            with pytest.raises(NotImplementedError, match=name):
-                f(numeric_only=True)
-
-
 @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning")
 class SafeForSparse(object):
 
-    def test_get_axis(self):
-        assert (self.panel._get_axis(0) is self.panel.items)
-        assert (self.panel._get_axis(1) is self.panel.major_axis)
-        assert (self.panel._get_axis(2) is self.panel.minor_axis)
-
-    def test_set_axis(self):
-        new_items = Index(np.arange(len(self.panel.items)))
-        new_major = Index(np.arange(len(self.panel.major_axis)))
-        new_minor = Index(np.arange(len(self.panel.minor_axis)))
-
-        # ensure propagate to potentially prior-cached items too
-        item = self.panel['ItemA']
-        self.panel.items = new_items
-
-        if hasattr(self.panel, '_item_cache'):
-            assert 'ItemA' not in self.panel._item_cache
-        assert self.panel.items is new_items
-
-        # TODO: unused?
-        item = self.panel[0]  # noqa
-
-        self.panel.major_axis = new_major
-        assert self.panel[0].index is new_major
-        assert self.panel.major_axis is new_major
-
-        # TODO: unused?
- item = self.panel[0] # noqa - - self.panel.minor_axis = new_minor - assert self.panel[0].columns is new_minor - assert self.panel.minor_axis is new_minor - - def test_get_axis_number(self): - assert self.panel._get_axis_number('items') == 0 - assert self.panel._get_axis_number('major') == 1 - assert self.panel._get_axis_number('minor') == 2 - - with pytest.raises(ValueError, match="No axis named foo"): - self.panel._get_axis_number('foo') - - with pytest.raises(ValueError, match="No axis named foo"): - self.panel.__ge__(self.panel, axis='foo') - - def test_get_axis_name(self): - assert self.panel._get_axis_name(0) == 'items' - assert self.panel._get_axis_name(1) == 'major_axis' - assert self.panel._get_axis_name(2) == 'minor_axis' - - def test_get_plane_axes(self): - # what to do here? - - index, columns = self.panel._get_plane_axes('items') - index, columns = self.panel._get_plane_axes('major_axis') - index, columns = self.panel._get_plane_axes('minor_axis') - index, columns = self.panel._get_plane_axes(0) - - def test_truncate(self): - dates = self.panel.major_axis - start, end = dates[1], dates[5] - - trunced = self.panel.truncate(start, end, axis='major') - expected = self.panel['ItemA'].truncate(start, end) - - assert_frame_equal(trunced['ItemA'], expected) - - trunced = self.panel.truncate(before=start, axis='major') - expected = self.panel['ItemA'].truncate(before=start) - - assert_frame_equal(trunced['ItemA'], expected) - - trunced = self.panel.truncate(after=end, axis='major') - expected = self.panel['ItemA'].truncate(after=end) - - assert_frame_equal(trunced['ItemA'], expected) - - def test_arith(self): - self._test_op(self.panel, operator.add) - self._test_op(self.panel, operator.sub) - self._test_op(self.panel, operator.mul) - self._test_op(self.panel, operator.truediv) - self._test_op(self.panel, operator.floordiv) - self._test_op(self.panel, operator.pow) - - self._test_op(self.panel, lambda x, y: y + x) - self._test_op(self.panel, lambda x, y: y - x) - self._test_op(self.panel, lambda x, y: y * x) - self._test_op(self.panel, lambda x, y: y / x) - self._test_op(self.panel, lambda x, y: y ** x) - - self._test_op(self.panel, lambda x, y: x + y) # panel + 1 - self._test_op(self.panel, lambda x, y: x - y) # panel - 1 - self._test_op(self.panel, lambda x, y: x * y) # panel * 1 - self._test_op(self.panel, lambda x, y: x / y) # panel / 1 - self._test_op(self.panel, lambda x, y: x ** y) # panel ** 1 - - pytest.raises(Exception, self.panel.__add__, - self.panel['ItemA']) - - @staticmethod - def _test_op(panel, op): - result = op(panel, 1) - assert_frame_equal(result['ItemA'], op(panel['ItemA'], 1)) - - def test_keys(self): - tm.equalContents(list(self.panel.keys()), self.panel.items) - - def test_iteritems(self): - # Test panel.iteritems(), aka panel.iteritems() - # just test that it works - for k, v in self.panel.iteritems(): - pass - - assert len(list(self.panel.iteritems())) == len(self.panel.items) - - def test_combineFrame(self): - def check_op(op, name): - # items - df = self.panel['ItemA'] - - func = getattr(self.panel, name) - - result = func(df, axis='items') - - assert_frame_equal( - result['ItemB'], op(self.panel['ItemB'], df)) - - # major - xs = self.panel.major_xs(self.panel.major_axis[0]) - result = func(xs, axis='major') - - idx = self.panel.major_axis[1] - - assert_frame_equal(result.major_xs(idx), - op(self.panel.major_xs(idx), xs)) - - # minor - xs = self.panel.minor_xs(self.panel.minor_axis[0]) - result = func(xs, axis='minor') - - idx = self.panel.minor_axis[1] - 
- assert_frame_equal(result.minor_xs(idx), - op(self.panel.minor_xs(idx), xs)) - - ops = ['add', 'sub', 'mul', 'truediv', 'floordiv', 'pow', 'mod'] - if not compat.PY3: - ops.append('div') - - for op in ops: - try: - check_op(getattr(operator, op), op) - except AttributeError: - pprint_thing("Failing operation: %r" % op) - raise - if compat.PY3: - try: - check_op(operator.truediv, 'div') - except AttributeError: - pprint_thing("Failing operation: %r" % 'div') - raise - - def test_combinePanel(self): - result = self.panel.add(self.panel) - assert_panel_equal(result, self.panel * 2) - - def test_neg(self): - assert_panel_equal(-self.panel, self.panel * -1) - # issue 7692 def test_raise_when_not_implemented(self): p = Panel(np.arange(3 * 4 * 5).reshape(3, 4, 5), @@ -364,84 +46,11 @@ def test_raise_when_not_implemented(self): with pytest.raises(NotImplementedError): getattr(p, op)(d, axis=0) - def test_select(self): - p = self.panel - - # select items - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = p.select(lambda x: x in ('ItemA', 'ItemC'), axis='items') - expected = p.reindex(items=['ItemA', 'ItemC']) - assert_panel_equal(result, expected) - - # select major_axis - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = p.select(lambda x: x >= datetime( - 2000, 1, 15), axis='major') - new_major = p.major_axis[p.major_axis >= datetime(2000, 1, 15)] - expected = p.reindex(major=new_major) - assert_panel_equal(result, expected) - - # select minor_axis - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = p.select(lambda x: x in ('D', 'A'), axis=2) - expected = p.reindex(minor=['A', 'D']) - assert_panel_equal(result, expected) - - # corner case, empty thing - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = p.select(lambda x: x in ('foo', ), axis='items') - assert_panel_equal(result, p.reindex(items=[])) - - def test_get_value(self): - for item in self.panel.items: - for mjr in self.panel.major_axis[::2]: - for mnr in self.panel.minor_axis: - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - result = self.panel.get_value(item, mjr, mnr) - expected = self.panel[item][mnr][mjr] - assert_almost_equal(result, expected) - - def test_abs(self): - - result = self.panel.abs() - result2 = abs(self.panel) - expected = np.abs(self.panel) - assert_panel_equal(result, expected) - assert_panel_equal(result2, expected) - - df = self.panel['ItemA'] - result = df.abs() - result2 = abs(df) - expected = np.abs(df) - assert_frame_equal(result, expected) - assert_frame_equal(result2, expected) - - s = df['A'] - result = s.abs() - result2 = abs(s) - expected = np.abs(s) - assert_series_equal(result, expected) - assert_series_equal(result2, expected) - assert result.name == 'A' - assert result2.name == 'A' - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") class CheckIndexing(object): - def test_getitem(self): - pytest.raises(Exception, self.panel.__getitem__, 'ItemQ') - def test_delitem_and_pop(self): - expected = self.panel['ItemA'] - result = self.panel.pop('ItemA') - assert_frame_equal(expected, result) - assert 'ItemA' not in self.panel.items - - del self.panel['ItemB'] - assert 'ItemB' not in self.panel.items - pytest.raises(Exception, self.panel.__delitem__, 'ItemB') values = np.empty((3, 3, 3)) values[0] = 0 @@ -468,38 +77,6 @@ def test_delitem_and_pop(self): tm.assert_frame_equal(panelc[0], panel[0]) def test_setitem(self): - lp = 
self.panel.filter(['ItemA', 'ItemB']).to_frame() - - with pytest.raises(TypeError): - self.panel['ItemE'] = lp - - # DataFrame - df = self.panel['ItemA'][2:].filter(items=['A', 'B']) - self.panel['ItemF'] = df - self.panel['ItemE'] = df - - df2 = self.panel['ItemF'] - - assert_frame_equal(df, df2.reindex( - index=df.index, columns=df.columns)) - - # scalar - self.panel['ItemG'] = 1 - self.panel['ItemE'] = True - assert self.panel['ItemG'].values.dtype == np.int64 - assert self.panel['ItemE'].values.dtype == np.bool_ - - # object dtype - self.panel['ItemQ'] = 'foo' - assert self.panel['ItemQ'].values.dtype == np.object_ - - # boolean dtype - self.panel['ItemP'] = self.panel['ItemA'] > 0 - assert self.panel['ItemP'].values.dtype == np.bool_ - - pytest.raises(TypeError, self.panel.__setitem__, 'foo', - self.panel.loc[['ItemP']]) - # bad shape p = Panel(np.random.randn(4, 3, 2)) msg = (r"shape of value must be \(3, 2\), " @@ -537,159 +114,9 @@ def test_set_minor_major(self): assert_frame_equal(panel.loc[:, 'NewMajor', :], newmajor.astype(object)) - def test_major_xs(self): - ref = self.panel['ItemA'] - - idx = self.panel.major_axis[5] - xs = self.panel.major_xs(idx) - - result = xs['ItemA'] - assert_series_equal(result, ref.xs(idx), check_names=False) - assert result.name == 'ItemA' - - # not contained - idx = self.panel.major_axis[0] - BDay() - pytest.raises(Exception, self.panel.major_xs, idx) - - def test_major_xs_mixed(self): - self.panel['ItemD'] = 'foo' - xs = self.panel.major_xs(self.panel.major_axis[0]) - assert xs['ItemA'].dtype == np.float64 - assert xs['ItemD'].dtype == np.object_ - - def test_minor_xs(self): - ref = self.panel['ItemA'] - - idx = self.panel.minor_axis[1] - xs = self.panel.minor_xs(idx) - - assert_series_equal(xs['ItemA'], ref[idx], check_names=False) - - # not contained - pytest.raises(Exception, self.panel.minor_xs, 'E') - - def test_minor_xs_mixed(self): - self.panel['ItemD'] = 'foo' - - xs = self.panel.minor_xs('D') - assert xs['ItemA'].dtype == np.float64 - assert xs['ItemD'].dtype == np.object_ - - def test_xs(self): - itemA = self.panel.xs('ItemA', axis=0) - expected = self.panel['ItemA'] - tm.assert_frame_equal(itemA, expected) - - # Get a view by default. - itemA_view = self.panel.xs('ItemA', axis=0) - itemA_view.values[:] = np.nan - - assert np.isnan(self.panel['ItemA'].values).all() - - # Mixed-type yields a copy. 
- self.panel['strings'] = 'foo' - result = self.panel.xs('D', axis=2) - assert result._is_copy is not None - - def test_getitem_fancy_labels(self): - p = self.panel - - items = p.items[[1, 0]] - dates = p.major_axis[::2] - cols = ['D', 'C', 'F'] - - # all 3 specified - with catch_warnings(): - simplefilter("ignore", FutureWarning) - # XXX: warning in _validate_read_indexer - assert_panel_equal(p.loc[items, dates, cols], - p.reindex(items=items, major=dates, minor=cols)) - - # 2 specified - assert_panel_equal(p.loc[:, dates, cols], - p.reindex(major=dates, minor=cols)) - - assert_panel_equal(p.loc[items, :, cols], - p.reindex(items=items, minor=cols)) - - assert_panel_equal(p.loc[items, dates, :], - p.reindex(items=items, major=dates)) - - # only 1 - assert_panel_equal(p.loc[items, :, :], p.reindex(items=items)) - - assert_panel_equal(p.loc[:, dates, :], p.reindex(major=dates)) - - assert_panel_equal(p.loc[:, :, cols], p.reindex(minor=cols)) - def test_getitem_fancy_slice(self): pass - def test_getitem_fancy_ints(self): - p = self.panel - - # #1603 - result = p.iloc[:, -1, :] - expected = p.loc[:, p.major_axis[-1], :] - assert_frame_equal(result, expected) - - def test_getitem_fancy_xs(self): - p = self.panel - item = 'ItemB' - - date = p.major_axis[5] - col = 'C' - - # get DataFrame - # item - assert_frame_equal(p.loc[item], p[item]) - assert_frame_equal(p.loc[item, :], p[item]) - assert_frame_equal(p.loc[item, :, :], p[item]) - - # major axis, axis=1 - assert_frame_equal(p.loc[:, date], p.major_xs(date)) - assert_frame_equal(p.loc[:, date, :], p.major_xs(date)) - - # minor axis, axis=2 - assert_frame_equal(p.loc[:, :, 'C'], p.minor_xs('C')) - - # get Series - assert_series_equal(p.loc[item, date], p[item].loc[date]) - assert_series_equal(p.loc[item, date, :], p[item].loc[date]) - assert_series_equal(p.loc[item, :, col], p[item][col]) - assert_series_equal(p.loc[:, date, col], p.major_xs(date).loc[col]) - - def test_getitem_fancy_xs_check_view(self): - item = 'ItemB' - date = self.panel.major_axis[5] - - # make sure it's always a view - NS = slice(None, None) - - # DataFrames - comp = assert_frame_equal - self._check_view(item, comp) - self._check_view((item, NS), comp) - self._check_view((item, NS, NS), comp) - self._check_view((NS, date), comp) - self._check_view((NS, date, NS), comp) - self._check_view((NS, NS, 'C'), comp) - - # Series - comp = assert_series_equal - self._check_view((item, date), comp) - self._check_view((item, date, NS), comp) - self._check_view((item, NS, 'C'), comp) - self._check_view((NS, date, 'C'), comp) - - def test_getitem_callable(self): - p = self.panel - # GH 12533 - - assert_frame_equal(p[lambda x: 'ItemB'], p.loc['ItemB']) - assert_panel_equal(p[lambda x: ['ItemB', 'ItemC']], - p.loc[['ItemB', 'ItemC']]) - def test_ix_setitem_slice_dataframe(self): a = Panel(items=[1, 2, 3], major_axis=[11, 22, 33], minor_axis=[111, 222, 333]) @@ -719,43 +146,6 @@ def test_ix_align(self): assert_series_equal(df.loc[0, 0, :].reindex(b.index), b) def test_ix_frame_align(self): - p_orig = tm.makePanel() - df = p_orig.iloc[0].copy() - assert_frame_equal(p_orig['ItemA'], df) - - p = p_orig.copy() - p.iloc[0, :, :] = df - assert_panel_equal(p, p_orig) - - p = p_orig.copy() - p.iloc[0] = df - assert_panel_equal(p, p_orig) - - p = p_orig.copy() - p.iloc[0, :, :] = df - assert_panel_equal(p, p_orig) - - p = p_orig.copy() - p.iloc[0] = df - assert_panel_equal(p, p_orig) - - p = p_orig.copy() - p.loc['ItemA'] = df - assert_panel_equal(p, p_orig) - - p = p_orig.copy() - p.loc['ItemA', 
:, :] = df - assert_panel_equal(p, p_orig) - - p = p_orig.copy() - p['ItemA'] = df - assert_panel_equal(p, p_orig) - - p = p_orig.copy() - p.iloc[0, [0, 1, 3, 5], -2:] = df - out = p.iloc[0, [0, 1, 3, 5], -2:] - assert_frame_equal(out, df.iloc[[0, 1, 3, 5], [2, 3]]) - # GH3830, panel assignent by values/frame for dtype in ['float64', 'int64']: @@ -782,13 +172,6 @@ def test_ix_frame_align(self): tm.assert_frame_equal(panel.loc['a1'], df1) tm.assert_frame_equal(panel.loc['a2'], df2) - def _check_view(self, indexer, comp): - cp = self.panel.copy() - obj = cp.loc[indexer] - obj.values[:] = 0 - assert (obj.values == 0).all() - comp(cp.loc[indexer].reindex_like(obj), obj) - def test_logical_with_nas(self): d = Panel({'ItemA': {'a': [np.nan, False]}, 'ItemB': {'a': [True, True]}}) @@ -802,157 +185,11 @@ def test_logical_with_nas(self): expected = DataFrame({'a': [True, True]}) assert_frame_equal(result, expected) - def test_neg(self): - assert_panel_equal(-self.panel, -1 * self.panel) - - def test_invert(self): - assert_panel_equal(-(self.panel < 0), ~(self.panel < 0)) - - def test_comparisons(self): - p1 = tm.makePanel() - p2 = tm.makePanel() - - tp = p1.reindex(items=p1.items + ['foo']) - df = p1[p1.items[0]] - - def test_comp(func): - - # versus same index - result = func(p1, p2) - tm.assert_numpy_array_equal(result.values, - func(p1.values, p2.values)) - - # versus non-indexed same objs - pytest.raises(Exception, func, p1, tp) - - # versus different objs - pytest.raises(Exception, func, p1, df) - - # versus scalar - result3 = func(self.panel, 0) - tm.assert_numpy_array_equal(result3.values, - func(self.panel.values, 0)) - - with np.errstate(invalid='ignore'): - test_comp(operator.eq) - test_comp(operator.ne) - test_comp(operator.lt) - test_comp(operator.gt) - test_comp(operator.ge) - test_comp(operator.le) - - def test_get_value(self): - for item in self.panel.items: - for mjr in self.panel.major_axis[::2]: - for mnr in self.panel.minor_axis: - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - result = self.panel.get_value(item, mjr, mnr) - expected = self.panel[item][mnr][mjr] - assert_almost_equal(result, expected) - with catch_warnings(): - simplefilter("ignore", FutureWarning) - msg = "There must be an argument for each axis" - with pytest.raises(TypeError, match=msg): - self.panel.get_value('a') - - def test_set_value(self): - for item in self.panel.items: - for mjr in self.panel.major_axis[::2]: - for mnr in self.panel.minor_axis: - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - self.panel.set_value(item, mjr, mnr, 1.) - tm.assert_almost_equal(self.panel[item][mnr][mjr], 1.) 
- - # resize - with catch_warnings(): - simplefilter("ignore", FutureWarning) - res = self.panel.set_value('ItemE', 'foo', 'bar', 1.5) - assert isinstance(res, Panel) - assert res is not self.panel - assert res.get_value('ItemE', 'foo', 'bar') == 1.5 - - res3 = self.panel.set_value('ItemE', 'foobar', 'baz', 5) - assert is_float_dtype(res3['ItemE'].values) - - msg = ("There must be an argument for each " - "axis plus the value provided") - with pytest.raises(TypeError, match=msg): - self.panel.set_value('a') - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") -class TestPanel(PanelTests, CheckIndexing, SafeForLongAndSparse, - SafeForSparse): - - def setup_method(self, method): - self.panel = make_test_panel() - self.panel.major_axis.name = None - self.panel.minor_axis.name = None - self.panel.items.name = None - - def test_constructor(self): - # with BlockManager - wp = Panel(self.panel._data) - assert wp._data is self.panel._data - - wp = Panel(self.panel._data, copy=True) - assert wp._data is not self.panel._data - tm.assert_panel_equal(wp, self.panel) - - # strings handled prop - wp = Panel([[['foo', 'foo', 'foo', ], ['foo', 'foo', 'foo']]]) - assert wp.values.dtype == np.object_ - - vals = self.panel.values - - # no copy - wp = Panel(vals) - assert wp.values is vals - - # copy - wp = Panel(vals, copy=True) - assert wp.values is not vals - - # GH #8285, test when scalar data is used to construct a Panel - # if dtype is not passed, it should be inferred - value_and_dtype = [(1, 'int64'), (3.14, 'float64'), - ('foo', np.object_)] - for (val, dtype) in value_and_dtype: - wp = Panel(val, items=range(2), major_axis=range(3), - minor_axis=range(4)) - vals = np.empty((2, 3, 4), dtype=dtype) - vals.fill(val) - - tm.assert_panel_equal(wp, Panel(vals, dtype=dtype)) - - # test the case when dtype is passed - wp = Panel(1, items=range(2), major_axis=range(3), - minor_axis=range(4), - dtype='float32') - vals = np.empty((2, 3, 4), dtype='float32') - vals.fill(1) - - tm.assert_panel_equal(wp, Panel(vals, dtype='float32')) +class TestPanel(PanelTests, CheckIndexing, SafeForSparse): def test_constructor_cast(self): - zero_filled = self.panel.fillna(0) - - casted = Panel(zero_filled._data, dtype=int) - casted2 = Panel(zero_filled.values, dtype=int) - - exp_values = zero_filled.values.astype(int) - assert_almost_equal(casted.values, exp_values) - assert_almost_equal(casted2.values, exp_values) - - casted = Panel(zero_filled._data, dtype=np.int32) - casted2 = Panel(zero_filled.values, dtype=np.int32) - - exp_values = zero_filled.values.astype(np.int32) - assert_almost_equal(casted.values, exp_values) - assert_almost_equal(casted2.values, exp_values) - # can't cast data = [[['foo', 'bar', 'baz']]] pytest.raises(ValueError, Panel, data, dtype=float) @@ -1017,86 +254,6 @@ def test_constructor_fails_with_not_3d_input(self): with pytest.raises(ValueError, match=msg): Panel(np.random.randn(10, 2)) - def test_consolidate(self): - assert self.panel._data.is_consolidated() - - self.panel['foo'] = 1. - assert not self.panel._data.is_consolidated() - - panel = self.panel._consolidate() - assert panel._data.is_consolidated() - - def test_ctor_dict(self): - itema = self.panel['ItemA'] - itemb = self.panel['ItemB'] - - d = {'A': itema, 'B': itemb[5:]} - d2 = {'A': itema._series, 'B': itemb[5:]._series} - d3 = {'A': None, - 'B': DataFrame(itemb[5:]._series), - 'C': DataFrame(itema._series)} - - wp = Panel.from_dict(d) - wp2 = Panel.from_dict(d2) # nested Dict - - # TODO: unused? 
- wp3 = Panel.from_dict(d3) # noqa - - tm.assert_index_equal(wp.major_axis, self.panel.major_axis) - assert_panel_equal(wp, wp2) - - # intersect - wp = Panel.from_dict(d, intersect=True) - tm.assert_index_equal(wp.major_axis, itemb.index[5:]) - - # use constructor - assert_panel_equal(Panel(d), Panel.from_dict(d)) - assert_panel_equal(Panel(d2), Panel.from_dict(d2)) - assert_panel_equal(Panel(d3), Panel.from_dict(d3)) - - # a pathological case - d4 = {'A': None, 'B': None} - - # TODO: unused? - wp4 = Panel.from_dict(d4) # noqa - - assert_panel_equal(Panel(d4), Panel(items=['A', 'B'])) - - # cast - dcasted = {k: v.reindex(wp.major_axis).fillna(0) - for k, v in compat.iteritems(d)} - result = Panel(dcasted, dtype=int) - expected = Panel({k: v.astype(int) - for k, v in compat.iteritems(dcasted)}) - assert_panel_equal(result, expected) - - result = Panel(dcasted, dtype=np.int32) - expected = Panel({k: v.astype(np.int32) - for k, v in compat.iteritems(dcasted)}) - assert_panel_equal(result, expected) - - def test_constructor_dict_mixed(self): - data = {k: v.values for k, v in self.panel.iteritems()} - result = Panel(data) - exp_major = Index(np.arange(len(self.panel.major_axis))) - tm.assert_index_equal(result.major_axis, exp_major) - - result = Panel(data, items=self.panel.items, - major_axis=self.panel.major_axis, - minor_axis=self.panel.minor_axis) - assert_panel_equal(result, self.panel) - - data['ItemC'] = self.panel['ItemC'] - result = Panel(data) - assert_panel_equal(result, self.panel) - - # corner, blow up - data['ItemB'] = data['ItemB'][:-1] - pytest.raises(Exception, Panel, data) - - data['ItemB'] = self.panel['ItemB'].values[:, :-1] - pytest.raises(Exception, Panel, data) - def test_ctor_orderedDict(self): keys = list(set(np.random.randint(0, 5000, 100)))[ :50] # unique random int keys @@ -1107,30 +264,6 @@ def test_ctor_orderedDict(self): p = Panel.from_dict(d) assert list(p.items) == keys - def test_constructor_resize(self): - data = self.panel._data - items = self.panel.items[:-1] - major = self.panel.major_axis[:-1] - minor = self.panel.minor_axis[:-1] - - result = Panel(data, items=items, - major_axis=major, minor_axis=minor) - expected = self.panel.reindex( - items=items, major=major, minor=minor) - assert_panel_equal(result, expected) - - result = Panel(data, items=items, major_axis=major) - expected = self.panel.reindex(items=items, major=major) - assert_panel_equal(result, expected) - - result = Panel(data, items=items) - expected = self.panel.reindex(items=items) - assert_panel_equal(result, expected) - - result = Panel(data, minor_axis=minor) - expected = self.panel.reindex(minor=minor) - assert_panel_equal(result, expected) - def test_from_dict_mixed_orient(self): df = tm.makeDataFrame() df['foo'] = 'bar' @@ -1161,153 +294,7 @@ def test_constructor_error_msgs(self): Panel(np.random.randn(3, 4, 5), lrange(5), lrange(5), lrange(4)) - def test_conform(self): - df = self.panel['ItemA'][:-5].filter(items=['A', 'B']) - conformed = self.panel.conform(df) - - tm.assert_index_equal(conformed.index, self.panel.major_axis) - tm.assert_index_equal(conformed.columns, self.panel.minor_axis) - - def test_convert_objects(self): - # GH 4937 - p = Panel(dict(A=dict(a=['1', '1.0']))) - expected = Panel(dict(A=dict(a=[1, 1.0]))) - result = p._convert(numeric=True, coerce=True) - assert_panel_equal(result, expected) - - def test_dtypes(self): - - result = self.panel.dtypes - expected = Series(np.dtype('float64'), index=self.panel.items) - assert_series_equal(result, expected) - - def 
test_astype(self): - # GH7271 - data = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]) - panel = Panel(data, ['a', 'b'], ['c', 'd'], ['e', 'f']) - - str_data = np.array([[['1', '2'], ['3', '4']], - [['5', '6'], ['7', '8']]]) - expected = Panel(str_data, ['a', 'b'], ['c', 'd'], ['e', 'f']) - assert_panel_equal(panel.astype(str), expected) - - pytest.raises(NotImplementedError, panel.astype, {0: str}) - - def test_apply(self): - # GH1148 - - # ufunc - applied = self.panel.apply(np.sqrt) - with np.errstate(invalid='ignore'): - expected = np.sqrt(self.panel.values) - assert_almost_equal(applied.values, expected) - - # ufunc same shape - result = self.panel.apply(lambda x: x * 2, axis='items') - expected = self.panel * 2 - assert_panel_equal(result, expected) - result = self.panel.apply(lambda x: x * 2, axis='major_axis') - expected = self.panel * 2 - assert_panel_equal(result, expected) - result = self.panel.apply(lambda x: x * 2, axis='minor_axis') - expected = self.panel * 2 - assert_panel_equal(result, expected) - - # reduction to DataFrame - result = self.panel.apply(lambda x: x.dtype, axis='items') - expected = DataFrame(np.dtype('float64'), - index=self.panel.major_axis, - columns=self.panel.minor_axis) - assert_frame_equal(result, expected) - result = self.panel.apply(lambda x: x.dtype, axis='major_axis') - expected = DataFrame(np.dtype('float64'), - index=self.panel.minor_axis, - columns=self.panel.items) - assert_frame_equal(result, expected) - result = self.panel.apply(lambda x: x.dtype, axis='minor_axis') - expected = DataFrame(np.dtype('float64'), - index=self.panel.major_axis, - columns=self.panel.items) - assert_frame_equal(result, expected) - - # reductions via other dims - expected = self.panel.sum(0) - result = self.panel.apply(lambda x: x.sum(), axis='items') - assert_frame_equal(result, expected) - expected = self.panel.sum(1) - result = self.panel.apply(lambda x: x.sum(), axis='major_axis') - assert_frame_equal(result, expected) - expected = self.panel.sum(2) - result = self.panel.apply(lambda x: x.sum(), axis='minor_axis') - assert_frame_equal(result, expected) - - # pass kwargs - result = self.panel.apply( - lambda x, y: x.sum() + y, axis='items', y=5) - expected = self.panel.sum(0) + 5 - assert_frame_equal(result, expected) - def test_apply_slabs(self): - - # same shape as original - result = self.panel.apply(lambda x: x * 2, - axis=['items', 'major_axis']) - expected = (self.panel * 2).transpose('minor_axis', 'major_axis', - 'items') - assert_panel_equal(result, expected) - result = self.panel.apply(lambda x: x * 2, - axis=['major_axis', 'items']) - assert_panel_equal(result, expected) - - result = self.panel.apply(lambda x: x * 2, - axis=['items', 'minor_axis']) - expected = (self.panel * 2).transpose('major_axis', 'minor_axis', - 'items') - assert_panel_equal(result, expected) - result = self.panel.apply(lambda x: x * 2, - axis=['minor_axis', 'items']) - assert_panel_equal(result, expected) - - result = self.panel.apply(lambda x: x * 2, - axis=['major_axis', 'minor_axis']) - expected = self.panel * 2 - assert_panel_equal(result, expected) - result = self.panel.apply(lambda x: x * 2, - axis=['minor_axis', 'major_axis']) - assert_panel_equal(result, expected) - - # reductions - result = self.panel.apply(lambda x: x.sum(0), axis=[ - 'items', 'major_axis' - ]) - expected = self.panel.sum(1).T - assert_frame_equal(result, expected) - - result = self.panel.apply(lambda x: x.sum(1), axis=[ - 'items', 'major_axis' - ]) - expected = self.panel.sum(0) - 
assert_frame_equal(result, expected) - - # transforms - f = lambda x: ((x.T - x.mean(1)) / x.std(1)).T - - # make sure that we don't trigger any warnings - result = self.panel.apply(f, axis=['items', 'major_axis']) - expected = Panel({ax: f(self.panel.loc[:, :, ax]) - for ax in self.panel.minor_axis}) - assert_panel_equal(result, expected) - - result = self.panel.apply(f, axis=['major_axis', 'minor_axis']) - expected = Panel({ax: f(self.panel.loc[ax]) - for ax in self.panel.items}) - assert_panel_equal(result, expected) - - result = self.panel.apply(f, axis=['minor_axis', 'items']) - expected = Panel({ax: f(self.panel.loc[:, ax]) - for ax in self.panel.major_axis}) - assert_panel_equal(result, expected) - # with multi-indexes # GH7469 index = MultiIndex.from_tuples([('one', 'a'), ('one', 'b'), ( @@ -1343,371 +330,13 @@ def test_apply_no_or_zero_ndim(self): assert_series_equal(result_float, expected_float) assert_series_equal(result_float64, expected_float64) - def test_reindex(self): - ref = self.panel['ItemB'] - - # items - result = self.panel.reindex(items=['ItemA', 'ItemB']) - assert_frame_equal(result['ItemB'], ref) - - # major - new_major = list(self.panel.major_axis[:10]) - result = self.panel.reindex(major=new_major) - assert_frame_equal(result['ItemB'], ref.reindex(index=new_major)) - - # raise exception put both major and major_axis - pytest.raises(Exception, self.panel.reindex, - major_axis=new_major, - major=new_major) - - # minor - new_minor = list(self.panel.minor_axis[:2]) - result = self.panel.reindex(minor=new_minor) - assert_frame_equal(result['ItemB'], ref.reindex(columns=new_minor)) - - # raise exception put both major and major_axis - pytest.raises(Exception, self.panel.reindex, - minor_axis=new_minor, - minor=new_minor) - - # this ok - result = self.panel.reindex() - assert_panel_equal(result, self.panel) - assert result is not self.panel - - # with filling - smaller_major = self.panel.major_axis[::5] - smaller = self.panel.reindex(major=smaller_major) - - larger = smaller.reindex(major=self.panel.major_axis, method='pad') - - assert_frame_equal(larger.major_xs(self.panel.major_axis[1]), - smaller.major_xs(smaller_major[0])) - - # don't necessarily copy - result = self.panel.reindex( - major=self.panel.major_axis, copy=False) - assert_panel_equal(result, self.panel) - assert result is self.panel - - def test_reindex_axis_style(self): - panel = Panel(np.random.rand(5, 5, 5)) - expected0 = Panel(panel.values).iloc[[0, 1]] - expected1 = Panel(panel.values).iloc[:, [0, 1]] - expected2 = Panel(panel.values).iloc[:, :, [0, 1]] - - result = panel.reindex([0, 1], axis=0) - assert_panel_equal(result, expected0) - - result = panel.reindex([0, 1], axis=1) - assert_panel_equal(result, expected1) - - result = panel.reindex([0, 1], axis=2) - assert_panel_equal(result, expected2) - - result = panel.reindex([0, 1], axis=2) - assert_panel_equal(result, expected2) - - def test_reindex_multi(self): - - # with and without copy full reindexing - result = self.panel.reindex( - items=self.panel.items, - major=self.panel.major_axis, - minor=self.panel.minor_axis, copy=False) - - assert result.items is self.panel.items - assert result.major_axis is self.panel.major_axis - assert result.minor_axis is self.panel.minor_axis - - result = self.panel.reindex( - items=self.panel.items, - major=self.panel.major_axis, - minor=self.panel.minor_axis, copy=False) - assert_panel_equal(result, self.panel) - - # multi-axis indexing consistency - # GH 5900 - df = DataFrame(np.random.randn(4, 3)) - p = 
Panel({'Item1': df}) - expected = Panel({'Item1': df}) - expected['Item2'] = np.nan - - items = ['Item1', 'Item2'] - major_axis = np.arange(4) - minor_axis = np.arange(3) - - results = [] - results.append(p.reindex(items=items, major_axis=major_axis, - copy=True)) - results.append(p.reindex(items=items, major_axis=major_axis, - copy=False)) - results.append(p.reindex(items=items, minor_axis=minor_axis, - copy=True)) - results.append(p.reindex(items=items, minor_axis=minor_axis, - copy=False)) - results.append(p.reindex(items=items, major_axis=major_axis, - minor_axis=minor_axis, copy=True)) - results.append(p.reindex(items=items, major_axis=major_axis, - minor_axis=minor_axis, copy=False)) - - for i, r in enumerate(results): - assert_panel_equal(expected, r) - - def test_reindex_like(self): - # reindex_like - smaller = self.panel.reindex(items=self.panel.items[:-1], - major=self.panel.major_axis[:-1], - minor=self.panel.minor_axis[:-1]) - smaller_like = self.panel.reindex_like(smaller) - assert_panel_equal(smaller, smaller_like) - - def test_take(self): - # axis == 0 - result = self.panel.take([2, 0, 1], axis=0) - expected = self.panel.reindex(items=['ItemC', 'ItemA', 'ItemB']) - assert_panel_equal(result, expected) - - # axis >= 1 - result = self.panel.take([3, 0, 1, 2], axis=2) - expected = self.panel.reindex(minor=['D', 'A', 'B', 'C']) - assert_panel_equal(result, expected) - - # neg indices ok - expected = self.panel.reindex(minor=['D', 'D', 'B', 'C']) - result = self.panel.take([3, -1, 1, 2], axis=2) - assert_panel_equal(result, expected) - - pytest.raises(Exception, self.panel.take, [4, 0, 1, 2], axis=2) - - def test_sort_index(self): - import random - - ritems = list(self.panel.items) - rmajor = list(self.panel.major_axis) - rminor = list(self.panel.minor_axis) - random.shuffle(ritems) - random.shuffle(rmajor) - random.shuffle(rminor) - - random_order = self.panel.reindex(items=ritems) - sorted_panel = random_order.sort_index(axis=0) - assert_panel_equal(sorted_panel, self.panel) - - # descending - random_order = self.panel.reindex(items=ritems) - sorted_panel = random_order.sort_index(axis=0, ascending=False) - assert_panel_equal( - sorted_panel, - self.panel.reindex(items=self.panel.items[::-1])) - - random_order = self.panel.reindex(major=rmajor) - sorted_panel = random_order.sort_index(axis=1) - assert_panel_equal(sorted_panel, self.panel) - - random_order = self.panel.reindex(minor=rminor) - sorted_panel = random_order.sort_index(axis=2) - assert_panel_equal(sorted_panel, self.panel) - def test_fillna(self): - filled = self.panel.fillna(0) - assert np.isfinite(filled.values).all() - - filled = self.panel.fillna(method='backfill') - assert_frame_equal(filled['ItemA'], - self.panel['ItemA'].fillna(method='backfill')) - - panel = self.panel.copy() - panel['str'] = 'foo' - - filled = panel.fillna(method='backfill') - assert_frame_equal(filled['ItemA'], - panel['ItemA'].fillna(method='backfill')) - - empty = self.panel.reindex(items=[]) - filled = empty.fillna(0) - assert_panel_equal(filled, empty) - - pytest.raises(ValueError, self.panel.fillna) - pytest.raises(ValueError, self.panel.fillna, 5, method='ffill') - - pytest.raises(TypeError, self.panel.fillna, [1, 2]) - pytest.raises(TypeError, self.panel.fillna, (1, 2)) - # limit not implemented when only value is specified p = Panel(np.random.randn(3, 4, 5)) p.iloc[0:2, 0:2, 0:2] = np.nan pytest.raises(NotImplementedError, lambda: p.fillna(999, limit=1)) - # Test in place fillNA - # Expected result - expected = Panel([[[0, 1], [2, 
1]], [[10, 11], [12, 11]]], - items=['a', 'b'], minor_axis=['x', 'y'], - dtype=np.float64) - # method='ffill' - p1 = Panel([[[0, 1], [2, np.nan]], [[10, 11], [12, np.nan]]], - items=['a', 'b'], minor_axis=['x', 'y'], - dtype=np.float64) - p1.fillna(method='ffill', inplace=True) - assert_panel_equal(p1, expected) - - # method='bfill' - p2 = Panel([[[0, np.nan], [2, 1]], [[10, np.nan], [12, 11]]], - items=['a', 'b'], minor_axis=['x', 'y'], - dtype=np.float64) - p2.fillna(method='bfill', inplace=True) - assert_panel_equal(p2, expected) - - def test_ffill_bfill(self): - assert_panel_equal(self.panel.ffill(), - self.panel.fillna(method='ffill')) - assert_panel_equal(self.panel.bfill(), - self.panel.fillna(method='bfill')) - - def test_truncate_fillna_bug(self): - # #1823 - result = self.panel.truncate(before=None, after=None, axis='items') - - # it works! - result.fillna(value=0.0) - - def test_swapaxes(self): - result = self.panel.swapaxes('items', 'minor') - assert result.items is self.panel.minor_axis - - result = self.panel.swapaxes('items', 'major') - assert result.items is self.panel.major_axis - - result = self.panel.swapaxes('major', 'minor') - assert result.major_axis is self.panel.minor_axis - - panel = self.panel.copy() - result = panel.swapaxes('major', 'minor') - panel.values[0, 0, 1] = np.nan - expected = panel.swapaxes('major', 'minor') - assert_panel_equal(result, expected) - - # this should also work - result = self.panel.swapaxes(0, 1) - assert result.items is self.panel.major_axis - - # this works, but return a copy - result = self.panel.swapaxes('items', 'items') - assert_panel_equal(self.panel, result) - assert id(self.panel) != id(result) - - def test_transpose(self): - result = self.panel.transpose('minor', 'major', 'items') - expected = self.panel.swapaxes('items', 'minor') - assert_panel_equal(result, expected) - - # test kwargs - result = self.panel.transpose(items='minor', major='major', - minor='items') - expected = self.panel.swapaxes('items', 'minor') - assert_panel_equal(result, expected) - - # text mixture of args - result = self.panel.transpose( - 'minor', major='major', minor='items') - expected = self.panel.swapaxes('items', 'minor') - assert_panel_equal(result, expected) - - result = self.panel.transpose('minor', - 'major', - minor='items') - expected = self.panel.swapaxes('items', 'minor') - assert_panel_equal(result, expected) - - # duplicate axes - with pytest.raises(TypeError, - match='not enough/duplicate arguments'): - self.panel.transpose('minor', maj='major', minor='items') - - with pytest.raises(ValueError, - match='repeated axis in transpose'): - self.panel.transpose('minor', 'major', major='minor', - minor='items') - - result = self.panel.transpose(2, 1, 0) - assert_panel_equal(result, expected) - - result = self.panel.transpose('minor', 'items', 'major') - expected = self.panel.swapaxes('items', 'minor') - expected = expected.swapaxes('major', 'minor') - assert_panel_equal(result, expected) - - result = self.panel.transpose(2, 0, 1) - assert_panel_equal(result, expected) - - pytest.raises(ValueError, self.panel.transpose, 0, 0, 1) - - def test_transpose_copy(self): - panel = self.panel.copy() - result = panel.transpose(2, 0, 1, copy=True) - expected = panel.swapaxes('items', 'minor') - expected = expected.swapaxes('major', 'minor') - assert_panel_equal(result, expected) - - panel.values[0, 1, 1] = np.nan - assert notna(result.values[1, 0, 1]) - - def test_to_frame(self): - # filtered - filtered = self.panel.to_frame() - expected = 
self.panel.to_frame().dropna(how='any') - assert_frame_equal(filtered, expected) - - # unfiltered - unfiltered = self.panel.to_frame(filter_observations=False) - assert_panel_equal(unfiltered.to_panel(), self.panel) - - # names - assert unfiltered.index.names == ('major', 'minor') - - # unsorted, round trip - df = self.panel.to_frame(filter_observations=False) - unsorted = df.take(np.random.permutation(len(df))) - pan = unsorted.to_panel() - assert_panel_equal(pan, self.panel) - - # preserve original index names - df = DataFrame(np.random.randn(6, 2), - index=[['a', 'a', 'b', 'b', 'c', 'c'], - [0, 1, 0, 1, 0, 1]], - columns=['one', 'two']) - df.index.names = ['foo', 'bar'] - df.columns.name = 'baz' - - rdf = df.to_panel().to_frame() - assert rdf.index.names == df.index.names - assert rdf.columns.names == df.columns.names - - def test_to_frame_mixed(self): - panel = self.panel.fillna(0) - panel['str'] = 'foo' - panel['bool'] = panel['ItemA'] > 0 - - lp = panel.to_frame() - wp = lp.to_panel() - assert wp['bool'].values.dtype == np.bool_ - # Previously, this was mutating the underlying - # index and changing its name - assert_frame_equal(wp['bool'], panel['bool'], check_names=False) - - # GH 8704 - # with categorical - df = panel.to_frame() - df['category'] = df['str'].astype('category') - - # to_panel - # TODO: this converts back to object - p = df.to_panel() - expected = panel.copy() - expected['category'] = 'foo' - assert_panel_equal(p, expected) - def test_to_frame_multi_major(self): idx = MultiIndex.from_tuples( [(1, 'one'), (1, 'two'), (2, 'one'), (2, 'two')]) @@ -1808,22 +437,6 @@ def test_to_frame_multi_drop_level(self): expected = DataFrame({'i1': [1., 2], 'i2': [1., 2]}, index=exp_idx) assert_frame_equal(result, expected) - def test_to_panel_na_handling(self): - df = DataFrame(np.random.randint(0, 10, size=20).reshape((10, 2)), - index=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1], - [0, 1, 2, 3, 4, 5, 2, 3, 4, 5]]) - - panel = df.to_panel() - assert isna(panel[0].loc[1, [0, 1]]).all() - - def test_to_panel_duplicates(self): - # #2441 - df = DataFrame({'a': [0, 0, 1], 'b': [1, 1, 1], 'c': [1, 2, 3]}) - idf = df.set_index(['a', 'b']) - - with pytest.raises(ValueError, match='non-uniquely indexed'): - idf.to_panel() - def test_panel_dups(self): # GH 4960 @@ -1842,11 +455,6 @@ def test_panel_dups(self): result = panel.loc['E'] assert_frame_equal(result, expected) - expected = no_dup_panel.loc[['A', 'B']] - expected.items = ['A', 'A'] - result = panel.loc['A'] - assert_panel_equal(result, expected) - # major data = np.random.randn(5, 5, 5) no_dup_panel = Panel(data, major_axis=list("ABCDE")) @@ -1860,11 +468,6 @@ def test_panel_dups(self): result = panel.loc[:, 'E'] assert_frame_equal(result, expected) - expected = no_dup_panel.loc[:, ['A', 'B']] - expected.major_axis = ['A', 'A'] - result = panel.loc[:, 'A'] - assert_panel_equal(result, expected) - # minor data = np.random.randn(5, 100, 5) no_dup_panel = Panel(data, minor_axis=list("ABCDE")) @@ -1878,48 +481,10 @@ def test_panel_dups(self): result = panel.loc[:, :, 'E'] assert_frame_equal(result, expected) - expected = no_dup_panel.loc[:, :, ['A', 'B']] - expected.minor_axis = ['A', 'A'] - result = panel.loc[:, :, 'A'] - assert_panel_equal(result, expected) - def test_filter(self): pass - def test_compound(self): - compounded = self.panel.compound() - - assert_series_equal(compounded['ItemA'], - (1 + self.panel['ItemA']).product(0) - 1, - check_names=False) - def test_shift(self): - # major - idx = self.panel.major_axis[0] - idx_lag = 
self.panel.major_axis[1] - shifted = self.panel.shift(1) - assert_frame_equal(self.panel.major_xs(idx), - shifted.major_xs(idx_lag)) - - # minor - idx = self.panel.minor_axis[0] - idx_lag = self.panel.minor_axis[1] - shifted = self.panel.shift(1, axis='minor') - assert_frame_equal(self.panel.minor_xs(idx), - shifted.minor_xs(idx_lag)) - - # items - idx = self.panel.items[0] - idx_lag = self.panel.items[1] - shifted = self.panel.shift(1, axis='items') - assert_frame_equal(self.panel[idx], shifted[idx_lag]) - - # negative numbers, #2164 - result = self.panel.shift(-1) - expected = Panel({i: f.shift(-1)[:-1] - for i, f in self.panel.iteritems()}) - assert_panel_equal(result, expected) - # mixed dtypes #6959 data = [('item ' + ch, makeMixedDataFrame()) for ch in list('abcde')] @@ -1928,131 +493,14 @@ def test_shift(self): shifted = mixed_panel.shift(1) assert_series_equal(mixed_panel.dtypes, shifted.dtypes) - def test_tshift(self): - # PeriodIndex - ps = tm.makePeriodPanel() - shifted = ps.tshift(1) - unshifted = shifted.tshift(-1) - - assert_panel_equal(unshifted, ps) - - shifted2 = ps.tshift(freq='B') - assert_panel_equal(shifted, shifted2) - - shifted3 = ps.tshift(freq=BDay()) - assert_panel_equal(shifted, shifted3) - - with pytest.raises(ValueError, match='does not match'): - ps.tshift(freq='M') - - # DatetimeIndex - panel = make_test_panel() - shifted = panel.tshift(1) - unshifted = shifted.tshift(-1) - - assert_panel_equal(panel, unshifted) - - shifted2 = panel.tshift(freq=panel.major_axis.freq) - assert_panel_equal(shifted, shifted2) - - inferred_ts = Panel(panel.values, items=panel.items, - major_axis=Index(np.asarray(panel.major_axis)), - minor_axis=panel.minor_axis) - shifted = inferred_ts.tshift(1) - unshifted = shifted.tshift(-1) - assert_panel_equal(shifted, panel.tshift(1)) - assert_panel_equal(unshifted, inferred_ts) - - no_freq = panel.iloc[:, [0, 5, 7], :] - pytest.raises(ValueError, no_freq.tshift) - - def test_pct_change(self): - df1 = DataFrame({'c1': [1, 2, 5], 'c2': [3, 4, 6]}) - df2 = df1 + 1 - df3 = DataFrame({'c1': [3, 4, 7], 'c2': [5, 6, 8]}) - wp = Panel({'i1': df1, 'i2': df2, 'i3': df3}) - # major, 1 - result = wp.pct_change() # axis='major' - expected = Panel({'i1': df1.pct_change(), - 'i2': df2.pct_change(), - 'i3': df3.pct_change()}) - assert_panel_equal(result, expected) - result = wp.pct_change(axis=1) - assert_panel_equal(result, expected) - # major, 2 - result = wp.pct_change(periods=2) - expected = Panel({'i1': df1.pct_change(2), - 'i2': df2.pct_change(2), - 'i3': df3.pct_change(2)}) - assert_panel_equal(result, expected) - # minor, 1 - result = wp.pct_change(axis='minor') - expected = Panel({'i1': df1.pct_change(axis=1), - 'i2': df2.pct_change(axis=1), - 'i3': df3.pct_change(axis=1)}) - assert_panel_equal(result, expected) - result = wp.pct_change(axis=2) - assert_panel_equal(result, expected) - # minor, 2 - result = wp.pct_change(periods=2, axis='minor') - expected = Panel({'i1': df1.pct_change(periods=2, axis=1), - 'i2': df2.pct_change(periods=2, axis=1), - 'i3': df3.pct_change(periods=2, axis=1)}) - assert_panel_equal(result, expected) - # items, 1 - result = wp.pct_change(axis='items') - expected = Panel( - {'i1': DataFrame({'c1': [np.nan, np.nan, np.nan], - 'c2': [np.nan, np.nan, np.nan]}), - 'i2': DataFrame({'c1': [1, 0.5, .2], - 'c2': [1. / 3, 0.25, 1. / 6]}), - 'i3': DataFrame({'c1': [.5, 1. / 3, 1. / 6], - 'c2': [.25, .2, 1. 
/ 7]})}) - assert_panel_equal(result, expected) - result = wp.pct_change(axis=0) - assert_panel_equal(result, expected) - # items, 2 - result = wp.pct_change(periods=2, axis='items') - expected = Panel( - {'i1': DataFrame({'c1': [np.nan, np.nan, np.nan], - 'c2': [np.nan, np.nan, np.nan]}), - 'i2': DataFrame({'c1': [np.nan, np.nan, np.nan], - 'c2': [np.nan, np.nan, np.nan]}), - 'i3': DataFrame({'c1': [2, 1, .4], - 'c2': [2. / 3, .5, 1. / 3]})}) - assert_panel_equal(result, expected) - - def test_round(self): - values = [[[-3.2, 2.2], [0, -4.8213], [3.123, 123.12], - [-1566.213, 88.88], [-12, 94.5]], - [[-5.82, 3.5], [6.21, -73.272], [-9.087, 23.12], - [272.212, -99.99], [23, -76.5]]] - evalues = [[[float(np.around(i)) for i in j] for j in k] - for k in values] - p = Panel(values, items=['Item1', 'Item2'], - major_axis=date_range('1/1/2000', periods=5), - minor_axis=['A', 'B']) - expected = Panel(evalues, items=['Item1', 'Item2'], - major_axis=date_range('1/1/2000', periods=5), - minor_axis=['A', 'B']) - result = p.round() - assert_panel_equal(expected, result) - def test_numpy_round(self): values = [[[-3.2, 2.2], [0, -4.8213], [3.123, 123.12], [-1566.213, 88.88], [-12, 94.5]], [[-5.82, 3.5], [6.21, -73.272], [-9.087, 23.12], [272.212, -99.99], [23, -76.5]]] - evalues = [[[float(np.around(i)) for i in j] for j in k] - for k in values] p = Panel(values, items=['Item1', 'Item2'], major_axis=date_range('1/1/2000', periods=5), minor_axis=['A', 'B']) - expected = Panel(evalues, items=['Item1', 'Item2'], - major_axis=date_range('1/1/2000', periods=5), - minor_axis=['A', 'B']) - result = np.round(p) - assert_panel_equal(expected, result) msg = "the 'out' parameter is not supported" with pytest.raises(ValueError, match=msg): @@ -2070,7 +518,6 @@ def test_multiindex_get(self): minor_axis=np.arange(5)) f1 = wp['a'] f2 = wp.loc['a'] - assert_panel_equal(f1, f2) assert (f1.items == [1, 2]).all() assert (f2.items == [1, 2]).all() @@ -2078,269 +525,10 @@ def test_multiindex_get(self): MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)], names=['first', 'second']) - @pytest.mark.filterwarnings("ignore:Using a non-tuple:FutureWarning") - def test_multiindex_blocks(self): - ind = MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)], - names=['first', 'second']) - wp = Panel(self.panel._data) - wp.items = ind - f1 = wp['a'] - assert (f1.items == [1, 2]).all() - - f1 = wp[('b', 1)] - assert (f1.columns == ['A', 'B', 'C', 'D']).all() - def test_repr_empty(self): empty = Panel() repr(empty) - # ignore warning from us, because removing panel - @pytest.mark.filterwarnings("ignore:Using:FutureWarning") - def test_rename(self): - mapper = {'ItemA': 'foo', 'ItemB': 'bar', 'ItemC': 'baz'} - - renamed = self.panel.rename(items=mapper) - exp = Index(['foo', 'bar', 'baz']) - tm.assert_index_equal(renamed.items, exp) - - renamed = self.panel.rename(minor_axis=str.lower) - exp = Index(['a', 'b', 'c', 'd']) - tm.assert_index_equal(renamed.minor_axis, exp) - - # don't copy - renamed_nocopy = self.panel.rename(items=mapper, copy=False) - renamed_nocopy['foo'] = 3. 
- assert (self.panel['ItemA'].values == 3).all() - - def test_get_attr(self): - assert_frame_equal(self.panel['ItemA'], self.panel.ItemA) - - # specific cases from #3440 - self.panel['a'] = self.panel['ItemA'] - assert_frame_equal(self.panel['a'], self.panel.a) - self.panel['i'] = self.panel['ItemA'] - assert_frame_equal(self.panel['i'], self.panel.i) - - def test_from_frame_level1_unsorted(self): - tuples = [('MSFT', 3), ('MSFT', 2), ('AAPL', 2), ('AAPL', 1), - ('MSFT', 1)] - midx = MultiIndex.from_tuples(tuples) - df = DataFrame(np.random.rand(5, 4), index=midx) - p = df.to_panel() - assert_frame_equal(p.minor_xs(2), df.xs(2, level=1).sort_index()) - - def test_to_excel(self): - try: - import xlwt # noqa - import xlrd # noqa - import openpyxl # noqa - from pandas.io.excel import ExcelFile - except ImportError: - pytest.skip("need xlwt xlrd openpyxl") - - for ext in ['xls', 'xlsx']: - with ensure_clean('__tmp__.' + ext) as path: - self.panel.to_excel(path) - try: - reader = ExcelFile(path) - except ImportError: - pytest.skip("need xlwt xlrd openpyxl") - - for item, df in self.panel.iteritems(): - recdf = reader.parse(str(item), index_col=0) - assert_frame_equal(df, recdf) - - def test_to_excel_xlsxwriter(self): - try: - import xlrd # noqa - import xlsxwriter # noqa - from pandas.io.excel import ExcelFile - except ImportError: - pytest.skip("Requires xlrd and xlsxwriter. Skipping test.") - - with ensure_clean('__tmp__.xlsx') as path: - self.panel.to_excel(path, engine='xlsxwriter') - try: - reader = ExcelFile(path) - except ImportError as e: - pytest.skip("cannot write excel file: %s" % e) - - for item, df in self.panel.iteritems(): - recdf = reader.parse(str(item), index_col=0) - assert_frame_equal(df, recdf) - - @pytest.mark.filterwarnings("ignore:'.reindex:FutureWarning") - def test_dropna(self): - p = Panel(np.random.randn(4, 5, 6), major_axis=list('abcde')) - p.loc[:, ['b', 'd'], 0] = np.nan - - result = p.dropna(axis=1) - exp = p.loc[:, ['a', 'c', 'e'], :] - assert_panel_equal(result, exp) - inp = p.copy() - inp.dropna(axis=1, inplace=True) - assert_panel_equal(inp, exp) - - result = p.dropna(axis=1, how='all') - assert_panel_equal(result, p) - - p.loc[:, ['b', 'd'], :] = np.nan - result = p.dropna(axis=1, how='all') - exp = p.loc[:, ['a', 'c', 'e'], :] - assert_panel_equal(result, exp) - - p = Panel(np.random.randn(4, 5, 6), items=list('abcd')) - p.loc[['b'], :, 0] = np.nan - - result = p.dropna() - exp = p.loc[['a', 'c', 'd']] - assert_panel_equal(result, exp) - - result = p.dropna(how='all') - assert_panel_equal(result, p) - - p.loc['b'] = np.nan - result = p.dropna(how='all') - exp = p.loc[['a', 'c', 'd']] - assert_panel_equal(result, exp) - - def test_drop(self): - df = DataFrame({"A": [1, 2], "B": [3, 4]}) - panel = Panel({"One": df, "Two": df}) - - def check_drop(drop_val, axis_number, aliases, expected): - try: - actual = panel.drop(drop_val, axis=axis_number) - assert_panel_equal(actual, expected) - for alias in aliases: - actual = panel.drop(drop_val, axis=alias) - assert_panel_equal(actual, expected) - except AssertionError: - pprint_thing("Failed with axis_number %d and aliases: %s" % - (axis_number, aliases)) - raise - # Items - expected = Panel({"One": df}) - check_drop('Two', 0, ['items'], expected) - - pytest.raises(KeyError, panel.drop, 'Three') - - # errors = 'ignore' - dropped = panel.drop('Three', errors='ignore') - assert_panel_equal(dropped, panel) - dropped = panel.drop(['Two', 'Three'], errors='ignore') - expected = Panel({"One": df}) - 
assert_panel_equal(dropped, expected) - - # Major - exp_df = DataFrame({"A": [2], "B": [4]}, index=[1]) - expected = Panel({"One": exp_df, "Two": exp_df}) - check_drop(0, 1, ['major_axis', 'major'], expected) - - exp_df = DataFrame({"A": [1], "B": [3]}, index=[0]) - expected = Panel({"One": exp_df, "Two": exp_df}) - check_drop([1], 1, ['major_axis', 'major'], expected) - - # Minor - exp_df = df[['B']] - expected = Panel({"One": exp_df, "Two": exp_df}) - check_drop(["A"], 2, ['minor_axis', 'minor'], expected) - - exp_df = df[['A']] - expected = Panel({"One": exp_df, "Two": exp_df}) - check_drop("B", 2, ['minor_axis', 'minor'], expected) - - def test_update(self): - pan = Panel([[[1.5, np.nan, 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.], - [1.5, np.nan, 3.]], - [[1.5, np.nan, 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.], - [1.5, np.nan, 3.]]]) - - other = Panel( - [[[3.6, 2., np.nan], [np.nan, np.nan, 7]]], items=[1]) - - pan.update(other) - - expected = Panel([[[1.5, np.nan, 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.], [1.5, np.nan, 3.]], - [[3.6, 2., 3], [1.5, np.nan, 7], - [1.5, np.nan, 3.], - [1.5, np.nan, 3.]]]) - - assert_panel_equal(pan, expected) - - def test_update_from_dict(self): - pan = Panel({'one': DataFrame([[1.5, np.nan, 3], - [1.5, np.nan, 3], - [1.5, np.nan, 3.], - [1.5, np.nan, 3.]]), - 'two': DataFrame([[1.5, np.nan, 3.], - [1.5, np.nan, 3.], - [1.5, np.nan, 3.], - [1.5, np.nan, 3.]])}) - - other = {'two': DataFrame( - [[3.6, 2., np.nan], [np.nan, np.nan, 7]])} - - pan.update(other) - - expected = Panel( - {'one': DataFrame([[1.5, np.nan, 3.], - [1.5, np.nan, 3.], - [1.5, np.nan, 3.], - [1.5, np.nan, 3.]]), - 'two': DataFrame([[3.6, 2., 3], - [1.5, np.nan, 7], - [1.5, np.nan, 3.], - [1.5, np.nan, 3.]]) - } - ) - - assert_panel_equal(pan, expected) - - def test_update_nooverwrite(self): - pan = Panel([[[1.5, np.nan, 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.], - [1.5, np.nan, 3.]], - [[1.5, np.nan, 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.], - [1.5, np.nan, 3.]]]) - - other = Panel( - [[[3.6, 2., np.nan], [np.nan, np.nan, 7]]], items=[1]) - - pan.update(other, overwrite=False) - - expected = Panel([[[1.5, np.nan, 3], [1.5, np.nan, 3], - [1.5, np.nan, 3.], [1.5, np.nan, 3.]], - [[1.5, 2., 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.], - [1.5, np.nan, 3.]]]) - - assert_panel_equal(pan, expected) - - def test_update_filtered(self): - pan = Panel([[[1.5, np.nan, 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.], - [1.5, np.nan, 3.]], - [[1.5, np.nan, 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.], - [1.5, np.nan, 3.]]]) - - other = Panel( - [[[3.6, 2., np.nan], [np.nan, np.nan, 7]]], items=[1]) - - pan.update(other, filter_func=lambda x: x > 2) - - expected = Panel([[[1.5, np.nan, 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.], [1.5, np.nan, 3.]], - [[1.5, np.nan, 3], [1.5, np.nan, 7], - [1.5, np.nan, 3.], [1.5, np.nan, 3.]]]) - - assert_panel_equal(pan, expected) - @pytest.mark.parametrize('bad_kwarg, exception, msg', [ # errors must be 'ignore' or 'raise' ({'errors': 'something'}, ValueError, 'The parameter errors must.*'), @@ -2369,242 +557,6 @@ def test_update_deprecation(self, raise_conflict): with tm.assert_produces_warning(FutureWarning): pan.update(other, raise_conflict=raise_conflict) - def test_all_any(self): - assert (self.panel.all(axis=0).values == nanall( - self.panel, axis=0)).all() - assert (self.panel.all(axis=1).values == nanall( - self.panel, axis=1).T).all() - assert (self.panel.all(axis=2).values == nanall( - self.panel, axis=2).T).all() - assert 
(self.panel.any(axis=0).values == nanany( - self.panel, axis=0)).all() - assert (self.panel.any(axis=1).values == nanany( - self.panel, axis=1).T).all() - assert (self.panel.any(axis=2).values == nanany( - self.panel, axis=2).T).all() - - def test_all_any_unhandled(self): - pytest.raises(NotImplementedError, self.panel.all, bool_only=True) - pytest.raises(NotImplementedError, self.panel.any, bool_only=True) - - # GH issue 15960 - def test_sort_values(self): - pytest.raises(NotImplementedError, self.panel.sort_values) - pytest.raises(NotImplementedError, self.panel.sort_values, 'ItemA') - - -@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") -class TestPanelFrame(object): - """ - Check that conversions to and from Panel to DataFrame work. - """ - - def setup_method(self, method): - panel = make_test_panel() - self.panel = panel.to_frame() - self.unfiltered_panel = panel.to_frame(filter_observations=False) - - def test_ops_differently_indexed(self): - # trying to set non-identically indexed panel - wp = self.panel.to_panel() - wp2 = wp.reindex(major=wp.major_axis[:-1]) - lp2 = wp2.to_frame() - - result = self.panel + lp2 - assert_frame_equal(result.reindex(lp2.index), lp2 * 2) - - # careful, mutation - self.panel['foo'] = lp2['ItemA'] - assert_series_equal(self.panel['foo'].reindex(lp2.index), - lp2['ItemA'], - check_names=False) - - def test_ops_scalar(self): - result = self.panel.mul(2) - expected = DataFrame.__mul__(self.panel, 2) - assert_frame_equal(result, expected) - - def test_combineFrame(self): - wp = self.panel.to_panel() - result = self.panel.add(wp['ItemA'].stack(), axis=0) - assert_frame_equal(result.to_panel()['ItemA'], wp['ItemA'] * 2) - - def test_combinePanel(self): - wp = self.panel.to_panel() - result = self.panel.add(self.panel) - wide_result = result.to_panel() - assert_frame_equal(wp['ItemA'] * 2, wide_result['ItemA']) - - # one item - result = self.panel.add(self.panel.filter(['ItemA'])) - - def test_combine_scalar(self): - result = self.panel.mul(2) - expected = DataFrame(self.panel._data) * 2 - assert_frame_equal(result, expected) - - def test_combine_series(self): - s = self.panel['ItemA'][:10] - result = self.panel.add(s, axis=0) - expected = DataFrame.add(self.panel, s, axis=0) - assert_frame_equal(result, expected) - - s = self.panel.iloc[5] - result = self.panel + s - expected = DataFrame.add(self.panel, s, axis=1) - assert_frame_equal(result, expected) - - def test_operators(self): - wp = self.panel.to_panel() - result = (self.panel + 1).to_panel() - assert_frame_equal(wp['ItemA'] + 1, result['ItemA']) - - def test_arith_flex_panel(self): - ops = ['add', 'sub', 'mul', 'div', - 'truediv', 'pow', 'floordiv', 'mod'] - if not compat.PY3: - aliases = {} - else: - aliases = {'div': 'truediv'} - self.panel = self.panel.to_panel() - - for n in [np.random.randint(-50, -1), np.random.randint(1, 50), 0]: - for op in ops: - alias = aliases.get(op, op) - f = getattr(operator, alias) - exp = f(self.panel, n) - result = getattr(self.panel, op)(n) - assert_panel_equal(result, exp, check_panel_type=True) - - # rops - r_f = lambda x, y: f(y, x) - exp = r_f(self.panel, n) - result = getattr(self.panel, 'r' + op)(n) - assert_panel_equal(result, exp) - - def test_sort(self): - def is_sorted(arr): - return (arr[1:] > arr[:-1]).any() - - sorted_minor = self.panel.sort_index(level=1) - assert is_sorted(sorted_minor.index.codes[1]) - - sorted_major = sorted_minor.sort_index(level=0) - assert is_sorted(sorted_major.index.codes[0]) - - def test_to_string(self): - buf = 
StringIO() - self.panel.to_string(buf) - - def test_to_sparse(self): - if isinstance(self.panel, Panel): - msg = 'sparsifying is not supported' - with pytest.raises(NotImplementedError, match=msg): - self.panel.to_sparse - - def test_truncate(self): - dates = self.panel.index.levels[0] - start, end = dates[1], dates[5] - - trunced = self.panel.truncate(start, end).to_panel() - expected = self.panel.to_panel()['ItemA'].truncate(start, end) - - # TODO truncate drops index.names - assert_frame_equal(trunced['ItemA'], expected, check_names=False) - - trunced = self.panel.truncate(before=start).to_panel() - expected = self.panel.to_panel()['ItemA'].truncate(before=start) - - # TODO truncate drops index.names - assert_frame_equal(trunced['ItemA'], expected, check_names=False) - - trunced = self.panel.truncate(after=end).to_panel() - expected = self.panel.to_panel()['ItemA'].truncate(after=end) - - # TODO truncate drops index.names - assert_frame_equal(trunced['ItemA'], expected, check_names=False) - - # truncate on dates that aren't in there - wp = self.panel.to_panel() - new_index = wp.major_axis[::5] - - wp2 = wp.reindex(major=new_index) - - lp2 = wp2.to_frame() - lp_trunc = lp2.truncate(wp.major_axis[2], wp.major_axis[-2]) - - wp_trunc = wp2.truncate(wp.major_axis[2], wp.major_axis[-2]) - - assert_panel_equal(wp_trunc, lp_trunc.to_panel()) - - # throw proper exception - pytest.raises(Exception, lp2.truncate, wp.major_axis[-2], - wp.major_axis[2]) - - def test_axis_dummies(self): - from pandas.core.reshape.reshape import make_axis_dummies - - minor_dummies = make_axis_dummies(self.panel, 'minor').astype(np.uint8) - assert len(minor_dummies.columns) == len(self.panel.index.levels[1]) - - major_dummies = make_axis_dummies(self.panel, 'major').astype(np.uint8) - assert len(major_dummies.columns) == len(self.panel.index.levels[0]) - - mapping = {'A': 'one', 'B': 'one', 'C': 'two', 'D': 'two'} - - transformed = make_axis_dummies(self.panel, 'minor', - transform=mapping.get).astype(np.uint8) - assert len(transformed.columns) == 2 - tm.assert_index_equal(transformed.columns, Index(['one', 'two'])) - - # TODO: test correctness - - def test_get_dummies(self): - from pandas.core.reshape.reshape import get_dummies, make_axis_dummies - - self.panel['Label'] = self.panel.index.codes[1] - minor_dummies = make_axis_dummies(self.panel, 'minor').astype(np.uint8) - dummies = get_dummies(self.panel['Label']) - tm.assert_numpy_array_equal(dummies.values, minor_dummies.values) - - def test_mean(self): - means = self.panel.mean(level='minor') - - # test versus Panel version - wide_means = self.panel.to_panel().mean('major') - assert_frame_equal(means, wide_means) - - def test_sum(self): - sums = self.panel.sum(level='minor') - - # test versus Panel version - wide_sums = self.panel.to_panel().sum('major') - assert_frame_equal(sums, wide_sums) - - def test_count(self): - index = self.panel.index - - major_count = self.panel.count(level=0)['ItemA'] - level_codes = index.codes[0] - for i, idx in enumerate(index.levels[0]): - assert major_count[i] == (level_codes == i).sum() - - minor_count = self.panel.count(level=1)['ItemA'] - level_codes = index.codes[1] - for i, idx in enumerate(index.levels[1]): - assert minor_count[i] == (level_codes == i).sum() - - def test_join(self): - lp1 = self.panel.filter(['ItemA', 'ItemB']) - lp2 = self.panel.filter(['ItemC']) - - joined = lp1.join(lp2) - - assert len(joined.columns) == 3 - - pytest.raises(Exception, lp1.join, - self.panel.filter(['ItemB', 'ItemC'])) - def 
test_panel_index(): index = panelm.panel_index([1, 2, 3, 4], [1, 2, 3]) diff --git a/pandas/tests/test_sorting.py b/pandas/tests/test_sorting.py index 7500cbb3cfc3a..2a64947042979 100644 --- a/pandas/tests/test_sorting.py +++ b/pandas/tests/test_sorting.py @@ -7,6 +7,8 @@ from numpy import nan import pytest +from pandas.compat import PY2 + from pandas import DataFrame, MultiIndex, Series, compat, concat, merge from pandas.core import common as com from pandas.core.sorting import ( @@ -403,15 +405,21 @@ def test_mixed_integer_from_list(self): expected = np.array([0, 0, 1, 'a', 'b', 'b'], dtype=object) tm.assert_numpy_array_equal(result, expected) + @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails") def test_unsortable(self): # GH 13714 arr = np.array([1, 2, datetime.now(), 0, 3], dtype=object) + msg = (r"'(<|>)' not supported between instances of" + r" 'datetime\.datetime' and 'int'|" + r"unorderable types: int\(\) > datetime\.datetime\(\)") if compat.PY2: # RuntimeWarning: tp_compare didn't return -1 or -2 for exception with warnings.catch_warnings(): - pytest.raises(TypeError, safe_sort, arr) + with pytest.raises(TypeError, match=msg): + safe_sort(arr) else: - pytest.raises(TypeError, safe_sort, arr) + with pytest.raises(TypeError, match=msg): + safe_sort(arr) def test_exceptions(self): with pytest.raises(TypeError, diff --git a/pandas/tests/test_strings.py b/pandas/tests/test_strings.py index 7cea3be03d1a7..40a83f90c8dfd 100644 --- a/pandas/tests/test_strings.py +++ b/pandas/tests/test_strings.py @@ -10,7 +10,7 @@ import pytest import pandas.compat as compat -from pandas.compat import PY3, range, u +from pandas.compat import PY2, PY3, range, u from pandas import DataFrame, Index, MultiIndex, Series, concat, isna, notna import pandas.core.strings as strings @@ -76,7 +76,7 @@ def assert_series_or_index_equal(left, right): 'len', 'lower', 'lstrip', 'partition', 'rpartition', 'rsplit', 'rstrip', 'slice', 'slice_replace', 'split', - 'strip', 'swapcase', 'title', 'upper' + 'strip', 'swapcase', 'title', 'upper', 'casefold' ], [()] * 100, [{}] * 100)) ids, _, _ = zip(*_any_string_method) # use method name as fixture-id @@ -1002,11 +1002,13 @@ def test_replace(self): tm.assert_series_equal(result, exp) # GH 13438 + msg = "repl must be a string or callable" for klass in (Series, Index): for repl in (None, 3, {'a': 'b'}): for data in (['a', 'b', None], ['a', 'b', 'c', 'ad']): values = klass(data) - pytest.raises(TypeError, values.str.replace, 'a', repl) + with pytest.raises(TypeError, match=msg): + values.str.replace('a', repl) def test_replace_callable(self): # GH 15055 @@ -1123,10 +1125,14 @@ def test_replace_literal(self): callable_repl = lambda m: m.group(0).swapcase() compiled_pat = re.compile('[a-z][A-Z]{2}') - pytest.raises(ValueError, values.str.replace, 'abc', callable_repl, - regex=False) - pytest.raises(ValueError, values.str.replace, compiled_pat, '', - regex=False) + msg = "Cannot use a callable replacement when regex=False" + with pytest.raises(ValueError, match=msg): + values.str.replace('abc', callable_repl, regex=False) + + msg = ("Cannot use a compiled regex as replacement pattern with" + " regex=False") + with pytest.raises(ValueError, match=msg): + values.str.replace(compiled_pat, '', regex=False) def test_repeat(self): values = Series(['a', 'b', NA, 'c', NA, 'd']) @@ -1242,12 +1248,13 @@ def test_extract_expand_False(self): for klass in [Series, Index]: # no groups s_or_idx = klass(['A1', 'B2', 'C3']) - f = lambda: s_or_idx.str.extract('[ABC][123]', 
expand=False)
-            pytest.raises(ValueError, f)
+            msg = "pattern contains no capture groups"
+            with pytest.raises(ValueError, match=msg):
+                s_or_idx.str.extract('[ABC][123]', expand=False)

             # only non-capturing groups
-            f = lambda: s_or_idx.str.extract('(?:[AB]).*', expand=False)
-            pytest.raises(ValueError, f)
+            with pytest.raises(ValueError, match=msg):
+                s_or_idx.str.extract('(?:[AB]).*', expand=False)

             # single group renames series/index properly
             s_or_idx = klass(['A1', 'A2'])
@@ -1387,12 +1394,13 @@ def test_extract_expand_True(self):
         for klass in [Series, Index]:
             # no groups
             s_or_idx = klass(['A1', 'B2', 'C3'])
-            f = lambda: s_or_idx.str.extract('[ABC][123]', expand=True)
-            pytest.raises(ValueError, f)
+            msg = "pattern contains no capture groups"
+            with pytest.raises(ValueError, match=msg):
+                s_or_idx.str.extract('[ABC][123]', expand=True)

             # only non-capturing groups
-            f = lambda: s_or_idx.str.extract('(?:[AB]).*', expand=True)
-            pytest.raises(ValueError, f)
+            with pytest.raises(ValueError, match=msg):
+                s_or_idx.str.extract('(?:[AB]).*', expand=True)

             # single group renames series/index properly
             s_or_idx = klass(['A1', 'A2'])
@@ -3315,10 +3323,14 @@ def test_encode_decode(self):

         tm.assert_series_equal(result, exp)

+    @pytest.mark.skipif(PY2, reason="pytest.raises match regex fails")
     def test_encode_decode_errors(self):
         encodeBase = Series([u('a'), u('b'), u('a\x9d')])

-        pytest.raises(UnicodeEncodeError, encodeBase.str.encode, 'cp1252')
+        msg = (r"'charmap' codec can't encode character '\\x9d' in position 1:"
+               " character maps to <undefined>")
+        with pytest.raises(UnicodeEncodeError, match=msg):
+            encodeBase.str.encode('cp1252')

         f = lambda x: x.encode('cp1252', 'ignore')
         result = encodeBase.str.encode('cp1252', 'ignore')
@@ -3327,7 +3339,10 @@ def test_encode_decode_errors(self):

         decodeBase = Series([b'a', b'b', b'a\x9d'])

-        pytest.raises(UnicodeDecodeError, decodeBase.str.decode, 'cp1252')
+        msg = ("'charmap' codec can't decode byte 0x9d in position 1:"
+               " character maps to <undefined>")
+        with pytest.raises(UnicodeDecodeError, match=msg):
+            decodeBase.str.decode('cp1252')

         f = lambda x: x.decode('cp1252', 'ignore')
         result = decodeBase.str.decode('cp1252', 'ignore')
@@ -3418,9 +3433,19 @@ def test_method_on_bytes(self):
         lhs = Series(np.array(list('abc'), 'S1').astype(object))
         rhs = Series(np.array(list('def'), 'S1').astype(object))
         if compat.PY3:
-            pytest.raises(TypeError, lhs.str.cat, rhs)
+            with pytest.raises(TypeError, match="can't concat str to bytes"):
+                lhs.str.cat(rhs)
         else:
             result = lhs.str.cat(rhs)
             expected = Series(np.array(
                 ['ad', 'be', 'cf'], 'S2').astype(object))
             tm.assert_series_equal(result, expected)
+
+    @pytest.mark.skipif(compat.PY2, reason='not in python2')
+    def test_casefold(self):
+        # GH25405
+        expected = Series(['ss', NA, 'case', 'ssd'])
+        s = Series(['ß', NA, 'case', 'ßd'])
+        result = s.str.casefold()
+
+        tm.assert_series_equal(result, expected)
diff --git a/pandas/tests/test_window.py b/pandas/tests/test_window.py
index e816d4c04344a..ce9d1888b8e96 100644
--- a/pandas/tests/test_window.py
+++ b/pandas/tests/test_window.py
@@ -89,9 +89,8 @@ def test_getitem(self):

     def test_select_bad_cols(self):
         df = DataFrame([[1, 2]], columns=['A', 'B'])
         g = df.rolling(window=5)
-        pytest.raises(KeyError, g.__getitem__, ['C'])  # g[['C']]
-
-        pytest.raises(KeyError, g.__getitem__, ['A', 'C'])  # g[['A', 'C']]
+        with pytest.raises(KeyError, match="Columns not found: 'C'"):
+            g[['C']]

         with pytest.raises(KeyError, match='^[^A]+$'):
             # A should not be referenced as a bad column...
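            # (match='^[^A]+$' succeeds only when the letter A is absent from
            # the entire error message)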
# will have to rethink regex if you change message! @@ -102,7 +101,9 @@ def test_attribute_access(self): df = DataFrame([[1, 2]], columns=['A', 'B']) r = df.rolling(window=5) tm.assert_series_equal(r.A.sum(), r['A'].sum()) - pytest.raises(AttributeError, lambda: r.F) + msg = "'Rolling' object has no attribute 'F'" + with pytest.raises(AttributeError, match=msg): + r.F def tests_skip_nuisance(self): @@ -217,12 +218,11 @@ def test_agg_nested_dicts(self): df = DataFrame({'A': range(5), 'B': range(0, 10, 2)}) r = df.rolling(window=3) - def f(): + msg = r"cannot perform renaming for (r1|r2) with a nested dictionary" + with pytest.raises(SpecificationError, match=msg): r.aggregate({'r1': {'A': ['mean', 'sum']}, 'r2': {'B': ['mean', 'sum']}}) - pytest.raises(SpecificationError, f) - expected = concat([r['A'].mean(), r['A'].std(), r['B'].mean(), r['B'].std()], axis=1) expected.columns = pd.MultiIndex.from_tuples([('ra', 'mean'), ( @@ -1806,26 +1806,38 @@ def test_ewm_alpha_arg(self): def test_ewm_domain_checks(self): # GH 12492 s = Series(self.arr) - # com must satisfy: com >= 0 - pytest.raises(ValueError, s.ewm, com=-0.1) + msg = "comass must satisfy: comass >= 0" + with pytest.raises(ValueError, match=msg): + s.ewm(com=-0.1) s.ewm(com=0.0) s.ewm(com=0.1) - # span must satisfy: span >= 1 - pytest.raises(ValueError, s.ewm, span=-0.1) - pytest.raises(ValueError, s.ewm, span=0.0) - pytest.raises(ValueError, s.ewm, span=0.9) + + msg = "span must satisfy: span >= 1" + with pytest.raises(ValueError, match=msg): + s.ewm(span=-0.1) + with pytest.raises(ValueError, match=msg): + s.ewm(span=0.0) + with pytest.raises(ValueError, match=msg): + s.ewm(span=0.9) s.ewm(span=1.0) s.ewm(span=1.1) - # halflife must satisfy: halflife > 0 - pytest.raises(ValueError, s.ewm, halflife=-0.1) - pytest.raises(ValueError, s.ewm, halflife=0.0) + + msg = "halflife must satisfy: halflife > 0" + with pytest.raises(ValueError, match=msg): + s.ewm(halflife=-0.1) + with pytest.raises(ValueError, match=msg): + s.ewm(halflife=0.0) s.ewm(halflife=0.1) - # alpha must satisfy: 0 < alpha <= 1 - pytest.raises(ValueError, s.ewm, alpha=-0.1) - pytest.raises(ValueError, s.ewm, alpha=0.0) + + msg = "alpha must satisfy: 0 < alpha <= 1" + with pytest.raises(ValueError, match=msg): + s.ewm(alpha=-0.1) + with pytest.raises(ValueError, match=msg): + s.ewm(alpha=0.0) s.ewm(alpha=0.1) s.ewm(alpha=1.0) - pytest.raises(ValueError, s.ewm, alpha=1.1) + with pytest.raises(ValueError, match=msg): + s.ewm(alpha=1.1) @pytest.mark.parametrize('method', ['mean', 'vol', 'var']) def test_ew_empty_series(self, method): @@ -2598,7 +2610,10 @@ def get_result(obj, obj2=None): def test_flex_binary_moment(self): # GH3155 # don't blow the stack - pytest.raises(TypeError, rwindow._flex_binary_moment, 5, 6, None) + msg = ("arguments to moment function must be of type" + " np.ndarray/Series/DataFrame") + with pytest.raises(TypeError, match=msg): + rwindow._flex_binary_moment(5, 6, None) def test_corr_sanity(self): # GH 3155 @@ -2682,7 +2697,10 @@ def func(A, B, com, **kwargs): Series([1.]), Series([1.]), 50, min_periods=min_periods) tm.assert_series_equal(result, Series([np.NaN])) - pytest.raises(Exception, func, A, randn(50), 20, min_periods=5) + msg = "Input arrays must be of the same type!" 
+ # exception raised is Exception + with pytest.raises(Exception, match=msg): + func(A, randn(50), 20, min_periods=5) def test_expanding_apply_args_kwargs(self, raw): @@ -3266,9 +3284,9 @@ def setup_method(self, method): def test_mutated(self): - def f(): + msg = r"group\(\) got an unexpected keyword argument 'foo'" + with pytest.raises(TypeError, match=msg): self.frame.groupby('A', foo=1) - pytest.raises(TypeError, f) g = self.frame.groupby('A') assert not g.mutated diff --git a/pandas/tests/tools/test_numeric.py b/pandas/tests/tools/test_numeric.py index 3822170d884aa..97e1dc2f6aefc 100644 --- a/pandas/tests/tools/test_numeric.py +++ b/pandas/tests/tools/test_numeric.py @@ -4,11 +4,50 @@ from numpy import iinfo import pytest +import pandas.compat as compat + import pandas as pd from pandas import DataFrame, Index, Series, to_numeric from pandas.util import testing as tm +@pytest.fixture(params=[None, "ignore", "raise", "coerce"]) +def errors(request): + return request.param + + +@pytest.fixture(params=[True, False]) +def signed(request): + return request.param + + +@pytest.fixture(params=[lambda x: x, str], ids=["identity", "str"]) +def transform(request): + return request.param + + +@pytest.fixture(params=[ + 47393996303418497800, + 100000000000000000000 +]) +def large_val(request): + return request.param + + +@pytest.fixture(params=[True, False]) +def multiple_elts(request): + return request.param + + +@pytest.fixture(params=[ + (lambda x: Index(x, name="idx"), tm.assert_index_equal), + (lambda x: Series(x, name="ser"), tm.assert_series_equal), + (lambda x: np.array(Index(x).values), tm.assert_numpy_array_equal) +]) +def transform_assert_equal(request): + return request.param + + @pytest.mark.parametrize("input_kwargs,result_kwargs", [ (dict(), dict(dtype=np.int64)), (dict(errors="coerce", downcast="integer"), dict(dtype=np.int8)) @@ -172,7 +211,6 @@ def test_all_nan(): tm.assert_series_equal(result, expected) -@pytest.mark.parametrize("errors", [None, "ignore", "raise", "coerce"]) def test_type_check(errors): # see gh-11776 df = DataFrame({"a": [1, -3.14, 7], "b": ["4", "5", "6"]}) @@ -183,11 +221,100 @@ def test_type_check(errors): to_numeric(df, **kwargs) -@pytest.mark.parametrize("val", [ - 1, 1.1, "1", "1.1", -1.5, "-1.5" -]) -def test_scalar(val): - assert to_numeric(val) == float(val) +@pytest.mark.parametrize("val", [1, 1.1, 20001]) +def test_scalar(val, signed, transform): + val = -val if signed else val + assert to_numeric(transform(val)) == float(val) + + +def test_really_large_scalar(large_val, signed, transform, errors): + # see gh-24910 + kwargs = dict(errors=errors) if errors is not None else dict() + val = -large_val if signed else large_val + + val = transform(val) + val_is_string = isinstance(val, str) + + if val_is_string and errors in (None, "raise"): + msg = "Integer out of range. 
at position 0" + with pytest.raises(ValueError, match=msg): + to_numeric(val, **kwargs) + else: + expected = float(val) if (errors == "coerce" and + val_is_string) else val + assert tm.assert_almost_equal(to_numeric(val, **kwargs), expected) + + +def test_really_large_in_arr(large_val, signed, transform, + multiple_elts, errors): + # see gh-24910 + kwargs = dict(errors=errors) if errors is not None else dict() + val = -large_val if signed else large_val + val = transform(val) + + extra_elt = "string" + arr = [val] + multiple_elts * [extra_elt] + + val_is_string = isinstance(val, str) + coercing = errors == "coerce" + + if errors in (None, "raise") and (val_is_string or multiple_elts): + if val_is_string: + msg = "Integer out of range. at position 0" + else: + msg = 'Unable to parse string "string" at position 1' + + with pytest.raises(ValueError, match=msg): + to_numeric(arr, **kwargs) + else: + result = to_numeric(arr, **kwargs) + + exp_val = float(val) if (coercing and val_is_string) else val + expected = [exp_val] + + if multiple_elts: + if coercing: + expected.append(np.nan) + exp_dtype = float + else: + expected.append(extra_elt) + exp_dtype = object + else: + exp_dtype = float if isinstance(exp_val, ( + int, compat.long, float)) else object + + tm.assert_almost_equal(result, np.array(expected, dtype=exp_dtype)) + + +def test_really_large_in_arr_consistent(large_val, signed, + multiple_elts, errors): + # see gh-24910 + # + # Even if we discover that we have to hold float, does not mean + # we should be lenient on subsequent elements that fail to be integer. + kwargs = dict(errors=errors) if errors is not None else dict() + arr = [str(-large_val if signed else large_val)] + + if multiple_elts: + arr.insert(0, large_val) + + if errors in (None, "raise"): + index = int(multiple_elts) + msg = "Integer out of range. 
at position {index}".format(index=index) + + with pytest.raises(ValueError, match=msg): + to_numeric(arr, **kwargs) + else: + result = to_numeric(arr, **kwargs) + + if errors == "coerce": + expected = [float(i) for i in arr] + exp_dtype = float + else: + expected = arr + exp_dtype = object + + tm.assert_almost_equal(result, np.array(expected, dtype=exp_dtype)) @pytest.mark.parametrize("errors,checker", [ @@ -205,15 +332,6 @@ def test_scalar_fail(errors, checker): assert checker(to_numeric(scalar, errors=errors)) -@pytest.fixture(params=[ - (lambda x: Index(x, name="idx"), tm.assert_index_equal), - (lambda x: Series(x, name="ser"), tm.assert_series_equal), - (lambda x: np.array(Index(x).values), tm.assert_numpy_array_equal) -]) -def transform_assert_equal(request): - return request.param - - @pytest.mark.parametrize("data", [ [1, 2, 3], [1., np.nan, 3, np.nan] diff --git a/pandas/tests/tseries/frequencies/__init__.py b/pandas/tests/tseries/frequencies/__init__.py new file mode 100644 index 0000000000000..e69de29bb2d1d diff --git a/pandas/tests/tseries/frequencies/test_freq_code.py b/pandas/tests/tseries/frequencies/test_freq_code.py new file mode 100644 index 0000000000000..0aa29e451b1ba --- /dev/null +++ b/pandas/tests/tseries/frequencies/test_freq_code.py @@ -0,0 +1,149 @@ +import pytest + +from pandas._libs.tslibs import frequencies as libfrequencies, resolution +from pandas._libs.tslibs.frequencies import ( + FreqGroup, _period_code_map, get_freq, get_freq_code) +import pandas.compat as compat + +import pandas.tseries.offsets as offsets + + +@pytest.fixture(params=list(compat.iteritems(_period_code_map))) +def period_code_item(request): + return request.param + + +@pytest.mark.parametrize("freqstr,expected", [ + ("A", 1000), ("3A", 1000), ("-1A", 1000), + ("Y", 1000), ("3Y", 1000), ("-1Y", 1000), + ("W", 4000), ("W-MON", 4001), ("W-FRI", 4005) +]) +def test_freq_code(freqstr, expected): + assert get_freq(freqstr) == expected + + +def test_freq_code_match(period_code_item): + freqstr, code = period_code_item + assert get_freq(freqstr) == code + + +@pytest.mark.parametrize("freqstr,expected", [ + ("A", 1000), ("3A", 1000), ("-1A", 1000), ("A-JAN", 1000), + ("A-MAY", 1000), ("Y", 1000), ("3Y", 1000), ("-1Y", 1000), + ("Y-JAN", 1000), ("Y-MAY", 1000), (offsets.YearEnd(), 1000), + (offsets.YearEnd(month=1), 1000), (offsets.YearEnd(month=5), 1000), + ("W", 4000), ("W-MON", 4000), ("W-FRI", 4000), (offsets.Week(), 4000), + (offsets.Week(weekday=1), 4000), (offsets.Week(weekday=5), 4000), + ("T", FreqGroup.FR_MIN), +]) +def test_freq_group(freqstr, expected): + assert resolution.get_freq_group(freqstr) == expected + + +def test_freq_group_match(period_code_item): + freqstr, code = period_code_item + + str_group = resolution.get_freq_group(freqstr) + code_group = resolution.get_freq_group(code) + + assert str_group == code_group == code // 1000 * 1000 + + +@pytest.mark.parametrize("freqstr,exp_freqstr", [ + ("D", "D"), ("W", "D"), ("M", "D"), + ("S", "S"), ("T", "S"), ("H", "S") +]) +def test_get_to_timestamp_base(freqstr, exp_freqstr): + tsb = libfrequencies.get_to_timestamp_base + + assert tsb(get_freq_code(freqstr)[0]) == get_freq_code(exp_freqstr)[0] + + +_reso = resolution.Resolution + + +@pytest.mark.parametrize("freqstr,expected", [ + ("A", "year"), ("Q", "quarter"), ("M", "month"), + ("D", "day"), ("H", "hour"), ("T", "minute"), + ("S", "second"), ("L", "millisecond"), + ("U", "microsecond"), ("N", "nanosecond") +]) +def test_get_str_from_freq(freqstr, expected): + assert 
_reso.get_str_from_freq(freqstr) == expected + + +@pytest.mark.parametrize("freq", ["A", "Q", "M", "D", "H", + "T", "S", "L", "U", "N"]) +def test_get_freq_roundtrip(freq): + result = _reso.get_freq(_reso.get_str_from_freq(freq)) + assert freq == result + + +@pytest.mark.parametrize("freq", ["D", "H", "T", "S", "L", "U"]) +def test_get_freq_roundtrip2(freq): + result = _reso.get_freq(_reso.get_str(_reso.get_reso_from_freq(freq))) + assert freq == result + + +@pytest.mark.parametrize("args,expected", [ + ((1.5, "T"), (90, "S")), ((62.4, "T"), (3744, "S")), + ((1.04, "H"), (3744, "S")), ((1, "D"), (1, "D")), + ((0.342931, "H"), (1234551600, "U")), ((1.2345, "D"), (106660800, "L")) +]) +def test_resolution_bumping(args, expected): + # see gh-14378 + assert _reso.get_stride_from_decimal(*args) == expected + + +@pytest.mark.parametrize("args", [ + (0.5, "N"), + + # Too much precision in the input can prevent. + (0.3429324798798269273987982, "H") +]) +def test_cat(args): + msg = "Could not convert to integer offset at any resolution" + + with pytest.raises(ValueError, match=msg): + _reso.get_stride_from_decimal(*args) + + +@pytest.mark.parametrize("freq_input,expected", [ + # Frequency string. + ("A", (get_freq("A"), 1)), + ("3D", (get_freq("D"), 3)), + ("-2M", (get_freq("M"), -2)), + + # Tuple. + (("D", 1), (get_freq("D"), 1)), + (("A", 3), (get_freq("A"), 3)), + (("M", -2), (get_freq("M"), -2)), + ((5, "T"), (FreqGroup.FR_MIN, 5)), + + # Numeric Tuple. + ((1000, 1), (1000, 1)), + + # Offsets. + (offsets.Day(), (get_freq("D"), 1)), + (offsets.Day(3), (get_freq("D"), 3)), + (offsets.Day(-2), (get_freq("D"), -2)), + (offsets.MonthEnd(), (get_freq("M"), 1)), + (offsets.MonthEnd(3), (get_freq("M"), 3)), + (offsets.MonthEnd(-2), (get_freq("M"), -2)), + (offsets.Week(), (get_freq("W"), 1)), + (offsets.Week(3), (get_freq("W"), 3)), + (offsets.Week(-2), (get_freq("W"), -2)), + (offsets.Hour(), (FreqGroup.FR_HR, 1)), + + # Monday is weekday=0. + (offsets.Week(weekday=1), (get_freq("W-TUE"), 1)), + (offsets.Week(3, weekday=0), (get_freq("W-MON"), 3)), + (offsets.Week(-2, weekday=4), (get_freq("W-FRI"), -2)), +]) +def test_get_freq_code(freq_input, expected): + assert get_freq_code(freq_input) == expected + + +def test_get_code_invalid(): + with pytest.raises(ValueError, match="Invalid frequency"): + get_freq_code((5, "baz")) diff --git a/pandas/tests/tseries/frequencies/test_inference.py b/pandas/tests/tseries/frequencies/test_inference.py new file mode 100644 index 0000000000000..9e7ddbc45bba8 --- /dev/null +++ b/pandas/tests/tseries/frequencies/test_inference.py @@ -0,0 +1,406 @@ +from datetime import datetime, timedelta + +import numpy as np +import pytest + +from pandas._libs.tslibs.ccalendar import DAYS, MONTHS +from pandas._libs.tslibs.frequencies import INVALID_FREQ_ERR_MSG +import pandas.compat as compat +from pandas.compat import is_platform_windows, range + +from pandas import ( + DatetimeIndex, Index, Series, Timestamp, date_range, period_range) +from pandas.core.tools.datetimes import to_datetime +import pandas.util.testing as tm + +import pandas.tseries.frequencies as frequencies +import pandas.tseries.offsets as offsets + + +def _check_generated_range(start, periods, freq): + """ + Check the range generated from a given start, frequency, and period count. + + Parameters + ---------- + start : str + The start date. + periods : int + The number of periods. + freq : str + The frequency of the range. 
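+
+    Notes
+    -----
+    Quarterly ranges are only checked up to anchor equivalence: Q-DEC,
+    Q-SEP, Q-JUN and Q-MAR (and likewise the Q-NOV and Q-OCT groups)
+    generate identical dates, so infer_freq reports one representative
+    anchor per group.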
+    """
+    freq = freq.upper()
+
+    gen = date_range(start, periods=periods, freq=freq)
+    index = DatetimeIndex(gen.values)
+
+    if not freq.startswith("Q-"):
+        assert frequencies.infer_freq(index) == gen.freqstr
+    else:
+        inf_freq = frequencies.infer_freq(index)
+        is_dec_range = inf_freq == "Q-DEC" and gen.freqstr in (
+            "Q", "Q-DEC", "Q-SEP", "Q-JUN", "Q-MAR")
+        is_nov_range = inf_freq == "Q-NOV" and gen.freqstr in (
+            "Q-NOV", "Q-AUG", "Q-MAY", "Q-FEB")
+        is_oct_range = inf_freq == "Q-OCT" and gen.freqstr in (
+            "Q-OCT", "Q-JUL", "Q-APR", "Q-JAN")
+        assert is_dec_range or is_nov_range or is_oct_range
+
+
+@pytest.fixture(params=[(timedelta(1), "D"),
+                        (timedelta(hours=1), "H"),
+                        (timedelta(minutes=1), "T"),
+                        (timedelta(seconds=1), "S"),
+                        (np.timedelta64(1, "ns"), "N"),
+                        (timedelta(microseconds=1), "U"),
+                        (timedelta(microseconds=1000), "L")])
+def base_delta_code_pair(request):
+    return request.param
+
+
+@pytest.fixture(params=[1, 2, 3, 4])
+def count(request):
+    return request.param
+
+
+@pytest.fixture(params=DAYS)
+def day(request):
+    return request.param
+
+
+@pytest.fixture(params=MONTHS)
+def month(request):
+    return request.param
+
+
+@pytest.fixture(params=[5, 7])
+def periods(request):
+    return request.param
+
+
+def test_raise_if_period_index():
+    index = period_range(start="1/1/1990", periods=20, freq="M")
+    msg = "Check the `freq` attribute instead of using infer_freq"
+
+    with pytest.raises(TypeError, match=msg):
+        frequencies.infer_freq(index)
+
+
+def test_raise_if_too_few():
+    index = DatetimeIndex(["12/31/1998", "1/3/1999"])
+    msg = "Need at least 3 dates to infer frequency"
+
+    with pytest.raises(ValueError, match=msg):
+        frequencies.infer_freq(index)
+
+
+def test_business_daily():
+    index = DatetimeIndex(["01/01/1999", "1/4/1999", "1/5/1999"])
+    assert frequencies.infer_freq(index) == "B"
+
+
+def test_business_daily_look_alike():
+    # see gh-16624
+    #
+    # Do not infer "B" when the "weekend" (2-day gap) is in the wrong place.
+    index = DatetimeIndex(["12/31/1998", "1/3/1999", "1/4/1999"])
+    assert frequencies.infer_freq(index) is None
+
+
+def test_day_corner():
+    index = DatetimeIndex(["1/1/2000", "1/2/2000", "1/3/2000"])
+    assert frequencies.infer_freq(index) == "D"
+
+
+def test_non_datetime_index():
+    dates = to_datetime(["1/1/2000", "1/2/2000", "1/3/2000"])
+    assert frequencies.infer_freq(dates) == "D"
+
+
+def test_fifth_week_of_month_infer():
+    # see gh-9425
+    #
+    # Only attempt to infer up to WOM-4.
+    index = DatetimeIndex(["2014-03-31", "2014-06-30", "2015-03-30"])
+    assert frequencies.infer_freq(index) is None
+
+
+def test_week_of_month_fake():
+    # All of these dates are on same day
+    # of week and are 4 or 5 weeks apart.
+    index = DatetimeIndex(["2013-08-27", "2013-10-01",
+                           "2013-10-29", "2013-11-26"])
+    assert frequencies.infer_freq(index) != "WOM-4TUE"
+
+
+def test_fifth_week_of_month():
+    # see gh-9425
+    #
+    # Only supports freq up to WOM-4.
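+    #
+    # With only `start` and `freq` supplied, the generic parameter-count
+    # check fires first (as the matched message below shows), so the call
+    # never gets as far as parsing the WOM-5 anchor.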
+ msg = ("Of the four parameters: start, end, periods, " + "and freq, exactly three must be specified") + + with pytest.raises(ValueError, match=msg): + date_range("2014-01-01", freq="WOM-5MON") + + +def test_monthly_ambiguous(): + rng = DatetimeIndex(["1/31/2000", "2/29/2000", "3/31/2000"]) + assert rng.inferred_freq == "M" + + +def test_annual_ambiguous(): + rng = DatetimeIndex(["1/31/2000", "1/31/2001", "1/31/2002"]) + assert rng.inferred_freq == "A-JAN" + + +def test_infer_freq_delta(base_delta_code_pair, count): + b = Timestamp(datetime.now()) + base_delta, code = base_delta_code_pair + + inc = base_delta * count + index = DatetimeIndex([b + inc * j for j in range(3)]) + + exp_freq = "%d%s" % (count, code) if count > 1 else code + assert frequencies.infer_freq(index) == exp_freq + + +@pytest.mark.parametrize("constructor", [ + lambda now, delta: DatetimeIndex([now + delta * 7] + + [now + delta * j for j in range(3)]), + lambda now, delta: DatetimeIndex([now + delta * j for j in range(3)] + + [now + delta * 7]) +]) +def test_infer_freq_custom(base_delta_code_pair, constructor): + b = Timestamp(datetime.now()) + base_delta, _ = base_delta_code_pair + + index = constructor(b, base_delta) + assert frequencies.infer_freq(index) is None + + +def test_weekly_infer(periods, day): + _check_generated_range("1/1/2000", periods, "W-{day}".format(day=day)) + + +def test_week_of_month_infer(periods, day, count): + _check_generated_range("1/1/2000", periods, + "WOM-{count}{day}".format(count=count, day=day)) + + +@pytest.mark.parametrize("freq", ["M", "BM", "BMS"]) +def test_monthly_infer(periods, freq): + _check_generated_range("1/1/2000", periods, "M") + + +def test_quarterly_infer(month, periods): + _check_generated_range("1/1/2000", periods, + "Q-{month}".format(month=month)) + + +@pytest.mark.parametrize("annual", ["A", "BA"]) +def test_annually_infer(month, periods, annual): + _check_generated_range("1/1/2000", periods, + "{annual}-{month}".format(annual=annual, + month=month)) + + +@pytest.mark.parametrize("freq,expected", [ + ("Q", "Q-DEC"), ("Q-NOV", "Q-NOV"), ("Q-OCT", "Q-OCT") +]) +def test_infer_freq_index(freq, expected): + rng = period_range("1959Q2", "2009Q3", freq=freq) + rng = Index(rng.to_timestamp("D", how="e").astype(object)) + + assert rng.inferred_freq == expected + + +@pytest.mark.parametrize( + "expected,dates", + list(compat.iteritems( + {"AS-JAN": ["2009-01-01", "2010-01-01", "2011-01-01", "2012-01-01"], + "Q-OCT": ["2009-01-31", "2009-04-30", "2009-07-31", "2009-10-31"], + "M": ["2010-11-30", "2010-12-31", "2011-01-31", "2011-02-28"], + "W-SAT": ["2010-12-25", "2011-01-01", "2011-01-08", "2011-01-15"], + "D": ["2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04"], + "H": ["2011-12-31 22:00", "2011-12-31 23:00", + "2012-01-01 00:00", "2012-01-01 01:00"]})) +) +def test_infer_freq_tz(tz_naive_fixture, expected, dates): + # see gh-7310 + tz = tz_naive_fixture + idx = DatetimeIndex(dates, tz=tz) + assert idx.inferred_freq == expected + + +@pytest.mark.parametrize("date_pair", [ + ["2013-11-02", "2013-11-5"], # Fall DST + ["2014-03-08", "2014-03-11"], # Spring DST + ["2014-01-01", "2014-01-03"] # Regular Time +]) +@pytest.mark.parametrize("freq", [ + "3H", "10T", "3601S", "3600001L", "3600000001U", "3600000000001N" +]) +def test_infer_freq_tz_transition(tz_naive_fixture, date_pair, freq): + # see gh-8772 + tz = tz_naive_fixture + idx = date_range(date_pair[0], date_pair[1], freq=freq, tz=tz) + assert idx.inferred_freq == freq + + +def test_infer_freq_tz_transition_custom(): + 
index = date_range("2013-11-03", periods=5, + freq="3H").tz_localize("America/Chicago") + assert index.inferred_freq is None + + +@pytest.mark.parametrize("data,expected", [ + # Hourly freq in a day must result in "H" + (["2014-07-01 09:00", "2014-07-01 10:00", "2014-07-01 11:00", + "2014-07-01 12:00", "2014-07-01 13:00", "2014-07-01 14:00"], "H"), + + (["2014-07-01 09:00", "2014-07-01 10:00", "2014-07-01 11:00", + "2014-07-01 12:00", "2014-07-01 13:00", "2014-07-01 14:00", + "2014-07-01 15:00", "2014-07-01 16:00", "2014-07-02 09:00", + "2014-07-02 10:00", "2014-07-02 11:00"], "BH"), + (["2014-07-04 09:00", "2014-07-04 10:00", "2014-07-04 11:00", + "2014-07-04 12:00", "2014-07-04 13:00", "2014-07-04 14:00", + "2014-07-04 15:00", "2014-07-04 16:00", "2014-07-07 09:00", + "2014-07-07 10:00", "2014-07-07 11:00"], "BH"), + (["2014-07-04 09:00", "2014-07-04 10:00", "2014-07-04 11:00", + "2014-07-04 12:00", "2014-07-04 13:00", "2014-07-04 14:00", + "2014-07-04 15:00", "2014-07-04 16:00", "2014-07-07 09:00", + "2014-07-07 10:00", "2014-07-07 11:00", "2014-07-07 12:00", + "2014-07-07 13:00", "2014-07-07 14:00", "2014-07-07 15:00", + "2014-07-07 16:00", "2014-07-08 09:00", "2014-07-08 10:00", + "2014-07-08 11:00", "2014-07-08 12:00", "2014-07-08 13:00", + "2014-07-08 14:00", "2014-07-08 15:00", "2014-07-08 16:00"], "BH"), +]) +def test_infer_freq_business_hour(data, expected): + # see gh-7905 + idx = DatetimeIndex(data) + assert idx.inferred_freq == expected + + +def test_not_monotonic(): + rng = DatetimeIndex(["1/31/2000", "1/31/2001", "1/31/2002"]) + rng = rng[::-1] + + assert rng.inferred_freq == "-1A-JAN" + + +def test_non_datetime_index2(): + rng = DatetimeIndex(["1/31/2000", "1/31/2001", "1/31/2002"]) + vals = rng.to_pydatetime() + + result = frequencies.infer_freq(vals) + assert result == rng.inferred_freq + + +@pytest.mark.parametrize("idx", [ + tm.makeIntIndex(10), tm.makeFloatIndex(10), tm.makePeriodIndex(10) +]) +def test_invalid_index_types(idx): + msg = ("(cannot infer freq from a non-convertible)|" + "(Check the `freq` attribute instead of using infer_freq)") + + with pytest.raises(TypeError, match=msg): + frequencies.infer_freq(idx) + + +@pytest.mark.skipif(is_platform_windows(), + reason="see gh-10822: Windows issue") +@pytest.mark.parametrize("idx", [tm.makeStringIndex(10), + tm.makeUnicodeIndex(10)]) +def test_invalid_index_types_unicode(idx): + # see gh-10822 + # + # Odd error message on conversions to datetime for unicode. 
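+    #
+    # ("Unknown string format" is the ValueError raised by dateutil's parser,
+    # which infer_freq hits while coercing the strings to datetimes.)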
+ msg = "Unknown string format" + + with pytest.raises(ValueError, match=msg): + frequencies.infer_freq(idx) + + +def test_string_datetime_like_compat(): + # see gh-6463 + data = ["2004-01", "2004-02", "2004-03", "2004-04"] + + expected = frequencies.infer_freq(data) + result = frequencies.infer_freq(Index(data)) + + assert result == expected + + +def test_series(): + # see gh-6407 + s = Series(date_range("20130101", "20130110")) + inferred = frequencies.infer_freq(s) + assert inferred == "D" + + +@pytest.mark.parametrize("end", [10, 10.]) +def test_series_invalid_type(end): + # see gh-6407 + msg = "cannot infer freq from a non-convertible dtype on a Series" + s = Series(np.arange(end)) + + with pytest.raises(TypeError, match=msg): + frequencies.infer_freq(s) + + +def test_series_inconvertible_string(): + # see gh-6407 + msg = "Unknown string format" + + with pytest.raises(ValueError, match=msg): + frequencies.infer_freq(Series(["foo", "bar"])) + + +@pytest.mark.parametrize("freq", [None, "L"]) +def test_series_period_index(freq): + # see gh-6407 + # + # Cannot infer on PeriodIndex + msg = "cannot infer freq from a non-convertible dtype on a Series" + s = Series(period_range("2013", periods=10, freq=freq)) + + with pytest.raises(TypeError, match=msg): + frequencies.infer_freq(s) + + +@pytest.mark.parametrize("freq", ["M", "L", "S"]) +def test_series_datetime_index(freq): + s = Series(date_range("20130101", periods=10, freq=freq)) + inferred = frequencies.infer_freq(s) + assert inferred == freq + + +@pytest.mark.parametrize("offset_func", [ + frequencies.get_offset, + lambda freq: date_range("2011-01-01", periods=5, freq=freq) +]) +@pytest.mark.parametrize("freq", [ + "WEEKDAY", "EOM", "W@MON", "W@TUE", "W@WED", "W@THU", + "W@FRI", "W@SAT", "W@SUN", "Q@JAN", "Q@FEB", "Q@MAR", + "A@JAN", "A@FEB", "A@MAR", "A@APR", "A@MAY", "A@JUN", + "A@JUL", "A@AUG", "A@SEP", "A@OCT", "A@NOV", "A@DEC", + "Y@JAN", "WOM@1MON", "WOM@2MON", "WOM@3MON", + "WOM@4MON", "WOM@1TUE", "WOM@2TUE", "WOM@3TUE", + "WOM@4TUE", "WOM@1WED", "WOM@2WED", "WOM@3WED", + "WOM@4WED", "WOM@1THU", "WOM@2THU", "WOM@3THU", + "WOM@4THU", "WOM@1FRI", "WOM@2FRI", "WOM@3FRI", + "WOM@4FRI" +]) +def test_legacy_offset_warnings(offset_func, freq): + with pytest.raises(ValueError, match=INVALID_FREQ_ERR_MSG): + offset_func(freq) + + +def test_ms_vs_capital_ms(): + left = frequencies.get_offset("ms") + right = frequencies.get_offset("MS") + + assert left == offsets.Milli() + assert right == offsets.MonthBegin() diff --git a/pandas/tests/tseries/frequencies/test_to_offset.py b/pandas/tests/tseries/frequencies/test_to_offset.py new file mode 100644 index 0000000000000..c9c35b47f3475 --- /dev/null +++ b/pandas/tests/tseries/frequencies/test_to_offset.py @@ -0,0 +1,146 @@ +import re + +import pytest + +from pandas import Timedelta + +import pandas.tseries.frequencies as frequencies +import pandas.tseries.offsets as offsets + + +@pytest.mark.parametrize("freq_input,expected", [ + (frequencies.to_offset("10us"), offsets.Micro(10)), + (offsets.Hour(), offsets.Hour()), + ((5, "T"), offsets.Minute(5)), + ("2h30min", offsets.Minute(150)), + ("2h 30min", offsets.Minute(150)), + ("2h30min15s", offsets.Second(150 * 60 + 15)), + ("2h 60min", offsets.Hour(3)), + ("2h 20.5min", offsets.Second(8430)), + ("1.5min", offsets.Second(90)), + ("0.5S", offsets.Milli(500)), + ("15l500u", offsets.Micro(15500)), + ("10s75L", offsets.Milli(10075)), + ("1s0.25ms", offsets.Micro(1000250)), + ("1s0.25L", offsets.Micro(1000250)), + ("2800N", offsets.Nano(2800)), + ("2SM", 
offsets.SemiMonthEnd(2)), + ("2SM-16", offsets.SemiMonthEnd(2, day_of_month=16)), + ("2SMS-14", offsets.SemiMonthBegin(2, day_of_month=14)), + ("2SMS-15", offsets.SemiMonthBegin(2)), +]) +def test_to_offset(freq_input, expected): + result = frequencies.to_offset(freq_input) + assert result == expected + + +@pytest.mark.parametrize("freqstr,expected", [ + ("-1S", -1), + ("-2SM", -2), + ("-1SMS", -1), + ("-5min10s", -310), +]) +def test_to_offset_negative(freqstr, expected): + result = frequencies.to_offset(freqstr) + assert result.n == expected + + +@pytest.mark.parametrize("freqstr", [ + "2h20m", "U1", "-U", "3U1", "-2-3U", "-2D:3H", + "1.5.0S", "2SMS-15-15", "2SMS-15D", "100foo", + + # Invalid leading +/- signs. + "+-1d", "-+1h", "+1", "-7", "+d", "-m", + + # Invalid shortcut anchors. + "SM-0", "SM-28", "SM-29", "SM-FOO", "BSM", "SM--1", "SMS-1", + "SMS-28", "SMS-30", "SMS-BAR", "SMS-BYR", "BSMS", "SMS--2" +]) +def test_to_offset_invalid(freqstr): + # see gh-13930 + + # We escape string because some of our + # inputs contain regex special characters. + msg = re.escape("Invalid frequency: {freqstr}".format(freqstr=freqstr)) + with pytest.raises(ValueError, match=msg): + frequencies.to_offset(freqstr) + + +def test_to_offset_no_evaluate(): + with pytest.raises(ValueError, match="Could not evaluate"): + frequencies.to_offset(("", "")) + + +@pytest.mark.parametrize("freqstr,expected", [ + ("2D 3H", offsets.Hour(51)), + ("2 D3 H", offsets.Hour(51)), + ("2 D 3 H", offsets.Hour(51)), + (" 2 D 3 H ", offsets.Hour(51)), + (" H ", offsets.Hour()), + (" 3 H ", offsets.Hour(3)), +]) +def test_to_offset_whitespace(freqstr, expected): + result = frequencies.to_offset(freqstr) + assert result == expected + + +@pytest.mark.parametrize("freqstr,expected", [ + ("00H 00T 01S", 1), + ("-00H 03T 14S", -194), +]) +def test_to_offset_leading_zero(freqstr, expected): + result = frequencies.to_offset(freqstr) + assert result.n == expected + + +@pytest.mark.parametrize("freqstr,expected", [ + ("+1d", 1), + ("+2h30min", 150), +]) +def test_to_offset_leading_plus(freqstr, expected): + result = frequencies.to_offset(freqstr) + assert result.n == expected + + +@pytest.mark.parametrize("kwargs,expected", [ + (dict(days=1, seconds=1), offsets.Second(86401)), + (dict(days=-1, seconds=1), offsets.Second(-86399)), + (dict(hours=1, minutes=10), offsets.Minute(70)), + (dict(hours=1, minutes=-10), offsets.Minute(50)), + (dict(weeks=1), offsets.Day(7)), + (dict(hours=1), offsets.Hour(1)), + (dict(hours=1), frequencies.to_offset("60min")), + (dict(microseconds=1), offsets.Micro(1)) +]) +def test_to_offset_pd_timedelta(kwargs, expected): + # see gh-9064 + td = Timedelta(**kwargs) + result = frequencies.to_offset(td) + assert result == expected + + +def test_to_offset_pd_timedelta_invalid(): + # see gh-9064 + msg = "Invalid frequency: 0 days 00:00:00" + td = Timedelta(microseconds=0) + + with pytest.raises(ValueError, match=msg): + frequencies.to_offset(td) + + +@pytest.mark.parametrize("shortcut,expected", [ + ("W", offsets.Week(weekday=6)), + ("W-SUN", offsets.Week(weekday=6)), + ("Q", offsets.QuarterEnd(startingMonth=12)), + ("Q-DEC", offsets.QuarterEnd(startingMonth=12)), + ("Q-MAY", offsets.QuarterEnd(startingMonth=5)), + ("SM", offsets.SemiMonthEnd(day_of_month=15)), + ("SM-15", offsets.SemiMonthEnd(day_of_month=15)), + ("SM-1", offsets.SemiMonthEnd(day_of_month=1)), + ("SM-27", offsets.SemiMonthEnd(day_of_month=27)), + ("SMS-2", offsets.SemiMonthBegin(day_of_month=2)), + ("SMS-27", offsets.SemiMonthBegin(day_of_month=27)), 
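+    # (a bare "SM"/"SMS" anchor defaults to day_of_month=15, as the "SM" and
+    #  "SM-15" cases above demonstrate)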
+]) +def test_anchored_shortcuts(shortcut, expected): + result = frequencies.to_offset(shortcut) + assert result == expected diff --git a/pandas/tests/tseries/holiday/__init__.py b/pandas/tests/tseries/holiday/__init__.py new file mode 100644 index 0000000000000..e69de29bb2d1d diff --git a/pandas/tests/tseries/holiday/test_calendar.py b/pandas/tests/tseries/holiday/test_calendar.py new file mode 100644 index 0000000000000..a5cc4095ce583 --- /dev/null +++ b/pandas/tests/tseries/holiday/test_calendar.py @@ -0,0 +1,77 @@ +from datetime import datetime + +import pytest + +from pandas import DatetimeIndex +import pandas.util.testing as tm + +from pandas.tseries.holiday import ( + AbstractHolidayCalendar, Holiday, Timestamp, USFederalHolidayCalendar, + USThanksgivingDay, get_calendar) + + +@pytest.mark.parametrize("transform", [ + lambda x: x, + lambda x: x.strftime("%Y-%m-%d"), + lambda x: Timestamp(x) +]) +def test_calendar(transform): + start_date = datetime(2012, 1, 1) + end_date = datetime(2012, 12, 31) + + calendar = USFederalHolidayCalendar() + holidays = calendar.holidays(transform(start_date), transform(end_date)) + + expected = [ + datetime(2012, 1, 2), + datetime(2012, 1, 16), + datetime(2012, 2, 20), + datetime(2012, 5, 28), + datetime(2012, 7, 4), + datetime(2012, 9, 3), + datetime(2012, 10, 8), + datetime(2012, 11, 12), + datetime(2012, 11, 22), + datetime(2012, 12, 25) + ] + + assert list(holidays.to_pydatetime()) == expected + + +def test_calendar_caching(): + # see gh-9552. + + class TestCalendar(AbstractHolidayCalendar): + def __init__(self, name=None, rules=None): + super(TestCalendar, self).__init__(name=name, rules=rules) + + jan1 = TestCalendar(rules=[Holiday("jan1", year=2015, month=1, day=1)]) + jan2 = TestCalendar(rules=[Holiday("jan2", year=2015, month=1, day=2)]) + + # Getting holidays for Jan 1 should not alter results for Jan 2. + tm.assert_index_equal(jan1.holidays(), DatetimeIndex(["01-Jan-2015"])) + tm.assert_index_equal(jan2.holidays(), DatetimeIndex(["02-Jan-2015"])) + + +def test_calendar_observance_dates(): + # see gh-11477 + us_fed_cal = get_calendar("USFederalHolidayCalendar") + holidays0 = us_fed_cal.holidays(datetime(2015, 7, 3), datetime( + 2015, 7, 3)) # <-- same start and end dates + holidays1 = us_fed_cal.holidays(datetime(2015, 7, 3), datetime( + 2015, 7, 6)) # <-- different start and end dates + holidays2 = us_fed_cal.holidays(datetime(2015, 7, 3), datetime( + 2015, 7, 3)) # <-- same start and end dates + + # These should all produce the same result. + # + # In addition, calling with different start and end + # dates should not alter the output if we call the + # function again with the same start and end date. 
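+    #
+    # holidays0 and holidays2 use identical arguments, so any difference
+    # between them would point at stale AbstractHolidayCalendar cache state.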
+ tm.assert_index_equal(holidays0, holidays1) + tm.assert_index_equal(holidays0, holidays2) + + +def test_rule_from_name(): + us_fed_cal = get_calendar("USFederalHolidayCalendar") + assert us_fed_cal.rule_from_name("Thanksgiving") == USThanksgivingDay diff --git a/pandas/tests/tseries/holiday/test_federal.py b/pandas/tests/tseries/holiday/test_federal.py new file mode 100644 index 0000000000000..62b5ab2b849ae --- /dev/null +++ b/pandas/tests/tseries/holiday/test_federal.py @@ -0,0 +1,36 @@ +from datetime import datetime + +from pandas.tseries.holiday import ( + AbstractHolidayCalendar, USMartinLutherKingJr, USMemorialDay) + + +def test_no_mlk_before_1986(): + # see gh-10278 + class MLKCalendar(AbstractHolidayCalendar): + rules = [USMartinLutherKingJr] + + holidays = MLKCalendar().holidays(start="1984", + end="1988").to_pydatetime().tolist() + + # Testing to make sure holiday is not incorrectly observed before 1986. + assert holidays == [datetime(1986, 1, 20, 0, 0), + datetime(1987, 1, 19, 0, 0)] + + +def test_memorial_day(): + class MemorialDay(AbstractHolidayCalendar): + rules = [USMemorialDay] + + holidays = MemorialDay().holidays(start="1971", + end="1980").to_pydatetime().tolist() + + # Fixes 5/31 error and checked manually against Wikipedia. + assert holidays == [datetime(1971, 5, 31, 0, 0), + datetime(1972, 5, 29, 0, 0), + datetime(1973, 5, 28, 0, 0), + datetime(1974, 5, 27, 0, 0), + datetime(1975, 5, 26, 0, 0), + datetime(1976, 5, 31, 0, 0), + datetime(1977, 5, 30, 0, 0), + datetime(1978, 5, 29, 0, 0), + datetime(1979, 5, 28, 0, 0)] diff --git a/pandas/tests/tseries/holiday/test_holiday.py b/pandas/tests/tseries/holiday/test_holiday.py new file mode 100644 index 0000000000000..27bba1cc89dee --- /dev/null +++ b/pandas/tests/tseries/holiday/test_holiday.py @@ -0,0 +1,193 @@ +from datetime import datetime + +import pytest +from pytz import utc + +import pandas.util.testing as tm + +from pandas.tseries.holiday import ( + MO, SA, AbstractHolidayCalendar, DateOffset, EasterMonday, GoodFriday, + Holiday, HolidayCalendarFactory, Timestamp, USColumbusDay, USLaborDay, + USMartinLutherKingJr, USMemorialDay, USPresidentsDay, USThanksgivingDay, + get_calendar, next_monday) + + +def _check_holiday_results(holiday, start, end, expected): + """ + Check that the dates for a given holiday match in date and timezone. + + Parameters + ---------- + holiday : Holiday + The holiday to check. + start : datetime-like + The start date of range in which to collect dates for a given holiday. + end : datetime-like + The end date of range in which to collect dates for a given holiday. + expected : list + The list of dates we expect to get. + """ + assert list(holiday.dates(start, end)) == expected + + # Verify that timezone info is preserved. 
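+    # (pytz's utc.localize attaches the UTC zone to a naive Timestamp without
+    # shifting the wall time, so the expected dates need only the same wrapper.)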
+ assert (list(holiday.dates(utc.localize(Timestamp(start)), + utc.localize(Timestamp(end)))) == + [utc.localize(dt) for dt in expected]) + + +@pytest.mark.parametrize("holiday,start_date,end_date,expected", [ + (USMemorialDay, datetime(2011, 1, 1), datetime(2020, 12, 31), + [datetime(2011, 5, 30), datetime(2012, 5, 28), datetime(2013, 5, 27), + datetime(2014, 5, 26), datetime(2015, 5, 25), datetime(2016, 5, 30), + datetime(2017, 5, 29), datetime(2018, 5, 28), datetime(2019, 5, 27), + datetime(2020, 5, 25)]), + + (Holiday("July 4th Eve", month=7, day=3), "2001-01-01", "2003-03-03", + [Timestamp("2001-07-03 00:00:00"), Timestamp("2002-07-03 00:00:00")]), + (Holiday("July 4th Eve", month=7, day=3, days_of_week=(0, 1, 2, 3)), + "2001-01-01", "2008-03-03", [ + Timestamp("2001-07-03 00:00:00"), Timestamp("2002-07-03 00:00:00"), + Timestamp("2003-07-03 00:00:00"), Timestamp("2006-07-03 00:00:00"), + Timestamp("2007-07-03 00:00:00")]), + + (EasterMonday, datetime(2011, 1, 1), datetime(2020, 12, 31), + [Timestamp("2011-04-25 00:00:00"), Timestamp("2012-04-09 00:00:00"), + Timestamp("2013-04-01 00:00:00"), Timestamp("2014-04-21 00:00:00"), + Timestamp("2015-04-06 00:00:00"), Timestamp("2016-03-28 00:00:00"), + Timestamp("2017-04-17 00:00:00"), Timestamp("2018-04-02 00:00:00"), + Timestamp("2019-04-22 00:00:00"), Timestamp("2020-04-13 00:00:00")]), + (GoodFriday, datetime(2011, 1, 1), datetime(2020, 12, 31), + [Timestamp("2011-04-22 00:00:00"), Timestamp("2012-04-06 00:00:00"), + Timestamp("2013-03-29 00:00:00"), Timestamp("2014-04-18 00:00:00"), + Timestamp("2015-04-03 00:00:00"), Timestamp("2016-03-25 00:00:00"), + Timestamp("2017-04-14 00:00:00"), Timestamp("2018-03-30 00:00:00"), + Timestamp("2019-04-19 00:00:00"), Timestamp("2020-04-10 00:00:00")]), + + (USThanksgivingDay, datetime(2011, 1, 1), datetime(2020, 12, 31), + [datetime(2011, 11, 24), datetime(2012, 11, 22), datetime(2013, 11, 28), + datetime(2014, 11, 27), datetime(2015, 11, 26), datetime(2016, 11, 24), + datetime(2017, 11, 23), datetime(2018, 11, 22), datetime(2019, 11, 28), + datetime(2020, 11, 26)]) +]) +def test_holiday_dates(holiday, start_date, end_date, expected): + _check_holiday_results(holiday, start_date, end_date, expected) + + +@pytest.mark.parametrize("holiday,start,expected", [ + (USMemorialDay, datetime(2015, 7, 1), []), + (USMemorialDay, "2015-05-25", "2015-05-25"), + + (USLaborDay, datetime(2015, 7, 1), []), + (USLaborDay, "2015-09-07", "2015-09-07"), + + (USColumbusDay, datetime(2015, 7, 1), []), + (USColumbusDay, "2015-10-12", "2015-10-12"), + + (USThanksgivingDay, datetime(2015, 7, 1), []), + (USThanksgivingDay, "2015-11-26", "2015-11-26"), + + (USMartinLutherKingJr, datetime(2015, 7, 1), []), + (USMartinLutherKingJr, "2015-01-19", "2015-01-19"), + + (USPresidentsDay, datetime(2015, 7, 1), []), + (USPresidentsDay, "2015-02-16", "2015-02-16"), + + (GoodFriday, datetime(2015, 7, 1), []), + (GoodFriday, "2015-04-03", "2015-04-03"), + + (EasterMonday, "2015-04-06", "2015-04-06"), + (EasterMonday, datetime(2015, 7, 1), []), + (EasterMonday, "2015-04-05", []), + + ("New Years Day", "2015-01-01", "2015-01-01"), + ("New Years Day", "2010-12-31", "2010-12-31"), + ("New Years Day", datetime(2015, 7, 1), []), + ("New Years Day", "2011-01-01", []), + + ("July 4th", "2015-07-03", "2015-07-03"), + ("July 4th", datetime(2015, 7, 1), []), + ("July 4th", "2015-07-04", []), + + ("Veterans Day", "2012-11-12", "2012-11-12"), + ("Veterans Day", datetime(2015, 7, 1), []), + ("Veterans Day", "2012-11-11", []), + + ("Christmas", 
"2011-12-26", "2011-12-26"), + ("Christmas", datetime(2015, 7, 1), []), + ("Christmas", "2011-12-25", []), +]) +def test_holidays_within_dates(holiday, start, expected): + # see gh-11477 + # + # Fix holiday behavior where holiday.dates returned dates outside + # start/end date, or observed rules could not be applied because the + # holiday was not in the original date range (e.g., 7/4/2015 -> 7/3/2015). + if isinstance(holiday, str): + calendar = get_calendar("USFederalHolidayCalendar") + holiday = calendar.rule_from_name(holiday) + + if isinstance(expected, str): + expected = [Timestamp(expected)] + + _check_holiday_results(holiday, start, start, expected) + + +@pytest.mark.parametrize("transform", [ + lambda x: x.strftime("%Y-%m-%d"), + lambda x: Timestamp(x) +]) +def test_argument_types(transform): + start_date = datetime(2011, 1, 1) + end_date = datetime(2020, 12, 31) + + holidays = USThanksgivingDay.dates(start_date, end_date) + holidays2 = USThanksgivingDay.dates( + transform(start_date), transform(end_date)) + tm.assert_index_equal(holidays, holidays2) + + +@pytest.mark.parametrize("name,kwargs", [ + ("One-Time", dict(year=2012, month=5, day=28)), + ("Range", dict(month=5, day=28, start_date=datetime(2012, 1, 1), + end_date=datetime(2012, 12, 31), + offset=DateOffset(weekday=MO(1)))) +]) +def test_special_holidays(name, kwargs): + base_date = [datetime(2012, 5, 28)] + holiday = Holiday(name, **kwargs) + + start_date = datetime(2011, 1, 1) + end_date = datetime(2020, 12, 31) + + assert base_date == holiday.dates(start_date, end_date) + + +def test_get_calendar(): + class TestCalendar(AbstractHolidayCalendar): + rules = [] + + calendar = get_calendar("TestCalendar") + assert TestCalendar == calendar.__class__ + + +def test_factory(): + class_1 = HolidayCalendarFactory("MemorialDay", + AbstractHolidayCalendar, + USMemorialDay) + class_2 = HolidayCalendarFactory("Thanksgiving", + AbstractHolidayCalendar, + USThanksgivingDay) + class_3 = HolidayCalendarFactory("Combined", class_1, class_2) + + assert len(class_1.rules) == 1 + assert len(class_2.rules) == 1 + assert len(class_3.rules) == 2 + + +def test_both_offset_observance_raises(): + # see gh-10217 + msg = "Cannot use both offset and observance" + with pytest.raises(NotImplementedError, match=msg): + Holiday("Cyber Monday", month=11, day=1, + offset=[DateOffset(weekday=SA(4))], + observance=next_monday) diff --git a/pandas/tests/tseries/holiday/test_observance.py b/pandas/tests/tseries/holiday/test_observance.py new file mode 100644 index 0000000000000..1c22918b2efd8 --- /dev/null +++ b/pandas/tests/tseries/holiday/test_observance.py @@ -0,0 +1,93 @@ +from datetime import datetime + +import pytest + +from pandas.tseries.holiday import ( + after_nearest_workday, before_nearest_workday, nearest_workday, + next_monday, next_monday_or_tuesday, next_workday, previous_friday, + previous_workday, sunday_to_monday, weekend_to_monday) + +_WEDNESDAY = datetime(2014, 4, 9) +_THURSDAY = datetime(2014, 4, 10) +_FRIDAY = datetime(2014, 4, 11) +_SATURDAY = datetime(2014, 4, 12) +_SUNDAY = datetime(2014, 4, 13) +_MONDAY = datetime(2014, 4, 14) +_TUESDAY = datetime(2014, 4, 15) + + +@pytest.mark.parametrize("day", [_SATURDAY, _SUNDAY]) +def test_next_monday(day): + assert next_monday(day) == _MONDAY + + +@pytest.mark.parametrize("day,expected", [ + (_SATURDAY, _MONDAY), + (_SUNDAY, _TUESDAY), + (_MONDAY, _TUESDAY) +]) +def test_next_monday_or_tuesday(day, expected): + assert next_monday_or_tuesday(day) == expected + + +@pytest.mark.parametrize("day", 
[_SATURDAY, _SUNDAY])
+def test_previous_friday(day):
+    assert previous_friday(day) == _FRIDAY
+
+
+def test_sunday_to_monday():
+    assert sunday_to_monday(_SUNDAY) == _MONDAY
+
+
+@pytest.mark.parametrize("day,expected", [
+    (_SATURDAY, _FRIDAY),
+    (_SUNDAY, _MONDAY),
+    (_MONDAY, _MONDAY)
+])
+def test_nearest_workday(day, expected):
+    assert nearest_workday(day) == expected
+
+
+@pytest.mark.parametrize("day,expected", [
+    (_SATURDAY, _MONDAY),
+    (_SUNDAY, _MONDAY),
+    (_MONDAY, _MONDAY)
+])
+def test_weekend_to_monday(day, expected):
+    assert weekend_to_monday(day) == expected
+
+
+@pytest.mark.parametrize("day,expected", [
+    (_SATURDAY, _MONDAY),
+    (_SUNDAY, _MONDAY),
+    (_MONDAY, _TUESDAY)
+])
+def test_next_workday(day, expected):
+    assert next_workday(day) == expected
+
+
+@pytest.mark.parametrize("day,expected", [
+    (_SATURDAY, _FRIDAY),
+    (_SUNDAY, _FRIDAY),
+    (_TUESDAY, _MONDAY)
+])
+def test_previous_workday(day, expected):
+    assert previous_workday(day) == expected
+
+
+@pytest.mark.parametrize("day,expected", [
+    (_SATURDAY, _THURSDAY),
+    (_SUNDAY, _FRIDAY),
+    (_TUESDAY, _MONDAY)
+])
+def test_before_nearest_workday(day, expected):
+    assert before_nearest_workday(day) == expected
+
+
+@pytest.mark.parametrize("day,expected", [
+    (_SATURDAY, _MONDAY),
+    (_SUNDAY, _TUESDAY),
+    (_FRIDAY, _MONDAY)
+])
+def test_after_nearest_workday(day, expected):
+    assert after_nearest_workday(day) == expected
diff --git a/pandas/tests/tseries/offsets/test_offsets.py b/pandas/tests/tseries/offsets/test_offsets.py
index ac3955970587f..e6f21a7b47c3b 100644
--- a/pandas/tests/tseries/offsets/test_offsets.py
+++ b/pandas/tests/tseries/offsets/test_offsets.py
@@ -9,6 +9,7 @@
 from pandas._libs.tslibs.frequencies import (
     INVALID_FREQ_ERR_MSG, get_freq_code, get_freq_str)
 import pandas._libs.tslibs.offsets as liboffsets
+from pandas._libs.tslibs.offsets import ApplyTypeError
 import pandas.compat as compat
 from pandas.compat import range
 from pandas.compat.numpy import np_datetime64_compat
@@ -150,7 +151,8 @@ def test_sub(self):
             # offset2 attr
             return
         off = self.offset2
-        with pytest.raises(Exception):
+        msg = "Cannot subtract datetime from offset"
+        with pytest.raises(TypeError, match=msg):
             off - self.d

         assert 2 * off - off == off
@@ -257,6 +259,26 @@ def test_offset_n(self, offset_types):
         mul_offset = offset * 3
         assert mul_offset.n == 3

+    def test_offset_timedelta64_arg(self, offset_types):
+        # Check that offset._validate_n raises TypeError on a timedelta64
+        # object.
+        off = self._get_offset(offset_types)
+
+        td64 = np.timedelta64(4567, 's')
+        with pytest.raises(TypeError, match="argument must be an integer"):
+            type(off)(n=td64, **off.kwds)
+
+    def test_offset_mul_ndarray(self, offset_types):
+        off = self._get_offset(offset_types)
+
+        expected = np.array([[off, off * 2], [off * 3, off * 4]])
+
+        result = np.array([[1, 2], [3, 4]]) * off
+        tm.assert_numpy_array_equal(result, expected)
+
+        result = off * np.array([[1, 2], [3, 4]])
+        tm.assert_numpy_array_equal(result, expected)
+
     def test_offset_freqstr(self, offset_types):
         offset = self._get_offset(offset_types)

@@ -716,7 +738,10 @@ def test_apply_large_n(self):
         assert rs == xp

     def test_apply_corner(self):
-        pytest.raises(TypeError, BDay().apply, BMonthEnd())
+        msg = ("Only know how to combine business day with datetime or"
+               " timedelta")
+        with pytest.raises(ApplyTypeError, match=msg):
+            BDay().apply(BMonthEnd())


 class TestBusinessHour(Base):
@@ -792,7 +817,8 @@ def test_sub(self):
         # we have to override test_sub here because self.offset2 is not
         #
defined as self._offset(2) off = self.offset2 - with pytest.raises(Exception): + msg = "Cannot subtract datetime from offset" + with pytest.raises(TypeError, match=msg): off - self.d assert 2 * off - off == off @@ -1776,7 +1802,10 @@ def test_apply_large_n(self): assert rs == xp def test_apply_corner(self): - pytest.raises(Exception, CDay().apply, BMonthEnd()) + msg = ("Only know how to combine trading day with datetime, datetime64" + " or timedelta") + with pytest.raises(ApplyTypeError, match=msg): + CDay().apply(BMonthEnd()) def test_holidays(self): # Define a TradingDay offset diff --git a/pandas/tests/tseries/offsets/test_ticks.py b/pandas/tests/tseries/offsets/test_ticks.py index f4b012ec1897f..9a8251201f75f 100644 --- a/pandas/tests/tseries/offsets/test_ticks.py +++ b/pandas/tests/tseries/offsets/test_ticks.py @@ -11,6 +11,7 @@ import pytest from pandas import Timedelta, Timestamp +import pandas.util.testing as tm from pandas.tseries import offsets from pandas.tseries.offsets import Hour, Micro, Milli, Minute, Nano, Second @@ -262,6 +263,28 @@ def test_tick_division(cls): assert result.delta == off.delta / .001 +@pytest.mark.parametrize('cls', tick_classes) +def test_tick_rdiv(cls): + off = cls(10) + delta = off.delta + td64 = delta.to_timedelta64() + + with pytest.raises(TypeError): + 2 / off + with pytest.raises(TypeError): + 2.0 / off + + assert (td64 * 2.5) / off == 2.5 + + if cls is not Nano: + # skip pytimedelta for Nano since it gets dropped + assert (delta.to_pytimedelta() * 2) / off == 2 + + result = np.array([2 * td64, td64]) / off + expected = np.array([2., 1.]) + tm.assert_numpy_array_equal(result, expected) + + @pytest.mark.parametrize('cls1', tick_classes) @pytest.mark.parametrize('cls2', tick_classes) def test_tick_zero(cls1, cls2): diff --git a/pandas/tests/tseries/offsets/test_yqm_offsets.py b/pandas/tests/tseries/offsets/test_yqm_offsets.py index 8023ee3139dd5..9ee03d2e886f3 100644 --- a/pandas/tests/tseries/offsets/test_yqm_offsets.py +++ b/pandas/tests/tseries/offsets/test_yqm_offsets.py @@ -713,7 +713,8 @@ class TestYearBegin(Base): _offset = YearBegin def test_misspecified(self): - pytest.raises(ValueError, YearBegin, month=13) + with pytest.raises(ValueError, match="Month must go from 1 to 12"): + YearBegin(month=13) offset_cases = [] offset_cases.append((YearBegin(), { @@ -804,7 +805,8 @@ class TestYearEnd(Base): _offset = YearEnd def test_misspecified(self): - pytest.raises(ValueError, YearEnd, month=13) + with pytest.raises(ValueError, match="Month must go from 1 to 12"): + YearEnd(month=13) offset_cases = [] offset_cases.append((YearEnd(), { @@ -900,8 +902,11 @@ class TestBYearBegin(Base): _offset = BYearBegin def test_misspecified(self): - pytest.raises(ValueError, BYearBegin, month=13) - pytest.raises(ValueError, BYearEnd, month=13) + msg = "Month must go from 1 to 12" + with pytest.raises(ValueError, match=msg): + BYearBegin(month=13) + with pytest.raises(ValueError, match=msg): + BYearEnd(month=13) offset_cases = [] offset_cases.append((BYearBegin(), { @@ -993,8 +998,11 @@ class TestBYearEndLagged(Base): _offset = BYearEnd def test_bad_month_fail(self): - pytest.raises(Exception, BYearEnd, month=13) - pytest.raises(Exception, BYearEnd, month=0) + msg = "Month must go from 1 to 12" + with pytest.raises(ValueError, match=msg): + BYearEnd(month=13) + with pytest.raises(ValueError, match=msg): + BYearEnd(month=0) offset_cases = [] offset_cases.append((BYearEnd(month=6), { diff --git a/pandas/tests/tseries/test_frequencies.py 
b/pandas/tests/tseries/test_frequencies.py deleted file mode 100644 index eb4e63654b47b..0000000000000 --- a/pandas/tests/tseries/test_frequencies.py +++ /dev/null @@ -1,793 +0,0 @@ -from datetime import datetime, timedelta - -import numpy as np -import pytest - -from pandas._libs.tslibs import frequencies as libfrequencies, resolution -from pandas._libs.tslibs.ccalendar import MONTHS -from pandas._libs.tslibs.frequencies import ( - INVALID_FREQ_ERR_MSG, FreqGroup, _period_code_map, get_freq, get_freq_code) -import pandas.compat as compat -from pandas.compat import is_platform_windows, range - -from pandas import ( - DatetimeIndex, Index, Series, Timedelta, Timestamp, date_range, - period_range) -from pandas.core.tools.datetimes import to_datetime -import pandas.util.testing as tm - -import pandas.tseries.frequencies as frequencies -import pandas.tseries.offsets as offsets - - -class TestToOffset(object): - - def test_to_offset_multiple(self): - freqstr = '2h30min' - freqstr2 = '2h 30min' - - result = frequencies.to_offset(freqstr) - assert (result == frequencies.to_offset(freqstr2)) - expected = offsets.Minute(150) - assert (result == expected) - - freqstr = '2h30min15s' - result = frequencies.to_offset(freqstr) - expected = offsets.Second(150 * 60 + 15) - assert (result == expected) - - freqstr = '2h 60min' - result = frequencies.to_offset(freqstr) - expected = offsets.Hour(3) - assert (result == expected) - - freqstr = '2h 20.5min' - result = frequencies.to_offset(freqstr) - expected = offsets.Second(8430) - assert (result == expected) - - freqstr = '1.5min' - result = frequencies.to_offset(freqstr) - expected = offsets.Second(90) - assert (result == expected) - - freqstr = '0.5S' - result = frequencies.to_offset(freqstr) - expected = offsets.Milli(500) - assert (result == expected) - - freqstr = '15l500u' - result = frequencies.to_offset(freqstr) - expected = offsets.Micro(15500) - assert (result == expected) - - freqstr = '10s75L' - result = frequencies.to_offset(freqstr) - expected = offsets.Milli(10075) - assert (result == expected) - - freqstr = '1s0.25ms' - result = frequencies.to_offset(freqstr) - expected = offsets.Micro(1000250) - assert (result == expected) - - freqstr = '1s0.25L' - result = frequencies.to_offset(freqstr) - expected = offsets.Micro(1000250) - assert (result == expected) - - freqstr = '2800N' - result = frequencies.to_offset(freqstr) - expected = offsets.Nano(2800) - assert (result == expected) - - freqstr = '2SM' - result = frequencies.to_offset(freqstr) - expected = offsets.SemiMonthEnd(2) - assert (result == expected) - - freqstr = '2SM-16' - result = frequencies.to_offset(freqstr) - expected = offsets.SemiMonthEnd(2, day_of_month=16) - assert (result == expected) - - freqstr = '2SMS-14' - result = frequencies.to_offset(freqstr) - expected = offsets.SemiMonthBegin(2, day_of_month=14) - assert (result == expected) - - freqstr = '2SMS-15' - result = frequencies.to_offset(freqstr) - expected = offsets.SemiMonthBegin(2) - assert (result == expected) - - # malformed - with pytest.raises(ValueError, match='Invalid frequency: 2h20m'): - frequencies.to_offset('2h20m') - - def test_to_offset_negative(self): - freqstr = '-1S' - result = frequencies.to_offset(freqstr) - assert (result.n == -1) - - freqstr = '-5min10s' - result = frequencies.to_offset(freqstr) - assert (result.n == -310) - - freqstr = '-2SM' - result = frequencies.to_offset(freqstr) - assert (result.n == -2) - - freqstr = '-1SMS' - result = frequencies.to_offset(freqstr) - assert (result.n == -1) - - 
def test_to_offset_invalid(self): - # GH 13930 - with pytest.raises(ValueError, match='Invalid frequency: U1'): - frequencies.to_offset('U1') - with pytest.raises(ValueError, match='Invalid frequency: -U'): - frequencies.to_offset('-U') - with pytest.raises(ValueError, match='Invalid frequency: 3U1'): - frequencies.to_offset('3U1') - with pytest.raises(ValueError, match='Invalid frequency: -2-3U'): - frequencies.to_offset('-2-3U') - with pytest.raises(ValueError, match='Invalid frequency: -2D:3H'): - frequencies.to_offset('-2D:3H') - with pytest.raises(ValueError, match='Invalid frequency: 1.5.0S'): - frequencies.to_offset('1.5.0S') - - # split offsets with spaces are valid - assert frequencies.to_offset('2D 3H') == offsets.Hour(51) - assert frequencies.to_offset('2 D3 H') == offsets.Hour(51) - assert frequencies.to_offset('2 D 3 H') == offsets.Hour(51) - assert frequencies.to_offset(' 2 D 3 H ') == offsets.Hour(51) - assert frequencies.to_offset(' H ') == offsets.Hour() - assert frequencies.to_offset(' 3 H ') == offsets.Hour(3) - - # special cases - assert frequencies.to_offset('2SMS-15') == offsets.SemiMonthBegin(2) - with pytest.raises(ValueError, match='Invalid frequency: 2SMS-15-15'): - frequencies.to_offset('2SMS-15-15') - with pytest.raises(ValueError, match='Invalid frequency: 2SMS-15D'): - frequencies.to_offset('2SMS-15D') - - def test_to_offset_leading_zero(self): - freqstr = '00H 00T 01S' - result = frequencies.to_offset(freqstr) - assert (result.n == 1) - - freqstr = '-00H 03T 14S' - result = frequencies.to_offset(freqstr) - assert (result.n == -194) - - def test_to_offset_leading_plus(self): - freqstr = '+1d' - result = frequencies.to_offset(freqstr) - assert (result.n == 1) - - freqstr = '+2h30min' - result = frequencies.to_offset(freqstr) - assert (result.n == 150) - - for bad_freq in ['+-1d', '-+1h', '+1', '-7', '+d', '-m']: - with pytest.raises(ValueError, match='Invalid frequency:'): - frequencies.to_offset(bad_freq) - - def test_to_offset_pd_timedelta(self): - # Tests for #9064 - td = Timedelta(days=1, seconds=1) - result = frequencies.to_offset(td) - expected = offsets.Second(86401) - assert (expected == result) - - td = Timedelta(days=-1, seconds=1) - result = frequencies.to_offset(td) - expected = offsets.Second(-86399) - assert (expected == result) - - td = Timedelta(hours=1, minutes=10) - result = frequencies.to_offset(td) - expected = offsets.Minute(70) - assert (expected == result) - - td = Timedelta(hours=1, minutes=-10) - result = frequencies.to_offset(td) - expected = offsets.Minute(50) - assert (expected == result) - - td = Timedelta(weeks=1) - result = frequencies.to_offset(td) - expected = offsets.Day(7) - assert (expected == result) - - td1 = Timedelta(hours=1) - result1 = frequencies.to_offset(td1) - result2 = frequencies.to_offset('60min') - assert (result1 == result2) - - td = Timedelta(microseconds=1) - result = frequencies.to_offset(td) - expected = offsets.Micro(1) - assert (expected == result) - - td = Timedelta(microseconds=0) - pytest.raises(ValueError, lambda: frequencies.to_offset(td)) - - def test_anchored_shortcuts(self): - result = frequencies.to_offset('W') - expected = frequencies.to_offset('W-SUN') - assert (result == expected) - - result1 = frequencies.to_offset('Q') - result2 = frequencies.to_offset('Q-DEC') - expected = offsets.QuarterEnd(startingMonth=12) - assert (result1 == expected) - assert (result2 == expected) - - result1 = frequencies.to_offset('Q-MAY') - expected = offsets.QuarterEnd(startingMonth=5) - assert (result1 == 
expected) - - result1 = frequencies.to_offset('SM') - result2 = frequencies.to_offset('SM-15') - expected = offsets.SemiMonthEnd(day_of_month=15) - assert (result1 == expected) - assert (result2 == expected) - - result = frequencies.to_offset('SM-1') - expected = offsets.SemiMonthEnd(day_of_month=1) - assert (result == expected) - - result = frequencies.to_offset('SM-27') - expected = offsets.SemiMonthEnd(day_of_month=27) - assert (result == expected) - - result = frequencies.to_offset('SMS-2') - expected = offsets.SemiMonthBegin(day_of_month=2) - assert (result == expected) - - result = frequencies.to_offset('SMS-27') - expected = offsets.SemiMonthBegin(day_of_month=27) - assert (result == expected) - - # ensure invalid cases fail as expected - invalid_anchors = ['SM-0', 'SM-28', 'SM-29', - 'SM-FOO', 'BSM', 'SM--1', - 'SMS-1', 'SMS-28', 'SMS-30', - 'SMS-BAR', 'SMS-BYR' 'BSMS', - 'SMS--2'] - for invalid_anchor in invalid_anchors: - with pytest.raises(ValueError, match='Invalid frequency: '): - frequencies.to_offset(invalid_anchor) - - -def test_ms_vs_MS(): - left = frequencies.get_offset('ms') - right = frequencies.get_offset('MS') - assert left == offsets.Milli() - assert right == offsets.MonthBegin() - - -def test_rule_aliases(): - rule = frequencies.to_offset('10us') - assert rule == offsets.Micro(10) - - -class TestFrequencyCode(object): - - def test_freq_code(self): - assert get_freq('A') == 1000 - assert get_freq('3A') == 1000 - assert get_freq('-1A') == 1000 - - assert get_freq('Y') == 1000 - assert get_freq('3Y') == 1000 - assert get_freq('-1Y') == 1000 - - assert get_freq('W') == 4000 - assert get_freq('W-MON') == 4001 - assert get_freq('W-FRI') == 4005 - - for freqstr, code in compat.iteritems(_period_code_map): - result = get_freq(freqstr) - assert result == code - - result = resolution.get_freq_group(freqstr) - assert result == code // 1000 * 1000 - - result = resolution.get_freq_group(code) - assert result == code // 1000 * 1000 - - def test_freq_group(self): - assert resolution.get_freq_group('A') == 1000 - assert resolution.get_freq_group('3A') == 1000 - assert resolution.get_freq_group('-1A') == 1000 - assert resolution.get_freq_group('A-JAN') == 1000 - assert resolution.get_freq_group('A-MAY') == 1000 - - assert resolution.get_freq_group('Y') == 1000 - assert resolution.get_freq_group('3Y') == 1000 - assert resolution.get_freq_group('-1Y') == 1000 - assert resolution.get_freq_group('Y-JAN') == 1000 - assert resolution.get_freq_group('Y-MAY') == 1000 - - assert resolution.get_freq_group(offsets.YearEnd()) == 1000 - assert resolution.get_freq_group(offsets.YearEnd(month=1)) == 1000 - assert resolution.get_freq_group(offsets.YearEnd(month=5)) == 1000 - - assert resolution.get_freq_group('W') == 4000 - assert resolution.get_freq_group('W-MON') == 4000 - assert resolution.get_freq_group('W-FRI') == 4000 - assert resolution.get_freq_group(offsets.Week()) == 4000 - assert resolution.get_freq_group(offsets.Week(weekday=1)) == 4000 - assert resolution.get_freq_group(offsets.Week(weekday=5)) == 4000 - - def test_get_to_timestamp_base(self): - tsb = libfrequencies.get_to_timestamp_base - - assert (tsb(get_freq_code('D')[0]) == - get_freq_code('D')[0]) - assert (tsb(get_freq_code('W')[0]) == - get_freq_code('D')[0]) - assert (tsb(get_freq_code('M')[0]) == - get_freq_code('D')[0]) - - assert (tsb(get_freq_code('S')[0]) == - get_freq_code('S')[0]) - assert (tsb(get_freq_code('T')[0]) == - get_freq_code('S')[0]) - assert (tsb(get_freq_code('H')[0]) == - get_freq_code('S')[0]) - - def 
test_freq_to_reso(self): - Reso = resolution.Resolution - - assert Reso.get_str_from_freq('A') == 'year' - assert Reso.get_str_from_freq('Q') == 'quarter' - assert Reso.get_str_from_freq('M') == 'month' - assert Reso.get_str_from_freq('D') == 'day' - assert Reso.get_str_from_freq('H') == 'hour' - assert Reso.get_str_from_freq('T') == 'minute' - assert Reso.get_str_from_freq('S') == 'second' - assert Reso.get_str_from_freq('L') == 'millisecond' - assert Reso.get_str_from_freq('U') == 'microsecond' - assert Reso.get_str_from_freq('N') == 'nanosecond' - - for freq in ['A', 'Q', 'M', 'D', 'H', 'T', 'S', 'L', 'U', 'N']: - # check roundtrip - result = Reso.get_freq(Reso.get_str_from_freq(freq)) - assert freq == result - - for freq in ['D', 'H', 'T', 'S', 'L', 'U']: - result = Reso.get_freq(Reso.get_str(Reso.get_reso_from_freq(freq))) - assert freq == result - - def test_resolution_bumping(self): - # see gh-14378 - Reso = resolution.Resolution - - assert Reso.get_stride_from_decimal(1.5, 'T') == (90, 'S') - assert Reso.get_stride_from_decimal(62.4, 'T') == (3744, 'S') - assert Reso.get_stride_from_decimal(1.04, 'H') == (3744, 'S') - assert Reso.get_stride_from_decimal(1, 'D') == (1, 'D') - assert (Reso.get_stride_from_decimal(0.342931, 'H') == - (1234551600, 'U')) - assert Reso.get_stride_from_decimal(1.2345, 'D') == (106660800, 'L') - - with pytest.raises(ValueError): - Reso.get_stride_from_decimal(0.5, 'N') - - # too much precision in the input can prevent - with pytest.raises(ValueError): - Reso.get_stride_from_decimal(0.3429324798798269273987982, 'H') - - def test_get_freq_code(self): - # frequency str - assert (get_freq_code('A') == - (get_freq('A'), 1)) - assert (get_freq_code('3D') == - (get_freq('D'), 3)) - assert (get_freq_code('-2M') == - (get_freq('M'), -2)) - - # tuple - assert (get_freq_code(('D', 1)) == - (get_freq('D'), 1)) - assert (get_freq_code(('A', 3)) == - (get_freq('A'), 3)) - assert (get_freq_code(('M', -2)) == - (get_freq('M'), -2)) - - # numeric tuple - assert get_freq_code((1000, 1)) == (1000, 1) - - # offsets - assert (get_freq_code(offsets.Day()) == - (get_freq('D'), 1)) - assert (get_freq_code(offsets.Day(3)) == - (get_freq('D'), 3)) - assert (get_freq_code(offsets.Day(-2)) == - (get_freq('D'), -2)) - - assert (get_freq_code(offsets.MonthEnd()) == - (get_freq('M'), 1)) - assert (get_freq_code(offsets.MonthEnd(3)) == - (get_freq('M'), 3)) - assert (get_freq_code(offsets.MonthEnd(-2)) == - (get_freq('M'), -2)) - - assert (get_freq_code(offsets.Week()) == - (get_freq('W'), 1)) - assert (get_freq_code(offsets.Week(3)) == - (get_freq('W'), 3)) - assert (get_freq_code(offsets.Week(-2)) == - (get_freq('W'), -2)) - - # Monday is weekday=0 - assert (get_freq_code(offsets.Week(weekday=1)) == - (get_freq('W-TUE'), 1)) - assert (get_freq_code(offsets.Week(3, weekday=0)) == - (get_freq('W-MON'), 3)) - assert (get_freq_code(offsets.Week(-2, weekday=4)) == - (get_freq('W-FRI'), -2)) - - def test_frequency_misc(self): - assert (resolution.get_freq_group('T') == - FreqGroup.FR_MIN) - - code, stride = get_freq_code(offsets.Hour()) - assert code == FreqGroup.FR_HR - - code, stride = get_freq_code((5, 'T')) - assert code == FreqGroup.FR_MIN - assert stride == 5 - - offset = offsets.Hour() - result = frequencies.to_offset(offset) - assert result == offset - - result = frequencies.to_offset((5, 'T')) - expected = offsets.Minute(5) - assert result == expected - - with pytest.raises(ValueError, match='Invalid frequency'): - get_freq_code((5, 'baz')) - - with pytest.raises(ValueError, 
match='Invalid frequency'): - frequencies.to_offset('100foo') - - with pytest.raises(ValueError, match='Could not evaluate'): - frequencies.to_offset(('', '')) - - -_dti = DatetimeIndex - - -class TestFrequencyInference(object): - - def test_raise_if_period_index(self): - index = period_range(start="1/1/1990", periods=20, freq="M") - pytest.raises(TypeError, frequencies.infer_freq, index) - - def test_raise_if_too_few(self): - index = _dti(['12/31/1998', '1/3/1999']) - pytest.raises(ValueError, frequencies.infer_freq, index) - - def test_business_daily(self): - index = _dti(['01/01/1999', '1/4/1999', '1/5/1999']) - assert frequencies.infer_freq(index) == 'B' - - def test_business_daily_look_alike(self): - # GH 16624, do not infer 'B' when 'weekend' (2-day gap) in wrong place - index = _dti(['12/31/1998', '1/3/1999', '1/4/1999']) - assert frequencies.infer_freq(index) is None - - def test_day(self): - self._check_tick(timedelta(1), 'D') - - def test_day_corner(self): - index = _dti(['1/1/2000', '1/2/2000', '1/3/2000']) - assert frequencies.infer_freq(index) == 'D' - - def test_non_datetimeindex(self): - dates = to_datetime(['1/1/2000', '1/2/2000', '1/3/2000']) - assert frequencies.infer_freq(dates) == 'D' - - def test_hour(self): - self._check_tick(timedelta(hours=1), 'H') - - def test_minute(self): - self._check_tick(timedelta(minutes=1), 'T') - - def test_second(self): - self._check_tick(timedelta(seconds=1), 'S') - - def test_millisecond(self): - self._check_tick(timedelta(microseconds=1000), 'L') - - def test_microsecond(self): - self._check_tick(timedelta(microseconds=1), 'U') - - def test_nanosecond(self): - self._check_tick(np.timedelta64(1, 'ns'), 'N') - - def _check_tick(self, base_delta, code): - b = Timestamp(datetime.now()) - for i in range(1, 5): - inc = base_delta * i - index = _dti([b + inc * j for j in range(3)]) - if i > 1: - exp_freq = '%d%s' % (i, code) - else: - exp_freq = code - assert frequencies.infer_freq(index) == exp_freq - - index = _dti([b + base_delta * 7] + [b + base_delta * j for j in range( - 3)]) - assert frequencies.infer_freq(index) is None - - index = _dti([b + base_delta * j for j in range(3)] + [b + base_delta * - 7]) - - assert frequencies.infer_freq(index) is None - - def test_weekly(self): - days = ['MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN'] - - for day in days: - self._check_generated_range('1/1/2000', 'W-%s' % day) - - def test_week_of_month(self): - days = ['MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN'] - - for day in days: - for i in range(1, 5): - self._check_generated_range('1/1/2000', 'WOM-%d%s' % (i, day)) - - def test_fifth_week_of_month(self): - # Only supports freq up to WOM-4. See #9425 - func = lambda: date_range('2014-01-01', freq='WOM-5MON') - pytest.raises(ValueError, func) - - def test_fifth_week_of_month_infer(self): - # Only attempts to infer up to WOM-4. 
See #9425 - index = DatetimeIndex(["2014-03-31", "2014-06-30", "2015-03-30"]) - assert frequencies.infer_freq(index) is None - - def test_week_of_month_fake(self): - # All of these dates are on same day of week and are 4 or 5 weeks apart - index = DatetimeIndex(["2013-08-27", "2013-10-01", "2013-10-29", - "2013-11-26"]) - assert frequencies.infer_freq(index) != 'WOM-4TUE' - - def test_monthly(self): - self._check_generated_range('1/1/2000', 'M') - - def test_monthly_ambiguous(self): - rng = _dti(['1/31/2000', '2/29/2000', '3/31/2000']) - assert rng.inferred_freq == 'M' - - def test_business_monthly(self): - self._check_generated_range('1/1/2000', 'BM') - - def test_business_start_monthly(self): - self._check_generated_range('1/1/2000', 'BMS') - - def test_quarterly(self): - for month in ['JAN', 'FEB', 'MAR']: - self._check_generated_range('1/1/2000', 'Q-%s' % month) - - def test_annual(self): - for month in MONTHS: - self._check_generated_range('1/1/2000', 'A-%s' % month) - - def test_business_annual(self): - for month in MONTHS: - self._check_generated_range('1/1/2000', 'BA-%s' % month) - - def test_annual_ambiguous(self): - rng = _dti(['1/31/2000', '1/31/2001', '1/31/2002']) - assert rng.inferred_freq == 'A-JAN' - - def _check_generated_range(self, start, freq): - freq = freq.upper() - - gen = date_range(start, periods=7, freq=freq) - index = _dti(gen.values) - if not freq.startswith('Q-'): - assert frequencies.infer_freq(index) == gen.freqstr - else: - inf_freq = frequencies.infer_freq(index) - is_dec_range = inf_freq == 'Q-DEC' and gen.freqstr in ( - 'Q', 'Q-DEC', 'Q-SEP', 'Q-JUN', 'Q-MAR') - is_nov_range = inf_freq == 'Q-NOV' and gen.freqstr in ( - 'Q-NOV', 'Q-AUG', 'Q-MAY', 'Q-FEB') - is_oct_range = inf_freq == 'Q-OCT' and gen.freqstr in ( - 'Q-OCT', 'Q-JUL', 'Q-APR', 'Q-JAN') - assert is_dec_range or is_nov_range or is_oct_range - - gen = date_range(start, periods=5, freq=freq) - index = _dti(gen.values) - - if not freq.startswith('Q-'): - assert frequencies.infer_freq(index) == gen.freqstr - else: - inf_freq = frequencies.infer_freq(index) - is_dec_range = inf_freq == 'Q-DEC' and gen.freqstr in ( - 'Q', 'Q-DEC', 'Q-SEP', 'Q-JUN', 'Q-MAR') - is_nov_range = inf_freq == 'Q-NOV' and gen.freqstr in ( - 'Q-NOV', 'Q-AUG', 'Q-MAY', 'Q-FEB') - is_oct_range = inf_freq == 'Q-OCT' and gen.freqstr in ( - 'Q-OCT', 'Q-JUL', 'Q-APR', 'Q-JAN') - - assert is_dec_range or is_nov_range or is_oct_range - - def test_infer_freq(self): - rng = period_range('1959Q2', '2009Q3', freq='Q') - rng = Index(rng.to_timestamp('D', how='e').astype(object)) - assert rng.inferred_freq == 'Q-DEC' - - rng = period_range('1959Q2', '2009Q3', freq='Q-NOV') - rng = Index(rng.to_timestamp('D', how='e').astype(object)) - assert rng.inferred_freq == 'Q-NOV' - - rng = period_range('1959Q2', '2009Q3', freq='Q-OCT') - rng = Index(rng.to_timestamp('D', how='e').astype(object)) - assert rng.inferred_freq == 'Q-OCT' - - def test_infer_freq_tz(self): - - freqs = {'AS-JAN': - ['2009-01-01', '2010-01-01', '2011-01-01', '2012-01-01'], - 'Q-OCT': - ['2009-01-31', '2009-04-30', '2009-07-31', '2009-10-31'], - 'M': ['2010-11-30', '2010-12-31', '2011-01-31', '2011-02-28'], - 'W-SAT': - ['2010-12-25', '2011-01-01', '2011-01-08', '2011-01-15'], - 'D': ['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04'], - 'H': ['2011-12-31 22:00', '2011-12-31 23:00', - '2012-01-01 00:00', '2012-01-01 01:00']} - - # GH 7310 - for tz in [None, 'Australia/Sydney', 'Asia/Tokyo', 'Europe/Paris', - 'US/Pacific', 'US/Eastern']: - for expected, dates in 
compat.iteritems(freqs): - idx = DatetimeIndex(dates, tz=tz) - assert idx.inferred_freq == expected - - def test_infer_freq_tz_transition(self): - # Tests for #8772 - date_pairs = [['2013-11-02', '2013-11-5'], # Fall DST - ['2014-03-08', '2014-03-11'], # Spring DST - ['2014-01-01', '2014-01-03']] # Regular Time - freqs = ['3H', '10T', '3601S', '3600001L', '3600000001U', - '3600000000001N'] - - for tz in [None, 'Australia/Sydney', 'Asia/Tokyo', 'Europe/Paris', - 'US/Pacific', 'US/Eastern']: - for date_pair in date_pairs: - for freq in freqs: - idx = date_range(date_pair[0], date_pair[ - 1], freq=freq, tz=tz) - assert idx.inferred_freq == freq - - index = date_range("2013-11-03", periods=5, - freq="3H").tz_localize("America/Chicago") - assert index.inferred_freq is None - - def test_infer_freq_businesshour(self): - # GH 7905 - idx = DatetimeIndex( - ['2014-07-01 09:00', '2014-07-01 10:00', '2014-07-01 11:00', - '2014-07-01 12:00', '2014-07-01 13:00', '2014-07-01 14:00']) - # hourly freq in a day must result in 'H' - assert idx.inferred_freq == 'H' - - idx = DatetimeIndex( - ['2014-07-01 09:00', '2014-07-01 10:00', '2014-07-01 11:00', - '2014-07-01 12:00', '2014-07-01 13:00', '2014-07-01 14:00', - '2014-07-01 15:00', '2014-07-01 16:00', '2014-07-02 09:00', - '2014-07-02 10:00', '2014-07-02 11:00']) - assert idx.inferred_freq == 'BH' - - idx = DatetimeIndex( - ['2014-07-04 09:00', '2014-07-04 10:00', '2014-07-04 11:00', - '2014-07-04 12:00', '2014-07-04 13:00', '2014-07-04 14:00', - '2014-07-04 15:00', '2014-07-04 16:00', '2014-07-07 09:00', - '2014-07-07 10:00', '2014-07-07 11:00']) - assert idx.inferred_freq == 'BH' - - idx = DatetimeIndex( - ['2014-07-04 09:00', '2014-07-04 10:00', '2014-07-04 11:00', - '2014-07-04 12:00', '2014-07-04 13:00', '2014-07-04 14:00', - '2014-07-04 15:00', '2014-07-04 16:00', '2014-07-07 09:00', - '2014-07-07 10:00', '2014-07-07 11:00', '2014-07-07 12:00', - '2014-07-07 13:00', '2014-07-07 14:00', '2014-07-07 15:00', - '2014-07-07 16:00', '2014-07-08 09:00', '2014-07-08 10:00', - '2014-07-08 11:00', '2014-07-08 12:00', '2014-07-08 13:00', - '2014-07-08 14:00', '2014-07-08 15:00', '2014-07-08 16:00']) - assert idx.inferred_freq == 'BH' - - def test_not_monotonic(self): - rng = _dti(['1/31/2000', '1/31/2001', '1/31/2002']) - rng = rng[::-1] - assert rng.inferred_freq == '-1A-JAN' - - def test_non_datetimeindex2(self): - rng = _dti(['1/31/2000', '1/31/2001', '1/31/2002']) - - vals = rng.to_pydatetime() - - result = frequencies.infer_freq(vals) - assert result == rng.inferred_freq - - def test_invalid_index_types(self): - - # test all index types - for i in [tm.makeIntIndex(10), tm.makeFloatIndex(10), - tm.makePeriodIndex(10)]: - pytest.raises(TypeError, lambda: frequencies.infer_freq(i)) - - # GH 10822 - # odd error message on conversions to datetime for unicode - if not is_platform_windows(): - for i in [tm.makeStringIndex(10), tm.makeUnicodeIndex(10)]: - pytest.raises(ValueError, lambda: frequencies.infer_freq(i)) - - def test_string_datetimelike_compat(self): - - # GH 6463 - expected = frequencies.infer_freq(['2004-01', '2004-02', '2004-03', - '2004-04']) - result = frequencies.infer_freq(Index(['2004-01', '2004-02', '2004-03', - '2004-04'])) - assert result == expected - - def test_series(self): - - # GH6407 - # inferring series - - # invalid type of Series - for s in [Series(np.arange(10)), Series(np.arange(10.))]: - pytest.raises(TypeError, lambda: frequencies.infer_freq(s)) - - # a non-convertible string - pytest.raises(ValueError, lambda: 
frequencies.infer_freq( - Series(['foo', 'bar']))) - - # cannot infer on PeriodIndex - for freq in [None, 'L']: - s = Series(period_range('2013', periods=10, freq=freq)) - pytest.raises(TypeError, lambda: frequencies.infer_freq(s)) - - # DateTimeIndex - for freq in ['M', 'L', 'S']: - s = Series(date_range('20130101', periods=10, freq=freq)) - inferred = frequencies.infer_freq(s) - assert inferred == freq - - s = Series(date_range('20130101', '20130110')) - inferred = frequencies.infer_freq(s) - assert inferred == 'D' - - def test_legacy_offset_warnings(self): - freqs = ['WEEKDAY', 'EOM', 'W@MON', 'W@TUE', 'W@WED', 'W@THU', - 'W@FRI', 'W@SAT', 'W@SUN', 'Q@JAN', 'Q@FEB', 'Q@MAR', - 'A@JAN', 'A@FEB', 'A@MAR', 'A@APR', 'A@MAY', 'A@JUN', - 'A@JUL', 'A@AUG', 'A@SEP', 'A@OCT', 'A@NOV', 'A@DEC', - 'Y@JAN', 'WOM@1MON', 'WOM@2MON', 'WOM@3MON', - 'WOM@4MON', 'WOM@1TUE', 'WOM@2TUE', 'WOM@3TUE', - 'WOM@4TUE', 'WOM@1WED', 'WOM@2WED', 'WOM@3WED', - 'WOM@4WED', 'WOM@1THU', 'WOM@2THU', 'WOM@3THU', - 'WOM@4THU', 'WOM@1FRI', 'WOM@2FRI', 'WOM@3FRI', - 'WOM@4FRI'] - - msg = INVALID_FREQ_ERR_MSG - for freq in freqs: - with pytest.raises(ValueError, match=msg): - frequencies.get_offset(freq) - - with pytest.raises(ValueError, match=msg): - date_range('2011-01-01', periods=5, freq=freq) diff --git a/pandas/tests/tseries/test_holiday.py b/pandas/tests/tseries/test_holiday.py deleted file mode 100644 index 86f154ed1acc2..0000000000000 --- a/pandas/tests/tseries/test_holiday.py +++ /dev/null @@ -1,382 +0,0 @@ -from datetime import datetime - -import pytest -from pytz import utc - -from pandas import DatetimeIndex, compat -import pandas.util.testing as tm - -from pandas.tseries.holiday import ( - MO, SA, AbstractHolidayCalendar, DateOffset, EasterMonday, GoodFriday, - Holiday, HolidayCalendarFactory, Timestamp, USColumbusDay, - USFederalHolidayCalendar, USLaborDay, USMartinLutherKingJr, USMemorialDay, - USPresidentsDay, USThanksgivingDay, after_nearest_workday, - before_nearest_workday, get_calendar, nearest_workday, next_monday, - next_monday_or_tuesday, next_workday, previous_friday, previous_workday, - sunday_to_monday, weekend_to_monday) - - -class TestCalendar(object): - - def setup_method(self, method): - self.holiday_list = [ - datetime(2012, 1, 2), - datetime(2012, 1, 16), - datetime(2012, 2, 20), - datetime(2012, 5, 28), - datetime(2012, 7, 4), - datetime(2012, 9, 3), - datetime(2012, 10, 8), - datetime(2012, 11, 12), - datetime(2012, 11, 22), - datetime(2012, 12, 25)] - - self.start_date = datetime(2012, 1, 1) - self.end_date = datetime(2012, 12, 31) - - def test_calendar(self): - - calendar = USFederalHolidayCalendar() - holidays = calendar.holidays(self.start_date, self.end_date) - - holidays_1 = calendar.holidays( - self.start_date.strftime('%Y-%m-%d'), - self.end_date.strftime('%Y-%m-%d')) - holidays_2 = calendar.holidays( - Timestamp(self.start_date), - Timestamp(self.end_date)) - - assert list(holidays.to_pydatetime()) == self.holiday_list - assert list(holidays_1.to_pydatetime()) == self.holiday_list - assert list(holidays_2.to_pydatetime()) == self.holiday_list - - def test_calendar_caching(self): - # Test for issue #9552 - - class TestCalendar(AbstractHolidayCalendar): - - def __init__(self, name=None, rules=None): - super(TestCalendar, self).__init__(name=name, rules=rules) - - jan1 = TestCalendar(rules=[Holiday('jan1', year=2015, month=1, day=1)]) - jan2 = TestCalendar(rules=[Holiday('jan2', year=2015, month=1, day=2)]) - - tm.assert_index_equal(jan1.holidays(), DatetimeIndex(['01-Jan-2015'])) - 
tm.assert_index_equal(jan2.holidays(), DatetimeIndex(['02-Jan-2015'])) - - def test_calendar_observance_dates(self): - # Test for issue 11477 - USFedCal = get_calendar('USFederalHolidayCalendar') - holidays0 = USFedCal.holidays(datetime(2015, 7, 3), datetime( - 2015, 7, 3)) # <-- same start and end dates - holidays1 = USFedCal.holidays(datetime(2015, 7, 3), datetime( - 2015, 7, 6)) # <-- different start and end dates - holidays2 = USFedCal.holidays(datetime(2015, 7, 3), datetime( - 2015, 7, 3)) # <-- same start and end dates - - tm.assert_index_equal(holidays0, holidays1) - tm.assert_index_equal(holidays0, holidays2) - - def test_rule_from_name(self): - USFedCal = get_calendar('USFederalHolidayCalendar') - assert USFedCal.rule_from_name('Thanksgiving') == USThanksgivingDay - - -class TestHoliday(object): - - def setup_method(self, method): - self.start_date = datetime(2011, 1, 1) - self.end_date = datetime(2020, 12, 31) - - def check_results(self, holiday, start, end, expected): - assert list(holiday.dates(start, end)) == expected - - # Verify that timezone info is preserved. - assert (list(holiday.dates(utc.localize(Timestamp(start)), - utc.localize(Timestamp(end)))) == - [utc.localize(dt) for dt in expected]) - - def test_usmemorialday(self): - self.check_results(holiday=USMemorialDay, - start=self.start_date, - end=self.end_date, - expected=[ - datetime(2011, 5, 30), - datetime(2012, 5, 28), - datetime(2013, 5, 27), - datetime(2014, 5, 26), - datetime(2015, 5, 25), - datetime(2016, 5, 30), - datetime(2017, 5, 29), - datetime(2018, 5, 28), - datetime(2019, 5, 27), - datetime(2020, 5, 25), - ], ) - - def test_non_observed_holiday(self): - - self.check_results( - Holiday('July 4th Eve', month=7, day=3), - start="2001-01-01", - end="2003-03-03", - expected=[ - Timestamp('2001-07-03 00:00:00'), - Timestamp('2002-07-03 00:00:00') - ] - ) - - self.check_results( - Holiday('July 4th Eve', month=7, day=3, days_of_week=(0, 1, 2, 3)), - start="2001-01-01", - end="2008-03-03", - expected=[ - Timestamp('2001-07-03 00:00:00'), - Timestamp('2002-07-03 00:00:00'), - Timestamp('2003-07-03 00:00:00'), - Timestamp('2006-07-03 00:00:00'), - Timestamp('2007-07-03 00:00:00'), - ] - ) - - def test_easter(self): - - self.check_results(EasterMonday, - start=self.start_date, - end=self.end_date, - expected=[ - Timestamp('2011-04-25 00:00:00'), - Timestamp('2012-04-09 00:00:00'), - Timestamp('2013-04-01 00:00:00'), - Timestamp('2014-04-21 00:00:00'), - Timestamp('2015-04-06 00:00:00'), - Timestamp('2016-03-28 00:00:00'), - Timestamp('2017-04-17 00:00:00'), - Timestamp('2018-04-02 00:00:00'), - Timestamp('2019-04-22 00:00:00'), - Timestamp('2020-04-13 00:00:00'), - ], ) - self.check_results(GoodFriday, - start=self.start_date, - end=self.end_date, - expected=[ - Timestamp('2011-04-22 00:00:00'), - Timestamp('2012-04-06 00:00:00'), - Timestamp('2013-03-29 00:00:00'), - Timestamp('2014-04-18 00:00:00'), - Timestamp('2015-04-03 00:00:00'), - Timestamp('2016-03-25 00:00:00'), - Timestamp('2017-04-14 00:00:00'), - Timestamp('2018-03-30 00:00:00'), - Timestamp('2019-04-19 00:00:00'), - Timestamp('2020-04-10 00:00:00'), - ], ) - - def test_usthanksgivingday(self): - - self.check_results(USThanksgivingDay, - start=self.start_date, - end=self.end_date, - expected=[ - datetime(2011, 11, 24), - datetime(2012, 11, 22), - datetime(2013, 11, 28), - datetime(2014, 11, 27), - datetime(2015, 11, 26), - datetime(2016, 11, 24), - datetime(2017, 11, 23), - datetime(2018, 11, 22), - datetime(2019, 11, 28), - datetime(2020, 11, 26), - 
], ) - - def test_holidays_within_dates(self): - # Fix holiday behavior found in #11477 - # where holiday.dates returned dates outside start/end date - # or observed rules could not be applied as the holiday - # was not in the original date range (e.g., 7/4/2015 -> 7/3/2015) - start_date = datetime(2015, 7, 1) - end_date = datetime(2015, 7, 1) - - calendar = get_calendar('USFederalHolidayCalendar') - new_years = calendar.rule_from_name('New Years Day') - july_4th = calendar.rule_from_name('July 4th') - veterans_day = calendar.rule_from_name('Veterans Day') - christmas = calendar.rule_from_name('Christmas') - - # Holiday: (start/end date, holiday) - holidays = {USMemorialDay: ("2015-05-25", "2015-05-25"), - USLaborDay: ("2015-09-07", "2015-09-07"), - USColumbusDay: ("2015-10-12", "2015-10-12"), - USThanksgivingDay: ("2015-11-26", "2015-11-26"), - USMartinLutherKingJr: ("2015-01-19", "2015-01-19"), - USPresidentsDay: ("2015-02-16", "2015-02-16"), - GoodFriday: ("2015-04-03", "2015-04-03"), - EasterMonday: [("2015-04-06", "2015-04-06"), - ("2015-04-05", [])], - new_years: [("2015-01-01", "2015-01-01"), - ("2011-01-01", []), - ("2010-12-31", "2010-12-31")], - july_4th: [("2015-07-03", "2015-07-03"), - ("2015-07-04", [])], - veterans_day: [("2012-11-11", []), - ("2012-11-12", "2012-11-12")], - christmas: [("2011-12-25", []), - ("2011-12-26", "2011-12-26")]} - - for rule, dates in compat.iteritems(holidays): - empty_dates = rule.dates(start_date, end_date) - assert empty_dates.tolist() == [] - - if isinstance(dates, tuple): - dates = [dates] - - for start, expected in dates: - if len(expected): - expected = [Timestamp(expected)] - self.check_results(rule, start, start, expected) - - def test_argument_types(self): - holidays = USThanksgivingDay.dates(self.start_date, self.end_date) - - holidays_1 = USThanksgivingDay.dates( - self.start_date.strftime('%Y-%m-%d'), - self.end_date.strftime('%Y-%m-%d')) - - holidays_2 = USThanksgivingDay.dates( - Timestamp(self.start_date), - Timestamp(self.end_date)) - - tm.assert_index_equal(holidays, holidays_1) - tm.assert_index_equal(holidays, holidays_2) - - def test_special_holidays(self): - base_date = [datetime(2012, 5, 28)] - holiday_1 = Holiday('One-Time', year=2012, month=5, day=28) - holiday_2 = Holiday('Range', month=5, day=28, - start_date=datetime(2012, 1, 1), - end_date=datetime(2012, 12, 31), - offset=DateOffset(weekday=MO(1))) - - assert base_date == holiday_1.dates(self.start_date, self.end_date) - assert base_date == holiday_2.dates(self.start_date, self.end_date) - - def test_get_calendar(self): - class TestCalendar(AbstractHolidayCalendar): - rules = [] - - calendar = get_calendar('TestCalendar') - assert TestCalendar == calendar.__class__ - - def test_factory(self): - class_1 = HolidayCalendarFactory('MemorialDay', - AbstractHolidayCalendar, - USMemorialDay) - class_2 = HolidayCalendarFactory('Thansksgiving', - AbstractHolidayCalendar, - USThanksgivingDay) - class_3 = HolidayCalendarFactory('Combined', class_1, class_2) - - assert len(class_1.rules) == 1 - assert len(class_2.rules) == 1 - assert len(class_3.rules) == 2 - - -class TestObservanceRules(object): - - def setup_method(self, method): - self.we = datetime(2014, 4, 9) - self.th = datetime(2014, 4, 10) - self.fr = datetime(2014, 4, 11) - self.sa = datetime(2014, 4, 12) - self.su = datetime(2014, 4, 13) - self.mo = datetime(2014, 4, 14) - self.tu = datetime(2014, 4, 15) - - def test_next_monday(self): - assert next_monday(self.sa) == self.mo - assert next_monday(self.su) == self.mo - - 
def test_next_monday_or_tuesday(self): - assert next_monday_or_tuesday(self.sa) == self.mo - assert next_monday_or_tuesday(self.su) == self.tu - assert next_monday_or_tuesday(self.mo) == self.tu - - def test_previous_friday(self): - assert previous_friday(self.sa) == self.fr - assert previous_friday(self.su) == self.fr - - def test_sunday_to_monday(self): - assert sunday_to_monday(self.su) == self.mo - - def test_nearest_workday(self): - assert nearest_workday(self.sa) == self.fr - assert nearest_workday(self.su) == self.mo - assert nearest_workday(self.mo) == self.mo - - def test_weekend_to_monday(self): - assert weekend_to_monday(self.sa) == self.mo - assert weekend_to_monday(self.su) == self.mo - assert weekend_to_monday(self.mo) == self.mo - - def test_next_workday(self): - assert next_workday(self.sa) == self.mo - assert next_workday(self.su) == self.mo - assert next_workday(self.mo) == self.tu - - def test_previous_workday(self): - assert previous_workday(self.sa) == self.fr - assert previous_workday(self.su) == self.fr - assert previous_workday(self.tu) == self.mo - - def test_before_nearest_workday(self): - assert before_nearest_workday(self.sa) == self.th - assert before_nearest_workday(self.su) == self.fr - assert before_nearest_workday(self.tu) == self.mo - - def test_after_nearest_workday(self): - assert after_nearest_workday(self.sa) == self.mo - assert after_nearest_workday(self.su) == self.tu - assert after_nearest_workday(self.fr) == self.mo - - -class TestFederalHolidayCalendar(object): - - def test_no_mlk_before_1986(self): - # see gh-10278 - class MLKCalendar(AbstractHolidayCalendar): - rules = [USMartinLutherKingJr] - - holidays = MLKCalendar().holidays(start='1984', - end='1988').to_pydatetime().tolist() - - # Testing to make sure holiday is not incorrectly observed before 1986 - assert holidays == [datetime(1986, 1, 20, 0, 0), - datetime(1987, 1, 19, 0, 0)] - - def test_memorial_day(self): - class MemorialDay(AbstractHolidayCalendar): - rules = [USMemorialDay] - - holidays = MemorialDay().holidays(start='1971', - end='1980').to_pydatetime().tolist() - - # Fixes 5/31 error and checked manually against Wikipedia - assert holidays == [datetime(1971, 5, 31, 0, 0), - datetime(1972, 5, 29, 0, 0), - datetime(1973, 5, 28, 0, 0), - datetime(1974, 5, 27, 0, 0), - datetime(1975, 5, 26, 0, 0), - datetime(1976, 5, 31, 0, 0), - datetime(1977, 5, 30, 0, 0), - datetime(1978, 5, 29, 0, 0), - datetime(1979, 5, 28, 0, 0)] - - -class TestHolidayConflictingArguments(object): - - def test_both_offset_observance_raises(self): - # see gh-10217 - with pytest.raises(NotImplementedError): - Holiday("Cyber Monday", month=11, day=1, - offset=[DateOffset(weekday=SA(4))], - observance=next_monday) diff --git a/pandas/tests/util/test_hashing.py b/pandas/tests/util/test_hashing.py index d36de931e2610..c80b4483c0482 100644 --- a/pandas/tests/util/test_hashing.py +++ b/pandas/tests/util/test_hashing.py @@ -257,8 +257,7 @@ def test_categorical_with_nan_consistency(): assert result[1] in expected -@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") -@pytest.mark.parametrize("obj", [pd.Timestamp("20130101"), tm.makePanel()]) +@pytest.mark.parametrize("obj", [pd.Timestamp("20130101")]) def test_pandas_errors(obj): msg = "Unexpected type for hashing" with pytest.raises(TypeError, match=msg): diff --git a/pandas/tseries/frequencies.py b/pandas/tseries/frequencies.py index c454db3bbdffc..1b782b430a1a7 100644 --- a/pandas/tseries/frequencies.py +++ b/pandas/tseries/frequencies.py @@ -67,8 +67,8 @@ def 
to_offset(freq): Returns ------- - delta : DateOffset - None if freq is None + DateOffset + None if freq is None. Raises ------ @@ -77,7 +77,7 @@ def to_offset(freq): See Also -------- - pandas.DateOffset + DateOffset Examples -------- @@ -214,7 +214,7 @@ def infer_freq(index, warn=True): Returns ------- - freq : string or None + str or None None if no discernible frequency TypeError if the index is not datetime-like ValueError if there are less than three values. @@ -300,7 +300,7 @@ def get_freq(self): Returns ------- - freqstr : str or None + str or None """ if not self.is_monotonic or not self.index._is_unique: return None diff --git a/pandas/util/_decorators.py b/pandas/util/_decorators.py index 86cd8b1e698c6..e92051ebbea9a 100644 --- a/pandas/util/_decorators.py +++ b/pandas/util/_decorators.py @@ -4,12 +4,13 @@ import warnings from pandas._libs.properties import cache_readonly # noqa -from pandas.compat import PY2, callable, signature +from pandas.compat import PY2, signature def deprecate(name, alternative, version, alt_name=None, klass=None, stacklevel=2, msg=None): - """Return a new function that emits a deprecation warning on use. + """ + Return a new function that emits a deprecation warning on use. To use this method for a deprecated function, another function `alternative` with the same signature must exist. The deprecated diff --git a/pandas/util/_tester.py b/pandas/util/_tester.py index 18e8d415459fd..19b1cc700261c 100644 --- a/pandas/util/_tester.py +++ b/pandas/util/_tester.py @@ -11,7 +11,7 @@ def test(extra_args=None): try: import pytest except ImportError: - raise ImportError("Need pytest>=3.0 to run tests") + raise ImportError("Need pytest>=4.0.2 to run tests") try: import hypothesis # noqa except ImportError: diff --git a/pandas/util/move.c b/pandas/util/move.c index 62860adb1c1f6..188d7b79b35d2 100644 --- a/pandas/util/move.c +++ b/pandas/util/move.c @@ -1,3 +1,12 @@ +/* +Copyright (c) 2019, PyData Development Team +All rights reserved. + +Distributed under the terms of the BSD Simplified License. + +The full license is in the LICENSE file, distributed with this software. 
+*/
+
 #include <Python.h>

 #define COMPILING_IN_PY2 (PY_VERSION_HEX <= 0x03000000)
@@ -10,15 +19,15 @@
 /* in python 3, we cannot intern bytes objects so this is always false */
 #define PyString_CHECK_INTERNED(cs) 0
-#endif /* !COMPILING_IN_PY2 */
+#endif  // !COMPILING_IN_PY2

 #ifndef Py_TPFLAGS_HAVE_GETCHARBUFFER
 #define Py_TPFLAGS_HAVE_GETCHARBUFFER 0
-#endif
+#endif  // Py_TPFLAGS_HAVE_GETCHARBUFFER

 #ifndef Py_TPFLAGS_HAVE_NEWBUFFER
 #define Py_TPFLAGS_HAVE_NEWBUFFER 0
-#endif
+#endif  // Py_TPFLAGS_HAVE_NEWBUFFER

 static PyObject *badmove;  /* bad move exception class */
@@ -31,15 +40,13 @@ typedef struct {
 static PyTypeObject stolenbuf_type;  /* forward declare type */

 static void
-stolenbuf_dealloc(stolenbufobject *self)
-{
+stolenbuf_dealloc(stolenbufobject *self) {
     Py_DECREF(self->invalid_bytes);
     PyObject_Del(self);
 }

 static int
-stolenbuf_getbuffer(stolenbufobject *self, Py_buffer *view, int flags)
-{
+stolenbuf_getbuffer(stolenbufobject *self, Py_buffer *view, int flags) {
     return PyBuffer_FillInfo(view,
                              (PyObject*) self,
                              (void*) PyString_AS_STRING(self->invalid_bytes),
@@ -51,8 +58,8 @@ stolenbuf_getbuffer(stolenbufobject *self, Py_buffer *view, int flags)
 #if COMPILING_IN_PY2

 static Py_ssize_t
-stolenbuf_getreadwritebuf(stolenbufobject *self, Py_ssize_t segment, void **out)
-{
+stolenbuf_getreadwritebuf(stolenbufobject *self,
+                          Py_ssize_t segment, void **out) {
     if (segment != 0) {
         PyErr_SetString(PyExc_SystemError,
                         "accessing non-existent string segment");
@@ -63,8 +70,7 @@ stolenbuf_getreadwritebuf(stolenbufobject *self, Py_ssize_t segment, void **out)
 }

 static Py_ssize_t
-stolenbuf_getsegcount(stolenbufobject *self, Py_ssize_t *len)
-{
+stolenbuf_getsegcount(stolenbufobject *self, Py_ssize_t *len) {
     if (len) {
         *len = PyString_GET_SIZE(self->invalid_bytes);
     }
@@ -79,14 +85,14 @@ static PyBufferProcs stolenbuf_as_buffer = {
     (getbufferproc) stolenbuf_getbuffer,
 };

-#else /* Python 3 */
+#else  // Python 3

 static PyBufferProcs stolenbuf_as_buffer = {
     (getbufferproc) stolenbuf_getbuffer,
     NULL,
 };

-#endif /* COMPILING_IN_PY2 */
+#endif  // COMPILING_IN_PY2

 PyDoc_STRVAR(stolenbuf_doc,
              "A buffer that is wrapping a stolen bytes object's buffer.");

@@ -157,8 +163,7 @@ PyDoc_STRVAR(
    however, if called through *unpacking like ``stolenbuf(*(a,))`` it would
    only have the one reference (the tuple).
diff --git a/pandas/util/testing.py b/pandas/util/testing.py
index f441dd20f3982..a5ae1f6a4d960 100644
--- a/pandas/util/testing.py
+++ b/pandas/util/testing.py
@@ -1,5 +1,6 @@
 from __future__ import division

+from collections import Counter
 from contextlib import contextmanager
 from datetime import datetime
 from functools import wraps
@@ -20,8 +21,8 @@
 from pandas._libs import testing as _testing
 import pandas.compat as compat
 from pandas.compat import (
-    PY2, PY3, Counter, callable, filter, httplib, lmap, lrange, lzip, map,
-    raise_with_traceback, range, string_types, u, unichr, zip)
+    PY2, PY3, filter, httplib, lmap, lrange, lzip, map, raise_with_traceback,
+    range, string_types, u, unichr, zip)

 from pandas.core.dtypes.common import (
     is_bool, is_categorical_dtype, is_datetime64_dtype, is_datetime64tz_dtype,
@@ -33,7 +34,7 @@
 import pandas as pd
 from pandas import (
     Categorical, CategoricalIndex, DataFrame, DatetimeIndex, Index,
-    IntervalIndex, MultiIndex, Panel, RangeIndex, Series, bdate_range)
+    IntervalIndex, MultiIndex, RangeIndex, Series, bdate_range)
 from pandas.core.algorithms import take_1d
 from pandas.core.arrays import (
     DatetimeArray, ExtensionArray, IntervalArray, PeriodArray, TimedeltaArray,
@@ -1502,69 +1503,6 @@ def assert_frame_equal(left, right, check_dtype=True,
                            obj='DataFrame.iloc[:, {idx}]'.format(idx=i))


-def assert_panel_equal(left, right,
-                       check_dtype=True,
-                       check_panel_type=False,
-                       check_less_precise=False,
-                       check_names=False,
-                       by_blocks=False,
-                       obj='Panel'):
-    """Check that left and right Panels are equal.
-
-    Parameters
-    ----------
-    left : Panel (or nd)
-    right : Panel (or nd)
-    check_dtype : bool, default True
-        Whether to check the Panel dtype is identical.
-    check_panel_type : bool, default False
-        Whether to check the Panel class is identical.
-    check_less_precise : bool or int, default False
-        Specify comparison precision. Only used when check_exact is False.
-        5 digits (False) or 3 digits (True) after decimal points are compared.
-        If int, then specify the digits to compare
-    check_names : bool, default True
-        Whether to check the Index names attribute.
-    by_blocks : bool, default False
-        Specify how to compare internal data. If False, compare by columns.
-        If True, compare by blocks.
-    obj : str, default 'Panel'
-        Specify the object name being compared, internally used to show
-        the appropriate assertion message.
-    """
-
-    if check_panel_type:
-        assert_class_equal(left, right, obj=obj)
-
-    for axis in left._AXIS_ORDERS:
-        left_ind = getattr(left, axis)
-        right_ind = getattr(right, axis)
-        assert_index_equal(left_ind, right_ind, check_names=check_names)
-
-    if by_blocks:
-        rblocks = right._to_dict_of_blocks()
-        lblocks = left._to_dict_of_blocks()
-        for dtype in list(set(list(lblocks.keys()) + list(rblocks.keys()))):
-            assert dtype in lblocks
-            assert dtype in rblocks
-            array_equivalent(lblocks[dtype].values, rblocks[dtype].values)
-    else:
-
-        # can potentially be slow
-        for i, item in enumerate(left._get_axis(0)):
-            msg = "non-matching item (right) '{item}'".format(item=item)
-            assert item in right, msg
-            litem = left.iloc[i]
-            ritem = right.iloc[i]
-            assert_frame_equal(litem, ritem,
-                               check_less_precise=check_less_precise,
-                               check_names=check_names)
-
-        for i, item in enumerate(right._get_axis(0)):
-            msg = "non-matching item (left) '{item}'".format(item=item)
-            assert item in left, msg
-
-
 def assert_equal(left, right, **kwargs):
     """
     Wrapper for tm.assert_*_equal to dispatch to the appropriate test function.
@@ -2051,22 +1989,6 @@ def makePeriodFrame(nper=None):
     return DataFrame(data)


-def makePanel(nper=None):
-    with warnings.catch_warnings(record=True):
-        warnings.filterwarnings("ignore", "\\nPanel", FutureWarning)
-        cols = ['Item' + c for c in string.ascii_uppercase[:K - 1]]
-        data = {c: makeTimeDataFrame(nper) for c in cols}
-        return Panel.fromDict(data)
-
-
-def makePeriodPanel(nper=None):
-    with warnings.catch_warnings(record=True):
-        warnings.filterwarnings("ignore", "\\nPanel", FutureWarning)
-        cols = ['Item' + c for c in string.ascii_uppercase[:K - 1]]
-        data = {c: makePeriodFrame(nper) for c in cols}
-        return Panel.fromDict(data)
-
-
 def makeCustomIndex(nentries, nlevels, prefix='#', names=False, ndupe_l=None,
                     idx_type=None):
     """Create an index/multindex with given dimensions, levels, names, etc'
@@ -2314,15 +2236,6 @@ def makeMissingDataframe(density=.9, random_state=None):
     return df


-def add_nans(panel):
-    I, J, N = panel.shape
-    for i, item in enumerate(panel.items):
-        dm = panel[item]
-        for j, col in enumerate(dm.columns):
-            dm[col][:i + j] = np.NaN
-    return panel
-
-
 class TestSubDict(dict):

     def __init__(self, *args, **kwargs):
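
For context on the deletions above: `assert_panel_equal`, `makePanel`, `makePeriodPanel`, and `add_nans` go away because `Panel` itself is being removed. The commonly recommended substitute for 3-D data is a DataFrame with a MultiIndex (or xarray); a minimal sketch, with illustrative names:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2000-01-03', periods=4, freq='B')
frames = {item: pd.DataFrame(np.random.randn(4, 3),
                             index=dates, columns=list('ABC'))
          for item in ['ItemA', 'ItemB']}

# Stack the per-item frames into one MultiIndex-ed DataFrame.
wide = pd.concat(frames, names=['item', 'date'])

# Each original frame is recoverable by selecting on the outer level.
print(wide.loc['ItemA'])
```
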
- """ - - if check_panel_type: - assert_class_equal(left, right, obj=obj) - - for axis in left._AXIS_ORDERS: - left_ind = getattr(left, axis) - right_ind = getattr(right, axis) - assert_index_equal(left_ind, right_ind, check_names=check_names) - - if by_blocks: - rblocks = right._to_dict_of_blocks() - lblocks = left._to_dict_of_blocks() - for dtype in list(set(list(lblocks.keys()) + list(rblocks.keys()))): - assert dtype in lblocks - assert dtype in rblocks - array_equivalent(lblocks[dtype].values, rblocks[dtype].values) - else: - - # can potentially be slow - for i, item in enumerate(left._get_axis(0)): - msg = "non-matching item (right) '{item}'".format(item=item) - assert item in right, msg - litem = left.iloc[i] - ritem = right.iloc[i] - assert_frame_equal(litem, ritem, - check_less_precise=check_less_precise, - check_names=check_names) - - for i, item in enumerate(right._get_axis(0)): - msg = "non-matching item (left) '{item}'".format(item=item) - assert item in left, msg - - def assert_equal(left, right, **kwargs): """ Wrapper for tm.assert_*_equal to dispatch to the appropriate test function. @@ -2051,22 +1989,6 @@ def makePeriodFrame(nper=None): return DataFrame(data) -def makePanel(nper=None): - with warnings.catch_warnings(record=True): - warnings.filterwarnings("ignore", "\\nPanel", FutureWarning) - cols = ['Item' + c for c in string.ascii_uppercase[:K - 1]] - data = {c: makeTimeDataFrame(nper) for c in cols} - return Panel.fromDict(data) - - -def makePeriodPanel(nper=None): - with warnings.catch_warnings(record=True): - warnings.filterwarnings("ignore", "\\nPanel", FutureWarning) - cols = ['Item' + c for c in string.ascii_uppercase[:K - 1]] - data = {c: makePeriodFrame(nper) for c in cols} - return Panel.fromDict(data) - - def makeCustomIndex(nentries, nlevels, prefix='#', names=False, ndupe_l=None, idx_type=None): """Create an index/multindex with given dimensions, levels, names, etc' @@ -2314,15 +2236,6 @@ def makeMissingDataframe(density=.9, random_state=None): return df -def add_nans(panel): - I, J, N = panel.shape - for i, item in enumerate(panel.items): - dm = panel[item] - for j, col in enumerate(dm.columns): - dm[col][:i + j] = np.NaN - return panel - - class TestSubDict(dict): def __init__(self, *args, **kwargs): diff --git a/requirements-dev.txt b/requirements-dev.txt index 76aaeefa648f4..be84c6f29fdeb 100644 --- a/requirements-dev.txt +++ b/requirements-dev.txt @@ -10,7 +10,8 @@ gitpython hypothesis>=3.82 isort moto -pytest>=4.0 +pytest>=4.0.2 +pytest-mock sphinx numpydoc beautifulsoup4>=4.2.1 diff --git a/scripts/tests/test_validate_docstrings.py b/scripts/tests/test_validate_docstrings.py index bb58449843096..09fb5a30cbc3b 100644 --- a/scripts/tests/test_validate_docstrings.py +++ b/scripts/tests/test_validate_docstrings.py @@ -4,6 +4,8 @@ import textwrap import pytest import numpy as np +import pandas as pd + import validate_docstrings validate_one = validate_docstrings.validate_one @@ -1004,6 +1006,32 @@ def test_item_subsection(self, idx, subsection): assert result[idx][3] == subsection +class TestDocstringClass(object): + @pytest.mark.parametrize('name, expected_obj', + [('pandas.isnull', pd.isnull), + ('pandas.DataFrame', pd.DataFrame), + ('pandas.Series.sum', pd.Series.sum)]) + def test_resolves_class_name(self, name, expected_obj): + d = validate_docstrings.Docstring(name) + assert d.obj is expected_obj + + @pytest.mark.parametrize('invalid_name', ['panda', 'panda.DataFrame']) + def test_raises_for_invalid_module_name(self, invalid_name): + msg = 'No module 
diff --git a/scripts/validate_docstrings.py b/scripts/validate_docstrings.py
index bce33f7e78daa..20f32124a2532 100755
--- a/scripts/validate_docstrings.py
+++ b/scripts/validate_docstrings.py
@@ -267,7 +267,7 @@ def _load_obj(name):
             else:
                 continue

-        if 'module' not in locals():
+        if 'obj' not in locals():
             raise ImportError('No module can be imported '
                               'from "{}"'.format(name))
diff --git a/setup.cfg b/setup.cfg
index 7155cc1013544..84b8f69a83f16 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -57,6 +57,7 @@ split_penalty_after_opening_bracket = 1000000
 split_penalty_logical_operator = 30

 [tool:pytest]
+minversion = 4.0.2
 testpaths = pandas
 markers =
     single: mark a test as single cpu only
@@ -114,7 +115,6 @@ force_sort_within_sections=True
 skip=
     pandas/core/api.py,
     pandas/core/frame.py,
-    asv_bench/benchmarks/algorithms.py,
     asv_bench/benchmarks/attrs_caching.py,
     asv_bench/benchmarks/binary_ops.py,
     asv_bench/benchmarks/categoricals.py,
@@ -153,3 +153,23 @@ skip=
     asv_bench/benchmarks/dtypes.py
     asv_bench/benchmarks/strings.py
     asv_bench/benchmarks/period.py
+    pandas/__init__.py
+    pandas/plotting/__init__.py
+    pandas/tests/extension/decimal/__init__.py
+    pandas/tests/extension/base/__init__.py
+    pandas/io/msgpack/__init__.py
+    pandas/io/json/__init__.py
+    pandas/io/clipboard/__init__.py
+    pandas/io/excel/__init__.py
+    pandas/compat/__init__.py
+    pandas/compat/numpy/__init__.py
+    pandas/core/arrays/__init__.py
+    pandas/core/groupby/__init__.py
+    pandas/core/internals/__init__.py
+    pandas/api/__init__.py
+    pandas/api/extensions/__init__.py
+    pandas/api/types/__init__.py
+    pandas/_libs/__init__.py
+    pandas/_libs/tslibs/__init__.py
+    pandas/util/__init__.py
+    pandas/arrays/__init__.py
diff --git a/setup.py b/setup.py
index 4bf040b8c8e20..c8d29a2e4be5a 100755
--- a/setup.py
+++ b/setup.py
@@ -450,7 +450,8 @@ def run(self):
 # Note: if not using `cythonize`, coverage can be enabled by
 # pinning `ext.cython_directives = directives` to each ext in extensions.
 # github.com/cython/cython/wiki/enhancements-compilerdirectives#in-setuppy
-directives = {'linetrace': False}
+directives = {'linetrace': False,
+              'language_level': 2}
 macros = []
 if linetrace:
     # https://pypkg.com/pypi/pytest-cython/f/tests/example-project/setup.py
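
The one-character fix in `_load_obj` deserves spelling out: as the hunk shows, `module` is bound by the name-splitting on every loop iteration whether or not the import succeeds, so guarding on `'module' not in locals()` never fired for dotted names with an unimportable module, and the lookup fell through to a confusing `NameError` instead of the intended `ImportError`. `obj`, by contrast, is only bound when an import actually succeeds. A condensed sketch of the lookup (not the script verbatim):

```python
import importlib


def load_obj(name):
    # Try progressively shorter module prefixes: for 'pandas.Series.sum'
    # this attempts 'pandas.Series', then 'pandas'.
    for maxsplit in range(1, name.count('.') + 1):
        module, *attr_parts = name.rsplit('.', maxsplit)
        try:
            obj = importlib.import_module(module)
        except ImportError:
            continue
        break

    # `obj` exists only if some import succeeded; `module` would exist
    # even after nothing but failed attempts.
    if 'obj' not in locals():
        raise ImportError('No module can be imported from "{}"'.format(name))

    for part in attr_parts:
        obj = getattr(obj, part)
    return obj


print(load_obj('pandas.Series.sum'))
try:
    load_obj('panda.DataFrame')
except ImportError as err:
    print(err)  # ImportError, not NameError
```
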