`__ installed to perform the last two operations.
.. |svm| replace:: **strictly valid markup**
.. _svm: http://validator.w3.org/docs/help.html#validation_basics
@@ -561,9 +527,6 @@ parse HTML tables in the top-level pandas io function ``read_html``.
.. |lxml| replace:: **lxml**
.. _lxml: http://lxml.de
-.. |Anaconda| replace:: **Anaconda**
-.. _Anaconda: https://store.continuum.io/cshop/anaconda
-
Byte-Ordering Issues
--------------------
diff --git a/doc/source/groupby.rst b/doc/source/groupby.rst
index c5a77770085d6..45af02cb60b25 100644
--- a/doc/source/groupby.rst
+++ b/doc/source/groupby.rst
@@ -94,11 +94,21 @@ The mapping can be specified many different ways:
- For DataFrame objects, a string indicating a column to be used to group. Of
course ``df.groupby('A')`` is just syntactic sugar for
``df.groupby(df['A'])``, but it makes life simpler
+ - For DataFrame objects, a string indicating an index level to be used to group.
- A list of any of the above things
Collectively we refer to the grouping objects as the **keys**. For example,
consider the following DataFrame:
+.. note::
+
+ .. versionadded:: 0.20
+
+ A string passed to ``groupby`` may refer to either a column or an index level.
+ If a string matches both a column name and an index level name then a warning is
+ issued and the column takes precedence. This will result in an ambiguity error
+ in a future version.
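+
+   A minimal illustration of this precedence (``df_amb`` is a hypothetical frame
+   in which ``'A'`` names both a column and the index level):
+
+   .. ipython:: python
+      :okwarning:
+
+      df_amb = pd.DataFrame({'A': [1, 1, 2], 'B': [10, 20, 30]},
+                            index=pd.Index(['x', 'y', 'z'], name='A'))
+      df_amb.groupby('A').sum()  # groups by the column 'A', with a warning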
+
.. ipython:: python
df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
@@ -237,17 +247,6 @@ the length of the ``groups`` dict, so it is largely just a convenience:
gb.aggregate gb.count gb.cumprod gb.dtype gb.first gb.groups gb.hist gb.max gb.min gb.nth gb.prod gb.resample gb.sum gb.var
gb.apply gb.cummax gb.cumsum gb.fillna gb.gender gb.head gb.indices gb.mean gb.name gb.ohlc gb.quantile gb.size gb.tail gb.weight
-
-.. ipython:: python
- :suppress:
-
- df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
- 'foo', 'bar', 'foo', 'foo'],
- 'B' : ['one', 'one', 'two', 'three',
- 'two', 'two', 'one', 'three'],
- 'C' : np.random.randn(8),
- 'D' : np.random.randn(8)})
-
.. _groupby.multiindex:
GroupBy with MultiIndex
@@ -289,7 +288,9 @@ chosen level:
s.sum(level='second')
-Also as of v0.6, grouping with multiple levels is supported.
+.. versionadded:: 0.6
+
+Grouping with multiple levels is supported.
.. ipython:: python
:suppress:
@@ -306,8 +307,56 @@ Also as of v0.6, grouping with multiple levels is supported.
s
s.groupby(level=['first', 'second']).sum()
+.. versionadded:: 0.20
+
+Index level names may be supplied as keys.
+
+.. ipython:: python
+
+ s.groupby(['first', 'second']).sum()
+
More on the ``sum`` function and aggregation later.
+Grouping DataFrame with Index Levels and Columns
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A DataFrame may be grouped by a combination of columns and index levels by
+specifying the column names as strings and the index levels as ``pd.Grouper``
+objects.
+
+.. ipython:: python
+
+ arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
+ ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
+
+ index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
+
+ df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3],
+ 'B': np.arange(8)},
+ index=index)
+
+ df
+
+The following example groups ``df`` by the ``second`` index level and
+the ``A`` column.
+
+.. ipython:: python
+
+ df.groupby([pd.Grouper(level=1), 'A']).sum()
+
+Index levels may also be specified by name.
+
+.. ipython:: python
+
+ df.groupby([pd.Grouper(level='second'), 'A']).sum()
+
+.. versionadded:: 0.20
+
+Index level names may be specified as keys directly to ``groupby``.
+
+.. ipython:: python
+
+ df.groupby(['second', 'A']).sum()
+
DataFrame column selection in GroupBy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -315,6 +364,16 @@ Once you have created the GroupBy object from a DataFrame, for example, you
might want to do something different for each of the columns. Thus, using
``[]`` similar to getting a column from a DataFrame, you can do:
+.. ipython:: python
+ :suppress:
+
+ df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
+ 'foo', 'bar', 'foo', 'foo'],
+ 'B' : ['one', 'one', 'two', 'three',
+ 'two', 'two', 'one', 'three'],
+ 'C' : np.random.randn(8),
+ 'D' : np.random.randn(8)})
+
.. ipython:: python
grouped = df.groupby(['A'])
@@ -614,6 +673,54 @@ and that the transformed data contains no NAs.
grouped.ffill()
+
+.. _groupby.transform.window_resample:
+
+New syntax to window and resample operations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. versionadded:: 0.18.1
+
+Working with the resample, expanding or rolling operations on the groupby
+level used to require the application of helper functions. However,
+now it is possible to use ``resample()``, ``expanding()`` and
+``rolling()`` as methods on groupbys.
+
+The example below will apply the ``rolling()`` method on the samples of
+column ``B``, based on the groups of column ``A``.
+
+.. ipython:: python
+
+ df_re = pd.DataFrame({'A': [1] * 10 + [5] * 10,
+ 'B': np.arange(20)})
+ df_re
+
+ df_re.groupby('A').rolling(4).B.mean()
+
+
+The ``expanding()`` method will accumulate a given operation
+(``sum()`` in the example) for all the members of each particular
+group.
+
+.. ipython:: python
+
+ df_re.groupby('A').expanding().sum()
+
+
+Suppose you want to use the ``resample()`` method to get a daily
+frequency in each group of your dataframe and wish to complete the
+missing values with the ``ffill()`` method.
+
+.. ipython:: python
+
+ df_re = pd.DataFrame({'date': pd.date_range(start='2016-01-01',
+ periods=4,
+ freq='W'),
+ 'group': [1, 1, 2, 2],
+ 'val': [5, 6, 7, 8]}).set_index('date')
+ df_re
+
+ df_re.groupby('group').resample('1D').ffill()
+
.. _groupby.filter:
Filtration
diff --git a/doc/source/html-styling.ipynb b/doc/source/html-styling.ipynb
index e55712b2bb4f6..1a97378fd30b1 100644
--- a/doc/source/html-styling.ipynb
+++ b/doc/source/html-styling.ipynb
@@ -6,9 +6,9 @@
"source": [
"*New in version 0.17.1*\n",
"\n",
-    "*Provisional: This is a new feature and still under development. We'll be adding features and possibly making breaking changes in future releases. We'd love to hear your [feedback](https://github.com/pydata/pandas/issues).*\n",
+    "*Provisional: This is a new feature and still under development. We'll be adding features and possibly making breaking changes in future releases. We'd love to hear your [feedback](https://github.com/pandas-dev/pandas/issues).*\n",
"\n",
- "This document is written as a Jupyter Notebook, and can be viewed or downloaded [here](http://nbviewer.ipython.org/github/pydata/pandas/blob/master/doc/source/html-styling.ipynb).\n",
+ "This document is written as a Jupyter Notebook, and can be viewed or downloaded [here](http://nbviewer.ipython.org/github/pandas-dev/pandas/blob/master/doc/source/html-styling.ipynb).\n",
"\n",
"You can apply **conditional formatting**, the visual styling of a DataFrame\n",
"depending on the data within, by using the ``DataFrame.style`` property.\n",
diff --git a/doc/source/index.rst.template b/doc/source/index.rst.template
index 1996ad75ea92a..67072ff9fb224 100644
--- a/doc/source/index.rst.template
+++ b/doc/source/index.rst.template
@@ -14,9 +14,9 @@ pandas: powerful Python data analysis toolkit
**Binary Installers:** http://pypi.python.org/pypi/pandas
-**Source Repository:** http://github.com/pydata/pandas
+**Source Repository:** http://github.com/pandas-dev/pandas
-**Issues & Ideas:** https://github.com/pydata/pandas/issues
+**Issues & Ideas:** https://github.com/pandas-dev/pandas/issues
**Q&A Support:** http://stackoverflow.com/questions/tagged/pandas
diff --git a/doc/source/indexing.rst b/doc/source/indexing.rst
index 0a6691936d97d..1ea6662a4edb0 100644
--- a/doc/source/indexing.rst
+++ b/doc/source/indexing.rst
@@ -1467,6 +1467,10 @@ with duplicates dropped.
idx1.symmetric_difference(idx2)
idx1 ^ idx2
+.. note::
+
+ The resulting index from a set operation will be sorted in ascending order.
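+
+   For instance, with two small hypothetical indexes (a minimal sketch):
+
+   .. ipython:: python
+
+      pd.Index([3, 1, 2]).union(pd.Index([5, 4]))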
+
Missing values
~~~~~~~~~~~~~~
diff --git a/doc/source/install.rst b/doc/source/install.rst
index 923c22aa9048f..d45b8765cfd8a 100644
--- a/doc/source/install.rst
+++ b/doc/source/install.rst
@@ -18,7 +18,7 @@ Instructions for installing from source,
Python version support
----------------------
-Officially Python 2.7, 3.4, and 3.5
+Officially Python 2.7, 3.4, 3.5, and 3.6
Installing pandas
-----------------
@@ -243,7 +243,7 @@ Optional Dependencies
~~~~~~~~~~~~~~~~~~~~~
* `Cython `__: Only necessary to build development
- version. Version 0.19.1 or higher.
+ version. Version 0.23 or higher.
* `SciPy `__: miscellaneous statistical functions
* `xarray `__: pandas like handling for > 2 dims, needed for converting Panels to xarray objects. Version 0.7.0 or higher is recommended.
* `PyTables `__: necessary for HDF5-based storage. Version 3.0.0 or higher required, Version 3.2.1 or higher highly recommended.
diff --git a/doc/source/io.rst b/doc/source/io.rst
index ae71587c8b46b..17c7653072526 100644
--- a/doc/source/io.rst
+++ b/doc/source/io.rst
@@ -126,13 +126,23 @@ index_col : int or sequence or ``False``, default ``None``
MultiIndex is used. If you have a malformed file with delimiters at the end of
each line, you might consider ``index_col=False`` to force pandas to *not* use
the first column as the index (row names).
-usecols : array-like, default ``None``
- Return a subset of the columns. All elements in this array must either
+usecols : array-like or callable, default ``None``
+ Return a subset of the columns. If array-like, all elements must either
be positional (i.e. integer indices into the document columns) or strings
that correspond to column names provided either by the user in `names` or
- inferred from the document header row(s). For example, a valid `usecols`
- parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Using this parameter
- results in much faster parsing time and lower memory usage.
+ inferred from the document header row(s). For example, a valid array-like
+ `usecols` parameter would be [0, 1, 2] or ['foo', 'bar', 'baz'].
+
+ If callable, the callable function will be evaluated against the column names,
+ returning names where the callable function evaluates to True:
+
+ .. ipython:: python
+
+ data = 'col1,col2,col3\na,b,1\na,b,2\nc,d,3'
+ pd.read_csv(StringIO(data))
+ pd.read_csv(StringIO(data), usecols=lambda x: x.upper() in ['COL1', 'COL3'])
+
+ Using this parameter results in much faster parsing time and lower memory usage.
as_recarray : boolean, default ``False``
DEPRECATED: this argument will be removed in a future version. Please call
``pd.read_csv(...).to_records()`` instead.
@@ -157,6 +167,9 @@ dtype : Type name or dict of column -> type, default ``None``
Data type for data or columns. E.g. ``{'a': np.float64, 'b': np.int32}``
(unsupported with ``engine='python'``). Use `str` or `object` to preserve and
not interpret dtype.
+
+ .. versionadded:: 0.20.0 support for the Python parser.
+
engine : {``'c'``, ``'python'``}
Parser engine to use. The C engine is faster while the python engine is
currently more feature-complete.
@@ -473,10 +486,9 @@ However, if you wanted for all the data to be coerced, no matter the type, then
using the ``converters`` argument of :func:`~pandas.read_csv` would certainly be
worth trying.
-.. note::
- The ``dtype`` option is currently only supported by the C engine.
- Specifying ``dtype`` with ``engine`` other than 'c' raises a
- ``ValueError``.
+ .. versionadded:: 0.20.0 support for the Python parser.
+
+  The ``dtype`` option is supported by the 'python' engine.
.. note::
In some cases, reading in abnormal data with columns containing mixed dtypes
@@ -615,7 +627,9 @@ Filtering columns (``usecols``)
+++++++++++++++++++++++++++++++
The ``usecols`` argument allows you to select any subset of the columns in a
-file, either using the column names or position numbers:
+file, either using the column names, position numbers or a callable:
+
+.. versionadded:: 0.20.0 support for callable `usecols` arguments
.. ipython:: python
@@ -623,6 +637,7 @@ file, either using the column names or position numbers:
pd.read_csv(StringIO(data))
pd.read_csv(StringIO(data), usecols=['b', 'd'])
pd.read_csv(StringIO(data), usecols=[0, 2, 3])
+ pd.read_csv(StringIO(data), usecols=lambda x: x.upper() in ['A', 'C'])
Comments and Empty Lines
''''''''''''''''''''''''
@@ -852,6 +867,12 @@ data columns:
index_col=0) #index is the nominal column
df
+.. note::
+ If a column or index contains an unparseable date, the entire column or
+ index will be returned unaltered as an object data type. For non-standard
+ datetime parsing, use :func:`to_datetime` after ``pd.read_csv``.
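+
+   A minimal sketch (the ``data`` string below is a hypothetical example):
+
+   .. ipython:: python
+
+      data = "a\n2000-01-01\nnot a date"
+      df = pd.read_csv(StringIO(data), parse_dates=['a'])
+      df['a'].dtype
+      pd.to_datetime(df['a'], errors='coerce')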
+
+
.. note::
read_csv has a fast_path for parsing datetime strings in iso8601 format,
e.g "2000-01-01T00:01:02+00:00" and similar variations. If you can arrange
@@ -1165,8 +1186,8 @@ too many will cause an error by default:
In [28]: pd.read_csv(StringIO(data))
---------------------------------------------------------------------------
- CParserError Traceback (most recent call last)
- CParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 4
+ ParserError Traceback (most recent call last)
+ ParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 4
You can elect to skip bad lines:
@@ -1266,11 +1287,22 @@ is whitespace).
df = pd.read_fwf('bar.csv', header=None, index_col=0)
df
+.. versionadded:: 0.20.0
+
+``read_fwf`` supports the ``dtype`` parameter for specifying the types of
+parsed columns to be different from the inferred type.
+
+.. ipython:: python
+
+ pd.read_fwf('bar.csv', header=None, index_col=0).dtypes
+ pd.read_fwf('bar.csv', header=None, dtype={2: 'object'}).dtypes
+
.. ipython:: python
:suppress:
os.remove('bar.csv')
+
Indexes
'''''''
@@ -2525,6 +2557,20 @@ missing data to recover integer dtype:
cfun = lambda x: int(x) if x else -1
read_excel('path_to_file.xls', 'Sheet1', converters={'MyInts': cfun})
+dtype Specifications
+++++++++++++++++++++
+
+.. versionadded:: 0.20
+
+As an alternative to converters, the type for an entire column can
+be specified using the `dtype` keyword, which takes a dictionary
+mapping column names to types. To interpret data with
+no type inference, use the type ``str`` or ``object``.
+
+.. code-block:: python
+
+ read_excel('path_to_file.xls', dtype={'MyInts': 'int64', 'MyText': str})
+
.. _io.excel_writer:
Writing Excel Files
@@ -2789,7 +2835,7 @@ both on the writing (serialization), and reading (deserialization).
| 0.17 / Python 3 | >=0.18 / any Python |
+----------------------+------------------------+
| 0.18 | >= 0.18 |
- +======================+========================+
+ +----------------------+------------------------+
Reading (files packed by older versions) is backward-compatible, except for files packed with 0.17 in Python 2, in which case they can only be unpacked in Python 2.
diff --git a/doc/source/merging.rst b/doc/source/merging.rst
index c6541a26c72b4..f95987afd4c77 100644
--- a/doc/source/merging.rst
+++ b/doc/source/merging.rst
@@ -692,6 +692,29 @@ either the left or right tables, the values in the joined table will be
p.plot([left, right], result,
labels=['left', 'right'], vertical=False);
plt.close('all');
+
+Here is another example with duplicate join keys in DataFrames:
+
+.. ipython:: python
+
+ left = pd.DataFrame({'A' : [1,2], 'B' : [2, 2]})
+
+ right = pd.DataFrame({'A' : [4,5,6], 'B': [2,2,2]})
+
+ result = pd.merge(left, right, on='B', how='outer')
+
+.. ipython:: python
+ :suppress:
+
+ @savefig merging_merge_on_key_dup.png
+ p.plot([left, right], result,
+ labels=['left', 'right'], vertical=False);
+ plt.close('all');
+
+.. warning::
+
+   Joining / merging on duplicate keys can produce a returned frame that is the multiplication of the row dimensions,
+   which may result in memory overflow. It is the user's responsibility to manage duplicate values in keys before
+   joining large DataFrames.
.. _merging.indicator:
diff --git a/doc/source/release.rst b/doc/source/release.rst
index d210065f04459..622e9a53ff8f0 100644
--- a/doc/source/release.rst
+++ b/doc/source/release.rst
@@ -37,6 +37,50 @@ analysis / manipulation tool available in any language.
* Binary installers on PyPI: http://pypi.python.org/pypi/pandas
* Documentation: http://pandas.pydata.org
+
+pandas 0.19.1
+-------------
+
+**Release date:** November 3, 2016
+
+This is a minor bug-fix release from 0.19.0 and includes some small regression fixes,
+bug fixes and performance improvements.
+
+See the :ref:`v0.19.1 Whatsnew ` page for an overview of all
+bugs that have been fixed in 0.19.1.
+
+Thanks
+~~~~~~
+
+- Adam Chainz
+- Anthonios Partheniou
+- Arash Rouhani
+- Ben Kandel
+- Brandon M. Burroughs
+- Chris
+- chris-b1
+- Chris Warth
+- David Krych
+- dubourg
+- gfyoung
+- Iván Vallés Pérez
+- Jeff Reback
+- Joe Jevnik
+- Jon M. Mease
+- Joris Van den Bossche
+- Josh Owen
+- Keshav Ramaswamy
+- Larry Ren
+- mattrijk
+- Michael Felt
+- paul-mannino
+- Piotr Chromiec
+- Robert Bradshaw
+- Sinhrks
+- Thiago Serafim
+- Tom Bird
+
+
pandas 0.19.0
-------------
diff --git a/doc/source/remote_data.rst b/doc/source/remote_data.rst
index e2c713ac8519a..019aa82fed1aa 100644
--- a/doc/source/remote_data.rst
+++ b/doc/source/remote_data.rst
@@ -13,7 +13,7 @@ DataReader
The sub-package ``pandas.io.data`` is removed in favor of a separately
installable `pandas-datareader package
-`_. This will allow the data
+`_. This will allow the data
modules to be independently updated to your pandas installation. The API for
``pandas-datareader v0.1.1`` is the same as in ``pandas v0.16.1``.
(:issue:`8961`)
diff --git a/doc/source/reshaping.rst b/doc/source/reshaping.rst
index 9ed2c42610b69..3a2c48834991f 100644
--- a/doc/source/reshaping.rst
+++ b/doc/source/reshaping.rst
@@ -323,6 +323,10 @@ Pivot tables
.. _reshaping.pivot:
+While ``pivot`` provides general purpose pivoting of DataFrames with various
+data types (strings, numerics, etc.), Pandas also provides the ``pivot_table``
+function for pivoting with aggregation of numeric data.
+
The function ``pandas.pivot_table`` can be used to create spreadsheet-style pivot
tables. See the :ref:`cookbook` for some advanced strategies
diff --git a/doc/source/timeseries.rst b/doc/source/timeseries.rst
index 4132d25e9be48..9253124f7e8b2 100644
--- a/doc/source/timeseries.rst
+++ b/doc/source/timeseries.rst
@@ -1286,12 +1286,14 @@ secondly data into 5-minutely data). This is extremely common in, but not
limited to, financial applications.
``.resample()`` is a time-based groupby, followed by a reduction method on each of its groups.
+See some :ref:`cookbook examples ` for some advanced strategies
-.. note::
+Starting in version 0.18.1, the ``resample()`` function can be used directly from
+DataFrameGroupBy objects, see the :ref:`groupby docs `.
- ``.resample()`` is similar to using a ``.rolling()`` operation with a time-based offset, see a discussion `here `
+.. note::
-See some :ref:`cookbook examples ` for some advanced strategies
+ ``.resample()`` is similar to using a ``.rolling()`` operation with a time-based offset, see a discussion :ref:`here `
.. ipython:: python
diff --git a/doc/source/whatsnew.rst b/doc/source/whatsnew.rst
index 2a1f2cc47d48e..d6fb1c6a8f9cc 100644
--- a/doc/source/whatsnew.rst
+++ b/doc/source/whatsnew.rst
@@ -18,6 +18,10 @@ What's New
These are new features and improvements of note in each release.
+.. include:: whatsnew/v0.20.0.txt
+
+.. include:: whatsnew/v0.19.2.txt
+
.. include:: whatsnew/v0.19.1.txt
.. include:: whatsnew/v0.19.0.txt
diff --git a/doc/source/whatsnew/v0.13.0.txt b/doc/source/whatsnew/v0.13.0.txt
index 0944d849cfafd..6ecd4b487c798 100644
--- a/doc/source/whatsnew/v0.13.0.txt
+++ b/doc/source/whatsnew/v0.13.0.txt
@@ -600,7 +600,7 @@ Enhancements
.. ipython:: python
t = Timestamp('20130101 09:01:02')
- t + pd.datetools.Nano(123)
+ t + pd.tseries.offsets.Nano(123)
- A new method, ``isin`` for DataFrames, which plays nicely with boolean indexing. The argument to ``isin``, what we're comparing the DataFrame to, can be a DataFrame, Series, dict, or array of values. See :ref:`the docs` for more.
diff --git a/doc/source/whatsnew/v0.14.0.txt b/doc/source/whatsnew/v0.14.0.txt
index 181cd401c85d6..78f96e3c0e049 100644
--- a/doc/source/whatsnew/v0.14.0.txt
+++ b/doc/source/whatsnew/v0.14.0.txt
@@ -630,9 +630,9 @@ There are prior version deprecations that are taking effect as of 0.14.0.
- Remove ``unique`` keyword from :meth:`HDFStore.select_column` (:issue:`3256`)
- Remove ``inferTimeRule`` keyword from :func:`Timestamp.offset` (:issue:`391`)
- Remove ``name`` keyword from :func:`get_data_yahoo` and
- :func:`get_data_google` ( `commit b921d1a `__ )
+ :func:`get_data_google` ( `commit b921d1a `__ )
- Remove ``offset`` keyword from :class:`DatetimeIndex` constructor
- ( `commit 3136390 `__ )
+ ( `commit 3136390 `__ )
- Remove ``time_rule`` from several rolling-moment statistical functions, such
as :func:`rolling_sum` (:issue:`1042`)
- Removed neg ``-`` boolean operations on numpy arrays in favor of inv ``~``, as this is going to
diff --git a/doc/source/whatsnew/v0.16.1.txt b/doc/source/whatsnew/v0.16.1.txt
old mode 100755
new mode 100644
diff --git a/doc/source/whatsnew/v0.17.1.txt b/doc/source/whatsnew/v0.17.1.txt
old mode 100755
new mode 100644
index c25e0300a1050..17496c84b7181
--- a/doc/source/whatsnew/v0.17.1.txt
+++ b/doc/source/whatsnew/v0.17.1.txt
@@ -36,7 +36,7 @@ Conditional HTML Formatting
We'll be adding features and possibly making breaking changes in future
releases. Feedback is welcome_.
-.. _welcome: https://github.com/pydata/pandas/issues/11610
+.. _welcome: https://github.com/pandas-dev/pandas/issues/11610
We've added *experimental* support for conditional HTML formatting:
the visual styling of a DataFrame based on the data.
diff --git a/doc/source/whatsnew/v0.19.1.txt b/doc/source/whatsnew/v0.19.1.txt
index 5180b9a092f6c..545b4380d9b75 100644
--- a/doc/source/whatsnew/v0.19.1.txt
+++ b/doc/source/whatsnew/v0.19.1.txt
@@ -1,15 +1,12 @@
.. _whatsnew_0191:
-v0.19.1 (????, 2016)
----------------------
+v0.19.1 (November 3, 2016)
+--------------------------
-This is a minor bug-fix release from 0.19.0 and includes a large number of
-bug fixes along with several new features, enhancements, and performance improvements.
+This is a minor bug-fix release from 0.19.0 and includes some small regression fixes,
+bug fixes and performance improvements.
We recommend that all users upgrade to this version.
-Highlights include:
-
-
.. contents:: What's new in v0.19.1
:local:
:backlinks: none
@@ -21,9 +18,10 @@ Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Fixed performance regression in factorization of ``Period`` data (:issue:`14338`)
-- Improved Performance in ``.to_json()`` when ``lines=True`` (:issue:`14408`)
-
-
+- Fixed performance regression in ``Series.asof(where)`` when ``where`` is a scalar (:issue:`14461`)
+- Improved performance in ``DataFrame.asof(where)`` when ``where`` is a scalar (:issue:`14461`)
+- Improved performance in ``.to_json()`` when ``lines=True`` (:issue:`14408`)
+- Improved performance in certain types of `loc` indexing with a MultiIndex (:issue:`14551`).
.. _whatsnew_0191.bug_fixes:
@@ -31,20 +29,33 @@ Performance Improvements
Bug Fixes
~~~~~~~~~
-
-
-
+- Source installs from PyPI will now again work without ``cython`` installed, as in previous versions (:issue:`14204`)
+- Compat with Cython 0.25 for building (:issue:`14496`)
+- Fixed regression where user-provided file handles were closed in ``read_csv`` (c engine) (:issue:`14418`).
+- Fixed regression in ``DataFrame.quantile`` when missing values were present in some columns (:issue:`14357`).
+- Fixed regression in ``Index.difference`` where the ``freq`` of a ``DatetimeIndex`` was incorrectly set (:issue:`14323`)
+- Added back ``pandas.core.common.array_equivalent`` with a deprecation warning (:issue:`14555`).
+- Bug in ``pd.read_csv`` for the C engine in which quotation marks were improperly parsed in skipped rows (:issue:`14459`)
+- Bug in ``pd.read_csv`` for Python 2.x in which Unicode quote characters were no longer being respected (:issue:`14477`)
+- Fixed regression in ``Index.append`` when categorical indices were appended (:issue:`14545`).
+- Fixed regression in ``pd.DataFrame`` where constructor fails when given dict with ``None`` value (:issue:`14381`)
+- Fixed regression in ``DatetimeIndex._maybe_cast_slice_bound`` when index is empty (:issue:`14354`).
- Bug in localizing an ambiguous timezone when a boolean is passed (:issue:`14402`)
-
-
-
-
-
-
-
-
+- Bug in ``TimedeltaIndex`` addition with a Datetime-like object where addition overflow in the negative direction was not being caught (:issue:`14068`, :issue:`14453`)
+- Bug in string indexing against data with ``object`` ``Index`` may raise ``AttributeError`` (:issue:`14424`)
+- Correctly raise ``ValueError`` on empty input to ``pd.eval()`` and ``df.query()`` (:issue:`13139`)
+- Bug in ``RangeIndex.intersection`` when the result is an empty set (:issue:`14364`).
+- Bug in groupby-transform broadcasting that could cause incorrect dtype coercion (:issue:`14457`)
+- Bug in ``Series.__setitem__`` which allowed mutating read-only arrays (:issue:`14359`).
+- Bug in ``DataFrame.insert`` where multiple calls with duplicate columns can fail (:issue:`14291`)
+- ``pd.merge()`` will now raise a ``ValueError`` when non-boolean values are passed for boolean-type arguments (:issue:`14434`)
+- Bug in ``Timestamp`` where dates very near the minimum (1677-09) could underflow on creation (:issue:`14415`)
- Bug in ``pd.concat`` where names of the ``keys`` were not propagated to the resulting ``MultiIndex`` (:issue:`14252`)
- Bug in ``pd.concat`` where ``axis`` cannot take string parameters ``'rows'`` or ``'columns'`` (:issue:`14369`)
+- Bug in ``pd.concat`` with dataframes heterogeneous in length and tuple ``keys`` (:issue:`14438`)
- Bug in ``MultiIndex.set_levels`` where illegal level values were still set after raising an error (:issue:`13754`)
- Bug in ``DataFrame.to_json`` where ``lines=True`` and a value contained a ``}`` character (:issue:`14391`)
- Bug in ``df.groupby`` causing an ``AttributeError`` when grouping a single index frame by a column and the index level (:issue:`14327`)
+- Bug in ``df.groupby`` where ``TypeError`` raised when ``pd.Grouper(key=...)`` is passed in a list (:issue:`14334`)
+- Bug in ``pd.pivot_table`` may raise ``TypeError`` or ``ValueError`` when ``index`` or ``columns``
+ is not scalar and ``values`` is not specified (:issue:`14380`)
diff --git a/doc/source/whatsnew/v0.19.2.txt b/doc/source/whatsnew/v0.19.2.txt
new file mode 100644
index 0000000000000..82d43db667550
--- /dev/null
+++ b/doc/source/whatsnew/v0.19.2.txt
@@ -0,0 +1,93 @@
+.. _whatsnew_0192:
+
+v0.19.2 (December ??, 2016)
+---------------------------
+
+This is a minor bug-fix release from 0.19.1 and includes some small regression fixes,
+bug fixes and performance improvements.
+We recommend that all users upgrade to this version.
+
+Highlights include:
+
+- Compatibility with Python 3.6
+
+.. contents:: What's new in v0.19.2
+ :local:
+ :backlinks: none
+
+
+.. _whatsnew_0192.performance:
+
+Performance Improvements
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+- Improved performance of ``.replace()`` (:issue:`12745`)
+
+.. _whatsnew_0192.enhancements.other:
+
+Other Enhancements
+~~~~~~~~~~~~~~~~~~
+
+- ``pd.merge_asof()`` gained ``left_index``/``right_index`` and ``left_by``/``right_by`` arguments (:issue:`14253`)
+
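+  A minimal sketch of joining on indexes with the new arguments (the frames
+  below are hypothetical):
+
+  .. ipython:: python
+
+     left = pd.DataFrame({'a': [1, 5, 10]}, index=[1, 5, 10])
+     right = pd.DataFrame({'b': [1, 2, 3, 6, 7]}, index=[1, 2, 3, 6, 7])
+     pd.merge_asof(left, right, left_index=True, right_index=True)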
+
+
+.. _whatsnew_0192.bug_fixes:
+
+Bug Fixes
+~~~~~~~~~
+
+- Compat with ``dateutil==2.6.0``; segfault reported in the testing suite (:issue:`14621`)
+- Allow ``nanoseconds`` in ``Timestamp.replace`` as a kwarg (:issue:`14621`)
+- Bug in ``pd.read_csv`` where reading files failed if the number of headers was equal to the number of lines in the file (:issue:`14515`)
+- Bug in ``pd.read_csv`` for the Python engine in which an unhelpful error message was being raised when multi-char delimiters were not being respected with quotes (:issue:`14582`)
+- Fix bugs (:issue:`14734`, :issue:`13654`) in ``pd.read_sas`` and ``pandas.io.sas.sas7bdat.SAS7BDATReader`` that caused problems when reading a SAS file incrementally.
+- Bug in ``pd.read_csv`` for the Python engine in which an unhelpful error message was being raised when ``skipfooter`` was not being respected by Python's CSV library (:issue:`13879`)
+
+
+- Bug in ``.groupby(..., sort=True)`` of a non-lexsorted MultiIndex when grouping with multiple levels (:issue:`14776`)
+
+
+
+- Bug in ``pd.cut`` with negative values and a single bin (:issue:`14652`)
+- Bug in ``pd.to_numeric`` where a 0 was not unsigned on a ``downcast='unsigned'`` argument (:issue:`14401`)
+- Bug in plotting regular and irregular timeseries using shared axes
+ (``sharex=True`` or ``ax.twinx()``) (:issue:`13341`, :issue:`14322`).
+
+
+
+- Bug in not propagating exceptions in parsing invalid datetimes, noted in python 3.6 (:issue:`14561`)
+
+
+- Compat with python 3.6 for pickling of some offsets (:issue:`14685`)
+- Compat with python 3.6 for some indexing exception types (:issue:`14684`, :issue:`14689`)
+- Compat with python 3.6 for deprecation warnings in the test suite (:issue:`14681`)
+- Compat with python 3.6 for Timestamp pickles (:issue:`14689`)
+- Bug in resampling a ``DatetimeIndex`` in local TZ, covering a DST change, which would raise ``AmbiguousTimeError`` (:issue:`14682`)
+
+
+
+- Bug in ``HDFStore`` when writing a ``MultiIndex`` when using ``data_columns=True`` (:issue:`14435`)
+- Bug in ``HDFStore.append()`` when writing a ``Series`` and passing a ``min_itemsize`` argument containing a value for the ``index`` (:issue:`11412`)
+- Bug when writing to a ``HDFStore`` in ``table`` format with a ``min_itemsize`` value for the ``index`` and without asking to append (:issue:`10381`)
+- Bug in ``Series.groupby.nunique()`` raising an ``IndexError`` for an empty ``Series`` (:issue:`12553`)
+- Bug in ``DataFrame.nlargest`` and ``DataFrame.nsmallest`` when the index had duplicate values (:issue:`13412`)
+
+
+
+- Bug in clipboard functions on linux with python 2 when using unicode and separators (:issue:`13747`)
+- Bug in clipboard functions on Windows 10 and python 3 (:issue:`14362`, :issue:`12807`)
+- Bug in ``.to_clipboard()`` and Excel compat (:issue:`12529`)
+
+
+- Bug in ``pd.read_csv()`` in which the ``dtype`` parameter was not being respected for empty data (:issue:`14712`)
+- Bug in ``pd.read_csv()`` in which the ``nrows`` parameter was not being respected for large input when using the C engine for parsing (:issue:`7626`)
+
+
+- Bug in ``pd.merge_asof()`` could not handle timezone-aware DatetimeIndex when a tolerance was specified (:issue:`14844`)
+
+- Explicit check in ``to_stata`` and ``StataWriter`` for out-of-range values when writing doubles (:issue:`14618`)
+
+- Bug in ``.plot(kind='kde')`` which did not drop missing values to generate the KDE Plot, instead generating an empty plot. (:issue:`14821`)
+
+- Bug in ``unstack()`` where, if called with a list of column(s) as an argument, all columns were coerced to ``object`` regardless of their dtypes (:issue:`11847`)
diff --git a/doc/source/whatsnew/v0.20.0.txt b/doc/source/whatsnew/v0.20.0.txt
index 7fa9991138fba..2855cde95ac2a 100644
--- a/doc/source/whatsnew/v0.20.0.txt
+++ b/doc/source/whatsnew/v0.20.0.txt
@@ -9,10 +9,11 @@ users upgrade to this version.
Highlights include:
+- Building pandas for development now requires ``cython >= 0.23`` (:issue:`14831`)
Check the :ref:`API Changes ` and :ref:`deprecations ` before updating.
-.. contents:: What's new in v0.19.0
+.. contents:: What's new in v0.20.0
:local:
:backlinks: none
@@ -22,15 +23,67 @@ New features
~~~~~~~~~~~~
+``dtype`` keyword for data IO
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The ``dtype`` keyword argument in the :func:`read_csv` function for specifying the types of parsed columns is now supported with the ``'python'`` engine (:issue:`14295`). See the :ref:`io docs ` for more information.
+
+.. ipython:: python
+
+ data = "a,b\n1,2\n3,4"
+ pd.read_csv(StringIO(data), engine='python').dtypes
+ pd.read_csv(StringIO(data), engine='python', dtype={'a':'float64', 'b':'object'}).dtypes
+
+The ``dtype`` keyword argument is also now supported in the :func:`read_fwf` function for parsing
+fixed-width text files, and :func:`read_excel` for parsing Excel files.
+
+.. ipython:: python
+
+ data = "a b\n1 2\n3 4"
+ pd.read_fwf(StringIO(data)).dtypes
+ pd.read_fwf(StringIO(data), dtype={'a':'float64', 'b':'object'}).dtypes
+
+.. _whatsnew_0200.enhancements.groupby_access:
+
+Groupby Enhancements
+^^^^^^^^^^^^^^^^^^^^
+
+Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now reference either column names or index level names (:issue:`5677`)
+
+.. ipython:: python
+
+ arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
+ ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
+
+ index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
+
+ df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3],
+ 'B': np.arange(8)},
+ index=index)
+ df
+
+ df.groupby(['second', 'A']).sum()
.. _whatsnew_0200.enhancements.other:
Other enhancements
^^^^^^^^^^^^^^^^^^
+- ``Series.sort_index`` accepts parameters ``kind`` and ``na_position`` (:issue:`13589`, :issue:`14444`)
+
+- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)
+- Multiple offset aliases with decimal points are now supported (e.g. '0.5min' is parsed as '30s') (:issue:`8419`)
+- New ``UnsortedIndexError`` (subclass of ``KeyError``) raised when indexing/slicing into an
+ unsorted MultiIndex (:issue:`11897`). This allows differentiation between errors due to lack
+ of sorting or an incorrect key. See :ref:`here `
+- ``pd.cut`` and ``pd.qcut`` now support datetime64 and timedelta64 dtypes (:issue:`14714`)
+- ``Series`` provides a ``to_excel`` method to output Excel files (:issue:`8825`)
+- The ``usecols`` argument in ``pd.read_csv`` now accepts a callable function as a value (:issue:`14154`)
+- ``pd.DataFrame.plot`` now prints a title above each subplot if ``subplots=True`` and ``title`` is a list of strings (:issue:`14753`)
+- ``pd.Series.interpolate`` now supports timedelta as an index type with ``method='time'`` (:issue:`6424`)
+- ``pandas.io.json.json_normalize()`` gained the option ``errors='ignore'|'raise'``; the default is ``errors='raise'`` which is backward compatible. (:issue:`14583`)
.. _whatsnew_0200.api_breaking:
@@ -41,6 +94,8 @@ Backwards incompatible API changes
.. _whatsnew_0200.api:
+- ``CParserError`` has been renamed to ``ParserError`` in ``pd.read_csv`` and will be removed in the future (:issue:`12665`)
+- ``SparseArray.cumsum()`` and ``SparseSeries.cumsum()`` will now always return ``SparseArray`` and ``SparseSeries`` respectively (:issue:`12855`)
@@ -48,11 +103,16 @@ Backwards incompatible API changes
Other API Changes
^^^^^^^^^^^^^^^^^
+
.. _whatsnew_0200.deprecations:
Deprecations
^^^^^^^^^^^^
+- ``Series.repeat()`` has deprecated the ``reps`` parameter in favor of ``repeats`` (:issue:`12662`)
+- ``Index.repeat()`` and ``MultiIndex.repeat()`` have deprecated the ``n`` parameter in favor of ``repeats`` (:issue:`12662`)
+- ``Categorical.searchsorted()`` and ``Series.searchsorted()`` have deprecated the ``v`` parameter in favor of ``value`` (:issue:`12662`)
+- ``TimedeltaIndex.searchsorted()``, ``DatetimeIndex.searchsorted()``, and ``PeriodIndex.searchsorted()`` have deprecated the ``key`` parameter in favor of ``value`` (:issue:`12662`)
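+
+A minimal sketch of the renamed keywords (``s`` below is a hypothetical Series):
+
+.. ipython:: python
+
+   s = pd.Series([1, 2, 3])
+   s.repeat(repeats=2)      # formerly the 'reps' keyword
+   s.searchsorted(value=2)  # formerly the 'v' keyword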
@@ -63,6 +123,8 @@ Removal of prior version deprecations/changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- ``pd.to_datetime`` and ``pd.to_timedelta`` have dropped the ``coerce`` parameter in favor of ``errors`` (:issue:`13602`)
+- ``SparseArray.to_dense()`` has deprecated the ``fill`` parameter, as that parameter was not being respected (:issue:`14647`)
+- ``SparseSeries.to_dense()`` has deprecated the ``sparse_only`` parameter (:issue:`14647`)
@@ -72,7 +134,8 @@ Removal of prior version deprecations/changes
Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~
-
+- Improved performance of ``pd.wide_to_long()`` (:issue:`14779`)
+- Increased performance of ``pd.factorize()`` by releasing the GIL with ``object`` dtype when inferred as strings (:issue:`14859`)
@@ -80,3 +143,20 @@ Performance Improvements
Bug Fixes
~~~~~~~~~
+
+- Bug in ``astype()`` where ``inf`` values were incorrectly converted to integers. Now raises an error with ``astype()`` for Series and DataFrames (:issue:`14265`)
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+- Require at least version 0.23 of cython to avoid problems with character encodings (:issue:`14699`)
diff --git a/doc/source/whatsnew/v0.4.x.txt b/doc/source/whatsnew/v0.4.x.txt
index 4717b46a6bca8..237ea84425051 100644
--- a/doc/source/whatsnew/v0.4.x.txt
+++ b/doc/source/whatsnew/v0.4.x.txt
@@ -56,8 +56,8 @@ Performance Enhancements
- Wrote fast time series merging / joining methods in Cython. Will be
integrated later into DataFrame.join and related functions
-.. _ENH1b: https://github.com/pydata/pandas/commit/1ba56251f0013ff7cd8834e9486cef2b10098371
-.. _ENHdc: https://github.com/pydata/pandas/commit/dca3c5c5a6a3769ee01465baca04cfdfa66a4f76
-.. _ENHed: https://github.com/pydata/pandas/commit/edd9f1945fc010a57fa0ae3b3444d1fffe592591
-.. _ENH56: https://github.com/pydata/pandas/commit/56e0c9ffafac79ce262b55a6a13e1b10a88fbe93
+.. _ENH1b: https://github.com/pandas-dev/pandas/commit/1ba56251f0013ff7cd8834e9486cef2b10098371
+.. _ENHdc: https://github.com/pandas-dev/pandas/commit/dca3c5c5a6a3769ee01465baca04cfdfa66a4f76
+.. _ENHed: https://github.com/pandas-dev/pandas/commit/edd9f1945fc010a57fa0ae3b3444d1fffe592591
+.. _ENH56: https://github.com/pandas-dev/pandas/commit/56e0c9ffafac79ce262b55a6a13e1b10a88fbe93
diff --git a/doc/source/whatsnew/v0.5.0.txt b/doc/source/whatsnew/v0.5.0.txt
index 8b7e4721d136f..6fe6a02b08f70 100644
--- a/doc/source/whatsnew/v0.5.0.txt
+++ b/doc/source/whatsnew/v0.5.0.txt
@@ -39,5 +39,5 @@ Performance Enhancements
- VBENCH Significantly sped up conversion of nested dict into DataFrame (:issue:`212`)
- VBENCH Significantly speed up DataFrame ``__repr__`` and ``count`` on large mixed-type DataFrame objects
-.. _ENH61: https://github.com/pydata/pandas/commit/6141961
-.. _ENH5c: https://github.com/pydata/pandas/commit/5ca6ff5d822ee4ddef1ec0d87b6d83d8b4bbd3eb
+.. _ENH61: https://github.com/pandas-dev/pandas/commit/6141961
+.. _ENH5c: https://github.com/pandas-dev/pandas/commit/5ca6ff5d822ee4ddef1ec0d87b6d83d8b4bbd3eb
diff --git a/doc/sphinxext/numpydoc/LICENSE.txt b/doc/sphinxext/numpydoc/LICENSE.txt
old mode 100755
new mode 100644
diff --git a/pandas/api/tests/test_api.py b/pandas/api/tests/test_api.py
index d4d8b7e4e9747..49aa31c375e25 100644
--- a/pandas/api/tests/test_api.py
+++ b/pandas/api/tests/test_api.py
@@ -1,5 +1,7 @@
# -*- coding: utf-8 -*-
+import numpy as np
+
import pandas as pd
from pandas.core import common as com
from pandas import api
@@ -184,6 +186,11 @@ def test_deprecation_core_common(self):
for t in self.allowed:
self.check_deprecation(getattr(com, t), getattr(types, t))
+ def test_deprecation_core_common_array_equivalent(self):
+
+ with tm.assert_produces_warning(DeprecationWarning):
+ com.array_equivalent(np.array([1, 2]), np.array([1, 2]))
+
def test_deprecation_core_common_moved(self):
# these are in pandas.types.common
diff --git a/pandas/compat/__init__.py b/pandas/compat/__init__.py
index 1b8930dcae0f1..532f960468204 100644
--- a/pandas/compat/__init__.py
+++ b/pandas/compat/__init__.py
@@ -41,6 +41,7 @@
PY2 = sys.version_info[0] == 2
PY3 = (sys.version_info[0] >= 3)
PY35 = (sys.version_info >= (3, 5))
+PY36 = (sys.version_info >= (3, 6))
try:
import __builtin__ as builtins
diff --git a/pandas/computation/eval.py b/pandas/computation/eval.py
index 6c5c631a6bf0e..fffde4d9db867 100644
--- a/pandas/computation/eval.py
+++ b/pandas/computation/eval.py
@@ -233,6 +233,7 @@ def eval(expr, parser='pandas', engine=None, truediv=True,
"""
first_expr = True
if isinstance(expr, string_types):
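+        # GH 13139: raise ValueError early on an empty expression string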
+ _check_expression(expr)
exprs = [e for e in expr.splitlines() if e != '']
else:
exprs = [expr]
diff --git a/pandas/computation/tests/test_eval.py b/pandas/computation/tests/test_eval.py
index f480eae2dd04d..ffa2cb0684b72 100644
--- a/pandas/computation/tests/test_eval.py
+++ b/pandas/computation/tests/test_eval.py
@@ -1891,6 +1891,18 @@ def test_bad_resolver_raises():
yield check_bad_resolver_raises, engine, parser
+def check_empty_string_raises(engine, parser):
+ # GH 13139
+ tm.skip_if_no_ne(engine)
+ with tm.assertRaisesRegexp(ValueError, 'expr cannot be an empty string'):
+ pd.eval('', engine=engine, parser=parser)
+
+
+def test_empty_string_raises():
+ for engine, parser in ENGINES_PARSERS:
+ yield check_empty_string_raises, engine, parser
+
+
def check_more_than_one_expression_raises(engine, parser):
tm.skip_if_no_ne(engine)
with tm.assertRaisesRegexp(SyntaxError,
diff --git a/pandas/core/algorithms.py b/pandas/core/algorithms.py
index 8644d4568e44d..0d4d4143e6b9b 100644
--- a/pandas/core/algorithms.py
+++ b/pandas/core/algorithms.py
@@ -65,7 +65,7 @@ def match(to_match, values, na_sentinel=-1):
values = np.array(values, dtype='O')
f = lambda htype, caster: _match_generic(to_match, values, htype, caster)
- result = _hashtable_algo(f, values.dtype, np.int64)
+ result = _hashtable_algo(f, values, np.int64)
if na_sentinel != -1:
@@ -102,7 +102,7 @@ def unique(values):
values = com._asarray_tuplesafe(values)
f = lambda htype, caster: _unique_generic(values, htype, caster)
- return _hashtable_algo(f, values.dtype)
+ return _hashtable_algo(f, values)
def _unique_generic(values, table_type, type_caster):
@@ -684,11 +684,12 @@ def select_n_slow(dropped, n, keep, method):
_select_methods = {'nsmallest': nsmallest, 'nlargest': nlargest}
-def select_n(series, n, keep, method):
- """Implement n largest/smallest.
+def select_n_series(series, n, keep, method):
+ """Implement n largest/smallest for pandas Series
Parameters
----------
+ series : pandas.Series object
n : int
keep : {'first', 'last'}, default 'first'
method : str, {'nlargest', 'nsmallest'}
@@ -717,6 +718,31 @@ def select_n(series, n, keep, method):
return dropped.iloc[inds]
+def select_n_frame(frame, columns, n, method, keep):
+ """Implement n largest/smallest for pandas DataFrame
+
+ Parameters
+ ----------
+ frame : pandas.DataFrame object
+ columns : list or str
+ n : int
+ keep : {'first', 'last'}, default 'first'
+ method : str, {'nlargest', 'nsmallest'}
+
+ Returns
+ -------
+ nordered : DataFrame
+ """
+ from pandas.core.series import Series
+ if not is_list_like(columns):
+ columns = [columns]
+ columns = list(columns)
+ ser = getattr(frame[columns[0]], method)(n, keep=keep)
+ if isinstance(ser, Series):
+ ser = ser.to_frame()
+ return ser.merge(frame, on=columns[0], left_index=True)[frame.columns]
+
+
def _finalize_nsmallest(arr, kth_val, n, keep, narr):
ns, = np.nonzero(arr <= kth_val)
inds = ns[arr[ns].argsort(kind='mergesort')][:n]
@@ -733,10 +759,12 @@ def _finalize_nsmallest(arr, kth_val, n, keep, narr):
# helpers #
# ------- #
-def _hashtable_algo(f, dtype, return_dtype=None):
+def _hashtable_algo(f, values, return_dtype=None):
"""
f(HashTable, type_caster) -> result
"""
+
+ dtype = values.dtype
if is_float_dtype(dtype):
return f(htable.Float64HashTable, _ensure_float64)
elif is_integer_dtype(dtype):
@@ -747,17 +775,25 @@ def _hashtable_algo(f, dtype, return_dtype=None):
elif is_timedelta64_dtype(dtype):
return_dtype = return_dtype or 'm8[ns]'
return f(htable.Int64HashTable, _ensure_int64).view(return_dtype)
- else:
- return f(htable.PyObjectHashTable, _ensure_object)
+
+    # it's cheaper to use a String Hash Table than Object
+ if lib.infer_dtype(values) in ['string']:
+ return f(htable.StringHashTable, _ensure_object)
+
+ # use Object
+ return f(htable.PyObjectHashTable, _ensure_object)
_hashtables = {
'float64': (htable.Float64HashTable, htable.Float64Vector),
'int64': (htable.Int64HashTable, htable.Int64Vector),
+ 'string': (htable.StringHashTable, htable.ObjectVector),
'generic': (htable.PyObjectHashTable, htable.ObjectVector)
}
def _get_data_algo(values, func_map):
+
+ f = None
if is_float_dtype(values):
f = func_map['float64']
values = _ensure_float64(values)
@@ -770,8 +806,19 @@ def _get_data_algo(values, func_map):
f = func_map['int64']
values = _ensure_int64(values)
else:
- f = func_map['generic']
+
values = _ensure_object(values)
+
+        # it's cheaper to use a String Hash Table than Object
+ if lib.infer_dtype(values) in ['string']:
+ try:
+ f = func_map['string']
+ except KeyError:
+ pass
+
+ if f is None:
+ f = func_map['generic']
+
return f, values
diff --git a/pandas/core/base.py b/pandas/core/base.py
index b9a70292498e4..d412349447794 100644
--- a/pandas/core/base.py
+++ b/pandas/core/base.py
@@ -1091,12 +1091,12 @@ def factorize(self, sort=False, na_sentinel=-1):
"""Find indices where elements should be inserted to maintain order.
Find the indices into a sorted %(klass)s `self` such that, if the
- corresponding elements in `v` were inserted before the indices, the
- order of `self` would be preserved.
+ corresponding elements in `value` were inserted before the indices,
+ the order of `self` would be preserved.
Parameters
----------
- %(value)s : array_like
+ value : array_like
Values to insert into `self`.
side : {'left', 'right'}, optional
If 'left', the index of the first suitable location found is given.
@@ -1109,7 +1109,7 @@ def factorize(self, sort=False, na_sentinel=-1):
Returns
-------
indices : array of ints
- Array of insertion points with the same shape as `v`.
+ Array of insertion points with the same shape as `value`.
See Also
--------
@@ -1149,11 +1149,12 @@ def factorize(self, sort=False, na_sentinel=-1):
array([3, 4]) # eggs before milk
""")
- @Substitution(klass='IndexOpsMixin', value='key')
+ @Substitution(klass='IndexOpsMixin')
@Appender(_shared_docs['searchsorted'])
- def searchsorted(self, key, side='left', sorter=None):
+ @deprecate_kwarg(old_arg_name='key', new_arg_name='value')
+ def searchsorted(self, value, side='left', sorter=None):
# needs coercion on the key (DatetimeIndex does already)
- return self.values.searchsorted(key, side=side, sorter=sorter)
+ return self.values.searchsorted(value, side=side, sorter=sorter)
_shared_docs['drop_duplicates'] = (
"""Return %(klass)s with duplicate values removed
diff --git a/pandas/core/categorical.py b/pandas/core/categorical.py
index 9efaff6060909..922fb84684729 100644
--- a/pandas/core/categorical.py
+++ b/pandas/core/categorical.py
@@ -1076,9 +1076,10 @@ def memory_usage(self, deep=False):
"""
return self._codes.nbytes + self._categories.memory_usage(deep=deep)
- @Substitution(klass='Categorical', value='v')
+ @Substitution(klass='Categorical')
@Appender(_shared_docs['searchsorted'])
- def searchsorted(self, v, side='left', sorter=None):
+ @deprecate_kwarg(old_arg_name='v', new_arg_name='value')
+ def searchsorted(self, value, side='left', sorter=None):
if not self.ordered:
raise ValueError("Categorical not ordered\nyou can use "
".as_ordered() to change the Categorical to an "
@@ -1086,7 +1087,7 @@ def searchsorted(self, v, side='left', sorter=None):
from pandas.core.series import Series
values_as_codes = self.categories.values.searchsorted(
- Series(v).values, side=side)
+ Series(value).values, side=side)
return self.codes.searchsorted(values_as_codes, sorter=sorter)
@@ -2055,14 +2056,14 @@ def _factorize_from_iterables(iterables):
Returns
-------
- codes_tuple : tuple of ndarrays
- categories_tuple : tuple of Indexes
+ codes_list : list of ndarrays
+ categories_list : list of Indexes
Notes
-----
See `_factorize_from_iterable` for more info.
"""
if len(iterables) == 0:
- # For consistency, it should return a list of 2 tuples.
- return [(), ()]
- return lzip(*[_factorize_from_iterable(it) for it in iterables])
+ # For consistency, it should return a list of 2 lists.
+ return [[], []]
+ return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
diff --git a/pandas/core/common.py b/pandas/core/common.py
index 341bd3b4cc845..fddac1f29d454 100644
--- a/pandas/core/common.py
+++ b/pandas/core/common.py
@@ -64,6 +64,15 @@ def wrapper(*args, **kwargs):
setattr(m, t, outer(t))
+# deprecate array_equivalent
+
+def array_equivalent(*args, **kwargs):
+ warnings.warn("'pandas.core.common.array_equivalent' is deprecated and "
+ "is no longer public API", DeprecationWarning, stacklevel=2)
+ from pandas.types import missing
+ return missing.array_equivalent(*args, **kwargs)
+
+
class PandasError(Exception):
pass
@@ -88,6 +97,16 @@ class UnsupportedFunctionCall(ValueError):
pass
+class UnsortedIndexError(KeyError):
+ """ Error raised when attempting to get a slice of a MultiIndex
+ and the index has not been lexsorted. Subclass of `KeyError`.
+
+ .. versionadded:: 0.20.0
+
+ """
+ pass
+
+
class AbstractMethodError(NotImplementedError):
"""Raise this error instead of NotImplementedError for abstract methods
while keeping compatibility with Python 2 and Python 3.
diff --git a/pandas/core/frame.py b/pandas/core/frame.py
index 1798a35168265..0d4bcd781cf74 100644
--- a/pandas/core/frame.py
+++ b/pandas/core/frame.py
@@ -105,7 +105,8 @@
axes_single_arg="{0 or 'index', 1 or 'columns'}",
optional_by="""
by : str or list of str
- Name or list of names which refer to the axis items.""")
+ Name or list of names which refer to the axis items.""",
+ versionadded_to_excel='')
_numeric_only_doc = """numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use
@@ -1346,7 +1347,7 @@ def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None,
file
quoting : optional constant from csv module
defaults to csv.QUOTE_MINIMAL. If you have set a `float_format`
- then floats are comverted to strings and thus csv.QUOTE_NONNUMERIC
+ then floats are converted to strings and thus csv.QUOTE_NONNUMERIC
will treat them as non-numeric
quotechar : string (length 1), default '\"'
character used to quote fields
@@ -1385,65 +1386,11 @@ def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None,
if path_or_buf is None:
return formatter.path_or_buf.getvalue()
+ @Appender(_shared_docs['to_excel'] % _shared_doc_kwargs)
def to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='',
float_format=None, columns=None, header=True, index=True,
index_label=None, startrow=0, startcol=0, engine=None,
merge_cells=True, encoding=None, inf_rep='inf', verbose=True):
- """
- Write DataFrame to a excel sheet
-
- Parameters
- ----------
- excel_writer : string or ExcelWriter object
- File path or existing ExcelWriter
- sheet_name : string, default 'Sheet1'
- Name of sheet which will contain DataFrame
- na_rep : string, default ''
- Missing data representation
- float_format : string, default None
- Format string for floating point numbers
- columns : sequence, optional
- Columns to write
- header : boolean or list of string, default True
- Write out column names. If a list of string is given it is
- assumed to be aliases for the column names
- index : boolean, default True
- Write row names (index)
- index_label : string or sequence, default None
- Column label for index column(s) if desired. If None is given, and
- `header` and `index` are True, then the index names are used. A
- sequence should be given if the DataFrame uses MultiIndex.
- startrow :
- upper left cell row to dump data frame
- startcol :
- upper left cell column to dump data frame
- engine : string, default None
- write engine to use - you can also set this via the options
- ``io.excel.xlsx.writer``, ``io.excel.xls.writer``, and
- ``io.excel.xlsm.writer``.
- merge_cells : boolean, default True
- Write MultiIndex and Hierarchical Rows as merged cells.
- encoding: string, default None
- encoding of the resulting excel file. Only necessary for xlwt,
- other writers support unicode natively.
- inf_rep : string, default 'inf'
- Representation for infinity (there is no native representation for
- infinity in Excel)
-
- Notes
- -----
- If passing an existing ExcelWriter object, then the sheet will be added
- to the existing workbook. This can be used to save different
- DataFrames to one workbook:
-
- >>> writer = ExcelWriter('output.xlsx')
- >>> df1.to_excel(writer,'Sheet1')
- >>> df2.to_excel(writer,'Sheet2')
- >>> writer.save()
-
- For compatibility with to_csv, to_excel serializes lists and dicts to
- strings before writing.
- """
from pandas.io.excel import ExcelWriter
need_save = False
if encoding is None:
@@ -2487,7 +2434,7 @@ def _set_item(self, key, value):
# check if we are modifying a copy
# try to set first as we want an invalid
- # value exeption to occur first
+ # value exception to occur first
if len(self):
self._check_setitem_copy()
@@ -2503,10 +2450,10 @@ def insert(self, loc, column, value, allow_duplicates=False):
loc : int
Must have 0 <= loc <= len(columns)
column : object
- value : int, Series, or array-like
+ value : scalar, Series, or array-like
"""
self._ensure_valid_index(value)
- value = self._sanitize_column(column, value)
+ value = self._sanitize_column(column, value, broadcast=False)
self._data.insert(loc, column, value,
allow_duplicates=allow_duplicates)
@@ -2536,7 +2483,7 @@ def assign(self, **kwargs):
Notes
-----
Since ``kwargs`` is a dictionary, the order of your
- arguments may not be preserved. The make things predicatable,
+        arguments may not be preserved. To make things predictable,
the columns are inserted in alphabetical order, at the end of
your DataFrame. Assigning multiple columns within the same
``assign`` is possible, but you cannot reference other columns
@@ -2590,9 +2537,25 @@ def assign(self, **kwargs):
return data
- def _sanitize_column(self, key, value):
- # Need to make sure new columns (which go into the BlockManager as new
- # blocks) are always copied
+ def _sanitize_column(self, key, value, broadcast=True):
+ """
+ Ensures new columns (which go into the BlockManager as new blocks) are
+ always copied and converted into an array.
+
+ Parameters
+ ----------
+ key : object
+ value : scalar, Series, or array-like
+ broadcast : bool, default True
+ If ``key`` matches multiple duplicate column names in the
+ DataFrame, this parameter indicates whether ``value`` should be
+ tiled so that the returned array contains a (duplicated) column for
+ each occurrence of the key. If False, ``value`` will not be tiled.
+
+ Returns
+ -------
+ sanitized_column : numpy-array
+ """
def reindexer(value):
# reindex if necessary
@@ -2665,7 +2628,7 @@ def reindexer(value):
return value
# broadcast across multiple columns if necessary
- if key in self.columns and value.ndim == 1:
+ if broadcast and key in self.columns and value.ndim == 1:
if (not self.columns.is_unique or
isinstance(self.columns, MultiIndex)):
existing_piece = self[key]
@@ -3217,7 +3180,7 @@ def trans(v):
# try to be helpful
if isinstance(self.columns, MultiIndex):
raise ValueError('Cannot sort by column %s in a '
- 'multi-index you need to explicity '
+ 'multi-index you need to explicitly '
'provide all the levels' % str(by))
raise ValueError('Cannot sort by duplicate column %s' %
@@ -3374,15 +3337,6 @@ def sortlevel(self, level=0, axis=0, ascending=True, inplace=False,
return self.sort_index(level=level, axis=axis, ascending=ascending,
inplace=inplace, sort_remaining=sort_remaining)
- def _nsorted(self, columns, n, method, keep):
- if not is_list_like(columns):
- columns = [columns]
- columns = list(columns)
- ser = getattr(self[columns[0]], method)(n, keep=keep)
- ascending = dict(nlargest=False, nsmallest=True)[method]
- return self.loc[ser.index].sort_values(columns, ascending=ascending,
- kind='mergesort')
-
def nlargest(self, n, columns, keep='first'):
"""Get the rows of a DataFrame sorted by the `n` largest
values of `columns`.
@@ -3415,7 +3369,7 @@ def nlargest(self, n, columns, keep='first'):
1 10 b 2
2 8 d NaN
"""
- return self._nsorted(columns, n, 'nlargest', keep)
+ return algos.select_n_frame(self, columns, n, 'nlargest', keep)
def nsmallest(self, n, columns, keep='first'):
"""Get the rows of a DataFrame sorted by the `n` smallest
@@ -3449,7 +3403,7 @@ def nsmallest(self, n, columns, keep='first'):
0 1 a 1
2 8 d NaN
"""
- return self._nsorted(columns, n, 'nsmallest', keep)
+ return algos.select_n_frame(self, columns, n, 'nsmallest', keep)
def swaplevel(self, i=-2, j=-1, axis=0):
"""
@@ -3868,9 +3822,8 @@ def last_valid_index(self):
def pivot(self, index=None, columns=None, values=None):
"""
Reshape data (produce a "pivot" table) based on column values. Uses
- unique values from index / columns to form axes and return either
- DataFrame or Panel, depending on whether you request a single value
- column (DataFrame) or all columns (Panel)
+ unique values from index / columns to form axes of the resulting
+ DataFrame.
Parameters
----------
@@ -3880,7 +3833,20 @@ def pivot(self, index=None, columns=None, values=None):
columns : string or object
Column name to use to make new frame's columns
values : string or object, optional
- Column name to use for populating new frame's values
+ Column name to use for populating new frame's values. If not
+ specified, all remaining columns will be used and the result will
+ have hierarchically indexed columns
+
+ Returns
+ -------
+ pivoted : DataFrame
+
+ See also
+ --------
+ DataFrame.pivot_table : generalization of pivot that can handle
+ duplicate values for one index/column pair
+ DataFrame.unstack : pivot based on the index values instead of a
+ column
Notes
-----
@@ -3889,30 +3855,30 @@ def pivot(self, index=None, columns=None, values=None):
Examples
--------
+
+ >>> df = pd.DataFrame({'foo': ['one','one','one','two','two','two'],
+ 'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
+ 'baz': [1, 2, 3, 4, 5, 6]})
>>> df
foo bar baz
- 0 one A 1.
- 1 one B 2.
- 2 one C 3.
- 3 two A 4.
- 4 two B 5.
- 5 two C 6.
-
- >>> df.pivot('foo', 'bar', 'baz')
+ 0 one A 1
+ 1 one B 2
+ 2 one C 3
+ 3 two A 4
+ 4 two B 5
+ 5 two C 6
+
+ >>> df.pivot(index='foo', columns='bar', values='baz')
A B C
one 1 2 3
two 4 5 6
- >>> df.pivot('foo', 'bar')['baz']
+ >>> df.pivot(index='foo', columns='bar')['baz']
A B C
one 1 2 3
two 4 5 6
- Returns
- -------
- pivoted : DataFrame
- If no values column specified, will have hierarchically indexed
- columns
+
"""
from pandas.core.reshape import pivot
return pivot(self, index=index, columns=columns, values=values)
diff --git a/pandas/core/generic.py b/pandas/core/generic.py
index 697438df87d4f..48d799811aa94 100644
--- a/pandas/core/generic.py
+++ b/pandas/core/generic.py
@@ -1016,6 +1016,62 @@ def __setstate__(self, state):
# ----------------------------------------------------------------------
# I/O Methods
+ _shared_docs['to_excel'] = """
+ Write %(klass)s to an excel sheet
+ %(versionadded_to_excel)s
+ Parameters
+ ----------
+ excel_writer : string or ExcelWriter object
+ File path or existing ExcelWriter
+ sheet_name : string, default 'Sheet1'
+ Name of sheet which will contain DataFrame
+ na_rep : string, default ''
+ Missing data representation
+ float_format : string, default None
+ Format string for floating point numbers
+ columns : sequence, optional
+ Columns to write
+ header : boolean or list of string, default True
+ Write out column names. If a list of string is given it is
+ assumed to be aliases for the column names
+ index : boolean, default True
+ Write row names (index)
+ index_label : string or sequence, default None
+ Column label for index column(s) if desired. If None is given, and
+ `header` and `index` are True, then the index names are used. A
+ sequence should be given if the DataFrame uses MultiIndex.
+ startrow : int, default 0
+ upper left cell row to dump data frame
+ startcol : int, default 0
+ upper left cell column to dump data frame
+ engine : string, default None
+ write engine to use - you can also set this via the options
+ ``io.excel.xlsx.writer``, ``io.excel.xls.writer``, and
+ ``io.excel.xlsm.writer``.
+ merge_cells : boolean, default True
+ Write MultiIndex and Hierarchical Rows as merged cells.
+ encoding: string, default None
+ encoding of the resulting excel file. Only necessary for xlwt,
+ other writers support unicode natively.
+ inf_rep : string, default 'inf'
+ Representation for infinity (there is no native representation for
+ infinity in Excel)
+
+ Notes
+ -----
+ If passing an existing ExcelWriter object, then the sheet will be added
+ to the existing workbook. This can be used to save different
+ DataFrames to one workbook:
+
+ >>> writer = ExcelWriter('output.xlsx')
+ >>> df1.to_excel(writer,'Sheet1')
+ >>> df2.to_excel(writer,'Sheet2')
+ >>> writer.save()
+
+ For compatibility with to_csv, to_excel serializes lists and dicts to
+ strings before writing.
+ """
+
def to_json(self, path_or_buf=None, orient=None, date_format='epoch',
double_precision=10, force_ascii=True, date_unit='ms',
default_handler=None, lines=False):
@@ -1066,7 +1122,7 @@ def to_json(self, path_or_buf=None, orient=None, date_format='epoch',
Handler to call if object cannot otherwise be converted to a
suitable format for JSON. Should receive a single argument which is
the object to convert and return a serialisable object.
- lines : boolean, defalut False
+ lines : boolean, default False
If 'orient' is 'records' write out line delimited json format. Will
throw ValueError if incorrect 'orient' since others are not list
like.
@@ -1095,7 +1151,7 @@ def to_hdf(self, path_or_buf, key, **kwargs):
----------
path_or_buf : the path (string) or HDFStore object
key : string
- indentifier for the group in the store
+ identifier for the group in the store
mode : optional, {'a', 'w', 'r+'}, default 'a'
``'w'``
@@ -2029,7 +2085,8 @@ def sort_values(self, by, axis=0, ascending=True, inplace=False,
DataFrames, this option is only applied when sorting on a single
column or label.
na_position : {'first', 'last'}, default 'last'
- `first` puts NaNs at the beginning, `last` puts NaNs at the end
+ `first` puts NaNs at the beginning, `last` puts NaNs at the end.
+ Not implemented for MultiIndex.
sort_remaining : bool, default True
if true and sorting by level and index is multilevel, sort by other
levels too (in order) after sorting by specified level
@@ -3297,12 +3354,16 @@ def fillna(self, value=None, method=None, axis=None, inplace=False,
return self._constructor(new_data).__finalize__(self)
def ffill(self, axis=None, inplace=False, limit=None, downcast=None):
- """Synonym for NDFrame.fillna(method='ffill')"""
+ """
+ Synonym for :meth:`DataFrame.fillna(method='ffill') `
+ """
return self.fillna(method='ffill', axis=axis, inplace=inplace,
limit=limit, downcast=downcast)
def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
- """Synonym for NDFrame.fillna(method='bfill')"""
+ """
+ Synonym for :meth:`DataFrame.fillna(method='bfill') `
+ """
return self.fillna(method='bfill', axis=axis, inplace=inplace,
limit=limit, downcast=downcast)
@@ -3477,20 +3538,27 @@ def replace(self, to_replace=None, value=None, inplace=False, limit=None,
res = self if inplace else self.copy()
for c, src in compat.iteritems(to_replace):
if c in value and c in self:
+ # object conversion is handled in
+ # series.replace which is called recursively
res[c] = res[c].replace(to_replace=src,
value=value[c],
- inplace=False, regex=regex)
+ inplace=False,
+ regex=regex)
return None if inplace else res
# {'A': NA} -> 0
elif not is_list_like(value):
- for k, src in compat.iteritems(to_replace):
- if k in self:
- new_data = new_data.replace(to_replace=src,
- value=value,
- filter=[k],
- inplace=inplace,
- regex=regex)
+ keys = [(k, src) for k, src in compat.iteritems(to_replace)
+ if k in self]
+ keys_len = len(keys) - 1
+ for i, (k, src) in enumerate(keys):
+ convert = i == keys_len
+ new_data = new_data.replace(to_replace=src,
+ value=value,
+ filter=[k],
+ inplace=inplace,
+ regex=regex,
+ convert=convert)
else:
raise TypeError('value argument must be scalar, dict, or '
'Series')
@@ -3571,14 +3639,17 @@ def replace(self, to_replace=None, value=None, inplace=False, limit=None,
require that you also specify an `order` (int),
e.g. df.interpolate(method='polynomial', order=4).
These use the actual numerical values of the index.
- * 'krogh', 'piecewise_polynomial', 'spline', 'pchip' and 'akima' are all
- wrappers around the scipy interpolation methods of similar
- names. These use the actual numerical values of the index. See
- the scipy documentation for more on their behavior
- `here `__ # noqa
- `and here `__ # noqa
+ * 'krogh', 'piecewise_polynomial', 'spline', 'pchip' and 'akima'
+ are all wrappers around the scipy interpolation methods of
+ similar names. These use the actual numerical values of the
+ index. For more information on their behavior, see the
+ `scipy documentation
+ `__
+ and `tutorial documentation
+ `__
* 'from_derivatives' refers to BPoly.from_derivatives which
- replaces 'piecewise_polynomial' interpolation method in scipy 0.18
+ replaces 'piecewise_polynomial' interpolation method in
+ scipy 0.18
.. versionadded:: 0.18.1
@@ -3592,7 +3663,7 @@ def replace(self, to_replace=None, value=None, inplace=False, limit=None,
* 1: fill row-by-row
limit : int, default None.
Maximum number of consecutive NaNs to fill.
- limit_direction : {'forward', 'backward', 'both'}, defaults to 'forward'
+ limit_direction : {'forward', 'backward', 'both'}, default 'forward'
If limit is specified, consecutive NaNs will be filled in this
direction.
@@ -3735,10 +3806,10 @@ def asof(self, where, subset=None):
if not self.index.is_monotonic:
raise ValueError("asof requires a sorted index")
- if isinstance(self, ABCSeries):
+ is_series = isinstance(self, ABCSeries)
+ if is_series:
if subset is not None:
raise ValueError("subset is not valid for Series")
- nulls = self.isnull()
elif self.ndim > 2:
raise NotImplementedError("asof is not implemented "
"for {type}".format(type(self)))
@@ -3747,9 +3818,9 @@ def asof(self, where, subset=None):
subset = self.columns
if not is_list_like(subset):
subset = [subset]
- nulls = self[subset].isnull().any(1)
- if not is_list_like(where):
+ is_list = is_list_like(where)
+ if not is_list:
start = self.index[0]
if isinstance(self.index, PeriodIndex):
where = Period(where, freq=self.index.freq).ordinal
@@ -3758,16 +3829,26 @@ def asof(self, where, subset=None):
if where < start:
return np.nan
- loc = self.index.searchsorted(where, side='right')
- if loc > 0:
- loc -= 1
- while nulls[loc] and loc > 0:
- loc -= 1
- return self.iloc[loc]
+ # It's always much faster to use a *while* loop here for
+ # Series than pre-computing all the NAs. However a
+ # *while* loop is extremely expensive for DataFrame
+ # so we later pre-compute all the NAs and use the same
+ # code path whether *where* is a scalar or list.
+ # See PR: https://github.com/pandas-dev/pandas/pull/14476
+ if is_series:
+ loc = self.index.searchsorted(where, side='right')
+ if loc > 0:
+ loc -= 1
+
+ values = self._values
+ while loc > 0 and isnull(values[loc]):
+ loc -= 1
+ return values[loc]
if not isinstance(where, Index):
- where = Index(where)
+ where = Index(where) if is_list else Index([where])
+ nulls = self.isnull() if is_series else self[subset].isnull().any(1)
locs = self.index.asof_locs(where, ~(nulls.values))
# mask the missing
@@ -3775,7 +3856,7 @@ def asof(self, where, subset=None):
data = self.take(locs, is_copy=False)
data.index = where
data.loc[missing] = np.nan
- return data
+ return data if is_list else data.iloc[-1]
# ----------------------------------------------------------------------
# Action Methods
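
A small sketch (illustrative only, not part of the diff) of the two code paths described in the
comment above: a scalar ``where`` returns a single value, a list returns an aligned result, and
nulls are skipped in both cases::

    import numpy as np
    import pandas as pd

    s = pd.Series([1.0, np.nan, 3.0], index=[10, 20, 30])
    s.asof(25)        # scalar path: walks back past the NaN at 20 -> 1.0
    s.asof([25, 35])  # list path: Series([1.0, 3.0], index=[25, 35])
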
@@ -3926,7 +4007,7 @@ def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,
Parameters
----------
by : mapping function / list of functions, dict, Series, or tuple /
- list of column names.
+ list of column names or index level names.
Called on each element of the object index to determine the groups.
If a dict or Series is passed, the Series or dict VALUES will be
used to determine the groups
@@ -3998,6 +4079,8 @@ def asfreq(self, freq, method=None, how=None, normalize=False):
-------
converted : type of caller
+ Notes
+ -----
To learn more about the frequency strings, please see `this link
`__.
"""
@@ -4083,6 +4166,9 @@ def resample(self, rule, how=None, axis=0, fill_method=None, closed=None,
.. versionadded:: 0.19.0
+ Notes
+ -----
+
To learn more about the offset strings, please see `this link
`__.
@@ -4270,7 +4356,7 @@ def rank(self, axis=0, method='average', numeric_only=None,
Parameters
----------
- axis: {0 or 'index', 1 or 'columns'}, default 0
+ axis : {0 or 'index', 1 or 'columns'}, default 0
index to direct ranking
method : {'average', 'min', 'max', 'first', 'dense'}
* average: average rank of group
@@ -5395,16 +5481,18 @@ def compound(self, axis=None, skipna=None, level=None):
cls.cummin = _make_cum_function(
cls, 'cummin', name, name2, axis_descr, "cumulative minimum",
- lambda y, axis: np.minimum.accumulate(y, axis), np.inf, np.nan)
+ lambda y, axis: np.minimum.accumulate(y, axis), "min",
+ np.inf, np.nan)
cls.cumsum = _make_cum_function(
cls, 'cumsum', name, name2, axis_descr, "cumulative sum",
- lambda y, axis: y.cumsum(axis), 0., np.nan)
+ lambda y, axis: y.cumsum(axis), "sum", 0., np.nan)
cls.cumprod = _make_cum_function(
cls, 'cumprod', name, name2, axis_descr, "cumulative product",
- lambda y, axis: y.cumprod(axis), 1., np.nan)
+ lambda y, axis: y.cumprod(axis), "prod", 1., np.nan)
cls.cummax = _make_cum_function(
cls, 'cummax', name, name2, axis_descr, "cumulative max",
- lambda y, axis: np.maximum.accumulate(y, axis), -np.inf, np.nan)
+ lambda y, axis: np.maximum.accumulate(y, axis), "max",
+ -np.inf, np.nan)
cls.sum = _make_stat_function(
cls, 'sum', name, name2, axis_descr,
@@ -5592,7 +5680,15 @@ def _doc_parms(cls):
Returns
-------
-%(outname)s : %(name1)s\n"""
+%(outname)s : %(name1)s\n
+
+
+See also
+--------
+pandas.core.window.Expanding.%(accum_func_name)s : Similar functionality
+ but ignores ``NaN`` values.
+
+"""
def _make_stat_function(cls, name, name1, name2, axis_descr, desc, f):
@@ -5635,10 +5731,10 @@ def stat_func(self, axis=None, skipna=None, level=None, ddof=1,
return set_function_name(stat_func, name, cls)
-def _make_cum_function(cls, name, name1, name2, axis_descr, desc, accum_func,
- mask_a, mask_b):
+def _make_cum_function(cls, name, name1, name2, axis_descr, desc,
+ accum_func, accum_func_name, mask_a, mask_b):
@Substitution(outname=name, desc=desc, name1=name1, name2=name2,
- axis_descr=axis_descr)
+ axis_descr=axis_descr, accum_func_name=accum_func_name)
@Appender("Return {0} over requested axis.".format(desc) +
_cnum_doc)
def cum_func(self, axis=None, skipna=True, *args, **kwargs):
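
The new ``See also`` entry points at the expanding-window variants; a minimal sketch of the
difference it refers to (illustrative, not part of the diff)::

    import numpy as np
    import pandas as pd

    s = pd.Series([2.0, np.nan, 5.0, 3.0])
    s.cummax()           # [2.0, NaN, 5.0, 5.0] -- NaN positions stay NaN
    s.expanding().max()  # [2.0, 2.0, 5.0, 5.0] -- NaN values are ignored
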
diff --git a/pandas/core/groupby.py b/pandas/core/groupby.py
index 5223c0ac270f3..b249cded39133 100644
--- a/pandas/core/groupby.py
+++ b/pandas/core/groupby.py
@@ -6,7 +6,7 @@
import warnings
import copy
-from pandas.compat import(
+from pandas.compat import (
zip, range, long, lzip,
callable, map
)
@@ -175,8 +175,8 @@ class Grouper(object):
freq : string / frequency object, defaults to None
This will groupby the specified frequency if the target selection
(via key or level) is a datetime-like object. For full specification
- of available frequencies, please see
- `here `_.
+ of available frequencies, please see `here
+ `_.
axis : number/name of the axis, defaults to 0
sort : boolean, default to False
whether to sort the resulting labels
@@ -861,7 +861,17 @@ def reset_identity(values):
if isinstance(result, Series):
result = result.reindex(ax)
else:
- result = result.reindex_axis(ax, axis=self.axis)
+
+ # this is a very unfortunate situation
+ # we have a multi-index that is NOT lexsorted
+ # and we have a result which is duplicated
+ # we can't reindex, so we resort to this
+ # GH 14776
+ if isinstance(ax, MultiIndex) and not ax.is_unique:
+ result = result.take(result.index.get_indexer_for(
+ ax.values).unique(), axis=self.axis)
+ else:
+ result = result.reindex_axis(ax, axis=self.axis)
elif self.group_keys:
@@ -2208,7 +2218,10 @@ def __init__(self, index, grouper=None, obj=None, name=None, level=None,
index._get_grouper_for_level(self.grouper, level)
else:
- if isinstance(self.grouper, (list, tuple)):
+ if self.grouper is None and self.name is not None:
+ self.grouper = self.obj[self.name]
+
+ elif isinstance(self.grouper, (list, tuple)):
self.grouper = com._asarray_tuplesafe(self.grouper)
# a passed Categorical
@@ -2446,9 +2459,24 @@ def is_in_obj(gpr):
exclusions.append(name)
elif is_in_axis(gpr): # df.groupby('name')
- in_axis, name, gpr = True, gpr, obj[gpr]
- exclusions.append(name)
-
+ if gpr in obj:
+ if gpr in obj.index.names:
+ warnings.warn(
+ ("'%s' is both a column name and an index level.\n"
+ "Defaulting to column but "
+ "this will raise an ambiguity error in a "
+ "future version") % gpr,
+ FutureWarning, stacklevel=2)
+ in_axis, name, gpr = True, gpr, obj[gpr]
+ exclusions.append(name)
+ elif gpr in obj.index.names:
+ in_axis, name, level, gpr = False, None, gpr, None
+ else:
+ raise KeyError(gpr)
+ elif isinstance(gpr, Grouper) and gpr.key is not None:
+ # Add key to exclusions
+ exclusions.append(gpr.key)
+ in_axis, name = False, None
else:
in_axis, name = False, None
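
To make the key resolution in the branch above concrete, a short sketch (illustrative only)::

    import pandas as pd

    df = pd.DataFrame({'A': [1, 1, 2], 'B': [4, 5, 6]},
                      index=pd.Index(['x', 'x', 'y'], name='key'))
    df.groupby('key').sum()  # 'key' is only an index level -> group by the level
    df.groupby('A').sum()    # 'A' is only a column -> group by the column
    # if a name matched both a column and a level, the column would win and a
    # FutureWarning would be raised, per the branch above
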
@@ -2892,6 +2920,7 @@ def true_and_notnull(x, *args, **kwargs):
def nunique(self, dropna=True):
""" Returns number of unique elements in the group """
ids, _, _ = self.grouper.group_info
+
val = self.obj.get_values()
try:
@@ -2922,7 +2951,10 @@ def nunique(self, dropna=True):
inc[idx] = 1
out = np.add.reduceat(inc, idx).astype('int64', copy=False)
- res = out if ids[0] != -1 else out[1:]
+ if len(ids):
+ res = out if ids[0] != -1 else out[1:]
+ else:
+ res = out[1:]
ri = self.grouper.result_index
# we might have duplications among the bins
@@ -3454,7 +3486,6 @@ def _transform_general(self, func, *args, **kwargs):
from pandas.tools.merge import concat
applied = []
-
obj = self._obj_with_exclusions
gen = self.grouper.get_iterator(obj, axis=self.axis)
fast_path, slow_path = self._define_paths(func, *args, **kwargs)
@@ -3475,14 +3506,24 @@ def _transform_general(self, func, *args, **kwargs):
else:
res = path(group)
- # broadcasting
if isinstance(res, Series):
- if res.index.is_(obj.index):
- group.T.values[:] = res
+
+ # we need to broadcast across the
+ # other dimension; this will preserve dtypes
+ # GH14457
+ if not np.prod(group.shape):
+ continue
+ elif res.index.is_(obj.index):
+ r = concat([res] * len(group.columns), axis=1)
+ r.columns = group.columns
+ r.index = group.index
else:
- group.values[:] = res
+ r = DataFrame(
+ np.concatenate([res.values] * len(group.index)
+ ).reshape(group.shape),
+ columns=group.columns, index=group.index)
- applied.append(group)
+ applied.append(r)
else:
applied.append(res)
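
A sketch of the user-visible effect of the broadcast rewrite above (see GH14457 referenced in the
comment); illustrative only::

    import pandas as pd

    df = pd.DataFrame({'g': ['a', 'a', 'b'],
                       'd': pd.to_datetime(['2016-01-01', '2016-01-02',
                                            '2016-01-03'])})
    # the per-group row is tiled into a new frame instead of being written
    # into group.values, so the datetime64[ns] dtype survives the transform
    df.groupby('g').transform(lambda x: x.iloc[0]).dtypes
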
diff --git a/pandas/core/indexing.py b/pandas/core/indexing.py
index 35fcf0d49d0d6..c4ae3dcca8367 100755
--- a/pandas/core/indexing.py
+++ b/pandas/core/indexing.py
@@ -11,6 +11,7 @@
is_sequence,
is_scalar,
is_sparse,
+ _is_unorderable_exception,
_ensure_platform_int)
from pandas.types.missing import isnull, _infer_fill_value
@@ -1411,7 +1412,7 @@ def error():
except TypeError as e:
# python 3 type errors should be raised
- if 'unorderable' in str(e): # pragma: no cover
+ if _is_unorderable_exception(e):
error()
raise
except:
@@ -1813,7 +1814,9 @@ def check_bool_indexer(ax, key):
result = result.reindex(ax)
mask = isnull(result._values)
if mask.any():
- raise IndexingError('Unalignable boolean Series key provided')
+ raise IndexingError('Unalignable boolean Series provided as '
+ 'indexer (index of the boolean Series and of '
+ 'the indexed object do not match)')
result = result.astype(bool)._values
elif is_sparse(result):
result = result.to_dense()
diff --git a/pandas/core/internals.py b/pandas/core/internals.py
index 11721a5bdac29..120a9cbcd1a75 100644
--- a/pandas/core/internals.py
+++ b/pandas/core/internals.py
@@ -6,7 +6,6 @@
from collections import defaultdict
import numpy as np
-from numpy import percentile as _quantile
from pandas.core.base import PandasObject
@@ -623,7 +622,6 @@ def replace(self, to_replace, value, inplace=False, filter=None,
original_to_replace = to_replace
mask = isnull(self.values)
-
# try to replace, if we raise an error, convert to ObjectBlock and
# retry
try:
@@ -1147,8 +1145,9 @@ def get_result(other):
def handle_error():
if raise_on_error:
+ # The 'detail' variable is defined in outer scope.
raise TypeError('Could not operate %s with block values %s' %
- (repr(other), str(detail)))
+ (repr(other), str(detail))) # noqa
else:
# return the values
result = np.empty(values.shape, dtype='O')
@@ -1315,16 +1314,38 @@ def quantile(self, qs, interpolation='linear', axis=0, mgr=None):
values = self.get_values()
values, _, _, _ = self._try_coerce_args(values, values)
- mask = isnull(self.values)
- if not lib.isscalar(mask) and mask.any():
- # even though this could be a 2-d mask it appears
- # as a 1-d result
- mask = mask.reshape(values.shape)
- result_shape = tuple([values.shape[0]] + [-1] * (self.ndim - 1))
- values = _block_shape(values[~mask], ndim=self.ndim)
- if self.ndim > 1:
- values = values.reshape(result_shape)
+ def _nanpercentile1D(values, mask, q, **kw):
+ values = values[~mask]
+
+ if len(values) == 0:
+ if is_scalar(q):
+ return self._na_value
+ else:
+ return np.array([self._na_value] * len(q),
+ dtype=values.dtype)
+
+ return np.percentile(values, q, **kw)
+
+ def _nanpercentile(values, q, axis, **kw):
+
+ mask = isnull(self.values)
+ if not is_scalar(mask) and mask.any():
+ if self.ndim == 1:
+ return _nanpercentile1D(values, mask, q, **kw)
+ else:
+ # for nonconsolidatable blocks mask is 1D, but values 2D
+ if mask.ndim < values.ndim:
+ mask = mask.reshape(values.shape)
+ if axis == 0:
+ values = values.T
+ mask = mask.T
+ result = [_nanpercentile1D(val, m, q, **kw) for (val, m)
+ in zip(list(values), list(mask))]
+ result = np.array(result, dtype=values.dtype, copy=False).T
+ return result
+ else:
+ return np.percentile(values, q, axis=axis, **kw)
from pandas import Float64Index
is_empty = values.shape[axis] == 0
@@ -1343,13 +1364,13 @@ def quantile(self, qs, interpolation='linear', axis=0, mgr=None):
else:
try:
- result = _quantile(values, np.array(qs) * 100,
- axis=axis, **kw)
+ result = _nanpercentile(values, np.array(qs) * 100,
+ axis=axis, **kw)
except ValueError:
# older numpies don't handle an array for q
- result = [_quantile(values, q * 100,
- axis=axis, **kw) for q in qs]
+ result = [_nanpercentile(values, q * 100,
+ axis=axis, **kw) for q in qs]
result = np.array(result, copy=False)
if self.ndim > 1:
@@ -1368,7 +1389,7 @@ def quantile(self, qs, interpolation='linear', axis=0, mgr=None):
else:
result = np.array([self._na_value] * len(self))
else:
- result = _quantile(values, qs * 100, axis=axis, **kw)
+ result = _nanpercentile(values, qs * 100, axis=axis, **kw)
ndim = getattr(result, 'ndim', None) or 0
result = self._try_coerce_result(result)
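
The helper above replaces a plain ``np.percentile`` with a null-aware version; roughly what it
computes, expressed with numpy directly (a sketch, not part of the diff)::

    import numpy as np

    values = np.array([1.0, np.nan, 3.0, 4.0])
    mask = np.isnan(values)
    np.percentile(values[~mask], 50)   # 3.0 -- nulls excluded before the percentile
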
@@ -1773,13 +1794,14 @@ def should_store(self, value):
return issubclass(value.dtype.type, np.bool_)
def replace(self, to_replace, value, inplace=False, filter=None,
- regex=False, mgr=None):
+ regex=False, convert=True, mgr=None):
to_replace_values = np.atleast_1d(to_replace)
if not np.can_cast(to_replace_values, bool):
return self
return super(BoolBlock, self).replace(to_replace, value,
inplace=inplace, filter=filter,
- regex=regex, mgr=mgr)
+ regex=regex, convert=convert,
+ mgr=mgr)
class ObjectBlock(Block):
@@ -3192,6 +3214,7 @@ def comp(s):
masks = [comp(s) for i, s in enumerate(src_list)]
result_blocks = []
+ src_len = len(src_list) - 1
for blk in self.blocks:
# its possible to get multiple result blocks here
@@ -3201,8 +3224,9 @@ def comp(s):
new_rb = []
for b in rb:
if b.dtype == np.object_:
+ convert = i == src_len
result = b.replace(s, d, inplace=inplace, regex=regex,
- mgr=mgr)
+ mgr=mgr, convert=convert)
new_rb = _extend_blocks(result, new_rb)
else:
# get our mask for this element, sized to this
@@ -4766,7 +4790,12 @@ def _putmask_smart(v, m, n):
# change the dtype
dtype, _ = _maybe_promote(n.dtype)
- nv = v.astype(dtype)
+
+ if is_extension_type(v.dtype) and is_object_dtype(dtype):
+ nv = v.get_values(dtype)
+ else:
+ nv = v.astype(dtype)
+
try:
nv[m] = n[m]
except ValueError:
diff --git a/pandas/core/missing.py b/pandas/core/missing.py
index b847415f274db..f1191ff1c7009 100644
--- a/pandas/core/missing.py
+++ b/pandas/core/missing.py
@@ -12,7 +12,8 @@
is_float_dtype, is_datetime64_dtype,
is_integer_dtype, _ensure_float64,
is_scalar,
- _DATELIKE_DTYPES)
+ _DATELIKE_DTYPES,
+ needs_i8_conversion)
from pandas.types.missing import isnull
@@ -187,7 +188,7 @@ def _interp_limit(invalid, fw_limit, bw_limit):
if method in ('values', 'index'):
inds = np.asarray(xvalues)
# hack for DatetimeIndex, #1646
- if issubclass(inds.dtype.type, np.datetime64):
+ if needs_i8_conversion(inds.dtype.type):
inds = inds.view(np.int64)
if inds.dtype == np.object_:
inds = lib.maybe_convert_objects(inds)
diff --git a/pandas/core/nanops.py b/pandas/core/nanops.py
index 564586eec5a8e..d7d68ad536be5 100644
--- a/pandas/core/nanops.py
+++ b/pandas/core/nanops.py
@@ -11,6 +11,7 @@
import pandas.hashtable as _hash
from pandas import compat, lib, algos, tslib
+from pandas.compat.numpy import _np_version_under1p10
from pandas.types.common import (_ensure_int64, _ensure_object,
_ensure_float64, _get_dtype,
is_float, is_scalar,
@@ -829,9 +830,37 @@ def _checked_add_with_arr(arr, b):
Raises
------
- OverflowError if any x + y exceeds the maximum int64 value.
+ OverflowError if any x + y exceeds the maximum or minimum int64 value.
"""
- if (np.iinfo(np.int64).max - b < arr).any():
- raise OverflowError("Python int too large to "
- "convert to C long")
+ # For performance reasons, we broadcast 'b' to the new array 'b2'
+ # so that it has the same size as 'arr'.
+ if _np_version_under1p10:
+ if lib.isscalar(b):
+ b2 = np.empty(arr.shape)
+ b2.fill(b)
+ else:
+ b2 = b
+ else:
+ b2 = np.broadcast_to(b, arr.shape)
+
+ # gh-14324: For each element in 'arr' and its corresponding element
+ # in 'b2', we check the sign of the element in 'b2'. If it is positive,
+ # we then check whether its sum with the element in 'arr' exceeds
+ # np.iinfo(np.int64).max. If so, we have an overflow error. If it
+ # is negative, we then check whether its sum with the element in
+ # 'arr' exceeds np.iinfo(np.int64).min. If so, we have an overflow
+ # error as well.
+ mask1 = b2 > 0
+ mask2 = b2 < 0
+
+ if not mask1.any():
+ to_raise = (np.iinfo(np.int64).min - b2 > arr).any()
+ elif not mask2.any():
+ to_raise = (np.iinfo(np.int64).max - b2 < arr).any()
+ else:
+ to_raise = ((np.iinfo(np.int64).max - b2[mask1] < arr[mask1]).any() or
+ (np.iinfo(np.int64).min - b2[mask2] > arr[mask2]).any())
+
+ if to_raise:
+ raise OverflowError("Overflow in int64 addition")
return arr + b
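
A minimal numpy sketch of the sign-aware overflow test described in the comment above
(illustrative, not part of the diff)::

    import numpy as np

    arr = np.array([np.iinfo(np.int64).max - 1], dtype=np.int64)
    b = 5
    # positive b: adding overflows whenever arr exceeds int64 max - b
    overflows = (np.iinfo(np.int64).max - b < arr).any()
    print(overflows)   # True -> pandas now raises OverflowError instead of wrapping
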
diff --git a/pandas/core/ops.py b/pandas/core/ops.py
index 7cff1104c50be..80de3cd85d4db 100644
--- a/pandas/core/ops.py
+++ b/pandas/core/ops.py
@@ -421,7 +421,7 @@ def _validate(self, lvalues, rvalues, name):
# if tz's must be equal (same or None)
if getattr(lvalues, 'tz', None) != getattr(rvalues, 'tz', None):
- raise ValueError("Incompatbile tz's on datetime subtraction "
+ raise ValueError("Incompatible tz's on datetime subtraction "
"ops")
elif ((self.is_timedelta_lhs or self.is_offset_lhs) and
@@ -1006,7 +1006,7 @@ def wrapper(self, other):
Parameters
----------
-other: Series or scalar value
+other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill missing (NaN) values with this value. If both Series are
missing, the result will be missing
@@ -1176,6 +1176,13 @@ def na_op(x, y):
yrav = y.ravel()
mask = notnull(xrav) & notnull(yrav)
xrav = xrav[mask]
+
+ # we may need to manually
+ # broadcast a 1 element array
+ if yrav.shape != mask.shape:
+ yrav = np.empty(mask.shape, dtype=yrav.dtype)
+ yrav.fill(yrav.item())
+
yrav = yrav[mask]
if np.prod(xrav.shape) and np.prod(yrav.shape):
with np.errstate(all='ignore'):
diff --git a/pandas/core/reshape.py b/pandas/core/reshape.py
index fa5d16bd85e98..b359c54535b28 100644
--- a/pandas/core/reshape.py
+++ b/pandas/core/reshape.py
@@ -3,6 +3,7 @@
from pandas.compat import range, zip
from pandas import compat
import itertools
+import re
import numpy as np
@@ -277,7 +278,8 @@ def _unstack_multiple(data, clocs):
verify_integrity=False)
if isinstance(data, Series):
- dummy = Series(data.values, index=dummy_index)
+ dummy = data.copy()
+ dummy.index = dummy_index
unstacked = dummy.unstack('__placeholder__')
new_levels = clevels
new_names = cnames
@@ -292,7 +294,8 @@ def _unstack_multiple(data, clocs):
return result
- dummy = DataFrame(data.values, index=dummy_index, columns=data.columns)
+ dummy = data.copy()
+ dummy.index = dummy_index
unstacked = dummy.unstack('__placeholder__')
if isinstance(unstacked, Series):
@@ -357,6 +360,11 @@ def pivot_simple(index, columns, values):
Returns
-------
DataFrame
+
+ See also
+ --------
+ DataFrame.pivot_table : generalization of pivot that can handle
+ duplicate values for one index/column pair
"""
if (len(index) != len(columns)) or (len(columns) != len(values)):
raise AssertionError('Length of index, columns, and values must be the'
@@ -870,29 +878,55 @@ def lreshape(data, groups, dropna=True, label=None):
return DataFrame(mdata, columns=id_cols + pivot_cols)
-def wide_to_long(df, stubnames, i, j):
- """
+def wide_to_long(df, stubnames, i, j, sep="", suffix='\d+'):
+ r"""
Wide panel to long format. Less flexible but more user-friendly than melt.
+ With stubnames ['A', 'B'], this function expects to find one or more
+ groups of columns with format Asuffix1, Asuffix2,..., Bsuffix1, Bsuffix2,...
+ You specify what you want to call this suffix in the resulting long format
+ with `j` (for example `j='year'`)
+
+ Each row of these wide variables is assumed to be uniquely identified by
+ `i` (can be a single column name or a list of column names)
+
+ All remaining variables in the data frame are left intact.
+
Parameters
----------
df : DataFrame
The wide-format DataFrame
- stubnames : list
- A list of stub names. The wide format variables are assumed to
+ stubnames : str or list-like
+ The stub name(s). The wide format variables are assumed to
start with the stub names.
- i : str
- The name of the id variable.
+ i : str or list-like
+ Column(s) to use as id variable(s)
j : str
- The name of the subobservation variable.
- stubend : str
- Regex to match for the end of the stubs.
+ The name of the subobservation variable. What you wish to name your
+ suffix in the long format.
+ sep : str, default ""
+ A character indicating the separation of the variable names
+ in the wide format, to be stripped from the names in the long format.
+ For example, if your column names are A-suffix1, A-suffix2, you
+ can strip the hyphen by specifying `sep='-'`
+
+ .. versionadded:: 0.20.0
+
+ suffix : str, default '\\d+'
+ A regular expression capturing the wanted suffixes. '\\d+' captures
+ numeric suffixes. Suffixes with no numbers could be specified with the
+ negated character class '\\D+'. You can also further disambiguate
+ suffixes, for example, if your wide variables are of the form
+ Aone, Btwo,.., and you have an unrelated column Arating, you can
+ ignore the last one by specifying `suffix='(?!one|two)'`
+
+ .. versionadded:: 0.20.0
Returns
-------
DataFrame
- A DataFrame that contains each stub name as a variable as well as
- variables for i and j.
+ A DataFrame that contains each stub name as a variable, with new index
+ (i, j)
Examples
--------
@@ -911,7 +945,7 @@ def wide_to_long(df, stubnames, i, j):
0 a d 2.5 3.2 -1.085631 0
1 b e 1.2 1.3 0.997345 1
2 c f 0.7 0.1 0.282978 2
- >>> wide_to_long(df, ["A", "B"], i="id", j="year")
+ >>> pd.wide_to_long(df, ["A", "B"], i="id", j="year")
X A B
id year
0 1970 -1.085631 a 2.5
@@ -921,38 +955,151 @@ def wide_to_long(df, stubnames, i, j):
1 1980 0.997345 e 1.3
2 1980 0.282978 f 0.1
+ With multiple id columns
+
+ >>> df = pd.DataFrame({
+ ... 'famid': [1, 1, 1, 2, 2, 2, 3, 3, 3],
+ ... 'birth': [1, 2, 3, 1, 2, 3, 1, 2, 3],
+ ... 'ht1': [2.8, 2.9, 2.2, 2, 1.8, 1.9, 2.2, 2.3, 2.1],
+ ... 'ht2': [3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9]
+ ... })
+ >>> df
+ birth famid ht1 ht2
+ 0 1 1 2.8 3.4
+ 1 2 1 2.9 3.8
+ 2 3 1 2.2 2.9
+ 3 1 2 2.0 3.2
+ 4 2 2 1.8 2.8
+ 5 3 2 1.9 2.4
+ 6 1 3 2.2 3.3
+ 7 2 3 2.3 3.4
+ 8 3 3 2.1 2.9
+ >>> l = pd.wide_to_long(df, stubnames='ht', i=['famid', 'birth'], j='age')
+ >>> l
+ ht
+ famid birth age
+ 1 1 1 2.8
+ 2 3.4
+ 2 1 2.9
+ 2 3.8
+ 3 1 2.2
+ 2 2.9
+ 2 1 1 2.0
+ 2 3.2
+ 2 1 1.8
+ 2 2.8
+ 3 1 1.9
+ 2 2.4
+ 3 1 1 2.2
+ 2 3.3
+ 2 1 2.3
+ 2 3.4
+ 3 1 2.1
+ 2 2.9
+
+ Going from long back to wide just takes some creative use of `unstack`
+
+ >>> w = l.reset_index().set_index(['famid', 'birth', 'age']).unstack()
+ >>> w.columns = pd.Index(w.columns).str.join('')
+ >>> w.reset_index()
+ famid birth ht1 ht2
+ 0 1 1 2.8 3.4
+ 1 1 2 2.9 3.8
+ 2 1 3 2.2 2.9
+ 3 2 1 2.0 3.2
+ 4 2 2 1.8 2.8
+ 5 2 3 1.9 2.4
+ 6 3 1 2.2 3.3
+ 7 3 2 2.3 3.4
+ 8 3 3 2.1 2.9
+
+ Less wieldy column names are also handled
+
+ >>> df = pd.DataFrame({'A(quarterly)-2010': np.random.rand(3),
+ ... 'A(quarterly)-2011': np.random.rand(3),
+ ... 'B(quarterly)-2010': np.random.rand(3),
+ ... 'B(quarterly)-2011': np.random.rand(3),
+ ... 'X' : np.random.randint(3, size=3)})
+ >>> df['id'] = df.index
+ >>> df
+ A(quarterly)-2010 A(quarterly)-2011 B(quarterly)-2010 B(quarterly)-2011
+ 0 0.531828 0.724455 0.322959 0.293714
+ 1 0.634401 0.611024 0.361789 0.630976
+ 2 0.849432 0.722443 0.228263 0.092105
+ \
+ X id
+ 0 0 0
+ 1 1 1
+ 2 2 2
+ >>> pd.wide_to_long(df, ['A(quarterly)', 'B(quarterly)'],
+ i='id', j='year', sep='-')
+ X A(quarterly) B(quarterly)
+ id year
+ 0 2010 0 0.531828 0.322959
+ 1 2010 2 0.634401 0.361789
+ 2 2010 2 0.849432 0.228263
+ 0 2011 0 0.724455 0.293714
+ 1 2011 2 0.611024 0.630976
+ 2 2011 2 0.722443 0.092105
+
+ If we have many columns, we could also use a regex to find our
+ stubnames and pass that list on to wide_to_long
+
+ >>> stubnames = set([match[0] for match in
+ df.columns.str.findall('[A-B]\(.*\)').values
+ if match != [] ])
+ >>> list(stubnames)
+ ['B(quarterly)', 'A(quarterly)']
+
Notes
-----
- All extra variables are treated as extra id variables. This simply uses
+ All extra variables are left untouched. This simply uses
`pandas.melt` under the hood, but is hard-coded to "do the right thing"
in a typical case.
"""
-
- def get_var_names(df, regex):
+ def get_var_names(df, stub, sep, suffix):
+ regex = "^{0}{1}{2}".format(re.escape(stub), re.escape(sep), suffix)
return df.filter(regex=regex).columns.tolist()
- def melt_stub(df, stub, i, j):
- varnames = get_var_names(df, "^" + stub)
- newdf = melt(df, id_vars=i, value_vars=varnames, value_name=stub,
- var_name=j)
- newdf_j = newdf[j].str.replace(stub, "")
- try:
- newdf_j = newdf_j.astype(int)
- except ValueError:
- pass
- newdf[j] = newdf_j
- return newdf
-
- id_vars = get_var_names(df, "^(?!%s)" % "|".join(stubnames))
- if i not in id_vars:
- id_vars += [i]
-
- newdf = melt_stub(df, stubnames[0], id_vars, j)
-
- for stub in stubnames[1:]:
- new = melt_stub(df, stub, id_vars, j)
- newdf = newdf.merge(new, how="outer", on=id_vars + [j], copy=False)
- return newdf.set_index([i, j])
+ def melt_stub(df, stub, i, j, value_vars, sep):
+ newdf = melt(df, id_vars=i, value_vars=value_vars,
+ value_name=stub.rstrip(sep), var_name=j)
+ newdf[j] = Categorical(newdf[j])
+ newdf[j] = newdf[j].str.replace(re.escape(stub + sep), "")
+
+ return newdf.set_index(i + [j])
+
+ if any(map(lambda s: s in df.columns.tolist(), stubnames)):
+ raise ValueError("stubname can't be identical to a column name")
+
+ if not is_list_like(stubnames):
+ stubnames = [stubnames]
+ else:
+ stubnames = list(stubnames)
+
+ if not is_list_like(i):
+ i = [i]
+ else:
+ i = list(i)
+
+ value_vars = list(map(lambda stub:
+ get_var_names(df, stub, sep, suffix), stubnames))
+
+ value_vars_flattened = [e for sublist in value_vars for e in sublist]
+ id_vars = list(set(df.columns.tolist()).difference(value_vars_flattened))
+
+ melted = []
+ for s, v in zip(stubnames, value_vars):
+ melted.append(melt_stub(df, s, i, j, v, sep))
+ melted = melted[0].join(melted[1:], how='outer')
+
+ if len(i) == 1:
+ new = df[id_vars].set_index(i).join(melted)
+ return new
+
+ new = df[id_vars].merge(melted.reset_index(), on=i).set_index(i + [j])
+
+ return new
def get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False,
diff --git a/pandas/core/series.py b/pandas/core/series.py
index 1c6b13885dd01..7018865e5b3ec 100644
--- a/pandas/core/series.py
+++ b/pandas/core/series.py
@@ -25,6 +25,7 @@
is_iterator,
is_dict_like,
is_scalar,
+ _is_unorderable_exception,
_ensure_platform_int)
from pandas.types.generic import ABCSparseArray, ABCDataFrame
from pandas.types.cast import (_maybe_upcast, _infer_dtype_from_scalar,
@@ -79,7 +80,8 @@
inplace="""inplace : boolean, default False
If True, performs operation inplace and returns None.""",
unique='np.ndarray', duplicated='Series',
- optional_by='')
+ optional_by='',
+ versionadded_to_excel='\n.. versionadded:: 0.20.0\n')
def _coerce_method(converter):
@@ -102,11 +104,11 @@ class Series(base.IndexOpsMixin, strings.StringAccessorMixin,
"""
One-dimensional ndarray with axis labels (including time series).
- Labels need not be unique but must be any hashable type. The object
+ Labels need not be unique but must be a hashable type. The object
supports both integer- and label-based indexing and provides a host of
methods for performing operations involving the index. Statistical
methods from ndarray have been overridden to automatically exclude
- missing data (currently represented as NaN)
+ missing data (currently represented as NaN).
Operations between Series (+, -, /, *, **) align values based on their
associated index values-- they need not be the same length. The result
@@ -117,8 +119,8 @@ class Series(base.IndexOpsMixin, strings.StringAccessorMixin,
data : array-like, dict, or scalar value
Contains data stored in Series
index : array-like or Index (1d)
- Values must be unique and hashable, same length as data. Index
- object (or other iterable of same length as data) Will default to
+ Values must be hashable and have the same length as `data`.
+ Non-unique index values are allowed. Will default to
RangeIndex(len(data)) if not provided. If both a dict and index
sequence are used, the index will override the keys found in the
dict.
@@ -753,7 +755,7 @@ def setitem(key, value):
raise ValueError("Can only tuple-index with a MultiIndex")
# python 3 type errors should be raised
- if 'unorderable' in str(e): # pragma: no cover
+ if _is_unorderable_exception(e):
raise IndexError(key)
if com.is_bool_indexer(key):
@@ -831,18 +833,19 @@ def _set_values(self, key, value):
self._data = self._data.setitem(indexer=key, value=value)
self._maybe_update_cacher()
- def repeat(self, reps, *args, **kwargs):
+ @deprecate_kwarg(old_arg_name='reps', new_arg_name='repeats')
+ def repeat(self, repeats, *args, **kwargs):
"""
Repeat elements of an Series. Refer to `numpy.ndarray.repeat`
- for more information about the `reps` argument.
+ for more information about the `repeats` argument.
See also
--------
numpy.ndarray.repeat
"""
nv.validate_repeat(args, kwargs)
- new_index = self.index.repeat(reps)
- new_values = self._values.repeat(reps)
+ new_index = self.index.repeat(repeats)
+ new_values = self._values.repeat(repeats)
return self._constructor(new_values,
index=new_index).__finalize__(self)
@@ -1216,16 +1219,10 @@ def count(self, level=None):
dtype='int64').__finalize__(self)
def mode(self):
- """Returns the mode(s) of the dataset.
+ """Return the mode(s) of the dataset.
- Empty if nothing occurs at least 2 times. Always returns Series even
- if only one value.
-
- Parameters
- ----------
- sort : bool, default True
- If True, will lexicographically sort values, if False skips
- sorting. Result ordering when ``sort=False`` is not defined.
+ Empty if nothing occurs at least 2 times. Always returns Series even
+ if only one value is returned.
Returns
-------
@@ -1514,12 +1511,13 @@ def dot(self, other):
else: # pragma: no cover
raise TypeError('unsupported type: %s' % type(other))
- @Substitution(klass='Series', value='v')
+ @Substitution(klass='Series')
@Appender(base._shared_docs['searchsorted'])
- def searchsorted(self, v, side='left', sorter=None):
+ @deprecate_kwarg(old_arg_name='v', new_arg_name='value')
+ def searchsorted(self, value, side='left', sorter=None):
if sorter is not None:
sorter = _ensure_platform_int(sorter)
- return self._values.searchsorted(Series(v)._values,
+ return self._values.searchsorted(Series(value)._values,
side=side, sorter=sorter)
# -------------------------------------------------------------------
@@ -1773,7 +1771,7 @@ def _try_kind_sort(arr):
@Appender(generic._shared_docs['sort_index'] % _shared_doc_kwargs)
def sort_index(self, axis=0, level=None, ascending=True, inplace=False,
- sort_remaining=True):
+ kind='quicksort', na_position='last', sort_remaining=True):
axis = self._get_axis_number(axis)
index = self.index
@@ -1783,11 +1781,13 @@ def sort_index(self, axis=0, level=None, ascending=True, inplace=False,
elif isinstance(index, MultiIndex):
from pandas.core.groupby import _lexsort_indexer
indexer = _lexsort_indexer(index.labels, orders=ascending)
- indexer = _ensure_platform_int(indexer)
- new_index = index.take(indexer)
else:
- new_index, indexer = index.sort_values(return_indexer=True,
- ascending=ascending)
+ from pandas.core.groupby import _nargsort
+ indexer = _nargsort(index, kind=kind, ascending=ascending,
+ na_position=na_position)
+
+ indexer = _ensure_platform_int(indexer)
+ new_index = index.take(indexer)
new_values = self._values.take(indexer)
result = self._constructor(new_values, index=new_index)
@@ -1940,7 +1940,7 @@ def nlargest(self, n=5, keep='first'):
>>> s = pd.Series(np.random.randn(1e6))
>>> s.nlargest(10) # only sorts up to the N requested
"""
- return algos.select_n(self, n=n, keep=keep, method='nlargest')
+ return algos.select_n_series(self, n=n, keep=keep, method='nlargest')
@deprecate_kwarg('take_last', 'keep', mapping={True: 'last',
False: 'first'})
@@ -1978,7 +1978,7 @@ def nsmallest(self, n=5, keep='first'):
>>> s = pd.Series(np.random.randn(1e6))
>>> s.nsmallest(10) # only sorts up to the N requested
"""
- return algos.select_n(self, n=n, keep=keep, method='nsmallest')
+ return algos.select_n_series(self, n=n, keep=keep, method='nsmallest')
def sortlevel(self, level=0, ascending=True, sort_remaining=True):
"""
@@ -2033,9 +2033,9 @@ def reorder_levels(self, order):
Parameters
----------
- order: list of int representing new level order.
+ order : list of int representing new level order.
(reference level by number or key)
- axis: where to reorder levels
+ axis : where to reorder levels
Returns
-------
@@ -2622,6 +2622,19 @@ def to_csv(self, path=None, index=True, sep=",", na_rep='',
if path is None:
return result
+ @Appender(generic._shared_docs['to_excel'] % _shared_doc_kwargs)
+ def to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='',
+ float_format=None, columns=None, header=True, index=True,
+ index_label=None, startrow=0, startcol=0, engine=None,
+ merge_cells=True, encoding=None, inf_rep='inf', verbose=True):
+ df = self.to_frame()
+ df.to_excel(excel_writer=excel_writer, sheet_name=sheet_name,
+ na_rep=na_rep, float_format=float_format, columns=columns,
+ header=header, index=index, index_label=index_label,
+ startrow=startrow, startcol=startcol, engine=engine,
+ merge_cells=merge_cells, encoding=encoding,
+ inf_rep=inf_rep, verbose=verbose)
+
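
Usage sketch for the newly added ``Series.to_excel`` (illustrative; the file name is hypothetical
and an Excel writer engine such as openpyxl or xlwt must be installed)::

    import pandas as pd

    s = pd.Series([1, 2, 3], name='values')
    # delegates to DataFrame.to_excel via to_frame(), as implemented above
    s.to_excel('series_output.xlsx', sheet_name='Sheet1')
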
def dropna(self, axis=0, inplace=False, **kwargs):
"""
Return Series without null values
@@ -2915,8 +2928,8 @@ def create_from_value(value, index, dtype):
return subarr
- # scalar like
- if subarr.ndim == 0:
+ # scalar like, GH
+ if getattr(subarr, 'ndim', 0) == 0:
if isinstance(data, list): # pragma: no cover
subarr = np.array(data, dtype=object)
elif index is not None:
diff --git a/pandas/formats/format.py b/pandas/formats/format.py
index 7706666142a64..0cf6050e515e0 100644
--- a/pandas/formats/format.py
+++ b/pandas/formats/format.py
@@ -1455,9 +1455,9 @@ def save(self):
f = self.path_or_buf
close = False
else:
- f = _get_handle(self.path_or_buf, self.mode,
- encoding=self.encoding,
- compression=self.compression)
+ f, handles = _get_handle(self.path_or_buf, self.mode,
+ encoding=self.encoding,
+ compression=self.compression)
close = True
try:
diff --git a/pandas/hashtable.pxd b/pandas/hashtable.pxd
index 97b6687d061e9..f3ea7ad792160 100644
--- a/pandas/hashtable.pxd
+++ b/pandas/hashtable.pxd
@@ -1,4 +1,4 @@
-from khash cimport kh_int64_t, kh_float64_t, kh_pymap_t, int64_t, float64_t
+from khash cimport kh_int64_t, kh_float64_t, kh_pymap_t, kh_str_t, int64_t, float64_t
# prototypes for sharing
@@ -22,3 +22,9 @@ cdef class PyObjectHashTable(HashTable):
cpdef get_item(self, object val)
cpdef set_item(self, object key, Py_ssize_t val)
+
+cdef class StringHashTable(HashTable):
+ cdef kh_str_t *table
+
+ cpdef get_item(self, object val)
+ cpdef set_item(self, object key, Py_ssize_t val)
diff --git a/pandas/hashtable.pyx b/pandas/hashtable.pyx
index 3bda3f49cb054..ce760b49fabc0 100644
--- a/pandas/hashtable.pyx
+++ b/pandas/hashtable.pyx
@@ -4,7 +4,11 @@ from cpython cimport PyObject, Py_INCREF, PyList_Check, PyTuple_Check
from khash cimport *
from numpy cimport *
-from cpython cimport PyMem_Malloc, PyMem_Realloc, PyMem_Free
+
+from libc.stdlib cimport malloc, free
+from cpython cimport (PyMem_Malloc, PyMem_Realloc, PyMem_Free,
+ PyString_Check, PyBytes_Check,
+ PyUnicode_Check)
from util cimport _checknan
cimport util
@@ -33,7 +37,7 @@ PyDateTime_IMPORT
cdef extern from "Python.h":
int PySlice_Check(object)
-cdef size_t _INIT_VEC_CAP = 32
+cdef size_t _INIT_VEC_CAP = 128
include "hashtable_class_helper.pxi"
diff --git a/pandas/indexes/base.py b/pandas/indexes/base.py
index 1c24a0db34b2b..512abfd88c78c 100644
--- a/pandas/indexes/base.py
+++ b/pandas/indexes/base.py
@@ -535,17 +535,18 @@ def tolist(self):
"""
return list(self.values)
- def repeat(self, n, *args, **kwargs):
+ @deprecate_kwarg(old_arg_name='n', new_arg_name='repeats')
+ def repeat(self, repeats, *args, **kwargs):
"""
Repeat elements of an Index. Refer to `numpy.ndarray.repeat`
- for more information about the `n` argument.
+ for more information about the `repeats` argument.
See also
--------
numpy.ndarray.repeat
"""
nv.validate_repeat(args, kwargs)
- return self._shallow_copy(self._values.repeat(n))
+ return self._shallow_copy(self._values.repeat(repeats))
def where(self, cond, other=None):
"""
@@ -1464,13 +1465,13 @@ def append(self, other):
names = set([obj.name for obj in to_concat])
name = None if len(names) > 1 else self.name
- typs = _concat.get_dtype_kinds(to_concat)
-
- if 'category' in typs:
- # if any of the to_concat is category
+ if self.is_categorical():
+ # if calling index is category, don't check dtype of others
from pandas.indexes.category import CategoricalIndex
return CategoricalIndex._append_same_dtype(self, to_concat, name)
+ typs = _concat.get_dtype_kinds(to_concat)
+
if len(typs) == 1:
return self._append_same_dtype(to_concat, name=name)
return _concat._concat_index_asobject(to_concat, name=name)
@@ -2003,7 +2004,7 @@ def difference(self, other):
except TypeError:
pass
- return this._shallow_copy(the_diff, name=result_name)
+ return this._shallow_copy(the_diff, name=result_name, freq=None)
def symmetric_difference(self, other, result_name=None):
"""
@@ -2966,6 +2967,11 @@ def _wrap_joined_index(self, joined, other):
name = self.name if self.name == other.name else None
return Index(joined, name=name)
+ def _get_string_slice(self, key, use_lhs=True, use_rhs=True):
+ # this is for partial string indexing,
+ # overridden in DatetimeIndex, TimedeltaIndex and PeriodIndex
+ raise NotImplementedError
+
def slice_indexer(self, start=None, end=None, step=None, kind=None):
"""
For an ordered Index, compute the slice indexer for input labels and
diff --git a/pandas/indexes/multi.py b/pandas/indexes/multi.py
index a9f452db69659..132543e0e386c 100644
--- a/pandas/indexes/multi.py
+++ b/pandas/indexes/multi.py
@@ -25,7 +25,8 @@
from pandas.core.common import (_values_from_object,
is_bool_indexer,
is_null_slice,
- PerformanceWarning)
+ PerformanceWarning,
+ UnsortedIndexError)
from pandas.core.base import FrozenList
@@ -1166,10 +1167,11 @@ def append(self, other):
def argsort(self, *args, **kwargs):
return self.values.argsort(*args, **kwargs)
- def repeat(self, n, *args, **kwargs):
+ @deprecate_kwarg(old_arg_name='n', new_arg_name='repeats')
+ def repeat(self, repeats, *args, **kwargs):
nv.validate_repeat(args, kwargs)
return MultiIndex(levels=self.levels,
- labels=[label.view(np.ndarray).repeat(n)
+ labels=[label.view(np.ndarray).repeat(repeats)
for label in self.labels], names=self.names,
sortorder=self.sortorder, verify_integrity=False)
@@ -1907,6 +1909,13 @@ def convert_indexer(start, stop, step, indexer=indexer, labels=labels):
return np.array(labels == loc, dtype=bool)
else:
# sorted, so can return slice object -> view
+ try:
+ loc = labels.dtype.type(loc)
+ except TypeError:
+ # this occurs when loc is a slice (partial string indexing)
+ # but the TypeError raised by searchsorted in this case
+ # is catched in Index._has_valid_type()
+ pass
i = labels.searchsorted(loc, side='left')
j = labels.searchsorted(loc, side='right')
return slice(i, j)
@@ -1928,9 +1937,10 @@ def get_locs(self, tup):
# must be lexsorted to at least as many levels
if not self.is_lexsorted_for_tuple(tup):
- raise KeyError('MultiIndex Slicing requires the index to be fully '
- 'lexsorted tuple len ({0}), lexsort depth '
- '({1})'.format(len(tup), self.lexsort_depth))
+ raise UnsortedIndexError('MultiIndex Slicing requires the index '
+ 'to be fully lexsorted tuple len ({0}), '
+ 'lexsort depth ({1})'
+ .format(len(tup), self.lexsort_depth))
# indexer
# this is the list of all values that we want to select
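
A sketch of when the new ``UnsortedIndexError`` surfaces (illustrative only): slicing a MultiIndex
that is not lexsorted deeply enough::

    import pandas as pd

    idx = pd.MultiIndex.from_tuples([('b', 1), ('a', 2), ('a', 1)])
    df = pd.DataFrame({'v': [1, 2, 3]}, index=idx)
    try:
        df.loc[(slice('a', 'b'), slice(None)), :]
    except Exception as err:               # UnsortedIndexError after this change
        print(type(err).__name__)
    df.sort_index().loc[(slice('a', 'b'), slice(None)), :]   # lexsorted -> works
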
diff --git a/pandas/indexes/range.py b/pandas/indexes/range.py
index 76166e7155bd0..7a7902b503bd6 100644
--- a/pandas/indexes/range.py
+++ b/pandas/indexes/range.py
@@ -315,6 +315,9 @@ def intersection(self, other):
if not isinstance(other, RangeIndex):
return super(RangeIndex, self).intersection(other)
+ if not len(self) or not len(other):
+ return RangeIndex._simple_new(None)
+
# check whether intervals intersect
# deals with in- and decreasing ranges
int_low = max(min(self._start, self._stop + 1),
@@ -322,7 +325,7 @@ def intersection(self, other):
int_high = min(max(self._stop, self._start + 1),
max(other._stop, other._start + 1))
if int_high <= int_low:
- return RangeIndex()
+ return RangeIndex._simple_new(None)
# Method hint: linear Diophantine equation
# solve intersection problem
@@ -332,7 +335,7 @@ def intersection(self, other):
# check whether element sets intersect
if (self._start - other._start) % gcd:
- return RangeIndex()
+ return RangeIndex._simple_new(None)
# calculate parameters for the RangeIndex describing the
# intersection disregarding the lower bounds
diff --git a/pandas/io/clipboard.py b/pandas/io/clipboard.py
index 2109e1c5d6d4c..3c7ac528d83fd 100644
--- a/pandas/io/clipboard.py
+++ b/pandas/io/clipboard.py
@@ -1,19 +1,31 @@
""" io on the clipboard """
from pandas import compat, get_option, option_context, DataFrame
-from pandas.compat import StringIO
+from pandas.compat import StringIO, PY2
-def read_clipboard(**kwargs): # pragma: no cover
- """
+def read_clipboard(sep='\s+', **kwargs): # pragma: no cover
+ r"""
Read text from clipboard and pass to read_table. See read_table for the
full argument list
- If unspecified, `sep` defaults to '\s+'
+ Parameters
+ ----------
+ sep : str, default '\s+'.
+ A string or regex delimiter. The default of '\s+' denotes
+ one or more whitespace characters.
Returns
-------
parsed : DataFrame
"""
+ encoding = kwargs.pop('encoding', 'utf-8')
+
+ # only utf-8 is valid for passed value because that's what clipboard
+ # supports
+ if encoding is not None and encoding.lower().replace('-', '') != 'utf8':
+ raise NotImplementedError(
+ 'reading from clipboard only supports utf-8 encoding')
+
from pandas.util.clipboard import clipboard_get
from pandas.io.parsers import read_table
text = clipboard_get()
@@ -29,7 +41,7 @@ def read_clipboard(**kwargs): # pragma: no cover
except:
pass
- # Excel copies into clipboard with \t seperation
+ # Excel copies into clipboard with \t separation
# inspect no more than the first 10 lines, if they
# all contain an equal number (>0) of tabs, infer
# that this came from excel and set 'sep' accordingly
@@ -43,12 +55,12 @@ def read_clipboard(**kwargs): # pragma: no cover
counts = set([x.lstrip().count('\t') for x in lines])
if len(lines) > 1 and len(counts) == 1 and counts.pop() != 0:
- kwargs['sep'] = '\t'
+ sep = '\t'
- if kwargs.get('sep') is None and kwargs.get('delim_whitespace') is None:
- kwargs['sep'] = '\s+'
+ if sep is None and kwargs.get('delim_whitespace') is None:
+ sep = '\s+'
- return read_table(StringIO(text), **kwargs)
+ return read_table(StringIO(text), sep=sep, **kwargs)
def to_clipboard(obj, excel=None, sep=None, **kwargs): # pragma: no cover
@@ -74,6 +86,12 @@ def to_clipboard(obj, excel=None, sep=None, **kwargs): # pragma: no cover
- Windows:
- OS X:
"""
+ encoding = kwargs.pop('encoding', 'utf-8')
+
+ # testing if an invalid encoding is passed to clipboard
+ if encoding is not None and encoding.lower().replace('-', '') != 'utf8':
+ raise ValueError('clipboard only supports utf-8 encoding')
+
from pandas.util.clipboard import clipboard_set
if excel is None:
excel = True
@@ -83,8 +101,12 @@ def to_clipboard(obj, excel=None, sep=None, **kwargs): # pragma: no cover
if sep is None:
sep = '\t'
buf = StringIO()
- obj.to_csv(buf, sep=sep, **kwargs)
- clipboard_set(buf.getvalue())
+ # clipboard_set (pyperclip) expects unicode
+ obj.to_csv(buf, sep=sep, encoding='utf-8', **kwargs)
+ text = buf.getvalue()
+ if PY2:
+ text = text.decode('utf-8')
+ clipboard_set(text)
return
except:
pass
diff --git a/pandas/io/common.py b/pandas/io/common.py
index 127ebc4839fd3..c115fab217fba 100644
--- a/pandas/io/common.py
+++ b/pandas/io/common.py
@@ -1,11 +1,9 @@
"""Common IO api utilities"""
-import sys
import os
import csv
import codecs
import mmap
-import zipfile
from contextlib import contextmanager, closing
from pandas.compat import StringIO, BytesIO, string_types, text_type
@@ -65,13 +63,15 @@ def urlopen(*args, **kwargs):
_VALID_URLS.discard('')
-class CParserError(ValueError):
+class ParserError(ValueError):
"""
- Exception that is thrown by the C engine when it encounters
- a parsing error in `pd.read_csv`
+ Exception that is thrown when an error is encountered in `pd.read_csv`
"""
pass
+# gh-12665: Alias for now and remove later.
+CParserError = ParserError
+
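
A quick sketch of the rename (illustrative; import path as defined in the hunk above)::

    import pandas as pd
    from pandas.io.common import ParserError, CParserError

    assert CParserError is ParserError        # alias kept for backward compatibility
    bad = pd.compat.StringIO('a,b\n1,2,3\n')  # ragged row triggers a parse error
    try:
        pd.read_csv(bad)
    except ParserError as err:
        print(err)
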
class DtypeWarning(Warning):
"""
@@ -139,39 +139,6 @@ def _is_s3_url(url):
return False
-def maybe_read_encoded_stream(reader, encoding=None, compression=None):
- """read an encoded stream from the reader and transform the bytes to
- unicode if required based on the encoding
-
- Parameters
- ----------
- reader : a streamable file-like object
- encoding : optional, the encoding to attempt to read
-
- Returns
- -------
- a tuple of (a stream of decoded bytes, the encoding which was used)
-
- """
-
- if compat.PY3 or encoding is not None: # pragma: no cover
- if encoding:
- errors = 'strict'
- else:
- errors = 'replace'
- encoding = 'utf-8'
-
- if compression == 'gzip':
- reader = BytesIO(reader.read())
- else:
- reader = StringIO(reader.read().decode(encoding, errors))
- else:
- if compression == 'gzip':
- reader = BytesIO(reader.read())
- encoding = None
- return reader, encoding
-
-
def _expand_user(filepath_or_buffer):
"""Return the argument with an initial component of ~ or ~user
replaced by that user's home directory.
@@ -235,18 +202,14 @@ def get_filepath_or_buffer(filepath_or_buffer, encoding=None,
"""
if _is_url(filepath_or_buffer):
- req = _urlopen(str(filepath_or_buffer))
- if compression == 'infer':
- content_encoding = req.headers.get('Content-Encoding', None)
- if content_encoding == 'gzip':
- compression = 'gzip'
- else:
- compression = None
- # cat on the compression to the tuple returned by the function
- to_return = (list(maybe_read_encoded_stream(req, encoding,
- compression)) +
- [compression])
- return tuple(to_return)
+ url = str(filepath_or_buffer)
+ req = _urlopen(url)
+ content_encoding = req.headers.get('Content-Encoding', None)
+ if content_encoding == 'gzip':
+ # Override compression based on Content-Encoding header
+ compression = 'gzip'
+ reader = BytesIO(req.read())
+ return reader, encoding, compression
if _is_s3_url(filepath_or_buffer):
from pandas.io.s3 import get_filepath_or_buffer
@@ -274,64 +237,161 @@ def file_path_to_url(path):
return urljoin('file:', pathname2url(path))
-# ZipFile is not a context manager for <= 2.6
-# must be tuple index here since 2.6 doesn't use namedtuple for version_info
-if sys.version_info[1] <= 6:
- @contextmanager
- def ZipFile(*args, **kwargs):
- with closing(zipfile.ZipFile(*args, **kwargs)) as zf:
- yield zf
-else:
- ZipFile = zipfile.ZipFile
+_compression_to_extension = {
+ 'gzip': '.gz',
+ 'bz2': '.bz2',
+ 'zip': '.zip',
+ 'xz': '.xz',
+}
+
+
+def _infer_compression(filepath_or_buffer, compression):
+ """
+ Infer the compression method from the given path or buffer.
+
+ Parameters
+ ----------
+ filepath_or_buffer :
+ a path (str) or buffer
+ compression : str or None
+
+ Returns
+ -------
+ string compression method, None
-def _get_handle(path, mode, encoding=None, compression=None, memory_map=False):
- """Gets file handle for given path and mode.
+ Raises
+ ------
+ ValueError on invalid compression specified
+
+ If compression='infer', infer compression. If compression
"""
- if compression is not None:
- if encoding is not None and not compat.PY3:
- msg = 'encoding + compression not yet supported in Python 2'
+
+ # No compression has been explicitly specified
+ if compression is None:
+ return None
+
+ # Cannot infer compression of a buffer. Hence assume no compression.
+ is_path = isinstance(filepath_or_buffer, compat.string_types)
+ if compression == 'infer' and not is_path:
+ return None
+
+ # Infer compression from the filename/URL extension
+ if compression == 'infer':
+ for compression, extension in _compression_to_extension.items():
+ if filepath_or_buffer.endswith(extension):
+ return compression
+ return None
+
+ # Compression has been specified. Check that it's valid
+ if compression in _compression_to_extension:
+ return compression
+
+ msg = 'Unrecognized compression type: {}'.format(compression)
+ valid = ['infer', None] + sorted(_compression_to_extension)
+ msg += '\nValid compression types are {}'.format(valid)
+ raise ValueError(msg)
+
+
+def _get_handle(path_or_buf, mode, encoding=None, compression=None,
+ memory_map=False):
+ """
+ Get file handle for given path/buffer and mode.
+
+ Parameters
+ ----------
+ path_or_buf :
+ a path (str) or buffer
+ mode : str
+ mode to open path_or_buf with
+ encoding : str or None
+ compression : str or None
+ Supported compression protocols are gzip, bz2, zip, and xz
+ memory_map : boolean, default False
+ See parsers._parser_params for more information.
+
+ Returns
+ -------
+ f : file-like
+ A file-like object
+ handles : list of file-like objects
+ A list of file-like objects that were opened in this function.
+ """
+
+ handles = list()
+ f = path_or_buf
+ is_path = isinstance(path_or_buf, compat.string_types)
+
+ if compression:
+
+ if compat.PY2 and not is_path and encoding:
+ msg = 'compression with encoding is not yet supported in Python 2'
raise ValueError(msg)
+ # GZ Compression
if compression == 'gzip':
import gzip
- f = gzip.GzipFile(path, mode)
+ if is_path:
+ f = gzip.open(path_or_buf, mode)
+ else:
+ f = gzip.GzipFile(fileobj=path_or_buf)
+
+ # BZ Compression
elif compression == 'bz2':
import bz2
- f = bz2.BZ2File(path, mode)
+ if is_path:
+ f = bz2.BZ2File(path_or_buf, mode)
+ elif compat.PY2:
+ # Python 2's bz2 module can't take file objects, so have to
+ # run through decompress manually
+ f = StringIO(bz2.decompress(path_or_buf.read()))
+ path_or_buf.close()
+ else:
+ f = bz2.BZ2File(path_or_buf)
+
+ # ZIP Compression
elif compression == 'zip':
import zipfile
- zip_file = zipfile.ZipFile(path)
+ zip_file = zipfile.ZipFile(path_or_buf)
zip_names = zip_file.namelist()
-
if len(zip_names) == 1:
- file_name = zip_names.pop()
- f = zip_file.open(file_name)
+ f = zip_file.open(zip_names.pop())
elif len(zip_names) == 0:
raise ValueError('Zero files found in ZIP file {}'
- .format(path))
+ .format(path_or_buf))
else:
raise ValueError('Multiple files found in ZIP file.'
- ' Only one file per ZIP :{}'
+ ' Only one file per ZIP: {}'
.format(zip_names))
+
+ # XZ Compression
elif compression == 'xz':
lzma = compat.import_lzma()
- f = lzma.LZMAFile(path, mode)
+ f = lzma.LZMAFile(path_or_buf, mode)
+
+ # Unrecognized Compression
else:
- raise ValueError('Unrecognized compression type: %s' %
- compression)
- if compat.PY3:
- from io import TextIOWrapper
- f = TextIOWrapper(f, encoding=encoding)
- return f
- else:
- if compat.PY3:
- if encoding:
- f = open(path, mode, encoding=encoding)
- else:
- f = open(path, mode, errors='replace')
+ msg = 'Unrecognized compression type: {}'.format(compression)
+ raise ValueError(msg)
+
+ handles.append(f)
+
+ elif is_path:
+ if compat.PY2:
+ # Python 2
+ f = open(path_or_buf, mode)
+ elif encoding:
+ # Python 3 and encoding
+ f = open(path_or_buf, mode, encoding=encoding)
else:
- f = open(path, mode)
+ # Python 3 and no explicit encoding
+ f = open(path_or_buf, mode, errors='replace')
+ handles.append(f)
+
+ # in Python 3, convert BytesIO or fileobjects passed with an encoding
+ if compat.PY3 and (compression or isinstance(f, compat.BytesIO)):
+ from io import TextIOWrapper
+ f = TextIOWrapper(f, encoding=encoding)
+ handles.append(f)
if memory_map and hasattr(f, 'fileno'):
try:
@@ -345,7 +405,7 @@ def _get_handle(path, mode, encoding=None, compression=None, memory_map=False):
# leave the file handler as is then
pass
- return f
+ return f, handles
class MMapWrapper(BaseIterator):
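
As a rough illustration of the consolidated compression handling above (the file name is hypothetical), ``compression='infer'`` resolves the method from the extension, while an explicit value is validated and used as-is::

    import pandas as pd

    df = pd.DataFrame({'a': [1, 2, 3]})
    df.to_csv('data.csv.gz', index=False, compression='gzip')

    # 'infer' (the default) picks gzip from the '.gz' extension
    roundtrip = pd.read_csv('data.csv.gz')
    explicit = pd.read_csv('data.csv.gz', compression='gzip')
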
diff --git a/pandas/io/data.py b/pandas/io/data.py
index 09c7aef0cde1a..e76790a6ab98b 100644
--- a/pandas/io/data.py
+++ b/pandas/io/data.py
@@ -1,6 +1,6 @@
raise ImportError(
"The pandas.io.data module is moved to a separate package "
"(pandas-datareader). After installing the pandas-datareader package "
- "(https://github.com/pandas-dev/pandas-datareader), you can change "
+ "(https://github.com/pydata/pandas-datareader), you can change "
"the import ``from pandas.io import data, wb`` to "
"``from pandas_datareader import data, wb``.")
diff --git a/pandas/io/excel.py b/pandas/io/excel.py
index 6662d106ad85d..6b7c597ecfcdc 100644
--- a/pandas/io/excel.py
+++ b/pandas/io/excel.py
@@ -21,7 +21,7 @@
from pandas.tseries.period import Period
from pandas import json
from pandas.compat import (map, zip, reduce, range, lrange, u, add_metaclass,
- string_types)
+ string_types, OrderedDict)
from pandas.core import config
from pandas.formats.printing import pprint_thing
import pandas.compat as compat
@@ -87,6 +87,14 @@
either be integers or column labels, values are functions that take one
input argument, the Excel cell content, and return the transformed
content.
+dtype : Type name or dict of column -> type, default None
+ Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
+ Use `str` or `object` to preserve and not interpret dtype.
+ If converters are specified, they will be applied INSTEAD
+ of dtype conversion.
+
+ .. versionadded:: 0.20.0
+
true_values : list, default None
Values to consider as True
@@ -184,8 +192,8 @@ def read_excel(io, sheetname=0, header=0, skiprows=None, skip_footer=0,
index_col=None, names=None, parse_cols=None, parse_dates=False,
date_parser=None, na_values=None, thousands=None,
convert_float=True, has_index_names=None, converters=None,
- true_values=None, false_values=None, engine=None, squeeze=False,
- **kwds):
+ dtype=None, true_values=None, false_values=None, engine=None,
+ squeeze=False, **kwds):
if not isinstance(io, ExcelFile):
io = ExcelFile(io, engine=engine)
@@ -195,7 +203,7 @@ def read_excel(io, sheetname=0, header=0, skiprows=None, skip_footer=0,
index_col=index_col, parse_cols=parse_cols, parse_dates=parse_dates,
date_parser=date_parser, na_values=na_values, thousands=thousands,
convert_float=convert_float, has_index_names=has_index_names,
- skip_footer=skip_footer, converters=converters,
+ skip_footer=skip_footer, converters=converters, dtype=dtype,
true_values=true_values, false_values=false_values, squeeze=squeeze,
**kwds)
@@ -318,7 +326,7 @@ def _parse_excel(self, sheetname=0, header=0, skiprows=None, names=None,
parse_cols=None, parse_dates=False, date_parser=None,
na_values=None, thousands=None, convert_float=True,
true_values=None, false_values=None, verbose=False,
- squeeze=False, **kwds):
+ dtype=None, squeeze=False, **kwds):
skipfooter = kwds.pop('skipfooter', None)
if skipfooter is not None:
@@ -418,9 +426,9 @@ def _parse_cell(cell_contents, cell_typ):
sheets = [sheetname]
# handle same-type duplicates.
- sheets = list(set(sheets))
+ sheets = list(OrderedDict.fromkeys(sheets).keys())
- output = {}
+ output = OrderedDict()
for asheetname in sheets:
if verbose:
@@ -501,6 +509,7 @@ def _parse_cell(cell_contents, cell_typ):
skiprows=skiprows,
skipfooter=skip_footer,
squeeze=squeeze,
+ dtype=dtype,
**kwds)
output[asheetname] = parser.read()
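
A minimal sketch of the new ``dtype`` keyword in ``read_excel`` (the workbook name and column labels are hypothetical); it mirrors the ``read_csv`` keyword, and converters still take precedence when both are given::

    import numpy as np
    import pandas as pd

    # Keep 'id' as text and force 'amount' to float64.
    df = pd.read_excel('report.xlsx', sheetname=0,
                       dtype={'id': str, 'amount': np.float64})

    # A single dtype applies to every column, preserving the raw cell text.
    raw = pd.read_excel('report.xlsx', dtype=object)
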
diff --git a/pandas/io/json.py b/pandas/io/json.py
index 1e258101a5d86..0a6b8af179e12 100644
--- a/pandas/io/json.py
+++ b/pandas/io/json.py
@@ -123,32 +123,38 @@ def read_json(path_or_buf=None, orient=None, typ='frame', dtype=True,
file. For file URLs, a host is expected. For instance, a local file
could be ``file://localhost/path/to/table.json``
- orient
-
- * `Series`
-
+ orient : string,
+ Indication of expected JSON string format.
+ Compatible JSON strings can be produced by ``to_json()`` with a
+ corresponding orient value.
+ The set of possible orients is:
+
+ - ``'split'`` : dict like
+ ``{index -> [index], columns -> [columns], data -> [values]}``
+ - ``'records'`` : list like
+ ``[{column -> value}, ... , {column -> value}]``
+ - ``'index'`` : dict like ``{index -> {column -> value}}``
+ - ``'columns'`` : dict like ``{column -> {index -> value}}``
+ - ``'values'`` : just the values array
+
+ The allowed and default values depend on the value
+ of the `typ` parameter.
+
+ * when ``typ == 'series'``,
+
+ - allowed orients are ``{'split','records','index'}``
- default is ``'index'``
- - allowed values are: ``{'split','records','index'}``
- The Series index must be unique for orient ``'index'``.
- * `DataFrame`
+ * when ``typ == 'frame'``,
+ - allowed orients are ``{'split','records','index',
+ 'columns','values'}``
- default is ``'columns'``
- - allowed values are: {'split','records','index','columns','values'}
- - The DataFrame index must be unique for orients 'index' and
- 'columns'.
- - The DataFrame columns must be unique for orients 'index',
- 'columns', and 'records'.
-
- * The format of the JSON string
-
- - split : dict like
- ``{index -> [index], columns -> [columns], data -> [values]}``
- - records : list like
- ``[{column -> value}, ... , {column -> value}]``
- - index : dict like ``{index -> {column -> value}}``
- - columns : dict like ``{column -> {index -> value}}``
- - values : just the values array
+ - The DataFrame index must be unique for orients ``'index'`` and
+ ``'columns'``.
+ - The DataFrame columns must be unique for orients ``'index'``,
+ ``'columns'``, and ``'records'``.
typ : type of object to recover (series or frame), default 'frame'
dtype : boolean or dict, default True
@@ -197,7 +203,48 @@ def read_json(path_or_buf=None, orient=None, typ='frame', dtype=True,
Returns
-------
- result : Series or DataFrame
+ result : Series or DataFrame, depending on the value of `typ`.
+
+ See Also
+ --------
+ DataFrame.to_json
+
+ Examples
+ --------
+
+ >>> df = pd.DataFrame([['a', 'b'], ['c', 'd']],
+ ... index=['row 1', 'row 2'],
+ ... columns=['col 1', 'col 2'])
+
+ Encoding/decoding a Dataframe using ``'split'`` formatted JSON:
+
+ >>> df.to_json(orient='split')
+ '{"columns":["col 1","col 2"],
+ "index":["row 1","row 2"],
+ "data":[["a","b"],["c","d"]]}'
+ >>> pd.read_json(_, orient='split')
+ col 1 col 2
+ row 1 a b
+ row 2 c d
+
+ Encoding/decoding a Dataframe using ``'index'`` formatted JSON:
+
+ >>> df.to_json(orient='index')
+ '{"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}'
+ >>> pd.read_json(_, orient='index')
+ col 1 col 2
+ row 1 a b
+ row 2 c d
+
+ Encoding/decoding a Dataframe using ``'records'`` formatted JSON.
+ Note that index labels are not preserved with this encoding.
+
+ >>> df.to_json(orient='records')
+ '[{"col 1":"a","col 2":"b"},{"col 1":"c","col 2":"d"}]'
+ >>> pd.read_json(_, orient='records')
+ col 1 col 2
+ 0 a b
+ 1 c d
"""
filepath_or_buffer, _, _ = get_filepath_or_buffer(path_or_buf,
@@ -212,8 +259,10 @@ def read_json(path_or_buf=None, orient=None, typ='frame', dtype=True,
exists = False
if exists:
- with _get_handle(filepath_or_buffer, 'r', encoding=encoding) as fh:
- json = fh.read()
+ fh, handles = _get_handle(filepath_or_buffer, 'r',
+ encoding=encoding)
+ json = fh.read()
+ fh.close()
else:
json = filepath_or_buffer
elif hasattr(filepath_or_buffer, 'read'):
@@ -676,7 +725,9 @@ def nested_to_record(ds, prefix="", level=0):
def json_normalize(data, record_path=None, meta=None,
meta_prefix=None,
- record_prefix=None):
+ record_prefix=None,
+ errors='raise'):
+
"""
"Normalize" semi-structured JSON data into a flat table
@@ -693,6 +744,13 @@ def json_normalize(data, record_path=None, meta=None,
If True, prefix records with dotted (?) path, e.g. foo.bar.field if
path to records is ['foo', 'bar']
meta_prefix : string, default None
+ errors : {'raise', 'ignore'}, default 'raise'
+ * ignore : will ignore KeyError if keys listed in meta are not
+ always present
+ * raise : will raise KeyError if keys listed in meta are not
+ always present
+
+ .. versionadded:: 0.20.0
Returns
-------
@@ -792,7 +850,16 @@ def _recursive_extract(data, path, seen_meta, level=0):
if level + 1 > len(val):
meta_val = seen_meta[key]
else:
- meta_val = _pull_field(obj, val[level:])
+ try:
+ meta_val = _pull_field(obj, val[level:])
+ except KeyError as e:
+ if errors == 'ignore':
+ meta_val = np.nan
+ else:
+ raise KeyError("Try running with "
+ "errors='ignore' as key "
+ "{0} is not always present".format(e))
meta_vals[key].append(meta_val)
records.extend(recs)
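
A small sketch of the new ``errors`` option, assuming made-up records where one entry is missing a ``meta`` key::

    from pandas.io.json import json_normalize

    data = [{'state': 'Texas',
             'info': {'governor': 'Abbott'},
             'counties': [{'name': 'Dallas', 'population': 1300000}]},
            {'state': 'Ohio',  # no 'info' key here
             'counties': [{'name': 'Summit', 'population': 500000}]}]

    # errors='ignore' fills the missing metadata with NaN instead of
    # raising the KeyError that errors='raise' (the default) would.
    flat = json_normalize(data, record_path='counties',
                          meta=['state', ['info', 'governor']],
                          errors='ignore')
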
diff --git a/pandas/io/parsers.py b/pandas/io/parsers.py
index f8cf04e08ab03..200943324ce66 100755
--- a/pandas/io/parsers.py
+++ b/pandas/io/parsers.py
@@ -17,17 +17,21 @@
zip, string_types, map, u)
from pandas.types.common import (is_integer, _ensure_object,
is_list_like, is_integer_dtype,
- is_float,
- is_scalar)
+ is_float, is_dtype_equal,
+ is_object_dtype, is_string_dtype,
+ is_scalar, is_categorical_dtype)
+from pandas.types.missing import isnull
+from pandas.types.cast import _astype_nansafe
from pandas.core.index import Index, MultiIndex, RangeIndex
+from pandas.core.series import Series
from pandas.core.frame import DataFrame
+from pandas.core.categorical import Categorical
from pandas.core.common import AbstractMethodError
-from pandas.core.config import get_option
from pandas.io.date_converters import generic_parser
from pandas.io.common import (get_filepath_or_buffer, _validate_header_arg,
_get_handle, UnicodeReader, UTF8Recoder,
- BaseIterator, CParserError, EmptyDataError,
- ParserWarning, _NA_VALUES)
+ BaseIterator, ParserError, EmptyDataError,
+ ParserWarning, _NA_VALUES, _infer_compression)
from pandas.tseries import tools
from pandas.util.decorators import Appender
@@ -85,13 +89,18 @@
MultiIndex is used. If you have a malformed file with delimiters at the end
of each line, you might consider index_col=False to force pandas to _not_
use the first column as the index (row names)
-usecols : array-like, default None
- Return a subset of the columns. All elements in this array must either
+usecols : array-like or callable, default None
+ Return a subset of the columns. If array-like, all elements must either
be positional (i.e. integer indices into the document columns) or strings
that correspond to column names provided either by the user in `names` or
- inferred from the document header row(s). For example, a valid `usecols`
- parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Using this parameter
- results in much faster parsing time and lower memory usage.
+ inferred from the document header row(s). For example, a valid array-like
+ `usecols` parameter would be [0, 1, 2] or ['foo', 'bar', 'baz'].
+
+ If callable, the callable function will be evaluated against the column
+ names, returning names where the callable function evaluates to True. An
+ example of a valid callable argument would be ``lambda x: x.upper() in
+ ['AAA', 'BBB', 'DDD']``. Using this parameter results in much faster
+ parsing time and lower memory usage.
as_recarray : boolean, default False
DEPRECATED: this argument will be removed in a future version. Please call
`pd.read_csv(...).to_records()` instead.
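
As a rough usage sketch of the callable form of ``usecols`` described above (the column names are made up)::

    import pandas as pd
    from pandas.compat import StringIO

    csv = 'AAA,bbb,CCC,ddd\n1,2,3,4\n5,6,7,8'

    # Keep only the columns whose names are upper-case;
    # equivalent to usecols=['AAA', 'CCC'] for this header.
    df = pd.read_csv(StringIO(csv), usecols=lambda name: name.isupper())
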
@@ -110,8 +119,9 @@
are duplicate names in the columns.
dtype : Type name or dict of column -> type, default None
Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
- (Unsupported with engine='python'). Use `str` or `object` to preserve and
- not interpret dtype.
+ Use `str` or `object` to preserve and not interpret dtype.
+ If converters are specified, they will be applied INSTEAD
+ of dtype conversion.
%s
converters : dict, default None
Dict of functions for converting values in certain columns. Keys can either
@@ -157,6 +167,10 @@
* dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result
'foo'
+ If a column or index contains an unparseable date, the entire column or
+ index will be returned unaltered as an object data type. For non-standard
+ datetime parsing, use ``pd.to_datetime`` after ``pd.read_csv``
+
Note: A fast-path exists for iso8601-formatted dates.
infer_datetime_format : boolean, default False
If True and parse_dates is enabled, pandas will attempt to infer the format
@@ -343,37 +357,17 @@ def _validate_nrows(nrows):
def _read(filepath_or_buffer, kwds):
- "Generic reader of line files."
+ """Generic reader of line files."""
encoding = kwds.get('encoding', None)
if encoding is not None:
encoding = re.sub('_', '-', encoding).lower()
kwds['encoding'] = encoding
- # If the input could be a filename, check for a recognizable compression
- # extension. If we're reading from a URL, the `get_filepath_or_buffer`
- # will use header info to determine compression, so use what it finds in
- # that case.
- inferred_compression = kwds.get('compression')
- if inferred_compression == 'infer':
- if isinstance(filepath_or_buffer, compat.string_types):
- if filepath_or_buffer.endswith('.gz'):
- inferred_compression = 'gzip'
- elif filepath_or_buffer.endswith('.bz2'):
- inferred_compression = 'bz2'
- elif filepath_or_buffer.endswith('.zip'):
- inferred_compression = 'zip'
- elif filepath_or_buffer.endswith('.xz'):
- inferred_compression = 'xz'
- else:
- inferred_compression = None
- else:
- inferred_compression = None
-
+ compression = kwds.get('compression')
+ compression = _infer_compression(filepath_or_buffer, compression)
filepath_or_buffer, _, compression = get_filepath_or_buffer(
- filepath_or_buffer, encoding,
- compression=kwds.get('compression', None))
- kwds['compression'] = (inferred_compression if compression == 'infer'
- else compression)
+ filepath_or_buffer, encoding, compression)
+ kwds['compression'] = compression
if kwds.get('date_parser', None) is not None:
if isinstance(kwds['parse_dates'], bool):
@@ -420,6 +414,7 @@ def _read(filepath_or_buffer, kwds):
'true_values': None,
'false_values': None,
'converters': None,
+ 'dtype': None,
'skipfooter': 0,
'keep_default_na': True,
@@ -460,7 +455,6 @@ def _read(filepath_or_buffer, kwds):
'buffer_lines': None,
'error_bad_lines': True,
'warn_bad_lines': True,
- 'dtype': None,
'float_precision': None
}
@@ -475,7 +469,6 @@ def _read(filepath_or_buffer, kwds):
'buffer_lines',
'error_bad_lines',
'warn_bad_lines',
- 'dtype',
'float_precision',
])
_deprecated_args = set([
@@ -833,9 +826,6 @@ def _clean_options(self, options, engine):
" ignored as it is not supported by the 'python'"
" engine.").format(reason=fallback_reason,
option=arg)
- if arg == 'dtype':
- msg += " (Note the 'converters' option provides"\
- " similar functionality.)"
raise ValueError(msg)
del result[arg]
@@ -975,17 +965,33 @@ def _is_index_col(col):
return col is not None and col is not False
+def _evaluate_usecols(usecols, names):
+ """
+ Check whether or not the 'usecols' parameter
+ is a callable. If so, enumerates the 'names'
+ parameter and returns a set of indices for
+ each entry in 'names' that evaluates to True.
+ If not a callable, returns 'usecols'.
+ """
+ if callable(usecols):
+ return set([i for i, name in enumerate(names)
+ if usecols(name)])
+ return usecols
+
+
def _validate_usecols_arg(usecols):
"""
Check whether or not the 'usecols' parameter
- contains all integers (column selection by index)
- or strings (column by name). Raises a ValueError
- if that is not the case.
+ contains all integers (column selection by index),
+ strings (column by name) or is a callable. Raises
+ a ValueError if that is not the case.
"""
- msg = ("The elements of 'usecols' must "
- "either be all strings, all unicode, or all integers")
+ msg = ("'usecols' must either be all strings, all unicode, "
+ "all integers or a callable")
if usecols is not None:
+ if callable(usecols):
+ return usecols
usecols_dtype = lib.infer_dtype(usecols)
if usecols_dtype not in ('empty', 'integer',
'string', 'unicode'):
@@ -1141,7 +1147,7 @@ def tostr(x):
# long
for n in range(len(columns[0])):
if all(['Unnamed' in tostr(c[n]) for c in columns]):
- raise CParserError(
+ raise ParserError(
"Passed header=[%s] are too many rows for this "
"multi_index of columns"
% ','.join([str(x) for x in self.header])
@@ -1284,7 +1290,7 @@ def _agg_index(self, index, try_parse_dates=True):
col_na_values, col_na_fvalues = _get_na_values(
col_name, self.na_values, self.na_fvalues)
- arr, _ = self._convert_types(arr, col_na_values | col_na_fvalues)
+ arr, _ = self._infer_types(arr, col_na_values | col_na_fvalues)
arrays.append(arr)
index = MultiIndex.from_arrays(arrays, names=self.index_names)
@@ -1292,10 +1298,15 @@ def _agg_index(self, index, try_parse_dates=True):
return index
def _convert_to_ndarrays(self, dct, na_values, na_fvalues, verbose=False,
- converters=None):
+ converters=None, dtypes=None):
result = {}
for c, values in compat.iteritems(dct):
conv_f = None if converters is None else converters.get(c, None)
+ if isinstance(dtypes, dict):
+ cast_type = dtypes.get(c, None)
+ else:
+ # single dtype or None
+ cast_type = dtypes
if self.na_filter:
col_na_values, col_na_fvalues = _get_na_values(
@@ -1303,17 +1314,35 @@ def _convert_to_ndarrays(self, dct, na_values, na_fvalues, verbose=False,
else:
col_na_values, col_na_fvalues = set(), set()
- coerce_type = True
if conv_f is not None:
+ # conv_f applied to data before inference
+ if cast_type is not None:
+ warnings.warn(("Both a converter and dtype were specified "
+ "for column {0} - only the converter will "
+ "be used").format(c), ParserWarning,
+ stacklevel=7)
+
try:
values = lib.map_infer(values, conv_f)
except ValueError:
mask = lib.ismember(values, na_values).view(np.uint8)
values = lib.map_infer_mask(values, conv_f, mask)
- coerce_type = False
- cvals, na_count = self._convert_types(
- values, set(col_na_values) | col_na_fvalues, coerce_type)
+ cvals, na_count = self._infer_types(
+ values, set(col_na_values) | col_na_fvalues,
+ try_num_bool=False)
+ else:
+ # skip inference if specified dtype is object
+ try_num_bool = not (cast_type and is_string_dtype(cast_type))
+
+ # general type inference and conversion
+ cvals, na_count = self._infer_types(
+ values, set(col_na_values) | col_na_fvalues,
+ try_num_bool)
+
+ # type specified in dtype param
+ if cast_type and not is_dtype_equal(cvals, cast_type):
+ cvals = self._cast_types(cvals, cast_type, c)
if issubclass(cvals.dtype.type, np.integer) and self.compact_ints:
cvals = lib.downcast_int64(
@@ -1325,7 +1354,23 @@ def _convert_to_ndarrays(self, dct, na_values, na_fvalues, verbose=False,
print('Filled %d NA values in column %s' % (na_count, str(c)))
return result
- def _convert_types(self, values, na_values, try_num_bool=True):
+ def _infer_types(self, values, na_values, try_num_bool=True):
+ """
+ Infer types of values, possibly casting
+
+ Parameters
+ ----------
+ values : ndarray
+ na_values : set
+ try_num_bool : bool, default True
+ try to cast values to numeric (first preference) or boolean
+
+ Returns
+ -------
+ converted : ndarray
+ na_count : int
+ """
+
na_count = 0
if issubclass(values.dtype.type, (np.number, np.bool_)):
mask = lib.ismember(values, na_values)
@@ -1339,6 +1384,7 @@ def _convert_types(self, values, na_values, try_num_bool=True):
if try_num_bool:
try:
result = lib.maybe_convert_numeric(values, na_values, False)
+ na_count = isnull(result).sum()
except Exception:
result = values
if values.dtype == np.object_:
@@ -1355,6 +1401,38 @@ def _convert_types(self, values, na_values, try_num_bool=True):
return result, na_count
+ def _cast_types(self, values, cast_type, column):
+ """
+ Cast values to specified type
+
+ Parameters
+ ----------
+ values : ndarray
+ cast_type : string or np.dtype
+ dtype to cast values to
+ column : string
+ column name - used only for error reporting
+
+ Returns
+ -------
+ converted : ndarray
+ """
+
+ if is_categorical_dtype(cast_type):
+ # XXX this is for consistency with
+ # c-parser which parses all categories
+ # as strings
+ if not is_object_dtype(values):
+ values = _astype_nansafe(values, str)
+ values = Categorical(values)
+ else:
+ try:
+ values = _astype_nansafe(values, cast_type, copy=True)
+ except ValueError:
+ raise ValueError("Unable to convert column %s to "
+ "type %s" % (column, cast_type))
+ return values
+
def _do_date_conversions(self, names, data):
# returns data, columns
if self.parse_dates is not None:
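
A short sketch of what ``_cast_types`` enables for the Python engine, using a small inline CSV; note that categories are parsed as strings, for consistency with the C engine::

    import pandas as pd
    from pandas.compat import StringIO

    data = 'col1,col2\n1,a\n2,b\n1,a'

    # dtype is now honoured by engine='python', including 'category'.
    df = pd.read_csv(StringIO(data), engine='python',
                     dtype={'col1': 'category', 'col2': 'category'})
    df['col1'].cat.categories  # string categories '1' and '2', not ints
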
@@ -1425,11 +1503,12 @@ def __init__(self, src, **kwds):
self.orig_names = self.names[:]
if self.usecols:
- if len(self.names) > len(self.usecols):
+ usecols = _evaluate_usecols(self.usecols, self.orig_names)
+ if len(self.names) > len(usecols):
self.names = [n for i, n in enumerate(self.names)
- if (i in self.usecols or n in self.usecols)]
+ if (i in usecols or n in usecols)]
- if len(self.names) < len(self.usecols):
+ if len(self.names) < len(usecols):
raise ValueError("Usecols do not match names.")
self._set_noconvert_columns()
@@ -1456,6 +1535,8 @@ def __init__(self, src, **kwds):
def close(self):
for f in self.handles:
f.close()
+
+ # close additional handles opened by C parser (for compression)
try:
self._reader.close()
except:
@@ -1507,10 +1588,11 @@ def read(self, nrows=None):
if self._first_chunk:
self._first_chunk = False
names = self._maybe_dedup_names(self.orig_names)
-
index, columns, col_dict = _get_empty_meta(
names, self.index_col, self.index_names,
dtype=self.kwds.get('dtype'))
+ columns = self._maybe_make_multi_index_columns(
+ columns, self.col_names)
if self.usecols is not None:
columns = self._filter_usecols(columns)
@@ -1588,9 +1670,10 @@ def read(self, nrows=None):
def _filter_usecols(self, names):
# hackish
- if self.usecols is not None and len(names) != len(self.usecols):
+ usecols = _evaluate_usecols(self.usecols, names)
+ if usecols is not None and len(names) != len(usecols):
names = [name for i, name in enumerate(names)
- if i in self.usecols or name in self.usecols]
+ if i in usecols or name in usecols]
return names
def _get_index_names(self):
@@ -1671,70 +1754,6 @@ def count_empty_vals(vals):
return sum([1 for v in vals if v == '' or v is None])
-def _wrap_compressed(f, compression, encoding=None):
- """wraps compressed fileobject in a decompressing fileobject
- NOTE: For all files in Python 3.2 and for bzip'd files under all Python
- versions, this means reading in the entire file and then re-wrapping it in
- StringIO.
- """
- compression = compression.lower()
- encoding = encoding or get_option('display.encoding')
-
- if compression == 'gzip':
- import gzip
-
- f = gzip.GzipFile(fileobj=f)
- if compat.PY3:
- from io import TextIOWrapper
-
- f = TextIOWrapper(f)
- return f
- elif compression == 'bz2':
- import bz2
-
- if compat.PY3:
- f = bz2.open(f, 'rt', encoding=encoding)
- else:
- # Python 2's bz2 module can't take file objects, so have to
- # run through decompress manually
- data = bz2.decompress(f.read())
- f = StringIO(data)
- return f
- elif compression == 'zip':
- import zipfile
- zip_file = zipfile.ZipFile(f)
- zip_names = zip_file.namelist()
-
- if len(zip_names) == 1:
- file_name = zip_names.pop()
- f = zip_file.open(file_name)
- return f
-
- elif len(zip_names) == 0:
- raise ValueError('Corrupted or zero files found in compressed '
- 'zip file %s', zip_file.filename)
-
- else:
- raise ValueError('Multiple files found in compressed '
- 'zip file %s', str(zip_names))
-
- elif compression == 'xz':
-
- lzma = compat.import_lzma()
- f = lzma.LZMAFile(f)
-
- if compat.PY3:
- from io import TextIOWrapper
-
- f = TextIOWrapper(f)
-
- return f
-
- else:
- raise ValueError('do not recognize compression method %s'
- % compression)
-
-
class PythonParser(ParserBase):
def __init__(self, f, **kwds):
@@ -1759,6 +1778,9 @@ def __init__(self, f, **kwds):
self.delimiter = kwds['delimiter']
self.quotechar = kwds['quotechar']
+ if isinstance(self.quotechar, compat.text_type):
+ self.quotechar = str(self.quotechar)
+
self.escapechar = kwds['escapechar']
self.doublequote = kwds['doublequote']
self.skipinitialspace = kwds['skipinitialspace']
@@ -1777,6 +1799,7 @@ def __init__(self, f, **kwds):
self.verbose = kwds['verbose']
self.converters = kwds['converters']
+ self.dtype = kwds['dtype']
self.compact_ints = kwds['compact_ints']
self.use_unsigned = kwds['use_unsigned']
@@ -1786,20 +1809,10 @@ def __init__(self, f, **kwds):
self.comment = kwds['comment']
self._comment_lines = []
- if isinstance(f, compat.string_types):
- f = _get_handle(f, 'r', encoding=self.encoding,
- compression=self.compression,
- memory_map=self.memory_map)
- self.handles.append(f)
- elif self.compression:
- f = _wrap_compressed(f, self.compression, self.encoding)
- self.handles.append(f)
- # in Python 3, convert BytesIO or fileobjects passed with an encoding
- elif compat.PY3 and isinstance(f, compat.BytesIO):
- from io import TextIOWrapper
-
- f = TextIOWrapper(f, encoding=self.encoding)
- self.handles.append(f)
+ f, handles = _get_handle(f, 'r', encoding=self.encoding,
+ compression=self.compression,
+ memory_map=self.memory_map)
+ self.handles.extend(handles)
# Set self.data to something that can read lines.
if hasattr(f, 'readline'):
@@ -1974,8 +1987,11 @@ def read(self, rows=None):
if not len(content): # pragma: no cover
# DataFrame with the right metadata, even though it's length 0
names = self._maybe_dedup_names(self.orig_names)
- return _get_empty_meta(names, self.index_col,
- self.index_names)
+ index, columns, col_dict = _get_empty_meta(
+ names, self.index_col, self.index_names, self.dtype)
+ columns = self._maybe_make_multi_index_columns(
+ columns, self.col_names)
+ return index, columns, col_dict
# handle new style for names in index
count_empty_content_vals = count_empty_vals(content[0])
@@ -2023,15 +2039,25 @@ def get_chunk(self, size=None):
def _convert_data(self, data):
# apply converters
- clean_conv = {}
-
- for col, f in compat.iteritems(self.converters):
- if isinstance(col, int) and col not in self.orig_names:
- col = self.orig_names[col]
- clean_conv[col] = f
+ def _clean_mapping(mapping):
+ "converts col numbers to names"
+ clean = {}
+ for col, v in compat.iteritems(mapping):
+ if isinstance(col, int) and col not in self.orig_names:
+ col = self.orig_names[col]
+ clean[col] = v
+ return clean
+
+ clean_conv = _clean_mapping(self.converters)
+ if not isinstance(self.dtype, dict):
+ # handles single dtype applied to all columns
+ clean_dtypes = self.dtype
+ else:
+ clean_dtypes = _clean_mapping(self.dtype)
return self._convert_to_ndarrays(data, self.na_values, self.na_fvalues,
- self.verbose, clean_conv)
+ self.verbose, clean_conv,
+ clean_dtypes)
def _to_recarray(self, data, columns):
dtypes = []
@@ -2078,6 +2104,12 @@ def _infer_columns(self):
# We have an empty file, so check
# if columns are provided. That will
# serve as the 'line' for parsing
+ if have_mi_columns and hr > 0:
+ if clear_buffer:
+ self._clear_buffer()
+ columns.append([None] * len(columns[-1]))
+ return columns, num_original_columns
+
if not self.names:
raise EmptyDataError(
"No columns to parse from file")
@@ -2191,16 +2223,18 @@ def _handle_usecols(self, columns, usecols_key):
usecols_key is used if there are string usecols.
"""
if self.usecols is not None:
- if any([isinstance(u, string_types) for u in self.usecols]):
+ if callable(self.usecols):
+ col_indices = _evaluate_usecols(self.usecols, usecols_key)
+ elif any([isinstance(u, string_types) for u in self.usecols]):
if len(columns) > 1:
raise ValueError("If using multiple headers, usecols must "
"be integers.")
col_indices = []
- for u in self.usecols:
- if isinstance(u, string_types):
- col_indices.append(usecols_key.index(u))
+ for col in self.usecols:
+ if isinstance(col, string_types):
+ col_indices.append(usecols_key.index(col))
else:
- col_indices.append(u)
+ col_indices.append(col)
else:
col_indices = self.usecols
@@ -2311,14 +2345,23 @@ def _next_line(self):
try:
orig_line = next(self.data)
except csv.Error as e:
+ msg = str(e)
+
if 'NULL byte' in str(e):
- raise csv.Error(
- 'NULL byte detected. This byte '
- 'cannot be processed in Python\'s '
- 'native csv library at the moment, '
- 'so please pass in engine=\'c\' instead.')
- else:
- raise
+ msg = ('NULL byte detected. This byte '
+ 'cannot be processed in Python\'s '
+ 'native csv library at the moment, '
+ 'so please pass in engine=\'c\' instead')
+
+ if self.skipfooter > 0:
+ reason = ('Error could possibly be due to '
+ 'parsing errors in the skipped footer rows '
+ '(the skipfooter keyword is only applied '
+ 'after Python\'s csv library has parsed '
+ 'all rows).')
+ msg += '. ' + reason
+
+ raise csv.Error(msg)
line = self._check_comments([orig_line])[0]
self.pos += 1
if (not self.skip_blank_lines and
@@ -2499,6 +2542,11 @@ def _rows_to_cols(self, content):
msg = ('Expected %d fields in line %d, saw %d' %
(col_len, row_num + 1, zip_len))
+ if len(self.delimiter) > 1 and self.quoting != csv.QUOTE_NONE:
+ # see gh-13374
+ reason = ('Error could possibly be due to quotes being '
+ 'ignored when a multi-char delimiter is used.')
+ msg += '. ' + reason
raise ValueError(msg)
if self.usecols:
@@ -2776,19 +2824,27 @@ def _clean_index_names(columns, index_col):
def _get_empty_meta(columns, index_col, index_names, dtype=None):
columns = list(columns)
- if dtype is None:
- dtype = {}
+ # Convert `dtype` to a defaultdict of some kind.
+ # This will enable us to write `dtype[col_name]`
+ # without worrying about KeyError issues later on.
+ if not isinstance(dtype, dict):
+ # if dtype is None, the default will be np.object.
+ default_dtype = dtype or np.object
+ dtype = defaultdict(lambda: default_dtype)
else:
- if not isinstance(dtype, dict):
- dtype = defaultdict(lambda: dtype)
+ # Save a copy of the dictionary.
+ _dtype = dtype.copy()
+ dtype = defaultdict(lambda: np.object)
+
# Convert column indexes to column names.
- dtype = dict((columns[k] if is_integer(k) else k, v)
- for k, v in compat.iteritems(dtype))
+ for k, v in compat.iteritems(_dtype):
+ col = columns[k] if is_integer(k) else k
+ dtype[col] = v
if index_col is None or index_col is False:
index = Index([])
else:
- index = [np.empty(0, dtype=dtype.get(index_name, np.object))
+ index = [Series([], dtype=dtype[index_name])
for index_name in index_names]
index = MultiIndex.from_arrays(index, names=index_names)
index_col.sort()
@@ -2796,7 +2852,7 @@ def _get_empty_meta(columns, index_col, index_names, dtype=None):
columns.pop(n - i)
col_dict = dict((col_name,
- np.empty(0, dtype=dtype.get(col_name, np.object)))
+ Series([], dtype=dtype[col_name]))
for col_name in columns)
return index, columns, col_dict
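
A quick sketch of the behaviour this enables for header-only input (the column names are arbitrary): even with zero rows, the requested dtypes are preserved::

    import numpy as np
    import pandas as pd
    from pandas.compat import StringIO

    result = pd.read_csv(StringIO('id,value'),
                         dtype={'id': 'u1', 'value': np.float64})
    result.dtypes  # id -> uint8, value -> float64, despite the empty frame
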
diff --git a/pandas/io/pytables.py b/pandas/io/pytables.py
index b8c2b146b6259..e474aeab1f6ca 100644
--- a/pandas/io/pytables.py
+++ b/pandas/io/pytables.py
@@ -3315,7 +3315,7 @@ def validate_data_columns(self, data_columns, min_itemsize):
# evaluate the passed data_columns, True == use all columns
# take only valide axis labels
if data_columns is True:
- data_columns = axis_labels
+ data_columns = list(axis_labels)
elif data_columns is None:
data_columns = []
@@ -3429,9 +3429,8 @@ def create_axes(self, axes, obj, validate=True, nan_rep=None,
j = len(self.index_axes)
# check for column conflicts
- if validate:
- for a in self.axes:
- a.maybe_set_size(min_itemsize=min_itemsize)
+ for a in self.axes:
+ a.maybe_set_size(min_itemsize=min_itemsize)
# reindex by our non_index_axes & compute data_columns
for a in self.non_index_axes:
@@ -4153,7 +4152,7 @@ def write(self, obj, data_columns=None, **kwargs):
obj = DataFrame({name: obj}, index=obj.index)
obj.columns = [name]
return super(AppendableSeriesTable, self).write(
- obj=obj, data_columns=obj.columns, **kwargs)
+ obj=obj, data_columns=obj.columns.tolist(), **kwargs)
def read(self, columns=None, **kwargs):
@@ -4254,7 +4253,7 @@ def write(self, obj, data_columns=None, **kwargs):
if data_columns is None:
data_columns = []
elif data_columns is True:
- data_columns = obj.columns[:]
+ data_columns = obj.columns.tolist()
obj, self.levels = self.validate_multiindex(obj)
for n in self.levels:
if n not in data_columns:
diff --git a/pandas/io/s3.py b/pandas/io/s3.py
index df8f1d9187031..8aa3694834a0a 100644
--- a/pandas/io/s3.py
+++ b/pandas/io/s3.py
@@ -99,9 +99,7 @@ def get_filepath_or_buffer(filepath_or_buffer, encoding=None,
conn = boto.connect_s3(host=s3_host, anon=True)
b = conn.get_bucket(parsed_url.netloc, validate=False)
- if compat.PY2 and (compression == 'gzip' or
- (compression == 'infer' and
- filepath_or_buffer.endswith(".gz"))):
+ if compat.PY2 and compression:
k = boto.s3.key.Key(b, parsed_url.path)
filepath_or_buffer = BytesIO(k.get_contents_as_string(
encoding=encoding))
diff --git a/pandas/io/sas/sas7bdat.py b/pandas/io/sas/sas7bdat.py
index 2a82fd7a53222..91f417abc0502 100644
--- a/pandas/io/sas/sas7bdat.py
+++ b/pandas/io/sas/sas7bdat.py
@@ -225,6 +225,12 @@ def _get_properties(self):
self.os_name = self.os_name.decode(
self.encoding or self.default_encoding)
+ def __next__(self):
+ da = self.read(nrows=self.chunksize or 1)
+ if da is None:
+ raise StopIteration
+ return da
+
# Read a single float of the given width (4 or 8).
def _read_float(self, offset, width):
if width not in (4, 8):
@@ -591,6 +597,10 @@ def read(self, nrows=None):
if self._current_row_in_file_index >= self.row_count:
return None
+ m = self.row_count - self._current_row_in_file_index
+ if nrows > m:
+ nrows = m
+
nd = (self.column_types == b'd').sum()
ns = (self.column_types == b's').sum()
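
With ``__next__`` defined, the SAS7BDAT reader behaves as an iterator; a rough sketch, assuming a hypothetical file name::

    import pandas as pd

    # With chunksize, read_sas returns the reader itself rather than a frame.
    reader = pd.read_sas('large.sas7bdat', chunksize=10000)

    total_rows = 0
    for chunk in reader:          # each chunk is a DataFrame of <= 10000 rows
        total_rows += len(chunk)
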
diff --git a/pandas/io/sql.py b/pandas/io/sql.py
index 47642c2e2bc28..c9f8d32e1b504 100644
--- a/pandas/io/sql.py
+++ b/pandas/io/sql.py
@@ -507,10 +507,11 @@ def _engine_builder(con):
if isinstance(con, string_types):
try:
import sqlalchemy
- con = sqlalchemy.create_engine(con)
- return con
except ImportError:
_SQLALCHEMY_INSTALLED = False
+ else:
+ con = sqlalchemy.create_engine(con)
+ return con
return con
diff --git a/pandas/io/stata.py b/pandas/io/stata.py
index 985ea9c051505..c35e07be2c31a 100644
--- a/pandas/io/stata.py
+++ b/pandas/io/stata.py
@@ -511,6 +511,9 @@ def _cast_to_stata_types(data):
(np.uint16, np.int16, np.int32),
(np.uint32, np.int32, np.int64))
+ float32_max = struct.unpack('<f', b'\xff\xff\xff\x7e')[0]
+ float64_max = struct.unpack('<d', b'\xff\xff\xff\xff\xff\xff\xdf\x7f')[0]
if data[col].max() >= 2 ** 53 or data[col].min() <= -2 ** 53:
ws = precision_loss_doc % ('int64', 'float64')
+ elif dtype in (np.float32, np.float64):
+ value = data[col].max()
+ if np.isinf(value):
+ msg = 'Column {0} has a maximum value of infinity which is ' \
+ 'outside the range supported by Stata.'
+ raise ValueError(msg.format(col))
+ if dtype == np.float32 and value > float32_max:
+ data[col] = data[col].astype(np.float64)
+ elif dtype == np.float64:
+ if value > float64_max:
+ msg = 'Column {0} has a maximum value ({1}) outside the ' \
+ 'range supported by Stata ({2})'
+ raise ValueError(msg.format(col, value, float64_max))
if ws:
import warnings
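
A sketch of the new range check on export (the output path is hypothetical): float32 values beyond Stata's limit are upcast to float64, while out-of-range or infinite float64 values now raise::

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'ok': [1.5, 2.5],
                       'too_big': [0.0, np.finfo(np.float64).max]})

    try:
        df.to_stata('out.dta')
    except ValueError as err:
        print(err)  # the message names the offending column 'too_big'
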
@@ -1210,18 +1226,18 @@ def _read_old_header(self, first_char):
if tp in self.OLD_TYPE_MAPPING:
typlist.append(self.OLD_TYPE_MAPPING[tp])
else:
- typlist.append(tp - 127) # string
+ typlist.append(tp - 127) # py2 string, py3 bytes
try:
self.typlist = [self.TYPE_MAP[typ] for typ in typlist]
except:
raise ValueError("cannot convert stata types [{0}]"
- .format(','.join(typlist)))
+ .format(','.join(str(x) for x in typlist)))
try:
self.dtyplist = [self.DTYPE_MAP[typ] for typ in typlist]
except:
raise ValueError("cannot convert stata dtypes [{0}]"
- .format(','.join(typlist)))
+ .format(','.join(str(x) for x in typlist)))
if self.format_version > 108:
self.varlist = [self._null_terminate(self.path_or_buf.read(33))
@@ -2048,6 +2064,7 @@ def _prepare_pandas(self, data):
data = self._check_column_names(data)
# Check columns for compatibility with stata, upcast if necessary
+ # Raise if outside the supported range
data = _cast_to_stata_types(data)
# Replace NaNs with Stata missing values
diff --git a/pandas/io/tests/data/test_multisheet.xls b/pandas/io/tests/data/test_multisheet.xls
index fa37723fcdefb..7b4b9759a1a94 100644
Binary files a/pandas/io/tests/data/test_multisheet.xls and b/pandas/io/tests/data/test_multisheet.xls differ
diff --git a/pandas/io/tests/data/test_multisheet.xlsm b/pandas/io/tests/data/test_multisheet.xlsm
index 694f8e07d5e29..c6191bc61bc49 100644
Binary files a/pandas/io/tests/data/test_multisheet.xlsm and b/pandas/io/tests/data/test_multisheet.xlsm differ
diff --git a/pandas/io/tests/data/test_multisheet.xlsx b/pandas/io/tests/data/test_multisheet.xlsx
index 5de07772b276a..dc424a9963253 100644
Binary files a/pandas/io/tests/data/test_multisheet.xlsx and b/pandas/io/tests/data/test_multisheet.xlsx differ
diff --git a/pandas/io/tests/data/testdtype.xls b/pandas/io/tests/data/testdtype.xls
new file mode 100644
index 0000000000000..f63357524324f
Binary files /dev/null and b/pandas/io/tests/data/testdtype.xls differ
diff --git a/pandas/io/tests/data/testdtype.xlsm b/pandas/io/tests/data/testdtype.xlsm
new file mode 100644
index 0000000000000..20e658288d5ac
Binary files /dev/null and b/pandas/io/tests/data/testdtype.xlsm differ
diff --git a/pandas/io/tests/data/testdtype.xlsx b/pandas/io/tests/data/testdtype.xlsx
new file mode 100644
index 0000000000000..7c65263c373a3
Binary files /dev/null and b/pandas/io/tests/data/testdtype.xlsx differ
diff --git a/pandas/io/tests/json/test_json_norm.py b/pandas/io/tests/json/test_json_norm.py
index 4848db97194d9..36110898448ea 100644
--- a/pandas/io/tests/json/test_json_norm.py
+++ b/pandas/io/tests/json/test_json_norm.py
@@ -225,6 +225,65 @@ def test_nested_flattens(self):
self.assertEqual(result, expected)
+ def test_json_normalize_errors(self):
+ # GH14583: If meta keys are not always present
+ # a new option to set errors='ignore' has been implemented
+ i = {
+ "Trades": [{
+ "general": {
+ "tradeid": 100,
+ "trade_version": 1,
+ "stocks": [{
+
+ "symbol": "AAPL",
+ "name": "Apple",
+ "price": "0"
+ }, {
+ "symbol": "GOOG",
+ "name": "Google",
+ "price": "0"
+ }
+ ]
+ }
+ }, {
+ "general": {
+ "tradeid": 100,
+ "stocks": [{
+ "symbol": "AAPL",
+ "name": "Apple",
+ "price": "0"
+ }, {
+ "symbol": "GOOG",
+ "name": "Google",
+ "price": "0"
+ }
+ ]
+ }
+ }
+ ]
+ }
+ j = json_normalize(data=i['Trades'],
+ record_path=[['general', 'stocks']],
+ meta=[['general', 'tradeid'],
+ ['general', 'trade_version']],
+ errors='ignore')
+ expected = {'general.trade_version': {0: 1.0, 1: 1.0, 2: '', 3: ''},
+ 'general.tradeid': {0: 100, 1: 100, 2: 100, 3: 100},
+ 'name': {0: 'Apple', 1: 'Google', 2: 'Apple', 3: 'Google'},
+ 'price': {0: '0', 1: '0', 2: '0', 3: '0'},
+ 'symbol': {0: 'AAPL', 1: 'GOOG', 2: 'AAPL', 3: 'GOOG'}}
+
+ self.assertEqual(j.fillna('').to_dict(), expected)
+
+ self.assertRaises(KeyError,
+ json_normalize, data=i['Trades'],
+ record_path=[['general', 'stocks']],
+ meta=[['general', 'tradeid'],
+ ['general', 'trade_version']],
+ errors='raise'
+ )
+
+
if __name__ == '__main__':
nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb',
'--pdb-failure', '-s'], exit=False)
diff --git a/pandas/io/tests/json/test_pandas.py b/pandas/io/tests/json/test_pandas.py
index 117ac2324d0e0..e6e6f33669e17 100644
--- a/pandas/io/tests/json/test_pandas.py
+++ b/pandas/io/tests/json/test_pandas.py
@@ -167,7 +167,7 @@ def _check_orient(df, orient, dtype=None, numpy=False,
if raise_ok is not None:
if isinstance(detail, raise_ok):
return
- raise
+ raise
if sort is not None and sort in unser.columns:
unser = unser.sort_values(sort)
@@ -971,7 +971,7 @@ def test_to_jsonl(self):
def test_latin_encoding(self):
if compat.PY2:
self.assertRaisesRegexp(
- TypeError, '\[unicode\] is not implemented as a table column')
+ TypeError, r'\[unicode\] is not implemented as a table column')
return
# GH 13774
diff --git a/pandas/io/tests/parser/c_parser_only.py b/pandas/io/tests/parser/c_parser_only.py
index 09d521e5a7e46..c6ef68fcac9a0 100644
--- a/pandas/io/tests/parser/c_parser_only.py
+++ b/pandas/io/tests/parser/c_parser_only.py
@@ -12,10 +12,9 @@
import pandas as pd
import pandas.util.testing as tm
-from pandas import DataFrame, Series, Index, MultiIndex, Categorical
+from pandas import DataFrame
from pandas import compat
from pandas.compat import StringIO, range, lrange
-from pandas.types.dtypes import CategoricalDtype
class CParserTests(object):
@@ -71,11 +70,11 @@ def test_dtype_and_names_error(self):
3.0 3
"""
# base cases
- result = self.read_csv(StringIO(data), sep='\s+', header=None)
+ result = self.read_csv(StringIO(data), sep=r'\s+', header=None)
expected = DataFrame([[1.0, 1], [2.0, 2], [3.0, 3]])
tm.assert_frame_equal(result, expected)
- result = self.read_csv(StringIO(data), sep='\s+',
+ result = self.read_csv(StringIO(data), sep=r'\s+',
header=None, names=['a', 'b'])
expected = DataFrame(
[[1.0, 1], [2.0, 2], [3.0, 3]], columns=['a', 'b'])
@@ -83,7 +82,7 @@ def test_dtype_and_names_error(self):
# fallback casting
result = self.read_csv(StringIO(
- data), sep='\s+', header=None,
+ data), sep=r'\s+', header=None,
names=['a', 'b'], dtype={'a': np.int32})
expected = DataFrame([[1, 1], [2, 2], [3, 3]],
columns=['a', 'b'])
@@ -97,32 +96,16 @@ def test_dtype_and_names_error(self):
"""
# fallback casting, but not castable
with tm.assertRaisesRegexp(ValueError, 'cannot safely convert'):
- self.read_csv(StringIO(data), sep='\s+', header=None,
+ self.read_csv(StringIO(data), sep=r'\s+', header=None,
names=['a', 'b'], dtype={'a': np.int32})
- def test_passing_dtype(self):
- # see gh-6607
+ def test_unsupported_dtype(self):
df = DataFrame(np.random.rand(5, 2), columns=list(
'AB'), index=['1A', '1B', '1C', '1D', '1E'])
- with tm.ensure_clean('__passing_str_as_dtype__.csv') as path:
+ with tm.ensure_clean('__unsupported_dtype__.csv') as path:
df.to_csv(path)
- # see gh-3795: passing 'str' as the dtype
- result = self.read_csv(path, dtype=str, index_col=0)
- tm.assert_series_equal(result.dtypes, Series(
- {'A': 'object', 'B': 'object'}))
-
- # we expect all object columns, so need to
- # convert to test for equivalence
- result = result.astype(float)
- tm.assert_frame_equal(result, df)
-
- # invalid dtype
- self.assertRaises(TypeError, self.read_csv, path,
- dtype={'A': 'foo', 'B': 'float64'},
- index_col=0)
-
# valid but we don't support it (date)
self.assertRaises(TypeError, self.read_csv, path,
dtype={'A': 'datetime64', 'B': 'float64'},
@@ -141,11 +124,6 @@ def test_passing_dtype(self):
dtype={'A': 'U8'},
index_col=0)
- # see gh-12048: empty frame
- actual = self.read_csv(StringIO('A,B'), dtype=str)
- expected = DataFrame({'A': [], 'B': []}, index=[], dtype=str)
- tm.assert_frame_equal(actual, expected)
-
def test_precise_conversion(self):
# see gh-8002
tm._skip_if_32bit()
@@ -178,104 +156,6 @@ def error(val):
self.assertTrue(sum(precise_errors) <= sum(normal_errors))
self.assertTrue(max(precise_errors) <= max(normal_errors))
- def test_pass_dtype(self):
- data = """\
-one,two
-1,2.5
-2,3.5
-3,4.5
-4,5.5"""
-
- result = self.read_csv(StringIO(data), dtype={'one': 'u1', 1: 'S1'})
- self.assertEqual(result['one'].dtype, 'u1')
- self.assertEqual(result['two'].dtype, 'object')
-
- def test_categorical_dtype(self):
- # GH 10153
- data = """a,b,c
-1,a,3.4
-1,a,3.4
-2,b,4.5"""
- expected = pd.DataFrame({'a': Categorical(['1', '1', '2']),
- 'b': Categorical(['a', 'a', 'b']),
- 'c': Categorical(['3.4', '3.4', '4.5'])})
- actual = self.read_csv(StringIO(data), dtype='category')
- tm.assert_frame_equal(actual, expected)
-
- actual = self.read_csv(StringIO(data), dtype=CategoricalDtype())
- tm.assert_frame_equal(actual, expected)
-
- actual = self.read_csv(StringIO(data), dtype={'a': 'category',
- 'b': 'category',
- 'c': CategoricalDtype()})
- tm.assert_frame_equal(actual, expected)
-
- actual = self.read_csv(StringIO(data), dtype={'b': 'category'})
- expected = pd.DataFrame({'a': [1, 1, 2],
- 'b': Categorical(['a', 'a', 'b']),
- 'c': [3.4, 3.4, 4.5]})
- tm.assert_frame_equal(actual, expected)
-
- actual = self.read_csv(StringIO(data), dtype={1: 'category'})
- tm.assert_frame_equal(actual, expected)
-
- # unsorted
- data = """a,b,c
-1,b,3.4
-1,b,3.4
-2,a,4.5"""
- expected = pd.DataFrame({'a': Categorical(['1', '1', '2']),
- 'b': Categorical(['b', 'b', 'a']),
- 'c': Categorical(['3.4', '3.4', '4.5'])})
- actual = self.read_csv(StringIO(data), dtype='category')
- tm.assert_frame_equal(actual, expected)
-
- # missing
- data = """a,b,c
-1,b,3.4
-1,nan,3.4
-2,a,4.5"""
- expected = pd.DataFrame({'a': Categorical(['1', '1', '2']),
- 'b': Categorical(['b', np.nan, 'a']),
- 'c': Categorical(['3.4', '3.4', '4.5'])})
- actual = self.read_csv(StringIO(data), dtype='category')
- tm.assert_frame_equal(actual, expected)
-
- def test_categorical_dtype_encoding(self):
- # GH 10153
- pth = tm.get_data_path('unicode_series.csv')
- encoding = 'latin-1'
- expected = self.read_csv(pth, header=None, encoding=encoding)
- expected[1] = Categorical(expected[1])
- actual = self.read_csv(pth, header=None, encoding=encoding,
- dtype={1: 'category'})
- tm.assert_frame_equal(actual, expected)
-
- pth = tm.get_data_path('utf16_ex.txt')
- encoding = 'utf-16'
- expected = self.read_table(pth, encoding=encoding)
- expected = expected.apply(Categorical)
- actual = self.read_table(pth, encoding=encoding, dtype='category')
- tm.assert_frame_equal(actual, expected)
-
- def test_categorical_dtype_chunksize(self):
- # GH 10153
- data = """a,b
-1,a
-1,b
-1,b
-2,c"""
- expecteds = [pd.DataFrame({'a': [1, 1],
- 'b': Categorical(['a', 'b'])}),
- pd.DataFrame({'a': [1, 2],
- 'b': Categorical(['b', 'c'])},
- index=[2, 3])]
- actuals = self.read_csv(StringIO(data), dtype={'b': 'category'},
- chunksize=2)
-
- for actual, expected in zip(actuals, expecteds):
- tm.assert_frame_equal(actual, expected)
-
def test_pass_dtype_as_recarray(self):
if compat.is_platform_windows() and self.low_memory:
raise nose.SkipTest(
@@ -295,66 +175,6 @@ def test_pass_dtype_as_recarray(self):
self.assertEqual(result['one'].dtype, 'u1')
self.assertEqual(result['two'].dtype, 'S1')
- def test_empty_pass_dtype(self):
- data = 'one,two'
- result = self.read_csv(StringIO(data), dtype={'one': 'u1'})
-
- expected = DataFrame({'one': np.empty(0, dtype='u1'),
- 'two': np.empty(0, dtype=np.object)})
- tm.assert_frame_equal(result, expected, check_index_type=False)
-
- def test_empty_with_index_pass_dtype(self):
- data = 'one,two'
- result = self.read_csv(StringIO(data), index_col=['one'],
- dtype={'one': 'u1', 1: 'f'})
-
- expected = DataFrame({'two': np.empty(0, dtype='f')},
- index=Index([], dtype='u1', name='one'))
- tm.assert_frame_equal(result, expected, check_index_type=False)
-
- def test_empty_with_multiindex_pass_dtype(self):
- data = 'one,two,three'
- result = self.read_csv(StringIO(data), index_col=['one', 'two'],
- dtype={'one': 'u1', 1: 'f8'})
-
- exp_idx = MultiIndex.from_arrays([np.empty(0, dtype='u1'),
- np.empty(0, dtype='O')],
- names=['one', 'two'])
- expected = DataFrame(
- {'three': np.empty(0, dtype=np.object)}, index=exp_idx)
- tm.assert_frame_equal(result, expected, check_index_type=False)
-
- def test_empty_with_mangled_column_pass_dtype_by_names(self):
- data = 'one,one'
- result = self.read_csv(StringIO(data), dtype={
- 'one': 'u1', 'one.1': 'f'})
-
- expected = DataFrame(
- {'one': np.empty(0, dtype='u1'), 'one.1': np.empty(0, dtype='f')})
- tm.assert_frame_equal(result, expected, check_index_type=False)
-
- def test_empty_with_mangled_column_pass_dtype_by_indexes(self):
- data = 'one,one'
- result = self.read_csv(StringIO(data), dtype={0: 'u1', 1: 'f'})
-
- expected = DataFrame(
- {'one': np.empty(0, dtype='u1'), 'one.1': np.empty(0, dtype='f')})
- tm.assert_frame_equal(result, expected, check_index_type=False)
-
- def test_empty_with_dup_column_pass_dtype_by_indexes(self):
- # see gh-9424
- expected = pd.concat([Series([], name='one', dtype='u1'),
- Series([], name='one.1', dtype='f')], axis=1)
-
- data = 'one,one'
- result = self.read_csv(StringIO(data), dtype={0: 'u1', 1: 'f'})
- tm.assert_frame_equal(result, expected, check_index_type=False)
-
- data = ''
- result = self.read_csv(StringIO(data), names=['one', 'one'],
- dtype={0: 'u1', 1: 'f'})
- tm.assert_frame_equal(result, expected, check_index_type=False)
-
def test_usecols_dtypes(self):
data = """\
1,2,3
@@ -400,16 +220,6 @@ def test_custom_lineterminator(self):
tm.assert_frame_equal(result, expected)
- def test_raise_on_passed_int_dtype_with_nas(self):
- # see gh-2631
- data = """YEAR, DOY, a
-2001,106380451,10
-2001,,11
-2001,106380451,67"""
- self.assertRaises(ValueError, self.read_csv, StringIO(data),
- sep=",", skipinitialspace=True,
- dtype={'DOY': np.int64})
-
def test_parse_ragged_csv(self):
data = """1,2,3
1,2,3,4
@@ -561,3 +371,20 @@ def test_internal_null_byte(self):
result = self.read_csv(StringIO(data), names=names)
tm.assert_frame_equal(result, expected)
+
+ def test_read_nrows_large(self):
+ # gh-7626 - Read only nrows of data in for large inputs (>262144b)
+ header_narrow = '\t'.join(['COL_HEADER_' + str(i)
+ for i in range(10)]) + '\n'
+ data_narrow = '\t'.join(['somedatasomedatasomedata1'
+ for i in range(10)]) + '\n'
+ header_wide = '\t'.join(['COL_HEADER_' + str(i)
+ for i in range(15)]) + '\n'
+ data_wide = '\t'.join(['somedatasomedatasomedata2'
+ for i in range(15)]) + '\n'
+ test_input = (header_narrow + data_narrow * 1050 +
+ header_wide + data_wide * 2)
+
+ df = self.read_csv(StringIO(test_input), sep='\t', nrows=1010)
+
+ self.assertTrue(df.size == 1010 * 10)
diff --git a/pandas/io/tests/parser/common.py b/pandas/io/tests/parser/common.py
index 0219e16391be8..b6d1d4bb09f56 100644
--- a/pandas/io/tests/parser/common.py
+++ b/pandas/io/tests/parser/common.py
@@ -17,8 +17,8 @@
import pandas.util.testing as tm
from pandas import DataFrame, Series, Index, MultiIndex
from pandas import compat
-from pandas.compat import(StringIO, BytesIO, PY3,
- range, lrange, u)
+from pandas.compat import (StringIO, BytesIO, PY3,
+ range, lrange, u)
from pandas.io.common import DtypeWarning, EmptyDataError, URLError
from pandas.io.parsers import TextFileReader, TextParser
@@ -50,7 +50,7 @@ def test_bad_stream_exception(self):
# Issue 13652:
# This test validates that both python engine
# and C engine will raise UnicodeDecodeError instead of
- # c engine raising CParserError and swallowing exception
+ # c engine raising ParserError and swallowing exception
# that caused read to fail.
handle = open(self.csv_shiftjs, "rb")
codec = codecs.lookup("utf-8")
@@ -606,6 +606,28 @@ def test_multi_index_no_level_names(self):
expected = self.read_csv(StringIO(data), index_col=[1, 0])
tm.assert_frame_equal(df, expected, check_names=False)
+ def test_multi_index_blank_df(self):
+ # GH 14545
+ data = """a,b
+"""
+ df = self.read_csv(StringIO(data), header=[0])
+ expected = DataFrame(columns=['a', 'b'])
+ tm.assert_frame_equal(df, expected)
+ round_trip = self.read_csv(StringIO(
+ expected.to_csv(index=False)), header=[0])
+ tm.assert_frame_equal(round_trip, expected)
+
+ data_multiline = """a,b
+c,d
+"""
+ df2 = self.read_csv(StringIO(data_multiline), header=[0, 1])
+ cols = MultiIndex.from_tuples([('a', 'c'), ('b', 'd')])
+ expected2 = DataFrame(columns=cols)
+ tm.assert_frame_equal(df2, expected2)
+ round_trip = self.read_csv(StringIO(
+ expected2.to_csv(index=False)), header=[0, 1])
+ tm.assert_frame_equal(round_trip, expected2)
+
def test_no_unnamed_index(self):
data = """ id c0 c1 c2
0 1 0 a b
@@ -630,10 +652,10 @@ def test_read_csv_parse_simple_list(self):
def test_url(self):
# HTTP(S)
url = ('https://raw.github.com/pandas-dev/pandas/master/'
- 'pandas/io/tests/parser/data/salary.table.csv')
+ 'pandas/io/tests/parser/data/salaries.csv')
url_table = self.read_table(url)
dirpath = tm.get_data_path()
- localtable = os.path.join(dirpath, 'salary.table.csv')
+ localtable = os.path.join(dirpath, 'salaries.csv')
local_table = self.read_table(localtable)
tm.assert_frame_equal(url_table, local_table)
# TODO: ftp testing
@@ -641,7 +663,7 @@ def test_url(self):
@tm.slow
def test_file(self):
dirpath = tm.get_data_path()
- localtable = os.path.join(dirpath, 'salary.table.csv')
+ localtable = os.path.join(dirpath, 'salaries.csv')
local_table = self.read_table(localtable)
try:
@@ -836,7 +858,7 @@ def test_integer_overflow_bug(self):
result = self.read_csv(StringIO(data), header=None, sep=' ')
self.assertTrue(result[0].dtype == np.float64)
- result = self.read_csv(StringIO(data), header=None, sep='\s+')
+ result = self.read_csv(StringIO(data), header=None, sep=r'\s+')
self.assertTrue(result[0].dtype == np.float64)
def test_catch_too_many_names(self):
@@ -852,7 +874,7 @@ def test_catch_too_many_names(self):
def test_ignore_leading_whitespace(self):
# see gh-3374, gh-6607
data = ' a b c\n 1 2 3\n 4 5 6\n 7 8 9'
- result = self.read_table(StringIO(data), sep='\s+')
+ result = self.read_table(StringIO(data), sep=r'\s+')
expected = DataFrame({'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]})
tm.assert_frame_equal(result, expected)
@@ -1052,7 +1074,7 @@ def test_uneven_lines_with_usecols(self):
# make sure that an error is still thrown
# when the 'usecols' parameter is not provided
- msg = "Expected \d+ fields in line \d+, saw \d+"
+ msg = r"Expected \d+ fields in line \d+, saw \d+"
with tm.assertRaisesRegexp(ValueError, msg):
df = self.read_csv(StringIO(csv))
@@ -1122,7 +1144,7 @@ def test_raise_on_sep_with_delim_whitespace(self):
# see gh-6607
data = 'a b c\n1 2 3'
with tm.assertRaisesRegexp(ValueError, 'you can only specify one'):
- self.read_table(StringIO(data), sep='\s', delim_whitespace=True)
+ self.read_table(StringIO(data), sep=r'\s', delim_whitespace=True)
def test_single_char_leading_whitespace(self):
# see gh-9710
@@ -1157,7 +1179,7 @@ def test_empty_lines(self):
[-70., .4, 1.]])
df = self.read_csv(StringIO(data))
tm.assert_numpy_array_equal(df.values, expected)
- df = self.read_csv(StringIO(data.replace(',', ' ')), sep='\s+')
+ df = self.read_csv(StringIO(data.replace(',', ' ')), sep=r'\s+')
tm.assert_numpy_array_equal(df.values, expected)
expected = np.array([[1., 2., 4.],
[np.nan, np.nan, np.nan],
@@ -1189,14 +1211,14 @@ def test_regex_separator(self):
b 1 2 3 4
c 1 2 3 4
"""
- df = self.read_table(StringIO(data), sep='\s+')
+ df = self.read_table(StringIO(data), sep=r'\s+')
expected = self.read_csv(StringIO(re.sub('[ ]+', ',', data)),
index_col=0)
self.assertIsNone(expected.index.name)
tm.assert_frame_equal(df, expected)
data = ' a b c\n1 2 3 \n4 5 6\n 7 8 9'
- result = self.read_table(StringIO(data), sep='\s+')
+ result = self.read_table(StringIO(data), sep=r'\s+')
expected = DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
columns=['a', 'b', 'c'])
tm.assert_frame_equal(result, expected)
@@ -1431,7 +1453,7 @@ def test_as_recarray(self):
FutureWarning, check_stacklevel=False):
data = 'a,b\n1,a\n2,b'
expected = np.array([(1, 'a'), (2, 'b')],
-                             dtype=[('a', '<i8'), ('b', 'O')])
diff --git a/pandas/io/tests/test_sql.py b/pandas/io/tests/test_sql.py
--- a/pandas/io/tests/test_sql.py
+++ b/pandas/io/tests/test_sql.py
         # in sqlalchemy.create_engine -> test passing of this error to user
+ db_uri = "postgresql+pg8000://user:pass@host/dbname"
+ with tm.assertRaisesRegexp(ImportError, "pg8000"):
+ sql.read_sql("select * from table", db_uri)
+
def _make_iris_table_metadata(self):
sa = sqlalchemy
metadata = sa.MetaData()
@@ -1995,6 +2001,8 @@ def test_to_sql_save_index(self):
self._to_sql_save_index()
def test_transactions(self):
+ if PY36:
+ raise nose.SkipTest("not working on python > 3.5")
self._transaction_test()
def _get_sqlite_column_type(self, table, column):
diff --git a/pandas/io/tests/test_stata.py b/pandas/io/tests/test_stata.py
index 1849b32a4a7c8..cd972868a6e32 100644
--- a/pandas/io/tests/test_stata.py
+++ b/pandas/io/tests/test_stata.py
@@ -11,8 +11,6 @@
import nose
import numpy as np
-from pandas.tslib import NaT
-
import pandas as pd
import pandas.util.testing as tm
from pandas import compat
@@ -21,6 +19,7 @@
from pandas.io.parsers import read_csv
from pandas.io.stata import (read_stata, StataReader, InvalidColumnName,
PossiblePrecisionLoss, StataMissingValue)
+from pandas.tslib import NaT
from pandas.types.common import is_categorical_dtype
@@ -1234,6 +1233,52 @@ def test_stata_111(self):
original = original[['y', 'x', 'w', 'z']]
tm.assert_frame_equal(original, df)
+ def test_out_of_range_double(self):
+ # GH 14618
+ df = DataFrame({'ColumnOk': [0.0,
+ np.finfo(np.double).eps,
+ 4.49423283715579e+307],
+ 'ColumnTooBig': [0.0,
+ np.finfo(np.double).eps,
+ np.finfo(np.double).max]})
+ with tm.assertRaises(ValueError) as cm:
+ with tm.ensure_clean() as path:
+ df.to_stata(path)
+ tm.assertTrue('ColumnTooBig' in cm.exception)
+
+ df.loc[2, 'ColumnTooBig'] = np.inf
+ with tm.assertRaises(ValueError) as cm:
+ with tm.ensure_clean() as path:
+ df.to_stata(path)
+ tm.assertTrue('ColumnTooBig' in cm.exception)
+ tm.assertTrue('infinity' in cm.exception)
+
+ def test_out_of_range_float(self):
+ original = DataFrame({'ColumnOk': [0.0,
+ np.finfo(np.float32).eps,
+ np.finfo(np.float32).max / 10.0],
+ 'ColumnTooBig': [0.0,
+ np.finfo(np.float32).eps,
+ np.finfo(np.float32).max]})
+ original.index.name = 'index'
+ for col in original:
+ original[col] = original[col].astype(np.float32)
+
+ with tm.ensure_clean() as path:
+ original.to_stata(path)
+ reread = read_stata(path)
+ original['ColumnTooBig'] = original['ColumnTooBig'].astype(
+ np.float64)
+ tm.assert_frame_equal(original,
+ reread.set_index('index'))
+
+ original.loc[2, 'ColumnTooBig'] = np.inf
+ with tm.assertRaises(ValueError) as cm:
+ with tm.ensure_clean() as path:
+ original.to_stata(path)
+ tm.assertTrue('ColumnTooBig' in cm.exception)
+ tm.assertTrue('infinity' in cm.exception)
+
if __name__ == '__main__':
nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],
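These new tests exercise an export-time range check: Stata's ``double`` cannot represent the full IEEE-754 range, so ``to_stata`` refuses and names the offending column. A hedged sketch (assuming pandas >= 0.20; ``out.dta`` and the column names are hypothetical)::

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"ok": [0.0, 1.5],
                       "too_big": [0.0, np.finfo(np.double).max]})

    try:
        df.to_stata("out.dta")          # hypothetical output path
    except ValueError as err:
        print("too_big" in str(err))    # True: the offending column is named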
diff --git a/pandas/io/wb.py b/pandas/io/wb.py
index 2183290c7e074..5dc4d9ce1adc4 100644
--- a/pandas/io/wb.py
+++ b/pandas/io/wb.py
@@ -1,6 +1,6 @@
raise ImportError(
"The pandas.io.wb module is moved to a separate package "
"(pandas-datareader). After installing the pandas-datareader package "
- "(https://github.com/pandas-dev/pandas-datareader), you can change "
+ "(https://github.com/pydata/pandas-datareader), you can change "
"the import ``from pandas.io import data, wb`` to "
"``from pandas_datareader import data, wb``.")
diff --git a/pandas/lib.pyx b/pandas/lib.pyx
index b56a02b245d69..b09a1c2755a06 100644
--- a/pandas/lib.pyx
+++ b/pandas/lib.pyx
@@ -65,13 +65,8 @@ cdef int64_t NPY_NAT = util.get_nat()
ctypedef unsigned char UChar
cimport util
-from util cimport is_array, _checknull, _checknan
-
-cdef extern from "headers/stdint.h":
- enum: UINT8_MAX
- enum: INT64_MAX
- enum: INT64_MIN
-
+from util cimport (is_array, _checknull, _checknan, INT64_MAX,
+ INT64_MIN, UINT8_MAX)
cdef extern from "math.h":
double sqrt(double x)
@@ -980,7 +975,9 @@ def astype_intsafe(ndarray[object] arr, new_dtype):
if is_datelike and checknull(v):
result[i] = NPY_NAT
else:
- util.set_value_at(result, i, v)
+ # we can use the unsafe version because we know `result` is mutable
+ # since it was created from `np.empty`
+ util.set_value_at_unsafe(result, i, v)
return result
@@ -991,7 +988,9 @@ cpdef ndarray[object] astype_unicode(ndarray arr):
ndarray[object] result = np.empty(n, dtype=object)
for i in range(n):
- util.set_value_at(result, i, unicode(arr[i]))
+ # we can use the unsafe version because we know `result` is mutable
+ # since it was created from `np.empty`
+ util.set_value_at_unsafe(result, i, unicode(arr[i]))
return result
@@ -1002,7 +1001,9 @@ cpdef ndarray[object] astype_str(ndarray arr):
ndarray[object] result = np.empty(n, dtype=object)
for i in range(n):
- util.set_value_at(result, i, str(arr[i]))
+ # we can use the unsafe version because we know `result` is mutable
+ # since it was created from `np.empty`
+ util.set_value_at_unsafe(result, i, str(arr[i]))
return result
diff --git a/pandas/msgpack/__init__.py b/pandas/msgpack/__init__.py
index 0c2370df936a4..33d60a12ef0a3 100644
--- a/pandas/msgpack/__init__.py
+++ b/pandas/msgpack/__init__.py
@@ -1,11 +1,10 @@
# coding: utf-8
-# flake8: noqa
-
-from pandas.msgpack._version import version
-from pandas.msgpack.exceptions import *
from collections import namedtuple
+from pandas.msgpack.exceptions import * # noqa
+from pandas.msgpack._version import version # noqa
+
class ExtType(namedtuple('ExtType', 'code data')):
"""ExtType represents ext type in msgpack."""
@@ -18,11 +17,10 @@ def __new__(cls, code, data):
raise ValueError("code must be 0~127")
return super(ExtType, cls).__new__(cls, code, data)
+import os # noqa
-import os
-from pandas.msgpack._packer import Packer
-from pandas.msgpack._unpacker import unpack, unpackb, Unpacker
-
+from pandas.msgpack._packer import Packer # noqa
+from pandas.msgpack._unpacker import unpack, unpackb, Unpacker # noqa
def pack(o, stream, **kwargs):
diff --git a/pandas/parser.pyx b/pandas/parser.pyx
index 12525c7a9c587..d94a4ef278dee 100644
--- a/pandas/parser.pyx
+++ b/pandas/parser.pyx
@@ -13,8 +13,11 @@ from cpython cimport (PyObject, PyBytes_FromString,
PyUnicode_Check, PyUnicode_AsUTF8String,
PyErr_Occurred, PyErr_Fetch)
from cpython.ref cimport PyObject, Py_XDECREF
-from io.common import CParserError, DtypeWarning, EmptyDataError
+from io.common import ParserError, DtypeWarning, EmptyDataError, ParserWarning
+# Import CParserError as alias of ParserError for backwards compatibility.
+# Ultimately, we want to remove this import. See gh-12665 and gh-14479.
+from io.common import CParserError
cdef extern from "Python.h":
object PyUnicode_FromString(char *v)
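To make the rename concrete, a hedged sketch of what user code sees, assuming the 0.20 module layout in which ``CParserError`` is kept as an alias of ``ParserError``::

    from io import StringIO
    import pandas as pd
    from pandas.io.common import CParserError, ParserError

    # The old name still works because it is bound to the same class.
    print(CParserError is ParserError)        # True

    # A ragged file still raises the (renamed) exception.
    try:
        pd.read_csv(StringIO("a,b\n1,2,3\n"))
    except ParserError as err:
        print("Expected 2 fields" in str(err))  # True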
@@ -272,7 +275,7 @@ cdef class TextReader:
parser_t *parser
object file_handle, na_fvalues
object true_values, false_values
- object dsource
+ object handle
bint na_filter, verbose, has_usecols, has_mi_columns
int parser_start
list clocks
@@ -297,8 +300,9 @@ cdef class TextReader:
object compression
object mangle_dupe_cols
object tupleize_cols
+ object usecols
list dtype_cast_order
- set noconvert, usecols
+ set noconvert
def __cinit__(self, source,
delimiter=b',',
@@ -434,7 +438,10 @@ cdef class TextReader:
# suboptimal
if usecols is not None:
self.has_usecols = 1
- self.usecols = set(usecols)
+ if callable(usecols):
+ self.usecols = usecols
+ else:
+ self.usecols = set(usecols)
# XXX
if skipfooter > 0:
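The callable-``usecols`` branches threaded through this class are the C-engine half of the new ``read_csv`` feature; roughly, usage looks like the following sketch (assuming pandas >= 0.20; the column names are illustrative)::

    from io import StringIO
    import pandas as pd

    data = "A,B,C\n1,2,3\n4,5,6\n"

    # Instead of a list of labels, a callable can be passed; it is evaluated
    # against each column name and the column is kept when it returns True.
    df = pd.read_csv(StringIO(data), usecols=lambda name: name in {"A", "C"})
    print(list(df.columns))   # ['A', 'C']
    print(df["C"].tolist())   # [3, 6]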
@@ -554,9 +561,9 @@ cdef class TextReader:
def close(self):
# we need to properly close an open derived
# filehandle here, e.g. and UTFRecoder
- if self.dsource is not None:
+ if self.handle is not None:
try:
- self.dsource.close()
+ self.handle.close()
except:
pass
@@ -570,7 +577,8 @@ cdef class TextReader:
if not QUOTE_MINIMAL <= quoting <= QUOTE_NONE:
raise TypeError('bad "quoting" value')
- if not isinstance(quote_char, (str, bytes)) and quote_char is not None:
+ if not isinstance(quote_char, (str, compat.text_type,
+ bytes)) and quote_char is not None:
dtype = type(quote_char).__name__
raise TypeError('"quotechar" must be string, '
'not {dtype}'.format(dtype=dtype))
@@ -640,6 +648,7 @@ cdef class TextReader:
else:
raise ValueError('Unrecognized compression type: %s' %
self.compression)
+ self.handle = source
if isinstance(source, basestring):
if not isinstance(source, bytes):
@@ -683,8 +692,6 @@ cdef class TextReader:
raise IOError('Expected file path name or file-like object,'
' got %s type' % type(source))
- self.dsource = source
-
cdef _get_header(self):
# header is now a list of lists, so field_count should use header[0]
@@ -698,7 +705,6 @@ cdef class TextReader:
cdef StringPath path = _string_path(self.c_encoding)
header = []
-
if self.parser.header_start >= 0:
# Header is in the file
@@ -714,12 +720,14 @@ cdef class TextReader:
start = self.parser.line_start[0]
# e.g., if header=3 and file only has 2 lines
- elif self.parser.lines < hr + 1:
+ elif (self.parser.lines < hr + 1
+ and not isinstance(self.orig_header, list)) or (
+ self.parser.lines < hr):
msg = self.orig_header
if isinstance(msg, list):
msg = "[%s], len of %d," % (
','.join([ str(m) for m in msg ]), len(msg))
- raise CParserError(
+ raise ParserError(
'Passed header=%s but only %d lines in file'
% (msg, self.parser.lines))
@@ -812,11 +820,12 @@ cdef class TextReader:
passed_count = len(header[0])
# if passed_count > field_count:
- # raise CParserError('Column names have %d fields, '
+ # raise ParserError('Column names have %d fields, '
# 'data has %d fields'
# % (passed_count, field_count))
- if self.has_usecols and self.allow_leading_cols:
+ if self.has_usecols and self.allow_leading_cols and \
+ not callable(self.usecols):
nuse = len(self.usecols)
if nuse == passed_count:
self.leading_cols = 0
@@ -937,7 +946,7 @@ cdef class TextReader:
raise_parser_error('Error tokenizing data', self.parser)
footer = self.skipfooter
- if self.parser_start == self.parser.lines:
+ if self.parser_start >= self.parser.lines:
raise StopIteration
self._end_clock('Tokenization')
@@ -982,7 +991,7 @@ cdef class TextReader:
Py_ssize_t i, nused
kh_str_t *na_hashset = NULL
int start, end
- object name, na_flist
+ object name, na_flist, col_dtype = None
bint na_filter = 0
Py_ssize_t num_cols
@@ -1004,7 +1013,7 @@ cdef class TextReader:
(num_cols >= self.parser.line_fields[i]) * num_cols
if self.table_width - self.leading_cols > num_cols:
- raise CParserError(
+ raise ParserError(
"Too many columns specified: expected %s and found %s" %
(self.table_width - self.leading_cols, num_cols))
@@ -1014,13 +1023,20 @@ cdef class TextReader:
if i < self.leading_cols:
# Pass through leading columns always
name = i
- elif self.usecols and nused == len(self.usecols):
+ elif self.usecols and not callable(self.usecols) and \
+ nused == len(self.usecols):
# Once we've gathered all requested columns, stop. GH5766
break
else:
name = self._get_column_name(i, nused)
- if self.has_usecols and not (i in self.usecols or
- name in self.usecols):
+ usecols = set()
+ if callable(self.usecols):
+ if self.usecols(name):
+ usecols = set([i])
+ else:
+ usecols = self.usecols
+ if self.has_usecols and not (i in usecols or
+ name in usecols):
continue
nused += 1
@@ -1038,14 +1054,34 @@ cdef class TextReader:
else:
na_filter = 0
+ col_dtype = None
+ if self.dtype is not None:
+ if isinstance(self.dtype, dict):
+ if name in self.dtype:
+ col_dtype = self.dtype[name]
+ elif i in self.dtype:
+ col_dtype = self.dtype[i]
+ else:
+ if self.dtype.names:
+ # structured array
+ col_dtype = np.dtype(self.dtype.descr[i][1])
+ else:
+ col_dtype = self.dtype
+
if conv:
+ if col_dtype is not None:
+ warnings.warn(("Both a converter and dtype were specified "
+ "for column {0} - only the converter will "
+ "be used").format(name), ParserWarning,
+ stacklevel=5)
results[i] = _apply_converter(conv, self.parser, i, start, end,
self.c_encoding)
continue
# Should return as the desired dtype (inferred or specified)
col_res, na_count = self._convert_tokens(
- i, start, end, name, na_filter, na_hashset, na_flist)
+ i, start, end, name, na_filter, na_hashset,
+ na_flist, col_dtype)
if na_filter:
self._free_na_set(na_hashset)
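For reference, a sketch of the interaction the new ``ParserWarning`` covers: when a converter and a dtype both target the same column, the converter wins and a warning is emitted (assuming pandas >= 0.20; data and names are illustrative)::

    import warnings
    from io import StringIO
    import pandas as pd

    data = "a,b\n1,2\n3,4\n"

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        df = pd.read_csv(StringIO(data),
                         dtype={"a": "int64"},
                         converters={"a": lambda x: int(x) * 10})

    # The converter result is kept; the dtype request for 'a' is ignored
    # and a ParserWarning points that out.
    print(df["a"].tolist())   # [10, 30]
    print(any("only the converter will be used" in str(w.message)
              for w in caught))   # True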
@@ -1059,7 +1095,7 @@ cdef class TextReader:
self.use_unsigned)
if col_res is None:
- raise CParserError('Unable to parse column %d' % i)
+ raise ParserError('Unable to parse column %d' % i)
results[i] = col_res
@@ -1070,32 +1106,17 @@ cdef class TextReader:
cdef inline _convert_tokens(self, Py_ssize_t i, int start, int end,
object name, bint na_filter,
kh_str_t *na_hashset,
- object na_flist):
- cdef:
- object col_dtype = None
-
- if self.dtype is not None:
- if isinstance(self.dtype, dict):
- if name in self.dtype:
- col_dtype = self.dtype[name]
- elif i in self.dtype:
- col_dtype = self.dtype[i]
- else:
- if self.dtype.names:
- # structured array
- col_dtype = np.dtype(self.dtype.descr[i][1])
- else:
- col_dtype = self.dtype
+ object na_flist, object col_dtype):
- if col_dtype is not None:
- col_res, na_count = self._convert_with_dtype(
- col_dtype, i, start, end, na_filter,
- 1, na_hashset, na_flist)
+ if col_dtype is not None:
+ col_res, na_count = self._convert_with_dtype(
+ col_dtype, i, start, end, na_filter,
+ 1, na_hashset, na_flist)
- # Fallback on the parse (e.g. we requested int dtype,
- # but its actually a float).
- if col_res is not None:
- return col_res, na_count
+ # Fallback on the parse (e.g. we requested int dtype,
+ # but its actually a float).
+ if col_res is not None:
+ return col_res, na_count
if i in self.noconvert:
return self._string_convert(i, start, end, na_filter, na_hashset)
@@ -1310,7 +1331,7 @@ def _is_file_like(obj):
if PY3:
import io
if isinstance(obj, io.TextIOWrapper):
- raise CParserError('Cannot handle open unicode files (yet)')
+ raise ParserError('Cannot handle open unicode files (yet)')
# BufferedReader is a byte reader for Python 3
file = io.BufferedReader
@@ -2015,7 +2036,7 @@ cdef raise_parser_error(object base, parser_t *parser):
else:
message += 'no error message set'
- raise CParserError(message)
+ raise ParserError(message)
def _concatenate_chunks(list chunks):
diff --git a/pandas/sparse/array.py b/pandas/sparse/array.py
index 8420371d05e02..da13726e88a14 100644
--- a/pandas/sparse/array.py
+++ b/pandas/sparse/array.py
@@ -5,6 +5,7 @@
# pylint: disable=E1101,E1103,W0231
import numpy as np
+import warnings
import pandas as pd
from pandas.core.base import PandasObject
@@ -381,8 +382,22 @@ def get_values(self, fill=None):
def to_dense(self, fill=None):
"""
- Convert SparseSeries to (dense) Series
+ Convert SparseArray to a NumPy array.
+
+ Parameters
+ ----------
+        fill : float, default None
+ DEPRECATED: this argument will be removed in a future version
+ because it is not respected by this function.
+
+ Returns
+ -------
+ arr : NumPy array
"""
+ if fill is not None:
+ warnings.warn(("The 'fill' parameter has been deprecated and "
+ "will be removed in a future version."),
+ FutureWarning, stacklevel=2)
return self.values
def __iter__(self):
@@ -532,8 +547,8 @@ def astype(self, dtype=None, copy=True):
def copy(self, deep=True):
"""
- Make a copy of the SparseSeries. Only the actual sparse values need to
- be copied
+ Make a copy of the SparseArray. Only the actual sparse values need to
+ be copied.
"""
if deep:
values = self.sp_values.copy()
@@ -544,9 +559,9 @@ def copy(self, deep=True):
def count(self):
"""
- Compute sum of non-NA/null observations in SparseSeries. If the
+ Compute sum of non-NA/null observations in SparseArray. If the
fill_value is not NaN, the "sparse" locations will be included in the
- observation count
+ observation count.
Returns
-------
@@ -605,19 +620,30 @@ def sum(self, axis=0, *args, **kwargs):
def cumsum(self, axis=0, *args, **kwargs):
"""
- Cumulative sum of values. Preserves locations of NaN values
+ Cumulative sum of non-NA/null values.
+
+        When performing the cumulative summation, any NA/null values will
+ be skipped. The resulting SparseArray will preserve the locations of
+ NaN values, but the fill value will be `np.nan` regardless.
+
+ Parameters
+ ----------
+ axis : int or None
+ Axis over which to perform the cumulative summation. If None,
+            perform the cumulative summation over the flattened array.
Returns
-------
- cumsum : Series
+ cumsum : SparseArray
"""
nv.validate_cumsum(args, kwargs)
- # TODO: gh-12855 - return a SparseArray here
- if notnull(self.fill_value):
- return self.to_dense().cumsum()
+ if axis is not None and axis >= self.ndim: # Mimic ndarray behaviour.
+ raise ValueError("axis(={axis}) out of bounds".format(axis=axis))
+
+ if not self._null_fill_value:
+ return SparseArray(self.to_dense()).cumsum()
- # TODO: what if sp_values contains NaN??
return SparseArray(self.sp_values.cumsum(), sparse_index=self.sp_index,
fill_value=self.fill_value)
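A hedged sketch of the resulting ``SparseArray.cumsum`` contract described in the docstring above, NA locations preserved and a SparseArray always returned (assuming pandas >= 0.20)::

    import numpy as np
    import pandas as pd

    arr = pd.SparseArray([1.0, 2.0, np.nan, 4.0, 5.0])

    out = arr.cumsum()
    print(type(out).__name__)   # SparseArray
    print(out.to_dense())       # [ 1.  3. nan  7. 12.]

    # A non-null fill value is densified first, so the running total is the
    # same, but the result's fill value becomes NaN.
    out2 = pd.SparseArray([1.0, 2.0, 0.0, 4.0], fill_value=0).cumsum()
    print(out2.to_dense())      # [1. 3. 3. 7.]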
diff --git a/pandas/sparse/frame.py b/pandas/sparse/frame.py
index 8eeff045d1fac..56020e32b9963 100644
--- a/pandas/sparse/frame.py
+++ b/pandas/sparse/frame.py
@@ -302,7 +302,21 @@ def fillna(self, value=None, method=None, axis=0, inplace=False,
# ----------------------------------------------------------------------
# Support different internal representation of SparseDataFrame
- def _sanitize_column(self, key, value):
+ def _sanitize_column(self, key, value, **kwargs):
+ """
+ Creates a new SparseArray from the input value.
+
+ Parameters
+ ----------
+ key : object
+ value : scalar, Series, or array-like
+ kwargs : dict
+
+ Returns
+ -------
+ sanitized_column : SparseArray
+
+ """
sp_maker = lambda x, index=None: SparseArray(
x, index=index, fill_value=self._default_fill_value,
kind=self._default_kind)
diff --git a/pandas/sparse/series.py b/pandas/sparse/series.py
index ad9168890b8f2..d6bc892921c42 100644
--- a/pandas/sparse/series.py
+++ b/pandas/sparse/series.py
@@ -528,9 +528,24 @@ def _set_values(self, key, value):
def to_dense(self, sparse_only=False):
"""
- Convert SparseSeries to (dense) Series
+ Convert SparseSeries to a Series.
+
+ Parameters
+ ----------
+        sparse_only : bool, default False
+            DEPRECATED: this argument will be removed in a future version.
+
+            If True, return just the stored (non-fill) values; if False,
+            return the dense version of `self.values`.
+
+ Returns
+ -------
+ s : Series
"""
if sparse_only:
+ warnings.warn(("The 'sparse_only' parameter has been deprecated "
+ "and will be removed in a future version."),
+ FutureWarning, stacklevel=2)
int_index = self.sp_index.to_int_index()
index = self.index.take(int_index.indices)
return Series(self.sp_values, index=index, name=self.name)
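A short sketch of the deprecation wired in above (assuming pandas >= 0.20; the series is illustrative)::

    import warnings
    import pandas as pd

    sp = pd.Series([1.0, None, 3.0]).to_sparse()

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        sp.to_dense(sparse_only=True)   # only the stored values come back

    print(any(issubclass(w.category, FutureWarning) for w in caught))   # True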
@@ -615,21 +630,29 @@ def take(self, indices, axis=0, convert=True, *args, **kwargs):
def cumsum(self, axis=0, *args, **kwargs):
"""
- Cumulative sum of values. Preserves locations of NaN values
+ Cumulative sum of non-NA/null values.
+
+        When performing the cumulative summation, any NA/null values will
+ be skipped. The resulting SparseSeries will preserve the locations of
+ NaN values, but the fill value will be `np.nan` regardless.
+
+ Parameters
+ ----------
+ axis : {0}
Returns
-------
- cumsum : SparseSeries if `self` has a null `fill_value` and a
- generic Series otherwise
+ cumsum : SparseSeries
"""
nv.validate_cumsum(args, kwargs)
- new_array = SparseArray.cumsum(self.values)
- if isinstance(new_array, SparseArray):
- return self._constructor(
- new_array, index=self.index,
- sparse_index=new_array.sp_index).__finalize__(self)
- # TODO: gh-12855 - return a SparseSeries here
- return Series(new_array, index=self.index).__finalize__(self)
+ if axis is not None:
+ axis = self._get_axis_number(axis)
+
+ new_array = self.values.cumsum()
+
+ return self._constructor(
+ new_array, index=self.index,
+ sparse_index=new_array.sp_index).__finalize__(self)
@Appender(generic._shared_docs['isnull'])
def isnull(self):
diff --git a/pandas/sparse/tests/test_array.py b/pandas/sparse/tests/test_array.py
index dd86e9e791e5e..bd896ae5b86d9 100644
--- a/pandas/sparse/tests/test_array.py
+++ b/pandas/sparse/tests/test_array.py
@@ -182,7 +182,7 @@ def test_bad_take(self):
self.assertRaises(IndexError, lambda: self.arr.take(-11))
def test_take_invalid_kwargs(self):
- msg = "take\(\) got an unexpected keyword argument 'foo'"
+ msg = r"take\(\) got an unexpected keyword argument 'foo'"
tm.assertRaisesRegexp(TypeError, msg, self.arr.take,
[2, 3], foo=2)
@@ -361,7 +361,7 @@ def test_astype(self):
arr.astype('i8')
arr = SparseArray([0, np.nan, 0, 1], fill_value=0)
- msg = "Cannot convert NA to integer"
+        msg = r'Cannot convert non-finite values \(NA or inf\) to integer'
with tm.assertRaisesRegexp(ValueError, msg):
arr.astype('i8')
@@ -453,6 +453,11 @@ def test_to_dense(self):
res = SparseArray(vals, fill_value=0).to_dense()
tm.assert_numpy_array_equal(res, vals)
+ # see gh-14647
+ with tm.assert_produces_warning(FutureWarning,
+ check_stacklevel=False):
+ SparseArray(vals).to_dense(fill=2)
+
def test_getitem(self):
def _checkit(i):
assert_almost_equal(self.arr[i], self.arr.values[i])
@@ -683,46 +688,57 @@ def test_numpy_sum(self):
SparseArray(data), out=out)
def test_cumsum(self):
- data = np.arange(10).astype(float)
- out = SparseArray(data).cumsum()
- expected = SparseArray(data.cumsum())
- tm.assert_sp_array_equal(out, expected)
+ non_null_data = np.array([1, 2, 3, 4, 5], dtype=float)
+ non_null_expected = SparseArray(non_null_data.cumsum())
- # TODO: gh-12855 - return a SparseArray here
- data[5] = np.nan
- out = SparseArray(data, fill_value=2).cumsum()
- self.assertNotIsInstance(out, SparseArray)
- tm.assert_numpy_array_equal(out, data.cumsum())
+ null_data = np.array([1, 2, np.nan, 4, 5], dtype=float)
+ null_expected = SparseArray(np.array([1.0, 3.0, np.nan, 7.0, 12.0]))
+
+ for data, expected in [
+ (null_data, null_expected),
+ (non_null_data, non_null_expected)
+ ]:
+ out = SparseArray(data).cumsum()
+ tm.assert_sp_array_equal(out, expected)
- out = SparseArray(data, fill_value=np.nan).cumsum()
- expected = SparseArray(np.array([
- 0, 1, 3, 6, 10, np.nan, 16, 23, 31, 40]))
- tm.assert_sp_array_equal(out, expected)
+ out = SparseArray(data, fill_value=np.nan).cumsum()
+ tm.assert_sp_array_equal(out, expected)
+
+ out = SparseArray(data, fill_value=2).cumsum()
+ tm.assert_sp_array_equal(out, expected)
+
+ axis = 1 # SparseArray currently 1-D, so only axis = 0 is valid.
+ msg = "axis\(={axis}\) out of bounds".format(axis=axis)
+ with tm.assertRaisesRegexp(ValueError, msg):
+ SparseArray(data).cumsum(axis=axis)
def test_numpy_cumsum(self):
- data = np.arange(10).astype(float)
- out = np.cumsum(SparseArray(data))
- expected = SparseArray(data.cumsum())
- tm.assert_sp_array_equal(out, expected)
+ non_null_data = np.array([1, 2, 3, 4, 5], dtype=float)
+ non_null_expected = SparseArray(non_null_data.cumsum())
- # TODO: gh-12855 - return a SparseArray here
- data[5] = np.nan
- out = np.cumsum(SparseArray(data, fill_value=2))
- self.assertNotIsInstance(out, SparseArray)
- tm.assert_numpy_array_equal(out, data.cumsum())
+ null_data = np.array([1, 2, np.nan, 4, 5], dtype=float)
+ null_expected = SparseArray(np.array([1.0, 3.0, np.nan, 7.0, 12.0]))
- out = np.cumsum(SparseArray(data, fill_value=np.nan))
- expected = SparseArray(np.array([
- 0, 1, 3, 6, 10, np.nan, 16, 23, 31, 40]))
- tm.assert_sp_array_equal(out, expected)
+ for data, expected in [
+ (null_data, null_expected),
+ (non_null_data, non_null_expected)
+ ]:
+ out = np.cumsum(SparseArray(data))
+ tm.assert_sp_array_equal(out, expected)
- msg = "the 'dtype' parameter is not supported"
- tm.assertRaisesRegexp(ValueError, msg, np.cumsum,
- SparseArray(data), dtype=np.int64)
+ out = np.cumsum(SparseArray(data, fill_value=np.nan))
+ tm.assert_sp_array_equal(out, expected)
- msg = "the 'out' parameter is not supported"
- tm.assertRaisesRegexp(ValueError, msg, np.cumsum,
- SparseArray(data), out=out)
+ out = np.cumsum(SparseArray(data, fill_value=2))
+ tm.assert_sp_array_equal(out, expected)
+
+ msg = "the 'dtype' parameter is not supported"
+ tm.assertRaisesRegexp(ValueError, msg, np.cumsum,
+ SparseArray(data), dtype=np.int64)
+
+ msg = "the 'out' parameter is not supported"
+ tm.assertRaisesRegexp(ValueError, msg, np.cumsum,
+ SparseArray(data), out=out)
def test_mean(self):
data = np.arange(10).astype(float)
diff --git a/pandas/sparse/tests/test_series.py b/pandas/sparse/tests/test_series.py
index de8c63df9c9e6..14339ab388a5d 100644
--- a/pandas/sparse/tests/test_series.py
+++ b/pandas/sparse/tests/test_series.py
@@ -161,7 +161,10 @@ def test_sparse_to_dense(self):
series = self.bseries.to_dense()
tm.assert_series_equal(series, Series(arr, name='bseries'))
- series = self.bseries.to_dense(sparse_only=True)
+ # see gh-14647
+ with tm.assert_produces_warning(FutureWarning,
+ check_stacklevel=False):
+ series = self.bseries.to_dense(sparse_only=True)
indexer = np.isfinite(arr)
exp = Series(arr[indexer], index=index[indexer], name='bseries')
@@ -1328,21 +1331,22 @@ def test_cumsum(self):
expected = SparseSeries(self.bseries.to_dense().cumsum())
tm.assert_sp_series_equal(result, expected)
- # TODO: gh-12855 - return a SparseSeries here
result = self.zbseries.cumsum()
expected = self.zbseries.to_dense().cumsum()
- self.assertNotIsInstance(result, SparseSeries)
tm.assert_series_equal(result, expected)
+ axis = 1 # Series is 1-D, so only axis = 0 is valid.
+ msg = "No axis named {axis}".format(axis=axis)
+ with tm.assertRaisesRegexp(ValueError, msg):
+ self.bseries.cumsum(axis=axis)
+
def test_numpy_cumsum(self):
result = np.cumsum(self.bseries)
expected = SparseSeries(self.bseries.to_dense().cumsum())
tm.assert_sp_series_equal(result, expected)
- # TODO: gh-12855 - return a SparseSeries here
result = np.cumsum(self.zbseries)
expected = self.zbseries.to_dense().cumsum()
- self.assertNotIsInstance(result, SparseSeries)
tm.assert_series_equal(result, expected)
msg = "the 'dtype' parameter is not supported"
diff --git a/pandas/src/algos_common_helper.pxi b/pandas/src/algos_common_helper.pxi
deleted file mode 100644
index 9dede87e0c15b..0000000000000
--- a/pandas/src/algos_common_helper.pxi
+++ /dev/null
@@ -1,2764 +0,0 @@
-"""
-Template for each `dtype` helper function using 1-d template
-
-# 1-d template
-- map_indices
-- pad
-- pad_1d
-- pad_2d
-- backfill
-- backfill_1d
-- backfill_2d
-- is_monotonic
-- arrmap
-
-WARNING: DO NOT edit .pxi FILE directly, .pxi is generated from .pxi.in
-"""
-
-#----------------------------------------------------------------------
-# 1-d template
-#----------------------------------------------------------------------
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cpdef map_indices_float64(ndarray[float64_t] index):
- """
- Produce a dict mapping the values of the input array to their respective
- locations.
-
- Example:
- array(['hi', 'there']) --> {'hi' : 0 , 'there' : 1}
-
- Better to do this with Cython because of the enormous speed boost.
- """
- cdef Py_ssize_t i, length
- cdef dict result = {}
-
- length = len(index)
-
- for i in range(length):
- result[index[i]] = i
-
- return result
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def pad_float64(ndarray[float64_t] old, ndarray[float64_t] new,
- limit=None):
- cdef Py_ssize_t i, j, nleft, nright
- cdef ndarray[int64_t, ndim=1] indexer
- cdef float64_t cur, next
- cdef int lim, fill_count = 0
-
- nleft = len(old)
- nright = len(new)
- indexer = np.empty(nright, dtype=np.int64)
- indexer.fill(-1)
-
- if limit is None:
- lim = nright
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- if nleft == 0 or nright == 0 or new[nright - 1] < old[0]:
- return indexer
-
- i = j = 0
-
- cur = old[0]
-
- while j <= nright - 1 and new[j] < cur:
- j += 1
-
- while True:
- if j == nright:
- break
-
- if i == nleft - 1:
- while j < nright:
- if new[j] == cur:
- indexer[j] = i
- elif new[j] > cur and fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j += 1
- break
-
- next = old[i + 1]
-
- while j < nright and cur <= new[j] < next:
- if new[j] == cur:
- indexer[j] = i
- elif fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j += 1
-
- fill_count = 0
- i += 1
- cur = next
-
- return indexer
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def pad_inplace_float64(ndarray[float64_t] values,
- ndarray[uint8_t, cast=True] mask,
- limit=None):
- cdef Py_ssize_t i, N
- cdef float64_t val
- cdef int lim, fill_count = 0
-
- N = len(values)
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- val = values[0]
- for i in range(N):
- if mask[i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[i] = val
- else:
- fill_count = 0
- val = values[i]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def pad_2d_inplace_float64(ndarray[float64_t, ndim=2] values,
- ndarray[uint8_t, ndim=2] mask,
- limit=None):
- cdef Py_ssize_t i, j, N, K
- cdef float64_t val
- cdef int lim, fill_count = 0
-
-    K, N = (<object> values).shape
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- for j in range(K):
- fill_count = 0
- val = values[j, 0]
- for i in range(N):
- if mask[j, i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[j, i] = val
- else:
- fill_count = 0
- val = values[j, i]
-
-"""
-Backfilling logic for generating fill vector
-
-Diagram of what's going on
-
-Old New Fill vector Mask
- . 0 1
- . 0 1
- . 0 1
-A A 0 1
- . 1 1
- . 1 1
- . 1 1
- . 1 1
- . 1 1
-B B 1 1
- . 2 1
- . 2 1
- . 2 1
-C C 2 1
- . 0
- . 0
-D
-"""
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def backfill_float64(ndarray[float64_t] old, ndarray[float64_t] new,
- limit=None):
- cdef Py_ssize_t i, j, nleft, nright
- cdef ndarray[int64_t, ndim=1] indexer
- cdef float64_t cur, prev
- cdef int lim, fill_count = 0
-
- nleft = len(old)
- nright = len(new)
- indexer = np.empty(nright, dtype=np.int64)
- indexer.fill(-1)
-
- if limit is None:
- lim = nright
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- if nleft == 0 or nright == 0 or new[0] > old[nleft - 1]:
- return indexer
-
- i = nleft - 1
- j = nright - 1
-
- cur = old[nleft - 1]
-
- while j >= 0 and new[j] > cur:
- j -= 1
-
- while True:
- if j < 0:
- break
-
- if i == 0:
- while j >= 0:
- if new[j] == cur:
- indexer[j] = i
- elif new[j] < cur and fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j -= 1
- break
-
- prev = old[i - 1]
-
- while j >= 0 and prev < new[j] <= cur:
- if new[j] == cur:
- indexer[j] = i
- elif new[j] < cur and fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j -= 1
-
- fill_count = 0
- i -= 1
- cur = prev
-
- return indexer
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def backfill_inplace_float64(ndarray[float64_t] values,
- ndarray[uint8_t, cast=True] mask,
- limit=None):
- cdef Py_ssize_t i, N
- cdef float64_t val
- cdef int lim, fill_count = 0
-
- N = len(values)
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- val = values[N - 1]
- for i in range(N - 1, -1, -1):
- if mask[i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[i] = val
- else:
- fill_count = 0
- val = values[i]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def backfill_2d_inplace_float64(ndarray[float64_t, ndim=2] values,
- ndarray[uint8_t, ndim=2] mask,
- limit=None):
- cdef Py_ssize_t i, j, N, K
- cdef float64_t val
- cdef int lim, fill_count = 0
-
-    K, N = (<object> values).shape
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- for j in range(K):
- fill_count = 0
- val = values[j, N - 1]
- for i in range(N - 1, -1, -1):
- if mask[j, i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[j, i] = val
- else:
- fill_count = 0
- val = values[j, i]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def is_monotonic_float64(ndarray[float64_t] arr, bint timelike):
- """
- Returns
- -------
- is_monotonic_inc, is_monotonic_dec, is_unique
- """
- cdef:
- Py_ssize_t i, n
- float64_t prev, cur
- bint is_monotonic_inc = 1
- bint is_monotonic_dec = 1
- bint is_unique = 1
-
- n = len(arr)
-
- if n == 1:
- if arr[0] != arr[0] or (timelike and arr[0] == iNaT):
- # single value is NaN
- return False, False, True
- else:
- return True, True, True
- elif n < 2:
- return True, True, True
-
- if timelike and arr[0] == iNaT:
- return False, False, True
-
- with nogil:
- prev = arr[0]
- for i in range(1, n):
- cur = arr[i]
- if timelike and cur == iNaT:
- is_monotonic_inc = 0
- is_monotonic_dec = 0
- break
- if cur < prev:
- is_monotonic_inc = 0
- elif cur > prev:
- is_monotonic_dec = 0
- elif cur == prev:
- is_unique = 0
- else:
- # cur or prev is NaN
- is_monotonic_inc = 0
- is_monotonic_dec = 0
- break
- if not is_monotonic_inc and not is_monotonic_dec:
- is_monotonic_inc = 0
- is_monotonic_dec = 0
- break
- prev = cur
- return is_monotonic_inc, is_monotonic_dec, \
- is_unique and (is_monotonic_inc or is_monotonic_dec)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def arrmap_float64(ndarray[float64_t] index, object func):
- cdef Py_ssize_t length = index.shape[0]
- cdef Py_ssize_t i = 0
-
- cdef ndarray[object] result = np.empty(length, dtype=np.object_)
-
- from pandas.lib import maybe_convert_objects
-
- for i in range(length):
- result[i] = func(index[i])
-
- return maybe_convert_objects(result)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cpdef map_indices_float32(ndarray[float32_t] index):
- """
- Produce a dict mapping the values of the input array to their respective
- locations.
-
- Example:
- array(['hi', 'there']) --> {'hi' : 0 , 'there' : 1}
-
- Better to do this with Cython because of the enormous speed boost.
- """
- cdef Py_ssize_t i, length
- cdef dict result = {}
-
- length = len(index)
-
- for i in range(length):
- result[index[i]] = i
-
- return result
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def pad_float32(ndarray[float32_t] old, ndarray[float32_t] new,
- limit=None):
- cdef Py_ssize_t i, j, nleft, nright
- cdef ndarray[int64_t, ndim=1] indexer
- cdef float32_t cur, next
- cdef int lim, fill_count = 0
-
- nleft = len(old)
- nright = len(new)
- indexer = np.empty(nright, dtype=np.int64)
- indexer.fill(-1)
-
- if limit is None:
- lim = nright
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- if nleft == 0 or nright == 0 or new[nright - 1] < old[0]:
- return indexer
-
- i = j = 0
-
- cur = old[0]
-
- while j <= nright - 1 and new[j] < cur:
- j += 1
-
- while True:
- if j == nright:
- break
-
- if i == nleft - 1:
- while j < nright:
- if new[j] == cur:
- indexer[j] = i
- elif new[j] > cur and fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j += 1
- break
-
- next = old[i + 1]
-
- while j < nright and cur <= new[j] < next:
- if new[j] == cur:
- indexer[j] = i
- elif fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j += 1
-
- fill_count = 0
- i += 1
- cur = next
-
- return indexer
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def pad_inplace_float32(ndarray[float32_t] values,
- ndarray[uint8_t, cast=True] mask,
- limit=None):
- cdef Py_ssize_t i, N
- cdef float32_t val
- cdef int lim, fill_count = 0
-
- N = len(values)
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- val = values[0]
- for i in range(N):
- if mask[i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[i] = val
- else:
- fill_count = 0
- val = values[i]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def pad_2d_inplace_float32(ndarray[float32_t, ndim=2] values,
- ndarray[uint8_t, ndim=2] mask,
- limit=None):
- cdef Py_ssize_t i, j, N, K
- cdef float32_t val
- cdef int lim, fill_count = 0
-
-    K, N = (<object> values).shape
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- for j in range(K):
- fill_count = 0
- val = values[j, 0]
- for i in range(N):
- if mask[j, i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[j, i] = val
- else:
- fill_count = 0
- val = values[j, i]
-
-"""
-Backfilling logic for generating fill vector
-
-Diagram of what's going on
-
-Old New Fill vector Mask
- . 0 1
- . 0 1
- . 0 1
-A A 0 1
- . 1 1
- . 1 1
- . 1 1
- . 1 1
- . 1 1
-B B 1 1
- . 2 1
- . 2 1
- . 2 1
-C C 2 1
- . 0
- . 0
-D
-"""
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def backfill_float32(ndarray[float32_t] old, ndarray[float32_t] new,
- limit=None):
- cdef Py_ssize_t i, j, nleft, nright
- cdef ndarray[int64_t, ndim=1] indexer
- cdef float32_t cur, prev
- cdef int lim, fill_count = 0
-
- nleft = len(old)
- nright = len(new)
- indexer = np.empty(nright, dtype=np.int64)
- indexer.fill(-1)
-
- if limit is None:
- lim = nright
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- if nleft == 0 or nright == 0 or new[0] > old[nleft - 1]:
- return indexer
-
- i = nleft - 1
- j = nright - 1
-
- cur = old[nleft - 1]
-
- while j >= 0 and new[j] > cur:
- j -= 1
-
- while True:
- if j < 0:
- break
-
- if i == 0:
- while j >= 0:
- if new[j] == cur:
- indexer[j] = i
- elif new[j] < cur and fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j -= 1
- break
-
- prev = old[i - 1]
-
- while j >= 0 and prev < new[j] <= cur:
- if new[j] == cur:
- indexer[j] = i
- elif new[j] < cur and fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j -= 1
-
- fill_count = 0
- i -= 1
- cur = prev
-
- return indexer
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def backfill_inplace_float32(ndarray[float32_t] values,
- ndarray[uint8_t, cast=True] mask,
- limit=None):
- cdef Py_ssize_t i, N
- cdef float32_t val
- cdef int lim, fill_count = 0
-
- N = len(values)
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- val = values[N - 1]
- for i in range(N - 1, -1, -1):
- if mask[i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[i] = val
- else:
- fill_count = 0
- val = values[i]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def backfill_2d_inplace_float32(ndarray[float32_t, ndim=2] values,
- ndarray[uint8_t, ndim=2] mask,
- limit=None):
- cdef Py_ssize_t i, j, N, K
- cdef float32_t val
- cdef int lim, fill_count = 0
-
-    K, N = (<object> values).shape
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- for j in range(K):
- fill_count = 0
- val = values[j, N - 1]
- for i in range(N - 1, -1, -1):
- if mask[j, i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[j, i] = val
- else:
- fill_count = 0
- val = values[j, i]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def is_monotonic_float32(ndarray[float32_t] arr, bint timelike):
- """
- Returns
- -------
- is_monotonic_inc, is_monotonic_dec, is_unique
- """
- cdef:
- Py_ssize_t i, n
- float32_t prev, cur
- bint is_monotonic_inc = 1
- bint is_monotonic_dec = 1
- bint is_unique = 1
-
- n = len(arr)
-
- if n == 1:
- if arr[0] != arr[0] or (timelike and arr[0] == iNaT):
- # single value is NaN
- return False, False, True
- else:
- return True, True, True
- elif n < 2:
- return True, True, True
-
- if timelike and arr[0] == iNaT:
- return False, False, True
-
- with nogil:
- prev = arr[0]
- for i in range(1, n):
- cur = arr[i]
- if timelike and cur == iNaT:
- is_monotonic_inc = 0
- is_monotonic_dec = 0
- break
- if cur < prev:
- is_monotonic_inc = 0
- elif cur > prev:
- is_monotonic_dec = 0
- elif cur == prev:
- is_unique = 0
- else:
- # cur or prev is NaN
- is_monotonic_inc = 0
- is_monotonic_dec = 0
- break
- if not is_monotonic_inc and not is_monotonic_dec:
- is_monotonic_inc = 0
- is_monotonic_dec = 0
- break
- prev = cur
- return is_monotonic_inc, is_monotonic_dec, \
- is_unique and (is_monotonic_inc or is_monotonic_dec)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def arrmap_float32(ndarray[float32_t] index, object func):
- cdef Py_ssize_t length = index.shape[0]
- cdef Py_ssize_t i = 0
-
- cdef ndarray[object] result = np.empty(length, dtype=np.object_)
-
- from pandas.lib import maybe_convert_objects
-
- for i in range(length):
- result[i] = func(index[i])
-
- return maybe_convert_objects(result)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cpdef map_indices_object(ndarray[object] index):
- """
- Produce a dict mapping the values of the input array to their respective
- locations.
-
- Example:
- array(['hi', 'there']) --> {'hi' : 0 , 'there' : 1}
-
- Better to do this with Cython because of the enormous speed boost.
- """
- cdef Py_ssize_t i, length
- cdef dict result = {}
-
- length = len(index)
-
- for i in range(length):
- result[index[i]] = i
-
- return result
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def pad_object(ndarray[object] old, ndarray[object] new,
- limit=None):
- cdef Py_ssize_t i, j, nleft, nright
- cdef ndarray[int64_t, ndim=1] indexer
- cdef object cur, next
- cdef int lim, fill_count = 0
-
- nleft = len(old)
- nright = len(new)
- indexer = np.empty(nright, dtype=np.int64)
- indexer.fill(-1)
-
- if limit is None:
- lim = nright
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- if nleft == 0 or nright == 0 or new[nright - 1] < old[0]:
- return indexer
-
- i = j = 0
-
- cur = old[0]
-
- while j <= nright - 1 and new[j] < cur:
- j += 1
-
- while True:
- if j == nright:
- break
-
- if i == nleft - 1:
- while j < nright:
- if new[j] == cur:
- indexer[j] = i
- elif new[j] > cur and fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j += 1
- break
-
- next = old[i + 1]
-
- while j < nright and cur <= new[j] < next:
- if new[j] == cur:
- indexer[j] = i
- elif fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j += 1
-
- fill_count = 0
- i += 1
- cur = next
-
- return indexer
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def pad_inplace_object(ndarray[object] values,
- ndarray[uint8_t, cast=True] mask,
- limit=None):
- cdef Py_ssize_t i, N
- cdef object val
- cdef int lim, fill_count = 0
-
- N = len(values)
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- val = values[0]
- for i in range(N):
- if mask[i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[i] = val
- else:
- fill_count = 0
- val = values[i]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def pad_2d_inplace_object(ndarray[object, ndim=2] values,
- ndarray[uint8_t, ndim=2] mask,
- limit=None):
- cdef Py_ssize_t i, j, N, K
- cdef object val
- cdef int lim, fill_count = 0
-
-    K, N = (<object> values).shape
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- for j in range(K):
- fill_count = 0
- val = values[j, 0]
- for i in range(N):
- if mask[j, i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[j, i] = val
- else:
- fill_count = 0
- val = values[j, i]
-
-"""
-Backfilling logic for generating fill vector
-
-Diagram of what's going on
-
-Old New Fill vector Mask
- . 0 1
- . 0 1
- . 0 1
-A A 0 1
- . 1 1
- . 1 1
- . 1 1
- . 1 1
- . 1 1
-B B 1 1
- . 2 1
- . 2 1
- . 2 1
-C C 2 1
- . 0
- . 0
-D
-"""
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def backfill_object(ndarray[object] old, ndarray[object] new,
- limit=None):
- cdef Py_ssize_t i, j, nleft, nright
- cdef ndarray[int64_t, ndim=1] indexer
- cdef object cur, prev
- cdef int lim, fill_count = 0
-
- nleft = len(old)
- nright = len(new)
- indexer = np.empty(nright, dtype=np.int64)
- indexer.fill(-1)
-
- if limit is None:
- lim = nright
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- if nleft == 0 or nright == 0 or new[0] > old[nleft - 1]:
- return indexer
-
- i = nleft - 1
- j = nright - 1
-
- cur = old[nleft - 1]
-
- while j >= 0 and new[j] > cur:
- j -= 1
-
- while True:
- if j < 0:
- break
-
- if i == 0:
- while j >= 0:
- if new[j] == cur:
- indexer[j] = i
- elif new[j] < cur and fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j -= 1
- break
-
- prev = old[i - 1]
-
- while j >= 0 and prev < new[j] <= cur:
- if new[j] == cur:
- indexer[j] = i
- elif new[j] < cur and fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j -= 1
-
- fill_count = 0
- i -= 1
- cur = prev
-
- return indexer
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def backfill_inplace_object(ndarray[object] values,
- ndarray[uint8_t, cast=True] mask,
- limit=None):
- cdef Py_ssize_t i, N
- cdef object val
- cdef int lim, fill_count = 0
-
- N = len(values)
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- val = values[N - 1]
- for i in range(N - 1, -1, -1):
- if mask[i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[i] = val
- else:
- fill_count = 0
- val = values[i]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def backfill_2d_inplace_object(ndarray[object, ndim=2] values,
- ndarray[uint8_t, ndim=2] mask,
- limit=None):
- cdef Py_ssize_t i, j, N, K
- cdef object val
- cdef int lim, fill_count = 0
-
-    K, N = (<object> values).shape
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- for j in range(K):
- fill_count = 0
- val = values[j, N - 1]
- for i in range(N - 1, -1, -1):
- if mask[j, i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[j, i] = val
- else:
- fill_count = 0
- val = values[j, i]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def is_monotonic_object(ndarray[object] arr, bint timelike):
- """
- Returns
- -------
- is_monotonic_inc, is_monotonic_dec, is_unique
- """
- cdef:
- Py_ssize_t i, n
- object prev, cur
- bint is_monotonic_inc = 1
- bint is_monotonic_dec = 1
- bint is_unique = 1
-
- n = len(arr)
-
- if n == 1:
- if arr[0] != arr[0] or (timelike and arr[0] == iNaT):
- # single value is NaN
- return False, False, True
- else:
- return True, True, True
- elif n < 2:
- return True, True, True
-
- if timelike and arr[0] == iNaT:
- return False, False, True
-
-
- prev = arr[0]
- for i in range(1, n):
- cur = arr[i]
- if timelike and cur == iNaT:
- is_monotonic_inc = 0
- is_monotonic_dec = 0
- break
- if cur < prev:
- is_monotonic_inc = 0
- elif cur > prev:
- is_monotonic_dec = 0
- elif cur == prev:
- is_unique = 0
- else:
- # cur or prev is NaN
- is_monotonic_inc = 0
- is_monotonic_dec = 0
- break
- if not is_monotonic_inc and not is_monotonic_dec:
- is_monotonic_inc = 0
- is_monotonic_dec = 0
- break
- prev = cur
- return is_monotonic_inc, is_monotonic_dec, \
- is_unique and (is_monotonic_inc or is_monotonic_dec)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def arrmap_object(ndarray[object] index, object func):
- cdef Py_ssize_t length = index.shape[0]
- cdef Py_ssize_t i = 0
-
- cdef ndarray[object] result = np.empty(length, dtype=np.object_)
-
- from pandas.lib import maybe_convert_objects
-
- for i in range(length):
- result[i] = func(index[i])
-
- return maybe_convert_objects(result)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cpdef map_indices_int32(ndarray[int32_t] index):
- """
- Produce a dict mapping the values of the input array to their respective
- locations.
-
- Example:
- array(['hi', 'there']) --> {'hi' : 0 , 'there' : 1}
-
- Better to do this with Cython because of the enormous speed boost.
- """
- cdef Py_ssize_t i, length
- cdef dict result = {}
-
- length = len(index)
-
- for i in range(length):
- result[index[i]] = i
-
- return result
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def pad_int32(ndarray[int32_t] old, ndarray[int32_t] new,
- limit=None):
- cdef Py_ssize_t i, j, nleft, nright
- cdef ndarray[int64_t, ndim=1] indexer
- cdef int32_t cur, next
- cdef int lim, fill_count = 0
-
- nleft = len(old)
- nright = len(new)
- indexer = np.empty(nright, dtype=np.int64)
- indexer.fill(-1)
-
- if limit is None:
- lim = nright
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- if nleft == 0 or nright == 0 or new[nright - 1] < old[0]:
- return indexer
-
- i = j = 0
-
- cur = old[0]
-
- while j <= nright - 1 and new[j] < cur:
- j += 1
-
- while True:
- if j == nright:
- break
-
- if i == nleft - 1:
- while j < nright:
- if new[j] == cur:
- indexer[j] = i
- elif new[j] > cur and fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j += 1
- break
-
- next = old[i + 1]
-
- while j < nright and cur <= new[j] < next:
- if new[j] == cur:
- indexer[j] = i
- elif fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j += 1
-
- fill_count = 0
- i += 1
- cur = next
-
- return indexer
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def pad_inplace_int32(ndarray[int32_t] values,
- ndarray[uint8_t, cast=True] mask,
- limit=None):
- cdef Py_ssize_t i, N
- cdef int32_t val
- cdef int lim, fill_count = 0
-
- N = len(values)
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- val = values[0]
- for i in range(N):
- if mask[i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[i] = val
- else:
- fill_count = 0
- val = values[i]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def pad_2d_inplace_int32(ndarray[int32_t, ndim=2] values,
- ndarray[uint8_t, ndim=2] mask,
- limit=None):
- cdef Py_ssize_t i, j, N, K
- cdef int32_t val
- cdef int lim, fill_count = 0
-
-    K, N = (<object> values).shape
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- for j in range(K):
- fill_count = 0
- val = values[j, 0]
- for i in range(N):
- if mask[j, i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[j, i] = val
- else:
- fill_count = 0
- val = values[j, i]
-
-"""
-Backfilling logic for generating fill vector
-
-Diagram of what's going on
-
-Old New Fill vector Mask
- . 0 1
- . 0 1
- . 0 1
-A A 0 1
- . 1 1
- . 1 1
- . 1 1
- . 1 1
- . 1 1
-B B 1 1
- . 2 1
- . 2 1
- . 2 1
-C C 2 1
- . 0
- . 0
-D
-"""
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def backfill_int32(ndarray[int32_t] old, ndarray[int32_t] new,
- limit=None):
- cdef Py_ssize_t i, j, nleft, nright
- cdef ndarray[int64_t, ndim=1] indexer
- cdef int32_t cur, prev
- cdef int lim, fill_count = 0
-
- nleft = len(old)
- nright = len(new)
- indexer = np.empty(nright, dtype=np.int64)
- indexer.fill(-1)
-
- if limit is None:
- lim = nright
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- if nleft == 0 or nright == 0 or new[0] > old[nleft - 1]:
- return indexer
-
- i = nleft - 1
- j = nright - 1
-
- cur = old[nleft - 1]
-
- while j >= 0 and new[j] > cur:
- j -= 1
-
- while True:
- if j < 0:
- break
-
- if i == 0:
- while j >= 0:
- if new[j] == cur:
- indexer[j] = i
- elif new[j] < cur and fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j -= 1
- break
-
- prev = old[i - 1]
-
- while j >= 0 and prev < new[j] <= cur:
- if new[j] == cur:
- indexer[j] = i
- elif new[j] < cur and fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j -= 1
-
- fill_count = 0
- i -= 1
- cur = prev
-
- return indexer
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def backfill_inplace_int32(ndarray[int32_t] values,
- ndarray[uint8_t, cast=True] mask,
- limit=None):
- cdef Py_ssize_t i, N
- cdef int32_t val
- cdef int lim, fill_count = 0
-
- N = len(values)
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- val = values[N - 1]
- for i in range(N - 1, -1, -1):
- if mask[i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[i] = val
- else:
- fill_count = 0
- val = values[i]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def backfill_2d_inplace_int32(ndarray[int32_t, ndim=2] values,
- ndarray[uint8_t, ndim=2] mask,
- limit=None):
- cdef Py_ssize_t i, j, N, K
- cdef int32_t val
- cdef int lim, fill_count = 0
-
-    K, N = (<object> values).shape
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- for j in range(K):
- fill_count = 0
- val = values[j, N - 1]
- for i in range(N - 1, -1, -1):
- if mask[j, i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[j, i] = val
- else:
- fill_count = 0
- val = values[j, i]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def is_monotonic_int32(ndarray[int32_t] arr, bint timelike):
- """
- Returns
- -------
- is_monotonic_inc, is_monotonic_dec, is_unique
- """
- cdef:
- Py_ssize_t i, n
- int32_t prev, cur
- bint is_monotonic_inc = 1
- bint is_monotonic_dec = 1
- bint is_unique = 1
-
- n = len(arr)
-
- if n == 1:
- if arr[0] != arr[0] or (timelike and arr[0] == iNaT):
- # single value is NaN
- return False, False, True
- else:
- return True, True, True
- elif n < 2:
- return True, True, True
-
- if timelike and arr[0] == iNaT:
- return False, False, True
-
- with nogil:
- prev = arr[0]
- for i in range(1, n):
- cur = arr[i]
- if timelike and cur == iNaT:
- is_monotonic_inc = 0
- is_monotonic_dec = 0
- break
- if cur < prev:
- is_monotonic_inc = 0
- elif cur > prev:
- is_monotonic_dec = 0
- elif cur == prev:
- is_unique = 0
- else:
- # cur or prev is NaN
- is_monotonic_inc = 0
- is_monotonic_dec = 0
- break
- if not is_monotonic_inc and not is_monotonic_dec:
- is_monotonic_inc = 0
- is_monotonic_dec = 0
- break
- prev = cur
- return is_monotonic_inc, is_monotonic_dec, \
- is_unique and (is_monotonic_inc or is_monotonic_dec)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def arrmap_int32(ndarray[int32_t] index, object func):
- cdef Py_ssize_t length = index.shape[0]
- cdef Py_ssize_t i = 0
-
- cdef ndarray[object] result = np.empty(length, dtype=np.object_)
-
- from pandas.lib import maybe_convert_objects
-
- for i in range(length):
- result[i] = func(index[i])
-
- return maybe_convert_objects(result)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cpdef map_indices_int64(ndarray[int64_t] index):
- """
- Produce a dict mapping the values of the input array to their respective
- locations.
-
- Example:
- array(['hi', 'there']) --> {'hi' : 0 , 'there' : 1}
-
- Better to do this with Cython because of the enormous speed boost.
- """
- cdef Py_ssize_t i, length
- cdef dict result = {}
-
- length = len(index)
-
- for i in range(length):
- result[index[i]] = i
-
- return result
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def pad_int64(ndarray[int64_t] old, ndarray[int64_t] new,
- limit=None):
- cdef Py_ssize_t i, j, nleft, nright
- cdef ndarray[int64_t, ndim=1] indexer
- cdef int64_t cur, next
- cdef int lim, fill_count = 0
-
- nleft = len(old)
- nright = len(new)
- indexer = np.empty(nright, dtype=np.int64)
- indexer.fill(-1)
-
- if limit is None:
- lim = nright
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- if nleft == 0 or nright == 0 or new[nright - 1] < old[0]:
- return indexer
-
- i = j = 0
-
- cur = old[0]
-
- while j <= nright - 1 and new[j] < cur:
- j += 1
-
- while True:
- if j == nright:
- break
-
- if i == nleft - 1:
- while j < nright:
- if new[j] == cur:
- indexer[j] = i
- elif new[j] > cur and fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j += 1
- break
-
- next = old[i + 1]
-
- while j < nright and cur <= new[j] < next:
- if new[j] == cur:
- indexer[j] = i
- elif fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j += 1
-
- fill_count = 0
- i += 1
- cur = next
-
- return indexer
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def pad_inplace_int64(ndarray[int64_t] values,
- ndarray[uint8_t, cast=True] mask,
- limit=None):
- cdef Py_ssize_t i, N
- cdef int64_t val
- cdef int lim, fill_count = 0
-
- N = len(values)
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- val = values[0]
- for i in range(N):
- if mask[i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[i] = val
- else:
- fill_count = 0
- val = values[i]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def pad_2d_inplace_int64(ndarray[int64_t, ndim=2] values,
- ndarray[uint8_t, ndim=2] mask,
- limit=None):
- cdef Py_ssize_t i, j, N, K
- cdef int64_t val
- cdef int lim, fill_count = 0
-
- K, N = (<object> values).shape
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- for j in range(K):
- fill_count = 0
- val = values[j, 0]
- for i in range(N):
- if mask[j, i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[j, i] = val
- else:
- fill_count = 0
- val = values[j, i]
-
-"""
-Backfilling logic for generating fill vector
-
-Diagram of what's going on
-
-Old New Fill vector Mask
- . 0 1
- . 0 1
- . 0 1
-A A 0 1
- . 1 1
- . 1 1
- . 1 1
- . 1 1
- . 1 1
-B B 1 1
- . 2 1
- . 2 1
- . 2 1
-C C 2 1
- . 0
- . 0
-D
-"""
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def backfill_int64(ndarray[int64_t] old, ndarray[int64_t] new,
- limit=None):
- cdef Py_ssize_t i, j, nleft, nright
- cdef ndarray[int64_t, ndim=1] indexer
- cdef int64_t cur, prev
- cdef int lim, fill_count = 0
-
- nleft = len(old)
- nright = len(new)
- indexer = np.empty(nright, dtype=np.int64)
- indexer.fill(-1)
-
- if limit is None:
- lim = nright
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- if nleft == 0 or nright == 0 or new[0] > old[nleft - 1]:
- return indexer
-
- i = nleft - 1
- j = nright - 1
-
- cur = old[nleft - 1]
-
- while j >= 0 and new[j] > cur:
- j -= 1
-
- while True:
- if j < 0:
- break
-
- if i == 0:
- while j >= 0:
- if new[j] == cur:
- indexer[j] = i
- elif new[j] < cur and fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j -= 1
- break
-
- prev = old[i - 1]
-
- while j >= 0 and prev < new[j] <= cur:
- if new[j] == cur:
- indexer[j] = i
- elif new[j] < cur and fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j -= 1
-
- fill_count = 0
- i -= 1
- cur = prev
-
- return indexer
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def backfill_inplace_int64(ndarray[int64_t] values,
- ndarray[uint8_t, cast=True] mask,
- limit=None):
- cdef Py_ssize_t i, N
- cdef int64_t val
- cdef int lim, fill_count = 0
-
- N = len(values)
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- val = values[N - 1]
- for i in range(N - 1, -1, -1):
- if mask[i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[i] = val
- else:
- fill_count = 0
- val = values[i]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def backfill_2d_inplace_int64(ndarray[int64_t, ndim=2] values,
- ndarray[uint8_t, ndim=2] mask,
- limit=None):
- cdef Py_ssize_t i, j, N, K
- cdef int64_t val
- cdef int lim, fill_count = 0
-
- K, N = (<object> values).shape
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- for j in range(K):
- fill_count = 0
- val = values[j, N - 1]
- for i in range(N - 1, -1, -1):
- if mask[j, i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[j, i] = val
- else:
- fill_count = 0
- val = values[j, i]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def is_monotonic_int64(ndarray[int64_t] arr, bint timelike):
- """
- Returns
- -------
- is_monotonic_inc, is_monotonic_dec, is_unique
- """
- cdef:
- Py_ssize_t i, n
- int64_t prev, cur
- bint is_monotonic_inc = 1
- bint is_monotonic_dec = 1
- bint is_unique = 1
-
- n = len(arr)
-
- if n == 1:
- if arr[0] != arr[0] or (timelike and arr[0] == iNaT):
- # single value is NaN
- return False, False, True
- else:
- return True, True, True
- elif n < 2:
- return True, True, True
-
- if timelike and arr[0] == iNaT:
- return False, False, True
-
- with nogil:
- prev = arr[0]
- for i in range(1, n):
- cur = arr[i]
- if timelike and cur == iNaT:
- is_monotonic_inc = 0
- is_monotonic_dec = 0
- break
- if cur < prev:
- is_monotonic_inc = 0
- elif cur > prev:
- is_monotonic_dec = 0
- elif cur == prev:
- is_unique = 0
- else:
- # cur or prev is NaN
- is_monotonic_inc = 0
- is_monotonic_dec = 0
- break
- if not is_monotonic_inc and not is_monotonic_dec:
- is_monotonic_inc = 0
- is_monotonic_dec = 0
- break
- prev = cur
- return is_monotonic_inc, is_monotonic_dec, \
- is_unique and (is_monotonic_inc or is_monotonic_dec)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def arrmap_int64(ndarray[int64_t] index, object func):
- cdef Py_ssize_t length = index.shape[0]
- cdef Py_ssize_t i = 0
-
- cdef ndarray[object] result = np.empty(length, dtype=np.object_)
-
- from pandas.lib import maybe_convert_objects
-
- for i in range(length):
- result[i] = func(index[i])
-
- return maybe_convert_objects(result)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cpdef map_indices_bool(ndarray[uint8_t] index):
- """
- Produce a dict mapping the values of the input array to their respective
- locations.
-
- Example:
- array(['hi', 'there']) --> {'hi' : 0 , 'there' : 1}
-
- Better to do this with Cython because of the enormous speed boost.
- """
- cdef Py_ssize_t i, length
- cdef dict result = {}
-
- length = len(index)
-
- for i in range(length):
- result[index[i]] = i
-
- return result
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def pad_bool(ndarray[uint8_t] old, ndarray[uint8_t] new,
- limit=None):
- cdef Py_ssize_t i, j, nleft, nright
- cdef ndarray[int64_t, ndim=1] indexer
- cdef uint8_t cur, next
- cdef int lim, fill_count = 0
-
- nleft = len(old)
- nright = len(new)
- indexer = np.empty(nright, dtype=np.int64)
- indexer.fill(-1)
-
- if limit is None:
- lim = nright
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- if nleft == 0 or nright == 0 or new[nright - 1] < old[0]:
- return indexer
-
- i = j = 0
-
- cur = old[0]
-
- while j <= nright - 1 and new[j] < cur:
- j += 1
-
- while True:
- if j == nright:
- break
-
- if i == nleft - 1:
- while j < nright:
- if new[j] == cur:
- indexer[j] = i
- elif new[j] > cur and fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j += 1
- break
-
- next = old[i + 1]
-
- while j < nright and cur <= new[j] < next:
- if new[j] == cur:
- indexer[j] = i
- elif fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j += 1
-
- fill_count = 0
- i += 1
- cur = next
-
- return indexer
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def pad_inplace_bool(ndarray[uint8_t] values,
- ndarray[uint8_t, cast=True] mask,
- limit=None):
- cdef Py_ssize_t i, N
- cdef uint8_t val
- cdef int lim, fill_count = 0
-
- N = len(values)
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- val = values[0]
- for i in range(N):
- if mask[i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[i] = val
- else:
- fill_count = 0
- val = values[i]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def pad_2d_inplace_bool(ndarray[uint8_t, ndim=2] values,
- ndarray[uint8_t, ndim=2] mask,
- limit=None):
- cdef Py_ssize_t i, j, N, K
- cdef uint8_t val
- cdef int lim, fill_count = 0
-
- K, N = (<object> values).shape
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- for j in range(K):
- fill_count = 0
- val = values[j, 0]
- for i in range(N):
- if mask[j, i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[j, i] = val
- else:
- fill_count = 0
- val = values[j, i]
-
-"""
-Backfilling logic for generating fill vector
-
-Diagram of what's going on
-
-Old New Fill vector Mask
- . 0 1
- . 0 1
- . 0 1
-A A 0 1
- . 1 1
- . 1 1
- . 1 1
- . 1 1
- . 1 1
-B B 1 1
- . 2 1
- . 2 1
- . 2 1
-C C 2 1
- . 0
- . 0
-D
-"""
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def backfill_bool(ndarray[uint8_t] old, ndarray[uint8_t] new,
- limit=None):
- cdef Py_ssize_t i, j, nleft, nright
- cdef ndarray[int64_t, ndim=1] indexer
- cdef uint8_t cur, prev
- cdef int lim, fill_count = 0
-
- nleft = len(old)
- nright = len(new)
- indexer = np.empty(nright, dtype=np.int64)
- indexer.fill(-1)
-
- if limit is None:
- lim = nright
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- if nleft == 0 or nright == 0 or new[0] > old[nleft - 1]:
- return indexer
-
- i = nleft - 1
- j = nright - 1
-
- cur = old[nleft - 1]
-
- while j >= 0 and new[j] > cur:
- j -= 1
-
- while True:
- if j < 0:
- break
-
- if i == 0:
- while j >= 0:
- if new[j] == cur:
- indexer[j] = i
- elif new[j] < cur and fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j -= 1
- break
-
- prev = old[i - 1]
-
- while j >= 0 and prev < new[j] <= cur:
- if new[j] == cur:
- indexer[j] = i
- elif new[j] < cur and fill_count < lim:
- indexer[j] = i
- fill_count += 1
- j -= 1
-
- fill_count = 0
- i -= 1
- cur = prev
-
- return indexer
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def backfill_inplace_bool(ndarray[uint8_t] values,
- ndarray[uint8_t, cast=True] mask,
- limit=None):
- cdef Py_ssize_t i, N
- cdef uint8_t val
- cdef int lim, fill_count = 0
-
- N = len(values)
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- val = values[N - 1]
- for i in range(N - 1, -1, -1):
- if mask[i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[i] = val
- else:
- fill_count = 0
- val = values[i]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def backfill_2d_inplace_bool(ndarray[uint8_t, ndim=2] values,
- ndarray[uint8_t, ndim=2] mask,
- limit=None):
- cdef Py_ssize_t i, j, N, K
- cdef uint8_t val
- cdef int lim, fill_count = 0
-
- K, N = (<object> values).shape
-
- # GH 2778
- if N == 0:
- return
-
- if limit is None:
- lim = N
- else:
- if limit < 0:
- raise ValueError('Limit must be non-negative')
- lim = limit
-
- for j in range(K):
- fill_count = 0
- val = values[j, N - 1]
- for i in range(N - 1, -1, -1):
- if mask[j, i]:
- if fill_count >= lim:
- continue
- fill_count += 1
- values[j, i] = val
- else:
- fill_count = 0
- val = values[j, i]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def is_monotonic_bool(ndarray[uint8_t] arr, bint timelike):
- """
- Returns
- -------
- is_monotonic_inc, is_monotonic_dec, is_unique
- """
- cdef:
- Py_ssize_t i, n
- uint8_t prev, cur
- bint is_monotonic_inc = 1
- bint is_monotonic_dec = 1
- bint is_unique = 1
-
- n = len(arr)
-
- if n == 1:
- if arr[0] != arr[0] or (timelike and arr[0] == iNaT):
- # single value is NaN
- return False, False, True
- else:
- return True, True, True
- elif n < 2:
- return True, True, True
-
- if timelike and arr[0] == iNaT:
- return False, False, True
-
- with nogil:
- prev = arr[0]
- for i in range(1, n):
- cur = arr[i]
- if timelike and cur == iNaT:
- is_monotonic_inc = 0
- is_monotonic_dec = 0
- break
- if cur < prev:
- is_monotonic_inc = 0
- elif cur > prev:
- is_monotonic_dec = 0
- elif cur == prev:
- is_unique = 0
- else:
- # cur or prev is NaN
- is_monotonic_inc = 0
- is_monotonic_dec = 0
- break
- if not is_monotonic_inc and not is_monotonic_dec:
- is_monotonic_inc = 0
- is_monotonic_dec = 0
- break
- prev = cur
- return is_monotonic_inc, is_monotonic_dec, \
- is_unique and (is_monotonic_inc or is_monotonic_dec)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def arrmap_bool(ndarray[uint8_t] index, object func):
- cdef Py_ssize_t length = index.shape[0]
- cdef Py_ssize_t i = 0
-
- cdef ndarray[object] result = np.empty(length, dtype=np.object_)
-
- from pandas.lib import maybe_convert_objects
-
- for i in range(length):
- result[i] = func(index[i])
-
- return maybe_convert_objects(result)
-
-#----------------------------------------------------------------------
-# put template
-#----------------------------------------------------------------------
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def diff_2d_float64(ndarray[float64_t, ndim=2] arr,
- ndarray[float64_t, ndim=2] out,
- Py_ssize_t periods, int axis):
- cdef:
- Py_ssize_t i, j, sx, sy
-
- sx, sy = (<object> arr).shape
- if arr.flags.f_contiguous:
- if axis == 0:
- if periods >= 0:
- start, stop = periods, sx
- else:
- start, stop = 0, sx + periods
- for j in range(sy):
- for i in range(start, stop):
- out[i, j] = arr[i, j] - arr[i - periods, j]
- else:
- if periods >= 0:
- start, stop = periods, sy
- else:
- start, stop = 0, sy + periods
- for j in range(start, stop):
- for i in range(sx):
- out[i, j] = arr[i, j] - arr[i, j - periods]
- else:
- if axis == 0:
- if periods >= 0:
- start, stop = periods, sx
- else:
- start, stop = 0, sx + periods
- for i in range(start, stop):
- for j in range(sy):
- out[i, j] = arr[i, j] - arr[i - periods, j]
- else:
- if periods >= 0:
- start, stop = periods, sy
- else:
- start, stop = 0, sy + periods
- for i in range(sx):
- for j in range(start, stop):
- out[i, j] = arr[i, j] - arr[i, j - periods]
-
-
-def put2d_float64_float64(ndarray[float64_t, ndim=2, cast=True] values,
- ndarray[int64_t] indexer, Py_ssize_t loc,
- ndarray[float64_t] out):
- cdef:
- Py_ssize_t i, j, k
-
- k = len(values)
- for j from 0 <= j < k:
- i = indexer[j]
- out[i] = values[j, loc]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def diff_2d_float32(ndarray[float32_t, ndim=2] arr,
- ndarray[float32_t, ndim=2] out,
- Py_ssize_t periods, int axis):
- cdef:
- Py_ssize_t i, j, sx, sy
-
- sx, sy = (<object> arr).shape
- if arr.flags.f_contiguous:
- if axis == 0:
- if periods >= 0:
- start, stop = periods, sx
- else:
- start, stop = 0, sx + periods
- for j in range(sy):
- for i in range(start, stop):
- out[i, j] = arr[i, j] - arr[i - periods, j]
- else:
- if periods >= 0:
- start, stop = periods, sy
- else:
- start, stop = 0, sy + periods
- for j in range(start, stop):
- for i in range(sx):
- out[i, j] = arr[i, j] - arr[i, j - periods]
- else:
- if axis == 0:
- if periods >= 0:
- start, stop = periods, sx
- else:
- start, stop = 0, sx + periods
- for i in range(start, stop):
- for j in range(sy):
- out[i, j] = arr[i, j] - arr[i - periods, j]
- else:
- if periods >= 0:
- start, stop = periods, sy
- else:
- start, stop = 0, sy + periods
- for i in range(sx):
- for j in range(start, stop):
- out[i, j] = arr[i, j] - arr[i, j - periods]
-
-
-def put2d_float32_float32(ndarray[float32_t, ndim=2, cast=True] values,
- ndarray[int64_t] indexer, Py_ssize_t loc,
- ndarray[float32_t] out):
- cdef:
- Py_ssize_t i, j, k
-
- k = len(values)
- for j from 0 <= j < k:
- i = indexer[j]
- out[i] = values[j, loc]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def diff_2d_int8(ndarray[int8_t, ndim=2] arr,
- ndarray[float32_t, ndim=2] out,
- Py_ssize_t periods, int axis):
- cdef:
- Py_ssize_t i, j, sx, sy
-
- sx, sy = (<object> arr).shape
- if arr.flags.f_contiguous:
- if axis == 0:
- if periods >= 0:
- start, stop = periods, sx
- else:
- start, stop = 0, sx + periods
- for j in range(sy):
- for i in range(start, stop):
- out[i, j] = arr[i, j] - arr[i - periods, j]
- else:
- if periods >= 0:
- start, stop = periods, sy
- else:
- start, stop = 0, sy + periods
- for j in range(start, stop):
- for i in range(sx):
- out[i, j] = arr[i, j] - arr[i, j - periods]
- else:
- if axis == 0:
- if periods >= 0:
- start, stop = periods, sx
- else:
- start, stop = 0, sx + periods
- for i in range(start, stop):
- for j in range(sy):
- out[i, j] = arr[i, j] - arr[i - periods, j]
- else:
- if periods >= 0:
- start, stop = periods, sy
- else:
- start, stop = 0, sy + periods
- for i in range(sx):
- for j in range(start, stop):
- out[i, j] = arr[i, j] - arr[i, j - periods]
-
-
-def put2d_int8_float32(ndarray[int8_t, ndim=2, cast=True] values,
- ndarray[int64_t] indexer, Py_ssize_t loc,
- ndarray[float32_t] out):
- cdef:
- Py_ssize_t i, j, k
-
- k = len(values)
- for j from 0 <= j < k:
- i = indexer[j]
- out[i] = values[j, loc]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def diff_2d_int16(ndarray[int16_t, ndim=2] arr,
- ndarray[float32_t, ndim=2] out,
- Py_ssize_t periods, int axis):
- cdef:
- Py_ssize_t i, j, sx, sy
-
- sx, sy = (<object> arr).shape
- if arr.flags.f_contiguous:
- if axis == 0:
- if periods >= 0:
- start, stop = periods, sx
- else:
- start, stop = 0, sx + periods
- for j in range(sy):
- for i in range(start, stop):
- out[i, j] = arr[i, j] - arr[i - periods, j]
- else:
- if periods >= 0:
- start, stop = periods, sy
- else:
- start, stop = 0, sy + periods
- for j in range(start, stop):
- for i in range(sx):
- out[i, j] = arr[i, j] - arr[i, j - periods]
- else:
- if axis == 0:
- if periods >= 0:
- start, stop = periods, sx
- else:
- start, stop = 0, sx + periods
- for i in range(start, stop):
- for j in range(sy):
- out[i, j] = arr[i, j] - arr[i - periods, j]
- else:
- if periods >= 0:
- start, stop = periods, sy
- else:
- start, stop = 0, sy + periods
- for i in range(sx):
- for j in range(start, stop):
- out[i, j] = arr[i, j] - arr[i, j - periods]
-
-
-def put2d_int16_float32(ndarray[int16_t, ndim=2, cast=True] values,
- ndarray[int64_t] indexer, Py_ssize_t loc,
- ndarray[float32_t] out):
- cdef:
- Py_ssize_t i, j, k
-
- k = len(values)
- for j from 0 <= j < k:
- i = indexer[j]
- out[i] = values[j, loc]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def diff_2d_int32(ndarray[int32_t, ndim=2] arr,
- ndarray[float64_t, ndim=2] out,
- Py_ssize_t periods, int axis):
- cdef:
- Py_ssize_t i, j, sx, sy
-
- sx, sy = (<object> arr).shape
- if arr.flags.f_contiguous:
- if axis == 0:
- if periods >= 0:
- start, stop = periods, sx
- else:
- start, stop = 0, sx + periods
- for j in range(sy):
- for i in range(start, stop):
- out[i, j] = arr[i, j] - arr[i - periods, j]
- else:
- if periods >= 0:
- start, stop = periods, sy
- else:
- start, stop = 0, sy + periods
- for j in range(start, stop):
- for i in range(sx):
- out[i, j] = arr[i, j] - arr[i, j - periods]
- else:
- if axis == 0:
- if periods >= 0:
- start, stop = periods, sx
- else:
- start, stop = 0, sx + periods
- for i in range(start, stop):
- for j in range(sy):
- out[i, j] = arr[i, j] - arr[i - periods, j]
- else:
- if periods >= 0:
- start, stop = periods, sy
- else:
- start, stop = 0, sy + periods
- for i in range(sx):
- for j in range(start, stop):
- out[i, j] = arr[i, j] - arr[i, j - periods]
-
-
-def put2d_int32_float64(ndarray[int32_t, ndim=2, cast=True] values,
- ndarray[int64_t] indexer, Py_ssize_t loc,
- ndarray[float64_t] out):
- cdef:
- Py_ssize_t i, j, k
-
- k = len(values)
- for j from 0 <= j < k:
- i = indexer[j]
- out[i] = values[j, loc]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def diff_2d_int64(ndarray[int64_t, ndim=2] arr,
- ndarray[float64_t, ndim=2] out,
- Py_ssize_t periods, int axis):
- cdef:
- Py_ssize_t i, j, sx, sy
-
- sx, sy = (<object> arr).shape
- if arr.flags.f_contiguous:
- if axis == 0:
- if periods >= 0:
- start, stop = periods, sx
- else:
- start, stop = 0, sx + periods
- for j in range(sy):
- for i in range(start, stop):
- out[i, j] = arr[i, j] - arr[i - periods, j]
- else:
- if periods >= 0:
- start, stop = periods, sy
- else:
- start, stop = 0, sy + periods
- for j in range(start, stop):
- for i in range(sx):
- out[i, j] = arr[i, j] - arr[i, j - periods]
- else:
- if axis == 0:
- if periods >= 0:
- start, stop = periods, sx
- else:
- start, stop = 0, sx + periods
- for i in range(start, stop):
- for j in range(sy):
- out[i, j] = arr[i, j] - arr[i - periods, j]
- else:
- if periods >= 0:
- start, stop = periods, sy
- else:
- start, stop = 0, sy + periods
- for i in range(sx):
- for j in range(start, stop):
- out[i, j] = arr[i, j] - arr[i, j - periods]
-
-
-def put2d_int64_float64(ndarray[int64_t, ndim=2, cast=True] values,
- ndarray[int64_t] indexer, Py_ssize_t loc,
- ndarray[float64_t] out):
- cdef:
- Py_ssize_t i, j, k
-
- k = len(values)
- for j from 0 <= j < k:
- i = indexer[j]
- out[i] = values[j, loc]
-
-#----------------------------------------------------------------------
-# ensure_dtype
-#----------------------------------------------------------------------
-
-cdef int PLATFORM_INT = (<ndarray> np.arange(0, dtype=np.intp)).descr.type_num
-
-cpdef ensure_platform_int(object arr):
- # GH3033, GH1392
- # platform int is the size of the int pointer, e.g. np.intp
- if util.is_array(arr):
- if (<ndarray> arr).descr.type_num == PLATFORM_INT:
- return arr
- else:
- return arr.astype(np.intp)
- else:
- return np.array(arr, dtype=np.intp)
-
-cpdef ensure_object(object arr):
- if util.is_array(arr):
- if (<ndarray> arr).descr.type_num == NPY_OBJECT:
- return arr
- else:
- return arr.astype(np.object_)
- elif hasattr(arr, 'asobject'):
- return arr.asobject
- else:
- return np.array(arr, dtype=np.object_)
-
-cpdef ensure_float64(object arr):
- if util.is_array(arr):
- if (<ndarray> arr).descr.type_num == NPY_FLOAT64:
- return arr
- else:
- return arr.astype(np.float64)
- else:
- return np.array(arr, dtype=np.float64)
-
-cpdef ensure_float32(object arr):
- if util.is_array(arr):
- if (<ndarray> arr).descr.type_num == NPY_FLOAT32:
- return arr
- else:
- return arr.astype(np.float32)
- else:
- return np.array(arr, dtype=np.float32)
-
-cpdef ensure_int8(object arr):
- if util.is_array(arr):
- if (<ndarray> arr).descr.type_num == NPY_INT8:
- return arr
- else:
- return arr.astype(np.int8)
- else:
- return np.array(arr, dtype=np.int8)
-
-cpdef ensure_int16(object arr):
- if util.is_array(arr):
- if (<ndarray> arr).descr.type_num == NPY_INT16:
- return arr
- else:
- return arr.astype(np.int16)
- else:
- return np.array(arr, dtype=np.int16)
-
-cpdef ensure_int32(object arr):
- if util.is_array(arr):
- if (<ndarray> arr).descr.type_num == NPY_INT32:
- return arr
- else:
- return arr.astype(np.int32)
- else:
- return np.array(arr, dtype=np.int32)
-
-cpdef ensure_int64(object arr):
- if util.is_array(arr):
- if (<ndarray> arr).descr.type_num == NPY_INT64:
- return arr
- else:
- return arr.astype(np.int64)
- else:
- return np.array(arr, dtype=np.int64)
diff --git a/pandas/src/algos_groupby_helper.pxi b/pandas/src/algos_groupby_helper.pxi
deleted file mode 100644
index 013a03f719bbd..0000000000000
--- a/pandas/src/algos_groupby_helper.pxi
+++ /dev/null
@@ -1,1375 +0,0 @@
-"""
-Template for each `dtype` helper function using groupby
-
-WARNING: DO NOT edit .pxi FILE directly, .pxi is generated from .pxi.in
-"""
-
-cdef extern from "numpy/npy_math.h":
- double NAN "NPY_NAN"
-_int64_max = np.iinfo(np.int64).max
-
-#----------------------------------------------------------------------
-# group_add, group_prod, group_var, group_mean, group_ohlc
-#----------------------------------------------------------------------
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def group_add_float64(ndarray[float64_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[float64_t, ndim=2] values,
- ndarray[int64_t] labels):
- """
- Only aggregates on axis=0
- """
- cdef:
- Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
- float64_t val, count
- ndarray[float64_t, ndim=2] sumx, nobs
-
- if not len(values) == len(labels):
- raise AssertionError("len(index) != len(labels)")
-
- nobs = np.zeros_like(out)
- sumx = np.zeros_like(out)
-
- N, K = (<object> values).shape
-
- with nogil:
-
- if K > 1:
-
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- for j in range(K):
- val = values[i, j]
-
- # not nan
- if val == val:
- nobs[lab, j] += 1
- sumx[lab, j] += val
-
- else:
-
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- val = values[i, 0]
-
- # not nan
- if val == val:
- nobs[lab, 0] += 1
- sumx[lab, 0] += val
-
- for i in range(ncounts):
- for j in range(K):
- if nobs[i, j] == 0:
- out[i, j] = NAN
- else:
- out[i, j] = sumx[i, j]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def group_prod_float64(ndarray[float64_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[float64_t, ndim=2] values,
- ndarray[int64_t] labels):
- """
- Only aggregates on axis=0
- """
- cdef:
- Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
- float64_t val, count
- ndarray[float64_t, ndim=2] prodx, nobs
-
- if not len(values) == len(labels):
- raise AssertionError("len(index) != len(labels)")
-
- nobs = np.zeros_like(out)
- prodx = np.ones_like(out)
-
- N, K = (<object> values).shape
-
- with nogil:
- if K > 1:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- for j in range(K):
- val = values[i, j]
-
- # not nan
- if val == val:
- nobs[lab, j] += 1
- prodx[lab, j] *= val
- else:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- val = values[i, 0]
-
- # not nan
- if val == val:
- nobs[lab, 0] += 1
- prodx[lab, 0] *= val
-
- for i in range(ncounts):
- for j in range(K):
- if nobs[i, j] == 0:
- out[i, j] = NAN
- else:
- out[i, j] = prodx[i, j]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-@cython.cdivision(True)
-def group_var_float64(ndarray[float64_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[float64_t, ndim=2] values,
- ndarray[int64_t] labels):
- cdef:
- Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
- float64_t val, ct, oldmean
- ndarray[float64_t, ndim=2] nobs, mean
-
- if not len(values) == len(labels):
- raise AssertionError("len(index) != len(labels)")
-
- nobs = np.zeros_like(out)
- mean = np.zeros_like(out)
-
- N, K = (<object> values).shape
-
- out[:, :] = 0.0
-
- with nogil:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
-
- for j in range(K):
- val = values[i, j]
-
- # not nan
- if val == val:
- nobs[lab, j] += 1
- oldmean = mean[lab, j]
- mean[lab, j] += (val - oldmean) / nobs[lab, j]
- out[lab, j] += (val - mean[lab, j]) * (val - oldmean)
-
- for i in range(ncounts):
- for j in range(K):
- ct = nobs[i, j]
- if ct < 2:
- out[i, j] = NAN
- else:
- out[i, j] /= (ct - 1)
-# add passing bin edges, instead of labels
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def group_mean_float64(ndarray[float64_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[float64_t, ndim=2] values,
- ndarray[int64_t] labels):
- cdef:
- Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
- float64_t val, count
- ndarray[float64_t, ndim=2] sumx, nobs
-
- if not len(values) == len(labels):
- raise AssertionError("len(index) != len(labels)")
-
- nobs = np.zeros_like(out)
- sumx = np.zeros_like(out)
-
- N, K = (<object> values).shape
-
- with nogil:
- if K > 1:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- for j in range(K):
- val = values[i, j]
- # not nan
- if val == val:
- nobs[lab, j] += 1
- sumx[lab, j] += val
- else:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- val = values[i, 0]
- # not nan
- if val == val:
- nobs[lab, 0] += 1
- sumx[lab, 0] += val
-
- for i in range(ncounts):
- for j in range(K):
- count = nobs[i, j]
- if nobs[i, j] == 0:
- out[i, j] = NAN
- else:
- out[i, j] = sumx[i, j] / count
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def group_ohlc_float64(ndarray[float64_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[float64_t, ndim=2] values,
- ndarray[int64_t] labels):
- """
- Only aggregates on axis=0
- """
- cdef:
- Py_ssize_t i, j, N, K, lab
- float64_t val, count
- Py_ssize_t ngroups = len(counts)
-
- if len(labels) == 0:
- return
-
- N, K = (<object> values).shape
-
- if out.shape[1] != 4:
- raise ValueError('Output array must have 4 columns')
-
- if K > 1:
- raise NotImplementedError("Argument 'values' must have only "
- "one dimension")
- out.fill(np.nan)
-
- with nogil:
- for i in range(N):
- lab = labels[i]
- if lab == -1:
- continue
-
- counts[lab] += 1
- val = values[i, 0]
- if val != val:
- continue
-
- if out[lab, 0] != out[lab, 0]:
- out[lab, 0] = out[lab, 1] = out[lab, 2] = out[lab, 3] = val
- else:
- out[lab, 1] = max(out[lab, 1], val)
- out[lab, 2] = min(out[lab, 2], val)
- out[lab, 3] = val
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def group_add_float32(ndarray[float32_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[float32_t, ndim=2] values,
- ndarray[int64_t] labels):
- """
- Only aggregates on axis=0
- """
- cdef:
- Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
- float32_t val, count
- ndarray[float32_t, ndim=2] sumx, nobs
-
- if not len(values) == len(labels):
- raise AssertionError("len(index) != len(labels)")
-
- nobs = np.zeros_like(out)
- sumx = np.zeros_like(out)
-
- N, K = (<object> values).shape
-
- with nogil:
-
- if K > 1:
-
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- for j in range(K):
- val = values[i, j]
-
- # not nan
- if val == val:
- nobs[lab, j] += 1
- sumx[lab, j] += val
-
- else:
-
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- val = values[i, 0]
-
- # not nan
- if val == val:
- nobs[lab, 0] += 1
- sumx[lab, 0] += val
-
- for i in range(ncounts):
- for j in range(K):
- if nobs[i, j] == 0:
- out[i, j] = NAN
- else:
- out[i, j] = sumx[i, j]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def group_prod_float32(ndarray[float32_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[float32_t, ndim=2] values,
- ndarray[int64_t] labels):
- """
- Only aggregates on axis=0
- """
- cdef:
- Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
- float32_t val, count
- ndarray[float32_t, ndim=2] prodx, nobs
-
- if not len(values) == len(labels):
- raise AssertionError("len(index) != len(labels)")
-
- nobs = np.zeros_like(out)
- prodx = np.ones_like(out)
-
- N, K = (<object> values).shape
-
- with nogil:
- if K > 1:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- for j in range(K):
- val = values[i, j]
-
- # not nan
- if val == val:
- nobs[lab, j] += 1
- prodx[lab, j] *= val
- else:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- val = values[i, 0]
-
- # not nan
- if val == val:
- nobs[lab, 0] += 1
- prodx[lab, 0] *= val
-
- for i in range(ncounts):
- for j in range(K):
- if nobs[i, j] == 0:
- out[i, j] = NAN
- else:
- out[i, j] = prodx[i, j]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-@cython.cdivision(True)
-def group_var_float32(ndarray[float32_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[float32_t, ndim=2] values,
- ndarray[int64_t] labels):
- cdef:
- Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
- float32_t val, ct, oldmean
- ndarray[float32_t, ndim=2] nobs, mean
-
- if not len(values) == len(labels):
- raise AssertionError("len(index) != len(labels)")
-
- nobs = np.zeros_like(out)
- mean = np.zeros_like(out)
-
- N, K = (<object> values).shape
-
- out[:, :] = 0.0
-
- with nogil:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
-
- for j in range(K):
- val = values[i, j]
-
- # not nan
- if val == val:
- nobs[lab, j] += 1
- oldmean = mean[lab, j]
- mean[lab, j] += (val - oldmean) / nobs[lab, j]
- out[lab, j] += (val - mean[lab, j]) * (val - oldmean)
-
- for i in range(ncounts):
- for j in range(K):
- ct = nobs[i, j]
- if ct < 2:
- out[i, j] = NAN
- else:
- out[i, j] /= (ct - 1)
-# add passing bin edges, instead of labels
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def group_mean_float32(ndarray[float32_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[float32_t, ndim=2] values,
- ndarray[int64_t] labels):
- cdef:
- Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
- float32_t val, count
- ndarray[float32_t, ndim=2] sumx, nobs
-
- if not len(values) == len(labels):
- raise AssertionError("len(index) != len(labels)")
-
- nobs = np.zeros_like(out)
- sumx = np.zeros_like(out)
-
- N, K = (<object> values).shape
-
- with nogil:
- if K > 1:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- for j in range(K):
- val = values[i, j]
- # not nan
- if val == val:
- nobs[lab, j] += 1
- sumx[lab, j] += val
- else:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- val = values[i, 0]
- # not nan
- if val == val:
- nobs[lab, 0] += 1
- sumx[lab, 0] += val
-
- for i in range(ncounts):
- for j in range(K):
- count = nobs[i, j]
- if nobs[i, j] == 0:
- out[i, j] = NAN
- else:
- out[i, j] = sumx[i, j] / count
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def group_ohlc_float32(ndarray[float32_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[float32_t, ndim=2] values,
- ndarray[int64_t] labels):
- """
- Only aggregates on axis=0
- """
- cdef:
- Py_ssize_t i, j, N, K, lab
- float32_t val, count
- Py_ssize_t ngroups = len(counts)
-
- if len(labels) == 0:
- return
-
- N, K = (<object> values).shape
-
- if out.shape[1] != 4:
- raise ValueError('Output array must have 4 columns')
-
- if K > 1:
- raise NotImplementedError("Argument 'values' must have only "
- "one dimension")
- out.fill(np.nan)
-
- with nogil:
- for i in range(N):
- lab = labels[i]
- if lab == -1:
- continue
-
- counts[lab] += 1
- val = values[i, 0]
- if val != val:
- continue
-
- if out[lab, 0] != out[lab, 0]:
- out[lab, 0] = out[lab, 1] = out[lab, 2] = out[lab, 3] = val
- else:
- out[lab, 1] = max(out[lab, 1], val)
- out[lab, 2] = min(out[lab, 2], val)
- out[lab, 3] = val
-
-#----------------------------------------------------------------------
-# group_nth, group_last
-#----------------------------------------------------------------------
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def group_last_float64(ndarray[float64_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[float64_t, ndim=2] values,
- ndarray[int64_t] labels):
- """
- Only aggregates on axis=0
- """
- cdef:
- Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
- float64_t val, count
- ndarray[float64_t, ndim=2] resx
- ndarray[int64_t, ndim=2] nobs
-
- if not len(values) == len(labels):
- raise AssertionError("len(index) != len(labels)")
-
- nobs = np.zeros((<object> out).shape, dtype=np.int64)
- resx = np.empty_like(out)
-
- N, K = (<object> values).shape
-
- with nogil:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- for j in range(K):
- val = values[i, j]
-
- # not nan
- if val == val and val != NAN:
- nobs[lab, j] += 1
- resx[lab, j] = val
-
- for i in range(ncounts):
- for j in range(K):
- if nobs[i, j] == 0:
- out[i, j] = NAN
- else:
- out[i, j] = resx[i, j]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def group_nth_float64(ndarray[float64_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[float64_t, ndim=2] values,
- ndarray[int64_t] labels, int64_t rank):
- """
- Only aggregates on axis=0
- """
- cdef:
- Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
- float64_t val, count
- ndarray[float64_t, ndim=2] resx
- ndarray[int64_t, ndim=2] nobs
-
- if not len(values) == len(labels):
- raise AssertionError("len(index) != len(labels)")
-
- nobs = np.zeros((<object> out).shape, dtype=np.int64)
- resx = np.empty_like(out)
-
- N, K = (<object> values).shape
-
- with nogil:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- for j in range(K):
- val = values[i, j]
-
- # not nan
- if val == val and val != NAN:
- nobs[lab, j] += 1
- if nobs[lab, j] == rank:
- resx[lab, j] = val
-
- for i in range(ncounts):
- for j in range(K):
- if nobs[i, j] == 0:
- out[i, j] = NAN
- else:
- out[i, j] = resx[i, j]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def group_last_float32(ndarray[float32_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[float32_t, ndim=2] values,
- ndarray[int64_t] labels):
- """
- Only aggregates on axis=0
- """
- cdef:
- Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
- float32_t val, count
- ndarray[float32_t, ndim=2] resx
- ndarray[int64_t, ndim=2] nobs
-
- if not len(values) == len(labels):
- raise AssertionError("len(index) != len(labels)")
-
- nobs = np.zeros((<object> out).shape, dtype=np.int64)
- resx = np.empty_like(out)
-
- N, K = (<object> values).shape
-
- with nogil:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- for j in range(K):
- val = values[i, j]
-
- # not nan
- if val == val and val != NAN:
- nobs[lab, j] += 1
- resx[lab, j] = val
-
- for i in range(ncounts):
- for j in range(K):
- if nobs[i, j] == 0:
- out[i, j] = NAN
- else:
- out[i, j] = resx[i, j]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def group_nth_float32(ndarray[float32_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[float32_t, ndim=2] values,
- ndarray[int64_t] labels, int64_t rank):
- """
- Only aggregates on axis=0
- """
- cdef:
- Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
- float32_t val, count
- ndarray[float32_t, ndim=2] resx
- ndarray[int64_t, ndim=2] nobs
-
- if not len(values) == len(labels):
- raise AssertionError("len(index) != len(labels)")
-
- nobs = np.zeros((<object> out).shape, dtype=np.int64)
- resx = np.empty_like(out)
-
- N, K = (<object> values).shape
-
- with nogil:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- for j in range(K):
- val = values[i, j]
-
- # not nan
- if val == val and val != NAN:
- nobs[lab, j] += 1
- if nobs[lab, j] == rank:
- resx[lab, j] = val
-
- for i in range(ncounts):
- for j in range(K):
- if nobs[i, j] == 0:
- out[i, j] = NAN
- else:
- out[i, j] = resx[i, j]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def group_last_int64(ndarray[int64_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[int64_t, ndim=2] values,
- ndarray[int64_t] labels):
- """
- Only aggregates on axis=0
- """
- cdef:
- Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
- int64_t val, count
- ndarray[int64_t, ndim=2] resx
- ndarray[int64_t, ndim=2] nobs
-
- if not len(values) == len(labels):
- raise AssertionError("len(index) != len(labels)")
-
- nobs = np.zeros((<object> out).shape, dtype=np.int64)
- resx = np.empty_like(out)
-
- N, K = (<object> values).shape
-
- with nogil:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- for j in range(K):
- val = values[i, j]
-
- # not nan
- if val == val and val != iNaT:
- nobs[lab, j] += 1
- resx[lab, j] = val
-
- for i in range(ncounts):
- for j in range(K):
- if nobs[i, j] == 0:
- out[i, j] = iNaT
- else:
- out[i, j] = resx[i, j]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def group_nth_int64(ndarray[int64_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[int64_t, ndim=2] values,
- ndarray[int64_t] labels, int64_t rank):
- """
- Only aggregates on axis=0
- """
- cdef:
- Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
- int64_t val, count
- ndarray[int64_t, ndim=2] resx
- ndarray[int64_t, ndim=2] nobs
-
- if not len(values) == len(labels):
- raise AssertionError("len(index) != len(labels)")
-
- nobs = np.zeros((<object> out).shape, dtype=np.int64)
- resx = np.empty_like(out)
-
- N, K = (<object> values).shape
-
- with nogil:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- for j in range(K):
- val = values[i, j]
-
- # not nan
- if val == val and val != iNaT:
- nobs[lab, j] += 1
- if nobs[lab, j] == rank:
- resx[lab, j] = val
-
- for i in range(ncounts):
- for j in range(K):
- if nobs[i, j] == 0:
- out[i, j] = iNaT
- else:
- out[i, j] = resx[i, j]
-
-#----------------------------------------------------------------------
-# group_min, group_max
-#----------------------------------------------------------------------
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def group_max_float64(ndarray[float64_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[float64_t, ndim=2] values,
- ndarray[int64_t] labels):
- """
- Only aggregates on axis=0
- """
- cdef:
- Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
- float64_t val, count
- ndarray[float64_t, ndim=2] maxx, nobs
-
- if not len(values) == len(labels):
- raise AssertionError("len(index) != len(labels)")
-
- nobs = np.zeros_like(out)
-
- maxx = np.empty_like(out)
- maxx.fill(-np.inf)
-
- N, K = (<object> values).shape
-
- with nogil:
- if K > 1:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- for j in range(K):
- val = values[i, j]
-
- # not nan
- if val == val and val != NAN:
- nobs[lab, j] += 1
- if val > maxx[lab, j]:
- maxx[lab, j] = val
- else:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- val = values[i, 0]
-
- # not nan
- if val == val and val != NAN:
- nobs[lab, 0] += 1
- if val > maxx[lab, 0]:
- maxx[lab, 0] = val
-
- for i in range(ncounts):
- for j in range(K):
- if nobs[i, j] == 0:
- out[i, j] = NAN
- else:
- out[i, j] = maxx[i, j]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def group_min_float64(ndarray[float64_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[float64_t, ndim=2] values,
- ndarray[int64_t] labels):
- """
- Only aggregates on axis=0
- """
- cdef:
- Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
- float64_t val, count
- ndarray[float64_t, ndim=2] minx, nobs
-
- if not len(values) == len(labels):
- raise AssertionError("len(index) != len(labels)")
-
- nobs = np.zeros_like(out)
-
- minx = np.empty_like(out)
- minx.fill(np.inf)
-
- N, K = (<object> values).shape
-
- with nogil:
- if K > 1:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- for j in range(K):
- val = values[i, j]
-
- # not nan
- if val == val and val != NAN:
-
- nobs[lab, j] += 1
- if val < minx[lab, j]:
- minx[lab, j] = val
- else:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- val = values[i, 0]
-
- # not nan
- if val == val and val != NAN:
- nobs[lab, 0] += 1
- if val < minx[lab, 0]:
- minx[lab, 0] = val
-
- for i in range(ncounts):
- for j in range(K):
- if nobs[i, j] == 0:
- out[i, j] = NAN
- else:
- out[i, j] = minx[i, j]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def group_max_float32(ndarray[float32_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[float32_t, ndim=2] values,
- ndarray[int64_t] labels):
- """
- Only aggregates on axis=0
- """
- cdef:
- Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
- float32_t val, count
- ndarray[float32_t, ndim=2] maxx, nobs
-
- if not len(values) == len(labels):
- raise AssertionError("len(index) != len(labels)")
-
- nobs = np.zeros_like(out)
-
- maxx = np.empty_like(out)
- maxx.fill(-np.inf)
-
- N, K = (<object> values).shape
-
- with nogil:
- if K > 1:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- for j in range(K):
- val = values[i, j]
-
- # not nan
- if val == val and val != NAN:
- nobs[lab, j] += 1
- if val > maxx[lab, j]:
- maxx[lab, j] = val
- else:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- val = values[i, 0]
-
- # not nan
- if val == val and val != NAN:
- nobs[lab, 0] += 1
- if val > maxx[lab, 0]:
- maxx[lab, 0] = val
-
- for i in range(ncounts):
- for j in range(K):
- if nobs[i, j] == 0:
- out[i, j] = NAN
- else:
- out[i, j] = maxx[i, j]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def group_min_float32(ndarray[float32_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[float32_t, ndim=2] values,
- ndarray[int64_t] labels):
- """
- Only aggregates on axis=0
- """
- cdef:
- Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
- float32_t val, count
- ndarray[float32_t, ndim=2] minx, nobs
-
- if not len(values) == len(labels):
- raise AssertionError("len(index) != len(labels)")
-
- nobs = np.zeros_like(out)
-
- minx = np.empty_like(out)
- minx.fill(np.inf)
-
- N, K = (<object> values).shape
-
- with nogil:
- if K > 1:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- for j in range(K):
- val = values[i, j]
-
- # not nan
- if val == val and val != NAN:
-
- nobs[lab, j] += 1
- if val < minx[lab, j]:
- minx[lab, j] = val
- else:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- val = values[i, 0]
-
- # not nan
- if val == val and val != NAN:
- nobs[lab, 0] += 1
- if val < minx[lab, 0]:
- minx[lab, 0] = val
-
- for i in range(ncounts):
- for j in range(K):
- if nobs[i, j] == 0:
- out[i, j] = NAN
- else:
- out[i, j] = minx[i, j]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def group_max_int64(ndarray[int64_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[int64_t, ndim=2] values,
- ndarray[int64_t] labels):
- """
- Only aggregates on axis=0
- """
- cdef:
- Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
- int64_t val, count
- ndarray[int64_t, ndim=2] maxx, nobs
-
- if not len(values) == len(labels):
- raise AssertionError("len(index) != len(labels)")
-
- nobs = np.zeros_like(out)
-
- maxx = np.empty_like(out)
- maxx.fill(-_int64_max)
-
- N, K = (<object> values).shape
-
- with nogil:
- if K > 1:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- for j in range(K):
- val = values[i, j]
-
- # not nan
- if val == val and val != iNaT:
- nobs[lab, j] += 1
- if val > maxx[lab, j]:
- maxx[lab, j] = val
- else:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- val = values[i, 0]
-
- # not nan
- if val == val and val != iNaT:
- nobs[lab, 0] += 1
- if val > maxx[lab, 0]:
- maxx[lab, 0] = val
-
- for i in range(ncounts):
- for j in range(K):
- if nobs[i, j] == 0:
- out[i, j] = iNaT
- else:
- out[i, j] = maxx[i, j]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def group_min_int64(ndarray[int64_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[int64_t, ndim=2] values,
- ndarray[int64_t] labels):
- """
- Only aggregates on axis=0
- """
- cdef:
- Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
- int64_t val, count
- ndarray[int64_t, ndim=2] minx, nobs
-
- if not len(values) == len(labels):
- raise AssertionError("len(index) != len(labels)")
-
- nobs = np.zeros_like(out)
-
- minx = np.empty_like(out)
- minx.fill(_int64_max)
-
- N, K = (<object> values).shape
-
- with nogil:
- if K > 1:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- for j in range(K):
- val = values[i, j]
-
- # not nan
- if val == val and val != iNaT:
-
- nobs[lab, j] += 1
- if val < minx[lab, j]:
- minx[lab, j] = val
- else:
- for i in range(N):
- lab = labels[i]
- if lab < 0:
- continue
-
- counts[lab] += 1
- val = values[i, 0]
-
- # not nan
- if val == val and val != iNaT:
- nobs[lab, 0] += 1
- if val < minx[lab, 0]:
- minx[lab, 0] = val
-
- for i in range(ncounts):
- for j in range(K):
- if nobs[i, j] == 0:
- out[i, j] = iNaT
- else:
- out[i, j] = minx[i, j]
-
-#----------------------------------------------------------------------
-# other grouping functions not needing a template
-#----------------------------------------------------------------------
-
-
-def group_median_float64(ndarray[float64_t, ndim=2] out,
- ndarray[int64_t] counts,
- ndarray[float64_t, ndim=2] values,
- ndarray[int64_t] labels):
- """
- Only aggregates on axis=0
- """
- cdef:
- Py_ssize_t i, j, N, K, ngroups, size
- ndarray[int64_t] _counts
- ndarray data
- float64_t* ptr
- ngroups = len(counts)
- N, K = (<object> values).shape
-
- indexer, _counts = groupsort_indexer(labels, ngroups)
- counts[:] = _counts[1:]
-
- data = np.empty((K, N), dtype=np.float64)
- ptr = <float64_t*> data.data
-
- take_2d_axis1_float64_float64(values.T, indexer, out=data)
-
- for i in range(K):
- # exclude NA group
- ptr += _counts[0]
- for j in range(ngroups):
- size = _counts[j + 1]
- out[j, i] = _median_linear(ptr, size)
- ptr += size
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def group_cumprod_float64(float64_t[:, :] out,
- float64_t[:, :] values,
- int64_t[:] labels,
- float64_t[:, :] accum):
- """
- Only transforms on axis=0
- """
- cdef:
- Py_ssize_t i, j, N, K, size
- float64_t val
- int64_t lab
-
- N, K = (<object> values).shape
- accum = np.ones_like(accum)
-
- with nogil:
- for i in range(N):
- lab = labels[i]
-
- if lab < 0:
- continue
- for j in range(K):
- val = values[i, j]
- if val == val:
- accum[lab, j] *= val
- out[i, j] = accum[lab, j]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def group_cumsum(numeric[:, :] out,
- numeric[:, :] values,
- int64_t[:] labels,
- numeric[:, :] accum):
- """
- Only transforms on axis=0
- """
- cdef:
- Py_ssize_t i, j, N, K, size
- numeric val
- int64_t lab
-
- N, K = (<object> values).shape
- accum = np.zeros_like(accum)
-
- with nogil:
- for i in range(N):
- lab = labels[i]
-
- if lab < 0:
- continue
- for j in range(K):
- val = values[i, j]
- if val == val:
- accum[lab, j] += val
- out[i, j] = accum[lab, j]
-
-
-@cython.boundscheck(False)
-@cython.wraparound(False)
-def group_shift_indexer(int64_t[:] out, int64_t[:] labels,
- int ngroups, int periods):
- cdef:
- Py_ssize_t N, i, j, ii
- int offset, sign
- int64_t lab, idxer, idxer_slot
- int64_t[:] label_seen = np.zeros(ngroups, dtype=np.int64)
- int64_t[:, :] label_indexer
-
- N, = (<object> labels).shape
-
- if periods < 0:
- periods = -periods
- offset = N - 1
- sign = -1
- elif periods > 0:
- offset = 0
- sign = 1
-
- if periods == 0:
- with nogil:
- for i in range(N):
- out[i] = i
- else:
- # array of each previous indexer seen
- label_indexer = np.zeros((ngroups, periods), dtype=np.int64)
- with nogil:
- for i in range(N):
- ## reverse iterator if shifting backwards
- ii = offset + sign * i
- lab = labels[ii]
-
- # Skip null keys
- if lab == -1:
- out[ii] = -1
- continue
-
- label_seen[lab] += 1
-
- idxer_slot = label_seen[lab] % periods
- idxer = label_indexer[lab, idxer_slot]
-
- if label_seen[lab] > periods:
- out[ii] = idxer
- else:
- out[ii] = -1
-
- label_indexer[lab, idxer_slot] = ii
diff --git a/pandas/src/algos_take_helper.pxi b/pandas/src/algos_take_helper.pxi
deleted file mode 100644
index d8fb05804d4e5..0000000000000
--- a/pandas/src/algos_take_helper.pxi
+++ /dev/null
@@ -1,4949 +0,0 @@
-"""
-Template for each `dtype` helper function for take
-
-WARNING: DO NOT edit .pxi FILE directly, .pxi is generated from .pxi.in
-"""
-
-#----------------------------------------------------------------------
-# take_1d, take_2d
-#----------------------------------------------------------------------
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_1d_bool_bool_memview(uint8_t[:] values,
- int64_t[:] indexer,
- uint8_t[:] out,
- fill_value=np.nan):
-
-
-
- cdef:
- Py_ssize_t i, n, idx
- uint8_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_1d_bool_bool(ndarray[uint8_t, ndim=1] values,
- int64_t[:] indexer,
- uint8_t[:] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_1d_bool_bool_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
-
- cdef:
- Py_ssize_t i, n, idx
- uint8_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis0_bool_bool_memview(uint8_t[:, :] values,
- int64_t[:] indexer,
- uint8_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- uint8_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF True:
- cdef:
- uint8_t *v
- uint8_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(uint8_t) and
- sizeof(uint8_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, <size_t>(sizeof(uint8_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis0_bool_bool(ndarray[uint8_t, ndim=2] values,
- ndarray[int64_t] indexer,
- uint8_t[:, :] out,
- fill_value=np.nan):
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis0_bool_bool_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- uint8_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF True:
- cdef:
- uint8_t *v
- uint8_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(uint8_t) and
- sizeof(uint8_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, <size_t>(sizeof(uint8_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis1_bool_bool_memview(uint8_t[:, :] values,
- int64_t[:] indexer,
- uint8_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- uint8_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis1_bool_bool(ndarray[uint8_t, ndim=2] values,
- ndarray[int64_t] indexer,
- uint8_t[:, :] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis1_bool_bool_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- uint8_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_multi_bool_bool(ndarray[uint8_t, ndim=2] values,
- indexer,
- ndarray[uint8_t, ndim=2] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- ndarray[int64_t] idx0 = indexer[0]
- ndarray[int64_t] idx1 = indexer[1]
- uint8_t fv
-
- n = len(idx0)
- k = len(idx1)
-
- fv = fill_value
- for i from 0 <= i < n:
- idx = idx0[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- if idx1[j] == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[idx, idx1[j]]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_1d_bool_object_memview(uint8_t[:] values,
- int64_t[:] indexer,
- object[:] out,
- fill_value=np.nan):
-
-
-
- cdef:
- Py_ssize_t i, n, idx
- object fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = True if values[idx] > 0 else False
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_1d_bool_object(ndarray[uint8_t, ndim=1] values,
- int64_t[:] indexer,
- object[:] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_1d_bool_object_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
-
- cdef:
- Py_ssize_t i, n, idx
- object fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = True if values[idx] > 0 else False
-
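In the *_bool_object variants the boolean data arrives as a uint8 buffer, so
each taken element is converted back to a Python bool ("True if values[idx] > 0
else False") while missing positions (-1) receive the object fill value. A
minimal sketch of that behaviour in plain NumPy, for illustration only::

    import numpy as np

    values = np.array([1, 0, 1], dtype=np.uint8)    # boolean data stored as uint8
    indexer = np.array([2, -1, 0], dtype=np.int64)

    out = np.empty(len(indexer), dtype=object)
    for i, idx in enumerate(indexer):
        # -1 marks a missing position and takes the fill value (np.nan here)
        out[i] = np.nan if idx == -1 else bool(values[idx])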
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis0_bool_object_memview(uint8_t[:, :] values,
- int64_t[:] indexer,
- object[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- object fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- object *v
- object *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(object) and
- sizeof(object) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(object) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = True if values[idx, j] > 0 else False
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis0_bool_object(ndarray[uint8_t, ndim=2] values,
- ndarray[int64_t] indexer,
- object[:, :] out,
- fill_value=np.nan):
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis0_bool_object_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- object fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- object *v
- object *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(object) and
- sizeof(object) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(object) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = True if values[idx, j] > 0 else False
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis1_bool_object_memview(uint8_t[:, :] values,
- int64_t[:] indexer,
- object[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- object fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = True if values[i, idx] > 0 else False
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis1_bool_object(ndarray[uint8_t, ndim=2] values,
- ndarray[int64_t] indexer,
- object[:, :] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis1_bool_object_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- object fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = True if values[i, idx] > 0 else False
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_multi_bool_object(ndarray[uint8_t, ndim=2] values,
- indexer,
- ndarray[object, ndim=2] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- ndarray[int64_t] idx0 = indexer[0]
- ndarray[int64_t] idx1 = indexer[1]
- object fv
-
- n = len(idx0)
- k = len(idx1)
-
- fv = fill_value
- for i from 0 <= i < n:
- idx = idx0[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- if idx1[j] == -1:
- out[i, j] = fv
- else:
- out[i, j] = True if values[idx, idx1[j]] > 0 else False
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_1d_int8_int8_memview(int8_t[:] values,
- int64_t[:] indexer,
- int8_t[:] out,
- fill_value=np.nan):
-
-
-
- cdef:
- Py_ssize_t i, n, idx
- int8_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_1d_int8_int8(ndarray[int8_t, ndim=1] values,
- int64_t[:] indexer,
- int8_t[:] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_1d_int8_int8_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
-
- cdef:
- Py_ssize_t i, n, idx
- int8_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
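Every def wrapper in this removed module dispatches on values.flags.writeable:
writeable arrays go through the typed-memoryview implementation, while
read-only arrays fall back to the ndarray-typed copy of the same loop, since
(as the comments above note) the typed memoryviews used here cannot be built
from a read-only buffer. The flag being tested is the ordinary NumPy one::

    import numpy as np

    values = np.arange(4, dtype=np.int8)
    readonly = values.copy()
    readonly.setflags(write=False)

    values.flags.writeable     # True  -> memoryview path
    readonly.flags.writeable   # False -> slower ndarray fallback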
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis0_int8_int8_memview(int8_t[:, :] values,
- int64_t[:] indexer,
- int8_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- int8_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF True:
- cdef:
- int8_t *v
- int8_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(int8_t) and
- sizeof(int8_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(int8_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis0_int8_int8(ndarray[int8_t, ndim=2] values,
- ndarray[int64_t] indexer,
- int8_t[:, :] out,
- fill_value=np.nan):
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis0_int8_int8_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- int8_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF True:
- cdef:
- int8_t *v
- int8_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(int8_t) and
- sizeof(int8_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(int8_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis1_int8_int8_memview(int8_t[:, :] values,
- int64_t[:] indexer,
- int8_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- int8_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis1_int8_int8(ndarray[int8_t, ndim=2] values,
- ndarray[int64_t] indexer,
- int8_t[:, :] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis1_int8_int8_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- int8_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_multi_int8_int8(ndarray[int8_t, ndim=2] values,
- indexer,
- ndarray[int8_t, ndim=2] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- ndarray[int64_t] idx0 = indexer[0]
- ndarray[int64_t] idx1 = indexer[1]
- int8_t fv
-
- n = len(idx0)
- k = len(idx1)
-
- fv = fill_value
- for i from 0 <= i < n:
- idx = idx0[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- if idx1[j] == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[idx, idx1[j]]
-
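The take_2d_multi_* functions take a pair of indexers, one per axis, and a -1
in either the row or the column indexer yields fill_value for that cell.
Roughly, in NumPy terms (illustration only)::

    import numpy as np

    values = np.arange(9, dtype=np.float64).reshape(3, 3)
    row_idx = np.array([0, -1, 2], dtype=np.int64)
    col_idx = np.array([1, -1], dtype=np.int64)

    out = np.empty((len(row_idx), len(col_idx)), dtype=np.float64)
    for i, ri in enumerate(row_idx):
        for j, cj in enumerate(col_idx):
            out[i, j] = np.nan if ri == -1 or cj == -1 else values[ri, cj]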
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_1d_int8_int32_memview(int8_t[:] values,
- int64_t[:] indexer,
- int32_t[:] out,
- fill_value=np.nan):
-
-
-
- cdef:
- Py_ssize_t i, n, idx
- int32_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_1d_int8_int32(ndarray[int8_t, ndim=1] values,
- int64_t[:] indexer,
- int32_t[:] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_1d_int8_int32_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
-
- cdef:
- Py_ssize_t i, n, idx
- int32_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis0_int8_int32_memview(int8_t[:, :] values,
- int64_t[:] indexer,
- int32_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- int32_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- int32_t *v
- int32_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(int32_t) and
- sizeof(int32_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(int32_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis0_int8_int32(ndarray[int8_t, ndim=2] values,
- ndarray[int64_t] indexer,
- int32_t[:, :] out,
- fill_value=np.nan):
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis0_int8_int32_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- int32_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- int32_t *v
- int32_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(int32_t) and
- sizeof(int32_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(int32_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis1_int8_int32_memview(int8_t[:, :] values,
- int64_t[:] indexer,
- int32_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- int32_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis1_int8_int32(ndarray[int8_t, ndim=2] values,
- ndarray[int64_t] indexer,
- int32_t[:, :] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis1_int8_int32_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- int32_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_multi_int8_int32(ndarray[int8_t, ndim=2] values,
- indexer,
- ndarray[int32_t, ndim=2] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- ndarray[int64_t] idx0 = indexer[0]
- ndarray[int64_t] idx1 = indexer[1]
- int32_t fv
-
- n = len(idx0)
- k = len(idx1)
-
- fv = fill_value
- for i from 0 <= i < n:
- idx = idx0[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- if idx1[j] == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[idx, idx1[j]]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_1d_int8_int64_memview(int8_t[:] values,
- int64_t[:] indexer,
- int64_t[:] out,
- fill_value=np.nan):
-
-
-
- cdef:
- Py_ssize_t i, n, idx
- int64_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_1d_int8_int64(ndarray[int8_t, ndim=1] values,
- int64_t[:] indexer,
- int64_t[:] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_1d_int8_int64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
-
- cdef:
- Py_ssize_t i, n, idx
- int64_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis0_int8_int64_memview(int8_t[:, :] values,
- int64_t[:] indexer,
- int64_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- int64_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- int64_t *v
- int64_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(int64_t) and
- sizeof(int64_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(int64_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis0_int8_int64(ndarray[int8_t, ndim=2] values,
- ndarray[int64_t] indexer,
- int64_t[:, :] out,
- fill_value=np.nan):
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis0_int8_int64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- int64_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- int64_t *v
- int64_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(int64_t) and
- sizeof(int64_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(int64_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis1_int8_int64_memview(int8_t[:, :] values,
- int64_t[:] indexer,
- int64_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- int64_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis1_int8_int64(ndarray[int8_t, ndim=2] values,
- ndarray[int64_t] indexer,
- int64_t[:, :] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis1_int8_int64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- int64_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_multi_int8_int64(ndarray[int8_t, ndim=2] values,
- indexer,
- ndarray[int64_t, ndim=2] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- ndarray[int64_t] idx0 = indexer[0]
- ndarray[int64_t] idx1 = indexer[1]
- int64_t fv
-
- n = len(idx0)
- k = len(idx1)
-
- fv = fill_value
- for i from 0 <= i < n:
- idx = idx0[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- if idx1[j] == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[idx, idx1[j]]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_1d_int8_float64_memview(int8_t[:] values,
- int64_t[:] indexer,
- float64_t[:] out,
- fill_value=np.nan):
-
-
-
- cdef:
- Py_ssize_t i, n, idx
- float64_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_1d_int8_float64(ndarray[int8_t, ndim=1] values,
- int64_t[:] indexer,
- float64_t[:] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_1d_int8_float64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
-
- cdef:
- Py_ssize_t i, n, idx
- float64_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis0_int8_float64_memview(int8_t[:, :] values,
- int64_t[:] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- float64_t *v
- float64_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(float64_t) and
- sizeof(float64_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(float64_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis0_int8_float64(ndarray[int8_t, ndim=2] values,
- ndarray[int64_t] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis0_int8_float64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- float64_t *v
- float64_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(float64_t) and
- sizeof(float64_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(float64_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis1_int8_float64_memview(int8_t[:, :] values,
- int64_t[:] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis1_int8_float64(ndarray[int8_t, ndim=2] values,
- ndarray[int64_t] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis1_int8_float64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_multi_int8_float64(ndarray[int8_t, ndim=2] values,
- indexer,
- ndarray[float64_t, ndim=2] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- ndarray[int64_t] idx0 = indexer[0]
- ndarray[int64_t] idx1 = indexer[1]
- float64_t fv
-
- n = len(idx0)
- k = len(idx1)
-
- fv = fill_value
- for i from 0 <= i < n:
- idx = idx0[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- if idx1[j] == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[idx, idx1[j]]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_1d_int16_int16_memview(int16_t[:] values,
- int64_t[:] indexer,
- int16_t[:] out,
- fill_value=np.nan):
-
-
-
- cdef:
- Py_ssize_t i, n, idx
- int16_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_1d_int16_int16(ndarray[int16_t, ndim=1] values,
- int64_t[:] indexer,
- int16_t[:] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_1d_int16_int16_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
-
- cdef:
- Py_ssize_t i, n, idx
- int16_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis0_int16_int16_memview(int16_t[:, :] values,
- int64_t[:] indexer,
- int16_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- int16_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF True:
- cdef:
- int16_t *v
- int16_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(int16_t) and
- sizeof(int16_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(int16_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis0_int16_int16(ndarray[int16_t, ndim=2] values,
- ndarray[int64_t] indexer,
- int16_t[:, :] out,
- fill_value=np.nan):
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis0_int16_int16_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- int16_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF True:
- cdef:
- int16_t *v
- int16_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(int16_t) and
- sizeof(int16_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(int16_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis1_int16_int16_memview(int16_t[:, :] values,
- int64_t[:] indexer,
- int16_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- int16_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis1_int16_int16(ndarray[int16_t, ndim=2] values,
- ndarray[int64_t] indexer,
- int16_t[:, :] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis1_int16_int16_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- int16_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_multi_int16_int16(ndarray[int16_t, ndim=2] values,
- indexer,
- ndarray[int16_t, ndim=2] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- ndarray[int64_t] idx0 = indexer[0]
- ndarray[int64_t] idx1 = indexer[1]
- int16_t fv
-
- n = len(idx0)
- k = len(idx1)
-
- fv = fill_value
- for i from 0 <= i < n:
- idx = idx0[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- if idx1[j] == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[idx, idx1[j]]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_1d_int16_int32_memview(int16_t[:] values,
- int64_t[:] indexer,
- int32_t[:] out,
- fill_value=np.nan):
-
-
-
- cdef:
- Py_ssize_t i, n, idx
- int32_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_1d_int16_int32(ndarray[int16_t, ndim=1] values,
- int64_t[:] indexer,
- int32_t[:] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_1d_int16_int32_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
-
- cdef:
- Py_ssize_t i, n, idx
- int32_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis0_int16_int32_memview(int16_t[:, :] values,
- int64_t[:] indexer,
- int32_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- int32_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- int32_t *v
- int32_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(int32_t) and
- sizeof(int32_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(int32_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis0_int16_int32(ndarray[int16_t, ndim=2] values,
- ndarray[int64_t] indexer,
- int32_t[:, :] out,
- fill_value=np.nan):
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis0_int16_int32_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- int32_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- int32_t *v
- int32_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(int32_t) and
- sizeof(int32_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(int32_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis1_int16_int32_memview(int16_t[:, :] values,
- int64_t[:] indexer,
- int32_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- int32_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis1_int16_int32(ndarray[int16_t, ndim=2] values,
- ndarray[int64_t] indexer,
- int32_t[:, :] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis1_int16_int32_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- int32_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_multi_int16_int32(ndarray[int16_t, ndim=2] values,
- indexer,
- ndarray[int32_t, ndim=2] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- ndarray[int64_t] idx0 = indexer[0]
- ndarray[int64_t] idx1 = indexer[1]
- int32_t fv
-
- n = len(idx0)
- k = len(idx1)
-
- fv = fill_value
- for i from 0 <= i < n:
- idx = idx0[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- if idx1[j] == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[idx, idx1[j]]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_1d_int16_int64_memview(int16_t[:] values,
- int64_t[:] indexer,
- int64_t[:] out,
- fill_value=np.nan):
-
-
-
- cdef:
- Py_ssize_t i, n, idx
- int64_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_1d_int16_int64(ndarray[int16_t, ndim=1] values,
- int64_t[:] indexer,
- int64_t[:] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_1d_int16_int64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
-
- cdef:
- Py_ssize_t i, n, idx
- int64_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis0_int16_int64_memview(int16_t[:, :] values,
- int64_t[:] indexer,
- int64_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- int64_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- int64_t *v
- int64_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(int64_t) and
- sizeof(int64_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(int64_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis0_int16_int64(ndarray[int16_t, ndim=2] values,
- ndarray[int64_t] indexer,
- int64_t[:, :] out,
- fill_value=np.nan):
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis0_int16_int64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- int64_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- int64_t *v
- int64_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(int64_t) and
- sizeof(int64_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(int64_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis1_int16_int64_memview(int16_t[:, :] values,
- int64_t[:] indexer,
- int64_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- int64_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis1_int16_int64(ndarray[int16_t, ndim=2] values,
- ndarray[int64_t] indexer,
- int64_t[:, :] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis1_int16_int64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- int64_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_multi_int16_int64(ndarray[int16_t, ndim=2] values,
- indexer,
- ndarray[int64_t, ndim=2] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- ndarray[int64_t] idx0 = indexer[0]
- ndarray[int64_t] idx1 = indexer[1]
- int64_t fv
-
- n = len(idx0)
- k = len(idx1)
-
- fv = fill_value
- for i from 0 <= i < n:
- idx = idx0[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- if idx1[j] == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[idx, idx1[j]]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_1d_int16_float64_memview(int16_t[:] values,
- int64_t[:] indexer,
- float64_t[:] out,
- fill_value=np.nan):
-
-
-
- cdef:
- Py_ssize_t i, n, idx
- float64_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_1d_int16_float64(ndarray[int16_t, ndim=1] values,
- int64_t[:] indexer,
- float64_t[:] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_1d_int16_float64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
-
- cdef:
- Py_ssize_t i, n, idx
- float64_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis0_int16_float64_memview(int16_t[:, :] values,
- int64_t[:] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- float64_t *v
- float64_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(float64_t) and
- sizeof(float64_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(float64_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis0_int16_float64(ndarray[int16_t, ndim=2] values,
- ndarray[int64_t] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis0_int16_float64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- float64_t *v
- float64_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(float64_t) and
- sizeof(float64_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(float64_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis1_int16_float64_memview(int16_t[:, :] values,
- int64_t[:] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis1_int16_float64(ndarray[int16_t, ndim=2] values,
- ndarray[int64_t] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis1_int16_float64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_multi_int16_float64(ndarray[int16_t, ndim=2] values,
- indexer,
- ndarray[float64_t, ndim=2] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- ndarray[int64_t] idx0 = indexer[0]
- ndarray[int64_t] idx1 = indexer[1]
- float64_t fv
-
- n = len(idx0)
- k = len(idx1)
-
- fv = fill_value
- for i from 0 <= i < n:
- idx = idx0[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- if idx1[j] == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[idx, idx1[j]]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_1d_int32_int32_memview(int32_t[:] values,
- int64_t[:] indexer,
- int32_t[:] out,
- fill_value=np.nan):
-
-
-
- cdef:
- Py_ssize_t i, n, idx
- int32_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_1d_int32_int32(ndarray[int32_t, ndim=1] values,
- int64_t[:] indexer,
- int32_t[:] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_1d_int32_int32_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
-
- cdef:
- Py_ssize_t i, n, idx
- int32_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis0_int32_int32_memview(int32_t[:, :] values,
- int64_t[:] indexer,
- int32_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- int32_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF True:
- cdef:
- int32_t *v
- int32_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(int32_t) and
- sizeof(int32_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(int32_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis0_int32_int32(ndarray[int32_t, ndim=2] values,
- ndarray[int64_t] indexer,
- int32_t[:, :] out,
- fill_value=np.nan):
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis0_int32_int32_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- int32_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF True:
- cdef:
- int32_t *v
- int32_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(int32_t) and
- sizeof(int32_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(int32_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis1_int32_int32_memview(int32_t[:, :] values,
- int64_t[:] indexer,
- int32_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- int32_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis1_int32_int32(ndarray[int32_t, ndim=2] values,
- ndarray[int64_t] indexer,
- int32_t[:, :] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis1_int32_int32_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- int32_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_multi_int32_int32(ndarray[int32_t, ndim=2] values,
- indexer,
- ndarray[int32_t, ndim=2] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- ndarray[int64_t] idx0 = indexer[0]
- ndarray[int64_t] idx1 = indexer[1]
- int32_t fv
-
- n = len(idx0)
- k = len(idx1)
-
- fv = fill_value
- for i from 0 <= i < n:
- idx = idx0[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- if idx1[j] == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[idx, idx1[j]]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_1d_int32_int64_memview(int32_t[:] values,
- int64_t[:] indexer,
- int64_t[:] out,
- fill_value=np.nan):
-
-
-
- cdef:
- Py_ssize_t i, n, idx
- int64_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_1d_int32_int64(ndarray[int32_t, ndim=1] values,
- int64_t[:] indexer,
- int64_t[:] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_1d_int32_int64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
-
- cdef:
- Py_ssize_t i, n, idx
- int64_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis0_int32_int64_memview(int32_t[:, :] values,
- int64_t[:] indexer,
- int64_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- int64_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- int64_t *v
- int64_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(int64_t) and
- sizeof(int64_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(int64_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis0_int32_int64(ndarray[int32_t, ndim=2] values,
- ndarray[int64_t] indexer,
- int64_t[:, :] out,
- fill_value=np.nan):
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis0_int32_int64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- int64_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- int64_t *v
- int64_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(int64_t) and
- sizeof(int64_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(int64_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis1_int32_int64_memview(int32_t[:, :] values,
- int64_t[:] indexer,
- int64_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- int64_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis1_int32_int64(ndarray[int32_t, ndim=2] values,
- ndarray[int64_t] indexer,
- int64_t[:, :] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis1_int32_int64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- int64_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_multi_int32_int64(ndarray[int32_t, ndim=2] values,
- indexer,
- ndarray[int64_t, ndim=2] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- ndarray[int64_t] idx0 = indexer[0]
- ndarray[int64_t] idx1 = indexer[1]
- int64_t fv
-
- n = len(idx0)
- k = len(idx1)
-
- fv = fill_value
- for i from 0 <= i < n:
- idx = idx0[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- if idx1[j] == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[idx, idx1[j]]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_1d_int32_float64_memview(int32_t[:] values,
- int64_t[:] indexer,
- float64_t[:] out,
- fill_value=np.nan):
-
-
-
- cdef:
- Py_ssize_t i, n, idx
- float64_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_1d_int32_float64(ndarray[int32_t, ndim=1] values,
- int64_t[:] indexer,
- float64_t[:] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_1d_int32_float64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
-
- cdef:
- Py_ssize_t i, n, idx
- float64_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis0_int32_float64_memview(int32_t[:, :] values,
- int64_t[:] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- float64_t *v
- float64_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(float64_t) and
- sizeof(float64_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(float64_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis0_int32_float64(ndarray[int32_t, ndim=2] values,
- ndarray[int64_t] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis0_int32_float64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- float64_t *v
- float64_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(float64_t) and
- sizeof(float64_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(float64_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis1_int32_float64_memview(int32_t[:, :] values,
- int64_t[:] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis1_int32_float64(ndarray[int32_t, ndim=2] values,
- ndarray[int64_t] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis1_int32_float64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_multi_int32_float64(ndarray[int32_t, ndim=2] values,
- indexer,
- ndarray[float64_t, ndim=2] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- ndarray[int64_t] idx0 = indexer[0]
- ndarray[int64_t] idx1 = indexer[1]
- float64_t fv
-
- n = len(idx0)
- k = len(idx1)
-
- fv = fill_value
- for i from 0 <= i < n:
- idx = idx0[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- if idx1[j] == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[idx, idx1[j]]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_1d_int64_int64_memview(int64_t[:] values,
- int64_t[:] indexer,
- int64_t[:] out,
- fill_value=np.nan):
-
-
-
- cdef:
- Py_ssize_t i, n, idx
- int64_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_1d_int64_int64(ndarray[int64_t, ndim=1] values,
- int64_t[:] indexer,
- int64_t[:] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_1d_int64_int64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
-
- cdef:
- Py_ssize_t i, n, idx
- int64_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis0_int64_int64_memview(int64_t[:, :] values,
- int64_t[:] indexer,
- int64_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- int64_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF True:
- cdef:
- int64_t *v
- int64_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(int64_t) and
- sizeof(int64_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(int64_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis0_int64_int64(ndarray[int64_t, ndim=2] values,
- ndarray[int64_t] indexer,
- int64_t[:, :] out,
- fill_value=np.nan):
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis0_int64_int64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- int64_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF True:
- cdef:
- int64_t *v
- int64_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(int64_t) and
- sizeof(int64_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(int64_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis1_int64_int64_memview(int64_t[:, :] values,
- int64_t[:] indexer,
- int64_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- int64_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis1_int64_int64(ndarray[int64_t, ndim=2] values,
- ndarray[int64_t] indexer,
- int64_t[:, :] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis1_int64_int64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- int64_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_multi_int64_int64(ndarray[int64_t, ndim=2] values,
- indexer,
- ndarray[int64_t, ndim=2] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- ndarray[int64_t] idx0 = indexer[0]
- ndarray[int64_t] idx1 = indexer[1]
- int64_t fv
-
- n = len(idx0)
- k = len(idx1)
-
- fv = fill_value
- for i from 0 <= i < n:
- idx = idx0[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- if idx1[j] == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[idx, idx1[j]]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_1d_int64_float64_memview(int64_t[:] values,
- int64_t[:] indexer,
- float64_t[:] out,
- fill_value=np.nan):
-
-
-
- cdef:
- Py_ssize_t i, n, idx
- float64_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_1d_int64_float64(ndarray[int64_t, ndim=1] values,
- int64_t[:] indexer,
- float64_t[:] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_1d_int64_float64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
-
- cdef:
- Py_ssize_t i, n, idx
- float64_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis0_int64_float64_memview(int64_t[:, :] values,
- int64_t[:] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- float64_t *v
- float64_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(float64_t) and
- sizeof(float64_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(float64_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis0_int64_float64(ndarray[int64_t, ndim=2] values,
- ndarray[int64_t] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis0_int64_float64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- float64_t *v
- float64_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(float64_t) and
- sizeof(float64_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(float64_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis1_int64_float64_memview(int64_t[:, :] values,
- int64_t[:] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis1_int64_float64(ndarray[int64_t, ndim=2] values,
- ndarray[int64_t] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis1_int64_float64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_multi_int64_float64(ndarray[int64_t, ndim=2] values,
- indexer,
- ndarray[float64_t, ndim=2] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- ndarray[int64_t] idx0 = indexer[0]
- ndarray[int64_t] idx1 = indexer[1]
- float64_t fv
-
- n = len(idx0)
- k = len(idx1)
-
- fv = fill_value
- for i from 0 <= i < n:
- idx = idx0[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- if idx1[j] == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[idx, idx1[j]]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_1d_float32_float32_memview(float32_t[:] values,
- int64_t[:] indexer,
- float32_t[:] out,
- fill_value=np.nan):
-
-
-
- cdef:
- Py_ssize_t i, n, idx
- float32_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_1d_float32_float32(ndarray[float32_t, ndim=1] values,
- int64_t[:] indexer,
- float32_t[:] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_1d_float32_float32_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
-
- cdef:
- Py_ssize_t i, n, idx
- float32_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis0_float32_float32_memview(float32_t[:, :] values,
- int64_t[:] indexer,
- float32_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- float32_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF True:
- cdef:
- float32_t *v
- float32_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(float32_t) and
- sizeof(float32_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(float32_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis0_float32_float32(ndarray[float32_t, ndim=2] values,
- ndarray[int64_t] indexer,
- float32_t[:, :] out,
- fill_value=np.nan):
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis0_float32_float32_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- float32_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF True:
- cdef:
- float32_t *v
- float32_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(float32_t) and
- sizeof(float32_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(float32_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis1_float32_float32_memview(float32_t[:, :] values,
- int64_t[:] indexer,
- float32_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- float32_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis1_float32_float32(ndarray[float32_t, ndim=2] values,
- ndarray[int64_t] indexer,
- float32_t[:, :] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis1_float32_float32_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- float32_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_multi_float32_float32(ndarray[float32_t, ndim=2] values,
- indexer,
- ndarray[float32_t, ndim=2] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- ndarray[int64_t] idx0 = indexer[0]
- ndarray[int64_t] idx1 = indexer[1]
- float32_t fv
-
- n = len(idx0)
- k = len(idx1)
-
- fv = fill_value
- for i from 0 <= i < n:
- idx = idx0[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- if idx1[j] == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[idx, idx1[j]]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_1d_float32_float64_memview(float32_t[:] values,
- int64_t[:] indexer,
- float64_t[:] out,
- fill_value=np.nan):
-
-
-
- cdef:
- Py_ssize_t i, n, idx
- float64_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_1d_float32_float64(ndarray[float32_t, ndim=1] values,
- int64_t[:] indexer,
- float64_t[:] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_1d_float32_float64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
-
- cdef:
- Py_ssize_t i, n, idx
- float64_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis0_float32_float64_memview(float32_t[:, :] values,
- int64_t[:] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- float64_t *v
- float64_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(float64_t) and
- sizeof(float64_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(float64_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis0_float32_float64(ndarray[float32_t, ndim=2] values,
- ndarray[int64_t] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis0_float32_float64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- float64_t *v
- float64_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(float64_t) and
- sizeof(float64_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(float64_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis1_float32_float64_memview(float32_t[:, :] values,
- int64_t[:] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis1_float32_float64(ndarray[float32_t, ndim=2] values,
- ndarray[int64_t] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis1_float32_float64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_multi_float32_float64(ndarray[float32_t, ndim=2] values,
- indexer,
- ndarray[float64_t, ndim=2] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- ndarray[int64_t] idx0 = indexer[0]
- ndarray[int64_t] idx1 = indexer[1]
- float64_t fv
-
- n = len(idx0)
- k = len(idx1)
-
- fv = fill_value
- for i from 0 <= i < n:
- idx = idx0[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- if idx1[j] == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[idx, idx1[j]]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_1d_float64_float64_memview(float64_t[:] values,
- int64_t[:] indexer,
- float64_t[:] out,
- fill_value=np.nan):
-
-
-
- cdef:
- Py_ssize_t i, n, idx
- float64_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_1d_float64_float64(ndarray[float64_t, ndim=1] values,
- int64_t[:] indexer,
- float64_t[:] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_1d_float64_float64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
-
- cdef:
- Py_ssize_t i, n, idx
- float64_t fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
- with nogil:
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis0_float64_float64_memview(float64_t[:, :] values,
- int64_t[:] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF True:
- cdef:
- float64_t *v
- float64_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(float64_t) and
- sizeof(float64_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(float64_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis0_float64_float64(ndarray[float64_t, ndim=2] values,
- ndarray[int64_t] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis0_float64_float64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF True:
- cdef:
- float64_t *v
- float64_t *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(float64_t) and
- sizeof(float64_t) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(float64_t) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis1_float64_float64_memview(float64_t[:, :] values,
- int64_t[:] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis1_float64_float64(ndarray[float64_t, ndim=2] values,
- ndarray[int64_t] indexer,
- float64_t[:, :] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis1_float64_float64_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- float64_t fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_multi_float64_float64(ndarray[float64_t, ndim=2] values,
- indexer,
- ndarray[float64_t, ndim=2] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- ndarray[int64_t] idx0 = indexer[0]
- ndarray[int64_t] idx1 = indexer[1]
- float64_t fv
-
- n = len(idx0)
- k = len(idx1)
-
- fv = fill_value
- for i from 0 <= i < n:
- idx = idx0[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- if idx1[j] == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[idx, idx1[j]]
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_1d_object_object_memview(object[:] values,
- int64_t[:] indexer,
- object[:] out,
- fill_value=np.nan):
-
-
-
- cdef:
- Py_ssize_t i, n, idx
- object fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_1d_object_object(ndarray[object, ndim=1] values,
- int64_t[:] indexer,
- object[:] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_1d_object_object_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
-
- cdef:
- Py_ssize_t i, n, idx
- object fv
-
- n = indexer.shape[0]
-
- fv = fill_value
-
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- out[i] = fv
- else:
- out[i] = values[idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis0_object_object_memview(object[:, :] values,
- int64_t[:] indexer,
- object[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- object fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- object *v
- object *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(object) and
- sizeof(object) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(object) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis0_object_object(ndarray[object, ndim=2] values,
- ndarray[int64_t] indexer,
- object[:, :] out,
- fill_value=np.nan):
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis0_object_object_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- object fv
-
- n = len(indexer)
- k = values.shape[1]
-
- fv = fill_value
-
- IF False:
- cdef:
- object *v
- object *o
-
- #GH3130
- if (values.strides[1] == out.strides[1] and
- values.strides[1] == sizeof(object) and
- sizeof(object) * n >= 256):
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- v = &values[idx, 0]
- o = &out[i, 0]
- memmove(o, v, (sizeof(object) * k))
- return
-
- for i from 0 <= i < n:
- idx = indexer[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- out[i, j] = values[idx, j]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline take_2d_axis1_object_object_memview(object[:, :] values,
- int64_t[:] indexer,
- object[:, :] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- object fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_axis1_object_object(ndarray[object, ndim=2] values,
- ndarray[int64_t] indexer,
- object[:, :] out,
- fill_value=np.nan):
-
- if values.flags.writeable:
- # We can call the memoryview version of the code
- take_2d_axis1_object_object_memview(values, indexer, out,
- fill_value=fill_value)
- return
-
- # We cannot use the memoryview version on readonly-buffers due to
- # a limitation of Cython's typed memoryviews. Instead we can use
- # the slightly slower Cython ndarray type directly.
- cdef:
- Py_ssize_t i, j, k, n, idx
- object fv
-
- n = len(values)
- k = len(indexer)
-
- if n == 0 or k == 0:
- return
-
- fv = fill_value
-
- for i from 0 <= i < n:
- for j from 0 <= j < k:
- idx = indexer[j]
- if idx == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[i, idx]
-
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def take_2d_multi_object_object(ndarray[object, ndim=2] values,
- indexer,
- ndarray[object, ndim=2] out,
- fill_value=np.nan):
- cdef:
- Py_ssize_t i, j, k, n, idx
- ndarray[int64_t] idx0 = indexer[0]
- ndarray[int64_t] idx1 = indexer[1]
- object fv
-
- n = len(idx0)
- k = len(idx1)
-
- fv = fill_value
- for i from 0 <= i < n:
- idx = idx0[i]
- if idx == -1:
- for j from 0 <= j < k:
- out[i, j] = fv
- else:
- for j from 0 <= j < k:
- if idx1[j] == -1:
- out[i, j] = fv
- else:
- out[i, j] = values[idx, idx1[j]]
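
The deleted blocks above are all instances of one generated "take with fill" kernel, specialized per (source dtype, destination dtype) pair: gather elements of values through an int64 indexer, writing fill_value wherever the indexer is -1. A minimal NumPy sketch of the 1-D semantics follows; the function name and dtype handling are illustrative only and not part of the patch.

import numpy as np

def take_1d_with_fill(values, indexer, fill_value=np.nan):
    # Gather values[indexer]; positions where indexer == -1 receive fill_value,
    # mirroring the behaviour of the removed take_1d_* kernels.
    out_dtype = np.result_type(values.dtype, np.asarray(fill_value).dtype)
    out = np.empty(len(indexer), dtype=out_dtype)
    mask = indexer == -1
    out[~mask] = values[indexer[~mask]]
    out[mask] = fill_value
    return out

# take_1d_with_fill(np.array([10, 20, 30]), np.array([2, -1, 0]))
# -> array([30., nan, 10.])
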
diff --git a/pandas/src/datetime.pxd b/pandas/src/datetime.pxd
index 5f7de8244d17e..2267c8282ec14 100644
--- a/pandas/src/datetime.pxd
+++ b/pandas/src/datetime.pxd
@@ -42,9 +42,6 @@ cdef extern from "datetime.h":
object PyDateTime_FromDateAndTime(int year, int month, int day, int hour,
int minute, int second, int us)
-cdef extern from "datetime_helper.h":
- void mangle_nat(object o)
-
cdef extern from "numpy/ndarrayobject.h":
ctypedef int64_t npy_timedelta
@@ -126,8 +123,8 @@ cdef extern from "datetime/np_datetime_strings.h":
-cdef inline _string_to_dts(object val, pandas_datetimestruct* dts,
- int* out_local, int* out_tzoffset):
+cdef inline int _string_to_dts(object val, pandas_datetimestruct* dts,
+ int* out_local, int* out_tzoffset) except? -1:
cdef int result
cdef char *tmp
@@ -139,10 +136,11 @@ cdef inline _string_to_dts(object val, pandas_datetimestruct* dts,
if result == -1:
raise ValueError('Unable to parse %s' % str(val))
+ return result
cdef inline int _cstring_to_dts(char *val, int length,
pandas_datetimestruct* dts,
- int* out_local, int* out_tzoffset):
+ int* out_local, int* out_tzoffset) except? -1:
cdef:
npy_bool special
PANDAS_DATETIMEUNIT out_bestunit
@@ -195,4 +193,3 @@ cdef inline int64_t _date_to_datetime64(object val,
dts.hour = dts.min = dts.sec = dts.us = 0
dts.ps = dts.as = 0
return pandas_datetimestruct_to_datetime(PANDAS_FR_ns, dts)
-
diff --git a/pandas/src/datetime/np_datetime.c b/pandas/src/datetime/np_datetime.c
index 80703c8b08de6..8458418988863 100644
--- a/pandas/src/datetime/np_datetime.c
+++ b/pandas/src/datetime/np_datetime.c
@@ -1,65 +1,65 @@
/*
- * This is derived from Numpy 1.7
- *
- * See NP_LICENSE.txt
- */
+
+Copyright (c) 2016, PyData Development Team
+All rights reserved.
+
+Distributed under the terms of the BSD Simplified License.
+
+The full license is in the LICENSE file, distributed with this software.
+
+Copyright (c) 2005-2011, NumPy Developers
+All rights reserved.
+
+This file is derived from NumPy 1.7. See NUMPY_LICENSE.txt
+
+*/
#define NO_IMPORT
#include <Python.h>
#include <datetime.h>
-/* #define __MSVCRT_VERSION__ 0x0700 /\* whatever above 0x0601 *\/ */
-/* #include */
-/* #define time_t __time64_t */
-/* #define localtime _localtime64 */
-/* #define time _time64 */
-
#include <numpy/arrayobject.h>
#include
#include "np_datetime.h"
#if PY_MAJOR_VERSION >= 3
- #define PyIntObject PyLongObject
- #define PyInt_Type PyLong_Type
- #define PyInt_Check(op) PyLong_Check(op)
- #define PyInt_CheckExact(op) PyLong_CheckExact(op)
- #define PyInt_FromString PyLong_FromString
- #define PyInt_FromUnicode PyLong_FromUnicode
- #define PyInt_FromLong PyLong_FromLong
- #define PyInt_FromSize_t PyLong_FromSize_t
- #define PyInt_FromSsize_t PyLong_FromSsize_t
- #define PyInt_AsLong PyLong_AsLong
- #define PyInt_AS_LONG PyLong_AS_LONG
- #define PyInt_AsSsize_t PyLong_AsSsize_t
- #define PyInt_AsUnsignedLongMask PyLong_AsUnsignedLongMask
- #define PyInt_AsUnsignedLongLongMask PyLong_AsUnsignedLongLongMask
+#define PyIntObject PyLongObject
+#define PyInt_Type PyLong_Type
+#define PyInt_Check(op) PyLong_Check(op)
+#define PyInt_CheckExact(op) PyLong_CheckExact(op)
+#define PyInt_FromString PyLong_FromString
+#define PyInt_FromUnicode PyLong_FromUnicode
+#define PyInt_FromLong PyLong_FromLong
+#define PyInt_FromSize_t PyLong_FromSize_t
+#define PyInt_FromSsize_t PyLong_FromSsize_t
+#define PyInt_AsLong PyLong_AsLong
+#define PyInt_AS_LONG PyLong_AS_LONG
+#define PyInt_AsSsize_t PyLong_AsSsize_t
+#define PyInt_AsUnsignedLongMask PyLong_AsUnsignedLongMask
+#define PyInt_AsUnsignedLongLongMask PyLong_AsUnsignedLongLongMask
#endif
const int days_per_month_table[2][12] = {
- { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 },
- { 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 }
-};
+ {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31},
+ {31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31}};
/*
* Returns 1 if the given year is a leap year, 0 otherwise.
*/
-int is_leapyear(npy_int64 year)
-{
+int is_leapyear(npy_int64 year) {
return (year & 0x3) == 0 && /* year % 4 == 0 */
- ((year % 100) != 0 ||
- (year % 400) == 0);
+ ((year % 100) != 0 || (year % 400) == 0);
}
/*
* Sakamoto's method, from wikipedia
*/
-int dayofweek(int y, int m, int d)
-{
+int dayofweek(int y, int m, int d) {
int day;
static const int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
y -= m < 3;
- day = (y + y/4 - y/100 + y/400 + t[m-1] + d) % 7;
+ day = (y + y / 4 - y / 100 + y / 400 + t[m - 1] + d) % 7;
// convert to python day
return (day + 6) % 7;
}
@@ -68,9 +68,7 @@ int dayofweek(int y, int m, int d)
* Adjusts a datetimestruct based on a minutes offset. Assumes
 * the current values are valid.
*/
-void
-add_minutes_to_datetimestruct(pandas_datetimestruct *dts, int minutes)
-{
+void add_minutes_to_datetimestruct(pandas_datetimestruct *dts, int minutes) {
int isleap;
/* MINUTES */
@@ -102,12 +100,11 @@ add_minutes_to_datetimestruct(pandas_datetimestruct *dts, int minutes)
dts->month = 12;
}
isleap = is_leapyear(dts->year);
- dts->day += days_per_month_table[isleap][dts->month-1];
- }
- else if (dts->day > 28) {
+ dts->day += days_per_month_table[isleap][dts->month - 1];
+ } else if (dts->day > 28) {
isleap = is_leapyear(dts->year);
- if (dts->day > days_per_month_table[isleap][dts->month-1]) {
- dts->day -= days_per_month_table[isleap][dts->month-1];
+ if (dts->day > days_per_month_table[isleap][dts->month - 1]) {
+ dts->day -= days_per_month_table[isleap][dts->month - 1];
dts->month++;
if (dts->month > 12) {
dts->year++;
@@ -120,9 +117,7 @@ add_minutes_to_datetimestruct(pandas_datetimestruct *dts, int minutes)
/*
* Calculates the days offset from the 1970 epoch.
*/
-npy_int64
-get_datetimestruct_days(const pandas_datetimestruct *dts)
-{
+npy_int64 get_datetimestruct_days(const pandas_datetimestruct *dts) {
int i, month;
npy_int64 year, days = 0;
const int *month_lengths;
@@ -147,8 +142,7 @@ get_datetimestruct_days(const pandas_datetimestruct *dts)
year += 300;
/* Add one day for each 400 years */
days += year / 400;
- }
- else {
+ } else {
/*
* 1972 is the closest later year after 1970.
* Include the current year, so subtract 2.
@@ -183,20 +177,17 @@ get_datetimestruct_days(const pandas_datetimestruct *dts)
* Modifies '*days_' to be the day offset within the year,
* and returns the year.
*/
-static npy_int64
-days_to_yearsdays(npy_int64 *days_)
-{
- const npy_int64 days_per_400years = (400*365 + 100 - 4 + 1);
+static npy_int64 days_to_yearsdays(npy_int64 *days_) {
+ const npy_int64 days_per_400years = (400 * 365 + 100 - 4 + 1);
/* Adjust so it's relative to the year 2000 (divisible by 400) */
- npy_int64 days = (*days_) - (365*30 + 7);
+ npy_int64 days = (*days_) - (365 * 30 + 7);
npy_int64 year;
/* Break down the 400 year cycle to get the year and day within the year */
if (days >= 0) {
year = 400 * (days / days_per_400years);
days = days % days_per_400years;
- }
- else {
+ } else {
year = 400 * ((days - (days_per_400years - 1)) / days_per_400years);
days = days % days_per_400years;
if (days < 0) {
@@ -206,14 +197,14 @@ days_to_yearsdays(npy_int64 *days_)
/* Work out the year/day within the 400 year cycle */
if (days >= 366) {
- year += 100 * ((days-1) / (100*365 + 25 - 1));
- days = (days-1) % (100*365 + 25 - 1);
+ year += 100 * ((days - 1) / (100 * 365 + 25 - 1));
+ days = (days - 1) % (100 * 365 + 25 - 1);
if (days >= 365) {
- year += 4 * ((days+1) / (4*365 + 1));
- days = (days+1) % (4*365 + 1);
+ year += 4 * ((days + 1) / (4 * 365 + 1));
+ days = (days + 1) % (4 * 365 + 1);
if (days >= 366) {
- year += (days-1) / 365;
- days = (days-1) % 365;
+ year += (days - 1) / 365;
+ days = (days - 1) % 365;
}
}
}
@@ -226,9 +217,8 @@ days_to_yearsdays(npy_int64 *days_)
* Adjusts a datetimestruct based on a seconds offset. Assumes
* the current values are valid.
*/
-NPY_NO_EXPORT void
-add_seconds_to_datetimestruct(pandas_datetimestruct *dts, int seconds)
-{
+NPY_NO_EXPORT void add_seconds_to_datetimestruct(pandas_datetimestruct *dts,
+ int seconds) {
int minutes;
dts->sec += seconds;
@@ -240,8 +230,7 @@ add_seconds_to_datetimestruct(pandas_datetimestruct *dts, int seconds)
dts->sec += 60;
}
add_minutes_to_datetimestruct(dts, minutes);
- }
- else if (dts->sec >= 60) {
+ } else if (dts->sec >= 60) {
minutes = dts->sec / 60;
dts->sec = dts->sec % 60;
add_minutes_to_datetimestruct(dts, minutes);
@@ -252,9 +241,8 @@ add_seconds_to_datetimestruct(pandas_datetimestruct *dts, int seconds)
* Fills in the year, month, day in 'dts' based on the days
* offset from 1970.
*/
-static void
-set_datetimestruct_days(npy_int64 days, pandas_datetimestruct *dts)
-{
+static void set_datetimestruct_days(npy_int64 days,
+ pandas_datetimestruct *dts) {
const int *month_lengths;
int i;
@@ -266,8 +254,7 @@ set_datetimestruct_days(npy_int64 days, pandas_datetimestruct *dts)
dts->month = i + 1;
dts->day = days + 1;
return;
- }
- else {
+ } else {
days -= month_lengths[i];
}
}
@@ -276,9 +263,8 @@ set_datetimestruct_days(npy_int64 days, pandas_datetimestruct *dts)
/*
* Compares two pandas_datetimestruct objects chronologically
*/
-int
-cmp_pandas_datetimestruct(pandas_datetimestruct *a, pandas_datetimestruct *b)
-{
+int cmp_pandas_datetimestruct(pandas_datetimestruct *a,
+ pandas_datetimestruct *b) {
if (a->year > b->year) {
return 1;
} else if (a->year < b->year) {
@@ -355,11 +341,10 @@ cmp_pandas_datetimestruct(pandas_datetimestruct *a, pandas_datetimestruct *b)
* Returns -1 on error, 0 on success, and 1 (with no error set)
 * if obj doesn't have the needed date or datetime attributes.
*/
-int
-convert_pydatetime_to_datetimestruct(PyObject *obj, pandas_datetimestruct *out,
- PANDAS_DATETIMEUNIT *out_bestunit,
- int apply_tzinfo)
-{
+int convert_pydatetime_to_datetimestruct(PyObject *obj,
+ pandas_datetimestruct *out,
+ PANDAS_DATETIMEUNIT *out_bestunit,
+ int apply_tzinfo) {
PyObject *tmp;
int isleap;
@@ -370,8 +355,8 @@ convert_pydatetime_to_datetimestruct(PyObject *obj, pandas_datetimestruct *out,
/* Need at least year/month/day attributes */
if (!PyObject_HasAttrString(obj, "year") ||
- !PyObject_HasAttrString(obj, "month") ||
- !PyObject_HasAttrString(obj, "day")) {
+ !PyObject_HasAttrString(obj, "month") ||
+ !PyObject_HasAttrString(obj, "day")) {
return 1;
}
@@ -417,15 +402,15 @@ convert_pydatetime_to_datetimestruct(PyObject *obj, pandas_datetimestruct *out,
}
isleap = is_leapyear(out->year);
if (out->day < 1 ||
- out->day > days_per_month_table[isleap][out->month-1]) {
+ out->day > days_per_month_table[isleap][out->month - 1]) {
goto invalid_date;
}
/* Check for time attributes (if not there, return success as a date) */
if (!PyObject_HasAttrString(obj, "hour") ||
- !PyObject_HasAttrString(obj, "minute") ||
- !PyObject_HasAttrString(obj, "second") ||
- !PyObject_HasAttrString(obj, "microsecond")) {
+ !PyObject_HasAttrString(obj, "minute") ||
+ !PyObject_HasAttrString(obj, "second") ||
+ !PyObject_HasAttrString(obj, "microsecond")) {
/* The best unit for date is 'D' */
if (out_bestunit != NULL) {
*out_bestunit = PANDAS_FR_D;
@@ -481,10 +466,8 @@ convert_pydatetime_to_datetimestruct(PyObject *obj, pandas_datetimestruct *out,
}
Py_DECREF(tmp);
- if (out->hour < 0 || out->hour >= 24 ||
- out->min < 0 || out->min >= 60 ||
- out->sec < 0 || out->sec >= 60 ||
- out->us < 0 || out->us >= 1000000) {
+ if (out->hour < 0 || out->hour >= 24 || out->min < 0 || out->min >= 60 ||
+ out->sec < 0 || out->sec >= 60 || out->us < 0 || out->us >= 1000000) {
goto invalid_time;
}
@@ -496,8 +479,7 @@ convert_pydatetime_to_datetimestruct(PyObject *obj, pandas_datetimestruct *out,
}
if (tmp == Py_None) {
Py_DECREF(tmp);
- }
- else {
+ } else {
PyObject *offset;
int seconds_offset, minutes_offset;
@@ -540,20 +522,20 @@ convert_pydatetime_to_datetimestruct(PyObject *obj, pandas_datetimestruct *out,
invalid_date:
PyErr_Format(PyExc_ValueError,
- "Invalid date (%d,%d,%d) when converting to NumPy datetime",
- (int)out->year, (int)out->month, (int)out->day);
+ "Invalid date (%d,%d,%d) when converting to NumPy datetime",
+ (int)out->year, (int)out->month, (int)out->day);
return -1;
invalid_time:
PyErr_Format(PyExc_ValueError,
- "Invalid time (%d,%d,%d,%d) when converting "
- "to NumPy datetime",
- (int)out->hour, (int)out->min, (int)out->sec, (int)out->us);
+ "Invalid time (%d,%d,%d,%d) when converting "
+ "to NumPy datetime",
+ (int)out->hour, (int)out->min, (int)out->sec, (int)out->us);
return -1;
}
-npy_datetime pandas_datetimestruct_to_datetime(PANDAS_DATETIMEUNIT fr, pandas_datetimestruct *d)
-{
+npy_datetime pandas_datetimestruct_to_datetime(PANDAS_DATETIMEUNIT fr,
+ pandas_datetimestruct *d) {
pandas_datetime_metadata meta;
npy_datetime result = PANDAS_DATETIME_NAT;
@@ -565,8 +547,7 @@ npy_datetime pandas_datetimestruct_to_datetime(PANDAS_DATETIMEUNIT fr, pandas_da
}
void pandas_datetime_to_datetimestruct(npy_datetime val, PANDAS_DATETIMEUNIT fr,
- pandas_datetimestruct *result)
-{
+ pandas_datetimestruct *result) {
pandas_datetime_metadata meta;
meta.base = fr;
@@ -576,10 +557,9 @@ void pandas_datetime_to_datetimestruct(npy_datetime val, PANDAS_DATETIMEUNIT fr,
}
PANDAS_DATETIMEUNIT get_datetime64_unit(PyObject *obj) {
- return (PANDAS_DATETIMEUNIT)((PyDatetimeScalarObject *) obj)->obmeta.base;
+ return (PANDAS_DATETIMEUNIT)((PyDatetimeScalarObject *)obj)->obmeta.base;
}
-
/*
* Converts a datetime from a datetimestruct to a datetime based
* on some metadata. The date is assumed to be valid.
@@ -588,23 +568,19 @@ PANDAS_DATETIMEUNIT get_datetime64_unit(PyObject *obj) {
*
* Returns 0 on success, -1 on failure.
*/
-int
-convert_datetimestruct_to_datetime(pandas_datetime_metadata *meta,
- const pandas_datetimestruct *dts,
- npy_datetime *out)
-{
+int convert_datetimestruct_to_datetime(pandas_datetime_metadata *meta,
+ const pandas_datetimestruct *dts,
+ npy_datetime *out) {
npy_datetime ret;
PANDAS_DATETIMEUNIT base = meta->base;
if (base == PANDAS_FR_Y) {
/* Truncate to the year */
ret = dts->year - 1970;
- }
- else if (base == PANDAS_FR_M) {
+ } else if (base == PANDAS_FR_M) {
/* Truncate to the month */
ret = 12 * (dts->year - 1970) + (dts->month - 1);
- }
- else {
+ } else {
/* Otherwise calculate the number of days to start */
npy_int64 days = get_datetimestruct_days(dts);
@@ -613,8 +589,7 @@ convert_datetimestruct_to_datetime(pandas_datetime_metadata *meta,
/* Truncate to weeks */
if (days >= 0) {
ret = days / 7;
- }
- else {
+ } else {
ret = (days - 6) / 7;
}
break;
@@ -622,74 +597,69 @@ convert_datetimestruct_to_datetime(pandas_datetime_metadata *meta,
ret = days;
break;
case PANDAS_FR_h:
- ret = days * 24 +
- dts->hour;
+ ret = days * 24 + dts->hour;
break;
case PANDAS_FR_m:
- ret = (days * 24 +
- dts->hour) * 60 +
- dts->min;
+ ret = (days * 24 + dts->hour) * 60 + dts->min;
break;
case PANDAS_FR_s:
- ret = ((days * 24 +
- dts->hour) * 60 +
- dts->min) * 60 +
- dts->sec;
+ ret = ((days * 24 + dts->hour) * 60 + dts->min) * 60 + dts->sec;
break;
case PANDAS_FR_ms:
- ret = (((days * 24 +
- dts->hour) * 60 +
- dts->min) * 60 +
- dts->sec) * 1000 +
+ ret = (((days * 24 + dts->hour) * 60 + dts->min) * 60 +
+ dts->sec) *
+ 1000 +
dts->us / 1000;
break;
case PANDAS_FR_us:
- ret = (((days * 24 +
- dts->hour) * 60 +
- dts->min) * 60 +
- dts->sec) * 1000000 +
+ ret = (((days * 24 + dts->hour) * 60 + dts->min) * 60 +
+ dts->sec) *
+ 1000000 +
dts->us;
break;
case PANDAS_FR_ns:
- ret = ((((days * 24 +
- dts->hour) * 60 +
- dts->min) * 60 +
- dts->sec) * 1000000 +
- dts->us) * 1000 +
+ ret = ((((days * 24 + dts->hour) * 60 + dts->min) * 60 +
+ dts->sec) *
+ 1000000 +
+ dts->us) *
+ 1000 +
dts->ps / 1000;
break;
case PANDAS_FR_ps:
- ret = ((((days * 24 +
- dts->hour) * 60 +
- dts->min) * 60 +
- dts->sec) * 1000000 +
- dts->us) * 1000000 +
+ ret = ((((days * 24 + dts->hour) * 60 + dts->min) * 60 +
+ dts->sec) *
+ 1000000 +
+ dts->us) *
+ 1000000 +
dts->ps;
break;
case PANDAS_FR_fs:
/* only 2.6 hours */
- ret = (((((days * 24 +
- dts->hour) * 60 +
- dts->min) * 60 +
- dts->sec) * 1000000 +
- dts->us) * 1000000 +
- dts->ps) * 1000 +
+ ret = (((((days * 24 + dts->hour) * 60 + dts->min) * 60 +
+ dts->sec) *
+ 1000000 +
+ dts->us) *
+ 1000000 +
+ dts->ps) *
+ 1000 +
dts->as / 1000;
break;
case PANDAS_FR_as:
/* only 9.2 secs */
- ret = (((((days * 24 +
- dts->hour) * 60 +
- dts->min) * 60 +
- dts->sec) * 1000000 +
- dts->us) * 1000000 +
- dts->ps) * 1000000 +
+ ret = (((((days * 24 + dts->hour) * 60 + dts->min) * 60 +
+ dts->sec) *
+ 1000000 +
+ dts->us) *
+ 1000000 +
+ dts->ps) *
+ 1000000 +
dts->as;
break;
default:
/* Something got corrupted */
- PyErr_SetString(PyExc_ValueError,
- "NumPy datetime metadata with corrupt unit value");
+ PyErr_SetString(
+ PyExc_ValueError,
+ "NumPy datetime metadata with corrupt unit value");
return -1;
}
}
@@ -698,8 +668,7 @@ convert_datetimestruct_to_datetime(pandas_datetime_metadata *meta,
if (meta->num > 1) {
if (ret >= 0) {
ret /= meta->num;
- }
- else {
+ } else {
ret = (ret - meta->num + 1) / meta->num;
}
}
@@ -709,18 +678,15 @@ convert_datetimestruct_to_datetime(pandas_datetime_metadata *meta,
return 0;
}
-
/*
* This provides the casting rules for the TIMEDELTA data type units.
*
* Notably, there is a barrier between the nonlinear years and
* months units, and all the other units.
*/
-npy_bool
-can_cast_timedelta64_units(PANDAS_DATETIMEUNIT src_unit,
- PANDAS_DATETIMEUNIT dst_unit,
- NPY_CASTING casting)
-{
+npy_bool can_cast_timedelta64_units(PANDAS_DATETIMEUNIT src_unit,
+ PANDAS_DATETIMEUNIT dst_unit,
+ NPY_CASTING casting) {
switch (casting) {
/* Allow anything with unsafe casting */
case NPY_UNSAFE_CASTING:
@@ -732,7 +698,7 @@ can_cast_timedelta64_units(PANDAS_DATETIMEUNIT src_unit,
*/
case NPY_SAME_KIND_CASTING:
return (src_unit <= PANDAS_FR_M && dst_unit <= PANDAS_FR_M) ||
- (src_unit > PANDAS_FR_M && dst_unit > PANDAS_FR_M);
+ (src_unit > PANDAS_FR_M && dst_unit > PANDAS_FR_M);
/*
* Enforce the 'date units' vs 'time units' barrier and that
@@ -741,7 +707,7 @@ can_cast_timedelta64_units(PANDAS_DATETIMEUNIT src_unit,
*/
case NPY_SAFE_CASTING:
return (src_unit <= dst_unit) &&
- ((src_unit <= PANDAS_FR_M && dst_unit <= PANDAS_FR_M) ||
+ ((src_unit <= PANDAS_FR_M && dst_unit <= PANDAS_FR_M) ||
(src_unit > PANDAS_FR_M && dst_unit > PANDAS_FR_M));
/* Enforce equality with 'no' or 'equiv' casting */
@@ -756,11 +722,9 @@ can_cast_timedelta64_units(PANDAS_DATETIMEUNIT src_unit,
* Notably, there is a barrier between 'date units' and 'time units'
* for all but 'unsafe' casting.
*/
-npy_bool
-can_cast_datetime64_units(PANDAS_DATETIMEUNIT src_unit,
- PANDAS_DATETIMEUNIT dst_unit,
- NPY_CASTING casting)
-{
+npy_bool can_cast_datetime64_units(PANDAS_DATETIMEUNIT src_unit,
+ PANDAS_DATETIMEUNIT dst_unit,
+ NPY_CASTING casting) {
switch (casting) {
/* Allow anything with unsafe casting */
case NPY_UNSAFE_CASTING:
@@ -772,7 +736,7 @@ can_cast_datetime64_units(PANDAS_DATETIMEUNIT src_unit,
*/
case NPY_SAME_KIND_CASTING:
return (src_unit <= PANDAS_FR_D && dst_unit <= PANDAS_FR_D) ||
- (src_unit > PANDAS_FR_D && dst_unit > PANDAS_FR_D);
+ (src_unit > PANDAS_FR_D && dst_unit > PANDAS_FR_D);
/*
* Enforce the 'date units' vs 'time units' barrier and that
@@ -781,7 +745,7 @@ can_cast_datetime64_units(PANDAS_DATETIMEUNIT src_unit,
*/
case NPY_SAFE_CASTING:
return (src_unit <= dst_unit) &&
- ((src_unit <= PANDAS_FR_D && dst_unit <= PANDAS_FR_D) ||
+ ((src_unit <= PANDAS_FR_D && dst_unit <= PANDAS_FR_D) ||
(src_unit > PANDAS_FR_D && dst_unit > PANDAS_FR_D));
/* Enforce equality with 'no' or 'equiv' casting */
@@ -793,11 +757,9 @@ can_cast_datetime64_units(PANDAS_DATETIMEUNIT src_unit,
/*
* Converts a datetime based on the given metadata into a datetimestruct
*/
-int
-convert_datetime_to_datetimestruct(pandas_datetime_metadata *meta,
- npy_datetime dt,
- pandas_datetimestruct *out)
-{
+int convert_datetime_to_datetimestruct(pandas_datetime_metadata *meta,
+ npy_datetime dt,
+ pandas_datetimestruct *out) {
npy_int64 perday;
/* Initialize the output to all zeros */
@@ -820,12 +782,11 @@ convert_datetime_to_datetimestruct(pandas_datetime_metadata *meta,
case PANDAS_FR_M:
if (dt >= 0) {
- out->year = 1970 + dt / 12;
+ out->year = 1970 + dt / 12;
out->month = dt % 12 + 1;
- }
- else {
- out->year = 1969 + (dt + 1) / 12;
- out->month = 12 + (dt + 1)% 12;
+ } else {
+ out->year = 1969 + (dt + 1) / 12;
+ out->month = 12 + (dt + 1) % 12;
}
break;
@@ -843,11 +804,11 @@ convert_datetime_to_datetimestruct(pandas_datetime_metadata *meta,
if (dt >= 0) {
set_datetimestruct_days(dt / perday, out);
- dt = dt % perday;
- }
- else {
- set_datetimestruct_days((dt - (perday-1)) / perday, out);
- dt = (perday-1) + (dt + 1) % perday;
+ dt = dt % perday;
+ } else {
+ set_datetimestruct_days(
+ dt / perday - (dt % perday == 0 ? 0 : 1), out);
+ dt = (perday - 1) + (dt + 1) % perday;
}
out->hour = dt;
break;
@@ -857,11 +818,11 @@ convert_datetime_to_datetimestruct(pandas_datetime_metadata *meta,
if (dt >= 0) {
set_datetimestruct_days(dt / perday, out);
- dt = dt % perday;
- }
- else {
- set_datetimestruct_days((dt - (perday-1)) / perday, out);
- dt = (perday-1) + (dt + 1) % perday;
+ dt = dt % perday;
+ } else {
+ set_datetimestruct_days(
+ dt / perday - (dt % perday == 0 ? 0 : 1), out);
+ dt = (perday - 1) + (dt + 1) % perday;
}
out->hour = dt / 60;
out->min = dt % 60;
@@ -872,13 +833,13 @@ convert_datetime_to_datetimestruct(pandas_datetime_metadata *meta,
if (dt >= 0) {
set_datetimestruct_days(dt / perday, out);
- dt = dt % perday;
+ dt = dt % perday;
+ } else {
+ set_datetimestruct_days(
+ dt / perday - (dt % perday == 0 ? 0 : 1), out);
+ dt = (perday - 1) + (dt + 1) % perday;
}
- else {
- set_datetimestruct_days((dt - (perday-1)) / perday, out);
- dt = (perday-1) + (dt + 1) % perday;
- }
- out->hour = dt / (60*60);
+ out->hour = dt / (60 * 60);
out->min = (dt / 60) % 60;
out->sec = dt % 60;
break;
@@ -888,14 +849,14 @@ convert_datetime_to_datetimestruct(pandas_datetime_metadata *meta,
if (dt >= 0) {
set_datetimestruct_days(dt / perday, out);
- dt = dt % perday;
- }
- else {
- set_datetimestruct_days((dt - (perday-1)) / perday, out);
- dt = (perday-1) + (dt + 1) % perday;
+ dt = dt % perday;
+ } else {
+ set_datetimestruct_days(
+ dt / perday - (dt % perday == 0 ? 0 : 1), out);
+ dt = (perday - 1) + (dt + 1) % perday;
}
- out->hour = dt / (60*60*1000LL);
- out->min = (dt / (60*1000LL)) % 60;
+ out->hour = dt / (60 * 60 * 1000LL);
+ out->min = (dt / (60 * 1000LL)) % 60;
out->sec = (dt / 1000LL) % 60;
out->us = (dt % 1000LL) * 1000;
break;
@@ -905,14 +866,14 @@ convert_datetime_to_datetimestruct(pandas_datetime_metadata *meta,
if (dt >= 0) {
set_datetimestruct_days(dt / perday, out);
- dt = dt % perday;
+ dt = dt % perday;
+ } else {
+ set_datetimestruct_days(
+ dt / perday - (dt % perday == 0 ? 0 : 1), out);
+ dt = (perday - 1) + (dt + 1) % perday;
}
- else {
- set_datetimestruct_days((dt - (perday-1)) / perday, out);
- dt = (perday-1) + (dt + 1) % perday;
- }
- out->hour = dt / (60*60*1000000LL);
- out->min = (dt / (60*1000000LL)) % 60;
+ out->hour = dt / (60 * 60 * 1000000LL);
+ out->min = (dt / (60 * 1000000LL)) % 60;
out->sec = (dt / 1000000LL) % 60;
out->us = dt % 1000000LL;
break;
@@ -922,14 +883,14 @@ convert_datetime_to_datetimestruct(pandas_datetime_metadata *meta,
if (dt >= 0) {
set_datetimestruct_days(dt / perday, out);
- dt = dt % perday;
- }
- else {
- set_datetimestruct_days((dt - (perday-1)) / perday, out);
- dt = (perday-1) + (dt + 1) % perday;
+ dt = dt % perday;
+ } else {
+ set_datetimestruct_days(
+ dt / perday - (dt % perday == 0 ? 0 : 1), out);
+ dt = (perday - 1) + (dt + 1) % perday;
}
- out->hour = dt / (60*60*1000000000LL);
- out->min = (dt / (60*1000000000LL)) % 60;
+ out->hour = dt / (60 * 60 * 1000000000LL);
+ out->min = (dt / (60 * 1000000000LL)) % 60;
out->sec = (dt / 1000000000LL) % 60;
out->us = (dt / 1000LL) % 1000000LL;
out->ps = (dt % 1000LL) * 1000;
@@ -940,14 +901,14 @@ convert_datetime_to_datetimestruct(pandas_datetime_metadata *meta,
if (dt >= 0) {
set_datetimestruct_days(dt / perday, out);
- dt = dt % perday;
+ dt = dt % perday;
+ } else {
+ set_datetimestruct_days(
+ dt / perday - (dt % perday == 0 ? 0 : 1), out);
+ dt = (perday - 1) + (dt + 1) % perday;
}
- else {
- set_datetimestruct_days((dt - (perday-1)) / perday, out);
- dt = (perday-1) + (dt + 1) % perday;
- }
- out->hour = dt / (60*60*1000000000000LL);
- out->min = (dt / (60*1000000000000LL)) % 60;
+ out->hour = dt / (60 * 60 * 1000000000000LL);
+ out->min = (dt / (60 * 1000000000000LL)) % 60;
out->sec = (dt / 1000000000000LL) % 60;
out->us = (dt / 1000000LL) % 1000000LL;
out->ps = dt % 1000000LL;
@@ -956,20 +917,19 @@ convert_datetime_to_datetimestruct(pandas_datetime_metadata *meta,
case PANDAS_FR_fs:
/* entire range is only +- 2.6 hours */
if (dt >= 0) {
- out->hour = dt / (60*60*1000000000000000LL);
- out->min = (dt / (60*1000000000000000LL)) % 60;
+ out->hour = dt / (60 * 60 * 1000000000000000LL);
+ out->min = (dt / (60 * 1000000000000000LL)) % 60;
out->sec = (dt / 1000000000000000LL) % 60;
out->us = (dt / 1000000000LL) % 1000000LL;
out->ps = (dt / 1000LL) % 1000000LL;
out->as = (dt % 1000LL) * 1000;
- }
- else {
+ } else {
npy_datetime minutes;
- minutes = dt / (60*1000000000000000LL);
- dt = dt % (60*1000000000000000LL);
+ minutes = dt / (60 * 1000000000000000LL);
+ dt = dt % (60 * 1000000000000000LL);
if (dt < 0) {
- dt += (60*1000000000000000LL);
+ dt += (60 * 1000000000000000LL);
--minutes;
}
/* Offset the negative minutes */
@@ -988,8 +948,7 @@ convert_datetime_to_datetimestruct(pandas_datetime_metadata *meta,
out->us = (dt / 1000000000000LL) % 1000000LL;
out->ps = (dt / 1000000LL) % 1000000LL;
out->as = dt % 1000000LL;
- }
- else {
+ } else {
npy_datetime seconds;
seconds = dt / 1000000000000000000LL;
@@ -1008,11 +967,10 @@ convert_datetime_to_datetimestruct(pandas_datetime_metadata *meta,
default:
PyErr_SetString(PyExc_RuntimeError,
- "NumPy datetime metadata is corrupted with invalid "
- "base unit");
+ "NumPy datetime metadata is corrupted with invalid "
+ "base unit");
return -1;
}
return 0;
}
-
diff --git a/pandas/src/datetime/np_datetime.h b/pandas/src/datetime/np_datetime.h
index f200d3a289c06..3445fc3e48376 100644
--- a/pandas/src/datetime/np_datetime.h
+++ b/pandas/src/datetime/np_datetime.h
@@ -1,29 +1,41 @@
/*
- * This is derived from numpy 1.7
- * See NP_LICENSE.TXT
- */
-#ifndef _PANDAS_DATETIME_H_
-#define _PANDAS_DATETIME_H_
+Copyright (c) 2016, PyData Development Team
+All rights reserved.
+
+Distributed under the terms of the BSD Simplified License.
+
+The full license is in the LICENSE file, distributed with this software.
+
+Copyright (c) 2005-2011, NumPy Developers
+All rights reserved.
+
+This file is derived from NumPy 1.7. See NUMPY_LICENSE.txt
+
+*/
+
+#ifndef PANDAS_SRC_DATETIME_NP_DATETIME_H_
+#define PANDAS_SRC_DATETIME_NP_DATETIME_H_
#include
typedef enum {
- PANDAS_FR_Y = 0, /* Years */
- PANDAS_FR_M = 1, /* Months */
- PANDAS_FR_W = 2, /* Weeks */
- /* Gap where NPY_FR_B was */
- PANDAS_FR_D = 4, /* Days */
- PANDAS_FR_h = 5, /* hours */
- PANDAS_FR_m = 6, /* minutes */
- PANDAS_FR_s = 7, /* seconds */
- PANDAS_FR_ms = 8,/* milliseconds */
- PANDAS_FR_us = 9,/* microseconds */
- PANDAS_FR_ns = 10,/* nanoseconds */
- PANDAS_FR_ps = 11,/* picoseconds */
- PANDAS_FR_fs = 12,/* femtoseconds */
- PANDAS_FR_as = 13,/* attoseconds */
- PANDAS_FR_GENERIC = 14 /* Generic, unbound units, can convert to anything */
+ PANDAS_FR_Y = 0, // Years
+ PANDAS_FR_M = 1, // Months
+ PANDAS_FR_W = 2, // Weeks
+ // Gap where NPY_FR_B was
+ PANDAS_FR_D = 4, // Days
+ PANDAS_FR_h = 5, // hours
+ PANDAS_FR_m = 6, // minutes
+ PANDAS_FR_s = 7, // seconds
+ PANDAS_FR_ms = 8, // milliseconds
+ PANDAS_FR_us = 9, // microseconds
+ PANDAS_FR_ns = 10, // nanoseconds
+ PANDAS_FR_ps = 11, // picoseconds
+ PANDAS_FR_fs = 12, // femtoseconds
+ PANDAS_FR_as = 13, // attoseconds
+ PANDAS_FR_GENERIC = 14 // Generic, unbound units, can
+ // convert to anything
} PANDAS_DATETIMEUNIT;
#define PANDAS_DATETIME_NUMUNITS 13
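The enum mirrors NumPy's datetime64 unit codes (with the old business-day slot left as a gap), so the 13 unit strings can be exercised directly from NumPy; a small illustration::

    import numpy as np

    # The 13 unit codes from the enum, in order; value 0 is the 1970 epoch.
    units = ['Y', 'M', 'W', 'D', 'h', 'm', 's', 'ms', 'us', 'ns', 'ps', 'fs', 'as']

    assert str(np.datetime64(0, 'D')) == '1970-01-01'
    assert str(np.datetime64(0, 'M')) == '1970-01'
    assert np.datetime64('2016-01-01', 'D').dtype == np.dtype('datetime64[D]')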
@@ -45,7 +57,8 @@ typedef struct {
// stuff pandas needs
// ----------------------------------------------------------------------------
-int convert_pydatetime_to_datetimestruct(PyObject *obj, pandas_datetimestruct *out,
+int convert_pydatetime_to_datetimestruct(PyObject *obj,
+ pandas_datetimestruct *out,
PANDAS_DATETIMEUNIT *out_bestunit,
int apply_tzinfo);
@@ -96,11 +109,6 @@ add_minutes_to_datetimestruct(pandas_datetimestruct *dts, int minutes);
* Notably, there is a barrier between the nonlinear years and
* months units, and all the other units.
*/
-//npy_bool
-//can_cast_timedelta64_units(PANDAS_DATETIMEUNIT src_unit,
-// PANDAS_DATETIMEUNIT dst_unit,
-// NPY_CASTING casting);
-
npy_bool
can_cast_datetime64_units(PANDAS_DATETIMEUNIT src_unit,
PANDAS_DATETIMEUNIT dst_unit,
@@ -116,4 +124,4 @@ convert_datetime_to_datetimestruct(pandas_datetime_metadata *meta,
PANDAS_DATETIMEUNIT get_datetime64_unit(PyObject *obj);
-#endif
+#endif // PANDAS_SRC_DATETIME_NP_DATETIME_H_
diff --git a/pandas/src/datetime/np_datetime_strings.c b/pandas/src/datetime/np_datetime_strings.c
index b633d6cde0820..5307d394423ff 100644
--- a/pandas/src/datetime/np_datetime_strings.c
+++ b/pandas/src/datetime/np_datetime_strings.c
@@ -1,11 +1,23 @@
/*
- * This file implements string parsing and creation for NumPy datetime.
- *
- * Written by Mark Wiebe (mwwiebe@gmail.com)
- * Copyright (c) 2011 by Enthought, Inc.
- *
- * See NP_LICENSE.txt for the license.
- */
+
+Copyright (c) 2016, PyData Development Team
+All rights reserved.
+
+Distributed under the terms of the BSD Simplified License.
+
+The full license is in the LICENSE file, distributed with this software.
+
+Written by Mark Wiebe (mwwiebe@gmail.com)
+Copyright (c) 2011 by Enthought, Inc.
+
+Copyright (c) 2005-2011, NumPy Developers
+All rights reserved.
+
+See NUMPY_LICENSE.txt for the license.
+
+This file implements string parsing and creation for NumPy datetime.
+
+*/
#define PY_SSIZE_T_CLEAN
#define NO_IMPORT
@@ -20,9 +32,7 @@
#include "np_datetime.h"
#include "np_datetime_strings.h"
-NPY_NO_EXPORT const char *
-npy_casting_to_string(NPY_CASTING casting)
-{
+NPY_NO_EXPORT const char *npy_casting_to_string(NPY_CASTING casting) {
switch (casting) {
case NPY_NO_CASTING:
return "'no'";
@@ -42,35 +52,23 @@ npy_casting_to_string(NPY_CASTING casting)
/* Platform-specific time_t typedef */
typedef time_t NPY_TIME_T;
-/*// We *do* want these symbols, but for cython, not for C. fine in mac osx,*/
-/*// linux complains.*/
-/*static void _suppress_unused_variable_warning(void)*/
-/*{*/
-/* int x = days_per_month_table[0][0];*/
-/* x = x;*/
+/* We *do* want these symbols, but for Cython, not for C.
+ Fine in Mac OSX, but Linux complains.
+
+static void _suppress_unused_variable_warning(void) {
+ int x = days_per_month_table[0][0];
+ x = x;
-/* int y = _month_offset[0][0];*/
-/* y = y;*/
+ int y = _month_offset[0][0];
+ y = y;
-/* char *z = _datetime_strings[0];*/
-/* z = z;*/
-/*}*/
+ char *z = _datetime_strings[0];
+ z = z;
+} */
/* Exported as DATETIMEUNITS in multiarraymodule.c */
static char *_datetime_strings[PANDAS_DATETIME_NUMUNITS] = {
- "Y",
- "M",
- "W",
- "D",
- "h",
- "m",
- "s",
- "ms",
- "us",
- "ns",
- "ps",
- "fs",
- "as",
+ "Y", "M", "W", "D", "h", "m", "s", "ms", "us", "ns", "ps", "fs", "as",
};
/*
* Wraps `localtime` functionality for multiple platforms. This
@@ -78,30 +76,28 @@ static char *_datetime_strings[PANDAS_DATETIME_NUMUNITS] = {
*
* Returns 0 on success, -1 on failure.
*/
-static int
-get_localtime(NPY_TIME_T *ts, struct tm *tms)
-{
+static int get_localtime(NPY_TIME_T *ts, struct tm *tms) {
char *func_name = "";
#if defined(_WIN32)
- #if defined(_MSC_VER) && (_MSC_VER >= 1400)
+#if defined(_MSC_VER) && (_MSC_VER >= 1400)
if (localtime_s(tms, ts) != 0) {
func_name = "localtime_s";
goto fail;
}
- #elif defined(__GNUC__) && defined(NPY_MINGW_USE_CUSTOM_MSVCR)
+#elif defined(__GNUC__) && defined(NPY_MINGW_USE_CUSTOM_MSVCR)
if (_localtime64_s(tms, ts) != 0) {
func_name = "_localtime64_s";
goto fail;
}
- #else
+#else
struct tm *tms_tmp;
- tms_tmp = localtime(ts);
+ localtime_r(ts, tms_tmp);
if (tms_tmp == NULL) {
func_name = "localtime";
goto fail;
}
memcpy(tms, tms_tmp, sizeof(struct tm));
- #endif
+#endif
#else
if (localtime_r(ts, tms) == NULL) {
func_name = "localtime_r";
@@ -112,8 +108,10 @@ get_localtime(NPY_TIME_T *ts, struct tm *tms)
return 0;
fail:
- PyErr_Format(PyExc_OSError, "Failed to use '%s' to convert "
- "to a local time", func_name);
+ PyErr_Format(PyExc_OSError,
+ "Failed to use '%s' to convert "
+ "to a local time",
+ func_name);
return -1;
}
@@ -125,29 +123,28 @@ get_localtime(NPY_TIME_T *ts, struct tm *tms)
* Returns 0 on success, -1 on failure.
*/
static int
-get_gmtime(NPY_TIME_T *ts, struct tm *tms)
-{
+get_gmtime(NPY_TIME_T *ts, struct tm *tms) {
char *func_name = "";
#if defined(_WIN32)
- #if defined(_MSC_VER) && (_MSC_VER >= 1400)
+#if defined(_MSC_VER) && (_MSC_VER >= 1400)
if (gmtime_s(tms, ts) != 0) {
func_name = "gmtime_s";
goto fail;
}
- #elif defined(__GNUC__) && defined(NPY_MINGW_USE_CUSTOM_MSVCR)
+#elif defined(__GNUC__) && defined(NPY_MINGW_USE_CUSTOM_MSVCR)
if (_gmtime64_s(tms, ts) != 0) {
func_name = "_gmtime64_s";
goto fail;
}
- #else
+#else
struct tm *tms_tmp;
- tms_tmp = gmtime(ts);
+ gmtime_r(ts, tms_tmp);
if (tms_tmp == NULL) {
func_name = "gmtime";
goto fail;
}
memcpy(tms, tms_tmp, sizeof(struct tm));
- #endif
+#endif
#else
if (gmtime_r(ts, tms) == NULL) {
func_name = "gmtime_r";
@@ -170,10 +167,9 @@ get_gmtime(NPY_TIME_T *ts, struct tm *tms)
*
* Returns 0 on success, -1 on failure.
*/
-static int
-convert_datetimestruct_utc_to_local(pandas_datetimestruct *out_dts_local,
- const pandas_datetimestruct *dts_utc, int *out_timezone_offset)
-{
+static int convert_datetimestruct_utc_to_local(
+ pandas_datetimestruct *out_dts_local, const pandas_datetimestruct *dts_utc,
+ int *out_timezone_offset) {
NPY_TIME_T rawtime = 0, localrawtime;
struct tm tm_;
npy_int64 year_correction = 0;
@@ -187,8 +183,7 @@ convert_datetimestruct_utc_to_local(pandas_datetimestruct *out_dts_local,
/* 2036 is a leap year */
year_correction = out_dts_local->year - 2036;
out_dts_local->year -= year_correction;
- }
- else {
+ } else {
/* 2037 is not a leap year */
year_correction = out_dts_local->year - 2037;
out_dts_local->year -= year_correction;
@@ -239,8 +234,7 @@ convert_datetimestruct_utc_to_local(pandas_datetimestruct *out_dts_local,
*/
static int
convert_datetimestruct_local_to_utc(pandas_datetimestruct *out_dts_utc,
- const pandas_datetimestruct *dts_local)
-{
+ const pandas_datetimestruct *dts_local) {
npy_int64 year_correction = 0;
/* Make a copy of the input 'dts' to modify */
@@ -252,8 +246,7 @@ convert_datetimestruct_local_to_utc(pandas_datetimestruct *out_dts_utc,
/* 2036 is a leap year */
year_correction = out_dts_utc->year - 2036;
out_dts_utc->year -= year_correction;
- }
- else {
+ } else {
/* 2037 is not a leap year */
year_correction = out_dts_utc->year - 2037;
out_dts_utc->year -= year_correction;
@@ -332,7 +325,8 @@ convert_datetimestruct_local_to_utc(pandas_datetimestruct *out_dts_utc,
/* } */
/* /\* Parse the ISO date *\/ */
-/* if (parse_iso_8601_datetime(str, len, PANDAS_FR_us, NPY_UNSAFE_CASTING, */
+/* if (parse_iso_8601_datetime(str, len, PANDAS_FR_us, NPY_UNSAFE_CASTING,
+ */
/* dts, NULL, &bestunit, NULL) < 0) { */
/* Py_DECREF(bytes); */
/* return -1; */
@@ -342,7 +336,6 @@ convert_datetimestruct_local_to_utc(pandas_datetimestruct *out_dts_utc,
/* return 0; */
/* } */
-
/*
* Parses (almost) standard ISO 8601 date strings. The differences are:
*
@@ -365,7 +358,7 @@ convert_datetimestruct_local_to_utc(pandas_datetimestruct *out_dts_utc,
* to be cast to the 'unit' parameter.
*
* 'out' gets filled with the parsed date-time.
- * 'out_local' gets set to 1 if the parsed time contains timezone,
+ * 'out_local' gets set to 1 if the parsed time contains timezone,
* to 0 otherwise.
* 'out_tzoffset' gets set to timezone offset by minutes
* if the parsed time was in local time,
@@ -381,16 +374,11 @@ convert_datetimestruct_local_to_utc(pandas_datetimestruct *out_dts_utc,
*
* Returns 0 on success, -1 on failure.
*/
-int
-parse_iso_8601_datetime(char *str, int len,
- PANDAS_DATETIMEUNIT unit,
- NPY_CASTING casting,
- pandas_datetimestruct *out,
- int *out_local,
- int *out_tzoffset,
- PANDAS_DATETIMEUNIT *out_bestunit,
- npy_bool *out_special)
-{
+int parse_iso_8601_datetime(char *str, int len, PANDAS_DATETIMEUNIT unit,
+ NPY_CASTING casting, pandas_datetimestruct *out,
+ int *out_local, int *out_tzoffset,
+ PANDAS_DATETIMEUNIT *out_bestunit,
+ npy_bool *out_special) {
int year_leap = 0;
int i, numdigits;
char *substr, sublen;
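This parser backs pandas' string-to-timestamp conversion, so the accepted forms described above can be exercised from Python; a quick sketch (illustrative, behavior as of this era of pandas)::

    import pandas as pd

    # A trailing 'Z' or a numeric offset produces a timezone-aware
    # timestamp; a bare datetime stays naive.
    pd.Timestamp('2016-01-01T09:30:00Z')
    pd.Timestamp('2016-01-01T09:30:00+05:30')
    pd.Timestamp('2016-01-01 09:30:00')

    # 'now' and 'today' are handled as special strings, as in the C code above.
    pd.Timestamp('now')
    pd.Timestamp('today')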
@@ -417,7 +405,6 @@ parse_iso_8601_datetime(char *str, int len,
out->month = 1;
out->day = 1;
-
/*
* The string "today" means take today's date in local time, and
* convert it to a date representation. This date representation, if
@@ -427,11 +414,9 @@ parse_iso_8601_datetime(char *str, int len,
* switching to an adjacent day depending on the current time and your
* timezone.
*/
- if (len == 5 && tolower(str[0]) == 't' &&
- tolower(str[1]) == 'o' &&
- tolower(str[2]) == 'd' &&
- tolower(str[3]) == 'a' &&
- tolower(str[4]) == 'y') {
+ if (len == 5 && tolower(str[0]) == 't' && tolower(str[1]) == 'o' &&
+ tolower(str[2]) == 'd' && tolower(str[3]) == 'a' &&
+ tolower(str[4]) == 'y') {
NPY_TIME_T rawtime = 0;
struct tm tm_;
@@ -460,9 +445,9 @@ parse_iso_8601_datetime(char *str, int len,
}
/* Check the casting rule */
- if (!can_cast_datetime64_units(bestunit, unit,
- casting)) {
- PyErr_Format(PyExc_TypeError, "Cannot parse \"%s\" as unit "
+ if (!can_cast_datetime64_units(bestunit, unit, casting)) {
+ PyErr_Format(PyExc_TypeError,
+ "Cannot parse \"%s\" as unit "
"'%s' using casting rule %s",
str, _datetime_strings[unit],
npy_casting_to_string(casting));
@@ -473,9 +458,8 @@ parse_iso_8601_datetime(char *str, int len,
}
/* The string "now" resolves to the current UTC time */
- if (len == 3 && tolower(str[0]) == 'n' &&
- tolower(str[1]) == 'o' &&
- tolower(str[2]) == 'w') {
+ if (len == 3 && tolower(str[0]) == 'n' && tolower(str[1]) == 'o' &&
+ tolower(str[2]) == 'w') {
NPY_TIME_T rawtime = 0;
pandas_datetime_metadata meta;
@@ -503,9 +487,9 @@ parse_iso_8601_datetime(char *str, int len,
}
/* Check the casting rule */
- if (!can_cast_datetime64_units(bestunit, unit,
- casting)) {
- PyErr_Format(PyExc_TypeError, "Cannot parse \"%s\" as unit "
+ if (!can_cast_datetime64_units(bestunit, unit, casting)) {
+ PyErr_Format(PyExc_TypeError,
+ "Cannot parse \"%s\" as unit "
"'%s' using casting rule %s",
str, _datetime_strings[unit],
npy_casting_to_string(casting));
@@ -543,12 +527,11 @@ parse_iso_8601_datetime(char *str, int len,
out->year = 0;
if (sublen >= 4 && isdigit(substr[0]) && isdigit(substr[1]) &&
isdigit(substr[2]) && isdigit(substr[3])) {
-
out->year = 1000 * (substr[0] - '0') + 100 * (substr[1] - '0') +
- 10 * (substr[2] - '0') + (substr[3] - '0');
+ 10 * (substr[2] - '0') + (substr[3] - '0');
substr += 4;
- sublen -= 4;;
+ sublen -= 4;
}
/* Negate the year if necessary */
@@ -596,8 +579,7 @@ parse_iso_8601_datetime(char *str, int len,
out->month = 10 * out->month + (*substr - '0');
++substr;
--sublen;
- }
- else if (!has_ymd_sep) {
+ } else if (!has_ymd_sep) {
goto parse_error;
}
if (out->month < 1 || out->month > 12) {
@@ -610,7 +592,7 @@ parse_iso_8601_datetime(char *str, int len,
if (sublen == 0) {
/* Forbid YYYYMM. Parsed instead as YYMMDD by someone else. */
if (!has_ymd_sep) {
- goto parse_error;
+ goto parse_error;
}
if (out_local != NULL) {
*out_local = 0;
@@ -631,7 +613,7 @@ parse_iso_8601_datetime(char *str, int len,
/* PARSE THE DAY */
/* First digit required */
if (!isdigit(*substr)) {
- goto parse_error;
+ goto parse_error;
}
out->day = (*substr - '0');
++substr;
@@ -641,13 +623,11 @@ parse_iso_8601_datetime(char *str, int len,
out->day = 10 * out->day + (*substr - '0');
++substr;
--sublen;
- }
- else if (!has_ymd_sep) {
+ } else if (!has_ymd_sep) {
goto parse_error;
}
if (out->day < 1 ||
- out->day > days_per_month_table[year_leap][out->month-1])
- {
+ out->day > days_per_month_table[year_leap][out->month - 1]) {
PyErr_Format(PyExc_ValueError,
"Day out of range in datetime string \"%s\"", str);
goto error;
@@ -684,7 +664,7 @@ parse_iso_8601_datetime(char *str, int len,
--sublen;
if (out->hour >= 24) {
PyErr_Format(PyExc_ValueError,
- "Hours out of range in datetime string \"%s\"", str);
+ "Hours out of range in datetime string \"%s\"", str);
goto error;
}
}
@@ -706,8 +686,7 @@ parse_iso_8601_datetime(char *str, int len,
if (sublen == 0 || !isdigit(*substr)) {
goto parse_error;
}
- }
- else if (!isdigit(*substr)) {
+ } else if (!isdigit(*substr)) {
if (!hour_was_2_digits) {
goto parse_error;
}
@@ -730,8 +709,7 @@ parse_iso_8601_datetime(char *str, int len,
"Minutes out of range in datetime string \"%s\"", str);
goto error;
}
- }
- else if (!has_hms_sep) {
+ } else if (!has_hms_sep) {
goto parse_error;
}
@@ -749,10 +727,8 @@ parse_iso_8601_datetime(char *str, int len,
if (sublen == 0 || !isdigit(*substr)) {
goto parse_error;
}
- }
- else if (!has_hms_sep && isdigit(*substr)) {
- }
- else {
+ } else if (!has_hms_sep && isdigit(*substr)) {
+ } else {
bestunit = PANDAS_FR_m;
goto parse_timezone;
}
@@ -772,8 +748,7 @@ parse_iso_8601_datetime(char *str, int len,
"Seconds out of range in datetime string \"%s\"", str);
goto error;
}
- }
- else if (!has_hms_sep) {
+ } else if (!has_hms_sep) {
goto parse_error;
}
@@ -781,8 +756,7 @@ parse_iso_8601_datetime(char *str, int len,
if (sublen > 0 && *substr == '.') {
++substr;
--sublen;
- }
- else {
+ } else {
bestunit = PANDAS_FR_s;
goto parse_timezone;
}
@@ -791,7 +765,7 @@ parse_iso_8601_datetime(char *str, int len,
numdigits = 0;
for (i = 0; i < 6; ++i) {
out->us *= 10;
- if (sublen > 0 && isdigit(*substr)) {
+ if (sublen > 0 && isdigit(*substr)) {
out->us += (*substr - '0');
++substr;
--sublen;
@@ -802,8 +776,7 @@ parse_iso_8601_datetime(char *str, int len,
if (sublen == 0 || !isdigit(*substr)) {
if (numdigits > 3) {
bestunit = PANDAS_FR_us;
- }
- else {
+ } else {
bestunit = PANDAS_FR_ms;
}
goto parse_timezone;
@@ -824,8 +797,7 @@ parse_iso_8601_datetime(char *str, int len,
if (sublen == 0 || !isdigit(*substr)) {
if (numdigits > 3) {
bestunit = PANDAS_FR_ps;
- }
- else {
+ } else {
bestunit = PANDAS_FR_ns;
}
goto parse_timezone;
@@ -845,16 +817,15 @@ parse_iso_8601_datetime(char *str, int len,
if (numdigits > 3) {
bestunit = PANDAS_FR_as;
- }
- else {
+ } else {
bestunit = PANDAS_FR_fs;
}
parse_timezone:
    /* trim any whitespace between time/timezone */
while (sublen > 0 && isspace(*substr)) {
- ++substr;
- --sublen;
+ ++substr;
+ --sublen;
}
if (sublen == 0) {
@@ -871,18 +842,16 @@ parse_iso_8601_datetime(char *str, int len,
if (out_tzoffset != NULL) {
*out_tzoffset = 0;
- }
+ }
if (sublen == 1) {
goto finish;
- }
- else {
+ } else {
++substr;
--sublen;
}
- }
- /* Time zone offset */
- else if (*substr == '-' || *substr == '+') {
+ } else if (*substr == '-' || *substr == '+') {
+ /* Time zone offset */
int offset_neg = 0, offset_hour = 0, offset_minute = 0;
/*
@@ -903,17 +872,16 @@ parse_iso_8601_datetime(char *str, int len,
sublen -= 2;
if (offset_hour >= 24) {
PyErr_Format(PyExc_ValueError,
- "Timezone hours offset out of range "
- "in datetime string \"%s\"", str);
+ "Timezone hours offset out of range "
+ "in datetime string \"%s\"",
+ str);
goto error;
}
- }
- else if (sublen >= 1 && isdigit(substr[0])) {
+ } else if (sublen >= 1 && isdigit(substr[0])) {
offset_hour = substr[0] - '0';
++substr;
--sublen;
- }
- else {
+ } else {
goto parse_error;
}
@@ -932,17 +900,16 @@ parse_iso_8601_datetime(char *str, int len,
sublen -= 2;
if (offset_minute >= 60) {
PyErr_Format(PyExc_ValueError,
- "Timezone minutes offset out of range "
- "in datetime string \"%s\"", str);
+ "Timezone minutes offset out of range "
+ "in datetime string \"%s\"",
+ str);
goto error;
}
- }
- else if (sublen >= 1 && isdigit(substr[0])) {
+ } else if (sublen >= 1 && isdigit(substr[0])) {
offset_minute = substr[0] - '0';
++substr;
--sublen;
- }
- else {
+ } else {
goto parse_error;
}
}
@@ -975,9 +942,9 @@ parse_iso_8601_datetime(char *str, int len,
}
/* Check the casting rule */
- if (!can_cast_datetime64_units(bestunit, unit,
- casting)) {
- PyErr_Format(PyExc_TypeError, "Cannot parse \"%s\" as unit "
+ if (!can_cast_datetime64_units(bestunit, unit, casting)) {
+ PyErr_Format(PyExc_TypeError,
+ "Cannot parse \"%s\" as unit "
"'%s' using casting rule %s",
str, _datetime_strings[unit],
npy_casting_to_string(casting));
@@ -988,8 +955,8 @@ parse_iso_8601_datetime(char *str, int len,
parse_error:
PyErr_Format(PyExc_ValueError,
- "Error parsing datetime string \"%s\" at position %d",
- str, (int)(substr-str));
+ "Error parsing datetime string \"%s\" at position %d", str,
+ (int)(substr - str));
return -1;
error:
@@ -1000,9 +967,7 @@ parse_iso_8601_datetime(char *str, int len,
* Provides a string length to use for converting datetime
* objects with the given local and unit settings.
*/
-int
-get_datetime_iso_8601_strlen(int local, PANDAS_DATETIMEUNIT base)
-{
+int get_datetime_iso_8601_strlen(int local, PANDAS_DATETIMEUNIT base) {
int len = 0;
switch (base) {
@@ -1010,28 +975,28 @@ get_datetime_iso_8601_strlen(int local, PANDAS_DATETIMEUNIT base)
/*case PANDAS_FR_GENERIC:*/
/* return 4;*/
case PANDAS_FR_as:
- len += 3; /* "###" */
+ len += 3; /* "###" */
case PANDAS_FR_fs:
- len += 3; /* "###" */
+ len += 3; /* "###" */
case PANDAS_FR_ps:
- len += 3; /* "###" */
+ len += 3; /* "###" */
case PANDAS_FR_ns:
- len += 3; /* "###" */
+ len += 3; /* "###" */
case PANDAS_FR_us:
- len += 3; /* "###" */
+ len += 3; /* "###" */
case PANDAS_FR_ms:
- len += 4; /* ".###" */
+ len += 4; /* ".###" */
case PANDAS_FR_s:
- len += 3; /* ":##" */
+ len += 3; /* ":##" */
case PANDAS_FR_m:
- len += 3; /* ":##" */
+ len += 3; /* ":##" */
case PANDAS_FR_h:
- len += 3; /* "T##" */
+ len += 3; /* "T##" */
case PANDAS_FR_D:
case PANDAS_FR_W:
- len += 3; /* "-##" */
+ len += 3; /* "-##" */
case PANDAS_FR_M:
- len += 3; /* "-##" */
+ len += 3; /* "-##" */
case PANDAS_FR_Y:
len += 21; /* 64-bit year */
break;
@@ -1042,10 +1007,9 @@ get_datetime_iso_8601_strlen(int local, PANDAS_DATETIMEUNIT base)
if (base >= PANDAS_FR_h) {
if (local) {
- len += 5; /* "+####" or "-####" */
- }
- else {
- len += 1; /* "Z" */
+ len += 5; /* "+####" or "-####" */
+ } else {
+ len += 1; /* "Z" */
}
}
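Because the switch falls through from the finest unit up to the year, the returned length is a running sum. For seconds resolution with a local (offset) timezone the visible cases contribute the following (plain arithmetic mirroring the fall-through; the full function also reserves one extra byte for a trailing NUL, which is outside this hunk)::

    pieces = {
        'year': 21,                       # 64-bit year, sign included
        '-MM': 3, '-DD': 3,
        'Thh': 3, ':mm': 3, ':ss': 3,
        'tz': 5,                          # "+####" or "-####"; UTC 'Z' needs only 1
    }
    assert sum(pieces.values()) == 41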
@@ -1058,43 +1022,31 @@ get_datetime_iso_8601_strlen(int local, PANDAS_DATETIMEUNIT base)
* Finds the largest unit whose value is nonzero, and for which
* the remainder for the rest of the units is zero.
*/
-static PANDAS_DATETIMEUNIT
-lossless_unit_from_datetimestruct(pandas_datetimestruct *dts)
-{
+static PANDAS_DATETIMEUNIT lossless_unit_from_datetimestruct(
+ pandas_datetimestruct *dts) {
if (dts->as % 1000 != 0) {
return PANDAS_FR_as;
- }
- else if (dts->as != 0) {
+ } else if (dts->as != 0) {
return PANDAS_FR_fs;
- }
- else if (dts->ps % 1000 != 0) {
+ } else if (dts->ps % 1000 != 0) {
return PANDAS_FR_ps;
- }
- else if (dts->ps != 0) {
+ } else if (dts->ps != 0) {
return PANDAS_FR_ns;
- }
- else if (dts->us % 1000 != 0) {
+ } else if (dts->us % 1000 != 0) {
return PANDAS_FR_us;
- }
- else if (dts->us != 0) {
+ } else if (dts->us != 0) {
return PANDAS_FR_ms;
- }
- else if (dts->sec != 0) {
+ } else if (dts->sec != 0) {
return PANDAS_FR_s;
- }
- else if (dts->min != 0) {
+ } else if (dts->min != 0) {
return PANDAS_FR_m;
- }
- else if (dts->hour != 0) {
+ } else if (dts->hour != 0) {
return PANDAS_FR_h;
- }
- else if (dts->day != 1) {
+ } else if (dts->day != 1) {
return PANDAS_FR_D;
- }
- else if (dts->month != 1) {
+ } else if (dts->month != 1) {
return PANDAS_FR_M;
- }
- else {
+ } else {
return PANDAS_FR_Y;
}
}
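The cascade picks the coarsest unit that still round-trips the value: it walks upward from attoseconds and stops at the first field that would lose information. The same idea in a short Python sketch (illustrative only, field names follow ``pandas_datetimestruct``)::

    def lossless_unit(dts):
        # dts is a dict of datetimestruct fields; defaults are the 1970-01-01 epoch.
        checks = [
            ('as', dts.get('as', 0) % 1000 != 0),
            ('fs', dts.get('as', 0) != 0),
            ('ps', dts.get('ps', 0) % 1000 != 0),
            ('ns', dts.get('ps', 0) != 0),
            ('us', dts.get('us', 0) % 1000 != 0),
            ('ms', dts.get('us', 0) != 0),
            ('s', dts.get('sec', 0) != 0),
            ('m', dts.get('min', 0) != 0),
            ('h', dts.get('hour', 0) != 0),
            ('D', dts.get('day', 1) != 1),
            ('M', dts.get('month', 1) != 1),
        ]
        for unit, lossy in checks:
            if lossy:
                return unit
        return 'Y'

    assert lossless_unit({'year': 2016}) == 'Y'
    assert lossless_unit({'year': 2016, 'month': 3}) == 'M'
    assert lossless_unit({'year': 2016, 'hour': 9, 'min': 30}) == 'm'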
@@ -1125,11 +1077,9 @@ lossless_unit_from_datetimestruct(pandas_datetimestruct *dts)
* Returns 0 on success, -1 on failure (for example if the output
* string was too short).
*/
-int
-make_iso_8601_datetime(pandas_datetimestruct *dts, char *outstr, int outlen,
- int local, PANDAS_DATETIMEUNIT base, int tzoffset,
- NPY_CASTING casting)
-{
+int make_iso_8601_datetime(pandas_datetimestruct *dts, char *outstr, int outlen,
+ int local, PANDAS_DATETIMEUNIT base, int tzoffset,
+ NPY_CASTING casting) {
pandas_datetimestruct dts_local;
int timezone_offset = 0;
@@ -1160,10 +1110,9 @@ make_iso_8601_datetime(pandas_datetimestruct *dts, char *outstr, int outlen,
/* Set dts to point to our local time instead of the UTC time */
dts = &dts_local;
- }
- /* Use the manually provided tzoffset */
- else if (local) {
- /* Make a copy of the pandas_datetimestruct we can modify */
+ } else if (local) {
+ // Use the manually provided tzoffset.
+ // Make a copy of the pandas_datetimestruct we can modify.
dts_local = *dts;
dts = &dts_local;
@@ -1180,22 +1129,23 @@ make_iso_8601_datetime(pandas_datetimestruct *dts, char *outstr, int outlen,
if (casting != NPY_UNSAFE_CASTING) {
/* Producing a date as a local time is always 'unsafe' */
if (base <= PANDAS_FR_D && local) {
- PyErr_SetString(PyExc_TypeError, "Cannot create a local "
- "timezone-based date string from a NumPy "
- "datetime without forcing 'unsafe' casting");
+ PyErr_SetString(PyExc_TypeError,
+ "Cannot create a local "
+ "timezone-based date string from a NumPy "
+ "datetime without forcing 'unsafe' casting");
return -1;
- }
- /* Only 'unsafe' and 'same_kind' allow data loss */
- else {
+ } else {
+ /* Only 'unsafe' and 'same_kind' allow data loss */
PANDAS_DATETIMEUNIT unitprec;
unitprec = lossless_unit_from_datetimestruct(dts);
if (casting != NPY_SAME_KIND_CASTING && unitprec > base) {
- PyErr_Format(PyExc_TypeError, "Cannot create a "
- "string with unit precision '%s' "
- "from the NumPy datetime, which has data at "
- "unit precision '%s', "
- "requires 'unsafe' or 'same_kind' casting",
+ PyErr_Format(PyExc_TypeError,
+ "Cannot create a "
+ "string with unit precision '%s' "
+ "from the NumPy datetime, which has data at "
+ "unit precision '%s', "
+ "requires 'unsafe' or 'same_kind' casting",
_datetime_strings[base],
_datetime_strings[unitprec]);
return -1;
@@ -1203,12 +1153,12 @@ make_iso_8601_datetime(pandas_datetimestruct *dts, char *outstr, int outlen,
}
}
- /* YEAR */
- /*
- * Can't use PyOS_snprintf, because it always produces a '\0'
- * character at the end, and NumPy string types are permitted
- * to have data all the way to the end of the buffer.
- */
+/* YEAR */
+/*
+ * Can't use PyOS_snprintf, because it always produces a '\0'
+ * character at the end, and NumPy string types are permitted
+ * to have data all the way to the end of the buffer.
+ */
#ifdef _WIN32
tmplen = _snprintf(substr, sublen, "%04" NPY_INT64_FMT, dts->year);
#else
@@ -1230,15 +1180,15 @@ make_iso_8601_datetime(pandas_datetimestruct *dts, char *outstr, int outlen,
}
/* MONTH */
- if (sublen < 1 ) {
+ if (sublen < 1) {
goto string_too_short;
}
substr[0] = '-';
- if (sublen < 2 ) {
+ if (sublen < 2) {
goto string_too_short;
}
substr[1] = (char)((dts->month / 10) + '0');
- if (sublen < 3 ) {
+ if (sublen < 3) {
goto string_too_short;
}
substr[2] = (char)((dts->month % 10) + '0');
@@ -1254,15 +1204,15 @@ make_iso_8601_datetime(pandas_datetimestruct *dts, char *outstr, int outlen,
}
/* DAY */
- if (sublen < 1 ) {
+ if (sublen < 1) {
goto string_too_short;
}
substr[0] = '-';
- if (sublen < 2 ) {
+ if (sublen < 2) {
goto string_too_short;
}
substr[1] = (char)((dts->day / 10) + '0');
- if (sublen < 3 ) {
+ if (sublen < 3) {
goto string_too_short;
}
substr[2] = (char)((dts->day % 10) + '0');
@@ -1278,15 +1228,15 @@ make_iso_8601_datetime(pandas_datetimestruct *dts, char *outstr, int outlen,
}
/* HOUR */
- if (sublen < 1 ) {
+ if (sublen < 1) {
goto string_too_short;
}
substr[0] = 'T';
- if (sublen < 2 ) {
+ if (sublen < 2) {
goto string_too_short;
}
substr[1] = (char)((dts->hour / 10) + '0');
- if (sublen < 3 ) {
+ if (sublen < 3) {
goto string_too_short;
}
substr[2] = (char)((dts->hour % 10) + '0');
@@ -1299,15 +1249,15 @@ make_iso_8601_datetime(pandas_datetimestruct *dts, char *outstr, int outlen,
}
/* MINUTE */
- if (sublen < 1 ) {
+ if (sublen < 1) {
goto string_too_short;
}
substr[0] = ':';
- if (sublen < 2 ) {
+ if (sublen < 2) {
goto string_too_short;
}
substr[1] = (char)((dts->min / 10) + '0');
- if (sublen < 3 ) {
+ if (sublen < 3) {
goto string_too_short;
}
substr[2] = (char)((dts->min % 10) + '0');
@@ -1320,15 +1270,15 @@ make_iso_8601_datetime(pandas_datetimestruct *dts, char *outstr, int outlen,
}
/* SECOND */
- if (sublen < 1 ) {
+ if (sublen < 1) {
goto string_too_short;
}
substr[0] = ':';
- if (sublen < 2 ) {
+ if (sublen < 2) {
goto string_too_short;
}
substr[1] = (char)((dts->sec / 10) + '0');
- if (sublen < 3 ) {
+ if (sublen < 3) {
goto string_too_short;
}
substr[2] = (char)((dts->sec % 10) + '0');
@@ -1341,19 +1291,19 @@ make_iso_8601_datetime(pandas_datetimestruct *dts, char *outstr, int outlen,
}
/* MILLISECOND */
- if (sublen < 1 ) {
+ if (sublen < 1) {
goto string_too_short;
}
substr[0] = '.';
- if (sublen < 2 ) {
+ if (sublen < 2) {
goto string_too_short;
}
substr[1] = (char)((dts->us / 100000) % 10 + '0');
- if (sublen < 3 ) {
+ if (sublen < 3) {
goto string_too_short;
}
substr[2] = (char)((dts->us / 10000) % 10 + '0');
- if (sublen < 4 ) {
+ if (sublen < 4) {
goto string_too_short;
}
substr[3] = (char)((dts->us / 1000) % 10 + '0');
@@ -1366,15 +1316,15 @@ make_iso_8601_datetime(pandas_datetimestruct *dts, char *outstr, int outlen,
}
/* MICROSECOND */
- if (sublen < 1 ) {
+ if (sublen < 1) {
goto string_too_short;
}
substr[0] = (char)((dts->us / 100) % 10 + '0');
- if (sublen < 2 ) {
+ if (sublen < 2) {
goto string_too_short;
}
substr[1] = (char)((dts->us / 10) % 10 + '0');
- if (sublen < 3 ) {
+ if (sublen < 3) {
goto string_too_short;
}
substr[2] = (char)(dts->us % 10 + '0');
@@ -1387,15 +1337,15 @@ make_iso_8601_datetime(pandas_datetimestruct *dts, char *outstr, int outlen,
}
/* NANOSECOND */
- if (sublen < 1 ) {
+ if (sublen < 1) {
goto string_too_short;
}
substr[0] = (char)((dts->ps / 100000) % 10 + '0');
- if (sublen < 2 ) {
+ if (sublen < 2) {
goto string_too_short;
}
substr[1] = (char)((dts->ps / 10000) % 10 + '0');
- if (sublen < 3 ) {
+ if (sublen < 3) {
goto string_too_short;
}
substr[2] = (char)((dts->ps / 1000) % 10 + '0');
@@ -1408,15 +1358,15 @@ make_iso_8601_datetime(pandas_datetimestruct *dts, char *outstr, int outlen,
}
/* PICOSECOND */
- if (sublen < 1 ) {
+ if (sublen < 1) {
goto string_too_short;
}
substr[0] = (char)((dts->ps / 100) % 10 + '0');
- if (sublen < 2 ) {
+ if (sublen < 2) {
goto string_too_short;
}
substr[1] = (char)((dts->ps / 10) % 10 + '0');
- if (sublen < 3 ) {
+ if (sublen < 3) {
goto string_too_short;
}
substr[2] = (char)(dts->ps % 10 + '0');
@@ -1429,15 +1379,15 @@ make_iso_8601_datetime(pandas_datetimestruct *dts, char *outstr, int outlen,
}
/* FEMTOSECOND */
- if (sublen < 1 ) {
+ if (sublen < 1) {
goto string_too_short;
}
substr[0] = (char)((dts->as / 100000) % 10 + '0');
- if (sublen < 2 ) {
+ if (sublen < 2) {
goto string_too_short;
}
substr[1] = (char)((dts->as / 10000) % 10 + '0');
- if (sublen < 3 ) {
+ if (sublen < 3) {
goto string_too_short;
}
substr[2] = (char)((dts->as / 1000) % 10 + '0');
@@ -1450,15 +1400,15 @@ make_iso_8601_datetime(pandas_datetimestruct *dts, char *outstr, int outlen,
}
/* ATTOSECOND */
- if (sublen < 1 ) {
+ if (sublen < 1) {
goto string_too_short;
}
substr[0] = (char)((dts->as / 100) % 10 + '0');
- if (sublen < 2 ) {
+ if (sublen < 2) {
goto string_too_short;
}
substr[1] = (char)((dts->as / 10) % 10 + '0');
- if (sublen < 3 ) {
+ if (sublen < 3) {
goto string_too_short;
}
substr[2] = (char)(dts->as % 10 + '0');
@@ -1474,35 +1424,33 @@ make_iso_8601_datetime(pandas_datetimestruct *dts, char *outstr, int outlen,
if (timezone_offset < 0) {
substr[0] = '-';
timezone_offset = -timezone_offset;
- }
- else {
+ } else {
substr[0] = '+';
}
substr += 1;
sublen -= 1;
/* Add the timezone offset */
- if (sublen < 1 ) {
+ if (sublen < 1) {
goto string_too_short;
}
- substr[0] = (char)((timezone_offset / (10*60)) % 10 + '0');
- if (sublen < 2 ) {
+ substr[0] = (char)((timezone_offset / (10 * 60)) % 10 + '0');
+ if (sublen < 2) {
goto string_too_short;
}
substr[1] = (char)((timezone_offset / 60) % 10 + '0');
- if (sublen < 3 ) {
+ if (sublen < 3) {
goto string_too_short;
}
substr[2] = (char)(((timezone_offset % 60) / 10) % 10 + '0');
- if (sublen < 4 ) {
+ if (sublen < 4) {
goto string_too_short;
}
substr[3] = (char)((timezone_offset % 60) % 10 + '0');
substr += 4;
sublen -= 4;
- }
- /* UTC "Zulu" time */
- else {
+ } else {
+ /* UTC "Zulu" time */
if (sublen < 1) {
goto string_too_short;
}
@@ -1520,8 +1468,8 @@ make_iso_8601_datetime(pandas_datetimestruct *dts, char *outstr, int outlen,
string_too_short:
PyErr_Format(PyExc_RuntimeError,
- "The string provided for NumPy ISO datetime formatting "
- "was too short, with length %d",
- outlen);
+ "The string provided for NumPy ISO datetime formatting "
+ "was too short, with length %d",
+ outlen);
return -1;
}
diff --git a/pandas/src/datetime/np_datetime_strings.h b/pandas/src/datetime/np_datetime_strings.h
index 0d9a0944310fb..1114ec5eae064 100644
--- a/pandas/src/datetime/np_datetime_strings.h
+++ b/pandas/src/datetime/np_datetime_strings.h
@@ -1,9 +1,26 @@
/*
- * This is derived from numpy 1.7. See NP_LICENSE.txt
- */
-#ifndef _NPY_PRIVATE__DATETIME_STRINGS_H_
-#define _NPY_PRIVATE__DATETIME_STRINGS_H_
+Copyright (c) 2016, PyData Development Team
+All rights reserved.
+
+Distributed under the terms of the BSD Simplified License.
+
+The full license is in the LICENSE file, distributed with this software.
+
+Written by Mark Wiebe (mwwiebe@gmail.com)
+Copyright (c) 2011 by Enthought, Inc.
+
+Copyright (c) 2005-2011, NumPy Developers
+All rights reserved.
+
+See NUMPY_LICENSE.txt for the license.
+
+This file implements string parsing and creation for NumPy datetime.
+
+*/
+
+#ifndef PANDAS_SRC_DATETIME_NP_DATETIME_STRINGS_H_
+#define PANDAS_SRC_DATETIME_NP_DATETIME_STRINGS_H_
/*
* Parses (almost) standard ISO 8601 date strings. The differences are:
@@ -86,4 +103,4 @@ make_iso_8601_datetime(pandas_datetimestruct *dts, char *outstr, int outlen,
int local, PANDAS_DATETIMEUNIT base, int tzoffset,
NPY_CASTING casting);
-#endif
+#endif // PANDAS_SRC_DATETIME_NP_DATETIME_STRINGS_H_
diff --git a/pandas/src/datetime_helper.h b/pandas/src/datetime_helper.h
index d78e91e747854..2b24028ff3d8c 100644
--- a/pandas/src/datetime_helper.h
+++ b/pandas/src/datetime_helper.h
@@ -1,21 +1,29 @@
+/*
+Copyright (c) 2016, PyData Development Team
+All rights reserved.
+
+Distributed under the terms of the BSD Simplified License.
+
+The full license is in the LICENSE file, distributed with this software.
+*/
+
+#ifndef PANDAS_SRC_DATETIME_HELPER_H_
+#define PANDAS_SRC_DATETIME_HELPER_H_
+
+#include
#include "datetime.h"
#include "numpy/arrayobject.h"
#include "numpy/arrayscalars.h"
-#include
#if PY_MAJOR_VERSION >= 3
#define PyInt_AS_LONG PyLong_AsLong
#endif
-void mangle_nat(PyObject *val) {
- PyDateTime_GET_MONTH(val) = -1;
- PyDateTime_GET_DAY(val) = -1;
-}
-
npy_int64 get_long_attr(PyObject *o, const char *attr) {
npy_int64 long_val;
PyObject *value = PyObject_GetAttrString(o, attr);
- long_val = (PyLong_Check(value) ? PyLong_AsLongLong(value) : PyInt_AS_LONG(value));
+ long_val = (PyLong_Check(value) ?
+ PyLong_AsLongLong(value) : PyInt_AS_LONG(value));
Py_DECREF(value);
return long_val;
}
@@ -28,3 +36,5 @@ npy_float64 total_seconds(PyObject *td) {
npy_int64 days_in_seconds = days * 24LL * 3600LL;
return (microseconds + (seconds + days_in_seconds) * 1000000.0) / 1000000.0;
}
+
+#endif // PANDAS_SRC_DATETIME_HELPER_H_
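``total_seconds`` above reproduces ``datetime.timedelta.total_seconds()`` from the ``days``/``seconds``/``microseconds`` attributes; a quick check of the formula in plain Python::

    from datetime import timedelta

    td = timedelta(days=1, seconds=30, microseconds=500000)
    formula = (td.microseconds + (td.seconds + td.days * 24 * 3600) * 1e6) / 1e6
    assert formula == td.total_seconds() == 86430.5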
diff --git a/pandas/src/hash.pyx b/pandas/src/hash.pyx
new file mode 100644
index 0000000000000..06ed947808e39
--- /dev/null
+++ b/pandas/src/hash.pyx
@@ -0,0 +1,191 @@
+# cython: profile=False
+# Translated from the reference implementation
+# at https://github.com/veorq/SipHash
+
+import cython
+cimport numpy as cnp
+import numpy as np
+from numpy cimport ndarray, uint8_t, uint32_t, uint64_t
+
+from util cimport _checknull
+from cpython cimport (PyString_Check,
+ PyBytes_Check,
+ PyUnicode_Check)
+from libc.stdlib cimport malloc, free
+
+DEF cROUNDS = 2
+DEF dROUNDS = 4
+
+
+@cython.boundscheck(False)
+def hash_object_array(ndarray[object] arr, object key, object encoding='utf8'):
+ """
+ Parameters
+ ----------
+ arr : 1-d object ndarray of objects
+ key : hash key, must be 16 byte len encoded
+ encoding : encoding for key & arr, default to 'utf8'
+
+ Returns
+ -------
+ 1-d uint64 ndarray of hashes
+
+ Notes
+ -----
+ allowed values must be strings, or nulls
+ mixed array types will raise TypeError
+
+ """
+ cdef:
+ Py_ssize_t i, l, n
+ ndarray[uint64_t] result
+ bytes data, k
+ uint8_t *kb
+ uint64_t *lens
+ char **vecs, *cdata
+ object val
+
+ k = key.encode(encoding)
+    kb = <uint8_t *>k
+ if len(k) != 16:
+ raise ValueError(
+            'key should encode to 16 bytes, got {!r} (len {})'.format(
+ k, len(k)))
+
+ n = len(arr)
+
+ # create an array of bytes
+    vecs = <char **>malloc(n * sizeof(char *))
+    lens = <uint64_t *>malloc(n * sizeof(uint64_t))
+
+ cdef list datas = []
+ for i in range(n):
+ val = arr[i]
+ if PyString_Check(val):
+ data = val.encode(encoding)
+ elif PyBytes_Check(val):
+ data = val
+ elif PyUnicode_Check(val):
+ data = val.encode(encoding)
+ elif _checknull(val):
+ # null, stringify and encode
+ data = str(val).encode(encoding)
+
+ else:
+ raise TypeError("{} of type {} is not a valid type for hashing, "
+ "must be string or null".format(val, type(val)))
+
+ l = len(data)
+ lens[i] = l
+ cdata = data
+
+        # keep the reference alive through the end of the
+        # function
+ datas.append(data)
+ vecs[i] = cdata
+
+ result = np.empty(n, dtype=np.uint64)
+ with nogil:
+ for i in range(n):
+            result[i] = low_level_siphash(<uint8_t *>vecs[i], lens[i], kb)
+
+ free(vecs)
+ free(lens)
+ return result
+
+cdef inline uint64_t _rotl(uint64_t x, uint64_t b) nogil:
+ return (x << b) | (x >> (64 - b))
+
+cdef inline void u32to8_le(uint8_t* p, uint32_t v) nogil:
+ p[0] = (v)
+ p[1] = (v >> 8)
+ p[2] = (v >> 16)
+ p[3] = (v >> 24)
+
+cdef inline void u64to8_le(uint8_t* p, uint64_t v) nogil:
+ u32to8_le(p, v)
+ u32to8_le(p + 4, (v >> 32))
+
+cdef inline uint64_t u8to64_le(uint8_t* p) nogil:
+    return (<uint64_t>p[0] |
+            <uint64_t>p[1] << 8 |
+            <uint64_t>p[2] << 16 |
+            <uint64_t>p[3] << 24 |
+            <uint64_t>p[4] << 32 |
+            <uint64_t>p[5] << 40 |
+            <uint64_t>p[6] << 48 |
+            <uint64_t>p[7] << 56)
+
+cdef inline void _sipround(uint64_t* v0, uint64_t* v1,
+ uint64_t* v2, uint64_t* v3) nogil:
+ v0[0] += v1[0]
+ v1[0] = _rotl(v1[0], 13)
+ v1[0] ^= v0[0]
+ v0[0] = _rotl(v0[0], 32)
+ v2[0] += v3[0]
+ v3[0] = _rotl(v3[0], 16)
+ v3[0] ^= v2[0]
+ v0[0] += v3[0]
+ v3[0] = _rotl(v3[0], 21)
+ v3[0] ^= v0[0]
+ v2[0] += v1[0]
+ v1[0] = _rotl(v1[0], 17)
+ v1[0] ^= v2[0]
+ v2[0] = _rotl(v2[0], 32)
+
+cpdef uint64_t siphash(bytes data, bytes key) except? 0:
+ if len(key) != 16:
+ raise ValueError(
+ 'key should be a 16-byte bytestring, got {!r} (len {})'.format(
+ key, len(key)))
+ return low_level_siphash(data, len(data), key)
+
+
+@cython.cdivision(True)
+cdef uint64_t low_level_siphash(uint8_t* data, size_t datalen,
+ uint8_t* key) nogil:
+ cdef uint64_t v0 = 0x736f6d6570736575ULL
+ cdef uint64_t v1 = 0x646f72616e646f6dULL
+ cdef uint64_t v2 = 0x6c7967656e657261ULL
+ cdef uint64_t v3 = 0x7465646279746573ULL
+ cdef uint64_t b
+ cdef uint64_t k0 = u8to64_le(key)
+ cdef uint64_t k1 = u8to64_le(key + 8)
+ cdef uint64_t m
+ cdef int i
+ cdef uint8_t* end = data + datalen - (datalen % sizeof(uint64_t))
+ cdef int left = datalen & 7
+ cdef int left_byte
+
+    b = (<uint64_t>datalen) << 56
+ v3 ^= k1
+ v2 ^= k0
+ v1 ^= k1
+ v0 ^= k0
+
+ while (data != end):
+ m = u8to64_le(data)
+ v3 ^= m
+ for i in range(cROUNDS):
+ _sipround(&v0, &v1, &v2, &v3)
+ v0 ^= m
+
+ data += sizeof(uint64_t)
+
+ for i in range(left-1, -1, -1):
+        b |= (<uint64_t>data[i]) << (i * 8)
+
+ v3 ^= b
+
+ for i in range(cROUNDS):
+ _sipround(&v0, &v1, &v2, &v3)
+
+ v0 ^= b
+ v2 ^= 0xff
+
+ for i in range(dROUNDS):
+ _sipround(&v0, &v1, &v2, &v3)
+
+ b = v0 ^ v1 ^ v2 ^ v3
+
+ return b
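A sketch of how these entry points are used from Python. The module name below is an assumption (the extension is built from ``pandas/src/hash.pyx`` and may be exposed under a different name); the key must encode to exactly 16 bytes, and the array must contain strings or nulls::

    import numpy as np
    # Assumed import path for the compiled extension; adjust to the build.
    from pandas import hashing

    key = '0123456789123456'                      # encodes to exactly 16 bytes
    arr = np.array(['foo', 'bar', np.nan], dtype=object)

    hashes = hashing.hash_object_array(arr, key)  # 1-d uint64 ndarray
    assert hashes.dtype == np.uint64 and len(hashes) == 3

    # siphash is deterministic for a given (data, key) pair.
    assert hashing.siphash(b'pandas', b'0123456789123456') == \
        hashing.siphash(b'pandas', b'0123456789123456')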
diff --git a/pandas/src/hashtable_class_helper.pxi b/pandas/src/hashtable_class_helper.pxi
deleted file mode 100644
index da0c76aeca86f..0000000000000
--- a/pandas/src/hashtable_class_helper.pxi
+++ /dev/null
@@ -1,860 +0,0 @@
-"""
-Template for each `dtype` helper function for hashtable
-
-WARNING: DO NOT edit .pxi FILE directly, .pxi is generated from .pxi.in
-"""
-
-#----------------------------------------------------------------------
-# VectorData
-#----------------------------------------------------------------------
-
-
-ctypedef struct Float64VectorData:
- float64_t *data
- size_t n, m
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef void append_data_float64(Float64VectorData *data,
- float64_t x) nogil:
-
- data.data[data.n] = x
- data.n += 1
-
-
-ctypedef struct Int64VectorData:
- int64_t *data
- size_t n, m
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef void append_data_int64(Int64VectorData *data,
- int64_t x) nogil:
-
- data.data[data.n] = x
- data.n += 1
-
-ctypedef fused vector_data:
- Int64VectorData
- Float64VectorData
-
-cdef bint needs_resize(vector_data *data) nogil:
- return data.n == data.m
-
-#----------------------------------------------------------------------
-# Vector
-#----------------------------------------------------------------------
-
-cdef class Float64Vector:
-
- cdef:
- Float64VectorData *data
- ndarray ao
-
- def __cinit__(self):
- self.data = PyMem_Malloc(
- sizeof(Float64VectorData))
- if not self.data:
- raise MemoryError()
- self.data.n = 0
- self.data.m = _INIT_VEC_CAP
- self.ao = np.empty(self.data.m, dtype=np.float64)
- self.data.data = self.ao.data
-
- cdef resize(self):
- self.data.m = max(self.data.m * 4, _INIT_VEC_CAP)
- self.ao.resize(self.data.m)
- self.data.data = self.ao.data
-
- def __dealloc__(self):
- PyMem_Free(self.data)
-
- def __len__(self):
- return self.data.n
-
- def to_array(self):
- self.ao.resize(self.data.n)
- self.data.m = self.data.n
- return self.ao
-
- cdef inline void append(self, float64_t x):
-
- if needs_resize(self.data):
- self.resize()
-
- append_data_float64(self.data, x)
-
-cdef class Int64Vector:
-
- cdef:
- Int64VectorData *data
- ndarray ao
-
- def __cinit__(self):
- self.data = PyMem_Malloc(
- sizeof(Int64VectorData))
- if not self.data:
- raise MemoryError()
- self.data.n = 0
- self.data.m = _INIT_VEC_CAP
- self.ao = np.empty(self.data.m, dtype=np.int64)
- self.data.data = self.ao.data
-
- cdef resize(self):
- self.data.m = max(self.data.m * 4, _INIT_VEC_CAP)
- self.ao.resize(self.data.m)
- self.data.data = self.ao.data
-
- def __dealloc__(self):
- PyMem_Free(self.data)
-
- def __len__(self):
- return self.data.n
-
- def to_array(self):
- self.ao.resize(self.data.n)
- self.data.m = self.data.n
- return self.ao
-
- cdef inline void append(self, int64_t x):
-
- if needs_resize(self.data):
- self.resize()
-
- append_data_int64(self.data, x)
-
-
-cdef class ObjectVector:
-
- cdef:
- PyObject **data
- size_t n, m
- ndarray ao
-
- def __cinit__(self):
- self.n = 0
- self.m = _INIT_VEC_CAP
- self.ao = np.empty(_INIT_VEC_CAP, dtype=object)
- self.data = self.ao.data
-
- def __len__(self):
- return self.n
-
- cdef inline append(self, object o):
- if self.n == self.m:
- self.m = max(self.m * 2, _INIT_VEC_CAP)
- self.ao.resize(self.m)
- self.data = self.ao.data
-
- Py_INCREF(o)
- self.data[self.n] = o
- self.n += 1
-
- def to_array(self):
- self.ao.resize(self.n)
- self.m = self.n
- return self.ao
-
-
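The vector classes being removed here grow their backing ndarray lazily: ``append`` checks ``needs_resize``, the buffer is enlarged geometrically, and ``to_array`` trims it back to the exact length. The same amortized pattern in plain Python/NumPy (a sketch, not the pandas classes; the real initial capacity lives in the ``.pxi.in`` template)::

    import numpy as np

    class GrowableInt64:
        """Append-only int64 buffer with geometric growth, like Int64Vector."""
        INIT_CAP = 128   # placeholder for _INIT_VEC_CAP

        def __init__(self):
            self.n = 0
            self.ao = np.empty(self.INIT_CAP, dtype=np.int64)

        def append(self, x):
            if self.n == len(self.ao):                 # needs_resize
                self.ao = np.resize(self.ao, max(len(self.ao) * 4, self.INIT_CAP))
            self.ao[self.n] = x
            self.n += 1

        def to_array(self):
            return self.ao[:self.n].copy()             # trim to length

    v = GrowableInt64()
    for i in range(1000):
        v.append(i)
    assert (v.to_array() == np.arange(1000)).all()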
-#----------------------------------------------------------------------
-# HashTable
-#----------------------------------------------------------------------
-
-
-cdef class HashTable:
- pass
-
-cdef class Float64HashTable(HashTable):
-
- def __cinit__(self, size_hint=1):
- self.table = kh_init_float64()
- if size_hint is not None:
- kh_resize_float64(self.table, size_hint)
-
- def __len__(self):
- return self.table.size
-
- def __dealloc__(self):
- kh_destroy_float64(self.table)
-
- def __contains__(self, object key):
- cdef khiter_t k
- k = kh_get_float64(self.table, key)
- return k != self.table.n_buckets
-
- cpdef get_item(self, float64_t val):
- cdef khiter_t k
- k = kh_get_float64(self.table, val)
- if k != self.table.n_buckets:
- return self.table.vals[k]
- else:
- raise KeyError(val)
-
- def get_iter_test(self, float64_t key, Py_ssize_t iterations):
- cdef Py_ssize_t i, val=0
- for i in range(iterations):
- k = kh_get_float64(self.table, val)
- if k != self.table.n_buckets:
- val = self.table.vals[k]
-
- cpdef set_item(self, float64_t key, Py_ssize_t val):
- cdef:
- khiter_t k
- int ret = 0
-
- k = kh_put_float64(self.table, key, &ret)
- self.table.keys[k] = key
- if kh_exist_float64(self.table, k):
- self.table.vals[k] = val
- else:
- raise KeyError(key)
-
- @cython.boundscheck(False)
- def map(self, float64_t[:] keys, int64_t[:] values):
- cdef:
- Py_ssize_t i, n = len(values)
- int ret = 0
- float64_t key
- khiter_t k
-
- with nogil:
- for i in range(n):
- key = keys[i]
- k = kh_put_float64(self.table, key, &ret)
- self.table.vals[k] = values[i]
-
- @cython.boundscheck(False)
- def map_locations(self, ndarray[float64_t, ndim=1] values):
- cdef:
- Py_ssize_t i, n = len(values)
- int ret = 0
- float64_t val
- khiter_t k
-
- with nogil:
- for i in range(n):
- val = values[i]
- k = kh_put_float64(self.table, val, &ret)
- self.table.vals[k] = i
-
- @cython.boundscheck(False)
- def lookup(self, float64_t[:] values):
- cdef:
- Py_ssize_t i, n = len(values)
- int ret = 0
- float64_t val
- khiter_t k
- int64_t[:] locs = np.empty(n, dtype=np.int64)
-
- with nogil:
- for i in range(n):
- val = values[i]
- k = kh_get_float64(self.table, val)
- if k != self.table.n_buckets:
- locs[i] = self.table.vals[k]
- else:
- locs[i] = -1
-
- return np.asarray(locs)
-
- def factorize(self, float64_t values):
- uniques = Float64Vector()
- labels = self.get_labels(values, uniques, 0, 0)
- return uniques.to_array(), labels
-
- @cython.boundscheck(False)
- def get_labels(self, float64_t[:] values, Float64Vector uniques,
- Py_ssize_t count_prior, Py_ssize_t na_sentinel,
- bint check_null=True):
- cdef:
- Py_ssize_t i, n = len(values)
- int64_t[:] labels
- Py_ssize_t idx, count = count_prior
- int ret = 0
- float64_t val
- khiter_t k
- Float64VectorData *ud
-
- labels = np.empty(n, dtype=np.int64)
- ud = uniques.data
-
- with nogil:
- for i in range(n):
- val = values[i]
-
- if check_null and val != val:
- labels[i] = na_sentinel
- continue
-
- k = kh_get_float64(self.table, val)
-
- if k != self.table.n_buckets:
- idx = self.table.vals[k]
- labels[i] = idx
- else:
- k = kh_put_float64(self.table, val, &ret)
- self.table.vals[k] = count
-
- if needs_resize(ud):
- with gil:
- uniques.resize()
- append_data_float64(ud, val)
- labels[i] = count
- count += 1
-
- return np.asarray(labels)
-
- @cython.boundscheck(False)
- def get_labels_groupby(self, float64_t[:] values):
- cdef:
- Py_ssize_t i, n = len(values)
- int64_t[:] labels
- Py_ssize_t idx, count = 0
- int ret = 0
- float64_t val
- khiter_t k
- Float64Vector uniques = Float64Vector()
- Float64VectorData *ud
-
- labels = np.empty(n, dtype=np.int64)
- ud = uniques.data
-
- with nogil:
- for i in range(n):
- val = values[i]
-
- # specific for groupby
- if val < 0:
- labels[i] = -1
- continue
-
- k = kh_get_float64(self.table, val)
- if k != self.table.n_buckets:
- idx = self.table.vals[k]
- labels[i] = idx
- else:
- k = kh_put_float64(self.table, val, &ret)
- self.table.vals[k] = count
-
- if needs_resize(ud):
- with gil:
- uniques.resize()
- append_data_float64(ud, val)
- labels[i] = count
- count += 1
-
- arr_uniques = uniques.to_array()
-
- return np.asarray(labels), arr_uniques
-
- @cython.boundscheck(False)
- def unique(self, float64_t[:] values):
- cdef:
- Py_ssize_t i, n = len(values)
- int ret = 0
- float64_t val
- khiter_t k
- bint seen_na = 0
- Float64Vector uniques = Float64Vector()
- Float64VectorData *ud
-
- ud = uniques.data
-
- with nogil:
- for i in range(n):
- val = values[i]
-
- if val == val:
- k = kh_get_float64(self.table, val)
- if k == self.table.n_buckets:
- kh_put_float64(self.table, val, &ret)
- if needs_resize(ud):
- with gil:
- uniques.resize()
- append_data_float64(ud, val)
- elif not seen_na:
- seen_na = 1
- if needs_resize(ud):
- with gil:
- uniques.resize()
- append_data_float64(ud, NAN)
-
- return uniques.to_array()
-
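``get_labels`` is the engine behind factorization: each value is looked up in the hash table, unseen values are appended to ``uniques`` and receive the next integer label, and missing values get ``na_sentinel``. The observable behavior from Python (illustrative)::

    import numpy as np
    import pandas as pd

    values = np.array(['b', 'b', 'a', 'c', 'b'], dtype=object)
    labels, uniques = pd.factorize(values)

    # Labels are assigned in order of first appearance; uniques keeps that order.
    assert (labels == np.array([0, 0, 1, 2, 0])).all()
    assert list(uniques) == ['b', 'a', 'c']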
-cdef class Int64HashTable(HashTable):
-
- def __cinit__(self, size_hint=1):
- self.table = kh_init_int64()
- if size_hint is not None:
- kh_resize_int64(self.table, size_hint)
-
- def __len__(self):
- return self.table.size
-
- def __dealloc__(self):
- kh_destroy_int64(self.table)
-
- def __contains__(self, object key):
- cdef khiter_t k
- k = kh_get_int64(self.table, key)
- return k != self.table.n_buckets
-
- cpdef get_item(self, int64_t val):
- cdef khiter_t k
- k = kh_get_int64(self.table, val)
- if k != self.table.n_buckets:
- return self.table.vals[k]
- else:
- raise KeyError(val)
-
- def get_iter_test(self, int64_t key, Py_ssize_t iterations):
- cdef Py_ssize_t i, val=0
- for i in range(iterations):
- k = kh_get_int64(self.table, val)
- if k != self.table.n_buckets:
- val = self.table.vals[k]
-
- cpdef set_item(self, int64_t key, Py_ssize_t val):
- cdef:
- khiter_t k
- int ret = 0
-
- k = kh_put_int64(self.table, key, &ret)
- self.table.keys[k] = key
- if kh_exist_int64(self.table, k):
- self.table.vals[k] = val
- else:
- raise KeyError(key)
-
- @cython.boundscheck(False)
- def map(self, int64_t[:] keys, int64_t[:] values):
- cdef:
- Py_ssize_t i, n = len(values)
- int ret = 0
- int64_t key
- khiter_t k
-
- with nogil:
- for i in range(n):
- key = keys[i]
- k = kh_put_int64(self.table, key, &ret)
- self.table.vals[k] = values[i]
-
- @cython.boundscheck(False)
- def map_locations(self, ndarray[int64_t, ndim=1] values):
- cdef:
- Py_ssize_t i, n = len(values)
- int ret = 0
- int64_t val
- khiter_t k
-
- with nogil:
- for i in range(n):
- val = values[i]
- k = kh_put_int64(self.table, val, &ret)
- self.table.vals[k] = i
-
- @cython.boundscheck(False)
- def lookup(self, int64_t[:] values):
- cdef:
- Py_ssize_t i, n = len(values)
- int ret = 0
- int64_t val
- khiter_t k
- int64_t[:] locs = np.empty(n, dtype=np.int64)
-
- with nogil:
- for i in range(n):
- val = values[i]
- k = kh_get_int64(self.table, val)
- if k != self.table.n_buckets:
- locs[i] = self.table.vals[k]
- else:
- locs[i] = -1
-
- return np.asarray(locs)
-
- def factorize(self, int64_t values):
- uniques = Int64Vector()
- labels = self.get_labels(values, uniques, 0, 0)
- return uniques.to_array(), labels
-
- @cython.boundscheck(False)
- def get_labels(self, int64_t[:] values, Int64Vector uniques,
- Py_ssize_t count_prior, Py_ssize_t na_sentinel,
- bint check_null=True):
- cdef:
- Py_ssize_t i, n = len(values)
- int64_t[:] labels
- Py_ssize_t idx, count = count_prior
- int ret = 0
- int64_t val
- khiter_t k
- Int64VectorData *ud
-
- labels = np.empty(n, dtype=np.int64)
- ud = uniques.data
-
- with nogil:
- for i in range(n):
- val = values[i]
-
- if check_null and val == iNaT:
- labels[i] = na_sentinel
- continue
-
- k = kh_get_int64(self.table, val)
-
- if k != self.table.n_buckets:
- idx = self.table.vals[k]
- labels[i] = idx
- else:
- k = kh_put_int64(self.table, val, &ret)
- self.table.vals[k] = count
-
- if needs_resize(ud):
- with gil:
- uniques.resize()
- append_data_int64(ud, val)
- labels[i] = count
- count += 1
-
- return np.asarray(labels)
-
- @cython.boundscheck(False)
- def get_labels_groupby(self, int64_t[:] values):
- cdef:
- Py_ssize_t i, n = len(values)
- int64_t[:] labels
- Py_ssize_t idx, count = 0
- int ret = 0
- int64_t val
- khiter_t k
- Int64Vector uniques = Int64Vector()
- Int64VectorData *ud
-
- labels = np.empty(n, dtype=np.int64)
- ud = uniques.data
-
- with nogil:
- for i in range(n):
- val = values[i]
-
- # specific for groupby
- if val < 0:
- labels[i] = -1
- continue
-
- k = kh_get_int64(self.table, val)
- if k != self.table.n_buckets:
- idx = self.table.vals[k]
- labels[i] = idx
- else:
- k = kh_put_int64(self.table, val, &ret)
- self.table.vals[k] = count
-
- if needs_resize(ud):
- with gil:
- uniques.resize()
- append_data_int64(ud, val)
- labels[i] = count
- count += 1
-
- arr_uniques = uniques.to_array()
-
- return np.asarray(labels), arr_uniques
-
- @cython.boundscheck(False)
- def unique(self, int64_t[:] values):
- cdef:
- Py_ssize_t i, n = len(values)
- int ret = 0
- int64_t val
- khiter_t k
- bint seen_na = 0
- Int64Vector uniques = Int64Vector()
- Int64VectorData *ud
-
- ud = uniques.data
-
- with nogil:
- for i in range(n):
- val = values[i]
-
- k = kh_get_int64(self.table, val)
- if k == self.table.n_buckets:
- kh_put_int64(self.table, val, &ret)
- if needs_resize(ud):
- with gil:
- uniques.resize()
- append_data_int64(ud, val)
-
- return uniques.to_array()
-
-
-cdef class StringHashTable(HashTable):
- cdef kh_str_t *table
-
- def __cinit__(self, int size_hint=1):
- self.table = kh_init_str()
- if size_hint is not None:
- kh_resize_str(self.table, size_hint)
-
- def __dealloc__(self):
- kh_destroy_str(self.table)
-
- cpdef get_item(self, object val):
- cdef khiter_t k
- k = kh_get_str(self.table, util.get_c_string(val))
- if k != self.table.n_buckets:
- return self.table.vals[k]
- else:
- raise KeyError(val)
-
- def get_iter_test(self, object key, Py_ssize_t iterations):
- cdef Py_ssize_t i, val
- for i in range(iterations):
- k = kh_get_str(self.table, util.get_c_string(key))
- if k != self.table.n_buckets:
- val = self.table.vals[k]
-
- cpdef set_item(self, object key, Py_ssize_t val):
- cdef:
- khiter_t k
- int ret = 0
- char* buf
-
- buf = util.get_c_string(key)
-
- k = kh_put_str(self.table, buf, &ret)
- self.table.keys[k] = key
- if kh_exist_str(self.table, k):
- self.table.vals[k] = val
- else:
- raise KeyError(key)
-
- def get_indexer(self, ndarray[object] values):
- cdef:
- Py_ssize_t i, n = len(values)
- ndarray[int64_t] labels = np.empty(n, dtype=np.int64)
- char *buf
- int64_t *resbuf = labels.data
- khiter_t k
- kh_str_t *table = self.table
-
- for i in range(n):
- buf = util.get_c_string(values[i])
- k = kh_get_str(table, buf)
- if k != table.n_buckets:
- resbuf[i] = table.vals[k]
- else:
- resbuf[i] = -1
- return labels
-
- def unique(self, ndarray[object] values):
- cdef:
- Py_ssize_t i, n = len(values)
- int ret = 0
- object val
- char *buf
- khiter_t k
- ObjectVector uniques = ObjectVector()
-
- for i in range(n):
- val = values[i]
- buf = util.get_c_string(val)
- k = kh_get_str(self.table, buf)
- if k == self.table.n_buckets:
- kh_put_str(self.table, buf, &ret)
- uniques.append(val)
-
- return uniques.to_array()
-
- def factorize(self, ndarray[object] values):
- cdef:
- Py_ssize_t i, n = len(values)
- ndarray[int64_t] labels = np.empty(n, dtype=np.int64)
- dict reverse = {}
- Py_ssize_t idx, count = 0
- int ret = 0
- object val
- char *buf
- khiter_t k
-
- for i in range(n):
- val = values[i]
- buf = util.get_c_string(val)
- k = kh_get_str(self.table, buf)
- if k != self.table.n_buckets:
- idx = self.table.vals[k]
- labels[i] = idx
- else:
- k = kh_put_str(self.table, buf, &ret)
- # print 'putting %s, %s' % (val, count)
-
- self.table.vals[k] = count
- reverse[count] = val
- labels[i] = count
- count += 1
-
- return reverse, labels
-
-
-na_sentinel = object
-
-cdef class PyObjectHashTable(HashTable):
-
- def __init__(self, size_hint=1):
- self.table = kh_init_pymap()
- kh_resize_pymap(self.table, size_hint)
-
- def __dealloc__(self):
- if self.table is not NULL:
- self.destroy()
-
- def __len__(self):
- return self.table.size
-
- def __contains__(self, object key):
- cdef khiter_t k
- hash(key)
- if key != key or key is None:
- key = na_sentinel
- k = kh_get_pymap(self.table, key)
- return k != self.table.n_buckets
-
- def destroy(self):
- kh_destroy_pymap(self.table)
- self.table = NULL
-
- cpdef get_item(self, object val):
- cdef khiter_t k
- if val != val or val is None:
- val = na_sentinel
- k = kh_get_pymap(self.table, val)
- if k != self.table.n_buckets:
- return self.table.vals[k]
- else:
- raise KeyError(val)
-
- def get_iter_test(self, object key, Py_ssize_t iterations):
- cdef Py_ssize_t i, val
- if key != key or key is None:
- key = na_sentinel
- for i in range(iterations):
- k = kh_get_pymap(self.table, key)
- if k != self.table.n_buckets:
- val = self.table.vals[k]
-
- cpdef set_item(self, object key, Py_ssize_t val):
- cdef:
- khiter_t k
- int ret = 0
- char* buf
-
- hash(key)
- if key != key or key is None:
- key = na_sentinel
- k = kh_put_pymap(self.table, key, &ret)
- # self.table.keys[k] = key
- if kh_exist_pymap(self.table, k):
- self.table.vals[k] = val
- else:
- raise KeyError(key)
-
- def map_locations(self, ndarray[object] values):
- cdef:
- Py_ssize_t i, n = len(values)
- int ret = 0
- object val
- khiter_t k
-
- for i in range(n):
- val = values[i]
- hash(val)
- if val != val or val is None:
- val = na_sentinel
-
- k = kh_put_pymap(self.table, val, &ret)
- self.table.vals[k] = i
-
- def lookup(self, ndarray[object] values):
- cdef:
- Py_ssize_t i, n = len(values)
- int ret = 0
- object val
- khiter_t k
- int64_t[:] locs = np.empty(n, dtype=np.int64)
-
- for i in range(n):
- val = values[i]
- hash(val)
- if val != val or val is None:
- val = na_sentinel
-
- k = kh_get_pymap(self.table, val)
- if k != self.table.n_buckets:
- locs[i] = self.table.vals[k]
- else:
- locs[i] = -1
-
- return np.asarray(locs)
-
- def unique(self, ndarray[object] values):
- cdef:
- Py_ssize_t i, n = len(values)
- int ret = 0
- object val
- khiter_t k
- ObjectVector uniques = ObjectVector()
- bint seen_na = 0
-
- for i in range(n):
- val = values[i]
- hash(val)
- if not _checknan(val):
- k = kh_get_pymap(self.table, val)
- if k == self.table.n_buckets:
- kh_put_pymap(self.table, val, &ret)
- uniques.append(val)
- elif not seen_na:
- seen_na = 1
- uniques.append(nan)
-
- return uniques.to_array()
-
- def get_labels(self, ndarray[object] values, ObjectVector uniques,
- Py_ssize_t count_prior, int64_t na_sentinel,
- bint check_null=True):
- cdef:
- Py_ssize_t i, n = len(values)
- int64_t[:] labels
- Py_ssize_t idx, count = count_prior
- int ret = 0
- object val
- khiter_t k
-
- labels = np.empty(n, dtype=np.int64)
-
- for i in range(n):
- val = values[i]
- hash(val)
-
- if check_null and val != val or val is None:
- labels[i] = na_sentinel
- continue
-
- k = kh_get_pymap(self.table, val)
- if k != self.table.n_buckets:
- idx = self.table.vals[k]
- labels[i] = idx
- else:
- k = kh_put_pymap(self.table, val, &ret)
- self.table.vals[k] = count
- uniques.append(val)
- labels[i] = count
- count += 1
-
- return np.asarray(labels)
\ No newline at end of file
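
The hash-table code removed above implements two small contracts that recur throughout this diff: ``map_locations`` records the position at which each value occurs in the table, and ``lookup`` returns that position (or -1) for each query value. A plain-Python sketch of those contracts, purely illustrative and not part of pandas::

    import numpy as np

    def map_locations_sketch(values):
        # remember the (last) position at which each value occurs
        table = {}
        for i, val in enumerate(values):
            table[val] = i
        return table

    def lookup_sketch(table, values):
        # position of each value in the table, -1 when absent
        return np.asarray([table.get(val, -1) for val in values],
                          dtype=np.int64)

The Cython classes below implement the same idea on top of khash tables, which is what makes the dtype-specialized loops fast.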
diff --git a/pandas/src/hashtable_class_helper.pxi.in b/pandas/src/hashtable_class_helper.pxi.in
index 14e5363eee20c..22714e6305677 100644
--- a/pandas/src/hashtable_class_helper.pxi.in
+++ b/pandas/src/hashtable_class_helper.pxi.in
@@ -10,23 +10,28 @@ WARNING: DO NOT edit .pxi FILE directly, .pxi is generated from .pxi.in
{{py:
-# name, dtype
-dtypes = [('Float64', 'float64'), ('Int64', 'int64')]
-
+# name, dtype, arg
+# the generated StringVector is not actually used
+# but is included for completeness (rather ObjectVector is used
+# for uniques in hashtables)
+
+dtypes = [('Float64', 'float64', 'float64_t'),
+ ('Int64', 'int64', 'int64_t'),
+ ('String', 'string', 'char *')]
}}
-{{for name, dtype in dtypes}}
+{{for name, dtype, arg in dtypes}}
ctypedef struct {{name}}VectorData:
- {{dtype}}_t *data
+ {{arg}} *data
size_t n, m
@cython.wraparound(False)
@cython.boundscheck(False)
-cdef void append_data_{{dtype}}({{name}}VectorData *data,
- {{dtype}}_t x) nogil:
+cdef inline void append_data_{{dtype}}({{name}}VectorData *data,
+ {{arg}} x) nogil:
data.data[data.n] = x
data.n += 1
@@ -36,8 +41,9 @@ cdef void append_data_{{dtype}}({{name}}VectorData *data,
ctypedef fused vector_data:
Int64VectorData
Float64VectorData
+ StringVectorData
-cdef bint needs_resize(vector_data *data) nogil:
+cdef inline bint needs_resize(vector_data *data) nogil:
return data.n == data.m
#----------------------------------------------------------------------
@@ -46,12 +52,13 @@ cdef bint needs_resize(vector_data *data) nogil:
{{py:
-# name, dtype
-dtypes = [('Float64', 'float64'), ('Int64', 'int64')]
+# name, dtype, arg, idtype
+dtypes = [('Float64', 'float64', 'float64_t', 'np.float64'),
+ ('Int64', 'int64', 'int64_t', 'np.int64')]
}}
-{{for name, dtype in dtypes}}
+{{for name, dtype, arg, idtype in dtypes}}
cdef class {{name}}Vector:
@@ -66,13 +73,13 @@ cdef class {{name}}Vector:
raise MemoryError()
self.data.n = 0
self.data.m = _INIT_VEC_CAP
- self.ao = np.empty(self.data.m, dtype=np.{{dtype}})
- self.data.data = <{{dtype}}_t*> self.ao.data
+ self.ao = np.empty(self.data.m, dtype={{idtype}})
+ self.data.data = <{{arg}}*> self.ao.data
cdef resize(self):
self.data.m = max(self.data.m * 4, _INIT_VEC_CAP)
self.ao.resize(self.data.m)
- self.data.data = <{{dtype}}_t*> self.ao.data
+ self.data.data = <{{arg}}*> self.ao.data
def __dealloc__(self):
PyMem_Free(self.data)
@@ -85,7 +92,7 @@ cdef class {{name}}Vector:
self.data.m = self.data.n
return self.ao
- cdef inline void append(self, {{dtype}}_t x):
+ cdef inline void append(self, {{arg}} x):
if needs_resize(self.data):
self.resize()
@@ -94,6 +101,61 @@ cdef class {{name}}Vector:
{{endfor}}
+cdef class StringVector:
+
+ cdef:
+ StringVectorData *data
+
+ def __cinit__(self):
+ self.data = PyMem_Malloc(
+ sizeof(StringVectorData))
+ if not self.data:
+ raise MemoryError()
+ self.data.n = 0
+ self.data.m = _INIT_VEC_CAP
+ self.data.data = malloc(self.data.m * sizeof(char *))
+
+ cdef resize(self):
+ cdef:
+ char **orig_data
+ size_t i, m
+
+ m = self.data.m
+ self.data.m = max(self.data.m * 4, _INIT_VEC_CAP)
+
+ # TODO: can resize?
+ orig_data = self.data.data
+ self.data.data = malloc(self.data.m * sizeof(char *))
+ for i in range(m):
+ self.data.data[i] = orig_data[i]
+
+ def __dealloc__(self):
+ free(self.data.data)
+ PyMem_Free(self.data)
+
+ def __len__(self):
+ return self.data.n
+
+ def to_array(self):
+ cdef:
+ ndarray ao
+ size_t n
+ object val
+
+ ao = np.empty(self.data.n, dtype=np.object)
+ for i in range(self.data.n):
+ val = self.data.data[i]
+ ao[i] = val
+ self.data.m = self.data.n
+ return ao
+
+ cdef inline void append(self, char * x):
+
+ if needs_resize(self.data):
+ self.resize()
+
+ append_data_string(self.data, x)
+
cdef class ObjectVector:
@@ -377,9 +439,11 @@ cdef class {{name}}HashTable(HashTable):
cdef class StringHashTable(HashTable):
- cdef kh_str_t *table
+ # these by-definition *must* be strings
+ # or a sentinel np.nan / None missing value
+ na_string_sentinel = '__nan__'
- def __cinit__(self, int size_hint=1):
+ def __init__(self, int size_hint=1):
self.table = kh_init_str()
if size_hint is not None:
kh_resize_str(self.table, size_hint)
@@ -388,17 +452,26 @@ cdef class StringHashTable(HashTable):
kh_destroy_str(self.table)
cpdef get_item(self, object val):
- cdef khiter_t k
- k = kh_get_str(self.table, util.get_c_string(val))
+ cdef:
+ khiter_t k
+ char *v
+ v = util.get_c_string(val)
+
+ k = kh_get_str(self.table, v)
if k != self.table.n_buckets:
return self.table.vals[k]
else:
raise KeyError(val)
def get_iter_test(self, object key, Py_ssize_t iterations):
- cdef Py_ssize_t i, val
+ cdef:
+ Py_ssize_t i, val
+ char *v
+
+ v = util.get_c_string(key)
+
for i in range(iterations):
- k = kh_get_str(self.table, util.get_c_string(key))
+ k = kh_get_str(self.table, v)
if k != self.table.n_buckets:
val = self.table.vals[k]
@@ -406,83 +479,203 @@ cdef class StringHashTable(HashTable):
cdef:
khiter_t k
int ret = 0
- char* buf
+ char *v
- buf = util.get_c_string(key)
+ v = util.get_c_string(val)
- k = kh_put_str(self.table, buf, &ret)
+ k = kh_put_str(self.table, v, &ret)
self.table.keys[k] = key
if kh_exist_str(self.table, k):
self.table.vals[k] = val
else:
raise KeyError(key)
+ @cython.boundscheck(False)
def get_indexer(self, ndarray[object] values):
cdef:
Py_ssize_t i, n = len(values)
ndarray[int64_t] labels = np.empty(n, dtype=np.int64)
- char *buf
int64_t *resbuf = labels.data
khiter_t k
kh_str_t *table = self.table
+ char *v
+ char **vecs
+ vecs = malloc(n * sizeof(char *))
for i in range(n):
- buf = util.get_c_string(values[i])
- k = kh_get_str(table, buf)
- if k != table.n_buckets:
- resbuf[i] = table.vals[k]
- else:
- resbuf[i] = -1
+ val = values[i]
+ v = util.get_c_string(val)
+ vecs[i] = v
+
+ with nogil:
+ for i in range(n):
+ k = kh_get_str(table, vecs[i])
+ if k != table.n_buckets:
+ resbuf[i] = table.vals[k]
+ else:
+ resbuf[i] = -1
+
+ free(vecs)
return labels
+ @cython.boundscheck(False)
def unique(self, ndarray[object] values):
cdef:
- Py_ssize_t i, n = len(values)
+ Py_ssize_t i, count, n = len(values)
+ int64_t[:] uindexer
int ret = 0
object val
- char *buf
+ ObjectVector uniques
khiter_t k
- ObjectVector uniques = ObjectVector()
+ char *v
+ char **vecs
+ vecs = malloc(n * sizeof(char *))
+ uindexer = np.empty(n, dtype=np.int64)
for i in range(n):
val = values[i]
- buf = util.get_c_string(val)
- k = kh_get_str(self.table, buf)
- if k == self.table.n_buckets:
- kh_put_str(self.table, buf, &ret)
- uniques.append(val)
+ v = util.get_c_string(val)
+ vecs[i] = v
+
+ count = 0
+ with nogil:
+ for i in range(n):
+ v = vecs[i]
+ k = kh_get_str(self.table, v)
+ if k == self.table.n_buckets:
+ kh_put_str(self.table, v, &ret)
+ uindexer[count] = i
+ count += 1
+ free(vecs)
+ # uniques
+ uniques = ObjectVector()
+ for i in range(count):
+ uniques.append(values[uindexer[i]])
return uniques.to_array()
def factorize(self, ndarray[object] values):
+ uniques = ObjectVector()
+ labels = self.get_labels(values, uniques, 0, 0)
+ return uniques.to_array(), labels
+
+ @cython.boundscheck(False)
+ def lookup(self, ndarray[object] values):
cdef:
Py_ssize_t i, n = len(values)
- ndarray[int64_t] labels = np.empty(n, dtype=np.int64)
- dict reverse = {}
- Py_ssize_t idx, count = 0
int ret = 0
object val
- char *buf
+ char *v
khiter_t k
+ int64_t[:] locs = np.empty(n, dtype=np.int64)
+ # these by-definition *must* be strings
+ vecs = malloc(n * sizeof(char *))
for i in range(n):
val = values[i]
- buf = util.get_c_string(val)
- k = kh_get_str(self.table, buf)
- if k != self.table.n_buckets:
- idx = self.table.vals[k]
- labels[i] = idx
+
+ if PyUnicode_Check(val) or PyString_Check(val):
+ v = util.get_c_string(val)
else:
- k = kh_put_str(self.table, buf, &ret)
- # print 'putting %s, %s' % (val, count)
+ v = util.get_c_string(self.na_string_sentinel)
+ vecs[i] = v
- self.table.vals[k] = count
- reverse[count] = val
- labels[i] = count
- count += 1
+ with nogil:
+ for i in range(n):
+ v = vecs[i]
+ k = kh_get_str(self.table, v)
+ if k != self.table.n_buckets:
+ locs[i] = self.table.vals[k]
+ else:
+ locs[i] = -1
- return reverse, labels
+ free(vecs)
+ return np.asarray(locs)
+ @cython.boundscheck(False)
+ def map_locations(self, ndarray[object] values):
+ cdef:
+ Py_ssize_t i, n = len(values)
+ int ret = 0
+ object val
+ char *v
+ char **vecs
+ khiter_t k
+
+ # these by-definition *must* be strings
+ vecs = malloc(n * sizeof(char *))
+ for i in range(n):
+ val = values[i]
+
+ if PyUnicode_Check(val) or PyString_Check(val):
+ v = util.get_c_string(val)
+ else:
+ v = util.get_c_string(self.na_string_sentinel)
+ vecs[i] = v
+
+ with nogil:
+ for i in range(n):
+ v = vecs[i]
+ k = kh_put_str(self.table, v, &ret)
+ self.table.vals[k] = i
+ free(vecs)
+
+ @cython.boundscheck(False)
+ def get_labels(self, ndarray[object] values, ObjectVector uniques,
+ Py_ssize_t count_prior, int64_t na_sentinel,
+ bint check_null=1):
+ cdef:
+ Py_ssize_t i, n = len(values)
+ int64_t[:] labels
+ int64_t[:] uindexer
+ Py_ssize_t idx, count = count_prior
+ int ret = 0
+ object val
+ char *v
+ char **vecs
+ khiter_t k
+
+ # these by-definition *must* be strings
+ labels = np.zeros(n, dtype=np.int64)
+ uindexer = np.empty(n, dtype=np.int64)
+
+ # pre-filter out missing
+ # and assign pointers
+ vecs = malloc(n * sizeof(char *))
+ for i in range(n):
+ val = values[i]
+
+ if PyUnicode_Check(val) or PyString_Check(val):
+ v = util.get_c_string(val)
+ vecs[i] = v
+ else:
+ labels[i] = na_sentinel
+
+ # compute
+ with nogil:
+ for i in range(n):
+ if labels[i] == na_sentinel:
+ continue
+
+ v = vecs[i]
+ k = kh_get_str(self.table, v)
+ if k != self.table.n_buckets:
+ idx = self.table.vals[k]
+ labels[i] = idx
+ else:
+ k = kh_put_str(self.table, v, &ret)
+ self.table.vals[k] = count
+ uindexer[count] = i
+ labels[i] = count
+ count += 1
+
+ free(vecs)
+
+ # uniques
+ for i in range(count):
+ uniques.append(values[uindexer[i]])
+
+ return np.asarray(labels)
na_sentinel = object
@@ -639,4 +832,4 @@ cdef class PyObjectHashTable(HashTable):
labels[i] = count
count += 1
- return np.asarray(labels)
\ No newline at end of file
+ return np.asarray(labels)
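
The rewritten ``StringHashTable`` methods above share a two-phase structure: while the GIL is held, every element is converted to a ``char *`` (non-strings are routed to ``na_string_sentinel`` or pre-marked as missing), and the hashing loop then runs over that pre-built pointer array under ``nogil``. A plain-Python sketch of the same structure, assuming nothing beyond what is visible in the diff::

    import numpy as np

    def string_labels_sketch(values, na_sentinel=-1):
        # phase 1: encode values and pre-mark missing entries
        labels = np.zeros(len(values), dtype=np.int64)
        encoded = []
        for i, val in enumerate(values):
            if isinstance(val, str):
                encoded.append(val)
            else:
                encoded.append(None)
                labels[i] = na_sentinel

        # phase 2: tight labelling loop (runs under nogil in the Cython code)
        table, uniques = {}, []
        for i, v in enumerate(encoded):
            if labels[i] == na_sentinel:
                continue
            if v in table:
                labels[i] = table[v]
            else:
                table[v] = len(uniques)
                labels[i] = len(uniques)
                uniques.append(values[i])
        return np.asarray(labels), uniques

As in the Cython version, uniques are collected in order of first appearance, which is why ``factorize`` can be expressed directly in terms of ``get_labels``.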
diff --git a/pandas/src/hashtable_func_helper.pxi b/pandas/src/hashtable_func_helper.pxi
deleted file mode 100644
index d05b81acc5dd5..0000000000000
--- a/pandas/src/hashtable_func_helper.pxi
+++ /dev/null
@@ -1,197 +0,0 @@
-"""
-Template for each `dtype` helper function for hashtable
-
-WARNING: DO NOT edit .pxi FILE directly, .pxi is generated from .pxi.in
-"""
-
-#----------------------------------------------------------------------
-# VectorData
-#----------------------------------------------------------------------
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef build_count_table_float64(float64_t[:] values,
- kh_float64_t *table, bint dropna):
- cdef:
- khiter_t k
- Py_ssize_t i, n = len(values)
- float64_t val
- int ret = 0
-
- with nogil:
- kh_resize_float64(table, n)
-
- for i in range(n):
- val = values[i]
- if val == val or not dropna:
- k = kh_get_float64(table, val)
- if k != table.n_buckets:
- table.vals[k] += 1
- else:
- k = kh_put_float64(table, val, &ret)
- table.vals[k] = 1
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cpdef value_count_float64(float64_t[:] values, bint dropna):
- cdef:
- Py_ssize_t i=0
- kh_float64_t *table
- float64_t[:] result_keys
- int64_t[:] result_counts
- int k
-
- table = kh_init_float64()
- build_count_table_float64(values, table, dropna)
-
- result_keys = np.empty(table.n_occupied, dtype=np.float64)
- result_counts = np.zeros(table.n_occupied, dtype=np.int64)
-
- with nogil:
- for k in range(table.n_buckets):
- if kh_exist_float64(table, k):
- result_keys[i] = table.keys[k]
- result_counts[i] = table.vals[k]
- i += 1
- kh_destroy_float64(table)
-
- return np.asarray(result_keys), np.asarray(result_counts)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def duplicated_float64(float64_t[:] values,
- object keep='first'):
- cdef:
- int ret = 0, k
- float64_t value
- Py_ssize_t i, n = len(values)
- kh_float64_t * table = kh_init_float64()
- ndarray[uint8_t, ndim=1, cast=True] out = np.empty(n, dtype='bool')
-
- kh_resize_float64(table, min(n, _SIZE_HINT_LIMIT))
-
- if keep not in ('last', 'first', False):
- raise ValueError('keep must be either "first", "last" or False')
-
- if keep == 'last':
- with nogil:
- for i from n > i >=0:
- kh_put_float64(table, values[i], &ret)
- out[i] = ret == 0
- elif keep == 'first':
- with nogil:
- for i from 0 <= i < n:
- kh_put_float64(table, values[i], &ret)
- out[i] = ret == 0
- else:
- with nogil:
- for i from 0 <= i < n:
- value = values[i]
- k = kh_get_float64(table, value)
- if k != table.n_buckets:
- out[table.vals[k]] = 1
- out[i] = 1
- else:
- k = kh_put_float64(table, value, &ret)
- table.keys[k] = value
- table.vals[k] = i
- out[i] = 0
- kh_destroy_float64(table)
- return out
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef build_count_table_int64(int64_t[:] values,
- kh_int64_t *table, bint dropna):
- cdef:
- khiter_t k
- Py_ssize_t i, n = len(values)
- int64_t val
- int ret = 0
-
- with nogil:
- kh_resize_int64(table, n)
-
- for i in range(n):
- val = values[i]
- if val == val or not dropna:
- k = kh_get_int64(table, val)
- if k != table.n_buckets:
- table.vals[k] += 1
- else:
- k = kh_put_int64(table, val, &ret)
- table.vals[k] = 1
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cpdef value_count_int64(int64_t[:] values, bint dropna):
- cdef:
- Py_ssize_t i=0
- kh_int64_t *table
- int64_t[:] result_keys
- int64_t[:] result_counts
- int k
-
- table = kh_init_int64()
- build_count_table_int64(values, table, dropna)
-
- result_keys = np.empty(table.n_occupied, dtype=np.int64)
- result_counts = np.zeros(table.n_occupied, dtype=np.int64)
-
- with nogil:
- for k in range(table.n_buckets):
- if kh_exist_int64(table, k):
- result_keys[i] = table.keys[k]
- result_counts[i] = table.vals[k]
- i += 1
- kh_destroy_int64(table)
-
- return np.asarray(result_keys), np.asarray(result_counts)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def duplicated_int64(int64_t[:] values,
- object keep='first'):
- cdef:
- int ret = 0, k
- int64_t value
- Py_ssize_t i, n = len(values)
- kh_int64_t * table = kh_init_int64()
- ndarray[uint8_t, ndim=1, cast=True] out = np.empty(n, dtype='bool')
-
- kh_resize_int64(table, min(n, _SIZE_HINT_LIMIT))
-
- if keep not in ('last', 'first', False):
- raise ValueError('keep must be either "first", "last" or False')
-
- if keep == 'last':
- with nogil:
- for i from n > i >=0:
- kh_put_int64(table, values[i], &ret)
- out[i] = ret == 0
- elif keep == 'first':
- with nogil:
- for i from 0 <= i < n:
- kh_put_int64(table, values[i], &ret)
- out[i] = ret == 0
- else:
- with nogil:
- for i from 0 <= i < n:
- value = values[i]
- k = kh_get_int64(table, value)
- if k != table.n_buckets:
- out[table.vals[k]] = 1
- out[i] = 1
- else:
- k = kh_put_int64(table, value, &ret)
- table.keys[k] = value
- table.vals[k] = i
- out[i] = 0
- kh_destroy_int64(table)
- return out
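
The deleted ``hashtable_func_helper.pxi`` is generated output: the float64 and int64 ``value_count_*`` and ``duplicated_*`` functions above are stamped out from a single Tempita template (the corresponding ``.pxi.in``), which is why the two blocks are identical apart from the dtype. Their semantics are simple; a rough Python sketch of ``value_count``, illustrative only and with NaN handling simplified::

    import numpy as np

    def value_count_sketch(values, dropna=True):
        # count occurrences of each value, optionally skipping NaN
        counts = {}
        for val in values:
            if dropna and val != val:      # NaN is the only "missing" value here
                continue
            counts[val] = counts.get(val, 0) + 1
        keys = np.asarray(list(counts))
        freqs = np.asarray([counts[k] for k in counts], dtype=np.int64)
        return keys, freqs

``duplicated_*`` similarly marks repeated values, leaving the first (or last) occurrence unmarked, or marking every occurrence when ``keep=False``.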
diff --git a/pandas/src/helper.h b/pandas/src/helper.h
index b8c3cecbb2dc7..39bcf27e074df 100644
--- a/pandas/src/helper.h
+++ b/pandas/src/helper.h
@@ -1,16 +1,25 @@
-#ifndef C_HELPER_H
-#define C_HELPER_H
+/*
+Copyright (c) 2016, PyData Development Team
+All rights reserved.
+
+Distributed under the terms of the BSD Simplified License.
+
+The full license is in the LICENSE file, distributed with this software.
+*/
+
+#ifndef PANDAS_SRC_HELPER_H_
+#define PANDAS_SRC_HELPER_H_
#ifndef PANDAS_INLINE
#if defined(__GNUC__)
#define PANDAS_INLINE static __inline__
#elif defined(_MSC_VER)
#define PANDAS_INLINE static __inline
- #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+ #elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define PANDAS_INLINE static inline
#else
#define PANDAS_INLINE
#endif
#endif
-#endif
+#endif // PANDAS_SRC_HELPER_H_
diff --git a/pandas/src/inference.pyx b/pandas/src/inference.pyx
index 4fa730eac0fd1..5ac2c70bb1808 100644
--- a/pandas/src/inference.pyx
+++ b/pandas/src/inference.pyx
@@ -6,19 +6,9 @@ iNaT = util.get_nat()
cdef bint PY2 = sys.version_info[0] == 2
-cdef extern from "headers/stdint.h":
- enum: UINT8_MAX
- enum: UINT16_MAX
- enum: UINT32_MAX
- enum: UINT64_MAX
- enum: INT8_MIN
- enum: INT8_MAX
- enum: INT16_MIN
- enum: INT16_MAX
- enum: INT32_MAX
- enum: INT32_MIN
- enum: INT64_MAX
- enum: INT64_MIN
+from util cimport (UINT8_MAX, UINT16_MAX, UINT32_MAX, UINT64_MAX,
+ INT8_MIN, INT8_MAX, INT16_MIN, INT16_MAX,
+ INT32_MAX, INT32_MIN, INT64_MAX, INT64_MIN)
# core.common import for fast inference checks
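
The ``inference.pyx`` change above replaces a local ``extern`` block with a ``cimport`` of the same fixed-width integer limits from ``util``; the constants themselves are unchanged. At the Python level the equivalent bounds are available from ``np.iinfo``, which is enough to sketch the kind of range check these limits support (the function name below is illustrative, not a pandas API)::

    import numpy as np

    INT64_MAX = np.iinfo(np.int64).max
    INT64_MIN = np.iinfo(np.int64).min

    def fits_int64(x):
        # True if the Python integer x can be stored in an int64 column
        return INT64_MIN <= x <= INT64_MAX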
diff --git a/pandas/src/join_helper.pxi b/pandas/src/join_helper.pxi
deleted file mode 100644
index 44b8159351492..0000000000000
--- a/pandas/src/join_helper.pxi
+++ /dev/null
@@ -1,1899 +0,0 @@
-"""
-Template for each `dtype` helper function for join
-
-WARNING: DO NOT edit .pxi FILE directly, .pxi is generated from .pxi.in
-"""
-
-#----------------------------------------------------------------------
-# left_join_indexer, inner_join_indexer, outer_join_indexer
-#----------------------------------------------------------------------
-
-# Joins on ordered, unique indices
-
-# right might contain non-unique values
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def left_join_indexer_unique_float64(ndarray[float64_t] left,
- ndarray[float64_t] right):
- cdef:
- Py_ssize_t i, j, nleft, nright
- ndarray[int64_t] indexer
- float64_t lval, rval
-
- i = 0
- j = 0
- nleft = len(left)
- nright = len(right)
-
- indexer = np.empty(nleft, dtype=np.int64)
- while True:
- if i == nleft:
- break
-
- if j == nright:
- indexer[i] = -1
- i += 1
- continue
-
- rval = right[j]
-
- while i < nleft - 1 and left[i] == rval:
- indexer[i] = j
- i += 1
-
- if left[i] == right[j]:
- indexer[i] = j
- i += 1
- while i < nleft - 1 and left[i] == rval:
- indexer[i] = j
- i += 1
- j += 1
- elif left[i] > rval:
- indexer[i] = -1
- j += 1
- else:
- indexer[i] = -1
- i += 1
- return indexer
-
-
-# @cython.wraparound(False)
-# @cython.boundscheck(False)
-def left_join_indexer_float64(ndarray[float64_t] left,
- ndarray[float64_t] right):
- """
- Two-pass algorithm for monotonic indexes. Handles many-to-one merges
- """
- cdef:
- Py_ssize_t i, j, k, nright, nleft, count
- float64_t lval, rval
- ndarray[int64_t] lindexer, rindexer
- ndarray[float64_t] result
-
- nleft = len(left)
- nright = len(right)
-
- i = 0
- j = 0
- count = 0
- if nleft > 0:
- while i < nleft:
- if j == nright:
- count += nleft - i
- break
-
- lval = left[i]
- rval = right[j]
-
- if lval == rval:
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- count += 1
- i += 1
- else:
- j += 1
-
- # do it again now that result size is known
-
- lindexer = np.empty(count, dtype=np.int64)
- rindexer = np.empty(count, dtype=np.int64)
- result = np.empty(count, dtype=np.float64)
-
- i = 0
- j = 0
- count = 0
- if nleft > 0:
- while i < nleft:
- if j == nright:
- while i < nleft:
- lindexer[count] = i
- rindexer[count] = -1
- result[count] = left[i]
- i += 1
- count += 1
- break
-
- lval = left[i]
- rval = right[j]
-
- if lval == rval:
- lindexer[count] = i
- rindexer[count] = j
- result[count] = lval
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- lindexer[count] = i
- rindexer[count] = -1
- result[count] = left[i]
- count += 1
- i += 1
- else:
- j += 1
-
- return result, lindexer, rindexer
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def inner_join_indexer_float64(ndarray[float64_t] left,
- ndarray[float64_t] right):
- """
- Two-pass algorithm for monotonic indexes. Handles many-to-one merges
- """
- cdef:
- Py_ssize_t i, j, k, nright, nleft, count
- float64_t lval, rval
- ndarray[int64_t] lindexer, rindexer
- ndarray[float64_t] result
-
- nleft = len(left)
- nright = len(right)
-
- i = 0
- j = 0
- count = 0
- if nleft > 0 and nright > 0:
- while True:
- if i == nleft:
- break
- if j == nright:
- break
-
- lval = left[i]
- rval = right[j]
- if lval == rval:
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- i += 1
- else:
- j += 1
-
- # do it again now that result size is known
-
- lindexer = np.empty(count, dtype=np.int64)
- rindexer = np.empty(count, dtype=np.int64)
- result = np.empty(count, dtype=np.float64)
-
- i = 0
- j = 0
- count = 0
- if nleft > 0 and nright > 0:
- while True:
- if i == nleft:
- break
- if j == nright:
- break
-
- lval = left[i]
- rval = right[j]
- if lval == rval:
- lindexer[count] = i
- rindexer[count] = j
- result[count] = rval
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- i += 1
- else:
- j += 1
-
- return result, lindexer, rindexer
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def outer_join_indexer_float64(ndarray[float64_t] left,
- ndarray[float64_t] right):
- cdef:
- Py_ssize_t i, j, nright, nleft, count
- float64_t lval, rval
- ndarray[int64_t] lindexer, rindexer
- ndarray[float64_t] result
-
- nleft = len(left)
- nright = len(right)
-
- i = 0
- j = 0
- count = 0
- if nleft == 0:
- count = nright
- elif nright == 0:
- count = nleft
- else:
- while True:
- if i == nleft:
- count += nright - j
- break
- if j == nright:
- count += nleft - i
- break
-
- lval = left[i]
- rval = right[j]
- if lval == rval:
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- count += 1
- i += 1
- else:
- count += 1
- j += 1
-
- lindexer = np.empty(count, dtype=np.int64)
- rindexer = np.empty(count, dtype=np.int64)
- result = np.empty(count, dtype=np.float64)
-
- # do it again, but populate the indexers / result
-
- i = 0
- j = 0
- count = 0
- if nleft == 0:
- for j in range(nright):
- lindexer[j] = -1
- rindexer[j] = j
- result[j] = right[j]
- elif nright == 0:
- for i in range(nleft):
- lindexer[i] = i
- rindexer[i] = -1
- result[i] = left[i]
- else:
- while True:
- if i == nleft:
- while j < nright:
- lindexer[count] = -1
- rindexer[count] = j
- result[count] = right[j]
- count += 1
- j += 1
- break
- if j == nright:
- while i < nleft:
- lindexer[count] = i
- rindexer[count] = -1
- result[count] = left[i]
- count += 1
- i += 1
- break
-
- lval = left[i]
- rval = right[j]
-
- if lval == rval:
- lindexer[count] = i
- rindexer[count] = j
- result[count] = lval
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- lindexer[count] = i
- rindexer[count] = -1
- result[count] = lval
- count += 1
- i += 1
- else:
- lindexer[count] = -1
- rindexer[count] = j
- result[count] = rval
- count += 1
- j += 1
-
- return result, lindexer, rindexer
-
-# Joins on ordered, unique indices
-
-# right might contain non-unique values
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def left_join_indexer_unique_float32(ndarray[float32_t] left,
- ndarray[float32_t] right):
- cdef:
- Py_ssize_t i, j, nleft, nright
- ndarray[int64_t] indexer
- float32_t lval, rval
-
- i = 0
- j = 0
- nleft = len(left)
- nright = len(right)
-
- indexer = np.empty(nleft, dtype=np.int64)
- while True:
- if i == nleft:
- break
-
- if j == nright:
- indexer[i] = -1
- i += 1
- continue
-
- rval = right[j]
-
- while i < nleft - 1 and left[i] == rval:
- indexer[i] = j
- i += 1
-
- if left[i] == right[j]:
- indexer[i] = j
- i += 1
- while i < nleft - 1 and left[i] == rval:
- indexer[i] = j
- i += 1
- j += 1
- elif left[i] > rval:
- indexer[i] = -1
- j += 1
- else:
- indexer[i] = -1
- i += 1
- return indexer
-
-
-# @cython.wraparound(False)
-# @cython.boundscheck(False)
-def left_join_indexer_float32(ndarray[float32_t] left,
- ndarray[float32_t] right):
- """
- Two-pass algorithm for monotonic indexes. Handles many-to-one merges
- """
- cdef:
- Py_ssize_t i, j, k, nright, nleft, count
- float32_t lval, rval
- ndarray[int64_t] lindexer, rindexer
- ndarray[float32_t] result
-
- nleft = len(left)
- nright = len(right)
-
- i = 0
- j = 0
- count = 0
- if nleft > 0:
- while i < nleft:
- if j == nright:
- count += nleft - i
- break
-
- lval = left[i]
- rval = right[j]
-
- if lval == rval:
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- count += 1
- i += 1
- else:
- j += 1
-
- # do it again now that result size is known
-
- lindexer = np.empty(count, dtype=np.int64)
- rindexer = np.empty(count, dtype=np.int64)
- result = np.empty(count, dtype=np.float32)
-
- i = 0
- j = 0
- count = 0
- if nleft > 0:
- while i < nleft:
- if j == nright:
- while i < nleft:
- lindexer[count] = i
- rindexer[count] = -1
- result[count] = left[i]
- i += 1
- count += 1
- break
-
- lval = left[i]
- rval = right[j]
-
- if lval == rval:
- lindexer[count] = i
- rindexer[count] = j
- result[count] = lval
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- lindexer[count] = i
- rindexer[count] = -1
- result[count] = left[i]
- count += 1
- i += 1
- else:
- j += 1
-
- return result, lindexer, rindexer
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def inner_join_indexer_float32(ndarray[float32_t] left,
- ndarray[float32_t] right):
- """
- Two-pass algorithm for monotonic indexes. Handles many-to-one merges
- """
- cdef:
- Py_ssize_t i, j, k, nright, nleft, count
- float32_t lval, rval
- ndarray[int64_t] lindexer, rindexer
- ndarray[float32_t] result
-
- nleft = len(left)
- nright = len(right)
-
- i = 0
- j = 0
- count = 0
- if nleft > 0 and nright > 0:
- while True:
- if i == nleft:
- break
- if j == nright:
- break
-
- lval = left[i]
- rval = right[j]
- if lval == rval:
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- i += 1
- else:
- j += 1
-
- # do it again now that result size is known
-
- lindexer = np.empty(count, dtype=np.int64)
- rindexer = np.empty(count, dtype=np.int64)
- result = np.empty(count, dtype=np.float32)
-
- i = 0
- j = 0
- count = 0
- if nleft > 0 and nright > 0:
- while True:
- if i == nleft:
- break
- if j == nright:
- break
-
- lval = left[i]
- rval = right[j]
- if lval == rval:
- lindexer[count] = i
- rindexer[count] = j
- result[count] = rval
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- i += 1
- else:
- j += 1
-
- return result, lindexer, rindexer
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def outer_join_indexer_float32(ndarray[float32_t] left,
- ndarray[float32_t] right):
- cdef:
- Py_ssize_t i, j, nright, nleft, count
- float32_t lval, rval
- ndarray[int64_t] lindexer, rindexer
- ndarray[float32_t] result
-
- nleft = len(left)
- nright = len(right)
-
- i = 0
- j = 0
- count = 0
- if nleft == 0:
- count = nright
- elif nright == 0:
- count = nleft
- else:
- while True:
- if i == nleft:
- count += nright - j
- break
- if j == nright:
- count += nleft - i
- break
-
- lval = left[i]
- rval = right[j]
- if lval == rval:
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- count += 1
- i += 1
- else:
- count += 1
- j += 1
-
- lindexer = np.empty(count, dtype=np.int64)
- rindexer = np.empty(count, dtype=np.int64)
- result = np.empty(count, dtype=np.float32)
-
- # do it again, but populate the indexers / result
-
- i = 0
- j = 0
- count = 0
- if nleft == 0:
- for j in range(nright):
- lindexer[j] = -1
- rindexer[j] = j
- result[j] = right[j]
- elif nright == 0:
- for i in range(nleft):
- lindexer[i] = i
- rindexer[i] = -1
- result[i] = left[i]
- else:
- while True:
- if i == nleft:
- while j < nright:
- lindexer[count] = -1
- rindexer[count] = j
- result[count] = right[j]
- count += 1
- j += 1
- break
- if j == nright:
- while i < nleft:
- lindexer[count] = i
- rindexer[count] = -1
- result[count] = left[i]
- count += 1
- i += 1
- break
-
- lval = left[i]
- rval = right[j]
-
- if lval == rval:
- lindexer[count] = i
- rindexer[count] = j
- result[count] = lval
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- lindexer[count] = i
- rindexer[count] = -1
- result[count] = lval
- count += 1
- i += 1
- else:
- lindexer[count] = -1
- rindexer[count] = j
- result[count] = rval
- count += 1
- j += 1
-
- return result, lindexer, rindexer
-
-# Joins on ordered, unique indices
-
-# right might contain non-unique values
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def left_join_indexer_unique_object(ndarray[object] left,
- ndarray[object] right):
- cdef:
- Py_ssize_t i, j, nleft, nright
- ndarray[int64_t] indexer
- object lval, rval
-
- i = 0
- j = 0
- nleft = len(left)
- nright = len(right)
-
- indexer = np.empty(nleft, dtype=np.int64)
- while True:
- if i == nleft:
- break
-
- if j == nright:
- indexer[i] = -1
- i += 1
- continue
-
- rval = right[j]
-
- while i < nleft - 1 and left[i] == rval:
- indexer[i] = j
- i += 1
-
- if left[i] == right[j]:
- indexer[i] = j
- i += 1
- while i < nleft - 1 and left[i] == rval:
- indexer[i] = j
- i += 1
- j += 1
- elif left[i] > rval:
- indexer[i] = -1
- j += 1
- else:
- indexer[i] = -1
- i += 1
- return indexer
-
-
-# @cython.wraparound(False)
-# @cython.boundscheck(False)
-def left_join_indexer_object(ndarray[object] left,
- ndarray[object] right):
- """
- Two-pass algorithm for monotonic indexes. Handles many-to-one merges
- """
- cdef:
- Py_ssize_t i, j, k, nright, nleft, count
- object lval, rval
- ndarray[int64_t] lindexer, rindexer
- ndarray[object] result
-
- nleft = len(left)
- nright = len(right)
-
- i = 0
- j = 0
- count = 0
- if nleft > 0:
- while i < nleft:
- if j == nright:
- count += nleft - i
- break
-
- lval = left[i]
- rval = right[j]
-
- if lval == rval:
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- count += 1
- i += 1
- else:
- j += 1
-
- # do it again now that result size is known
-
- lindexer = np.empty(count, dtype=np.int64)
- rindexer = np.empty(count, dtype=np.int64)
- result = np.empty(count, dtype=object)
-
- i = 0
- j = 0
- count = 0
- if nleft > 0:
- while i < nleft:
- if j == nright:
- while i < nleft:
- lindexer[count] = i
- rindexer[count] = -1
- result[count] = left[i]
- i += 1
- count += 1
- break
-
- lval = left[i]
- rval = right[j]
-
- if lval == rval:
- lindexer[count] = i
- rindexer[count] = j
- result[count] = lval
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- lindexer[count] = i
- rindexer[count] = -1
- result[count] = left[i]
- count += 1
- i += 1
- else:
- j += 1
-
- return result, lindexer, rindexer
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def inner_join_indexer_object(ndarray[object] left,
- ndarray[object] right):
- """
- Two-pass algorithm for monotonic indexes. Handles many-to-one merges
- """
- cdef:
- Py_ssize_t i, j, k, nright, nleft, count
- object lval, rval
- ndarray[int64_t] lindexer, rindexer
- ndarray[object] result
-
- nleft = len(left)
- nright = len(right)
-
- i = 0
- j = 0
- count = 0
- if nleft > 0 and nright > 0:
- while True:
- if i == nleft:
- break
- if j == nright:
- break
-
- lval = left[i]
- rval = right[j]
- if lval == rval:
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- i += 1
- else:
- j += 1
-
- # do it again now that result size is known
-
- lindexer = np.empty(count, dtype=np.int64)
- rindexer = np.empty(count, dtype=np.int64)
- result = np.empty(count, dtype=object)
-
- i = 0
- j = 0
- count = 0
- if nleft > 0 and nright > 0:
- while True:
- if i == nleft:
- break
- if j == nright:
- break
-
- lval = left[i]
- rval = right[j]
- if lval == rval:
- lindexer[count] = i
- rindexer[count] = j
- result[count] = rval
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- i += 1
- else:
- j += 1
-
- return result, lindexer, rindexer
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def outer_join_indexer_object(ndarray[object] left,
- ndarray[object] right):
- cdef:
- Py_ssize_t i, j, nright, nleft, count
- object lval, rval
- ndarray[int64_t] lindexer, rindexer
- ndarray[object] result
-
- nleft = len(left)
- nright = len(right)
-
- i = 0
- j = 0
- count = 0
- if nleft == 0:
- count = nright
- elif nright == 0:
- count = nleft
- else:
- while True:
- if i == nleft:
- count += nright - j
- break
- if j == nright:
- count += nleft - i
- break
-
- lval = left[i]
- rval = right[j]
- if lval == rval:
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- count += 1
- i += 1
- else:
- count += 1
- j += 1
-
- lindexer = np.empty(count, dtype=np.int64)
- rindexer = np.empty(count, dtype=np.int64)
- result = np.empty(count, dtype=object)
-
- # do it again, but populate the indexers / result
-
- i = 0
- j = 0
- count = 0
- if nleft == 0:
- for j in range(nright):
- lindexer[j] = -1
- rindexer[j] = j
- result[j] = right[j]
- elif nright == 0:
- for i in range(nleft):
- lindexer[i] = i
- rindexer[i] = -1
- result[i] = left[i]
- else:
- while True:
- if i == nleft:
- while j < nright:
- lindexer[count] = -1
- rindexer[count] = j
- result[count] = right[j]
- count += 1
- j += 1
- break
- if j == nright:
- while i < nleft:
- lindexer[count] = i
- rindexer[count] = -1
- result[count] = left[i]
- count += 1
- i += 1
- break
-
- lval = left[i]
- rval = right[j]
-
- if lval == rval:
- lindexer[count] = i
- rindexer[count] = j
- result[count] = lval
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- lindexer[count] = i
- rindexer[count] = -1
- result[count] = lval
- count += 1
- i += 1
- else:
- lindexer[count] = -1
- rindexer[count] = j
- result[count] = rval
- count += 1
- j += 1
-
- return result, lindexer, rindexer
-
-# Joins on ordered, unique indices
-
-# right might contain non-unique values
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def left_join_indexer_unique_int32(ndarray[int32_t] left,
- ndarray[int32_t] right):
- cdef:
- Py_ssize_t i, j, nleft, nright
- ndarray[int64_t] indexer
- int32_t lval, rval
-
- i = 0
- j = 0
- nleft = len(left)
- nright = len(right)
-
- indexer = np.empty(nleft, dtype=np.int64)
- while True:
- if i == nleft:
- break
-
- if j == nright:
- indexer[i] = -1
- i += 1
- continue
-
- rval = right[j]
-
- while i < nleft - 1 and left[i] == rval:
- indexer[i] = j
- i += 1
-
- if left[i] == right[j]:
- indexer[i] = j
- i += 1
- while i < nleft - 1 and left[i] == rval:
- indexer[i] = j
- i += 1
- j += 1
- elif left[i] > rval:
- indexer[i] = -1
- j += 1
- else:
- indexer[i] = -1
- i += 1
- return indexer
-
-
-# @cython.wraparound(False)
-# @cython.boundscheck(False)
-def left_join_indexer_int32(ndarray[int32_t] left,
- ndarray[int32_t] right):
- """
- Two-pass algorithm for monotonic indexes. Handles many-to-one merges
- """
- cdef:
- Py_ssize_t i, j, k, nright, nleft, count
- int32_t lval, rval
- ndarray[int64_t] lindexer, rindexer
- ndarray[int32_t] result
-
- nleft = len(left)
- nright = len(right)
-
- i = 0
- j = 0
- count = 0
- if nleft > 0:
- while i < nleft:
- if j == nright:
- count += nleft - i
- break
-
- lval = left[i]
- rval = right[j]
-
- if lval == rval:
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- count += 1
- i += 1
- else:
- j += 1
-
- # do it again now that result size is known
-
- lindexer = np.empty(count, dtype=np.int64)
- rindexer = np.empty(count, dtype=np.int64)
- result = np.empty(count, dtype=np.int32)
-
- i = 0
- j = 0
- count = 0
- if nleft > 0:
- while i < nleft:
- if j == nright:
- while i < nleft:
- lindexer[count] = i
- rindexer[count] = -1
- result[count] = left[i]
- i += 1
- count += 1
- break
-
- lval = left[i]
- rval = right[j]
-
- if lval == rval:
- lindexer[count] = i
- rindexer[count] = j
- result[count] = lval
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- lindexer[count] = i
- rindexer[count] = -1
- result[count] = left[i]
- count += 1
- i += 1
- else:
- j += 1
-
- return result, lindexer, rindexer
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def inner_join_indexer_int32(ndarray[int32_t] left,
- ndarray[int32_t] right):
- """
- Two-pass algorithm for monotonic indexes. Handles many-to-one merges
- """
- cdef:
- Py_ssize_t i, j, k, nright, nleft, count
- int32_t lval, rval
- ndarray[int64_t] lindexer, rindexer
- ndarray[int32_t] result
-
- nleft = len(left)
- nright = len(right)
-
- i = 0
- j = 0
- count = 0
- if nleft > 0 and nright > 0:
- while True:
- if i == nleft:
- break
- if j == nright:
- break
-
- lval = left[i]
- rval = right[j]
- if lval == rval:
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- i += 1
- else:
- j += 1
-
- # do it again now that result size is known
-
- lindexer = np.empty(count, dtype=np.int64)
- rindexer = np.empty(count, dtype=np.int64)
- result = np.empty(count, dtype=np.int32)
-
- i = 0
- j = 0
- count = 0
- if nleft > 0 and nright > 0:
- while True:
- if i == nleft:
- break
- if j == nright:
- break
-
- lval = left[i]
- rval = right[j]
- if lval == rval:
- lindexer[count] = i
- rindexer[count] = j
- result[count] = rval
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- i += 1
- else:
- j += 1
-
- return result, lindexer, rindexer
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def outer_join_indexer_int32(ndarray[int32_t] left,
- ndarray[int32_t] right):
- cdef:
- Py_ssize_t i, j, nright, nleft, count
- int32_t lval, rval
- ndarray[int64_t] lindexer, rindexer
- ndarray[int32_t] result
-
- nleft = len(left)
- nright = len(right)
-
- i = 0
- j = 0
- count = 0
- if nleft == 0:
- count = nright
- elif nright == 0:
- count = nleft
- else:
- while True:
- if i == nleft:
- count += nright - j
- break
- if j == nright:
- count += nleft - i
- break
-
- lval = left[i]
- rval = right[j]
- if lval == rval:
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- count += 1
- i += 1
- else:
- count += 1
- j += 1
-
- lindexer = np.empty(count, dtype=np.int64)
- rindexer = np.empty(count, dtype=np.int64)
- result = np.empty(count, dtype=np.int32)
-
- # do it again, but populate the indexers / result
-
- i = 0
- j = 0
- count = 0
- if nleft == 0:
- for j in range(nright):
- lindexer[j] = -1
- rindexer[j] = j
- result[j] = right[j]
- elif nright == 0:
- for i in range(nleft):
- lindexer[i] = i
- rindexer[i] = -1
- result[i] = left[i]
- else:
- while True:
- if i == nleft:
- while j < nright:
- lindexer[count] = -1
- rindexer[count] = j
- result[count] = right[j]
- count += 1
- j += 1
- break
- if j == nright:
- while i < nleft:
- lindexer[count] = i
- rindexer[count] = -1
- result[count] = left[i]
- count += 1
- i += 1
- break
-
- lval = left[i]
- rval = right[j]
-
- if lval == rval:
- lindexer[count] = i
- rindexer[count] = j
- result[count] = lval
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- lindexer[count] = i
- rindexer[count] = -1
- result[count] = lval
- count += 1
- i += 1
- else:
- lindexer[count] = -1
- rindexer[count] = j
- result[count] = rval
- count += 1
- j += 1
-
- return result, lindexer, rindexer
-
-# Joins on ordered, unique indices
-
-# right might contain non-unique values
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def left_join_indexer_unique_int64(ndarray[int64_t] left,
- ndarray[int64_t] right):
- cdef:
- Py_ssize_t i, j, nleft, nright
- ndarray[int64_t] indexer
- int64_t lval, rval
-
- i = 0
- j = 0
- nleft = len(left)
- nright = len(right)
-
- indexer = np.empty(nleft, dtype=np.int64)
- while True:
- if i == nleft:
- break
-
- if j == nright:
- indexer[i] = -1
- i += 1
- continue
-
- rval = right[j]
-
- while i < nleft - 1 and left[i] == rval:
- indexer[i] = j
- i += 1
-
- if left[i] == right[j]:
- indexer[i] = j
- i += 1
- while i < nleft - 1 and left[i] == rval:
- indexer[i] = j
- i += 1
- j += 1
- elif left[i] > rval:
- indexer[i] = -1
- j += 1
- else:
- indexer[i] = -1
- i += 1
- return indexer
-
-
-# @cython.wraparound(False)
-# @cython.boundscheck(False)
-def left_join_indexer_int64(ndarray[int64_t] left,
- ndarray[int64_t] right):
- """
- Two-pass algorithm for monotonic indexes. Handles many-to-one merges
- """
- cdef:
- Py_ssize_t i, j, k, nright, nleft, count
- int64_t lval, rval
- ndarray[int64_t] lindexer, rindexer
- ndarray[int64_t] result
-
- nleft = len(left)
- nright = len(right)
-
- i = 0
- j = 0
- count = 0
- if nleft > 0:
- while i < nleft:
- if j == nright:
- count += nleft - i
- break
-
- lval = left[i]
- rval = right[j]
-
- if lval == rval:
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- count += 1
- i += 1
- else:
- j += 1
-
- # do it again now that result size is known
-
- lindexer = np.empty(count, dtype=np.int64)
- rindexer = np.empty(count, dtype=np.int64)
- result = np.empty(count, dtype=np.int64)
-
- i = 0
- j = 0
- count = 0
- if nleft > 0:
- while i < nleft:
- if j == nright:
- while i < nleft:
- lindexer[count] = i
- rindexer[count] = -1
- result[count] = left[i]
- i += 1
- count += 1
- break
-
- lval = left[i]
- rval = right[j]
-
- if lval == rval:
- lindexer[count] = i
- rindexer[count] = j
- result[count] = lval
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- lindexer[count] = i
- rindexer[count] = -1
- result[count] = left[i]
- count += 1
- i += 1
- else:
- j += 1
-
- return result, lindexer, rindexer
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def inner_join_indexer_int64(ndarray[int64_t] left,
- ndarray[int64_t] right):
- """
- Two-pass algorithm for monotonic indexes. Handles many-to-one merges
- """
- cdef:
- Py_ssize_t i, j, k, nright, nleft, count
- int64_t lval, rval
- ndarray[int64_t] lindexer, rindexer
- ndarray[int64_t] result
-
- nleft = len(left)
- nright = len(right)
-
- i = 0
- j = 0
- count = 0
- if nleft > 0 and nright > 0:
- while True:
- if i == nleft:
- break
- if j == nright:
- break
-
- lval = left[i]
- rval = right[j]
- if lval == rval:
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- i += 1
- else:
- j += 1
-
- # do it again now that result size is known
-
- lindexer = np.empty(count, dtype=np.int64)
- rindexer = np.empty(count, dtype=np.int64)
- result = np.empty(count, dtype=np.int64)
-
- i = 0
- j = 0
- count = 0
- if nleft > 0 and nright > 0:
- while True:
- if i == nleft:
- break
- if j == nright:
- break
-
- lval = left[i]
- rval = right[j]
- if lval == rval:
- lindexer[count] = i
- rindexer[count] = j
- result[count] = rval
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- i += 1
- else:
- j += 1
-
- return result, lindexer, rindexer
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-def outer_join_indexer_int64(ndarray[int64_t] left,
- ndarray[int64_t] right):
- cdef:
- Py_ssize_t i, j, nright, nleft, count
- int64_t lval, rval
- ndarray[int64_t] lindexer, rindexer
- ndarray[int64_t] result
-
- nleft = len(left)
- nright = len(right)
-
- i = 0
- j = 0
- count = 0
- if nleft == 0:
- count = nright
- elif nright == 0:
- count = nleft
- else:
- while True:
- if i == nleft:
- count += nright - j
- break
- if j == nright:
- count += nleft - i
- break
-
- lval = left[i]
- rval = right[j]
- if lval == rval:
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- count += 1
- i += 1
- else:
- count += 1
- j += 1
-
- lindexer = np.empty(count, dtype=np.int64)
- rindexer = np.empty(count, dtype=np.int64)
- result = np.empty(count, dtype=np.int64)
-
- # do it again, but populate the indexers / result
-
- i = 0
- j = 0
- count = 0
- if nleft == 0:
- for j in range(nright):
- lindexer[j] = -1
- rindexer[j] = j
- result[j] = right[j]
- elif nright == 0:
- for i in range(nleft):
- lindexer[i] = i
- rindexer[i] = -1
- result[i] = left[i]
- else:
- while True:
- if i == nleft:
- while j < nright:
- lindexer[count] = -1
- rindexer[count] = j
- result[count] = right[j]
- count += 1
- j += 1
- break
- if j == nright:
- while i < nleft:
- lindexer[count] = i
- rindexer[count] = -1
- result[count] = left[i]
- count += 1
- i += 1
- break
-
- lval = left[i]
- rval = right[j]
-
- if lval == rval:
- lindexer[count] = i
- rindexer[count] = j
- result[count] = lval
- count += 1
- if i < nleft - 1:
- if j < nright - 1 and right[j + 1] == rval:
- j += 1
- else:
- i += 1
- if left[i] != rval:
- j += 1
- elif j < nright - 1:
- j += 1
- if lval != right[j]:
- i += 1
- else:
- # end of the road
- break
- elif lval < rval:
- lindexer[count] = i
- rindexer[count] = -1
- result[count] = lval
- count += 1
- i += 1
- else:
- lindexer[count] = -1
- rindexer[count] = j
- result[count] = rval
- count += 1
- j += 1
-
- return result, lindexer, rindexer
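
The deleted ``join_helper.pxi`` is likewise generated from a template, so each dtype gets the same three functions. ``left_join_indexer_unique_*`` walks two sorted arrays with a pair of cursors, while the ``left/inner/outer_join_indexer_*`` variants use the two-pass approach named in their docstrings: one pass to count the output size, then an identical pass to fill the pre-allocated indexers. A much-simplified Python sketch of the unique left-join case (first match wins; the real code is dtype-specialized and handles the many-to-one details explicitly)::

    import numpy as np

    def left_join_indexer_unique_sketch(left, right):
        # for each element of sorted `left`, the position of an equal
        # element in sorted `right`, or -1 if there is none
        indexer = np.empty(len(left), dtype=np.int64)
        j = 0
        for i, lval in enumerate(left):
            while j < len(right) and right[j] < lval:
                j += 1
            if j < len(right) and right[j] == lval:
                indexer[i] = j
            else:
                indexer[i] = -1
        return indexer

    left_join_indexer_unique_sketch(np.array([1, 3, 5]),
                                    np.array([1, 2, 5]))
    # -> array([ 0, -1,  2])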
diff --git a/pandas/src/joins_func_helper.pxi b/pandas/src/joins_func_helper.pxi
deleted file mode 100644
index 7a59da37c5ced..0000000000000
--- a/pandas/src/joins_func_helper.pxi
+++ /dev/null
@@ -1,373 +0,0 @@
-"""
-Template for each `dtype` helper function for hashtable
-
-WARNING: DO NOT edit .pxi FILE directly, .pxi is generated from .pxi.in
-"""
-
-#----------------------------------------------------------------------
-# asof_join_by
-#----------------------------------------------------------------------
-
-
-from hashtable cimport *
-
-
-def asof_join_int64_t_by_object(ndarray[int64_t] left_values,
- ndarray[int64_t] right_values,
- ndarray[object] left_by_values,
- ndarray[object] right_by_values,
- bint allow_exact_matches=1,
- tolerance=None):
-
- cdef:
- Py_ssize_t left_pos, right_pos, left_size, right_size, found_right_pos
- ndarray[int64_t] left_indexer, right_indexer
- bint has_tolerance = 0
- int64_t tolerance_
- PyObjectHashTable hash_table
- object by_value
-
- # if we are using tolerance, set our objects
- if tolerance is not None:
- has_tolerance = 1
- tolerance_ = tolerance
-
- left_size = len(left_values)
- right_size = len(right_values)
-
- left_indexer = np.empty(left_size, dtype=np.int64)
- right_indexer = np.empty(left_size, dtype=np.int64)
-
- hash_table = PyObjectHashTable(right_size)
-
- right_pos = 0
- for left_pos in range(left_size):
- # restart right_pos if it went negative in a previous iteration
- if right_pos < 0:
- right_pos = 0
-
- # find last position in right whose value is less than left's value
- if allow_exact_matches:
- while right_pos < right_size and\
- right_values[right_pos] <= left_values[left_pos]:
- hash_table.set_item(right_by_values[right_pos], right_pos)
- right_pos += 1
- else:
- while right_pos < right_size and\
- right_values[right_pos] < left_values[left_pos]:
- hash_table.set_item(right_by_values[right_pos], right_pos)
- right_pos += 1
- right_pos -= 1
-
- # save positions as the desired index
- by_value = left_by_values[left_pos]
- found_right_pos = hash_table.get_item(by_value)\
- if by_value in hash_table else -1
- left_indexer[left_pos] = left_pos
- right_indexer[left_pos] = found_right_pos
-
- # if needed, verify that tolerance is met
- if has_tolerance and found_right_pos != -1:
- diff = left_values[left_pos] - right_values[found_right_pos]
- if diff > tolerance_:
- right_indexer[left_pos] = -1
-
- return left_indexer, right_indexer
-
-
-def asof_join_double_by_object(ndarray[double] left_values,
- ndarray[double] right_values,
- ndarray[object] left_by_values,
- ndarray[object] right_by_values,
- bint allow_exact_matches=1,
- tolerance=None):
-
- cdef:
- Py_ssize_t left_pos, right_pos, left_size, right_size, found_right_pos
- ndarray[int64_t] left_indexer, right_indexer
- bint has_tolerance = 0
- double tolerance_
- PyObjectHashTable hash_table
- object by_value
-
- # if we are using tolerance, set our objects
- if tolerance is not None:
- has_tolerance = 1
- tolerance_ = tolerance
-
- left_size = len(left_values)
- right_size = len(right_values)
-
- left_indexer = np.empty(left_size, dtype=np.int64)
- right_indexer = np.empty(left_size, dtype=np.int64)
-
- hash_table = PyObjectHashTable(right_size)
-
- right_pos = 0
- for left_pos in range(left_size):
- # restart right_pos if it went negative in a previous iteration
- if right_pos < 0:
- right_pos = 0
-
- # find last position in right whose value is less than left's value
- if allow_exact_matches:
- while right_pos < right_size and\
- right_values[right_pos] <= left_values[left_pos]:
- hash_table.set_item(right_by_values[right_pos], right_pos)
- right_pos += 1
- else:
- while right_pos < right_size and\
- right_values[right_pos] < left_values[left_pos]:
- hash_table.set_item(right_by_values[right_pos], right_pos)
- right_pos += 1
- right_pos -= 1
-
- # save positions as the desired index
- by_value = left_by_values[left_pos]
- found_right_pos = hash_table.get_item(by_value)\
- if by_value in hash_table else -1
- left_indexer[left_pos] = left_pos
- right_indexer[left_pos] = found_right_pos
-
- # if needed, verify that tolerance is met
- if has_tolerance and found_right_pos != -1:
- diff = left_values[left_pos] - right_values[found_right_pos]
- if diff > tolerance_:
- right_indexer[left_pos] = -1
-
- return left_indexer, right_indexer
-
-
-def asof_join_int64_t_by_int64_t(ndarray[int64_t] left_values,
- ndarray[int64_t] right_values,
- ndarray[int64_t] left_by_values,
- ndarray[int64_t] right_by_values,
- bint allow_exact_matches=1,
- tolerance=None):
-
- cdef:
- Py_ssize_t left_pos, right_pos, left_size, right_size, found_right_pos
- ndarray[int64_t] left_indexer, right_indexer
- bint has_tolerance = 0
- int64_t tolerance_
- Int64HashTable hash_table
- int64_t by_value
-
- # if we are using tolerance, set our objects
- if tolerance is not None:
- has_tolerance = 1
- tolerance_ = tolerance
-
- left_size = len(left_values)
- right_size = len(right_values)
-
- left_indexer = np.empty(left_size, dtype=np.int64)
- right_indexer = np.empty(left_size, dtype=np.int64)
-
- hash_table = Int64HashTable(right_size)
-
- right_pos = 0
- for left_pos in range(left_size):
- # restart right_pos if it went negative in a previous iteration
- if right_pos < 0:
- right_pos = 0
-
- # find last position in right whose value is less than left's value
- if allow_exact_matches:
- while right_pos < right_size and\
- right_values[right_pos] <= left_values[left_pos]:
- hash_table.set_item(right_by_values[right_pos], right_pos)
- right_pos += 1
- else:
- while right_pos < right_size and\
- right_values[right_pos] < left_values[left_pos]:
- hash_table.set_item(right_by_values[right_pos], right_pos)
- right_pos += 1
- right_pos -= 1
-
- # save positions as the desired index
- by_value = left_by_values[left_pos]
- found_right_pos = hash_table.get_item(by_value)\
- if by_value in hash_table else -1
- left_indexer[left_pos] = left_pos
- right_indexer[left_pos] = found_right_pos
-
- # if needed, verify that tolerance is met
- if has_tolerance and found_right_pos != -1:
- diff = left_values[left_pos] - right_values[found_right_pos]
- if diff > tolerance_:
- right_indexer[left_pos] = -1
-
- return left_indexer, right_indexer
-
-
-def asof_join_double_by_int64_t(ndarray[double] left_values,
- ndarray[double] right_values,
- ndarray[int64_t] left_by_values,
- ndarray[int64_t] right_by_values,
- bint allow_exact_matches=1,
- tolerance=None):
-
- cdef:
- Py_ssize_t left_pos, right_pos, left_size, right_size, found_right_pos
- ndarray[int64_t] left_indexer, right_indexer
- bint has_tolerance = 0
- double tolerance_
- Int64HashTable hash_table
- int64_t by_value
-
- # if we are using tolerance, set our objects
- if tolerance is not None:
- has_tolerance = 1
- tolerance_ = tolerance
-
- left_size = len(left_values)
- right_size = len(right_values)
-
- left_indexer = np.empty(left_size, dtype=np.int64)
- right_indexer = np.empty(left_size, dtype=np.int64)
-
- hash_table = Int64HashTable(right_size)
-
- right_pos = 0
- for left_pos in range(left_size):
- # restart right_pos if it went negative in a previous iteration
- if right_pos < 0:
- right_pos = 0
-
- # find last position in right whose value is less than left's value
- if allow_exact_matches:
- while right_pos < right_size and\
- right_values[right_pos] <= left_values[left_pos]:
- hash_table.set_item(right_by_values[right_pos], right_pos)
- right_pos += 1
- else:
- while right_pos < right_size and\
- right_values[right_pos] < left_values[left_pos]:
- hash_table.set_item(right_by_values[right_pos], right_pos)
- right_pos += 1
- right_pos -= 1
-
- # save positions as the desired index
- by_value = left_by_values[left_pos]
- found_right_pos = hash_table.get_item(by_value)\
- if by_value in hash_table else -1
- left_indexer[left_pos] = left_pos
- right_indexer[left_pos] = found_right_pos
-
- # if needed, verify that tolerance is met
- if has_tolerance and found_right_pos != -1:
- diff = left_values[left_pos] - right_values[found_right_pos]
- if diff > tolerance_:
- right_indexer[left_pos] = -1
-
- return left_indexer, right_indexer
-
-
-#----------------------------------------------------------------------
-# asof_join
-#----------------------------------------------------------------------
-
-
-def asof_join_int64_t(ndarray[int64_t] left_values,
- ndarray[int64_t] right_values,
- bint allow_exact_matches=1,
- tolerance=None):
-
- cdef:
- Py_ssize_t left_pos, right_pos, left_size, right_size
- ndarray[int64_t] left_indexer, right_indexer
- bint has_tolerance = 0
- int64_t tolerance_
-
- # if we are using tolerance, set our objects
- if tolerance is not None:
- has_tolerance = 1
- tolerance_ = tolerance
-
- left_size = len(left_values)
- right_size = len(right_values)
-
- left_indexer = np.empty(left_size, dtype=np.int64)
- right_indexer = np.empty(left_size, dtype=np.int64)
-
- right_pos = 0
- for left_pos in range(left_size):
- # restart right_pos if it went negative in a previous iteration
- if right_pos < 0:
- right_pos = 0
-
- # find last position in right whose value is less than left's value
- if allow_exact_matches:
- while right_pos < right_size and\
- right_values[right_pos] <= left_values[left_pos]:
- right_pos += 1
- else:
- while right_pos < right_size and\
- right_values[right_pos] < left_values[left_pos]:
- right_pos += 1
- right_pos -= 1
-
- # save positions as the desired index
- left_indexer[left_pos] = left_pos
- right_indexer[left_pos] = right_pos
-
- # if needed, verify that tolerance is met
- if has_tolerance and right_pos != -1:
- diff = left_values[left_pos] - right_values[right_pos]
- if diff > tolerance_:
- right_indexer[left_pos] = -1
-
- return left_indexer, right_indexer
-
-
-def asof_join_double(ndarray[double] left_values,
- ndarray[double] right_values,
- bint allow_exact_matches=1,
- tolerance=None):
-
- cdef:
- Py_ssize_t left_pos, right_pos, left_size, right_size
- ndarray[int64_t] left_indexer, right_indexer
- bint has_tolerance = 0
- double tolerance_
-
- # if we are using tolerance, set our objects
- if tolerance is not None:
- has_tolerance = 1
- tolerance_ = tolerance
-
- left_size = len(left_values)
- right_size = len(right_values)
-
- left_indexer = np.empty(left_size, dtype=np.int64)
- right_indexer = np.empty(left_size, dtype=np.int64)
-
- right_pos = 0
- for left_pos in range(left_size):
- # restart right_pos if it went negative in a previous iteration
- if right_pos < 0:
- right_pos = 0
-
- # find last position in right whose value is less than left's value
- if allow_exact_matches:
- while right_pos < right_size and\
- right_values[right_pos] <= left_values[left_pos]:
- right_pos += 1
- else:
- while right_pos < right_size and\
- right_values[right_pos] < left_values[left_pos]:
- right_pos += 1
- right_pos -= 1
-
- # save positions as the desired index
- left_indexer[left_pos] = left_pos
- right_indexer[left_pos] = right_pos
-
- # if needed, verify that tolerance is met
- if has_tolerance and right_pos != -1:
- diff = left_values[left_pos] - right_values[right_pos]
- if diff > tolerance_:
- right_indexer[left_pos] = -1
-
- return left_indexer, right_indexer
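All of the deleted `asof_join_*` variants share one core loop: walk the right array forward while its values do not exceed (or, when exact matches are disallowed, strictly precede) the current left value, then take the previous right position as the match. A minimal C sketch of that scan for the exact-match case, without the tolerance check or the by-key hash table (names are illustrative):

/* Forward scan for an "asof" (last value not after left[i]) match.
 * Both arrays must be sorted ascending; exact matches are allowed. */
#include <stddef.h>

static void asof_scan(const long long *left, size_t nleft,
                      const long long *right, size_t nright,
                      long long *right_indexer) {
    size_t right_pos = 0;
    for (size_t i = 0; i < nleft; ++i) {
        /* stop at the first right value strictly greater than left[i] */
        while (right_pos < nright && right[right_pos] <= left[i])
            right_pos++;
        /* the previous position, if any, is the latest usable match */
        right_indexer[i] = (right_pos == 0) ? -1 : (long long)(right_pos - 1);
    }
}

Because both inputs are sorted, `right_pos` never moves backwards, so the whole join is linear in the combined length of the two arrays.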
diff --git a/pandas/src/msgpack/unpack_template.h b/pandas/src/msgpack/unpack_template.h
index 95af6735520fc..fba372ddcb3e4 100644
--- a/pandas/src/msgpack/unpack_template.h
+++ b/pandas/src/msgpack/unpack_template.h
@@ -17,7 +17,7 @@
*/
#ifndef USE_CASE_RANGE
-#if !defined(_MSC_VER)
+#ifdef __GNUC__
#define USE_CASE_RANGE
#endif
#endif
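The `unpack_template.h` change narrows the `USE_CASE_RANGE` guard from "anything that is not MSVC" to compilers that define `__GNUC__`, because the `case lo ... hi:` switch syntax it enables is a GNU C extension rather than standard C. A small illustrative example of what that guard protects (the byte range shown matches msgpack's positive fixint encoding, but the function itself is hypothetical):

/* Hypothetical classifier showing the GNU C case-range extension that
 * USE_CASE_RANGE enables; 0x00-0x7f is msgpack's positive fixint range. */
#ifdef __GNUC__
#define USE_CASE_RANGE
#endif

static int is_positive_fixint(unsigned char b) {
#ifdef USE_CASE_RANGE
    switch (b) {
        case 0x00 ... 0x7f:
            return 1;
        default:
            return 0;
    }
#else
    return b <= 0x7f;  /* portable fallback without the extension */
#endif
}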
diff --git a/pandas/src/numpy_helper.h b/pandas/src/numpy_helper.h
index 9f406890c4e68..17d5ec12f4f79 100644
--- a/pandas/src/numpy_helper.h
+++ b/pandas/src/numpy_helper.h
@@ -1,7 +1,19 @@
+/*
+Copyright (c) 2016, PyData Development Team
+All rights reserved.
+
+Distributed under the terms of the BSD Simplified License.
+
+The full license is in the LICENSE file, distributed with this software.
+*/
+
+#ifndef PANDAS_SRC_NUMPY_HELPER_H_
+#define PANDAS_SRC_NUMPY_HELPER_H_
+
#include "Python.h"
+#include "helper.h"
#include "numpy/arrayobject.h"
#include "numpy/arrayscalars.h"
-#include "helper.h"
#define PANDAS_FLOAT 0
#define PANDAS_INT 1
@@ -10,111 +22,87 @@
#define PANDAS_OBJECT 4
#define PANDAS_DATETIME 5
-PANDAS_INLINE int
-infer_type(PyObject* obj) {
- if (PyBool_Check(obj)) {
- return PANDAS_BOOL;
- }
- else if (PyArray_IsIntegerScalar(obj)) {
- return PANDAS_INT;
- }
- else if (PyArray_IsScalar(obj, Datetime)) {
- return PANDAS_DATETIME;
- }
- else if (PyFloat_Check(obj) || PyArray_IsScalar(obj, Floating)) {
- return PANDAS_FLOAT;
- }
- else if (PyString_Check(obj) || PyUnicode_Check(obj)) {
- return PANDAS_STRING;
- }
- else {
- return PANDAS_OBJECT;
- }
+PANDAS_INLINE int infer_type(PyObject* obj) {
+ if (PyBool_Check(obj)) {
+ return PANDAS_BOOL;
+ } else if (PyArray_IsIntegerScalar(obj)) {
+ return PANDAS_INT;
+ } else if (PyArray_IsScalar(obj, Datetime)) {
+ return PANDAS_DATETIME;
+ } else if (PyFloat_Check(obj) || PyArray_IsScalar(obj, Floating)) {
+ return PANDAS_FLOAT;
+ } else if (PyString_Check(obj) || PyUnicode_Check(obj)) {
+ return PANDAS_STRING;
+ } else {
+ return PANDAS_OBJECT;
+ }
}
-PANDAS_INLINE npy_int64
-get_nat(void) {
- return NPY_MIN_INT64;
-}
+PANDAS_INLINE npy_int64 get_nat(void) { return NPY_MIN_INT64; }
-PANDAS_INLINE npy_datetime
-get_datetime64_value(PyObject* obj) {
- return ((PyDatetimeScalarObject*) obj)->obval;
+PANDAS_INLINE npy_datetime get_datetime64_value(PyObject* obj) {
+ return ((PyDatetimeScalarObject*)obj)->obval;
}
-PANDAS_INLINE npy_timedelta
-get_timedelta64_value(PyObject* obj) {
- return ((PyTimedeltaScalarObject*) obj)->obval;
+PANDAS_INLINE npy_timedelta get_timedelta64_value(PyObject* obj) {
+ return ((PyTimedeltaScalarObject*)obj)->obval;
}
-PANDAS_INLINE int
-is_integer_object(PyObject* obj) {
- return (!PyBool_Check(obj)) && PyArray_IsIntegerScalar(obj);
-// return PyArray_IsIntegerScalar(obj);
+PANDAS_INLINE int is_integer_object(PyObject* obj) {
+ return (!PyBool_Check(obj)) && PyArray_IsIntegerScalar(obj);
}
-PANDAS_INLINE int
-is_float_object(PyObject* obj) {
- return (PyFloat_Check(obj) || PyArray_IsScalar(obj, Floating));
+PANDAS_INLINE int is_float_object(PyObject* obj) {
+ return (PyFloat_Check(obj) || PyArray_IsScalar(obj, Floating));
}
-PANDAS_INLINE int
-is_complex_object(PyObject* obj) {
- return (PyComplex_Check(obj) || PyArray_IsScalar(obj, ComplexFloating));
+PANDAS_INLINE int is_complex_object(PyObject* obj) {
+ return (PyComplex_Check(obj) || PyArray_IsScalar(obj, ComplexFloating));
}
-PANDAS_INLINE int
-is_bool_object(PyObject* obj) {
- return (PyBool_Check(obj) || PyArray_IsScalar(obj, Bool));
+PANDAS_INLINE int is_bool_object(PyObject* obj) {
+ return (PyBool_Check(obj) || PyArray_IsScalar(obj, Bool));
}
-PANDAS_INLINE int
-is_string_object(PyObject* obj) {
- return (PyString_Check(obj) || PyUnicode_Check(obj));
+PANDAS_INLINE int is_string_object(PyObject* obj) {
+ return (PyString_Check(obj) || PyUnicode_Check(obj));
}
-PANDAS_INLINE int
-is_datetime64_object(PyObject *obj) {
- return PyArray_IsScalar(obj, Datetime);
+PANDAS_INLINE int is_datetime64_object(PyObject* obj) {
+ return PyArray_IsScalar(obj, Datetime);
}
-PANDAS_INLINE int
-is_timedelta64_object(PyObject *obj) {
- return PyArray_IsScalar(obj, Timedelta);
+PANDAS_INLINE int is_timedelta64_object(PyObject* obj) {
+ return PyArray_IsScalar(obj, Timedelta);
}
-PANDAS_INLINE int
-assign_value_1d(PyArrayObject* ap, Py_ssize_t _i, PyObject* v) {
- npy_intp i = (npy_intp) _i;
- char *item = (char *) PyArray_DATA(ap) + i * PyArray_STRIDE(ap, 0);
- return PyArray_DESCR(ap)->f->setitem(v, item, ap);
+PANDAS_INLINE int assign_value_1d(PyArrayObject* ap, Py_ssize_t _i,
+ PyObject* v) {
+ npy_intp i = (npy_intp)_i;
+ char* item = (char*)PyArray_DATA(ap) + i * PyArray_STRIDE(ap, 0);
+ return PyArray_DESCR(ap)->f->setitem(v, item, ap);
}
-PANDAS_INLINE PyObject*
-get_value_1d(PyArrayObject* ap, Py_ssize_t i) {
- char *item = (char *) PyArray_DATA(ap) + i * PyArray_STRIDE(ap, 0);
- return PyArray_Scalar(item, PyArray_DESCR(ap), (PyObject*) ap);
+PANDAS_INLINE PyObject* get_value_1d(PyArrayObject* ap, Py_ssize_t i) {
+ char* item = (char*)PyArray_DATA(ap) + i * PyArray_STRIDE(ap, 0);
+ return PyArray_Scalar(item, PyArray_DESCR(ap), (PyObject*)ap);
}
-
-PANDAS_INLINE char*
-get_c_string(PyObject* obj) {
+PANDAS_INLINE char* get_c_string(PyObject* obj) {
#if PY_VERSION_HEX >= 0x03000000
- PyObject* enc_str = PyUnicode_AsEncodedString(obj, "utf-8", "error");
+ PyObject* enc_str = PyUnicode_AsEncodedString(obj, "utf-8", "error");
- char *ret;
- ret = PyBytes_AS_STRING(enc_str);
+ char* ret;
+ ret = PyBytes_AS_STRING(enc_str);
- // TODO: memory leak here
+ // TODO(general): memory leak here
- // Py_XDECREF(enc_str);
- return ret;
+ return ret;
#else
- return PyString_AsString(obj);
+ return PyString_AsString(obj);
#endif
}
-PANDAS_INLINE PyObject*
-char_to_string(char* data) {
+PANDAS_INLINE PyObject* char_to_string(char* data) {
#if PY_VERSION_HEX >= 0x03000000
return PyUnicode_FromString(data);
#else
@@ -122,61 +110,47 @@ char_to_string(char* data) {
#endif
}
-// PANDAS_INLINE int
-// is_string(PyObject* obj) {
-// #if PY_VERSION_HEX >= 0x03000000
-// return PyUnicode_Check(obj);
-// #else
-// return PyString_Check(obj);
-// #endif
-
-PyObject* sarr_from_data(PyArray_Descr *descr, int length, void* data) {
- PyArrayObject *result;
+PyObject* sarr_from_data(PyArray_Descr* descr, int length, void* data) {
+ PyArrayObject* result;
npy_intp dims[1] = {length};
- Py_INCREF(descr); // newfromdescr steals a reference to descr
- result = (PyArrayObject*) PyArray_NewFromDescr(&PyArray_Type, descr, 1, dims,
- NULL, data, 0, NULL);
+ Py_INCREF(descr); // newfromdescr steals a reference to descr
+ result = (PyArrayObject*)PyArray_NewFromDescr(&PyArray_Type, descr, 1, dims,
+ NULL, data, 0, NULL);
// Returned array doesn't own data by default
result->flags |= NPY_OWNDATA;
- return (PyObject*) result;
+ return (PyObject*)result;
}
-
-void transfer_object_column(char *dst, char *src, size_t stride,
+void transfer_object_column(char* dst, char* src, size_t stride,
size_t length) {
int i;
size_t sz = sizeof(PyObject*);
- for (i = 0; i < length; ++i)
- {
+ for (i = 0; i < length; ++i) {
// uninitialized data
// Py_XDECREF(*((PyObject**) dst));
memcpy(dst, src, sz);
- Py_INCREF(*((PyObject**) dst));
+ Py_INCREF(*((PyObject**)dst));
src += sz;
dst += stride;
}
}
-void set_array_owndata(PyArrayObject *ao) {
- ao->flags |= NPY_OWNDATA;
-}
+void set_array_owndata(PyArrayObject* ao) { ao->flags |= NPY_OWNDATA; }
-void set_array_not_contiguous(PyArrayObject *ao) {
+void set_array_not_contiguous(PyArrayObject* ao) {
ao->flags &= ~(NPY_C_CONTIGUOUS | NPY_F_CONTIGUOUS);
}
-
// If arr is zerodim array, return a proper array scalar (e.g. np.int64).
// Otherwise, return arr as is.
-PANDAS_INLINE PyObject*
-unbox_if_zerodim(PyObject* arr) {
+PANDAS_INLINE PyObject* unbox_if_zerodim(PyObject* arr) {
if (PyArray_IsZeroDim(arr)) {
- PyObject *ret;
+ PyObject* ret;
ret = PyArray_ToScalar(PyArray_DATA(arr), arr);
return ret;
} else {
@@ -185,20 +159,4 @@ unbox_if_zerodim(PyObject* arr) {
}
}
-
-// PANDAS_INLINE PyObject*
-// get_base_ndarray(PyObject* ap) {
-// // if (!ap || (NULL == ap)) {
-// // Py_RETURN_NONE;
-// // }
-
-// while (!PyArray_CheckExact(ap)) {
-// ap = PyArray_BASE((PyArrayObject*) ap);
-// if (ap == Py_None) Py_RETURN_NONE;
-// }
-// // PyArray_BASE is a borrowed reference
-// if(ap) {
-// Py_INCREF(ap);
-// }
-// return ap;
-// }
+#endif // PANDAS_SRC_NUMPY_HELPER_H_
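The reformatted `get_c_string` still leaks on Python 3: `PyUnicode_AsEncodedString` returns a new bytes object whose internal buffer is handed back while the object itself is never released, which is what the retained "memory leak here" TODO refers to. A hypothetical leak-free variant would make the caller own that bytes object; this is only a sketch, not the function pandas ships:

/* Hypothetical leak-free variant: the caller receives, and must Py_DECREF,
 * the bytes object that owns the returned buffer. Illustrative only. */
#include <Python.h>

static char *get_c_string_owned(PyObject *obj, PyObject **owner) {
    PyObject *enc = PyUnicode_AsEncodedString(obj, "utf-8", "error");
    if (enc == NULL) {
        return NULL;  /* encoding failed; the Python error is already set */
    }
    *owner = enc;                   /* caller releases this when done */
    return PyBytes_AS_STRING(enc);  /* valid as long as *owner is alive */
}

The caller would call `Py_DECREF(owner)` only after it has finished with the returned pointer.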
diff --git a/pandas/src/parse_helper.h b/pandas/src/parse_helper.h
index e565f02f27c88..5d2a0dad3da17 100644
--- a/pandas/src/parse_helper.h
+++ b/pandas/src/parse_helper.h
@@ -1,3 +1,15 @@
+/*
+Copyright (c) 2016, PyData Development Team
+All rights reserved.
+
+Distributed under the terms of the BSD Simplified License.
+
+The full license is in the LICENSE file, distributed with this software.
+*/
+
+#ifndef PANDAS_SRC_PARSE_HELPER_H_
+#define PANDAS_SRC_PARSE_HELPER_H_
+
#include
#include
#include "headers/portable.h"
@@ -5,8 +17,8 @@
static double xstrtod(const char *p, char **q, char decimal, char sci,
int skip_trailing, int *maybe_int);
-int to_double(char *item, double *p_value, char sci, char decimal, int *maybe_int)
-{
+int to_double(char *item, double *p_value, char sci, char decimal,
+ int *maybe_int) {
char *p_end = NULL;
*p_value = xstrtod(item, &p_end, decimal, sci, 1, maybe_int);
@@ -15,14 +27,14 @@ int to_double(char *item, double *p_value, char sci, char decimal, int *maybe_in
}
#if PY_VERSION_HEX < 0x02060000
- #define PyBytes_Check PyString_Check
- #define PyBytes_AS_STRING PyString_AS_STRING
+#define PyBytes_Check PyString_Check
+#define PyBytes_AS_STRING PyString_AS_STRING
#endif
-int floatify(PyObject* str, double *result, int *maybe_int) {
+int floatify(PyObject *str, double *result, int *maybe_int) {
int status;
char *data;
- PyObject* tmp = NULL;
+ PyObject *tmp = NULL;
const char sci = 'E';
const char dec = '.';
@@ -70,17 +82,15 @@ int floatify(PyObject* str, double *result, int *maybe_int) {
Py_XDECREF(tmp);
return -1;
-/*
-#if PY_VERSION_HEX >= 0x03000000
- return PyFloat_FromString(str);
-#else
- return PyFloat_FromString(str, NULL);
-#endif
-*/
-
+ /*
+ #if PY_VERSION_HEX >= 0x03000000
+ return PyFloat_FromString(str);
+ #else
+ return PyFloat_FromString(str, NULL);
+ #endif
+ */
}
-
// ---------------------------------------------------------------------------
// Implementation of xstrtod
@@ -104,10 +114,12 @@ int floatify(PyObject* str, double *result, int *maybe_int) {
// may be used to endorse or promote products derived from this software
// without specific prior written permission.
//
-// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+// AND
// ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
+// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+// LIABLE
// FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
// DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
// OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
@@ -125,149 +137,137 @@ int floatify(PyObject* str, double *result, int *maybe_int) {
//
PANDAS_INLINE void lowercase(char *p) {
- for ( ; *p; ++p) *p = tolower(*p);
+ for (; *p; ++p) *p = tolower(*p);
}
PANDAS_INLINE void uppercase(char *p) {
- for ( ; *p; ++p) *p = toupper(*p);
+ for (; *p; ++p) *p = toupper(*p);
}
+static double xstrtod(const char *str, char **endptr, char decimal, char sci,
+ int skip_trailing, int *maybe_int) {
+ double number;
+ int exponent;
+ int negative;
+ char *p = (char *)str;
+ double p10;
+ int n;
+ int num_digits;
+ int num_decimals;
+
+ errno = 0;
+ *maybe_int = 1;
-static double xstrtod(const char *str, char **endptr, char decimal,
- char sci, int skip_trailing, int *maybe_int)
-{
- double number;
- int exponent;
- int negative;
- char *p = (char *) str;
- double p10;
- int n;
- int num_digits;
- int num_decimals;
-
- errno = 0;
- *maybe_int = 1;
-
- // Skip leading whitespace
- while (isspace(*p)) p++;
-
- // Handle optional sign
- negative = 0;
- switch (*p)
- {
- case '-': negative = 1; // Fall through to increment position
- case '+': p++;
- }
-
- number = 0.;
- exponent = 0;
- num_digits = 0;
- num_decimals = 0;
-
- // Process string of digits
- while (isdigit(*p))
- {
- number = number * 10. + (*p - '0');
- p++;
- num_digits++;
- }
-
- // Process decimal part
- if (*p == decimal)
- {
- *maybe_int = 0;
- p++;
-
- while (isdigit(*p))
- {
- number = number * 10. + (*p - '0');
- p++;
- num_digits++;
- num_decimals++;
+ // Skip leading whitespace
+ while (isspace(*p)) p++;
+
+ // Handle optional sign
+ negative = 0;
+ switch (*p) {
+ case '-':
+ negative = 1; // Fall through to increment position
+ case '+':
+ p++;
}
- exponent -= num_decimals;
- }
+ number = 0.;
+ exponent = 0;
+ num_digits = 0;
+ num_decimals = 0;
- if (num_digits == 0)
- {
- errno = ERANGE;
- return 0.0;
- }
+ // Process string of digits
+ while (isdigit(*p)) {
+ number = number * 10. + (*p - '0');
+ p++;
+ num_digits++;
+ }
- // Correct for sign
- if (negative) number = -number;
+ // Process decimal part
+ if (*p == decimal) {
+ *maybe_int = 0;
+ p++;
- // Process an exponent string
- if (toupper(*p) == toupper(sci))
- {
- *maybe_int = 0;
+ while (isdigit(*p)) {
+ number = number * 10. + (*p - '0');
+ p++;
+ num_digits++;
+ num_decimals++;
+ }
- // Handle optional sign
- negative = 0;
- switch (*++p)
- {
- case '-': negative = 1; // Fall through to increment pos
- case '+': p++;
+ exponent -= num_decimals;
}
- // Process string of digits
- num_digits = 0;
- n = 0;
- while (isdigit(*p))
- {
- n = n * 10 + (*p - '0');
- num_digits++;
- p++;
+ if (num_digits == 0) {
+ errno = ERANGE;
+ return 0.0;
}
- if (negative)
- exponent -= n;
- else
- exponent += n;
-
- // If no digits, after the 'e'/'E', un-consume it
- if (num_digits == 0)
- p--;
- }
-
-
- if (exponent < DBL_MIN_EXP || exponent > DBL_MAX_EXP)
- {
-
- errno = ERANGE;
- return HUGE_VAL;
- }
-
- // Scale the result
- p10 = 10.;
- n = exponent;
- if (n < 0) n = -n;
- while (n)
- {
- if (n & 1)
- {
- if (exponent < 0)
- number /= p10;
- else
- number *= p10;
+ // Correct for sign
+ if (negative) number = -number;
+
+ // Process an exponent string
+ if (toupper(*p) == toupper(sci)) {
+ *maybe_int = 0;
+
+ // Handle optional sign
+ negative = 0;
+ switch (*++p) {
+ case '-':
+ negative = 1; // Fall through to increment pos
+ case '+':
+ p++;
+ }
+
+ // Process string of digits
+ num_digits = 0;
+ n = 0;
+ while (isdigit(*p)) {
+ n = n * 10 + (*p - '0');
+ num_digits++;
+ p++;
+ }
+
+ if (negative)
+ exponent -= n;
+ else
+ exponent += n;
+
+ // If no digits, after the 'e'/'E', un-consume it
+ if (num_digits == 0) p--;
}
- n >>= 1;
- p10 *= p10;
- }
+ if (exponent < DBL_MIN_EXP || exponent > DBL_MAX_EXP) {
+ errno = ERANGE;
+ return HUGE_VAL;
+ }
- if (number == HUGE_VAL) {
- errno = ERANGE;
- }
+ // Scale the result
+ p10 = 10.;
+ n = exponent;
+ if (n < 0) n = -n;
+ while (n) {
+ if (n & 1) {
+ if (exponent < 0)
+ number /= p10;
+ else
+ number *= p10;
+ }
+ n >>= 1;
+ p10 *= p10;
+ }
- if (skip_trailing) {
- // Skip trailing whitespace
- while (isspace(*p)) p++;
- }
+ if (number == HUGE_VAL) {
+ errno = ERANGE;
+ }
- if (endptr) *endptr = p;
+ if (skip_trailing) {
+ // Skip trailing whitespace
+ while (isspace(*p)) p++;
+ }
+ if (endptr) *endptr = p;
- return number;
+ return number;
}
+
+#endif // PANDAS_SRC_PARSE_HELPER_H_
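The reindented `xstrtod` keeps its scaling step unchanged: after the digits are parsed, the result is multiplied (or divided) by 10^|exponent| using exponentiation by squaring, so only O(log |exponent|) floating-point multiplications are needed. A standalone sketch of that step with a small usage example:

/* Scale a parsed mantissa by 10^exponent via exponentiation by squaring,
 * mirroring the loop at the end of xstrtod. */
#include <stdio.h>

static double scale_by_pow10(double number, int exponent) {
    double p10 = 10.0;
    int n = exponent < 0 ? -exponent : exponent;
    while (n) {
        if (n & 1) {
            if (exponent < 0)
                number /= p10;
            else
                number *= p10;
        }
        n >>= 1;
        p10 *= p10;
    }
    return number;
}

int main(void) {
    printf("%g\n", scale_by_pow10(1.2345, 3));   /* 1234.5 */
    printf("%g\n", scale_by_pow10(1234.5, -3));  /* 1.2345 */
    return 0;
}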
diff --git a/pandas/src/parser/io.c b/pandas/src/parser/io.c
index 566de72804968..562d6033ce3eb 100644
--- a/pandas/src/parser/io.c
+++ b/pandas/src/parser/io.c
@@ -1,12 +1,20 @@
-#include "io.h"
+/*
+Copyright (c) 2016, PyData Development Team
+All rights reserved.
+
+Distributed under the terms of the BSD Simplified License.
+
+The full license is in the LICENSE file, distributed with this software.
+*/
- /*
- On-disk FILE, uncompressed
- */
+#include "io.h"
+/*
+ On-disk FILE, uncompressed
+*/
void *new_file_source(char *fname, size_t buffer_size) {
- file_source *fs = (file_source *) malloc(sizeof(file_source));
+ file_source *fs = (file_source *)malloc(sizeof(file_source));
fs->fp = fopen(fname, "rb");
if (fs->fp == NULL) {
@@ -18,7 +26,7 @@ void *new_file_source(char *fname, size_t buffer_size) {
fs->initial_file_pos = ftell(fs->fp);
// Only allocate this heap memory if we are not memory-mapping the file
- fs->buffer = (char*) malloc((buffer_size + 1) * sizeof(char));
+ fs->buffer = (char *)malloc((buffer_size + 1) * sizeof(char));
if (fs->buffer == NULL) {
return NULL;
@@ -27,25 +35,11 @@ void *new_file_source(char *fname, size_t buffer_size) {
memset(fs->buffer, 0, buffer_size + 1);
fs->buffer[buffer_size] = '\0';
- return (void *) fs;
+ return (void *)fs;
}
-
-// XXX handle on systems without the capability
-
-
-/*
- * void *new_file_buffer(FILE *f, int buffer_size)
- *
- * Allocate a new file_buffer.
- * Returns NULL if the memory allocation fails or if the call to mmap fails.
- *
- * buffer_size is ignored.
- */
-
-
-void* new_rd_source(PyObject *obj) {
- rd_source *rds = (rd_source *) malloc(sizeof(rd_source));
+void *new_rd_source(PyObject *obj) {
+ rd_source *rds = (rd_source *)malloc(sizeof(rd_source));
/* hold on to this object */
Py_INCREF(obj);
@@ -53,7 +47,7 @@ void* new_rd_source(PyObject *obj) {
rds->buffer = NULL;
rds->position = 0;
- return (void*) rds;
+ return (void *)rds;
}
/*
@@ -63,9 +57,7 @@ void* new_rd_source(PyObject *obj) {
*/
int del_file_source(void *fs) {
- // fseek(FS(fs)->fp, FS(fs)->initial_file_pos, SEEK_SET);
- if (fs == NULL)
- return 0;
+ if (fs == NULL) return 0;
/* allocated on the heap */
free(FS(fs)->buffer);
@@ -89,13 +81,11 @@ int del_rd_source(void *rds) {
*/
-
-void* buffer_file_bytes(void *source, size_t nbytes,
- size_t *bytes_read, int *status) {
+void *buffer_file_bytes(void *source, size_t nbytes, size_t *bytes_read,
+ int *status) {
file_source *src = FS(source);
- *bytes_read = fread((void*) src->buffer, sizeof(char), nbytes,
- src->fp);
+ *bytes_read = fread((void *)src->buffer, sizeof(char), nbytes, src->fp);
if (*bytes_read == 0) {
*status = REACHED_EOF;
@@ -103,13 +93,11 @@ void* buffer_file_bytes(void *source, size_t nbytes,
*status = 0;
}
- return (void*) src->buffer;
-
+ return (void *)src->buffer;
}
-
-void* buffer_rd_bytes(void *source, size_t nbytes,
- size_t *bytes_read, int *status) {
+void *buffer_rd_bytes(void *source, size_t nbytes, size_t *bytes_read,
+ int *status) {
PyGILState_STATE state;
PyObject *result, *func, *args, *tmp;
@@ -125,21 +113,18 @@ void* buffer_rd_bytes(void *source, size_t nbytes,
args = Py_BuildValue("(i)", nbytes);
func = PyObject_GetAttrString(src->obj, "read");
- /* printf("%s\n", PyBytes_AsString(PyObject_Repr(func))); */
/* TODO: does this release the GIL? */
result = PyObject_CallObject(func, args);
Py_XDECREF(args);
Py_XDECREF(func);
- /* PyObject_Print(PyObject_Type(result), stdout, 0); */
if (result == NULL) {
PyGILState_Release(state);
*bytes_read = 0;
*status = CALLING_READ_FAILED;
return NULL;
- }
- else if (!PyBytes_Check(result)) {
+ } else if (!PyBytes_Check(result)) {
tmp = PyUnicode_AsUTF8String(result);
Py_XDECREF(result);
result = tmp;
@@ -154,8 +139,7 @@ void* buffer_rd_bytes(void *source, size_t nbytes,
/* hang on to the Python object */
src->buffer = result;
- retval = (void*) PyBytes_AsString(result);
-
+ retval = (void *)PyBytes_AsString(result);
PyGILState_Release(state);
@@ -165,21 +149,18 @@ void* buffer_rd_bytes(void *source, size_t nbytes,
return retval;
}
-
#ifdef HAVE_MMAP
-#include
#include
+#include
-void *new_mmap(char *fname)
-{
+void *new_mmap(char *fname) {
struct stat buf;
int fd;
memory_map *mm;
- /* off_t position; */
off_t filesize;
- mm = (memory_map *) malloc(sizeof(memory_map));
+ mm = (memory_map *)malloc(sizeof(memory_map));
mm->fp = fopen(fname, "rb");
fd = fileno(mm->fp);
@@ -187,20 +168,19 @@ void *new_mmap(char *fname)
fprintf(stderr, "new_file_buffer: fstat() failed. errno =%d\n", errno);
return NULL;
}
- filesize = buf.st_size; /* XXX This might be 32 bits. */
-
+ filesize = buf.st_size; /* XXX This might be 32 bits. */
if (mm == NULL) {
/* XXX Eventually remove this print statement. */
fprintf(stderr, "new_file_buffer: malloc() failed.\n");
return NULL;
}
- mm->size = (off_t) filesize;
+ mm->size = (off_t)filesize;
mm->line_number = 0;
mm->fileno = fd;
mm->position = ftell(mm->fp);
- mm->last_pos = (off_t) filesize;
+ mm->last_pos = (off_t)filesize;
mm->memmap = mmap(NULL, filesize, PROT_READ, MAP_SHARED, fd, 0);
if (mm->memmap == NULL) {
@@ -210,30 +190,20 @@ void *new_mmap(char *fname)
mm = NULL;
}
- return (void*) mm;
+ return (void *)mm;
}
-
-int del_mmap(void *src)
-{
+int del_mmap(void *src) {
munmap(MM(src)->memmap, MM(src)->size);
fclose(MM(src)->fp);
-
- /*
- * With a memory mapped file, there is no need to do
- * anything if restore == RESTORE_INITIAL.
- */
- /* if (restore == RESTORE_FINAL) { */
- /* fseek(FB(fb)->file, FB(fb)->current_pos, SEEK_SET); */
- /* } */
free(src);
return 0;
}
-void* buffer_mmap_bytes(void *source, size_t nbytes,
- size_t *bytes_read, int *status) {
+void *buffer_mmap_bytes(void *source, size_t nbytes, size_t *bytes_read,
+ int *status) {
void *retval;
memory_map *src = MM(source);
@@ -264,19 +234,15 @@ void* buffer_mmap_bytes(void *source, size_t nbytes,
/* kludgy */
-void *new_mmap(char *fname) {
- return NULL;
-}
+void *new_mmap(char *fname) { return NULL; }
-int del_mmap(void *src) {
- return 0;
-}
+int del_mmap(void *src) { return 0; }
/* don't use this! */
-void* buffer_mmap_bytes(void *source, size_t nbytes,
- size_t *bytes_read, int *status) {
- return NULL;
+void *buffer_mmap_bytes(void *source, size_t nbytes, size_t *bytes_read,
+ int *status) {
+ return NULL;
}
#endif
diff --git a/pandas/src/parser/io.h b/pandas/src/parser/io.h
index 2ae72ff8a7fe0..5a0c2b2b5e4a4 100644
--- a/pandas/src/parser/io.h
+++ b/pandas/src/parser/io.h
@@ -1,14 +1,23 @@
+/*
+Copyright (c) 2016, PyData Development Team
+All rights reserved.
+
+Distributed under the terms of the BSD Simplified License.
+
+The full license is in the LICENSE file, distributed with this software.
+*/
+
+#ifndef PANDAS_SRC_PARSER_IO_H_
+#define PANDAS_SRC_PARSER_IO_H_
+
#include "Python.h"
#include "tokenizer.h"
-
typedef struct _file_source {
/* The file being read. */
FILE *fp;
char *buffer;
- /* Size of the file, in bytes. */
- /* off_t size; */
/* file position when the file_buffer was created. */
off_t initial_file_pos;
@@ -16,15 +25,9 @@ typedef struct _file_source {
/* Offset in the file of the data currently in the buffer. */
off_t buffer_file_pos;
- /* Actual number of bytes in the current buffer. (Can be less than buffer_size.) */
+ /* Actual number of bytes in the current buffer. (Can be less than
+ * buffer_size.) */
off_t last_pos;
-
- /* Size (in bytes) of the buffer. */
- // off_t buffer_size;
-
- /* Pointer to the buffer. */
- // char *buffer;
-
} file_source;
#define FS(source) ((file_source *)source)
@@ -34,7 +37,6 @@ typedef struct _file_source {
#endif
typedef struct _memory_map {
-
FILE *fp;
/* Size of the file, in bytes. */
@@ -49,22 +51,20 @@ typedef struct _memory_map {
off_t position;
off_t last_pos;
char *memmap;
-
} memory_map;
-#define MM(src) ((memory_map*) src)
+#define MM(src) ((memory_map *)src)
void *new_mmap(char *fname);
int del_mmap(void *src);
-void* buffer_mmap_bytes(void *source, size_t nbytes,
- size_t *bytes_read, int *status);
-
+void *buffer_mmap_bytes(void *source, size_t nbytes, size_t *bytes_read,
+ int *status);
typedef struct _rd_source {
- PyObject* obj;
- PyObject* buffer;
+ PyObject *obj;
+ PyObject *buffer;
size_t position;
} rd_source;
@@ -77,9 +77,10 @@ void *new_rd_source(PyObject *obj);
int del_file_source(void *src);
int del_rd_source(void *src);
-void* buffer_file_bytes(void *source, size_t nbytes,
- size_t *bytes_read, int *status);
+void *buffer_file_bytes(void *source, size_t nbytes, size_t *bytes_read,
+ int *status);
-void* buffer_rd_bytes(void *source, size_t nbytes,
- size_t *bytes_read, int *status);
+void *buffer_rd_bytes(void *source, size_t nbytes, size_t *bytes_read,
+ int *status);
+#endif // PANDAS_SRC_PARSER_IO_H_
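The `tokenizer.c` hunks that follow consistently replace `sprintf` into small heap-allocated message buffers with `snprintf` bounded by an explicit `bufsize`, so a long formatted message can no longer overrun the allocation. A minimal sketch of the pattern (the helper name is illustrative; the message text matches the bad-line error used in the diff):

/* Format an error message into a fixed-size buffer; snprintf guarantees the
 * output is truncated rather than overflowing the 100-byte allocation. */
#include <stdio.h>
#include <stdlib.h>

static char *format_field_error(int expected, int line, int saw) {
    int bufsize = 100;
    char *msg = (char *)malloc(bufsize);
    if (msg == NULL) return NULL;
    snprintf(msg, bufsize, "Expected %d fields in line %d, saw %d\n",
             expected, line, saw);
    return msg;  /* the caller frees the message once it has been reported */
}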
diff --git a/pandas/src/parser/tokenizer.c b/pandas/src/parser/tokenizer.c
index af85b7b894d26..1ea62d66345bd 100644
--- a/pandas/src/parser/tokenizer.c
+++ b/pandas/src/parser/tokenizer.c
@@ -9,61 +9,33 @@ See LICENSE for the license
*/
- /*
- Low-level ascii-file processing for pandas. Combines some elements from
- Python's built-in csv module and Warren Weckesser's textreader project on
- GitHub. See Python Software Foundation License and BSD licenses for these.
+/*
- */
+Low-level ascii-file processing for pandas. Combines some elements from
+Python's built-in csv module and Warren Weckesser's textreader project on
+GitHub. See Python Software Foundation License and BSD licenses for these.
+*/
#include "tokenizer.h"
#include
-#include
#include
-
-
-//#define READ_ERROR_OUT_OF_MEMORY 1
-
-
-/*
-* restore:
-* RESTORE_NOT (0):
-* Free memory, but leave the file position wherever it
-* happend to be.
-* RESTORE_INITIAL (1):
-* Restore the file position to the location at which
-* the file_buffer was created.
-* RESTORE_FINAL (2):
-* Put the file position at the next byte after the
-* data read from the file_buffer.
-*
-#define RESTORE_NOT 0
-#define RESTORE_INITIAL 1
-#define RESTORE_FINAL 2
-*/
+#include
static void *safe_realloc(void *buffer, size_t size) {
void *result;
- // OS X is weird
+ // OSX is weird.
// http://stackoverflow.com/questions/9560609/
// different-realloc-behaviour-in-linux-and-osx
result = realloc(buffer, size);
- TRACE(("safe_realloc: buffer = %p, size = %zu, result = %p\n", buffer, size, result))
+ TRACE(("safe_realloc: buffer = %p, size = %zu, result = %p\n", buffer, size,
+ result))
-/* if (result != NULL) {
- // errno gets set to 12 on my OS Xmachine in some cases even when the
- // realloc succeeds. annoying
- errno = 0;
- } else {
- return buffer;
- }*/
return result;
}
-
void coliter_setup(coliter_t *self, parser_t *parser, int i, int start) {
// column i, starting at 0
self->words = parser->words;
@@ -73,7 +45,7 @@ void coliter_setup(coliter_t *self, parser_t *parser, int i, int start) {
coliter_t *coliter_new(parser_t *self, int i) {
// column i, starting at 0
- coliter_t *iter = (coliter_t*) malloc(sizeof(coliter_t));
+ coliter_t *iter = (coliter_t *)malloc(sizeof(coliter_t));
if (NULL == iter) {
return NULL;
@@ -83,36 +55,28 @@ coliter_t *coliter_new(parser_t *self, int i) {
return iter;
}
-
- /* int64_t str_to_int64(const char *p_item, int64_t int_min, int64_t int_max, int *error); */
- /* uint64_t str_to_uint64(const char *p_item, uint64_t uint_max, int *error); */
-
-
-static void free_if_not_null(void **ptr) {
+static void free_if_not_null(void **ptr) {
TRACE(("free_if_not_null %p\n", *ptr))
if (*ptr != NULL) {
free(*ptr);
*ptr = NULL;
}
- }
-
-
-
- /*
+}
- Parser / tokenizer
+/*
- */
+ Parser / tokenizer
+*/
-static void *grow_buffer(void *buffer, int length, int *capacity,
- int space, int elsize, int *error) {
+static void *grow_buffer(void *buffer, int length, int *capacity, int space,
+ int elsize, int *error) {
int cap = *capacity;
void *newbuffer = buffer;
// Can we fit potentially nbytes tokens (+ null terminators) in the stream?
- while ( (length + space >= cap) && (newbuffer != NULL) ){
- cap = cap? cap << 1 : 2;
+ while ((length + space >= cap) && (newbuffer != NULL)) {
+ cap = cap ? cap << 1 : 2;
buffer = newbuffer;
newbuffer = safe_realloc(newbuffer, elsize * cap);
}
@@ -122,15 +86,14 @@ static void *grow_buffer(void *buffer, int length, int *capacity,
// and return the last good realloc'd buffer so it can be freed
*error = errno;
newbuffer = buffer;
- } else {
+ } else {
// realloc worked, update *capacity and set *error to 0
// sigh, multiple return values
*capacity = cap;
*error = 0;
}
return newbuffer;
- }
-
+}
void parser_set_default_options(parser_t *self) {
self->decimal = '.';
@@ -139,7 +102,7 @@ void parser_set_default_options(parser_t *self) {
// For tokenization
self->state = START_RECORD;
- self->delimiter = ','; // XXX
+ self->delimiter = ','; // XXX
self->delim_whitespace = 0;
self->doublequote = 0;
@@ -161,17 +124,13 @@ void parser_set_default_options(parser_t *self) {
self->thousands = '\0';
self->skipset = NULL;
- self-> skip_first_N_rows = -1;
+ self->skip_first_N_rows = -1;
self->skip_footer = 0;
}
-int get_parser_memory_footprint(parser_t *self) {
- return 0;
-}
+int get_parser_memory_footprint(parser_t *self) { return 0; }
-parser_t* parser_new() {
- return (parser_t*) calloc(1, sizeof(parser_t));
-}
+parser_t *parser_new() { return (parser_t *)calloc(1, sizeof(parser_t)); }
int parser_clear_data_buffers(parser_t *self) {
free_if_not_null((void *)&self->stream);
@@ -183,14 +142,14 @@ int parser_clear_data_buffers(parser_t *self) {
}
int parser_cleanup(parser_t *self) {
- int status = 0;
+ int status = 0;
// XXX where to put this
- free_if_not_null((void *) &self->error_msg);
- free_if_not_null((void *) &self->warn_msg);
+ free_if_not_null((void *)&self->error_msg);
+ free_if_not_null((void *)&self->warn_msg);
if (self->skipset != NULL) {
- kh_destroy_int64((kh_int64_t*) self->skipset);
+ kh_destroy_int64((kh_int64_t *)self->skipset);
self->skipset = NULL;
}
@@ -207,8 +166,6 @@ int parser_cleanup(parser_t *self) {
return status;
}
-
-
int parser_init(parser_t *self) {
int sz;
@@ -225,7 +182,7 @@ int parser_init(parser_t *self) {
self->warn_msg = NULL;
// token stream
- self->stream = (char*) malloc(STREAM_INIT_SIZE * sizeof(char));
+ self->stream = (char *)malloc(STREAM_INIT_SIZE * sizeof(char));
if (self->stream == NULL) {
parser_cleanup(self);
return PARSER_OUT_OF_MEMORY;
@@ -235,16 +192,16 @@ int parser_init(parser_t *self) {
// word pointers and metadata
sz = STREAM_INIT_SIZE / 10;
- sz = sz? sz : 1;
- self->words = (char**) malloc(sz * sizeof(char*));
- self->word_starts = (int*) malloc(sz * sizeof(int));
+ sz = sz ? sz : 1;
+ self->words = (char **)malloc(sz * sizeof(char *));
+ self->word_starts = (int *)malloc(sz * sizeof(int));
self->words_cap = sz;
self->words_len = 0;
// line pointers and metadata
- self->line_start = (int*) malloc(sz * sizeof(int));
+ self->line_start = (int *)malloc(sz * sizeof(int));
- self->line_fields = (int*) malloc(sz * sizeof(int));
+ self->line_fields = (int *)malloc(sz * sizeof(int));
self->lines_cap = sz;
self->lines = 0;
@@ -253,7 +210,6 @@ int parser_init(parser_t *self) {
if (self->stream == NULL || self->words == NULL ||
self->word_starts == NULL || self->line_start == NULL ||
self->line_fields == NULL) {
-
parser_cleanup(self);
return PARSER_OUT_OF_MEMORY;
@@ -279,7 +235,6 @@ int parser_init(parser_t *self) {
return 0;
}
-
void parser_free(parser_t *self) {
// opposite of parser_init
parser_cleanup(self);
@@ -292,20 +247,21 @@ static int make_stream_space(parser_t *self, size_t nbytes) {
// Can we fit potentially nbytes tokens (+ null terminators) in the stream?
- /* TRACE(("maybe growing buffers\n")); */
-
/*
TOKEN STREAM
*/
- orig_ptr = (void *) self->stream;
- TRACE(("\n\nmake_stream_space: nbytes = %zu. grow_buffer(self->stream...)\n", nbytes))
- self->stream = (char*) grow_buffer((void *) self->stream,
- self->stream_len,
- &self->stream_cap, nbytes * 2,
- sizeof(char), &status);
- TRACE(("make_stream_space: self->stream=%p, self->stream_len = %zu, self->stream_cap=%zu, status=%zu\n",
- self->stream, self->stream_len, self->stream_cap, status))
+ orig_ptr = (void *)self->stream;
+ TRACE(
+ ("\n\nmake_stream_space: nbytes = %zu. grow_buffer(self->stream...)\n",
+ nbytes))
+ self->stream = (char *)grow_buffer((void *)self->stream, self->stream_len,
+ &self->stream_cap, nbytes * 2,
+ sizeof(char), &status);
+ TRACE(
+ ("make_stream_space: self->stream=%p, self->stream_len = %zu, "
+ "self->stream_cap=%zu, status=%zu\n",
+ self->stream, self->stream_len, self->stream_cap, status))
if (status != 0) {
return PARSER_OUT_OF_MEMORY;
@@ -313,95 +269,86 @@ static int make_stream_space(parser_t *self, size_t nbytes) {
// realloc sets errno when moving buffer?
if (self->stream != orig_ptr) {
- // uff
- /* TRACE(("Moving word pointers\n")) */
-
self->pword_start = self->stream + self->word_start;
- for (i = 0; i < self->words_len; ++i)
- {
+ for (i = 0; i < self->words_len; ++i) {
self->words[i] = self->stream + self->word_starts[i];
}
}
-
/*
WORD VECTORS
*/
cap = self->words_cap;
- self->words = (char**) grow_buffer((void *) self->words,
- self->words_len,
- &self->words_cap, nbytes,
- sizeof(char*), &status);
- TRACE(("make_stream_space: grow_buffer(self->self->words, %zu, %zu, %zu, %d)\n",
- self->words_len, self->words_cap, nbytes, status))
+ self->words =
+ (char **)grow_buffer((void *)self->words, self->words_len,
+ &self->words_cap, nbytes, sizeof(char *), &status);
+ TRACE(
+ ("make_stream_space: grow_buffer(self->self->words, %zu, %zu, %zu, "
+ "%d)\n",
+ self->words_len, self->words_cap, nbytes, status))
if (status != 0) {
return PARSER_OUT_OF_MEMORY;
}
-
// realloc took place
if (cap != self->words_cap) {
- TRACE(("make_stream_space: cap != self->words_cap, nbytes = %d, self->words_cap=%d\n", nbytes, self->words_cap))
- newptr = safe_realloc((void *) self->word_starts, sizeof(int) * self->words_cap);
+ TRACE(
+ ("make_stream_space: cap != self->words_cap, nbytes = %d, "
+ "self->words_cap=%d\n",
+ nbytes, self->words_cap))
+ newptr = safe_realloc((void *)self->word_starts,
+ sizeof(int) * self->words_cap);
if (newptr == NULL) {
return PARSER_OUT_OF_MEMORY;
} else {
- self->word_starts = (int*) newptr;
+ self->word_starts = (int *)newptr;
}
}
-
/*
LINE VECTORS
*/
- /*
- printf("Line_start: ");
-
- for (j = 0; j < self->lines + 1; ++j) {
- printf("%d ", self->line_fields[j]);
- }
- printf("\n");
-
- printf("lines_cap: %d\n", self->lines_cap);
- */
cap = self->lines_cap;
- self->line_start = (int*) grow_buffer((void *) self->line_start,
- self->lines + 1,
- &self->lines_cap, nbytes,
- sizeof(int), &status);
- TRACE(("make_stream_space: grow_buffer(self->line_start, %zu, %zu, %zu, %d)\n",
- self->lines + 1, self->lines_cap, nbytes, status))
+ self->line_start =
+ (int *)grow_buffer((void *)self->line_start, self->lines + 1,
+ &self->lines_cap, nbytes, sizeof(int), &status);
+ TRACE((
+ "make_stream_space: grow_buffer(self->line_start, %zu, %zu, %zu, %d)\n",
+ self->lines + 1, self->lines_cap, nbytes, status))
if (status != 0) {
return PARSER_OUT_OF_MEMORY;
}
// realloc took place
if (cap != self->lines_cap) {
- TRACE(("make_stream_space: cap != self->lines_cap, nbytes = %d\n", nbytes))
- newptr = safe_realloc((void *) self->line_fields, sizeof(int) * self->lines_cap);
+ TRACE(("make_stream_space: cap != self->lines_cap, nbytes = %d\n",
+ nbytes))
+ newptr = safe_realloc((void *)self->line_fields,
+ sizeof(int) * self->lines_cap);
if (newptr == NULL) {
return PARSER_OUT_OF_MEMORY;
} else {
- self->line_fields = (int*) newptr;
+ self->line_fields = (int *)newptr;
}
}
- /* TRACE(("finished growing buffers\n")); */
-
return 0;
}
-
static int push_char(parser_t *self, char c) {
- /* TRACE(("pushing %c \n", c)) */
- TRACE(("push_char: self->stream[%zu] = %x, stream_cap=%zu\n", self->stream_len+1, c, self->stream_cap))
+ TRACE(("push_char: self->stream[%zu] = %x, stream_cap=%zu\n",
+ self->stream_len + 1, c, self->stream_cap))
if (self->stream_len >= self->stream_cap) {
- TRACE(("push_char: ERROR!!! self->stream_len(%d) >= self->stream_cap(%d)\n",
- self->stream_len, self->stream_cap))
- self->error_msg = (char*) malloc(64);
- sprintf(self->error_msg, "Buffer overflow caught - possible malformed input file.\n");
+ TRACE(
+ ("push_char: ERROR!!! self->stream_len(%d) >= "
+ "self->stream_cap(%d)\n",
+ self->stream_len, self->stream_cap))
+ int bufsize = 100;
+ self->error_msg = (char *)malloc(bufsize);
+ snprintf(self->error_msg, bufsize,
+ "Buffer overflow caught - possible malformed input file.\n");
return PARSER_OUT_OF_MEMORY;
}
self->stream[self->stream_len++] = c;
@@ -410,11 +357,15 @@ static int push_char(parser_t *self, char c) {
int P_INLINE end_field(parser_t *self) {
// XXX cruft
-// self->numeric_field = 0;
if (self->words_len >= self->words_cap) {
- TRACE(("end_field: ERROR!!! self->words_len(%zu) >= self->words_cap(%zu)\n", self->words_len, self->words_cap))
- self->error_msg = (char*) malloc(64);
- sprintf(self->error_msg, "Buffer overflow caught - possible malformed input file.\n");
+ TRACE(
+ ("end_field: ERROR!!! self->words_len(%zu) >= "
+ "self->words_cap(%zu)\n",
+ self->words_len, self->words_cap))
+ int bufsize = 100;
+ self->error_msg = (char *)malloc(bufsize);
+ snprintf(self->error_msg, bufsize,
+ "Buffer overflow caught - possible malformed input file.\n");
return PARSER_OUT_OF_MEMORY;
}
@@ -426,8 +377,8 @@ int P_INLINE end_field(parser_t *self) {
TRACE(("end_field: Char diff: %d\n", self->pword_start - self->words[0]));
- TRACE(("end_field: Saw word %s at: %d. Total: %d\n",
- self->pword_start, self->word_start, self->words_len + 1))
+ TRACE(("end_field: Saw word %s at: %d. Total: %d\n", self->pword_start,
+ self->word_start, self->words_len + 1))
self->word_starts[self->words_len] = self->word_start;
self->words_len++;
@@ -442,29 +393,29 @@ int P_INLINE end_field(parser_t *self) {
return 0;
}
-
static void append_warning(parser_t *self, const char *msg) {
int ex_length;
int length = strlen(msg);
void *newptr;
if (self->warn_msg == NULL) {
- self->warn_msg = (char*) malloc(length + 1);
- strcpy(self->warn_msg, msg);
+ self->warn_msg = (char *)malloc(length + 1);
+ strncpy(self->warn_msg, msg, strlen(msg) + 1);
} else {
ex_length = strlen(self->warn_msg);
newptr = safe_realloc(self->warn_msg, ex_length + length + 1);
if (newptr != NULL) {
- self->warn_msg = (char*) newptr;
- strcpy(self->warn_msg + ex_length, msg);
+ self->warn_msg = (char *)newptr;
+ strncpy(self->warn_msg + ex_length, msg, strlen(msg) + 1);
}
}
}
static int end_line(parser_t *self) {
+ char *msg;
int fields;
int ex_fields = self->expected_fields;
- char *msg;
+ int bufsize = 100; // for error or warning messages
fields = self->line_fields[self->lines];
@@ -478,10 +429,10 @@ static int end_line(parser_t *self) {
}
}
- if (self->state == SKIP_LINE || \
- self->state == QUOTE_IN_SKIP_LINE || \
- self->state == QUOTE_IN_QUOTE_IN_SKIP_LINE
- ) {
+ if (self->state == START_FIELD_IN_SKIP_LINE ||
+ self->state == IN_FIELD_IN_SKIP_LINE ||
+ self->state == IN_QUOTED_FIELD_IN_SKIP_LINE ||
+ self->state == QUOTE_IN_QUOTED_FIELD_IN_SKIP_LINE) {
TRACE(("end_line: Skipping row %d\n", self->file_lines));
// increment file line count
self->file_lines++;
@@ -494,9 +445,8 @@ static int end_line(parser_t *self) {
return 0;
}
- if (!(self->lines <= self->header_end + 1)
- && (self->expected_fields < 0 && fields > ex_fields)
- && !(self->usecols)) {
+ if (!(self->lines <= self->header_end + 1) &&
+ (self->expected_fields < 0 && fields > ex_fields) && !(self->usecols)) {
// increment file line count
self->file_lines++;
@@ -508,8 +458,9 @@ static int end_line(parser_t *self) {
// file_lines is now the actual file line number (starting at 1)
if (self->error_bad_lines) {
- self->error_msg = (char*) malloc(100);
- sprintf(self->error_msg, "Expected %d fields in line %d, saw %d\n",
+ self->error_msg = (char *)malloc(bufsize);
+ snprintf(self->error_msg, bufsize,
+ "Expected %d fields in line %d, saw %d\n",
ex_fields, self->file_lines, fields);
TRACE(("Error at line %d, %d fields\n", self->file_lines, fields));
@@ -519,9 +470,10 @@ static int end_line(parser_t *self) {
// simply skip bad lines
if (self->warn_bad_lines) {
// pass up error message
- msg = (char*) malloc(100);
- sprintf(msg, "Skipping line %d: expected %d fields, saw %d\n",
- self->file_lines, ex_fields, fields);
+ msg = (char *)malloc(bufsize);
+ snprintf(msg, bufsize,
+ "Skipping line %d: expected %d fields, saw %d\n",
+ self->file_lines, ex_fields, fields);
append_warning(self, msg);
free(msg);
}
@@ -529,14 +481,13 @@ static int end_line(parser_t *self) {
} else {
// missing trailing delimiters
if ((self->lines >= self->header_end + 1) && fields < ex_fields) {
-
// might overrun the buffer when closing fields
if (make_stream_space(self, ex_fields - fields) < 0) {
self->error_msg = "out of memory";
return -1;
}
- while (fields < ex_fields){
+ while (fields < ex_fields) {
end_field(self);
fields++;
}
@@ -548,15 +499,21 @@ static int end_line(parser_t *self) {
// good line, set new start point
if (self->lines >= self->lines_cap) {
- TRACE(("end_line: ERROR!!! self->lines(%zu) >= self->lines_cap(%zu)\n", self->lines, self->lines_cap)) \
- self->error_msg = (char*) malloc(100); \
- sprintf(self->error_msg, "Buffer overflow caught - possible malformed input file.\n"); \
- return PARSER_OUT_OF_MEMORY; \
+ TRACE((
+ "end_line: ERROR!!! self->lines(%zu) >= self->lines_cap(%zu)\n",
+ self->lines, self->lines_cap))
+ int bufsize = 100;
+ self->error_msg = (char *)malloc(bufsize);
+ snprintf(self->error_msg, bufsize,
+ "Buffer overflow caught - "
+ "possible malformed input file.\n");
+ return PARSER_OUT_OF_MEMORY;
}
- self->line_start[self->lines] = (self->line_start[self->lines - 1] +
- fields);
+ self->line_start[self->lines] =
+ (self->line_start[self->lines - 1] + fields);
- TRACE(("end_line: new line start: %d\n", self->line_start[self->lines]));
+ TRACE(
+ ("end_line: new line start: %d\n", self->line_start[self->lines]));
// new line start with 0 fields
self->line_fields[self->lines] = 0;
@@ -573,10 +530,10 @@ int parser_add_skiprow(parser_t *self, int64_t row) {
int ret = 0;
if (self->skipset == NULL) {
- self->skipset = (void*) kh_init_int64();
+ self->skipset = (void *)kh_init_int64();
}
- set = (kh_int64_t*) self->skipset;
+ set = (kh_int64_t *)self->skipset;
k = kh_put_int64(set, row, &ret);
set->keys[k] = row;
@@ -600,18 +557,21 @@ static int parser_buffer_bytes(parser_t *self, size_t nbytes) {
status = 0;
self->datapos = 0;
self->data = self->cb_io(self->source, nbytes, &bytes_read, &status);
- TRACE(("parser_buffer_bytes self->cb_io: nbytes=%zu, datalen: %d, status=%d\n",
- nbytes, bytes_read, status));
+ TRACE((
+ "parser_buffer_bytes self->cb_io: nbytes=%zu, datalen: %d, status=%d\n",
+ nbytes, bytes_read, status));
self->datalen = bytes_read;
if (status != REACHED_EOF && self->data == NULL) {
- self->error_msg = (char*) malloc(200);
+ int bufsize = 200;
+ self->error_msg = (char *)malloc(bufsize);
if (status == CALLING_READ_FAILED) {
- sprintf(self->error_msg, ("Calling read(nbytes) on source failed. "
- "Try engine='python'."));
+ snprintf(self->error_msg, bufsize,
+ "Calling read(nbytes) on source failed. "
+ "Try engine='python'.");
} else {
- sprintf(self->error_msg, "Unknown error in IO callback");
+ snprintf(self->error_msg, bufsize, "Unknown error in IO callback");
}
return -1;
}
@@ -621,93 +581,96 @@ static int parser_buffer_bytes(parser_t *self, size_t nbytes) {
return status;
}
-
/*
Tokenization macros and state machine code
*/
-// printf("pushing %c\n", c);
-
-#define PUSH_CHAR(c) \
- TRACE(("PUSH_CHAR: Pushing %c, slen= %d, stream_cap=%zu, stream_len=%zu\n", c, slen, self->stream_cap, self->stream_len)) \
- if (slen >= maxstreamsize) { \
- TRACE(("PUSH_CHAR: ERROR!!! slen(%d) >= maxstreamsize(%d)\n", slen, maxstreamsize)) \
- self->error_msg = (char*) malloc(100); \
- sprintf(self->error_msg, "Buffer overflow caught - possible malformed input file.\n"); \
- return PARSER_OUT_OF_MEMORY; \
- } \
- *stream++ = c; \
+#define PUSH_CHAR(c) \
+ TRACE( \
+ ("PUSH_CHAR: Pushing %c, slen= %d, stream_cap=%zu, stream_len=%zu\n", \
+ c, slen, self->stream_cap, self->stream_len)) \
+ if (slen >= maxstreamsize) { \
+ TRACE(("PUSH_CHAR: ERROR!!! slen(%d) >= maxstreamsize(%d)\n", slen, \
+ maxstreamsize)) \
+ int bufsize = 100; \
+ self->error_msg = (char *)malloc(bufsize); \
+ snprintf(self->error_msg, bufsize, \
+ "Buffer overflow caught - possible malformed input file.\n");\
+ return PARSER_OUT_OF_MEMORY; \
+ } \
+ *stream++ = c; \
slen++;
// This is a little bit of a hack but works for now
-#define END_FIELD() \
- self->stream_len = slen; \
- if (end_field(self) < 0) { \
- goto parsingerror; \
- } \
- stream = self->stream + self->stream_len; \
+#define END_FIELD() \
+ self->stream_len = slen; \
+ if (end_field(self) < 0) { \
+ goto parsingerror; \
+ } \
+ stream = self->stream + self->stream_len; \
slen = self->stream_len;
-#define END_LINE_STATE(STATE) \
- self->stream_len = slen; \
- if (end_line(self) < 0) { \
- goto parsingerror; \
- } \
- stream = self->stream + self->stream_len; \
- slen = self->stream_len; \
- self->state = STATE; \
- if (line_limit > 0 && self->lines == start_lines + line_limit) { \
- goto linelimit; \
- \
- }
-
-#define END_LINE_AND_FIELD_STATE(STATE) \
- self->stream_len = slen; \
- if (end_line(self) < 0) { \
- goto parsingerror; \
- } \
- if (end_field(self) < 0) { \
- goto parsingerror; \
- } \
- stream = self->stream + self->stream_len; \
- slen = self->stream_len; \
- self->state = STATE; \
- if (line_limit > 0 && self->lines == start_lines + line_limit) { \
- goto linelimit; \
- \
+#define END_LINE_STATE(STATE) \
+ self->stream_len = slen; \
+ if (end_line(self) < 0) { \
+ goto parsingerror; \
+ } \
+ stream = self->stream + self->stream_len; \
+ slen = self->stream_len; \
+ self->state = STATE; \
+ if (line_limit > 0 && self->lines == start_lines + line_limit) { \
+ goto linelimit; \
+ }
+
+#define END_LINE_AND_FIELD_STATE(STATE) \
+ self->stream_len = slen; \
+ if (end_line(self) < 0) { \
+ goto parsingerror; \
+ } \
+ if (end_field(self) < 0) { \
+ goto parsingerror; \
+ } \
+ stream = self->stream + self->stream_len; \
+ slen = self->stream_len; \
+ self->state = STATE; \
+ if (line_limit > 0 && self->lines == start_lines + line_limit) { \
+ goto linelimit; \
}
#define END_LINE() END_LINE_STATE(START_RECORD)
#define IS_WHITESPACE(c) ((c == ' ' || c == '\t'))
-#define IS_TERMINATOR(c) ((self->lineterminator == '\0' && c == '\n') || \
- (self->lineterminator != '\0' && \
- c == self->lineterminator))
+#define IS_TERMINATOR(c) \
+ ((self->lineterminator == '\0' && c == '\n') || \
+ (self->lineterminator != '\0' && c == self->lineterminator))
#define IS_QUOTE(c) ((c == self->quotechar && self->quoting != QUOTE_NONE))
// don't parse '\r' with a custom line terminator
#define IS_CARRIAGE(c) ((self->lineterminator == '\0' && c == '\r'))
-#define IS_COMMENT_CHAR(c) ((self->commentchar != '\0' && c == self->commentchar))
+#define IS_COMMENT_CHAR(c) \
+ ((self->commentchar != '\0' && c == self->commentchar))
#define IS_ESCAPE_CHAR(c) ((self->escapechar != '\0' && c == self->escapechar))
-#define IS_SKIPPABLE_SPACE(c) ((!self->delim_whitespace && c == ' ' && \
- self->skipinitialspace))
+#define IS_SKIPPABLE_SPACE(c) \
+ ((!self->delim_whitespace && c == ' ' && self->skipinitialspace))
// applied when in a field
-#define IS_DELIMITER(c) ((!self->delim_whitespace && c == self->delimiter) || \
- (self->delim_whitespace && IS_WHITESPACE(c)))
+#define IS_DELIMITER(c) \
+ ((!self->delim_whitespace && c == self->delimiter) || \
+ (self->delim_whitespace && IS_WHITESPACE(c)))
#define _TOKEN_CLEANUP() \
self->stream_len = slen; \
self->datapos = i; \
- TRACE(("_TOKEN_CLEANUP: datapos: %d, datalen: %d\n", self->datapos, self->datalen));
+ TRACE(("_TOKEN_CLEANUP: datapos: %d, datalen: %d\n", self->datapos, \
+ self->datalen));
#define CHECK_FOR_BOM() \
if (*buf == '\xef' && *(buf + 1) == '\xbb' && *(buf + 2) == '\xbf') { \
@@ -717,24 +680,20 @@ static int parser_buffer_bytes(parser_t *self, size_t nbytes) {
int skip_this_line(parser_t *self, int64_t rownum) {
if (self->skipset != NULL) {
- return ( kh_get_int64((kh_int64_t*) self->skipset, self->file_lines) !=
- ((kh_int64_t*)self->skipset)->n_buckets );
- }
- else {
- return ( rownum <= self->skip_first_N_rows );
+ return (kh_get_int64((kh_int64_t *)self->skipset, self->file_lines) !=
+ ((kh_int64_t *)self->skipset)->n_buckets);
+ } else {
+ return (rownum <= self->skip_first_N_rows);
}
}
-int tokenize_bytes(parser_t *self, size_t line_limit)
-{
- int i, slen, start_lines;
+int tokenize_bytes(parser_t *self, size_t line_limit, int start_lines) {
+ int i, slen;
long maxstreamsize;
char c;
char *stream;
char *buf = self->data + self->datapos;
- start_lines = self->lines;
-
if (make_stream_space(self, self->datalen - self->datapos) < 0) {
self->error_msg = "out of memory";
return -1;
@@ -750,352 +709,364 @@ int tokenize_bytes(parser_t *self, size_t line_limit)
CHECK_FOR_BOM();
}
- for (i = self->datapos; i < self->datalen; ++i)
- {
+ for (i = self->datapos; i < self->datalen; ++i) {
// next character in file
c = *buf++;
- TRACE(("tokenize_bytes - Iter: %d Char: 0x%x Line %d field_count %d, state %d\n",
- i, c, self->file_lines + 1, self->line_fields[self->lines],
- self->state));
-
- switch(self->state) {
-
- case SKIP_LINE:
- TRACE(("tokenize_bytes SKIP_LINE 0x%x, state %d\n", c, self->state));
- if (IS_TERMINATOR(c)) {
- END_LINE();
- } else if (IS_CARRIAGE(c)) {
- self->file_lines++;
- self->state = EAT_CRNL_NOP;
- } else if (IS_QUOTE(c)) {
- self->state = QUOTE_IN_SKIP_LINE;
- }
- break;
+ TRACE(
+ ("tokenize_bytes - Iter: %d Char: 0x%x Line %d field_count %d, "
+ "state %d\n",
+ i, c, self->file_lines + 1, self->line_fields[self->lines],
+ self->state));
- case QUOTE_IN_SKIP_LINE:
- if (IS_QUOTE(c)) {
- if (self->doublequote) {
- self->state = QUOTE_IN_QUOTE_IN_SKIP_LINE;
+ switch (self->state) {
+ case START_FIELD_IN_SKIP_LINE:
+ if (IS_TERMINATOR(c)) {
+ END_LINE();
+ } else if (IS_CARRIAGE(c)) {
+ self->file_lines++;
+ self->state = EAT_CRNL_NOP;
+ } else if (IS_QUOTE(c)) {
+ self->state = IN_QUOTED_FIELD_IN_SKIP_LINE;
+ } else if (IS_DELIMITER(c)) {
+ // Do nothing, we're starting a new field again.
} else {
- self->state = SKIP_LINE;
+ self->state = IN_FIELD_IN_SKIP_LINE;
}
- }
- break;
-
- case QUOTE_IN_QUOTE_IN_SKIP_LINE:
- if (IS_QUOTE(c)) {
- self->state = QUOTE_IN_SKIP_LINE;
- } else if (IS_TERMINATOR(c)) {
- END_LINE();
- } else if (IS_CARRIAGE(c)) {
- self->file_lines++;
- self->state = EAT_CRNL_NOP;
- } else {
- self->state = SKIP_LINE;
- }
- break;
-
- case WHITESPACE_LINE:
- if (IS_TERMINATOR(c)) {
- self->file_lines++;
- self->state = START_RECORD;
break;
- } else if (IS_CARRIAGE(c)) {
- self->file_lines++;
- self->state = EAT_CRNL_NOP;
- break;
- } else if (!self->delim_whitespace) {
- if (IS_WHITESPACE(c) && c != self->delimiter) {
- ;
- } else { // backtrack
- // use i + 1 because buf has been incremented but not i
- do {
- --buf;
- --i;
- } while (i + 1 > self->datapos && !IS_TERMINATOR(*buf));
- // reached a newline rather than the beginning
- if (IS_TERMINATOR(*buf)) {
- ++buf; // move pointer to first char after newline
- ++i;
- }
- self->state = START_FIELD;
+ case IN_FIELD_IN_SKIP_LINE:
+ if (IS_TERMINATOR(c)) {
+ END_LINE();
+ } else if (IS_CARRIAGE(c)) {
+ self->file_lines++;
+ self->state = EAT_CRNL_NOP;
+ } else if (IS_DELIMITER(c)) {
+ self->state = START_FIELD_IN_SKIP_LINE;
}
break;
- }
- // fall through
-
- case EAT_WHITESPACE:
- if (IS_TERMINATOR(c)) {
- END_LINE();
- self->state = START_RECORD;
- break;
- } else if (IS_CARRIAGE(c)) {
- self->state = EAT_CRNL;
- break;
- } else if (!IS_WHITESPACE(c)) {
- self->state = START_FIELD;
- // fall through to subsequent state
- } else {
- // if whitespace char, keep slurping
- break;
- }
- case START_RECORD:
- // start of record
- if (skip_this_line(self, self->file_lines)) {
+ case IN_QUOTED_FIELD_IN_SKIP_LINE:
if (IS_QUOTE(c)) {
- self->state = QUOTE_IN_SKIP_LINE;
- } else {
- self->state = SKIP_LINE;
-
- if (IS_TERMINATOR(c)) {
- END_LINE();
+ if (self->doublequote) {
+ self->state = QUOTE_IN_QUOTED_FIELD_IN_SKIP_LINE;
+ } else {
+ self->state = IN_FIELD_IN_SKIP_LINE;
}
}
break;
- } else if (IS_TERMINATOR(c)) {
- // \n\r possible?
- if (self->skip_empty_lines) {
+
+ case QUOTE_IN_QUOTED_FIELD_IN_SKIP_LINE:
+ if (IS_QUOTE(c)) {
+ self->state = IN_QUOTED_FIELD_IN_SKIP_LINE;
+ } else if (IS_TERMINATOR(c)) {
+ END_LINE();
+ } else if (IS_CARRIAGE(c)) {
self->file_lines++;
+ self->state = EAT_CRNL_NOP;
+ } else if (IS_DELIMITER(c)) {
+ self->state = START_FIELD_IN_SKIP_LINE;
} else {
- END_LINE();
+ self->state = IN_FIELD_IN_SKIP_LINE;
}
break;
- } else if (IS_CARRIAGE(c)) {
- if (self->skip_empty_lines) {
+
+ case WHITESPACE_LINE:
+ if (IS_TERMINATOR(c)) {
+ self->file_lines++;
+ self->state = START_RECORD;
+ break;
+ } else if (IS_CARRIAGE(c)) {
self->file_lines++;
self->state = EAT_CRNL_NOP;
- } else {
+ break;
+ } else if (!self->delim_whitespace) {
+ if (IS_WHITESPACE(c) && c != self->delimiter) {
+ } else { // backtrack
+ // use i + 1 because buf has been incremented but not i
+ do {
+ --buf;
+ --i;
+ } while (i + 1 > self->datapos && !IS_TERMINATOR(*buf));
+
+ // reached a newline rather than the beginning
+ if (IS_TERMINATOR(*buf)) {
+ ++buf; // move pointer to first char after newline
+ ++i;
+ }
+ self->state = START_FIELD;
+ }
+ break;
+ }
+ // fall through
+
+ case EAT_WHITESPACE:
+ if (IS_TERMINATOR(c)) {
+ END_LINE();
+ self->state = START_RECORD;
+ break;
+ } else if (IS_CARRIAGE(c)) {
self->state = EAT_CRNL;
+ break;
+ } else if (!IS_WHITESPACE(c)) {
+ self->state = START_FIELD;
+ // fall through to subsequent state
+ } else {
+ // if whitespace char, keep slurping
+ break;
}
- break;
- } else if (IS_COMMENT_CHAR(c)) {
- self->state = EAT_LINE_COMMENT;
- break;
- } else if (IS_WHITESPACE(c)) {
- if (self->delim_whitespace) {
+
+ case START_RECORD:
+ // start of record
+ if (skip_this_line(self, self->file_lines)) {
+ if (IS_QUOTE(c)) {
+ self->state = IN_QUOTED_FIELD_IN_SKIP_LINE;
+ } else {
+ self->state = IN_FIELD_IN_SKIP_LINE;
+
+ if (IS_TERMINATOR(c)) {
+ END_LINE();
+ }
+ }
+ break;
+ } else if (IS_TERMINATOR(c)) {
+ // \n\r possible?
if (self->skip_empty_lines) {
- self->state = WHITESPACE_LINE;
+ self->file_lines++;
} else {
- self->state = EAT_WHITESPACE;
+ END_LINE();
}
break;
- } else if (c != self->delimiter && self->skip_empty_lines) {
- self->state = WHITESPACE_LINE;
+ } else if (IS_CARRIAGE(c)) {
+ if (self->skip_empty_lines) {
+ self->file_lines++;
+ self->state = EAT_CRNL_NOP;
+ } else {
+ self->state = EAT_CRNL;
+ }
break;
+ } else if (IS_COMMENT_CHAR(c)) {
+ self->state = EAT_LINE_COMMENT;
+ break;
+ } else if (IS_WHITESPACE(c)) {
+ if (self->delim_whitespace) {
+ if (self->skip_empty_lines) {
+ self->state = WHITESPACE_LINE;
+ } else {
+ self->state = EAT_WHITESPACE;
+ }
+ break;
+ } else if (c != self->delimiter && self->skip_empty_lines) {
+ self->state = WHITESPACE_LINE;
+ break;
+ }
+ // fall through
}
- // fall through
- }
- // normal character - fall through
- // to handle as START_FIELD
- self->state = START_FIELD;
+ // normal character - fall through
+ // to handle as START_FIELD
+ self->state = START_FIELD;
- case START_FIELD:
- // expecting field
- if (IS_TERMINATOR(c)) {
- END_FIELD();
- END_LINE();
- } else if (IS_CARRIAGE(c)) {
- END_FIELD();
- self->state = EAT_CRNL;
- } else if (IS_QUOTE(c)) {
- // start quoted field
- self->state = IN_QUOTED_FIELD;
- } else if (IS_ESCAPE_CHAR(c)) {
- // possible escaped character
- self->state = ESCAPED_CHAR;
- } else if (IS_SKIPPABLE_SPACE(c)) {
- // ignore space at start of field
- ;
- } else if (IS_DELIMITER(c)) {
- if (self->delim_whitespace) {
- self->state = EAT_WHITESPACE;
- } else {
- // save empty field
+ case START_FIELD:
+ // expecting field
+ if (IS_TERMINATOR(c)) {
+ END_FIELD();
+ END_LINE();
+ } else if (IS_CARRIAGE(c)) {
END_FIELD();
+ self->state = EAT_CRNL;
+ } else if (IS_QUOTE(c)) {
+ // start quoted field
+ self->state = IN_QUOTED_FIELD;
+ } else if (IS_ESCAPE_CHAR(c)) {
+ // possible escaped character
+ self->state = ESCAPED_CHAR;
+ } else if (IS_SKIPPABLE_SPACE(c)) {
+ // ignore space at start of field
+ } else if (IS_DELIMITER(c)) {
+ if (self->delim_whitespace) {
+ self->state = EAT_WHITESPACE;
+ } else {
+ // save empty field
+ END_FIELD();
+ }
+ } else if (IS_COMMENT_CHAR(c)) {
+ END_FIELD();
+ self->state = EAT_COMMENT;
+ } else {
+ // begin new unquoted field
+ PUSH_CHAR(c);
+ self->state = IN_FIELD;
}
- } else if (IS_COMMENT_CHAR(c)) {
- END_FIELD();
- self->state = EAT_COMMENT;
- } else {
- // begin new unquoted field
- // if (self->delim_whitespace && \
- // self->quoting == QUOTE_NONNUMERIC) {
- // self->numeric_field = 1;
- // }
+ break;
+ case ESCAPED_CHAR:
PUSH_CHAR(c);
self->state = IN_FIELD;
- }
- break;
+ break;
- case ESCAPED_CHAR:
- PUSH_CHAR(c);
- self->state = IN_FIELD;
- break;
+ case EAT_LINE_COMMENT:
+ if (IS_TERMINATOR(c)) {
+ self->file_lines++;
+ self->state = START_RECORD;
+ } else if (IS_CARRIAGE(c)) {
+ self->file_lines++;
+ self->state = EAT_CRNL_NOP;
+ }
+ break;
- case EAT_LINE_COMMENT:
- if (IS_TERMINATOR(c)) {
- self->file_lines++;
- self->state = START_RECORD;
- } else if (IS_CARRIAGE(c)) {
- self->file_lines++;
- self->state = EAT_CRNL_NOP;
- }
- break;
+ case IN_FIELD:
+ // in unquoted field
+ if (IS_TERMINATOR(c)) {
+ END_FIELD();
+ END_LINE();
+ } else if (IS_CARRIAGE(c)) {
+ END_FIELD();
+ self->state = EAT_CRNL;
+ } else if (IS_ESCAPE_CHAR(c)) {
+ // possible escaped character
+ self->state = ESCAPED_CHAR;
+ } else if (IS_DELIMITER(c)) {
+ // end of field - end of line not reached yet
+ END_FIELD();
- case IN_FIELD:
- // in unquoted field
- if (IS_TERMINATOR(c)) {
- END_FIELD();
- END_LINE();
- } else if (IS_CARRIAGE(c)) {
- END_FIELD();
- self->state = EAT_CRNL;
- } else if (IS_ESCAPE_CHAR(c)) {
- // possible escaped character
- self->state = ESCAPED_CHAR;
- } else if (IS_DELIMITER(c)) {
- // end of field - end of line not reached yet
- END_FIELD();
-
- if (self->delim_whitespace) {
- self->state = EAT_WHITESPACE;
+ if (self->delim_whitespace) {
+ self->state = EAT_WHITESPACE;
+ } else {
+ self->state = START_FIELD;
+ }
+ } else if (IS_COMMENT_CHAR(c)) {
+ END_FIELD();
+ self->state = EAT_COMMENT;
} else {
- self->state = START_FIELD;
+ // normal character - save in field
+ PUSH_CHAR(c);
}
- } else if (IS_COMMENT_CHAR(c)) {
- END_FIELD();
- self->state = EAT_COMMENT;
- } else {
- // normal character - save in field
- PUSH_CHAR(c);
- }
- break;
+ break;
- case IN_QUOTED_FIELD:
- // in quoted field
- if (IS_ESCAPE_CHAR(c)) {
- // possible escape character
- self->state = ESCAPE_IN_QUOTED_FIELD;
- } else if (IS_QUOTE(c)) {
- if (self->doublequote) {
- // double quote - " represented by ""
- self->state = QUOTE_IN_QUOTED_FIELD;
+ case IN_QUOTED_FIELD:
+ // in quoted field
+ if (IS_ESCAPE_CHAR(c)) {
+ // possible escape character
+ self->state = ESCAPE_IN_QUOTED_FIELD;
+ } else if (IS_QUOTE(c)) {
+ if (self->doublequote) {
+ // double quote - " represented by ""
+ self->state = QUOTE_IN_QUOTED_FIELD;
+ } else {
+ // end of quote part of field
+ self->state = IN_FIELD;
+ }
} else {
- // end of quote part of field
- self->state = IN_FIELD;
+ // normal character - save in field
+ PUSH_CHAR(c);
}
- } else {
- // normal character - save in field
- PUSH_CHAR(c);
- }
- break;
-
- case ESCAPE_IN_QUOTED_FIELD:
- PUSH_CHAR(c);
- self->state = IN_QUOTED_FIELD;
- break;
-
- case QUOTE_IN_QUOTED_FIELD:
- // double quote - seen a quote in an quoted field
- if (IS_QUOTE(c)) {
- // save "" as "
+ break;
+ case ESCAPE_IN_QUOTED_FIELD:
PUSH_CHAR(c);
self->state = IN_QUOTED_FIELD;
- } else if (IS_DELIMITER(c)) {
- // end of field - end of line not reached yet
- END_FIELD();
-
- if (self->delim_whitespace) {
- self->state = EAT_WHITESPACE;
- } else {
- self->state = START_FIELD;
- }
- } else if (IS_TERMINATOR(c)) {
- END_FIELD();
- END_LINE();
- } else if (IS_CARRIAGE(c)) {
- END_FIELD();
- self->state = EAT_CRNL;
- } else if (!self->strict) {
- PUSH_CHAR(c);
- self->state = IN_FIELD;
- } else {
- self->error_msg = (char*) malloc(50);
- sprintf(self->error_msg,
- "delimiter expected after "
- "quote in quote");
- goto parsingerror;
- }
- break;
+ break;
- case EAT_COMMENT:
- if (IS_TERMINATOR(c)) {
- END_LINE();
- } else if (IS_CARRIAGE(c)) {
- self->state = EAT_CRNL;
- }
- break;
+ case QUOTE_IN_QUOTED_FIELD:
+        // double quote - seen a quote in a quoted field
+ if (IS_QUOTE(c)) {
+ // save "" as "
- // only occurs with non-custom line terminator,
- // which is why we directly check for '\n'
- case EAT_CRNL:
- if (c == '\n') {
- END_LINE();
- } else if (IS_DELIMITER(c)){
+ PUSH_CHAR(c);
+ self->state = IN_QUOTED_FIELD;
+ } else if (IS_DELIMITER(c)) {
+ // end of field - end of line not reached yet
+ END_FIELD();
- if (self->delim_whitespace) {
- END_LINE_STATE(EAT_WHITESPACE);
+ if (self->delim_whitespace) {
+ self->state = EAT_WHITESPACE;
+ } else {
+ self->state = START_FIELD;
+ }
+ } else if (IS_TERMINATOR(c)) {
+ END_FIELD();
+ END_LINE();
+ } else if (IS_CARRIAGE(c)) {
+ END_FIELD();
+ self->state = EAT_CRNL;
+ } else if (!self->strict) {
+ PUSH_CHAR(c);
+ self->state = IN_FIELD;
} else {
- // Handle \r-delimited files
- END_LINE_AND_FIELD_STATE(START_FIELD);
+ int bufsize = 100;
+ self->error_msg = (char *)malloc(bufsize);
+ snprintf(self->error_msg, bufsize,
+ "delimiter expected after quote in quote");
+ goto parsingerror;
}
- } else {
- if (self->delim_whitespace) {
- /* XXX
- * first character of a new record--need to back up and reread
- * to handle properly...
- */
- i--; buf--; // back up one character (HACK!)
- END_LINE_STATE(START_RECORD);
- } else {
- // \r line terminator
- // UGH. we don't actually want
- // to consume the token. fix this later
- self->stream_len = slen;
- if (end_line(self) < 0) {
- goto parsingerror;
- }
+ break;
- stream = self->stream + self->stream_len;
- slen = self->stream_len;
- self->state = START_RECORD;
+ case EAT_COMMENT:
+ if (IS_TERMINATOR(c)) {
+ END_LINE();
+ } else if (IS_CARRIAGE(c)) {
+ self->state = EAT_CRNL;
+ }
+ break;
+
+ // only occurs with non-custom line terminator,
+ // which is why we directly check for '\n'
+ case EAT_CRNL:
+ if (c == '\n') {
+ END_LINE();
+ } else if (IS_DELIMITER(c)) {
+ if (self->delim_whitespace) {
+ END_LINE_STATE(EAT_WHITESPACE);
+ } else {
+ // Handle \r-delimited files
+ END_LINE_AND_FIELD_STATE(START_FIELD);
+ }
+ } else {
+ if (self->delim_whitespace) {
+ /* XXX
+ * first character of a new record--need to back up and
+ * reread
+ * to handle properly...
+ */
+ i--;
+ buf--; // back up one character (HACK!)
+ END_LINE_STATE(START_RECORD);
+ } else {
+ // \r line terminator
+ // UGH. we don't actually want
+ // to consume the token. fix this later
+ self->stream_len = slen;
+ if (end_line(self) < 0) {
+ goto parsingerror;
+ }
+
+ stream = self->stream + self->stream_len;
+ slen = self->stream_len;
+ self->state = START_RECORD;
- --i; buf--; // let's try this character again (HACK!)
- if (line_limit > 0 && self->lines == start_lines + line_limit) {
- goto linelimit;
+ --i;
+ buf--; // let's try this character again (HACK!)
+ if (line_limit > 0 &&
+ self->lines == start_lines + line_limit) {
+ goto linelimit;
+ }
}
}
- }
- break;
+ break;
- // only occurs with non-custom line terminator,
- // which is why we directly check for '\n'
- case EAT_CRNL_NOP: // inside an ignored comment line
- self->state = START_RECORD;
- // \r line terminator -- parse this character again
- if (c != '\n' && !IS_DELIMITER(c)) {
- --i;
- --buf;
- }
- break;
- default:
- break;
+ // only occurs with non-custom line terminator,
+ // which is why we directly check for '\n'
+ case EAT_CRNL_NOP: // inside an ignored comment line
+ self->state = START_RECORD;
+ // \r line terminator -- parse this character again
+ if (c != '\n' && !IS_DELIMITER(c)) {
+ --i;
+ --buf;
+ }
+ break;
+ default:
+ break;
}
}
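
The restructured ``tokenize_bytes`` above keeps the same overall shape: one
``switch`` on ``self->state`` per input character, with the old ``SKIP_LINE``
states split into per-field variants. A toy sketch of that character-driven
pattern (a stand-alone field counter under simplified rules, not the pandas
parser)::

    #include <stdio.h>

    typedef enum { START_FIELD, IN_FIELD } State;

    /* Count comma-separated fields in one record, one character at a time,
       mirroring the switch-per-character structure of tokenize_bytes.
       A trailing delimiter's empty field is not counted in this toy version. */
    static int count_fields(const char *line) {
        State state = START_FIELD;
        int fields = 0;
        for (const char *p = line; *p != '\0' && *p != '\n'; ++p) {
            switch (state) {
                case START_FIELD:
                    fields++;
                    state = IN_FIELD;
                    /* fall through: the current character still needs handling */
                case IN_FIELD:
                    if (*p == ',') state = START_FIELD;
                    break;
            }
        }
        return fields;
    }

    int main(void) {
        printf("%d\n", count_fields("a,b,c\n"));  /* prints 3 */
        return 0;
    }
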
@@ -1119,39 +1090,41 @@ int tokenize_bytes(parser_t *self, size_t line_limit)
}
static int parser_handle_eof(parser_t *self) {
- TRACE(("handling eof, datalen: %d, pstate: %d\n", self->datalen, self->state))
+ int bufsize = 100;
- if (self->datalen != 0)
- return -1;
+ TRACE(
+ ("handling eof, datalen: %d, pstate: %d\n", self->datalen, self->state))
- switch (self->state) {
- case START_RECORD:
- case WHITESPACE_LINE:
- case EAT_CRNL_NOP:
- case EAT_LINE_COMMENT:
- return 0;
+ if (self->datalen != 0) return -1;
- case ESCAPE_IN_QUOTED_FIELD:
- case IN_QUOTED_FIELD:
- self->error_msg = (char*)malloc(100);
- sprintf(self->error_msg, "EOF inside string starting at line %d",
- self->file_lines);
- return -1;
+ switch (self->state) {
+ case START_RECORD:
+ case WHITESPACE_LINE:
+ case EAT_CRNL_NOP:
+ case EAT_LINE_COMMENT:
+ return 0;
- case ESCAPED_CHAR:
- self->error_msg = (char*)malloc(100);
- sprintf(self->error_msg, "EOF following escape character");
- return -1;
+ case ESCAPE_IN_QUOTED_FIELD:
+ case IN_QUOTED_FIELD:
+ self->error_msg = (char *)malloc(bufsize);
+ snprintf(self->error_msg, bufsize,
+ "EOF inside string starting at line %d", self->file_lines);
+ return -1;
- case IN_FIELD:
- case START_FIELD:
- case QUOTE_IN_QUOTED_FIELD:
- if (end_field(self) < 0)
+ case ESCAPED_CHAR:
+ self->error_msg = (char *)malloc(bufsize);
+ snprintf(self->error_msg, bufsize,
+ "EOF following escape character");
return -1;
- break;
- default:
- break;
+ case IN_FIELD:
+ case START_FIELD:
+ case QUOTE_IN_QUOTED_FIELD:
+ if (end_field(self) < 0) return -1;
+ break;
+
+ default:
+ break;
}
if (end_line(self) < 0)
@@ -1168,19 +1141,19 @@ int parser_consume_rows(parser_t *self, size_t nrows) {
}
/* do nothing */
- if (nrows == 0)
- return 0;
+ if (nrows == 0) return 0;
/* cannot guarantee that nrows + 1 has been observed */
word_deletions = self->line_start[nrows - 1] + self->line_fields[nrows - 1];
char_count = (self->word_starts[word_deletions - 1] +
strlen(self->words[word_deletions - 1]) + 1);
- TRACE(("parser_consume_rows: Deleting %d words, %d chars\n", word_deletions, char_count));
+ TRACE(("parser_consume_rows: Deleting %d words, %d chars\n", word_deletions,
+ char_count));
/* move stream, only if something to move */
if (char_count < self->stream_len) {
- memmove((void*) self->stream, (void*) (self->stream + char_count),
+ memmove((void *)self->stream, (void *)(self->stream + char_count),
self->stream_len - char_count);
}
/* buffer counts */
@@ -1198,26 +1171,14 @@ int parser_consume_rows(parser_t *self, size_t nrows) {
/* move current word pointer to stream */
self->pword_start -= char_count;
self->word_start -= char_count;
- /*
- printf("Line_start: ");
- for (i = 0; i < self->lines + 1; ++i) {
- printf("%d ", self->line_fields[i]);
- }
- printf("\n");
- */
+
/* move line metadata */
- for (i = 0; i < self->lines - nrows + 1; ++i)
- {
+ for (i = 0; i < self->lines - nrows + 1; ++i) {
offset = i + nrows;
self->line_start[i] = self->line_start[offset] - word_deletions;
-
- /* TRACE(("First word in line %d is now %s\n", i, */
- /* self->words[self->line_start[i]])); */
-
self->line_fields[i] = self->line_fields[offset];
}
self->lines -= nrows;
- /* self->line_fields[self->lines] = 0; */
return 0;
}
@@ -1241,47 +1202,50 @@ int parser_trim_buffers(parser_t *self) {
new_cap = _next_pow2(self->words_len) + 1;
if (new_cap < self->words_cap) {
TRACE(("parser_trim_buffers: new_cap < self->words_cap\n"));
- newptr = safe_realloc((void*) self->words, new_cap * sizeof(char*));
+ newptr = safe_realloc((void *)self->words, new_cap * sizeof(char *));
if (newptr == NULL) {
return PARSER_OUT_OF_MEMORY;
} else {
- self->words = (char**) newptr;
+ self->words = (char **)newptr;
}
- newptr = safe_realloc((void*) self->word_starts, new_cap * sizeof(int));
+ newptr = safe_realloc((void *)self->word_starts, new_cap * sizeof(int));
if (newptr == NULL) {
return PARSER_OUT_OF_MEMORY;
} else {
- self->word_starts = (int*) newptr;
+ self->word_starts = (int *)newptr;
self->words_cap = new_cap;
}
}
/* trim stream */
new_cap = _next_pow2(self->stream_len) + 1;
- TRACE(("parser_trim_buffers: new_cap = %zu, stream_cap = %zu, lines_cap = %zu\n",
- new_cap, self->stream_cap, self->lines_cap));
+ TRACE(
+ ("parser_trim_buffers: new_cap = %zu, stream_cap = %zu, lines_cap = "
+ "%zu\n",
+ new_cap, self->stream_cap, self->lines_cap));
if (new_cap < self->stream_cap) {
- TRACE(("parser_trim_buffers: new_cap < self->stream_cap, calling safe_realloc\n"));
- newptr = safe_realloc((void*) self->stream, new_cap);
+ TRACE(
+ ("parser_trim_buffers: new_cap < self->stream_cap, calling "
+ "safe_realloc\n"));
+ newptr = safe_realloc((void *)self->stream, new_cap);
if (newptr == NULL) {
return PARSER_OUT_OF_MEMORY;
} else {
- // Update the pointers in the self->words array (char **) if `safe_realloc`
- // moved the `self->stream` buffer. This block mirrors a similar block in
+ // Update the pointers in the self->words array (char **) if
+ // `safe_realloc`
+ // moved the `self->stream` buffer. This block mirrors a similar
+ // block in
// `make_stream_space`.
if (self->stream != newptr) {
- /* TRACE(("Moving word pointers\n")) */
- self->pword_start = (char*) newptr + self->word_start;
+ self->pword_start = (char *)newptr + self->word_start;
- for (i = 0; i < self->words_len; ++i)
- {
- self->words[i] = (char*) newptr + self->word_starts[i];
+ for (i = 0; i < self->words_len; ++i) {
+ self->words[i] = (char *)newptr + self->word_starts[i];
}
}
self->stream = newptr;
self->stream_cap = new_cap;
-
}
}
@@ -1289,17 +1253,17 @@ int parser_trim_buffers(parser_t *self) {
new_cap = _next_pow2(self->lines) + 1;
if (new_cap < self->lines_cap) {
TRACE(("parser_trim_buffers: new_cap < self->lines_cap\n"));
- newptr = safe_realloc((void*) self->line_start, new_cap * sizeof(int));
+ newptr = safe_realloc((void *)self->line_start, new_cap * sizeof(int));
if (newptr == NULL) {
return PARSER_OUT_OF_MEMORY;
} else {
- self->line_start = (int*) newptr;
+ self->line_start = (int *)newptr;
}
- newptr = safe_realloc((void*) self->line_fields, new_cap * sizeof(int));
+ newptr = safe_realloc((void *)self->line_fields, new_cap * sizeof(int));
if (newptr == NULL) {
return PARSER_OUT_OF_MEMORY;
} else {
- self->line_fields = (int*) newptr;
+ self->line_fields = (int *)newptr;
self->lines_cap = new_cap;
}
}
@@ -1311,12 +1275,10 @@ void debug_print_parser(parser_t *self) {
int j, line;
char *token;
- for (line = 0; line < self->lines; ++line)
- {
+ for (line = 0; line < self->lines; ++line) {
printf("(Parsed) Line %d: ", line);
- for (j = 0; j < self->line_fields[j]; ++j)
- {
+ for (j = 0; j < self->line_fields[j]; ++j) {
token = self->words[j + self->line_start[line]];
printf("%s ", token);
}
@@ -1324,13 +1286,6 @@ void debug_print_parser(parser_t *self) {
}
}
-/*int clear_parsed_lines(parser_t *self, size_t nlines) {
- // TODO. move data up in stream, shift relevant word pointers
-
- return 0;
-}*/
-
-
/*
nrows : number of rows to tokenize (or until reach EOF)
all : tokenize all the data vs. certain number of rows
@@ -1344,12 +1299,12 @@ int _tokenize_helper(parser_t *self, size_t nrows, int all) {
return 0;
}
- TRACE(("_tokenize_helper: Asked to tokenize %d rows, datapos=%d, datalen=%d\n", \
- (int) nrows, self->datapos, self->datalen));
+ TRACE((
+ "_tokenize_helper: Asked to tokenize %d rows, datapos=%d, datalen=%d\n",
+ (int)nrows, self->datapos, self->datalen));
while (1) {
- if (!all && self->lines - start_lines >= nrows)
- break;
+ if (!all && self->lines - start_lines >= nrows) break;
if (self->datapos == self->datalen) {
status = parser_buffer_bytes(self, self->chunksize);
@@ -1364,15 +1319,19 @@ int _tokenize_helper(parser_t *self, size_t nrows, int all) {
}
}
- TRACE(("_tokenize_helper: Trying to process %d bytes, datalen=%d, datapos= %d\n",
- self->datalen - self->datapos, self->datalen, self->datapos));
+ TRACE(
+ ("_tokenize_helper: Trying to process %d bytes, datalen=%d, "
+ "datapos= %d\n",
+ self->datalen - self->datapos, self->datalen, self->datapos));
- status = tokenize_bytes(self, nrows);
+ status = tokenize_bytes(self, nrows, start_lines);
if (status < 0) {
// XXX
- TRACE(("_tokenize_helper: Status %d returned from tokenize_bytes, breaking\n",
- status));
+ TRACE(
+ ("_tokenize_helper: Status %d returned from tokenize_bytes, "
+ "breaking\n",
+ status));
status = -1;
break;
}
@@ -1391,86 +1350,11 @@ int tokenize_all_rows(parser_t *self) {
return status;
}
-/* SEL - does not look like this routine is used anywhere
-void test_count_lines(char *fname) {
- clock_t start = clock();
-
- char *buffer, *tmp;
- size_t bytes, lines = 0;
- int i;
- FILE *fp = fopen(fname, "rb");
-
- buffer = (char*) malloc(CHUNKSIZE * sizeof(char));
-
- while(1) {
- tmp = buffer;
- bytes = fread((void *) buffer, sizeof(char), CHUNKSIZE, fp);
- // printf("Read %d bytes\n", bytes);
-
- if (bytes == 0) {
- break;
- }
-
- for (i = 0; i < bytes; ++i)
- {
- if (*tmp++ == '\n') {
- lines++;
- }
- }
- }
-
-
- printf("Saw %d lines\n", (int) lines);
-
- free(buffer);
- fclose(fp);
-
- printf("Time elapsed: %f\n", ((double)clock() - start) / CLOCKS_PER_SEC);
-}*/
-
-
P_INLINE void uppercase(char *p) {
- for ( ; *p; ++p) *p = toupper(*p);
-}
-
-/* SEL - does not look like these routines are used anywhere
-P_INLINE void lowercase(char *p) {
- for ( ; *p; ++p) *p = tolower(*p);
+ for (; *p; ++p) *p = toupper(*p);
}
-int P_INLINE to_complex(char *item, double *p_real, double *p_imag, char sci, char decimal)
-{
- char *p_end;
-
- *p_real = xstrtod(item, &p_end, decimal, sci, '\0', FALSE);
- if (*p_end == '\0') {
- *p_imag = 0.0;
- return errno == 0;
- }
- if (*p_end == 'i' || *p_end == 'j') {
- *p_imag = *p_real;
- *p_real = 0.0;
- ++p_end;
- }
- else {
- if (*p_end == '+') {
- ++p_end;
- }
- *p_imag = xstrtod(p_end, &p_end, decimal, sci, '\0', FALSE);
- if (errno || ((*p_end != 'i') && (*p_end != 'j'))) {
- return FALSE;
- }
- ++p_end;
- }
- while(*p_end == ' ') {
- ++p_end;
- }
- return *p_end == '\0';
-}*/
-
-
-int P_INLINE to_longlong(char *item, long long *p_value)
-{
+int P_INLINE to_longlong(char *item, long long *p_value) {
char *p_end;
// Try integer conversion. We explicitly give the base to be 10. If
@@ -1485,65 +1369,26 @@ int P_INLINE to_longlong(char *item, long long *p_value)
return (errno == 0) && (!*p_end);
}
-/* does not look like this routine is used anywhere
-int P_INLINE to_longlong_thousands(char *item, long long *p_value, char tsep)
-{
- int i, pos, status, n = strlen(item), count = 0;
- char *tmp;
- char *p_end;
-
- for (i = 0; i < n; ++i)
- {
- if (*(item + i) == tsep) {
- count++;
- }
- }
-
- if (count == 0) {
- return to_longlong(item, p_value);
- }
-
- tmp = (char*) malloc((n - count + 1) * sizeof(char));
- if (tmp == NULL) {
- return 0;
- }
-
- pos = 0;
- for (i = 0; i < n; ++i)
- {
- if (item[i] != tsep)
- tmp[pos++] = item[i];
- }
-
- tmp[pos] = '\0';
-
- status = to_longlong(tmp, p_value);
- free(tmp);
-
- return status;
-}*/
-
int to_boolean(const char *item, uint8_t *val) {
char *tmp;
int i, status = 0;
+ int bufsize = sizeof(char) * (strlen(item) + 1);
static const char *tstrs[1] = {"TRUE"};
static const char *fstrs[1] = {"FALSE"};
- tmp = malloc(sizeof(char) * (strlen(item) + 1));
- strcpy(tmp, item);
+ tmp = malloc(bufsize);
+ strncpy(tmp, item, bufsize);
uppercase(tmp);
- for (i = 0; i < 1; ++i)
- {
+ for (i = 0; i < 1; ++i) {
if (strcmp(tmp, tstrs[i]) == 0) {
*val = 1;
goto done;
}
}
- for (i = 0; i < 1; ++i)
- {
+ for (i = 0; i < 1; ++i) {
if (strcmp(tmp, fstrs[i]) == 0) {
*val = 0;
goto done;
@@ -1557,27 +1402,19 @@ int to_boolean(const char *item, uint8_t *val) {
return status;
}
-// #define TEST
-
#ifdef TEST
-int main(int argc, char *argv[])
-{
+int main(int argc, char *argv[]) {
double x, y;
long long xi;
int status;
char *s;
- //s = "0.10e-3-+5.5e2i";
- // s = "1-0j";
- // status = to_complex(s, &x, &y, 'e', '.');
s = "123,789";
status = to_longlong_thousands(s, &xi, ',');
printf("s = '%s'\n", s);
printf("status = %d\n", status);
- printf("x = %d\n", (int) xi);
-
- // printf("x = %lg, y = %lg\n", x, y);
+ printf("x = %d\n", (int)xi);
return 0;
}
@@ -1606,10 +1443,12 @@ int main(int argc, char *argv[])
// may be used to endorse or promote products derived from this software
// without specific prior written permission.
//
-// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+// AND
// ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
+// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+// LIABLE
// FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
// DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
// OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
@@ -1628,197 +1467,185 @@ int main(int argc, char *argv[])
// * Add tsep argument for thousands separator
//
-double xstrtod(const char *str, char **endptr, char decimal,
- char sci, char tsep, int skip_trailing)
-{
- double number;
- int exponent;
- int negative;
- char *p = (char *) str;
- double p10;
- int n;
- int num_digits;
- int num_decimals;
-
- errno = 0;
-
- // Skip leading whitespace
- while (isspace(*p)) p++;
-
- // Handle optional sign
- negative = 0;
- switch (*p)
- {
- case '-': negative = 1; // Fall through to increment position
- case '+': p++;
- }
-
- number = 0.;
- exponent = 0;
- num_digits = 0;
- num_decimals = 0;
-
- // Process string of digits
- while (isdigit(*p))
- {
- number = number * 10. + (*p - '0');
- p++;
- num_digits++;
-
- p += (tsep != '\0' && *p == tsep);
- }
-
- // Process decimal part
- if (*p == decimal)
- {
- p++;
-
- while (isdigit(*p))
- {
- number = number * 10. + (*p - '0');
- p++;
- num_digits++;
- num_decimals++;
- }
-
- exponent -= num_decimals;
- }
-
- if (num_digits == 0)
- {
- errno = ERANGE;
- return 0.0;
- }
-
- // Correct for sign
- if (negative) number = -number;
-
- // Process an exponent string
- if (toupper(*p) == toupper(sci))
- {
- // Handle optional sign
+double xstrtod(const char *str, char **endptr, char decimal, char sci,
+ char tsep, int skip_trailing) {
+ double number;
+ int exponent;
+ int negative;
+ char *p = (char *)str;
+ double p10;
+ int n;
+ int num_digits;
+ int num_decimals;
+
+ errno = 0;
+
+ // Skip leading whitespace.
+ while (isspace(*p)) p++;
+
+ // Handle optional sign.
negative = 0;
- switch (*++p)
- {
- case '-': negative = 1; // Fall through to increment pos
- case '+': p++;
+ switch (*p) {
+ case '-':
+ negative = 1; // Fall through to increment position.
+ case '+':
+ p++;
}
- // Process string of digits
+ number = 0.;
+ exponent = 0;
num_digits = 0;
- n = 0;
- while (isdigit(*p))
- {
- n = n * 10 + (*p - '0');
- num_digits++;
- p++;
+ num_decimals = 0;
+
+ // Process string of digits.
+ while (isdigit(*p)) {
+ number = number * 10. + (*p - '0');
+ p++;
+ num_digits++;
+
+ p += (tsep != '\0' && *p == tsep);
}
- if (negative)
- exponent -= n;
- else
- exponent += n;
+ // Process decimal part.
+ if (*p == decimal) {
+ p++;
- // If no digits, after the 'e'/'E', un-consume it
- if (num_digits == 0)
- p--;
- }
+ while (isdigit(*p)) {
+ number = number * 10. + (*p - '0');
+ p++;
+ num_digits++;
+ num_decimals++;
+ }
+ exponent -= num_decimals;
+ }
- if (exponent < DBL_MIN_EXP || exponent > DBL_MAX_EXP)
- {
+ if (num_digits == 0) {
+ errno = ERANGE;
+ return 0.0;
+ }
- errno = ERANGE;
- return HUGE_VAL;
- }
+ // Correct for sign.
+ if (negative) number = -number;
- // Scale the result
- p10 = 10.;
- n = exponent;
- if (n < 0) n = -n;
- while (n)
- {
- if (n & 1)
- {
- if (exponent < 0)
- number /= p10;
- else
- number *= p10;
+ // Process an exponent string.
+ if (toupper(*p) == toupper(sci)) {
+ // Handle optional sign.
+ negative = 0;
+ switch (*++p) {
+ case '-':
+ negative = 1; // Fall through to increment pos.
+ case '+':
+ p++;
+ }
+
+ // Process string of digits.
+ num_digits = 0;
+ n = 0;
+ while (isdigit(*p)) {
+ n = n * 10 + (*p - '0');
+ num_digits++;
+ p++;
+ }
+
+ if (negative)
+ exponent -= n;
+ else
+ exponent += n;
+
+    // If no digits after the 'e'/'E', un-consume it
+ if (num_digits == 0) p--;
}
- n >>= 1;
- p10 *= p10;
- }
+ if (exponent < DBL_MIN_EXP || exponent > DBL_MAX_EXP) {
+ errno = ERANGE;
+ return HUGE_VAL;
+ }
- if (number == HUGE_VAL) {
- errno = ERANGE;
- }
+ // Scale the result.
+ p10 = 10.;
+ n = exponent;
+ if (n < 0) n = -n;
+ while (n) {
+ if (n & 1) {
+ if (exponent < 0)
+ number /= p10;
+ else
+ number *= p10;
+ }
+ n >>= 1;
+ p10 *= p10;
+ }
- if (skip_trailing) {
- // Skip trailing whitespace
- while (isspace(*p)) p++;
- }
+ if (number == HUGE_VAL) {
+ errno = ERANGE;
+ }
- if (endptr) *endptr = p;
+ if (skip_trailing) {
+ // Skip trailing whitespace.
+ while (isspace(*p)) p++;
+ }
+ if (endptr) *endptr = p;
- return number;
+ return number;
}
-double precise_xstrtod(const char *str, char **endptr, char decimal,
- char sci, char tsep, int skip_trailing)
-{
+double precise_xstrtod(const char *str, char **endptr, char decimal, char sci,
+ char tsep, int skip_trailing) {
double number;
int exponent;
int negative;
- char *p = (char *) str;
+ char *p = (char *)str;
int num_digits;
int num_decimals;
int max_digits = 17;
int n;
- // Cache powers of 10 in memory
- static double e[] = {1., 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10,
- 1e11, 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20,
- 1e21, 1e22, 1e23, 1e24, 1e25, 1e26, 1e27, 1e28, 1e29, 1e30,
- 1e31, 1e32, 1e33, 1e34, 1e35, 1e36, 1e37, 1e38, 1e39, 1e40,
- 1e41, 1e42, 1e43, 1e44, 1e45, 1e46, 1e47, 1e48, 1e49, 1e50,
- 1e51, 1e52, 1e53, 1e54, 1e55, 1e56, 1e57, 1e58, 1e59, 1e60,
- 1e61, 1e62, 1e63, 1e64, 1e65, 1e66, 1e67, 1e68, 1e69, 1e70,
- 1e71, 1e72, 1e73, 1e74, 1e75, 1e76, 1e77, 1e78, 1e79, 1e80,
- 1e81, 1e82, 1e83, 1e84, 1e85, 1e86, 1e87, 1e88, 1e89, 1e90,
- 1e91, 1e92, 1e93, 1e94, 1e95, 1e96, 1e97, 1e98, 1e99, 1e100,
- 1e101, 1e102, 1e103, 1e104, 1e105, 1e106, 1e107, 1e108, 1e109, 1e110,
- 1e111, 1e112, 1e113, 1e114, 1e115, 1e116, 1e117, 1e118, 1e119, 1e120,
- 1e121, 1e122, 1e123, 1e124, 1e125, 1e126, 1e127, 1e128, 1e129, 1e130,
- 1e131, 1e132, 1e133, 1e134, 1e135, 1e136, 1e137, 1e138, 1e139, 1e140,
- 1e141, 1e142, 1e143, 1e144, 1e145, 1e146, 1e147, 1e148, 1e149, 1e150,
- 1e151, 1e152, 1e153, 1e154, 1e155, 1e156, 1e157, 1e158, 1e159, 1e160,
- 1e161, 1e162, 1e163, 1e164, 1e165, 1e166, 1e167, 1e168, 1e169, 1e170,
- 1e171, 1e172, 1e173, 1e174, 1e175, 1e176, 1e177, 1e178, 1e179, 1e180,
- 1e181, 1e182, 1e183, 1e184, 1e185, 1e186, 1e187, 1e188, 1e189, 1e190,
- 1e191, 1e192, 1e193, 1e194, 1e195, 1e196, 1e197, 1e198, 1e199, 1e200,
- 1e201, 1e202, 1e203, 1e204, 1e205, 1e206, 1e207, 1e208, 1e209, 1e210,
- 1e211, 1e212, 1e213, 1e214, 1e215, 1e216, 1e217, 1e218, 1e219, 1e220,
- 1e221, 1e222, 1e223, 1e224, 1e225, 1e226, 1e227, 1e228, 1e229, 1e230,
- 1e231, 1e232, 1e233, 1e234, 1e235, 1e236, 1e237, 1e238, 1e239, 1e240,
- 1e241, 1e242, 1e243, 1e244, 1e245, 1e246, 1e247, 1e248, 1e249, 1e250,
- 1e251, 1e252, 1e253, 1e254, 1e255, 1e256, 1e257, 1e258, 1e259, 1e260,
- 1e261, 1e262, 1e263, 1e264, 1e265, 1e266, 1e267, 1e268, 1e269, 1e270,
- 1e271, 1e272, 1e273, 1e274, 1e275, 1e276, 1e277, 1e278, 1e279, 1e280,
- 1e281, 1e282, 1e283, 1e284, 1e285, 1e286, 1e287, 1e288, 1e289, 1e290,
- 1e291, 1e292, 1e293, 1e294, 1e295, 1e296, 1e297, 1e298, 1e299, 1e300,
- 1e301, 1e302, 1e303, 1e304, 1e305, 1e306, 1e307, 1e308};
+ // Cache powers of 10 in memory.
+ static double e[] = {
+ 1., 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9,
+ 1e10, 1e11, 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19,
+ 1e20, 1e21, 1e22, 1e23, 1e24, 1e25, 1e26, 1e27, 1e28, 1e29,
+ 1e30, 1e31, 1e32, 1e33, 1e34, 1e35, 1e36, 1e37, 1e38, 1e39,
+ 1e40, 1e41, 1e42, 1e43, 1e44, 1e45, 1e46, 1e47, 1e48, 1e49,
+ 1e50, 1e51, 1e52, 1e53, 1e54, 1e55, 1e56, 1e57, 1e58, 1e59,
+ 1e60, 1e61, 1e62, 1e63, 1e64, 1e65, 1e66, 1e67, 1e68, 1e69,
+ 1e70, 1e71, 1e72, 1e73, 1e74, 1e75, 1e76, 1e77, 1e78, 1e79,
+ 1e80, 1e81, 1e82, 1e83, 1e84, 1e85, 1e86, 1e87, 1e88, 1e89,
+ 1e90, 1e91, 1e92, 1e93, 1e94, 1e95, 1e96, 1e97, 1e98, 1e99,
+ 1e100, 1e101, 1e102, 1e103, 1e104, 1e105, 1e106, 1e107, 1e108, 1e109,
+ 1e110, 1e111, 1e112, 1e113, 1e114, 1e115, 1e116, 1e117, 1e118, 1e119,
+ 1e120, 1e121, 1e122, 1e123, 1e124, 1e125, 1e126, 1e127, 1e128, 1e129,
+ 1e130, 1e131, 1e132, 1e133, 1e134, 1e135, 1e136, 1e137, 1e138, 1e139,
+ 1e140, 1e141, 1e142, 1e143, 1e144, 1e145, 1e146, 1e147, 1e148, 1e149,
+ 1e150, 1e151, 1e152, 1e153, 1e154, 1e155, 1e156, 1e157, 1e158, 1e159,
+ 1e160, 1e161, 1e162, 1e163, 1e164, 1e165, 1e166, 1e167, 1e168, 1e169,
+ 1e170, 1e171, 1e172, 1e173, 1e174, 1e175, 1e176, 1e177, 1e178, 1e179,
+ 1e180, 1e181, 1e182, 1e183, 1e184, 1e185, 1e186, 1e187, 1e188, 1e189,
+ 1e190, 1e191, 1e192, 1e193, 1e194, 1e195, 1e196, 1e197, 1e198, 1e199,
+ 1e200, 1e201, 1e202, 1e203, 1e204, 1e205, 1e206, 1e207, 1e208, 1e209,
+ 1e210, 1e211, 1e212, 1e213, 1e214, 1e215, 1e216, 1e217, 1e218, 1e219,
+ 1e220, 1e221, 1e222, 1e223, 1e224, 1e225, 1e226, 1e227, 1e228, 1e229,
+ 1e230, 1e231, 1e232, 1e233, 1e234, 1e235, 1e236, 1e237, 1e238, 1e239,
+ 1e240, 1e241, 1e242, 1e243, 1e244, 1e245, 1e246, 1e247, 1e248, 1e249,
+ 1e250, 1e251, 1e252, 1e253, 1e254, 1e255, 1e256, 1e257, 1e258, 1e259,
+ 1e260, 1e261, 1e262, 1e263, 1e264, 1e265, 1e266, 1e267, 1e268, 1e269,
+ 1e270, 1e271, 1e272, 1e273, 1e274, 1e275, 1e276, 1e277, 1e278, 1e279,
+ 1e280, 1e281, 1e282, 1e283, 1e284, 1e285, 1e286, 1e287, 1e288, 1e289,
+ 1e290, 1e291, 1e292, 1e293, 1e294, 1e295, 1e296, 1e297, 1e298, 1e299,
+ 1e300, 1e301, 1e302, 1e303, 1e304, 1e305, 1e306, 1e307, 1e308};
errno = 0;
- // Skip leading whitespace
+ // Skip leading whitespace.
while (isspace(*p)) p++;
- // Handle optional sign
+ // Handle optional sign.
negative = 0;
- switch (*p)
- {
- case '-': negative = 1; // Fall through to increment position
- case '+': p++;
+ switch (*p) {
+ case '-':
+ negative = 1; // Fall through to increment position.
+ case '+':
+ p++;
}
number = 0.;
@@ -1826,66 +1653,59 @@ double precise_xstrtod(const char *str, char **endptr, char decimal,
num_digits = 0;
num_decimals = 0;
- // Process string of digits
- while (isdigit(*p))
- {
- if (num_digits < max_digits)
- {
+ // Process string of digits.
+ while (isdigit(*p)) {
+ if (num_digits < max_digits) {
number = number * 10. + (*p - '0');
num_digits++;
- }
- else
+ } else {
++exponent;
+ }
p++;
p += (tsep != '\0' && *p == tsep);
}
// Process decimal part
- if (*p == decimal)
- {
+ if (*p == decimal) {
p++;
- while (num_digits < max_digits && isdigit(*p))
- {
+ while (num_digits < max_digits && isdigit(*p)) {
number = number * 10. + (*p - '0');
p++;
num_digits++;
num_decimals++;
}
- if (num_digits >= max_digits) // consume extra decimal digits
- while (isdigit(*p))
- ++p;
+ if (num_digits >= max_digits) // Consume extra decimal digits.
+ while (isdigit(*p)) ++p;
exponent -= num_decimals;
}
- if (num_digits == 0)
- {
+ if (num_digits == 0) {
errno = ERANGE;
return 0.0;
}
- // Correct for sign
+ // Correct for sign.
if (negative) number = -number;
- // Process an exponent string
- if (toupper(*p) == toupper(sci))
- {
+ // Process an exponent string.
+ if (toupper(*p) == toupper(sci)) {
// Handle optional sign
negative = 0;
- switch (*++p)
- {
- case '-': negative = 1; // Fall through to increment pos
- case '+': p++;
+ switch (*++p) {
+ case '-':
+ negative = 1; // Fall through to increment pos.
+ case '+':
+ p++;
}
- // Process string of digits
+ // Process string of digits.
num_digits = 0;
n = 0;
- while (isdigit(*p))
- {
+ while (isdigit(*p)) {
n = n * 10 + (*p - '0');
num_digits++;
p++;
@@ -1896,33 +1716,28 @@ double precise_xstrtod(const char *str, char **endptr, char decimal,
else
exponent += n;
- // If no digits, after the 'e'/'E', un-consume it
- if (num_digits == 0)
- p--;
+ // If no digits after the 'e'/'E', un-consume it.
+ if (num_digits == 0) p--;
}
- if (exponent > 308)
- {
+ if (exponent > 308) {
errno = ERANGE;
return HUGE_VAL;
- }
- else if (exponent > 0)
+ } else if (exponent > 0) {
number *= e[exponent];
- else if (exponent < -308) // subnormal
- {
- if (exponent < -616) // prevent invalid array access
+ } else if (exponent < -308) { // Subnormal
+ if (exponent < -616) // Prevent invalid array access.
number = 0.;
number /= e[-308 - exponent];
number /= e[308];
- }
- else
+ } else {
number /= e[-exponent];
+ }
- if (number == HUGE_VAL || number == -HUGE_VAL)
- errno = ERANGE;
+ if (number == HUGE_VAL || number == -HUGE_VAL) errno = ERANGE;
if (skip_trailing) {
- // Skip trailing whitespace
+ // Skip trailing whitespace.
while (isspace(*p)) p++;
}
@@ -1930,9 +1745,8 @@ double precise_xstrtod(const char *str, char **endptr, char decimal,
return number;
}
-double round_trip(const char *p, char **q, char decimal, char sci,
- char tsep, int skip_trailing)
-{
+double round_trip(const char *p, char **q, char decimal, char sci, char tsep,
+ int skip_trailing) {
#if PY_VERSION_HEX >= 0x02070000
return PyOS_string_to_double(p, q, 0);
#else
@@ -1940,31 +1754,12 @@ double round_trip(const char *p, char **q, char decimal, char sci,
#endif
}
-/*
-float strtof(const char *str, char **endptr)
-{
- return (float) strtod(str, endptr);
-}
-
-
-long double strtold(const char *str, char **endptr)
-{
- return strtod(str, endptr);
-}
-
-double atof(const char *str)
-{
- return strtod(str, NULL);
-}
-*/
-
// End of xstrtod code
// ---------------------------------------------------------------------------
int64_t str_to_int64(const char *p_item, int64_t int_min, int64_t int_max,
- int *error, char tsep)
-{
- const char *p = (const char *) p_item;
+ int *error, char tsep) {
+ const char *p = (const char *)p_item;
int isneg = 0;
int64_t number = 0;
int d;
@@ -1978,8 +1773,7 @@ int64_t str_to_int64(const char *p_item, int64_t int_min, int64_t int_max,
if (*p == '-') {
isneg = 1;
++p;
- }
- else if (*p == '+') {
+ } else if (*p == '+') {
p++;
}
@@ -2008,11 +1802,9 @@ int64_t str_to_int64(const char *p_item, int64_t int_min, int64_t int_max,
}
if ((number > pre_min) ||
((number == pre_min) && (d - '0' <= dig_pre_min))) {
-
number = number * 10 - (d - '0');
d = *++p;
- }
- else {
+ } else {
*error = ERROR_OVERFLOW;
return 0;
}
@@ -2021,25 +1813,20 @@ int64_t str_to_int64(const char *p_item, int64_t int_min, int64_t int_max,
while (isdigit(d)) {
if ((number > pre_min) ||
((number == pre_min) && (d - '0' <= dig_pre_min))) {
-
number = number * 10 - (d - '0');
d = *++p;
- }
- else {
+ } else {
*error = ERROR_OVERFLOW;
return 0;
}
}
}
- }
- else {
+ } else {
// If number is less than pre_max, at least one more digit
// can be processed without overflowing.
int64_t pre_max = int_max / 10;
int dig_pre_max = int_max % 10;
- //printf("pre_max = %lld dig_pre_max = %d\n", pre_max, dig_pre_max);
-
// Process the digits.
d = *p;
if (tsep != '\0') {
@@ -2052,12 +1839,10 @@ int64_t str_to_int64(const char *p_item, int64_t int_min, int64_t int_max,
}
if ((number < pre_max) ||
((number == pre_max) && (d - '0' <= dig_pre_max))) {
-
number = number * 10 + (d - '0');
d = *++p;
- }
- else {
+ } else {
*error = ERROR_OVERFLOW;
return 0;
}
@@ -2066,12 +1851,10 @@ int64_t str_to_int64(const char *p_item, int64_t int_min, int64_t int_max,
while (isdigit(d)) {
if ((number < pre_max) ||
((number == pre_max) && (d - '0' <= dig_pre_max))) {
-
number = number * 10 + (d - '0');
d = *++p;
- }
- else {
+ } else {
*error = ERROR_OVERFLOW;
return 0;
}
@@ -2093,66 +1876,3 @@ int64_t str_to_int64(const char *p_item, int64_t int_min, int64_t int_max,
*error = 0;
return number;
}
-
-/* does not look like this routine is used anywhere
-uint64_t str_to_uint64(const char *p_item, uint64_t uint_max, int *error)
-{
- int d, dig_pre_max;
- uint64_t pre_max;
- const char *p = (const char *) p_item;
- uint64_t number = 0;
-
- // Skip leading spaces.
- while (isspace(*p)) {
- ++p;
- }
-
- // Handle sign.
- if (*p == '-') {
- *error = ERROR_MINUS_SIGN;
- return 0;
- }
- if (*p == '+') {
- p++;
- }
-
- // Check that there is a first digit.
- if (!isdigit(*p)) {
- // Error...
- *error = ERROR_NO_DIGITS;
- return 0;
- }
-
- // If number is less than pre_max, at least one more digit
- // can be processed without overflowing.
- pre_max = uint_max / 10;
- dig_pre_max = uint_max % 10;
-
- // Process the digits.
- d = *p;
- while (isdigit(d)) {
- if ((number < pre_max) || ((number == pre_max) && (d - '0' <= dig_pre_max))) {
- number = number * 10 + (d - '0');
- d = *++p;
- }
- else {
- *error = ERROR_OVERFLOW;
- return 0;
- }
- }
-
- // Skip trailing spaces.
- while (isspace(*p)) {
- ++p;
- }
-
- // Did we use up all the characters?
- if (*p) {
- *error = ERROR_INVALID_CHARS;
- return 0;
- }
-
- *error = 0;
- return number;
-}
-*/
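
A recurring change in ``tokenizer.c`` above is replacing ``sprintf`` into a
freshly ``malloc``'d ``error_msg`` with ``snprintf`` and an explicit
``bufsize``. A minimal stand-alone sketch of that pattern (hypothetical
values, not part of the patch)::

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        int bufsize = 100;
        int file_lines = 42;  /* hypothetical line count */
        char *error_msg = malloc(bufsize);
        if (error_msg == NULL) return 1;

        /* snprintf writes at most bufsize bytes, including the terminating
           NUL, so an over-long message is truncated rather than overflowing
           the buffer as sprintf could. */
        snprintf(error_msg, bufsize,
                 "EOF inside string starting at line %d", file_lines);
        printf("%s\n", error_msg);
        free(error_msg);
        return 0;
    }
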
diff --git a/pandas/src/parser/tokenizer.h b/pandas/src/parser/tokenizer.h
index 8f7ae436bb7b7..e01812f1c5520 100644
--- a/pandas/src/parser/tokenizer.h
+++ b/pandas/src/parser/tokenizer.h
@@ -9,29 +9,29 @@ See LICENSE for the license
*/
-#ifndef _PARSER_COMMON_H_
-#define _PARSER_COMMON_H_
+#ifndef PANDAS_SRC_PARSER_TOKENIZER_H_
+#define PANDAS_SRC_PARSER_TOKENIZER_H_
-#include "Python.h"
+#include
#include
-#include
#include
+#include
#include
-#include
+#include "Python.h"
#include
-#define ERROR_OK 0
-#define ERROR_NO_DIGITS 1
-#define ERROR_OVERFLOW 2
-#define ERROR_INVALID_CHARS 3
-#define ERROR_MINUS_SIGN 4
+#define ERROR_OK 0
+#define ERROR_NO_DIGITS 1
+#define ERROR_OVERFLOW 2
+#define ERROR_INVALID_CHARS 3
+#define ERROR_MINUS_SIGN 4
#include "../headers/stdint.h"
#include "khash.h"
-#define CHUNKSIZE 1024*256
+#define CHUNKSIZE 1024 * 256
#define KB 1024
#define MB 1024 * KB
#define STREAM_INIT_SIZE 32
@@ -40,15 +40,15 @@ See LICENSE for the license
#define CALLING_READ_FAILED 2
#ifndef P_INLINE
- #if defined(__GNUC__)
- #define P_INLINE static __inline__
- #elif defined(_MSC_VER)
- #define P_INLINE
- #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
- #define P_INLINE static inline
- #else
- #define P_INLINE
- #endif
+#if defined(__GNUC__)
+#define P_INLINE static __inline__
+#elif defined(_MSC_VER)
+#define P_INLINE
+#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+#define P_INLINE static inline
+#else
+#define P_INLINE
+#endif
#endif
#if defined(_MSC_VER)
@@ -62,41 +62,34 @@ See LICENSE for the license
*/
#define FALSE 0
-#define TRUE 1
-
-/* Maximum number of columns in a file. */
-#define MAX_NUM_COLUMNS 2000
+#define TRUE 1
-/* Maximum number of characters in single field. */
-
-#define FIELD_BUFFER_SIZE 2000
+// Maximum number of columns in a file.
+#define MAX_NUM_COLUMNS 2000
+// Maximum number of characters in a single field.
+#define FIELD_BUFFER_SIZE 2000
/*
* Common set of error types for the read_rows() and tokenize()
* functions.
*/
-
-#define ERROR_OUT_OF_MEMORY 1
-#define ERROR_INVALID_COLUMN_INDEX 10
+#define ERROR_OUT_OF_MEMORY 1
+#define ERROR_INVALID_COLUMN_INDEX 10
#define ERROR_CHANGED_NUMBER_OF_FIELDS 12
-#define ERROR_TOO_MANY_CHARS 21
-#define ERROR_TOO_MANY_FIELDS 22
-#define ERROR_NO_DATA 23
-
-
-/* #define VERBOSE */
+#define ERROR_TOO_MANY_CHARS 21
+#define ERROR_TOO_MANY_FIELDS 22
+#define ERROR_NO_DATA 23
+// #define VERBOSE
#if defined(VERBOSE)
#define TRACE(X) printf X;
#else
#define TRACE(X)
#endif
-
#define PARSER_OUT_OF_MEMORY -1
-
/*
* XXX Might want to couple count_rows() with read_rows() to avoid duplication
* of some file I/O.
@@ -108,7 +101,6 @@ See LICENSE for the license
*/
#define WORD_BUFFER_SIZE 4000
-
typedef enum {
START_RECORD,
START_FIELD,
@@ -123,19 +115,22 @@ typedef enum {
EAT_COMMENT,
EAT_LINE_COMMENT,
WHITESPACE_LINE,
- SKIP_LINE,
- QUOTE_IN_SKIP_LINE,
- QUOTE_IN_QUOTE_IN_SKIP_LINE,
+ START_FIELD_IN_SKIP_LINE,
+ IN_FIELD_IN_SKIP_LINE,
+ IN_QUOTED_FIELD_IN_SKIP_LINE,
+ QUOTE_IN_QUOTED_FIELD_IN_SKIP_LINE,
FINISHED
} ParserState;
typedef enum {
- QUOTE_MINIMAL, QUOTE_ALL, QUOTE_NONNUMERIC, QUOTE_NONE
+ QUOTE_MINIMAL,
+ QUOTE_ALL,
+ QUOTE_NONNUMERIC,
+ QUOTE_NONE
} QuoteStyle;
-
-typedef void* (*io_callback)(void *src, size_t nbytes, size_t *bytes_read,
- int *status);
+typedef void *(*io_callback)(void *src, size_t nbytes, size_t *bytes_read,
+ int *status);
typedef int (*io_cleanup)(void *src);
typedef struct parser_t {
@@ -155,38 +150,38 @@ typedef struct parser_t {
// Store words in (potentially ragged) matrix for now, hmm
char **words;
- int *word_starts; // where we are in the stream
+ int *word_starts; // where we are in the stream
int words_len;
int words_cap;
- char *pword_start; // pointer to stream start of current field
- int word_start; // position start of current field
+ char *pword_start; // pointer to stream start of current field
+ int word_start; // position start of current field
- int *line_start; // position in words for start of line
- int *line_fields; // Number of fields in each line
- int lines; // Number of (good) lines observed
- int file_lines; // Number of file lines observed (including bad or skipped)
- int lines_cap; // Vector capacity
+ int *line_start; // position in words for start of line
+ int *line_fields; // Number of fields in each line
+ int lines; // Number of (good) lines observed
+ int file_lines; // Number of file lines observed (including bad or skipped)
+ int lines_cap; // Vector capacity
// Tokenizing stuff
ParserState state;
- int doublequote; /* is " represented by ""? */
- char delimiter; /* field separator */
- int delim_whitespace; /* delimit by consuming space/tabs instead */
- char quotechar; /* quote character */
- char escapechar; /* escape character */
+ int doublequote; /* is " represented by ""? */
+ char delimiter; /* field separator */
+ int delim_whitespace; /* delimit by consuming space/tabs instead */
+ char quotechar; /* quote character */
+ char escapechar; /* escape character */
char lineterminator;
- int skipinitialspace; /* ignore spaces following delimiter? */
- int quoting; /* style of quoting to write */
+ int skipinitialspace; /* ignore spaces following delimiter? */
+ int quoting; /* style of quoting to write */
// krufty, hmm =/
int numeric_field;
char commentchar;
int allow_embedded_newline;
- int strict; /* raise exception on bad CSV */
+ int strict; /* raise exception on bad CSV */
- int usecols; // Boolean: 1: usecols provided, 0: none provided
+ int usecols; // Boolean: 1: usecols provided, 0: none provided
int expected_fields;
int error_bad_lines;
@@ -199,9 +194,9 @@ typedef struct parser_t {
// thousands separator (comma, period)
char thousands;
- int header; // Boolean: 1: has header, 0: no header
- int header_start; // header row start
- int header_end; // header row end
+ int header; // Boolean: 1: has header, 0: no header
+ int header_start; // header row start
+ int header_end; // header row end
void *skipset;
int64_t skip_first_N_rows;
@@ -215,7 +210,6 @@ typedef struct parser_t {
int skip_empty_lines;
} parser_t;
-
typedef struct coliter_t {
char **words;
int *line_start;
@@ -225,15 +219,13 @@ typedef struct coliter_t {
void coliter_setup(coliter_t *self, parser_t *parser, int i, int start);
coliter_t *coliter_new(parser_t *self, int i);
-/* #define COLITER_NEXT(iter) iter->words[iter->line_start[iter->line++] + iter->col] */
-// #define COLITER_NEXT(iter) iter.words[iter.line_start[iter.line++] + iter.col]
+#define COLITER_NEXT(iter, word) \
+ do { \
+ const int i = *iter.line_start++ + iter.col; \
+ word = i < *iter.line_start ? iter.words[i] : ""; \
+ } while (0)
-#define COLITER_NEXT(iter, word) do { \
- const int i = *iter.line_start++ + iter.col; \
- word = i < *iter.line_start ? iter.words[i]: ""; \
- } while(0)
-
-parser_t* parser_new(void);
+parser_t *parser_new(void);
int parser_init(parser_t *self);
@@ -255,24 +247,17 @@ int tokenize_nrows(parser_t *self, size_t nrows);
int tokenize_all_rows(parser_t *self);
-/*
-
- Have parsed / type-converted a chunk of data and want to free memory from the
- token stream
-
- */
-//int clear_parsed_lines(parser_t *self, size_t nlines);
-
-int64_t str_to_int64(const char *p_item, int64_t int_min,
- int64_t int_max, int *error, char tsep);
-//uint64_t str_to_uint64(const char *p_item, uint64_t uint_max, int *error);
-
-double xstrtod(const char *p, char **q, char decimal, char sci, char tsep, int skip_trailing);
-double precise_xstrtod(const char *p, char **q, char decimal, char sci, char tsep, int skip_trailing);
-double round_trip(const char *p, char **q, char decimal, char sci, char tsep, int skip_trailing);
-//int P_INLINE to_complex(char *item, double *p_real, double *p_imag, char sci, char decimal);
-//int P_INLINE to_longlong(char *item, long long *p_value);
-//int P_INLINE to_longlong_thousands(char *item, long long *p_value, char tsep);
+// Have parsed / type-converted a chunk of data
+// and want to free memory from the token stream
+
+int64_t str_to_int64(const char *p_item, int64_t int_min, int64_t int_max,
+ int *error, char tsep);
+double xstrtod(const char *p, char **q, char decimal, char sci, char tsep,
+ int skip_trailing);
+double precise_xstrtod(const char *p, char **q, char decimal, char sci,
+ char tsep, int skip_trailing);
+double round_trip(const char *p, char **q, char decimal, char sci, char tsep,
+ int skip_trailing);
int to_boolean(const char *item, uint8_t *val);
-#endif // _PARSER_COMMON_H_
+#endif // PANDAS_SRC_PARSER_TOKENIZER_H_
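
``COLITER_NEXT`` keeps its ``do { ... } while (0)`` wrapper through the
reformatting. A small illustration (a generic macro, not the pandas one) of
why multi-statement macros are wrapped this way::

    #include <stdio.h>

    /* Without the wrapper, the macro's statements would split apart when
       used under an unbraced if, breaking any else that follows. */
    #define SWAP_INT(a, b) \
        do {               \
            int tmp = (a); \
            (a) = (b);     \
            (b) = tmp;     \
        } while (0)

    int main(void) {
        int x = 2, y = 1;
        if (x > y)
            SWAP_INT(x, y);  /* expands to exactly one statement */
        else
            printf("already ordered\n");
        printf("x=%d y=%d\n", x, y);  /* prints x=1 y=2 */
        return 0;
    }
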
diff --git a/pandas/src/period.pyx b/pandas/src/period.pyx
index 5565f25937394..2d92b9f192328 100644
--- a/pandas/src/period.pyx
+++ b/pandas/src/period.pyx
@@ -45,12 +45,12 @@ cdef bint PY2 = version_info[0] == 2
cdef int64_t NPY_NAT = util.get_nat()
-cdef int US_RESO = frequencies.US_RESO
-cdef int MS_RESO = frequencies.MS_RESO
-cdef int S_RESO = frequencies.S_RESO
-cdef int T_RESO = frequencies.T_RESO
-cdef int H_RESO = frequencies.H_RESO
-cdef int D_RESO = frequencies.D_RESO
+cdef int RESO_US = frequencies.RESO_US
+cdef int RESO_MS = frequencies.RESO_MS
+cdef int RESO_SEC = frequencies.RESO_SEC
+cdef int RESO_MIN = frequencies.RESO_MIN
+cdef int RESO_HR = frequencies.RESO_HR
+cdef int RESO_DAY = frequencies.RESO_DAY
cdef extern from "period_helper.h":
ctypedef struct date_info:
@@ -516,7 +516,7 @@ cpdef resolution(ndarray[int64_t] stamps, tz=None):
cdef:
Py_ssize_t i, n = len(stamps)
pandas_datetimestruct dts
- int reso = D_RESO, curr_reso
+ int reso = RESO_DAY, curr_reso
if tz is not None:
tz = maybe_get_tz(tz)
@@ -535,20 +535,20 @@ cpdef resolution(ndarray[int64_t] stamps, tz=None):
cdef inline int _reso_stamp(pandas_datetimestruct *dts):
if dts.us != 0:
if dts.us % 1000 == 0:
- return MS_RESO
- return US_RESO
+ return RESO_MS
+ return RESO_US
elif dts.sec != 0:
- return S_RESO
+ return RESO_SEC
elif dts.min != 0:
- return T_RESO
+ return RESO_MIN
elif dts.hour != 0:
- return H_RESO
- return D_RESO
+ return RESO_HR
+ return RESO_DAY
cdef _reso_local(ndarray[int64_t] stamps, object tz):
cdef:
Py_ssize_t n = len(stamps)
- int reso = D_RESO, curr_reso
+ int reso = RESO_DAY, curr_reso
ndarray[int64_t] trans, deltas, pos
pandas_datetimestruct dts
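
The ``RESO_*`` renames in ``period.pyx`` leave the finest-resolution cascade
in ``_reso_stamp`` unchanged. A rough C sketch of that cascade (hypothetical
stand-in struct and enum values, not ``pandas_datetimestruct``)::

    #include <stdio.h>

    enum { RESO_US, RESO_MS, RESO_SEC, RESO_MIN, RESO_HR, RESO_DAY };

    /* Stand-in for the datetime struct fields the cascade inspects. */
    typedef struct { int us, sec, min, hour; } dts_t;

    static int reso_stamp(const dts_t *dts) {
        if (dts->us != 0)
            return (dts->us % 1000 == 0) ? RESO_MS : RESO_US;
        else if (dts->sec != 0)
            return RESO_SEC;
        else if (dts->min != 0)
            return RESO_MIN;
        else if (dts->hour != 0)
            return RESO_HR;
        return RESO_DAY;
    }

    int main(void) {
        dts_t t = {0, 30, 0, 0};  /* 30 seconds past the minute */
        printf("%d\n", reso_stamp(&t));  /* prints the RESO_SEC value, 2 */
        return 0;
    }
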
diff --git a/pandas/src/period_helper.c b/pandas/src/period_helper.c
index 6078be6fc3d19..19f810eb54ea7 100644
--- a/pandas/src/period_helper.c
+++ b/pandas/src/period_helper.c
@@ -1,30 +1,37 @@
-#include "period_helper.h"
+/*
+Copyright (c) 2016, PyData Development Team
+All rights reserved.
+Distributed under the terms of the BSD Simplified License.
-/*
- * Borrowed and derived code from scikits.timeseries that we will expose via
- * Cython to pandas. This primarily concerns period representation and
- * frequency conversion routines.
- */
+The full license is in the LICENSE file, distributed with this software.
-/* see end of file for stuff pandas uses (search for 'pandas') */
+Borrowed and derived code from scikits.timeseries that we will expose via
+Cython to pandas. This primarily concerns interval representation and
+frequency conversion routines.
+
+See end of file for stuff pandas uses (search for 'pandas').
+*/
+
+#include "period_helper.h"
/* ------------------------------------------------------------------
* Code derived from scikits.timeseries
* ------------------------------------------------------------------*/
static int mod_compat(int x, int m) {
- int result = x % m;
- if (result < 0) return result + m;
- return result;
+ int result = x % m;
+ if (result < 0) return result + m;
+ return result;
}
static int floordiv(int x, int divisor) {
if (x < 0) {
if (mod_compat(x, divisor)) {
return x / divisor - 1;
+ } else {
+ return x / divisor;
}
- else return x / divisor;
} else {
return x / divisor;
}
@@ -32,19 +39,16 @@ static int floordiv(int x, int divisor) {
/* Table with day offsets for each month (0-based, without and with leap) */
static int month_offset[2][13] = {
- { 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365 },
- { 0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335, 366 }
-};
+ {0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365},
+ {0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335, 366}};
/* Table of number of days in a month (0-based, without and with leap) */
static int days_in_month[2][12] = {
- { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 },
- { 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 }
-};
+ {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31},
+ {31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31}};
/* Return 1/0 iff year points to a leap year in calendar. */
-static int dInfoCalc_Leapyear(npy_int64 year, int calendar)
-{
+static int dInfoCalc_Leapyear(npy_int64 year, int calendar) {
if (calendar == GREGORIAN_CALENDAR) {
return (year % 4 == 0) && ((year % 100 != 0) || (year % 400 == 0));
} else {
@@ -53,8 +57,7 @@ static int dInfoCalc_Leapyear(npy_int64 year, int calendar)
}
/* Return the day of the week for the given absolute date. */
-static int dInfoCalc_DayOfWeek(npy_int64 absdate)
-{
+static int dInfoCalc_DayOfWeek(npy_int64 absdate) {
int day_of_week;
if (absdate >= 1) {
@@ -65,7 +68,7 @@ static int dInfoCalc_DayOfWeek(npy_int64 absdate)
return day_of_week;
}
-static int monthToQuarter(int month) { return ((month-1)/3)+1; }
+static int monthToQuarter(int month) { return ((month - 1) / 3) + 1; }
/* Return the year offset, that is the absolute date of the day
31.12.(year-1) in the given calendar.
@@ -75,23 +78,22 @@ static int monthToQuarter(int month) { return ((month-1)/3)+1; }
using the Gregorian Epoch) value by two days because the Epoch
(0001-01-01) in the Julian calendar lies 2 days before the Epoch in
the Gregorian calendar. */
-static int dInfoCalc_YearOffset(npy_int64 year, int calendar)
-{
+static int dInfoCalc_YearOffset(npy_int64 year, int calendar) {
year--;
if (calendar == GREGORIAN_CALENDAR) {
- if (year >= 0 || -1/4 == -1)
- return year*365 + year/4 - year/100 + year/400;
- else
- return year*365 + (year-3)/4 - (year-99)/100 + (year-399)/400;
- }
- else if (calendar == JULIAN_CALENDAR) {
- if (year >= 0 || -1/4 == -1)
- return year*365 + year/4 - 2;
- else
- return year*365 + (year-3)/4 - 2;
+ if (year >= 0 || -1 / 4 == -1)
+ return year * 365 + year / 4 - year / 100 + year / 400;
+ else
+ return year * 365 + (year - 3) / 4 - (year - 99) / 100 +
+ (year - 399) / 400;
+ } else if (calendar == JULIAN_CALENDAR) {
+ if (year >= 0 || -1 / 4 == -1)
+ return year * 365 + year / 4 - 2;
+ else
+ return year * 365 + (year - 3) / 4 - 2;
}
Py_Error(PyExc_ValueError, "unknown calendar");
- onError:
+onError:
return INT_ERR_CODE;
}
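
The correction described in the comment above is easier to follow with one worked value: for the Gregorian calendar, the offset of 1970 is the absolute date of 1969-12-31, one day before the epoch ordinal used elsewhere in this file. A sketch relying only on the arithmetic shown above and the ORD_OFFSET constant from period_helper.h:

    /* dInfoCalc_YearOffset(1970, GREGORIAN_CALENDAR)
     *   = 1969*365 + 1969/4 - 1969/100 + 1969/400
     *   = 718685 + 492 - 19 + 4
     *   = 719162,
     * and 719162 + 1 == ORD_OFFSET (719163), the absolute date of 1970-01-01. */
    assert(dInfoCalc_YearOffset(1970, GREGORIAN_CALENDAR) + 1 == ORD_OFFSET);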
@@ -99,39 +101,32 @@ static int dInfoCalc_YearOffset(npy_int64 year, int calendar)
* to the flags: GREGORIAN_CALENDAR, JULIAN_CALENDAR to indicate the calendar
* to be used. */
-static int dInfoCalc_SetFromDateAndTime(struct date_info *dinfo,
- int year, int month, int day, int hour, int minute, double second,
- int calendar)
-{
-
+static int dInfoCalc_SetFromDateAndTime(struct date_info *dinfo, int year,
+ int month, int day, int hour,
+ int minute, double second,
+ int calendar) {
/* Calculate the absolute date */
{
int leap;
- npy_int64 absdate;
+ npy_int64 absdate;
int yearoffset;
/* Range check */
- Py_AssertWithArg(year > -(INT_MAX / 366) && year < (INT_MAX / 366),
- PyExc_ValueError,
- "year out of range: %i",
- year);
+ Py_AssertWithArg(year > -(INT_MAX / 366) && year < (INT_MAX / 366),
+ PyExc_ValueError, "year out of range: %i", year);
/* Is it a leap year ? */
leap = dInfoCalc_Leapyear(year, calendar);
/* Negative month values indicate months relative to the years end */
if (month < 0) month += 13;
- Py_AssertWithArg(month >= 1 && month <= 12,
- PyExc_ValueError,
- "month out of range (1-12): %i",
- month);
+ Py_AssertWithArg(month >= 1 && month <= 12, PyExc_ValueError,
+ "month out of range (1-12): %i", month);
/* Negative values indicate days relative to the months end */
if (day < 0) day += days_in_month[leap][month - 1] + 1;
Py_AssertWithArg(day >= 1 && day <= days_in_month[leap][month - 1],
- PyExc_ValueError,
- "day out of range: %i",
- day);
+ PyExc_ValueError, "day out of range: %i", day);
yearoffset = dInfoCalc_YearOffset(year, calendar);
if (yearoffset == INT_ERR_CODE) goto onError;
@@ -142,7 +137,7 @@ static int dInfoCalc_SetFromDateAndTime(struct date_info *dinfo,
dinfo->year = year;
dinfo->month = month;
- dinfo->quarter = ((month-1)/3)+1;
+ dinfo->quarter = ((month - 1) / 3) + 1;
dinfo->day = day;
dinfo->day_of_week = dInfoCalc_DayOfWeek(absdate);
@@ -153,23 +148,18 @@ static int dInfoCalc_SetFromDateAndTime(struct date_info *dinfo,
/* Calculate the absolute time */
{
- Py_AssertWithArg(hour >= 0 && hour <= 23,
- PyExc_ValueError,
- "hour out of range (0-23): %i",
- hour);
- Py_AssertWithArg(minute >= 0 && minute <= 59,
- PyExc_ValueError,
- "minute out of range (0-59): %i",
- minute);
- Py_AssertWithArg(second >= (double)0.0 &&
+ Py_AssertWithArg(hour >= 0 && hour <= 23, PyExc_ValueError,
+ "hour out of range (0-23): %i", hour);
+ Py_AssertWithArg(minute >= 0 && minute <= 59, PyExc_ValueError,
+ "minute out of range (0-59): %i", minute);
+ Py_AssertWithArg(
+ second >= (double)0.0 &&
(second < (double)60.0 ||
- (hour == 23 && minute == 59 &&
- second < (double)61.0)),
- PyExc_ValueError,
- "second out of range (0.0 - <60.0; <61.0 for 23:59): %f",
- second);
+ (hour == 23 && minute == 59 && second < (double)61.0)),
+ PyExc_ValueError,
+ "second out of range (0.0 - <60.0; <61.0 for 23:59): %f", second);
- dinfo->abstime = (double)(hour*3600 + minute*60) + second;
+ dinfo->abstime = (double)(hour * 3600 + minute * 60) + second;
dinfo->hour = hour;
dinfo->minute = minute;
@@ -177,7 +167,7 @@ static int dInfoCalc_SetFromDateAndTime(struct date_info *dinfo,
}
return 0;
- onError:
+onError:
return INT_ERR_CODE;
}
@@ -186,13 +176,11 @@ static int dInfoCalc_SetFromDateAndTime(struct date_info *dinfo,
XXX This could also be done using some integer arithmetics rather
than with this iterative approach... */
-static
-int dInfoCalc_SetFromAbsDate(register struct date_info *dinfo,
- npy_int64 absdate, int calendar)
-{
+static int dInfoCalc_SetFromAbsDate(register struct date_info *dinfo,
+ npy_int64 absdate, int calendar) {
register npy_int64 year;
npy_int64 yearoffset;
- int leap,dayoffset;
+ int leap, dayoffset;
int *monthoffset;
/* Approximate year */
@@ -220,7 +208,7 @@ int dInfoCalc_SetFromAbsDate(register struct date_info *dinfo,
}
dayoffset = absdate - yearoffset;
- leap = dInfoCalc_Leapyear(year,calendar);
+ leap = dInfoCalc_Leapyear(year, calendar);
/* Forward correction: non leap years only have 365 days */
if (dayoffset > 365 && !leap) {
@@ -239,23 +227,21 @@ int dInfoCalc_SetFromAbsDate(register struct date_info *dinfo,
register int month;
for (month = 1; month < 13; month++) {
- if (monthoffset[month] >= dayoffset)
- break;
+ if (monthoffset[month] >= dayoffset) break;
}
dinfo->month = month;
dinfo->quarter = monthToQuarter(month);
- dinfo->day = dayoffset - month_offset[leap][month-1];
+ dinfo->day = dayoffset - month_offset[leap][month - 1];
}
-
dinfo->day_of_week = dInfoCalc_DayOfWeek(absdate);
dinfo->day_of_year = dayoffset;
dinfo->absdate = absdate;
return 0;
- onError:
+onError:
return INT_ERR_CODE;
}
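
Taken together, dInfoCalc_SetFromDateAndTime and dInfoCalc_SetFromAbsDate act as inverses over the date part. A hedged round-trip sketch, using only fields of the date_info struct declared in period_helper.h:

    struct date_info d, back;
    /* epoch day: d.absdate should come out equal to ORD_OFFSET */
    if (dInfoCalc_SetFromDateAndTime(&d, 1970, 1, 1, 0, 0, 0.0,
                                     GREGORIAN_CALENDAR) == 0) {
        dInfoCalc_SetFromAbsDate(&back, d.absdate, GREGORIAN_CALENDAR);
        /* back.year == 1970, back.month == 1, back.day == 1 */
    }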
@@ -269,39 +255,25 @@ int dInfoCalc_SetFromAbsDate(register struct date_info *dinfo,
// helpers for frequency conversion routines //
static int daytime_conversion_factors[][2] = {
- { FR_DAY, 1 },
- { FR_HR, 24 },
- { FR_MIN, 60 },
- { FR_SEC, 60 },
- { FR_MS, 1000 },
- { FR_US, 1000 },
- { FR_NS, 1000 },
- { 0, 0 }
-};
+ {FR_DAY, 1}, {FR_HR, 24}, {FR_MIN, 60}, {FR_SEC, 60},
+ {FR_MS, 1000}, {FR_US, 1000}, {FR_NS, 1000}, {0, 0}};
-static npy_int64** daytime_conversion_factor_matrix = NULL;
+static npy_int64 **daytime_conversion_factor_matrix = NULL;
-PANDAS_INLINE int max_value(int a, int b) {
- return a > b ? a : b;
-}
+PANDAS_INLINE int max_value(int a, int b) { return a > b ? a : b; }
-PANDAS_INLINE int min_value(int a, int b) {
- return a < b ? a : b;
-}
+PANDAS_INLINE int min_value(int a, int b) { return a < b ? a : b; }
-PANDAS_INLINE int get_freq_group(int freq) {
- return (freq/1000)*1000;
-}
+PANDAS_INLINE int get_freq_group(int freq) { return (freq / 1000) * 1000; }
-PANDAS_INLINE int get_freq_group_index(int freq) {
- return freq/1000;
-}
+PANDAS_INLINE int get_freq_group_index(int freq) { return freq / 1000; }
static int calc_conversion_factors_matrix_size(void) {
int matrix_size = 0;
int index;
- for (index=0;; index++) {
- int period_value = get_freq_group_index(daytime_conversion_factors[index][0]);
+ for (index = 0;; index++) {
+ int period_value =
+ get_freq_group_index(daytime_conversion_factors[index][0]);
if (period_value == 0) {
break;
}
@@ -313,9 +285,11 @@ static int calc_conversion_factors_matrix_size(void) {
static void alloc_conversion_factors_matrix(int matrix_size) {
int row_index;
int column_index;
- daytime_conversion_factor_matrix = malloc(matrix_size * sizeof(**daytime_conversion_factor_matrix));
+ daytime_conversion_factor_matrix =
+ malloc(matrix_size * sizeof(**daytime_conversion_factor_matrix));
for (row_index = 0; row_index < matrix_size; row_index++) {
- daytime_conversion_factor_matrix[row_index] = malloc(matrix_size * sizeof(**daytime_conversion_factor_matrix));
+ daytime_conversion_factor_matrix[row_index] =
+ malloc(matrix_size * sizeof(**daytime_conversion_factor_matrix));
for (column_index = 0; column_index < matrix_size; column_index++) {
daytime_conversion_factor_matrix[row_index][column_index] = 0;
}
@@ -325,7 +299,7 @@ static void alloc_conversion_factors_matrix(int matrix_size) {
static npy_int64 calculate_conversion_factor(int start_value, int end_value) {
npy_int64 conversion_factor = 0;
int index;
- for (index=0;; index++) {
+ for (index = 0;; index++) {
int freq_group = daytime_conversion_factors[index][0];
if (freq_group == 0) {
@@ -348,11 +322,11 @@ static npy_int64 calculate_conversion_factor(int start_value, int end_value) {
static void populate_conversion_factors_matrix(void) {
int row_index_index;
- int row_value, row_index;
+ int row_value, row_index;
int column_index_index;
- int column_value, column_index;
+ int column_value, column_index;
- for (row_index_index = 0;; row_index_index++) {
+ for (row_index_index = 0;; row_index_index++) {
row_value = daytime_conversion_factors[row_index_index][0];
if (row_value == 0) {
break;
@@ -365,7 +339,8 @@ static void populate_conversion_factors_matrix(void) {
}
column_index = get_freq_group_index(column_value);
- daytime_conversion_factor_matrix[row_index][column_index] = calculate_conversion_factor(row_value, column_value);
+ daytime_conversion_factor_matrix[row_index][column_index] =
+ calculate_conversion_factor(row_value, column_value);
}
}
}
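
The matrix populated above caches cumulative products of the per-step table daytime_conversion_factors; because get_daytime_conversion_factor always indexes with the smaller group index first, only entries with row index <= column index are ever read. A sketch of the values this implies (derived from the table, not measured):

    /* factor(FR_DAY, FR_HR)  == 24
     * factor(FR_DAY, FR_SEC) == 24 * 60 * 60       == 86400
     * factor(FR_SEC, FR_NS)  == 1000 * 1000 * 1000 == 1000000000 */
    npy_int64 day_to_sec = get_daytime_conversion_factor(
        get_freq_group_index(FR_DAY), get_freq_group_index(FR_SEC));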
@@ -378,13 +353,14 @@ void initialize_daytime_conversion_factor_matrix() {
}
}
-PANDAS_INLINE npy_int64 get_daytime_conversion_factor(int from_index, int to_index)
-{
- return daytime_conversion_factor_matrix[min_value(from_index, to_index)][max_value(from_index, to_index)];
+PANDAS_INLINE npy_int64 get_daytime_conversion_factor(int from_index,
+ int to_index) {
+ return daytime_conversion_factor_matrix[min_value(from_index, to_index)]
+ [max_value(from_index, to_index)];
}
-PANDAS_INLINE npy_int64 upsample_daytime(npy_int64 ordinal, asfreq_info *af_info, int atEnd)
-{
+PANDAS_INLINE npy_int64 upsample_daytime(npy_int64 ordinal,
+ asfreq_info *af_info, int atEnd) {
if (atEnd) {
return (ordinal + 1) * af_info->intraday_conversion_factor - 1;
} else {
@@ -392,14 +368,19 @@ PANDAS_INLINE npy_int64 upsample_daytime(npy_int64 ordinal, asfreq_info *af_info
}
}
-PANDAS_INLINE npy_int64 downsample_daytime(npy_int64 ordinal, asfreq_info *af_info, int atEnd)
-{
+PANDAS_INLINE npy_int64 downsample_daytime(npy_int64 ordinal,
+ asfreq_info *af_info, int atEnd) {
return ordinal / (af_info->intraday_conversion_factor);
}
-PANDAS_INLINE npy_int64 transform_via_day(npy_int64 ordinal, char relation, asfreq_info *af_info, freq_conv_func first_func, freq_conv_func second_func) {
- //printf("transform_via_day(%ld, %ld, %d)\n", ordinal, af_info->intraday_conversion_factor, af_info->intraday_conversion_upsample);
- npy_int64 result;
+PANDAS_INLINE npy_int64 transform_via_day(npy_int64 ordinal, char relation,
+ asfreq_info *af_info,
+ freq_conv_func first_func,
+ freq_conv_func second_func) {
+ // printf("transform_via_day(%ld, %ld, %d)\n", ordinal,
+ // af_info->intraday_conversion_factor,
+ // af_info->intraday_conversion_upsample);
+ npy_int64 result;
result = (*first_func)(ordinal, relation, af_info);
result = (*second_func)(result, relation, af_info);
@@ -413,7 +394,7 @@ static npy_int64 DtoB_weekday(npy_int64 absdate) {
static npy_int64 DtoB_WeekendToMonday(npy_int64 absdate, int day_of_week) {
if (day_of_week > 4) {
- //change to Monday after weekend
+ // change to Monday after weekend
absdate += (7 - day_of_week);
}
return DtoB_weekday(absdate);
@@ -421,7 +402,7 @@ static npy_int64 DtoB_WeekendToMonday(npy_int64 absdate, int day_of_week) {
static npy_int64 DtoB_WeekendToFriday(npy_int64 absdate, int day_of_week) {
if (day_of_week > 4) {
- //change to friday before weekend
+ // change to friday before weekend
absdate -= (day_of_week - 4);
}
return DtoB_weekday(absdate);
@@ -429,7 +410,8 @@ static npy_int64 DtoB_WeekendToFriday(npy_int64 absdate, int day_of_week) {
static npy_int64 absdate_from_ymd(int y, int m, int d) {
struct date_info tempDate;
- if (dInfoCalc_SetFromDateAndTime(&tempDate, y, m, d, 0, 0, 0, GREGORIAN_CALENDAR)) {
+ if (dInfoCalc_SetFromDateAndTime(&tempDate, y, m, d, 0, 0, 0,
+ GREGORIAN_CALENDAR)) {
return INT_ERR_CODE;
}
return tempDate.absdate;
@@ -437,27 +419,33 @@ static npy_int64 absdate_from_ymd(int y, int m, int d) {
//************ FROM DAILY ***************
-static npy_int64 asfreq_DTtoA(npy_int64 ordinal, char relation, asfreq_info *af_info) {
+static npy_int64 asfreq_DTtoA(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
struct date_info dinfo;
ordinal = downsample_daytime(ordinal, af_info, 0);
- if (dInfoCalc_SetFromAbsDate(&dinfo, ordinal + ORD_OFFSET, GREGORIAN_CALENDAR))
+ if (dInfoCalc_SetFromAbsDate(&dinfo, ordinal + ORD_OFFSET,
+ GREGORIAN_CALENDAR))
return INT_ERR_CODE;
if (dinfo.month > af_info->to_a_year_end) {
return (npy_int64)(dinfo.year + 1 - BASE_YEAR);
- }
- else {
+ } else {
return (npy_int64)(dinfo.year - BASE_YEAR);
}
}
-static npy_int64 DtoQ_yq(npy_int64 ordinal, asfreq_info *af_info, int *year, int *quarter) {
+static npy_int64 DtoQ_yq(npy_int64 ordinal, asfreq_info *af_info, int *year,
+ int *quarter) {
struct date_info dinfo;
- if (dInfoCalc_SetFromAbsDate(&dinfo, ordinal + ORD_OFFSET, GREGORIAN_CALENDAR))
+ if (dInfoCalc_SetFromAbsDate(&dinfo, ordinal + ORD_OFFSET,
+ GREGORIAN_CALENDAR))
return INT_ERR_CODE;
if (af_info->to_q_year_end != 12) {
dinfo.month -= af_info->to_q_year_end;
- if (dinfo.month <= 0) { dinfo.month += 12; }
- else { dinfo.year += 1; }
+ if (dinfo.month <= 0) {
+ dinfo.month += 12;
+ } else {
+ dinfo.year += 1;
+ }
dinfo.quarter = monthToQuarter(dinfo.month);
}
@@ -467,7 +455,8 @@ static npy_int64 DtoQ_yq(npy_int64 ordinal, asfreq_info *af_info, int *year, int
return 0;
}
-static npy_int64 asfreq_DTtoQ(npy_int64 ordinal, char relation, asfreq_info *af_info) {
+static npy_int64 asfreq_DTtoQ(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
int year, quarter;
ordinal = downsample_daytime(ordinal, af_info, 0);
@@ -479,27 +468,33 @@ static npy_int64 asfreq_DTtoQ(npy_int64 ordinal, char relation, asfreq_info *af_
return (npy_int64)((year - BASE_YEAR) * 4 + quarter - 1);
}
-static npy_int64 asfreq_DTtoM(npy_int64 ordinal, char relation, asfreq_info *af_info) {
+static npy_int64 asfreq_DTtoM(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
struct date_info dinfo;
ordinal = downsample_daytime(ordinal, af_info, 0);
- if (dInfoCalc_SetFromAbsDate(&dinfo, ordinal + ORD_OFFSET, GREGORIAN_CALENDAR))
+ if (dInfoCalc_SetFromAbsDate(&dinfo, ordinal + ORD_OFFSET,
+ GREGORIAN_CALENDAR))
return INT_ERR_CODE;
return (npy_int64)((dinfo.year - BASE_YEAR) * 12 + dinfo.month - 1);
}
-static npy_int64 asfreq_DTtoW(npy_int64 ordinal, char relation, asfreq_info *af_info) {
+static npy_int64 asfreq_DTtoW(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
ordinal = downsample_daytime(ordinal, af_info, 0);
- return (ordinal + ORD_OFFSET - (1 + af_info->to_week_end))/7 + 1 - WEEK_OFFSET;
+ return (ordinal + ORD_OFFSET - (1 + af_info->to_week_end)) / 7 + 1 -
+ WEEK_OFFSET;
}
-static npy_int64 asfreq_DTtoB(npy_int64 ordinal, char relation, asfreq_info *af_info) {
+static npy_int64 asfreq_DTtoB(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
struct date_info dinfo;
- ordinal = downsample_daytime(ordinal, af_info, 0);
+ ordinal = downsample_daytime(ordinal, af_info, 0);
- if (dInfoCalc_SetFromAbsDate(&dinfo, ordinal + ORD_OFFSET, GREGORIAN_CALENDAR))
+ if (dInfoCalc_SetFromAbsDate(&dinfo, ordinal + ORD_OFFSET,
+ GREGORIAN_CALENDAR))
return INT_ERR_CODE;
if (relation == 'S') {
@@ -510,43 +505,54 @@ static npy_int64 asfreq_DTtoB(npy_int64 ordinal, char relation, asfreq_info *af_
}
// all intra day calculations are now done within one function
-static npy_int64 asfreq_DownsampleWithinDay(npy_int64 ordinal, char relation, asfreq_info *af_info) {
+static npy_int64 asfreq_DownsampleWithinDay(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
return downsample_daytime(ordinal, af_info, relation == 'E');
}
-static npy_int64 asfreq_UpsampleWithinDay(npy_int64 ordinal, char relation, asfreq_info *af_info) {
+static npy_int64 asfreq_UpsampleWithinDay(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
return upsample_daytime(ordinal, af_info, relation == 'E');
}
//************ FROM BUSINESS ***************
-static npy_int64 asfreq_BtoDT(npy_int64 ordinal, char relation, asfreq_info *af_info)
-{
+static npy_int64 asfreq_BtoDT(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
ordinal += BDAY_OFFSET;
- ordinal = (((ordinal - 1) / 5) * 7 +
- mod_compat(ordinal - 1, 5) + 1 - ORD_OFFSET);
+ ordinal =
+ (((ordinal - 1) / 5) * 7 + mod_compat(ordinal - 1, 5) + 1 - ORD_OFFSET);
return upsample_daytime(ordinal, af_info, relation != 'S');
}
-static npy_int64 asfreq_BtoA(npy_int64 ordinal, char relation, asfreq_info *af_info) {
- return transform_via_day(ordinal, relation, af_info, asfreq_BtoDT, asfreq_DTtoA);
+static npy_int64 asfreq_BtoA(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
+ return transform_via_day(ordinal, relation, af_info, asfreq_BtoDT,
+ asfreq_DTtoA);
}
-static npy_int64 asfreq_BtoQ(npy_int64 ordinal, char relation, asfreq_info *af_info) {
- return transform_via_day(ordinal, relation, af_info, asfreq_BtoDT, asfreq_DTtoQ);
+static npy_int64 asfreq_BtoQ(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
+ return transform_via_day(ordinal, relation, af_info, asfreq_BtoDT,
+ asfreq_DTtoQ);
}
-static npy_int64 asfreq_BtoM(npy_int64 ordinal, char relation, asfreq_info *af_info) {
- return transform_via_day(ordinal, relation, af_info, asfreq_BtoDT, asfreq_DTtoM);
+static npy_int64 asfreq_BtoM(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
+ return transform_via_day(ordinal, relation, af_info, asfreq_BtoDT,
+ asfreq_DTtoM);
}
-static npy_int64 asfreq_BtoW(npy_int64 ordinal, char relation, asfreq_info *af_info) {
- return transform_via_day(ordinal, relation, af_info, asfreq_BtoDT, asfreq_DTtoW);
+static npy_int64 asfreq_BtoW(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
+ return transform_via_day(ordinal, relation, af_info, asfreq_BtoDT,
+ asfreq_DTtoW);
}
//************ FROM WEEKLY ***************
-static npy_int64 asfreq_WtoDT(npy_int64 ordinal, char relation, asfreq_info *af_info) {
+static npy_int64 asfreq_WtoDT(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
ordinal += WEEK_OFFSET;
if (relation != 'S') {
ordinal += 1;
@@ -561,33 +567,41 @@ static npy_int64 asfreq_WtoDT(npy_int64 ordinal, char relation, asfreq_info *af_
return upsample_daytime(ordinal, af_info, relation != 'S');
}
-static npy_int64 asfreq_WtoA(npy_int64 ordinal, char relation, asfreq_info *af_info) {
- return transform_via_day(ordinal, relation, af_info, asfreq_WtoDT, asfreq_DTtoA);
+static npy_int64 asfreq_WtoA(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
+ return transform_via_day(ordinal, relation, af_info, asfreq_WtoDT,
+ asfreq_DTtoA);
}
-static npy_int64 asfreq_WtoQ(npy_int64 ordinal, char relation, asfreq_info *af_info) {
- return transform_via_day(ordinal, relation, af_info, asfreq_WtoDT, asfreq_DTtoQ);
+static npy_int64 asfreq_WtoQ(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
+ return transform_via_day(ordinal, relation, af_info, asfreq_WtoDT,
+ asfreq_DTtoQ);
}
-static npy_int64 asfreq_WtoM(npy_int64 ordinal, char relation, asfreq_info *af_info) {
- return transform_via_day(ordinal, relation, af_info, asfreq_WtoDT, asfreq_DTtoM);
+static npy_int64 asfreq_WtoM(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
+ return transform_via_day(ordinal, relation, af_info, asfreq_WtoDT,
+ asfreq_DTtoM);
}
-static npy_int64 asfreq_WtoW(npy_int64 ordinal, char relation, asfreq_info *af_info) {
- return transform_via_day(ordinal, relation, af_info, asfreq_WtoDT, asfreq_DTtoW);
+static npy_int64 asfreq_WtoW(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
+ return transform_via_day(ordinal, relation, af_info, asfreq_WtoDT,
+ asfreq_DTtoW);
}
-static npy_int64 asfreq_WtoB(npy_int64 ordinal, char relation, asfreq_info *af_info) {
-
+static npy_int64 asfreq_WtoB(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
struct date_info dinfo;
- if (dInfoCalc_SetFromAbsDate(&dinfo,
- asfreq_WtoDT(ordinal, relation, af_info) + ORD_OFFSET,
- GREGORIAN_CALENDAR)) return INT_ERR_CODE;
+ if (dInfoCalc_SetFromAbsDate(
+ &dinfo, asfreq_WtoDT(ordinal, relation, af_info) + ORD_OFFSET,
+ GREGORIAN_CALENDAR))
+ return INT_ERR_CODE;
if (relation == 'S') {
return DtoB_WeekendToMonday(dinfo.absdate, dinfo.day_of_week);
- }
- else {
+ } else {
return DtoB_WeekendToFriday(dinfo.absdate, dinfo.day_of_week);
}
}
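
Every asfreq_XtoY conversion above that does not start or end at the daily group is a composition through daily ordinals via transform_via_day. For example, asfreq_WtoM(ordinal, relation, af_info) behaves like the following sketch:

    npy_int64 daily   = asfreq_WtoDT(ordinal, relation, af_info);  /* weekly -> daily   */
    npy_int64 monthly = asfreq_DTtoM(daily, relation, af_info);    /* daily  -> monthly */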
@@ -598,46 +612,58 @@ static void MtoD_ym(npy_int64 ordinal, int *y, int *m) {
*m = mod_compat(ordinal, 12) + 1;
}
-
-static npy_int64 asfreq_MtoDT(npy_int64 ordinal, char relation, asfreq_info* af_info) {
+static npy_int64 asfreq_MtoDT(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
npy_int64 absdate;
int y, m;
if (relation == 'E') {
- ordinal += 1;
+ ordinal += 1;
}
MtoD_ym(ordinal, &y, &m);
- if ((absdate = absdate_from_ymd(y, m, 1)) == INT_ERR_CODE) return INT_ERR_CODE;
+ if ((absdate = absdate_from_ymd(y, m, 1)) == INT_ERR_CODE)
+ return INT_ERR_CODE;
ordinal = absdate - ORD_OFFSET;
if (relation == 'E') {
- ordinal -= 1;
+ ordinal -= 1;
}
return upsample_daytime(ordinal, af_info, relation != 'S');
}
-static npy_int64 asfreq_MtoA(npy_int64 ordinal, char relation, asfreq_info *af_info) {
- return transform_via_day(ordinal, relation, af_info, asfreq_MtoDT, asfreq_DTtoA);
+static npy_int64 asfreq_MtoA(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
+ return transform_via_day(ordinal, relation, af_info, asfreq_MtoDT,
+ asfreq_DTtoA);
}
-static npy_int64 asfreq_MtoQ(npy_int64 ordinal, char relation, asfreq_info *af_info) {
- return transform_via_day(ordinal, relation, af_info, asfreq_MtoDT, asfreq_DTtoQ);
+static npy_int64 asfreq_MtoQ(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
+ return transform_via_day(ordinal, relation, af_info, asfreq_MtoDT,
+ asfreq_DTtoQ);
}
-static npy_int64 asfreq_MtoW(npy_int64 ordinal, char relation, asfreq_info *af_info) {
- return transform_via_day(ordinal, relation, af_info, asfreq_MtoDT, asfreq_DTtoW);
+static npy_int64 asfreq_MtoW(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
+ return transform_via_day(ordinal, relation, af_info, asfreq_MtoDT,
+ asfreq_DTtoW);
}
-static npy_int64 asfreq_MtoB(npy_int64 ordinal, char relation, asfreq_info *af_info) {
+static npy_int64 asfreq_MtoB(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
struct date_info dinfo;
-
- if (dInfoCalc_SetFromAbsDate(&dinfo,
- asfreq_MtoDT(ordinal, relation, af_info) + ORD_OFFSET,
- GREGORIAN_CALENDAR)) return INT_ERR_CODE;
- if (relation == 'S') { return DtoB_WeekendToMonday(dinfo.absdate, dinfo.day_of_week); }
- else { return DtoB_WeekendToFriday(dinfo.absdate, dinfo.day_of_week); }
+ if (dInfoCalc_SetFromAbsDate(
+ &dinfo, asfreq_MtoDT(ordinal, relation, af_info) + ORD_OFFSET,
+ GREGORIAN_CALENDAR))
+ return INT_ERR_CODE;
+
+ if (relation == 'S') {
+ return DtoB_WeekendToMonday(dinfo.absdate, dinfo.day_of_week);
+ } else {
+ return DtoB_WeekendToFriday(dinfo.absdate, dinfo.day_of_week);
+ }
}
//************ FROM QUARTERLY ***************
@@ -648,62 +674,78 @@ static void QtoD_ym(npy_int64 ordinal, int *y, int *m, asfreq_info *af_info) {
if (af_info->from_q_year_end != 12) {
*m += af_info->from_q_year_end;
- if (*m > 12) { *m -= 12; }
- else { *y -= 1; }
+ if (*m > 12) {
+ *m -= 12;
+ } else {
+ *y -= 1;
+ }
}
}
-static npy_int64 asfreq_QtoDT(npy_int64 ordinal, char relation, asfreq_info *af_info) {
-
+static npy_int64 asfreq_QtoDT(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
npy_int64 absdate;
int y, m;
if (relation == 'E') {
- ordinal += 1;
+ ordinal += 1;
}
QtoD_ym(ordinal, &y, &m, af_info);
- if ((absdate = absdate_from_ymd(y, m, 1)) == INT_ERR_CODE) return INT_ERR_CODE;
+ if ((absdate = absdate_from_ymd(y, m, 1)) == INT_ERR_CODE)
+ return INT_ERR_CODE;
if (relation == 'E') {
- absdate -= 1;
+ absdate -= 1;
}
return upsample_daytime(absdate - ORD_OFFSET, af_info, relation != 'S');
}
-static npy_int64 asfreq_QtoQ(npy_int64 ordinal, char relation, asfreq_info *af_info) {
- return transform_via_day(ordinal, relation, af_info, asfreq_QtoDT, asfreq_DTtoQ);
+static npy_int64 asfreq_QtoQ(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
+ return transform_via_day(ordinal, relation, af_info, asfreq_QtoDT,
+ asfreq_DTtoQ);
}
-static npy_int64 asfreq_QtoA(npy_int64 ordinal, char relation, asfreq_info *af_info) {
- return transform_via_day(ordinal, relation, af_info, asfreq_QtoDT, asfreq_DTtoA);
+static npy_int64 asfreq_QtoA(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
+ return transform_via_day(ordinal, relation, af_info, asfreq_QtoDT,
+ asfreq_DTtoA);
}
-static npy_int64 asfreq_QtoM(npy_int64 ordinal, char relation, asfreq_info *af_info) {
- return transform_via_day(ordinal, relation, af_info, asfreq_QtoDT, asfreq_DTtoM);
+static npy_int64 asfreq_QtoM(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
+ return transform_via_day(ordinal, relation, af_info, asfreq_QtoDT,
+ asfreq_DTtoM);
}
-static npy_int64 asfreq_QtoW(npy_int64 ordinal, char relation, asfreq_info *af_info) {
- return transform_via_day(ordinal, relation, af_info, asfreq_QtoDT, asfreq_DTtoW);
+static npy_int64 asfreq_QtoW(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
+ return transform_via_day(ordinal, relation, af_info, asfreq_QtoDT,
+ asfreq_DTtoW);
}
-static npy_int64 asfreq_QtoB(npy_int64 ordinal, char relation, asfreq_info *af_info) {
-
+static npy_int64 asfreq_QtoB(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
struct date_info dinfo;
- if (dInfoCalc_SetFromAbsDate(&dinfo,
- asfreq_QtoDT(ordinal, relation, af_info) + ORD_OFFSET,
- GREGORIAN_CALENDAR)) return INT_ERR_CODE;
+ if (dInfoCalc_SetFromAbsDate(
+ &dinfo, asfreq_QtoDT(ordinal, relation, af_info) + ORD_OFFSET,
+ GREGORIAN_CALENDAR))
+ return INT_ERR_CODE;
- if (relation == 'S') { return DtoB_WeekendToMonday(dinfo.absdate, dinfo.day_of_week); }
- else { return DtoB_WeekendToFriday(dinfo.absdate, dinfo.day_of_week); }
+ if (relation == 'S') {
+ return DtoB_WeekendToMonday(dinfo.absdate, dinfo.day_of_week);
+ } else {
+ return DtoB_WeekendToFriday(dinfo.absdate, dinfo.day_of_week);
+ }
}
-
//************ FROM ANNUAL ***************
-static npy_int64 asfreq_AtoDT(npy_int64 year, char relation, asfreq_info *af_info) {
+static npy_int64 asfreq_AtoDT(npy_int64 year, char relation,
+ asfreq_info *af_info) {
npy_int64 absdate;
int month = (af_info->from_a_year_end) % 12;
@@ -713,164 +755,193 @@ static npy_int64 asfreq_AtoDT(npy_int64 year, char relation, asfreq_info *af_inf
month += 1;
if (af_info->from_a_year_end != 12) {
- year -= 1;
+ year -= 1;
}
if (relation == 'E') {
- year += 1;
+ year += 1;
}
absdate = absdate_from_ymd(year, month, 1);
- if (absdate == INT_ERR_CODE) {
+ if (absdate == INT_ERR_CODE) {
return INT_ERR_CODE;
}
if (relation == 'E') {
- absdate -= 1;
+ absdate -= 1;
}
return upsample_daytime(absdate - ORD_OFFSET, af_info, relation != 'S');
}
-static npy_int64 asfreq_AtoA(npy_int64 ordinal, char relation, asfreq_info *af_info) {
- return transform_via_day(ordinal, relation, af_info, asfreq_AtoDT, asfreq_DTtoA);
+static npy_int64 asfreq_AtoA(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
+ return transform_via_day(ordinal, relation, af_info, asfreq_AtoDT,
+ asfreq_DTtoA);
}
-static npy_int64 asfreq_AtoQ(npy_int64 ordinal, char relation, asfreq_info *af_info) {
- return transform_via_day(ordinal, relation, af_info, asfreq_AtoDT, asfreq_DTtoQ);
+static npy_int64 asfreq_AtoQ(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
+ return transform_via_day(ordinal, relation, af_info, asfreq_AtoDT,
+ asfreq_DTtoQ);
}
-static npy_int64 asfreq_AtoM(npy_int64 ordinal, char relation, asfreq_info *af_info) {
- return transform_via_day(ordinal, relation, af_info, asfreq_AtoDT, asfreq_DTtoM);
+static npy_int64 asfreq_AtoM(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
+ return transform_via_day(ordinal, relation, af_info, asfreq_AtoDT,
+ asfreq_DTtoM);
}
-static npy_int64 asfreq_AtoW(npy_int64 ordinal, char relation, asfreq_info *af_info) {
- return transform_via_day(ordinal, relation, af_info, asfreq_AtoDT, asfreq_DTtoW);
+static npy_int64 asfreq_AtoW(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
+ return transform_via_day(ordinal, relation, af_info, asfreq_AtoDT,
+ asfreq_DTtoW);
}
-static npy_int64 asfreq_AtoB(npy_int64 ordinal, char relation, asfreq_info *af_info) {
-
+static npy_int64 asfreq_AtoB(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
struct date_info dinfo;
- if (dInfoCalc_SetFromAbsDate(&dinfo,
- asfreq_AtoDT(ordinal, relation, af_info) + ORD_OFFSET,
- GREGORIAN_CALENDAR)) return INT_ERR_CODE;
+ if (dInfoCalc_SetFromAbsDate(
+ &dinfo, asfreq_AtoDT(ordinal, relation, af_info) + ORD_OFFSET,
+ GREGORIAN_CALENDAR))
+ return INT_ERR_CODE;
- if (relation == 'S') { return DtoB_WeekendToMonday(dinfo.absdate, dinfo.day_of_week); }
- else { return DtoB_WeekendToFriday(dinfo.absdate, dinfo.day_of_week); }
+ if (relation == 'S') {
+ return DtoB_WeekendToMonday(dinfo.absdate, dinfo.day_of_week);
+ } else {
+ return DtoB_WeekendToFriday(dinfo.absdate, dinfo.day_of_week);
+ }
}
-static npy_int64 nofunc(npy_int64 ordinal, char relation, asfreq_info *af_info) { return INT_ERR_CODE; }
-static npy_int64 no_op(npy_int64 ordinal, char relation, asfreq_info *af_info) { return ordinal; }
+static npy_int64 nofunc(npy_int64 ordinal, char relation,
+ asfreq_info *af_info) {
+ return INT_ERR_CODE;
+}
+static npy_int64 no_op(npy_int64 ordinal, char relation, asfreq_info *af_info) {
+ return ordinal;
+}
// end of frequency specific conversion routines
static int calc_a_year_end(int freq, int group) {
int result = (freq - group) % 12;
- if (result == 0) {return 12;}
- else {return result;}
+ if (result == 0) {
+ return 12;
+ } else {
+ return result;
+ }
}
-static int calc_week_end(int freq, int group) {
- return freq - group;
-}
+static int calc_week_end(int freq, int group) { return freq - group; }
void get_asfreq_info(int fromFreq, int toFreq, asfreq_info *af_info) {
int fromGroup = get_freq_group(fromFreq);
int toGroup = get_freq_group(toFreq);
- af_info->intraday_conversion_factor =
- get_daytime_conversion_factor(
- get_freq_group_index(max_value(fromGroup, FR_DAY)),
- get_freq_group_index(max_value(toGroup, FR_DAY))
- );
+ af_info->intraday_conversion_factor = get_daytime_conversion_factor(
+ get_freq_group_index(max_value(fromGroup, FR_DAY)),
+ get_freq_group_index(max_value(toGroup, FR_DAY)));
- //printf("get_asfreq_info(%d, %d) %ld, %d\n", fromFreq, toFreq, af_info->intraday_conversion_factor, af_info->intraday_conversion_upsample);
+ // printf("get_asfreq_info(%d, %d) %ld, %d\n", fromFreq, toFreq,
+ // af_info->intraday_conversion_factor,
+ // af_info->intraday_conversion_upsample);
- switch(fromGroup)
- {
- case FR_WK:
+ switch (fromGroup) {
+ case FR_WK:
af_info->from_week_end = calc_week_end(fromFreq, fromGroup);
break;
- case FR_ANN:
+ case FR_ANN:
af_info->from_a_year_end = calc_a_year_end(fromFreq, fromGroup);
break;
- case FR_QTR:
+ case FR_QTR:
af_info->from_q_year_end = calc_a_year_end(fromFreq, fromGroup);
break;
}
- switch(toGroup)
- {
- case FR_WK:
+ switch (toGroup) {
+ case FR_WK:
af_info->to_week_end = calc_week_end(toFreq, toGroup);
break;
- case FR_ANN:
+ case FR_ANN:
af_info->to_a_year_end = calc_a_year_end(toFreq, toGroup);
break;
- case FR_QTR:
+ case FR_QTR:
af_info->to_q_year_end = calc_a_year_end(toFreq, toGroup);
break;
}
}
-
-freq_conv_func get_asfreq_func(int fromFreq, int toFreq)
-{
+freq_conv_func get_asfreq_func(int fromFreq, int toFreq) {
int fromGroup = get_freq_group(fromFreq);
int toGroup = get_freq_group(toFreq);
- if (fromGroup == FR_UND) { fromGroup = FR_DAY; }
+ if (fromGroup == FR_UND) {
+ fromGroup = FR_DAY;
+ }
- switch(fromGroup)
- {
+ switch (fromGroup) {
case FR_ANN:
- switch(toGroup)
- {
- case FR_ANN: return &asfreq_AtoA;
- case FR_QTR: return &asfreq_AtoQ;
- case FR_MTH: return &asfreq_AtoM;
- case FR_WK: return &asfreq_AtoW;
- case FR_BUS: return &asfreq_AtoB;
- case FR_DAY:
- case FR_HR:
- case FR_MIN:
+ switch (toGroup) {
+ case FR_ANN:
+ return &asfreq_AtoA;
+ case FR_QTR:
+ return &asfreq_AtoQ;
+ case FR_MTH:
+ return &asfreq_AtoM;
+ case FR_WK:
+ return &asfreq_AtoW;
+ case FR_BUS:
+ return &asfreq_AtoB;
+ case FR_DAY:
+ case FR_HR:
+ case FR_MIN:
case FR_SEC:
case FR_MS:
case FR_US:
case FR_NS:
- return &asfreq_AtoDT;
+ return &asfreq_AtoDT;
- default: return &nofunc;
+ default:
+ return &nofunc;
}
case FR_QTR:
- switch(toGroup)
- {
- case FR_ANN: return &asfreq_QtoA;
- case FR_QTR: return &asfreq_QtoQ;
- case FR_MTH: return &asfreq_QtoM;
- case FR_WK: return &asfreq_QtoW;
- case FR_BUS: return &asfreq_QtoB;
- case FR_DAY:
+ switch (toGroup) {
+ case FR_ANN:
+ return &asfreq_QtoA;
+ case FR_QTR:
+ return &asfreq_QtoQ;
+ case FR_MTH:
+ return &asfreq_QtoM;
+ case FR_WK:
+ return &asfreq_QtoW;
+ case FR_BUS:
+ return &asfreq_QtoB;
+ case FR_DAY:
case FR_HR:
case FR_MIN:
case FR_SEC:
case FR_MS:
case FR_US:
case FR_NS:
- return &asfreq_QtoDT;
- default: return &nofunc;
+ return &asfreq_QtoDT;
+ default:
+ return &nofunc;
}
case FR_MTH:
- switch(toGroup)
- {
- case FR_ANN: return &asfreq_MtoA;
- case FR_QTR: return &asfreq_MtoQ;
- case FR_MTH: return &no_op;
- case FR_WK: return &asfreq_MtoW;
- case FR_BUS: return &asfreq_MtoB;
+ switch (toGroup) {
+ case FR_ANN:
+ return &asfreq_MtoA;
+ case FR_QTR:
+ return &asfreq_MtoQ;
+ case FR_MTH:
+ return &no_op;
+ case FR_WK:
+ return &asfreq_MtoW;
+ case FR_BUS:
+ return &asfreq_MtoB;
case FR_DAY:
case FR_HR:
case FR_MIN:
@@ -878,46 +949,57 @@ freq_conv_func get_asfreq_func(int fromFreq, int toFreq)
case FR_MS:
case FR_US:
case FR_NS:
- return &asfreq_MtoDT;
- default: return &nofunc;
+ return &asfreq_MtoDT;
+ default:
+ return &nofunc;
}
case FR_WK:
- switch(toGroup)
- {
- case FR_ANN: return &asfreq_WtoA;
- case FR_QTR: return &asfreq_WtoQ;
- case FR_MTH: return &asfreq_WtoM;
- case FR_WK: return &asfreq_WtoW;
- case FR_BUS: return &asfreq_WtoB;
- case FR_DAY:
- case FR_HR:
- case FR_MIN:
- case FR_SEC:
+ switch (toGroup) {
+ case FR_ANN:
+ return &asfreq_WtoA;
+ case FR_QTR:
+ return &asfreq_WtoQ;
+ case FR_MTH:
+ return &asfreq_WtoM;
+ case FR_WK:
+ return &asfreq_WtoW;
+ case FR_BUS:
+ return &asfreq_WtoB;
+ case FR_DAY:
+ case FR_HR:
+ case FR_MIN:
+ case FR_SEC:
case FR_MS:
case FR_US:
case FR_NS:
- return &asfreq_WtoDT;
- default: return &nofunc;
+ return &asfreq_WtoDT;
+ default:
+ return &nofunc;
}
case FR_BUS:
- switch(toGroup)
- {
- case FR_ANN: return &asfreq_BtoA;
- case FR_QTR: return &asfreq_BtoQ;
- case FR_MTH: return &asfreq_BtoM;
- case FR_WK: return &asfreq_BtoW;
- case FR_BUS: return &no_op;
- case FR_DAY:
- case FR_HR:
- case FR_MIN:
+ switch (toGroup) {
+ case FR_ANN:
+ return &asfreq_BtoA;
+ case FR_QTR:
+ return &asfreq_BtoQ;
+ case FR_MTH:
+ return &asfreq_BtoM;
+ case FR_WK:
+ return &asfreq_BtoW;
+ case FR_BUS:
+ return &no_op;
+ case FR_DAY:
+ case FR_HR:
+ case FR_MIN:
case FR_SEC:
case FR_MS:
case FR_US:
case FR_NS:
- return &asfreq_BtoDT;
- default: return &nofunc;
+ return &asfreq_BtoDT;
+ default:
+ return &nofunc;
}
case FR_DAY:
@@ -927,14 +1009,18 @@ freq_conv_func get_asfreq_func(int fromFreq, int toFreq)
case FR_MS:
case FR_US:
case FR_NS:
- switch(toGroup)
- {
- case FR_ANN: return &asfreq_DTtoA;
- case FR_QTR: return &asfreq_DTtoQ;
- case FR_MTH: return &asfreq_DTtoM;
- case FR_WK: return &asfreq_DTtoW;
- case FR_BUS: return &asfreq_DTtoB;
- case FR_DAY:
+ switch (toGroup) {
+ case FR_ANN:
+ return &asfreq_DTtoA;
+ case FR_QTR:
+ return &asfreq_DTtoQ;
+ case FR_MTH:
+ return &asfreq_DTtoM;
+ case FR_WK:
+ return &asfreq_DTtoW;
+ case FR_BUS:
+ return &asfreq_DTtoB;
+ case FR_DAY:
case FR_HR:
case FR_MIN:
case FR_SEC:
@@ -946,59 +1032,60 @@ freq_conv_func get_asfreq_func(int fromFreq, int toFreq)
} else {
return &asfreq_UpsampleWithinDay;
}
- default: return &nofunc;
+ default:
+ return &nofunc;
}
- default: return &nofunc;
+ default:
+ return &nofunc;
}
}
double get_abs_time(int freq, npy_int64 date_ordinal, npy_int64 ordinal) {
- //printf("get_abs_time %d %lld %lld\n", freq, date_ordinal, ordinal);
+ // printf("get_abs_time %d %lld %lld\n", freq, date_ordinal, ordinal);
- int freq_index, day_index, base_index;
- npy_int64 per_day, start_ord;
- double unit, result;
+ int freq_index, day_index, base_index;
+ npy_int64 per_day, start_ord;
+ double unit, result;
if (freq <= FR_DAY) {
- return 0;
+ return 0;
}
freq_index = get_freq_group_index(freq);
day_index = get_freq_group_index(FR_DAY);
base_index = get_freq_group_index(FR_SEC);
- //printf(" indices: day %d, freq %d, base %d\n", day_index, freq_index, base_index);
+ // printf(" indices: day %d, freq %d, base %d\n", day_index, freq_index,
+ // base_index);
per_day = get_daytime_conversion_factor(day_index, freq_index);
unit = get_daytime_conversion_factor(freq_index, base_index);
- //printf(" per_day: %lld, unit: %f\n", per_day, unit);
+ // printf(" per_day: %lld, unit: %f\n", per_day, unit);
if (base_index < freq_index) {
- unit = 1 / unit;
- //printf(" corrected unit: %f\n", unit);
+ unit = 1 / unit;
+ // printf(" corrected unit: %f\n", unit);
}
start_ord = date_ordinal * per_day;
- //printf("start_ord: %lld\n", start_ord);
- result = (double) ( unit * (ordinal - start_ord));
- //printf(" result: %f\n", result);
+ // printf("start_ord: %lld\n", start_ord);
+ result = (double)(unit * (ordinal - start_ord));
+ // printf(" result: %f\n", result);
return result;
}
/* Sets the time part of the DateTime object. */
-static int dInfoCalc_SetFromAbsTime(struct date_info *dinfo,
- double abstime)
-{
+static int dInfoCalc_SetFromAbsTime(struct date_info *dinfo, double abstime) {
int inttime;
- int hour,minute;
+ int hour, minute;
double second;
inttime = (int)abstime;
hour = inttime / 3600;
minute = (inttime % 3600) / 60;
- second = abstime - (double)(hour*3600 + minute*60);
+ second = abstime - (double)(hour * 3600 + minute * 60);
dinfo->hour = hour;
dinfo->minute = minute;
@@ -1013,15 +1100,12 @@ static int dInfoCalc_SetFromAbsTime(struct date_info *dinfo,
may be set to the flags: GREGORIAN_CALENDAR, JULIAN_CALENDAR to
indicate the calendar to be used. */
static int dInfoCalc_SetFromAbsDateTime(struct date_info *dinfo,
- npy_int64 absdate,
- double abstime,
- int calendar)
-{
+ npy_int64 absdate, double abstime,
+ int calendar) {
/* Bounds check */
Py_AssertWithArg(abstime >= 0.0 && abstime <= SECONDS_PER_DAY,
- PyExc_ValueError,
- "abstime out of range (0.0 - 86400.0): %f",
- abstime);
+ PyExc_ValueError,
+ "abstime out of range (0.0 - 86400.0): %f", abstime);
/* Calculate the date */
if (dInfoCalc_SetFromAbsDate(dinfo, absdate, calendar)) goto onError;
@@ -1038,8 +1122,8 @@ static int dInfoCalc_SetFromAbsDateTime(struct date_info *dinfo,
* New pandas API-helper code, to expose to cython
* ------------------------------------------------------------------*/
-npy_int64 asfreq(npy_int64 period_ordinal, int freq1, int freq2, char relation)
-{
+npy_int64 asfreq(npy_int64 period_ordinal, int freq1, int freq2,
+ char relation) {
npy_int64 val;
freq_conv_func func;
asfreq_info finfo;
@@ -1048,12 +1132,14 @@ npy_int64 asfreq(npy_int64 period_ordinal, int freq1, int freq2, char relation)
get_asfreq_info(freq1, freq2, &finfo);
- //printf("\n%x %d %d %ld %ld\n", func, freq1, freq2, finfo.intraday_conversion_factor, -finfo.intraday_conversion_factor);
+ // printf("\n%x %d %d %ld %ld\n", func, freq1, freq2,
+ // finfo.intraday_conversion_factor, -finfo.intraday_conversion_factor);
val = (*func)(period_ordinal, relation, &finfo);
if (val == INT_ERR_CODE) {
- //Py_Error(PyExc_ValueError, "Unable to convert to desired frequency.");
+ // Py_Error(PyExc_ValueError, "Unable to convert to desired
+ // frequency.");
goto onError;
}
return val;
@@ -1061,12 +1147,10 @@ npy_int64 asfreq(npy_int64 period_ordinal, int freq1, int freq2, char relation)
return INT_ERR_CODE;
}
-
/* generate an ordinal in period space */
-npy_int64 get_period_ordinal(int year, int month, int day,
- int hour, int minute, int second, int microseconds, int picoseconds,
- int freq)
-{
+npy_int64 get_period_ordinal(int year, int month, int day, int hour, int minute,
+ int second, int microseconds, int picoseconds,
+ int freq) {
npy_int64 absdays, delta, seconds;
npy_int64 weeks, days;
npy_int64 ordinal, day_adj;
@@ -1074,20 +1158,21 @@ npy_int64 get_period_ordinal(int year, int month, int day,
freq_group = get_freq_group(freq);
if (freq == FR_SEC || freq == FR_MS || freq == FR_US || freq == FR_NS) {
-
absdays = absdate_from_ymd(year, month, day);
delta = (absdays - ORD_OFFSET);
- seconds = (npy_int64)(delta * 86400 + hour * 3600 + minute * 60 + second);
+ seconds =
+ (npy_int64)(delta * 86400 + hour * 3600 + minute * 60 + second);
- switch(freq) {
- case FR_MS:
- return seconds * 1000 + microseconds / 1000;
+ switch (freq) {
+ case FR_MS:
+ return seconds * 1000 + microseconds / 1000;
- case FR_US:
- return seconds * 1000000 + microseconds;
+ case FR_US:
+ return seconds * 1000000 + microseconds;
- case FR_NS:
- return seconds * 1000000000 + microseconds * 1000 + picoseconds / 1000;
+ case FR_NS:
+ return seconds * 1000000000 + microseconds * 1000 +
+ picoseconds / 1000;
}
return seconds;
@@ -1096,63 +1181,55 @@ npy_int64 get_period_ordinal(int year, int month, int day,
if (freq == FR_MIN) {
absdays = absdate_from_ymd(year, month, day);
delta = (absdays - ORD_OFFSET);
- return (npy_int64)(delta*1440 + hour*60 + minute);
+ return (npy_int64)(delta * 1440 + hour * 60 + minute);
}
if (freq == FR_HR) {
- if ((absdays = absdate_from_ymd(year, month, day)) == INT_ERR_CODE)
- {
+ if ((absdays = absdate_from_ymd(year, month, day)) == INT_ERR_CODE) {
goto onError;
}
delta = (absdays - ORD_OFFSET);
- return (npy_int64)(delta*24 + hour);
+ return (npy_int64)(delta * 24 + hour);
}
- if (freq == FR_DAY)
- {
- return (npy_int64) (absdate_from_ymd(year, month, day) - ORD_OFFSET);
+ if (freq == FR_DAY) {
+ return (npy_int64)(absdate_from_ymd(year, month, day) - ORD_OFFSET);
}
- if (freq == FR_UND)
- {
- return (npy_int64) (absdate_from_ymd(year, month, day) - ORD_OFFSET);
+ if (freq == FR_UND) {
+ return (npy_int64)(absdate_from_ymd(year, month, day) - ORD_OFFSET);
}
- if (freq == FR_BUS)
- {
- if((days = absdate_from_ymd(year, month, day)) == INT_ERR_CODE)
- {
+ if (freq == FR_BUS) {
+ if ((days = absdate_from_ymd(year, month, day)) == INT_ERR_CODE) {
goto onError;
}
// calculate the current week assuming sunday as last day of a week
weeks = (days - BASE_WEEK_TO_DAY_OFFSET) / DAYS_PER_WEEK;
// calculate the current weekday (in range 1 .. 7)
delta = (days - BASE_WEEK_TO_DAY_OFFSET) % DAYS_PER_WEEK + 1;
- // return the number of business days in full weeks plus the business days in the last - possible partial - week
- return (npy_int64)(weeks * BUSINESS_DAYS_PER_WEEK)
- + (delta <= BUSINESS_DAYS_PER_WEEK
- ? delta
- : BUSINESS_DAYS_PER_WEEK + 1)
- - BDAY_OFFSET;
+ // return the number of business days in full weeks plus the business
+ // days in the last - possible partial - week
+ return (npy_int64)(weeks * BUSINESS_DAYS_PER_WEEK) +
+ (delta <= BUSINESS_DAYS_PER_WEEK ? delta
+ : BUSINESS_DAYS_PER_WEEK + 1) -
+ BDAY_OFFSET;
}
- if (freq_group == FR_WK)
- {
- if((ordinal = (npy_int64)absdate_from_ymd(year, month, day)) == INT_ERR_CODE)
- {
+ if (freq_group == FR_WK) {
+ if ((ordinal = (npy_int64)absdate_from_ymd(year, month, day)) ==
+ INT_ERR_CODE) {
goto onError;
}
day_adj = freq - FR_WK;
return (ordinal - (1 + day_adj)) / 7 + 1 - WEEK_OFFSET;
}
- if (freq == FR_MTH)
- {
+ if (freq == FR_MTH) {
return (year - BASE_YEAR) * 12 + month - 1;
}
- if (freq_group == FR_QTR)
- {
+ if (freq_group == FR_QTR) {
fmonth = freq - FR_QTR;
if (fmonth == 0) fmonth = 12;
@@ -1163,14 +1240,12 @@ npy_int64 get_period_ordinal(int year, int month, int day,
return (year - BASE_YEAR) * 4 + (mdiff - 1) / 3;
}
- if (freq_group == FR_ANN)
- {
+ if (freq_group == FR_ANN) {
fmonth = freq - FR_ANN;
if (fmonth == 0) fmonth = 12;
if (month <= fmonth) {
return year - BASE_YEAR;
- }
- else {
+ } else {
return year - BASE_YEAR + 1;
}
}
@@ -1188,13 +1263,11 @@ npy_int64 get_period_ordinal(int year, int month, int day,
is calculated for the last day of the period.
*/
-npy_int64 get_python_ordinal(npy_int64 period_ordinal, int freq)
-{
+npy_int64 get_python_ordinal(npy_int64 period_ordinal, int freq) {
asfreq_info af_info;
- freq_conv_func toDaily = NULL;
+ freq_conv_func toDaily = NULL;
- if (freq == FR_DAY)
- return period_ordinal + ORD_OFFSET;
+ if (freq == FR_DAY) return period_ordinal + ORD_OFFSET;
toDaily = get_asfreq_func(freq, FR_DAY);
get_asfreq_info(freq, FR_DAY, &af_info);
@@ -1216,12 +1289,14 @@ char *str_replace(const char *s, const char *old, const char *new) {
}
ret = PyArray_malloc(i + 1 + count * (newlen - oldlen));
- if (ret == NULL) {return (char *)PyErr_NoMemory();}
+ if (ret == NULL) {
+ return (char *)PyErr_NoMemory();
+ }
i = 0;
while (*s) {
if (strstr(s, old) == s) {
- strcpy(&ret[i], new);
+ strncpy(&ret[i], new, sizeof(char) * newlen);
i += newlen;
s += oldlen;
} else {
@@ -1236,9 +1311,9 @@ char *str_replace(const char *s, const char *old, const char *new) {
// function to generate a nice string representation of the period
// object, originally from DateObject_strftime
-char* c_strftime(struct date_info *tmp, char *fmt) {
+char *c_strftime(struct date_info *tmp, char *fmt) {
struct tm c_date;
- char* result;
+ char *result;
struct date_info dinfo = *tmp;
int result_len = strlen(fmt) + 50;
@@ -1263,7 +1338,7 @@ int get_yq(npy_int64 ordinal, int freq, int *quarter, int *year) {
asfreq_info af_info;
int qtr_freq;
npy_int64 daily_ord;
- npy_int64 (*toDaily)(npy_int64, char, asfreq_info*) = NULL;
+ npy_int64 (*toDaily)(npy_int64, char, asfreq_info *) = NULL;
toDaily = get_asfreq_func(freq, FR_DAY);
get_asfreq_info(freq, FR_DAY, &af_info);
@@ -1272,19 +1347,16 @@ int get_yq(npy_int64 ordinal, int freq, int *quarter, int *year) {
if (get_freq_group(freq) == FR_QTR) {
qtr_freq = freq;
- } else { qtr_freq = FR_QTR; }
+ } else {
+ qtr_freq = FR_QTR;
+ }
get_asfreq_info(FR_DAY, qtr_freq, &af_info);
- if(DtoQ_yq(daily_ord, &af_info, year, quarter) == INT_ERR_CODE)
- return -1;
+ if (DtoQ_yq(daily_ord, &af_info, year, quarter) == INT_ERR_CODE) return -1;
return 0;
}
-
-
-
-
static int _quarter_year(npy_int64 ordinal, int freq, int *year, int *quarter) {
asfreq_info af_info;
int qtr_freq;
@@ -1301,31 +1373,29 @@ static int _quarter_year(npy_int64 ordinal, int freq, int *year, int *quarter) {
if (DtoQ_yq(ordinal, &af_info, year, quarter) == INT_ERR_CODE)
return INT_ERR_CODE;
- if ((qtr_freq % 1000) > 12)
- *year -= 1;
+ if ((qtr_freq % 1000) > 12) *year -= 1;
return 0;
}
-static int _ISOWeek(struct date_info *dinfo)
-{
+static int _ISOWeek(struct date_info *dinfo) {
int week;
/* Estimate */
- week = (dinfo->day_of_year-1) - dinfo->day_of_week + 3;
+ week = (dinfo->day_of_year - 1) - dinfo->day_of_week + 3;
if (week >= 0) week = week / 7 + 1;
/* Verify */
if (week < 0) {
/* The day lies in last week of the previous year */
- if ((week > -2) ||
- (week == -2 && dInfoCalc_Leapyear(dinfo->year-1, dinfo->calendar)))
+ if ((week > -2) || (week == -2 && dInfoCalc_Leapyear(dinfo->year - 1,
+ dinfo->calendar)))
week = 53;
else
week = 52;
} else if (week == 53) {
/* Check if the week belongs to year or year+1 */
- if (31-dinfo->day + dinfo->day_of_week < 3) {
+ if (31 - dinfo->day + dinfo->day_of_week < 3) {
week = 1;
}
}
@@ -1333,8 +1403,7 @@ static int _ISOWeek(struct date_info *dinfo)
return week;
}
-int get_date_info(npy_int64 ordinal, int freq, struct date_info *dinfo)
-{
+int get_date_info(npy_int64 ordinal, int freq, struct date_info *dinfo) {
npy_int64 absdate = get_python_ordinal(ordinal, freq);
double abstime = get_abs_time(freq, absdate - ORD_OFFSET, ordinal);
@@ -1344,11 +1413,11 @@ int get_date_info(npy_int64 ordinal, int freq, struct date_info *dinfo)
}
while (abstime >= 86400) {
abstime -= 86400;
- absdate += 1;
+ absdate += 1;
}
- if(dInfoCalc_SetFromAbsDateTime(dinfo, absdate,
- abstime, GREGORIAN_CALENDAR))
+ if (dInfoCalc_SetFromAbsDateTime(dinfo, absdate, abstime,
+ GREGORIAN_CALENDAR))
return INT_ERR_CODE;
return 0;
@@ -1362,77 +1431,77 @@ int pyear(npy_int64 ordinal, int freq) {
int pqyear(npy_int64 ordinal, int freq) {
int year, quarter;
- if( _quarter_year(ordinal, freq, &year, &quarter) == INT_ERR_CODE)
+ if (_quarter_year(ordinal, freq, &year, &quarter) == INT_ERR_CODE)
return INT_ERR_CODE;
return year;
}
int pquarter(npy_int64 ordinal, int freq) {
int year, quarter;
- if(_quarter_year(ordinal, freq, &year, &quarter) == INT_ERR_CODE)
+ if (_quarter_year(ordinal, freq, &year, &quarter) == INT_ERR_CODE)
return INT_ERR_CODE;
return quarter;
}
int pmonth(npy_int64 ordinal, int freq) {
struct date_info dinfo;
- if(get_date_info(ordinal, freq, &dinfo) == INT_ERR_CODE)
+ if (get_date_info(ordinal, freq, &dinfo) == INT_ERR_CODE)
return INT_ERR_CODE;
return dinfo.month;
}
int pday(npy_int64 ordinal, int freq) {
struct date_info dinfo;
- if(get_date_info(ordinal, freq, &dinfo) == INT_ERR_CODE)
+ if (get_date_info(ordinal, freq, &dinfo) == INT_ERR_CODE)
return INT_ERR_CODE;
return dinfo.day;
}
int pweekday(npy_int64 ordinal, int freq) {
struct date_info dinfo;
- if(get_date_info(ordinal, freq, &dinfo) == INT_ERR_CODE)
+ if (get_date_info(ordinal, freq, &dinfo) == INT_ERR_CODE)
return INT_ERR_CODE;
return dinfo.day_of_week;
}
int pday_of_week(npy_int64 ordinal, int freq) {
struct date_info dinfo;
- if(get_date_info(ordinal, freq, &dinfo) == INT_ERR_CODE)
+ if (get_date_info(ordinal, freq, &dinfo) == INT_ERR_CODE)
return INT_ERR_CODE;
return dinfo.day_of_week;
}
int pday_of_year(npy_int64 ordinal, int freq) {
struct date_info dinfo;
- if(get_date_info(ordinal, freq, &dinfo) == INT_ERR_CODE)
+ if (get_date_info(ordinal, freq, &dinfo) == INT_ERR_CODE)
return INT_ERR_CODE;
return dinfo.day_of_year;
}
int pweek(npy_int64 ordinal, int freq) {
struct date_info dinfo;
- if(get_date_info(ordinal, freq, &dinfo) == INT_ERR_CODE)
+ if (get_date_info(ordinal, freq, &dinfo) == INT_ERR_CODE)
return INT_ERR_CODE;
return _ISOWeek(&dinfo);
}
int phour(npy_int64 ordinal, int freq) {
struct date_info dinfo;
- if(get_date_info(ordinal, freq, &dinfo) == INT_ERR_CODE)
+ if (get_date_info(ordinal, freq, &dinfo) == INT_ERR_CODE)
return INT_ERR_CODE;
return dinfo.hour;
}
int pminute(npy_int64 ordinal, int freq) {
struct date_info dinfo;
- if(get_date_info(ordinal, freq, &dinfo) == INT_ERR_CODE)
+ if (get_date_info(ordinal, freq, &dinfo) == INT_ERR_CODE)
return INT_ERR_CODE;
return dinfo.minute;
}
int psecond(npy_int64 ordinal, int freq) {
struct date_info dinfo;
- if(get_date_info(ordinal, freq, &dinfo) == INT_ERR_CODE)
+ if (get_date_info(ordinal, freq, &dinfo) == INT_ERR_CODE)
return INT_ERR_CODE;
return (int)dinfo.second;
}
@@ -1440,9 +1509,10 @@ int psecond(npy_int64 ordinal, int freq) {
int pdays_in_month(npy_int64 ordinal, int freq) {
int days;
struct date_info dinfo;
- if(get_date_info(ordinal, freq, &dinfo) == INT_ERR_CODE)
+ if (get_date_info(ordinal, freq, &dinfo) == INT_ERR_CODE)
return INT_ERR_CODE;
-
- days = days_in_month[dInfoCalc_Leapyear(dinfo.year, dinfo.calendar)][dinfo.month-1];
+
+ days = days_in_month[dInfoCalc_Leapyear(dinfo.year, dinfo.calendar)]
+ [dinfo.month - 1];
return days;
}
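
For reference, a minimal sketch of how a caller is expected to drive the conversion helpers defined in this file (assumed usage; the real call sites live in the Cython layer and are not shown here, and monthly_ordinal is a placeholder input):

    /* Convert a monthly period ordinal to a daily ordinal, anchored at
     * the start ('S') of the period. */
    asfreq_info info;
    freq_conv_func f = get_asfreq_func(FR_MTH, FR_DAY);
    get_asfreq_info(FR_MTH, FR_DAY, &info);
    npy_int64 daily = (*f)(monthly_ordinal, 'S', &info);
    if (daily == INT_ERR_CODE) {
        /* conversion failed; propagate the error */
    }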
diff --git a/pandas/src/period_helper.h b/pandas/src/period_helper.h
index 0351321926fa2..601717692ff6d 100644
--- a/pandas/src/period_helper.h
+++ b/pandas/src/period_helper.h
@@ -1,17 +1,24 @@
/*
- * Borrowed and derived code from scikits.timeseries that we will expose via
- * Cython to pandas. This primarily concerns interval representation and
- * frequency conversion routines.
- */
+Copyright (c) 2016, PyData Development Team
+All rights reserved.
+
+Distributed under the terms of the BSD Simplified License.
+
+The full license is in the LICENSE file, distributed with this software.
-#ifndef C_PERIOD_H
-#define C_PERIOD_H
+Borrowed and derived code from scikits.timeseries that we will expose via
+Cython to pandas. This primarily concerns interval representation and
+frequency conversion routines.
+*/
+
+#ifndef PANDAS_SRC_PERIOD_HELPER_H_
+#define PANDAS_SRC_PERIOD_HELPER_H_
#include <Python.h>
-#include "helper.h"
-#include "numpy/ndarraytypes.h"
#include "headers/stdint.h"
+#include "helper.h"
#include "limits.h"
+#include "numpy/ndarraytypes.h"
/*
* declarations from period here
@@ -20,100 +27,113 @@
#define GREGORIAN_CALENDAR 0
#define JULIAN_CALENDAR 1
-#define SECONDS_PER_DAY ((double) 86400.0)
-
-#define Py_AssertWithArg(x,errortype,errorstr,a1) {if (!(x)) {PyErr_Format(errortype,errorstr,a1);goto onError;}}
-#define Py_Error(errortype,errorstr) {PyErr_SetString(errortype,errorstr);goto onError;}
+#define SECONDS_PER_DAY ((double)86400.0)
+
+#define Py_AssertWithArg(x, errortype, errorstr, a1) \
+ { \
+ if (!(x)) { \
+ PyErr_Format(errortype, errorstr, a1); \
+ goto onError; \
+ } \
+ }
+#define Py_Error(errortype, errorstr) \
+ { \
+ PyErr_SetString(errortype, errorstr); \
+ goto onError; \
+ }
/*** FREQUENCY CONSTANTS ***/
// HIGHFREQ_ORIG is the datetime ordinal from which to begin the second
// frequency ordinal sequence
-// typedef int64_t npy_int64;
-// begins second ordinal at 1/1/1970 unix epoch
-
// #define HIGHFREQ_ORIG 62135683200LL
#define BASE_YEAR 1970
-#define ORD_OFFSET 719163LL // days until 1970-01-01
-#define BDAY_OFFSET 513689LL // days until 1970-01-01
+#define ORD_OFFSET 719163LL // days until 1970-01-01
+#define BDAY_OFFSET 513689LL // days until 1970-01-01
#define WEEK_OFFSET 102737LL
-#define BASE_WEEK_TO_DAY_OFFSET 1 // difference between day 0 and end of week in days
+#define BASE_WEEK_TO_DAY_OFFSET \
+ 1 // difference between day 0 and end of week in days
#define DAYS_PER_WEEK 7
#define BUSINESS_DAYS_PER_WEEK 5
-#define HIGHFREQ_ORIG 0 // ORD_OFFSET * 86400LL // days until 1970-01-01
-
-#define FR_ANN 1000 /* Annual */
-#define FR_ANNDEC FR_ANN /* Annual - December year end*/
-#define FR_ANNJAN 1001 /* Annual - January year end*/
-#define FR_ANNFEB 1002 /* Annual - February year end*/
-#define FR_ANNMAR 1003 /* Annual - March year end*/
-#define FR_ANNAPR 1004 /* Annual - April year end*/
-#define FR_ANNMAY 1005 /* Annual - May year end*/
-#define FR_ANNJUN 1006 /* Annual - June year end*/
-#define FR_ANNJUL 1007 /* Annual - July year end*/
-#define FR_ANNAUG 1008 /* Annual - August year end*/
-#define FR_ANNSEP 1009 /* Annual - September year end*/
-#define FR_ANNOCT 1010 /* Annual - October year end*/
-#define FR_ANNNOV 1011 /* Annual - November year end*/
+#define HIGHFREQ_ORIG 0 // ORD_OFFSET * 86400LL // days until 1970-01-01
+
+#define FR_ANN 1000 /* Annual */
+#define FR_ANNDEC FR_ANN /* Annual - December year end*/
+#define FR_ANNJAN 1001 /* Annual - January year end*/
+#define FR_ANNFEB 1002 /* Annual - February year end*/
+#define FR_ANNMAR 1003 /* Annual - March year end*/
+#define FR_ANNAPR 1004 /* Annual - April year end*/
+#define FR_ANNMAY 1005 /* Annual - May year end*/
+#define FR_ANNJUN 1006 /* Annual - June year end*/
+#define FR_ANNJUL 1007 /* Annual - July year end*/
+#define FR_ANNAUG 1008 /* Annual - August year end*/
+#define FR_ANNSEP 1009 /* Annual - September year end*/
+#define FR_ANNOCT 1010 /* Annual - October year end*/
+#define FR_ANNNOV 1011 /* Annual - November year end*/
/* The standard quarterly frequencies with various fiscal year ends
eg, Q42005 for Q@OCT runs Aug 1, 2005 to Oct 31, 2005 */
-#define FR_QTR 2000 /* Quarterly - December year end (default quarterly) */
-#define FR_QTRDEC FR_QTR /* Quarterly - December year end */
-#define FR_QTRJAN 2001 /* Quarterly - January year end */
-#define FR_QTRFEB 2002 /* Quarterly - February year end */
-#define FR_QTRMAR 2003 /* Quarterly - March year end */
-#define FR_QTRAPR 2004 /* Quarterly - April year end */
-#define FR_QTRMAY 2005 /* Quarterly - May year end */
-#define FR_QTRJUN 2006 /* Quarterly - June year end */
-#define FR_QTRJUL 2007 /* Quarterly - July year end */
-#define FR_QTRAUG 2008 /* Quarterly - August year end */
-#define FR_QTRSEP 2009 /* Quarterly - September year end */
-#define FR_QTROCT 2010 /* Quarterly - October year end */
-#define FR_QTRNOV 2011 /* Quarterly - November year end */
-
-#define FR_MTH 3000 /* Monthly */
-
-#define FR_WK 4000 /* Weekly */
+#define FR_QTR 2000 /* Quarterly - December year end (default quarterly) */
+#define FR_QTRDEC FR_QTR /* Quarterly - December year end */
+#define FR_QTRJAN 2001 /* Quarterly - January year end */
+#define FR_QTRFEB 2002 /* Quarterly - February year end */
+#define FR_QTRMAR 2003 /* Quarterly - March year end */
+#define FR_QTRAPR 2004 /* Quarterly - April year end */
+#define FR_QTRMAY 2005 /* Quarterly - May year end */
+#define FR_QTRJUN 2006 /* Quarterly - June year end */
+#define FR_QTRJUL 2007 /* Quarterly - July year end */
+#define FR_QTRAUG 2008 /* Quarterly - August year end */
+#define FR_QTRSEP 2009 /* Quarterly - September year end */
+#define FR_QTROCT 2010 /* Quarterly - October year end */
+#define FR_QTRNOV 2011 /* Quarterly - November year end */
+
+#define FR_MTH 3000 /* Monthly */
+
+#define FR_WK 4000 /* Weekly */
#define FR_WKSUN FR_WK /* Weekly - Sunday end of week */
-#define FR_WKMON 4001 /* Weekly - Monday end of week */
-#define FR_WKTUE 4002 /* Weekly - Tuesday end of week */
-#define FR_WKWED 4003 /* Weekly - Wednesday end of week */
-#define FR_WKTHU 4004 /* Weekly - Thursday end of week */
-#define FR_WKFRI 4005 /* Weekly - Friday end of week */
-#define FR_WKSAT 4006 /* Weekly - Saturday end of week */
-
-#define FR_BUS 5000 /* Business days */
-#define FR_DAY 6000 /* Daily */
-#define FR_HR 7000 /* Hourly */
-#define FR_MIN 8000 /* Minutely */
-#define FR_SEC 9000 /* Secondly */
-#define FR_MS 10000 /* Millisecondly */
-#define FR_US 11000 /* Microsecondly */
-#define FR_NS 12000 /* Nanosecondly */
-
-#define FR_UND -10000 /* Undefined */
+#define FR_WKMON 4001 /* Weekly - Monday end of week */
+#define FR_WKTUE 4002 /* Weekly - Tuesday end of week */
+#define FR_WKWED 4003 /* Weekly - Wednesday end of week */
+#define FR_WKTHU 4004 /* Weekly - Thursday end of week */
+#define FR_WKFRI 4005 /* Weekly - Friday end of week */
+#define FR_WKSAT 4006 /* Weekly - Saturday end of week */
+
+#define FR_BUS 5000 /* Business days */
+#define FR_DAY 6000 /* Daily */
+#define FR_HR 7000 /* Hourly */
+#define FR_MIN 8000 /* Minutely */
+#define FR_SEC 9000 /* Secondly */
+#define FR_MS 10000 /* Millisecondly */
+#define FR_US 11000 /* Microsecondly */
+#define FR_NS 12000 /* Nanosecondly */
+
+#define FR_UND -10000 /* Undefined */
#define INT_ERR_CODE INT32_MIN
-#define MEM_CHECK(item) if (item == NULL) { return PyErr_NoMemory(); }
-#define ERR_CHECK(item) if (item == NULL) { return NULL; }
+#define MEM_CHECK(item) \
+ if (item == NULL) { \
+ return PyErr_NoMemory(); \
+ }
+#define ERR_CHECK(item) \
+ if (item == NULL) { \
+ return NULL; \
+ }
typedef struct asfreq_info {
- int from_week_end; // day the week ends on in the "from" frequency
- int to_week_end; // day the week ends on in the "to" frequency
+ int from_week_end; // day the week ends on in the "from" frequency
+ int to_week_end; // day the week ends on in the "to" frequency
- int from_a_year_end; // month the year ends on in the "from" frequency
- int to_a_year_end; // month the year ends on in the "to" frequency
+ int from_a_year_end; // month the year ends on in the "from" frequency
+ int to_a_year_end; // month the year ends on in the "to" frequency
- int from_q_year_end; // month the year ends on in the "from" frequency
- int to_q_year_end; // month the year ends on in the "to" frequency
+ int from_q_year_end; // month the year ends on in the "from" frequency
+ int to_q_year_end; // month the year ends on in the "to" frequency
npy_int64 intraday_conversion_factor;
} asfreq_info;
-
typedef struct date_info {
npy_int64 absdate;
double abstime;
@@ -130,7 +150,7 @@ typedef struct date_info {
int calendar;
} date_info;
-typedef npy_int64 (*freq_conv_func)(npy_int64, char, asfreq_info*);
+typedef npy_int64 (*freq_conv_func)(npy_int64, char, asfreq_info *);
/*
* new pandas API helper functions here
@@ -138,9 +158,9 @@ typedef npy_int64 (*freq_conv_func)(npy_int64, char, asfreq_info*);
npy_int64 asfreq(npy_int64 period_ordinal, int freq1, int freq2, char relation);
-npy_int64 get_period_ordinal(int year, int month, int day,
- int hour, int minute, int second, int microseconds, int picoseconds,
- int freq);
+npy_int64 get_period_ordinal(int year, int month, int day, int hour, int minute,
+ int second, int microseconds, int picoseconds,
+ int freq);
npy_int64 get_python_ordinal(npy_int64 period_ordinal, int freq);
@@ -167,4 +187,5 @@ char *c_strftime(struct date_info *dinfo, char *fmt);
int get_yq(npy_int64 ordinal, int freq, int *quarter, int *year);
void initialize_daytime_conversion_factor_matrix(void);
-#endif
+
+#endif // PANDAS_SRC_PERIOD_HELPER_H_
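
The FR_* constants above keep the frequency family in the thousands place and the anchor (year-end month or week-end day) in the remainder, which is why related codes differ only in their last digits. A minimal illustrative sketch of recovering both pieces with integer arithmetic -- not part of the patch, and the helper names here are hypothetical:

/* Illustrative only -- not part of the patch. FR_QTR / FR_QTRNOV values
 * are copied from the header above; freq_group/freq_anchor are hypothetical
 * helpers showing how the thousands-based encoding can be decomposed. */
#include <stdio.h>

#define FR_QTR 2000
#define FR_QTRNOV 2011

static int freq_group(int freq) { return (freq / 1000) * 1000; }
static int freq_anchor(int freq) { return freq % 1000; }

int main(void) {
    printf("group=%d anchor=%d\n", freq_group(FR_QTRNOV), freq_anchor(FR_QTRNOV));
    /* prints: group=2000 anchor=11  (quarterly, November year end) */
    return 0;
}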
diff --git a/pandas/src/skiplist.h b/pandas/src/skiplist.h
index 3bf63aedce9cb..013516a49fa2f 100644
--- a/pandas/src/skiplist.h
+++ b/pandas/src/skiplist.h
@@ -1,298 +1,290 @@
-
/*
- Flexibly-sized, indexable skiplist data structure for maintaining a sorted
- list of values
+Copyright (c) 2016, PyData Development Team
+All rights reserved.
+
+Distributed under the terms of the BSD Simplified License.
+
+The full license is in the LICENSE file, distributed with this software.
- Port of Wes McKinney's Cython version of Raymond Hettinger's original pure
- Python recipe (http://rhettinger.wordpress.com/2010/02/06/lost-knowledge/)
- */
+Flexibly-sized, index-able skiplist data structure for maintaining a sorted
+list of values
-// #include <math.h>
-// #include <stdlib.h>
+Port of Wes McKinney's Cython version of Raymond Hettinger's original pure
+Python recipe (http://rhettinger.wordpress.com/2010/02/06/lost-knowledge/)
+*/
+#ifndef PANDAS_SRC_SKIPLIST_H_
+#define PANDAS_SRC_SKIPLIST_H_
+#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
-#include <math.h>
#ifndef PANDAS_INLINE
- #if defined(__GNUC__)
- #define PANDAS_INLINE static __inline__
- #elif defined(_MSC_VER)
- #define PANDAS_INLINE static __inline
- #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
- #define PANDAS_INLINE static inline
- #else
- #define PANDAS_INLINE
- #endif
+#if defined(__GNUC__)
+#define PANDAS_INLINE static __inline__
+#elif defined(_MSC_VER)
+#define PANDAS_INLINE static __inline
+#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+#define PANDAS_INLINE static inline
+#else
+#define PANDAS_INLINE
+#endif
#endif
-PANDAS_INLINE float __skiplist_nanf(void)
-{
- const union { int __i; float __f;} __bint = {0x7fc00000UL};
+PANDAS_INLINE float __skiplist_nanf(void) {
+ const union {
+ int __i;
+ float __f;
+ } __bint = {0x7fc00000UL};
return __bint.__f;
}
-#define PANDAS_NAN ((double) __skiplist_nanf())
+#define PANDAS_NAN ((double)__skiplist_nanf())
-
-PANDAS_INLINE double Log2(double val) {
- return log(val) / log(2.);
-}
+PANDAS_INLINE double Log2(double val) { return log(val) / log(2.); }
typedef struct node_t node_t;
struct node_t {
- node_t **next;
- int *width;
- double value;
- int is_nil;
- int levels;
- int ref_count;
+ node_t **next;
+ int *width;
+ double value;
+ int is_nil;
+ int levels;
+ int ref_count;
};
typedef struct {
- node_t *head;
- node_t **tmp_chain;
- int *tmp_steps;
- int size;
- int maxlevels;
+ node_t *head;
+ node_t **tmp_chain;
+ int *tmp_steps;
+ int size;
+ int maxlevels;
} skiplist_t;
PANDAS_INLINE double urand(void) {
- return ((double) rand() + 1) / ((double) RAND_MAX + 2);
+ return ((double)rand() + 1) / ((double)RAND_MAX + 2);
}
-PANDAS_INLINE int int_min(int a, int b) {
- return a < b ? a : b;
-}
+PANDAS_INLINE int int_min(int a, int b) { return a < b ? a : b; }
PANDAS_INLINE node_t *node_init(double value, int levels) {
- node_t *result;
- result = (node_t*) malloc(sizeof(node_t));
- if (result) {
- result->value = value;
- result->levels = levels;
- result->is_nil = 0;
- result->ref_count = 0;
- result->next = (node_t**) malloc(levels * sizeof(node_t*));
- result->width = (int*) malloc(levels * sizeof(int));
- if (!(result->next && result->width) && (levels != 0)) {
- free(result->next);
- free(result->width);
- free(result);
- return NULL;
- }
- }
- return result;
+ node_t *result;
+ result = (node_t *)malloc(sizeof(node_t));
+ if (result) {
+ result->value = value;
+ result->levels = levels;
+ result->is_nil = 0;
+ result->ref_count = 0;
+ result->next = (node_t **)malloc(levels * sizeof(node_t *));
+ result->width = (int *)malloc(levels * sizeof(int));
+ if (!(result->next && result->width) && (levels != 0)) {
+ free(result->next);
+ free(result->width);
+ free(result);
+ return NULL;
+ }
+ }
+ return result;
}
// do this ourselves
-PANDAS_INLINE void node_incref(node_t *node) {
- ++(node->ref_count);
-}
+PANDAS_INLINE void node_incref(node_t *node) { ++(node->ref_count); }
-PANDAS_INLINE void node_decref(node_t *node) {
- --(node->ref_count);
-}
+PANDAS_INLINE void node_decref(node_t *node) { --(node->ref_count); }
static void node_destroy(node_t *node) {
- int i;
- if (node) {
- if (node->ref_count <= 1) {
- for (i = 0; i < node->levels; ++i) {
- node_destroy(node->next[i]);
- }
- free(node->next);
- free(node->width);
- // printf("Reference count was 1, freeing\n");
- free(node);
- }
- else {
- node_decref(node);
+ int i;
+ if (node) {
+ if (node->ref_count <= 1) {
+ for (i = 0; i < node->levels; ++i) {
+ node_destroy(node->next[i]);
+ }
+ free(node->next);
+ free(node->width);
+ // printf("Reference count was 1, freeing\n");
+ free(node);
+ } else {
+ node_decref(node);
+ }
+ // pretty sure that freeing the struct above will be enough
}
- // pretty sure that freeing the struct above will be enough
- }
}
PANDAS_INLINE void skiplist_destroy(skiplist_t *skp) {
- if (skp) {
- node_destroy(skp->head);
- free(skp->tmp_steps);
- free(skp->tmp_chain);
- free(skp);
- }
+ if (skp) {
+ node_destroy(skp->head);
+ free(skp->tmp_steps);
+ free(skp->tmp_chain);
+ free(skp);
+ }
}
PANDAS_INLINE skiplist_t *skiplist_init(int expected_size) {
- skiplist_t *result;
- node_t *NIL, *head;
- int maxlevels, i;
-
- maxlevels = 1 + Log2((double) expected_size);
- result = (skiplist_t*) malloc(sizeof(skiplist_t));
- if (!result) {
- return NULL;
- }
- result->tmp_chain = (node_t**) malloc(maxlevels * sizeof(node_t*));
- result->tmp_steps = (int*) malloc(maxlevels * sizeof(int));
- result->maxlevels = maxlevels;
- result->size = 0;
-
- head = result->head = node_init(PANDAS_NAN, maxlevels);
- NIL = node_init(0.0, 0);
-
- if (!(result->tmp_chain && result->tmp_steps && result->head && NIL)) {
- skiplist_destroy(result);
- node_destroy(NIL);
- return NULL;
- }
-
- node_incref(head);
-
- NIL->is_nil = 1;
-
- for (i = 0; i < maxlevels; ++i)
- {
- head->next[i] = NIL;
- head->width[i] = 1;
- node_incref(NIL);
- }
-
- return result;
+ skiplist_t *result;
+ node_t *NIL, *head;
+ int maxlevels, i;
+
+ maxlevels = 1 + Log2((double)expected_size);
+ result = (skiplist_t *)malloc(sizeof(skiplist_t));
+ if (!result) {
+ return NULL;
+ }
+ result->tmp_chain = (node_t **)malloc(maxlevels * sizeof(node_t *));
+ result->tmp_steps = (int *)malloc(maxlevels * sizeof(int));
+ result->maxlevels = maxlevels;
+ result->size = 0;
+
+ head = result->head = node_init(PANDAS_NAN, maxlevels);
+ NIL = node_init(0.0, 0);
+
+ if (!(result->tmp_chain && result->tmp_steps && result->head && NIL)) {
+ skiplist_destroy(result);
+ node_destroy(NIL);
+ return NULL;
+ }
+
+ node_incref(head);
+
+ NIL->is_nil = 1;
+
+ for (i = 0; i < maxlevels; ++i) {
+ head->next[i] = NIL;
+ head->width[i] = 1;
+ node_incref(NIL);
+ }
+
+ return result;
}
// 1 if left < right, 0 if left == right, -1 if left > right
-PANDAS_INLINE int _node_cmp(node_t* node, double value){
- if (node->is_nil || node->value > value) {
- return -1;
- }
- else if (node->value < value) {
- return 1;
- }
- else {
- return 0;
- }
+PANDAS_INLINE int _node_cmp(node_t *node, double value) {
+ if (node->is_nil || node->value > value) {
+ return -1;
+ } else if (node->value < value) {
+ return 1;
+ } else {
+ return 0;
+ }
}
PANDAS_INLINE double skiplist_get(skiplist_t *skp, int i, int *ret) {
- node_t *node;
- int level;
-
- if (i < 0 || i >= skp->size) {
- *ret = 0;
- return 0;
- }
-
- node = skp->head;
- ++i;
- for (level = skp->maxlevels - 1; level >= 0; --level)
- {
- while (node->width[level] <= i)
- {
- i -= node->width[level];
- node = node->next[level];
+ node_t *node;
+ int level;
+
+ if (i < 0 || i >= skp->size) {
+ *ret = 0;
+ return 0;
+ }
+
+ node = skp->head;
+ ++i;
+ for (level = skp->maxlevels - 1; level >= 0; --level) {
+ while (node->width[level] <= i) {
+ i -= node->width[level];
+ node = node->next[level];
+ }
}
- }
- *ret = 1;
- return node->value;
+ *ret = 1;
+ return node->value;
}
PANDAS_INLINE int skiplist_insert(skiplist_t *skp, double value) {
- node_t *node, *prevnode, *newnode, *next_at_level;
- int *steps_at_level;
- int size, steps, level;
- node_t **chain;
-
- chain = skp->tmp_chain;
-
- steps_at_level = skp->tmp_steps;
- memset(steps_at_level, 0, skp->maxlevels * sizeof(int));
-
- node = skp->head;
-
- for (level = skp->maxlevels - 1; level >= 0; --level)
- {
- next_at_level = node->next[level];
- while (_node_cmp(next_at_level, value) >= 0) {
- steps_at_level[level] += node->width[level];
- node = next_at_level;
- next_at_level = node->next[level];
+ node_t *node, *prevnode, *newnode, *next_at_level;
+ int *steps_at_level;
+ int size, steps, level;
+ node_t **chain;
+
+ chain = skp->tmp_chain;
+
+ steps_at_level = skp->tmp_steps;
+ memset(steps_at_level, 0, skp->maxlevels * sizeof(int));
+
+ node = skp->head;
+
+ for (level = skp->maxlevels - 1; level >= 0; --level) {
+ next_at_level = node->next[level];
+ while (_node_cmp(next_at_level, value) >= 0) {
+ steps_at_level[level] += node->width[level];
+ node = next_at_level;
+ next_at_level = node->next[level];
+ }
+ chain[level] = node;
}
- chain[level] = node;
- }
- size = int_min(skp->maxlevels, 1 - ((int) Log2(urand())));
+ size = int_min(skp->maxlevels, 1 - ((int)Log2(urand())));
- newnode = node_init(value, size);
- if (!newnode) {
- return -1;
- }
- steps = 0;
+ newnode = node_init(value, size);
+ if (!newnode) {
+ return -1;
+ }
+ steps = 0;
- for (level = 0; level < size; ++level) {
- prevnode = chain[level];
- newnode->next[level] = prevnode->next[level];
+ for (level = 0; level < size; ++level) {
+ prevnode = chain[level];
+ newnode->next[level] = prevnode->next[level];
- prevnode->next[level] = newnode;
- node_incref(newnode); // increment the reference count
+ prevnode->next[level] = newnode;
+ node_incref(newnode); // increment the reference count
- newnode->width[level] = prevnode->width[level] - steps;
- prevnode->width[level] = steps + 1;
+ newnode->width[level] = prevnode->width[level] - steps;
+ prevnode->width[level] = steps + 1;
- steps += steps_at_level[level];
- }
+ steps += steps_at_level[level];
+ }
- for (level = size; level < skp->maxlevels; ++level) {
- chain[level]->width[level] += 1;
- }
+ for (level = size; level < skp->maxlevels; ++level) {
+ chain[level]->width[level] += 1;
+ }
- ++(skp->size);
+ ++(skp->size);
- return 1;
+ return 1;
}
PANDAS_INLINE int skiplist_remove(skiplist_t *skp, double value) {
- int level, size;
- node_t *node, *prevnode, *tmpnode, *next_at_level;
- node_t **chain;
-
- chain = skp->tmp_chain;
- node = skp->head;
-
- for (level = skp->maxlevels - 1; level >= 0; --level)
- {
- next_at_level = node->next[level];
- while (_node_cmp(next_at_level, value) > 0) {
- node = next_at_level;
- next_at_level = node->next[level];
+ int level, size;
+ node_t *node, *prevnode, *tmpnode, *next_at_level;
+ node_t **chain;
+
+ chain = skp->tmp_chain;
+ node = skp->head;
+
+ for (level = skp->maxlevels - 1; level >= 0; --level) {
+ next_at_level = node->next[level];
+ while (_node_cmp(next_at_level, value) > 0) {
+ node = next_at_level;
+ next_at_level = node->next[level];
+ }
+ chain[level] = node;
}
- chain[level] = node;
- }
- if (value != chain[0]->next[0]->value) {
- return 0;
- }
+ if (value != chain[0]->next[0]->value) {
+ return 0;
+ }
- size = chain[0]->next[0]->levels;
+ size = chain[0]->next[0]->levels;
- for (level = 0; level < size; ++level) {
- prevnode = chain[level];
+ for (level = 0; level < size; ++level) {
+ prevnode = chain[level];
- tmpnode = prevnode->next[level];
+ tmpnode = prevnode->next[level];
- prevnode->width[level] += tmpnode->width[level] - 1;
- prevnode->next[level] = tmpnode->next[level];
+ prevnode->width[level] += tmpnode->width[level] - 1;
+ prevnode->next[level] = tmpnode->next[level];
- tmpnode->next[level] = NULL;
- node_destroy(tmpnode); // decrement refcount or free
- }
+ tmpnode->next[level] = NULL;
+ node_destroy(tmpnode); // decrement refcount or free
+ }
- for (level = size; level < skp->maxlevels; ++level) {
- --(chain[level]->width[level]);
- }
+ for (level = size; level < skp->maxlevels; ++level) {
+ --(chain[level]->width[level]);
+ }
- --(skp->size);
- return 1;
+ --(skp->size);
+ return 1;
}
+
+#endif // PANDAS_SRC_SKIPLIST_H_
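
Because skiplist.h is header-only (everything is PANDAS_INLINE or static), the functions reformatted above can be exercised directly from C. A minimal illustrative driver, not part of the patch, assuming skiplist.h is on the include path:

/* Illustrative only -- not part of the patch. Exercises init, insert,
 * indexed get, remove and destroy from the header above. */
#include <stdio.h>
#include "skiplist.h"

int main(void) {
    skiplist_t *skp = skiplist_init(8);   /* expected number of elements */
    if (!skp) return 1;

    skiplist_insert(skp, 3.0);
    skiplist_insert(skp, 1.0);
    skiplist_insert(skp, 2.0);

    int ok;
    /* skiplist_get(i) returns the i-th smallest value, so this prints in sorted order */
    for (int i = 0; i < skp->size; ++i)
        printf("rank %d -> %f\n", i, skiplist_get(skp, i, &ok));

    skiplist_remove(skp, 2.0);            /* returns 0 if the value is absent */
    skiplist_destroy(skp);
    return 0;
}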
diff --git a/pandas/src/sparse_op_helper.pxi b/pandas/src/sparse_op_helper.pxi
deleted file mode 100644
index 8462c31c84679..0000000000000
--- a/pandas/src/sparse_op_helper.pxi
+++ /dev/null
@@ -1,5864 +0,0 @@
-"""
-Template for each `dtype` helper function for sparse ops
-
-WARNING: DO NOT edit .pxi FILE directly, .pxi is generated from .pxi.in
-"""
-
-#----------------------------------------------------------------------
-# Sparse op
-#----------------------------------------------------------------------
-
-cdef inline float64_t __div_float64(float64_t a, float64_t b):
- if b == 0:
- if a > 0:
- return INF
- elif a < 0:
- return -INF
- else:
- return NaN
- else:
- return float(a) / b
-
-cdef inline float64_t __truediv_float64(float64_t a, float64_t b):
- return __div_float64(a, b)
-
-cdef inline float64_t __floordiv_float64(float64_t a, float64_t b):
- if b == 0:
- # numpy >= 1.11 returns NaN
- # for a // 0, rather than +-inf
- if _np_version_under1p11:
- if a > 0:
- return INF
- elif a < 0:
- return -INF
- return NaN
- else:
- return a // b
-
-cdef inline float64_t __mod_float64(float64_t a, float64_t b):
- if b == 0:
- return NaN
- else:
- return a % b
-
-cdef inline float64_t __div_int64(int64_t a, int64_t b):
- if b == 0:
- if a > 0:
- return INF
- elif a < 0:
- return -INF
- else:
- return NaN
- else:
- return float(a) / b
-
-cdef inline float64_t __truediv_int64(int64_t a, int64_t b):
- return __div_int64(a, b)
-
-cdef inline int64_t __floordiv_int64(int64_t a, int64_t b):
- if b == 0:
- return 0
- else:
- return a // b
-
-cdef inline int64_t __mod_int64(int64_t a, int64_t b):
- if b == 0:
- return 0
- else:
- return a % b
-
-#----------------------------------------------------------------------
-# sparse array op
-#----------------------------------------------------------------------
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_add_float64(ndarray x_,
- BlockIndex xindex,
- float64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- float64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[float64_t, ndim=1] x, y
- ndarray[float64_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.float64)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] + yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill + y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] + y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] + yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill + y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill + yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_add_float64(ndarray x_, IntIndex xindex,
- float64_t xfill,
- ndarray y_, IntIndex yindex,
- float64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[float64_t, ndim=1] x, y
- ndarray[float64_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.float64)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill + y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] + yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] + y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] + yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill + y[yi]
- yi += 1
-
- return out, out_index, xfill + yfill
-
-
-cpdef sparse_add_float64(ndarray[float64_t, ndim=1] x,
- SparseIndex xindex, float64_t xfill,
- ndarray[float64_t, ndim=1] y,
- SparseIndex yindex, float64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_add_float64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_add_float64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_add_float64(float64_t xfill,
- float64_t yfill):
- return xfill + yfill
-
-
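
The deleted templates all follow the same pattern: walk the stored points of both operands over the union of their indices, substituting an operand's fill value wherever it has no stored point at that location. A minimal plain-C sketch of that walk for addition -- illustrative only; the sorted index/value arrays here are simplified stand-ins for the ndarray/IntIndex types used in the Cython code above:

/* Illustrative only -- not part of the patch. out_idx is assumed to be the
 * precomputed, sorted union of xidx and yidx (make_union in the Cython code). */
#include <stddef.h>
#include <stdio.h>

static void sparse_add_union(const int *xidx, const double *x, size_t nx, double xfill,
                             const int *yidx, const double *y, size_t ny, double yfill,
                             const int *out_idx, double *out, size_t nout) {
    size_t xi = 0, yi = 0;
    for (size_t i = 0; i < nout; ++i) {
        int loc = out_idx[i];
        double xv = xfill, yv = yfill;            /* fill value used for a missing side */
        if (xi < nx && xidx[xi] == loc) xv = x[xi++];
        if (yi < ny && yidx[yi] == loc) yv = y[yi++];
        out[i] = xv + yv;
    }
}

int main(void) {
    const int xidx[] = {0, 3};      const double xval[] = {1.0, 2.0};
    const int yidx[] = {3, 5};      const double yval[] = {10.0, 20.0};
    const int uidx[] = {0, 3, 5};   double out[3];

    sparse_add_union(xidx, xval, 2, 0.0, yidx, yval, 2, -1.0, uidx, out, 3);
    for (int i = 0; i < 3; ++i)
        printf("%d -> %f\n", uidx[i], out[i]);
    /* 0 -> 1.0 + (-1.0) = 0.0 ; 3 -> 2.0 + 10.0 = 12.0 ; 5 -> 0.0 + 20.0 = 20.0 */
    return 0;
}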
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_add_int64(ndarray x_,
- BlockIndex xindex,
- int64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- int64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[int64_t, ndim=1] x, y
- ndarray[int64_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.int64)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] + yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill + y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] + y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] + yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill + y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill + yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_add_int64(ndarray x_, IntIndex xindex,
- int64_t xfill,
- ndarray y_, IntIndex yindex,
- int64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[int64_t, ndim=1] x, y
- ndarray[int64_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.int64)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill + y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] + yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] + y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] + yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill + y[yi]
- yi += 1
-
- return out, out_index, xfill + yfill
-
-
-cpdef sparse_add_int64(ndarray[int64_t, ndim=1] x,
- SparseIndex xindex, int64_t xfill,
- ndarray[int64_t, ndim=1] y,
- SparseIndex yindex, int64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_add_int64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_add_int64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_add_int64(int64_t xfill,
- int64_t yfill):
- return xfill + yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_sub_float64(ndarray x_,
- BlockIndex xindex,
- float64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- float64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[float64_t, ndim=1] x, y
- ndarray[float64_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.float64)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] - yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill - y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] - y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] - yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill - y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill - yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_sub_float64(ndarray x_, IntIndex xindex,
- float64_t xfill,
- ndarray y_, IntIndex yindex,
- float64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[float64_t, ndim=1] x, y
- ndarray[float64_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.float64)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill - y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] - yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] - y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] - yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill - y[yi]
- yi += 1
-
- return out, out_index, xfill - yfill
-
-
-cpdef sparse_sub_float64(ndarray[float64_t, ndim=1] x,
- SparseIndex xindex, float64_t xfill,
- ndarray[float64_t, ndim=1] y,
- SparseIndex yindex, float64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_sub_float64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_sub_float64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_sub_float64(float64_t xfill,
- float64_t yfill):
- return xfill - yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_sub_int64(ndarray x_,
- BlockIndex xindex,
- int64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- int64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[int64_t, ndim=1] x, y
- ndarray[int64_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.int64)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] - yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill - y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] - y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] - yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill - y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill - yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_sub_int64(ndarray x_, IntIndex xindex,
- int64_t xfill,
- ndarray y_, IntIndex yindex,
- int64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[int64_t, ndim=1] x, y
- ndarray[int64_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.int64)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill - y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] - yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] - y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] - yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill - y[yi]
- yi += 1
-
- return out, out_index, xfill - yfill
-
-
-cpdef sparse_sub_int64(ndarray[int64_t, ndim=1] x,
- SparseIndex xindex, int64_t xfill,
- ndarray[int64_t, ndim=1] y,
- SparseIndex yindex, int64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_sub_int64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_sub_int64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_sub_int64(int64_t xfill,
- int64_t yfill):
- return xfill - yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_mul_float64(ndarray x_,
- BlockIndex xindex,
- float64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- float64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[float64_t, ndim=1] x, y
- ndarray[float64_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.float64)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] * yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill * y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] * y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] * yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill * y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill * yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_mul_float64(ndarray x_, IntIndex xindex,
- float64_t xfill,
- ndarray y_, IntIndex yindex,
- float64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[float64_t, ndim=1] x, y
- ndarray[float64_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.float64)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill * y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] * yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] * y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] * yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill * y[yi]
- yi += 1
-
- return out, out_index, xfill * yfill
-
-
-cpdef sparse_mul_float64(ndarray[float64_t, ndim=1] x,
- SparseIndex xindex, float64_t xfill,
- ndarray[float64_t, ndim=1] y,
- SparseIndex yindex, float64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_mul_float64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_mul_float64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_mul_float64(float64_t xfill,
- float64_t yfill):
- return xfill * yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_mul_int64(ndarray x_,
- BlockIndex xindex,
- int64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- int64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[int64_t, ndim=1] x, y
- ndarray[int64_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.int64)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] * yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill * y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] * y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] * yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill * y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill * yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_mul_int64(ndarray x_, IntIndex xindex,
- int64_t xfill,
- ndarray y_, IntIndex yindex,
- int64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[int64_t, ndim=1] x, y
- ndarray[int64_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.int64)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill * y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] * yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] * y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] * yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill * y[yi]
- yi += 1
-
- return out, out_index, xfill * yfill
-
-
-cpdef sparse_mul_int64(ndarray[int64_t, ndim=1] x,
- SparseIndex xindex, int64_t xfill,
- ndarray[int64_t, ndim=1] y,
- SparseIndex yindex, int64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_mul_int64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_mul_int64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_mul_int64(int64_t xfill,
- int64_t yfill):
- return xfill * yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_div_float64(ndarray x_,
- BlockIndex xindex,
- float64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- float64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[float64_t, ndim=1] x, y
- ndarray[float64_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.float64)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = __div_float64(x[xi], yfill)
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = __div_float64(xfill, y[yi])
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = __div_float64(x[xi], y[yi])
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = __div_float64(x[xi], yfill)
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = __div_float64(xfill, y[yi])
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, __div_float64(xfill, yfill)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_div_float64(ndarray x_, IntIndex xindex,
- float64_t xfill,
- ndarray y_, IntIndex yindex,
- float64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[float64_t, ndim=1] x, y
- ndarray[float64_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.float64)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = __div_float64(xfill, y[yi])
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = __div_float64(x[xi], yfill)
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = __div_float64(x[xi], y[yi])
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = __div_float64(x[xi], yfill)
- xi += 1
- else:
- # use x fill value
- out[out_i] = __div_float64(xfill, y[yi])
- yi += 1
-
- return out, out_index, __div_float64(xfill, yfill)
-
-
-cpdef sparse_div_float64(ndarray[float64_t, ndim=1] x,
- SparseIndex xindex, float64_t xfill,
- ndarray[float64_t, ndim=1] y,
- SparseIndex yindex, float64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_div_float64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_div_float64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_div_float64(float64_t xfill,
- float64_t yfill):
- return __div_float64(xfill, yfill)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_div_int64(ndarray x_,
- BlockIndex xindex,
- int64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- int64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[int64_t, ndim=1] x, y
- ndarray[float64_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.float64)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = __div_int64(x[xi], yfill)
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = __div_int64(xfill, y[yi])
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = __div_int64(x[xi], y[yi])
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = __div_int64(x[xi], yfill)
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = __div_int64(xfill, y[yi])
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, __div_int64(xfill, yfill)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_div_int64(ndarray x_, IntIndex xindex,
- int64_t xfill,
- ndarray y_, IntIndex yindex,
- int64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[int64_t, ndim=1] x, y
- ndarray[float64_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.float64)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = __div_int64(xfill, y[yi])
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = __div_int64(x[xi], yfill)
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = __div_int64(x[xi], y[yi])
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = __div_int64(x[xi], yfill)
- xi += 1
- else:
- # use x fill value
- out[out_i] = __div_int64(xfill, y[yi])
- yi += 1
-
- return out, out_index, __div_int64(xfill, yfill)
-
-
-cpdef sparse_div_int64(ndarray[int64_t, ndim=1] x,
- SparseIndex xindex, int64_t xfill,
- ndarray[int64_t, ndim=1] y,
- SparseIndex yindex, int64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_div_int64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_div_int64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_div_int64(int64_t xfill,
- int64_t yfill):
- return __div_int64(xfill, yfill)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_mod_float64(ndarray x_,
- BlockIndex xindex,
- float64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- float64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[float64_t, ndim=1] x, y
- ndarray[float64_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.float64)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = __mod_float64(x[xi], yfill)
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = __mod_float64(xfill, y[yi])
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = __mod_float64(x[xi], y[yi])
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = __mod_float64(x[xi], yfill)
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = __mod_float64(xfill, y[yi])
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, __mod_float64(xfill, yfill)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_mod_float64(ndarray x_, IntIndex xindex,
- float64_t xfill,
- ndarray y_, IntIndex yindex,
- float64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[float64_t, ndim=1] x, y
- ndarray[float64_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.float64)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = __mod_float64(xfill, y[yi])
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = __mod_float64(x[xi], yfill)
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = __mod_float64(x[xi], y[yi])
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = __mod_float64(x[xi], yfill)
- xi += 1
- else:
- # use x fill value
- out[out_i] = __mod_float64(xfill, y[yi])
- yi += 1
-
- return out, out_index, __mod_float64(xfill, yfill)
-
-
-cpdef sparse_mod_float64(ndarray[float64_t, ndim=1] x,
- SparseIndex xindex, float64_t xfill,
- ndarray[float64_t, ndim=1] y,
- SparseIndex yindex, float64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_mod_float64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_mod_float64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_mod_float64(float64_t xfill,
- float64_t yfill):
- return __mod_float64(xfill, yfill)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_mod_int64(ndarray x_,
- BlockIndex xindex,
- int64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- int64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[int64_t, ndim=1] x, y
- ndarray[int64_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.int64)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = __mod_int64(x[xi], yfill)
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = __mod_int64(xfill, y[yi])
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = __mod_int64(x[xi], y[yi])
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = __mod_int64(x[xi], yfill)
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = __mod_int64(xfill, y[yi])
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, __mod_int64(xfill, yfill)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_mod_int64(ndarray x_, IntIndex xindex,
- int64_t xfill,
- ndarray y_, IntIndex yindex,
- int64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[int64_t, ndim=1] x, y
- ndarray[int64_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.int64)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = __mod_int64(xfill, y[yi])
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = __mod_int64(x[xi], yfill)
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = __mod_int64(x[xi], y[yi])
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = __mod_int64(x[xi], yfill)
- xi += 1
- else:
- # use x fill value
- out[out_i] = __mod_int64(xfill, y[yi])
- yi += 1
-
- return out, out_index, __mod_int64(xfill, yfill)
-
-
-cpdef sparse_mod_int64(ndarray[int64_t, ndim=1] x,
- SparseIndex xindex, int64_t xfill,
- ndarray[int64_t, ndim=1] y,
- SparseIndex yindex, int64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_mod_int64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_mod_int64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_mod_int64(int64_t xfill,
- int64_t yfill):
- return __mod_int64(xfill, yfill)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_truediv_float64(ndarray x_,
- BlockIndex xindex,
- float64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- float64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[float64_t, ndim=1] x, y
- ndarray[float64_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.float64)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = __truediv_float64(x[xi], yfill)
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = __truediv_float64(xfill, y[yi])
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = __truediv_float64(x[xi], y[yi])
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = __truediv_float64(x[xi], yfill)
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = __truediv_float64(xfill, y[yi])
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, __truediv_float64(xfill, yfill)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_truediv_float64(ndarray x_, IntIndex xindex,
- float64_t xfill,
- ndarray y_, IntIndex yindex,
- float64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[float64_t, ndim=1] x, y
- ndarray[float64_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.float64)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = __truediv_float64(xfill, y[yi])
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = __truediv_float64(x[xi], yfill)
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = __truediv_float64(x[xi], y[yi])
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = __truediv_float64(x[xi], yfill)
- xi += 1
- else:
- # use x fill value
- out[out_i] = __truediv_float64(xfill, y[yi])
- yi += 1
-
- return out, out_index, __truediv_float64(xfill, yfill)
-
-
-cpdef sparse_truediv_float64(ndarray[float64_t, ndim=1] x,
- SparseIndex xindex, float64_t xfill,
- ndarray[float64_t, ndim=1] y,
- SparseIndex yindex, float64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_truediv_float64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_truediv_float64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_truediv_float64(float64_t xfill,
- float64_t yfill):
- return __truediv_float64(xfill, yfill)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_truediv_int64(ndarray x_,
- BlockIndex xindex,
- int64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- int64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[int64_t, ndim=1] x, y
- ndarray[float64_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.float64)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = __truediv_int64(x[xi], yfill)
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = __truediv_int64(xfill, y[yi])
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = __truediv_int64(x[xi], y[yi])
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = __truediv_int64(x[xi], yfill)
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = __truediv_int64(xfill, y[yi])
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, __truediv_int64(xfill, yfill)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_truediv_int64(ndarray x_, IntIndex xindex,
- int64_t xfill,
- ndarray y_, IntIndex yindex,
- int64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[int64_t, ndim=1] x, y
- ndarray[float64_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.float64)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = __truediv_int64(xfill, y[yi])
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = __truediv_int64(x[xi], yfill)
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = __truediv_int64(x[xi], y[yi])
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = __truediv_int64(x[xi], yfill)
- xi += 1
- else:
- # use x fill value
- out[out_i] = __truediv_int64(xfill, y[yi])
- yi += 1
-
- return out, out_index, __truediv_int64(xfill, yfill)
-
-
-cpdef sparse_truediv_int64(ndarray[int64_t, ndim=1] x,
- SparseIndex xindex, int64_t xfill,
- ndarray[int64_t, ndim=1] y,
- SparseIndex yindex, int64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_truediv_int64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_truediv_int64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_truediv_int64(int64_t xfill,
- int64_t yfill):
- return __truediv_int64(xfill, yfill)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_floordiv_float64(ndarray x_,
- BlockIndex xindex,
- float64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- float64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[float64_t, ndim=1] x, y
- ndarray[float64_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.float64)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = __floordiv_float64(x[xi], yfill)
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = __floordiv_float64(xfill, y[yi])
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = __floordiv_float64(x[xi], y[yi])
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = __floordiv_float64(x[xi], yfill)
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = __floordiv_float64(xfill, y[yi])
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, __floordiv_float64(xfill, yfill)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_floordiv_float64(ndarray x_, IntIndex xindex,
- float64_t xfill,
- ndarray y_, IntIndex yindex,
- float64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[float64_t, ndim=1] x, y
- ndarray[float64_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.float64)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = __floordiv_float64(xfill, y[yi])
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = __floordiv_float64(x[xi], yfill)
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = __floordiv_float64(x[xi], y[yi])
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = __floordiv_float64(x[xi], yfill)
- xi += 1
- else:
- # use x fill value
- out[out_i] = __floordiv_float64(xfill, y[yi])
- yi += 1
-
- return out, out_index, __floordiv_float64(xfill, yfill)
-
-
-cpdef sparse_floordiv_float64(ndarray[float64_t, ndim=1] x,
- SparseIndex xindex, float64_t xfill,
- ndarray[float64_t, ndim=1] y,
- SparseIndex yindex, float64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_floordiv_float64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_floordiv_float64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_floordiv_float64(float64_t xfill,
- float64_t yfill):
- return __floordiv_float64(xfill, yfill)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_floordiv_int64(ndarray x_,
- BlockIndex xindex,
- int64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- int64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[int64_t, ndim=1] x, y
- ndarray[int64_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.int64)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = __floordiv_int64(x[xi], yfill)
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = __floordiv_int64(xfill, y[yi])
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = __floordiv_int64(x[xi], y[yi])
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = __floordiv_int64(x[xi], yfill)
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = __floordiv_int64(xfill, y[yi])
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, __floordiv_int64(xfill, yfill)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_floordiv_int64(ndarray x_, IntIndex xindex,
- int64_t xfill,
- ndarray y_, IntIndex yindex,
- int64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[int64_t, ndim=1] x, y
- ndarray[int64_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.int64)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = __floordiv_int64(xfill, y[yi])
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = __floordiv_int64(x[xi], yfill)
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = __floordiv_int64(x[xi], y[yi])
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = __floordiv_int64(x[xi], yfill)
- xi += 1
- else:
- # use x fill value
- out[out_i] = __floordiv_int64(xfill, y[yi])
- yi += 1
-
- return out, out_index, __floordiv_int64(xfill, yfill)
-
-
-cpdef sparse_floordiv_int64(ndarray[int64_t, ndim=1] x,
- SparseIndex xindex, int64_t xfill,
- ndarray[int64_t, ndim=1] y,
- SparseIndex yindex, int64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_floordiv_int64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_floordiv_int64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_floordiv_int64(int64_t xfill,
- int64_t yfill):
- return __floordiv_int64(xfill, yfill)
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_pow_float64(ndarray x_,
- BlockIndex xindex,
- float64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- float64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[float64_t, ndim=1] x, y
- ndarray[float64_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.float64)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] ** yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill ** y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] ** y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] ** yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill ** y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill ** yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_pow_float64(ndarray x_, IntIndex xindex,
- float64_t xfill,
- ndarray y_, IntIndex yindex,
- float64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[float64_t, ndim=1] x, y
- ndarray[float64_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.float64)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill ** y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] ** yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] ** y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] ** yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill ** y[yi]
- yi += 1
-
- return out, out_index, xfill ** yfill
-
-
-cpdef sparse_pow_float64(ndarray[float64_t, ndim=1] x,
- SparseIndex xindex, float64_t xfill,
- ndarray[float64_t, ndim=1] y,
- SparseIndex yindex, float64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_pow_float64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_pow_float64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_pow_float64(float64_t xfill,
- float64_t yfill):
- return xfill ** yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_pow_int64(ndarray x_,
- BlockIndex xindex,
- int64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- int64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[int64_t, ndim=1] x, y
- ndarray[int64_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.int64)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] ** yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill ** y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] ** y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] ** yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill ** y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill ** yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_pow_int64(ndarray x_, IntIndex xindex,
- int64_t xfill,
- ndarray y_, IntIndex yindex,
- int64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[int64_t, ndim=1] x, y
- ndarray[int64_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.int64)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill ** y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] ** yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] ** y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] ** yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill ** y[yi]
- yi += 1
-
- return out, out_index, xfill ** yfill
-
-
-cpdef sparse_pow_int64(ndarray[int64_t, ndim=1] x,
- SparseIndex xindex, int64_t xfill,
- ndarray[int64_t, ndim=1] y,
- SparseIndex yindex, int64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_pow_int64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_pow_int64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_pow_int64(int64_t xfill,
- int64_t yfill):
- return xfill ** yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_eq_float64(ndarray x_,
- BlockIndex xindex,
- float64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- float64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[float64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] == yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill == y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] == y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] == yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill == y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill == yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_eq_float64(ndarray x_, IntIndex xindex,
- float64_t xfill,
- ndarray y_, IntIndex yindex,
- float64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[float64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill == y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] == yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] == y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] == yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill == y[yi]
- yi += 1
-
- return out, out_index, xfill == yfill
-
-
-cpdef sparse_eq_float64(ndarray[float64_t, ndim=1] x,
- SparseIndex xindex, float64_t xfill,
- ndarray[float64_t, ndim=1] y,
- SparseIndex yindex, float64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_eq_float64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_eq_float64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_eq_float64(float64_t xfill,
- float64_t yfill):
- return xfill == yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_eq_int64(ndarray x_,
- BlockIndex xindex,
- int64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- int64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[int64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] == yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill == y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] == y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] == yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill == y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill == yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_eq_int64(ndarray x_, IntIndex xindex,
- int64_t xfill,
- ndarray y_, IntIndex yindex,
- int64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[int64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill == y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] == yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] == y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] == yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill == y[yi]
- yi += 1
-
- return out, out_index, xfill == yfill
-
-
-cpdef sparse_eq_int64(ndarray[int64_t, ndim=1] x,
- SparseIndex xindex, int64_t xfill,
- ndarray[int64_t, ndim=1] y,
- SparseIndex yindex, int64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_eq_int64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_eq_int64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_eq_int64(int64_t xfill,
- int64_t yfill):
- return xfill == yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_ne_float64(ndarray x_,
- BlockIndex xindex,
- float64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- float64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[float64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] != yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill != y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] != y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] != yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill != y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill != yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_ne_float64(ndarray x_, IntIndex xindex,
- float64_t xfill,
- ndarray y_, IntIndex yindex,
- float64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[float64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill != y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] != yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] != y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] != yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill != y[yi]
- yi += 1
-
- return out, out_index, xfill != yfill
-
-
-cpdef sparse_ne_float64(ndarray[float64_t, ndim=1] x,
- SparseIndex xindex, float64_t xfill,
- ndarray[float64_t, ndim=1] y,
- SparseIndex yindex, float64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_ne_float64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_ne_float64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_ne_float64(float64_t xfill,
- float64_t yfill):
- return xfill != yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_ne_int64(ndarray x_,
- BlockIndex xindex,
- int64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- int64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[int64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] != yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill != y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] != y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] != yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill != y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill != yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_ne_int64(ndarray x_, IntIndex xindex,
- int64_t xfill,
- ndarray y_, IntIndex yindex,
- int64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[int64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill != y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] != yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] != y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] != yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill != y[yi]
- yi += 1
-
- return out, out_index, xfill != yfill
-
-
-cpdef sparse_ne_int64(ndarray[int64_t, ndim=1] x,
- SparseIndex xindex, int64_t xfill,
- ndarray[int64_t, ndim=1] y,
- SparseIndex yindex, int64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_ne_int64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_ne_int64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_ne_int64(int64_t xfill,
- int64_t yfill):
- return xfill != yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_lt_float64(ndarray x_,
- BlockIndex xindex,
- float64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- float64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[float64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] < yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill < y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] < y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] < yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill < y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill < yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_lt_float64(ndarray x_, IntIndex xindex,
- float64_t xfill,
- ndarray y_, IntIndex yindex,
- float64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[float64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill < y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] < yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] < y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] < yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill < y[yi]
- yi += 1
-
- return out, out_index, xfill < yfill
-
-
-cpdef sparse_lt_float64(ndarray[float64_t, ndim=1] x,
- SparseIndex xindex, float64_t xfill,
- ndarray[float64_t, ndim=1] y,
- SparseIndex yindex, float64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_lt_float64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_lt_float64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_lt_float64(float64_t xfill,
- float64_t yfill):
- return xfill < yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_lt_int64(ndarray x_,
- BlockIndex xindex,
- int64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- int64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[int64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] < yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill < y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] < y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] < yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill < y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill < yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_lt_int64(ndarray x_, IntIndex xindex,
- int64_t xfill,
- ndarray y_, IntIndex yindex,
- int64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[int64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill < y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] < yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] < y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] < yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill < y[yi]
- yi += 1
-
- return out, out_index, xfill < yfill
-
-
-cpdef sparse_lt_int64(ndarray[int64_t, ndim=1] x,
- SparseIndex xindex, int64_t xfill,
- ndarray[int64_t, ndim=1] y,
- SparseIndex yindex, int64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_lt_int64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_lt_int64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_lt_int64(int64_t xfill,
- int64_t yfill):
- return xfill < yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_gt_float64(ndarray x_,
- BlockIndex xindex,
- float64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- float64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[float64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] > yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill > y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] > y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] > yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill > y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill > yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_gt_float64(ndarray x_, IntIndex xindex,
- float64_t xfill,
- ndarray y_, IntIndex yindex,
- float64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[float64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill > y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] > yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] > y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] > yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill > y[yi]
- yi += 1
-
- return out, out_index, xfill > yfill
-
-
-cpdef sparse_gt_float64(ndarray[float64_t, ndim=1] x,
- SparseIndex xindex, float64_t xfill,
- ndarray[float64_t, ndim=1] y,
- SparseIndex yindex, float64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_gt_float64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_gt_float64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_gt_float64(float64_t xfill,
- float64_t yfill):
- return xfill > yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_gt_int64(ndarray x_,
- BlockIndex xindex,
- int64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- int64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[int64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] > yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill > y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] > y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] > yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill > y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill > yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_gt_int64(ndarray x_, IntIndex xindex,
- int64_t xfill,
- ndarray y_, IntIndex yindex,
- int64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[int64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill > y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] > yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] > y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] > yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill > y[yi]
- yi += 1
-
- return out, out_index, xfill > yfill
-
-
-cpdef sparse_gt_int64(ndarray[int64_t, ndim=1] x,
- SparseIndex xindex, int64_t xfill,
- ndarray[int64_t, ndim=1] y,
- SparseIndex yindex, int64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_gt_int64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_gt_int64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_gt_int64(int64_t xfill,
- int64_t yfill):
- return xfill > yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_le_float64(ndarray x_,
- BlockIndex xindex,
- float64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- float64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[float64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] <= yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill <= y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] <= y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] <= yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill <= y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill <= yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_le_float64(ndarray x_, IntIndex xindex,
- float64_t xfill,
- ndarray y_, IntIndex yindex,
- float64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[float64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill <= y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] <= yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] <= y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] <= yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill <= y[yi]
- yi += 1
-
- return out, out_index, xfill <= yfill
-
-
-cpdef sparse_le_float64(ndarray[float64_t, ndim=1] x,
- SparseIndex xindex, float64_t xfill,
- ndarray[float64_t, ndim=1] y,
- SparseIndex yindex, float64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_le_float64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_le_float64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_le_float64(float64_t xfill,
- float64_t yfill):
- return xfill <= yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_le_int64(ndarray x_,
- BlockIndex xindex,
- int64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- int64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[int64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] <= yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill <= y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] <= y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] <= yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill <= y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill <= yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_le_int64(ndarray x_, IntIndex xindex,
- int64_t xfill,
- ndarray y_, IntIndex yindex,
- int64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[int64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill <= y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] <= yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] <= y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] <= yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill <= y[yi]
- yi += 1
-
- return out, out_index, xfill <= yfill
-
-
-cpdef sparse_le_int64(ndarray[int64_t, ndim=1] x,
- SparseIndex xindex, int64_t xfill,
- ndarray[int64_t, ndim=1] y,
- SparseIndex yindex, int64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_le_int64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_le_int64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_le_int64(int64_t xfill,
- int64_t yfill):
- return xfill <= yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_ge_float64(ndarray x_,
- BlockIndex xindex,
- float64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- float64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[float64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] >= yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill >= y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] >= y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] >= yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill >= y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill >= yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_ge_float64(ndarray x_, IntIndex xindex,
- float64_t xfill,
- ndarray y_, IntIndex yindex,
- float64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[float64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill >= y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] >= yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] >= y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] >= yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill >= y[yi]
- yi += 1
-
- return out, out_index, xfill >= yfill
-
-
-cpdef sparse_ge_float64(ndarray[float64_t, ndim=1] x,
- SparseIndex xindex, float64_t xfill,
- ndarray[float64_t, ndim=1] y,
- SparseIndex yindex, float64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_ge_float64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_ge_float64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_ge_float64(float64_t xfill,
- float64_t yfill):
- return xfill >= yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_ge_int64(ndarray x_,
- BlockIndex xindex,
- int64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- int64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[int64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] >= yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill >= y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] >= y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] >= yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill >= y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill >= yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_ge_int64(ndarray x_, IntIndex xindex,
- int64_t xfill,
- ndarray y_, IntIndex yindex,
- int64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[int64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill >= y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] >= yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] >= y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] >= yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill >= y[yi]
- yi += 1
-
- return out, out_index, xfill >= yfill
-
-
-cpdef sparse_ge_int64(ndarray[int64_t, ndim=1] x,
- SparseIndex xindex, int64_t xfill,
- ndarray[int64_t, ndim=1] y,
- SparseIndex yindex, int64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_ge_int64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_ge_int64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_ge_int64(int64_t xfill,
- int64_t yfill):
- return xfill >= yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_and_int64(ndarray x_,
- BlockIndex xindex,
- int64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- int64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[int64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] & yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill & y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] & y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] & yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill & y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill & yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_and_int64(ndarray x_, IntIndex xindex,
- int64_t xfill,
- ndarray y_, IntIndex yindex,
- int64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[int64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill & y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] & yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] & y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] & yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill & y[yi]
- yi += 1
-
- return out, out_index, xfill & yfill
-
-
-cpdef sparse_and_int64(ndarray[int64_t, ndim=1] x,
- SparseIndex xindex, int64_t xfill,
- ndarray[int64_t, ndim=1] y,
- SparseIndex yindex, int64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_and_int64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_and_int64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_and_int64(int64_t xfill,
- int64_t yfill):
- return xfill & yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_and_uint8(ndarray x_,
- BlockIndex xindex,
- uint8_t xfill,
- ndarray y_,
- BlockIndex yindex,
- uint8_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[uint8_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] & yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill & y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] & y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] & yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill & y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill & yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_and_uint8(ndarray x_, IntIndex xindex,
- uint8_t xfill,
- ndarray y_, IntIndex yindex,
- uint8_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[uint8_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill & y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] & yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] & y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] & yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill & y[yi]
- yi += 1
-
- return out, out_index, xfill & yfill
-
-
-cpdef sparse_and_uint8(ndarray[uint8_t, ndim=1] x,
- SparseIndex xindex, uint8_t xfill,
- ndarray[uint8_t, ndim=1] y,
- SparseIndex yindex, uint8_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_and_uint8(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_and_uint8(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_and_uint8(uint8_t xfill,
- uint8_t yfill):
- return xfill & yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_or_int64(ndarray x_,
- BlockIndex xindex,
- int64_t xfill,
- ndarray y_,
- BlockIndex yindex,
- int64_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[int64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] | yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill | y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] | y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] | yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill | y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill | yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_or_int64(ndarray x_, IntIndex xindex,
- int64_t xfill,
- ndarray y_, IntIndex yindex,
- int64_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[int64_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill | y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] | yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] | y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] | yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill | y[yi]
- yi += 1
-
- return out, out_index, xfill | yfill
-
-
-cpdef sparse_or_int64(ndarray[int64_t, ndim=1] x,
- SparseIndex xindex, int64_t xfill,
- ndarray[int64_t, ndim=1] y,
- SparseIndex yindex, int64_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_or_int64(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_or_int64(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_or_int64(int64_t xfill,
- int64_t yfill):
- return xfill | yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple block_op_or_uint8(ndarray x_,
- BlockIndex xindex,
- uint8_t xfill,
- ndarray y_,
- BlockIndex yindex,
- uint8_t yfill):
- '''
- Binary operator on BlockIndex objects with fill values
- '''
-
- cdef:
- BlockIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xbp = 0, ybp = 0 # block positions
- int32_t xloc, yloc
- Py_ssize_t xblock = 0, yblock = 0 # block numbers
-
- ndarray[uint8_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # to suppress Cython warning
- x = x_
- y = y_
-
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- # Wow, what a hack job. Need to do something about this
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if yblock == yindex.nblocks:
- # use y fill value
- out[out_i] = x[xi] | yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- continue
-
- if xblock == xindex.nblocks:
- # use x fill value
- out[out_i] = xfill | y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
- continue
-
- yloc = yindex.locbuf[yblock] + ybp
- xloc = xindex.locbuf[xblock] + xbp
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] | y[yi]
- xi += 1
- yi += 1
-
- # advance both locations
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
-
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] | yfill
- xi += 1
-
- # advance x location
- xbp += 1
- if xbp == xindex.lenbuf[xblock]:
- xblock += 1
- xbp = 0
- else:
- # use x fill value
- out[out_i] = xfill | y[yi]
- yi += 1
-
- # advance y location
- ybp += 1
- if ybp == yindex.lenbuf[yblock]:
- yblock += 1
- ybp = 0
-
- return out, out_index, xfill | yfill
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef inline tuple int_op_or_uint8(ndarray x_, IntIndex xindex,
- uint8_t xfill,
- ndarray y_, IntIndex yindex,
- uint8_t yfill):
- cdef:
- IntIndex out_index
- Py_ssize_t xi = 0, yi = 0, out_i = 0 # fp buf indices
- int32_t xloc, yloc
- ndarray[int32_t, ndim=1] xindices, yindices, out_indices
- ndarray[uint8_t, ndim=1] x, y
- ndarray[uint8_t, ndim=1] out
-
- # suppress Cython compiler warnings due to inlining
- x = x_
- y = y_
-
- # need to do this first to know size of result array
- out_index = xindex.make_union(yindex)
- out = np.empty(out_index.npoints, dtype=np.uint8)
-
- xindices = xindex.indices
- yindices = yindex.indices
- out_indices = out_index.indices
-
- # walk the two SparseVectors, adding matched locations...
- for out_i from 0 <= out_i < out_index.npoints:
- if xi == xindex.npoints:
- # use x fill value
- out[out_i] = xfill | y[yi]
- yi += 1
- continue
-
- if yi == yindex.npoints:
- # use y fill value
- out[out_i] = x[xi] | yfill
- xi += 1
- continue
-
- xloc = xindices[xi]
- yloc = yindices[yi]
-
- # each index in the out_index had to come from either x, y, or both
- if xloc == yloc:
- out[out_i] = x[xi] | y[yi]
- xi += 1
- yi += 1
- elif xloc < yloc:
- # use y fill value
- out[out_i] = x[xi] | yfill
- xi += 1
- else:
- # use x fill value
- out[out_i] = xfill | y[yi]
- yi += 1
-
- return out, out_index, xfill | yfill
-
-
-cpdef sparse_or_uint8(ndarray[uint8_t, ndim=1] x,
- SparseIndex xindex, uint8_t xfill,
- ndarray[uint8_t, ndim=1] y,
- SparseIndex yindex, uint8_t yfill):
-
- if isinstance(xindex, BlockIndex):
- return block_op_or_uint8(x, xindex.to_block_index(), xfill,
- y, yindex.to_block_index(), yfill)
- elif isinstance(xindex, IntIndex):
- return int_op_or_uint8(x, xindex.to_int_index(), xfill,
- y, yindex.to_int_index(), yfill)
- else:
- raise NotImplementedError
-
-
-cpdef sparse_fill_or_uint8(uint8_t xfill,
- uint8_t yfill):
- return xfill | yfill
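
The removed ``block_op_*`` and ``int_op_*`` variants above all share one pattern: walk two
sparse vectors in step over the union of their stored locations, substituting each side's
fill value wherever a location is missing, and compute the result's own fill value by
applying the operator to the two fills. A minimal pure-Python sketch of that walk for
integer-location (``IntIndex``-style) storage is shown below; the function name, the plain
``op`` callable, and the example inputs are hypothetical and only illustrate the idea, not
the generated Cython itself::

    import numpy as np

    def sparse_int_op(op, x, xindices, xfill, y, yindices, yfill):
        # the union of stored locations becomes the result's sparse index
        out_indices = np.union1d(xindices, yindices)
        out = []
        xi = yi = 0
        for loc in out_indices:
            if xi < len(xindices) and xindices[xi] == loc:
                xval = x[xi]
                xi += 1
            else:
                xval = xfill          # loc is only stored in y; use x's fill
            if yi < len(yindices) and yindices[yi] == loc:
                yval = y[yi]
                yi += 1
            else:
                yval = yfill          # loc is only stored in x; use y's fill
            out.append(op(xval, yval))
        # the result's fill value is the operator applied to the two fills
        return np.asarray(out), out_indices, op(xfill, yfill)

    # hypothetical usage: sparse "greater than" on two float vectors
    sparse_int_op(lambda a, b: a > b,
                  np.array([1.0, 5.0]), np.array([0, 3]), 0.0,
                  np.array([2.0, 2.0]), np.array([3, 4]), 0.0)

The ``BlockIndex`` variants follow the same walk but track a block number and an offset
within the current block instead of a single position per side.
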
diff --git a/pandas/src/ujson/lib/ultrajson.h b/pandas/src/ujson/lib/ultrajson.h
index c37fe8c8e6c38..3bfb4b26c0095 100644
--- a/pandas/src/ujson/lib/ultrajson.h
+++ b/pandas/src/ujson/lib/ultrajson.h
@@ -49,8 +49,8 @@ tree doesn't have cyclic references.
*/
-#ifndef __ULTRAJSON_H__
-#define __ULTRAJSON_H__
+#ifndef PANDAS_SRC_UJSON_LIB_ULTRAJSON_H_
+#define PANDAS_SRC_UJSON_LIB_ULTRAJSON_H_
#include
#include
@@ -143,25 +143,23 @@ typedef int64_t JSLONG;
#error "Endianess not supported"
#endif
-enum JSTYPES
-{
- JT_NULL, // NULL
- JT_TRUE, //boolean true
- JT_FALSE, //boolean false
- JT_INT, //(JSINT32 (signed 32-bit))
- JT_LONG, //(JSINT64 (signed 64-bit))
- JT_DOUBLE, //(double)
- JT_UTF8, //(char 8-bit)
- JT_ARRAY, // Array structure
- JT_OBJECT, // Key/Value structure
- JT_INVALID, // Internal, do not return nor expect
+enum JSTYPES {
+ JT_NULL, // NULL
+ JT_TRUE, // boolean true
+ JT_FALSE, // boolean false
+ JT_INT, // (JSINT32 (signed 32-bit))
+ JT_LONG, // (JSINT64 (signed 64-bit))
+ JT_DOUBLE, // (double)
+ JT_UTF8, // (char 8-bit)
+ JT_ARRAY, // Array structure
+ JT_OBJECT, // Key/Value structure
+ JT_INVALID, // Internal, do not return nor expect
};
typedef void * JSOBJ;
typedef void * JSITER;
-typedef struct __JSONTypeContext
-{
+typedef struct __JSONTypeContext {
int type;
void *encoder;
void *prv;
@@ -173,16 +171,17 @@ typedef void (*JSPFN_ITERBEGIN)(JSOBJ obj, JSONTypeContext *tc);
typedef int (*JSPFN_ITERNEXT)(JSOBJ obj, JSONTypeContext *tc);
typedef void (*JSPFN_ITEREND)(JSOBJ obj, JSONTypeContext *tc);
typedef JSOBJ (*JSPFN_ITERGETVALUE)(JSOBJ obj, JSONTypeContext *tc);
-typedef char *(*JSPFN_ITERGETNAME)(JSOBJ obj, JSONTypeContext *tc, size_t *outLen);
+typedef char *(*JSPFN_ITERGETNAME)(JSOBJ obj, JSONTypeContext *tc,
+ size_t *outLen);
typedef void *(*JSPFN_MALLOC)(size_t size);
typedef void (*JSPFN_FREE)(void *pptr);
typedef void *(*JSPFN_REALLOC)(void *base, size_t size);
-typedef struct __JSONObjectEncoder
-{
+typedef struct __JSONObjectEncoder {
void (*beginTypeContext)(JSOBJ obj, JSONTypeContext *tc);
void (*endTypeContext)(JSOBJ obj, JSONTypeContext *tc);
- const char *(*getStringValue)(JSOBJ obj, JSONTypeContext *tc, size_t *_outLen);
+ const char *(*getStringValue)(JSOBJ obj, JSONTypeContext *tc,
+ size_t *_outLen);
JSINT64 (*getLongValue)(JSOBJ obj, JSONTypeContext *tc);
JSINT32 (*getIntValue)(JSOBJ obj, JSONTypeContext *tc);
double (*getDoubleValue)(JSOBJ obj, JSONTypeContext *tc);
@@ -256,10 +255,8 @@ typedef struct __JSONObjectEncoder
char *end;
int heap;
int level;
-
} JSONObjectEncoder;
-
/*
Encode an object structure into JSON.
@@ -279,12 +276,10 @@ Life cycle of the provided buffer must still be handled by caller.
If the return value doesn't equal the specified buffer caller must release the memory using
JSONObjectEncoder.free or free() as specified when calling this function.
*/
-EXPORTFUNCTION char *JSON_EncodeObject(JSOBJ obj, JSONObjectEncoder *enc, char *buffer, size_t cbBuffer);
-
+EXPORTFUNCTION char *JSON_EncodeObject(JSOBJ obj, JSONObjectEncoder *enc,
+ char *buffer, size_t cbBuffer);
-
-typedef struct __JSONObjectDecoder
-{
+typedef struct __JSONObjectDecoder {
JSOBJ (*newString)(void *prv, wchar_t *start, wchar_t *end);
int (*objectAddKey)(void *prv, JSOBJ obj, JSOBJ name, JSOBJ value);
int (*arrayAddItem)(void *prv, JSOBJ obj, JSOBJ value);
@@ -308,7 +303,8 @@ typedef struct __JSONObjectDecoder
void *prv;
} JSONObjectDecoder;
-EXPORTFUNCTION JSOBJ JSON_DecodeObject(JSONObjectDecoder *dec, const char *buffer, size_t cbBuffer);
+EXPORTFUNCTION JSOBJ JSON_DecodeObject(JSONObjectDecoder *dec,
+ const char *buffer, size_t cbBuffer);
EXPORTFUNCTION void encode(JSOBJ, JSONObjectEncoder *, const char *, size_t);
-#endif
+#endif // PANDAS_SRC_UJSON_LIB_ULTRAJSON_H_
diff --git a/pandas/src/ujson/lib/ultrajsondec.c b/pandas/src/ujson/lib/ultrajsondec.c
index 5496068832f2e..a847b0f5d5102 100644
--- a/pandas/src/ujson/lib/ultrajsondec.c
+++ b/pandas/src/ujson/lib/ultrajsondec.c
@@ -16,8 +16,10 @@ derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
-DISCLAIMED. IN NO EVENT SHALL ESN SOCIAL SOFTWARE AB OR JONAS TARNSTROM BE LIABLE
-FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+DISCLAIMED. IN NO EVENT SHALL ESN SOCIAL SOFTWARE AB OR JONAS TARNSTROM BE
+LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
@@ -27,7 +29,8 @@ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Portions of code from MODP_ASCII - Ascii transformations (upper/lower, etc)
https://github.com/client9/stringencoders
-Copyright (c) 2007 Nick Galbreath -- nickg [at] modp [dot] com. All rights reserved.
+Copyright (c) 2007 Nick Galbreath -- nickg [at] modp [dot] com. All rights
+reserved.
Numeric decoder derived from from TCL library
http://www.opensource.apple.com/source/tcl/tcl-14/tcl/license.terms
@@ -35,15 +38,15 @@ Numeric decoder derived from from TCL library
* Copyright (c) 1994 Sun Microsystems, Inc.
*/
-#include "ultrajson.h"
-#include
#include
-#include
-#include
-#include
-#include
#include
+#include
#include
+#include