Skip to content

consolidated the duplicate definitions of NA values (in parsers & IO) #16589

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jun 13, 2017

Conversation

OlegShteynbuk
Copy link
Contributor

@OlegShteynbuk OlegShteynbuk commented Jun 3, 2017

@codecov
Copy link

codecov bot commented Jun 3, 2017

Codecov Report

Merging #16589 into master will decrease coverage by 0.02%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16589      +/-   ##
==========================================
- Coverage   90.95%   90.92%   -0.03%     
==========================================
  Files         161      161              
  Lines       49276    49276              
==========================================
- Hits        44817    44805      -12     
- Misses       4459     4471      +12
Flag Coverage Δ
#multiple 88.68% <ø> (-0.03%) ⬇️
#single 40.18% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/plotting/_converter.py 63.23% <0%> (-1.82%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d298414...10cba2c. Read the comment docs.

'#N/A', 'N/A', 'n/a', 'NA', '#NA', 'NULL', 'null',
'NaN', 'nan', '-NaN', '-nan', '#N/A N/A', ''])
assert _NA_VALUES == parsers._NA_VALUES
_NA_VALUES = parsers._NA_VALUES
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to stay it asserts that nothing has changed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted the change.

@@ -273,12 +275,16 @@ cdef extern from "parser/io.h":

DEFAULT_CHUNKSIZE = 256 * 1024


def c_type_conv(st):
cdef bytes py_bytes = st.encode()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we have a routine already like this
look around

Copy link
Contributor Author

@OlegShteynbuk OlegShteynbuk Jun 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, i think have found the function
_NA_VALUES = _ensure_encoded(parsers._NA_VALUES)
then even list comprehension is not needed; need to double check as haven't used Cython before

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another function that can be used is asbytes

@jreback jreback added Clean IO CSV read_csv, to_csv Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Jun 5, 2017
@jreback jreback added this to the 0.21.0 milestone Jun 5, 2017
@jreback
Copy link
Contributor

jreback commented Jun 5, 2017

@gfyoung look ok to you?

@gfyoung
Copy link
Member

gfyoung commented Jun 5, 2017

Awesome that this works! My only question is whether we should be importing from io.common or from parsers (does it really matter, or does the former not work?)

@OlegShteynbuk
Copy link
Contributor Author

OlegShteynbuk commented Jun 5, 2017

i think it should not matter from the execution point of view as parsers import _NA_VALUES from common, "Explicit is better than implicit." common makes it explicit, on the other hand not sure what is the role of parsers? is it a gatekeeper?

@jreback
Copy link
Contributor

jreback commented Jun 5, 2017

importing from io.common is best

@OlegShteynbuk
Copy link
Contributor Author

will change to io.common

@gfyoung
Copy link
Member

gfyoung commented Jun 5, 2017

@OlegShteynbuk : nitpicking here: I think we generally use import pandas.io.common as com. However, LGTM otherwise.

@OlegShteynbuk
Copy link
Contributor Author

thanks, changed, was not aware of this

@OlegShteynbuk
Copy link
Contributor Author

got the message - 'No test commands were found", local tests were ok.
not sure what happened, is there a way to repeat the tests without pushing changes?

@jreback
Copy link
Contributor

jreback commented Jun 6, 2017

ok restarted circle. ping on green.

@OlegShteynbuk
Copy link
Contributor Author

repetition of the same default NA values in io.rst - removed;

in io.rst just after the defaults are listed there is a sentence

Although a 0-length string
'' is not included in the default NaN values list, it is still treated
as a missing value.

but '' is listed as a default and it is also in common.py is part of the defaults

should i remove this sentence?

@@ -1020,8 +1019,11 @@ the corresponding equivalent values will also imply a missing value (in this cas
``[5.0,5]`` are recognized as ``NaN``.

To completely override the default values that are recognized as missing, specify ``keep_default_na=False``.
The default ``NaN`` recognized values are ``['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A','N/A', 'NA',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you build the docs and show a rendering of this page. I think this might generate a build warning (and may not render correctly)

@jorisvandenbossche

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have build the docs locally before commit, there were warnings, some of them might be related to python 3, i have 2.7.13 on linux ; file doc/source/style.ipynb also was a problem, but the generated html looks ok, can not attach html file to this replay

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the only difference that i can see is a blank line because of the new label, but it might be not at bad thing at all, but i can remove the label and reuse the existing one that is for the heading NA Values, that will be several lines above

@jreback
Copy link
Contributor

jreback commented Jun 8, 2017

can u show a screenshot of the built page & also of the doc-string (in ipython)

@OlegShteynbuk
Copy link
Contributor Author

screenshot from 2017-06-08 09 35 45
screenshot from 2017-06-08 09 41 43

@OlegShteynbuk
Copy link
Contributor Author

have added the label and reference to the label, not the doc-string (in ipython) probably will be the next step

@jreback
Copy link
Contributor

jreback commented Jun 9, 2017

lgtm. @TomAugspurger @jorisvandenbossche

NA values. By default the following values are interpreted as NaN:
``'-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A', 'N/A', 'n/a', 'NA',
'#NA', 'NULL', 'null', 'NaN', '-NaN', 'nan', '-nan', ''``.
NA values. By default the following values are interpreted as NaN: See :ref:`na values const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sentence "By default the following values are interpreted as NaN:" can be removed. I think

See :ref:`na values const <io.navaluesconst>` below for a list of the values interpreted as NaN by default.

@OlegShteynbuk OlegShteynbuk force-pushed the na_values branch 2 times, most recently from db67d62 to d5a52ca Compare June 12, 2017 19:40
@jreback jreback merged commit 3caf858 into pandas-dev:master Jun 13, 2017
@jreback
Copy link
Contributor

jreback commented Jun 13, 2017

thanks @OlegShteynbuk
have a look at the built docs (prob take a bit to generate) http://pandas-docs.github.io/pandas-docs-travis/

and if anything looks amiss, pls open a PR

@OlegShteynbuk
Copy link
Contributor Author

OlegShteynbuk commented Jun 15, 2017

@jreback built docs are OK, it really took some time to generate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean IO CSV read_csv, to_csv Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants