consolidated the duplicate definitions of NA values (in parsers & IO) #16589
Codecov Report
@@ Coverage Diff @@
## master #16589 +/- ##
==========================================
- Coverage 90.95% 90.92% -0.03%
==========================================
Files 161 161
Lines 49276 49276
==========================================
- Hits 44817 44805 -12
- Misses 4459 4471 +12
Continue to review full report at Codecov.
pandas/tests/io/parser/na_values.py
Outdated
'#N/A', 'N/A', 'n/a', 'NA', '#NA', 'NULL', 'null',
'NaN', 'nan', '-NaN', '-nan', '#N/A N/A', ''])
assert _NA_VALUES == parsers._NA_VALUES
_NA_VALUES = parsers._NA_VALUES
This needs to stay; it asserts that nothing has changed.
reverted the change.
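For readers following the thread, the guard requested above can be sketched roughly as follows. This is only an illustration, not the actual pandas test: `common` and `parsers` here are hypothetical stand-in objects rather than the real modules, and the literal set is copied from the list shown in the io.rst diff in this PR.

```python
from types import SimpleNamespace

# Hypothetical stand-in for io.common, which owns the single definition.
common = SimpleNamespace(_NA_VALUES={
    '-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A',
    'N/A', 'n/a', 'NA', '#NA', 'NULL', 'null', 'NaN', '-NaN', 'nan',
    '-nan', '',
})

# Hypothetical stand-in for parsers, which reuses the same object
# instead of keeping a second copy.
parsers = SimpleNamespace(_NA_VALUES=common._NA_VALUES)


def test_default_na_values_unchanged():
    # Deliberately hard-coded: if someone edits the shared constant,
    # this literal no longer matches and the test fails.
    expected = {
        '-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A',
        'N/A', 'n/a', 'NA', '#NA', 'NULL', 'null', 'NaN', '-NaN', 'nan',
        '-nan', '',
    }
    assert expected == parsers._NA_VALUES


test_default_na_values_unchanged()
```

Replacing the literal with `_NA_VALUES = parsers._NA_VALUES` would make the comparison trivially true, which is why the hard-coded copy is kept in the test.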
pandas/_libs/parsers.pyx
Outdated
@@ -273,12 +275,16 @@ cdef extern from "parser/io.h":

DEFAULT_CHUNKSIZE = 256 * 1024


def c_type_conv(st):
    cdef bytes py_bytes = st.encode()
I think we already have a routine like this; look around.
Thanks, I think I have found the function:
_NA_VALUES = _ensure_encoded(parsers._NA_VALUES)
With it, even the list comprehension is not needed. I need to double-check, as I haven't used Cython before.
done
Another function that can be used is asbytes.
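As a rough picture of what such an encoding helper does, here is a pure-Python sketch. It is an assumption for illustration only: `ensure_encoded` is a hypothetical name, it is not the Cython `_ensure_encoded` from parsers.pyx, and it targets Python 3 string semantics.

```python
def ensure_encoded(values, encoding="utf-8"):
    """Return a new list in which every item is a bytes object."""
    result = []
    for value in values:
        if isinstance(value, str):
            # Text is encoded so the C tokenizer can compare raw bytes.
            value = value.encode(encoding)
        elif not isinstance(value, bytes):
            # Anything else (e.g. a number) goes through its str() form.
            value = str(value).encode(encoding)
        result.append(value)
    return result


na_values = ['NA', 'NULL', 'nan', '']
encoded = ensure_encoded(na_values)
```

With one helper applied to the whole collection, no per-item conversion function or list comprehension is needed at the call site.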
@gfyoung look OK to you?
Awesome that this works! My only question is whether we should be importing from
I think it should not matter from an execution point of view, since parsers imports _NA_VALUES from common. "Explicit is better than implicit": importing from common makes it explicit. On the other hand, I'm not sure what the role of parsers is. Is it a gatekeeper?
Importing from io.common is best.
Will change to io.common.
@OlegShteynbuk: nitpicking here: I think we generally use
Thanks, changed. I was not aware of this.
Got the message "No test commands were found"; local tests were OK.
OK, restarted Circle. Ping on green.
Removed the repetition of the same default NA values in io.rst. Just after the defaults are listed in io.rst there is a sentence beginning "Although a 0-length string"; should I remove this sentence?
@@ -1020,8 +1019,11 @@ the corresponding equivalent values will also imply a missing value (in this cas
``[5.0,5]`` are recognized as ``NaN``.

To completely override the default values that are recognized as missing, specify ``keep_default_na=False``.
The default ``NaN`` recognized values are ``['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A','N/A', 'NA',
Can you build the docs and show a rendering of this page? I think this might generate a build warning (and may not render correctly).
I built the docs locally before committing. There were warnings; some of them might be related to Python 3 (I have 2.7.13 on Linux), and the file doc/source/style.ipynb was also a problem, but the generated HTML looks OK. I cannot attach an HTML file to this reply.
The only difference I can see is a blank line because of the new label, which might not be a bad thing at all. Alternatively, I can remove the label and reuse the existing one for the heading NA Values, which is several lines above.
Can you show a screenshot of the built page and also of the doc-string (in IPython)?
I have added the label and a reference to the label, but not the doc-string (in IPython); that will probably be the next step.
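To make the `keep_default_na=False` sentence in the diff above concrete, the following sketch models how the set of recognized missing-value strings is chosen. It is a simplified illustration under stated assumptions: `effective_na_values` and `DEFAULT_NA_VALUES` are hypothetical names, not the parser's internals, and the sketch handles strings only (it does not model the numeric equivalents such as ``[5.0,5]`` mentioned in the docs).

```python
# Default strings, copied from the list in the io.rst diff in this PR.
DEFAULT_NA_VALUES = frozenset({
    '-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A',
    'N/A', 'n/a', 'NA', '#NA', 'NULL', 'null', 'NaN', '-NaN', 'nan',
    '-nan', '',
})


def effective_na_values(na_values=(), keep_default_na=True):
    """Return the strings that would be treated as missing."""
    user_values = set(na_values)
    if keep_default_na:
        # User-supplied values extend the defaults.
        return user_values | DEFAULT_NA_VALUES
    # Defaults are dropped; only user-supplied values remain.
    return user_values


extended = effective_na_values(['5'])
overridden = effective_na_values(['5'], keep_default_na=False)
```

Because the documentation and the parser both depend on this one list, keeping a single definition of the defaults avoids the two drifting apart.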
doc/source/io.rst
Outdated
NA values. By default the following values are interpreted as NaN:
``'-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A', 'N/A', 'n/a', 'NA',
'#NA', 'NULL', 'null', 'NaN', '-NaN', 'nan', '-nan', ''``.
NA values. By default the following values are interpreted as NaN: See :ref:`na values const
I think the sentence "By default the following values are interpreted as NaN:" can be removed, leaving:
See :ref:`na values const <io.navaluesconst>` below for a list of the values interpreted as NaN by default.
db67d62 to d5a52ca
Thanks @OlegShteynbuk, and if anything looks amiss, please open a PR.
@jreback the built docs are OK; they really took some time to generate.
xref #16534 (comment)
xref #16606