-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: fix read_csv to parse timezone correctly #22380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/io/parsers.py
Outdated
dayfirst=dayfirst, | ||
errors='ignore', | ||
infer_datetime_format=infer_datetime_format | ||
) | ||
if not isinstance(converted, DatetimeIndex): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what breaks if you just change box-> True?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
box=True
will make a non-datetime array into an Index
, resulting downstream errors. The downstream expects ndarray
rather than Index
pandas.io.parsers._infer_types breaks. Specifically, pandas._libs.parsers.sanitize_objects only accepts ndarray as the input. Otherwise, it raises an exception. |
right so just put a and see what happens |
Codecov Report
@@ Coverage Diff @@
## master #22380 +/- ##
=======================================
Coverage 92.05% 92.05%
=======================================
Files 169 169
Lines 50733 50733
=======================================
Hits 46702 46702
Misses 4031 4031
Continue to review full report at Codecov.
|
We if cast data into np.ndarray, it will lost the timezone information. ndarray can't contain timezone information. link That's why I had to bypass it. |
So this is a case I didn't account for when I recently fixed
I don't have the patch off the top of my head, but what would happen if instead if the |
@mroeschke Thinks for the idea! I don't want to modify I think I have come up with a solution, inspired by @mroeschke .
The whole problem was caused by |
@swyoon object arrays of Timestamps are quite non performant |
b0430cb
to
45c5432
Compare
@jreback I have applied |
@@ -674,3 +674,19 @@ def test_parse_date_float(self, data, expected, parse_dates): | |||
# (i.e. float precision should remain unchanged). | |||
result = self.read_csv(StringIO(data), parse_dates=parse_dates) | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
def test_parse_timezone(self): | |||
import pytz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move import to top of file.
8ef5b1d
to
e4fc116
Compare
a3e4f7b
to
52aa9b4
Compare
Yeah...Travis has been acting up a bit. Restarted the failing builds, as they didn't look related to this PR. |
@gfyoung Thank, but it failed again... failed job Could you run it one more time, please? |
@gfyoung Sorry, but travis still doesn't work |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comment, lgtm. ping on green.
@@ -674,3 +666,18 @@ def test_parse_date_float(self, data, expected, parse_dates): | |||
# (i.e. float precision should remain unchanged). | |||
result = self.read_csv(StringIO(data), parse_dates=parse_dates) | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
def test_parse_timezone(self): | |||
data = """dt,val |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the issue number as a comment
I think we also might have another open issue about this (aside from the referenced one), if you'd do a search and see. |
2f7e9c3
to
91ef33e
Compare
- use box=True for to_datetime(), and adjust downstream processing to the change. - resolve pandas-dev#22256
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
cc @jreback
@jreback is there anything needs to be done before merging? |
thanks @swyoon |
@jreback @gfyoung @mroeschke Thank you all |
Use
box=True
forto_datetime()
, and adjust downstream processing to the change.git diff upstream/master -u -- "*.py" | flake8 --diff