
vectorised setting of timestamp columns fails with python datetime and numpy datetime64 #10408


Closed
seanv507 opened this issue Jun 22, 2015 · 13 comments · Fixed by #10644
Labels
Bug Datetime Datetime data dtype Dtype Conversions Unexpected or buggy dtype conversions

@seanv507

import pandas as pd
import numpy as np
import datetime as dt
z = dt.date(2010, 11, 1)
zs = [z + dt.timedelta(days=r) for r in range(5)]
df = pd.DataFrame({'obj': zs, 'b': pd.Timestamp('2010-10-01'), 'c': pd.Timestamp('2010-10-01')})
df.dtypes

# df.loc[0:2, 'c'] = dt.date(2010, 10, 12)  # raises: long() argument must be a string or a number, not 'datetime.date'
df.loc[0:2, 'c'] = np.datetime64('2010-10-12')  # bug: sets the values to 1970-01-01
df.at[4, 'c'] = np.datetime64('2010-10-12')  # works
df.loc[0:2, 'obj'] = np.datetime64('2010-10-12')  # works

df.loc[0:2, 'obj'] = dt.date(2010, 10, 12)

df
ind           b           c         obj
0    2010-10-01  1970-01-01  2010-10-12
1    2010-10-01  1970-01-01  2010-10-12
2    2010-10-01  1970-01-01  2010-10-12
3    2010-10-01  2010-10-01  2010-11-04
4    2010-10-01  2010-10-12  2010-11-05

I am using Pandas 0.16.2

@jreback
Contributor

jreback commented Jun 22, 2015

datetime.date objects are not supported by most datetime operations; simply use datetime.datetime and this will work. datetime.date values are stored with object dtype and are therefore not very efficient. If you want a pure date object, then you might find Period objects useful.

I will mark this as a bug, and if you'd like to dig in, you are welcome to. The fix is actually pretty straightforward.
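
A minimal sketch of the workaround suggested above, assuming a recent pandas: convert the datetime.date to a pd.Timestamp before assigning it into a datetime64 column, or use a daily Period when a pure date is wanted. The frame and the variable names are illustrative and do not come from the report.

```python
import datetime as dt

import pandas as pd

# Illustrative frame with a single datetime64 column (hypothetical data).
df = pd.DataFrame({"c": [pd.Timestamp("2010-10-01")] * 5})

d = dt.date(2010, 10, 12)

# Convert the date to a Timestamp first, so the assigned value already
# matches the column's datetime dtype.
df.loc[0:2, "c"] = pd.Timestamp(d)

# A Period with daily frequency represents a pure calendar date.
p = pd.Period(d.isoformat(), freq="D")

print(df)
print(p)
```

With the conversion done up front, the column keeps its datetime dtype and only rows 0 to 2 change.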

@jreback jreback added Datetime Datetime data dtype Dtype Conversions Unexpected or buggy dtype conversions labels Jun 22, 2015
@jreback jreback added this to the Someday milestone Jun 22, 2015
@jreback jreback added the Bug label Jun 22, 2015
@seanv507
Author

@jreback, did you notice that there is also an issue with numpy datetime64 dates, not just Python datetime.date objects? (i.e. I get 1970 when I assign a numpy datetime64 date)

@jreback
Contributor

jreback commented Jun 23, 2015

@seanv507 you must have an older version of pandas/numpy. In the current version this works with np.datetime64 (see issue #9516); the fix is in 0.16.0.

In [13]: df.loc[0:2,'c']=np.datetime64('2010-10-12') # sets to 1970...

In [14]: df
Out[14]: 
           b                             c         obj
0 2010-10-01 1970-01-01 00:00:00.000014894  2010-11-01
1 2010-10-01 1970-01-01 00:00:00.000014894  2010-11-02
2 2010-10-01 1970-01-01 00:00:00.000014894  2010-11-03
3 2010-10-01 2010-10-01 00:00:00.000000000  2010-11-04
4 2010-10-01 2010-10-01 00:00:00.000000000  2010-11-05

@jorisvandenbossche
Member

@jreback I can confirm this also with pandas 0.16.2 / numpy 1.9.2

In [36]: np.__version__
Out[36]: '1.9.2'

In [37]: pd.__version__
Out[37]: '0.16.2'

In [38]: df.loc[0:2,'c'] = np.datetime64('2010-10-12')

In [39]: df
Out[39]:
           b                             c         obj
0 2010-10-01 1970-01-01 00:00:00.000014894  2010-11-01
1 2010-10-01 1970-01-01 00:00:00.000014894  2010-11-02
2 2010-10-01 1970-01-01 00:00:00.000014894  2010-11-03
3 2010-10-01 2010-10-01 00:00:00.000000000  2010-11-04
4 2010-10-01 2010-10-01 00:00:00.000000000  2010-11-05

@seanv507
Author

That is the version I am using, as I mentioned in my original bug report
(I forgot to state the numpy version: 1.9.2).


@jreback
Contributor

jreback commented Jun 23, 2015

Ahh, OK, it seems that the test was insufficient: we are testing the equivalent of [21], while you are doing [22].

In [21]: np.datetime64(Timestamp('2010-10-12'))
Out[21]: numpy.datetime64('2010-10-11T20:00:00.000000-0400')

In [22]: np.datetime64('2010-10-12')
Out[22]: numpy.datetime64('2010-10-12')

OK, this is probably easy enough to fix. Want to take a crack at it?
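
The 00:00:00.000014894 in the output earlier in the thread is consistent with the day count of the day-resolution datetime64 being read as nanoseconds. That is an inference from the numbers, not something confirmed in this thread. A small NumPy-only sketch of the arithmetic:

```python
import numpy as np

d = np.datetime64("2010-10-12")        # parsed with day resolution
unit, _ = np.datetime_data(d.dtype)    # unit code of the scalar's dtype

# The underlying integer counts days since 1970-01-01.
days = int(d.astype("int64"))

# Reading that same integer as nanoseconds lands just after the epoch.
misread = np.datetime64(days, "ns")

# Converting the unit explicitly keeps the intended date.
converted = d.astype("datetime64[ns]")

print(unit, days)
print(misread)
print(converted)
```

If this reading is right, a test that only uses nanosecond-resolution values, as in [21], would not exercise the unit conversion at all.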

@jreback jreback modified the milestones: 0.17.0, Someday Jun 23, 2015
@jreback
Contributor

jreback commented Jun 23, 2015

The original fix was in #9522.

@jorisvandenbossche
Member

@jreback the PR you link to is about a datetime64 on the left-hand side (inside the loc), while here it is the value being assigned, so I don't know whether this is related.
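
To make the distinction concrete, here is a hedged sketch of the two cases, assuming a pandas version in which both behave correctly. The series `s` and `t` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Case 1: datetime64 on the left-hand side, used as a label inside .loc.
s = pd.Series([0, 1, 2], index=pd.date_range("2010-10-10", periods=3))
s.loc[np.datetime64("2010-10-11")] = 10

# Case 2: datetime64 on the right-hand side, as the value being assigned.
# Label slicing with .loc is inclusive, so rows 0 and 1 are set.
t = pd.Series(pd.to_datetime(["2010-10-01"] * 3))
t.loc[0:1] = np.datetime64("2010-10-12")

print(s)
print(t)
```

The first case goes through index lookup and the second through value coercion, so a fix for one would not necessarily cover the other.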

@jreback
Contributor

jreback commented Jun 23, 2015

Ahh, right. OK, it should be straightforward in any event.

@jorisvandenbossche
Member

yep, indeed. But just to be sure this does not get lost, I will open a separate issue, and leave this one for the date error problem

-> #10412

@seanv507
Author

@jreback - yes I will give it a go!

@seanv507 seanv507 reopened this Jun 23, 2015
@jreback
Contributor

jreback commented Jun 23, 2015

@seanv507 Great! Here are the contributing docs. Shout if you need help.

yarikoptic added a commit to neurodebian/pandas that referenced this issue Jul 2, 2015
* commit 'v0.16.2-42-g383865f': (72 commits)
  BUG: provide categorical concat always on axis 0, pandas-dev#10430     numpy 1.10 makes this an error for 1-d on axis != 0
  DOC: update missing.rst with ref to groupby.rst
  BUG: Timedeltas with no specified units (and frac) should raise, pandas-dev#10426
  BUG: using .loc[:,column] fails when the object is a multi-index, pandas-dev#10408
  Removed scikit-timeseries migration docs from FAQ
  BUG: GH10395 bug in DataFrame.interpolate with axis=1 and inplace=True
  BUG: GH10392 bug where Table.select_column does not preserve column name
  TST: Use unicode literals in string test
  PERF: fix _get_level_indexer to accept an intermediate indexer result
  PERF: bench for pandas-dev#10287
  BUG: drop_duplicates drops name(s).
  ENH: Enable ExcelWriter to construct in-memory sheets
  BLD: remove support for 3.2, pandas-dev#9118
  PERF: timedelta and datetime64 ops improvements
  PERF: parse timedelta strings in cython pandas-dev#6755
  closes bug in reset_index when index contains NaT
  Check for size=0 before setting item Fixes pandas-dev#10193
  closes bug in apply when function returns categorical
  BUG: frequencies.get_freq_code raises an error against offset with n != 1
  CI: run doc-tests always
  ...
@schettino72
Contributor

PR #10644
