BUG: joining empty series with dtype: datetime64[ns, UTC] #18447

Closed
kcajf opened this issue Nov 23, 2017 · 1 comment
Labels: Bug; Reshaping (Concat, Merge/Join, Stack/Unstack, Explode); Timezones (Timezone data dtype)

kcajf commented Nov 23, 2017

Code Sample

>>> import numpy as np
>>> import pandas as pd
>>> s1 = pd.Series(pd.to_datetime([], utc=True))
>>> s2 = pd.Series([1,2,3])
>>> pd.concat([s1, s2], axis=1)

IndexError: cannot do a non-empty take from an empty axes.


>>> s1 = pd.Series(pd.to_datetime([], utc=False))
>>> s2 = pd.Series([1,2,3])
>>> pd.concat([s1, s2], axis=1)

Empty DataFrame
Columns: [0, 1]
Index: []


>>> s1 = pd.Series(pd.to_datetime([], utc=True))
>>> s2 = pd.Series([])
>>> pd.concat([s1, s2], axis=1)

Empty DataFrame
Columns: [0, 1]
Index: []


>>> df1 = pd.DataFrame(columns=['a', 'b'])
>>> df2 = pd.DataFrame(np.random.random((2, 2)), columns=['c', 'd'])
>>> df1['a'] = pd.to_datetime(df1['b'], utc=True) 
>>> pd.concat([df1, df2], axis=1)

IndexError: cannot do a non-empty take from an empty axes.


>>> df1.join(df2, how='outer')

IndexError: cannot do a non-empty take from an empty axes.


>>> df1['a'] = pd.to_datetime(df1['b'], utc=False) 
>>> pd.concat([df1, df2], axis=1)

    a    b         c         d
0 NaT  NaN  0.777252  0.657679
1 NaT  NaN  0.274332  0.981532

Problem description

When concatenating multiple Series (or DataFrames) along axis 1, if one of them is empty and has a UTC datetime column, the concatenation fails with IndexError. The same applies to joins. If the datetime column is made non-UTC (i.e. tz-naive), it works as expected. Concatenating two empty objects, one of which has a UTC datetime column, also works as expected.
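
To check which of these cases fail on a given pandas install, the three Series scenarios above can be wrapped in a small script. This is only a sketch: the helper names `concat_raises_index_error` and `check_cases` are made up for illustration, and the second empty Series is given an explicit `float64` dtype (an assumption, to avoid dtype-inference warnings on newer pandas).

```python
import pandas as pd


def concat_raises_index_error(left, right):
    """Return True if pd.concat([left, right], axis=1) raises IndexError."""
    try:
        pd.concat([left, right], axis=1)
    except IndexError:
        return True
    return False


def check_cases():
    """Run the three Series cases from the report and record which ones raise."""
    empty_utc = pd.Series(pd.to_datetime([], utc=True))     # empty, tz-aware (UTC)
    empty_naive = pd.Series(pd.to_datetime([], utc=False))  # empty, tz-naive
    non_empty = pd.Series([1, 2, 3])
    empty_other = pd.Series([], dtype="float64")            # explicit dtype is an assumption
    return {
        "empty UTC + non-empty": concat_raises_index_error(empty_utc, non_empty),
        "empty naive + non-empty": concat_raises_index_error(empty_naive, non_empty),
        "empty UTC + empty": concat_raises_index_error(empty_utc, empty_other),
    }


if __name__ == "__main__":
    print("pandas", pd.__version__)
    for name, raised in check_cases().items():
        print(f"{name}: {'IndexError' if raised else 'ok'}")
```

On the version reported here (0.21.0), only the first case would be expected to report `IndexError`, matching the samples above.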

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.12.9-300.fc26.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.utf8
LOCALE: en_GB.UTF-8

pandas: 0.21.0
pytest: 3.0.7
pip: 9.0.1
setuptools: 36.6.0
Cython: None
numpy: 1.13.3
scipy: 0.19.1
pyarrow: 0.7.1
xarray: 0.9.6
IPython: 6.2.1
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0b10
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.4.0

Contributor

jreback commented Nov 25, 2017

This is related to #12396.

should be straightforward to fix, if you can do a PR!
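
A regression check to accompany such a PR might look like the sketch below. The function name `check_empty_utc_concat` is hypothetical. The expected result (a NaT-filled, tz-aware column aligned to the non-empty index) is an assumption that mirrors the tz-naive DataFrame output shown in the report; it is not something stated in this thread.

```python
import pandas as pd


def check_empty_utc_concat():
    """Concatenate an empty UTC Series with a non-empty one along axis 1.

    Raises IndexError on versions affected by this issue; otherwise
    verifies the assumed post-fix behaviour and returns the result.
    """
    first = pd.Series(pd.to_datetime([], utc=True))
    second = pd.Series([1, 2, 3])

    result = pd.concat([first, second], axis=1)

    # Assumed behaviour: rows follow the non-empty index, and the empty
    # column is filled with NaT while keeping its tz-aware dtype.
    assert result.shape == (3, 2)
    assert result[0].isna().all()
    assert isinstance(result[0].dtype, pd.DatetimeTZDtype)
    assert result[1].tolist() == [1, 2, 3]
    return result
```

Checking the dtype class rather than an exact dtype string keeps the sketch independent of the datetime resolution a particular pandas version picks.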

jreback added the Bug, Difficulty Intermediate, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), and Timezones (Timezone data dtype) labels Nov 25, 2017
jreback added this to the Next Major Release milestone Nov 25, 2017
jreback modified the milestones: Next Major Release, 0.23.0 Jan 21, 2018
jreback pushed a commit to jreback/pandas that referenced this issue May 12, 2018
topper-123 pushed a commit to topper-123/pandas that referenced this issue May 13, 2018
topper-123 pushed a commit to topper-123/pandas that referenced this issue May 13, 2018