Resetting Index on slice #15930
Comments
Can you make a full copy-pastable example (including constructing |
Sure.
import pandas
data_index = pandas.read_table("data_index.tsv")
df = data_index[data_index['torsions'] == 2]
print(df)
Top Contents of
|
As you mention yourself, you can use Changing this (to do this resetting automatically when doing a slice) would fundamentally change how pandas currently works. There are some ideas for a future release of pandas to allow dataframes without an explicit index (see wesm/pandas2#17), but that is currently just a discussion, with no code or commitment that this will actually happen. |
@jadolfbr this is fundamental pandas behavior. The index is preserved through virtually all operations. That's the point. And combining is actually quite easy with |
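To make the behavior under discussion concrete, here is a minimal sketch. The small frame stands in for data_index.tsv, whose contents are not shown in the thread, so the values and the `value` column are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for data_index.tsv (the real file is not shown here).
data_index = pd.DataFrame({
    "torsions": [1, 2, 3, 2, 2],
    "value": [10, 20, 30, 40, 50],
})

# Boolean slicing keeps the original row labels of the matching rows.
df = data_index[data_index["torsions"] == 2]
kept_labels = list(df.index)

# reset_index(drop=True) renumbers from 0 and discards the old labels.
renumbered = df.reset_index(drop=True)
new_labels = list(renumbered.index)

print(kept_labels)
print(new_labels)
```

The preserved labels are what let pandas align rows by label in later operations, which is the point being made above.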
I think options to allow dataframes without indexes would be great. They are extremely unwieldy without resetting that index. Maybe if you have multiple indexes with layers, etc., they would be good. However, these easily run into tons of problems in pandas as it stands now, so most of us in our lab shy away from that. Here is a simple example. Maybe you can suggest a better way and say that I'm using pandas wrong. That's fine too. This is just to divide two values that are different experiments (and yes, in this case the row order does matter):
length_data['length_rr'] = (rr_data[rr_data['exp'] == 'mw']['length_rr'].reset_index(drop=True)\
/rr_data[rr_data['exp'] == 'mo']['length_rr'].reset_index(drop=True))
length_data2['length_rr'] = (rr_data[rr_data['exp'] == 'rmw']['length_rr'].reset_index(drop=True)\
/rr_data[rr_data['exp'] == 'rmo']['length_rr'].reset_index(drop=True))
length_enrich = pandas.concat([length_data, length_data2]).reset_index(drop=True)
Note that for the concat, pandas throws a duplicate index error if you do not reset the index with drop. For joining, etc., the indexes can again get in the way. Many times you want to join based on some operation on the data, so we use merge. But I guess that might be preference.
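As a side note on the concat step, `pandas.concat` accepts `ignore_index=True`, which renumbers the rows of the result in one call. A minimal sketch with made-up frames (the values are assumptions, not the lab data):

```python
import pandas as pd

# Hypothetical frames standing in for length_data and length_data2.
a = pd.DataFrame({"length_rr": [1.0, 2.0]})
b = pd.DataFrame({"length_rr": [3.0, 4.0]})

# A plain concat keeps each input's labels, so labels repeat.
stacked = pd.concat([a, b])

# ignore_index=True gives the result a fresh 0..n-1 index.
renumbered = pd.concat([a, b], ignore_index=True)

# Same result as concatenating and then resetting with drop=True.
via_reset = stacked.reset_index(drop=True)

print(list(stacked.index))
print(list(renumbered.index))
```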
|
you are not using pandas power at all. you are in fact making a big assumption that the data that you are dividing is exactly the same length and perfectly lines up. maybe that's always true for you. I would probably do something like this. In fact this is quite general and deals with missing labeled data.
This is a slightly different and IMHO better way of organizing things.
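The code from this comment is not reproduced here. As an illustration only, one way to let the index do the alignment might look like the sketch below. The `sample` key column and all values are assumptions; the thread does not say which column identifies matching rows across experiments.

```python
import pandas as pd

# Hypothetical layout: one row per (sample, exp) pair.
# "sample" is an assumed key column; it does not appear in the thread.
rr_data = pd.DataFrame({
    "sample": ["a", "b", "c", "a", "b"],
    "exp": ["mw", "mw", "mw", "mo", "mo"],
    "length_rr": [2.0, 9.0, 5.0, 4.0, 3.0],
})

# Move the key and the experiment label into the index, then pivot the
# experiment level into columns: one row per sample, one column per exp.
wide = rr_data.set_index(["sample", "exp"])["length_rr"].unstack("exp")

# Division aligns on the sample label; a sample missing from one
# experiment yields NaN instead of silently shifting the rows.
ratio = wide["mw"] / wide["mo"]
print(ratio)
```

Here sample "c" has no "mo" measurement, so its ratio is NaN rather than being paired with the wrong row, which is the situation positional division cannot detect.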
|
Thanks for the suggestion. Yes, this seems much better than what I was trying to do - use the indexes instead of fighting with them and trying to go around them. Makes sense. I guess this would make joining a whole lot more straightforward too. Awesome. Thanks for taking the time to write back. |
Code Sample, a copy-pastable example if possible
Problem description
When slicing a dataframe, the index is not reset by default. This becomes an issue if you want to output that dataframe, combine that dataframe with other dataframes (good luck with that), or output the dataframe without two index columns.
Fixing this will not break code in the wild.
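For the output case specifically, a minimal sketch: `DataFrame.to_csv` takes `index=False`, which leaves the row labels out of the written file without touching the frame. The data below is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical data; the real data_index.tsv is not shown in this issue.
df = pd.DataFrame({"torsions": [1, 2, 2], "value": [10, 20, 30]})
sliced = df[df["torsions"] == 2]

# Write tab-separated text to an in-memory buffer, omitting the index.
buffer = io.StringIO()
sliced.to_csv(buffer, sep="\t", index=False)
text = buffer.getvalue()
print(text)
```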
Expected Output
Index being correct - without the need to manually call reset_index over and over again. This is much more intuitive to end users.
-> At end of slice, call reset_index(drop = True) on the returned dataframe or current dataframe if you are slicing in-place.
Output of pd.show_versions()
loaded rc file /Users/jadolfbr/.matplotlib/matplotlibrc
matplotlib version 1.5.1
verbose.level helpful
interactive is False
platform is darwin
INSTALLED VERSIONS
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.1
nose: 1.3.7
pip: 9.0.1
setuptools: 20.3.1
Cython: None
numpy: 1.11.1
scipy: 0.13.0b1
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: None
patsy: 0.4.0
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None