ENH: Excel to support reading Timedeltas #4332
Comments
Can you post the error you got? |
Also it would be helpful if you could print the versions of pandas and xlrd you are using (might need openpyxl if you're not using the dev version.) |
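For anyone unsure how to collect that information, here is a minimal sketch using only the standard library. The helper names `installed_version` and `version_report` are made up for illustration; the sketch reports a package as missing instead of failing when it is not installed.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_report(packages=("pandas", "xlrd", "openpyxl")):
    """Map each package name to its installed version (None when not installed)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in version_report().items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting the printed lines into the issue is usually enough to tell whether an upgrade is worth trying first.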
|
please see an example error msg at #4339 |
You have a pretty old version of pandas; 0.11 has been out since April, and 0.12 is releasing this week. Excel parsing uses the CSV parser under the hood, and I'm pretty sure that all 3 of your posted issues are fixed in more recent versions. Please try a newer version and close these issues if that is the case (e.g. #4332, #4340). |
With
The issue does not arise. But now I get: XLDateAmbiguous: 1.0 even if I change to parse_dates=False and index_col=0. There is one column in the xlsx that has time (not date). But apparently, the parser expects a datetime:
|
This is essentially a timedelta. Maybe just change the column formatting to text? |
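To illustrate the "essentially a timedelta" point: Excel stores a time of day as a fraction of a day, so a raw cell value of 1.0 corresponds to 24 hours. The following is a minimal sketch, assuming the raw serial numbers are already available as floats; `excel_time_to_timedelta` is a hypothetical helper, not a pandas or xlrd function.

```python
from datetime import timedelta

SECONDS_PER_DAY = 24 * 60 * 60


def excel_time_to_timedelta(serial):
    """Convert an Excel serial value (measured in days) to a timedelta.

    Rounding to whole seconds hides the floating-point noise that
    fractions such as 1/24 carry.
    """
    if serial < 0:
        raise ValueError(f"negative Excel serial value: {serial!r}")
    return timedelta(seconds=round(serial * SECONDS_PER_DAY))


# A cell holding 0.5 is noon; a cell holding 1.0 is a full day.
half_day = excel_time_to_timedelta(0.5)
full_day = excel_time_to_timedelta(1.0)
```

Treating the column as durations sidesteps the question of which calendar date a bare 1.0 is supposed to mean.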
Can I not get around this? I am receiving these tables from elsewhere, so the process should be automated and I'd rather not touch the Excel tables. |
you could try passing : |
@timmie that sounds like an xlrd error. What happens if you just try to
|
it says: ValueError: dtype is not supported with python parser |
@jtratner So we would need to find a way to read the time column. Can we read it as a string or similar? |
ok...so maybe 2 bugs here, I thought and 2 processing as @jtratner suggest.... |
I tested outside pandas:
Is that what you suggested? |
Maybe this one could help to include better error msgs: https://classic.scraperwiki.com/docs/python/python_excel_guide/ |
I thought there was an issue out there to interpret this as a timedelta, can't find it so converting this issue to do that |
Sorry now I am lost.
Where shall I look next? |
is the |
According to the docs, no: but the source shows that it is read automatically from the file: |
The best thing to do is probably a monkey patch (for now), if you really want that column. Start by defining
so it will use your code (and essentially fix the bug locally for yourself) |
Mmh. This approach is still new for me. I cannot imagine why my file would be so exotic. It seems that xlrd tries to be overly explicit. Would you say it's a pandas bug or an xlrd one? (BTW, thanks a lot for all your responses!) |
not sure |
@timmie if you can share your data, I can try to figure out what's causing the bug and where the issue is occurring (can't promise super-fast turnaround, but probably by this weekend 😄) |
@jtratner : Thank you, very generous! But this is difficult. Let me prepare an anonymised version tomorrow. Anyway, I know where the problem comes from, but I do not know how to finally solve it ;-(
The last row with 00:00 causes the problem:
So it can be solved by the following code:
The only problem is that the last two timestamps appear like
But later I will prepend a date anyway. Is this an accepted solution for the core? Issue #4340 still persists with this workaround |
Thinking more over it, I would say that we would need a date_parser option similar to pd.read_csv see: https://github.com/pydata/pandas/blob/master/pandas/io/parsers.py#L1665 this would need to be added to:
In many data files (like my Excel file), creators count from hour 1 to hour 24. This thinking is the source of the confusion. And since pandas has no metadata tag, we cannot find another way to show this relation. What are your opinions? |
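The hour-1-to-hour-24 convention can be handled once a date is attached to each time. Below is a minimal sketch, assuming the times arrive as "HH:MM" strings; `parse_hour_ending` and its `zero_means_end_of_day` flag are hypothetical names for illustration and are not part of any pandas date_parser API.

```python
from datetime import date, datetime, timedelta


def parse_hour_ending(day, hhmm, zero_means_end_of_day=False):
    """Combine a date with an 'HH:MM' label that may run up to 24:00.

    '24:00' rolls over to midnight of the following day. When
    zero_means_end_of_day is True, '00:00' is treated the same way,
    which suits files whose last row of the day is written as 00:00.
    """
    hours, minutes = (int(part) for part in hhmm.split(":"))
    if not (0 <= hours <= 24 and 0 <= minutes < 60):
        raise ValueError(f"time out of range: {hhmm!r}")
    if hours == 24 and minutes != 0:
        raise ValueError(f"only 24:00 is allowed past 23:59, got {hhmm!r}")
    if zero_means_end_of_day and hours == 0 and minutes == 0:
        hours = 24
    midnight = datetime(day.year, day.month, day.day)
    return midnight + timedelta(hours=hours, minutes=minutes)


first = parse_hour_ending(date(2013, 7, 24), "01:00")
last = parse_hour_ending(date(2013, 7, 24), "24:00")
```

Using a timedelta offset from midnight avoids constructing an invalid `datetime` with hour 24.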
Yeah, a minimal example (just enough to produce the failure) would be
|
@jtratner you can actually create a essentially:
|
@jreback I don't think that we are after timedelta, but rather adding a date_parser here. Look at: #4332 (comment). I have already found a workaround for extended date parsing, but I am unsure how to feed this back into pandas core. |
You create a timedelta, which is why I put the example up here. |
Please find it here: I may add an example script later or tomorrow... |
I added example code to the repo. Please have a look at: |
The issue here is with the way Basically the logic is like this:
This issue has been fixed via #6934 when using So, as far as I can see, the (confusing) root cause of this issue has been fixed and this item can be closed. @jreback |
closed via #6934 |
ExcelFile should print out line or even cell warnings
Today I was spending quite some time debugging why a decoding error stopped the code from reading in a table.
I thought that skiprows counts from 0 (= Excel row 1). It was always failing.
In one line there were column headers; the next line contained units, which probably couldn't be parsed.
I think it would be helpful if the parser showed the line number or even the cell where it fails to read (e.g. due to decoding errors).
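As a rough illustration of the kind of message requested here, the sketch below wraps cell conversion and re-raises with the location attached. It operates on plain lists of rows and is not how pandas or xlrd report errors; `parse_rows` and its arguments are hypothetical. It assumes `skiprows` is a 0-based count, so `skiprows=2` skips Excel rows 1 and 2.

```python
def parse_rows(rows, convert, skiprows=0):
    """Apply `convert` to every cell, reporting the Excel-style location on failure.

    Row and column numbers in the error message are 1-based, matching
    what a spreadsheet user sees.
    """
    parsed = []
    for row_idx, row in enumerate(rows):
        if row_idx < skiprows:
            continue  # skiprows counts rows from the top of the sheet
        converted = []
        for col_idx, cell in enumerate(row):
            try:
                converted.append(convert(cell))
            except (ValueError, TypeError) as exc:
                # UnicodeDecodeError is a ValueError subclass, so decoding
                # failures are reported with their location as well.
                raise ValueError(
                    f"could not parse cell at Excel row {row_idx + 1}, "
                    f"column {col_idx + 1} ({cell!r}): {exc}"
                ) from exc
        parsed.append(converted)
    return parsed


sheet = [["power", "duration"], ["kW", "h"], ["1.5", "2"]]
values = parse_rows(sheet, float, skiprows=2)
```

With `skiprows=1` the units row is not skipped, and the error names row 2, column 1 instead of failing without context.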