Ingest: Excel ingest can't accept more than 26 (Z) columns. #3382
Comments
Here's how it looks in the UI: The same file also produces the error at #3383.
FWIW, we just ran into this same issue when testing the deposit of Excel files created by a user. See attached. Is this a problem that will be solved, or should the user format their data differently? Thanks!
@oscardssmith do you happen to know how many columns are supported in the new CSV parser developed for #3767? I'm wondering if @markrlondon's 17ja030_fig1_data.xlsx could be saved as CSV and ingested that way once the next version of Dataverse is out.
I believe it is roughly unlimited. That would be a solution, but not a very good one, as XLSX can store more information than a corresponding CSV (colour, etc.). As such, this issue should not be closed, as we should at some point support ingesting the original files.
Thanks for looking at this problem. Unfortunately, this particular Excel file has multiple sheets, and each sheet would have to be saved as an individual CSV file. So that's another problem with using that format. (I'm not sure what Dataverse does with an Excel file with multiple sheets, even if it didn't have the column problem.) As long as we can upload the files, I guess I can't complain too much. Although it would be nice if I could disable the displayed warning message. :) - Mark
@markrlondon #585 says, "it is unclear what is typically expected from researchers who use Excel to store their data" so I'd like to encourage you to leave a comment there to explain what your expectations are for Excel. Thanks.
We might be able to fix this by rewriting CSV ingest to use LibreOffice to convert multi-sheet Excel files to one CSV per tab, and ingesting those as CSV. It would still lose some info, but it would be a pretty big improvement, and could be done with about 4 lines of code. We would have to decide, though, that it was a path we liked, as up till now I don't think we have ever split a file when ingesting.
Is the 26-column limit for ingesting Excel still an issue? I manage a Dataverse within the Harvard Dataverse and the user is getting the same "could not parse Excel..." error message. The user's two files that are giving the error have 30 columns each.
Probably. Excel is one of the few formats that hasn't received much love in the past two years.
@paciorek yes, I believe so. I'm not sure if you caught the "convert to CSV" workaround I suggested above, but you could try that if your Excel file doesn't have multiple sheets.
Looks like this is just a limit in the letter-to-number conversion routines for columns, which only support columns A-Z mapping to 0-25. I'll add a PR that also supports AA-ZZ (up to 26**2 == 676 more columns).
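For illustration, here is a minimal sketch of the kind of letter-to-index conversion described in the comment above. It is written in Python and is not the Dataverse (Java) code or the fix from the PR. The function names `column_tag_to_index` and `split_cell_ref` are hypothetical. The sketch treats the column tag as a bijective base-26 number, so it handles A-Z, AA-ZZ, and longer tags alike.

```python
import re

# Hypothetical helper names; this is an illustrative sketch, not the Dataverse code.
_CELL_REF = re.compile(r"^([A-Za-z]+)([0-9]+)$")


def column_tag_to_index(tag: str) -> int:
    """Convert a spreadsheet column tag ("A", "Z", "AA", ...) to a 0-based index."""
    if not tag or not tag.isascii() or not tag.isalpha():
        raise ValueError(f"Unsupported column index tag: {tag!r}")
    index = 0
    for ch in tag.upper():
        # Each letter is a digit in 1..26 (bijective base 26: there is no zero digit).
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1  # shift from 1-based to 0-based


def split_cell_ref(ref: str) -> tuple[int, int]:
    """Split a cell reference such as "AB12" into 0-based (column, row) indices."""
    match = _CELL_REF.match(ref)
    if match is None:
        raise ValueError(f"Not a cell reference: {ref!r}")
    column_tag, row_number = match.groups()
    return column_tag_to_index(column_tag), int(row_number) - 1


if __name__ == "__main__":
    for tag in ("A", "Z", "AA", "AD", "ZZ"):
        print(tag, column_tag_to_index(tag))
```

Under this scheme A-Z map to 0-25 and AA-ZZ map to 26-701, which matches the 676 additional columns mentioned above. A 30-column sheet like the one reported earlier in the thread ends at column AD (index 29).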
See RT 241282
The user had an Excel file with more than 26 columns, i.e. columns past Z (AA, AB, etc.).
Removing the extra columns made the error disappear.
[2016-09-26T15:56:56.664-0400] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.ingest.tabulardata.impl.plugins.xlsx] [tid: _ThreadID=68 _ThreadName=p: thread-pool-1; w: 5] [timeMillis: 1474919816664] [levelValue: 900] [[
Unsupported column index tag: AA]]
[2016-09-26T15:56:56.665-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=68 _ThreadName=p: thread-pool-1; w: 5] [timeMillis: 1474919816665] [levelValue: 800] [[
Ingest failure (IO Exception): Could not parse Excel/XLSX spreadsheet. Could not establish position index of a cell element unambiguously!; Sent push notification to the page.]]
[2016-09-26T15:56:56.668-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestMessageBean] [tid: _ThreadID=68 _ThreadName=p: thread-pool-1; w: 5] [timeMillis: 1474919816668] [levelValue: 800] [[
Error occurred during ingest job!]]