Ingest: Excel ingest can't accept more than 26 (Z) columns. #3382

Closed
kcondon opened this issue Sep 26, 2016 · 12 comments
Labels
Feature: File Upload & Handling; Type: Bug (a defect); User Role: Depositor (creates datasets, uploads data, etc.)
Milestone

Comments

@kcondon
Contributor

kcondon commented Sep 26, 2016

See RT 241282

The user had an Excel file with more than 26 columns, i.e. columns past Z (AA, AB, etc.).
Removing the extra columns made the error disappear.

[2016-09-26T15:56:56.664-0400] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.ingest.tabulardata.impl.plugins.xlsx] [tid: _ThreadID=68 _ThreadName=p: thread-pool-1; w: 5] [timeMillis: 1474919816664] [levelValue: 900] [[
Unsupported column index tag: AA]]

[2016-09-26T15:56:56.665-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=68 _ThreadName=p: thread-pool-1; w: 5] [timeMillis: 1474919816665] [levelValue: 800] [[
Ingest failure (IO Exception): Could not parse Excel/XLSX spreadsheet. Could not establish position index of a cell element unambiguously!; Sent push notification to the page.]]

[2016-09-26T15:56:56.668-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestMessageBean] [tid: _ThreadID=68 _ThreadName=p: thread-pool-1; w: 5] [timeMillis: 1474919816668] [levelValue: 800] [[
Error occurred during ingest job!]]
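For context, an XLSX worksheet stores each cell as a `<c>` element whose `r` attribute holds an A1-style reference such as `AA12`. The sketch below is a hypothetical helper (the `CellRef` name and its fields are invented here, not taken from the Dataverse plugin) showing the first step a parser has to take: separating the column letters from the row number. The "Unsupported column index tag: AA" warning suggests the plugin got past this step and failed at the next one, turning a multi-letter tag into a column index.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: splits an A1-style cell reference such as "AA12"
 * (the form used in the r attribute of an XLSX <c> element) into its
 * column letters and its 1-based row number.
 */
final class CellRef {
    // Uppercase column letters followed by a 1-based row number.
    // Rows are capped at 7 digits so Integer.parseInt cannot overflow
    // (Excel's last row, 1048576, has 7 digits).
    private static final Pattern A1 = Pattern.compile("([A-Z]+)([1-9][0-9]{0,6})");

    final String columnLetters;
    final int row;

    private CellRef(String columnLetters, int row) {
        this.columnLetters = columnLetters;
        this.row = row;
    }

    /** Parses a reference like "B2" or "AA12"; rejects anything else. */
    static CellRef parse(String ref) {
        Matcher m = A1.matcher(ref);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an A1-style cell reference: " + ref);
        }
        return new CellRef(m.group(1), Integer.parseInt(m.group(2)));
    }
}
```

For example, `CellRef.parse("AA12")` yields column letters `"AA"` and row `12`; converting `"AA"` into a numeric column index is a separate step.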

@pdurbin
Member

pdurbin commented Mar 30, 2017

Here's how it looks in the UI:

[Screenshot, 2017-03-30: the ingest error as shown in the UI]

The same file also produces the error at #3383.

pdurbin added the User Role: Depositor label Jul 12, 2017
@markrlondon

FWIW, we just ran into this same issue when testing the deposit of Excel files created by a user. See attached. Is this a problem that will be solved, or should the user be formatting their data differently? Thanks!

17ja030_fig1_data.xlsx

@pdurbin
Member

pdurbin commented Aug 7, 2017

@oscardssmith do you happen to know how many columns are supported in the new CSV parser developed for #3767? I'm wondering if @markrlondon's 17ja030_fig1_data.xlsx could be saved as CSV and ingested that way once the next version of Dataverse is out.

@oscardssmith
Contributor

I believe it is roughly unlimited. That would be a workaround, but not a very good one, as XLSX files can store more information than a corresponding CSV (colour, etc.). As such, this issue should not be closed; at some point we should support ingesting the original files.

@markrlondon

Thanks for looking at this problem. Unfortunately, this particular Excel file has multiple sheets, and each sheet would have to be saved as an individual CSV file, so that's another problem with using that format. (I'm not sure what Dataverse does with an Excel file with multiple sheets, even without the column problem.) As long as we can update the files, I guess I can't complain too much, although it would be nice if I could disable the displayed warning message. :) - Mark

@pdurbin
Copy link
Member

pdurbin commented Aug 10, 2017

@markrlondon #585 says, "it is unclear what is typically expected from researchers who use Excel to store their data", so I'd like to encourage you to leave a comment there explaining what your expectations are for Excel. Thanks.

@oscardssmith
Contributor

We might be able to fix this by rewriting Excel ingest to use LibreOffice to convert multi-sheet Excel files to one CSV per tab, and then ingesting those as CSV. It would still lose some information, but it would be a pretty big improvement, and it could be done in about 4 lines of code. We would have to decide, though, that this is a path we like, since up till now I don't think we have ever split a file when ingesting.
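To illustrate what "one CSV per tab" would produce, here is a minimal sketch of only the output step. It assumes, hypothetically, that the workbook has already been read (by LibreOffice or a spreadsheet library) into an ordered map of sheet name to rows of cell text; the class name, method names and file-naming scheme are invented for illustration and are not part of Dataverse. Fields are quoted per RFC 4180 so that commas, quotes and line breaks inside cells survive the conversion.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the "one CSV per tab" idea. Assumes the workbook
 * has already been read into an ordered map of sheet name -> rows of cell
 * text (null cells are not expected; use "" for empty cells).
 */
final class SheetsToCsv {

    /** Returns an ordered map of output file name -> CSV text, one entry per sheet. */
    static Map<String, String> split(String baseName, Map<String, List<List<String>>> sheets) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<List<String>>> sheet : sheets.entrySet()) {
            // Make the sheet name file-system friendly.
            String safe = sheet.getKey().replaceAll("[^A-Za-z0-9_-]", "_");
            String name = baseName + "_" + safe + ".csv";
            // Two sheet names can sanitize to the same string; add a counter.
            int n = 2;
            while (out.containsKey(name)) {
                name = baseName + "_" + safe + "_" + n++ + ".csv";
            }
            StringBuilder csv = new StringBuilder();
            for (List<String> row : sheet.getValue()) {
                for (int i = 0; i < row.size(); i++) {
                    if (i > 0) csv.append(',');
                    csv.append(quote(row.get(i)));
                }
                csv.append("\r\n"); // RFC 4180 uses CRLF line breaks
            }
            out.put(name, csv.toString());
        }
        return out;
    }

    /** Quotes a field per RFC 4180 if it contains a comma, quote, or line break. */
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }
}
```

As the thread notes, formatting such as colour does not survive this step; only cell text is carried over, one file per sheet.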

@paciorek

Is the 26-column limit for ingesting Excel still an issue? I manage a Dataverse within the Harvard Dataverse, and a user is getting the same "Could not parse Excel..." error message. The user's two files that are giving the error have 30 columns each.

@oscardssmith
Contributor

Probably. Excel is one of the few formats that hasn't received much love in the past two years.

@pdurbin
Member

pdurbin commented Sep 14, 2018

@paciorek yes, I believe so. I'm not sure if you caught the "convert to CSV" workaround I suggested above but you could try that, if your Excel file doesn't have multiple sheets.

@qqmyers
Member

qqmyers commented May 29, 2019

Looks like this is just a limit in the letter-to-number conversion routines for column labels, which only support single letters A-Z (mapping to indices 0-25). I'll add a PR that also supports AA-ZZ (up to 26**2 == 676 more columns).
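Supporting AA-ZZ on top of A-Z gives 26 + 26² = 702 addressable columns (A through ZZ, zero-based 0-701). For comparison, the sketch below treats column labels as bijective base-26 numerals (digits A=1 ... Z=26, with no zero digit), which covers labels of any length up to Excel's own last column, XFD (16,384 columns). It is an illustration with hypothetical names (`ColumnIndex`, `columnToIndex`, `indexToColumn`), not the code merged for this issue.

```java
/**
 * Sketch of spreadsheet column-label conversion as bijective base-26:
 * "A" -> 0, "Z" -> 25, "AA" -> 26, "ZZ" -> 701, "XFD" -> 16383.
 */
final class ColumnIndex {

    // Excel 2007+ labels never exceed three letters (last column is XFD).
    private static final int MAX_LETTERS = 3;

    /** Converts a column label such as "AA" to a zero-based index. */
    static int columnToIndex(String label) {
        if (label == null || label.isEmpty() || label.length() > MAX_LETTERS) {
            throw new IllegalArgumentException("Unsupported column index tag: " + label);
        }
        int n = 0;
        for (int i = 0; i < label.length(); i++) {
            char c = label.charAt(i);
            if (c < 'A' || c > 'Z') {
                throw new IllegalArgumentException("Unsupported column index tag: " + label);
            }
            n = n * 26 + (c - 'A' + 1); // digits run 1..26: there is no zero digit
        }
        return n - 1; // shift from 1-based to zero-based
    }

    /** Converts a zero-based index back to a column label such as "AA". */
    static String indexToColumn(int index) {
        if (index < 0) {
            throw new IllegalArgumentException("Negative column index: " + index);
        }
        StringBuilder sb = new StringBuilder();
        int n = index + 1; // work in 1-based bijective numbering
        while (n > 0) {
            int rem = (n - 1) % 26;        // 0..25 -> 'A'..'Z'
            sb.append((char) ('A' + rem));
            n = (n - 1) / 26;
        }
        return sb.reverse().toString();    // digits were produced least-significant first
    }
}
```

The "no zero digit" rule is why a plain base-26 conversion gets `AA` wrong: `A` must act as 1 in every position, so `AA` is 26·1 + 1 = 27 (zero-based 26), immediately after `Z`.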

@mheppler
Contributor

Seeing as the PR #5891 from @qqmyers was approved by @landreev and merged by @kcondon, I am closing this issue as "Done".

pdurbin added this to the 4.15 milestone Jun 14, 2019

7 participants