Ingest: line breaks in Excel cause ingest to fail #6874
Comments
Welcome @rdemgenski and thanks for the detailed report. If we work on this, we'd want to consider it at the same time as #3383.
Version: 4.20. Hello everybody, and thank you for reporting this, @rdemgenski. We have the same problem here. Interestingly, the MD5 checksum produced by Dataverse for my test file (which contains a line break, thus leading to the error message and the failure to convert the file to .tab format) is the same hash that I get with another MD5 tool, http://onlinemd5.com/. So the ingest seems to be actually successful, apart from the change documented by @rdemgenski.
Re-reading this, is this not actually just a duplicate of #3383? I think Robert's description & MWE are a lot clearer than the original error report, but it's the same problem.
And is it not the same as #7386, too?
Ah my bad. Thanks for clarifying @pdurbin
There's some recent discussion here:
I wanted to suggest looking at Apache POI for parsing XLSX files, but I now see it is already in use. (I'm not volunteering to refactor the
To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'. If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.
Just wondering: was this really completed? Because it seems you are saying it is not planned.
Hi @bencomp we are closing issues created before 2020-08-18 that do not have the
Hello, first time here - working with @adam3smith at QDR
Line breaks in Excel lead to a reading mismatch and cause ingest to fail:
Issue replicated on demo (https://demo.dataverse.org/dataset.xhtml?persistentId=doi%3A10.70122%2FFK2%2FOIWIG6) with two files that are identical except for the line break. The file with a line break at the end of the text fails to ingest; the other ingests successfully.
Converting the xlsx file to csv and opening it in a text editor shows that a line break inside an xlsx cell becomes a line break in the text output, where it starts a new row that should not exist. This is likely the root issue.
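To illustrate the suspected mechanism, here is a minimal Python sketch. It is not Dataverse's ingest code, and the sample rows are made up for illustration. It writes a small table with a line break inside one cell to CSV text, then compares a naive line-by-line read with a quote-aware CSV parse. Whether the ingest code path actually behaves like the naive read is an assumption.

```python
import csv
import io

# Hypothetical sample data: the "note" cell in the last row ends with a line break.
rows = [
    ["id", "note"],
    ["1", "plain text"],
    ["2", "text with a trailing line break\n"],
]

# Serialize to CSV text. csv.writer quotes any field that contains a newline.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
csv_text = buffer.getvalue()

# Naive read: treat every physical line as one row (ignores quoting).
naive_rows = [line.split(",") for line in csv_text.split("\n") if line]

# Quote-aware read: a newline inside a quoted field stays inside the cell.
parsed_rows = list(csv.reader(io.StringIO(csv_text, newline="")))

print(len(rows), "rows written")
print(len(naive_rows), "rows seen by the naive reader")
print(len(parsed_rows), "rows seen by the quote-aware reader")
```

The naive reader sees one row more than was written, because the embedded line break splits the last record in two. The quote-aware reader recovers the original rows.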