Column type detection fails in XlsxReader #751
Hi @Nadavbi87, could you check whether you can reproduce this by first exporting the Excel file to CSV and seeing if the same problem occurs on import?
Hi all, I can always reproduce the problem by using one of the empty-string indicators (TypeUtils.MISSING_INDICATORS) as the first value in the column (after the header). In this case the column is identified as a StringColumn and the subsequent numeric values are ignored (the column is filled with empty values). The problem does not occur with CSV because column type identification there is done differently than in the Excel reader (which relies on the cellType). I think there are 2 different issues there:
I can provide a couple of (failing) tests and try to implement a solution to the 1st issue if you want.
We have a The difficulty of using |
@benmccann The current type detection isn't based on the values; it's based on the cell type in Excel.
Ah, I wasn't aware Excel provided a cell type.
Column type detection uses all column cells to determine the columnType instead of only the first one.
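The idea of scanning every cell rather than only the first one can be sketched roughly as follows. This is a simplified illustration, not the actual XlsxReader change: `ColumnTypeSketch`, `CellKind`, and `detect` are hypothetical names, and falling back to STRING for mixed or all-blank columns is an assumption made for the example.

```java
import java.util.List;

// Hypothetical, simplified model of column type detection.
// It does not use the tablesaw or Apache POI APIs.
final class ColumnTypeSketch {

    // Assumed stand-in for the cell types a spreadsheet reader might report.
    enum CellKind { BLANK, STRING, NUMERIC, BOOLEAN }

    // Looks at every cell in the column instead of only the first one.
    // Blank cells are skipped; if the remaining cells disagree, or if
    // the column has no non-blank cell, the result is STRING (assumption).
    static CellKind detect(List<CellKind> cells) {
        CellKind detected = null;
        for (CellKind kind : cells) {
            if (kind == CellKind.BLANK) {
                continue; // a blank cell says nothing about the column type
            }
            if (detected == null) {
                detected = kind; // first non-blank cell seen so far
            } else if (detected != kind) {
                return CellKind.STRING; // mixed kinds: fall back to text
            }
        }
        return detected == null ? CellKind.STRING : detected;
    }
}
```

With this approach a column that starts with a blank cell followed by numeric cells is detected as NUMERIC, which is the case reported in this issue.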
Hi,
First of all, thanks for this awesome package.
I'm using the XlsxReader to import an Excel file, and in one of my tests I came across the following issue:
When the reader auto-detects the column types and the first value in a column is empty, it detects the column as a String column and creates a String-typed column. Up to here everything is fine.
When any other value in that column is numeric (in my case all of them), it gets ignored and an empty value is returned instead.
As a result, I get a table with a column in which all of the values are empty.
I would expect to get all the values at least as strings, not as empty values.
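A minimal sketch of the expected fallback, where a numeric cell landing in a String column is kept as text rather than dropped. `StringFallbackSketch` and `appendAsText` are hypothetical names for illustration only, and the plain `List<String>` stands in for a String column; this is not the XlsxReader code.

```java
import java.util.List;

// Hypothetical helper; the List<String> is a stand-in for a String column.
final class StringFallbackSketch {

    // Appends the cell value as text. An empty cell (modeled here as null)
    // becomes an empty string; any other value keeps its textual form
    // instead of being replaced by a missing value.
    static void appendAsText(List<String> column, Object cellValue) {
        if (cellValue == null) {
            column.add("");
            return;
        }
        if (cellValue instanceof Double) {
            double d = (Double) cellValue;
            // Render whole numbers without a trailing ".0" (assumed preference).
            if (d == Math.rint(d) && Math.abs(d) < 1e15) {
                column.add(String.valueOf((long) d));
                return;
            }
        }
        column.add(String.valueOf(cellValue));
    }
}
```

Spreadsheet numeric cells are modeled here as `Double`, which is an assumption of the sketch.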
The logic is in XlsxReader's private Column<?> appendValue(Column<?> column, Cell cell) method.
I tried to solve it by using the "specifying the datatypes for each column" approach.
I did the following:
Then I realized that columnTypedToDetect is not used in the XlsxReader.
*Another suggestion: when reading multiple sheets with
public List<Table> readMultiple(XlsxReadOptions options) throws IOException
it would be good to be able to pass a list of options, each bound to a specific sheet. Currently, if I want, for example, a different name (or a different column type list) for each sheet/table, I need to perform multiple reads (as many as there are sheets) and set the options for each one, even though under the hood each call goes over all the sheets and returns only the sheet that corresponds to the index passed in the options.
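The suggestion could look roughly like the sketch below: one pass over the sheets, with options looked up per sheet index. Everything here is hypothetical (`PerSheetOptionsSketch`, `SheetOptions`, `tableNames`), and it only models the table name to keep the example small; it is not an existing tablesaw API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of per-sheet read options; not part of tablesaw.
final class PerSheetOptionsSketch {

    // Assumed minimal options object: only a table name.
    static final class SheetOptions {
        final String tableName;

        SheetOptions(String tableName) {
            this.tableName = tableName;
        }
    }

    // Visits each sheet exactly once. If options are registered for the
    // sheet's index, their table name is used; otherwise the sheet name
    // is used as the default.
    static List<String> tableNames(List<String> sheetNames,
                                   Map<Integer, SheetOptions> optionsBySheetIndex) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < sheetNames.size(); i++) {
            SheetOptions options = optionsBySheetIndex.get(i);
            names.add(options != null ? options.tableName : sheetNames.get(i));
        }
        return names;
    }
}
```

Keying the options by sheet index mirrors how the current options select a sheet, and a single call would replace one read per sheet.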
I'm using the latest version, 0.37.3.
Thanks,
Nadav