Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XlsxReader casting integers cells to blanks #882

Open
Afollet opened this issue Feb 5, 2021 · 6 comments
Open

XlsxReader casting integers cells to blanks #882

Afollet opened this issue Feb 5, 2021 · 6 comments
Labels

Comments

@Afollet
Copy link

Afollet commented Feb 5, 2021

Hello devs,

Thanks for making/maintaining this library. I am currently in a microservice that validates the data in excel spreadsheets.

I'm reading an excel workbook with multiple sheets in tablesaw-excel.

public static List<Table> readXlsx(InputStream inputStream) throws IOException {
    Source source = new Source(inputStream);
    XlsxReadOptions options = XlsxReadOptions.builder(source).build();
    XlsxReader reader = new XlsxReader();
    return reader.readMultiple(options);
  }

I have cell types in Excel of both numeric and text in a column. When the table is read in, the numeric cell types become blank cells. Therefore, when I try to pull up missing values I get the rows where the numeric type cells are at

Table missing = table.where(table.column("myColumn").isMissing());

Is there some way I can change the behavior of this using XlsxReadOptions? Or is there another method I can use to avoid this behavior?

Thanks!

@Afollet
Copy link
Author

Afollet commented Apr 7, 2021

So, I made some changes to the XlsxReader to change the behavior as I needed ( I also made some changes so it's easier to use the code base in our project):
https://github.com/PDXFinder/pdx-validator/blob/dev/src/main/java/org/pdxfinder/validator/ValidatorXlsxReader.java

I made two changes:

  1. Change the method "appendValue" to return a string representation of a numeric cell if the column type is String.
  2. Change "isBlank" to not count numeric cells with a value of 0 as blank

I added a basic test for this behavior:
https://github.com/PDXFinder/pdx-validator/blob/dev/src/test/java/org/pdxfinder/ValidatorXlsxReaderTest.java

I would be willing to make a PR for these changes on this repo. I was not sure if the maintainers are interested in these changes, so let me know.

@lwhite1 lwhite1 added the excel label May 4, 2021
@lujop
Copy link
Contributor

lujop commented May 7, 2021

I think that it will be solved with #909 if we also do an improvement in type detection like commented on #751

With #909 the reader it's able to read number values in String columns
And if #751 it's solved will be a more consistent type detection that doesn't depend on the first cell type

lujop added a commit to lujop/tablesaw that referenced this issue May 9, 2021
Column type detection uses all column cells to determine the columnType instead of only the first one.
lujop added a commit to lujop/tablesaw that referenced this issue May 9, 2021
Column type detection uses all column cells to determine the columnType instead of only the first one.
@lujop lujop mentioned this issue May 9, 2021
1 task
lwhite1 pushed a commit that referenced this issue May 11, 2021
Column type detection uses all column cells to determine the columnType instead of only the first one.
@lujop
Copy link
Contributor

lujop commented May 11, 2021

@Afollet If you have time to check and can test with the current master version I think that this issue must be solved

@Afollet
Copy link
Author

Afollet commented May 12, 2021

@lujop Can I pull down the snapshot of this version of master? I think that would be easiest for me to test it, but I'm was not able to pull it on https://oss.sonatype.org/

Could you advise me on how to use maven to test these changes?

@lujop
Copy link
Contributor

lujop commented May 12, 2021

@Afollet you can just clone the repo and do a mvn install and it will build and install the tablesaw jar in your local maven repository.
Then use the dependency with the new version as normal and while in the same machine maven will take the dependency from your local repository.

@Afollet
Copy link
Author

Afollet commented May 13, 2021

@lujop Yes, I have tested your changes and it has resolved this issue.

Thanks for the great work @lujop . Dev's like you make the world go round.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants