Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ODS sheets: Wrong results with sparse tables #12

Open
hvbtup opened this issue Nov 26, 2015 · 4 comments
Open

ODS sheets: Wrong results with sparse tables #12

hvbtup opened this issue Nov 26, 2015 · 4 comments

Comments

@hvbtup
Copy link

hvbtup commented Nov 26, 2015

At least the following function return wrong results when I create a sheet that contains data only in the cells (for example) G1, Z1, CD1, CE1:
sheet.ncols()
sheet["CD1"].value
and many more.
It seems as if the table:number-columns-repeated attribute is simply ignored.

@hvbtup
Copy link
Author

hvbtup commented Nov 26, 2015

I think the reason is the logic in tablenormalizer.py, class _ExpandAllLessMaxCount, method expand_cell. The elif condition seems wrong. OTOH the class name says exactly what is happening. This makes me think that using this class is wrong.
In my case, maxcols is less than the table:number-columns-repeated attribute.

I could work around this problem by calling

ezodf.conf.config.set_table_expand_strategy('all_less_maxcount', (100, 100))

Anyway, if the elif clause in expand_cell is called, the xml attribute is removed (and information gets lost) while the cell is treated as repeating-1.
This in turn causes the wrong result for ncols(), accessing columns and so on. And then it's not even possible to work around this by directly looking at the XML node (as I tried), because the table:number-columns-repeated attribute has been removed silently.

I propose to raise an Exception in this case or to mark the sheet and the row as corrupted (raising an exception if accessing cells by position is tried). And the XML attribute should not be removed in this case, allowing the developer to examine it himself.

@matkoniecz
Copy link

@T0ha Are you planning on doing anything with that? I just encountered this bug and it is especially bad as it silently gives completely wrong data.

I almost failed to notice it.

At least pyexcel-ods and odfpy clearly crashed, silently giving bad data is much worse.

sirex added a commit to sirex/tabulator-py that referenced this issue Oct 26, 2016
ODS support is based on ezodf [1] library.

It looks, that ezodf has a bug [2], where number of sheet rows and cells are
calculated incorrectly. Probably this is the reason, why I get extra rows and
cols in tests.

[1] https://github.com/T0ha/ezodf
[2] T0ha/ezodf#12

See: frictionlessdata#28
@sirex
Copy link

sirex commented Oct 26, 2016

Not sure, but maybe this is related: https://github.com/frictionlessdata/tabulator-py/pull/114/files#r85104563

I got too many extra data filled with None values.

@sirex
Copy link

sirex commented Oct 26, 2016

Sorry, issue I posted few hours ago is not related. It turned out, that if cells have a formatting applied, they are considered non-empty, even if they don't contain any data. When I created completely new document and pasted to it only part containing data, issue gone.

sirex added a commit to sirex/tabulator-py that referenced this issue Oct 27, 2016
ODS support is based on ezodf [1] library.

It looks, that ezodf has a bug [2], where number of sheet rows and cells are
calculated incorrectly. Probably this is the reason, why I get extra rows and
cols in tests.

[1] https://github.com/T0ha/ezodf
[2] T0ha/ezodf#12

See: frictionlessdata#28
roll pushed a commit to frictionlessdata/tabulator-py that referenced this issue Oct 27, 2016
* Add ODS support

ODS support is based on ezodf [1] library.

It looks, that ezodf has a bug [2], where number of sheet rows and cells are
calculated incorrectly. Probably this is the reason, why I get extra rows and
cols in tests.

[1] https://github.com/T0ha/ezodf
[2] T0ha/ezodf#12

See: #28

* Fix ODS issue with empty cells

I had to create new sheet and paste unformated text there.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants