ODS sheets: Wrong results with sparse tables #12

hvbtup · 2015-11-26T10:32:49Z

At least the following function return wrong results when I create a sheet that contains data only in the cells (for example) G1, Z1, CD1, CE1:
sheet.ncols()
sheet["CD1"].value
and many more.
It seems as if the table:number-columns-repeated attribute is simply ignored.

hvbtup · 2015-11-26T12:46:58Z

I think the reason is the logic in tablenormalizer.py, class _ExpandAllLessMaxCount, method expand_cell. The elif condition seems wrong. OTOH the class name says exactly what is happening. This makes me think that using this class is wrong.
In my case, maxcols is less than the table:number-columns-repeated attribute.

I could work around this problem by calling

ezodf.conf.config.set_table_expand_strategy('all_less_maxcount', (100, 100))

Anyway, if the elif clause in expand_cell is called, the xml attribute is removed (and information gets lost) while the cell is treated as repeating-1.
This in turn causes the wrong result for ncols(), accessing columns and so on. And then it's not even possible to work around this by directly looking at the XML node (as I tried), because the table:number-columns-repeated attribute has been removed silently.

I propose to raise an Exception in this case or to mark the sheet and the row as corrupted (raising an exception if accessing cells by position is tried). And the XML attribute should not be removed in this case, allowing the developer to examine it himself.

matkoniecz · 2016-02-09T21:34:34Z

@T0ha Are you planning on doing anything with that? I just encountered this bug and it is especially bad as it silently gives completely wrong data.

I almost failed to notice it.

At least pyexcel-ods and odfpy clearly crashed, silently giving bad data is much worse.

ODS support is based on ezodf [1] library. It looks, that ezodf has a bug [2], where number of sheet rows and cells are calculated incorrectly. Probably this is the reason, why I get extra rows and cols in tests. [1] https://github.com/T0ha/ezodf [2] T0ha/ezodf#12 See: frictionlessdata#28

sirex · 2016-10-26T11:52:00Z

Not sure, but maybe this is related: https://github.com/frictionlessdata/tabulator-py/pull/114/files#r85104563

I got too many extra data filled with None values.

sirex · 2016-10-26T16:12:42Z

Sorry, issue I posted few hours ago is not related. It turned out, that if cells have a formatting applied, they are considered non-empty, even if they don't contain any data. When I created completely new document and pasted to it only part containing data, issue gone.

ODS support is based on ezodf [1] library. It looks, that ezodf has a bug [2], where number of sheet rows and cells are calculated incorrectly. Probably this is the reason, why I get extra rows and cols in tests. [1] https://github.com/T0ha/ezodf [2] T0ha/ezodf#12 See: frictionlessdata#28

* Add ODS support ODS support is based on ezodf [1] library. It looks, that ezodf has a bug [2], where number of sheet rows and cells are calculated incorrectly. Probably this is the reason, why I get extra rows and cols in tests. [1] https://github.com/T0ha/ezodf [2] T0ha/ezodf#12 See: #28 * Fix ODS issue with empty cells I had to create new sheet and paste unformated text there.

sirex mentioned this issue Oct 26, 2016

Add ODS support frictionlessdata/tabulator-py#114

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ODS sheets: Wrong results with sparse tables #12

ODS sheets: Wrong results with sparse tables #12

hvbtup commented Nov 26, 2015

hvbtup commented Nov 26, 2015

matkoniecz commented Feb 9, 2016

sirex commented Oct 26, 2016

sirex commented Oct 26, 2016

ODS sheets: Wrong results with sparse tables #12

ODS sheets: Wrong results with sparse tables #12

Comments

hvbtup commented Nov 26, 2015

hvbtup commented Nov 26, 2015

matkoniecz commented Feb 9, 2016

sirex commented Oct 26, 2016

sirex commented Oct 26, 2016