Version 1.7.3
What's Changed
- Table linearization improvements by @Belval in #313
  - Add `.get_text()`, `.to_html()` and `.to_markdown()` functions to `Linearizable`, which is now implemented by `Document`, `Page`, `DocumentEntity` and `EntityList`
  - Add `HTMLLinearizationConfig` and `MarkdownLinearizationConfig` as pre-configured `TextLinearizationConfig`
  - Add the following parameters to `TextLinearizationConfig`:
    - `duplicate_text_in_merged_cells` duplicates the text in merged cells to preserve row-level alignment
    - `table_flatten_headers` combines multi-row headers into a single row, duplicating the merged cells horizontally as needed
    - `table_tabulate_remove_extra_hyphens` removes extra hyphens '-' in markdown tables to reduce context length
    - `max_number_of_consecutive_spaces` defines the maximum number of contiguous whitespace characters, similar to `max_number_of_consecutive_new_lines`
- Add
- Fixes:
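
As a rough illustration of what the `TextLinearizationConfig` parameters listed above describe, here is a standalone Python sketch that mimics their effect on a toy table. It does not call or reproduce Textractor's implementation: the helper names (`limit_consecutive_spaces`, `expand_merged_cells`, `flatten_header_rows`, `trim_separator_hyphens`) and the `(row, col, row_span, col_span, text)` cell tuples are hypothetical, chosen only to make the described behaviour concrete.

```python
import re


def limit_consecutive_spaces(text, max_spaces):
    """Cap runs of spaces at max_spaces (assumed analogue of
    max_number_of_consecutive_spaces; only plain spaces are handled here)."""
    return re.sub(r" {%d,}" % (max_spaces + 1), " " * max_spaces, text)


def expand_merged_cells(cells, n_rows, n_cols):
    """Build a dense grid, copying a merged cell's text into every position
    it spans (assumed analogue of duplicate_text_in_merged_cells).

    cells is a list of hypothetical (row, col, row_span, col_span, text) tuples.
    """
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for row, col, row_span, col_span, text in cells:
        for r in range(row, row + row_span):
            for c in range(col, col + col_span):
                grid[r][c] = text
    return grid


def flatten_header_rows(header_rows):
    """Join each column's header texts top to bottom into a single row
    (assumed analogue of table_flatten_headers)."""
    flat = []
    for column in zip(*header_rows):
        parts = []
        for text in column:
            # Skip empty cells and vertical repeats of the same header text.
            if text and text not in parts:
                parts.append(text)
        flat.append(" ".join(parts))
    return flat


def trim_separator_hyphens(markdown_table):
    """Shrink long hyphen runs in separator rows to '---' (assumed analogue of
    table_tabulate_remove_extra_hyphens). Rows with cell content are untouched."""
    lines = []
    for line in markdown_table.splitlines():
        if line and set(line) <= set("|-: "):
            line = re.sub(r"-{4,}", "---", line)
        lines.append(line)
    return "\n".join(lines)


# Toy table: a "Revenue" header spanning two columns above "Q1" and "Q2".
cells = [
    (0, 0, 1, 2, "Revenue"),
    (1, 0, 1, 1, "Q1"),
    (1, 1, 1, 1, "Q2"),
    (2, 0, 1, 1, "10"),
    (2, 1, 1, 1, "12"),
]
grid = expand_merged_cells(cells, n_rows=3, n_cols=2)
header = flatten_header_rows(grid[:2])
print(header)
print(trim_separator_hyphens("| a | b |\n|--------|-----|\n| 1 | 2 |"))
print(limit_consecutive_spaces("a     b  c", 2))
```

In the library itself these behaviours are switched on through the configuration object rather than called as separate helpers; the sketch only shows why duplicating merged text and flattening headers keeps every data row aligned with a single header row.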
New Contributors
Full Changelog: v1.7.2...v1.7.3