Extract all Text Objects #83

Banguiskode · 2019-12-22T09:38:02Z

Hi!
First of all, thank you for this great tool.
Let me ask you a question:
I would like to be able to extract a table (or a list) containing the text objects with their properties, is that possible?
Thanks

sambitdash · 2019-12-22T13:58:13Z

@Banguiskode thank you for your interest in the library. Your expectations are captured as enhancements #2, #7, #11 and #17.

PDF as a specification has no simple mechanism for marking tabular structures as tables, so you have to post-process the text positions extracted from the PDF file. While the library does not provide an explicit API for this, pdPageEvaluate can be extended to extract the text data and their positions. Tagged PDF does support specifying tabular structure, but only a very small portion of the PDF files in circulation actually implement that part of the specification to any great extent. If you would like to contribute to PDFIO by implementing any of these features, we will be happy to accept PRs.
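To illustrate what post-processing text positions can look like, here is a minimal sketch in Python. It is not PDFIO code and does not use any PDFIO API: it assumes the text fragments and their coordinates have already been extracted by some means, and the `(x, y, text)` tuple shape, the `group_into_rows` helper and the `y_tol` tolerance are all hypothetical names chosen for the example. It clusters fragments that share roughly the same baseline into rows, then orders each row left to right.

```python
from typing import List, Tuple

# Hypothetical fragment shape: (x, y, text). Not a PDFIO type.
Fragment = Tuple[float, float, str]


def group_into_rows(fragments: List[Fragment], y_tol: float = 2.0) -> List[List[str]]:
    """Cluster fragments into rows by baseline y, then order each row by x."""
    rows: List[List[Fragment]] = []
    # PDF user space has y increasing upwards, so sort descending
    # to visit fragments from the top of the page to the bottom.
    for frag in sorted(fragments, key=lambda f: -f[1]):
        # Compare against the first fragment of the current row.
        if rows and abs(rows[-1][0][1] - frag[1]) <= y_tol:
            rows[-1].append(frag)
        else:
            rows.append([frag])
    # Within each row, read left to right and keep only the text.
    return [[text for _, _, text in sorted(row, key=lambda f: f[0])] for row in rows]


# Made-up sample data standing in for extracted text positions.
fragments = [
    (72.0, 700.0, "Name"), (200.0, 700.5, "Qty"),
    (200.0, 680.0, "3"), (72.0, 680.2, "Bolt"),
]
table = group_into_rows(fragments)
```

A fixed tolerance is the simplest choice and only works when rows are clearly separated; real documents with wrapped cells, rotated text or multiple columns would need something more careful.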

Since the intent of this issue is already captured by the other issues, I will close it with this comment.

Banguiskode · 2019-12-22T16:07:20Z

Thank you very much for your answer!

sambitdash closed this as completed Dec 22, 2019

sambitdash added the duplicate label Dec 22, 2019
