Adding a words_dir (word tokens) lowers the amount of rows present in the tables_structure and skews the result #183

dsoft-jvo · 2024-06-24T12:37:07Z

Discussed in #182

Originally posted by dsoft-jvo June 21, 2024
I use the table-transformer code to extract tables and table structures from invoices. Without the --words_dir argument, the results are very satisfactory. My understanding is that words_dir is needed to add the text content of the detected structures to the result, so I tried adding it. After adding it, however, the result is strange: the detected table shrinks to a small corner of the image, and the table-structure elements all overlap each other. At first this looked like a scaling problem, but the problem persists after fixing the scaling.
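Because scaling came up as a suspect, here is a minimal sketch of rescaling word-token boxes from the coordinate space they were extracted in to the pixel space of the page image. It assumes each token is a dict whose "bbox" is [x1, y1, x2, y2], and that the words were extracted in PDF points while the image was rendered at a different resolution. `rescale_tokens` is a hypothetical helper and is not part of table-transformer.

```python
# Hypothetical helper (not part of table-transformer): rescale word-token
# bounding boxes from a source coordinate space (e.g. PDF points) to the
# pixel space of the rendered page image. Assumes each token is a dict with
# a "bbox" key holding [x1, y1, x2, y2].

def rescale_tokens(tokens, src_size, dst_size):
    """Return new tokens whose bboxes are scaled from src_size to dst_size.

    src_size and dst_size are (width, height) tuples.
    """
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    rescaled = []
    for tok in tokens:
        x1, y1, x2, y2 = tok["bbox"]
        new_tok = dict(tok)  # shallow copy so the caller's token is not mutated
        new_tok["bbox"] = [x1 * sx, y1 * sy, x2 * sx, y2 * sy]
        rescaled.append(new_tok)
    return rescaled


# Example: a US Letter page is 612 x 792 points; rendered at 150 dpi it is
# 1275 x 1650 pixels, so every coordinate is multiplied by 150 / 72.
tokens = [{"text": "Qty", "bbox": [72.0, 700.0, 110.0, 712.0]}]
print(rescale_tokens(tokens, (612, 792), (1275, 1650)))
```

If the page image is cropped to the detected table before structure recognition, the token boxes would also need the same crop offset subtracted. That step is not shown here.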

Aside from the visual result, the 'tables_structure' output is also strange when --words_dir is added. Without --words_dir, the number of rows and columns appears to be constant. With --words_dir, however, the number of rows and columns varies: sometimes there are more, sometimes fewer. The tokens are formatted as described in the docs/INFERENCE.MD document.
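A quick way to confirm that a words file matches what the pipeline expects is to validate every token before running inference. The sketch below assumes the file is a JSON list of dicts, each with a "text" string and a "bbox" of [x1, y1, x2, y2] in the same pixel coordinates as the page image. Check this assumption against docs/INFERENCE.MD. `check_tokens` is a hypothetical helper. Boxes stored as [x, y, width, height] would often fail the ordering check.

```python
import json

# Hypothetical sanity check for a words file (not part of table-transformer).
# Assumes a JSON list of dicts, each with a "text" string and a "bbox" of
# [x1, y1, x2, y2] in the page image's pixel coordinates.

def check_tokens(tokens, image_width, image_height):
    """Return (index, problem) pairs for malformed or out-of-bounds tokens."""
    problems = []
    for i, tok in enumerate(tokens):
        bbox = tok.get("bbox")
        if not isinstance(bbox, (list, tuple)) or len(bbox) != 4:
            problems.append((i, "bbox is not a 4-element list"))
            continue
        x1, y1, x2, y2 = bbox
        if x1 >= x2 or y1 >= y2:
            problems.append((i, "bbox is not [x1, y1, x2, y2] with x1 < x2 and y1 < y2"))
        if x1 < 0 or y1 < 0 or x2 > image_width or y2 > image_height:
            problems.append((i, "bbox lies outside the image"))
        if not isinstance(tok.get("text"), str):
            problems.append((i, "missing or non-string text"))
    return problems


words = json.loads(
    '[{"text": "Qty", "bbox": [10, 10, 40, 22]},'
    ' {"text": "Price", "bbox": [900, 10, 960, 22]}]'
)
for index, problem in check_tokens(words, image_width=800, image_height=600):
    print(index, problem)  # prints: 1 bbox lies outside the image
```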

I cannot show any actual data or images, as the data is sensitive, but this is what I found during debugging:

Without --words_dir, i.e. tokens=[]:

With a --words_dir, i.e. tokens=[...data...]:

I suspect the problem comes from a misunderstanding on my part about what the --words_dir data is used for. I have read the papers, but I feel I am missing something about that aspect.
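The post does not say which step changes the row count. One hypothesis to check is whether the post-processing uses the word tokens to drop or merge rows. That should be confirmed in the repository's postprocessing code. The sketch below is a simplified, hypothetical illustration and not table-transformer's implementation. It shows how keeping only rows that contain at least one token can lower the row count when the token coordinates do not line up with the detected rows. `overlap_fraction`, `rows_with_content`, and the 0.5 threshold are all assumptions chosen for illustration.

```python
# Simplified, hypothetical illustration (NOT table-transformer's code):
# drop any row box that does not contain at least one word token, and show
# how misaligned token coordinates reduce the number of surviving rows.

def overlap_fraction(token_bbox, row_bbox):
    """Fraction of the token's area that falls inside the row box."""
    tx1, ty1, tx2, ty2 = token_bbox
    rx1, ry1, rx2, ry2 = row_bbox
    iw = max(0.0, min(tx2, rx2) - max(tx1, rx1))
    ih = max(0.0, min(ty2, ry2) - max(ty1, ry1))
    area = (tx2 - tx1) * (ty2 - ty1)
    return (iw * ih) / area if area > 0 else 0.0


def rows_with_content(rows, tokens, threshold=0.5):
    """Keep rows that contain at least one token by the overlap threshold."""
    return [
        row for row in rows
        if any(overlap_fraction(t["bbox"], row) >= threshold for t in tokens)
    ]


rows = [[0, 0, 500, 20], [0, 20, 500, 40], [0, 40, 500, 60]]
aligned = [{"text": "a", "bbox": [10, 5, 30, 15]},
           {"text": "b", "bbox": [10, 25, 30, 35]},
           {"text": "c", "bbox": [10, 45, 30, 55]}]
# Same tokens at half scale, e.g. a rescale that was missed or applied twice.
misaligned = [{"text": t["text"], "bbox": [v / 2 for v in t["bbox"]]} for t in aligned]

print(len(rows_with_content(rows, aligned)))     # prints 3
print(len(rows_with_content(rows, misaligned)))  # prints 2
```

With misaligned tokens, two tokens land in the first row and the last row is left empty. A filter of this kind would therefore report fewer rows even though the model's detections are unchanged.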

Could someone give some further explanation about the use and function of --words_dir? Are the results I am seeing expected? Why, or why not? And if not, how do I go about fixing them?
