Extraction cell info from a specific form #235

adhamj90 · 2023-10-09T15:31:34Z

I am new to deepdoctection.
I would like to know whether it is possible to build a custom pipeline that extracts 5 tables, together with all the information they contain, from a specific hand-filled form.
There are 5 tables on the page, with almost 100 cells.
I have almost a hundred filled-in documents and would like to automate the extraction of all the information in the form.
Is deepdoctection suitable for this kind of work? If so, is there a tutorial on how to get started with training?
I installed deepdoctection and tried the default pipeline on some of my scanned forms, but it detected only 1 table in the whole form. That is why I am considering training a custom pipeline.
What steps should I take? Should I annotate the documents I already have using something like Label Studio and create a training dataset? If so, should I label only the tables, or the cells as well? And how should I proceed from there?
Any suggestion is much appreciated.

JaMe76 (Maintainer) · 2023-10-09T21:58:36Z

If you are only looking for tables, it's worth checking out TableTransformer as well. You only need to change some of the configs.
This notebook will guide you.

If this does not work, then yes, labeling tables and fine-tuning the model to adapt to your samples is a possible way to go.

Once you have some labeled samples, this tutorial will show you how to fine-tune your model.
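
For readers wondering what "labeled samples" might look like in practice, here is a minimal sketch, not taken from the tutorial, that converts rectangle annotations from a Label Studio JSON export into a COCO-style dictionary, a common format for object-detection training data. It assumes the export stores boxes as percentages alongside `original_width` and `original_height`, and that the image is stored under the `image` data key (this depends on the labeling config). The function name `label_studio_to_coco`, the file name, and the single `table` category are hypothetical choices for illustration; whether the fine-tuning tutorial expects exactly this layout is not confirmed here.

```python
from typing import Any


def label_studio_to_coco(
    tasks: list[dict[str, Any]], categories: dict[str, int]
) -> dict[str, Any]:
    """Convert Label Studio rectangle labels to a COCO-style dict.

    Assumes box coordinates are percentages of the image size, as in
    Label Studio's JSON export for RectangleLabels.
    """
    images: list[dict[str, Any]] = []
    annotations: list[dict[str, Any]] = []
    ann_id = 1
    for image_id, task in enumerate(tasks, start=1):
        size = None  # (width, height), taken from the first rectangle found
        for annotation in task.get("annotations", []):
            for result in annotation.get("result", []):
                if result.get("type") != "rectanglelabels":
                    continue
                img_w = result["original_width"]
                img_h = result["original_height"]
                size = (img_w, img_h)
                value = result["value"]
                # Percent -> pixel coordinates; COCO boxes are [x, y, w, h].
                x = value["x"] * img_w / 100
                y = value["y"] * img_h / 100
                w = value["width"] * img_w / 100
                h = value["height"] * img_h / 100
                label = value["rectanglelabels"][0]
                annotations.append(
                    {
                        "id": ann_id,
                        "image_id": image_id,
                        "category_id": categories[label],
                        "bbox": [x, y, w, h],
                        "area": w * h,
                        "iscrowd": 0,
                    }
                )
                ann_id += 1
        if size is not None:  # skip tasks without any rectangle label
            images.append(
                {
                    "id": image_id,
                    "file_name": task["data"]["image"],
                    "width": size[0],
                    "height": size[1],
                }
            )
    return {
        "images": images,
        "annotations": annotations,
        "categories": [{"id": i, "name": n} for n, i in categories.items()],
    }


# Hypothetical one-page example with a single labeled table.
sample_tasks = [
    {
        "data": {"image": "form_001.png"},
        "annotations": [
            {
                "result": [
                    {
                        "type": "rectanglelabels",
                        "original_width": 1000,
                        "original_height": 2000,
                        "value": {
                            "x": 10, "y": 20, "width": 50, "height": 25,
                            "rotation": 0, "rectanglelabels": ["table"],
                        },
                    }
                ]
            }
        ],
    }
]
coco = label_studio_to_coco(sample_tasks, {"table": 1})
```

Labeling only the `table` class, as in this sketch, matches the advice to begin with table detection; cell, row, or column labels would go into a separate dataset.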

You should start with table detection, not cell recognition. Table segmentation is a second step and requires its own dataset. However, I would first check whether cell/row/column recognition works with an open-source model, as labeling this is very cumbersome.
