Replies: 1 comment
-
If you are only looking for tables, it's worth checking TableTransformer as well. You only need to change some of the configs. If this does not work, then yes, labeling tables and fine-tuning the model to adapt to your samples is a possible way to go. Once you have some labeled samples, this tutorial will show you how to fine-tune your model. You should start with table detection, not cell recognition. Table segmentation is a second step and requires its own dataset. However, I would first check whether cell/row/column recognition already works with an open-source model, as labeling cells, rows and columns is very cumbersome.
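If the tables are labeled in Label Studio (the tool mentioned in the question), the export usually has to be converted into a detection dataset before fine-tuning. The following is a minimal sketch, not deepdoctection's own API: it assumes the Label Studio JSON export with `rectanglelabels` results, assumes the image is stored under the `image` data key (this depends on your labeling config), ignores rotated boxes, and writes a COCO-style layout. The function name `label_studio_to_coco` is hypothetical, and the dataset format expected by the fine-tuning tutorial should be checked separately.

```python
import json


def label_studio_to_coco(tasks, category_names=("table",)):
    """Convert Label Studio rectangle labels into a COCO-style dict.

    Assumptions: tasks come from Label Studio's JSON export, the image
    path is stored under data["image"], and boxes are not rotated.
    """
    cat_ids = {name: i + 1 for i, name in enumerate(category_names)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": cid, "name": name} for name, cid in cat_ids.items()],
    }
    ann_id = 1
    for image_id, task in enumerate(tasks, start=1):
        # Collect all rectangle results from every annotation of this task.
        results = [
            r
            for ann in task.get("annotations", [])
            for r in ann.get("result", [])
            if r.get("type") == "rectanglelabels"
        ]
        if not results:
            continue
        width = results[0]["original_width"]
        height = results[0]["original_height"]
        coco["images"].append(
            {"id": image_id, "file_name": task["data"]["image"],
             "width": width, "height": height}
        )
        for r in results:
            value = r["value"]
            label = value["rectanglelabels"][0]
            if label not in cat_ids:
                continue
            # Label Studio stores x, y, width, height as percentages of the image.
            x = value["x"] * width / 100
            y = value["y"] * height / 100
            w = value["width"] * width / 100
            h = value["height"] * height / 100
            coco["annotations"].append(
                {"id": ann_id, "image_id": image_id,
                 "category_id": cat_ids[label],
                 "bbox": [x, y, w, h], "area": w * h, "iscrowd": 0}
            )
            ann_id += 1
    return coco


# Small in-memory example shaped like one exported task (values are made up).
sample_tasks = [{
    "data": {"image": "form_001.png"},
    "annotations": [{"result": [{
        "type": "rectanglelabels",
        "original_width": 1000,
        "original_height": 2000,
        "value": {"x": 10, "y": 20, "width": 50, "height": 25,
                  "rectanglelabels": ["table"]},
    }]}],
}]
coco = label_studio_to_coco(sample_tasks)
coco_json = json.dumps(coco, indent=2)  # write this string to a file if needed
```

Starting with a single `table` category matches the advice above: table detection first, with cell, row and column labels added only if the open-source recognition models turn out not to work on your forms.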
-
I am new to deepdoctection.
I would like to know if it is possible to build a custom pipeline that extracts 5 tables, together with all their information, from a specific hand-filled form.
There are 5 tables on the page, with almost 100 cells.
I have almost a hundred filled documents and would like to automate the extraction of all the information in the form.
Is deepdoctection suitable for this kind of work? If so, is there a tutorial on how I can get started with the training?
I installed deepdoctection and tried the default pipeline on some of the scanned forms I have, but it detected only 1 table in the whole form. That is why I am considering training a custom pipeline.
What steps should I take? Should I annotate the documents I already have using something like Label Studio and create a training dataset? If so, should I label the tables alone or also the cells? Then, how should I proceed from there?
Any suggestion is much appreciated.