Improve LayoutLM #9476
def test_LayoutLM_backward_pass_reduces_loss(self):
    """Test loss/gradients same as reference implementation, for example."""
    pass

self.assertTrue(torch.allclose(outputs.loss, expected_loss, atol=0.1))
atol=1e-3 would not pass here?
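To make the tolerance question concrete, here is a minimal sketch of the scalar check that `torch.allclose` applies element-wise, `|actual - expected| <= atol + rtol * |expected|`, with `rtol` left at its default of 1e-5. The helper name `within_tolerance` and both loss values are hypothetical and are not taken from the test in this PR.

```python
def within_tolerance(actual, expected, atol, rtol=1e-5):
    """Scalar form of the comparison torch.allclose performs per element."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Hypothetical values: the observed loss is off by 0.05 from the reference.
expected_loss = 1.00
actual_loss = 1.05

print(within_tolerance(actual_loss, expected_loss, atol=0.1))   # True
print(within_tolerance(actual_loss, expected_loss, atol=1e-3))  # False
```

With a gap of that size, `atol=0.1` hides the difference while `atol=1e-3` flags it, which is why the tighter value makes for a stronger integration test if the implementation can meet it.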
Very nice PR! Thanks so much for taking care of this. The notebook looks great as well. Left a couple of comments. If possible, it would be awesome if we could make the example a bit more concise (e.g. to just use tokenizer(...) instead of tokenize(...) and convert_tokens_to_ids(...)).
Thanks for all your work on this!
LGTM, great job @NielsRogge! Thanks a lot for your contribution!
Improve docs Add LayoutLM notebook to list of community notebooks
7eed265 to fffc19a
Thanks for the reviews, I've addressed all comments. There are 2 things remaining:
I pushed the reformat you asked for @NielsRogge, make sure to pull before making any more changes!
Ok thank you, so the only thing remaining is to make the code examples more efficient? Is there a way to make the code block (see comment above) better?
* Add LayoutLMForSequenceClassification and integration tests
  Improve docs
  Add LayoutLM notebook to list of community notebooks
* Make style & quality
* Address comments by @sgugger, @patrickvonplaten and @LysandreJik
* Fix rebase with master
* Reformat in one line
* Improve code examples as requested by @patrickvonplaten

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
What does this PR do?
* Improves the documentation of LayoutLM, explaining how people can normalize bounding boxes before passing them to the model, adding links to the various datasets on which the model achieves state-of-the-art results, and adding code examples in the documentation for the various models
* Adds a notebook for LayoutLMForTokenClassification on the FUNSD dataset (on which the model achieves SOTA results)
* Adds LayoutLMForSequenceClassification, which makes it possible to fine-tune LayoutLM for document image classification tasks (such as the RVL-CDIP dataset), with extra tests included

Fixes the following issues:
Who can review?
@LysandreJik, @patrickvonplaten, @sgugger
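
The bounding-box normalization mentioned in the PR description can be sketched as follows. This assumes the 0-1000 coordinate scale described in the LayoutLM documentation and boxes given as `(x0, y0, x1, y1)` in pixels; the page size and the box in the usage line are hypothetical.

```python
def normalize_bbox(bbox, width, height):
    """Scale an (x0, y0, x1, y1) pixel box to the 0-1000 range."""
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


# Hypothetical word box on a hypothetical 800x600 page image.
print(normalize_bbox((80, 60, 400, 300), width=800, height=600))
# prints [100, 100, 500, 500]
```

Dividing by the page width and height first makes the coordinates independent of image resolution, so documents scanned at different sizes map to the same coordinate range.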