Add DocumentQuestionAnswering pipeline #18414
Merged
Commits
34 commits
17de42d [WIP] Skeleton of VisualQuestionAnweringPipeline extended to support … (ankrgyl)
8057900 Fixup (ankrgyl)
652d140 Use the full encoding (ankrgyl)
5fb7de5 Basic refactoring to DocumentQuestionAnsweringPipeline (ankrgyl)
56618d0 Cleanup (ankrgyl)
229920a Improve args, docs, and implement preprocessing (ankrgyl)
0e39080 Integrate OCR (ankrgyl)
355ddc9 Refactor question_answering pipeline (ankrgyl)
afdbdaa Use refactored QA code in the document qa pipeline (ankrgyl)
3393395 Fix tests (ankrgyl)
fe83056 Some small cleanups (ankrgyl)
391f98d Use a string type annotation for Image.Image (ankrgyl)
7f67d92 Update encoding with image features (ankrgyl)
27790d7 Wire through the basic docs (ankrgyl)
b71835d Handle invalid response (ankrgyl)
8a4d8aa Handle empty word_boxes properly (ankrgyl)
2966e4f Docstring fix (ankrgyl)
e852fc3 Integrate Donut model (ankrgyl)
8e5fe30 Fixup (ankrgyl)
c60533e Incorporate comments (ankrgyl)
d45dfe7 Address comments (ankrgyl)
a8d260b Initial incorporation of tests (ankrgyl)
aeff3b2 Address Comments (ankrgyl)
a9e70c8 Change assert to ValueError (ankrgyl)
f654983 Comments (ankrgyl)
23f6600 Wrap `score` in float to make it JSON serializable (ankrgyl)
0168f3a Incorporate AutoModeLForDocumentQuestionAnswering changes (ankrgyl)
92f641b Fixup (ankrgyl)
2fe7a8c Rename postprocess function (ankrgyl)
a59fbd3 Fix auto import (ankrgyl)
6c94556 Applying comments (ankrgyl)
08193d3 Improve docs (ankrgyl)
d271829 Remove extra assets and add copyright (ankrgyl)
2a2bf09 Address comments (ankrgyl)
@@ -51,6 +51,7 @@
     infer_framework_load_model,
 )
 from .conversational import Conversation, ConversationalPipeline
+from .document_question_answering import DocumentQuestionAnsweringPipeline
 from .feature_extraction import FeatureExtractionPipeline
 from .fill_mask import FillMaskPipeline
 from .image_classification import ImageClassificationPipeline
@@ -109,6 +110,7 @@
         AutoModelForAudioClassification,
         AutoModelForCausalLM,
         AutoModelForCTC,
+        AutoModelForDocumentQuestionAnswering,
         AutoModelForImageClassification,
         AutoModelForImageSegmentation,
         AutoModelForMaskedLM,
@@ -215,6 +217,17 @@
         },
         "type": "multimodal",
     },
+    "document-question-answering": {
+        "impl": DocumentQuestionAnsweringPipeline,
+        "pt": (AutoModelForDocumentQuestionAnswering,) if is_torch_available() else (),
+        "tf": (),
+        "default": {
+            "model": {
+                "pt": ("impira/layoutlm-document-qa", "3a93017")
+            },  # TODO Update with custom pipeline removed, just before we land
+        },
+        "type": "multimodal",
+    },
     "fill-mask": {
         "impl": FillMaskPipeline,
         "tf": (TFAutoModelForMaskedLM,) if is_tf_available() else (),

Review comment on the `# TODO` line in the hunk above: Is this still relevant?
Reply: No, good catch, removed.
@@ -443,7 +456,7 @@ def pipeline(
     trust_remote_code: Optional[bool] = None,
     model_kwargs: Dict[str, Any] = None,
     pipeline_class: Optional[Any] = None,
-    **kwargs
+    **kwargs,
 ) -> Pipeline:
     """
     Utility factory method to build a [`Pipeline`].
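To make the shape of the new `"document-question-answering"` registry entry easier to follow, here is a minimal, self-contained sketch of how such an entry could be looked up. It is not the library's code: the placeholder `DocumentQuestionAnsweringPipeline` class, the trimmed `SUPPORTED_TASKS` dict, and the `resolve_default` helper are assumptions made for illustration, and the `"pt"` model tuple is left empty so the sketch runs without torch or transformers installed.

```python
from typing import Any, Dict, Optional, Tuple


class DocumentQuestionAnsweringPipeline:
    """Placeholder standing in for the real pipeline class (assumption)."""


# Trimmed, illustrative copy of the entry shown in the diff. The real entry
# lists AutoModelForDocumentQuestionAnswering under "pt" when torch is available.
SUPPORTED_TASKS: Dict[str, Dict[str, Any]] = {
    "document-question-answering": {
        "impl": DocumentQuestionAnsweringPipeline,
        "pt": (),
        "tf": (),
        "default": {"model": {"pt": ("impira/layoutlm-document-qa", "3a93017")}},
        "type": "multimodal",
    },
}


def resolve_default(task: str, framework: str = "pt") -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (model_id, revision) for a task.

    Returns None when the task has no default checkpoint for the framework.
    """
    if task not in SUPPORTED_TASKS:
        raise KeyError(f"Unknown task {task!r}; available: {sorted(SUPPORTED_TASKS)}")
    defaults = SUPPORTED_TASKS[task]["default"]["model"]
    return defaults.get(framework)


if __name__ == "__main__":
    print(resolve_default("document-question-answering"))
    print(resolve_default("document-question-answering", framework="tf"))
```

Because the entry in the diff registers no TensorFlow default, a lookup for `"tf"` in this sketch yields `None`, while the PyTorch lookup yields the pinned checkpoint name and revision.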
@NielsRogge @Narsil here's what I added. Should I also add LayoutLMv2 and v3 here (while keeping them in `MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES`), or leave it as just LayoutLMv1 for now?
You can add v2 and v3 here as well
ok will do!
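For readers following the thread, the agreed outcome (LayoutLM v1, v2 and v3 all listed for document question answering) could look roughly like the sketch below. The dictionary name `DOC_QA_MAPPING_NAMES` and the `class_name_for` helper are hypothetical and chosen only for illustration; the three class names are existing `transformers` model classes, but the exact mapping in the merged code may differ.

```python
from collections import OrderedDict

# Hypothetical name; maps a model type to the class that answers document questions.
DOC_QA_MAPPING_NAMES = OrderedDict(
    [
        ("layoutlm", "LayoutLMForQuestionAnswering"),
        ("layoutlmv2", "LayoutLMv2ForQuestionAnswering"),
        ("layoutlmv3", "LayoutLMv3ForQuestionAnswering"),
    ]
)


def class_name_for(model_type: str) -> str:
    """Hypothetical helper: look up the class name registered for a model type."""
    try:
        return DOC_QA_MAPPING_NAMES[model_type]
    except KeyError:
        supported = ", ".join(DOC_QA_MAPPING_NAMES)
        raise ValueError(
            f"{model_type!r} is not registered for document question answering "
            f"(supported: {supported})"
        ) from None


if __name__ == "__main__":
    for model_type in DOC_QA_MAPPING_NAMES:
        print(model_type, "->", class_name_for(model_type))
```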