fix(msword_backend): handle conversion error in label parsing #896


imvladikon
Contributor

@imvladikon imvladikon commented Feb 5, 2025

Description:
fix(msword_backend): Handle conversion error in label parsing

Updated the label parsing logic to use str_to_int with a default value, preventing potential conversion errors when unexpected formats are encountered.
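A minimal sketch of the idea, not the actual backend code: `str_to_int` below is a standalone stand-in for the helper mentioned above, and `split_label` is a hypothetical function that assumes labels take a `name:level` form. With a plain `int(...)` call, a style name whose suffix is not numeric would raise `ValueError`; the guarded version falls back to the default instead.

```python
from typing import Optional, Tuple


def str_to_int(s: Optional[str], default: Optional[int] = 0) -> Optional[int]:
    """Return int(s), or `default` when s is None or not a valid integer."""
    if s is None:
        return default
    try:
        return int(s)
    except ValueError:
        return default


def split_label(label: str) -> Tuple[str, Optional[int]]:
    """Hypothetical helper: split a 'name:level' label into its two parts.

    The level is None when the text after the colon is not an integer.
    """
    if ":" in label:
        parts = label.split(":")
        if len(parts) == 2:
            # int(parts[1]) would raise ValueError for non-numeric text.
            return parts[0], str_to_int(parts[1], None)
    return label, None


numeric = split_label("Heading:2")
non_numeric = split_label("Shell title: Table and Figure")
```

Here `numeric` carries an integer level, while `non_numeric` keeps the name and gets `None` as its level rather than raising.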

Steps to Reproduce:
The issue can be reproduced using the attached document, which contains a paragraph style: "Shell title: Table and Figure".

Code to Reproduce:

from docling.document_converter import DocumentConverter

converter = DocumentConverter()
_ = converter.convert("issue_paragraph_style.docx")

Attachment:
issue_paragraph_style.docx


PS: Thank you for this excellent package!


Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.


mergify bot commented Feb 5, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
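As a rough local check, the title pattern above can be tried with Python's `re` module. This sketch assumes the `~=` operator behaves like a regular-expression search; the function name `follows_conventional_commit` is made up for illustration.

```python
import re

# Pattern copied from the merge protection rule above.
TITLE_RULE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def follows_conventional_commit(title: str) -> bool:
    """Return True when the title starts with a conventional commit prefix."""
    return TITLE_RULE.search(title) is not None


ok = follows_conventional_commit(
    "fix(msword_backend): handle conversion error in label parsing"
)
bad = follows_conventional_commit("Handle conversion error in label parsing")
```

The title of this pull request matches the pattern, while a title without a `type:` or `type(scope):` prefix does not.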

@cau-git cau-git requested a review from maxmnemonic February 5, 2025 12:58
Contributor

@maxmnemonic maxmnemonic left a comment

Hey @imvladikon, thanks for the fix!
The attached docx, however, doesn't raise an error when I try to reproduce it, but since we handle the rest of such situations in the backend with str_to_int, it makes sense to use it here as well.

Can you please sign your commit:
In your local branch, run: git rebase HEAD~1 --signoff
Force push your changes to overwrite the branch: git push --force-with-lease origin fix/msword-backend-handle-conversion-error-in-label-parsing

That will satisfy DCO and we can proceed with merging

Updated label parsing to use `str_to_int` with a default value to prevent potential conversion errors.

Signed-off-by: Vladimir Gurevich <vladimir@beaconcure.com>
@imvladikon imvladikon force-pushed the fix/msword-backend-handle-conversion-error-in-label-parsing branch from be1c097 to 5e4056f Compare February 6, 2025 08:53
@imvladikon
Contributor Author

I did it!
Thank you @maxmnemonic !

Contributor

@maxmnemonic maxmnemonic left a comment

nice!

@maxmnemonic maxmnemonic merged commit 722a6eb into DS4SD:main Feb 6, 2025
8 checks passed
3 participants