Crash with DoclingDocument.add_code() got an unexpected keyword argument 'label' #863

Closed
MikeLP opened this issue Feb 2, 2025 · 2 comments
Labels
bug Something isn't working

MikeLP commented Feb 2, 2025

Bug

Getting TypeError: DoclingDocument.add_code() got an unexpected keyword argument 'label' when chunking and converting HTML to Markdown (HTML pages returned by a web search).
...

Steps to reproduce

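import io
from docling.datamodel.base_models import DocumentStream
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
text = "...."  # Some html
buffer = io.BytesIO(text.encode("utf-8"))
stream = DocumentStream(name=f"{page}.html", stream=buffer)  # `page` is defined by the caller

result = converter.convert(stream)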

output

Main  2025-02-01 19:45:48 -0800 INFO: Converted document: everything-you-need-to-know-about-vite-6
--- Logging error ---
Traceback (most recent call last):
  File "/backend/.venv/lib/python3.12/site-packages/docling/backend/html_backend.py", line 97, in walk
    self.analyse_element(element, idx, doc)
  File "/backend/.venv/lib/python3.12/site-packages/docling/backend/html_backend.py", line 125, in analyse_element
    self.handle_code(element, idx, doc)
  File "/backend/.venv/lib/python3.12/site-packages/docling/backend/html_backend.py", line 219, in handle_code
    doc.add_code(parent=self.parents[self.level], label=label, text=text)
TypeError: DoclingDocument.add_code() got an unexpected keyword argument 'label'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/iyanello/.local/share/uv/python/cpython-3.12.7-linux-x86_64-gnu/lib/python3.12/logging/__init__.py", line 1160, in emit
    msg = self.format(record)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/iyanello/.local/share/uv/python/cpython-3.12.7-linux-x86_64-gnu/lib/python3.12/logging/__init__.py", line 999, in format
    return fmt.format(record)
           ^^^^^^^^^^^^^^^^^^
  File "/home/iyanello/.local/share/uv/python/cpython-3.12.7-linux-x86_64-gnu/lib/python3.12/logging/__init__.py", line 703, in format
    record.message = record.getMessage()
                     ^^^^^^^^^^^^^^^^^^^
  File "/home/iyanello/.local/share/uv/python/cpython-3.12.7-linux-x86_64-gnu/lib/python3.12/logging/__init__.py", line 392, in getMessage
    msg = msg % self.args
          ~~~~^~~~~~~~~~~
TypeError: not all arguments converted during string formatting
Call stack:
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/backend/.venv/lib/python3.12/site-packages/ipykernel_launcher.py", line 18, in <module>
    app.launch_new_instance()
......
_log.error(" => element: ", element, "\n")
Message: ' => element: '
Arguments: (<pre class="jo"><div class="yj l"><div class="bf b bg z bk"><div class="jc">some random text from web search.</div></div></div></pre>, '\n')
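
The second traceback ("Logging error") looks like a separate, secondary problem: the `_log.error(...)` call shown above passes extra positional arguments, but the message has no %-style placeholders, so `msg % self.args` fails. Below is a minimal sketch of that failure and a placeholder-based alternative, using only the standard library. The `render` helper and the `element` string are stand-ins for illustration, not docling code, and the sketch calls `getMessage()` directly, so the error is raised instead of being reported by the logging handler.

```python
import logging


def render(msg, *args):
    """Format a message the way the logging module does for a log record."""
    record = logging.LogRecord("demo", logging.ERROR, "demo.py", 0, msg, args, None)
    return record.getMessage()


element = "<pre>some text</pre>"  # stand-in for the parsed HTML element

# Extra arguments without placeholders: %-formatting raises TypeError.
try:
    render(" => element: ", element, "\n")
except TypeError as exc:
    broken = str(exc)

# One placeholder per argument formats cleanly.
fixed = render(" => element: %s", element)

print(broken)
print(fixed)
```

With a real logger, the equivalent call would be `_log.error(" => element: %s", element)`.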

...

Docling version

=2.17
=2.16
...

Python version

3.12
...

MikeLP added the bug label Feb 2, 2025
MikeLP changed the title from "Bug: Crash with DoclingDocument.add_code() got an unexpected keyword argument 'label'" to "Crash with DoclingDocument.add_code() got an unexpected keyword argument 'label'" Feb 2, 2025
flobotde commented Feb 3, 2025

Same for me, using:
Python 3.11.10
docling 2.17
docling-core 2.16.1

from docling.document_converter import DocumentConverter

def main():
    url = "https://ds4sd.github.io/docling/usage/"

    converter = DocumentConverter()
    try:
        result = converter.convert(url)
        with open('docling-out.md', 'w', encoding='utf-8') as f:
            f.write(result.document.export_to_markdown())
            print(f"Saved as {f.name}")
    except Exception as e:
        print(f"Error saving result for {url}: {e}")

if __name__ == "__main__":
    main()

@dolfim-ibm
Contributor

This was fixed in #850, which has been merged. The fix will ship in the upcoming release.
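
Until that release is installed, it may help to check what is actually present in the environment. The sketch below assumes the error arises because the installed `docling` calls `add_code()` with a keyword that the installed `docling-core` does not accept; the traceback suggests this, but the thread does not state the cause. `installed_version` and `accepts_keyword` are hypothetical helpers, and the distribution names are taken from the versions reported above.

```python
import inspect
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def accepts_keyword(func, name):
    """Return True if `func` can be called with `name=...` as a keyword."""
    params = inspect.signature(func).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


if __name__ == "__main__":
    for dist in ("docling", "docling-core"):
        print(dist, installed_version(dist))
    try:
        from docling_core.types.doc import DoclingDocument
    except ImportError:
        print("docling-core is not importable in this environment")
    else:
        print("add_code accepts 'label':",
              accepts_keyword(DoclingDocument.add_code, "label"))
```

If the last line prints `False` while the installed HTML backend still passes `label=`, the two packages are out of step and the crash above is expected.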
