UDFs are hard to debug #106

Closed
shcheklein opened this issue Jul 20, 2024 · 2 comments
Labels: bug (Something isn't working), ux

Comments

shcheklein (Member) commented Jul 20, 2024

When you run code like this:

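from collections.abc import Iterator

# Assumed import path; the report omits its imports. Chunk and source are
# defined elsewhere in the reporter's script.
from datachain import C, DataChain, File


def pdf_chunks(file: File) -> Iterator[Chunk]:
    chunks = []
    # ... (the code that fills chunks is omitted in the report)
    if len(chunks) > 3:
        # Note this line: it raises an obvious IndexError
        print(chunks[100000])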

dc = (
    DataChain.from_storage(source)
    .filter(C.name.glob("*.pdf"))
    .gen(document=pdf_chunks)
)

dc

it fails with a traceback like this:

Traceback (most recent call last):
  File "<string>", line 48, in <module>
  File "/Users/ivan/Projects/pdf-datachain-demo/.venv/lib/python3.12/site-packages/datachain/query/dataset.py", line 1771, in query_wrapper
    _send_result(dataset_query)
  File "/Users/ivan/Projects/pdf-datachain-demo/.venv/lib/python3.12/site-packages/datachain/query/dataset.py", line 1720, in _send_result
    preview = preview_query.to_records()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/pdf-datachain-demo/.venv/lib/python3.12/site-packages/datachain/query/dataset.py", line 1293, in to_records
    return self.results(lambda cols, row: dict(zip(cols, row)))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/pdf-datachain-demo/.venv/lib/python3.12/site-packages/datachain/lib/dc.py", line 564, in results
    return list(rows)
           ^^^^^^^^^^
  File "/Users/ivan/Projects/pdf-datachain-demo/.venv/lib/python3.12/site-packages/datachain/lib/dc.py", line 563, in <genexpr>
    rows = (row_factory(db_signals, r) for r in rows)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/pdf-datachain-demo/.venv/lib/python3.12/site-packages/datachain/lib/dc.py", line 554, in iterate_flatten
    with super().select(*db_signals).as_iterable() as rows:
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/pdf-datachain-demo/.venv/lib/python3.12/site-packages/datachain/query/dataset.py", line 1236, in as_iterable
    query = self.apply_steps().select()
            ^^^^^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/pdf-datachain-demo/.venv/lib/python3.12/site-packages/datachain/query/dataset.py", line 1179, in apply_steps
    result = step.apply(
             ^^^^^^^^^^^
  File "/Users/ivan/Projects/pdf-datachain-demo/.venv/lib/python3.12/site-packages/datachain/query/dataset.py", line 631, in apply
    self.populate_udf_table(udf_table, query)
  File "/Users/ivan/Projects/pdf-datachain-demo/.venv/lib/python3.12/site-packages/datachain/query/dataset.py", line 549, in populate_udf_table
    process_udf_outputs(
  File "/Users/ivan/Projects/pdf-datachain-demo/.venv/lib/python3.12/site-packages/datachain/query/dataset.py", line 399, in process_udf_outputs
    for row in udf_output:
  File "/Users/ivan/Projects/pdf-datachain-demo/.venv/lib/python3.12/site-packages/datachain/query/udf.py", line 147, in <genexpr>
    return (dict(zip(self.signal_names, row)) for row in results)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/pdf-datachain-demo/.venv/lib/python3.12/site-packages/datachain/lib/udf.py", line 204, in <genexpr>
    res = (
          ^
  File "<string>", line 34, in pdf_chunks
IndexError: list index out of range
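  • The line number is wrong: the trace reports line 34 of "<string>" for pdf_chunks, which is not the failing line
  • The stack trace is unclear: the one user-code frame sits under a long run of library-internal frames and is shown without its source text
  • There is no way to add prints inside the UDF to get at least some information
  • There is no easy way to set a breakpoint inside the UDF (?)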
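
A possible stopgap for the missing prints and the unclear trace is to wrap the generator so that it reports the failing input and the traceback itself before the error propagates. This is a plain-Python sketch, not part of DataChain: traced and failing_chunks are hypothetical names, and whether the library accepts the wrapped function in .gen() is an assumption that has not been checked here.

```python
import functools
import sys
import traceback
from collections.abc import Callable, Iterator
from typing import Any


def traced(udf: Callable[..., Iterator[Any]], log=sys.stderr) -> Callable[..., Iterator[Any]]:
    """Hypothetical helper: report failures of a generator UDF, then re-raise."""

    @functools.wraps(udf)
    def wrapper(*args: Any, **kwargs: Any) -> Iterator[Any]:
        try:
            yield from udf(*args, **kwargs)
        except Exception:
            # Write the input and the traceback captured at this point,
            # which includes the frame inside the wrapped function.
            print(f"{udf.__name__} failed for args={args!r}", file=log)
            traceback.print_exc(file=log)
            raise

    return wrapper


def failing_chunks(name: str) -> Iterator[str]:
    """Stand-in for the UDF in the report, with the same deliberate bug."""
    chunks = ["a", "b", "c", "d"]
    if len(chunks) > 3:
        yield chunks[100000]  # deliberate IndexError
```

Consuming traced(failing_chunks)("doc.pdf") still raises IndexError, but the log stream first receives the input arguments and a traceback that names failing_chunks.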
shcheklein added the bug and ux labels on Jul 20, 2024
dmpetrov (Member) commented

> No easy way to set a breakpoint inside (?)

A breakpoint does work when the UDF runs in a single thread.
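
One way to get the single-thread behaviour while debugging is to call the generator directly in the current process, outside any pipeline, so that breakpoint() and the traceback refer to the function itself. The sketch below is plain Python under stated assumptions: FakeFile is a hypothetical stand-in for the library's File object, and run_directly is an illustrative helper, not a DataChain API.

```python
from collections.abc import Iterator
from dataclasses import dataclass


@dataclass
class FakeFile:
    """Hypothetical stand-in for the library's File object."""

    name: str
    data: bytes

    def read(self) -> bytes:
        return self.data


def pdf_chunks(file: FakeFile) -> Iterator[str]:
    chunks = file.read().decode().split()
    if len(chunks) > 3:
        # breakpoint()  # uncomment to drop into pdb at this point
        yield chunks[100000]  # same deliberate IndexError as in the report


def run_directly(file: FakeFile) -> list[str]:
    """Consume the generator in the current thread, outside any pipeline."""
    return list(pdf_chunks(file))
```

Calling run_directly(FakeFile("a.pdf", b"one two three four")) raises the IndexError with a traceback that contains only these two functions.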

shcheklein (Member, Author) commented

Closing in favor of #360, which should resolve most of the issues here. We can come back to this after that.
