UDF hangs with no exception / warning #535

Closed
robbieculkin opened this issue Mar 3, 2021 · 5 comments
Labels
bug Something isn't working

Comments

@robbieculkin

robbieculkin commented Mar 3, 2021

Description of the bug

When running a UDF for parsing, candidate extraction, or labeling, the process hangs indefinitely. No exception or warning is generated.
The hangs aren't easily reproducible: a UDF will hang on a file in one run, then after a full reset it will parse that file without a problem, only to hang on some other file.

To Reproduce

Using code very similar to what's found in the fonduer-tutorials max_storage_temp_tutorial, run the parser UDF on a set of less trivial, multi-page PDFs.

Expected behavior

Informative / actionable error reporting so the developer can take further action to diagnose the problem.
Decoupling operations from postgresql and moving towards in-memory operations as mentioned in #137 might address this concern.

Error Logs/Screenshots

N/A, only a frozen progress bar.
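
One generic way to get more than a frozen progress bar is Python's standard-library `faulthandler` module, which can dump the stack of every thread once a timeout expires. The sketch below is not part of Fonduer; `run_with_hang_dump` and its parameters are hypothetical names chosen for illustration, and the dump only covers the process in which it is installed, so it may not see inside separate worker processes.

```python
import faulthandler
import sys


def run_with_hang_dump(work, timeout_s, log_file=None):
    """Run work(); if it is still running after timeout_s seconds,
    write the stack of every thread in this process to log_file.

    The process is not killed: the dump only shows where it is stuck.
    log_file must be a real file object with a file descriptor.
    """
    if log_file is None:
        log_file = sys.stderr
    # Arm a watchdog that fires once after timeout_s seconds.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=log_file)
    try:
        return work()
    finally:
        # Disarm the watchdog if work() finished (or raised) in time.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the step that hangs in a zero-argument callable and passing it as `work` should then produce a traceback pointing at the blocked call, assuming the hang happens in the calling process.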

Environment (please complete the following information)

  • OS: MacOS 10.15
  • PostgreSQL Version: 12.1
  • Poppler Utils Version: 0.71.0-5
  • Fonduer Version: 0.8.3

Additional context

I'm using the jupyter/postgres docker containers provided for fonduer-tutorials.

@lukehsiao added the bug label Mar 3, 2021
@robbieculkin
Author

Some updates:

  1. It appears hangs do happen for specific HTML files. Switching to parallelism=1 made it clear that specific files trigger this behavior. I can provide example files if necessary.
  2. This behavior occurs on MacOS, but cannot be reproduced using an Ubuntu VM with the fonduer-tutorials docker image.

@lukehsiao
Contributor

Hi @robbieculkin, yes please! Can you include examples of files that cause the issue? Thanks for this additional info.

@robbieculkin
Author

Sure, here's a ZIP that includes the PDF & HTML (Adobe Acrobat converted).
huang2016.zip

Thanks for your help, @lukehsiao

@senwu
Collaborator

senwu commented Mar 14, 2021

Hi @robbieculkin,

I tested your data with the latest master branch and it works properly (see the figure below).

[Screenshot: Screen Shot 2021-03-13 at 7 34 42 PM]

Can you try the latest master branch and let us know if it's still a problem?

Thanks,
Sen

@robbieculkin
Author

Hmm, the error remains on MacOS. I've switched to my Ubuntu machine for Fonduer-related development, so I can close this. If others run into the same issue, maybe they can reopen?
