[query] Intermittent segfault arising from orjson #14299

Closed
danking opened this issue Feb 15, 2024 · 3 comments

@danking
Contributor

danking commented Feb 15, 2024

What happened?

Executive Summary

We will pin to orjson<=3.9.11 until ijl/orjson#457 merges and addresses the root cause of these segfaults. This issue is resolved when orjson merges orjson#457, releases a new version, and we upgrade to it.

Details

Tests that use the py4j_backend and thus rely on orjson to (de)serialize data have been intermittently segfaulting:

[2024-02-08 22:36:47] test/hail/matrixtable/test_file_formats.py::test_backward_compatability_ht[/io/resources/backward_compatability/1.6.0/table/6.ht/] Fatal Python error: Segmentation fault

Thread 0x00007fa51d817640 (most recent call first):
  File "/usr/lib/python3.9/selectors.py", line 416 in select
  File "/usr/lib/python3.9/socketserver.py", line 232 in serve_forever
  File "/usr/lib/python3.9/threading.py", line 917 in run
  File "/usr/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/usr/lib/python3.9/threading.py", line 937 in _bootstrap

Thread 0x00007fa5273ff640 (most recent call first):
  File "/usr/local/lib/python3.9/dist-packages/py4j/clientserver.py", line 58 in run
  File "/usr/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/usr/lib/python3.9/threading.py", line 937 in _bootstrap

Current thread 0x00007fa52bd6b000 (most recent call first):
  File "/usr/local/lib/python3.9/dist-packages/hail/backend/py4j_backend.py", line 217 in _rpc
  File "/usr/local/lib/python3.9/dist-packages/hail/backend/backend.py", line 212 in table_type
...

Line 217 only does one thing: call orjson.dumps.

def _rpc(self, action, payload) -> Tuple[bytes, str]:
    data = orjson.dumps(payload)
    path = action_routes[action]

Indeed, orjson has had this issue since 3.9.12, and we recently updated orjson from 3.9.10 to 3.9.12:

commit d2615543476bde5d01061499c92f26124b85caf3
Author: Dan King <daniel.zidan.king@gmail.com>
Date:   Fri Feb 2 14:21:47 2024 -0500

    [dependencies] mass update (#14233)

The relevant part of the diff:

-orjson==3.9.10
+orjson==3.9.12

orjson reduced the frequency of this segfault in 3.9.13 by eliding some of the code that caused buffer overheads; however, the problem persists. A complete fix is currently awaiting pull request review.
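
Because the crash is intermittent, one way to probe a candidate serializer version without taking down the test process is to run the serialization loop in a child process and inspect its exit status. The following is only a sketch under stated assumptions: the payload, the iteration count, and the helper names (`run_serializer_in_child`, `died_from_segfault`) are hypothetical and not taken from Hail, and the real payload that triggers the segfault is not known from this issue. It defaults to the standard-library `json` module so that it runs anywhere; passing `"orjson"` assumes orjson is installed. The negative-return-code convention applies on POSIX systems.

```python
import signal
import subprocess
import sys

# Hypothetical child script: imports the serializer module by name and
# serializes a small illustrative payload many times.
CHILD_TEMPLATE = """
import {module} as serializer
payload = {{"name": "x" * 64, "values": list(range(256))}}
for _ in range({iterations}):
    serializer.dumps(payload)
"""


def run_serializer_in_child(module="json", iterations=1000):
    """Run the serialization loop in a fresh interpreter and return its exit code."""
    code = CHILD_TEMPLATE.format(module=module, iterations=iterations)
    proc = subprocess.run([sys.executable, "-c", code], capture_output=True)
    return proc.returncode


def died_from_segfault(returncode):
    """On POSIX, a child killed by signal N reports a return code of -N."""
    return returncode == -signal.SIGSEGV


if __name__ == "__main__":
    rc = run_serializer_in_child("json")
    print("segfault" if died_from_segfault(rc) else f"exit code {rc}")
```

A clean run of such a loop does not prove a version is safe, since the failure is intermittent; it can only raise or lower confidence.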

Reports:

Batches:

Version

0.2.127

Relevant log output

[2024-02-08 22:36:47] test/hail/matrixtable/test_file_formats.py::test_backward_compatability_ht[/io/resources/backward_compatability/1.6.0/table/6.ht/] Fatal Python error: Segmentation fault

Thread 0x00007fa51d817640 (most recent call first):
  File "/usr/lib/python3.9/selectors.py", line 416 in select
  File "/usr/lib/python3.9/socketserver.py", line 232 in serve_forever
  File "/usr/lib/python3.9/threading.py", line 917 in run
  File "/usr/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/usr/lib/python3.9/threading.py", line 937 in _bootstrap

Thread 0x00007fa5273ff640 (most recent call first):
  File "/usr/local/lib/python3.9/dist-packages/py4j/clientserver.py", line 58 in run
  File "/usr/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/usr/lib/python3.9/threading.py", line 937 in _bootstrap

Current thread 0x00007fa52bd6b000 (most recent call first):
  File "/usr/local/lib/python3.9/dist-packages/hail/backend/py4j_backend.py", line 217 in _rpc
  File "/usr/local/lib/python3.9/dist-packages/hail/backend/backend.py", line 212 in table_type
  File "/usr/local/lib/python3.9/dist-packages/hail/ir/table_ir.py", line 438 in _compute_type
  File "/usr/local/lib/python3.9/dist-packages/hail/ir/base_ir.py", line 406 in compute_type
  File "/usr/local/lib/python3.9/dist-packages/hail/ir/base_ir.py", line 415 in typ
  File "/usr/local/lib/python3.9/dist-packages/hail/table.py", line 393 in __init__
  File "/usr/local/lib/python3.9/dist-packages/hail/methods/impex.py", line 3293 in read_table
  File "/usr/local/lib/python3.9/dist-packages/hail/typecheck/check.py", line 584 in wrapper
  File "<decorator-gen-1482>", line 2 in read_table
  File "/io/test/hail/matrixtable/test_file_formats.py", line 104 in test_backward_compatability_ht
  File "/usr/local/lib/python3.9/dist-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
  File "/usr/local/lib/python3.9/dist-packages/pluggy/_callers.py", line 102 in _multicall
  File "/usr/local/lib/python3.9/dist-packages/pluggy/_manager.py", line 119 in _hookexec
  File "/usr/local/lib/python3.9/dist-packages/pluggy/_hooks.py", line 501 in __call__
  File "/usr/local/lib/python3.9/dist-packages/_pytest/python.py", line 1792 in runtest
  File "/usr/local/lib/python3.9/dist-packages/_pytest/runner.py", line 169 in pytest_runtest_call
  File "/usr/local/lib/python3.9/dist-packages/pluggy/_callers.py", line 102 in _multicall
  File "/usr/local/lib/python3.9/dist-packages/pluggy/_manager.py", line 119 in _hookexec
  File "/usr/local/lib/python3.9/dist-packages/pluggy/_hooks.py", line 501 in __call__
  File "/usr/local/lib/python3.9/dist-packages/_pytest/runner.py", line 262 in <lambda>
  File "/usr/local/lib/python3.9/dist-packages/_pytest/runner.py", line 341 in from_call
  File "/usr/local/lib/python3.9/dist-packages/_pytest/runner.py", line 261 in call_runtest_hook
  File "/usr/local/lib/python3.9/dist-packages/_pytest/runner.py", line 222 in call_and_report
  File "/usr/local/lib/python3.9/dist-packages/_pytest/runner.py", line 133 in runtestprotocol
  File "/usr/local/lib/python3.9/dist-packages/_pytest/runner.py", line 114 in pytest_runtest_protocol
  File "/usr/local/lib/python3.9/dist-packages/pluggy/_callers.py", line 102 in _multicall
  File "/usr/local/lib/python3.9/dist-packages/pluggy/_manager.py", line 119 in _hookexec
  File "/usr/local/lib/python3.9/dist-packages/pluggy/_hooks.py", line 501 in __call__
  File "/usr/local/lib/python3.9/dist-packages/_pytest/main.py", line 350 in pytest_runtestloop
  File "/usr/local/lib/python3.9/dist-packages/pluggy/_callers.py", line 102 in _multicall
  File "/usr/local/lib/python3.9/dist-packages/pluggy/_manager.py", line 119 in _hookexec
  File "/usr/local/lib/python3.9/dist-packages/pluggy/_hooks.py", line 501 in __call__
  File "/usr/local/lib/python3.9/dist-packages/_pytest/main.py", line 325 in _main
  File "/usr/local/lib/python3.9/dist-packages/_pytest/main.py", line 271 in wrap_session
  File "/usr/local/lib/python3.9/dist-packages/_pytest/main.py", line 318 in pytest_cmdline_main
  File "/usr/local/lib/python3.9/dist-packages/pluggy/_callers.py", line 102 in _multicall
  File "/usr/local/lib/python3.9/dist-packages/pluggy/_manager.py", line 119 in _hookexec
  File "/usr/local/lib/python3.9/dist-packages/pluggy/_hooks.py", line 501 in __call__
  File "/usr/local/lib/python3.9/dist-packages/_pytest/config/__init__.py", line 169 in main
  File "/usr/local/lib/python3.9/dist-packages/_pytest/config/__init__.py", line 192 in console_main
  File "/usr/local/lib/python3.9/dist-packages/pytest/__main__.py", line 5 in <module>
  File "/usr/lib/python3.9/runpy.py", line 87 in _run_code
  File "/usr/lib/python3.9/runpy.py", line 197 in _run_module_as_main
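
The per-thread listing above has the shape of output from Python's standard `faulthandler` module, which pytest typically enables during test runs. As a minimal sketch for getting the same kind of trace outside pytest, the snippet below enables `faulthandler` against a log file and takes an on-demand snapshot to show the format. The helper names and the use of a temporary file are illustrative assumptions, not part of Hail.

```python
import faulthandler
import tempfile


def enable_crash_tracebacks(log_file):
    """Ask Python to dump every thread's traceback to log_file on a fatal signal."""
    faulthandler.enable(file=log_file, all_threads=True)
    return faulthandler.is_enabled()


def snapshot_tracebacks(log_file):
    """Write the current tracebacks on demand and return them as text."""
    start = log_file.tell()
    faulthandler.dump_traceback(file=log_file, all_threads=True)
    log_file.seek(start)
    return log_file.read().decode("utf-8", errors="replace")


# Unbuffered binary file; it must stay open for as long as faulthandler is enabled.
crash_log = tempfile.TemporaryFile(buffering=0)
enabled = enable_crash_tracebacks(crash_log)
report = snapshot_tracebacks(crash_log)
```

Setting the `PYTHONFAULTHANDLER` environment variable or running with `python -X faulthandler` enables the same handler without code changes.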
danking pushed a commit to danking/hail that referenced this issue Feb 15, 2024
CHANGELOG: Require `orjson<3.9.12` to avoid a segfault introduced in orjson 3.9.12.

See hail-is#14299 for details.
danking added a commit that referenced this issue Feb 15, 2024
CHANGELOG: Require `orjson<3.9.12` to avoid a segfault introduced in
orjson 3.9.12.

See #14299 for details.
@danking
Contributor Author

danking commented Feb 16, 2024

This appears to persist in 3.9.11. We are further dropping back to 3.9.10 in #14310.
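
A guard along the following lines could flag an environment whose installed orjson is newer than the last version this thread treats as safe. This is a sketch, not Hail's actual mechanism: the ceiling of 3.9.10 is an assumption drawn from the comment above, and the helper names are hypothetical. It uses only the standard library and returns `None` when orjson is not installed.

```python
import re
from importlib import metadata

# Assumption from this thread: 3.9.10 is the newest version not seen to segfault.
KNOWN_GOOD_MAX = (3, 9, 10)


def parse_version(version):
    """Parse the leading numeric components of 'X.Y.Z' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_known_good(version, ceiling=KNOWN_GOOD_MAX):
    """Return True if the version is at or below the assumed safe ceiling."""
    return parse_version(version) <= ceiling


def installed_orjson_is_known_good():
    """Check the installed orjson, or return None if it is not installed."""
    try:
        installed = metadata.version("orjson")
    except metadata.PackageNotFoundError:
        return None
    return is_known_good(installed)
```

For example, `is_known_good("3.9.11")` is false under this ceiling, which matches the decision to drop back to 3.9.10.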

@danking
Contributor Author

danking commented Feb 21, 2024

When ijl/orjson#457 merges and is released, we can update to that latest version of orjson. Do not update before that happens.

@chrisvittal
Collaborator

Apparently orjson closes stale PRs very aggressively; the relevant one is now ijl/orjson#459.
