Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using Sentry SDK in worker leads to crash #261

Closed
itssimon opened this issue Dec 11, 2023 · 14 comments
Closed

Using Sentry SDK in worker leads to crash #261

itssimon opened this issue Dec 11, 2023 · 14 comments

Comments

@itssimon
Copy link

itssimon commented Dec 11, 2023

The Taskiq workers crash immediately if I initialise the Sentry SDK in the worker startup handler.

$ taskiq worker --ack-type when_executed --reload apitally.tasks:broker                                                                4s  cloud  base 20:37:47
[2023-12-11 20:40:48,284][taskiq.worker][INFO   ][MainProcess] Starting 2 worker processes.
[2023-12-11 20:40:48,284][root][WARNING][MainProcess] Reload on change enabled. Number of worker processes set to 1.
[2023-12-11 20:40:48,290][taskiq.process-manager][INFO   ][MainProcess] Started process worker-0 with pid 30331 
objc[30331]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.
objc[30331]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
[2023-12-11 20:40:50,301][taskiq.process-manager][INFO   ][MainProcess] worker-0 is dead. Scheduling reload.
[2023-12-11 20:40:51,314][taskiq.process-manager][INFO   ][MainProcess] Process worker-0 restarted with pid 30338
objc[30338]: +[NSNumber initialize] may have been in progress in another thread when fork() was called.
objc[30338]: +[NSNumber initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
[2023-12-11 20:40:53,330][taskiq.process-manager][INFO   ][MainProcess] worker-0 is dead. Scheduling reload.
[2023-12-11 20:40:54,338][taskiq.process-manager][INFO   ][MainProcess] Process worker-0 restarted with pid 30342
objc[30342]: +[NSNumber initialize] may have been in progress in another thread when fork() was called.
objc[30342]: +[NSNumber initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.

The worker startup handler looks like this:

@broker.on_event(TaskiqEvents.WORKER_STARTUP)
async def on_worker_startup(state: TaskiqState) -> None:
    sentry_sdk.init(
        dsn=str(settings.sentry_dsn),
        environment=settings.sentry_environment,
    )

This doesn't happen if I don't use Sentry, or if I start the worker without the --reload flag.

Environment:

  • macOS 14.0 (Sonoma)
  • Python 3.11
  • taskiq 0.10.4
  • sentry-sdk 1.38.0
@s3rius
Copy link
Member

s3rius commented Dec 11, 2023

Hi and thanks for creating this report. But I'm not sure how we can solve it, because it's a MacOS related issue only.

Here's the origin on this issue: ansible/ansible#32499 (comment).

Maybe you'll be able to resolve the issue by running this command before starting the worker:

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

@itssimon
Copy link
Author

itssimon commented Dec 11, 2023

Setting that environment variable doesn't work unfortunately, but the log output changes to:

[2023-12-11 21:25:53,237][taskiq.worker][INFO   ][MainProcess] Starting 2 worker processes.
[2023-12-11 21:25:53,237][root][WARNING][MainProcess] Reload on change enabled. Number of worker processes set to 1.
[2023-12-11 21:25:53,242][taskiq.process-manager][INFO   ][MainProcess] Started process worker-0 with pid 32423 
[2023-12-11 21:25:53,296][taskiq.worker][INFO   ][worker-0] Importing tasks from module apitally.tasks.tasks
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
[2023-12-11 21:25:54,246][taskiq.process-manager][INFO   ][MainProcess] worker-0 is dead. Scheduling reload.

Even though this error is clearly rooted somewhere in macOS there must be a workaround. Taskiq is the only framework I've encountered this with, and I've been using hot reloading and Sentry with FastAPI, Dramatiq, Celery, etc.

@s3rius
Copy link
Member

s3rius commented Dec 11, 2023

I guess I have a possible solution. I will publish it soon. Will you be able to install taskiq from the branch to verify that the problem is gone?

@itssimon
Copy link
Author

Ah, nice one! Yes, no problem.

@s3rius
Copy link
Member

s3rius commented Dec 11, 2023

The branch is called feature/macos-fork. Please verify that now there's no problem.

@itssimon
Copy link
Author

It's still not working, but the error is different:

[2023-12-11 21:43:29,859][taskiq.worker][INFO   ][MainProcess] Starting 2 worker processes.
[2023-12-11 21:43:29,859][root][WARNING][MainProcess] Reload on change enabled. Number of worker processes set to 1.
[2023-12-11 21:43:29,868][taskiq.process-manager][INFO   ][MainProcess] Started process worker-0 with pid 33654 
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/homebrew/Cellar/python@3.11/3.11.6/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 132, in _main
    self = reduction.pickle.load(from_parent)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/synchronize.py", line 115, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory
[2023-12-11 21:43:30,973][taskiq.process-manager][INFO   ][MainProcess] worker-0 is dead. Scheduling reload.

@s3rius
Copy link
Member

s3rius commented Dec 11, 2023

Okay. That was initial error that we were avoiding when choosing fork instead of spawn for macos.

Let me give you a small overview of what is happening:
Taskiq's master process creates a bunch of children processes. They don't know about each other, only master process knows about it's children. On the startup we check that all children processes have started correctly and all signals have been set for each child (Signals are used for graceful shutdowns (code)). We achieve this by using the multiprocessing.Event. We give a reference to the event from the master process to it's children processes (code) and wait for all events to be set (code). When all events are set, that all signal handlers have been successfully set up and we can continue. After that we import a broker, run all startup functions and start listening.

Because we used to use fork method, the macos security considerations disallow spawning another fork from forked process (as I understood). Maybe because infinite forking might lead to the fork-bomb exploit or something. The initial problem we observed happened because sentry tries to spawns another process that sends all the data to the sentry using fork method and fails to spawn it because of macOS' securities.

If we change the multiprocessing method to spawn, which is "safer" from macos perspective, then we observe another problem. Now processes cannot be started because of FileNotFoundError. The origin of this issue is that spawn method actually doesn't have any shared memory with parent process and all variables recreated from scratch. Which leads to an error, since the multiprocess.Event is no longer can be dereferenced and accessed by the child process and therefore cannot be set. (Not sure why this error doesn't happen on linux, tho).

This is how I see the problem. I see one possible solution. It's to start using spawn method by default and use another mechanism of readiness instead of the multiprocessing.Event . I will try to fix this error, but I don't have macos to verify it. I will tag you on github when the solution is ready so you can test it out. Let's solve the issue together.

Have a nice day.

@itssimon
Copy link
Author

Thanks for the detailed explanation. That all makes sense. I'm very happy to help solve this issue!

By the way, it may be good to set up automated tests to also run on GitHub Actions macOS runners. That way you can make sure Taskiq continues to work on macOS without you having a Mac yourself.

@s3rius
Copy link
Member

s3rius commented Dec 12, 2023

We already use it for most places. But CLI and process manager is one of parts that is missing coverage. I will try to add integration testing later. But it's really hard to verify and test.

@s3rius
Copy link
Member

s3rius commented Dec 15, 2023

@itssimon, hi! At first I came up with quite crazy idea of having info-server, but I disliked the way it goes, so I just removed the event. Can you try running taskiq from the branch feature/new-process-mgr? If it solves the issue, we might merge it, but with some additional changes.

@itssimon
Copy link
Author

(Disregard my previous deleted comment.)

Running taskiq from that feature branch does indeed resolve the issue.

@s3rius
Copy link
Member

s3rius commented Dec 17, 2023

Great. I will publish it after some small updates .

@igoose1
Copy link

igoose1 commented Mar 27, 2024

@s3rius, I think this can be closed as #266 is merged and @itssimon confirmed it fixed a bug.

@s3rius
Copy link
Member

s3rius commented Mar 27, 2024

Yes, true. Thanks for noticing.

@s3rius s3rius closed this as completed Mar 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants