subprocess: Support for fork in subprocess debugging #943
To do this right, we need clean shutdown of ptvsd. #799 tracks the work needed for clean shutdown.
As long as this is unsupported, could you make it fail loudly? For now, it just hangs.
As a note for anyone using multiprocessing arriving here: on Python 3 you can add multiprocessing.set_start_method('spawn') to the start of the program to make multiprocessing work (because that way multiprocessing will not use fork() to create the child processes).
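A minimal self-contained sketch of that workaround (names here are illustrative, not from the original comment):

```python
import multiprocessing

def square(x):
    # Defined at module level so spawn'd workers can import it by name.
    return x * x

if __name__ == "__main__":
    # Must run once, before any Process or Pool is created.
    multiprocessing.set_start_method("spawn")
    with multiprocessing.Pool(4) as pool:
        print(pool.map(square, range(10)))  # [0, 1, 4, ..., 81]
```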
True, but that's incompatible with many basic use cases for multiprocessing.Pool, and it can lead to running out of memory pretty quickly due to recursive process spawning and unexpected memory usage patterns. spawn() starts a fresh interpreter process that re-imports the main module and runs from the entry point, whereas fork() splits off a child that shares the parent's memory copy-on-write and continues execution with the next instruction after the fork. This difference is why spawn() is slow and fork() is fast. It also means that changing the setting usually results in processes that execute very different code paths from the time they are invoked. The following common pattern is used in scripts that expect to run a single thread, except to occasionally fork() n scope-limited processes that allow data to be processed in parallel, and is based off a basic use case in the multiprocessing docs:
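The code itself didn't survive into this copy of the thread; a minimal reconstruction of the kind of pattern being described (loosely based on the basic Pool example in the multiprocessing docs, with illustrative names and data) might look like this:

```python
import multiprocessing

def process_chunk(chunk):
    # CPU-bound work on one slice of the dataset.
    return sum(chunk)

# Stands in for loading a multi-million row dataset from storage.
data = list(range(10_000_000))
chunks = [data[i::8] for i in range(8)]  # split into 8 slices

# Note: no `if __name__ == "__main__":` guard. Under fork() the children
# continue from inside Pool and this is fine; under spawn() every worker
# re-imports this module from the top, re-loading `data` and attempting to
# create another Pool (recent CPython detects this and raises RuntimeError;
# older versions could recurse as described below).
with multiprocessing.Pool(processes=8) as pool:
    results = pool.map(process_chunk, chunks)
print(sum(results))
```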
The above code is written for fork(), and trying to debug it with spawn() creates a spawn bomb. Luckily spawn() is slow, so even recursively spawning ~20 more processes per iteration, each loading a fresh copy of a multi-million row dataset from storage, only leaks an average of 2-3 GB per second until around 10-15 seconds in, when the exponential growth really starts to accelerate. A smaller dataset would lead to faster growth, though it would eventually bottleneck on I/O if you had enough RAM. This workaround is a Bad Idea(tm) to try if you're not running (and looking at) a resource monitor.
We're still actively working on the code refactoring that is necessary for us to support fork properly. It's a tricky thing to get right, because of the issues fork has with multi-threading (orphaned locks, etc.) if it's not immediately followed by exec - and we use threads heavily in the debugger itself, so this applies even when debugging single-threaded programs.

I'm not sure I quite follow your code example, though. Wouldn't it create the same number of processes regardless of how they're spawned? Or are you saying that it's not such a big deal with fork, because they're all going to share most of their memory pages?
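To make the orphaned-lock hazard concrete, here is a minimal sketch (illustrative, not ptvsd code; POSIX-only since it calls os.fork directly):

```python
import os
import threading
import time

lock = threading.Lock()

def holder():
    with lock:
        time.sleep(5)  # hold the lock while the main thread forks

threading.Thread(target=holder, daemon=True).start()
time.sleep(0.5)  # give the holder thread time to grab the lock

pid = os.fork()
if pid == 0:
    # Child: only the forking thread survives fork(). The lock was copied
    # in its locked state, and the thread that would release it no longer
    # exists here, so this acquire can never succeed.
    print("child acquired lock:", lock.acquire(timeout=2))  # prints False
    os._exit(0)
os.waitpid(pid, 0)
```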
@supertaz I do agree with you that there may be caveats when changing the start method. Although I'd also like to point out that it's very easy to shoot yourself in the foot with fork() too. That's especially true on CPython, because although fork() is copy-on-write at the OS level, reference counting writes to the header of every object the child touches, so many pages end up being copied anyway. And that's besides the regular caveats with locks/UIs/file descriptors/buffers/threads/etc ;)

p.s.: that's not to say that there aren't cases where it's a better option, nor that the debugger won't support it in the future; as @int19h pointed out, we're working on that, and I just wanted to present the current workaround to use the debugger until that work is finished.
It's got a lot to do with the problem at hand, where there's processing of very large data structures that have been split into chunks. Because fork() is CoW (even if reference counting makes it CoR, this point will hold), when you're trying to accelerate processing of large datasets on NUMA systems with many cores (especially with HT pipelines), and you aren't overwriting data but instead aggregating results and then handling them in the parent at completion, you get a near-linear improvement in processing times with fork(), because you're not copying much (if any) memory into the child. The same can't be said of spawn(), where the child has to rebuild from scratch any state it needs, and is thus slow. Also, spawn() and fork() work differently as to where the entry point is for the child.

Because Python's multiprocessing package abstracts some of this away (though it's accessible), and because the code is designed to be short-lived and operate as parallel inline execution of a single function, instead of being part of a long-lived pool that operates in a dispatcher-worker pattern, there are some pretty big differences in how the code behaves. fork() followed by exec() is similar in behavior to spawn() plus CoW, but other fork() usage patterns are not, and I think this is where any confusion is coming in. The above code isn't resetting state and entry point via exec(), because the children have one line of code to execute (which calls others, but there is no branching) before they return data to the parent and die. This is efficient with fork() only, and is a workaround for Python's GIL hamstringing threading.

The pattern I supplied is used in data science and data engineering to handle processing of very large datasets via pandas DataFrames and other similar structures. I fully acknowledge that someone who didn't know how to properly use fork() or spawn() could wreak havoc on a system (typically their own system, as most multi-occupancy systems limit a user's resources), but I've been using both for decades and understand the case for one over the other. spawn() is useful for cases where it is appropriate; it's just not useful in these types of cases, and I was trying to illustrate that blindly switching code not written for spawn() could be just as bad as not knowing how to use fork(), just slower. It's a good illustration, however, of how either fork() or spawn() can be abused if the wrong one is used in code that is designed for a use case where only the other is appropriate.

Since someone who doesn't truly understand how fork() and spawn() work, but was trying to debug something based on a common pattern that is often recommended for speeding up processing under Python, and wasn't watching system resources (and/or didn't know to), might not realize they were creating a recursive memory monster, I thought it important to illustrate that the workaround isn't a workaround for anything that uses multiprocessing for short-lifetime, inline purposes where fork() without exec() is the appropriate solution. I was also trying to illustrate the importance of a solution to the debugging issue (as difficult as it may be to create one) over offering a workaround without explaining why said workaround is limited in scope and applicability.

Sorry it took me so long to respond; I hadn't noticed the notification icon, and I missed the notification emails.
I hope I addressed the questions. Feel free to ask for more clarification, as mine may have further obfuscated the point (my concentration is flagging at present).
For me, setting the start method to spawn or forkserver just avoids the exception thrown for fork, but breakpoints in subprocesses are still not working at all. Do they work for you?
Ah, it seems that I have to create a launch configuration and enable the subProcess option in it.
@memeplex that is correct.
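For anyone else landing here, a minimal launch.json entry of that sort (the subProcess flag is the relevant setting; the name and program path are illustrative):

```json
{
    "name": "Python: Current File",
    "type": "python",
    "request": "launch",
    "program": "${file}",
    "subProcess": true
}
```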
Any update here? Spawning isn't a feasible alternative.
@Breich90 My apologies - we haven't done a good job of tracking the work on the new implementation in a way that's easy to follow. At this point, it's mostly centralized in this issue, which references a bunch of others: #1706

TL;DR is that the new implementation is already committed and does fix the fork issue, but we need a few more bug fixes and some polish before we can ship it as stable. There's a pre-release build for it, ptvsd 5.0.0a7, but it has to be paired with a supporting build of VSCode - and there isn't a ready-made one yet, so there's no easy way to test it without building things locally. We'll have something available for testing soon, though.
Hello, I'm very sorry, I just don't understand what we should do - just wait for the solution and follow #1706? The error is the following:

Traceback (most recent call last):
In general, you'll need a recent pre-release of ptvsd, and the corresponding version of VSCode, to have fork supported. While the issue is still open because we're still working on some bits and pieces, the bulk of it is already there in the ptvsd 5 alphas. However, this particular call stack doesn't look to me like a typical ptvsd failure due to fork. It mentions pickling and spawn, which makes me suspect you're setting the spawn start method somewhere in your code.
Pavel, you are definitely right - I added set_start_method("spawn"), and after adding it the above error started to occur. Before that, there was the multiprocessing error RuntimeError: already started. So, could you advise how to solve the issue with debugging such code? I could share it if that would help...
This is complete. The remaining work to ship the new adapter is being tracked by #1706.
Hi Team, I came across a strange issue using VS Code for debugging PyTorch code on an enumerate(data_loader) line. The error is:

This is happening because of the multi-processing done in the data loader. It started specifically when I updated VS Code to 1.46.1 yesterday. Searching the net, I figured out the solution is to set num_workers=0 or to run the following code before enumerating the data loader (snippet reproduced after this comment). Is there another way to resolve this issue?

Thanks and regards,
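The referenced snippet didn't survive into this copy of the thread; going by the rest of the discussion, it was presumably the spawn workaround, something like:

```python
import multiprocessing

# Force the spawn start method before the DataLoader creates its workers.
# force=True because a start method may already have been set by this point.
multiprocessing.set_start_method("spawn", force=True)
```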
+1 @guaravmunjal13. After the update of VS Code I have the same exception, and adding his multiprocessing lines doesn't work for me. Before that, debugging was working correctly. Setting num_workers to 0 works but is too slow for my training.
Two possible options here:
Option 1: refactor main, daemon, and session code to allow teardown and restart.
Option 2: Delete all loaded ptvsd and pydevd modules, and attempt a fresh ptvsd attach (rough sketch below).
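A rough sketch of what Option 2 might look like (hypothetical helper, not actual ptvsd code; the enable_attach call follows ptvsd 4's documented usage):

```python
import sys

def purge_debugger_modules():
    # Drop every already-imported debugger module so the next import
    # rebuilds clean state in the forked child.
    for name in list(sys.modules):
        if name.startswith(("ptvsd", "pydevd", "pydev_", "_pydev")):
            del sys.modules[name]

purge_debugger_modules()
import ptvsd                                      # fresh copy of the debugger
ptvsd.enable_attach(address=("127.0.0.1", 5678))  # then retry the attach
```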