Python debugger dataviewer is broken if multiprocessing is used at some point earlier in the script #15290

Open
2 tasks
sacha-hirsch opened this issue Feb 14, 2023 · 16 comments · Fixed by #12865
Labels
bug Issue identified by VS Code Team member as probable bug data-viewer

Comments

@sacha-hirsch

Applies To

  • Notebooks (.ipynb files)
  • Interactive Window and/or Cell Scripts (.py files with #%% markers)

What happened?

When debugging a Python file (OS: Windows, environment: Anaconda), right-clicking a numpy array or a pandas DataFrame in the "Variables" side panel offers the option to "view the value in the data viewer" (these might not be the exact words; my VS Code install is configured in another language).

When I click on it, an error appears: "Cannot read properties of undefined (reading 'disposed')". Nothing else happens.

I tried with both the prerelease and release versions of Jupyter, to no avail.

In the Jupyter logs, "xxxx" is my Windows session username.

Here is the launch.json debugging configuration I'm using:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python : fichier actif",
            "type": "python",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "justMyCode": false
        }
    ]
}

VS Code Version

Version: 1.75.1 (user setup)
Commit: 441438abd1ac652551dbe4d408dfcec8a499b8bf
Date: 2023-02-08T21:32:34.589Z
Electron: 19.1.9
Chromium: 102.0.5005.194
Node.js: 16.14.2
V8: 10.2.154.23-electron.0
OS: Windows_NT x64 10.0.19044
Sandboxed: No

Jupyter Extension Version

v2023.1.2010391206 AND v2023.2.1000461014

Jupyter logs

info 17:57:03.945: Found debugAdapterPython on Debug Configuration to use
info 17:57:04.764: Process Execution: > c:\Anaconda3\python.exe -c "import pandas;print(pandas.__version__)"
> c:\Anaconda3\python.exe -c "import pandas;print(pandas.__version__)"
error 17:57:05.365: [TypeError: Cannot read properties of undefined (reading 'disposed')
	at Zh (d:\xxxx\.vscode\extensions\ms-toolsai.jupyter-2023.2.1000461014\out\extension.node.js:17:165040)
	at ef.getDataFrameInfo (d:\xxxx\.vscode\extensions\ms-toolsai.jupyter-2023.2.1000461014\out\extension.node.js:17:165522)
	at async Fk.ensureInitialized (d:\xxxx\.vscode\extensions\ms-toolsai.jupyter-2023.2.1000461014\out\extension.node.js:24:519924)
	at async Fk.getDataFrameInfo (d:\xxxx\.vscode\extensions\ms-toolsai.jupyter-2023.2.1000461014\out\extension.node.js:24:518749)
	at async Ek.onVariablePanelShowDataViewerRequest (d:\xxxx\.vscode\extensions\ms-toolsai.jupyter-2023.2.1000461014\out\extension.node.js:24:510752)
	at async s.h (d:\xxxx\AppData\Local\Programs\Microsoft VS Code\resources\app\out\vs\workbench\api\node\extensionHostProcess.js:96:108008)]
warn 17:57:05.366: DataScience Error [TypeError: Cannot read properties of undefined (reading 'disposed')
	at Zh (d:\xxxx\.vscode\extensions\ms-toolsai.jupyter-2023.2.1000461014\out\extension.node.js:17:165040)
	at ef.getDataFrameInfo (d:\xxxx\.vscode\extensions\ms-toolsai.jupyter-2023.2.1000461014\out\extension.node.js:17:165522)
	at async Fk.ensureInitialized (d:\xxxx\.vscode\extensions\ms-toolsai.jupyter-2023.2.1000461014\out\extension.node.js:24:519924)
	at async Fk.getDataFrameInfo (d:\xxxx\.vscode\extensions\ms-toolsai.jupyter-2023.2.1000461014\out\extension.node.js:24:518749)
	at async Ek.onVariablePanelShowDataViewerRequest (d:\xxxx\.vscode\extensions\ms-toolsai.jupyter-2023.2.1000461014\out\extension.node.js:24:510752)
	at async s.h (d:\xxxx\AppData\Local\Programs\Microsoft VS Code\resources\app\out\vs\workbench\api\node\extensionHostProcess.js:96:108008)]

Coding Language and Runtime Version

Python 3.9.7

Language Extension Version (if applicable)

No response

Anaconda Version (if applicable)

conda 4.10.3

Running Jupyter locally or remotely?

None

@sacha-hirsch sacha-hirsch added the bug Issue identified by VS Code Team member as probable bug label Feb 14, 2023
@roblourens
Member

roblourens commented Feb 15, 2023

I think that call stack should be impossible. Annotating it with the real class names:

safeExecuteSilently
	at Zh (d:\xxxx\.vscode\extensions\ms-toolsai.jupyter-2023.2.1000461014\out\extension.node.js:17:165040)
PythonVariablesRequester#getDataFrameInfo
	at ef.getDataFrameInfo (d:\xxxx\.vscode\extensions\ms-toolsai.jupyter-2023.2.1000461014\out\extension.node.js:17:165522)
JupyterVariableDataProvider#ensureInitialized
	at async Fk.ensureInitialized (d:\xxxx\.vscode\extensions\ms-toolsai.jupyter-2023.2.1000461014\out\extension.node.js:24:519924)
	at async Fk.getDataFrameInfo (d:\xxxx\.vscode\extensions\ms-toolsai.jupyter-2023.2.1000461014\out\extension.node.js:24:518749)
	at async Ek.onVariablePanelShowDataViewerRequest (d:\xxxx\.vscode\extensions\ms-toolsai.jupyter-2023.2.1000461014\out\extension.node.js:24:510752)
	at async s.h (d:\xxxx\AppData\Local\Programs\Microsoft VS Code\resources\app\out\vs\workbench\api\node\extensionHostProcess.js:96:108008)]
  • JupyterVariableDataProvider calls into its variableManager here
    this.variable = await this.variableManager.getDataFrameInfo(this.variable, this._kernel);
    and it got that with injection for @inject(IJupyterVariables) @named(Identifiers.ALL_VARIABLES)
  • The stack shows that it got to PythonVariablesRequester#getDataFrameInfo here
  • But PythonVariablesRequester is not the type that should be injected for that interface, the registrations are set up here
    serviceManager.addSingleton<IJupyterVariables>(IJupyterVariables, JupyterVariables, Identifiers.ALL_VARIABLES);
    serviceManager.addSingleton<IJupyterVariables>(IJupyterVariables, KernelVariables, Identifiers.KERNEL_VARIABLES);
    serviceManager.addSingleton<IKernelVariableRequester>(
        IKernelVariableRequester,
        PythonVariablesRequester,
        Identifiers.PYTHON_VARIABLES_REQUESTER
    );

So the code should not call from JupyterVariableDataProvider to PythonVariablesRequester but somehow it clearly did. I have to investigate some more.

@amunger
Contributor

amunger commented Feb 15, 2023

It happens when debuggerVariables.active is false and kernelVariables are returned from the getter here. I got a repro by just commenting out the first return.

@roblourens
Member

I can't repro. If you could, it means that kernel was undefined at that point, and I'm not sure why that would happen (or why debuggerVariables would say it's not active at this point).

But the interface clearly states that kernel can be undefined, and this case just isn't implemented in KernelVariables, so I'll fix that case and see if it helps the OP.

@joyceerhl joyceerhl added verified Verification succeeded author-verification-requested Issues potentially verifiable by issue author and removed verified Verification succeeded labels Feb 22, 2023
@joyceerhl
Contributor

I couldn't repro the original problem described, so I've optimistically marked it verified, but @sacha-hirsch please try out the latest prerelease and see if it's been fixed for you as well.

@joyceerhl joyceerhl added the verified Verification succeeded label Feb 22, 2023
@sacha-hirsch
Author

sacha-hirsch commented Feb 23, 2023

I have some news! First, the bad part: with the newest release, the error message doesn't come up anymore... but nothing else happens. So this is still not resolved.

However, I noticed that this bug only happens when I have multiprocessing code running at some point earlier in the program (multiprocessing.Pool().map is called and resolved before the breakpoint where I'm peeking at the array)!

You can find attached two logs of the same program with the same breakpoint, one with multiprocessing enabled (multiprocessing.Pool().map is called and resolved earlier in the program; the bug appears) and one with multiprocessing disabled (multiprocessing.Pool().map is not called earlier in the program; no bug, data viewer works perfectly).

The bug also disappears if the breakpoint is placed before the call to multiprocessing.Pool().map.

So, I guess a better title for this issue would be "dataviewer broken when multiprocessing is used earlier in the script".

I should have checked with something closer to a minimal example before posting, sorry!

Thank you all for your help!

vscode_variable_peeking_bug_multiproc_enabled.txt
vscode_variable_peeking_no_bug_multiproc_disabled.txt

@joyceerhl joyceerhl added verification-found Issue verification failed and removed verified Verification succeeded labels Feb 23, 2023
@joyceerhl joyceerhl reopened this Feb 23, 2023
@roblourens roblourens added this to the March 2023 milestone Feb 23, 2023
@roblourens
Member

Thanks for the details, could you also include some sample code?

@sacha-hirsch
Author

Here is a minimal example.

Commenting out line 14 makes the bug disappear.

jupyter_bug_minimal_example.txt
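
The attachment is not reproduced in this thread. As a rough sketch of the pattern described above (a pool is created, used, and shut down before the breakpoint), something like the following could be tried. The names `square`, `build_table`, and `main` are hypothetical, the line numbers do not correspond to the attached file, and a plain list stands in for the numpy array or DataFrame from the original report.

```python
import multiprocessing


def square(x):
    # Worker functions live at module level so child processes can find them.
    return x * x


def build_table(values, results):
    # Pair each input with its result; this stands in for the array or
    # DataFrame that was being inspected in the original report.
    return [[v, r] for v, r in zip(values, results)]


def main():
    values = list(range(10))
    # The pool is created, used, and shut down before the breakpoint below.
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(square, values)
    table = build_table(values, results)
    # Set a breakpoint on the next line, then try to open `table`
    # from the Variables panel in the data viewer.
    print(len(table))


if __name__ == "__main__":
    main()
```

Run it as a regular .py file under the debugger, not in a notebook cell. Replacing the `with` block by a plain `results = [square(v) for v in values]` gives the "multiprocessing disabled" variant for comparison.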

@roblourens roblourens modified the milestones: March 2023, April 2023 Mar 21, 2023
@hyepod

hyepod commented Mar 22, 2023

I have exactly the same problem as @sacha-hirsch.
Did you find another way to display your dataframe, like in the Dataframe Viewer?
(I have over 1 million rows in my case.)

Is there an update on fixing this bug?

@ghost

ghost commented Apr 13, 2023

I have the same problem too.
A pandas DataFrame or a list with only 20 rows can't be displayed in the Data Viewer.

@ghost

ghost commented Apr 13, 2023

Note: I just found that the variables can be viewed in the Data Viewer if the multiprocessing pool has not been shut down.
Maybe this is a temporary workaround.
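
A minimal sketch of what that workaround might look like, assuming the pool is kept open by hand instead of in a `with` block. The names `double` and `main` are hypothetical, and whether this avoids the problem in a given setup is unverified.

```python
import multiprocessing


def double(x):
    # Module-level worker so child processes can find it.
    return 2 * x


def main():
    # Deliberately not a `with` block: the worker processes stay alive
    # until close() and join() are called at the very end.
    pool = multiprocessing.Pool(processes=2)
    try:
        results = pool.map(double, range(5))
        # Breakpoint here: the pool has not been shut down yet.
        print(results)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    main()
```

The trade-off is that the worker processes linger for the rest of the debug session, so this is only meant for inspection, not as a permanent change.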

@leonmayer

I had the same problem, but, in my case multiprocessing was called by torch.utils.data.DataLoader. Setting num_workers=0 fixed it for me.
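
One way to apply this only while debugging is a small helper that picks the worker count. This is a sketch: `debugger_attached` and `pick_num_workers` are hypothetical names, and the detection heuristic (a trace function is set, or `debugpy` has been imported) is an assumption that may not hold for every debugger or Python version.

```python
import sys


def debugger_attached():
    # Heuristic (assumption): a trace function is installed, or the
    # debugpy package has been imported into this process.
    return sys.gettrace() is not None or "debugpy" in sys.modules


def pick_num_workers(requested, debugging=None):
    # Use 0 workers (load data in the main process) while debugging,
    # otherwise keep the requested number of worker processes.
    if debugging is None:
        debugging = debugger_attached()
    return 0 if debugging else requested
```

The result would then be passed along as `DataLoader(dataset, num_workers=pick_num_workers(4))`, so data loading stays in the main process only during a debug session.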

@roblourens roblourens modified the milestones: April 2023, May 2023 Apr 26, 2023
@roblourens roblourens removed this from the May 2023 milestone May 31, 2023
@DonJayamanne
Contributor

@leonmayer @bamurtaugh @hyepod @sacha-hirsch
I'm trying to replicate this issue with the code provided, but unfortunately I cannot repro it.
Can someone please provide a simple sample to repro this issue?
When using the code provided by @sacha-hirsch, it fails with the following error:

  File "/Users/donjayamanne/.pyenv/versions/3.11.5/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/queues.py", line 367, in get
    return _ForkingPickler.loads(res)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't get attribute 'func' on <module '__main__' (built-in)>

@DonJayamanne DonJayamanne added info-needed Issue requires more information from poster and removed verification-found Issue verification failed author-verification-requested Issues potentially verifiable by issue author labels Dec 13, 2023
@sacha-hirsch
Author

sacha-hirsch commented Dec 13, 2023

Hello @DonJayamanne,
This is probably because my minimal code is not meant to be run in Jupyter but as a classic Python .py file.
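
For background, pickle stores a function as a module name plus a function name, and the receiving side has to look that name up again. The sketch below reproduces the same kind of AttributeError in a single process. The module `fake_main` is a hypothetical stand-in for a `__main__` that lacks the function; this illustrates the general mechanism and is not a confirmed diagnosis of the failure above.

```python
import pickle
import sys
import types


def func(x):
    return x + 1


# Register the function under a stand-in module so the lookup
# can be broken on purpose further down.
fake_main = types.ModuleType("fake_main")
func.__module__ = "fake_main"
func.__qualname__ = "func"
fake_main.func = func
sys.modules["fake_main"] = fake_main

# The payload holds only the names "fake_main" and "func", not the code.
payload = pickle.dumps(func)

# Simulate a process whose module does not define the function.
del fake_main.func

try:
    pickle.loads(payload)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

When the script is run as a .py file, `__main__` is the script itself, so the lookup for a module-level worker function succeeds.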

This issue was probably categorized as "vscode-jupyter" because it deals with the dataviewer, but it is first and foremost a Python debugger issue, not a Jupyter issue.

FYI, I just retested the minimal code with Python 3.11 and the latest version of VS Code, and the bug is still there. Calling the multiprocessing module somewhere in the code breaks the debugger dataviewer for all following lines of code, even if the multiprocessing call has been resolved in the meantime.

Glad to see this issue is still being investigated.

Best of luck!

@vscodenpa

Hey @DonJayamanne, this issue might need further attention.

@sacha-hirsch, you can help us out by closing this issue if the problem no longer exists, or adding more information.

@DonJayamanne DonJayamanne transferred this issue from microsoft/vscode-jupyter Mar 2, 2024
@DonJayamanne DonJayamanne removed their assignment Mar 2, 2024
@github-actions github-actions bot added the triage-needed Issue needs to be triaged label Mar 2, 2024
@sacha-hirsch
Author

sacha-hirsch commented Mar 4, 2024

FYI as of today on the latest vscode stable release and python 3.11 the bug is still happening. I've already given a minimal example, I cannot provide any more info. Multiprocessing handling by the debugger is clearly broken.

@github-actions github-actions bot removed the info-needed Issue requires more information from poster label Mar 4, 2024
@amunger amunger assigned amunger and unassigned paulacamargo25 Mar 4, 2024
@amunger amunger transferred this issue from microsoft/vscode-python Mar 4, 2024
@amunger amunger added data-viewer and removed triage-needed Issue needs to be triaged labels Mar 4, 2024
@amunger
Contributor

amunger commented Mar 4, 2024

Bringing this back to Jupyter since this command is a Jupyter contribution. This is likely because we're not paying attention to any thread ID in our debug watcher, and we may not be able to without microsoft/vscode#63943.

@sacha-hirsch sacha-hirsch changed the title Can't view dataframe in data viewer while debugging "Cannot read properties of undefined (reading 'disposed')" Python debugger dataviewer is broken if multiprocessing is used at some point earlier in the script Mar 4, 2024

9 participants