When getting the repr of a pandas dataframe in the repl, don't do any customization to avoid getting a big output. #1078
Comments
Regarding the Regarding pandas customization, we really set up lower limits in the debugger (you could use the To setup the limits used in the debugger you need to set the following environment variables (to the values you find appropriate in your use case):
Note that raising those values could make interacting with pandas slower when you're stepping in general in the debugger (I'm not sure why, but getting the representation of a pandas data frame is very slow in pandas and the debugger will ask for that representation whenever a pandas data frame is found). As microsoft/vscode#162965 is tracking the remaining issue (where the contents of the debug console messages seem clipped in your use case), I'm closing this one. |
We have a PR on python extension to set |
You closed the issue, but the problem remains even if settings.json is like this (please notice "-s" -> "--capture=no"):
|
First, when you say "setup pytest to stop capturing the output", am I doing it correctly with the settings.json above, or is there another procedure?
How can I set pd.options.display.expand_frame_repr = False in env. vars case?
I really don't get it, why you have so many problems with pandas dataframes. Actually, vscode way of debugging code with pandas dataframes is the worst way I have ever experienced. For example, we have a lot of pydantic models that have several dataframes as attributes, something like this:
To be able to see the dataframes in debugger, I have to add to each model a Do you think this is normal for an IDE?
I think you shouldn't. |
Furthermore, if this is an idea:
then it doesn't work. |
Many issues in the same place... let's see one at a time:
i.e.:
When I add the Can you check |
Right now pandas is slow to print: it can easily take 0.3 seconds even on a small dataframe, and having many dataframes, or a single big one without the current limits, can easily take a few seconds. That repr slowness makes for a very annoying debug experience, which is why the limits were made much smaller.
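To check how much the repr costs on a given machine, a rough timing sketch along these lines may help. The frame shape and the repeat count are arbitrary assumptions, and the numbers will vary with the pandas version and the data.

```python
import timeit

import pandas as pd

# Hypothetical sample frame: 1000 rows x 20 integer columns.
df = pd.DataFrame({f"col_{i}": range(1000) for i in range(20)})

# Average the cost of building the string representation over a few calls.
calls = 10
seconds_per_call = timeit.timeit(lambda: repr(df), number=calls) / calls
print(f"repr(df) took about {seconds_per_call * 1000:.1f} ms per call")
```

A debugger asks for this representation for every dataframe it shows, so the per-call cost is paid once per visible frame on each step.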
That isn't really handled by the debugger (so, you have to set it at runtime as usual).
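A minimal sketch of setting that option at runtime, assuming only the standard pandas options API. The helper name `configure_pandas_display` is made up for illustration; where it gets called (for example at the top of a test module) is up to the project.

```python
import pandas as pd


def configure_pandas_display():
    """Apply pandas display settings at runtime (hypothetical helper).

    Equivalent to `pd.options.display.expand_frame_repr = False`.
    """
    pd.set_option("display.expand_frame_repr", False)


configure_pandas_display()
print(pd.get_option("display.expand_frame_repr"))
```

`pd.reset_option("display.expand_frame_repr")` restores the default afterwards.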
Yes, storage is good as is usage of its api, just converting that to a string the user can see is very slow in pandas (they probably didn't really optimize those code paths). You can't really unset those values, but you can change it to different values as you see fit. If you're interested, the code which handles this is: -- you can locally comment the code which customizes the pandas display options (i.e.: -- The values for the |
Note that this will only override the current if the current representation is too big (i.e.: it only makes the repr from pandas smaller, but if it's small already it'll not make it bigger). So, you probably want to check the current values you have for pandas and customize those... i.e.:
if those are smaller you should raise those first... Also, just to note, if you're running a test from the testing view you also need to add a |
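The "only makes it smaller" behaviour described above can be modelled roughly as below. This is a sketch, not the debugger's actual code: `effective_limit` is a hypothetical helper, and the loop simply prints the current pandas values so they can be compared with whatever limits are configured for the debugger.

```python
import pandas as pd


def effective_limit(current, debugger_limit):
    """Hypothetical model of the rule: the debugger limit can shrink, never grow."""
    if current is None:  # in pandas, None means "no limit"
        return debugger_limit
    return min(current, debugger_limit)


# Inspect the values pandas is currently using.
for name in ("display.max_rows", "display.max_columns",
             "display.max_colwidth", "display.width"):
    print(name, "=", pd.get_option(name))

# If pandas itself is set to 50, a debugger limit of 1000 changes nothing.
print(effective_limit(50, 1000))
```

Under this model, raising only the debugger-side limit has no visible effect while the pandas option itself is still the smaller of the two.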
@fabioz Do you have an explanation for this ? Version: 1.72.0 (user setup) |
@zljubisic on the first line the header is aligned (it's just that it appears in the same line as the I don't have an explanation for the smaller print after changing Given that the print just calls |
Actually, I was able to reproduce the case where it gets smaller by making the terminal window smaller (apparently pandas checks the size of the terminal and as it sees that the string wouldn't fit in the terminal it just elides the column). You'll have to refer to the pandas docs / ask the pandas maintainers how to override that though... |
@fabioz you must be kidding me. I have just executed pytest -s --no-cov in Windows Terminal (outside of vscode) with two different terminal sizes. Both times the dataframe is printed correctly. Some things, like jupyter notebooks, are quite pleasant to use, and vscode as an editor is quite good, but working with pandas dataframes is not acceptable. |
I'm sorry if I gave you the impression that pandas is a bad library, that wasn't my intention... all I'm saying is that the debugger has no control over the printing, which is done by pandas itself, and that the printing itself is really slow on the pandas side, but this is just a note on that particular case, not on the awesomeness of pandas ;)
As far as I know, PyCharm doesn't have a real tty when you're doing that run, so, you're not comparing the same thing... I took a quick look at their manual and apparently you can override that using
This isn't controlled by
Well, the clipping issue is from vscode, the stdout capturing is from pytest, the data viewer is from jupyter and the printing is from pandas. But now, after thinking a bit about it, I'm going to reopen because there's one thing which could be improved on debugpy which is not customizing anything when the representation is gotten on the debug console repl (right now it makes no distinction if it's being printed for the debug console or a watch window, which is why when you just type |
@fabioz I must say that I am surprised with the way you (as an organization) are dealing with issues.
If I put "env": {"PYDEVD_PANDAS_MAX_COLWIDTH": "1000",} it is still not working, and neither is executing "export PYDEVD_PANDAS_MAX_COLWIDTH=1000". You said that you have an issue which solves the default "--capture=no" (microsoft/vscode-python#19903), but I don't think it is relevant if I put in settings "python.testing.pytestArgs": ["--no-cov", "-s"], and in sys.argv there is --no-capture. So, it is not a matter of defaults, it is a matter that even when it is set, it is not working. You also said that playing with pandas options can make interaction slower, but we are not talking about performance here, we are talking about pandas options that don't work at all. About pandas.str(): it is working everywhere except vscode. I never had an issue with it in pycharm, shell, jupyter(lab) notebook... if you execute print outside of vscode and change the terminal size, the dataframe is printed correctly every time. And then, from time to time, the last line is not displayed. After all my remarks, you are changing the subject of the issue from "Pandas dataframes in (py)test debug console are broken" to "When getting the repr of a pandas dataframe in the repl, don't do any customization to avoid getting a big output". Maybe I am a bit irritated by your (as an organization) attitude, because after the presentation of the problem, all that was said is practically ignored. At the moment my colleagues and I are using Anyway, in case that you think what I have reported here is not OK, maybe I can help in the process. |
Please bear in mind that this is a repository specifically for the debugger - which, by the way, is not even VSCode-specific. As explained above, this is really a set of different issues, some of which have to do with the simple fact that Debug Console is not a real terminal (and thus cannot be compared to e.g. running pdb in one), and some are bugs in code that is not even in this repo. So let's try to disentangle this issue into pieces that can be individually tackled and create separate issues for them in the appropriate repositories. So far as I can tell, these are:
Did I miss anything? |
Hi @int19h and thank you for stepping in here.
As you can see in the picture above, the max width of all df columns = 136 (df is from the very first post here). We could add some characters for the space between columns, for the index... you can see what the dataframe representation is in the debug console with default settings. Default column width is PYDEVD_PANDAS_MAX_COLWIDTH=80 and if you consider dataframe like this one ( Even this is truncated: Furthermore, after creating several issues I still don't know what to do to be able to see a non-truncated dataframe in the debug console. For example, if I created the dataframe described in the very first post, what do I have to do to see it in its entirety, with no "..." and no broken lines, let's say in the same way as I would see it in a regular python console or jupyter(lab) notebook, or pdb...? |
Besides the other configurations for the column width you also need to configure the display width for pandas (because the default value is based on the number of columns from the terminal) with:
|
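As an illustration of the display width point (not necessarily the snippet originally posted here), the sketch below compares the repr of a hypothetical wide frame under two `display.width` values. `display.max_columns` is set to `None` explicitly so that no column is dropped and only the wrapping differs.

```python
import shutil

import pandas as pd

# What the process believes the terminal width is (falls back to 80 without a tty).
print("terminal columns:", shutil.get_terminal_size().columns)
print("display.width:", pd.get_option("display.width"))

# Hypothetical sample frame: 6 columns of 30-character strings.
df = pd.DataFrame({f"col_{i}": ["x" * 30, "y" * 30] for i in range(6)})


def render_with_width(frame, width):
    """Return repr(frame) with display.width temporarily set to `width`."""
    with pd.option_context("display.width", width, "display.max_columns", None):
        return repr(frame)


narrow = render_with_width(df, 80)    # rows wrap into several blocks
wide = render_with_width(df, 1000)    # each row stays on one line
print(wide)
```

With the narrow width, pandas splits the columns into stacked blocks; with the wide one, the header and each row occupy a single line.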
@fabioz, please enter the debug console, create this dataframe as:
and tell me what I have to do to see all rows and columns in their entirety. |
@fabioz I could say that this works, although I don't know why dataframe is not printed from very first multiline command and also, I don't see the difference between Thanks. In case you need it:
|
Just doing
So, you'll need to set those to really large numbers if you don't want those constraints when doing just a As for why it doesn't work from the start like that it's because
The debugger doesn't really have a say in that, it's pandas that decides/sets those values... |
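On the pandas side, one way to lift those constraints for a single print is a temporary option context, sketched below. The sample frame and the helper name `full_repr` are assumptions for illustration; any limits applied by the debugger itself are a separate matter and are not modelled here.

```python
import pandas as pd

# Hypothetical sample frame: 30 columns of 60-character strings.
df = pd.DataFrame({f"col_{i}": ["a" * 60, "b" * 60] for i in range(30)})


def full_repr(frame):
    """Return repr(frame) with the pandas truncation options lifted."""
    with pd.option_context(
        "display.max_rows", None,              # no row limit
        "display.max_columns", None,           # no column limit
        "display.max_colwidth", None,          # do not shorten cell contents
        "display.expand_frame_repr", False,    # do not wrap wide frames
    ):
        return repr(frame)


text = full_repr(df)
print(text)
```

The options are restored when the `with` block exits, so the rest of the session keeps its usual limits. As far as I know, `print(frame.to_string())` gives a similar untruncated result without touching any options.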
@fabioz maybe you haven't noticed, but in the picture above you can see that I haven't got the full column size with
That is the reason why I said that your solution is not working. |
@fabioz regarding these three variables:
can you please provide instructions to me when and where to put them if I want their influence in debug console? |
You can put them either in your OS environment variables (and then restart VSCode so that it picks up those values) or you can put them in the launch configuration
Note that if you're running from a shortcut -- such as test run -- you have to set the launch configuration |
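To confirm that a variable set this way actually reached the process being debugged, a small check like the following can be run from the debug console or from the code under test. Only `PYDEVD_PANDAS_MAX_COLWIDTH` is used because it is the one name spelled out in this thread; `describe_env` is a hypothetical helper.

```python
import os


def describe_env(name, environ=None):
    """Report whether `name` is visible in the given environment mapping."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if value is None:
        return f"{name} is not set in this process"
    return f"{name}={value}"


print(describe_env("PYDEVD_PANDAS_MAX_COLWIDTH"))
```

If this reports that the variable is not set, the value never reached the debugged process, so changing the number itself will make no difference.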
I agree that does seem weird, but that seems some issue in the layout in VSCode (possibly the same one you opened a report about in VSCode already)... If you scroll the console all the way to the right, do you see the start of the line 0? Can you reproduce it if you print it multiple times? When I try it here it seems correct -- i.e.: |
@fabioz I have just tried it. First time it went well, but second print destroyed index line 2. there is no line 2 index at all. And than if I repeat print I am getting mixed results. Please notice different vertical space betwen For example, look at last dataframe line with index 0. It starts with "fmnPNf4WEmTGCy" which is ending of col_28. Here is csv of the dataframe's first line so you can find it (bold): 0,5jqlC8Kb97v77FLgpagGmIdAyrJjIzzayYHJU54Tu3XU3GyTWI,Wkppuo4RuuTpWkyeQrEyyofrEGVBfa2sc6yzOSLkf4LuttHqW5,GjkPkMenosi8zN7j28sJT33YbHRT8PWFDb8qrTmgxVE4IhUImQ,10KURM36dyFzKBn7YXVoKwlKogbh255UUKbv1wyGk0mEpTQyDs,u2qlIPWq3N2xWxb5BvP3uKWtqg2EVGSGd8XPCwyN6qSved3L4R,VqJiiTIlrnoqsh8s1GvsMQOmSImkHoU7QBU75mcSsEN5W8lhj7,L85qwggHLcsRpxqU6iMgRsuI1KyNdnlfqdng2yy9WgXONOy2jD,7cbQQlzgkzI22Gi8e4IQabmn4CA7SsHTfMIEkQgp7XCsK71x7i,5XnpCVmlgBIh0B8w4HlvTEpYQt7UhCR0P0xgEAIBYSP5oWtZEx,wFvweheFLPpxDSVRucNPSnEj8pR3yLr5dS5pRcosHrvqRNUafG,pK7YyqEAX3ccLWqWPyNK6cVKRDLnpiSmDMNxV0Mbw95Sspn8Lj,nje3RxD4nZ4Q2s1RmIwpBfH7bXXXkPOvT8OOHIAY4brOBD7FEs,gBQ5DT3P71hwYHVM3z59nUeNaph5znK4SgtTSJNwaHR8Wyv214,dd3Ldhv8R9qUt0JwHbtVy8Oytz43A3eQo1KlzWnugQSyS3Y67X,cYB2pPgFkl68Z5YKekfnvZ6rdVSsWyp2J1oMAOUbvLwjRDRbsr,XRD91yYpPAUmXRW89dpWtGPdiIcijjoOAVTymDp8tCctoLP0K1,hiLS1q2yoirwMOQQxcBb97ap6jHKxir0hXNklsOgxn5smB8kzj,AcTqLjt41BQNCDogJxYRGoB2EWqUwyFvPKBfpeBQHrigmPrwTA,mKdulC6kp6yB5S5FvsiCfDQugs8kOe6Slbg86Te40pC5c1RZ5B,IK56HsHiGel2B6cVOu2Uolcg1bcxzMOU0ZGRrAX0Au38yKe1N8,QJkpaGVOnu6lwWOtcRyik8A5OBW9E1YcGH5RgfFuRO1yz6N4ua,67LsvSspUPT2TSTtQxjrc2XzU5uZ4S0yflUUIlZRpbbkbok5jT,nouQE70AENiP6PXA5YkKSidkhPulWowgCjknjFJ2N2aWqpbfnw,C3zUh6axDfzjYGnSHB8pOy9XHigwu4HM6VsbeACIhopR0tiEqB,mWVsz5BdJCZ0SXD4DUZLDIavROAg9I1MKVxcyvPLr9lx7R0uaM,D7w7G9OoeJZ8UVTTLnjz9qZDvnmEdd7Ui9ZBsAfMTnT5hhNHPJ,KZhuxleg5Stdb1oL40FIWOCpHOLveXzahNp8PbvYuYODkrORYp,dG5CYKMr5rxZQEVG5bsENbyi9FgWrKWV82xyF4hBEQubBgOHqQ,h4YhPoJbdbaKAEN2Bmf3kC9EhaDXLVGLmAIOfmnPNf4WEmTGCy,FfKPSCMmbcGmIDiDXDxeIucefgMhI4mmGOEZ7EwFNY01FogkwM,wHPh9EnXJCUTwQXpRHSxOhrro5TuNR06LnclCjAlPCfZbJh8L4,QJ0y
4lNMdlHRS0dTgt9Yee9C2MFYUKwpwsIeNlYEuaclVV3M79,YffsyWmandk8E1A1jrrXZfm6rdhuSyI6LqcfMOipefLYAB19kr,TxlkKMEVYRPM9K286SHgTr0jLqp42W9WLDI9SdHidSFnnfWMaL,9tYVaeMRp1Q0ARWlFmZilpZJsSc1WCMCkleQCsr6w7tyi1ghJc,Ys2EkSh242RZbFe30tuE1IOhVv0LTGePXEam9kOO80kJtYUuFr,7nOeZsVrV7bUVXRpeILnuMOcv2L6RPf8UHteDuxWVDHzRyMEYd,UI55qkQCu3sOOeDgSNzThgN4pCLjcBAek6tned7KkvAKyNzWWl,vwNqIfon4MztGtQ76pEiqbxDhs5YmxaOsimEVky1CVn6q1WXjm,3jeKYOunVVjPbGwQrQHnC3LtCjcgEt33hsg6zs8Skwm5LKoFO5,1UPoLAdbVmpI3J8CRLCtIbTYFMDguU7O2WRBLv7HQCJa3zvMvt,FYjwrlOVKCs19bQ4mM9XiJj0geNoMgMp4iSTpUMovCKQIct8c1,RycVNrr4zKlQLYnocNl4Mq3zqo2EbhDQgczo0cSvs7XJbNn4A5,aoEmCjSvcSRQCE40PEnIBopY9hDgK4blkG3kkuvlZbpi9IeU5P,z2jsgsiihZVMxCyIxwEIcR2LnTsqaVW0ESLyimG4lzHKIYAEvN,NX0dka4EdEvN1cRgtnN2F1lGI5W8UbPYh8w43PCAArUVPdXqkI,sIkOJ4j54SIFHDmGHWEqSXIkDmz1mf57wYUb8wuw6X7VLMZlZQ,qT3rJi5jhh8hpZ8QTLssydrjSuJZxLqNr9QAAJlO1mQyCxvD5k,09YYGQBaVJ6W585d1EFNebwwdg1sWuC2QDnnfgdgEeBRUXnmZ7,DkQYOnyqt8wdlu4mtIX5h56BZMxpl5XhLK1wrsKUUBj9HfbzGs When I add last dataframe line that is swallowed from time to time, you see how it is impossible to work with pandas dataframes in debug console. |
I see, unfortunately this seems to be a bug in VSCode itself and thus is not fixable in Please add this information to the existing issue you have reported at the VSCode repository so that the VSCode team can take a look at it and fix it accordingly (you may want to mention that you're using the debug console with word wrap disabled as that could be playing a part in it). |
Can you please help me with it? I have several issues opened and I am not sure what to put where. |
My setup is the following: Another option is to use launch.json like this:
First configuration is the default one, second is for debugging. |
You can do that if that's the same terminal that'll make the run afterwards (i.e.: the environment variables need to be inherited in the run from that command -- the important part is that the environment variable needs to be available in |
Yes, that's what I've meant. |
@fabioz does this help?
|
No, logs are needed on the python side (the logs from vscode don't really help). |
I fixed the part that requires users to set in the environment:
When evaluating in the repl to get the original pandas value (so, now when evaluating in the repl the debugger does no additional customization, it just calls Other issues reported here aren't fixable by |
Thanks @fabioz I am waiting for new release to check it. |
Props for your patience, persistence, and many responses throughout this thread @fabioz! |
Type: Bug
Please look at the picture:
If I am debugging a pytest test, when I reach the breakpoint, the debug console behaves as I have described in the picture above.
I have a few questions:
Extension version: 2022.14.0
VS Code version: Code 1.71.2 (74b1f979648cc44d385a2286793c226e611f59e7, 2022-09-14T21:03:37.738Z)
OS version: Windows_NT x64 10.0.19042
Modes:
Sandboxed: No
Remote OS version: Linux x64 3.10.0-1127.el7.x86_64
In case you need it: