Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix variable fetching on remote machines #4064

Merged
merged 11 commits into from
Dec 2, 2020

Conversation

IanMatthewHuff
Copy link
Member

For #4006

  • Pull request represents a single change (i.e. not fixing disparate/unrelated things in a single PR).
  • Title summarizes what is changing.
  • Has a news entry file (remember to thank yourself!).
  • Appropriate comments and documentation strings in the code.
  • Has sufficient logging.
  • Has telemetry for enhancements.
  • Unit tests & system/integration tests are added/updated.
  • Test plan is updated as appropriate.
  • package-lock.json has been regenerated by running npm install (if dependencies have changed).

@@ -0,0 +1,96 @@
import pandas as _VSCODE_pd
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two .py files have not really changed in contents. The helper and two data frame related files have just been squished into one single file here that doesn't need to import anything other than python libraries.

@@ -1,10 +1,6 @@
# Query Jupyter server for the info about a dataframe
import json as _VSCODE_json
import pandas as _VSCODE_pd
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imports that were removed here were all unused in variable fetching code.

export const VariableInfoImportName = '_VSCODE_VariableImport';
export const VariableInfoImport = `import vscodeGetVariableInfo as ${VariableInfoImportName}`;
export const VariableInfoFunc = `${VariableInfoImportName}._VSCODE_getVariableInfo`;
export const ScriptPath = path.join(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate GetVariableInfo from DataFrameLoading since the two are independent from each other. We will now directly input the code into the kernel instead of adding to sys.path and importing.

@IanMatthewHuff
Copy link
Member Author

Thinking a bit on testing here. We already have tests that cover local testing. However this issue only shows up in remote cases. And in particular you can't test it with "fake" remote by starting a server on your own machine, as the sys paths will be there in that case.

Can we run Docker in tests? That's how I was reproing this locally without needing a second machine.

@IanMatthewHuff IanMatthewHuff marked this pull request as ready for review December 1, 2020 18:49
@IanMatthewHuff IanMatthewHuff requested a review from a team as a code owner December 1, 2020 18:49
@rchiodo
Copy link
Contributor

rchiodo commented Dec 1, 2020

Thinking a bit on testing here. We already have tests that cover local testing. However this issue only shows up in remote cases. And in particular you can't test it with "fake" remote by starting a server on your own machine, as the sys paths will be there in that case.

Can we run Docker in tests? That's how I was reproing this locally without needing a second machine.

I believe so, yes.

See the answers to this issue:
https://github.saobby.my.eu.orgmunity/t/running-automated-tests-inside-the-docker-container-after-building-it/18047

await this.evaluate(DataFrameLoading.DataFrameRowImport);
await this.evaluate(DataFrameLoading.VariableInfoImport);
this.importedIntoKernel.add(key);
if (key && !this.importedDataFrameScriptsIntoKernel.has(key)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to make sure you tested this with interactive debugging and run by line?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rchiodo I tested it with interactive cell debugging, but not RBL yet. I'll check that out quick.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, looks good here (I'd already run the cell once, so the vars were already there):
image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And dataframes open while debugging too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rchiodo Bleh, I'm glad you asked this. I think it was working in the above example as the variables had already been seen before in a non-debug context.

I don't think this is all working right for the debugger right now. I think that evaluate just takes an expression, so it's only getting one function if the script file contents are passed in.

I think that I can still use this approach for kernelVariables (as those work remote) then convert debugger back to using an import statement. Debugger is only local, so that should be acceptable. The import is a single expression which is probably why it works with evaluate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rchiodo Not sure if you want to take another look, but I reverted debugger back to using imports. Pretty close to the old code only we just have one file that we import now instead of two for the data frame functions. Checked the following scenarios:
Local IW Vars + DataFrame opening
Local Notebook Vars + DataFrame opening
Local IW Cell Debugging Vars + DataFrame opening
Local Notebook Run By Line Vars + DataFrame opening
Remote IW and Notebook Vars + DataFrame opening

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you see Joyce's Team's comment about the view in variable menu? This might not work in remote situations for that too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rchiodo Did you mean on GitHub, not Teams? I commented on a thread on GitHub.

If I was reading that right it's more an issue that the sys path might not work for the situation of opening a data frame viewer from a remote debugging session. Since the debuggerVariables path hasn't really changed in this PR I was figuring that I'd still push this PR to fix the non-debugging cases, then pick that up next if it's considered a must-fix.

Was that right? Or were you concerned about this PR regressing something there?

Might have to do something more interesting if that's the case. I think that I could push each individual function over in an evaluate, though that sounds kinda ugly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sorry it was this issue here:
#4065

I agree this isn't necessary now. (especially if this bug is must fix)

columnNames = list(df)

# Compute the index column. It may have been renamed
indexColumn = df.index.name if df.index.name else "index"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a problem with this PR, but this line seems to be causing opening the data viewer from the debug window to fail with the following error on some machines (no root cause yet).

line 27, in _VSCODE_getDataFrameInfo
indexColumn = df.index.name if df.index.name else "index"
AttributeError: 'builtin_function_or_method' object has no attribute 'name'

While we're here:

Suggested change
indexColumn = df.index.name if df.index.name else "index"
try:
indexColumn = df.index.name
except AttributeError:
indexColumn = "index"

@IanMatthewHuff
Copy link
Member Author

Widget test only failures.

@IanMatthewHuff IanMatthewHuff merged commit f3d6632 into main Dec 2, 2020
@IanMatthewHuff IanMatthewHuff deleted the dev/ianhu/fixRemoteVariables branch December 2, 2020 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants