Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify remote Jupyter Sever connections upon connecting to remote kernels #9941

Merged
merged 15 commits into from
May 9, 2022

Conversation

DonJayamanne
Copy link
Contributor

Fixes #9167

@DonJayamanne DonJayamanne requested a review from a team as a code owner May 8, 2022 23:07
@codecov-commenter
Copy link

codecov-commenter commented May 8, 2022

Codecov Report

Merging #9941 (60858b1) into main (7c90873) will increase coverage by 0%.
The diff coverage is 86%.

@@         Coverage Diff          @@
##           main   #9941   +/-   ##
====================================
  Coverage    63%     63%           
====================================
  Files       204     207    +3     
  Lines      9843    9890   +47     
  Branches   1567    1571    +4     
====================================
+ Hits       6253    6299   +46     
+ Misses     3086    3083    -3     
- Partials    504     508    +4     
Impacted Files Coverage Δ
src/platform/errors/types.ts 77% <ø> (ø)
...atform/errors/localJupyterServerConnectionError.ts 60% <60%> (ø)
...rc/platform/errors/jupyterSelfCertsExpiredError.ts 66% <66%> (ø)
src/platform/errors/errorHandler.ts 68% <85%> (+1%) ⬆️
src/platform/api/kernelApi.ts 72% <100%> (ø)
src/platform/common/utils/localize.ts 98% <100%> (+<1%) ⬆️
src/platform/errors/jupyterSelfCertsError.ts 100% <100%> (ø)
...tform/errors/remoteJupyterServerConnectionError.ts 100% <100%> (ø)
...rc/platform/common/dataScienceSurveyBanner.node.ts 70% <0%> (+5%) ⬆️

@@ -210,9 +205,6 @@ export class JupyterExecutionBase implements IJupyterExecution {
cancelToken
);
} else {
// Prepare our map of server URIs
await this.jupyterConnection.updateServerUri(options.uri);

// If we have a URI spec up a connection info for it
return this.jupyterConnection.createConnectionInfo(options.uri);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cerateConnectionInfo will ensure the server Uri has been updated, calling code need to know about such internals

kernelConnection.kind === 'startUsingRemoteKernelSpec')
) {
try {
await Promise.all([this.sessionManager.getRunningKernels(), this.sessionManager.getKernelSpecs()]);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure we try to fetch the kernels (validate the remote jupyter server connection) before creating a session.
This should also fix #8043

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As well as the current issue #9167

@@ -113,6 +112,7 @@ export class RemoteKernelFinder implements IRemoteKernelFinder {
return items;
} catch (ex) {
traceError(`Error fetching remote kernels:`, ex);
throw ex;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we'd swallow exceptions and return an empty list, now we throw the error and higher up when we have errors we just fallback to the cache (this behaviour is documented in the calling code)

}
// Double check this server can be connected to. Might need a password, might need a allowUnauthorized
try {
await this.jupyterConnection.validateRemoteUri(inputText);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now validate the Uri when its entered, previously its not validated, instead we validate it when we save it into the uri storage and error message is displayed on the bottom right (very disconnected ux)

// at this point no need to display all of the kernel specs,
// Its possible the connection is dead, just display the live kernels we had.
// I.e. if user had a notebook connected to a remote kernel, then just display that live kernel.
kernels = await this.getFromCache(kind, cancelToken);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what makes #9167 possible, display the old dead kernels and notify the user and let the user remove it from the list of kernels.

case 'startUsingRemoteKernelSpec':
case 'connectToLiveRemoteKernel':
// If this is a a remote kernel, it's valid if the URI is still active
const uri = await this.serverUriStorage.getRemoteUri();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not entirely sure why this was done, i guess its a temporary solution for #9167, but this breaks the code for AML (as AML uris don't work like this) also its very slow when loading the cache (making the cache completely useless).

This also slows down loading of local kernel specs as we load local & remote kernels from cache together, hence nohting appears until we validate the remote kernels, which feels like a poor design right now, but we can look into that in debt week.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this the cache can be completely invalid. I suppose you're handling this by always returning the cache and letting the user pick the invalid items, but then throwing an error then? Basically move this validation to later in the process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thats correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And not sure why the original code would have been slow. It's not actually going to the remote host, it's just querying the encrypted storage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enable azure AML compute and the cache ends up setting up the azure compute connection, thats very slow and now none of the locally cached kernels load either for a long time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also with azure AML compute this logic was incorrect, as the uri is not the remote jupyter uri, its something else.

@@ -38,7 +38,15 @@ export class PreferredRemoteKernelIdProvider {
}
}

public async storePreferredRemoteKernelId(uri: Uri, id: string | undefined): Promise<void> {
public async clearPreferredRemoteKernelId(uri: Uri): Promise<void> {
await this.storePreferredRemoteKernelIdInternal(uri);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug fix, added tests:

  • Assume you open a notebook and select a remote kernel
  • Now run a cell
  • Open the same notebook tomorrow, it will connect to the same remote kernel (good)
  • Shutdown the remote server
  • Open the notebook, you'll get an error about the server not being available, good
  • Now switch the kernel to a local kernel
  • Close vsc & open the same notebook again & it will try to connect to the same remote server.

We weren't clearing the mapping.

list.shift();
}
traceInfo(`Storing Preferred remote kernel for ${getDisplayPath(uri)} is ${id}`);
await this.globalMemento.update(ActiveKernelIdList, list);
if (requiresUpdate) {
await this.globalMemento.update(ActiveKernelIdList, list);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were always updating even when nothing had to be updated.

DataScience.selectDifferentKernel()
);
const serverId = err.serverId;
switch (selection) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lets users decide what needs to be done when the remote server is no longer valid.
Fixes #9167

id: '2',
serverId: computeUriHash(connInfo.url)
};
const invalidKernel: RemoteKernelSpecConnectionMetadata = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a valid test, we cannot end up with 2 kernels from two different remote jupyter uris in the same cache entry. Today we only ever support one at a time.

Also removed verification of remote jupyter uris, as we want failures so user can remove them.

@DonJayamanne DonJayamanne force-pushed the handleRemoteConnectionError branch from 7751ce9 to 79ac6f1 Compare May 9, 2022 17:21
@DonJayamanne DonJayamanne merged commit bd421a3 into main May 9, 2022
@DonJayamanne DonJayamanne deleted the handleRemoteConnectionError branch May 9, 2022 18:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adjust error notification behavior when unable to connect to the remote jupyter server
4 participants