-
Notifications
You must be signed in to change notification settings - Fork 297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Verify remote Jupyter Sever connections upon connecting to remote kernels #9941
Conversation
Codecov Report
@@ Coverage Diff @@
## main #9941 +/- ##
====================================
Coverage 63% 63%
====================================
Files 204 207 +3
Lines 9843 9890 +47
Branches 1567 1571 +4
====================================
+ Hits 6253 6299 +46
+ Misses 3086 3083 -3
- Partials 504 508 +4
|
@@ -210,9 +205,6 @@ export class JupyterExecutionBase implements IJupyterExecution { | |||
cancelToken | |||
); | |||
} else { | |||
// Prepare our map of server URIs | |||
await this.jupyterConnection.updateServerUri(options.uri); | |||
|
|||
// If we have a URI spec up a connection info for it | |||
return this.jupyterConnection.createConnectionInfo(options.uri); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cerateConnectionInfo
will ensure the server Uri has been updated, calling code need to know about such internals
kernelConnection.kind === 'startUsingRemoteKernelSpec') | ||
) { | ||
try { | ||
await Promise.all([this.sessionManager.getRunningKernels(), this.sessionManager.getKernelSpecs()]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure we try to fetch the kernels (validate the remote jupyter server connection) before creating a session.
This should also fix #8043
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As well as the current issue #9167
@@ -113,6 +112,7 @@ export class RemoteKernelFinder implements IRemoteKernelFinder { | |||
return items; | |||
} catch (ex) { | |||
traceError(`Error fetching remote kernels:`, ex); | |||
throw ex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously we'd swallow exceptions and return an empty list, now we throw the error and higher up when we have errors we just fallback to the cache (this behaviour is documented in the calling code)
} | ||
// Double check this server can be connected to. Might need a password, might need a allowUnauthorized | ||
try { | ||
await this.jupyterConnection.validateRemoteUri(inputText); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now validate the Uri when its entered, previously its not validated, instead we validate it when we save it into the uri storage and error message is displayed on the bottom right (very disconnected ux)
// at this point no need to display all of the kernel specs, | ||
// Its possible the connection is dead, just display the live kernels we had. | ||
// I.e. if user had a notebook connected to a remote kernel, then just display that live kernel. | ||
kernels = await this.getFromCache(kind, cancelToken); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what makes #9167 possible, display the old dead kernels and notify the user and let the user remove it from the list of kernels.
case 'startUsingRemoteKernelSpec': | ||
case 'connectToLiveRemoteKernel': | ||
// If this is a a remote kernel, it's valid if the URI is still active | ||
const uri = await this.serverUriStorage.getRemoteUri(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not entirely sure why this was done, i guess its a temporary solution for #9167, but this breaks the code for AML (as AML uris don't work like this) also its very slow when loading the cache (making the cache completely useless).
This also slows down loading of local kernel specs as we load local & remote kernels from cache together, hence nohting appears until we validate the remote kernels, which feels like a poor design right now, but we can look into that in debt week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this the cache can be completely invalid. I suppose you're handling this by always returning the cache and letting the user pick the invalid items, but then throwing an error then? Basically move this validation to later in the process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thats correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And not sure why the original code would have been slow. It's not actually going to the remote host, it's just querying the encrypted storage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enable azure AML compute and the cache ends up setting up the azure compute connection, thats very slow and now none of the locally cached kernels load either for a long time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also with azure AML compute this logic was incorrect, as the uri is not the remote jupyter uri, its something else.
@@ -38,7 +38,15 @@ export class PreferredRemoteKernelIdProvider { | |||
} | |||
} | |||
|
|||
public async storePreferredRemoteKernelId(uri: Uri, id: string | undefined): Promise<void> { | |||
public async clearPreferredRemoteKernelId(uri: Uri): Promise<void> { | |||
await this.storePreferredRemoteKernelIdInternal(uri); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug fix, added tests:
- Assume you open a notebook and select a remote kernel
- Now run a cell
- Open the same notebook tomorrow, it will connect to the same remote kernel (good)
- Shutdown the remote server
- Open the notebook, you'll get an error about the server not being available, good
- Now switch the kernel to a local kernel
- Close vsc & open the same notebook again & it will try to connect to the same remote server.
We weren't clearing the mapping.
list.shift(); | ||
} | ||
traceInfo(`Storing Preferred remote kernel for ${getDisplayPath(uri)} is ${id}`); | ||
await this.globalMemento.update(ActiveKernelIdList, list); | ||
if (requiresUpdate) { | ||
await this.globalMemento.update(ActiveKernelIdList, list); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were always updating even when nothing had to be updated.
DataScience.selectDifferentKernel() | ||
); | ||
const serverId = err.serverId; | ||
switch (selection) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lets users decide what needs to be done when the remote server is no longer valid.
Fixes #9167
id: '2', | ||
serverId: computeUriHash(connInfo.url) | ||
}; | ||
const invalidKernel: RemoteKernelSpecConnectionMetadata = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't a valid test, we cannot end up with 2 kernels from two different remote jupyter uris in the same cache entry. Today we only ever support one at a time.
Also removed verification of remote jupyter uris, as we want failures so user can remove them.
7751ce9
to
79ac6f1
Compare
Fixes #9167