Notebook source control / diff #94810
Some useful links:
While building the first prototype of inline diff decorators (in the gutter) and a side-by-side diff view, I didn't run into API limitations, but I did hit quite a few UI/UX challenges, and the work adds several new requirements for our notebook infrastructure (which is already pretty powerful and didn't require many changes to build the prototype).
Diff Decorator
Side by Side rich diff
Thoughts:
I believe these two would be crucial in addressing the pain points users have when dealing with Jupyter.
Question:
@rebornix Let's discuss offline how
cc @roblourens, as we have quite a few metadata properties that are used for UI state (runnable, collapse, editable).
One main API gap raised in the offline discussions: it is unclear whether we should show inline SCM decorators (in the gutter) when only metadata or outputs change. Currently, when metadata or outputs change, the core doesn't mark the document dirty; instead, the content provider decides whether to mark it dirty (by firing a content change event), since only the content provider knows whether a specific metadata field or output should be saved to disk. For example, the GitHub Issue Notebook extension implements a command to lock a cell (to disallow editing of its content), and the command does:

```typescript
cell.editable = false;
this._onDidChangeNotebook.fire({ document });
```

However, there is a bug with this approach: any extension can modify a cell's metadata, but it cannot fire the content change event. Imagine that another extension modifies …

One proposal from @jrieken is embedding persistence information in the metadata itself:

```typescript
interface Cell {
  metadata: {
    editable: { value: boolean, persistent: boolean }
  };
}
```

The content provider tells the core the value of each metadata field and whether it is persistent or transient. The core is then responsible for marking the document dirty and computing the diff (ignoring transient metadata). Whether a metadata change leads to a content change on disk is now a contract between the content provider and the core. We can apply this to cell outputs as well:

```typescript
type NotebookOptionalData<T> = { value: T, persistent: boolean }

interface Cell {
  outputs: Readonly<NotebookOptionalData<Output[]>>;
  metadata: Readonly<{
    editable: Readonly<NotebookOptionalData<boolean>>;
    executionCount: Readonly<NotebookOptionalData<number>>;
  }>;
}
```

Since modifying metadata or outputs can have side effects (making the document dirty), extensions should not modify them directly. All updates should go through a set of cell editing APIs provided by the core:

```typescript
// pseudo code
updateMetadata(cell, { editable: false, executionCount: 2 });
updateOutputs(cell, []);
```

The core will update the model based on the current state of the document (for example, rejecting the edit if the version is wrong) and on the contract with the content provider (whether the change triggers a content change).
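To make the proposed contract concrete, here is a minimal core-side sketch, assuming the `NotebookOptionalData<T>` shape above. `NotebookModel`, `MetadataPatch`, and the `expectedVersion` parameter are hypothetical names for illustration, not VS Code API. The core applies a metadata patch, rejects it if the caller saw a stale version, and marks the document dirty only when a persistent field actually changes. Outputs would follow the same pattern.

```typescript
// Hypothetical sketch: the core owns cell state and decides whether an edit dirties the document.
type NotebookOptionalData<T> = { value: T; persistent: boolean };

interface CellMetadata {
  editable: NotebookOptionalData<boolean>;
  executionCount: NotebookOptionalData<number>;
}

interface Cell {
  metadata: CellMetadata;
}

// Extensions only pass plain values; persistence flags stay under the core's control.
type MetadataPatch = Partial<{ editable: boolean; executionCount: number }>;

class NotebookModel {
  version = 0;
  dirty = false;

  // Hypothetical edit API: returns false when the edit is rejected.
  updateMetadata(cell: Cell, patch: MetadataPatch, expectedVersion: number): boolean {
    if (expectedVersion !== this.version) {
      return false; // the caller edited against a stale document version
    }
    let persistentChange = false;
    for (const key of Object.keys(patch) as (keyof MetadataPatch)[]) {
      const next = patch[key];
      const entry = cell.metadata[key] as NotebookOptionalData<unknown>;
      if (next === undefined || entry.value === next) continue; // nothing to change
      entry.value = next;
      if (entry.persistent) persistentChange = true; // only persistent data affects the file
    }
    this.version++;
    if (persistentChange) this.dirty = true;
    return true;
  }
}
```

With this split, bumping a transient `executionCount` leaves the model clean, while flipping a persistent `editable` flag marks it dirty, regardless of which extension made the edit.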
Several proposals came out of the morning discussion. We haven't finalized anything yet, but listing all of them helps us better understand the problem.

Proposal 1

```typescript
interface ICellMetadata {
  editable?: boolean;
}

interface INotebookData {
  cells: CellData[];
}

interface CellData {
  persistent: ICellMetadata;
  transient: ICellMetadata;
}
```

Extensions still use the `NotebookCell` interface, with read-only metadata:

```typescript
interface NotebookCell {
  metadata: Readonly<ICellMetadata>
}
```

Proposal 2

The other proposal is to make outputs and all metadata persistent. First, it's inconsistent and confusing that some content providers save outputs but others don't; we might want to leave that decision to users instead of the content provider:

And since all metadata will be persistent, we will need to move execution- and view-related information out:
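Going back to proposal 1, a minimal sketch of how the split could behave, assuming the `CellData` shape above. `viewMetadata`, `serializeCell`, and `changesFileContent` are hypothetical helpers, the extra `runnable` field is borrowed from the UI states mentioned earlier, and letting transient values win on key conflicts is an assumption. Extensions see one merged read-only view, while only the persistent half reaches disk, so a transient-only change never alters the file content.

```typescript
// Hypothetical sketch of proposal 1: metadata split into persistent and transient halves.
interface ICellMetadata {
  editable?: boolean;
  runnable?: boolean;
}

interface CellData {
  persistent: ICellMetadata;
  transient: ICellMetadata;
}

// What extensions would see: one merged, read-only view.
// Assumption: a transient value overrides a persistent one with the same key.
function viewMetadata(cell: CellData): Readonly<ICellMetadata> {
  return Object.freeze({ ...cell.persistent, ...cell.transient });
}

// What the content provider writes to disk: only the persistent half.
function serializeCell(cell: CellData): string {
  return JSON.stringify(cell.persistent);
}

// A change dirties the document only if the serialized form differs.
// Note: JSON.stringify is key-order sensitive, which is acceptable for a sketch.
function changesFileContent(before: CellData, after: CellData): boolean {
  return serializeCell(before) !== serializeCell(after);
}
```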
Wrong assumption though 😢. Whether outputs should be saved to disk can be a user preference; however, whether outputs can be saved to disk is decided by the content provider. One example is the PowerShell notebook (which allows opening …
My current thinking: for outputs, the content provider tells us whether they can be saved to disk; for metadata, we try to keep it minimal and entirely persistent, and execution- and view-related info is moved out of the metadata. With this approach, we can introduce a content provider registration option:

```typescript
export function registerNotebookContentProvider(
  notebookType: string,
  provider: NotebookContentProvider,
  options?: {
    persistOutputs: boolean;
  }
): Disposable;
```

The content provider tells VS Code at registration time whether it can/wants to store outputs, and `NotebookCell.outputs` stays a plain read-only array:

```typescript
export interface NotebookCell {
  outputs: ReadonlyArray<CellOutput>;
}
```
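Under this approach, the core could consult the registration option whenever outputs change. A minimal sketch, assuming outputs are treated as non-persistent when `options` is omitted; `registerProviderOptions`, `outputChangeMakesDirty`, and the notebook type strings are hypothetical, and only `persistOutputs` comes from the proposal above.

```typescript
// Hypothetical core-side bookkeeping for the persistOutputs registration option.
interface ContentProviderOptions {
  persistOutputs: boolean;
}

const registeredOptions = new Map<string, ContentProviderOptions>();

function registerProviderOptions(notebookType: string, options?: ContentProviderOptions): void {
  // Assumption: outputs are transient unless the provider explicitly opts in.
  registeredOptions.set(notebookType, options ?? { persistOutputs: false });
}

// Decide whether an output-only change should mark the document dirty
// (and show up as a change in the diff).
function outputChangeMakesDirty(notebookType: string): boolean {
  return registeredOptions.get(notebookType)?.persistOutputs ?? false;
}
```

Keeping the decision per provider, rather than per output, means extensions never have to know whether a given output will be written to disk.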
We now support diffing notebooks with rendered outputs.
This is a summary of the discussions @kieferrm and I had about how to support diff in VS Code: what the UX looks like and what gaps in the API we need to close to get it working properly.
UX (Rich/Plaintext diff)
Notebooks are represented in a rich UI (markdown previews, code editors, and outputs rendered in various forms), but notebook documents are usually stored on the file system in text form, where they can be tracked by source control systems. Today in VS Code you can already diff the text form, but that's limited: for example, if an image or chart output changes, a text diff can't show what changed.
Thus we want to do rich diffs by rendering notebooks in the Notebook Editor side by side and aligning cell positions (similar to how we align lines in the text diff editor). The catch with this approach is that it doesn't present all the data in the document. For example, Jupyter Notebook stores custom metadata (kernel info, document schema version, etc.) that is never presented in the VS Code UI. To give users a full picture of what's changing behind the scenes, we may want to keep supporting text-based diffing and let users easily switch between the two.
FS & Source Control
Currently, the two responsibilities of a `vscode.NotebookContentProvider` are (1) reading the raw content of a resource (`vscode.Uri`) and converting it to structured data (`vscode.NotebookDocument`), and (2) converting a `vscode.NotebookDocument` to text and saving that text form onto the file system.

Since resources are identified by `vscode.Uri`s, which are always backed by a file system provider, notebook content providers should use `vscode.workspace.fs` to resolve their raw content instead of Node's `fs`.

The source control API in VS Code works seamlessly with `vscode.Uri`. For example, if you have a pending file change in a git repo, the git extension can provide two resource `Uri`s for the file: a `file:///` URI for the current content and a `git:///` URI for the content prior to the change. We can then ask the notebook content provider to resolve the content for both `Uri`s.

SCM API
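The two-`Uri` flow described above can be sketched as follows. This is a minimal sketch, not VS Code code: `ReadonlyFileSystem` mirrors the shape of `vscode.workspace.fs.readFile` (which returns the raw bytes of a resource) but uses plain strings for URIs, `InMemoryFs` stands in for the `file` and `git` file system providers, and `parseNotebook` assumes a simplified JSON format rather than the real `.ipynb` schema.

```typescript
// Hypothetical sketch: resolve both sides of an SCM change through one content provider.
interface ReadonlyFileSystem {
  readFile(uri: string): Promise<Uint8Array>;
}

interface NotebookCellData {
  source: string;
}

// Stand-in for the content provider: it only knows how to turn bytes into cells,
// so it works the same regardless of which scheme the bytes came from.
function parseNotebook(bytes: Uint8Array): NotebookCellData[] {
  const json = JSON.parse(new TextDecoder().decode(bytes)) as { cells: { source: string }[] };
  return json.cells.map((c) => ({ source: c.source }));
}

async function resolveDiffSides(
  fs: ReadonlyFileSystem,
  modifiedUri: string, // e.g. the file:/// URI for the working copy
  originalUri: string  // e.g. the git:/// URI for the content before the change
): Promise<{ original: NotebookCellData[]; modified: NotebookCellData[] }> {
  const [modified, original] = await Promise.all([fs.readFile(modifiedUri), fs.readFile(originalUri)]);
  return { original: parseNotebook(original), modified: parseNotebook(modified) };
}

// In-memory file system standing in for the file:// and git:// providers.
class InMemoryFs implements ReadonlyFileSystem {
  constructor(private readonly files: Map<string, string>) {}

  async readFile(uri: string): Promise<Uint8Array> {
    const text = this.files.get(uri);
    if (text === undefined) throw new Error(`File not found: ${uri}`);
    return new TextEncoder().encode(text);
  }
}
```

Because the provider never touches Node's `fs` directly, the same parsing path serves both the working copy and the historical revision.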
Notebook Content Provider
Dirty changes in workspace
`Uri`s work great for source control when content changes are already saved to the file system. However, if users have a dirty notebook document in the workspace (say, auto save is turned off), we can't differentiate the content on disk from the content in the workspace, as they share the same `Uri`, and VS Code core doesn't know how to turn the dirty `vscode.NotebookDocument` into text.

Since the `vscode.NotebookContentProvider` is the only one that knows how to convert a `vscode.NotebookDocument` to text, we will delegate this to the content provider:

Diff
We have a complex two-way diff algorithm for text files, which can probably be reused for notebook documents too. The catch is how to compare `NotebookCell`s efficiently. If notebook providers could provide a unique id for each cell, that would be great; if not, we have to do a deep comparison of `NotebookCell` content.

The comparison algorithm for `NotebookCell`s might differ between notebook providers. For example, the GitHub Notebook wants to exclude outputs, but Jupyter Notebook may include them. We're not sure yet whether this can be described declaratively through metadata or whether we need to introduce new APIs.
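To illustrate the cell-matching question, here is a minimal alignment sketch. It swaps in a plain longest-common-subsequence table rather than VS Code's actual text diff algorithm, and `DiffCell`, `makeEquals`, and `alignCells` are hypothetical names. The equality check prefers a provider-supplied id when both cells have one and otherwise falls back to deep content comparison, optionally ignoring outputs (as the GitHub Notebook would want).

```typescript
// Hypothetical sketch: align two cell lists with a plain LCS and a pluggable equality check.
interface DiffCell {
  id?: string; // stable id, if the provider supplies one
  source: string;
  outputs: string[];
}

type CellEquals = (a: DiffCell, b: DiffCell) => boolean;

type DiffOp =
  | { kind: "matched"; original: number; modified: number }
  | { kind: "deleted"; original: number }
  | { kind: "inserted"; modified: number };

// Prefer ids; otherwise compare content, optionally including outputs.
function makeEquals(includeOutputs: boolean): CellEquals {
  return (a, b) => {
    if (a.id !== undefined && b.id !== undefined) return a.id === b.id;
    if (a.source !== b.source) return false;
    return !includeOutputs || JSON.stringify(a.outputs) === JSON.stringify(b.outputs);
  };
}

function alignCells(original: DiffCell[], modified: DiffCell[], equals: CellEquals): DiffOp[] {
  const n = original.length;
  const m = modified.length;
  // lcs[i][j] = length of the LCS of original[i..] and modified[j..]
  const lcs: number[][] = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = equals(original[i], modified[j])
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table to emit matched pairs, deletions, and insertions in order.
  const ops: DiffOp[] = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (equals(original[i], modified[j])) {
      ops.push({ kind: "matched", original: i, modified: j });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ kind: "deleted", original: i++ });
    } else {
      ops.push({ kind: "inserted", modified: j++ });
    }
  }
  while (i < n) ops.push({ kind: "deleted", original: i++ });
  while (j < m) ops.push({ kind: "inserted", modified: j++ });
  return ops;
}
```

Matched pairs would then be rendered side by side and diffed individually, while unmatched cells show up as insertions or deletions. Swapping the `equals` function is one possible way to let each provider decide whether outputs count.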