Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Editor model world should offer some amenities to protect against data loss / file corruption #117873

Closed
6 tasks done
jrieken opened this issue Mar 1, 2021 · 17 comments
Closed
6 tasks done
Assignees
Labels
custom-editors Custom editor API (webview based editors) debt Code quality issues important Issue identified as high-priority notebook workbench-editors Managing of editor widgets in workbench window
Milestone

Comments

@jrieken
Copy link
Member

jrieken commented Mar 1, 2021

Normal file editors, custom editors, and notebooks all implement editors for something on a file system. All implement the relatively rigid structure of editor inputs and editor models. However, they get little to nothing in return for implementing this, there is no service-like infrastructure that makes my live simpler, e.g

  • there is logic for sequentialising saves inside TextFileEditorModel and roughly the same code exists (was copied) for notebook editors. Custom editors tackle the same problem, but uses different code. So, instead of everyone re-inventing the wheel the "framework", which interfaces I must implement, should take care of this for me.
  • there is duplicated logic for when a file is being deleted, e.g text file editors have inOrphanMode and custom editors have _inOrphaned. There is a utility method decorateFileEditorLabel to decorate the editor label to indicate orphaned editors but this should move into a common super type. Notebooks didn't copy this yet. Again, I should be able to express to the framework that I am some kind of file editor and then deletion should be taken care of
  • the logic for general file watching is duplicated, e.g when a file on disk changes we reload it unless the file is dirty (text files, notebooks). The framework should do that automatically because I already tell it how to load and whether my model is dirty or not.
  • [ ] there is a TextFileEditorTracker that takes care of opening dirty models and reloading them when the window gains focus but this code is only used for text file models. Notebooks has NotebookFileTracker for that
  • there is a complex dirty write prevention code in file service validateWriteFile . A simple version was implemented in notebook but it has "false negative" bugs as remote/live-share cases are not covered. Custom editor doesn't handle this yet it seems. Related: Dirty write prevention in notebook content provider #117715
  • text file models register listeners to gracefully finish any pending saves on shutdown, which notebooks and custom editors either do not do or currently fail to do with a high risk of data loss
  • properly restore backups for file working copy providers

I don't think this list is complete but that's my learning from the little exposure I have gotten via the notebook world. The current approach just seems wasteful: we duplicate code, we duplicate bugs, we duplicate work.

@jrieken jrieken added custom-editors Custom editor API (webview based editors) debt Code quality issues notebook labels Mar 1, 2021
@bpasero
Copy link
Member

bpasero commented Mar 1, 2021

Yes, 💯 , already commented here f3f4eeb#commitcomment-47623787 because I felt there is duplication.

I was not aware of the duplication in the save sequentializer. I think having more code shared from inputs and models is a good idea. Maybe there could be an abstract working copy where this lives because at least for text file models, the working copy IS the model.

@bpasero bpasero added this to the On Deck milestone Mar 1, 2021
@bpasero bpasero added the workbench-editors Managing of editor widgets in workbench window label Mar 1, 2021
@bpasero
Copy link
Member

bpasero commented Mar 1, 2021

@jrieken fyi I added 2 more entries to your list: the text file editor tracker and the text file save error handler

@jrieken
Copy link
Member Author

jrieken commented Mar 1, 2021

👏 added the NotebookFileTracker

@rebornix
Copy link
Member

rebornix commented Mar 2, 2021

added "dirty write preventer" to the list, revealed this through #117715

mjbvz added a commit that referenced this issue Mar 2, 2021
I previously duplicated this logic. It makes sense to reuse the strings and logic for consistency

From #117873
@jrieken jrieken added the important Issue identified as high-priority label Mar 2, 2021
@jrieken jrieken modified the milestones: On Deck, March 2021 Mar 2, 2021
@bpasero
Copy link
Member

bpasero commented Mar 3, 2021

Added this which I only recently added to text file models:

  • text file models register listeners to gracefully finish any pending saves on shutdown, which notebooks and custom editors either do not do or currently fail to do with a high risk of data loss

@bpasero bpasero changed the title Editor model world should offer some amenities Editor model world should offer some amenities to protect against data loss / file corruption Mar 3, 2021
@bpasero
Copy link
Member

bpasero commented Mar 3, 2021

@jrieken @rebornix since this issue seems to be important due to data loss / file corruption concerns I would like to understand how today it is possible for notebooks to suffer from this and why notebooks have to implement custom logic. Specifically, I do not understand why notebooks ask extensions to implement the save logic or even the backup logic:

saveNotebook(document: NotebookDocument, cancellation: CancellationToken): Thenable<void>;
// eslint-disable-next-line vscode-dts-provider-naming
saveNotebookAs(targetResource: Uri, document: NotebookDocument, cancellation: CancellationToken): Thenable<void>;
// eslint-disable-next-line vscode-dts-provider-naming
backupNotebook(document: NotebookDocument, context: NotebookDocumentBackupContext, cancellation: CancellationToken): Thenable<NotebookDocumentBackup>;

Why can we not apply the same concept we did for custom editors where we distinguish between:

  • text file based custom editors (99% case)
  • non-file or binary file based custom editors (1% case)

I think the best outcome is for an extension to not implement save, backup etc. at all because then we can guarantee that our handling of files is proper including all the tricks for sequentialising saves and dirty write prevention. I think text file based notebooks are (like custom editors) the 99% case and making that work properly without a lot of work on extensions seems right to me.

If notebooks were using text file models for everything like custom editors do, you don't have to duplicate any code. Here is how custom editors is doing that via ITextModelResolverService.createModelReference(resource):

return instantiationService.invokeFunction(async accessor => {
const textModelResolverService = accessor.get(ITextModelService);
const textFileService = accessor.get(ITextFileService);
const model = await textModelResolverService.createModelReference(resource);
return new CustomTextEditorModel(viewType, resource, model, textFileService);
});

@jrieken
Copy link
Member Author

jrieken commented Mar 3, 2021

Specifically, I do not understand why notebooks ask extensions to implement the save logic or even the backup logic:

We cannot save notebooks because we don't know what format they are being saved in. For instance there is the ipynb-format which is the most popular format which is text/json based. However, other notebook types can use other file formats (different text format, maybe binary) and therefore we cannot make assumptions.

@bpasero
Copy link
Member

bpasero commented Mar 3, 2021

@jrieken sure, we have had the same problem with custom editors, but we still ended up with a simple API for the typical case and a more advanced API for the rare cases. Why do we need to have the same discussion again for notebooks?

@jrieken
Copy link
Member Author

jrieken commented Mar 3, 2021

Why do we need to have the same discussion again for notebooks?

A) It's a very different problem. B) We should also care about the non-typical case.

@jrieken @rebornix since this issue seems to be important due to data loss / file corruption concerns I would like to understand how today it is possible for notebooks to suffer from this

A sample, of which I am unsure if it yields in dataloss or not, is the handling of the file stat. In text editor land there exists this

private updateLastResolvedFileStat(newFileStat: IFileStatWithMetadata): void {
// First resolve - just take
if (!this.lastResolvedFileStat) {
this.lastResolvedFileStat = newFileStat;
}
// Subsequent resolve - make sure that we only assign it if the mtime is equal or has advanced.
// This prevents race conditions from loading and saving. If a save comes in late after a revert
// was called, the mtime could be out of sync.
else if (this.lastResolvedFileStat.mtime <= newFileStat.mtime) {
this.lastResolvedFileStat = newFileStat;
}
}

This function is called with fresh stat objects from either write or read. The comments says that due to race conditions you don't trust that stat-object but check its mtime. Notebooks doesn't have this check. Please tell me if we need this check or not. Is the race condition happening because of a bug in the file service? Is it because the text file model itself doesn't cancel when revert happens before load or save is done?

This is just one sample and it scares us. Your code was obviously written for a reason but we don't know why and we have no way to re-use it. Today's only option is copy-paste.

Other samples that we kinda copied but not really are all over the place:

This basically continues with every little special case that textFileEditorModel.ts has (has for a good reason I assume) and where we wrote naive code which works most of the time.

@bpasero
Copy link
Member

bpasero commented Mar 4, 2021

@jrieken before going into details of how the text file code works, how do we expect extension authors to deal with these concepts if they have to implement saveNotebook? I bet every extension would implement save via fs.writeFile(), even if you tell them to use vscode.fs you cannot enforce it. And even if we somehow manage to enforce them to use vscode.fs APIs, we have no API currently to enable the flow of dirty-write prevention for the extension author because that currently works like this:

  • text file models keep track of their last resolved stat (by simply keeping it around when the model is loaded)
  • whenever text file models save, they pass on that etag from the stat
  • after successful save, the stat is remembered again because it changed now
  • there is no concept of etag in FileStat

Even if the workbench provides you with a reusable working copy that deals with all of this, it is useless because the truth happens in the extension code and not the workbench.

If we want to ensure that saves are 100% safe, I think we should:

  • allow extensions to provide text document based notebooks that don't require to implement any save methods (most cases)
  • optionally allow to provide binary file based notebooks as needed [1]

[1] for this case, instead of implementing a method saveNotebook, I would have a method getBytes and we would be in charge of saving to disk instead of the extension. That would give us at least a chance to implement dirty-write protection in some way. We would still need to bring the concepts over to the extension host because we obviously don't want to serialise the byte data over to the client.

@jrieken
Copy link
Member Author

jrieken commented Mar 4, 2021

there is no concept of etag in FileStat

Isn't the etag directly derived from stat information? Like here?

it is useless because the truth happens in the extension code and not the workbench.

So, you are saying that with contributed file systems, like in codespaces, these checks are useless? Like even for normal text files because the truth happens in the extension code? Our point is that extensions are in charge of getting this right and that we want to make it easy for them to get this right (and know when they get it wrong)

If we want to ensure that saves are 100% safe, I think we should:

There is no 100%, also not for text files but there is hard trying (which we want to re-use). As long as stat and write/read are distinct calls (and that's how file systems work) you cannot guarantee anything. You cannot know what happens in-between those two and this gets amplified when using "funny" filesystem (smb) or file systems that are implemented in the extension host. I believe there is common agreement to accept this "risk" and IMO that's why we have robust code to deal with this.

allow extensions to provide text document based notebooks that don't require to implement any save methods (most cases)

I think this is not a good model. A "notebook document" is a domain model object, just like a "text document" is a domain model object. Both share that they are saved as files on disk (in the common case). Now saving that you need to transform domain model A into B in order to save it is just wrong.

Also, this very different form a custom editor, like for XML files. In custom editor land we know nothing about the editor, for notebooks we know everything about the editor but nothing about the data format. For instance, in notebooks we don't want the undo behaviour of text documents, we have clear undo rules defined. A custom editor wants the text document undo behaviour because it must use the text document primitives to make changes. This list goes on.

Please trust us when we say that notebooks are very similar to custom binary editors. In fact all our questions/findings map exactly onto those editors too. We have discussed these different "get raw notebook" data approaches and we ended up with the current approach because a) it is consistent with an existing API that solves the same problem, b) it has the same requirements. For instance, a notebook can save a markdown-variant of itself so that it can be previewed on GH. You often see foo.ipynb with a sibling like foo.ipynb.md and we want to enable these cases.

@bpasero
Copy link
Member

bpasero commented Mar 4, 2021

After 1on1 with Jo, here are some ideas going forward what workbench can provide: assuming that we typically have a model where a working copy is always backed by 1 file resource (either text or binary - doesn't really matter), we can move and share a lot of the code around text file models to be reused by probably all notebook and custom editors (even search editor).

Rough ideas:

  • introduce an abstract working copy super type that is file resource based and provides
    • sequential saving support
    • dirty write prevention (via stat and etag) [1]
    • maybe: conditional read support (aka don't load the model again if etag match)
  • introduce the notion of file based working copies to working copy service [2]
    • observes working copies that are pending to be saved to join on shutdown to prevent data corruption
    • watches for changes on the file resource to
      • reload the working copy on changes
      • toggle orphaned state for deletions
    • ensures dirty working copies are opened as editors
    • ensures working copies are reloaded when focus moves back to the window

[1] For now, save conflict resolution would probably be simple and not involve the more complex diff support we have for text files

[2] This requires some changes to the registration of working copies: We should allow to register the multiple working copies of the same resource because that can easily happen when the same file is opened with different editors. Maybe a combination of resource + editorId / viewType could be used, because the working copy service (or some contributed tracker) will have to know how to open an editor when a working copy is getting dirty

@bpasero
Copy link
Member

bpasero commented Mar 19, 2021

I have pushed a first version of IFileWorkingCopy and IFileWorkingCopyManager (see #119233) which so far covers these things:

  • save sequentialising
  • dirty write prevention
  • conditional load (aka return early if file has not changed on disk)
  • orphaned file detection
  • reloading model on file changes (unless dirty)
  • undo handling (marking working copy as saved when undoing to saved version)
  • automated backup handling with file stat metadata (to preserve dirty write prevention across restarts)

@bpasero
Copy link
Member

bpasero commented Mar 22, 2021

With d9d0879 there is now:

  • "Save As..." support
  • automated participation in working copy file events [1]

[1] this means that file working copies track move/copy operations and automatically resolve again after the operation preserving any dirty contents that may exist. We already have generic code that will automatically reopen editors when they are moved via the EditorInput#rename method.

@bpasero
Copy link
Member

bpasero commented Apr 22, 2021

Small update:

  • backup service is now capable of backing up file working copies raw VSBuffer contents and allows to backup the same resource provided that the typeId is different
  • working copies can have the same URI when typeId is different (and this has been adopted for simple notebooks)
  • I feel that the tracker for ensuring dirty working copies open as editor cannot easily be reused currently and notebooks already have their own solution as well as text files (maybe we can revisit that later, but currently there is just no knowledge for working copies to know which editor to open in)

The remaining work is around how to restore backups: we have complicated code (involving the famous custom editor input factory) to try to open the right editor for a backup. This is currently quite broken for file working copies. After briefly talking with Jo, one idea would be to introduce a new event that anyone can subscribe to that is owning a working copy to figure out when a backup was found. It would then be the responsibility of the listener to restore that working copy. If a working copy backup is not restored, it would be the responsibility of the backup service to keep the backup around until it is handled (this can happen e.g. when an extension is no longer installed that belongs to a backup).

@bpasero
Copy link
Member

bpasero commented Apr 23, 2021

To handle backups properly, there is a new IWorkingCopyEditorService. This defines the minimal interface for a IWorkingCopy to correlate to IEditorInput, this is needed to restore backups. Currently this is only used to restore backups, but I think with this service things like opening dirty working copies automatically (aka "tracker") would actually also be possible to provide in a generic way (will look into that independently).

For notebooks this means, a new NotebookWorkingCopyEditorHandler is contributed (similar to the "tracker"):

The mapping code is quite straight forward and works on the fact that notebook working copies have a prefix of notebook/:

this._register(this._workingCopyEditorService.registerHandler({
handles: async workingCopy => typeof this.getViewType(workingCopy) === 'string',
isOpen: async (workingCopy, editor) => editor instanceof NotebookEditorInput && editor.viewType === this.getViewType(workingCopy),
createEditor: async workingCopy => NotebookEditorInput.create(this._instantiationService, workingCopy.resource, this.getViewType(workingCopy)!)
}));

The backup code has been changed to no longer delete backups that are not handled. In other words, even if a notebook extension is not installed anymore and if we have backups, these backups will persist on disk until the can be handled again. Previously those backups were lost.

It is now the responsibility of any working copy provider to register a IWorkingCopyEditorHandler in order for backups to restore properly. There is no other UI currently to indicate that backups exist but cannot be restored.

Will write a test plan item for this, probably leveraging the GH issue notebook.

Since the main tasks here are finished, I will close the issue. There is a few things that can be added in May still, but I think the critical aspects are done.

@bpasero bpasero closed this as completed Apr 23, 2021
@github-actions github-actions bot locked and limited conversation to collaborators Jun 7, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
custom-editors Custom editor API (webview based editors) debt Code quality issues important Issue identified as high-priority notebook workbench-editors Managing of editor widgets in workbench window
Projects
None yet
Development

No branches or pull requests

5 participants
@rebornix @bpasero @jrieken @mjbvz and others