Editor model world should offer some amenities to protect against data loss / file corruption #117873

jrieken · 2021-03-01T08:34:33Z

Normal file editors, custom editors, and notebooks all implement editors for something on a file system. All implement the relatively rigid structure of editor inputs and editor models. However, they get little to nothing in return for implementing this, there is no service-like infrastructure that makes my live simpler, e.g

there is logic for sequentialising saves inside TextFileEditorModel and roughly the same code exists (was copied) for notebook editors. Custom editors tackle the same problem, but uses different code. So, instead of everyone re-inventing the wheel the "framework", which interfaces I must implement, should take care of this for me.
there is duplicated logic for when a file is being deleted, e.g text file editors have inOrphanMode and custom editors have _inOrphaned. There is a utility method decorateFileEditorLabel to decorate the editor label to indicate orphaned editors but this should move into a common super type. Notebooks didn't copy this yet. Again, I should be able to express to the framework that I am some kind of file editor and then deletion should be taken care of
the logic for general file watching is duplicated, e.g when a file on disk changes we reload it unless the file is dirty (text files, notebooks). The framework should do that automatically because I already tell it how to load and whether my model is dirty or not.
[ ] there is a TextFileEditorTracker that takes care of opening dirty models and reloading them when the window gains focus but this code is only used for text file models. Notebooks has NotebookFileTracker for that
there is a complex dirty write prevention code in file service validateWriteFile . A simple version was implemented in notebook but it has "false negative" bugs as remote/live-share cases are not covered. Custom editor doesn't handle this yet it seems. Related: Dirty write prevention in notebook content provider #117715
text file models register listeners to gracefully finish any pending saves on shutdown, which notebooks and custom editors either do not do or currently fail to do with a high risk of data loss
properly restore backups for file working copy providers

I don't think this list is complete but that's my learning from the little exposure I have gotten via the notebook world. The current approach just seems wasteful: we duplicate code, we duplicate bugs, we duplicate work.

The text was updated successfully, but these errors were encountered:

bpasero · 2021-03-01T09:01:45Z

Yes, 💯 , already commented here f3f4eeb#commitcomment-47623787 because I felt there is duplication.

I was not aware of the duplication in the save sequentializer. I think having more code shared from inputs and models is a good idea. Maybe there could be an abstract working copy where this lives because at least for text file models, the working copy IS the model.

bpasero · 2021-03-01T09:04:23Z

@jrieken fyi I added 2 more entries to your list: the text file editor tracker and the text file save error handler

jrieken · 2021-03-01T09:09:26Z

👏 added the NotebookFileTracker

rebornix · 2021-03-02T00:11:55Z

added "dirty write preventer" to the list, revealed this through #117715

I previously duplicated this logic. It makes sense to reuse the strings and logic for consistency From #117873

bpasero · 2021-03-03T10:54:08Z

Added this which I only recently added to text file models:

text file models register listeners to gracefully finish any pending saves on shutdown, which notebooks and custom editors either do not do or currently fail to do with a high risk of data loss

bpasero · 2021-03-03T18:07:35Z

@jrieken @rebornix since this issue seems to be important due to data loss / file corruption concerns I would like to understand how today it is possible for notebooks to suffer from this and why notebooks have to implement custom logic. Specifically, I do not understand why notebooks ask extensions to implement the save logic or even the backup logic:

vscode/src/vs/vscode.proposed.d.ts

Lines 1570 to 1574 in 72de8e9

    
           saveNotebook(document: NotebookDocument, cancellation: CancellationToken): Thenable<void>; 
        
           // eslint-disable-next-line vscode-dts-provider-naming 
        
           saveNotebookAs(targetResource: Uri, document: NotebookDocument, cancellation: CancellationToken): Thenable<void>; 
        
           // eslint-disable-next-line vscode-dts-provider-naming 
        
           backupNotebook(document: NotebookDocument, context: NotebookDocumentBackupContext, cancellation: CancellationToken): Thenable<NotebookDocumentBackup>;

Why can we not apply the same concept we did for custom editors where we distinguish between:

text file based custom editors (99% case)
non-file or binary file based custom editors (1% case)

I think the best outcome is for an extension to not implement save, backup etc. at all because then we can guarantee that our handling of files is proper including all the tricks for sequentialising saves and dirty write prevention. I think text file based notebooks are (like custom editors) the 99% case and making that work properly without a lot of work on extensions seems right to me.

If notebooks were using text file models for everything like custom editors do, you don't have to duplicate any code. Here is how custom editors is doing that via ITextModelResolverService.createModelReference(resource):

vscode/src/vs/workbench/contrib/customEditor/common/customTextEditorModel.ts

Lines 23 to 28 in f3f4eeb

    
           return instantiationService.invokeFunction(async accessor => { 
        
           	const textModelResolverService = accessor.get(ITextModelService); 
        
           	const textFileService = accessor.get(ITextFileService); 
        
           	const model = await textModelResolverService.createModelReference(resource); 
        
           	return new CustomTextEditorModel(viewType, resource, model, textFileService); 
        
           });

jrieken · 2021-03-03T18:25:37Z

Specifically, I do not understand why notebooks ask extensions to implement the save logic or even the backup logic:

We cannot save notebooks because we don't know what format they are being saved in. For instance there is the ipynb-format which is the most popular format which is text/json based. However, other notebook types can use other file formats (different text format, maybe binary) and therefore we cannot make assumptions.

bpasero · 2021-03-03T18:32:07Z

@jrieken sure, we have had the same problem with custom editors, but we still ended up with a simple API for the typical case and a more advanced API for the rare cases. Why do we need to have the same discussion again for notebooks?

jrieken · 2021-03-03T18:46:03Z

Why do we need to have the same discussion again for notebooks?

A) It's a very different problem. B) We should also care about the non-typical case.

@jrieken @rebornix since this issue seems to be important due to data loss / file corruption concerns I would like to understand how today it is possible for notebooks to suffer from this

A sample, of which I am unsure if it yields in dataloss or not, is the handling of the file stat. In text editor land there exists this

vscode/src/vs/workbench/services/textfile/common/textFileEditorModel.ts

Lines 877 to 890 in 509cd79

    
           private updateLastResolvedFileStat(newFileStat: IFileStatWithMetadata): void { 
        
           	// First resolve - just take 
        
           	if (!this.lastResolvedFileStat) { 
        
           		this.lastResolvedFileStat = newFileStat; 
        
           	} 
        
           	// Subsequent resolve - make sure that we only assign it if the mtime is equal or has advanced. 
        
           	// This prevents race conditions from loading and saving. If a save comes in late after a revert 
        
           	// was called, the mtime could be out of sync. 
        
           	else if (this.lastResolvedFileStat.mtime <= newFileStat.mtime) { 
        
           		this.lastResolvedFileStat = newFileStat; 
        
           	} 
        
           }

This function is called with fresh stat objects from either write or read. The comments says that due to race conditions you don't trust that stat-object but check its mtime. Notebooks doesn't have this check. Please tell me if we need this check or not. Is the race condition happening because of a bug in the file service? Is it because the text file model itself doesn't cancel when revert happens before load or save is done?

This is just one sample and it scares us. Your code was obviously written for a reason but we don't know why and we have no way to re-use it. Today's only option is copy-paste.

Other samples that we kinda copied but not really are all over the place:

do we need save cancellation like you have
does the framework not prevent disposing while save is happen as this comment suggests?
you have conflict mode which depends on FileOperationResults. We use stat-objects. What's safer to do?
...

This basically continues with every little special case that textFileEditorModel.ts has (has for a good reason I assume) and where we wrote naive code which works most of the time.

bpasero · 2021-03-04T06:46:21Z

@jrieken before going into details of how the text file code works, how do we expect extension authors to deal with these concepts if they have to implement saveNotebook? I bet every extension would implement save via fs.writeFile(), even if you tell them to use vscode.fs you cannot enforce it. And even if we somehow manage to enforce them to use vscode.fs APIs, we have no API currently to enable the flow of dirty-write prevention for the extension author because that currently works like this:

text file models keep track of their last resolved stat (by simply keeping it around when the model is loaded)
whenever text file models save, they pass on that etag from the stat
after successful save, the stat is remembered again because it changed now
there is no concept of etag in FileStat

Even if the workbench provides you with a reusable working copy that deals with all of this, it is useless because the truth happens in the extension code and not the workbench.

If we want to ensure that saves are 100% safe, I think we should:

allow extensions to provide text document based notebooks that don't require to implement any save methods (most cases)
optionally allow to provide binary file based notebooks as needed [1]

[1] for this case, instead of implementing a method saveNotebook, I would have a method getBytes and we would be in charge of saving to disk instead of the extension. That would give us at least a chance to implement dirty-write protection in some way. We would still need to bring the concepts over to the extension host because we obviously don't want to serialise the byte data over to the client.

jrieken · 2021-03-04T08:41:03Z

there is no concept of etag in FileStat

Isn't the etag directly derived from stat information? Like here?

it is useless because the truth happens in the extension code and not the workbench.

So, you are saying that with contributed file systems, like in codespaces, these checks are useless? Like even for normal text files because the truth happens in the extension code? Our point is that extensions are in charge of getting this right and that we want to make it easy for them to get this right (and know when they get it wrong)

If we want to ensure that saves are 100% safe, I think we should:

There is no 100%, also not for text files but there is hard trying (which we want to re-use). As long as stat and write/read are distinct calls (and that's how file systems work) you cannot guarantee anything. You cannot know what happens in-between those two and this gets amplified when using "funny" filesystem (smb) or file systems that are implemented in the extension host. I believe there is common agreement to accept this "risk" and IMO that's why we have robust code to deal with this.

allow extensions to provide text document based notebooks that don't require to implement any save methods (most cases)

I think this is not a good model. A "notebook document" is a domain model object, just like a "text document" is a domain model object. Both share that they are saved as files on disk (in the common case). Now saving that you need to transform domain model A into B in order to save it is just wrong.

Also, this very different form a custom editor, like for XML files. In custom editor land we know nothing about the editor, for notebooks we know everything about the editor but nothing about the data format. For instance, in notebooks we don't want the undo behaviour of text documents, we have clear undo rules defined. A custom editor wants the text document undo behaviour because it must use the text document primitives to make changes. This list goes on.

Please trust us when we say that notebooks are very similar to custom binary editors. In fact all our questions/findings map exactly onto those editors too. We have discussed these different "get raw notebook" data approaches and we ended up with the current approach because a) it is consistent with an existing API that solves the same problem, b) it has the same requirements. For instance, a notebook can save a markdown-variant of itself so that it can be previewed on GH. You often see foo.ipynb with a sibling like foo.ipynb.md and we want to enable these cases.

bpasero · 2021-03-04T15:11:50Z

After 1on1 with Jo, here are some ideas going forward what workbench can provide: assuming that we typically have a model where a working copy is always backed by 1 file resource (either text or binary - doesn't really matter), we can move and share a lot of the code around text file models to be reused by probably all notebook and custom editors (even search editor).

Rough ideas:

introduce an abstract working copy super type that is file resource based and provides
- sequential saving support
- dirty write prevention (via stat and etag) [1]
- maybe: conditional read support (aka don't load the model again if etag match)
introduce the notion of file based working copies to working copy service [2]
- observes working copies that are pending to be saved to join on shutdown to prevent data corruption
- watches for changes on the file resource to
  - reload the working copy on changes
  - toggle orphaned state for deletions
- ensures dirty working copies are opened as editors
- ensures working copies are reloaded when focus moves back to the window

[1] For now, save conflict resolution would probably be simple and not involve the more complex diff support we have for text files

[2] This requires some changes to the registration of working copies: We should allow to register the multiple working copies of the same resource because that can easily happen when the same file is opened with different editors. Maybe a combination of resource + editorId / viewType could be used, because the working copy service (or some contributed tracker) will have to know how to open an editor when a working copy is getting dirty

bpasero · 2021-03-19T09:38:13Z

I have pushed a first version of IFileWorkingCopy and IFileWorkingCopyManager (see #119233) which so far covers these things:

save sequentialising
dirty write prevention
conditional load (aka return early if file has not changed on disk)
orphaned file detection
reloading model on file changes (unless dirty)
undo handling (marking working copy as saved when undoing to saved version)
automated backup handling with file stat metadata (to preserve dirty write prevention across restarts)

bpasero · 2021-03-22T08:04:30Z

With d9d0879 there is now:

"Save As..." support
automated participation in working copy file events [1]

[1] this means that file working copies track move/copy operations and automatically resolve again after the operation preserving any dirty contents that may exist. We already have generic code that will automatically reopen editors when they are moved via the EditorInput#rename method.

bpasero · 2021-04-22T08:15:07Z

Small update:

backup service is now capable of backing up file working copies raw VSBuffer contents and allows to backup the same resource provided that the typeId is different
working copies can have the same URI when typeId is different (and this has been adopted for simple notebooks)
I feel that the tracker for ensuring dirty working copies open as editor cannot easily be reused currently and notebooks already have their own solution as well as text files (maybe we can revisit that later, but currently there is just no knowledge for working copies to know which editor to open in)

The remaining work is around how to restore backups: we have complicated code (involving the famous custom editor input factory) to try to open the right editor for a backup. This is currently quite broken for file working copies. After briefly talking with Jo, one idea would be to introduce a new event that anyone can subscribe to that is owning a working copy to figure out when a backup was found. It would then be the responsibility of the listener to restore that working copy. If a working copy backup is not restored, it would be the responsibility of the backup service to keep the backup around until it is handled (this can happen e.g. when an extension is no longer installed that belongs to a backup).

…ng copy editor service (#117873)

…red backups (#117873)

bpasero · 2021-04-23T14:34:49Z

To handle backups properly, there is a new IWorkingCopyEditorService. This defines the minimal interface for a IWorkingCopy to correlate to IEditorInput, this is needed to restore backups. Currently this is only used to restore backups, but I think with this service things like opening dirty working copies automatically (aka "tracker") would actually also be possible to provide in a generic way (will look into that independently).

For notebooks this means, a new NotebookWorkingCopyEditorHandler is contributed (similar to the "tracker"):

The mapping code is quite straight forward and works on the fact that notebook working copies have a prefix of notebook/:

vscode/src/vs/workbench/contrib/notebook/browser/notebook.contribution.ts

Lines 430 to 434 in a14e4d0

    
           this._register(this._workingCopyEditorService.registerHandler({ 
        
           	handles: async workingCopy => typeof this.getViewType(workingCopy) === 'string', 
        
           	isOpen: async (workingCopy, editor) => editor instanceof NotebookEditorInput && editor.viewType === this.getViewType(workingCopy), 
        
           	createEditor: async workingCopy => NotebookEditorInput.create(this._instantiationService, workingCopy.resource, this.getViewType(workingCopy)!) 
        
           }));

The backup code has been changed to no longer delete backups that are not handled. In other words, even if a notebook extension is not installed anymore and if we have backups, these backups will persist on disk until the can be handled again. Previously those backups were lost.

It is now the responsibility of any working copy provider to register a IWorkingCopyEditorHandler in order for backups to restore properly. There is no other UI currently to indicate that backups exist but cannot be restored.

Will write a test plan item for this, probably leveraging the GH issue notebook.

Since the main tasks here are finished, I will close the issue. There is a few things that can be added in May still, but I think the critical aspects are done.

jrieken assigned bpasero, mjbvz, rebornix and jrieken Mar 1, 2021

jrieken added custom-editors Custom editor API (webview based editors) debt Code quality issues notebook labels Mar 1, 2021

bpasero added this to the On Deck milestone Mar 1, 2021

bpasero added the workbench-editors Managing of editor widgets in workbench window label Mar 1, 2021

jrieken mentioned this issue Mar 1, 2021

Revisit custom editor input factory #117208

Closed

mjbvz added a commit that referenced this issue Mar 2, 2021

Extract decorateFileEditorLabel to a common function

9f7d229

I previously duplicated this logic. It makes sense to reuse the strings and logic for consistency From #117873

jrieken added the important Issue identified as high-priority label Mar 2, 2021

jrieken modified the milestones: On Deck, March 2021 Mar 2, 2021

bpasero changed the title ~~Editor model world should offer some amenities~~ Editor model world should offer some amenities to protect against data loss / file corruption Mar 3, 2021

kieferrm mentioned this issue Mar 8, 2021

Iteration Plan for March 2021 #118334

Closed

76 tasks

bpasero mentioned this issue Mar 15, 2021

Search editor: investigate to use file working copies #118947

Closed

This was referenced Mar 15, 2021

Custom text editors: dirty write prevention actions do not appear properly #118949

Closed

Initial file working copies #119233

Merged

bpasero modified the milestones: March 2021, April 2021 Mar 24, 2021

bpasero mentioned this issue Mar 24, 2021

Editor content conflicts with content saved on disk microsoft/vscode-hexeditor#276

Closed

rebornix removed their assignment Mar 30, 2021

kieferrm mentioned this issue Apr 5, 2021

Iteration Plan for April 2021 #120585

Closed

75 tasks

bpasero mentioned this issue Apr 19, 2021

Ben/workingcopy typeid #121646

Merged

bpasero added a commit that referenced this issue Apr 23, 2021

file working copy - provide some basic save error handling (#117873)

666f3ac

bpasero added a commit that referenced this issue Apr 23, 2021

working copy - first cut of a backup restorer that works on new worki…

a14e4d0

…ng copy editor service (#117873)

bpasero added a commit that referenced this issue Apr 23, 2021

working copy - implement backup restore in tracker and keep non resto…

9a978d2

…red backups (#117873)

bpasero closed this as completed Apr 23, 2021

bpasero mentioned this issue Apr 23, 2021

Test: (simple) Notebook file working copy #122033

Closed

3 tasks

kieferrm mentioned this issue May 10, 2021

Iteration Plan for May 2021 #123368

Closed

77 tasks

github-actions bot locked and limited conversation to collaborators Jun 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Editor model world should offer some amenities to protect against data loss / file corruption #117873

Editor model world should offer some amenities to protect against data loss / file corruption #117873

jrieken commented Mar 1, 2021 •

edited by bpasero

Loading

bpasero commented Mar 1, 2021

bpasero commented Mar 1, 2021

jrieken commented Mar 1, 2021

rebornix commented Mar 2, 2021

bpasero commented Mar 3, 2021

bpasero commented Mar 3, 2021 •

edited

Loading

jrieken commented Mar 3, 2021

bpasero commented Mar 3, 2021

jrieken commented Mar 3, 2021 •

edited

Loading

bpasero commented Mar 4, 2021 •

edited

Loading

jrieken commented Mar 4, 2021

bpasero commented Mar 4, 2021

bpasero commented Mar 19, 2021

bpasero commented Mar 22, 2021

bpasero commented Apr 22, 2021 •

edited

Loading

bpasero commented Apr 23, 2021 •

edited

Loading

Editor model world should offer some amenities to protect against data loss / file corruption #117873

Editor model world should offer some amenities to protect against data loss / file corruption #117873

Comments

jrieken commented Mar 1, 2021 • edited by bpasero Loading

bpasero commented Mar 1, 2021

bpasero commented Mar 1, 2021

jrieken commented Mar 1, 2021

rebornix commented Mar 2, 2021

bpasero commented Mar 3, 2021

bpasero commented Mar 3, 2021 • edited Loading

jrieken commented Mar 3, 2021

bpasero commented Mar 3, 2021

jrieken commented Mar 3, 2021 • edited Loading

bpasero commented Mar 4, 2021 • edited Loading

jrieken commented Mar 4, 2021

bpasero commented Mar 4, 2021

bpasero commented Mar 19, 2021

bpasero commented Mar 22, 2021

bpasero commented Apr 22, 2021 • edited Loading

bpasero commented Apr 23, 2021 • edited Loading

jrieken commented Mar 1, 2021 •

edited by bpasero

Loading

bpasero commented Mar 3, 2021 •

edited

Loading

jrieken commented Mar 3, 2021 •

edited

Loading

bpasero commented Mar 4, 2021 •

edited

Loading

bpasero commented Apr 22, 2021 •

edited

Loading

bpasero commented Apr 23, 2021 •

edited

Loading