Editor model world should offer some amenities to protect against data loss / file corruption #117873
Comments
Yes, 💯, already commented here f3f4eeb#commitcomment-47623787 because I felt there is duplication. I was not aware of the duplication in the save sequentializer. I think having more code shared from inputs and models is a good idea. Maybe there could be an abstract working copy where this lives because at least for text file models, the working copy IS the model.
@jrieken fyi I added 2 more entries to your list: the text file editor tracker and the text file save error handler
👏 added the NotebookFileTracker
added "dirty write preventer" to the list, revealed this through #117715
I previously duplicated this logic. It makes sense to reuse the strings and logic for consistency. From #117873
Added this which I only recently added to text file models:
@jrieken @rebornix since this issue seems to be important due to data loss / file corruption concerns I would like to understand how today it is possible for notebooks to suffer from this and why notebooks have to implement custom logic. Specifically, I do not understand why notebooks ask extensions to implement the save logic or even the backup logic: vscode/src/vs/vscode.proposed.d.ts Lines 1570 to 1574 in 72de8e9
Why can we not apply the same concept we did for custom editors where we distinguish between:
I think the best outcome is for an extension to not implement If notebooks were using text file models for everything like custom editors do, you would not have to duplicate any code. Here is how custom editors are doing that via vscode/src/vs/workbench/contrib/customEditor/common/customTextEditorModel.ts Lines 23 to 28 in f3f4eeb
We cannot save notebooks because we don't know what format they are being saved in. For instance, ipynb, the most popular format, is text/JSON based. However, other notebook types can use other file formats (a different text format, maybe binary) and therefore we cannot make assumptions.
@jrieken sure, we have had the same problem with custom editors, but we still ended up with a simple API for the typical case and a more advanced API for the rare cases. Why do we need to have the same discussion again for notebooks? |
A) It's a very different problem. B) We should also care about the non-typical case.
One example, where I am unsure whether it leads to data loss or not, is the handling of the file stat. In text editor land there exists this: vscode/src/vs/workbench/services/textfile/common/textFileEditorModel.ts Lines 877 to 890 in 509cd79
This function is called with fresh stat objects from either write or read. The comment says that, due to race conditions, you don't trust that stat object but check its mtime. Notebooks don't have this check. Please tell me if we need this check or not. Is the race condition happening because of a bug in the file service? Is it because the text file model itself doesn't cancel when revert happens before load or save is done? This is just one example and it scares us. Your code was obviously written for a reason, but we don't know why and we have no way to reuse it. Today's only option is copy-paste. Other examples that we kind of copied, but not really, are all over the place:
This basically continues with every little special case that |
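For readers following along, the mtime guard discussed above can be sketched roughly as follows. This is a minimal illustration and not the actual TextFileEditorModel code: the names FileStatLike and LastResolvedStatTracker are hypothetical, and the only idea it shows is "accept a new stat only if its mtime is equal to or newer than the one already held".

```typescript
// Hypothetical, minimal stat shape (assumption; not the real VS Code interface).
interface FileStatLike {
	readonly mtime: number; // last modification time in milliseconds
	readonly size: number;
}

// Hypothetical helper that remembers the most recent stat seen for a file.
class LastResolvedStatTracker {
	private lastResolved: FileStatLike | undefined;

	get current(): FileStatLike | undefined {
		return this.lastResolved;
	}

	// Returns true if the stat was accepted, false if it was ignored as stale.
	update(newStat: FileStatLike): boolean {
		// First stat, or a stat that is at least as new as the one we hold: take it.
		if (!this.lastResolved || this.lastResolved.mtime <= newStat.mtime) {
			this.lastResolved = newStat;
			return true;
		}
		// An older stat arrived late (for example a slow read that finished
		// after a newer write), so it must not overwrite the newer one.
		return false;
	}
}
```

With such a guard, a read and a write that complete out of order cannot roll the remembered stat back to an older state.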
@jrieken before going into details of how the text file code works, how do we expect extension authors to deal with these concepts if they have to implement
Even if the workbench provides you with a reusable working copy that deals with all of this, it is useless because the truth happens in the extension code and not the workbench. If we want to ensure that saves are 100% safe, I think we should:
[1] for this case, instead of implementing a method |
Isn't the etag directly derived from stat information? Like here?
So, you are saying that with contributed file systems, like in codespaces, these checks are useless? Like even for normal text files because the truth happens in the extension code? Our point is that extensions are in charge of getting this right and that we want to make it easy for them to get this right (and know when they get it wrong)
There is no 100%, also not for text files but there is hard trying (which we want to re-use). As long as stat and write/read are distinct calls (and that's how file systems work) you cannot guarantee anything. You cannot know what happens in-between those two and this gets amplified when using "funny" filesystem (smb) or file systems that are implemented in the extension host. I believe there is common agreement to accept this "risk" and IMO that's why we have robust code to deal with this.
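To make the point about etags and distinct stat/write calls concrete, here is a small sketch. Everything in it is an assumption for illustration: etagOf, InMemoryFile, ModifiedSinceError and writeIfUnchanged are hypothetical names, and the etag format is made up. It shows a "dirty write" check that refuses to save when the stat-derived etag no longer matches, and the comment marks the window in which the check cannot help.

```typescript
interface StatLike {
	readonly mtime: number;
	readonly size: number;
}

// Assumption: an etag is simply a string derived from mtime and size.
function etagOf(stat: StatLike): string {
	return `${stat.mtime.toString(36)}-${stat.size.toString(36)}`;
}

// Hypothetical stand-in for a file system provider. A counter replaces
// the wall clock so that the example is deterministic.
class InMemoryFile {
	private content = '';
	private mtime = 0;

	stat(): StatLike {
		return { mtime: this.mtime, size: this.content.length };
	}

	write(content: string): StatLike {
		this.content = content;
		this.mtime += 1;
		return this.stat();
	}
}

class ModifiedSinceError extends Error {
	readonly code = 'FILE_MODIFIED_SINCE';
}

// Writes only if the file still matches the etag the caller last saw.
function writeIfUnchanged(file: InMemoryFile, content: string, expectedEtag: string): StatLike {
	if (etagOf(file.stat()) !== expectedEtag) {
		throw new ModifiedSinceError('File changed since it was last resolved');
	}
	// stat and write are separate calls: with a real file system another
	// writer could still modify the file right here, which is why such a
	// check reduces the risk but cannot eliminate it.
	return file.write(content);
}
```

A caller would keep the etag from its last successful read or write and pass it to the next save, surfacing ModifiedSinceError as a save conflict.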
I think this is not a good model. A "notebook document" is a domain model object, just like a "text document" is a domain model object. Both share that they are saved as files on disk (in the common case). Now saying that you need to transform domain model A into B in order to save it is just wrong. Also, this is very different from a custom editor, like for XML files. In custom editor land we know nothing about the editor; for notebooks we know everything about the editor but nothing about the data format. For instance, in notebooks we don't want the undo behaviour of text documents, we have clear undo rules defined. A custom editor wants the text document undo behaviour because it must use the text document primitives to make changes. This list goes on. Please trust us when we say that notebooks are very similar to custom binary editors. In fact, all our questions/findings map exactly onto those editors too. We have discussed these different "get raw notebook" data approaches and we ended up with the current approach because a) it is consistent with an existing API that solves the same problem, b) it has the same requirements. For instance, a notebook can save a markdown variant of itself so that it can be previewed on GH. You often see
After 1on1 with Jo, here are some ideas going forward what workbench can provide: assuming that we typically have a model where a working copy is always backed by 1 file resource (either text or binary - doesn't really matter), we can move and share a lot of the code around text file models to be reused by probably all notebook and custom editors (even search editor). Rough ideas:
[1] For now, save conflict resolution would probably be simple and not involve the more complex diff support we have for text files. [2] This requires some changes to the registration of working copies: we should allow registering multiple working copies of the same resource, because that can easily happen when the same file is opened with different editors. Maybe a combination of
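The idea of a working copy that is always backed by one file resource, with only the serialization left to the editor, might be sketched like this. All names here (WorkingCopyModel, FileAccess, FileWorkingCopySketch, CellListModel, InMemoryFiles) are hypothetical, and the sketch assumes synchronous string contents for brevity where a real implementation would deal with asynchronous, possibly binary, streams.

```typescript
// Hypothetical contract: the only format-specific part an editor provides.
interface WorkingCopyModel {
	snapshot(): string;             // serialize the model for saving
	update(contents: string): void; // replace the model state after a resolve
}

// Hypothetical file access, synchronous to keep the sketch short.
interface FileAccess {
	read(resource: string): string;
	write(resource: string, contents: string): void;
}

// Shared logic: dirty tracking, resolve and save live here exactly once.
class FileWorkingCopySketch<M extends WorkingCopyModel> {
	private version = 0;
	private savedVersion = 0;

	constructor(
		readonly resource: string,
		readonly model: M,
		private readonly files: FileAccess
	) { }

	isDirty(): boolean {
		return this.version !== this.savedVersion;
	}

	markDirty(): void {
		this.version++;
	}

	resolve(): void {
		this.model.update(this.files.read(this.resource));
		this.savedVersion = this.version;
	}

	// Returns true if something was written.
	save(): boolean {
		if (!this.isDirty()) {
			return false;
		}
		this.files.write(this.resource, this.model.snapshot());
		this.savedVersion = this.version;
		return true;
	}
}

// Example model: a notebook-like list of cells stored as JSON.
class CellListModel implements WorkingCopyModel {
	cells: string[] = [];
	snapshot(): string { return JSON.stringify(this.cells); }
	update(contents: string): void { this.cells = JSON.parse(contents) as string[]; }
}

class InMemoryFiles implements FileAccess {
	private readonly store: { [resource: string]: string } = {};
	read(resource: string): string {
		if (!(resource in this.store)) { throw new Error(`Not found: ${resource}`); }
		return this.store[resource];
	}
	write(resource: string, contents: string): void { this.store[resource] = contents; }
}
```

The point of the design is that a different editor would only swap CellListModel for its own model, while conflict handling, backups and similar concerns could be added to the shared class in one place.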
I have pushed a first version of
With d9d0879 there is now:
[1] this means that file working copies track move/copy operations and automatically |
Small update:
The remaining work is around how to restore backups: we have complicated code (involving the famous custom editor input factory) to try to open the right editor for a backup. This is currently quite broken for file working copies. After briefly talking with Jo, one idea would be to introduce a new event that anyone owning a working copy can subscribe to in order to find out when a backup was found. It would then be the responsibility of the listener to restore that working copy. If a working copy backup is not restored, it would be the responsibility of the backup service to keep the backup around until it is handled (this can happen e.g. when an extension that belongs to a backup is no longer installed).
To handle backups properly, there is a new For notebooks this means, a new The mapping code is quite straightforward and relies on the fact that notebook working copies have a prefix of vscode/src/vs/workbench/contrib/notebook/browser/notebook.contribution.ts Lines 430 to 434 in a14e4d0
The backup code has been changed to no longer delete backups that are not handled. In other words, even if a notebook extension is not installed anymore and we have backups, these backups will persist on disk until they can be handled again. Previously those backups were lost. It is now the responsibility of any working copy provider to register a Will write a test plan item for this, probably leveraging the GH issue notebook. Since the main tasks here are finished, I will close the issue. There are a few things that can still be added in May, but I think the critical aspects are done.
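The backup behaviour described here (handlers claim the backups they understand, everything else stays around) can be sketched as follows. BackupInfo, BackupHandler and BackupRestorerSketch are hypothetical names and not the workbench API, and the 'example-notebook:' prefix is invented for the example; the real prefix is not reproduced in this thread.

```typescript
// Hypothetical description of a backup found on startup.
interface BackupInfo {
	readonly resource: string;
}

// Hypothetical handler contract that a working copy provider would register.
interface BackupHandler {
	handles(backup: BackupInfo): boolean;
	restore(backup: BackupInfo): void;
}

class BackupRestorerSketch {
	private pending: BackupInfo[];

	constructor(backupsOnDisk: BackupInfo[]) {
		this.pending = backupsOnDisk.slice();
	}

	// Backups that nobody has claimed yet. They are kept, never deleted.
	get unrestored(): BackupInfo[] {
		return this.pending.slice();
	}

	// Registering a handler immediately offers it every pending backup.
	registerHandler(handler: BackupHandler): void {
		const stillPending: BackupInfo[] = [];
		for (const backup of this.pending) {
			if (handler.handles(backup)) {
				handler.restore(backup);
			} else {
				stillPending.push(backup); // wait for a handler that may come later
			}
		}
		this.pending = stillPending;
	}
}

// Example handler that maps backups to an editor by a made-up prefix.
const restoredNotebooks: string[] = [];
const notebookHandler: BackupHandler = {
	handles: backup => backup.resource.indexOf('example-notebook:') === 0,
	restore: backup => { restoredNotebooks.push(backup.resource); }
};
```

In this model a backup whose extension is missing simply remains in the pending list until a matching handler is registered in a later session.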
Normal file editors, custom editors, and notebooks all implement editors for something on a file system. All implement the relatively rigid structure of editor inputs and editor models. However, they get little to nothing in return for implementing this: there is no service-like infrastructure that makes my life simpler, e.g.
[ ] TextFileEditorModel and roughly the same code exists (was copied) for notebook editors. Custom editors tackle the same problem but use different code. So, instead of everyone re-inventing the wheel, the "framework", whose interfaces I must implement, should take care of this for me.
[ ] inOrphanMode and custom editors have _inOrphaned. There is a utility method decorateFileEditorLabel to decorate the editor label to indicate orphaned editors, but this should move into a common super type. Notebooks didn't copy this yet. Again, I should be able to express to the framework that I am some kind of file editor and then deletion should be taken care of.
[ ] there is a TextFileEditorTracker that takes care of opening dirty models and reloading them when the window gains focus, but this code is only used for text file models. Notebooks have NotebookFileTracker for that.
I don't think this list is complete, but that's my learning from the little exposure I have gotten via the notebook world. The current approach just seems wasteful: we duplicate code, we duplicate bugs, we duplicate work.
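As an illustration of the "common super type" idea for orphaned editors, here is a rough sketch. The class names, the event shape and the " (deleted)" label suffix are all assumptions made for this example and are not taken from the workbench code.

```typescript
// Hypothetical file change event delivered by the framework.
interface FileChangeLike {
	readonly resource: string;
	readonly type: 'added' | 'deleted';
}

// Hypothetical shared base type: orphan tracking and label decoration
// are implemented once instead of per editor kind.
abstract class FileBackedEditorModelSketch {
	private orphaned = false;

	constructor(readonly resource: string, readonly name: string) { }

	isOrphaned(): boolean {
		return this.orphaned;
	}

	// The framework would call this for every file event it observes.
	handleFileChange(change: FileChangeLike): void {
		if (change.resource !== this.resource) {
			return; // event is about a different file
		}
		this.orphaned = change.type === 'deleted';
	}

	// Label shown in the editor tab; the suffix format is an assumption.
	getLabel(): string {
		return this.orphaned ? `${this.name} (deleted)` : this.name;
	}
}

// Editors only declare that they are file backed; deletion handling comes for free.
class NotebookModelSketch extends FileBackedEditorModelSketch { }
class CustomEditorModelSketch extends FileBackedEditorModelSketch { }
```

A notebook model created as new NotebookModelSketch('/work/a.nb', 'a.nb') would then pick up the orphaned state and the decorated label without any notebook-specific code.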