Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve out-of-band change detection #240

Open
dlqqq opened this issue Feb 28, 2024 · 2 comments
Open

Improve out-of-band change detection #240

dlqqq opened this issue Feb 28, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@dlqqq
Copy link
Member

dlqqq commented Feb 28, 2024

Introduction

An out-of-band (abbrev. OOB) change is an edit of a collaborative file not done through JupyterLab's UI. For example, suppose user A runs git reset --hard <other-branch> while another user B is editing a collaborative document within the same repository. A's git reset deletes all edits made by B and sets the contents of that document to some fixed value. Then the YStore table containing B's edits needs to be dropped, and this event must be somehow communicated to all Yjs clients connected to the document. Here, we say that an out-of-band change occurred on B's document.

Description

The current implementation of OOB change detection results in more data loss than necessary. Fixing this issue will likely reduce instances of user data loss such as the one reported in #219. I've written this issue in two parts below.

Context: how OOB changes are detected currently

OOB change detection is implemented via polling in FileLoader. Here are the relevant portions of its source:

class FileLoader:
    """
    A class to centralize all the operation on a file.
    """

    def __init__(
        self, ...
    ) -> None:
        ...
        self._subscriptions: dict[str, Callable[[], Coroutine[Any, Any, None]]] = {}
        self._watcher = asyncio.create_task(self._watch_file()) if self._poll_interval else None
        self.last_modified = None
    
    def observe(self, id: str, callback: Callable[[], Coroutine[Any, Any, None]]) -> None:
        """
        Subscribe to the file to get notified about out-of-band file changes.

            Parameters:
                    id (str): Room ID
                    callback (Callable): Callback for notifying the room.
        """
        self._subscriptions[id] = callback

    
    async def _watch_file(self) -> None:
        """
        Async task for watching a file.
        """
        ...
        while True:
            try:
                await asyncio.sleep(self._poll_interval)
                try:
                    await self.maybe_notify()
                except Exception as e:
                    self._log.error(f"Error watching file: {self.path}\n{e!r}", exc_info=e)
            except asyncio.CancelledError:
                break
        ...

    async def maybe_notify(self) -> None:
        """
        Notifies subscribed rooms about out-of-band file changes.
        """
        do_notify = False
        async with self._lock:
            # Get model metadata; format and type are not need
            model = await ensure_async(self._contents_manager.get(self.path, content=False))

            if self.last_modified is not None and self.last_modified < model["last_modified"]:
                do_notify = True

            self.last_modified = model["last_modified"]

        if do_notify:
            # Notify out-of-band change
            # callbacks will load the file content, thus release the lock before calling them
            for callback in self._subscriptions.values():
                await callback()

Here, self.last_modified is a value that is set by FileLoader's' filesystem methods, FileLoader.load_content() and FileLoader.maybe_save_content(). FileLoader.maybe_notify() is repeatedly called to check the latest last_modified time (mtime) of the file against its previously recorded mtime. If the latest mtime is greater, then the file was probably modified directly by the server, and an OOB change probably occurred. Each parent is then notified of this OOB change by their callback, previously appended by calling FileLoader.observe().

In jupyter_collaboration, DocumentRoom is the parent initializing the FileLoader. DocumentRoom subscribes to FileLoader with a callback after it is initialized:

class DocumentRoom(YRoom):
    """A Y room for a possibly stored document (e.g. a notebook)."""

    def __init__(
        self, ...
    ):
        super().__init__(ready=False, ystore=ystore, log=log)
        ...
        self._file.observe(self.room_id, self._on_outofband_change)

Issue part 1: Using mtime alone is unreliable

Using mtime alone to detect OOB changes results in more false positives and false negatives than necessary.

When using mtime alone, false negatives (i.e. OOB change going undetected) occur because:

  • A file's mtime can have an imprecision of up to a few seconds (depending on the filesystem). Within that interval, the server can modify a file without updating its mtime, resulting in a false negative.

  • The POSIX standard allows for some system calls like mmap() to update the content of a file without updating its mtime.

When using mtime alone, false positives (i.e. an OOB change event is emitted without an OOB change) occur because:

  • Writing identical content to a file usually updates the mtime but leaves the content unaffected. If an OOB change is signaled here, then a false positive occurs, and collaboration is halted for no reason.
    • This can be triggered by anything that saves a file to disk. For example, if a user opens a file with vim and immediately exits by running :wq, that would cause a OOB change to be wrongly signaled.

Both false positives and false negatives result in data loss of at least one client's intent. If we consider the previous example with A and B, a false positive results in A's git reset going ignored, and a false negative results in halting collaboration on B's notebook no reason. In both cases, either A or B's intended content is ignored.

False positives are especially dangerous because of how easily they can be triggered and how they completely halt collaboration on a document. I suspect that there are specific common user scenarios that trigger these OOB false positives, which contributes towards a higher frequency of data loss issues for users.

Issue part 2: Redundant way of signaling OOB changes

FileLoader has a redundant way of signaling OOB changes to its parent object, but only when a file is saved through FileLoader.maybe_save_content():

    async def maybe_save_content(self, model: dict[str, Any]) -> None:
        """
        Save the content of the file.

            Parameters:
                model (dict): A dictionary with format, type, last_modified, and content of the file.

            Raises:
                OutOfBandChanges: if the file was modified at a latter time than the model

        ### Note:
            If there is changes on disk, this method will raise an OutOfBandChanges exception.
        """
        async with self._lock:
            ...

            if self.last_modified == m["last_modified"]:
                ...
            else:
                # file changed on disk, raise an error
                self.last_modified = m["last_modified"]
                raise OutOfBandChanges

Here, it signals an OOB change having occurred by raising an OutOfBandChanges exception. The redundant signaling of an OOB change causes DocumentRoom to similarly provide a redundant handling of an OOB change:

    async def _maybe_save_document(self, saving_document: asyncio.Task | None) -> None:
        """
        Saves the content of the document to disk.

        ### Note:
            There is a save delay to debounce the save since we could receive a high
            amount of changes in a short period of time. This way we can cancel the
            previous save.
        """
        ...

        try:
            ...
            await self._file.maybe_save_content(...)
            ...
        except OutOfBandChanges:
            ... # <= the block here does not call `self._on_outofband_change()`
        ...

Proposed new implementation

  1. Use other file metadata from os.stat() to detect OOB changes more accurately. Changes in file size should be the deciding factor in whether a file was changed, not changes in mtime.
  2. Replace the redundant signaling of OOB changes with the FileLoader.observe() signaling API.
@dlqqq dlqqq added the bug Something isn't working label Feb 28, 2024
@davidbrochart
Copy link
Collaborator

  1. Use other file metadata from os.stat() to detect OOB changes more accurately. Changes in file size should be the deciding factor in whether a file was changed, not changes in mtime.

I agree that file modification time is not optimal to check if the file was saved by Jupyter or externally. But as I mentioned here, I think the best way to improve the situation would be to use the new hash field of the contents model.

2. Replace the redundant signaling of OOB changes with the FileLoader.observe() signaling API.

Let me explain what you call "redundant signaling of OOB". There is a polling at regular intervals to check for file changes (regardless of whether Jupyter is saving), and a check that happens when Jupyter saves a file. The former is needed because we want users to have their document updated if an OOB happens. But since it's done by polling, we could easily miss an OOB change when saving. That's why the latter is also needed.
When you say "the block here does not call self._on_outofband_change()", if you look closely the code here is the same as here, so it's equivalent. But I agree that code should not be duplicated.

Happy to review PRs for both points.

@dlqqq
Copy link
Member Author

dlqqq commented Feb 28, 2024

@davidbrochart Yes, I agree. Thanks for clarifying! 👍

I won't work on a PR for this just yet. I'll be focusing on opening issues to track our work, as requested by Brian & Sylvain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants