Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimistic get_path() #27

Closed
dlqqq opened this issue Oct 25, 2022 · 4 comments · Fixed by #38
Closed

optimistic get_path() #27

dlqqq opened this issue Oct 25, 2022 · 4 comments · Fixed by #38
Labels
enhancement New feature or request

Comments

@dlqqq
Copy link
Collaborator

dlqqq commented Oct 25, 2022

Problem

get_path() calls sync_all() even when the file wasn't moved.

Proposed Solution

Implement an optimistic strategy that looks up the existing path record in the DB. Then check if the file currently is still present at the same path, with the same inode number and timestamps. If so, then return the path early without calling sync_all().

@dlqqq dlqqq added the enhancement New feature or request label Oct 25, 2022
@davidbrochart
Copy link
Contributor

Actually, it seems that get_path() doesn't call sync_all().

@davidbrochart
Copy link
Contributor

Actually, it seems that get_path() doesn't call sync_all().

This leads to some implementation details leaking out of the API:

file_path = file_id_manager.get_path(file_id)
if file_path is None:
    # the file may have been moved, sync and try again
    file_id_manager.sync_all()
    file_path = file_id_manager.get_path(file_id)

In the end, we just want the path corresponding to an ID, no matter what it takes to get it, so I think we shouldn't have to manually call sync_all().

@kevin-bates
Copy link
Member

Agreed. Since optimistic get_path() will call sync_all() only when necessary, I really don't see the need to expose sync_all() at all, either as a call or a parameter. The client code should look identical irrespective of the FileID implementation. Configuration "knobs" is where things can vary per implementation.

@dlqqq
Copy link
Collaborator Author

dlqqq commented Oct 26, 2022

@kevin-bates @davidbrochart I've thought about this a lot today. I think an optimistic get_path() can actually replace autosync_interval, i.e. optimistic get_path() is actually what we want. Here's a summary of my thoughts:

  • Subsequent calls of get_path() won't call sync_all() because sync_all() would have already been called previously if there was an out-of-band move. This was the original motivation behind autosync_interval.
  • However, optimistic get_path() only calls sync_all() if there's a mismatch between the recorded file path and the actual file path. It determines this by checking if the inode number and timestamps at the recorded file path match those previously recorded.
    • This is better than autosync_interval, in which get_path() will always call sync_all() if too much time has elapsed.
    • In the "best case" (no files moved out-of-band), then optimistic get_path() never calls sync_all(), which is much faster.
  • Optimistic get_path() will always be correct. If an out-of-band move happens before every call to get_path() (worst case), then get_path() will always call sync_all().
    • autosync_interval will not handle this case as it throttles calls to sync_all().
  • Finally, because optimistic get_path() automatically detects whether sync_all() needs to be called, we can remove it from the public API, and preserve parity with the API of the base class.

In summary, optimistic get_path():

  • performs better in the best case
  • is always correct in the worst case
  • allows us to remove sync_all() and autosync_interval and preserve parity with the API of BaseFileIdManager.

I think this is definitely the direction we want to take with the LocalFileIdManager.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants