-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
plugin-ext: Change PluginsKeyValueStorage
to not continuously access the disk
#12236
Conversation
I pushed changes to use another semaphore library that is already used by Theia: I also cleaned up the class to get rid of the asynchronous Lastly, is there a way to write browser tests for this fix? From what I understand this class is quite nested in the plugin system infrastructure, so if it's too difficult writing integration tests please write unit tests then. Here's an example for a unit test I think could be good: const tasks = [];
for (let i = 0; i < 1000; i++) {
tasks.push(storage.set('test_value', i));
}
await Promise.allSettled(tasks);
assert.deepStrictEqual(
await storage.get('test_value'),
999,
'the last value being set should have been 999'
); edit: Updated the snippet to something that should work Previously the result of this snippet would have been non deterministic? |
The last commit I pushed adds logic to use unique mutexes based on the file path being written to. It seems to be something that can happen based on the |
aaffc67
to
5a8138a
Compare
One thing to consider, at least re: writing: there's already something doing a somewhat similar job for our preference files on the frontend in |
@colin-grant-work is it me or the component you linked seems hyper-specific to preferences? The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused about the lifecycle of this system: it seems to me we only need to read from the file once at startup, and after that, all changes can be kept in memory and written to the underlying storage only at the end of the session (or periodically to avoid breakdown). That should mean that we never have a problem with concurrent reading and writing: before we write anything, we make sure we've read, and thereafter we never need to read again.
Yes, but it would do the job that we need to do for the |
I've thought about it too, but it would require heavily modifying this component. The way it currently operates might not be the most efficient you are right, and the current fix was the lowest effort required to patch the issue. I'll look into storing key/values in memory and periodically synchronizing on disk. |
I also think that @tsmaeder's point about this being an OS-level problem merits serious consideration, because two Theia instances running on the same machine will both want access to the same files, and as long as we're not using the single-instance lock to ensure that we only have one backend, we've got no guarantee that they won't be operating on the file at the same item. It also increases the urgency of keeping the file up to date with regular writes, because the second instance would want to have an up-to-date version of the data when it starts up. But that means that all changes need to be written promptly to disk, and that means that the probability of OS-level concurrent access is non-negligible. |
Looking at "system file locking" was inconclusive, and only the It then depends on what kind of behaviour we want: The current approach assumes any update is directly written on disk and visible by subsequent accesses. I'm not sure of the benefits from this approach. |
Well, |
8b2eb37
to
31f5662
Compare
@colin-grant-work I used the strategy you mentioned where we use an in-memory cache first for the values, and only sync with the disk every so often. The sync delay is long enough with a bit of randomness as to avoid collisions with other Theia instances. It might not be fool-proof but it's arguably better than the previous implementation. I think we'll have a long and happy life before we'll get to see problems with this new implementation, but I might be wrong. |
Have you tried to disable the realPath option? Else creating an empty file first, if/when it doesn't yet exists, looks like a viable option (the file itself is not used as the lock, so this should not introduce race conditions). |
The scope of the PR has changed quite a bit, so probably the title/description should be updated accordingly. |
PluginsKeyValueStorage
to not continuously access the disk
To @msujew, @JonasHelming, can someone from either TypeFox or EclipseSource review this patch? Summary of the changes so far: Don't access the disk for each read/write access and instead synchronize on disk every so often. Prevention against multiple Theia instances trying to use the same file is postponed until someone else is free to implement it as it should be easy to add later if we need it (just update the |
…al plugin storage file Fixes #8384 Added the use of a semaphore to lock access to the global plugin storage file, to avoid it being accessed in a disorderly fashion, causing potential corruption. As per the issue discussion, it's possible that there is also a potential race condition caused by node's write function, not flushing its buffers to disk, but in this specific case, it was obvious that parallel calls to set() and get() were the main cause of the problems. Signed-off-by: Marc Dumais <marc.dumais@ericsson.com>
Inversify 4 doesn't really support async postConstruct methods.
Also put a guard around the folder initialization before writing to the `global-state.json` file.
Turns out the `PluginsKeyValueStorage` implementation handles more than just the one `global-state.json` file, so we need to make sure to guard against concurrent access to these files too.
Signed-off-by: Marc Dumais <marc.dumais@ericsson.com>
The native Node file system APIs are powerful but low level: Nothing prevents bad code from concurrently writing to the same file. `FileSystemLocking` is meant to be a central component for any method handling files and wanting to prevent concurrent access by other parts of the framework.
Instead of reading/writing on each get/set, use an in-memory cache and synchronize the data on disk every so often.
d936c50
to
4675ccb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
- confirmed that the code looks good
- confirmed that the unit-tests pass successfully
- confirmed that the
global-state.json
error no longer occurs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good to me as well 👍
What it does
Fixes #8384
Added the use of a semaphore to lock access to the global plugin storage file, to avoid it being accessed in a disorderly fashion, causing potential corruption.
As per the issue discussion, it's possible that there is also a potential race condition caused by node's write function, not flushing its buffers to disk, but in this specific case, it was obvious that parallel calls to set() and get() were the main cause of the problems.
How to test
One way is to add ENTER/EXIT debug traces [1] to
packages/plugin-ext/src/main/node/plugins-key-value-storage.ts
, inset()
andget()
and then rebuild , run the browser tests ('yarn browser test`), and confirm that the accesses are now matched. i.e. "ENTER get()" should always be followed by "EXIT get()", and "ENTER set()" always followed by "EXIT set().[1]
Review checklist
Reminder for reviewers