contact file creation: fsync parent directory #2943
On a file system such as an NFS-mounted file system, there may be a delay before a file becomes visible to other processes running on other hosts. We already have a first fsync to ensure that the data of the contact file is written to disk. This change adds a second fsync, on the contact file's parent directory, to ensure that the file metadata of the contact file is written as well.
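As an illustration of the two-fsync pattern described above, here is a minimal sketch, not the code from this PR. The function name `write_file_durably` and its arguments are hypothetical, and it assumes a POSIX system where a directory can be opened read-only and passed to `os.fsync`.

```python
import os


def write_file_durably(path, text):
    """Write text to path, then fsync the file and its parent directory.

    Hypothetical helper for illustration only; assumes a POSIX platform.
    """
    # First fsync: ask the OS to flush the file's data to disk.
    with open(path, "w") as handle:
        handle.write(text)
        handle.flush()  # push Python's buffer down to the OS
        os.fsync(handle.fileno())

    # Second fsync: ask the OS to flush the parent directory, so that the
    # new directory entry for the file is written as well.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Whether the directory fsync actually shortens the visibility delay on another host depends on the NFS client and server configuration, so this should be read as a sketch of the idea rather than a guarantee.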
Unable to replicate issue, change looks sensible.
metomi/rose#2290 is the companion of this PR.
+1, also cannot replicate, but looks OK. Also spent some minutes reading about All checks passed, merging!
@matthewrmshin watching the FOSDEM talks from this year, and found this good one about an old issue in Postgres with Give away, they are planning to tackle this issue over 2 or 3 years 🙂 and quite interesting the issues with kernel, memory pages, Decided to see if anybody ever had issues with So I think that if some day a user comes with a crash in Cylc, complains that it looks like there was an error writing to the disk, and is using NFS, then it is possible that there was an error on flush/fsync/write/etc., but it depends on the operating system, file system, kernel version, Python version, NFS version, etc. And there are situations when we just cannot explain what happened.
BTW I've noticed this many times in the past, but never looked into it (easy enough to manually fire up
Yes, but users panic and keep blaming the new version.
Not saying it shouldn't be fixed - just highlighting my own tardiness on never following this one up!
Watched the video at the link posted by @kinow - all 40-something minutes of it. While not the problem here, it does highlight how little understanding I have with
At least we are not alone, @matthewrmshin. I am reading the Notebook documentation around security, authentication, etc., and found the note
The "working out better ways" links to this issue: jupyter/notebook#1782
(The issue is causing `rose suite-run` to fail to launch `cylc gui` after calling `cylc run` successfully to start up a suite.)
(Need to back port to 7.8.2.)