SEEK_DATA / SEEK_HOLE do not work as expected with RO if there are pending atime updates #6958
Comments
is this on gentoo? #3125 (comment) |
@bunder2015 no, not gentoo; zfs from repo as per the commit ids above. the release i'm using contains the above-referenced fix (454365b) |
@cwedgwood this behavior is an intentional design compromise, see #4306. The problem is that when a file is open and contains dirty blocks it's difficult to determine exactly how those blocks will be written prior to actually writing them. New holes may be created, other holes filled, etc. In order to handle this dirty case gracefully ZFS falls back to the allowed safe behavior as described in the
When the file isn't dirty and it's possible for the filesystem to give an authoritative answer it does so. It will never report data as a hole. If you absolutely always need to have holes reported you can set the |
@behlendorf it's always going to be vulnerable to races, so i think it's reasonable for it to give some 'best effort' approximation. on test, i'm not entirely sure it's a case of dirty data ... it looks like it happens with recent reads (no writes) inside of a few seconds. i will try and get example code showing this (what I have now needs to be pared back) |
Yes, this could probably be further optimized to, say, only apply to the dirty regions of a given file. The thing is we need to be absolutely certain that a hole is never reported where data is about to be written. Doing so can confuse some of the system tools like
The window here is a full txg sync, so up to about 5s without writes is needed. |
@behlendorf you cannot prevent that in all cases for zfs or any other fs ... why pretend? updates/changes can always race with cp; if tools are to optimize in these cases they need to map afterwards, look at mtime/ctime, retry, error, cry or self-destruct as appropriate. ignoring rw/updates for now though, it happens on read-only; here i'm using
are you sure there isn't some lifetime issue between state in the linux vfs and the underlying zpl that means after a read (seeking doesn't seem to matter) we're not ending up with something that confuses things immediately after? |
@behlendorf btw, i'm aware i have not shown the code that produces this; it's on me to make something smaller showing the issue |
Sure, no argument there. The best we can do is make sure the system call is handled atomically.
Can you try disabling atime updates. Those could be dirtying the dnode on a read which might explain what you're seeing. |
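A minimal sketch of one way to test this suggestion from userspace, assuming Linux and a caller that owns the file: open with `O_NOATIME` so the read itself does not queue an atime update. The helper name `read_without_atime` is hypothetical. The suggestion above may equally refer to turning atime off for the whole dataset (the `atime` property), which needs no code change.

```python
import os


def read_without_atime(path, length=4096):
    """Read up to `length` bytes without requesting an atime update.

    Assumption: Linux. O_NOATIME is only honoured for the file's owner
    (or with CAP_FOWNER); other callers get EPERM, so fall back to a
    plain read-only open in that case.
    """
    flags = os.O_RDONLY | getattr(os, "O_NOATIME", 0)
    try:
        fd = os.open(path, flags)
    except PermissionError:
        # Not the owner: this open will update atime as usual.
        fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, length)
    finally:
        os.close(fd)
```

If SEEK_DATA/SEEK_HOLE keep working right after a read done this way, that points at the pending atime update rather than at the read itself.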
This sounds like a consequence of the optimization that made it into 0.7 that improved performance by not waiting on the txg commit. It was not clear to me at the time how to avoid that while maintaining a correct view and it still isn't. Maybe we could let the user decide via a kernel module parameter or dataset property. The behavior can be either fast or strict. My vote is on a module parameter unless the other platforms feel that it should be a dataset option. |
@ryao the |
@behlendorf I forgot and neglected to read through the entire thread. Thanks for refreshing my memory. @cwedgwood Did you try setting |
it looks like (pending) atime updates are causing it to be dirty and therefore not work as i expected; i will edit the issue to reflect this. code showing the problem: https://github.com/cwedgwood/unholey i have not tested setting |
@ryao fwiw i'm not using cp; i suspect it was mentioned as a relevant example of where a race might occur and cause problems. using a snapshot isn't practical to 'map' a file for holes; the best i can do at present is put in logic to detect cases where zfs is involved and check the atime as well as mtime (i didn't test ctime, for example in the case of a rename; i suspect that might be a problem too) |
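The atime/mtime check described above could look roughly like this. It is only a sketch: the helper name `recently_touched` is hypothetical, the 5 second window is taken from the "up to about 5s" figure mentioned earlier in the thread rather than from any guaranteed interval, and ctime is left out because it was not tested.

```python
import os
import time

# Assumption: roughly one txg sync interval, per the "about 5s" comment.
TXG_WINDOW_SECONDS = 5.0


def recently_touched(path, window=TXG_WINDOW_SECONDS, now=None):
    """Return True if atime or mtime falls within `window` seconds of `now`.

    A True result means a SEEK_DATA/SEEK_HOLE map taken right now may be
    the conservative "all data" answer, so the caller may prefer to retry
    later. `now` can be injected to make the check deterministic.
    """
    st = os.stat(path)
    if now is None:
        now = time.time()
    newest = max(st.st_atime, st.st_mtime)
    return (now - newest) < window
```

A caller would map the file only when `recently_touched(path)` is false, or map anyway and treat a single `[0, len]` segment with suspicion.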
@cwedgwood I assume it was you who asked yesterday if somebody could run this on Illumos. Anyway, the code itself does not run as-is on Illumos: there is no O_NOATIME constant available in the OS. The output when removing the constant is:
Hope this helps |
Update the dirty check in dmu_offset_next() such that dnodes are only considered dirty for the purpose of reporting holes when there are pending data blocks or frees to be synced. This ensures that when there are only metadata updates to be synced (atime) that holes are reported. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#6958
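The rule in this commit message can be pictured with a toy model. This is not ZFS code and does not mirror the real data structures: `PendingState` and `must_report_all_data` are hypothetical names used only to illustrate which kinds of pending change force the conservative answer.

```python
from dataclasses import dataclass


@dataclass
class PendingState:
    """Hypothetical summary of what a file has queued for the next sync."""
    data_blocks: int = 0         # dirty data blocks waiting to be written
    frees: int = 0               # pending frees (truncate, hole punch)
    metadata_only: bool = False  # e.g. just an atime update


def must_report_all_data(pending: PendingState) -> bool:
    """Model of the updated check: only pending data blocks or frees
    count as dirty for hole reporting; metadata-only updates do not."""
    return pending.data_blocks > 0 or pending.frees > 0
```

Under this model a file with only an atime update queued still gets its holes reported, while any pending write or free keeps the safe fallback.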
Describe the problem you're observing
When using SEEK_DATA & SEEK_HOLE to map a file it works robustly if the file is not being updated (open elsewhere for write).
When it is open elsewhere for write and being updated, it's common to get wrong results, often a single data segment from [0, len]
Describe how to reproduce the problem
Create a sparse file, verify SEEK_DATA/SEEK_HOLE are working...
read the file (anywhere) and SEEK_DATA/SEEK_HOLE no longer function until the atime update from the previous access has flushed
there is a (weak) argument for this in the case of pending file content updates; but i feel it's even weaker again if it's just caused by pending atime from RO access
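For readers who want to try the reproduction steps above without the linked repository, here is a minimal sketch of mapping a file with SEEK_DATA/SEEK_HOLE. It is not the code from the issue; `make_sparse` and `data_segments` are hypothetical helper names, and the file size and write offset are arbitrary.

```python
import errno
import os


def make_sparse(path, size=1 << 20, data_at=512 * 1024):
    """Create a `size`-byte file that is a hole except for 4 KiB at `data_at`."""
    with open(path, "wb") as f:
        f.truncate(size)
        f.seek(data_at)
        f.write(b"x" * 4096)


def data_segments(path):
    """Return [(start, end), ...] byte ranges the filesystem reports as data."""
    segments = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:  # no data at or after offset
                    break
                raise
            # There is always an implicit hole at end of file.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            segments.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return segments
```

On a quiescent file the result should be a short segment around the written region. The behaviour reported here shows up as a single segment covering `[0, len]` when the same call is made shortly after reading the file.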