Periodic error with SEEK_HOLE results (likely timing-related) in recent git version(s) #6050

siebenmann · 2017-04-21T04:38:34Z

System information

Type	Version/Name
Distribution Name	Fedora
Distribution Version	25
Linux Kernel	4.10.10-200.fc25.x86_64
Architecture	x86_64
ZFS Version	v0.7.0-rc3_207_g06226b5
SPL Version	v0.7.0-rc3 commit `481762f`

Describe the problem you're observing

Sometimes a SEEK_HOLE lseek() will report that a file has a hole when it doesn't. The direct manifestation of the problem is that this causes grep to believe that a text file is a binary file and so report just 'Binary file WHATEVER matches'. This appears to be quite dependent on the specific operations involved and the exact timing (when I ran grep under gdb and was stepping through the code in question it didn't reproduce, although I could see it in strace), and even on the same files it doesn't always happen.

For an affected file, I can reproduce this in the Python repl:

>>> import os
>>> fd = os.open("testfile", os.O_RDONLY)
>>> r = os.read(fd, 32768); sr = os.lseek(fd, 32768, 4); print sr
32768
>>> os.stat("testfile").st_size
33005

This file doesn't have any holes, and if you don't read() or add enough of a pause the results are different:

>>> fd = os.open("testfile", os.O_RDONLY)
>>> sr = os.lseek(fd, 32768, 4); print sr
33005

Inserting a three second sleep between the read and the lseek will sometimes make this manifest and other times not. Four seconds seems to be a sufficient delay to make it reliably return the correct results. Short waits reliably make it return the wrong results.

This doesn't appear tied to the actual file object in question. On the most consistent reproducer, I can copy the file (including with cat) and the copy has the problem as well. On some files this seems extremely consistent; on others it comes and goes; I can run a test command and on consecutive runs it will report:

smtplog: hole 32768 st_size 74737226
smtplog: hole 74737226 st_size 74737226

This may have something to do with the file size. I can readily reproduce this with a completely synthetic file of length 33005, created in Python's REPL with:

>>> w = open("testfile", "w")
>>> w.write("a" * 33005)
>>> w.close()

This reproduces across two different Fedora machines, although they have the same git near-tip ZFS versions.

The text was updated successfully, but these errors were encountered:

loli10K · 2017-04-21T04:48:36Z

Related to 66aca24? @siebenmann can you try and reproduce this same behaviour you're describing on a machine with the tunable zfs_dmu_offset_next_sync set to "1"?

siebenmann · 2017-04-21T04:54:10Z

This doesn't seem to reproduce at all with zfs_dmu_offset_next_sync set to 1. In fact if I toggle zfs_dmu_offset_next_sync from 0 to 1 and back again I can make both grep and my test program flip between working correctly and reporting the problem.

bunder2015 · 2017-04-21T05:30:27Z

I've been chasing this bug in circles all day, at first I thought it was a problem with Gentoo, then a problem with grep, then libtool... Setting zfs_dmu_offset_next_sync to 1 fixed me.

edit: this also appears to be able to fix my inability to compile gcc 5.4.0-r3.

arrandale ~ # grep Binary /var/log/portage/elog/summary.log
         /var/tmp/portage/dev-db/sqlite-3.17.0/temp/environment: line 190: Binary: command not found
         /var/tmp/portage/sys-devel/gettext-0.19.8.1/temp/environment: line 188: Binary: command not found
         /var/tmp/portage/dev-libs/glib-2.50.3-r1/temp/environment: line 273: Binary: command not found
         /var/tmp/portage/dev-libs/libsigc++-2.10.0/temp/environment: line 173: Binary: command not found
         /var/tmp/portage/net-libs/gnutls-3.3.26/temp/environment: line 208: Binary: command not found
         /var/tmp/portage/net-libs/gnutls-3.3.26/temp/environment: line 208: Binary: command not found
         /var/tmp/portage/dev-cpp/glibmm-2.50.0/temp/environment: line 179: Binary: command not found
         /var/tmp/portage/dev-libs/libpcre2-10.22/temp/environment: line 163: Binary: command not found
         /var/tmp/portage/media-libs/libsamplerate-0.1.9/temp/environment: line 177: Binary: command not found
         /var/tmp/portage/net-libs/glib-networking-2.50.0/temp/environment: line 204: Binary: command not found
         /var/tmp/portage/media-libs/mesa-13.0.5/temp/environment: line 393: Binary: command not found
         /var/tmp/portage/dev-libs/gobject-introspection-1.50.0/temp/environment: line 202: Binary: command not found
         /var/tmp/portage/x11-libs/gdk-pixbuf-2.36.5/temp/environment: line 206: Binary: command not found
         /var/tmp/portage/dev-libs/atk-2.22.0/temp/environment: line 181: Binary: command not found
         /var/tmp/portage/media-libs/harfbuzz-1.4.5/temp/environment: line 183: Binary: command not found
         /var/tmp/portage/net-libs/libsoup-2.56.0/temp/environment: line 207: Binary: command not found
         /var/tmp/portage/dev-python/pygobject-3.22.0/temp/environment: line 226: Binary: command not found
         /var/tmp/portage/x11-themes/gnome-themes-standard-3.22.3/temp/environment: line 176: Binary: command not found
         /var/tmp/portage/dev-util/gtk-update-icon-cache-3.22.2/temp/environment: line 179: Binary: command not found
         /var/tmp/portage/sys-apps/util-linux-2.28.2/temp/environment: line 196: Binary: command not found
         /var/tmp/portage/app-accessibility/at-spi2-core-2.22.1/temp/environment: line 200: Binary: command not found
         /var/tmp/portage/app-accessibility/at-spi2-atk-2.22.0/temp/environment: line 191: Binary: command not found
         /var/tmp/portage/x11-libs/gtk+-3.22.11/temp/environment: line 334: Binary: command not found
         /var/tmp/portage/dev-libs/json-glib-1.2.8/temp/environment: line 183: Binary: command not found
         /var/tmp/portage/x11-libs/gtksourceview-3.22.2/temp/environment: line 197: Binary: command not found
         /var/tmp/portage/dev-libs/appstream-glib-0.6.8/temp/environment: line 206: Binary: command not found
         /var/tmp/portage/dev-libs/gdl-3.22.0/temp/environment: line 180: Binary: command not found
         /var/tmp/portage/www-client/firefox-52.0.2/temp/environment: line 441: Binary: command not found
         /var/tmp/portage/www-client/firefox-52.0.2/temp/environment: line 441: Binary: command not found
         /var/tmp/portage/www-client/firefox-52.0.2/temp/environment: line 441: Binary: command not found
         /var/tmp/portage/www-client/firefox-52.0.2/temp/environment: line 441: Binary: command not found
         /var/tmp/portage/x11-libs/vte-0.46.1/temp/environment: line 203: Binary: command not found
         /var/tmp/portage/dev-cpp/gtkmm-3.22.0/temp/environment: line 202: Binary: command not found
         /var/tmp/portage/dev-util/anjuta-3.22.0/temp/environment: line 286: Binary: command not found
         /var/tmp/portage/gnome-extra/gnome-calculator-3.22.3/temp/environment: line 196: Binary: command not found
         /var/tmp/portage/sys-apps/gnome-disk-utility-3.22.1/temp/environment: line 209: Binary: command not found
         /var/tmp/portage/gnome-base/gvfs-1.30.3/temp/environment: line 276: Binary: command not found
         /var/tmp/portage/gnome-extra/gnome-calculator-3.22.3/temp/environment: line 196: Binary: command not found

siebenmann · 2017-04-21T06:28:24Z

I just checked and the filesystems that this happens most readily on for me have the default of atime=on and relatime=off, which I believe means that the inode is dirtied (by an atime update) the moment that I read() from it. This makes the timing stuff I saw sound like it's really 'am I doing the SEEK_HOLE check before the txg is closed out', which by default is every five seconds. This would also explain why the read() is necessary; if there is no read the inode is clean.

As a test of this I temporarily set relatime to on on my filesystem, and this immediately made my test program work reliably. Setting relatime back to the default made it not work again.

bunder2015 · 2017-04-21T06:33:48Z

FWIW, all my pools run with both atime and relatime turned off. relatime gets turned on "temporarily" on the bootfs dataset at boot, which I turn off, and /var/tmp/portage resides on another dataset with both options turned off.

siebenmann · 2017-04-21T16:39:41Z

Now that I look at the code in zfs_holey_common() (in module/zfs/zfs_vnops.c), I don't see how it can possibly work correctly in the case where dmu_offset_next() returns EBUSY because the dnode is dirty, at least for SEEK_HOLE. Based on the comment at the start of dmu_offset_next(), it seems to be the intent that if a dnode is dirty, dmu_offset_next() would report the hole position as the end of the file (which is correct based on the manpage), but this is not what actually happens; instead the code just returns EBUSY without adjusting the off parameter, which remains at whatever initial offset was supplied to lseek().

I suspect that the correct fix is something like the following change for EBUSY handling in zfs_holey_common():

if (error == EBUSY) {
    error = 0;
   if (cmd == SEEK_HOLE)
       noff = file_sz;
}

This explicitly sets the offset that will be returned to the end of the file for the dnode-is-dirty case, basically reporting the dnode as having no holes per the comment before dmu_offset_next(). (A full fix would update the code comments as well.)

behlendorf · 2017-04-21T16:47:35Z

@siebenmann right, in the EBUSY case we can't rely on noff being updated and we need something like the check your proposing. Let's get @dbavatar's thoughts.

dbavatar · 2017-04-21T17:22:57Z

@siebenmann You're right, it's doing the wrong thing in that case. I will submit a change.

dbavatar · 2017-04-21T17:50:09Z

@siebenmann are you able to test dbavatar@4f21a00 ?

siebenmann · 2017-04-21T17:59:52Z

I'm afraid I won't be able to test this in the near future, although it looks good to me.

(Reproduction and testing should be relatively simple in a test environment, since I believe that all you need is a filesystem with atime on and relatime off, then you can use eg the Python example in my earlier comments to test the lseek() return values.)

behlendorf · 2017-04-21T20:59:38Z

@dbavatar when you get a chance could you open a PR with the proposed fix. It'd be great to add the test case as well to the test suite.

dbavatar · 2017-04-21T21:41:55Z

PR is in, but haven't been successful in hitting that path yet.

ryao · 2017-04-22T16:29:50Z

@bunder2015 Would you test dbavatar/zfs@4f21a00?

@dbavatar You could download a Gentoo stage3, put it in a chroot and try emerge =sys-devel/gcc-5.4.0-r3 after unkeywording it. That should reproduce what @bunder2015 said happens. It is not a minimal test case by any means though.

bunder2015 · 2017-04-23T00:27:27Z

Looks good to me now, thanks

behlendorf · 2017-04-24T16:39:02Z

Thanks everyone. The fix in #6053 has been merged.

bunder2015 · 2017-05-17T03:21:22Z

It looks like this is back in zfs master 0.7.0-rc4_12_g3d6da72. Using zfs_dmu_offset_next_sync=1 restores regular behaviour.

/var/tmp/portage/gnome-extra/libgda-5.2.4-r2/temp/environment: line 283: Binary: command not found

dbavatar · 2017-05-17T16:24:37Z

Ugh. Well I guess I will have to figure out how to reproduce. I wonder if this is a problem with the new algorithm or that code would fail anyway if this functionality was disabled entirely, because it really should not in that case.

behlendorf · 2017-05-17T17:35:39Z

Are we sure this is the same issue? We should try and write a little torture test for the ZTS. It could create a bunch of threads which occasionally dirty and lseek(SEEK_HOLE) a shared file looking for incorrectly reported holes. Perhaps something similar in spirit to the existing threadsappend.c test.

bunder2015 · 2017-05-18T12:05:37Z

I set the tunable back to 0 yesterday and ran some more updates, I can't reproduce it now. I'll keep an eye on it for now. Sorry for the noise.

edit: I'd be willing to test anything you can pass along as well.

dbavatar mentioned this issue Apr 21, 2017

Fix lseek result when dnode is dirty #6053

Merged

12 tasks

ryao added Type: Regression Indicates a functional regression Status: Work in Progress Not yet ready for general review labels Apr 24, 2017

ryao added this to the 0.7.0 milestone Apr 24, 2017

behlendorf closed this as completed in 6e03ec4 Apr 24, 2017

bunder2015 mentioned this issue May 19, 2017

empty files being written instead of data #6144

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Periodic error with SEEK_HOLE results (likely timing-related) in recent git version(s) #6050

Periodic error with SEEK_HOLE results (likely timing-related) in recent git version(s) #6050

siebenmann commented Apr 21, 2017

loli10K commented Apr 21, 2017

siebenmann commented Apr 21, 2017

bunder2015 commented Apr 21, 2017 •

edited

Loading

siebenmann commented Apr 21, 2017

bunder2015 commented Apr 21, 2017

siebenmann commented Apr 21, 2017

behlendorf commented Apr 21, 2017

dbavatar commented Apr 21, 2017

dbavatar commented Apr 21, 2017 •

edited

Loading

siebenmann commented Apr 21, 2017

behlendorf commented Apr 21, 2017

dbavatar commented Apr 21, 2017

ryao commented Apr 22, 2017 •

edited

Loading

bunder2015 commented Apr 23, 2017

behlendorf commented Apr 24, 2017

bunder2015 commented May 17, 2017

dbavatar commented May 17, 2017

behlendorf commented May 17, 2017

bunder2015 commented May 18, 2017 •

edited

Loading

Periodic error with SEEK_HOLE results (likely timing-related) in recent git version(s) #6050

Periodic error with SEEK_HOLE results (likely timing-related) in recent git version(s) #6050

Comments

siebenmann commented Apr 21, 2017

System information

Describe the problem you're observing

loli10K commented Apr 21, 2017

siebenmann commented Apr 21, 2017

bunder2015 commented Apr 21, 2017 • edited Loading

siebenmann commented Apr 21, 2017

bunder2015 commented Apr 21, 2017

siebenmann commented Apr 21, 2017

behlendorf commented Apr 21, 2017

dbavatar commented Apr 21, 2017

dbavatar commented Apr 21, 2017 • edited Loading

siebenmann commented Apr 21, 2017

behlendorf commented Apr 21, 2017

dbavatar commented Apr 21, 2017

ryao commented Apr 22, 2017 • edited Loading

bunder2015 commented Apr 23, 2017

behlendorf commented Apr 24, 2017

bunder2015 commented May 17, 2017

dbavatar commented May 17, 2017

behlendorf commented May 17, 2017

bunder2015 commented May 18, 2017 • edited Loading

bunder2015 commented Apr 21, 2017 •

edited

Loading

dbavatar commented Apr 21, 2017 •

edited

Loading

ryao commented Apr 22, 2017 •

edited

Loading

bunder2015 commented May 18, 2017 •

edited

Loading