Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Periodic error with SEEK_HOLE results (likely timing-related) in recent git version(s) #6050

Closed
siebenmann opened this issue Apr 21, 2017 · 19 comments
Labels
Status: Work in Progress Not yet ready for general review Type: Regression Indicates a functional regression
Milestone

Comments

@siebenmann
Copy link
Contributor

System information

Type Version/Name
Distribution Name Fedora
Distribution Version 25
Linux Kernel 4.10.10-200.fc25.x86_64
Architecture x86_64
ZFS Version v0.7.0-rc3_207_g06226b5
SPL Version v0.7.0-rc3 commit 481762f

Describe the problem you're observing

Sometimes a SEEK_HOLE lseek() will report that a file has a hole when it doesn't. The direct manifestation of the problem is that this causes grep to believe that a text file is a binary file and so report just 'Binary file WHATEVER matches'. This appears to be quite dependent on the specific operations involved and the exact timing (when I ran grep under gdb and was stepping through the code in question it didn't reproduce, although I could see it in strace), and even on the same files it doesn't always happen.

For an affected file, I can reproduce this in the Python repl:

>>> import os
>>> fd = os.open("testfile", os.O_RDONLY)
>>> r = os.read(fd, 32768); sr = os.lseek(fd, 32768, 4); print sr
32768
>>> os.stat("testfile").st_size
33005

This file doesn't have any holes, and if you don't read() or add enough of a pause the results are different:

>>> fd = os.open("testfile", os.O_RDONLY)
>>> sr = os.lseek(fd, 32768, 4); print sr
33005

Inserting a three second sleep between the read and the lseek will sometimes make this manifest and other times not. Four seconds seems to be a sufficient delay to make it reliably return the correct results. Short waits reliably make it return the wrong results.

This doesn't appear tied to the actual file object in question. On the most consistent reproducer, I can copy the file (including with cat) and the copy has the problem as well. On some files this seems extremely consistent; on others it comes and goes; I can run a test command and on consecutive runs it will report:

smtplog: hole 32768 st_size 74737226
smtplog: hole 74737226 st_size 74737226

This may have something to do with the file size. I can readily reproduce this with a completely synthetic file of length 33005, created in Python's REPL with:

>>> w = open("testfile", "w")
>>> w.write("a" * 33005)
>>> w.close()

This reproduces across two different Fedora machines, although they have the same git near-tip ZFS versions.

@loli10K
Copy link
Contributor

loli10K commented Apr 21, 2017

Related to 66aca24? @siebenmann can you try and reproduce this same behaviour you're describing on a machine with the tunable zfs_dmu_offset_next_sync set to "1"?

@siebenmann
Copy link
Contributor Author

This doesn't seem to reproduce at all with zfs_dmu_offset_next_sync set to 1. In fact if I toggle zfs_dmu_offset_next_sync from 0 to 1 and back again I can make both grep and my test program flip between working correctly and reporting the problem.

@bunder2015
Copy link
Contributor

bunder2015 commented Apr 21, 2017

I've been chasing this bug in circles all day, at first I thought it was a problem with Gentoo, then a problem with grep, then libtool... Setting zfs_dmu_offset_next_sync to 1 fixed me.

edit: this also appears to be able to fix my inability to compile gcc 5.4.0-r3.

arrandale ~ # grep Binary /var/log/portage/elog/summary.log
         /var/tmp/portage/dev-db/sqlite-3.17.0/temp/environment: line 190: Binary: command not found
         /var/tmp/portage/sys-devel/gettext-0.19.8.1/temp/environment: line 188: Binary: command not found
         /var/tmp/portage/dev-libs/glib-2.50.3-r1/temp/environment: line 273: Binary: command not found
         /var/tmp/portage/dev-libs/libsigc++-2.10.0/temp/environment: line 173: Binary: command not found
         /var/tmp/portage/net-libs/gnutls-3.3.26/temp/environment: line 208: Binary: command not found
         /var/tmp/portage/net-libs/gnutls-3.3.26/temp/environment: line 208: Binary: command not found
         /var/tmp/portage/dev-cpp/glibmm-2.50.0/temp/environment: line 179: Binary: command not found
         /var/tmp/portage/dev-libs/libpcre2-10.22/temp/environment: line 163: Binary: command not found
         /var/tmp/portage/media-libs/libsamplerate-0.1.9/temp/environment: line 177: Binary: command not found
         /var/tmp/portage/net-libs/glib-networking-2.50.0/temp/environment: line 204: Binary: command not found
         /var/tmp/portage/media-libs/mesa-13.0.5/temp/environment: line 393: Binary: command not found
         /var/tmp/portage/dev-libs/gobject-introspection-1.50.0/temp/environment: line 202: Binary: command not found
         /var/tmp/portage/x11-libs/gdk-pixbuf-2.36.5/temp/environment: line 206: Binary: command not found
         /var/tmp/portage/dev-libs/atk-2.22.0/temp/environment: line 181: Binary: command not found
         /var/tmp/portage/media-libs/harfbuzz-1.4.5/temp/environment: line 183: Binary: command not found
         /var/tmp/portage/net-libs/libsoup-2.56.0/temp/environment: line 207: Binary: command not found
         /var/tmp/portage/dev-python/pygobject-3.22.0/temp/environment: line 226: Binary: command not found
         /var/tmp/portage/x11-themes/gnome-themes-standard-3.22.3/temp/environment: line 176: Binary: command not found
         /var/tmp/portage/dev-util/gtk-update-icon-cache-3.22.2/temp/environment: line 179: Binary: command not found
         /var/tmp/portage/sys-apps/util-linux-2.28.2/temp/environment: line 196: Binary: command not found
         /var/tmp/portage/app-accessibility/at-spi2-core-2.22.1/temp/environment: line 200: Binary: command not found
         /var/tmp/portage/app-accessibility/at-spi2-atk-2.22.0/temp/environment: line 191: Binary: command not found
         /var/tmp/portage/x11-libs/gtk+-3.22.11/temp/environment: line 334: Binary: command not found
         /var/tmp/portage/dev-libs/json-glib-1.2.8/temp/environment: line 183: Binary: command not found
         /var/tmp/portage/x11-libs/gtksourceview-3.22.2/temp/environment: line 197: Binary: command not found
         /var/tmp/portage/dev-libs/appstream-glib-0.6.8/temp/environment: line 206: Binary: command not found
         /var/tmp/portage/dev-libs/gdl-3.22.0/temp/environment: line 180: Binary: command not found
         /var/tmp/portage/www-client/firefox-52.0.2/temp/environment: line 441: Binary: command not found
         /var/tmp/portage/www-client/firefox-52.0.2/temp/environment: line 441: Binary: command not found
         /var/tmp/portage/www-client/firefox-52.0.2/temp/environment: line 441: Binary: command not found
         /var/tmp/portage/www-client/firefox-52.0.2/temp/environment: line 441: Binary: command not found
         /var/tmp/portage/x11-libs/vte-0.46.1/temp/environment: line 203: Binary: command not found
         /var/tmp/portage/dev-cpp/gtkmm-3.22.0/temp/environment: line 202: Binary: command not found
         /var/tmp/portage/dev-util/anjuta-3.22.0/temp/environment: line 286: Binary: command not found
         /var/tmp/portage/gnome-extra/gnome-calculator-3.22.3/temp/environment: line 196: Binary: command not found
         /var/tmp/portage/sys-apps/gnome-disk-utility-3.22.1/temp/environment: line 209: Binary: command not found
         /var/tmp/portage/gnome-base/gvfs-1.30.3/temp/environment: line 276: Binary: command not found
         /var/tmp/portage/gnome-extra/gnome-calculator-3.22.3/temp/environment: line 196: Binary: command not found

@siebenmann
Copy link
Contributor Author

I just checked and the filesystems that this happens most readily on for me have the default of atime=on and relatime=off, which I believe means that the inode is dirtied (by an atime update) the moment that I read() from it. This makes the timing stuff I saw sound like it's really 'am I doing the SEEK_HOLE check before the txg is closed out', which by default is every five seconds. This would also explain why the read() is necessary; if there is no read the inode is clean.

As a test of this I temporarily set relatime to on on my filesystem, and this immediately made my test program work reliably. Setting relatime back to the default made it not work again.

@bunder2015
Copy link
Contributor

FWIW, all my pools run with both atime and relatime turned off. relatime gets turned on "temporarily" on the bootfs dataset at boot, which I turn off, and /var/tmp/portage resides on another dataset with both options turned off.

@siebenmann
Copy link
Contributor Author

Now that I look at the code in zfs_holey_common() (in module/zfs/zfs_vnops.c), I don't see how it can possibly work correctly in the case where dmu_offset_next() returns EBUSY because the dnode is dirty, at least for SEEK_HOLE. Based on the comment at the start of dmu_offset_next(), it seems to be the intent that if a dnode is dirty, dmu_offset_next() would report the hole position as the end of the file (which is correct based on the manpage), but this is not what actually happens; instead the code just returns EBUSY without adjusting the off parameter, which remains at whatever initial offset was supplied to lseek().

I suspect that the correct fix is something like the following change for EBUSY handling in zfs_holey_common():

if (error == EBUSY) {
    error = 0;
   if (cmd == SEEK_HOLE)
       noff = file_sz;
}

This explicitly sets the offset that will be returned to the end of the file for the dnode-is-dirty case, basically reporting the dnode as having no holes per the comment before dmu_offset_next(). (A full fix would update the code comments as well.)

@behlendorf
Copy link
Contributor

@siebenmann right, in the EBUSY case we can't rely on noff being updated and we need something like the check your proposing. Let's get @dbavatar's thoughts.

@dbavatar
Copy link
Contributor

@siebenmann You're right, it's doing the wrong thing in that case. I will submit a change.

@dbavatar
Copy link
Contributor

dbavatar commented Apr 21, 2017

@siebenmann are you able to test dbavatar@4f21a00 ?

@siebenmann
Copy link
Contributor Author

I'm afraid I won't be able to test this in the near future, although it looks good to me.

(Reproduction and testing should be relatively simple in a test environment, since I believe that all you need is a filesystem with atime on and relatime off, then you can use eg the Python example in my earlier comments to test the lseek() return values.)

@behlendorf
Copy link
Contributor

@dbavatar when you get a chance could you open a PR with the proposed fix. It'd be great to add the test case as well to the test suite.

@dbavatar
Copy link
Contributor

PR is in, but haven't been successful in hitting that path yet.

@ryao
Copy link
Contributor

ryao commented Apr 22, 2017

@bunder2015 Would you test dbavatar/zfs@4f21a00?

@dbavatar You could download a Gentoo stage3, put it in a chroot and try emerge =sys-devel/gcc-5.4.0-r3 after unkeywording it. That should reproduce what @bunder2015 said happens. It is not a minimal test case by any means though.

@bunder2015
Copy link
Contributor

Looks good to me now, thanks

@ryao ryao added Type: Regression Indicates a functional regression Status: Work in Progress Not yet ready for general review labels Apr 24, 2017
@ryao ryao added this to the 0.7.0 milestone Apr 24, 2017
@behlendorf
Copy link
Contributor

Thanks everyone. The fix in #6053 has been merged.

@bunder2015
Copy link
Contributor

It looks like this is back in zfs master 0.7.0-rc4_12_g3d6da72. Using zfs_dmu_offset_next_sync=1 restores regular behaviour.

/var/tmp/portage/gnome-extra/libgda-5.2.4-r2/temp/environment: line 283: Binary: command not found

@dbavatar
Copy link
Contributor

Ugh. Well I guess I will have to figure out how to reproduce. I wonder if this is a problem with the new algorithm or that code would fail anyway if this functionality was disabled entirely, because it really should not in that case.

@behlendorf
Copy link
Contributor

Are we sure this is the same issue? We should try and write a little torture test for the ZTS. It could create a bunch of threads which occasionally dirty and lseek(SEEK_HOLE) a shared file looking for incorrectly reported holes. Perhaps something similar in spirit to the existing threadsappend.c test.

@bunder2015
Copy link
Contributor

bunder2015 commented May 18, 2017

I set the tunable back to 0 yesterday and ran some more updates, I can't reproduce it now. I'll keep an eye on it for now. Sorry for the noise.

edit: I'd be willing to test anything you can pass along as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Work in Progress Not yet ready for general review Type: Regression Indicates a functional regression
Projects
None yet
Development

No branches or pull requests

6 participants