Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add runtime fallback for _optimized_copyfile(). #647

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rkitover
Copy link

@rkitover rkitover commented Dec 7, 2020

In some situations, as described by this thread on Gentoo forums:

https://forums.gentoo.org/viewtopic-t-1088232-start-0.html

the build system's kernel interface is not compatible with the portage
_optimized_copyfile() implementation.

Check for this specific edge-case, where _optimized_copyfile() fails
with errno=22 (Invalid argument), invoke the fallback function and set a
flag to indicate that following invocations will also call the fallback
function.

Also print a warning message at noiselevel=2 when this happens.

In the case where errno=22 is caused by some other reason which the user
should know about, the fallback function should fail in the same (or
similar) way.

Signed-off-by: Rafael Kitover rkitover@gmail.com

In some situations, as described by this thread on Gentoo forums:

https://forums.gentoo.org/viewtopic-t-1088232-start-0.html

the build system's kernel interface is not compatible with the portage
_optimized_copyfile() implementation.

Check for this specific edge-case, where _optimized_copyfile() fails
with errno=22 (Invalid argument), invoke the fallback function and set a
flag to indicate that following invocations will also call the fallback
function.

Also print a warning message at noiselevel=2 when this happens.

In the case where errno=22 is caused by some other reason which the user
should know about, the fallback function should fail in the same (or
similar) way.

Signed-off-by: Rafael Kitover <rkitover@gmail.com>
@zmedico
Copy link
Member

zmedico commented Dec 7, 2020

I prefer to handle fallbacks like this inside the C code. For example see 58d44d3 and dad9cce. Please strace the failure like in https://bugs.gentoo.org/641088#c8 so that we can see which syscall is failing. We need to understand how and why it fails so that we can respond appropriately. Sometimes the appropriate response involves reporting a filesystem problem like in bug 705536. If we ignore errno 22 without a specific reason, then it's likely to hide problems that should really be fixed. I suppose we could add a way to allow users to configure how specific errors are handled.

@rkitover
Copy link
Author

rkitover commented Dec 7, 2020

The syscall that seems to be failing is copy_file_range(), here is what I hope is the most relevant section of my strace:

863501 mprotect(0x7f5a52321000, 4096, PROT_READ) = 0
863501 stat("/var/tmp/portage/app-admin/stow-2.3.1/image/usr/share/doc/stow-2.3.1/", {st_mode=S_IFDIR|0755, st_size=10, ...}) = 0
863501 stat("README.md", {st_mode=S_IFREG|0644, st_size=4531, ...}) = 0
863501 lstat("README.md", {st_mode=S_IFREG|0644, st_size=4531, ...}) = 0
863501 stat("README.md", {st_mode=S_IFREG|0644, st_size=4531, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/image/usr/share/doc/stow-2.3.1/README.md", {st_mode=S_IFREG|0644, st_size=4531, ...}) = 0
863501 mmap(NULL, 8208, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5a529a5000
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/image/usr/share/doc/stow-2.3.1/README.md", {st_mode=S_IFREG|0644, st_size=4531, ...}) = 0
863501 mmap(NULL, 8208, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5a5231b000
863501 lstat("/var", {st_mode=S_IFDIR|0755, st_size=16, ...}) = 0
863501 lstat("/var/tmp", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=9, ...}) = 0
863501 lstat("/var/tmp/portage", {st_mode=S_IFDIR|0775, st_size=6, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin", {st_mode=S_IFDIR|0775, st_size=4, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1", {st_mode=S_IFDIR|0700, st_size=16, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/image", {st_mode=S_IFDIR|0755, st_size=3, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/image/usr", {st_mode=S_IFDIR|0755, st_size=5, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/image/usr/share", {st_mode=S_IFDIR|0755, st_size=5, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/image/usr/share/doc", {st_mode=S_IFDIR|0755, st_size=3, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/image/usr/share/doc/stow-2.3.1", {st_mode=S_IFDIR|0755, st_size=10, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/image/usr/share/doc/stow-2.3.1/README.md", {st_mode=S_IFREG|0644, st_size=4531, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/image/usr/share/doc/stow-2.3.1/README.md", {st_mode=S_IFREG|0644, st_size=4531, ...}) = 0
863501 munmap(0x7f5a529a5000, 8208)     = 0
863501 munmap(0x7f5a5231b000, 8208)     = 0
863501 unlink("/var/tmp/portage/app-admin/stow-2.3.1/image/usr/share/doc/stow-2.3.1/README.md") = 0
863501 stat("README.md", {st_mode=S_IFREG|0644, st_size=4531, ...}) = 0
863501 mmap(NULL, 8208, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5a529a5000
863501 getcwd("/var/tmp/portage/app-admin/stow-2.3.1/work/stow-2.3.1", 8192) = 54
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/work/stow-2.3.1", {st_mode=S_IFDIR|0755, st_size=29, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/work/stow-2.3.1", {st_mode=S_IFDIR|0755, st_size=29, ...}) = 0
863501 mmap(NULL, 8208, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5a5231b000
863501 getcwd("/var/tmp/portage/app-admin/stow-2.3.1/work/stow-2.3.1", 4096) = 54
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/work/stow-2.3.1/README.md", {st_mode=S_IFREG|0644, st_size=4531, ...}) = 0
863501 munmap(0x7f5a529a5000, 8208)     = 0
863501 munmap(0x7f5a5231b000, 8208)     = 0
863501 openat(AT_FDCWD, "README.md", O_RDONLY|O_CLOEXEC) = 3
863501 fstat(3, {st_mode=S_IFREG|0644, st_size=4531, ...}) = 0
863501 ioctl(3, TCGETS, 0x7ffd95852ac0) = -1 ENOTTY (Неприменимый к данному устройству ioctl)
863501 mmap(NULL, 8208, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5a529a5000
863501 mmap(NULL, 8208, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5a5231b000
863501 lstat("/var", {st_mode=S_IFDIR|0755, st_size=16, ...}) = 0
863501 lstat("/var/tmp", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=9, ...}) = 0
863501 lstat("/var/tmp/portage", {st_mode=S_IFDIR|0775, st_size=6, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin", {st_mode=S_IFDIR|0775, st_size=4, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1", {st_mode=S_IFDIR|0700, st_size=16, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/image", {st_mode=S_IFDIR|0755, st_size=3, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/image/usr", {st_mode=S_IFDIR|0755, st_size=5, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/image/usr/share", {st_mode=S_IFDIR|0755, st_size=5, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/image/usr/share/doc", {st_mode=S_IFDIR|0755, st_size=3, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/image/usr/share/doc/stow-2.3.1", {st_mode=S_IFDIR|0755, st_size=9, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/image/usr/share/doc/stow-2.3.1/README.md", 0x7ffd9584c880) = -1 ENOENT (Нет такого файла или каталога)
863501 mmap(NULL, 4112, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5a52e06000
863501 lstat("/var", {st_mode=S_IFDIR|0755, st_size=16, ...}) = 0
863501 lstat("/var/tmp", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=9, ...}) = 0
863501 lstat("/var/tmp/portage", {st_mode=S_IFDIR|0775, st_size=6, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin", {st_mode=S_IFDIR|0775, st_size=4, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1", {st_mode=S_IFDIR|0700, st_size=16, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/image", {st_mode=S_IFDIR|0755, st_size=3, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/image/usr", {st_mode=S_IFDIR|0755, st_size=5, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/image/usr/share", {st_mode=S_IFDIR|0755, st_size=5, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/image/usr/share/doc", {st_mode=S_IFDIR|0755, st_size=3, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/image/usr/share/doc/stow-2.3.1", {st_mode=S_IFDIR|0755, st_size=9, ...}) = 0
863501 lstat("/var/tmp/portage/app-admin/stow-2.3.1/image/usr/share/doc/stow-2.3.1/README.md", 0x7ffd9584c880) = -1 ENOENT (Нет такого файла или каталога)
863501 munmap(0x7f5a5231b000, 8208)     = 0
863501 munmap(0x7f5a529a5000, 8208)     = 0
863501 munmap(0x7f5a52e06000, 4112)     = 0
863501 openat(AT_FDCWD, "/var/tmp/portage/app-admin/stow-2.3.1/image/usr/share/doc/stow-2.3.1/README.md", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 4
863501 fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
863501 ioctl(4, TCGETS, 0x7ffd95852ac0) = -1 ENOTTY (Неприменимый к данному устройству ioctl)
863501 lseek(3, 0, SEEK_DATA)           = 0
863501 lseek(3, 0, SEEK_HOLE)           = 4531
863501 copy_file_range(3, [0], 4, [0], 4531, 0) = -1 EINVAL (Недопустимый аргумент)
863501 close(4)                         = 0
863501 close(3)                         = 0
863501 getpid()                         = 109

I am on the unstable KDE+systemd profile, just updated last night, linus kernel and git zfs.

My /var/tmp is a zfs file system, here is the information on it:

rkitover@epyc  ➤  zfs get all epyc/gentoo/var/tmp
NAME                 PROPERTY              VALUE                      SOURCE
epyc/gentoo/var/tmp  type                  filesystem                 -
epyc/gentoo/var/tmp  creation              Пн мая 25  0:04 2020  -
epyc/gentoo/var/tmp  used                  88,3G                      -
epyc/gentoo/var/tmp  available             935G                       -
epyc/gentoo/var/tmp  referenced            88,3G                      -
epyc/gentoo/var/tmp  compressratio         1.19x                      -
epyc/gentoo/var/tmp  mounted               yes                        -
epyc/gentoo/var/tmp  quota                 none                       default
epyc/gentoo/var/tmp  reservation           none                       default
epyc/gentoo/var/tmp  recordsize            128K                       default
epyc/gentoo/var/tmp  mountpoint            legacy                     local
epyc/gentoo/var/tmp  sharenfs              off                        default
epyc/gentoo/var/tmp  checksum              sha256                     inherited from epyc
epyc/gentoo/var/tmp  compression           zle                        local
epyc/gentoo/var/tmp  atime                 off                        inherited from epyc
epyc/gentoo/var/tmp  devices               on                         default
epyc/gentoo/var/tmp  exec                  on                         local
epyc/gentoo/var/tmp  setuid                off                        local
epyc/gentoo/var/tmp  readonly              off                        default
epyc/gentoo/var/tmp  zoned                 off                        default
epyc/gentoo/var/tmp  snapdir               hidden                     default
epyc/gentoo/var/tmp  aclmode               discard                    default
epyc/gentoo/var/tmp  aclinherit            restricted                 default
epyc/gentoo/var/tmp  createtxg             2219                       -
epyc/gentoo/var/tmp  canmount              on                         default
epyc/gentoo/var/tmp  xattr                 on                         default
epyc/gentoo/var/tmp  copies                1                          default
epyc/gentoo/var/tmp  version               5                          -
epyc/gentoo/var/tmp  utf8only              off                        -
epyc/gentoo/var/tmp  normalization         none                       -
epyc/gentoo/var/tmp  casesensitivity       sensitive                  -
epyc/gentoo/var/tmp  vscan                 off                        default
epyc/gentoo/var/tmp  nbmand                off                        default
epyc/gentoo/var/tmp  sharesmb              off                        default
epyc/gentoo/var/tmp  refquota              none                       default
epyc/gentoo/var/tmp  refreservation        none                       default
epyc/gentoo/var/tmp  guid                  2185585781077216968        -
epyc/gentoo/var/tmp  primarycache          all                        default
epyc/gentoo/var/tmp  secondarycache        all                        default
epyc/gentoo/var/tmp  usedbysnapshots       0B                         -
epyc/gentoo/var/tmp  usedbydataset         88,3G                      -
epyc/gentoo/var/tmp  usedbychildren        0B                         -
epyc/gentoo/var/tmp  usedbyrefreservation  0B                         -
epyc/gentoo/var/tmp  logbias               latency                    default
epyc/gentoo/var/tmp  objsetid              3465                       -
epyc/gentoo/var/tmp  dedup                 off                        default
epyc/gentoo/var/tmp  mlslabel              none                       default
epyc/gentoo/var/tmp  sync                  standard                   default
epyc/gentoo/var/tmp  dnodesize             legacy                     default
epyc/gentoo/var/tmp  refcompressratio      1.19x                      -
epyc/gentoo/var/tmp  written               88,3G                      -
epyc/gentoo/var/tmp  logicalused           91,2G                      -
epyc/gentoo/var/tmp  logicalreferenced     91,2G                      -
epyc/gentoo/var/tmp  volmode               default                    default
epyc/gentoo/var/tmp  filesystem_limit      none                       default
epyc/gentoo/var/tmp  snapshot_limit        none                       default
epyc/gentoo/var/tmp  filesystem_count      none                       default
epyc/gentoo/var/tmp  snapshot_count        none                       default
epyc/gentoo/var/tmp  snapdev               hidden                     default
epyc/gentoo/var/tmp  acltype               off                        default
epyc/gentoo/var/tmp  context               none                       default
epyc/gentoo/var/tmp  fscontext             none                       default
epyc/gentoo/var/tmp  defcontext            none                       default
epyc/gentoo/var/tmp  rootcontext           none                       default
epyc/gentoo/var/tmp  relatime              on                         temporary
epyc/gentoo/var/tmp  redundant_metadata    all                        default
epyc/gentoo/var/tmp  overlay               on                         default
epyc/gentoo/var/tmp  encryption            off                        default
epyc/gentoo/var/tmp  keylocation           none                       default
epyc/gentoo/var/tmp  keyformat             none                       default
epyc/gentoo/var/tmp  pbkdf2iters           0                          default
epyc/gentoo/var/tmp  special_small_blocks  0                          default

Of note is compression=zle.

I tried running test_copyfile.py with the extension built --inplace and it passes, I will do some more digging to see why.

The strace invocation is: sudo strace -f -o strace.txt emerge app-admin/stow.

@zmedico
Copy link
Member

zmedico commented Dec 7, 2020

We've found zfs bugs before, like bug 635002, and this could be another one. Given those lseek results, the copy_file_range call should succeed.

863501 lseek(3, 0, SEEK_DATA)           = 0
863501 lseek(3, 0, SEEK_HOLE)           = 4531
863501 copy_file_range(3, [0], 4, [0], 4531, 0) = -1 EINVAL (Недопустимый аргумент)

The copy_file_range man page says that these are the reasons for it to fail with EINVAL, and none of them apply here:

EINVAL The flags argument is not 0.

EINVAL fd_in and fd_out refer to the same file and the source and target ranges overlap.

EINVAL Either fd_in or fd_out is not a regular file.

@rkitover
Copy link
Author

rkitover commented Dec 7, 2020

Thank you, my next step will be to reproduce the specific error condition and I will report back here.

@zmedico
Copy link
Member

zmedico commented Dec 17, 2020

Looks like openzfs/zfs#11151.

@rkitover
Copy link
Author

That does look like the exact issue.

I was going to test with an older kernel to make sure that fixes it, but I'm having all kinds of serious problems with my linux install right now.

As far as this pull request goes, I agree that the user should know about breakage on their system, but if you look on those forum posts there are other situations where this problem arises. And it could be argued that installing software is a critical system function and it should be simple to temporarily alleviate these problems, to e.g. install a new version of whatever is causing the problem, for example.

If you would like a different implementation for this, perhaps activated with a command line switch or environment variable, or whatever design would make sense to you, I would be happy to do that.

Happy holidays!

@zmedico
Copy link
Member

zmedico commented Dec 20, 2020

As far as this pull request goes, I agree that the user should know about breakage on their system, but if you look on those forum posts there are other situations where this problem arises. And it could be argued that installing software is a critical system function and it should be simple to temporarily alleviate these problems, to e.g. install a new version of whatever is causing the problem, for example.

True.

If you would like a different implementation for this, perhaps activated with a command line switch or environment variable, or whatever design would make sense to you, I would be happy to do that.

Yes, I would like to add a FEATURES setting to enable the fallback, and would like to display a suggestion to enable this feature as a workaround when _optimized_copyfile fails. We can add a lazily initialized attribute to the portage.data module which will be true when the fallback is enabled via FEATURES setting, and _optimized_copyfile will be able to check the value of that attribute in order to trigger the new behavior. I've opened this bug report to track the issue:

https://bugs.gentoo.org/760929

Happy holidays!

Happy holidays!

@rkitover
Copy link
Author

I fixed my linux install issues and would like to report that I compiled linux kernel 5.9 with zfs stable 2.0.0 and the bug does not manifest.

I will work on this FEATURES setting as you describe it.

@zmedico
Copy link
Member

zmedico commented Dec 21, 2020

Thanks! @RinCat has also expressed interest in a feature like this.

@zmedico
Copy link
Member

zmedico commented Apr 14, 2021

There's a new zfs issue report involving portage here:

openzfs/zfs#11900

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants