Some out of order ZIL tx type not getting synced #8769

Closed
tuxoko opened this issue May 20, 2019 · 0 comments

tuxoko commented May 20, 2019

System information

Type Version/Name
Distribution Name
Distribution Version
Linux Kernel
Architecture
ZFS Version 0.7.13, master
SPL Version

Describe the problem you're observing

Out-of-order tx types such as TX_WRITE and TX_SETATTR are not always synced properly.

Describe how to reproduce the problem

ops2.sh
https://gist.github.com/tuxoko/82b99ef6de0eba240bad7cdedbd5b1bf
This is a randomly generated script that performs various operations and calls sync at the end.

cmp_ops2.sh
https://gist.github.com/tuxoko/df3a1080c9c4cfdb4b3b83139d34ea91
This script compares the contents of two directories.

$ mkdir dir1 dir2
$ ./ops2.sh dir1
$ zpool sync testpool
$ ./ops2.sh dir2 && sudo sh -c "echo b > /proc/sysrq-trigger"
(after reboot)
$ ./cmp_ops2.sh dir1 dir2

The compare script shows many differences in file contents and file mode bits.

file content differs: ./dir.9hBMJBr3/dir.D5Uyl44C/dir.uBaArYlx/dir.yTbGodl9/dir.fMxeVi80/dir.416AMhS4/link.psZFsTWJ
fa50e2c4537e0997f7f052e483d125de
bad42a2f4688c8a664a2cc0dd004e78e
file content differs: ./dir.9hBMJBr3/dir.D5Uyl44C/dir.uBaArYlx/dir.yTbGodl9/dir.fMxeVi80/dir.rYjUNpSx/dir.wgxu8KAe/link.2rF9wws5
6506e52d1cb7ef8badcaa2bcdff705fa
f397833d3d80bacebe04d731e761afda
File mode differs: ./dir.9hBMJBr3/dir.D5Uyl44C/dir.uBaArYlx/dir.yTbGodl9/dir.lENocird/link.j1zkEFhr
8183
8188
file content differs: ./dir.9hBMJBr3/dir.D5Uyl44C/dir.uBaArYlx/dir.yTbGodl9/dir.O2zEc7qN/dir.SXUgfqtd/dir.aMhFYqY0/link.BY7vedN7
44b0356861ea8ddb58746af49d5e13aa
e3cccc523b8c947dfaf42ac97ea9a303
File mode differs: ./dir.9hBMJBr3/dir.D5Uyl44C/dir.uBaArYlx/dir.yTbGodl9/dir.O2zEc7qN/dir.SXUgfqtd/dir.K4N4sGEQ/file.gJODZsVd
8183
81bd
file content differs: ./dir.9hBMJBr3/dir.D5Uyl44C/dir.uBaArYlx/dir.yTbGodl9/dir.O2zEc7qN/dir.SXUgfqtd/dir.K4N4sGEQ/file.gJODZsVd
391b6ae9002c5386806ac5a59d86a5a5
897316929176464ebc9ad085f31e7284
file content differs: ./dir.9hBMJBr3/dir.D5Uyl44C/dir.uBaArYlx/dir.yTbGodl9/file.aC2p0wOg
a03cde747dea08b70857d4c4c14105b0
a4a590da7305d623b10606d6bac181fe
File mode differs: ./dir.9hBMJBr3/dir.D5Uyl44C/dir.uBaArYlx/dir.yTbGodl9/link.3CmTukN6
8187
81a3
file content differs: ./dir.9hBMJBr3/dir.NFO3TJr5/dir.mz9yFR3k/dir.JHUVYRi2/dir.1wB5zQiJ/link.H2PHDW0U
6506e52d1cb7ef8badcaa2bcdff705fa
f397833d3d80bacebe04d731e761afda
file content differs: ./dir.9hBMJBr3/dir.NFO3TJr5/link.aK6bDBG2
6506e52d1cb7ef8badcaa2bcdff705fa
f397833d3d80bacebe04d731e761afda
file content differs: ./dir.9hBMJBr3/dir.ZAVGa9rf/dir.2bHFOySm/link.6HslzwWy
6506e52d1cb7ef8badcaa2bcdff705fa
f397833d3d80bacebe04d731e761afda
file content differs: ./dir.9hBMJBr3/dir.ZAVGa9rf/dir.P5GuFEP6/link.7Dpt0gAq
a03cde747dea08b70857d4c4c14105b0
a4a590da7305d623b10606d6bac181fe
file content differs: ./dir.9hBMJBr3/link.UjjJGPBX
44b0356861ea8ddb58746af49d5e13aa
e3cccc523b8c947dfaf42ac97ea9a303
File mode differs: ./dir.fOfmh0Ox/link.S1DZo8bZ
81b7
81b4

Note: this can also be tested using zpool freeze.
However, when I tried that on the master branch, it reproduced the problem but also corrupted the pool in such a way that a second attempt triggers an assert on import.

Include any warning/errors/backtraces from the system logs

@behlendorf behlendorf added the Type: Defect Incorrect behavior (e.g. crash, hang) label May 24, 2019
tuxoko pushed a commit to tuxoko/zfs that referenced this issue Jul 18, 2019
We should only call zil_remove_async when an object is removed. However,
in current implementation, it is called whenever TX_REMOVE is called. In
the case of hardlinked file, every unlink will generate TX_REMOVE and
causing operations to be dropped even when the object is not removed.

We fix this by only calling zil_remove_async when the file is fully
unlinked.

Closes openzfs#8769
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
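
The effect described in this commit message can be illustrated with a toy model. The following is only a sketch, not the ZFS code: every name prefixed with `toy_` is hypothetical, and the "async list" is reduced to a small array attached to one object. It shows why dropping queued records on every remove loses writes and setattrs on a file that still has another hard link.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not the ZFS sources. */
enum toy_txtype { TOY_TX_WRITE, TOY_TX_SETATTR };

#define	TOY_MAX_ITX	16

struct toy_object {
	unsigned nlink;			/* remaining hard links */
	size_t n_async;			/* queued out-of-order records */
	enum toy_txtype async[TOY_MAX_ITX];
};

/* Queue an out-of-order record (a write or setattr) for the object. */
void toy_log_async(struct toy_object *obj, enum toy_txtype t)
{
	assert(obj->n_async < TOY_MAX_ITX);
	obj->async[obj->n_async++] = t;
}

/* Stand-in for discarding the object's queued async records. */
void toy_remove_async(struct toy_object *obj)
{
	obj->n_async = 0;
}

/*
 * Unlink one name of the object.
 * only_when_unlinked == false: discard on every remove (the behavior
 * the commit message describes as the bug).
 * only_when_unlinked == true: discard only once the last link is gone.
 */
void toy_unlink(struct toy_object *obj, bool only_when_unlinked)
{
	assert(obj->nlink > 0);
	obj->nlink--;
	bool fully_unlinked = (obj->nlink == 0);
	if (!only_when_unlinked || fully_unlinked)
		toy_remove_async(obj);
}

/* Records that survive after removing one of two hard links. */
size_t toy_surviving_records(bool only_when_unlinked)
{
	struct toy_object obj = { .nlink = 2, .n_async = 0 };
	toy_log_async(&obj, TOY_TX_WRITE);
	toy_log_async(&obj, TOY_TX_SETATTR);
	toy_unlink(&obj, only_when_unlinked);
	return (obj.n_async);
}
```

In this model `toy_surviving_records(false)` returns 0, because both queued records are discarded even though the object still has a link, while `toy_surviving_records(true)` returns 2.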
c0d3z3r0 added a commit to c0d3z3r0/zfs that referenced this issue Jul 26, 2019
…dlink

commit 2fb3d38
Author: Chunwei Chen <david.chen@nutanix.com>
Date:   Thu Jul 18 10:18:20 2019 -0700

    Fix out-of-order ZIL txtype lost on hardlinked files

    We should only call zil_remove_async when an object is removed. However,
    in current implementation, it is called whenever TX_REMOVE is called. In
    the case of hardlinked file, every unlink will generate TX_REMOVE and
    causing operations to be dropped even when the object is not removed.

    We fix this by only calling zil_remove_async when the file is fully
    unlinked.

    Closes openzfs#8769
    Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Aug 21, 2019
We should only call zil_remove_async when an object is removed. However,
in current implementation, it is called whenever TX_REMOVE is called. In
the case of hardlinked file, every unlink will generate TX_REMOVE and
causing operations to be dropped even when the object is not removed.

We fix this by only calling zil_remove_async when the file is fully
unlinked.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes openzfs#8769
Closes openzfs#9061
tonyhutter pushed a commit that referenced this issue Sep 26, 2019
We should only call zil_remove_async when an object is removed. However,
in current implementation, it is called whenever TX_REMOVE is called. In
the case of hardlinked file, every unlink will generate TX_REMOVE and
causing operations to be dropped even when the object is not removed.

We fix this by only calling zil_remove_async when the file is fully
unlinked.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #8769
Closes #9061