Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dereference of dataset_t after dropping reference #14522

Closed
mmaybee opened this issue Feb 23, 2023 · 3 comments · Fixed by #14523
Closed

Dereference of dataset_t after dropping reference #14522

mmaybee opened this issue Feb 23, 2023 · 3 comments · Fixed by #14523
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@mmaybee
Copy link
Contributor

mmaybee commented Feb 23, 2023

System information

Type Version/Name
Distribution Name
Distribution Version
Kernel Version
Architecture
OpenZFS Version after 34ce4c4

Describe the problem you're observing

With a recent change to dsl_dataset_sync_done() (in 34ce4c4), a dereference to ds was added after the call to dmu_buf_rele() (for the ds-ds_dbuf) at the end of the function. Since the dataset_t can be freed after the last rele, this can lead to a reference to freed memory (and a panic).

The call to dmu_buf_rele() needs to be moved to the end of the function... or, better yet, perhaps moved into the caller (since the ds is being passed in).

Include any warning/errors/backtraces from the system logs

[11457.108930]: VERIFY0(0 == dnode_hold(mos, object, FTAG, &dn)) failed (0 == 22)
[11457.112832]: PANIC at dmu_object.c:465:dmu_object_zapify()
[11457.116154]: Kernel panic - not syncing: VERIFY0(0 == dnode_hold(mos, object, FTAG, &dn)) failed (0 == 22)
[11457.121852]: CPU: 0 PID: 849805 Comm: txg_sync Kdump: loaded Tainted: P           OE     5.4.0-1096-dx2023021019-60d1a250c-aws #104
[11457.128811]: Hardware name: Amazon EC2 t3.large/, BIOS 1.0 10/16/2017Rk

[11457.132661]: Call Trace:
[11457.134303]:  dump_stack+0x6d/0x8b
[11457.136431]:  panic+0x101/0x2e3
[11457.140412]:  spl_panic+0xcf/0x102 [spl]
[11457.178926]:  dmu_object_zapify+0xec/0x100 [zfs]
[11457.183876]:  dsl_dataset_activate_feature+0x96/0x1a0 [zfs]
[11457.189312]:  dsl_dataset_sync_done+0xe2/0x140 [zfs]
[11457.194435]:  dsl_pool_sync+0x272/0x400 [zfs]Jk

[11457.199212]:  spa_sync_iterate_to_convergence+0xf1/0x250 [zfs]
[11457.204787]:  spa_sync+0x31c/0x6c0 [zfs]
[11457.209297]:  txg_sync_thread+0x22d/0x2d0 [zfs]
[11457.214176]:  ? txg_dispatch_callbacks+0xf0/0xf0 [zfs]
[11457.219266]:  thread_generic_wrapper+0x83/0xa0 [spl]
[11457.224219]:  kthread+0x104/0x140
@mmaybee mmaybee added the Type: Defect Incorrect behavior (e.g. crash, hang) label Feb 23, 2023
@mmaybee
Copy link
Contributor Author

mmaybee commented Feb 23, 2023

@gamanakis Could you take a look at this please?

@gamanakis
Copy link
Contributor

@mmaybee thanks for reporting this. I think it may be best to move the feature activation code before the dmu_buf_rele().

@gamanakis
Copy link
Contributor

gamanakis commented Feb 23, 2023

Took a look at the callers of dsl_dataset_sync_done(), it may make more sense to move dmu_buf_rele() after dsl_dataset_sync_done() there, as you suggested. Will prepare a PR.

mmaybee pushed a commit that referenced this issue Feb 24, 2023
Otherwise the dataset may be freed after the last dmu_buf_rele() leading
to a panic.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14522
Closes #14523
gamanakis added a commit to gamanakis/zfs that referenced this issue Feb 27, 2023
Otherwise the dataset may be freed after the last dmu_buf_rele() leading
to a panic.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes openzfs#14522
Closes openzfs#14523
behlendorf pushed a commit that referenced this issue Mar 1, 2023
Otherwise the dataset may be freed after the last dmu_buf_rele() leading
to a panic.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14522
Closes #14523
lundman pushed a commit to openzfsonwindows/openzfs that referenced this issue Mar 3, 2023
Otherwise the dataset may be freed after the last dmu_buf_rele() leading
to a panic.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes openzfs#14522
Closes openzfs#14523
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants