
OOM Panic leading to unmountable zvol #14914

Open
fake-name opened this issue May 30, 2023 · 7 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@fake-name

fake-name commented May 30, 2023

System information

Type                  Version/Name
Distribution Name     Linux
Distribution Version  Debian GNU/Linux 11 / Proxmox 7.4-3
Kernel Version        SMP PVE 5.15.107-2 (2023-05-10T09:10Z)
Architecture          x86_64
OpenZFS Version       zfs-2.1.11-pve1, zfs-kmod-2.1.11-pve1

Describe the problem you're observing

I have a ~72.8 TB 8-disk Raid-Z2.

The zvol was about 80% full, and then an out-of-control process created a single 10+ TB file, which ran the pool completely out of space.

I then attempted to delete the file, and the system hosting the zpool OOMed and crashed.

Exploring the system, I can import the pool as long as I do not mount the zvol. If I do mount the dataset, some portion of the ZFS kernel module consumes all the RAM in the system in about 3 seconds (the host has 32 GB of RAM), and then the machine kernel panics due to OOM.
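
Roughly the sequence in question (zpool import's -N flag skips mounting datasets):

zpool import -N taank    # import without mounting anything; the system stays up
zfs mount taank          # mounting is the step that OOMs the machine within seconds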

I cannot easily get better logs because the machine goes from working fine to completely hosed in about 3 seconds.

There are no snapshots or anything else in this pool. It's just a single pool with a single zvol that takes the entire pool's space. No L2ARC, etc...

What I have tried to fix the issue:

  • Adding a bunch of swap:
    Whatever is using the RAM, it cannot use swap.
  • Semi-random futzing with the settings in /sys/module/zfs/parameters, reducing anything that seemed to have a high memory limit to a few GB, and putting limits on any parameters that were set to zero (see the sketch after this list):
    No change in behaviour.
  • zpool import --rewind-to-checkpoint taank:
    This caused the pool to need a scrub, which completed with some file corruption I can deal with. This was at the point where I was experimenting with any command that seemed like it might help. The rewind did change the pool from 100% full (0 bytes free) to 99% full (~500 GB free), but did not resolve the crash.
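
The parameter futzing was along these lines; zfs_arc_max is just an illustrative example, since I don't have a record of which parameters I actually touched:

# Cap a ZFS module parameter at runtime (example value: 4 GiB).
# zfs_arc_max is illustrative, not necessarily one of the parameters changed.
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max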

All of this is hard to reconstruct, since the OOM deaths mean I don't have .bash_history entries for the changes I made before each crash.

Describe how to reproduce the problem

I'm not certain, but I think:

  • Create a large (?) zvol with lots of files of assorted sizes, until it is ~80% full.
  • Create a single large file that runs the pool up to 100% full.
  • Delete that file.
  • Crash?

I cannot easily replicate the issue, since I do not have another system to experiment with. This is a personal machine, so I don't have a spare (or the ability to just nuke everything and restore from backup).
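
If someone with scratch hardware wants to try, a file-backed throwaway pool might be enough. This is a hypothetical, untested sketch, and the sizes needed to actually trigger the bug are a guess:

# Hypothetical, untested reproduction attempt on a throwaway file-backed pool.
truncate -s 1T /var/tmp/zd0 /var/tmp/zd1 /var/tmp/zd2 /var/tmp/zd3
zpool create testpool raidz2 /var/tmp/zd0 /var/tmp/zd1 /var/tmp/zd2 /var/tmp/zd3
# ...fill /testpool to ~80% with files of assorted sizes, then:
dd if=/dev/urandom of=/testpool/bigfile bs=1M    # run until the pool hits ENOSPC
rm /testpool/bigfile                             # the delete is what seems to trigger the blowup
zpool destroy testpool                           # cleanup, if the box survives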

Misc:

root@nastwo:~# zpool status
  pool: taank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 2.55M in 2 days 08:37:29 with 43 errors on Tue May 23 06:20:58 2023
config:

        NAME                        STATE     READ WRITE CKSUM
        taank                       ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            wwn-0x5000cca264c6f926  ONLINE       0     0     0
            wwn-0x5000cca264c7b40a  ONLINE       0     0     0
            wwn-0x5000cca26a3ee9fc  ONLINE       0     0     0
            wwn-0x5000cca26a406564  ONLINE       0     0     0
            wwn-0x5000cca26a40902c  ONLINE       0     0     0
            wwn-0x5000cca26a409f88  ONLINE       0     0     0
            wwn-0x5000cca264c3006b  ONLINE       0     0     0
            wwn-0x5000cca264c4843c  ONLINE       0     0     0

root@nastwo:~# zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
taank  72.8T  72.3T   489G        -         -    72%    99%  1.00x    ONLINE  -

root@nastwo:~# zfs list
NAME    USED  AVAIL     REFER  MOUNTPOINT
taank  51.4T   219G     51.4T  /media/new_store


I'm not sure where to go from here; I have no experience debugging this kind of ZFS issue.

Would this constitute a security issue? It seems like any unprivileged user who can write a file to a ZFS pool can therefore brick the pool. I can't see how that wouldn't be a major concern.

Possibly related: #6783

@fake-name fake-name added the Type: Defect Incorrect behavior (e.g. crash, hang) label May 30, 2023
@fake-name
Author

fake-name commented May 30, 2023

Note that I was just able to replicate this on FreeBSD 13.2-RELEASE (booted off a live USB drive) as well, so the issue does not appear to be Linux-specific.

@rincebrain
Contributor

That's not a zvol, that's a filesystem. zvols don't have mountpoints, they're block devices.

zfs get all taank might be useful.

@fake-name
Author

That's not a zvol, that's a filesystem. zvols don't have mountpoints, they're block devices.

Whoops, I thought a zvol was a mountable region in a pool.

zfs get all taank might be useful.

root@nastwo:~# zpool get all taank
NAME   PROPERTY                       VALUE                          SOURCE
taank  size                           72.8T                          -
taank  capacity                       99%                            -
taank  altroot                        -                              default
taank  health                         ONLINE                         -
taank  guid                           1249401655927352147            -
taank  version                        -                              default
taank  bootfs                         -                              default
taank  delegation                     on                             default
taank  autoreplace                    off                            default
taank  cachefile                      -                              default
taank  failmode                       wait                           default
taank  listsnapshots                  off                            default
taank  autoexpand                     off                            default
taank  dedupratio                     1.00x                          -
taank  free                           489G                           -
taank  allocated                      72.3T                          -
taank  readonly                       off                            -
taank  ashift                         0                              default
taank  comment                        -                              default
taank  expandsize                     -                              -
taank  freeing                        0                              -
taank  fragmentation                  72%                            -
taank  leaked                         0                              -
taank  multihost                      off                            default
taank  checkpoint                     -                              -
taank  load_guid                      698661424586459753             -
taank  autotrim                       off                            default
taank  compatibility                  off                            default
taank  feature@async_destroy          enabled                        local
taank  feature@empty_bpobj            enabled                        local
taank  feature@lz4_compress           active                         local
taank  feature@multi_vdev_crash_dump  enabled                        local
taank  feature@spacemap_histogram     active                         local
taank  feature@enabled_txg            active                         local
taank  feature@hole_birth             active                         local
taank  feature@extensible_dataset     active                         local
taank  feature@embedded_data          active                         local
taank  feature@bookmarks              enabled                        local
taank  feature@filesystem_limits      enabled                        local
taank  feature@large_blocks           enabled                        local
taank  feature@large_dnode            enabled                        local
taank  feature@sha512                 enabled                        local
taank  feature@skein                  enabled                        local
taank  feature@edonr                  enabled                        local
taank  feature@userobj_accounting     active                         local
taank  feature@encryption             enabled                        local
taank  feature@project_quota          active                         local
taank  feature@device_removal         enabled                        local
taank  feature@obsolete_counts        enabled                        local
taank  feature@zpool_checkpoint       enabled                        local
taank  feature@spacemap_v2            active                         local
taank  feature@allocation_classes     enabled                        local
taank  feature@resilver_defer         enabled                        local
taank  feature@bookmark_v2            enabled                        local
taank  feature@redaction_bookmarks    enabled                        local
taank  feature@redacted_datasets      enabled                        local
taank  feature@bookmark_written       enabled                        local
taank  feature@log_spacemap           active                         local
taank  feature@livelist               enabled                        local
taank  feature@device_rebuild         enabled                        local
taank  feature@zstd_compress          enabled                        local
taank  feature@draid                  enabled                        local

@rincebrain
Contributor

Could you possibly post the output of zfs get all taank? You posted the output of zpool get all taank.

@fake-name
Author

fake-name commented May 31, 2023

Gosh, derp:

root@nastwo:~# zfs get all taank
NAME   PROPERTY              VALUE                  SOURCE
taank  type                  filesystem             -
taank  creation              Sat Nov  5 21:30 2022  -
taank  used                  51.4T                  -
taank  available             219G                   -
taank  referenced            51.4T                  -
taank  compressratio         1.00x                  -
taank  mounted               no                     -
taank  quota                 none                   default
taank  reservation           none                   default
taank  recordsize            128K                   default
taank  mountpoint            /tmp/new_store/        local
taank  sharenfs              off                    default
taank  checksum              on                     default
taank  compression           off                    default
taank  atime                 on                     default
taank  devices               on                     default
taank  exec                  on                     default
taank  setuid                on                     default
taank  readonly              off                    default
taank  zoned                 off                    default
taank  snapdir               hidden                 default
taank  aclmode               discard                default
taank  aclinherit            restricted             default
taank  createtxg             1                      -
taank  canmount              on                     default
taank  xattr                 on                     default
taank  copies                1                      default
taank  version               5                      -
taank  utf8only              off                    -
taank  normalization         none                   -
taank  casesensitivity       sensitive              -
taank  vscan                 off                    default
taank  nbmand                off                    default
taank  sharesmb              off                    default
taank  refquota              none                   default
taank  refreservation        none                   default
taank  guid                  1665898478647469187    -
taank  primarycache          all                    default
taank  secondarycache        all                    default
taank  usedbysnapshots       0B                     -
taank  usedbydataset         51.4T                  -
taank  usedbychildren        2.05G                  -
taank  usedbyrefreservation  0B                     -
taank  logbias               latency                default
taank  objsetid              54                     -
taank  dedup                 off                    default
taank  mlslabel              none                   default
taank  sync                  standard               default
taank  dnodesize             legacy                 default
taank  refcompressratio      1.00x                  -
taank  written               51.4T                  -
taank  logicalused           51.2T                  -
taank  logicalreferenced     51.2T                  -
taank  volmode               default                default
taank  filesystem_limit      none                   default
taank  snapshot_limit        none                   default
taank  filesystem_count      none                   default
taank  snapshot_count        none                   default
taank  snapdev               hidden                 default
taank  acltype               off                    default
taank  context               none                   default
taank  fscontext             none                   default
taank  defcontext            none                   default
taank  rootcontext           none                   default
taank  relatime              off                    default
taank  redundant_metadata    all                    default
taank  overlay               on                     default
taank  encryption            off                    default
taank  keylocation           none                   default
taank  keyformat             none                   default
taank  pbkdf2iters           0                      default
taank  special_small_blocks  0                      default

Note: the mountpoint has changed because I needed it to be somewhere writable when experimenting with FreeBSD off a thumbdrive.

@rincebrain
Contributor

Getting it to be able to take a core dump when it, well, takes a dump might be informative. Something like this, I would think.
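
On a Debian-family host that would presumably be kdump; a rough sketch, assuming kdump-tools (and not necessarily the exact guide "this" pointed at):

apt install kdump-tools
# Reserve memory for the crash kernel on the kernel command line, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="... crashkernel=512M"
update-grub
reboot
# After the next panic, the vmcore should land under /var/crash/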

My guess would be that since that single file definitely clears the size threshold for "throw it on the async destroy queue", something in that queue isn't limiting how much memory it uses well, and boom goes the dynamite.
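
If that guess is right, the backlog should be visible in the pool's freeing property, and zpool wait can watch it drain:

zpool get freeing taank     # bytes still queued for async destroy
zpool wait -t free taank    # blocks until the async free activity finishes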

@fake-name
Author

fake-name commented Jun 3, 2023

I spent some time trying to get kernel dumps working, but it turns out that on Proxmox you need to build your own kernel to get debug symbols (arrrrgh).

Anyways, I dug out a server with 128 GB of RAM, and it was able to mount the volume fine (usage seems to have peaked at ~36 GB of RAM somewhere in the kernel).

I need this system working more than I want to help fix this bug. The underlying issue is NOT FIXED; throwing more RAM at it is just a workaround.


One thing of note: I think the issue might be the destroy queue itself, or something similar. Watching memory usage, the total consumed shot up to the max (~36 GB), then slowly ramped down over the next ~20-30 minutes. It sure looks like something dumped a HUGE list of items into a queue, which couldn't keep up, so it bloated up and used lots of RAM.

While the memory usage was ramping down, there was 100% disk utilization on the drives in the zpool.

Annoyingly, wherever in the kernel the RAM is being used, it doesn't show up in something like top, so I can't narrow down which kernel thread is actually consuming it.
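
For anyone else poking at this: kernel-side memory like this tends to show up in the slab counters and the SPL/ZFS kstats rather than in top. These are standard interfaces, though which one would actually catch this particular allocation is a guess:

grep -E 'Slab|SUnreclaim' /proc/meminfo                  # total vs. unreclaimable slab memory
slabtop -o | head -n 20                                  # largest kernel slab caches
awk '/^size/ {print $3}' /proc/spl/kstat/zfs/arcstats    # current ARC size, in bytes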
