
Risk of metadata partition filling up with LVM Thin Provisioning #3243

Closed
qubesuser opened this issue Oct 28, 2017 · 7 comments · Fixed by QubesOS/qubes-blivet#1

@qubesuser

According to the "Metadata space exhaustion" section of "man lvmthin", exhausting a thin pool's metadata space can cause "inconsistent thin pool metadata and inconsistent file systems" (and even if that isn't accurate, it would necessarily stop thin volume creation and thus starting and creating VMs), so it is important to avoid it.

Currently, the Qubes installer seems to create a ~100MB metadata partition and also leaves ~15-16GB free. According to "man lvmthin", it seems possible to have the metadata partition grow automatically, but it's not clear how, and not clear whether this is enabled by the Qubes installer.

It would be nice to make sure that the metadata partition is either automatically extended up to the 16GB limit or simply created with 16GB from the start.
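
For reference, the auto-grow behaviour "man lvmthin" hints at is driven by dmeventd monitoring plus two lvm.conf settings, and current usage can be checked with lvs. A minimal sketch, assuming the default qubes_dom0 volume group and purely illustrative threshold values; whether tmeta (as opposed to tdata) also grows automatically depends on the LVM version (see the commit linked later in this thread):

# check pool, data and metadata usage
sudo lvs -a -o lv_name,lv_size,data_percent,metadata_percent qubes_dom0
# inspect the auto-extend settings (100 effectively means "never auto-extend")
sudo lvmconfig activation/thin_pool_autoextend_threshold
sudo lvmconfig activation/thin_pool_autoextend_percent
# example values for /etc/lvm/lvm.conf, "activation" section:
#   thin_pool_autoextend_threshold = 80
#   thin_pool_autoextend_percent = 20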

@andrewdavidwong andrewdavidwong added this to the Release 4.0 milestone Oct 28, 2017
@andrewdavidwong andrewdavidwong added P: major Priority: major. Between "default" and "critical" in severity. T: bug and removed T: task labels May 29, 2019
@andrewdavidwong andrewdavidwong changed the title Check risk of metadata partition filling up with LVM Thin Provisioning Risk of metadata partition filling up with LVM Thin Provisioning May 29, 2019
@brendanhoar

Chris (@tasket) has also noted the above issue. On my primary Qubes machine, the default LVM metadata allocation is 112MB for a 472GB data allocation. Increasing it seems like a no-brainer, as some Qubes features make heavy use of it.

Suggested mechanisms to avoid metadata exhaustion (and the resulting corruption) in Qubes:

  1. If possible, in the next release, modify the anaconda config to double the default LVM metadata allocation. I suspect that would cover 99.99% of use cases and dramatically reduce the incidence of this failure mode.

  2. In a future release, add metadata monitoring to the Qubes Disk Space Monitor widget (a rough monitoring sketch follows at the end of this comment).

  3. In a future release, consider adding a "pause all VMs except dom0 and warn the user" action at threshold data and/or metadata usage levels (e.g. 95%), via the widget or some other mechanism.

  4. In addition, pull in the most recent stable LVM release. For example, the April 2019 LVM release includes some logic to auto-extend the metadata (but only when an LVM volume is resized... which can be configured to happen automatically): https://sourceware.org/git/?p=lvm2.git;a=commit;h=e27d0271557d4b93e87a70854b3c7f1cc6008155

"committer Zdenek Kabelac <zkabelac@redhat.com>
Wed, 3 Apr 2019 11:28:22 +0000 (13:28 +0200)"
"Automatically grow thin metadata, when thin data gets too big."

B
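
As a rough illustration of items 2 and 3 above (not an actual Qubes widget implementation), a periodic check along these lines could raise a warning once usage crosses a threshold; the pool name qubes_dom0/pool00 and the 95% threshold are assumptions taken from the defaults and the example figures in this thread:

#!/bin/bash
# hypothetical watchdog sketch: warn when thin pool data or metadata usage
# crosses a threshold; it only notifies, it does not pause VMs
THRESHOLD=95
read -r DATA META < <(sudo lvs --noheadings -o data_percent,metadata_percent \
    qubes_dom0/pool00 | awk '{print int($1), int($2)}')
if [ "${DATA:-0}" -ge "$THRESHOLD" ] || [ "${META:-0}" -ge "$THRESHOLD" ]; then
    notify-send "Thin pool nearly full" \
        "data ${DATA}%, metadata ${META}% - consider extending the pool or tmeta"
fi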

@tasket

tasket commented Jun 24, 2019

I definitely agree with the first three.

However, #4 might not help, because it appears to use an estimation routine that's also used when the pool is initialized. Therefore, if the Qubes installer is doubling the initial tmeta size, this patch would only extend tmeta further if the user were to increase the pool size to more than double.

So I think #2 and #3 are much more valuable, and the widget could even prompt the user, offering to extend tmeta itself.
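
For context, extending tmeta manually is already a one-liner (a sketch, assuming the default qubes_dom0/pool00 names and that the volume group still has free extents):

# grow the pool's metadata LV by 128 MiB from free space in the VG
sudo lvextend --poolmetadatasize +128M qubes_dom0/pool00
# verify the new tmeta size and usage
sudo lvs -a -o lv_name,lv_size,metadata_percent qubes_dom0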

Long-term considerations:

Recalling the discussion we had in qubes-users on this topic, I came to the conclusion that defragmentation was a missing ingredient. It seems necessary because it keeps metadata complexity/size at a manageable ratio with respect to the overall pool size. With defragmentation, you can handle metadata growth the way NTFS does, which is far less likely to lead to catastrophic data loss.

I don't know what the likelihood is of a pool defrag tool being accepted into the thin-provisioning-tools package. There is a defragger for regular LVM volumes, which has languished for over a decade.

@awokd

awokd commented Aug 11, 2019

modify anaconda config to double the default lvm metadata allocation

Can this be a "must have" for R4.1? Quadruple it, even. The current failure mode is punishing.

@andrewdavidwong andrewdavidwong added P: critical Priority: critical. Between "major" and "blocker" in severity. and removed P: major Priority: major. Between "default" and "critical" in severity. labels Aug 11, 2019
@marmarek

I'm trying to find the right value and I'm confused. The metadata size is chosen here. Internally it uses the thin_metadata_size tool.
So, let's play a little. Input data:

$ sudo lvs -o+chunksize -a qubes_dom0/pool00 qubes_dom0/pool00_tmeta qubes_dom0/pool00_tdata
  LV             VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk  
  pool00         qubes_dom0 twi-aotz-- 208.20g             94.03  79.93                            128.00k
  [pool00_tdata] qubes_dom0 Twi-ao---- 208.20g                                                          0 
  [pool00_tmeta] qubes_dom0 ewi-ao---- 108.00m                                                          0 

First, let's try to reproduce this calculation (108MB metadata volume):

thin_metadata_size - 53.22 mebibytes estimated metadata area size for "--block-size=128kibibytes --pool-size=208gibibytes --max-thins=100"

Something is off. Maybe chunk size?

thin_metadata_size - 106.04 mebibytes estimated metadata area size for "--block-size=64kibibytes --pool-size=208gibibytes --max-thins=100"

OK, looks like it (blivet applies rounding on top of this number).
So, let's try to calculate a more appropriate number for the Qubes use case, with many more snapshots/devices:

thin_metadata_size - 109.56 mebibytes estimated metadata area size for "--block-size=64kibibytes --pool-size=208gibibytes --max-thins=1000"

It's almost the same! Maybe try even more:

thin_metadata_size - 144.71 mebibytes estimated metadata area size for "--block-size=64kibibytes --pool-size=208gibibytes --max-thins=10000"

OK, here I got a bigger number.
But it still looks wrong. This pool has about 90 LVs (including snapshots) and is almost full (both data and metadata), so choosing "--max-thins=1000" wouldn't help that much.
And that's all using a block size of 64k, even though LVM reports a chunk size of 128k - which would result in even smaller numbers.

I'm considering using whatever number it calculates and doubling it.
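
For reproducibility (the exact invocations aren't quoted above), these estimates appear to come from calls of roughly this form; the sizes and counts are the ones already listed, and --unit=m just reports the result in mebibytes:

# installer-like estimate: 64k chunks, 208 GiB pool, 100 thin volumes (~106 MiB)
thin_metadata_size --block-size=64k --pool-size=208g --max-thins=100 --unit=m
# more Qubes-like estimate: same pool, 1000 thin volumes (~110 MiB)
thin_metadata_size --block-size=64k --pool-size=208g --max-thins=1000 --unit=m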

@marmarek

Additionally, I don't see any clean way to specify the metadata size while using automatic partitioning. There are multiple abstraction layers, without much flexibility in extra arguments. I'm afraid I'll need to settle on a blivet patch adjusting the defaults.

marmarek added commits referencing this issue in marmarek/qubes-blivet (Aug 20, 2019) and marmarek/qubes-installer-qubes-os (Aug 20, Oct 22, Oct 23, and Oct 25, 2019), each with the message:

The default is too small in Qubes use case.

Fixes QubesOS/qubes-issues#3243
@qubesos-bot

Automated announcement from builder-github

The package pykickstart-2.32-4.fc25 has been pushed to the r4.0 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

@qubesos-bot

Automated announcement from builder-github

The package pykickstart-2.32-4.fc25 has been pushed to the r4.0 stable repository for dom0.
To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update

fepitre added commits to fepitre/anaconda referencing this issue (Dec 17 and Dec 19, 2019), each with the message:

The default is too small in Qubes use case.

Fixes QubesOS/qubes-issues#3243

Adapted from 88b59572bccd4cfa60d5eb8d0379094a81908358
By Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
marmarek referenced this issue in fepitre/qubes-dist-upgrade Jun 3, 2020