Fix unknown handling in impl_writeable_tlv_based_enum_upgradable #2969
Conversation
Force-pushed from c9286b2 to 7cf7308
lightning/src/util/ser_macros.rs (Outdated)
@@ -1065,6 +1065,10 @@ macro_rules! impl_writeable_tlv_based_enum {
/// when [`MaybeReadable`] is practical instead of just [`Readable`] as it provides an upgrade path for
/// new variants to be added which are simply ignored by existing clients.
///
/// Note that variants written as `$tuple_variant`s will not support downgrading no matter the
/// `$tuple_variant_id` used, however any variants written as a normal variant will support
Maybe `$variant_id` rather than "variants written as a normal variant"? That sentence is a bit of a mouthful with the number of times "variant" is used.
I went the other way and went with simply "tuple variants" in the first part of the sentence; referring to the type by the name argument seems weird. Tried to reword it, but not sure it's that much clearer.
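To make the semantics under discussion concrete, below is a rough hand-rolled sketch of the documented policy: unknown odd variant ids are tolerated by returning `None` while unknown even ids are an error, and skipping is only possible for variants whose body carries a length prefix. This is illustrative only, not the macro's actual expansion or LDK's real wire encoding; `FakeEvent`, `read_upgradable`, and the byte layout are made up.

```rust
use std::io::{self, Read};

// Hypothetical enum standing in for an LDK type serialized with the macro.
enum FakeEvent {
    A { val: u64 }, // a "normal" variant, written with a length-prefixed body
}

// Sketch only: odd unknown variant ids are skipped and reported as `None` so
// old readers tolerate new variants; even unknown ids are a hard error.
fn read_upgradable<R: Read>(r: &mut R) -> Result<Option<FakeEvent>, io::Error> {
    let mut id = [0u8; 1];
    r.read_exact(&mut id)?;
    // Read a one-byte length prefix followed by that many body bytes.
    let read_len_prefixed_body = |r: &mut R| -> Result<Vec<u8>, io::Error> {
        let mut len = [0u8; 1];
        r.read_exact(&mut len)?;
        let mut body = vec![0u8; len[0] as usize];
        r.read_exact(&mut body)?;
        Ok(body)
    };
    match id[0] {
        0 => {
            let body = read_len_prefixed_body(r)?;
            Ok(Some(FakeEvent::A { val: body.first().copied().unwrap_or(0) as u64 }))
        },
        unknown if unknown % 2 == 1 => {
            // Unknown odd variant from a newer version: discard its body so
            // the surrounding stream stays aligned, then signal "ignore me".
            read_len_prefixed_body(r)?;
            Ok(None)
        },
        _ => Err(io::Error::new(io::ErrorKind::InvalidData, "unknown even variant")),
    }
}

fn main() -> Result<(), io::Error> {
    // Variant id 3 is unknown and odd: the reader skips its 2-byte body.
    let mut bytes: &[u8] = &[3, 2, 0xde, 0xad];
    assert!(read_upgradable(&mut bytes)?.is_none());
    Ok(())
}
```

A tuple variant, by contrast, is written without such a length prefix, which is why the doc change above says downgrading cannot be supported for it.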
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2969      +/-   ##
==========================================
+ Coverage   89.51%   91.05%    +1.54%
==========================================
  Files         117      117
  Lines       95146   108426    +13280
  Branches    95146   108426    +13280
==========================================
+ Hits        85169    98730    +13561
+ Misses       7767     7534      -233
+ Partials     2210     2162       -48
Force-pushed from cca8193 to ecda130
Grrr, yet more issues, see new commits. Not quite sure what to do about this... we could just say "look, downgrade is broken, don't downgrade", but that's kinda shitty. Alternatively, we could ship an 0.0.122 that has just this PR, then say you can't downgrade past that if you upgrade to 0.0.123?
lightning/src/util/ser_macros.rs (Outdated)
/// Note that variants written as tuple variants will not support downgrading no matter the
/// `$tuple_variant_id` used. Any variants written as a normal variant will support downgrading by
/// having [`MaybeReadable`] return `None` for odd `$variant_id` variants not understood.
/// Note that only normal variants (not tuple variants) will support downgrading, thus any new odd
s/normal variants/struct and unit variants
Could you address this before squashing?
Oops, sorry, somehow I missed this entirely.
Yeah, I don't see what choice we have besides doing one of those :( It seems possible that certain specific downgrades aren't broken because no upgradable odd TLVs were added in between them. Maybe it would help users to figure out what versions those are?
FYI, I was having trouble wrapping my mind around the fixes here so I ended up writing some small tests that repro the bugs on main: 8da8295
Yea, there are definitely quite a few upgrade/downgrade paths that are totally fine, but trying to figure them all out would be quite a bit of work. Given we haven't yet received any reports of this issue in the wild (not that that means no one has ever hit it), I'd say it's quite uncommon (then again, in normal operation we probably very rarely serialize with any pending events or monitor events, which is at least where we would break downgrade on the current git HEAD). We definitely have users who have deployed multi-device, so we generally have to support downgrade (i.e. users switching from an upgraded node to an un-upgraded one), but we can ask our users if they're comfortable with just having an 0.0.122 with this fix (and then waiting a while for 0.0.123) or if they want us to consider some kind of backport to earlier LDK versions.
Saw the request for some comments on Discord so putting in my two sats here. For us, being able to downgrade is not as hard of a requirement as it would be for say a multi-device wallet (e.g. Mutiny). We operate on a model where each user implicitly approves the latest three Lexe node versions that they have provisioned to. We as the Lexe operators are prevented from running anything other than these three approved versions, which helps protect the user from downgrade attacks, but also gives us (the operators) some flexibility to do some limited downgrades in the more mundane case that one of our releases is broken. In this case we would mark the broken version as 'yanked' in our system and downgrade the user to the latest version they have approved which is not broken. Just an example, feel free to skip if it's clear already:
If LDK breaks downgrades in a point release, it would prevent us from downgrading the user in the (hopefully very rare) case that one of our releases is broken. This is not a big deal most of the time, since the necessity for downgrades should be rare. But there is a non-zero chance that it affects us. The worst case scenario for us is if LDK breaks downgrades, and our release with the updated LDK also happens to be irrecoverably broken in some way (possibly due to the LDK release). If any of our users approve the broken version and have their node run by us (thus updating their persisted data), we would not be able to run their node, or could only run the broken version, until we release a fix and wait for the user to re-provision (which could take months if they're not active).
This seems like a good idea in general, as any LDK changes which could break the node in other ways would be isolated from the change which breaks downgrades. This would allow us (Lexe) to release a version of our own which only includes the LDK version bump, and make sure things are OK before we 'commit' and release other functionality which might break. After confirming that LDK 0.0.122 works fine, we would have a safe version to downgrade to if a newer Lexe release is broken. I would like to emphasize, however, that once again, LDK breaking downgrades is not that huge of a deal for Lexe specifically. I think we would actually prefer that LDK break downgrades every once in a while if doing so allows you to improve the codebase and pay down technical debt. The hard work you guys do to provide these guarantees for us is warmly appreciated as always.
// Thus, we consume everything left in the `$outer_reader` here, ensuring that if
// we're being read as a part of another TLV stream we don't spuriously fail to
// deserialize the outer object due to a TLV length mismatch.
$crate::io_extras::copy($outer_reader, &mut $crate::io_extras::sink()).unwrap();
Why do we use `$crate::io_extras::copy` but `$reader.eat_remaining()?` below?
We're not reading the inner reader (which is always a `LengthLimitingReader` for just the specific TLV's value and thus supports `eat_remaining()`) but rather the outer reader, which may be of any type and thus we can't assume `eat_remaining` is available.
Ah, I see. Makes sense.
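As a standalone illustration of the drain-to-sink idea using plain `std::io` (the macro itself goes through LDK's `io_extras` shims for no-std support; the `Cursor` here is just a stand-in for an arbitrary outer reader):

```rust
use std::io::{copy, sink, Cursor, Read};

fn main() -> std::io::Result<()> {
    // Stand-in for an "outer reader" of arbitrary type: all we know is that
    // it implements Read, so we cannot call an eat_remaining() helper on it.
    let mut outer_reader = Cursor::new(vec![1u8, 2, 3, 4, 5]);

    // Pretend we consumed part of the stream while reading a known field.
    let mut first = [0u8; 2];
    outer_reader.read_exact(&mut first)?;

    // Drain whatever is left into a sink so the caller sees the stream as
    // fully consumed, mirroring the io_extras::copy call quoted above.
    let skipped = copy(&mut outer_reader, &mut sink())?;
    assert_eq!(skipped, 3);
    Ok(())
}
```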
Force-pushed from ecda130 to 5661da0
LGTM, feel free to squash.
`impl_writeable_tlv_based_enum_upgradable` professed to support upgrades by returning `None` from `MaybeReadable` when unknown variants written by newer versions of LDK were read. However, it generally didn't support this, as it didn't discard bytes for unknown types, resulting in corrupt reading. This is fixed here for enum variants written as a TLV stream; however, we don't have a length prefix for tuple enum variants, so the documentation on the macro is updated to mention that downgrades are not supported for tuple variants.
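A tiny sketch of the misalignment this commit message describes, using a made-up byte layout rather than LDK's real TLV encoding:

```rust
fn main() {
    // Hypothetical bytes: an unknown odd variant id, a one-byte length
    // prefix, its two body bytes, then a byte belonging to the outer object.
    let bytes = [3u8, 2, 7, 7, 99];
    let mut pos = 1; // we have read the unknown variant id `3`

    // Old behaviour: return `None` here without advancing past the body; the
    // outer read then consumes `2` where it expected `99`, i.e. corruption.
    // Fixed behaviour: use the length prefix to skip the body first.
    let len = bytes[pos] as usize;
    pos += 1 + len;
    assert_eq!(bytes[pos], 99); // the outer object's byte, as expected

    // Tuple variants are written without a length prefix, so there is nothing
    // telling us how many bytes to skip, hence no downgrade support for them.
}
```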
If we are reading an object that is `MaybeReadable` in a TLV stream using `upgradable_required`, it may return early with `Ok(None)`. In this case, it will not read any further TLVs from the TLV stream. This is fine, except that we generally expect `MaybeReadable` to always consume the correct number of bytes for the full object, even if it doesn't understand it. This could pose a problem, for example, in cases where we're reading a TLV-stream `MaybeReadable` object inside another TLV-stream object. In that case, the `MaybeReadable` object may return `Ok(None)` without consuming all the available bytes, causing the outer TLV read to fail as the TLV length does not match.
While this is generally not supported, issues in our `MaybeReadable` implementations may occur, and we should try to be robust against them.
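A minimal sketch of that failure mode and the drain fix, using `std::io`'s `Take` as a stand-in for LDK's length-limited reader and a made-up two-byte TLV header:

```rust
use std::io::{copy, sink, Cursor, Read};

fn main() -> std::io::Result<()> {
    // Hypothetical outer TLV record: type 1, length 4, then a 4-byte value
    // that itself holds an inner MaybeReadable-style object.
    let mut reader = Cursor::new(vec![1u8, 4, 0xaa, 0xbb, 0xcc, 0xdd]);

    let mut header = [0u8; 2];
    reader.read_exact(&mut header)?;
    let declared_len = header[1] as u64;

    // Inner read that bails out early (think `Ok(None)`) after consuming
    // only one byte of its four-byte value.
    let mut inner = (&mut reader).take(declared_len);
    let mut first = [0u8; 1];
    inner.read_exact(&mut first)?;

    // If we stopped here, three unread bytes would remain inside the declared
    // length and the outer TLV parse would see a length mismatch. Draining
    // the limited reader (the same idea as LDK's eat_remaining()) restores
    // the invariant that the full object was consumed.
    let leftover = copy(&mut inner, &mut sink())?;
    assert_eq!(leftover, 3);
    Ok(())
}
```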
Squashed with no further changes.
Force-pushed from 5661da0 to f852d16
Might be worth including these in a follow-up PR.
Oh duh, yea.