chore(sequencer)!: exclusively use Borsh encoding for stored data (ENG-768) #1492

Fraser999 · 2024-09-11T22:53:15Z

Summary

This PR primarily changes the encoding format of all data being written to storage to Borsh.

Background

We currently use a variety of encoding formats. This can be confusing, sub-optimal (in terms of both storage space consumed and serialization/deserialization performance) and potentially problematic, since, e.g., JSON encoding leaves a lot of flexibility in how the serialized data can actually look.

This PR is part one of three which aim to improve the performance and quality of the storage component. As such, the APIs of the various StateReadExt and StateWriteExt extension traits were updated slightly in preparation for the upcoming changes. In broad terms, getters now take their parameters by reference rather than by value (even for copyable types like [u8; 32], this is significantly more performant). For "putters", parameters used to form DB keys are taken by reference, while the DB value parameters are taken by value, since in the next PR these values will be added to a cache.
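A minimal sketch of that parameter convention, assuming simplified synchronous `StateRead`/`StateWrite` stand-ins (not cnidarium's actual traits) and an in-memory store. The `get_account_nonce`/`put_account_nonce` methods, the `MemState` type and the `accounts/{hex}/nonce` key format are hypothetical. The nonce is stored as four little-endian bytes, which is how Borsh encodes a `u32`.

```rust
use std::collections::HashMap;

// Simplified, synchronous stand-ins for the underlying raw state traits.
// These are NOT cnidarium's real `StateRead`/`StateWrite` APIs.
pub trait StateRead {
    fn get_raw(&self, key: &str) -> Option<&[u8]>;
}

pub trait StateWrite: StateRead {
    fn put_raw(&mut self, key: String, value: Vec<u8>);
}

/// In-memory store, for illustration only.
#[derive(Default)]
pub struct MemState(HashMap<String, Vec<u8>>);

impl StateRead for MemState {
    fn get_raw(&self, key: &str) -> Option<&[u8]> {
        self.0.get(key).map(Vec::as_slice)
    }
}

impl StateWrite for MemState {
    fn put_raw(&mut self, key: String, value: Vec<u8>) {
        self.0.insert(key, value);
    }
}

/// Hypothetical key format: the address hex-encoded inside a fixed path.
fn nonce_key(address: &[u8; 20]) -> String {
    let hex: String = address.iter().map(|b| format!("{b:02x}")).collect();
    format!("accounts/{hex}/nonce")
}

pub trait StateReadExt: StateRead {
    /// Getter: the key-forming parameter is borrowed rather than copied in.
    fn get_account_nonce(&self, address: &[u8; 20]) -> Option<u32> {
        let bytes = self.get_raw(&nonce_key(address))?;
        // A Borsh-encoded `u32` is four little-endian bytes.
        Some(u32::from_le_bytes(bytes.try_into().ok()?))
    }
}

impl<T: StateRead + ?Sized> StateReadExt for T {}

pub trait StateWriteExt: StateWrite {
    /// Putter: key-forming parameter by reference, stored value by value, so
    /// ownership of the value can be handed on (e.g. to a cache) without a copy.
    fn put_account_nonce(&mut self, address: &[u8; 20], nonce: u32) {
        self.put_raw(nonce_key(address), nonce.to_le_bytes().to_vec());
    }
}

impl<T: StateWrite + ?Sized> StateWriteExt for T {}
```

In this sketch both extension traits are blanket-implemented for every type implementing the base trait, so any code holding a state handle can call them.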

Changes

Added a new storage module. This will ultimately contain our own equivalents of the cnidarium types, but for now consists only of a collection of submodules for types currently being written to storage. There is a top-level enum StoredValue which becomes the only type being written to storage by our own code.
To support converting between the storage types and the corresponding domain types defined in astria-core and astria-merkle, some of the domain types have been given constructors named unchecked_from_parts. These allow a type to be constructed from a trusted source such as our own DB, skipping any validation that would otherwise be performed.
Updated all StateReadExt and StateWriteExt traits to use the new StoredValue, which internally uses Borsh encoding.
Updated the APIs of all these extension traits as described above. This change resulted in a slew of similar updates to avoid needless copying of [u8; 32] and [u8; 20] arrays, which has unfortunately made the PR larger than expected.
A few core types which currently derive or implement serde traits have had them removed, since they only existed to support JSON-encoding the type for storage. In one case, serde was used for JSON-encoding a value into a debug log line; I replaced that with a Display impl.
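To illustrate what a single Borsh-encoded top-level enum looks like on disk, here is a dependency-free sketch. It hand-rolls Borsh's layout rules for a hypothetical three-variant `StoredValue`; the real enum has different variants, and real code would normally derive the encoding with the borsh crate's `BorshSerialize`/`BorshDeserialize` macros rather than writing it by hand.

```rust
/// Hypothetical, cut-down stand-in for the top-level `StoredValue` enum.
/// The real enum's variants and payloads differ.
#[derive(Debug, PartialEq)]
pub enum StoredValue {
    Balance(u128),
    RollupId([u8; 32]),
    Bytes(Vec<u8>),
}

/// Reads exactly `N` bytes from the front of `input`.
fn take<const N: usize>(input: &[u8]) -> Result<[u8; N], String> {
    input
        .get(..N)
        .and_then(|slice| <[u8; N]>::try_from(slice).ok())
        .ok_or_else(|| format!("expected {} bytes, found {}", N, input.len()))
}

impl StoredValue {
    /// Hand-rolled encoding following Borsh's layout rules: a one-byte
    /// variant index, little-endian integers, fixed-size arrays with no
    /// length prefix, and `Vec<u8>` prefixed by its length as a LE `u32`.
    pub fn encode(&self) -> Vec<u8> {
        let mut out = Vec::new();
        match self {
            StoredValue::Balance(amount) => {
                out.push(0);
                out.extend_from_slice(&amount.to_le_bytes());
            }
            StoredValue::RollupId(id) => {
                out.push(1);
                out.extend_from_slice(id);
            }
            StoredValue::Bytes(bytes) => {
                out.push(2);
                let len = u32::try_from(bytes.len()).expect("value too large to encode");
                out.extend_from_slice(&len.to_le_bytes());
                out.extend_from_slice(bytes);
            }
        }
        out
    }

    /// Decodes bytes produced by `encode`, rejecting unknown variant tags,
    /// truncated input and trailing bytes.
    pub fn decode(input: &[u8]) -> Result<Self, String> {
        let (&tag, rest) = input.split_first().ok_or("empty input")?;
        let (value, consumed) = match tag {
            0 => (StoredValue::Balance(u128::from_le_bytes(take::<16>(rest)?)), 16),
            1 => (StoredValue::RollupId(take::<32>(rest)?), 32),
            2 => {
                let len = u32::from_le_bytes(take::<4>(rest)?) as usize;
                let body = rest[4..].get(..len).ok_or("truncated byte vector")?;
                (StoredValue::Bytes(body.to_vec()), 4 + len)
            }
            other => return Err(format!("unknown variant tag {other}")),
        };
        if consumed != rest.len() {
            return Err(format!("{} trailing bytes", rest.len() - consumed));
        }
        Ok(value)
    }
}
```

Under these rules every value has exactly one encoding, so equal values always produce identical bytes, unlike JSON, where details such as whitespace and key order can vary.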

Testing

Existing unit tests of the StateReadExt and StateWriteExt traits already had almost full coverage. Where coverage was missing, I added round-trip tests.

Breaking Changelist

The on-disk format of stored data has changed.

Related Issues

Closes #1434.

SuperFluffy

High level comment on the approach taken in this design:

At a high level, the sequencer is defined by a set of components, each of which provides ways to change and read its respective state. So far this also meant that each component had full control over how the objects written to its state are converted to/from the domain types.

The storage and stored_value modules invert this logic by defining and passing in the on-disk types into the components.

I am ambivalent about this change since on the one hand it potentially allows avoiding code duplication. But on the other hand it breaks with the separation between the components we had so far.

Fraser999 · 2024-09-23T14:02:39Z

> So far this also meant that each component had full control over how the objects written to its state are converted to/from the domain types.

I see that as one of the main problems with the existing approach. Because we've allowed this flexibility, different components use different serialization for the same domain types. In a lot of cases there is only the domain type: the component serializes directly from the domain type into bytes. I'm very keen to remove that flexibility so that all components have to use Borsh, and so that we have a complete collection of storage types which map to/from domain types. The two serve different purposes: domain types support business logic only, while storage types support being held in a DB only.
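A minimal sketch of that domain/storage split, including an `unchecked_from_parts` constructor of the kind the PR adds to some core types. `Address`, its prefix invariant and `StoredAddress` are hypothetical simplifications; the real astria-core types differ.

```rust
/// Hypothetical domain type. Invariant: `prefix` is non-empty and consists of
/// lowercase ASCII letters or digits. Business logic only sees valid values.
#[derive(Debug, Clone, PartialEq)]
pub struct Address {
    bytes: [u8; 20],
    prefix: String,
}

impl Address {
    /// Checked constructor for untrusted input.
    pub fn try_from_parts(bytes: [u8; 20], prefix: String) -> Result<Self, String> {
        let valid = !prefix.is_empty()
            && prefix.bytes().all(|b| b.is_ascii_lowercase() || b.is_ascii_digit());
        if valid {
            Ok(Self { bytes, prefix })
        } else {
            Err(format!("invalid address prefix {prefix:?}"))
        }
    }

    /// Unchecked constructor for values read back from our own DB, which were
    /// validated before being written. Performs no validation at all.
    pub fn unchecked_from_parts(bytes: [u8; 20], prefix: String) -> Self {
        Self { bytes, prefix }
    }

    pub fn bytes(&self) -> &[u8; 20] {
        &self.bytes
    }

    pub fn prefix(&self) -> &str {
        &self.prefix
    }
}

/// Hypothetical storage type: plain data that exists only to be encoded into
/// and decoded from the DB.
#[derive(Debug, PartialEq)]
pub struct StoredAddress {
    pub bytes: [u8; 20],
    pub prefix: String,
}

impl From<&Address> for StoredAddress {
    fn from(address: &Address) -> Self {
        Self {
            bytes: *address.bytes(),
            prefix: address.prefix().to_string(),
        }
    }
}

impl From<StoredAddress> for Address {
    // The DB is a trusted source, so skip re-validation on every read.
    fn from(stored: StoredAddress) -> Self {
        Address::unchecked_from_parts(stored.bytes, stored.prefix)
    }
}
```

Later commits in this PR feature-gate the unchecked constructors in astria-core; in a sketch like this, that could be done by putting something like `#[cfg(feature = "unchecked-constructors")]` on `unchecked_from_parts` (feature name taken from the commit messages, details assumed).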

> The storage and stored_value modules invert this logic by defining and passing in the on-disk types into the components.

This is only temporary until #1436 is completed. When that work is completed, the components will be dealing with the domain types only.

> But on the other hand it breaks with the separation between the components we had so far.

Well, I don't think there's complete separation currently. For example, accounts::StateReadExt is used in a few other components. But this isn't really changed with the new approach IMO. Ultimately the components will continue to access/modify state relating to other components via their state extension traits.

SuperFluffy

Reviewed the changes in astria-core for now (most of which I think we should not have in this PR). Going to review the rest of sequencer now.

crates/astria-core/src/crypto.rs

crates/astria-core/src/primitive/v1/asset/denom.rs

crates/astria-core/src/primitive/v1/mod.rs

crates/astria-core/src/sequencerblock/v1alpha1/celestia.rs

crates/astria-merkle/src/audit.rs

crates/astria-sequencer/src/accounts/action.rs

crates/astria-sequencer/src/accounts/mod.rs

crates/astria-sequencer/src/storage/stored_value.rs

SuperFluffy

I have gone through about half the changes, but I think my comments apply universally.

We should provide domain types for all objects that have their own on-disk Borsh types (for example, fee has storage::Fee, so it feels natural to make it a domain::Fee, too). The on-disk types are currently stricter than the domain types (which is good).
I am ambivalent about many on-disk types being pub(crate). I prefer if the on-disk types were module private and never crossed the boundary.
A lot of error instances lack context but have a comment above them noting that enough error context is already provided. Because skipping context by just propagating the error with the ? operator is all too common among junior engineers, I'd prefer to require context everywhere rather than come up with a heuristic for when to add it and when to skip it.
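A std-only sketch of attaching context to every fallible call instead of using a bare `?`. `ContextError`, `WrapErr` and `decode_nonce` are hypothetical names; the extension method mirrors the shape of `WrapErr::wrap_err` from the eyre crate, which provides this off the shelf.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical minimal error type pairing a context message with its cause.
#[derive(Debug)]
pub struct ContextError {
    context: String,
    source: Box<dyn Error + Send + Sync>,
}

impl fmt::Display for ContextError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {}", self.context, self.source)
    }
}

impl Error for ContextError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Extension trait so any `Result` can have context attached in one call.
pub trait WrapErr<T> {
    fn wrap_err(self, context: &str) -> Result<T, ContextError>;
}

impl<T, E: Error + Send + Sync + 'static> WrapErr<T> for Result<T, E> {
    fn wrap_err(self, context: &str) -> Result<T, ContextError> {
        self.map_err(|error| ContextError {
            context: context.to_string(),
            source: Box::new(error),
        })
    }
}

/// Example call site: context is attached rather than escaping with a bare `?`.
fn decode_nonce(raw: &[u8]) -> Result<u32, ContextError> {
    let bytes = <[u8; 4]>::try_from(raw).wrap_err("stored nonce has the wrong length")?;
    Ok(u32::from_le_bytes(bytes))
}
```

The resulting message reads `stored nonce has the wrong length: <cause>`, and the original error remains reachable through `Error::source`.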

crates/astria-sequencer/src/transaction/mod.rs

crates/astria-sequencer/src/sequence/storage/values.rs

crates/astria-sequencer/src/app/storage/values/storage_version.rs

crates/astria-sequencer/src/app/storage/values/revision_number.rs

crates/astria-sequencer/src/bridge/component.rs

crates/astria-sequencer/src/bridge/init_bridge_account_action.rs

crates/astria-sequencer/src/accounts/mod.rs

crates/astria-core/Cargo.toml

crates/astria-core/src/primitive/v1/asset/denom.rs

SuperFluffy

Comments:

the errors for failing to serialize from/to the on-disk type should use Debug (very rare, very bad if it happens: it's fine to be extra noisy).
on-disk types with a Display impl should have a unit test ensuring their Display output matches that of the domain type (to avoid confusion). Non-exhaustive list of types where this applies: RollupId, BlockHash.
we should provide an OndiskType trait that encapsulates the From/TryFrom impls we have everywhere right now.
we really really should split the tests into separate modules. This is not the first PR where I wish I could just ignore the tests when going through the code. :-/
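One possible shape for the suggested `OndiskType` trait, written as a sketch: it gathers both conversion directions in one place, and its decode error embeds the `Debug` output of the raw stored value, in line with the first point above. `DecodeError`, `Precision` and its 18-decimal limit are hypothetical.

```rust
use std::fmt::Debug;

/// Hypothetical trait bundling the domain <-> on-disk conversions that are
/// currently spread across separate From/TryFrom impls.
pub trait OndiskType: Sized {
    /// The plain-data type actually written to storage.
    type Stored: Debug;

    fn to_stored(&self) -> Self::Stored;
    fn from_stored(stored: Self::Stored) -> Result<Self, DecodeError>;
}

/// Decode failures should be rare and serious, so the message includes the
/// full `Debug` output of the offending stored value.
#[derive(Debug, PartialEq)]
pub struct DecodeError(String);

impl DecodeError {
    fn new(type_name: &str, stored: &impl Debug) -> Self {
        Self(format!("failed to decode {type_name} from stored value {stored:?}"))
    }
}

/// Hypothetical domain type with an invariant (at most 18 decimal places).
#[derive(Debug, PartialEq)]
pub struct Precision(u8);

impl OndiskType for Precision {
    type Stored = u8;

    fn to_stored(&self) -> u8 {
        self.0
    }

    fn from_stored(stored: u8) -> Result<Self, DecodeError> {
        if stored > 18 {
            return Err(DecodeError::new("Precision", &stored));
        }
        Ok(Precision(stored))
    }
}
```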

crates/astria-sequencer/src/sequence/component.rs

crates/astria-sequencer/src/ibc/component.rs

crates/astria-sequencer/src/grpc/storage/values/sequencer_block_header.rs

crates/astria-sequencer/src/grpc/storage/values/rollup_ids.rs

crates/astria-sequencer/src/grpc/state_ext.rs

SuperFluffy · 2024-10-01T12:16:10Z

crates/astria-sequencer/src/grpc/state_ext.rs

+
+    #[expect(
+        clippy::default_trait_access,
+        reason = "want to avoid explicitly importing `index_map` crate to sequencer crate"


Builder to the rescue. :)

A good constructor would be even better! One which took an iterator over RollupTransactions so we don't need to care how it's held internally :)
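A sketch of such a constructor, assuming hypothetical `RollupTransactions` and `RollupTransactionsSet` types. A `BTreeMap` is used internally only to keep the example dependency-free (the real container appears to use an index map, which also preserves insertion order); callers pass any iterator and never name the map type.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the real `RollupTransactions` type.
#[derive(Debug, Clone, PartialEq)]
pub struct RollupTransactions {
    pub rollup_id: [u8; 32],
    pub transactions: Vec<Vec<u8>>,
}

/// Hypothetical container whose internal map type is a private detail.
#[derive(Debug, Default)]
pub struct RollupTransactionsSet {
    by_rollup_id: BTreeMap<[u8; 32], RollupTransactions>,
}

impl RollupTransactionsSet {
    /// Builds the set from any iterator of `RollupTransactions`. A later
    /// entry for the same rollup ID replaces an earlier one.
    pub fn new<I>(items: I) -> Self
    where
        I: IntoIterator<Item = RollupTransactions>,
    {
        let mut by_rollup_id = BTreeMap::new();
        for item in items {
            by_rollup_id.insert(item.rollup_id, item);
        }
        Self { by_rollup_id }
    }

    pub fn get(&self, rollup_id: &[u8; 32]) -> Option<&RollupTransactions> {
        self.by_rollup_id.get(rollup_id)
    }

    pub fn len(&self) -> usize {
        self.by_rollup_id.len()
    }
}

impl FromIterator<RollupTransactions> for RollupTransactionsSet {
    fn from_iter<I: IntoIterator<Item = RollupTransactions>>(items: I) -> Self {
        Self::new(items)
    }
}
```

Implementing `FromIterator` as well lets callers simply `collect()` into the set.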

crates/astria-sequencer/src/grpc/state_ext.rs

SuperFluffy

Thank you for the heroic task of going through all of sequencer and fixing its state write/read. I fat-fingered my previous review, but the comments were intended for this approval.

I don't see any blockers. From the previous list, my main ask is to remove all custom Display impls in favor of just the derived Debug. We discussed this in the relevant comment chain, but as you noted, the Display impls are intended for providing errors on failed state-decoding.

If that happens, we are in such deep water that I think spewing a Debug log all over the screen is perfectly acceptable.

(Side note that didn't make it into the list in the other comment: we should extend snapshot tests to all keys, but that's likely covered in your open follow-up PR.)
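A dependency-free sketch of what a snapshot test over storage keys could look like: each key builder's output is pinned to a literal, so an accidental change to an on-disk key format fails loudly. The key builders and formats are hypothetical, and a snapshot-testing crate such as insta could hold the expected values instead of inline literals.

```rust
/// Lower-case hex encoding used by the hypothetical key builders below.
fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// Hypothetical key builders; the real sequencer key formats differ.
fn balance_key(address: &[u8; 20], asset: &str) -> String {
    format!("accounts/{}/balance/{asset}", hex(address))
}

fn nonce_key(address: &[u8; 20]) -> String {
    format!("accounts/{}/nonce", hex(address))
}

/// Pins every key format to a literal. Changing a key builder silently moves
/// where existing data lives on disk, so such a change should fail here and
/// be reviewed deliberately.
fn check_key_snapshots() {
    let address = [0xab_u8; 20];
    assert_eq!(
        balance_key(&address, "asset"),
        concat!("accounts/", "abababababababababab", "abababababababababab", "/balance/asset")
    );
    assert_eq!(
        nonce_key(&address),
        concat!("accounts/", "abababababababababab", "abababababababababab", "/nonce")
    );
}
```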

crates/astria-sequencer/src/grpc/sequencer.rs

crates/astria-sequencer/src/bridge/storage/values/transaction_id.rs

crates/astria-sequencer/src/bridge/storage/values/rollup_id.rs

crates/astria-sequencer/src/bridge/storage/values/address_bytes.rs

crates/astria-sequencer/src/accounts/storage/values.rs

crates/astria-sequencer/src/accounts/state_ext.rs

crates/astria-sequencer/Cargo.toml

crates/astria-merkle/src/audit.rs


exclusively use borsh encoding for stored data

fb4ae44

Fraser999 requested review from a team, joroshiba, SuperFluffy and noot as code owners September 11, 2024 22:53

github-actions bot added the conductor, sequencer and composer labels Sep 11, 2024

Merge remote-tracking branch 'upstream/main' into fraser/use-borsh

78a772e

SuperFluffy reviewed Sep 23, 2024

Fraser999 added 3 commits September 25, 2024 10:51

Merge remote-tracking branch 'upstream/main' into fraser/use-borsh

b48293e

Merge remote-tracking branch 'upstream/main' into fraser/use-borsh

44779e1

move stored values to component modules

b614f72

SuperFluffy reviewed Sep 30, 2024

crates/astria-sequencer/src/storage/stored_value.rs

SuperFluffy reviewed Sep 30, 2024

crates/astria-sequencer/src/storage/stored_value.rs (outdated)

SuperFluffy reviewed Sep 30, 2024

Fraser999 added 8 commits September 30, 2024 15:43

Merge remote-tracking branch 'upstream/main' into fraser/use-borsh

a71c3b9

rename 'get' getters

7fa7de3

reinstate Address::bytes

132bddc

feature-gate unchecked constructors in core

d16b193

reinstate serde traits

3c47298

add context to errors

fe21bcb

restrict visibility of on-disk types

26e3683

Merge remote-tracking branch 'upstream/main' into fraser/use-borsh

b331caf

SuperFluffy reviewed Oct 1, 2024

crates/astria-core/Cargo.toml

crates/astria-core/src/primitive/v1/asset/denom.rs

SuperFluffy reviewed Oct 1, 2024

SuperFluffy approved these changes Oct 1, 2024

replace display impls for debug ones

44170ce

joroshiba approved these changes Oct 1, 2024

update unchecked-constructors feature

ccc3f9e

Fraser999 mentioned this pull request Oct 1, 2024

Provide stream for reading rollup data in sequencer #1606

Open

Fraser999 added 2 commits October 1, 2024 17:49

minor updates to sequencer

dd3bfa6

Merge remote-tracking branch 'upstream/main' into fraser/use-borsh

7cab92a

Fraser999 added the override-freeze label Oct 1, 2024

Fraser999 enabled auto-merge October 1, 2024 18:41

Fraser999 added this pull request to the merge queue Oct 1, 2024

Merged via the queue into main with commit 6d9eb28 Oct 1, 2024
55 of 56 checks passed

Fraser999 deleted the fraser/use-borsh branch October 1, 2024 18:54
