feat(accountsdb): Generate snapshots, fix UAF, improve bincode #179

Merged on Jul 16, 2024 (37 commits)
Commits
65639e8
Add utility for identifying arraylist and hashmap
InKryption Jun 20, 2024
a7b6c68
Make use of less brittle stdlib type check
InKryption Jun 20, 2024
413e54b
Update our zstd dependency
InKryption Jun 20, 2024
132fc05
Make the backing int of `FileId` a decl
InKryption Jun 20, 2024
be8279c
Clean up some types & use ArrayHashMap
InKryption Jun 20, 2024
272fb7c
Add `writeSnapshotTar` initial implementation
InKryption Jun 20, 2024
83e3955
Make tar also accept EOF in absence of sentinel
InKryption Jun 20, 2024
ee2271a
Stop using `@typeName` in `bincode.free` as well
InKryption Jun 20, 2024
9ed2cf7
Improve incremental bank fields & type refactors
InKryption Jun 20, 2024
6242997
Store slice instead of arraylist in file_map
InKryption Jun 24, 2024
b02a387
LogLevel CLI arg improvement
InKryption Jun 27, 2024
032b0b5
Turn the `fields` field into a parameter
InKryption Jun 27, 2024
b3046be
Add utils.fmt.boundedFmt
InKryption Jul 3, 2024
84a0d9d
Some bincode additions & improvements
InKryption Jul 3, 2024
27f2b7a
Publicize `ReferenceMemory`
InKryption Jul 4, 2024
7096c0f
Implement `writeSnapshotTarTo` method, & more
InKryption Jul 4, 2024
0c3f0a8
Structure testWriteSnapshot better for clean cwd
InKryption Jul 4, 2024
deaa226
bincode improvements & hashmap config
InKryption Jul 8, 2024
60ad87e
Various accountsdb improvements & fixes
InKryption Jul 8, 2024
597ce20
Note probably-temporary parameters & alias types
InKryption Jul 8, 2024
c984ff9
Fix UAF with slightly hacky-ish code
InKryption Jul 8, 2024
0bc6b6f
Note purpose of duration of lock in function
InKryption Jul 8, 2024
0b2f8c7
Small renames
InKryption Jul 8, 2024
972e0f4
Move tar functions to tar module
InKryption Jul 8, 2024
e9644cd
Better snapshot generation names
InKryption Jul 8, 2024
6ccce17
run zig fmt
InKryption Jul 8, 2024
519bad0
Rename & assign count value to local variable
InKryption Jul 9, 2024
6db142d
Eliminate unused `AccountStorage` type
InKryption Jul 9, 2024
e0392ce
Some bincode refactors
InKryption Jul 9, 2024
3230959
Remove blocks
InKryption Jul 9, 2024
c71539c
Bincode simplifications
InKryption Jul 9, 2024
f9f5c2b
Enhance & limit `boundedFmt`
InKryption Jul 10, 2024
8a097c7
Correct the account file filtering filtering logic
InKryption Jul 10, 2024
e294dce
Improve file_map iteration & use more `FileId`
InKryption Jul 10, 2024
ab93e50
Add section for snapshot generation to readme
InKryption Jul 11, 2024
08572b9
Delete dead code, rename local `gen`s to `S`s
InKryption Jul 11, 2024
551eab3
Extract dedicated bincode.readInt function
InKryption Jul 11, 2024
4 changes: 2 additions & 2 deletions build.zig.zon
@@ -24,8 +24,8 @@
.hash = "1220b8a918dfcee4fc8326ec337776e2ffd3029511c35f6b96d10aa7be98ca2faf99",
},
.zstd = .{
-.url = "https://github.com/Syndica/zstd.zig/archive/a052e839a3dfc44feb00c2eb425815baa3c76e0d.tar.gz",
-.hash = "122001d56e43ef94e31243739ae83d7508abf0b8102795aff1ac91446e7ff450d875",
+.url = "https://github.com/Syndica/zstd.zig/archive/20a21798a253ea1f0388ee633f4110e1deb2ddef.tar.gz",
+.hash = "1220e27e4fece8ff0cabb2f7439891919e8ed294dc936c2a54f0702a2f0ec96f4b6c",
},
.curl = .{
.url = "https://github.com/jiacai2050/zig-curl/archive/8a3f45798a80a5de4c11c6fa44dab8785c421d27.tar.gz",
10 changes: 6 additions & 4 deletions src/accountsdb/accounts_file.zig
@@ -14,14 +14,16 @@ const AccountFileInfo = @import("snapshots.zig").AccountFileInfo;
/// Simple strictly-typed alias for an integer, used to represent a file ID.
///
/// Analogous to [AccountsFileId](https://github.com/anza-xyz/agave/blob/4c921ca276bbd5997f809dec1dd3937fb06463cc/accounts-db/src/accounts_db.rs#L824)
-pub const FileId = enum(u32) {
+pub const FileId = enum(Int) {
_,

+pub const Int = u32;
+
pub inline fn fromInt(int: u32) FileId {
return @enumFromInt(int);
}

-pub inline fn toInt(file_id: FileId) u32 {
+pub inline fn toInt(file_id: FileId) Int {
return @intFromEnum(file_id);
}

@@ -222,7 +224,7 @@ pub const AccountInFile = struct {
pub const AccountFile = struct {
// file contents
memory: []align(std.mem.page_size) u8,
-id: usize,
+id: FileId,
slot: Slot,
// number of bytes used
length: usize,
@@ -395,7 +397,7 @@ test "core.accounts_file: verify accounts file" {
const path = "test_data/test_account_file";
const file = try std.fs.cwd().openFile(path, .{ .mode = .read_write });
const file_info = AccountFileInfo{
-.id = 0,
+.id = FileId.fromInt(0),
.length = 162224,
};
var accounts_file = try AccountFile.init(file, file_info, 10);
431 changes: 311 additions & 120 deletions src/accountsdb/db.zig

Large diffs are not rendered by default.

5 changes: 2 additions & 3 deletions src/accountsdb/index.zig
@@ -67,8 +67,6 @@ pub const AccountRef = struct {
}
};

-const ReferenceMemory = std.AutoHashMap(Slot, ArrayList(AccountRef));
-
/// stores the mapping from Pubkey to the account location (AccountRef)
///
/// Analogous to [AccountsIndex](https://github.com/anza-xyz/agave/blob/a6b2283142192c5360ad0f53bec1eb4a9fb36154/accounts-db/src/accounts_index.rs#L644)
@@ -79,6 +77,7 @@ pub const AccountIndex = struct {
bins: []RwMux(RefMap),
calculator: PubkeyBinCalculator,

+pub const ReferenceMemory = std.AutoHashMap(Slot, ArrayList(AccountRef));
pub const RefMap = SwissMap(Pubkey, AccountReferenceHead, pubkey_hash, pubkey_eql);

const Self = @This();
@@ -387,7 +386,7 @@ pub const AccountIndex = struct {
.slot = accounts_file.slot,
.location = .{
.File = .{
-.file_id = FileId.fromInt(@intCast(accounts_file.id)),
+.file_id = accounts_file.id,
.offset = offset,
},
},
19 changes: 17 additions & 2 deletions src/accountsdb/readme.md
@@ -188,7 +188,7 @@ loading from a snapshot begins in `accounts_db.loadFromSnapshot` is a very
expensive operation.

the steps include:
-- reads and load all the account files
+- reads and load all the account files based on the snapshot manifest's file map
- validates + indexes every account in each file (in parallel)
- combines the results across the threads (also in parallel)

@@ -233,6 +233,21 @@ after validating accounts-db data, we also validate a few key structs:
- `Bank` : contains `bank_fields` which is in the snapshot metadata (not used right now)
- `StatusCache / SlotHistory Sysvar` : additional validation performed in `status_cache.validate`

+## generating a snapshot
+
+*note:* at the time of writing, this functionality is in its infancy.
+
+The core logic for generating a snapshot lives in `accounts_db.db.writeSnapshotTarWithFields`; the principal entry point is `AccountsDB.writeSnapshotTar`.
+The procedure consists of writing the version file, the status cache file (`snapshots/status_cache`), the snapshot manifest (`snapshots/{SLOT}/{SLOT}`),
+and the account files (`accounts/{SLOT}.{FILE_ID}`). This is all written to a stream in the TAR archive format.
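
As a minimal sketch of the archive layout described above, the entry paths could be formatted as shown here. The helper names `manifestPath` and `accountFilePath` are hypothetical and do not come from this PR, and representing a slot as `u64` is an assumption; only the path patterns are taken from the readme text.

```zig
const std = @import("std");

/// Path of the status cache entry, as given in the readme text.
pub const status_cache_path = "snapshots/status_cache";

/// Hypothetical helper: formats `snapshots/{SLOT}/{SLOT}` into `buf`.
/// Assumes slots are represented as `u64`.
pub fn manifestPath(buf: []u8, slot: u64) ![]const u8 {
    return try std.fmt.bufPrint(buf, "snapshots/{d}/{d}", .{ slot, slot });
}

/// Hypothetical helper: formats `accounts/{SLOT}.{FILE_ID}` into `buf`.
/// `file_id` is a `u32`, matching `FileId.Int` in this PR.
pub fn accountFilePath(buf: []u8, slot: u64, file_id: u32) ![]const u8 {
    return try std.fmt.bufPrint(buf, "accounts/{d}.{d}", .{ slot, file_id });
}
```

Both helpers write into a caller-provided buffer, so they need no allocator and return `error.NoSpaceLeft` if the buffer is too small.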

+The snapshot manifest file content is comprised of the bincoded (bincode-encoded) data structure `SnapshotFields`, which is an aggregate of:
+* implicit state: data derived from the current state of AccountsDB, like the file map for all the accounts which exist at that snapshot, or which have
+changed relative to a full snapshot in an incremental one
+* configuration state: data that is used to communicate details about the snapshot, like the full slot to which an incremental snapshot is relative.
+
+For full snapshots, we write all account files present in AccountsDB that are rooted, that is, whose slot is less than or equal to the latest rooted slot.
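
The rooted-file rule for full snapshots can be illustrated with a small sketch. The names `isRooted` and `selectRooted` are hypothetical, slots are assumed to be `u64`, and the real implementation iterates the AccountsDB file map rather than a plain slice of slots.

```zig
const std = @import("std");

/// Hypothetical predicate: true if an account file at `slot` would be
/// included in a full snapshot, given the latest rooted slot.
pub fn isRooted(slot: u64, latest_rooted_slot: u64) bool {
    return slot <= latest_rooted_slot;
}

/// Copies the rooted slots from `slots` into `out`, preserving order,
/// and returns the filled prefix of `out`.
/// Asserts that `out` is at least as long as `slots`.
pub fn selectRooted(slots: []const u64, latest_rooted_slot: u64, out: []u64) []u64 {
    std.debug.assert(out.len >= slots.len);
    var n: usize = 0;
    for (slots) |slot| {
        if (isRooted(slot, latest_rooted_slot)) {
            out[n] = slot;
            n += 1;
        }
    }
    return out[0..n];
}
```

Note that the comparison is inclusive: a file whose slot equals the latest rooted slot is still written.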

## read/write benchmarks
`BenchArgs` contains all the configuration of a benchmark (comments describe each parameter)
- found at the bottom of `db.zig`
@@ -257,4 +272,4 @@ swissmapBenchmark(500k accounts) 1 7715875 7715875 0
WRITE: 17.163ms (1.44x faster than std)
READ: 50.975ms (0.70x faster than std)
swissmapBenchmark(1m accounts) 1 17163500 17163500 0 17163500
-```
+```