Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce TLV for bdk_file_store #1499

Closed
notmandatory opened this issue Jul 2, 2024 · 12 comments
Closed

Introduce TLV for bdk_file_store #1499

notmandatory opened this issue Jul 2, 2024 · 12 comments
Assignees

Comments

@notmandatory
Copy link
Member

notmandatory commented Jul 2, 2024

Sub-task for #1103

@notmandatory notmandatory added api A breaking API change module-database labels Jul 2, 2024
@notmandatory notmandatory added this to the 1.0.0-alpha milestone Jul 2, 2024
@notmandatory notmandatory added this to BDK Jul 2, 2024
@notmandatory notmandatory moved this to Todo in BDK Jul 2, 2024
@notmandatory notmandatory removed the api A breaking API change label Jul 2, 2024
@evanlinjin
Copy link
Member

Okay technically we don't need the T since we always append the same type. I propose the following:

Encoding

Encode each element as:

[Version: VarInt] [Length: VarInt] [Data: Bytes]

Where VarInt is defined here: https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer.

VersionedElement trait

pub trait VersionedElement: Sized {
    /// Decode the `data` of given `version`.
    fn decode_with_version(version: u64, data: &[u8]) -> Result<Self, VersionedDecodeError>;

    /// The version to encode the data with.
    fn encode_version() -> u64;
}

pub enum VersionedDecodeError {
    UnknownVersion(u64),
    Bincode(bincode::Error),
}

@LLFourn
Copy link
Contributor

LLFourn commented Jul 3, 2024

I did a lot of thinking about this.

Backwards compat

/// Persists an append-only list of changesets (`C`) to a single file.
#[derive(Debug)]
pub struct Store<C, V = C> {
    magic_len: usize,
    db_file: File,
    marker: PhantomData<(V, C)>,
}

impl<V, C> Store<V, C>
where
    C: Into<V> + Append,
    V: Into<C> + serde::Serialize + serde::de::DeserializeOwned,

As you can see there is a new type parameter V -- this is the type that is serialized to and from the actual disk.

So in practice you'd have:

pub struct MyChangeSetV0_17_0 {... }
pub struct MyChangeSet { ... }

// serde serde etc
pub enum VersionedMyChangeSet {
    V0_17_0(MyChangeSetV0_17_0)
    Current(MyChangeSet)
}

impl From<MyChangeSet> for VersionedMyChangeSet {
     fn from(value: VersionedMyChangeSet) -> MyChangeSet {
         // do the obvious
     }
}

Then you'd set V as VersionedMyChangeSet and C as MyChangeSet.

If we do have forwards compatibility we need a length prefix. This can be done by just encoding each entry into a Vec<u8> and bincode encoding the result to the file.

This makes backwards compat optional and straight forward to achieve. You just have to nudge users to create this enum. For wallet we can create VersionedChangeSet and do all this without depending on bdk_file_persist I'm sure it's likely to be useful in the future anyway.

When making new changesets for existing types (which for the record I actually think we can avoid) we keep the old one around of course and put ChangeSetV0_WHAT_EVER at the end. bdk_wallet will probably use this in turn to develop a new changeset when it goes to v2.

Forwards Compat

To do forwards compat we just write the length of each entry before we serialize it. This can be done in a couple of ways. e.g. https://docs.rs/bincode/latest/bincode/config/trait.Options.html#method.serialized_size or by just serialzing to a Vec<u8> and serializing that.

Now that we have the length, when we decode an entry to a Vec<u8>, we try and decode as V. If it fails we strip the enum varint (tell bincode to just read an int), and then just try to decode as C, if that fails we finally throw an error.

So this gives you forwards compat as long as whatever the future V is that has been written encodes first a varint and then something that C can partially decode. This would be the case if the new version has just added an extra optional field.

For what it's worth, I discussed with Evan about things we had as breaking changes coming up or speculated and none of them actually were compatible with forwards compat (none of them would be just adding an optional field).

@evanlinjin
Copy link
Member

I agree with @LLFourn's suggestion. I think someone should take this ticket on.

@LLFourn
Copy link
Contributor

LLFourn commented Jul 4, 2024

I later realized the "Forwards Compat" bit is not totally sound. Before trying to do a decode on a higher versioned entry you need to actually check it is higher versioned not just assume so because it failed. It could be a corrupted lower version entry or something else has gone wrong. This means we'd need to know the current version.

It's actually kinda possible to do this automatically with bincode if you require Default for V and just encode the Default and see what int it encodes as to get the implicit version number. Since we already enforce Default for Append, then V should do it too. A little hacky but it does give us the feature. I'd be tempted to not actually implement this just yet but leave forwards compat later. I fully realize that "let's do forwards compat later" sounds like the famous last words.

@nymius
Copy link
Contributor

nymius commented Jul 4, 2024

I agree with @LLFourn's suggestion. I think someone should take this ticket on.

I would like to give it a try.

@notmandatory
Copy link
Member Author

@nymius I've assigned this to you, thanks for jumping in to take this on.

@notmandatory
Copy link
Member Author

notmandatory commented Jul 5, 2024

Can we push this TLV work for the bdk_file_store out of the 1.0.0-alpha milestone?

This would let the team focus on #1496 and #1498 changes and we'd only deliver backward compatibility for the beta with bdk_sqlite. The examples would also have to be changed to use bdk_sqlite but that should be faster/easier to review than redesigning bdk_file_store.

@notmandatory
Copy link
Member Author

I discussed this with @evanlinjin and decided this issue can be moved to the 1.0.0-beta milestone. We just need to warn beta users that if they want backward data file compatibility they'll need to use the bdk_sql until this issue is done.

@notmandatory notmandatory modified the milestones: 1.0.0-alpha, 1.0.0-beta Jul 8, 2024
@nondiremanuel nondiremanuel moved this from Discussion to In Progress in BDK Aug 20, 2024
@LLFourn
Copy link
Contributor

LLFourn commented Sep 10, 2024

I'd like to know what @nymius and @evanlinjin actually think about whether it's a good idea to actually do this. Before I speculated about how to this but didn't really consider too deeply whether or not it's actually a good idea. We could just leave bdk_file_store as a low dependency ultra-minimal example of how to store and retrieve changesets. It's helpful in development before you commit to maintaining a specific backend for your users. Once we start making it forwards/backwards compatible we are implying that it can be used in production. I have no real fear about the design being used in production. I am concerned that bringing this to the level of quality needed to be confidently used in production might be wasted since I haven't heard of people wanting to use this? All the extra code also detracts from the its high-level pedagogical value since it's mostly about dealing with the particulars of bincode.

@nymius
Copy link
Contributor

nymius commented Sep 11, 2024

I didn't considered the legitimacy of this as a real production case either. It seemed a good case just for the comments on #1103.
Besides its utility I think I'm closer in the implementation to what you and evan underpinned in the comments, I only lacked the time to finish it last week.
From a pedagogical point of view, the code certainly gets more complex and probably hides what you are trying to "teach".
Looking at its production application, what are the alternatives? sqlite? sqlx? what benefits they bring in relation to bincode? Thinking about redundancy for example sqlite and file-store would have the same method: duplicate the file and copy it in another location.
One drawback I can think of is that Bincode is harder to recover if the file brakes, because we cannot be sure were the error occurred just from the file.
Something bincode doesn't have that SQLite does, is the ability to record snapshots, without having to implement any extra logic, for example. Something PostgreSQL is capable too.
Considering the number of operations database engines can perform that file-store would have to implement itself, I would consider it production unsuitable, as these operations usually improve the reliability of the application, and for file-store would have to be implemented from scratch with all the costs this involves.

@notmandatory
Copy link
Member Author

My 2 sats on this is that file store is 1. a good as a simple example, 2. only really needs to show backward compatibility (not forward), and 3. isn't needed for the 1.0 final release so should be moved out of the 1.0.0-beta milestone.

I do think most production apps will use sqlite for single user apps (ie mobile), or sqlx for larger multi-user apps (on servers). In my experience SQL stores are reliable and most devs know how to work with them. Unless someone has a very limited resources device that needs to tightly control dependencies I don't think a flat file based persister will be a very popular wallet data store option. But still as an example and inspiration for anyone who does need a custom data store it's good to have.

@notmandatory notmandatory removed this from the 1.0.0-beta milestone Sep 19, 2024
@LLFourn
Copy link
Contributor

LLFourn commented Oct 27, 2024

Let's close this for now then. Please see #1661 which documents the state of bdk_file_store.

@LLFourn LLFourn closed this as completed Oct 27, 2024
@github-project-automation github-project-automation bot moved this from Needs Review to Done in BDK Oct 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

No branches or pull requests

4 participants