Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[POC] BTreeMap with unbounded keys and values. #84

Closed
wants to merge 25 commits into from

Conversation

ielashi
Copy link
Contributor

@ielashi ielashi commented Jun 1, 2023

This commit is a proof of concept for a new version of BTreeMap that doesn't require keys or values to be bounded in size. This POC is a simplified and inefficient implementation, and I'd like to get feedback on the general direction of this approach. Specifically:

  1. Adding BOUND to Storable
    Because Rust doesn't support generic specializations, we cannot have special logic in the BTreeMap based on whether or not the keys/values implement BoundedStorable. To get around this, I propose merging BoundedStorable into Storable by introducing the optional BOUND field. Structures can then support both bounded and unbounded types, but with the downside being that bound checks would now happen at run-time.

  2. Introduce variable-length nodes.
    Nodes in the BTreeMap have been fixed in size, where the maximum possible size is allocated for each key and value. In the V2 proposed in this POC, nodes can be variable in size, growing and shrinking to accommodate its contents. A node still contains the same bounds on the number of entries as in V1, and that allows us to reuse the same BTreeMap logic, which remains practically untouched.

    A node is initially given a page_size, and when it needs to outgrow it, it does so by allocating additional overflow pages of the same size. Having everything grow by a fixed page allows us to reuse the constant-size chunk Allocator with no modifications.

The inefficiency of the current implementation is due to it loading and writing all the entries on every load/save. A production-grade implementation would make these loads lazy.

Advantages of this approach

  • Minimal to no change to the BTreeMap logic.
  • No change to the BTreeMap's allocator.
  • No change to the BTreeMap API.
  • V1 versions of the map can be upgraded to V2 seamlessly

Disadvantages of this approach

  • Requires specifying a page_size. A page_size that's too large will waste memory, and a page_size that's too small will lead to a lot of overflow pages. If the developer specifies bounds for keys and/or values, then these can be used as heuristics to set an appropriate page_size.
  • There is some complexity with supporting lazy saving of entries, where only the dirtied entries are written to memory. Because the sizes of entries are unbounded, we'll need something like a first-fit allocator to rewrite dirtied entries within the page efficiently.

@ielashi
Copy link
Contributor Author

ielashi commented Jun 21, 2023

Having done a bit more research, I'm not sure that removing the BoundedStorable trait in favor of adding BOUND to the Storable trait is the best approach. Not only do bound checks now need to happen at runtime, but it's also impossible to compose bounded types, like we do with tuples. The alternative would be to introduce a new type, say UnboundedBTreeMap. It's unfortunate, but it seems to be the cleaner, more type-safe approach.

@lastmjs
Copy link

lastmjs commented Jun 21, 2023

Would the developer need to or be able to set the page_size? Ideally this complexity would not be exposed to the developer.

What options do we have for minimizing the impact of the page_size, for example could the page_size start out at a reasonable default and adapt itself over time?

@ielashi
Copy link
Contributor Author

ielashi commented Jun 22, 2023

Would the developer need to or be able to set the page_size? Ideally this complexity would not be exposed to the developer.

No, this would be set internally by the BTreeMap, but we could potentially provide an option to the developer to set it, should they be a power user.

What options do we have for minimizing the impact of the page_size, for example could the page_size start out at a reasonable default and adapt itself over time?

As things stand, where we are using a constant-size chunk allocator, once a page_size is set, it cannot be changed. I'm exploring an alternative design to this POC where we instead use a dynamic allocator. I'll post more on that soon.

@lastmjs
Copy link

lastmjs commented Jun 22, 2023

I'm very interested in the dynamic allocator, sounds great.

@dfinity dfinity deleted a comment from dragoljub-duric Jul 19, 2023
@dfinity dfinity deleted a comment from dragoljub-duric Jul 19, 2023
ielashi added a commit that referenced this pull request Aug 17, 2023
## Problem
We'd like to expand the functionality of `BTreeMap` to support unbounded
types. To add this support in a backward-compatible way, the `BTreeMap`
would need to support both bounded types (i.e. types that implement
`BoundedStorable`) and unbounded types (i.e. types that don't implement
`BoundedStorable`).

Rust doesn't support generic specializations, so we cannot have the same
`BTreeMap` support both use-cases simultaneously. It can either support
only `BoundedStorable` types, or all types implementing `Storable`, but
without the knowledge of whether a `BoundedStorable` implementation is
available for those types.

## Solution
Merge `BoundedStorable` and `Storable` into the same trait. The one
downside of this approach is that checking whether or not a bound exists
now happens at run-time.

This impacts the developer experience. Prior to the PR, if you try to
store a `String` inside a `StableVec`, the compiler will complain that
`String` doesn’t implement `BoundedStorable`. But, with this solution,
it’ll happily compile, and only when you run your code will you see a
panic that `String` is not bounded.

This PR relates to #84 and #69.
@ielashi
Copy link
Contributor Author

ielashi commented Sep 14, 2023

Closing as all these changes have been incorporated and merged into main in other PRs.

@ielashi ielashi closed this Sep 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants