Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: stabilize Vector and improve docs #815

Merged
merged 11 commits into from
Aug 23, 2022
Merged

Conversation

austinabell
Copy link
Contributor

  • Exposes store module and documents it
  • Make doc examples more useful

Link to discussion about backwards compatibility of Vector https://near.zulipchat.com/#narrow/stream/308449-pagoda.2Fcore-tools/topic/collections.20backwards.20compatibility/near/282529069

This will likely not be pulled in before 4.0.0 stable -- since more opinionated changes will likely be done in 4.1 hopefully with all other store collection types

@austinabell austinabell requested a review from itegulov May 16, 2022 19:15
@austinabell austinabell marked this pull request as draft May 16, 2022 19:15
@austinabell austinabell marked this pull request as ready for review August 2, 2022 20:46
near-sdk/src/store/mod.rs Outdated Show resolved Hide resolved
near-sdk/src/store/mod.rs Outdated Show resolved Hide resolved
near-sdk/src/store/index_map.rs Show resolved Hide resolved
let mut v1 = Vec1::new(b"m");
v1.extend([1u8, 2, 3, 4]);
// Old collections serializes length as `u64` when new serializes as `u32`.
assert!(Vector::<u8>::try_from_slice(&v1.try_to_vec().unwrap()).is_err());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel like it would be nice to have a convenient function to migrate from the old Vector collections to the new store one, but might be out of place, so leaving this one up to you.

Copy link
Contributor Author

@austinabell austinabell Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, yeah. It's definitely opinionated, and was thinking of doing it in a separate PR because of that, but I can do it now.

I'm thinking of just a function like:

impl Vector<T> {
  fn to_v2(&self) -> crate::store::Vector { ... }
}

Reason being is that the data that is cloned is small, and most migration functions don't have owned self for some reason (assuming because the SDK marks it as a view call, but unclear why that would matter).

@ChaoticTempest @itegulov do you have an intuition of what the API would look like?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to me. The whole structure is literally just length + prefix, so cloning does not seem to be a big issue to me.

most migration functions don't have owned self for some reason (assuming because the SDK marks it as a view call, but unclear why that would matter).

We had a conversation about this recently. Functions that take owned self are considered to be view and they can't write state back to storage, so they wouldn't be useful for migration anyway, no?

Unless I am misunderstanding something usually a user would version their contract like this:

enum Contract {
    V0(StateV0),
    V1(StateV1),
}

Where StateV0 uses old collections and StateV1 uses new collections. And then they would add a migration function that migrates all fields from StateV0 using one of the functions provided by us (.to_v2()). Right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's only if they structured their contract to be versioned from the start. If you don't, you would likely need to run a manual migration, otherwise, you wouldn't know which version of the object exists in storage if you tried to do ambiguously

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but are we thinking of getting rid of the old collections eventually? If so, we would have to move this function elsewhere than collections::Vector and someone might forget to do so. So probably best to have it be apart of the new collections as well, like from_v1

near-sdk/src/utils/stable_map.rs Show resolved Hide resolved
@austinabell
Copy link
Contributor Author

austinabell commented Aug 19, 2022

So one other thing I'd like to be sure about is the API of the swap_remove function.

Vec::swap_remove function panics on out of bounds, which I believe is just a mistake, but not able to be changed. VecDeque returns an Option on swap removes, which avoids the requirement of doing bounds checks.

Either we:

  • Keep as is, the benefit is that it matches std Vec implementation
  • Change to returning Option, seems like a better API and will not have asymmetry when/if we implement VecDeque

@matklad, if you have a second, seems like this might be a high-value but quick gut check for what might be better long-term.

@austinabell austinabell merged commit 3c32d4f into master Aug 23, 2022
@austinabell austinabell deleted the austin/vec_stabilize branch August 23, 2022 14:21
@austinabell
Copy link
Contributor Author

Merging now. Can change the above, if decided, before the next stable release (likely in a few weeks). Likely fine to stay as is though to keep parity with std::Vec

@@ -7,6 +7,7 @@ use std::marker::PhantomData;
use borsh::{BorshDeserialize, BorshSerialize};

use crate::collections::append_slice;
use crate::store::IndexMap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import is unused except with the "unstable" feature

Suggested change
use crate::store::IndexMap;
#[cfg(feature = "unstable")]
use crate::store::IndexMap;

Copy link
Contributor

@miraclx miraclx Aug 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI should probably deny warnings, wdyt? #895

Comment on lines +1 to +8
//! Collections and types used when interacting with storage.
//!
//! This module is the updated version of [`near_sdk::collections`](crate::collections) where the
//! data structures are more optimized and have a closer API to [`std::collections`].
//!
//! These collections are more scalable versions of [`std::collections`] when used as contract
//! state because it allows values to be lazily loaded and stored based on what is actually
//! interacted with.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth it to spend a bit more words on clarifying the difference between in-memory collections and such persistent collections. It's not immediately obvious why this needs needs to exist.

My take would be something like this:


Fundamentally, a contract's storage is a key/value store where both keys and values are just Vec<u8>. If you want to store some structured data, for example, Vec<Account>, one way to achieve that would be to serialize the Vec to bytes and store that. This has a drawback in that accessing or modifying event a single element would require reading the whole vec from the storage.

That's where store modules helps. It collections are backed by a key value store. For example, a store::Vector is stored as several key-value pairs, where indices are the keys. So, accesing a single element would only load this specific element.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is that paragraph which indicates the high-level difference, but I suppose I can add more detail here.

Comment on lines +3 to +4
//! This module is the updated version of [`near_sdk::collections`](crate::collections) where the
//! data structures are more optimized and have a closer API to [`std::collections`].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd avoid mentioning old code in the docs for the new code, except maybe as a footnote in the end. Instead, it's better to add migration docs to the old code.

len: u32,
values: IndexMap<T>,
pub(crate) len: u32,
pub(crate) values: IndexMap<T>,
}

//? Manual implementations needed only because borsh derive is leaking field types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, wait, what's the semantics of BorshSerialize? This only serializes the prefix, so that we can have nested collections, righth?

So, eg, if I want so return the sate of my contract as borsh for some contract call, if I just use serialize what I'd get would be the prefix (pointer), not the actual data? This is a bit surprising, worth documenting perhaps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, the main reason is to avoid loading all elements into memory when deserializing state. But yeah, it seems worth it to document that more specifically to avoid returning one of these structures from a function.

Ideally, we wouldn't even have this implementation, but the SDK codegen assumes that the state implements Borsh(De)Serialize to store and load state from storage

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit surprising, worth documenting perhaps.

On the NEAR Con, one person was struggling with exactly they had a code like

#[derive(borsh)]
struct Thing {
   components: Vec<Component>
}

#[near_bindgen]
impl Contract {
    pub fn call_me(&self, thing: Thing)
}

They also have trouble figuring out how to make that use borsh or json, and ingeneral got entangled in two different serialization formats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, that is very unrelated to this PR. Perhaps worth improving docs/visibility though. @itegulov wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, during NEARCon I have also seen a person who was confused about collections' (de)serialization semantics. Trying to generate an ABI actually helps to catch these kinds of errors as there is no JsonSchema/BorshSchema derivation for collections. We could perhaps somehow tailor the errors to communicate the reasoning (i.e. you are not supposed to use Vec for your public API) to the users.

@@ -61,8 +115,8 @@ pub struct Vector<T>
where
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document what's the semantics of prefix argument of the new. Specifically, that the user should ensure that nothing else collides with this prefix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document why set is needed, and when to use it over IndexMut.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants