Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storage Querying API #208

Closed
cybercent opened this issue Jul 4, 2020 · 37 comments · Fixed by #1892
Closed

Storage Querying API #208

cybercent opened this issue Jul 4, 2020 · 37 comments · Fixed by #1892
Assignees

Comments

@cybercent
Copy link

cybercent commented Jul 4, 2020

Issue To Be Solved

A way to iterate over the storage to be able to list the resources owned by an account.

(Optional): Suggest A Solution

  • using a Cadence script:
    for entry in account.storageEntries { log(entry) }
  • using RPC:
    getAccount(“0x01")
    The response result can have a storage field with data in the cadence format (json with types)
  • Details of the technical implementation
  • Tradeoffs made in design decisions
  • Caveats and considerations for the future

If there are multiple solutions, please present each one separately. Save comparisons for the very end.)

(Optional): Context

I’m working on a marketplace and I was looking for a way to list all NFTs owned by an account.

@turbolent turbolent changed the title List all resources owned by an account Storage Querying API Sep 9, 2020
@cybercent
Copy link
Author

Hi @turbolent is there an ETA for the Storage Querying API ?

@turbolent
Copy link
Member

@cybercent There's no ETA, but it's definitely one of the things we'll work on soon, along with other storage API improvements like #369 and #376.

I've added a roadmap for Cadence yesterday, and improving the storage API is a high-priority item.

If you're interested in helping out, let me know!

@10thfloor
Copy link
Contributor

Bump!

@dryruner
Copy link

@turbolent
Copy link
Member

@testrain No, this issue is still open. The documentation you linked requires knowledge about the fact that the account has certain values stored. The idea for this issue is to allow developers to query what storage paths store values, and what they are, without having to prior knowledge what is stored (i.e. what storage paths are used).

We're currently working on a new dictionary data structure and will eventually use is for account storage, effectively giving us the functionality to list storage paths which store values for free.

@atapin
Copy link

atapin commented Sep 16, 2021

@turbolent any updates for this issue?

@D10100111001
Copy link

This is the one thing that is highly painful and annoying when working with Flow. I think this should be prioritized. Any other blockchain allows any dev to generically get all NFTs for a given account. In the case of flow, because an NFT can be stored into any storage path, it is impossible to generically show all NFTs owned by the an account. Another problem that will need to be solved after the generic storage querying is figured out is generically loading metadata for the NFTs. The INFT interface should have the signature for a the getMetadata function so developers can generically pull it for any NFT in a collection.

@dete
Copy link

dete commented Nov 9, 2021

This is the one thing that is highly painful and annoying when working with Flow.

Agreed!

Any other blockchain allows any dev to generically get all NFTs for a given account.

Actually, I would disagree with this. Looking at Ethereum, it's easy to find out the list of NFTs of a specific type owned by an account, but that's also easy on Flow (by assuming the Collection is in the standard path). But on Ethereum, you have to know the contract address of all possible NFTs to know the list of all NFTs owned by a single account. That's barely tractable on Ethereum today, but not especially scalable for the future. Once Cadence has storage iteration, this will be something that is a MASSIVE improvement on Flow than on Ethereum.

This feature is taking a long time because it depends on a pretty significant rewrite of the underlying storage layer of Cadence. That rewrite is rolling out soon (code is complete and tested!), and so we hope to have storage iteration available to scripts and contracts very soon. Thanks to y'all for your patience!

@cybercent
Copy link
Author

@dete thank you for the update!
Thank you to the team for the continued effort you guys put in. Highly appreciated!
🖖

@turbolent
Copy link
Member

Once #1248 is merged we finally have the infrastructure in place to iterate over account storage. The next step would be to come up with an API.

Some challenges that need to be addressed:

  • Storage may contain resources
  • Storage may contain many objects
  • Cadence does not (yet) provide generics to Cadence programs (only built-in types may be generic)
  • Cadence does not (yet) provide means to dereference value-kinded objects (e.g. there is no way to get an Int from an &Int)

The first point makes defining a "storage entry" type for both resource-kinded and value-kinded (e.g. structs) objects difficult. For resource-kinded values, the entry should probably provide an authorized reference (auth &AnyResource). For value-kinded objects it would be more ergonomic to provide the value itself, instead of a reference to it (AnyStruct).

The second point makes adding a field let storageEntries: [StorageEntry] inefficient: If there are many objects in storage, then a large array would have to be temporarily allocated. A more efficient alternative could be an iterator.

Also, it might make sense to provide the run-time type of the object in the storage entry (e.g. a field let type: Type)

@turbolent
Copy link
Member

turbolent commented Nov 30, 2021

Maybe it is enough if the storage entries only contain the path and the type, and do not contain a value – we already have the storage API which can be used to load/copy/borrow the value for a given path.

struct StorageEntry {
  let path: StoragePath
  let type: Type
}

let account = getAccount(0x1)
account.forEachStored(fun (entry: StorageEntry): Bool {
  if entry.type == Type<FlowToken.Vault>() {
    log(entry.path)
    // stop iteration
    return false    
  }


  // continue iteration
  return true
})

Or without introducing a new type, which is not very useful on its own and making the information parameters of the iteration function:

let account = getAccount(0x1)
account.forEachStored(fun (path: StoragePath, type: Type): Bool {
  if type == Type<FlowToken.Vault>() {
    log(path)
    // stop iteration
    return false    
  }


  // continue iteration
  return true
})

@turbolent
Copy link
Member

In addition to iteration over the account storage, we can also add iteration over public and private links/capabilities, and the question is also, what should the API look like

@bluesign
Copy link
Contributor

bluesign commented Dec 2, 2021

I think iteration is nice for sure, but also additionally some kind of query functionality can be useful. ( to limit search area and return array of struct )

If we will use the struct below.

struct StorageEntry {
  let path: StoragePath
  let type: Type
}

I think we can also use query based ( returning [StorageEntry] ) functions.

Something like account.storageManager can be useful. ( later even maybe this can be used to move storage setup for DApps into contracts, without passing AuthAccount but passing storageManager )

For example:

account.storageManager.query(_ type: Type) [StorageEntry]
account.storageManager.queryFirst(_ type: Type) StorageEntry?

or if iterator:

account.storageManager.query(_ type: Type) StorageEntryIterator

for storageEntry in account.storageManager.query(Type<Topshot.Collection>()){
}

Also I think we can extend StorageEntry with load and borrow like below, so we can slowly get rid of Paths little bit.

struct StorageEntry {
  let path: StoragePath
  let type: Type
  fun borrow()
  fun load()
}

If we cover also links and capabilities with StorageEntry, can be useful I guess.

Other then query here also:

  • keep track of capabilities ( given to that path to other accounts )
  • if possible option to revoke

I am a bit torn between if needed to add some helper booleans like canBorrow canLoad etc. ( like capability check method )

But by moving a lot of functionality into StorageEntry can make it really powerful.

Also something like queryAndBorrowFirst can be useful.

Some typical Transaction can look like this:

let provider = acct.strorageManager.queryAndBorrowFirst(Type<DapperUtilityCoin.Vault{FungibleToken.Provider}>()) 
    ?? panic("Could not borrow a reference to the buyers FlowToken Vault")
self.purchaseTokens <- provider.withdraw(amount: UFix64(6)) as! @DapperUtilityCoin.Vault

@bjartek
Copy link
Contributor

bjartek commented Dec 3, 2021

I think it would be very nice to now only search in storage but also in public paths. Storage are right now not available from scripts, so the usefullness of that is limited IMHO.

Questions I want to have answered is:

  • Give me all capablities that can be borrowed as &{NonFungibleToken.CollectionPublic} or give med all wallets that is &{FungibleToken.Receiver}.

I really like @bluesign api from above. But for me allowing it to also search in what you can get a capability to this resource as would be awesome.

It would also be useful if we could combine this with the Metadata standard and be able to say. Give me all resources that have this View. Or even "give me all items in storage" that have this view, but that is a little bit heavy maybe?

@bluesign
Copy link
Contributor

bluesign commented Dec 3, 2021

I think best option is to expose storageManager ( or whatever we call it ) to both PublicAccount and AuthAccount. ( Immutability can work here very good )

It would also be useful if we could combine this with the Metadata standard and be able to say. Give me all resources that have this View. Or even "give me all items in storage" that have this view, but that is a little bit heavy maybe?

This would be very nice +1

Actually we can maybe introduce some Filter function to iterator/array etc, which will get a Cadence Any, then you will return True/False for some condition etc.

@bjartek
Copy link
Contributor

bjartek commented Dec 20, 2021

Are there any concrete plans for when this will be done? It is very important imho.

@bjartek
Copy link
Contributor

bjartek commented Jan 10, 2022

Are there any plans for this to span addresses? One other common «issue» on flow is get a listing of all nfts of a type across addresses.

Or even get a list of all account with the given thing stored/linked.

@turbolent
Copy link
Member

turbolent commented Jan 19, 2022

Thank you for the great ideas!

I agree, a type-based storage querying API would be great to have. Unfortunately it is currently not possible, as the FVM / state trie does not provide an index of type to storage locations.

As for iteration to other domains, like public and private, in addition to storage: That would be possible and sounds like a good idea 👍

@bluesign
Copy link
Contributor

@turbolent wouldn't it possible to handle this on cadence side ? Eventually it will be just a dictionary, {String : [Path]} where key is a type identifier. I don't think we need ability to deep query there.

@turbolent
Copy link
Member

@bluesign Yes, if we wanted to limit it to just root/top-level values, we could maintain the index from the Cadence side. I wonder though how useful it would be, as some developers may assume it returns all values of the type.

We had also discussed exposing the value traversal functionality that already exists in the implementation to Cadence, so one could use the querying function and the traversal function to find all values of a type (though it would be very expensive, as all intermediate data would need to get loaded)

@bluesign
Copy link
Contributor

@turbolent this is very good question actually.

In the context of containers, multi level storage, I see those options:

  • array / dictionary
  • composite resource

For composite resources, I think it is not not only feasible, but also not possible to query their members reliably.

I think even NFTCollection should not be queried for ownedNFTs. (Even id is key there we would need to use uuid which will not provide info)

For array and dictionaries: I think here is also little tricky, It would need some addressing like mentioned before, but it would be expensive.

On the other hand, I don't see much usage of dictionary/array in storage, they are rarely used in top level storage.

For me it seems like acceptable trade off to query top level by type, and possibly get rid of Paths a bit.

But more importantly, if we make Query storage by Type for collections and vaults, it would be super easy for wallets to say something like : 'this transaction will access your FlowTokenVault' etc so basic human readable transactions can be a thing in the future without simulating the transaction.

@turbolent
Copy link
Member

I totally agree that it would be nice to add a type-based querying API eventually, but I would also like to see an iteration querying API before that, as it could be implemented now.

What do people think about the proposal above (#208 (comment))? Or maybe someone has another good idea for an iteration API?

@bjartek
Copy link
Contributor

bjartek commented Mar 3, 2022

I think that proposal is a good place to start. Then you can build higher level abstractions on top of it later.

@sideninja
Copy link
Contributor

sideninja commented Mar 21, 2022

I totally agree that it would be nice to add a type-based querying API eventually, but I would also like to see an iteration querying API before that, as it could be implemented now.

What do people think about the proposal above (#208 (comment))? Or maybe someone has another good idea for an iteration API?

I like the second API example.

@turbolent
Copy link
Member

turbolent commented Jul 8, 2022

Discussed iteration / pagingation with @fxamacker and @ramtinms.

It won't be possible to create a "cursor" that will allow starting iteration over an atree data structure (e.g. ordered map, as used for account storage) in one script or transaction, then continue iteration in a subsequent script transaction against the state the cursor was created for.

If we loosen the requirement to effectively query previous state and always query the current state, the next problem are mutations to the container: If the container is mutated, e.g. a new key is inserted into an ordered map, then the iteration order changes, invalidating the cursor and leading to the iterator e.g. not reporting certain elements or reporting them multiple times.

If we loosen the requirement further and assume that the iterated container is not mutated, then @fxamacker thinks it might be possible to provide a cursor that can be serialized, so e.g. it can be returned from a script, then provided to a subsequent script, or it could be stored on-chain.

To help with the limitation of mutations invalidating the cursor, we could ensure the iterated data structure is not mutated, for example by:

  • Recommending developers to prevent mutations, e.g. by locking, and/or detecting mutations
  • Add built-in support for locking data structure for mutations, and/or provide a way to detect mutations by e.g. generating a "state identifier" for the data structure

@bluesign
Copy link
Contributor

bluesign commented Jul 9, 2022

I think solution here is detecting mutation ( not avoiding ). Technically nothing can iterate on mutating object. So mutation should just invalidate the cursor. I don't know much about atree structure, but probably ordered map can use some state identifier + key as cursor.

But if you limit discussion to path keys only. As we have ordered map, seek with prefix can work too. with seekWithPrefix and next we get rid of cursor totally, mutation responsibility falls on user of the api. But I still think state identifier is very nice idea in my opinion, I think it can be useful in other areas too. [0]

[0] #1777

@turbolent
Copy link
Member

turbolent commented Jul 22, 2022

This should probably implemented in three stages of functionality:

  1. Add support for retrieving all paths of an account, e.g. a computed field Auth/PublicAccount.paths: [Path]. This would not support pagination, but would be useful as an MVP.
  2. Add support for pagination, without handling mutation. This could be a function-based API, like above, or reified as an object as an iterator/enumerator/cursors.
  3. Add support for handling mutation during mutation, e.g. by locking or detecting mutation as described above

As for the implementation:

  • In the checker, AuthAccountType and PublicAccountType, add a new member, a constant field typed let paths: [Path]
  • In the interpreter, in NewAuthAccountValue add a computed field path. Implement it by getting the storage maps for all domains via Interpreter.Storage.GetStorageMap, and use StorageMap.Iterator and StorageMapIterator.NextKey to iterate over all keys of the storage map, the path identifiers. Create paths from the domain and the identifier

@turbolent
Copy link
Member

@janezpodhostnik Do you think we will need to add dedicated metering for this?

@SupunS
Copy link
Member

SupunS commented Jul 29, 2022

After seeing the implementation I have a suggestion for a slightly modified version of the proposed API:

Cross-posting from #1845 (comment):

Would it be better to group these 'paths' related operations (this one and the iterator functions) together? I am thinking of something similar to keys field (of AuthAccount).

e.g:

let paths: AuthAccount.Paths

and AuthAccount.Paths to be something like:

struct Paths {
   pub allPaths: [Path]

   pub fun forEachPublic(_ function: ((PublicPath, Type): Bool))

   pub fun forEachPrivate(_ function: ((PrivatePath, Type): Bool))

   pub fun forEachStored(_ function: ((StoragePath, Type): Bool))
}

The advantage is:

  • It groups similar operations together
  • The usage is more verbose (specially for the iterator functions):
    account.Paths.forEachPublic(...)

@bluesign
Copy link
Contributor

bluesign commented Jul 30, 2022

@SupunS good idea, maybe we can do something like:

With built in enum:

enum StorageDomain{
   ANY
   PUBLIC
   PRIVATE
   STORAGE
}

and

in AuthAccount:

   pub fun getPaths(_ StorageDomain): [Path]
   pub fun forEachPath(_ StorageDomain, _ function: ((Path, Type): Bool))

Do we have any benefit from getting PublicPath, StoragePath instead of Path ?

@turbolent
Copy link
Member

turbolent commented Aug 17, 2022

#1845 and #1851 have landed and will likely be deployed in the following spork

@dsainati1
Copy link
Contributor

The plan is to have iteration abort execution after storage is modified (either by saving or removing from it). However, we want people to be able to search storage for something and then perform one modification; e.g. searching for a specific capability in storage and then unlinking it. So instead of panicking immediately on the mutation, we will instead panic on the next invocation of the callback after the mutation is performed. This way if users return false from the callback, stopping iteration, after they do a write, then this use case is still supported.

@cybercent
Copy link
Author

🙏🏻 I appreciate all your efforts in bringing this feature to life.

@bluesign
Copy link
Contributor

@dsainati1 is it possible to make this panic free by making iterator return Bool instead of Void ? Something like iteration completed or not response.

@dsainati1
Copy link
Contributor

I would worry about people not properly checking the return value of the iterator and thinking that they properly iterated over everything in storage when they actually skipped a large number of elements. Continuing to iterate after storage is modified is significantly bad enough that I feel like a panic is appropriate.

@bluesign
Copy link
Contributor

I feel like this is very confusing API as is:

  • Allowing one mutation is worse than not allowing mutation actually. This creates a foot gun for people to assume one match. ( Already a lot of stuff making unfortunate assumption of one match, storage query API was a remedy for that )

  • If you need to cover multiple matches, you need to first iterate and save paths somewhere, then iterate second time to modify them.

@turbolent
Copy link
Member

Thank you everyone for contributing your feedback. The next release is going to ship the first feature related to storage querying: storage iteration.

This is just the first step and there have been discussions around other kinds of querying APIs above (e.g. by type). Please feel free to open a new feature request for those specifically.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.