UniqueIndex range_de #500

uint · 2021-10-20T06:18:08Z

Deals with #461

packages/storage-plus/src/indexes.rs

packages/storage-plus/src/indexed_map.rs

maurolacy · 2021-10-27T08:32:20Z

OK, there's now a working impl of range_de / prefix_de for UniqueIndex and MultiIndex.
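The difference between the `_de` variants and the raw `range` / `prefix` is that keys come back deserialized into a typed key, not as raw bytes. Below is a minimal std-only sketch of that idea. It assumes an in-memory `BTreeMap<Vec<u8>, String>` as the store. `KeyDeserialize`, `range_raw` and `range_de` are hypothetical stand-ins, not the cw-storage-plus API.

```rust
use std::collections::BTreeMap;

/// Hypothetical trait: rebuild a typed key from its raw storage bytes.
trait KeyDeserialize: Sized {
    fn from_slice(raw: &[u8]) -> Option<Self>;
}

impl KeyDeserialize for String {
    fn from_slice(raw: &[u8]) -> Option<Self> {
        String::from_utf8(raw.to_vec()).ok()
    }
}

impl KeyDeserialize for u32 {
    fn from_slice(raw: &[u8]) -> Option<Self> {
        // Keys are stored big-endian, so byte order matches numeric order.
        if raw.len() != 4 {
            return None;
        }
        Some(u32::from_be_bytes([raw[0], raw[1], raw[2], raw[3]]))
    }
}

/// Raw iteration: keys are returned exactly as stored (bytes).
fn range_raw(store: &BTreeMap<Vec<u8>, String>) -> Vec<(Vec<u8>, String)> {
    store.iter().map(|(k, v)| (k.clone(), v.clone())).collect()
}

/// Deserializing iteration: each key is parsed into `K`; `None` if any key fails.
fn range_de<K: KeyDeserialize>(store: &BTreeMap<Vec<u8>, String>) -> Option<Vec<(K, String)>> {
    store
        .iter()
        .map(|(k, v)| -> Option<(K, String)> { Some((K::from_slice(k)?, v.clone())) })
        .collect()
}
```

The caller picks the key type, for example `let typed: Vec<(u32, String)> = range_de(&store).unwrap();`. In this sketch, a key that does not parse makes the whole call return `None`.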

These impls come with some shortcomings / gotchas:

UniqueIndex: The PK type needs to be specified in order to deserialize the pk into it. It defaults to (), which means that no deserialization / data will be provided for the primary key. This could be useful for performance. This still needs to be tested, but should work well.

MultiIndex: The last element of the index tuple must be specified with the type you want it to be deserialized into. That is, the last tuple element serves as a marker for the deserialization type (in the same way PK does in UniqueIndex). This is not so convenient: the index function has to return a K (index key) composed of the proper types, which implies converting a byte slice to the proper type, etc. It is done this way here just for simplicity.
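The MultiIndex gotcha can be shown with a hypothetical index function in the style described. It is written as plain Rust functions, not as the real `MultiIndex` closure, whose exact signature is not reproduced here. `Data`, `name_index` and `name_index_raw` are illustrative assumptions. The pk is the last tuple element, so its type decides what `range_de` would hand back. That means the function itself must convert the pk byte slice into that type.

```rust
/// Hypothetical stored value.
#[derive(Clone, Debug, PartialEq)]
struct Data {
    name: String,
    age: u32,
}

/// Index key whose last element is the pk, already converted to the type that
/// deserialization should yield. The index function does the conversion itself.
fn name_index(value: &Data, pk: &[u8]) -> (String, String) {
    // Lossy UTF-8 conversion keeps the sketch short; real code should handle errors.
    (value.name.clone(), String::from_utf8_lossy(pk).into_owned())
}

/// Same index with the pk left as raw bytes: deserialization can then only
/// hand the pk back as bytes.
fn name_index_raw(value: &Data, pk: &[u8]) -> (String, Vec<u8>) {
    (value.name.clone(), pk.to_vec())
}
```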

Clearly (as @uint said to me in chat), these indexes (particularly MultiIndex) merit a re-design / serious refactoring. To name a few points:

~~First, let's refactor UniqueIndex and MultiIndex into their own, different files, for ease of modification and maintenance. (Refactor UniqueIndex and MultiIndex into their own files #530)~~ (done)
In MultiIndex, remove the pk from the index key spec. That is, handle index key multiplicity internally, using the pk or another method (I think this is doable, and I plan to address it in another iteration). Alternatively, make the index function type-aware on the pk element. That is, instead of using a byte slice, use a proper type for the pk element. (Remove the primary key from the MultiIndex key specification #533)
For MultiIndex, avoid querying the store twice to get the value associated with a pk. That means storing the value in the index key directly (like UniqueIndex does). This will be costlier in terms of storage, but more efficient / performant for iterations. It will also be simpler to implement (and more robust / clear). (No: we want the pk indirection in order to handle updated / removed values without re-indexing.)
Signal the pk deserialization type of MultiIndex using a PK trait or similar, like in UniqueIndex. In passing, rename K to IK (Index Key) for clarity. Then, make those PK traits automatic / inherited from the IndexedMap specification. That means, better encapsulation / coupling between IndexedMap and the *Index impls. (Improve MultiIndex pk deserialization #531)
Unify the key returned by UniqueIndex and MultiIndex. Currently, UniqueIndex returns just the pk, whereas MultiIndex returns the remainder of the key, including the pk as the last element. Either make both return just the pk, or make UniqueIndex behave like MultiIndex (better for usability / options). (UniqueIndex / MultiIndex key consistency #532)
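The key-consistency point can be seen in a small sketch. It assumes a multi index stored as an ordered set of `(name, age, pk)` keys. A prefix query on `name` returns the remainder of the key, `(age, pk)`, in the MultiIndex style. A UniqueIndex-style result would be just the pk. `MultiIdx`, `prefix_remainder` and `prefix_pks` are hypothetical names. For brevity, a linear filter stands in for a real prefix scan.

```rust
use std::collections::BTreeSet;

/// Hypothetical multi index: each stored key is (name, age, pk).
type MultiIdx = BTreeSet<(String, u32, String)>;

/// MultiIndex-style prefix result: the remainder of the key after `name`,
/// with the pk as the last element.
fn prefix_remainder(idx: &MultiIdx, name: &str) -> Vec<(u32, String)> {
    idx.iter()
        .filter(|(n, _, _)| n == name) // linear filter stands in for a prefix scan
        .map(|(_, age, pk)| (*age, pk.clone()))
        .collect()
}

/// UniqueIndex-style result: just the pk for each match.
fn prefix_pks(idx: &MultiIdx, name: &str) -> Vec<String> {
    prefix_remainder(idx, name)
        .into_iter()
        .map(|(_, pk)| pk)
        .collect()
}
```

Returning the full remainder keeps more information. Any caller that only wants the pk can drop the rest, as `prefix_pks` does.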

Will create issues with these, and they can be addressed eventually.

maurolacy · 2021-10-27T08:40:03Z

Also, there are a number of impls / methods that are still missing, for IndexedMap, SnapshotMap, and IndexedSnapshotMap.

Let's address those in another PR, though. (see #461 (comment))

maurolacy · 2021-10-27T08:42:09Z

@uint I cannot add you as a reviewer, but please take a look. Also, some documentation on this (besides tests) is still missing. Let's add it before merging, along with more tests.

ethanfrey · 2021-10-27T16:03:17Z

Some comments here... let's not get too overboard with more issues.

TL;DR: there is some good cleanup here that should be done. There are some larger design changes I would put off to later, and do in a non-breaking way, and support multiple MultiIndex approaches. But we should document them better for sure.

OK, there's now a working impl of range_de / prefix_de for UniqueIndex and MultiIndex.

These impls come with some shortcomings / gotchas:

UniqueIndex: The PK type needs to be specified in order to deserialize the pk into it. It defaults to (), which means that no deserialization / data will be provided for the primary key. This could be useful for performance. This still needs to be tested, but should work well.

Fair enough for now. But usually there is some critical info in the pk that is not in the value (like owner's address). We should deal with this in the future.

MultiIndex: The last element of the index tuple must be specified with the type you want it to be deserialized into. That is, the last tuple element serves as a marker for the deserialization type (in the same way PK does in UniqueIndex). This is not so convenient: the index function has to return a K (index key) composed of the proper types, which implies converting a byte slice to the proper type, etc. It is done this way here just for simplicity.

Ah, the 1 we store to hold its place... yeah, making this "work" is okay. We could try to make it prettier, but that will probably be state-breaking (forcing a migration of all indexes), so we should do it asap or never (before many people build 1.0 contracts and cannot update to the next cw-plus).

Clearly (as @uint said to me in chat), these indexes (particularly MultiIndex) merit a re-design / serious refactoring. To name a few points:

First, let's refactor UniqueIndex and MultiIndex into their own, different files, for ease of modification and maintenance.

Sounds good.

In MultiIndex, remove the pk from the index key spec. That is, handle index key multiplicity internally, using the pk or another method (I think this is doable, and I plan to address it in another iteration). Alternatively, make the index function type-aware on the pk element. That is, instead of using a byte slice, use a proper type for the pk element.

Huh? We use the pk as a minimal viable differentiation between the different indexed values. The alternative we considered was to have one index entry pointing to a Vec of all the pks that share that index. For a small number of items at one index it is better, but it becomes almost unusable if there are 1000s at one index.

We could provide 2 different implementations, one that uses pk in the key, other that stores Vec<Vec<u8>> for all pks as value. I would only use the second when you are sure some malicious actor cannot feed 1000s of values in there.
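The two designs can be compared in a std-only sketch that uses `BTreeMap` / `BTreeSet` as stand-ins for the contract's key-value store. `PkInKey` and `PksInValue` are hypothetical names. The point is the write pattern. The first design touches one entry per pk. The second keeps all pks for an index value in a single value, which a real KV store must load and re-save in full on every change.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Design 1 (pk in the key): one entry per (index key, pk) pair.
/// Adding or removing a pk touches only that pk's own entry.
#[derive(Default)]
struct PkInKey {
    entries: BTreeSet<(Vec<u8>, Vec<u8>)>,
}

impl PkInKey {
    fn add(&mut self, ik: &[u8], pk: &[u8]) {
        self.entries.insert((ik.to_vec(), pk.to_vec()));
    }
    fn remove(&mut self, ik: &[u8], pk: &[u8]) {
        self.entries.remove(&(ik.to_vec(), pk.to_vec()));
    }
    fn pks(&self, ik: &[u8]) -> Vec<Vec<u8>> {
        self.entries
            .iter()
            .filter(|(k, _)| k.as_slice() == ik)
            .map(|(_, pk)| pk.clone())
            .collect()
    }
}

/// Design 2 (pks in the value): one entry per index key holding all its pks.
/// In a real KV store this Vec is a single serialized value, so every add or
/// remove must load and re-save all of it; the cost grows with the pk count.
#[derive(Default)]
struct PksInValue {
    entries: BTreeMap<Vec<u8>, Vec<Vec<u8>>>,
}

impl PksInValue {
    fn add(&mut self, ik: &[u8], pk: &[u8]) {
        let list = self.entries.entry(ik.to_vec()).or_default();
        if !list.iter().any(|p| p.as_slice() == pk) {
            list.push(pk.to_vec());
        }
    }
    fn remove(&mut self, ik: &[u8], pk: &[u8]) {
        if let Some(list) = self.entries.get_mut(ik) {
            list.retain(|p| p.as_slice() != pk);
        }
    }
    fn pks(&self, ik: &[u8]) -> Vec<Vec<u8>> {
        self.entries.get(ik).cloned().unwrap_or_default()
    }
}
```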

For MultiIndex, avoid querying the store twice to get the value associated with a pk. That means storing the value in the index key directly (like UniqueIndex does). This will be costlier in terms of storage, but more efficient / performant for iterations. It will also be simpler to implement (and more robust / clear).

Storing the value directly in the index has an issue... every time you update the type, you must update all references. Currently, if you modify a value, but it doesn't change the indexes (you change different fields), those are not updated at all (at least that was the intention). Again, this is a design trade-off and I very consciously made this choice. Maybe the other one is valid as well.
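The pk-indirection trade-off can be illustrated with a minimal sketch. It assumes an in-memory primary map plus a `(name, pk)` multi index. `IndexedSketch`, `Data` and the `index_writes` counter are hypothetical, for illustration only. Saving a value whose indexed field (`name`) is unchanged leaves the index untouched. The price is a second lookup when reading: index to pk, then pk to value.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Clone, Debug, PartialEq)]
struct Data {
    name: String,
    age: u32,
}

/// Hypothetical indexed map with pk indirection: values live once in
/// `primary`; the multi index `by_name` stores only (name, pk).
#[derive(Default)]
struct IndexedSketch {
    primary: BTreeMap<String, Data>,
    by_name: BTreeSet<(String, String)>,
    /// Counts index mutations so a caller can observe when they happen.
    index_writes: usize,
}

impl IndexedSketch {
    fn save(&mut self, pk: &str, value: Data) {
        let new_ik = (value.name.clone(), pk.to_string());
        match self.primary.get(pk) {
            // Indexed field unchanged: the index is not touched at all.
            Some(old) if old.name == value.name => {}
            // Indexed field changed: move the index entry.
            Some(old) => {
                let old_ik = (old.name.clone(), pk.to_string());
                self.by_name.remove(&old_ik);
                self.by_name.insert(new_ik);
                self.index_writes += 2;
            }
            // New value: add its index entry.
            None => {
                self.by_name.insert(new_ik);
                self.index_writes += 1;
            }
        }
        self.primary.insert(pk.to_string(), value);
    }

    /// Reading costs two lookups: index -> pk, then pk -> value.
    fn load_by_name(&self, name: &str) -> Vec<(String, Data)> {
        self.by_name
            .iter()
            .filter(|(n, _)| n == name)
            .map(|(_, pk)| (pk.clone(), self.primary[pk].clone()))
            .collect()
    }
}
```

If the index stored full values instead, changing `age` alone would also require rewriting every index entry for that pk.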

The idea of doing some cleanup (multiple files) and then providing multiple MultiIndex implementations does seem okay.

Signal the pk deserialization type of MultiIndex using a PK trait or similar, like in UniqueIndex. In passing, rename K to IK (Index Key) for clarity. Then, make those PK traits automatic / inherited from the IndexedMap specification. That means, better encapsulation / coupling between IndexedMap and the *Index impls.

Makes sense

Unify the key returned by UniqueIndex and MultiIndex. Currently, UniqueIndex returns just the pk, whereas MultiIndex returns the remainder of the key, including the pk as the last element. Either make both return just the pk, or make UniqueIndex behave like MultiIndex (better for usability / options).

Very good point.

maurolacy · 2021-10-27T17:09:29Z

TL;DR: there is some good cleanup here that should be done. There are some larger design changes I would put off to later, and do in a non-breaking way, and support multiple MultiIndex approaches. But we should document them better for sure.

Agreed.

Fair enough for now. But usually there is some critical info in the pk that is not in the value (like owner's address). We should deal with this in the future.

Just to be clear: You can get those pks deserialized, but you need to specify their type using the PK generic. It's just that we have a PK = () in the struct definition, for backwards compatibility.
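A `PK = ()` default keeps existing code compiling while still allowing opt-in pk deserialization, as this minimal sketch shows. `PkDeserialize` and `UniqueIndexSketch` are hypothetical names, not the cw-storage-plus types. With `()`, the pk bytes are simply dropped. With a concrete type, they are parsed. Here `String` stands in for something like an owner address.

```rust
use std::marker::PhantomData;

/// Hypothetical trait: how a primary key is rebuilt from its stored bytes.
trait PkDeserialize: Sized {
    fn from_pk_bytes(raw: &[u8]) -> Self;
}

/// `()` ignores the pk bytes: nothing is parsed or allocated.
impl PkDeserialize for () {
    fn from_pk_bytes(_raw: &[u8]) -> Self {}
}

/// `String` stands in for, e.g., an owner address used as the pk.
impl PkDeserialize for String {
    fn from_pk_bytes(raw: &[u8]) -> Self {
        String::from_utf8_lossy(raw).into_owned()
    }
}

/// Simplified unique index holding (pk bytes, value) entries.
/// `PK` defaults to `()`, so code that never names it keeps compiling.
struct UniqueIndexSketch<T, PK = ()> {
    entries: Vec<(Vec<u8>, T)>,
    _pk: PhantomData<PK>,
}

impl<T: Clone, PK: PkDeserialize> UniqueIndexSketch<T, PK> {
    fn new(entries: Vec<(Vec<u8>, T)>) -> Self {
        UniqueIndexSketch { entries, _pk: PhantomData }
    }

    /// Returns (pk, value) pairs, with the pk deserialized as `PK`.
    fn range_de(&self) -> Vec<(PK, T)> {
        self.entries
            .iter()
            .map(|(pk, v)| (PK::from_pk_bytes(pk), v.clone()))
            .collect()
    }
}
```

Rust only applies a type-parameter default where the type is written out. For example, `let idx: UniqueIndexSketch<u64> = UniqueIndexSketch::new(entries);` gets `PK = ()`, while `UniqueIndexSketch<u64, String>` opts in to pk deserialization.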

MultiIndex: The last element of the index tuple must be specified with the type you want it to be deserialized into. That is, the last tuple element serves as a marker for the deserialization type (in the same way PK does in UniqueIndex). This is not so convenient: the index function has to return a K (index key) composed of the proper types, which implies converting a byte slice to the proper type, etc. It is done this way here just for simplicity.

Ah, the 1 we store to hold its place... yeah, making this "work" is okay. We could try to make it prettier, but that will probably be state-breaking (forcing a migration of all indexes), so we should do it asap or never (before many people build 1.0 contracts and cannot update to the next cw-plus).

OK. I think the best approach would be to separate the pk from the index, at the generic types level (i.e. having a proper PK generic in the multi index definition). Can work on this soon, as I have a good overview on how to do this now.

In MultiIndex, remove the pk from the index key spec. That is, handle index key multiplicity internally, using the pk or another method (I think this is doable, and I plan to address it in another iteration). Alternatively, make the index function type-aware on the pk element. That is, instead of using a byte slice, use a proper type for the pk element.

Huh? We use the pk as a minimal viable differentiation between the different indexed values. The alternative we considered was to have one index entry pointing to a Vec of all the pks that share that index. For a small number of items at one index it is better, but it becomes almost unusable if there are 1000s at one index.

Maybe I wasn't clear. I meant, removing the need to specify the pk in the index tuple. I.e. handling key multiplicity internally / opaquely to the user. I think this can be done, and have an idea on how to do it (related to my comment above).
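One way to handle multiplicity internally is sketched below, under stated assumptions. The index function returns only the index key `IK`, and the index itself appends the typed pk, so two values with the same `IK` never collide. `MultiIndexSketch`, `on_save` and `prefix_pks` are hypothetical names. The in-memory `BTreeSet` stands in for storage.

```rust
use std::collections::BTreeSet;

/// Hypothetical stored value.
struct Data {
    name: String,
}

/// Hypothetical multi index whose index function returns only the index key
/// `IK`. The index appends the typed pk itself, so equal `IK`s never collide
/// and the user never has to put the pk in the index tuple.
struct MultiIndexSketch<T, IK, PK> {
    index_fn: fn(&T) -> IK,
    entries: BTreeSet<(IK, PK)>,
}

impl<T, IK: Ord, PK: Ord + Clone> MultiIndexSketch<T, IK, PK> {
    fn new(index_fn: fn(&T) -> IK) -> Self {
        MultiIndexSketch { index_fn, entries: BTreeSet::new() }
    }

    /// Called when a value is saved under `pk`.
    fn on_save(&mut self, pk: &PK, value: &T) {
        let ik = (self.index_fn)(value);
        self.entries.insert((ik, pk.clone()));
    }

    /// All pks stored under `ik`, already typed as `PK`.
    fn prefix_pks(&self, ik: &IK) -> Vec<PK> {
        self.entries
            .iter()
            .filter(|(k, _)| k == ik)
            .map(|(_, pk)| pk.clone())
            .collect()
    }
}
```

For example, `MultiIndexSketch::<Data, String, u64>::new(|d: &Data| d.name.clone())` indexes by name only. Here the pk type is a separate generic, which also covers signaling the pk deserialization type.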

We could provide 2 different implementations, one that uses pk in the key, other that stores Vec<Vec<u8>> for all pks as value. I would only use the second when you are sure some malicious actor cannot feed 1000s of values in there.

For MultiIndex, avoid querying the store twice to get the value associated with a pk. That means storing the value in the index key directly (like UniqueIndex does). This will be costlier in terms of storage, but more efficient / performant for iterations. It will also be simpler to implement (and more robust / clear).

Storing the value directly in the index has an issue... every time you update the type, you must update all references. Currently, if you modify a value, but it doesn't change the indexes (you change different fields), those are not updated at all (at least that was the intention). Again, this is a design trade-off and I very consciously made this choice. Maybe the other one is valid as well.

You are right, I hadn't thought of that. It's costlier in terms of updates / removals. And, it has to be done for all the multi indexes. I like the "pk indirection" approach more now.

The idea of doing some cleanup (multiple files) and then providing multiple MultiIndex implementations does seem okay.

Signal the pk deserialization type of MultiIndex using a PK trait or similar, like in UniqueIndex. In passing, rename K to IK (Index Key) for clarity. Then, make those PK traits automatic / inherited from the IndexedMap specification. That means, better encapsulation / coupling between IndexedMap and the *Index impls.

Makes sense

The last part about encapsulation / coupling is not fully clear to me yet, but it can probably be done with some add_index helper or so.

Unify the key returned by UniqueIndex and MultiIndex. Currently, UniqueIndex returns just the pk, whereas MultiIndex returns the remainder of the key, including the pk as the last element. Either make both return just the pk, or make UniqueIndex behave like MultiIndex (better for usability / options).

Very good point.

I can work on this stuff next month, if there's not so much urgency with contracts work. I would really like to put this in a better / more consistent / more user friendly shape asap.

uint · 2021-10-28T21:12:12Z

@maurolacy Just looked through this. I don't think I have any specific nitpicks or ideas. Nice to see it works now and is more complete, shame about the less than pretty stuff.

maurolacy · 2021-10-29T05:12:29Z

No worries. I'll document this a little when finding some time, and I think we can merge it. And handle improvements in follow-ups.

ethanfrey · 2021-11-02T20:53:17Z

I agree this is mergeable.
Some follow up is clear.

Let's find time to discuss architecture for the larger follow-up issues before tackling them (but feel free to make issues in the meantime)

ethanfrey · 2021-11-15T10:15:08Z

Oh, man, there are like 5 PRs built on top of this, right?

Can you merge main in at some point? (A rebase is no longer possible.)

maurolacy · 2021-11-15T14:53:52Z

Yes, I'll rebase everything once the int key changes are merged to main.

maurolacy · 2021-11-15T15:09:51Z

Can you merge main in at some point? (A rebase is no longer possible.)

Just rebased it without issues.

ethanfrey · 2021-11-15T17:58:27Z

Cool. Want to merge this first, then the other branches on top?

maurolacy · 2021-11-15T18:41:08Z

Sure.

maurolacy · 2021-11-22T09:23:20Z

OK, all the follow-up issues / comments are addressed, and the changes are merged / integrated here.

This is also rebased from master.

Let's merge this.

packages/storage-plus/src/indexes/mod.rs

packages/storage-plus/src/indexed_snapshot.rs

uint · 2021-11-24T12:02:54Z

LGTM overall. Does what we need it to!

uint requested a review from maurolacy October 20, 2021 06:18

uint changed the base branch from main to 461-indexedmap-range_de October 20, 2021 06:19

maurolacy reviewed Oct 20, 2021

uint mentioned this pull request Oct 20, 2021

range_de for IndexMap #498

Merged

Base automatically changed from 461-indexedmap-range_de to main October 20, 2021 17:24

uint commented Oct 20, 2021

packages/storage-plus/src/indexed_map.rs (outdated, resolved)

maurolacy self-assigned this Oct 26, 2021

maurolacy force-pushed the 461-uniqueindex-range_de branch from 7838470 to d323592 October 26, 2021 13:48

maurolacy marked this pull request as ready for review October 26, 2021 13:54

maurolacy force-pushed the 461-uniqueindex-range_de branch from d323592 to a8e1434 October 26, 2021 14:47

maurolacy requested a review from ethanfrey October 27, 2021 08:42

maurolacy force-pushed the 461-uniqueindex-range_de branch 2 times, most recently from f106b3b to 5fb6ec5 October 27, 2021 13:34

maurolacy force-pushed the 461-uniqueindex-range_de branch from 5fb6ec5 to ae6871e November 3, 2021 06:06

maurolacy mentioned this pull request Nov 4, 2021

Add range_de to Map-like structs #461

Closed

maurolacy force-pushed the 461-uniqueindex-range_de branch from e00072f to 067fb5f November 9, 2021 14:37

maurolacy force-pushed the 461-uniqueindex-range_de branch from 37d9988 to 6eb222d November 15, 2021 15:08

maurolacy added 19 commits November 22, 2021 10:16

Add prefix_de test (18ca487)
Add missing keys_de shortcut (eddddf0)
Add missing prefix_range_de to SnapshotMap (1148c35)
Add extra sub_prefix_de test (3c2ce00)
Add prefix_range_de test for completeness (4c8ca24)
Add sub_/prefix_de to SnapshotMap for completeness (4db7407)
Add sub_/prefix_de tests (7f5b80f)
Change test names for consistency (e5eafe6)
Change prefix_range_de tests to use string keys (f56f457)
Remove unneeded clones (cae3ace)
Add unique index key deserialization FIXMEs (c6d55d0)
Change unique index test names for consistency (10ce6d6)
Add range_/keys_de to IndexedSnapshotMap (d4af909)
Add sub_/prefix_de to IndexedSnapshotMap for completeness (e070ab4)
Adapt tests to better work with deserializable values (535c8ff)
Add range_de multi index tests (c16e798)
Rename unique index range tests for consistency (30c6a50)
Add range_de unique index tests (a2a9fd7)
Add simple base sub_/prefix_/range_de tests for completeness (ba3099f)

maurolacy force-pushed the 461-uniqueindex-range_de branch from ce74d48 to ba3099f November 22, 2021 09:21

maurolacy and others added 2 commits November 24, 2021 10:14

Merge branch 'main' into 461-uniqueindex-range_de (61fb8e9)
Extend Index keys deserialization docs (49acea9)

uint commented Nov 24, 2021

packages/storage-plus/src/indexes/mod.rs (resolved)

ueco-jb reviewed Nov 24, 2021

packages/storage-plus/src/indexed_snapshot.rs (outdated, resolved)

ueco-jb approved these changes Nov 24, 2021

Clarify prefix_range/_de docs (b4203be)

maurolacy merged commit bdc9965 into main Nov 24, 2021

maurolacy deleted the 461-uniqueindex-range_de branch November 24, 2021 12:37
