Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Key trimming after removing a key from index #7

Open
adlrocha opened this issue Jul 9, 2021 · 3 comments
Open

Key trimming after removing a key from index #7

adlrocha opened this issue Jul 9, 2021 · 3 comments

Comments

@adlrocha
Copy link
Collaborator

adlrocha commented Jul 9, 2021

Extends: #5

After removing a key from the index, if we want to optimize for storage we need to reorganize keys.
We only save a few bytes per removal, and only in certain cases, but it may be worth if we are storing a large amount of data (it may have limited impact for the MVP):

// Trim example
3 4 5                3 4 5                     3 4 5                        3 4 5    
3 4 6 6              3 4 6 6                   3 4 6 6                      3 4 6 6
3 4 6 8 (remove) --> 3 4 6 9 1 3 (remove) -->  3 4 6 9 2 4 (can trim?) -->  3 4 6 9 (trimmed)
3 4 6 9 1 3          3 4 6 9 2 4               
3 4 6 9 2 4           


// No trim example
3 4 5                3 4 5                     3 4 5                        3 4 5    
3 4 6 6              3 4 6 6                   3 4 6 6                      3 4 6 6
3 4 6 8 (remove) --> 3 4 6 9 1 3 (remove) -->  3 4 6 9 2 4 (can trim?) -->  3 4 6 9 2 4 (nope..)
3 4 6 9 1 3          3 4 6 9 2 4               3 4 6 9 2 5                  3 4 6 9 2 5
3 4 6 9 2 4          3 4 6 9 2 5  
3 4 6 9 2 5

In each removal a single key is eligible to be trimmed, the one that occupies the place of the removed key. The algorithm to trim is the following:

  • Check the largest common prefix of the replacing key with the previous key and the following one.
    • If len(common_prefix_prev) > len(common_prefix_next) --> Can be trimmed
    • Else --> do not trim.
@rvagg
Copy link
Member

rvagg commented Jul 12, 2021

would love to have @vmx engage on this but I think he's out for a couple of weeks

@adlrocha
Copy link
Collaborator Author

Agree @rvagg. There's no rush to implement this so let's wait for his input. Thanks!

@vmx
Copy link
Member

vmx commented Jul 26, 2021

The textual explanation kind of seems to be correct, though I'd like to make it more explicit: whenever you do any operations on the keys, you always need to look at the previous and the next key. I think (I'm not 100% sure) that if the invariant "it is always the shortest possible key" always holds true, that you always need to look at the next key and no key after that (anyone good with formal proving? I think this would be a nice case to get into that world, I might do that).

I provide examples here, that can then be translated into test cases once it gets implemented.

The first example isn't completely right, as it doesn't fulfill the "shortest key possible" invariant. The last two lines shouldn't have the last digit, here's the fixed version:

// Trim example
3 4 5                3 4 5                   3 4 5                      3 4 5    
3 4 6 6              3 4 6 6                 3 4 6 6                    3 4 6 6
3 4 6 8 (remove) --> 3 4 6 9 1 (remove) -->  3 4 6 9 2 (can trim?) -->  3 4 6 9 (trimmed)
3 4 6 9 1            3 4 6 9 2               
3 4 6 9 2      

Though you also need to look at the "new" next key before trimming. Here's an example based on the previous one, where the key doesn't get trimmed, due to the next key:

// Look at new next key trim example
3 4 5                3 4 5                   3 4 5                      3 4 5    
3 4 6 6              3 4 6 6                 3 4 6 6                    3 4 6 6
3 4 6 8 (remove) --> 3 4 6 9 1 (remove) -->  3 4 6 9 2 (can trim?) -->  3 4 6 9 2 (*not* trimmed)
3 4 6 9 1            3 4 6 9 2               3 4 6 9 3                  3 4 6 9 3 
3 4 6 9 2            3 4 6 9 3 
3 4 6 9 3     

Given that fact, I'd like to expand the first example again (the first example is still valuable as a unit test though, it's the case where there is no next key):

// Trim example with next key that doesn't have a large common prefix
3 4 5                3 4 5                   3 4 5                      3 4 5    
3 4 6 6              3 4 6 6                 3 4 6 6                    3 4 6 6
3 4 6 8 (remove) --> 3 4 6 9 1 (remove) -->  3 4 6 9 2 (can trim?) -->  3 4 6 9 (trimmed)
3 4 6 9 1            3 4 6 9 2               3 4 7                      3 4 7 
3 4 6 9 2            3 4 7 
3 4 7     

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants