Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(storage): implement its own GC method #2802

Merged
merged 36 commits into from
Jun 19, 2020
Merged

feat(storage): implement its own GC method #2802

merged 36 commits into from
Jun 19, 2020

Conversation

ailisp
Copy link
Member

@ailisp ailisp commented Jun 6, 2020

implement #2732
remove cache and update gc count. because gc count is in batch update, cache it in ChainStoreUpdate to implement semantic: column.gc_count += 1.

Test Plan

Test it still pass existing gc test and gc count of each column correctly update when remove_old_data
This PR is also introduce the first storage migration, tested by near init, near run with master binary, then use this branch binary to run, observe not crash and new column created.

@gitpod-io
Copy link

gitpod-io bot commented Jun 6, 2020

@ailisp ailisp changed the title skeleton that rustc accepts feat(storage): implement its own GC method Jun 6, 2020
@Kouprin
Copy link
Member

Kouprin commented Jun 6, 2020

This is awesome. I approve the structure.
@frol any your suggestions at this point?

@frol
Copy link
Collaborator

frol commented Jun 6, 2020

I like it!

@ailisp
Copy link
Member Author

ailisp commented Jun 9, 2020

Call gc method on all gcing now. test failed due to:

debug_assert!(
            self.transaction.ops.len()
                == self
                    .transaction
                    .ops
                    .iter()
                    .map(|op| match op {
                        DBOp::Insert { col, key, .. } => {
                            (*col as u8, key)
                        }
                        DBOp::Delete { col, key } => {
                            (*col as u8, key)
                        }
                    })
                    .collect::<std::collections::HashSet<_>>()
                    .len(),
            "Transaction overwrites itself: {:?}",
            self
        );

Trying to figure out a way to write "update times of gc on col during this batch of update" only once, to pass this assert

core/store/src/lib.rs Outdated Show resolved Hide resolved
chain/chain/src/store.rs Outdated Show resolved Hide resolved
chain/chain/src/store.rs Outdated Show resolved Hide resolved
@ailisp
Copy link
Member Author

ailisp commented Jun 11, 2020

@Kouprin Tests failed quite weirdly. I debug the basic test clear_old_data, was failed at assert block 1 not removed (therefore probably everything is not removed). I added some debug print to trace chain_store_update, store_update merge and commit, and how get_block behaves. It turns out to be block is removed in storage, but still exists in sizedcache (know this by tracing read_with_cache, and if comment read_with_cache "read_without_cache" all gc tests passing),

And right after self.chain_store.blocks.cache_remove(key) do a print self.chain_store.blocks.cache_get(key) shows None so it's indeed removed. Then it's added back by self.chain_store.blocks.cache_set in ChainStoreUpdate.commit, and cause the fail. I still can't figure out why this happen with ChainStoreUpdate.gc_col approach, while not happen in the DBCol.gc(store_update, key, cache) approach. Will continue debug tomorrow. (Appreciate if you encounter similar issue before and has a solution!)

@Kouprin
Copy link
Member

Kouprin commented Jun 11, 2020

@Kouprin Tests failed quite weirdly. I debug the basic test clear_old_data, was failed at assert block 1 not removed (therefore probably everything is not removed). I added some debug print to trace chain_store_update, store_update merge and commit, and how get_block behaves. It turns out to be block is removed in storage, but still exists in sizedcache (know this by tracing read_with_cache, and if comment read_with_cache "read_without_cache" all gc tests passing),

And right after self.chain_store.blocks.cache_remove(key) do a print self.chain_store.blocks.cache_get(key) shows None so it's indeed removed. Then it's added back by self.chain_store.blocks.cache_set in ChainStoreUpdate.commit, and cause the fail. I still can't figure out why this happen with ChainStoreUpdate.gc_col approach, while not happen in the DBCol.gc(store_update, key, cache) approach. Will continue debug tomorrow. (Appreciate if you encounter similar issue before and has a solution!)

The root of evil is in calling self.gc_col(ColBlockPerHeight, &block_hash_vec)? after self.gc_col(ColBlock, &block_hash_vec)?. This is definitely a bug because inside gc_col() for ColBlockPerHeight you're trying to call get_block() which is absent.

Hope it helps.

@ailisp ailisp marked this pull request as ready for review June 11, 2020 21:05
Copy link
Collaborator

@bowenwang1996 bowenwang1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple things:

  • Can you run nightly tests to make sure that things are not trivially broken?
  • There seems to be no check on whether all columns are gc'ed. For example, it seems that we still don't gc ColTransactionResult

@ailisp
Copy link
Member Author

ailisp commented Jun 11, 2020

Can you run nightly tests to make sure that things are not trivially broken?

Good point, running.

There seems to be no check on whether all columns are gc'ed. For example, it seems that we still don't gc ColTransactionResult

I think this was supposed to check externally in store validator, based on discussion in #2732 ?

@ailisp ailisp requested review from Kouprin and bowenwang1996 June 16, 2020 23:07
Copy link
Collaborator

@bowenwang1996 bowenwang1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kouprin looks like there are some columns that are not garbage collected. Can you fix it?

chain/chain/src/store.rs Show resolved Hide resolved
chain/chain/src/store.rs Show resolved Hide resolved
chain/chain/src/store.rs Show resolved Hide resolved
chain/chain/src/store.rs Show resolved Hide resolved
chain/chain/src/store.rs Show resolved Hide resolved
chain/chain/src/store.rs Show resolved Hide resolved
@Kouprin Kouprin mentioned this pull request Jun 17, 2020
7 tasks
@Kouprin
Copy link
Member

Kouprin commented Jun 17, 2020

@Kouprin looks like there are some columns that are not garbage collected. Can you fix it?

#2860

chain/chain/src/store.rs Outdated Show resolved Hide resolved
Copy link
Member

@Kouprin Kouprin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this is the great job. We need to do the following:

chain/chain/src/store.rs Outdated Show resolved Hide resolved
chain/chain/src/store.rs Show resolved Hide resolved
chain/chain/src/store.rs Outdated Show resolved Hide resolved
chain/chain/src/store.rs Outdated Show resolved Hide resolved
chain/chain/src/store.rs Outdated Show resolved Hide resolved
@bowenwang1996
Copy link
Collaborator

bowenwang1996 commented Jun 17, 2020

@mikhailOK please review this PR

Copy link
Collaborator

@frol frol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it is a great improvement! I agree with @Kouprin's suggestions, and here are some comments from me.

We still don't have GC enforcement (when someone adds a new column, we currently can only enforce the GC implementation, but then we should call the method for the column, and that can still be missed), but I don't have a suggestion on how to improve this.

chain/chain/src/store.rs Outdated Show resolved Hide resolved
chain/chain/src/store.rs Outdated Show resolved Hide resolved
chain/chain/src/store.rs Outdated Show resolved Hide resolved
chain/chain/src/store.rs Outdated Show resolved Hide resolved
@ailisp
Copy link
Member Author

ailisp commented Jun 17, 2020

We still don't have GC enforcement (when someone adds a new column, we currently can only enforce the GC implementation, but then we should call the method for the column, and that can still be missed), but I don't have a suggestion on how to improve this.

This answer from @Kouprin : this'll be checked in StorageValidator, any col that in SHOULD_COL_GC should have non 0 gc count.

chain/chain/src/store.rs Outdated Show resolved Hide resolved
chain/chain/src/chain.rs Outdated Show resolved Hide resolved
chain/chain/src/chain.rs Outdated Show resolved Hide resolved
chain/chain/src/store.rs Show resolved Hide resolved
chain/chain/src/store.rs Outdated Show resolved Hide resolved
@ailisp ailisp requested a review from bowenwang1996 June 18, 2020 22:14
Copy link
Collaborator

@bowenwang1996 bowenwang1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this per column gc count?

@ailisp
Copy link
Member Author

ailisp commented Jun 18, 2020

What's the purpose of this per column gc count?

it's for being checked in storevalidator so that all columns should be GCed should > 0 in per column gc count. This would prevent a column is marked as "should gc" but forgot gc it.

@bowenwang1996
Copy link
Collaborator

it's for being checked in storevalidator so that all columns should be GCed should > 0 in per column gc count. This would prevent a column is marked as "should gc" but forgot gc it.

What's the difference between 1 and 10 then? Is the count itself actually used?

@ailisp
Copy link
Member Author

ailisp commented Jun 18, 2020

What's the difference between 1 and 10 then? Is the count itself actually used?

From a use case mentioned from @Kouprin somewhere in #2732: Would be useful in test, such as if ColBlock has been gc count over 100, then some other columns should also be at least 1.
Also I think it's potentially helpful for us to understand how many GC operations happen in a production setting and future optimization

@ailisp ailisp merged commit d91d62a into master Jun 19, 2020
@ailisp ailisp deleted the db-col-gc-method branch June 19, 2020 00:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants