Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: btclightclient HeadersState keeper fuzz tests #61

Merged
merged 21 commits into from
Jul 18, 2022

Conversation

vitsalis
Copy link
Member

@vitsalis vitsalis commented Jul 12, 2022

This PR introduces fuzz tests for the state.go and base_btc_header.go keeper files.

It involves the creation of data generation functions and bug fixes that stemmed from the tests.

The newly added data generation methods include methodology for generating random BTCHeaderInfo objects as well as random trees of Headers. Based on the new updates, I have refactored the existing fuzz tests to use the newly added capabilities.

In subsequent pull requests, I'm going to add tests for the rest of the keeper functionality as this one is already getting too large.

@vitsalis vitsalis marked this pull request as ready for review July 13, 2022 15:34
@vitsalis vitsalis requested a review from aakoshh July 13, 2022 15:34
@vitsalis vitsalis changed the title feat: btclightclient keeper fuzz tests feat: btclightclient HeadersState keeper fuzz tests Jul 13, 2022
Copy link
Contributor

@aakoshh aakoshh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round of comments about the generators.

testutil/datagen/bitcoin.go Outdated Show resolved Hide resolved
testutil/datagen/bitcoin.go Outdated Show resolved Hide resolved
testutil/datagen/bitcoin.go Outdated Show resolved Hide resolved
testutil/datagen/bitcoin.go Outdated Show resolved Hide resolved
testutil/datagen/bitcoin.go Outdated Show resolved Hide resolved

headers[root.Hash.String()] = root

genRandomHeaderInfoChildren(headers, root, minDepth-1, depth-1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it doesn't make a difference if I pass minDepth = 0 or minDepth = 1, as both will not have an effect on what genRandomHeaderInfoChildren does because it only looks at positive values.

I think 0 and 1 should have their distinct meaning:

  • 0: only generate a root
  • 1: generate the root and at least one child, so the root ends up being 1 deep

So the difference would be that you pass in minDepth not minDepth - 1 here.

I also suggest renaming depth to maxDepth.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ended up changing the logic from a depth perspective to a height perspective. Much more clear imo.

testutil/datagen/bitcoin.go Outdated Show resolved Hide resolved
testutil/datagen/bitcoin.go Outdated Show resolved Hide resolved
testutil/datagen/bitcoin.go Outdated Show resolved Hide resolved
return GenRandomHeaderInfoTreeMinDepth(0)
}

// MutateHash takes a hash as a parameter, copies it, modifies the copy, and returns the copy.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs should explain what exactly you can expect. In the general context of this file I expected it to be some random mutation, not a deterministic one. Which is cool, ostensibly I could call it apply it multiple times if I wanted a list of deterministic hashes starting from a seed.

However, I'm not sure why it only mutates the first byte for example, limiting itself to 256 different options. Maybe it should convert the hash to a BigInt and increment/decrement that number?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Modified it to retrieve a random index from the input and add 1.

Copy link
Contributor

@aakoshh aakoshh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, and you even caught some bugs!

Some people say a test should only assert one thing, but I think these are great bang for the buck 👏

@vitsalis
Copy link
Member Author

Thanks for the great comments @aakoshh ! Did a significant refactoring of the data generation methodology to make it more clean and reusable. This led to the refactoring of many of the state tests, while I modified some of the type tests to take advantage of the new progress. Please take a look and lmk what you think.


// GenRandomBTCHeaderMerkleRoot generates a random hex string corresponding to a merkle root
func GenRandomBTCHeaderMerkleRoot() *chainhash.Hash {
// TODO: this should become a constant and have a custom type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a constant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am referencing the length of the merkle root. Updating the comment to make it more clear.

Comment on lines 95 to 97
// The time should be more recent than the parent time
btcdHeader.Timestamp = parent.Header.Time().Add(1)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we know we're talking BTC, we know timestamps should be ~10 minutes apart with some deviation around it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, updating this to be 10 minutes +- 0-60 seconds (randomly chosen)

// Only generate `minHeight-1` subtrees for the first child
childMinHeight := uint64(0)
if i != 0 {
childMinHeight = t.MinHeight - 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be like max(0, t.MinHeight - 1)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, nice catch

childMinHeight = t.MinHeight - 1
}

childTree := NewBTCHeaderTree(childNode, childMinHeight-1, t.MaxHeight-1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
childTree := NewBTCHeaderTree(childNode, childMinHeight-1, t.MaxHeight-1)
childTree := NewBTCHeaderTree(childNode, childMinHeight, t.MaxHeight-1)

Looks like it's been already decreased.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch

Comment on lines 78 to 80
// All good, add it to children list
t.Root.InsertChild(childNode)
childTree.GenRandomBTCHeaderInfoTree(callback)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit of an unusual generation strategy to have this BTCHeaderTree data structure instead of just a recursive function - it looks like callback needs to be stateful to track the hash clashes that you mentioned, while the nodes are store in different roots, so it would be difficult for the algorithm itself to do so. Whereas previously it was able to do this on its own. Looks like a bit of boilerplate is needed if you are paranoid about it, whereas it could just be catered for, allowing callback to do things that are interesting to the test case.

Comment on lines 7 to 11
type BTCHeaderTreeNode struct {
Header *btclightclienttypes.BTCHeaderInfo
Parent *BTCHeaderTreeNode
Children []*BTCHeaderTreeNode
}
Copy link
Contributor

@aakoshh aakoshh Jul 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative representation that allows all this is to have a couple of maps in a single data structure:

  • headers: BlockHash -> Header
  • children: BlockHash -> [BlockHash]

Then it's easy to make sure a hash is unique without delegating that to a callback.

Comment on lines 554 to 562
tree.GenRandomBTCHeaderInfoTree(func(headerInfo *types.BTCHeaderInfo) bool {
err := k.InsertHeader(ctx, headerInfo.Header)
if err != nil {
// Something went wrong, do not add this header
return false
}
return true
})
return tree
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit like a chicken and egg problem, or at least unusual, that we can't just generate the random data we think we should be able to insert into the store without asking the store along the way if they are okay with it and store it at the same time.

Copy link
Contributor

@aakoshh aakoshh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good I think, a bit big to review in detail now to be honest.

Personally I would try to maintain a strict separation between how the test data is generated and only involve feeding it to the keeper once it's done, so that we don't have to scratch our heads whether the SUT is actually disallowing generating data which we think should be an interesting case.

I would consider having a simpler, flat data structure with maps where the recursive algorithm is easily able to verify the integrity constraints, rather than having to rely on the keeper to do this for us, because we have a data structure that's good for traversal but we think it's not good enough for actual data generation.

@aakoshh
Copy link
Contributor

aakoshh commented Jul 18, 2022

There is the case of stateful testing where you alternate between generating a random step and mutating state, however, you have a reference model and the SUT and you mutate both, and then compare the outcome on both sides.

That means you don't just generate a random input, pass it to the SUT, and if it doesn't accept it you discard and generate another one. How do we know it didn't discard some valid input? Instead we pass it to the simplified model (the struct with the maps), and if that accepts it then we demand that the SUT accepts it as well, and similarly with rejections.

So while I understand where you wanted to go with the nicely traversable data structure, I think it led you down a garden path where the system is a bit deceiving.

@vitsalis
Copy link
Member Author

Thanks for the comments! Overall, a flat data structure would keep things simpler in regards to storage, but it would be more difficult for us to perform other operations.

Overall, what we build is trees of headers and we need to represent them as such. The structure that you propose accomplishes that, but many operations that need to be performed as part of the test cases are not entirely straightforward. For example, if we had a data structure of the form

  • headers: BlockHash -> Header
  • children: BlockHash -> [BlockHash]

Then we do not have the concept of a parent, which makes ancestry retrieval operations (e.g. retrieving the main chain, the ancestry list etc.) more difficult. The main reason that I used this data structure is so that these operations can be easily implemented.

About the callback on tree generation: I added this because the datagen package does not have access to the keeper (and we don't want it to have access to it). Basically, the datagen package just generates data and forwards it to the one that needs it in-order (parents first, then children) and the consumer decides what they want to do with it. In our particular case, I was using the keeper to do the duplicate check (whether a header hash already exists) because I did not want to implement this behavior in the test generation methodology as well. However, I agree that this is wrong, and this functionality is not required. The consumer of the data generation methodology can easily identify whether a hash already exists and panic if the insertion using the keeper returns an error.

Other than this argument (using the keeper to do checks) which can be taken care of by more complex checks as part of the consumer, I believe that the data structure methodology is a good one and provides many possibilities for the consumer.

@vitsalis
Copy link
Member Author

Pushed a commit that doesn't involve the keeper in deciding whether the result to the callback should be true or false.

@aakoshh
Copy link
Contributor

aakoshh commented Jul 18, 2022

Thanks for the updates!

The flat data structure is most likely how this would be implemented in practice I think, at least when you deal with stuff like KV stores. I didn't include a parent because I thought it is easy to access via the header itself.

@vitsalis
Copy link
Member Author

vitsalis commented Jul 18, 2022

@aakoshh thought further about the data generation data structures and ultimately I agree with your points. I pushed an update that follows the approach you suggested, please take a look. The main updates involve the data generation methodology and minor updates on the state tests.

@aakoshh
Copy link
Contributor

aakoshh commented Jul 18, 2022

I was just checking it out, it looks great! I think in the context of these chains being able to go to a block by hash is fundamental, so some supporting index would have been needed for your linked data structure anyway, if you wanted to really use it for practical things.

You can ignore these links, but in the past I ended up with a similar but generic data structure I called a KVTree, with the BlockStore being a special case of it; other stuff I wanted to store in this were tentative ledger states. I didn't think about making use of range queries at that point like we do in our keeper.

@vitsalis
Copy link
Member Author

Looks beautiful! I think there is room for simplification in the datagen procedures, since many of those involve boiler plate code for retrieving ancestors, descendants etc. But for now I believe it will serve it's purpose. When the time comes for fuzz tests for the btccheckpointing module, then we'll have a clearer picture on the exact needs for the data generation and we can further simplify.

@vitsalis vitsalis merged commit 88ff081 into main Jul 18, 2022
@vitsalis vitsalis deleted the btclightclient-keeper-fuzz branch July 18, 2022 16:21
KonradStaniec added a commit that referenced this pull request Oct 25, 2023
* Add api to check delegation by staking tx hash
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants