feat: btclightclient HeadersState keeper fuzz tests #61
Conversation
First round of comments about the generators.
testutil/datagen/bitcoin.go

```go
headers[root.Hash.String()] = root

genRandomHeaderInfoChildren(headers, root, minDepth-1, depth-1)
```
It looks like it doesn't make a difference whether I pass `minDepth = 0` or `minDepth = 1`, as neither has an effect on what `genRandomHeaderInfoChildren` does, because it only looks at positive values. I think 0 and 1 should have distinct meanings:
- 0: only generate a root
- 1: generate the root and at least one child, so the root ends up being 1 deep

So the difference would be that you pass in `minDepth`, not `minDepth - 1`, here.

I also suggest renaming `depth` to `maxDepth`.
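A hypothetical sketch of the suggested `minDepth`/`maxDepth` semantics (none of these names are the repository's actual API): `minDepth == 0` generates only the root, `minDepth == 1` forces at least one child, the caller passes `minDepth` unchanged for the root, and the decrement happens once per recursion level.

```go
package datagen

import "math/rand"

// headerNode is a stand-in for the real header-info node type.
type headerNode struct {
	children []*headerNode
}

// genChildrenSketch attaches 0-2 random children per node, forcing at least one
// child while minDepth is still positive so the minimum depth can be reached.
func genChildrenSketch(parent *headerNode, minDepth, maxDepth int, rng *rand.Rand) {
	if maxDepth == 0 {
		return
	}
	numChildren := rng.Intn(3)
	if minDepth > 0 && numChildren == 0 {
		numChildren = 1
	}
	for i := 0; i < numChildren; i++ {
		child := &headerNode{}
		parent.children = append(parent.children, child)
		genChildrenSketch(child, minDepth-1, maxDepth-1, rng)
	}
}
```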
Ended up changing the logic from a `depth` perspective to a `height` perspective. Much clearer imo.
testutil/datagen/bitcoin.go

```go
	return GenRandomHeaderInfoTreeMinDepth(0)
}

// MutateHash takes a hash as a parameter, copies it, modifies the copy, and returns the copy.
```
The docs should explain what exactly you can expect. In the general context of this file I expected it to be some random mutation, not a deterministic one. Which is cool: ostensibly I could apply it multiple times if I wanted a list of deterministic hashes starting from a seed.
However, I'm not sure why it only mutates the first byte, for example, limiting itself to 256 different options. Maybe it should convert the hash to a BigInt and increment/decrement that number?
Agreed. Modified it to retrieve a random index from the input and add 1.
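A minimal sketch of that mutation, assuming the btcd `chainhash` package already used in this file; `mutateHashSketch` is an illustrative name, not the PR's exact `MutateHash`:

```go
package datagen

import (
	"math/rand"

	"github.com/btcsuite/btcd/chaincfg/chainhash"
)

// mutateHashSketch copies the input hash, picks a random byte index, and adds 1
// to the byte there (wrapping at 255), so the copy differs from the input at
// exactly one position.
func mutateHashSketch(hash *chainhash.Hash) *chainhash.Hash {
	mutated := *hash // chainhash.Hash is a [32]byte array, so this is a copy
	idx := rand.Intn(chainhash.HashSize)
	mutated[idx]++
	return &mutated
}
```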
Very nice, and you even caught some bugs!
Some people say a test should only assert one thing, but I think these are great bang for the buck 👏
Thanks for the great comments @aakoshh ! Did a significant refactoring of the data generation methodology to make it cleaner and more reusable. This led to the refactoring of many of the state tests, and I also modified some of the type tests to take advantage of the new progress. Please take a look and lmk what you think.
testutil/datagen/btc_header_info.go

```go
// GenRandomBTCHeaderMerkleRoot generates a random hex string corresponding to a merkle root
func GenRandomBTCHeaderMerkleRoot() *chainhash.Hash {
	// TODO: this should become a constant and have a custom type
```
Why a constant?
I am referring to the length of the merkle root. Updating the comment to make it clearer.
testutil/datagen/btc_header_info.go

```go
	// The time should be more recent than the parent time
	btcdHeader.Timestamp = parent.Header.Time().Add(1)
}
```
Since we know we're talking BTC, we know timestamps should be ~10 minutes apart with some deviation around it.
True, updating this to be 10 minutes ± 0-60 seconds (randomly chosen).
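A minimal sketch of that approach, assuming the parent timestamp is available as a `time.Time`; the function name is illustrative, not the PR's exact code:

```go
package datagen

import (
	"math/rand"
	"time"
)

// genChildTimestampSketch returns a timestamp roughly ten minutes after the
// parent's, shifted by a randomly chosen offset of up to 60 seconds either way.
func genChildTimestampSketch(parentTime time.Time) time.Time {
	offset := time.Duration(rand.Intn(121)-60) * time.Second // offset in [-60s, +60s]
	return parentTime.Add(10*time.Minute + offset)
}
```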
testutil/datagen/btc_header_tree.go

```go
// Only generate `minHeight-1` subtrees for the first child
childMinHeight := uint64(0)
if i != 0 {
	childMinHeight = t.MinHeight - 1
```
Should this be like `max(0, t.MinHeight - 1)`?
Yep, nice catch
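For context, a self-contained illustration of why the guard matters when `MinHeight` is an unsigned integer (the variable name is illustrative):

```go
package main

import "fmt"

func main() {
	var minHeight uint64 = 0
	// uint64 arithmetic wraps around instead of going negative, so an unguarded
	// minHeight - 1 silently becomes a huge value rather than 0.
	fmt.Println(minHeight - 1) // prints 18446744073709551615
}
```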
testutil/datagen/btc_header_tree.go

```go
	childMinHeight = t.MinHeight - 1
}

childTree := NewBTCHeaderTree(childNode, childMinHeight-1, t.MaxHeight-1)
```
Suggested change:

```diff
-childTree := NewBTCHeaderTree(childNode, childMinHeight-1, t.MaxHeight-1)
+childTree := NewBTCHeaderTree(childNode, childMinHeight, t.MaxHeight-1)
```

Looks like it's already been decreased.
Nice catch
testutil/datagen/btc_header_tree.go

```go
// All good, add it to children list
t.Root.InsertChild(childNode)
childTree.GenRandomBTCHeaderInfoTree(callback)
```
It is a bit of an unusual generation strategy to have this `BTCHeaderTree` data structure instead of just a recursive function. It looks like `callback` needs to be stateful to track the hash clashes that you mentioned, while the nodes are stored in different roots, so it would be difficult for the algorithm itself to do so, whereas previously it was able to do this on its own. Looks like a bit of boilerplate is needed if you are paranoid about it, whereas it could just be catered for, allowing the callback to do things that are interesting to the test case.

```go
type BTCHeaderTreeNode struct {
	Header   *btclightclienttypes.BTCHeaderInfo
	Parent   *BTCHeaderTreeNode
	Children []*BTCHeaderTreeNode
}
```
An alternative representation that allows all this is to have a couple of maps in a single data structure:
- headers: BlockHash -> Header
- children: BlockHash -> [BlockHash]
Then it's easy to make sure a hash is unique without delegating that to a callback.
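A sketch of that representation, kept generic over the header type so it does not assume the module's exact import paths (requires Go 1.18+; all names here are illustrative):

```go
package datagen

// flatHeaderStore holds the two maps suggested above, both keyed by a
// string-encoded block hash.
type flatHeaderStore[H any] struct {
	headers  map[string]H        // block hash -> header
	children map[string][]string // block hash -> child block hashes
}

func newFlatHeaderStore[H any]() *flatHeaderStore[H] {
	return &flatHeaderStore[H]{
		headers:  make(map[string]H),
		children: make(map[string][]string),
	}
}

// addHeader enforces hash uniqueness directly, so the check no longer has to be
// delegated to a callback (or to the keeper).
func (s *flatHeaderStore[H]) addHeader(parentHash, hash string, header H) bool {
	if _, exists := s.headers[hash]; exists {
		return false // hash clash: the generator can simply retry
	}
	s.headers[hash] = header
	s.children[parentHash] = append(s.children[parentHash], hash)
	return true
}
```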
```go
tree.GenRandomBTCHeaderInfoTree(func(headerInfo *types.BTCHeaderInfo) bool {
	err := k.InsertHeader(ctx, headerInfo.Header)
	if err != nil {
		// Something went wrong, do not add this header
		return false
	}
	return true
})
return tree
```
This feels a bit like a chicken-and-egg problem, or at least unusual: we can't just generate the random data we think we should be able to insert into the store without asking the store along the way whether it is okay with it, and storing it at the same time.
Looks good I think, a bit big to review in detail now to be honest.

Personally I would try to maintain a strict separation between how the test data is generated and only involve feeding it to the keeper once it's done, so that we don't have to scratch our heads about whether the SUT is actually disallowing the generation of data which we think should be an interesting case.

I would consider having a simpler, flat data structure with maps, where the recursive algorithm is easily able to verify the integrity constraints, rather than having to rely on the keeper to do this for us because we have a data structure that's good for traversal but that we think isn't good enough for actual data generation.

There is the case of stateful testing where you alternate between generating a random step and mutating state; however, there you have a reference model and the SUT, you mutate both, and then compare the outcome on both sides. That means you don't just generate a random input, pass it to the SUT, and discard it and generate another one if it isn't accepted. How would we know it didn't discard some valid input? Instead we pass it to the simplified model (the struct with the maps), and if that accepts it then we demand that the SUT accepts it as well, and similarly with rejections. So while I understand where you wanted to go with the nicely traversable data structure, I think it led you down a garden path where the system ends up a bit deceiving.
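A hedged sketch of that model-based loop, written against a small interface so it stays self-contained; the names (`headerSink`, `runModelBasedSteps`) are illustrative, not the repository's API. The reference model and the keeper-backed SUT would both implement `Insert`, and the test fails as soon as they disagree about accepting a generated header:

```go
package keeper_test

import (
	"math/rand"
	"testing"
)

// headerSink abstracts "something that accepts headers": the map-based
// reference model on one side and the keeper-backed SUT on the other.
type headerSink[H any] interface {
	Insert(header H) error
}

// runModelBasedSteps feeds the same randomly generated headers to both sides
// and fails if they ever disagree on whether a header is acceptable.
func runModelBasedSteps[H any](
	t *testing.T,
	rng *rand.Rand,
	model, sut headerSink[H],
	steps int,
	gen func(*rand.Rand) H,
) {
	for i := 0; i < steps; i++ {
		header := gen(rng)
		modelErr := model.Insert(header)
		sutErr := sut.Insert(header)
		if (modelErr == nil) != (sutErr == nil) {
			t.Fatalf("step %d: model err = %v, SUT err = %v", i, modelErr, sutErr)
		}
	}
}
```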
Thanks for the comments! Overall, a flat data structure would keep things simpler in regards to storage, but it would make other operations more difficult for us. What we build is trees of headers, and we need to represent them as such. The structure that you propose accomplishes that, but many operations that need to be performed as part of the test cases are not entirely straightforward. For example, if we had a data structure of the form you propose (the `headers` and `children` maps), then we would not have the concept of a parent, which makes ancestry retrieval operations (e.g. retrieving the main chain, the ancestry list, etc.) more difficult. The main reason that I used this data structure is so that these operations can be easily implemented.

About the callback on tree generation: I added this because the datagen package does not have access to the keeper (and we don't want it to have access to it). Basically, the datagen package just generates data and forwards it, in order (parents first, then children), to whoever needs it, and the consumer decides what they want to do with it. In our particular case, I was using the keeper to do the duplicate check (whether a header hash already exists) because I did not want to implement this behavior in the test generation methodology as well. However, I agree that this is wrong, and this functionality is not required: the consumer of the data generation methodology can easily identify whether a hash already exists and act on it accordingly.

Other than this argument (using the keeper to do checks), which can be taken care of by more thorough checks on the consumer side, I believe that the data structure methodology is a good one and provides many possibilities for the consumer.
Pushed a commit that doesn't involve the keeper in deciding whether the result of the callback should be true or false.
Thanks for the updates! The flat data structure is most likely how this would be implemented in practice, I think, at least when you deal with stuff like KV stores. I didn't include a parent because I thought it is easy to access via the header itself.
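For example, ancestry retrieval still works with the flat representation by walking parent hashes; a minimal sketch, assuming a `headers` map like the one above and a caller-supplied accessor for the parent hash embedded in each header (all names illustrative):

```go
package datagen

// ancestrySketch walks from a starting block hash back towards the base header,
// collecting every header it finds along the way. It stops when it reaches a
// hash that is not in the map (e.g. the parent of the base header).
func ancestrySketch[H any](
	headers map[string]H,
	start string,
	parentHashOf func(H) string,
) []H {
	var chain []H
	hash := start
	for {
		header, ok := headers[hash]
		if !ok {
			return chain
		}
		chain = append(chain, header)
		hash = parentHashOf(header)
	}
}
```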
@aakoshh I thought further about the data generation data structures and ultimately I agree with your points. I pushed an update that follows the approach you suggested; please take a look. The main updates involve the data generation methodology, plus minor updates on the state tests.
I was just checking it out, and it looks great! I think in the context of these chains, being able to go to a block by hash is fundamental, so some supporting index would have been needed for your linked data structure anyway if you wanted to really use it for practical things. You can ignore these links, but in the past I ended up with a similar but generic data structure I called a KVTree, with the BlockStore being a special case of it; other stuff I wanted to store in it were tentative ledger states. I didn't think about making use of range queries at that point like we do in our keeper.
Looks beautiful! I think there is room for simplification in the datagen procedures, since many of those involve boilerplate code for retrieving ancestors, descendants, etc. But for now I believe it will serve its purpose. When the time comes for fuzz tests for the …
This PR introduces fuzz tests for the `state.go` and `base_btc_header.go` keeper files. It involves the creation of data generation functions and bug fixes that stemmed from the tests.

The newly added data generation methods include methodology for generating random `BTCHeaderInfo` objects as well as random trees of headers. Based on the new updates, I have refactored the existing fuzz tests to use the newly added capabilities.

In subsequent pull requests, I'm going to add tests for the rest of the keeper functionality, as this one is already getting too large.