Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

epoching: keeper functions, queries, and testing infra #19

Merged
merged 11 commits into from
Jun 24, 2022

Conversation

SebastianElvis
Copy link
Member

@SebastianElvis SebastianElvis commented Jun 21, 2022

Fixes BM-25, BM-28, BM-30.

This PR introduces keeper functions, queries, and test infra for the epoching module.

TODOs (will be done in subsequent PRs):

  • Unit/fuzz tests, depending on properties we want to test

@SebastianElvis SebastianElvis changed the title epoching: keeper functions epoching: keeper functions and testing infra Jun 22, 2022
@SebastianElvis SebastianElvis marked this pull request as ready for review June 22, 2022 05:12
@SebastianElvis
Copy link
Member Author

After this PR and #18, the next task would be state transition upon BeginBlock and EndBlock. Implementing these state transitions require our proposed modifications in the Cosmos SDK. Maybe after the two PRs, it will be the right time to fork and work on Cosmos SDK.

@SebastianElvis SebastianElvis changed the title epoching: keeper functions and testing infra epoching: keeper functions, queries, and testing infra Jun 22, 2022
@aakoshh
Copy link
Contributor

aakoshh commented Jun 22, 2022

Maybe after the two PRs, it will be the right time to fork and work on Cosmos SDK.

Frojdi suggested to @vitsalis that we try to do it without forking the SDK, by just copy pasting the necessary lines into our codebase and only calling ApplyAndReturnValidatorSetUpdates in our EndBlock. The idea was that if somebody would want to develop against Babylon as a library Go might mix up forked and non-forked stuff.

@SebastianElvis
Copy link
Member Author

Maybe after the two PRs, it will be the right time to fork and work on Cosmos SDK.

Frojdi suggested to @vitsalis that we try to do it without forking the SDK, by just copy pasting the necessary lines into our codebase and only calling ApplyAndReturnValidatorSetUpdates in our EndBlock. The idea was that if somebody would want to develop against Babylon as a library Go might mix up forked and non-forked stuff.

Right, so basically we need to do the following:

  1. Exclude the staking module from ModuleManager's EndBlocker/BeginBlocker function
  2. Copy/paste necessary functions to the epoching module and make our modifications
  3. Execute the modified logic in the epoching module totally, and invoke the staking module for non-modified logic

I will follow this approach then. Thanks for the information!

@aakoshh
Copy link
Contributor

aakoshh commented Jun 22, 2022

Yes, that's exactly right. If you agree with this approach, it's probably not a big risk to keep those lines in sync (while it only calls public methods). Frojdi thinks this part doesn't change often.

Copy link
Contributor

@aakoshh aakoshh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

Good idea using the count for providing a unique ID. A potential alternative could be using some sort of a timestamp like in https://github.com/cosmos/cosmos-sdk/blob/a2761bdc6f23500a2b38d398ee2a423f2a5311c0/x/staking/keeper/delegation.go#L374, but they seem to store lists under the keys, so it's actually a bit more complicated.

I don't really follow the epoch boundary logic, though.

It would be worth following up with a PR with fuzz tests to check that the invariants are maintained.

Comment on lines +32 to +36
epochNumber, err := k.GetEpochNumber(ctx)
if err != nil {
return nil, err
}
epochBoundary, err := k.GetEpochBoundary(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there is any race condition here with block processing, or does the SDK take care of providing a consistent view throughout the query.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC Cosmos SDK has taken care of this. The module manager triggers BeginBlock and EndBlock sequentially for each module. At any time slot there will only be one module processing ctx, which includes blocks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but these are gRPC queries which can run any time by anyone who accesses the RPC endpoints. It's probably okay, but it would be nice to know for sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, will look into this and get back to you here tomorrow

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not fully certain, but there seems to be no handling on race conditions caused by concurrent GPRC requests. Cosmos SDK's GPRC server is implemented here https://github.com/cosmos/cosmos-sdk/blob/v0.45.5/server/grpc/grpc_web.go#L15-L45. It basically uses Go's net/http library. Upon each GPRC query, it creates a goroutine to handle it.

In this case, it seems tricky if we want to make multiple sequential DB read/write upon a single query..

Comment on lines 48 to 60
func (k Keeper) EpochMsgs(c context.Context, req *types.QueryEpochMsgsRequest) (*types.QueryEpochMsgsResponse, error) {
panic("TODO: unimplemented")
ctx := sdk.UnwrapSDKContext(c)
msgs, err := k.GetEpochMsgs(ctx)
if err != nil {
return nil, err
}
resp := &types.QueryEpochMsgsResponse{
Msgs: msgs,
Pagination: &qtypes.PageResponse{
Total: uint64(len(msgs)),
},
}
return resp, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What seems to be missing here is actual pagination handling, you always return all messages. This might violate the API contract by ignoring any Limit in the request. Actually it even has a flag CountTotal to indicate whether Total should be calculated, as well as Reverse.

Here's an example of using this, it looks like the SDK takes care of the heavy lifting: https://github.com/cosmos/cosmos-sdk/blob/v0.45.5/x/staking/keeper/grpc_query.go#L156-L174

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Will do.

func TestParamsQuery(t *testing.T) {
keeper, ctx := testkeeper.EpochingKeeper(t)
wctx := sdk.WrapSDKContext(ctx)
params := types.DefaultParams()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a big deal, but it's worth using fuzz tests even for this sort of thing. When I see a test inserting and asserting something like DefaultParams I am always wary whether something like this might be going on, showing a false positive by returning DefaultParams from the keeper if it can't find a record for example. Testing with non-default input is always more convincing, and using fuzzing can remove the burden of coming up with hard coded values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Perhaps we can have the fuzz tests in a separate PR? since the tests seem to heavily depend on the APIs and the properties we want to test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, no problem. Luckily you will still be able to use almost the same test, as the data is an in-memory one, so you can just create a new one in each test, like it's done now. The only difference is that instead of params := types.DefaultParams() you could use one of the libraries I linked to to pupulate the params with random values.

epochInterval := sdk.NewUint(k.GetParams(ctx).EpochInterval)

// hits epoch boundary
existingBlocks := epochNumber.Mod(epochInterval)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be Mul? Aren't we dealing with things like "we are in epoch 5, with the interval being 1000 blocks"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah you are right. I messed up with the epoch number and the height. Will address this function.

// hits epoch boundary
existingBlocks := epochNumber.Mod(epochInterval)
if existingBlocks.IsZero() {
return epochNumber, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How could the epoch number be a correct block height for the end?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

x/epoching/keeper/keeper.go Show resolved Hide resolved

bz := store.Get(types.QueueLengthKey)
if bz == nil {
return sdk.NewUint(uint64(DefaultQueueLength)), nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What possible default value other than 0 could this take?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot think of any other value. Perhaps this is not a "default" value: the queue has no item if no msg is queued. Will consider hard-code 0 here rather than having a new constant DefaultQueueLength

return err
}

store.Set(types.EpochNumberKey, queueLenBytes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
store.Set(types.EpochNumberKey, queueLenBytes)
store.Set(types.QueueLengthKey, queueLenBytes)

}

// IncQueueLength adds the queue length by 1
func (k Keeper) IncQueueLength(ctx sdk.Context) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be public? Seems like the only time it should be called is during EnqueueMsg.

defer iterator.Close()
for ; iterator.Valid(); iterator.Next() {
key := iterator.Key()
queuedMsgBytes := store.Get(key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be something like iterator.Value(), look, so you don't have to do a separate lookup.

@SebastianElvis
Copy link
Member Author

Thanks for the comments Akosh! I have addressed the comments and investigated the GPRC server stuff (replied above). Feel free to review again.

Comment on lines 59 to 60
// get value in bytes
storeValue := store.Get(storeKey)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the value []byte parameter passed to this func?

Comment on lines +48 to 49
// EpochMsgs handles the QueryEpochMsgsRequest query
func (k Keeper) EpochMsgs(c context.Context, req *types.QueryEpochMsgsRequest) (*types.QueryEpochMsgsResponse, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to keep in mind with this query is that the epoch might end between pagination requests, so it might be presenting an inconsistent result by the time someone gets to the end.

There are a few fixes that I can think of, depending on what we want to achieve.

  • We could add the epoch number to the query, and return nothing if the current epoch number is different. But it's a bit of pain to have to set it and not know why there's no result.
  • We could not reset the key to 0 when the queue is cleared, and just keep incrementing the ID forever. That way when the next query comes, it might skip some records that have been deleted, then resume from the next available record which has a higher key than the value in the pagination data structure.
  • I suppose we can do nothing, in which case some records that have been inserted after the delete might be skipped because their keys are lower than the pagionation state.

Copy link
Contributor

@aakoshh aakoshh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I don't see any blockers, but if possible it would be good to save that extra lookup during querying and just use value. That's what the other iterator in the keeper is doing as well.

@SebastianElvis
Copy link
Member Author

Thanks! Have fixed the extra lookup and added TODOs for the concurrency issue in EpochMsgs, and merged this PR.

Will have some subsequent PRs to

  • add fuzz/unit tests
  • fix race condition issues (I will need to look into how other Cosmos-SDK-based projects handle such issues.)

@SebastianElvis SebastianElvis merged commit aa07c40 into main Jun 24, 2022
@SebastianElvis SebastianElvis deleted the epoching-keeper-functions branch June 24, 2022 01:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants