Add an API to fetch blobs from a given batch header hash #688
Conversation
overall lgtm
disperser/dataapi/blobs_handlers.go
```diff
@@ -35,6 +35,23 @@ func (s *server) getBlobs(ctx context.Context, limit int) ([]*BlobMetadataResponse, error) {
	return s.convertBlobMetadatasToBlobMetadataResponse(ctx, blobMetadatas)
}

func (s *server) getBlobsFromBatchHeaderHash(ctx context.Context, batcherHeaderHash string, limit int, exclusiveStartKey *disperser.BatchIndexExclusiveStartKey) ([]*BlobMetadataResponse, *disperser.BatchIndexExclusiveStartKey, error) {
```
It looks like this can be used to re-implement (and hopefully simplify) the getBlobs function above?
```go
	}
}

queryResult, err := s.dynamoDBClient.QueryIndexWithPagination(
```
Does this return an error if the 1 MB limit is reached?
No, it simply returns fewer items than the limit. From the DynamoDB documentation: "A single Query operation will read up to the maximum number of items set (if using the Limit parameter) or a maximum of 1 MB of data."
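For readers unfamiliar with that behavior, here is a minimal sketch of how a caller drains those partial pages with aws-sdk-go-v2 directly; queryAllPages is a hypothetical helper, not the project's QueryIndexWithPagination wrapper:

```go
package main

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

// queryAllPages drains a paginated Query: DynamoDB stops a single call at
// the Limit or at 1 MB of data (whichever comes first) and signals the
// partial result via LastEvaluatedKey rather than an error.
func queryAllPages(ctx context.Context, client *dynamodb.Client, input *dynamodb.QueryInput) ([]map[string]types.AttributeValue, error) {
	var items []map[string]types.AttributeValue
	for {
		out, err := client.Query(ctx, input)
		if err != nil {
			return nil, err
		}
		items = append(items, out.Items...)
		// An empty LastEvaluatedKey means the result set is exhausted.
		if len(out.LastEvaluatedKey) == 0 {
			return items, nil
		}
		// Resume the next call where this page stopped.
		input.ExclusiveStartKey = out.LastEvaluatedKey
	}
}
```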
```go
			Value: batchHeaderHash[:],
		},
	},
	limit,
```
If limit is large and the results exceed 1 MB, can this be broken into multiple calls so it gets the full desired number of blobs?
I'm just following the existing pattern with this API, which is to make the client keep track of the limit / number of items returned and continuously call for more pages.
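A minimal sketch of that client-side pattern; the endpoint path, the response fields other than next_token, and the fetchAllBlobs helper are assumptions for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// blobsPage mirrors the paginated response described in this PR; only
// next_token comes from the PR description, the other fields are assumed.
type blobsPage struct {
	Blobs     []json.RawMessage `json:"blobs"`
	NextToken string            `json:"next_token"`
}

// fetchAllBlobs keeps requesting pages until next_token is empty. The
// endpoint path is a placeholder, not necessarily the actual route.
func fetchAllBlobs(baseURL, batchHeaderHash string, limit int) ([]json.RawMessage, error) {
	var all []json.RawMessage
	token := ""
	for {
		endpoint := fmt.Sprintf("%s/feed/batches/%s/blobs?limit=%d", baseURL, batchHeaderHash, limit)
		if token != "" {
			endpoint += "&next_token=" + url.QueryEscape(token)
		}
		resp, err := http.Get(endpoint)
		if err != nil {
			return nil, err
		}
		var page blobsPage
		err = json.NewDecoder(resp.Body).Decode(&page)
		resp.Body.Close()
		if err != nil {
			return nil, err
		}
		all = append(all, page.Blobs...)
		if page.NextToken == "" {
			return all, nil
		}
		token = page.NextToken
	}
}
```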
```go
var allMetadata []*disperser.BlobMetadata
var nextKey *disperser.BatchIndexExclusiveStartKey = exclusiveStartKey
```

```go
const maxLimit int32 = 1000
```
A batch can easily have more than 1000 blobs (at just 2 blobs/s).
Yeah, I've found a batch in testnet that has 16,497 blobs. I'm setting a reasonable max limit on the page size. Do you think 10,000 is a better upper bound?
Trying with 10,000 takes around 13 seconds with a 27 MB response, while 1,000 takes less than 2 seconds with a 2 MB response.
I think smaller pages are better, since fetches can also be parallelized.
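For reference, a minimal sketch of the page-size clamping being discussed; the clampLimit helper and its placement are assumptions, only the maxLimit value comes from the diff:

```go
// maxLimit caps the page size at the value agreed on above.
const maxLimit int32 = 1000

// clampLimit is an assumed sketch, not the PR's exact code: it forces a
// missing, non-positive, or oversized limit back to the maximum page size.
func clampLimit(limit int32) int32 {
	if limit <= 0 || limit > maxLimit {
		return maxLimit
	}
	return limit
}
```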
lgtm. Can you make sure the Cloudflare cache is enabled for all endpoints?
I'll track it as a separate item.
Why are these changes needed?
Adds a new endpoint to the data API server to fetch blob metadata from a batch header hash. This API should enable others to build blob explorers on top of our data.
The endpoint is paginated and returns a next_token field, which is a base64-encoded token of the last evaluated key in the page.

Example:
Sample query:
Sample response
next_token decoded:
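A minimal sketch of decoding next_token on the client side, assuming the last evaluated key is JSON-serialized before base64 encoding; the exclusiveStartKey type and its field names are assumptions for illustration:

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// exclusiveStartKey approximates disperser.BatchIndexExclusiveStartKey;
// the field names here are assumptions, not the actual struct.
type exclusiveStartKey struct {
	BatchHeaderHash string `json:"batch_header_hash"`
	BlobIndex       uint32 `json:"blob_index"`
}

// decodeNextToken reverses the encoding described in this PR: base64 over
// the last evaluated key of the page (serialization format assumed JSON).
func decodeNextToken(token string) (*exclusiveStartKey, error) {
	raw, err := base64.StdEncoding.DecodeString(token)
	if err != nil {
		return nil, err
	}
	var key exclusiveStartKey
	if err := json.Unmarshal(raw, &key); err != nil {
		return nil, err
	}
	return &key, nil
}

func main() {
	// Round-trip example: encode a key, then decode it back.
	raw, _ := json.Marshal(exclusiveStartKey{BlobIndex: 42})
	token := base64.StdEncoding.EncodeToString(raw)
	key, err := decodeNextToken(token)
	if err != nil {
		panic(err)
	}
	fmt.Printf("token=%s decoded=%+v\n", token, *key)
}
```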
Checks