-
Notifications
You must be signed in to change notification settings - Fork 276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[frontend] List named blobs SQL #2955
base: master
Are you sure you want to change the base?
Conversation
Improve query
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be careful about what we changed here.
I suggest we could have a new query and switch by config. so we can rollback easily if something is wrong.
Also we need to add integration test to make sure it cover every case.
For example,
if we have the delete_ts is null for older version and deleted_ts is not null for a new version, we will return the old version blobName.
ambry-named-mysql/src/main/java/com/github/ambry/named/MySqlNamedBlobDb.java
Outdated
Show resolved
Hide resolved
ambry-named-mysql/src/main/java/com/github/ambry/named/MySqlNamedBlobDb.java
Outdated
Show resolved
Hide resolved
ambry-named-mysql/src/main/java/com/github/ambry/named/MySqlNamedBlobDb.java
Outdated
Show resolved
Hide resolved
ambry-named-mysql/src/main/java/com/github/ambry/named/MySqlNamedBlobDb.java
Outdated
Show resolved
Hide resolved
will add a test, the coverage seems limited, because the tests passed even though my prev query was not what is required. |
converted the query itself to a config to avoid a v2/v3 switch, as we far too many of those. |
* In each group, it selects the blob with the highest version. | ||
* The outer query filters out blobs that have been deleted, and return the latest blob in each group that is ready. | ||
*/ | ||
@Config(LIST_NAMED_BLOBS_SQL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like having the actual query as a config is not good. We want to make the config as simple as possible, besides, having a sql cmd in config instead of in the MySqlNamedBlobDb makes the code hard to track. if you think we have too many boolean configs, you can easily remove them after the query has been fully verified by actual traffic.
time.sleep(100); | ||
|
||
// delete blob and list should return empty | ||
namedBlobDb.delete(account.getName(), container.getName(), blobName).get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add some test like like
- add one permanent version.
- upsert the named blob and delete it.
In this case, we should list nothing cause the latest version already been deleted.
if (blobNamePrefix == null) { | ||
// list-all no prefix | ||
statement.setInt(1, accountId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little bit confused here.
You didn't change the LIST_ALL_QUERY_V2, why you can update the statement settings?
+ " AND blob_state = %1$s " | ||
+ " AND blob_name LIKE ? " // 7 | ||
+ " AND blob_name >= ? " // 8 | ||
+ " GROUP BY blob_name " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think GROUP BY logic operates independently of the WHERE clause, this GROUP BY account_id, container_id, blob_name ensures that the aggregation (e.g., MAX(version)) respects the boundaries of each account and container. Again, we are not the expert of mysql queries, and you might want to add some test case like adding named blob for different account and container and test the list. Even though you use where to select the corresponding account and container, it might not give you the right max version based on my understanding.
@Config(LIST_NAMED_BLOBS_SQL) | ||
public static final String DEFAULT_LIST_NAMED_BLOBS_SQL = "" | ||
+ " WITH " | ||
+ " BlobsAllVersion AS ( " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this query is still pretty complex. I suggest we can think of some simpler queries which makes the code easy to read. Sometimes having two queries is also ok cause we are trying to avoid the complex sql queris which might missing edge cases. cc @justinlin-linkedin
statement.setInt(6, containerId); | ||
statement.setString(7, blobNamePrefix + "%"); | ||
statement.setString(8, pageToken != null ? pageToken : blobNamePrefix); | ||
statement.setInt(9, maxKeysValue + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some time like this:
blobName1 blobId1 version1
blobName1 blobId2 version2
so listing we should return blobName1 blobId1 version2 right?
but I think in some case we might return the blobName1 blobId2 version.
This is ok but we should not return the NamedBlobRecord with blobId info. Instead, we can only return the blobName, cause blobId1 and blobId2 all map to blobName1 and we don't need blobId for listing.
Fix list-named-blobs SQL to use smaller joins. This improves query performance by 1000x. Yes you read that right.