[frontend] List named blobs SQL #2955

snalli · 2024-12-03T04:13:00Z

Fix the list-named-blobs SQL to use smaller joins. This improves query performance by 1000x. Yes, you read that right.

Improve query

SophieGuo410

We should be careful about what we change here.
I suggest we add a new query and switch between the old and new one by config, so we can roll back easily if something goes wrong.
We also need to add integration tests to make sure every case is covered.
For example, if deleted_ts is null for an older version and deleted_ts is not null for a newer version, we might end up returning the old version's blobName.
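The edge case described above can be prototyped in memory before it is written as an integration test. The following is a minimal sketch, not the PR's query: `Row`, `listLive`, and the sample data are hypothetical, and it simplifies by treating any non-null `deletedTs` as "deleted" and ignoring blob state. It picks the highest version per blob name first and only then filters out deleted rows, so an older live version cannot resurface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LatestVersionListing {
  /** Hypothetical stand-in for one table row; deletedTs == null means "not deleted". */
  static final class Row {
    final String blobName;
    final long version;
    final Long deletedTs;

    Row(String blobName, long version, Long deletedTs) {
      this.blobName = blobName;
      this.version = version;
      this.deletedTs = deletedTs;
    }
  }

  /** Keeps the highest-version row per blob name, then drops names whose latest row is deleted. */
  static List<String> listLive(List<Row> rows) {
    Map<String, Row> latest = new TreeMap<>();
    for (Row row : rows) {
      latest.merge(row.blobName, row, (a, b) -> a.version >= b.version ? a : b);
    }
    List<String> names = new ArrayList<>();
    for (Row row : latest.values()) {
      if (row.deletedTs == null) {
        names.add(row.blobName);
      }
    }
    return names;
  }

  public static void main(String[] args) {
    List<Row> rows = List.of(
        new Row("b1", 1L, null),   // older version, never deleted
        new Row("b1", 2L, 1000L),  // newer version, deleted
        new Row("b2", 1L, null));  // single live version
    System.out.println(listLive(rows)); // prints [b2]
  }
}
```

Filtering on `deletedTs` before picking the latest version would instead return `b1`, which is the failure mode the comment warns about.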

ambry-named-mysql/src/main/java/com/github/ambry/named/MySqlNamedBlobDb.java

snalli · 2024-12-03T18:44:18Z

Will add a test. The coverage seems limited, because the tests passed even though my previous query was not what is required.

snalli · 2024-12-03T18:45:24Z

Converted the query itself to a config to avoid a v2/v3 switch, as we have far too many of those.

SophieGuo410 · 2024-12-03T19:01:30Z

ambry-api/src/main/java/com/github/ambry/config/MySqlNamedBlobDbConfig.java

+   * In each group, it selects the blob with the highest version.
+   * The outer query filters out blobs that have been deleted, and return the latest blob in each group that is ready.
+   */
+  @Config(LIST_NAMED_BLOBS_SQL)


I feel that having the actual query as a config is not good. We want to keep configs as simple as possible. Besides, having a SQL command in the config instead of in MySqlNamedBlobDb makes the code hard to track. If you think we have too many boolean configs, you can easily remove them after the query has been fully verified by actual traffic.

SophieGuo410 · 2024-12-04T00:22:03Z

...-mysql/src/integration-test/java/com/github/ambry/named/MySqlNamedBlobDbIntegrationTest.java

+    time.sleep(100);
+
+    // delete blob and list should return empty
+    namedBlobDb.delete(account.getName(), container.getName(), blobName).get();


Can you please add a test like this:

Add one permanent version.

Upsert the named blob and delete it.
In this case, we should list nothing because the latest version has already been deleted.

SophieGuo410 · 2024-12-04T00:28:27Z

ambry-named-mysql/src/main/java/com/github/ambry/named/MySqlNamedBlobDb.java

      if (blobNamePrefix == null) {
+        // list-all no prefix
+        statement.setInt(1, accountId);


I'm a little confused here.
You didn't change LIST_ALL_QUERY_V2, so why can you update the statement settings?

SophieGuo410 · 2024-12-04T00:34:51Z

ambry-api/src/main/java/com/github/ambry/config/MySqlNamedBlobDbConfig.java

+      + "     AND blob_state = %1$s "
+      + "     AND blob_name LIKE ? " // 7
+      + "     AND blob_name >= ? " // 8
+      + "   GROUP BY blob_name "


I think the GROUP BY logic operates independently of the WHERE clause. GROUP BY account_id, container_id, blob_name ensures that the aggregation (e.g., MAX(version)) respects the boundaries of each account and container. Again, we are not experts in MySQL queries, so you might want to add a test case that adds named blobs for different accounts and containers and then tests the list. Even though you use WHERE to select the corresponding account and container, it might not give you the right max version, based on my understanding.
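The cross-account test suggested above can be prototyped in memory before it is written against MySQL. This is a minimal sketch under stated assumptions: `Row`, `filterThenGroup`, `groupThenFilter`, and the sample IDs are hypothetical and do not come from the PR. It compares "filter by account and container, then group by blob name" with "group by (account, container, blob name), then filter". SQL applies WHERE before GROUP BY, so the two should agree whenever the WHERE clause pins account_id and container_id to single values. The real integration test remains the authority on what the actual query returns.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupByScopeCheck {
  /** Hypothetical row; field names mirror the columns mentioned in the review. */
  static final class Row {
    final int accountId;
    final int containerId;
    final String blobName;
    final long version;

    Row(int accountId, int containerId, String blobName, long version) {
      this.accountId = accountId;
      this.containerId = containerId;
      this.blobName = blobName;
      this.version = version;
    }
  }

  /** Models: WHERE account_id = ? AND container_id = ? GROUP BY blob_name, with MAX(version). */
  static Map<String, Long> filterThenGroup(List<Row> rows, int accountId, int containerId) {
    Map<String, Long> maxByName = new TreeMap<>();
    for (Row r : rows) {
      if (r.accountId == accountId && r.containerId == containerId) {
        maxByName.merge(r.blobName, r.version, Long::max);
      }
    }
    return maxByName;
  }

  /** Models: GROUP BY account_id, container_id, blob_name with MAX(version), filtered afterwards. */
  static Map<String, Long> groupThenFilter(List<Row> rows, int accountId, int containerId) {
    Map<List<Object>, Long> maxByKey = new HashMap<>();
    for (Row r : rows) {
      maxByKey.merge(List.<Object>of(r.accountId, r.containerId, r.blobName), r.version, Long::max);
    }
    Map<String, Long> maxByName = new TreeMap<>();
    for (Map.Entry<List<Object>, Long> e : maxByKey.entrySet()) {
      List<Object> key = e.getKey();
      if (key.get(0).equals(accountId) && key.get(1).equals(containerId)) {
        maxByName.put((String) key.get(2), e.getValue());
      }
    }
    return maxByName;
  }

  public static void main(String[] args) {
    List<Row> rows = List.of(
        new Row(101, 1, "b1", 1L),
        new Row(101, 1, "b1", 3L),
        new Row(101, 2, "b1", 7L),  // same name, different container
        new Row(202, 1, "b1", 9L),  // same name, different account
        new Row(101, 1, "b2", 2L));
    Map<String, Long> a = filterThenGroup(rows, 101, 1);
    Map<String, Long> b = groupThenFilter(rows, 101, 1);
    System.out.println(a);           // prints {b1=3, b2=2}
    System.out.println(a.equals(b)); // prints true
  }
}
```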

SophieGuo410 · 2024-12-04T00:36:40Z

ambry-api/src/main/java/com/github/ambry/config/MySqlNamedBlobDbConfig.java

+  @Config(LIST_NAMED_BLOBS_SQL)
+  public static final String DEFAULT_LIST_NAMED_BLOBS_SQL = ""
+      + " WITH "
+      + "  BlobsAllVersion AS ( "


I feel this query is still pretty complex. I suggest we think of some simpler queries that make the code easier to read. Sometimes having two queries is also OK, because we are trying to avoid complex SQL queries that might miss edge cases. cc @justinlin-linkedin

SophieGuo410 · 2024-12-04T00:40:51Z

ambry-named-mysql/src/main/java/com/github/ambry/named/MySqlNamedBlobDb.java

+        statement.setInt(6, containerId);
+        statement.setString(7, blobNamePrefix + "%");
+        statement.setString(8, pageToken != null ? pageToken : blobNamePrefix);
+        statement.setInt(9, maxKeysValue + 1);


Can we add a test like this:
blobName1 blobId1 version1
blobName1 blobId2 version2
When listing, we should return blobName1 blobId1 version2, right?
But I think in some cases we might return the blobName1 blobId2 version.
This is OK, but we should not return the NamedBlobRecord with blobId info. Instead, we can return only the blobName, because blobId1 and blobId2 both map to blobName1 and we don't need the blobId for listing.
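For readers following the statement parameters in the diff above (prefix, page token, and `maxKeysValue + 1`), here is one plausible reading of how they fit together. It is a sketch under assumptions and is not taken from the PR: `PageSketch`, `Page`, and `listPage` are hypothetical names, the input list is assumed to be sorted, and `startsWith` stands in for the `LIKE ?` prefix match without modeling SQL wildcard characters. Fetching one row more than the page size lets the caller tell whether another page exists and use the extra row's name as the next page token.

```java
import java.util.ArrayList;
import java.util.List;

class PageSketch {
  /** Hypothetical page result: up to maxKeys names plus the token for the next page (null if none). */
  static final class Page {
    final List<String> names;
    final String nextToken;

    Page(List<String> names, String nextToken) {
      this.names = names;
      this.nextToken = nextToken;
    }
  }

  /** Scans sorted names from the token (or the prefix), fetching at most maxKeys + 1 matches. */
  static Page listPage(List<String> sortedNames, String prefix, String pageToken, int maxKeys) {
    String start = pageToken != null ? pageToken : prefix;
    List<String> fetched = new ArrayList<>();
    for (String name : sortedNames) {
      if (name.startsWith(prefix) && name.compareTo(start) >= 0) {
        fetched.add(name);
        if (fetched.size() == maxKeys + 1) {
          break; // the extra row only signals that another page exists
        }
      }
    }
    if (fetched.size() > maxKeys) {
      return new Page(new ArrayList<>(fetched.subList(0, maxKeys)), fetched.get(maxKeys));
    }
    return new Page(fetched, null);
  }

  public static void main(String[] args) {
    List<String> names = List.of("a/1", "a/2", "a/3", "b/1");
    Page first = listPage(names, "a/", null, 2);
    System.out.println(first.names + " next=" + first.nextToken);   // prints [a/1, a/2] next=a/3
    Page second = listPage(names, "a/", first.nextToken, 2);
    System.out.println(second.names + " next=" + second.nextToken); // prints [a/3] next=null
  }
}
```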

[frontend] List named blobs

4c87f1a

Improve query

snalli requested a review from SophieGuo410 December 3, 2024 04:13

snalli added 2 commits December 2, 2024 20:27

add limit

587367d

Disable strict mode ONLY_FULL_GROUP_BY

fc13d14

SophieGuo410 reviewed Dec 3, 2024

snalli added 3 commits December 3, 2024 06:22

use CTE

a164573

config list_blobs_sql

893ce4d

DEFAULT_LIST_NAMED_BLOBS_SQL

ad81af1

snalli requested a review from SophieGuo410 December 3, 2024 18:44

snalli added 19 commits December 3, 2024 10:48

comments

d57c7fc

rm newline

b4984bb

LatestBlobs

3679d21

make final

cfd0de2

static cfg name

fd7d8d3

commnts

8288c95

testListNamedBlobsWithStaleRecords

d2e2e4d

comments

81cae1a

sleep

be4ac4b

use b1/b2/b3

329489d

prefix state

f48bde9

put v2 temp

a3f3210

rm GROUP-BY

4a3ee9a

rm order-by

6a73938

Revert "rm order-by"

490b3b6

This reverts commit 6a73938.

Revert "rm GROUP-BY"

7eea268

This reverts commit 4a3ee9a.

inner join

abf2002

test with prefixes

5e9b208

restore

1d5989b

snalli added 5 commits December 3, 2024 15:54

set params

a77ce2e

condition on prefix

03d9ea5

whotes[sace

5d52454

use getBlobId; strange corupption

1a0b966

Revert "use getBlobId; strange corupption"

a4da8c5

This reverts commit 1a0b966.

SophieGuo410 reviewed Dec 4, 2024

snalli added 20 commits December 3, 2024 16:54

try another name

d8f8acf

remove -o

c86acd6

Revert "remove -o"

57c23f0

This reverts commit c86acd6.

Revert "try another name"

b32c395

This reverts commit d8f8acf.

set expiry to 1h

35af0d9

comment out some asserts

4b26cec

get records for test

18b00a8

get-other

34dc789

too

0c40d7c

remove hyphens

857564a

add one hypen

3cf29b0

shorter blob-id crap

523d56c

getBlobId

6ddc321

holy crap

e25a6de

insert v2_other

e125ec6

use moving time

1880cb1

compare db records

3a10dbd

holy crap 2

80aedf3

fix index

fed056e

add some extra blobs

d664aa7

snalli changed the title ~~[WIP][frontend] List named blobs~~ [frontend] List named blobs SQL Dec 4, 2024
