Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-902709 Limit the max allowed number of chunks in blob #580

Merged
merged 3 commits into from
Oct 18, 2023

Conversation

sfc-gh-lsembera
Copy link
Contributor

@sfc-gh-lsembera sfc-gh-lsembera commented Aug 29, 2023

When one client is ingesting into many tables, we are seeing occasional timeouts from blob registrations calls. This PR introduces the limit of 20 chunks in one blob and in blob registration request.

Fixes #570
Fixes #567

@sfc-gh-lsembera sfc-gh-lsembera force-pushed the lsembera/limit-chunks-in-one-blob branch 4 times, most recently from 27a27ec to df04e8c Compare September 11, 2023 15:51
@sfc-gh-lsembera sfc-gh-lsembera marked this pull request as ready for review September 11, 2023 15:53
@sfc-gh-tzhang
Copy link
Contributor

  • max chunks per blob registration request (=sum of chunks over all blobs in the registration request): 100

Why we need to limit as well?

// Newly added BDEC file would exceed the max number of chunks in a single registration
// request. We put chunks collected so far into the result list and create a new batch with
// the current blob
result.add(currentBatch);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This probably can't happen as long as maxChunksInBlob parameter is less than getMaxChunksInRegistrationRequest parameter but if it is not we will add a currentBatch that is empty to the result list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I added validation that maxChunksInBlob is always less than getMaxChunksInRegistrationRequest

Comment on lines 354 to 364
} else if (blobData.size()
>= this.owningClient.getParameterProvider().getMaxChunksInBlob()) {
// Create a new blob if the current one already contains max allowed number of chunks
logger.logInfo(
"Max allowed number of chunks in the current blob reached. chunkCount={}"
+ " maxChunkCount={} currentBlobPath={}",
blobData.size(),
this.owningClient.getParameterProvider().getMaxChunksInBlob(),
blobPath);
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this check be in the shouldStopProcessing() method?

if I am not wrong if the leftoverChannelsDataPerTable is not empty, then we will ignore this check

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think shouldStopProcessing is checking on the channel level but this is on the chunks level, there is no correctness issue if leftoverChannelsDataPerTable is not empty, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not - the idea is that we process leftover channels first and only when they are empty and we are about to start a new chunk, we check if the new chunks would exceed the max chunk limit. I will add a new test that generates some leftoverChannelsDataPerTable to verify this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional test added to FlushServiceTest

Copy link
Contributor

@sfc-gh-tzhang sfc-gh-tzhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comment, PTAL! No need to block on my approval :)

@@ -30,6 +30,9 @@ public class ParameterProvider {
public static final String MAX_CHUNK_SIZE_IN_BYTES = "MAX_CHUNK_SIZE_IN_BYTES".toLowerCase();
public static final String MAX_ALLOWED_ROW_SIZE_IN_BYTES =
"MAX_ALLOWED_ROW_SIZE_IN_BYTES".toLowerCase();
public static final String MAX_CHUNKS_IN_BLOB = "MAX_CHUNKS_IN_BDEC".toLowerCase();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add unit tests for these added parameters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit tests added

@@ -351,6 +351,16 @@ void distributeFlushTasks() {
if (!leftoverChannelsDataPerTable.isEmpty()) {
channelsDataPerTable.addAll(leftoverChannelsDataPerTable);
leftoverChannelsDataPerTable.clear();
} else if (blobData.size()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the logic here combined with

? Both are checking whether a separate blob is needed

Copy link
Contributor Author

@sfc-gh-lsembera sfc-gh-lsembera Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could, but it could be triggered in the middle of a chunk, which makes sense when we split based on chunk/blob size, but not for the newly added check. The added check only cares about the number of chunks in a blob, so I put the break just before a new chunk is about to start.

Comment on lines 52 to 66
public static final int MAX_CHUNKS_IN_BLOB_DEFAULT = 20;
public static final int MAX_CHUNKS_IN_REGISTRATION_REQUEST_DEFAULT = 100;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the real issue here? Is it because of the total number of chunks in one blob or the total number of the chunks in one request? If it's former, then why we need to limit the total number of chunks in one request? If it's latter, then these two values should be the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fundamentally, the issue issue is the number of chunks in one request, because it is where the server-side latency and potential timeouts come from. They could be the same, the reasoning behind one being smaller than the other was that it would give the SDK an oportunity to put BDECs with fewer chunks into the same registration request. For example, if the limit of both is 100 and there is another bdec with just one chunk, it would have to go into its own registration request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, given that this situation should be rare, I'm not sure if we need to be smart here. I'm more in a favor of adding less configurable parameters with random default values, WDYT?

@sfc-gh-lsembera sfc-gh-lsembera force-pushed the lsembera/limit-chunks-in-one-blob branch 2 times, most recently from c6dd85c to 5e261b1 Compare October 12, 2023 11:17
@sfc-gh-lsembera
Copy link
Contributor Author

@sfc-gh-tzhang I addressed your PR comments, could you re-review, please?

@sfc-gh-lsembera sfc-gh-lsembera force-pushed the lsembera/limit-chunks-in-one-blob branch from 93c1a3e to e93dd17 Compare October 16, 2023 09:39
Copy link
Contributor

@sfc-gh-tzhang sfc-gh-tzhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a suggestion, PTAL and feel free to merge if you disagree, thanks!

Comment on lines 52 to 66
public static final int MAX_CHUNKS_IN_BLOB_DEFAULT = 20;
public static final int MAX_CHUNKS_IN_REGISTRATION_REQUEST_DEFAULT = 100;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, given that this situation should be rare, I'm not sure if we need to be smart here. I'm more in a favor of adding less configurable parameters with random default values, WDYT?

@sfc-gh-lsembera sfc-gh-lsembera force-pushed the lsembera/limit-chunks-in-one-blob branch from e93dd17 to 7166bde Compare October 18, 2023 08:29
@sfc-gh-lsembera sfc-gh-lsembera merged commit 5aa45d3 into master Oct 18, 2023
12 checks passed
@sfc-gh-lsembera sfc-gh-lsembera deleted the lsembera/limit-chunks-in-one-blob branch October 18, 2023 11:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants