SNOW-902709 Limit the max allowed number of chunks in blob #580

sfc-gh-lsembera · 2023-08-29T14:54:16Z

When one client is ingesting into many tables, we are seeing occasional timeouts from blob registration calls. This PR introduces a limit of 20 chunks in one blob and in a blob registration request.

Fixes #570
Fixes #567

sfc-gh-tzhang · 2023-09-11T20:10:30Z

max chunks per blob registration request (=sum of chunks over all blobs in the registration request): 100

Why do we need to limit this as well?

sfc-gh-asen · 2023-09-11T20:35:20Z

...ain/java/net/snowflake/ingest/streaming/internal/SnowflakeStreamingIngestClientInternal.java

+        // Newly added BDEC file would exceed the max number of chunks in a single registration
+        // request. We put chunks collected so far into the result list and create a new batch with
+        // the current blob
+        result.add(currentBatch);


nit: This probably can't happen as long as the maxChunksInBlob parameter is less than the getMaxChunksInRegistrationRequest parameter, but if it is not, we will add an empty currentBatch to the result list.

Good point, I added validation that maxChunksInBlob is always less than getMaxChunksInRegistrationRequest.
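
For readers following this thread, here is a minimal sketch of the batching idea under discussion. It is not the SDK's code: `RegistrationBatcher`, `partition`, and the representation of each blob as a bare chunk count are hypothetical names and simplifications chosen for illustration. The up-front validation is what rules out the empty-batch case raised above (the sketch also accepts equal limits, which is an assumption of mine).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; each blob is modelled only by its chunk count.
final class RegistrationBatcher {

  /** Splits blobs into batches whose total chunk count stays within maxChunksInRequest. */
  static List<List<Integer>> partition(
      List<Integer> chunkCountsPerBlob, int maxChunksInBlob, int maxChunksInRequest) {
    // Assumed validation: a single blob must always fit into one request.
    if (maxChunksInBlob > maxChunksInRequest) {
      throw new IllegalArgumentException("maxChunksInBlob must not exceed maxChunksInRequest");
    }
    List<List<Integer>> result = new ArrayList<>();
    List<Integer> currentBatch = new ArrayList<>();
    int chunksInCurrentBatch = 0;
    for (int chunkCount : chunkCountsPerBlob) {
      if (chunkCount > maxChunksInBlob) {
        throw new IllegalArgumentException("blob exceeds maxChunksInBlob: " + chunkCount);
      }
      if (chunksInCurrentBatch + chunkCount > maxChunksInRequest) {
        // currentBatch cannot be empty here: an empty batch has 0 chunks and
        // chunkCount <= maxChunksInBlob <= maxChunksInRequest.
        result.add(currentBatch);
        currentBatch = new ArrayList<>();
        chunksInCurrentBatch = 0;
      }
      currentBatch.add(chunkCount);
      chunksInCurrentBatch += chunkCount;
    }
    if (!currentBatch.isEmpty()) {
      result.add(currentBatch);
    }
    return result;
  }

  public static void main(String[] args) {
    // Five full blobs fill one request; the one-chunk blob starts a second request.
    System.out.println(partition(Arrays.asList(20, 20, 20, 20, 20, 1), 20, 100));
  }
}
```

Without the validation, a blob larger than the request limit would hit the "exceeds" branch while `currentBatch` is still empty, which is the situation the nit describes.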

sfc-gh-asen · 2023-09-11T20:47:01Z

src/main/java/net/snowflake/ingest/streaming/internal/FlushService.java

+        } else if (blobData.size()
+            >= this.owningClient.getParameterProvider().getMaxChunksInBlob()) {
+          // Create a new blob if the current one already contains max allowed number of chunks
+          logger.logInfo(
+              "Max allowed number of chunks in the current blob reached. chunkCount={}"
+                  + " maxChunkCount={} currentBlobPath={}",
+              blobData.size(),
+              this.owningClient.getParameterProvider().getMaxChunksInBlob(),
+              blobPath);
+          break;


Should this check be in the shouldStopProcessing() method?

If I am not wrong, when leftoverChannelsDataPerTable is not empty, we will ignore this check.

I think shouldStopProcessing checks at the channel level, but this check is at the chunk level. There is no correctness issue if leftoverChannelsDataPerTable is not empty, right?

It should not - the idea is that we process leftover channels first, and only when they are empty and we are about to start a new chunk do we check whether the new chunk would exceed the max chunk limit. I will add a new test that generates some leftoverChannelsDataPerTable to verify this.

Additional test added to FlushServiceTest
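
The ordering discussed in this thread can be sketched as follows. This is a simplified model, not the FlushService code: `BlobBuilderSketch`, `buildBlob`, and the modelling of a chunk as a list of channel names are assumptions made for illustration. It only mirrors the if / else-if structure from the diff: leftover channel data is consumed first, and the chunk-count check runs only when a new chunk is about to start.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical model: a chunk is a list of channel names belonging to one table.
final class BlobBuilderSketch {

  /**
   * Builds one blob. Leftover channels (if any) become the first chunk; after that,
   * new chunks are started only while the blob holds fewer than maxChunksInBlob chunks.
   */
  static List<List<String>> buildBlob(
      List<String> leftoverChannels, Iterator<List<String>> tables, int maxChunksInBlob) {
    List<List<String>> blobData = new ArrayList<>();
    while (!leftoverChannels.isEmpty() || tables.hasNext()) {
      List<String> chunk;
      if (!leftoverChannels.isEmpty()) {
        // Leftover data is taken first; the chunk-count check is skipped in this branch.
        chunk = new ArrayList<>(leftoverChannels);
        leftoverChannels.clear();
      } else if (blobData.size() >= maxChunksInBlob) {
        // A new chunk is about to start but the blob is full: stop and leave
        // the remaining tables for the next blob.
        break;
      } else {
        chunk = tables.next();
      }
      blobData.add(chunk);
    }
    return blobData;
  }

  public static void main(String[] args) {
    List<String> leftover = new ArrayList<>(Arrays.asList("t0.channel1"));
    Iterator<List<String>> tables =
        Arrays.asList(
                Arrays.asList("t1.channel1"),
                Arrays.asList("t2.channel1"),
                Arrays.asList("t3.channel1"))
            .iterator();
    // With a limit of 2, the blob holds the leftover chunk plus one new chunk.
    System.out.println(buildBlob(leftover, tables, 2));
  }
}
```

Because the break sits on a chunk boundary, a chunk is never cut in half by this check, which is the distinction drawn from the size-based splitting.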

sfc-gh-tzhang

Left some comments, PTAL! No need to block on my approval :)

sfc-gh-tzhang · 2023-09-11T20:17:55Z

src/main/java/net/snowflake/ingest/utils/ParameterProvider.java

@@ -30,6 +30,9 @@ public class ParameterProvider {
  public static final String MAX_CHUNK_SIZE_IN_BYTES = "MAX_CHUNK_SIZE_IN_BYTES".toLowerCase();
  public static final String MAX_ALLOWED_ROW_SIZE_IN_BYTES =
      "MAX_ALLOWED_ROW_SIZE_IN_BYTES".toLowerCase();
+  public static final String MAX_CHUNKS_IN_BLOB = "MAX_CHUNKS_IN_BDEC".toLowerCase();


Please add unit tests for these added parameters

Unit tests added
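
As a rough illustration of what such a parameter and its checks might look like, here is a minimal sketch. It is not the SDK's ParameterProvider: the class `ChunkLimitParameters`, its constructor, and the key string used here are hypothetical. Only the idea of a lower-cased key and the default of 20 comes from this PR.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a parameter provider with one chunk-limit parameter.
final class ChunkLimitParameters {
  static final String MAX_CHUNKS_IN_BLOB = "MAX_CHUNKS_IN_BLOB".toLowerCase(Locale.ROOT);
  static final int MAX_CHUNKS_IN_BLOB_DEFAULT = 20;

  private final Map<String, Object> values = new HashMap<>();

  ChunkLimitParameters(Map<String, ?> overrides) {
    if (overrides != null) {
      // Keys are normalised to lower case so lookups are case-insensitive.
      for (Map.Entry<String, ?> e : overrides.entrySet()) {
        values.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
      }
    }
  }

  /** Returns the configured limit, or the default when no override is present. */
  int getMaxChunksInBlob() {
    Object raw = values.getOrDefault(MAX_CHUNKS_IN_BLOB, MAX_CHUNKS_IN_BLOB_DEFAULT);
    int parsed =
        (raw instanceof Number)
            ? ((Number) raw).intValue()
            : Integer.parseInt(raw.toString().trim());
    if (parsed <= 0) {
      throw new IllegalArgumentException("max chunks in blob must be positive: " + parsed);
    }
    return parsed;
  }

  public static void main(String[] args) {
    System.out.println(new ChunkLimitParameters(null).getMaxChunksInBlob()); // prints 20
    System.out.println(
        new ChunkLimitParameters(Collections.singletonMap("MAX_CHUNKS_IN_BLOB", "5"))
            .getMaxChunksInBlob()); // prints 5
  }
}
```

A unit test for a parameter of this shape would typically cover the default, a numeric override, a string override, and an invalid value.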

sfc-gh-tzhang · 2023-09-12T00:46:42Z

src/main/java/net/snowflake/ingest/streaming/internal/FlushService.java

@@ -351,6 +351,16 @@ void distributeFlushTasks() {
        if (!leftoverChannelsDataPerTable.isEmpty()) {
          channelsDataPerTable.addAll(leftoverChannelsDataPerTable);
          leftoverChannelsDataPerTable.clear();
+        } else if (blobData.size()


Could the logic here be combined with

snowflake-ingest-java/src/main/java/net/snowflake/ingest/streaming/internal/FlushService.java

Line 407 in 3a3cbc8

if (idx != channelsDataPerTable.size()) {

? Both are checking whether a separate blob is needed

It could, but it could be triggered in the middle of a chunk, which makes sense when we split based on chunk/blob size, but not for the newly added check. The added check only cares about the number of chunks in a blob, so I put the break just before a new chunk is about to start.

sfc-gh-tzhang · 2023-09-12T00:53:38Z

src/main/java/net/snowflake/ingest/utils/ParameterProvider.java

+  public static final int MAX_CHUNKS_IN_BLOB_DEFAULT = 20;
+  public static final int MAX_CHUNKS_IN_REGISTRATION_REQUEST_DEFAULT = 100;


What is the real issue here? Is it the total number of chunks in one blob or the total number of chunks in one request? If it's the former, why do we need to limit the total number of chunks in one request? If it's the latter, shouldn't these two values be the same?

Fundamentally, the issue is the number of chunks in one request, because that is where the server-side latency and potential timeouts come from. The two limits could be the same; the reasoning behind making one smaller than the other was that it would give the SDK an opportunity to put BDECs with fewer chunks into the same registration request. For example, if both limits are 100 and there is another BDEC with just one chunk, it would have to go into its own registration request.
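
The arithmetic behind that example can be checked with a small sketch. `RequestCounter` and `countRequests` are hypothetical names, and the greedy in-order packing is an assumption about how blobs would be grouped, not a description of the SDK.

```java
// Hypothetical helper: counts registration requests under greedy in-order packing.
final class RequestCounter {

  static int countRequests(int[] chunksPerBlob, int maxChunksInRequest) {
    int requests = 0;
    int chunksInCurrent = 0;
    for (int chunks : chunksPerBlob) {
      if (chunks > maxChunksInRequest) {
        throw new IllegalArgumentException("blob does not fit into any request: " + chunks);
      }
      // Open a new request for the first blob or when the current one would overflow.
      if (requests == 0 || chunksInCurrent + chunks > maxChunksInRequest) {
        requests++;
        chunksInCurrent = 0;
      }
      chunksInCurrent += chunks;
    }
    return requests;
  }

  public static void main(String[] args) {
    // Both limits 100: a full blob leaves no room, so the 1-chunk blob travels alone.
    System.out.println(countRequests(new int[] {100, 1}, 100)); // prints 2
    // Blob limit 20, request limit 100: a full blob and the 1-chunk blob share a request.
    System.out.println(countRequests(new int[] {20, 1}, 100)); // prints 1
  }
}
```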

I see. Given that this situation should be rare, I'm not sure we need to be smart here. I'm more in favor of adding fewer configurable parameters with random default values, WDYT?

sfc-gh-lsembera · 2023-10-12T11:21:42Z

@sfc-gh-tzhang I addressed your PR comments, could you re-review, please?

sfc-gh-tzhang

Left a suggestion, PTAL and feel free to merge if you disagree, thanks!

sfc-gh-lsembera force-pushed the lsembera/limit-chunks-in-one-blob branch 4 times, most recently from 27a27ec to df04e8c Compare September 11, 2023 15:51

sfc-gh-lsembera marked this pull request as ready for review September 11, 2023 15:53

sfc-gh-lsembera requested review from sfc-gh-tzhang and a team as code owners September 11, 2023 15:53

sfc-gh-asen reviewed Sep 11, 2023

sfc-gh-tzhang reviewed Sep 12, 2023

sfc-gh-lsembera force-pushed the lsembera/limit-chunks-in-one-blob branch 2 times, most recently from c6dd85c to 5e261b1 Compare October 12, 2023 11:17

sfc-gh-lsembera force-pushed the lsembera/limit-chunks-in-one-blob branch from 93c1a3e to e93dd17 Compare October 16, 2023 09:39

sfc-gh-tzhang approved these changes Oct 17, 2023

sfc-gh-lsembera added 3 commits October 18, 2023 07:14

SNOW-902709 Limit the max allowed number of chunks in blob

aff4587

Additional assertion

5c93ef8

Merge parameters into 1

7166bde

sfc-gh-lsembera force-pushed the lsembera/limit-chunks-in-one-blob branch from e93dd17 to 7166bde Compare October 18, 2023 08:29

sfc-gh-lsembera merged commit 5aa45d3 into master Oct 18, 2023
12 checks passed

sfc-gh-lsembera deleted the lsembera/limit-chunks-in-one-blob branch October 18, 2023 11:22

sfc-gh-lsembera mentioned this pull request Jun 13, 2024

Blob number limit for reg req to avoid oversized registrations #569

Closed
