Blob number limit for reg req to avoid oversized registrations #569

Closed

Conversation

sfc-gh-azagrebin (Contributor)

Currently, if there is a connection outage, the client can accumulate many blobs to register.
The blobs are serialised into JSON to be sent to Snowflake, and the size of that JSON can exceed the HTTP request size limit.
To avoid this, the PR splits the blobs into batches of a configurable size.

See #567

  // Register the blobs, and invalidate any channels that return a failure status code
- this.owningClient.registerBlobs(blobs);
+ Lists.partition(blobs, blobsPerReq).forEach(owningClient::registerBlobs);
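For context, Guava's Lists.partition returns consecutive sublists of the requested size (the last batch may be smaller), so the new line issues one registration call per batch. A minimal standalone sketch of the batching behaviour, with an illustrative batch size and stand-in blob list:

import com.google.common.collect.Lists;
import java.util.Arrays;
import java.util.List;

public class PartitionDemo {
  public static void main(String[] args) {
    List<String> blobs = Arrays.asList("b1", "b2", "b3", "b4", "b5");
    int blobsPerReq = 2; // illustrative batch size
    // Prints batches [b1, b2], [b3, b4], [b5] -- one registration request each
    Lists.partition(blobs, blobsPerReq)
        .forEach(batch -> System.out.println("register " + batch));
  }
}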


Since this is an IO networking request, would it make sense to parallelize?

sfc-gh-azagrebin (Contributor Author) commented Aug 11, 2023

I would generally be in favour if this were a common case.
Here it should be more of an exception indicating that the system is already overloaded (added a log),
hence I tend not to complicate the system with even more concurrency for this.
Happy to discuss more.


Makes sense :)

sfc-gh-azagrebin (Contributor Author)

> I would generally be in favour if this were a common case.

Actually, I take this back: we cannot do this because of the per-channel ordering guarantees (parallel requests could register a channel's blobs out of order).

sfc-gh-tzhang (Contributor) left a comment

Left some comments, please add tests, otherwise LGTM!

@@ -296,6 +301,13 @@ public long getMaxAllowedRowSizeInBytes() {
     return (val instanceof String) ? Long.parseLong(val.toString()) : (long) val;
   }

+  public int getMaxBlobsToRegisterInOneRequest() {
Contributor:

Could we add a unit test for this like all other parameters we added?
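A sketch of what such a parameter test might look like, mirroring the style of the other parameter tests; the override-map constructor and the lowercase key string are assumptions based on this PR's diff, not confirmed API:

import static org.junit.Assert.assertEquals;

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.junit.Test;

public class ParameterProviderMaxBlobsTest {
  @Test
  public void withMaxBlobsToRegisterInOneRequestOverride() {
    Map<String, Object> parameterMap = new HashMap<>();
    // Key name and constructor shape are assumptions, not confirmed API.
    parameterMap.put("max_blobs_to_register_in_one_request", 20);
    ParameterProvider parameterProvider = new ParameterProvider(parameterMap, new Properties());
    assertEquals(20, parameterProvider.getMaxBlobsToRegisterInOneRequest());
  }
}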

  // Register the blobs, and invalidate any channels that return a failure status code
- this.owningClient.registerBlobs(blobs);
+ Lists.partition(blobs, blobsPerReq).forEach(owningClient::registerBlobs);
Contributor:

Do we have a test where the blobs request is split?
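Such a test could count registration calls with a mock; the BlobRegistrar interface below is a hypothetical stand-in for the owning client, used only to keep the sketch self-contained:

import static org.mockito.Mockito.anyList;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.times;
import static org.mockito.Mockito.verify;

import com.google.common.collect.Lists;
import java.util.Collections;
import java.util.List;
import org.junit.Test;

public class RegisterBlobsSplitTest {
  // Hypothetical stand-in for the owning client's registerBlobs entry point.
  interface BlobRegistrar {
    void registerBlobs(List<String> blobs);
  }

  @Test
  public void registrationIsSplitIntoBatches() {
    BlobRegistrar client = mock(BlobRegistrar.class);
    List<String> blobs = Collections.nCopies(25, "blob");
    int blobsPerReq = 10;
    Lists.partition(blobs, blobsPerReq).forEach(client::registerBlobs);
    // 25 blobs with a batch size of 10 -> 3 calls (10, 10, 5)
    verify(client, times(3)).registerBlobs(anyList());
  }
}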

private int getBlobNumToRegisterInOneReq(int blobNum) {
  int blobsPerReq = owningClient.getParameterProvider().getMaxBlobsToRegisterInOneRequest();
  if (blobNum > blobsPerReq) {
    logger.logWarn(
Contributor:

Could we add client name in the log as well?
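For example, the warning could carry it roughly as below; getName() and the exact logWarn placeholder signature are assumptions about the client API:

logger.logWarn(
    "Blobs to register ({}) exceed the per-request limit ({}), splitting into batches, client={}",
    blobNum, blobsPerReq, owningClient.getName());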

@@ -51,6 +54,8 @@ public class ParameterProvider {
     It reduces memory consumption compared to using Java Objects for buffering.*/
  public static final boolean ENABLE_PARQUET_INTERNAL_BUFFERING_DEFAULT = false;

+  public static final int MAX_BLOBS_TO_REGISTER_IN_ONE_REQUEST_DEFAULT = 10;
Contributor:

10 seems too low; should we start with something higher, like 30? If we have the log in Snowhouse for tracking the number of blobs in one request, we could pick the P99 as well.

sfc-gh-lsembera (Contributor) commented Jun 13, 2024

@sfc-gh-azagrebin Can we close this PR? In #580 I implemented limiting the number of chunks in a blob and in a registration request.

sfc-gh-azagrebin (Contributor Author)

Sure
