Blob number limit for reg req to avoid oversized registrations #569
Conversation
…reg req to avoid oversized registrations
  // Register the blobs, and invalidate any channels that return a failure status code
- this.owningClient.registerBlobs(blobs);
+ Lists.partition(blobs, blobsPerReq).forEach(owningClient::registerBlobs);
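For context, Guava's Lists.partition splits a list into consecutive sublists of at most the given size, so each sublist here becomes one register request. A minimal standalone sketch (the string blob names are placeholders, not the SDK's blob type):

```java
import com.google.common.collect.Lists;
import java.util.Arrays;
import java.util.List;

public class PartitionDemo {
  public static void main(String[] args) {
    // Seven placeholder blob names split into batches of at most three:
    // [b1, b2, b3], [b4, b5, b6], [b7]
    List<String> blobs = Arrays.asList("b1", "b2", "b3", "b4", "b5", "b6", "b7");
    Lists.partition(blobs, 3).forEach(System.out::println);
  }
}
```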
Since this is an IO networking request, would it make sense to parallelize?
I would generally be in favour if this were a common case.
Here it should be more of an exception indicating that the system is already overloaded (I added a log), hence I tend not to complicate the system with even more concurrency for this.
Happy to discuss more.
Makes sense :)
I would generally be in favour if this were a common case.
Actually, I take this back: we cannot do this because of the ordering guarantees per channel.
Left some comments, please add tests, otherwise LGTM!
@@ -296,6 +301,13 @@ public long getMaxAllowedRowSizeInBytes() {
    return (val instanceof String) ? Long.parseLong(val.toString()) : (long) val;
  }

+ public int getMaxBlobsToRegisterInOneRequest() {
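For reference, the body of this getter presumably mirrors the other numeric getters above (like getMaxAllowedRowSizeInBytes), falling back to the new default constant; the map field name and key string in this sketch are assumptions, not the actual code:

```java
public int getMaxBlobsToRegisterInOneRequest() {
  // Hypothetical key and map field; falls back to the compile-time default when no override is set
  Object val =
      this.parameterMap.getOrDefault(
          "MAX_BLOBS_TO_REGISTER_IN_ONE_REQUEST", MAX_BLOBS_TO_REGISTER_IN_ONE_REQUEST_DEFAULT);
  return (val instanceof String) ? Integer.parseInt(val.toString()) : (int) val;
}
```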
Could we add a unit test for this, like for all the other parameters we added?
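A rough sketch of such a test; the parameter key string, the ParameterProvider constructor arguments, and the import of the SDK class under test are assumptions modelled on how the existing parameter tests are usually written:

```java
import java.util.HashMap;
import java.util.Map;
import org.junit.Assert;
import org.junit.Test;

public class MaxBlobsToRegisterParameterTest {
  @Test
  public void testMaxBlobsToRegisterInOneRequestOverride() {
    // Hypothetical override key and constructor shape
    Map<String, Object> overrides = new HashMap<>();
    overrides.put("MAX_BLOBS_TO_REGISTER_IN_ONE_REQUEST", 25);
    ParameterProvider parameterProvider = new ParameterProvider(overrides, null);
    Assert.assertEquals(25, parameterProvider.getMaxBlobsToRegisterInOneRequest());
  }
}
```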
  // Register the blobs, and invalidate any channels that return a failure status code
- this.owningClient.registerBlobs(blobs);
+ Lists.partition(blobs, blobsPerReq).forEach(owningClient::registerBlobs);
Do we have a test where the blob registration request is split?
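Not the SDK's own test, but a self-contained sketch of the split behaviour in question: 25 blobs with a limit of 10 per request should yield three register requests of sizes 10, 10, and 5.

```java
import com.google.common.collect.Lists;
import java.util.Collections;
import java.util.List;
import org.junit.Assert;
import org.junit.Test;

public class RegisterRequestSplitTest {
  @Test
  public void splitsRegistrationIntoMultipleRequests() {
    // 25 placeholder blobs, limit of 10 per request -> 3 batches of sizes 10, 10, 5
    List<String> blobs = Collections.nCopies(25, "blob");
    List<List<String>> batches = Lists.partition(blobs, 10);
    Assert.assertEquals(3, batches.size());
    Assert.assertEquals(10, batches.get(0).size());
    Assert.assertEquals(5, batches.get(2).size());
  }
}
```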
+ private int getBlobNumToRegisterInOneReq(int blobNum) {
+   int blobsPerReq = owningClient.getParameterProvider().getMaxBlobsToRegisterInOneRequest();
+   if (blobNum > blobsPerReq) {
+     logger.logWarn(
Could we add the client name in the log as well?
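A hedged sketch of the helper with the client name added to the warning; owningClient.getName() and the exact logWarn format string are assumptions:

```java
private int getBlobNumToRegisterInOneReq(int blobNum) {
  int blobsPerReq = owningClient.getParameterProvider().getMaxBlobsToRegisterInOneRequest();
  if (blobNum > blobsPerReq) {
    // Include the client name so overload warnings can be attributed to a specific client
    logger.logWarn(
        "Client {} has {} blobs to register, which exceeds the per-request limit of {};"
            + " the registration will be split into multiple requests",
        owningClient.getName(),
        blobNum,
        blobsPerReq);
  }
  return blobsPerReq;
}
```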
@@ -51,6 +54,8 @@ public class ParameterProvider {
    It reduces memory consumption compared to using Java Objects for buffering.*/
  public static final boolean ENABLE_PARQUET_INTERNAL_BUFFERING_DEFAULT = false;

+ public static final int MAX_BLOBS_TO_REGISTER_IN_ONE_REQUEST_DEFAULT = 10;
10 seems too low; should we start with something higher, like 30? If we have the log in Snowhouse for tracking the number of blobs in one request, we could pick the P99 as well.
@sfc-gh-azagrebin Can we close this PR? In #580 I implemented limiting the number of chunks in a blob and in a registration request.
Sure
Currently, if there is a connection outage, the client can accumulate many blobs to register.
The blobs are serialised into JSON to be sent to Snowflake, and the size of that JSON can exceed the HTTP request limit.
To avoid this, the PR splits the blobs into batches of a configurable size.
See #567
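Putting the description together with the diff snippets above, a minimal sketch of the batching step (a method-level fragment; owningClient, logger, BlobMetadata, and Guava's Lists are assumed to be provided by the surrounding flush-service class, and the method name is hypothetical):

```java
void registerBlobsInBatches(List<BlobMetadata> blobs) {
  int blobsPerReq = owningClient.getParameterProvider().getMaxBlobsToRegisterInOneRequest();
  if (blobs.size() > blobsPerReq) {
    // Reaching this path hints at a registration backlog, e.g. after a connection outage
    logger.logWarn(
        "{} blobs queued for registration exceeds the per-request limit of {}; splitting",
        blobs.size(), blobsPerReq);
  }
  // Each batch becomes one register request, keeping the JSON payload within HTTP limits
  Lists.partition(blobs, blobsPerReq).forEach(owningClient::registerBlobs);
}
```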