
Maximum query array size of client.executeBatch #68

Open

webcc opened this issue Mar 8, 2014 · 7 comments

@webcc

webcc commented Mar 8, 2014

Hi,

We would like to know the maximum query array size that can be passed to client.executeBatch(). We think it would be a good idea to document it, because we are running into problems with sizes larger than about 4,000.

We are working around this for the moment by splicing the array into smaller chunks.
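
For reference, a minimal sketch of what that splicing workaround might look like, assuming the executeBatch(queries, consistency, options, callback) signature quoted later in this thread; the chunk size and the helper name executeBatchInChunks are arbitrary illustrations:

// Hypothetical helper: split the queries array into chunks and send one batch per chunk.
// client and consistency are assumed to be set up already; note that splice mutates the array.
function executeBatchInChunks(queries, chunkSize, callback) {
  if (queries.length === 0) {
    return callback();
  }
  // Remove the first chunkSize items and send them as one batch
  var chunk = queries.splice(0, chunkSize);
  client.executeBatch(chunk, consistency, {atomic: false}, function (err) {
    if (err) {
      return callback(err);
    }
    executeBatchInChunks(queries, chunkSize, callback);
  });
}

executeBatchInChunks(allQueries, 5000, function (err) {
  if (err) {
    console.error(err);
  }
});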

@jorgebay
Owner

As far as I know, there is no limit on batch size at the protocol level or in CQL.

Are the INSERT / UPDATE queries for the same partition key?
If not, I don't think it is a good idea to use atomic batches across a large number of partitions...

Another consideration: to batch a large number of queries, you have to build all of those queries and parameters in memory, and then send all of that data over the wire "serially"...

@webcc
Author

webcc commented Mar 12, 2014

There seems to be a limit. To give you an idea of the issue, we are sending around 150,000 INSERTs for a given primary key. That generates an exception in the FrameWriter. If we splice the array of queries into chunks of 5,000 items or fewer, the problem disappears.

Could you tell us what the options in queryFlag do? In particular, we would like to know what the pageSize property does. It seems to influence query performance when we set it in the driver configuration.

And by the way, many thanks for this excellent piece of software.

@jorgebay
Owner

Thanks!

queryFlag does not affect the batch in any way. pageSize is only used by Cassandra for SELECT queries and is ignored for everything else.

I still think it is not a good idea to batch such a large number of queries: if each query takes on average around 50 bytes (depending on the size of the query and its parameters), 150,000 queries add up to more than 7 MB of data in memory, which is then transferred over the wire.
Is there a reason to do such large operations?
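
As a rough back-of-the-envelope check of that figure (the 50-byte average is the assumption stated above):

// Back-of-the-envelope estimate, assuming ~50 bytes per query plus parameters
var bytesPerQuery = 50;
var queryCount = 150000;
var totalBytes = bytesPerQuery * queryCount;            // 7,500,000 bytes
console.log((totalBytes / (1024 * 1024)).toFixed(1));   // ~7.2 (MB), i.e. "more than 7 MB"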

Also, if possible, use non-atomic batches (atomic batches have a performance impact):

client.executeBatch(queries, consistency, {atomic: false}, callback);

If you are getting an error from the FrameWriter, please post it.

@darthcav

Hi Jorge,

Could you briefly explain what the option {atomic: false} (versus {atomic: true}) actually does?

@jorgebay
Owner

It's atomic in database terms: if any part of the batch succeeds, all of it does.

More info: Atomic batches in Cassandra
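
For illustration, the two variants side by side (a sketch using the executeBatch signature shown earlier in this thread; the semantics are as described above):

// Atomic batch: if any part of the batch succeeds, all of it does.
client.executeBatch(queries, consistency, {atomic: true}, callback);

// Non-atomic batch: no all-or-nothing guarantee, but avoids the performance
// impact of atomic batches mentioned earlier in the thread.
client.executeBatch(queries, consistency, {atomic: false}, callback);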

@dsimmons

I'm seeing the same thing. It took me a while to track it down because only a few of my INSERTs were failing with the exception TypeError: value is out of bounds. At first I thought it was due to incorrect type coercion of very long IDs (Twitter IDs, 64-bit ints that I'm storing as strings).

The problem stems from the following code:

FrameWriter.prototype.writeShort = function(num) {
  var buf = new Buffer(2);
  buf.writeUInt16BE(num, 0);
  this.buffers.push(buf);
};

The parameter num in one of my failure cases, for example, is 197136. I looked up writeUInt16BE in the Node docs, and some simple math tells me that 197136 is well outside the 2^16 possible values.
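
A minimal reproduction of that bounds check, outside the driver (assuming Node's Buffer API; older Node versions throw TypeError: value is out of bounds, newer ones throw a RangeError):

var buf = new Buffer(2);        // Buffer.alloc(2) on current Node versions
buf.writeUInt16BE(65535, 0);    // fine: 65535 is the largest value that fits in 16 bits
buf.writeUInt16BE(197136, 0);   // throws: 197136 does not fit in an unsigned 16-bit integer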

Now, with no knowledge of the underlying Cassandra wire protocol, my question is: is it possible to step up this value to perhaps 2^32? I realize that batches of this size are probably recommended against, but for these particular transactions, I need them to be that big to remain atomic. This particular insert is around 12MB uncompressed as JSON.

@adam-roth

@jorgebay - Wouldn't it be more descriptive to say that if any part of the batch fails, the entire batch fails? I suppose the two are equivalent, but typically "what happens when something fails?" is the main concern.
