Data loss when using big batches #502
Comments
Forgot to add something that might help in finding the issue: if you run with replication on, you can get different data dropped on different nodes. So if you run the test program with -write_data=false, you get different errors every time (as queries get run on different nodes for each replicated shard).
Thanks @ohurvitz. Moving this issue to 0.6.1
Any news on this? It still happens on the latest version, with and without setting write-batch-size as in the new config file.
We haven't had a chance to take a look at it yet. It's definitely on my todo list.
@ohurvitz I finally got to take a look at this and have a fix. Thanks for the best bug report ever; it was a little tricky to track down, though.
I have the following case:
As the batch size increases, some data does not make it to the database, even though no error is reported. In my case, a batch size of 5000 points triggers the loss almost every time, while 4000 triggers it most of the time.
Code to reproduce:
https://gist.github.com/ohurvitz/e5d74ae56d8ffa20e968
Note that there are a lot of hard-coded numbers in that code. It also expects the server name to contain a single digit '1', which is replaced by '2', '3', etc. to reach the other servers.
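For readers who don't want to dig through the gist, the core idea of the repro can be sketched as follows: write a known number of points in fixed-size batches, then count what the database actually holds and compare. This is a minimal, self-contained sketch; `write_batch` and `count_points` are hypothetical stand-ins for the real client calls (here they just use an in-memory list), and the specific sizes are illustrative, not taken from the gist.

```python
# Sketch of the verification harness: write `total_points` in batches of
# `batch_size`, then check that the count read back matches what was written.
# In the real test the writes go to the database server; here an in-memory
# list stands in so the harness itself is runnable.

store = []

def write_batch(points):
    # Stand-in for the real client call that POSTs one batch to the server.
    store.extend(points)

def count_points():
    # Stand-in for a count query against the database.
    return len(store)

def run_test(total_points, batch_size):
    store.clear()
    points = [{"time": i, "value": i} for i in range(total_points)]
    for start in range(0, total_points, batch_size):
        write_batch(points[start:start + batch_size])
    written = count_points()
    # With the reported bug, `written` would silently fall short of
    # `total_points` for large batch sizes, with no error raised.
    return written == total_points, written

ok, written = run_test(20000, 5000)
```

Against the in-memory stand-in the check passes; against an affected server, large batch sizes would make it fail without any error being reported.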