[0.10.0] Write into fully-replicated cluster is not replicated across all shards #5610

Closed
e-dard opened this issue Feb 10, 2016 · 2 comments

e-dard (Contributor) commented Feb 10, 2016

Steps to reproduce:

  • 3-node cluster with default configs. n2 and n3 join n1.
  • create a database on n1.
  • apply the following write on n1: insert cpu value=1 (an equivalent HTTP write is sketched just below).
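
For reference, here is a minimal Go sketch of the same write issued against n1's HTTP write endpoint rather than the CLI. The address localhost:8186 is taken from the show servers output further down; /write with a db query parameter is the standard write API. This is just a sketch of the reproduction step, not part of the original report.

package main

import (
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Equivalent to `insert cpu value=1` in the CLI, sent to n1 over HTTP.
	resp, err := http.Post(
		"http://localhost:8186/write?db=db", // n1's http_addr from `show servers`
		"text/plain",
		strings.NewReader("cpu value=1"),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status) // a successful write returns 204 No Content
}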

Expected result:

  • The write should be replicated to all three data nodes.

Actual result:

  • One of the nodes does not create the shard or store the point (in my example, shard 4 is not created on n2).

[screenshot: 2016-02-10 11:18:40]

Output and logs

n1:

> create database db
> use db
Using database db
> show servers
name: data_nodes
----------------
id  http_addr   tcp_addr
1   localhost:8186  localhost:8188
4   localhost:8386  localhost:8388
5   localhost:8286  localhost:8288


name: meta_nodes
----------------
id  http_addr   tcp_addr
1   localhost:8191  localhost:8188
2   localhost:8391  localhost:8388
3   localhost:8291  localhost:8288

> show shards
name: _internal
---------------
id  database    retention_policy    shard_group start_time      end_time        expiry_time     owners
1   _internal   monitor         1       2016-02-10T00:00:00Z    2016-02-11T00:00:00Z    2016-02-18T00:00:00Z    4
2   _internal   monitor         1       2016-02-10T00:00:00Z    2016-02-11T00:00:00Z    2016-02-18T00:00:00Z    5
3   _internal   monitor         1       2016-02-10T00:00:00Z    2016-02-11T00:00:00Z    2016-02-18T00:00:00Z    1


name: db
--------
id  database    retention_policy    shard_group start_time  end_time    expiry_time owners

> insert cpu value=1
> show shards
name: _internal
---------------
id  database    retention_policy    shard_group start_time      end_time        expiry_time     owners
1   _internal   monitor         1       2016-02-10T00:00:00Z    2016-02-11T00:00:00Z    2016-02-18T00:00:00Z    4
2   _internal   monitor         1       2016-02-10T00:00:00Z    2016-02-11T00:00:00Z    2016-02-18T00:00:00Z    5
3   _internal   monitor         1       2016-02-10T00:00:00Z    2016-02-11T00:00:00Z    2016-02-18T00:00:00Z    1


name: db
--------
id  database    retention_policy    shard_group start_time      end_time        expiry_time     owners
4   db      default         2       2016-02-08T00:00:00Z    2016-02-15T00:00:00Z    2016-02-15T00:00:00Z    4,5,1

> show retention policies on db
name    duration    replicaN    default
default 0       3       true

>

Note in the logs: n2 mentions that shard 4 does not exist, so it drops the write. I think that's the starting point for investigating this issue.

Attached logs: n1.txt, n2.txt, n3.txt

dgnorton (Contributor) commented:

Closing because we can't reproduce this as of commit b6a0b6.

dgnorton (Contributor) commented:

Reopening because we saw it happen again.

dgnorton reopened this Feb 25, 2016
jwilder added a commit that referenced this issue Mar 1, 2016
There was a race where a remote write could arrive before a meta
client cache update arrived.  When this happened, the receiving node
would drop the write because it could not determine what database
and retention policy the shard it was supposed to create belonged
to.

This change sends the db and rp along with the write so that the
receiving node does not need to consult the meta store. It also lets
us skip writes for shards that no longer exist, instead of always
sending them and filling the receiving node's logs with dropped write
requests. This second situation can occur when shards are deleted and
some nodes still have writes queued in hinted handoff for those
shards.

Fixes #5610
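
To illustrate the shape of that fix, here is a minimal, self-contained Go sketch. It is not the actual InfluxDB code, and all type, field, and function names are hypothetical; it only shows the idea of carrying the database and retention policy in the remote write request so the receiving node can create a missing shard without consulting the meta store.

package main

import "fmt"

// WriteShardRequest is a hypothetical remote-write request. Database and
// RetentionPolicy stand for the fields the fix adds to the request.
type WriteShardRequest struct {
	ShardID         uint64
	Database        string
	RetentionPolicy string
	Points          []string // line-protocol points, simplified
}

// store is a stand-in for the local shard store on the receiving node.
type store struct {
	shards map[uint64][]string
}

func (s *store) createShardIfNotExists(db, rp string, id uint64) {
	if _, ok := s.shards[id]; !ok {
		fmt.Printf("creating shard %d for %s.%s\n", id, db, rp)
		s.shards[id] = []string{}
	}
}

func (s *store) writeToShard(id uint64, points []string) {
	s.shards[id] = append(s.shards[id], points...)
}

// handleWrite is the receiving node's handler for a remote write. Before the
// fix, the receiver had to look the shard up in its meta-client cache; if the
// cache update had not arrived yet, the write was dropped. With db/rp carried
// in the request, the shard can be created directly.
func (s *store) handleWrite(req WriteShardRequest) {
	s.createShardIfNotExists(req.Database, req.RetentionPolicy, req.ShardID)
	s.writeToShard(req.ShardID, req.Points)
}

func main() {
	n2 := &store{shards: map[uint64][]string{}}
	// Shard 4 does not exist locally yet, but the request carries db/rp,
	// so the write succeeds instead of being dropped.
	n2.handleWrite(WriteShardRequest{
		ShardID:         4,
		Database:        "db",
		RetentionPolicy: "default",
		Points:          []string{"cpu value=1"},
	})
	fmt.Println(n2.shards[4]) // [cpu value=1]
}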
jwilder closed this as completed in 43118ce Mar 2, 2016