[0.10.0] Write into fully-replicated cluster is not replicated across all shards #5610

Closed
e-dard opened this issue Feb 10, 2016 · 2 comments

e-dard (Contributor) commented Feb 10, 2016

Steps to reproduce:

  • 3-node cluster with default configs. n2 and n3 join n1.
  • create a database on n1.
  • apply the following write on n1: insert cpu value=1 (an equivalent HTTP write is sketched just below).
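
For reference, here is a minimal Go sketch of the same write issued against n1's HTTP write endpoint rather than the CLI. The address localhost:8186 is taken from the show servers output further down; /write with a db query parameter is the standard write API. This is just a sketch of the reproduction step, not part of the original report.

package main

import (
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Equivalent to `insert cpu value=1` in the CLI, sent to n1 over HTTP.
	resp, err := http.Post(
		"http://localhost:8186/write?db=db", // n1's http_addr from `show servers`
		"text/plain",
		strings.NewReader("cpu value=1"),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status) // a successful write returns 204 No Content
}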

Expected result:

  • The write should be replicated to all three data nodes.

Actual result:

  • One of the nodes does not create the shard or store the point (in my example, shard 4 is not created on n2).

[screenshot: 2016-02-10 11:18:40]

Output and logs

n1:

> create database db
> use db
Using database db
> show servers
name: data_nodes
----------------
id  http_addr   tcp_addr
1   localhost:8186  localhost:8188
4   localhost:8386  localhost:8388
5   localhost:8286  localhost:8288


name: meta_nodes
----------------
id  http_addr   tcp_addr
1   localhost:8191  localhost:8188
2   localhost:8391  localhost:8388
3   localhost:8291  localhost:8288

> show shards
name: _internal
---------------
id  database    retention_policy    shard_group start_time      end_time        expiry_time     owners
1   _internal   monitor         1       2016-02-10T00:00:00Z    2016-02-11T00:00:00Z    2016-02-18T00:00:00Z    4
2   _internal   monitor         1       2016-02-10T00:00:00Z    2016-02-11T00:00:00Z    2016-02-18T00:00:00Z    5
3   _internal   monitor         1       2016-02-10T00:00:00Z    2016-02-11T00:00:00Z    2016-02-18T00:00:00Z    1


name: db
--------
id  database    retention_policy    shard_group start_time  end_time    expiry_time owners

> insert cpu value=1
> show shards
name: _internal
---------------
id  database    retention_policy    shard_group start_time      end_time        expiry_time     owners
1   _internal   monitor         1       2016-02-10T00:00:00Z    2016-02-11T00:00:00Z    2016-02-18T00:00:00Z    4
2   _internal   monitor         1       2016-02-10T00:00:00Z    2016-02-11T00:00:00Z    2016-02-18T00:00:00Z    5
3   _internal   monitor         1       2016-02-10T00:00:00Z    2016-02-11T00:00:00Z    2016-02-18T00:00:00Z    1


name: db
--------
id  database    retention_policy    shard_group start_time      end_time        expiry_time     owners
4   db      default         2       2016-02-08T00:00:00Z    2016-02-15T00:00:00Z    2016-02-15T00:00:00Z    4,5,1

> show retention policies on db
name    duration    replicaN    default
default 0       3       true

>

Note in the logs: n2 mentions that shard 4 does not exist, so it drops the write. I think that's the starting point for investigating this issue.

Attached logs: n1.txt, n2.txt, n3.txt

dgnorton (Contributor) commented:

Closing because we can't reproduce this as of commit b6a0b6.

dgnorton (Contributor) commented:

Reopening because we saw it happen again.

dgnorton reopened this Feb 25, 2016
jwilder added a commit that referenced this issue Mar 1, 2016
There was a race where a remote write could arrive before a meta
client cache update arrived.  When this happened, the receiving node
would drop the write because it could not determine what database
and retention policy the shard it was supposed to create belonged
to.

This change sends the db and rp along with the write so that the
receiving node does not need to consult the meta store. It also lets
us skip writes for shards that no longer exist, instead of always
sending them and filling the receiving node's logs with dropped write
requests. This second situation can occur when shards are deleted and
some nodes still have writes queued in hinted handoff for those
shards.

Fixes #5610
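
To illustrate the shape of that fix, here is a minimal, self-contained Go sketch. It is not the actual InfluxDB code, and all type, field, and function names are hypothetical; it only shows the idea of carrying the database and retention policy in the remote write request so the receiving node can create a missing shard without consulting the meta store.

package main

import "fmt"

// WriteShardRequest is a hypothetical remote-write request. Database and
// RetentionPolicy stand for the fields the fix adds to the request.
type WriteShardRequest struct {
	ShardID         uint64
	Database        string
	RetentionPolicy string
	Points          []string // line-protocol points, simplified
}

// store is a stand-in for the local shard store on the receiving node.
type store struct {
	shards map[uint64][]string
}

func (s *store) createShardIfNotExists(db, rp string, id uint64) {
	if _, ok := s.shards[id]; !ok {
		fmt.Printf("creating shard %d for %s.%s\n", id, db, rp)
		s.shards[id] = []string{}
	}
}

func (s *store) writeToShard(id uint64, points []string) {
	s.shards[id] = append(s.shards[id], points...)
}

// handleWrite is the receiving node's handler for a remote write. Before the
// fix, the receiver had to look the shard up in its meta-client cache; if the
// cache update had not arrived yet, the write was dropped. With db/rp carried
// in the request, the shard can be created directly.
func (s *store) handleWrite(req WriteShardRequest) {
	s.createShardIfNotExists(req.Database, req.RetentionPolicy, req.ShardID)
	s.writeToShard(req.ShardID, req.Points)
}

func main() {
	n2 := &store{shards: map[uint64][]string{}}
	// Shard 4 does not exist locally yet, but the request carries db/rp,
	// so the write succeeds instead of being dropped.
	n2.handleWrite(WriteShardRequest{
		ShardID:         4,
		Database:        "db",
		RetentionPolicy: "default",
		Points:          []string{"cpu value=1"},
	})
	fmt.Println(n2.shards[4]) // [cpu value=1]
}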
jwilder closed this as completed in 43118ce Mar 2, 2016