The backup is not point in time if a tailed oplog's last timestamp is before a mongodump's. #164
Comments
Hi @rob256, I believe this is probably true in two scenarios, but maybe you can help me confirm:
I added support to "safely" stop the oplog tailers in 1.0.0 that I believe avoids what you're describing:
The state of the 'last_ts' of each TailThread.py is passed from parent to thread with OplogState.py so that Tailer.py can wait on it in its .stop() method: https://github.com/Percona-Lab/mongodb_consistent_backup/blob/master/mongodb_consistent_backup/Oplog/OplogState.py (thread-safe due to a multiprocessing.Manager dict()).

If all of this is correct, I would assume what you're describing is only possible before 1.x, or if you have non-replica-set-based config servers where there is no oplog to safely stop using timestamps. Does this sound correct? Could you check if the HEAD version has this issue?
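As a rough illustration of that state sharing (a minimal sketch, not the project's actual code; the `last_ts` key mirrors the description above, everything else is assumed), the tailer process publishes its last-seen oplog timestamp into a multiprocessing.Manager dict that the parent can poll while stopping it:

```python
# Minimal sketch of sharing a tailer's last oplog timestamp between processes.
# Illustrative only; the real tool stores BSON Timestamps, not floats.
import time
from multiprocessing import Manager, Process


def tail_oplog(state):
    """Child process: pretend to tail the oplog and record the last timestamp seen."""
    for _ in range(5):
        state['last_ts'] = time.time()
        time.sleep(1)


if __name__ == '__main__':
    manager = Manager()
    state = manager.dict({'last_ts': None})  # shared, process-safe dict

    tailer = Process(target=tail_oplog, args=(state,))
    tailer.start()

    # Parent (the .stop() equivalent): wait until the tailer has reached a target time.
    target_ts = time.time() + 2
    while state['last_ts'] is None or state['last_ts'] < target_ts:
        time.sleep(0.1)
    tailer.join()
    print("tailer caught up to", state['last_ts'])
```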
I will check which version of the code I was using tomorrow, but I may have been using HEAD as of a couple of days ago. Thanks.
Yes, I have that, but that seems to wait until the time of the oplog for the ReplicaSet that it's tailing, rather than the maximum time across all ReplicaSets. This means, if the ConfigServer's mongodump oplog finished 5 seconds before the others, its tailer would also finish 5 seconds before the rest. Following this, the
Hi @rob256, I follow what you're saying, but I think part of that isn't the case. At the same time, this is very important code I wrote long ago, so I appreciate a review of this logic!

In Main.py the code waits for all backups (mongodumps) to finish before calling the code that waits for the current optime of the Primary, so the first oplog tailer(s) to stop would be after the last mongodump finishes. The .run() on self.backup is blocking until all mongodumps finish, or it will throw an exception that will cause the tool to exit cleanly: https://github.com/Percona-Lab/mongodb_consistent_backup/blob/master/mongodb_consistent_backup/Main.py#L379. I think all of this part is working fine.

I've always wanted to add a bit more safety to Resolver.get_consistent_end_ts(), but I think the serial order of events in Main.py before the tailers are stopped prevents any consistency issues. There is a possibility that the last oplog timestamp of a given set is before the end time of the MongodumpThread, but that would only happen if there were no writes between that ts and the end ts, and the backup would still be technically consistent to whichever time it chose in .get_consistent_end_ts(), regardless of the lack of writes. Could you sanity-check those assumptions for me?
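For reference, the serial order described above boils down to something like the following paraphrased sketch (not the real Main.py code; the names are stand-ins):

```python
# Paraphrased sketch of the serial flow described above (illustrative, not Main.py).
def run_backup(backup, tailers, resolver):
    tailers.start()            # 1. start tailing each replica set's oplog
    backup.run()               # 2. blocking: all mongodumps finish (or an exception is raised)
    tailers.stop()             # 3. only now are the oplog tailers stopped
    return resolver.get_consistent_end_ts()  # 4. choose the consistent end point
```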
Doh, I see what you're saying now - this code takes a while to wrap my head around. Yes, choosing the minimum timestamp is dangerous if no writes have occurred in some time when the mongodumps complete. Good point. I suppose the correct logic would be to leave the oplog tailing code as-is, but change .get_consistent_end_ts() to find the lowest oplog tailer end ts that is greater than or equal to the end time of the last mongodump. If none of the tailers has an update after the end of all mongodumps, then the end time of the last mongodump would be the fallback. Does that logic check out @rob256?
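A rough sketch of that selection rule (illustrative only; the real Resolver works on BSON Timestamps and tailer state objects, plain integers stand in here):

```python
# Illustrative sketch: pick the lowest tailer end-timestamp that is >= the end of the
# last mongodump, falling back to the mongodump end time if no tailer got that far.

def consistent_end_ts(tailer_end_timestamps, last_mongodump_end_ts):
    candidates = [ts for ts in tailer_end_timestamps if ts >= last_mongodump_end_ts]
    if candidates:
        return min(candidates)
    # No tailer saw a write after the dumps finished (e.g. an idle config server),
    # so the end of the last mongodump is the safe fallback.
    return last_mongodump_end_ts


# Example: the config server tailer is stale, the shard tailers are current.
print(consistent_end_ts([1000, 1012, 1013], last_mongodump_end_ts=1011))  # -> 1012
```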
I've created a commit with the logic I mentioned in my last reply here to demonstrate: https://github.com/timvaillancourt/mongodb_consistent_backup/tree/issue_164_max_resolver_ts. It might be more precise to use the time MongodumpThread.py gets an exit from mongodump, but as the oplog dumping is the final step of mongodump, after the dumping of collection data, I consider this safe for now. I've also made it round up the consistent_max_ts second, because I'm rounding down the increment counter to zero in this method. This will catch the last sub-second incremented ops.
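A minimal sketch of the rounding described above (not the commit's exact code): since the increment counter is forced to zero, the seconds value is bumped by one so that sub-second ops in the final second are still covered.

```python
# Illustrative rounding of a BSON Timestamp up to the next whole second.
from bson.timestamp import Timestamp


def round_up_to_next_second(ts):
    return Timestamp(ts.time + 1, 0)


print(round_up_to_next_second(Timestamp(1500000000, 37)))  # -> Timestamp(1500000001, 0)
```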
Thank you @timvaillancourt for taking the time to look into this. I've reviewed your modification and I'll test this out on Monday. I'll let you know how it goes. Thanks.
Hi @timvaillancourt, I've tested your fork and so far it looks good. I'll give it another test early next week and will see if I can tweak my test to hit it a bit harder. Currently my test environment only has 2 shards. Thanks.
Thanks for your testing @rob256! I will merge this to our 1.0.4 branch (to become a release soon), as this is at least safer than the current logic. This will be merged with other improvements here: https://github.com/Percona-Lab/mongodb_consistent_backup/tree/1.0.4. I will delete this test fork when you're done testing.
This fix has moved up to the 1.1.0 release now: https://github.com/Percona-Lab/mongodb_consistent_backup/tree/1.1.0
Hi @rob256, has this fix resolved the problems you noticed 100%? If so, I will close this one.
* Only apply quorum checking to voting members
* Only apply quorum checking to electable members (not arbiter or priority=0)
* 'make docker' must depend on the bin/mongodb-consistent-backup Makefile-step (#167)
* Fix issue #163, move to OperationError() (#168)
* Remove 'secondary_count' logic, only count electable nodes for new quorum logic (#169)
* 1.0.4: Print more mongodump details (#170)
* Print more details about mongodump
* Version must be set first
* Check if version unknown
* 1.0.4: Ensure Resolver consistent timestamp is not before end of mongodump (#164) (#171)
* Don't return a consistent_end_ts that is earlier than the last_ts of mongodump oplogs
* Some code cleanup and fixed rounding-up logic
* 1.0.4: Support handling of TailThread cursor/connection failures (#174). Support better handling/logging of TailThread cursor/connection failures. Fsync the tailed oplog to disk on time or doc-writes thresholds. Pass 'backup_stop' event to threads to signal failure to other child threads. Allow oplog tailing to be disabled. Added script to simulate failure of cursor/query.
* Fix disabling of oplog tailer
* Fix disabling of oplog tailer (#175)
* 1.0.4: Mongodump handle error (#176)
* Handle mongodump's "Failed: ..." messages as a proper error
* 1.0.4: Support Google Cloud Storage upload v1 (#178)
* 1.1.0 VERSION (#180)
* Readme gcs (#181)
* Readme gcs 2 (#182)
* 1.1.0 requirements.txt update (#183)
* Update Fabric and boto versions
* 1.1.0 repllag default 10s (#184)
* Raise default repl lag max to 10s; 5s has shown to be too low on many hosts. 3.4 adds a no-op oplog update every 10s, so this seems to align nicely with it
* 1.1.0 gcs upload threading (#185)
This is only for a sharded cluster, and could occur frequently for small sharded clusters.
So the current process is:
start tailed oplogs
start mongodumps of each replicaset with --oplog
finish mongodumps
stop tailed oplogs
Then the evaluation is made using Resolver.get_consistent_end_ts(), which looks at the tailed oplogs and then appends the oplog entries from the tailed oplogs to the mongodump oplogs. However, in its current state, it could be common that the backups taken are not point in time.
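A small sketch of that resolve step as described above (illustrative only; plain integers stand in for BSON Timestamps, and the function name is an assumption, not the tool's real API):

```python
# Tailed oplog entries are appended to each shard's mongodump oplog, but only up to
# the chosen consistent end timestamp.

def apply_tailed_oplog(mongodump_oplog, tailed_oplog, consistent_end_ts):
    last_dumped_ts = mongodump_oplog[-1]
    for ts in tailed_oplog:
        if last_dumped_ts < ts <= consistent_end_ts:
            mongodump_oplog.append(ts)
    return mongodump_oplog


shard1 = apply_tailed_oplog([1000, 1005], [1006, 1009, 1012], consistent_end_ts=1010)
print(shard1)  # -> [1000, 1005, 1006, 1009]
```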
When I was testing, I found that the optime on the config servers updates a lot less frequently. In fact, it looks like it can be as infrequent as every 10 seconds.
This becomes a problem because...
The last mongodump that finishes would have an oplog time near the current time. But when the mongodumps finish, the oplog tailers are killed off almost immediately, which means a config server oplog tailer may have a `last_ts` of up to 10 seconds ago.

When the evaluation for `get_consistent_end_ts()` is made, the `ResolverThread` then has a `max_end_ts` of the config server's `last_ts`. This means tailed oplog entries are appended to the mongodump oplog only if they're prior to this `max_end_ts`; however, one of the mongodump's oplogs already exceeds the `max_end_ts`. This leaves one mongodump with a higher oplog time than the others.

I was testing this by adding around 1000 documents per second against the mongos for one of the sharded collections, and then comparing the documents.
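A small numeric sketch of that failure mode (made-up timestamps, plain integers in place of BSON Timestamps):

```python
# The config server tailer's last_ts is ~10s stale, so the resolver's max_end_ts lands
# before the last mongodump's own oplog end, and those shards' dumps end up ahead of
# the chosen consistent point.

tailer_last_ts = {
    'configReplSet': 1000,   # idle config server: last write ~10s ago
    'shard1': 1010,
    'shard2': 1010,
}
mongodump_oplog_end_ts = {
    'configReplSet': 1000,
    'shard1': 1009,
    'shard2': 1010,          # the last dump to finish, near "now"
}

max_end_ts = min(tailer_last_ts.values())   # 1000: the old, problematic choice
ahead = [rs for rs, ts in mongodump_oplog_end_ts.items() if ts > max_end_ts]
print(max_end_ts, ahead)  # -> 1000 ['shard1', 'shard2']: dumps exceed the cut-off
```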
I can work around this by adding a sleep after the mongodumps finish, giving the tailed oplog for the config server a chance to write its next timestamp, after which a point-in-time backup is then made.
The graceful fix for this would be: once the mongodumps finish, take the last oplog timestamp from the mongodumps, and then ensure the oplog tailers exceed this. This way, the `get_consistent_end_ts()` would then be correct.

Please feel free to ask if you need any further information.
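A minimal sketch of that proposed fix (illustrative only; the names, the state layout, and the plain numbers are assumptions, not the tool's real objects):

```python
# After the dumps finish, block until every oplog tailer has seen a timestamp >= the
# newest mongodump oplog timestamp before stopping it.
import time


def wait_for_tailers(tailer_states, last_mongodump_ts, poll_secs=0.5, timeout_secs=60):
    deadline = time.time() + timeout_secs
    while time.time() < deadline:
        if all(state['last_ts'] is not None and state['last_ts'] >= last_mongodump_ts
               for state in tailer_states.values()):
            return True
        time.sleep(poll_secs)
    return False  # e.g. an idle replica set never caught up; caller decides the fallback


# Example with made-up state: the config server tailer has not caught up yet.
states = {'shard1': {'last_ts': 1012}, 'configReplSet': {'last_ts': 1000}}
print(wait_for_tailers(states, last_mongodump_ts=1011, timeout_secs=2))  # -> False
```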
Thanks,