The backup is not point in time if a tailed oplog's last timestamp is before a mongodump's. #164
Comments
Hi @rob256, I believe this is probably true in two scenarios, but maybe you can help me confirm:
I added support to "safely" stop the oplog tailers in 1.0.0 that I believe avoids what you're describing:
The state of the 'last_ts' of each TailThread.py is passed from parent to thread with OplogState.py so that Tailer.py can wait on it in its .stop() method: https://github.com/Percona-Lab/mongodb_consistent_backup/blob/master/mongodb_consistent_backup/Oplog/OplogState.py (thread-safe due to a multiprocessing.Manager dict()).

If all of this is correct, I would assume what you're describing is only possible before 1.x, or if you have non-replica-set-based config servers where there is no oplog to safely stop using timestamps. Does this sound correct? Could you check if the HEAD version has this issue?
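As a rough illustration of that state sharing (a minimal sketch, not the project's actual code; the `last_ts` key mirrors the description above, everything else is assumed), the tailer process publishes its last-seen oplog timestamp into a multiprocessing.Manager dict that the parent can poll while stopping it:

```python
# Minimal sketch of sharing a tailer's last oplog timestamp between processes.
# Illustrative only; the real tool stores BSON Timestamps, not floats.
import time
from multiprocessing import Manager, Process


def tail_oplog(state):
    """Child process: pretend to tail the oplog and record the last timestamp seen."""
    for _ in range(5):
        state['last_ts'] = time.time()
        time.sleep(1)


if __name__ == '__main__':
    manager = Manager()
    state = manager.dict({'last_ts': None})  # shared, process-safe dict

    tailer = Process(target=tail_oplog, args=(state,))
    tailer.start()

    # Parent (the .stop() equivalent): wait until the tailer has reached a target time.
    target_ts = time.time() + 2
    while state['last_ts'] is None or state['last_ts'] < target_ts:
        time.sleep(0.1)
    tailer.join()
    print("tailer caught up to", state['last_ts'])
```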
I will check which version of the code I was using tomorrow, but I may have been using HEAD as of a couple of days ago. Thanks.
Yes, I have that, but that seems to wait until the time of the oplog for the ReplicaSet that it's tailing, rather than the maximum time across all ReplicaSets. This means, if the ConfigServer's mongodump oplog finished 5 seconds before the others, its tailer would also finish 5 seconds before the rest. Following this, the
Hi @rob256, I follow what you're saying, but I think part of that isn't the case. At the same time, this is very important code I wrote long ago, so I appreciate a review of this logic!

In Main.py the code waits for all backups (mongodumps) to finish before calling the code that waits for the current optime of the Primary, so the first oplog tailer(s) to stop would be after the last mongodump finishes. The .run() on self.backup is blocking until all mongodumps finish, or it will throw an exception that will cause the tool to exit cleanly: https://github.com/Percona-Lab/mongodb_consistent_backup/blob/master/mongodb_consistent_backup/Main.py#L379. I think all of this part is working fine.

I've always wanted to add a bit more safety to Resolver.get_consistent_end_ts(), but I think the serial order of events in Main.py before the tailers are stopped prevents any consistency issues. There is a possibility that the last oplog timestamp of a given set is before the end time of the MongodumpThread, but that would only happen if there were no writes between that ts and the end ts, and the backup would still be technically consistent to whichever time it chose in .get_consistent_end_ts(), regardless of the lack of writes. Could you sanity-check those assumptions for me?
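For reference, the serial order described above boils down to something like the following paraphrased sketch (not the real Main.py code; the names are stand-ins):

```python
# Paraphrased sketch of the serial flow described above (illustrative, not Main.py).
def run_backup(backup, tailers, resolver):
    tailers.start()            # 1. start tailing each replica set's oplog
    backup.run()               # 2. blocking: all mongodumps finish (or an exception is raised)
    tailers.stop()             # 3. only now are the oplog tailers stopped
    return resolver.get_consistent_end_ts()  # 4. choose the consistent end point
```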
Doh, I see what you're saying now - this code takes a while to wrap my head around. Yes, choosing the minimum timestamp is dangerous if no writes have occurred in some time when the mongodumps complete. Good point. I suppose the correct logic would be to leave the oplog tailing code as-is, but change .get_consistent_end_ts() to find the lowest oplog tailer end ts that is greater than or equal to the end time of the last mongodump. If none of the tailers has an update after the end of all mongodumps, then the end time of the last mongodump would be the fallback. Does that logic check out @rob256?
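A rough sketch of that selection rule (illustrative only; the real Resolver works on BSON Timestamps and tailer state objects, plain integers stand in here):

```python
# Illustrative sketch: pick the lowest tailer end-timestamp that is >= the end of the
# last mongodump, falling back to the mongodump end time if no tailer got that far.

def consistent_end_ts(tailer_end_timestamps, last_mongodump_end_ts):
    candidates = [ts for ts in tailer_end_timestamps if ts >= last_mongodump_end_ts]
    if candidates:
        return min(candidates)
    # No tailer saw a write after the dumps finished (e.g. an idle config server),
    # so the end of the last mongodump is the safe fallback.
    return last_mongodump_end_ts


# Example: the config server tailer is stale, the shard tailers are current.
print(consistent_end_ts([1000, 1012, 1013], last_mongodump_end_ts=1011))  # -> 1012
```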
I've created a commit with the logic I mentioned in my last reply here to demonstrate: https://github.com/timvaillancourt/mongodb_consistent_backup/tree/issue_164_max_resolver_ts. It might be more precise to use the time MongodumpThread.py gets an exit from mongodump, but as the oplog dumping is the final step of mongodump, after the dumping of collection data, I consider this safe for now. I've also made it round up the consistent_max_ts second, because I'm rounding down the increment counter to zero in this method. This will catch the last sub-second incremented ops.
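A minimal sketch of the rounding described above (not the commit's exact code): since the increment counter is forced to zero, the seconds value is bumped by one so that sub-second ops in the final second are still covered.

```python
# Illustrative rounding of a BSON Timestamp up to the next whole second.
from bson.timestamp import Timestamp


def round_up_to_next_second(ts):
    return Timestamp(ts.time + 1, 0)


print(round_up_to_next_second(Timestamp(1500000000, 37)))  # -> Timestamp(1500000001, 0)
```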
Thank you @timvaillancourt for taking the time to look into this. I've reviewed your modification and I'll test this out on Monday. I'll let you know how it goes. Thanks.
Hi @timvaillancourt, I've tested your fork and so far it looks good. I'll give it another test early next week and will see if I can tweak my test to hit it a bit harder. Currently my test environment only has 2 shards. Thanks.
Thanks for your testing @rob256! I will merge this to our 1.0.4 branch (to become a release soon), as this is at least safer than the current logic. This will be merged with other improvements here: https://github.com/Percona-Lab/mongodb_consistent_backup/tree/1.0.4. I will delete this test fork when you're done testing.
This fix has moved up to the 1.1.0 release now: https://github.com/Percona-Lab/mongodb_consistent_backup/tree/1.1.0
Hi @rob256, has this fix resolved the problems you noticed 100%? If so, I will close this one.
* Only apply quorum checking to voting members
* Only apply quorum checking to electable members (not arbiter or priority=0)
* 'make docker' must depend on the bin/mongodb-consistent-backup Makefile-step (#167)
* Fix issue #163, move to OperationError() (#168)
* Remove 'secondary_count' logic, only count electable nodes for new quorum logic (#169)
* 1.0.4: Print more mongodump details (#170)
* Print more details about mongodump
* Version must be set first
* Check if version unknown
* 1.0.4: Ensure Resolver consistent timestamp is not before end of mongodump (#164) (#171)
* Don't return a consistent_end_ts that is earlier than the last_ts of mongodump oplogs
* Some code cleanup and fixed rounding-up logic
* 1.0.4: Support handling of TailThread cursor/connection failures (#174). Support better handling/logging of TailThread cursor/connection failures. Fsync the tailed oplog to disk on time or doc-writes thresholds. Pass 'backup_stop' event to threads to signal failure to other child threads. Allow oplog tailing to be disabled. Added script to simulate failure of cursor/query.
* Fix disabling of oplog tailer
* Fix disabling of oplog tailer (#175)
* 1.0.4: Mongodump handle error (#176)
* Handle mongodump's "Failed: ..." messages as a proper error
* 1.0.4: Support Google Cloud Storage upload v1 (#178)
* 1.1.0 VERSION (#180)
* Readme gcs (#181)
* Readme gcs 2 (#182)
* 1.1.0 requirements.txt update (#183)
* Update Fabric and boto versions
* 1.1.0 repllag default 10s (#184)
* Raise default repl lag max to 10s; 5s has shown to be too low on many hosts. 3.4 adds a no-op oplog update every 10s, so this seems to align nicely with it
* 1.1.0 gcs upload threading (#185)
This is only for a sharded cluster, and could occur frequently for small sharded clusters.
So the current process is:
start tailed oplogs
start mongodumps of each replicaset with --oplog
finish mongodumps
stop tailed oplogs
Then the evaluation is made using Resolver.get_consistent_end_ts(), which looks at the tailed oplogs and then appends the oplog entries from the tailed oplogs to the mongodump oplogs. However, in its current state, it could be common that the backups taken are not point in time.
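A small sketch of that resolve step as described above (illustrative only; plain integers stand in for BSON Timestamps, and the function name is an assumption, not the tool's real API):

```python
# Tailed oplog entries are appended to each shard's mongodump oplog, but only up to
# the chosen consistent end timestamp.

def apply_tailed_oplog(mongodump_oplog, tailed_oplog, consistent_end_ts):
    last_dumped_ts = mongodump_oplog[-1]
    for ts in tailed_oplog:
        if last_dumped_ts < ts <= consistent_end_ts:
            mongodump_oplog.append(ts)
    return mongodump_oplog


shard1 = apply_tailed_oplog([1000, 1005], [1006, 1009, 1012], consistent_end_ts=1010)
print(shard1)  # -> [1000, 1005, 1006, 1009]
```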
When I was testing, I found that the optime on the config servers updates a lot less frequently. In fact, it looks like it can be as infrequent as every 10 seconds.
This becomes a problem because...
The last mongodump that finishes would have an oplog time near the current time. But when the mongodumps finish, the oplog tailers are killed off almost immediately, which means a config server oplog tailer may have a `last_ts` of up to 10 seconds ago.

When the evaluation for `get_consistent_end_ts()` is made, the `ResolverThread` then has a `max_end_ts` of the config server's `last_ts`. This means tailed oplog entries are appended to the mongodump oplog only if they're prior to this `max_end_ts`; however, one of the mongodump's oplogs already exceeds the `max_end_ts`. This leaves one mongodump with a higher oplog time than the others.

I was testing this by adding around 1000 documents per second against the mongos for one of the sharded collections, and then comparing the documents.
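A small numeric sketch of that failure mode (made-up timestamps, plain integers in place of BSON Timestamps):

```python
# The config server tailer's last_ts is ~10s stale, so the resolver's max_end_ts lands
# before the last mongodump's own oplog end, and those shards' dumps end up ahead of
# the chosen consistent point.

tailer_last_ts = {
    'configReplSet': 1000,   # idle config server: last write ~10s ago
    'shard1': 1010,
    'shard2': 1010,
}
mongodump_oplog_end_ts = {
    'configReplSet': 1000,
    'shard1': 1009,
    'shard2': 1010,          # the last dump to finish, near "now"
}

max_end_ts = min(tailer_last_ts.values())   # 1000: the old, problematic choice
ahead = [rs for rs, ts in mongodump_oplog_end_ts.items() if ts > max_end_ts]
print(max_end_ts, ahead)  # -> 1000 ['shard1', 'shard2']: dumps exceed the cut-off
```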
I can work around this by adding a sleep after the mongodumps finish, giving the tailed oplog for the config server a chance to write its next timestamp, after which a point-in-time backup is then made.
The graceful fix for this would be: once the mongodumps finish, take the last oplog timestamp from the mongodumps, and then ensure the oplog tailers exceed this. This way, the `get_consistent_end_ts()` would then be correct.

Please feel free to ask if you need any further information.
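A minimal sketch of that proposed fix (illustrative only; the names, the state layout, and the plain numbers are assumptions, not the tool's real objects):

```python
# After the dumps finish, block until every oplog tailer has seen a timestamp >= the
# newest mongodump oplog timestamp before stopping it.
import time


def wait_for_tailers(tailer_states, last_mongodump_ts, poll_secs=0.5, timeout_secs=60):
    deadline = time.time() + timeout_secs
    while time.time() < deadline:
        if all(state['last_ts'] is not None and state['last_ts'] >= last_mongodump_ts
               for state in tailer_states.values()):
            return True
        time.sleep(poll_secs)
    return False  # e.g. an idle replica set never caught up; caller decides the fallback


# Example with made-up state: the config server tailer has not caught up yet.
states = {'shard1': {'last_ts': 1012}, 'configReplSet': {'last_ts': 1000}}
print(wait_for_tailers(states, last_mongodump_ts=1011, timeout_secs=2))  # -> False
```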
Thanks,