[WIP] #1
- the master node in mongo will grab the primary lock in consul (or be first when the replica set is not yet started)
- the master node is the only node that updates the replica config
- the replica set config is automatically updated from consul's list of mongo nodes
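As a minimal sketch of the primary-lock idea, assuming the `python-consul` client; the key name, session TTL, and consul hostname are illustrative assumptions rather than values taken from this repo:

```python
import socket

import consul  # python-consul client (assumed)

c = consul.Consul(host='consul')

# a session with a TTL, so the lock is released if this node dies
session_id = c.session.create(name='mongodb-primary', ttl=60, behavior='release')

# kv.put with acquire= succeeds for only one holder at a time; the winner
# acts as the master that maintains the replica set config
got_lock = c.kv.put('mongodb/primary-lock', socket.gethostname(), acquire=session_id)
```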
CMD [ \
    "containerpilot", \
    "mongod", \
    "--replSet=joyent" \
We probably should show this CMD (specifically the `--replSet`) being overridden in the docker-compose.yml file so that it's clear to someone reading through this that you can create multiple replSets by having multiple service blocks.
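A hedged sketch of what that docker-compose.yml override could look like; the service names and image name are placeholders, not this repo's actual values:

```yaml
version: '2'
services:
  mongodb-a:
    image: example/mongodb        # placeholder image
    command: containerpilot mongod --replSet=replset-a
  mongodb-b:
    image: example/mongodb
    command: containerpilot mongod --replSet=replset-b
```

Each service block overrides the Dockerfile's default `--replSet=joyent`, giving one replica set per service.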
Done (push coming shortly), also updating to have a minimal README.md on how to run.
- comments for `mongo_update_replset_config`
- update readme with some usage
- expose consul port 8500 in local development
- only one ContainerPilot/Consul "state" for mongodb to prevent ContainerPilot reloads
- use `replSetGetStatus` instead of `is_primary` (see the sketch below)
- use consul key with non-expiring token to determine "initdb"
- add comment to code about scaling above 7 nodes
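A minimal sketch of the `replSetGetStatus` check referred to above, assuming pymongo against a local mongod (the connection details are assumptions):

```python
from pymongo import MongoClient

local_mongo = MongoClient('localhost', 27017)
status = local_mongo.admin.command('replSetGetStatus')

# myState == 1 means this node currently reports itself as PRIMARY
is_primary = status.get('myState') == 1
```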
Two most recent commits to address the following:
# ---------------------------------------------------------

#class ContainerPilot(object):
If we're not using this we should just cut it rather than commenting it out.
Edit: nevermind, I see you've got a TODO about this below.
- add a pre_stop method to containerpilot
Improved
log.debug(e)
try:
    # force the stepdown even if no electable secondary has caught up
    local_mongo.admin.command('replSetStepDown', 60, force=True)
After some more thought, we should drop this `force=True` part. I think it will be safer if the user explicitly kills the container or does a force stepdown themselves.

I.e., if the container gets a `SIGTERM` and can't step down from being primary, it should fail to stop. It fails if there is no secondary, or if the secondaries were too far behind and still didn't catch up in the 8-second catch-up period.

Does that sound right to you?
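A minimal sketch of the non-forced stepdown being proposed, assuming pymongo; exiting on failure is an assumption about how the pre-stop hook could surface the error, not this repo's actual code:

```python
import sys

from pymongo import MongoClient, errors

local_mongo = MongoClient('localhost', 27017)
try:
    # 60-second stepdown window, no force: fails if no electable secondary
    # catches up within the catch-up period
    local_mongo.admin.command('replSetStepDown', 60)
except errors.AutoReconnect:
    # mongod closes client connections on a successful stepdown, so a
    # network error here usually means the stepdown actually happened
    pass
except errors.OperationFailure as exc:
    # refuse to stop cleanly; the operator must kill the container or
    # force a stepdown on purpose
    sys.exit('could not step down from primary: {}'.format(exc))
```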
> I think it will be safer if the user explicitly kills the container

If a container is stopped via `docker stop` it receives `SIGTERM` and then, after the timeout, it'll get `SIGKILL` anyway. Will we be in a sane state if that happens?
I think the current forced stepdown or `SIGKILL` of the primary node will leave us in about the same state (after an attempted stepdown); a new master would be elected (if there are enough nodes left for a vote) and there could likely be a rollback. So it seems better to let the signals tell us to "clean up" for shutdown (`SIGTERM`) and, if we can't step down from primary, then let the user do the kill on purpose. Maybe we add documentation around using `docker kill -s SIGTERM` rather than `docker stop`? Or maybe recommend higher timeouts to `docker stop` and ContainerPilot? Can we adjust the `stopTimeout` in ContainerPilot from the Python so that it can be set via env?
> Or maybe recommend higher timeouts to `docker stop` and ContainerPilot?

Keep in mind that lots of people will be using some kind of higher-level scheduler, so telling them not to use `docker stop` might not be relevant. Setting the timeouts is the right way to go.
> Can we adjust the `stopTimeout` in ContainerPilot from the Python so that it can be set via env?

You can interpolate it in the ContainerPilot config from the env. See https://www.joyent.com/containerpilot/docs/configuration#template-configuration
I've started making these timeouts configurable from the environment, but I am unable to set `stopTimeout` in the containerpilot json since the go module expects an `int` and not a `string`. Is there another way around it? Maybe use the `ContainerPilot` class that we still have and write out the updated config using the environment variables?
// works
"stopTimeout": 9,
// fails
"stopTimeout": "{{.STOP_TIMEOUT}}",
This should work just fine, right? `"stopTimeout": {{.STOP_TIMEOUT}},`
I had tried it and gotten parse errors, but after another rebuild it works fine now. It seems I did not save the Dockerfile with the `ENV` set. 😮 😞
So I have a case that I think we need to handle:
So the question is, what should we do about the mongo config being out of sync with what is in consul? The worst case is if someone is scaling down their cluster from, say, 5 nodes to 3 and one of the containers stopped is the primary, with the other stopped while there is still no elected primary; the end result is a 3-node cluster that has 5 nodes of config. If another node goes down it will fail to have a primary anymore. If a new node is added, or if a node is removed while there is an active primary, then the replica config will become consistent with the config in consul. One solution is to just add the …
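For illustration only, a hedged sketch of one possible reconciliation, assuming pymongo; `consul_hosts` is a hypothetical argument (the current node list already fetched from consul), not a helper from this repo:

```python
from pymongo import MongoClient

def prune_replset_config(local_mongo, consul_hosts):
    """Drop replica set members that consul no longer knows about."""
    config = local_mongo.admin.command('replSetGetConfig')['config']
    config['members'] = [m for m in config['members']
                         if m['host'].split(':')[0] in consul_hosts]
    config['version'] += 1
    # force=True lets the reconfig run even without an elected primary,
    # e.g. after scaling down past quorum
    local_mongo.admin.command('replSetReconfig', config, force=True)
```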
Good question. Can we add a …?

It's also maybe fair to ticket that as a bug and return to it. This is shaping up nicely and looks close (or nearly close) enough to MVP to be ready for merging.
@misterbisson, I implemented a "wait for election" after the primary node steps down (all within the …).

On a related note, I set …
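A minimal sketch of what such a "wait for election" could look like, assuming pymongo; the polling interval and timeout are illustrative assumptions:

```python
import time

def wait_for_election(local_mongo, timeout=60):
    """Poll until some replica set member reports itself as PRIMARY."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = local_mongo.admin.command('replSetGetStatus')
        if any(member.get('stateStr') == 'PRIMARY'
               for member in status['members']):
            return True
        time.sleep(1)
    return False
```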
The conversation in and around TritonDataCenter/containerpilot#200 seems to imply that deregistration prior to invoking …

Additionally, given that …
Your description (…)
This is a good use case to consider and it deserves more thought/discussion.
I'm putting words in your mouth with "naively" there, but is it a fair distillation of the concern? How does the application enforce the refusal to shut down back to the scheduler/daemon/infrastructure? Is there any way to tell …
In stateful applications in particular this means client applications will still be sending writes until their TTL expires. The …
I'm certainly open to changing the behavior if we think we've got the wrong one, but another configuration option at this point has to really sell itself. Should the "how should ContainerPilot behave?" (vs. the "how does ContainerPilot behave?") portion of this discussion get moved over to a ContainerPilot issue?
It seems fair to bring the discussion about consul registration elsewhere, though what it means for this image is that if you …

The way to work around this is to first …

As for what is left here, there are two major parts I can think of: documentation and backups.
I'll see if I can get the docs up this afternoon and then we can stick the backups into a new issue (especially since the recommended backup strategies are MongoDB Inc's "Cloud Manager", "Ops Manager", or file system snapshots).
Pushed my docs changes (might want a technical writer to make it better 😉). Let me know if you want anything else in this PR.

One more thing to note: do not scale down below a majority of your configured voting members or you will lose quorum and the nodes will fail to elect a primary (assumed netsplit). In other words, if you have 7 and want to end up with only 3, you must first scale to 5 and make sure the replica config is updated to just 5 nodes, then you can scale to 3. It is probably best to scale down one at a time, and if the current primary is the node you want to destroy, you should do the stepdown first. So maybe we need safe scaling-down steps:

1. if the node to be removed is the current primary, step it down first and wait for a new primary to be elected
2. stop that one node
3. wait for the replica set config to be updated from consul's node list before removing the next node
This is awesome. Thank you for working on this!

At the MongoDB build step when running …, pip fails to build the cryptography package. Adding `libssl-dev` fixes it.
add libssl-dev to fix pip cryptography build issue
@misterbisson and @tgross, this looks fine to me. Let me know if there is anything else you would like me to add.

@tgross @misterbisson anything else you'd like @yosifkit to update here? 😄 😇
The use of hostname (as reported by socket.hostname): the …

If the Triton account has CNS enabled, there will be DNS records similar to: …

This obviously cannot be inferred from the shortened version within the zone. Is there a suitable way to determine this FQDN from the current Mongo master prior to adding it to the replica set? The alternative is to use the host's IP address, but Mongo specifically discourages this for many reasons; in the event the IP changes, the replica set will not recover optimally. Thoughts?
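For reference, a minimal standard-library sketch of the IP-based approach the follow-up commit takes; it assumes (as is usual inside Docker) that the container's own hostname resolves to its container IP:

```python
import socket

# register the container's IP rather than its short, non-resolvable hostname
my_ip = socket.gethostbyname(socket.gethostname())
member_host = '{}:27017'.format(my_ip)
```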
Use IP addresses instead of hostnames
@yosifkit @misterbisson I'm opening this PR with the work in the `wip` branch so that we have a place for review and comment.