Swarm Mode #28

Closed · wants to merge 2 commits
27 changes: 27 additions & 0 deletions bin/execute-service
@@ -0,0 +1,27 @@
#!/bin/bash
set -e

# Target service to exec into; any remaining args form the command to run.
SERVICE_NAME=$1; shift
DOCKER_CMD=docker

# Find the service's running task, then resolve the node and container it's on.
TASK_ID=$(${DOCKER_CMD} service ps --filter 'desired-state=running' "${SERVICE_NAME}" -q)
NODE_ID=$(${DOCKER_CMD} inspect --format '{{ .NodeID }}' "${TASK_ID}")
CONTAINER_ID=$(${DOCKER_CMD} inspect --format '{{ .Status.ContainerStatus.ContainerID }}' "${TASK_ID}")
TASK_NAME=swarm_exec_${RANDOM}

# Launch a one-shot service pinned to that node; it runs `docker exec`
# against the target container via the bind-mounted Docker socket.
EXEC_SERVICE_ID=$(${DOCKER_CMD} service create \

Owner:

None of this is necessary in recent Docker versions, since you can use Docker Stack, e.g. docker stack deploy -c tmp.yml <stack-name>.
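
For instance, a minimal sketch of that approach against the compose file in this PR (the stack name pgxl is just an illustration):

# Deploy the compose file as a Swarm stack; Swarm schedules the
# services across the cluster, with no manual `service create` needed.
docker stack deploy -c docker.compose.image.swarm.yml pgxl

# Confirm the services have converged.
docker stack services pgxl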

Contributor Author:

OK, great, thank you; I will try it now. I didn't realise there was a better way. I found this code in this thread: https://www.reddit.com/r/docker/comments/a5kbte/run_docker_exec_over_a_docker_swarm/

Owner:

The method in the Reddit link requires that the Docker Engine is accessible remotely via a port. This isn't the default, and indeed is rather dangerous, unless secured very carefully. The default is to bind to a socket; hence, I'm pretty sure this wouldn't work.
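
For illustration (some-node and my-container are placeholders here, and 2375 is merely the conventional unencrypted port):

# By default the CLI talks to the local engine over a Unix socket:
#   unix:///var/run/docker.sock
# The Reddit method needs the remote engine published over TCP instead,
# which is dangerous without TLS:
DOCKER_HOST=tcp://some-node:2375 docker exec my-container uptime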

    --detach \
    --name=${TASK_NAME} \
    --restart-condition=none \
    --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \

Owner:

I don't understand this. Why are you binding the Docker socket? Do you not have Docker installed and accessible on the host? If you do, no socket-binding should be necessary.

Contributor Author:

It creates a temporary container on a foreign host. This is the workaround I have used before to exec into a container on a foreign Swarm node, as I wasn't aware there was a better way.

Owner:

Aha, interesting… I'll try to think about this more carefully at some point. I'll admit I haven't really had much use for execution of temporary commands on remote Swarm nodes, so far; usually, I just control everything from manager nodes, and make simple services or even images for anything else. Given your explanation, it might well be that this method is fine. Certainly, it would relax some of the caveats in my own script—at the expense of loss of immediate status feedback, and indeed of guaranteeing the commands are even executed. I'll admit, when I prepare Postgres-XL on a Swarm, I don't use this method at all; I simply paste in the SQL clustering commands manually, after checking the pg_hba.conf files on the Coordinators and Datanodes. This round of work we've been doing is nice, though, in supplying automated setup examples, so I am pleased you asked. :)

Owner:

Another method, of course, would be to override the bootstrapping entrypoints with your own setup code. If you placed your files to be executed in Docker configs, then they would get auto-distributed to the nodes, even for remote worker nodes. I feel this is a little out-of-scope for an example, though (although it likely wouldn't be too hard).
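
A hedged sketch of that configs approach (my-setup.sh and its target path are assumptions, not files in this repository):

version: "3.7"
services:
  db_coord_1:
    image: pavouk0/postgres-xl:latest
    # Hypothetical wrapper that runs setup code and then the stock entrypoint.
    entrypoint: /run/my-setup.sh
    configs:
      - source: setup_script
        target: /run/my-setup.sh
        mode: 0755
configs:
  setup_script:
    file: ./my-setup.sh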

    --constraint node.id==${NODE_ID} \
    docker:latest docker exec ${CONTAINER_ID} "$@")

# Wait for the one-shot task to reach the shutdown state.
while : ; do
    STOPPED_TASK=$(${DOCKER_CMD} service ps --filter 'desired-state=shutdown' "${EXEC_SERVICE_ID}" -q)
    [[ ${STOPPED_TASK} != "" ]] && break
    sleep 1
done

# Relay the command's output, then clean up the temporary service.
${DOCKER_CMD} service logs --raw "${EXEC_SERVICE_ID}"
${DOCKER_CMD} service rm "${EXEC_SERVICE_ID}" > /dev/null

Owner:

It's not actually necessary to destroy the service, but admittedly it depends on the approach.

34 changes: 34 additions & 0 deletions bin/init-eg-swarm
@@ -0,0 +1,34 @@
#!/bin/bash -u

RED='\033[0;31m'
BLUE='\033[0;34m'
NC='\033[0m'

STACK_NAME=$1; shift

NODES=(db_coord_1 db_coord_2 db_data_1 db_data_2)

# Register every Coordinator and Datanode on each node, then reload the
# connection pool and display the resulting topology.
SQL_STATEMENTS=(
    "CREATE NODE coord_1 WITH (TYPE = 'coordinator', HOST = '${STACK_NAME}_db_coord_1', PORT = 5432);"
    "CREATE NODE coord_2 WITH (TYPE = 'coordinator', HOST = '${STACK_NAME}_db_coord_2', PORT = 5432);"
    "CREATE NODE data_1 WITH (TYPE = 'datanode', HOST = '${STACK_NAME}_db_data_1', PORT = 5432);"
    "CREATE NODE data_2 WITH (TYPE = 'datanode', HOST = '${STACK_NAME}_db_data_2', PORT = 5432);"
    "ALTER NODE coord_1 WITH (TYPE = 'coordinator', HOST = '${STACK_NAME}_db_coord_1', PORT = 5432);"
    "ALTER NODE coord_2 WITH (TYPE = 'coordinator', HOST = '${STACK_NAME}_db_coord_2', PORT = 5432);"
    "ALTER NODE data_1 WITH (TYPE = 'datanode', HOST = '${STACK_NAME}_db_data_1', PORT = 5432);"
    "ALTER NODE data_2 WITH (TYPE = 'datanode', HOST = '${STACK_NAME}_db_data_2', PORT = 5432);"
    "SELECT pgxc_pool_reload();"
    "SELECT * FROM pgxc_node;"
)

for NODE in "${NODES[@]}"
do

    echo -e "${RED} Running SQL for ${STACK_NAME}_${NODE} ${NC}"

    for SQL in "${SQL_STATEMENTS[@]}"
    do
        echo -e "${BLUE} ./execute-service ${STACK_NAME}_${NODE} psql -c \"${SQL}\" ${NC}"
        ./execute-service "${STACK_NAME}_${NODE}" psql -c "${SQL}"
    done

done
108 changes: 108 additions & 0 deletions docker.compose.image.swarm.yml
@@ -0,0 +1,108 @@
version: "3.7"
services:
  db_gtm_1:
    environment:
      - PG_HOST=0.0.0.0
      - PG_NODE=gtm_1
      - PG_PORT=6666
    image: pavouk0/postgres-xl:latest
    command: docker-cmd-gtm
    entrypoint: docker-entrypoint-gtm
    volumes:
      - db_gtm_1:/var/lib/postgresql
    networks:
      - db
    # healthcheck:

Owner:

I very much recommend running with healthchecks; Postgres-XL doesn't always detect unhealthy clusters very well (at least, this used to be the case a year or so ago), and it's possible for a cluster to seem up and healthy, but to fail. The recent healthchecks work I did detects and handles this automatically, restarting nodes within a Postgres-XL cluster until it becomes stable.
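
A sketch of re-enabling the check with explicit timings (the interval values here are illustrative, not the repository's defaults; docker-healthcheck-gtm is the script referenced in the commented-out lines):

    healthcheck:
      # Probe the GTM with the healthcheck script shipped in the image.
      test: ["CMD", "docker-healthcheck-gtm"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s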

    #   test: ["CMD", "docker-healthcheck-gtm"]
    deploy:
      mode: global

Owner:

This doesn't seem right to me. If you have a multi-node Swarm cluster, this will cause multiple deployments of the services; since they require the data directory and can only run one copy at once, it will almost certainly cause data corruption and a very broken cluster.

Contributor Author (@stephenstubbs, Jul 14, 2019):

You are correct; I misunderstood global mode. All I was trying to do was limit the replicas to be safe, but I realise now that --replicas=1 would do what I was expecting global to do. I have only tested on a single-node Swarm so far, but I will test on a multi-node one as soon as it's working properly on one.

Owner:

--replicas=1 would indeed do what you expect; however, it's the default, so in fact you don't need it. You don't actually need the constraints section at all; I just include it because I'm presuming you're running non-trivial (i.e. greater than 1-node) clusters. However, if I were actually to use this, I'd likely change the constraints to constrain to dbxl_coord_1 etc., so the containers would only ever be launched on nodes containing the data volume. Usually, this would pin each container to a specific node, although if you had shared storage, it could also allow safe failover of a specific Coordinator or Datanode while still keeping the single replica. Again, I think this is likely a bit out-of-scope here, especially as it would require a backend shared-storage solution configured separately. Perfectly possible, though.
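
A hedged sketch of that constraint approach (the node label db_coord_1 is an assumption you would set yourself, e.g. via docker node update --add-label db_coord_1=true <node-name>):

    deploy:
      replicas: 1
      placement:
        constraints:
          # Only schedule on the node holding this service's data volume.
          - node.labels.db_coord_1 == true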

  db_coord_1:
    environment:
      - PG_GTM_HOST=db_gtm_1
      - PG_GTM_PORT=6666
      - PG_HOST=0.0.0.0
      - PG_NODE=coord_1
      - PG_PORT=5432
    image: pavouk0/postgres-xl:latest
    command: docker-cmd-coord
    entrypoint: docker-entrypoint-coord
    volumes:
      - db_coord_1:/var/lib/postgresql
    depends_on:
      - db_gtm_1
    networks:
      - db
    # healthcheck:
    #   test: ["CMD", "docker-healthcheck-coord"]
    deploy:
      mode: global
  db_coord_2:
    environment:
      - PG_GTM_HOST=db_gtm_1
      - PG_GTM_PORT=6666
      - PG_HOST=0.0.0.0
      - PG_NODE=coord_2
      - PG_PORT=5432
    image: pavouk0/postgres-xl:latest
    command: docker-cmd-coord
    entrypoint: docker-entrypoint-coord
    volumes:
      - db_coord_2:/var/lib/postgresql
    depends_on:
      - db_gtm_1
    networks:
      - db
    # healthcheck:
    #   test: ["CMD", "docker-healthcheck-coord"]
    deploy:
      mode: global
  db_data_1:
    environment:
      - PG_GTM_HOST=db_gtm_1
      - PG_GTM_PORT=6666
      - PG_HOST=0.0.0.0
      - PG_NODE=data_1
      - PG_PORT=5432
    image: pavouk0/postgres-xl:latest
    command: docker-cmd-data
    entrypoint: docker-entrypoint-data
    depends_on:
      - db_gtm_1
    volumes:
      - db_data_1:/var/lib/postgresql
    networks:
      - db
    # healthcheck:
    #   test: ["CMD", "docker-healthcheck-data"]
    deploy:
      mode: global
  db_data_2:
    environment:
      - PG_GTM_HOST=db_gtm_1
      - PG_GTM_PORT=6666
      - PG_HOST=0.0.0.0
      - PG_NODE=data_2
      - PG_PORT=5432
    image: pavouk0/postgres-xl:latest

Owner:

Remember latest is for testing only; production should pin a specific tag.
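
For example (the tag shown is a placeholder for illustration, not necessarily a published tag of this image):

    # Pin to a specific published tag in production, not :latest.
    image: pavouk0/postgres-xl:XL9_5_R1_6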

Contributor Author:

OK, thanks, will do.

    command: docker-cmd-data
    entrypoint: docker-entrypoint-data
    depends_on:

Owner:

Note that if you used Stack deploy, this would be ignored.

      - db_gtm_1
    volumes:
      - db_data_2:/var/lib/postgresql
    networks:
      - db
    # healthcheck:
    #   test: ["CMD", "docker-healthcheck-data"]
    deploy:
      mode: global
volumes:
  db_gtm_1: {}
  db_coord_1: {}
  db_coord_2: {}
  db_data_1: {}
  db_data_2: {}
networks:
  db:

Owner:

This creates a huge security risk. Since the Postgres-XL cluster doesn't support authentication for inter-node communications, it needs to use trust in pg_hba.conf. But using the same network for both cluster communication and access into the cluster from outside means that any roles and passwords you set will be silently ignored, and direct access will be allowed from anywhere. Not only that, but this creates a cluster-corruption risk, since non-Postgres-XL services could talk directly to the Datanodes, rather than being forced to go through the Coordinators. This in turn means that Postgres-XL would not maintain its metadata, and the cluster would be highly likely to corrupt.

Contributor Author:

OK I didn't realise this. That is a big problem. Thanks for the clarification.

    driver: overlay
    attachable: true

Owner:

No need for attachable, unless you're deploying as a standalone cluster that other Swarm services connect to. If attachable is desired, make sure it's for the Coordinator network only (db_b), not the inter-cluster network (db_a). These networks are named in alphabetical order deliberately: because of how Docker Swarm sets up default routes within the container, any other order would make db_b the default route for the Coordinators, leading to authentication failures, since traffic would not route through the trusted backend network. This can also happen even with Stack deploy, in some cases; the safest approach is to create the networks manually, or to check that db_a has a lower IP subnet than db_b (the default healthchecks rely on this ordering, so that the network values no longer need to be passed in as variables).
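
A sketch of creating the networks manually (the subnets are illustrative; the point is that db_a gets the lower one):

# Trusted inter-cluster network: lower subnet, so it becomes the default route.
docker network create --driver overlay --subnet 10.0.1.0/24 db_a

# Coordinator-facing network for clients; attachable only if needed.
docker network create --driver overlay --subnet 10.0.2.0/24 --attachable db_b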

Contributor Author:

Yes, you're correct. I was just using attachable because I have another container I was running psql from to test, but I definitely wouldn't run it like this in production.

Owner:

attachable is fine, if this is your use-case. But it's not necessary if you have services launched in the same stack as your Postgres-XL cluster. If you'd rather configure separately, though, or are setting up a single Postgres-XL cluster for multiple services, then attachable is fine; so, perhaps, is publishing ports, if you're sure you're behind a properly-restricted firewall (or perhaps even have multiple NICs). In the latter case, however, be very careful to only expose the Coordinator-only (db_b) overlay network; otherwise, you'll grant public access without any authentication, as noted above for pg_hba.conf trust vs md5. (If you're doing this, I highly recommend you test it carefully before going live, in case there's a bug in this program somewhere; no warranty, etc. etc. :) )
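
A hedged sketch of the split-network layout described above, using the reviewer's db_a/db_b naming (the service stanzas are partial, showing only the networks keys):

services:
  db_coord_1:
    networks:
      # Coordinators sit on both: trusted backend plus client-facing.
      - db_a
      - db_b
  db_data_1:
    networks:
      # Datanodes stay on the trusted backend network only.
      - db_a
networks:
  db_a:
    driver: overlay
  db_b:
    driver: overlay
    # Attachable only because external services need to join this one.
    attachable: true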