-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 1366 - Slack integrations [ Docker Events ] #1472
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, this is going to be very cool.
I agree that this polling technique (doing a request every 5s to determine server health) is not ideal. What about a backwards approach? Two ideas:
First, we have code in Telescope to manage shutdowns, both graceful and error-based, see https://github.com/Seneca-CDOT/telescope/blob/736eb97c754047fc8b159d11a0216106fcb86965/src/backend/lib/shutdown.js. We could modify this code to POST
to the autodeployment server with details about why we are shutting down. Then this code could be converted to a pure webhook, along the same lines as we do now from GitHub.
Second, I wonder if we could do this directly from pm2/docker-container. I'm not sure of the details, but I suspect that we can hook into events in our process manager and do something similar when the Telescope app goes down.
@manekenpix will have opinions as well, which I'd like to hear.
tools/autodeployment/package.json
Outdated
@@ -4,6 +4,7 @@ | |||
"description": "A tool for automatic deployment", | |||
"main": "server.js", | |||
"dependencies": { | |||
"axios": "0.21.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
axios isn't very well maintained. Let's use one of node-fetch or got
tools/autodeployment/env.example
Outdated
@@ -16,3 +16,6 @@ UNSPLASH_CLIENT_ID= | |||
|
|||
# Path to certificates | |||
PATH_TO_CERTS= | |||
|
|||
#slack message webhook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space after #
and let' give a more complete comment about what this is. A URL? What format? Give an example, link to docs one might need, etc.
@@ -0,0 +1,54 @@ | |||
require('dotenv').config(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only need to do this once per app, and ideally as early as possible. We already have it in https://github.com/Seneca-CDOT/telescope/blob/master/tools/autodeployment/server.js#L1, so you can assume the env is populated already and just use it.
let prodResponse = {}; | ||
|
||
try { | ||
prodResponse = await axios.get('https://telescope.cdot.systems/posts', { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a health check API endpoint for this, see https://telescope.cdot.systems/health.
tools/autodeployment/server.js
Outdated
@@ -7,6 +7,7 @@ const mergeStream = require('merge-stream'); | |||
const fs = require('fs'); | |||
|
|||
const { buildStart, buildStop, handleStatus } = require('./info'); | |||
const { prodStatus } = require('./slack_app'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use the same code for prod
and dev
, so you can name this status
or telescopeStatus
.
const prodStatus = () => { | ||
let alarm = false; | ||
|
||
setInterval(async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we end up keeping this, I think it's better to use setInterval Async, like we do in the wiki-feed-parser.
const downDate = new Date().toTimeString(); | ||
axios.post(SLACK_MESSAGE_HOOK, { | ||
response_type: 'in_channel', | ||
text: `\n:rotating_light: PRODUCTION IS DOWN. :rotating_light:\n`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, instead of hardcoding Prod
or Dev
, you can use DEPLOY_TYPE's value to know which server is running this code.
I like the first approach suggested by @humphd, but I'd still keep the automated check (not every 5 secs, but maybe every 5 or 10 min) so we get an alert in those weird cases in which, after a successful build, the front-end works but the back-end crashes at launch. |
@humphd What about check if our backend is up in the root(OS)? Something like cron jobs checking the |
Apparently we can get events from docker-compose itself: |
@humphd okay cool. I will try pm2/docker. |
Another thought I had this morning: dev could monitor prod, and vice versa, like siblings checking in on each other periodically. This would be in addition to the internal checks, basically seeing if the backend is still online. |
A bit of research on #!/bin/bash
docker events | while read line; do if [[ ${line} = *"start"* || ${line} = *"stop"* ]];then echo " Event: ${line}" ;fi ; done |
@humphd @manekenpix |
I followed all the advice, and the partial result is good. Thanks. |
1a4696d
to
efc01d9
Compare
|
||
# Slack "Incoming WebHook" to send messages to #telescope channel. | ||
# For more information: https://api.slack.com/messaging/webhooks | ||
SLACK_SEND_MESSAGE= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add an example of what this should look like in addition to the docs.
|
||
const dockerEvents = new Events.EventEmitter(); | ||
|
||
const message = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const message = {
down: {
response_type: 'in_channel',
text: `\n:rotating_light: PRODUCTION IS DOWN. :rotating_light:\n`,
...etc,
},
up: {
...values for up message
}
};
Now you can use message.up
and message.down
below.
}; | ||
|
||
const waitingDocker = () => { | ||
shell.chmod('700', './waitDocker.sh'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems odd/dangerous to do in a script. Can we not do it as part of installing the autodeployment server once?
@@ -0,0 +1,8 @@ | |||
#!/bin/bash | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a comment block to the top of this file explaining what you're doing and why.
@@ -0,0 +1,8 @@ | |||
#!/bin/bash | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing here, let's document what this is.
@humphd |
efc01d9
to
39d5652
Compare
I will break the #1472 into different PRs. Here we have just the docker event listener, and it is not ready yet. It still needs to improve. I'm still working on the following "bugs":
About PM2, I believe that we need to enforce it to keep the auto-deployment server running(restart the server if needed). There will be a second PR landing
Thanks. |
}); | ||
event.stdout.on('data', (data) => { | ||
const containerEvent = JSON.parse(data); | ||
if (containerEvent.Actor.Attributes.name.includes('telescope')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@manekenpix, this line is for filter containers. Because we could have more than one project running Docker in the machine, let's think of how it would work on CDOT machines.
This line works on my local machine because all the docker's containers start with the word "telescope"; but I think this is different on CDOT machines.
return; | ||
|
||
if (eventsDown.includes(Action)) { | ||
containersUp.has(name) ? containersUp.delete(name) : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@humphd
How can I improve my coding style here?
I'm trying to avoid callbacks hell, and then I had to disable linter (/* eslint-disable no-unused-expressions */) to code these lines where I use a ternary operator with null.
@PedroFonsecaDEV from triage today, we are thinking that this is something to return to after the next.js port. Is that your feeling? Or should we work on this in the 1.6 time frame? Happy to go either way. |
@humphd agree with you. We should focus on porting to Next. |
Thank you very much to everyone committing to this PR, your work is very appreciated. We will be coming back to this PR later, so we are closing it for now. |
Issue This PR Addresses
This is the beginning of the implementation related to #1366.
Type of Change
This
Description
I created a new workspace to test it, and it's working. This is just a test, a draft.
This feature will send a message to our slack channel whenever our servers are down and when they back to work again. So we can have a live check on our servers(prod and dev). This feature doesn't rely on user commands; it will always be running.
This one is covering just production.
We will have slack commands, but for now, I believe this feature is more important.
Checklist