Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 1366 - Slack integrations [ Docker Events ] #1472

Closed
wants to merge 11 commits into from

Conversation

PedroFonsecaDEV
Copy link
Contributor

Issue This PR Addresses

This is the beginning of the implementation related to #1366.

Type of Change

This

  • Bugfix: Change which fixes an issue
  • New Feature: Change which adds functionality
  • Documentation Update: Change which improves documentation
  • UI: Change which improves UI

Description

I created a new workspace to test it, and it's working. This is just a test, a draft.

This feature will send a message to our slack channel whenever our servers are down and when they back to work again. So we can have a live check on our servers(prod and dev). This feature doesn't rely on user commands; it will always be running.
This one is covering just production.

We will have slack commands, but for now, I believe this feature is more important.

2020-12-01 02 33 03

Checklist

  • Quality: This PR builds and passes our npm test and works locally
  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Screenshots: This PR includes screenshots or GIFs of the changes made or an explanation of why it does not (if applicable)
  • Documentation: This PR includes updated/added documentation to user exposed functionality or configuration variables are added/changed or an explanation of why it does not(if applicable)

@PedroFonsecaDEV PedroFonsecaDEV self-assigned this Dec 1, 2020
@PedroFonsecaDEV PedroFonsecaDEV added type: enhancement New feature or request type: nice to have Feature that'd be nice to have, but not a priority labels Dec 1, 2020
Copy link
Contributor

@humphd humphd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, this is going to be very cool.

I agree that this polling technique (doing a request every 5s to determine server health) is not ideal. What about a backwards approach? Two ideas:

First, we have code in Telescope to manage shutdowns, both graceful and error-based, see https://github.com/Seneca-CDOT/telescope/blob/736eb97c754047fc8b159d11a0216106fcb86965/src/backend/lib/shutdown.js. We could modify this code to POST to the autodeployment server with details about why we are shutting down. Then this code could be converted to a pure webhook, along the same lines as we do now from GitHub.

Second, I wonder if we could do this directly from pm2/docker-container. I'm not sure of the details, but I suspect that we can hook into events in our process manager and do something similar when the Telescope app goes down.

@manekenpix will have opinions as well, which I'd like to hear.

@@ -4,6 +4,7 @@
"description": "A tool for automatic deployment",
"main": "server.js",
"dependencies": {
"axios": "0.21.0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

axios isn't very well maintained. Let's use one of node-fetch or got

@@ -16,3 +16,6 @@ UNSPLASH_CLIENT_ID=

# Path to certificates
PATH_TO_CERTS=

#slack message webhook
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space after # and let' give a more complete comment about what this is. A URL? What format? Give an example, link to docs one might need, etc.

@@ -0,0 +1,54 @@
require('dotenv').config();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need to do this once per app, and ideally as early as possible. We already have it in https://github.com/Seneca-CDOT/telescope/blob/master/tools/autodeployment/server.js#L1, so you can assume the env is populated already and just use it.

let prodResponse = {};

try {
prodResponse = await axios.get('https://telescope.cdot.systems/posts', {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a health check API endpoint for this, see https://telescope.cdot.systems/health.

@@ -7,6 +7,7 @@ const mergeStream = require('merge-stream');
const fs = require('fs');

const { buildStart, buildStop, handleStatus } = require('./info');
const { prodStatus } = require('./slack_app');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use the same code for prod and dev, so you can name this status or telescopeStatus.

const prodStatus = () => {
let alarm = false;

setInterval(async () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we end up keeping this, I think it's better to use setInterval Async, like we do in the wiki-feed-parser.

const downDate = new Date().toTimeString();
axios.post(SLACK_MESSAGE_HOOK, {
response_type: 'in_channel',
text: `\n:rotating_light: PRODUCTION IS DOWN. :rotating_light:\n`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, instead of hardcoding Prod or Dev, you can use DEPLOY_TYPE's value to know which server is running this code.

@manekenpix
Copy link
Member

I like the first approach suggested by @humphd, but I'd still keep the automated check (not every 5 secs, but maybe every 5 or 10 min) so we get an alert in those weird cases in which, after a successful build, the front-end works but the back-end crashes at launch.

@PedroFonsecaDEV
Copy link
Contributor Author

@humphd What about check if our backend is up in the root(OS)? Something like cron jobs checking the backend container. What do you think?

@humphd
Copy link
Contributor

humphd commented Dec 2, 2020

@PedroFonsecaDEV
Copy link
Contributor Author

@humphd okay cool. I will try pm2/docker.
Thanks.

@humphd
Copy link
Contributor

humphd commented Dec 2, 2020

Another thought I had this morning: dev could monitor prod, and vice versa, like siblings checking in on each other periodically. This would be in addition to the internal checks, basically seeing if the backend is still online.

@manekenpix
Copy link
Member

manekenpix commented Dec 3, 2020

A bit of research on docker events: a bash script that processes docker events and filters the ones including start or stop.
I think it's a good start for testing. This approach can be used to send messages to slack depending on emitted events,

#!/bin/bash
docker events |  while read line; do  if [[ ${line} = *"start"* || ${line} = *"stop"* ]];then echo " Event: ${line}" ;fi ; done

@PedroFonsecaDEV
Copy link
Contributor Author

@humphd @manekenpix
I like both ideas. I found this PM2 tool:
https://pm2.keymetrics.io/docs/usage/docker-pm2-nodejs/

@PedroFonsecaDEV
Copy link
Contributor Author

PedroFonsecaDEV commented Dec 3, 2020

I followed all the advice, and the partial result is good.
Probably I will land a PR later today(Thursday).

Thanks.
@humphd @manekenpix

@PedroFonsecaDEV
Copy link
Contributor Author

@humphd @manekenpix

I've been talking with Josue for a few days; he explained to me a lot about how Telescope builds on CDOT machine and about Telescope's architecture, and he also helped me with a lot of things. We discussed many options for this new feature, and I decided to use an event-driven approach to listen to our docker system.
This PR is totally different from the last one. Now we use the auto-deployment server to listen to docker events using bash scripts. Here you will find a sketch of my proposal. All the error messages are just for demonstration purposes.
Once we agree, I will continue to work on the feature.
Thanks
2020-12-04 02 57 13


# Slack "Incoming WebHook" to send messages to #telescope channel.
# For more information: https://api.slack.com/messaging/webhooks
SLACK_SEND_MESSAGE=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add an example of what this should look like in addition to the docs.


const dockerEvents = new Events.EventEmitter();

const message = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const message = {
  down: {
    response_type: 'in_channel',
      text: `\n:rotating_light: PRODUCTION IS DOWN. :rotating_light:\n`,
      ...etc,
  },
  up: {
    ...values for up message
  }
};

Now you can use message.up and message.down below.

};

const waitingDocker = () => {
shell.chmod('700', './waitDocker.sh');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems odd/dangerous to do in a script. Can we not do it as part of installing the autodeployment server once?

@@ -0,0 +1,8 @@
#!/bin/bash

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a comment block to the top of this file explaining what you're doing and why.

@@ -0,0 +1,8 @@
#!/bin/bash

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here, let's document what this is.

@PedroFonsecaDEV
Copy link
Contributor Author

@humphd
Okay. working on it.

@PedroFonsecaDEV
Copy link
Contributor Author

PedroFonsecaDEV commented Dec 12, 2020

@manekenpix @humphd

I will break the #1472 into different PRs.

Here we have just the docker event listener, and it is not ready yet. It still needs to improve. I'm still working on the following "bugs":

  1. Docker doesn't emit an event when we close it (Docker -> Quit Docker Desktop), and docker events runs only if the Docker app is running. So if docker crashes or closes for some reason, the docker events will quit without any error code; without sent us an Event. (Maybe enforce PM2 to restart Docker if it's crashed or not responding??).

About PM2, I believe that we need to enforce it to keep the auto-deployment server running(restart the server if needed).

There will be a second PR landing in the next hours tomorrow (Saturday) covering the "siblings check". as David suggested:

humphd commented 10 days ago
Another thought I had this morning: dev could monitor prod, and vice versa, like siblings checking in on each other periodically. This would be in addition to the internal checks, basically seeing if the backend is still online.

Thanks.

});
event.stdout.on('data', (data) => {
const containerEvent = JSON.parse(data);
if (containerEvent.Actor.Attributes.name.includes('telescope')) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manekenpix, this line is for filter containers. Because we could have more than one project running Docker in the machine, let's think of how it would work on CDOT machines.
This line works on my local machine because all the docker's containers start with the word "telescope"; but I think this is different on CDOT machines.

return;

if (eventsDown.includes(Action)) {
containersUp.has(name) ? containersUp.delete(name) : null;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@humphd
How can I improve my coding style here?
I'm trying to avoid callbacks hell, and then I had to disable linter (/* eslint-disable no-unused-expressions */) to code these lines where I use a ternary operator with null.

@PedroFonsecaDEV
Copy link
Contributor Author

PedroFonsecaDEV commented Dec 12, 2020

2020-12-11 21 49 54

It's notifying about every single container of the docker-compose.
The message is to demonstrate how it works. We need to find a better message.

@PedroFonsecaDEV PedroFonsecaDEV changed the title Issue 1366 - Investigate Slack integrations for failed or successful builds Issue 1366 - Slack integrations [ Docker Events ] Dec 13, 2020
@PedroFonsecaDEV PedroFonsecaDEV added area: tools area: deployment Production or Staging deployment labels Dec 13, 2020
@humphd humphd requested a review from Metropass January 19, 2021 15:50
@humphd
Copy link
Contributor

humphd commented Jan 26, 2021

@PedroFonsecaDEV from triage today, we are thinking that this is something to return to after the next.js port. Is that your feeling? Or should we work on this in the 1.6 time frame? Happy to go either way.

@PedroFonsecaDEV
Copy link
Contributor Author

@humphd agree with you. We should focus on porting to Next.

@PedroFonsecaDEV PedroFonsecaDEV added the Blocked Can't do this, until something else is done label Jan 26, 2021
@izhuravlev
Copy link
Contributor

Thank you very much to everyone committing to this PR, your work is very appreciated.

We will be coming back to this PR later, so we are closing it for now.

@izhuravlev izhuravlev closed this Feb 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: deployment Production or Staging deployment area: tools Blocked Can't do this, until something else is done type: enhancement New feature or request type: nice to have Feature that'd be nice to have, but not a priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate Slack integrations for failed or successful builds
4 participants