This repository has been archived by the owner on May 6, 2024. It is now read-only.

Install rabbitmq before celery workers in sandboxes #2871

Closed

omarkhan
Contributor

Problem

We have been seeing this error when provisioning sandboxes:

TASK: [edxapp | ensure edxapp_workers has started] ****************************
failed: [149.202.174.205] => {"failed": true}
msg: edxapp_worker:cms_default_1: ERROR (abnormal termination)

FATAL: all hosts have already failed -- aborting

In /edx/var/log/supervisor/cms_default_1-stderr.log on that sandbox we can see:

[2016-03-14 16:15:43,947: ERROR/MainProcess] consumer: Cannot connect to amqp://celery:**@127.0.0.1:5672//: [Errno 111] Connection refused.
Trying again in 4.00 seconds...

[2016-03-14 16:15:47,955: ERROR/MainProcess] consumer: Cannot connect to amqp://celery:**@127.0.0.1:5672//: [Errno 111] Connection refused.
Trying again in 6.00 seconds...

[2016-03-14 16:15:53,965: ERROR/MainProcess] consumer: Cannot connect to amqp://celery:**@127.0.0.1:5672//: [Errno 111] Connection refused.
Trying again in 8.00 seconds...

[2016-03-14 16:16:01,980: ERROR/MainProcess] consumer: Cannot connect to amqp://celery:**@127.0.0.1:5672//: [Errno 111] Connection refused.
Trying again in 10.00 seconds...

It turns out that rabbitmq is not running; in fact, it is not even installed. The edx_sandbox.yml playbook runs the edxapp role before the rabbitmq role, so the edxapp role errors out because rabbitmq is missing, and the run aborts before the rabbitmq role ever gets a chance to install it.
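One way to surface this kind of failure earlier is a task that checks the broker port before the workers are started. This is only a sketch and is not part of this PR: the task name and timeout are made up for illustration, and the host and port are taken from the log lines above.

```yaml
# Hypothetical pre-flight check, not part of the actual edxapp role.
# Fails fast if nothing is listening on the AMQP port from the worker log.
- name: check that the AMQP broker is accepting connections
  wait_for:
    host: 127.0.0.1
    port: 5672
    timeout: 10   # seconds; arbitrary value chosen for this sketch
```

On a sandbox in the broken state described here, a task like this would time out, pointing directly at the missing broker rather than at the workers.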

Solution

Run the rabbitmq role before the edxapp role in edx_sandbox.yml.
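As a rough sketch of the reordering (not the actual diff: the play header and comments are assumptions, and the real edx_sandbox.yml lists many more roles), the relevant part of the playbook would look something like this:

```yaml
# Illustrative excerpt only; the other roles in edx_sandbox.yml are omitted.
- name: Configure instance(s)
  hosts: all
  roles:
    - rabbitmq   # install and start the broker first
    - edxapp     # starts the celery workers, which connect to rabbitmq
```

Roles listed under `roles:` run in the order they appear, so moving rabbitmq above edxapp is enough to make the broker available before the workers start.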

The workers need rabbitmq to be running before they can be started
@openedx-webhooks

Thanks for the pull request, @omarkhan! It looks like you're a member of a company that does contract work for edX. If you're doing this work as part of a paid contract with edX, you should talk to edX about who will review this pull request. If this work is not part of a paid contract with edX, then you should ensure that there is an OSPR issue to track this work in JIRA, so that we don't lose track of your pull request.

To automatically create an OSPR issue for this pull request, just visit this link: https://openedx-webhooks.herokuapp.com/github/process_pr?number=2871&repo=edx%2Fconfiguration

@e0d
Contributor

e0d commented Mar 17, 2016

Looks reasonable; creating a from-scratch sandbox as verification.

@feanil
Contributor

feanil commented Mar 17, 2016

👍 if sandbox comes up without issues @e0d

@feanil
Contributor

feanil commented Mar 17, 2016

This change should also be made in vagrant-fullstack.yml and vagrant-devstack.yml

@nedbat
Contributor

nedbat commented Mar 17, 2016

@omarkhan Thanks for finding this. I went ahead and made this change in #2875, so we can close this PR.

@omarkhan
Contributor Author

Thanks @nedbat, closing

@omarkhan omarkhan closed this Mar 18, 2016
@e0d
Contributor

e0d commented Mar 18, 2016

For posterity, build succeeded and looks good

[e0d-pr2871] e0d@e0d-pr2871 i-2ad99eae:~$ sudo /edx/bin/supervisorctl
analytics_api                    RUNNING   pid 30708, uptime 1 day, 5:13:54
certs                            RUNNING   pid 25968, uptime 1 day, 5:16:17
credentials                      RUNNING   pid 29549, uptime 1 day, 4:54:26
ecommerce                        RUNNING   pid 21642, uptime 1 day, 5:05:03
edxapp:cms                       RUNNING   pid 2523, uptime 1 day, 5:35:03
edxapp:lms                       RUNNING   pid 2537, uptime 1 day, 5:35:02
edxapp_worker:cms_default_1      RUNNING   pid 32216, uptime 1 day, 5:46:44
edxapp_worker:cms_high_1         RUNNING   pid 32224, uptime 1 day, 5:46:42
edxapp_worker:cms_low_1          RUNNING   pid 32243, uptime 1 day, 5:46:40
edxapp_worker:lms_default_1      RUNNING   pid 32269, uptime 1 day, 5:46:36
edxapp_worker:lms_high_1         RUNNING   pid 32286, uptime 1 day, 5:46:33
edxapp_worker:lms_high_mem_1     RUNNING   pid 32301, uptime 1 day, 5:46:29
edxapp_worker:lms_low_1          RUNNING   pid 32316, uptime 1 day, 5:46:27
flower                           RUNNING   pid 31336, uptime 1 day, 4:52:20
forum                            RUNNING   pid 30898, uptime 1 day, 4:52:50
notifier-celery-workers          RUNNING   pid 31027, uptime 1 day, 4:52:36
notifier-scheduler               RUNNING   pid 30981, uptime 1 day, 4:52:48
programs                         RUNNING   pid 9139, uptime 1 day, 4:59:53
xqueue                           RUNNING   pid 20163, uptime 1 day, 5:20:32
xqueue_consumer                  RUNNING   pid 20193, uptime 1 day, 5:20:29
xserver                          RUNNING   pid 31061, uptime 1 day, 4:52:25
