Add support for certbot and Let's Encrypt certificates #125

Open
wants to merge 19 commits into base: master

Conversation

stevepiercy
Contributor

@stevepiercy stevepiercy commented Nov 14, 2019

Closes #61.

See related documentation update in plone/training#470.

@stevepiercy stevepiercy requested a review from smcmahon November 14, 2019 21:50
@smcmahon
Member

smcmahon commented Nov 14, 2019

This is a great idea, but we need to 1) make it optional, 2) document it in the ansible playbook docs (not just training) and 3) make it as close to foolproof as possible.

To make it optional, I suggest revising the "when" on the role operation to check the value of some new default variable like install_certbot. We don't want to install new packages and set up new cron jobs without giving the sysadmin an option.
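
A minimal sketch of that opt-in guard, assuming a new variable named install_certbot (hypothetical; it does not exist in the playbook yet) and that the role is listed in the roles section of playbook.yml:

```yaml
# playbook.yml (excerpt). Sketch only; install_certbot is an assumed variable name.
- hosts: all
  become: true
  roles:
    - role: geerlingguy.certbot
      # Skipped unless the sysadmin sets install_certbot: true, for example
      # in local-configure.yml. default(false) keeps existing setups unchanged.
      when: install_certbot | default(false) | bool
```

With a guard like this, no packages or cron jobs are added unless install_certbot is explicitly turned on.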

Regarding making it foolproof: We need to have the certificate renewal also restart nginx if needed. Otherwise, this isn't really complete. I have some code that does this that I'll take a look at for possible inclusion. It may not be compatible with Geerling's approach.

It would also be great if we could figure out how to activate it for a host with just an entry in webserver_virtualhosts -- rather than having separate entries in webserver_virtualhosts and certbot_certs. Again, I don't know if this is compatible with Geerling's role.

@smcmahon smcmahon closed this Nov 14, 2019
@stevepiercy
Contributor Author

I can make the requested revisions. Why close the PR?

@smcmahon smcmahon reopened this Nov 15, 2019
@smcmahon
Member

Closing was accidental. I meant to just comment.

@smcmahon
Member

I wonder if it wouldn't be better to separate the certbot operations into a separate playbook. This could be modeled on the firewall.yml playbook and could use the local-configure.yml file in the same way to pick up needed variables. This would loosen the coupling and help make it clear to the sysadmin that there are a variety of considerations to be taken into account in employing certbot.
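
A rough sketch of what such a standalone playbook could look like. The file name certbot.yml is hypothetical, and loading local-configure.yml through vars_files is an assumption about how firewall.yml picks up its variables:

```yaml
# certbot.yml. Hypothetical standalone playbook, run separately from playbook.yml.
- hosts: all
  become: true
  vars_files:
    # Assumption: the same site-specific variable file that playbook.yml reads.
    - local-configure.yml
  roles:
    - geerlingguy.certbot
```

Variables loaded through vars_files take precedence over role defaults, so settings in local-configure.yml would override the defaults shipped with the role.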

@smcmahon
Member

I see no need for a separate 'lets_encrypt_certificate' variable. (Perhaps I'm missing something.)

Wouldn't the existing 'certificate_file' and 'key_file' options for a webserver_virtualhosts item do just as well? That's what I've been using already with my own certbot setup.

If that would work, that's one less option to be separately documented and maintained. And, no code changes needed in the nginx mode.

@smcmahon
Member

If the last two comments are adopted, then I think the right way to document the letsencrypt support would be in an added doc in docs. An example in the training docs is also a great idea, of course.

Thanks!

TASK [Fail if Ansible is old] *****************************************************************************************************************************************************************************************************************************************
fatal: [example.com]: FAILED! => {"msg": "The conditional check 'ansible_version is version('2.5.0', 'lt')' failed. The error was: Version comparison: '<' not supported between instances of 'str' and 'int'\n\nThe error appears to be in '/project-path/plone/ansible-playbook/playbook.yml': line 11, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n    - name: Fail if Ansible is old\n      ^ here\n"}
@stevepiercy
Contributor Author

I wonder if it wouldn't be better to separate the certbot operations into a separate playbook.

That makes sense, especially considering the step to stop and start the web server so that certbot can run its own server to generate the first-time cert. There are a few more things going on, too.

I see no need for a separate 'lets_encrypt_certificate' variable. (Perhaps I'm missing something.)

I added it so that copying certificates would continue to be supported. When certbot installed the LE cert, it warned me not to move the files, and I don't know whether copying them would be harmful. I used the variable lets_encrypt_certificate to avoid copying them.

From the README:

WARNING: DO NOT MOVE THESE FILES!
         Certbot expects these files to remain in this location in order
         to function properly!

We recommend not moving these files. For more information, see the Certbot
User Guide at https://certbot.eff.org/docs/using.html#where-are-my-certificates.

Anyway, I tried your suggestion, but it failed. These are symlinks to the actual files. example.com is a mask for the actual host.

failed: [example.com] (item={'hostname': 'example.com', 'default_server': True, 'zodb_path': '/Plone', 'address': '157.245.228.22', 'port': 443, 'protocol': 'https', 'certificate_file': '/etc/letsencrypt/live/example.com/fullchain.pem', 'key_file': '/etc/letsencrypt/live/example.com/privkey.pem'}) => {"ansible_loop_var": "item", "changed": false, "item": {"address": "157.245.228.22", "certificate_file": "/etc/letsencrypt/live/example.com/fullchain.pem", "default_server": true, "hostname": "example.com", "key_file": "/etc/letsencrypt/live/example.com/privkey.pem", "port": 443, "protocol": "https", "zodb_path": "/Plone"}, "msg": "Could not find or access '/etc/letsencrypt/live/example.com/fullchain.pem' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}

How do you manage the certbot certs?

@stevepiercy
Contributor Author

Regarding making it foolproof: We need to have the certificate renewal also restart nginx if needed.

Certbot can handle that, even with the standalone option:

certbot renew --pre-hook "service nginx stop" --post-hook "service nginx start"

... Hooks will only be run if a certificate is due for renewal, so you can run the above command frequently without unnecessarily stopping your webserver.

So this example value can go in the local-configure.yml:

certbot_auto_renew_options: '--quiet --no-self-upgrade
--pre-hook "service nginx stop" --post-hook "service nginx start"'
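
For context, a fuller set of geerlingguy.certbot variables might look like the sketch below. The variable names are taken from that role's README as best understood here, so check them against the version you install; the email address and domain are placeholders:

```yaml
# Role variables (excerpt). Values are placeholders, not a tested configuration.
certbot_auto_renew: true
certbot_auto_renew_user: root
certbot_auto_renew_options: >-
  --quiet --no-self-upgrade
  --pre-hook "service nginx stop"
  --post-hook "service nginx start"

# Create the first certificate with certbot's standalone server,
# stopping nginx while it runs so that port 80 is free.
certbot_create_if_missing: true
certbot_create_method: standalone
certbot_create_standalone_stop_services:
  - nginx
certbot_admin_email: admin@example.com
certbot_certs:
  - domains:
      - example.com
```

The folded scalar (>-) joins the three option lines into one string, which avoids nesting double quotes inside single quotes.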

I think I got it, but there might be a chicken-and-egg problem with restarting nginx. I have not verified yet on a clean machine, but here are my assumptions:

  1. I need to run playbook.yml to install and configure nginx for certbot.
  2. However, nginx will not restart with the webserver_virtualhosts required to use geerlingguy.certbot because I have not yet run certbot.
  3. I have to run playbook.yml, then geerlingguy.certbot, then playbook.yml once more to complete everything else.

Here's the process outline, after a VM is set up and has a non-root user. Would you please review and let me know whether I should change it? I'm a hack at this Ansible stuff.

  • Configure local-configure.yml with webserver_virtualhosts either with certbot as documented (but in a separate file, PR coming) or without certbot.
  • Optionally install geerlingguy.certbot and configure.
    • cd ansible-playbook
    • git clone https://github.com/geerlingguy/ansible-role-certbot.git geerlingguy.certbot
    • geerlingguy.certbot.yml is already configured for use, but may be edited.
  • Run playbook.yml, then geerlingguy.certbot, then playbook.yml.

Commits and PRs coming shortly.

@stevepiercy
Contributor Author

Updated plone/training#470

I added docs in docs. I can also move the additions for LE and certbot from webserver.rst into a separate file.

@smcmahon ready for review. I will test this out later tonight on a clean Digital Ocean VM.

@stevepiercy
Contributor Author

I tried this out on a clean DO VM, but I had to manually stop nginx, run the command to create the cert, and restart nginx. I don't know why the role geerlingguy.certbot did not do this, but I suspect it might be due to how variables are parsed by Ansible. Is there some way to debug or get more information about variables that are actually used?

@stevepiercy
Contributor Author

Why, yes, there is a debug method for Ansible.

I realized that defaults in the role were not getting overridden by those in my local-configure.yml, so I moved them into the playbook geerlingguy.certbot.yml instead, and that yielded success.
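
For anyone hitting the same problem, a small sketch of checking which values Ansible actually resolves, using the built-in debug module. The vars_files entry is an assumption about how the variables are loaded; the variable names are the ones discussed in this thread:

```yaml
# debug-vars.yml. Hypothetical helper playbook for inspecting resolved variables.
- hosts: all
  gather_facts: false
  vars_files:
    - local-configure.yml
  tasks:
    - name: Show the renewal options Ansible resolved
      debug:
        var: certbot_auto_renew_options

    - name: Show the certificate list Ansible resolved
      debug:
        var: certbot_certs
```

If a variable is not visible to the play, debug reports it as not defined, which is a quick way to spot that a settings file was never loaded.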

I have to do more revisions to this PR, so please hold off merging until I can finish testing.

@stevepiercy
Contributor Author

I've hit a roadblock, and I don't know how to fix it. Varnish returns an error message:

Error 503 Backend fetch failed
Backend fetch failed

Guru Meditation:
XID: 27

Varnish cache server

I'm using Python 3 in my playbook.yml. It completes after 3 runs. Along the way:

RUN 1

TASK [plone.plone_server : Supervisor task list is updated and we have a memmon] **************************************************************************************************************************************************************************************
fatal: [plone-demo.stevepiercy.com]: FAILED! => {"changed": true, "cmd": "supervisorctl stop zeoserver_memmon; supervisorctl remove zeoserver_memmon", "delta": "0:00:00.768966", "end": "2019-11-20 00:39:08.201577", "msg": "non-zero return code", "rc": 1, "start": "2019-11-20 00:39:07.432611", "stderr": "", "stderr_lines": [], "stdout": "zeoserver_memmon: ERROR (no such process)\nERROR: no such process/group: zeoserver_memmon", "stdout_lines": ["zeoserver_memmon: ERROR (no such process)", "ERROR: no such process/group: zeoserver_memmon"]}

RUN 2

The previous issue seems to be resolved, but the next one crops up.

TASK [plone.plone_server : Supervisor task list is updated and we have a memmon] **************************************************************************************************************************************************************************************
skipping: [plone-demo.stevepiercy.com]

...

TASK [plone.plone_server : Create initial Plone site] *****************************************************************************************************************************************************************************************************************
[WARNING]: Module remote_tmp /home/plone_daemon/.ansible/tmp did not exist and was created with a mode of 0700, this may cause issues when running as another user. To avoid this, create the remote_tmp dir with the correct permissions manually

fatal: [plone-demo.stevepiercy.com]: FAILED! => {
  "changed": true,
  "cmd": [
    "bin/client_reserved",
    "run",
    "scripts/addPloneSite.py"
  ],
  "delta": "0:00:02.518976",
  "end": "2019-11-20 01:01:17.811993",
  "msg": "non-zero return code",
  "rc": 1,
  "start": "2019-11-20 01:01:15.293017",
  "stderr": "Traceback (most recent call last):\n  File \"bin/client_reserved\", line 266, in <module>\n    + sys.argv[1:]))\n  File \"/usr/local/plone-5.2/buildout-cache/eggs/plone.recipe.zope2instance-6.3.0-py3.6.egg/plone/recipe/zope2instance/ctl.py\", line 993, in main\n    func = ep.load()\n  File \"/usr/local/plone-5.2/zeoserver/lib/python3.6/site-packages/pkg_resources/__init__.py\", line 2434, in load\n    return self.resolve()\n  File \"/usr/local/plone-5.2/zeoserver/lib/python3.6/site-packages/pkg_resources/__init__.py\", line 2440, in resolve\n    module = __import__(self.module_name, fromlist=['__name__'], level=0)\n  File \"/usr/local/plone-5.2/buildout-cache/eggs/five.z2monitor-0.2-py3.6.egg/five/z2monitor/__init__.py\", line 19, in <module>\n    import zc.monitor\n  File \"/usr/local/plone-5.2/buildout-cache/eggs/zc.monitor-0.3.1-py3.6.egg/zc/monitor/__init__.py\", line 59\n    except Exception, v:\n                    ^\nSyntaxError: invalid syntax",
  "stderr_lines": [
    "Traceback (most recent call last):",
    "  File \"bin/client_reserved\", line 266, in <module>",
    "    + sys.argv[1:]))",
    "  File \"/usr/local/plone-5.2/buildout-cache/eggs/plone.recipe.zope2instance-6.3.0-py3.6.egg/plone/recipe/zope2instance/ctl.py\", line 993, in main",
    "    func = ep.load()",
    "  File \"/usr/local/plone-5.2/zeoserver/lib/python3.6/site-packages/pkg_resources/__init__.py\", line 2434, in load",
    "    return self.resolve()",
    "  File \"/usr/local/plone-5.2/zeoserver/lib/python3.6/site-packages/pkg_resources/__init__.py\", line 2440, in resolve",
    "    module = __import__(self.module_name, fromlist=['__name__'], level=0)",
    "  File \"/usr/local/plone-5.2/buildout-cache/eggs/five.z2monitor-0.2-py3.6.egg/five/z2monitor/__init__.py\", line 19, in <module>",
    "    import zc.monitor",
    "  File \"/usr/local/plone-5.2/buildout-cache/eggs/zc.monitor-0.3.1-py3.6.egg/zc/monitor/__init__.py\", line 59",
    "    except Exception, v:",
    "                    ^",
    "SyntaxError: invalid syntax"
  ],
  "stdout": "",
  "stdout_lines": []
}

This appears to be a Python 3 incompatibility.

https://github.com/zopefoundation/zc.monitor/blob/master/src/zc/monitor/__init__.py#L59

Should be:

            except Exception as v:

I tried editing my server's copy of that file, and running the playbook one more time, but that had no effect on Varnish. I have been able to reliably reproduce this issue on clean DO VMs.

Can anyone point me in the right direction to troubleshoot this further?

@stevepiercy
Contributor Author

As a sanity check, I dropped back to Python 2 for the install, and there was no Varnish error.

I submitted a PR for the zc.monitor issue. Hopefully that resolves the issue with the error reported by Varnish.

@jensens jensens requested a review from fulv November 22, 2019 11:49
@stevepiercy
Contributor Author

2 months after setting up a new Plone instance with this configuration, auto renewal fails.

Problem binding to port 80: Could not bind to IPv4 or IPv6.. Skipping.

I don't want to go in manually every 3 months to stop nginx, run certbot, and restart nginx. How do other folks handle letsencrypt automatic renewal?

@fulv
Member

fulv commented Jan 15, 2020

Personally, I have this in root's crontab on all my hosts:

@monthly /usr/bin/certbot renew --post-hook "service nginx restart"
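
The same crontab entry could also be managed from a playbook with Ansible's cron module. A sketch, assuming certbot is installed at /usr/bin/certbot on the target:

```yaml
# Sketch only: creates the @monthly entry in root's crontab.
- hosts: all
  become: true
  tasks:
    - name: Renew Let's Encrypt certificates monthly
      cron:
        name: certbot renew
        special_time: monthly
        user: root
        job: '/usr/bin/certbot renew --post-hook "service nginx restart"'
```

The name field lets Ansible find and update the entry on later runs instead of adding a duplicate.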

@stevepiercy
Contributor Author

@fulv I tried running the command you have in cron, but it returns the same error message. Do you have the standalone version of certbot?

I scoured letsencrypt's community for answers, but all I found was to manually stop the webserver so that certbot could then bind to port 80 and renew the certificate.

@polyester
Member

polyester commented Jan 16, 2020

@stevepiercy that's also what we do. We have a playbook that

  • stops what's running on port 80 (nginx in our case)
  • runs renew
  • restarts nginx

a bit clunky and not 100% uptime, but kinda works.
Anything else requires that you have scripted access to your DNS provider, which we don't. (If you do, you can script certbot's DNS authentication method instead.)

@stevepiercy
Contributor Author

@polyester I'm in the same boat. No DNS hooks for LE. I'm using nginx, per the defaults of this playbook. Can you share a sanitized version of your playbook? It sounds like you still need to manually run it once per quarter, though, but at least it would save a few manual steps. I can deal with that.

@polyester
Member

@stevepiercy it really is the simplest playbook, basically

- name: stop nginx service 
  service: name=nginx state=stopped 
- name: renew cert
  command: certbot renew
- name: start nginx service 
  service: name=nginx state=started 

which isn't very refined. Our ansible master is cronnable, but if you have to run it manually of course that is prone to forgetting. Maybe for the playbook having the cronjob on the target do the "stop nginx, renew certificate, start nginx" would be sufficient? (although of course Ansible complains louder and better if for whatever reason nginx doesn't come up again...)
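
One way to address the concern about nginx not coming back up is to wrap the steps in a block with an always section. This is a sketch of a variation on the tasks above, not the playbook actually in use:

```yaml
# renew-certs.yml. Hypothetical playbook name; assumes nginx runs as a system service.
- hosts: all
  become: true
  tasks:
    - name: Renew certificates with nginx stopped
      block:
        - name: Stop nginx so certbot's standalone server can bind to port 80
          service:
            name: nginx
            state: stopped

        - name: Renew any certificates that are due
          command: certbot renew
      always:
        # Runs whether or not the renewal succeeded.
        - name: Start nginx again
          service:
            name: nginx
            state: started
```

The play still reports a failure if certbot renew fails, but nginx is started again either way.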

@fulv
Member

fulv commented Jan 16, 2020 via email

@tkimnguyen
Member

I have https://mailinabox.email running and it renews the certs without my quasi-human intervention, if you're looking for possible examples

@stevepiercy
Contributor Author

@tkimnguyen yes, please! I still haven't figured this one out.

@stevepiercy
Contributor Author

@tkimnguyen reping. I'd like to see your example.

@tkimnguyen
Member

@stevepiercy this is what the certbot docs say about not stopping the webserver during the certificate issuance process: https://certbot.eff.org/docs/using.html#webroot

@stevepiercy
Contributor Author

@tkimnguyen how do you use the webroot option with Plone? I can't figure out the value for --webroot-path for a Plone site. /var/www/html is the default path, but content is not served from there.

@smcmahon
Member

The current version of the certbot-nginx plugin is supposedly capable of issuing and renewing with no downtime. There's a discussion of how this is done in the thread at:

https://certbot.eff.org/faq#can-i-issue-a-certificate-without-bringing-down-my-web-server

with some supplementary information from nginx at:

https://www.nginx.com/faq/how-does-zero-downtime-configuration-testingreload-in-nginx-plus-work/

If this is acceptable, that plugin makes things dead simple. I've tried it out in a branch:

https://github.com/plone/ansible-playbook/tree/simplified-certbot

See https://github.com/plone/ansible-playbook/blob/simplified-certbot/docs/certbot.rst for quick documentation.

@stevepiercy
Contributor Author

@smcmahon I checked out that branch, and added an entry to my local-configure.yml, then ran ansible-playbook -K certbot.yml. It ran along just fine, but ultimately failed at the step "Test renewal with a dry run." with the following error message:

certbot.errors.StandaloneBindError: Problem binding to port 80: Could not bind to IPv4 or IPv6.

I then commented out certificate from local-configure.yml so that the roles/nginx/templates/host.j2 would use the certbot_hosts value. Still got the same error message.

I found that when I ssh in, and issue the command certbot renew --dry-run --nginx, then it works just fine. Without the --nginx flag, the command fails as it defaults to the standalone server.
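
Expressed as an Ansible task, the working command might look like the sketch below. The task name comes from the failing step above; how the branch actually defines that task is an assumption here, not a quote from it:

```yaml
# Sketch of a dry-run task that selects the nginx plugin explicitly.
- name: Test renewal with a dry run.
  command: certbot renew --dry-run --nginx
  # A dry run does not change anything on the host.
  changed_when: false
```

Passing --nginx keeps certbot from falling back to the standalone server, which cannot bind to port 80 while nginx is running.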

I pushed a commit with some suggested changes.

Thanks for doing the legwork on this!

@stevepiercy
Contributor Author

stevepiercy commented Mar 18, 2020

I also added a cron job to automatically attempt to renew the certificates in f0cc209

@smcmahon
Member

Smoke test:

I tried adding, renewing and revoking certificates on a host using the certbot-nginx plugin while simultaneously hitting a static site with 100,000 sequential ab requests. I saw no failed requests and no latency greater than 10 ms.

@smcmahon
Member

No cronjob is needed; the certbot-nginx package creates its own with a randomized run time.

I've created a pull request for the "simplified certbot" branch with stevepiercy's other changes.

@stevepiercy
Contributor Author

No cronjob is needed; the certbot-nginx package creates its own with a randomized run time.

@smcmahon where is that cronjob? I didn't see it on my server for any user in /var/spool/cron/crontabs/ after running the playbook.

@smcmahon
Member

/etc/cron.d/certbot
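
A quick way to confirm that file exists on a target, sketched with Ansible's stat and debug modules (the registered variable name is arbitrary):

```yaml
# check-certbot-cron.yml. Hypothetical helper playbook.
- hosts: all
  become: true
  tasks:
    - name: Look for the cron file installed by the certbot package
      stat:
        path: /etc/cron.d/certbot
      register: certbot_cron_file

    - name: Report whether it exists
      debug:
        msg: "/etc/cron.d/certbot present: {{ certbot_cron_file.stat.exists }}"
```

System-wide files in /etc/cron.d do not appear under /var/spool/cron/crontabs/, which holds per-user crontabs only.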

@jensens
Member

jensens commented Mar 13, 2023

Merge or drop?

@stevepiercy
Contributor Author

@jensens I don't know. I assume it is still useful to someone, but I lack the bandwidth to follow through. Anyone else is welcome to take it over.

Successfully merging this pull request may close these issues.

Letsencrypt support
6 participants