WIP
praiskup committed Dec 5, 2023
1 parent 94f00da commit c722a32
Showing 2 changed files with 81 additions and 141 deletions.
218 changes: 79 additions & 139 deletions doc/how_to_upgrade_persistent_instances.rst
Expand Up @@ -11,152 +11,85 @@ How to upgrade persistent instances (Amazon AWS)
This article describes how to upgrade persistent instances (e.g. copr-fe-dev) to
a new Fedora version.

TODO: schedule outage.

Requirements
============

* access to `Amazon AWS`_
* ssh access to batcave01
* permissions to update aws.fedoraproject.org DNS records

* access to the team's `Amazon AWS account`_, and having that account properly
configured according to the `README.md <helper playbook repository_>`_
* permissions to run playbooks on `batcave01 <playbook SOP_>`_


Pre-upgrade
===========

The goal is to do as much work pre-upgrade as possible while focusing
only on important things and not creating a work overload with tasks,
that can be done post-upgrade.
The goal is to do as much work as possible before the upgrade, keeping the
**outage window** as short as possible, while still doing only the important
things (and not taking on work that can be done post-upgrade).

Don't do the pre-upgrade too long before the actual upgrade. Ideally a couple of
hours or a day before.


Launch a new instance
---------------------

First, log in to `Amazon AWS`_, otherwise the following step will not
work. Once you are logged in, feel free to close the page.


1. Choose AMI
.............

Navigate to the `Cloud Base Images`_ download page and scroll down to
the section with cloud base images for Amazon public cloud. Use
``Click to launch`` button to launch an instance from the x86_64
AMI. Select the US East (N. Virginia) region.

You will get redirected to the Amazon AWS page.


2. Name and tags
................

- Set ``Name`` and add ``-new`` suffix (e.g. ``copr-distgit-dev-new``
or ``copr-distgit-prod-new``)
- Set ``CoprInstance`` to ``devel`` or ``production``
- Set ``CoprPurpose`` to ``infrastructure``
- Set ``FedoraGroup`` to ``copr``


3. Application and OS Images (Amazon Machine Image)
...................................................

Skip this section, we already chose the correct AMI from the Fedora
website.


4. Instance type
................

Currently, we use the following instance types:

+----------------+-------------+-------------+
|                | Dev         | Production  |
+================+=============+=============+
| **frontend**   | t3a.medium  | t3a.xlarge  |
+----------------+-------------+-------------+
| **backend**    | t3a.medium  | m5a.4xlarge |
+----------------+-------------+-------------+
| **keygen**     | t3a.small   | t3a.xlarge  |
+----------------+-------------+-------------+
| **distgit**    | t3a.medium  | t3a.medium  |
+----------------+-------------+-------------+
| **pulp**       | t3a.medium  | TODO        |
+----------------+-------------+-------------+

When more power is needed, please use the `ec2instances.info`_ comparator to get
the cheapest available instance type according to our needs.


5. Key pair (login)
...................

- Make sure to use existing key pair named ``Ansible Key``. This allows us to
run the playbooks on ``batcave01`` box against the newly spawned VM.


6. Network settings
...................

- Click the ``Edit`` button in the box heading to show more options
- Select VPC ``vpc-0af***********972``
- Select ``Subnet`` to be ``us-east-1c``
- Switch ``Auto-assign IPv6 IP`` to ``Enable``
- Switch to ``Select existing security group`` and pick one of

- ``copr-frontend-sg``
- ``copr-backend-sg``
- ``copr-distgit-sg``
- ``copr-keygen-sg``
- ``copr-pulp-sg``


7. Configure storage
....................

- Click the ``Advanced`` button in the box heading to show more options
- Update the ``Size (GiB)`` of the root partition

+----------------+-------------+-------------+
|                | Dev         | Production  |
+================+=============+=============+
| **frontend**   | 50G         | 50G         |
+----------------+-------------+-------------+
| **backend**    | 20G         | 100G        |
+----------------+-------------+-------------+
| **keygen**     | 10G         | 20G         |
+----------------+-------------+-------------+
| **distgit**    | 20G         | 80G         |
+----------------+-------------+-------------+
| **pulp**       | 20G         | TODO        |
+----------------+-------------+-------------+

- Turn on the ``Encrypted`` option
- Select ``KMS key`` to whatever is ``(default)``


8. Advanced details
...................

- ``Termination protection`` - ``Enable``

Preparation
-----------

9. Launch instance
..................
Make sure you have the `helper playbook repository`_ cloned locally, and change
into the clone directory.

At this point, please review ``dev.yml``, ``prod.yml`` and ``all.yml``
configuration in the ``./group_vars`` directory. Namely review all the
``old_instance_id``, ``old_network_id`` and data volume IDs, **these REALLY NEED
to match EC2 reality!**

You are going to run these playbooks on your machine::

    play-vm-migration-01-new-box.yml
    play-vm-migration-02-migrate-backend-box.yml
    play-vm-migration-02-migrate-non-backend-box.yml
    play-vm-migration-03-rename-instances.yml

While doing so, you will have to specify two Ansible variables explicitly:
``copr_instance`` (either ``dev`` or ``prod``) and ``server_id`` (one of
``frontend``, ``backend``, ``distgit`` or ``keygen``). An example command
looks like::

    $ opts=( -e copr_instance=dev -e server_id=keygen )
    $ ansible-playbook play-vm-migration-01-new-box.yml "${opts[@]}"
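
As a rough illustration of how the four playbooks fit together for one server, the following sketch prints (but does not run) the commands in order. The helper names ``migration_playbook`` and ``print_migration_plan`` are hypothetical, and the mapping of ``backend`` to the backend-specific playbook is an assumption inferred from the playbook file names, not something this document states.

```shell
#!/usr/bin/env bash
# Sketch only. Assumption: "backend" uses the backend-specific step-02
# playbook, every other server uses the non-backend one.
migration_playbook() {
    case "$1" in
        backend)
            echo "play-vm-migration-02-migrate-backend-box.yml" ;;
        frontend|distgit|keygen)
            echo "play-vm-migration-02-migrate-non-backend-box.yml" ;;
        *)
            echo "unknown server_id: $1" >&2
            return 1 ;;
    esac
}

# Print the three commands for one server, in the order they would be run.
print_migration_plan() {
    local copr_instance=$1 server_id=$2 step2 playbook
    step2=$(migration_playbook "$server_id") || return 1
    for playbook in play-vm-migration-01-new-box.yml \
                    "$step2" \
                    play-vm-migration-03-rename-instances.yml; do
        echo "ansible-playbook $playbook -e copr_instance=$copr_instance -e server_id=$server_id"
    done
}

print_migration_plan dev keygen
```

Printing the plan first makes it easy to review the ``copr_instance`` and ``server_id`` values before anything touches EC2.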

Decide which AMI (golden image) to use when starting the new instances. We
typically upgrade to ``Fedora N+2``, e.g. we migrate the infrastructure from
Fedora 37 to Fedora 39. Navigate to the `Cloud Base Images`_ download page, find
the section for **Intel and AMD x86_64 systems**, and click the button next to
the **Fedora Cloud 39 AWS** column (JavaScript needs to be enabled!). Note the
``ami-*`` ID in the **US East (N. Virginia)** region (e.g.
``ami-0746fc234df9c1ee0``). This ``ami-*`` ID needs to be specified in
``group_vars/all.yml``, and both ``group_vars/{dev,prod}.yml`` need to refer to
it correctly.
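
A quick sanity check can catch a mistyped AMI ID before a playbook run. This is a minimal sketch: ``is_ami_id`` is a hypothetical helper, the pattern assumes the usual EC2 format of ``ami-`` followed by 8 or 17 hexadecimal characters, and the ``grep`` is meant to be run from the clone directory.

```shell
# Hypothetical helper: succeeds when the argument looks like an EC2 AMI ID.
is_ami_id() {
    printf '%s\n' "$1" | grep -Eq '^ami-([0-9a-f]{8}|[0-9a-f]{17})$'
}

ami=ami-0746fc234df9c1ee0   # the example ID mentioned in the text
if is_ami_id "$ami"; then
    # List the group_vars files that mention the ID.
    grep -rl -- "$ami" group_vars/ 2>/dev/null \
        || echo "$ami not found under group_vars/"
fi
```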

You can double check other machine parameters like instance types (when more
power is needed, please use the `ec2instances.info`_ comparator to get
the cheapest available instance type according to our needs), naming, tags, IP
addresses, root volume sizes, etc. But typically, the defaults will be good
as-is.

Click ``Launch instance`` in the right panel.
.. note::
    The ``group_vars/`` directory is the ultimate source of truth for the Fedora
    Copr instance, so please update the configuration whenever you later change
    the instance parameters.

Make sure to use the existing key pair named ``Ansible Key``. This allows us to
**first** run the playbooks on ``batcave01`` box against the newly spawned VM
(the playbook then enables the Fedora Copr team members to ssh using their own
keys, as uploaded to FAS).

Add names for the root volumes
------------------------------
Launch new instances
--------------------

Once the instance is created, go to its details, switch to the
``Storage`` tab, and go through all attached volumes. Set the ``Name``
tag for each of them. Use the name of the instance as a prefix, e.g.
``copr-keygen-dev-root``, ``copr-frontend-prod-root``, etc.
This should be as simple as::

    $ ansible-playbook play-vm-migration-01-new-box.yml "${opts[@]}"

Backup the current letsencrypt certificates
-------------------------------------------
Expand All @@ -170,21 +103,26 @@ Copy the certificate files by running the playbooks **against the current (old)
copr stack** (all machines). There's the ``-t certbot`` ansible tag that allows
you to speed up the playbook runs.


Pre-prepare the new VM
----------------------
Pre-prepare the new VM - backend only!
--------------------------------------

.. note::

Backend - It's possible to run the playbook against the new copr-backend
server before we actually shut-down the old one. But to make sure that
ansible won't complain, we need
    It's possible to run the playbook against the new copr-backend server before
    we actually shut the old one down. But to make sure that ansible won't
    complain, we need

    - A temporary volume attached to the new box providing an ext4 filesystem
      with the ``copr-repo`` label.

    - An existing temporary hostname (with an existing DNS record) to execute
      the playbook against.

    The volume, the DNS record and a corresponding Elastic IP for this purpose
    are already prepared. The ``play-vm-migration-01-new-box.yml`` playbook
    should already make them available.
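
Before running the backend playbook against the new box, the volume prerequisite could be verified with something like the sketch below. ``has_copr_repo_volume`` is a hypothetical helper that scans ``lsblk -n -o FSTYPE,LABEL`` style output; the sample input only stands in for real output from the new machine.

```shell
# Hypothetical helper: reads "FSTYPE LABEL" lines on stdin and succeeds when
# an ext4 filesystem labeled copr-repo is listed.
has_copr_repo_volume() {
    awk '$1 == "ext4" && $2 == "copr-repo" { found = 1 } END { exit !found }'
}

# On the new backend box one would pipe in the real listing:
#   lsblk -n -o FSTYPE,LABEL | has_copr_repo_volume
# Here a made-up sample is used so the check can be tried anywhere.
printf 'xfs root\next4 copr-repo\n' | has_copr_repo_volume \
    && echo "temporary volume looks OK"
```

The temporary DNS record can be checked similarly with ``dig +short`` against whichever temporary hostname is in use.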


- A volume attached to the new box with label 'copr-repo'. Use already
existing volume named ``data-copr-be-dev-initial-playbook-run``
- An existing complementary DNS record (``copr-be-temp`` or
``copr-be-dev-temp``), pointing to the non-elastic IP of the new
server. See the `DNS SOP`_.


Note the private IP addresses
Expand Down Expand Up @@ -514,7 +452,9 @@ Close the infrastructure ticket, the upgrade is done.
.. _`Fedora infrastructure issue #7966`: https://pagure.io/fedora-infrastructure/issue/7966
.. _`fedora devel`: https://lists.fedorahosted.org/archives/list/devel@lists.fedoraproject.org/
.. _`copr devel`: https://lists.fedoraproject.org/archives/list/copr-devel@lists.fedorahosted.org/
.. _`Amazon AWS`: https://id.fedoraproject.org/saml2/SSO/Redirect?SPIdentifier=urn:amazon:webservices&RelayState=https://console.aws.amazon.com
.. _`Cloud Base Images`: https://alt.fedoraproject.org/cloud/
.. _`Amazon AWS account`: https://id.fedoraproject.org/saml2/SSO/Redirect?SPIdentifier=urn:amazon:webservices&RelayState=https://console.aws.amazon.com
.. _`Cloud Base Images`: https://fedoraproject.org/cloud/download/
.. _`DNS SOP`: https://docs.fedoraproject.org/en-US/infra/sysadmin_guide/dns/
.. _`ec2instances.info`: https://ec2instances.info/
.. _`helper playbook repository`: https://github.com/fedora-copr/ansible-fedora-copr
.. _`playbook SOP`: https://docs.fedoraproject.org/en-US/infra/sysadmin_guide/ansible/
4 changes: 2 additions & 2 deletions doc/raid_on_backend.rst
Expand Up @@ -54,7 +54,7 @@ Adding more space
1. Create two ``gp3`` volumes in EC2 of the same size and type, tag them with
``FedoraGroup: copr``, ``CoprInstance: production``, ``CoprPurpose:
infrastructure``. Attach them to a freshly started temporary instance (we
don't want to overload I/O with the `initial RAID sync <mdadm_sync>`_ on
don't want to overload I/O with the :ref:`initial RAID sync <mdadm_sync>` on
production backend). Make sure the instance type has enough EBS throughput
to perform the initial sync quickly enough.

Expand All @@ -68,7 +68,7 @@ Adding more space

$ mdadm --create --name=raid-be-03 --verbose /dev/mdXYZ --level=1 --raid-devices=2 /dev/nvmeXn1p1 /dev/nvmeYn1p1

Wait till the new empty `array is synchronized <mdadm_sync>`_ (may take hours
Wait till the new empty :ref:`array is synchronized <mdadm_sync>` (may take hours
or days, note we sync 2x16T). Check the details with ``mdadm -Db
/dev/md/raid-be-03``. See the tips below on how to make the sync speed
unlimited with ``sysctl``.