Import Service is obsolete and superseded by cWDS.
Terra Import Service. Tech doc here.
A walkthrough of the code in this repo is available at WALKTHROUGH.md.
Python version 3.9 is required for import-service to run properly.
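If you're not sure which interpreter you have, a quick check (adjust the command if your Python 3.9 is installed under a different name):
$ python3 --version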
Create and activate the Python virtual environment:
$ python3 -m venv venv
$ source venv/bin/activate
(venv) $ poetry install
You should periodically run poetry install within your venv to keep it up to date with changes in dependencies.
If you have problems running poetry install for the first time and encounter an error like:
The license_file parameter is deprecated, use license_files instead.
there is an incompatibility between pyyaml 5.4.1 and Cython 3+, which you can work around by running the following from within your venv:
pip install wheel
pip install "cython<3.0.0" && pip install --no-build-isolation pyyaml==5.4.1
(src for workaround)
Activate and deactivate the venv:
$ source venv/bin/activate
<do all your work here>
(venv) $ deactivate
To run tests:
(venv) $ poetry run pytest
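To run only a subset of tests, pytest's usual selection flags work through poetry as well (the keyword below is just a placeholder, not a real test name):
(venv) $ poetry run pytest -k "<some-test-keyword>"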
To run the type linter, go to the repo root directory and run:
(venv) $ poetry run mypy ./*.py && poetry run mypy -p app
If you'd like to run import-service locally, the following steps should help:
# At the root of the repository, run:
docker build . -t <your-favorite-name>
# Then find your image ID -- list your images and note the ID of the one you just built:
docker images
# Finally, run a container from that image:
docker run <image-id>
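If you want to reach the service from your browser once it's running, you'll likely need to publish the port the app listens on; 8080 is a common default for App Engine-style apps, but check the Dockerfile or app config to confirm (the port here is an assumption):
docker run -p 8080:8080 <image-id>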
You should make mypy happy before opening a PR. Note that errors in some modules will be listed twice. This is annoying, but the good news is that you only have to fix them once.
This repo uses poetry for dependency management. But it deploys to App Engine, and App Engine requires a requirements.txt file. Therefore, you must simultaneously update requirements.txt AND poetry's poetry.lock/pyproject.toml when changing dependencies.
To sync requirements.txt with poetry:
poetry export -f requirements.txt -o requirements.txt --without-hashes
For other poetry commands, such as adding a new dependency or updating an existing dependency, see https://python-poetry.org/docs/cli/.
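For example, adding a new dependency and keeping requirements.txt in sync might look roughly like this (the package name is just a placeholder):
(venv) $ poetry add <some-package>
(venv) $ poetry export -f requirements.txt -o requirements.txt --without-hashes
(venv) $ git add pyproject.toml poetry.lock requirements.txt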
Deployments to non-production and production environments are performed in Jenkins. In order to access Jenkins, you will need to be on the Broad network or logged on to the Broad VPN.
A deployment to the dev environment will be automatically triggered every time there is a commit or push to the develop branch on GitHub. If you would like to deploy a different branch or tag to the dev environment, you can do so by following the instructions below, but be aware that a new deployment of the develop branch will be triggered if anyone commits or pushes to that branch.
- Log in to Jenkins
- Navigate to the import-service-manual-deploy job
- In the left menu, click Build with Parameters, select the BRANCH_OR_TAG that you want to deploy and the TARGET environment to which you want to deploy, and enter the SLACK_CHANNEL that should receive notifications of the deploy job's success/failure
- Click the Build button
When doing a production deployment, each step of the checklist must be performed.
- Double-check that requirements.txt is up to date with poetry; see Dependency Management.
- Create and push a new version tag for the commit you want to deploy; typically this will be the head of the develop branch. Go to Releases and select 'Draft a new Release'. Create a release with a new tag, and ensure that the tag is incremented properly based on the last released version. (If you prefer the command line, see the sketch after this list.)
- Create a ticket for the release and be sure to leave the 'Fix Version' field blank. Add a checklist to the ticket and select 'Load Templates' from the ... menu to the right of the checklist. Use 'Import Service Release Checklist'. You may refer to (or clone) a previous release ticket for an example. This ticket ensures that the release is recorded for compliance, and that any release notes are picked up to be published. It also helps to keep track of the steps along the way, outlined in the next section.
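Creating and pushing the tag from the command line instead of the Releases UI looks roughly like this (the version is a placeholder; follow the repo's existing tag numbering):
$ git checkout develop && git pull
# Tag the commit you intend to release, then push the tag
$ git tag -a <new-version> -m "import-service release <new-version>"
$ git push origin <new-version>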
You must deploy to each tier one-by-one and manually test in each tier after you deploy to it. This test should consist of uploading a large-ish TSV (~2MB should suffice) to a GCP workspace and ensuring it uploads asynchronously, as well as verifying any specific changes made in the release. You may refer also to this document, although it is now somewhat out of date. Your deployment to a tier should not be considered complete until you have successfully executed each step of the manual test on that tier. Mark each step complete on the release ticket created above.
To deploy the application code, navigate to the import-service-manual-deploy job and click the "Build with Parameters" link. Select the TAG that you just created during the preparation steps and the TIER to which you want to deploy:
- dev deploy job succeeded and manual test passed. (Technically, this same commit is probably already running on dev courtesy of the automatic dev deployment job. However, deploying again is an important step because someone else may have triggered a dev deployment, and we want to ensure that you understand the deployment process, that the deployment tools are working properly, and that everything is working as intended.)
- alpha deploy job succeeded and manual test passed.
- staging deploy job succeeded and manual test passed.
- prod deploy job succeeded and manual test passed. In order to deploy to prod, you must be on the DSP Suitability Roster. You will need to log into the production Jenkins instance and use the "import-service-manual-deploy" job to release the same tag to production.
NOTE:
- It is important that you deploy to all tiers. Because Import Service is an "indie service", we should strive to make sure that all tiers other than dev are kept in sync and are running the same version of the code. This is essential so that as other DSP services are tested during their release process, they can ensure that their code will work properly with the latest version of Import Service running in prod.
Import Service is a Google App Engine (GAE) application. Versions of import-service are deployed to GAE either via a merge into the develop branch (for dev apps), or via Jenkins for all other environments. See here for specific details.
GAE allows a maximum of 210 versions of any app, so we clean up old versions as new versions are deployed. DSP caps the number of versions kept at 50 (an arbitrary number) just to ensure plenty of versions remain available for rollback if a bug were introduced.
In cleanup_scripts:
The delete-old-app-engine-versions-cleanup bash script handles cleanup of multiple versions, deleting the oldest first, and ensures that at least 50 versions remain after cleanup.
For environments other than prod (such as alpha or staging), the script must be run manually. This is to ensure that the deletion of versions is intentional on the part of an authenticated user.
For prod, the suggestion is that deletions be performed deliberately in the Google App Engine console.