[Github action] Rebuild resource collection index
See added documentation for corresponding AWS details. Given that the
resources all come from public-facing buckets (core & staging) it seems
ok to run this from a public repo, but we may want to revisit this once
we start consuming private data.

The index is only generated for datasets (not intermediate files) and
only for the core bucket (nextstrain-data), as that's all the server
currently handles; this saves us a little S3 storage, transfer overhead
and server memory footprint. Future work on listing/visualising all
available data will use this index, so the filtering is only temporary.
jameshadfield committed Jan 4, 2024
1 parent 855d3b2 commit c0ee239
Showing 2 changed files with 52 additions and 4 deletions.
39 changes: 39 additions & 0 deletions .github/workflows/index-resources.yml
@@ -0,0 +1,39 @@
name: index resources

on:
  # Run at ~4am UTC time which is (± an hour) 4am UK, 5am Switzerland, midnight
  # US east coast, 9pm US west coast so that for most users (and most
  # developers) the index regenerates overnight
  schedule:
    - cron: '0 4 * * *'

  # Manually triggered using GitHub's UI
  workflow_dispatch:

jobs:
  rebuild-index:
    runs-on: ubuntu-latest
    permissions:
      id-token: write # needed to interact with GitHub's OIDC Token endpoint
      contents: read
    defaults:
      run:
        shell: bash
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: '16'
      - run: npm ci
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-region: us-east-1
          role-to-assume: arn:aws:iam::827581582529:role/GitHubActionsRoleResourceIndexer
      - name: Rebuild the index
        run: |
          node resourceIndexer/main.js \
            --gzip --output resources.json.gz \
            --resourceTypes dataset --collections core
      - name: Upload the new index, overwriting the existing index
        run: |
          aws s3 cp resources.json.gz s3://nextstrain-inventories/resources.json.gz
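The workflow writes the index as gzipped JSON (`--gzip --output resources.json.gz`). As a rough illustration of what producing and reading such a file involves, here is a minimal Node sketch using only the built-in `zlib` module. The helper names `gzipJson` and `gunzipJson` and the shape of `exampleIndex` are hypothetical placeholders; the real index schema is defined by `resourceIndexer/main.js`, which this commit does not show.

```javascript
const zlib = require('zlib');

// Serialise any JSON-compatible value to a gzipped Buffer.
function gzipJson(value) {
  return zlib.gzipSync(Buffer.from(JSON.stringify(value), 'utf8'));
}

// Inverse of gzipJson: decompress a Buffer and parse it as JSON.
function gunzipJson(buffer) {
  return JSON.parse(zlib.gunzipSync(buffer).toString('utf8'));
}

// Hypothetical placeholder data, NOT the real index schema.
const exampleIndex = { core: { dataset: { 'example/name': ['2024-01-04'] } } };

// Round-trip the placeholder to confirm nothing is lost in compression.
const roundTripped = gunzipJson(gzipJson(exampleIndex));
console.log(JSON.stringify(roundTripped) === JSON.stringify(exampleIndex));
```

A downloaded `resources.json.gz` could be inspected the same way by passing the result of `fs.readFileSync` to `gunzipJson`.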
17 changes: 13 additions & 4 deletions docs/resource-collection.rst
@@ -39,8 +39,12 @@ point to the (JSON) file when you run the server.
Automated index generation
==========================

-*This section will be updated once the
-index creation is automated.*
+The resource collection index is rebuilt every night via a GitHub action running
+from this repo.
+
+*This approach should be revisited when (if) we start indexing private data,
+especially for the potential of the GitHub action logging sensitive information
+which will be publicly visible.*

AWS settings necessary for resource collection
==============================================
@@ -82,8 +86,13 @@ Index creation (Inventory access and index upload)

**Automated index generation**

-*This section will be updated once the
-index creation is automated.*
+The GitHub action assumes necessary AWS permissions via the IAM role
+`GitHubActionsRoleResourceIndexer
+<https://us-east-1.console.aws.amazon.com/iamv2/home?region=us-east-1#/roles/details/GitHubActionsRoleResourceIndexer>`__
+which is obtained using OIDC. This role uses permissions from the IAM policy
+`NextstrainResourceIndexer
+<https://us-east-1.console.aws.amazon.com/iamv2/home?region=us-east-1#/policies/details/arn%3Aaws%3Aiam%3A%3A827581582529%3Apolicy%2FNextstrainResourceIndexer>`__
+to list & read the S3 inventories, as well as upload the new index.

**Local index generation for development purposes**

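The added documentation says the role is "obtained using OIDC" but the role's trust policy is not part of this commit. As a sketch of the general pattern only, the snippet below builds the kind of trust policy typically used to let GitHub Actions assume an AWS role through GitHub's OIDC provider. The function name `githubOidcTrustPolicy` and the `OWNER/REPO` value are hypothetical, and the policy actually attached to `GitHubActionsRoleResourceIndexer` may differ (for example, it may restrict the `sub` claim to a specific branch).

```javascript
// Build an IAM trust policy document that lets workflows in one GitHub
// repository assume a role via sts:AssumeRoleWithWebIdentity.
// This is an illustrative pattern, not the policy used by this project.
function githubOidcTrustPolicy({ accountId, repo }) {
  const provider = 'token.actions.githubusercontent.com';
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        // The OIDC identity provider must already be registered in the account.
        Principal: { Federated: `arn:aws:iam::${accountId}:oidc-provider/${provider}` },
        Action: 'sts:AssumeRoleWithWebIdentity',
        Condition: {
          // Audience used by aws-actions/configure-aws-credentials by default.
          StringEquals: { [`${provider}:aud`]: 'sts.amazonaws.com' },
          // Restrict which repository's tokens are accepted.
          StringLike: { [`${provider}:sub`]: `repo:${repo}:*` },
        },
      },
    ],
  };
}

// The account ID comes from the workflow above; 'OWNER/REPO' is a placeholder.
const policy = githubOidcTrustPolicy({ accountId: '827581582529', repo: 'OWNER/REPO' });
console.log(JSON.stringify(policy, null, 2));
```

The `id-token: write` permission in the workflow is what allows the job to request the OIDC token that a policy of this kind would evaluate.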