Use OpenTelemetry python API; simplify actions
msarahan committed Dec 16, 2024
1 parent c2d3cdc commit 8ff7993
Showing 23 changed files with 802 additions and 384 deletions.
24 changes: 24 additions & 0 deletions .github/workflows/test-artifact-cleanup.yml
@@ -0,0 +1,24 @@
name: test-artifact-cleanup
# During this workflow, we upload a file that follows the 'telemetry-tools-*'
# naming pattern. After running the clean-up-artifacts action, the artifact
# should no longer show up in the web UI.

on:
  workflow_dispatch:

jobs:
  telemetry-setup:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Create dummy file
        shell: bash
        run: echo "Dumbo" > file.txt
      - name: Upload dummy file
        uses: actions/upload-artifact@v4
        with:
          name: telemetry-tools-attrs-1234
          path: file.txt
      - name: Clean up telemetry intermediary artifacts
        uses: ./telemetry-impls/clean-up-artifacts
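The clean-up step is expected to remove every artifact whose name matches `telemetry-tools-*`. The sketch below covers only the name-matching part. It assumes artifact records shaped like the `artifacts` list that GitHub's REST API returns for a workflow run, each with an `id` and a `name`. The function name is hypothetical, and the deletion call itself is left out.

```python
from fnmatch import fnmatchcase

# Glob matching the dummy artifact uploaded above ('telemetry-tools-attrs-1234').
PATTERN = "telemetry-tools-*"


def artifacts_to_delete(artifacts, pattern=PATTERN):
    """Return the ids of artifacts whose name matches the glob pattern.

    `artifacts` is assumed to be a list of dicts with at least "id" and
    "name" keys, as in the REST API's artifact listing.
    """
    return [a["id"] for a in artifacts if fnmatchcase(a["name"], pattern)]


if __name__ == "__main__":
    # Hypothetical listing; only the telemetry-tools-* entries are selected.
    listing = [
        {"id": 1, "name": "telemetry-tools-attrs-1234"},
        {"id": 2, "name": "wheel-build-cuda12"},
        {"id": 3, "name": "telemetry-tools-env-vars"},
    ]
    print(artifacts_to_delete(listing))  # [1, 3]
```

`fnmatchcase` is used rather than `fnmatch` so that matching is case-sensitive on every platform.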
133 changes: 103 additions & 30 deletions README.md
@@ -1,66 +1,79 @@
# shared-actions

Contains all of the shared composite actions used by RAPIDS. Several of these actions,
especially the telemetry actions, use a pattern that we refer to as "dispatch actions."
The general idea of a dispatch action is to make it easier to depend on other actions
at a specific revision, and also to simplify using files beyond a given action's .yml file.

A dispatch action is one that:
* clones the shared-actions repository (repo/ref changeable using env vars)
* runs (dispatches to) another action within the clone, using a relative path

There can be more complicated arrangements of more actions, but the idea is to
have the local clone of the shared-actions repository be the first step of an action.

Actions that refer to each other assume that they have been checked out to the
./shared-actions folder, which *should* be at the root of the GitHub Actions workspace.
This assumption is what allows code reuse between actions.

In general, we should avoid calling "implementation actions" directly. Instead,
we should prefer to create "dispatch actions" that clone shared-actions from a particular repo
at a particular ref, and then dispatch to an implementation action from that clone.
This adds complexity, but has other advantages:

* simplifies specifying a custom branch for actions for development and testing
* changes all shared-actions calls in a workflow at once, instead of changing each one
* allows reuse of shared-actions within the shared-actions repo. Without the clone and
  relative paths, an action that references another shared action by `repo@ref` would
  not stay consistent with the repo and ref being tested, leading to great confusion
  over why changes aren't being reflected.

Actions that use this pattern should include "dispatch" in their folder name, so
that they can be readily distinguished from any actions that are either
standalone or otherwise implementations that assume that the ./shared-actions
folder is already cloned, so that they can use relative paths to reference other
actions and files.

## Example dispatch action

```yaml
name: 'dispatch-example-action'
description: |
  The purpose of this wrapper is to keep it easy for external consumers to switch branches of
  the shared-actions repo when they are changing something about shared-actions and need to test it
  in their pipelines.
  Inputs here are all assumed to be env vars set outside of this action.
  Set them in your main repo's workflows.
runs:
  using: 'composite'
  steps:
    - name: Clone shared-actions repo
      uses: actions/checkout@v4
      with:
        repository: ${{ env.SHARED_ACTIONS_REPO }}
        ref: ${{ env.SHARED_ACTIONS_REF }}
        path: ./shared-actions
    - name: Stash base env vars
      uses: ./shared-actions/_stash-base-env-vars
    - name: Run local implementation action
      uses: ./shared-actions/impls/example-action
```
In this action, the "implementation action" is
`./shared-actions/impls/example-action`. You can have inputs in your
dispatch actions; you would just pass them through to the implementation action.
Environment variables do carry through from the parent workflow, through the
dispatch action, and then into the implementation action. In most cases, it is simpler
(though less explicit) to set environment variables instead of plumbing inputs
through each action.

The set of environment variables that gets stashed is hard-coded, not detected. If you
want to pass a different environment variable through, you need to add it to the
implementation stash action, `telemetry-impls/stash-base-env-vars/action.yml`. You do
not need to explicitly specify it on the loading side.
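The stash step can be pictured as writing a fixed allowlist of variables to a file as `KEY=value` lines. This is the same line format that the `$GITHUB_ENV` file uses. The sketch below is a minimal version under stated assumptions: the allowlist and the function names are hypothetical, and the real list in `telemetry-impls/stash-base-env-vars/action.yml` may differ. Uploading the file as an artifact is left to the action and is not shown.

```python
import os

# Hypothetical allowlist; the real, hard-coded list lives in
# telemetry-impls/stash-base-env-vars/action.yml and may differ.
STASHED_VARS = ("SHARED_ACTIONS_REPO", "SHARED_ACTIONS_REF", "OTEL_SERVICE_NAME")


def collect_env(env=None, names=STASHED_VARS):
    """Pick only the allowlisted variables that are actually set."""
    env = os.environ if env is None else env
    return {name: env[name] for name in names if name in env}


def render_env_file(values):
    """Render KEY=value lines; multi-line values are out of scope for this sketch."""
    for name, value in values.items():
        if "\n" in value:
            raise ValueError(f"multi-line value for {name} is not supported here")
    return "".join(f"{name}={value}\n" for name, value in values.items())


def stash_env(path, env=None):
    """Write the allowlisted variables to `path`, ready to be uploaded as an artifact."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(render_env_file(collect_env(env)))
```

Any variable that is not on the allowlist is silently dropped. This is why a new variable has to be added on the stash side, while the loading side needs no change.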

## Implementation action

These are similar to dispatch actions, except that they should not clone
shared-actions. They can depend on other actions from the shared-actions
repository using the `./shared-actions` relative path.

```yaml
name: 'example-action'
description: |
  An example of calling a python script in an action. Both the action
  and the python file are part of the shared-actions repo.
runs:
  using: 'composite'
  steps:
    - name: Run local action
      uses: ./shared-actions/impls/another-action
    - name: Run local script file
      # `python -c` would treat the path as Python source; run the file instead.
      run: python ./shared-actions/impls/hello.py
      shell: bash
```

## Example calling workflow

The key detail here is that the presence of the SHARED_ACTIONS_REPO and/or
@@ -76,9 +89,69 @@ env:
jobs:
  actions-user:
    runs-on: ubuntu-latest
    steps:
      - name: Call dispatch example
        # DO NOT change the branch here (@main) in PRs
        uses: rapidsai/shared-actions/dispatch-example-action@main
```

This works because the environment variables get passed into the shared action. They are then
used by the `actions/checkout` action, taking priority over the default values.
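The precedence comes from `||` fallbacks in the checkout step, for example `${{ env.SHARED_ACTIONS_REPO || 'rapidsai/shared-actions' }}` in `telemetry-dispatch-stash-base-env-vars`. The Python model below is only an illustration of which repo and ref a run would clone, and the function name is hypothetical. It relies on the fact that an empty string is falsy both in Python and in GitHub Actions expressions.

```python
import os

# Defaults taken from the `||` fallbacks in telemetry-dispatch-stash-base-env-vars.
DEFAULT_REPO = "rapidsai/shared-actions"
DEFAULT_REF = "main"


def resolve_checkout_target(env=None):
    """Return (repository, ref) the way the `||` expressions pick them."""
    env = os.environ if env is None else env
    # As with `||` in GitHub expressions, an unset or empty variable falls back.
    repo = env.get("SHARED_ACTIONS_REPO") or DEFAULT_REPO
    ref = env.get("SHARED_ACTIONS_REF") or DEFAULT_REF
    return repo, ref
```

For example, a workflow that sets only `SHARED_ACTIONS_REF: my-branch` still clones `rapidsai/shared-actions`, but at `my-branch`.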

## Calling in child shared workflows

Shared workflows complicate matters because environment variables set in a calling
workflow are not passed through to the shared workflows that it calls. If you set the
`SHARED_ACTIONS_REPO` and/or `SHARED_ACTIONS_REF` variables in the top-level parent
workflow, they will not take effect in any dispatch actions that you may call in child
workflows. You can pass them as inputs to child shared workflows, but that ends up
being very verbose.

To carry this information into child workflows, we use a scheme that writes a
file with environment variables, uploads this file as an artifact, then downloads
and loads the file at the start of the child workflow.

The general scheme is:

### Top-level workflow
```yaml
jobs:
  setup-env-vars:
    runs-on: ubuntu-latest
    steps:
      # implicitly picks up env vars for SHARED_ACTIONS_REPO and _REF
      - uses: rapidsai/shared-actions/telemetry-dispatch-stash-base-env-vars@main
  <rest of jobs>
  summarize-telemetry:
    needs: <all other jobs, or just pr-builder>
    # private networks will affect your choice here. If your Tempo server or
    # forwarder/collector is only accessible on some instances, then use one of
    # those instances here
    runs-on: <node>
    steps:
      - uses: rapidsai/shared-actions/telemetry-dispatch-summarize@main
        # if you use mTLS, this is probably the right place to pass in the certificates
```

### Child workflows
```yaml
jobs:
  tests:
    needs: compute-matrix
    strategy:
      matrix: ${{ fromJSON(needs.compute-matrix.outputs.MATRIX) }}
    runs-on: "linux-${{ matrix.ARCH }}-gpu-${{ matrix.GPU }}-${{ matrix.DRIVER }}-1"
    steps:
      - name: Telemetry setup
        id: telemetry-setup
        # DO NOT change this in PRs
        uses: rapidsai/shared-actions/telemetry-dispatch-setup@main
        continue-on-error: true
        with:
          extra_attributes: "rapids.cuda=${{ matrix.CUDA_VER }},rapids.py=${{ matrix.PY_VER }}"
      <other steps, as usual>
```
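The `extra_attributes` value in the child workflow is a comma-separated list of `key=value` pairs. This has the same shape as the standard `OTEL_RESOURCE_ATTRIBUTES` variable, which the setup action also sets. The sketch below shows one way to merge the two. It assumes that extra attributes override existing keys, and the function names are hypothetical. It does no percent-encoding, so values that contain commas, spaces, or other special characters are out of scope.

```python
def parse_attributes(text):
    """Parse 'k1=v1,k2=v2' into a dict, ignoring empty entries."""
    attrs = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        attrs[key.strip()] = value.strip()
    return attrs


def merge_resource_attributes(existing, extra):
    """Merge extra attributes into an OTEL_RESOURCE_ATTRIBUTES-style string."""
    merged = parse_attributes(existing)
    # Assumption: a key given in extra_attributes replaces the existing value.
    merged.update(parse_attributes(extra))
    return ",".join(f"{k}={v}" for k, v in merged.items())
```

For instance, `merge_resource_attributes("service.name=ci,rapids.py=3.10", "rapids.py=3.12,rapids.cuda=12.5")` yields `service.name=ci,rapids.py=3.12,rapids.cuda=12.5`.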

Behind the scenes, the implementation actions are:
* ./telemetry-impls/stash-base-env-vars: stores base environment variables (setting default values where needed)
* ./telemetry-impls/load-then-clone: downloads the base env var file, loads it, then
  clones shared-actions according to the env vars that were just loaded
* ./telemetry-impls/summarize: runs a Python script that parses the GitHub job records for the run and sends OpenTelemetry spans to the configured endpoint
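On the loading side, `load-then-clone` only has to re-export the downloaded file so that later steps can see the values. The sketch below assumes the stashed file uses simple single-line `KEY=value` entries. GitHub Actions exposes variables appended to the file named by `$GITHUB_ENV` to the *subsequent* steps of the job, not to the current step. The function names are hypothetical.

```python
import os


def parse_env_file(text):
    """Parse simple KEY=value lines; blank lines and '#' comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=value line: {line!r}")
        # Only the first '=' separates key from value; the rest belongs to the value.
        values[key.strip()] = value
    return values


def export_to_github_env(values, github_env=None):
    """Append values to $GITHUB_ENV so that later steps in the job see them."""
    github_env = github_env or os.environ["GITHUB_ENV"]
    with open(github_env, "a", encoding="utf-8") as f:
        for key, value in values.items():
            f.write(f"{key}={value}\n")
```

Because the values only become visible in later steps, the clone of shared-actions has to happen in a step that runs after the export.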
29 changes: 0 additions & 29 deletions telemetry-dispatch-load-base-env-vars/action.yml

This file was deleted.

27 changes: 27 additions & 0 deletions telemetry-dispatch-setup/action.yml
@@ -0,0 +1,27 @@
name: telemetry-dispatch-setup
description: |
  This action sets important environment variables that may be used by tools that
  implement OpenTelemetry. This action also stores attributes (metadata) for the
  current job, so that this metadata can be associated with spans during the final
  parsing of job metadata.
  This action should be called at the beginning of child workflows, generally as the first
  step in any job other than computing the matrix.
inputs:
  extra_attributes:
    description: "comma-separated key=value attributes to associate with the current job"

runs:
  using: 'composite'
  steps:
    - uses: rapidsai/shared-actions/telemetry-impls/load-then-clone@empty-certs
    # overrides loaded value
    - name: Set OTEL_SERVICE_NAME from job
      uses: ./shared-actions/telemetry-impls/set-otel-service-name
    - name: Store attributes to use as metadata when creating spans
      # This also sets OTEL_RESOURCE_ATTRIBUTES, for any subsequent steps
      # in the calling workflow that might use it.
      uses: ./shared-actions/telemetry-impls/stash-job-attributes
      with:
        extra_attributes: ${{ inputs.extra_attributes }}
18 changes: 10 additions & 8 deletions telemetry-dispatch-stash-base-env-vars/action.yml
@@ -1,22 +1,24 @@
name: telemetry-dispatch-stash-base-env-vars
description: |
  Stores base environment variables in a file and uploads that file
  as an artifact.
  Inputs here are all assumed to be env vars set outside of this action.
  Set them in your main repo's workflows.
  This action should only be called once in a build,
  at the start of the top-level workflow. All other jobs in the top-level
  workflow should come after this job. It is generally enough
  to have only the checks and devcontainers jobs explicitly depend on
  it and have everything else be downstream of them.
runs:
  using: 'composite'
  steps:
    # We can't use the load-then-clone action because the env vars file
    # that it needs is something that we create here.
    - name: Clone shared-actions repo
      uses: actions/checkout@v4
      with:
        repository: ${{ env.SHARED_ACTIONS_REPO || 'rapidsai/shared-actions' }}
        ref: ${{ env.SHARED_ACTIONS_REF || 'main' }}
        path: ./shared-actions
    - name: Get traceparent representation of current workflow
      uses: ./shared-actions/telemetry-impls/traceparent
    - name: Stash base env vars
      uses: ./shared-actions/telemetry-impls/stash-base-env-vars
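The traceparent step produces a W3C Trace Context `traceparent` value for the workflow run. That value has the form `version-traceid-parentid-flags`, with a 32-hex-digit trace id and a 16-hex-digit parent id. The real derivation lives in `telemetry-impls/traceparent`. The sketch below is only one possible approach. It assumes the trace id is derived by hashing the repository, run id, and run attempt, a hypothetical choice that lets every job in a run recompute the same id without coordination. The function names and inputs are hypothetical too.

```python
import hashlib


def _hex_prefix(text, n_hex):
    """First n_hex lowercase hex digits of the SHA-256 of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:n_hex]


def make_traceparent(repository, run_id, run_attempt, span_key="workflow"):
    """Build a W3C traceparent: '00-<32 hex trace id>-<16 hex parent id>-01'."""
    # Hypothetical scheme: same repo/run/attempt always gives the same trace id.
    trace_id = _hex_prefix(f"{repository}+{run_id}+{run_attempt}", 32)
    parent_id = _hex_prefix(f"{trace_id}+{span_key}", 16)
    # The spec forbids all-zero ids; a SHA-256 prefix of all zeros is
    # vanishingly unlikely, so that case is not handled here.
    return f"00-{trace_id}-{parent_id}-01"  # version 00, flags 01 = sampled
```

Hashing instead of generating random ids keeps the value reproducible. A re-run attempt gets a new trace because the attempt number is part of the hash input.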
20 changes: 20 additions & 0 deletions telemetry-dispatch-stash-job-attributes/action.yml
@@ -0,0 +1,20 @@
name: dispatch-stash-attributes
description: |
  Clones a particular branch/ref of a shared-actions repo, then
  calls the stash-job-attributes implementation action, which writes
  some environment variables so that downstream jobs can refer to them.
  Inputs here are all assumed to be env vars set outside of this action.
  Set them in your main repo's workflows.
inputs:
  extra_attributes:
    description: "comma-separated key=value attributes to associate with the current job"

runs:
  using: 'composite'
  steps:
    - uses: rapidsai/shared-actions/telemetry-impls/load-then-clone@empty-certs
    - name: Stash current job's OTEL_RESOURCE_ATTRIBUTES
      uses: ./shared-actions/telemetry-impls/stash-job-attributes
      with:
        extra_attributes: ${{ inputs.extra_attributes }}
15 changes: 15 additions & 0 deletions telemetry-dispatch-summarize/action.yml
@@ -0,0 +1,15 @@
name: telemetry-dispatch-summarize
description: |
  This action is run in a final job on the top-level workflow, after all other
  jobs are completed. This action downloads the JSON records of all jobs from
  the current run. It then associates the metadata records that were uploaded by
  the telemetry-dispatch-stash-job-attributes action with their jobs. This is
  effectively label metadata. Finally, this action creates OpenTelemetry spans
  with the timing and label metadata, and sends them to the configured Tempo
  endpoint (or forwarder).
runs:
  using: 'composite'
  steps:
    - uses: rapidsai/shared-actions/telemetry-impls/load-then-clone@empty-certs
    - uses: ./shared-actions/telemetry-impls/summarize
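The core of the summarize step is turning job timing and stashed labels into spans. The sketch below does that conversion without any dependencies, under several assumptions:

* Job records are shaped like GitHub's REST API jobs listing (`name`, `started_at`, `completed_at` as ISO 8601 UTC strings).
* Stashed attributes are keyed by job name. This keying is hypothetical.
* The function names are hypothetical as well.

Times are converted to integer nanoseconds since the epoch, the unit the OpenTelemetry Python API uses for explicit span start and end times.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_to_ns(timestamp):
    """Convert e.g. '2024-12-16T10:00:00Z' to integer nanoseconds since the epoch."""
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on.
    dt = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    delta = dt - EPOCH
    # Integer arithmetic avoids float rounding in the nanosecond value.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000


def job_spans(jobs, attributes_by_job):
    """Build span descriptions (name, start/end in ns, attributes) for finished jobs."""
    spans = []
    for job in jobs:
        if not job.get("started_at") or not job.get("completed_at"):
            continue  # jobs missing either timestamp (e.g. still running) are skipped
        spans.append({
            "name": job["name"],
            "start_time": iso_to_ns(job["started_at"]),
            "end_time": iso_to_ns(job["completed_at"]),
            "attributes": dict(attributes_by_job.get(job["name"], {})),
        })
    return spans
```

With the OpenTelemetry Python API, each entry would map onto `tracer.start_span(name, start_time=..., attributes=...)` followed by `span.end(end_time=...)`.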
31 changes: 0 additions & 31 deletions telemetry-dispatch-write-summary/action.yml

This file was deleted.
