Merge pull request #80 from con/gh-74
Allow specifying GitHub workflow inclusions & exclusions with regexes
yarikoptic authored Jun 9, 2021
2 parents 3df2573 + 3776ef8 commit dd48e39
Showing 6 changed files with 281 additions and 85 deletions.
142 changes: 79 additions & 63 deletions README.rst
@@ -35,7 +35,7 @@ Installation
python3 -m pip install tinuous

``tinuous`` can also optionally integrate with Datalad_. To install Datalad
alongside ``tinuous``, specify the ``datalad`` extra:
alongside ``tinuous``, specify the ``datalad`` extra::

python3 -m pip install "tinuous[datalad]"

@@ -114,82 +114,102 @@ The configuration file is a YAML file containing a mapping with the following
keys:

``repo``
The GitHub repository to retrieve assets for, in the form ``OWNER/NAME``
*(required)* The GitHub repository to retrieve assets for, in the form ``OWNER/NAME``

``vars``
*(optional)* A mapping defining custom path template placeholders. Each
key is the name of a custom placeholder, without enclosing braces, and the
value is the string to substitute in its place. Custom values may contain
standard path template placeholders as well as other custom placeholders
defined earlier in the mapping.
A mapping defining custom path template placeholders. Each key is the name
of a custom placeholder, without enclosing braces, and the value is the
string to substitute in its place. Custom values may contain standard path
template placeholders as well as other custom placeholders defined earlier
in the mapping.

``ci``
A mapping from the names of the CI systems from which to retrieve assets to
sub-mappings containing CI-specific configuration. Including a given CI
system is optional; assets will be fetched from a given system if & only if
it is listed in this mapping.
*(required)* A mapping from the names of the CI systems from which to
retrieve assets to sub-mappings containing CI-specific configuration.
Including a given CI system is optional; assets will be fetched from a
given system if & only if it is listed in this mapping.

The CI systems and their sub-mappings are as follows:

``github``
Configuration for retrieving assets from GitHub Actions. Subfields:

``path``
A template string that will be instantiated for each workflow run
to produce the path for the directory (relative to the current
working directory) under which the run's build logs will be saved.
See "`Path Templates`_" for more information.
*(required)* A template string that will be instantiated for each
workflow run to produce the path for the directory (relative to the
current working directory) under which the run's build logs will be
saved. See "`Path Templates`_" for more information.

``artifacts_path``
*(optional)* A template string that will be instantiated for each
workflow run to produce the path for the directory (relative to the
current working directory) under which the run's artifacts will be
saved. If this is not specified, no artifacts will be downloaded.
A template string that will be instantiated for each workflow run
to produce the path for the directory (relative to the current
working directory) under which the run's artifacts will be saved.
If this is not specified, no artifacts will be downloaded.

``releases_path``
*(optional)* A template string that will be instantiated for each
A template string that will be instantiated for each
(non-draft, non-prerelease) GitHub release to produce the path for
the directory (relative to the current working directory) under
which the release's assets will be saved. If this is not
specified, no release assets will be downloaded.

``workflows``
*(optional)* A list of the filenames for the workflows for which to
retrieve assets. The filenames should only consist of the workflow
basenames, including the file extension (e.g., ``test.yml``, not
``.github/workflows/test.yml``). When ``workflows`` is not
specified, assets are retrieved for all workflows in the repository.
A specification of the workflows for which to retrieve assets.
This can be either a list of workflow basenames, including the file
extension (e.g., ``test.yml``, not ``.github/workflows/test.yml``)
or a mapping containing the following fields:

``include``
A list of workflows to retrieve assets for, given as either
basenames or (when ``regex`` is true) regular expressions
to match against basenames. If ``include`` is omitted, it
defaults to including all workflows.

``exclude``
A list of workflows to not retrieve assets for, given as
either basenames or (when ``regex`` is true) regular
expressions to match against basenames. If ``exclude`` is
omitted, no workflows are excluded. Workflows that match
both ``include`` and ``exclude`` are excluded.

``regex``
A boolean. If true (default false), the elements of the
``include`` and ``exclude`` fields are treated as regular
expressions that are matched (unanchored) against workflow
basenames; if false, they are used as exact names

When ``workflows`` is not specified, assets are retrieved for all
workflows in the repository.

``travis``
Configuration for retrieving logs from Travis-CI.com. Subfield:

``path``
A template string that will be instantiated for each job of each
build to produce the path for the file (relative to the current
working directory) in which the job's logs will be saved. See
"`Path Templates`_" for more information.
*(required)* A template string that will be instantiated for each
job of each build to produce the path for the file (relative to the
current working directory) in which the job's logs will be saved.
See "`Path Templates`_" for more information.

``appveyor``
Configuration for retrieving logs from Appveyor. Subfields:

``path``
A template string that will be instantiated for each job of each
build to produce the path for the file (relative to the current
working directory) in which the job's logs will be saved. See
"`Path Templates`_" for more information.
*(required)* A template string that will be instantiated for each
job of each build to produce the path for the file (relative to the
current working directory) in which the job's logs will be saved.
See "`Path Templates`_" for more information.

``accountName``
The name of the Appveyor account to which the repository belongs on
Appveyor
*(required)* The name of the Appveyor account to which the
repository belongs on Appveyor

``projectSlug``
*(optional)* The project slug for the repository on Appveyor; if
not specified, it is assumed that the slug is the same as the
repository name
The project slug for the repository on Appveyor; if not specified,
it is assumed that the slug is the same as the repository name

``since``
A timestamp (date, time, & timezone); only assets for builds started after
the given point in time will be retrieved
*(required)* A timestamp (date, time, & timezone); only assets for builds
started after the given point in time will be retrieved

As the script retrieves new build assets, it keeps track of their starting
points. Once the assets for all builds for the given CI system &
@@ -198,12 +218,12 @@ keys:
``since`` value for the respective CI system on subsequent runs.

``until``
*(optional)* A timestamp (date, time, & timezone); only assets for builds
started before the given point in time will be retrieved
A timestamp (date, time, & timezone); only assets for builds started before
the given point in time will be retrieved

``types``
A list of build trigger event types; only assets for builds triggered by
one of the given events will be retrieved
*(required)* A list of build trigger event types; only assets for builds
triggered by one of the given events will be retrieved

The recognized event types are:

@@ -217,36 +237,32 @@ keys:
A build in response to new commits

``secrets``
*(optional)* A mapping from names (used in log messages) to regexes
matching secrets to sanitize
A mapping from names (used in log messages) to regexes matching secrets to
sanitize

``allow-secrets-regex``
*(optional)* Any strings that match a ``secrets`` regex and also match this
regex will not be sanitized. Note that ``allow-secrets-regex`` is tested
against just the substring that matched a ``secrets`` regex without any
surrounding text, and so lookahead and lookbehind will not work in this
regex.
Any strings that match a ``secrets`` regex and also match this regex will
not be sanitized. Note that ``allow-secrets-regex`` is tested against just
the substring that matched a ``secrets`` regex without any surrounding
text, and so lookahead and lookbehind will not work in this regex.

``datalad``
*(optional)* A sub-mapping describing integration of ``tinuous`` with
Datalad_. Subfields:
A sub-mapping describing integration of ``tinuous`` with Datalad_.
Subfields:

``enabled``
*(optional)* A boolean. If true (default false), Datalad must be
installed, the current directory will be converted into a Datalad
dataset if it is not one already, the assets will optionally be divided
up into subdatasets, and all new assets will be committed at the end of
a run of ``tinuous fetch``. ``path`` template strings may contain
``//`` separators indicating the boundaries of subdatasets.
A boolean. If true (default false), Datalad must be installed, the
current directory will be converted into a Datalad dataset if it is not
one already, the assets will optionally be divided up into subdatasets,
and all new assets will be committed at the end of a run of ``tinuous
fetch``. ``path`` template strings may contain ``//`` separators
indicating the boundaries of subdatasets.

``cfg_proc``
*(optional)* Procedure to run on the dataset & subdatasets when
creating them
Procedure to run on the dataset & subdatasets when creating them

.. _Datalad: https://www.datalad.org

All fields are required unless stated otherwise.

A sample config file:

.. code:: yaml
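The README's description of ``secrets`` and ``allow-secrets-regex`` implies a two-stage check: find each secret match, then test only that matched substring against the allow regex. Below is a minimal stdlib sketch of that reading, not tinuous's implementation. The ``sanitize`` function and the ``MASK`` placeholder are hypothetical, and the sketch assumes the allow regex is applied with ``re.search``; whether tinuous uses ``search`` or a full match is not shown in this diff.

```python
import re
from typing import Dict, Optional, Pattern

# Hypothetical replacement text; what tinuous substitutes is not shown here.
MASK = "********"


def sanitize(
    text: str,
    secrets: Dict[str, Pattern],
    allow: Optional[Pattern] = None,
) -> str:
    """Mask each substring matched by a ``secrets`` regex unless that
    substring, taken on its own, matches ``allow``."""

    def repl(m: re.Match) -> str:
        found = m.group(0)
        # ``allow`` sees only the matched text, with no surrounding context,
        # so lookbehind/lookahead in it cannot inspect neighbouring characters.
        if allow is not None and allow.search(found):
            return found
        return MASK

    # The mapping keys are names for log messages; this sketch does no logging.
    for rx in secrets.values():
        text = rx.sub(repl, text)
    return text


secrets = {"api-token": re.compile(r"token-[0-9a-f]{8}")}
print(sanitize("auth token-deadbeef and token-00000000", secrets,
               re.compile(r"\Atoken-0{8}\Z")))
# auth ******** and token-00000000
```

Because ``allow`` only ever receives the matched substring, an allow regex such as ``(?<=dummy )token-`` can never succeed here, which is the lookbehind limitation the README warns about.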
36 changes: 34 additions & 2 deletions src/tinuous/base.py
@@ -4,10 +4,11 @@
from functools import cached_property
import heapq
from pathlib import Path
import re
from time import sleep
from typing import Any, Dict, Iterator, List, Optional, Tuple
from typing import Any, Dict, Iterator, List, Optional, Pattern, Tuple, Union

from pydantic import BaseModel, Field
from pydantic import BaseModel, Field, validator
import requests

from .util import expand_template, log
@@ -161,3 +162,34 @@ class BuildLog(BuildAsset):

class Artifact(BuildAsset):
    pass


# These config-related classes need to go in this file to avoid a circular
# import issue:


class NoExtraModel(BaseModel):
    class Config:
        allow_population_by_field_name = True
        extra = "forbid"


class WorkflowSpec(NoExtraModel):
    regex: bool = False
    # Workflow names are stored as compiled regexes regardless of whether
    # `regex` is true in order to keep type-checking simple.
    include: List[Pattern] = Field(default_factory=lambda: [re.compile(".*")])
    exclude: List[Pattern] = Field(default_factory=list)

    @validator("include", "exclude", pre=True, each_item=True)
    def _maybe_regex(
        cls, v: Union[str, Pattern], values: Dict[str, Any]  # noqa: B902, U100
    ) -> Union[str, Pattern]:
        if not values["regex"] and isinstance(v, str):
            v = r"\A" + re.escape(v) + r"\Z"
        return v

    def match(self, s: str) -> bool:
        return any(r.search(s) for r in self.include) and not any(
            r.search(s) for r in self.exclude
        )
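To see the matching rule that the new ``WorkflowSpec`` encodes without pulling in pydantic, here is a stdlib-only sketch. ``WorkflowFilter``, its ``build`` classmethod, and ``_compile`` are hypothetical names; the escaping, the ``\A...\Z`` anchoring of exact names, the ``.*`` default, and the include-minus-exclude rule are taken from the diff above.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Pattern, Sequence


def _compile(items: Sequence[str], regex: bool) -> List[Pattern]:
    # Exact names are escaped and anchored with \A...\Z (as in
    # WorkflowSpec._maybe_regex), so search() only succeeds on the whole
    # basename; regexes are compiled as given and searched unanchored.
    return [
        re.compile(s if regex else r"\A" + re.escape(s) + r"\Z") for s in items
    ]


@dataclass
class WorkflowFilter:
    """Hypothetical stdlib stand-in for the pydantic WorkflowSpec model."""

    # Default include of ".*" matches every basename, as in WorkflowSpec.
    include: List[Pattern] = field(default_factory=lambda: [re.compile(".*")])
    exclude: List[Pattern] = field(default_factory=list)

    @classmethod
    def build(
        cls,
        include: Optional[Sequence[str]] = None,
        exclude: Sequence[str] = (),
        regex: bool = False,
    ) -> "WorkflowFilter":
        spec = cls(exclude=_compile(exclude, regex))
        if include is not None:
            spec.include = _compile(include, regex)
        return spec

    def match(self, basename: str) -> bool:
        # Included and not excluded: a basename matching both is excluded.
        return any(r.search(basename) for r in self.include) and not any(
            r.search(basename) for r in self.exclude
        )


exact = WorkflowFilter.build(include=["test.yml"])
print(exact.match("test.yml"), exact.match("testXyml"), exact.match("old-test.yml"))
# True False False
```

In the real model the validator reads ``values["regex"]``; pydantic v1 passes only previously validated fields in ``values``, which is presumably why ``regex`` is declared before ``include`` and ``exclude``.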
23 changes: 12 additions & 11 deletions src/tinuous/config.py
@@ -1,23 +1,17 @@
from abc import ABC, abstractmethod
from datetime import datetime
import re
from typing import Dict, Iterator, List, Optional, Pattern, Tuple
from typing import Any, Dict, Iterator, List, Optional, Pattern, Tuple

from pydantic import BaseModel, Field, validator
from pydantic import Field, validator
from pydantic.fields import ModelField

from .appveyor import Appveyor
from .base import CISystem, EventType
from .base import CISystem, EventType, NoExtraModel, WorkflowSpec
from .github import GitHubActions
from .travis import Travis


class NoExtraModel(BaseModel):
    class Config:
        allow_population_by_field_name = True
        extra = "forbid"


class CIConfig(NoExtraModel, ABC):
    path: str

@@ -40,7 +34,14 @@ def get_system(
class GitHubConfig(CIConfig):
    artifacts_path: Optional[str] = None
    releases_path: Optional[str] = None
    workflows: Optional[List[str]] = None
    workflows: WorkflowSpec = Field(default_factory=WorkflowSpec)

    @validator("workflows", pre=True)
    def _workflow_list(cls, v: Any) -> Any:  # noqa: B902, U100
        if isinstance(v, list):
            return {"include": v}
        else:
            return v

    @staticmethod
    def get_auth_tokens() -> Dict[str, str]:
@@ -58,7 +59,7 @@ def get_system(
            since=since,
            until=until,
            token=tokens["github"],
            workflows=self.workflows,
            workflow_spec=self.workflows,
        )


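``GitHubConfig._workflow_list`` keeps the old list-of-basenames form working by rewriting a bare list into ``{"include": [...]}`` before ``WorkflowSpec`` validates it. A plain-Python sketch of that normalization over an already-parsed ``ci.github`` mapping follows; ``workflows_mapping`` is a hypothetical helper, and its empty-dict result stands in for ``Field(default_factory=WorkflowSpec)``.

```python
from typing import Any, Dict


def workflows_mapping(github_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return the mapping that would be handed to
    WorkflowSpec for a parsed ``ci.github`` section."""
    if "workflows" not in github_cfg:
        # Absent key: every field keeps its WorkflowSpec default
        # (include everything, exclude nothing, regex false).
        return {}
    v = github_cfg["workflows"]
    # Same rule as GitHubConfig._workflow_list: a bare list is shorthand
    # for {"include": [...]}; anything else is passed through unchanged.
    if isinstance(v, list):
        return {"include": v}
    return v


# YAML ``workflows: [test.yml, lint.yml]`` parses to a list; the mapping form
# ``workflows: {include: ["^test"], exclude: [docs], regex: true}`` passes through.
print(workflows_mapping({"path": "logs", "workflows": ["test.yml", "lint.yml"]}))
# {'include': ['test.yml', 'lint.yml']}
```

This is why existing configs that list workflow basenames keep their meaning: with ``regex`` left false, each listed name becomes an exact, anchored match.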
23 changes: 15 additions & 8 deletions src/tinuous/github.py
@@ -5,7 +5,7 @@
from pathlib import Path
from shutil import rmtree
import tempfile
from typing import Dict, Iterator, List, Optional, Tuple
from typing import Dict, Iterator, List, Tuple
from zipfile import ZipFile

from github import Github
@@ -15,20 +15,29 @@
from pydantic import BaseModel, Field
import requests

from .base import APIClient, Artifact, BuildAsset, BuildLog, CISystem, EventType
from .base import (
    APIClient,
    Artifact,
    BuildAsset,
    BuildLog,
    CISystem,
    EventType,
    WorkflowSpec,
)
from .util import (
    ensure_aware,
    expand_template,
    get_github_token,
    iterfiles,
    log,
    removeprefix,
    sanitize_pathname,
    stream_to_file,
)


class GitHubActions(CISystem):
    workflows: Optional[List[str]] = None
    workflow_spec: WorkflowSpec
    hash2pr: Dict[str, str] = Field(default_factory=dict)

    @staticmethod
@@ -55,11 +64,9 @@ def ghrepo(self) -> Repository:
        return self.client.get_repo(self.repo)

    def get_workflows(self) -> Iterator[Workflow]:
        if self.workflows is None:
            yield from self.ghrepo.get_workflows()
        else:
            for wffile in self.workflows:
                yield self.ghrepo.get_workflow(wffile)
        for wf in self.ghrepo.get_workflows():
            if self.workflow_spec.match(removeprefix(wf.path, ".github/workflows/")):
                yield wf

    def get_build_assets(
        self, event_types: List[EventType], artifacts: bool = False
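The rewritten ``get_workflows`` no longer looks workflows up one by one with ``get_workflow``; it lists every workflow and filters on the basename, which is what makes regexes possible. Below is a sketch of that filtering over plain path strings instead of PyGithub ``Workflow`` objects; ``select_workflow_paths`` and the local ``removeprefix`` are illustrative stand-ins, not tinuous's own functions.

```python
import re
from typing import Iterable, Iterator, List, Pattern

WORKFLOWS_DIR = ".github/workflows/"


def removeprefix(s: str, prefix: str) -> str:
    # Same behaviour as str.removeprefix (Python 3.9+).
    return s[len(prefix):] if s.startswith(prefix) else s


def select_workflow_paths(
    paths: Iterable[str], include: List[Pattern], exclude: List[Pattern]
) -> Iterator[str]:
    """Yield the workflow paths whose basename is included and not excluded."""
    for path in paths:
        basename = removeprefix(path, WORKFLOWS_DIR)
        if any(r.search(basename) for r in include) and not any(
            r.search(basename) for r in exclude
        ):
            yield path


paths = [
    ".github/workflows/test.yml",
    ".github/workflows/docs.yml",
    ".github/workflows/test-docs.yml",
]
print(list(select_workflow_paths(paths, [re.compile(r"^test")], [re.compile("docs")])))
# ['.github/workflows/test.yml']
```

Stripping ``.github/workflows/`` first means a pattern such as ``^test`` anchors at the start of the basename. One consequence of filtering a full listing is that an ``include`` entry naming no existing workflow now simply selects nothing.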