Introduces separate runtime provider schema
The provider.yaml contains more information than is required at
runtime (specifically about documentation building). Those
fields are not needed at runtime and their presence is optional.
Also, the runtime check for provider information should be more
relaxed and allow for future compatibility (i.e. it should not set
additionalProperties to false). This way we can add new,
optional fields to provider.yaml without worrying about breaking
compatibility of providers with future Airflow versions.

This change restores 'additionalProperties': false in the
main, development-focused provider.yaml schema and introduces a
new runtime schema that is used to verify the provider info when
providers are discovered by Airflow.
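
The practical difference between the two schemas is the `additionalProperties` keyword. The stdlib-only sketch below mimics just the boolean form of that keyword (it is not a JSON Schema implementation; `disallowed_properties`, the trimmed schemas and the `future-field` key are hypothetical). When `additionalProperties` is omitted, JSON Schema accepts any extra key, which is what makes the runtime schema tolerant of fields added later.

```python
from typing import Any, Dict, List

# Hypothetical, trimmed-down schemas: only the keywords this sketch inspects.
PROPERTIES: Dict[str, Any] = {"package-name": {}, "name": {}, "description": {}}
STRICT_SCHEMA: Dict[str, Any] = {
    "properties": PROPERTIES,
    "additionalProperties": False,  # development-style: unknown keys are errors
}
RUNTIME_SCHEMA: Dict[str, Any] = {
    "properties": PROPERTIES,
    # additionalProperties omitted: any extra key is accepted
}


def disallowed_properties(instance: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return the keys of `instance` rejected by a boolean `additionalProperties`."""
    if schema.get("additionalProperties", True):
        return []
    allowed = schema.get("properties", {})
    return sorted(key for key in instance if key not in allowed)


provider_info = {
    "package-name": "apache-airflow-providers-example",
    "name": "Example",
    "description": "Example provider",
    "future-field": "added by a newer provider release",  # hypothetical optional field
}

print(disallowed_properties(provider_info, STRICT_SCHEMA))   # ['future-field']
print(disallowed_properties(provider_info, RUNTIME_SCHEMA))  # []
```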

This 'runtime' version should change very rarely, because adding
a new required property to it breaks compatibility with providers
that were released before that property existed.
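
A quick way to see what a new required property does: the hypothetical helper below reports, for each provider info shipped by an already-released package, which required keys it lacks. The package names, the sample data and `hypothetical-new-field` are made up; `REQUIRED_NOW` mirrors the `required` list of the runtime schema in this commit.

```python
from typing import Dict, List

# Required keys as listed in airflow/provider_info.schema.json in this commit.
REQUIRED_NOW = ["name", "package-name", "description", "versions"]
# Hypothetical future schema that adds one more required key.
REQUIRED_NEXT = REQUIRED_NOW + ["hypothetical-new-field"]

# Provider infos as already-released packages would ship them (made-up data).
PUBLISHED: Dict[str, dict] = {
    "apache-airflow-providers-foo": {
        "name": "Foo", "package-name": "apache-airflow-providers-foo",
        "description": "Foo provider", "versions": ["1.0.0"],
    },
    "apache-airflow-providers-bar": {
        "name": "Bar", "package-name": "apache-airflow-providers-bar",
        "description": "Bar provider", "versions": ["2.1.0", "2.0.0"],
    },
}


def missing_required(published: Dict[str, dict], required: List[str]) -> Dict[str, List[str]]:
    """Map each provider package to the required keys its info does not contain."""
    report: Dict[str, List[str]] = {}
    for package, info in published.items():
        missing = [key for key in required if key not in info]
        if missing:
            report[package] = missing
    return report


print(missing_required(PUBLISHED, REQUIRED_NOW))   # {}
print(missing_required(PUBLISHED, REQUIRED_NEXT))
# {'apache-airflow-providers-foo': ['hypothetical-new-field'],
#  'apache-airflow-providers-bar': ['hypothetical-new-field']}
```

Every provider released before the new key existed fails the check, whereas a new optional key (one not listed in `required`) would leave the report empty.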

We also trim down the provider.yaml file when preparing provider
packages so that it only contains the fields that are required by
the runtime schema.
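
A minimal sketch of the trimming step, assuming it keeps every top-level key that the runtime schema mentions under either `properties` or `required`. The helper name `trim_provider_info` and the sample `integrations`/`operators` entries are illustrative and are not taken from the packaging scripts.

```python
from typing import Any, Dict

# Top-level keywords of the runtime schema added in this commit (types/descriptions omitted).
RUNTIME_SCHEMA: Dict[str, Any] = {
    "properties": {
        "package-name": {}, "name": {}, "description": {},
        "hook-class-names": {}, "extra-links": {},
    },
    "required": ["name", "package-name", "description", "versions"],
}


def trim_provider_info(provider_yaml: Dict[str, Any], schema: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the top-level keys the runtime schema knows about, in their original order."""
    known = set(schema.get("properties", {})) | set(schema.get("required", []))
    return {key: value for key, value in provider_yaml.items() if key in known}


provider_yaml = {
    "package-name": "apache-airflow-providers-example",
    "name": "Example",
    "description": "Example provider",
    "versions": ["1.0.0"],
    # Documentation-oriented sections that are not needed at runtime (illustrative).
    "integrations": [{"integration-name": "Example", "external-doc-url": "https://example.com"}],
    "operators": [{"integration-name": "Example", "python-modules": ["example.operators"]}],
    "hook-class-names": ["example.hooks.ExampleHook"],
}

print(sorted(trim_provider_info(provider_yaml, RUNTIME_SCHEMA)))
# ['description', 'hook-class-names', 'name', 'package-name', 'versions']
```

The union with `required` matters here: the runtime schema requires `versions` without describing it under `properties`, so filtering on `properties` alone would drop it.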
potiuk committed Jan 11, 2021
1 parent 7e778e1 commit fed91c6
Showing 13 changed files with 155 additions and 72 deletions.
6 changes: 1 addition & 5 deletions .pre-commit-config.yaml
@@ -500,11 +500,7 @@ repos:
           - https://json-schema.org/draft-07/schema
         language: python
         pass_filenames: true
-        files: >
-          (?x)
-          ^airflow/provider.yaml.schema.json$|
-          ^airflow/config_templates/config.yml.schema.json$|
-          ^airflow/serialization/schema.json$
+        files: .*\.schema\.json$
         require_serial: true
         additional_dependencies: ['jsonschema==3.2.0', 'PyYAML==5.3.1', 'requests==2.25.0']
       - id: json-schema
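
As far as we know, pre-commit treats `files` as a Python regular expression and keeps each repository-relative path for which `re.search` finds a match. The stdlib sketch below compares the old verbose pattern with the new one on paths that appear in this commit; `selected` is a hypothetical helper.

```python
import re
from typing import List

# Old pattern from the hook: verbose mode ignores the newlines between alternatives.
OLD_FILES = r"""(?x)
^airflow/provider.yaml.schema.json$|
^airflow/config_templates/config.yml.schema.json$|
^airflow/serialization/schema.json$
"""
# New pattern from the hook.
NEW_FILES = r".*\.schema\.json$"

PATHS = [
    "airflow/provider.yaml.schema.json",
    "airflow/provider_info.schema.json",
    "airflow/config_templates/config.yml.schema.json",
    "airflow/serialization/schema.json",
]


def selected(pattern: str, paths: List[str]) -> List[str]:
    """Return the paths the pattern finds a match in (search, not fullmatch)."""
    regex = re.compile(pattern)
    return [path for path in paths if regex.search(path)]


print(selected(OLD_FILES, PATHS))
print(selected(NEW_FILES, PATHS))
```

The new pattern picks up `provider_info.schema.json` (and any future `*.schema.json`) automatically. Going by the regex alone, however, it no longer matches `airflow/serialization/schema.json`, because that name has no `.` before `schema`.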
2 changes: 1 addition & 1 deletion MANIFEST.in
@@ -32,7 +32,7 @@ global-exclude __pycache__ *.pyc
 include airflow/alembic.ini
 include airflow/api_connexion/openapi/v1.yaml
 include airflow/git_version
-include airflow/provider.yaml.schema.json
+include airflow/provider_info.schema.json
 include airflow/customized_form_field_behaviours.schema.json
 include airflow/serialization/schema.json
 include airflow/utils/python_virtualenv_script.jinja2
2 changes: 1 addition & 1 deletion TESTING.rst
@@ -973,7 +973,7 @@ If ``current`` is specified (default), then the current version of Airflow is us
 Otherwise, the released version of Airflow is installed.
 
 The ``-install-airflow-version=<VERSION>`` command make sure that the current (from sources) version of
-Airflow is removed and the released version of Airflow from ``Pypi`` is installed. Note that tests sources
+Airflow is removed and the released version of Airflow from ``PyPI`` is installed. Note that tests sources
 are not removed and they can be used to run tests (unit tests and system tests) against the
 freshly installed version.
 
2 changes: 1 addition & 1 deletion airflow/deprecated_schemas/provider-2.0.0.yaml.schema.json
@@ -15,7 +15,7 @@
       "type": "string"
     },
     "versions": {
-      "description": "List of available versions in Pypi. Sorted descending according to release date.",
+      "description": "List of available versions in PyPI. Sorted descending according to release date.",
       "type": "array",
       "items": {
         "type": "string"
12 changes: 6 additions & 6 deletions airflow/provider.yaml.schema.json
@@ -15,7 +15,7 @@
       "type": "string"
     },
     "versions": {
-      "description": "List of available versions in Pypi. Sorted descending according to release date.",
+      "description": "List of available versions in PyPI. Sorted descending according to release date.",
       "type": "array",
       "items": {
         "type": "string"
@@ -68,7 +68,7 @@
             "maxItems": 1
           }
         },
-        "additionalProperties": true,
+        "additionalProperties": false,
         "required": [
           "integration-name",
           "external-doc-url",
@@ -93,7 +93,7 @@
             }
           }
         },
-        "additionalProperties": true,
+        "additionalProperties": false,
         "required": [
           "integration-name",
           "python-modules"
@@ -141,7 +141,7 @@
             }
           }
         },
-        "additionalProperties": true,
+        "additionalProperties": false,
         "required": [
           "integration-name",
           "python-modules"
@@ -170,7 +170,7 @@
             "description": "List of python modules containing the transfers."
           }
         },
-        "additionalProperties": true,
+        "additionalProperties": false,
         "required": [
           "source-integration-name",
           "target-integration-name",
@@ -193,7 +193,7 @@
       }
     }
   },
-  "additionalProperties": true,
+  "additionalProperties": false,
   "required": [
     "name",
     "package-name",
38 changes: 38 additions & 0 deletions airflow/provider_info.schema.json
@@ -0,0 +1,38 @@
+{
+  "$schema": "http://json-schema.org/draft-07/schema#",
+  "type": "object",
+  "properties": {
+    "package-name": {
+      "description": "Package name available under which the package is available in the PyPI repository.",
+      "type": "string"
+    },
+    "name": {
+      "description": "Provider name",
+      "type": "string"
+    },
+    "description": {
+      "description": "Information about the package in RST format",
+      "type": "string"
+    },
+    "hook-class-names": {
+      "type": "array",
+      "description": "Hook class names that provide connection types to core",
+      "items": {
+        "type": "string"
+      }
+    },
+    "extra-links": {
+      "type": "array",
+      "description": "Class name that provide extra link functionality",
+      "items": {
+        "type": "string"
+      }
+    }
+  },
+  "required": [
+    "name",
+    "package-name",
+    "description",
+    "versions"
+  ]
+}
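
Airflow validates provider info against this file with `jsonschema`. To show what the schema does and does not enforce without that dependency, the sketch below hand-checks only the keywords this schema uses (`type`, `properties`, `items`, `required`). `validate_subset` is hypothetical and is not a general JSON Schema validator. Because the schema sets no `additionalProperties`, unknown keys are simply ignored, and `versions` is required but not type-checked since it has no entry under `properties`.

```python
from typing import Any, Dict, List

# Copy of the schema above, minus "$schema" and the description strings.
PROVIDER_INFO_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "package-name": {"type": "string"},
        "name": {"type": "string"},
        "description": {"type": "string"},
        "hook-class-names": {"type": "array", "items": {"type": "string"}},
        "extra-links": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "package-name", "description", "versions"],
}

# Python types accepted for each JSON Schema "type" used above.
_PY_TYPES = {"object": dict, "string": str, "array": list}


def validate_subset(instance: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Return error messages; an empty list means `instance` passed these checks."""
    expected = schema.get("type")
    if expected is not None and not isinstance(instance, _PY_TYPES[expected]):
        return [f"{path}: expected {expected}"]
    errors: List[str] = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property {key!r}")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate_subset(instance[key], subschema, f"{path}.{key}"))
    if isinstance(instance, list) and "items" in schema:
        for index, item in enumerate(instance):
            errors.extend(validate_subset(item, schema["items"], f"{path}[{index}]"))
    return errors


good = {
    "package-name": "apache-airflow-providers-example",
    "name": "Example",
    "description": "Example provider",
    "versions": ["1.0.0"],
    "hook-class-names": ["example.hooks.ExampleHook"],
    "some-future-field": True,  # hypothetical extra key: tolerated
}
bad = {"package-name": "apache-airflow-providers-example", "name": "Example", "hook-class-names": "not-a-list"}

print(validate_subset(good, PROVIDER_INFO_SCHEMA))  # []
for message in validate_subset(bad, PROVIDER_INFO_SCHEMA):
    print(message)
# $: missing required property 'description'
# $: missing required property 'versions'
# $.hook-class-names: expected array
```
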
17 changes: 10 additions & 7 deletions airflow/providers_manager.py
@@ -39,9 +39,9 @@
 log = logging.getLogger(__name__)
 
 
-def _create_provider_schema_validator():
-    """Creates JSON schema validator from the provider.yaml.schema.json"""
-    schema = json.loads(importlib_resources.read_text('airflow', 'provider.yaml.schema.json'))
+def _create_provider_info_schema_validator():
+    """Creates JSON schema validator from the provider_info.schema.json"""
+    schema = json.loads(importlib_resources.read_text('airflow', 'provider_info.schema.json'))
     cls = jsonschema.validators.validator_for(schema)
     validator = cls(schema)
     return validator
@@ -106,15 +106,15 @@ def __init__(self):
         # Customizations for javascript fields are kept here
         self._field_behaviours: Dict[str, Dict] = {}
         self._extra_link_class_name_set: Set[str] = set()
-        self._provider_schema_validator = _create_provider_schema_validator()
+        self._provider_schema_validator = _create_provider_info_schema_validator()
         self._customized_form_fields_schema_validator = (
             _create_customized_form_field_behaviours_schema_validator()
         )
         self._initialized = False
 
     def initialize_providers_manager(self):
         """Lazy initialization of provider data."""
-        # We cannot use @cache here because it does not work during pytests, apparently each test
+        # We cannot use @cache here because it does not work during pytest, apparently each test
         # runs it it's own namespace and ProvidersManager is a different object in each namespace
         # even if it is singleton but @cache on the initialize_providers_manager message still works in the
         # way that it is called only once for one of the objects (at least this is how it looks like
@@ -139,7 +139,10 @@ def _discover_all_providers_from_packages(self) -> None:
         """
         Discovers all providers by scanning packages installed. The list of providers should be returned
         via the 'apache_airflow_provider' entrypoint as a dictionary conforming to the
-        'airflow/provider.yaml.schema.json' schema.
+        'airflow/provider_info.schema.json' schema. Note that the schema is different at runtime
+        than provider.yaml.schema.json. The development version of provider schema is more strict and changes
+        together with the code. The runtime version is more relaxed (allows for additional properties)
+        and verifies only the subset of fields that are needed at runtime.
         """
         for entry_point, dist in entry_points_with_dist('apache_airflow_provider'):
             package_name = dist.metadata['name']
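
The discovery described in the docstring can be approximated with the standard library: a provider distribution advertises an entry point in the `apache_airflow_provider` group, presumably pointing at a zero-argument callable that returns the provider-info dict. This is a hypothetical sketch: `provider_entry_points` and `collect_provider_infos` are not Airflow functions (Airflow uses its own `entry_points_with_dist`), the required-key check stands in for the `jsonschema` validator, and skipping invalid providers with a warning is an assumed policy.

```python
import importlib.metadata
import logging
from typing import Callable, Dict, Iterable, List, Tuple

log = logging.getLogger(__name__)

PROVIDER_GROUP = "apache_airflow_provider"
# Required keys of the runtime schema in this commit; stands in for full schema validation.
REQUIRED = ["name", "package-name", "description", "versions"]

Loader = Callable[[], dict]


def provider_entry_points(group: str = PROVIDER_GROUP) -> List[Tuple[str, Loader]]:
    """Return (distribution name, loader) pairs for installed entry points in `group`."""
    pairs: List[Tuple[str, Loader]] = []
    for dist in importlib.metadata.distributions():
        for entry_point in dist.entry_points:
            if entry_point.group == group:
                # Assumption: the entry point targets a callable returning the provider dict.
                pairs.append((dist.metadata["Name"], lambda ep=entry_point: ep.load()()))
    return pairs


def collect_provider_infos(entries: Iterable[Tuple[str, Loader]]) -> Dict[str, dict]:
    """Call each loader and keep the provider infos that contain every required key."""
    providers: Dict[str, dict] = {}
    for package_name, load_info in entries:
        info = load_info()
        missing = [key for key in REQUIRED if key not in info]
        if missing:
            log.warning("Skipping provider %s: missing %s", package_name, missing)
            continue
        providers[package_name] = info
    return providers


# Made-up entries standing in for installed provider packages.
fake_entries = [
    ("apache-airflow-providers-good", lambda: {
        "name": "Good", "package-name": "apache-airflow-providers-good",
        "description": "Valid provider", "versions": ["1.0.0"],
    }),
    ("apache-airflow-providers-broken", lambda: {"name": "Broken"}),
]
print(sorted(collect_provider_infos(fake_entries)))  # ['apache-airflow-providers-good']
# On a real installation: collect_provider_infos(provider_entry_points())
```
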
@@ -194,7 +197,7 @@ def _add_provider_info_from_local_source_files_on_path(self, path) -> None:
         for folder, subdirs, files in os.walk(path, topdown=True):
             for filename in fnmatch.filter(files, "provider.yaml"):
                 package_name = "apache-airflow-providers" + folder[len(root_path) :].replace(os.sep, "-")
-                # We are skipping discovering snowflake because of snowflake monkeypatching problem
+                # We are skipping discovering snowflake because of snowflake monkey-patching problem
                 # This is only for local development - it has no impact for the packaged snowflake provider
                 # That should work on its own
                 # https://github.com/apache/airflow/issues/12881
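
For local development, providers are found by walking a source tree for `provider.yaml` files and deriving the package name from the folder path. The runnable sketch below reproduces only that naming rule, assuming `root_path` equals the walked `path` (its definition lies outside this hunk). The function name is hypothetical, and the snowflake skip is not reproduced.

```python
import fnmatch
import os
import tempfile
from typing import List


def local_provider_package_names(path: str) -> List[str]:
    """Derive a package name for every folder under `path` that contains a provider.yaml."""
    root_path = path  # assumption: the real method's root_path is the walked path
    names = []
    for folder, _subdirs, files in os.walk(path, topdown=True):
        for _filename in fnmatch.filter(files, "provider.yaml"):
            # <root>/cncf/kubernetes -> "-cncf-kubernetes"
            names.append("apache-airflow-providers" + folder[len(root_path):].replace(os.sep, "-"))
    return sorted(names)


with tempfile.TemporaryDirectory() as root:
    for relative in (("cncf", "kubernetes"), ("http",), ("snowflake",)):
        folder = os.path.join(root, *relative)
        os.makedirs(folder)
        with open(os.path.join(folder, "provider.yaml"), "w") as handle:
            handle.write("package-name: placeholder\n")
    found = local_provider_package_names(root)

print(found)
# ['apache-airflow-providers-cncf-kubernetes', 'apache-airflow-providers-http', 'apache-airflow-providers-snowflake']
```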