Add metadata validator (#24198)
* Add beginning of lib folder

* Generate models from poetry command

* Run the validation script

* Add the catalog overrides type

* Add test for valid metadata files

* Add error state tests

* Expand valid and invalid test cases

* Update readme

* Run formatter

* Delete remaining catalogs
bnchrch authored and erohmensing committed Mar 22, 2023
1 parent 2c77227 commit d549cb6
Showing 29 changed files with 1,854 additions and 0 deletions.
36 changes: 36 additions & 0 deletions airbyte-ci/connectors_ci/metadata_service/lib/README.md
@@ -0,0 +1,36 @@
# Connector Metadata Service Library

This submodule is responsible for managing all the logic related to validating, uploading, and managing connector metadata.

## Installation

This submodule uses Poetry to manage its dependencies. To install them, run:

```bash
poetry install
```


## Generating Models

This submodule includes a tool for generating Python models from JSON Schema specifications, built on the [datamodel-code-generator](https://github.com/koxudaxi/datamodel-code-generator) library. The generated models are stored in `models/generated`.

To generate the models, run the following command:

```bash
poetry poe generate-models
```

This will read the JSON Schema specifications in `models/src` and generate Python models in `models/generated`.
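
The `generate-models` task itself is defined in the project's Poe configuration, which is not shown here. As a rough sketch of what an equivalent step might look like, the snippet below builds one `datamodel-codegen` command for a single schema file. The helper name `build_codegen_command`, the one-output-file-per-schema layout, and the choice of flags are assumptions for illustration, not the project's actual task definition.

```python
from pathlib import PurePosixPath


def build_codegen_command(schema_path: str, output_dir: str) -> list:
    """Build a datamodel-codegen command for one schema file (hypothetical helper)."""
    schema = PurePosixPath(schema_path)
    # Assumption: each schema file maps to one generated module of the same name.
    output = PurePosixPath(output_dir) / f"{schema.stem}.py"
    return [
        "datamodel-codegen",
        "--input", str(schema),
        "--output", str(output),
        "--input-file-type", "jsonschema",
    ]


command = build_codegen_command(
    "models/src/ConnectorMetadataDefinitionV0.yaml", "models/generated"
)
print(" ".join(command))
```

The resulting list could be passed to `subprocess.run` inside the Poetry environment; the Poe task remains the supported entry point.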


## Running Tests
```bash
poetry run pytest
```

## Validating Metadata Files
```bash
poetry run validate_metadata_file tests/fixtures/valid/metadata_catalog_override.yaml
```
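
To give a feel for what validation checks, the sketch below mirrors a few rules from the `ConnectorMetadataDefinitionV0` schema (required fields, the `connectorType` and `releaseStage` enums, the UUID format of `definitionId`, and no extra top-level keys) using only the standard library. It is not this submodule's implementation: the function name `find_metadata_errors` and every sample value are hypothetical, and the optional fields and nested schemas are not covered.

```python
import uuid

REQUIRED_TOP_LEVEL = {"metadataSpecVersion", "data"}
REQUIRED_DATA_FIELDS = {
    "name", "definitionId", "connectorType", "dockerRepository",
    "dockerImageTag", "license", "supportUrl", "githubIssueLabel",
    "sourceType", "releaseStage",
}
CONNECTOR_TYPES = ("destination", "source")
RELEASE_STAGES = ("alpha", "beta", "generally_available", "source")


def find_metadata_errors(metadata: dict) -> list:
    """Return a sorted list of problems; an empty list means these checks passed."""
    errors = []
    for key in REQUIRED_TOP_LEVEL - set(metadata):
        errors.append(f"missing required top-level field '{key}'")
    # The schema sets additionalProperties: false at the top level.
    for key in set(metadata) - REQUIRED_TOP_LEVEL:
        errors.append(f"unexpected top-level field '{key}'")

    data = metadata.get("data")
    if not isinstance(data, dict):
        return sorted(errors)

    for key in REQUIRED_DATA_FIELDS - set(data):
        errors.append(f"data: missing required field '{key}'")
    if "connectorType" in data and data["connectorType"] not in CONNECTOR_TYPES:
        errors.append("data: connectorType must be one of " + ", ".join(CONNECTOR_TYPES))
    if "releaseStage" in data and data["releaseStage"] not in RELEASE_STAGES:
        errors.append("data: releaseStage must be one of " + ", ".join(RELEASE_STAGES))
    if "definitionId" in data:
        try:
            uuid.UUID(str(data["definitionId"]))
        except ValueError:
            errors.append("data: definitionId is not a valid UUID")
    return sorted(errors)


# Hypothetical metadata, for illustration only.
sample = {
    "metadataSpecVersion": "1.0",
    "data": {
        "name": "Example Source",
        "definitionId": "00000000-0000-0000-0000-000000000000",
        "connectorType": "source",
        "dockerRepository": "example/source-example",
        "dockerImageTag": "0.1.0",
        "license": "MIT",
        "supportUrl": "https://example.com/docs",
        "githubIssueLabel": "source-example",
        "sourceType": "api",
        "releaseStage": "alpha",
    },
}
print(find_metadata_errors(sample))
```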
@@ -0,0 +1,154 @@
# generated by datamodel-codegen:
#   filename:  ConnectorMetadataDefinitionV0.yaml

from __future__ import annotations

from enum import Enum
from typing import List, Optional
from uuid import UUID

from pydantic import AnyUrl, BaseModel, Extra, Field


class ConnectorType(Enum):
    destination = 'destination'
    source = 'source'


class ReleaseStage(Enum):
    alpha = 'alpha'
    beta = 'beta'
    generally_available = 'generally_available'
    source = 'source'


class AllowedHosts(BaseModel):
    class Config:
        extra = Extra.allow

    hosts: Optional[List[str]] = Field(
        None,
        description='An array of hosts that this connector can connect to. AllowedHosts not being present for the source or destination means that access to all hosts is allowed. An empty list here means that no network access is granted.',
    )


class NormalizationDestinationDefinitionConfig(BaseModel):
    class Config:
        extra = Extra.allow

    normalizationRepository: str = Field(
        ...,
        description='a field indicating the name of the repository to be used for normalization. If the value of the flag is NULL - normalization is not used.',
    )
    normalizationTag: str = Field(
        ...,
        description='a field indicating the tag of the docker repository to be used for normalization.',
    )
    normalizationIntegrationType: str = Field(
        ...,
        description='a field indicating the type of integration dialect to use for normalization.',
    )


class SuggestedStreams(BaseModel):
    class Config:
        extra = Extra.allow

    streams: Optional[List[str]] = Field(
        None,
        description='An array of streams that this connector suggests the average user will want. SuggestedStreams not being present for the source means that all streams are suggested. An empty list here means that no streams are suggested.',
    )


class ResourceRequirements(BaseModel):
    class Config:
        extra = Extra.forbid

    cpu_request: Optional[str] = None
    cpu_limit: Optional[str] = None
    memory_request: Optional[str] = None
    memory_limit: Optional[str] = None


class JobType(Enum):
    get_spec = 'get_spec'
    check_connection = 'check_connection'
    discover_schema = 'discover_schema'
    sync = 'sync'
    reset_connection = 'reset_connection'
    connection_updater = 'connection_updater'
    replicate = 'replicate'


class JobTypeResourceLimit(BaseModel):
    class Config:
        extra = Extra.forbid

    jobType: JobType
    resourceRequirements: ResourceRequirements


class ActorDefinitionResourceRequirements(BaseModel):
    class Config:
        extra = Extra.forbid

    default: Optional[ResourceRequirements] = Field(
        None,
        description='if set, these are the requirements that should be set for ALL jobs run for this actor definition.',
    )
    jobSpecific: Optional[List[JobTypeResourceLimit]] = None


class CatalogOverrides(BaseModel):
    class Config:
        extra = Extra.forbid

    enabled: bool
    name: Optional[str] = None
    dockerRepository: Optional[str] = None
    dockerImageTag: Optional[str] = None
    supportsDbt: Optional[bool] = None
    supportsNormalization: Optional[bool] = None
    license: Optional[str] = None
    supportUrl: Optional[AnyUrl] = None
    sourceType: Optional[str] = None
    allowedHosts: Optional[AllowedHosts] = None
    normalizationConfig: Optional[NormalizationDestinationDefinitionConfig] = None
    suggestedStreams: Optional[SuggestedStreams] = None
    resourceRequirements: Optional[ActorDefinitionResourceRequirements] = None


class Catalog(BaseModel):
    class Config:
        extra = Extra.forbid

    oss: Optional[CatalogOverrides] = None
    cloud: Optional[CatalogOverrides] = None


class Data(BaseModel):
    name: str
    definitionId: UUID
    connectorType: ConnectorType
    dockerRepository: str
    dockerImageTag: str
    supportsDbt: Optional[bool] = None
    supportsNormalization: Optional[bool] = None
    license: str
    supportUrl: AnyUrl
    githubIssueLabel: str
    sourceType: str
    releaseStage: ReleaseStage
    catalogs: Optional[Catalog] = None
    allowedHosts: Optional[AllowedHosts] = None
    normalizationConfig: Optional[NormalizationDestinationDefinitionConfig] = None
    suggestedStreams: Optional[SuggestedStreams] = None
    resourceRequirements: Optional[ActorDefinitionResourceRequirements] = None


class ConnectorMetadataDefinitionV0(BaseModel):
    class Config:
        extra = Extra.forbid

    metadataSpecVersion: str
    data: Data
@@ -0,0 +1,30 @@
---
"$schema": http://json-schema.org/draft-07/schema#
"$id": https://github.com/airbytehq/airbyte-types/blob/master/models/src/main/resources/ActorDefinitionResourceRequirements.yaml
title: ActorDefinitionResourceRequirements
description: actor definition specific resource requirements
type: object
# set to false because we need the validations on seeds to be strict. otherwise, we will just add whatever is in the seed file into the db.
additionalProperties: false
properties:
  default:
    description: if set, these are the requirements that should be set for ALL jobs run for this actor definition.
    "$ref": ResourceRequirements.yaml
  jobSpecific:
    type: array
    items:
      "$ref": "#/definitions/JobTypeResourceLimit"
definitions:
  JobTypeResourceLimit:
    description: sets resource requirements for a specific job type for an actor definition. these values override the default, if both are set.
    type: object
    # set to false because we need the validations on seeds to be strict. otherwise, we will just add whatever is in the seed file into the db.
    additionalProperties: false
    required:
      - jobType
      - resourceRequirements
    properties:
      jobType:
        "$ref": JobType.yaml
      resourceRequirements:
        "$ref": ResourceRequirements.yaml
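
The `JobTypeResourceLimit` description says that job-specific values override the default when both are set. One possible reading of that rule is sketched below. The function name `resolve_requirements`, the field-by-field merge (rather than replacing the default wholesale), and the sample values are assumptions; the schema does not state how the platform actually combines the two.

```python
def resolve_requirements(actor_requirements: dict, job_type: str) -> dict:
    """Combine default and job-specific resource requirements (hypothetical helper)."""
    # Start from the defaults, which apply to all job types.
    resolved = dict(actor_requirements.get("default") or {})
    for limit in actor_requirements.get("jobSpecific") or []:
        if limit["jobType"] == job_type:
            # Assumption: only the fields that are set override the default.
            overrides = limit["resourceRequirements"]
            resolved.update({k: v for k, v in overrides.items() if v is not None})
    return resolved


# Illustrative values only.
requirements = {
    "default": {"cpu_request": "0.5", "memory_limit": "1Gi"},
    "jobSpecific": [
        {"jobType": "sync", "resourceRequirements": {"memory_limit": "2Gi"}},
    ],
}
print(resolve_requirements(requirements, "sync"))
print(resolve_requirements(requirements, "check_connection"))
```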
@@ -0,0 +1,13 @@
---
"$schema": http://json-schema.org/draft-07/schema#
"$id": https://github.com/airbytehq/airbyte-types/blob/master/models/src/main/resources/AllowedHosts.yaml
title: AllowedHosts
description: A connector's allowed hosts. If present, the platform will limit communication to only hosts which are listed in `AllowedHosts.hosts`.
type: object
additionalProperties: true
properties:
  hosts:
    type: array
    description: An array of hosts that this connector can connect to. AllowedHosts not being present for the source or destination means that access to all hosts is allowed. An empty list here means that no network access is granted.
    items:
      type: string
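
The difference between a missing `AllowedHosts` and an empty `hosts` list is easy to get wrong, so here is a small sketch of the semantics described above. The function name `is_host_allowed` is hypothetical, and two behaviors are assumptions not stated in the schema: hosts are compared by exact string match, and an `AllowedHosts` object without a `hosts` key is treated like a missing one.

```python
from typing import Optional


def is_host_allowed(allowed_hosts: Optional[dict], host: str) -> bool:
    """Apply the AllowedHosts rules to one host (hypothetical helper)."""
    if allowed_hosts is None:
        # AllowedHosts not present: access to all hosts is allowed.
        return True
    hosts = allowed_hosts.get("hosts")
    if hosts is None:
        # Assumption: an object without a `hosts` key behaves like a missing one.
        return True
    # An empty list grants no network access; otherwise use exact matching
    # (assumption: no wildcard or pattern support).
    return host in hosts


print(is_host_allowed(None, "api.example.com"))
print(is_host_allowed({"hosts": []}, "api.example.com"))
print(is_host_allowed({"hosts": ["api.example.com"]}, "api.example.com"))
```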
@@ -0,0 +1,38 @@
---
"$schema": http://json-schema.org/draft-07/schema#
"$id": https://github.com/airbytehq/airbyte/airbyte-ci/connectors_ci/metadata_service/lib/models/src/CatalogOverrides.yml
title: CatalogOverrides
description: describes the overrides per catalog of a connector
type: object
additionalProperties: false
required:
  - enabled
properties:
  enabled:
    type: boolean
    default: false
  name:
    type: string
  dockerRepository:
    type: string
  dockerImageTag:
    type: string
  supportsDbt:
    type: boolean
  supportsNormalization:
    type: boolean
  license:
    type: string
  supportUrl:
    type: string
    format: uri
  sourceType:
    type: string
  allowedHosts:
    $ref: AllowedHosts.yaml
  normalizationConfig:
    "$ref": NormalizationDestinationDefinitionConfig.yaml
  suggestedStreams:
    $ref: SuggestedStreams.yaml
  resourceRequirements:
    $ref: ActorDefinitionResourceRequirements.yaml
@@ -0,0 +1,83 @@
---
"$schema": http://json-schema.org/draft-07/schema#
"$id": https://github.com/airbytehq/airbyte/airbyte-ci/connectors_ci/metadata_service/lib/models/src/ConnectorMetadataDefinitionV0.yml

title: ConnectorMetadataDefinitionV0
description: describes the metadata of a connector
type: object
required:
  - metadataSpecVersion
  - data
additionalProperties: false
properties:
  metadataSpecVersion:
    type: "string"
  data:
    type: object
    required:
      - name
      - definitionId
      - connectorType
      - dockerRepository
      - dockerImageTag
      - license
      - supportUrl
      - githubIssueLabel
      - sourceType
      - releaseStage
    properties:
      name:
        type: string
      definitionId:
        type: string
        format: uuid
      connectorType:
        type: string
        enum:
          - destination
          - source
      dockerRepository:
        type: string
      dockerImageTag:
        type: string
      supportsDbt:
        type: boolean
      supportsNormalization:
        type: boolean
      license:
        type: string
      supportUrl:
        type: string
        format: uri
      githubIssueLabel:
        type: string
      sourceType:
        type: string
      releaseStage:
        type: string
        enum:
          - alpha
          - beta
          - generally_available
          - source

      catalogs:
        anyOf:
          - type: object
            additionalProperties: false
            properties:
              oss:
                anyOf:
                  - "$ref": CatalogOverrides.yaml
              cloud:
                anyOf:
                  - "$ref": CatalogOverrides.yaml

      allowedHosts:
        "$ref": AllowedHosts.yaml
      normalizationConfig:
        "$ref": NormalizationDestinationDefinitionConfig.yaml
      suggestedStreams:
        "$ref": SuggestedStreams.yaml
      resourceRequirements:
        "$ref": ActorDefinitionResourceRequirements.yaml
@@ -0,0 +1,14 @@
---
"$schema": http://json-schema.org/draft-07/schema#
"$id": https://github.com/airbytehq/airbyte-types/blob/master/models/src/main/resources/JobType.yaml
title: JobType
description: enum that describes the different types of jobs that the platform runs.
type: string
enum:
  - get_spec
  - check_connection
  - discover_schema
  - sync
  - reset_connection
  - connection_updater
  - replicate
@@ -0,0 +1,21 @@
---
"$schema": http://json-schema.org/draft-07/schema#
"$id": https://github.com/airbytehq/airbyte-types/blob/master/models/src/main/resources/NormalizationDestinationDefinitionConfig.yaml
title: NormalizationDestinationDefinitionConfig
description: describes a normalization config for destination definition
type: object
required:
  - normalizationRepository
  - normalizationTag
  - normalizationIntegrationType
additionalProperties: true
properties:
  normalizationRepository:
    type: string
    description: a field indicating the name of the repository to be used for normalization. If the value of the flag is NULL - normalization is not used.
  normalizationTag:
    type: string
    description: a field indicating the tag of the docker repository to be used for normalization.
  normalizationIntegrationType:
    type: string
    description: a field indicating the type of integration dialect to use for normalization.