-
Notifications
You must be signed in to change notification settings - Fork 4.3k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
2 changed files
with
75 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
# Config Database | ||
* `workspace` | ||
* Each record represents a logical workspace for an Airbyte user. In the open-source version of the product, only one workspace is allowed. | ||
* `actor_definition` | ||
* Each record represents a connector that Airbyte supports, e.g. Postgres. This table represents all the connectors that is supported by the current running platform. | ||
* The `actor_type` column tells us whether the record represents a Source or a Destination. | ||
* The `spec` column is a JSON blob. The schema of this JSON blob matches the [spec](airbyte-protocol.md#actor-specification) model in the Airbyte Protocol. Because the protocol object is JSON, this has to be a JSON blob. | ||
* The `release_stage` describes the certification level of the connector (e.g. Alpha, Beta, Generally Available). | ||
* The `docker_repository` field is the name of the docker image associated with the connector definition. `docker_image_tag` is the tag of the docker image and the version of the connector definition. | ||
* The `source_type` field is only used for Sources, and represents the category of the connector definition (e.g. API, Database). | ||
* The `resource_requirements` field sets a default resource requirement for any connector of this type. This overrides the default we set for all connector definitions, and it can be overridden by a connection-specific resource requirement. The column is a JSON blob with the schema defined in [ActorDefinitionResourceRequirements.yaml](airbyte-config/config-models/src/main/resources/types/ActorDefinitionResourceRequirements.yaml) | ||
* The `public` boolean column, describes if a connector is available to all workspaces or not. For non, `public` connector definitions, they can be provisioned to a workspace using the `actor_definition_workspace_grant` table. `custom` means that the connector is written by a user of the platform (and not packaged into the Airbyte product). | ||
* Each record contains additional metadata and display data about a connector (e.g. `name` and `icon`), and we should add additional metadata here over time. | ||
* `actor_definition_workspace_grant` | ||
* Each record represents provisioning a non `public` connector definition to a workspace. | ||
* todo (cgardens) - should this table have a `created_at` column? | ||
* `actor` | ||
* Each record represents a configured connector. e.g. A Postgres connector configured to pull data from my database. | ||
* The `actor_type` column tells us whether the record represents a Source or a Destination. | ||
* The `actor_definition_id` column is a foreign key to the connector definition that this record is implementing. | ||
* The `configuration` column is a JSON blob. The schema of this JSON blob matches the schema specified in the `spec` column in the `connectionSpecification` field of the JSON blob. Keep in mind this schema is specific to each connector (e.g. the schema of Postgres and Salesforce are different), which is why this column has to be a JSON blob. | ||
* `actor_catalog` | ||
* Each record contains a catalog for an actor. The records in this table are meant to be immutable. | ||
* The `catalog` column is a JSON blob. The schema of this JSON blob matches the [catalog](airbyte-protocol.md#catalog) model in the Airbyte Protocol. Because the protocol object is JSON, this has to be a JSON blob. The `catalog_hash` column is a 32-bit murmur3 hash ( x86 variant) of the `catalog` field to make comparisons easier. | ||
* todo (cgardens) - should we remove the `modified_at` column? These records should be immutable. | ||
* `actor_catalog_fetch_event` | ||
* Each record represents an attempt to fetch the catalog for an actor. The records in this table are meant to be immutable. | ||
* The `actor_id` column represents the actor that the catalog is being fetched for. The `config_hash` represents a hash (32-bit murmur3 hash - x86 variant) of the `configuration` column of that actor, at the time the attempt to fetch occurred. | ||
* The `catalog_id` is a foreign key to the `actor_catalog` table. It represents the catalog fetched by this attempt. We use the foreign key, because the catalogs are often large and often multiple fetch events result in retrieving the same catalog. Also understanding how often the same catalog is fetched is interesting from a product analytics point of view. | ||
* The `actor_version` column represents the `actor_definition` version that was in use when the fetch event happened. This column is needed, because while we can infer the `actor_definition` from the foreign key relationship with the `actor` table, we cannot do the same for the version, as that can change over time. | ||
* todo (cgardens) - should we remove the `modified_at` column? These records should be immutable. | ||
* `connection` | ||
* Each record in this table configures a connection (`source_id`, `destination_id`, and relevant configuration). | ||
* The `resource_requirements` field sets a default resource requirement for the connection. This overrides the default we set for all connector definitions and the default set for the connector definitions. The column is a JSON blob with the schema defined in [ResourceRequirements.yaml](airbyte-config/config-models/src/main/resources/types/ResourceRequirements.yaml). | ||
* The `source_catalog_id` column is a foreign key to the `sourc_catalog` table and represents the catalog that was used to configure the connection. This should not be confused with the `catalog` column which contains the [ConfiguredCatalog](airbyte-protocol.md#catalog) for the connection. | ||
* The `schedule_type` column defines what type of schedule is being used. If the `type` is manual, then `schedule_data` will be null. Otherwise, `schedule_data` column is a JSON blob with the schema of [StandardSync#scheduleData](airbyte-config/config-models/src/main/resources/types/StandardSync.yaml#79) that defines the actual schedule. The columns `manual` and `schedule` are deprecated and should be ignored (they will be dropped soon). | ||
* The `namespace_type` column configures whether the namespace for the connection should use that defined by the source, the destination, or a user-defined format (`custom`). If `custom` the `namespace_format` column defines the string that will be used as the namespace. | ||
* The `status` column describes the activity level of the connector: `active` - current schedule is respected, `inactive` - current schedule is ignored (the connection does not run) but it could be switched back to active, and `deprecated` - the connection is permanently off (cannot be moved to active or inactive). | ||
* `state` | ||
* The `state` table represents the current (last) state for a connection. For a connection with `stream` state, there will be a record per stream. For a connection with `global` state, there will be a record per stream and an additional record to store the shared (global) state. For a connection with `legacy` state, there will be one record per connection. | ||
* In the `stream` and `global` state cases, the `stream_name` and `namespace` columns contains the name of the stream whose state is represented by that record. For the shared state in global `stream_name` and `namespace` will be null. | ||
* The `state` column contains the state JSON blob. Depending on the type of the connection, the schema of the blob will be different. | ||
* `stream` - for this type, this column is a JSON blob that is a blackbox to the platform and known only to the connector that generated it. | ||
* `global` - for this type, this column is a JSON blob that is a blackbox to the platform and known only to the connector that generated it. This is true for both the states for each stream and the shared state. | ||
* `legacy` - for this type, this column is a JSON blob with a top-level key called `state`. Within that `state` is a blackbox to the platform and known only to the connector that generated it. | ||
* The `type` column describes the type of the state of the row. type can be `STREAM`, `GLOBAL` or `LEGACY`. | ||
* The connection_id is a foreign key to the connection for which we are tracking state. | ||
* `stream_reset` | ||
* Each record in this table represents a stream in a connection that is enqueued to be reset or is currently being reset. It can be thought of as a queue. Once the stream is reset, the record is removed from the table. | ||
* `operation` | ||
* The `operation` table transformations for a connection beyond the raw output produced by the destination. The two options are: `normalization`, which outputs Airbyte's basic normalization. The second is `dbt`, which allows a user to configure their own custom dbt transformation. A connection can have multiple operations (e.g. it can do `normalization` and `dbt`). | ||
* If the `operation` is `dbt`, then the `operator_dbt` column will be populated with a JSON blob with the schema from [OperatorDbt](airbyte-config/config-models/src/main/resources/types/OperatorDbt.yaml). | ||
* If the `operation` is `normalization`, then the `operator_dbt` column will be populated with a JSON blob with the scehma from [OperatorNormalization](airbyte-config/config-models/src/main/resources/types/OperatorNormalization.yaml). | ||
* Operations are scoped by workspace, using the `workspace_id` column. | ||
* `connection_operation` | ||
* This table joins the `operation` table to the `connection` for which it is configured. | ||
* `workspace_service_account` | ||
* This table is a WIP for an unfinished feature. | ||
* `actor_oauth_parameter` | ||
* The name of this table is misleading. It refers to parameters to be used for any instance of an `actor_definition` (not an `actor`) within a given workspace. For OAuth, the model is that a user is provisioning access to their data to a third party tool (in this case the Airbyte Platform). Each record represents information (e.g. client id, client secret) for that third party that is getting access. | ||
* These parameters can be scoped by workspace. If `workspace_id` is not present, then the scope of the parameters is to the whole deployment of the platform (e.g. all workspaces). | ||
* The `actor_type` column tells us whether the record represents a Source or a Destination. | ||
* The `configuration` column is a JSON blob. The schema of this JSON blob matches the schema specified in the `spec` column in the `advanced_auth` field of the JSON blob. Keep in mind this schema is specific to each connector (e.g. the schema of Hubspot and Salesforce are different), which is why this column has to be a JSON blob. | ||
* `secrets` | ||
* This table is used to store secrets in open-source versions of the platform that have not set some other secrets store. This table allows us to use the same code path for secrets handling regardless of whether an external secrets store is set or not. This table is used by default for the open-source product. | ||
* `airbyte_configs_migrations` is metadata table used by Flyway (our database migration tool). It is not used for any application use cases. | ||
* `airbyte_configs` | ||
* Legacy table for config storage. Should be dropped. | ||
|
||
# Jobs Database | ||
* `jobs` | ||
* `attempts` | ||
* `airbyte_metadata` | ||
* `airbyte_jobs_migrations` is metadata table used by Flyway (our database migration tool). It is not used for any application use cases. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters