
Commit

Greg/community pr 17193 (airbytehq#23855)
* [UPD] Add format and partitioning spec

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [UPD] Add parquet format, compression and partitioning

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [ADD] Integration and unit tests

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [UPD] Update docs and add bootstrap

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [UPD] bump version to 0.1.2

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [ADD] Changelog entry

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [FIX] typo

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [FIX] cast arrays with mixed types to json string

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [FIX] issues when casting athena to pandas types

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [UPD] cleanup

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [UPD] flush interval to reduce memory usage

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [UPD] allow state reset per stream

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [UPD] capitalize AWS

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [ADD] decimal support and default db LF-tags

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [UPD] account for type error

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [FIX] partition field duplication

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [UPD] bump awswrangler (fixes json compression issue)

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [UPD] refactor, infer pandas and glue types from json schema

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [FIX] default for items get

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [UPD] account for mixed type properties

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [UPD] bad complex types to json string

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [UPD] drop top keys when not in json schema

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [UPD] fix partitioning, add airbyte type, fix keyerror concurrent partitioning

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [UPD] make table type configurable

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [UPD] fix obvious type violations

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [FIX] add missing columns to create correct schema

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [FIX] integration test

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [UPD] formatting and flake

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [FIX] overwrite partial flush bug

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [FIX] integration tests

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* fix formatting

* [FIX] check and typing

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [UPD] rmv fillna

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [UPD] warn on reset

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [FIX] log on failed reset

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [UPD] cast bool

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [FIX] cast pandas columns bool casting

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [FIX] required spec and format defaults

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [ADD] icon

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* [UPD] address review comments

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>

* Automated Change

* auto-bump connector version

---------

Signed-off-by: Henri Blancke <blanckehenri@gmail.com>
Co-authored-by: Henri Blancke <blanckehenri@gmail.com>
Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
Co-authored-by: Sunny <6833405+sh4sh@users.noreply.github.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
5 people authored Mar 27, 2023
1 parent a8e4f2c commit efcec10
Showing 29 changed files with 1,661 additions and 946 deletions.
10 changes: 10 additions & 0 deletions airbyte-config/init/src/main/resources/icons/awsdatalake.svg
@@ -34,8 +34,9 @@
- name: AWS Datalake
destinationDefinitionId: 99878c90-0fbd-46d3-9d98-ffde879d17fc
dockerRepository: airbyte/destination-aws-datalake
dockerImageTag: 0.1.1
dockerImageTag: 0.1.2
documentationUrl: https://docs.airbyte.com/integrations/destinations/aws-datalake
icon: awsdatalake.svg
releaseStage: alpha
- name: BigQuery
destinationDefinitionId: 22f6c74f-5699-40ff-833c-4a879ea40133
146 changes: 133 additions & 13 deletions airbyte-config/init/src/main/resources/seed/destination_specs.yaml
@@ -533,7 +533,7 @@
supported_destination_sync_modes:
- "overwrite"
- "append"
- dockerImage: "airbyte/destination-aws-datalake:0.1.1"
- dockerImage: "airbyte/destination-aws-datalake:0.1.2"
spec:
documentationUrl: "https://docs.airbyte.com/integrations/destinations/aws-datalake"
connectionSpecification:
@@ -544,7 +544,7 @@
- "credentials"
- "region"
- "bucket_name"
- "bucket_prefix"
- "lakeformation_database_name"
additionalProperties: false
properties:
aws_account_id:
@@ -553,11 +553,7 @@
description: "target aws account id"
examples:
- "111111111111"
region:
title: "AWS Region"
type: "string"
description: "Region name"
airbyte_secret: false
order: 1
credentials:
title: "Authentication mode"
description: "Choose How to Authenticate to AWS."
@@ -609,21 +605,145 @@
type: "string"
description: "Secret Access Key"
airbyte_secret: true
order: 2
region:
title: "S3 Bucket Region"
type: "string"
default: ""
description: "The region of the S3 bucket. See <a href=\"https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions\"\
>here</a> for all region codes."
enum:
- ""
- "us-east-1"
- "us-east-2"
- "us-west-1"
- "us-west-2"
- "af-south-1"
- "ap-east-1"
- "ap-south-1"
- "ap-northeast-1"
- "ap-northeast-2"
- "ap-northeast-3"
- "ap-southeast-1"
- "ap-southeast-2"
- "ca-central-1"
- "cn-north-1"
- "cn-northwest-1"
- "eu-central-1"
- "eu-north-1"
- "eu-south-1"
- "eu-west-1"
- "eu-west-2"
- "eu-west-3"
- "sa-east-1"
- "me-south-1"
- "us-gov-east-1"
- "us-gov-west-1"
order: 3
bucket_name:
title: "S3 Bucket Name"
type: "string"
description: "Name of the bucket"
airbyte_secret: false
description: "The name of the S3 bucket. Read more <a href=\"https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html\"\
>here</a>."
order: 4
bucket_prefix:
title: "Target S3 Bucket Prefix"
type: "string"
description: "S3 prefix"
airbyte_secret: false
order: 5
lakeformation_database_name:
title: "Lakeformation Database Name"
title: "Lake Formation Database Name"
type: "string"
description: "Which database to use"
airbyte_secret: false
description: "The default database this destination will use to create tables\
\ in per stream. Can be changed per connection by customizing the namespace."
order: 6
lakeformation_database_default_tag_key:
title: "Lake Formation Database Tag Key"
description: "Add a default tag key to databases created by this destination"
examples:
- "pii_level"
type: "string"
order: 7
lakeformation_database_default_tag_values:
title: "Lake Formation Database Tag Values"
description: "Add default values for the `Tag Key` to databases created\
\ by this destination. Comma separate for multiple values."
examples:
- "private,public"
type: "string"
order: 8
lakeformation_governed_tables:
title: "Lake Formation Governed Tables"
description: "Whether to create tables as LF governed tables."
type: "boolean"
default: false
order: 9
format:
title: "Output Format *"
type: "object"
description: "Format of the data output."
oneOf:
- title: "JSON Lines: Newline-delimited JSON"
required:
- "format_type"
properties:
format_type:
title: "Format Type *"
type: "string"
enum:
- "JSONL"
default: "JSONL"
compression_codec:
title: "Compression Codec (Optional)"
description: "The compression algorithm used to compress data."
type: "string"
enum:
- "UNCOMPRESSED"
- "GZIP"
default: "UNCOMPRESSED"
- title: "Parquet: Columnar Storage"
required:
- "format_type"
properties:
format_type:
title: "Format Type *"
type: "string"
enum:
- "Parquet"
default: "Parquet"
compression_codec:
title: "Compression Codec (Optional)"
description: "The compression algorithm used to compress data."
type: "string"
enum:
- "UNCOMPRESSED"
- "SNAPPY"
- "GZIP"
- "ZSTD"
default: "SNAPPY"
order: 10
partitioning:
title: "Choose how to partition data"
description: "Partition data by cursor fields when a cursor field is a date"
type: "string"
enum:
- "NO PARTITIONING"
- "DATE"
- "YEAR"
- "MONTH"
- "DAY"
- "YEAR/MONTH"
- "YEAR/MONTH/DAY"
default: "NO PARTITIONING"
order: 11
glue_catalog_float_as_decimal:
title: "Glue Catalog: Float as Decimal"
description: "Cast float/double as decimal(38,18). This can help achieve\
\ higher accuracy and represent numbers correctly as received from the\
\ source."
type: "boolean"
default: false
order: 12
supportsIncremental: true
supportsNormalization: false
supportsDBT: false
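
The `partitioning` option above says data is partitioned by the cursor field when that field is a date. The following is a minimal sketch of how each enum value could map to partition values, not the connector's actual implementation. The option names are copied from the spec; the function name `partition_values` and the partition column names (`date`, `year`, `month`, `day`) are hypothetical, and the cursor is assumed to be an ISO 8601 string without a trailing `Z`.

```python
from datetime import datetime

# Option names copied from the spec's `partitioning` enum.
# The partition column names on the right are assumptions for illustration.
PARTITION_PARTS = {
    "NO PARTITIONING": [],
    "DATE": ["date"],
    "YEAR": ["year"],
    "MONTH": ["month"],
    "DAY": ["day"],
    "YEAR/MONTH": ["year", "month"],
    "YEAR/MONTH/DAY": ["year", "month", "day"],
}


def partition_values(cursor_value: str, partitioning: str) -> dict:
    """Derive partition column values from an ISO 8601 date cursor (hypothetical helper)."""
    parts = PARTITION_PARTS[partitioning]
    if not parts:
        return {}
    parsed = datetime.fromisoformat(cursor_value)
    available = {
        "date": parsed.date().isoformat(),
        "year": parsed.year,
        "month": parsed.month,
        "day": parsed.day,
    }
    # Keep only the parts requested by the chosen option, in order.
    return {name: available[name] for name in parts}


if __name__ == "__main__":
    print(partition_values("2023-03-27T10:15:00", "YEAR/MONTH/DAY"))
```

Coarser options such as `YEAR` produce fewer, larger partitions, while `YEAR/MONTH/DAY` produces many small ones. Which is preferable likely depends on how the tables are queried.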
@@ -0,0 +1,5 @@
# AWS Lake Formation Destination Connector Bootstrap

This destination syncs your data to S3 and an AWS data lake, and automatically creates Glue Catalog databases and tables for you.

See [this](https://docs.aws.amazon.com/lake-formation/latest/dg/how-it-works.html) to learn more about AWS Lake Formation.
@@ -13,5 +13,5 @@ RUN pip install .
ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]

LABEL io.airbyte.version=0.1.1
LABEL io.airbyte.version=0.1.2
LABEL io.airbyte.name=airbyte/destination-aws-datalake
28 changes: 16 additions & 12 deletions airbyte-integrations/connectors/destination-aws-datalake/README.md
@@ -140,23 +140,28 @@ To run acceptance and custom integration tests:
./gradlew :airbyte-integrations:connectors:destination-aws-datalake:integrationTest
```

#### Running the Destination Acceptance Tests
#### Running the Destination Integration Tests

To successfully run the Destination Acceptance Tests, you need a `secrets/config.json` file with appropriate information. For example:

```json
{
"bucket_name": "your-bucket-name",
"bucket_prefix": "your-prefix",
"region": "your-region",
"aws_account_id": "111111111111",
"lakeformation_database_name": "an_lf_database",
"credentials": {
"credentials_title": "IAM User",
"aws_access_key_id": ".....",
"aws_secret_access_key": "....."
}
"aws_account_id": "111111111111",
"credentials": {
"credentials_title": "IAM User",
"aws_access_key_id": "aws_key_id",
"aws_secret_access_key": "aws_secret_key"
},
"region": "us-east-1",
"bucket_name": "datalake-bucket",
"lakeformation_database_name": "test",
"format": {
"format_type": "Parquet",
"compression_codec": "SNAPPY"
},
"partitioning": "NO PARTITIONING"
}

```
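
Before running the tests, it may help to sanity-check `secrets/config.json` against the updated spec. The sketch below is an assumption-laden illustration, not part of the connector: `check_config` is a hypothetical helper, the required keys are only those visible in this diff hunk (the full `required` list may contain more), and the codec and partitioning values are copied from the spec's enums.

```python
import json

# Required keys visible in this diff hunk; the full list in the spec may be longer.
REQUIRED_KEYS = ("credentials", "region", "bucket_name", "lakeformation_database_name")

# Allowed compression codecs and defaults per format, copied from the spec.
CODECS = {
    "JSONL": {"UNCOMPRESSED", "GZIP"},
    "Parquet": {"UNCOMPRESSED", "SNAPPY", "GZIP", "ZSTD"},
}
DEFAULT_CODEC = {"JSONL": "UNCOMPRESSED", "Parquet": "SNAPPY"}

PARTITIONING = {
    "NO PARTITIONING", "DATE", "YEAR", "MONTH", "DAY",
    "YEAR/MONTH", "YEAR/MONTH/DAY",
}


def check_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in config]

    fmt = config.get("format")
    if fmt is not None:
        format_type = fmt.get("format_type")
        if format_type not in CODECS:
            problems.append(f"unknown format_type: {format_type!r}")
        else:
            codec = fmt.get("compression_codec", DEFAULT_CODEC[format_type])
            if codec not in CODECS[format_type]:
                problems.append(f"codec {codec!r} is not valid for {format_type}")

    partitioning = config.get("partitioning", "NO PARTITIONING")
    if partitioning not in PARTITIONING:
        problems.append(f"unknown partitioning: {partitioning!r}")
    return problems


if __name__ == "__main__":
    with open("secrets/config.json") as handle:
        for problem in check_config(json.load(handle)):
            print(problem)
```

This only catches structural mistakes such as pairing `JSONL` with `SNAPPY`; it does not verify credentials or AWS permissions.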

In the AWS account, you need to have the following elements in place:
@@ -167,7 +172,6 @@ In the AWS account, you need to have the following elements in place:
* The user must have appropriate permissions to the Lake Formation database to perform the tests (For example see: [Granting Database Permissions Using the Lake Formation Console and the Named Resource Method](https://docs.aws.amazon.com/lake-formation/latest/dg/granting-database-permissions.html))



## Dependency Management

All of your dependencies should go in `setup.py`, NOT `requirements.txt`. The requirements file is only used to connect internal Airbyte dependencies in the monorepo for local development.