Glue DataBrew L2 Construct #402

jaidisido · 2022-01-17T12:39:44Z

Description

AWS Glue DataBrew is a data preparation service that makes it easy for data analysts and data scientists to clean and normalize data to prepare it for analytics and machine learning. It offers 250+ transformations (e.g. correct invalid values, filter out anomalies, run data quality...) that can be automated and applied to data. At the moment, the CDK only provides L1 constructs for Glue DataBrew. They are as follows:

CfnProject (AWS::DataBrew::Project):
- An interactive data preparation workspace where a collection of related items (data, transformations, recipes...) are managed
CfnDataset (AWS::DataBrew::Dataset):
- A set of data: rows or records that are divided into columns or fields
CfnRecipe (AWS::DataBrew::Recipe):
- A set of instructions or steps for data that you want DataBrew to act on. A recipe can contain many steps, and each step can contain many actions (e.g. filter, groupby...)
CfnJob (AWS::DataBrew::Job):
- Transforms data by running the instructions that were set up in the recipe
CfnRuleset (AWS::DataBrew::Ruleset):
- Set of rules that can be used in a profile job to validate data quality
CfnSchedule (AWS::DataBrew::Schedule):
- Schedule for one or more Glue DataBrew jobs. Can be a specific date/time or on regular intervals

One reason an L2 construct would be justified is the way AWS Glue DataBrew recipes are published. Every time a user modifies a Glue DataBrew recipe, they must publish a new recipe version. At the moment, this can only be done from the AWS console, CLI or SDK; it is not possible to publish a new recipe version via IaC (i.e. CFN). One possible implementation would be to deploy a custom resource for each recipe that automatically publishes a new version whenever the recipe is modified in the CDK code. The BucketDeployment construct, for example, relies on an equivalent custom resource approach.
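
To make the custom resource idea more concrete, here is a minimal sketch of what the Lambda handler behind such a resource might look like. It is not an existing CDK implementation. It assumes the handler is wired up through the CDK custom resource provider framework (which passes `RequestType` and `ResourceProperties` in the event), and that the construct passes two hypothetical properties, `RecipeName` and `StepsHash`. The only DataBrew call used is the SDK's `publish_recipe`; the client is injected so the sketch runs without AWS access.

```python
import hashlib
import json


def steps_fingerprint(steps):
    """Return a stable SHA-256 hex digest of the recipe steps.

    Assumption: the construct would pass this digest as a custom resource
    property, so that any change to the steps in CDK code changes the
    property and makes CloudFormation send an Update event.
    """
    canonical = json.dumps(steps, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def on_event(event, databrew):
    """Handle a custom resource lifecycle event (sketch only).

    `databrew` is expected to behave like boto3.client("databrew");
    in a real Lambda it would be created once at module load time.
    `RecipeName` and `StepsHash` are hypothetical property names.
    """
    props = event["ResourceProperties"]
    recipe_name = props["RecipeName"]

    if event["RequestType"] in ("Create", "Update"):
        # Publish the current working version of the recipe as a new version.
        databrew.publish_recipe(
            Name=recipe_name,
            Description="Published by CDK, steps hash " + props["StepsHash"][:12],
        )
    # On Delete there is nothing to publish; the recipe itself is
    # removed by its own AWS::DataBrew::Recipe resource.

    # A constant physical ID per recipe keeps Update events as in-place updates.
    return {"PhysicalResourceId": recipe_name + "-publisher"}
```

In this design the custom resource would need to depend on the `CfnRecipe`, so that the recipe is updated before the publish call runs. Whether a published version should also be cleaned up on delete is left open here.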

Roles

Role	User
Proposed by	@jaidisido
Author(s)	@alias, @alias, @alias
API Bar Raiser	@alias
Stakeholders	@alias, @alias, @alias

See RFC Process for details

Workflow

The author is responsible for progressing the RFC according to this checklist
and for applying the relevant labels to this issue so that the RFC table in the
README gets updated.


ghost · 2022-11-02T22:21:04Z

This would be awesome. We've found managing DataBrew via CDK/Cfn unusable due to the issues you've highlighted here.
I'm currently deleting the job/recipe when a recipe update is needed, which is pretty painful as it needs two deployments: one to delete, one to recreate.

I've looked into creating a custom resource to handle a proper publish of the updated recipe, but we've since decided to just move to Glue Studio instead.

Hopefully this gets sorted one day!

awsmjs · 2023-12-14T22:48:54Z

Closing this ticket. We believe the functionality is beneficial, but it does not intersect with the core framework and should be vended and maintained separately.

jaidisido added the management/tracking and status/proposed labels on Jan 17, 2022

evgenyka added the l2-request and bar-raiser/needed labels on Aug 10, 2023

awsmjs closed this as completed on Dec 14, 2023

mrgrain added the status/rejected label and removed the status/proposed and bar-raiser/needed labels on Dec 19, 2023
