Glue DataBrew L2 Construct #402

Closed
1 of 11 tasks
jaidisido opened this issue Jan 17, 2022 · 2 comments

Description

AWS Glue DataBrew is a data preparation service that makes it easy for data analysts and data scientists to clean and normalize data to prepare it for analytics and machine learning. It offers 250+ transformations (e.g. correct invalid values, filter out anomalies, run data quality...) that can be automated and applied to data. At the moment, only L1 constructs are available for Glue DataBrew. They are as follows:

  • CfnProject (AWS::DataBrew::Project):
    • An interactive data preparation workspace where a collection of related items (data, transformations, recipes...) are managed
  • CfnDataset (AWS::DataBrew::Dataset):
    • A dataset is simply a set of data: rows or records that are divided into columns or fields
  • CfnRecipe (AWS::DataBrew::Recipe):
    • A set of instructions or steps for data that you want DataBrew to act on. A recipe can contain many steps, and each step can contain many actions (e.g. filter, groupby...)
  • CfnJob (AWS::DataBrew::Job):
    • Transforms data by running the instructions that were set up in the recipe
  • CfnRuleset (AWS::DataBrew::Ruleset):
    • Set of rules that can be used in a profile job to validate data quality
  • CfnSchedule (AWS::DataBrew::Schedule):
    • Schedule for one or more Glue DataBrew jobs. Can be a specific date/time or on regular intervals

One reason L2 constructs would be justified is the way AWS Glue DataBrew recipes are published. Every time a user modifies a Glue DataBrew recipe, they must publish a new recipe version. At the moment, this can only be done from the AWS console, CLI or SDK; it is not possible to publish a new recipe version via IaC (i.e. CloudFormation). One possible implementation would be to deploy a custom resource for each recipe that automatically publishes a new version whenever the recipe is modified in the CDK code. The BucketDeployment construct, for example, is implemented in an equivalent way.
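
To make the custom resource idea more concrete, here is a minimal sketch of what the Lambda handler behind such a resource might look like. It is not an existing construct or the proposed API. It assumes the handler runs behind the CDK custom resource provider framework, and the property names `RecipeName` and `StepsFingerprint` are hypothetical. The CDK side would compute a hash of the recipe steps and pass it as a resource property, so that CloudFormation sends an Update event (and the handler publishes a new version) only when the steps change. The only service call used is the DataBrew `PublishRecipe` API through boto3.

```python
import hashlib
import json


def recipe_fingerprint(steps):
    """Return a stable hash of the recipe steps.

    Keys are sorted so that the hash only changes when the content changes,
    not when dictionary ordering differs.
    """
    canonical = json.dumps(steps, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def on_event(event, context=None, client=None):
    """Custom resource handler: publish the recipe on Create and Update.

    `RecipeName` and `StepsFingerprint` are hypothetical property names
    chosen for this sketch. `client` can be injected for local testing.
    """
    props = event["ResourceProperties"]
    recipe_name = props["RecipeName"]
    fingerprint = props["StepsFingerprint"]

    if event["RequestType"] in ("Create", "Update"):
        if client is None:
            # Imported lazily so the module can be loaded without boto3.
            import boto3

            client = boto3.client("databrew")
        client.publish_recipe(
            Name=recipe_name,
            Description=f"Published by CDK, steps hash {fingerprint[:12]}",
        )

    # On Delete there is nothing to undo in this sketch. The physical ID
    # stays constant so an Update is never treated as a replacement.
    return {"PhysicalResourceId": f"{recipe_name}-publisher"}
```

Passing the fingerprint as a property is a design choice for this sketch: it lets CloudFormation's own change detection decide when to republish, instead of the handler comparing recipe versions itself.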

Roles

Role User
Proposed by @jaidisido
Author(s) @alias, @alias, @alias
API Bar Raiser @alias
Stakeholders @alias, @alias, @alias

See RFC Process for details

Workflow

  • Tracking issue created (label: status/proposed)
  • API bar raiser assigned (ping us at #aws-cdk-rfcs if needed)
  • Kick off meeting
  • RFC pull request submitted (label: status/review)
  • Community reach out (via Slack and/or Twitter)
  • API signed-off (label api-approved applied to pull request)
  • Final comments period (label: status/final-comments-period)
  • Approved and merged (label: status/approved)
  • Execution plan submitted (label: status/planning)
  • Plan approved and merged (label: status/implementing)
  • Implementation complete (label: status/done)

The author is responsible for progressing the RFC according to this checklist and
for applying the relevant labels to this issue so that the RFC table in the README
gets updated.


ghost commented Nov 2, 2022

This would be awesome, we've found managing DataBrew via CDK/Cfn unusable due to the issues you've highlighted here.
I'm currently deleting the job/recipe when a recipe update is needed, which is pretty painful as it needs two deployments - one to delete, one to recreate.

I've looked into creating a custom resource to handle a proper publish of the updated recipe, but we've since decided to just move to Glue Studio instead.

Hopefully this gets sorted one day!

evgenyka added the l2-request (request for new L2 construct) and bar-raiser/needed labels on Aug 10, 2023

awsmjs (Contributor) commented Dec 14, 2023

Closing this ticket. We believe the functionality is beneficial, but it does not intersect with the core framework and should be vended and maintained separately.
