Resolve Dataset Profiling Glue Job #649
To note for this PR - Glue Jobs require the permissions to be able to verify the existence of the This is needed for Glue Jobs that In order to accommodate this requirement, the PR adds code to grant If the
@@ -49,6 +49,13 @@ def on_create(event):
    except ClientError as e:
        pass

    default_db_exists = False
    try:
What happens if the default does not exist?
So now, every time we create a dataset, the dataset admin role gets permissions on the default database. These permissions are needed to run the profiling job. Do you know why? Is access to the default database a general requirement for any Glue Job?
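The behaviour discussed in these comments (check whether the default database exists, and grant the dataset role access only if it does) can be sketched roughly as follows. This is not the PR's actual code: the function name `grant_default_db_access` is hypothetical, the use of Lake Formation `grant_permissions` for the grant is an assumption, and the clients are passed in (for example `boto3.client("glue")` and `boto3.client("lakeformation")`) so the sketch stays independent of any particular setup. The permissions to grant are left as a required argument because the thread does not spell them out here.

```python
def grant_default_db_access(glue, lakeformation, role_arn, permissions,
                            database="default"):
    """Grant `permissions` on `database` to `role_arn` if the database exists.

    `glue` and `lakeformation` are expected to behave like the boto3 Glue and
    Lake Formation clients. Returns True if a grant was issued, False if the
    database does not exist (in which case nothing is granted).
    """
    try:
        glue.get_database(Name=database)
    except Exception as e:
        # boto3 raises botocore ClientError, which carries the error code in
        # e.response; anything other than "not found" is re-raised untouched.
        code = getattr(e, "response", {}).get("Error", {}).get("Code")
        if code == "EntityNotFoundException":
            return False
        raise

    # Assumption: access is granted through Lake Formation on the database.
    lakeformation.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": role_arn},
        Resource={"Database": {"Name": database}},
        Permissions=list(permissions),
    )
    return True
```

With this shape, a missing default database is simply skipped rather than treated as a failure, which is one possible answer to the question raised in the review comment.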
We also need to fix it in
And thank you, this is a long-needed fix :)
@dlpzx, @noah-paige I will gladly replicate these changes to mod-main :)
I know the documentation calls out that Glue will also try to verify the default DB only if you enable The permissions required are only
Merge latest changes from main into modularization-main
It includes changes from #626, #630, #648, #649, and #651
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dlpzx <71252798+dlpzx@users.noreply.github.com>
Co-authored-by: wolanlu <101870655+wolanlu@users.noreply.github.com>
Co-authored-by: Amr Saber <amr.m.saber.mail@gmail.com>
Co-authored-by: Noah Paige <69586985+noah-paige@users.noreply.github.com>
Co-authored-by: kukushking <kukushkin.anton@gmail.com>
Co-authored-by: Dariusz Osiennik <osiend@amazon.com>
Co-authored-by: Dennis Goldner <107395339+degoldner@users.noreply.github.com>
Co-authored-by: Abdulrahman Kaitoua <abdulrahman.kaitoua@polimi.it>
Co-authored-by: akaitoua-sa <126820454+akaitoua-sa@users.noreply.github.com>
Co-authored-by: Gezim Musliaj <102723839+gmuslia@users.noreply.github.com>
Co-authored-by: Rick Bernotas <97474536+rbernotas@users.noreply.github.com>
Co-authored-by: David Mutune Kimengu <57294718+kimengu-david@users.noreply.github.com>
Co-authored-by: chamcca <40579012+chamcca@users.noreply.github.com>
Co-authored-by: Dhruba <117375130+marjet26@users.noreply.github.com>
Co-authored-by: dbalintx <132444646+dbalintx@users.noreply.github.com>
Co-authored-by: Srinivas Reddy <srinivasreddych@outlook.com>
Co-authored-by: mourya-33 <134511711+mourya-33@users.noreply.github.com>
Co-authored-by: Noah Paige <noahpaig@amazon.com>
Co-authored-by: dlpzx <dlpzx@amazon.com>
Feature or Bugfix
Detail
SPARK_VERSION as an environment variable for pydeequ before import
default database
Relates
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
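The Detail item about SPARK_VERSION can be illustrated with a minimal sketch. It assumes the profiling script imports pydeequ at module level, and that pydeequ reads the SPARK_VERSION environment variable when it is imported, so the variable has to be in place first. The helper name `ensure_spark_version` and the value "3.3" are placeholders chosen for illustration, not values taken from this PR; the value should match the Spark version of the Glue job's runtime.

```python
import os


def ensure_spark_version(default="3.3"):
    """Set SPARK_VERSION if it is not already set and return the effective value.

    The default here is only a placeholder (an assumption for this sketch).
    An existing value in the environment is left untouched.
    """
    return os.environ.setdefault("SPARK_VERSION", default)


# The variable must be set before pydeequ is imported.
ensure_spark_version()

try:
    import pydeequ  # noqa: E402,F401  (deliberately imported after the env var is set)
except ImportError:
    # pydeequ (or pyspark) is not installed where this sketch is being run.
    pydeequ = None
```

Using `setdefault` keeps any value already provided by the job configuration, so the script only supplies a fallback.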