modularization: Datasets modularization pt.5 #442
Conversation
Moved dataset table column to modules
Renamed table column fields to follow Python's naming convention
Added dataset module to config.json (see the loader sketch after this list)
Moved the dataset table service
Renamed DatasetTable to DatasetTableService to avoid collisions with models.DatasetTable
Moved DatasetTableColumn to modules
Currently, only async handlers require dedicated loading; long-running (scheduled) tasks might not need a dedicated loading mode
Extracted code from glue to glue_column_handler; added handler imports for datasets
Extracted the code for the dataset table handler
Extracted the long-running task for datasets
Extracted the subscription service into datasets
Extracted the handler that fetches table columns
Needed for the migration to modules
Fixed tests and added new ones for the dataset module
Glossaries had different target types and had to be treated differently
Created an API for glossaries to use modularization
Added and fixed tests
Moved FeedRegistry to gql, since it's a more appropriate place for it; started using the registry to provide types; renaming and small fixes
Solved a circular dependency for Redshift; it should go away after the Redshift migration
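For orientation, a minimal sketch of what config-driven module loading can look like. The names here (`ImportMode`, `load_modules`, the config layout) are illustrative assumptions, not the exact data.all implementation:

```python
import json
from enum import Enum, auto
from importlib import import_module


class ImportMode(Enum):
    """Which process is importing the modules: the API needs GraphQL types,
    async handler workers need their task handlers registered."""
    API = auto()
    HANDLERS = auto()
    TASKS = auto()


def load_modules(mode: ImportMode, config_path: str = "config.json") -> None:
    """Import every module marked active under "modules" in config.json."""
    with open(config_path) as f:
        config = json.load(f)
    suffix = {ImportMode.API: "", ImportMode.HANDLERS: ".handlers", ImportMode.TASKS: ".tasks"}
    for name, settings in config.get("modules", {}).items():
        if not settings.get("active"):
            continue
        # e.g. "datasets" in HANDLERS mode -> dataall.modules.datasets.handlers
        import_module(f"dataall.modules.{name}{suffix[mode]}")
```

With a scheme like this, only the entry point decides which parts of a module get imported, which matches the observation above that async handlers need dedicated loading while scheduled tasks may not.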
@@ -623,3 +619,82 @@ def count_dataset_tables(session, dataset_uri):
         .filter(DatasetTable.datasetUri == dataset_uri)
         .count()
     )
+
+    @staticmethod
+    def query_environment_group_datasets(session, envUri, groupUri, filter) -> Query:
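For context, a sketch of what the body of such a query method typically looks like in a SQLAlchemy code base; the filter handling is illustrative, and `Dataset` is assumed to be the model defined in data.all's models:

```python
from sqlalchemy import and_, or_
from sqlalchemy.orm import Query


class EnvironmentService:  # illustrative host class
    @staticmethod
    def query_environment_group_datasets(session, envUri, groupUri, filter) -> Query:
        """Datasets owned by `groupUri` in environment `envUri`, optionally filtered."""
        query = session.query(Dataset).filter(
            and_(
                Dataset.environmentUri == envUri,
                Dataset.SamlAdminGroupName == groupUri,
            )
        )
        if filter and filter.get("term"):
            term = f"%{filter['term']}%"
            query = query.filter(
                or_(Dataset.label.ilike(term), Dataset.description.ilike(term))
            )
        return query
```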
These query functions should be in dataset_repository (following your design pattern), but as we discussed this could also be done in a follow-up :)
Yes, I will extract these methods after migrating the sharing code.
Unrelated to modularization: this task needs to be reviewed. It is triggered every 15 minutes and was originally conceived to add statements to the pre-existing bucket policy of imported datasets. Through this task we can enforce more restrictive access to imported buckets.
Yes, it looks strange. The task runs every 15 minutes to update policies for imported datasets. Most of the time it does nothing but still incurs the cost of running.
I wonder why we can't update the policies during the import of the datasets?
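A minimal sketch of the alternative suggested here, applying the statements once at import time instead of on a 15-minute schedule. The function name and the idempotency-by-Sid convention are assumptions for illustration, not taken from the code base:

```python
import json

import boto3
from botocore.exceptions import ClientError


def attach_dataall_statements(bucket_name: str, statements: list) -> None:
    """Merge data.all access statements into an imported bucket's policy once,
    at import time, rather than re-applying them on a recurring schedule."""
    s3 = boto3.client("s3")
    try:
        policy = json.loads(s3.get_bucket_policy(Bucket=bucket_name)["Policy"])
    except ClientError as e:
        if e.response["Error"]["Code"] != "NoSuchBucketPolicy":
            raise
        policy = {"Version": "2012-10-17", "Statement": []}
    existing_sids = {s.get("Sid") for s in policy["Statement"]}
    # Idempotent on re-import: skip statements whose Sid is already present
    policy["Statement"] += [s for s in statements if s.get("Sid") not in existing_sids]
    s3.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy))
```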
Resolved (outdated) review threads on:
- backend/dataall/modules/datasets/handlers/glue_dataset_handler.py
- backend/dataall/modules/datasets/handlers/glue_table_handler.py
- backend/dataall/modules/datasets/handlers/sns_dataset_handler.py
- backend/dataall/modules/datasets/handlers/s3_location_handler.py
@@ -86,11 +84,14 @@ def generate_admins_data_access_policy(self) -> iam.Policy:

         return policy

-    def generate_data_access_policy(self) -> iam.Policy:
+    def generate_data_access_policy(self, session) -> iam.Policy:
         """
Why do we need to pass the session? I see now, you are using the get_statements defined in the subclass! Got it.
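A minimal sketch of the pattern being discussed, under the assumption that the base policy generator delegates to a `get_statements` hook that subclasses implement with a database session. Class names are illustrative, and a plain dict stands in for the real `iam.Policy`:

```python
from abc import ABC, abstractmethod


class DataPolicy(ABC):
    """Base class: builds the policy document from subclass-supplied statements."""

    def generate_data_access_policy(self, session) -> dict:
        # The session is threaded through so subclass hooks can query the database
        return {"Version": "2012-10-17", "Statement": self.get_statements(session)}

    @abstractmethod
    def get_statements(self, session) -> list:
        ...


class DatasetDataPolicy(DataPolicy):
    def get_statements(self, session) -> list:
        # Hypothetical lookup: the real implementation would query `session`
        # for the buckets this team's datasets live in.
        buckets = ["arn:aws:s3:::example-imported-bucket/*"]
        return [{"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": buckets}]
```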
@@ -70,10 +73,8 @@ def test_get_dataset(client, dataset1, env1, user, group):
What is the advantage of using MagicMock?
Honestly? It just works :)
I couldn't find a better way to mock an instance of a class. I tried a few other ways to patch an instance, but only got FAILED tests. This one worked :)
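For context, a self-contained sketch of the MagicMock pattern in question; the handler class and method names are illustrative, not the exact code from the PR:

```python
from unittest.mock import MagicMock


class GlueColumnHandler:
    """Stand-in for the real handler; illustrative only."""

    def __init__(self, glue_client):
        self.glue_client = glue_client

    def sync_columns(self, table_name):
        return self.glue_client.get_table_columns(table_name)


def test_sync_columns_with_magicmock():
    # MagicMock fakes the whole client instance: attributes and methods are
    # created on first access, and every call is recorded for assertions.
    glue_client = MagicMock()
    glue_client.get_table_columns.return_value = [{"Name": "id", "Type": "string"}]

    handler = GlueColumnHandler(glue_client)
    columns = handler.sync_columns("orders")

    assert columns[0]["Name"] == "id"
    glue_client.get_table_columns.assert_called_once_with("orders")
```

The advantage over hand-written stubs is that no fake class has to be maintained, and the mock records how it was called, so the test can assert on the interaction as well as the return value.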
thank you!
Added new commits:
### Feature or Bugfix
- Refactoring

### Detail
Refactoring of the Dataset entity and the code related to it:
- Refactoring for Votes
- Introduced DataPolicy (the same way ServicePolicy was used)
- Extracted dataset-related permissions
- Used the new `has_tenant_permission` instead of `has_tenant_perm`, which allows not passing unused parameters (see the sketch after this description)

### Relates
- #412 and #295

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
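A minimal sketch of the kind of simplification `has_tenant_permission` enables. Everything here (the context object, the permission table, the decorated resolver) is an illustrative assumption; the point is only that the decorator resolves user and groups itself instead of forcing callers to pass them:

```python
import functools
from dataclasses import dataclass, field


@dataclass
class RequestContext:
    """Hypothetical per-request context; the real one lives in data.all."""
    username: str = "alice"
    groups: list = field(default_factory=lambda: ["team-a"])


_CURRENT = RequestContext()
TENANT_PERMISSIONS = {("team-a", "MANAGE_DATASETS")}


def has_tenant_permission(permission_name: str):
    """Sketch of the new decorator: it pulls username/groups from the request
    context itself, so callers no longer thread them through as parameters."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not any((g, permission_name) in TENANT_PERMISSIONS for g in _CURRENT.groups):
                raise PermissionError(f"missing tenant permission {permission_name}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


@has_tenant_permission("MANAGE_DATASETS")
def update_dataset(uri: str) -> str:
    # The resolver signature stays free of session/username/groups parameters
    return f"updated {uri}"
```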