
modularization: Datasets modularization pt.5 #442

Merged: 124 commits (May 10, 2023)

Changes from 117 commits

Commits (124)
3a5e0de
Initialization of dataset module
nikpodsh Apr 11, 2023
a50a02f
Refactoring of datasets
nikpodsh Apr 11, 2023
be14986
Refactoring of datasets
nikpodsh Apr 11, 2023
06f82ad
Refactoring of datasets
nikpodsh Apr 11, 2023
38145ae
Fixed leftover in loader
nikpodsh Apr 11, 2023
f0e146a
Dataset refactoring
nikpodsh Apr 11, 2023
b039163
Dataset refactoring
nikpodsh Apr 11, 2023
b7922ed
Dataset refactoring
nikpodsh Apr 11, 2023
1771bca
Notebooks doesn't require tasks
nikpodsh Apr 11, 2023
3d1603f
Renamed tasks to handlers
nikpodsh Apr 11, 2023
fb6b515
Dataset refactoring
nikpodsh Apr 11, 2023
e3596a5
Dataset refactoring
nikpodsh Apr 11, 2023
3af2ecf
Dataset refactoring
nikpodsh Apr 11, 2023
1a063b2
Dataset refactoring
nikpodsh Apr 11, 2023
b733714
Dataset refactoring
nikpodsh Apr 11, 2023
2a4e2e0
Extracted feed registry
nikpodsh Apr 11, 2023
c15d090
Extracted feed and glossary registry and created a model registry
nikpodsh Apr 11, 2023
052a2b1
Dataset refactoring
nikpodsh Apr 12, 2023
d984483
Fixed and unignored test_tables_sync
nikpodsh Apr 12, 2023
dc0c935
Split model registry into feed and glossaries
nikpodsh Apr 12, 2023
727e353
Abstraction for glossaries
nikpodsh Apr 12, 2023
49fbb41
Fixed leftovers
nikpodsh Apr 12, 2023
7d029e7
Datasets refactoring
nikpodsh Apr 13, 2023
be527eb
Added runtime type registration for Union GraphQL type
nikpodsh Apr 13, 2023
3daf2aa
Changed Feed type registration mechanism
nikpodsh Apr 13, 2023
db3bfd3
Added TODO for future refactoring
nikpodsh Apr 13, 2023
13b6e92
Added GlossaryRegistry for Union scheme
nikpodsh Apr 13, 2023
144dfea
Changed import in redshift module
nikpodsh Apr 13, 2023
d43b9b3
No need for Utils yet
nikpodsh Apr 13, 2023
39b244c
Fixed linting
nikpodsh Apr 13, 2023
cb3800a
Datasets refactoring
nikpodsh Apr 14, 2023
dd8e597
Datasets refactoring
nikpodsh Apr 14, 2023
8ca7bea
Datasets refactoring
nikpodsh Apr 14, 2023
e36ab3b
Datasets refactoring
nikpodsh Apr 14, 2023
31720c2
Datasets refactoring
nikpodsh Apr 14, 2023
8a907df
Datasets refactoring
nikpodsh Apr 14, 2023
561da72
Datasets refactoring
nikpodsh Apr 14, 2023
73c8150
Datasets refactoring
nikpodsh Apr 17, 2023
56a3610
Datasets refactoring
nikpodsh Apr 17, 2023
47a38cc
Datasets refactoring
nikpodsh Apr 17, 2023
dbb5517
Datasets refactoring
nikpodsh Apr 17, 2023
a3def13
Merge branch 'datasets-mod-part2' into datasets-mod-part3
nikpodsh Apr 17, 2023
b256678
Datasets refactoring
nikpodsh Apr 17, 2023
66b5ddb
Datasets refactoring
nikpodsh Apr 17, 2023
352d824
Datasets refactoring
nikpodsh Apr 17, 2023
9934a9c
Datasets refactoring
nikpodsh Apr 17, 2023
228c175
Datasets refactoring
nikpodsh Apr 18, 2023
263d10c
Datasets refactoring
nikpodsh Apr 19, 2023
417e6e5
Datasets refactoring
nikpodsh Apr 19, 2023
7aaff5b
Introduced Indexers
nikpodsh Apr 19, 2023
4e31b99
Extracted upsert_dataset_folders into DatasetLocationIndexer and rena…
nikpodsh Apr 19, 2023
b772812
Moved DatasetLocationIndexer into the dataset module
nikpodsh Apr 19, 2023
cd798e2
Moved DatasetStorageLocation methods to the service
nikpodsh Apr 20, 2023
b0e6a62
Renamed the service
nikpodsh Apr 20, 2023
27c6d79
Moved DatasetIndexer to modules
nikpodsh Apr 20, 2023
0e730ac
Created a dataset repository.
nikpodsh Apr 20, 2023
9ac7964
Moved DatasetTableIndexer
nikpodsh Apr 20, 2023
a1825ba
Fixed test mocking
nikpodsh Apr 20, 2023
d295485
Fixed circular import while half of the module is not migrate
nikpodsh Apr 20, 2023
005a5e7
Removed not used alarms
nikpodsh Apr 20, 2023
0fd7c02
Moved dataset table GraphQL api in modules
nikpodsh Apr 21, 2023
7030c82
Moved DatasetTable model to modules
nikpodsh Apr 21, 2023
ba45ca5
Moved delete_doc to BaseIndexer
nikpodsh Apr 21, 2023
dc8ff72
Lazy creation of connection to OpenSearch
nikpodsh Apr 21, 2023
c99ed58
Moved dataset GraphQL API to modules
nikpodsh Apr 21, 2023
9ddbaf3
Migrated DatasetService
nikpodsh Apr 24, 2023
ad845e7
Removed unused dataset method
nikpodsh Apr 24, 2023
2ac3ae7
Resolved code conflict
nikpodsh Apr 24, 2023
32be3ee
DatasetQualityRule is not used
nikpodsh Apr 24, 2023
f95566c
Moved the Dataset models to modules
nikpodsh Apr 24, 2023
d05196f
Moved a part of the code from deploying dataset stack
nikpodsh Apr 24, 2023
8b2accb
Moved a SNS dataset handler
nikpodsh Apr 25, 2023
fca218f
Merge remote-tracking branch 'origin/datasets-mod-part2' into dataset…
nikpodsh Apr 25, 2023
ef98aa0
Merge branch 'datasets-mod-part3' into datasets-mod-part4
nikpodsh Apr 25, 2023
ebac530
Merge branch 'datasets-mod-part4' into datasets-mod-part5
nikpodsh Apr 25, 2023
2cd19d2
Moved a method from Glue client to LF client
nikpodsh Apr 25, 2023
741238d
Extracted the common part of the code
nikpodsh Apr 25, 2023
81e7646
Extracted dataset part from Glue
nikpodsh Apr 25, 2023
f5752e9
glue.dataset.crawler.create action doesn't exist
nikpodsh Apr 25, 2023
ce0c4e0
Refactored dataset handlers
nikpodsh Apr 25, 2023
50ff85a
Move dataset related API to the dataset module
nikpodsh Apr 25, 2023
c4e9079
Got rid of datasets in votes
nikpodsh Apr 25, 2023
1227ca3
Extract share notification from notification API
nikpodsh Apr 26, 2023
09f7de9
Extracted dataset alarms
nikpodsh Apr 26, 2023
d446ff0
Moved bucket_policy_updater to datasets
nikpodsh Apr 26, 2023
1db0133
Moved MANAGE_DATASETS to modules
nikpodsh Apr 27, 2023
8d8d952
Moved dataset read permissions to modules
nikpodsh Apr 27, 2023
d6ec387
Moved dataset write permissions to modules
nikpodsh Apr 27, 2023
788e35f
Moved dataset table permissions to modules
nikpodsh Apr 27, 2023
584a04b
Moved dataset related policies
nikpodsh Apr 27, 2023
4dd3a2b
Moved dataset permissions for env
nikpodsh Apr 27, 2023
95cbf81
Added migration script for group environments
nikpodsh Apr 27, 2023
781bb08
Extracted data policy for dataset
nikpodsh Apr 27, 2023
1ec1235
Introduced GroupResourceManager
nikpodsh Apr 27, 2023
9f2748e
Added dataset data policy to import it with other stacks
nikpodsh Apr 27, 2023
86f4529
Fixed error and moved api import to API ModuleInterface
nikpodsh Apr 28, 2023
f4b1f67
Removed unused method
nikpodsh Apr 28, 2023
a52c145
Changed method for checking permissions
nikpodsh Apr 28, 2023
19cc9aa
Reduce number of parameters for dataset location service
nikpodsh Apr 28, 2023
cd218e2
Renamed files
nikpodsh Apr 28, 2023
3e24acb
Reduced number of parameters for the table service
nikpodsh Apr 28, 2023
b5eb774
Reduced number of parameters for the dataset service
nikpodsh Apr 28, 2023
2cd14e0
Merge remote-tracking branch 'upstream/modularization-main' into data…
nikpodsh May 2, 2023
a3d9676
Merge branch 'datasets-mod-part3' into datasets-mod-part4
nikpodsh May 2, 2023
c0aa53a
Merge branch 'datasets-mod-part4' into datasets-mod-part5
nikpodsh May 2, 2023
3ae1eca
Fixed all tests
nikpodsh May 3, 2023
f382a68
Review remarks
nikpodsh May 4, 2023
532ff0d
Added TODO
nikpodsh May 4, 2023
afcae66
Merge branch 'datasets-mod-part3' into datasets-mod-part4
nikpodsh May 4, 2023
abc301a
Merge branch 'datasets-mod-part4' into datasets-mod-part5
nikpodsh May 4, 2023
61d4eb3
After the merge of part3
nikpodsh May 4, 2023
1820bf2
Merge branch 'datasets-mod-part4' into datasets-mod-part5
nikpodsh May 4, 2023
9718e81
Merge commit
nikpodsh May 4, 2023
c2ac71c
Fixed all tests
nikpodsh May 5, 2023
082fec9
Review remarks
nikpodsh May 5, 2023
5c67ef4
Review remarks
nikpodsh May 8, 2023
57f1e1f
Review remarks
nikpodsh May 8, 2023
1ceb50e
Review remarks
nikpodsh May 8, 2023
0a5170e
Moved dataset constants
nikpodsh May 8, 2023
19377d6
Returned deleted methods and added triggering of alarms
nikpodsh May 8, 2023
ae0c611
Resolved cyclic import
nikpodsh May 9, 2023
a2e27a8
Moved glue script to dataset module and introduced extension for envi…
nikpodsh May 9, 2023
732f726
Renamed the method
nikpodsh May 9, 2023
90d6429
Fixed test and added registration of glue extension
nikpodsh May 9, 2023
12 changes: 0 additions & 12 deletions backend/dataall/api/Objects/Environment/queries.py
@@ -48,18 +48,6 @@
 )
 
 
-listDatasetsCreatedInEnvironment = gql.QueryField(
-    name='listDatasetsCreatedInEnvironment',
-    type=gql.Ref('DatasetSearchResult'),
-    args=[
-        gql.Argument(name='environmentUri', type=gql.NonNullableType(gql.String)),
-        gql.Argument(name='filter', type=gql.Ref('DatasetFilter')),
-    ],
-    resolver=list_datasets_created_in_environment,
-    test_scope='Dataset',
-)
-
-
 searchEnvironmentDataItems = gql.QueryField(
     name='searchEnvironmentDataItems',
     args=[
17 changes: 0 additions & 17 deletions backend/dataall/api/Objects/Environment/resolvers.py
@@ -370,23 +370,6 @@ def list_environment_group_permissions(
         check_perm=True,
     )
 
 
-def list_datasets_created_in_environment(
-    context: Context, source, environmentUri: str = None, filter: dict = None
-):
-    if not filter:
-        filter = {}
-    with context.engine.scoped_session() as session:
-        return db.api.Environment.paginated_environment_datasets(
-            session=session,
-            username=context.username,
-            groups=context.groups,
-            uri=environmentUri,
-            data=filter,
-            check_perm=True,
-        )
-
-
 def list_shared_with_environment_data_items(
     context: Context, source, environmentUri: str = None, filter: dict = None
 ):
16 changes: 1 addition & 15 deletions backend/dataall/api/Objects/Group/queries.py
@@ -1,5 +1,5 @@
 from ... import gql
-from .resolvers import get_group, list_datasets_owned_by_env_group, list_data_items_shared_with_env_group, list_cognito_groups
+from .resolvers import get_group, list_data_items_shared_with_env_group, list_cognito_groups
 
 getGroup = gql.QueryField(
     name='getGroup',
@@ -8,20 +8,6 @@
     resolver=get_group,
 )
 
 
-listDatasetsOwnedByEnvGroup = gql.QueryField(
-    name='listDatasetsOwnedByEnvGroup',
-    type=gql.Ref('DatasetSearchResult'),
-    args=[
-        gql.Argument(name='environmentUri', type=gql.NonNullableType(gql.String)),
-        gql.Argument(name='groupUri', type=gql.NonNullableType(gql.String)),
-        gql.Argument(name='filter', type=gql.Ref('DatasetFilter')),
-    ],
-    resolver=list_datasets_owned_by_env_group,
-    test_scope='Dataset',
-)
-
-
 listDataItemsSharedWithEnvGroup = gql.QueryField(
     name='listDataItemsSharedWithEnvGroup',
     args=[
17 changes: 0 additions & 17 deletions backend/dataall/api/Objects/Group/resolvers.py
@@ -43,23 +43,6 @@ def get_group(context, source, groupUri):
         return Group(groupUri=groupUri, name=groupUri, label=groupUri)
 
 
-def list_datasets_owned_by_env_group(
-    context, source, environmentUri: str = None, groupUri: str = None, filter: dict = None
-):
-    if not filter:
-        filter = {}
-    with context.engine.scoped_session() as session:
-        return db.api.Environment.paginated_environment_group_datasets(
-            session=session,
-            username=context.username,
-            groups=context.groups,
-            envUri=environmentUri,
-            groupUri=groupUri,
-            data=filter,
-            check_perm=True,
-        )
-
-
 def list_data_items_shared_with_env_group(
     context, source, environmentUri: str = None, groupUri: str = None, filter: dict = None
 ):
13 changes: 7 additions & 6 deletions backend/dataall/api/Objects/ShareObject/resolvers.py
@@ -7,7 +7,8 @@
 from ....api.context import Context
 from ....aws.handlers.service_handlers import Worker
 from ....db import models
-from dataall.modules.datasets.db.models import DatasetStorageLocation, DatasetTable
+from dataall.modules.datasets.db.models import DatasetStorageLocation, DatasetTable, Dataset
+from dataall.modules.datasets.services.dataset_service import DatasetService
 
 log = logging.getLogger(__name__)
 
@@ -19,7 +20,7 @@ def get_share_object_dataset(context, source, **kwargs):
         share: models.ShareObject = session.query(models.ShareObject).get(
             source.shareUri
         )
-        return session.query(models.Dataset).get(share.datasetUri)
+        return session.query(Dataset).get(share.datasetUri)
 
 
 def create_share_object(
@@ -32,7 +33,7 @@ def create_share_object(
 ):
 
     with context.engine.scoped_session() as session:
-        dataset: models.Dataset = db.api.Dataset.get_dataset_by_uri(session, datasetUri)
+        dataset: Dataset = DatasetService.get_dataset_by_uri(session, datasetUri)
         environment: models.Environment = db.api.Environment.get_environment_by_uri(
             session, input['environmentUri']
         )
@@ -222,7 +223,7 @@ def resolve_user_role(context: Context, source: models.ShareObject, **kwargs):
     if not source:
         return None
     with context.engine.scoped_session() as session:
-        dataset: models.Dataset = db.api.Dataset.get_dataset_by_uri(session, source.datasetUri)
+        dataset: Dataset = DatasetService.get_dataset_by_uri(session, source.datasetUri)
         if dataset and dataset.stewards in context.groups:
             return ShareObjectPermission.Approvers.value
         if (
@@ -250,7 +251,7 @@ def resolve_dataset(context: Context, source: models.ShareObject, **kwargs):
     if not source:
         return None
     with context.engine.scoped_session() as session:
-        ds: models.Dataset = db.api.Dataset.get_dataset_by_uri(session, source.datasetUri)
+        ds: Dataset = DatasetService.get_dataset_by_uri(session, source.datasetUri)
         if ds:
             env: models.Environment = db.api.Environment.get_environment_by_uri(session, ds.environmentUri)
             return {
@@ -292,7 +293,7 @@ def resolve_consumption_data(context: Context, source: models.ShareObject, **kwargs):
     if not source:
         return None
     with context.engine.scoped_session() as session:
-        ds: models.Dataset = db.api.Dataset.get_dataset_by_uri(session, source.datasetUri)
+        ds: Dataset = DatasetService.get_dataset_by_uri(session, source.datasetUri)
         if ds:
             S3AccessPointName = utils.slugify(
                 source.datasetUri + '-' + source.principalId,
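In this diff every direct `models.Dataset` / `db.api.Dataset` access is replaced by the dataset module's own `Dataset` model and `DatasetService`. A minimal sketch of that indirection, with toy stand-ins for the session and service (not data.all's real classes):

```python
# Toy sketch: route dataset lookups through a module-owned service
# instead of touching the shared ORM layer directly. The dict-backed
# "session" below is a stand-in for a SQLAlchemy session.
class Dataset:
    def __init__(self, datasetUri: str, environmentUri: str):
        self.datasetUri = datasetUri
        self.environmentUri = environmentUri


class DatasetService:
    """Facade owned by the datasets module; callers in other modules
    use this rather than reaching into db.api.Dataset."""

    @staticmethod
    def get_dataset_by_uri(session, uri: str) -> Dataset:
        # In data.all this performs a database query; the dict lookup
        # is only illustrative.
        return session[uri]


session = {'ds-1': Dataset('ds-1', 'env-1')}
ds = DatasetService.get_dataset_by_uri(session, 'ds-1')
print(ds.environmentUri)  # prints: env-1
```

The payoff is that only the datasets module needs to know where its model and queries live, which is the point of this modularization series.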
10 changes: 0 additions & 10 deletions backend/dataall/api/Objects/Stack/stack_helper.py
@@ -3,7 +3,6 @@
 import requests
 
 from .... import db
-from ....api.context import Context
 from ....aws.handlers.service_handlers import Worker
 from ....aws.handlers.ecs import Ecs
 from ....db import models
@@ -84,15 +83,6 @@ def deploy_stack(targetUri):
     return stack
 
 
-def deploy_dataset_stack(dataset: models.Dataset):
-    """
-    Each dataset stack deployment triggers environment stack update
-    to rebuild teams IAM roles data access policies
-    """
-    deploy_stack(dataset.datasetUri)
-    deploy_stack(dataset.environmentUri)
-
-
 def delete_stack(
     target_uri, accountid, cdk_role_arn, region
 ):
28 changes: 17 additions & 11 deletions backend/dataall/api/Objects/Vote/resolvers.py
@@ -1,7 +1,15 @@
-from .... import db
-from ....api.context import Context
+from typing import Dict, Type
+
+from dataall import db
+from dataall.api.context import Context
+from dataall.searchproxy.indexers import DashboardIndexer
+from dataall.modules.datasets.indexers.dataset_indexer import DatasetIndexer
+from dataall.searchproxy.base_indexer import BaseIndexer
+
+_VOTE_TYPES: Dict[str, Type[BaseIndexer]] = {}
+
+
+def add_vote_type(target_type: str, indexer: Type[BaseIndexer]):
+    _VOTE_TYPES[target_type] = indexer
 
 
 def count_upvotes(
@@ -28,15 +36,9 @@ def upvote(context: Context, source, input=None):
         data=input,
         check_perm=True,
     )
-    reindex(session, vote)
-    return vote
-
-
-def reindex(session, vote):
-    if vote.targetType == 'dataset':
-        DatasetIndexer.upsert(session=session, dataset_uri=vote.targetUri)
-    elif vote.targetType == 'dashboard':
-        DashboardIndexer.upsert(session=session, dashboard_uri=vote.targetUri)
+    _VOTE_TYPES[vote.targetType].upsert(session, vote.targetUri)
     return vote
 
 
 def get_vote(context: Context, source, targetUri: str = None, targetType: str = None):
@@ -49,3 +51,7 @@
         data={'targetType': targetType},
         check_perm=True,
     )
+
+
+# TODO should migrate after into the Dashboard module
+add_vote_type("dashboard", DashboardIndexer)
2 changes: 0 additions & 2 deletions backend/dataall/api/Objects/__init__.py
@@ -17,7 +17,6 @@
     DataPipeline,
     Environment,
     Activity,
-    Dataset,
     Group,
     Principal,
     Dashboard,
@@ -83,7 +82,6 @@ def adapted(obj, info, **kwargs):
         response = resolver(
             context=Namespace(
                 engine=info.context['engine'],
-                es=info.context['es'],
                 username=info.context['username'],
                 groups=info.context['groups'],
                 schema=info.context['schema'],