Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

modularization: backend Pluginarchitecture #359

Merged
merged 97 commits into from
Mar 31, 2023

Conversation

nikpodsh
Copy link
Contributor

@nikpodsh nikpodsh commented Mar 9, 2023

This is a draft PR for showing purposes. There are still some minor issues that needs to be addressed.

Feature or Bugfix

  • Refactoring

Detail

There are following changes under this PR:

  1. Modularization + Refactoring of notebooks
    There are new modules that will play a major role in the future refactoring:

    • Core = contains the code need for application to operate correctly
    • Common = common code for all modules
    • Modules = the plugin/feature that can be inserted into the system (at the moment only notebooks)

    A small instruction how to import module can be found in backend/dataall/modules/init.py + see loader.py
    The other part that is related to modularization is the creation of environment parameters and resources.
    Environment parameter will replace all hardcoded parameters of the environment configuration.
    There is a new file - config.json that allows you to configure an application configuration.
    All existing parameters will be migrated via db migration in AWS

  2. Extracting permissions and request context (Optional for the modularization)

    Engine, user, and user groups had been passed as a parameter of context in the request. This had forced to pass a lot of parameters to other methods that weren't even needed. This information should be as a scope of the request session.
    There is a new way to retrieve the information using RequestContext.
    There is also a new way to use permission checks that require less parameters and make code cleaner. The old way was marked as deprecated

  3. Restructure of the code (Optional for the modularization)

    Since the modularization will touch all the places in the API code it can be a good change to set a new structure of the code. There are small re-organization in notebook module to address

    • Allocating the resources before the validating parameters
    • Not clear responsibility of the classes
    • Mixed layers

    There are new structure :

    • resolvers = validate and pass code to service layer
    • service layer = bisnesss logic
    • repositories = database logic (all queries should be placed here)
    • aws = contains a wrapper client upon boto3
    • cdk = all logic related to create stacks or code for ecs
    • tasks = code that will be executed in AWS lambda (short-living tasks)

    All names can be changed.

Relates

#295

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Created a modules
Moved GraphQL mapping of notebooks into modules
Added a bit of documentation
local.graphql.server.py doesn't follow the python convention of naming file
It makes impossible to debug the application
Added a way to load modules depending on config
Created a config file
Created a config representation in the backend
Returned local to graph_server.py to emphasize that it's a local server
Changed the name for cdk to be able to run debug correctly
Moved notebook.py and renamed it to stacks.py
Added documentation. Fixed config.json
Added loading of modules to cdkproxymain.py
Extracted the notebooks permissions from global permissions
Removed not used role
Added extracted sagemaker notebooks permissions
Extracted parameters into the dedicated table
Refactored the check if there are some resources that belongs to the environment
Added parameters to the db
Moved notebookEnabled from environment to parameters
Dropped the column
Fixed missed permissions
Added params scheme to gql
Fixed small issues
Changed a tests a little after introducing new changes
Tagging shouldn't have mapping inside the method,since it's not extendable
Target type can be resolved in stack
Changed the registration mechanism
Added minimum documentation
Sagemaker and ml studio notebooks shares the common code
It has been extracted
No need for re-initialization of permissions
Passing group permission as a parameter
Refactored the importing of subclasses
Sagemaker ML studio should still be loaded directly. It causes cycle dependency
Added input type for creation
Fixed permission loading
Removed a hack with the permissions for notebooks
listEnvironmentGroupInvitationPermissions should return correct set of permissions
Changed config as well since current working directory can be different
Moved sagemaker and renamed it to SagemakerClient
backend/dataall/core/permission_checker.py Outdated Show resolved Hide resolved
backend/dataall/core/environment/models.py Outdated Show resolved Hide resolved
@@ -106,27 +93,9 @@ def generate_policies(self) -> [aws_iam.ManagedPolicy]:

services = ServicePolicy.__subclasses__()

if permissions.CREATE_REDSHIFT_CLUSTER not in self.permissions:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree here with @dlpzx - we should keep the permissions inside the module. These are part of the business logic that is encapsulated inside module

backend/dataall/modules/loader.py Outdated Show resolved Hide resolved
The values represent a submodule and should exist
"""

API = "gql"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in the end it will be hard to integrate those "forks". This is a good way to provide completely new modules but for extending existing there should be more granule extension points so that one can use composition or inheritance and modify only what is needed e.g. add additional field to db/api. If I copy the whole module it will be hard for me to follow upstream changes (even harder the it is now because those changes will be physically in other files). But probably this is smth to address as a next step

@dlpzx dlpzx marked this pull request as ready for review March 22, 2023 16:35
@dlpzx dlpzx changed the title Pluginarchitecture backend: Pluginarchitecture Mar 31, 2023
…inarchitecture

# Conflicts:
#	backend/requirements.txt
@nikpodsh nikpodsh merged commit 098081d into data-dot-all:modularization-main Mar 31, 2023
@dlpzx dlpzx changed the title backend: Pluginarchitecture modularization: backend Pluginarchitecture May 24, 2023
dlpzx added a commit to dlpzx/aws-dataall that referenced this pull request May 25, 2023
This is a draft PR for showing purposes. There are still some minor
issues that needs to be addressed.

### Feature or Bugfix
- Refactoring

### Detail
There are following changes under this PR:
1. Modularization + Refactoring of notebooks
There are new modules that will play a major role in the future
refactoring:

   * Core = contains the code need for application to operate correctly
   * Common = common code for all modules
* Modules = the plugin/feature that can be inserted into the system (at
the moment only notebooks)

The other part that is related to modularization is the creation of
environment parameters.
Environment parameter will replace all hardcoded parameters of the
environment configuration.
There is a new file - config.json that allows you to configure an
application configuration.
All existing parameters will be migrated via db migration in AWS

2. Extracting permissions and request context (Optional for the
modularization)

Engine, user, and user groups had been passed as a parameter of context
in the request. This had forced to pass a lot of parameters to other
methods that weren't even needed. This information should be as a scope
of the request session.
There is a new way to retrieve the information using `RequestContext.`
There is also a new way to use permission checks that require less
parameters and make code cleaner. The old way was marked as deprecated

3. Restructure of the code (Optional for the modularization)

Since the modularization will touch all the places in the API code it
can be a good change to set a new structure of the code. There are small
re-organization in notebook module to address

   * Allocating the resources before the validating parameters
   * Not clear responsibility of the classes
   * Mixed layers

   There are new structure :
   - resolvers = validate and pass code to service layer
   - service layer = bisnesss logic
   - repositories = database logic (all queries should be placed here)
   - aws = contains a wrapper client upon boto3
   - cdk = all logic related to create stacks or code for ecs
- tasks = code that will be executed in AWS lambda (short-living tasks)

   All names can be changed.

### Relates
[data-dot-all#295](data-dot-all#295)

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dlpzx <71252798+dlpzx@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants