forked from data-dot-all/dataall
Dataset Modularization pt.2 #1
Closed
Commits on Apr 14, 2023
- cb3800a
- dd8e597
- Renaming table_column_model to models to easier import other models (8ca7bea)
- e36ab3b
- Moving dataset profiling service and renaming it (31720c2)
- 8a907df
- Deleted DatasetTableProfilingJob since could not find any usage of it (561da72)
Commits on Apr 17, 2023
- Returned the name to model after renaming the service (47a38cc)
Commits on Apr 24, 2023
- Dataset Modularization pt.1 (data-dot-all#413) (3c4ab2d)

### Feature or Bugfix
- Refactoring (Modularization)

### Relates
- Related issues data-dot-all#295 and data-dot-all#412

### Short Summary
First part of the migration of `Dataset` (`DatasetTableColumn`). TL;DR :)

### Long Summary
Datasets are huge. It's one of the central modules, and it's spread everywhere across the application. Migrating the entire Dataset piece would be a very difficult task and, more importantly, even more difficult to review. Therefore, I decided to break this work down into "small" steps to make it more convenient to review.

The Dataset API consists of the following items:
* `Dataset`
* `DatasetTable`
* `DatasetTableColumn`
* `DatasetLocation`
* `DatasetProfiling`

This PR only creates the `Dataset` module and migrates `DatasetTableColumn` (and some pieces related to it). Why? Because the plan was to migrate it, see what issues came up along the way, and address them here. The refactoring of `DatasetTableColumn` will be in another PR.

The issues:
1) Glossaries
2) Feed
3) Long tasks for datasets
4) Redshift

Glossaries rely on a GraphQL UNION of different types (including datasets). Created an abstraction for glossary registration. There was an idea to change the frontend, but it would require a lot of work.

Feed: same as glossaries, and solved in a similar way. For feed, changing the frontend API is more feasible, but I wanted it to be consistent with glossaries.

Long tasks for datasets: they were migrated into the tasks folder and don't require dedicated loading of their code (at least for now). But there are two concerns:
1) The deployment uses direct module folder references to run them (e.g. `dataall.modules.datasets....`), so when a module is deactivated, we shouldn't deploy these tasks either. I left a TODO to address this in the future (when we migrate all modules), but we should bear in mind that it might lead to inconsistencies.
2) There is a reference to `redshift` from the long-running tasks. This should be addressed in the `redshift` module.

Redshift: it has some references to `datasets`. So there will be either dependencies among modules or a small amount of code duplication (if `redshift` doesn't rely heavily on `datasets`). This will be addressed in the `redshift` module.

Other changes:
* Fixed and improved some tests
* Extracted the glue handler code related to `DatasetTableColumn`
* Renamed the import mode from tasks to handlers for the async lambda
* A few hacks that will go away with the next refactoring :)

Next steps:
* [Part 2](#1) in preview :)
* Extract the rest of the datasets functionality (perhaps in a few steps)
* Refactor extractor modules the same way as notebooks
* Extract tests to follow the same structure

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
- 2ac3ae7
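
The pt.1 description mentions "an abstraction for glossary registration" so that modules can contribute their own types to the GraphQL UNION used by glossaries. The sketch below is one possible shape for such a registry, not the code from this PR: `GlossaryDefinition`, `GlossaryRegistry`, and their methods are hypothetical names chosen for illustration, and the `DatasetTableColumn` class is a stand-in for the real model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Type


@dataclass(frozen=True)
class GlossaryDefinition:
    """Hypothetical record for one type that can be linked to a glossary term."""
    target_type: str   # key used to look the definition up
    object_type: str   # GraphQL type name that joins the union
    model: Type        # class backing the type


class GlossaryRegistry:
    """Hypothetical registry; each module registers its own types when loaded."""
    _definitions: Dict[str, GlossaryDefinition] = {}

    @classmethod
    def register(cls, definition: GlossaryDefinition) -> None:
        cls._definitions[definition.target_type] = definition

    @classmethod
    def types(cls) -> List[str]:
        # Names of all registered GraphQL types, i.e. the members of the union.
        return sorted(d.object_type for d in cls._definitions.values())

    @classmethod
    def find_object_type(cls, obj: object) -> Optional[str]:
        # Resolve a concrete object to its GraphQL type name, or None if unknown.
        for definition in cls._definitions.values():
            if isinstance(obj, definition.model):
                return definition.object_type
        return None


class DatasetTableColumn:
    """Stand-in for the real model; assumed here only to make the sketch runnable."""


# The datasets module would do this on load; a deactivated module simply never registers.
GlossaryRegistry.register(
    GlossaryDefinition("DatasetTableColumn", "DatasetTableColumn", DatasetTableColumn)
)
```

With a registry like this, the core glossary code only asks `GlossaryRegistry.types()` for the union members and `find_object_type` for type resolution, so it does not need to import anything from the datasets module directly.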