feat: new import commands for dataset and databases #11670
Codecov Report
@@ Coverage Diff @@
## master #11670 +/- ##
==========================================
+ Coverage 62.25% 67.13% +4.88%
==========================================
Files 874 895 +21
Lines 42348 43323 +975
Branches 3972 4015 +43
==========================================
+ Hits 26363 29085 +2722
+ Misses 15805 14136 -1669
+ Partials 180 102 -78
superset/commands/importers/v1.py
from superset.models.helpers import ImportExportMixin

METADATA_FILE_NAME = "metadata.yaml"
IMPORT_VERSION = "1.0.0"
Do you have a process for deciding when this would be upgraded to 1.0.1 vs. 2.0.0? Would it be every time these files are changed, or whenever the underlying model in the app changes?
The idea is that if we bump the format to, say, 1.0.1, we would change the validation to accept anything between 1.0.0 and 1.0.1, and adjust the class to handle the differences. If we bump it to 2.0.0 we should write a new command under v2, so that we are still able to import 1.0.0 with the v1 command.
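A minimal sketch of the versioning scheme described in this reply, assuming dotted numeric version strings. The names MIN_SUPPORTED, MAX_SUPPORTED, parse_version and pick_importer are hypothetical and are not part of the PR; the real validation may well be structured differently.

```python
from typing import Tuple

# Hypothetical bounds: the oldest and newest format versions the v1 command accepts.
MIN_SUPPORTED = "1.0.0"
MAX_SUPPORTED = "1.0.1"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "1.0.1" into (1, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_supported_by_v1(version: str) -> bool:
    """Return True if the version falls inside the inclusive v1 range."""
    low, high = parse_version(MIN_SUPPORTED), parse_version(MAX_SUPPORTED)
    return low <= parse_version(version) <= high


def pick_importer(version: str) -> str:
    """Route a bundle to a command by its version (labels are illustrative)."""
    if is_supported_by_v1(version):
        return "v1"
    if parse_version(version)[0] == 2:
        return "v2"
    raise ValueError(f"Unsupported import version: {version}")
```

With these bounds, a bundle at 1.0.0 or 1.0.1 goes to the v1 command, a 2.x bundle goes to a separate v2 command, and anything else is rejected.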
superset/commands/importers/v1.py
def import_(
    session: Session, config: Dict[str, Any], overwrite: bool = False,
) -> ImportExportMixin:
    raise NotImplementedError("Subclasss MUST implement import_")
- raise NotImplementedError("Subclasss MUST implement import_")
+ raise NotImplementedError("Subclass MUST implement import_")
Argh, good catch! We need a spellchecker pre-commit hook.
    session: Session, config: Dict[str, Any], overwrite: bool = False,
) -> SqlaTable:
    existing = session.query(SqlaTable).filter_by(uuid=config["uuid"]).first()
    if existing:
Couldn't we use if existing and not overwrite here?
It needs to be treated separately. If the imported table already exists there are 2 scenarios:
- If overwrite is false then we shouldn't do anything, so return
- If overwrite is true then we can continue, but first we need to set the current ID so that we replace instead of insert
So we could write it as:
    if existing and not overwrite:
        return
    if existing:
        config["id"] = existing.id
Which is a bit more verbose.
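The two scenarios can be sketched without a database, using a plain dict in place of the SQLAlchemy session. Everything here (Store, find_by_uuid, import_dataset, and returning the existing row when overwrite is false) is a hypothetical stand-in for illustration, not the code in the PR.

```python
from typing import Any, Dict, Optional

# In-memory stand-in for the datasets table, keyed by primary key.
Store = Dict[int, Dict[str, Any]]


def find_by_uuid(store: Store, uuid: str) -> Optional[Dict[str, Any]]:
    """Return the stored row with this UUID, or None if there is none."""
    for row in store.values():
        if row["uuid"] == uuid:
            return row
    return None


def import_dataset(
    store: Store, config: Dict[str, Any], overwrite: bool = False
) -> Dict[str, Any]:
    existing = find_by_uuid(store, config["uuid"])
    # Scenario 1: the row exists and we may not overwrite, so do nothing.
    if existing and not overwrite:
        return existing
    row = dict(config)  # copy so the caller's config is not mutated
    if existing:
        # Scenario 2: reuse the current ID so the write replaces the row.
        row["id"] = existing["id"]
    else:
        # New row: allocate the next free ID.
        row["id"] = max(store, default=0) + 1
    store[row["id"]] = row
    return row
```

Importing the same UUID twice leaves a single row in the store: the second call is either a no-op or an in-place replacement, depending on overwrite.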
superset/commands/importers/v1.py
        f'Missing file "{METADATA_FILE_NAME}" in contents'
    )
content = self.contents[METADATA_FILE_NAME]
metadata = load_yaml(METADATA_FILE_NAME, content)
nit:
- metadata = load_yaml(METADATA_FILE_NAME, content)
+ metadata = load_yaml(METADATA_FILE_NAME, self.contents[METADATA_FILE_NAME])
and remove the top line
Based on personal code review with @willbarrett:
* feat: commands for importing databases and datasets
* Refactor code
SUMMARY
This PR introduces 3 new commands:
- superset.commands.importers.v1.ImportModelsCommand: a base class used for the v1 importers for databases and datasets (in this PR), and for dashboards, charts and saved queries (in future PRs). The base class validates the import bundle, checking its metadata.yaml file. Calling run on the class will start a database transaction block, calling the import_bundle method of each subclass. If there are any errors all imports are rolled back, preventing partial imports.
- superset.databases.commands.importers.v1.ImportDatabasesCommand: a command that imports one or more databases present in the import bundle, as well as associated datasets. Because of this, in addition to validating databases present in the bundle, the command will also validate the datasets.
  When importing a database, associated datasets are imported non-destructively: if a table foo currently has a column col, but the imported dataset doesn't have it, the column will not be deleted.
- superset.datasets.commands.importers.v1.ImportDatasetsCommand: a command that imports one or more datasets present in the import bundle, as well as associated databases. Similar to the previous command, it will validate the database configuration as well.
  When importing a dataset we sync columns and metrics one way (from bundle to DB). This means that when importing an existing dataset, all columns and metrics missing from the import bundle are removed. This behavior is similar to the current datasource CLI import in Superset when passing the --sync flag.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A
TEST PLAN
Added unit tests covering import, multiple imports, validation, and rollback in case of error, ensuring that the imports are atomic and idempotent.
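As an illustration of the atomicity property these tests check, here is a small self-contained sketch using Python's built-in sqlite3 module rather than Superset's SQLAlchemy session. The table name and the helpers make_db, import_all and count_rows are hypothetical; the point is only that a failure partway through leaves no partial import behind.

```python
import sqlite3
from typing import Iterable


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a table that rejects duplicate names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE datasets (name TEXT NOT NULL UNIQUE)")
    conn.commit()
    return conn


def import_all(conn: sqlite3.Connection, names: Iterable[str]) -> None:
    """Insert every name in one transaction; undo all of them on any error."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if it raises.
    with conn:
        for name in names:
            conn.execute("INSERT INTO datasets (name) VALUES (?)", (name,))


def count_rows(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM datasets").fetchone()[0]


conn = make_db()
import_all(conn, ["a", "b"])
try:
    # "c" is inserted first, then "a" violates the UNIQUE constraint.
    import_all(conn, ["c", "a"])
except sqlite3.IntegrityError:
    pass
# The failed batch was rolled back as a whole, so "c" is not present.
rows_after_failure = count_rows(conn)
```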
ADDITIONAL INFORMATION