Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: new import commands for dataset and databases #11670

Merged
merged 2 commits into from
Nov 17, 2020

Conversation

betodealmeida
Copy link
Member

@betodealmeida betodealmeida commented Nov 12, 2020

SUMMARY

This PR introduced 3 new commands:

  • superset.commands.importers.v1.ImportModelsCommand: a base class used for the v1 importers for databases, datasets (in this PR), dashboard, charts and saved queries (in future PRs). The base class validates the import bundle, checking that the metadata.yaml file that looks like this:
version: 1.0.0
type: Database
timestamp: '2020-11-04T21:27:44.423819+00:00'

Calling run on the class will start a database transaction block, calling the import_bundle method of each subclass. If there are any errors all imports are rolled back, preventing partial imports.

  • superset.databases.commands.importers.v1.ImportDatabasesCommand: a command that imports one or more databases present in the import bundle, as well as associated datasets. Because of this, in addition to validating databases present in the bundle, the command will also validate the datasets.

When importing a database, associated datasets are imported non-destructively: if a table foo currently has a column col, but the imported dataset doesn't have it the column will not be deleted.

  • superset.datasets.commands.importers.v1.ImportDatasetsCommand: a command that imports one or more datasets present in the import bundle, as well as associated databases. Similar to the previous command, it will validate the database configuration as well.

When importing a dataset we sync columns and metrics one way (from bundle to DB). This means that when importing an existing dataset all missing columns and metrics in the import bundle are removed. This behavior is similar to the current datasource CLI import in Superset when passing the --sync flag.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TEST PLAN

Added unit tests covering import, multiple imports, validation, and rollback in case of error, ensuring that the imports are atomic and idempotent.

ADDITIONAL INFORMATION

@codecov-io
Copy link

codecov-io commented Nov 12, 2020

Codecov Report

Merging #11670 (dc1386c) into master (45738ff) will increase coverage by 4.88%.
The diff coverage is 92.01%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #11670      +/-   ##
==========================================
+ Coverage   62.25%   67.13%   +4.88%     
==========================================
  Files         874      895      +21     
  Lines       42348    43323     +975     
  Branches     3972     4015      +43     
==========================================
+ Hits        26363    29085    +2722     
+ Misses      15805    14136    -1669     
+ Partials      180      102      -78     
Flag Coverage Δ
cypress 56.00% <ø> (?)
javascript 62.82% <ø> (-0.03%) ⬇️
python 63.17% <92.01%> (+1.28%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/cli.py 40.53% <0.00%> (-0.28%) ⬇️
superset/datasets/commands/importers/v0.py 75.39% <16.66%> (-2.48%) ⬇️
superset/commands/importers/v1/utils.py 83.87% <83.87%> (ø)
...uperset/datasets/commands/importers/v1/__init__.py 92.75% <92.75%> (ø)
superset/datasets/commands/importers/v1/utils.py 92.85% <92.85%> (ø)
...perset/databases/commands/importers/v1/__init__.py 96.92% <96.92%> (ø)
superset/commands/importers/exceptions.py 100.00% <100.00%> (ø)
superset/databases/commands/importers/v1/utils.py 100.00% <100.00%> (ø)
superset/databases/schemas.py 100.00% <100.00%> (ø)
superset/datasets/schemas.py 96.39% <100.00%> (+1.95%) ⬆️
... and 232 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 45738ff...dc1386c. Read the comment docs.

from superset.models.helpers import ImportExportMixin

METADATA_FILE_NAME = "metadata.yaml"
IMPORT_VERSION = "1.0.0"
Copy link
Member

@hughhhh hughhhh Nov 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have a process on when this would be upgraded to 1.0.1 vs. 2.0.0, would it be like everytime these files are changed? or whenever the underlying model in the app change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that if we bump the format to, say, 1.0.1 we would change the validation to accept anything between 1.0.0 and 1.0.1, and adjust the class to handle the differences. If we bump it to 2.0.0 we should right a new command under v2, so that we can still be able to import 1.0.0 with the v1 command.

def import_(
session: Session, config: Dict[str, Any], overwrite: bool = False,
) -> ImportExportMixin:
raise NotImplementedError("Subclasss MUST implement import_")
Copy link
Member

@hughhhh hughhhh Nov 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise NotImplementedError("Subclasss MUST implement import_")
raise NotImplementedError("Subclass MUST implement import_")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh, good catch! We need a spellchecker pre-commit hook.

session: Session, config: Dict[str, Any], overwrite: bool = False,
) -> SqlaTable:
existing = session.query(SqlaTable).filter_by(uuid=config["uuid"]).first()
if existing:
Copy link
Member

@hughhhh hughhhh Nov 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldn't we go if existing and not overwrite here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to be treated separately. If the imported table already exists there are 2 scenarios:

  • If overwrite is false them we shouldn't do anything, so return
  • if overwrite is true then we can continue, but first we need to set the current ID to replace insert of insert

So if we could write it:

if existing and not overwrite:
    return

if existing:
    config["id"] = existing.id

Which is a bit more verbose.

f'Missing file "{METADATA_FILE_NAME}" in contents'
)
content = self.contents[METADATA_FILE_NAME]
metadata = load_yaml(METADATA_FILE_NAME, content)
Copy link
Member

@hughhhh hughhhh Nov 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
metadata = load_yaml(METADATA_FILE_NAME, content)
metadata = load_yaml(METADATA_FILE_NAME, self.contents[METADATA_FILE_NAME])

and remove the top line

@betodealmeida
Copy link
Member Author

Based on personal code review with @willbarrett:

  • got rid of inheritance
  • simplified the logic of the commands leaving only 1 private method
  • removed circular imports

@betodealmeida betodealmeida merged commit 7bc353f into apache:master Nov 17, 2020
auxten pushed a commit to auxten/incubator-superset that referenced this pull request Nov 20, 2020
* feat: commands for importing databases and datasets

* Refactor code
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.0.0 labels Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/XL 🚢 1.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants