Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(SIP-95): new endpoint for extra table metadata #28063

Merged
merged 6 commits into from
Apr 18, 2024

Conversation

betodealmeida
Copy link
Member

@betodealmeida betodealmeida commented Apr 16, 2024

SUMMARY

I started working on SIP-95, which introduces catalogs to Superset. The introduction of a new hierarchical level, catalog → schema → table, means that some API endpoints need to be updated to support a new parameter.

One of these API endpoints is used to fetch extra metadata about a given table, and the route looks like this:

/<int:pk>/table_extra/<path:table_name>/<schema_name>/

There are a few problems with the current route:

  • table_name is a path, because table names can contain slashes.
  • schema_name is often passed as null or undefined, so we need special logic to convert it to None in the backend.
  • schema_name should never contain slashes, otherwise the parsing will be wrong.

Because of this, instead of enhancing the existing endpoint to add support for catalogs, I decided to create a new one:

/<int:pk>/table_metadata/extra/?table=t[&schema=s[&catalog=c]]

In this new endpoint, table, schema, and catalog are all passed as query parameters. This allows slashes in all of them, and prevents polluting logs with confusing URLs such as /api/v1/databses/1/table_extra/my_table/undefined/. Only table is required, the other two are optional.

The API calls a new method get_extra_table_metadata in the corresponding DB engine spec. The new method takes a Table object, which contains a name and optional schema/catalog. To prevent breaking any 3rd-party DB engine specs the new method will issue a warning and call extra_table_metadata if it exists and no catalog was specified.

Note that while the old endpoint is not removed in this PR, to prevent breaking changes, the frontend code was updated to use the new endpoint.

In the future, we should remove the old endpoint and also the extra_table_metadata method. @michael-s-molina, what do you think of this approach of introducing a new API? Should we mark it as deprecated now and remove it in the next release, or should we mark is as deprecated in 5.0 and remove it in 6.0?

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

I updated existing tests and added new unit tests.

ADDITIONAL INFORMATION

  • Has associated issue: [SIP-95] Proposal for Catalog Support in SQL Lab #22862
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@github-actions github-actions bot added the api Related to the REST API label Apr 16, 2024
@michael-s-molina
Copy link
Member

michael-s-molina commented Apr 17, 2024

Should we mark it as deprecated now and remove it in the next release, or should we mark is as deprecated in 5.0 and remove it in 6.0?

You can mark it as deprecated now. Please add a proposal here to remove the old endpoint in 5.0.

In this new endpoint, table, schema, and catalog are all passed as query parameters. This allows slashes in all of them, and prevents polluting logs with confusing URLs such as /api/v1/databses/1/table_extra/my_table/undefined/.

Nesting resources or not is an architectural decision, with trade-offs, that we need to make consistently in our API. Because we built the API one endpoint at a time, we don't have consistency in nomenclature, response types, params, etc. I'm hoping to fix all these problems with v2 where we would analyze all existing endpoints and propose a consistent API in a SIP. For that reason, any pattern you choose now is acceptable and will be reviewed in v2.

@betodealmeida
Copy link
Member Author

Thanks, @michael-s-molina!

Nesting resources or not is an architectural decision, with trade-offs, that we need to make consistently in our API. Because we built the API one endpoint at a time, we don't have consistency in nomenclature, response types, params, etc. I'm hoping to fix all these problems with v2 where we would analyze all existing endpoints and propose a consistent API in a SIP. For that reason, any pattern you choose now is acceptable and will be reviewed in v2.

Oh, I'm not against nesting resources in general. The problem here is that table names can contain slashes, and I think we should assume that schemas and catalogs can too. Imagine we have this route:

/some/api/<path:catalog>/<path:schema>/<path:table>/

And we have a table with slashes in the catalog, schema, and name, which we escape with %2F when doing a request:

/some/api/my%2Fcatalog/my%2Fschema/my%2Ftable/

What the Flask router sees in the request is:

/some/api/my/catalog/my/schema/my/table/

Which gets mapped to:

  • catalog: my
  • schema: catalog
  • table: my/schema/my/table

So we'd have to double escape the slashes:

/some/api/my%252Fcatalog/my%252Fschema/my%252Ftable/

And the route would have unescape them back. For cases like this I think it's easier to pass the parameters as query arguments:

/some/api/?table=my/table&schema=my/schema&catalog=my/catalog

While longer, I think it's easier to read. And when some of those parameters are missing it gets better:

/some/api/?table=my/table

vs.

/some/api/undefined/undefined/my%252Ftable/

So something to think about when designing v2.

@michael-s-molina
Copy link
Member

Oh, I'm not against nesting resources in general. The problem here is that table names can contain slashes, and I think we should assume that schemas and catalogs can too.
So something to think about when designing v2.

Yep! That's indeed a requirement we should consider when designing v2.

superset/databases/api.py Outdated Show resolved Hide resolved
superset/databases/schemas.py Outdated Show resolved Hide resolved
superset/db_engine_specs/base.py Show resolved Hide resolved
col_names, latest_parts = cls.latest_partition(
table_name, schema_name, database, show_first=True, indexes=indexes
table.table,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be passing a Table instance everywhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should! I'm doing that in the next PR! 😊

try:
security_manager.raise_for_access(database=database, table=table)
except SupersetSecurityException as ex:
# instead of raising 403, raise 404 to hide table existence
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

:return: Engine-specific table metadata
"""
# TODO: Fix circular import caused by importing Database
# old method that doesn't work with catalogs
if hasattr(cls, "extra_table_metadata"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional:
do:

if not hasattr(cls, "extra_table_metadata"):
    return {}
...

to reduce nesting

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I'll fix it in the next PR!

@betodealmeida betodealmeida merged commit 68a982d into master Apr 18, 2024
31 checks passed
qleroy pushed a commit to qleroy/superset that referenced this pull request Apr 28, 2024
@betodealmeida betodealmeida changed the title feat(sip-95): new endpoint for extra table metadata feat(SIP-95): new endpoint for extra table metadata May 2, 2024
jzhao62 pushed a commit to jzhao62/superset that referenced this pull request May 16, 2024
@rusackas rusackas deleted the new-extra-table-metadata-api branch September 27, 2024 21:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Related to the REST API preset-io size/XL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants