feat: datasource access to allow more granular access to tables on SQL Lab #18064

victorarbuesmallada

SUMMARY

We'd like more granular access to tables in SQL Lab, which can currently only be granted at the schema or database level. This PR tackles the issue we raised a couple of days ago (#18014).

TESTING INSTRUCTIONS

Assign a datasource permission to a role and check that users with that role can only query that particular datasource.

ADDITIONAL INFORMATION

  • Has associated issue: Restrict SQL Lab access to datasource #18014
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@victorarbuesmallada victorarbuesmallada force-pushed the feature/datasource-access-sql-lab branch 2 times, most recently from 1283be9 to 7c7363e Compare January 17, 2022 10:20
@codecov

codecov bot commented Jan 18, 2022

Codecov Report

Merging #18064 (ed7c36f) into master (14b9298) will decrease coverage by 0.17%.
The diff coverage is 50.00%.

@@            Coverage Diff             @@
##           master   #18064      +/-   ##
==========================================
- Coverage   66.34%   66.16%   -0.18%     
==========================================
  Files        1569     1588      +19     
  Lines       61685    62480     +795     
  Branches     6240     6240              
==========================================
+ Hits        40927    41343     +416     
- Misses      19161    19540     +379     
  Partials     1597     1597              
Flag      Coverage Δ
hive      52.15% <50.00%> (-1.09%) ⬇️
mysql     81.32% <50.00%> (-0.77%) ⬇️
postgres  81.37% <50.00%> (-0.78%) ⬇️
presto    51.99% <50.00%> (-1.09%) ⬇️
python    81.80% <50.00%> (-0.79%) ⬇️
sqlite    81.06% <50.00%> (-0.78%) ⬇️

Impacted Files                              Coverage Δ
superset/security/manager.py                94.26% <33.33%> (+2.29%) ⬆️
superset/databases/filters.py               100.00% <100.00%> (ø)
superset/examples/multi_line.py             0.00% <0.00%> (-53.85%) ⬇️
superset/commands/importers/v1/examples.py  0.00% <0.00%> (-38.64%) ⬇️
superset/examples/big_data.py               0.00% <0.00%> (-35.00%) ⬇️
superset/examples/misc_dashboard.py         0.00% <0.00%> (-33.34%) ⬇️
superset/examples/utils.py                  0.00% <0.00%> (-28.58%) ⬇️
superset/examples/tabbed_dashboard.py       0.00% <0.00%> (-27.59%) ⬇️
superset/db_engine_specs/teradata.py        62.75% <0.00%> (-27.25%) ⬇️
superset/examples/bart_lines.py             0.00% <0.00%> (-25.81%) ⬇️
... and 95 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 14b9298...ed7c36f.

@geido geido requested a review from villebro January 18, 2022 16:05
@geido
Member

geido commented Jan 18, 2022

Hey @Painyjames, it's great that you went ahead and implemented this. Let me ping a few people who can review this PR!

@betodealmeida
Member

/testenv up

@github-actions
Contributor

@betodealmeida Ephemeral environment spinning up at http://54.200.220.32:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.

@betodealmeida
Member

betodealmeida commented Jan 19, 2022

I created a test user sqllab in the ephemeral environment above (password sqllab) and assigned it to the role sql_lab, and assigned the permission datasource access on [examples].[messages](id:10) to that role.

But I wasn't able to query the table:

SELECT * FROM examples.messages

On the browser console I can see the error:

{
  "errors": [
    {
      "message": "You need access to the following tables: `examples.messages`,\n            `all_database_access` or `all_datasource_access` permission",
      "error_type": "TABLE_SECURITY_ACCESS_ERROR",
      "extra": {
        "link": "",
        "tables": [
          "examples.messages"
        ]
      }
    }
  ]
}

Maybe @dpgaspar can shed some light here?
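The error above lists the tables that the query references but the user's permissions do not cover. A simplified model of that check, for illustration only: it assumes the query's tables and the user's allowed tables are already available as sets of qualified names, and find_missing_tables and build_table_access_error are hypothetical helpers, not Superset functions. The payload mirrors the structure shown in the browser console.

```python
from typing import Dict, List, Set


def find_missing_tables(query_tables: Set[str], allowed_tables: Set[str]) -> List[str]:
    """Return the tables referenced by the query that the user may not access (sorted)."""
    return sorted(query_tables - allowed_tables)


def build_table_access_error(missing: List[str]) -> Dict[str, object]:
    """Build an error payload shaped like the one seen in the browser console."""
    tables = ", ".join(f"`{t}`" for t in missing)
    return {
        "errors": [
            {
                "message": (
                    f"You need access to the following tables: {tables}, "
                    "`all_database_access` or `all_datasource_access` permission"
                ),
                "error_type": "TABLE_SECURITY_ACCESS_ERROR",
                "extra": {"link": "", "tables": missing},
            }
        ]
    }
```

The comparison is purely on names: whether examples.messages in the query matches a permission on [examples].[messages] depends on how both sides are qualified, which is the question raised in the reply below.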

@victorarbuesmallada
Author

victorarbuesmallada commented Jan 19, 2022

> I created a test user sqllab in the ephemeral environment above (password sqllab) and assigned it to the role sql_lab, and assigned the permission datasource access on [examples].[messages](id:10) to that role.
>
> But I wasn't able to query the table:
>
> SELECT * FROM examples.messages

I think this might not work because the permission is datasource access on [database].[table], whereas the SQL query uses SELECT * FROM schema.table.

In the acceptance tests we have against our patched superset container, we have the following:

  • permission datasource access on [postgres].[audit_logs]
  • sql editor query SELECT * FROM public.audit_logs

This works fine because we only have one schema in our database, but we don't know what would happen if the database had several schemas, each containing a table with the same name.

In any case, this is still better than having only the schema permission, which grants access to all the tables in that schema. But maybe we should consider generating permissions like this:

  • datasource access on [database].[schema].[table]
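To make the ambiguity concrete, here is a sketch that parses both permission name shapes mentioned in this thread (the current [database].[table](id:N) form and the proposed [database].[schema].[table] form) and checks whether a permission covers a query's schema.table. The regex, TablePerm, parse_perm and perm_covers are hypothetical; the naming scheme is inferred from the examples above rather than taken from Superset's code.

```python
import re
from typing import NamedTuple, Optional

# Matches "[db].[table]", "[db].[table](id:10)" and "[db].[schema].[table]" (assumed formats).
_PERM_RE = re.compile(
    r"^\[([^\]]+)\]\.\[([^\]]+)\](?:\.\[([^\]]+)\])?(?:\(id:(\d+)\))?$"
)


class TablePerm(NamedTuple):
    database: str
    schema: Optional[str]  # None for the two-part form
    table: str


def parse_perm(view_menu: str) -> TablePerm:
    m = _PERM_RE.match(view_menu)
    if m is None:
        raise ValueError(f"unrecognised permission name: {view_menu!r}")
    database, second, third, _perm_id = m.groups()
    if third is None:
        # Two-part form: the second segment is the table, schema is unknown.
        return TablePerm(database, None, second)
    return TablePerm(database, second, third)


def perm_covers(perm: TablePerm, database: str, schema: str, table: str) -> bool:
    """A two-part permission cannot tell schemas apart; a three-part one can."""
    if perm.database != database or perm.table != table:
        return False
    return perm.schema is None or perm.schema == schema
```

With the two-part permission, public.audit_logs and other.audit_logs in the same database are both covered; only the three-part form can tell them apart.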

@victorarbuesmallada
Author

victorarbuesmallada commented Jan 20, 2022

I've just tested it in the test environment and, even though I couldn't load the schema, I was able to run a SQL statement against the allowed table.
Screenshot 2022-01-20 at 09 54 08
Screenshot 2022-01-20 at 09 16 25

@victorarbuesmallada
Author

victorarbuesmallada commented Jan 20, 2022

Also adding what we see locally with this PR's changes:
Screenshot 2022-01-20 at 09 49 52
Screenshot 2022-01-20 at 09 44 57

Member

@dpgaspar dpgaspar left a comment

Quick pass; left a comment. This is a delicate change, so we should make sure it is backward compatible.

schema_name = self.default_schema_backend_map[example_db.backend]
uri = f"superset/tables/{example_db.id}/{schema_name}/{table_name}/"
rv = self.client.get(uri)
self.assertEqual(rv.status_code, 200)
Member

Can you also add a test where the table is not allowed?

added 👍
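A self-contained sketch of the allowed/not-allowed pair of tests requested above. It replaces Superset's test client and security manager with a hypothetical can_query_table helper over (database, table) pairs, so it only illustrates the shape of the tests, not the actual test code added in this PR.

```python
import unittest
from typing import Set, Tuple

Table = Tuple[str, str]  # (database, table); hypothetical representation


def can_query_table(granted: Set[Table], database: str, table: str) -> bool:
    """Hypothetical check: True only if a datasource grant covers this exact table."""
    return (database, table) in granted


class DatasourceAccessTest(unittest.TestCase):
    def setUp(self) -> None:
        # Role with datasource access on a single table only.
        self.granted = {("examples", "covid_vaccines")}

    def test_allowed_table(self) -> None:
        self.assertTrue(can_query_table(self.granted, "examples", "covid_vaccines"))

    def test_table_not_allowed(self) -> None:
        # Same database, different table: must be denied.
        self.assertFalse(can_query_table(self.granted, "examples", "birth_names"))
```

It can be run with python -m unittest; the negative case is the one that guards against the permission accidentally widening to the whole database.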

@victorarbuesmallada
Author

@villebro @betodealmeida @zhaoyongjie are you guys happy with this PR? :)

Member

@villebro villebro left a comment

Really nice improvement! I left a minor readability comment, but the changes LGTM at face value. I agree with @dpgaspar that this is a delicate change, so we should make sure to test this properly. I will do some testing tomorrow.

superset/databases/filters.py (outdated, resolved)
@pull-request-size pull-request-size bot added size/L and removed size/M labels Jan 27, 2022
@villebro
Member

villebro commented Feb 1, 2022

/testenv up

@github-actions
Contributor

github-actions bot commented Feb 1, 2022

@villebro Ephemeral environment spinning up at http://35.87.122.84:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.

@villebro
Member

villebro commented Feb 1, 2022

@Painyjames I just spun up an ephemeral environment for testing. If it's not too much trouble, could you add a few test users with relevant perms (at least one with and one without datasource access) and instructions for testing, so we can validate the changes here?

@victorarbuesmallada
Author

victorarbuesmallada commented Feb 1, 2022

I've created a couple of users:

  • testcovidvaccines (same username and password), which has access to the covid_vaccines datasource.

Screenshot 2022-02-01 at 22 27 52

  • testnoaccess (same username and password), which has no access to any datasource.

@victorarbuesmallada
Author

victorarbuesmallada commented Feb 1, 2022

It's worth mentioning that, with the datasource access user, I somehow cannot select the schema and table in the dropdown lists, although querying the allowed datasource still works.
This doesn't happen locally when the datasource comes from a PostgreSQL database (instead of SQLite), as you can see here.

@villebro
Member

villebro commented Feb 4, 2022

> Worth mentioning that, with the datasource access user, somehow I cannot select the schema and table on the dropdown lists, although querying the allowed datasource is still possible. This doesn't happen locally when the datasource is coming from a postgres database (instead of sql lite) as you can see here.

I believe this is an unrelated bug and might be resolved by #18564

Edit: it appears to be a permissions issue after all. Let me check the best way to handle this.

@mayurnewase
Contributor

mayurnewase commented Feb 4, 2022

In the issue, the requirement was to allow querying a physical table in the database (without needing to create a dataset), but the datasource permission used here belongs to Superset's dataset layer.
So if no dataset has been created but the table exists in the database, is the user expected to be able to query it?
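One way to picture this concern, under the assumption that datasource permissions are only generated for tables registered as datasets: a physical table that exists in the database but has no dataset has no permission that could be granted, so a datasource-only user would be denied. Catalog, perm_name and can_query are hypothetical, and the permission name format is inferred from this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Catalog:
    """Hypothetical catalog: physical tables vs. tables registered as datasets."""

    physical_tables: Set[str] = field(default_factory=set)
    datasets: Dict[str, int] = field(default_factory=dict)  # table name -> dataset id

    def perm_name(self, database: str, table: str) -> Optional[str]:
        """Datasource permission name for a table, or None if it has no dataset."""
        dataset_id = self.datasets.get(table)
        if dataset_id is None:
            return None
        return f"[{database}].[{table}](id:{dataset_id})"


def can_query(catalog: Catalog, granted: Set[str], database: str, table: str) -> bool:
    """Allow a query only if the table exists and a granted permission covers it."""
    if table not in catalog.physical_tables:
        return False
    perm = catalog.perm_name(database, table)
    # A table without a dataset has no datasource permission that could be granted.
    return perm is not None and perm in granted
```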

@mayurnewase
Contributor

mayurnewase commented Feb 6, 2022

Also, I think the reason the schema doesn't show in the dropdown for SQLite is that, when the examples are loaded into SQLite, this attribute https://docs.sqlalchemy.org/en/14/core/reflection.html#sqlalchemy.engine.reflection.Inspector.default_schema_name returns None instead of main, so those schemas get filtered out when computing the schemas accessible to the user.
PR #16041 seems to address that issue, but it isn't working for SQLite.
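A sketch of the mechanism described above, under the assumption that the permitted schemas are derived from the schema recorded for each dataset: if those were recorded as None while the database lists its schema as main, the filter drops it. accessible_schemas and normalize_schema are hypothetical, and the normalization shown is only one possible fix, not the one in #16041.

```python
from typing import Iterable, List, Optional, Set


def accessible_schemas(
    all_schemas: Iterable[str], dataset_schemas: Set[Optional[str]]
) -> List[str]:
    """Keep only the listed schemas that some permitted dataset lives in."""
    return [s for s in all_schemas if s in dataset_schemas]


def normalize_schema(schema: Optional[str], default_schema: str) -> str:
    """Assumed fix: treat a missing schema as the dialect's default schema."""
    return default_schema if schema is None else schema
```

To see what a given engine actually reports, sqlalchemy.inspect(engine).default_schema_name can be printed for the example database's engine.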

@victorarbuesmallada
Author

@mayurnewase @villebro is there any way this can go ahead without the SQLite fix? I'd just like to know whether there's going to be a dependency on that issue.

Member

@villebro villebro left a comment

Tested and I agree we can leave the sqlite issue to be fixed in a follow-up PR. As the added functionality is well in line with the current logic, I feel this is good to go. @mayurnewase do you feel there's risk in merging this? I would also love to get an approval from @dpgaspar and/or @betodealmeida before merging.

@mayurnewase
Contributor

mayurnewase commented Feb 9, 2022

I tested on PostgreSQL and it worked as intended.

Member

@zhaoyongjie zhaoyongjie left a comment

Non-blocking suggestion; otherwise LGTM.

superset/databases/filters.py (outdated, resolved)
Co-authored-by: Yongjie Zhao <yongjiezhao@apache.org>
@zhaoyongjie
Member

@Painyjames Could you fix this code style issue? thanks!

@zhaoyongjie zhaoyongjie merged commit 5ee070c into apache:master Feb 9, 2022
@github-actions
Contributor

github-actions bot commented Feb 9, 2022

Ephemeral environment shutdown and build artifacts deleted.

@@ -25,21 +25,28 @@

 class DatabaseFilter(BaseFilter):
     # TODO(bogdan): consider caching.
-    def schema_access_databases(self) -> Set[str]:  # noqa pylint: disable=no-self-use
+    def can_access_databases(  # noqa pylint: disable=no-self-use
Member

nit: any reason why this can't be a @staticmethod?
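The rename from schema_access_databases to can_access_databases suggests the helper now serves more than one permission type. A sketch of that idea, assuming it is called once for schema access and once for datasource access and the results are unioned, so a database reachable only through a datasource permission also appears in SQL Lab. databases_from_view_menus and the view-menu formats are assumptions, and the simple string split would break on database names containing "].[".

```python
from typing import Iterable, Set


def database_from_view_menu(view_menu: str) -> str:
    """'[db].[schema]' or '[db].[table](id:N)' -> 'db' (formats assumed)."""
    return view_menu.split("].[", 1)[0].lstrip("[")


def databases_from_view_menus(view_menus: Iterable[str]) -> Set[str]:
    """Hypothetical helper: database names a user can reach via these view menus."""
    return {database_from_view_menu(vm) for vm in view_menus}
```

Since neither function touches instance state, this shape also sidesteps the no-self-use question raised in the nit.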

@@ -237,13 +242,14 @@ def get_schema_perm( # pylint: disable=no-self-use

         return None

-    def unpack_schema_perm(  # pylint: disable=no-self-use
+    def unpack_database_and_schema(  # pylint: disable=no-self-use
Member

nit: any reason why these can't be @staticmethods?
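Regarding the nit: when a method uses no instance state, @staticmethod removes the need for the no-self-use pylint suppression. A sketch with a hypothetical class, a hypothetical DatabaseAndSchema tuple, and a hypothetical body for unpack_database_and_schema, assuming schema permissions are named [database].[schema]:

```python
from typing import NamedTuple


class DatabaseAndSchema(NamedTuple):
    database: str
    schema: str


class SecurityManagerSketch:
    """Hypothetical stand-in; not Superset's security manager."""

    @staticmethod
    def unpack_database_and_schema(perm: str) -> DatabaseAndSchema:
        """'[database].[schema]' -> DatabaseAndSchema('database', 'schema')."""
        database, schema = perm.split("].[", 1)
        return DatabaseAndSchema(database.lstrip("["), schema.rstrip("]"))
```

A static method can still be called on an instance, so call sites that go through an instance would keep working after such a change.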

@victorarbuesmallada victorarbuesmallada deleted the feature/datasource-access-sql-lab branch February 9, 2022 14:25
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.5.0 labels Mar 13, 2024
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 1.5.0
8 participants