Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed escaping SQL object names #836

Merged
merged 4 commits into from
Jan 26, 2024
Merged

Fixed escaping SQL object names #836

merged 4 commits into from
Jan 26, 2024

Conversation

larsgeorge-db
Copy link
Contributor

Changes

Implements handling of special characters in SQL names. Optionally escapes the catalog, schema, or table names with backticks for SQL queries.

Linked issues

Resolves #833

Tests

  • added unit tests

Copy link

codecov bot commented Jan 24, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (ae54f76) 85.61% compared to head (f112427) 85.66%.
Report is 1 commits behind head on main.

Files Patch % Lines
...rc/databricks/labs/ucx/hive_metastore/locations.py 66.66% 0 Missing and 1 partial ⚠️
src/databricks/labs/ucx/hive_metastore/udfs.py 80.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #836      +/-   ##
==========================================
+ Coverage   85.61%   85.66%   +0.05%     
==========================================
  Files          41       42       +1     
  Lines        5212     5232      +20     
  Branches      950      953       +3     
==========================================
+ Hits         4462     4482      +20     
  Misses        536      536              
  Partials      214      214              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@dmoore247 dmoore247 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work!
Please consider testing unicode, kanji (Multi-language support, and embedded space chars).

assert "`a-b`.`c`.`d`" == cb.escape("a-b.c.d", False)
assert "`a`.`b-c`.`d`" == cb.escape("a.b-c.d", False)
assert "`a`.`b`.`c-d`" == cb.escape("a.b.c-d", False)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing!!!
Please consider testing unicode, kanji, embedded space characters.

This documentation isn't much help... https://docs.databricks.com/en/sql/language-manual/sql-ref-names.html#names

@nfx nfx changed the title Fixed escaping issues with object names. Fixed escaping SQL object names Jan 24, 2024
@@ -231,6 +233,28 @@ def _full_name(self) -> str:
"""
return f"{self._catalog}.{self._schema}.{self._table}"

@staticmethod
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create a function of it and call it escape_sql_identifier, it doesn't really belong to this class, as it turns out from number of CrawlerBase. prefixes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense, I was about to ask that. Where do we put functions? A new Python source in the root of the project? Or some utils subdir?

@daniel-martinez-db
Copy link

I have a pattern like this:
my_catalog.my-database.mytable_namev1.0 which should be escaped as
my_catalog.my-database.mytable_name_v1.0
Can that test be included please?

@larsgeorge-db
Copy link
Contributor Author

larsgeorge-db commented Jan 25, 2024

I have a pattern like this: my_catalog.my-database.mytable_namev1.0 which should be escaped as my_catalog.my-database.mytable_name_v1.0 Can that test be included please?

How would you handle this: cat.alog.schema.table? Is your naming assuming only the last part can have dots in it? Please clarify. Also, see the above linked Databricks Naming convention, it states dots are NOT allowed in names.

@daniel-martinez-db
Copy link

I have a pattern like this: my_catalog.my-database.mytable_namev1.0 which should be escaped as my_catalog.my-database.mytable_name_v1.0 Can that test be included please?

How would you handle this: cat.alog.schema.table? Is your naming assuming only the last part can have dots in it? Please clarify. Also, see the above linked Databricks Naming convention, it states dots are NOT allowed in names.

Thanks. The patterns I'm working with only has dots in the table name, although I think we can escape all of them (maybe not in the test, but in the actual code). It's true that DB does not allow for dots, but other catalogs do; I'm facing this issue as I'm crawling an external hive metastore which allows for dots. Thank you!

@larsgeorge-db larsgeorge-db self-assigned this Jan 25, 2024
@larsgeorge-db larsgeorge-db added this pull request to the merge queue Jan 26, 2024
@nfx nfx removed this pull request from the merge queue due to a manual request Jan 26, 2024
@nfx nfx merged commit bf75b50 into main Jan 26, 2024
7 checks passed
@nfx nfx deleted the fix/issue_833_escaping branch January 26, 2024 16:07
nfx added a commit that referenced this pull request Jan 26, 2024
* Added `databricks labs ucx alias` command to create a view of tables from one schema/catalog in another schema/catalog ([#837](#837)).
* Added `databricks labs ucx save-aws-iam-profiles` command to scan instance profiles identify AWS S3 access and save a CSV with permissions ([#817](#817)).
* Added total view counts in the assessment dashboard ([#834](#834)).
* Cleaned up `assess_jobs` and `assess_clusters` tasks in the `assessment` workflow to improve testing and reduce redundancy.([#825](#825)).
* Added documentation for the assessment report ([#806](#806)).
* Fixed escaping for SQL object names ([#836](#836)).

Dependency updates:

 * Updated databricks-sdk requirement from ~=0.17.0 to ~=0.18.0 ([#832](#832)).
@nfx nfx mentioned this pull request Jan 26, 2024
nfx added a commit that referenced this pull request Jan 26, 2024
* Added `databricks labs ucx alias` command to create a view of tables
from one schema/catalog in another schema/catalog
([#837](#837)).
* Added `databricks labs ucx save-aws-iam-profiles` command to scan
instance profiles identify AWS S3 access and save a CSV with permissions
([#817](#817)).
* Added total view counts in the assessment dashboard
([#834](#834)).
* Cleaned up `assess_jobs` and `assess_clusters` tasks in the
`assessment` workflow to improve testing and reduce
redundancy.([#825](#825)).
* Added documentation for the assessment report
([#806](#806)).
* Fixed escaping for SQL object names
([#836](#836)).

Dependency updates:

* Updated databricks-sdk requirement from ~=0.17.0 to ~=0.18.0
([#832](#832)).
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024
* Added `databricks labs ucx alias` command to create a view of tables
from one schema/catalog in another schema/catalog
([#837](#837)).
* Added `databricks labs ucx save-aws-iam-profiles` command to scan
instance profiles identify AWS S3 access and save a CSV with permissions
([#817](#817)).
* Added total view counts in the assessment dashboard
([#834](#834)).
* Cleaned up `assess_jobs` and `assess_clusters` tasks in the
`assessment` workflow to improve testing and reduce
redundancy.([#825](#825)).
* Added documentation for the assessment report
([#806](#806)).
* Fixed escaping for SQL object names
([#836](#836)).

Dependency updates:

* Updated databricks-sdk requirement from ~=0.17.0 to ~=0.18.0
([#832](#832)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG]: Function crawl_grants blows up when databases have special characters
4 participants