Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.1.1] - Bump size for FQN #12092

Merged
merged 24 commits into from
Jul 26, 2023
Merged

Conversation

pmbrull
Copy link
Collaborator

@pmbrull pmbrull commented Jun 22, 2023

Describe your changes:

Since we landed the FQN hash PR #11960, we can now support longer FQNs in the API.

Background

The process here between API <> Storage is as follows:

  1. New Entity is created via the API
  2. We set the FQN of the Entity based on its hierarchy (e.g., service.db.schema.table, or parent1.parent2.container).
  3. When going to store the Entity:
    1. The FQN gets split, e.g., service.db.schema.table -> service, db, schema, table.
    2. Each unit is hashed
    3. The result is joined together and stored in its hashed form in the db, e.g., hash1.hash2.hash3.hash4.
  4. When reading, we pick up the full FQN hash from the db, restore to its original name by pieces and join in the final FQN string.

Changes

Where we have variable / long sizes in FQNs is:

  • Tables (due to the Datalake connector, as the name is the path). Therefore this involves the FQNs of
    • Columns, built as tableFQN.ColumnName
    • Test Suites, built as tableFQN.TestSuite
  • Containers, since it gets built by concatenating parent.children together.

Container

For containers, we are setting the FQN size to 1024. This is why:

  • We are using MD5 as the hashing algorithm. Each MD5 transformation of a string gives us a 32 characters string.
  • MySQL max index size is 256 characters
  • This means we can store up to 256 / 32=8 levels of hashes in the db
  • EntityName is sized as 128 by default
  • The final FQN max size that the API can return is then 8 * 128 = 1024

We will then just support up to 8 parent <> children levels.

Table

We can follow a similar logic for table names. However, here it also makes sense to bump the EntityName since it will give us more room to ingest Datalake paths:

  • Setting EntityName to 516
  • Bumping FQN to 896 (bumped table name + schema + db + service)

Column FQN will be 1024 -> table FQN + column name (128)

Test Suite

Test Suites FQN will be 1024 -> Table FQN + .testSuite and to keep round numbers

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

@cypress
Copy link

cypress bot commented Jun 22, 2023

1 failed tests on run #25852 ↗︎

1 248 59 0 Flakiness 0

Details:

Bump FQN
Project: openmetadata Commit: 43fdc88478
Status: Failed Duration: 20:46 💡
Started: Jul 14, 2023 3:29 PM Ended: Jul 14, 2023 3:50 PM
Failed  cypress/e2e/Flow/TagsAddRemove.spec.js • 1 failed test • Tests with database mysql

View Output Video

Test Artifacts
Check if tags addition and removal flow working properly from tables > Adding and removing tags to the tables entity should work properly Output Screenshots Video

This comment has been generated by cypress-bot as a result of this project's GitHub integration settings.

@pmbrull
Copy link
Collaborator Author

pmbrull commented Jul 11, 2023

@pmbrull, can you also change the Column Name length from 128 to 256 and address #7680 also in this patch?

done, thanks

@pmbrull pmbrull requested review from harshach and sureshms July 11, 2023 07:09
@pmbrull pmbrull linked an issue Jul 11, 2023 that may be closed by this pull request
@sonarqubecloud
Copy link

[open-metadata-ingestion] Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

100.0% 100.0% Coverage
0.0% 0.0% Duplication

@sonarqubecloud
Copy link

[OpenMetadata-Platform] Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 30 Code Smells

0.0% 0.0% Coverage
0.0% 0.0% Duplication

@harshach harshach merged commit 6773541 into open-metadata:main Jul 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend Ingestion safe to test Add this label to run secure Github workflows on PRs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Column name max length is too low
3 participants