[GEN-356] Use ServiceSpec for loading sources based on connectors #18322

sushi30 · 2024-10-18T09:15:27Z

Introducing the OpenMetadata service specification.

Main things to look out for:

Removal of the import_x_class functions in favor of a central BaseSpec.get_for_source, which serves as the single repository for all of a source's interfaces.
Breaking change documentation
Adding the system_metrics_computer_class field in the SQAProfilerInterface
New profiler implementations for redshift, bigquery, snowflake to accommodate the new system metric source
use source classes that can be overridden in system profiles
use a manifest class instead of factory to specify which class to resolve for connectors
example usage can be seen in redshift and snowflake
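
To make the "central spec" idea above concrete, here is a minimal sketch, not the code in this PR. `SketchSpec`, `SPECS`, `import_from_path` and the dict-based lookup are hypothetical names chosen for illustration, and standard-library classes stand in for connector classes such as RedshiftProfiler. Only the names `get_class_path` and `get_for_source` come from the PR, and their bodies here are assumptions.

```python
import importlib
import json
from dataclasses import dataclass
from typing import Dict, Optional


def get_class_path(cls: type) -> str:
    """Return the dotted import path of a class (assumed behaviour)."""
    return f"{cls.__module__}.{cls.__qualname__}"


def import_from_path(path: str) -> type:
    """Hypothetical helper: resolve a dotted path back into a class object."""
    module_name, _, class_name = path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


@dataclass(frozen=True)
class SketchSpec:
    """Hypothetical stand-in for the spec: one place listing a connector's classes."""

    metadata_source_class: str
    profiler_class: Optional[str] = None

    def resolve_profiler(self) -> Optional[type]:
        # Import lazily, only when the profiler is actually requested.
        return import_from_path(self.profiler_class) if self.profiler_class else None


# Explicit registry; json.JSONDecoder / json.JSONEncoder are placeholders
# for real connector classes, used only so the sketch runs on its own.
SPECS: Dict[str, SketchSpec] = {
    "redshift": SketchSpec(
        metadata_source_class=get_class_path(json.JSONDecoder),
        profiler_class=get_class_path(json.JSONEncoder),
    ),
}


def get_for_source(service_type: str) -> SketchSpec:
    """Look up the spec for a service type, failing loudly if none is registered."""
    try:
        return SPECS[service_type]
    except KeyError:
        raise ValueError(f"No spec registered for service type {service_type!r}") from None
```

With this shape, a caller asks for `get_for_source("redshift").resolve_profiler()` and never builds an import path by hand.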

TODO

Implement the BigQuery system metric source class
run e2e tests (link to GA)

Describe your changes:

Fixes

I worked on ... because ...

Type of change:

Checklist:

I have read the CONTRIBUTING document.
My PR title is Fixes <issue-number>: <short explanation>
I have commented on my code, particularly in hard-to-understand areas.
For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

github-actions · 2024-10-18T09:22:36Z

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

notion-workspace · 2024-10-18T09:23:46Z

Refactor python class management in ingestion

TeddyCr · 2024-10-18T10:15:07Z

ingestion/src/metadata/ingestion/source/database/redshift/manifest.py

+)
+from metadata.utils.manifest import BaseManifest, get_class_path
+
+RedshiftManifest = BaseManifest(profler_class=get_class_path(RedshiftProfiler))


Since we control the framework, and to limit the "meta" management of class imports and the like when creating additional classes, can we just dynamically import the class based on the service type?

So if I need to add a new class to support some functionality just for Athena, then I just need to implement that class and I don't need to think about managing this manifest.py file for the specific source.

I think it will help with the goal of minimizing the amount of pieces you need to touch.

sushi30 · 2024-10-18

Using a dynamic import approach for all our classes raises 2 issues I can think of:

How does one implement a profiler for a CustomDatabase in this case?
The idea behind this implementation is to have a default map at the service type level and allow the contributors to have more flexibility when developing other modules for their connectors.

Our dynamic module resolution is not type safe. For example, BigQuery is sometimes resolved as bigquery, Bigquery, or BigQuery. A dynamic import path approach requires the user to know this and to implement each separate class with these conventions in mind. It is easy to forget the right casing for a class name and use the wrong convention when writing the class (for example, BigQuerySource instead of BigquerySource). Without type safety, the only way we can catch such errors is at runtime. With the current approach there is a single point of failure (the manifest), and the connectors themselves are contained within a system that can be validated independently.

pmbrull · 2024-10-24

Hi, let's stick with strict imports here. It adds a bit more "boilerplate" when we're developing a new connector, but it removes magical layers from the code.
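
The casing concern in this thread can be shown with a small sketch. Everything here is hypothetical: `convention_class_name`, `resolve_by_convention`, `CONNECTOR_MODULE` and `MANIFEST` are illustration-only names, and the naming rule (capitalize the service type, append `Source`) is an assumed convention rather than the project's actual resolver.

```python
from typing import Dict


class BigQuerySource:
    """Stand-in connector class, written with a capital Q."""


# Hypothetical view of what a connector module exports, keyed by class name.
CONNECTOR_MODULE: Dict[str, type] = {"BigQuerySource": BigQuerySource}


def convention_class_name(service_type: str, suffix: str = "Source") -> str:
    """Assumed convention: first letter upper-cased, the rest lower-cased."""
    return f"{service_type.capitalize()}{suffix}"


def resolve_by_convention(service_type: str) -> type:
    """Dynamic lookup by constructed name; a mismatch only surfaces at runtime."""
    name = convention_class_name(service_type)
    try:
        return CONNECTOR_MODULE[name]
    except KeyError:
        raise ImportError(f"{name} not found for service type {service_type!r}") from None


# Explicit manifest: the class object is referenced directly, so a misspelt
# name fails as soon as this module is imported and is visible to linters.
MANIFEST: Dict[str, type] = {"BigQuery": BigQuerySource}
```

Here the convention builds `BigquerySource` while the class is named `BigQuerySource`, so the dynamic lookup raises `ImportError` only when that connector is first resolved. The explicit mapping avoids building names from strings at all.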

sushi30 · 2024-10-22T11:10:43Z

@pmbrull re __init

I changed how the constructors are called to remove this. LMK if this addresses your comment.
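
The commits in this PR mention super() dependency injection and collaborative constructors. As a generic illustration of that Python pattern (not the constructors in this PR), the sketch below uses hypothetical classes `SketchProfilerInterface`, `SketchSystemMetricsMixin` and `SketchRedshiftProfiler`. Each `__init__` consumes only the keyword arguments it owns and forwards the rest along the MRO.

```python
class SketchProfilerInterface:
    """Hypothetical base: owns the `service_name` argument."""

    def __init__(self, *, service_name: str, **kwargs):
        super().__init__(**kwargs)  # forward anything we do not own
        self.service_name = service_name


class SketchSystemMetricsMixin:
    """Hypothetical mixin: owns the `system_metrics_source` argument."""

    def __init__(self, *, system_metrics_source: str, **kwargs):
        super().__init__(**kwargs)  # next class in the MRO, not a fixed parent
        self.system_metrics_source = system_metrics_source


class SketchRedshiftProfiler(SketchSystemMetricsMixin, SketchProfilerInterface):
    """Combines both; no constructor needs to know about the other's arguments."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)


profiler = SketchRedshiftProfiler(
    service_name="example_service",
    system_metrics_source="example_source",
)
```

Because every class calls `super().__init__(**kwargs)`, an argument nobody claims reaches `object.__init__` and raises `TypeError`, so stray kwargs are not silently swallowed.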

sonarqubecloud · 2024-10-23T16:19:59Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
13 New issues
0 Accepted issues

Measures
0 Security Hotspots
45.3% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

…8322)

* ref(profiler): use di for system profile
  - use source classes that can be overridden in system profiles
  - use a manifest class instead of factory to specify which class to resolve for connectors
  - example usage can be seen in redshift and snowflake
* - added manifests for all custom profilers
  - used super() dependency injection in order for system metrics source
  - formatting
* - implement spec for all source types
  - added docs for the new specification
  - added some pylint ignores in the importer module
* remove TYPE_CHECKING in core.py
* - deleted valuedispatch function
  - deleted get_system_metrics_by_dialect
  - implemented BigQueryProfiler with a system metrics source
  - moved import_source_class to BaseSpec
* - removed tests related to the profiler factory
* - reverted start_time
  - removed DML_STAT_TO_DML_STATEMENT_MAPPING
  - removed unused logger
* - reverted start_time
  - removed DML_STAT_TO_DML_STATEMENT_MAPPING
  - removed unused logger
* fixed tests
* format
* bigquery system profile e2e tests
* fixed module docstring
* - removed import_side_effects from redshift. we still use it in postgres for the orm conversion maps.
  - removed leftover methods
* - tests for BaseSpec
  - moved get_class_path to importer
* - moved constructors around to get rid of useless kwargs
* - changed test_system_metric
* - added linage and usage to service_spec
  - fixed postgres native lineage test
* add comments on collaborative constructors

ref(profiler): use di for system profile

6c3fbf7

- use source classes that can be overridden in system profiles
- use a manifest class instead of factory to specify which class to resolve for connectors
- example usage can be seen in redshift and snowflake

sushi30 requested a review from a team as a code owner October 18, 2024 09:15

sushi30 marked this pull request as draft October 18, 2024 09:15

sushi30 had a problem deploying to test October 18, 2024 09:15 — with GitHub Actions Error

github-actions bot added the Ingestion and safe to test labels Oct 18, 2024

sushi30 changed the title ~~ref(profiler): use di for system profile~~ [GEN-356] ref(profiler): use di for system profile Oct 18, 2024

TeddyCr reviewed Oct 18, 2024

- added manifests for all custom profilers

6ccbf2e

- used super() dependency injection in order for system metrics source
- formatting

sushi30 had a problem deploying to test October 18, 2024 10:21 — with GitHub Actions Failure

sushi30 temporarily deployed to test October 18, 2024 10:21 — with GitHub Actions Inactive

sushi30 had a problem deploying to test October 18, 2024 10:21 — with GitHub Actions Failure

sushi30 temporarily deployed to test October 18, 2024 10:21 — with GitHub Actions Inactive

sushi30 commented Oct 18, 2024

ingestion/src/metadata/utils/manifest.py (outdated, resolved)

sushi30 added 2 commits October 21, 2024 12:12

- implement spec for all source types

146025c

- added docs for the new specification
- added some pylint ignores in the importer module

remove TYPE_CHECKING in core.py

92e53ce

sushi30 had a problem deploying to test October 21, 2024 10:22 — with GitHub Actions Error

Merge branch 'main' into system-metrics-refactor

f526351

sushi30 had a problem deploying to test October 21, 2024 10:26 — with GitHub Actions Failure

sushi30 had a problem deploying to test October 22, 2024 11:10 — with GitHub Actions Failure

sushi30 temporarily deployed to test October 22, 2024 11:10 — with GitHub Actions Inactive

sushi30 had a problem deploying to test October 22, 2024 11:10 — with GitHub Actions Failure

- changed test_system_metric

5dc797f

sushi30 temporarily deployed to test October 22, 2024 13:29 — with GitHub Actions Inactive

sushi30 had a problem deploying to test October 22, 2024 13:29 — with GitHub Actions Failure

- added linage and usage to service_spec

ead59e9

- fixed postgres native lineage test

sushi30 temporarily deployed to test October 23, 2024 08:13 — with GitHub Actions Inactive

sushi30 had a problem deploying to test October 23, 2024 08:13 — with GitHub Actions Failure

sushi30 changed the title ~~[GEN-356] ref(profiler): use di for system profile~~ [GEN-356] Use ServiceSpec for loading sources based on connectors Oct 23, 2024

sushi30 requested review from TeddyCr and pmbrull October 23, 2024 12:52

add comments on collaborative constructors

6d678cb

sushi30 temporarily deployed to test October 23, 2024 15:22 — with GitHub Actions Inactive

sushi30 had a problem deploying to test October 23, 2024 15:22 — with GitHub Actions Failure

TeddyCr approved these changes Oct 23, 2024

sushi30 merged commit 95982b9 into main Oct 24, 2024
19 of 21 checks passed

sushi30 deleted the system-metrics-refactor branch October 24, 2024 05:47

This was referenced Oct 28, 2024

Example of python dynamic class refactor #16444

Closed

MINOR Fix snowflake profiler by using case-insensitive strings #18438

Merged

ayush-shah mentioned this pull request Nov 8, 2024

BQ views Profiler; Errors on system metric computation #17681

Closed

TeddyCr Oct 18, 2024

sushi30 Oct 18, 2024

pmbrull Oct 24, 2024
