[GEN-356] Use ServiceSpec for loading sources based on connectors #18322
Conversation
- use source classes that can be overridden in system profiles
- use a manifest class instead of a factory to specify which class to resolve for connectors
- example usage can be seen in Redshift and Snowflake (a sketch of the manifest shape follows below)
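As a minimal sketch of the idea, assuming a pydantic-style model: BaseManifest and get_class_path appear in the diff below, but the field name and implementation details here are assumptions for illustration only.

```python
from pydantic import BaseModel


def get_class_path(klass: type) -> str:
    """Return the fully qualified import path of a class, e.g. 'pkg.module.Class'."""
    return f"{klass.__module__}.{klass.__name__}"


class BaseManifest(BaseModel):
    """Declares, per connector, which classes should be resolved at runtime."""

    profiler_class: str  # import path of the profiler implementation (assumed field)
```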
The Python checkstyle failed. Please run the style checks locally and fix the reported issues. You can install the pre-commit hooks so the checks run automatically before each commit.
from metadata.utils.manifest import BaseManifest, get_class_path

RedshiftManifest = BaseManifest(profiler_class=get_class_path(RedshiftProfiler))
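For illustration only, a hedged sketch of how a manifest entry like the one above could be resolved back into a class at runtime; the helper name is hypothetical, not the PR's API.

```python
import importlib


def import_class_by_path(class_path: str) -> type:
    """Import 'package.module.ClassName' and return the class object."""
    module_path, _, class_name = class_path.rpartition(".")
    return getattr(importlib.import_module(module_path), class_name)


# Hypothetical usage: resolve the profiler declared by the Redshift manifest.
# profiler_cls = import_class_by_path(RedshiftManifest.profiler_class)
```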
I am thinking, since we control the framework and limit the "meta" management of class imports and whatnot when creating additional classes, can we just dynamically import the class based on the service type?
So if I need to add a new class to support some functionality just for Athena, then I just need to implement that class and I don't need to think about managing this manifest.py file for the specific source.
I think it will help with the goal of minimizing the amount of pieces you need to touch.
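For context, a rough sketch of the convention-based dynamic import being suggested here; the module layout and naming scheme are assumptions, and they are exactly the conventions the next comment pushes back on.

```python
import importlib


def profiler_class_for(service_type: str) -> type:
    """Resolve a profiler class purely from the service type, by naming convention.

    Assumes hypothetical modules like metadata.profiler.source.<service>.profiler
    exposing a <Service>Profiler class.
    """
    module = importlib.import_module(
        f"metadata.profiler.source.{service_type.lower()}.profiler"
    )
    return getattr(module, f"{service_type}Profiler")
```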
Using a dynamic import approach for all our classes raises two issues I can think of:
- How does one implement a profiler for a CustomDatabase in this case? The idea behind this implementation is to have a default map at the service type level and to allow contributors more flexibility when developing other modules for their connectors.
- Our dynamic module resolution is not type safe. One example is that BigQuery is sometimes resolved as bigquery, Bigquery, or BigQuery. A dynamic import path approach requires the user to have this knowledge and to implement each separate class with these conventions in mind. One can easily forget the right casing for a class name and use the wrong convention when writing the class (for example BigQuerySource instead of BigquerySource). Without type safety, the only way we can catch such errors is at runtime. With the current approach there is a single point of failure (the manifest), and the connectors themselves are contained within a system that can be validated independently.
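A hedged sketch of the "single point of failure" argument: because every class path lives in the manifest, it can be checked eagerly (for example in a unit test), so a casing mistake surfaces before runtime. The validation helper below is illustrative, not part of the PR.

```python
import importlib


def validate_manifest(manifest: "BaseManifest") -> None:
    """Fail fast if any class path declared in the manifest cannot be imported."""
    for class_path in manifest.dict().values():  # assumes a pydantic-style model
        module_path, _, class_name = class_path.rpartition(".")
        module = importlib.import_module(module_path)
        if not hasattr(module, class_name):
            raise ImportError(f"{class_name} not found in {module_path}")
```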
Hi, let's stick with strict imports here. It adds a bit more "boilerplate" when we're developing a new connector, but it removes magical layers from the code.
- used super() dependency injection for the system metrics source (a sketch of the pattern follows below)
- formatting
- added docs for the new specification
- added some pylint ignores in the importer module
- fixed postgres native lineage test
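A minimal sketch of the super()-based dependency injection ("collaborative constructors") mentioned above: each class in the chain consumes the keyword arguments it cares about and forwards the rest along the MRO. Class and parameter names are illustrative, not the PR's actual interfaces.

```python
class SystemMetricsSourceMixin:
    """Consumes its own kwarg and cooperatively forwards the rest."""

    def __init__(self, *args, system_metrics_computer=None, **kwargs):
        self.system_metrics_computer = system_metrics_computer
        super().__init__(*args, **kwargs)


class ProfilerInterface:
    """End of the cooperative chain."""

    def __init__(self, session=None, **kwargs):
        self.session = session
        super().__init__(**kwargs)


class RedshiftProfilerInterface(SystemMetricsSourceMixin, ProfilerInterface):
    """Gets both behaviours without a custom __init__ or extra factory code."""


# Hypothetical usage:
# iface = RedshiftProfilerInterface(session=my_session, system_metrics_computer=my_source)
```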
Quality Gate passed for 'open-metadata-ingestion'.
…8322)
* ref(profiler): use di for system profile - use source classes that can be overridden in system profiles - use a manifest class instead of factory to specify which class to resolve for connectors - example usage can be seen in redshift and snowflake
* - added manifests for all custom profilers - used super() dependency injection in order for system metrics source - formatting
* - implement spec for all source types - added docs for the new specification - added some pylint ignores in the importer module
* remove TYPE_CHECKING in core.py
* - deleted valuedispatch function - deleted get_system_metrics_by_dialect - implemented BigQueryProfiler with a system metrics source - moved import_source_class to BaseSpec
* - removed tests related to the profiler factory
* - reverted start_time - removed DML_STAT_TO_DML_STATEMENT_MAPPING - removed unused logger
* - reverted start_time - removed DML_STAT_TO_DML_STATEMENT_MAPPING - removed unused logger
* fixed tests
* format
* bigquery system profile e2e tests
* fixed module docstring
* - removed import_side_effects from redshift. we still use it in postgres for the orm conversion maps. - removed leftover methods
* - tests for BaseSpec - moved get_class_path to importer
* - moved constructors around to get rid of useless kwargs
* - changed test_system_metric
* - added linage and usage to service_spec - fixed postgres native lineage test
* add comments on collaborative constructors
Introducing the OpenMetadata service specification.
Main things to look out for:
- Moving away from import_x_class in favor of a central BaseSpec.get_for_source, which will provide the centralized repository for all the source's interfaces (a usage sketch follows below).
- system_metrics_computer_class field in the SQAProfilerInterface.
- TODO
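A hedged usage sketch of the centralized lookup described above; the import path, the exact signature of BaseSpec.get_for_source, and the attribute names on the returned spec are assumptions.

```python
import importlib

from metadata.utils.service_spec import BaseSpec  # assumed import path


def load_profiler_class(service_type) -> type:
    """Hypothetical helper: resolve the profiler class declared for a service type."""
    spec = BaseSpec.get_for_source(service_type)  # central per-source lookup (named in the PR)
    module_path, _, class_name = spec.profiler_class.rpartition(".")  # attribute name assumed
    return getattr(importlib.import_module(module_path), class_name)
```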