feat(ingest/snowflake): integrate snowflake-queries into main source #10905

Merged
hsheth2 merged 18 commits into master from snowflake-queries-integration on Jul 17, 2024

Conversation


@hsheth2 hsheth2 commented Jul 13, 2024

Follow up on #10835

  • Integrate the SnowflakeQueriesExtractor into the main snowflake source.
  • Allow for view lineage parsing without requiring query-based lineage/usage too.
  • Add remaining missing configs (usage configs, lazy_graph); see the recipe sketch after this list.
  • Add support for the known lineage mappings (e.g. snowflake external tables) to the snowflake-queries source.
  • Add a tool extractor + initial extractor implementations.
  • User -> urn mapping for email-based urns.
  • Remove dead code from the include_view_lineage flag.
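
For illustration, a minimal recipe sketch (expressed as a Python dict passed to Pipeline.create) that exercises the options touched by this PR. The flag names use_queries_v2 and lazy_schema_resolver are taken from the config fields discussed in the review below; the account, credential, and sink values are placeholders, and the final field names and defaults may differ.

from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "snowflake",
            "config": {
                "account_id": "my_account",  # placeholder
                "username": "datahub_reader",  # placeholder
                "password": "${SNOWFLAKE_PASSWORD}",
                "include_table_lineage": True,
                "include_view_lineage": True,
                "include_usage_stats": True,
                "use_queries_v2": True,  # new queries extractor (name per review below)
                "lazy_schema_resolver": True,  # defer schema fetching (name per review below)
                "convert_urns_to_lowercase": True,
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }
)
pipeline.run()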

Other changes

  • Refactor SnowflakeFilterMixin to use composition with SnowflakeFilter
  • Do the same for SnowflakeIdentifierBuilder
  • Remove the SnowflakeCommonProtocol
  • Simplify SnowflakeStructuredReportMixin
  • Drastically reduce the number of methods in SnowflakeCommonMixin, using composition instead (see the sketch after this list)
  • Improve the human-readability of error/warning reporting throughout the source
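
A minimal sketch of the composition shape described above: only the SnowflakeFilter and SnowflakeIdentifierBuilder names come from this PR; the fields, method signatures, and the SchemaGenerator class here are illustrative stand-ins, not the actual DataHub code.

from dataclasses import dataclass


@dataclass
class SnowflakeFilter:
    allowed_databases: list  # stand-in for the real SnowflakeFilterConfig patterns

    def is_dataset_allowed(self, name: str) -> bool:
        db = name.split(".", 1)[0]
        return db in self.allowed_databases


@dataclass
class SnowflakeIdentifierBuilder:
    convert_urns_to_lowercase: bool = True

    def gen_dataset_urn(self, name: str) -> str:
        if self.convert_urns_to_lowercase:
            name = name.lower()
        return f"urn:li:dataset:(urn:li:dataPlatform:snowflake,{name},PROD)"


class SchemaGenerator:
    # Composition: helpers are passed in, not mixed in via inheritance.
    def __init__(self, filters: SnowflakeFilter, identifiers: SnowflakeIdentifierBuilder):
        self.filters = filters
        self.identifiers = identifiers

    def urns_for(self, table_names: list) -> list:
        return [
            self.identifiers.gen_dataset_urn(n)
            for n in table_names
            if self.filters.is_dataset_allowed(n)
        ]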

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

Summary by CodeRabbit

  • New Features

    • Introduced new configuration options and fields to enhance Snowflake ingestion capabilities.
  • Bug Fixes

    • Improved error handling and reporting mechanisms across multiple Snowflake ingestion methods.
  • Refactor

    • Streamlined the Snowflake ingestion pipeline by renaming and restructuring methods and parameters for better clarity and performance.
    • Simplified test setup for Snowflake shares by removing redundant URN generation functions.
  • Performance

    • Enhanced query extraction and processing for Snowflake data sources.


coderabbitai bot commented Jul 13, 2024

Walkthrough

The changes enhance the Snowflake ingestion capabilities in DataHub. Key updates include adding new configuration options, restructuring the SnowflakeSchemaGenerator class, refining error handling, and improving query extraction methods. These modifications streamline schema resolution, enhance structured reporting, and simplify test configurations.

Changes

Files / Groups Change Summary
snowflake_config.py Added descriptions for convert_urns_to_lowercase, renamed and added fields for schema resolution.
snowflake_schema_gen.py Updated class structure, method parameters, and error handling in SnowflakeSchemaGenerator.
snowflake_queries.py Added imports, updated classes and methods for query extraction and filtering.
snowflake_v2.py Introduced new classes, enhanced connection handling, and updated work unit generation.
test_snowflake_shares.py Simplified SnowflakeSharesHandler instantiation in test cases.
sql_parsing_aggregator.py Added parameters for schema resolution and updated schema necessity logic.

Amidst the code’s intricate dance,
Snowflake’s essence we enhance.
With queries deft and errors tamed,
Our ingestion beast is finely framed.
In the fields of data, we stand tall,
Celebrating changes, big and small!



@github-actions github-actions bot added the "ingestion" label (PR or Issue related to the ingestion of metadata) on Jul 13, 2024
@hsheth2 hsheth2 changed the title from "support explicit schema resolver in aggregator" to "feat(ingest/snowflake): integration snowflake-queries into main source" on Jul 13, 2024
@hsheth2 hsheth2 changed the title from "feat(ingest/snowflake): integration snowflake-queries into main source" to "feat(ingest/snowflake): integrate snowflake-queries into main source" on Jul 13, 2024
@hsheth2 hsheth2 marked this pull request as ready for review July 16, 2024 03:45

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

Outside diff range, codebase verification and nitpick comments (5)
metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_config.py (3)

134-134: Ensure consistency in documentation and naming conventions.

The description for convert_urns_to_lowercase is somewhat unclear. It could be improved to better explain the implications of this setting on the integration with other sources.

-        description="Whether to convert dataset urns to lowercase.",
+        description="Determines whether dataset URNs should be converted to lowercase to ensure consistency across different sources.",

134-134: Validation logic for include_view_lineage needs clarification.

The validator for include_view_lineage throws a ValueError if include_table_lineage is not true while include_view_lineage is set. This dependency should be clearly documented in the description of include_view_lineage.

-        description="If enabled, populates the snowflake view->table and table->view lineages. Requires appropriate grants given to the role, and include_table_lineage to be True. view->table lineage requires Snowflake Enterprise Edition or above.",
+        description="If enabled, populates the snowflake view->table and table->view lineages. Requires `include_table_lineage` to be True and appropriate grants given to the role. View->table lineage requires Snowflake Enterprise Edition or above.",

209-215: Clarify the use of new configuration fields related to query extraction.

The introduction of use_queries_v2 and lazy_schema_resolver adds complexity. Ensure that the purpose and use cases of these fields are well-documented to avoid confusion.

+        description="If enabled, uses the new queries extractor to extract queries from snowflake. This is part of an experimental feature set aimed at improving performance and accuracy of schema resolution.",
+        description="If enabled, uses lazy schema resolution to resolve schemas for tables and views. This method delays schema fetching until absolutely necessary, which can improve performance in environments with a large number of schemas.",
metadata-ingestion/src/datahub/ingestion/api/source.py (1)

294-311: Introduced report_exc method for improved error handling with context management.

The new report_exc context manager method simplifies error handling by encapsulating the try-except logic, making the code cleaner and more maintainable. However, the comment on line 304 suggests dissatisfaction with the method's naming due to its non-obvious behavior.

Suggestion: Consider renaming or improving documentation.

Clarifying the method's behavior in its name or documentation could improve code readability and prevent misuse.

- # TODO: I'm not super happy with the naming here - it's not obvious that this
- # suppresses the exception in addition to reporting it.
+ # TODO: Consider renaming `report_exc` to `suppress_and_report_exc` to clarify that it suppresses and reports exceptions.
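
For illustration only, a sketch of the report-and-suppress pattern being discussed; this is not the actual report_exc implementation in source.py, and the report object's failure() signature is a simplified stand-in modeled on the structured-reporting calls shown elsewhere in this review.

import contextlib
import logging
from typing import Iterator

logger = logging.getLogger(__name__)


@contextlib.contextmanager
def report_exc(report, message: str, context: str = "") -> Iterator[None]:
    try:
        yield
    except Exception as e:
        # Report the failure in a structured way, then swallow the exception so
        # the caller's loop can continue with the next unit of work.
        report.failure(message, context, exc=e)
        logger.debug(f"Suppressed exception while processing {context}", exc_info=e)


# Usage sketch:
# with report_exc(self.report, "Failed to list databases", db_name):
#     databases = self.get_databases()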
metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_queries.py (1)

Line range hint 116-423: Enhanced Snowflake query extraction logic.

The refactoring and enhancements in the Snowflake query extraction logic, including better error handling and structured reporting, improve the robustness and maintainability of the code. The use of modern Python features like dataclasses and structured error reporting aligns with best practices.

However, there are several TODO comments indicating areas for future improvement, such as stateful ingestion and handling of external tables.

Suggestion: Address TODO items in future iterations.

It would be beneficial to plan for addressing these TODO items in upcoming development cycles to ensure the completeness and scalability of the Snowflake ingestion functionality.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 643ba0c and 1578f31.

Files selected for processing (20)
  • metadata-ingestion/src/datahub/ingestion/api/source.py (8 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_assertion.py (5 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_config.py (2 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_lineage_v2.py (13 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_profiler.py (2 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_queries.py (15 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_report.py (2 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_schema.py (2 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_schema_gen.py (28 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_shares.py (5 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_summary.py (4 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_tag.py (3 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_usage_v2.py (11 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_utils.py (7 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_v2.py (9 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/athena.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/sql_config.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source_report/ingestion_stage.py (1 hunks)
  • metadata-ingestion/src/datahub/sql_parsing/sql_parsing_aggregator.py (5 hunks)
  • metadata-ingestion/tests/unit/test_snowflake_shares.py (5 hunks)
Files skipped from review due to trivial changes (3)
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_schema.py
  • metadata-ingestion/src/datahub/ingestion/source_report/ingestion_stage.py
  • metadata-ingestion/tests/unit/test_snowflake_shares.py
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_utils.py

162-170: Return the negated condition directly

Inline condition

(SIM103)

Additional comments not posted (30)
metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_assertion.py (5)

48-48: Refactor to use identifiers parameter in SnowflakeAssertionsHandler.

The introduction of the identifiers parameter in the constructor is a good use of composition over inheritance, aligning with the PR's objectives to simplify and refactor existing code.


53-53: Proper initialization of identifiers.

The identifiers attribute is properly initialized and stored for later use, which is consistent with the changes made in the constructor.


79-82: Dynamic platform instance URN generation.

This change dynamically generates the platform instance URN based on the configuration, which adds flexibility to handle different instances. This is a significant improvement in handling multi-instance scenarios.
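
For context, a hedged sketch of how a platform-instance URN is typically derived from config in DataHub ingestion; make_dataplatform_instance_urn is the standard helper, while the surrounding function and attribute names are assumptions rather than the exact code in snowflake_assertion.py.

from typing import Optional

from datahub.emitter.mce_builder import make_dataplatform_instance_urn


def platform_instance_urn(platform_instance: Optional[str]) -> Optional[str]:
    # Only emit a dataPlatformInstance URN when an instance is configured.
    if platform_instance is None:
        return None
    return make_dataplatform_instance_urn("snowflake", platform_instance)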


97-97: Use of identifiers for dataset identification.

The method get_dataset_identifier from identifiers is used here to dynamically generate dataset identifiers, which is a cleaner and more modular approach than hardcoding or duplicating logic.


106-106: Dynamic generation of dataset URN.

The use of gen_dataset_urn from identifiers to dynamically generate dataset URNs is a good practice, ensuring consistency and reusability of URN generation logic across the codebase.

metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_summary.py (1)

76-79: Introduction of SnowflakeIdentifierBuilder in SnowflakeSchemaGenerator.

The use of SnowflakeIdentifierBuilder to handle identifier generation is a positive change. It centralizes the logic for creating identifiers, which can help maintain consistency and reduce errors in identifier creation.

metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_tag.py (1)

71-71: Use of identifiers for generating quoted identifiers.

The changes effectively utilize the identifiers object to generate quoted identifiers for databases, schemas, and tables. This ensures consistency and correctness in handling identifiers, which is crucial for accurate tag extraction.

Also applies to: 74-76, 82-82, 144-144

metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_shares.py (2)

94-97: Improved error reporting for missing databases in shares configuration.

The addition of a structured warning message when databases referenced by the share configurations are not ingested is a good practice. It improves transparency and aids in troubleshooting.


114-114: Consistent use of identifiers for URN generation.

The consistent use of identifiers to generate dataset identifiers and URNs across different methods enhances modularity and reusability. This approach simplifies the maintenance and scalability of the code.

Also applies to: 117-117, 121-122, 142-142, 145-145, 148-149

metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_report.py (2)

18-20: Addition of SnowflakeQueriesExtractorReport import.

The import of SnowflakeQueriesExtractorReport is crucial for the new functionality introduced in this PR. It's good to see that the new types are being integrated properly.


119-120: Addition of queries_extractor field in SnowflakeV2Report.

The new queries_extractor field in SnowflakeV2Report is optional and seems to be aligned with the PR's objective to enhance the Snowflake source with new extractor capabilities. This change supports the integration of SnowflakeQueriesExtractor into the main source, which is a key objective of the PR.

metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_profiler.py (1)

88-88: New method for generating dataset names.

The addition of the get_dataset_name method in SnowflakeProfiler is a significant enhancement. It abstracts the dataset naming logic into a single method, which improves modularity and maintainability. This change is well aligned with the principles of clean code.
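
A hedged sketch of what a helper like this typically does; the real get_dataset_name in SnowflakeProfiler may normalize names differently.

def get_dataset_name(table_name: str, schema_name: str, db_name: str) -> str:
    # Snowflake dataset identifiers in DataHub are typically the fully
    # qualified, dot-joined name, often lowercased for URN consistency.
    return f"{db_name}.{schema_name}.{table_name}".lower()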

metadata-ingestion/src/datahub/ingestion/source/sql/sql_config.py (1)

86-86: Making include_views and include_tables required boolean fields.

The explicit requirement for include_views and include_tables to be boolean is a good practice, ensuring that the configuration is robust and less prone to errors. This change also clarifies the expected input types for these fields, enhancing the configuration's usability and maintainability.

Also applies to: 89-89

metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_utils.py (1)

106-111: Introduction of SnowflakeFilter class.

The new SnowflakeFilter class encapsulates filtering logic, which is a positive step towards improving the modularity and maintainability of the code. This class uses the SnowflakeFilterConfig and SourceReport for its operations, which integrates well with the existing architecture.

metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_lineage_v2.py (1)

431-445: Optimize Dataset Identifier Lookup

The method map_query_result_upstreams repeatedly calls self.identifiers.get_dataset_identifier_from_qualified_name inside a loop. Consider optimizing this by caching results or restructuring the code to reduce the frequency of these calls, especially if they are computationally expensive or involve network calls.

[performance]
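
One hedged way to apply the caching suggested here, assuming the qualified-name to identifier mapping is deterministic; the wrapper below is illustrative, not code from this PR.

import functools


def make_cached_lookup(identifiers):
    # Memoize the qualified-name -> dataset-identifier mapping so repeated
    # lookups inside the upstream-mapping loop hit the cache instead of
    # re-running the (potentially expensive) resolution logic.
    @functools.lru_cache(maxsize=None)
    def get_identifier(qualified_name: str) -> str:
        return identifiers.get_dataset_identifier_from_qualified_name(qualified_name)

    return get_identifier


# Usage sketch inside map_query_result_upstreams:
# lookup = make_cached_lookup(self.identifiers)
# upstream_urns = [lookup(name) for name in upstream_qualified_names]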

metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_schema_gen.py (9)

145-154: Refactor of Constructor Parameters and Composition

The refactoring to use composition over inheritance by introducing filters and identifiers directly in the constructor is a positive change. It enhances modularity and makes dependencies more explicit.

Also applies to: 163-164


192-196: Simplification of Identifier Methods

The methods gen_dataset_urn and snowflake_identifier are now simplified and directly use the identifiers object. This change improves readability and maintainability by reducing the complexity within the class.


216-219: Enhanced Error Handling with Structured Reporting

The use of structured_reporter.failure to handle exceptions related to permissions and database listing is a robust enhancement. It ensures that errors are not only logged but also categorized correctly, which can be crucial for debugging and monitoring.

Also applies to: 228-230, 239-239


Line range hint 282-292: Improved Warning Messages for Permission Issues

The addition of detailed warning messages when encountering permission issues during database operations is a good practice. It aids in troubleshooting by providing clear guidance on what might be wrong.


443-448: View Definition Parsing Enhancement

The integration of view definition parsing in the method _process_schema is a significant enhancement. It allows for better lineage tracking and understanding of view dependencies. The use of aggregator.add_view_definition is appropriate and aligns with the objectives of enhancing view lineage parsing.


463-466: Warning for Missing Tables/Views

The method _process_schema now includes a warning for schemas that do not contain tables or views. This is a helpful diagnostic tool for users to understand potential configuration or permission issues.


525-528: Error Handling in Table Fetching

The method fetch_tables_for_schema follows a consistent pattern in error handling as seen in other parts of the class. The structured reporting of errors helps in quick identification and resolution of issues related to permissions.


561-562: Consistent Warning Reporting for Table and Column Fetching

The methods _process_table, fetch_foreign_keys_for_table, and fetch_pk_for_table use structured reporting to warn about issues in fetching table details. This consistency in error handling across the class helps maintain a uniform approach to error reporting.

Also applies to: 596-597, 612-613


636-637: Warning for Missing Columns in Views

The method _process_view includes a warning for missing columns in views, which is crucial for ensuring data completeness and integrity. This aligns with the overall strategy of robust error handling in the class.

metadata-ingestion/src/datahub/sql_parsing/sql_parsing_aggregator.py (2)

Line range hint 254-322: Refactor: Schema Resolver Initialization

The refactoring of the schema resolver initialization in the SqlParsingAggregator constructor is well-structured. The use of conditions to determine whether to use an explicitly provided resolver or to initialize one based on the graph client is clear and maintains separation of concerns.

However, ensure that the conditions for initializing the schema resolver are thoroughly tested, especially the transitions between using an explicitly provided resolver and the lazy-loading mechanism.


411-417: Check the Necessity of Schema Information

The method _need_schemas is a concise way to determine if schema information is necessary based on the features enabled. This method enhances readability and maintainability by centralizing the check.

Ensure that all features that require schema information are accounted for in this method to prevent runtime errors due to missing schema data.
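
A hedged sketch of a _need_schemas-style check; the flag names are taken from this review, but the exact set of features the real aggregator inspects may differ.

from dataclasses import dataclass


@dataclass
class AggregatorFlags:
    generate_lineage: bool = True
    generate_queries: bool = True
    generate_query_usage_statistics: bool = False

    @property
    def _need_schemas(self) -> bool:
        # Schema information is only required when a schema-aware feature
        # (e.g. column-level lineage) is enabled.
        return self.generate_lineage or self.generate_query_usage_statistics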

metadata-ingestion/src/datahub/ingestion/api/source.py (2)

1-1: Added imports for context management and iteration.

The addition of the contextlib and Iterator imports is appropriate for the new functionality introduced in this file, specifically the report_exc method, which uses a context manager.

Also applies to: 14-14


Line range hint 102-116: Enhanced logging and error handling in structured logging.

The introduction of the stacklevel parameter and the adjustments in stack level calculations improve the accuracy of the source location in logs, which is beneficial for debugging. The use of structured logging is consistent with modern Python practices and enhances the readability and maintainability of log management.

Also applies to: 264-279, 288-291
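
For context, the stacklevel mechanism comes from Python's standard logging: passing stacklevel=2 attributes the record to the helper's caller. A minimal, self-contained illustration (not DataHub code):

import logging

logging.basicConfig(format="%(funcName)s: %(message)s", level=logging.INFO)
logger = logging.getLogger(__name__)


def report_warning(message: str) -> None:
    # stacklevel=2 attributes the record to report_warning's caller, so logs
    # point at the ingestion code that raised the issue, not at this helper.
    logger.warning(message, stacklevel=2)


def process_table() -> None:
    report_warning("column type not mapped")  # logged as coming from process_table


process_table()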

metadata-ingestion/src/datahub/ingestion/source/sql/athena.py (1)

Line range hint 1-1: Removed unused field _s3_staging_dir.

The removal of the _s3_staging_dir field and its replacement with query_result_location aligns with the changes mentioned in the PR summary, focusing on simplification and modernization of configuration handling.

metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_queries.py (1)

Line range hint 66-97: Refactored configuration handling in Snowflake queries.

The integration of SnowflakeQueriesExtractorConfig with SnowflakeIdentifierConfig and SnowflakeFilterConfig into SnowflakeQueriesSourceConfig simplifies the configuration management by using inheritance, which is a good practice in object-oriented programming.
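
A rough sketch of this inheritance-based composition; the class names come from the PR, while the fields shown are illustrative placeholders rather than the real definitions.

from pydantic import BaseModel


class SnowflakeIdentifierConfig(BaseModel):
    convert_urns_to_lowercase: bool = True


class SnowflakeFilterConfig(BaseModel):
    database_allow_pattern: str = ".*"  # stand-in for the real allow/deny patterns


class SnowflakeQueriesExtractorConfig(BaseModel):
    window_days: int = 1  # illustrative option, not a real field name


class SnowflakeQueriesSourceConfig(
    SnowflakeQueriesExtractorConfig, SnowflakeIdentifierConfig, SnowflakeFilterConfig
):
    # A single config object exposes extractor, identifier, and filter options.
    account_id: str = "my_account"  # placeholder connection detail


config = SnowflakeQueriesSourceConfig()
print(config.convert_urns_to_lowercase, config.window_days)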

Comment on lines +494 to +497
            self.structured_reporter.warning(
                "Failed to get views for schema",
                f"{db_name}.{schema_name}",
                exc=e,

Error Handling in View Fetching

The method fetch_views_for_schema uses structured reporting for error handling, which is consistent with other parts of the code. However, the exception handling could be more specific to different types of errors, similar to the suggestion for fetch_schemas_for_database.

-        except Exception as e:
+        except SnowflakePermissionError as spe:
+            self.structured_reporter.warning(
+                "Permission issues encountered while fetching views",
+                f"{db_name}.{schema_name}",
+                exc=spe,
+            )
+        except Exception as e:
Committable suggestion (replacement only, indentation restored):

            self.structured_reporter.warning(
                "Failed to get views for schema",
                f"{db_name}.{schema_name}",
                exc=e,
            )
        except SnowflakePermissionError as spe:
            self.structured_reporter.warning(
                "Permission issues encountered while fetching views",
                f"{db_name}.{schema_name}",
                exc=spe,
            )
        except Exception as e:

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 1578f31 and abfc574.

Files selected for processing (2)
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_config.py (2 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_schema_gen.py (28 hunks)
Files skipped from review as they are similar to previous changes (2)
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_config.py
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_schema_gen.py

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between abfc574 and dcb786f.

Files selected for processing (3)
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_queries.py (16 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_v2.py (9 hunks)
  • metadata-ingestion/src/datahub/sql_parsing/sql_parsing_aggregator.py (6 hunks)
Files skipped from review as they are similar to previous changes (2)
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_queries.py
  • metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_v2.py
Additional comments not posted (4)
metadata-ingestion/src/datahub/sql_parsing/sql_parsing_aggregator.py (4)

254-256: Refactor: Introduce schema_resolver and eager_graph_load parameters in __init__ method.

The addition of schema_resolver and eager_graph_load parameters allows for flexible schema resolution strategies. This is a positive change as it supports both eager and lazy loading of schemas based on the configuration, which can improve performance depending on the use case.


306-326: Enhancement: Schema Resolver Initialization Logic

The logic for initializing the schema_resolver is well-structured, supporting both explicit provision and lazy loading via the graph client. This flexibility is crucial for adapting to different deployment environments and configurations.
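
Roughly, the selection logic described here prefers an explicitly supplied resolver, then a graph-backed eager or lazy resolver, and finally an offline fallback. A simplified, hypothetical sketch (the resolver classes are stand-ins, not the real DataHub types):

class EagerResolver:
    def __init__(self, graph):
        # Bulk-loads known schemas from the graph up front.
        self.graph = graph


class LazyResolver:
    def __init__(self, graph):
        # Fetches each schema on first use via the graph client.
        self.graph = graph


class OfflineResolver:
    # Populated only from schemas observed during ingestion; no graph needed.
    pass


def choose_schema_resolver(schema_resolver, graph, eager_graph_load: bool):
    if schema_resolver is not None:
        return schema_resolver  # an explicitly provided resolver always wins
    if graph is not None and eager_graph_load:
        return EagerResolver(graph)  # eager_graph_load=True: load everything now
    if graph is not None:
        return LazyResolver(graph)  # default: lazy, on-demand resolution
    return OfflineResolver()  # no graph: offline-only resolution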


394-395: Duplicate Comment: Configuration Misuse

The previous comment about the misuse of BaseUsageConfig is still relevant. The use of a broad configuration class in a context where many properties are not applicable can lead to confusion and potential bugs.


415-421: Refactor: Simplify _need_schemas Property Logic

The updated logic in _need_schemas property now covers more scenarios where schema information might be necessary, enhancing the robustness of schema handling in different operational contexts.

Comment on lines +279 to +284
        if self.generate_queries and not (
            self.generate_lineage or self.generate_query_usage_statistics
        ):
            logger.warning(
                "Queries will not be generated, as neither lineage nor query usage statistics are enabled"
            )

Logic Issue: Potential Misleading Warning Message

The warning message about queries not being generated could be misleading because it is logged even when generate_queries is true but both generate_lineage and generate_query_usage_statistics are false. This could lead to confusion about the actual behavior of the system.

- if self.generate_queries and not (
-     self.generate_lineage or self.generate_query_usage_statistics
- ):
-     logger.warning(
-         "Queries will not be generated, as neither lineage nor query usage statistics are enabled"
-     )
+ if not self.generate_lineage and not self.generate_query_usage_statistics:
+     if self.generate_queries:
+         logger.warning(
+             "Queries will not be generated, as neither lineage nor query usage statistics are enabled"
+         )
+     else:
+         logger.debug(
+             "Neither lineage nor query usage statistics nor query generation are enabled"
+         )
Committable suggestion (replacement only, indentation restored):

        if not self.generate_lineage and not self.generate_query_usage_statistics:
            if self.generate_queries:
                logger.warning(
                    "Queries will not be generated, as neither lineage nor query usage statistics are enabled"
                )
            else:
                logger.debug(
                    "Neither lineage nor query usage statistics nor query generation are enabled"
                )

@hsheth2 hsheth2 merged commit bccfd8f into master Jul 17, 2024
56 of 58 checks passed
@hsheth2 hsheth2 deleted the snowflake-queries-integration branch July 17, 2024 17:22
arosanda added a commit to infobip/datahub that referenced this pull request Sep 23, 2024
* feat(forms) Handle deleting forms references when hard deleting forms (datahub-project#10820)

* refactor(ui): Misc improvements to the setup ingestion flow (ingest uplift 1/2)  (datahub-project#10764)

Co-authored-by: John Joyce <john@Johns-MBP.lan>
Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>

* fix(ingestion/airflow-plugin): pipeline tasks discoverable in search (datahub-project#10819)

* feat(ingest/transformer): tags to terms transformer (datahub-project#10758)

Co-authored-by: Aseem Bansal <asmbansal2@gmail.com>

* fix(ingestion/unity-catalog): fixed issue with profiling with GE turned on (datahub-project#10752)

Co-authored-by: Aseem Bansal <asmbansal2@gmail.com>

* feat(forms) Add java SDK for form entity PATCH + CRUD examples (datahub-project#10822)

* feat(SDK) Add java SDK for structuredProperty entity PATCH + CRUD examples (datahub-project#10823)

* feat(SDK) Add StructuredPropertyPatchBuilder in python sdk and provide sample CRUD files (datahub-project#10824)

* feat(forms) Add CRUD endpoints to GraphQL for Form entities (datahub-project#10825)

* add flag for includeSoftDeleted in scroll entities API (datahub-project#10831)

* feat(deprecation) Return actor entity with deprecation aspect (datahub-project#10832)

* feat(structuredProperties) Add CRUD graphql APIs for structured property entities (datahub-project#10826)

* add scroll parameters to openapi v3 spec (datahub-project#10833)

* fix(ingest): correct profile_day_of_week implementation (datahub-project#10818)

* feat(ingest/glue): allow ingestion of empty databases from Glue (datahub-project#10666)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* feat(cli): add more details to get cli (datahub-project#10815)

* fix(ingestion/glue): ensure date formatting works on all platforms for aws glue (datahub-project#10836)

* fix(ingestion): fix datajob patcher (datahub-project#10827)

* fix(smoke-test): add suffix in temp file creation (datahub-project#10841)

* feat(ingest/glue): add helper method to permit user or group ownership (datahub-project#10784)

* feat(): Show data platform instances in policy modal if they are set on the policy (datahub-project#10645)

Co-authored-by: Hendrik Richert <hendrik.richert@swisscom.com>

* docs(patch): add patch documentation for how implementation works (datahub-project#10010)

Co-authored-by: John Joyce <john@acryl.io>

* fix(jar): add missing custom-plugin-jar task (datahub-project#10847)

* fix(): also check exceptions/stack trace when filtering log messages (datahub-project#10391)

Co-authored-by: John Joyce <john@acryl.io>

* docs(): Update posts.md (datahub-project#9893)

Co-authored-by: Hyejin Yoon <0327jane@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* chore(ingest): update acryl-datahub-classify version (datahub-project#10844)

* refactor(ingest): Refactor structured logging to support infos, warnings, and failures structured reporting to UI (datahub-project#10828)

Co-authored-by: John Joyce <john@Johns-MBP.lan>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(restli): log aspect-not-found as a warning rather than as an error (datahub-project#10834)

* fix(ingest/nifi): remove duplicate upstream jobs (datahub-project#10849)

* fix(smoke-test): test access to create/revoke personal access tokens (datahub-project#10848)

* fix(smoke-test): missing test for move domain (datahub-project#10837)

* ci: update usernames to not considered for community (datahub-project#10851)

* env: change defaults for data contract visibility (datahub-project#10854)

* fix(ingest/tableau): quote special characters in external URL (datahub-project#10842)

* fix(smoke-test): fix flakiness of auto complete test

* ci(ingest): pin dask dependency for feast (datahub-project#10865)

* fix(ingestion/lookml): liquid template resolution and view-to-view cll (datahub-project#10542)

* feat(ingest/audit): add client id and version in system metadata props (datahub-project#10829)

* chore(ingest): Mypy 1.10.1 pin (datahub-project#10867)

* docs: use acryl-datahub-actions as expected python package to install (datahub-project#10852)

* docs: add new js snippet (datahub-project#10846)

* refactor(ingestion): remove company domain for security reason (datahub-project#10839)

* fix(ingestion/spark): Platform instance and column level lineage fix (datahub-project#10843)

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* feat(ingestion/tableau): optionally ingest multiple sites and create site containers (datahub-project#10498)

Co-authored-by: Yanik Häni <Yanik.Haeni1@swisscom.com>

* fix(ingestion/looker): Add sqlglot dependency and remove unused sqlparser (datahub-project#10874)

* fix(manage-tokens): fix manage access token policy (datahub-project#10853)

* Batch get entity endpoints (datahub-project#10880)

* feat(system): support conditional write semantics (datahub-project#10868)

* fix(build): upgrade vercel builds to Node 20.x (datahub-project#10890)

* feat(ingest/lookml): shallow clone repos (datahub-project#10888)

* fix(ingest/looker): add missing dependency (datahub-project#10876)

* fix(ingest): only populate audit stamps where accurate (datahub-project#10604)

* fix(ingest/dbt): always encode tag urns (datahub-project#10799)

* fix(ingest/redshift): handle multiline alter table commands (datahub-project#10727)

* fix(ingestion/looker): column name missing in explore (datahub-project#10892)

* fix(lineage) Fix lineage source/dest filtering with explored per hop limit (datahub-project#10879)

* feat(conditional-writes): misc updates and fixes (datahub-project#10901)

* feat(ci): update outdated action (datahub-project#10899)

* feat(rest-emitter): adding async flag to rest emitter (datahub-project#10902)

Co-authored-by: Gabe Lyons <gabe.lyons@acryl.io>

* feat(ingest): add snowflake-queries source (datahub-project#10835)

* fix(ingest): improve `auto_materialize_referenced_tags_terms` error handling (datahub-project#10906)

* docs: add new company to adoption list (datahub-project#10909)

* refactor(redshift): Improve redshift error handling with new structured reporting system (datahub-project#10870)

Co-authored-by: John Joyce <john@Johns-MBP.lan>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* feat(ui) Finalize support for all entity types on forms (datahub-project#10915)

* Index ExecutionRequestResults status field (datahub-project#10811)

* feat(ingest): grafana connector (datahub-project#10891)

Co-authored-by: Shirshanka Das <shirshanka@apache.org>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(gms) Add Form entity type to EntityTypeMapper (datahub-project#10916)

* feat(dataset): add support for external url in Dataset (datahub-project#10877)

* docs(saas-overview) added missing features to observe section (datahub-project#10913)

Co-authored-by: John Joyce <john@acryl.io>

* fix(ingest/spark): Fixing Micrometer warning (datahub-project#10882)

* fix(structured properties): allow application of structured properties without schema file (datahub-project#10918)

* fix(data-contracts-web) handle other schedule types (datahub-project#10919)

* fix(ingestion/tableau): human-readable message for PERMISSIONS_MODE_SWITCHED error (datahub-project#10866)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* Add feature flag for view defintions (datahub-project#10914)

Co-authored-by: Ethan Cartwright <ethan.cartwright@acryl.io>

* feat(ingest/BigQuery): refactor+parallelize dataset metadata extraction (datahub-project#10884)

* fix(airflow): add error handling around render_template() (datahub-project#10907)

* feat(ingestion/sqlglot): add optional `default_dialect` parameter to sqlglot lineage (datahub-project#10830)

* feat(mcp-mutator): new mcp mutator plugin (datahub-project#10904)

* fix(ingest/bigquery): changes helper function to decode unicode scape sequences (datahub-project#10845)

* feat(ingest/postgres): fetch table sizes for profile (datahub-project#10864)

* feat(ingest/abs): Adding azure blob storage ingestion source (datahub-project#10813)

* fix(ingest/redshift): reduce severity of SQL parsing issues (datahub-project#10924)

* fix(build): fix lint fix web react (datahub-project#10896)

* fix(ingest/bigquery): handle quota exceeded for project.list requests (datahub-project#10912)

* feat(ingest): report extractor failures more loudly (datahub-project#10908)

* feat(ingest/snowflake): integrate snowflake-queries into main source (datahub-project#10905)

* fix(ingest): fix docs build (datahub-project#10926)

* fix(ingest/snowflake): fix test connection (datahub-project#10927)

* fix(ingest/lookml): add view load failures to cache (datahub-project#10923)

* docs(slack) overhauled setup instructions and screenshots (datahub-project#10922)

Co-authored-by: John Joyce <john@acryl.io>

* fix(airflow): Add comma parsing of owners to DataJobs (datahub-project#10903)

* fix(entityservice): fix merging sideeffects (datahub-project#10937)

* feat(ingest): Support System Ingestion Sources, Show and hide system ingestion sources with Command-S (datahub-project#10938)

Co-authored-by: John Joyce <john@Johns-MBP.lan>

* chore() Set a default lineage filtering end time on backend when a start time is present (datahub-project#10925)

Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>
Co-authored-by: John Joyce <john@Johns-MBP.lan>

* Added relationships APIs to V3. Added these generic APIs to V3 swagger doc. (datahub-project#10939)

* docs: add learning center to docs (datahub-project#10921)

* doc: Update hubspot form id (datahub-project#10943)

* chore(airflow): add python 3.11 w/ Airflow 2.9 to CI (datahub-project#10941)

* fix(ingest/Glue): column upstream lineage between S3 and Glue (datahub-project#10895)

* fix(ingest/abs): split abs utils into multiple files (datahub-project#10945)

* doc(ingest/looker): fix doc for sql parsing documentation (datahub-project#10883)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(ingest/bigquery): Adding missing BigQuery types (datahub-project#10950)

* fix(ingest/setup): feast and abs source setup (datahub-project#10951)

* fix(connections) Harden adding /gms to connections in backend (datahub-project#10942)

* feat(siblings) Add flag to prevent combining siblings in the UI (datahub-project#10952)

* fix(docs): make graphql doc gen more automated (datahub-project#10953)

* feat(ingest/athena): Add option for Athena partitioned profiling (datahub-project#10723)

* fix(spark-lineage): default timeout for future responses (datahub-project#10947)

* feat(datajob/flow): add environment filter using info aspects (datahub-project#10814)

* fix(ui/ingest): correct privilege used to show tab (datahub-project#10483)

Co-authored-by: Kunal-kankriya <127090035+Kunal-kankriya@users.noreply.github.com>

* feat(ingest/looker): include dashboard urns in browse v2 (datahub-project#10955)

* add a structured type to batchGet in OpenAPI V3 spec (datahub-project#10956)

* fix(ui): scroll on the domain sidebar to show all domains (datahub-project#10966)

* fix(ingest/sagemaker): resolve incorrect variable assignment for SageMaker API call (datahub-project#10965)

* fix(airflow/build): Pinning mypy (datahub-project#10972)

* Fixed a bug where the OpenAPI V3 spec was incorrect. The bug was introduced in datahub-project#10939. (datahub-project#10974)

* fix(ingest/test): Fix for mssql integration tests (datahub-project#10978)

* fix(entity-service) exist check correctly extracts status (datahub-project#10973)

* fix(structuredProps) casing bug in StructuredPropertiesValidator (datahub-project#10982)

* bugfix: use anyOf instead of allOf when creating references in openapi v3 spec (datahub-project#10986)

* fix(ui): Remove ant less imports (datahub-project#10988)

* feat(ingest/graph): Add get_results_by_filter to DataHubGraph (datahub-project#10987)

* feat(ingest/cli): init does not actually support environment variables (datahub-project#10989)

* fix(ingest/graph): Update get_results_by_filter graphql query (datahub-project#10991)

* feat(ingest/spark): Promote beta plugin (datahub-project#10881)

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* feat(ingest): support domains in meta -> "datahub" section (datahub-project#10967)

* feat(ingest): add `check server-config` command (datahub-project#10990)

* feat(cli): Make consistent use of DataHubGraphClientConfig (datahub-project#10466)

Deprecates get_url_and_token() in favor of a more complete option: load_graph_config() that returns a full DatahubClientConfig.
This change was then propagated across previous usages of get_url_and_token so that connections to DataHub server from the client respect the full breadth of configuration specified by DatahubClientConfig.

I.e: You can now specify disable_ssl_verification: true in your ~/.datahubenv file so that all cli functions to the server work when ssl certification is disabled.

Fixes datahub-project#9705

* fix(ingest/s3): Fixing container creation when there is no folder in path (datahub-project#10993)

* fix(ingest/looker): support platform instance for dashboards & charts (datahub-project#10771)

* feat(ingest/bigquery): improve handling of information schema in sql parser (datahub-project#10985)

* feat(ingest): improve `ingest deploy` command (datahub-project#10944)

* fix(backend): allow excluding soft-deleted entities in relationship-queries; exclude soft-deleted members of groups (datahub-project#10920)

- allow excluding soft-deleted entities in relationship-queries
- exclude soft-deleted members of groups

* fix(ingest/looker): downgrade missing chart type log level (datahub-project#10996)

* doc(acryl-cloud): release docs for 0.3.4.x (datahub-project#10984)

Co-authored-by: John Joyce <john@acryl.io>
Co-authored-by: RyanHolstien <RyanHolstien@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Pedro Silva <pedro@acryl.io>

* fix(protobuf/build): Fix protobuf check jar script (datahub-project#11006)

* fix(ui/ingest): Support invalid cron jobs (datahub-project#10998)

* fix(ingest): fix graph config loading (datahub-project#11002)

Co-authored-by: Pedro Silva <pedro@acryl.io>

* feat(docs): Document __DATAHUB_TO_FILE_ directive (datahub-project#10968)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(graphql/upsertIngestionSource): Validate cron schedule; parse error in CLI (datahub-project#11011)

* feat(ece): support custom ownership type urns in ECE generation (datahub-project#10999)

* feat(assertion-v2): changed Validation tab to Quality and created new Governance tab (datahub-project#10935)

* fix(ingestion/glue): Add support for missing config options for profiling in Glue (datahub-project#10858)

* feat(propagation): Add models for schema field docs, tags, terms (datahub-project#2959) (datahub-project#11016)

Co-authored-by: Chris Collins <chriscollins3456@gmail.com>

* docs: standardize terminology to DataHub Cloud (datahub-project#11003)

* fix(ingestion/transformer): replace the externalUrl container (datahub-project#11013)

* docs(slack) troubleshoot docs (datahub-project#11014)

* feat(propagation): Add graphql API (datahub-project#11030)

Co-authored-by: Chris Collins <chriscollins3456@gmail.com>

* feat(propagation):  Add models for Action feature settings (datahub-project#11029)

* docs(custom properties): Remove duplicate from sidebar (datahub-project#11033)

* feat(models): Introducing Dataset Partitions Aspect (datahub-project#10997)

Co-authored-by: John Joyce <john@Johns-MBP.lan>
Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>

* feat(propagation): Add Documentation Propagation Settings (datahub-project#11038)

* fix(models): chart schema fields mapping, add dataHubAction entity, t… (datahub-project#11040)

* fix(ci): smoke test lint failures (datahub-project#11044)

* docs: fix learning center color scheme & typo (datahub-project#11043)

* feat: add cloud main page (datahub-project#11017)

Co-authored-by: Jay <159848059+jayacryl@users.noreply.github.com>

* feat(restore-indices): add additional step to also clear system metadata service (datahub-project#10662)

Co-authored-by: John Joyce <john@acryl.io>

* docs: fix typo (datahub-project#11046)

* fix(lint): apply spotless (datahub-project#11050)

* docs(airflow): example query to get datajobs for a dataflow (datahub-project#11034)

* feat(cli): Add run-id option to put sub-command (datahub-project#11023)

Adds an option to assign run-id to a given put command execution. 
This is useful when transformers do not exist for a given ingestion payload, we can follow up with custom metadata and assign it to an ingestion pipeline.

* fix(ingest): improve sql error reporting calls (datahub-project#11025)

* fix(airflow): fix CI setup (datahub-project#11031)

* feat(ingest/dbt): add experimental `prefer_sql_parser_lineage` flag (datahub-project#11039)

* fix(ingestion/lookml): enable stack-trace in lookml logs (datahub-project#10971)

* (chore): Linting fix (datahub-project#11015)

* chore(ci): update deprecated github actions (datahub-project#10977)

* Fix ALB configuration example (datahub-project#10981)

* chore(ingestion-base): bump base image packages (datahub-project#11053)

* feat(cli): Trim report of dataHubExecutionRequestResult to max GMS size (datahub-project#11051)

* fix(ingestion/lookml): emit dummy sql condition for lookml custom condition tag (datahub-project#11008)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(ingestion/powerbi): fix issue with broken report lineage (datahub-project#10910)

* feat(ingest/tableau): add retry on timeout (datahub-project#10995)

* change generate kafka connect properties from env (datahub-project#10545)

Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>

* fix(ingest): fix oracle cronjob ingestion (datahub-project#11001)

Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>

* chore(ci): revert update deprecated github actions (datahub-project#10977) (datahub-project#11062)

* feat(ingest/dbt-cloud): update metadata_endpoint inference (datahub-project#11041)

* build: Reduce size of datahub-frontend-react image by 50-ish% (datahub-project#10878)

Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>

* fix(ci): Fix lint issue in datahub_ingestion_run_summary_provider.py (datahub-project#11063)

* docs(ingest): update developing-a-transformer.md (datahub-project#11019)

* feat(search-test): update search tests from datahub-project#10408 (datahub-project#11056)

* feat(cli): add aspects parameter to DataHubGraph.get_entity_semityped (datahub-project#11009)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* docs(airflow): update min version for plugin v2 (datahub-project#11065)

* doc(ingestion/tableau): doc update for derived permission (datahub-project#11054)

Co-authored-by: Pedro Silva <pedro.cls93@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(py): remove dep on types-pkg_resources (datahub-project#11076)

* feat(ingest/mode): add option to exclude restricted (datahub-project#11081)

* fix(ingest): set lastObserved in sdk when unset (datahub-project#11071)

* doc(ingest): Update capabilities (datahub-project#11072)

* chore(vulnerability): Log Injection (datahub-project#11090)

* chore(vulnerability): Information exposure through a stack trace (datahub-project#11091)

* chore(vulnerability): Comparison of narrow type with wide type in loop condition (datahub-project#11089)

* chore(vulnerability): Insertion of sensitive information into log files (datahub-project#11088)

* chore(vulnerability): Risky Cryptographic Algorithm (datahub-project#11059)

* chore(vulnerability): Overly permissive regex range (datahub-project#11061)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix: update customer data (datahub-project#11075)

* fix(models): fixing the datasetPartition models (datahub-project#11085)

Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>

* fix(ui): Adding view, forms GraphQL query, remove showing a fallback error message on unhandled GraphQL error (datahub-project#11084)

Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>

* feat(docs-site): hiding learn more from cloud page (datahub-project#11097)

* fix(docs): Add correct usage of orFilters in search API docs (datahub-project#11082)

Co-authored-by: Jay <159848059+jayacryl@users.noreply.github.com>

* fix(ingest/mode): Regexp in mode name matcher didn't allow underscore (datahub-project#11098)

* docs: Refactor customer stories section (datahub-project#10869)

Co-authored-by: Jeff Merrick <jeff@wireform.io>

* fix(release): fix full/slim suffix on tag (datahub-project#11087)

* feat(config): support alternate hashing algorithm for doc id (datahub-project#10423)

Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
Co-authored-by: John Joyce <john@acryl.io>

* fix(emitter): fix typo in get method of java kafka emitter (datahub-project#11007)

* fix(ingest): use correct native data type in all SQLAlchemy sources by compiling data type using dialect (datahub-project#10898)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* chore: Update contributors list in PR labeler (datahub-project#11105)

* feat(ingest): tweak stale entity removal messaging (datahub-project#11064)

* fix(ingestion): enforce lastObserved timestamps in SystemMetadata (datahub-project#11104)

* fix(ingest/powerbi): fix broken lineage between chart and dataset (datahub-project#11080)

* feat(ingest/lookml): CLL support for sql set in sql_table_name attribute of lookml view (datahub-project#11069)

* docs: update graphql docs on forms & structured properties (datahub-project#11100)

* test(search): search openAPI v3 test (datahub-project#11049)

* fix(ingest/tableau): prevent empty site content urls (datahub-project#11057)

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* feat(entity-client): implement client batch interface (datahub-project#11106)

* fix(snowflake): avoid reporting warnings/info for sys tables (datahub-project#11114)

* fix(ingest): downgrade column type mapping warning to info (datahub-project#11115)

* feat(api): add AuditStamp to the V3 API entity/aspect response (datahub-project#11118)

* fix(ingest/redshift): replace r'\n' with '\n' to avoid token error redshift serverless… (datahub-project#11111)

* fix(entiy-client): handle null entityUrn case for restli (datahub-project#11122)

* fix(sql-parser): prevent bad urns from alter table lineage (datahub-project#11092)

* fix(ingest/bigquery): use small batch size if use_tables_list_query_v2 is set (datahub-project#11121)

* fix(graphql): add missing entities to EntityTypeMapper and EntityTypeUrnMapper (datahub-project#10366)

* feat(ui): Changes to allow editable dataset name (datahub-project#10608)

Co-authored-by: Jay Kadambi <jayasimhan_venkatadri@optum.com>

* fix: remove saxo (datahub-project#11127)

* feat(mcl-processor): Update mcl processor hooks (datahub-project#11134)

* fix(openapi): fix openapi v2 endpoints & v3 documentation update

* Revert "fix(openapi): fix openapi v2 endpoints & v3 documentation update"

This reverts commit 573c1cb.

* docs(policies): updates to policies documentation (datahub-project#11073)

* fix(openapi): fix openapi v2 and v3 docs update (datahub-project#11139)

* feat(auth): grant type and acr values custom oidc parameters support (datahub-project#11116)

* fix(mutator): mutator hook fixes (datahub-project#11140)

* feat(search): support sorting on multiple fields (datahub-project#10775)

* feat(ingest): various logging improvements (datahub-project#11126)

* fix(ingestion/lookml): fix for sql parsing error (datahub-project#11079)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* feat(docs-site) cloud page spacing and content polishes (datahub-project#11141)

* feat(ui) Enable editing structured props on fields (datahub-project#11042)

* feat(tests): add md5 and last computed to testResult model (datahub-project#11117)

* test(openapi): openapi regression smoke tests (datahub-project#11143)

* fix(airflow): fix tox tests + update docs (datahub-project#11125)

* docs: add chime to adoption stories (datahub-project#11142)

* fix(ingest/databricks): Updating code to work with Databricks sdk 0.30 (datahub-project#11158)

* fix(kafka-setup): add missing script to image (datahub-project#11190)

* fix(config): fix hash algo config (datahub-project#11191)

* test(smoke-test): updates to smoke-tests (datahub-project#11152)

* fix(elasticsearch): refactor idHashAlgo setting (datahub-project#11193)

* chore(kafka): kafka version bump (datahub-project#11211)

* readd UsageStatsWorkUnit

* fix merge problems

* change logo

---------

Co-authored-by: Chris Collins <chriscollins3456@gmail.com>
Co-authored-by: John Joyce <john@acryl.io>
Co-authored-by: John Joyce <john@Johns-MBP.lan>
Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>
Co-authored-by: dushayntAW <158567391+dushayntAW@users.noreply.github.com>
Co-authored-by: sagar-salvi-apptware <159135491+sagar-salvi-apptware@users.noreply.github.com>
Co-authored-by: Aseem Bansal <asmbansal2@gmail.com>
Co-authored-by: Kevin Chun <kevin1chun@gmail.com>
Co-authored-by: jordanjeremy <72943478+jordanjeremy@users.noreply.github.com>
Co-authored-by: skrydal <piotr.skrydalewicz@gmail.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
Co-authored-by: sid-acryl <155424659+sid-acryl@users.noreply.github.com>
Co-authored-by: Julien Jehannet <80408664+aviv-julienjehannet@users.noreply.github.com>
Co-authored-by: Hendrik Richert <github@richert.li>
Co-authored-by: Hendrik Richert <hendrik.richert@swisscom.com>
Co-authored-by: RyanHolstien <RyanHolstien@users.noreply.github.com>
Co-authored-by: Felix Lüdin <13187726+Masterchen09@users.noreply.github.com>
Co-authored-by: Pirry <158024088+chardaway@users.noreply.github.com>
Co-authored-by: Hyejin Yoon <0327jane@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: cburroughs <chris.burroughs@gmail.com>
Co-authored-by: ksrinath <ksrinath@users.noreply.github.com>
Co-authored-by: Mayuri Nehate <33225191+mayurinehate@users.noreply.github.com>
Co-authored-by: Kunal-kankriya <127090035+Kunal-kankriya@users.noreply.github.com>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
Co-authored-by: ipolding-cais <155455744+ipolding-cais@users.noreply.github.com>
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
Co-authored-by: Shubham Jagtap <132359390+shubhamjagtap639@users.noreply.github.com>
Co-authored-by: haeniya <yanik.haeni@gmail.com>
Co-authored-by: Yanik Häni <Yanik.Haeni1@swisscom.com>
Co-authored-by: Gabe Lyons <itsgabelyons@gmail.com>
Co-authored-by: Gabe Lyons <gabe.lyons@acryl.io>
Co-authored-by: 808OVADOZE <52988741+shtephlee@users.noreply.github.com>
Co-authored-by: noggi <anton.kuraev@acryl.io>
Co-authored-by: Nicholas Pena <npena@foursquare.com>
Co-authored-by: Jay <159848059+jayacryl@users.noreply.github.com>
Co-authored-by: ethan-cartwright <ethan.cartwright.m@gmail.com>
Co-authored-by: Ethan Cartwright <ethan.cartwright@acryl.io>
Co-authored-by: Nadav Gross <33874964+nadavgross@users.noreply.github.com>
Co-authored-by: Patrick Franco Braz <patrickfbraz@poli.ufrj.br>
Co-authored-by: pie1nthesky <39328908+pie1nthesky@users.noreply.github.com>
Co-authored-by: Joel Pinto Mata (KPN-DSH-DEX team) <130968841+joelmataKPN@users.noreply.github.com>
Co-authored-by: Ellie O'Neil <110510035+eboneil@users.noreply.github.com>
Co-authored-by: Ajoy Majumdar <ajoymajumdar@hotmail.com>
Co-authored-by: deepgarg-visa <149145061+deepgarg-visa@users.noreply.github.com>
Co-authored-by: Tristan Heisler <tristankheisler@gmail.com>
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
Co-authored-by: Davi Arnaut <davi.arnaut@acryl.io>
Co-authored-by: Pedro Silva <pedro@acryl.io>
Co-authored-by: amit-apptware <132869468+amit-apptware@users.noreply.github.com>
Co-authored-by: Sam Black <sam.black@acryl.io>
Co-authored-by: Raj Tekal <varadaraj_tekal@optum.com>
Co-authored-by: Steffen Grohsschmiedt <gitbhub@steffeng.eu>
Co-authored-by: jaegwon.seo <162448493+wornjs@users.noreply.github.com>
Co-authored-by: Renan F. Lima <51028757+lima-renan@users.noreply.github.com>
Co-authored-by: Matt Exchange <xkollar@users.noreply.github.com>
Co-authored-by: Jonny Dixon <45681293+acrylJonny@users.noreply.github.com>
Co-authored-by: Pedro Silva <pedro.cls93@gmail.com>
Co-authored-by: Pinaki Bhattacharjee <pinakipb2@gmail.com>
Co-authored-by: Jeff Merrick <jeff@wireform.io>
Co-authored-by: skrydal <piotr.skrydalewicz@acryl.io>
Co-authored-by: AndreasHegerNuritas <163423418+AndreasHegerNuritas@users.noreply.github.com>
Co-authored-by: jayasimhankv <145704974+jayasimhankv@users.noreply.github.com>
Co-authored-by: Jay Kadambi <jayasimhan_venkatadri@optum.com>
Co-authored-by: David Leifker <david.leifker@acryl.io>
Labels
ingestion (PR or Issue related to the ingestion of metadata)
2 participants