
fix(ingest): use correct native data type in all SQLAlchemy sources by compiling data type using dialect #10898

Merged

Conversation

Contributor

@Masterchen09 Masterchen09 commented Jul 12, 2024

@hsheth2 As discussed yesterday - here is the (draft) PR for you to check the issue with the native data types in the SQLAlchemy sources. 😊

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added, a Usage Guide has been added for it.
  • For any breaking change, potential downtime, deprecation, or other big change, an entry has been made in Updating DataHub

Summary by CodeRabbit

  • New Features

    • Enhanced schema field retrieval functionality across multiple data sources by introducing an Inspector parameter.
    • Improved data type handling and conversions in SQLAlchemy integrations.
  • Bug Fixes

    • Standardized the nativeDataType fields across various integrations by removing unnecessary parentheses and attributes.
  • Documentation

    • Updated JSON structures for better readability and usability by transitioning from string representations to structured object formats.
  • Chores

    • Updated Docker image references and port mappings for testing services.

Contributor

coderabbitai bot commented Jul 12, 2024

Walkthrough

The updates introduce an inspector parameter to key functions across several modules so that schema field retrieval and type handling can use the SQLAlchemy dialect. JSON golden files have been refined for readability and consistency in data type representation. Together, these changes streamline data processing and improve integration with the supported SQL databases.
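
For illustration only (not code from the PR), here is a minimal sketch of the difference between the old repr-style rendering and a dialect-compiled rendering of a SQLAlchemy type; the MySQL dialect is used purely as an example:

from sqlalchemy import types
from sqlalchemy.dialects import mysql

column_type = types.VARCHAR(50)

# Previous behaviour: the type's repr() includes keyword arguments.
print(repr(column_type))  # VARCHAR(length=50)

# Behaviour after this PR: compile the type with the source's dialect.
print(column_type.compile(dialect=mysql.dialect()))  # VARCHAR(50)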

Changes

  • .../athena.py, .../hive.py, .../trino.py: Added an inspector: Inspector parameter to get_schema_fields_for_column for improved schema retrieval.
  • .../hive_metastore.py: Updated methods to include the inspector parameter for enhanced schema handling.
  • .../sql_common.py, .../sqlalchemy_type_converter.py: Integrated the inspector into get_schema_fields and get_schema_fields_for_column for better type handling.
  • .../vertica.py: Modified _process_projections and _process_models to include the inspector parameter.
  • .../tests/integration/*.json: Standardized nativeDataType fields and simplified JSON structure across multiple files.
  • .../docker-compose.yml: Updated Docker image reference to saplabs/hanaexpress:latest and reduced port mappings.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Inspector
    participant SchemaRetriever

    User->>SchemaRetriever: Request schema fields
    SchemaRetriever->>Inspector: Inspect column types
    Inspector->>SchemaRetriever: Return inspected data
    SchemaRetriever-->>User: Return schema fields

🐰 "In the fields where data flows,
The inspector hops, and knowledge grows.
With types refined and JSON neat,
Our ingestion's now a tasty treat!
Let's celebrate this joyful tweak,
For schema magic, we now seek!" 🎉



@github-actions github-actions bot added the ingestion and community-contribution labels Jul 12, 2024
@@ -641,7 +642,7 @@ def _get_direct_raw_col_upstreams(

# Parse the column name out of the node name.
# Sqlglot calls .sql(), so we have to do the inverse.
-    normalized_col = sqlglot.parse_one(node.name).this.name
+    normalized_col = sqlglot.parse_one(node.name, dialect=dialect).this.name
Collaborator


can we add a test case that would have failed before but works with this change?

Contributor Author

@Masterchen09 Masterchen09 Jul 15, 2024


I tried to reproduce the issue we talked about in the context of the SAP HANA view parsing, and I think it wasn't caused by the parse_one method: as the identifiers are qualified using the optimize method - and, more importantly, with the identify parameter set to true - the name of the node should always be correct, including the capitalization.

For the call to the sql method in the to_node method, the dialect is also not provided (https://github.com/tobymao/sqlglot/blob/5df3f5292488df6a8e21abf3b49086c823797e78/sqlglot/lineage.py#L234 and https://github.com/tobymao/sqlglot/blob/5df3f5292488df6a8e21abf3b49086c823797e78/sqlglot/lineage.py#L285), so the capitalization should not be changed by the sql method either. Not sure if you meant this when you mentioned the sql method in Slack, but I think you are right - everything is correct.

I have also run some performance tests (using timeit) regarding the dialect instance which would implicitly be created by parse_one and there is basically no difference with and without the dialect instance.
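
For reference, a sketch (not from the PR) of the kind of timeit micro-benchmark described above; the query and dialect name are arbitrary:

import timeit

import sqlglot

sql = "SELECT some_column FROM some_table"

# Parse with an implicitly created dialect vs. an explicitly provided one.
implicit = timeit.timeit(lambda: sqlglot.parse_one(sql), number=10_000)
explicit = timeit.timeit(lambda: sqlglot.parse_one(sql, dialect="postgres"), number=10_000)

print(f"implicit dialect: {implicit:.3f}s, explicit dialect: {explicit:.3f}s")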

edit: I have removed that part from the PR. :-)

Collaborator

hsheth2 commented Jul 12, 2024

FYI also seeing some errors in the tests, e.g. "Ingestion error: Can't generate DDL for NullType(); did you forget to specify a type on this Column?"

Might make sense to make a helper method like compile_field_native_type(inspector, type) that calls compile initially, but falls back to repr if an exception is thrown.
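
A minimal sketch of the helper being proposed here (hypothetical; the name and behaviour come only from the suggestion above, not from merged code):

from sqlalchemy import types
from sqlalchemy.engine.reflection import Inspector


def compile_field_native_type(inspector: Inspector, column_type: types.TypeEngine) -> str:
    # Proposed behaviour: compile via the dialect, fall back to repr() on any error.
    try:
        return column_type.compile(dialect=inspector.dialect)
    except Exception:
        return repr(column_type)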

Contributor Author

I don't think we should fall back to the repr function in case of an error: the data types which are returned by the reflection methods of SQLAlchemy are "produced" by the corresponding dialect, and in general it should be possible to compile these data types with the dialect itself (otherwise the dialect would not produce these data types, right?).

The NullType is a special case, which explicitly cannot be compiled and will result in a CompileError: "NullType will result in a CompileError if the compiler is asked to render the type itself [...]" (see here: https://docs.sqlalchemy.org/en/20/core/type_api.html#sqlalchemy.types.NullType)

I have added a utility function to sqlalchemy_type_converter.py which will return the __visit_name__ of the NullType (which is "null") in case NullType is supplied... I think this is better than having "NullType()" as the native data type.

if isinstance(column_type, types.NullType):
return types.NullType.__visit_name__

return column_type.compile(dialect=inspector.dialect)
Collaborator


I still think we should have a try catch around this - purely to ensure that we don't fail broadly if an underlying dialect throws an exception in .compile

Contributor Author


I have added a try/except, which will use the __visit_name__ as a fallback and, in case the data type is not visitable (which it should be, but who knows...), the repr of the data type. I would not expect to ever need this fallback, but to make sure that we are not failing because of the native data type, it's probably better to have it.
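
Putting the thread together, the utility in sqlalchemy_type_converter.py behaves roughly like the following sketch (NullType short-circuit, dialect compilation, then __visit_name__ and finally repr as fallbacks); the exact merged signature may differ:

from sqlalchemy import types
from sqlalchemy.engine.reflection import Inspector
from sqlalchemy.sql.visitors import Visitable


def get_native_data_type_for_sqlalchemy_type(
    column_type: types.TypeEngine, inspector: Inspector
) -> str:
    # NullType explicitly cannot be compiled; report its visit name ("null") instead.
    if isinstance(column_type, types.NullType):
        return types.NullType.__visit_name__

    try:
        # Let the source's dialect render its own native DDL string, e.g. VARCHAR(50).
        return column_type.compile(dialect=inspector.dialect)
    except Exception:
        # Defensive fallbacks that should rarely, if ever, be needed.
        if isinstance(column_type, Visitable):
            return column_type.__visit_name__
        return repr(column_type)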

@Masterchen09 Masterchen09 force-pushed the fix-sqlalchemy-data-types branch 3 times, most recently from 401014f to 9eec1d1 on July 25, 2024 at 10:12
@Masterchen09 Masterchen09 marked this pull request as ready for review on July 27, 2024 at 19:31
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between d85da39 and 1942fbb.

Files selected for processing (21)
  • metadata-ingestion/src/datahub/ingestion/source/sql/athena.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/hive.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/hive_metastore.py (3 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py (7 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/trino.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/vertica.py (2 hunks)
  • metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py (4 hunks)
  • metadata-ingestion/tests/integration/hana/docker-compose.yml (1 hunks)
  • metadata-ingestion/tests/integration/hana/hana_mces_golden.json (10 hunks)
  • metadata-ingestion/tests/integration/mysql/mysql_mces_no_db_golden.json (53 hunks)
  • metadata-ingestion/tests/integration/mysql/mysql_mces_with_db_golden.json (9 hunks)
  • metadata-ingestion/tests/integration/mysql/mysql_table_level_only.json (10 hunks)
  • metadata-ingestion/tests/integration/mysql/mysql_table_row_count_estimate_only.json (9 hunks)
  • metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_database.json (7 hunks)
  • metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_out_database.json (7 hunks)
  • metadata-ingestion/tests/integration/oracle/test_oracle.py (1 hunks)
  • metadata-ingestion/tests/integration/postgres/postgres_all_db_mces_with_db_golden.json (11 hunks)
  • metadata-ingestion/tests/integration/postgres/postgres_mces_with_db_golden.json (11 hunks)
  • metadata-ingestion/tests/integration/trino/trino_hive_instance_mces_golden.json (26 hunks)
  • metadata-ingestion/tests/integration/trino/trino_hive_mces_golden.json (26 hunks)
  • metadata-ingestion/tests/integration/trino/trino_mces_golden.json (18 hunks)
Files skipped from review due to trivial changes (3)
  • metadata-ingestion/tests/integration/mysql/mysql_mces_with_db_golden.json
  • metadata-ingestion/tests/integration/mysql/mysql_table_row_count_estimate_only.json
  • metadata-ingestion/tests/integration/postgres/postgres_mces_with_db_golden.json
Additional comments not posted (106)
metadata-ingestion/tests/integration/hana/docker-compose.yml (1)

9-9: Verify the necessity of all the removed port mappings.

Reducing the port mappings can limit the functionality and accessibility of services interacting with the testhana container. Ensure that the removed ports are not required for any critical interactions.

metadata-ingestion/tests/integration/oracle/test_oracle.py (1)

27-27: Verify the necessity and correctness of the lambda function.

The lambda function added to the process method enforces the return of the string 'NUMBER'. Ensure that this change is necessary for the tests and correctly implemented.

metadata-ingestion/src/datahub/ingestion/source/sql/hive.py (2)

172-172: Verify the correct integration and utilization of the inspector parameter.

The new inspector parameter is introduced to the get_schema_fields_for_column function. Ensure that it is correctly integrated and utilized throughout the function.


178-181: Verify the correct handling of the inspector parameter by the superclass method.

The call to the superclass method is updated to include the new inspector parameter. Ensure that the superclass method correctly handles this parameter and that the change does not introduce any issues.

metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py (4)

8-9: LGTM! New imports are necessary.

The new imports for Inspector and Visitable are required for the added functionality.


181-181: LGTM! Method signature update is necessary.

The addition of the inspector parameter enhances the function's capability to handle column types more robustly.


222-225: LGTM! Appropriate usage of the inspector parameter.

The inspector parameter is used correctly to get the native data type in the fallback description.


251-267: LGTM! Method signature update and new logic are necessary.

The addition of the inspector parameter and the handling of NullType improve the method's robustness. The try/except block ensures graceful handling of compilation errors.

metadata-ingestion/tests/integration/mysql/mysql_table_level_only.json (9)

150-150: LGTM! Standardized representation of INTEGER.

The nativeDataType has been updated from "INTEGER()" to "INTEGER", aligning it with a more conventional format.


162-162: LGTM! Simplified representation of VARCHAR(50).

The nativeDataType has been updated from "VARCHAR(length=50)" to "VARCHAR(50)", enhancing clarity and consistency.


174-174: LGTM! Simplified representation of VARCHAR(50).

The nativeDataType has been updated from "VARCHAR(length=50)" to "VARCHAR(50)", enhancing clarity and consistency.


186-186: LGTM! Simplified representation of VARCHAR(50).

The nativeDataType has been updated from "VARCHAR(length=50)" to "VARCHAR(50)", enhancing clarity and consistency.


198-198: LGTM! Simplified representation of VARCHAR(50).

The nativeDataType has been updated from "VARCHAR(length=50)" to "VARCHAR(50)", enhancing clarity and consistency.


210-210: LGTM! Standardized representation of FLOAT.

The nativeDataType has been updated from "FLOAT()" to "FLOAT", aligning it with a more conventional format.


326-326: LGTM! Standardized representation of INTEGER.

The nativeDataType has been updated from "INTEGER()" to "INTEGER", aligning it with a more conventional format.


338-338: LGTM! Simplified representation of VARCHAR(50).

The nativeDataType has been updated from "VARCHAR(length=50)" to "VARCHAR(50)", enhancing clarity and consistency.


350-350: LGTM! Standardized representation of INTEGER.

The nativeDataType has been updated from "INTEGER()" to "INTEGER", aligning it with a more conventional format.

metadata-ingestion/src/datahub/ingestion/source/sql/trino.py (1)

390-399: LGTM! Method signature update and internal call update are necessary.

The addition of the inspector parameter enhances the method's capability to handle column schemas more robustly. The internal call to the superclass method ensures that all necessary arguments are passed correctly.

metadata-ingestion/src/datahub/ingestion/source/sql/athena.py (1)

502-502: Ensure that all function calls to get_schema_fields_for_column match the new signature.

The code changes are approved.

However, ensure that all function calls to get_schema_fields_for_column match the new signature.

Verification successful

Ensure that all function calls to get_schema_fields_for_column match the new signature.

The code changes are approved. The verification confirms that the new function signature, including the inspector parameter, is consistently used across the codebase.

  • metadata-ingestion/src/datahub/ingestion/source/sql/athena.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/hive.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/trino.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/hive_metastore.py
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_column` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'get_schema_fields_for_column'

Length of output: 83


Script:

#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_column` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --glob '*.py' -A 5 $'get_schema_fields_for_column'

Length of output: 5961

metadata-ingestion/tests/integration/postgres/postgres_all_db_mces_with_db_golden.json (9)

455-455: LGTM! The change aligns with standard SQL syntax.

The nativeDataType has been correctly modified from VARCHAR(length=500) to VARCHAR(500).


467-467: LGTM! The change aligns with standard SQL syntax.

The nativeDataType has been correctly modified from VARCHAR(length=200) to VARCHAR(200).


479-479: LGTM! The change aligns with standard SQL syntax.

The nativeDataType has been correctly modified from BIGINT() to BIGINT.


491-491: LGTM! The change aligns with standard SQL syntax.

The nativeDataType has been correctly modified from TEXT() to TEXT.


503-503: LGTM! The change aligns with standard SQL syntax.

The nativeDataType has been correctly modified from TEXT() to TEXT.


515-515: LGTM! The change aligns with standard SQL syntax.

The nativeDataType has been correctly modified from TIMESTAMP() to TIMESTAMP WITHOUT TIME ZONE.


527-527: LGTM! The change aligns with standard SQL syntax.

The nativeDataType has been correctly modified from VARCHAR(length=255) to VARCHAR(255).


539-539: LGTM! The change aligns with standard SQL syntax.

The nativeDataType has been correctly modified from VARCHAR(length=255) to VARCHAR(255).


551-551: LGTM! The change aligns with standard SQL syntax.

The nativeDataType has been correctly modified from JSON(astext_type=Text()) to JSON.

metadata-ingestion/src/datahub/ingestion/source/sql/vertica.py (2)

472-477: Ensure that all method calls to _process_projections match the new signature.

The code changes are approved.

However, ensure that all method calls to _process_projections match the new signature.

Verification successful

All method calls to _process_projections match the new signature.

The code changes are approved.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all method calls to `_process_projections` match the new signature.

# Test: Search for the method usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'_process_projections'

Length of output: 75


Script:

#!/bin/bash
# Description: Verify all method calls to `_process_projections` match the new signature.

# Test: Search for the method usage. Expect: Only occurrences of the new signature.
rg --type py -A 5 $'_process_projections'

Length of output: 1279


681-681: Ensure that all method calls to _process_models match the new signature.

The code changes are approved.

However, ensure that all method calls to _process_models match the new signature.

Verification successful

All method calls to _process_models match the new signature.

The code changes are correctly implemented and verified.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all method calls to `_process_models` match the new signature.

# Test: Search for the method usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'_process_models'

Length of output: 70


Script:

#!/bin/bash
# Description: Verify all method calls to `_process_models` match the new signature.

# Test: Search for the method usage. Expect: Only occurrences of the new signature.
rg --type py -A 5 $'_process_models'

Length of output: 1164

metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_out_database.json (2)

261-261: Verify the impact of changing nativeDataType from "NUMBER(asdecimal=False)" to "NUMBER".

Ensure that this change aligns with the expected data type representation across the system and does not introduce any inconsistencies or errors.

Also applies to: 383-383, 508-508, 757-757, 879-879, 1004-1004


Line range hint 1-1008:
Verify the impact of removing multiple JSON objects.

Ensure that the removal of these entities does not affect the data ingestion process and that they are no longer relevant or have been replaced by a different mechanism.

metadata-ingestion/src/datahub/ingestion/source/sql/hive_metastore.py (4)

524-524: LGTM! But verify the usage of the inspector parameter in the function body.

The addition of the inspector parameter to the function signature is approved.

Ensure that the inspector parameter is utilized correctly within the function body.


757-759: LGTM! But verify the usage of the inspector parameter in the function body.

The addition of the inspector parameter to the function signature is approved.

Ensure that the inspector parameter is utilized correctly within the function body.


882-882: LGTM! But verify the usage of the inspector parameter in the function body.

The addition of the inspector parameter to the function signature is approved.

Ensure that the inspector parameter is utilized correctly within the function body.

metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_database.json (5)

261-261: Verify the impact of simplifying nativeDataType.

The change from "NUMBER(asdecimal=False)" to "NUMBER" improves readability and consistency. Ensure that this change does not affect downstream processing or interpretation of numeric data types.


383-383: Verify the impact of simplifying nativeDataType.

The change from "NUMBER(asdecimal=False)" to "NUMBER" improves readability and consistency. Ensure that this change does not affect downstream processing or interpretation of numeric data types.


508-508: Verify the impact of simplifying nativeDataType.

The change from "NUMBER(asdecimal=False)" to "NUMBER" improves readability and consistency. Ensure that this change does not affect downstream processing or interpretation of numeric data types.


757-757: Verify the impact of simplifying nativeDataType.

The change from "NUMBER(asdecimal=False)" to "NUMBER" improves readability and consistency. Ensure that this change does not affect downstream processing or interpretation of numeric data types.


879-879: Verify the impact of simplifying nativeDataType.

The change from "NUMBER(asdecimal=False)" to "NUMBER" improves readability and consistency. Ensure that this change does not affect downstream processing or interpretation of numeric data types.

metadata-ingestion/tests/integration/trino/trino_mces_golden.json (6)

259-259: Approved: Simplified data type representation.

The change from INTEGER() to INTEGER aligns with standard SQL data type definitions and simplifies the representation.


271-271: Approved: Simplified data type representation.

The change from VARCHAR(length=50) to VARCHAR(50) aligns with standard SQL data type definitions and simplifies the representation.


283-283: Approved: Simplified data type representation.

The change from VARCHAR(length=50) to VARCHAR(50) aligns with standard SQL data type definitions and simplifies the representation.


295-295: Approved: Simplified data type representation.

The change from VARCHAR(length=50) to VARCHAR(50) aligns with standard SQL data type definitions and simplifies the representation.


307-307: Approved: Simplified data type representation.

The change from JSON() to JSON aligns with standard SQL data type definitions and simplifies the representation.


531-531: Approved: Simplified data type representation.

The change from DATE() to DATE aligns with standard SQL data type definitions and simplifies the representation.

metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py (6)

123-125: Approved: Necessary import for native data type handling.

The import statement for get_native_data_type_for_sqlalchemy_type is necessary for the changes made to handle native data types using SQLAlchemy's inspector.


794-794: Approved: Enhanced schema field retrieval.

The addition of the inspector parameter to the get_schema_fields method call allows for improved schema field retrieval and type handling using SQLAlchemy's inspector.


975-975: Approved: Enhanced schema field retrieval.

The addition of the inspector parameter to the get_schema_fields method signature enhances the method's ability to retrieve and handle schema fields using SQLAlchemy's inspector.


988-988: Approved: Enhanced schema field retrieval.

The addition of the inspector parameter to the get_schema_fields_for_column method call allows for improved schema field retrieval and type handling using SQLAlchemy's inspector.


1000-1000: Approved: Enhanced schema field retrieval.

The addition of the inspector parameter to the get_schema_fields_for_column method signature enhances the method's ability to retrieve and handle schema fields using SQLAlchemy's inspector.


1014-1019: Approved: Improved native data type handling.

The updated logic for determining the nativeDataType of a column by using the get_native_data_type_for_sqlalchemy_type function ensures that the native data type is derived correctly based on the SQLAlchemy type system, enhancing type safety and correctness.

metadata-ingestion/tests/integration/hana/hana_mces_golden.json (25)

8-15: Good transition to structured JSON objects.

The change from a string-based representation to a structured JSON object for customProperties enhances readability and usability.


29-30: Good transition to structured JSON objects.

The change from a string-based representation to a structured JSON object for status enhances readability and usability.


45-47: Good transition to structured JSON objects.

The change from a string-based representation to a structured JSON object for dataPlatformInstance enhances readability and usability.


61-64: Good transition to structured JSON objects.

The change from a string-based representation to a structured JSON object for subTypes enhances readability and usability.


79-80: Good transition to structured JSON objects.

The change from a string-based representation to a structured JSON object for browsePathsV2 enhances readability and usability.


95-102: Good transition to structured JSON objects.

The change from a string-based representation to a structured JSON object for customProperties enhances readability and usability.


117-118: Good transition to structured JSON objects.

The change from a string-based representation to a structured JSON object for status enhances readability and usability.


133-135: Good transition to structured JSON objects.

The change from a string-based representation to a structured JSON object for dataPlatformInstance enhances readability and usability.


149-151: Good transition to structured JSON objects.

The change from a string-based representation to a structured JSON object for subTypes enhances readability and usability.


167-168: Good transition to structured JSON objects.

The change from a string-based representation to a structured JSON object for container enhances readability and usability.


183-189: Good transition to structured JSON objects.

The change from a string-based representation to a structured JSON object for browsePathsV2 enhances readability and usability.


204-205: Good transition to structured JSON objects.

The change from a string-based representation to a structured JSON object for container enhances readability and usability.


Line range hint 238-259:
Good transition to structured JSON objects and enhanced profiling capabilities.

The change from a string-based representation to a structured JSON object for SchemaMetadata enhances readability and usability. The detailed statistical data for each field improves the profiling capabilities of the datasets.


341-343: Good transition to structured JSON objects.

The change from a string-based representation to a structured JSON object for subTypes enhances readability and usability.


359-361: Good transition to structured JSON objects.

The change from a string-based representation to a structured JSON object for domains enhances readability and usability.


377-387: Good transition to structured JSON objects.

The change from a string-based representation to a structured JSON object for browsePathsV2 enhances readability and usability.


402-403: Good transition to structured JSON objects.

The change from a string-based representation to a structured JSON object for container enhances readability and usability.


Line range hint 436-457:
Good transition to structured JSON objects and enhanced profiling capabilities.

The change from a string-based representation to a structured JSON object for SchemaMetadata enhances readability and usability. The detailed statistical data for each field improves the profiling capabilities of the datasets.


539-541: Good transition to structured JSON objects.

The change from a string-based representation to a structured JSON object for subTypes enhances readability and usability.


557-559: Good transition to structured JSON objects.

The change from a string-based representation to a structured JSON object for domains enhances readability and usability.


575-585: Good transition to structured JSON objects.

The change from a string-based representation to a structured JSON object for browsePathsV2 enhances readability and usability.


600-601: Good transition to structured JSON objects.

The change from a string-based representation to a structured JSON object for container enhances readability and usability.


Line range hint 634-655:
Good transition to structured JSON objects and enhanced profiling capabilities.

The change from a string-based representation to a structured JSON object for SchemaMetadata enhances readability and usability. The detailed statistical data for each field improves the profiling capabilities of the datasets.


725-727: Good transition to structured JSON objects.

The change from a string-based representation to a structured JSON object for subTypes enhances readability and usability.


743-745: Good transition to structured JSON objects.

metadata-ingestion/tests/integration/mysql/mysql_mces_no_db_golden.json (12)

150-150: Simplified data type representation.

The change from INTEGER() to INTEGER removes unnecessary parentheses, simplifying the data type representation.


162-162: Simplified data type representation.

The change from DATE() to DATE removes unnecessary parentheses, simplifying the data type representation.


174-174: More concise VARCHAR length definition.

The change from VARCHAR(length=14) to VARCHAR(14) uses a more concise form for defining the length of the VARCHAR type.


186-186: More concise VARCHAR length definition.

The change from VARCHAR(length=16) to VARCHAR(16) uses a more concise form for defining the length of the VARCHAR type.


198-198: Enhanced consistency in ENUM formatting.

The change from ENUM('M', 'F') to ENUM('M','F') removes spaces, enhancing consistency in formatting.


221-221: Simplified data type representation.

The change from DATE() to DATE removes unnecessary parentheses, simplifying the data type representation.


337-337: Simplified data type representation.

The change from INTEGER() to INTEGER removes unnecessary parentheses, simplifying the data type representation.


349-349: Simplified data type representation.

The change from INTEGER() to INTEGER removes unnecessary parentheses, simplifying the data type representation.


361-361: Simplified data type representation.

The change from DATE() to DATE removes unnecessary parentheses, simplifying the data type representation.


373-373: Simplified data type representation.

The change from DATE() to DATE removes unnecessary parentheses, simplifying the data type representation.


2459-2459: Enhanced consistency in SET formatting.

The change from SET('a', 'b', 'c', 'd') to SET('a','b','c','d') removes spaces, enhancing consistency in formatting.


2575-2575: More concise VARCHAR length definition.

The change from VARCHAR(length=50) to VARCHAR(50) uses a more concise form for defining the length of the VARCHAR type.

metadata-ingestion/tests/integration/trino/trino_hive_mces_golden.json (6)

234-234: Verify the updated transient_lastddltime value.

Ensure that the new timestamp 1722106707 is correct and consistent with the expected format and context.


270-270: Verify the updated nativeDataType value.

Ensure that the new data type INTEGER is correct and consistent with the expected format and context.


474-474: Verify the updated transient_lastddltime value.

Ensure that the new timestamp 1722106711 is correct and consistent with the expected format and context.


508-508: Verify the updated nativeDataType value.

Ensure that the new data type VARCHAR is correct and consistent with the expected format and context.


756-756: Verify the updated transient_lastddltime value.

Ensure that the new timestamp 1722106709 is correct and consistent with the expected format and context.


790-790: Verify the updated nativeDataType value.

Ensure that the new data type VARCHAR is correct and consistent with the expected format and context.

metadata-ingestion/tests/integration/trino/trino_hive_instance_mces_golden.json (10)

247-247: Verify the consistency of transient_lastddltime values.

Ensure that the updated timestamp value for transient_lastddltime is consistent with other entries and follows the correct format.


283-283: LGTM! Ensure consistency across all entries.

The change from INTEGER() to INTEGER standardizes the data type representation. Verify that similar changes are applied consistently across other entries.


508-508: Verify the consistency of transient_lastddltime values.

Ensure that the updated timestamp value for transient_lastddltime is consistent with other entries and follows the correct format.


542-542: LGTM! Ensure consistency across all entries.

The change from VARCHAR() to VARCHAR standardizes the data type representation. Verify that similar changes are applied consistently across other entries.


767-767: Verify the consistency of transient_lastddltime values.

Ensure that the updated timestamp value for transient_lastddltime is consistent with other entries and follows the correct format.


801-801: LGTM! Ensure consistency across all entries.

The change from VARCHAR() to VARCHAR standardizes the data type representation. Verify that similar changes are applied consistently across other entries.


994-994: Verify the consistency of transient_lastddltime values.

Ensure that the updated timestamp value for transient_lastddltime is consistent with other entries and follows the correct format.


1028-1028: LGTM! Ensure consistency across all entries.

The change from VARCHAR() to VARCHAR standardizes the data type representation. Verify that similar changes are applied consistently across other entries.


1265-1265: Verify the consistency of transient_lastddltime values.

Ensure that the updated timestamp value for transient_lastddltime is consistent with other entries and follows the correct format.


1299-1299: LGTM! Ensure consistency across all entries.

The change from VARCHAR() to VARCHAR standardizes the data type representation. Verify that similar changes are applied consistently across other entries.

@@ -2,15 +2,11 @@
version: '3.4'
services:
  testhana:
-    image: "store/saplabs/hanaexpress:2.00.054.00.20210603.1"
+    image: "saplabs/hanaexpress:latest"
Contributor


Consider pinning the Docker image to a specific version.

Using the latest version of the Docker image can introduce variability and potential instability due to untested changes. Pinning to a specific version ensures stability and reproducibility.

-    image: "saplabs/hanaexpress:latest"
+    image: "saplabs/hanaexpress:2.00.054.00.20210603.1"

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 1942fbb and dd3f27b.

Files selected for processing (21)
  • metadata-ingestion/src/datahub/ingestion/source/sql/athena.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/hive.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/hive_metastore.py (3 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py (7 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/trino.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/vertica.py (2 hunks)
  • metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py (4 hunks)
  • metadata-ingestion/tests/integration/hana/docker-compose.yml (1 hunks)
  • metadata-ingestion/tests/integration/hana/hana_mces_golden.json (10 hunks)
  • metadata-ingestion/tests/integration/mysql/mysql_mces_no_db_golden.json (53 hunks)
  • metadata-ingestion/tests/integration/mysql/mysql_mces_with_db_golden.json (9 hunks)
  • metadata-ingestion/tests/integration/mysql/mysql_table_level_only.json (10 hunks)
  • metadata-ingestion/tests/integration/mysql/mysql_table_row_count_estimate_only.json (9 hunks)
  • metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_database.json (7 hunks)
  • metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_out_database.json (7 hunks)
  • metadata-ingestion/tests/integration/oracle/test_oracle.py (1 hunks)
  • metadata-ingestion/tests/integration/postgres/postgres_all_db_mces_with_db_golden.json (11 hunks)
  • metadata-ingestion/tests/integration/postgres/postgres_mces_with_db_golden.json (11 hunks)
  • metadata-ingestion/tests/integration/trino/trino_hive_instance_mces_golden.json (26 hunks)
  • metadata-ingestion/tests/integration/trino/trino_hive_mces_golden.json (26 hunks)
  • metadata-ingestion/tests/integration/trino/trino_mces_golden.json (18 hunks)
Files skipped from review due to trivial changes (6)
  • metadata-ingestion/tests/integration/mysql/mysql_mces_no_db_golden.json
  • metadata-ingestion/tests/integration/mysql/mysql_mces_with_db_golden.json
  • metadata-ingestion/tests/integration/mysql/mysql_table_level_only.json
  • metadata-ingestion/tests/integration/postgres/postgres_all_db_mces_with_db_golden.json
  • metadata-ingestion/tests/integration/postgres/postgres_mces_with_db_golden.json
  • metadata-ingestion/tests/integration/trino/trino_hive_mces_golden.json
Files skipped from review as they are similar to previous changes (14)
  • metadata-ingestion/src/datahub/ingestion/source/sql/athena.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/hive.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/hive_metastore.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/trino.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/vertica.py
  • metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py
  • metadata-ingestion/tests/integration/hana/docker-compose.yml
  • metadata-ingestion/tests/integration/hana/hana_mces_golden.json
  • metadata-ingestion/tests/integration/mysql/mysql_table_row_count_estimate_only.json
  • metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_database.json
  • metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_out_database.json
  • metadata-ingestion/tests/integration/oracle/test_oracle.py
  • metadata-ingestion/tests/integration/trino/trino_hive_instance_mces_golden.json
Additional comments not posted (11)
metadata-ingestion/tests/integration/trino/trino_mces_golden.json (11)

259-259: LGTM! Simplified nativeDataType representation.

The change from INTEGER() to INTEGER aligns with standard SQL data type representations and improves clarity.


271-271: LGTM! Simplified nativeDataType representation.

The change from VARCHAR(length=50) to VARCHAR(50) aligns with standard SQL data type representations and improves clarity.


283-283: LGTM! Simplified nativeDataType representation.

The change from VARCHAR(length=50) to VARCHAR(50) aligns with standard SQL data type representations and improves clarity.


295-295: LGTM! Simplified nativeDataType representation.

The change from VARCHAR(length=50) to VARCHAR(50) aligns with standard SQL data type representations and improves clarity.


307-307: LGTM! Simplified nativeDataType representation.

The change from JSON() to JSON aligns with standard SQL data type representations and improves clarity.


507-507: LGTM! Simplified nativeDataType representation.

The change from INTEGER() to INTEGER aligns with standard SQL data type representations and improves clarity.


519-519: LGTM! Simplified nativeDataType representation.

The change from INTEGER() to INTEGER aligns with standard SQL data type representations and improves clarity.


531-531: LGTM! Simplified nativeDataType representation.

The change from DATE() to DATE aligns with standard SQL data type representations and improves clarity.


543-543: LGTM! Simplified nativeDataType representation.

The change from DATE() to DATE aligns with standard SQL data type representations and improves clarity.


726-726: LGTM! Simplified nativeDataType representation.

The change from INTEGER() to INTEGER aligns with standard SQL data type representations and improves clarity.


738-738: LGTM! Simplified nativeDataType representation.

The change from VARCHAR(length=50) to VARCHAR(50) aligns with standard SQL data type representations and improves clarity.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between dd3f27b and 1a07c6d.

Files selected for processing (22)
  • metadata-ingestion/src/datahub/ingestion/source/sql/athena.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/hive.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/hive_metastore.py (3 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py (7 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/trino.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/vertica.py (2 hunks)
  • metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py (4 hunks)
  • metadata-ingestion/tests/integration/hana/docker-compose.yml (1 hunks)
  • metadata-ingestion/tests/integration/hana/hana_mces_golden.json (10 hunks)
  • metadata-ingestion/tests/integration/mysql/mysql_mces_no_db_golden.json (53 hunks)
  • metadata-ingestion/tests/integration/mysql/mysql_mces_with_db_golden.json (9 hunks)
  • metadata-ingestion/tests/integration/mysql/mysql_table_level_only.json (10 hunks)
  • metadata-ingestion/tests/integration/mysql/mysql_table_row_count_estimate_only.json (9 hunks)
  • metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_database.json (7 hunks)
  • metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_out_database.json (7 hunks)
  • metadata-ingestion/tests/integration/oracle/test_oracle.py (1 hunks)
  • metadata-ingestion/tests/integration/postgres/postgres_all_db_mces_with_db_golden.json (11 hunks)
  • metadata-ingestion/tests/integration/postgres/postgres_mces_with_db_golden.json (11 hunks)
  • metadata-ingestion/tests/integration/trino/trino_hive_instance_mces_golden.json (26 hunks)
  • metadata-ingestion/tests/integration/trino/trino_hive_mces_golden.json (26 hunks)
  • metadata-ingestion/tests/integration/trino/trino_mces_golden.json (18 hunks)
  • metadata-ingestion/tests/unit/utilities/test_sqlalchemy_type_converter.py (7 hunks)
Files skipped from review due to trivial changes (3)
  • metadata-ingestion/tests/integration/mysql/mysql_mces_with_db_golden.json
  • metadata-ingestion/tests/integration/postgres/postgres_all_db_mces_with_db_golden.json
  • metadata-ingestion/tests/integration/postgres/postgres_mces_with_db_golden.json
Files skipped from review as they are similar to previous changes (17)
  • metadata-ingestion/src/datahub/ingestion/source/sql/athena.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/hive.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/hive_metastore.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/trino.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/vertica.py
  • metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py
  • metadata-ingestion/tests/integration/hana/docker-compose.yml
  • metadata-ingestion/tests/integration/hana/hana_mces_golden.json
  • metadata-ingestion/tests/integration/mysql/mysql_mces_no_db_golden.json
  • metadata-ingestion/tests/integration/mysql/mysql_table_level_only.json
  • metadata-ingestion/tests/integration/mysql/mysql_table_row_count_estimate_only.json
  • metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_out_database.json
  • metadata-ingestion/tests/integration/oracle/test_oracle.py
  • metadata-ingestion/tests/integration/trino/trino_hive_instance_mces_golden.json
  • metadata-ingestion/tests/integration/trino/trino_hive_mces_golden.json
  • metadata-ingestion/tests/integration/trino/trino_mces_golden.json
Additional comments not posted (10)
metadata-ingestion/tests/unit/utilities/test_sqlalchemy_type_converter.py (5)

48-50: LGTM! But verify the function usage in the codebase.

The addition of the inspector parameter and its instantiation as a MagicMock with a DefaultDialect is correct.
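
For reference, a minimal sketch (not copied from the test file) of the mock setup described here; only the dialect attribute of the mocked inspector is populated:

from unittest.mock import MagicMock

from sqlalchemy.engine.default import DefaultDialect

# The tests instantiate the inspector as a MagicMock whose dialect is a DefaultDialect,
# so type compilation has a concrete dialect to use.
inspector = MagicMock()
inspector.dialect = DefaultDialect()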

However, ensure that all function calls to get_schema_fields_for_sqlalchemy_column match the new signature.

Verification successful

All function calls to get_schema_fields_for_sqlalchemy_column match the new signature.

The addition of the inspector parameter and its instantiation as a MagicMock with a DefaultDialect is correctly reflected in the function calls throughout the codebase.

  • metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/athena.py
  • metadata-ingestion/tests/unit/utilities/test_sqlalchemy_type_converter.py
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'get_schema_fields_for_sqlalchemy_column'

Length of output: 94


Script:

#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type py -A 5 $'get_schema_fields_for_sqlalchemy_column'

Length of output: 6596


86-87: LGTM! But verify the function usage in the codebase.

The addition of the inspector parameter and its instantiation as a MagicMock with a DefaultDialect is correct.

However, ensure that all function calls to get_schema_fields_for_sqlalchemy_column match the new signature.

Verification successful

Verified: All calls to get_schema_fields_for_sqlalchemy_column match the new signature.

The addition of the inspector parameter and its instantiation as a MagicMock with a DefaultDialect is correct and consistent throughout the codebase.

  • metadata-ingestion/tests/unit/utilities/test_sqlalchemy_type_converter.py
  • metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/athena.py
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'get_schema_fields_for_sqlalchemy_column'

Length of output: 94


Script:

#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type py -A 5 $'get_schema_fields_for_sqlalchemy_column'

Length of output: 6596


66-68: LGTM! But verify the function usage in the codebase.

The addition of the inspector parameter and its instantiation as a MagicMock with a DefaultDialect is correct.

However, ensure that all function calls to get_schema_fields_for_sqlalchemy_column match the new signature.

Verification successful

Function usage verified!

All calls to get_schema_fields_for_sqlalchemy_column match the new signature with the inspector parameter.

  • metadata-ingestion/tests/unit/utilities/test_sqlalchemy_type_converter.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/athena.py
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'get_schema_fields_for_sqlalchemy_column'

Length of output: 94


Script:

#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type py -A 5 'get_schema_fields_for_sqlalchemy_column'

Length of output: 6596


111-113: LGTM! But verify the function usage in the codebase.

The addition of the inspector parameter and its instantiation as a MagicMock with a DefaultDialect is correct.

However, ensure that all function calls to get_schema_fields_for_sqlalchemy_column match the new signature.

Verification successful

Verified: All function calls to get_schema_fields_for_sqlalchemy_column match the new signature.

The addition of the inspector parameter is correctly reflected across the codebase.

  • metadata-ingestion/tests/unit/utilities/test_sqlalchemy_type_converter.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/athena.py
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'get_schema_fields_for_sqlalchemy_column'

Length of output: 94


Script:

#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg -t py -A 5 'get_schema_fields_for_sqlalchemy_column'

Length of output: 6592


22-24: LGTM! But verify the function usage in the codebase.

The addition of the inspector parameter and its instantiation as a MagicMock with a DefaultDialect is correct.

However, ensure that all function calls to get_schema_fields_for_sqlalchemy_column match the new signature.

Verification successful

Function usage verified!

The function get_schema_fields_for_sqlalchemy_column is correctly called with the inspector parameter in the codebase.

  • Confirmed occurrences in test cases and other parts of the codebase.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'get_schema_fields_for_sqlalchemy_column'

Length of output: 94


Script:

#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.

# Find all Python files and search for the function usage. Expect: Only occurrences of the new signature.
fd -e py -x rg 'get_schema_fields_for_sqlalchemy_column' {}

Length of output: 620


Script:

#!/bin/bash
# Description: Verify all function calls to `get_schema_fields_for_sqlalchemy_column` match the new signature.

# Find all Python files and search for the function usage with more context lines. Expect: Only occurrences of the new signature.
fd -e py -x rg -A 10 -B 2 'get_schema_fields_for_sqlalchemy_column' {}

Length of output: 4341

metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_database.json (5)

261-261: Simplification of nativeDataType is correct.

The nativeDataType field is correctly simplified from "NUMBER(asdecimal=False)" to "NUMBER".


383-383: Simplification of nativeDataType is correct.

The nativeDataType field is correctly simplified from "NUMBER(asdecimal=False)" to "NUMBER".


508-508: Simplification of nativeDataType is correct.

The nativeDataType field is correctly simplified from "NUMBER(asdecimal=False)" to "NUMBER".


757-757: Simplification of nativeDataType is correct.

The nativeDataType field is correctly simplified from "NUMBER(asdecimal=False)" to "NUMBER".


879-879: Simplification of nativeDataType is correct.

The nativeDataType field is correctly simplified from "NUMBER(asdecimal=False)" to "NUMBER".
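
The Oracle changes above all stem from the same mechanism introduced by this PR: the native data type is now produced by compiling the reflected SQLAlchemy type against the dialect instead of taking its repr(). A small hedged sketch (not the PR's code) of the difference:

from sqlalchemy.dialects.oracle import NUMBER
from sqlalchemy.dialects.oracle.base import OracleDialect

column_type = NUMBER(asdecimal=False)

# repr() leaks Python-side constructor arguments (the old nativeDataType),
# while compiling against the dialect yields the database-native type string.
print(repr(column_type))                             # NUMBER(asdecimal=False)
print(column_type.compile(dialect=OracleDialect()))  # NUMBER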

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 1a07c6d and 0e254e3.

Files selected for processing (26)
  • metadata-ingestion/src/datahub/ingestion/source/sql/athena.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/hive.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/hive_metastore.py (3 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py (7 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/trino.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sql/vertica.py (2 hunks)
  • metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py (4 hunks)
  • metadata-ingestion/tests/integration/hana/docker-compose.yml (1 hunks)
  • metadata-ingestion/tests/integration/hana/hana_mces_golden.json (10 hunks)
  • metadata-ingestion/tests/integration/mysql/mysql_mces_no_db_golden.json (49 hunks)
  • metadata-ingestion/tests/integration/mysql/mysql_mces_with_db_golden.json (9 hunks)
  • metadata-ingestion/tests/integration/mysql/mysql_table_level_only.json (10 hunks)
  • metadata-ingestion/tests/integration/mysql/mysql_table_row_count_estimate_only.json (9 hunks)
  • metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_database.json (7 hunks)
  • metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_out_database.json (7 hunks)
  • metadata-ingestion/tests/integration/oracle/test_oracle.py (1 hunks)
  • metadata-ingestion/tests/integration/postgres/postgres_all_db_mces_with_db_golden.json (11 hunks)
  • metadata-ingestion/tests/integration/postgres/postgres_mces_with_db_golden.json (11 hunks)
  • metadata-ingestion/tests/integration/sql_server/golden_files/golden_mces_mssql_no_db_to_file.json (24 hunks)
  • metadata-ingestion/tests/integration/sql_server/golden_files/golden_mces_mssql_no_db_with_filter.json (14 hunks)
  • metadata-ingestion/tests/integration/sql_server/golden_files/golden_mces_mssql_to_file.json (14 hunks)
  • metadata-ingestion/tests/integration/sql_server/golden_files/golden_mces_mssql_with_lower_case_urn.json (14 hunks)
  • metadata-ingestion/tests/integration/trino/trino_hive_instance_mces_golden.json (26 hunks)
  • metadata-ingestion/tests/integration/trino/trino_hive_mces_golden.json (26 hunks)
  • metadata-ingestion/tests/integration/trino/trino_mces_golden.json (18 hunks)
  • metadata-ingestion/tests/unit/utilities/test_sqlalchemy_type_converter.py (7 hunks)
Files skipped from review due to trivial changes (3)
  • metadata-ingestion/tests/integration/mysql/mysql_mces_no_db_golden.json
  • metadata-ingestion/tests/integration/postgres/postgres_all_db_mces_with_db_golden.json
  • metadata-ingestion/tests/integration/postgres/postgres_mces_with_db_golden.json
Files skipped from review as they are similar to previous changes (18)
  • metadata-ingestion/src/datahub/ingestion/source/sql/athena.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/hive.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/hive_metastore.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/trino.py
  • metadata-ingestion/src/datahub/ingestion/source/sql/vertica.py
  • metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py
  • metadata-ingestion/tests/integration/hana/docker-compose.yml
  • metadata-ingestion/tests/integration/hana/hana_mces_golden.json
  • metadata-ingestion/tests/integration/mysql/mysql_mces_with_db_golden.json
  • metadata-ingestion/tests/integration/mysql/mysql_table_level_only.json
  • metadata-ingestion/tests/integration/mysql/mysql_table_row_count_estimate_only.json
  • metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_database.json
  • metadata-ingestion/tests/integration/oracle/golden_test_ingest_with_out_database.json
  • metadata-ingestion/tests/integration/oracle/test_oracle.py
  • metadata-ingestion/tests/integration/trino/trino_hive_instance_mces_golden.json
  • metadata-ingestion/tests/integration/trino/trino_hive_mces_golden.json
  • metadata-ingestion/tests/unit/utilities/test_sqlalchemy_type_converter.py
Additional comments not posted (44)
metadata-ingestion/tests/integration/trino/trino_mces_golden.json (11)

259-259: Simplified data type representation.

The nativeDataType for the id field has been correctly updated to "INTEGER".


271-271: Simplified data type representation.

The nativeDataType for the name field has been correctly updated to "VARCHAR(50)".


283-283: Simplified data type representation.

The nativeDataType for the author field has been correctly updated to "VARCHAR(50)".


295-295: Simplified data type representation.

The nativeDataType for the publisher field has been correctly updated to "VARCHAR(50)".


307-307: Simplified data type representation.

The nativeDataType for the tags field has been correctly updated to "JSON".


507-507: Simplified data type representation.

The nativeDataType for the book_id field has been correctly updated to "INTEGER".


519-519: Simplified data type representation.

The nativeDataType for the member_id field has been correctly updated to "INTEGER".


531-531: Simplified data type representation.

The nativeDataType for the issue_date field has been correctly updated to "DATE".


543-543: Simplified data type representation.

The nativeDataType for the return_date field has been correctly updated to "DATE".


726-726: Simplified data type representation.

The nativeDataType for the id field has been correctly updated to "INTEGER".


738-738: Simplified data type representation.

The nativeDataType for the name field has been correctly updated to "VARCHAR(50)".
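
The Trino golden-file updates follow the same pattern: compiling the reflected type through a dialect drops the repr-style constructor arguments. A rough sketch, using SQLAlchemy's built-in default dialect as a stand-in for the Trino dialect:

from sqlalchemy import VARCHAR
from sqlalchemy.engine.default import DefaultDialect

column_type = VARCHAR(50)

print(repr(column_type))                              # VARCHAR(length=50)  (old repr-style value)
print(column_type.compile(dialect=DefaultDialect()))  # VARCHAR(50)         (dialect-compiled value)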

metadata-ingestion/tests/integration/sql_server/golden_files/golden_mces_mssql_no_db_with_filter.json (9)

115-119: Verify the correctness of job metadata fields.

Ensure that the new values for job_id, job_name, description, date_created, and date_modified are accurate and consistent with the rest of the metadata.


1307-1307: Approved: Data type representation change.

The nativeDataType for the ID field has been updated to INTEGER, matching the type string produced by compiling the column type with the SQL Server dialect.


1319-1319: Approved: Data type representation change.

The nativeDataType for the ProductName field has been updated to NVARCHAR(max) COLLATE SQL_Latin1_General_CP1_CI_AS, matching the dialect-compiled SQL Server type, collation included.


1549-1549: Approved: Data type representation change.

The nativeDataType for the ID field has been updated to INTEGER, matching the dialect-compiled SQL Server type.


1561-1561: Approved: Data type representation change.

The nativeDataType for the ItemName field has been updated to NVARCHAR(max) COLLATE SQL_Latin1_General_CP1_CI_AS, matching the dialect-compiled SQL Server type, collation included.


1681-1681: Approved: Data type representation change.

The nativeDataType for the ID field has been updated to INTEGER, matching the dialect-compiled SQL Server type.


1694-1694: Approved: Data type representation change.

The nativeDataType for the LastName field has been updated to VARCHAR(255) COLLATE SQL_Latin1_General_CP1_CI_AS, matching the dialect-compiled SQL Server type, collation included.


1706-1706: Approved: Data type representation change.

The nativeDataType for the FirstName field has been updated to VARCHAR(255) COLLATE SQL_Latin1_General_CP1_CI_AS, matching the dialect-compiled SQL Server type, collation included.


1718-1718: Approved: Data type representation change.

The nativeDataType for the Age field has been updated to INTEGER, matching the dialect-compiled SQL Server type.

metadata-ingestion/tests/integration/sql_server/golden_files/golden_mces_mssql_to_file.json (8)

115-115: Update confirmed: job_id field.

The job_id field has been updated to "c6fb6778-14f1-4516-bb41-e5eaa97a687b", reflecting a new job execution context.


118-119: Update confirmed: date_created and date_modified fields.

The date_created field has been updated to "2024-07-27 23:58:29.780000" and the date_modified field has been updated to "2024-07-27 23:58:29.943000", reflecting new timestamps for job execution.


1307-1307: Update confirmed: nativeDataType for ID field.

The nativeDataType for the ID field has been updated to "INTEGER", aligning with SQL Server's expected data type specifications.


1319-1319: Update confirmed: nativeDataType for ProductName field.

The nativeDataType for the ProductName field has been updated to "NVARCHAR(max) COLLATE SQL_Latin1_General_CP1_CI_AS", aligning with SQL Server's expected data type specifications.


1549-1549: Update confirmed: nativeDataType for ID field in Items table.

The nativeDataType for the ID field in the Items table has been updated to "INTEGER", aligning with SQL Server's expected data type specifications.


1561-1561: Update confirmed: nativeDataType for ItemName field.

The nativeDataType for the ItemName field has been updated to "NVARCHAR(max) COLLATE SQL_Latin1_General_CP1_CI_AS", aligning with SQL Server's expected data type specifications.


1681-1681: Update confirmed: nativeDataType for ID field in Persons table.

The nativeDataType for the ID field in the Persons table has been updated to "INTEGER", aligning with SQL Server's expected data type specifications.


1694-1694: Update confirmed: nativeDataType for LastName and FirstName fields.

The nativeDataType for the LastName and FirstName fields has been updated to "VARCHAR(255) COLLATE SQL_Latin1_General_CP1_CI_AS", aligning with SQL Server's expected data type specifications.

Also applies to: 1706-1706

metadata-ingestion/tests/integration/sql_server/golden_files/golden_mces_mssql_with_lower_case_urn.json (8)

115-115: Change approved: Updated job_id.

The job_id has been updated to a new UUID. This change is straightforward and does not introduce any issues.


118-118: Change approved: Updated date_created.

The date_created timestamp has been updated to a more recent date. This change is straightforward and does not introduce any issues.


119-119: Change approved: Updated date_modified.

The date_modified timestamp has been updated to a more recent date. This change is straightforward and does not introduce any issues.


1307-1307: Change approved: Updated nativeDataType for ID field.

The nativeDataType for the ID field has been updated from INTEGER() to INTEGER. This change improves clarity and compliance with SQL Server data type conventions.


1319-1319: Change approved: Updated nativeDataType for ProductName field.

The nativeDataType for the ProductName field has been updated from NVARCHAR() to NVARCHAR(max) COLLATE SQL_Latin1_General_CP1_CI_AS. This change improves clarity and compliance with SQL Server data type conventions.


1561-1561: Change approved: Updated nativeDataType for ItemName field.

The nativeDataType for the ItemName field has been updated from NVARCHAR() to NVARCHAR(max) COLLATE SQL_Latin1_General_CP1_CI_AS. This change improves clarity and compliance with SQL Server data type conventions.


1694-1694: Change approved: Updated nativeDataType for LastName and FirstName fields.

The nativeDataType for the LastName and FirstName fields has been updated from VARCHAR(length=255, collation='SQL_Latin1_General_CP1_CI_AS') to VARCHAR(255) COLLATE SQL_Latin1_General_CP1_CI_AS. This change improves clarity and compliance with SQL Server data type conventions.

Also applies to: 1706-1706
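
The collation-qualified strings above are what the SQL Server dialect itself emits when the reflected type is compiled. A hedged sketch of that behavior; the exact constructor arguments produced by SQLAlchemy's MSSQL reflection may differ:

from sqlalchemy.dialects.mssql import VARCHAR
from sqlalchemy.dialects.mssql.base import MSDialect

column_type = VARCHAR(255, collation="SQL_Latin1_General_CP1_CI_AS")

print(repr(column_type))
# VARCHAR(length=255, collation='SQL_Latin1_General_CP1_CI_AS')  (old repr-style value)
print(column_type.compile(dialect=MSDialect()))
# VARCHAR(255) COLLATE SQL_Latin1_General_CP1_CI_AS              (dialect-compiled value)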


1838-1838: Change approved: Updated nativeDataType for SomeId and TempID fields.

The nativeDataType for the SomeId field has been updated from UNIQUEIDENTIFIER() to UNIQUEIDENTIFIER, and for the TempID field from INTEGER() to INTEGER. This change improves clarity and compliance with SQL Server data type conventions.

Also applies to: 1850-1850

metadata-ingestion/tests/integration/sql_server/golden_files/golden_mces_mssql_no_db_to_file.json (8)

115-115: LGTM!

The job_id has been updated to a new UUID, indicating a new job instance or process.


118-118: LGTM!

The date_created has been updated to a more recent date, indicating a refresh of the metadata.


119-119: LGTM!

The date_modified has been updated to a more recent date, indicating a refresh of the metadata.


1307-1307: LGTM!

The nativeDataType for the ID field has been simplified from INTEGER() to INTEGER, aligning with standard SQL syntax.


1319-1319: LGTM!

The nativeDataType for the ProductName field has been updated to NVARCHAR(max) COLLATE SQL_Latin1_General_CP1_CI_AS, matching the type string rendered by the SQL Server dialect, collation included.


1549-1549: LGTM!

The nativeDataType for the ID field has been simplified from INTEGER() to INTEGER, aligning with standard SQL syntax.


1561-1561: LGTM!

The nativeDataType for the ItemName field has been updated to NVARCHAR(max) COLLATE SQL_Latin1_General_CP1_CI_AS, matching the type string rendered by the SQL Server dialect, collation included.


1681-1681: LGTM!

The nativeDataType for the ID field has been simplified from INTEGER() to INTEGER, aligning with standard SQL syntax.

@hsheth2 hsheth2 added the merge-pending-ci label (A PR that has passed review and should be merged once CI is green) Jul 29, 2024
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 0e254e3 and b002a3f.

Files selected for processing (1)
  • metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py (4 hunks)
Files skipped from review as they are similar to previous changes (1)
  • metadata-ingestion/src/datahub/utilities/sqlalchemy_type_converter.py

@hsheth2 hsheth2 changed the title fix(ingestion): use correct native data type in all SQLAlchemy sources by compiling data type using dialect fix(ingest): use correct native data type in all SQLAlchemy sources by compiling data type using dialect Aug 6, 2024
@hsheth2 hsheth2 merged commit 9619553 into datahub-project:master Aug 6, 2024
57 of 82 checks passed
@Masterchen09 Masterchen09 deleted the fix-sqlalchemy-data-types branch August 6, 2024 20:44
arosanda added a commit to infobip/datahub that referenced this pull request Sep 23, 2024
* feat(forms) Handle deleting forms references when hard deleting forms (datahub-project#10820)

* refactor(ui): Misc improvements to the setup ingestion flow (ingest uplift 1/2)  (datahub-project#10764)

Co-authored-by: John Joyce <john@Johns-MBP.lan>
Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>

* fix(ingestion/airflow-plugin): pipeline tasks discoverable in search (datahub-project#10819)

* feat(ingest/transformer): tags to terms transformer (datahub-project#10758)

Co-authored-by: Aseem Bansal <asmbansal2@gmail.com>

* fix(ingestion/unity-catalog): fixed issue with profiling with GE turned on (datahub-project#10752)

Co-authored-by: Aseem Bansal <asmbansal2@gmail.com>

* feat(forms) Add java SDK for form entity PATCH + CRUD examples (datahub-project#10822)

* feat(SDK) Add java SDK for structuredProperty entity PATCH + CRUD examples (datahub-project#10823)

* feat(SDK) Add StructuredPropertyPatchBuilder in python sdk and provide sample CRUD files (datahub-project#10824)

* feat(forms) Add CRUD endpoints to GraphQL for Form entities (datahub-project#10825)

* add flag for includeSoftDeleted in scroll entities API (datahub-project#10831)

* feat(deprecation) Return actor entity with deprecation aspect (datahub-project#10832)

* feat(structuredProperties) Add CRUD graphql APIs for structured property entities (datahub-project#10826)

* add scroll parameters to openapi v3 spec (datahub-project#10833)

* fix(ingest): correct profile_day_of_week implementation (datahub-project#10818)

* feat(ingest/glue): allow ingestion of empty databases from Glue (datahub-project#10666)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* feat(cli): add more details to get cli (datahub-project#10815)

* fix(ingestion/glue): ensure date formatting works on all platforms for aws glue (datahub-project#10836)

* fix(ingestion): fix datajob patcher (datahub-project#10827)

* fix(smoke-test): add suffix in temp file creation (datahub-project#10841)

* feat(ingest/glue): add helper method to permit user or group ownership (datahub-project#10784)

* feat(): Show data platform instances in policy modal if they are set on the policy (datahub-project#10645)

Co-authored-by: Hendrik Richert <hendrik.richert@swisscom.com>

* docs(patch): add patch documentation for how implementation works (datahub-project#10010)

Co-authored-by: John Joyce <john@acryl.io>

* fix(jar): add missing custom-plugin-jar task (datahub-project#10847)

* fix(): also check exceptions/stack trace when filtering log messages (datahub-project#10391)

Co-authored-by: John Joyce <john@acryl.io>

* docs(): Update posts.md (datahub-project#9893)

Co-authored-by: Hyejin Yoon <0327jane@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* chore(ingest): update acryl-datahub-classify version (datahub-project#10844)

* refactor(ingest): Refactor structured logging to support infos, warnings, and failures structured reporting to UI (datahub-project#10828)

Co-authored-by: John Joyce <john@Johns-MBP.lan>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(restli): log aspect-not-found as a warning rather than as an error (datahub-project#10834)

* fix(ingest/nifi): remove duplicate upstream jobs (datahub-project#10849)

* fix(smoke-test): test access to create/revoke personal access tokens (datahub-project#10848)

* fix(smoke-test): missing test for move domain (datahub-project#10837)

* ci: update usernames to not considered for community (datahub-project#10851)

* env: change defaults for data contract visibility (datahub-project#10854)

* fix(ingest/tableau): quote special characters in external URL (datahub-project#10842)

* fix(smoke-test): fix flakiness of auto complete test

* ci(ingest): pin dask dependency for feast (datahub-project#10865)

* fix(ingestion/lookml): liquid template resolution and view-to-view cll (datahub-project#10542)

* feat(ingest/audit): add client id and version in system metadata props (datahub-project#10829)

* chore(ingest): Mypy 1.10.1 pin (datahub-project#10867)

* docs: use acryl-datahub-actions as expected python package to install (datahub-project#10852)

* docs: add new js snippet (datahub-project#10846)

* refactor(ingestion): remove company domain for security reason (datahub-project#10839)

* fix(ingestion/spark): Platform instance and column level lineage fix (datahub-project#10843)

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* feat(ingestion/tableau): optionally ingest multiple sites and create site containers (datahub-project#10498)

Co-authored-by: Yanik Häni <Yanik.Haeni1@swisscom.com>

* fix(ingestion/looker): Add sqlglot dependency and remove unused sqlparser (datahub-project#10874)

* fix(manage-tokens): fix manage access token policy (datahub-project#10853)

* Batch get entity endpoints (datahub-project#10880)

* feat(system): support conditional write semantics (datahub-project#10868)

* fix(build): upgrade vercel builds to Node 20.x (datahub-project#10890)

* feat(ingest/lookml): shallow clone repos (datahub-project#10888)

* fix(ingest/looker): add missing dependency (datahub-project#10876)

* fix(ingest): only populate audit stamps where accurate (datahub-project#10604)

* fix(ingest/dbt): always encode tag urns (datahub-project#10799)

* fix(ingest/redshift): handle multiline alter table commands (datahub-project#10727)

* fix(ingestion/looker): column name missing in explore (datahub-project#10892)

* fix(lineage) Fix lineage source/dest filtering with explored per hop limit (datahub-project#10879)

* feat(conditional-writes): misc updates and fixes (datahub-project#10901)

* feat(ci): update outdated action (datahub-project#10899)

* feat(rest-emitter): adding async flag to rest emitter (datahub-project#10902)

Co-authored-by: Gabe Lyons <gabe.lyons@acryl.io>

* feat(ingest): add snowflake-queries source (datahub-project#10835)

* fix(ingest): improve `auto_materialize_referenced_tags_terms` error handling (datahub-project#10906)

* docs: add new company to adoption list (datahub-project#10909)

* refactor(redshift): Improve redshift error handling with new structured reporting system (datahub-project#10870)

Co-authored-by: John Joyce <john@Johns-MBP.lan>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* feat(ui) Finalize support for all entity types on forms (datahub-project#10915)

* Index ExecutionRequestResults status field (datahub-project#10811)

* feat(ingest): grafana connector (datahub-project#10891)

Co-authored-by: Shirshanka Das <shirshanka@apache.org>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(gms) Add Form entity type to EntityTypeMapper (datahub-project#10916)

* feat(dataset): add support for external url in Dataset (datahub-project#10877)

* docs(saas-overview) added missing features to observe section (datahub-project#10913)

Co-authored-by: John Joyce <john@acryl.io>

* fix(ingest/spark): Fixing Micrometer warning (datahub-project#10882)

* fix(structured properties): allow application of structured properties without schema file (datahub-project#10918)

* fix(data-contracts-web) handle other schedule types (datahub-project#10919)

* fix(ingestion/tableau): human-readable message for PERMISSIONS_MODE_SWITCHED error (datahub-project#10866)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* Add feature flag for view defintions (datahub-project#10914)

Co-authored-by: Ethan Cartwright <ethan.cartwright@acryl.io>

* feat(ingest/BigQuery): refactor+parallelize dataset metadata extraction (datahub-project#10884)

* fix(airflow): add error handling around render_template() (datahub-project#10907)

* feat(ingestion/sqlglot): add optional `default_dialect` parameter to sqlglot lineage (datahub-project#10830)

* feat(mcp-mutator): new mcp mutator plugin (datahub-project#10904)

* fix(ingest/bigquery): changes helper function to decode unicode scape sequences (datahub-project#10845)

* feat(ingest/postgres): fetch table sizes for profile (datahub-project#10864)

* feat(ingest/abs): Adding azure blob storage ingestion source (datahub-project#10813)

* fix(ingest/redshift): reduce severity of SQL parsing issues (datahub-project#10924)

* fix(build): fix lint fix web react (datahub-project#10896)

* fix(ingest/bigquery): handle quota exceeded for project.list requests (datahub-project#10912)

* feat(ingest): report extractor failures more loudly (datahub-project#10908)

* feat(ingest/snowflake): integrate snowflake-queries into main source (datahub-project#10905)

* fix(ingest): fix docs build (datahub-project#10926)

* fix(ingest/snowflake): fix test connection (datahub-project#10927)

* fix(ingest/lookml): add view load failures to cache (datahub-project#10923)

* docs(slack) overhauled setup instructions and screenshots (datahub-project#10922)

Co-authored-by: John Joyce <john@acryl.io>

* fix(airflow): Add comma parsing of owners to DataJobs (datahub-project#10903)

* fix(entityservice): fix merging sideeffects (datahub-project#10937)

* feat(ingest): Support System Ingestion Sources, Show and hide system ingestion sources with Command-S (datahub-project#10938)

Co-authored-by: John Joyce <john@Johns-MBP.lan>

* chore() Set a default lineage filtering end time on backend when a start time is present (datahub-project#10925)

Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>
Co-authored-by: John Joyce <john@Johns-MBP.lan>

* Added relationships APIs to V3. Added these generic APIs to V3 swagger doc. (datahub-project#10939)

* docs: add learning center to docs (datahub-project#10921)

* doc: Update hubspot form id (datahub-project#10943)

* chore(airflow): add python 3.11 w/ Airflow 2.9 to CI (datahub-project#10941)

* fix(ingest/Glue): column upstream lineage between S3 and Glue (datahub-project#10895)

* fix(ingest/abs): split abs utils into multiple files (datahub-project#10945)

* doc(ingest/looker): fix doc for sql parsing documentation (datahub-project#10883)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(ingest/bigquery): Adding missing BigQuery types (datahub-project#10950)

* fix(ingest/setup): feast and abs source setup (datahub-project#10951)

* fix(connections) Harden adding /gms to connections in backend (datahub-project#10942)

* feat(siblings) Add flag to prevent combining siblings in the UI (datahub-project#10952)

* fix(docs): make graphql doc gen more automated (datahub-project#10953)

* feat(ingest/athena): Add option for Athena partitioned profiling (datahub-project#10723)

* fix(spark-lineage): default timeout for future responses (datahub-project#10947)

* feat(datajob/flow): add environment filter using info aspects (datahub-project#10814)

* fix(ui/ingest): correct privilege used to show tab (datahub-project#10483)

Co-authored-by: Kunal-kankriya <127090035+Kunal-kankriya@users.noreply.github.com>

* feat(ingest/looker): include dashboard urns in browse v2 (datahub-project#10955)

* add a structured type to batchGet in OpenAPI V3 spec (datahub-project#10956)

* fix(ui): scroll on the domain sidebar to show all domains (datahub-project#10966)

* fix(ingest/sagemaker): resolve incorrect variable assignment for SageMaker API call (datahub-project#10965)

* fix(airflow/build): Pinning mypy (datahub-project#10972)

* Fixed a bug where the OpenAPI V3 spec was incorrect. The bug was introduced in datahub-project#10939. (datahub-project#10974)

* fix(ingest/test): Fix for mssql integration tests (datahub-project#10978)

* fix(entity-service) exist check correctly extracts status (datahub-project#10973)

* fix(structuredProps) casing bug in StructuredPropertiesValidator (datahub-project#10982)

* bugfix: use anyOf instead of allOf when creating references in openapi v3 spec (datahub-project#10986)

* fix(ui): Remove ant less imports (datahub-project#10988)

* feat(ingest/graph): Add get_results_by_filter to DataHubGraph (datahub-project#10987)

* feat(ingest/cli): init does not actually support environment variables (datahub-project#10989)

* fix(ingest/graph): Update get_results_by_filter graphql query (datahub-project#10991)

* feat(ingest/spark): Promote beta plugin (datahub-project#10881)

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* feat(ingest): support domains in meta -> "datahub" section (datahub-project#10967)

* feat(ingest): add `check server-config` command (datahub-project#10990)

* feat(cli): Make consistent use of DataHubGraphClientConfig (datahub-project#10466)

Deprecates get_url_and_token() in favor of a more complete option: load_graph_config() that returns a full DatahubClientConfig.
This change was then propagated across previous usages of get_url_and_token so that connections to DataHub server from the client respect the full breadth of configuration specified by DatahubClientConfig.

I.e.: You can now specify disable_ssl_verification: true in your ~/.datahubenv file so that all CLI calls to the server work when SSL certificate verification is disabled.

Fixes datahub-project#9705

* fix(ingest/s3): Fixing container creation when there is no folder in path (datahub-project#10993)

* fix(ingest/looker): support platform instance for dashboards & charts (datahub-project#10771)

* feat(ingest/bigquery): improve handling of information schema in sql parser (datahub-project#10985)

* feat(ingest): improve `ingest deploy` command (datahub-project#10944)

* fix(backend): allow excluding soft-deleted entities in relationship-queries; exclude soft-deleted members of groups (datahub-project#10920)

- allow excluding soft-deleted entities in relationship-queries
- exclude soft-deleted members of groups

* fix(ingest/looker): downgrade missing chart type log level (datahub-project#10996)

* doc(acryl-cloud): release docs for 0.3.4.x (datahub-project#10984)

Co-authored-by: John Joyce <john@acryl.io>
Co-authored-by: RyanHolstien <RyanHolstien@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Pedro Silva <pedro@acryl.io>

* fix(protobuf/build): Fix protobuf check jar script (datahub-project#11006)

* fix(ui/ingest): Support invalid cron jobs (datahub-project#10998)

* fix(ingest): fix graph config loading (datahub-project#11002)

Co-authored-by: Pedro Silva <pedro@acryl.io>

* feat(docs): Document __DATAHUB_TO_FILE_ directive (datahub-project#10968)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(graphql/upsertIngestionSource): Validate cron schedule; parse error in CLI (datahub-project#11011)

* feat(ece): support custom ownership type urns in ECE generation (datahub-project#10999)

* feat(assertion-v2): changed Validation tab to Quality and created new Governance tab (datahub-project#10935)

* fix(ingestion/glue): Add support for missing config options for profiling in Glue (datahub-project#10858)

* feat(propagation): Add models for schema field docs, tags, terms (datahub-project#2959) (datahub-project#11016)

Co-authored-by: Chris Collins <chriscollins3456@gmail.com>

* docs: standardize terminology to DataHub Cloud (datahub-project#11003)

* fix(ingestion/transformer): replace the externalUrl container (datahub-project#11013)

* docs(slack) troubleshoot docs (datahub-project#11014)

* feat(propagation): Add graphql API (datahub-project#11030)

Co-authored-by: Chris Collins <chriscollins3456@gmail.com>

* feat(propagation):  Add models for Action feature settings (datahub-project#11029)

* docs(custom properties): Remove duplicate from sidebar (datahub-project#11033)

* feat(models): Introducing Dataset Partitions Aspect (datahub-project#10997)

Co-authored-by: John Joyce <john@Johns-MBP.lan>
Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>

* feat(propagation): Add Documentation Propagation Settings (datahub-project#11038)

* fix(models): chart schema fields mapping, add dataHubAction entity, t… (datahub-project#11040)

* fix(ci): smoke test lint failures (datahub-project#11044)

* docs: fix learning center color scheme & typo (datahub-project#11043)

* feat: add cloud main page (datahub-project#11017)

Co-authored-by: Jay <159848059+jayacryl@users.noreply.github.com>

* feat(restore-indices): add additional step to also clear system metadata service (datahub-project#10662)

Co-authored-by: John Joyce <john@acryl.io>

* docs: fix typo (datahub-project#11046)

* fix(lint): apply spotless (datahub-project#11050)

* docs(airflow): example query to get datajobs for a dataflow (datahub-project#11034)

* feat(cli): Add run-id option to put sub-command (datahub-project#11023)

Adds an option to assign run-id to a given put command execution. 
This is useful when transformers do not exist for a given ingestion payload; we can follow up with custom metadata and assign it to an ingestion pipeline.

* fix(ingest): improve sql error reporting calls (datahub-project#11025)

* fix(airflow): fix CI setup (datahub-project#11031)

* feat(ingest/dbt): add experimental `prefer_sql_parser_lineage` flag (datahub-project#11039)

* fix(ingestion/lookml): enable stack-trace in lookml logs (datahub-project#10971)

* (chore): Linting fix (datahub-project#11015)

* chore(ci): update deprecated github actions (datahub-project#10977)

* Fix ALB configuration example (datahub-project#10981)

* chore(ingestion-base): bump base image packages (datahub-project#11053)

* feat(cli): Trim report of dataHubExecutionRequestResult to max GMS size (datahub-project#11051)

* fix(ingestion/lookml): emit dummy sql condition for lookml custom condition tag (datahub-project#11008)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(ingestion/powerbi): fix issue with broken report lineage (datahub-project#10910)

* feat(ingest/tableau): add retry on timeout (datahub-project#10995)

* change generate kafka connect properties from env (datahub-project#10545)

Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>

* fix(ingest): fix oracle cronjob ingestion (datahub-project#11001)

Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>

* chore(ci): revert update deprecated github actions (datahub-project#10977) (datahub-project#11062)

* feat(ingest/dbt-cloud): update metadata_endpoint inference (datahub-project#11041)

* build: Reduce size of datahub-frontend-react image by 50-ish% (datahub-project#10878)

Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>

* fix(ci): Fix lint issue in datahub_ingestion_run_summary_provider.py (datahub-project#11063)

* docs(ingest): update developing-a-transformer.md (datahub-project#11019)

* feat(search-test): update search tests from datahub-project#10408 (datahub-project#11056)

* feat(cli): add aspects parameter to DataHubGraph.get_entity_semityped (datahub-project#11009)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* docs(airflow): update min version for plugin v2 (datahub-project#11065)

* doc(ingestion/tableau): doc update for derived permission (datahub-project#11054)

Co-authored-by: Pedro Silva <pedro.cls93@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(py): remove dep on types-pkg_resources (datahub-project#11076)

* feat(ingest/mode): add option to exclude restricted (datahub-project#11081)

* fix(ingest): set lastObserved in sdk when unset (datahub-project#11071)

* doc(ingest): Update capabilities (datahub-project#11072)

* chore(vulnerability): Log Injection (datahub-project#11090)

* chore(vulnerability): Information exposure through a stack trace (datahub-project#11091)

* chore(vulnerability): Comparison of narrow type with wide type in loop condition (datahub-project#11089)

* chore(vulnerability): Insertion of sensitive information into log files (datahub-project#11088)

* chore(vulnerability): Risky Cryptographic Algorithm (datahub-project#11059)

* chore(vulnerability): Overly permissive regex range (datahub-project#11061)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix: update customer data (datahub-project#11075)

* fix(models): fixing the datasetPartition models (datahub-project#11085)

Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>

* fix(ui): Adding view, forms GraphQL query, remove showing a fallback error message on unhandled GraphQL error (datahub-project#11084)

Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>

* feat(docs-site): hiding learn more from cloud page (datahub-project#11097)

* fix(docs): Add correct usage of orFilters in search API docs (datahub-project#11082)

Co-authored-by: Jay <159848059+jayacryl@users.noreply.github.com>

* fix(ingest/mode): Regexp in mode name matcher didn't allow underscore (datahub-project#11098)

* docs: Refactor customer stories section (datahub-project#10869)

Co-authored-by: Jeff Merrick <jeff@wireform.io>

* fix(release): fix full/slim suffix on tag (datahub-project#11087)

* feat(config): support alternate hashing algorithm for doc id (datahub-project#10423)

Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
Co-authored-by: John Joyce <john@acryl.io>

* fix(emitter): fix typo in get method of java kafka emitter (datahub-project#11007)

* fix(ingest): use correct native data type in all SQLAlchemy sources by compiling data type using dialect (datahub-project#10898)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* chore: Update contributors list in PR labeler (datahub-project#11105)

* feat(ingest): tweak stale entity removal messaging (datahub-project#11064)

* fix(ingestion): enforce lastObserved timestamps in SystemMetadata (datahub-project#11104)

* fix(ingest/powerbi): fix broken lineage between chart and dataset (datahub-project#11080)

* feat(ingest/lookml): CLL support for sql set in sql_table_name attribute of lookml view (datahub-project#11069)

* docs: update graphql docs on forms & structured properties (datahub-project#11100)

* test(search): search openAPI v3 test (datahub-project#11049)

* fix(ingest/tableau): prevent empty site content urls (datahub-project#11057)

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* feat(entity-client): implement client batch interface (datahub-project#11106)

* fix(snowflake): avoid reporting warnings/info for sys tables (datahub-project#11114)

* fix(ingest): downgrade column type mapping warning to info (datahub-project#11115)

* feat(api): add AuditStamp to the V3 API entity/aspect response (datahub-project#11118)

* fix(ingest/redshift): replace r'\n' with '\n' to avoid token error redshift serverless… (datahub-project#11111)

* fix(entiy-client): handle null entityUrn case for restli (datahub-project#11122)

* fix(sql-parser): prevent bad urns from alter table lineage (datahub-project#11092)

* fix(ingest/bigquery): use small batch size if use_tables_list_query_v2 is set (datahub-project#11121)

* fix(graphql): add missing entities to EntityTypeMapper and EntityTypeUrnMapper (datahub-project#10366)

* feat(ui): Changes to allow editable dataset name (datahub-project#10608)

Co-authored-by: Jay Kadambi <jayasimhan_venkatadri@optum.com>

* fix: remove saxo (datahub-project#11127)

* feat(mcl-processor): Update mcl processor hooks (datahub-project#11134)

* fix(openapi): fix openapi v2 endpoints & v3 documentation update

* Revert "fix(openapi): fix openapi v2 endpoints & v3 documentation update"

This reverts commit 573c1cb.

* docs(policies): updates to policies documentation (datahub-project#11073)

* fix(openapi): fix openapi v2 and v3 docs update (datahub-project#11139)

* feat(auth): grant type and acr values custom oidc parameters support (datahub-project#11116)

* fix(mutator): mutator hook fixes (datahub-project#11140)

* feat(search): support sorting on multiple fields (datahub-project#10775)

* feat(ingest): various logging improvements (datahub-project#11126)

* fix(ingestion/lookml): fix for sql parsing error (datahub-project#11079)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* feat(docs-site) cloud page spacing and content polishes (datahub-project#11141)

* feat(ui) Enable editing structured props on fields (datahub-project#11042)

* feat(tests): add md5 and last computed to testResult model (datahub-project#11117)

* test(openapi): openapi regression smoke tests (datahub-project#11143)

* fix(airflow): fix tox tests + update docs (datahub-project#11125)

* docs: add chime to adoption stories (datahub-project#11142)

* fix(ingest/databricks): Updating code to work with Databricks sdk 0.30 (datahub-project#11158)

* fix(kafka-setup): add missing script to image (datahub-project#11190)

* fix(config): fix hash algo config (datahub-project#11191)

* test(smoke-test): updates to smoke-tests (datahub-project#11152)

* fix(elasticsearch): refactor idHashAlgo setting (datahub-project#11193)

* chore(kafka): kafka version bump (datahub-project#11211)

* readd UsageStatsWorkUnit

* fix merge problems

* change logo

---------

Co-authored-by: Chris Collins <chriscollins3456@gmail.com>
Co-authored-by: John Joyce <john@acryl.io>
Co-authored-by: John Joyce <john@Johns-MBP.lan>
Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>
Co-authored-by: dushayntAW <158567391+dushayntAW@users.noreply.github.com>
Co-authored-by: sagar-salvi-apptware <159135491+sagar-salvi-apptware@users.noreply.github.com>
Co-authored-by: Aseem Bansal <asmbansal2@gmail.com>
Co-authored-by: Kevin Chun <kevin1chun@gmail.com>
Co-authored-by: jordanjeremy <72943478+jordanjeremy@users.noreply.github.com>
Co-authored-by: skrydal <piotr.skrydalewicz@gmail.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
Co-authored-by: sid-acryl <155424659+sid-acryl@users.noreply.github.com>
Co-authored-by: Julien Jehannet <80408664+aviv-julienjehannet@users.noreply.github.com>
Co-authored-by: Hendrik Richert <github@richert.li>
Co-authored-by: Hendrik Richert <hendrik.richert@swisscom.com>
Co-authored-by: RyanHolstien <RyanHolstien@users.noreply.github.com>
Co-authored-by: Felix Lüdin <13187726+Masterchen09@users.noreply.github.com>
Co-authored-by: Pirry <158024088+chardaway@users.noreply.github.com>
Co-authored-by: Hyejin Yoon <0327jane@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: cburroughs <chris.burroughs@gmail.com>
Co-authored-by: ksrinath <ksrinath@users.noreply.github.com>
Co-authored-by: Mayuri Nehate <33225191+mayurinehate@users.noreply.github.com>
Co-authored-by: Kunal-kankriya <127090035+Kunal-kankriya@users.noreply.github.com>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
Co-authored-by: ipolding-cais <155455744+ipolding-cais@users.noreply.github.com>
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
Co-authored-by: Shubham Jagtap <132359390+shubhamjagtap639@users.noreply.github.com>
Co-authored-by: haeniya <yanik.haeni@gmail.com>
Co-authored-by: Yanik Häni <Yanik.Haeni1@swisscom.com>
Co-authored-by: Gabe Lyons <itsgabelyons@gmail.com>
Co-authored-by: Gabe Lyons <gabe.lyons@acryl.io>
Co-authored-by: 808OVADOZE <52988741+shtephlee@users.noreply.github.com>
Co-authored-by: noggi <anton.kuraev@acryl.io>
Co-authored-by: Nicholas Pena <npena@foursquare.com>
Co-authored-by: Jay <159848059+jayacryl@users.noreply.github.com>
Co-authored-by: ethan-cartwright <ethan.cartwright.m@gmail.com>
Co-authored-by: Ethan Cartwright <ethan.cartwright@acryl.io>
Co-authored-by: Nadav Gross <33874964+nadavgross@users.noreply.github.com>
Co-authored-by: Patrick Franco Braz <patrickfbraz@poli.ufrj.br>
Co-authored-by: pie1nthesky <39328908+pie1nthesky@users.noreply.github.com>
Co-authored-by: Joel Pinto Mata (KPN-DSH-DEX team) <130968841+joelmataKPN@users.noreply.github.com>
Co-authored-by: Ellie O'Neil <110510035+eboneil@users.noreply.github.com>
Co-authored-by: Ajoy Majumdar <ajoymajumdar@hotmail.com>
Co-authored-by: deepgarg-visa <149145061+deepgarg-visa@users.noreply.github.com>
Co-authored-by: Tristan Heisler <tristankheisler@gmail.com>
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
Co-authored-by: Davi Arnaut <davi.arnaut@acryl.io>
Co-authored-by: Pedro Silva <pedro@acryl.io>
Co-authored-by: amit-apptware <132869468+amit-apptware@users.noreply.github.com>
Co-authored-by: Sam Black <sam.black@acryl.io>
Co-authored-by: Raj Tekal <varadaraj_tekal@optum.com>
Co-authored-by: Steffen Grohsschmiedt <gitbhub@steffeng.eu>
Co-authored-by: jaegwon.seo <162448493+wornjs@users.noreply.github.com>
Co-authored-by: Renan F. Lima <51028757+lima-renan@users.noreply.github.com>
Co-authored-by: Matt Exchange <xkollar@users.noreply.github.com>
Co-authored-by: Jonny Dixon <45681293+acrylJonny@users.noreply.github.com>
Co-authored-by: Pedro Silva <pedro.cls93@gmail.com>
Co-authored-by: Pinaki Bhattacharjee <pinakipb2@gmail.com>
Co-authored-by: Jeff Merrick <jeff@wireform.io>
Co-authored-by: skrydal <piotr.skrydalewicz@acryl.io>
Co-authored-by: AndreasHegerNuritas <163423418+AndreasHegerNuritas@users.noreply.github.com>
Co-authored-by: jayasimhankv <145704974+jayasimhankv@users.noreply.github.com>
Co-authored-by: Jay Kadambi <jayasimhan_venkatadri@optum.com>
Co-authored-by: David Leifker <david.leifker@acryl.io>
Labels: community-contribution (PR or Issue raised by member(s) of DataHub Community), ingestion (PR or Issue related to the ingestion of metadata), merge-pending-ci (A PR that has passed review and should be merged once CI is green)