
fix(ingestion/lookml): liquid template resolution and view-to-view cll #10542

Merged
merged 65 commits on Jul 8, 2024

Conversation

sid-acryl
Collaborator

@sid-acryl sid-acryl commented May 20, 2024

  • Update code to use the DataHub SqlParser for SQL parsing
  • Fix issues in CLL generation when the view definition language is SQL
  • Add support for Liquid template resolution in LookML views
  • Add a condition tag similar to Looker's Liquid condition tag (see the sketch below)
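
The Liquid resolution this PR adds can be illustrated with python-liquid directly. The following is a minimal sketch of how a LookML sql_table_name containing a Liquid conditional resolves, using made-up user-attribute values rather than the connector's actual configuration or code path:

from liquid import Environment

# Illustrative sql_table_name with a Liquid conditional, similar to the
# activity_logs test view added in this PR.
SQL_TABLE_NAME = """
{% if _user_attributes['looker_env'] == 'dev' %}
  {{ _user_attributes['dev_database_prefix'] }}analytics.staging_app.stg_app__activity_logs
{% else %}
  analytics.staging_app.stg_app__activity_logs
{% endif %}
"""

env = Environment()
rendered = env.from_string(SQL_TABLE_NAME).render(
    _user_attributes={"looker_env": "dev", "dev_database_prefix": "dev_"}
)
print(rendered.strip())  # dev_analytics.staging_app.stg_app__activity_logs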

Summary by CodeRabbit

  • New Features

    • Enhanced Looker integration with improved field handling and metadata event generation.
    • Introduced data classes for handling Looker model and view files.
  • Chores

    • Updated dependencies: Added "python-liquid" and sqlglot_lib for LookML support.

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label May 20, 2024
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

Outside diff range and nitpick comments (4)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (1)

Line range hint 1349-1349: Remove use of lru_cache on methods.

Using functools.lru_cache on methods can lead to memory leaks. Consider using an alternative caching mechanism.

-    @lru_cache(maxsize=200)
+    # @lru_cache(maxsize=200)
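
If caching on the method is still desired, one common workaround, shown here only as a sketch and not as the change made in this PR, is to build the cache per instance so cached entries are released with the instance instead of being pinned by a class-level lru_cache that holds strong references to self:

from functools import lru_cache


class ExploreRegistry:  # hypothetical class, for illustration only
    def __init__(self) -> None:
        # The cache lives on the instance, so it is garbage-collected
        # together with the instance.
        self._cached_lookup = lru_cache(maxsize=200)(self._lookup_uncached)

    def _lookup_uncached(self, key: str) -> str:
        return key.upper()  # stand-in for the expensive call

    def lookup(self, key: str) -> str:
        return self._cached_lookup(key)
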
metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py (2)

Line range hint 385-385: Optimize dictionary key check.

Use key in dict instead of key in dict.keys().

-        for field in filters.keys():
+        for field in filters:

Line range hint 1260-1264: Refactor nested if statements.

Use a single if statement instead of nested if statements.

-        if dashboard is None and dashboard_element is not None:
-            ownership = self.get_ownership(dashboard_element)
-            if ownership is not None:
-                chart_snapshot.aspects.append(ownership)
+        if dashboard is None and dashboard_element is not None and (ownership := self.get_ownership(dashboard_element)) is not None:
+            chart_snapshot.aspects.append(ownership)
metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json (1)

Line range hint 1405-1485: Ensure completeness of field definitions.

The field country is mentioned in the view logic but not defined in the schema metadata. This could lead to incomplete metadata representation.

Ensure that all fields used in the view logic are defined in the schema metadata.

{
  "fieldPath": "country",
  "nullable": false,
  "description": "Country",
  "label": "",
  "type": {
    "type": {
      "com.linkedin.pegasus2avro.schema.StringType": {}
    }
  },
  "nativeDataType": "string",
  "recursive": false,
  "globalTags": {
    "tags": []
  },
  "isPartOfKey": false
}
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 8edc94d and 5ad8200.

Files selected for processing (41)
  • metadata-ingestion/setup.py (2 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (18 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_connection.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_dataclasses.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_file_loader.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_liquid_tag.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_concept_context.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_config.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_refinement.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_resolver.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (21 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/str_functions.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/urn_functions.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/view_upstream.py (1 hunks)
  • metadata-ingestion/tests/integration/looker/test_looker.py (1 hunks)
  • metadata-ingestion/tests/integration/lookml/duplicate_field_ingestion_golden.json (6 hunks)
  • metadata-ingestion/tests/integration/lookml/expected_output.json (19 hunks)
  • metadata-ingestion/tests/integration/lookml/lkml_samples/liquid.view.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/lkml_samples/nested/fragment_derived.view.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/lkml_samples_hive/included_view_file.view.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/lkml_samples_hive/liquid.view.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/lkml_samples_hive/nested/fragment_derived.view.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json (9 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json (9 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json (6 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json (9 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json (9 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json (9 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_reachable_views.json (12 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_same_name_views_different_file_path.json (8 hunks)
  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json (20 hunks)
  • metadata-ingestion/tests/integration/lookml/test_lookml.py (4 hunks)
  • metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/activity_logs.view.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/data.model.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/employee_income_source.view.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/employee_tax_report.view.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/employee_total_income.view.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/top_10_employee_income_source.view.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/vv_lineage_liquid_template_golden.json (1 hunks)
Files not summarized due to errors (2)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py: Error: Message exceeds token limit
  • metadata-ingestion/tests/integration/lookml/vv_lineage_liquid_template_golden.json: Error: Message exceeds token limit
Files not reviewed due to errors (1)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_config.py (no review received)
Files skipped from review due to trivial changes (4)
  • metadata-ingestion/src/datahub/ingestion/source/looker/str_functions.py
  • metadata-ingestion/tests/integration/lookml/lkml_samples/liquid.view.lkml
  • metadata-ingestion/tests/integration/lookml/lkml_samples_hive/liquid.view.lkml
  • metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/data.model.lkml
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/looker/looker_dataclasses.py

145-152: Use a single if statement instead of nested if statements

(SIM102)

metadata-ingestion/tests/integration/lookml/test_lookml.py

719-720: Use a single if statement instead of nested if statements

(SIM102)

metadata-ingestion/tests/integration/looker/test_looker.py

490-490: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py

411-414: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


632-635: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py

1349-1349: Use of functools.lru_cache or functools.cache on methods can lead to memory leaks

(B019)

metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py

385-385: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


1260-1264: Use a single if statement instead of nested if statements

(SIM102)

Additional comments not posted (173)
metadata-ingestion/tests/integration/lookml/lkml_samples_hive/included_view_file.view.lkml (1)

2-2: Verify the SQL table name formatting.

Ensure that the SQL table name "looker_schema"."include_able" is correctly formatted and valid in your database.

metadata-ingestion/tests/integration/lookml/lkml_samples/nested/fragment_derived.view.lkml (3)

4-4: Verify the SQL syntax and column name.

Ensure that the column date exists and the alias DATE is correctly used in the SQL query.


5-5: Verify the SQL syntax and column name.

Ensure that the column platform exists and the alias aliased_platform is correctly used in the SQL query.


6-6: Verify the SQL syntax and column name.

Ensure that the column country exists and is correctly used in the SQL query.

metadata-ingestion/tests/integration/lookml/lkml_samples_hive/nested/fragment_derived.view.lkml (3)

4-4: Verify the SQL syntax and column name.

Ensure that the column date exists and the alias DATE is correctly used in the SQL query.


5-5: Verify the SQL syntax and column name.

Ensure that the column platform exists and the alias aliased_platform is correctly used in the SQL query.


6-6: Verify the SQL syntax and column name.

Ensure that the column country exists and is correctly used in the SQL query.

metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/employee_tax_report.view.lkml (4)

2-2: Verify the SQL table name formatting.

Ensure that the SQL table name data-warehouse.finance.form-16 is correctly formatted and valid in your database.


4-6: Verify the dimension type and SQL syntax.

Ensure that the dimension id with type number and SQL ${TABLE}.id is correctly defined.


9-11: Verify the dimension type and SQL syntax.

Ensure that the dimension name with type string and SQL ${TABLE}.name is correctly defined.


14-16: Verify the measure type and SQL syntax.

Ensure that the measure taxable_income with type sum and SQL ${TABLE}.tax is correctly defined.

metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/employee_total_income.view.lkml (4)

1-3: LGTM!

The SQL table name is correctly defined using a liquid template variable.


4-7: LGTM!

The dimension id is correctly defined with type number and a SQL expression using a liquid template variable.


9-12: LGTM!

The dimension name is correctly defined with type string and a SQL expression using a liquid template variable.


14-17: LGTM!

The measure total_income is correctly defined with type sum and a SQL expression using a liquid template variable.

metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/top_10_employee_income_source.view.lkml (4)

1-10: LGTM!

The derived table is correctly defined using a SQL query with a liquid template variable.


12-15: LGTM!

The dimension id is correctly defined with type number and a SQL expression using a liquid template variable.


17-20: LGTM!

The dimension name is correctly defined with type string and a SQL expression using a liquid template variable.


22-25: LGTM!

The dimension source is correctly defined with type string and a SQL expression using a liquid template variable.

metadata-ingestion/src/datahub/ingestion/source/looker/urn_functions.py (2)

1-11: LGTM!

The function get_qualified_table_name correctly handles the URN format and returns the appropriate part of the URN.


13-18: LGTM!

The function get_table_name correctly handles the qualified table name and returns the appropriate part of the name.

metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/activity_logs.view.lkml (2)

1-10: LGTM!

The SQL table name is correctly defined using liquid template variables and conditional logic.


12-17: LGTM!

The dimension generated_message_id is correctly defined with a group label, primary key, type, and SQL expression using a liquid template variable.

metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/employee_income_source.view.lkml (3)

1-1: Add a description for the view.

It's good practice to add a description for the view to improve readability and maintainability.

+  description: "This view represents employee income source data."

6-12: Ensure proper handling of SQL injection.

Using liquid template tags in SQL queries can introduce SQL injection vulnerabilities. Ensure that the values used in these tags are properly sanitized.

Do you have measures in place to sanitize the values used in these liquid template tags?


16-16: Verify the custom condition tag implementation.

Ensure that the custom condition tag used here is correctly implemented and tested.

Is the custom condition tag implementation tested and verified for correctness?

metadata-ingestion/src/datahub/ingestion/source/looker/looker_liquid_tag.py (2)

14-17: Add a docstring for the CustomTagException class.

Adding a docstring will improve code readability and maintainability.

class CustomTagException(Exception):
+    """
+    Exception raised for errors in the custom tag processing.
+
+    Attributes:
+        message -- explanation of the error
+    """

45-56: Improve the docstring for the ConditionTag class.

Clarify the usage of the ConditionTag class and provide examples.

"""
ConditionTag is the equivalent implementation of Looker's custom liquid tag "condition".
Refer doc: https://cloud.google.com/looker/docs/templated-filters#basic_usage

Refer doc to see how to write liquid custom tag: https://jg-rp.github.io/liquid/guides/custom-tags

This class renders the below tag as order.region='ap-south-1' if order_region is provided in config.liquid_variables
as order_region: 'ap-south-1'
    {% condition order_region %} order.region {% endcondition %}

+Usage example:
+    {% condition order_region %} order.region {% endcondition %}
"""
metadata-ingestion/src/datahub/ingestion/source/looker/looker_connection.py (2)

42-64: Add a docstring for the LookerConnectionDefinition class.

Adding a docstring will improve code readability and maintainability.

class LookerConnectionDefinition(ConfigModel):
+    """
+    Represents a Looker connection definition.
+
+    Attributes:
+        platform -- the platform name
+        default_db -- the default database name
+        default_schema -- the default schema name (optional)
+        platform_instance -- the platform instance name (optional)
+        platform_env -- the environment that the platform is located in (optional)
+    """

75-85: Improve error handling in from_looker_connection method.

Ensure that the method handles missing dialect names gracefully.

if looker_connection.dialect_name is None:
    raise ConfigurationError(
        f"Unable to fetch a fully filled out connection for {looker_connection.name}. Please check your API permissions."
    )
for extractor_pattern, extracting_function in extractors.items():
    if re.match(extractor_pattern, looker_connection.dialect_name):
        (platform, db, schema) = extracting_function(looker_connection)
        return cls(platform=platform, default_db=db, default_schema=schema)
raise ConfigurationError(
    f"Could not find an appropriate platform for looker_connection: {looker_connection.name} with dialect: {looker_connection.dialect_name}"
)

Likely invalid or redundant comment.

metadata-ingestion/src/datahub/ingestion/source/looker/looker_file_loader.py (2)

40-42: Add a docstring for the is_view_seen method.

Adding a docstring will improve code readability and maintainability.

def is_view_seen(self, path: str) -> bool:
+    """
+    Checks if the view file at the given path has already been loaded.
+
+    Args:
+        path: The path to the view file.
+
+    Returns:
+        True if the view file has been loaded, False otherwise.
+    """
    return path in self.viewfile_cache

98-113: Add a docstring for the load_viewfile method.

Adding a docstring will improve code readability and maintainability.

def load_viewfile(
    self,
    path: str,
    project_name: str,
    connection: Optional[LookerConnectionDefinition],
    reporter: LookMLSourceReport,
) -> Optional[LookerViewFile]:
+    """
+    Loads the Looker view file at the given path, resolves liquid variables, and caches the result.
+
+    Args:
+        path: The path to the view file.
+        project_name: The name of the project.
+        connection: The Looker connection definition.
+        reporter: The source report for logging and error reporting.
+
+    Returns:
+        The loaded LookerViewFile object, or None if loading failed.
+    """
    viewfile = self._load_viewfile(
        project_name=project_name,
        path=path,
        reporter=reporter,
    )
    if viewfile is None:
        return None

    return replace(viewfile, connection=connection)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py (6)

19-23: Add type hints to the function.

Type hints improve code readability and help catch type-related errors early.

- def create_nested_dict(keys, value):
+ def create_nested_dict(keys: List[str], value: Any) -> Dict[str, Any]:
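
For context, a helper with this signature typically folds a key path into a nested mapping; a rough sketch of that behaviour (not the module's actual implementation):

from typing import Any, Dict, List


def create_nested_dict(keys: List[str], value: Any) -> Dict[str, Any]:
    # ["order", "region"], "ap-south-1" -> {"order": {"region": "ap-south-1"}}
    result: Dict[str, Any] = {keys[-1]: value}
    for key in reversed(keys[:-1]):
        result = {key: result}
    return result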

26-34: Add type hints to the class methods.

Type hints improve code readability and help catch type-related errors early.

- def __init__(self, liquid_variable):
+ def __init__(self, liquid_variable: Dict[str, Any]):

35-60: Add type hints to the method _create_new_liquid_variables_with_default.

Type hints improve code readability and help catch type-related errors early.

- def _create_new_liquid_variables_with_default(self, variables: Set[str]) -> dict:
+ def _create_new_liquid_variables_with_default(self, variables: Set[str]) -> Dict[str, Any]:

62-74: Add type hints to the method liquid_variable_with_default.

Type hints improve code readability and help catch type-related errors early.

- def liquid_variable_with_default(self, text: str) -> dict:
+ def liquid_variable_with_default(self, text: str) -> Dict[str, Any]:

77-101: Add type hints to the function resolve_liquid_variable.

Type hints improve code readability and help catch type-related errors early.

- def resolve_liquid_variable(text: str, liquid_variable: Dict[Any, Any]) -> str:
+ def resolve_liquid_variable(text: str, liquid_variable: Dict[str, Any]) -> str:

104-122: Add type hints to the function resolve_liquid_variable_in_view_dict.

Type hints improve code readability and help catch type-related errors early.

- def resolve_liquid_variable_in_view_dict(raw_view: dict, liquid_variable: Dict[Any, Any]) -> None:
+ def resolve_liquid_variable_in_view_dict(raw_view: Dict[str, Any], liquid_variable: Dict[str, Any]) -> None:
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_resolver.py (7)

25-29: Add type hints to the function is_derived_view.

Type hints improve code readability and help catch type-related errors early.

- def is_derived_view(view_name: str) -> bool:
+ def is_derived_view(view_name: str) -> bool:

32-52: Add type hints to the function get_derived_looker_view_id.

Type hints improve code readability and help catch type-related errors early.

- def get_derived_looker_view_id(qualified_table_name: str, looker_view_id_cache: "LookerViewIdCache", base_folder_path: str) -> Optional[LookerViewId]:
+ def get_derived_looker_view_id(qualified_table_name: str, looker_view_id_cache: "LookerViewIdCache", base_folder_path: str) -> Optional[LookerViewId]:

55-81: Add type hints to the function resolve_derived_view_urn_of_col_ref.

Type hints improve code readability and help catch type-related errors early.

- def resolve_derived_view_urn_of_col_ref(column_refs: List[ColumnRef], looker_view_id_cache: "LookerViewIdCache", base_folder_path: str, config: LookMLSourceConfig) -> List[ColumnRef]:
+ def resolve_derived_view_urn_of_col_ref(column_refs: List[ColumnRef], looker_view_id_cache: "LookerViewIdCache", base_folder_path: str, config: LookMLSourceConfig) -> List[ColumnRef]:

84-110: Add type hints to the function fix_derived_view_urn.

Type hints improve code readability and help catch type-related errors early.

- def fix_derived_view_urn(urns: List[str], looker_view_id_cache: "LookerViewIdCache", base_folder_path: str, config: LookMLSourceConfig) -> List[str]:
+ def fix_derived_view_urn(urns: List[str], looker_view_id_cache: "LookerViewIdCache", base_folder_path: str, config: LookMLSourceConfig) -> List[str]:

113-127: Add type hints to the function determine_view_file_path.

Type hints improve code readability and help catch type-related errors early.

- def determine_view_file_path(base_folder_path: str, absolute_file_path: str) -> str:
+ def determine_view_file_path(base_folder_path: str, absolute_file_path: str) -> str:

129-173: Add type hints to the class methods.

Type hints improve code readability and help catch type-related errors early.

- def __init__(self, project_name: str, model_name: str, looker_model: LookerModel, looker_viewfile_loader: LookerViewFileLoader, reporter: LookMLSourceReport):
+ def __init__(self, project_name: str, model_name: str, looker_model: LookerModel, looker_viewfile_loader: LookerViewFileLoader, reporter: LookMLSourceReport):

174-215: Add type hints to the method get_looker_view_id.

Type hints improve code readability and help catch type-related errors early.

- def get_looker_view_id(self, view_name: str, base_folder_path: str, connection: Optional[LookerConnectionDefinition] = None) -> Optional[LookerViewId]:
+ def get_looker_view_id(self, view_name: str, base_folder_path: str, connection: Optional[LookerConnectionDefinition] = None) -> Optional[LookerViewId]:
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_refinement.py (5)

18-62: Add type hints to the class methods.

Type hints improve code readability and help catch type-related errors early.

- def __init__(self, looker_model: LookerModel, looker_viewfile_loader: LookerViewFileLoader, connection_definition: LookerConnectionDefinition, source_config: LookMLSourceConfig, reporter: LookMLSourceReport):
+ def __init__(self, looker_model: LookerModel, looker_viewfile_loader: LookerViewFileLoader, connection_definition: LookerConnectionDefinition, source_config: LookMLSourceConfig, reporter: LookMLSourceReport):

63-66: Add type hints to the function is_refinement.

Type hints improve code readability and help catch type-related errors early.

- def is_refinement(view_name: str) -> bool:
+ def is_refinement(view_name: str) -> bool:

68-94: Add type hints to the function merge_column.

Type hints improve code readability and help catch type-related errors early.

- def merge_column(original_dict: dict, refinement_dict: dict, key: str) -> List[dict]:
+ def merge_column(original_dict: Dict[str, Any], refinement_dict: Dict[str, Any], key: str) -> List[Dict[str, Any]]:

97-105: Add type hints to the function merge_and_set_column.

Type hints improve code readability and help catch type-related errors early.

- def merge_and_set_column(new_raw_view: dict, refinement_view: dict, key: str) -> None:
+ def merge_and_set_column(new_raw_view: Dict[str, Any], refinement_view: Dict[str, Any], key: str) -> None:

107-132: Add type hints to the function merge_refinements.

Type hints improve code readability and help catch type-related errors early.

- def merge_refinements(raw_view: dict, refinement_views: List[dict]) -> dict:
+ def merge_refinements(raw_view: Dict[str, Any], refinement_views: List[Dict[str, Any]]) -> Dict[str, Any]:
metadata-ingestion/src/datahub/ingestion/source/looker/looker_dataclasses.py (4)

18-22: LGTM!

The ProjectInclude dataclass looks good and is correctly implemented.


24-30: LGTM!

The LookerField dataclass looks good and is correctly implemented.


39-85: LGTM!

The from_looker_dict method is well-structured and handles errors appropriately. The logging and reporting mechanisms are in place.


242-278: LGTM!

The from_looker_dict method is well-structured and handles errors appropriately. The logging and reporting mechanisms are in place.

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_concept_context.py (7)

30-54: LGTM!

The methods in LookerFieldContext are well-structured and handle field context appropriately. The logging and error handling mechanisms are in place.


191-216: LGTM!

The resolve_extends_view_name method is well-structured and handles view name resolution appropriately. The logging and error handling mechanisms are in place.


218-248: LGTM!

The get_including_extends method is well-structured and handles field resolution appropriately. The logging and error handling mechanisms are in place.


250-277: LGTM!

The methods _get_sql_table_name_field, _is_dot_sql_table_name_present, and sql_table_name are well-structured and handle SQL table name resolution appropriately. The logging and error handling mechanisms are in place.


278-321: LGTM!

The methods derived_table, explore_source, and sql are well-structured and handle derived table and SQL resolution appropriately. The logging and error handling mechanisms are in place.


323-343: LGTM!

The methods name and view_file_name are well-structured and handle view name and file name resolution appropriately. The logging and error handling mechanisms are in place.


344-413: LGTM!

The methods _get_list_dict, dimensions, measures, dimension_groups, is_materialized_derived_view, is_regular_case, is_sql_table_name_referring_to_view, is_sql_based_derived_case, is_native_derived_case, and is_sql_based_derived_view_without_fields_case are well-structured and handle view context appropriately. The logging and error handling mechanisms are in place.

metadata-ingestion/tests/integration/lookml/duplicate_field_ingestion_golden.json (3)

Line range hint 1-106: LGTM!

The JSON data segments are well-structured and correctly represent the test data for LookML integration tests.


Line range hint 107-233: LGTM!

The JSON data segments are well-structured and correctly represent the test data for LookML integration tests.


Line range hint 234-494: LGTM!

The JSON data segments are well-structured and correctly represent the test data for LookML integration tests.

metadata-ingestion/src/datahub/ingestion/source/looker/view_upstream.py (6)

42-110: LGTM!

The utility functions _platform_names_have_2_parts, _drop_hive_dot, _drop_hive_dot_from_upstream, and _generate_fully_qualified_name are well-structured and handle platform-specific naming and transformations appropriately. The logging and error handling mechanisms are in place.


113-148: LGTM!

The AbstractViewUpstream class is well-structured and defines abstract methods for extracting upstream column references and dataset URNs.


169-194: LGTM!

The __get_spr method is well-structured and handles SQL parsing results appropriately. The logging and error handling mechanisms are in place.


196-214: LGTM!

The __get_upstream_dataset_urn method is well-structured and handles upstream dataset URN resolution appropriately. The logging and error handling mechanisms are in place.


216-242: LGTM!

The create_fields method is well-structured and handles field creation appropriately. The logging and error handling mechanisms are in place.


244-281: LGTM!

The get_upstream_column_ref method is well-structured and handles upstream column references appropriately. The logging and error handling mechanisms are in place.

metadata-ingestion/tests/integration/lookml/lookml_same_name_views_different_file_path.json (2)

Line range hint 386-694: Ensure the correctness of browse paths.

The browse paths appear correct, but it's crucial to verify if the container references and path IDs are accurate and exist in the respective datasets.

Verification successful

Browse paths and container references are correct.

The container references and path IDs (urn:li:container:78f22c19304954b15e8adb1d9809975e, Develop, path1, path2, foo) exist in the dataset definitions, validating the correctness of the browse paths.

  • urn:li:container:78f22c19304954b15e8adb1d9809975e is found in multiple dataset definitions.
  • id: "Develop", id: "path1", id: "path2", and id: "foo" are all confirmed in the dataset definitions.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the existence of container references and path IDs in the respective datasets.

# Test: Search for the container references and path IDs in the dataset definitions. Expect: Container references and path IDs exist in the datasets.
rg --type json 'urn:li:container:78f22c19304954b15e8adb1d9809975e'
rg --type json '"id": "Develop"'
rg --type json '"id": "path1"'
rg --type json '"id": "path2"'
rg --type json '"id": "foo"'

Length of output: 80442


173-229: Ensure the correctness of fine-grained lineage.

The fine-grained lineage entries appear correct, but it's crucial to verify if the upstream and downstream field references are accurate and exist in the respective datasets.

Verification successful

Fine-grained lineage entries are valid.

The upstream and downstream field references exist in the respective datasets, confirming the correctness of the fine-grained lineage entries.

  • No issues found with the fine-grained lineage entries.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the existence of upstream and downstream field references in the respective datasets.

# Test: Search for the field references in the dataset definitions. Expect: Field references exist in the datasets.
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,warehouse.default_db.default_schema.my_table,DEV),country|city|is_latest|timestamp|measurement)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.path1.foo.view.my_view,PROD),country|city|is_latest|timestamp|average_measurement)'

Length of output: 358



Script:

#!/bin/bash
# Description: Verify the existence of upstream and downstream field references in the respective datasets.

# Test: Search for the specific field references in the dataset definitions. Expect: Field references exist in the datasets.
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,warehouse.default_db.default_schema.my_table,DEV),country)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,warehouse.default_db.default_schema.my_table,DEV),city)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,warehouse.default_db.default_schema.my_table,DEV),is_latest)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,warehouse.default_db.default_schema.my_table,DEV),timestamp)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,warehouse.default_db.default_schema.my_table,DEV),measurement)'

rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.path1.foo.view.my_view,PROD),country)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.path1.foo.view.my_view,PROD),city)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.path1.foo.view.my_view,PROD),is_latest)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.path1.foo.view.my_view,PROD),timestamp)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.path1.foo.view.my_view,PROD),average_measurement)'

Length of output: 1402

metadata-ingestion/setup.py (1)

169-169: Dependency addition looks good!

The addition of python-liquid to the looker_common and mode dependencies is correctly formatted and logically consistent.

Also applies to: 374-374

metadata-ingestion/tests/integration/lookml/test_lookml.py (2)

896-928: LGTM!

The function test_view_to_view_lineage_and_liquid_template is well-structured and correctly sets up the pipeline with liquid variables. The use of freeze_time ensures consistent test results. The golden file verification is a good practice to ensure the correctness of the output.


931-1004: LGTM!

The function test_special_liquid_variables is well-structured and correctly checks the handling of special liquid variables. The use of freeze_time ensures consistent test results. The assertions ensure that the default values are correctly added and that the actual values are not overwritten.

metadata-ingestion/tests/integration/looker/test_looker.py (1)

1053-1080: LGTM!

The function test_upstream_cll is well-structured and correctly sets up the mock Looker explore. The use of freeze_time ensures consistent test results. The mock configuration is well-defined. The assertions ensure that the upstream fields are correctly set.

metadata-ingestion/tests/integration/lookml/lookml_reachable_views.json (2)

604-616: Ensure dataset URNs are updated consistently.

The dataset URN for lkml_samples.view.owners should be updated consistently across all aspects.


386-398: Ensure dataset URNs are updated consistently.

The dataset URN for lkml_samples.view.my_view should be updated consistently across all aspects.

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (8)

4-7: Import statements look good.

The added imports are necessary for the updated functionality.


38-45: New imports from looker_common are appropriate.

The added imports from looker_common are necessary for the updated functionality.


98-98: New import for ColumnRef is appropriate.

The added import for ColumnRef is necessary for fine-grained lineage extraction.


875-875: Ensure consistent usage of LookerRefinementResolver.

The LookerRefinementResolver instance is correctly instantiated and used for explore refinement.


912-918: Ensure proper initialization of LookerViewIdCache.

The LookerViewIdCache instance is correctly instantiated with the necessary parameters.


972-980: Ensure proper initialization of LookerViewContext.

The LookerViewContext instance is correctly instantiated with the necessary parameters.


985-994: Ensure proper initialization of LookerView from Looker dictionary.

The LookerView instance is correctly instantiated with the necessary parameters.


632-635: Improve exception handling by chaining exceptions.

Use raise ... from err to distinguish the exception from errors in exception handling.

- raise ValueError(f"Could not locate a project name for model {model_name}. Consider configuring a static project name in your config file")
+ raise ValueError(f"Could not locate a project name for model {model_name}. Consider configuring a static project name in your config file") from err

Likely invalid or redundant comment.

Tools
Ruff

632-635: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (9)

131-138: Improve comment clarity.

The comments explaining the logic can be made clearer for better understanding.

-    # Remove duplicates filed from self.fields
+    # Remove duplicate fields from the provided list of fields.
-    # Logic is: If more than a field has same ViewField.name then keep only one filed where ViewField.field_type
+    # Logic: If more than one field has the same ViewField.name, keep only the field where ViewField.field_type
-    # is DIMENSION_GROUP.
+    # is DIMENSION_GROUP.
-    # Looker Constraint:
+    # Looker Constraints:
-    #   - Any field declared as dimension or measure can be redefined as dimension_group.
+    #   - Any field declared as a dimension or measure can be redefined as a dimension_group.
-    #   - Any field declared in dimension can't be redefined in measure and vice-versa.
+    #   - Any field declared as a dimension can't be redefined as a measure and vice-versa.
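
Read as code, the deduplication rule amounts to keeping the DIMENSION_GROUP variant when field names collide; a self-contained sketch with stand-in types (not the looker_common implementation):

from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class ViewFieldType(Enum):  # stand-in for the source's enum
    DIMENSION = "Dimension"
    DIMENSION_GROUP = "Dimension Group"
    MEASURE = "Measure"


@dataclass
class ViewField:  # stand-in with only the attributes the rule needs
    name: str
    field_type: ViewFieldType


def deduplicate_fields(fields: List[ViewField]) -> List[ViewField]:
    # If more than one field shares a name, keep only the DIMENSION_GROUP one.
    by_name: Dict[str, ViewField] = {}
    for field in fields:
        current = by_name.get(field.name)
        if current is None or field.field_type == ViewFieldType.DIMENSION_GROUP:
            by_name[field.name] = field
    return list(by_name.values())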

296-297: Verify the type hint for upstream_fields.

Ensure that the type hint Union[List[ColumnRef]] is appropriate and consider if it should be List[ColumnRef] instead.

-    upstream_fields: Union[List[ColumnRef]] = dataclasses_field(default_factory=list)
+    upstream_fields: List[ColumnRef] = dataclasses_field(default_factory=list)

299-332: Improve comment clarity.

The comments explaining the logic can be made clearer for better understanding.

-    # It is the list of ColumnRef for derived view defined using SQL otherwise simple column name
+    # It is the list of ColumnRef for a derived view defined using SQL, otherwise a simple column name.

340-402: Improve comment clarity.

The comments explaining the logic can be made clearer for better understanding.

-            return None  # Inconsistent info received
+            return None  # Inconsistent information received.
-            # remove variant at the end. +1 for "_"
+            # Remove variant at the end. +1 for "_".
-        assert view_name  # for lint false positive
+        assert view_name  # For lint false positive.

403-456: Improve comment clarity.

The comments explaining the logic can be made clearer for better understanding.

-            )  # Variant i.e. Month, Day, Year ... is not available
+            )  # Variant (e.g., Month, Day, Year, etc.) is not available.
-            )  # for Dimensional Group the type is always start with date_[time|date]
+            )  # For Dimensional Group, the type always starts with date_[time|date].
-            )  # if the explore field is generated because of  Dimensional Group in View
-            # then the field_name should ends with field_group_variant
+            )  # If the explore field is generated because of Dimensional Group in View,
+            # then the field_name should end with field_group_variant.

Line range hint 459-467: LGTM!

The function create_view_project_map is correct and straightforward.


Line range hint 844-895: Improve comment clarity.

The comments explaining the logic can be made clearer for better understanding.

-        # The view name that the explore refers to is resolved in the following order of priority:
-        # 1. view_name: https://cloud.google.com/looker/docs/reference/param-explore-view-name
-        # 2. from: https://cloud.google.com/looker/docs/reference/param-explore-from
-        # 3. default to the name of the explore
+        # The view name that the explore refers to is resolved in the following order of priority:
+        # 1. view_name: https://cloud.google.com/looker/docs/reference/param-explore-view-name
+        # 2. from: https://cloud.google.com/looker/docs/reference/param-explore-from
+        # 3. Default to the name of the explore.
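
Stated as plain Python, the priority order above reads roughly as follows (a sketch over a raw explore dict, not the connector's code):

from typing import Any, Dict


def resolve_explore_view_name(explore: Dict[str, Any]) -> str:
    # Priority: explicit view_name, then from, then the explore's own name.
    return explore.get("view_name") or explore.get("from") or explore["name"]


assert resolve_explore_view_name({"name": "orders", "from": "orders_base"}) == "orders_base"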

1083-1103: Improve comment clarity.

The comments explaining the logic can be made clearer for better understanding.

-            # form upstream of fields as all information is now available
+            # Form upstream of fields as all information is now available.

Line range hint 1217-1267: Improve comment clarity.

The comments explaining the logic can be made clearer for better understanding.

-                # if we raise error on file_path equal to None then existing test-cases will fail as mock data
-                # doesn't have required attributes.
+                # If we raise an error on file_path equal to None, then existing test cases will fail as mock data
+                # doesn't have the required attributes.
metadata-ingestion/tests/integration/lookml/vv_lineage_liquid_template_golden.json (27)

3-16: Ensure container properties are correctly defined.

The container properties aspect appears to be correctly defined with custom properties, name, and other metadata.


27-32: Ensure status aspect is correctly defined.

The status aspect for the container is correctly defined with the removed field set to false.


43-48: Ensure dataPlatformInstance aspect is correctly defined.

The dataPlatformInstance aspect correctly identifies the platform as Looker.


59-66: Ensure subTypes aspect is correctly defined.

The subTypes aspect correctly identifies the type as "LookML Project".


77-86: Ensure browsePathsV2 aspect is correctly defined.

The browsePathsV2 aspect correctly defines the path for the container.


97-104: Ensure subTypes aspect is correctly defined.

The subTypes aspect correctly identifies the type as "View".


133-138: Ensure container aspect is correctly defined.

The container aspect correctly identifies the container URN.


146-246: Ensure proposedSnapshot is correctly defined.

The proposedSnapshot aspect includes various metadata aspects such as BrowsePaths, Status, UpstreamLineage, SchemaMetadata, and DatasetProperties. Ensure all fields are correctly defined and consistent with the dataset's metadata.


283-290: Ensure subTypes aspect is correctly defined.

The subTypes aspect correctly identifies the type as "View".


319-324: Ensure container aspect is correctly defined.

The container aspect correctly identifies the container URN.


332-526: Ensure proposedSnapshot is correctly defined.

The proposedSnapshot aspect includes various metadata aspects such as BrowsePaths, Status, UpstreamLineage, SchemaMetadata, and DatasetProperties. Ensure all fields are correctly defined and consistent with the dataset's metadata.


563-570: Ensure subTypes aspect is correctly defined.

The subTypes aspect correctly identifies the type as "View".


599-604: Ensure container aspect is correctly defined.

The container aspect correctly identifies the container URN.


612-774: Ensure proposedSnapshot is correctly defined.

The proposedSnapshot aspect includes various metadata aspects such as BrowsePaths, Status, UpstreamLineage, SchemaMetadata, and DatasetProperties. Ensure all fields are correctly defined and consistent with the dataset's metadata.


811-818: Ensure subTypes aspect is correctly defined.

The subTypes aspect correctly identifies the type as "View".


847-852: Ensure container aspect is correctly defined.

The container aspect correctly identifies the container URN.


860-1022: Ensure proposedSnapshot is correctly defined.

The proposedSnapshot aspect includes various metadata aspects such as BrowsePaths, Status, UpstreamLineage, SchemaMetadata, and DatasetProperties. Ensure all fields are correctly defined and consistent with the dataset's metadata.


1059-1066: Ensure subTypes aspect is correctly defined.

The subTypes aspect correctly identifies the type as "View".


1095-1100: Ensure container aspect is correctly defined.

The container aspect correctly identifies the container URN.


1108-1270: Ensure proposedSnapshot is correctly defined.

The proposedSnapshot aspect includes various metadata aspects such as BrowsePaths, Status, UpstreamLineage, SchemaMetadata, and DatasetProperties. Ensure all fields are correctly defined and consistent with the dataset's metadata.


1283-1296: Ensure browsePathsV2 aspect is correctly defined.

The browsePathsV2 aspect correctly defines the path for the dataset.


1307-1312: Ensure tagKey aspect is correctly defined.

The tagKey aspect correctly defines the name of the tag as "Dimension".


1323-1328: Ensure tagKey aspect is correctly defined.

The tagKey aspect correctly defines the name of the tag as "Measure".


301-308: Ensure viewProperties aspect is correctly defined.

The viewProperties aspect defines the view logic and language correctly. However, ensure the SQL in the viewLogic field is validated for correctness.


115-122: Ensure viewProperties aspect is correctly defined.

The viewProperties aspect defines the view logic and language correctly. However, ensure the SQL in the viewLogic field is validated for correctness.

Verification successful

We need to validate the SQL in the viewLogic field within the LookML configuration. The previous attempt using sqlglot failed because the command was not found in the environment. Let's check for other available tools that might help us validate LookML SQL.


The automated SQL validation tools are not available in the current environment. Therefore, we need to manually inspect the SQL in the viewLogic field for correctness.

Ensure viewProperties aspect is correctly defined:

  • The SQL logic within the viewLogic field appears to be using LookML templating correctly.
  • The sql_table_name is dynamically set based on the _user_attributes['looker_env'] value, which is a common practice in LookML.
  • The dimension definition for generated_message_id seems properly formatted with the necessary attributes (group_label, primary_key, type, sql).

Given the context and the manual inspection, the SQL logic seems to be correctly defined.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Validate the SQL in the viewLogic field.

# Test: Extract and validate the SQL.
echo 'view: activity_logs {
  sql_table_name:
        {% if _user_attributes['looker_env'] == 'dev' %}
          {{ _user_attributes['dev_database_prefix'] }}analytics.{{ _user_attributes['dev_schema_prefix'] }}staging_app.stg_app__activity_logs
        {% elsif _user_attributes['looker_env'] == 'prod' %}
          analytics.staging_app.stg_app__activity_logs
        {% else %}
          analytics.staging_app.stg_app__activity_logs
        {% endif %}
        ;;
  
  dimension: generated_message_id {
    group_label: "IDs"
    primary_key: yes
    type: number
    sql: ${TABLE}."GENERATED_MESSAGE_ID" ;;
  }
}' | sqlglot transpile --dialect lookml

Length of output: 720



Script:

#!/bin/bash
# Check for available SQL validation tools in the environment
command -v sqlfluff
command -v sqlparse
command -v lookml

Length of output: 63


829-836: Ensure viewProperties aspect is correctly defined.

The viewProperties aspect defines the view logic and language correctly. However, ensure the SQL in the viewLogic field is validated for correctness.


581-588: Ensure viewProperties aspect is correctly defined.

The viewProperties aspect defines the view logic and language correctly. However, ensure the SQL in the viewLogic field is validated for correctness.

metadata-ingestion/tests/integration/lookml/expected_output.json (12)

170-187: Ensure Consistent Use of Dataset URNs

The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,my_table,PROD) is correct and consistent across the file.

Verification successful

Dataset URNs are Consistently Formatted

The dataset URN urn:li:dataset:(urn:li:dataPlatform:postgres,my_table,PROD) is consistently used across the relevant files.

  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
  • metadata-ingestion/tests/integration/lookml/expected_output.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of dataset URNs in the JSON file.

# Test: Search for all dataset URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:postgres,my_table,PROD\)'

Length of output: 2430


894-894: Ensure Consistent Use of Dataset URNs

The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.events,PROD) is correct and consistent across the file.

Verification successful

Consistent Use of Dataset URNs Verified

The dataset URN urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.events,PROD) is consistently formatted across the relevant JSON files.

  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
  • metadata-ingestion/tests/integration/lookml/expected_output.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of dataset URNs in the JSON file.

# Test: Search for all dataset URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:postgres,.looker_schema.events,PROD\)'

Length of output: 1300


1020-1020: Ensure Consistent Use of Dataset URNs

The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.events,PROD) is correct and consistent across the file.

Verification successful

Ensure Consistent Use of Dataset URNs

The dataset URN urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.events,PROD) is consistently formatted and used across the files metadata-ingestion/tests/integration/lookml/expected_output.json and metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json.

  • metadata-ingestion/tests/integration/lookml/expected_output.json
  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of dataset URNs in the JSON file.

# Test: Search for all dataset URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:postgres,.looker_schema.events,PROD\)'

Length of output: 1300


1330-1330: Ensure Consistent Use of Dataset URNs

The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD) is correct and consistent across the file.

Verification successful

Consistent Use of Dataset URNs Verified

The dataset URN urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD) is consistently formatted across the file and other related files.

  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
  • metadata-ingestion/tests/integration/lookml/expected_output.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of dataset URNs in the JSON file.

# Test: Search for all dataset URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD\)'

Length of output: 885


1456-1471: Ensure Consistent Use of Dataset URNs

The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,fragment_derived_view,PROD) is correct and consistent across the file.

Verification successful

Dataset URNs are consistent

The dataset URNs are consistently formatted across the relevant JSON files.

  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
  • metadata-ingestion/tests/integration/lookml/expected_output.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of dataset URNs in the JSON file.

# Test: Search for all dataset URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:postgres,fragment_derived_view,PROD\)'

Length of output: 1739


488-511: Ensure Consistent Use of Derived View URNs

The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD) is correct and consistent across the file.

Verification successful

Consistent Use of Derived View URNs Verified

The dataset URN for the upstream lineage urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD) is consistently formatted across the file.

  • metadata-ingestion/tests/integration/lookml/expected_output.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json
  • metadata-ingestion/tests/integration/lookml/lookml_reachable_views.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json
  • metadata-ingestion/tests/integration/looker/golden_test_ingest_unaliased_joins.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of derived view URNs in the JSON file.

# Test: Search for all derived view URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD\)'

Length of output: 23279


Line range hint 1807-1826: Ensure Consistent Use of Dataset URNs

The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,.ecommerce.ability,PROD) is correct and consistent across the file.

Verification successful

Ensure Consistent Use of Dataset URNs

The dataset URN for the upstream lineage is consistent across the file and other related files.

  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
  • metadata-ingestion/tests/integration/lookml/expected_output.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of dataset URNs in the JSON file.

# Test: Search for all dataset URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:postgres,.ecommerce.ability,PROD\)'

Length of output: 1285


188-229: Verify Field Lineage Consistency

Ensure that the field lineage information is accurate and consistent with the dataset URNs and field paths. Verify that the field paths and URNs are correct.

Verification successful

Verify Field Lineage Consistency

The field lineage information is accurate and consistent with the dataset URNs and field paths.

  • Fields verified: city, is_latest, timestamp, measurement
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of field lineage information in the JSON file.

# Test: Search for all field URNs. Expect: Consistent formatting and correct field paths.
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,my_table,PROD\),country\)'

Length of output: 508



Script:

#!/bin/bash
# Description: Verify consistency of field lineage information in the JSON file.

# Test: Search for all field URNs mentioned in the review comment. Expect: Consistent formatting and correct field paths.
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,my_table,PROD\),city\)'
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,my_table,PROD\),is_latest\)'
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,my_table,PROD\),timestamp\)'
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,my_table,PROD\),measurement\)'

Length of output: 2050


512-533: Verify Field Lineage Consistency for Derived Views

Ensure that the field lineage information for derived views is accurate and consistent with the dataset URNs and field paths. Verify that the field paths and URNs are correct.

Verification successful

Field Lineage Consistency Verified

The field lineage information for derived views, including the dataset URNs and field paths for "timestamp" and "measurement," is accurate and consistent across the JSON files.

  • File: metadata-ingestion/tests/integration/lookml/expected_output.json
  • Lines: 512-533
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of field lineage information for derived views in the JSON file.

# Test: Search for all field URNs in derived views. Expect: Consistent formatting and correct field paths.
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD\),country\)'

Length of output: 3370



Script:

#!/bin/bash
# Description: Verify consistency of field lineage information for derived views in the JSON file.

# Test: Search for all field URNs in derived views for "timestamp" and "measurement". Expect: Consistent formatting and correct field paths.
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD\),(timestamp|measurement)\)'

Length of output: 4959


1204-1204: Ensure Consistent Use of Dataset URNs

The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,..autodetect_sql_name_based_on_view_name,PROD) is correct and consistent across the file.


1472-1493: Verify Field Lineage Consistency

Ensure that the field lineage information is accurate and consistent with the dataset URNs and field paths. Verify that the field paths and URNs are correct.

Verification successful

Field Lineage Consistency Verified

The field lineage information for 'platform' and 'country' fields is accurate and consistent with the dataset URNs and field paths.

  • metadata-ingestion/tests/integration/lookml/expected_output.json
  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of field lineage information in the JSON file.

# Test: Search for all field URNs. Expect: Consistent formatting and correct field paths.
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,fragment_derived_view,PROD\),date\)'

Length of output: 538



Script:

#!/bin/bash
# Description: Verify consistency of field lineage information in the JSON file for 'platform' and 'country' fields.

# Test: Search for all field URNs related to 'platform'.
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,fragment_derived_view,PROD\),platform\)'

# Test: Search for all field URNs related to 'country'.
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,fragment_derived_view,PROD\),country\)'

Length of output: 1098


768-768: Ensure Consistent Use of Dataset URNs

The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD) is correct and consistent across the file.

Verification successful

Ensure Consistent Use of Dataset URNs

The dataset URN urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD) is correctly and consistently formatted across the relevant files.

  • metadata-ingestion/tests/integration/lookml/expected_output.json
  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of dataset URNs in the JSON file.

# Test: Search for all dataset URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD\)'

Length of output: 885



Script:

#!/bin/bash
# Description: List all unique dataset URNs in the JSON files to verify consistency.

# Extract all dataset URNs from the JSON files and list unique occurrences.
rg --type json '"dataset": "urn:li:dataset:([^"]+)"' -o -r '$1' | sort | uniq -c

Length of output: 106480

metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json (5)

173-229: Ensure Consistency in Field Names Between Upstream and Downstream.

The fine-grained lineage mappings should be consistent with the field names defined in the dataset schemas. Verify that the field names country, city, is_latest, timestamp, and measurement in the upstream and downstream datasets are correct and consistent.


488-533: Ensure Consistency in Field Names Between Upstream and Downstream.

The fine-grained lineage mappings should be consistent with the field names defined in the dataset schemas. Verify that the field names country, city, timestamp, measurement, and average_measurement in the upstream and downstream datasets are correct and consistent.


Line range hint 1405-1493: Verify View Logic and Field Mapping.

Ensure that the view logic and field mappings are correct and consistent with the dataset schema. The field names date, platform, and country should be verified for correctness.


1644-1644: Verify View Logic and Field Mapping.

Ensure that the view logic and field mappings are correct and consistent with the dataset schema. The field names customer_id, sale_price, and order_region should be verified for correctness.


1459-1493: Ensure Consistency in Field Names Between Upstream and Downstream.

The fine-grained lineage mappings should be consistent with the field names defined in the dataset schemas. Verify that the field names date, platform, and country in the upstream and downstream datasets are correct and consistent.

metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json (10)

173-229: Ensure the consistency of field names and types in fine-grained lineages.

The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.


488-533: Ensure the consistency of field names and types in fine-grained lineages.

The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.


1462-1493: Ensure the consistency of field names and types in fine-grained lineages.

The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.


Line range hint 1644-1649: Ensure the consistency of field names and types in fine-grained lineages.

The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.


Line range hint 1518-1552: Ensure the consistency of field names and types in fine-grained lineages.

The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.


1405-1407: Ensure the consistency of field names and types in fine-grained lineages.

The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.


1464-1492: Ensure the consistency of field names and types in fine-grained lineages.

The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.


Line range hint 1644-1650: Ensure the consistency of field names and types in fine-grained lineages.

The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.


Line range hint 1535-1552: Ensure the consistency of field names and types in fine-grained lineages.

The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.


Line range hint 1518-1535: Ensure the consistency of field names and types in fine-grained lineages.

The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.

metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json (5)

173-229: Ensure consistency in fine-grained lineage.

The structure of the fine-grained lineage looks correct. However, ensure that all upstream and downstream fields are correctly mapped and the confidence score is accurate.


488-533: Ensure consistency in fine-grained lineage.

The structure of the fine-grained lineage looks correct. However, ensure that all upstream and downstream fields are correctly mapped and the confidence score is accurate.


1461-1493: Ensure consistency in fine-grained lineage.

The structure of the fine-grained lineage looks correct. However, ensure that all upstream and downstream fields are correctly mapped and the confidence score is accurate.


1644-1644: Ensure the correctness of view logic.

The view logic for the customer_facts dataset includes a conditional clause. Verify that the condition syntax and logic are correct.


1405-1405: Ensure the correctness of view logic.

The view logic for the fragment_derived_view dataset includes a SELECT statement. Verify that the SQL syntax and logic are correct.

metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json (4)

Line range hint 1405-1493: Verify the aliasing of platform to aliased_platform.

The field platform is aliased to aliased_platform. Ensure this aliasing is correct and intentional.

Verification successful

Verification successful: The aliasing of platform to aliased_platform is correct and intentional.

The aliasing of platform to aliased_platform is consistently applied across multiple files in the codebase, indicating that it is deliberate and part of the intended design.

  • Files verified:
    • metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json
    • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
    • metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
    • metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json
    • metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json
    • metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json
    • metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json
    • metadata-ingestion/tests/integration/lookml/expected_output.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the aliasing of `platform` to `aliased_platform` in the codebase.

# Test: Search for the usage of `platform` and `aliased_platform` in the codebase. Expect: Only intentional aliasing.
rg --type json $'platform' | rg --type json $'aliased_platform'

Length of output: 4584


1644-1644: Verify the liquid condition tag usage.

Ensure the liquid condition tag {% condition order_region %} order.region {% endcondition %} is used correctly and intentionally.

Verification successful

The liquid condition tag {% condition order_region %} order.region {% endcondition %} is used consistently and intentionally across multiple test files.

  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json
  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json
  • metadata-ingestion/tests/integration/lookml/expected_output.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the usage of the liquid condition tag in the codebase.

# Test: Search for the usage of the liquid condition tag in the codebase. Expect: Only intentional usages.
rg --type json $'{% condition order_region %} order.region {% endcondition %}'

Length of output: 215



Script:

#!/bin/bash
# Description: Verify the usage of the liquid condition tag in the codebase.

# Test: Search for the usage of the liquid condition tag in the codebase. Expect: Only intentional usages.
rg --type json '\{% condition order_region %\} order\.region \{% endcondition %\}'

Length of output: 2731


488-533: Verify the mapping of measurement to average_measurement.

The field measurement is mapped to average_measurement downstream. Ensure this mapping is correct and intentional.

Verification successful

The mapping of measurement to average_measurement is intentional and consistent.

The field measurement is consistently mapped to average_measurement downstream across multiple files, indicating that this is a deliberate design choice.

  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json
  • metadata-ingestion/tests/integration/lookml/lookml_reachable_views.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the mapping of `measurement` to `average_measurement` in the codebase.

# Test: Search for the usage of `measurement` and `average_measurement` in the codebase. Expect: Only intentional mappings.
rg --type json $'measurement' | rg --type json $'average_measurement'

Length of output: 7650



Script:

#!/bin/bash
# Description: Verify the mapping of `measurement` to `average_measurement` in the codebase by examining context.

# Search for the usage of `measurement` in the codebase.
rg --type json --context 5 'measurement'

# Search for the usage of `average_measurement` in the codebase.
rg --type json --context 5 'average_measurement'

Length of output: 195286


173-229: Verify the mapping of measurement to average_measurement.

The field measurement is mapped to average_measurement downstream. Ensure this mapping is correct and intentional.

metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json (3)

162-229: Ensure consistency in field naming conventions.

The field average_measurement in the downstream is not consistent with the upstream field measurement. This could lead to confusion or errors in lineage tracking.

Is this intentional? If not, consider renaming the downstream field to measurement to maintain consistency.


477-533: Ensure consistency in field naming conventions.

The field average_measurement in the downstream is not consistent with the upstream field measurement. This could lead to confusion or errors in lineage tracking.

Is this intentional? If not, consider renaming the downstream field to measurement to maintain consistency.


Line range hint 1644-1653: Ensure proper handling of liquid conditions.

The view logic includes a liquid condition tag {% condition order_region %} order.region {% endcondition %}. Ensure that the liquid condition is correctly parsed and resolved during execution.

Is there a mechanism in place to handle liquid conditions in the view logic?
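A minimal sketch of one such mechanism, assuming the ConditionTag class quoted later in this review can be imported from datahub.ingestion.source.looker.looker_template_language (import path assumed) and registered on a python-liquid Environment; rendering the template with a value for order_region collapses the tag into plain SQL that sqlglot can parse:

from liquid import Environment

# Assumed import path; the class itself is quoted further down in this review.
from datahub.ingestion.source.looker.looker_template_language import ConditionTag

env = Environment()
env.add_tag(ConditionTag)  # register the custom {% condition %} tag

sql = env.from_string(
    "SELECT * FROM order WHERE {% condition order_region %} order.region {% endcondition %}"
).render(order_region="ap-south-1")

print(sql)  # SELECT * FROM order WHERE order.region='ap-south-1'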

metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json (2)

1466-1500: Ensure Consistency in Field Names

The fineGrainedLineages section contains field mappings between upstream and downstream datasets. Ensure that the field names are consistent and correctly mapped.

Verification successful

Ensure Consistency in Field Names

The extracted fineGrainedLineages section shows consistent field mappings between upstreams and downstreams with expected transformations (e.g., aliasing). The field names are correctly mapped, maintaining consistency across the datasets.

  • Example Mappings:
    • upstreams: country -> downstreams: country
    • upstreams: platform -> downstreams: aliased_platform
    • upstreams: measurement -> downstreams: average_measurement

The field names in the fineGrainedLineages section are consistent and correctly mapped.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of field names in fineGrainedLineages.

# Test: Search for field names in the provided JSON file. Expect: Consistent field names across upstream and downstream mappings.
rg --type json 'fineGrainedLineages' metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json

Length of output: 508



Script:

#!/bin/bash
# Description: Extract and compare field names in fineGrainedLineages for consistency.

# Extract upstream and downstream field names within fineGrainedLineages
jq '.[] | select(has("fineGrainedLineages")) | .fineGrainedLineages[] | {upstreams: .upstreams, downstreams: .downstreams}' metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json

Length of output: 205



Script:

#!/bin/bash
# Description: Extract and compare field names in fineGrainedLineages for consistency.

# Extract upstream and downstream field names within fineGrainedLineages
jq '.. | .fineGrainedLineages? // empty | .[] | {upstreams: .upstreams, downstreams: .downstreams}' metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json

Length of output: 6540


173-229: Ensure Consistency in Field Names

The fineGrainedLineages section contains field mappings between upstream and downstream datasets. Ensure that the field names are consistent and correctly mapped.

metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json (10)

170-170: Verify the dataset reference.

Ensure that the dataset reference urn:li:dataset:(urn:li:dataPlatform:postgres,my_table,PROD) is correct and consistent with the rest of the data.


173-229: Verify the fine-grained lineage information.

Ensure that the fine-grained lineage information for fields like country, city, is_latest, timestamp, and measurement is correct and follows the expected format.


488-488: Verify the upstream dataset reference.

Ensure that the upstream dataset reference urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.foo.view.my_view,PROD) is correct and consistent with the rest of the data.


491-536: Verify the fine-grained lineage information.

Ensure that the fine-grained lineage information for fields like country, city, timestamp, measurement, and average_measurement is correct and follows the expected format.


774-774: Verify the upstream dataset reference.

Ensure that the upstream dataset reference urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD) is correct and consistent with the rest of the data.


1032-1032: Verify the upstream dataset reference.

Ensure that the upstream dataset reference urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.events,PROD) is correct and consistent with the rest of the data.


Line range hint 1040-1049: Verify the fine-grained lineage information.

Ensure that the fine-grained lineage information for the additional_measure field is correct and follows the expected format.


1219-1219: Verify the upstream dataset reference.

Ensure that the upstream dataset reference urn:li:dataset:(urn:li:dataPlatform:postgres,..autodetect_sql_name_based_on_view_name,PROD) is correct and consistent with the rest of the data.


1348-1348: Verify the upstream dataset reference.

Ensure that the upstream dataset reference urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD) is correct and consistent with the rest of the data.


1477-1477: Verify the upstream dataset reference.

Ensure that the upstream dataset reference urn:li:dataset:(urn:li:dataPlatform:postgres,fragment_derived_view,PROD) is correct and consistent with the rest of the data.

Comment on lines +173 to +185
],
"fineGrainedLineages": [
{
"upstreamType": "FIELD_SET",
"upstreams": [
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,warehouse.default_db.default_schema.my_table,DEV),country)"
],
"downstreamType": "FIELD",
"downstreams": [
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD),country)"
],
"confidenceScore": 1.0
},
Contributor

Ensure consistency in field names.

The downstreamType for the field measurement should match the upstream field's name. The downstreams field should use the same field name measurement instead of average_measurement, unless this transformation is intentional and documented.

- "urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD),average_measurement)"
+ "urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD),measurement)"

Committable suggestion was skipped due to low confidence.

Comment on lines +1448 to +1495
{
"com.linkedin.pegasus2avro.dataset.UpstreamLineage": {
"upstreams": [
{
"auditStamp": {
"time": 1586847600000,
"actor": "urn:li:corpuser:datahub"
},
"dataset": "urn:li:dataset:(urn:li:dataPlatform:snowflake,default_db.default_schema.fragment_derived_view,PROD)",
"type": "VIEW"
}
],
"fineGrainedLineages": [
{
"upstreamType": "FIELD_SET",
"upstreams": [
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,default_db.default_schema.fragment_derived_view,PROD),date)"
],
"downstreamType": "FIELD",
"downstreams": [
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.fragment_derived_view,PROD),date)"
],
"confidenceScore": 1.0
},
{
"upstreamType": "FIELD_SET",
"upstreams": [
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,default_db.default_schema.fragment_derived_view,PROD),platform)"
],
"downstreamType": "FIELD",
"downstreams": [
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.fragment_derived_view,PROD),aliased_platform)"
],
"confidenceScore": 1.0
},
{
"upstreamType": "FIELD_SET",
"upstreams": [
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,default_db.default_schema.fragment_derived_view,PROD),country)"
],
"downstreamType": "FIELD",
"downstreams": [
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.fragment_derived_view,PROD),country)"
],
"confidenceScore": 1.0
}
]
}
Contributor

Ensure completeness of field definitions.

The fields date, aliased_platform, and country are mentioned in the view logic but not defined in the schema metadata. This could lead to incomplete metadata representation.

Ensure that all fields used in the view logic are defined in the schema metadata.

{
  "fieldPath": "date",
  "nullable": false,
  "description": "Date",
  "label": "",
  "type": {
    "type": {
      "com.linkedin.pegasus2avro.schema.StringType": {}
    }
  },
  "nativeDataType": "string",
  "recursive": false,
  "globalTags": {
    "tags": []
  },
  "isPartOfKey": false
},
{
  "fieldPath": "aliased_platform",
  "nullable": false,
  "description": "Platform",
  "label": "",
  "type": {
    "type": {
      "com.linkedin.pegasus2avro.schema.StringType": {}
    }
  },
  "nativeDataType": "string",
  "recursive": false,
  "globalTags": {
    "tags": []
  },
  "isPartOfKey": false
},
{
  "fieldPath": "country",
  "nullable": false,
  "description": "Country",
  "label": "",
  "type": {
    "type": {
      "com.linkedin.pegasus2avro.schema.StringType": {}
    }
  },
  "nativeDataType": "string",
  "recursive": false,
  "globalTags": {
    "tags": []
  },
  "isPartOfKey": false
}

Comment on lines +27 to +41
def render_to_output(self, context: Context, buffer: TextIO) -> Optional[bool]:
filter_value: Optional[str] = cast(
str, context.globals.get(self.filter_name)
) # to silent lint

if filter_value is None:
raise CustomTagException(
f'filter {self.filter_name} value is not provided for "condition" tag'
)

filter_value = filter_value.strip()

buffer.write(f"{self.sql_or_lookml_reference}='{filter_value}'")

return True
Contributor

Handle potential edge cases in the render_to_output method.

Ensure that the filter_value is properly escaped to prevent SQL injection.

filter_value = filter_value.strip()
+        # Escape single quotes to prevent SQL injection
+        filter_value = filter_value.replace("'", "''")

Comment on lines +412 to +417
"aspectName": "subTypes",
"aspect": {
"json": {
"materialized": false,
"viewLogic": "view: owners {\n dimension: id {\n primary_key: yes\n sql: ${TABLE}.id ;;\n }\n dimension: owner_name {\n sql: ${TABLE}.owner_name ;;\n }\n}",
"viewLanguage": "lookml"
"typeNames": [
"View"
]
Contributor

Codebase Verification

Ensure dataset URNs are updated consistently.

The dataset URN urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.owners,PROD) is still present in multiple JSON files. Please ensure that all instances of this URN are updated consistently across the codebase.

  • metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json
  • metadata-ingestion/tests/integration/lookml/lookml_reachable_views.json
  • metadata-ingestion/tests/integration/lookml/expected_output.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json
Analysis chain

Ensure dataset URNs are updated consistently.

The dataset URN for lkml_samples.view.owners should be updated consistently across all aspects.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify all instances of dataset URNs are updated consistently.

# Test: Search for all instances of the old and new dataset URNs. Expect: Only new URNs.
rg --type json $'urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.owners,PROD)'

Length of output: 92



Script:

#!/bin/bash
# Description: Verify all instances of dataset URNs are updated consistently.

# Test: Search for all instances of `lkml_samples.view.owners` URNs in JSON files.
rg --type json 'lkml_samples.view.owners'

Length of output: 10208

Comment on lines +66 to +87
def parse(self, stream: TokenStream) -> Node:
expect(stream, TOKEN_TAG, value=ConditionTag.TAG_START)

start_token = stream.current

stream.next_token()
expect(stream, TOKEN_EXPRESSION)
filter_name: str = stream.current.value.strip()

stream.next_token()
expect(stream, TOKEN_LITERAL)

sql_or_lookml_reference: str = stream.current.value.strip()

stream.next_token()
expect(stream, TOKEN_TAG, value=ConditionTag.TAG_END)

return ConditionNode(
tok=start_token,
sql_or_lookml_reference=sql_or_lookml_reference,
filter_name=filter_name,
)
Contributor

Ensure proper error handling in the parse method.

Add error handling for unexpected tokens to improve robustness.

def parse(self, stream: TokenStream) -> Node:
    try:
        expect(stream, TOKEN_TAG, value=ConditionTag.TAG_START)
        start_token = stream.current

        stream.next_token()
        expect(stream, TOKEN_EXPRESSION)
        filter_name: str = stream.current.value.strip()

        stream.next_token()
        expect(stream, TOKEN_LITERAL)
        sql_or_lookml_reference: str = stream.current.value.strip()

        stream.next_token()
        expect(stream, TOKEN_TAG, value=ConditionTag.TAG_END)

        return ConditionNode(
            tok=start_token,
            sql_or_lookml_reference=sql_or_lookml_reference,
            filter_name=filter_name,
        )
+    except Exception as e:
+        raise CustomTagException(f"Error parsing condition tag: {e}")

Comment on lines 22 to 39
def _get_generic_definition(
looker_connection: DBConnection, platform: Optional[str] = None
) -> Tuple[str, Optional[str], Optional[str]]:
if platform is None:
# We extract the platform from the dialect name
dialect_name = looker_connection.dialect_name
assert dialect_name is not None
# generally the first part of the dialect name before _ is the name of the platform
# versions are encoded as numbers and can be removed
# e.g. spark1 or hive2 or druid_18
platform = re.sub(r"[0-9]+", "", dialect_name.split("_")[0])

assert (
platform is not None
), f"Failed to extract a valid platform from connection {looker_connection}"
db = looker_connection.database
schema = looker_connection.schema # ok for this to be None
return (platform, db, schema)
Contributor

Add error handling for missing Looker connection dialect name.

Ensure that missing dialect names are handled gracefully.

def _get_generic_definition(
    looker_connection: DBConnection, platform: Optional[str] = None
) -> Tuple[str, Optional[str], Optional[str]]:
    if platform is None:
        # We extract the platform from the dialect name
        dialect_name = looker_connection.dialect_name
        assert dialect_name is not None
        # generally the first part of the dialect name before _ is the name of the platform
        # versions are encoded as numbers and can be removed
        # e.g. spark1 or hive2 or druid_18
        platform = re.sub(r"[0-9]+", "", dialect_name.split("_")[0])

    assert (
        platform is not None
    ), f"Failed to extract a valid platform from connection {looker_connection}"
    db = looker_connection.database
    schema = looker_connection.schema  # ok for this to be None
    return (platform, db, schema)

Committable suggestion was skipped due to low confidence.
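For reference, a quick illustration (dialect names taken from the comment in the quoted helper) of how the platform extraction behaves:

import re

for dialect_name in ["spark1", "hive2", "druid_18"]:
    # split off any "_" suffix first, then strip the version digits
    platform = re.sub(r"[0-9]+", "", dialect_name.split("_")[0])
    print(dialect_name, "->", platform)

# spark1 -> spark
# hive2 -> hive
# druid_18 -> druid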

Comment on lines 12 to 19
def _get_bigquery_definition(
looker_connection: DBConnection,
) -> Tuple[str, Optional[str], Optional[str]]:
platform = "bigquery"
# bigquery project ids are returned in the host field
db = looker_connection.host
schema = looker_connection.database
return (platform, db, schema)
Contributor

Add error handling for missing Looker connection fields.

Ensure that missing fields in the Looker connection are handled gracefully.

def _get_bigquery_definition(
    looker_connection: DBConnection,
) -> Tuple[str, Optional[str], Optional[str]]:
    platform = "bigquery"
    # bigquery project ids are returned in the host field
    db = looker_connection.host
+    if db is None:
+        raise ConfigurationError("BigQuery project ID (host) is missing in the Looker connection.")
    schema = looker_connection.database
    return (platform, db, schema)

Comment on lines +43 to +97
def _load_viewfile(
self, project_name: str, path: str, reporter: LookMLSourceReport
) -> Optional[LookerViewFile]:
# always fully resolve paths to simplify de-dup
path = str(pathlib.Path(path).resolve())
allowed_extensions = [_VIEW_FILE_EXTENSION, _EXPLORE_FILE_EXTENSION]
matched_any_extension = [
match for match in [path.endswith(x) for x in allowed_extensions] if match
]
if not matched_any_extension:
# not a view file
logger.debug(
f"Skipping file {path} because it doesn't appear to be a view file. Matched extensions {allowed_extensions}"
)
return None

if self.is_view_seen(str(path)):
return self.viewfile_cache[path]

try:
with open(path) as file:
raw_file_content = file.read()
except Exception as e:
logger.debug(f"An error occurred while reading path {path}", exc_info=True)
self.reporter.report_failure(
path, f"failed to load view file {path} from disk: {e}"
)
return None
try:
logger.debug(f"Loading viewfile {path}")

parsed = load_lkml(path)

resolve_liquid_variable_in_view_dict(
raw_view=parsed,
liquid_variable=self.liquid_variable,
)

looker_viewfile = LookerViewFile.from_looker_dict(
absolute_file_path=path,
looker_view_file_dict=parsed,
project_name=project_name,
root_project_name=self._root_project_name,
base_projects_folder=self._base_projects_folder,
raw_file_content=raw_file_content,
reporter=reporter,
)
logger.debug(f"adding viewfile for path {path} to the cache")
self.viewfile_cache[path] = looker_viewfile
return looker_viewfile
except Exception as e:
logger.debug(f"An error occurred while parsing path {path}", exc_info=True)
self.reporter.report_failure(path, f"failed to load view file {path}: {e}")
return None

Contributor

Improve error handling in the _load_viewfile method.

Ensure that the method handles file reading and parsing errors gracefully.

def _load_viewfile(
    self, project_name: str, path: str, reporter: LookMLSourceReport
) -> Optional[LookerViewFile]:
    # always fully resolve paths to simplify de-dup
    path = str(pathlib.Path(path).resolve())
    allowed_extensions = [_VIEW_FILE_EXTENSION, _EXPLORE_FILE_EXTENSION]
    matched_any_extension = [
        match for match in [path.endswith(x) for x in allowed_extensions] if match
    ]
    if not matched_any_extension:
        # not a view file
        logger.debug(
            f"Skipping file {path} because it doesn't appear to be a view file. Matched extensions {allowed_extensions}"
        )
        return None

    if self.is_view_seen(str(path)):
        return self.viewfile_cache[path]

    try:
        with open(path) as file:
            raw_file_content = file.read()
    except Exception as e:
        logger.debug(f"An error occurred while reading path {path}", exc_info=True)
        self.reporter.report_failure(
            path, f"failed to load view file {path} from disk: {e}"
        )
        return None
    try:
        logger.debug(f"Loading viewfile {path}")

        parsed = load_lkml(path)

        resolve_liquid_variable_in_view_dict(
            raw_view=parsed,
            liquid_variable=self.liquid_variable,
        )

        looker_viewfile = LookerViewFile.from_looker_dict(
            absolute_file_path=path,
            looker_view_file_dict=parsed,
            project_name=project_name,
            root_project_name=self._root_project_name,
            base_projects_folder=self._base_projects_folder,
            raw_file_content=raw_file_content,
            reporter=reporter,
        )
        logger.debug(f"adding viewfile for path {path} to the cache")
        self.viewfile_cache[path] = looker_viewfile
        return looker_viewfile
    except Exception as e:
        logger.debug(f"An error occurred while parsing path {path}", exc_info=True)
        self.reporter.report_failure(path, f"failed to load view file {path}: {e}")
        return None

Committable suggestion was skipped due to low confidence.

Comment on lines 87 to 229
continue
elif inc.startswith("/"):
glob_expr = f"{resolved_project_folder}{inc}"

# The include path is sometimes '/{project_name}/{path_within_project}'
# instead of '//{project_name}/{path_within_project}' or '/{path_within_project}'.
#
# TODO: I can't seem to find any documentation on this pattern, but we definitely
# have seen it in the wild. Example from Mozilla's public looker-hub repo:
# https://github.com/mozilla/looker-hub/blob/f491ca51ce1add87c338e6723fd49bc6ae4015ca/fenix/explores/activation.explore.lkml#L7
# As such, we try to handle it but are as defensive as possible.

non_base_project_name = project_name
if project_name == _BASE_PROJECT_NAME and root_project_name is not None:
non_base_project_name = root_project_name
if non_base_project_name != _BASE_PROJECT_NAME and inc.startswith(
f"/{non_base_project_name}/"
):
# This might be a local include. Let's make sure that '/{project_name}' doesn't
# exist as normal include in the project.
if not pathlib.Path(
f"{resolved_project_folder}/{non_base_project_name}"
).exists():
path_within_project = pathlib.Path(*pathlib.Path(inc).parts[2:])
glob_expr = f"{resolved_project_folder}/{path_within_project}"
else:
# Need to handle a relative path.
glob_expr = str(pathlib.Path(path).parent / inc)
# "**" matches an arbitrary number of directories in LookML
# we also resolve these paths to absolute paths so we can de-dup effectively later on
included_files = [
str(p.resolve())
for p in [
pathlib.Path(p)
for p in sorted(
glob.glob(glob_expr, recursive=True)
+ glob.glob(f"{glob_expr}.lkml", recursive=True)
)
]
# We don't want to match directories. The '**' glob can be used to
# recurse into directories.
if p.is_file()
]
logger.debug(
f"traversal_path={traversal_path}, included_files = {included_files}, seen_so_far: {seen_so_far}"
)
if "*" not in inc and not included_files:
reporter.report_failure(path, f"cannot resolve include {inc}")
elif not included_files:
reporter.report_failure(
path, f"did not resolve anything for wildcard include {inc}"
)
# only load files that we haven't seen so far
included_files = [x for x in included_files if x not in seen_so_far]
for included_file in included_files:
# Filter out dashboards - we get those through the looker source.
if (
included_file.endswith(".dashboard")
or included_file.endswith(".dashboard.lookml")
or included_file.endswith(".dashboard.lkml")
):
logger.debug(
f"include '{included_file}' is a dashboard, skipping it"
)
continue

logger.debug(
f"Will be loading {included_file}, traversed here via {traversal_path}"
)
try:
parsed = load_lkml(included_file)
seen_so_far.add(included_file)
if "includes" in parsed: # we have more includes to resolve!
resolved.extend(
LookerModel.resolve_includes(
parsed["includes"],
resolved_project_name,
root_project_name,
base_projects_folder,
included_file,
reporter,
seen_so_far,
traversal_path=traversal_path
+ "."
+ pathlib.Path(included_file).stem,
)
)
except Exception as e:
reporter.report_warning(
path, f"Failed to load {included_file} due to {e}"
)
# continue in this case, as it might be better to load and resolve whatever we can

resolved.extend(
[
ProjectInclude(project=resolved_project_name, include=f)
for f in included_files
]
)
return resolved
Contributor

LGTM! Consider simplifying nested if statements.

The resolve_includes method is well-structured and handles different include patterns effectively. Consider simplifying nested if statements for readability.

- if non_base_project_name != _BASE_PROJECT_NAME and inc.startswith(
-     f"/{non_base_project_name}/"
- ):
+ if (non_base_project_name != _BASE_PROJECT_NAME and 
+     inc.startswith(f"/{non_base_project_name}/")):

Committable suggestion was skipped due to low confidence.

Tools
Ruff

145-152: Use a single if statement instead of nested if statements

(SIM102)
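
To make the include handling in the quoted resolve_includes snippet concrete, a small illustration (paths hypothetical) of how the two include shapes shown above map to glob expressions:

import pathlib

resolved_project_folder = "/repo/lkml_samples"
path = "/repo/lkml_samples/models/data.model.lkml"

# an include starting with "/" is rooted at the resolved project folder
inc = "/views/*.view"
print(f"{resolved_project_folder}{inc}")   # /repo/lkml_samples/views/*.view

# any other include is treated as relative to the including file
inc = "views/orders.view.lkml"
print(pathlib.Path(path).parent / inc)     # /repo/lkml_samples/models/views/orders.view.lkml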

Comment on lines +1077 to +1084
"aspectName": "viewProperties",
"aspect": {
"json": {
"materialized": false,
"viewLogic": "view: employee_tax_report {\n sql_table_name: data-warehouse.finance.form-16;;\n\n dimension: id {\n type: number\n sql: ${TABLE}.id;;\n }\n\n dimension: name {\n type: string\n sql: ${TABLE}.name;;\n }\n\n measure: taxable_income {\n type: sum\n sql: ${TABLE}.tax;;\n }\n}\n",
"viewLanguage": "lookml"
}
},
Contributor

Codebase Verification

Ensure viewProperties aspect is correctly defined.

The placeholders and structure of the SQL statements in the viewLogic field are incorrect. The placeholders should follow the format ${TABLE}.<column_name>, but they do not match the expected pattern.

  • Incorrect structure or placeholder in: ${TABLE}.id
  • Incorrect structure or placeholder in: ${TABLE}.name
  • Incorrect structure or placeholder in: ${TABLE}.tax

Please correct these placeholders to ensure the SQL statements are valid.

Analysis chain

Ensure viewProperties aspect is correctly defined.

The viewProperties aspect defines the view logic and language correctly. However, ensure the SQL in the viewLogic field is validated for correctness.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Validate the SQL in the viewLogic field.

# Test: Extract and validate the SQL.
echo 'view: employee_tax_report {
  sql_table_name: data-warehouse.finance.form-16;;
  
  dimension: id {
    type: number
    sql: ${TABLE}.id;;
  }
  
  dimension: name {
    type: string
    sql: ${TABLE}.name;;
  }
  
  measure: taxable_income {
    type: sum
    sql: ${TABLE}.tax;;
  }
}' | sqlglot transpile --dialect lookml

Length of output: 409



Script:

#!/bin/bash
# Description: Validate the SQL in the viewLogic field using sqlite3.

# Extract the SQL statements from the viewLogic field.
sql_statements=$(echo 'view: employee_tax_report {
  sql_table_name: data-warehouse.finance.form-16;;
  
  dimension: id {
    type: number
    sql: ${TABLE}.id;;
  }
  
  dimension: name {
    type: string
    sql: ${TABLE}.name;;
  }
  
  measure: taxable_income {
    type: sum
    sql: ${TABLE}.tax;;
  }
}' | grep -oP '(?<=sql: ).*(?=;;)')

# Validate each SQL statement using sqlite3.
for sql in $sql_statements; do
  echo "Validating SQL: $sql"
  echo "$sql;" | sqlite3 :memory: 2>&1
done

Length of output: 1030



Script:

#!/bin/bash
# Description: Check the placeholders and structure of the SQL statements in the viewLogic field.

# Extract the SQL statements from the viewLogic field.
sql_statements=$(echo 'view: employee_tax_report {
  sql_table_name: data-warehouse.finance.form-16;;
  
  dimension: id {
    type: number
    sql: ${TABLE}.id;;
  }
  
  dimension: name {
    type: string
    sql: ${TABLE}.name;;
  }
  
  measure: taxable_income {
    type: sum
    sql: ${TABLE}.tax;;
  }
}' | grep -oP '(?<=sql: ).*(?=;;)')

# Check the placeholders and structure of each SQL statement.
for sql in $sql_statements; do
  echo "Checking SQL structure: $sql"
  if [[ $sql =~ \${TABLE}\.[a-zA-Z_]+ ]]; then
    echo "Placeholder and structure are correct for: $sql"
  else
    echo "Incorrect structure or placeholder in: $sql"
  fi
done

Length of output: 1201

@@ -166,6 +166,7 @@
"sql-metadata==2.2.2",
*sqllineage_lib,
"GitPython>2",
"python-liquid",
Collaborator

  1. Can we remove the sql-metadata and sqllineage_lib dependencies now that the sqlglot parser is used?
  2. Does sqlglot_lib need to be added to the looker_common libraries instead of lookml only, unless the sqlglot imports are strictly separated into lookml-only files?

Comment on lines +355 to +359
view_name: Optional[str] = (
self.explore.name
if self.field.original_view is not None
else self.field.original_view
)
Collaborator

This looks suspicious. The else branch would always yield None? Am I missing something?
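
For clarity, the flagged expression reduces to the following (a demonstration of the current behaviour, not a proposed fix; names shortened):

def pick_view_name(explore_name, original_view):
    # mirrors: explore.name if field.original_view is not None else field.original_view
    return explore_name if original_view is not None else original_view

print(pick_view_name("my_explore", "my_view"))  # my_explore
print(pick_view_name("my_explore", None))       # None -- the else branch can only ever be None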

Collaborator

@hsheth2 hsheth2 left a comment

Had a couple comments about golden file changes

{
"upstreamType": "FIELD_SET",
"upstreams": [
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:postgres,order,PROD),customer_id)"
Collaborator

why did this lineage disappear?

Collaborator Author

same reason as mentioned below.

"time": 1586847600000,
"actor": "urn:li:corpuser:datahub"
},
"dataset": "urn:li:dataset:(urn:li:dataPlatform:bigquery,project-foo.default-db.order,PROD)",
Collaborator

what happened to this lineage?

Collaborator Author

These lineages disappeared because of the condition tag ({% condition order_region %} order.region {% endcondition %}) in liquid.view.lkml, which is not parsed by the sqlglot parser. I added liquid_variable to the existing configuration so that this tag is resolved during liquid template resolution. Now the lineage appears in the golden file.
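
A rough sketch (keys as described in this PR, value hypothetical) of what that addition looks like in a lookml source config:

source_config = {
    # ... existing lookml source settings (base_folder, connection map, etc.) ...
    "liquid_variable": {
        # hypothetical value, substituted into
        # {% condition order_region %} order.region {% endcondition %}
        "order_region": "ap-south-1",
    },
}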

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Outside diff range and nitpick comments (3)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (3)

Line range hint 856-915: Consider breaking down the function into smaller methods.

The function is complex and could benefit from being broken down into smaller methods for better readability and maintainability.


Line range hint 916-1156: Consider breaking down the function into smaller methods.

The function is complex and could benefit from being broken down into smaller methods for better readability and maintainability.


Line range hint 1157-1306: Consider breaking down the function into smaller methods.

The function is complex and could benefit from being broken down into smaller methods for better readability and maintainability.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 5ad8200 and cf0f45b.

Files selected for processing (23)
  • metadata-ingestion/setup.py (2 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (18 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_config.py (2 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_connection.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_file_loader.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_view_id_cache.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_concept_context.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_config.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_refinement.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (22 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/view_upstream.py (1 hunks)
  • metadata-ingestion/tests/integration/lookml/duplicate_field_ingestion_golden.json (6 hunks)
  • metadata-ingestion/tests/integration/lookml/expected_output.json (19 hunks)
  • metadata-ingestion/tests/integration/lookml/field_tag_ingestion_golden.json (6 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json (9 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json (9 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json (6 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json (9 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json (9 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json (9 hunks)
  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json (20 hunks)
  • metadata-ingestion/tests/integration/lookml/test_lookml.py (5 hunks)
Files not summarized due to errors (1)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py: Error: Message exceeds token limit
Files skipped from review as they are similar to previous changes (14)
  • metadata-ingestion/setup.py
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_connection.py
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_file_loader.py
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_config.py
  • metadata-ingestion/tests/integration/lookml/duplicate_field_ingestion_golden.json
  • metadata-ingestion/tests/integration/lookml/expected_output.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py

409-412: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


592-595: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

Additional comments not posted (53)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_view_id_cache.py (3)

17-30: LGTM!

The function determine_view_file_path correctly determines the file path and includes appropriate logging for debugging.


33-77: LGTM!

The class LookerViewIdCache is correctly initialized with necessary attributes.


78-120: LGTM!

The method get_looker_view_id correctly retrieves the Looker view ID with appropriate logging and error handling.

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_refinement.py (11)

18-62: LGTM!

The class LookerRefinementResolver is correctly initialized with necessary attributes.


63-65: LGTM!

The method is_refinement correctly checks if a view name is a refinement.


68-94: LGTM!

The method merge_column correctly merges columns from the original and refinement dictionaries.


97-105: LGTM!

The method merge_and_set_column correctly merges columns and sets the result in the new raw view.


107-132: LGTM!

The method merge_refinements correctly merges refinements into the raw view and handles additive parameters.


134-146: LGTM!

The method get_refinements correctly retrieves refinements from the views based on the view name.


148-166: LGTM!

The method get_refinement_from_model_includes correctly retrieves refinements from the model includes and handles missing view files.


168-175: LGTM!

The method should_skip_processing correctly checks if processing should be skipped based on the view name and source configuration.


177-202: LGTM!

The method apply_view_refinement correctly applies refinements to a view and handles caching.


205-222: LGTM!

The method add_extended_explore correctly adds extended explores to the raw explore.


223-251: LGTM!

The method apply_explore_refinement correctly applies refinements to an explore and handles caching.

metadata-ingestion/src/datahub/ingestion/source/looker/looker_config.py (3)

147-155: LGTM!

The function _get_bigquery_definition correctly retrieves the BigQuery connection definition.


157-175: LGTM!

The function _get_generic_definition correctly retrieves the generic connection definition and handles platform extraction from the dialect name.


177-220: LGTM!

The class LookerConnectionDefinition is correctly initialized with necessary attributes, and the methods handle validation and creation of connection definitions.

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_concept_context.py (10)

24-54: LGTM!

The class LookerFieldContext is correctly initialized with necessary attributes, and the methods handle field context operations.


57-164: LGTM!

The class LookerViewContext is correctly initialized with necessary attributes, and the methods handle view context operations.


192-217: LGTM!

The method resolve_extends_view_name correctly resolves the extends view name and handles missing views with appropriate logging.


219-249: LGTM!

The method get_including_extends correctly retrieves the field from the current view or the extended view.


251-253: LGTM!

The method _get_sql_table_name_field correctly retrieves the SQL table name field.


254-263: LGTM!

The method _is_dot_sql_table_name_present correctly checks if the SQL table name contains a dot.


265-277: LGTM!

The method sql_table_name correctly retrieves the SQL table name and handles special cases.


279-287: LGTM!

The method derived_table correctly retrieves the derived table and handles missing tables with assertions.


289-297: LGTM!

The method explore_source correctly retrieves the explore source and handles missing sources with assertions.


299-322: LGTM!

The method sql correctly retrieves the SQL query and handles transformations.

metadata-ingestion/tests/integration/lookml/field_tag_ingestion_golden.json (7)

170-170: Update dataset URN to postgres.

The dataset URN has been updated from conn to postgres. Ensure this change is consistent with the intended data platform.


178-178: Update schema field URN to postgres.

The schema field URN has been updated from conn to postgres. Verify that this change aligns with the data platform schema.


189-189: Update schema field URN to postgres.

The schema field URN has been updated from conn to postgres. Ensure this change is consistent with the intended data platform.


200-200: Update schema field URN to postgres.

The schema field URN has been updated from conn to postgres. Verify that this change aligns with the data platform schema.


211-211: Update schema field URN to postgres.

The schema field URN has been updated from conn to postgres. Ensure this change is consistent with the intended data platform.


222-222: Update schema field URN to postgres.

The schema field URN has been updated from conn to postgres. Verify that this change aligns with the data platform schema.


233-233: Update schema field URN to postgres.

The schema field URN has been updated from conn to postgres. Ensure this change is consistent with the intended data platform.
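
The shape of the change can be reproduced with DataHub's URN builders; the table and field names below are illustrative, not copied from the golden file:

from datahub.emitter.mce_builder import make_dataset_urn, make_schema_field_urn

# The platform now resolves to the warehouse ("postgres") rather than the
# connection name ("conn"); schema field URNs nest the dataset URN, so they change too.
dataset_urn = make_dataset_urn(platform="postgres", name="my_schema.my_table", env="PROD")
field_urn = make_schema_field_urn(parent_urn=dataset_urn, field_path="id")
print(dataset_urn)  # urn:li:dataset:(urn:li:dataPlatform:postgres,my_schema.my_table,PROD)
print(field_urn)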

metadata-ingestion/src/datahub/ingestion/source/looker/view_upstream.py (8)

39-43: LGTM!

The is_derived_view function correctly checks if a view name contains the DERIVED_VIEW_SUFFIX.


46-66: LGTM! But verify edge cases.

The get_derived_looker_view_id function appears correct. Ensure that edge cases for regex and string manipulation are handled properly.


69-95: LGTM! But verify edge cases.

The resolve_derived_view_urn_of_col_ref function appears correct. Ensure that all potential edge cases are handled properly.


98-124: LGTM! But verify edge cases.

The fix_derived_view_urn function appears correct. Ensure that all potential edge cases are handled properly.


153-196: LGTM! But verify edge cases.

The _generate_fully_qualified_name function appears correct. Ensure that all potential edge cases are handled properly.


235-367: LGTM! But verify edge cases.

The SqlBasedDerivedViewUpstream class appears correct. Ensure that all potential edge cases are handled properly.


372-454: LGTM! But verify edge cases.

The NativeDerivedViewUpstream class appears correct. Ensure that all potential edge cases are handled properly.


581-633: LGTM! But verify edge cases.

The create_view_upstream function appears correct. Ensure that all potential edge cases are handled properly.

metadata-ingestion/tests/integration/lookml/test_lookml.py (6)

71-73: Comments approved.

The comments provide useful context for the configuration.


901-927: New test function approved.

The new test function test_view_to_view_lineage_and_liquid_template validates the new configuration option liquid_variable.


937-1010: New test function approved.

The new test function test_special_liquid_variables validates the behavior of special liquid variables.
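
At its core this exercises plain python-liquid rendering of the new liquid_variable mapping. A minimal sketch of the idea (variable names are illustrative; the source's resolve_liquid_variable wrapper additionally handles special variables and the condition tag):

from liquid import Template

# Values mirror what a user would put under the new `liquid_variable` config option.
liquid_variable = {"starting_date": "2024-01-01"}

sql = "SELECT * FROM orders WHERE created_at >= '{{ starting_date }}'"
print(Template(sql).render(**liquid_variable))
# SELECT * FROM orders WHERE created_at >= '2024-01-01'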


47-53: Ensure the new configuration is tested.

The new configuration option liquid_variable should be validated in the test cases to ensure it works as expected.

Verification successful

The new configuration option liquid_variable is being tested.

The search results show that the liquid_variable configuration is present and being validated within the test file metadata-ingestion/tests/integration/lookml/test_lookml.py.

  • metadata-ingestion/tests/integration/lookml/test_lookml.py
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the usage of the new configuration option `liquid_variable`.

# Test: Search for the usage of `liquid_variable` in test cases. Expect: At least one occurrence.
rg --type python $'liquid_variable'

Length of output: 65



Script:

#!/bin/bash
# Description: Verify the usage of the new configuration option `liquid_variable`.

# Test: Search for the usage of `liquid_variable` in test cases. Expect: At least one occurrence.
rg --glob '*.py' 'liquid_variable'

Length of output: 6154


15-18: Ensure the new imports are used.

The imports for SpecialVariable and resolve_liquid_variable are new. Ensure these are used in the test cases to avoid unnecessary imports.

Verification successful

The new imports are used.

The imports for SpecialVariable and resolve_liquid_variable are utilized in the test cases, confirming their necessity.

  • metadata-ingestion/tests/integration/lookml/test_lookml.py:
    • SpecialVariable is used on lines 15-18, 496-498.
    • resolve_liquid_variable is used on lines 15-18, 496-498.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the usage of newly imported entities.

# Test: Search for the usage of imported entities. Expect: At least one occurrence of each import.
rg --type python $'SpecialVariable|resolve_liquid_variable'

Length of output: 91



Script:

#!/bin/bash
# Description: Verify the usage of newly imported entities.

# Test: Search for the usage of imported entities. Expect: At least one occurrence of each import.
rg 'SpecialVariable|resolve_liquid_variable' --glob '*.py'

Length of output: 1816


496-498: Ensure the new configuration is tested.

The new configuration option liquid_variable should be validated in the test cases to ensure it works as expected.

Verification successful

Let's correct the command and re-run the search for liquid_variable in Python test files to verify if this new configuration is being tested.


The new configuration is tested.

The new configuration option liquid_variable is validated in the test cases, particularly in the metadata-ingestion/tests/integration/lookml/test_lookml.py file. The test_special_liquid_variables function specifically tests this configuration.

  • metadata-ingestion/tests/integration/lookml/test_lookml.py
    • Function: test_special_liquid_variables
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the usage of the new configuration option `liquid_variable`.

# Test: Search for the usage of `liquid_variable` in test cases. Expect: At least one occurrence.
rg --type python $'liquid_variable'

Length of output: 65



Script:

#!/bin/bash
# Description: Verify the usage of the new configuration option `liquid_variable`.

# Test: Search for the usage of `liquid_variable` in test cases. Expect: At least one occurrence.
rg -t py 'liquid_variable'

Length of output: 6146

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (1)

774-774: LGTM!

The code changes are approved.

metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (4)

131-157: Add detailed comments to explain the deduplication criteria.

The function's logic is clear, but adding detailed comments will make it more understandable for future maintainers.

+    # Create a list of field names that are of type DIMENSION_GROUP

301-335: Add type hints for dictionary keys.

Adding type hints for the dictionary keys will improve readability and maintainability.

-        field_dict: Dict,
-        upstream_column_ref: List[ColumnRef],
-        type_cls: ViewFieldType,
-        populate_sql_logic_in_descriptions: bool,
+        field_dict: Dict[str, Any],
+        upstream_column_ref: List[ColumnRef],
+        type_cls: ViewFieldType,
+        populate_sql_logic_in_descriptions: bool,

343-404: Verify the correctness of the else statement.

The else branch may always evaluate to None; confirm that this is the intended behavior.


406-459: Add detailed comments to explain the conditions.

The function's logic is clear, but adding detailed comments will make it more understandable for future maintainers.

Comment on lines +159 to +183
def find_view_from_resolved_includes(
connection: Optional[LookerConnectionDefinition],
resolved_includes: List["ProjectInclude"],
looker_viewfile_loader: LookerViewFileLoader,
target_view_name: str,
reporter: LookMLSourceReport,
) -> Optional[Tuple["ProjectInclude", dict]]:
# It could live in one of the included files. We do not know which file the base view
# lives in, so we try them all!
for include in resolved_includes:
included_looker_viewfile = looker_viewfile_loader.load_viewfile(
include.include,
include.project,
connection,
reporter,
)
if not included_looker_viewfile:
continue
for raw_view in included_looker_viewfile.views:
raw_view_name = raw_view["name"]
# Make sure to skip loading view we are currently trying to resolve
if raw_view_name == target_view_name:
return include, raw_view

return None

Refactor for early returns to reduce nesting.

The function can be refactored to use early returns, which will improve readability and reduce nesting.

-    for include in resolved_includes:
-        included_looker_viewfile = looker_viewfile_loader.load_viewfile(
-            include.include,
-            include.project,
-            connection,
-            reporter,
-        )
-        if not included_looker_viewfile:
-            continue
-        for raw_view in included_looker_viewfile.views:
-            raw_view_name = raw_view["name"]
-            if raw_view_name == target_view_name:
-                return include, raw_view
+    for include in resolved_includes:
+        included_looker_viewfile = looker_viewfile_loader.load_viewfile(
+            include.include,
+            include.project,
+            connection,
+            reporter,
+        )
+        if included_looker_viewfile:
+            for raw_view in included_looker_viewfile.views:
+                raw_view_name = raw_view["name"]
+                if raw_view_name == target_view_name:
+                    return include, raw_view
    return None

Comment on lines 113 to 116
"""Returns a fully qualified dataset name, resolved through a connection definition.
Input sql_table_name can be in three forms: table, db.table, db.schema.table"""
# TODO: This function should be extracted out into a Platform specific naming class since name translations
# are required across all connectors

Reminder: Consider refactoring the name resolution logic.

The TODO comment suggests extracting the function into a platform-specific naming class. This can improve modularity and maintainability.

Do you want me to refactor this function into a platform-specific naming class or open a GitHub issue to track this task?
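
If that refactor is taken up, one possible shape is a small resolver that normalizes the three accepted forms. This is purely illustrative; the helper name and defaults are not from the codebase:

def fully_qualify(sql_table_name: str, default_db: str, default_schema: str) -> str:
    """Expand "table", "db.table", or "db.schema.table" into db.schema.table."""
    parts = sql_table_name.replace('"', "").replace("`", "").split(".")
    if len(parts) == 1:                      # table
        return f"{default_db}.{default_schema}.{parts[0]}"
    if len(parts) == 2:                      # db.table
        return f"{parts[0]}.{default_schema}.{parts[1]}"
    return ".".join(parts)                   # already db.schema.table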

Comment on lines 204 to 307

# Parse SQL to extract dependencies.
if parse_table_names_from_sql:
(
fields,
sql_table_names,
) = cls._extract_metadata_from_derived_table_sql(
reporter,
sql_parser_path,
view_name,
sql_table_name,
view_logic,
fields,
use_external_process=process_isolation_for_sql_parsing,
)
view_logic = view_context.view_file.raw_file_content[:max_file_snippet_length]

elif "explore_source" in derived_table:
# This is called a "native derived table".
# See https://cloud.google.com/looker/docs/creating-ndts.
explore_source = derived_table["explore_source"]

# We want this to render the full lkml block
# e.g. explore_source: source_name { ... }
# As such, we use the full derived_table instead of the explore_source.
view_logic = str(lkml.dump(derived_table))[:max_file_snippet_length]
view_lang = VIEW_LANGUAGE_LOOKML

(
fields,
upstream_explores,
) = cls._extract_metadata_from_derived_table_explore(
reporter, view_name, explore_source, fields
)
if view_context.is_sql_based_derived_case():
view_logic = view_context.sql(transformed=False)
# Parse SQL to extract dependencies.
view_details = ViewProperties(
materialized=False,
viewLogic=view_logic,
viewLanguage=VIEW_LANGUAGE_SQL,
)
elif view_context.is_native_derived_case():
# We want this to render the full lkml block
# e.g. explore_source: source_name { ... }
# As such, we use the full derived_table instead of the explore_source.
view_logic = str(lkml.dump(view_context.derived_table()))[
:max_file_snippet_length
]
view_lang = VIEW_LANGUAGE_LOOKML

materialized = False
for k in derived_table:
if k in ["datagroup_trigger", "sql_trigger_value", "persist_for"]:
materialized = True
if "materialized_view" in derived_table:
materialized = derived_table["materialized_view"] == "yes"
materialized = view_context.is_materialized_derived_view()

view_details = ViewProperties(
materialized=materialized, viewLogic=view_logic, viewLanguage=view_lang
)
else:
# If not a derived table, then this view essentially wraps an existing
# object in the database. If sql_table_name is set, there is a single
# dependency in the view, on the sql_table_name.
# Otherwise, default to the view name as per the docs:
# https://docs.looker.com/reference/view-params/sql_table_name-for-view
sql_table_names = (
[view_name] if sql_table_name is None else [sql_table_name]
)
view_details = ViewProperties(
materialized=False,
viewLogic=view_logic,
viewLanguage=VIEW_LANGUAGE_LOOKML,
)

file_path = LookerView.determine_view_file_path(
base_folder_path, looker_viewfile.absolute_file_path
)

return LookerView(
id=LookerViewId(
project_name=project_name,
model_name=model_name,
view_name=view_name,
file_path=file_path,
),
absolute_file_path=looker_viewfile.absolute_file_path,
connection=connection,
sql_table_names=sql_table_names,
upstream_explores=upstream_explores,
fields=fields,
raw_file_content=looker_viewfile.raw_file_content,
id=looker_view_id,
absolute_file_path=view_context.view_file.absolute_file_path,
connection=view_context.view_connection,
upstream_dataset_urns=view_upstream.get_upstream_dataset_urn(),
fields=view_fields,
raw_file_content=view_context.view_file.raw_file_content,

Consider breaking down the from_looker_dict method.

The method is quite large and handles multiple responsibilities. Breaking it down into smaller methods can improve readability and maintainability.

@classmethod
def from_looker_dict(
    cls,
    project_name: str,
    model_name: str,
    view_context: LookerViewContext,
    looker_view_id_cache: LookerViewIdCache,
    reporter: LookMLSourceReport,
    max_file_snippet_length: int,
    config: LookMLSourceConfig,
    ctx: PipelineContext,
    extract_col_level_lineage: bool = False,
    populate_sql_logic_in_descriptions: bool = False,
) -> Optional["LookerView"]:
    view_name = view_context.name()
    logger.debug(f"Handling view {view_name} in model {model_name}")
    looker_view_id = cls._create_looker_view_id(project_name, model_name, view_name, view_context)
    view_upstream = cls._create_view_upstream(view_context, looker_view_id_cache, config, ctx, reporter)
    view_fields = cls._extract_view_fields(view_context, view_upstream, extract_col_level_lineage, populate_sql_logic_in_descriptions)
    view_fields = deduplicate_fields(view_fields)
    view_details = cls._determine_view_details(view_context, max_file_snippet_length)
    return cls._build_looker_view(looker_view_id, view_context, view_upstream, view_fields, view_details)

@classmethod
def _create_looker_view_id(cls, project_name: str, model_name: str, view_name: str, view_context: LookerViewContext) -> LookerViewId:
    return LookerViewId(
        project_name=project_name,
        model_name=model_name,
        view_name=view_name,
        file_path=view_context.view_file_name(),
    )

@classmethod
def _create_view_upstream(cls, view_context: LookerViewContext, looker_view_id_cache: LookerViewIdCache, config: LookMLSourceConfig, ctx: PipelineContext, reporter: LookMLSourceReport) -> AbstractViewUpstream:
    return create_view_upstream(
        view_context=view_context,
        looker_view_id_cache=looker_view_id_cache,
        config=config,
        ctx=ctx,
        reporter=reporter,
    )

@classmethod
def _extract_view_fields(cls, view_context: LookerViewContext, view_upstream: AbstractViewUpstream, extract_col_level_lineage: bool, populate_sql_logic_in_descriptions: bool) -> List[ViewField]:
    field_type_vs_raw_fields = OrderedDict(
        {
            ViewFieldType.DIMENSION: view_context.dimensions(),
            ViewFieldType.DIMENSION_GROUP: view_context.dimension_groups(),
            ViewFieldType.MEASURE: view_context.measures(),
        }
    )
    view_fields = []
    for field_type, fields in field_type_vs_raw_fields.items():
        for field in fields:
            upstream_column_ref = view_upstream.get_upstream_column_ref(field_context=LookerFieldContext(raw_field=field)) if extract_col_level_lineage else []
            view_fields.append(
                ViewField.view_fields_from_dict(
                    field_dict=field,
                    upstream_column_ref=upstream_column_ref,
                    type_cls=field_type,
                    populate_sql_logic_in_descriptions=populate_sql_logic_in_descriptions,
                )
            )
    if not view_fields and view_context.is_sql_based_derived_view_without_fields_case():
        view_fields = view_upstream.create_fields()
    return view_fields

@classmethod
def _determine_view_details(cls, view_context: LookerViewContext, max_file_snippet_length: int) -> ViewProperties:
    view_logic = view_context.view_file.raw_file_content[:max_file_snippet_length]
    if view_context.is_sql_based_derived_case():
        view_logic = view_context.sql(transformed=False)
        return ViewProperties(materialized=False, viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_SQL)
    elif view_context.is_native_derived_case():
        view_logic = str(lkml.dump(view_context.derived_table()))[:max_file_snippet_length]
        return ViewProperties(materialized=view_context.is_materialized_derived_view(), viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_LOOKML)
    else:
        return ViewProperties(materialized=False, viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_LOOKML)

@classmethod
def _build_looker_view(cls, looker_view_id: LookerViewId, view_context: LookerViewContext, view_upstream: AbstractViewUpstream, view_fields: List[ViewField], view_details: ViewProperties) -> "LookerView":
    return LookerView(
        id=looker_view_id,
        absolute_file_path=view_context.view_file.absolute_file_path,
        connection=view_context.view_connection,
        upstream_dataset_urns=view_upstream.get_upstream_dataset_urn(),
        fields=view_fields,
        raw_file_content=view_context.view_file.raw_file_content,
        view_details=view_details,
    )

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between cf0f45b and 33985b5.

Files selected for processing (1)
  • metadata-ingestion/setup.py (2 hunks)
Files skipped from review as they are similar to previous changes (1)
  • metadata-ingestion/setup.py

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

Outside diff range and nitpick comments (1)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_dataclasses.py (1)

39-87: Consider simplifying nested if statements.

The resolve_includes method has nested if statements that can be simplified for readability.

- if non_base_project_name != _BASE_PROJECT_NAME and inc.startswith(
-     f"/{non_base_project_name}/"
- ):
+ if (non_base_project_name != _BASE_PROJECT_NAME and 
+     inc.startswith(f"/{non_base_project_name}/")):
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 33985b5 and d5025d4.

Files selected for processing (6)
  • metadata-ingestion/setup.py (2 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (18 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_dataclasses.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (19 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/view_upstream.py (1 hunks)
Files not summarized due to errors (1)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py: Error: Message exceeds token limit
Files skipped from review due to trivial changes (1)
  • metadata-ingestion/setup.py
Files skipped from review as they are similar to previous changes (2)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/looker/looker_dataclasses.py

148-155: Use a single if statement instead of nested if statements

(SIM102)

metadata-ingestion/src/datahub/ingestion/source/looker/view_upstream.py

40-43: Return the condition DERIVED_VIEW_SUFFIX in view_name.lower() directly

Replace with return DERIVED_VIEW_SUFFIX in view_name.lower()

(SIM103)
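
Concretely, the simplification Ruff is suggesting (the original body is reconstructed from the lint message):

# Before (roughly)
def is_derived_view(view_name: str) -> bool:
    if DERIVED_VIEW_SUFFIX in view_name.lower():
        return True
    return False

# After (SIM103): return the condition directly
def is_derived_view(view_name: str) -> bool:
    return DERIVED_VIEW_SUFFIX in view_name.lower()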

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py

319-322: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


502-505: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

Additional comments not posted (13)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_dataclasses.py (3)

18-21: LGTM!

The ProjectInclude dataclass is well-defined and straightforward.


24-29: LGTM!

The LookerField dataclass is well-defined and straightforward.


244-290: LGTM!

The LookerViewFile dataclass is well-defined and straightforward.

metadata-ingestion/src/datahub/ingestion/source/looker/view_upstream.py (7)

200-231: LGTM!

The AbstractViewUpstream class is well-defined and follows the abstract class pattern.


237-372: LGTM!

The SqlBasedDerivedViewUpstream class is well-defined and follows the class pattern.


374-456: LGTM!

The NativeDerivedViewUpstream class is well-defined and follows the class pattern.


458-511: LGTM!

The RegularViewUpstream class is well-defined and follows the class pattern.


513-571: LGTM!

The DotSqlTableNameViewUpstream class is well-defined and follows the class pattern.


573-580: LGTM!

The EmptyImplementation class is well-defined and straightforward.


583-636: LGTM!

The create_view_upstream function is well-defined and follows the factory pattern.

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (3)

42-50: New imports look necessary and relevant.

The new imports are necessary for the changes made in the file and align with the updated functionality.

Also applies to: 54-70, 98-98


109-111: New field upstream_dataset_urns looks good.

The new field upstream_dataset_urns is necessary for tracking upstream dependencies.


307-307: Initialization looks good.

The initialization of ctx and reporter is necessary and relevant to the changes made.

Comment on lines +146 to +159
view_context: LookerViewContext,
looker_view_id_cache: LookerViewIdCache,
reporter: LookMLSourceReport,
max_file_snippet_length: int,
parse_table_names_from_sql: bool = False,
sql_parser_path: str = "datahub.utilities.sql_parser.DefaultSQLParser",
config: LookMLSourceConfig,
ctx: PipelineContext,
extract_col_level_lineage: bool = False,
populate_sql_logic_in_descriptions: bool = False,
process_isolation_for_sql_parsing: bool = False,
) -> Optional["LookerView"]:
view_name = looker_view["name"]

view_name = view_context.name()

logger.debug(f"Handling view {view_name} in model {model_name}")
# The sql_table_name might be defined in another view and this view is extending that view,
# so we resolve this field while taking that into account.
sql_table_name: Optional[str] = LookerView.get_including_extends(


Improvement suggestion: Break down the from_looker_dict method.

The method is quite large and handles multiple responsibilities. Breaking it down into smaller methods can improve readability and maintainability.

@classmethod
def from_looker_dict(
    cls,
    project_name: str,
    model_name: str,
    view_context: LookerViewContext,
    looker_view_id_cache: LookerViewIdCache,
    reporter: LookMLSourceReport,
    max_file_snippet_length: int,
    config: LookMLSourceConfig,
    ctx: PipelineContext,
    extract_col_level_lineage: bool = False,
    populate_sql_logic_in_descriptions: bool = False,
) -> Optional["LookerView"]:
    view_name = view_context.name()
    logger.debug(f"Handling view {view_name} in model {model_name}")
    looker_view_id = cls._create_looker_view_id(project_name, model_name, view_name, view_context)
    view_upstream = cls._create_view_upstream(view_context, looker_view_id_cache, config, ctx, reporter)
    view_fields = cls._extract_view_fields(view_context, view_upstream, extract_col_level_lineage, populate_sql_logic_in_descriptions)
    view_fields = deduplicate_fields(view_fields)
    view_details = cls._determine_view_details(view_context, max_file_snippet_length)
    return cls._build_looker_view(looker_view_id, view_context, view_upstream, view_fields, view_details)

@classmethod
def _create_looker_view_id(cls, project_name: str, model_name: str, view_name: str, view_context: LookerViewContext) -> LookerViewId:
    return LookerViewId(
        project_name=project_name,
        model_name=model_name,
        view_name=view_name,
        file_path=view_context.view_file_name(),
    )

@classmethod
def _create_view_upstream(cls, view_context: LookerViewContext, looker_view_id_cache: LookerViewIdCache, config: LookMLSourceConfig, ctx: PipelineContext, reporter: LookMLSourceReport) -> AbstractViewUpstream:
    return create_view_upstream(
        view_context=view_context,
        looker_view_id_cache=looker_view_id_cache,
        config=config,
        ctx=ctx,
        reporter=reporter,
    )

@classmethod
def _extract_view_fields(cls, view_context: LookerViewContext, view_upstream: AbstractViewUpstream, extract_col_level_lineage: bool, populate_sql_logic_in_descriptions: bool) -> List[ViewField]:
    field_type_vs_raw_fields = OrderedDict(
        {
            ViewFieldType.DIMENSION: view_context.dimensions(),
            ViewFieldType.DIMENSION_GROUP: view_context.dimension_groups(),
            ViewFieldType.MEASURE: view_context.measures(),
        }
    )
    view_fields = []
    for field_type, fields in field_type_vs_raw_fields.items():
        for field in fields:
            upstream_column_ref = view_upstream.get_upstream_column_ref(field_context=LookerFieldContext(raw_field=field)) if extract_col_level_lineage else []
            view_fields.append(
                ViewField.view_fields_from_dict(
                    field_dict=field,
                    upstream_column_ref=upstream_column_ref,
                    type_cls=field_type,
                    populate_sql_logic_in_descriptions=populate_sql_logic_in_descriptions,
                )
            )
    if not view_fields and view_context.is_sql_based_derived_view_without_fields_case():
        view_fields = view_upstream.create_fields()
    return view_fields

@classmethod
def _determine_view_details(cls, view_context: LookerViewContext, max_file_snippet_length: int) -> ViewProperties:
    view_logic = view_context.view_file.raw_file_content[:max_file_snippet_length]
    if view_context.is_sql_based_derived_case():
        view_logic = view_context.sql(transformed=False)
        return ViewProperties(materialized=False, viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_SQL)
    elif view_context.is_native_derived_case():
        view_logic = str(lkml.dump(view_context.derived_table()))[:max_file_snippet_length]
        return ViewProperties(materialized=view_context.is_materialized_derived_view(), viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_LOOKML)
    else:
        return ViewProperties(materialized=False, viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_LOOKML)

@classmethod
def _build_looker_view(cls, looker_view_id: LookerViewId, view_context: LookerViewContext, view_upstream: AbstractViewUpstream, view_fields: List[ViewField], view_details: ViewProperties) -> "LookerView":
    return LookerView(
        id=looker_view_id,
        absolute_file_path=view_context.view_file.absolute_file_path,
        connection=view_context.view_connection,
        upstream_dataset_urns=view_upstream.get_upstream_dataset_urn(),
        fields=view_fields,
        raw_file_content=view_context.view_file.raw_file_content,
        view_details=view_details,
    )

Comment on lines +161 to +171
project_name=project_name,
model_name=model_name,
view_name=view_name,
looker_view=looker_view,
connection=connection,
looker_viewfile=looker_viewfile,
looker_viewfile_loader=looker_viewfile_loader,
looker_refinement_resolver=looker_refinement_resolver,
field="sql_table_name",
reporter=reporter,
file_path=view_context.view_file_name(),
)

# Some sql_table_name fields contain quotes like: optimizely."group", just remove the quotes
sql_table_name = (
sql_table_name.replace('"', "").replace("`", "")
if sql_table_name is not None
else None
)
derived_table = LookerView.get_including_extends(
view_name=view_name,
looker_view=looker_view,
connection=connection,
looker_viewfile=looker_viewfile,
looker_viewfile_loader=looker_viewfile_loader,
looker_refinement_resolver=looker_refinement_resolver,
field="derived_table",
view_upstream: AbstractViewUpstream = create_view_upstream(
view_context=view_context,
looker_view_id_cache=looker_view_id_cache,
config=config,
ctx=ctx,

Refactor suggestion: Extract create_view_upstream to a separate method.

This part can be moved to a separate method to improve readability.

@classmethod
def _create_view_upstream(cls, view_context: LookerViewContext, looker_view_id_cache: LookerViewIdCache, config: LookMLSourceConfig, ctx: PipelineContext, reporter: LookMLSourceReport) -> AbstractViewUpstream:
    return create_view_upstream(
        view_context=view_context,
        looker_view_id_cache=looker_view_id_cache,
        config=config,
        ctx=ctx,
        reporter=reporter,
    )

Comment on lines 319 to 322
raise ValueError(
"Failed to retrieve connections from looker client. Please check to ensure that you have manage_models permission enabled on this API key."
"Failed to retrieve connections from looker client. Please check to ensure that you have "
"manage_models permission enabled on this API key."
)

Improve exception handling by chaining exceptions.

Use raise ... from err to distinguish the exception from errors in exception handling.

- raise ValueError(
-     "Failed to retrieve connections from looker client. Please check to ensure that you have "
-     "manage_models permission enabled on this API key."
- )
+ raise ValueError(
+     "Failed to retrieve connections from looker client. Please check to ensure that you have "
+     "manage_models permission enabled on this API key."
+ ) from err
Tools
Ruff

319-322: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between d5025d4 and c2d2f6b.

Files selected for processing (1)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (19 hunks)
Files not summarized due to errors (1)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py: Error: Message exceeds token limit
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py

502-505: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

Additional comments not posted (4)
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (4)

4-4: New imports added.

The newly added imports are necessary for the new functionality introduced in this file. Ensure that these imports are used appropriately in the code.

Also applies to: 42-42, 45-47, 49-53, 54-57, 58-66, 68-70, 98-98


109-109: New field upstream_dataset_urns added.

The field upstream_dataset_urns has been added to store the URNs of upstream datasets.


307-307: New field ctx added.

The field ctx has been added to store the pipeline context.


318-322: Improve exception handling by chaining exceptions.

Use raise ... from err to distinguish the exception from errors in exception handling.

- raise ValueError(
-     "Failed to retrieve connections from looker client. Please check to ensure that you have "
-     "manage_models permission enabled on this API key."
- )
+ raise ValueError(
+     "Failed to retrieve connections from looker client. Please check to ensure that you have "
+     "manage_models permission enabled on this API key."
+ ) from err

Likely invalid or redundant comment.

Comment on lines +146 to +171
view_context: LookerViewContext,
looker_view_id_cache: LookerViewIdCache,
reporter: LookMLSourceReport,
max_file_snippet_length: int,
parse_table_names_from_sql: bool = False,
sql_parser_path: str = "datahub.utilities.sql_parser.DefaultSQLParser",
config: LookMLSourceConfig,
ctx: PipelineContext,
extract_col_level_lineage: bool = False,
populate_sql_logic_in_descriptions: bool = False,
process_isolation_for_sql_parsing: bool = False,
) -> Optional["LookerView"]:
view_name = looker_view["name"]

view_name = view_context.name()

logger.debug(f"Handling view {view_name} in model {model_name}")
# The sql_table_name might be defined in another view and this view is extending that view,
# so we resolve this field while taking that into account.
sql_table_name: Optional[str] = LookerView.get_including_extends(

looker_view_id: LookerViewId = LookerViewId(
project_name=project_name,
model_name=model_name,
view_name=view_name,
looker_view=looker_view,
connection=connection,
looker_viewfile=looker_viewfile,
looker_viewfile_loader=looker_viewfile_loader,
looker_refinement_resolver=looker_refinement_resolver,
field="sql_table_name",
reporter=reporter,
file_path=view_context.view_file_name(),
)

# Some sql_table_name fields contain quotes like: optimizely."group", just remove the quotes
sql_table_name = (
sql_table_name.replace('"', "").replace("`", "")
if sql_table_name is not None
else None
)
derived_table = LookerView.get_including_extends(
view_name=view_name,
looker_view=looker_view,
connection=connection,
looker_viewfile=looker_viewfile,
looker_viewfile_loader=looker_viewfile_loader,
looker_refinement_resolver=looker_refinement_resolver,
field="derived_table",
view_upstream: AbstractViewUpstream = create_view_upstream(
view_context=view_context,
looker_view_id_cache=looker_view_id_cache,
config=config,
ctx=ctx,

Refactor suggestion: Extract create_view_upstream to a separate method.

This part can be moved to a separate method to improve readability.

@classmethod
def _create_view_upstream(cls, view_context: LookerViewContext, looker_view_id_cache: LookerViewIdCache, config: LookMLSourceConfig, ctx: PipelineContext, reporter: LookMLSourceReport) -> AbstractViewUpstream:
    return create_view_upstream(
        view_context=view_context,
        looker_view_id_cache=looker_view_id_cache,
        config=config,
        ctx=ctx,
        reporter=reporter,
    )

Comment on lines +175 to +199
field_type_vs_raw_fields = OrderedDict(
{
ViewFieldType.DIMENSION: view_context.dimensions(),
ViewFieldType.DIMENSION_GROUP: view_context.dimension_groups(),
ViewFieldType.MEASURE: view_context.measures(),
}
) # in order to maintain order in golden file

fields = deduplicate_fields(fields)
view_fields: List[ViewField] = []

# Prep "default" values for the view, which will be overridden by the logic below.
view_logic = looker_viewfile.raw_file_content[:max_file_snippet_length]
sql_table_names: List[str] = []
upstream_explores: List[str] = []

if derived_table is not None:
# Derived tables can either be a SQL query or a LookML explore.
# See https://cloud.google.com/looker/docs/derived-tables.

if "sql" in derived_table:
view_logic = derived_table["sql"]
view_lang = VIEW_LANGUAGE_SQL

# Parse SQL to extract dependencies.
if parse_table_names_from_sql:
(
fields,
sql_table_names,
) = cls._extract_metadata_from_derived_table_sql(
reporter,
sql_parser_path,
view_name,
sql_table_name,
view_logic,
fields,
use_external_process=process_isolation_for_sql_parsing,
for field_type, fields in field_type_vs_raw_fields.items():
for field in fields:
upstream_column_ref: List[ColumnRef] = []
if extract_col_level_lineage:
upstream_column_ref = view_upstream.get_upstream_column_ref(
field_context=LookerFieldContext(raw_field=field)
)

elif "explore_source" in derived_table:
# This is called a "native derived table".
# See https://cloud.google.com/looker/docs/creating-ndts.
explore_source = derived_table["explore_source"]

# We want this to render the full lkml block
# e.g. explore_source: source_name { ... }
# As such, we use the full derived_table instead of the explore_source.
view_logic = str(lkml.dump(derived_table))[:max_file_snippet_length]
view_lang = VIEW_LANGUAGE_LOOKML

(
fields,
upstream_explores,
) = cls._extract_metadata_from_derived_table_explore(
reporter, view_name, explore_source, fields
view_fields.append(
ViewField.view_fields_from_dict(
field_dict=field,
upstream_column_ref=upstream_column_ref,
type_cls=field_type,
populate_sql_logic_in_descriptions=populate_sql_logic_in_descriptions,
)

Refactor suggestion: Extract field extraction logic to a separate method.

This part can be moved to a separate method to improve readability.

@classmethod
def _extract_view_fields(cls, view_context: LookerViewContext, view_upstream: AbstractViewUpstream, extract_col_level_lineage: bool, populate_sql_logic_in_descriptions: bool) -> List[ViewField]:
    field_type_vs_raw_fields = OrderedDict(
        {
            ViewFieldType.DIMENSION: view_context.dimensions(),
            ViewFieldType.DIMENSION_GROUP: view_context.dimension_groups(),
            ViewFieldType.MEASURE: view_context.measures(),
        }
    )
    view_fields = []
    for field_type, fields in field_type_vs_raw_fields.items():
        for field in fields:
            upstream_column_ref = view_upstream.get_upstream_column_ref(field_context=LookerFieldContext(raw_field=field)) if extract_col_level_lineage else []
            view_fields.append(
                ViewField.view_fields_from_dict(
                    field_dict=field,
                    upstream_column_ref=upstream_column_ref,
                    type_cls=field_type,
                    populate_sql_logic_in_descriptions=populate_sql_logic_in_descriptions,
                )
            )
    if not view_fields and view_context.is_sql_based_derived_view_without_fields_case():
        view_fields = view_upstream.create_fields()
    return view_fields

Comment on lines +214 to +231
if view_context.is_sql_based_derived_case():
view_logic = view_context.sql(transformed=False)
# Parse SQL to extract dependencies.
view_details = ViewProperties(
materialized=False,
viewLogic=view_logic,
viewLanguage=VIEW_LANGUAGE_SQL,
)
elif view_context.is_native_derived_case():
# We want this to render the full lkml block
# e.g. explore_source: source_name { ... }
# As such, we use the full derived_table instead of the explore_source.
view_logic = str(lkml.dump(view_context.derived_table()))[
:max_file_snippet_length
]
view_lang = VIEW_LANGUAGE_LOOKML

materialized = view_context.is_materialized_derived_view()

Refactor suggestion: Extract view details determination logic to a separate method.

This part can be moved to a separate method to improve readability.

@classmethod
def _determine_view_details(cls, view_context: LookerViewContext, max_file_snippet_length: int) -> ViewProperties:
    view_logic = view_context.view_file.raw_file_content[:max_file_snippet_length]
    if view_context.is_sql_based_derived_case():
        view_logic = view_context.sql(transformed=False)
        return ViewProperties(materialized=False, viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_SQL)
    elif view_context.is_native_derived_case():
        view_logic = str(lkml.dump(view_context.derived_table()))[:max_file_snippet_length]
        return ViewProperties(materialized=view_context.is_materialized_derived_view(), viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_LOOKML)
    else:
        return ViewProperties(materialized=False, viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_LOOKML)

Comment on lines +244 to +249
id=looker_view_id,
absolute_file_path=view_context.view_file.absolute_file_path,
connection=view_context.view_connection,
upstream_dataset_urns=view_upstream.get_upstream_dataset_urn(),
fields=view_fields,
raw_file_content=view_context.view_file.raw_file_content,

Refactor suggestion: Extract LookerView construction to a separate method.

This part can be moved to a separate method to improve readability.

@classmethod
def _build_looker_view(cls, looker_view_id: LookerViewId, view_context: LookerViewContext, view_upstream: AbstractViewUpstream, view_fields: List[ViewField], view_details: ViewProperties) -> "LookerView":
    return LookerView(
        id=looker_view_id,
        absolute_file_path=view_context.view_file.absolute_file_path,
        connection=view_context.view_connection,
        upstream_dataset_urns=view_upstream.get_upstream_dataset_urn(),
        fields=view_fields,
        raw_file_content=view_context.view_file.raw_file_content,
        view_details=view_details,
    )

Comment on lines 502 to 505
raise ValueError(
f"Could not locate a project name for model {model_name}. Consider configuring a static project name in your config file"
f"Could not locate a project name for model {model_name}. Consider configuring a static project name "
f"in your config file"
)

Improve exception handling by chaining exceptions.

Use raise ... from err to distinguish the exception from errors in exception handling.

- raise ValueError(
-     f"Could not locate a project name for model {model_name}. Consider configuring a static project name "
-     f"in your config file"
- )
+ raise ValueError(
+     f"Could not locate a project name for model {model_name}. Consider configuring a static project name "
+     f"in your config file"
+ ) from err
Tools
Ruff

502-505: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between c2d2f6b and 8629f42.

Files selected for processing (1)
  • metadata-ingestion/setup.py (2 hunks)
Files skipped from review as they are similar to previous changes (1)
  • metadata-ingestion/setup.py

@hsheth2 hsheth2 merged commit 43bac36 into datahub-project:master Jul 8, 2024
57 of 58 checks passed
arosanda added a commit to infobip/datahub that referenced this pull request Sep 23, 2024
* feat(forms) Handle deleting forms references when hard deleting forms (datahub-project#10820)

* refactor(ui): Misc improvements to the setup ingestion flow (ingest uplift 1/2)  (datahub-project#10764)

Co-authored-by: John Joyce <john@Johns-MBP.lan>
Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>

* fix(ingestion/airflow-plugin): pipeline tasks discoverable in search (datahub-project#10819)

* feat(ingest/transformer): tags to terms transformer (datahub-project#10758)

Co-authored-by: Aseem Bansal <asmbansal2@gmail.com>

* fix(ingestion/unity-catalog): fixed issue with profiling with GE turned on (datahub-project#10752)

Co-authored-by: Aseem Bansal <asmbansal2@gmail.com>

* feat(forms) Add java SDK for form entity PATCH + CRUD examples (datahub-project#10822)

* feat(SDK) Add java SDK for structuredProperty entity PATCH + CRUD examples (datahub-project#10823)

* feat(SDK) Add StructuredPropertyPatchBuilder in python sdk and provide sample CRUD files (datahub-project#10824)

* feat(forms) Add CRUD endpoints to GraphQL for Form entities (datahub-project#10825)

* add flag for includeSoftDeleted in scroll entities API (datahub-project#10831)

* feat(deprecation) Return actor entity with deprecation aspect (datahub-project#10832)

* feat(structuredProperties) Add CRUD graphql APIs for structured property entities (datahub-project#10826)

* add scroll parameters to openapi v3 spec (datahub-project#10833)

* fix(ingest): correct profile_day_of_week implementation (datahub-project#10818)

* feat(ingest/glue): allow ingestion of empty databases from Glue (datahub-project#10666)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* feat(cli): add more details to get cli (datahub-project#10815)

* fix(ingestion/glue): ensure date formatting works on all platforms for aws glue (datahub-project#10836)

* fix(ingestion): fix datajob patcher (datahub-project#10827)

* fix(smoke-test): add suffix in temp file creation (datahub-project#10841)

* feat(ingest/glue): add helper method to permit user or group ownership (datahub-project#10784)

* feat(): Show data platform instances in policy modal if they are set on the policy (datahub-project#10645)

Co-authored-by: Hendrik Richert <hendrik.richert@swisscom.com>

* docs(patch): add patch documentation for how implementation works (datahub-project#10010)

Co-authored-by: John Joyce <john@acryl.io>

* fix(jar): add missing custom-plugin-jar task (datahub-project#10847)

* fix(): also check exceptions/stack trace when filtering log messages (datahub-project#10391)

Co-authored-by: John Joyce <john@acryl.io>

* docs(): Update posts.md (datahub-project#9893)

Co-authored-by: Hyejin Yoon <0327jane@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* chore(ingest): update acryl-datahub-classify version (datahub-project#10844)

* refactor(ingest): Refactor structured logging to support infos, warnings, and failures structured reporting to UI (datahub-project#10828)

Co-authored-by: John Joyce <john@Johns-MBP.lan>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(restli): log aspect-not-found as a warning rather than as an error (datahub-project#10834)

* fix(ingest/nifi): remove duplicate upstream jobs (datahub-project#10849)

* fix(smoke-test): test access to create/revoke personal access tokens (datahub-project#10848)

* fix(smoke-test): missing test for move domain (datahub-project#10837)

* ci: update usernames to not considered for community (datahub-project#10851)

* env: change defaults for data contract visibility (datahub-project#10854)

* fix(ingest/tableau): quote special characters in external URL (datahub-project#10842)

* fix(smoke-test): fix flakiness of auto complete test

* ci(ingest): pin dask dependency for feast (datahub-project#10865)

* fix(ingestion/lookml): liquid template resolution and view-to-view cll (datahub-project#10542)

* feat(ingest/audit): add client id and version in system metadata props (datahub-project#10829)

* chore(ingest): Mypy 1.10.1 pin (datahub-project#10867)

* docs: use acryl-datahub-actions as expected python package to install (datahub-project#10852)

* docs: add new js snippet (datahub-project#10846)

* refactor(ingestion): remove company domain for security reason (datahub-project#10839)

* fix(ingestion/spark): Platform instance and column level lineage fix (datahub-project#10843)

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* feat(ingestion/tableau): optionally ingest multiple sites and create site containers (datahub-project#10498)

Co-authored-by: Yanik Häni <Yanik.Haeni1@swisscom.com>

* fix(ingestion/looker): Add sqlglot dependency and remove unused sqlparser (datahub-project#10874)

* fix(manage-tokens): fix manage access token policy (datahub-project#10853)

* Batch get entity endpoints (datahub-project#10880)

* feat(system): support conditional write semantics (datahub-project#10868)

* fix(build): upgrade vercel builds to Node 20.x (datahub-project#10890)

* feat(ingest/lookml): shallow clone repos (datahub-project#10888)

* fix(ingest/looker): add missing dependency (datahub-project#10876)

* fix(ingest): only populate audit stamps where accurate (datahub-project#10604)

* fix(ingest/dbt): always encode tag urns (datahub-project#10799)

* fix(ingest/redshift): handle multiline alter table commands (datahub-project#10727)

* fix(ingestion/looker): column name missing in explore (datahub-project#10892)

* fix(lineage) Fix lineage source/dest filtering with explored per hop limit (datahub-project#10879)

* feat(conditional-writes): misc updates and fixes (datahub-project#10901)

* feat(ci): update outdated action (datahub-project#10899)

* feat(rest-emitter): adding async flag to rest emitter (datahub-project#10902)

Co-authored-by: Gabe Lyons <gabe.lyons@acryl.io>

* feat(ingest): add snowflake-queries source (datahub-project#10835)

* fix(ingest): improve `auto_materialize_referenced_tags_terms` error handling (datahub-project#10906)

* docs: add new company to adoption list (datahub-project#10909)

* refactor(redshift): Improve redshift error handling with new structured reporting system (datahub-project#10870)

Co-authored-by: John Joyce <john@Johns-MBP.lan>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* feat(ui) Finalize support for all entity types on forms (datahub-project#10915)

* Index ExecutionRequestResults status field (datahub-project#10811)

* feat(ingest): grafana connector (datahub-project#10891)

Co-authored-by: Shirshanka Das <shirshanka@apache.org>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(gms) Add Form entity type to EntityTypeMapper (datahub-project#10916)

* feat(dataset): add support for external url in Dataset (datahub-project#10877)

* docs(saas-overview) added missing features to observe section (datahub-project#10913)

Co-authored-by: John Joyce <john@acryl.io>

* fix(ingest/spark): Fixing Micrometer warning (datahub-project#10882)

* fix(structured properties): allow application of structured properties without schema file (datahub-project#10918)

* fix(data-contracts-web) handle other schedule types (datahub-project#10919)

* fix(ingestion/tableau): human-readable message for PERMISSIONS_MODE_SWITCHED error (datahub-project#10866)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* Add feature flag for view defintions (datahub-project#10914)

Co-authored-by: Ethan Cartwright <ethan.cartwright@acryl.io>

* feat(ingest/BigQuery): refactor+parallelize dataset metadata extraction (datahub-project#10884)

* fix(airflow): add error handling around render_template() (datahub-project#10907)

* feat(ingestion/sqlglot): add optional `default_dialect` parameter to sqlglot lineage (datahub-project#10830)

* feat(mcp-mutator): new mcp mutator plugin (datahub-project#10904)

* fix(ingest/bigquery): changes helper function to decode unicode scape sequences (datahub-project#10845)

* feat(ingest/postgres): fetch table sizes for profile (datahub-project#10864)

* feat(ingest/abs): Adding azure blob storage ingestion source (datahub-project#10813)

* fix(ingest/redshift): reduce severity of SQL parsing issues (datahub-project#10924)

* fix(build): fix lint fix web react (datahub-project#10896)

* fix(ingest/bigquery): handle quota exceeded for project.list requests (datahub-project#10912)

* feat(ingest): report extractor failures more loudly (datahub-project#10908)

* feat(ingest/snowflake): integrate snowflake-queries into main source (datahub-project#10905)

* fix(ingest): fix docs build (datahub-project#10926)

* fix(ingest/snowflake): fix test connection (datahub-project#10927)

* fix(ingest/lookml): add view load failures to cache (datahub-project#10923)

* docs(slack) overhauled setup instructions and screenshots (datahub-project#10922)

Co-authored-by: John Joyce <john@acryl.io>

* fix(airflow): Add comma parsing of owners to DataJobs (datahub-project#10903)

* fix(entityservice): fix merging sideeffects (datahub-project#10937)

* feat(ingest): Support System Ingestion Sources, Show and hide system ingestion sources with Command-S (datahub-project#10938)

Co-authored-by: John Joyce <john@Johns-MBP.lan>

* chore() Set a default lineage filtering end time on backend when a start time is present (datahub-project#10925)

Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>
Co-authored-by: John Joyce <john@Johns-MBP.lan>

* Added relationships APIs to V3. Added these generic APIs to V3 swagger doc. (datahub-project#10939)

* docs: add learning center to docs (datahub-project#10921)

* doc: Update hubspot form id (datahub-project#10943)

* chore(airflow): add python 3.11 w/ Airflow 2.9 to CI (datahub-project#10941)

* fix(ingest/Glue): column upstream lineage between S3 and Glue (datahub-project#10895)

* fix(ingest/abs): split abs utils into multiple files (datahub-project#10945)

* doc(ingest/looker): fix doc for sql parsing documentation (datahub-project#10883)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(ingest/bigquery): Adding missing BigQuery types (datahub-project#10950)

* fix(ingest/setup): feast and abs source setup (datahub-project#10951)

* fix(connections) Harden adding /gms to connections in backend (datahub-project#10942)

* feat(siblings) Add flag to prevent combining siblings in the UI (datahub-project#10952)

* fix(docs): make graphql doc gen more automated (datahub-project#10953)

* feat(ingest/athena): Add option for Athena partitioned profiling (datahub-project#10723)

* fix(spark-lineage): default timeout for future responses (datahub-project#10947)

* feat(datajob/flow): add environment filter using info aspects (datahub-project#10814)

* fix(ui/ingest): correct privilege used to show tab (datahub-project#10483)

Co-authored-by: Kunal-kankriya <127090035+Kunal-kankriya@users.noreply.github.com>

* feat(ingest/looker): include dashboard urns in browse v2 (datahub-project#10955)

* add a structured type to batchGet in OpenAPI V3 spec (datahub-project#10956)

* fix(ui): scroll on the domain sidebar to show all domains (datahub-project#10966)

* fix(ingest/sagemaker): resolve incorrect variable assignment for SageMaker API call (datahub-project#10965)

* fix(airflow/build): Pinning mypy (datahub-project#10972)

* Fixed a bug where the OpenAPI V3 spec was incorrect. The bug was introduced in datahub-project#10939. (datahub-project#10974)

* fix(ingest/test): Fix for mssql integration tests (datahub-project#10978)

* fix(entity-service) exist check correctly extracts status (datahub-project#10973)

* fix(structuredProps) casing bug in StructuredPropertiesValidator (datahub-project#10982)

* bugfix: use anyOf instead of allOf when creating references in openapi v3 spec (datahub-project#10986)

* fix(ui): Remove ant less imports (datahub-project#10988)

* feat(ingest/graph): Add get_results_by_filter to DataHubGraph (datahub-project#10987)

* feat(ingest/cli): init does not actually support environment variables (datahub-project#10989)

* fix(ingest/graph): Update get_results_by_filter graphql query (datahub-project#10991)

* feat(ingest/spark): Promote beta plugin (datahub-project#10881)

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* feat(ingest): support domains in meta -> "datahub" section (datahub-project#10967)

* feat(ingest): add `check server-config` command (datahub-project#10990)

* feat(cli): Make consistent use of DataHubGraphClientConfig (datahub-project#10466)

Deprecates get_url_and_token() in favor of a more complete option, load_graph_config(), which returns a full DatahubClientConfig.
The change is propagated across all previous usages of get_url_and_token, so client connections to the DataHub server now respect every setting in DatahubClientConfig.

For example, you can now set disable_ssl_verification: true in your ~/.datahubenv file so that all CLI calls to the server still work when SSL certificate verification is disabled; see the sketch below.

Fixes datahub-project#9705
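
A minimal sketch of what this enables, assuming the inline construction shown here (the DatahubClientConfig built by hand mirrors the fields that load_graph_config() would read from ~/.datahubenv; host and token are placeholders):

```python
# Minimal sketch: the full DatahubClientConfig now flows through to the graph
# client, so options such as disable_ssl_verification are honoured end-to-end.
from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

config = DatahubClientConfig(
    server="https://datahub.example.com:8080",  # placeholder host
    token="<personal-access-token>",            # placeholder token
    disable_ssl_verification=True,              # previously dropped on some code paths
)
graph = DataHubGraph(config)  # all client calls now use the full config
```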

* fix(ingest/s3): Fixing container creation when there is no folder in path (datahub-project#10993)

* fix(ingest/looker): support platform instance for dashboards & charts (datahub-project#10771)

* feat(ingest/bigquery): improve handling of information schema in sql parser (datahub-project#10985)

* feat(ingest): improve `ingest deploy` command (datahub-project#10944)

* fix(backend): allow excluding soft-deleted entities in relationship-queries; exclude soft-deleted members of groups (datahub-project#10920)

- allow excluding soft-deleted entities in relationship-queries
- exclude soft-deleted members of groups

* fix(ingest/looker): downgrade missing chart type log level (datahub-project#10996)

* doc(acryl-cloud): release docs for 0.3.4.x (datahub-project#10984)

Co-authored-by: John Joyce <john@acryl.io>
Co-authored-by: RyanHolstien <RyanHolstien@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Pedro Silva <pedro@acryl.io>

* fix(protobuf/build): Fix protobuf check jar script (datahub-project#11006)

* fix(ui/ingest): Support invalid cron jobs (datahub-project#10998)

* fix(ingest): fix graph config loading (datahub-project#11002)

Co-authored-by: Pedro Silva <pedro@acryl.io>

* feat(docs): Document __DATAHUB_TO_FILE_ directive (datahub-project#10968)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(graphql/upsertIngestionSource): Validate cron schedule; parse error in CLI (datahub-project#11011)

* feat(ece): support custom ownership type urns in ECE generation (datahub-project#10999)

* feat(assertion-v2): changed Validation tab to Quality and created new Governance tab (datahub-project#10935)

* fix(ingestion/glue): Add support for missing config options for profiling in Glue (datahub-project#10858)

* feat(propagation): Add models for schema field docs, tags, terms (datahub-project#2959) (datahub-project#11016)

Co-authored-by: Chris Collins <chriscollins3456@gmail.com>

* docs: standardize terminology to DataHub Cloud (datahub-project#11003)

* fix(ingestion/transformer): replace the externalUrl container (datahub-project#11013)

* docs(slack) troubleshoot docs (datahub-project#11014)

* feat(propagation): Add graphql API (datahub-project#11030)

Co-authored-by: Chris Collins <chriscollins3456@gmail.com>

* feat(propagation):  Add models for Action feature settings (datahub-project#11029)

* docs(custom properties): Remove duplicate from sidebar (datahub-project#11033)

* feat(models): Introducing Dataset Partitions Aspect (datahub-project#10997)

Co-authored-by: John Joyce <john@Johns-MBP.lan>
Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>

* feat(propagation): Add Documentation Propagation Settings (datahub-project#11038)

* fix(models): chart schema fields mapping, add dataHubAction entity, t… (datahub-project#11040)

* fix(ci): smoke test lint failures (datahub-project#11044)

* docs: fix learning center color scheme & typo (datahub-project#11043)

* feat: add cloud main page (datahub-project#11017)

Co-authored-by: Jay <159848059+jayacryl@users.noreply.github.com>

* feat(restore-indices): add additional step to also clear system metadata service (datahub-project#10662)

Co-authored-by: John Joyce <john@acryl.io>

* docs: fix typo (datahub-project#11046)

* fix(lint): apply spotless (datahub-project#11050)

* docs(airflow): example query to get datajobs for a dataflow (datahub-project#11034)

* feat(cli): Add run-id option to put sub-command (datahub-project#11023)

Adds an option to assign a run-id to a given put command execution.
This is useful when no transformer exists for a given ingestion payload: we can follow up with custom metadata and still attribute it to an ingestion pipeline run (see the sketch below).
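
As an illustration only (not the CLI change itself), the emitter-level analogue of tagging a hand-crafted aspect with a run id via systemMetadata; the urn, aspect, and run id are placeholders:

```python
# Hedged sketch: attach a run id to a manually emitted aspect so it is
# attributed to an ingestion run, analogous to `datahub put` with a run id.
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
from datahub.metadata.schema_classes import DatasetPropertiesClass, SystemMetadataClass

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
mcp = MetadataChangeProposalWrapper(
    entityUrn="urn:li:dataset:(urn:li:dataPlatform:hive,db.example_table,PROD)",
    aspect=DatasetPropertiesClass(description="Backfilled by hand"),
    systemMetadata=SystemMetadataClass(runId="manual-backfill-2024-07"),
)
graph.emit_mcp(mcp)  # the aspect now carries the custom run id
```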

* fix(ingest): improve sql error reporting calls (datahub-project#11025)

* fix(airflow): fix CI setup (datahub-project#11031)

* feat(ingest/dbt): add experimental `prefer_sql_parser_lineage` flag (datahub-project#11039)

* fix(ingestion/lookml): enable stack-trace in lookml logs (datahub-project#10971)

* (chore): Linting fix (datahub-project#11015)

* chore(ci): update deprecated github actions (datahub-project#10977)

* Fix ALB configuration example (datahub-project#10981)

* chore(ingestion-base): bump base image packages (datahub-project#11053)

* feat(cli): Trim report of dataHubExecutionRequestResult to max GMS size (datahub-project#11051)

* fix(ingestion/lookml): emit dummy sql condition for lookml custom condition tag (datahub-project#11008)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(ingestion/powerbi): fix issue with broken report lineage (datahub-project#10910)

* feat(ingest/tableau): add retry on timeout (datahub-project#10995)

* change generate kafka connect properties from env (datahub-project#10545)

Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>

* fix(ingest): fix oracle cronjob ingestion (datahub-project#11001)

Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>

* chore(ci): revert update deprecated github actions (datahub-project#10977) (datahub-project#11062)

* feat(ingest/dbt-cloud): update metadata_endpoint inference (datahub-project#11041)

* build: Reduce size of datahub-frontend-react image by 50-ish% (datahub-project#10878)

Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>

* fix(ci): Fix lint issue in datahub_ingestion_run_summary_provider.py (datahub-project#11063)

* docs(ingest): update developing-a-transformer.md (datahub-project#11019)

* feat(search-test): update search tests from datahub-project#10408 (datahub-project#11056)

* feat(cli): add aspects parameter to DataHubGraph.get_entity_semityped (datahub-project#11009)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* docs(airflow): update min version for plugin v2 (datahub-project#11065)

* doc(ingestion/tableau): doc update for derived permission (datahub-project#11054)

Co-authored-by: Pedro Silva <pedro.cls93@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(py): remove dep on types-pkg_resources (datahub-project#11076)

* feat(ingest/mode): add option to exclude restricted (datahub-project#11081)

* fix(ingest): set lastObserved in sdk when unset (datahub-project#11071)

* doc(ingest): Update capabilities (datahub-project#11072)

* chore(vulnerability): Log Injection (datahub-project#11090)

* chore(vulnerability): Information exposure through a stack trace (datahub-project#11091)

* chore(vulnerability): Comparison of narrow type with wide type in loop condition (datahub-project#11089)

* chore(vulnerability): Insertion of sensitive information into log files (datahub-project#11088)

* chore(vulnerability): Risky Cryptographic Algorithm (datahub-project#11059)

* chore(vulnerability): Overly permissive regex range (datahub-project#11061)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix: update customer data (datahub-project#11075)

* fix(models): fixing the datasetPartition models (datahub-project#11085)

Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>

* fix(ui): Adding view, forms GraphQL query, remove showing a fallback error message on unhandled GraphQL error (datahub-project#11084)

Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>

* feat(docs-site): hiding learn more from cloud page (datahub-project#11097)

* fix(docs): Add correct usage of orFilters in search API docs (datahub-project#11082)

Co-authored-by: Jay <159848059+jayacryl@users.noreply.github.com>

* fix(ingest/mode): Regexp in mode name matcher didn't allow underscore (datahub-project#11098)

* docs: Refactor customer stories section (datahub-project#10869)

Co-authored-by: Jeff Merrick <jeff@wireform.io>

* fix(release): fix full/slim suffix on tag (datahub-project#11087)

* feat(config): support alternate hashing algorithm for doc id (datahub-project#10423)

Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
Co-authored-by: John Joyce <john@acryl.io>

* fix(emitter): fix typo in get method of java kafka emitter (datahub-project#11007)

* fix(ingest): use correct native data type in all SQLAlchemy sources by compiling data type using dialect (datahub-project#10898)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* chore: Update contributors list in PR labeler (datahub-project#11105)

* feat(ingest): tweak stale entity removal messaging (datahub-project#11064)

* fix(ingestion): enforce lastObserved timestamps in SystemMetadata (datahub-project#11104)

* fix(ingest/powerbi): fix broken lineage between chart and dataset (datahub-project#11080)

* feat(ingest/lookml): CLL support for sql set in sql_table_name attribute of lookml view (datahub-project#11069)

* docs: update graphql docs on forms & structured properties (datahub-project#11100)

* test(search): search openAPI v3 test (datahub-project#11049)

* fix(ingest/tableau): prevent empty site content urls (datahub-project#11057)

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* feat(entity-client): implement client batch interface (datahub-project#11106)

* fix(snowflake): avoid reporting warnings/info for sys tables (datahub-project#11114)

* fix(ingest): downgrade column type mapping warning to info (datahub-project#11115)

* feat(api): add AuditStamp to the V3 API entity/aspect response (datahub-project#11118)

* fix(ingest/redshift): replace r'\n' with '\n' to avoid token error redshift serverless… (datahub-project#11111)

* fix(entiy-client): handle null entityUrn case for restli (datahub-project#11122)

* fix(sql-parser): prevent bad urns from alter table lineage (datahub-project#11092)

* fix(ingest/bigquery): use small batch size if use_tables_list_query_v2 is set (datahub-project#11121)

* fix(graphql): add missing entities to EntityTypeMapper and EntityTypeUrnMapper (datahub-project#10366)

* feat(ui): Changes to allow editable dataset name (datahub-project#10608)

Co-authored-by: Jay Kadambi <jayasimhan_venkatadri@optum.com>

* fix: remove saxo (datahub-project#11127)

* feat(mcl-processor): Update mcl processor hooks (datahub-project#11134)

* fix(openapi): fix openapi v2 endpoints & v3 documentation update

* Revert "fix(openapi): fix openapi v2 endpoints & v3 documentation update"

This reverts commit 573c1cb.

* docs(policies): updates to policies documentation (datahub-project#11073)

* fix(openapi): fix openapi v2 and v3 docs update (datahub-project#11139)

* feat(auth): grant type and acr values custom oidc parameters support (datahub-project#11116)

* fix(mutator): mutator hook fixes (datahub-project#11140)

* feat(search): support sorting on multiple fields (datahub-project#10775)

* feat(ingest): various logging improvements (datahub-project#11126)

* fix(ingestion/lookml): fix for sql parsing error (datahub-project#11079)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* feat(docs-site) cloud page spacing and content polishes (datahub-project#11141)

* feat(ui) Enable editing structured props on fields (datahub-project#11042)

* feat(tests): add md5 and last computed to testResult model (datahub-project#11117)

* test(openapi): openapi regression smoke tests (datahub-project#11143)

* fix(airflow): fix tox tests + update docs (datahub-project#11125)

* docs: add chime to adoption stories (datahub-project#11142)

* fix(ingest/databricks): Updating code to work with Databricks sdk 0.30 (datahub-project#11158)

* fix(kafka-setup): add missing script to image (datahub-project#11190)

* fix(config): fix hash algo config (datahub-project#11191)

* test(smoke-test): updates to smoke-tests (datahub-project#11152)

* fix(elasticsearch): refactor idHashAlgo setting (datahub-project#11193)

* chore(kafka): kafka version bump (datahub-project#11211)

* readd UsageStatsWorkUnit

* fix merge problems

* change logo

---------

Co-authored-by: Chris Collins <chriscollins3456@gmail.com>
Co-authored-by: John Joyce <john@acryl.io>
Co-authored-by: John Joyce <john@Johns-MBP.lan>
Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>
Co-authored-by: dushayntAW <158567391+dushayntAW@users.noreply.github.com>
Co-authored-by: sagar-salvi-apptware <159135491+sagar-salvi-apptware@users.noreply.github.com>
Co-authored-by: Aseem Bansal <asmbansal2@gmail.com>
Co-authored-by: Kevin Chun <kevin1chun@gmail.com>
Co-authored-by: jordanjeremy <72943478+jordanjeremy@users.noreply.github.com>
Co-authored-by: skrydal <piotr.skrydalewicz@gmail.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
Co-authored-by: sid-acryl <155424659+sid-acryl@users.noreply.github.com>
Co-authored-by: Julien Jehannet <80408664+aviv-julienjehannet@users.noreply.github.com>
Co-authored-by: Hendrik Richert <github@richert.li>
Co-authored-by: Hendrik Richert <hendrik.richert@swisscom.com>
Co-authored-by: RyanHolstien <RyanHolstien@users.noreply.github.com>
Co-authored-by: Felix Lüdin <13187726+Masterchen09@users.noreply.github.com>
Co-authored-by: Pirry <158024088+chardaway@users.noreply.github.com>
Co-authored-by: Hyejin Yoon <0327jane@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: cburroughs <chris.burroughs@gmail.com>
Co-authored-by: ksrinath <ksrinath@users.noreply.github.com>
Co-authored-by: Mayuri Nehate <33225191+mayurinehate@users.noreply.github.com>
Co-authored-by: Kunal-kankriya <127090035+Kunal-kankriya@users.noreply.github.com>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
Co-authored-by: ipolding-cais <155455744+ipolding-cais@users.noreply.github.com>
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
Co-authored-by: Shubham Jagtap <132359390+shubhamjagtap639@users.noreply.github.com>
Co-authored-by: haeniya <yanik.haeni@gmail.com>
Co-authored-by: Yanik Häni <Yanik.Haeni1@swisscom.com>
Co-authored-by: Gabe Lyons <itsgabelyons@gmail.com>
Co-authored-by: Gabe Lyons <gabe.lyons@acryl.io>
Co-authored-by: 808OVADOZE <52988741+shtephlee@users.noreply.github.com>
Co-authored-by: noggi <anton.kuraev@acryl.io>
Co-authored-by: Nicholas Pena <npena@foursquare.com>
Co-authored-by: Jay <159848059+jayacryl@users.noreply.github.com>
Co-authored-by: ethan-cartwright <ethan.cartwright.m@gmail.com>
Co-authored-by: Ethan Cartwright <ethan.cartwright@acryl.io>
Co-authored-by: Nadav Gross <33874964+nadavgross@users.noreply.github.com>
Co-authored-by: Patrick Franco Braz <patrickfbraz@poli.ufrj.br>
Co-authored-by: pie1nthesky <39328908+pie1nthesky@users.noreply.github.com>
Co-authored-by: Joel Pinto Mata (KPN-DSH-DEX team) <130968841+joelmataKPN@users.noreply.github.com>
Co-authored-by: Ellie O'Neil <110510035+eboneil@users.noreply.github.com>
Co-authored-by: Ajoy Majumdar <ajoymajumdar@hotmail.com>
Co-authored-by: deepgarg-visa <149145061+deepgarg-visa@users.noreply.github.com>
Co-authored-by: Tristan Heisler <tristankheisler@gmail.com>
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
Co-authored-by: Davi Arnaut <davi.arnaut@acryl.io>
Co-authored-by: Pedro Silva <pedro@acryl.io>
Co-authored-by: amit-apptware <132869468+amit-apptware@users.noreply.github.com>
Co-authored-by: Sam Black <sam.black@acryl.io>
Co-authored-by: Raj Tekal <varadaraj_tekal@optum.com>
Co-authored-by: Steffen Grohsschmiedt <gitbhub@steffeng.eu>
Co-authored-by: jaegwon.seo <162448493+wornjs@users.noreply.github.com>
Co-authored-by: Renan F. Lima <51028757+lima-renan@users.noreply.github.com>
Co-authored-by: Matt Exchange <xkollar@users.noreply.github.com>
Co-authored-by: Jonny Dixon <45681293+acrylJonny@users.noreply.github.com>
Co-authored-by: Pedro Silva <pedro.cls93@gmail.com>
Co-authored-by: Pinaki Bhattacharjee <pinakipb2@gmail.com>
Co-authored-by: Jeff Merrick <jeff@wireform.io>
Co-authored-by: skrydal <piotr.skrydalewicz@acryl.io>
Co-authored-by: AndreasHegerNuritas <163423418+AndreasHegerNuritas@users.noreply.github.com>
Co-authored-by: jayasimhankv <145704974+jayasimhankv@users.noreply.github.com>
Co-authored-by: Jay Kadambi <jayasimhan_venkatadri@optum.com>
Co-authored-by: David Leifker <david.leifker@acryl.io>