
fix(ingestion/lookml): liquid template resolution and view-to-view cll #10542

Merged
merged 65 commits on Jul 8, 2024

Conversation

sid-acryl
Collaborator

@sid-acryl sid-acryl commented May 20, 2024

  • Update code to use the DataHub SqlParser for SQL parsing
  • Fix issues in CLL generation when the view definition language is SQL
  • Add support for Liquid template resolution in LookML views
  • Add a condition tag similar to Looker's Liquid condition tag (see the sketch below)
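
The Liquid resolution this PR adds can be illustrated with python-liquid directly. The following is a minimal sketch of how a LookML sql_table_name containing a Liquid conditional resolves, using made-up user-attribute values rather than the connector's actual configuration or code path:

from liquid import Environment

# Illustrative sql_table_name with a Liquid conditional, similar to the
# activity_logs test view added in this PR.
SQL_TABLE_NAME = """
{% if _user_attributes['looker_env'] == 'dev' %}
  {{ _user_attributes['dev_database_prefix'] }}analytics.staging_app.stg_app__activity_logs
{% else %}
  analytics.staging_app.stg_app__activity_logs
{% endif %}
"""

env = Environment()
rendered = env.from_string(SQL_TABLE_NAME).render(
    _user_attributes={"looker_env": "dev", "dev_database_prefix": "dev_"}
)
print(rendered.strip())  # dev_analytics.staging_app.stg_app__activity_logs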

Summary by CodeRabbit

  • New Features

    • Enhanced Looker integration with improved field handling and metadata event generation.
    • Introduced data classes for handling Looker model and view files.
  • Chores

    • Updated dependencies: Added "python-liquid" and sqlglot_lib for LookML support.

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label May 20, 2024
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

Outside diff range and nitpick comments (4)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (1)

Line range hint 1349-1349: Remove use of lru_cache on methods.

Using functools.lru_cache on methods can lead to memory leaks. Consider using an alternative caching mechanism.

-    @lru_cache(maxsize=200)
+    # @lru_cache(maxsize=200)
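
If caching on the method is still desired, one common workaround, shown here only as a sketch and not as the change made in this PR, is to build the cache per instance so cached entries are released with the instance instead of being pinned by a class-level lru_cache that holds strong references to self:

from functools import lru_cache


class ExploreRegistry:  # hypothetical class, for illustration only
    def __init__(self) -> None:
        # The cache lives on the instance, so it is garbage-collected
        # together with the instance.
        self._cached_lookup = lru_cache(maxsize=200)(self._lookup_uncached)

    def _lookup_uncached(self, key: str) -> str:
        return key.upper()  # stand-in for the expensive call

    def lookup(self, key: str) -> str:
        return self._cached_lookup(key)
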
metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py (2)

Line range hint 385-385: Optimize dictionary key check.

Use key in dict instead of key in dict.keys().

-        for field in filters.keys():
+        for field in filters:

Line range hint 1260-1264: Refactor nested if statements.

Use a single if statement instead of nested if statements.

-        if dashboard is None and dashboard_element is not None:
-            ownership = self.get_ownership(dashboard_element)
-            if ownership is not None:
-                chart_snapshot.aspects.append(ownership)
+        if dashboard is None and dashboard_element is not None and (ownership := self.get_ownership(dashboard_element)) is not None:
+            chart_snapshot.aspects.append(ownership)
metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json (1)

Line range hint 1405-1485: Ensure completeness of field definitions.

The field country is mentioned in the view logic but not defined in the schema metadata. This could lead to incomplete metadata representation.

Ensure that all fields used in the view logic are defined in the schema metadata.

{
  "fieldPath": "country",
  "nullable": false,
  "description": "Country",
  "label": "",
  "type": {
    "type": {
      "com.linkedin.pegasus2avro.schema.StringType": {}
    }
  },
  "nativeDataType": "string",
  "recursive": false,
  "globalTags": {
    "tags": []
  },
  "isPartOfKey": false
}
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 8edc94d and 5ad8200.

Files selected for processing (41)
  • metadata-ingestion/setup.py (2 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (18 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_connection.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_dataclasses.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_file_loader.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_liquid_tag.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_concept_context.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_config.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_refinement.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_resolver.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (21 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/str_functions.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/urn_functions.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/view_upstream.py (1 hunks)
  • metadata-ingestion/tests/integration/looker/test_looker.py (1 hunks)
  • metadata-ingestion/tests/integration/lookml/duplicate_field_ingestion_golden.json (6 hunks)
  • metadata-ingestion/tests/integration/lookml/expected_output.json (19 hunks)
  • metadata-ingestion/tests/integration/lookml/lkml_samples/liquid.view.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/lkml_samples/nested/fragment_derived.view.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/lkml_samples_hive/included_view_file.view.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/lkml_samples_hive/liquid.view.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/lkml_samples_hive/nested/fragment_derived.view.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json (9 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json (9 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json (6 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json (9 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json (9 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json (9 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_reachable_views.json (12 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_same_name_views_different_file_path.json (8 hunks)
  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json (20 hunks)
  • metadata-ingestion/tests/integration/lookml/test_lookml.py (4 hunks)
  • metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/activity_logs.view.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/data.model.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/employee_income_source.view.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/employee_tax_report.view.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/employee_total_income.view.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/top_10_employee_income_source.view.lkml (1 hunks)
  • metadata-ingestion/tests/integration/lookml/vv_lineage_liquid_template_golden.json (1 hunks)
Files not summarized due to errors (2)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py: Error: Message exceeds token limit
  • metadata-ingestion/tests/integration/lookml/vv_lineage_liquid_template_golden.json: Error: Message exceeds token limit
Files not reviewed due to errors (1)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_config.py (no review received)
Files skipped from review due to trivial changes (4)
  • metadata-ingestion/src/datahub/ingestion/source/looker/str_functions.py
  • metadata-ingestion/tests/integration/lookml/lkml_samples/liquid.view.lkml
  • metadata-ingestion/tests/integration/lookml/lkml_samples_hive/liquid.view.lkml
  • metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/data.model.lkml
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/looker/looker_dataclasses.py

145-152: Use a single if statement instead of nested if statements

(SIM102)

metadata-ingestion/tests/integration/lookml/test_lookml.py

719-720: Use a single if statement instead of nested if statements

(SIM102)

metadata-ingestion/tests/integration/looker/test_looker.py

490-490: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py

411-414: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


632-635: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py

1349-1349: Use of functools.lru_cache or functools.cache on methods can lead to memory leaks

(B019)

metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py

385-385: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


1260-1264: Use a single if statement instead of nested if statements

(SIM102)

Additional comments not posted (173)
metadata-ingestion/tests/integration/lookml/lkml_samples_hive/included_view_file.view.lkml (1)

2-2: Verify the SQL table name formatting.

Ensure that the SQL table name "looker_schema"."include_able" is correctly formatted and valid in your database.

metadata-ingestion/tests/integration/lookml/lkml_samples/nested/fragment_derived.view.lkml (3)

4-4: Verify the SQL syntax and column name.

Ensure that the column date exists and the alias DATE is correctly used in the SQL query.


5-5: Verify the SQL syntax and column name.

Ensure that the column platform exists and the alias aliased_platform is correctly used in the SQL query.


6-6: Verify the SQL syntax and column name.

Ensure that the column country exists and is correctly used in the SQL query.

metadata-ingestion/tests/integration/lookml/lkml_samples_hive/nested/fragment_derived.view.lkml (3)

4-4: Verify the SQL syntax and column name.

Ensure that the column date exists and the alias DATE is correctly used in the SQL query.


5-5: Verify the SQL syntax and column name.

Ensure that the column platform exists and the alias aliased_platform is correctly used in the SQL query.


6-6: Verify the SQL syntax and column name.

Ensure that the column country exists and is correctly used in the SQL query.

metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/employee_tax_report.view.lkml (4)

2-2: Verify the SQL table name formatting.

Ensure that the SQL table name data-warehouse.finance.form-16 is correctly formatted and valid in your database.


4-6: Verify the dimension type and SQL syntax.

Ensure that the dimension id with type number and SQL ${TABLE}.id is correctly defined.


9-11: Verify the dimension type and SQL syntax.

Ensure that the dimension name with type string and SQL ${TABLE}.name is correctly defined.


14-16: Verify the measure type and SQL syntax.

Ensure that the measure taxable_income with type sum and SQL ${TABLE}.tax is correctly defined.

metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/employee_total_income.view.lkml (4)

1-3: LGTM!

The SQL table name is correctly defined using a liquid template variable.


4-7: LGTM!

The dimension id is correctly defined with type number and a SQL expression using a liquid template variable.


9-12: LGTM!

The dimension name is correctly defined with type string and a SQL expression using a liquid template variable.


14-17: LGTM!

The measure total_income is correctly defined with type sum and a SQL expression using a liquid template variable.

metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/top_10_employee_income_source.view.lkml (4)

1-10: LGTM!

The derived table is correctly defined using a SQL query with a liquid template variable.


12-15: LGTM!

The dimension id is correctly defined with type number and a SQL expression using a liquid template variable.


17-20: LGTM!

The dimension name is correctly defined with type string and a SQL expression using a liquid template variable.


22-25: LGTM!

The dimension source is correctly defined with type string and a SQL expression using a liquid template variable.

metadata-ingestion/src/datahub/ingestion/source/looker/urn_functions.py (2)

1-11: LGTM!

The function get_qualified_table_name correctly handles the URN format and returns the appropriate part of the URN.


13-18: LGTM!

The function get_table_name correctly handles the qualified table name and returns the appropriate part of the name.

metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/activity_logs.view.lkml (2)

1-10: LGTM!

The SQL table name is correctly defined using liquid template variables and conditional logic.


12-17: LGTM!

The dimension generated_message_id is correctly defined with a group label, primary key, type, and SQL expression using a liquid template variable.

metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/employee_income_source.view.lkml (3)

1-1: Add a description for the view.

It's good practice to add a description for the view to improve readability and maintainability.

+  description: "This view represents employee income source data."

6-12: Ensure proper handling of SQL injection.

Using liquid template tags in SQL queries can introduce SQL injection vulnerabilities. Ensure that the values used in these tags are properly sanitized.

Do you have measures in place to sanitize the values used in these liquid template tags?


16-16: Verify the custom condition tag implementation.

Ensure that the custom condition tag used here is correctly implemented and tested.

Is the custom condition tag implementation tested and verified for correctness?

metadata-ingestion/src/datahub/ingestion/source/looker/looker_liquid_tag.py (2)

14-17: Add a docstring for the CustomTagException class.

Adding a docstring will improve code readability and maintainability.

class CustomTagException(Exception):
+    """
+    Exception raised for errors in the custom tag processing.
+
+    Attributes:
+        message -- explanation of the error
+    """

45-56: Improve the docstring for the ConditionTag class.

Clarify the usage of the ConditionTag class and provide examples.

"""
ConditionTag is the equivalent implementation of Looker's custom liquid tag "condition".
Refer doc: https://cloud.google.com/looker/docs/templated-filters#basic_usage

Refer doc to see how to write liquid custom tag: https://jg-rp.github.io/liquid/guides/custom-tags

This class renders the below tag as order.region='ap-south-1' if order_region is provided in config.liquid_variables
as order_region: 'ap-south-1'
    {% condition order_region %} order.region {% endcondition %}

+Usage example:
+    {% condition order_region %} order.region {% endcondition %}
"""
metadata-ingestion/src/datahub/ingestion/source/looker/looker_connection.py (2)

42-64: Add a docstring for the LookerConnectionDefinition class.

Adding a docstring will improve code readability and maintainability.

class LookerConnectionDefinition(ConfigModel):
+    """
+    Represents a Looker connection definition.
+
+    Attributes:
+        platform -- the platform name
+        default_db -- the default database name
+        default_schema -- the default schema name (optional)
+        platform_instance -- the platform instance name (optional)
+        platform_env -- the environment that the platform is located in (optional)
+    """

75-85: Improve error handling in from_looker_connection method.

Ensure that the method handles missing dialect names gracefully.

if looker_connection.dialect_name is None:
    raise ConfigurationError(
        f"Unable to fetch a fully filled out connection for {looker_connection.name}. Please check your API permissions."
    )
for extractor_pattern, extracting_function in extractors.items():
    if re.match(extractor_pattern, looker_connection.dialect_name):
        (platform, db, schema) = extracting_function(looker_connection)
        return cls(platform=platform, default_db=db, default_schema=schema)
raise ConfigurationError(
    f"Could not find an appropriate platform for looker_connection: {looker_connection.name} with dialect: {looker_connection.dialect_name}"
)

Likely invalid or redundant comment.

metadata-ingestion/src/datahub/ingestion/source/looker/looker_file_loader.py (2)

40-42: Add a docstring for the is_view_seen method.

Adding a docstring will improve code readability and maintainability.

def is_view_seen(self, path: str) -> bool:
+    """
+    Checks if the view file at the given path has already been loaded.
+
+    Args:
+        path: The path to the view file.
+
+    Returns:
+        True if the view file has been loaded, False otherwise.
+    """
    return path in self.viewfile_cache

98-113: Add a docstring for the load_viewfile method.

Adding a docstring will improve code readability and maintainability.

def load_viewfile(
    self,
    path: str,
    project_name: str,
    connection: Optional[LookerConnectionDefinition],
    reporter: LookMLSourceReport,
) -> Optional[LookerViewFile]:
+    """
+    Loads the Looker view file at the given path, resolves liquid variables, and caches the result.
+
+    Args:
+        path: The path to the view file.
+        project_name: The name of the project.
+        connection: The Looker connection definition.
+        reporter: The source report for logging and error reporting.
+
+    Returns:
+        The loaded LookerViewFile object, or None if loading failed.
+    """
    viewfile = self._load_viewfile(
        project_name=project_name,
        path=path,
        reporter=reporter,
    )
    if viewfile is None:
        return None

    return replace(viewfile, connection=connection)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py (6)

19-23: Add type hints to the function.

Type hints improve code readability and help catch type-related errors early.

- def create_nested_dict(keys, value):
+ def create_nested_dict(keys: List[str], value: Any) -> Dict[str, Any]:
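
For context, a helper with this signature typically folds a key path into a nested mapping; a rough sketch of that behaviour (not the module's actual implementation):

from typing import Any, Dict, List


def create_nested_dict(keys: List[str], value: Any) -> Dict[str, Any]:
    # ["order", "region"], "ap-south-1" -> {"order": {"region": "ap-south-1"}}
    result: Dict[str, Any] = {keys[-1]: value}
    for key in reversed(keys[:-1]):
        result = {key: result}
    return result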

26-34: Add type hints to the class methods.

Type hints improve code readability and help catch type-related errors early.

- def __init__(self, liquid_variable):
+ def __init__(self, liquid_variable: Dict[str, Any]):

35-60: Add type hints to the method _create_new_liquid_variables_with_default.

Type hints improve code readability and help catch type-related errors early.

- def _create_new_liquid_variables_with_default(self, variables: Set[str]) -> dict:
+ def _create_new_liquid_variables_with_default(self, variables: Set[str]) -> Dict[str, Any]:

62-74: Add type hints to the method liquid_variable_with_default.

Type hints improve code readability and help catch type-related errors early.

- def liquid_variable_with_default(self, text: str) -> dict:
+ def liquid_variable_with_default(self, text: str) -> Dict[str, Any]:

77-101: Add type hints to the function resolve_liquid_variable.

Type hints improve code readability and help catch type-related errors early.

- def resolve_liquid_variable(text: str, liquid_variable: Dict[Any, Any]) -> str:
+ def resolve_liquid_variable(text: str, liquid_variable: Dict[str, Any]) -> str:

104-122: Add type hints to the function resolve_liquid_variable_in_view_dict.

Type hints improve code readability and help catch type-related errors early.

- def resolve_liquid_variable_in_view_dict(raw_view: dict, liquid_variable: Dict[Any, Any]) -> None:
+ def resolve_liquid_variable_in_view_dict(raw_view: Dict[str, Any], liquid_variable: Dict[str, Any]) -> None:
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_resolver.py (7)

25-29: Add type hints to the function is_derived_view.

Type hints improve code readability and help catch type-related errors early.

- def is_derived_view(view_name: str) -> bool:
+ def is_derived_view(view_name: str) -> bool:

32-52: Add type hints to the function get_derived_looker_view_id.

Type hints improve code readability and help catch type-related errors early.

- def get_derived_looker_view_id(qualified_table_name: str, looker_view_id_cache: "LookerViewIdCache", base_folder_path: str) -> Optional[LookerViewId]:
+ def get_derived_looker_view_id(qualified_table_name: str, looker_view_id_cache: "LookerViewIdCache", base_folder_path: str) -> Optional[LookerViewId]:

55-81: Add type hints to the function resolve_derived_view_urn_of_col_ref.

Type hints improve code readability and help catch type-related errors early.

- def resolve_derived_view_urn_of_col_ref(column_refs: List[ColumnRef], looker_view_id_cache: "LookerViewIdCache", base_folder_path: str, config: LookMLSourceConfig) -> List[ColumnRef]:
+ def resolve_derived_view_urn_of_col_ref(column_refs: List[ColumnRef], looker_view_id_cache: "LookerViewIdCache", base_folder_path: str, config: LookMLSourceConfig) -> List[ColumnRef]:

84-110: Add type hints to the function fix_derived_view_urn.

Type hints improve code readability and help catch type-related errors early.

- def fix_derived_view_urn(urns: List[str], looker_view_id_cache: "LookerViewIdCache", base_folder_path: str, config: LookMLSourceConfig) -> List[str]:
+ def fix_derived_view_urn(urns: List[str], looker_view_id_cache: "LookerViewIdCache", base_folder_path: str, config: LookMLSourceConfig) -> List[str]:

113-127: Add type hints to the function determine_view_file_path.

Type hints improve code readability and help catch type-related errors early.

- def determine_view_file_path(base_folder_path: str, absolute_file_path: str) -> str:
+ def determine_view_file_path(base_folder_path: str, absolute_file_path: str) -> str:

129-173: Add type hints to the class methods.

Type hints improve code readability and help catch type-related errors early.

- def __init__(self, project_name: str, model_name: str, looker_model: LookerModel, looker_viewfile_loader: LookerViewFileLoader, reporter: LookMLSourceReport):
+ def __init__(self, project_name: str, model_name: str, looker_model: LookerModel, looker_viewfile_loader: LookerViewFileLoader, reporter: LookMLSourceReport):

174-215: Add type hints to the method get_looker_view_id.

Type hints improve code readability and help catch type-related errors early.

- def get_looker_view_id(self, view_name: str, base_folder_path: str, connection: Optional[LookerConnectionDefinition] = None) -> Optional[LookerViewId]:
+ def get_looker_view_id(self, view_name: str, base_folder_path: str, connection: Optional[LookerConnectionDefinition] = None) -> Optional[LookerViewId]:
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_refinement.py (5)

18-62: Add type hints to the class methods.

Type hints improve code readability and help catch type-related errors early.

- def __init__(self, looker_model: LookerModel, looker_viewfile_loader: LookerViewFileLoader, connection_definition: LookerConnectionDefinition, source_config: LookMLSourceConfig, reporter: LookMLSourceReport):
+ def __init__(self, looker_model: LookerModel, looker_viewfile_loader: LookerViewFileLoader, connection_definition: LookerConnectionDefinition, source_config: LookMLSourceConfig, reporter: LookMLSourceReport):

63-66: Add type hints to the function is_refinement.

Type hints improve code readability and help catch type-related errors early.

- def is_refinement(view_name: str) -> bool:
+ def is_refinement(view_name: str) -> bool:

68-94: Add type hints to the function merge_column.

Type hints improve code readability and help catch type-related errors early.

- def merge_column(original_dict: dict, refinement_dict: dict, key: str) -> List[dict]:
+ def merge_column(original_dict: Dict[str, Any], refinement_dict: Dict[str, Any], key: str) -> List[Dict[str, Any]]:

97-105: Add type hints to the function merge_and_set_column.

Type hints improve code readability and help catch type-related errors early.

- def merge_and_set_column(new_raw_view: dict, refinement_view: dict, key: str) -> None:
+ def merge_and_set_column(new_raw_view: Dict[str, Any], refinement_view: Dict[str, Any], key: str) -> None:

107-132: Add type hints to the function merge_refinements.

Type hints improve code readability and help catch type-related errors early.

- def merge_refinements(raw_view: dict, refinement_views: List[dict]) -> dict:
+ def merge_refinements(raw_view: Dict[str, Any], refinement_views: List[Dict[str, Any]]) -> Dict[str, Any]:
metadata-ingestion/src/datahub/ingestion/source/looker/looker_dataclasses.py (4)

18-22: LGTM!

The ProjectInclude dataclass looks good and is correctly implemented.


24-30: LGTM!

The LookerField dataclass looks good and is correctly implemented.


39-85: LGTM!

The from_looker_dict method is well-structured and handles errors appropriately. The logging and reporting mechanisms are in place.


242-278: LGTM!

The from_looker_dict method is well-structured and handles errors appropriately. The logging and reporting mechanisms are in place.

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_concept_context.py (7)

30-54: LGTM!

The methods in LookerFieldContext are well-structured and handle field context appropriately. The logging and error handling mechanisms are in place.


191-216: LGTM!

The resolve_extends_view_name method is well-structured and handles view name resolution appropriately. The logging and error handling mechanisms are in place.


218-248: LGTM!

The get_including_extends method is well-structured and handles field resolution appropriately. The logging and error handling mechanisms are in place.


250-277: LGTM!

The methods _get_sql_table_name_field, _is_dot_sql_table_name_present, and sql_table_name are well-structured and handle SQL table name resolution appropriately. The logging and error handling mechanisms are in place.


278-321: LGTM!

The methods derived_table, explore_source, and sql are well-structured and handle derived table and SQL resolution appropriately. The logging and error handling mechanisms are in place.


323-343: LGTM!

The methods name and view_file_name are well-structured and handle view name and file name resolution appropriately. The logging and error handling mechanisms are in place.


344-413: LGTM!

The methods _get_list_dict, dimensions, measures, dimension_groups, is_materialized_derived_view, is_regular_case, is_sql_table_name_referring_to_view, is_sql_based_derived_case, is_native_derived_case, and is_sql_based_derived_view_without_fields_case are well-structured and handle view context appropriately. The logging and error handling mechanisms are in place.

metadata-ingestion/tests/integration/lookml/duplicate_field_ingestion_golden.json (3)

Line range hint 1-106: LGTM!

The JSON data segments are well-structured and correctly represent the test data for LookML integration tests.


Line range hint 107-233: LGTM!

The JSON data segments are well-structured and correctly represent the test data for LookML integration tests.


Line range hint 234-494: LGTM!

The JSON data segments are well-structured and correctly represent the test data for LookML integration tests.

metadata-ingestion/src/datahub/ingestion/source/looker/view_upstream.py (6)

42-110: LGTM!

The utility functions _platform_names_have_2_parts, _drop_hive_dot, _drop_hive_dot_from_upstream, and _generate_fully_qualified_name are well-structured and handle platform-specific naming and transformations appropriately. The logging and error handling mechanisms are in place.


113-148: LGTM!

The AbstractViewUpstream class is well-structured and defines abstract methods for extracting upstream column references and dataset URNs.


169-194: LGTM!

The __get_spr method is well-structured and handles SQL parsing results appropriately. The logging and error handling mechanisms are in place.


196-214: LGTM!

The __get_upstream_dataset_urn method is well-structured and handles upstream dataset URN resolution appropriately. The logging and error handling mechanisms are in place.


216-242: LGTM!

The create_fields method is well-structured and handles field creation appropriately. The logging and error handling mechanisms are in place.


244-281: LGTM!

The get_upstream_column_ref method is well-structured and handles upstream column references appropriately. The logging and error handling mechanisms are in place.

metadata-ingestion/tests/integration/lookml/lookml_same_name_views_different_file_path.json (2)

Line range hint 386-694: Ensure the correctness of browse paths.

The browse paths appear correct, but it's crucial to verify if the container references and path IDs are accurate and exist in the respective datasets.

Verification successful

Browse paths and container references are correct.

The container references and path IDs (urn:li:container:78f22c19304954b15e8adb1d9809975e, Develop, path1, path2, foo) exist in the dataset definitions, validating the correctness of the browse paths.

  • urn:li:container:78f22c19304954b15e8adb1d9809975e is found in multiple dataset definitions.
  • id: "Develop", id: "path1", id: "path2", and id: "foo" are all confirmed in the dataset definitions.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the existence of container references and path IDs in the respective datasets.

# Test: Search for the container references and path IDs in the dataset definitions. Expect: Container references and path IDs exist in the datasets.
rg --type json 'urn:li:container:78f22c19304954b15e8adb1d9809975e'
rg --type json '"id": "Develop"'
rg --type json '"id": "path1"'
rg --type json '"id": "path2"'
rg --type json '"id": "foo"'

Length of output: 80442


173-229: Ensure the correctness of fine-grained lineage.

The fine-grained lineage entries appear correct, but it's crucial to verify if the upstream and downstream field references are accurate and exist in the respective datasets.

Verification successful

Fine-grained lineage entries are valid.

The upstream and downstream field references exist in the respective datasets, confirming the correctness of the fine-grained lineage entries.

  • No issues found with the fine-grained lineage entries.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the existence of upstream and downstream field references in the respective datasets.

# Test: Search for the field references in the dataset definitions. Expect: Field references exist in the datasets.
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,warehouse.default_db.default_schema.my_table,DEV),country|city|is_latest|timestamp|measurement)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.path1.foo.view.my_view,PROD),country|city|is_latest|timestamp|average_measurement)'

Length of output: 358



Script:

#!/bin/bash
# Description: Verify the existence of upstream and downstream field references in the respective datasets.

# Test: Search for the specific field references in the dataset definitions. Expect: Field references exist in the datasets.
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,warehouse.default_db.default_schema.my_table,DEV),country)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,warehouse.default_db.default_schema.my_table,DEV),city)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,warehouse.default_db.default_schema.my_table,DEV),is_latest)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,warehouse.default_db.default_schema.my_table,DEV),timestamp)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,warehouse.default_db.default_schema.my_table,DEV),measurement)'

rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.path1.foo.view.my_view,PROD),country)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.path1.foo.view.my_view,PROD),city)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.path1.foo.view.my_view,PROD),is_latest)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.path1.foo.view.my_view,PROD),timestamp)'
rg --type json 'urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.path1.foo.view.my_view,PROD),average_measurement)'

Length of output: 1402

metadata-ingestion/setup.py (1)

169-169: Dependency addition looks good!

The addition of python-liquid to the looker_common and mode dependencies is correctly formatted and logically consistent.

Also applies to: 374-374

metadata-ingestion/tests/integration/lookml/test_lookml.py (2)

896-928: LGTM!

The function test_view_to_view_lineage_and_liquid_template is well-structured and correctly sets up the pipeline with liquid variables. The use of freeze_time ensures consistent test results. The golden file verification is a good practice to ensure the correctness of the output.


931-1004: LGTM!

The function test_special_liquid_variables is well-structured and correctly checks the handling of special liquid variables. The use of freeze_time ensures consistent test results. The assertions ensure that the default values are correctly added and that the actual values are not overwritten.

metadata-ingestion/tests/integration/looker/test_looker.py (1)

1053-1080: LGTM!

The function test_upstream_cll is well-structured and correctly sets up the mock Looker explore. The use of freeze_time ensures consistent test results. The mock configuration is well-defined. The assertions ensure that the upstream fields are correctly set.

metadata-ingestion/tests/integration/lookml/lookml_reachable_views.json (2)

604-616: Ensure dataset URNs are updated consistently.

The dataset URN for lkml_samples.view.owners should be updated consistently across all aspects.


386-398: Ensure dataset URNs are updated consistently.

The dataset URN for lkml_samples.view.my_view should be updated consistently across all aspects.

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (8)

4-7: Import statements look good.

The added imports are necessary for the updated functionality.


38-45: New imports from looker_common are appropriate.

The added imports from looker_common are necessary for the updated functionality.


98-98: New import for ColumnRef is appropriate.

The added import for ColumnRef is necessary for fine-grained lineage extraction.


875-875: Ensure consistent usage of LookerRefinementResolver.

The LookerRefinementResolver instance is correctly instantiated and used for explore refinement.


912-918: Ensure proper initialization of LookerViewIdCache.

The LookerViewIdCache instance is correctly instantiated with the necessary parameters.


972-980: Ensure proper initialization of LookerViewContext.

The LookerViewContext instance is correctly instantiated with the necessary parameters.


985-994: Ensure proper initialization of LookerView from Looker dictionary.

The LookerView instance is correctly instantiated with the necessary parameters.


632-635: Improve exception handling by chaining exceptions.

Use raise ... from err to distinguish the exception from errors in exception handling.

- raise ValueError(f"Could not locate a project name for model {model_name}. Consider configuring a static project name in your config file")
+ raise ValueError(f"Could not locate a project name for model {model_name}. Consider configuring a static project name in your config file") from err

Likely invalid or redundant comment.

Tools
Ruff

632-635: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (9)

131-138: Improve comment clarity.

The comments explaining the logic can be made clearer for better understanding.

-    # Remove duplicates filed from self.fields
+    # Remove duplicate fields from the provided list of fields.
-    # Logic is: If more than a field has same ViewField.name then keep only one filed where ViewField.field_type
+    # Logic: If more than one field has the same ViewField.name, keep only the field where ViewField.field_type
-    # is DIMENSION_GROUP.
+    # is DIMENSION_GROUP.
-    # Looker Constraint:
+    # Looker Constraints:
-    #   - Any field declared as dimension or measure can be redefined as dimension_group.
+    #   - Any field declared as a dimension or measure can be redefined as a dimension_group.
-    #   - Any field declared in dimension can't be redefined in measure and vice-versa.
+    #   - Any field declared as a dimension can't be redefined as a measure and vice-versa.
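
Read as code, the deduplication rule amounts to keeping the DIMENSION_GROUP variant when field names collide; a self-contained sketch with stand-in types (not the looker_common implementation):

from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class ViewFieldType(Enum):  # stand-in for the source's enum
    DIMENSION = "Dimension"
    DIMENSION_GROUP = "Dimension Group"
    MEASURE = "Measure"


@dataclass
class ViewField:  # stand-in with only the attributes the rule needs
    name: str
    field_type: ViewFieldType


def deduplicate_fields(fields: List[ViewField]) -> List[ViewField]:
    # If more than one field shares a name, keep only the DIMENSION_GROUP one.
    by_name: Dict[str, ViewField] = {}
    for field in fields:
        current = by_name.get(field.name)
        if current is None or field.field_type == ViewFieldType.DIMENSION_GROUP:
            by_name[field.name] = field
    return list(by_name.values())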

296-297: Verify the type hint for upstream_fields.

Ensure that the type hint Union[List[ColumnRef]] is appropriate and consider if it should be List[ColumnRef] instead.

-    upstream_fields: Union[List[ColumnRef]] = dataclasses_field(default_factory=list)
+    upstream_fields: List[ColumnRef] = dataclasses_field(default_factory=list)

299-332: Improve comment clarity.

The comments explaining the logic can be made clearer for better understanding.

-    # It is the list of ColumnRef for derived view defined using SQL otherwise simple column name
+    # It is the list of ColumnRef for a derived view defined using SQL, otherwise a simple column name.

340-402: Improve comment clarity.

The comments explaining the logic can be made clearer for better understanding.

-            return None  # Inconsistent info received
+            return None  # Inconsistent information received.
-            # remove variant at the end. +1 for "_"
+            # Remove variant at the end. +1 for "_".
-        assert view_name  # for lint false positive
+        assert view_name  # For lint false positive.

403-456: Improve comment clarity.

The comments explaining the logic can be made clearer for better understanding.

-            )  # Variant i.e. Month, Day, Year ... is not available
+            )  # Variant (e.g., Month, Day, Year, etc.) is not available.
-            )  # for Dimensional Group the type is always start with date_[time|date]
+            )  # For Dimensional Group, the type always starts with date_[time|date].
-            )  # if the explore field is generated because of  Dimensional Group in View
-            # then the field_name should ends with field_group_variant
+            )  # If the explore field is generated because of Dimensional Group in View,
+            # then the field_name should end with field_group_variant.

Line range hint 459-467: LGTM!

The function create_view_project_map is correct and straightforward.


Line range hint 844-895: Improve comment clarity.

The comments explaining the logic can be made clearer for better understanding.

-        # The view name that the explore refers to is resolved in the following order of priority:
-        # 1. view_name: https://cloud.google.com/looker/docs/reference/param-explore-view-name
-        # 2. from: https://cloud.google.com/looker/docs/reference/param-explore-from
-        # 3. default to the name of the explore
+        # The view name that the explore refers to is resolved in the following order of priority:
+        # 1. view_name: https://cloud.google.com/looker/docs/reference/param-explore-view-name
+        # 2. from: https://cloud.google.com/looker/docs/reference/param-explore-from
+        # 3. Default to the name of the explore.
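
Stated as plain Python, the priority order above reads roughly as follows (a sketch over a raw explore dict, not the connector's code):

from typing import Any, Dict


def resolve_explore_view_name(explore: Dict[str, Any]) -> str:
    # Priority: explicit view_name, then from, then the explore's own name.
    return explore.get("view_name") or explore.get("from") or explore["name"]


assert resolve_explore_view_name({"name": "orders", "from": "orders_base"}) == "orders_base"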

1083-1103: Improve comment clarity.

The comments explaining the logic can be made clearer for better understanding.

-            # form upstream of fields as all information is now available
+            # Form upstream of fields as all information is now available.

Line range hint 1217-1267: Improve comment clarity.

The comments explaining the logic can be made clearer for better understanding.

-                # if we raise error on file_path equal to None then existing test-cases will fail as mock data
-                # doesn't have required attributes.
+                # If we raise an error on file_path equal to None, then existing test cases will fail as mock data
+                # doesn't have the required attributes.
metadata-ingestion/tests/integration/lookml/vv_lineage_liquid_template_golden.json (27)

3-16: Ensure container properties are correctly defined.

The container properties aspect appears to be correctly defined with custom properties, name, and other metadata.


27-32: Ensure status aspect is correctly defined.

The status aspect for the container is correctly defined with the removed field set to false.


43-48: Ensure dataPlatformInstance aspect is correctly defined.

The dataPlatformInstance aspect correctly identifies the platform as Looker.


59-66: Ensure subTypes aspect is correctly defined.

The subTypes aspect correctly identifies the type as "LookML Project".


77-86: Ensure browsePathsV2 aspect is correctly defined.

The browsePathsV2 aspect correctly defines the path for the container.


97-104: Ensure subTypes aspect is correctly defined.

The subTypes aspect correctly identifies the type as "View".


133-138: Ensure container aspect is correctly defined.

The container aspect correctly identifies the container URN.


146-246: Ensure proposedSnapshot is correctly defined.

The proposedSnapshot aspect includes various metadata aspects such as BrowsePaths, Status, UpstreamLineage, SchemaMetadata, and DatasetProperties. Ensure all fields are correctly defined and consistent with the dataset's metadata.


283-290: Ensure subTypes aspect is correctly defined.

The subTypes aspect correctly identifies the type as "View".


319-324: Ensure container aspect is correctly defined.

The container aspect correctly identifies the container URN.


332-526: Ensure proposedSnapshot is correctly defined.

The proposedSnapshot aspect includes various metadata aspects such as BrowsePaths, Status, UpstreamLineage, SchemaMetadata, and DatasetProperties. Ensure all fields are correctly defined and consistent with the dataset's metadata.


563-570: Ensure subTypes aspect is correctly defined.

The subTypes aspect correctly identifies the type as "View".


599-604: Ensure container aspect is correctly defined.

The container aspect correctly identifies the container URN.


612-774: Ensure proposedSnapshot is correctly defined.

The proposedSnapshot aspect includes various metadata aspects such as BrowsePaths, Status, UpstreamLineage, SchemaMetadata, and DatasetProperties. Ensure all fields are correctly defined and consistent with the dataset's metadata.


811-818: Ensure subTypes aspect is correctly defined.

The subTypes aspect correctly identifies the type as "View".


847-852: Ensure container aspect is correctly defined.

The container aspect correctly identifies the container URN.


860-1022: Ensure proposedSnapshot is correctly defined.

The proposedSnapshot aspect includes various metadata aspects such as BrowsePaths, Status, UpstreamLineage, SchemaMetadata, and DatasetProperties. Ensure all fields are correctly defined and consistent with the dataset's metadata.


1059-1066: Ensure subTypes aspect is correctly defined.

The subTypes aspect correctly identifies the type as "View".


1095-1100: Ensure container aspect is correctly defined.

The container aspect correctly identifies the container URN.


1108-1270: Ensure proposedSnapshot is correctly defined.

The proposedSnapshot aspect includes various metadata aspects such as BrowsePaths, Status, UpstreamLineage, SchemaMetadata, and DatasetProperties. Ensure all fields are correctly defined and consistent with the dataset's metadata.


1283-1296: Ensure browsePathsV2 aspect is correctly defined.

The browsePathsV2 aspect correctly defines the path for the dataset.


1307-1312: Ensure tagKey aspect is correctly defined.

The tagKey aspect correctly defines the name of the tag as "Dimension".


1323-1328: Ensure tagKey aspect is correctly defined.

The tagKey aspect correctly defines the name of the tag as "Measure".


301-308: Ensure viewProperties aspect is correctly defined.

The viewProperties aspect defines the view logic and language correctly. However, ensure the SQL in the viewLogic field is validated for correctness.


115-122: Ensure viewProperties aspect is correctly defined.

The viewProperties aspect defines the view logic and language correctly. However, ensure the SQL in the viewLogic field is validated for correctness.

Verification successful

We need to validate the SQL in the viewLogic field within the LookML configuration. The previous attempt using sqlglot failed because the command was not found in the environment. Let's check for other available tools that might help us validate LookML SQL.


The automated SQL validation tools are not available in the current environment. Therefore, we need to manually inspect the SQL in the viewLogic field for correctness.

Ensure viewProperties aspect is correctly defined:

  • The SQL logic within the viewLogic field appears to be using LookML templating correctly.
  • The sql_table_name is dynamically set based on the _user_attributes['looker_env'] value, which is a common practice in LookML.
  • The dimension definition for generated_message_id seems properly formatted with the necessary attributes (group_label, primary_key, type, sql).

Given the context and the manual inspection, the SQL logic seems to be correctly defined.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Validate the SQL in the viewLogic field.

# Test: Extract and validate the SQL.
echo 'view: activity_logs {
  sql_table_name:
        {% if _user_attributes['looker_env'] == 'dev' %}
          {{ _user_attributes['dev_database_prefix'] }}analytics.{{ _user_attributes['dev_schema_prefix'] }}staging_app.stg_app__activity_logs
        {% elsif _user_attributes['looker_env'] == 'prod' %}
          analytics.staging_app.stg_app__activity_logs
        {% else %}
          analytics.staging_app.stg_app__activity_logs
        {% endif %}
        ;;
  
  dimension: generated_message_id {
    group_label: "IDs"
    primary_key: yes
    type: number
    sql: ${TABLE}."GENERATED_MESSAGE_ID" ;;
  }
}' | sqlglot transpile --dialect lookml

Length of output: 720



Script:

#!/bin/bash
# Check for available SQL validation tools in the environment
command -v sqlfluff
command -v sqlparse
command -v lookml

Length of output: 63


829-836: Ensure viewProperties aspect is correctly defined.

The viewProperties aspect defines the view logic and language correctly. However, ensure the SQL in the viewLogic field is validated for correctness.


581-588: Ensure viewProperties aspect is correctly defined.

The viewProperties aspect defines the view logic and language correctly. However, ensure the SQL in the viewLogic field is validated for correctness.

metadata-ingestion/tests/integration/lookml/expected_output.json (12)

170-187: Ensure Consistent Use of Dataset URNs

The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,my_table,PROD) is correct and consistent across the file.

Verification successful

Dataset URNs are Consistently Formatted

The dataset URN urn:li:dataset:(urn:li:dataPlatform:postgres,my_table,PROD) is consistently used across the relevant files.

  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
  • metadata-ingestion/tests/integration/lookml/expected_output.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of dataset URNs in the JSON file.

# Test: Search for all dataset URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:postgres,my_table,PROD\)'

Length of output: 2430


894-894: Ensure Consistent Use of Dataset URNs

The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.events,PROD) is correct and consistent across the file.

Verification successful

Consistent Use of Dataset URNs Verified

The dataset URN urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.events,PROD) is consistently formatted across the relevant JSON files.

  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
  • metadata-ingestion/tests/integration/lookml/expected_output.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of dataset URNs in the JSON file.

# Test: Search for all dataset URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:postgres,.looker_schema.events,PROD\)'

Length of output: 1300


1020-1020: Ensure Consistent Use of Dataset URNs

The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.events,PROD) is correct and consistent across the file.

Verification successful

Ensure Consistent Use of Dataset URNs

The dataset URN urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.events,PROD) is consistently formatted and used across the files metadata-ingestion/tests/integration/lookml/expected_output.json and metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json.

  • metadata-ingestion/tests/integration/lookml/expected_output.json
  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of dataset URNs in the JSON file.

# Test: Search for all dataset URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:postgres,.looker_schema.events,PROD\)'

Length of output: 1300


1330-1330: Ensure Consistent Use of Dataset URNs

The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD) is correct and consistent across the file.

Verification successful

Consistent Use of Dataset URNs Verified

The dataset URN urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD) is consistently formatted across the file and other related files.

  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
  • metadata-ingestion/tests/integration/lookml/expected_output.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of dataset URNs in the JSON file.

# Test: Search for all dataset URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD\)'

Length of output: 885


1456-1471: Ensure Consistent Use of Dataset URNs

The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,fragment_derived_view,PROD) is correct and consistent across the file.

Verification successful

Dataset URNs are consistent

The dataset URNs are consistently formatted across the relevant JSON files.

  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
  • metadata-ingestion/tests/integration/lookml/expected_output.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of dataset URNs in the JSON file.

# Test: Search for all dataset URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:postgres,fragment_derived_view,PROD\)'

Length of output: 1739


488-511: Ensure Consistent Use of Derived View URNs

The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD) is correct and consistent across the file.

Verification successful

Consistent Use of Derived View URNs Verified

The dataset URN for the upstream lineage urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD) is consistently formatted across the file.

  • metadata-ingestion/tests/integration/lookml/expected_output.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json
  • metadata-ingestion/tests/integration/lookml/lookml_reachable_views.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json
  • metadata-ingestion/tests/integration/looker/golden_test_ingest_unaliased_joins.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of derived view URNs in the JSON file.

# Test: Search for all derived view URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD\)'

Length of output: 23279


Line range hint 1807-1826: Ensure Consistent Use of Dataset URNs

The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,.ecommerce.ability,PROD) is correct and consistent across the file.

Verification successful

Ensure Consistent Use of Dataset URNs

The dataset URN for the upstream lineage is consistent across the file and other related files.

  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
  • metadata-ingestion/tests/integration/lookml/expected_output.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of dataset URNs in the JSON file.

# Test: Search for all dataset URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:postgres,.ecommerce.ability,PROD\)'

Length of output: 1285


188-229: Verify Field Lineage Consistency

Ensure that the field lineage information is accurate and consistent with the dataset URNs and field paths. Verify that the field paths and URNs are correct.

Verification successful

Verify Field Lineage Consistency

The field lineage information is accurate and consistent with the dataset URNs and field paths.

  • Fields verified: city, is_latest, timestamp, measurement
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of field lineage information in the JSON file.

# Test: Search for all field URNs. Expect: Consistent formatting and correct field paths.
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,my_table,PROD\),country\)'

Length of output: 508



Script:

#!/bin/bash
# Description: Verify consistency of field lineage information in the JSON file.

# Test: Search for all field URNs mentioned in the review comment. Expect: Consistent formatting and correct field paths.
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,my_table,PROD\),city\)'
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,my_table,PROD\),is_latest\)'
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,my_table,PROD\),timestamp\)'
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,my_table,PROD\),measurement\)'

Length of output: 2050


512-533: Verify Field Lineage Consistency for Derived Views

Ensure that the field lineage information for derived views is accurate and consistent with the dataset URNs and field paths. Verify that the field paths and URNs are correct.

Verification successful

Field Lineage Consistency Verified

The field lineage information for derived views, including the dataset URNs and field paths for "timestamp" and "measurement," is accurate and consistent across the JSON files.

  • File: metadata-ingestion/tests/integration/lookml/expected_output.json
  • Lines: 512-533
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of field lineage information for derived views in the JSON file.

# Test: Search for all field URNs in derived views. Expect: Consistent formatting and correct field paths.
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD\),country\)'

Length of output: 3370



Script:

#!/bin/bash
# Description: Verify consistency of field lineage information for derived views in the JSON file.

# Test: Search for all field URNs in derived views for "timestamp" and "measurement". Expect: Consistent formatting and correct field paths.
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD\),(timestamp|measurement)\)'

Length of output: 4959


1204-1204: Ensure Consistent Use of Dataset URNs

The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,..autodetect_sql_name_based_on_view_name,PROD) is correct and consistent across the file.


1472-1493: Verify Field Lineage Consistency

Ensure that the field lineage information is accurate and consistent with the dataset URNs and field paths. Verify that the field paths and URNs are correct.

Verification successful

Field Lineage Consistency Verified

The field lineage information for 'platform' and 'country' fields is accurate and consistent with the dataset URNs and field paths.

  • metadata-ingestion/tests/integration/lookml/expected_output.json
  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of field lineage information in the JSON file.

# Test: Search for all field URNs. Expect: Consistent formatting and correct field paths.
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,fragment_derived_view,PROD\),date\)'

Length of output: 538



Script:

#!/bin/bash
# Description: Verify consistency of field lineage information in the JSON file for 'platform' and 'country' fields.

# Test: Search for all field URNs related to 'platform'.
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,fragment_derived_view,PROD\),platform\)'

# Test: Search for all field URNs related to 'country'.
rg --type json 'urn:li:schemaField:\(urn:li:dataset:\(urn:li:dataPlatform:postgres,fragment_derived_view,PROD\),country\)'

Length of output: 1098


768-768: Ensure Consistent Use of Dataset URNs

The dataset URN for the upstream lineage should be consistently formatted. Verify that urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD) is correct and consistent across the file.

Verification successful

Ensure Consistent Use of Dataset URNs

The dataset URN urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD) is correctly and consistently formatted across the relevant files.

  • metadata-ingestion/tests/integration/lookml/expected_output.json
  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of dataset URNs in the JSON file.

# Test: Search for all dataset URNs. Expect: Consistent formatting.
rg --type json 'urn:li:dataset:\(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD\)'

Length of output: 885



Script:

#!/bin/bash
# Description: List all unique dataset URNs in the JSON files to verify consistency.

# Extract all dataset URNs from the JSON files and list unique occurrences.
rg --type json '"dataset": "urn:li:dataset:([^"]+)"' -o -r '$1' | sort | uniq -c

Length of output: 106480

metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json (5)

173-229: Ensure Consistency in Field Names Between Upstream and Downstream.

The fine-grained lineage mappings should be consistent with the field names defined in the dataset schemas. Verify that the field names country, city, is_latest, timestamp, and measurement in the upstream and downstream datasets are correct and consistent.


488-533: Ensure Consistency in Field Names Between Upstream and Downstream.

The fine-grained lineage mappings should be consistent with the field names defined in the dataset schemas. Verify that the field names country, city, timestamp, measurement, and average_measurement in the upstream and downstream datasets are correct and consistent.


Line range hint 1405-1493: Verify View Logic and Field Mapping.

Ensure that the view logic and field mappings are correct and consistent with the dataset schema. The field names date, platform, and country should be verified for correctness.


1644-1644: Verify View Logic and Field Mapping.

Ensure that the view logic and field mappings are correct and consistent with the dataset schema. The field names customer_id, sale_price, and order_region should be verified for correctness.


1459-1493: Ensure Consistency in Field Names Between Upstream and Downstream.

The fine-grained lineage mappings should be consistent with the field names defined in the dataset schemas. Verify that the field names date, platform, and country in the upstream and downstream datasets are correct and consistent.

metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json (10)

173-229: Ensure the consistency of field names and types in fine-grained lineages.

The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.


488-533: Ensure the consistency of field names and types in fine-grained lineages.

The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.


1462-1493: Ensure the consistency of field names and types in fine-grained lineages.

The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.


Line range hint 1644-1649: Ensure the consistency of field names and types in fine-grained lineages.

The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.


Line range hint 1518-1552: Ensure the consistency of field names and types in fine-grained lineages.

The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.


1405-1407: Ensure the consistency of field names and types in fine-grained lineages.

The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.


1464-1492: Ensure the consistency of field names and types in fine-grained lineages.

The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.


Line range hint 1644-1650: Ensure the consistency of field names and types in fine-grained lineages.

The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.


Line range hint 1535-1552: Ensure the consistency of field names and types in fine-grained lineages.

The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.


Line range hint 1518-1535: Ensure the consistency of field names and types in fine-grained lineages.

The fine-grained lineages should ensure that the field names and types are consistent between upstream and downstream datasets.

metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json (5)

173-229: Ensure consistency in fine-grained lineage.

The structure of the fine-grained lineage looks correct. However, ensure that all upstream and downstream fields are correctly mapped and the confidence score is accurate.


488-533: Ensure consistency in fine-grained lineage.

The structure of the fine-grained lineage looks correct. However, ensure that all upstream and downstream fields are correctly mapped and the confidence score is accurate.


1461-1493: Ensure consistency in fine-grained lineage.

The structure of the fine-grained lineage looks correct. However, ensure that all upstream and downstream fields are correctly mapped and the confidence score is accurate.


1644-1644: Ensure the correctness of view logic.

The view logic for the customer_facts dataset includes a conditional clause. Verify that the condition syntax and logic are correct.


1405-1405: Ensure the correctness of view logic.

The view logic for the fragment_derived_view dataset includes a SELECT statement. Verify that the SQL syntax and logic are correct.

metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json (4)

Line range hint 1405-1493: Verify the aliasing of platform to aliased_platform.

The field platform is aliased to aliased_platform. Ensure this aliasing is correct and intentional.

Verification successful

Verification successful: The aliasing of platform to aliased_platform is correct and intentional.

The aliasing of platform to aliased_platform is consistently applied across multiple files in the codebase, indicating that it is deliberate and part of the intended design.

  • Files verified:
    • metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json
    • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
    • metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
    • metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json
    • metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json
    • metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json
    • metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json
    • metadata-ingestion/tests/integration/lookml/expected_output.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the aliasing of `platform` to `aliased_platform` in the codebase.

# Test: Search for the usage of `platform` and `aliased_platform` in the codebase. Expect: Only intentional aliasing.
rg --type json $'platform' | rg --type json $'aliased_platform'

Length of output: 4584


1644-1644: Verify the liquid condition tag usage.

Ensure the liquid condition tag {% condition order_region %} order.region {% endcondition %} is used correctly and intentionally.

Verification successful

The liquid condition tag {% condition order_region %} order.region {% endcondition %} is used consistently and intentionally across multiple test files.

  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json
  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json
  • metadata-ingestion/tests/integration/lookml/expected_output.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the usage of the liquid condition tag in the codebase.

# Test: Search for the usage of the liquid condition tag in the codebase. Expect: Only intentional usages.
rg --type json $'{% condition order_region %} order.region {% endcondition %}'

Length of output: 215



Script:

#!/bin/bash
# Description: Verify the usage of the liquid condition tag in the codebase.

# Test: Search for the usage of the liquid condition tag in the codebase. Expect: Only intentional usages.
rg --type json '\{% condition order_region %\} order\.region \{% endcondition %\}'

Length of output: 2731


488-533: Verify the mapping of measurement to average_measurement.

The field measurement is mapped to average_measurement downstream. Ensure this mapping is correct and intentional.

Verification successful

The mapping of measurement to average_measurement is intentional and consistent.

The field measurement is consistently mapped to average_measurement downstream across multiple files, indicating that this is a deliberate design choice.

  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json
  • metadata-ingestion/tests/integration/lookml/lookml_reachable_views.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the mapping of `measurement` to `average_measurement` in the codebase.

# Test: Search for the usage of `measurement` and `average_measurement` in the codebase. Expect: Only intentional mappings.
rg --type json $'measurement' | rg --type json $'average_measurement'

Length of output: 7650



Script:

#!/bin/bash
# Description: Verify the mapping of `measurement` to `average_measurement` in the codebase by examining context.

# Search for the usage of `measurement` in the codebase.
rg --type json --context 5 'measurement'

# Search for the usage of `average_measurement` in the codebase.
rg --type json --context 5 'average_measurement'

Length of output: 195286


173-229: Verify the mapping of measurement to average_measurement.

The field measurement is mapped to average_measurement downstream. Ensure this mapping is correct and intentional.

metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json (3)

162-229: Ensure consistency in field naming conventions.

The field average_measurement in the downstream is not consistent with the upstream field measurement. This could lead to confusion or errors in lineage tracking.

Is this intentional? If not, consider renaming the downstream field to measurement to maintain consistency.


477-533: Ensure consistency in field naming conventions.

The field average_measurement in the downstream is not consistent with the upstream field measurement. This could lead to confusion or errors in lineage tracking.

Is this intentional? If not, consider renaming the downstream field to measurement to maintain consistency.


Line range hint 1644-1653: Ensure proper handling of liquid conditions.

The view logic includes a liquid condition tag {% condition order_region %} order.region {% endcondition %}. Ensure that the liquid condition is correctly parsed and resolved during execution.

Is there a mechanism in place to handle liquid conditions in the view logic?
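A minimal sketch of one such mechanism, assuming the ConditionTag class quoted later in this review can be imported from datahub.ingestion.source.looker.looker_template_language (import path assumed) and registered on a python-liquid Environment; rendering the template with a value for order_region collapses the tag into plain SQL that sqlglot can parse:

from liquid import Environment

# Assumed import path; the class itself is quoted further down in this review.
from datahub.ingestion.source.looker.looker_template_language import ConditionTag

env = Environment()
env.add_tag(ConditionTag)  # register the custom {% condition %} tag

sql = env.from_string(
    "SELECT * FROM order WHERE {% condition order_region %} order.region {% endcondition %}"
).render(order_region="ap-south-1")

print(sql)  # SELECT * FROM order WHERE order.region='ap-south-1'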

metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json (2)

1466-1500: Ensure Consistency in Field Names

The fineGrainedLineages section contains field mappings between upstream and downstream datasets. Ensure that the field names are consistent and correctly mapped.

Verification successful

Ensure Consistency in Field Names

The extracted fineGrainedLineages section shows consistent field mappings between upstreams and downstreams with expected transformations (e.g., aliasing). The field names are correctly mapped, maintaining consistency across the datasets.

  • Example Mappings:
    • upstreams: country -> downstreams: country
    • upstreams: platform -> downstreams: aliased_platform
    • upstreams: measurement -> downstreams: average_measurement

The field names in the fineGrainedLineages section are consistent and correctly mapped.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistency of field names in fineGrainedLineages.

# Test: Search for field names in the provided JSON file. Expect: Consistent field names across upstream and downstream mappings.
rg --type json 'fineGrainedLineages' metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json

Length of output: 508



Script:

#!/bin/bash
# Description: Extract and compare field names in fineGrainedLineages for consistency.

# Extract upstream and downstream field names within fineGrainedLineages
jq '.[] | select(has("fineGrainedLineages")) | .fineGrainedLineages[] | {upstreams: .upstreams, downstreams: .downstreams}' metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json

Length of output: 205



Script:

#!/bin/bash
# Description: Extract and compare field names in fineGrainedLineages for consistency.

# Extract upstream and downstream field names within fineGrainedLineages
jq '.. | .fineGrainedLineages? // empty | .[] | {upstreams: .upstreams, downstreams: .downstreams}' metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json

Length of output: 6540


173-229: Ensure Consistency in Field Names

The fineGrainedLineages section contains field mappings between upstream and downstream datasets. Ensure that the field names are consistent and correctly mapped.

metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json (10)

170-170: Verify the dataset reference.

Ensure that the dataset reference urn:li:dataset:(urn:li:dataPlatform:postgres,my_table,PROD) is correct and consistent with the rest of the data.


173-229: Verify the fine-grained lineage information.

Ensure that the fine-grained lineage information for fields like country, city, is_latest, timestamp, and measurement is correct and follows the expected format.


488-488: Verify the upstream dataset reference.

Ensure that the upstream dataset reference urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.foo.view.my_view,PROD) is correct and consistent with the rest of the data.


491-536: Verify the fine-grained lineage information.

Ensure that the fine-grained lineage information for fields like country, city, timestamp, measurement, and average_measurement is correct and follows the expected format.


774-774: Verify the upstream dataset reference.

Ensure that the upstream dataset reference urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD) is correct and consistent with the rest of the data.


1032-1032: Verify the upstream dataset reference.

Ensure that the upstream dataset reference urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.events,PROD) is correct and consistent with the rest of the data.


Line range hint 1040-1049: Verify the fine-grained lineage information.

Ensure that the fine-grained lineage information for the additional_measure field is correct and follows the expected format.


1219-1219: Verify the upstream dataset reference.

Ensure that the upstream dataset reference urn:li:dataset:(urn:li:dataPlatform:postgres,..autodetect_sql_name_based_on_view_name,PROD) is correct and consistent with the rest of the data.


1348-1348: Verify the upstream dataset reference.

Ensure that the upstream dataset reference urn:li:dataset:(urn:li:dataPlatform:postgres,.looker_schema.include_able,PROD) is correct and consistent with the rest of the data.


1477-1477: Verify the upstream dataset reference.

Ensure that the upstream dataset reference urn:li:dataset:(urn:li:dataPlatform:postgres,fragment_derived_view,PROD) is correct and consistent with the rest of the data.

Comment on lines +173 to +185
],
"fineGrainedLineages": [
{
"upstreamType": "FIELD_SET",
"upstreams": [
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,warehouse.default_db.default_schema.my_table,DEV),country)"
],
"downstreamType": "FIELD",
"downstreams": [
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD),country)"
],
"confidenceScore": 1.0
},
Contributor

Ensure consistency in field names.

The downstreamType for the field measurement should match the upstream field's name. The downstreams field should use the same field name measurement instead of average_measurement, unless this transformation is intentional and documented.

- "urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD),average_measurement)"
+ "urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.my_view,PROD),measurement)"

Committable suggestion was skipped due to low confidence.

Comment on lines +1448 to +1495
{
"com.linkedin.pegasus2avro.dataset.UpstreamLineage": {
"upstreams": [
{
"auditStamp": {
"time": 1586847600000,
"actor": "urn:li:corpuser:datahub"
},
"dataset": "urn:li:dataset:(urn:li:dataPlatform:snowflake,default_db.default_schema.fragment_derived_view,PROD)",
"type": "VIEW"
}
],
"fineGrainedLineages": [
{
"upstreamType": "FIELD_SET",
"upstreams": [
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,default_db.default_schema.fragment_derived_view,PROD),date)"
],
"downstreamType": "FIELD",
"downstreams": [
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.fragment_derived_view,PROD),date)"
],
"confidenceScore": 1.0
},
{
"upstreamType": "FIELD_SET",
"upstreams": [
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,default_db.default_schema.fragment_derived_view,PROD),platform)"
],
"downstreamType": "FIELD",
"downstreams": [
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.fragment_derived_view,PROD),aliased_platform)"
],
"confidenceScore": 1.0
},
{
"upstreamType": "FIELD_SET",
"upstreams": [
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,default_db.default_schema.fragment_derived_view,PROD),country)"
],
"downstreamType": "FIELD",
"downstreams": [
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.fragment_derived_view,PROD),country)"
],
"confidenceScore": 1.0
}
]
}
Contributor

Ensure completeness of field definitions.

The fields date, aliased_platform, and country are mentioned in the view logic but not defined in the schema metadata. This could lead to incomplete metadata representation.

Ensure that all fields used in the view logic are defined in the schema metadata.

{
  "fieldPath": "date",
  "nullable": false,
  "description": "Date",
  "label": "",
  "type": {
    "type": {
      "com.linkedin.pegasus2avro.schema.StringType": {}
    }
  },
  "nativeDataType": "string",
  "recursive": false,
  "globalTags": {
    "tags": []
  },
  "isPartOfKey": false
},
{
  "fieldPath": "aliased_platform",
  "nullable": false,
  "description": "Platform",
  "label": "",
  "type": {
    "type": {
      "com.linkedin.pegasus2avro.schema.StringType": {}
    }
  },
  "nativeDataType": "string",
  "recursive": false,
  "globalTags": {
    "tags": []
  },
  "isPartOfKey": false
},
{
  "fieldPath": "country",
  "nullable": false,
  "description": "Country",
  "label": "",
  "type": {
    "type": {
      "com.linkedin.pegasus2avro.schema.StringType": {}
    }
  },
  "nativeDataType": "string",
  "recursive": false,
  "globalTags": {
    "tags": []
  },
  "isPartOfKey": false
}

Comment on lines +27 to +41
def render_to_output(self, context: Context, buffer: TextIO) -> Optional[bool]:
filter_value: Optional[str] = cast(
str, context.globals.get(self.filter_name)
) # to silent lint

if filter_value is None:
raise CustomTagException(
f'filter {self.filter_name} value is not provided for "condition" tag'
)

filter_value = filter_value.strip()

buffer.write(f"{self.sql_or_lookml_reference}='{filter_value}'")

return True
Contributor

Handle potential edge cases in the render_to_output method.

Ensure that the filter_value is properly escaped to prevent SQL injection.

filter_value = filter_value.strip()
+        # Escape single quotes to prevent SQL injection
+        filter_value = filter_value.replace("'", "''")

Comment on lines +412 to +417
"aspectName": "subTypes",
"aspect": {
"json": {
"materialized": false,
"viewLogic": "view: owners {\n dimension: id {\n primary_key: yes\n sql: ${TABLE}.id ;;\n }\n dimension: owner_name {\n sql: ${TABLE}.owner_name ;;\n }\n}",
"viewLanguage": "lookml"
"typeNames": [
"View"
]
Contributor

Codebase Verification

Ensure dataset URNs are updated consistently.

The dataset URN urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.owners,PROD) is still present in multiple JSON files. Please ensure that all instances of this URN are updated consistently across the codebase.

  • metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json
  • metadata-ingestion/tests/integration/lookml/lookml_reachable_views.json
  • metadata-ingestion/tests/integration/lookml/expected_output.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json
Analysis chain

Ensure dataset URNs are updated consistently.

The dataset URN for lkml_samples.view.owners should be updated consistently across all aspects.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify all instances of dataset URNs are updated consistently.

# Test: Search for all instances of the old and new dataset URNs. Expect: Only new URNs.
rg --type json $'urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.owners,PROD)'

Length of output: 92



Script:

#!/bin/bash
# Description: Verify all instances of dataset URNs are updated consistently.

# Test: Search for all instances of `lkml_samples.view.owners` URNs in JSON files.
rg --type json 'lkml_samples.view.owners'

Length of output: 10208

Comment on lines +66 to +87
def parse(self, stream: TokenStream) -> Node:
expect(stream, TOKEN_TAG, value=ConditionTag.TAG_START)

start_token = stream.current

stream.next_token()
expect(stream, TOKEN_EXPRESSION)
filter_name: str = stream.current.value.strip()

stream.next_token()
expect(stream, TOKEN_LITERAL)

sql_or_lookml_reference: str = stream.current.value.strip()

stream.next_token()
expect(stream, TOKEN_TAG, value=ConditionTag.TAG_END)

return ConditionNode(
tok=start_token,
sql_or_lookml_reference=sql_or_lookml_reference,
filter_name=filter_name,
)
Contributor

Ensure proper error handling in the parse method.

Add error handling for unexpected tokens to improve robustness.

def parse(self, stream: TokenStream) -> Node:
    try:
        expect(stream, TOKEN_TAG, value=ConditionTag.TAG_START)
        start_token = stream.current

        stream.next_token()
        expect(stream, TOKEN_EXPRESSION)
        filter_name: str = stream.current.value.strip()

        stream.next_token()
        expect(stream, TOKEN_LITERAL)
        sql_or_lookml_reference: str = stream.current.value.strip()

        stream.next_token()
        expect(stream, TOKEN_TAG, value=ConditionTag.TAG_END)

        return ConditionNode(
            tok=start_token,
            sql_or_lookml_reference=sql_or_lookml_reference,
            filter_name=filter_name,
        )
+    except Exception as e:
+        raise CustomTagException(f"Error parsing condition tag: {e}")

Comment on lines 22 to 39
def _get_generic_definition(
looker_connection: DBConnection, platform: Optional[str] = None
) -> Tuple[str, Optional[str], Optional[str]]:
if platform is None:
# We extract the platform from the dialect name
dialect_name = looker_connection.dialect_name
assert dialect_name is not None
# generally the first part of the dialect name before _ is the name of the platform
# versions are encoded as numbers and can be removed
# e.g. spark1 or hive2 or druid_18
platform = re.sub(r"[0-9]+", "", dialect_name.split("_")[0])

assert (
platform is not None
), f"Failed to extract a valid platform from connection {looker_connection}"
db = looker_connection.database
schema = looker_connection.schema # ok for this to be None
return (platform, db, schema)
Contributor

Add error handling for missing Looker connection dialect name.

Ensure that missing dialect names are handled gracefully.

def _get_generic_definition(
    looker_connection: DBConnection, platform: Optional[str] = None
) -> Tuple[str, Optional[str], Optional[str]]:
    if platform is None:
        # We extract the platform from the dialect name
        dialect_name = looker_connection.dialect_name
        assert dialect_name is not None
        # generally the first part of the dialect name before _ is the name of the platform
        # versions are encoded as numbers and can be removed
        # e.g. spark1 or hive2 or druid_18
        platform = re.sub(r"[0-9]+", "", dialect_name.split("_")[0])

    assert (
        platform is not None
    ), f"Failed to extract a valid platform from connection {looker_connection}"
    db = looker_connection.database
    schema = looker_connection.schema  # ok for this to be None
    return (platform, db, schema)

Committable suggestion was skipped due to low confidence.
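For reference, a quick illustration (dialect names taken from the comment in the quoted helper) of how the platform extraction behaves:

import re

for dialect_name in ["spark1", "hive2", "druid_18"]:
    # split off any "_" suffix first, then strip the version digits
    platform = re.sub(r"[0-9]+", "", dialect_name.split("_")[0])
    print(dialect_name, "->", platform)

# spark1 -> spark
# hive2 -> hive
# druid_18 -> druid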

Comment on lines 12 to 19
def _get_bigquery_definition(
looker_connection: DBConnection,
) -> Tuple[str, Optional[str], Optional[str]]:
platform = "bigquery"
# bigquery project ids are returned in the host field
db = looker_connection.host
schema = looker_connection.database
return (platform, db, schema)
Contributor

Add error handling for missing Looker connection fields.

Ensure that missing fields in the Looker connection are handled gracefully.

def _get_bigquery_definition(
    looker_connection: DBConnection,
) -> Tuple[str, Optional[str], Optional[str]]:
    platform = "bigquery"
    # bigquery project ids are returned in the host field
    db = looker_connection.host
+    if db is None:
+        raise ConfigurationError("BigQuery project ID (host) is missing in the Looker connection.")
    schema = looker_connection.database
    return (platform, db, schema)

Comment on lines +43 to +97
def _load_viewfile(
self, project_name: str, path: str, reporter: LookMLSourceReport
) -> Optional[LookerViewFile]:
# always fully resolve paths to simplify de-dup
path = str(pathlib.Path(path).resolve())
allowed_extensions = [_VIEW_FILE_EXTENSION, _EXPLORE_FILE_EXTENSION]
matched_any_extension = [
match for match in [path.endswith(x) for x in allowed_extensions] if match
]
if not matched_any_extension:
# not a view file
logger.debug(
f"Skipping file {path} because it doesn't appear to be a view file. Matched extensions {allowed_extensions}"
)
return None

if self.is_view_seen(str(path)):
return self.viewfile_cache[path]

try:
with open(path) as file:
raw_file_content = file.read()
except Exception as e:
logger.debug(f"An error occurred while reading path {path}", exc_info=True)
self.reporter.report_failure(
path, f"failed to load view file {path} from disk: {e}"
)
return None
try:
logger.debug(f"Loading viewfile {path}")

parsed = load_lkml(path)

resolve_liquid_variable_in_view_dict(
raw_view=parsed,
liquid_variable=self.liquid_variable,
)

looker_viewfile = LookerViewFile.from_looker_dict(
absolute_file_path=path,
looker_view_file_dict=parsed,
project_name=project_name,
root_project_name=self._root_project_name,
base_projects_folder=self._base_projects_folder,
raw_file_content=raw_file_content,
reporter=reporter,
)
logger.debug(f"adding viewfile for path {path} to the cache")
self.viewfile_cache[path] = looker_viewfile
return looker_viewfile
except Exception as e:
logger.debug(f"An error occurred while parsing path {path}", exc_info=True)
self.reporter.report_failure(path, f"failed to load view file {path}: {e}")
return None

Contributor

Improve error handling in the _load_viewfile method.

Ensure that the method handles file reading and parsing errors gracefully.

def _load_viewfile(
    self, project_name: str, path: str, reporter: LookMLSourceReport
) -> Optional[LookerViewFile]:
    # always fully resolve paths to simplify de-dup
    path = str(pathlib.Path(path).resolve())
    allowed_extensions = [_VIEW_FILE_EXTENSION, _EXPLORE_FILE_EXTENSION]
    matched_any_extension = [
        match for match in [path.endswith(x) for x in allowed_extensions] if match
    ]
    if not matched_any_extension:
        # not a view file
        logger.debug(
            f"Skipping file {path} because it doesn't appear to be a view file. Matched extensions {allowed_extensions}"
        )
        return None

    if self.is_view_seen(str(path)):
        return self.viewfile_cache[path]

    try:
        with open(path) as file:
            raw_file_content = file.read()
    except Exception as e:
        logger.debug(f"An error occurred while reading path {path}", exc_info=True)
        self.reporter.report_failure(
            path, f"failed to load view file {path} from disk: {e}"
        )
        return None
    try:
        logger.debug(f"Loading viewfile {path}")

        parsed = load_lkml(path)

        resolve_liquid_variable_in_view_dict(
            raw_view=parsed,
            liquid_variable=self.liquid_variable,
        )

        looker_viewfile = LookerViewFile.from_looker_dict(
            absolute_file_path=path,
            looker_view_file_dict=parsed,
            project_name=project_name,
            root_project_name=self._root_project_name,
            base_projects_folder=self._base_projects_folder,
            raw_file_content=raw_file_content,
            reporter=reporter,
        )
        logger.debug(f"adding viewfile for path {path} to the cache")
        self.viewfile_cache[path] = looker_viewfile
        return looker_viewfile
    except Exception as e:
        logger.debug(f"An error occurred while parsing path {path}", exc_info=True)
        self.reporter.report_failure(path, f"failed to load view file {path}: {e}")
        return None

Committable suggestion was skipped due to low confidence.

Comment on lines 87 to 229
continue
elif inc.startswith("/"):
glob_expr = f"{resolved_project_folder}{inc}"

# The include path is sometimes '/{project_name}/{path_within_project}'
# instead of '//{project_name}/{path_within_project}' or '/{path_within_project}'.
#
# TODO: I can't seem to find any documentation on this pattern, but we definitely
# have seen it in the wild. Example from Mozilla's public looker-hub repo:
# https://github.com/mozilla/looker-hub/blob/f491ca51ce1add87c338e6723fd49bc6ae4015ca/fenix/explores/activation.explore.lkml#L7
# As such, we try to handle it but are as defensive as possible.

non_base_project_name = project_name
if project_name == _BASE_PROJECT_NAME and root_project_name is not None:
non_base_project_name = root_project_name
if non_base_project_name != _BASE_PROJECT_NAME and inc.startswith(
f"/{non_base_project_name}/"
):
# This might be a local include. Let's make sure that '/{project_name}' doesn't
# exist as normal include in the project.
if not pathlib.Path(
f"{resolved_project_folder}/{non_base_project_name}"
).exists():
path_within_project = pathlib.Path(*pathlib.Path(inc).parts[2:])
glob_expr = f"{resolved_project_folder}/{path_within_project}"
else:
# Need to handle a relative path.
glob_expr = str(pathlib.Path(path).parent / inc)
# "**" matches an arbitrary number of directories in LookML
# we also resolve these paths to absolute paths so we can de-dup effectively later on
included_files = [
str(p.resolve())
for p in [
pathlib.Path(p)
for p in sorted(
glob.glob(glob_expr, recursive=True)
+ glob.glob(f"{glob_expr}.lkml", recursive=True)
)
]
# We don't want to match directories. The '**' glob can be used to
# recurse into directories.
if p.is_file()
]
logger.debug(
f"traversal_path={traversal_path}, included_files = {included_files}, seen_so_far: {seen_so_far}"
)
if "*" not in inc and not included_files:
reporter.report_failure(path, f"cannot resolve include {inc}")
elif not included_files:
reporter.report_failure(
path, f"did not resolve anything for wildcard include {inc}"
)
# only load files that we haven't seen so far
included_files = [x for x in included_files if x not in seen_so_far]
for included_file in included_files:
# Filter out dashboards - we get those through the looker source.
if (
included_file.endswith(".dashboard")
or included_file.endswith(".dashboard.lookml")
or included_file.endswith(".dashboard.lkml")
):
logger.debug(
f"include '{included_file}' is a dashboard, skipping it"
)
continue

logger.debug(
f"Will be loading {included_file}, traversed here via {traversal_path}"
)
try:
parsed = load_lkml(included_file)
seen_so_far.add(included_file)
if "includes" in parsed: # we have more includes to resolve!
resolved.extend(
LookerModel.resolve_includes(
parsed["includes"],
resolved_project_name,
root_project_name,
base_projects_folder,
included_file,
reporter,
seen_so_far,
traversal_path=traversal_path
+ "."
+ pathlib.Path(included_file).stem,
)
)
except Exception as e:
reporter.report_warning(
path, f"Failed to load {included_file} due to {e}"
)
# continue in this case, as it might be better to load and resolve whatever we can

resolved.extend(
[
ProjectInclude(project=resolved_project_name, include=f)
for f in included_files
]
)
return resolved
Contributor

LGTM! Consider simplifying nested if statements.

The resolve_includes method is well-structured and handles different include patterns effectively. Consider simplifying nested if statements for readability.

- if non_base_project_name != _BASE_PROJECT_NAME and inc.startswith(
-     f"/{non_base_project_name}/"
- ):
+ if (non_base_project_name != _BASE_PROJECT_NAME and 
+     inc.startswith(f"/{non_base_project_name}/")):

Committable suggestion was skipped due to low confidence.

Tools
Ruff

145-152: Use a single if statement instead of nested if statements

(SIM102)
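
To make the include handling in the quoted resolve_includes snippet concrete, a small illustration (paths hypothetical) of how the two include shapes shown above map to glob expressions:

import pathlib

resolved_project_folder = "/repo/lkml_samples"
path = "/repo/lkml_samples/models/data.model.lkml"

# an include starting with "/" is rooted at the resolved project folder
inc = "/views/*.view"
print(f"{resolved_project_folder}{inc}")   # /repo/lkml_samples/views/*.view

# any other include is treated as relative to the including file
inc = "views/orders.view.lkml"
print(pathlib.Path(path).parent / inc)     # /repo/lkml_samples/models/views/orders.view.lkml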

Comment on lines +1077 to +1084
"aspectName": "viewProperties",
"aspect": {
"json": {
"materialized": false,
"viewLogic": "view: employee_tax_report {\n sql_table_name: data-warehouse.finance.form-16;;\n\n dimension: id {\n type: number\n sql: ${TABLE}.id;;\n }\n\n dimension: name {\n type: string\n sql: ${TABLE}.name;;\n }\n\n measure: taxable_income {\n type: sum\n sql: ${TABLE}.tax;;\n }\n}\n",
"viewLanguage": "lookml"
}
},
Contributor

Codebase Verification

Ensure viewProperties aspect is correctly defined.

The placeholders and structure of the SQL statements in the viewLogic field are incorrect. The placeholders should follow the format ${TABLE}.<column_name>, but they do not match the expected pattern.

  • Incorrect structure or placeholder in: ${TABLE}.id
  • Incorrect structure or placeholder in: ${TABLE}.name
  • Incorrect structure or placeholder in: ${TABLE}.tax

Please correct these placeholders to ensure the SQL statements are valid.

Analysis chain

Ensure viewProperties aspect is correctly defined.

The viewProperties aspect defines the view logic and language correctly. However, ensure the SQL in the viewLogic field is validated for correctness.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Validate the SQL in the viewLogic field.

# Test: Extract and validate the SQL.
echo 'view: employee_tax_report {
  sql_table_name: data-warehouse.finance.form-16;;
  
  dimension: id {
    type: number
    sql: ${TABLE}.id;;
  }
  
  dimension: name {
    type: string
    sql: ${TABLE}.name;;
  }
  
  measure: taxable_income {
    type: sum
    sql: ${TABLE}.tax;;
  }
}' | sqlglot transpile --dialect lookml

Length of output: 409



Script:

#!/bin/bash
# Description: Validate the SQL in the viewLogic field using sqlite3.

# Extract the SQL statements from the viewLogic field.
sql_statements=$(echo 'view: employee_tax_report {
  sql_table_name: data-warehouse.finance.form-16;;
  
  dimension: id {
    type: number
    sql: ${TABLE}.id;;
  }
  
  dimension: name {
    type: string
    sql: ${TABLE}.name;;
  }
  
  measure: taxable_income {
    type: sum
    sql: ${TABLE}.tax;;
  }
}' | grep -oP '(?<=sql: ).*(?=;;)')

# Validate each SQL statement using sqlite3.
for sql in $sql_statements; do
  echo "Validating SQL: $sql"
  echo "$sql;" | sqlite3 :memory: 2>&1
done

Length of output: 1030



Script:

#!/bin/bash
# Description: Check the placeholders and structure of the SQL statements in the viewLogic field.

# Extract the SQL statements from the viewLogic field.
sql_statements=$(echo 'view: employee_tax_report {
  sql_table_name: data-warehouse.finance.form-16;;
  
  dimension: id {
    type: number
    sql: ${TABLE}.id;;
  }
  
  dimension: name {
    type: string
    sql: ${TABLE}.name;;
  }
  
  measure: taxable_income {
    type: sum
    sql: ${TABLE}.tax;;
  }
}' | grep -oP '(?<=sql: ).*(?=;;)')

# Check the placeholders and structure of each SQL statement.
for sql in $sql_statements; do
  echo "Checking SQL structure: $sql"
  if [[ $sql =~ \${TABLE}\.[a-zA-Z_]+ ]]; then
    echo "Placeholder and structure are correct for: $sql"
  else
    echo "Incorrect structure or placeholder in: $sql"
  fi
done

Length of output: 1201

@@ -166,6 +166,7 @@
"sql-metadata==2.2.2",
*sqllineage_lib,
"GitPython>2",
"python-liquid",
Collaborator

  1. Can we remove the sql-metadata and sqllineage_lib dependencies now that the sqlglot parser is used?
  2. Does sqlglot_lib need to be added to the looker_common libraries instead of lookml only, unless the sqlglot imports are strictly separated into lookml-only files?

Comment on lines +355 to +359
view_name: Optional[str] = (
self.explore.name
if self.field.original_view is not None
else self.field.original_view
)
Collaborator

This looks suspicious. The else branch would always yield None? Am I missing something?
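
For clarity, the flagged expression reduces to the following (a demonstration of the current behaviour, not a proposed fix; names shortened):

def pick_view_name(explore_name, original_view):
    # mirrors: explore.name if field.original_view is not None else field.original_view
    return explore_name if original_view is not None else original_view

print(pick_view_name("my_explore", "my_view"))  # my_explore
print(pick_view_name("my_explore", None))       # None -- the else branch can only ever be None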

Collaborator

@hsheth2 hsheth2 left a comment

Had a couple comments about golden file changes

{
"upstreamType": "FIELD_SET",
"upstreams": [
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:postgres,order,PROD),customer_id)"
Collaborator

why did this lineage disappear?

Collaborator Author

same reason as mentioned below.

"time": 1586847600000,
"actor": "urn:li:corpuser:datahub"
},
"dataset": "urn:li:dataset:(urn:li:dataPlatform:bigquery,project-foo.default-db.order,PROD)",
Collaborator

what happened to this lineage?

Collaborator Author

These lineages disappeared because of the condition tag ({% condition order_region %} order.region {% endcondition %}) in liquid.view.lkml, which is not parsed by the sqlglot parser. I added liquid_variable to the existing configuration so that this tag is resolved during liquid template resolution. Now the lineage appears in the golden file.
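
A rough sketch (keys as described in this PR, value hypothetical) of what that addition looks like in a lookml source config:

source_config = {
    # ... existing lookml source settings (base_folder, connection map, etc.) ...
    "liquid_variable": {
        # hypothetical value, substituted into
        # {% condition order_region %} order.region {% endcondition %}
        "order_region": "ap-south-1",
    },
}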

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Outside diff range and nitpick comments (3)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (3)

Line range hint 856-915: Consider breaking down the function into smaller methods.

The function is complex and could benefit from being broken down into smaller methods for better readability and maintainability.


Line range hint 916-1156: Consider breaking down the function into smaller methods.

The function is complex and could benefit from being broken down into smaller methods for better readability and maintainability.


Line range hint 1157-1306: Consider breaking down the function into smaller methods.

The function is complex and could benefit from being broken down into smaller methods for better readability and maintainability.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 5ad8200 and cf0f45b.

Files selected for processing (23)
  • metadata-ingestion/setup.py (2 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (18 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_config.py (2 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_connection.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_file_loader.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_view_id_cache.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_concept_context.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_config.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_refinement.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (22 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/view_upstream.py (1 hunks)
  • metadata-ingestion/tests/integration/lookml/duplicate_field_ingestion_golden.json (6 hunks)
  • metadata-ingestion/tests/integration/lookml/expected_output.json (19 hunks)
  • metadata-ingestion/tests/integration/lookml/field_tag_ingestion_golden.json (6 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json (9 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json (9 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json (6 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json (9 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json (9 hunks)
  • metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json (9 hunks)
  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json (20 hunks)
  • metadata-ingestion/tests/integration/lookml/test_lookml.py (5 hunks)
Files not summarized due to errors (1)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py: Error: Message exceeds token limit
Files skipped from review as they are similar to previous changes (14)
  • metadata-ingestion/setup.py
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_connection.py
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_file_loader.py
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_config.py
  • metadata-ingestion/tests/integration/lookml/duplicate_field_ingestion_golden.json
  • metadata-ingestion/tests/integration/lookml/expected_output.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_badsql_parser.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json
  • metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
  • metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py

409-412: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


592-595: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

Additional comments not posted (53)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_view_id_cache.py (3)

17-30: LGTM!

The function determine_view_file_path correctly determines the file path and includes appropriate logging for debugging.


33-77: LGTM!

The class LookerViewIdCache is correctly initialized with necessary attributes.


78-120: LGTM!

The method get_looker_view_id correctly retrieves the Looker view ID with appropriate logging and error handling.

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_refinement.py (11)

18-62: LGTM!

The class LookerRefinementResolver is correctly initialized with necessary attributes.


63-65: LGTM!

The method is_refinement correctly checks if a view name is a refinement.


68-94: LGTM!

The method merge_column correctly merges columns from the original and refinement dictionaries.


97-105: LGTM!

The method merge_and_set_column correctly merges columns and sets the result in the new raw view.


107-132: LGTM!

The method merge_refinements correctly merges refinements into the raw view and handles additive parameters.


134-146: LGTM!

The method get_refinements correctly retrieves refinements from the views based on the view name.


148-166: LGTM!

The method get_refinement_from_model_includes correctly retrieves refinements from the model includes and handles missing view files.


168-175: LGTM!

The method should_skip_processing correctly checks if processing should be skipped based on the view name and source configuration.


177-202: LGTM!

The method apply_view_refinement correctly applies refinements to a view and handles caching.


205-222: LGTM!

The method add_extended_explore correctly adds extended explores to the raw explore.


223-251: LGTM!

The method apply_explore_refinement correctly applies refinements to an explore and handles caching.

metadata-ingestion/src/datahub/ingestion/source/looker/looker_config.py (3)

147-155: LGTM!

The function _get_bigquery_definition correctly retrieves the BigQuery connection definition.


157-175: LGTM!

The function _get_generic_definition correctly retrieves the generic connection definition and handles platform extraction from the dialect name.


177-220: LGTM!

The class LookerConnectionDefinition is correctly initialized with necessary attributes, and the methods handle validation and creation of connection definitions.

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_concept_context.py (10)

24-54: LGTM!

The class LookerFieldContext is correctly initialized with necessary attributes, and the methods handle field context operations.


57-164: LGTM!

The class LookerViewContext is correctly initialized with necessary attributes, and the methods handle view context operations.


192-217: LGTM!

The method resolve_extends_view_name correctly resolves the extends view name and handles missing views with appropriate logging.


219-249: LGTM!

The method get_including_extends correctly retrieves the field from the current view or the extended view.


251-253: LGTM!

The method _get_sql_table_name_field correctly retrieves the SQL table name field.


254-263: LGTM!

The method _is_dot_sql_table_name_present correctly checks if the SQL table name contains a dot.


265-277: LGTM!

The method sql_table_name correctly retrieves the SQL table name and handles special cases.


279-287: LGTM!

The method derived_table correctly retrieves the derived table and handles missing tables with assertions.


289-297: LGTM!

The method explore_source correctly retrieves the explore source and handles missing sources with assertions.


299-322: LGTM!

The method sql correctly retrieves the SQL query and handles transformations.

metadata-ingestion/tests/integration/lookml/field_tag_ingestion_golden.json (7)

170-170: Update dataset URN to postgres.

The dataset URN has been updated from conn to postgres. Ensure this change is consistent with the intended data platform.


178-178: Update schema field URN to postgres.

The schema field URN has been updated from conn to postgres. Verify that this change aligns with the data platform schema.


189-189: Update schema field URN to postgres.

The schema field URN has been updated from conn to postgres. Ensure this change is consistent with the intended data platform.


200-200: Update schema field URN to postgres.

The schema field URN has been updated from conn to postgres. Verify that this change aligns with the data platform schema.


211-211: Update schema field URN to postgres.

The schema field URN has been updated from conn to postgres. Ensure this change is consistent with the intended data platform.


222-222: Update schema field URN to postgres.

The schema field URN has been updated from conn to postgres. Verify that this change aligns with the data platform schema.


233-233: Update schema field URN to postgres.

The schema field URN has been updated from conn to postgres. Ensure this change is consistent with the intended data platform.
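
The shape of the change can be reproduced with DataHub's URN builders; the table and field names below are illustrative, not copied from the golden file:

from datahub.emitter.mce_builder import make_dataset_urn, make_schema_field_urn

# The platform now resolves to the warehouse ("postgres") rather than the
# connection name ("conn"); schema field URNs nest the dataset URN, so they change too.
dataset_urn = make_dataset_urn(platform="postgres", name="my_schema.my_table", env="PROD")
field_urn = make_schema_field_urn(parent_urn=dataset_urn, field_path="id")
print(dataset_urn)  # urn:li:dataset:(urn:li:dataPlatform:postgres,my_schema.my_table,PROD)
print(field_urn)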

metadata-ingestion/src/datahub/ingestion/source/looker/view_upstream.py (8)

39-43: LGTM!

The is_derived_view function correctly checks if a view name contains the DERIVED_VIEW_SUFFIX.


46-66: LGTM! But verify edge cases.

The get_derived_looker_view_id function appears correct. Ensure that edge cases for regex and string manipulation are handled properly.


69-95: LGTM! But verify edge cases.

The resolve_derived_view_urn_of_col_ref function appears correct. Ensure that all potential edge cases are handled properly.


98-124: LGTM! But verify edge cases.

The fix_derived_view_urn function appears correct. Ensure that all potential edge cases are handled properly.


153-196: LGTM! But verify edge cases.

The _generate_fully_qualified_name function appears correct. Ensure that all potential edge cases are handled properly.


235-367: LGTM! But verify edge cases.

The SqlBasedDerivedViewUpstream class appears correct. Ensure that all potential edge cases are handled properly.


372-454: LGTM! But verify edge cases.

The NativeDerivedViewUpstream class appears correct. Ensure that all potential edge cases are handled properly.


581-633: LGTM! But verify edge cases.

The create_view_upstream function appears correct. Ensure that all potential edge cases are handled properly.

metadata-ingestion/tests/integration/lookml/test_lookml.py (6)

71-73: Comments approved.

The comments provide useful context for the configuration.


901-927: New test function approved.

The new test function test_view_to_view_lineage_and_liquid_template validates the new configuration option liquid_variable.


937-1010: New test function approved.

The new test function test_special_liquid_variables validates the behavior of special liquid variables.
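
At its core this exercises plain python-liquid rendering of the new liquid_variable mapping. A minimal sketch of the idea (variable names are illustrative; the source's resolve_liquid_variable wrapper additionally handles special variables and the condition tag):

from liquid import Template

# Values mirror what a user would put under the new `liquid_variable` config option.
liquid_variable = {"starting_date": "2024-01-01"}

sql = "SELECT * FROM orders WHERE created_at >= '{{ starting_date }}'"
print(Template(sql).render(**liquid_variable))
# SELECT * FROM orders WHERE created_at >= '2024-01-01'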


47-53: Ensure the new configuration is tested.

The new configuration option liquid_variable should be validated in the test cases to ensure it works as expected.

Verification successful

The new configuration option liquid_variable is being tested.

The search results show that the liquid_variable configuration is present and being validated within the test file metadata-ingestion/tests/integration/lookml/test_lookml.py.

  • metadata-ingestion/tests/integration/lookml/test_lookml.py
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the usage of the new configuration option `liquid_variable`.

# Test: Search for the usage of `liquid_variable` in test cases. Expect: At least one occurrence.
rg --type python $'liquid_variable'

Length of output: 65



Script:

#!/bin/bash
# Description: Verify the usage of the new configuration option `liquid_variable`.

# Test: Search for the usage of `liquid_variable` in test cases. Expect: At least one occurrence.
rg --glob '*.py' 'liquid_variable'

Length of output: 6154


15-18: Ensure the new imports are used.

The imports for SpecialVariable and resolve_liquid_variable are new. Ensure these are used in the test cases to avoid unnecessary imports.

Verification successful

The new imports are used.

The imports for SpecialVariable and resolve_liquid_variable are utilized in the test cases, confirming their necessity.

  • metadata-ingestion/tests/integration/lookml/test_lookml.py:
    • SpecialVariable is used on lines 15-18, 496-498.
    • resolve_liquid_variable is used on lines 15-18, 496-498.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the usage of newly imported entities.

# Test: Search for the usage of imported entities. Expect: At least one occurrence of each import.
rg --type python $'SpecialVariable|resolve_liquid_variable'

Length of output: 91



Script:

#!/bin/bash
# Description: Verify the usage of newly imported entities.

# Test: Search for the usage of imported entities. Expect: At least one occurrence of each import.
rg 'SpecialVariable|resolve_liquid_variable' --glob '*.py'

Length of output: 1816


496-498: Ensure the new configuration is tested.

The new configuration option liquid_variable should be validated in the test cases to ensure it works as expected.

Verification successful

Let's correct the command and re-run the search for liquid_variable in Python test files to verify if this new configuration is being tested.


The new configuration is tested.

The new configuration option liquid_variable is validated in the test cases, particularly in the metadata-ingestion/tests/integration/lookml/test_lookml.py file. The test_special_liquid_variables function specifically tests this configuration.

  • metadata-ingestion/tests/integration/lookml/test_lookml.py
    • Function: test_special_liquid_variables
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the usage of the new configuration option `liquid_variable`.

# Test: Search for the usage of `liquid_variable` in test cases. Expect: At least one occurrence.
rg --type python $'liquid_variable'

Length of output: 65



Script:

#!/bin/bash
# Description: Verify the usage of the new configuration option `liquid_variable`.

# Test: Search for the usage of `liquid_variable` in test cases. Expect: At least one occurrence.
rg -t py 'liquid_variable'

Length of output: 6146

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (1)

774-774: LGTM!

The code changes are approved.

metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (4)

131-157: Add detailed comments to explain the deduplication criteria.

The function's logic is clear, but adding detailed comments will make it more understandable for future maintainers.

+    # Create a list of field names that are of type DIMENSION_GROUP

301-335: Add type hints for dictionary keys.

Adding type hints for the dictionary keys will improve readability and maintainability.

-        field_dict: Dict,
-        upstream_column_ref: List[ColumnRef],
-        type_cls: ViewFieldType,
-        populate_sql_logic_in_descriptions: bool,
+        field_dict: Dict[str, Any],
+        upstream_column_ref: List[ColumnRef],
+        type_cls: ViewFieldType,
+        populate_sql_logic_in_descriptions: bool,

343-404: Verify the correctness of the else statement.

The else branch may always evaluate to None; confirm that this is the intended behavior.


406-459: Add detailed comments to explain the conditions.

The function's logic is clear, but adding detailed comments will make it more understandable for future maintainers.

Comment on lines +159 to +183
def find_view_from_resolved_includes(
connection: Optional[LookerConnectionDefinition],
resolved_includes: List["ProjectInclude"],
looker_viewfile_loader: LookerViewFileLoader,
target_view_name: str,
reporter: LookMLSourceReport,
) -> Optional[Tuple["ProjectInclude", dict]]:
# It could live in one of the included files. We do not know which file the base view
# lives in, so we try them all!
for include in resolved_includes:
included_looker_viewfile = looker_viewfile_loader.load_viewfile(
include.include,
include.project,
connection,
reporter,
)
if not included_looker_viewfile:
continue
for raw_view in included_looker_viewfile.views:
raw_view_name = raw_view["name"]
# Make sure to skip loading view we are currently trying to resolve
if raw_view_name == target_view_name:
return include, raw_view

return None

Refactor for early returns to reduce nesting.

The function can be refactored to use early returns, which will improve readability and reduce nesting.

-    for include in resolved_includes:
-        included_looker_viewfile = looker_viewfile_loader.load_viewfile(
-            include.include,
-            include.project,
-            connection,
-            reporter,
-        )
-        if not included_looker_viewfile:
-            continue
-        for raw_view in included_looker_viewfile.views:
-            raw_view_name = raw_view["name"]
-            if raw_view_name == target_view_name:
-                return include, raw_view
+    for include in resolved_includes:
+        included_looker_viewfile = looker_viewfile_loader.load_viewfile(
+            include.include,
+            include.project,
+            connection,
+            reporter,
+        )
+        if included_looker_viewfile:
+            for raw_view in included_looker_viewfile.views:
+                raw_view_name = raw_view["name"]
+                if raw_view_name == target_view_name:
+                    return include, raw_view
    return None

Comment on lines 113 to 116
"""Returns a fully qualified dataset name, resolved through a connection definition.
Input sql_table_name can be in three forms: table, db.table, db.schema.table"""
# TODO: This function should be extracted out into a Platform specific naming class since name translations
# are required across all connectors

Reminder: Consider refactoring the name resolution logic.

The TODO comment suggests extracting the function into a platform-specific naming class. This can improve modularity and maintainability.

Do you want me to refactor this function into a platform-specific naming class or open a GitHub issue to track this task?
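
If that refactor is taken up, one possible shape is a small resolver that normalizes the three accepted forms. This is purely illustrative; the helper name and defaults are not from the codebase:

def fully_qualify(sql_table_name: str, default_db: str, default_schema: str) -> str:
    """Expand "table", "db.table", or "db.schema.table" into db.schema.table."""
    parts = sql_table_name.replace('"', "").replace("`", "").split(".")
    if len(parts) == 1:                      # table
        return f"{default_db}.{default_schema}.{parts[0]}"
    if len(parts) == 2:                      # db.table
        return f"{parts[0]}.{default_schema}.{parts[1]}"
    return ".".join(parts)                   # already db.schema.table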

Comment on lines 204 to 307

# Parse SQL to extract dependencies.
if parse_table_names_from_sql:
(
fields,
sql_table_names,
) = cls._extract_metadata_from_derived_table_sql(
reporter,
sql_parser_path,
view_name,
sql_table_name,
view_logic,
fields,
use_external_process=process_isolation_for_sql_parsing,
)
view_logic = view_context.view_file.raw_file_content[:max_file_snippet_length]

elif "explore_source" in derived_table:
# This is called a "native derived table".
# See https://cloud.google.com/looker/docs/creating-ndts.
explore_source = derived_table["explore_source"]

# We want this to render the full lkml block
# e.g. explore_source: source_name { ... }
# As such, we use the full derived_table instead of the explore_source.
view_logic = str(lkml.dump(derived_table))[:max_file_snippet_length]
view_lang = VIEW_LANGUAGE_LOOKML

(
fields,
upstream_explores,
) = cls._extract_metadata_from_derived_table_explore(
reporter, view_name, explore_source, fields
)
if view_context.is_sql_based_derived_case():
view_logic = view_context.sql(transformed=False)
# Parse SQL to extract dependencies.
view_details = ViewProperties(
materialized=False,
viewLogic=view_logic,
viewLanguage=VIEW_LANGUAGE_SQL,
)
elif view_context.is_native_derived_case():
# We want this to render the full lkml block
# e.g. explore_source: source_name { ... }
# As such, we use the full derived_table instead of the explore_source.
view_logic = str(lkml.dump(view_context.derived_table()))[
:max_file_snippet_length
]
view_lang = VIEW_LANGUAGE_LOOKML

materialized = False
for k in derived_table:
if k in ["datagroup_trigger", "sql_trigger_value", "persist_for"]:
materialized = True
if "materialized_view" in derived_table:
materialized = derived_table["materialized_view"] == "yes"
materialized = view_context.is_materialized_derived_view()

view_details = ViewProperties(
materialized=materialized, viewLogic=view_logic, viewLanguage=view_lang
)
else:
# If not a derived table, then this view essentially wraps an existing
# object in the database. If sql_table_name is set, there is a single
# dependency in the view, on the sql_table_name.
# Otherwise, default to the view name as per the docs:
# https://docs.looker.com/reference/view-params/sql_table_name-for-view
sql_table_names = (
[view_name] if sql_table_name is None else [sql_table_name]
)
view_details = ViewProperties(
materialized=False,
viewLogic=view_logic,
viewLanguage=VIEW_LANGUAGE_LOOKML,
)

file_path = LookerView.determine_view_file_path(
base_folder_path, looker_viewfile.absolute_file_path
)

return LookerView(
id=LookerViewId(
project_name=project_name,
model_name=model_name,
view_name=view_name,
file_path=file_path,
),
absolute_file_path=looker_viewfile.absolute_file_path,
connection=connection,
sql_table_names=sql_table_names,
upstream_explores=upstream_explores,
fields=fields,
raw_file_content=looker_viewfile.raw_file_content,
id=looker_view_id,
absolute_file_path=view_context.view_file.absolute_file_path,
connection=view_context.view_connection,
upstream_dataset_urns=view_upstream.get_upstream_dataset_urn(),
fields=view_fields,
raw_file_content=view_context.view_file.raw_file_content,

Consider breaking down the from_looker_dict method.

The method is quite large and handles multiple responsibilities. Breaking it down into smaller methods can improve readability and maintainability.

@classmethod
def from_looker_dict(
    cls,
    project_name: str,
    model_name: str,
    view_context: LookerViewContext,
    looker_view_id_cache: LookerViewIdCache,
    reporter: LookMLSourceReport,
    max_file_snippet_length: int,
    config: LookMLSourceConfig,
    ctx: PipelineContext,
    extract_col_level_lineage: bool = False,
    populate_sql_logic_in_descriptions: bool = False,
) -> Optional["LookerView"]:
    view_name = view_context.name()
    logger.debug(f"Handling view {view_name} in model {model_name}")
    looker_view_id = cls._create_looker_view_id(project_name, model_name, view_name, view_context)
    view_upstream = cls._create_view_upstream(view_context, looker_view_id_cache, config, ctx, reporter)
    view_fields = cls._extract_view_fields(view_context, view_upstream, extract_col_level_lineage, populate_sql_logic_in_descriptions)
    view_fields = deduplicate_fields(view_fields)
    view_details = cls._determine_view_details(view_context, max_file_snippet_length)
    return cls._build_looker_view(looker_view_id, view_context, view_upstream, view_fields, view_details)

@classmethod
def _create_looker_view_id(cls, project_name: str, model_name: str, view_name: str, view_context: LookerViewContext) -> LookerViewId:
    return LookerViewId(
        project_name=project_name,
        model_name=model_name,
        view_name=view_name,
        file_path=view_context.view_file_name(),
    )

@classmethod
def _create_view_upstream(cls, view_context: LookerViewContext, looker_view_id_cache: LookerViewIdCache, config: LookMLSourceConfig, ctx: PipelineContext, reporter: LookMLSourceReport) -> AbstractViewUpstream:
    return create_view_upstream(
        view_context=view_context,
        looker_view_id_cache=looker_view_id_cache,
        config=config,
        ctx=ctx,
        reporter=reporter,
    )

@classmethod
def _extract_view_fields(cls, view_context: LookerViewContext, view_upstream: AbstractViewUpstream, extract_col_level_lineage: bool, populate_sql_logic_in_descriptions: bool) -> List[ViewField]:
    field_type_vs_raw_fields = OrderedDict(
        {
            ViewFieldType.DIMENSION: view_context.dimensions(),
            ViewFieldType.DIMENSION_GROUP: view_context.dimension_groups(),
            ViewFieldType.MEASURE: view_context.measures(),
        }
    )
    view_fields = []
    for field_type, fields in field_type_vs_raw_fields.items():
        for field in fields:
            upstream_column_ref = view_upstream.get_upstream_column_ref(field_context=LookerFieldContext(raw_field=field)) if extract_col_level_lineage else []
            view_fields.append(
                ViewField.view_fields_from_dict(
                    field_dict=field,
                    upstream_column_ref=upstream_column_ref,
                    type_cls=field_type,
                    populate_sql_logic_in_descriptions=populate_sql_logic_in_descriptions,
                )
            )
    if not view_fields and view_context.is_sql_based_derived_view_without_fields_case():
        view_fields = view_upstream.create_fields()
    return view_fields

@classmethod
def _determine_view_details(cls, view_context: LookerViewContext, max_file_snippet_length: int) -> ViewProperties:
    view_logic = view_context.view_file.raw_file_content[:max_file_snippet_length]
    if view_context.is_sql_based_derived_case():
        view_logic = view_context.sql(transformed=False)
        return ViewProperties(materialized=False, viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_SQL)
    elif view_context.is_native_derived_case():
        view_logic = str(lkml.dump(view_context.derived_table()))[:max_file_snippet_length]
        return ViewProperties(materialized=view_context.is_materialized_derived_view(), viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_LOOKML)
    else:
        return ViewProperties(materialized=False, viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_LOOKML)

@classmethod
def _build_looker_view(cls, looker_view_id: LookerViewId, view_context: LookerViewContext, view_upstream: AbstractViewUpstream, view_fields: List[ViewField], view_details: ViewProperties) -> "LookerView":
    return LookerView(
        id=looker_view_id,
        absolute_file_path=view_context.view_file.absolute_file_path,
        connection=view_context.view_connection,
        upstream_dataset_urns=view_upstream.get_upstream_dataset_urn(),
        fields=view_fields,
        raw_file_content=view_context.view_file.raw_file_content,
        view_details=view_details,
    )

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between cf0f45b and 33985b5.

Files selected for processing (1)
  • metadata-ingestion/setup.py (2 hunks)
Files skipped from review as they are similar to previous changes (1)
  • metadata-ingestion/setup.py

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

Outside diff range and nitpick comments (1)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_dataclasses.py (1)

39-87: Consider simplifying nested if statements.

The resolve_includes method has nested if statements that can be simplified for readability.

- if non_base_project_name != _BASE_PROJECT_NAME and inc.startswith(
-     f"/{non_base_project_name}/"
- ):
+ if (non_base_project_name != _BASE_PROJECT_NAME and 
+     inc.startswith(f"/{non_base_project_name}/")):
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 33985b5 and d5025d4.

Files selected for processing (6)
  • metadata-ingestion/setup.py (2 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (18 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_dataclasses.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (19 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/looker/view_upstream.py (1 hunks)
Files not summarized due to errors (1)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py: Error: Message exceeds token limit
Files skipped from review due to trivial changes (1)
  • metadata-ingestion/setup.py
Files skipped from review as they are similar to previous changes (2)
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py
  • metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/looker/looker_dataclasses.py

148-155: Use a single if statement instead of nested if statements

(SIM102)

metadata-ingestion/src/datahub/ingestion/source/looker/view_upstream.py

40-43: Return the condition DERIVED_VIEW_SUFFIX in view_name.lower() directly

Replace with return DERIVED_VIEW_SUFFIX in view_name.lower()

(SIM103)
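
Concretely, the simplification Ruff is suggesting (the original body is reconstructed from the lint message):

# Before (roughly)
def is_derived_view(view_name: str) -> bool:
    if DERIVED_VIEW_SUFFIX in view_name.lower():
        return True
    return False

# After (SIM103): return the condition directly
def is_derived_view(view_name: str) -> bool:
    return DERIVED_VIEW_SUFFIX in view_name.lower()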

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py

319-322: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


502-505: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

Additional comments not posted (13)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_dataclasses.py (3)

18-21: LGTM!

The ProjectInclude dataclass is well-defined and straightforward.


24-29: LGTM!

The LookerField dataclass is well-defined and straightforward.


244-290: LGTM!

The LookerViewFile dataclass is well-defined and straightforward.

metadata-ingestion/src/datahub/ingestion/source/looker/view_upstream.py (7)

200-231: LGTM!

The AbstractViewUpstream class is well-defined and follows the abstract class pattern.


237-372: LGTM!

The SqlBasedDerivedViewUpstream class is well-defined and follows the class pattern.


374-456: LGTM!

The NativeDerivedViewUpstream class is well-defined and follows the class pattern.


458-511: LGTM!

The RegularViewUpstream class is well-defined and follows the class pattern.


513-571: LGTM!

The DotSqlTableNameViewUpstream class is well-defined and follows the class pattern.


573-580: LGTM!

The EmptyImplementation class is well-defined and straightforward.


583-636: LGTM!

The create_view_upstream function is well-defined and follows the factory pattern.

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (3)

42-50: New imports look necessary and relevant.

The new imports are necessary for the changes made in the file and align with the updated functionality.

Also applies to: 54-70, 98-98


109-111: New field upstream_dataset_urns looks good.

The new field upstream_dataset_urns is necessary for tracking upstream dependencies.


307-307: Initialization looks good.

The initialization of ctx and reporter is necessary and relevant to the changes made.

Comment on lines +146 to +159
view_context: LookerViewContext,
looker_view_id_cache: LookerViewIdCache,
reporter: LookMLSourceReport,
max_file_snippet_length: int,
parse_table_names_from_sql: bool = False,
sql_parser_path: str = "datahub.utilities.sql_parser.DefaultSQLParser",
config: LookMLSourceConfig,
ctx: PipelineContext,
extract_col_level_lineage: bool = False,
populate_sql_logic_in_descriptions: bool = False,
process_isolation_for_sql_parsing: bool = False,
) -> Optional["LookerView"]:
view_name = looker_view["name"]

view_name = view_context.name()

logger.debug(f"Handling view {view_name} in model {model_name}")
# The sql_table_name might be defined in another view and this view is extending that view,
# so we resolve this field while taking that into account.
sql_table_name: Optional[str] = LookerView.get_including_extends(


Improvement suggestion: Break down the from_looker_dict method.

The method is quite large and handles multiple responsibilities. Breaking it down into smaller methods can improve readability and maintainability.

@classmethod
def from_looker_dict(
    cls,
    project_name: str,
    model_name: str,
    view_context: LookerViewContext,
    looker_view_id_cache: LookerViewIdCache,
    reporter: LookMLSourceReport,
    max_file_snippet_length: int,
    config: LookMLSourceConfig,
    ctx: PipelineContext,
    extract_col_level_lineage: bool = False,
    populate_sql_logic_in_descriptions: bool = False,
) -> Optional["LookerView"]:
    view_name = view_context.name()
    logger.debug(f"Handling view {view_name} in model {model_name}")
    looker_view_id = cls._create_looker_view_id(project_name, model_name, view_name, view_context)
    view_upstream = cls._create_view_upstream(view_context, looker_view_id_cache, config, ctx, reporter)
    view_fields = cls._extract_view_fields(view_context, view_upstream, extract_col_level_lineage, populate_sql_logic_in_descriptions)
    view_fields = deduplicate_fields(view_fields)
    view_details = cls._determine_view_details(view_context, max_file_snippet_length)
    return cls._build_looker_view(looker_view_id, view_context, view_upstream, view_fields, view_details)

@classmethod
def _create_looker_view_id(cls, project_name: str, model_name: str, view_name: str, view_context: LookerViewContext) -> LookerViewId:
    return LookerViewId(
        project_name=project_name,
        model_name=model_name,
        view_name=view_name,
        file_path=view_context.view_file_name(),
    )

@classmethod
def _create_view_upstream(cls, view_context: LookerViewContext, looker_view_id_cache: LookerViewIdCache, config: LookMLSourceConfig, ctx: PipelineContext, reporter: LookMLSourceReport) -> AbstractViewUpstream:
    return create_view_upstream(
        view_context=view_context,
        looker_view_id_cache=looker_view_id_cache,
        config=config,
        ctx=ctx,
        reporter=reporter,
    )

@classmethod
def _extract_view_fields(cls, view_context: LookerViewContext, view_upstream: AbstractViewUpstream, extract_col_level_lineage: bool, populate_sql_logic_in_descriptions: bool) -> List[ViewField]:
    field_type_vs_raw_fields = OrderedDict(
        {
            ViewFieldType.DIMENSION: view_context.dimensions(),
            ViewFieldType.DIMENSION_GROUP: view_context.dimension_groups(),
            ViewFieldType.MEASURE: view_context.measures(),
        }
    )
    view_fields = []
    for field_type, fields in field_type_vs_raw_fields.items():
        for field in fields:
            upstream_column_ref = view_upstream.get_upstream_column_ref(field_context=LookerFieldContext(raw_field=field)) if extract_col_level_lineage else []
            view_fields.append(
                ViewField.view_fields_from_dict(
                    field_dict=field,
                    upstream_column_ref=upstream_column_ref,
                    type_cls=field_type,
                    populate_sql_logic_in_descriptions=populate_sql_logic_in_descriptions,
                )
            )
    if not view_fields and view_context.is_sql_based_derived_view_without_fields_case():
        view_fields = view_upstream.create_fields()
    return view_fields

@classmethod
def _determine_view_details(cls, view_context: LookerViewContext, max_file_snippet_length: int) -> ViewProperties:
    view_logic = view_context.view_file.raw_file_content[:max_file_snippet_length]
    if view_context.is_sql_based_derived_case():
        view_logic = view_context.sql(transformed=False)
        return ViewProperties(materialized=False, viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_SQL)
    elif view_context.is_native_derived_case():
        view_logic = str(lkml.dump(view_context.derived_table()))[:max_file_snippet_length]
        return ViewProperties(materialized=view_context.is_materialized_derived_view(), viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_LOOKML)
    else:
        return ViewProperties(materialized=False, viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_LOOKML)

@classmethod
def _build_looker_view(cls, looker_view_id: LookerViewId, view_context: LookerViewContext, view_upstream: AbstractViewUpstream, view_fields: List[ViewField], view_details: ViewProperties) -> "LookerView":
    return LookerView(
        id=looker_view_id,
        absolute_file_path=view_context.view_file.absolute_file_path,
        connection=view_context.view_connection,
        upstream_dataset_urns=view_upstream.get_upstream_dataset_urn(),
        fields=view_fields,
        raw_file_content=view_context.view_file.raw_file_content,
        view_details=view_details,
    )

Comment on lines +161 to +171
project_name=project_name,
model_name=model_name,
view_name=view_name,
looker_view=looker_view,
connection=connection,
looker_viewfile=looker_viewfile,
looker_viewfile_loader=looker_viewfile_loader,
looker_refinement_resolver=looker_refinement_resolver,
field="sql_table_name",
reporter=reporter,
file_path=view_context.view_file_name(),
)

# Some sql_table_name fields contain quotes like: optimizely."group", just remove the quotes
sql_table_name = (
sql_table_name.replace('"', "").replace("`", "")
if sql_table_name is not None
else None
)
derived_table = LookerView.get_including_extends(
view_name=view_name,
looker_view=looker_view,
connection=connection,
looker_viewfile=looker_viewfile,
looker_viewfile_loader=looker_viewfile_loader,
looker_refinement_resolver=looker_refinement_resolver,
field="derived_table",
view_upstream: AbstractViewUpstream = create_view_upstream(
view_context=view_context,
looker_view_id_cache=looker_view_id_cache,
config=config,
ctx=ctx,

Refactor suggestion: Extract create_view_upstream to a separate method.

This part can be moved to a separate method to improve readability.

@classmethod
def _create_view_upstream(cls, view_context: LookerViewContext, looker_view_id_cache: LookerViewIdCache, config: LookMLSourceConfig, ctx: PipelineContext, reporter: LookMLSourceReport) -> AbstractViewUpstream:
    return create_view_upstream(
        view_context=view_context,
        looker_view_id_cache=looker_view_id_cache,
        config=config,
        ctx=ctx,
        reporter=reporter,
    )

Comment on lines 319 to 322
raise ValueError(
"Failed to retrieve connections from looker client. Please check to ensure that you have manage_models permission enabled on this API key."
"Failed to retrieve connections from looker client. Please check to ensure that you have "
"manage_models permission enabled on this API key."
)

Improve exception handling by chaining exceptions.

Use raise ... from err to distinguish the exception from errors in exception handling.

- raise ValueError(
-     "Failed to retrieve connections from looker client. Please check to ensure that you have "
-     "manage_models permission enabled on this API key."
- )
+ raise ValueError(
+     "Failed to retrieve connections from looker client. Please check to ensure that you have "
+     "manage_models permission enabled on this API key."
+ ) from err
Tools
Ruff

319-322: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between d5025d4 and c2d2f6b.

Files selected for processing (1)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (19 hunks)
Files not summarized due to errors (1)
  • metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py: Error: Message exceeds token limit
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py

502-505: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

Additional comments not posted (4)
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (4)

4-4: New imports added.

The newly added imports are necessary for the new functionality introduced in this file. Ensure that these imports are used appropriately in the code.

Also applies to: 42-42, 45-47, 49-53, 54-57, 58-66, 68-70, 98-98


109-109: New field upstream_dataset_urns added.

The field upstream_dataset_urns has been added to store the URNs of upstream datasets.


307-307: New field ctx added.

The field ctx has been added to store the pipeline context.


318-322: Improve exception handling by chaining exceptions.

Use raise ... from err to distinguish the exception from errors in exception handling.

- raise ValueError(
-     "Failed to retrieve connections from looker client. Please check to ensure that you have "
-     "manage_models permission enabled on this API key."
- )
+ raise ValueError(
+     "Failed to retrieve connections from looker client. Please check to ensure that you have "
+     "manage_models permission enabled on this API key."
+ ) from err

Likely invalid or redundant comment.

Comment on lines +146 to +171
view_context: LookerViewContext,
looker_view_id_cache: LookerViewIdCache,
reporter: LookMLSourceReport,
max_file_snippet_length: int,
parse_table_names_from_sql: bool = False,
sql_parser_path: str = "datahub.utilities.sql_parser.DefaultSQLParser",
config: LookMLSourceConfig,
ctx: PipelineContext,
extract_col_level_lineage: bool = False,
populate_sql_logic_in_descriptions: bool = False,
process_isolation_for_sql_parsing: bool = False,
) -> Optional["LookerView"]:
view_name = looker_view["name"]

view_name = view_context.name()

logger.debug(f"Handling view {view_name} in model {model_name}")
# The sql_table_name might be defined in another view and this view is extending that view,
# so we resolve this field while taking that into account.
sql_table_name: Optional[str] = LookerView.get_including_extends(

looker_view_id: LookerViewId = LookerViewId(
project_name=project_name,
model_name=model_name,
view_name=view_name,
looker_view=looker_view,
connection=connection,
looker_viewfile=looker_viewfile,
looker_viewfile_loader=looker_viewfile_loader,
looker_refinement_resolver=looker_refinement_resolver,
field="sql_table_name",
reporter=reporter,
file_path=view_context.view_file_name(),
)

# Some sql_table_name fields contain quotes like: optimizely."group", just remove the quotes
sql_table_name = (
sql_table_name.replace('"', "").replace("`", "")
if sql_table_name is not None
else None
)
derived_table = LookerView.get_including_extends(
view_name=view_name,
looker_view=looker_view,
connection=connection,
looker_viewfile=looker_viewfile,
looker_viewfile_loader=looker_viewfile_loader,
looker_refinement_resolver=looker_refinement_resolver,
field="derived_table",
view_upstream: AbstractViewUpstream = create_view_upstream(
view_context=view_context,
looker_view_id_cache=looker_view_id_cache,
config=config,
ctx=ctx,

Refactor suggestion: Extract create_view_upstream to a separate method.

This part can be moved to a separate method to improve readability.

@classmethod
def _create_view_upstream(cls, view_context: LookerViewContext, looker_view_id_cache: LookerViewIdCache, config: LookMLSourceConfig, ctx: PipelineContext, reporter: LookMLSourceReport) -> AbstractViewUpstream:
    return create_view_upstream(
        view_context=view_context,
        looker_view_id_cache=looker_view_id_cache,
        config=config,
        ctx=ctx,
        reporter=reporter,
    )

Comment on lines +175 to +199
field_type_vs_raw_fields = OrderedDict(
{
ViewFieldType.DIMENSION: view_context.dimensions(),
ViewFieldType.DIMENSION_GROUP: view_context.dimension_groups(),
ViewFieldType.MEASURE: view_context.measures(),
}
) # in order to maintain order in golden file

fields = deduplicate_fields(fields)
view_fields: List[ViewField] = []

# Prep "default" values for the view, which will be overridden by the logic below.
view_logic = looker_viewfile.raw_file_content[:max_file_snippet_length]
sql_table_names: List[str] = []
upstream_explores: List[str] = []

if derived_table is not None:
# Derived tables can either be a SQL query or a LookML explore.
# See https://cloud.google.com/looker/docs/derived-tables.

if "sql" in derived_table:
view_logic = derived_table["sql"]
view_lang = VIEW_LANGUAGE_SQL

# Parse SQL to extract dependencies.
if parse_table_names_from_sql:
(
fields,
sql_table_names,
) = cls._extract_metadata_from_derived_table_sql(
reporter,
sql_parser_path,
view_name,
sql_table_name,
view_logic,
fields,
use_external_process=process_isolation_for_sql_parsing,
for field_type, fields in field_type_vs_raw_fields.items():
for field in fields:
upstream_column_ref: List[ColumnRef] = []
if extract_col_level_lineage:
upstream_column_ref = view_upstream.get_upstream_column_ref(
field_context=LookerFieldContext(raw_field=field)
)

elif "explore_source" in derived_table:
# This is called a "native derived table".
# See https://cloud.google.com/looker/docs/creating-ndts.
explore_source = derived_table["explore_source"]

# We want this to render the full lkml block
# e.g. explore_source: source_name { ... }
# As such, we use the full derived_table instead of the explore_source.
view_logic = str(lkml.dump(derived_table))[:max_file_snippet_length]
view_lang = VIEW_LANGUAGE_LOOKML

(
fields,
upstream_explores,
) = cls._extract_metadata_from_derived_table_explore(
reporter, view_name, explore_source, fields
view_fields.append(
ViewField.view_fields_from_dict(
field_dict=field,
upstream_column_ref=upstream_column_ref,
type_cls=field_type,
populate_sql_logic_in_descriptions=populate_sql_logic_in_descriptions,
)

Refactor suggestion: Extract field extraction logic to a separate method.

This part can be moved to a separate method to improve readability.

@classmethod
def _extract_view_fields(cls, view_context: LookerViewContext, view_upstream: AbstractViewUpstream, extract_col_level_lineage: bool, populate_sql_logic_in_descriptions: bool) -> List[ViewField]:
    field_type_vs_raw_fields = OrderedDict(
        {
            ViewFieldType.DIMENSION: view_context.dimensions(),
            ViewFieldType.DIMENSION_GROUP: view_context.dimension_groups(),
            ViewFieldType.MEASURE: view_context.measures(),
        }
    )
    view_fields = []
    for field_type, fields in field_type_vs_raw_fields.items():
        for field in fields:
            upstream_column_ref = view_upstream.get_upstream_column_ref(field_context=LookerFieldContext(raw_field=field)) if extract_col_level_lineage else []
            view_fields.append(
                ViewField.view_fields_from_dict(
                    field_dict=field,
                    upstream_column_ref=upstream_column_ref,
                    type_cls=field_type,
                    populate_sql_logic_in_descriptions=populate_sql_logic_in_descriptions,
                )
            )
    if not view_fields and view_context.is_sql_based_derived_view_without_fields_case():
        view_fields = view_upstream.create_fields()
    return view_fields

Comment on lines +214 to +231
if view_context.is_sql_based_derived_case():
view_logic = view_context.sql(transformed=False)
# Parse SQL to extract dependencies.
view_details = ViewProperties(
materialized=False,
viewLogic=view_logic,
viewLanguage=VIEW_LANGUAGE_SQL,
)
elif view_context.is_native_derived_case():
# We want this to render the full lkml block
# e.g. explore_source: source_name { ... }
# As such, we use the full derived_table instead of the explore_source.
view_logic = str(lkml.dump(view_context.derived_table()))[
:max_file_snippet_length
]
view_lang = VIEW_LANGUAGE_LOOKML

materialized = view_context.is_materialized_derived_view()

Refactor suggestion: Extract view details determination logic to a separate method.

This part can be moved to a separate method to improve readability.

@classmethod
def _determine_view_details(cls, view_context: LookerViewContext, max_file_snippet_length: int) -> ViewProperties:
    view_logic = view_context.view_file.raw_file_content[:max_file_snippet_length]
    if view_context.is_sql_based_derived_case():
        view_logic = view_context.sql(transformed=False)
        return ViewProperties(materialized=False, viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_SQL)
    elif view_context.is_native_derived_case():
        view_logic = str(lkml.dump(view_context.derived_table()))[:max_file_snippet_length]
        return ViewProperties(materialized=view_context.is_materialized_derived_view(), viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_LOOKML)
    else:
        return ViewProperties(materialized=False, viewLogic=view_logic, viewLanguage=VIEW_LANGUAGE_LOOKML)

Comment on lines +244 to +249
id=looker_view_id,
absolute_file_path=view_context.view_file.absolute_file_path,
connection=view_context.view_connection,
upstream_dataset_urns=view_upstream.get_upstream_dataset_urn(),
fields=view_fields,
raw_file_content=view_context.view_file.raw_file_content,

Refactor suggestion: Extract LookerView construction to a separate method.

This part can be moved to a separate method to improve readability.

@classmethod
def _build_looker_view(cls, looker_view_id: LookerViewId, view_context: LookerViewContext, view_upstream: AbstractViewUpstream, view_fields: List[ViewField], view_details: ViewProperties) -> "LookerView":
    return LookerView(
        id=looker_view_id,
        absolute_file_path=view_context.view_file.absolute_file_path,
        connection=view_context.view_connection,
        upstream_dataset_urns=view_upstream.get_upstream_dataset_urn(),
        fields=view_fields,
        raw_file_content=view_context.view_file.raw_file_content,
        view_details=view_details,
    )

Comment on lines 502 to 505
raise ValueError(
f"Could not locate a project name for model {model_name}. Consider configuring a static project name in your config file"
f"Could not locate a project name for model {model_name}. Consider configuring a static project name "
f"in your config file"
)

Improve exception handling by chaining exceptions.

Use raise ... from err to distinguish the exception from errors in exception handling.

- raise ValueError(
-     f"Could not locate a project name for model {model_name}. Consider configuring a static project name "
-     f"in your config file"
- )
+ raise ValueError(
+     f"Could not locate a project name for model {model_name}. Consider configuring a static project name "
+     f"in your config file"
+ ) from err
Tools
Ruff

502-505: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between c2d2f6b and 8629f42.

Files selected for processing (1)
  • metadata-ingestion/setup.py (2 hunks)
Files skipped from review as they are similar to previous changes (1)
  • metadata-ingestion/setup.py

@hsheth2 hsheth2 merged commit 43bac36 into datahub-project:master Jul 8, 2024
57 of 58 checks passed
arosanda added a commit to infobip/datahub that referenced this pull request Sep 23, 2024
* feat(forms) Handle deleting forms references when hard deleting forms (datahub-project#10820)

* refactor(ui): Misc improvements to the setup ingestion flow (ingest uplift 1/2)  (datahub-project#10764)

Co-authored-by: John Joyce <john@Johns-MBP.lan>
Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>

* fix(ingestion/airflow-plugin): pipeline tasks discoverable in search (datahub-project#10819)

* feat(ingest/transformer): tags to terms transformer (datahub-project#10758)

Co-authored-by: Aseem Bansal <asmbansal2@gmail.com>

* fix(ingestion/unity-catalog): fixed issue with profiling with GE turned on (datahub-project#10752)

Co-authored-by: Aseem Bansal <asmbansal2@gmail.com>

* feat(forms) Add java SDK for form entity PATCH + CRUD examples (datahub-project#10822)

* feat(SDK) Add java SDK for structuredProperty entity PATCH + CRUD examples (datahub-project#10823)

* feat(SDK) Add StructuredPropertyPatchBuilder in python sdk and provide sample CRUD files (datahub-project#10824)

* feat(forms) Add CRUD endpoints to GraphQL for Form entities (datahub-project#10825)

* add flag for includeSoftDeleted in scroll entities API (datahub-project#10831)

* feat(deprecation) Return actor entity with deprecation aspect (datahub-project#10832)

* feat(structuredProperties) Add CRUD graphql APIs for structured property entities (datahub-project#10826)

* add scroll parameters to openapi v3 spec (datahub-project#10833)

* fix(ingest): correct profile_day_of_week implementation (datahub-project#10818)

* feat(ingest/glue): allow ingestion of empty databases from Glue (datahub-project#10666)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* feat(cli): add more details to get cli (datahub-project#10815)

* fix(ingestion/glue): ensure date formatting works on all platforms for aws glue (datahub-project#10836)

* fix(ingestion): fix datajob patcher (datahub-project#10827)

* fix(smoke-test): add suffix in temp file creation (datahub-project#10841)

* feat(ingest/glue): add helper method to permit user or group ownership (datahub-project#10784)

* feat(): Show data platform instances in policy modal if they are set on the policy (datahub-project#10645)

Co-authored-by: Hendrik Richert <hendrik.richert@swisscom.com>

* docs(patch): add patch documentation for how implementation works (datahub-project#10010)

Co-authored-by: John Joyce <john@acryl.io>

* fix(jar): add missing custom-plugin-jar task (datahub-project#10847)

* fix(): also check exceptions/stack trace when filtering log messages (datahub-project#10391)

Co-authored-by: John Joyce <john@acryl.io>

* docs(): Update posts.md (datahub-project#9893)

Co-authored-by: Hyejin Yoon <0327jane@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* chore(ingest): update acryl-datahub-classify version (datahub-project#10844)

* refactor(ingest): Refactor structured logging to support infos, warnings, and failures structured reporting to UI (datahub-project#10828)

Co-authored-by: John Joyce <john@Johns-MBP.lan>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(restli): log aspect-not-found as a warning rather than as an error (datahub-project#10834)

* fix(ingest/nifi): remove duplicate upstream jobs (datahub-project#10849)

* fix(smoke-test): test access to create/revoke personal access tokens (datahub-project#10848)

* fix(smoke-test): missing test for move domain (datahub-project#10837)

* ci: update usernames to not considered for community (datahub-project#10851)

* env: change defaults for data contract visibility (datahub-project#10854)

* fix(ingest/tableau): quote special characters in external URL (datahub-project#10842)

* fix(smoke-test): fix flakiness of auto complete test

* ci(ingest): pin dask dependency for feast (datahub-project#10865)

* fix(ingestion/lookml): liquid template resolution and view-to-view cll (datahub-project#10542)

* feat(ingest/audit): add client id and version in system metadata props (datahub-project#10829)

* chore(ingest): Mypy 1.10.1 pin (datahub-project#10867)

* docs: use acryl-datahub-actions as expected python package to install (datahub-project#10852)

* docs: add new js snippet (datahub-project#10846)

* refactor(ingestion): remove company domain for security reason (datahub-project#10839)

* fix(ingestion/spark): Platform instance and column level lineage fix (datahub-project#10843)

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* feat(ingestion/tableau): optionally ingest multiple sites and create site containers (datahub-project#10498)

Co-authored-by: Yanik Häni <Yanik.Haeni1@swisscom.com>

* fix(ingestion/looker): Add sqlglot dependency and remove unused sqlparser (datahub-project#10874)

* fix(manage-tokens): fix manage access token policy (datahub-project#10853)

* Batch get entity endpoints (datahub-project#10880)

* feat(system): support conditional write semantics (datahub-project#10868)

* fix(build): upgrade vercel builds to Node 20.x (datahub-project#10890)

* feat(ingest/lookml): shallow clone repos (datahub-project#10888)

* fix(ingest/looker): add missing dependency (datahub-project#10876)

* fix(ingest): only populate audit stamps where accurate (datahub-project#10604)

* fix(ingest/dbt): always encode tag urns (datahub-project#10799)

* fix(ingest/redshift): handle multiline alter table commands (datahub-project#10727)

* fix(ingestion/looker): column name missing in explore (datahub-project#10892)

* fix(lineage) Fix lineage source/dest filtering with explored per hop limit (datahub-project#10879)

* feat(conditional-writes): misc updates and fixes (datahub-project#10901)

* feat(ci): update outdated action (datahub-project#10899)

* feat(rest-emitter): adding async flag to rest emitter (datahub-project#10902)

Co-authored-by: Gabe Lyons <gabe.lyons@acryl.io>

* feat(ingest): add snowflake-queries source (datahub-project#10835)

* fix(ingest): improve `auto_materialize_referenced_tags_terms` error handling (datahub-project#10906)

* docs: add new company to adoption list (datahub-project#10909)

* refactor(redshift): Improve redshift error handling with new structured reporting system (datahub-project#10870)

Co-authored-by: John Joyce <john@Johns-MBP.lan>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* feat(ui) Finalize support for all entity types on forms (datahub-project#10915)

* Index ExecutionRequestResults status field (datahub-project#10811)

* feat(ingest): grafana connector (datahub-project#10891)

Co-authored-by: Shirshanka Das <shirshanka@apache.org>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(gms) Add Form entity type to EntityTypeMapper (datahub-project#10916)

* feat(dataset): add support for external url in Dataset (datahub-project#10877)

* docs(saas-overview) added missing features to observe section (datahub-project#10913)

Co-authored-by: John Joyce <john@acryl.io>

* fix(ingest/spark): Fixing Micrometer warning (datahub-project#10882)

* fix(structured properties): allow application of structured properties without schema file (datahub-project#10918)

* fix(data-contracts-web) handle other schedule types (datahub-project#10919)

* fix(ingestion/tableau): human-readable message for PERMISSIONS_MODE_SWITCHED error (datahub-project#10866)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* Add feature flag for view defintions (datahub-project#10914)

Co-authored-by: Ethan Cartwright <ethan.cartwright@acryl.io>

* feat(ingest/BigQuery): refactor+parallelize dataset metadata extraction (datahub-project#10884)

* fix(airflow): add error handling around render_template() (datahub-project#10907)

* feat(ingestion/sqlglot): add optional `default_dialect` parameter to sqlglot lineage (datahub-project#10830)

* feat(mcp-mutator): new mcp mutator plugin (datahub-project#10904)

* fix(ingest/bigquery): changes helper function to decode unicode scape sequences (datahub-project#10845)

* feat(ingest/postgres): fetch table sizes for profile (datahub-project#10864)

* feat(ingest/abs): Adding azure blob storage ingestion source (datahub-project#10813)

* fix(ingest/redshift): reduce severity of SQL parsing issues (datahub-project#10924)

* fix(build): fix lint fix web react (datahub-project#10896)

* fix(ingest/bigquery): handle quota exceeded for project.list requests (datahub-project#10912)

* feat(ingest): report extractor failures more loudly (datahub-project#10908)

* feat(ingest/snowflake): integrate snowflake-queries into main source (datahub-project#10905)

* fix(ingest): fix docs build (datahub-project#10926)

* fix(ingest/snowflake): fix test connection (datahub-project#10927)

* fix(ingest/lookml): add view load failures to cache (datahub-project#10923)

* docs(slack) overhauled setup instructions and screenshots (datahub-project#10922)

Co-authored-by: John Joyce <john@acryl.io>

* fix(airflow): Add comma parsing of owners to DataJobs (datahub-project#10903)

* fix(entityservice): fix merging sideeffects (datahub-project#10937)

* feat(ingest): Support System Ingestion Sources, Show and hide system ingestion sources with Command-S (datahub-project#10938)

Co-authored-by: John Joyce <john@Johns-MBP.lan>

* chore() Set a default lineage filtering end time on backend when a start time is present (datahub-project#10925)

Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>
Co-authored-by: John Joyce <john@Johns-MBP.lan>

* Added relationships APIs to V3. Added these generic APIs to V3 swagger doc. (datahub-project#10939)

* docs: add learning center to docs (datahub-project#10921)

* doc: Update hubspot form id (datahub-project#10943)

* chore(airflow): add python 3.11 w/ Airflow 2.9 to CI (datahub-project#10941)

* fix(ingest/Glue): column upstream lineage between S3 and Glue (datahub-project#10895)

* fix(ingest/abs): split abs utils into multiple files (datahub-project#10945)

* doc(ingest/looker): fix doc for sql parsing documentation (datahub-project#10883)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(ingest/bigquery): Adding missing BigQuery types (datahub-project#10950)

* fix(ingest/setup): feast and abs source setup (datahub-project#10951)

* fix(connections) Harden adding /gms to connections in backend (datahub-project#10942)

* feat(siblings) Add flag to prevent combining siblings in the UI (datahub-project#10952)

* fix(docs): make graphql doc gen more automated (datahub-project#10953)

* feat(ingest/athena): Add option for Athena partitioned profiling (datahub-project#10723)

* fix(spark-lineage): default timeout for future responses (datahub-project#10947)

* feat(datajob/flow): add environment filter using info aspects (datahub-project#10814)

* fix(ui/ingest): correct privilege used to show tab (datahub-project#10483)

Co-authored-by: Kunal-kankriya <127090035+Kunal-kankriya@users.noreply.github.com>

* feat(ingest/looker): include dashboard urns in browse v2 (datahub-project#10955)

* add a structured type to batchGet in OpenAPI V3 spec (datahub-project#10956)

* fix(ui): scroll on the domain sidebar to show all domains (datahub-project#10966)

* fix(ingest/sagemaker): resolve incorrect variable assignment for SageMaker API call (datahub-project#10965)

* fix(airflow/build): Pinning mypy (datahub-project#10972)

* Fixed a bug where the OpenAPI V3 spec was incorrect. The bug was introduced in datahub-project#10939. (datahub-project#10974)

* fix(ingest/test): Fix for mssql integration tests (datahub-project#10978)

* fix(entity-service) exist check correctly extracts status (datahub-project#10973)

* fix(structuredProps) casing bug in StructuredPropertiesValidator (datahub-project#10982)

* bugfix: use anyOf instead of allOf when creating references in openapi v3 spec (datahub-project#10986)

* fix(ui): Remove ant less imports (datahub-project#10988)

* feat(ingest/graph): Add get_results_by_filter to DataHubGraph (datahub-project#10987)

* feat(ingest/cli): init does not actually support environment variables (datahub-project#10989)

* fix(ingest/graph): Update get_results_by_filter graphql query (datahub-project#10991)

* feat(ingest/spark): Promote beta plugin (datahub-project#10881)

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* feat(ingest): support domains in meta -> "datahub" section (datahub-project#10967)

* feat(ingest): add `check server-config` command (datahub-project#10990)

* feat(cli): Make consistent use of DataHubGraphClientConfig (datahub-project#10466)

Deprecates get_url_and_token() in favor of a more complete option, load_graph_config(), which returns a full DatahubClientConfig.
The change is propagated across all previous usages of get_url_and_token, so client connections to the DataHub server now respect every setting in DatahubClientConfig.

For example, you can now set disable_ssl_verification: true in your ~/.datahubenv file so that all CLI calls to the server still work when SSL certificate verification is disabled; see the sketch below.

Fixes datahub-project#9705
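
A minimal sketch of what this enables, assuming the inline construction shown here (the DatahubClientConfig built by hand mirrors the fields that load_graph_config() would read from ~/.datahubenv; host and token are placeholders):

```python
# Minimal sketch: the full DatahubClientConfig now flows through to the graph
# client, so options such as disable_ssl_verification are honoured end-to-end.
from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

config = DatahubClientConfig(
    server="https://datahub.example.com:8080",  # placeholder host
    token="<personal-access-token>",            # placeholder token
    disable_ssl_verification=True,              # previously dropped on some code paths
)
graph = DataHubGraph(config)  # all client calls now use the full config
```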

* fix(ingest/s3): Fixing container creation when there is no folder in path (datahub-project#10993)

* fix(ingest/looker): support platform instance for dashboards & charts (datahub-project#10771)

* feat(ingest/bigquery): improve handling of information schema in sql parser (datahub-project#10985)

* feat(ingest): improve `ingest deploy` command (datahub-project#10944)

* fix(backend): allow excluding soft-deleted entities in relationship-queries; exclude soft-deleted members of groups (datahub-project#10920)

- allow excluding soft-deleted entities in relationship-queries
- exclude soft-deleted members of groups

* fix(ingest/looker): downgrade missing chart type log level (datahub-project#10996)

* doc(acryl-cloud): release docs for 0.3.4.x (datahub-project#10984)

Co-authored-by: John Joyce <john@acryl.io>
Co-authored-by: RyanHolstien <RyanHolstien@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Pedro Silva <pedro@acryl.io>

* fix(protobuf/build): Fix protobuf check jar script (datahub-project#11006)

* fix(ui/ingest): Support invalid cron jobs (datahub-project#10998)

* fix(ingest): fix graph config loading (datahub-project#11002)

Co-authored-by: Pedro Silva <pedro@acryl.io>

* feat(docs): Document __DATAHUB_TO_FILE_ directive (datahub-project#10968)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(graphql/upsertIngestionSource): Validate cron schedule; parse error in CLI (datahub-project#11011)

* feat(ece): support custom ownership type urns in ECE generation (datahub-project#10999)

* feat(assertion-v2): changed Validation tab to Quality and created new Governance tab (datahub-project#10935)

* fix(ingestion/glue): Add support for missing config options for profiling in Glue (datahub-project#10858)

* feat(propagation): Add models for schema field docs, tags, terms (datahub-project#2959) (datahub-project#11016)

Co-authored-by: Chris Collins <chriscollins3456@gmail.com>

* docs: standardize terminology to DataHub Cloud (datahub-project#11003)

* fix(ingestion/transformer): replace the externalUrl container (datahub-project#11013)

* docs(slack) troubleshoot docs (datahub-project#11014)

* feat(propagation): Add graphql API (datahub-project#11030)

Co-authored-by: Chris Collins <chriscollins3456@gmail.com>

* feat(propagation):  Add models for Action feature settings (datahub-project#11029)

* docs(custom properties): Remove duplicate from sidebar (datahub-project#11033)

* feat(models): Introducing Dataset Partitions Aspect (datahub-project#10997)

Co-authored-by: John Joyce <john@Johns-MBP.lan>
Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>

* feat(propagation): Add Documentation Propagation Settings (datahub-project#11038)

* fix(models): chart schema fields mapping, add dataHubAction entity, t… (datahub-project#11040)

* fix(ci): smoke test lint failures (datahub-project#11044)

* docs: fix learning center color scheme & typo (datahub-project#11043)

* feat: add cloud main page (datahub-project#11017)

Co-authored-by: Jay <159848059+jayacryl@users.noreply.github.com>

* feat(restore-indices): add additional step to also clear system metadata service (datahub-project#10662)

Co-authored-by: John Joyce <john@acryl.io>

* docs: fix typo (datahub-project#11046)

* fix(lint): apply spotless (datahub-project#11050)

* docs(airflow): example query to get datajobs for a dataflow (datahub-project#11034)

* feat(cli): Add run-id option to put sub-command (datahub-project#11023)

Adds an option to assign a run-id to a given put command execution.
This is useful when no transformer exists for a given ingestion payload: we can follow up with custom metadata and still attribute it to an ingestion pipeline run (see the sketch below).
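
As an illustration only (not the CLI change itself), the emitter-level analogue of tagging a hand-crafted aspect with a run id via systemMetadata; the urn, aspect, and run id are placeholders:

```python
# Hedged sketch: attach a run id to a manually emitted aspect so it is
# attributed to an ingestion run, analogous to `datahub put` with a run id.
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
from datahub.metadata.schema_classes import DatasetPropertiesClass, SystemMetadataClass

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
mcp = MetadataChangeProposalWrapper(
    entityUrn="urn:li:dataset:(urn:li:dataPlatform:hive,db.example_table,PROD)",
    aspect=DatasetPropertiesClass(description="Backfilled by hand"),
    systemMetadata=SystemMetadataClass(runId="manual-backfill-2024-07"),
)
graph.emit_mcp(mcp)  # the aspect now carries the custom run id
```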

* fix(ingest): improve sql error reporting calls (datahub-project#11025)

* fix(airflow): fix CI setup (datahub-project#11031)

* feat(ingest/dbt): add experimental `prefer_sql_parser_lineage` flag (datahub-project#11039)

* fix(ingestion/lookml): enable stack-trace in lookml logs (datahub-project#10971)

* (chore): Linting fix (datahub-project#11015)

* chore(ci): update deprecated github actions (datahub-project#10977)

* Fix ALB configuration example (datahub-project#10981)

* chore(ingestion-base): bump base image packages (datahub-project#11053)

* feat(cli): Trim report of dataHubExecutionRequestResult to max GMS size (datahub-project#11051)

* fix(ingestion/lookml): emit dummy sql condition for lookml custom condition tag (datahub-project#11008)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(ingestion/powerbi): fix issue with broken report lineage (datahub-project#10910)

* feat(ingest/tableau): add retry on timeout (datahub-project#10995)

* change generate kafka connect properties from env (datahub-project#10545)

Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>

* fix(ingest): fix oracle cronjob ingestion (datahub-project#11001)

Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>

* chore(ci): revert update deprecated github actions (datahub-project#10977) (datahub-project#11062)

* feat(ingest/dbt-cloud): update metadata_endpoint inference (datahub-project#11041)

* build: Reduce size of datahub-frontend-react image by 50-ish% (datahub-project#10878)

Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>

* fix(ci): Fix lint issue in datahub_ingestion_run_summary_provider.py (datahub-project#11063)

* docs(ingest): update developing-a-transformer.md (datahub-project#11019)

* feat(search-test): update search tests from datahub-project#10408 (datahub-project#11056)

* feat(cli): add aspects parameter to DataHubGraph.get_entity_semityped (datahub-project#11009)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* docs(airflow): update min version for plugin v2 (datahub-project#11065)

* doc(ingestion/tableau): doc update for derived permission (datahub-project#11054)

Co-authored-by: Pedro Silva <pedro.cls93@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(py): remove dep on types-pkg_resources (datahub-project#11076)

* feat(ingest/mode): add option to exclude restricted (datahub-project#11081)

* fix(ingest): set lastObserved in sdk when unset (datahub-project#11071)

* doc(ingest): Update capabilities (datahub-project#11072)

* chore(vulnerability): Log Injection (datahub-project#11090)

* chore(vulnerability): Information exposure through a stack trace (datahub-project#11091)

* chore(vulnerability): Comparison of narrow type with wide type in loop condition (datahub-project#11089)

* chore(vulnerability): Insertion of sensitive information into log files (datahub-project#11088)

* chore(vulnerability): Risky Cryptographic Algorithm (datahub-project#11059)

* chore(vulnerability): Overly permissive regex range (datahub-project#11061)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix: update customer data (datahub-project#11075)

* fix(models): fixing the datasetPartition models (datahub-project#11085)

Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>

* fix(ui): Adding view, forms GraphQL query, remove showing a fallback error message on unhandled GraphQL error (datahub-project#11084)

Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>

* feat(docs-site): hiding learn more from cloud page (datahub-project#11097)

* fix(docs): Add correct usage of orFilters in search API docs (datahub-project#11082)

Co-authored-by: Jay <159848059+jayacryl@users.noreply.github.com>

* fix(ingest/mode): Regexp in mode name matcher didn't allow underscore (datahub-project#11098)

* docs: Refactor customer stories section (datahub-project#10869)

Co-authored-by: Jeff Merrick <jeff@wireform.io>

* fix(release): fix full/slim suffix on tag (datahub-project#11087)

* feat(config): support alternate hashing algorithm for doc id (datahub-project#10423)

Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
Co-authored-by: John Joyce <john@acryl.io>

* fix(emitter): fix typo in get method of java kafka emitter (datahub-project#11007)

* fix(ingest): use correct native data type in all SQLAlchemy sources by compiling data type using dialect (datahub-project#10898)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* chore: Update contributors list in PR labeler (datahub-project#11105)

* feat(ingest): tweak stale entity removal messaging (datahub-project#11064)

* fix(ingestion): enforce lastObserved timestamps in SystemMetadata (datahub-project#11104)

* fix(ingest/powerbi): fix broken lineage between chart and dataset (datahub-project#11080)

* feat(ingest/lookml): CLL support for sql set in sql_table_name attribute of lookml view (datahub-project#11069)

* docs: update graphql docs on forms & structured properties (datahub-project#11100)

* test(search): search openAPI v3 test (datahub-project#11049)

* fix(ingest/tableau): prevent empty site content urls (datahub-project#11057)

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* feat(entity-client): implement client batch interface (datahub-project#11106)

* fix(snowflake): avoid reporting warnings/info for sys tables (datahub-project#11114)

* fix(ingest): downgrade column type mapping warning to info (datahub-project#11115)

* feat(api): add AuditStamp to the V3 API entity/aspect response (datahub-project#11118)

* fix(ingest/redshift): replace r'\n' with '\n' to avoid token error redshift serverless… (datahub-project#11111)

* fix(entiy-client): handle null entityUrn case for restli (datahub-project#11122)

* fix(sql-parser): prevent bad urns from alter table lineage (datahub-project#11092)

* fix(ingest/bigquery): use small batch size if use_tables_list_query_v2 is set (datahub-project#11121)

* fix(graphql): add missing entities to EntityTypeMapper and EntityTypeUrnMapper (datahub-project#10366)

* feat(ui): Changes to allow editable dataset name (datahub-project#10608)

Co-authored-by: Jay Kadambi <jayasimhan_venkatadri@optum.com>

* fix: remove saxo (datahub-project#11127)

* feat(mcl-processor): Update mcl processor hooks (datahub-project#11134)

* fix(openapi): fix openapi v2 endpoints & v3 documentation update

* Revert "fix(openapi): fix openapi v2 endpoints & v3 documentation update"

This reverts commit 573c1cb.

* docs(policies): updates to policies documentation (datahub-project#11073)

* fix(openapi): fix openapi v2 and v3 docs update (datahub-project#11139)

* feat(auth): grant type and acr values custom oidc parameters support (datahub-project#11116)

* fix(mutator): mutator hook fixes (datahub-project#11140)

* feat(search): support sorting on multiple fields (datahub-project#10775)

* feat(ingest): various logging improvements (datahub-project#11126)

* fix(ingestion/lookml): fix for sql parsing error (datahub-project#11079)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* feat(docs-site) cloud page spacing and content polishes (datahub-project#11141)

* feat(ui) Enable editing structured props on fields (datahub-project#11042)

* feat(tests): add md5 and last computed to testResult model (datahub-project#11117)

* test(openapi): openapi regression smoke tests (datahub-project#11143)

* fix(airflow): fix tox tests + update docs (datahub-project#11125)

* docs: add chime to adoption stories (datahub-project#11142)

* fix(ingest/databricks): Updating code to work with Databricks sdk 0.30 (datahub-project#11158)

* fix(kafka-setup): add missing script to image (datahub-project#11190)

* fix(config): fix hash algo config (datahub-project#11191)

* test(smoke-test): updates to smoke-tests (datahub-project#11152)

* fix(elasticsearch): refactor idHashAlgo setting (datahub-project#11193)

* chore(kafka): kafka version bump (datahub-project#11211)

* readd UsageStatsWorkUnit

* fix merge problems

* change logo

---------

Co-authored-by: Chris Collins <chriscollins3456@gmail.com>
Co-authored-by: John Joyce <john@acryl.io>
Co-authored-by: John Joyce <john@Johns-MBP.lan>
Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal>
Co-authored-by: dushayntAW <158567391+dushayntAW@users.noreply.github.com>
Co-authored-by: sagar-salvi-apptware <159135491+sagar-salvi-apptware@users.noreply.github.com>
Co-authored-by: Aseem Bansal <asmbansal2@gmail.com>
Co-authored-by: Kevin Chun <kevin1chun@gmail.com>
Co-authored-by: jordanjeremy <72943478+jordanjeremy@users.noreply.github.com>
Co-authored-by: skrydal <piotr.skrydalewicz@gmail.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
Co-authored-by: sid-acryl <155424659+sid-acryl@users.noreply.github.com>
Co-authored-by: Julien Jehannet <80408664+aviv-julienjehannet@users.noreply.github.com>
Co-authored-by: Hendrik Richert <github@richert.li>
Co-authored-by: Hendrik Richert <hendrik.richert@swisscom.com>
Co-authored-by: RyanHolstien <RyanHolstien@users.noreply.github.com>
Co-authored-by: Felix Lüdin <13187726+Masterchen09@users.noreply.github.com>
Co-authored-by: Pirry <158024088+chardaway@users.noreply.github.com>
Co-authored-by: Hyejin Yoon <0327jane@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: cburroughs <chris.burroughs@gmail.com>
Co-authored-by: ksrinath <ksrinath@users.noreply.github.com>
Co-authored-by: Mayuri Nehate <33225191+mayurinehate@users.noreply.github.com>
Co-authored-by: Kunal-kankriya <127090035+Kunal-kankriya@users.noreply.github.com>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
Co-authored-by: ipolding-cais <155455744+ipolding-cais@users.noreply.github.com>
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
Co-authored-by: Shubham Jagtap <132359390+shubhamjagtap639@users.noreply.github.com>
Co-authored-by: haeniya <yanik.haeni@gmail.com>
Co-authored-by: Yanik Häni <Yanik.Haeni1@swisscom.com>
Co-authored-by: Gabe Lyons <itsgabelyons@gmail.com>
Co-authored-by: Gabe Lyons <gabe.lyons@acryl.io>
Co-authored-by: 808OVADOZE <52988741+shtephlee@users.noreply.github.com>
Co-authored-by: noggi <anton.kuraev@acryl.io>
Co-authored-by: Nicholas Pena <npena@foursquare.com>
Co-authored-by: Jay <159848059+jayacryl@users.noreply.github.com>
Co-authored-by: ethan-cartwright <ethan.cartwright.m@gmail.com>
Co-authored-by: Ethan Cartwright <ethan.cartwright@acryl.io>
Co-authored-by: Nadav Gross <33874964+nadavgross@users.noreply.github.com>
Co-authored-by: Patrick Franco Braz <patrickfbraz@poli.ufrj.br>
Co-authored-by: pie1nthesky <39328908+pie1nthesky@users.noreply.github.com>
Co-authored-by: Joel Pinto Mata (KPN-DSH-DEX team) <130968841+joelmataKPN@users.noreply.github.com>
Co-authored-by: Ellie O'Neil <110510035+eboneil@users.noreply.github.com>
Co-authored-by: Ajoy Majumdar <ajoymajumdar@hotmail.com>
Co-authored-by: deepgarg-visa <149145061+deepgarg-visa@users.noreply.github.com>
Co-authored-by: Tristan Heisler <tristankheisler@gmail.com>
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
Co-authored-by: Davi Arnaut <davi.arnaut@acryl.io>
Co-authored-by: Pedro Silva <pedro@acryl.io>
Co-authored-by: amit-apptware <132869468+amit-apptware@users.noreply.github.com>
Co-authored-by: Sam Black <sam.black@acryl.io>
Co-authored-by: Raj Tekal <varadaraj_tekal@optum.com>
Co-authored-by: Steffen Grohsschmiedt <gitbhub@steffeng.eu>
Co-authored-by: jaegwon.seo <162448493+wornjs@users.noreply.github.com>
Co-authored-by: Renan F. Lima <51028757+lima-renan@users.noreply.github.com>
Co-authored-by: Matt Exchange <xkollar@users.noreply.github.com>
Co-authored-by: Jonny Dixon <45681293+acrylJonny@users.noreply.github.com>
Co-authored-by: Pedro Silva <pedro.cls93@gmail.com>
Co-authored-by: Pinaki Bhattacharjee <pinakipb2@gmail.com>
Co-authored-by: Jeff Merrick <jeff@wireform.io>
Co-authored-by: skrydal <piotr.skrydalewicz@acryl.io>
Co-authored-by: AndreasHegerNuritas <163423418+AndreasHegerNuritas@users.noreply.github.com>
Co-authored-by: jayasimhankv <145704974+jayasimhankv@users.noreply.github.com>
Co-authored-by: Jay Kadambi <jayasimhan_venkatadri@optum.com>
Co-authored-by: David Leifker <david.leifker@acryl.io>