Checking pipeline cluster config and cluster policy in 'crawl_pipelines' task #864
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@           Coverage Diff           @@
##             main     #864   +/-   ##
=======================================
  Coverage   86.55%   86.55%
=======================================
  Files          41       41
  Lines        5162     5171    +9
  Branches      938      943    +5
=======================================
+ Hits         4468     4476    +8
+ Misses        481      480    -1
- Partials      213      215    +2
☔ View full report in Codecov by Sentry.
@@ -50,6 +50,14 @@ def _assess_pipelines(self, all_pipelines) -> Iterable[PipelineInfo]:
            pipeline_config = pipeline_response.spec.configuration
            if pipeline_config:
                failures.extend(self.check_spark_conf(pipeline_config, "pipeline"))
            pipeline_cluster = pipeline_response.spec.clusters[0]
Can you iterate over clusters instead of picking up the first?
Changed the code to iterate through the clusters.
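A minimal sketch of that iteration, reusing the check_spark_conf helper visible in the diff above (the exact merged code may differ):

# Iterate over every cluster defined in the pipeline spec instead of clusters[0].
for pipeline_cluster in pipeline_response.spec.clusters or []:
    if pipeline_cluster.spark_conf:
        failures.extend(self.check_spark_conf(pipeline_cluster.spark_conf, "pipeline cluster"))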
@@ -21,14 +21,22 @@ def test_pipeline_assessment_with_config(mocker):
        )
    ]

-   ws = Mock()
+   ws = MagicMock()
Can you use create_autospec for the workspace client instead?
Using create_autospec now
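A small sketch of why the review asks for create_autospec (standard unittest.mock behavior; the placeholder return value is illustrative):

from unittest.mock import create_autospec
from databricks.sdk import WorkspaceClient

# Unlike a bare MagicMock, an autospecced client only accepts attributes that
# exist on the real WorkspaceClient, so a typo such as ws.pipeline.get raises
# AttributeError instead of silently returning another mock.
ws = create_autospec(WorkspaceClient)
ws.pipelines.get.return_value = None  # placeholder; real tests return a fixture object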
refactor to workspace_client_mock
config_dict = {
    "spark.hadoop.fs.azure.account.auth.type.abcde.dfs.core.windows.net": "SAS",
    "spark.hadoop.fs.azure.sas.token.provider.type.abcde.dfs."
    "core.windows.net": "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider",
    "spark.hadoop.fs.azure.sas.fixed.token.abcde.dfs.core.windows.net": "{{secrets/abcde_access/sasFixedToken}}",
}
pipeline_cluster = [
Can you move these long responses out to a separate file, so that tests are more maintainable? See https://github.com/databrickslabs/ucx/blob/main/tests/unit/assessment/__init__.py
Separated the long responses out into a separate file.
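A rough sketch of the fixture-file pattern being asked for here; the helper name and path are illustrative, the real helpers live in tests/unit/assessment/__init__.py:

import json
import pathlib


def _load_fixture(filename: str) -> dict:
    # Keep long canned API responses next to the tests as JSON files instead
    # of inlining them into every test function.
    path = pathlib.Path(__file__).parent / filename
    return json.loads(path.read_text())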
ws.pipelines.get().spec.configuration = config_dict
ws.pipelines.get().spec.clusters = pipeline_cluster
ws.cluster_policies.get().definition = (
refactor to start using workspace_client_mock
- see https://github.com/databrickslabs/ucx/blame/main/tests/unit/assessment/test_clusters.py#L22-L58
replaced it with workspace_client_mock
config_dict = {}
pipeline_cluster = [
refactor to start using workspace_client_mock
replaced it with workspace_client_mock
@@ -112,9 +203,32 @@ def test_pipeline_without_owners_should_have_empty_creator_name():
        )
    ]

-   ws = Mock()
+   ws = create_autospec(WorkspaceClient)
refactor to start using workspace_client_mock
replaced it with workspace_client_mock
    )
]

ws = create_autospec(WorkspaceClient)
refactor to start using workspace_client_mock
replaced it with workspace_client_mock
ws.pipelines.get().spec.configuration = config_dict
ws.pipelines.get().spec.clusters = pipeline_cluster

crawler = PipelinesCrawler(ws, MockBackend(), "ucx")._assess_pipelines(sample_pipelines)
don't invoke private methods in unit tests! it's prohibited.
Removed the private method call from the unit test.
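A sketch of the shape the test takes without the private call; treating snapshot() as the crawler's public entry point is an assumption here:

crawler = PipelinesCrawler(ws, MockBackend(), "ucx")
result_set = crawler.snapshot()  # public API; exercises _assess_pipelines indirectly
assert len(result_set) == 1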
ws.pipelines.get().spec.configuration = config_dict
ws.pipelines.get().spec.clusters = pipeline_cluster
crawler = PipelinesCrawler(ws, MockBackend(), "ucx")._assess_pipelines(sample_pipelines)
don't invoke private methods in unit tests! it's prohibited.
Removed the private method call from the unit test.
    '"hidden": true\n }\n}'
)
ws.workspace.export().content = "JXNoCmVjaG8gIj0="
ws.dbfs.read().data = "JXNoCmVjaG8gIj0="

crawler = PipelinesCrawler(ws, MockBackend(), "ucx")._assess_pipelines(sample_pipelines)
don't invoke private methods in unit tests! it's prohibited.
Removed the private method call from the unit test.
config_dict = {
    "spark.hadoop.fs.azure.account.auth.type.abcde.dfs.core.windows.net": "SAS",
    "spark.hadoop.fs.azure.sas.token.provider.type.abcde.dfs."
    "core.windows.net": "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider",
    "spark.hadoop.fs.azure.sas.fixed.token.abcde.dfs.core.windows.net": "{{secrets/abcde_access/sasFixedToken}}",
}
ws.pipelines.get().spec.configuration = config_dict
ws.pipelines.get().spec.clusters = mock_pipeline_cluster
ws.cluster_policies.get(policy_id="single-user-with-spn").definition = (
Look at "__init__.py" in the same folder and do it like there. Modify that file if needed.
Modified the code accordingly
tests/unit/assessment/conftest.py
@pytest.fixture(scope="function")
def mock_pipeline_cluster():
    yield [
Modify workspace_client_mock to store json in a file for pipelines
modified the file accordingly.
tests/unit/assessment/__init__.py
ws = create_autospec(WorkspaceClient)
ws.clusters.list.return_value = _load_list(ClusterDetails, f"../assessment/clusters/{clusters}")
ws.cluster_policies.get = _cluster_policy
ws.pipelines.get().spec.clusters = _load_list(PipelineCluster, f"clusters/{pipeline_cluster}")
this is silly and makes no sense at all. mock the entire response, not just a subset - ws.pipelines.get.return_value = _load_list(...
Now capturing the entire PipelineResponse instead of only mocking the cluster part.
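A minimal sketch of mocking the entire response object rather than patching spec.clusters on an auto-generated child mock. GetPipelineResponse is the SDK dataclass already referenced in this thread; PipelineSpec is assumed to be its companion spec dataclass, and the field values are illustrative:

from databricks.sdk.service.pipelines import GetPipelineResponse, PipelineSpec

# One fully-formed response: spec.configuration, spec.clusters and anything
# else the crawler reads all come from the same object.
ws.pipelines.get.return_value = GetPipelineResponse(
    pipeline_id="pipeline-1",
    spec=PipelineSpec(configuration={"spark.hadoop.fs.azure.account.auth.type": "SAS"}),
)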
tests/unit/assessment/__init__.py
@@ -24,8 +25,9 @@ def _cluster_policy(policy_id: str):
    return Policy(description=definition, policy_family_definition_overrides=overrides)


-def workspace_client_mock(clusters="no-spark-conf.json"):
+def workspace_client_mock(clusters="no-spark-conf.json", pipeline_cluster="pipeline_cluster.json"):
this makes no sense - you're not mocking the list response. in this case - argument is not required.
We kept this parameter to give an option to specify the pipeline cluster JSON. Removed this argument now.
do you read the surrounding code before making changes?...
tests/unit/assessment/__init__.py
@@ -28,4 +29,5 @@ def workspace_client_mock(clusters="no-spark-conf.json"):
    ws = create_autospec(WorkspaceClient)
    ws.clusters.list.return_value = _load_list(ClusterDetails, f"../assessment/clusters/{clusters}")
    ws.cluster_policies.get = _cluster_policy
+   ws.pipelines.get.return_value = _load_list(GetPipelineResponse, "clusters/pipeline_cluster.json")[0]
@prajin-29, do you read the surrounding code before changing it? 🤦 Why is ws.pipelines.get mocked differently from ws.cluster_policies.get? It's illogical!
Updated ws.pipelines.get to work the same way as ws.cluster_policies.get. With this I have also included the pipeline spec configuration inside the JSON, so we can access everything in one shot.
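A rough sketch of that pattern, mirroring how _cluster_policy is wired in above; the fixture path and helper names are illustrative:

def _pipeline(pipeline_id: str) -> GetPipelineResponse:
    # Return the full canned response (spec.configuration, spec.clusters, ...)
    # for the requested pipeline id, loaded from a JSON fixture file.
    fixture = pathlib.Path(__file__).parent / "pipelines" / f"{pipeline_id}.json"
    return GetPipelineResponse.from_dict(json.loads(fixture.read_text()))


ws.pipelines.get = _pipeline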
Test implementation is incorrect and confusing
tests/unit/assessment/__init__.py
@@ -24,8 +25,14 @@ def _cluster_policy(policy_id: str):
    return Policy(description=definition, policy_family_definition_overrides=overrides)


+def _pipeline_cluster(pipeline_id: str):
+    pipeline_response = _load_list(GetPipelineResponse, f"clusters/{pipeline_id}.json")[0]
Why is it _load_list and not a load-fixture helper? Why do you put pipeline test fixtures into a clusters folder? This will confuse the people after you.
@nfx the code is now rebased on main and uses the pipeline test fixture from the pipelines folder.
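For orientation, the fixture layout implied by this thread (illustrative, not a verbatim listing of the repository):

tests/unit/assessment/
    __init__.py     workspace_client_mock and fixture-loading helpers
    clusters/       cluster fixtures, e.g. job-source-cluster.json
    pipelines/      pipeline fixtures loaded into GetPipelineResponse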
"spark.hadoop.fs.azure.sas.fixed.token.abcde.dfs.core.windows.net": "{{secrets/abcde_access/sasFixedToken}}", | ||
} | ||
ws.pipelines.get().spec.configuration = config_dict | ||
ws = workspace_client_mock(clusters="job-source-cluster.json") |
@prajin-29 I've extended workspace_client_mock in #923 to read the pipeline spec from a JSON file under tests/unit/assessment/pipelines, so you can rebase your code off of that.
Rebased the code to use the change
ws.workspace.export().content = "JXNoCmVjaG8gIj0="
ws.dbfs.read().data = "JXNoCmVjaG8gIj0="
do we need these for pipeline tests?
Yup, I added these to cover the negative scenario for full coverage. But with the addition of the spec in the JSON we can remove them. Removed them from the test.
# Conflicts:
#   tests/unit/assessment/__init__.py
Lgtm
* Added CLI Command `databricks labs ucx save-uc-compatible-roles` ([#863](#863)). * Added dashboard widget with table count by storage and format ([#852](#852)). * Added verification of group permissions ([#841](#841)). * Checking pipeline cluster config and cluster policy in 'crawl_pipelines' task ([#864](#864)). * Created cluster policy (ucx-policy) to be used by all UCX compute. This may require customers to reinstall UCX. ([#853](#853)). * Skip scanning objects that were removed on platform side since the last scan time, so that integration tests are less flaky ([#922](#922)). * Updated assessment documentation ([#873](#873)). Dependency updates: * Updated databricks-sdk requirement from ~=0.18.0 to ~=0.19.0 ([#930](#930)).
Changes
Checking pipeline cluster config and cluster policy in the 'crawl_pipelines' task.
Linked issues
Resolves #844
Functionality
databricks labs ucx ...
...
...
Tests