`crawl_permissions` run fails #359
Comments
This will be addressed by an upcoming change in the SDK.
@larsgeorge-db could you please link the relevant issues from the SDK?
The SDK (v0.10.0) is expected to retry on HTTP 429 based on the `Retry-After` header, but apparently the header is not propagated. Created an ES ticket.
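To make the expected retry behavior concrete, here is a minimal sketch of a retry loop that honors `Retry-After` on HTTP 429 and falls back to a fixed delay when the header is missing (the case reported above). It is not the SDK's implementation: `FakeResponse`, `call_with_retry`, and all parameter names are hypothetical and exist only for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response object."""
    status_code: int
    headers: dict = field(default_factory=dict)


def call_with_retry(send, max_attempts=5, default_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.status_code != 429:
            return response
        if attempt == max_attempts:
            break
        # The header may be absent (not propagated), so fall back to a default.
        raw = response.headers.get("Retry-After")
        try:
            delay = float(raw) if raw is not None else default_delay
        except ValueError:
            # Retry-After may also be an HTTP-date; this sketch does not parse it.
            delay = default_delay
        sleep(delay)
    raise TimeoutError(f"still rate limited after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.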
How do we address the "permission denied" error? Should we fail or continue the crawl?
If you are a workspace admin, you should not get "permission denied". Still, it does occur for some directories. I'm not sure there is anything we can do about it other than log it and continue, which is what we do currently.
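The "log it and continue" behavior can be sketched roughly as follows. The list of skippable error codes is the one visible in the traceback of this issue; `ApiError` is a hypothetical stand-in for the SDK's `DatabricksError`, and `safe_get_permissions` and `fetch` are illustrative names, not the project's actual code.

```python
import logging

logger = logging.getLogger(__name__)

# Error codes treated as skippable, as listed in the traceback in this issue.
SKIPPABLE = {
    "RESOURCE_DOES_NOT_EXIST",
    "RESOURCE_NOT_FOUND",
    "PERMISSION_DENIED",
    "FEATURE_DISABLED",
}


class ApiError(Exception):
    """Hypothetical stand-in for the SDK's DatabricksError."""

    def __init__(self, message, error_code):
        super().__init__(message)
        self.error_code = error_code


def safe_get_permissions(fetch, object_type, object_id):
    """Return permissions, or None when the error is one we choose to skip."""
    try:
        return fetch(object_type, object_id)
    except ApiError as e:
        if e.error_code in SKIPPABLE:
            logger.warning(
                "Could not get permissions for %s %s due to %s",
                object_type, object_id, e.error_code,
            )
            return None
        # Anything else (for example a timeout) still fails the task.
        raise
```

Under this scheme a timeout such as the one reported in this issue is not in the skip list, so it propagates and fails the run.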
Fixed by SDK update.
… `BadRequest`, `PermissionDenied`, `InternalError`, and others (#376)

Improve the ergonomics of the SDK: instead of `except DatabricksError as err: if err.error_code != 'NOT_FOUND': raise err else: do_stuff()` we could write `except NotFound: do_stuff()`, like in [this example](https://github.com/databrickslabs/ucx/blob/main/src/databricks/labs/ucx/workspace_access/generic.py#L71-L84). Additionally, it will make stack traces easier to read, as they will contain the specific exception class name. Examples of unclear stack traces are: databrickslabs/ucx#359, databrickslabs/ucx#353, databrickslabs/ucx#347.

# First principles

- ~~do not override `builtins.NotImplemented` for `NOT_IMPLEMENTED` error code~~
- assume that the platform's error_code/HTTP status code mapping is not perfect and is in a state of transition
- we do represent a reasonable subset of error codes as specific exceptions
- it's still possible to access `error_code` from specific exceptions like `NotFound` or `AlreadyExists`.

# Proposal

- have hierarchical exceptions, also inheriting from Python's built-in exceptions
- more specific error codes override more generic HTTP status codes
- more generic HTTP status codes are matched after more specific error codes, where there's a default exception class per HTTP status code, and we do rely on the Databricks platform exception mapper to do the right thing.
- have backward-compatible error creation for cases like using older versions of the SDK against newer releases of the platform.

![image](https://github.com/databricks/databricks-sdk-py/assets/259697/a4519f76-0778-468c-9bf5-6133984b5af7)

### Naming conflict resolution

We have four sources of naming and this is a proposed order of naming conflict resolution:

1. Databricks `error_code`, which is surfaced in our API documentation and known by Databricks users
2. HTTP status codes, known by some developers
3. Python built-in exceptions, known by some developers
4. gRPC error codes (https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto#L38-L185), known by some developers

---------

Co-authored-by: Miles Yucht <miles@databricks.com>
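The proposal above can be illustrated with a minimal sketch: specific exception classes that also inherit from Python built-ins, a lookup by `error_code` first, then by HTTP status code, and a generic fallback for unknown codes. This is an illustration under assumptions, not the SDK's actual code: `ResourceDoesNotExist`, the two mapping dictionaries, `make_error`, and the exact mapping contents are hypothetical.

```python
class DatabricksError(IOError):
    """Generic error; specific subclasses still expose error_code."""

    def __init__(self, message="", *, error_code=None, status_code=None):
        super().__init__(message)
        self.error_code = error_code
        self.status_code = status_code


class BadRequest(DatabricksError, ValueError):
    """Assumed default for HTTP 400."""


class PermissionDenied(DatabricksError, PermissionError):
    """Assumed default for HTTP 403."""


class NotFound(DatabricksError, LookupError):
    """Assumed default for HTTP 404."""


class ResourceDoesNotExist(NotFound):
    """Hypothetical, more specific variant of NotFound."""


# More specific error codes are consulted first ...
ERROR_CODE_MAPPING = {
    "RESOURCE_DOES_NOT_EXIST": ResourceDoesNotExist,
    "NOT_FOUND": NotFound,
    "PERMISSION_DENIED": PermissionDenied,
    "BAD_REQUEST": BadRequest,
}

# ... then the more generic HTTP status codes.
STATUS_CODE_MAPPING = {400: BadRequest, 403: PermissionDenied, 404: NotFound}


def make_error(status_code, error_code=None, message=""):
    """Pick the most specific class; unknown codes stay a DatabricksError."""
    cls = (
        ERROR_CODE_MAPPING.get(error_code)
        or STATUS_CODE_MAPPING.get(status_code)
        or DatabricksError
    )
    return cls(message, error_code=error_code, status_code=status_code)
```

With such a hierarchy, callers can write `except NotFound:` while older code that catches `DatabricksError` and inspects `error_code` keeps working, and error codes the client does not know yet still produce a usable exception.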
Crawl permissions task fails with the following error:
`DatabricksError: java.io.IOException: java.util.concurrent.TimeoutException: Timed out after 5 seconds
DatabricksError Traceback (most recent call last)
File ~/.ipykernel/1209/command--1-1024872113:18
15 entry = [ep for ep in metadata.distribution("databricks_labs_ucx").entry_points if ep.name == "runtime"]
16 if entry:
17 # Load and execute the entrypoint, assumes no parameters
---> 18 entry[0].load()()
19 else:
20 import databricks_labs_ucx
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/runtime.py:213, in main()
212 def main():
--> 213 trigger(*sys.argv)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/framework/tasks.py:93, in trigger(*argv)
90 cfg = WorkspaceConfig.from_file(Path(args["config"]))
91 logging.getLogger("databricks").setLevel(cfg.log_level)
---> 93 current_task.fn(cfg)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/runtime.py:137, in crawl_permissions(cfg)
135 toolkit = GroupMigrationToolkit(cfg)
136 toolkit.cleanup_inventory_table()
--> 137 toolkit.inventorize_permissions()
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/migration.py:124, in GroupMigrationToolkit.inventorize_permissions(self)
123 def inventorize_permissions(self):
--> 124 self._permissions_manager.inventorize_permissions()
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/manager.py:26, in PermissionManager.inventorize_permissions(self)
24 crawler_tasks = list(self._get_crawler_tasks())
25 logger.info(f"Starting to crawl permissions. Total tasks: {len(crawler_tasks)}")
---> 26 results = ThreadedExecution.gather("crawl permissions", crawler_tasks)
27 items = []
28 for item in results:
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/framework/parallel.py:48, in ThreadedExecution.gather(cls, name, tasks)
45 @classmethod
46 def gather(cls, name: str, tasks: list[ExecutableFunction]) -> list[ExecutableResult]:
47 reporter = ProgressReporter(len(tasks), f"{name}: ")
---> 48 return cls(tasks, num_threads=4, progress_reporter=reporter).run()
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/framework/parallel.py:63, in ThreadedExecution.run(self)
60 results = concurrent.futures.wait(self._futures, return_when=ALL_COMPLETED)
62 logger.debug("Collecting the results from threaded execution")
---> 63 collected = [future.result() for future in results.done]
64 return collected
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/framework/parallel.py:63, in <listcomp>(.0)
60 results = concurrent.futures.wait(self._futures, return_when=ALL_COMPLETED)
62 logger.debug("Collecting the results from threaded execution")
---> 63 collected = [future.result() for future in results.done]
64 return collected
File /usr/lib/python3.10/concurrent/futures/_base.py:451, in Future.result(self, timeout)
449 raise CancelledError()
450 elif self._state == FINISHED:
--> 451 return self.__get_result()
453 self._condition.wait(timeout)
455 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:
File /usr/lib/python3.10/concurrent/futures/_base.py:403, in Future.__get_result(self)
401 if self._exception:
402 try:
--> 403 raise self._exception
404 finally:
405 # Break a reference cycle with the exception in self._exception
406 self = None
File /usr/lib/python3.10/concurrent/futures/thread.py:58, in _WorkItem.run(self)
55 return
57 try:
---> 58 result = self.fn(*self.args, **self.kwargs)
59 except BaseException as exc:
60 self.future.set_exception(exc)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/mixins/hardening.py:57, in rate_limited.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
54 @wraps(func)
55 def wrapper(*args, **kwargs):
56 rate_limiter.throttle()
---> 57 return func(*args, **kwargs)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/generic.py:62, in GenericPermissionsSupport._crawler_task(self, object_type, object_id)
60 @rate_limited(max_requests=100)
61 def _crawler_task(self, object_type: str, object_id: str) -> Permissions | None:
---> 62 permissions = self._safe_get_permissions(object_type, object_id)
63 if not permissions:
64 return None
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/generic.py:84, in GenericPermissionsSupport._safe_get_permissions(self, object_type, object_id)
82 return None
83 else:
---> 84 raise e
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/generic.py:73, in GenericPermissionsSupport._safe_get_permissions(self, object_type, object_id)
71 def _safe_get_permissions(self, object_type: str, object_id: str) -> iam.ObjectPermissions | None:
72 try:
---> 73 return self._ws.permissions.get(object_type, object_id)
74 except DatabricksError as e:
75 if e.error_code in [
76 "RESOURCE_DOES_NOT_EXIST",
77 "RESOURCE_NOT_FOUND",
78 "PERMISSION_DENIED",
79 "FEATURE_DISABLED",
80 ]:
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/sdk/service/iam.py:1919, in PermissionsAPI.get(self, request_object_type, request_object_id)
1906 """Get object permissions.
1907
1908 Gets the permissions of an object. Objects can inherit permissions from their parent objects or root
(...)
1915 :returns: :class:
ObjectPermissions
1916 """
1918 headers = {'Accept': 'application/json', }
-> 1919 res = self._api.do('GET',
1920 f'/api/2.0/permissions/{request_object_type}/{request_object_id}',
1921 headers=headers)
1922 return ObjectPermissions.from_dict(res)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/sdk/core.py:1084, in ApiClient.do(self, method, path, query, headers, body, raw, files, data)
1080 if not response.ok:
1081 # TODO: experiment with traceback pruning for better readability
1082 # See https://stackoverflow.com/a/58821552/277035
1083 payload = response.json()
-> 1084 raise self._make_nicer_error(status_code=response.status_code, **payload) from None
1085 if raw:
1086 return StreamingResponse(response)
DatabricksError: java.io.IOException: java.util.concurrent.TimeoutException: Timed out after 5 seconds`
Last few lines of log output:
21:36 WARN [d.l.ucx.workspace_access.generic] Could not get permissions for directories 4342757430346325 due to PERMISSION_DENIED
21:36 WARN [d.l.ucx.workspace_access.generic] Could not get permissions for directories 4341706793123026 due to PERMISSION_DENIED
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12860/13030, rps: 9.603/sec
21:36 WARN [d.l.ucx.workspace_access.generic] Could not get permissions for directories 4420770739346988 due to PERMISSION_DENIED
21:36 WARN [d.l.ucx.workspace_access.generic] Could not get permissions for directories 4481203634490252 due to PERMISSION_DENIED
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12870/13030, rps: 9.604/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12880/13030, rps: 9.602/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12890/13030, rps: 9.603/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12900/13030, rps: 9.602/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12910/13030, rps: 9.603/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12920/13030, rps: 9.601/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12930/13030, rps: 9.602/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12940/13030, rps: 9.600/sec
21:36 WARN [d.l.ucx.workspace_access.generic] Could not get permissions for authorization passwords due to FEATURE_DISABLED
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12950/13030, rps: 9.601/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12960/13030, rps: 9.600/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12970/13030, rps: 9.603/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12980/13030, rps: 9.606/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12990/13030, rps: 9.610/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 13000/13030, rps: 9.613/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 13010/13030, rps: 9.615/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 13020/13030, rps: 9.619/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 13030/13030, rps: 9.624/sec