crawl_permissions run fails #359

Closed
mwojtyczka opened this issue Oct 3, 2023 · 6 comments
Labels
bug (Something isn't working), feat/crawler, migrate/groups (Corresponds to Migrate Groups Step of go/uc/upgrade), step/assessment (go/uc/upgrade - Assessment Step)

Comments

@mwojtyczka
Contributor

Crawl permissions task fails with the following error:

`DatabricksError: java.io.IOException: java.util.concurrent.TimeoutException: Timed out after 5 seconds

DatabricksError Traceback (most recent call last)
File ~/.ipykernel/1209/command--1-1024872113:18
15 entry = [ep for ep in metadata.distribution("databricks_labs_ucx").entry_points if ep.name == "runtime"]
16 if entry:
17 # Load and execute the entrypoint, assumes no parameters
---> 18 entry[0].load()()
19 else:
20 import databricks_labs_ucx

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/runtime.py:213, in main()
212 def main():
--> 213 trigger(*sys.argv)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/framework/tasks.py:93, in trigger(*argv)
90 cfg = WorkspaceConfig.from_file(Path(args["config"]))
91 logging.getLogger("databricks").setLevel(cfg.log_level)
---> 93 current_task.fn(cfg)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/runtime.py:137, in crawl_permissions(cfg)
135 toolkit = GroupMigrationToolkit(cfg)
136 toolkit.cleanup_inventory_table()
--> 137 toolkit.inventorize_permissions()

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/migration.py:124, in GroupMigrationToolkit.inventorize_permissions(self)
123 def inventorize_permissions(self):
--> 124 self._permissions_manager.inventorize_permissions()

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/manager.py:26, in PermissionManager.inventorize_permissions(self)
24 crawler_tasks = list(self._get_crawler_tasks())
25 logger.info(f"Starting to crawl permissions. Total tasks: {len(crawler_tasks)}")
---> 26 results = ThreadedExecution.gather("crawl permissions", crawler_tasks)
27 items = []
28 for item in results:

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/framework/parallel.py:48, in ThreadedExecution.gather(cls, name, tasks)
45 @classmethod
46 def gather(cls, name: str, tasks: list[ExecutableFunction]) -> list[ExecutableResult]:
47 reporter = ProgressReporter(len(tasks), f"{name}: ")
---> 48 return cls(tasks, num_threads=4, progress_reporter=reporter).run()

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/framework/parallel.py:63, in ThreadedExecution.run(self)
60 results = concurrent.futures.wait(self._futures, return_when=ALL_COMPLETED)
62 logger.debug("Collecting the results from threaded execution")
---> 63 collected = [future.result() for future in results.done]
64 return collected

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/framework/parallel.py:63, in <listcomp>(.0)
60 results = concurrent.futures.wait(self._futures, return_when=ALL_COMPLETED)
62 logger.debug("Collecting the results from threaded execution")
---> 63 collected = [future.result() for future in results.done]
64 return collected

File /usr/lib/python3.10/concurrent/futures/_base.py:451, in Future.result(self, timeout)
449 raise CancelledError()
450 elif self._state == FINISHED:
--> 451 return self.__get_result()
453 self._condition.wait(timeout)
455 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:

File /usr/lib/python3.10/concurrent/futures/_base.py:403, in Future.__get_result(self)
401 if self._exception:
402 try:
--> 403 raise self._exception
404 finally:
405 # Break a reference cycle with the exception in self._exception
406 self = None

File /usr/lib/python3.10/concurrent/futures/thread.py:58, in _WorkItem.run(self)
55 return
57 try:
---> 58 result = self.fn(*self.args, **self.kwargs)
59 except BaseException as exc:
60 self.future.set_exception(exc)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/mixins/hardening.py:57, in rate_limited.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
54 @wraps(func)
55 def wrapper(*args, **kwargs):
56 rate_limiter.throttle()
---> 57 return func(*args, **kwargs)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/generic.py:62, in GenericPermissionsSupport._crawler_task(self, object_type, object_id)
60 @rate_limited(max_requests=100)
61 def _crawler_task(self, object_type: str, object_id: str) -> Permissions | None:
---> 62 permissions = self._safe_get_permissions(object_type, object_id)
63 if not permissions:
64 return None

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/generic.py:84, in GenericPermissionsSupport._safe_get_permissions(self, object_type, object_id)
82 return None
83 else:
---> 84 raise e

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/generic.py:73, in GenericPermissionsSupport._safe_get_permissions(self, object_type, object_id)
71 def _safe_get_permissions(self, object_type: str, object_id: str) -> iam.ObjectPermissions | None:
72 try:
---> 73 return self._ws.permissions.get(object_type, object_id)
74 except DatabricksError as e:
75 if e.error_code in [
76 "RESOURCE_DOES_NOT_EXIST",
77 "RESOURCE_NOT_FOUND",
78 "PERMISSION_DENIED",
79 "FEATURE_DISABLED",
80 ]:

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/sdk/service/iam.py:1919, in PermissionsAPI.get(self, request_object_type, request_object_id)
1906 """Get object permissions.
1907
1908 Gets the permissions of an object. Objects can inherit permissions from their parent objects or root
(...)
1915 :returns: :class:ObjectPermissions
1916 """
1918 headers = {'Accept': 'application/json', }
-> 1919 res = self._api.do('GET',
1920 f'/api/2.0/permissions/{request_object_type}/{request_object_id}',
1921 headers=headers)
1922 return ObjectPermissions.from_dict(res)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/sdk/core.py:1084, in ApiClient.do(self, method, path, query, headers, body, raw, files, data)
1080 if not response.ok:
1081 # TODO: experiment with traceback pruning for better readability
1082 # See https://stackoverflow.com/a/58821552/277035
1083 payload = response.json()
-> 1084 raise self._make_nicer_error(status_code=response.status_code, **payload) from None
1085 if raw:
1086 return StreamingResponse(response)

DatabricksError: java.io.IOException: java.util.concurrent.TimeoutException: Timed out after 5 seconds`

last few lines of log output:
21:36 WARN [d.l.ucx.workspace_access.generic] Could not get permissions for directories 4342757430346325 due to PERMISSION_DENIED
21:36 WARN [d.l.ucx.workspace_access.generic] Could not get permissions for directories 4341706793123026 due to PERMISSION_DENIED
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12860/13030, rps: 9.603/sec
21:36 WARN [d.l.ucx.workspace_access.generic] Could not get permissions for directories 4420770739346988 due to PERMISSION_DENIED
21:36 WARN [d.l.ucx.workspace_access.generic] Could not get permissions for directories 4481203634490252 due to PERMISSION_DENIED
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12870/13030, rps: 9.604/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12880/13030, rps: 9.602/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12890/13030, rps: 9.603/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12900/13030, rps: 9.602/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12910/13030, rps: 9.603/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12920/13030, rps: 9.601/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12930/13030, rps: 9.602/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12940/13030, rps: 9.600/sec
21:36 WARN [d.l.ucx.workspace_access.generic] Could not get permissions for authorization passwords due to FEATURE_DISABLED
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12950/13030, rps: 9.601/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12960/13030, rps: 9.600/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12970/13030, rps: 9.603/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12980/13030, rps: 9.606/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 12990/13030, rps: 9.610/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 13000/13030, rps: 9.613/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 13010/13030, rps: 9.615/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 13020/13030, rps: 9.619/sec
21:36 INFO [d.l.ucx.framework.parallel] crawl permissions: 13030/13030, rps: 9.624/sec

@nfx added the bug, feat/crawler, migrate/groups, and step/assessment labels Oct 3, 2023
@nfx added this to the 1 week milestone Oct 3, 2023
@nfx added this to UCX (weekly) Oct 3, 2023
@nfx changed the title from "Crew permissins run fails" to "crawl_permissions run fails" Oct 3, 2023
@larsgeorge-db
Contributor

This will be addressed by an upcoming change in the SDK.

@mwojtyczka
Contributor Author

mwojtyczka commented Oct 4, 2023

@larsgeorge-db could you please link the relevant issues from the SDK?
@nfx Do we risk losing permissions after the migration because of this?

@mwojtyczka
Contributor Author

The SDK (v0.10.0) is expected to retry on HTTP 429 based on the `Retry-After` header, but the header apparently is not propagated. I created an ES ticket.
For now, this is fixed by adding a retry.
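
For anyone reading along, a retry around the failing call could look roughly like the sketch below. This is not the code that was merged; `TransientError` and `call_with_retries` are hypothetical names standing in for whatever error and helper the real fix uses, and the backoff values are arbitrary.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a timeout or HTTP 429 style failure."""


def call_with_retries(fn, *, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is only there so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.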

@FastLee
Contributor

FastLee commented Oct 4, 2023

How should we handle the "permission denied" error? Should we fail or continue the crawl?

github-merge-queue bot pushed a commit that referenced this issue Oct 4, 2023
…n errors (#375)

Fixed deletion of backup groups [issue #374].
Added rate limits and retries to group operations [issue #353].
Temp fix for issue #359
Added log messages for better visibility.
Added useful troubleshooting snippets to the docs.
@mwojtyczka
Contributor Author

mwojtyczka commented Oct 5, 2023

If you are a workspace admin, you should not get "permission denied". Still, it does occur for some directories. I'm not sure there is anything we can do about it other than log it and continue, which is what we do currently.
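
The log-and-continue behaviour can be sketched as follows. This is a simplified illustration rather than the ucx code: `ApiError`, `crawl`, and `get_permissions` are hypothetical names, and the ignorable error codes are the ones visible in the traceback in this issue.

```python
import logging

logger = logging.getLogger(__name__)

# Error codes that are logged and skipped instead of failing the crawl.
IGNORABLE = {
    "RESOURCE_DOES_NOT_EXIST",
    "RESOURCE_NOT_FOUND",
    "PERMISSION_DENIED",
    "FEATURE_DISABLED",
}


class ApiError(Exception):
    """Hypothetical error type carrying a platform error_code."""

    def __init__(self, message, error_code):
        super().__init__(message)
        self.error_code = error_code


def crawl(object_ids, get_permissions):
    """Collect permissions, skipping objects that fail with an ignorable code."""
    results = []
    for object_id in object_ids:
        try:
            results.append(get_permissions(object_id))
        except ApiError as e:
            if e.error_code not in IGNORABLE:
                raise  # unexpected failures still abort the crawl
            logger.warning("Could not get permissions for %s due to %s", object_id, e.error_code)
    return results
```

Any error code outside the set (a timeout, for example) is re-raised, which matches what the traceback above shows.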

@nfx self-assigned this Oct 6, 2023
@nfx moved this from Todo to In Progress in UCX (weekly) Oct 6, 2023
zpappa pushed a commit that referenced this issue Oct 6, 2023
…n errors (#375)
@pohlposition
Contributor

Fixed by SDK update

@github-project-automation bot moved this from In Progress to Done in UCX (weekly) Oct 9, 2023
github-merge-queue bot pushed a commit to databricks/databricks-sdk-py that referenced this issue Nov 13, 2023
… `BadRequest`, `PermissionDenied`, `InternalError`, and others (#376)

Improve the ergonomics of the SDK: instead of `except DatabricksError as err: if err.error_code != 'NOT_FOUND': raise err else: do_stuff()`, we
could write `except NotFound: do_stuff()`, like in [this
example](https://github.com/databrickslabs/ucx/blob/main/src/databricks/labs/ucx/workspace_access/generic.py#L71-L84).

Additionally, it'll make stack traces easier to read, as they will
contain the specific exception class name. Examples of unclear stack traces
are databrickslabs/ucx#359, databrickslabs/ucx#353, and
databrickslabs/ucx#347.

# First principles
- ~~do not override `builtins.NotImplemented` for `NOT_IMPLEMENTED`
error code~~
- assume that the platform's error_code/HTTP status code mapping is not
perfect and is in a state of transition
- we represent a reasonable subset of error codes as specific
exceptions
- it's still possible to access `error_code` from specific exceptions
like `NotFound` or `AlreadyExists`.

# Proposal
- have hierarchical exceptions, also inheriting from Python's built-in
exceptions
- more specific error codes override more generic HTTP status codes
- more generic HTTP status codes are matched after more specific error
codes, where there's a default exception class per HTTP status code, and
we rely on the Databricks platform exception mapper to do the right
thing.
- have backward-compatible error creation for cases like using older
versions of the SDK with newer releases of the platform.
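
A rough sketch of the proposal above, under stated assumptions: every class and function here is an illustrative stand-in defined in the snippet itself (nothing is imported from the SDK), and the two mapping tables hold only a few example entries.

```python
class DatabricksError(IOError):
    """Stand-in base error that keeps error_code accessible on every subclass."""

    def __init__(self, message, *, error_code=None, status_code=None):
        super().__init__(message)
        self.error_code = error_code
        self.status_code = status_code


# Hierarchical exceptions that also inherit from Python's built-in ones.
class NotFound(DatabricksError, LookupError):
    pass


class PermissionDenied(DatabricksError, PermissionError):
    pass


class TooManyRequests(DatabricksError):
    pass


# Example entries only; the real tables would be much larger.
ERROR_CODE_MAPPING = {
    "RESOURCE_DOES_NOT_EXIST": NotFound,
    "NOT_FOUND": NotFound,
    "PERMISSION_DENIED": PermissionDenied,
}
STATUS_CODE_MAPPING = {403: PermissionDenied, 404: NotFound, 429: TooManyRequests}


def error_from(message, *, status_code, error_code=None):
    """Pick the exception class: specific error_code first, then HTTP status."""
    cls = ERROR_CODE_MAPPING.get(error_code) or STATUS_CODE_MAPPING.get(status_code, DatabricksError)
    return cls(message, error_code=error_code, status_code=status_code)
```

With this shape, a caller can write `except NotFound:` while older code that inspects `err.error_code` on a caught `DatabricksError` keeps working, and unknown codes fall back to the base class.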


![image](https://github.com/databricks/databricks-sdk-py/assets/259697/a4519f76-0778-468c-9bf5-6133984b5af7)

### Naming conflict resolution

We have four sources of naming and this is a proposed order of naming
conflict resolution:
1. Databricks `error_code`, that is surfaced in our API documentation,
known by Databricks users
2. HTTP Status codes, known by some developers
3. Python builtin exceptions, known by some developers
4. gRPC error codes
https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto#L38-L185,
known by some developers

---------

Co-authored-by: Miles Yucht <miles@databricks.com>