
Introduce more specific exceptions, like NotFound, AlreadyExists, BadRequest, PermissionDenied, InternalError, and others #376

Merged
8 commits merged into main on Nov 13, 2023

Conversation

nfx
Contributor

@nfx nfx commented Oct 3, 2023

Improve the ergonomics of the SDK: instead of `except DatabricksError as err: if err.error_code != 'NOT_FOUND': raise err else: do_stuff()`, we can write `except NotFound: do_stuff()`, as in this example.

Additionally, it makes stack traces easier to read, as they will contain the specific exception class name. Examples of unclear stack traces: databrickslabs/ucx#359, databrickslabs/ucx#353, databrickslabs/ucx#347.

First principles

  • do not override `builtins.NotImplemented` for the `NOT_IMPLEMENTED` error code
  • assume the platform's error_code/HTTP status code mapping is imperfect and in a state of transition
  • represent a reasonable subset of error codes as specific exceptions
  • error_code remains accessible from specific exceptions like NotFound or AlreadyExists
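A minimal sketch of what these principles could look like in code; the class names and constructor signature here are illustrative, not the SDK's actual definitions:

```python
# Specific exceptions subclass the generic base, and error_code stays accessible.
class DatabricksError(IOError):
    def __init__(self, message=None, *, error_code=None):
        super().__init__(message)
        self.error_code = error_code

class NotFound(DatabricksError):
    """404 / *_DOES_NOT_EXIST-style errors."""

class AlreadyExists(DatabricksError):
    """409 / ALREADY_EXISTS-style errors."""

err = NotFound('cluster not found', error_code='RESOURCE_DOES_NOT_EXIST')
assert isinstance(err, DatabricksError)          # old catch clauses keep working
assert err.error_code == 'RESOURCE_DOES_NOT_EXIST'
```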

Proposal

  • hierarchical exceptions that also inherit from Python's built-in exceptions
  • more specific error codes override more generic HTTP status codes
  • more generic HTTP status codes are matched after more specific error codes; there is a default exception class per HTTP status code, and we rely on the Databricks platform exception mapper to do the right thing
  • backward-compatible error creation for cases like using older versions of the SDK with newer releases of the platform
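The proposed precedence can be sketched as a two-level lookup; all names below are illustrative, not the SDK's actual internals:

```python
# Sketch: 1) exact error_code match wins, 2) then the default class per
# HTTP status code, 3) then the generic base class, so an older SDK still
# raises something sensible against a newer platform.
class DatabricksError(IOError): ...
class BadRequest(DatabricksError): ...
class NotFound(DatabricksError): ...
class InternalError(DatabricksError): ...

_ERROR_CODE_MAPPING = {
    'RESOURCE_DOES_NOT_EXIST': NotFound,
    'MALFORMED_REQUEST': BadRequest,
}
_STATUS_CODE_MAPPING = {400: BadRequest, 404: NotFound, 500: InternalError}

def error_mapper(status_code: int, error_code: str, message: str) -> DatabricksError:
    if error_code in _ERROR_CODE_MAPPING:
        return _ERROR_CODE_MAPPING[error_code](message)
    if status_code in _STATUS_CODE_MAPPING:
        return _STATUS_CODE_MAPPING[status_code](message)
    # unknown status and code: fall back to the generic base class
    return DatabricksError(message)
```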


Naming conflict resolution

We have four sources of naming; the proposed order for resolving naming conflicts is:

  1. Databricks error_code, that is surfaced in our API documentation, known by Databricks users
  2. HTTP Status codes, known by some developers
  3. Python builtin exceptions, known by some developers
  4. gRPC error codes https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto#L38-L185, known by some developers

…ists`, `BadRequest`, `PermissionDenied`, `InternalError`, and others

Improve the ergonomics of the SDK: instead of `except DatabricksError as err: if err.error_code != 'NOT_FOUND': raise err else: do_stuff()`, we can write `except NotFound: do_stuff()`. Additionally, it makes stack traces easier to read, as they will contain the specific exception class name.

# First principles
- do not override `builtins.NotImplemented` for the `NOT_IMPLEMENTED` error code
- assume the platform's error_code/HTTP status code mapping is imperfect and in a state of transition
- represent a reasonable subset of error codes as specific exceptions

## Open questions

### HTTP Status Codes vs Error Codes

1. Mix between status codes and error codes (preferred)
2. Rely only on HTTP status codes
3. Rely only on `error_code`'s

One example is the `BAD_REQUEST` error code, which maps to HTTP 400 and to an `except BadRequest as err` catch clause. But `MALFORMED_REQUEST`, `INVALID_STATE`, and `UNPARSEABLE_HTTP_ERROR` also map to HTTP 400. So the proposal is to remap `MALFORMED_REQUEST` to the `BadRequest` exception as well.

Another corner-case is UC:
- 'METASTORE_DOES_NOT_EXIST': NotFound,
- 'DAC_DOES_NOT_EXIST': NotFound,
- 'CATALOG_DOES_NOT_EXIST': NotFound,
- 'SCHEMA_DOES_NOT_EXIST': NotFound,
- 'TABLE_DOES_NOT_EXIST': NotFound,
- 'SHARE_DOES_NOT_EXIST': NotFound,
- 'RECIPIENT_DOES_NOT_EXIST': NotFound,
- 'STORAGE_CREDENTIAL_DOES_NOT_EXIST': NotFound,
- 'EXTERNAL_LOCATION_DOES_NOT_EXIST': NotFound,
- 'PRINCIPAL_DOES_NOT_EXIST': NotFound,
- 'PROVIDER_DOES_NOT_EXIST': NotFound,
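Under the proposal, these UC-specific codes would all collapse onto a single exception class, so one except clause covers them. A sketch with assumed class names:

```python
# All of the UC *_DOES_NOT_EXIST codes map to one NotFound class, so a
# single `except NotFound` handles every variant.
class DatabricksError(IOError): ...
class NotFound(DatabricksError): ...

_UC_NOT_FOUND = [
    'METASTORE_DOES_NOT_EXIST', 'DAC_DOES_NOT_EXIST', 'CATALOG_DOES_NOT_EXIST',
    'SCHEMA_DOES_NOT_EXIST', 'TABLE_DOES_NOT_EXIST', 'SHARE_DOES_NOT_EXIST',
    'RECIPIENT_DOES_NOT_EXIST', 'STORAGE_CREDENTIAL_DOES_NOT_EXIST',
    'EXTERNAL_LOCATION_DOES_NOT_EXIST', 'PRINCIPAL_DOES_NOT_EXIST',
    'PROVIDER_DOES_NOT_EXIST',
]
ERROR_CODE_MAPPING = {code: NotFound for code in _UC_NOT_FOUND}
```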

### Naming conflict resolution

We have three sources of naming:
- `error_code`
- HTTP Status
- Python builtin exceptions

We still have to define which name takes precedence.
@renardeinside

Agree with this statement:

HTTP Status Codes vs Error Codes (preferred)

IMO, error codes should be the main body of the error, since they provide a logical explanation of what happened, while HTTP status codes provide in-depth technical details.

Comparing two options:

except MetastoreDoesNotExist:
    # do something

And:

except NotFound:
    # do something

The first one reads as more DDD-style, which has proven to be good practice.

@nfx
Contributor Author

nfx commented Oct 6, 2023

@renardeinside technically, we can do `class MetastoreDoesNotExist(NotFound): pass`
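A sketch of that subclassing, showing that the broader `except NotFound` handler still catches the narrower class (names illustrative):

```python
# The domain-specific exception is a subclass, so both styles of catch work.
class DatabricksError(IOError): ...
class NotFound(DatabricksError): ...
class MetastoreDoesNotExist(NotFound):
    pass

caught = None
try:
    raise MetastoreDoesNotExist('metastore does not exist')
except NotFound as err:
    caught = err  # the generic handler catches the specific subclass
assert isinstance(caught, MetastoreDoesNotExist)
```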

@nfx
Contributor Author

nfx commented Oct 12, 2023

Some more input: SCIM doesn't have a concept of `AlreadyExists`, only `ResourceConflict`.

12:26 DEBUG [d.s] POST /api/2.0/preview/scim/v2/Groups
> {
>   "displayName": "db-temp-ucx_fY5N",
>   "members": [
>     {
>       "display": "test-user-2",
>       "value": "**REDACTED**"
>     },
>     {
>       "display": "test-user-3",
>       "value": "**REDACTED**"
>     },
>     "... (1 additional elements)"
>   ],
>   "meta": {
>     "resourceType": "WorkspaceGroup"
>   }
> }
< 409 Conflict
< {
<   "detail": "Group with name db-temp-ucx_fY5N already exists.",
<   "schemas": [
<     "urn:ietf:params:scim:api:messages:2.0:Error"
<   ],
<   "status": "409"
< }
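Assuming the SDK maps this SCIM 409 to a `ResourceConflict` exception (the class name does appear in the module listing later in this thread), idempotent group creation could be sketched as follows; `ensure_group` and the mapping are hypothetical:

```python
class DatabricksError(IOError): ...
class ResourceConflict(DatabricksError):
    """Sketch: SCIM 409 Conflict could map here instead of AlreadyExists."""

def ensure_group(create_group, name):
    # Hypothetical helper: treat "already exists" conflicts as success.
    try:
        return create_group(name)
    except ResourceConflict:
        return None
```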

@nfx
Contributor Author

nfx commented Oct 12, 2023

and this is the first time I'm seeing this one:

E               databricks.sdk.core.DatabricksError: The service at /api/2.0/pipelines is temporarily unavailable. Please try again later. [TraceId: 00-b31ec4becd0d5db8df40d3aa6c7b8d66-f9f0e60e28e5ac19-00]

@alexott
Contributor

alexott commented Oct 14, 2023

Yes, it makes sense to have very specific errors and handle them explicitly.

@william-conti

We're in this situation in the UCX project, and it's becoming urgent because of how customers use the tool. Please prioritize this when possible.

nfx added a commit to databricks/databricks-sdk-go that referenced this pull request Nov 7, 2023
…rrNotFound)`, `errors.Is(err, databricks.ErrAlreadyExists)`, `errors.Is(err, databricks.ErrBadRequest)`, `errors.Is(err, databricks.ErrPermissionDenied)`, `errors.Is(err, databricks.ErrInternal)`, and others

This PR enables ergonomic error handling for all Databricks API Errors, following the same principles as databricks/databricks-sdk-py#376
@nfx nfx changed the title (RFC) Introduce more specific exceptions, like NotFound, AlreadyExists, BadRequest, PermissionDenied, InternalError, and others Introduce more specific exceptions, like NotFound, AlreadyExists, BadRequest, PermissionDenied, InternalError, and others Nov 10, 2023
@nfx nfx removed the discussion label Nov 10, 2023
@nfx
Contributor Author

nfx commented Nov 10, 2023

Now this PR is generated from the metadata in databricks/databricks-sdk-go#682

Contributor

@mgyucht mgyucht left a comment


Nice, thanks! Couple small suggestions, otherwise I think we're good.


from .base import DatabricksError

__all__ = ['error_mapper'{{range .ExceptionTypes}}, '{{.PascalName}}'{{end}}]
Contributor


Is this necessary? These are also the only exported members of this module. I also wasn't aware that this variable was respected in modules that also define an __init__.py

Contributor Author


this is how python works ;)

Contributor


I just cloned this PR and removed this line and was still able to import these:

from databricks.sdk import errors

print(dir(errors))
['Aborted', 'AlreadyExists', 'BadRequest', 'Cancelled', 'DataLoss', 'DatabricksError', 'DeadlineExceeded', 'ErrorDetail', 'InternalError', 'InvalidParameterValue', 'NotFound', 'NotImplemented', 'OperationFailed', 'OperationTimeout', 'PermissionDenied', 'RequestLimitExceeded', 'ResourceAlreadyExists', 'ResourceConflict', 'ResourceExhausted', 'TemporarilyUnavailable', 'TooManyRequests', 'Unauthenticated', 'Unknown', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'base', 'error_mapper', 'mapper', 'mapping', 'sdk']

If you want to control what is exported, I think you need to include this in the __init__.py file explicitly.
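A quick runnable check of what `__all__` actually controls: star-imports honor it, while `dir()` and explicit imports do not (the `demo` module here is synthetic, built just for the demonstration):

```python
import sys
import types

# Build a throwaway module with __all__ = ['a'] but two top-level names.
mod = types.ModuleType('demo')
exec("__all__ = ['a']\na = 1\nb = 2", mod.__dict__)
sys.modules['demo'] = mod

ns = {}
exec('from demo import *', ns)
assert 'a' in ns and 'b' not in ns   # star-import respects __all__
assert 'b' in dir(mod)               # dir() still shows everything
```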

nfx added a commit to databricks/databricks-sdk-java that referenced this pull request Nov 10, 2023
… `BadRequest`, `PermissionDenied`, `InternalError`, and others

See implementations in other SDKs:

- Go: databricks/databricks-sdk-go#682
- Python: databricks/databricks-sdk-py#376
@nfx nfx requested a review from mgyucht November 10, 2023 16:18
@nfx nfx enabled auto-merge November 10, 2023 16:38
github-merge-queue bot pushed a commit to databricks/databricks-sdk-go that referenced this pull request Nov 13, 2023
…rrNotFound)`, `errors.Is(err, databricks.ErrAlreadyExists)`, `errors.Is(err, databricks.ErrBadRequest)`, `errors.Is(err, databricks.ErrPermissionDenied)`, `errors.Is(err, databricks.ErrInternal)`, and others (#682)

## Changes
This PR enables ergonomic error handling for all Databricks API Errors,
following the same principles as
databricks/databricks-sdk-py#376

## Tests

- [x] `make test` passing
- [x] `make fmt` applied
- [ ] relevant integration tests applied
Contributor

@mgyucht mgyucht left a comment


This LGTM. Thanks for working with us on this!

@nfx nfx added this pull request to the merge queue Nov 13, 2023
Merged via the queue into main with commit 93a622d Nov 13, 2023
7 checks passed
@nfx nfx deleted the ergonomics/errors branch November 13, 2023 13:45
mgyucht added a commit that referenced this pull request Nov 14, 2023
* Introduce more specific exceptions, like `NotFound`, `AlreadyExists`, `BadRequest`, `PermissionDenied`, `InternalError`, and others ([#376](#376)). This makes it easier to handle errors thrown by the Databricks API. Instead of catching `DatabricksError` and checking the error_code field, you can catch one of these subtypes of `DatabricksError`, which is more ergonomic and removes the need to rethrow exceptions that you don't want to catch. For example:
```python
try:
  return (self._ws
    .permissions
    .get(object_type, object_id))
except DatabricksError as e:
  if e.error_code in [
    "RESOURCE_DOES_NOT_EXIST",
    "RESOURCE_NOT_FOUND",
    "PERMISSION_DENIED",
    "FEATURE_DISABLED",
    "BAD_REQUEST"]:
    logger.warning(...)
    return None
  raise RetryableError(...) from e
```
can be replaced with
```python
try:
  return (self._ws
    .permissions
    .get(object_type, object_id))
except (PermissionDenied, FeatureDisabled):
  logger.warning(...)
  return None
except NotFound:
  raise RetryableError(...)
```
* Paginate all SCIM list requests in the SDK ([#440](#440)). This change ensures that SCIM list() APIs use a default limit of 100 resources, leveraging SCIM's offset + limit pagination to batch requests to the Databricks API.
* Added taskValues support in remoteDbUtils ([#406](#406)).
* Added more detailed error message on default credentials not found error ([#419](#419)).
* Request management token via Azure CLI only for Service Principals and not human users ([#408](#408)).

API Changes:

 * Fixed `create()` method for [w.functions](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/functions.html) workspace-level service and corresponding `databricks.sdk.service.catalog.CreateFunction` and `databricks.sdk.service.catalog.FunctionInfo` dataclasses.
 * Changed `create()` method for [w.metastores](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/metastores.html) workspace-level service with new required argument order.
 * Changed `storage_root` field for `databricks.sdk.service.catalog.CreateMetastore` to be optional.
 * Added `skip_validation` field for `databricks.sdk.service.catalog.UpdateExternalLocation`.
 * Added `libraries` field for `databricks.sdk.service.compute.CreatePolicy`, `databricks.sdk.service.compute.EditPolicy` and `databricks.sdk.service.compute.Policy`.
 * Added `init_scripts` field for `databricks.sdk.service.compute.EventDetails`.
 * Added `file` field for `databricks.sdk.service.compute.InitScriptInfo`.
 * Added `zone_id` field for `databricks.sdk.service.compute.InstancePoolGcpAttributes`.
 * Added several dataclasses related to init scripts.
 * Added `databricks.sdk.service.compute.LocalFileInfo` dataclass.
 * Replaced `ui_state` field with `edit_mode` for `databricks.sdk.service.jobs.CreateJob` and `databricks.sdk.service.jobs.JobSettings`.
 * Replaced `databricks.sdk.service.jobs.CreateJobUiState` dataclass with `databricks.sdk.service.jobs.CreateJobEditMode`.
 * Added `include_resolved_values` field for `databricks.sdk.service.jobs.GetRunRequest`.
 * Replaced `databricks.sdk.service.jobs.JobSettingsUiState` dataclass with `databricks.sdk.service.jobs.JobSettingsEditMode`.
 * Removed [a.o_auth_enrollment](https://databricks-sdk-py.readthedocs.io/en/latest/account/o_auth_enrollment.html) account-level service. This was only used to aid in OAuth enablement during the public preview of OAuth. OAuth is now enabled for all AWS E2 accounts, so usage of this API is no longer needed.
 * Added `network_connectivity_config_id` field for `databricks.sdk.service.provisioning.UpdateWorkspaceRequest`.
 * Added [a.network_connectivity](https://databricks-sdk-py.readthedocs.io/en/latest/account/network_connectivity.html) account-level service.
 * Added `string_shared_as` field for `databricks.sdk.service.sharing.SharedDataObject`.

Internal changes:

* Added regression question to issue template ([#414](#414)).
* Made test_auth no longer fail if you have a default profile setup ([#426](#426)).

OpenAPI SHA: d136ad0541f036372601bad9a4382db06c3c912d, Date: 2023-11-14
@mgyucht mgyucht mentioned this pull request Nov 14, 2023
github-merge-queue bot pushed a commit that referenced this pull request Nov 14, 2023
github-merge-queue bot pushed a commit to databricks/databricks-sdk-java that referenced this pull request Apr 4, 2024
… `BadRequest`, `PermissionDenied`, `InternalError`, and others (#185)

See implementations in other SDKs:

- Go: databricks/databricks-sdk-go#682
- Python: databricks/databricks-sdk-py#376

---------

Co-authored-by: Miles Yucht <miles@databricks.com>
Co-authored-by: Tanmay Rustagi <tanmay.rustagi@databricks.com>
vikrantpuppala pushed a commit to vikrantpuppala/databricks-sdk-java that referenced this pull request Apr 23, 2024
… `BadRequest`, `PermissionDenied`, `InternalError`, and others (databricks#185)

See implementations in other SDKs:

- Go: databricks/databricks-sdk-go#682
- Python: databricks/databricks-sdk-py#376

---------

Co-authored-by: Miles Yucht <miles@databricks.com>
Co-authored-by: Tanmay Rustagi <tanmay.rustagi@databricks.com>
Labels: api client, ergonomics