[SPARK-46810][DOCS] Align error class terminology with SQL standard #44902

Closed
100 changes: 68 additions & 32 deletions common/utils/src/main/resources/error/README.md
@@ -1,66 +1,102 @@
# Guidelines
# Guidelines for Throwing User-Facing Errors

To throw a standardized user-facing error or exception, developers should specify the error class, a SQLSTATE,
and message parameters rather than an arbitrary error message.

## Terminology

Though we will mainly talk about "error classes" in front of users, user-facing errors have many parts.

The hierarchy is as follows:
1. Error category
2. Error sub-category
3. Error state / SQLSTATE
4. Error class
5. Error sub-class

The 5-character error state is simply the concatenation of the 2-character category with the 3-character sub-category.

Here is an example:
* Error category: `42` - "Syntax Error or Access Rule Violation"
* Error sub-category: `K01`
* Error state / SQLSTATE: `42K01` - "data type not fully specified"
* Error class: `INCOMPLETE_TYPE_DEFINITION`
* Error sub-class: `ARRAY`
* Error sub-class: `MAP`
* Error sub-class: `STRUCT`
* Error class: `DATATYPE_MISSING_SIZE`
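
As a minimal sketch of the concatenation rule above, the following splits a SQLSTATE into its category and sub-category. `SqlStateParts` and `splitSqlState` are hypothetical names used only for illustration; they are not part of Spark.

```scala
// Hypothetical helper for illustration; not a Spark API.
final case class SqlStateParts(category: String, subCategory: String)

object SqlStateParts {
  // A SQLSTATE is exactly 5 characters: a 2-character category
  // followed by a 3-character sub-category.
  def splitSqlState(sqlState: String): SqlStateParts = {
    require(sqlState.length == 5, s"SQLSTATE must have 5 characters: $sqlState")
    val (category, subCategory) = sqlState.splitAt(2)
    SqlStateParts(category, subCategory)
  }
}
```

For the example above, `SqlStateParts.splitSqlState("42K01")` yields category `42` and sub-category `K01`.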


## Usage

1. Check if the error is an internal error.
Internal errors are bugs in the code that we do not expect users to encounter; this does not include unsupported operations.
If true, use the error class `INTERNAL_ERROR` and skip to step 4.
2. Check if an appropriate error class already exists in `error-classes.json`.
If true, use the error class and skip to step 4.
3. Add a new class with a new or existing SQLSTATE to `error-classes.json`; keep in mind the invariants below.
3. Add a new class with a new or existing SQLSTATE to `error-classes.json`; keep in mind the invariants below, which are also [checked here][error-invariants].
4. Check if the exception type already extends `SparkThrowable`.
If true, skip to step 6.
5. Mix `SparkThrowable` into the exception.
6. Throw the exception with the error class and message parameters. If the same exception is thrown in several places, create a utility function in a central place such as `QueryCompilationErrors.scala` to instantiate the exception.

[error-invariants]: https://github.com/apache/spark/blob/40574bb36647a35d7ac1fe8b7b1efcb98b058065/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala#L138-L141

### Before

Throw with arbitrary error message:

throw new TestException("Problem A because B")
```scala
throw new TestException("Problem A because B")
```

### After

`error-classes.json`

"PROBLEM_BECAUSE" : {
"message" : ["Problem <problem> because <cause>"],
"sqlState" : "XXXXX"
}
```json
"PROBLEM_BECAUSE" : {
  "message" : ["Problem <problem> because <cause>"],
  "sqlState" : "XXXXX"
}
```

`SparkException.scala`

class SparkTestException(
errorClass: String,
messageParameters: Map[String, String])
extends TestException(SparkThrowableHelper.getMessage(errorClass, messageParameters))
with SparkThrowable {

override def getMessageParameters: java.util.Map[String, String] = messageParameters.asJava
```scala
class SparkTestException(
    errorClass: String,
    messageParameters: Map[String, String])
  extends TestException(SparkThrowableHelper.getMessage(errorClass, messageParameters))
  with SparkThrowable {

  override def getMessageParameters: java.util.Map[String, String] =
    messageParameters.asJava

override def getErrorClass: String = errorClass
}
  override def getErrorClass: String = errorClass
}
```

Throw with error class and message parameters:

throw new SparkTestException("PROBLEM_BECAUSE", Map("problem" -> "A", "cause" -> "B"))
```scala
throw new SparkTestException("PROBLEM_BECAUSE", Map("problem" -> "A", "cause" -> "B"))
```
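
When the same error is thrown from several places, step 6 of the usage list suggests a central helper. The following is a minimal sketch of that idea. To keep it compilable without Spark on the classpath, `TestException` and `SparkTestException` here are simplified stand-ins (no `SparkThrowable`, no message lookup), and `TestErrors` and `problemBecause` are hypothetical names.

```scala
// Simplified stand-ins so the sketch compiles on its own.
// The real classes would mix in SparkThrowable as shown earlier.
class TestException(message: String) extends Exception(message)

class SparkTestException(
    val errorClass: String,
    val messageParameters: Map[String, String])
  extends TestException(s"[$errorClass] $messageParameters")

// Hypothetical central place for constructing errors, similar in
// spirit to QueryCompilationErrors.scala.
object TestErrors {
  def problemBecause(problem: String, cause: String): SparkTestException =
    new SparkTestException(
      "PROBLEM_BECAUSE",
      Map("problem" -> problem, "cause" -> cause))
}
```

Call sites then reduce to `throw TestErrors.problemBecause("A", "B")`, which keeps the error class name and parameter keys in one place.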

## Access fields

To access error fields, catch exceptions that extend `org.apache.spark.SparkThrowable` and access
- Error class with `getErrorClass`
- SQLSTATE with `getSqlState`


try {
...
} catch {
case e: SparkThrowable if Option(e.getSqlState).forall(_.startsWith("42")) =>
warn("Syntax error")
}
```scala
try {
  ...
} catch {
  // `exists` (not `forall`) so that a null SQLSTATE does not match.
  case e: SparkThrowable if Option(e.getSqlState).exists(_.startsWith("42")) =>
    warn("Syntax error")
}
```

## Fields

@@ -81,9 +117,9 @@ You should not introduce new uncategorized errors. Instead, convert them to prop
### Message

Error messages provide a descriptive, human-readable representation of the error.
The message format accepts string parameters via the HTML tag syntax: e.g. <relationName>.
The message format accepts string parameters via the HTML tag syntax: e.g. `<relationName>`.

The values passed to the message shoudl not themselves be messages.
The values passed to the message should not themselves be messages.
They should be: runtime-values, keywords, identifiers, or other values that are not translated.
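
To illustrate how `<name>` placeholders relate to message parameters, here is a minimal sketch of template substitution. It is not Spark's actual implementation; `MessageTemplateSketch` and `render` are hypothetical names, and the placeholder pattern is an assumption about which characters a parameter name may contain.

```scala
import scala.util.matching.Regex

// Hypothetical illustration; Spark's own substitution logic may differ.
object MessageTemplateSketch {
  // Assumed shape of a placeholder: <name> with letters, digits, '_' or '-'.
  private val Placeholder: Regex = "<([a-zA-Z0-9_-]+)>".r

  // Replaces each <name> with its value; unknown names are left untouched.
  def render(template: String, params: Map[String, String]): String =
    Placeholder.replaceAllIn(
      template,
      m => Regex.quoteReplacement(params.getOrElse(m.group(1), m.matched)))
}
```

With the `PROBLEM_BECAUSE` template from earlier, `render("Problem <problem> because <cause>", Map("problem" -> "A", "cause" -> "B"))` produces `Problem A because B`.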

The quality of the error message should match the
@@ -96,21 +132,21 @@ The quality of the error message should match the
### SQLSTATE

SQLSTATE is a mandatory portable error identifier across SQL engines.
SQLSTATE comprises a 2-character class value followed by a 3-character subclass value.
SQLSTATE comprises a 2-character category followed by a 3-character sub-category.
Spark prefers to re-use existing SQLSTATEs, preferably used by multiple vendors.
For extension Spark claims the 'K**' subclass range.
If a new class is needed it will also claim the 'K0' class.
For extension Spark claims the `K**` sub-category range.
If a new category is needed it will also claim the `K0` category.

Internal errors should use the 'XX' class. You can subdivide internal errors by component.
For example: The existing 'XXKD0' is used for an internal analyzer error.
Internal errors should use the `XX` category. You can subdivide internal errors by component.
For example: The existing `XXKD0` is used for an internal analyzer error.
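
The conventions above can be sketched as simple checks on the SQLSTATE string. This assumes that `K**` means "any sub-category whose first character is `K`"; `SqlStateKind` and its methods are hypothetical names, not a Spark API.

```scala
// Hypothetical helper for illustration; not a Spark API.
object SqlStateKind {
  // Internal errors use the `XX` category (the first two characters).
  def isInternal(sqlState: String): Boolean =
    sqlState.length == 5 && sqlState.startsWith("XX")

  // Assumption: a Spark-claimed sub-category is one whose first character,
  // i.e. the third character of the SQLSTATE, is 'K'.
  def isSparkSubCategory(sqlState: String): Boolean =
    sqlState.length == 5 && sqlState.charAt(2) == 'K'
}
```

Under these assumptions, `XXKD0` is both internal and in the Spark sub-category range, while `42K01` is in the Spark sub-category range but not internal.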

#### Invariants

- Consistent across releases unless the error is internal.

#### ANSI/ISO standard

The following SQLSTATEs are collated from:
The SQLSTATEs in `error-states.json` are collated from:
- SQL2016
- DB2 zOS/LUW
- PostgreSQL 15