Add ability to capture validation errors #1

kevin-bates · 2022-02-28T20:59:30Z

This pull request stems from an issue discovered during Jupyter Server performance analysis in which it was learned that the server is validating notebooks twice (and on both read and write operations). This appears to have been the case for the last 7 to 8 years (since the code resided in notebook). I suspect this is based on history few are privy to, but this pull request attempts to rectify this in a (hopefully) acceptable manner. Since the elimination of the redundant validation can improve performance by nearly 50%, this seems like an opportunity we should not ignore.

Because so many years have passed, I believe there are at least two backward-compatibility concerns that lead to this particular resolution.

The read/write calls do not inform the caller that the operation encountered a validation error. As a result, the content (on read) is always returned, even in light of validation errors, which, today, are logged as errors. This behavior also leads to the content always being written, even in light of validation errors. (Of course, filesystem-related issues will raise appropriate exceptions, but all things being equal, the caller has no idea a validation issue is present.)
Applications wish to format their own validation error. Jupyter Server (and notebook) used a subsequent (and unconditional) call to nbformat's validate() method, to capture the exception and produce an application-friendly error message when validation issues are encountered.

This change adds an optional dictionary-valued parameter (capture_validation_error) that applications can pass to capture the ValidationError instance for use by the calling application. If the parameter is non-None and a dictionary instance, a validation error will be added into the dictionary under the key 'ValidationError' while the corresponding value will contain the ValidationError instance. This would allow applications that make an additional call to validate() to remove the second call since they have both their content (on reads) and the ValidationError instance when validation issues are present.

Alternative approaches

One approach would be to add an optional boolean parameter that skips validation on the read/write methods, leaving actual validation up to the application. Since today's read/write code leaves no indication that a validation error was encountered, this approach would work as well. However, given that read/write methods already include validation, it does require an additional method call and feels a bit clunky.
Another approach would be to raise ValidationError when it occurs, which also allows the application to produce a friendly message, but will prevent the return of content (on reads) and persistence of content (on writes) - that is assumed behavior today.

There may be other solutions, but I think we should do something as this is the kind of low-hanging fruit that performance-sensitive folks live for. 😄

Add ability to capture validation errors

96c75a3

kevin-bates merged commit 66443e6 into master Mar 2, 2022

kevin-bates deleted the capture-validation-error branch March 2, 2022 15:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add ability to capture validation errors #1

Add ability to capture validation errors #1

kevin-bates commented Feb 28, 2022

Add ability to capture validation errors #1

Add ability to capture validation errors #1

Conversation

kevin-bates commented Feb 28, 2022

Alternative approaches