Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to capture validation errors #1

Merged
merged 1 commit into from
Mar 2, 2022

Conversation

kevin-bates
Copy link
Owner

This pull request stems from an issue discovered during Jupyter Server performance analysis in which it was learned that the server is validating notebooks twice (and on both read and write operations). This appears to have been the case for the last 7 to 8 years (since the code resided in notebook). I suspect this is based on history few are privy to, but this pull request attempts to rectify this in a (hopefully) acceptable manner. Since the elimination of the redundant validation can improve performance by nearly 50%, this seems like an opportunity we should not ignore.

Because so many years have passed, I believe there are at least two backward-compatibility concerns that lead to this particular resolution.

  1. The read/write calls do not inform the caller that the operation encountered a validation error. As a result, the content (on read) is always returned, even in light of validation errors, which, today, are logged as errors. This behavior also leads to the content always being written, even in light of validation errors. (Of course, filesystem-related issues will raise appropriate exceptions, but all things being equal, the caller has no idea a validation issue is present.)
  2. Applications wish to format their own validation error. Jupyter Server (and notebook) used a subsequent (and unconditional) call to nbformat's validate() method, to capture the exception and produce an application-friendly error message when validation issues are encountered.

This change adds an optional dictionary-valued parameter (capture_validation_error) that applications can pass to capture the ValidationError instance for use by the calling application. If the parameter is non-None and a dictionary instance, a validation error will be added into the dictionary under the key 'ValidationError' while the corresponding value will contain the ValidationError instance. This would allow applications that make an additional call to validate() to remove the second call since they have both their content (on reads) and the ValidationError instance when validation issues are present.

Alternative approaches

  1. One approach would be to add an optional boolean parameter that skips validation on the read/write methods, leaving actual validation up to the application. Since today's read/write code leaves no indication that a validation error was encountered, this approach would work as well. However, given that read/write methods already include validation, it does require an additional method call and feels a bit clunky.
  2. Another approach would be to raise ValidationError when it occurs, which also allows the application to produce a friendly message, but will prevent the return of content (on reads) and persistence of content (on writes) - that is assumed behavior today.

There may be other solutions, but I think we should do something as this is the kind of low-hanging fruit that performance-sensitive folks live for. 😄

@kevin-bates kevin-bates merged commit 66443e6 into master Mar 2, 2022
@kevin-bates kevin-bates deleted the capture-validation-error branch March 2, 2022 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant