Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start separating normalisation from validation logic #282

Merged
merged 24 commits into from
Aug 16, 2022

Conversation

Carreau
Copy link
Member

@Carreau Carreau commented Jun 8, 2022

Re-issue of #236. #244 made things even worse with respect to some problem with notebook trust,
where a notebook is trusted, saved, reopened but is not trusted.

The reason being that the signature is computed before validate, and validate mutate the notebook, leading to the signature not matching.

really validate() should not mutate the notebook ever. Validate is in part used for security. It should not return a value of give results on a modified object.

Just try to imagine if a password comparison function, said compare('HUNTER@', 'hunter2') == True because obviously the user had caps lock pressed on their keyboard ? This is the same.

We obviously can't change brutally, but we really need to stop having validation and normalisation be the same step.

We change the validation logic and separate the normalisation from
the validation step.

We make sure that if a notebook is normalized, it emits a warning.
In the future we will turn the warning in to an Error.

We add test for the current and an xfail test for the future behavior
@Carreau
Copy link
Member Author

Carreau commented Jun 22, 2022

@ewjoachim, you expressed interest in Jupyter and security.

Copy link
Contributor

@ewjoachim ewjoachim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey :) Thanks for the invitation to review :)
It's my first time contributing to this project, I've tried to make relevant comments, but I know quite a few comments in here are on code that you didn't introduce and merely moved (and also, typos)

Also, I may have more time later for a higher-level review :)

nbformat/validator.py Outdated Show resolved Hide resolved
nbformat/validator.py Outdated Show resolved Hide resolved
nbformat/validator.py Outdated Show resolved Hide resolved
nbformat/validator.py Outdated Show resolved Hide resolved
nbformat/validator.py Outdated Show resolved Hide resolved
nbformat/validator.py Outdated Show resolved Hide resolved
Comment on lines +490 to +491
number_of_cells = len(nbdict.get("cells", 0))
for cell_idx in range(number_of_cells):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
number_of_cells = len(nbdict.get("cells", 0))
for cell_idx in range(number_of_cells):
for cell, cell_error in zip(nbdict.get("cells", []), error_tree["cells"]):

(and then replace all the nbdict["cells"][cell_idx] below with cell and the error_tree["cells"][cell_idx] with cell_error

That said, I'm saying this because I imagine that error_tree["cells"] is a list. If it's a dict with no assurance of being correctly ordered, we can do:

Suggested change
number_of_cells = len(nbdict.get("cells", 0))
for cell_idx in range(number_of_cells):
for cell_idx, cell in enumerate(nbdict.get("cells", [])):
cell_error = error_tree["cells"][cell_idx]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, all that was indented/dedented as is to be extracted into its own function. I'll mark that as refactor later. Here as well I think we want to generate a copy that is fixed instead of mutating in place.

tests/test_validator.py Outdated Show resolved Hide resolved
tests/test_validator.py Outdated Show resolved Hide resolved
tests/test_validator.py Outdated Show resolved Hide resolved
Carreau and others added 2 commits June 22, 2022 05:56
Co-authored-by: Joachim Jablon <ewjoachim@gmail.com>
@Carreau
Copy link
Member Author

Carreau commented Jun 22, 2022

Hey :) Thanks for the invitation to review :)

Thanks, I was not expecting an in-depth review, but as you pointed out your interest in security and this is problematic because of the signature, validate but mutate, save with mutated signature flow, I thought that would be of interest.

@ewjoachim
Copy link
Contributor

ewjoachim commented Jun 22, 2022

I thought that would be of interest.

It is :) As discussed, I agree with you regarding this aspect of "security boundaries", that we expect some functions to do checks (especially security-related checks, including hashes of content to ensure non-mutability), others to make changes, and it can quickly become problematic when we try to do both at once.

(And I enjoyed every minute of doing this review, don't worry :) )

@Carreau
Copy link
Member Author

Carreau commented Aug 3, 2022

Todo: #282 (comment) need to be refactor later.

The xfail test, and isvalid() should be updated to pass, and isvalid() should not try to fix anything.

@Carreau
Copy link
Member Author

Carreau commented Aug 16, 2022

Ok, let's get that in and see what we break.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants