Add validation guidelines #1130

lukpueh · 2020-09-10T08:10:08Z

There seems to be agreement to discontinue the securesystemslib schema facility (see secure-systems-lab/securesystemslib#183). We still need to be able to validate all inputs at the user boundary (type annotations should make this a lot easier), and provide tools to check if metadata is spec compliant (maybe we can use something like JSON schema?).
At any rate, it would be helpful to provide contributors with guidelines for validation.

lukpueh · 2020-09-10T08:15:06Z

Using an in-toto style ValidationMixin might be a good way to validate metadata in memory (see the mixin and its usage for details).

lukpueh · 2020-12-01T09:12:02Z

secure-systems-lab/code-style-guidelines#18 has an interesting discussion about input validation, control flow and program consistency.

joshuagl · 2021-02-18T10:04:03Z

The approach I would take to this research project is:

Understand the current validation mechanism in use:

Review securesystemslib.schema, its purpose and its flaws.
Review in-toto's ValidationMixin, which validates metadata in memory utilising securesystemslib.schema (see example usage).
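To make the mixin idea concrete, here is a minimal sketch of the general pattern: an object collects its checks in `_validate_*` methods and a shared `validate()` runs them all. This is an illustration only, not in-toto's actual code; it uses plain Python checks rather than securesystemslib.schema, and the names `ValidationError` and `Root` are assumptions made up for the example.

```python
class ValidationError(Exception):
    """Raised when a check fails (hypothetical name for this sketch)."""


class ValidationMixin:
    """Runs every method whose name starts with `_validate_`."""

    def validate(self):
        for name in dir(self):
            if name.startswith("_validate_"):
                getattr(self, name)()


class Root(ValidationMixin):
    """Toy metadata object, not the real TUF or in-toto class."""

    def __init__(self, version=None):
        # The object may start out empty and be filled in later.
        self.version = version

    def _validate_version(self):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(self.version, bool) or not isinstance(self.version, int):
            raise ValidationError("version must be an integer")
        if self.version < 1:
            raise ValidationError("version must be >= 1")


root = Root()      # empty object, not yet valid
root.version = 3   # assign values afterwards
root.validate()    # validate on demand; raises ValidationError on failure
```

Note that nothing stops the object from holding invalid values between calls to `validate()`; the caller decides when to check.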

Review existing external/3rd-party solutions:

pydantic -- uses type annotations to provide data validation (runtime type hint validation) and settings management.
marshmallow -- uses schemas to provide simplified object serialisation, deserialisation, and input data validation.

Understand options for custom validation logic

Descriptors seem well suited for attribute validation. However, they may not allow for the currently supported pattern of initialising empty objects, assigning values, and later validating them (from Add input validation to simple metadata api #1140 (comment)).
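A minimal sketch of attribute validation with a descriptor, to show the trade-off mentioned above. The class names `PositiveInt` and `Snapshot` are assumptions for illustration and are not taken from the python-tuf codebase.

```python
class PositiveInt:
    """Data descriptor that validates on every assignment."""

    def __set_name__(self, owner, name):
        self.public_name = name
        self.private_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.private_name)

    def __set__(self, instance, value):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(
                f"{self.public_name} must be a positive integer, got {value!r}"
            )
        setattr(instance, self.private_name, value)


class Snapshot:
    """Toy metadata object, not the real TUF class."""

    version = PositiveInt()

    def __init__(self, version):
        self.version = version  # goes through PositiveInt.__set__


snapshot = Snapshot(1)
snapshot.version = 2  # valid assignment
```

Because the check runs on each assignment, an instance can never hold an invalid value, not even temporarily. That is exactly why the "create empty, assign, validate later" pattern does not fit this approach without extra work, such as allowing `None` as a placeholder.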

For each of the three possible new approaches suggested above, I would expect some prototype code to be written to get a feel for how the approach fits with our new code. I'd be inclined to base this on #1279, if it has not already been merged by the time we get to experimenting with new approaches.

Goals:
We want to be able to:

validate all inputs at the user boundary.
provide tools to check if metadata is specification compliant.

Outcomes:

Submit an ADR on input validation, summarising different approaches and making a recommendation for the project.

Considerations:

Are the stated goals appropriate/sufficient?
Which Python versions are supported by the various mechanisms explored?
pip is about to become a major user and has to vendor any dependencies we add. This may be a good argument for a custom implementation, or perhaps not if the transitive closure of dependencies to vendor is small.
Be aware of performance impact (see Revise schema and formats facility secure-systems-lab/securesystemslib#183 (comment)).
Other third-party solutions exist, e.g. desert and typical.

Next steps:

Input validation for simple metadata API (Add input validation to simple metadata api #1140).

See also, the related issue on input validation for metadata API: #1140

Other possibly useful references:

Unreachable else best practice secure-systems-lab/code-style-guidelines#18 has a discussion about input validation, control flow and program consistency.
blog post about instance attribute validation techniques including the in-toto approach and Descriptors
The section of the Hypermodern Python guide on typing discusses data validation with Desert and Marshmallow

MVrachev · 2021-03-10T15:26:51Z

The initial version of the ADR addressing this issue is out: #1301.
It contains only two options for now: ValidationMixin and pydantic.

MVrachev · 2021-05-26T12:42:21Z

Update on what has happened so far:

I documented multiple validation options in ADR0007: How we are going to validate the new API codebase #1301.
It was decided that I would create validation functions for each of the attributes before deciding which of the validation options from ADR 7 suits us.
Created Prototype descriptors and validation functions #1366 to showcase how validation functions are supposed to work with python descriptors.
After Jussi's comment here, we decided it's best to do research on how we use the metadata attributes and think about what might go wrong with them.
I started creating issues for each of the metadata attributes in the format Metadata Attribute research - * where * is the name of the attribute. For example Metadata Attribute research: spec_version #1419, Metadata Attribute research: expires #1420.
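As a rough illustration of how plain validation functions can be combined with a descriptor, consider the sketch below. It is not the code from #1366. The function names, the `Validated` descriptor, the `Signed` class and the specific checks (a three-part numeric `spec_version`, a `datetime` for `expires`) are all assumptions chosen for the example.

```python
import re
from datetime import datetime

_SPEC_VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def validate_spec_version(value):
    # Assumed rule for this sketch: three dot-separated numbers.
    if not isinstance(value, str) or not _SPEC_VERSION_RE.fullmatch(value):
        raise ValueError(f"spec_version must look like '1.0.0', got {value!r}")


def validate_expires(value):
    # Assumed rule for this sketch: any datetime instance is accepted.
    if not isinstance(value, datetime):
        raise ValueError(f"expires must be a datetime, got {value!r}")


class Validated:
    """Descriptor that delegates the check to a plain function."""

    def __init__(self, validator):
        self.validator = validator

    def __set_name__(self, owner, name):
        self.private_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.private_name)

    def __set__(self, instance, value):
        self.validator(value)  # raises ValueError on bad input
        setattr(instance, self.private_name, value)


class Signed:
    """Toy stand-in for a metadata class."""

    spec_version = Validated(validate_spec_version)
    expires = Validated(validate_expires)

    def __init__(self, spec_version, expires):
        self.spec_version = spec_version
        self.expires = expires


signed = Signed("1.0.0", datetime(2030, 1, 1))
```

Keeping the checks in standalone functions means they can be tested and reused on their own, whichever mechanism ends up calling them.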

I will unassign myself from this issue for now, because I am not actively working on validation guidelines ADR.
Before that, it's important to understand how we want to operate on and store all of the metadata attributes, provide validation functions for them, and decide which validation option from ADR 7 we want to use, or whether to go with something totally new.

MVrachev · 2022-03-17T11:06:19Z

@lukpueh and I have discussed this: a formal ADR about validation guidelines seems like too much work, and we are not sure we need it, as we have already implemented validation for all Metadata classes (see #1140 (comment)).

Even without an ADR, it makes sense to provide some guidance on how the maintainers feel about validation, which validation options were discussed, and what requirements should be taken into account when adding validation to python-tuf. The best place to answer those questions seems to be a blog post published on https://theupdateframework.github.io/python-tuf/, and @lukpueh and I agree that this is the logical step that will close this issue.

lukpueh changed the title ~~**Add validation guidelines**~~ Add validation guidelines Sep 10, 2020

lukpueh added discussion Discussions related to the design, implementation and operation of the project documentation Documentation of the project as well as procedural documentation labels Sep 10, 2020

lukpueh added this to the General Infrastructure/Workflow decisions milestone Sep 10, 2020

This was referenced Sep 10, 2020

Decide on extent of OOP in metadata model #1133

Closed

Add classes for complex metadata attributes #1138

Closed

Add classes for complex metadata fields #1139

Closed

Add input validation to simple metadata api #1140

Closed

MVrachev mentioned this issue Sep 23, 2020

Add exception handling guidelines #1131

Open

lukpueh mentioned this issue Dec 1, 2020

Unreachable else best practice secure-systems-lab/code-style-guidelines#18

Closed

joshuagl assigned joshuagl and MVrachev and unassigned joshuagl Feb 18, 2021

MVrachev mentioned this issue Mar 10, 2021

ADR0007: How we are going to validate the new API codebase #1301

Closed

3 tasks

MVrachev removed their assignment May 26, 2021

lukpueh mentioned this issue Nov 30, 2021

examples: run mypy on repo example #1697

Closed

lukpueh mentioned this issue Mar 17, 2022

Add stricter TARGETPATH and PATHPATTERN checks #1018

Closed

MVrachev self-assigned this Mar 17, 2022

MVrachev added the backlog Issues to address with priority for current development goals label Mar 23, 2022

lukpueh mentioned this issue Mar 20, 2023

Require fully conformant metadata when loading in-toto/in-toto#186

Open
