Schema validation is a means of ensuring that incoming data matches an expected format. There are two primary components: definition and validation.
At DataMade, our preferred framework for schema validation is marshmallow. marshmallow provides fairly extensive documentation. See their quickstart to get up and running quickly. For more advanced or specific development, see the relevant section of their user guide.
- University of Minnesota Elections Archive: Definition | Validation
- Dedupe.io API: Access to the Dedupe.io repository is limited to Dedupe.io developers. The schemas for API requests pre-process, authenticate, validate, and post-process data using marshmallow. If you wish to see an example of any or all of the above, request a gist of the relevant code from a member of the dedupeio organization.
Excerpted from a July 2019 implementation plan for the Dedupe.io API.
It would be ideal to lean on an existing schema validation library, rather than rolling our own solution, to reduce the amount of custom infrastructure we need to maintain. I propose using marshmallow, "a… library for converting complex data types, such as objects, to and from native Python data types." The documentation contains a compelling section on its benefits over other schema validation libraries.
Personally, I like:
- The ability to define schemas as Python classes
- Sensible and elegant interface (decorators) for custom, callable validators
- Returns all validation errors at once, enabling a nicer user experience with the API
- Pre- and post-processing hooks for transforming valid data
I also considered:
- jsonschema
  - Pros
    - Familiar from applications in pupa and IHS
    - Supports custom validation
  - Cons
    - Define schemas as dictionaries, not classes
    - Validation API is wonky
    - Does not support transformation
- schema
  - Pros
    - Seems to offer a lot of the same functionality as marshmallow
  - Cons
    - The API is inelegant
We like Django REST framework for API development. It provides its own module for defining and validating schemas, which we've used in a number of projects. However, there is a django-rest-marshmallow plugin that would allow us to use marshmallow serializers instead. It would be ideal to trial this in our next Django REST project in order to more fully standardize our tooling for schema-related operations.