[TOC]
Cowj currently supports attaching a type system for input schema validation.
The design goal has been automatic input verification for the Request body.
Without that, the developer has to assume a lot about how the input data is structured. Consider the POJO:
record Person(String firstName, String lastName, int age) {}
Whether some incoming data is of this type is an open problem. We can argue that it must have at least one of the attributes. But what works for the business?
The real verification is usually buried deep in the code, and runs only after the actual object has been received.
For example, what about the rule that the first name cannot be more than 32 chars?
Or that the age must be between 0 and 150?
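Without schema validation, rules like these tend to end up as hand-written checks somewhere in the handler. A minimal sketch of what that looks like, assuming the Person record from above; the class name PersonRules and the method violations are hypothetical names used only for illustration and are not part of Cowj:

```java
import java.util.ArrayList;
import java.util.List;

public class PersonRules {
    // Same shape as the Person record in the text
    public record Person(String firstName, String lastName, int age) {}

    // Hypothetical helper: collects every violated rule, empty list when valid
    public static List<String> violations(Person p) {
        List<String> errors = new ArrayList<>();
        if (p.firstName() != null && p.firstName().length() > 32) {
            errors.add("firstName can not be more than 32 chars");
        }
        if (p.age() < 0 || p.age() > 150) {
            errors.add("age must be between 0 and 150");
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(violations(new Person("Ada", "Lovelace", 36)));
        System.out.println(violations(new Person("A".repeat(33), "Lovelace", 200)));
    }
}
```

Every new rule means another branch like these, repeated in each route that accepts a Person; this is the duplication a declarative schema is meant to remove.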
One family of options is Thrift, Avro, Protobuf and the like. They can define a schema, but the trouble is with rules. Their design goal is compression.
Admittedly they are superior at that. They also require both the server and the client to compile the schema to generate the actual stubs, and they take care of the serialization problem.
Admittedly Avro is better at this than the rest, but there are others.
Another family is Open API, RAML and JSON-Schema. There is an intrinsic problem with Open API: it is post facto. Once the response is actually coded in, the schema is supposed to be produced automagically from it.
My own opinionated remark on this: this is not good.
Code should be written against interfaces, and auto-generating the schema is a terrible idea for things which are distributed in nature by definition.
RAML is much better than Open API, but suffers from over-engineering.
The only trouble is that a suitable schema validator is missing for it. The same goes for Open API: people are so interested in auto-generation that they do not want to validate the payload.
JSON-Schema is massively popular among Node enthusiasts, and thus validators are very easy to find. We picked the fastest Java implementation for the same.
1. Develop the schema first for:
   - input
   - output
   - error
2. Be optionally typed
3. Current live schema version in use should be publicly available in the live instance
And then automatically verify the input coming from clients, optionally, if need be.
It is on the third point that Swagger is better: because of auto-generation, the output schema should stay in sync.
The problem remains, however: what sort of validations are put into it?
If anyone wants to use a schema, it has to be inside the `static` folder; this specially designated folder must hold all type definitions.
Suppose we have a static folder pointing to `/something/my_static`; then the designated schema file is `/something/my_static/types/schema.yaml`.
This `types` folder hosts all the `json-schema` files.
This was done on purpose, to ensure public availability of the interfacing contract.
We support up to draft-07 of JSON Schema ( https://json-schema.org ).
Take a typical app, the `prod` app, which creates a person and gets the person back:
port: 5042
# path mapping
routes:
  post:
    /person: _/create_person.zm
  get:
    /person/:personId : _/fetch_person.zm
Corresponding schema is defined as: app/samples/prod/types/schema.yaml
#####################
# Defines the Schemas for routes
# https://json-schema.org/learn/miscellaneous-examples.html
#####################
labels: # how system knows which label to invoke
  ok: "resp.status == 200" # when response status is 200
  err: "resp.status != 200" # when it is not
verify:
  in: true # verify input schema
  out: true # verify output schema, and log errors
routes:
  /person: # the route
    post:
      in: Person.json # the input body schema
      ok: RetVal.json
      err: RetVal.json
  /person/*:
    get:
      params: params.json # query parameter schema
      ok: Person.json
      err: RetVal.json
storages:
  in-mem-storage:
    read: true # just for the heck of it
    sep: "/" # default sep is also same
    paths:
      ".*" : Person.json # all storage path matches to this schema
Note that the query parameter schema automatically converts query parameters to objects following the algorithm:
1. If a parameter occurs multiple times, treat it as a list.
2. Try converting each item into:
   2.1 Boolean
   2.2 Numeric
   2.3 Failing both, string
3. Create an object and then return the string representation of the object to match against the schema.
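The first two steps of the conversion can be sketched in Java as follows. This is an illustration under stated assumptions, not Cowj's actual implementation: the class QueryParamCoercion and its methods are hypothetical, the accepted Boolean spellings are assumed to be "true" and "false", and step 3 (serializing the object for the validator) is left out.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueryParamCoercion {
    // Steps 2.1 to 2.3: Boolean first, then numeric, otherwise keep the string
    public static Object coerce(String raw) {
        if ("true".equalsIgnoreCase(raw)) return Boolean.TRUE;
        if ("false".equalsIgnoreCase(raw)) return Boolean.FALSE;
        try {
            return Long.parseLong(raw);          // whole numbers
        } catch (NumberFormatException ignored) { }
        try {
            return new BigDecimal(raw).doubleValue();  // decimals, rejects "NaN"
        } catch (NumberFormatException ignored) { }
        return raw;
    }

    // Step 1: a parameter with several occurrences becomes a list,
    // a parameter with exactly one occurrence stays a scalar
    public static Map<String, Object> toObject(Map<String, List<String>> query) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : query.entrySet()) {
            List<Object> items = new ArrayList<>();
            for (String v : e.getValue()) items.add(coerce(v));
            out.put(e.getKey(), items.size() == 1 ? items.get(0) : items);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> query = new LinkedHashMap<>();
        query.put("tag", List.of("a", "b"));
        query.put("age", List.of("42"));
        query.put("active", List.of("true"));
        System.out.println(toObject(query));
    }
}
```

The order of the attempts matters: a value such as "42" is a valid string as well, so the narrower types have to be tried first.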
The input schema `in` is a special case. For the rest, how does the system know which output schema to apply?
This is done by expression labels. Under the hood, the system runs an expression evaluator:
boolean testExpression(Request request, Response response, String expression)
This way, much more specific schema mappings can be defined and checked with the validator.
The name of a label is the left-hand side; for example, `ok` is a name.
The expression is the right-hand side; when it evaluates to `true`, the corresponding schema is applied.
For example, in the case of `ok` it is:
resp.status == 200
The schema for each label name is stored in the routes.
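The lookup from label to schema can be sketched as below. This is only an illustration: Cowj evaluates expression strings, while this sketch stands in a plain Java predicate on the status code for each expression. The class LabelResolver, the method schemaFor, and the rule that the first matching label wins are all assumptions, not something the source states.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.IntPredicate;

public class LabelResolver {
    // Hypothetical: returns the schema file of the first label whose predicate
    // holds for the status and which the route actually maps to a schema
    public static Optional<String> schemaFor(int status,
                                             Map<String, IntPredicate> labels,
                                             Map<String, String> routeSchemas) {
        for (Map.Entry<String, IntPredicate> label : labels.entrySet()) {
            String schema = routeSchemas.get(label.getKey());
            if (schema != null && label.getValue().test(status)) {
                return Optional.of(schema);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, IntPredicate> labels = new LinkedHashMap<>();
        labels.put("ok", s -> s == 200);   // stands in for "resp.status == 200"
        labels.put("err", s -> s != 200);  // stands in for "resp.status != 200"
        // label -> schema, as in the GET /person/* route of schema.yaml
        Map<String, String> getPerson = Map.of("ok", "Person.json", "err", "RetVal.json");
        System.out.println(schemaFor(200, labels, getPerson));
        System.out.println(schemaFor(404, labels, getPerson));
    }
}
```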
The `verify` section turns input and output schema verification on and off.
Once a schema is attached, the default configuration is as follows:
verify:
  in: true # verify input schema
  out: false # do not verify output schema - classic someone else's problem
This whole schema verification technically can be done at the proxy API gateway layer.
Validation takes little time: about 20 ms initially to load the schema, and around 0 to 2 ms on a proper run after that.
As one can see, we invert the `routes` section: the path comes first, and then the verbs.
We invert it because each of these paths can be accessed with multiple verbs.
These are examples from the same app, `prod`, which is available in the `app/samples` directory.
This comes from `types/RetVal.json`:
{
  "$id": "https://github.com/nmondal/cowj/prod/retval.schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "oneOf": [
    {
      "properties": {
        "personId": {
          "type": "string"
        }
      },
      "required": ["personId"]
    },
    {
      "properties": {
        "error": {
          "type": "string"
        }
      },
      "required": ["error"]
    }
  ]
}
This comes from `types/Person.json`:
{
  "$id": "https://github.com/nmondal/cowj/prod/person.schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string",
      "description": "The person's first name."
    },
    "lastName": {
      "type": "string",
      "description": "The person's last name."
    },
    "age": {
      "description": "Age in years which must be equal to or greater than zero.",
      "type": "integer",
      "minimum": 0,
      "maximum": 150
    },
    "personId": {
      "description": "System Generated Person Id",
      "type": "string"
    }
  },
  "required": ["firstName", "lastName"]
}
Once schema validation is turned on, the system automatically validates the input against the schema.
The parsed JSON object gets added to the `Request` as an attribute `_body`, whereas the query params get added as `_params`, which is a dictionary.
The route script can then use them as follows:
// ZoomBA
assert( "_body" @ req.attributes() , "How come req.body failed to verify and come here?" , 409 )
params = req.attribute("_params") // this should already have the parsed query params data
my_body = req.attribute("_body") // this should already have the parsed body data
This is added so that developers do not need to re-parse the JSON body that was already parsed during the validation phase.
Validation errors are responded to with `409`, as discussed on Stack Overflow, along with the validation error.
On success, nothing except the time taken gets logged. On failure, the error gets logged and the server keeps on running.
Given the `static` folder is mapped, one can simply browse to:
<host>:<port>/types/schema.yaml
to see the schema mapping, along with other files:
<host>:<port>/types/RetVal.json
This makes the schema publicly exposed.
One interesting way to apply types on top of a key-value storage is via the data access pattern.
This feature works as follows.
Take a storage such as the in-memory storage, for example as defined in `prod`:
plugins:
  cowj.plugins:
    curl: CurlWrapper::CURL
    mem-st: MemoryBackedStorage::STORAGE
data-sources:
  json_place:
    type: curl
    url: https://jsonplaceholder.typicode.com
  in-mem-storage:
    type: mem-st
We want to ensure that every access is typed. To do so, we change the `schema.yaml` file in the same project, as in the `prod` schema:
storages:
  in-mem-storage:
    read: true # just for the heck of it
    sep: "/" # default sep is also same
    paths:
      ".*" : Person.json # all storage path matches to this schema
This automatically wraps around the underlying storage with nice typing, based on data access patterns.
To apply this, one needs to create a key which is the name of the storage data source, as shown above.
The configuration has the following keys.
`read`: when `true`, forces data schema verification on every read. Please do not use it.
Every write is verified by default, and that cannot be turned off.
`sep`: this is the path separator applied between the `bucketName` and the `fileName` to build the data access pattern.
For example, if the `sep` is `-`, then the final access pattern is `bucketName-fileName`.
Remember, it is a regex match, so use `\.` to specify a `.` separator.
`paths`: these are the data access paths, in `regex-pattern : schema file name` form.
The pattern must match globally against the data access pattern.
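How a storage access could be mapped to a schema file can be sketched as follows. This is an illustration under assumptions, not Cowj's implementation: the class StoragePathSchemas and the method schemaFor are hypothetical, "match globally" is read here as matching the whole access pattern, the separator is joined in literally (so the regex side has to escape it), and the first matching entry is assumed to win.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

public class StoragePathSchemas {
    // Hypothetical: builds bucketName + sep + fileName and returns the schema
    // of the first regex in `paths` that matches the entire access pattern
    public static Optional<String> schemaFor(String bucketName, String sep, String fileName,
                                             Map<String, String> paths) {
        String access = bucketName + sep + fileName;
        for (Map.Entry<String, String> e : paths.entrySet()) {
            // Pattern.matches succeeds only when the whole input matches
            if (Pattern.matches(e.getKey(), access)) {
                return Optional.of(e.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> paths = new LinkedHashMap<>();
        paths.put("people/.*", "Person.json");
        System.out.println(schemaFor("people", "/", "42", paths));
        System.out.println(schemaFor("orders", "/", "42", paths));
    }
}
```

With the catch-all `".*"` pattern from the prod schema, every key in the storage is checked against Person.json; narrower patterns allow different buckets to carry different types.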
- Thrift : Thrift: The Missing Guide
- Avro : IDL Language | Apache Avro
- Protobuf : Language Guide (proto 3) | Protocol Buffers Documentation
- JSON Schema : https://json-schema.org
- RAML : https://github.com/raml-org/raml-spec/blob/master/versions/raml-10/raml-10.md/
- Open API ( Swagger ) : https://swagger.io/specification/
- On Type Systems:
- Static Vs Dynamic Typing :