RFC: Validate incoming and outgoing events utility #95

Closed
heitorlessa opened this issue Jul 28, 2020 · 12 comments
@heitorlessa
Contributor

heitorlessa commented Jul 28, 2020

Key information

Summary

This utility helps you validate incoming events from Lambda event sources, as well as the response from your Lambda handler, all based on JSON Schemas. You can validate either an entire incoming event, including the chosen event source wrapper, or only the event payload/body.

Motivation

Well-Architected Serverless Lens recommends validating events under the Security pillar. As of now, customers have to implement their own validation mechanisms, bring additional dependencies, and end up crafting multiple JSON Schemas for popular event sources.

We could ease that by including this in Powertools, lowering the barrier to entry.

Proposal

This utility would use the already present Fast JSON Schema lib to validate both incoming and outgoing events using a preferred JSON Schema.

Another plus is that we can use our Middleware Factory to make this easier to implement, and automatically trace it with X-Ray to profile performance impact when necessary.

Validating both inbound/outbound

from aws_lambda_powertools.utilities import validator

@validator(inbound=inbound_schema_dict, outbound=outbound_schema_dict)
def lambda_handler(evt, ctx):
    ...
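
As a rough illustration of the decorator above, here is a minimal sketch, not the utility's actual implementation. To stay dependency-free it uses a toy `check` function that only understands `required` and a per-property `type`; the RFC proposes fastjsonschema for the real validation. `check`, `SchemaValidationError`, and both example schemas are hypothetical names introduced for this sketch.

```python
import functools


class SchemaValidationError(Exception):
    """Hypothetical error raised when data does not match a schema."""


# Toy stand-in for a real JSON Schema library: supports only `required`
# and a per-property `type` drawn from the mapping below.
_TYPES = {"string": str, "object": dict, "array": list, "boolean": bool}


def check(data, schema):
    if not isinstance(data, dict):
        raise SchemaValidationError("expected an object")
    for key in schema.get("required", []):
        if key not in data:
            raise SchemaValidationError(f"missing required key: {key}")
    for key, rules in schema.get("properties", {}).items():
        expected = _TYPES.get(rules.get("type"))
        if key in data and expected and not isinstance(data[key], expected):
            raise SchemaValidationError(f"{key} must be of type {rules['type']}")


def validator(inbound=None, outbound=None):
    """Validate the event before the handler runs and its response after."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            if inbound is not None:
                check(event, inbound)
            response = handler(event, context)
            if outbound is not None:
                check(response, outbound)
            return response
        return wrapper
    return decorator


inbound_schema = {"required": ["message"], "properties": {"message": {"type": "string"}}}
outbound_schema = {"required": ["statusCode"]}


@validator(inbound=inbound_schema, outbound=outbound_schema)
def lambda_handler(evt, ctx):
    return {"statusCode": 200, "body": evt["message"]}
```

Calling `lambda_handler({}, None)` fails before the handler body runs, because the inbound check raises first.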

For customers wanting to validate only the payload of popular event sources, say API Gateway, this utility will work in tandem with an extractor utility. That will provide the following benefits:

  • Validate only the actual event payload of a popular event source
    • e.g. validate the API GW POST payload (event['body']) itself, not the whole API GW event
  • Craft your own JSON Schema to help you validate payload as well as things like headers/message attributes, etc.

By default, envelopes will pluck only the payload of a message within the event. Allowing multiple paths can easily add complexity, so we will defer to customers creating their own envelopes if they want to.

Validating inbound with built-in popular event source schemas

from aws_lambda_powertools.utilities import validator
from aws_lambda_powertools.utilities.extractor import envelopes

@validator(inbound=inbound_schema, envelope=envelopes.api_gateway_rest)
def lambda_handler(evt, ctx):
    ...
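
To make the envelope idea concrete, here is a minimal sketch of what an API Gateway REST envelope could do: pluck the `body` string out of the event and decode it. The function below is a hypothetical stand-in, not the extractor utility itself, and it assumes a JSON body that is not base64-encoded.

```python
import json


def api_gateway_rest(event):
    """Hypothetical envelope: return the decoded JSON body of an API Gateway REST event."""
    body = event.get("body")
    if body is None:
        return None
    # API Gateway delivers the body as a string, so it must be decoded first.
    return json.loads(body)


event = {
    "httpMethod": "POST",
    "headers": {"Content-Type": "application/json"},
    "body": '{"order_id": "123", "quantity": 2}',
}

# Only this payload, not the surrounding event, would be handed to the schema.
payload = api_gateway_rest(event)
```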

Drawbacks

  • JSON Schemas are not offered today for Lambda Event Sources; this means we would have to maintain them
  • Validating JSON Schemas can add up to ~25ms during cold start executions
    • Detecting an invalid schema, or validating against a schema that has already been compiled, takes microseconds
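
The cold start cost above comes from compiling the schema, so one way to pay it only once per execution environment is to cache the compiled validator. The sketch below assumes that approach; `compile_schema` is a hypothetical stand-in for the expensive compilation step of a real library, and the counter exists only to show that it runs once.

```python
import functools
import json

compile_calls = 0


def compile_schema(schema):
    """Hypothetical stand-in for an expensive schema compilation step."""
    global compile_calls
    compile_calls += 1
    required = tuple(schema.get("required", []))

    def validate(data):
        missing = [key for key in required if key not in data]
        if missing:
            raise ValueError(f"missing required keys: {missing}")
        return data

    return validate


@functools.lru_cache(maxsize=None)
def _cached_compile(schema_json):
    return compile_schema(json.loads(schema_json))


def get_validator(schema):
    # Dicts are unhashable, so a canonical JSON string serves as the cache key.
    return _cached_compile(json.dumps(schema, sort_keys=True))


schema = {"required": ["message"]}
get_validator(schema)({"message": "first invocation"})   # compiles the schema
get_validator(schema)({"message": "second invocation"})  # reuses the compiled validator
```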

Rationale and alternatives

  • What other designs have been considered? Why not them?: Lambda Decorators Validate. It'd mean bringing two additional dependencies for a single utility, and schema is slower than the fastjsonschema we already include in Powertools
  • What is the impact of not doing this?: Customers less experienced with Lambda might end up reinventing the wheel, and accidentally introducing less performant or incorrect validations.

Unresolved questions

  • Should we validate the event source per se, the event within the event source envelope, or both?
    • e.g. message body in API GW, message within SQS Array object, etc.
    • UPDATE: We'll create an extractor utility to not violate SRP
  • Should we also optionally serialize the event before validating?

Optional, stash area for topics that need further development e.g. TBD

@heitorlessa heitorlessa added RFC triage Pending triage from maintainers labels Jul 28, 2020
@ghost ghost removed the triage Pending triage from maintainers label Jul 28, 2020
@heitorlessa heitorlessa pinned this issue Jul 28, 2020
@nmoutschen
Contributor

It'd be nice to have vended schemata for AWS Services, but I question their utility. Would it be to filter when a Lambda function is triggered by an unexpected service? The schemata would have to be loose enough to accommodate future changes without breaking anything.

However, validating things like the APIGW body, EventBridge details, etc. would be a must-have for me, but it means having an understanding of these events and how to unpack them (e.g. json.loads() the APIGW body).

@heitorlessa
Contributor Author

heitorlessa commented Jul 29, 2020

UPDATE: Updated the RFC based on our internal discussion on this.

Yep, I agree with that and the feedback we got on Twitter too.

As we have to create a correlation ID utility too, it makes more sense to:

  • create a general purpose utility to extract the actual event from an event source envelope
    • This could also help in scenarios where you have an SNS payload inside an SQS envelope
    • Instead of auto-detecting the event source, we could accept an enum-like parameter; this saves cycles spent detecting event sources that won't typically change over time, and avoids maintaining schemas outside our control
  • the validator utility could use an enum-like parameter to define what event source it is, and validate the actual event payload itself by unwrapping it with the previous utility.
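
A minimal sketch of the enum-driven extraction described in the list above, including the SNS-inside-SQS case. The `Envelope` enum and `extract` function are hypothetical names, and the sketch assumes JSON message bodies with SNS raw message delivery disabled (so the SQS body is an SNS notification document).

```python
import json
from enum import Enum


def _sqs(event):
    # One payload per SQS record; the message is the record's `body` string.
    return [json.loads(record["body"]) for record in event["Records"]]


def _sns_in_sqs(event):
    # SNS wraps the published message in a JSON notification whose `Message`
    # field is itself a string, so each record needs two decoding steps.
    return [json.loads(notification["Message"]) for notification in _sqs(event)]


class Envelope(Enum):
    """Hypothetical enum naming the event source instead of auto-detecting it."""
    SQS = "sqs"
    SNS_IN_SQS = "sns_in_sqs"


_EXTRACTORS = {Envelope.SQS: _sqs, Envelope.SNS_IN_SQS: _sns_in_sqs}


def extract(event, envelope):
    """Return the list of payloads found inside the given envelope."""
    return _EXTRACTORS[envelope](event)


notification = {"Type": "Notification", "Message": json.dumps({"order_id": "123"})}
event = {"Records": [{"body": json.dumps(notification)}]}

payloads = extract(event, Envelope.SNS_IN_SQS)
```

Because the caller names the envelope explicitly, no time is spent inspecting the event to guess where it came from.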

If this sounds good, I'll update the RFC to reflect these and create a PR next week.

Thoughts?

@heitorlessa heitorlessa self-assigned this Aug 3, 2020
@nmoutschen
Contributor

We should also validate the event itself based on specific keys. I see a few examples that would benefit from that:

  • API Gateway where you need specific keys (headers, query parameters, etc.).
  • SNS/SQS where you need message attributes.
  • EventBridge where you need the details to have a specific schema.

@tmclaugh

tmclaugh commented Aug 5, 2020

To what @nmoutschen said above, and after our chat with Heitor: yes, validating both the event and the payload is valuable. What I'd like is a decorator for validation where I pass in a schema only to validate the event, or pass in an additional envelope, which would cause only the event payload to be validated.
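
The behaviour requested here could look roughly like the sketch below: the same decorator checks the whole event when given only a schema, and only the unwrapped payload when an envelope is also passed. All names are hypothetical, and `require_keys` is a toy stand-in for a compiled JSON Schema that checks required keys only.

```python
import functools
import json


def api_gateway_body(event):
    """Hypothetical envelope returning the decoded API Gateway body."""
    return json.loads(event["body"])


def require_keys(*keys):
    """Hypothetical stand-in for a compiled schema: checks required keys only."""
    def check(data):
        missing = [key for key in keys if key not in data]
        if missing:
            raise ValueError(f"missing required keys: {missing}")
    return check


def validator(schema, envelope=None):
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            # Without an envelope the whole event is checked; with one, only the payload.
            schema(envelope(event) if envelope else event)
            return handler(event, context)
        return wrapper
    return decorator


@validator(require_keys("headers", "body"))
def whole_event_handler(event, context):
    return "ok"


@validator(require_keys("order_id"), envelope=api_gateway_body)
def payload_handler(event, context):
    return "ok"
```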

@randude

randude commented Aug 13, 2020

Hey guys, really liking your tools, it makes life a lot easier for us Python AWS developers.
For validation I actually wrote a blog about how we solved that in our code:
https://medium.com/cyberark-engineering/aws-lambda-event-validation-from-zero-to-hero-2ca950acd2ea
We basically used Pydantic which offers excellent performance while being super readable.
We also use it for validating and parsing boto3 responses, not just AWS events or events between services.
What do you think? @heitorlessa

@heitorlessa
Contributor Author

This is really interesting, though 8.2MB more!

I need to dig into their docs a bit more as they seem to offer another package to generate Pydantic Models from JSON Schema, which could be a great value add. Apart from package size, we use fastjsonschema which is really fast, so the benchmarks they compared against do not hold ground with the lib we use.

From the little I saw, I believe the UX would change from the proposed design. How would you see the UX if we were to use pydantic here, @randude?

@ran-isenberg
Contributor

Well, the user can supply either a JSON schema or a pydantic model. A pydantic model has advantages because you can define non-JSON types like UUIDs, HTTP URLs, datetimes, email addresses, etc., and also define validators/root validators, which offer logical validation of the relationships between the variables in the schema and not just a pure value check. So this is a big plus, and that's why we used it. You can also add custom value validation, such as for an AWS region string; we used boto3 to check that the input is a valid region.

So I think that the user should be able to define pydantic schemas, and then you can wrap the whole try/except with your decorator. I.e., the input to the validator is a Pydantic class.
You can also add support for SQS/SNS/EventBridge/DynamoDB stream messages, where the original user message is wrapped inside an AWS event. I had to manually write the DynamoDB schema (and also maintain it), but if this utility can do this for me, that would be awesome. I think the user will have to import the SQSEvent schema and inherit from it, along these lines:

from pydantic import BaseModel

class AWSEvent(BaseModel):
    # ... envelope fields ...
    # has a message field which is the custom user data as a dict
    message: dict

class UserSchema(BaseModel):
    ...

class SQSUserSchema(AWSEvent):
    message: UserSchema

Another parameter for validation would need to be whether the input is a string or a dict; pydantic handles both, but you need to call different functions when validating.

Last thought - it would be awesome if AWS had some central repo for schemas where you can share schemas between teams and also look at the current AWS event schemas. I know that there's https://docs.aws.amazon.com/eventbridge/latest/userguide/eventbridge-schemas.html but that's too specific for eventbridge.

BTW, Let me know if you need some help with coding this ;)

@heitorlessa
Contributor Author

Thanks for clarifying @randude - And yes, if you could spare some time working a PR to POC this I'd love any help I can get -- I have to work on minor fixes for Logger on docs and autocomplete for PyCharm in the meantime, then I'll be back to this :)

I can totally see this being more helpful long-term. I have two thoughts I need to think through more carefully, so I'm recording them for posterity:

  • It seems that the models could be a separate package itself, or this could start with models for popular Lambda Event sources as scoped in this PR
  • Pydantic is primarily a parser, so the initially proposed validator and extractor might merge into a parser utility to make its intent more explicit
    • I'd also suspect the preferred UX would be to parse and return the parsed object back to the handler, or fail with a validation exception if there are any validators in the custom model
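
To illustrate the parse-and-return UX suggested in the list above, here is a dependency-free sketch in which a dataclass stands in for a Pydantic model. The `parser` decorator, the `Order` model, and this `ValidationError` class are all hypothetical; Pydantic provides its own models, validators, and exception types.

```python
import functools
from dataclasses import dataclass


class ValidationError(Exception):
    """Hypothetical error type; Pydantic raises its own ValidationError."""


@dataclass
class Order:
    order_id: str
    quantity: int

    def __post_init__(self):
        # Plays the role of a custom validator on the model.
        if not isinstance(self.quantity, int) or self.quantity < 1:
            raise ValidationError("quantity must be a positive integer")


def parser(model):
    """Parse the raw event into `model` and hand the parsed object to the handler."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            try:
                parsed = model(**event)
            except TypeError as exc:  # missing or unexpected fields
                raise ValidationError(str(exc)) from exc
            return handler(parsed, context)
        return wrapper
    return decorator


@parser(Order)
def lambda_handler(order, ctx):
    # The handler receives a typed object rather than a raw dict.
    return order.quantity * 2
```

The handler never sees an invalid event: parsing either yields a model instance or fails with a validation exception first.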

@ran-isenberg
Contributor

@heitorlessa It's me again.
This is my work account. I'll see what I can do. Hopefully I'll have some good news very soon.
Do you have a Slack/Skype account that I can ping with questions regarding the repo if I need to?

@heitorlessa
Contributor Author

@risenberg-cyberark Awesome - I'm on the AWS Developer Slack channel [Heitor Lessa (AWS)], feel free to DM me. If you're not there, DM me on Twitter and I can invite you to Slack.

ran-isenberg referenced this issue in ran-isenberg/aws-lambda-powertools-python Aug 19, 2020
ran-isenberg referenced this issue in ran-isenberg/aws-lambda-powertools-python Aug 22, 2020
ran-isenberg referenced this issue in ran-isenberg/aws-lambda-powertools-python Aug 23, 2020
ran-isenberg referenced this issue in ran-isenberg/aws-lambda-powertools-python Aug 25, 2020
@heitorlessa heitorlessa unpinned this issue Aug 28, 2020
@heitorlessa heitorlessa added the pending-release Fix or implementation already in dev waiting to be released label Sep 18, 2020
@heitorlessa
Contributor Author

The initial implementation of this RFC (Simple JSON Schema validator) has now been merged in #153. It'll be available in the next release, 1.6.0.

@to-mc
Contributor

to-mc commented Sep 23, 2020

Closing this issue as we've released the validator utility with 1.6.0. See RFC #147 for progress on integrating with pydantic.

@to-mc to-mc closed this as completed Sep 23, 2020
@heitorlessa heitorlessa removed the pending-release Fix or implementation already in dev waiting to be released label Oct 2, 2020
6 participants