feat: Advanced parser utility (pydantic) #118

Merged
merged 21 commits into from
Oct 2, 2020

Conversation

ran-isenberg
Contributor

@ran-isenberg ran-isenberg commented Aug 19, 2020

Issue, if available: #147, #95

Description of changes:

Added a new validation module. It contains the validator decorator with three envelopes (EventBridge, DynamoDB, and custom user-defined) plus tests, and also the EventBridge and DynamoDB schemas.
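
To illustrate the envelope idea mentioned in the description, here is a minimal sketch, not the PR's actual code. It assumes an envelope simply unwraps the service-specific wrapper (for EventBridge, the `detail` key) and parses the inner payload into a user-defined model. `OrderCreated` and `eventbridge_envelope` are hypothetical names, and a plain dataclass stands in for a pydantic model so the sketch has no dependencies.

```python
from dataclasses import dataclass
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")


@dataclass
class OrderCreated:
    """Hypothetical user-defined payload model (stand-in for a pydantic model)."""

    order_id: str
    amount: int


def eventbridge_envelope(event: Dict[str, Any], model: Type[T]) -> T:
    """Unwrap an EventBridge event and parse its `detail` payload into `model`."""
    detail = event["detail"]  # EventBridge carries the business payload under "detail"
    return model(**detail)


event = {
    "source": "my.orders",
    "detail-type": "OrderCreated",
    "detail": {"order_id": "o-123", "amount": 42},
}
order = eventbridge_envelope(event, OrderCreated)
print(order)
```

The handler would then work with `order.order_id` and `order.amount` instead of digging through nested dict keys.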

Checklist

Breaking change checklist

RFC issue #95:

  • Migration process documented
  • Implement warnings (if it can live side by side)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@ran-isenberg ran-isenberg changed the title RFC: Validate incoming and outgoing events utility #95 feat: RFC: Validate incoming and outgoing events utility #95 Aug 19, 2020
@lgtm-com

lgtm-com bot commented Aug 19, 2020

This pull request introduces 5 alerts when merging cf1e795 into 81539a0 - view on LGTM.com

new alerts:

  • 3 for Signature mismatch in overriding method
  • 1 for First parameter of a method is not named 'self'
  • 1 for Explicit export is not defined

@ran-isenberg
Contributor Author

ran-isenberg commented Aug 19, 2020

open issues:

  1. update documentation.
  2. add SQS/SNS support
  3. decide whether the UX is good enough, i.e. the Lambda handler gets a new dict with two keys: custom and orig, where orig is the original, unparsed event (should it be parsed?) and custom is the pydantic-parsed object that the user defined.
    Note that for DynamoDB you get a list of dicts, where each dict holds a new and an old object (the new and old stream for each record, because DynamoDB sends a list of records).
  4. add more tests for schemas
  5. add examples
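
The UX under discussion in item 3 can be sketched roughly as follows. This is an illustrative sketch only, not the PR's implementation: `validator_sketch` and `UserPayload` are hypothetical names, and a dataclass stands in for the pydantic model.

```python
import functools
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class UserPayload:
    """Hypothetical user-defined model (stand-in for a pydantic model)."""

    name: str
    age: int


def validator_sketch(model: type) -> Callable:
    """Decorator that hands the handler both the raw and the parsed event."""

    def decorator(handler: Callable) -> Callable:
        @functools.wraps(handler)
        def wrapper(event: Dict[str, Any], context: Any) -> Any:
            parsed = model(**event)  # parsing step; raises if fields are missing
            return handler({"orig": event, "custom": parsed}, context)

        return wrapper

    return decorator


@validator_sketch(UserPayload)
def handler(event: Dict[str, Any], context: Any) -> str:
    # "custom" is the parsed object, "orig" is the untouched incoming dict
    return f"{event['custom'].name} / raw keys: {sorted(event['orig'])}"


result = handler({"name": "Ada", "age": 36}, None)
print(result)
```

The open question in the list is whether handlers should receive this two-key dict at all, or just the parsed object.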

@codecov-commenter

codecov-commenter commented Aug 19, 2020

Codecov Report

Merging #118 into develop will decrease coverage by 1.16%.
The diff coverage is 87.61%.

@@             Coverage Diff             @@
##           develop     #118      +/-   ##
===========================================
- Coverage    99.86%   98.69%   -1.17%     
===========================================
  Files           52       64      +12     
  Lines         2154     2380     +226     
  Branches        97      109      +12     
===========================================
+ Hits          2151     2349     +198     
- Misses           3       24      +21     
- Partials         0        7       +7     
Impacted Files                                         Coverage Δ
-----------------------------------------------------  ---------------------
...rtools/utilities/advanced_parser/envelopes/base.py  70.00% <70.00%> (ø)
...tilities/advanced_parser/envelopes/event_bridge.py  78.57% <78.57%> (ø)
...ertools/utilities/advanced_parser/envelopes/sqs.py  82.35% <82.35%> (ø)
...owertools/utilities/advanced_parser/schemas/sqs.py  84.00% <84.00%> (ø)
...s/utilities/advanced_parser/envelopes/envelopes.py  86.36% <86.36%> (ø)
...ools/utilities/advanced_parser/schemas/dynamodb.py  93.93% <93.93%> (ø)
...a_powertools/utilities/advanced_parser/__init__.py  100.00% <100.00%> (ø)
...ls/utilities/advanced_parser/envelopes/__init__.py  100.00% <100.00%> (ø)
...ls/utilities/advanced_parser/envelopes/dynamodb.py  100.00% <100.00%> (ø)
...bda_powertools/utilities/advanced_parser/parser.py  100.00% <100.00%> (ø)
... and 14 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d08de0b...10d0079.

@ran-isenberg ran-isenberg force-pushed the pydantic branch 2 times, most recently from fcc20f6 to 3a68baa Compare August 19, 2020 11:36
@ran-isenberg ran-isenberg force-pushed the pydantic branch 2 times, most recently from 3d8ba80 to 7da981f Compare August 22, 2020 16:14
* develop: (26 commits)
  docs: move tenets; remove extra space
  docs: use table for clarity
  docs: add blog post, and quick example
  docs: subtle rewording for better clarity
  docs: fix typos, log_event & sampling wording
  docs: make sensitive info more explicit with an example
  docs: create Patching modules section; cleanup response wording
  docs: move concurrent asynchronous under escape hatch
  chore: update internal docstrings for consistency
  fix: remove actual response from debug logs
  docs: grammar
  docs: bring new feature upfront when returning sensitive info
  chore: update changelog to reflect new feature
  chore: clarify changelog bugfix vs breaking change
  chore: remove/correct unnecessary debug logs
  chore: fix debug log adding unused obj
  fix: naming and staticmethod consistency
  improv: naming consistency
  fix: correct in_subsegment assertion
  feat: capture_response as metadata option #127
  ...
* develop:
  docs: Fix doc for log sampling (#135)
  fix(logging): Don't include `json_default` in logs (#132)
  chore: bump to 1.4.0
  docs: add Lambda Layer SAR App url and ARN
  fix: upgrade dot-prop, serialize-javascript
  fix heading error due to merge
  formatting for bash script
  add layer to docs and how to use it from SAR
  moved publish step to publish workflow after pypi push
  fix(ssm): Make decrypt an explicit option and refactoring (#123)
  change to eu-west-1 default region
  remove tmp release flag and set trigger to release published
  add overwrite flag for ssm
  add relase tag simulation
  more typos
  fix typo in branch trigger
  fix indent, yaml ...
  line endings
@heitorlessa
Contributor

Notes to self to review on Friday

  • Customers shouldn't need to import pydantic to define a model/schema
  • Revisit Schema, Envelope and Model naming convention - There could be confusion here
  • Customers shouldn't need to import ValidationError from pydantic
  • Revisit DynamoDB Schema names - Schema, Scheme, Record, and DynamoDBStream might be a better name
  • Revisit DynamoDB streams complexity baseline (failing)
  • Review high level function that parses and validates (not a decorator)
  • Revisit validator to return parsed event by default instead of custom/orig event keys
  • Revisit validator option to not return parsed event but original one (prevent breaking those relying on event being a dictionary)
  • Read docs and derive ideas to simplify usage

@jplock

jplock commented Aug 26, 2020

Is there a possibility of implementing event schema validation using the existing fastjsonschema dependency instead of introducing pydantic?

@ran-isenberg
Contributor Author

ran-isenberg commented Aug 26, 2020

@jplock yes you can, but pydantic is much more than just a schema validator. It gives users advanced validation capabilities: more type checking (types that JSON doesn't provide), custom logic checks on values (beyond empty or null), validation of custom relationships between parameters, and more. In this PR, we don't only validate but also give back the parsed object (basically a pydantic dataclass).

We also give users an option to just validate and not get back a parsed object.
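
As a rough illustration of the capabilities described above, here is a small sketch that assumes pydantic is installed. The `Order` model and its fields are hypothetical. It shows type coercion (a numeric string becomes an `int`), a value constraint beyond "not empty", and the parsed object coming back with typed attributes.

```python
from pydantic import BaseModel, Field, ValidationError


class Order(BaseModel):
    """Hypothetical model: typed fields plus a value constraint."""

    order_id: int
    quantity: int = Field(..., gt=0)  # must be strictly positive
    description: str


# Valid input: the string "10" is coerced to the integer 10.
order = Order(**{"order_id": "10", "quantity": 2, "description": "keyboard"})
print(order.order_id, order.quantity)

# Invalid input: quantity violates the gt=0 constraint.
try:
    Order(**{"order_id": 11, "quantity": 0, "description": "mouse"})
except ValidationError as exc:
    rejected = True
    print(exc)
else:
    rejected = False
```

A JSON Schema validator could express the `quantity` constraint as well; the difference highlighted in the comment is that here the caller gets a typed object back rather than the original dict.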

@lgtm-com

lgtm-com bot commented Aug 26, 2020

This pull request introduces 1 alert when merging bce7aab into 8da0cce - view on LGTM.com

new alerts:

  • 1 for Unused local variable

@heitorlessa
Contributor

heitorlessa commented Aug 27, 2020

cc @cakepietoast @alexcasalboni your thoughts are most welcome here too

@jplock suppose we were to use Pydantic (parser+validation), how would you see yourself using this feature? If not, what would be helpful?

I'm still in awe of Pydantic's flexibility, its ability to provide runtime safety for customers, and the auto-complete it gives customers on common event sources, for which they typically hop across multiple doc pages to find the right format.

However, the UX is not easy to use yet. I'm looking at the PR more carefully tomorrow, and will block Monday afternoon to think about ways to simplify this, as Ran did a great job in pulling this together already.


Blog post written by Ran on what Pydantic brings to the Serverless experience: https://medium.com/cyberark-engineering/aws-lambda-event-validation-from-zero-to-hero-2ca950acd2ea

Pros

  • Parses popular event sources and provides auto-complete to event data attributes instead of a dict
  • Allows customers to define validation more easily, including runtime type checker with helpful errors
  • Plays nicely with popular IDEs as well as deriving JSON Schemas from models

Cons

  • Envelope, Schema, and Model are intertwined and a bit confusing to use (UX)
  • Requires customers to understand Pydantic to use it effectively (learning curve, which is what we aim to lower with this lib)
  • Additional dependency (8.2MB), and need to profile/benchmark cold start impact
    • Perhaps we can make an optional dependency using Extras?
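
One way the optional-dependency idea could look at runtime is sketched below. This is an assumption-laden sketch, not the project's packaging: `require_extra` is a hypothetical helper, and the extra name `pydantic` in the install hint is illustrative only.

```python
import importlib
from types import ModuleType


def require_extra(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency or fail with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # The package/extra naming in this hint is hypothetical.
        raise ImportError(
            f"'{module_name}' is required for this utility; "
            f"install it with: pip install 'aws-lambda-powertools[{extra}]'"
        ) from exc


# A stdlib module is always importable, so this succeeds.
json_module = require_extra("json", "pydantic")

# A module that does not exist triggers the helpful error.
try:
    require_extra("not_a_real_module_for_this_sketch", "pydantic")
except ImportError as exc:
    message = str(exc)
    print(message)
```

With this pattern, customers who never touch the parser would not pay the package-size cost, which is the concern raised in the Cons list.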

I fear we might add a powerful feature that not many customers will be able to leverage to its full extent. If we can't find a middle ground by early next week, this will be a good opportunity to break this into two simpler utilities (parser, validator), and add simplicity to this project's Tenets.

@jplock

jplock commented Aug 27, 2020

@heitorlessa I like the idea of event schema validation, but pydantic feels too heavy for me. All I’d want to do is validate a schema and get a dict. I wouldn’t personally use the full models. That’s why I thought we could build this feature using fastjsonschema as a first release.

@michaelbrewer
Contributor

michaelbrewer commented Aug 28, 2020

@heitorlessa - I don't like having any kind of extra dependencies. Currently, with no extra dependencies, we are already at 8.3M

I see a few options:

  1. Docs: A directory listing of companion libraries, like a Lambda Powertools Awesome Collection, with very thin wrapper implementations, such as getting Chalice + Powertools or Lambda Proxy + Powertools to work. This could be a great resource of cookbook examples. Maybe there is already something like this?
  2. Extras: An extension utility that works when you opt in to that additional library dependency, but does not require common use cases to include any extra dependencies. In fact, do we absolutely need fastjsonschema?
  3. CLI tool: A Powertools CLI tool that generates models from a schema file. The CLI tool itself can be quite bloated, and it generates rich but lightweight model objects from a schema file, with a collection of common use cases like an API Gateway Request / Response. These generated model objects should not need any extra dependencies, and should be treated as a read-only resource built as part of the makefile.

I prefer a combination of 1 and 3.


Current dependencies:

name                                         summary
-------------------------------------------  --------------------------------------------------------------------------------------------------------------------------------------------------------
aws_lambda_powertools                        Python utilities for AWS Lambda functions including but not limited to tracing, logging and custom metric
├── aws-xray-sdk<3.0.0,>=2.5.0               The AWS X-Ray SDK for Python (the SDK) enables Python developers to record and emit information from within their applications to the AWS X-Ray service.
│   ├── botocore>=1.11.3                     Low-level, data-driven core of boto 3.
│   │   ├── docutils<0.16,>=0.10             Docutils -- Python Documentation Utilities
│   │   ├── jmespath<1.0.0,>=0.7.1           JSON Matching Expressions
│   │   ├── python-dateutil<3.0.0,>=2.1      Extensions to the standard Python datetime module
│   │   │   └── six>=1.5                     Python 2 and 3 compatibility utilities
│   │   └── urllib3<1.26,>=1.20              HTTP library with thread-safe connection pooling, file post, and more.
│   ├── future                               Clean single-source support for Python 3 and 2
│   ├── jsonpickle                           Python library for serializing any arbitrary object graph into JSON
│   │   └── importlib-metadata               Read metadata from Python packages
│   │       └── zipp>=0.5                    Backport of pathlib-compatible object wrapper for zip files
│   └── wrapt                                Module for decorators, wrappers and monkey patching.
├── boto3<2.0,>=1.12                         The AWS SDK for Python
│   ├── botocore<1.18.0,>=1.17.50            Low-level, data-driven core of boto 3.
│   │   ├── docutils<0.16,>=0.10             Docutils -- Python Documentation Utilities
│   │   ├── jmespath<1.0.0,>=0.7.1           JSON Matching Expressions
│   │   ├── python-dateutil<3.0.0,>=2.1      Extensions to the standard Python datetime module
│   │   │   └── six>=1.5                     Python 2 and 3 compatibility utilities
│   │   └── urllib3<1.26,>=1.20              HTTP library with thread-safe connection pooling, file post, and more.
│   ├── jmespath<1.0.0,>=0.7.1               JSON Matching Expressions
│   └── s3transfer<0.4.0,>=0.3.0             An Amazon S3 Transfer Manager
│       └── botocore<2.0a.0,>=1.12.36        Low-level, data-driven core of boto 3.
│           ├── docutils<0.16,>=0.10         Docutils -- Python Documentation Utilities
│           ├── jmespath<1.0.0,>=0.7.1       JSON Matching Expressions
│           ├── python-dateutil<3.0.0,>=2.1  Extensions to the standard Python datetime module
│           │   └── six>=1.5                 Python 2 and 3 compatibility utilities
│           └── urllib3<1.26,>=1.20          HTTP library with thread-safe connection pooling, file post, and more.
└── fastjsonschema<2.15.0,>=2.14.4           Fastest Python implementation of JSON schema

@heitorlessa
Contributor

Thanks @jplock and @michaelbrewer for your inputs, much appreciated.

We'll review this today with the core team and circle back with a solution - As of now, this doesn't meet our Keep It Lean tenet however useful this is.

@ran-isenberg
Contributor Author

I'll add my two cents here if you don't mind :)
I introduced pydantic to my organisation half a year ago. It's been well received and we have had an excellent experience with it.
It's very easy to define a schema. You don't have to use the bells and whistles. You just inherit from the BaseModel class and define parameters with Python's typing module. It produces much more readable code than a dict schema. You can also extend schemas, import them, and keep a repo just for schemas. What I like here is that as an end user I have an option: I can define a basic schema or a more sophisticated one.

Regarding the usage: per @heitorlessa's request, I added the validate function. It's not the decorator; it's a simpler version that takes a schema and returns a parsed object, just like you wanted.

Regarding the extra size: as an end user, if I'm getting good performance and more usability options, I really don't mind.

I think this repo can set a high standard for validation. It's a great opportunity and I believe that pydantic can help with that.

@heitorlessa
Contributor

The format is now solved upstream in develop - rebasing should work.

For the Pytest on examples, it's best to completely remove the example folder from the repo, since we now have a cookiecutter for that purpose.

I'll push that change to develop later and ping here for rebasing

@ran-isenberg ran-isenberg force-pushed the pydantic branch 2 times, most recently from 00e8844 to eb3153f Compare September 25, 2020 17:11
@ran-isenberg
Contributor Author

@heitorlessa all ready for ya ;)

Contributor

@heitorlessa heitorlessa left a comment

New user experience looks GREAT! Added some comments to help us maintain as not all maintainers are well versed with Pydantic, so any additional context helps.

@heitorlessa
Contributor

heitorlessa commented Oct 2, 2020

THANK YOU @risenberg-cyberark for all those changes, UX improvements, and additional comments you've made.

I'm gonna merge this as-is now. As agreed, I'll make a PR to write the docs for this feature and get your review before merging it.

As you've already spent a ton of effort, I'll create another PR for the following minor changes:

  • Remove validators. We need a universal schema owned by Lambda for each event source, or else we'll likely break some customers if a few services change, as we won't be notified.
  • Snake_case. IIRC, EventBridge detailtype was the only odd one out but I'll double check if there are others
  • Refactor test event data. Data_classes and validator utility are partially using the same events, so we'll refactor tests to provide events to all tests as fixtures.
  • Kwargs over args. Some internal code is using args over kwargs. Moving to kwargs makes it easier to refactor, and the order of args wouldn't matter.
  • Docstrings. Enhance Example section in docstrings.
  • Remove flake8 polyfill. Double check whether a fresh Python env hits the issue you hit with that, since we don't use it
  • Raise own Validation Exception. Customers wanting to catch that exception (e.g. in custom middlewares) currently have to find and import the Pydantic exception to catch it properly
  • Enhance imports. Envelopes and a few others can be turned into high level imports to clean up customers' code, plus DX.
  • Test coverage. Validation errors aren't being tested as of now.
  • [Optional] Enhancing envelope selection. Time allowing, I'll experiment with providing an experience similar to the one we have in the validator utility's envelope: envelope.<event_source>.
  • [Optional] Create a standalone parser. Time allowing, I'll experiment with creating a standalone parser function with the same functionality as the decorator
  • [Optional] Support for API Gateway, Kinesis, S3, and SNS. Likely to be after 1.7.0, but time allowing we can do that sooner.
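
Two of the follow-ups above, a library-owned validation exception and a standalone parser function, could be combined along these lines. This is a sketch under stated assumptions, not the follow-up PR's code: `SchemaValidationError`, `parse`, and `Payment` are hypothetical names, and a dataclass that raises `ValueError` stands in for a pydantic model. To my knowledge pydantic's `ValidationError` subclasses `ValueError`, so the same `except` clause should also cover real models, but treat that as an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")


class SchemaValidationError(Exception):
    """Hypothetical library-owned error, so callers need not import pydantic."""


@dataclass
class Payment:
    """Stand-in model that validates itself on construction."""

    amount: int

    def __post_init__(self) -> None:
        if not isinstance(self.amount, int) or self.amount <= 0:
            raise ValueError("amount must be a positive integer")


def parse(event: Dict[str, Any], model: Type[T]) -> T:
    """Standalone (non-decorator) parser: returns a model or raises our own error."""
    try:
        return model(**event)
    except (TypeError, ValueError) as exc:
        # TypeError covers missing/unexpected keys; ValueError covers bad values.
        raise SchemaValidationError(
            f"event does not match {model.__name__}: {exc}"
        ) from exc


payment = parse({"amount": 5}, Payment)

try:
    parse({"amount": -1}, Payment)
except SchemaValidationError as exc:
    error_text = str(exc)
    print(error_text)
```

Chaining with `from exc` keeps the original error available on `__cause__` for anyone who still wants the underlying details.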

@heitorlessa heitorlessa merged commit c7a584f into aws-powertools:develop Oct 2, 2020
@ran-isenberg
Contributor Author

@heitorlessa can you maybe put them in comments? I'd hate for that logic to be forgotten. These are connections that JSON schemas can't really define (only document).

@ran-isenberg ran-isenberg deleted the pydantic branch October 2, 2020 13:11
heitorlessa referenced this pull request in heitorlessa/aws-lambda-powertools-python Oct 4, 2020
@heitorlessa
Contributor

@heitorlessa can you maybe put them in comments? I'd hate for that logic to be forgotten. These are connections that JSON schemas can't really define (only document).

Sure thing - Commented them out, and added a link to this discussion for history purposes.

heitorlessa added a commit that referenced this pull request Oct 14, 2020
chore: ease maintenance of upcoming parser #118
@heitorlessa heitorlessa mentioned this pull request Oct 14, 2020
19 tasks
@mwarkentin

@heitorlessa are the extras required for this functionality included in the SAR app / lambda layer? Or do those need to be handled separately? Might be worth adding a note in the docs, not sure if that would be under the parser docs or the lambda layer installation docs?

@heitorlessa
Contributor

@heitorlessa are the extras required for this functionality included in the SAR app / lambda layer? Or do those need to be handled separately? Might be worth adding a note in the docs, not sure if that would be under the parser docs or the lambda layer installation docs?

Hey @mwarkentin - the latter, handled separately as described in the parser docs.

Unsure whether it's worth creating a separate Layer for that, tbh, given the additional operational overhead. In V2, we'd like to make each utility pip installable if a theory works out; then we could consider multiple layers, I suppose.

Happy to discuss more in a separate issue tho ;)

Labels
feature New feature or functionality
Projects
None yet
8 participants