RFC: Add support for Pydantic v2 #2672

Closed
12 of 13 tasks
leandrodamascena opened this issue Jul 5, 2023 · 12 comments · Fixed by #2733

@leandrodamascena
Contributor

leandrodamascena commented Jul 5, 2023

Is this related to an existing feature request or issue?

#2427

Which Powertools for AWS Lambda (Python) utility does this relate to?

Parser

Summary

This RFC proposes changes to Parser’s Pydantic Models to support both Pydantic V2 and V1 without breaking changes.

In Powertools v3 (~EOL, date not settled yet), we will then remove support for Pydantic V1, and update our Parser’s dependency to use Pydantic v2.

stateDiagram-v2
    Models: Update Parser Pydantic models
    DocsV2: Document how to bring Pydantic v2
    Compatibility: Ensure Pydantic v1 and v2 can coexist
    POC: Make POC available for beta testers
    Compatibility --> Models
    Models --> DocsV2
    DocsV2 --> POC
    POC --> PR


Compatibility table

| Pydantic v1 | Pydantic v2 | V2 deprecation | V2 removed | In use? | Code change required |
| --- | --- | --- | --- | --- | --- |
| @validator | @field_validator | ✔️ | | ✔️ | ✔️ |
| @root_validator | @model_validator | ✔️ | | ✔️ | ✔️ |
| .parse_obj() | model_validate() | ✔️ | | ✔️ | |
| .json() | model_dump_json() | ✔️ | | ✔️ | |
| .dict() | model_dump() | ✔️ | | ✔️ | |
| .parse_raw() | model_validate_json() | ✔️ | | ✔️ | |

Use case

Pydantic V2, the latest version of Pydantic, has been launched with enhanced features and better performance. Customers using Powertools for AWS Lambda (Python) in their AWS Lambda functions have expressed interest in upgrading to Pydantic V2. To meet this need, Powertools must integrate with Pydantic V2.

This integration enables customers to update their workloads to use Pydantic V2 while ensuring a seamless transition for existing customers using Pydantic V1.


Proposal

To accommodate both Pydantic v1 and Pydantic v2, the proposed design is to refactor the parser models with minimal changes. These changes should work in both versions of Pydantic and be transparent to users.

TL;DR: Proposed actions summary

  • Set default value of None for optional fields
  • Keep @validator and @root_validator deprecated features with a note to remove in V3
  • Keep the deprecated .parse_obj() method with a note to remove in V3
  • Investigate why empty Dicts/Lists fail validation
  • Investigate .dict() and .json() removal
  • Handle TypeError validation
  • Investigate datetime coercion
  • Investigate development dependencies conflict
  • Document how to bring Pydantic v2 with Powertools
  • Document how to disable deprecation warnings for Pydantic V2
  • Create PR with POC

Optional fields must have default value

Pydantic v1

class SqsAttributesModel(BaseModel):
    ApproximateReceiveCount: str
    ApproximateFirstReceiveTimestamp: datetime
    MessageDeduplicationId: Optional[str]
    MessageGroupId: Optional[str]
    SenderId: str
    SentTimestamp: datetime
    SequenceNumber: Optional[str]
    AWSTraceHeader: Optional[str]

Pydantic v2

class SqsAttributesModel(BaseModel):
    ApproximateReceiveCount: str
    ApproximateFirstReceiveTimestamp: datetime
    MessageDeduplicationId: Optional[str] = None
    MessageGroupId: Optional[str] = None
    SenderId: str
    SentTimestamp: datetime
    SequenceNumber: Optional[str] = None
    AWSTraceHeader: Optional[str] = None

Validators are deprecated

Both @root_validator and @validator validators are deprecated in Pydantic V2 and will be removed in Pydantic V3. Pydantic recommends using the new @model_validator and @field_validator validators. However, we can continue using the deprecated validators in Powertools to avoid breaking changes and plan their removal for Powertools v3.

Pydantic v1

@root_validator(allow_reuse=True)
def check_message_id(cls, values):
    message_id, event_type = values.get("messageId"), values.get("eventType")
    if message_id is not None and event_type != "MESSAGE":
        raise TypeError("messageId is available only when the `eventType` is `MESSAGE`")
    return values

Pydantic v2

@root_validator(allow_reuse=True, skip_on_failure=True)
def check_message_id(cls, values):
    message_id, event_type = values.get("messageId"), values.get("eventType")
    if message_id is not None and event_type != "MESSAGE":
        raise ValueError("messageId is available only when the `eventType` is `MESSAGE`")
    return values

Another alternative is to check the Pydantic version and add the conditional import. We will also need to create a function to wrap the validator decorators and check the Pydantic version. This workaround may make the models harder to read and understand for maintenance purposes.
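
For illustration only, here is a minimal sketch of what that alternative could look like. The helper names `pydantic_major` and `compat_root_validator` are hypothetical and not part of Powertools. The sketch also assumes that `mode="before"` in Pydantic v2 is an acceptable stand-in for `pre=True` in v1, which means the validator runs before field validation, unlike the validators shown above.

```python
from typing import Any, Callable


def pydantic_major(version: str) -> int:
    """Return the major component of a version string such as '2.0.1'."""
    return int(version.split(".", 1)[0])


def compat_root_validator(version: str) -> Callable[[Any], Any]:
    """Pick a whole-model validator decorator for the given Pydantic version.

    Hypothetical helper: Pydantic is imported lazily, so only the branch
    matching the installed version is ever touched.
    """
    if pydantic_major(version) >= 2:
        from pydantic import model_validator  # only available in Pydantic v2

        return model_validator(mode="before")

    from pydantic import root_validator  # Pydantic v1 API

    return root_validator(pre=True, allow_reuse=True)
```

A model would then decorate its validator with `@compat_root_validator(pydantic.__version__)`. Every model needs this extra indirection, which is the readability cost mentioned above.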

Powertools Layer

The Powertools Layer is built with Pydantic v1, which can be a problem if a customer uses our Layer and brings Pydantic v2 as an external dependency.

In tests with the Powertools Layer plus Pydantic v2 installed as an external dependency, Lambda puts /var/task first in the search path. The external dependency therefore takes precedence over the one bundled in the Layer, which allows customers to bring their preferred Pydantic version.


Path

{"level":"INFO","location":"<module>:8","message":["/var/task","/opt/python/lib/python3.10/site-packages","/opt/python","/var/runtime","/var/lang/lib/python310.zip","/var/lang/lib/python3.10","/var/lang/lib/python3.10/lib-dynload","/var/lang/lib/python3.10/site-packages","/opt/python/lib/python3.10/site-packages"],"timestamp":"2023-07-05 09:04:52,691+0000","service":"service_undefined"}

Pydantic Version

{"level":"INFO","location":"lambda_handler:11","message":"Pydantic version -> 2.0.1","timestamp":"2023-07-05 09:04:52,694+0000","service":"service_undefined","xray_trace_id":"1-64a53234-0ca06edc18c523524775237c"}
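
The precedence seen in the logs above can be modelled with a small simplified sketch. It is not how the import system is implemented; it only shows that the first matching entry of the search path wins. The function name `first_location` and the set of install locations are assumptions for illustration.

```python
from typing import Iterable, Optional, Set


def first_location(search_path: Iterable[str], locations_with_package: Set[str]) -> Optional[str]:
    """Return the first search-path entry that contains the package."""
    for entry in search_path:
        if entry in locations_with_package:
            return entry
    return None


# Search path as logged above, trimmed to the leading entries.
lambda_sys_path = [
    "/var/task",
    "/opt/python/lib/python3.10/site-packages",
    "/opt/python",
    "/var/runtime",
]

# Assumption for illustration: Pydantic v2 is bundled with the function code
# (/var/task) and Pydantic v1 ships in the Layer's site-packages.
installed_in = {"/var/task", "/opt/python/lib/python3.10/site-packages"}

winner = first_location(lambda_sys_path, installed_in)
print(winner)
```

Because /var/task comes before the Layer directories, the copy bundled with the function is the one that gets imported.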

Warnings

  • To ensure a smooth transition and minimize disruptions for our users, we have temporarily suppressed the PydanticDeprecatedSince20 and PydanticDeprecationWarning warnings (related to these functions). This allows existing applications to continue functioning as expected without outputting warnings.

  • If needed, you can enable the warnings yourself with something like the code below. Reference: https://docs.python.org/3/library/warnings.html

from aws_lambda_powertools.utilities.parser import event_parser, BaseModel, envelopes
from aws_lambda_powertools.utilities.parser.models import (
    SqsModel,
)

from aws_lambda_powertools import Logger
import pydantic

import warnings
warnings.simplefilter('default')

Out of scope

Refactorings involving breaking change for customers who want to use v1. If there is something that involves breaking, it will be left out of this change.

We could use this opportunity to evaluate the performance of Pydantic V2 and potentially improve the performance of the Parser utility, though this is not required.


Potential challenges

Most of the challenges were addressed, and I was able to use the Powertools for AWS Lambda (Python) Parser utility with Pydantic v2 across several models. For the remaining challenges below, we still need to understand whether they are breaking changes.

Working with datetime fields

In Pydantic v1, an epoch value coerced into a datetime field carries a UTC offset (+00:00) and the tests work fine. However, in Pydantic v2 the resulting datetime has no UTC offset, causing our tests to fail.

Codebase

from datetime import datetime

from pydantic import BaseModel
import pydantic


class Model(BaseModel):
    datefield: datetime = None

epoch_time = 1659687279885

m = Model(
    datefield=epoch_time,
)

print(f"Pydantic version -> {pydantic.__version__}")
print(f"Raw epoch time -> {epoch_time}")
print(f"Raw pydantic field -> {m.datefield}")
print(f"Pydantic converted epoch time -> {int(round(m.datefield.timestamp() * 1000))}")

assert epoch_time == int(round(m.datefield.timestamp() * 1000))

Pydantic v1

/tmp/pydantic2 via 🐍 v3.10.6 (.env) on ☁️  (us-east-1) 
❯ python v1.py
Pydantic version -> 1.10.11
Raw epoch time -> 1659687279885
Raw pydantic field -> 2022-08-05 08:14:39.885000+00:00
Pydantic converted epoch time -> 1659687279885

Pydantic v2

/tmp/pydantic2 via 🐍 v3.10.6 (.env) on ☁️  (us-east-1) 
❯ python v2.py
Pydantic version -> 2.0.1
Raw epoch time -> 1659687279885
Raw pydantic field -> 2022-08-05 08:14:39.885000
Pydantic converted epoch time -> 1659683679885
Traceback (most recent call last):
  File "/tmp/pydantic2/v2.py", line 21, in <module>
    assert epoch_time == int(round(m.datefield.timestamp() * 1000))
AssertionError
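
The 3,600,000 ms gap in the v2 output is consistent with a naive datetime being interpreted in the machine's local time zone when `.timestamp()` is called. One possible workaround, sketched below, is to normalise naive values to UTC before converting back. The helper name `to_epoch_ms` is hypothetical, and the sketch assumes a naive value produced from an epoch always represents UTC.

```python
from datetime import datetime, timezone


def to_epoch_ms(value: datetime) -> int:
    """Convert a datetime to epoch milliseconds, treating naive values as UTC."""
    if value.tzinfo is None:
        # Assumption: a naive value produced from an epoch represents UTC.
        value = value.replace(tzinfo=timezone.utc)
    return int(round(value.timestamp() * 1000))


naive = datetime(2022, 8, 5, 8, 14, 39, 885000)  # shape printed under Pydantic v2
aware = naive.replace(tzinfo=timezone.utc)       # shape printed under Pydantic v1

print(to_epoch_ms(naive), to_epoch_ms(aware))
```

Both calls return the original epoch regardless of the local time zone, so an assertion like the one in the codebase above would pass in either version.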

Batch processing

Some Batch processing tests are failing and I need to investigate why.

FAILED tests/functional/test_utilities_batch.py::test_batch_processor_model_with_partial_validation_error - AttributeError: 'NoneType' object has no attribute 'message_id'
FAILED tests/functional/test_utilities_batch.py::test_batch_processor_dynamodb_context_model_with_partial_validation_error - AttributeError: 'NoneType' object has no attribute 'dynamodb'
FAILED tests/functional/test_utilities_batch.py::test_batch_processor_kinesis_context_parser_model_with_partial_validation_error - AttributeError: 'NoneType' object has no attribute 'kinesis'
FAILED tests/functional/test_utilities_batch.py::test_async_batch_processor_model_with_partial_validation_error - AttributeError: 'NoneType' object has no attribute 'message_id'
FAILED tests/functional/test_utilities_batch.py::test_async_batch_processor_dynamodb_context_model_with_partial_validation_error - AttributeError: 'NoneType' object has no attribute 'dynamodb'
FAILED tests/functional/test_utilities_batch.py::test_async_batch_processor_kinesis_context_parser_model_with_partial_validation_error - AttributeError: 'NoneType' object has no attribute 'kinesis'

Developer environment

Since some of our dependencies (cfn-lint / aws-sam-translator) have a requirement of Pydantic v1, we'll need to remove them from our development environment in order to accommodate Pydantic v2.

❯ poetry add "pydantic>=2.0"                       

Updating dependencies
Resolving dependencies... (0.6s)

Because no versions of aws-sam-translator match >1.68.0,<1.69.0 || >1.69.0,<1.70.0 || >1.70.0
 and aws-sam-translator (1.68.0) depends on pydantic (>=1.8,<2.0), aws-sam-translator (>=1.68.0,<1.69.0 || >1.69.0,<1.70.0 || >1.70.0) requires pydantic (>=1.8,<2.0).
And because aws-sam-translator (1.69.0) depends on pydantic (>=1.8,<2.0), aws-sam-translator (>=1.68.0,<1.70.0 || >1.70.0) requires pydantic (>=1.8,<2.0).
And because aws-sam-translator (1.70.0) depends on pydantic (>=1.8,<2.0)
 and cfn-lint (0.77.10) depends on aws-sam-translator (>=1.68.0), cfn-lint (0.77.10) requires pydantic (>=1.8,<2.0).
So, because aws-lambda-powertools depends on both pydantic (>=2.0) and cfn-lint (0.77.10), version solving failed.

Dependency resolution for Pydantic v1 and v2

Customers will run into a dependency conflict with either of the following requirements.txt files:

All Powertools features and Pydantic v2

aws-lambda-powertools[all]
pydantic # or pydantic>=2

Powertools parser feature and Pydantic v2

aws-lambda-powertools[parser]
pydantic # or pydantic>=2

Recommendation

We should include a new section in the documentation to explain how to use Pydantic v2 with Parser.

For example, customers should refrain from using [all] or [parser] when bringing Pydantic v2 as part of their dependencies.

  • aws-lambda-powertools[all] becomes aws-lambda-powertools[validation,tracer,aws-sdk]

Because of the Optional[str] = None breaking change in v2, we should keep our pydantic pinning to v1 until we launch v3 and move away. We cannot guarantee a customer is using additional Pydantic v1 features through Powertools - or followed our docs to the letter.

This also gives us room to recommend disabling warnings for deprecated features we're keeping for backwards compatibility (e.g., validators). This prevents Pydantic littering customers' logs ($$) when bringing Pydantic v2.


Ad-hoc test for pydantic v2 dep

Until Powertools V3 is a thing, we should guard against backwards incompatibility for newer Pydantic models, or changes to existing models that might be contributed externally from a Pydantic v2 customer.

Recommendation

Set up a new temporary GitHub Actions workflow to trigger on changes to Parser's models. We can use Nox to streamline dependencies and easily trigger Parser's unit tests.

Alternatively, within this temporary workflow, we could call make dev, remove Pydantic, install Pydantic v2, and run Parser's unit tests.

Nox's benefit is that it's more stable and easier to reason about, and it will lead us to address an unrelated area we aren't testing today: tests without optional dependencies (e.g., Batch without parser code).


No response

Alternative solutions

No response

Acknowledgment

@leandrodamascena leandrodamascena added RFC triage Pending triage from maintainers labels Jul 5, 2023
@heitorlessa heitorlessa removed the triage Pending triage from maintainers label Jul 5, 2023
@heitorlessa heitorlessa moved this from Triage to Working on it in Powertools for AWS Lambda (Python) Jul 5, 2023
@heitorlessa heitorlessa pinned this issue Jul 5, 2023
@leandrodamascena
Contributor Author

RFC updated with a test using Lambda Layer + Plan to release a version with Pydantic v2.

@rubenfonseca
Contributor

This is a mega job with the investigation. To be honest, I will be extremely surprised if we can keep our code working with both v1 and v2, but you seem to be close!

I believe the biggest problem is the deprecation of the root_validator. Even though it's deprecated, could we maybe keep it until we release Powertools v3 (using Pydantic v2 only)?

Another concern is customer code: unless we are very careful with our docs, custom models based on Pydantic v1 would fail on v2 unless they apply the same changes (e.g., optional fields must have a default value).

So do you still believe we can support both versions with the same code?

@heitorlessa
Contributor

Reading

@heitorlessa
Contributor

Firstly, congrats @leandrodamascena on your first RFC - bar raising to say the least. We should update our Maintainers playbook to include this RFC when it comes to breaking changes.

Some quick recommendations:

  • Bring actionable items more upfront in the proposal. For example:

TL;DR: Proposed actions summary

  • Set default value of None for optional fields
  • Keep @validator and @root_validator deprecated features with a note to remove in V3
  • Investigate datetime coercion
  • Investigate development dependencies conflict
  • Document how to bring Pydantic v2 with Powertools
  • Create PR with POC
  • Break down the Lambda Layer paragraph to ease reading, and the output to quickly notice Pydantic V2 version

Assuming we can indeed keep both v1 and v2 working with our models, I've noticed two major areas missing:

Dependency resolution for Pydantic v1 and v2

Customers will run into a dependency conflict with either of the following requirements.txt files:

All Powertools features and Pydantic v2

aws-lambda-powertools[all]
pydantic # or pydantic>=2

Powertools parser feature and Pydantic v2

aws-lambda-powertools[parser]
pydantic # or pydantic>=2

Recommendation

We should include a new section in the documentation to explain how to use Pydantic v2 with Parser.

For example, customers should refrain from using [all] or [parser] when bringing Pydantic v2 as part of their dependencies.

  • aws-lambda-powertools[all] becomes aws-lambda-powertools[validation,tracer,aws-sdk]

Because of the Optional[str] = None breaking change in v2, we should keep our pydantic pinning to v1 until we launch v3 and move away. We cannot guarantee a customer is using additional Pydantic v1 features through Powertools - or followed our docs to the letter.

This also gives us room to recommend disabling warnings for deprecated features we're keeping for backwards compatibility (e.g., validators). This prevents Pydantic littering customers' logs ($$) when bringing Pydantic v2.

Ad-hoc test for pydantic v2 dep

Until Powertools V3 is a thing, we should guard against backwards incompatibility for newer Pydantic models, or changes to existing models that might be contributed externally from a Pydantic v2 customer.

Recommendation

Set up a new temporary GitHub Actions workflow to trigger on changes to Parser's models. We can use Nox to streamline dependencies and easily trigger Parser's unit tests.

Alternatively, within this temporary workflow, we could call make dev, remove Pydantic, install Pydantic v2, and run Parser's unit tests.

Nox's benefit is that it's more stable and easier to reason about, and it will lead us to address an unrelated area we aren't testing today: tests without optional dependencies (e.g., Batch without parser code).

PS: We should try GitHub Discussions next time to have threaded conversations about areas of a given RFC.

@leandrodamascena leandrodamascena self-assigned this Jul 5, 2023
@leandrodamascena
Contributor Author

@ran-isenberg comment - #2427 (comment)

  1. the parse_raw/parse_obj funcs are now deprecated and need to be renamed

We will not rename functions marked as deprecated. We will keep these functions to avoid breaking changes for anyone wanting to use Powertools with Pydantic 1. Our plan is to remove support for Pydantic v1 functions in Powertools V3.

  2. Types: Empty Dicts/Lists now fail validation; they didn't before, so this might alter people's validations

By adding the default value None in the optional fields we avoid this problem.

  3. .dict() and .json() are also removed; we use them in several use cases, including idempotency and even the parser docs.

These functions were not removed, they were marked as deprecated and we can still use both in Pydantic v2.

  4. Validators: in Pydantic V2, when a TypeError is raised in a validator, it is no longer converted into a ValidationError. That means people need to change their code to catch TypeError (breaking change) OR we change the validator to raise ValidationError instead of TypeError.
  5. Issue number 4 also applies to code such as json.loads that raises TypeError; it's best to catch all exceptions and perhaps re-raise them as ValidationErrors.

Indeed, it's a good point, Ran. In this case I think we need to change the validators to raise ValueError/ValidationError instead of TypeError. I'll test it in more detail.

Thanks a lot for bringing up these points.
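
As a rough sketch of the TypeError point above, a helper used inside a validator could convert the TypeError from `json.loads` into a ValueError, which both Pydantic versions turn into a validation error. The function name `load_json_field` is hypothetical and not part of Powertools.

```python
import json
from typing import Any


def load_json_field(value: Any) -> Any:
    """Decode a JSON payload, surfacing every failure as ValueError."""
    try:
        # Malformed JSON raises json.JSONDecodeError, which already
        # subclasses ValueError, so it propagates unchanged.
        return json.loads(value)
    except TypeError as exc:
        # json.loads raises TypeError for non-string input such as None.
        raise ValueError(
            f"expected str, bytes or bytearray, got {type(value).__name__}"
        ) from exc
```

Callers then only need to handle one exception family, whichever Pydantic version is installed.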

@leandrodamascena
Contributor Author

I believe the biggest problem is the deprecation of the root_validator. Even though it's deprecated, could we maybe keep it until we release Powertools v3 (using Pydantic v2 only)?

Yeah @rubenfonseca, that is the idea to avoid breaking changes.

Another concern is customer code: unless we are very careful with our docs, custom models based on pydantic v1 would fail on v2 unless they apply the same changes (e.g: Optional fields must have default value).

Absolutely. Because of that, I added a table to make sure we cover this in our documentation.

So do you still believe we can support both versions with the same code?

Our goal is to do just that and try to cover every part of this RFC to avoid breaking changes.

@leandrodamascena
Contributor Author

Firstly, congrats @leandrodamascena on your first RFC - bar raising to say the least. We should update our Maintainers playbook to include this RFC when it comes to breaking changes.

Thanks a lot for this comment @heitorlessa! This is very important for me ❤️

Some quick recommendations:

  • Bring actionable items more upfront in the proposal. For example:

TL;DR: Proposed actions summary

  • Set default value of None for optional fields
  • Keep @validator and @root_validator deprecated features with a note to remove in V3
  • Investigate datetime coercion
  • Investigate development dependencies conflict
  • Document how to bring Pydantic v2 with Powertools
  • Create PR with POC

Added

  • Break down the Lambda Layer paragraph to ease reading, and the output to quickly notice Pydantic V2 version

Done

I added the two new sections.

@ran-isenberg
Contributor

Thx @leandrodamascena, sounds like a plan!
In our linters config, deprecated usage actually fails the pipeline. I haven't tested it with the Powertools config, but it's a point to take note of. We are going to increase major versions at our company and fix it. Anyway, eagerly waiting for Powertools v3, where Pydantic v1 will be retired.

@leandrodamascena
Contributor Author

Thx @leandrodamascena, sounds like a plan! In our linters config, deprecated usage actually fails the pipeline. I haven't tested it with the Powertools config, but it's a point to take note of.

I've added an item to take care of this, but if the customer decides to upgrade to Pydantic V2 then they can suppress the Pydantic warning by disabling the warning messages. We also can do this on the Powertools side.

from pydantic import validator, ValidationError, PydanticDeprecationWarning
import warnings
warnings.filterwarnings("ignore", category=PydanticDeprecationWarning)
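
If a global filter is too broad, the suppression can be scoped to a single call. The sketch below uses only the standard library. `StandInDeprecationWarning` is a hypothetical stand-in for `PydanticDeprecationWarning` (assumed here to subclass `DeprecationWarning`), and `call_quietly` and `legacy_validator` are illustrative names.

```python
import warnings


class StandInDeprecationWarning(DeprecationWarning):
    """Hypothetical stand-in for PydanticDeprecationWarning."""


def call_quietly(func):
    """Run func while ignoring only the stand-in deprecation category."""
    with warnings.catch_warnings():
        # The filter is restored automatically when the block exits.
        warnings.filterwarnings("ignore", category=StandInDeprecationWarning)
        return func()


def legacy_validator():
    """Simulate a call into an API that emits a deprecation warning."""
    warnings.warn("deprecated API used", StandInDeprecationWarning)
    return "validated"


print(call_quietly(legacy_validator))
```

Other warning categories are unaffected, so linters or test runs that promote unrelated warnings to errors keep working as configured.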

@github-actions github-actions bot added the pending-release Fix or implementation already in dev waiting to be released label Jul 8, 2023
@leandrodamascena leandrodamascena removed the pending-release Fix or implementation already in dev waiting to be released label Jul 8, 2023
@github-actions github-actions bot added the pending-release Fix or implementation already in dev waiting to be released label Jul 8, 2023
@heitorlessa heitorlessa removed the pending-release Fix or implementation already in dev waiting to be released label Jul 10, 2023
@leandrodamascena leandrodamascena linked a pull request Jul 11, 2023 that will close this issue
10 tasks
@github-project-automation github-project-automation bot moved this from Working on it to Coming soon in Powertools for AWS Lambda (Python) Jul 21, 2023
@github-actions
Contributor

⚠️COMMENT VISIBILITY WARNING⚠️

This issue is now closed. Please be mindful that future comments are hard for our team to see.

If you need more assistance, please either tag a team member or open a new issue that references this one.

If you wish to keep having a conversation with other community members under this issue feel free to do so.

@rubenfonseca
Contributor

Update: going to release this in the next few hours.

@github-actions github-actions bot added the pending-release Fix or implementation already in dev waiting to be released label Jul 21, 2023
@github-actions
Contributor

This is now released under 2.21.0 version!

@github-actions github-actions bot removed the pending-release Fix or implementation already in dev waiting to be released label Jul 21, 2023
@leandrodamascena leandrodamascena unpinned this issue Jul 21, 2023
@leandrodamascena leandrodamascena moved this from Coming soon to Shipped in Powertools for AWS Lambda (Python) Jul 21, 2023