[BUG] Projection model not working in aggregation #1017

Open
valentinoli opened this issue Sep 5, 2024 · 8 comments
Labels: bug, documentation, enhancement

Comments

@valentinoli
Contributor

Describe the bug

When I pass a projection_model to FindMany.aggregate() and that model has an id field with alias="_id", the projection fails with the following error:

_id
  Field required [type=missing, input_value={[**REDACTED**]}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/missing

This happens even if I set populate_by_name=True on the Pydantic model.

To Reproduce

from beanie import Document
from pydantic import BaseModel, ConfigDict, Field

class Doc(Document):
    id: str = Field(
        ...,
        alias="_id",
    )

class Model(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
    )

    id: str = Field(
        ...,
        alias="_id",
    )

async def query():
    results = await Doc.find().aggregate(aggregation_pipeline=[], projection_model=Model).to_list()
    return results

Expected behavior
This should work. I should not have to manually project each result from the query, like [Model(**res) for res in results]

Contributor

This issue is stale because it has been open 30 days with no activity.

@github-actions github-actions bot added the Stale label Oct 22, 2024
@staticxterm
Contributor

Hi, I am unable to reproduce this on Python 3.13 with Beanie 1.27.0 (or even 1.26.0) and Pydantic 2.9.2 (or 1.10.18).
Code:

import asyncio

from beanie import Document, init_beanie
from motor.motor_asyncio import AsyncIOMotorClient
from pydantic import BaseModel, ConfigDict, Field


class Doc(Document):
    id: str = Field(
        ...,
        alias="_id",
    )
    field_a: str
    field_b: str


class Model(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
    )

    id: str = Field(
        ...,
        alias="_id",
    )
    field_b: str


async def main():
    client = AsyncIOMotorClient("mongodb://localhost:27017")
    database = client["test-db"]

    await init_beanie(database, document_models=[Doc])

    # Run DB queries now.
    doc = Doc(id="1", field_a="a", field_b="b")
    result = await doc.save()
    print(result)

    results = (
        await Doc.find()
        .aggregate(aggregation_pipeline=[], projection_model=Model)
        .to_list()
    )
    print(results)


if __name__ == "__main__":
    asyncio.run(main())

Output:

id='1' revision_id=None field_a='a' field_b='b'
[Model(id='1', field_b='b')]

@mg3146

mg3146 commented Oct 23, 2024

I feel like I had this issue once when using Pydantic v1; it seems slightly familiar... don't quote me on that though.

@valentinoli
Contributor Author

Hey, thanks for the response. I will try to provide a better reproduction.

@valentinoli
Contributor Author

valentinoli commented Oct 25, 2024

It's a bit of a "weird" case, but below is the full reproduction.

Here is the document example:

{
  "_id": "my_id",
  "field_list": [
    {
      "id": "id_1",
      "field_a": "a",
      "field_b": "b"
    },
    {
      "id": "id_2",
      "field_a": "a",
      "field_b": "b"
    }
  ]
}

The aim is to get the matching nested object as the result (projected to exclude field_b):

{
  "id": "id_1",
  "field_a": "a"
}
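
To make the shape of the data at each stage concrete, here is a minimal pure-Python sketch that emulates `$unwind`, `$match`, and `$replaceRoot` on the sample document above. The helper names (`unwind`, `match_eq`, `replace_root`) are hypothetical and only approximate the MongoDB stages for this one case; no database is involved.

```python
from copy import deepcopy

SAMPLE_DOC = {
    "_id": "my_id",
    "field_list": [
        {"id": "id_1", "field_a": "a", "field_b": "b"},
        {"id": "id_2", "field_a": "a", "field_b": "b"},
    ],
}


def unwind(docs, field):
    """Emulate {"$unwind": "$<field>"}: emit one document per array element."""
    out = []
    for doc in docs:
        for item in doc.get(field, []):
            new_doc = deepcopy(doc)
            new_doc[field] = item
            out.append(new_doc)
    return out


def match_eq(docs, dotted_path, value):
    """Emulate {"$match": {"a.b": value}} for a simple equality on a dotted path."""

    def lookup(doc, path):
        current = doc
        for key in path.split("."):
            if not isinstance(current, dict) or key not in current:
                return None
            current = current[key]
        return current

    return [doc for doc in docs if lookup(doc, dotted_path) == value]


def replace_root(docs, field):
    """Emulate {"$replaceRoot": {"newRoot": "$<field>"}}."""
    return [doc[field] for doc in docs]


stage1 = unwind([SAMPLE_DOC], "field_list")
stage2 = match_eq(stage1, "field_list.id", "id_1")
stage3 = replace_root(stage2, "field_list")
print(stage3)  # [{'id': 'id_1', 'field_a': 'a', 'field_b': 'b'}]
```

Note that after the root is replaced, the document carries an `id` key but no `_id` key, which is the detail the rest of this report hinges on.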

Here is the code to reproduce. Notice how ProjectionModel.id has alias="_id" (it just does, don't ask why) and the model sets populate_by_name=True. So the code below should work, but it doesn't.

import asyncio

from beanie import Document, init_beanie
from motor.motor_asyncio import AsyncIOMotorClient
from pydantic import BaseModel, Field, ConfigDict


class ProjectionModel(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
    )

    id: str = Field(
        ...,
        alias="_id",
    )
    field_a: str


class Model(BaseModel):
    id: str
    field_a: str
    field_b: str


class Doc(Document):
    id: str = Field(
        ...,
        alias="_id",
    )
    field_list: list[Model]


async def query(list_item_id: str):
    results = (
        await Doc.find()
        .aggregate(
            aggregation_pipeline=[
                {
                    "$unwind": "$field_list",
                },
                {
                    "$match": {
                        "field_list.id": list_item_id,
                    },
                },
                {
                    "$replaceRoot": {
                        "newRoot": "$field_list",
                    }
                },
            ],
            projection_model=ProjectionModel,
        )
        .to_list()
    )
    return results


async def main():
    client = AsyncIOMotorClient("mongodb://localhost:27017")

    await init_beanie(
        database=client.test_db,
        document_models=[Doc],
    )

    # Run DB queries now.
    doc = Doc(
        id="my_id",
        field_list=[
            Model(
                id="id_1",
                field_a="a",
                field_b="b",
            ),
            Model(
                id="id_2",
                field_a="a",
                field_b="b",
            ),
        ],
    )
    result = await doc.save()
    print(result)

    try:
        results = await query(list_item_id="id_1")
        print(results)
    finally:
        # pass
        await doc.delete()


if __name__ == "__main__":
    asyncio.run(main())

Contributor

This issue is stale because it has been open 30 days with no activity.

@github-actions github-actions bot added the Stale label Nov 25, 2024
Contributor

github-actions bot commented Dec 9, 2024

This issue was closed because it has been stalled for 14 days with no activity.

@github-actions github-actions bot closed this as completed Dec 9, 2024
@staticxterm
Contributor

Hi @valentinoli,

Sorry for taking a while to reply. I managed to reproduce this with the code example from your last reply.
Thank you for the detailed description of the issue. I wouldn't technically classify this as a bug in Beanie, although it sits somewhere in the "gray zone". So I'll apply both the "bug" and "enhancement" labels, as well as "documentation".

As you rightly point out, the issue is setting the alias in the projection model:

class ProjectionModel(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
    )

    id: str = Field(
        ...,
        alias="_id",  # setting the alias triggers the issue
        # without it, "id" field is selected from the nested object in DB just fine
    )
    field_a: str

The offending line in Beanie:

document_projection[field.alias or name] = 1

Here, the alias is prioritized and used over the non-alias field, which results in "_id" being selected for the projection model.
And since the "_id" field doesn't exist for the nested model (instead the "id" field exists as that is how it is saved to DB), Pydantic throws an exception.
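
As a rough illustration of the two lines above, here is a pure-Python sketch of how an alias-first projection interacts with the nested document. `build_projection` and `apply_inclusion_projection` are hypothetical helpers: the first mirrors the `field.alias or name` expression, the second is a simplified stand-in for a `$project` stage. It assumes the projection is applied after the user's pipeline stages.

```python
def build_projection(fields):
    """Mirror `document_projection[field.alias or name] = 1`.

    `fields` maps each field name to its alias (or None); it stands in
    for the Pydantic field metadata that Beanie actually inspects.
    """
    return {alias or name: 1 for name, alias in fields.items()}


def apply_inclusion_projection(doc, projection):
    """Simplified $project emulation: keep only keys listed in the projection."""
    return {key: value for key, value in doc.items() if key in projection}


# ProjectionModel from the reproduction: id (alias "_id") and field_a.
projection = build_projection({"id": "_id", "field_a": None})

# Document shape after $replaceRoot: it has "id", not "_id".
nested = {"id": "id_1", "field_a": "a", "field_b": "b"}
projected = apply_inclusion_projection(nested, projection)

print(projection)  # {'_id': 1, 'field_a': 1}
print(projected)   # {'field_a': 'a'}
```

If this matches what happens in practice, the `id` key is dropped before validation, so `populate_by_name=True` has nothing to populate from, which would explain why that setting does not help here.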

Now, what bugs me about how projection models interact with Pydantic aliases (and of course, it's not really documented anywhere) is that we can't really know programmatically (I guess...) how the data is stored in the DB. So in essence: should the generated query use the alias or the field name?

I guess there is an improvement to be made with the Pydantic aliases, as we've had lots of issues related to that. But the trouble is, honestly, for the query, I don't know which field should be selected as we don't know that beforehand.

This issue definitely needs some accompanying documentation so that others won't run into the same issue again.

Hopefully someone comes up with a neat solution (if there is one).
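
In the meantime, one possible workaround (besides dropping the alias from the projection model, which works as noted above) is to copy the nested `id` into `_id` at the end of the pipeline, so that a projection on `_id` finds a value. This is a sketch only: `build_pipeline` is a hypothetical helper, it has not been run against a live database, and it assumes the projection is applied after the user-supplied stages.

```python
def build_pipeline(list_item_id: str) -> list[dict]:
    """Return the reproduction pipeline plus one extra renaming stage."""
    return [
        {"$unwind": "$field_list"},
        {"$match": {"field_list.id": list_item_id}},
        {"$replaceRoot": {"newRoot": "$field_list"}},
        # Copy the nested "id" into "_id" so a projection on "_id" has a value.
        {"$addFields": {"_id": "$id"}},
    ]


pipeline = build_pipeline("id_1")
print(pipeline[-1])  # {'$addFields': {'_id': '$id'}}
```

The returned list would be passed as `aggregation_pipeline=build_pipeline(list_item_id)` in the `query()` function from the reproduction.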

@staticxterm staticxterm reopened this Dec 31, 2024
@staticxterm staticxterm added the bug, documentation, and enhancement labels and removed Stale action requested labels Dec 31, 2024

3 participants