Skip to content

Fix multipart body arrays #938

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jun 6, 2025

Conversation

micha91
Copy link
Contributor

@micha91 micha91 commented Jan 13, 2024

As described in #692 arrays of files are not handled correctly, if they are part of multipart/form-data. This is fixed in this PR by letting to_multipart return a List[Tuple[str, Any]] instead of a Dict[str, Any].

@micha91 micha91 force-pushed the fix-multipart-body-file-array branch from 651fddc to fe841d7 Compare January 13, 2024 10:52
@micha91 micha91 changed the title Fix multipart body file array fix: Fix multipart body file array Jan 13, 2024
@dbanty dbanty added the 🥚breaking This change breaks compatibility label Jan 15, 2024
@dbanty
Copy link
Collaborator

dbanty commented Jan 15, 2024

Thanks for adding this support! I believe this is a breaking change, since it's possible someone was using the JSON serialization behavior before (even though it seems wrong in general). I'll need to do some manual testing just to make sure I know how to describe the breaking change 😅. I'd also love to get someone to test a real running API and verify it all functions (maybe you've already done this?)

@micha91
Copy link
Contributor Author

micha91 commented Jan 15, 2024

Yes, I already use the updated model template as a custom template in a project. Unfortunately this project is not public yet and the API it is developed for, isn't publicly accessible, too

@ratgen
Copy link

ratgen commented Feb 15, 2024

Hi, just tested this out with an api I am working on. It works great.

@FabianSchurig
Copy link

@dbanty @micha91 Thank you so much for providing the openapi-python-client package and this PR!

I tested it in the following way:
pipx install git+https://github.com/openapi-generators/openapi-python-client.git@refs/pull/938/head

openapi-python-client update --path ../../swagger.json --config openapi-python-generator-config.yml

    "/upload": {
      "post": {
        "tags": [
          "Microsoft.KernelMemory.ServiceAssembly"
        ],
        "summary": "Upload a new document to the knowledge base",
        "description": "Upload a document consisting of one or more files to extract memories from. The extraction process happens asynchronously. If a document with the same ID already exists, it will be overwritten and the memories previously extracted will be updated.",
        "operationId": "UploadDocument",
        "requestBody": {
          "description": "Document to upload and extract memories from",
          "content": {
            "multipart/form-data": {
              "schema": {
                "type": "object",
                "properties": {
                  "index": {
                    "type": "string",
                    "description": "Name of the index where to store memories generated by the files."
                  },
                  "documentId": {
                    "type": "string",
                    "description": "Unique ID used for import pipeline and document ID."
                  },
                  "tags": {
                    "type": "object",
                    "additionalProperties": {
                      "type": "string"
                    },
                    "description": "Tags to apply to the memories extracted from the files."
                  },
                  "steps": {
                    "type": "array",
                    "items": {
                      "type": "string"
                    },
                    "description": "How to process the files, e.g. how to extract/chunk etc."
                  },
                  "files": {
                    "type": "array",
                    "items": {
                      "type": "string",
                      "format": "binary"
                    },
                    "description": "Files to process and extract memories from."
                  }
                }
              }
            }
          }
        },

In general it works!
I would greatly appreciate if we could have similar functionality to solve #692 in the main branch, soon.

@dbanty What do you think about an alternative where a user could just overwrite how a

"files": {
      "type": "array",
      "items": {
        "type": "string",
        "format": "binary"
      },

is handled?

This could either generate:

        files: Union[Unset, tuple[None, bytes, str]] = UNSET
        if not isinstance(self.files, Unset):
            _temp_files = []
            for files_item_data in self.files:
                files_item = files_item_data.to_tuple()

                _temp_files.append(files_item)
            files = (None, json.dumps(_temp_files).encode(), "application/json")

or

        files: Union[Unset, tuple[None, bytes, str]] = UNSET
        if not isinstance(self.files, Unset):
            files = files

The later works well in the case and should be just a minimal, non-breaking and optional change?

@micha91 micha91 force-pushed the fix-multipart-body-file-array branch from 453d017 to 5e8ad7a Compare May 13, 2025 07:42
@micha91
Copy link
Contributor Author

micha91 commented May 13, 2025

I reimplemented the changes based on the latest main branch. I tried to merge changes from main in here, but there were just too many changes in the end and the review view was just messed up completely.
@dbanty I would really like to get this merged. We use the generator to build our REST API client for Polarion, which is a proprietary tool so we cannot adjust its API. As the current template version on main does not work for arrays of files at all (leads to a JSON serialization error as bytes are not JSON serializable) we have to use a custom template to get a working client. And updating these custom templates for new generator releases is really hard work.
Because you asked before for a real word example - the full specification of the Polarion REST API can be found here and an endpoint using this array of files can be found here.

@dbanty
Copy link
Collaborator

dbanty commented May 17, 2025

@micha91 thanks for updating! I'm slowing making some changes as I get time. A couple things still left to look into:

  1. According to the docs we should actually be using the data param instead of the files param for everything that's not a file... which would simplify things a lot I think. But it also doesn't specify what types are allowed here, so I have to look into that.
  2. I want to add multiple files to the integration tests just to make sure we keep it working long term

Hoping to get this all wrapped up soon. Thanks for your hard work and patience on this 🙇

@dbanty
Copy link
Collaborator

dbanty commented May 31, 2025

Integration tests are updated, so left to do:

@dbanty
Copy link
Collaborator

dbanty commented Jun 3, 2025

@micha91 (and all following) thanks for all the work, I think this is almost ready! One thing I've noticed as I go through and test/fix more stuff around multipart is that having a top-level schema of an array really didn't work anyway. The test we had with this schema:

"multipart/form-data": {
    "schema": {
        "type": "array",
        "items": {
            "type": "string",
            "format": "binary"
        }
    }
}

Produced code like this:

for body_item_data in body:
        body_item = body_item_data.to_tuple()

        _body.append(body_item)

    _kwargs["files"] = _body

Which as far as I can tell is completely invalid, since each .to_tuple() will have the (file_name, content, mime_type) but no field name, like ("files" (file_name, content, mime_type)). So I just removed that.

Any reason you can think of that we should keep some sort of top-level array-of-files support? How would that even work in multipart?

@dbanty dbanty changed the title fix: Fix multipart body file array Fix multipart body arrays Jun 3, 2025
@micha91
Copy link
Contributor Author

micha91 commented Jun 3, 2025

@dbanty Thanks for your additional efforts on this! I will give the current implementation a try today to see if it also works in our real world client 👍

Any reason you can think of that we should keep some sort of top-level array-of-files support? How would that even work in multipart?

I think it is simply unsupported when using multipart, so it should be absolutely correct to drop it

@dbanty dbanty mentioned this pull request Jun 4, 2025
@micha91
Copy link
Contributor Author

micha91 commented Jun 5, 2025

@dbanty Thanks for your additional efforts on this! I will give the current implementation a try today to see if it also works in our real world client 👍

Any reason you can think of that we should keep some sort of top-level array-of-files support? How would that even work in multipart?

I think it is simply unsupported when using multipart, so it should be absolutely correct to drop it

Works as expected. I still need custom some custom templates, but that's due to issues in the specification

@dbanty dbanty enabled auto-merge June 6, 2025 01:18
@dbanty dbanty added this pull request to the merge queue Jun 6, 2025
Merged via the queue into openapi-generators:main with commit 305229b Jun 6, 2025
22 checks passed
@knope-bot knope-bot bot mentioned this pull request Jun 6, 2025
github-merge-queue bot pushed a commit that referenced this pull request Jun 6, 2025
> [!IMPORTANT]
> Merging this pull request will create this release

## Breaking Changes

- Raise minimum httpx version to 0.23

### Removed ability to set an array as a multipart body

Previously, when defining a request's body as `multipart/form-data`, the
generator would attempt to generate code
for both `object` schemas and `array` schemas. However, most arrays
could not generate valid multipart bodies, as
there would be no field names (required to set the `Content-Disposition`
headers).

The code to generate any body for `multipart/form-data` where the schema
is `array` has been removed, and any such
bodies will be skipped. This is not _expected_ to be a breaking change
in practice, since the code generated would
probably never work.

If you have a use-case for `multipart/form-data` with an `array` schema,
please [open a new
discussion](https://github.com/openapi-generators/openapi-python-client/discussions)
with an example schema and the desired functional Python code.

### Change default multipart array serialization

Previously, any arrays of values in a `multipart/form-data` body would
be serialized as an `application/json` part.
This matches the default behavior specified by OpenAPI and supports
arrays of files (`binary` format strings).
However, because this generator doesn't yet support specifying
`encoding` per property, this may result in
now-incorrect code when the encoding _was_ explicitly set to
`application/json` for arrays of scalar values.

PR #938 fixes #692. Thanks @micha91 for the fix, @ratgen and
@FabianSchurig for testing, and @davidlizeng for the original report...
many years ago 😅.

Co-authored-by: knope-bot[bot] <152252888+knope-bot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🥚breaking This change breaks compatibility
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants