

@vatsrahul1001
Contributor

I noticed that creating embeddings with Cohere fails with a serialization error in Airflow 3: TypeError: cannot serialize object of type <class 'cohere.types.embed_by_type_response_embeddings.EmbedByTypeResponseEmbeddings'>. Since pickling has been removed from XCom in Airflow 3, this may be the cause of the error.



Contributor

@eladkal eladkal left a comment


Can you please amend the commit message? The changes touch multiple providers.

@vatsrahul1001
Contributor Author

The changes are specific to Cohere only. Some of the integration system tests come from other providers.

@eladkal
Contributor

eladkal commented Jun 4, 2025

Then what is the motivation for including them here? If there is another reason for these changes, it is best to make them in a separate PR.

@vatsrahul1001
Contributor Author

We have integration system DAGs for Pinecone and Weaviate that use the Cohere operator, and these tests depend on these changes.

@vatsrahul1001
Contributor Author

Looking into the failing tests.

@vatsrahul1001 vatsrahul1001 requested a review from eladkal June 6, 2025 03:08
@vatsrahul1001
Contributor Author

I will raise a separate PR for Pinecone and Weaviate.

@vatsrahul1001 vatsrahul1001 merged commit 3a7e521 into apache:main Jun 6, 2025
67 checks passed
@vatsrahul1001 vatsrahul1001 deleted the fix-cohere-provider branch June 6, 2025 06:10
@amoghrajesh
Contributor

@vatsrahul1001 #50867 tracks an effort to make serialization work for Cohere embeddings. IMO the error is not caused by the removal of pickling; it is more likely because we do not support serializing/deserializing Pydantic types, and EmbedByTypeResponseEmbeddings is essentially a Pydantic model.

sanederchik pushed a commit to sanederchik/airflow that referenced this pull request Jun 7, 2025
* make cohere provider AF3 compatible
@sjyangkevin
Contributor

Yes, I agree with @amoghrajesh. If Pinecone and Weaviate also use Pydantic, I believe a long-term, maintainable fix is to make XCom serde support Pydantic models. I have attached evidence showing how this issue can be solved once serde supports them.
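A minimal sketch of the idea, assuming a serializer that falls back to an object's model_dump() method (the Pydantic v2 method that returns plain Python data) whenever it meets a type it cannot encode. It uses only the standard library: FakeEmbeddings, encode_fallback and serialize are hypothetical names for illustration, not Airflow's serde API, and FakeEmbeddings merely imitates a Pydantic model.

```python
import json
from typing import Any, Dict, List, Optional


class FakeEmbeddings:
    """Hypothetical stand-in that only imitates a Pydantic v2 model's model_dump()."""

    def __init__(self, float_: Optional[List[List[float]]] = None) -> None:
        self.float_ = float_

    def model_dump(self) -> Dict[str, Any]:
        return {"float_": self.float_}


def encode_fallback(obj: Any) -> Any:
    # json.dumps calls this hook only for objects it cannot encode natively.
    dump = getattr(obj, "model_dump", None)
    if callable(dump):
        # Pydantic-style object: hand back plain dicts/lists instead.
        return dump()
    raise TypeError(f"cannot serialize object of type {type(obj)}")


def serialize(value: Any) -> str:
    """Hypothetical serializer; not Airflow's serde API."""
    return json.dumps(value, default=encode_fallback)


payload = FakeEmbeddings(float_=[[0.1, 0.2]])
print(serialize(payload))  # {"float_": [[0.1, 0.2]]}
```

This only covers the encoding direction; turning the stored dict back into the original model would also require recording which class the data came from.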

```diff
-return embeddings
+if response.embeddings.float_ is None:
+    raise ValueError("Embeddings response is missing float_ field")
+return response.embeddings.float_
```
Contributor

@sjyangkevin sjyangkevin Jun 8, 2025


I would like to share some findings about this change. As shown in https://github.com/cohere-ai/cohere-python/blob/main/src/cohere/types/embed_by_type_response_embeddings.py, the embeddings may be stored in fields other than float_. I also shared some thoughts in the comment; it would be better if XCom serde could handle the Pydantic model.
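One way to act on this concern, sketched under assumptions: instead of reading only float_, return whichever embedding type is populated. FakeEmbedByType is a hypothetical stand-in whose field names (float_, int8, uint8, binary, ubinary) are assumed from the linked cohere-python file; pick_embeddings and DEFAULT_ORDER are also hypothetical and not part of the Cohere provider.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class FakeEmbedByType:
    """Hypothetical stand-in for cohere's EmbedByTypeResponseEmbeddings.

    Field names are assumed from the cohere-python source; a field is
    assumed to be None unless that embedding type was requested.
    """

    float_: Optional[List[List[float]]] = None
    int8: Optional[List[List[int]]] = None
    uint8: Optional[List[List[int]]] = None
    binary: Optional[List[List[int]]] = None
    ubinary: Optional[List[List[int]]] = None


# Assumed preference order: plain floats first, then quantized types.
DEFAULT_ORDER = ("float_", "int8", "uint8", "binary", "ubinary")


def pick_embeddings(embeddings: FakeEmbedByType, order: Sequence[str] = DEFAULT_ORDER) -> list:
    """Return the first populated embedding field as a plain, XCom-friendly list."""
    for name in order:
        value = getattr(embeddings, name, None)
        if value is not None:
            return value
    raise ValueError("Embeddings response has no populated embedding field")


resp = FakeEmbedByType(int8=[[12, -7, 3]])
print(pick_embeddings(resp))  # [[12, -7, 3]]
```

Falling back through a fixed order is a judgment call; a hook that knows which embedding_types it requested could instead read exactly that field.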

Contributor


For now it's OK. I just added a comment so that people aren't confused: #51517


5 participants