Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update text embedding component #532

Merged
merged 5 commits into from
Oct 18, 2023
Merged

Conversation

PhilippeMoussalli
Copy link
Contributor

@PhilippeMoussalli PhilippeMoussalli commented Oct 18, 2023

PR that modifies the text embedding component:

  • Change the name of the component to embed_text for consistency (we already have an embed_image component)
  • Added VertexAI as a possible model to use
  • Changed the base image to pytorch since some of the dependencies in the requirement have cuda deps
  • Fixed some tests (mainly path related)

Tested the component with the CC rag pipeline and works fine (added the op here).

The component runs fine (exit code 0) however there seems to be an error that pops since the connection with the API does not seem to shutdown gracefully.

rag-cc-pipeline-embed_text-1     | [2023-10-18 13:58:45,227 | root | INFO] Writing data...
[########################################] | 100% Completed | 52.61 ss
rag-cc-pipeline-embed_text-1     | [2023-10-18 13:59:37,967 | fondant.executor | INFO] Saving output manifest to gs://soy-audio-379412_kfp-artifacts/custom_artifact/rag-cc-pipeline/rag-cc-pipeline-20231018155832/embed_text/manifest.json
rag-cc-pipeline-embed_text-1     | [2023-10-18 13:59:37,967 | fondant.executor | INFO] Writing cache key to gs://soy-audio-379412_kfp-artifacts/custom_artifact/rag-cc-pipeline/cache/dd3a24cb288cd0eaba12063d2885bc9d.txt
rag-cc-pipeline-embed_text-1     | Traceback (most recent call last):
rag-cc-pipeline-embed_text-1     |   File "src/python/grpcio/grpc/_cython/_cygrpc/aio/grpc_aio.pyx.pxi", line 110, in grpc._cython.cygrpc.shutdown_grpc_aio
rag-cc-pipeline-embed_text-1     |   File "src/python/grpcio/grpc/_cython/_cygrpc/aio/grpc_aio.pyx.pxi", line 114, in grpc._cython.cygrpc.shutdown_grpc_aio
rag-cc-pipeline-embed_text-1     |   File "src/python/grpcio/grpc/_cython/_cygrpc/aio/grpc_aio.pyx.pxi", line 78, in grpc._cython.cygrpc._actual_aio_shutdown
rag-cc-pipeline-embed_text-1     | AttributeError: 'NoneType' object has no attribute 'POLLER'
rag-cc-pipeline-embed_text-1     | Exception ignored in: 'grpc._cython.cygrpc.AioChannel.__dealloc__'
rag-cc-pipeline-embed_text-1     | Traceback (most recent call last):
rag-cc-pipeline-embed_text-1     |   File "src/python/grpcio/grpc/_cython/_cygrpc/aio/grpc_aio.pyx.pxi", line 110, in grpc._cython.cygrpc.shutdown_grpc_aio
rag-cc-pipeline-embed_text-1     |   File "src/python/grpcio/grpc/_cython/_cygrpc/aio/grpc_aio.pyx.pxi", line 114, in grpc._cython.cygrpc.shutdown_grpc_aio
rag-cc-pipeline-embed_text-1     |   File "src/python/grpcio/grpc/_cython/_cygrpc/aio/grpc_aio.pyx.pxi", line 78, in grpc._cython.cygrpc._actual_aio_shutdown
rag-cc-pipeline-embed_text-1     | AttributeError: 'NoneType' object has no attribute 'POLLER'

Normally this would be done with a client but in this case we're initializing the model directly

@@ -29,15 +29,16 @@ def test_run_component_test():

dataframe = pd.concat({"text": pd.DataFrame(data)}, axis=1, names=["text", "data"])

component = GenerateEmbeddingsComponent(
component = EmbedTextComponent(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not ideal that we're loading an actual model (which makes the test dependent on an internet connection), although I don't mind as much in the component tests since we currently run them manually

@RobbeSneyders
Copy link
Member

Let's look into suppressing the error later.

@RobbeSneyders RobbeSneyders merged commit 877f2c5 into main Oct 18, 2023
4 checks passed
@RobbeSneyders RobbeSneyders deleted the update-text-embedding-component branch October 18, 2023 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants