Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when reading vector column of pgvector #340

Open
2 tasks done
np-kyokyo opened this issue Jul 12, 2024 · 8 comments · May be fixed by #341
Open
2 tasks done

Error when reading vector column of pgvector #340

np-kyokyo opened this issue Jul 12, 2024 · 8 comments · May be fixed by #341
Labels

Comments

@np-kyokyo
Copy link

np-kyokyo commented Jul 12, 2024

Things to check first

  • I have searched the existing issues and didn't find my bug already reported there

  • I have checked that my bug is still present in the latest release

Sqlacodegen version

3.0.0rc5

SQLAlchemy version

2.0.31

RDBMS vendor

PostgreSQL

What happened?

Issue:

When attempting to add a new column to the recommend_model."Image" table using the following SQL command:

ALTER TABLE "recommend_model"."Image" ADD COLUMN  "embedding" vector(1408) NOT NULL;

then, we encountered an issue where the pgvector.sqlalchemy.vector.VECTOR type is not being properly recognized in the context of the get_adapted_type method. This results in the column type falling back to sqlalchemy.sql.type_api.UserDefinedType instead.

Using pgvector 0.3.0
Traceback (most recent call last):
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/bin/sqlacodegen", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/cli.py", line 101, in main
    outfile.write(generator.generate())
                  ^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/generators.py", line 171, in generate
    self.fix_column_types(table)
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/generators.py", line 651, in fix_column_types
    column.type = self.get_adapted_type(column.type)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/generators.py", line 703, in get_adapted_type
    if new_coltype.compile(self.bind.engine.dialect) != compiled_type and (
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/type_api.py", line 1062, in compile
    return dialect.type_compiler_instance.process(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/compiler.py", line 960, in process
    return type_._compiler_dispatch(self, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/visitors.py", line 141, in _compiler_dispatch
    return meth(self, **kw)  # type: ignore  # noqa: E501
           ^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/compiler.py", line 7262, in visit_user_defined
    return type_.get_col_spec(**kw)
           ^^^^^^^^^^^^^^^^^^
AttributeError: 'UserDefinedType' object has no attribute 'get_col_spec'
def get_adapted_type(self, coltype: Any) -> Any:
    ...
    coltype = new_coltype
    if supercls.__name__ != supercls.__name__.upper():
        break

Analysis

The VECTOR type is recognized, but since it is an uppercase type, the loop continues to check for any camel case types. As a result, UserDefinedType is found, but since it does not have the get_col_spec method, an error occurs.

Potential Fixes

  1. Avoid to use UserDefinedType when get adapted type:

    if supercls.__visit_name__ == "user_defined":
        break
    coltype = new_coltype

    UserDefinedTypeが発見された場合、すでにnew_coltypeがあった場合はそちらを優先する方針がよさそう。
    It seems a good approach to prioritize an existing detected new_coltype if UserDefinedType is found.

    It seem to be below:

                 if supercls.__visit_name__ == "user_defined" and new_coltype is not None:
                     continue
           
                 ...
    
                 adapted_coltype = new_coltype
  2. Add Special Handling for VECTOR:
    Include a specific check for the VECTOR type:

    coltype = new_coltype
    if supercls.__name__ == "VECTOR":
        break

    VECTORの時点で現状、キャメルケースがくることはない。→しかし、今後来るようになる可能性はある?
    Currently, there are no camel case types coming after VECTOR. However, it is possible that they might come in the future.

  3. Remove the Uppercase Check:
    Replace the existing check with a break statement, allowing VECTOR to be recognized:

    coltype = new_coltype
    # if supercls.__name__ != supercls.__name__.upper():
    #     break

    → Due to the impact on other areas, this approach is rejected.

Output on success

embedding: Mapped[Any] = mapped_column(VECTOR(1408))

References

Commit where if supercls.name != supercls.name.upper() was introduced

ralated issue

#300

What is the difference between Text and TEXT types in SQLAlchemy?

ref. https://docs.sqlalchemy.org/en/20/core/type_basics.html#generic-camelcase-types

Removing if supercls.__name__ != supercls.__name__.upper(): break results in differences, such as parts previously outputting as Text now appearing as TEXT.

Database schema for reproducing the bug

(WIP)

@np-kyokyo np-kyokyo added the bug label Jul 12, 2024
@np-kyokyo np-kyokyo changed the title pgvectorのvectorカラムを読み込んだ時エラーを吐く Error when reading vector column of pgvector Jul 12, 2024
@np-kyokyo
Copy link
Author

np-kyokyo commented Jul 12, 2024

I'm dealing with this issue... I've been too busy to handle it.

np-kyokyo added a commit to np-kyokyo/sqlacodegen that referenced this issue Jul 12, 2024
@np-kyokyo np-kyokyo linked a pull request Jul 12, 2024 that will close this issue
@agronholm
Copy link
Owner

Did you install the project with the pgvector extra?

@np-kyokyo
Copy link
Author

np-kyokyo commented Jul 12, 2024

@agronholm
yes. this is minimal requirements.lock to reproduce the issue.

# generated by rye
# use `rye lock` or `rye sync` to update this lockfile
#
# last locked with the following flags:
#   pre: false
#   features: []
#   all-features: false
#   with-sources: false
#   generate-hashes: false

inflect==7.3.1
    # via sqlacodegen
more-itertools==10.3.0
    # via inflect
numpy==2.0.0
    # via pgvector
pgvector==0.3.1
psycopg2==2.9.9
sqlacodegen==3.0.0rc5
sqlalchemy==2.0.31
    # via sqlacodegen
typeguard==4.3.0
    # via inflect
typing-extensions==4.12.2
    # via sqlalchemy
    # via typeguard

@agronholm
Copy link
Owner

Others have successfully tested the pgvector integration. Any idea why it's not working for you?

@Tcintra
Copy link

Tcintra commented Jul 18, 2024

I downgraded pgvector to 0.2.5 and it worked fine, was having the same issue before!

@agronholm
Copy link
Owner

Pinpointing the specific commit where it starts to fail would probably help.

@chandlj
Copy link

chandlj commented Nov 21, 2024

@agronholm @np-kyokyo The exact commit is here: pgvector/pgvector-python@ba45eb2

It happened in the upgrade from pgvector==0.2.5 to pgvector==0.3.0. I am also having the same issue.

@agronholm
Copy link
Owner

But sqlacodegen does not directly import any of those types, so it doesn't make sense that such a change would break the integration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants