Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mapping_types config seems to be ignored on diff #511

Open
OskarStark opened this issue Nov 1, 2023 · 1 comment
Open

mapping_types config seems to be ignored on diff #511

OskarStark opened this issue Nov 1, 2023 · 1 comment

Comments

@OskarStark
Copy link

OskarStark commented Nov 1, 2023

Using the latest bundle version and Postgres.

I have the following entity

<?php

declare(strict_types=1);

namespace App\Entity;

use Doctrine\ORM\Mapping as ORM;
use LLPhant\Embeddings\VectorStores\Doctrine\DoctrineEmbeddingEntityBase;

#[ORM\Entity]
class Snippet extends DoctrineEmbeddingEntityBase
{
    #[ORM\Id]
    #[ORM\Column(type: 'integer')]
    #[ORM\GeneratedValue(strategy: 'AUTO')]
    public int $id;

    #[ORM\Column(type: VectorType::VECTOR, length: 1536)]
    public ?array $embedding;

    public function getId(): int
    {
        return $this->id;
    }
}

VectorType:

<?php

namespace LLPhant\Embeddings\VectorStores\Doctrine;

use Doctrine\DBAL\Exception;
use Doctrine\DBAL\Platforms\AbstractPlatform;
use Doctrine\DBAL\Platforms\PostgreSQLPlatform;
use Doctrine\DBAL\Types\Type;
use Doctrine\Deprecations\Deprecation;

class VectorType extends Type
{
    final public const VECTOR = 'vector';

    /**
     * @param  mixed[]  $column
     */
    public function getSQLDeclaration(array $column, AbstractPlatform $platform): string
    {
        // getName is deprecated since doctrine/dbal 2.13 see: https://github.com/doctrine/dbal/issues/4749
        // BUT it is the most stable way to check if the platform is PostgreSQLPlatform in a lot of doctrine versions
        // so we will use it and add a check for the class name in case it is removed in the future
        if (method_exists($platform, 'getName') && $platform->getName() !== 'postgresql') {
            throw Exception::notSupported('VECTORs not supported by Platform.');
        }

        if (! isset($column['length'])) {
            throw Exception::notSupported('VECTORs must have a length.');
        }

        if ($column['length'] < 1) {
            throw Exception::notSupported('VECTORs must have a length greater than 0.');
        }

        if (! is_int($column['length'])) {
            throw Exception::notSupported('VECTORs must have a length that is an integer.');
        }

        return sprintf('vector(%d)', $column['length']);
    }

    /**
     * @return float[]
     */
    public function convertToPHPValue(mixed $value, AbstractPlatform $platform): array
    {
        if ($value === null) {
            return [];
        }

        $value = is_resource($value) ? stream_get_contents($value) : $value;

        if (! is_string($value)) {
            throw Exception::notSupported('Error while converting VECTORs to PHP value.');
        }

        $convertedValue = explode(',', $value);
        $floatArray = [];
        foreach ($convertedValue as $singleConvertedValue) {
            $floatArray[] = (float) $singleConvertedValue;
        }

        return $floatArray;
    }

    public function convertToDatabaseValue(mixed $value, AbstractPlatform $platform): string
    {
        //If $value is not a float array throw an exception
        if (! is_array($value)) {
            throw Exception::notSupported('VECTORs must be an array.');
        }

        return VectorUtils::getVectorAsString($value);
    }

    public function getName(): string
    {
        return self::VECTOR;
    }
}
# config/packages/doctrine.yaml
doctrine:
    dbal:
        url: '%env(resolve:DATABASE_URL)%'
        mapping_types:
            vector: array
        types:
            vector: LLPhant\Embeddings\VectorStores\Doctrine\VectorType

the created migration:

final class Version20231101073053 extends AbstractMigration
{
    public function up(Schema $schema): void
    {
        $this->addSql('CREATE SEQUENCE snippet_id_seq INCREMENT BY 1 MINVALUE 1 START 1');
        $this->addSql('CREATE TABLE snippet (id INT NOT NULL, embedding vector(1536) NOT NULL, PRIMARY KEY(id))');
    }

    public function down(Schema $schema): void
    {
        $this->throwIrreversibleMigrationException();
    }
}

From my understanding, because I added mapping_types: vector: array it should look like:

        $this->addSql('CREATE SEQUENCE snippet_id_seq INCREMENT BY 1 MINVALUE 1 START 1');
        $this->addSql('CREATE TABLE snippet (id INT NOT NULL, embedding vector(1536) NOT NULL, PRIMARY KEY(id))');
+       $this->addSql('COMMENT ON COLUMN snippet.embedding IS \'(DC2Type:array)\'');

If I manually add this line, all works good, but if I want to generate the diff again I get another migration over and over again:

final class Version20231101073443 extends AbstractMigration
{
    public function up(Schema $schema): void
    {
        $this->addSql('ALTER TABLE snippet ALTER embedding TYPE vector(1536)');
        $this->addSql('COMMENT ON COLUMN snippet.embedding IS NULL');
    }

    public function down(Schema $schema): void
    {
        $this->throwIrreversibleMigrationException();
    }
}

cc @chr-hertel @goetas

@goetas
Copy link
Member

goetas commented Nov 1, 2023

From my understanding of mapping types, types and comments...

  • mapping_types are rules used by DBAL to convert SQL types to DBAL types.
  • types are rules used by DBAL to convert DBAL types into PHP Types (or better, to choose a converter that maps "SQL data" to PHP types/data).

as example:

  • postgres bigserial is converted into DBAL's bigint (thanks to built in mapping_types)
  • DBAL's bigint is later converted in PHP's string (thanks to built in types)

Thus in your case i would expect something as:

        mapping_types:
            vector: LLPhantVectorType
        types:
            LLPhantVectorType: LLPhant\Embeddings\VectorStores\Doctrine\VectorType

The fact that your implementation of VectorType returns an array is is unknown to doctrine.

The comment is used to infer a Doctrine type when there are multiple doctrine types mapped on the same SQL type. The classical example is:

Consider an SQL type varchar(255), that type can be used as the Php's string, MyEnum (a backed enum that uses strings as cases) or any PHP type that can be serialized into a string with max 255 chrs.

When DBAL encounters a vector SQL type, needs to decide which DBAL Type assign to it. If a comment is found, it will try to find that type, if not it will find a default.

You might have configs defined as follows:

        mapping_types:
            vector: LLPhantVectorType
        types:
            LLPhantVectorType: LLPhant\Embeddings\VectorStores\Doctrine\VectorType
            GoetasVectorType: GoetasVectorType

In this case, both GoetasVectorType and LLPhant\Embeddings\VectorStores\Doctrine\VectorType want to use a SQL vector, but DBAL needs to a way to decide which type to instantaite when an SQL column of type vector is found.

GoetasVectorType will have to override requiresSQLCommentHint. Columns that use GoetasVectorType will have a comment, columns that use LLPhant\Embeddings\VectorStores\Doctrine\VectorType will not.

When DBAL finds an SQL vector type will check for a comment, when found will try to instantiate the type in the comment. If no comment is found, will try to instantiate the type defined in the configs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants