[Data] Support write_sql() #38242

@c21

Description

For LLM applications, it's often necessary to write a vector index back to a database, e.g. pgvector (https://github.com/pgvector/pgvector).
It would be great if Ray Data supported writing to SQL databases, so that we can work with the index end-to-end.

The interface could be:

from typing import Any, Callable, Dict, Optional

def write_sql(
    self,
    sql: str,
    connection_factory: Callable[[], Connection],
    ray_remote_args: Optional[Dict[str, Any]] = None,
) -> None:
    ...

# With this interface, you can do the following:
import os

import psycopg
from pgvector.psycopg import register_vector

def create_connection():
    # Open a connection and register the pgvector type adapters on it.
    conn = psycopg.connect(os.environ["DB_CONNECTION_STRING"])
    register_vector(conn)
    return conn

ds = ...
ds.write_sql("INSERT INTO document (uri, body, embedding) VALUES (%s, %s, %s)", create_connection)

Use case

Ideally this would land by Ray 2.7, to support LLM applications.

Metadata

Labels

P1 (issue that should be fixed within a few weeks), data (Ray Data-related issues)