description
Example for Semantic Search

Semantic Search

This tutorial demonstrates using the pgml SDK to create a collection, build a pipeline for vector search, upsert documents, run a sample query, and archive the collection when finished. Each step is shown in both Python and JavaScript.

Imports and Setup

Python

from pgml import Collection, Model, Splitter, Pipeline  
from datasets import load_dataset
from dotenv import load_dotenv
import asyncio

JavaScript

const pgml = require("pgml");

require("dotenv").config();

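The SDK is imported in both languages. The JavaScript version loads environment variables from the .env file immediately, while the Python version imports load_dotenv here and calls it at the start of main(). The Python load_dataset import is meant for loading sample data and is not used in the abbreviated steps below.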
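Since a missing environment variable usually surfaces only later as a connection error, it can help to check for it up front. The following is a minimal sketch, not part of the pgml SDK: the helper `require_env` is hypothetical, and the variable name `DATABASE_URL` and the example URL are assumptions to verify against the SDK documentation.

```python
import os
from collections.abc import Mapping


def require_env(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = env.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Demonstrated with an explicit mapping instead of the real environment.
# "DATABASE_URL" is an assumed variable name, and the URL is a placeholder.
url = require_env(
    "DATABASE_URL",
    {"DATABASE_URL": "postgres://user:password@localhost:5432/example"},
)
```

In a real script the call would be `require_env("DATABASE_URL")` placed right after `load_dotenv()`.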

Initialize Collection

Python

async def main():

  load_dotenv()

  collection = Collection("my_collection") 

JavaScript

const main = async () => {

  const collection = pgml.newCollection("my_javascript_collection");

  // The snippets in the following sections go here, inside main().

};

A collection object is created to represent the search collection.

Create Pipeline

Python

  model = Model()
  splitter = Splitter()

  pipeline = Pipeline("my_pipeline", model, splitter)

  await collection.add_pipeline(pipeline)

JavaScript

  const model = pgml.newModel();

  const splitter = pgml.newSplitter();

  const pipeline = pgml.newPipeline("my_javascript_pipeline", model, splitter);

  await collection.add_pipeline(pipeline); 

A pipeline encapsulating a model and splitter is created and added to the collection.
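To make the two roles in a pipeline concrete, here is a library-free sketch of what a splitter and an embedding model do conceptually. The names `split_text` and `toy_embed` are hypothetical and are not part of the pgml SDK. A real pipeline uses a trained model and its own splitting strategy, so treat the word-count "embedding" purely as a stand-in.

```python
from collections import Counter


def split_text(text: str, chunk_size: int = 5) -> list[str]:
    """Split text into chunks of at most chunk_size words (hypothetical helper)."""
    words = text.split()
    return [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def toy_embed(chunk: str, vocabulary: list[str]) -> list[int]:
    """Stand-in for a model: count how often each vocabulary word occurs."""
    counts = Counter(chunk.lower().split())
    return [counts[word] for word in vocabulary]


# The splitter turns one document into chunks; the "model" maps each chunk to a vector.
chunks = split_text("the quick brown fox jumps over the lazy dog", chunk_size=5)
vectors = [toy_embed(chunk, ["the", "fox", "dog"]) for chunk in chunks]
```

Each chunk ends up with its own vector, which is why a single document can match a query in more than one place.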

Upsert Documents

Python

  documents = [
    {"id": "doc1", "text": "..."},
    {"id": "doc2", "text": "..."}
  ]

  await collection.upsert_documents(documents)  

JavaScript

  const documents = [
    {
      id: "Document One",
      text: "...",
    },
    {
      id: "Document Two", 
      text: "...",
    },
  ];

  await collection.upsert_documents(documents);

Documents are upserted into the collection and indexed by the pipeline.
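"Upsert" means insert-or-update keyed on the document id. The sketch below illustrates that semantics with an in-memory dict standing in for the collection. The function is a hypothetical illustration, not the SDK's implementation, and how the SDK merges fields of an existing document is not covered here.

```python
def upsert_documents(store: dict[str, dict], documents: list[dict]) -> None:
    """Insert each document, replacing any stored document with the same id."""
    for doc in documents:
        store[doc["id"]] = doc


store: dict[str, dict] = {}
upsert_documents(
    store,
    [{"id": "doc1", "text": "first"}, {"id": "doc2", "text": "second"}],
)

# Upserting an existing id updates that document instead of creating a duplicate.
upsert_documents(store, [{"id": "doc1", "text": "first, revised"}])
```

Because the id is the key, stable ids let the same script be re-run without duplicating documents.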

Query

Python

  results = (
    await collection.query()
    .vector_recall("query", pipeline)
    .fetch_all()
  )

JavaScript

  const queryResults = await collection
    .query()
    .vector_recall(
      "query",
      pipeline,
    )
    .fetch_all();

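A vector similarity search is run against the collection: the query string is embedded through the given pipeline, and fetch_all() returns the matching results.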
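The idea behind vector recall can be sketched without a database: embed the query, score it against every indexed chunk, and keep the best matches. The example below uses cosine similarity, a common choice for this, with hand-written two-dimensional vectors. The function names and data are hypothetical, and this is not a statement about the metric or index the SDK uses.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(
    query_vec: list[float],
    indexed: list[tuple[str, list[float]]],
    limit: int = 2,
) -> list[tuple[float, str]]:
    """Score every chunk against the query and return the top matches."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:limit]


# Hand-written vectors stand in for real embeddings.
indexed = [
    ("chunk about cats", [1.0, 0.0]),
    ("chunk about dogs", [0.0, 1.0]),
    ("chunk about pets", [1.0, 1.0]),
]
top = rank([1.0, 0.1], indexed, limit=2)
```

A brute-force scan like this is only practical for small collections; vector databases typically use an index to avoid comparing against every chunk.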

Archive Collection

Python

  await collection.archive()

JavaScript

  await collection.archive();

The collection is archived when finished.


Main

Python

if __name__ == "__main__":
    asyncio.run(main())

JavaScript

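main().then((results) => {
  // `results` is whatever main() resolves to; return queryResults from main() to print the matches.
  console.log("Vector search Results: \n", results);
});

Boilerplate that runs the async main() function. The JavaScript version also logs the value that main() resolves to.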
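In Python, `asyncio.run()` returns whatever the coroutine returns, so the results can be printed the same way as in the JavaScript version. The sketch below shows the pattern with a placeholder body. The placeholder list stands in for the awaited SDK calls and is not real query output.

```python
import asyncio


async def main() -> list[str]:
    # Stand-in for the awaited SDK calls (add_pipeline, upsert_documents, query, archive).
    await asyncio.sleep(0)
    results = ["placeholder result"]
    return results


# asyncio.run() starts an event loop, runs main() to completion, and returns its result.
results = asyncio.run(main())
print("Vector search results:\n", results)
```

In a script intended to be imported as well as run, the last two lines would sit under the usual `if __name__ == "__main__":` guard.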
