---
description: Example for Semantic Search
---
This tutorial demonstrates using the pgml SDK to create a collection, add documents, build a pipeline for vector search, make a sample query, and archive the collection when finished. It loads sample data, indexes questions, times a semantic search query, and prints formatted results.
Python
from pgml import Collection, Model, Splitter, Pipeline
from datasets import load_dataset
from dotenv import load_dotenv
import asyncio
JavaScript
const pgml = require("pgml");
require("dotenv").config();
The SDK is imported and environment variables are loaded.
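Because the SDK depends on settings loaded from the environment, it can help to fail early with a clear message when one is missing. The following is a minimal sketch and not part of the pgml SDK: `require_env` is a hypothetical helper, and the variable name `DATABASE_URL` is an assumption that should be checked against the SDK documentation for the version in use.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; add it to your .env file."
        )
    return value


if __name__ == "__main__":
    # "DATABASE_URL" is an assumed name, used here only for illustration.
    try:
        require_env("DATABASE_URL")
        print("Connection settings found.")
    except RuntimeError as err:
        print(err)
```

Calling a check like this right after `load_dotenv()` would surface a missing setting before any collection is created.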
Python
async def main():
    load_dotenv()
    collection = Collection("my_collection")
JavaScript
const main = async () => {
  const collection = pgml.newCollection("my_javascript_collection");
  // The remaining JavaScript steps in this tutorial also belong inside main().
};
A collection object is created to represent the search collection.
Python
    model = Model()
    splitter = Splitter()
    pipeline = Pipeline("my_pipeline", model, splitter)
    await collection.add_pipeline(pipeline)
JavaScript
  const model = pgml.newModel();
  const splitter = pgml.newSplitter();
  const pipeline = pgml.newPipeline("my_javascript_pipeline", model, splitter);
  await collection.add_pipeline(pipeline);
A pipeline encapsulating a model and splitter is created and added to the collection.
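To make the splitter's role concrete: a splitter breaks each document into smaller chunks, and the model then embeds each chunk. The sketch below illustrates fixed-size chunking with overlap in plain Python. It is not the SDK's `Splitter` implementation; `split_text`, `chunk_size`, and `overlap` are hypothetical names chosen for illustration.

```python
def split_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that content near a
    boundary appears in both chunks.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in split_text("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

Overlap trades some duplicated storage for a lower chance that a relevant passage is cut in half at a chunk boundary.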
Python
    documents = [
        {"id": "doc1", "text": "..."},
        {"id": "doc2", "text": "..."},
    ]
    await collection.upsert_documents(documents)
JavaScript
  const documents = [
    {
      id: "Document One",
      text: "...",
    },
    {
      id: "Document Two",
      text: "...",
    },
  ];
  await collection.upsert_documents(documents);
Documents are upserted into the collection and indexed by the pipeline.
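"Upsert" means insert-or-update: a document whose `id` is new gets inserted, and one whose `id` already exists gets replaced. The sketch below shows that behavior with an in-memory dictionary. It only illustrates the semantics and is not how the SDK stores documents; `upsert_into` is a hypothetical helper.

```python
def upsert_into(store: dict[str, dict], documents: list[dict]) -> dict[str, int]:
    """Insert new documents and replace existing ones, keyed by their "id"."""
    counts = {"inserted": 0, "updated": 0}
    for doc in documents:
        doc_id = doc["id"]
        if doc_id in store:
            counts["updated"] += 1
        else:
            counts["inserted"] += 1
        # Store a copy so later changes to the caller's dict do not leak in.
        store[doc_id] = dict(doc)
    return counts


if __name__ == "__main__":
    store: dict[str, dict] = {}
    print(upsert_into(store, [{"id": "doc1", "text": "first"}, {"id": "doc2", "text": "second"}]))
    print(upsert_into(store, [{"id": "doc1", "text": "revised"}]))
```

Under these semantics, re-running the tutorial with the same `id` values would not create duplicate documents.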
Python
    # Parentheses let the chained call span several lines.
    results = await (
        collection.query()
        .vector_recall("query", pipeline)
        .fetch_all()
    )
JavaScript
  const queryResults = await collection
    .query()
    .vector_recall("query", pipeline)
    .fetch_all();
A vector similarity search query is made on the collection.
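Conceptually, vector recall embeds the query and ranks stored chunks by how close their embeddings are to it. The sketch below ranks toy two-dimensional vectors by cosine similarity. It is an illustration only: the vectors are made up, `rank_chunks` is a hypothetical helper, and the SDK performs this search in the database rather than in Python.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(
    query_vec: list[float],
    chunk_vecs: dict[str, list[float]],
    limit: int = 2,
) -> list[tuple[float, str]]:
    """Return up to `limit` (score, chunk_id) pairs, most similar first."""
    scored = [(cosine_similarity(query_vec, vec), cid) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


if __name__ == "__main__":
    # Toy embeddings, invented for this example.
    chunks = {"doc1": [1.0, 0.0], "doc2": [0.0, 1.0], "doc3": [1.0, 1.0]}
    for score, chunk_id in rank_chunks([1.0, 0.0], chunks):
        print(f"{chunk_id}: {score:.3f}")
```

Real embeddings have hundreds of dimensions, and databases typically use an index to avoid comparing the query against every chunk.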
Python
    await collection.archive()
JavaScript
  await collection.archive();
  // Return the query results so the caller's .then() handler receives them.
  return queryResults;
The collection is archived when finished.
Python
if __name__ == "__main__":
    asyncio.run(main())
JavaScript
main().then((results) => {
  console.log("Vector search Results: \n", results);
});
Boilerplate to call the async main() function.
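The introduction mentions timing the query and printing formatted results. One way to do that around any awaitable is sketched below. `timed` and `fake_query` are hypothetical helpers, and the `(score, text, metadata)` result shape is an assumption made for illustration rather than a documented return format.

```python
import asyncio
import time


async def timed(awaitable):
    """Await something and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = await awaitable
    elapsed = time.perf_counter() - start
    return result, elapsed


async def fake_query():
    """Stand-in for a real search call; returns made-up results."""
    await asyncio.sleep(0.01)
    return [(0.9, "example chunk text", {"id": "doc1"})]


async def demo():
    results, elapsed = await timed(fake_query())
    print(f"Query took {elapsed:.3f} seconds")
    for score, text, metadata in results:
        print(f"{score:.2f}  {metadata['id']}  {text}")
    return results, elapsed


if __name__ == "__main__":
    asyncio.run(demo())
```

In the tutorial's `main()`, the same wrapper could be applied to the query call in place of `fake_query()`.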