Vectorsearch providers should utilize the global Langchain.logger #804

Merged: 8 commits, Oct 2, 2024
2 changes: 1 addition & 1 deletion .env.example
@@ -17,7 +17,7 @@ HUGGING_FACE_API_KEY=
LLAMACPP_MODEL_PATH=
LLAMACPP_N_THREADS=
LLAMACPP_N_GPU_LAYERS=
MILVUS_URL=
MILVUS_URL=http://localhost:19530
MISTRAL_AI_API_KEY=
NEWS_API_KEY=
OLLAMA_URL=http://localhost:11434
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,8 @@
## [Unreleased]
- [BREAKING] Langchain::Vectorsearch::Milvus was rewritten to work with newer milvus 0.10.0 gem
- Assistant can now process image_urls in the messages (currently only for OpenAI and Mistral AI)
- Vectorsearch providers utilize the global Langchain.logger
- Update required milvus, qdrant and weaviate versions

## [0.16.1] - 2024-09-30
- Deprecate Langchain::LLM::GooglePalm
22 changes: 12 additions & 10 deletions Gemfile.lock
@@ -157,6 +157,7 @@ GEM
faraday (~> 2.0)
typhoeus (~> 1.4)
ffi (1.16.3)
fiber-storage (1.0.0)
google-cloud-env (2.1.1)
faraday (>= 1.0, < 3.a)
google_palm_api (0.1.3)
@@ -169,12 +170,13 @@ GEM
multi_json (~> 1.11)
os (>= 0.9, < 2.0)
signet (>= 0.16, < 2.a)
graphlient (0.7.0)
graphlient (0.8.0)
faraday (~> 2.0)
graphql-client
graphql (2.3.4)
graphql (2.3.16)
base64
graphql-client (0.22.0)
fiber-storage
graphql-client (0.23.0)
activesupport (>= 3.0)
graphql (>= 1.13.0)
hashdiff (1.1.0)
@@ -211,7 +213,7 @@ GEM
net-smtp
matrix (0.4.2)
method_source (1.1.0)
milvus (0.9.3)
milvus (0.10.3)
faraday (>= 2.0.1, < 3)
mini_mime (1.1.5)
mini_portile2 (2.8.6)
@@ -282,7 +284,7 @@ GEM
psych (5.1.2)
stringio
public_suffix (5.0.5)
qdrant-ruby (0.9.7)
qdrant-ruby (0.9.8)
faraday (>= 2.0.1, < 3)
racc (1.8.0)
rack (3.0.11)
@@ -419,9 +421,9 @@ GEM
parser (>= 3.3.0)
uri (0.13.0)
vcr (6.2.0)
weaviate-ruby (0.8.10)
weaviate-ruby (0.9.2)
faraday (>= 2.0.1, < 3.0)
graphlient (~> 0.7.0)
graphlient (>= 0.7.0, < 0.9.0)
webmock (3.23.1)
addressable (>= 2.8.0)
crack (>= 0.3.2)
@@ -461,7 +463,7 @@ DEPENDENCIES
langchainrb!
llama_cpp (~> 0.9.4)
mail (~> 2.8)
milvus (~> 0.9.3)
milvus (~> 0.10.3)
mistral-ai
nokogiri (~> 1.13)
pdf-reader (~> 2.0)
@@ -470,7 +472,7 @@ DEPENDENCIES
pinecone (~> 0.1.6)
power_point_pptx (~> 0.1.0)
pry-byebug (~> 3.10.0)
qdrant-ruby (~> 0.9.4)
qdrant-ruby (~> 0.9.8)
rake (~> 13.0)
rdiscount (~> 2.2.7)
replicate-ruby (~> 0.2.2)
@@ -483,7 +485,7 @@ DEPENDENCIES
sequel (~> 5.68.0)
standard (>= 1.35.1)
vcr
weaviate-ruby (~> 0.8.10)
weaviate-ruby (~> 0.9.2)
webmock
wikipedia-client (~> 1.17.0)
yard (~> 0.9.34)
6 changes: 3 additions & 3 deletions langchain.gemspec
@@ -57,7 +57,7 @@ Gem::Specification.new do |spec|
spec.add_development_dependency "google_search_results", "~> 2.0.0"
spec.add_development_dependency "hnswlib", "~> 0.8.1"
spec.add_development_dependency "hugging-face", "~> 0.3.4"
spec.add_development_dependency "milvus", "~> 0.9.3"
spec.add_development_dependency "milvus", "~> 0.10.3"
spec.add_development_dependency "llama_cpp", "~> 0.9.4"
spec.add_development_dependency "nokogiri", "~> 1.13"
spec.add_development_dependency "mail", "~> 2.8"
@@ -67,13 +67,13 @@ Gem::Specification.new do |spec|
spec.add_development_dependency "pdf-reader", "~> 2.0"
spec.add_development_dependency "pinecone", "~> 0.1.6"
spec.add_development_dependency "replicate-ruby", "~> 0.2.2"
spec.add_development_dependency "qdrant-ruby", "~> 0.9.4"
spec.add_development_dependency "qdrant-ruby", "~> 0.9.8"
spec.add_development_dependency "roo", "~> 2.10.0"
spec.add_development_dependency "roo-xls", "~> 1.2.0"
spec.add_development_dependency "ruby-openai", "~> 7.1.0"
spec.add_development_dependency "safe_ruby", "~> 1.0.4"
spec.add_development_dependency "sequel", "~> 5.68.0"
spec.add_development_dependency "weaviate-ruby", "~> 0.8.10"
spec.add_development_dependency "weaviate-ruby", "~> 0.9.2"
spec.add_development_dependency "wikipedia-client", "~> 1.17.0"
spec.add_development_dependency "power_point_pptx", "~> 0.1.0"
end
2 changes: 1 addition & 1 deletion lib/langchain/vectorsearch/elasticsearch.rb
@@ -37,7 +37,7 @@ def initialize(url:, index_name:, llm:, api_key: nil, es_options: {})
@options = {
url: url,
request_timeout: 20,
log: false
logger: Langchain.logger
}.merge(es_options)

@es_client = ::Elasticsearch::Client.new(**options)
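
Note on the hunk above: because the defaults hash is built first and `es_options` is merged in afterwards, a caller-supplied `logger:` should still take precedence over the global one. A minimal sketch of that merge order follows; `DemoLib` and `build_client_options` are hypothetical stand-ins written for illustration, not the real `Langchain` module or the Elasticsearch client.

```ruby
require "logger"

# Hypothetical stand-in for a library-wide logger accessor.
# This is NOT the real Langchain module; it only mimics the idea
# of one shared logger that every provider can reuse.
module DemoLib
  class << self
    attr_writer :logger

    def logger
      @logger ||= Logger.new($stdout)
    end
  end
end

# Hypothetical helper mirroring the options hash in the hunk above:
# defaults first, then caller options merged on top.
def build_client_options(url:, es_options: {})
  {
    url: url,
    request_timeout: 20,
    logger: DemoLib.logger
  }.merge(es_options)
end

# Without overrides, the shared logger is used.
defaults = build_client_options(url: "http://localhost:9200")

# With overrides, the caller's values win because merge applies them last.
quiet_logger = Logger.new(nil) # a Logger with no output device
overridden = build_client_options(
  url: "http://localhost:9200",
  es_options: {logger: quiet_logger, request_timeout: 5}
)
```

Here `defaults[:logger]` is the shared logger, while `overridden[:logger]` is the caller's own logger and `overridden[:request_timeout]` is 5.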
106 changes: 45 additions & 61 deletions lib/langchain/vectorsearch/milvus.rb
@@ -6,16 +6,18 @@ class Milvus < Base
# Wrapper around Milvus REST APIs.
#
# Gem requirements:
# gem "milvus", "~> 0.9.3"
# gem "milvus", "~> 0.10.3"
#
# Usage:
# milvus = Langchain::Vectorsearch::Milvus.new(url:, index_name:, llm:, api_key:)
# milvus = Langchain::Vectorsearch::Milvus.new(url:, index_name:, llm:, api_key:)
#

def initialize(url:, index_name:, llm:, api_key: nil)
depends_on "milvus"

@client = ::Milvus::Client.new(url: url)
@client = ::Milvus::Client.new(
url: url,
logger: Langchain.logger
)
@index_name = index_name

super(llm: llm)
@@ -24,33 +26,24 @@ def initialize(url:, index_name:, llm:, api_key: nil)
def add_texts(texts:)
client.entities.insert(
collection_name: index_name,
num_rows: Array(texts).size,
fields_data: [
{
field_name: "content",
type: ::Milvus::DATA_TYPES["varchar"],
field: Array(texts)
}, {
field_name: "vectors",
type: ::Milvus::DATA_TYPES["float_vector"],
field: Array(texts).map { |text| llm.embed(text: text).embedding }
}
]
data: texts.map do |text|
{content: text, vector: llm.embed(text: text).embedding}
end
)
end

# TODO: Add update_texts method

# Deletes a list of texts in the index
#
# @param ids [Array<Integer>] The ids of texts to delete
# @return [Boolean] The response from the server
def remove_texts(ids:)
raise ArgumentError, "ids must be an array" unless ids.is_a?(Array)
# Convert ids to integers if strings are passed
ids = ids.map(&:to_i)

client.entities.delete(
collection_name: index_name,
expression: "id in #{ids}"
filter: "id in #{ids}"
)
end
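
Note on `remove_texts`: the filter string relies on Ruby's array interpolation, so it may help to see what the expression looks like once built. The sketch below isolates that step; `build_id_filter` is a hypothetical helper name used only for illustration, and nothing here talks to a Milvus server.

```ruby
# Hypothetical helper that reproduces only the string-building step
# of remove_texts shown in the hunk above.
def build_id_filter(ids)
  raise ArgumentError, "ids must be an array" unless ids.is_a?(Array)

  # Strings are coerced to integers. Note that String#to_i returns 0
  # for non-numeric input, so callers may want to validate beforehand.
  ids = ids.map(&:to_i)

  # Interpolating an Array calls Array#to_s, which renders "[1, 2, 3]".
  "id in #{ids}"
end

filter = build_id_filter(["1", "2", 3])
```

With the input above, `filter` is the string `id in [1, 2, 3]`.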

@@ -62,33 +55,25 @@ def create_default_schema
client.collections.create(
auto_id: true,
collection_name: index_name,
description: "Default schema created by langchain.rb",
fields: [
{
name: "id",
is_primary_key: true,
autoID: true,
data_type: ::Milvus::DATA_TYPES["int64"]
fieldName: "id",
isPrimary: true,
dataType: "Int64"
}, {
name: "content",
is_primary_key: false,
data_type: ::Milvus::DATA_TYPES["varchar"],
type_params: [
{
key: "max_length",
value: "32768" # Largest allowed value
}
]
fieldName: "content",
isPrimary: false,
dataType: "VarChar",
elementTypeParams: {
max_length: "32768" # Largest allowed value
}
}, {
name: "vectors",
data_type: ::Milvus::DATA_TYPES["float_vector"],
is_primary_key: false,
type_params: [
{
key: "dim",
value: llm.default_dimensions.to_s
}
]
fieldName: "vector",
isPrimary: false,
dataType: "FloatVector",
elementTypeParams: {
dim: llm.default_dimensions.to_s
}
}
]
)
@@ -97,27 +82,31 @@ def create_default_schema
# Create the default index
# @return [Boolean] The response from the server
def create_default_index
client.indices.create(
client.indexes.create(
collection_name: index_name,
field_name: "vectors",
extra_params: [
{key: "metric_type", value: "L2"},
{key: "index_type", value: "IVF_FLAT"},
{key: "params", value: "{\"nlist\":1024}"}
index_params: [
{
metricType: "L2",
fieldName: "vector",
indexName: "vector_idx",
indexConfig: {
index_type: "AUTOINDEX"
}
}
]
)
end

# Get the default schema
# @return [Hash] The response from the server
def get_default_schema
client.collections.get(collection_name: index_name)
client.collections.describe(collection_name: index_name)
end

# Delete default schema
# @return [Hash] The response from the server
def destroy_default_schema
client.collections.delete(collection_name: index_name)
client.collections.drop(collection_name: index_name)
end

# Load default schema into memory
@@ -138,16 +127,12 @@ def similarity_search(query:, k: 4)
def similarity_search_by_vector(embedding:, k: 4)
load_default_schema

client.search(
client.entities.search(
collection_name: index_name,
output_fields: ["id", "content"], # Add "vectors" if need to have full vectors returned.
top_k: k.to_s,
vectors: [embedding],
dsl_type: 1,
params: "{\"nprobe\": 10}",
anns_field: "vectors",
metric_type: "L2",
vector_type: ::Milvus::DATA_TYPES["float_vector"]
anns_field: "vector",
data: [embedding],
limit: k,
output_fields: ["content", "id", "vector"]
)
end

@@ -159,8 +144,7 @@ def similarity_search_by_vector(embedding:, k: 4)
def ask(question:, k: 4, &block)
search_results = similarity_search(query: question, k: k)

content_field = search_results.dig("results", "fields_data").select { |field| field.dig("field_name") == "content" }
content_data = content_field.first.dig("Field", "Scalars", "Data", "StringData", "data")
content_data = search_results.dig("data").map { |result| result.dig("content") }

context = content_data.join("\n---\n")

5 changes: 3 additions & 2 deletions lib/langchain/vectorsearch/qdrant.rb
@@ -6,7 +6,7 @@ class Qdrant < Base
# Wrapper around Qdrant
#
# Gem requirements:
# gem "qdrant-ruby", "~> 0.9.3"
# gem "qdrant-ruby", "~> 0.9.8"
#
# Usage:
# qdrant = Langchain::Vectorsearch::Qdrant.new(url:, api_key:, index_name:, llm:)
@@ -22,7 +22,8 @@ def initialize(url:, api_key:, index_name:, llm:)

@client = ::Qdrant::Client.new(
url: url,
api_key: api_key
api_key: api_key,
logger: Langchain.logger
)
@index_name = index_name

5 changes: 3 additions & 2 deletions lib/langchain/vectorsearch/weaviate.rb
@@ -6,7 +6,7 @@ class Weaviate < Base
# Wrapper around Weaviate
#
# Gem requirements:
# gem "weaviate-ruby", "~> 0.9.0"
# gem "weaviate-ruby", "~> 0.9.2"
#
# Usage:
# weaviate = Langchain::Vectorsearch::Weaviate.new(url: ENV["WEAVIATE_URL"], api_key: ENV["WEAVIATE_API_KEY"], index_name: "Docs", llm: llm)
@@ -22,7 +22,8 @@ def initialize(url:, index_name:, llm:, api_key: nil)

@client = ::Weaviate::Client.new(
url: url,
api_key: api_key
api_key: api_key,
logger: Langchain.logger
)

# Weaviate requires the class name to be Capitalized: https://weaviate.io/developers/weaviate/configuration/schema-configuration#create-a-class