🚃 Langchain::ActiveRecord::Hooks module for vectorsearch capabilities #211
Nice. The use of LLM within the familiar MVC architecture would lower the psychological barrier for newcomers.
Looks good. We can deal with any vector DB.
I assume that the provider is defined in the config, and only the index name is specified in the model.
Sorry -- I'm not sure what you mean?
Langchain::ActiveRecord.configure do |config|
  config.provider_name = :weaviate
  config.provider_url =
  config.provider_api_key =
end

class Recipe < ActiveRecord::Base
  include Langchain::ActiveRecord::Hooks
  vector_index_name "Recipes"
end

Or the vector index name is automatically defined by the model name.
@moekidev Then -- do we even need to declare the index name? We can just assume it's the name of the table/model?
This also assumes that there's only 1 vector search DB per application, and that's probably the case most of the time, but who knows?! :) I once worked on a Rails application that connected to 10+ different databases.
OK😂 I guess we should adopt a design that allows us to choose a vector DB for each model. Each model may have different vector DB requirements.
As it stands, I don't think this adds a ton of value for a user, as opposed to just hooking it in yourself. Some things I can think of that would make it more compelling:
I have been reading up on ActiveStorage recently, and I think it gives a pretty good pattern. You have a config:
local: # the key is how you refer to it from the code
  service: Disk # this corresponds to a class on the backend
  root: <%= Rails.root.join("storage") %> # ERB is supported
test:
  service: Disk
  root: <%= Rails.root.join("tmp/storage") %>
amazon:
  service: S3
  access_key_id: ""
  secret_access_key: ""
  bucket: ""
  region: "" # e.g. 'us-east-1'

Models can also choose to use something different:
class User < ApplicationRecord
  has_one_attached :avatar, service: :s3
end

For the other stuff, I'm kinda curious how much ends up being vectorstore specific 🤔 Like the index, at least the naming conventions, would be specific to Weaviate if I am reading some of the comments correctly?
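To make the ActiveStorage comparison concrete, here is a rough sketch of how named vector search services and a per-model choice could look. Everything in it is hypothetical and not part of this PR: the `VECTORSEARCH_SERVICES` hash, the `HasVectorsearch` module, the `vectorsearch service:` macro, and the URLs are placeholders for illustration only.

```ruby
# Hypothetical registry: keys are what models refer to, values describe a backend.
VECTORSEARCH_SERVICES = {
  primary: { service: :weaviate, url: "http://localhost:8080" },
  archive: { service: :weaviate, url: "http://localhost:8081" }
}.freeze

# Hypothetical mixin, loosely mirroring has_one_attached's `service:` option.
module HasVectorsearch
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    attr_reader :vectorsearch_config

    # Picks a configured entry by key; raises KeyError for unknown keys.
    def vectorsearch(service: :primary)
      @vectorsearch_config = VECTORSEARCH_SERVICES.fetch(service)
    end
  end
end

class Article
  include HasVectorsearch
  vectorsearch # falls back to :primary
end

class Document
  include HasVectorsearch
  vectorsearch service: :archive
end
```

With this shape an application with a single vector DB never names a service, while a model with different requirements opts into another entry by key.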
@technicalpickles The PR you opened takes care of # 1.
# Weaviate requires the class name to be Capitalized: https://weaviate.io/developers/weaviate/configuration/schema-configuration#create-a-class
@index_name = index_name
Maybe we should add some validation here? Raise if it's not capitalized.
Basically Weaviate will automatically capitalize "products" to "Products" when you're creating an index but then will claim that it's not found if you search the "products" index later.
Right. I'm suggesting to either automatically capitalize, or raise an error if it's not capitalized already.
I'll go with auto-capitalize then.
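For reference, a minimal sketch of what auto-capitalizing could look like. The helper name `weaviate_class_name` is made up for illustration and is not the method in this PR. It upcases only the first character, since `String#capitalize` also downcases the rest of the string.

```ruby
# Hypothetical helper: upcase only the first character and leave the rest intact.
# String#capitalize is avoided because it downcases the remainder
# ("recipeItems".capitalize returns "Recipeitems").
def weaviate_class_name(index_name)
  name = index_name.to_s
  raise ArgumentError, "index_name must not be empty" if name.empty?

  name[0].upcase + name[1..-1]
end

weaviate_class_name("products")    # => "Products"
weaviate_class_name("recipeItems") # => "RecipeItems"
```

Normalizing once at assignment would mean create and search calls both see the same name.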
Co-authored-by: Josh Nichols <josh.nichols@gusto.com>
… method when indexing
# vectorsearch provider: Langchain::Vectorsearch::Weaviate.new(
# api_key: ENV["WEAVIATE_API_KEY"],
# url: ENV["WEAVIATE_URL"],
# index_name: "Recipes",
I'd like to see this handled automatically.
I was thinking this could be handled in a similar way to ActiveRecord::Base.table_name?
- https://api.rubyonrails.org/classes/ActiveRecord/ModelSchema/ClassMethods.html#method-i-table_name-3D
- https://api.rubyonrails.org/classes/ActiveRecord/ModelSchema/ClassMethods.html#method-i-table_name
Giving it more thought though, the index name is needed at the time vector search is instantiated. We could make it work if we made it lazily instantiated. Maybe something like...
vectorsearch weaviate: {
  api_key: ENV["WEAVIATE_API_KEY"],
  url: ENV["WEAVIATE_URL"],
}
# then in the class method
def vectorsearch(providers = {})
  # only allow one to be passed in
  # the key names the provider, the value holds its options
  provider = providers.keys.first
  # self is already the model class here, so set the class variable on it directly
  class_variable_set(:@@provider_options, providers[provider])
  # ...
end

def vectorsearch_provider
  @vectorsearch_provider ||= ... # lookup provider by name, pass in the options
end
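A self-contained sketch of the lazy idea, under stated assumptions: `FakeWeaviate` stands in for the real client, and `VectorsearchHooks`, `PROVIDERS`, and `vector_index_name` are hypothetical names, not the API in this PR. The macro only records options, the index name defaults to the class name (overridable, in the spirit of `table_name=`), and the client is built on first use.

```ruby
# Stand-in for a real vector search client; it only records what it was given.
class FakeWeaviate
  attr_reader :options

  def initialize(**options)
    @options = options
  end
end

module VectorsearchHooks
  # Hypothetical lookup table from provider key to client class.
  PROVIDERS = { weaviate: FakeWeaviate }.freeze

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Allows overriding the default, similar in spirit to table_name=.
    attr_writer :vector_index_name

    # Stores the provider key and options; nothing is instantiated here.
    def vectorsearch(providers = {})
      raise ArgumentError, "pass exactly one provider" unless providers.size == 1

      @vectorsearch_provider_name, @vectorsearch_provider_options = providers.first
    end

    # Defaults to the class name.
    def vector_index_name
      @vector_index_name ||= name
    end

    # Built on first use, once the index name is known, then memoized.
    def vectorsearch_provider
      @vectorsearch_provider ||= PROVIDERS.fetch(@vectorsearch_provider_name).new(
        index_name: vector_index_name,
        **@vectorsearch_provider_options
      )
    end
  end
end

class Recipe
  include VectorsearchHooks

  vectorsearch weaviate: { url: "http://localhost:8080", api_key: "secret" }
end

Recipe.vectorsearch_provider.options[:index_name] # => "Recipe"
```

Because nothing is built inside the macro, a model can still set `vector_index_name` after calling `vectorsearch`, as long as it does so before the provider is first used.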
TODOs:
- similarity_search() should return actual ActiveRecord objects