Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🚃 Langchain::ActiveRecord::Hooks module for vectorsearch capabilities #211

Merged
merged 17 commits into from
Jun 24, 2023

Conversation

andreibondarev
Copy link
Collaborator

@andreibondarev andreibondarev commented Jun 22, 2023

Jun-22-2023 15-29-38

TODOs:

  • Unit testing
  • similarity_search() should return actual ActiveRecord objects
  • Add info to the README

@kawakamimoeki
Copy link
Contributor

kawakamimoeki commented Jun 22, 2023

Nice. The use of LLM within the familiar MVC architecture would lower the psychological barrier for newcomers.


self.class.class_variable_get(:@@provider).add_texts(texts: self.to_json)

looks good. We can deal with any vector DB.

@kawakamimoeki
Copy link
Contributor

kawakamimoeki commented Jun 22, 2023

I assume that the provider is defined in the config, and only the index name is specified in the model.

@andreibondarev
Copy link
Collaborator Author

I assume that the provider is defined in the config, and only the index name is specified in the model.

Sorry -- I'm not sure what you mean?

@kawakamimoeki
Copy link
Contributor

kawakamimoeki commented Jun 22, 2023

Langchain::ActiveRecord.configure do |config|
  config.provider_name = :weaviate
  config.provider_url = 
  config.provider_api_key =
end
class Recipe < ActiveRecord::Base
  include Langchain::ActiveRecord::Hooks

  vector_index_name "Recipes"
end

or vector index name is automatically defined by model name.

@andreibondarev
Copy link
Collaborator Author

andreibondarev commented Jun 22, 2023

@moekidev Then -- do we even need to declare the index name? We can just assume it's the name of the table/model name?

Langchain::ActiveRecord.configure do |config|
config.provider_name = :weaviate
config.provider_url =
config.provider_api_key =
end

This also assumes that there's only 1 vector search DB per application, and that's probably the case most of the time, but who knows?! :)

I once worked on a Rails application that connected to 10+ different databases.

@kawakamimoeki
Copy link
Contributor

kawakamimoeki commented Jun 22, 2023

I once worked on a Rails application that connected to 10+ different databases.

OK😂 I guess we should adopt a design that allows us to choose a vector DB for each model. Each model may have different vector DB requirements.

@technicalpickles
Copy link
Contributor

As this is right now, I don't think it adds a ton of value to for a user, as opposed to just hooking it into yourself. Some things I can think of that would make it more compelling:

  • less code to set up a vector store in a model
  • define the vectorstores more centrally
  • handle things like index names automatically, ie use the model name

I have been reading up for activestorage recently, and I think it gives a pretty good pattern. You have a config/storage.yml (similar to config/database.yml) that defines all the different service you might use:

local: # the key is how you use refer to it from the code
  service: Disk # this corresponds to a class on the backend
  root: <%= Rails.root.join("storage") %> # ERB is supported

test:
  service: Disk
  root: <%= Rails.root.join("tmp/storage") %>

amazon:
  service: S3
  access_key_id: ""
  secret_access_key: ""
  bucket: ""
  region: "" # e.g. 'us-east-1'

Models can also choose to use something different:

class User < ApplicationRecord
  has_one_attached :avatar, service: :s3
end

For the other stuff, I'm kinda curious how much ends up being vectorstore specific 🤔 Like the index, at least the naming conventions, would be specific weaviate if I am reading some of the comments correctly?

technicalpickles and others added 2 commits June 22, 2023 20:33
* Add railtie to automatically include the hooks

* fix the example's method call. assume vectorsearch is available
@andreibondarev
Copy link
Collaborator Author

  1. less code to set up a vector store in a model
  2. define the vectorstores more centrally
  3. handle things like index names automatically, ie use the model name

@technicalpickles The PR you opened takes care of # 1.

  1. I agree that some sort of YAML config would be nice but do you think we need it now?
  2. I'll make index_name: default to the .table_name.

For the other stuff, I'm kinda curious how much ends up being vectorstore specific
This is a very good question. When you look at the classes under langchain/vectorsearch/*, what does it look like to you? There's certainly difference, similar to how there's differences between the different LLMs, but I wonder if we can continue drawing out a common DSL for all of them with opts={} to go deeper with each of them if needed.

lib/langchain/active_record/hooks.rb Outdated Show resolved Hide resolved
lib/langchain/active_record/hooks.rb Outdated Show resolved Hide resolved
Comment on lines 28 to 29
# Weaviate requires the class name to be Capitalized: https://weaviate.io/developers/weaviate/configuration/schema-configuration#create-a-class
@index_name = index_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe should add some validations here? Raise if it's not capitalized.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically Weaviate will automatically capitalize "products" to "Products" when you're creating an index but then will claim that it's not found if you search the "products" index later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I'm suggesting to either automatically capitalize, or raise an error if it's not capitalized already.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll go with auto-capitalize then.

lib/langchain/active_record/hooks.rb Outdated Show resolved Hide resolved
# vectorsearch provider: Langchain::Vectorsearch::Weaviate.new(
# api_key: ENV["WEAVIATE_API_KEY"],
# url: ENV["WEAVIATE_URL"],
# index_name: "Recipes",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see this handled automatically.

I was thinking this could be handled in a similar way to ActiveRecord::Base.table_name?

Giving it more thought though, the index name is needed at the time vector search is instantiated. Could make it work if we made it lazily instantiated. Maybe something like...

vectorsearch weaviate: {
  api_key: ENV["WEAVIATE_API_KEY"],
  url: ENV["WEAVIATE_URL"],
}

# then in the class method
def vectorsearch(providers = {})
  # only allow one to be passed in
  # providers.keys corresponds to the object
  provider = providers.keys.first
  self.class.class_variable_set(:@@provider_options, providers[provider))
  # ...
end

def vectorsearch_provider
  @vectorsearch_provider ||=  ... # lookup provider by name, pass in the options
end

Comment on lines 28 to 29
# Weaviate requires the class name to be Capitalized: https://weaviate.io/developers/weaviate/configuration/schema-configuration#create-a-class
@index_name = index_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I'm suggesting to either automatically capitalize, or raise an error if it's not capitalized already.

@andreibondarev andreibondarev marked this pull request as ready for review June 24, 2023 21:41
@andreibondarev andreibondarev merged commit fe1b372 into main Jun 24, 2023
@andreibondarev andreibondarev deleted the active-record-hooks branch June 24, 2023 21:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants