Support MongoDB for custom data layer #796

Closed


sandangel
Contributor

  • fix: correct typing for _data_layer
  • draft implement for mongodb data layer

Signed-off-by: San Nguyen <vinhsannguyen91@gmail.com>
@willydouhard
Collaborator

@tpatel #796

@tpatel
Contributor

tpatel commented Mar 7, 2024

@sandangel Thanks for your contribution, this is a great idea!

I have some feedback:

  • The best way to implement a custom data layer should be to only provide a class that overrides BaseDataLayer (this is what you've done with MongoDataLayer)
  • There should be no changes to the rest of the chainlit code (outside of exposing this class in the package). Users are expected to do cl_data._data_layer = MongoDataLayer() in their chainlit app, to configure the data layer.
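
As a rough illustration of the two points above, here is a minimal sketch in Python. It is not the PR's implementation: `BaseDataLayer` below is a simplified stand-in for chainlit's class (assumed to expose more async hooks than the two mirrored here), `InMemoryDataLayer` is a hypothetical subclass that uses a dict where a MongoDB collection would go, and `cl_data` is a `SimpleNamespace` standing in for `import chainlit.data as cl_data`.

```python
import asyncio
from types import SimpleNamespace
from typing import Dict, Optional


class BaseDataLayer:
    """Simplified stand-in for chainlit's BaseDataLayer (assumption: the real
    class exposes more async hooks; only two are mirrored here)."""

    async def get_user(self, identifier: str) -> Optional[dict]:
        return None

    async def create_user(self, user: dict) -> Optional[dict]:
        return None


class InMemoryDataLayer(BaseDataLayer):
    """Hypothetical data layer; a dict stands in for a MongoDB collection."""

    def __init__(self) -> None:
        self._users: Dict[str, dict] = {}

    async def get_user(self, identifier: str) -> Optional[dict]:
        return self._users.get(identifier)

    async def create_user(self, user: dict) -> Optional[dict]:
        # Upsert keyed on the user's identifier.
        self._users[user["identifier"]] = user
        return user


# Stand-in for `import chainlit.data as cl_data` (assumption for this sketch).
cl_data = SimpleNamespace(_data_layer=None)

# The only configuration step: assign an instance, no other chainlit changes.
cl_data._data_layer = InMemoryDataLayer()


async def demo() -> Optional[dict]:
    await cl_data._data_layer.create_user({"identifier": "alice", "metadata": {}})
    return await cl_data._data_layer.get_user("alice")
```

In a real app the subclass would inherit from chainlit's own `BaseDataLayer` and the assignment would target the real `chainlit.data` module.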

Let me know if you have any questions!

@sandangel
Contributor Author

Sure, let me revert the change in config. I will update the PR tomorrow.

@sandangel
Contributor Author

sandangel commented Mar 10, 2024

@tpatel, I have a small PR that fixes typing for the data layer first, which helps me test my change locally. Could you please check it first? #802

Once I finish the integration tests I will update this PR.

@hayescode
Contributor

@sandangel thank you for making this! This is a great step in the right direction!

One question: Can the blob storage be split from the class to give users the ability to use MongoDB in other cloud environments like Azure/Google?

@sandangel
Contributor Author

@hayescode
For now I'm changing the approach to mimic the literalai client instead. Ideally, users can implement their own upload function that uses a different blob storage.
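
One way to picture the "bring your own upload function" idea is dependency injection, sketched below in Python. Everything here is hypothetical and not taken from the PR: the `UploadFn` signature, the `ElementStore` class, and `fake_upload` are illustrative names, and a dict stands in for the MongoDB collection that would hold element metadata.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# Hypothetical signature: (object_key, content, mime_type) -> URL of the stored object.
UploadFn = Callable[[str, bytes, str], Awaitable[str]]


class ElementStore:
    """Hypothetical helper: keeps element metadata, delegates bytes to upload_fn."""

    def __init__(self, upload_fn: UploadFn) -> None:
        self._upload_fn = upload_fn
        self.metadata: Dict[str, dict] = {}  # stands in for a MongoDB collection

    async def save_element(self, element_id: str, content: bytes, mime: str) -> dict:
        # The blob goes wherever the injected function sends it.
        url = await self._upload_fn(f"elements/{element_id}", content, mime)
        record = {"id": element_id, "url": url, "mime": mime, "size": len(content)}
        self.metadata[element_id] = record
        return record


async def fake_upload(key: str, content: bytes, mime: str) -> str:
    # Placeholder for an S3, Azure Blob Storage, or Google Cloud Storage call.
    return f"memory://{key}"


async def demo() -> dict:
    store = ElementStore(upload_fn=fake_upload)
    return await store.save_element("e1", b"hello", "text/plain")
```

With this shape, switching cloud providers would only mean passing a different `upload_fn`; the database-facing code stays the same.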

@sandangel
Contributor Author

Hi @tpatel, I finished the MongoDB custom data layer locally and am now testing it. However, I could not enable the chat history. Do you have any idea why?
[image]

@sandangel
Contributor Author

Oh I found the issue. I need to use cl_data._data_layer instead.

@Jimmy-Newtron

This is excellent work 👍

I am really enthusiastic about this, and it would be great if Elasticsearch could be used as well

You can use Elasticsearch as a vector store for RAG Assistants

@sandangel
Contributor Author

sandangel commented Mar 15, 2024

@Jimmy-Newtron If you look at mongo_api.py, you'll see I'm using the Mongo client directly. We might need a database adapter to support different databases. Perhaps you can use Elasticsearch for vector search and MongoDB for storing chat data. Others, feel free to have a look at how I handle the mongo_api client to mimic the literalai client.

Implementing the BaseDataLayer has been very confusing because of the mapping between chainlit types and literalai types. However, since the chainlit data layer only uses the Literal client to interact with Literal AI, we can implement a LiteralAI client for MongoDB instead and let chainlit handle the type mapping between chainlit and literalai.
I think that's cleaner than trying to implement the BaseDataLayer with both chainlit and literalai types.
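
The database adapter idea mentioned in this comment could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the contents of mongo_api.py: the `ThreadStore` protocol, its method names, and `ChatHistory` are hypothetical, and the in-memory implementation stands in for a MongoDB-backed one.

```python
from typing import Dict, List, Optional, Protocol


class ThreadStore(Protocol):
    """Hypothetical adapter interface; method names are illustrative only."""

    def upsert_thread(self, thread_id: str, data: dict) -> None: ...

    def get_thread(self, thread_id: str) -> Optional[dict]: ...

    def list_threads(self, user_id: str) -> List[dict]: ...


class InMemoryThreadStore:
    """Stands in for a MongoDB (or other database) implementation."""

    def __init__(self) -> None:
        self._threads: Dict[str, dict] = {}

    def upsert_thread(self, thread_id: str, data: dict) -> None:
        # Merge new fields into any existing record for this thread.
        merged = {**self._threads.get(thread_id, {}), **data, "id": thread_id}
        self._threads[thread_id] = merged

    def get_thread(self, thread_id: str) -> Optional[dict]:
        return self._threads.get(thread_id)

    def list_threads(self, user_id: str) -> List[dict]:
        return [t for t in self._threads.values() if t.get("user_id") == user_id]


class ChatHistory:
    """Caller code depends only on the ThreadStore protocol, not on a database."""

    def __init__(self, store: ThreadStore) -> None:
        self._store = store

    def rename(self, thread_id: str, name: str) -> Optional[dict]:
        self._store.upsert_thread(thread_id, {"name": name})
        return self._store.get_thread(thread_id)
```

Because `ThreadStore` is a structural protocol, a second backend only needs to provide the same three methods; it does not have to inherit from anything.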

@tpatel
Contributor

tpatel commented Mar 21, 2024

I'm curious to learn more about what is confusing with the types? My goal is to make sure it's smooth to implement the data layers.

One thing that I've noticed in your PR is that you are inheriting a class from ChainlitDataLayer, which shouldn't be needed. The ideal scenario is to use ChainlitDataLayer only as an example and to inherit from BaseDataLayer.

@sandangel
Contributor Author

Hi @tpatel. Thanks for checking. In my case, I was confused because lots of data types are imported from literalai and then mapped to chainlit data types. The chainlit data types are also confusing because multiple types describe the same object, like User/UserDict. Ideally, all the DSL should be defined in chainlit, and literalai would just import it. Also, in the base data layer, the methods are not clear or well designed enough to be implemented properly. The LiteralAI client has a better interface to implement and is really straightforward.
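
To make the User/UserDict point concrete, the Python sketch below shows the general pattern of two types describing one object and the conversion code that comes with it. The definitions are simplified stand-ins written for this illustration; they are assumptions and not chainlit's or literalai's actual classes.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, TypedDict


class UserDict(TypedDict):
    """Plain-dict shape of a user (simplified stand-in)."""

    identifier: str
    metadata: Dict[str, Any]


@dataclass
class User:
    """Object shape of the same user (simplified stand-in)."""

    identifier: str
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> UserDict:
        # Every second representation needs a mapping like this one.
        return UserDict(**asdict(self))

    @classmethod
    def from_dict(cls, data: UserDict) -> "User":
        return cls(identifier=data["identifier"], metadata=dict(data["metadata"]))
```

Each additional representation of the same object adds a pair of conversions like these, which is the kind of mapping overhead described in the comment.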

@sandangel
Contributor Author

Closing, since our team decided to use postgres instead

@sandangel sandangel closed this Apr 2, 2024