[Feature] Symfony docs chat bot (RAG) #19445
Comments
Yes, that would be awesome, but maybe not that trivial. @OskarStark and I already started on this before, but the first results were actually not better than plain GPT, since the model already knows a lot about Symfony. But you're right - it is not up to date. Were you already able to combine Symfony Docs, RAG and the coding "skills" of a model? Do you already have a repo we could have a look at? |
Ok interesting, which model did you use? No, it's just the beginning. I only played with the Mistral 7b model and I was quite impressed that a small, open-source model gave me good results. With some up-to-date docs it should give great results, I think. GPT-4 will always be better for analysis and programming, but I think a tool able to give pertinent results and links to the docs could add a lot of value to the current Symfony documentation 🤔 |
We used ada for embeddings, pgvector for retrieval and GPT-4 as the model |
And a RAG with GPT-4 was weaker than GPT-4 alone? So weird 🤔 But maybe creating a chatbot is a bit too ambitious. Let's start with a simple RAG to simplify the doc navigation |
And what was your strategy for splitting the docs into chunks? Did you add metadata to the chunks? |
Feedback POC
I created a POC able to retrieve/parse/chunk a Symfony doc page. I used the Scheduler page because the Scheduler is a really new component.
Vector database: Weaviate
Prompt: "You are an expert in the programming language PHP and the PHP framework Symfony. With the help of the following documents, your knowledge about PHP and the Symfony framework, please answer the question/instruction : 'How to create a custom trigger with the scheduler component ?'. Include code examples of PHP and Symfony. You will format you answer in markdown"
Answer:
Creating a custom trigger with the Symfony scheduler component involves implementing the
Here's an example of how to create a custom trigger that runs a message every other day:
// src/Scheduler/Trigger/CustomTrigger.php
namespace App\Scheduler\Trigger;

use Symfony\Component\Scheduler\TriggerInterface;
use Symfony\Component\Scheduler\Trigger\TriggerableInterface;

class CustomTrigger implements TriggerInterface
{
    public function getNextRunDate(TriggerableInterface $triggerable, \DateTimeImmutable $run): ?\DateTimeImmutable
    {
        $nextRun = $run->modify('+2 days');
        return $nextRun;
    }

    public function __toString(): string
    {
        return 'Custom Trigger';
    }
}

use App\Scheduler\Message\MyMessage;
use App\Scheduler\Trigger\CustomTrigger;

RecurringMessage::trigger(
    new CustomTrigger(),
    new MyMessage()
);

// src/Scheduler/MyScheduleProvider.php
namespace App\Scheduler;

use App\Scheduler\Message\MyMessage;
use App\Scheduler\Trigger\CustomTrigger;
use Symfony\Component\Scheduler\Schedule;
use Symfony\Component\Scheduler\ScheduleProviderInterface;

#[AsSchedule('uptoyou')]
class SaleTaskProvider implements ScheduleProviderInterface
{
    public function getSchedule(): Schedule
    {
        return $this->schedule ??= (new Schedule())
            ->with(
                RecurringMessage::trigger(
                    new CustomTrigger(),
                    new MyMessage()
                )
            );
    }
}
$ php bin/console messenger:consume scheduler_nameofyourschedule
This is a simple example of creating a custom trigger that runs a message every other day. You can modify
The result is quite nice, I think. With the simple Mistral 7b model the results were not good. I gave the same prompt to the model without any documents and it was able to answer a bit, but not as richly as with the RAG (I don't know how it knows about the Scheduler when it's such a new component, maybe it is confusing it with the older bundle 🤔) |
Just see this : https://youtu.be/zFJgRd05Noo?si=eHgUklO5-iW11Fcf |
This is a POC for educational purposes, to test the LLPhant library. Personally, I've tested it by indexing the entirety of the documentation, and the results are quite good, although GPT-4 is already not too bad by default. The main advantage is noticeable in the freshness of the information. |
Feedback POC
I indexed all the current documentation. With a basic user question, the vector database was not really able to retrieve the most interesting documents... One way to improve it was to ask the model to rephrase the question, but in the end the most efficient way was to ask the model to answer the question and then look for similarities in the DB. For this first step, that's what I'm going to do: the model answers the question, and then I retrieve the most pertinent documents based on that answer. This way the user can dig further directly in the documentation (for each document I have the link to the documentation) |
So retrieval is basically a similarity search for further reading, based on the plain model's response? |
I might be wrong, but as the documentation already uses Meilisearch as its primary search engine and as Meilisearch recently introduced the vector search feature, maybe the team could build a "tailor-made" bot around it? Documentation: https://www.meilisearch.com/docs/learn/experimental/vector_search |
Yeah exactly. I tested 4 different approaches and this one seems to be the best.
1. Retrieval with the user question
This one gave really bad results. The vector database seems to be lost, as if there were not enough context.
2. Retrieval with the keywords extracted from the question
I was asking the model to extract the main keywords/concepts of the user question. It was much better, but I was losing the user's intention and the vector database was still missing some context. For example, if the model returned the word 'Symfony' at some point, all the retrieval was about installation or the local web server...
3. Retrieval with the context around the question
I was asking the model to give keywords, main concepts and context around the user question. This was working great, but at some point the model was just answering the question...
4. Retrieval with the answer of the model
In all the previous approaches, I was giving the retrieved documents as context for the model to answer the question. As @chr-hertel said previously, the models (GPT-4, Mixtral 8x7b) are really good and I was not seeing significant improvements. So I decided to let the model answer (of course I'm still guiding it with some prompt engineering) and then use the answer to retrieve the documents. This approach is the most efficient and really cheap: since I'm not giving documents as context, querying the model is much cheaper. With this approach I have a specialized Symfony bot (thanks to prompt engineering), and if the user wants to dig further or get up-to-date data, the most related doc sections complement the model's answer thanks to the retrieval |
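Approach 4 can be sketched roughly as follows. This is a minimal illustration, not the POC's actual code: `$askModel` and `$embed` are hypothetical callables standing in for the LLM and the embedding model, the vectors are made-up toy values, the URLs are only illustrative, and a real setup would delegate the similarity search to the vector database (Weaviate in the POC) instead of looping in PHP.

```php
<?php
declare(strict_types=1);

/** Cosine similarity between two vectors of the same length. */
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }
    if ($normA === 0.0 || $normB === 0.0) {
        return 0.0; // a zero vector has no direction, so treat it as "not similar"
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * Let the model answer first, then use the embedding of the ANSWER
 * (not of the question) to find the closest documentation chunks.
 *
 * @param callable $askModel hypothetical: fn(string $question): string
 * @param callable $embed    hypothetical: fn(string $text): float[]
 * @param array    $chunks   list of ['url' => string, 'vector' => float[]]
 */
function answerThenRetrieve(string $question, callable $askModel, callable $embed, array $chunks, int $limit = 3): array
{
    $answer = $askModel($question);
    $answerVector = $embed($answer);

    $scored = [];
    foreach ($chunks as $chunk) {
        $scored[] = [
            'url' => $chunk['url'],
            'score' => cosineSimilarity($answerVector, $chunk['vector']),
        ];
    }
    // Highest similarity first.
    usort($scored, fn (array $x, array $y): int => $y['score'] <=> $x['score']);

    return [
        'answer' => $answer,
        'links' => array_column(array_slice($scored, 0, $limit), 'url'),
    ];
}

// Toy data: stubbed model, stubbed embedding, made-up 3-dimensional vectors.
$chunks = [
    ['url' => 'https://symfony.com/doc/current/scheduler.html', 'vector' => [0.9, 0.1, 0.0]],
    ['url' => 'https://symfony.com/doc/current/setup.html', 'vector' => [0.0, 0.2, 0.9]],
    ['url' => 'https://symfony.com/doc/current/messenger.html', 'vector' => [0.6, 0.7, 0.1]],
];
$askModel = fn (string $question): string => 'Stubbed model answer about custom triggers.';
$embed = fn (string $text): array => [1.0, 0.2, 0.0];

$result = answerThenRetrieve('How to create a custom trigger?', $askModel, $embed, $chunks, 2);
echo implode("\n", $result['links']), "\n";
```

Since the documents are never sent to the model as context, the only extra cost on top of the plain model call is one embedding of the answer plus the similarity search.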
Interesting, maybe there is an easy bridge to make 🤔 |
Thanks for this proposal. I find this discussion super interesting. Same for the talk that @chr-hertel delivered in the recent Symfony online conference. However, there's a big problem that will prevent us from using this: money. Using a custom solution hosted by us is out of the question, but using embeddings with Meilisearch won't be possible either because of this:
We can't pay for this ... so we need to hope that GPT-4 and others crawl Symfony Docs again and their contents update with the latest changes. Thanks for understanding 🙏 |
Last update Screencast.from.25.01.2024.21.03.56.webm |
The indexing part is not really an issue: with the text2vec_context model of Weaviate it's free. Anyway, I will start the project as a side project and, if it works, let's talk again about how we can bridge it with the Symfony docs |
Hello, excellent idea! I've been thinking about applying a similar concept to internal documentation. I'm very interested in your progress in this area. Although I lack experience in this field, I'm happy to contribute. Please don't hesitate to contact me if you need any help. |
Hi 👋 Have you also looked at a custom GPT in ChatGPT, where you can upload docs as permanent context? https://help.openai.com/en/articles/8554397-creating-a-gpt This would need to be fed with a big text version of the docs, but it could work |
I created one dedicated to Symfony UX and gave it all the source code + documentation + some blog posts... and the results are sometimes impressive, sometimes 100% off-topic. And it has this weird habit of persisting in its mistakes. This one has a lot of "custom-made" instructions, tailored to my specific needs during the experiment, but I'll recreate and open a more "standard" one next month, I hope, if anyone wants to play with it. |
@florentdestremau I made one but the thing is you have to be premium to access it: https://chat.openai.com/g/g-hpapJXDZg-symfonychatbot |
Hi everyone,
I'm thinking about creating a chatbot (or at least a query system with enhanced results) based on the Symfony/PHP documentation.
Tell me what you think, if you see any blockers or if you have any suggestions. Happy coding !
1. Issues
2. Solution
Of course, I'm not crazy, I don't want to train a model: it's quite expensive and I don't have the skills. My idea is to create a RAG. By indexing the documentation of multiple projects of the Symfony ecosystem in a vector database, and then using a small open-source model (Mistral 7b) to analyse and enhance the results, I think I can create something really nice.
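To make the idea concrete, the step where retrieved chunks are handed to the model could look roughly like this. It is only a sketch under assumptions: `buildRagPrompt()` and the chunk shape (`url`, `text`) are hypothetical names, the chunk texts are placeholders, and the chunks are assumed to come back from the vector database already ranked by relevance.

```php
<?php
declare(strict_types=1);

/**
 * Assemble a prompt that contains the best-ranked documentation chunks
 * followed by the user question.
 *
 * @param array $rankedChunks list of ['url' => string, 'text' => string], best match first
 */
function buildRagPrompt(string $question, array $rankedChunks, int $maxChunks = 3): string
{
    $context = '';
    foreach (array_slice($rankedChunks, 0, $maxChunks) as $index => $chunk) {
        // Number each chunk and keep its URL so the model can point back to the docs.
        $context .= sprintf("[%d] %s\n%s\n\n", $index + 1, $chunk['url'], trim($chunk['text']));
    }

    return "You are an expert in PHP and the Symfony framework.\n"
        . "Answer the question using the documents below and cite their numbers.\n\n"
        . "Documents:\n" . $context
        . 'Question: ' . $question . "\n";
}

// Placeholder chunks, standing in for what the vector database would return.
$chunks = [
    ['url' => 'https://symfony.com/doc/current/scheduler.html', 'text' => 'Placeholder text of a Scheduler doc chunk.'],
    ['url' => 'https://symfony.com/doc/current/messenger.html', 'text' => 'Placeholder text of a Messenger doc chunk.'],
];

$prompt = buildRagPrompt('How to create a custom trigger with the scheduler component?', $chunks, 1);
echo $prompt;
```

Keeping the URL next to each chunk is what lets the final answer link back to the relevant documentation page.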
3. Steps
The POC
V1
V2
More features
4. Business model
Hosting the infrastructure and using the mistral 7b model will have a cost which I can afford as a side-project but not if people start using it a lot. I'm open to any suggestion.
Sources