Investigate the use of a chatbot to form a knowledgebase #2026

Open
stichbury opened this issue Nov 15, 2022 · 18 comments

@stichbury
Contributor

We need a good way for Kedro users to get answers to their questions. Right now, they can search the Discord Linen archive or Slack for previous discussions, but the UX isn't great. Or they can look at our written FAQ (which they do; it's a popular page), but it doesn't deliver what they expect because it doesn't give specific answers to specific questions. They probably end up on Google looking for answers, and maybe on Stack Overflow.

It would be great to run an NLP chatbot that has been trained on our archives and documentation and can have a stab at an answer or link users to the right location to start their research. This is the holy grail of all documentation though, and does rely on a decent knowledgebase to train it, which we probably don't have (at least, we have content, but it's not clear whether it is suitable).

I think we need to first investigate the state of this kind of solution and then look at whether we can apply it to a Slackbot for Kedro.

This is early days, but here are a few links:

This issue is to seed some discussion and potentially earmark some time for research at a hackathon or similar spike.

@stichbury stichbury added Component: Documentation 📄 Issue/PR for markdown and API documentation Issue: Feature Request New feature or improvement to existing feature labels Nov 15, 2022
@stichbury
Contributor Author

I've been playing with ChatGPT (https://chat.openai.com/chat) a bit recently and there's definitely some scope.
[Screenshot: 2022-12-05 at 12:44:43]

@stichbury
Contributor Author

stichbury commented Dec 6, 2022

Recently, we've seen the ChatGPT project put out a beta of what I was mumbling about above. Thanks OpenAI 👯

While it's not yet ready, there are some ways we can prepare for a future where Kedro users turn to ChatGPT to answer their queries. A rough list:

  • We need to ensure that our corpus of Q&A is indexed. This includes the Linen archives and any other content existing on Slack, Discourse etc.
  • We create some complete and useful answers that rank well on queries such as "What is Kedro?" (and, more generally, "How do I ensure my data science code is reusable?" etc.). This ensures Kedro gets into the text returned.
  • It also makes sense to build a set of common Q&A to be indexed and provide the basis of answers to typical questions. I had previously been reluctant to write FAQ documentation because it dates and is hard to keep in sync. I have a new insight into this based on the potential of chat systems such as ChatGPT. I think we should write some FAQ docs and keep them on the website so they are indexed, but not linked from site navigation. The information is then available for search without the significant overhead of building a specific information architecture or even an elaborate site design.

@yetudada
Contributor

Looks like this will be possible. Dagster built something like this: https://dagster.io/blog/chatgpt-langchain

This ticket should also probably link to #1649

@stichbury
Contributor Author

@astrojuanlu This one is also up for discussion.

@astrojuanlu
Member

Somebody pointed me in this direction: https://langchain.readthedocs.io/en/latest/use_cases/question_answering.html

@datajoely
Contributor

Relevant tool
https://www.mendable.ai/?s=03

@astrojuanlu
Member

More: https://docsbot.ai/

@noklam
Contributor

noklam commented Apr 4, 2023

It's crazy how quickly this space evolves. It's quite feasible to build one with langchain. You can also limit the context so that it reads only docs.kedro.org and generates answers with relevant links from the docs only (so it's not making random stuff up).
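
To make the "docs only, with links" idea concrete, here is a minimal sketch that restricts answers to pages hosted on docs.kedro.org and always returns the source link. It is an illustration under assumptions, not a langchain implementation: it uses only the Python standard library and plain word overlap instead of embeddings, and the page URLs, page text and the `answer` helper are all hypothetical placeholders.

```python
import re
from urllib.parse import urlparse

# Only pages on this host may be used to answer.
ALLOWED_HOST = "docs.kedro.org"

# Hypothetical pages (URL -> text). Paths and text are made up for illustration.
PAGES = {
    "https://docs.kedro.org/example/data-catalog": (
        "The data catalog registers datasets so nodes can load and save data by name."
    ),
    "https://docs.kedro.org/example/pipelines": (
        "A pipeline connects nodes and resolves their execution order from inputs and outputs."
    ),
    "https://example.com/blog/random": (
        "An unrelated blog post about pipelines and catalogs and data."
    ),
}


def tokens(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def answer(question, pages=PAGES):
    """Return the best-matching docs page text plus its link, or a fallback."""
    question_words = tokens(question)
    best_url, best_score = None, 0
    for url, text in pages.items():
        # Skip anything that is not hosted on the allowed docs domain.
        if urlparse(url).hostname != ALLOWED_HOST:
            continue
        score = len(question_words & tokens(text))
        if score > best_score:
            best_url, best_score = url, score
    if best_url is None:
        # Nothing in the docs matched, so do not make anything up.
        return "No matching page found in the docs."
    return f"{pages[best_url]} (source: {best_url})"
```

Calling `answer("How do nodes load and save datasets?")` picks the hypothetical data catalog page, while a question that matches nothing in the allowed pages gets the fallback message rather than an invented answer.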

@astrojuanlu
Member

@astrojuanlu
Member

"Don't replace your user community with an LLM-based chatbot" https://thisisimportant.net/posts/user-community-llm-chatbots/

@astrojuanlu
Member

Beware of pushback from the tech community: mdn/yari#9208

@datajoely
Contributor

@stichbury
Contributor Author

"Don't replace your user community with an LLM-based chatbot" https://thisisimportant.net/posts/user-community-llm-chatbots/

I don't think it was ever a binary choice of either/or, was it? If the community want a knowledgebase, let's give it to them alongside current options...we're not proposing to remove anything.

@stichbury
Contributor Author

@noklam and I worked on this as part of the Quantazio Hack. It would be good to continue the work as part of the ongoing docs effort.

@astrojuanlu
Member

If we ever get to this, we'd probably use some form of Retrieval Augmented Generation (RAG), see https://github.com/imartinez/privategpt
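
For anyone unfamiliar with RAG, a rough sketch of the retrieval and prompt-assembly half could look like the following. This is a toy under stated assumptions, not how privategpt or any particular library does it: it ranks snippets by bag-of-words cosine similarity rather than learned embeddings, the `DOCS` snippets and the `retrieve` and `build_prompt` helpers are hypothetical, and the actual call to a language model is deliberately left out.

```python
import math
import re
from collections import Counter

# Hypothetical documentation snippets, made up for illustration.
DOCS = [
    "Kedro nodes are plain Python functions wrapped with named inputs and outputs.",
    "The data catalog maps dataset names to storage locations and formats.",
    "Kedro-Viz renders an interactive view of a pipeline.",
]


def vectorise(text):
    """Turn text into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, docs=DOCS, k=2):
    """Return the k snippets most similar to the question."""
    query = vectorise(question)
    ranked = sorted(docs, key=lambda d: cosine(query, vectorise(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, docs=DOCS, k=2):
    """Assemble a prompt that grounds the model in the retrieved snippets."""
    context = "\n".join(f"- {snippet}" for snippet in retrieve(question, docs, k))
    return (
        "Answer using only the context below. If the answer is not there, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )
```

The string returned by `build_prompt` is what would be sent to whichever model is chosen; swapping the word-count similarity for real embeddings would change `vectorise` and `cosine` but not the overall shape.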

@astrojuanlu
Member

also xref kedro-org/kedro-plugins#434

@astrojuanlu astrojuanlu removed Component: Documentation 📄 Issue/PR for markdown and API documentation Issue: Feature Request New feature or improvement to existing feature labels Dec 18, 2023
@astrojuanlu
Member

Related: publishing a custom GPT on Kedro, MLOps? https://help.openai.com/en/articles/8798878-building-and-publishing-a-gpt
