Supporting blog content - Local rag with lightweight Elasticsearch #488
Conversation
Review these changes at https://app.gitnotebooks.com/elastic/elasticsearch-labs/pull/488
Can you amend the output to ask the LLM to include sources? This will make it easier for the audience to find the applicable document from the dataset.
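Something along these lines could work (a rough sketch; `hits`, `question`, and the `title`/`content` field names are assumptions, not the actual names in this PR):

```python
# Label each retrieved chunk with its source title and ask the model to cite them.
context = "\n\n".join(
    f"[Source: {hit['_source']['title']}]\n{hit['_source']['content']}"
    for hit in hits
)
prompt = (
    "Answer the question using only the context below. "
    "Finish with a 'Sources:' line listing the titles of the documents you used.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
```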
I'm slightly worried about including an example suggesting that a particular technology (specifically Elasticsearch) is slow, since this is on Elasticsearch labs, especially when one of the other models suggests it's inefficient. It might be worth amending the transcripts to cover a common slowness scenario (such as sharding or node issues at high volume) and regenerating the answer. Alternatively, I would change the example to reference different technologies.
## Stats
✅ Indexed 5 documents in 250ms
Why does the indexing time differ between models? I would expect indexing to be a one-off operation, independent of the model. This doesn't make sense to me. Should it be removed or clarified?
I would change this example to something generic, such as "Why is the sky blue?". As a developer, this comes across as quite cringey to me.
from openai import OpenAI

ES_URL = "http://localhost:9200"
ES_API_KEY = "your-api-key-here"
I would change the URL, API key and LOCAL_AI_URL values to environment variables loaded via something like dotenv and a local .env file. While hard-coding is fine for local development, developers will need to tidy it up when they move this to production, so let's set the right example now.
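A minimal sketch, assuming python-dotenv is added as a dependency and the keys live in a local .env file:

```python
import os

from dotenv import load_dotenv

load_dotenv()  # read key=value pairs from .env into the process environment

ES_URL = os.environ["ES_URL"]            # e.g. http://localhost:9200
ES_API_KEY = os.environ["ES_API_KEY"]
LOCAL_AI_URL = os.environ["LOCAL_AI_URL"]
```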
ai_client = OpenAI(base_url=LOCAL_AI_URL, api_key="sk-x")

def build_documents(dataset_folder, index_name):
Should this be called load_documents, since it's opening text files? It's not really building the documents from scratch, which is misleading.
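i.e. something like this (rename only, body unchanged):

```python
def load_documents(dataset_folder, index_name):
    ...
```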
if filename.endswith(".txt"):
    filepath = os.path.join(dataset_folder, filename)

    with open(filepath, "r", encoding="utf-8") as file:
I would add a comment explaining why you've used utf-8 encoding here.
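For example (the wording is just a suggestion, and `content` stands in for whatever the function does with the file):

```python
# Read as utf-8 explicitly so the script behaves the same on every platform,
# rather than falling back to the OS's default locale encoding.
with open(filepath, "r", encoding="utf-8") as file:
    content = file.read()
```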
}

def index_documents():
Add top-level comments for each function explaining what they do.
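For example (using the load_documents rename suggested above; the docstring wording is mine):

```python
def load_documents(dataset_folder, index_name):
    """Read each .txt file in dataset_folder into a document ready for indexing."""

def index_documents():
    """Bulk-index the loaded documents into Elasticsearch, timing the operation."""
```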
start_time = time.time()

try:
    response = ai_client.chat.completions.create(
I would perhaps add a comment making clear that this is a simple one-shot generation rather than streaming the response token by token.
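Something like this (`MODEL_NAME` and `prompt` stand in for whatever the script actually uses):

```python
# One-shot generation: this call blocks until the full completion is returned.
# Pass stream=True to receive the response token by token instead.
response = ai_client.chat.completions.create(
    model=MODEL_NAME,
    messages=[{"role": "user", "content": prompt}],
)
answer = response.choices[0].message.content
```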
try:
    start_time = time.time()

    success, _ = helpers.bulk(
You should add the index creation code, either here (conditional on the index not already existing) or in a separate utility function. For semantic_text you'll need to specify that mapping when creating the index, and that step is missing here.
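A sketch of what that could look like (INDEX_NAME and the field names are placeholders, not this PR's actual schema):

```python
# Create the index up front with a semantic_text mapping. Bulk indexing alone
# won't add this mapping, so semantic queries would fail without this step.
if not es_client.indices.exists(index=INDEX_NAME):
    es_client.indices.create(
        index=INDEX_NAME,
        mappings={
            "properties": {
                "content": {"type": "text", "copy_to": "semantic_content"},
                "semantic_content": {"type": "semantic_text"},
            }
        },
    )
```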