
Conversation

@Delacrobix (Contributor)

No description provided.

gitnotebooks bot commented Sep 18, 2025

Contributor:

Can you amend the output to ask the LLM to include sources? This will make it easier for the audience to find the applicable document from the dataset.
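One way to act on this suggestion (a sketch; the prompt wording, the `build_messages` helper, and the variable names are mine, not from the notebook):

```python
# Hypothetical system prompt asking the model to cite its sources.
SYSTEM_PROMPT = (
    "Answer the user's question using only the provided context. "
    "End your answer with a 'Sources:' line listing the titles of the "
    "context documents you actually used."
)


def build_messages(context: str, question: str) -> list:
    """Assemble chat messages that ask the model to cite its sources."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
```

The resulting messages would then be passed to `ai_client.chat.completions.create(...)` as elsewhere in the notebook.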

Contributor:

I'm slightly worried about including an example suggesting that a particular technology (specifically Elasticsearch) is slow, since this is on Elasticsearch Labs, especially since one of the other models suggests it's inefficient. It might be worth amending the transcripts to use a common slowness scenario (such as sharding or node issues under high volume) and regenerating the answer. Alternatively, I would change it to reference different technologies.



## Stats
✅ Indexed 5 documents in 250ms
Contributor:

Why does the indexing differ between models? I would expect indexing to be a one-off operation, independent of the model. This doesn't make sense to me. Should it be removed or clarified?

Contributor:

I would change this example to something generic, such as "Why is the sky blue?". As a developer, the current one comes across as quite cringy to me.

from openai import OpenAI

ES_URL = "http://localhost:9200"
ES_API_KEY = "your-api-key-here"
Contributor:

I would change the URL, API key and LOCAL_AI_URL values to environment variables that are loaded via something like dotenv and a local .env file. While this is fine for local development, developers will need to tidy it up when they move it to production, so let's set the example now.
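A minimal sketch of that change, assuming the variable names from the notebook (`ES_URL`, `ES_API_KEY`, `LOCAL_AI_URL`) and the third-party `python-dotenv` package (the default URLs are placeholders, not the notebook's values):

```python
import os

# Optional during local development: load a .env file into the environment.
# Requires `pip install python-dotenv`; in production the variables should
# come from the real environment (secrets manager, CI, etc.) instead.
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass

ES_URL = os.getenv("ES_URL", "http://localhost:9200")
ES_API_KEY = os.getenv("ES_API_KEY", "")  # keep secrets out of the source
LOCAL_AI_URL = os.getenv("LOCAL_AI_URL", "http://localhost:8080/v1")  # default is a guess
```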

ai_client = OpenAI(base_url=LOCAL_AI_URL, api_key="sk-x")


def build_documents(dataset_folder, index_name):
Contributor:

Should this be called load_documents, as it's opening text files? It's not really building the documents from scratch, which is misleading.
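A sketch of the renamed function (the `_source` field names and the bulk-action shape are assumptions based on the snippets quoted in this review, not confirmed against the notebook):

```python
import os


def load_documents(dataset_folder, index_name):
    """Load the .txt files in dataset_folder as Elasticsearch bulk actions."""
    documents = []
    for filename in sorted(os.listdir(dataset_folder)):
        if not filename.endswith(".txt"):
            continue
        filepath = os.path.join(dataset_folder, filename)
        # utf-8 handles accented and other non-ASCII characters in the dataset
        with open(filepath, "r", encoding="utf-8") as file:
            content = file.read()
        documents.append(
            {"_index": index_name, "_source": {"title": filename, "content": content}}
        )
    return documents
```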

    if filename.endswith(".txt"):
        filepath = os.path.join(dataset_folder, filename)

        with open(filepath, "r", encoding="utf-8") as file:
Contributor:

I would add a comment explaining why you've used utf-8 encoding here.

}


def index_documents():
Contributor:

Add top-level comments for each function explaining what they do.

    start_time = time.time()

    try:
        response = ai_client.chat.completions.create(
Contributor:

I would perhaps add a comment making clear that this is a simple, blocking generation rather than streaming the response token by token.
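If amended, the call could look like this (a sketch; the `generate_answer` wrapper and the model name are placeholders, not the notebook's actual code):

```python
def generate_answer(ai_client, prompt, model="local-model"):
    """Return one complete answer from the chat model."""
    # Simple, blocking generation: the whole response arrives at once.
    # To stream token by token instead, pass stream=True and iterate
    # over the returned chunks.
    response = ai_client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```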

    try:
        start_time = time.time()

        success, _ = helpers.bulk(
Contributor:

You should add the index creation code either here, guarded by a check that the index doesn't already exist, or in a separate utility function. For semantic_text you'll need to specify that mapping when creating the index, and that step is missing here.
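A sketch of what that could look like (the field names and the choice to map `content` as `semantic_text` are assumptions, and the `ensure_index` helper is mine, not the notebook's):

```python
# Assumed mapping: "content" as semantic_text so Elasticsearch handles
# chunking and inference with its default endpoint, "title" as a keyword.
INDEX_MAPPINGS = {
    "properties": {
        "title": {"type": "keyword"},
        "content": {"type": "semantic_text"},
    }
}


def ensure_index(es, index_name):
    """Create the index with the semantic_text mapping if it doesn't exist."""
    if not es.indices.exists(index=index_name):
        es.indices.create(index=index_name, mappings=INDEX_MAPPINGS)
```

This could be called at the start of the bulk-indexing function, before the `helpers.bulk(...)` call.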
