Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Generate problem support for title variable #2310

Merged
merged 1 commit into from
Feb 18, 2025
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion apps/dataset/task/generate.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,8 @@ def generate_problem_by_paragraph(paragraph, llm_model, prompt):
try:
ListenerManagement.update_status(QuerySet(Paragraph).filter(id=paragraph.id), TaskType.GENERATE_PROBLEM,
State.STARTED)
res = llm_model.invoke([HumanMessage(content=prompt.replace('{data}', paragraph.content))])
res = llm_model.invoke(
[HumanMessage(content=prompt.replace('{data}', paragraph.content).replace('{title}', paragraph.title))])
if (res.content is None) or (len(res.content) == 0):
return
problems = res.content.split('\n')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The given code has two main issues:

  1. Multiple replace functions: The original prompt.replace('{data}', paragraph.content) replaces all occurrences of the placeholder {data} with the paragraph's content. However, if the input prompt also contains a placeholder for '{title}', this line will replace both placeholders unnecessarily.

    # Incorrect usage due to double replacement
    Prompt = "I want you to generate problems based on {data} ({title})."
    Paragraph_content = "Sample Content"
    Title = "Sample Title"
    
    prompt_with_title = Prompt.replace('{data}', Paragraph_content)  # This replaces '{data}'
    final_prompt = prompt_with_title.replace('{title}', Title)       # This replaces '{title}' again

    Instead, separate replacements should be done to avoid unnecessary replacements and make it clearer that each variable serves a specific purpose when generating text.

  2. String formatting in Python: If there is an intention to format strings using variables from paragraph.title, consider using named parameters instead of keyword arguments when creating objects like HumanMessage. In many languages like Python and some others, using named parameters can help improve readability by clearly indicating which part of the string corresponds to which parameter value.

    HumanMessage(content=prompt.replace('{data}', paragraph.content))

    becomes:

    HumanMessage(content='{originalPrompt}'.format(originalPrompt=prompt.format(data=paragraph.content)))

Here’s how these improvements could look after addressing the first issue alone:

Modified Code Block

try:
    ListenerManagement.update_status(QuerySet(Paragraph).filter(id=paragraph.id), TaskType.GENERATE_PROBLEM,
                                     State.STARTED)
    
    prompts_list = [
        {"message": prompt.replace('{data}', paragraph.content)},
        {"message": prompt.replace('{title}', paragraph.title)}
    ]

    for prompt_dict in prompts_list:
        res = llm_model.invoke([HumanMessage(**prompt_dict)])
    
        if (res.content is None) or (len(res.content) == 0):
                return
    
        problems = res.content.split('\n')

These changes ensure that each message generated uses only relevant placeholders defined in its respective dictionary key.

Expand Down