-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prompt Experimentation and UI updates #36
Conversation
Co-authored-by: Aakanksha Duggal <aduggal@redhat.com>
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Thanks @oindrillac
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few minor comments, but overall it looks great 🎉
We can summarize the findings and results from the prompt_experiments.ipynb
into our final report/slide deck and share it with the team 😄
"**LangChain evaluation on grammar, descriptiveness and helpfulness:**", | ||
help="Use Langchain to evaluate on cutsom criteria (this list can be updated based on what we are looking to see from the generated docs" | ||
"**LLM based evaluation on logic, correctness and helpfulness:**", | ||
help="Use Langchain Criteria based Eval to evaluate on cutsom criteria (this list can be updated based on what we are looking to see from the generated docs). Note this is language mo0del based evaluation and not always a true indication of the quality of the output that is generatged." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small nit fix: Note this is language mo0del....
should be Note this is language model....
and ...output this is generatged
should be ....output that is generated.
@@ -0,0 +1,3141 @@ | |||
,model,prompt,code_file,part,response,langchain_helpfulness,langchain_correctness,langchain_logical,instruction,total_langchain_score |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
repo structuring change: We should perhaps move all the CSV files and the eval_df.pkl
file into a separate folder: /evaluation/data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't review this notebook on Reviewnb and the file diff was too large to leave comments line by line, so summarizing my comments here:
- Include a small description at the beginning of the notebook to explain what the notebook is about
- Since the
get_response()
andappend_row_to_dataframe()
functions are already defined in thehelper_functions.ipynb
, maybe you can invoke the function from there rather than defining the function again in this notebook? - Small nit fix: in the
Conclusion
section, I think point 3 needs to be rephrased from :When he granite model it fails, it fails becuase half....
toWhen the granite model fails, it fails because half of the time....
As discussed, we will be addressing the comments in a separate PR. Merging this for now. /lgtm |
No description provided.