Bench #843
Barebones module created with intended goals before implementation.
Added init and degenerate test for bench module.
Tests implemented for lib_bench_dir function for benchmarking.
Additional comments on env var use.
Langchain can now be downloaded when needed for powerlift integration.
Benchmarking notebook available for JSON output. Also minor changes to tests and API for benchmarking.
Implementation details now included for module docs.
Reformat and removal of function no longer needed for public facing API.
Rename to conform with rest of notebooks with underscores and lower casing.
Can now run benchmarks outside of notebook via tests or direct `bench` call.
Pandas still optional for setup via bench.
Incorrectly added bench to extra list earlier. This should fix it.
Adjusted so that huggingface hub is used to download LMs.
Missing comma within bench tags.
Added missing packages needed for bench. Args clean-up and path refactoring.
Default args.
Removal of no longer needed test.
Some tests need API keys for benchmarking. Disabled.
TODO comment removed for integration.
from guidance.bench._powerlift import retrieve_langchain
from pathlib import Path


def test_retrieve_langchain_err(monkeypatch):
monkeypatch is a fixture?
Yes, it's an in-built fixture within pytest.
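For readers unfamiliar with the fixture, here is a minimal sketch of how `monkeypatch` is typically used. The function `resolve_bench_dir` and the variable name `EXAMPLE_BENCH_DIR` are hypothetical and are not taken from this PR; only `monkeypatch.setenv`, `monkeypatch.delenv` and `tmp_path` are real pytest features.

```python
import os
from pathlib import Path

# Hypothetical environment variable name, used only for this illustration.
BENCH_DIR_ENV = "EXAMPLE_BENCH_DIR"


def resolve_bench_dir(default: Path) -> Path:
    """Return the directory named by BENCH_DIR_ENV, or `default` if it is unset."""
    value = os.environ.get(BENCH_DIR_ENV)
    return Path(value) if value else default


def test_resolve_bench_dir(monkeypatch, tmp_path):
    # pytest injects `monkeypatch` and `tmp_path` because the parameter
    # names match built-in fixtures; nothing needs to be imported for them.
    monkeypatch.setenv(BENCH_DIR_ENV, str(tmp_path))
    assert resolve_bench_dir(Path("fallback")) == tmp_path

    # Removing the variable makes the function fall back to the default.
    monkeypatch.delenv(BENCH_DIR_ENV)
    assert resolve_bench_dir(Path("fallback")) == Path("fallback")
```

When run under pytest, any environment changes made through `monkeypatch` are undone after the test finishes, so the test does not leak state into other tests.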
… unit tests. Argument type fixed in `retrieve_langchain`. Added bench dependencies to plain unit tests.
Optional keywords are now without Optional top-level type. Variables that shadow previous types are adjusted. Ignore on powerlift module due to imports having no stubs / markers.
`lib_bench_dir` was not assigning correctly to env_lib_path.
No module named 'pkg_resources' is an error for CI so far. Added setuptools as dependency for bench to attempt to correct this.
Extra tag added for all tests for bench.
Codecov Report
Attention: Patch coverage is …

@@            Coverage Diff             @@
##             main     #843      +/-   ##
==========================================
+ Coverage   59.50%   59.87%   +0.37%
==========================================
  Files          59       63       +4
  Lines        4334     4571     +237
==========================================
+ Hits         2579     2737     +158
- Misses       1755     1834      +79
Uninstalls first, then installs specific version of llama-cpp-python in case the bench tag has installed it already.
Set -y to uninstall to remove interactivity.
LGTM
First iteration of adding benchmarks to guidance.
Includes notebook and backing code in the `guidance.bench` module for code reproducibility.

`guidance` is tested on LangChain's Chat Extract dataset. They've done solid work finding a problem with realistic structured JSON output that includes conditionals, nested fields and constraints when checking for JSON schema validation as well.

Dependencies are hidden behind an extra tag `bench`. This shouldn't impact standard installations.

Test coverage should be high; however, I've skipped some tests here as the CI won't be able to run them without an API key to LangChain.
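One way to express "skip when the optional extra or the API key is missing" is sketched below. It is an illustration only, not the code in this PR: the helper `missing_requirements` and the variable name `EXAMPLE_API_KEY` are hypothetical.

```python
import importlib.util
import os


def missing_requirements(modules, env_vars):
    """List human-readable reasons why a benchmark test cannot run."""
    reasons = []
    for name in modules:
        try:
            # find_spec returns None when a top-level module is not installed.
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            reasons.append(f"module '{name}' is not installed")
    for var in env_vars:
        # An unset or empty variable counts as missing.
        if not os.environ.get(var):
            reasons.append(f"environment variable '{var}' is not set")
    return reasons


# Example call; "EXAMPLE_API_KEY" is a placeholder name, not a real setting.
reasons = missing_requirements(["pandas"], ["EXAMPLE_API_KEY"])
```

In a pytest suite the result could feed `pytest.mark.skipif(bool(reasons), reason="; ".join(reasons))`, so the affected tests are reported as skipped rather than failing on CI.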
Code is structured to work across multiple GPU containers, but not fully integrated yet. Will have to work on guidance dockerfile later for that.
LMK if more details / changes needed.