Bench #843

Merged: 30 commits into guidance-ai:main from the bench branch, May 21, 2024

nopdive (Collaborator) commented May 21, 2024

First iteration of adding benchmarks to guidance.

Includes a notebook and backing code in the `guidance.bench` module for reproducibility.
guidance is tested on LangChain's Chat Extract dataset. The LangChain team has done solid work finding a realistic structured JSON output problem whose schema includes conditionals, nested fields, and constraints, all of which are checked during JSON schema validation.

Dependencies are gated behind the `bench` extra, so standard installations shouldn't be affected.
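
For context, one common way to keep an extra's dependencies out of the default install path is to import them lazily and fail with an install hint. The following is a minimal sketch of that pattern, not code from this PR: the helper name `require_optional` and the exact install command in the error message are assumptions for illustration.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str = "bench") -> ModuleType:
    """Import an optional dependency, or raise with an install hint.

    Hypothetical helper: the name and the hint text are illustrative
    assumptions, not part of guidance.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Assumed install command, based on the extra being named `bench`.
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install 'guidance[{extra}]'"
        ) from exc
```

Calling `require_optional("pandas")` inside a benchmarking function, rather than importing at module top level, would keep a plain install importable even when the extra is absent.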

Test coverage should be high; however, I've skipped some tests here because CI can't run them without a LangChain API key.
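
As an illustration of skipping tests when a credential is absent, here is a minimal sketch using the standard library's `unittest.skipUnless`. It is not the mechanism used in this PR (the commits only say the tests were disabled), and the environment variable name `EXAMPLE_BENCH_API_KEY` is a hypothetical placeholder. In a pytest suite, `pytest.mark.skipif` plays the same role.

```python
import os
import unittest

# Hypothetical variable name, used only for this sketch.
API_KEY_VAR = "EXAMPLE_BENCH_API_KEY"


def has_api_key() -> bool:
    """Return True when the (hypothetical) API key is set and non-empty."""
    return bool(os.environ.get(API_KEY_VAR))


class BenchSmokeTest(unittest.TestCase):
    # The condition is evaluated once, when the class body is executed.
    @unittest.skipUnless(has_api_key(), f"{API_KEY_VAR} is not set")
    def test_needs_remote_dataset(self):
        # Placeholder for a test that would contact a remote service.
        self.assertTrue(has_api_key())

    def test_runs_everywhere(self):
        # Tests without external requirements still run in CI.
        self.assertEqual(1 + 1, 2)


def run_suite() -> unittest.TestResult:
    """Run the example tests and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(BenchSmokeTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Skipped tests are reported as skipped rather than silently removed, which keeps the gap visible in CI output.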

Code is structured to work across multiple GPU containers, but that isn't fully integrated yet. I'll have to work on the guidance Dockerfile later for that.

LMK if more details / changes needed.

nopdive added 20 commits May 16, 2024 01:03
Barebones module created with intended goals before implementation.
Added init and degenerate test for bench module.
Tests implemented for lib_bench_dir function for benchmarking.
Additional comments on env var use.
Langchain can now be downloaded when needed for powerlift integration.
Benchmarking notebook available for JSON output. Also minor changes to
tests and API for benchmarking.
Implementation details now included for module docs.
Reformat and removal of function no longer needed for public facing API.
Rename to conform with rest of notebooks with underscores and lower
casing.
Can now run benchmarks outside of notebook via tests or direct `bench`
call.
Pandas still optional for setup via bench.
Incorrectly added bench to extra list earlier. This should fix it.
Adjusted so that huggingface hub is used to download LMs.
Missing comma within bench tags.
Added missing packages needed for bench. Args clean-up and path
refactoring.
Removal of no longer needed test.
Some tests need API keys for benchmarking. Disabled.
TODO comment removed for integration.
from guidance.bench._powerlift import retrieve_langchain
from pathlib import Path

def test_retrieve_langchain_err(monkeypatch):

Collaborator

monkeypatch is a fixture?

Collaborator Author

Yes, it's a built-in pytest fixture.
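
To illustrate the point, pytest injects `monkeypatch` into any test function that names it as a parameter, and undoes its changes when the test finishes. A minimal sketch follows; `read_api_key` and the variable name `EXAMPLE_API_KEY` are hypothetical stand-ins, not the `retrieve_langchain` test from this PR.

```python
import os

# Hypothetical variable name, used only for this sketch.
ENV_VAR = "EXAMPLE_API_KEY"


def read_api_key() -> str:
    """Return the API key from the environment, or raise if it is missing."""
    key = os.environ.get(ENV_VAR)
    if not key:
        raise ValueError(f"{ENV_VAR} is not set")
    return key


def test_read_api_key_missing(monkeypatch):
    # pytest supplies `monkeypatch`; delenv is reverted after the test.
    monkeypatch.delenv(ENV_VAR, raising=False)
    # `pytest.raises` is the usual idiom; try/except keeps this sketch
    # free of a pytest import.
    try:
        read_api_key()
    except ValueError:
        return
    raise AssertionError("expected ValueError")


def test_read_api_key_present(monkeypatch):
    # setenv is also reverted automatically at teardown.
    monkeypatch.setenv(ENV_VAR, "dummy")
    assert read_api_key() == "dummy"
```

Because the fixture restores the environment afterwards, tests like these do not leak state into each other.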

… unit tests.

Argument type fixed in `retrieve_langchain`. Added bench dependencies to
plain unit tests.
@nopdive nopdive marked this pull request as draft May 21, 2024 03:13
Optional keywords are now without Optional top-level type. Variables
that shadow previous types are adjusted. Ignore on powerlift module
due to imports having no stubs / markers.
`lib_bench_dir` was not assigning correctly to env_lib_path.
No module named 'pkg_resources' is an error for CI so far.
Added setuptools as dependency for bench to attempt to correct this.
Extra tag added for all tests for bench.
codecov-commenter commented May 21, 2024

Codecov Report

Attention: Patch coverage is 16.87764%, with 197 lines in your changes missing coverage. Please review.

Project coverage is 59.87%. Comparing base (3377383) to head (00e3a8c).
Report is 1 commit behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| guidance/bench/_powerlift.py | 10.59% | 194 Missing ⚠️ |
| guidance/bench/_api.py | 66.66% | 3 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #843      +/-   ##
==========================================
+ Coverage   59.50%   59.87%   +0.37%     
==========================================
  Files          59       63       +4     
  Lines        4334     4571     +237     
==========================================
+ Hits         2579     2737     +158     
- Misses       1755     1834      +79     


nopdive and others added 2 commits May 21, 2024 05:18
Uninstalls first, then installs specific version of
llama-cpp-python in case the bench tag has installed it already.
@nopdive nopdive marked this pull request as ready for review May 21, 2024 05:54
Set -y to uninstall to remove interactivity.
@Harsha-Nori
Collaborator

LGTM

@Harsha-Nori Harsha-Nori merged commit 0f15e4b into guidance-ai:main May 21, 2024
121 checks passed
@nopdive nopdive deleted the bench branch May 21, 2024 08:12