Bench #843

nopdive · 2024-05-21T02:01:19Z

First iteration of adding benchmarks to guidance.

Includes a notebook and backing code in the `guidance.bench` module for reproducibility.
guidance is benchmarked on LangChain's Chat Extract dataset. They've done solid work finding a problem with realistic structured JSON output that includes conditionals, nested fields, and constraints that are checked through JSON schema validation.

Dependencies are hidden behind the extra tag `bench`, so this shouldn't impact standard installations.

Test coverage should be high; however, I've skipped some tests here because the CI can't run them without a LangChain API key.

Code is structured to work across multiple GPU containers, but this is not fully integrated yet. I will have to work on a guidance Dockerfile later for that.

LMK if more details / changes are needed.


riedgar-ms · 2024-05-21T02:32:35Z

tests/bench/test_powerlift.py

+from guidance.bench._powerlift import retrieve_langchain
+from pathlib import Path
+
+def test_retrieve_langchain_err(monkeypatch):


monkeypatch is a fixture?

Yes, it's an in-built fixture within pytest.
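
For readers unfamiliar with the fixture, here is a minimal sketch of how pytest injects `monkeypatch` by argument name and how it can simulate a failure without touching the network. `fake_client`, `fetch_dataset`, and the dataset name are invented for illustration; this is not the actual `retrieve_langchain` test.

```python
import types

# Hypothetical stand-in for a client module that would download a dataset.
fake_client = types.SimpleNamespace(download=lambda name: f"data for {name}")


def fetch_dataset(name: str) -> str:
    """Hypothetical wrapper that turns connection failures into ValueError."""
    try:
        return fake_client.download(name)
    except ConnectionError as exc:
        raise ValueError(f"could not retrieve {name}") from exc


def test_fetch_dataset_err(monkeypatch):
    # pytest supplies `monkeypatch` because the argument name matches the built-in fixture.
    def offline(name):
        raise ConnectionError("offline")

    # Under pytest, the patch is undone automatically when the test finishes.
    monkeypatch.setattr(fake_client, "download", offline)

    try:
        fetch_dataset("example-dataset")
    except ValueError as err:
        assert "example-dataset" in str(err)
    else:
        raise AssertionError("expected ValueError")
```

No import is needed for the fixture itself; running `pytest` on a file containing this test is enough.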


codecov-commenter · 2024-05-21T05:17:31Z

Codecov Report

Attention: Patch coverage is 16.87764%, with 197 lines in your changes missing coverage. Please review.

Project coverage is 59.87%. Comparing base (3377383) to head (00e3a8c).
Report is 1 commit behind head on main.

Files	Patch %	Lines
guidance/bench/_powerlift.py	10.59%	194 Missing ⚠️
guidance/bench/_api.py	66.66%	3 Missing ⚠️
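
As a sanity check on the figures above, the size of the patch can be recovered from the reported percentage and the missing-line count. This is a small arithmetic sketch using only numbers quoted in the report; the variable names are ours.

```python
# Figures copied from the report above.
patch_pct = 16.87764  # reported patch coverage, in percent
missing = 197         # lines reported as missing coverage

# missing = total * (1 - pct / 100), so the total follows by division.
total = round(missing / (1 - patch_pct / 100))
covered = total - missing

print(total, covered)                   # 237 40
print(round(100 * covered / total, 5))  # 16.87764
```

The recovered total of 237 lines agrees with the `+237` line delta shown in the coverage diff.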


Additional details and impacted files

@@            Coverage Diff             @@
##             main     #843      +/-   ##
==========================================
+ Coverage   59.50%   59.87%   +0.37%     
==========================================
  Files          59       63       +4     
  Lines        4334     4571     +237     
==========================================
+ Hits         2579     2737     +158     
- Misses       1755     1834      +79


Harsha-Nori · 2024-05-21T08:03:17Z

LGTM

nopdive added 20 commits May 16, 2024 01:03

Adding bench module within guidance.

018556b

Barebones module created with intended goals before implementation.

Boilerplate test code for bench module.

aac3d8e

Added init and degenerate test for bench module.

Tests for util function lib_bench_dir.

5a45464

Tests implemented for lib_bench_dir function for benchmarking.

Doc clean-up for lib_bench_dir function.

e1d819c

Additional comments on env var use.

Added code to retrieve langchain chat extraction dataset.

73c0840

Langchain can now be downloaded when needed for powerlift integration.

Added benchmarking notebook for JSON and needed functions.

f7ad755

Benchmarking notebook available for JSON output. Also minor changes to tests and API for benchmarking.

Updated overview documentation for benchmarking module.

f491601

Implementation details now included for module docs.

Black reformat and dead code removal.

fca723e

Reformat and removal of function no longer needed for public facing API.

Rename of json output benchmark.

e76b050

Rename to conform with rest of notebooks with underscores and lower casing.

API defined and implemented for benchmarking.

3438461

Can now run benchmarks outside of notebook via tests or direct `bench` call.

Adjustments to API types so pandas is not needed.

7f6a662

Pandas still optional for setup via bench.

Adjustment to setup.py for benchmarking extra.

d624b08

Incorrectly added bench to extra list earlier. This should fix it.

Trial runner now downloads models from huggingface.

549989d

Adjusted so that huggingface hub is used to download LMs.

Fixed typo in setup.py for bench tag.

3826e7b

Missing comma within bench tags.
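
For context on why this kind of typo slips through: a missing comma in a Python list of strings raises no error, because adjacent string literals are concatenated into one. The sketch below shows the effect with placeholder package names; it is not the actual `setup.py` content.

```python
# Package names are placeholders, not the real bench dependencies.
broken = [
    "package-a"
    "package-b",  # the comma is missing after "package-a" on the line above
    "package-c",
]

fixed = [
    "package-a",
    "package-b",
    "package-c",
]

print(broken)  # ['package-apackage-b', 'package-c']
print(fixed)   # ['package-a', 'package-b', 'package-c']
```

The broken list silently yields one nonsensical requirement instead of two valid ones, so the failure only shows up at install time.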

Additional dependencies for bench. Clean-up on bench module.

b92dad3

Added missing packages needed for bench. Args clean-up and path refactoring.

Black reformat on bench module.

13139c1

Default args.

Clean-up of benchmark tests.

6db891b

Removal of no longer needed test.

Merge branch 'main' into bench

e7a0673

Skip tests that cannot be run on CI.

9000efc

Some tests need API keys for benchmarking. Disabled.

Removal of TODO in benchmark module.

da0e086

TODO comment removed for integration.

riedgar-ms reviewed May 21, 2024


Fixed types retrieving langchain dataset. Added bench dependencies to unit tests.

86ac34d

Argument type fixed in `retrieve_langchain`. Added bench dependencies to plain unit tests.

nopdive marked this pull request as draft May 21, 2024 03:13

nopdive added 4 commits May 21, 2024 04:31

Various type adjustments to reduce mypy errors.

83c8345

Optional keywords are now without Optional top-level type. Variables that shadow previous types are adjusted. Ignore on powerlift module due to imports having no stubs / markers.

Fix for lib_bench_dir.

b7bdb80

`lib_bench_dir` was not assigning correctly to env_lib_path.
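
To illustrate the general pattern of an environment-variable override for a library directory, here is a hedged sketch. The function name `bench_dir_sketch` and the variable name `EXAMPLE_BENCH_DIR` are hypothetical; the real `lib_bench_dir` signature, its environment variable, and the exact bug fixed here are not shown in this thread.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical variable name, used only for this sketch.
ENV_VAR = "EXAMPLE_BENCH_DIR"


def bench_dir_sketch(default: Path) -> Path:
    """Return the benchmark directory, preferring the environment override."""
    env_lib_path = os.environ.get(ENV_VAR)
    # The chosen value must end up in the variable that is returned;
    # otherwise the override is read but silently ignored.
    lib_path = Path(env_lib_path) if env_lib_path else default
    lib_path.mkdir(parents=True, exist_ok=True)
    return lib_path


# Usage: the override wins when the variable is set.
with tempfile.TemporaryDirectory() as tmp:
    os.environ[ENV_VAR] = str(Path(tmp) / "override")
    print(bench_dir_sketch(Path(tmp) / "default"))
    del os.environ[ENV_VAR]
```

A test for a helper like this can set and clear the variable with pytest's `monkeypatch.setenv` and `monkeypatch.delenv` rather than mutating `os.environ` directly.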

Additional dependency for setup.py on bench extra.

d7d76fd

No module named 'pkg_resources' is an error for CI so far. Added setuptools as dependency for bench to attempt to correct this.

Added benchmarking to all test runners.

246bc6d

Extra tag added for all tests for bench.

nopdive and others added 2 commits May 21, 2024 05:18

Adjustment to github workflow for llama-cpp-python install.

149752c

Uninstalls first, then installs specific version of llama-cpp-python in case the bench tag has installed it already.

Merge branch 'guidance-ai:main' into bench

dfd0660

nopdive added 2 commits May 21, 2024 05:23

Merge branch 'main' into bench

15fef70

Merge branch 'bench' of github.com:nopdive/guidance into bench

6943b01

nopdive marked this pull request as ready for review May 21, 2024 05:54

Fix for llama-cpp-python in CI.

00e3a8c

Set -y to uninstall to remove interactivity.

Harsha-Nori merged commit 0f15e4b into guidance-ai:main May 21, 2024
121 checks passed

nopdive deleted the bench branch May 21, 2024 08:12
