
fix(py): Rename CLI arg for pytest plugin from --output to --langsmith-output to avoid conflict #1482

Merged: 8 commits, Jan 31, 2025
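
The rename matters because `--output` is a generic option string: if any other pytest plugin or a project's `conftest.py` registers the same name, pytest raises `ValueError` for the duplicate option (the reworked `pytest_addoption` below catches exactly that error). A minimal, hypothetical `conftest.py` sketching the kind of clash the rename avoids; the option's purpose and default here are illustrative only:

```python
# conftest.py (hypothetical): a project or another plugin that already
# claims "--output" for its own purposes, e.g. a report path. With the old
# LangSmith flag name, whichever side registered "--output" second would
# trigger pytest's duplicate-option ValueError.
def pytest_addoption(parser):
    parser.addoption(
        "--output",  # same option string the LangSmith plugin used before this PR
        action="store",
        default="report.html",
        help="Where this project writes its own report (illustrative).",
    )
```

With the LangSmith flag renamed to `--langsmith-output`, both registrations can coexist.
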
10 changes: 9 additions & 1 deletion python/README.md
@@ -1,4 +1,4 @@
# LangSmith Client SDK

[![Release Notes](https://img.shields.io/github/release/langchain-ai/langsmith-sdk?logo=python)](https://github.com/langchain-ai/langsmith-sdk/releases)
[![Python Downloads](https://static.pepy.tech/badge/langsmith/month)](https://pepy.tech/project/langsmith)
@@ -345,7 +345,7 @@
my_function("hello world")
```

# Instructor
## Instructor

We provide a convenient integration with [Instructor](https://jxnl.github.io/instructor/), largely by virtue of it essentially just using the OpenAI SDK.

@@ -417,6 +417,14 @@
my_function("Jason is 25 years old")
```

## Pytest Plugin

The LangSmith pytest plugin lets Python developers define their datasets and evaluations as pytest test cases.
See the [online docs](https://docs.smith.langchain.com/evaluation/how_to_guides/pytest) for more information.

The plugin is installed as part of the LangSmith SDK and is enabled by default.
See also the official pytest docs: [How to install and use plugins](https://docs.pytest.org/en/stable/how-to/plugins.html).
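
A minimal sketch of such a test case, adapted from the test file added in this PR (the specific inputs, outputs, and feedback key are illustrative):

```python
import pytest

from langsmith import testing as t


@pytest.mark.langsmith
def test_addition():
    x, y = 3, 4
    t.log_inputs({"x": x, "y": y})
    t.log_reference_outputs({"sum": 7})
    t.log_outputs({"sum": x + y})
    assert x + y == 7
```

Running `pytest --langsmith-output` then renders the richer LangSmith-formatted results; per the plugin code below, this requires the optional `rich` dependency and is not supported together with `pytest-xdist`.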

## Additional Documentation

To learn more about the LangSmith platform, check out the [docs](https://docs.smith.langchain.com/docs/).
51 changes: 30 additions & 21 deletions python/langsmith/pytest_plugin.py
@@ -2,6 +2,7 @@

import importlib.util
import json
import logging
import os
import time
from collections import defaultdict
@@ -12,26 +13,33 @@
from langsmith import utils as ls_utils
from langsmith.testing._internal import test as ls_test

logger = logging.getLogger(__name__)


def pytest_addoption(parser):
    """Set CLI options for choosing output format."""
    group = parser.getgroup("langsmith", "LangSmith")
    group.addoption(
        "--output",
        action="store",
        default="pytest",
        choices=["langsmith", "ls", "pytest"],
        help=(
            "Choose output format: 'langsmith' | 'ls' "
            "(rich custom LangSmith output) or 'pytest' "
            "(standard pytest). Defaults to 'pytest'."
        ),
    )
    """Set a boolean flag for LangSmith output.

    Skip if --langsmith-output is already defined.
    """
    try:
        # Try to add the option, will raise if it already exists
        group = parser.getgroup("langsmith", "LangSmith")
        group.addoption(
            "--langsmith-output",
            action="store_true",
            default=False,
            help="Use LangSmith output (requires 'rich').",
        )
    except ValueError:
        # Option already exists
        logger.warning(
            "LangSmith output flag cannot be added because it's already defined."
        )


def _handle_output_args(args):
    """Handle output arguments."""
    if any(opt in args for opt in ["--output=langsmith", "--output=ls"]):
    if any(opt in args for opt in ["--langsmith-output"]):
        # Only add --quiet if it's not already there
        if not any(a in args for a in ["-q", "--quiet"]):
            args.insert(0, "--quiet")
@@ -82,7 +90,7 @@ def pytest_report_teststatus(report, config):
"""Remove the short test-status character outputs ("./F")."""
# The hook normally returns a 3-tuple: (short_letter, verbose_word, color)
# By returning empty strings, the progress characters won't show.
if config.getoption("--output") in ("langsmith", "ls"):
if config.getoption("--langsmith-output"):
return "", "", ""


@@ -301,23 +309,24 @@ def pytest_configure(config):
    config.addinivalue_line(
        "markers", "langsmith: mark test to be tracked in LangSmith"
    )
    if config.getoption("--output") in ("langsmith", "ls"):
    if config.getoption("--langsmith-output"):
        if not importlib.util.find_spec("rich"):
            msg = (
                "Must have 'rich' installed to use --output='langsmith' | 'ls'. "
                "Must have 'rich' installed to use --langsmith-output. "
                "Please install with: `pip install -U 'langsmith[pytest]'`"
            )
            raise ValueError(msg)
        if os.environ.get("PYTEST_XDIST_TESTRUNUID"):
            msg = (
                "--output='langsmith' | 'ls' not supported with pytest-xdist. "
                "Please remove the '--output' option or '-n' option."
                "--langsmith-output not supported with pytest-xdist. "
                "Please remove the '--langsmith-output' option or '-n' option."
            )
            raise ValueError(msg)
        if ls_utils.test_tracking_is_disabled():
            msg = (
                "--output='langsmith' | 'ls' not supported when env var"
                "LANGSMITH_TEST_TRACKING='false'. Please remove the '--output' option "
                "--langsmith-output not supported when env var "
                "LANGSMITH_TEST_TRACKING='false'. Please remove the "
                "'--langsmith-output' option "
                "or enable test tracking."
            )
            raise ValueError(msg)
83 changes: 83 additions & 0 deletions python/tests/evaluation/test_decorator.py
@@ -0,0 +1,83 @@
import os

import pytest

from langsmith import testing as t


@pytest.mark.skipif(
    not os.getenv("LANGSMITH_TRACING"),
    reason="LANGSMITH_TRACING environment variable not set",
)
@pytest.mark.langsmith
@pytest.mark.parametrize("c", list(range(10)))
async def test_addition_single(c):
    x = 3
    y = 4
    t.log_inputs({"x": x, "y": y, "c": c})

    expected = 7 + c
    t.log_reference_outputs({"sum": expected})

    actual = x + y + c
    t.log_outputs({"sum": actual})

    t.log_feedback(key="foo", score=1)

    assert actual == expected


async def my_app():
    return "hello"


@pytest.mark.skipif(
    not os.getenv("LANGSMITH_TRACING"),
    reason="LANGSMITH_TRACING environment variable not set",
)
@pytest.mark.langsmith
async def test_openai_says_hello():
    # Traced code will be included in the test case
    text = "Say hello!"
    response = await my_app()
    t.log_inputs({"text": text})
    t.log_outputs({"response": response})
    t.log_reference_outputs({"response": "hello!"})

    # Use this context manager to trace any steps used for generating evaluation
    # feedback separately from the main application logic
    with t.trace_feedback():
        grade = 1 if "hello" in response else 0
        t.log_feedback(key="llm_judge", score=grade)

    assert "hello" in response.lower()


@pytest.mark.skipif(
    not os.getenv("LANGSMITH_TRACING"),
    reason="LANGSMITH_TRACING environment variable not set",
)
@pytest.mark.xfail(reason="Test failure output case")
@pytest.mark.langsmith(output_keys=["expected"])
@pytest.mark.parametrize(
    "a, b, expected",
    [
        (1, 2, 3),
        (3, 4, 7),
    ],
)
async def test_addition_parametrized(a: int, b: int, expected: int):
    t.log_outputs({"sum": a + b})
    assert a + b != expected


@pytest.mark.skipif(
    not os.getenv("LANGSMITH_TRACING"),
    reason="LANGSMITH_TRACING environment variable not set",
)
@pytest.mark.langsmith
@pytest.mark.parametrize("a,b", [[i, i] for i in range(20)])
def test_param(a, b):
    t.log_outputs({"sum": a + b})
    t.log_reference_outputs({"sum": a + b})
    assert a + b == a + b