LLM Obs SDK evaluation metrics submission #8688
Hi! 👋 Looks like you updated a Git Submodule.
d7d8e6a to 53386c1
Benchmarks
Startup
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 47 metrics, 6 unstable metrics.
Startup time reports for insecure-bank
gantt
title insecure-bank - global startup overhead: candidate=1.51.0-SNAPSHOT~a8ef30deee, baseline=1.51.0-SNAPSHOT~860a603678
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (995.804 ms) : 0, 995804
Total [baseline] (8.599 s) : 0, 8598983
Agent [candidate] (995.505 ms) : 0, 995505
Total [candidate] (8.575 s) : 0, 8575123
section iast
Agent [baseline] (1.135 s) : 0, 1135314
Total [baseline] (9.33 s) : 0, 9329655
Agent [candidate] (1.132 s) : 0, 1132229
Total [candidate] (9.287 s) : 0, 9287133
gantt
title insecure-bank - break down per module: candidate=1.51.0-SNAPSHOT~a8ef30deee, baseline=1.51.0-SNAPSHOT~860a603678
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (687.09 ms) : 0, 687090
BytebuddyAgent [candidate] (687.494 ms) : 0, 687494
GlobalTracer [baseline] (242.328 ms) : 0, 242328
GlobalTracer [candidate] (242.103 ms) : 0, 242103
AppSec [baseline] (30.506 ms) : 0, 30506
AppSec [candidate] (30.115 ms) : 0, 30115
Debugger [baseline] (6.084 ms) : 0, 6084
Debugger [candidate] (6.056 ms) : 0, 6056
Remote Config [baseline] (674.963 µs) : 0, 675
Remote Config [candidate] (684.763 µs) : 0, 685
Telemetry [baseline] (8.275 ms) : 0, 8275
Telemetry [candidate] (8.24 ms) : 0, 8240
section iast
BytebuddyAgent [baseline] (809.561 ms) : 0, 809561
BytebuddyAgent [candidate] (807.593 ms) : 0, 807593
GlobalTracer [baseline] (232.674 ms) : 0, 232674
GlobalTracer [candidate] (231.726 ms) : 0, 231726
IAST [baseline] (29.642 ms) : 0, 29642
IAST [candidate] (24.667 ms) : 0, 24667
AppSec [baseline] (28.102 ms) : 0, 28102
AppSec [candidate] (32.985 ms) : 0, 32985
Debugger [baseline] (5.913 ms) : 0, 5913
Debugger [candidate] (5.826 ms) : 0, 5826
Remote Config [baseline] (585.206 µs) : 0, 585
Remote Config [candidate] (582.477 µs) : 0, 582
Telemetry [baseline] (8.049 ms) : 0, 8049
Telemetry [candidate] (7.988 ms) : 0, 7988
Startup time reports for petclinic
gantt
title petclinic - global startup overhead: candidate=1.51.0-SNAPSHOT~a8ef30deee, baseline=1.51.0-SNAPSHOT~860a603678
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (996.676 ms) : 0, 996676
Total [baseline] (10.721 s) : 0, 10720882
Agent [candidate] (995.911 ms) : 0, 995911
Total [candidate] (10.622 s) : 0, 10621746
section appsec
Agent [baseline] (1.178 s) : 0, 1177512
Total [baseline] (10.755 s) : 0, 10755346
Agent [candidate] (1.178 s) : 0, 1177850
Total [candidate] (10.799 s) : 0, 10799023
section iast
Agent [baseline] (1.132 s) : 0, 1132131
Total [baseline] (10.875 s) : 0, 10875425
Agent [candidate] (1.134 s) : 0, 1133900
Total [candidate] (10.829 s) : 0, 10829108
section profiling
Agent [baseline] (1.246 s) : 0, 1245905
Total [baseline] (10.921 s) : 0, 10921370
Agent [candidate] (1.253 s) : 0, 1253041
Total [candidate] (11.04 s) : 0, 11039709
gantt
title petclinic - break down per module: candidate=1.51.0-SNAPSHOT~a8ef30deee, baseline=1.51.0-SNAPSHOT~860a603678
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (688.53 ms) : 0, 688530
BytebuddyAgent [candidate] (687.669 ms) : 0, 687669
GlobalTracer [baseline] (242.07 ms) : 0, 242070
GlobalTracer [candidate] (242.302 ms) : 0, 242302
AppSec [baseline] (30.206 ms) : 0, 30206
AppSec [candidate] (30.212 ms) : 0, 30212
Debugger [baseline] (6.114 ms) : 0, 6114
Debugger [candidate] (6.074 ms) : 0, 6074
Remote Config [baseline] (671.864 µs) : 0, 672
Remote Config [candidate] (682.772 µs) : 0, 683
Telemetry [baseline] (8.234 ms) : 0, 8234
Telemetry [candidate] (8.189 ms) : 0, 8189
section appsec
BytebuddyAgent [baseline] (711.779 ms) : 0, 711779
BytebuddyAgent [candidate] (711.799 ms) : 0, 711799
GlobalTracer [baseline] (235.588 ms) : 0, 235588
GlobalTracer [candidate] (235.809 ms) : 0, 235809
AppSec [baseline] (171.47 ms) : 0, 171470
AppSec [candidate] (171.733 ms) : 0, 171733
Debugger [baseline] (5.762 ms) : 0, 5762
Debugger [candidate] (5.765 ms) : 0, 5765
Remote Config [baseline] (586.404 µs) : 0, 586
Remote Config [candidate] (602.991 µs) : 0, 603
Telemetry [baseline] (8.056 ms) : 0, 8056
Telemetry [candidate] (8.046 ms) : 0, 8046
IAST [baseline] (23.414 ms) : 0, 23414
IAST [candidate] (23.21 ms) : 0, 23210
section iast
BytebuddyAgent [baseline] (807.359 ms) : 0, 807359
BytebuddyAgent [candidate] (808.083 ms) : 0, 808083
GlobalTracer [baseline] (232.082 ms) : 0, 232082
GlobalTracer [candidate] (232.59 ms) : 0, 232590
AppSec [baseline] (29.697 ms) : 0, 29697
AppSec [candidate] (31.631 ms) : 0, 31631
Debugger [baseline] (5.822 ms) : 0, 5822
Debugger [candidate] (5.824 ms) : 0, 5824
Remote Config [baseline] (570.956 µs) : 0, 571
Remote Config [candidate] (575.431 µs) : 0, 575
Telemetry [baseline] (7.94 ms) : 0, 7940
Telemetry [candidate] (7.966 ms) : 0, 7966
IAST [baseline] (27.807 ms) : 0, 27807
IAST [candidate] (26.347 ms) : 0, 26347
section profiling
ProfilingAgent [baseline] (104.474 ms) : 0, 104474
ProfilingAgent [candidate] (103.989 ms) : 0, 103989
BytebuddyAgent [baseline] (677.959 ms) : 0, 677959
BytebuddyAgent [candidate] (683.174 ms) : 0, 683174
GlobalTracer [baseline] (361.861 ms) : 0, 361861
GlobalTracer [candidate] (363.751 ms) : 0, 363751
AppSec [baseline] (32.257 ms) : 0, 32257
AppSec [candidate] (30.998 ms) : 0, 30998
Debugger [baseline] (12.231 ms) : 0, 12231
Debugger [candidate] (13.546 ms) : 0, 13546
Remote Config [baseline] (660.292 µs) : 0, 660
Remote Config [candidate] (661.079 µs) : 0, 661
Telemetry [baseline] (7.974 ms) : 0, 7974
Telemetry [candidate] (8.057 ms) : 0, 8057
Profiling [baseline] (104.499 ms) : 0, 104499
Profiling [candidate] (104.013 ms) : 0, 104013
Load
Summary
Found 1 performance improvements and 2 performance regressions! Performance is the same for 9 metrics, 12 unstable metrics.
Request duration reports for insecure-bank
gantt
title insecure-bank - request duration [CI 0.99] : candidate=1.51.0-SNAPSHOT~a8ef30deee, baseline=1.51.0-SNAPSHOT~860a603678
dateFormat X
axisFormat %s
section baseline
no_agent (4.427 ms) : 4377, 4477
. : milestone, 4427,
iast (8.906 ms) : 8754, 9057
. : milestone, 8906,
iast_FULL (13.867 ms) : 13591, 14144
. : milestone, 13867,
iast_GLOBAL (10.207 ms) : 10007, 10407
. : milestone, 10207,
profiling (8.487 ms) : 8350, 8624
. : milestone, 8487,
tracing (7.609 ms) : 7494, 7724
. : milestone, 7609,
section candidate
no_agent (4.376 ms) : 4318, 4433
. : milestone, 4376,
iast (8.867 ms) : 8726, 9009
. : milestone, 8867,
iast_FULL (13.763 ms) : 13492, 14034
. : milestone, 13763,
iast_GLOBAL (10.82 ms) : 10630, 11010
. : milestone, 10820,
profiling (8.84 ms) : 8698, 8981
. : milestone, 8840,
tracing (7.416 ms) : 7312, 7520
. : milestone, 7416,
Request duration reports for petclinic
gantt
title petclinic - request duration [CI 0.99] : candidate=1.51.0-SNAPSHOT~a8ef30deee, baseline=1.51.0-SNAPSHOT~860a603678
dateFormat X
axisFormat %s
section baseline
no_agent (37.319 ms) : 37021, 37617
. : milestone, 37319,
appsec (49.515 ms) : 49071, 49958
. : milestone, 49515,
code_origins (44.428 ms) : 44064, 44793
. : milestone, 44428,
iast (45.498 ms) : 45097, 45899
. : milestone, 45498,
profiling (50.442 ms) : 49942, 50942
. : milestone, 50442,
tracing (44.9 ms) : 44515, 45285
. : milestone, 44900,
section candidate
no_agent (38.248 ms) : 37936, 38561
. : milestone, 38248,
appsec (49.418 ms) : 48982, 49854
. : milestone, 49418,
code_origins (44.793 ms) : 44412, 45174
. : milestone, 44793,
iast (45.201 ms) : 44794, 45609
. : milestone, 45201,
profiling (46.931 ms) : 46496, 47366
. : milestone, 46931,
tracing (44.819 ms) : 44443, 45194
. : milestone, 44819,
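One way to read the request duration charts above is to check whether the baseline and candidate 99% confidence intervals overlap. The sketch below is only an illustration of that reading: the report does not state which criterion the benchmark tooling applies when it counts improvements and regressions, and the class and method names (`CiOverlap`, `classify`) are hypothetical. The interval bounds are copied from the charts and appear to be in microseconds.

```java
public class CiOverlap {
    // Compares a baseline interval with a candidate interval (same unit for all bounds).
    // Returns "slower" when the candidate interval lies entirely above the baseline,
    // "faster" when it lies entirely below, and "overlap" otherwise.
    public static String classify(long baseLow, long baseHigh, long candLow, long candHigh) {
        if (candLow > baseHigh) {
            return "slower";
        }
        if (candHigh < baseLow) {
            return "faster";
        }
        return "overlap";
    }

    public static void main(String[] args) {
        // Bounds taken from the insecure-bank and petclinic request duration charts.
        System.out.println("insecure-bank iast_GLOBAL: " + classify(10007, 10407, 10630, 11010));
        System.out.println("insecure-bank tracing: " + classify(7494, 7724, 7312, 7520));
        System.out.println("petclinic profiling: " + classify(49942, 50942, 46496, 47366));
    }
}
```

Non-overlapping intervals are a conservative signal; intervals that overlap can still hide a real difference, so this check should not be treated as equivalent to the summary line.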
Dacapo
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 12 metrics, 0 unstable metrics.
Execution time for biojava
gantt
title biojava - execution time [CI 0.99] : candidate=1.51.0-SNAPSHOT~a8ef30deee, baseline=1.51.0-SNAPSHOT~860a603678
dateFormat X
axisFormat %s
section baseline
no_agent (14.915 s) : 14915000, 14915000
. : milestone, 14915000,
appsec (14.774 s) : 14774000, 14774000
. : milestone, 14774000,
iast (18.698 s) : 18698000, 18698000
. : milestone, 18698000,
iast_GLOBAL (17.829 s) : 17829000, 17829000
. : milestone, 17829000,
profiling (15.702 s) : 15702000, 15702000
. : milestone, 15702000,
tracing (14.612 s) : 14612000, 14612000
. : milestone, 14612000,
section candidate
no_agent (15.374 s) : 15374000, 15374000
. : milestone, 15374000,
appsec (14.682 s) : 14682000, 14682000
. : milestone, 14682000,
iast (18.242 s) : 18242000, 18242000
. : milestone, 18242000,
iast_GLOBAL (17.726 s) : 17726000, 17726000
. : milestone, 17726000,
profiling (15.129 s) : 15129000, 15129000
. : milestone, 15129000,
tracing (14.964 s) : 14964000, 14964000
. : milestone, 14964000,
Execution time for tomcat
gantt
title tomcat - execution time [CI 0.99] : candidate=1.51.0-SNAPSHOT~a8ef30deee, baseline=1.51.0-SNAPSHOT~860a603678
dateFormat X
axisFormat %s
section baseline
no_agent (1.475 ms) : 1464, 1487
. : milestone, 1475,
appsec (2.41 ms) : 2360, 2460
. : milestone, 2410,
iast (2.197 ms) : 2134, 2259
. : milestone, 2197,
iast_GLOBAL (2.231 ms) : 2168, 2294
. : milestone, 2231,
profiling (2.032 ms) : 1982, 2082
. : milestone, 2032,
tracing (2.01 ms) : 1961, 2059
. : milestone, 2010,
section candidate
no_agent (1.475 ms) : 1464, 1487
. : milestone, 1475,
appsec (2.398 ms) : 2348, 2447
. : milestone, 2398,
iast (2.178 ms) : 2116, 2241
. : milestone, 2178,
iast_GLOBAL (2.236 ms) : 2173, 2299
. : milestone, 2236,
profiling (2.036 ms) : 1986, 2086
. : milestone, 2036,
tracing (2.007 ms) : 1959, 2055
. : milestone, 2007,
53386c1 to 573416d
040efd0 to f78b2c5
573416d to bb6d246
bb6d246 to a74e456
…g provided in the override
* add APIs for llm obs
* add llm message class to support llm spans
* add llm message class to support llm spans
* impl llmobs agent and llmobs apis
* support llm messages with tool calls
* handle default model name and provider
* rm unneeded file
* impl llmobs agent and llmobs apis
* impl llmobs agent
* working writer
* add support for llm message and tool calls
* impl llmobs agent and llmobs apis
* use new ctx api to track parent span
* add api for evals
* working impl supporting both agentless and agent
* handle null tags and default to default ml app if null or empty string provided in the override
* cleaned up whitespace
* resolve merge conflicts
* remaining merge conflicts
* fix bad method call
* fixed llmobs intake creation if llmobs not enabled
* removed print statements
* ran spotless
* ran spotless
* added tests for llmobsspanmapper
* fixed coverage for tags
---------
Co-authored-by: Nayeem Kamal <nayeem.kamal@datadoghq.com>
Co-authored-by: Nayeem Kamal <kamal.nayeem12@gmail.com>
* add APIs for llm obs sdk (#8135)
* add APIs for llm obs
* add llm message class to support llm spans
* follow java convention of naming Id instead of ID
* add codeowners
* implement LLM Obs SDK spans APIs (#8390)
* add APIs for llm obs
* add llm message class to support llm spans
* add llm message class to support llm spans
* impl llmobs agent and llmobs apis
* support llm messages with tool calls
* handle default model name and provider
* rm unneeded file
* spotless
* add APIs for llm obs sdk (#8135)
* add APIs for llm obs
* add llm message class to support llm spans
* follow java convention of naming Id instead of ID
* add codeowners
* rename ID to Id according to java naming conventions
* Undo change to integrations-core submodule
* fix build gradle
* rm empty line
* fix test
* LLM Obs SDK Mapper (#8372)
* add APIs for llm obs
* add llm message class to support llm spans
* add llm message class to support llm spans
* impl llmobs agent and llmobs apis
* support llm messages with tool calls
* handle default model name and provider
* rm unneeded file
* impl llmobs agent and llmobs apis
* impl llmobs agent
* working writer
* add support for llm message and tool calls
* cleaned up whitespace
* resolve merge conflicts
* remaining merge conflicts
* fix bad method call
* fixed llmobs intake creation if llmobs not enabled
* removed print statements
* added tests for llmobsspanmapper
* fixed coverage for tags
---------
Co-authored-by: Nayeem Kamal <nayeem.kamal@datadoghq.com>
* updated to master submodule
* LLM Obs SDK use context API for parent children span linkage (#8711)
* add APIs for llm obs
* add llm message class to support llm spans
* add llm message class to support llm spans
* impl llmobs agent and llmobs apis
* support llm messages with tool calls
* handle default model name and provider
* rm unneeded file
* impl llmobs agent and llmobs apis
* impl llmobs agent
* working writer
* add support for llm message and tool calls
* impl llmobs agent and llmobs apis
* use new ctx api to track parent span
* cleaned up whitespace
* resolve merge conflicts
* remaining merge conflicts
* fix bad method call
* fixed llmobs intake creation if llmobs not enabled
* removed print statements
* ran spotless
* added tests for llmobsspanmapper
* fixed coverage for tags
---------
Co-authored-by: Nayeem Kamal <nayeem.kamal@datadoghq.com>
Co-authored-by: Nayeem Kamal <kamal.nayeem12@gmail.com>
* LLM Obs SDK evaluation metrics submission (#8688)
* add APIs for llm obs
* add llm message class to support llm spans
* add llm message class to support llm spans
* impl llmobs agent and llmobs apis
* support llm messages with tool calls
* handle default model name and provider
* rm unneeded file
* impl llmobs agent and llmobs apis
* impl llmobs agent
* working writer
* add support for llm message and tool calls
* impl llmobs agent and llmobs apis
* use new ctx api to track parent span
* add api for evals
* working impl supporting both agentless and agent
* handle null tags and default to default ml app if null or empty string provided in the override
* cleaned up whitespace
* resolve merge conflicts
* remaining merge conflicts
* fix bad method call
* fixed llmobs intake creation if llmobs not enabled
* removed print statements
* ran spotless
* ran spotless
* added tests for llmobsspanmapper
* fixed coverage for tags
---------
Co-authored-by: Nayeem Kamal <nayeem.kamal@datadoghq.com>
Co-authored-by: Nayeem Kamal <kamal.nayeem12@gmail.com>
* fix CODEOWNERS
---------
Co-authored-by: Nayeem Kamal <nayeem.kamal@datadoghq.com>
Co-authored-by: Nayeem Kamal <kamal.nayeem12@gmail.com>
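The commit notes above mention handling null tags and falling back to the default ML app when the override is null or an empty string. A minimal sketch of that kind of fallback is shown below. It is not the code from this PR: the class `MlAppDefaults`, both method names, and the choice to treat null tags as an empty map are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.Map;

public class MlAppDefaults {
    // Hypothetical helper: use the override only when it is non-null and non-empty,
    // otherwise fall back to the configured default ML app name.
    public static String resolveMlApp(String override, String defaultMlApp) {
        return (override == null || override.isEmpty()) ? defaultMlApp : override;
    }

    // Hypothetical helper: treat a null tag map as "no tags" so callers can iterate safely.
    public static Map<String, Object> tagsOrEmpty(Map<String, Object> tags) {
        return tags == null ? Collections.<String, Object>emptyMap() : tags;
    }

    public static void main(String[] args) {
        System.out.println(resolveMlApp(null, "default-app"));
        System.out.println(resolveMlApp("", "default-app"));
        System.out.println(resolveMlApp("custom-app", "default-app"));
        System.out.println(tagsOrEmpty(null).isEmpty());
    }
}
```

Resolving the defaults in one place keeps the evaluation submission path free of repeated null checks, which is one plausible reason to centralize this logic.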
What Does This Do
Motivation
Additional Notes
Contributor Checklist
Assign the type: and (comp: or inst:) labels in addition to any useful labels
Do not use close, fix, or any linking keywords when referencing an issue. Use solves instead, and assign the PR milestone to the issue
Jira ticket: [PROJ-IDENT]