Add strong response types for vertex evaluators (#2)
* Fix model routing (#161)

* [UI] Add new span tree + viewer to Flow details page (#164)

* Fetch models from API (#174)

* Backend errors (#163)

Display errors in the Prompt Playground component after receiving issues from backend

* [UI] Cleanup unimplemented pages from navbar (#180)

* [UI] Increase max-height of flow input/output (#179)

Also update styles for running + error statuses in output box.

* Move flow runner to Actions page (#176)

* [UI] Fix overflow of execution span tree (#183)

* Input validation disables prompt run button (#182)

Input validation for prompt playground

* Route playground from flows to action runners page (#191)

* Switch temperature to the slider (#195)

* Show validation errors on the playground (#196)

* [UI] Revamp flow details page layout (#197)

* Fix validator issues (#194)

* [UI] Initial design of span details view (#199)

* Move flow runner to start from action-list instead of action-runner (#200)

* Add vertex-ai to the model playground (#201)

Also add icons for all known action types

* [UI] Hide input/output pre if none available (#204)

* [UI] Add "muted" helper class for secondary text (#206)

* Don't send blank stop sequences to the model, vertex gemini model doesn't like it (#217)

* Provider specific model param restrictions on input (#224)

* Use the minified version of Monaco Editor in the angular app (#242)

* [UI] Update app name

* [UI] Update flow details layout (#246)

Also adds new `<expand-text>` shared component which adds a button to show text in a larger pop-up dialog.

* [UI] Add callout component (#244)

* [UI] Hide wrapper spans on details page (#254)

* [UI] Update flow durations on details page (#256)

* [UI] Show error on flow details page (#258)

* Playground load trace (#262)

* Code cleanup
* Load playground from a trace

* Add theme toggling for JSON editor and move schema to a tab next to the editor (#245)

* Give topP the slider treatment (#264)

It's only right, now that we've done temp. :-)

* [UI] Show flow name in tree (#266)

* [UI] Show span state in details pane (#268)

* [UI] Flows table style improvements (#269)

* [UI] Small flow details page improvements for narrow screens (#273)

* Add CustomOptions (#276)

Also, add stop sequences to the request.

* [UI] Remove sample calls for unsupported actions. Small fixes in flow runner. (#275)

* Create Message sub component for ModelPlayground (#271)

#148

* Fix error with model not accepting request_format (#279)

* Disable the minimap on the monaco editor (#286)

* [UI] Add zero state for flows list page (#291)

* [UI] Fix ng error in flow runner (#297)

* [UI] Hide stream response checkbox for durable flows (#299)

* Integrating the Message component into the Prompt Playground

* Switch model select from native to mat-select (#306)

* Ability to show errors on actions page (#307)

* [UI] Revamp Actions list UI (#308)

* [UI] Remove unnecessary return (#309)

* [UI] Prevent selecting action if no param is set (#310)

* Enable support for multiple messages coming from traceId (#314)

* Avoid making flow runner editors read only (#321)

* [UI] Add filtering and expand/collapse all to actions list (#319)

* Fix error where model selection does not update (#323)

* [UI] Fix action search input style (#325)

* [UI] Update action list name and key display (#328)

* User error callout component on model playground (#330)

* refactor the code around checking for json output support (#304)

* Render images in chat (#340)

* Functioning add and remove button (#335)

* Refactor criteria/validation logic out of playground component (#339)

* [UI] Flow runner UI polish + improvements (#343)

* Move JSON editor to shared components since retriever playground also needs it (#344)

* [UI] Small handful of UI nit fixes (#345)

* [UI] Add loading state to flows table (#349)

* Do not load output from trace; typically we're interested in loading up the inputs, and re-running to get the output (#347)

* Make response_format optional (#350)

* [UI] Add Genkit icon (#371)

* Reset streamed chunks when rerunning the streamed flow (#379)

* [UI] Add tooltips to span state icons (#351)

* Prefer includes over contains (#376)

Contains causes a `TypeError: _i.contains is not a function` when running evals.

* [UI] Add inspect flow state button if flow errors (#382)

* Chat mode (#391)

* Ability to open Flow runner from the trace view (#394)

* Add basics of the eval runner page (#367)

* initial ui changes

* formatted

* Add mocked evals page

* Unnest runs

* Remove evaluations tab from appbar

* [UI] Fix flow details sidebar colors in dark mode (#399)

* [UI] Revamp model playground to chat-based layout (#397)

* [UI] Flow runner: Add a callout for no output so we don't show empty response boxes (#403)

* [UI] Add trace details view (#405)

* role:system message allowed for models (#402)

* Adds support for image models. (#426)

* fix playground runner after runAction change (#429)

* Revert "fix playground runner after runAction change (#429)" (#431)

This reverts commit 82264c0777dd47b0835dda01362a902298ec044b.

* Small tweaks to model playground to reduce chat input clutter (#438)

* [UI] Update `stackTraceSpans` to filter out internal spans (#439)

* [UI] Add traces table to inspect index page (#448)

* Adding traces to Messages (#432)

* [UI] Update routing for inspect pages (#449)

* [UI] Update routing for run pages (#450)

* [UI] Fix trace display name in table (#451)

* Allow size to be optional (#452)

Model returns error otherwise: 400 None is not of type 'string' - 'size'

* [UI] Fix trace deep links in model playground (#453)

* [UI] Add raw mat-table for evals view (#430)

* initial ui changes

* formatted

* Add mocked evals page

* Add mocked table prelim

* tests

* Use EvalResult for now

* feedback changes

* Add embeddings models (#303)

* [UI] Update /evaluations route to /evaluate (#454)

Matches other verb-based top-level routes.

* [UI] Make all run buttons consistent in playgrounds (#455)

* [UI] Add cmd/ctrl + enter shortcut to playground editors (#456)

* [UI] Add landing state for Run page (#465)

* [UI] Prevent mat-slider from shrinking (#473)

* [UI] Adjust element widths for narrow browsers (#474)

* [UI] Prevent welcome page flicker on action refresh (#475)

* Add tab for Auth input to Flow Runner action (#467)

* [UI] Add JSON sample to flow runner (#479)

* Generic action runner (#484)

* [UI] Add support for tool primitive on dev UI run page (#488)

* [UI] Tighten up spacing of actions list items (#489)

* [UI] Trigger change detection on flow runner response (#486)

* [UI] Add cmd/ctrl + enter shortcut to model playground (#485)

* [UI] Update eval results UI to use expandable cards for results (#491)

* [UI] Prevent scrolling past last line in monaco editor (#495)

* [UI] Use helper class to style pre stacktrace in callout (#502)

* [UI] Evals UI: Update inputs to use a table format (#496)

* [UI] Model playground message styling polish (#515)

* [UI] Fix json editor to ignore initial value if no schema (#517)

* [UI] Set retriever name in playground header (#518)

* [UI] Prevent JSON sample pre-fill if unnecessary (#520)

* Remove fdescribe in tests (#532)

* Fix minor UI elements in eval page (#533)

* WIP Eval UI changes

* Clean scss

* simplify name getter

* trigger checks again

* undo

* Add inspect trace option (#540)

* WIP Eval UI changes

* Clean scss

* WIP add inspect button

* Add inspect button

* Add inspect button

* remove target

* Use links instead of button

* remove unused dep

* Add inspect tab in the Dev UI (#546)

* WIP Eval UI changes

* Clean scss

* WIP add inspect button

* Add inspect button

* Add inspect button

* remove target

* Use links instead of button

* remove unused dep

* Add evaluation tab

* Update messaging

* hide inspect button if no traces (#548)

* [UI] Add typewriter effect to welcome message (#554)

- Also include missing Google Sans fonts

* [UI] Tweak logo kerning (#555)

* [UI] UI polish for evaluate page (#553)

* [UI] Fix issue in action runner JSON pre-fill (#559)

* [UI] Update typewriter animation to move left-to-right (#560)

* [UI] Show custom metadata attributes last in span details (#563)

- Also move span duration logic to shared util function and show seconds if > 1000ms.

* [UI] Polish for eval result details pane (#564)

* Add support for text-embeddings (#538)

* [UI] Update default font to Google Sans (#565)

* [UI] Update span attributes styling (#568)

* [UI] Update border radius globally (#573)

* [UI] Clip model playground message loading bar to card radius (#576)

* [UI] Prevent shrinkage of breadcrumb chevron (#577)

* [UI] Upgrade angular deps to ^17.3.1 (#587)

* [UI] Add logo lockup to app bar (#588)

* [UI] Fix table not rendering for errored traces (#607)

* [UI] Render base64-encoded images in span output (#606)

* [UI] Update label of expand text button (#608)

* [UI] Update lockup with new svg asset (#623)

* [Eval bugbash] Update tooltip to definitions, visible on entire chip (#624)

* Update tooltip to definitions, visible on entire chip

* typos

* [Eval bugbash] Show errors as errors in eval UI (#626)

* Update tooltip to definitions, visible on entire chip

* typos

* Mark errors as errors

* use ngIf

* Add TODO

* [Eval bugbash] Only show icon if failed evaluator (#635)

* Update tooltip to definitions, visible on entire chip

* typos

* WIP icons

* Remove unused

* [UI] Fix trace timing display now that they are millis (#638)

* [UI] Fix JSON editor to show up for optional inputs as well (#613)

* Add trace id to model playground when error occurs (#631)

* Display context strings separately instead of a big array (#658)

* [UI]: Update date format to medium (#659)

* Update error tooltip (#665)

* Update error tooltip

* typos

* Show error message if available

* [UI] Tighten up kerning on mat tab labels (#680)

* [UI] Allow resizing of .pre-container and json editor (#682)

* [UI] Add tooltips to temperature and top_p controls (#683)

* [UI] Fix JSON sample autofill in retriever playground (#684)

* [UI] Improve model playground param labels and add tooltips (#686)

* [UI] Fix trace status in table (#687)

* [UI] Update model icon to sparks (#688)

* [UI] Add action type to runner page title (#690)

* [UI] Add title and close button to expand text dialog (#691)

* [UI] Remove redundant title from action runner (#692)

* Pass thru options to API (#695)

* Bump ragas to 0.0.6 (#719)

* [UI] Cleanup system prompt styling in model playground (#725)

* Update system/message placeholders (#727)

* Update placeholders

* Update message.component.ts

* Update Eval Error handling (#685)

* Clarifying label on button formerly known as "Open in Playground" (#636)

- Label now says 'Open in flow runner', 'Open in model runner', etc.
  to make it more clear which step will be run.
- Changing to secondary style button to make it look less like
  the action will be run immediately.

* [UI] Fix callout content not stretching to fit width (#757)

* [UI]: Add metrics table in evals results card (#747)

* [UI] Add support for specifying model version in playground (#760)

* [UI] Remove Evaluate tab in top nav bar (#765)

* [UI] Use flask icon for Evaluate tab (#772)

* [UI] Style updates to eval result details (#790)

* [UI] Render eval metric name in error callout consistently (#792)

* [UI] Fix span duration display (#797)

* Show safety errors in the model runner (#800)

* Rename model playground => runner (#803)

* Rename retriever playground => runner (#805)

* [UI] Adjust metrics table to be full-width (#810)

* [UI] Only show eval zero state when loaded (#811)

Prevents a quick distracting flash of the zero state when the page loads.

* [UI] Set All traces as default in Inspect view (#812)

* [UI] ThemeToggleService unit tests (#816)

* [UI] Make spans deep-linkable in trace + flow details views (#819)

* [UI] Update model runner title to use selected model in config (#822)

* [UI] Clear out images from data-rendered upon receiving new input (#840)

* [UI] Hide append mode for models that do not support multiturn (#847)

* [UI] Show banner for unsupported models (#848)

* [UI] Reset scroll position of input/output when switching spans (#852)

* [UI] Hide "Add message" if model does not support multiturn (#853)

* Fix missed version 0.5.0-rc.1 (#858)

* [UI] Fix display of system prompt (#860)

* [UI] Fix tools icon (#862)

* [UI] Prevent stuck browser back when redirecting to first evaluation run (#13)

* [UI] Add missing app text color style (#16)

* [UI] Apply theme to scrollbars (#20)

* [UI] Clarify ID in flows/traces tables (#23)

* [UI] Show flow error in trace details view, if applicable (#28)

* [UI] Fix eval zero state callout spacing (#24)

* Export textEmbedding (#36)

* [UI] Update README doc with up-to-date instructions (#50)

* [UI] Create skeleton prompt runner component (#54)

Will serve as a base for prompt-specific runner features that we will add.

* [UI] Add icon to all view trace buttons (#57)

* [UI] Show template in prompt runner next to input (#58)

* [UI] Use button toggle group for inspect table filter (#56)

* [UI] Update play icon for run/dispatch span states (#60)

* More sensible default model params (#65)

* Always clear message when not in chat mode - otherwise if an error is shown, we'll still see the previous message. (#67)

* [UI] Show raw prompt template in modal (#70)

* Nesting user input in prompt runner (#72)

* [UI] Add support for prompt variants (#74)

* Allow system role for Gemini 1.5 Pro (#85)

Also removes references to OpenAI from UI.

* Create modular component for a multi-modal message (#83)

* Update faithfulness to v0.1.7 (#87)

* Update faithfulness to v0.1.7

* Update METADATA

* [UI] Add prompt variant to query params to support deep-linking (#88)

* [UI] Fix race condition when setting content in monaco (#96)

* [UI] Small visual fix in app nav bar (#98)

* [UI] Fix incorrect height for modal runner header (#101)

* [UI] Update placeholder label for model version select (#100)

* Message list component (#84)

Co-authored-by: Chris Chestnut <cchestnut@google.com>
Co-authored-by: Michael Doyle <michaeldoyle@google.com>

* [UI] Fix view evaluation report button to read correct metadata (#119)

* [UI] Save action sidebar expansion state to `localStorage` (#120)

* [UI]: Move model config params to a separate component (#103)

* [UI] Update model runner to use the new model config component (#124)

* [UI] Pull the new defaults for model config into the new config component (#125)

* [UI] Add ability to export prompt file from model runner (#115)

* [UI] Fix model versions not being loaded on initial render (#131)

Fixes google/genkit#130. This is more of a stop-gap fix, going to explore refactoring these components to utilize Angular signals to eliminate this class of error entirely.

* Integrate the new MessageList component into the ModelRunner (#114)

* [UI] Refactor model-config to use signals (#133)

* Create placeholder for system prompt and first user message (#144)

* [UI] Remove oops from model config template (#143)

* Ensure selected model is set when using left nav (#148)

* [UI] Prevent button icons from flex-shrinking (#151)

* Show large multimedia in a modal (#156)

* Enable all image types in model runner (#160)

* Re-enable gemini vision models (#168)

* [UI] Remove system prompt for single-turn models (#169)

* Set a reasonable (but arbitrary) number of media files per message (#172)

* [UI] Remove obsolete MONACO_PATH provider (unused) (#182)

* [UI] Sort eval metrics for consistent/comparable viewing (#209)

Fixes #207.

* change action latency name (#200)

Change the name of the action latency histogram from
"genkit.action.action_latency" to "genkit.action.latency"
to avoid stutter.

* Add strong response types for vertex evaluators

---------

Co-authored-by: Michael Doyle <michaeldoyle@google.com>
Co-authored-by: Anthony Barone <abarone@google.com>
Co-authored-by: MaesterChestnut <40321652+MaesterChestnut@users.noreply.github.com>
Co-authored-by: shrutip90 <shruti.p90@gmail.com>
Co-authored-by: Pavel Jbanov <pavelj@google.com>
Co-authored-by: Anthony Barone <tonybaroneee@gmail.com>
Co-authored-by: huangjeff5 <64040981+huangjeff5@users.noreply.github.com>
Co-authored-by: ssbushi <66321939+ssbushi@users.noreply.github.com>
Co-authored-by: Michael Bleigh <mbleigh@mbleigh.com>
Co-authored-by: Max Lord <maxlord@google.com>
Co-authored-by: Michael Doyle <michael.james.doyle@gmail.com>
Co-authored-by: Chris Chestnut <cchestnut@google.com>
Co-authored-by: Jonathan Amsterdam <jba@users.noreply.github.com>
14 people authored May 2, 2024
1 parent ca23579 commit 99cb078
Showing 2 changed files with 60 additions and 12 deletions.
41 changes: 35 additions & 6 deletions js/plugins/vertexai/src/evaluation.ts
@@ -18,6 +18,7 @@ import { BaseDataPoint } from '@genkit-ai/ai/evaluator';
 import { Action } from '@genkit-ai/core';
 import { GoogleAuth } from 'google-auth-library';
 import { JSONClient } from 'google-auth-library/build/src/auth/googleauth';
+import z from 'zod';
 import { EvaluatorFactory } from './evaluator_factory';

 /**
@@ -58,10 +59,6 @@ export function vertexEvaluators(
     const metricType = isConfig(metric) ? metric.type : metric;
     const metricSpec = isConfig(metric) ? metric.metricSpec : {};

-    console.log(
-      `Creating evaluator for metric ${metricType} with metricSpec ${metricSpec}`
-    );
-
     switch (metricType) {
       case VertexAIEvaluationMetricType.BLEU: {
         return createBleuEvaluator(factory, metricSpec);
@@ -85,6 +82,12 @@ function isConfig(
   return (config as VertexAIEvaluationMetricConfig).type !== undefined;
 }

+const BleuResponseSchema = z.object({
+  bleuResults: z.object({
+    bleuMetricValues: z.array(z.object({ score: z.number() })),
+  }),
+});
+
 // TODO: Add support for batch inputs
 function createBleuEvaluator(
   factory: EvaluatorFactory,
@@ -96,6 +99,7 @@ function createBleuEvaluator(
       displayName: 'BLEU',
       definition:
         'Computes the BLEU score by comparing the output against the ground truth',
+      responseSchema: BleuResponseSchema,
     },
     (datapoint) => {
       if (!datapoint.reference) {
@@ -125,6 +129,12 @@ function createBleuEvaluator(
   );
 }

+const RougeResponseSchema = z.object({
+  rougeResults: z.object({
+    rougeMetricValues: z.array(z.object({ score: z.number() })),
+  }),
+});
+
 // TODO: Add support for batch inputs
 function createRougeEvaluator(
   factory: EvaluatorFactory,
@@ -136,6 +146,7 @@ function createRougeEvaluator(
       displayName: 'ROUGE',
       definition:
         'Computes the ROUGE score by comparing the output against the ground truth',
+      responseSchema: RougeResponseSchema,
     },
     (datapoint) => {
       if (!datapoint.reference) {
@@ -163,6 +174,14 @@ function createRougeEvaluator(
   );
 }

+const SafetyResponseSchema = z.object({
+  safetyResult: z.object({
+    score: z.number(),
+    explanation: z.string(),
+    confidence: z.number(),
+  }),
+});
+
 function createSafetyEvaluator(
   factory: EvaluatorFactory,
   metricSpec: any
@@ -172,6 +191,7 @@ function createSafetyEvaluator(
       metric: VertexAIEvaluationMetricType.SAFETY,
       displayName: 'Safety',
       definition: 'Assesses the level of safety of an output',
+      responseSchema: SafetyResponseSchema,
     },
     (datapoint) => {
       return {
@@ -183,7 +203,7 @@ function createSafetyEvaluator(
         },
       };
     },
-    (response: any, datapoint: BaseDataPoint) => {
+    (response, datapoint: BaseDataPoint) => {
       return {
         testCaseId: datapoint.testCaseId,
         evaluation: {
@@ -197,6 +217,14 @@ function createSafetyEvaluator(
   );
 }

+const GroundednessResponseSchema = z.object({
+  groundednessResult: z.object({
+    score: z.number(),
+    explanation: z.string(),
+    confidence: z.number(),
+  }),
+});
+
 function createGroundednessEvaluator(
   factory: EvaluatorFactory,
   metricSpec: any
@@ -207,6 +235,7 @@ function createGroundednessEvaluator(
       displayName: 'Groundedness',
       definition:
         'Assesses the ability to provide or reference information included only in the context',
+      responseSchema: GroundednessResponseSchema,
     },
     (datapoint) => {
       return {
@@ -219,7 +248,7 @@ function createGroundednessEvaluator(
         },
       };
     },
-    (response: any, datapoint: BaseDataPoint) => {
+    (response, datapoint: BaseDataPoint) => {
       return {
         testCaseId: datapoint.testCaseId,
         evaluation: {
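
With `responseSchema` supplied, each handler's `response` parameter no longer needs the `any` annotation, since its type can be derived from the schema. The following is a minimal, dependency-free sketch of that inference. `Schema`, `Infer`, `createEvaluator`, `safetySchema` and `safetyScore` are hypothetical stand-ins (for a zod schema, `z.infer`, `EvaluatorFactory.create` and `SafetyResponseSchema` respectively), not the actual zod or Genkit APIs.

```typescript
// Hypothetical stand-in for a zod schema: turns unknown data into a T or throws.
interface Schema<T> {
  parse(data: unknown): T;
}

// Hypothetical stand-in for z.infer: recovers T from a Schema<T>.
type Infer<S> = S extends Schema<infer T> ? T : never;

// Shape taken from SafetyResponseSchema in the diff above.
interface SafetyResponse {
  safetyResult: { score: number; explanation: string; confidence: number };
}

// Hand-written validator; with zod this would be a z.object(...) instead.
const safetySchema: Schema<SafetyResponse> = {
  parse(data: unknown): SafetyResponse {
    const result = (data as { safetyResult?: Record<string, unknown> } | null)
      ?.safetyResult;
    if (!result) {
      throw new Error('missing safetyResult');
    }
    const { score, explanation, confidence } = result;
    if (
      typeof score !== 'number' ||
      typeof explanation !== 'string' ||
      typeof confidence !== 'number'
    ) {
      throw new Error('invalid safetyResult');
    }
    return { safetyResult: { score, explanation, confidence } };
  },
};

// Simplified, hypothetical analogue of EvaluatorFactory.create: the handler's
// parameter type is derived from the schema, so callers need no annotation.
function createEvaluator<S extends Schema<unknown>>(
  config: { displayName: string; responseSchema: S },
  responseHandler: (response: Infer<S>) => number
): (raw: unknown) => number {
  return (raw) => responseHandler(config.responseSchema.parse(raw) as Infer<S>);
}

// `response` is inferred as SafetyResponse; a misspelled field such as
// `response.safetyResults` would be rejected by the compiler.
const safetyScore = createEvaluator(
  { displayName: 'Safety', responseSchema: safetySchema },
  (response) => response.safetyResult.score
);
```

Calling `safetyScore` with a well-formed payload returns its `score`, while a payload with a missing or mistyped field throws before the handler runs.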
31 changes: 25 additions & 6 deletions js/plugins/vertexai/src/evaluator_factory.ts
@@ -19,6 +19,7 @@ import { Action } from '@genkit-ai/core';
 import { runInNewSpan } from '@genkit-ai/core/tracing';
 import { GoogleAuth } from 'google-auth-library';
 import { JSONClient } from 'google-auth-library/build/src/auth/googleauth';
+import z from 'zod';
 import { VertexAIEvaluationMetricType } from './evaluation';

 export class EvaluatorFactory {
@@ -28,14 +29,18 @@ export class EvaluatorFactory {
     private readonly projectId: string
   ) {}

-  create(
+  create<ResponseType extends z.ZodTypeAny>(
     config: {
       metric: VertexAIEvaluationMetricType;
       displayName: string;
       definition: string;
+      responseSchema: ResponseType;
     },
     toRequest: (datapoint: BaseDataPoint) => any,
-    responseHandler: (response: any, datapoint: BaseDataPoint) => any
+    responseHandler: (
+      response: z.infer<ResponseType>,
+      datapoint: BaseDataPoint
+    ) => any
   ): Action {
     return defineEvaluator(
       {
@@ -44,14 +49,21 @@ export class EvaluatorFactory {
         definition: config.definition,
       },
       async (datapoint: BaseDataPoint) => {
-        const response = await this.evaluateInstances(toRequest(datapoint));
+        const responseSchema = config.responseSchema;
+        const response = await this.evaluateInstances(
+          toRequest(datapoint),
+          responseSchema
+        );

         return responseHandler(response, datapoint);
       }
     );
   }

-  async evaluateInstances(partialRequest: any) {
+  async evaluateInstances<ResponseType extends z.ZodTypeAny>(
+    partialRequest: any,
+    responseSchema: ResponseType
+  ): Promise<z.infer<ResponseType>> {
     const locationName = `projects/${this.projectId}/locations/${this.location}`;
     return await runInNewSpan(
       {
@@ -64,15 +76,22 @@ export class EvaluatorFactory {
           location: locationName,
           ...partialRequest,
         };
+
         metadata.input = request;
         const client = await this.auth.getClient();
+        const url = `https://${this.location}-aiplatform.googleapis.com/v1beta1/${locationName}:evaluateInstances`;
         const response = await client.request({
-          url: `https://${this.location}-aiplatform.googleapis.com/v1beta1/${locationName}:evaluateInstances`,
+          url,
           method: 'POST',
           body: JSON.stringify(request),
         });
         metadata.output = response.data;
-        return response.data as any;
+
+        try {
+          return responseSchema.parse(response.data);
+        } catch (e) {
+          throw new Error(`Error parsing ${url} API response: ${e}`);
+        }
       }
     );
   }
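
The last hunk replaces the unchecked `response.data as any` cast with runtime validation, and rethrows any parse failure with the request URL attached. A rough, dependency-free sketch of that parse-and-wrap step is below. `Schema`, `bleuSchema`, `parseApiResponse` and `exampleUrl` are hypothetical names introduced for illustration, and the hand-written validator only approximates what the zod `BleuResponseSchema` would check.

```typescript
// Hypothetical stand-in for a zod schema: turns unknown data into a T or throws.
interface Schema<T> {
  parse(data: unknown): T;
}

// Shape taken from BleuResponseSchema in evaluation.ts.
interface BleuResponse {
  bleuResults: { bleuMetricValues: { score: number }[] };
}

// Hand-written validator; with zod this would be a z.object(...) instead.
const bleuSchema: Schema<BleuResponse> = {
  parse(data: unknown): BleuResponse {
    const values = (
      data as { bleuResults?: { bleuMetricValues?: unknown } } | null
    )?.bleuResults?.bleuMetricValues;
    if (!Array.isArray(values)) {
      throw new Error('bleuResults.bleuMetricValues must be an array');
    }
    const scores = values.map((value: unknown) => {
      const score = (value as { score?: unknown } | null)?.score;
      if (typeof score !== 'number') {
        throw new Error('score must be a number');
      }
      return { score };
    });
    return { bleuResults: { bleuMetricValues: scores } };
  },
};

// Mirrors the try/catch added to evaluateInstances: validate the payload and
// rethrow failures with the request URL for context.
function parseApiResponse<T>(url: string, data: unknown, schema: Schema<T>): T {
  try {
    return schema.parse(data);
  } catch (e) {
    throw new Error(`Error parsing ${url} API response: ${e}`);
  }
}

// Placeholder URL for the example, not a real endpoint.
const exampleUrl = 'https://example.invalid/v1beta1/evaluateInstances';
```

Wrapping the error this way keeps the failing endpoint visible in logs, at the cost of flattening the original error into a string, as the commit itself does.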