Google Vertex AI and Gen AI models not returning usage information even though HTTP response has them #5774
Comments
Hey there, @chreds! I'm here to help you with any bugs, questions, or contributions you have. Let's see what we can do to solve this issue together.

To access the usage metadata, you can invoke the model and read the usage_metadata field on the returned message:

import { ChatVertexAI } from "@langchain/google-vertexai";
const chatModel = new ChatVertexAI({
temperature: 0,
});
const res = await chatModel.invoke("Why is the sky blue? Be concise.");
console.log(res.usage_metadata);
/*
{ input_tokens: 9, output_tokens: 12, total_tokens: 21 }
*/

This code demonstrates how to invoke the model and access the usage_metadata field on the response.

Additionally, there are known issues and limitations in the current versions of the Google integration packages that can affect whether usage metadata is populated. If you need to handle the raw HTTP response directly, note that the standard output parsers only pass the parsed message content through, not the raw payload. If you need to ensure that usageMetadata is included in the parsed output, you could modify the parse method of a custom output parser along these lines:

async parse(text: string): Promise<Uint8Array> {
const chunk = await this.outputParser.parse(text);
const encoder = new TextEncoder();
let parsedChunk;
// Assuming usageMetadata is available in the chunk or needs to be added
const usageMetadata = { /* your metadata here */ };
if (this.contentType === "text/event-stream") {
return encoder.encode(`event: data\ndata: ${JSON.stringify({ ...chunk, usageMetadata })}\n\n`);
}
if (typeof chunk === "string") {
parsedChunk = chunk;
} else {
parsedChunk = JSON.stringify({ ...chunk, usageMetadata });
}
return encoder.encode(parsedChunk);
}

This modification ensures that usageMetadata is included in the parser's output.
Thanks @dosu bot, some helpful info there.
To ensure that usage_metadata is returned, make sure the streamUsage option is enabled (it defaults to true). Here is an example of how to configure the model to include usage metadata:

const model = new ChatVertexAI({
temperature: 0,
streamUsage: true, // Ensure this is set to true
});
const res = await model.invoke("Why is the sky blue? Be concise.");
console.log(res.usage_metadata);

In the provided integration tests, the usage_metadata field is expected to be defined when streaming:

test("Stream token count usage_metadata", async () => {
const model = new ChatVertexAI({
temperature: 0,
});
let res: AIMessageChunk | null = null;
for await (const chunk of await model.stream(
"Why is the sky blue? Be concise."
)) {
if (!res) {
res = chunk;
} else {
res = res.concat(chunk);
}
}
console.log(res);
expect(res?.usage_metadata).toBeDefined();
if (!res?.usage_metadata) {
return;
}
expect(res.usage_metadata.input_tokens).toBe(9);
expect(res.usage_metadata.output_tokens).toBeGreaterThan(10);
expect(res.usage_metadata.total_tokens).toBe(
res.usage_metadata.input_tokens + res.usage_metadata.output_tokens
);
});

And it is checked to be undefined when streamUsage is set to false:

test("streamUsage excludes token usage", async () => {
const model = new ChatVertexAI({
temperature: 0,
streamUsage: false,
});
let res: AIMessageChunk | null = null;
for await (const chunk of await model.stream(
"Why is the sky blue? Be concise."
)) {
if (!res) {
res = chunk;
} else {
res = res.concat(chunk);
}
}
console.log(res);
expect(res?.usage_metadata).not.toBeDefined();
});

By ensuring streamUsage is left at its default (or explicitly set to true), the usage_metadata field should be populated on the response.

Additionally, the shared integration test helpers include methods that verify usage metadata for both invoke and streaming:

async testUsageMetadata(
callOptions?: InstanceType<this["Cls"]>["ParsedCallOptions"]
) {
const chatModel = new this.Cls(this.constructorArgs);
const result = await chatModel.invoke("Hello", callOptions);
expect(result).toBeDefined();
expect(result).toBeInstanceOf(this.invokeResponseType);
if (!("usage_metadata" in result)) {
throw new Error("result is not an instance of AIMessage");
}
const usageMetadata = result.usage_metadata as UsageMetadata;
expect(usageMetadata).toBeDefined();
expect(typeof usageMetadata.input_tokens).toBe("number");
expect(typeof usageMetadata.output_tokens).toBe("number");
expect(typeof usageMetadata.total_tokens).toBe("number");
}
async testUsageMetadataStreaming(
callOptions?: InstanceType<this["Cls"]>["ParsedCallOptions"]
) {
const chatModel = new this.Cls(this.constructorArgs);
let finalChunks: AIMessageChunk | undefined;
for await (const chunk of await chatModel.stream("Hello", callOptions)) {
expect(chunk).toBeDefined();
expect(chunk).toBeInstanceOf(AIMessageChunk);
if (!finalChunks) {
finalChunks = chunk;
} else {
finalChunks = finalChunks.concat(chunk);
}
}
if (!finalChunks) {
throw new Error("finalChunks is undefined");
}
const usageMetadata = finalChunks.usage_metadata;
expect(usageMetadata).toBeDefined();
if (!usageMetadata) {
throw new Error("usageMetadata is undefined");
}
expect(typeof usageMetadata.input_tokens).toBe("number");
expect(typeof usageMetadata.output_tokens).toBe("number");
expect(typeof usageMetadata.total_tokens).toBe("number");
}

These methods ensure that usage_metadata is defined and contains numeric token counts for both invoke and streaming calls.
@dosu, I didn't have streamUsage set on the model, so it should be defaulting to true. I tried explicitly setting it to true and the results are the same. Is there anything else that could be causing it to be undefined?
The usage_metadata field can still end up undefined because of how the streaming implementation accumulates usage metadata from the response chunks. Here is the relevant part of the code that shows the issue:

let usageMetadata: UsageMetadata | undefined;
for await (const response of stream) {
if (
"usageMetadata" in response &&
this.streamUsage !== false &&
options.streamUsage !== false
) {
const genAIUsageMetadata = response.usageMetadata as {
promptTokenCount: number;
candidatesTokenCount: number;
totalTokenCount: number;
};
if (!usageMetadata) {
usageMetadata = {
input_tokens: genAIUsageMetadata.promptTokenCount,
output_tokens: genAIUsageMetadata.candidatesTokenCount,
total_tokens: genAIUsageMetadata.totalTokenCount,
};
} else {
const outputTokenDiff =
genAIUsageMetadata.candidatesTokenCount -
usageMetadata.output_tokens;
usageMetadata = {
input_tokens: 0,
output_tokens: outputTokenDiff,
total_tokens: outputTokenDiff,
};
}
}
const chunk = convertResponseContentToChatGenerationChunk(response, {
usageMetadata,
});
if (!chunk) {
continue;
}
yield chunk;
await runManager?.handleLLMNewToken(chunk.text ?? "");
}

The usageMetadata is only attached to a chunk when the response actually contains a usageMetadata field and streamUsage has not been disabled on either the model or the call options. To ensure usage_metadata ends up defined on the final message, the streamed responses from the API need to include usage metadata in the first place; otherwise the chunks are yielded without it.
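For context on the delta logic above: when AIMessageChunk instances are concatenated, their usage_metadata counts are summed, which is why each chunk after the first only carries the newly generated output tokens. Here is a minimal illustration (not from this thread), assuming a recent @langchain/core where AIMessageChunk carries usage_metadata:

import { AIMessageChunk } from "@langchain/core/messages";

// The first chunk carries the full prompt count plus the tokens generated so far.
const first = new AIMessageChunk({
  content: "The sky is blue",
  usage_metadata: { input_tokens: 9, output_tokens: 5, total_tokens: 14 },
});

// A later chunk only carries the delta of output tokens, with input_tokens set to 0.
const second = new AIMessageChunk({
  content: " because of Rayleigh scattering.",
  usage_metadata: { input_tokens: 0, output_tokens: 7, total_tokens: 7 },
});

const combined = first.concat(second);
console.log(combined.usage_metadata);
// Expected: { input_tokens: 9, output_tokens: 12, total_tokens: 21 }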
It looks like the issue with usage_metadata being undefined for these models is being addressed in an open pull request. The pull request includes changes that ensure usage metadata is propagated to the response. In the meantime, you can keep an eye on the status of the pull request and update to the latest version once it is released. This should resolve the issue and allow you to access the usage_metadata field directly on the invoke output.
Should be live now - closing but please reopen if not fixed in latest.
Thanks @jacoblee93. I've got it passing usage through now with a custom JSON output parser. Is this the best-practice way of accessing it?
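For reference, a custom parser along those lines might look roughly like the sketch below. This is not the parser from the thread; the class name and return shape are made up for illustration, and it assumes the BaseLLMOutputParser interface from @langchain/core:

import { BaseLLMOutputParser } from "@langchain/core/output_parsers";
import type { ChatGeneration } from "@langchain/core/outputs";
import type { AIMessage, UsageMetadata } from "@langchain/core/messages";

// Hypothetical parser: returns the parsed JSON content together with the
// usage_metadata carried on the underlying AIMessage.
class JsonWithUsageOutputParser extends BaseLLMOutputParser<{
  data: unknown;
  usage?: UsageMetadata;
}> {
  lc_namespace = ["custom", "output_parsers"];

  async parseResult(generations: ChatGeneration[]) {
    const message = generations[0].message as AIMessage;
    return {
      // A production parser would also want the markdown-fence stripping
      // that JsonOutputParser does before calling JSON.parse.
      data: JSON.parse(message.content as string),
      usage: message.usage_metadata,
    };
  }
}

// Usage: const res = await model.pipe(new JsonWithUsageOutputParser()).invoke(prompt);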
I guess a better option might be to use RunnableParallel with a JsonOutputParser and a CustomUsageOutputParser instead, so the usage handling isn't tightly coupled to the JsonOutputParser.
Yeah, this is much cleaner.
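A rough sketch of what that RunnableParallel setup could look like (illustrative only, not the snippet from this comment; it assumes RunnableParallel and RunnableLambda from @langchain/core/runnables and JsonOutputParser from @langchain/core/output_parsers):

import { ChatVertexAI } from "@langchain/google-vertexai";
import { JsonOutputParser } from "@langchain/core/output_parsers";
import { RunnableLambda, RunnableParallel } from "@langchain/core/runnables";
import type { AIMessageChunk } from "@langchain/core/messages";

const model = new ChatVertexAI({ temperature: 0 });

// Fan the model output into two branches: one parses the JSON content,
// the other just lifts usage_metadata off the message, so usage handling
// is no longer coupled to the JSON parsing.
const chain = model.pipe(
  RunnableParallel.from({
    data: new JsonOutputParser(),
    usage: RunnableLambda.from((msg: AIMessageChunk) => msg.usage_metadata),
  })
);

const result = await chain.invoke(
  'Return a JSON object with a single key "answer" explaining why the sky is blue.'
);
console.log(result.data);  // parsed JSON content
console.log(result.usage); // { input_tokens, output_tokens, total_tokens }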
Checked other resources
Example Code
The output includes an empty llmOutput:
"llmOutput": {}
The output includes safety information but no usageMetadata
Error Message and Stack Trace (if applicable)
No response
Description
No matter what I try, I can't seem to get access to the usageMetadata on the Google APIs.
I tried HTTPOutputParser and other ways to get access to the raw HTTP response, which would have the usage information, but it's just passing through the response messages.
Endpoint is northamerica-northeast1-aiplatform.googleapis.com but I've used the default us-central as well with the same problem.
I just tried with the ChatGoogleGenerativeAI model as well and it's showing the safety information but not the usage information in the callback. When I do a raw HTTP request to https://generativelanguage.googleapis.com/v1/models/gemini-1.5-flash:generateContent?key=[redacted] I get a response that includes the usage metadata (promptTokenCount, candidatesTokenCount, totalTokenCount).
How can I get access to this info? I'd rather have access to it from the output of the invoke instead of through a callback with the way my app is structured, but I'm sure I can make it work wherever it is.
System Info