Google Vertex AI and Gen AI models not returning usage information even though the HTTP response includes it #5774

Closed
chreds opened this issue Jun 15, 2024 · 11 comments
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments


chreds commented Jun 15, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain.js documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain.js rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

const googleModel = new ChatVertexAI({
  // ...
  callbacks: [{
    handleLLMEnd(output) {
      console.log("llmOutput " + JSON.stringify(output, null, 2))
    }
  }]
});

The output includes an empty "llmOutput": {}

const googleModel = new ChatGoogleGenerativeAI({
  // ...
  callbacks: [{
    handleLLMEnd(output) {
      console.log("llmOutput " + JSON.stringify(output, null, 2))
    }
  }]
});

The output includes safety information but no usageMetadata

Error Message and Stack Trace (if applicable)

No response

Description

No matter what I try, I can't seem to get access to the usageMetadata from the Google APIs.

I tried HttpResponseOutputParser and other ways to get at the raw HTTP response, which would contain the usage information, but it just passes the response messages through.

The endpoint is northamerica-northeast1-aiplatform.googleapis.com, but I've used the default us-central1 region as well with the same problem.

I just tried with the ChatGoogleGenerativeAI model as well and it's showing the safety information but not the usage information in the callback. When I do a raw HTTP request to https://generativelanguage.googleapis.com/v1/models/gemini-1.5-flash:generateContent?key=[redacted] I get a response that includes:

{
    "candidates": [
       ...
    ],
    "usageMetadata": {
        "promptTokenCount": 12,
        "candidatesTokenCount": 144,
        "totalTokenCount": 156
    }
}

How can I get access to this info? Given the way my app is structured, I'd rather get it from the output of invoke instead of through a callback, but I'm sure I can make it work wherever it is.
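
For completeness, a minimal sketch of the raw request described above (assuming the API key is available in a GOOGLE_API_KEY environment variable; the prompt text is illustrative):

// Hypothetical reproduction of the raw generateContent call shown above
const rawRes = await fetch(
  `https://generativelanguage.googleapis.com/v1/models/gemini-1.5-flash:generateContent?key=${process.env.GOOGLE_API_KEY}`,
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      contents: [{ parts: [{ text: "Why is the sky blue? Be concise." }] }],
    }),
  }
);
const rawData = await rawRes.json();
console.log(rawData.usageMetadata); // { promptTokenCount, candidatesTokenCount, totalTokenCount }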

System Info

"@langchain/google-genai": "^0.0.17",
"@langchain/google-vertexai": "^0.0.18",
"langchain": "^0.2.5",
dosubot bot added the auto:bug label on Jun 15, 2024

dosubot bot commented Jun 15, 2024

Hey there, @chreds! I'm here to help you with any bugs, questions, or contributions you have. Let's see what we can do to solve this issue together.

To access the usageMetadata from the Google Vertex AI and Generative AI models when using LangChain.js, you can use the invoke method and then log the usage_metadata from the response. Here is an example:

import { ChatVertexAI } from "@langchain/google-vertexai";

const chatModel = new ChatVertexAI({
  temperature: 0,
});

const res = await chatModel.invoke("Why is the sky blue? Be concise.");

console.log(res.usage_metadata);

/*
  { input_tokens: 9, output_tokens: 12, total_tokens: 21 }
*/

This code demonstrates how to invoke the model and access the usage_metadata from the response [1].
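
The same pattern should apply to ChatGoogleGenerativeAI; a minimal sketch, assuming the API key is read from the GOOGLE_API_KEY environment variable and the default model is used:

import { ChatGoogleGenerativeAI } from "@langchain/google-genai";

// Assumes GOOGLE_API_KEY is set in the environment
const genAIModel = new ChatGoogleGenerativeAI({
  temperature: 0,
});

const genAIRes = await genAIModel.invoke("Why is the sky blue? Be concise.");
console.log(genAIRes.usage_metadata);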

Additionally, there are known issues or limitations in the current versions (@langchain/google-genai ^0.0.17, @langchain/google-vertexai ^0.0.18, langchain ^0.2.5) that might affect the retrieval of usageMetadata. Specifically, if streamUsage is set to false, the usage_metadata is not included in the response. Ensure that streamUsage is not explicitly set to false to include usage_metadata in the response [2].

If you need to handle the raw HTTP response directly, note that the invoke method in ChatVertexAI and ChatGoogleGenerativeAI does not provide direct access to the full raw HTTP response object. Instead, it processes the response and returns a structured object [2].

For ChatGoogleGenerativeAI, usageMetadata is extracted from the response and includes input_tokens, output_tokens, and total_tokens. This metadata is updated during streaming by calculating the difference in token counts between chunks [3].

If you need to ensure that usageMetadata is included in the parsed output, you might need to modify the HttpResponseOutputParser class to include this functionality. Here is an example of how you might modify the parse method to include usageMetadata:

async parse(text: string): Promise<Uint8Array> {
  const chunk = await this.outputParser.parse(text);
  const encoder = new TextEncoder();
  let parsedChunk;

  // Assuming usageMetadata is available in the chunk or needs to be added
  const usageMetadata = { /* your metadata here */ };

  if (this.contentType === "text/event-stream") {
    return encoder.encode(`event: data\ndata: ${JSON.stringify({ ...chunk, usageMetadata })}\n\n`);
  }
  if (typeof chunk === "string") {
    parsedChunk = chunk;
  } else {
    parsedChunk = JSON.stringify({ ...chunk, usageMetadata });
  }
  return encoder.encode(parsedChunk);
}

This modification ensures that usageMetadata is included in the parsed output [4].



chreds commented Jun 16, 2024

Thanks @dosu bot, some helpful info there.
I'm still getting undefined for usage_metadata. I tried taking out the ChatPromptTemplate code and calling model.invoke() without any of the piping, and the result still has usage_metadata undefined.

  lc_serializable: true,
  lc_kwargs: {
    content: ...,
    tool_calls: [],
    additional_kwargs: { finishReason: 'STOP', index: 0, safetyRatings: [Array] },
    invalid_tool_calls: [],
    response_metadata: {}
  },
  lc_namespace: [ 'langchain_core', 'messages' ],
  content: ...,
  name: undefined,
  additional_kwargs: {
    finishReason: 'STOP',
    index: 0,
    safetyRatings: [ [Object], [Object], [Object], [Object] ]
  },
  response_metadata: {
    finishReason: 'STOP',
    index: 0,
    safetyRatings: [ [Object], [Object], [Object], [Object] ]
  },
  tool_calls: [],
  invalid_tool_calls: [],
  usage_metadata: undefined


dosubot bot commented Jun 16, 2024

To ensure that usageMetadata is included in the response when using LangChain.js with Google Vertex AI and Generative AI models, you need to configure the ChatVertexAI model appropriately. Specifically, you should ensure that the streamUsage option is set to true (or not set to false).

Here is an example of how to configure the model to include usageMetadata:

const model = new ChatVertexAI({
  temperature: 0,
  streamUsage: true, // Ensure this is set to true
});

const res = await model.invoke("Why is the sky blue? Be concise.");
console.log(res.usage_metadata);

In the provided tests, the usage_metadata is checked to be defined when streamUsage is enabled:

test("Stream token count usage_metadata", async () => {
  const model = new ChatVertexAI({
    temperature: 0,
  });
  let res: AIMessageChunk | null = null;
  for await (const chunk of await model.stream(
    "Why is the sky blue? Be concise."
  )) {
    if (!res) {
      res = chunk;
    } else {
      res = res.concat(chunk);
    }
  }
  console.log(res);
  expect(res?.usage_metadata).toBeDefined();
  if (!res?.usage_metadata) {
    return;
  }
  expect(res.usage_metadata.input_tokens).toBe(9);
  expect(res.usage_metadata.output_tokens).toBeGreaterThan(10);
  expect(res.usage_metadata.total_tokens).toBe(
    res.usage_metadata.input_tokens + res.usage_metadata.output_tokens
  );
});

And it is checked to be undefined when streamUsage is disabled:

test("streamUsage excludes token usage", async () => {
  const model = new ChatVertexAI({
    temperature: 0,
    streamUsage: false,
  });
  let res: AIMessageChunk | null = null;
  for await (const chunk of await model.stream(
    "Why is the sky blue? Be concise."
  )) {
    if (!res) {
      res = chunk;
    } else {
      res = res.concat(chunk);
    }
  }
  console.log(res);
  expect(res?.usage_metadata).not.toBeDefined();
});

By ensuring streamUsage is set to true, you can include usageMetadata in the response [1].

Additionally, the testUsageMetadata and testUsageMetadataStreaming methods in the ChatModelIntegrationTests class from the libs/langchain-standard-tests/src/integration_tests/chat_models.ts file also verify the presence of usage_metadata:

async testUsageMetadata(
  callOptions?: InstanceType<this["Cls"]>["ParsedCallOptions"]
) {
  const chatModel = new this.Cls(this.constructorArgs);
  const result = await chatModel.invoke("Hello", callOptions);
  expect(result).toBeDefined();
  expect(result).toBeInstanceOf(this.invokeResponseType);
  if (!("usage_metadata" in result)) {
    throw new Error("result is not an instance of AIMessage");
  }
  const usageMetadata = result.usage_metadata as UsageMetadata;
  expect(usageMetadata).toBeDefined();
  expect(typeof usageMetadata.input_tokens).toBe("number");
  expect(typeof usageMetadata.output_tokens).toBe("number");
  expect(typeof usageMetadata.total_tokens).toBe("number");
}

async testUsageMetadataStreaming(
  callOptions?: InstanceType<this["Cls"]>["ParsedCallOptions"]
) {
  const chatModel = new this.Cls(this.constructorArgs);
  let finalChunks: AIMessageChunk | undefined;
  for await (const chunk of await chatModel.stream("Hello", callOptions)) {
    expect(chunk).toBeDefined();
    expect(chunk).toBeInstanceOf(AIMessageChunk);
    if (!finalChunks) {
      finalChunks = chunk;
    } else {
      finalChunks = finalChunks.concat(chunk);
    }
  }
  if (!finalChunks) {
    throw new Error("finalChunks is undefined");
  }
  const usageMetadata = finalChunks.usage_metadata;
  expect(usageMetadata).toBeDefined();
  if (!usageMetadata) {
    throw new Error("usageMetadata is undefined");
  }
  expect(typeof usageMetadata.input_tokens).toBe("number");
  expect(typeof usageMetadata.output_tokens).toBe("number");
  expect(typeof usageMetadata.total_tokens).toBe("number");
}

These methods ensure that usage_metadata is properly included and validated in the responses [2].



chreds commented Jun 16, 2024

@dosu, I didn't have streamUsage set on the model so it should be defaulting to true. I tried explicitly setting it to true and the results are the same. Is there anything else that could be causing it to be undefined?


dosubot bot commented Jun 16, 2024

The usage_metadata field in the response from LangChain.js with Google Vertex AI and Generative AI models can be undefined despite setting streamUsage to true due to an issue in the _streamResponseChunks method. The code does not correctly handle the usageMetadata when streaming responses.

Here is the relevant part of the code that shows the issue:

let usageMetadata: UsageMetadata | undefined;
for await (const response of stream) {
  if (
    "usageMetadata" in response &&
    this.streamUsage !== false &&
    options.streamUsage !== false
  ) {
    const genAIUsageMetadata = response.usageMetadata as {
      promptTokenCount: number;
      candidatesTokenCount: number;
      totalTokenCount: number;
    };
    if (!usageMetadata) {
      usageMetadata = {
        input_tokens: genAIUsageMetadata.promptTokenCount,
        output_tokens: genAIUsageMetadata.candidatesTokenCount,
        total_tokens: genAIUsageMetadata.totalTokenCount,
      };
    } else {
      const outputTokenDiff =
        genAIUsageMetadata.candidatesTokenCount -
        usageMetadata.output_tokens;
      usageMetadata = {
        input_tokens: 0,
        output_tokens: outputTokenDiff,
        total_tokens: outputTokenDiff,
      };
    }
  }

  const chunk = convertResponseContentToChatGenerationChunk(response, {
    usageMetadata,
  });
  if (!chunk) {
    continue;
  }

  yield chunk;
  await runManager?.handleLLMNewToken(chunk.text ?? "");
}

The usageMetadata is only updated when "usageMetadata" is present in the response and neither this.streamUsage nor options.streamUsage is false. If usageMetadata is missing from the response, or those conditions are not met, it remains undefined [1].
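
To make the delta logic above concrete with illustrative numbers: if the first streamed chunk reports promptTokenCount 9 and candidatesTokenCount 20, usageMetadata becomes { input_tokens: 9, output_tokens: 20, total_tokens: 29 }; if a later chunk reports candidatesTokenCount 45, only the difference (25) is attached to that chunk as { input_tokens: 0, output_tokens: 25, total_tokens: 25 }, so that concatenating the chunks' usage reproduces the final totals.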

To ensure usageMetadata is properly handled, verify that the streaming responses from the Google Vertex AI and Generative AI models include the usageMetadata and that the conditions for updating it are correctly implemented. Additionally, check the tests in libs/langchain-google-vertexai/src/tests/chat_models.int.test.ts for examples of how usage_metadata should be handled and verified [2].



chreds commented Jun 16, 2024

@dosu maybe I need to wait for #5763 to be released before your suggestions will work. I'll probably have to wait for the next release to get that functionality.


dosubot bot commented Jun 16, 2024

It looks like the issue with usageMetadata being undefined might be related to a known problem in the _streamResponseChunks method, where the usageMetadata is not correctly handled when streaming responses. This is addressed in the pull request you mentioned, #5763.

The pull request includes changes that ensure usageMetadata is properly included in the response. If you are still experiencing issues despite setting streamUsage to true, it is likely that you will need to wait for this pull request to be merged and released in the next version of LangChain.js.

In the meantime, you can keep an eye on the status of the pull request and update to the latest version once it is released. This should resolve the issue and allow you to access the usageMetadata as expected.


jacoblee93 (Collaborator) commented

Should be live now - closing but please reopen if not fixed in latest.


chreds commented Jun 18, 2024

Thanks @jacoblee93. I've got it passing usage through now with a custom JSON output parser. Is this the best-practice way of accessing it?

import { JsonOutputParser } from "@langchain/core/output_parsers";
import { AIMessage } from "@langchain/core/messages";
import { ChatGeneration } from "@langchain/core/outputs";

export type JsonAndUsageOutput = {
    json?: Record<string, any>,
    tokens?: {
        input_tokens: number,
        output_tokens: number
    }
}

export class JsonAndUsageOutputParser extends JsonOutputParser {
    async parseResult(generations: (ChatGeneration & { message: AIMessage })[], callbacks?: any): Promise<JsonAndUsageOutput> {
        let input_tokens = 0
        let output_tokens = 0

        for (const generation of generations) {
            // usage_metadata can be undefined on some generations, so guard the reads
            input_tokens += (generation.message.usage_metadata?.input_tokens ?? 0)
            output_tokens += (generation.message.usage_metadata?.output_tokens ?? 0)
            logger.debug(generation.message.usage_metadata)
        }

        const result = { json: await super.parseResult(generations, callbacks), tokens: { input_tokens, output_tokens } }
        return result
    }
}
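
For context, a rough sketch of how this parser slots into a chain (prompt, model, and invokeWith are placeholders):

const chain = prompt.pipe(model).pipe(new JsonAndUsageOutputParser())
const { json, tokens } = await chain.invoke({ data: invokeWith })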


chreds commented Jun 18, 2024

I guess a better option might be to use RunnableParallel with JsonOutputParser and a CustomUsageOutputParser instead so it's not tightly coupled with the JsonOutputParser.


chreds commented Jun 18, 2024

Yeah, this is much cleaner:

import { JsonOutputParser } from "@langchain/core/output_parsers";
import { RunnableLambda, RunnableMap } from "@langchain/core/runnables";
import { AIMessage, UsageMetadata } from "@langchain/core/messages";

const jsonParser = new JsonOutputParser()
// usage_metadata may be undefined, so reflect that in the return type
const usageParser = RunnableLambda.from((message: AIMessage): UsageMetadata | undefined => {
    return message.usage_metadata
})

const mapChain = RunnableMap.from({
    json: jsonParser,
    usage: usageParser,
})

const chainResult = await prompt.pipe(model).pipe(mapChain).invoke({ data: invokeWith })
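
With this map chain, the invoke result comes back with both pieces in one object, roughly of this shape (values illustrative, matching the earlier raw response):

// chainResult ≈ { json: { ...parsed output... }, usage: { input_tokens: 12, output_tokens: 144, total_tokens: 156 } }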
