google[minor]: Move Vertex embeddings to integration package #12

Open: wants to merge 2 commits into base branch cloned_main_de3a4
5 changes: 5 additions & 0 deletions libs/langchain-community/src/embeddings/googlevertexai.ts
@@ -10,6 +10,7 @@ import {
import { GoogleVertexAILLMConnection } from "../utils/googlevertexai-connection.js";

/**
+ * @deprecated Import and use from @langchain/google-vertexai or @langchain/google-vertexai-web
* Defines the parameters required to initialize a
* GoogleVertexAIEmbeddings instance. It extends EmbeddingsParams and
* GoogleVertexAIConnectionParams.
@@ -19,12 +20,14 @@ export interface GoogleVertexAIEmbeddingsParams
GoogleVertexAIBaseLLMInput<GoogleAuthOptions> {}

/**
+ * @deprecated Import and use from @langchain/google-vertexai or @langchain/google-vertexai-web
* Defines additional options specific to the
* GoogleVertexAILLMEmbeddingsInstance. It extends AsyncCallerCallOptions.
*/
interface GoogleVertexAILLMEmbeddingsOptions extends AsyncCallerCallOptions {}

/**
+ * @deprecated Import and use from @langchain/google-vertexai or @langchain/google-vertexai-web
* Represents an instance for generating embeddings using the Google
* Vertex AI API. It contains the content to be embedded.
*/
@@ -33,6 +36,7 @@ interface GoogleVertexAILLMEmbeddingsInstance {
}

/**
+ * @deprecated Import and use from @langchain/google-vertexai or @langchain/google-vertexai-web
* Defines the structure of the embeddings results returned by the Google
* Vertex AI API. It extends GoogleVertexAIBasePrediction and contains the
* embeddings and their statistics.
@@ -48,6 +52,7 @@ interface GoogleVertexEmbeddingsResults extends GoogleVertexAIBasePrediction {
}

/**
+ * @deprecated Import and use from @langchain/google-vertexai or @langchain/google-vertexai-web
* Enables calls to the Google Cloud's Vertex AI API to access
* the embeddings generated by Large Language Models.
*
Comment on lines 54 to 58 (Korbit)

Category: Functionality

The class GoogleVertexAIEmbeddings has been marked as deprecated, which is good. To ensure a smooth transition for users, the deprecation notice could be more detailed: briefly explain why the class is deprecated (it has moved to a dedicated integration package) and include a code example showing the new import location. This helps users understand the change and update their code quickly.
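One way to act on this suggestion is sketched below. It assumes the old entrypoint is `@langchain/community/embeddings/googlevertexai` and that `@langchain/google-vertexai` exports a class of the same name (this diff only shows the `-web` package gaining one). The one-time runtime warning is a hypothetical addition, not something this PR does, and the class is a stub with no embedding logic.

```typescript
// Messages emitted so far; kept so the once-only behaviour can be checked.
const deprecationWarnings: string[] = [];
let vertexEmbeddingsWarned = false;

// Hypothetical helper: print the migration hint once per process instead of
// on every construction.
function warnVertexEmbeddingsDeprecated(): void {
  if (vertexEmbeddingsWarned) {
    return;
  }
  vertexEmbeddingsWarned = true;
  const message =
    "GoogleVertexAIEmbeddings in @langchain/community is deprecated. " +
    'Use `import { GoogleVertexAIEmbeddings } from "@langchain/google-vertexai"` ' +
    "or @langchain/google-vertexai-web in web environments.";
  deprecationWarnings.push(message);
  console.warn(message);
}

/**
 * @deprecated Vertex AI embeddings moved to dedicated integration packages.
 * Import from `@langchain/google-vertexai` (Node.js) or
 * `@langchain/google-vertexai-web` (web) instead.
 *
 * @example
 * // Before (old entrypoint, assumed):
 * // import { GoogleVertexAIEmbeddings } from "@langchain/community/embeddings/googlevertexai";
 * // After:
 * // import { GoogleVertexAIEmbeddings } from "@langchain/google-vertexai";
 */
class GoogleVertexAIEmbeddings {
  constructor() {
    // Stub: the real class would also set up its connection here.
    warnVertexEmbeddingsDeprecated();
  }
}

new GoogleVertexAIEmbeddings();
new GoogleVertexAIEmbeddings(); // the second construction prints nothing
```

Warning once keeps logs readable in code that creates many instances, while the JSDoc text gives editors and type checkers the same migration hint.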
@@ -212,7 +212,9 @@ export class GoogleVertexAILLMConnection<
}

const projectId = await this.client.getProjectId();

console.log(
`https://${this.endpoint}/v1/projects/${projectId}/locations/${this.location}/publishers/google/models/${this.model}:${method}`
);
Comment on lines +215 to +217 (Korbit)

Category: Functionality

A console.log statement has been added to print the request URL. While this might help with debugging, console.log statements should generally not be left in production code, and this one also exposes the project ID, location, and model. Consider removing it, or replacing it with a logging mechanism that distinguishes development from production and does not expose sensitive data.

Comment on lines +215 to +217

Remove or replace console.log for sensitive information.

The console.log statement exposes sensitive information, such as the project ID and model. Consider removing it, or using a logging mechanism that respects different environments and does not expose sensitive data.

- console.log(
-   `https://${this.endpoint}/v1/projects/${projectId}/locations/${this.location}/publishers/google/models/${this.model}:${method}`
- );

return `https://${this.endpoint}/v1/projects/${projectId}/locations/${this.location}/publishers/google/models/${this.model}:${method}`;
}
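Both review comments point the same way, and deleting the `console.log` is the simplest fix. If the URL is still useful while debugging, a minimal sketch could gate the output behind an explicit flag and redact the project ID. The `debug` flag, the injectable `log` function, and the `<redacted>` placeholder are assumptions for illustration, not part of the PR; the URL template is copied from the diff.

```typescript
// Hypothetical options; the real connection class has no such parameter.
interface UrlLoggingOptions {
  // Could be derived, for example, from an environment variable read with
  // getEnvironmentVariable from @langchain/core/utils/env.
  debug: boolean;
  log?: (message: string) => void;
}

function buildPredictUrl(
  endpoint: string,
  projectId: string,
  location: string,
  model: string,
  method: string,
  { debug, log = (message: string) => console.debug(message) }: UrlLoggingOptions
): string {
  // Same template as the diff, parameterised on the project segment.
  const format = (project: string): string =>
    `https://${endpoint}/v1/projects/${project}/locations/${location}/publishers/google/models/${model}:${method}`;

  if (debug) {
    // Log a redacted copy so the project ID never reaches the console.
    log(format("<redacted>"));
  }
  return format(projectId);
}
```

With `debug` false nothing is logged at all, which matches what production code should do; the redaction only matters when someone opts in.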

8 changes: 4 additions & 4 deletions libs/langchain-google-common/src/connection.ts
@@ -166,8 +166,8 @@ export abstract class GoogleHostConnection<
}

export abstract class GoogleAIConnection<
- CallOptions extends BaseLanguageModelCallOptions,
- MessageType,
+ CallOptions extends AsyncCallerCallOptions,
+ InputType,
AuthOptions
>
extends GoogleHostConnection<CallOptions, GoogleLLMResponse, AuthOptions>
@@ -232,12 +232,12 @@ export abstract class GoogleAIConnection<
}

abstract formatData(
- input: MessageType,
+ input: InputType,
parameters: GoogleAIModelRequestParams
): unknown;

async request(
- input: MessageType,
+ input: InputType,
parameters: GoogleAIModelRequestParams,
options: CallOptions
): Promise<GoogleLLMResponse> {
202 changes: 202 additions & 0 deletions libs/langchain-google-common/src/embeddings.ts
@@ -0,0 +1,202 @@
import { Embeddings, EmbeddingsParams } from "@langchain/core/embeddings";
import {
AsyncCaller,
AsyncCallerCallOptions,
} from "@langchain/core/utils/async_caller";
import { chunkArray } from "@langchain/core/utils/chunk_array";
import { GoogleAIConnection } from "./connection.js";
import { ApiKeyGoogleAuth, GoogleAbstractedClient } from "./auth.js";
import { GoogleAIModelRequestParams, GoogleConnectionParams } from "./types.js";
import { getEnvironmentVariable } from "@langchain/core/utils/env";

class EmbeddingsConnection<
CallOptions extends AsyncCallerCallOptions,
AuthOptions
> extends GoogleAIConnection<
CallOptions,
GoogleEmbeddingsInstance[],
AuthOptions
> {
convertSystemMessageToHumanContent: boolean | undefined;

constructor(
fields: GoogleConnectionParams<AuthOptions> | undefined,
caller: AsyncCaller,
client: GoogleAbstractedClient,
streaming: boolean
) {
super(fields, caller, client, streaming);
}
Comment on lines +22 to +29

Remove unnecessary constructor.

The constructor in EmbeddingsConnection does not perform any additional logic and can be safely removed to simplify the code.

-  constructor(
-    fields: GoogleConnectionParams<AuthOptions> | undefined,
-    caller: AsyncCaller,
-    client: GoogleAbstractedClient,
-    streaming: boolean
-  ) {
-    super(fields, caller, client, streaming);
-  }

Biome: [error] 22-29: This constructor is unnecessary. Unsafe fix: Remove the unnecessary constructor. (lint/complexity/noUselessConstructor)


async buildUrlMethod(): Promise<string> {
return "predict";
}

formatData(
input: GoogleEmbeddingsInstance[],
parameters: GoogleAIModelRequestParams
): unknown {
return {
instances: input,
parameters,
};
}
}
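Biome's `noUselessConstructor` rule fires because a class that declares no constructor gets an implicit one that forwards all arguments to `super`, and TypeScript reuses the base class's constructor signature. The sketch below uses hypothetical `BaseConnection` and `DerivedConnection` stand-ins, not the real `GoogleAIConnection`, to show that both forms behave identically; the same reasoning covers the `GoogleEmbeddings` constructor flagged further down. Removing the constructor here also appears to leave the `AsyncCaller` import unused, since the constructor's parameter type is its only use in this file.

```typescript
class BaseConnection {
  readonly fields: Record<string, unknown> | undefined;
  readonly streaming: boolean;

  constructor(fields: Record<string, unknown> | undefined, streaming: boolean) {
    this.fields = fields;
    this.streaming = streaming;
  }
}

// No constructor declared: the implicit one calls super(...args), and
// TypeScript reuses BaseConnection's constructor signature.
class DerivedConnection extends BaseConnection {
  buildUrlMethod(): string {
    return "predict";
  }
}

// The explicit form Biome reports as unnecessary; behaviour is identical.
class VerboseDerivedConnection extends BaseConnection {
  constructor(fields: Record<string, unknown> | undefined, streaming: boolean) {
    super(fields, streaming);
  }
}
```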

/**
* Defines the parameters required to initialize a
* GoogleEmbeddings instance. It extends EmbeddingsParams and
* GoogleConnectionParams.
*/
export interface BaseGoogleEmbeddingsParams<AuthOptions>
extends EmbeddingsParams,
GoogleConnectionParams<AuthOptions> {
model: string;
}

/**
* Defines additional options specific to the
* GoogleEmbeddingsInstance. It extends AsyncCallerCallOptions.
*/
export interface BaseGoogleEmbeddingsOptions extends AsyncCallerCallOptions {}

/**
* Represents an instance for generating embeddings using the Google
* Vertex AI API. It contains the content to be embedded.
*/
export interface GoogleEmbeddingsInstance {
content: string;
}

/**
* Defines the structure of the embeddings results returned by the Google
* Vertex AI API. It extends GoogleBasePrediction and contains the
* embeddings and their statistics.
*/
export interface BaseGoogleEmbeddingsResults {
embeddings: {
statistics: {
token_count: number;
truncated: boolean;
};
values: number[];
};
}
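To make the wire format concrete, the sketch below mirrors `EmbeddingsConnection.formatData`, which sends `{ instances, parameters }` to the `predict` method, and unpacks a response whose predictions follow `BaseGoogleEmbeddingsResults`. The `{ predictions: [...] }` envelope is inferred from how `embedDocuments` reads `response.data`, and the sample values used to exercise it are made up. The `truncatedIndexes` helper is a hypothetical extra that surfaces the `truncated` flag, which the code in this diff does not read.

```typescript
interface EmbeddingsInstance {
  content: string;
}

// Mirrors BaseGoogleEmbeddingsResults from the diff.
interface EmbeddingsResult {
  embeddings: {
    statistics: { token_count: number; truncated: boolean };
    values: number[];
  };
}

// Envelope as read by embedDocuments via response.data.predictions.
interface PredictResponse {
  predictions?: EmbeddingsResult[];
}

// Same body shape as EmbeddingsConnection.formatData.
function formatData(
  input: EmbeddingsInstance[],
  parameters: Record<string, unknown>
): { instances: EmbeddingsInstance[]; parameters: Record<string, unknown> } {
  return { instances: input, parameters };
}

// One vector per instance; a missing predictions array yields [].
function extractVectors(response: PredictResponse): number[][] {
  return (response.predictions ?? []).map((p) => p.embeddings.values);
}

// Hypothetical helper: indexes of inputs the API reports as truncated.
function truncatedIndexes(response: PredictResponse): number[] {
  const indexes: number[] = [];
  (response.predictions ?? []).forEach((p, i) => {
    if (p.embeddings.statistics.truncated) {
      indexes.push(i);
    }
  });
  return indexes;
}
```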

/**
* Enables calls to the Google Cloud's Vertex AI API to access
* the embeddings generated by Large Language Models.
*
* To use, you will need to have one of the following authentication
* methods in place:
* - You are logged into an account permitted to the Google Cloud project
* using Vertex AI.
* - You are running this on a machine using a service account permitted to
* the Google Cloud project using Vertex AI.
* - The `GOOGLE_APPLICATION_CREDENTIALS` environment variable is set to the
* path of a credentials file for a service account permitted to the
* Google Cloud project using Vertex AI.
* @example
* ```typescript
* const model = new GoogleEmbeddings();
* const res = await model.embedQuery(
* "What would be a good company name for a company that makes colorful socks?"
* );
* console.log({ res });
* ```
*/
export abstract class BaseGoogleEmbeddings<AuthOptions>
extends Embeddings
implements BaseGoogleEmbeddingsParams<AuthOptions>
{
model: string;

private connection: GoogleAIConnection<
BaseGoogleEmbeddingsOptions,
GoogleEmbeddingsInstance[],
GoogleConnectionParams<AuthOptions>
>;

constructor(fields: BaseGoogleEmbeddingsParams<AuthOptions>) {
super(fields);

this.model = fields.model;
this.connection = new EmbeddingsConnection(
{ ...fields, ...this },
this.caller,
this.buildClient(fields),
false
);
}

abstract buildAbstractedClient(
fields?: GoogleConnectionParams<AuthOptions>
): GoogleAbstractedClient;

buildApiKeyClient(apiKey: string): GoogleAbstractedClient {
return new ApiKeyGoogleAuth(apiKey);
}

buildApiKey(
fields?: GoogleConnectionParams<AuthOptions>
): string | undefined {
return fields?.apiKey ?? getEnvironmentVariable("GOOGLE_API_KEY");
}

buildClient(
fields?: GoogleConnectionParams<AuthOptions>
): GoogleAbstractedClient {
const apiKey = this.buildApiKey(fields);
if (apiKey) {
return this.buildApiKeyClient(apiKey);
} else {
return this.buildAbstractedClient(fields);
}
}

/**
* Takes an array of documents as input and returns a promise that
* resolves to a 2D array of embeddings for each document. It splits the
* documents into chunks and makes requests to the Google Vertex AI API to
* generate embeddings.
* @param documents An array of documents to be embedded.
* @returns A promise that resolves to a 2D array of embeddings for each document.
*/
async embedDocuments(documents: string[]): Promise<number[][]> {
const instanceChunks: GoogleEmbeddingsInstance[][] = chunkArray(
documents.map((document) => ({
content: document,
})),
5
); // Vertex AI accepts max 5 instances per prediction
const parameters = {};
const options = {};
const responses = await Promise.all(
instanceChunks.map((instances) =>
this.connection.request(instances, parameters, options)
)
);
const result: number[][] =
responses
?.map(
(response) =>
(response?.data as any)?.predictions?.map(
(result: any) => result.embeddings.values
) ?? []
)
.flat() ?? [];
return result;
}

/**
* Takes a document as input and returns a promise that resolves to an
* embedding for the document. It calls the embedDocuments method with the
* document as the input.
* @param document A document to be embedded.
* @returns A promise that resolves to an embedding for the document.
*/
async embedQuery(document: string): Promise<number[]> {
const data = await this.embedDocuments([document]);
return data[0];
}
Comment on lines +198 to +201 (Korbit)

Category: Functionality
Severity: potentially major

In the embedQuery method, nothing checks that the array returned by embedDocuments is non-empty before its first element is accessed. If embedDocuments returns an empty array, data[0] is undefined, so embedQuery resolves to undefined despite its Promise<number[]> return type, and the failure surfaces later in the caller. Consider checking for this case and throwing a descriptive error when no embeddings are returned.

}
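The batching in `embedDocuments` and the check the Korbit comment asks for in `embedQuery` can be sketched without any Google dependency. Here `requestBatch` is a hypothetical stand-in for `this.connection.request(...)` that resolves directly to vectors, `chunk` re-implements what `chunkArray` is used for, and the batch size of 5 comes from the comment in the diff. `Promise.all` resolves in input order, so flattening the per-batch results keeps each vector aligned with its document.

```typescript
// Hypothetical stand-in for the connection: one vector per instance.
type RequestBatch = (instances: { content: string }[]) => Promise<number[][]>;

// Split items into consecutive groups of at most `size`.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

async function embedDocuments(
  documents: string[],
  requestBatch: RequestBatch,
  batchSize = 5 // "Vertex AI accepts max 5 instances per prediction", per the diff
): Promise<number[][]> {
  const batches = chunk(documents.map((content) => ({ content })), batchSize);
  // Batches are sent concurrently; Promise.all preserves their order.
  const responses = await Promise.all(batches.map((batch) => requestBatch(batch)));
  return responses.flat();
}

async function embedQuery(document: string, requestBatch: RequestBatch): Promise<number[]> {
  const vectors = await embedDocuments([document], requestBatch);
  if (vectors.length === 0) {
    // Fail loudly instead of resolving to undefined.
    throw new Error("No embedding was returned for the query.");
  }
  return vectors[0];
}
```

Throwing keeps the declared `Promise<number[]>` honest; returning an empty array would also type-check, but every caller would then need its own length check.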
1 change: 1 addition & 0 deletions libs/langchain-google-common/src/index.ts
@@ -1,5 +1,6 @@
export * from "./chat_models.js";
export * from "./llms.js";
export * from "./embeddings.js";

export * from "./auth.js";
export * from "./connection.js";
4 changes: 2 additions & 2 deletions libs/langchain-google-gauth/src/auth.ts
@@ -3,7 +3,7 @@ import {
ensureAuthOptionScopes,
GoogleAbstractedClient,
GoogleAbstractedClientOps,
- GoogleBaseLLMInput,
+ GoogleConnectionParams,
JsonStream,
} from "@langchain/google-common";
import { GoogleAuth, GoogleAuthOptions } from "google-auth-library";
@@ -27,7 +27,7 @@ export class NodeJsonStream extends JsonStream {
export class GAuthClient implements GoogleAbstractedClient {
gauth: GoogleAuth;

- constructor(fields?: GoogleBaseLLMInput<GoogleAuthOptions>) {
+ constructor(fields?: GoogleConnectionParams<GoogleAuthOptions>) {
const options = ensureAuthOptionScopes<GoogleAuthOptions>(
fields?.authOptions,
"scopes",
39 changes: 39 additions & 0 deletions libs/langchain-google-gauth/src/embeddings.ts
@@ -0,0 +1,39 @@
import {
GoogleAbstractedClient,
GoogleConnectionParams,
BaseGoogleEmbeddings,
BaseGoogleEmbeddingsParams,
} from "@langchain/google-common";
import { GoogleAuthOptions } from "google-auth-library";
import { GAuthClient } from "./auth.js";

/**
* Input to LLM class.
*/
export interface GoogleEmbeddingsInput
extends BaseGoogleEmbeddingsParams<GoogleAuthOptions> {}

/**
* Integration with an LLM.
*/
export class GoogleEmbeddings
extends BaseGoogleEmbeddings<GoogleAuthOptions>
implements GoogleEmbeddingsInput
{
// Used for tracing, replace with the same name as your class
static lc_name() {
return "GoogleEmbeddings";
}

lc_serializable = true;

constructor(fields: GoogleEmbeddingsInput) {
super(fields);
}
Comment on lines +30 to +32

Remove the unnecessary constructor.

The constructor in GoogleEmbeddings is redundant as it only calls the superclass constructor. Removing it will simplify the code.

-  constructor(fields: GoogleEmbeddingsInput) {
-    super(fields);
-  }

Biome: [error] 30-32: This constructor is unnecessary. Unsafe fix: Remove the unnecessary constructor. (lint/complexity/noUselessConstructor)


buildAbstractedClient(
fields?: GoogleConnectionParams<GoogleAuthOptions>
): GoogleAbstractedClient {
return new GAuthClient(fields);
}
}
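`BaseGoogleEmbeddings.buildClient` prefers an API key (from `fields.apiKey`, then the `GOOGLE_API_KEY` environment variable) and only otherwise calls the subclass's `buildAbstractedClient`, which `GoogleEmbeddings` implements with `GAuthClient`. The sketch below reproduces that template-method shape with hypothetical stub clients and an injectable environment lookup, so the precedence can be checked without credentials; none of these names are the real library types.

```typescript
// Hypothetical stand-ins for GoogleAbstractedClient and GoogleConnectionParams.
interface AbstractedClient {
  kind: string;
}

interface ConnectionFields {
  apiKey?: string;
}

type EnvLookup = (name: string) => string | undefined;

abstract class BaseEmbeddingsSketch {
  // Injected instead of reading the real environment, so tests stay hermetic.
  private readonly getEnv: EnvLookup;

  constructor(getEnv: EnvLookup) {
    this.getEnv = getEnv;
  }

  // Subclasses decide how to authenticate when no API key is available.
  abstract buildAbstractedClient(fields?: ConnectionFields): AbstractedClient;

  buildApiKey(fields?: ConnectionFields): string | undefined {
    return fields?.apiKey ?? this.getEnv("GOOGLE_API_KEY");
  }

  buildClient(fields?: ConnectionFields): AbstractedClient {
    const apiKey = this.buildApiKey(fields);
    // Same precedence as the diff: any truthy key wins over the subclass client.
    return apiKey ? { kind: "api-key" } : this.buildAbstractedClient(fields);
  }
}

// Stand-in for GoogleEmbeddings, whose real override returns new GAuthClient(fields).
class GAuthEmbeddingsSketch extends BaseEmbeddingsSketch {
  buildAbstractedClient(): AbstractedClient {
    return { kind: "gauth" };
  }
}
```

One consequence of this ordering: when `GOOGLE_API_KEY` is set in the environment, the API-key client is chosen even if `authOptions` are supplied, so the GAuth path is not reached.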
1 change: 1 addition & 0 deletions libs/langchain-google-gauth/src/index.ts
@@ -1,2 +1,3 @@
export * from "./chat_models.js";
export * from "./llms.js";
export * from "./embeddings.js";
25 changes: 25 additions & 0 deletions libs/langchain-google-vertexai-web/src/embeddings.ts
@@ -0,0 +1,25 @@
import {
type GoogleEmbeddingsInput,
GoogleEmbeddings,
} from "@langchain/google-webauth";

/**
* Input to chat model class.
*/
export interface GoogleVertexAIEmbeddingsInput extends GoogleEmbeddingsInput {}

/**
* Integration with a chat model.
*/
export class GoogleVertexAIEmbeddings extends GoogleEmbeddings {
Comment on lines +11 to +14

Update the class description.

The class description incorrectly states "Integration with a chat model." Since this class provides embeddings, it should say something like "Integration with Google Vertex AI for embeddings."

static lc_name() {
return "GoogleVertexAIEmbeddings";
}

constructor(fields: GoogleVertexAIEmbeddingsInput) {
super({
...fields,
platformType: "gcp",
});
}
}
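`GoogleVertexAIEmbeddings` forwards `{ ...fields, platformType: "gcp" }` to its parent. In an object literal, later properties win, so this pins `platformType` to `"gcp"` even when a caller passes another value, while every other field passes through unchanged. A small sketch with a hypothetical `Fields` type (the real input type has more members, and `"custom"` is only a placeholder value) contrasts this with putting the default first:

```typescript
// Hypothetical subset of the connection fields.
interface Fields {
  model: string;
  platformType?: string;
}

function withGcpPlatform(fields: Fields): Fields {
  // Spread first, then set platformType: callers cannot override it.
  return { ...fields, platformType: "gcp" };
}

function withDefaultGcpPlatform(fields: Fields): Fields {
  // Reversed order: "gcp" is only a default. Note that an explicit
  // platformType: undefined in `fields` would also overwrite it.
  return { platformType: "gcp", ...fields };
}
```

For a class whose purpose is to target Vertex AI, pinning the value as the diff does seems the safer choice.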
1 change: 1 addition & 0 deletions libs/langchain-google-vertexai-web/src/index.ts
@@ -1,2 +1,3 @@
export * from "./chat_models.js";
export * from "./llms.js";
export * from "./embeddings.js";