Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(EAI-587): Clean up shared imports in mongodb-rag-core + migrate @azure/openai to openai #546

Merged
merged 11 commits into from
Oct 31, 2024

Conversation

mongodben
Copy link
Collaborator

@mongodben mongodben commented Oct 30, 2024

Jira: https://jira.mongodb.org/browse/EAI-587

Changes

  • Switch codebase to use openai module instead of @azure/openai
  • Rename modules exported from mongodb-rag-core to better folow namespace convention of CamelCase (w/ first letter uppercase)
    • Due to small differences in the SDKs' APIs, there were some small associated code refactors.

Notes

@mongodben mongodben marked this pull request as ready for review October 31, 2024 13:54
Comment on lines 31 to 33
export * from "mongodb";
export * from "@azure/openai";
export * as OpenAI from "openai";
export * from "./langchain";
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nlarew in future PR, i wanna do the hared langchain and mongodb exports w/ the * as Name convention for continuing the better namespacing of modules in this package,

ie:

export * as MongoDB from "mongodb";
export * as Langchain from "langchain";

not doing in this PR b/c it'd dramatically increase the # of files touched.

relatedly, i think it'd probably be more correct to change these external dependencies that we re-export as peerDependencies in the package.json, but also since our monorepo is the main consumer of the mongodb-rag-core package, it probably would add more overhead and annoyance to us than value

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm directionally ok with that export * as Namespace approach, though I want to make sure that it wouldn't make it harder to do tree shaking or otherwise bloat bundles.

If it is an issue, we might consider an alternative of two package entry points. i.e.

  • In this file, we export * from "./langchain" so that there's no namespace wrapper object
  • We need to create that file we export from, i.e. ./langchain/index.ts. It does the export * from "openai";
  • In package.json we add a exports for ./langchain.

So basically I could do either of these as a consumer:

import { LangChainGithubRepoLoader } from "mongodb-rag-core";
import { LangChainGithubRepoLoader } from "mongodb-rag-core/langchain";

Maybe too complex?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting thought on the peer dependencies, I'd have to think through it a bit or see it in action

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm notionally in agreement w/ the idea of mongodb-rag-core/<namespace>, however in practice i've repeatedly run into issues w/ any packages that we compose like this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's why i didn't use that pattern. but let me use this as an opportunity to at least properly understand why this doesn't work, even if we cant switch to it..tho maybe we'll be able to switch..let's see.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

going to merge this PR, and play w/ this in separate PR. would require touching a lot of files b/c so many imports would be changed

@mongodben mongodben changed the title (EAI-587): Clean up shared imports in mongodb-rag-core (EAI-587): Clean up shared imports in mongodb-rag-core + migrate @azure/openai to openai Oct 31, 2024
Copy link
Collaborator

@nlarew nlarew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Yeoman's work on this one. Just a few small nits & questions.

@@ -41,7 +40,7 @@ const systemPrompt = `You are an expert data labeler employed by MongoDB.
You must label metadata about the user query based on its context in the conversation.
Your pay is determined by the accuracy of your labels as judged against other expert labelers, so do excellent work to maximize your earnings to support your family.`;

const fewShotExamples: ChatCompletionMessageParam[] = [
const fewShotExamples: OpenAI.default.Chat.ChatCompletionMessageParam[] = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird type is an unfortunate side effect of the namespace exports :/

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, also the openai package does some funky stuff w/ exports

Comment on lines +2 to +4
// This module is also required for the mock to work
// in the tests
const { updateFrontMatter } = jest.requireActual("mongodb-chatbot-server");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting - do you know why updateFrontMatter but not any other exports from the server?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea b/c that's needed in one of the methods. don't want it overwritten

Copy link
Collaborator

@nlarew nlarew Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the meat of the question is - when would we need to add other methods here? Like is this for a specific test or just in general we need the actual function?

Comment on lines +164 to +167
if (
response.function_call === undefined ||
response.function_call === null
) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can keep this as a one-liner if you want

Suggested change
if (
response.function_call === undefined ||
response.function_call === null
) {
if ((response.function_call ?? undefined) === undefined) {

or if the typing makes sense could go even simpler

Suggested change
if (
response.function_call === undefined ||
response.function_call === null
) {
if (!response.function_call) {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keeping explicit in case of ""

@@ -458,16 +460,21 @@ export async function streamGenerateResponseMessage({
});
}

return { messages: newMessages };
return { messages: newMessages.map(convertMessageFromLlmToDb) };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add the convertMessageFromLlmToDb() call here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b/c in our implementation it's still type functionCall, whereas the new openai lib is function_call

...baseChatbotRepoConfig,
name: "ingest_testData",
metadata: {
productName: "Ingest Test Data",
},
filter: (path) => path.includes("ingest/testData"),
filter: (path) => path.includes(Path.join("mongodb-rag-core", "testData")),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the Path import here? Can't you just do "mongodb-rag-core/testData" since this is joining two static strings?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not if you're on windows!

…BackUserQuery.test.ts

Co-authored-by: Nick Larew <nick.larew@mongodb.com>
@mongodben mongodben merged commit 3eb647d into main Oct 31, 2024
1 check was pending
@mongodben mongodben deleted the EAI-587 branch October 31, 2024 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants