Error: Unknown node type: DOCUMENT #1179

Closed

AndreMaz opened this issue Sep 11, 2024 · 15 comments · Fixed by #1214
Labels
bug Something isn't working

Comments

@AndreMaz
Contributor

AndreMaz commented Sep 11, 2024

@himself65 bumped to 0.5.24 to see if #1176 is fixed. It is, but now I'm getting the following error:

Error: Unknown node type: DOCUMENT
at splitNodesByType (webpack-internal:///(rsc)/../../node_modules/@llamaindex/core/dist/schema/index.js:708:19)
at createMessageContent (webpack-internal:///(rsc)/../../node_modules/llamaindex/dist/synthesizers/utils.js:11:94)
at DefaultContextGenerator.generate (webpack-internal:///(rsc)/../../node_modules/llamaindex/dist/engines/chat/DefaultContextGenerator.js:50:107)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async ContextChatEngine.prepareRequestMessages (webpack-internal:///(rsc)/../../node_modules/llamaindex/dist/engines/chat/ContextChatEngine.js:470:25)
at async ContextChatEngine.chat (webpack-internal:///(rsc)/../../node_modules/llamaindex/dist/engines/chat/ContextChatEngine.js:437:33)
at async Object.start (webpack-internal:///(rsc)/./src/app/api/chat/route.ts:60:32)

Note: this error was not present in v0.5.20. It is something I initially described in #1172.

I can repro the issue in exactly the same way as I mentioned in the ticket above.
From the ticket:

> I'm using llamaindex at app/api/chat/route.ts; in other words, I'm using Route Handlers.
> When I start my app and open /home, Next.js compiles the app and I can call llamaindex at app/api/chat/route.ts without any error. Then, when I navigate to another page (e.g., /products), Next.js does another compilation. After this, when I call the handler at app/api/chat/route.ts, I get the error above.

himself65 self-assigned this Sep 11, 2024
himself65 added the bug label Sep 11, 2024
@himself65
Member

What's your code snippet? I'm guessing there's a dual module in the output.

@himself65
Member

I cannot reproduce the issue, but I think this might be related to the bundler.

@AndreMaz
Contributor Author

Here's my next.config.mjs

/**
 * @type {import("next").NextConfig}
 */

import { fileURLToPath } from "url";
import _jiti from "jiti";
import withLlamaIndex from "llamaindex/next";

const jiti = _jiti(fileURLToPath(import.meta.url));

// Import env files to validate at build time. Use jiti so we can load .ts files in here.
jiti("./src/env");

const isStaticExport = "false";

const nextConfig = {
  basePath: process.env.NEXT_PUBLIC_BASE_PATH,
  env: {
    BUILD_STATIC_EXPORT: isStaticExport,
  },
  // Trailing slashes must be disabled for Next Auth callback endpoint to work
  // https://stackoverflow.com/a/78348528
  trailingSlash: false,
  modularizeImports: {
    "@mui/icons-material": {
      transform: "@mui/icons-material/{{member}}",
    },
    "@mui/material": {
      transform: "@mui/material/{{member}}",
    },
    "@mui/lab": {
      transform: "@mui/lab/{{member}}",
    },
  },
  webpack(config) {
    config.module.rules.push({
      test: /\.svg$/,
      use: ["@svgr/webpack"],
    });

    // To allow chatbot to work
    // Extracted from: https://github.com/neondatabase/examples/blob/main/ai/llamaindex/rag-nextjs/next.config.mjs
    config.resolve.alias = {
      ...config.resolve.alias,
      sharp$: false,
      "onnxruntime-node$": false,
    };

    return config;
  },
  ...(isStaticExport === "true" && {
    output: "export",
  }),
};

export default withLlamaIndex(nextConfig);

All llamaindex-related code is located in app/api/chat

config.ts

import { OpenAI, OpenAIEmbedding, Settings } from "llamaindex";

import { env } from "~/env";

const aiGlobalConfig = globalThis as unknown as {
  llmModel?: OpenAI;
  embedModel?: OpenAIEmbedding;
};

const llmModel =
  aiGlobalConfig.llmModel ??
  new OpenAI({
    apiKey: env.OPENAI_API_KEY,
    model: env.OPENAI_MODEL_NAME,
  });

if (env.NODE_ENV !== "production") {
  aiGlobalConfig.llmModel = llmModel;
}

const embedModel =
  aiGlobalConfig.embedModel ??
  new OpenAIEmbedding({
    apiKey: env.OPENAI_API_KEY,
    model: env.OPENAI_EMBED_MODEL_NAME,
  });

if (env.NODE_ENV !== "production") {
  aiGlobalConfig.embedModel = embedModel;
}

// LlamaIndex Settings
Settings.llm = llmModel;

Settings.embedModel = embedModel;

export { Settings };

vector-store.ts

import { PGVectorStore } from "llamaindex/vector-store/PGVectorStore";

import { env } from "~/env";

const DIMMS = env.EMBED_DIM;

const CONN_STRING = env.VECTOR_STORE_PG_URL;

const globalForVectorStore = globalThis as unknown as {
  vectorStore: PGVectorStore | undefined;
};

const storedVectorStore =
  globalForVectorStore.vectorStore ??
  new PGVectorStore({
    dimensions: DIMMS,
    connectionString: CONN_STRING,
  });

if (env.NODE_ENV !== "production") {
  globalForVectorStore.vectorStore = storedVectorStore;
}

const vectorStore = storedVectorStore;

export { vectorStore };

route.ts

import type { NextRequest } from "next/server";
import { ContextChatEngine, serviceContextFromDefaults, VectorStoreIndex } from "llamaindex";

import type { ChatBotPayload } from "./types";
import { env } from "~/env";
import { Settings } from "./config";
import { vectorStore } from "./vector-store";

/**
 * Key used to store the space ID in the metadata.
 */
const METADATA_SPACE_ID_KEY = "space_id";

export async function POST(request: NextRequest) {
  try {
    const { messages = [], spaces: confluenceSpaces = [] } =
      (await request.json()) as ChatBotPayload;

    if (confluenceSpaces.length === 0) {
      throw new Error("No confluence spaces provided.");
    }

    const userMessages = messages.filter((i) => i.role === "user");
    const query = userMessages[userMessages.length - 1]?.content;

    if (!query) {
      throw new Error("No query provided.");
    }

    const serviceContext = serviceContextFromDefaults({ embedModel: Settings.embedModel });
    const index = await VectorStoreIndex.fromVectorStore(vectorStore, serviceContext);
    const retriever = index.asRetriever({
      topK: {
        TEXT: env.TOP_K_SIMILARITY_TEXT,
        IMAGE: env.TOP_K_SIMILARITY_IMAGE,
      },
      filters: {
        // Limit the search to the allowed confluence spaces.
        filters: confluenceSpaces.map((spaceId) => ({
          key: METADATA_SPACE_ID_KEY,
          value: spaceId,
          operator: "==",
        })),
        condition: "or",
      },
    });
    const chatEngine = new ContextChatEngine({
      retriever,
    });

    const encoder = new TextEncoder();
    const customReadable = new ReadableStream({
      async start(controller) {
        const stream = await chatEngine.chat({
          message: query,
          chatHistory: messages,
          stream: true,
          verbose: true,
        });
        for await (const chunk of stream) {
          controller.enqueue(encoder.encode(chunk.response));
        }
        controller.close();
      },
    });
    return new Response(customReadable, {
      headers: {
        Connection: "keep-alive",
        "Content-Encoding": "none",
        "Cache-Control": "no-cache, no-transform",
        "Content-Type": "text/plain; charset=utf-8",
      },
    });
  } catch (error) {
    const errorMessage =
      error instanceof Error ? error.message : "An error occurred while processing the request.";

    return Response.json(
      { error: errorMessage },
      {
        headers: { "Content-Type": "application/json" },
        status: 500,
      },
    );
  }
}

@himself65
Member

I tried this locally and it runs well. I'm guessing you are using different versions of llamaindex & @llamaindex/core.

@AndreMaz
Contributor Author

Tested with the latest 0.6.0 and the error is still present.

The error comes from here:

export function splitNodesByType(nodes: BaseNode[]): NodesByType {
  const result: NodesByType = {};

  for (const node of nodes) {
    let type: ModalityType;
    if (node instanceof ImageNode) {
      type = ModalityType.IMAGE;
    } else if (node instanceof TextNode) {
      type = ModalityType.TEXT;
    } else {
      throw new Error(`Unknown node type: ${node.type}`);
    }
    if (type in result) {
      result[type]?.push(node);
    } else {
      result[type] = [node];
    }
  }
  return result;
}

Link to source: https://github.com/run-llama/LlamaIndexTS/blob/main/packages/core/src/schema/node.ts#L438-L446

Wouldn't it be safer to replace node instanceof ImageNode with node.type === ObjectType.IMAGE?
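
For illustration, here's a minimal sketch of that variant (hypothetical, not the library's code; it assumes ObjectType has TEXT, IMAGE, and DOCUMENT members, matching the DOCUMENT value seen in the error above):

export function splitNodesByTypeField(nodes: BaseNode[]): NodesByType {
  const result: NodesByType = {};

  for (const node of nodes) {
    let type: ModalityType;
    // Compare the serialized type field instead of the class identity,
    // so the check survives duplicated module instances.
    if (node.type === ObjectType.IMAGE) {
      type = ModalityType.IMAGE;
    } else if (node.type === ObjectType.TEXT || node.type === ObjectType.DOCUMENT) {
      type = ModalityType.TEXT;
    } else {
      throw new Error(`Unknown node type: ${node.type}`);
    }
    if (type in result) {
      result[type]?.push(node);
    } else {
      result[type] = [node];
    }
  }
  return result;
}

The trade-off: every concrete type value has to be enumerated by hand, which is exactly the concern raised below.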

@marcusschiesser
Collaborator

@AndreMaz If we made that change, we would also have to include the subclasses of ImageNode, e.g. ImageDocument. Why do you think that would be safer?
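
To illustrate the point with stand-in classes (hypothetical, not llamaindex's actual type values): instanceof covers subclasses automatically, while a strict comparison on the type field does not.

// Hypothetical stand-ins for the subclass problem.
class FakeImageNode {
  type: string = "IMAGE";
}
class FakeImageDocument extends FakeImageNode {
  type = "IMAGE_DOCUMENT"; // subclass carries its own type value
}

const doc = new FakeImageDocument();
console.log(doc instanceof FakeImageNode); // true: subclass is included
console.log(doc.type === "IMAGE");         // false: strict check misses it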

@himself65
Member

> Tested with the latest 0.6.0 and the error is still present. […] Wouldn't it be safer to replace node instanceof ImageNode with node.type === ObjectType.IMAGE?

It's weird to me that we didn't change any code related to your error stack. It's confusing to me.

@himself65
Member

But I think it might be caused by a dual package somewhere; I know bundlers can produce two copies of ImageNode and TextNode in some cases.
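
A self-contained sketch of that failure mode (stand-in classes, not llamaindex code):

// Imagine the bundler emitted the same module twice, e.g. once as CJS
// and once as ESM. Each copy defines its own distinct class object.
class TextNodeCopyA {
  text = "";
}
class TextNodeCopyB {
  text = "";
}

// Application code constructs the node against copy B...
const node: unknown = new TextNodeCopyB();

// ...while library code checks against copy A, so instanceof fails
// even though the object is conceptually the same node type.
console.log(node instanceof TextNodeCopyA); // false
console.log(node instanceof TextNodeCopyB); // true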

@AndreMaz
Contributor Author

AndreMaz commented Sep 17, 2024

@himself65 with 0.6.2 the Error: Unknown node type: DOCUMENT is gone.

However, now I'm seeing the "llamaindex was already imported. This breaks constructor checks and will lead to issues" warning that you added in https://github.com/run-llama/LlamaIndexTS/pull/1214/files. I don't fully understand what's going on here, though: to avoid having multiple versions of the package, I declare it in pnpm's catalog and then only use it in the files above.

I also don't mix ESM and CJS imports, so I don't know what's causing this.

Do you have any pointers on what I should be looking at to solve this issue?

Unrelated: in PGVectorStore, if I pass my custom PG client, there's no way to define the dimensions or the embedModel:

  constructor(configOrClient?: PGVectorStoreConfig | pg.ClientBase) {
    // We cannot import pg from top level, it might have side effects
    //  so we only check if the config.connect function exists
    if (
      configOrClient &&
      "connect" in configOrClient &&
      typeof configOrClient.connect === "function"
    ) {
      const db = configOrClient as pg.ClientBase;
      super();
      this.db = db;
    } else {
      const config = configOrClient as PGVectorStoreConfig;
      super(config?.embedModel);
      this.schemaName = config?.schemaName ?? PGVECTOR_SCHEMA;
      this.tableName = config?.tableName ?? PGVECTOR_TABLE;
      this.database = config?.database;
      this.connectionString = config?.connectionString;
      this.dimensions = config?.dimensions ?? 1536;
    }
  }

@AndreMaz
Contributor Author

AndreMaz commented Sep 17, 2024

@himself65 what if we add something like pgClientConfig to PGVectorStoreConfig? Something like:

export type PGVectorStoreConfig = {
  schemaName?: string | undefined;
  tableName?: string | undefined;
  database?: string | undefined;
  connectionString?: string | undefined;
  dimensions?: number | undefined;
  embedModel?: BaseEmbedding | undefined;

  pgClientConfig?: pg.ClientConfig | undefined; // <= custom PG client config
};



export class PGVectorStore extends VectorStoreBase implements VectorStoreNoEmbedModel {
  // ... other vars //

  private db?: pg.ClientBase;
  private pgClientConfig?: pg.ClientConfig; // <= ref to custom PG client config

  /**
   * Constructs a new instance of the PGVectorStore
   *
   * If the `connectionString` is not provided the following env variables are
   * used to connect to the DB:
   * PGHOST=your database host
   * PGUSER=your database user
   * PGPASSWORD=your database password
   * PGDATABASE=your database name
   * PGPORT=your database port
   */
  constructor(config?: PGVectorStoreConfig) {
    super(config?.embedModel);
    this.schemaName = config?.schemaName ?? PGVECTOR_SCHEMA;
    this.tableName = config?.tableName ?? PGVECTOR_TABLE;
    this.database = config?.database;
    this.connectionString = config?.connectionString;
    this.dimensions = config?.dimensions ?? 1536;

    this.pgClientConfig = config?.pgClientConfig ?? {}; // <= store custom PG client config
  }
  
  
  // ... other fns //

private async getDb(): Promise<pg.ClientBase> {
    if (!this.db) {
      try {
        const pg = await import("pg");
        const { Client } = pg.default ? pg.default : pg;

        const { registerType } = await import("pgvector/pg");
        // Create DB connection
        // Read connection params from env - see comment block above
        const db = new Client({
          ...this.pgClientConfig, // <= inject custom params
          database: this.database,
          connectionString: this.connectionString,
        });
        await db.connect();

        // Check vector extension
        await db.query("CREATE EXTENSION IF NOT EXISTS vector");
        await registerType(db);

        // All good?  Keep the connection reference
        this.db = db;
      } catch (err) {
        console.error(err);
        return Promise.reject(err instanceof Error ? err : new Error(`${err}`));
      }
    }

    const db = this.db;

    // Check schema, table(s), index(es)
    await this.checkSchema(db);

    return Promise.resolve(this.db);
  }

This way we can easily configure both the PGVectorStore and the underlying client. It should also allow users to pass custom certs (#366).
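
For example, a hypothetical call site under this proposal (pgClientConfig is the new field suggested above; the ssl options are standard pg.ClientConfig fields):

import { readFileSync } from "node:fs";
import { PGVectorStore } from "llamaindex/vector-store/PGVectorStore";

// Sketch: connect over TLS with a custom CA certificate while still
// letting PGVectorStore manage the dimensions and table settings.
const vectorStore = new PGVectorStore({
  connectionString: process.env.VECTOR_STORE_PG_URL,
  dimensions: 1536,
  pgClientConfig: {
    ssl: {
      rejectUnauthorized: true,
      ca: readFileSync("/path/to/ca-certificate.crt", "utf8"),
    },
  },
});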

@AndreMaz
Contributor Author

AndreMaz commented Sep 17, 2024

> I tried this locally and it runs well. I'm guessing you are using different versions of llamaindex & @llamaindex/core.

@himself65 here are the versions that I'm currently using:

pnpm why -F @web llamaindex
Legend: production dependency, optional only, dev only

@web@6.0.1 /home/Dev/web/apps/web

dependencies:
llamaindex 0.6.2
pnpm why -F @web @llamaindex/core
Legend: production dependency, optional only, dev only

@web@6.0.1 /home/Dev/web/apps/web

dependencies:
llamaindex 0.6.2
├─┬ @llamaindex/cloud 0.2.6
│ └── @llamaindex/core 0.2.2 peer
├── @llamaindex/core 0.2.2
├─┬ @llamaindex/groq 0.0.3
│ └─┬ @llamaindex/openai 0.1.4
│   └── @llamaindex/core 0.2.2
└─┬ @llamaindex/openai 0.1.4
  └── @llamaindex/core 0.2.2

@AndreMaz
Contributor Author

@himself65 after googling around and checking the issue that you linked in #1214:

What's the chance that some change after 0.5.20 (the last version that worked without complaining) introduced the CJS/ESM duplication described in this article: https://www.codejam.info/2024/02/esm-cjs-dupe.html?

@himself65
Member

> In PGVectorStore, if I pass my custom PG client, there's no way to define the dimensions or the embedModel

Thanks for the feedback, I didn't consider that case.

@himself65
Member

> @himself65 here are the versions that I'm currently using: […]
I'm guessing you are using a pnpm monorepo. It's very common to have dual-module issues there; you need to check your pnpm-lock.yaml and run pnpm dedupe to clean up the deps.
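
As a quick runtime sanity check (a sketch; it assumes llamaindex re-exports TextNode, and the @llamaindex/core/schema subpath is inferred from the stack trace above, not confirmed), you can compare the class identities the two packages resolve to:

import { TextNode } from "llamaindex";
// The exact subpath is an assumption based on the stack trace.
import { TextNode as CoreTextNode } from "@llamaindex/core/schema";

// If the bundler or package manager resolved two copies of
// @llamaindex/core, this prints false, and instanceof checks
// across the boundary will fail at runtime.
console.log(TextNode === CoreTextNode);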
