Error: Unknown node type: DOCUMENT #1179

Closed

AndreMaz opened this issue Sep 11, 2024 · 15 comments · Fixed by #1214
Labels
bug Something isn't working

Comments

@AndreMaz
Contributor

AndreMaz commented Sep 11, 2024

@himself65 bumped to 0.5.24 to see if #1176 is fixed. It is, but now I'm getting the following error:

Error: Unknown node type: DOCUMENT
at splitNodesByType (webpack-internal:///(rsc)/../../node_modules/@llamaindex/core/dist/schema/index.js:708:19)
at createMessageContent (webpack-internal:///(rsc)/../../node_modules/llamaindex/dist/synthesizers/utils.js:11:94)
at DefaultContextGenerator.generate (webpack-internal:///(rsc)/../../node_modules/llamaindex/dist/engines/chat/DefaultContextGenerator.js:50:107)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async ContextChatEngine.prepareRequestMessages (webpack-internal:///(rsc)/../../node_modules/llamaindex/dist/engines/chat/ContextChatEngine.js:470:25)
at async ContextChatEngine.chat (webpack-internal:///(rsc)/../../node_modules/llamaindex/dist/engines/chat/ContextChatEngine.js:437:33)
at async Object.start (webpack-internal:///(rsc)/./src/app/api/chat/route.ts:60:32)

Note: this error was not present in v0.5.20. It is something I initially described in #1172.

I can repro the issue in exactly the same way as I mentioned in the ticket above.
From the ticket:

> I'm using llamaindex at app/api/chat/route.ts; in other words, I'm using Route Handlers.
> When I start my app and open /home, Next.js compiles the app and I can call llamaindex at app/api/chat/route.ts without any error. Then, when I navigate to another page (e.g., /products), Next.js does another compilation. After this, when I call the handler at app/api/chat/route.ts, I get the error above.

himself65 self-assigned this Sep 11, 2024
himself65 added the bug label Sep 11, 2024
@himself65
Member

What's your code snippet? I'm guessing there's a dual module in the output.

@himself65
Member

I cannot reproduce the issue, but I think this might be related to the bundler.

@AndreMaz
Contributor Author

Here's my next.config.mjs

/**
 * @type {import("next").NextConfig}
 */

import { fileURLToPath } from "url";
import _jiti from "jiti";
import withLlamaIndex from "llamaindex/next";

const jiti = _jiti(fileURLToPath(import.meta.url));

// Import env files to validate at build time. Use jiti so we can load .ts files in here.
jiti("./src/env");

const isStaticExport = "false";

const nextConfig = {
  basePath: process.env.NEXT_PUBLIC_BASE_PATH,
  env: {
    BUILD_STATIC_EXPORT: isStaticExport,
  },
  // Trailing slashes must be disabled for Next Auth callback endpoint to work
  // https://stackoverflow.com/a/78348528
  trailingSlash: false,
  modularizeImports: {
    "@mui/icons-material": {
      transform: "@mui/icons-material/{{member}}",
    },
    "@mui/material": {
      transform: "@mui/material/{{member}}",
    },
    "@mui/lab": {
      transform: "@mui/lab/{{member}}",
    },
  },
  webpack(config) {
    config.module.rules.push({
      test: /\.svg$/,
      use: ["@svgr/webpack"],
    });

    // To allow chatbot to work
    // Extracted from: https://github.com/neondatabase/examples/blob/main/ai/llamaindex/rag-nextjs/next.config.mjs
    config.resolve.alias = {
      ...config.resolve.alias,
      sharp$: false,
      "onnxruntime-node$": false,
    };

    return config;
  },
  ...(isStaticExport === "true" && {
    output: "export",
  }),
};

export default withLlamaIndex(nextConfig);

All llamaindex-related code is located in app/api/chat

config.ts

import { OpenAI, OpenAIEmbedding, Settings } from "llamaindex";

import { env } from "~/env";

const aiGlobalConfig = globalThis as unknown as {
  llmModel?: OpenAI;
  embedModel?: OpenAIEmbedding;
};

const llmModel =
  aiGlobalConfig.llmModel ??
  new OpenAI({
    apiKey: env.OPENAI_API_KEY,
    model: env.OPENAI_MODEL_NAME,
  });

if (env.NODE_ENV !== "production") {
  aiGlobalConfig.llmModel = llmModel;
}

const embedModel =
  aiGlobalConfig.embedModel ??
  new OpenAIEmbedding({
    apiKey: env.OPENAI_API_KEY,
    model: env.OPENAI_EMBED_MODEL_NAME,
  });

if (env.NODE_ENV !== "production") {
  aiGlobalConfig.embedModel = embedModel;
}

// LlamaIndex Settings
Settings.llm = llmModel;

Settings.embedModel = embedModel;

export { Settings };

vector-store.ts

import { PGVectorStore } from "llamaindex/vector-store/PGVectorStore";

import { env } from "~/env";

const DIMMS = env.EMBED_DIM;

const CONN_STRING = env.VECTOR_STORE_PG_URL;

const globalForVectorStore = globalThis as unknown as {
  vectorStore: PGVectorStore | undefined;
};

const storedVectorStore =
  globalForVectorStore.vectorStore ??
  new PGVectorStore({
    dimensions: DIMMS,
    connectionString: CONN_STRING,
  });

if (env.NODE_ENV !== "production") {
  globalForVectorStore.vectorStore = storedVectorStore;
}

const vectorStore = storedVectorStore;

export { vectorStore };

route.ts

import type { NextRequest } from "next/server";
import { ContextChatEngine, serviceContextFromDefaults, VectorStoreIndex } from "llamaindex";

import type { ChatBotPayload } from "./types";
import { env } from "~/env";
import { Settings } from "./config";
import { vectorStore } from "./vector-store";

/**
 * Key used to store the space ID in the metadata.
 */
const METADATA_SPACE_ID_KEY = "space_id";

export async function POST(request: NextRequest) {
  try {
    const { messages = [], spaces: confluenceSpaces = [] } =
      (await request.json()) as ChatBotPayload;

    if (confluenceSpaces.length === 0) {
      throw new Error("No confluence spaces provided.");
    }

    const userMessages = messages.filter((i) => i.role === "user");
    const query = userMessages[userMessages.length - 1]?.content;

    if (!query) {
      throw new Error("No query provided.");
    }

    const serviceContext = serviceContextFromDefaults({ embedModel: Settings.embedModel });
    const index = await VectorStoreIndex.fromVectorStore(vectorStore, serviceContext);
    const retriever = index.asRetriever({
      topK: {
        TEXT: env.TOP_K_SIMILARITY_TEXT,
        IMAGE: env.TOP_K_SIMILARITY_IMAGE,
      },
      filters: {
        // Limit the search to the allowed confluence spaces.
        filters: confluenceSpaces.map((spaceId) => ({
          key: METADATA_SPACE_ID_KEY,
          value: spaceId,
          operator: "==",
        })),
        condition: "or",
      },
    });
    const chatEngine = new ContextChatEngine({
      retriever,
    });

    const encoder = new TextEncoder();
    const customReadable = new ReadableStream({
      async start(controller) {
        const stream = await chatEngine.chat({
          message: query,
          chatHistory: messages,
          stream: true,
          verbose: true,
        });
        for await (const chunk of stream) {
          controller.enqueue(encoder.encode(chunk.response));
        }
        controller.close();
      },
    });
    return new Response(customReadable, {
      headers: {
        Connection: "keep-alive",
        "Content-Encoding": "none",
        "Cache-Control": "no-cache, no-transform",
        "Content-Type": "text/plain; charset=utf-8",
      },
    });
  } catch (error) {
    const errorMessage =
      error instanceof Error ? error.message : "An error occurred while processing the request.";

    return Response.json(
      { error: errorMessage },
      {
        headers: { "Content-Type": "application/json" },
        status: 500,
      },
    );
  }
}

@himself65
Member

I tried this locally and it runs well. I'm guessing you are using different versions of llamaindex & @llamaindex/core.

@AndreMaz
Contributor Author

Tested with the latest 0.6.0 and the error is still present.

The error comes from here:

export function splitNodesByType(nodes: BaseNode[]): NodesByType {
  const result: NodesByType = {};

  for (const node of nodes) {
    let type: ModalityType;
    if (node instanceof ImageNode) {
      type = ModalityType.IMAGE;
    } else if (node instanceof TextNode) {
      type = ModalityType.TEXT;
    } else {
      throw new Error(`Unknown node type: ${node.type}`);
    }
    if (type in result) {
      result[type]?.push(node);
    } else {
      result[type] = [node];
    }
  }
  return result;
}

Link to source: https://github.com/run-llama/LlamaIndexTS/blob/main/packages/core/src/schema/node.ts#L438-L446

Wouldn't it be safer to replace node instanceof ImageNode with node.type === ObjectType.IMAGE?
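
For illustration, here's a minimal sketch of that variant (hypothetical, not the library's code; it assumes ObjectType has TEXT, IMAGE, and DOCUMENT members, matching the DOCUMENT value seen in the error above):

export function splitNodesByTypeField(nodes: BaseNode[]): NodesByType {
  const result: NodesByType = {};

  for (const node of nodes) {
    let type: ModalityType;
    // Compare the serialized type field instead of the class identity,
    // so the check survives duplicated module instances.
    if (node.type === ObjectType.IMAGE) {
      type = ModalityType.IMAGE;
    } else if (node.type === ObjectType.TEXT || node.type === ObjectType.DOCUMENT) {
      type = ModalityType.TEXT;
    } else {
      throw new Error(`Unknown node type: ${node.type}`);
    }
    if (type in result) {
      result[type]?.push(node);
    } else {
      result[type] = [node];
    }
  }
  return result;
}

The trade-off: every concrete type value has to be enumerated by hand, which is exactly the concern raised below.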

@marcusschiesser
Collaborator

@AndreMaz If we made that change, we would also have to include the subclasses of ImageNode, e.g. ImageDocument. Why do you think that would be safer?
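
To illustrate the point with stand-in classes (hypothetical, not llamaindex's actual type values): instanceof covers subclasses automatically, while a strict comparison on the type field does not.

// Hypothetical stand-ins for the subclass problem.
class FakeImageNode {
  type: string = "IMAGE";
}
class FakeImageDocument extends FakeImageNode {
  type = "IMAGE_DOCUMENT"; // subclass carries its own type value
}

const doc = new FakeImageDocument();
console.log(doc instanceof FakeImageNode); // true: subclass is included
console.log(doc.type === "IMAGE");         // false: strict check misses it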

@himself65
Member

> Tested with the latest 0.6.0 and the error is still present. […] Wouldn't it be safer to replace node instanceof ImageNode with node.type === ObjectType.IMAGE?

It's weird to me that we didn't change any code related to your error stack. It's confusing to me.

@himself65
Member

But I think it might be caused by a dual package somewhere; I know bundlers can produce two copies of ImageNode and TextNode in some cases.
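
A self-contained sketch of that failure mode (stand-in classes, not llamaindex code):

// Imagine the bundler emitted the same module twice, e.g. once as CJS
// and once as ESM. Each copy defines its own distinct class object.
class TextNodeCopyA {
  text = "";
}
class TextNodeCopyB {
  text = "";
}

// Application code constructs the node against copy B...
const node: unknown = new TextNodeCopyB();

// ...while library code checks against copy A, so instanceof fails
// even though the object is conceptually the same node type.
console.log(node instanceof TextNodeCopyA); // false
console.log(node instanceof TextNodeCopyB); // true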

@AndreMaz
Contributor Author

AndreMaz commented Sep 17, 2024

@himself65 with 0.6.2 the Error: Unknown node type: DOCUMENT is gone.

However, now I'm seeing the "llamaindex was already imported. This breaks constructor checks and will lead to issues" warning that you added in https://github.com/run-llama/LlamaIndexTS/pull/1214/files. I don't fully understand what's going on here, though: to avoid having multiple versions of the package, I declare it in pnpm's catalog and then only use it in the files above.

I also don't mix ESM and CJS imports, so I don't know what's causing this.

Do you have any pointers on what I should be looking at to solve this issue?

Unrelated: in PGVectorStore, if I pass my custom PG client, there's no way to define the dimensions or the embedModel:

  constructor(configOrClient?: PGVectorStoreConfig | pg.ClientBase) {
    // We cannot import pg from top level, it might have side effects
    //  so we only check if the config.connect function exists
    if (
      configOrClient &&
      "connect" in configOrClient &&
      typeof configOrClient.connect === "function"
    ) {
      const db = configOrClient as pg.ClientBase;
      super();
      this.db = db;
    } else {
      const config = configOrClient as PGVectorStoreConfig;
      super(config?.embedModel);
      this.schemaName = config?.schemaName ?? PGVECTOR_SCHEMA;
      this.tableName = config?.tableName ?? PGVECTOR_TABLE;
      this.database = config?.database;
      this.connectionString = config?.connectionString;
      this.dimensions = config?.dimensions ?? 1536;
    }
  }

@AndreMaz
Contributor Author

AndreMaz commented Sep 17, 2024

@himself65 what if we add something like pgClientConfig to PGVectorStoreConfig? Something like:

export type PGVectorStoreConfig = {
  schemaName?: string | undefined;
  tableName?: string | undefined;
  database?: string | undefined;
  connectionString?: string | undefined;
  dimensions?: number | undefined;
  embedModel?: BaseEmbedding | undefined;

  pgClientConfig?: pg.ClientConfig | undefined; // <= custom PG client config
};



export class PGVectorStore extends VectorStoreBase implements VectorStoreNoEmbedModel {
  // ... other vars //

  private db?: pg.ClientBase;
  private pgClientConfig?: pg.ClientConfig; // <= ref to custom PG client config

  /**
   * Constructs a new instance of the PGVectorStore
   *
   * If the `connectionString` is not provided the following env variables are
   * used to connect to the DB:
   * PGHOST=your database host
   * PGUSER=your database user
   * PGPASSWORD=your database password
   * PGDATABASE=your database name
   * PGPORT=your database port
   */
  constructor(config?: PGVectorStoreConfig) {
    super(config?.embedModel);
    this.schemaName = config?.schemaName ?? PGVECTOR_SCHEMA;
    this.tableName = config?.tableName ?? PGVECTOR_TABLE;
    this.database = config?.database;
    this.connectionString = config?.connectionString;
    this.dimensions = config?.dimensions ?? 1536;

    this.pgClientConfig = config?.pgClientConfig ?? {}; // <= store custom PG client config
  }
  
  
  // ... other fns //

private async getDb(): Promise<pg.ClientBase> {
    if (!this.db) {
      try {
        const pg = await import("pg");
        const { Client } = pg.default ? pg.default : pg;

        const { registerType } = await import("pgvector/pg");
        // Create DB connection
        // Read connection params from env - see comment block above
        const db = new Client({
          ...this.pgClientConfig, // <= inject custom params
          database: this.database,
          connectionString: this.connectionString,
        });
        await db.connect();

        // Check vector extension
        await db.query("CREATE EXTENSION IF NOT EXISTS vector");
        await registerType(db);

        // All good?  Keep the connection reference
        this.db = db;
      } catch (err) {
        console.error(err);
        return Promise.reject(err instanceof Error ? err : new Error(`${err}`));
      }
    }

    const db = this.db;

    // Check schema, table(s), index(es)
    await this.checkSchema(db);

    return Promise.resolve(this.db);
  }

This way we can easily configure both the PGVectorStore and the underlying client. It should also allow users to pass custom certs (#366).
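
For example, a hypothetical call site under this proposal (pgClientConfig is the new field suggested above; the ssl options are standard pg.ClientConfig fields):

import { readFileSync } from "node:fs";
import { PGVectorStore } from "llamaindex/vector-store/PGVectorStore";

// Sketch: connect over TLS with a custom CA certificate while still
// letting PGVectorStore manage the dimensions and table settings.
const vectorStore = new PGVectorStore({
  connectionString: process.env.VECTOR_STORE_PG_URL,
  dimensions: 1536,
  pgClientConfig: {
    ssl: {
      rejectUnauthorized: true,
      ca: readFileSync("/path/to/ca-certificate.crt", "utf8"),
    },
  },
});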

@AndreMaz
Contributor Author

AndreMaz commented Sep 17, 2024

> I tried this locally and it runs well. I'm guessing you are using different versions of llamaindex & @llamaindex/core.

@himself65 here are the versions that I'm currently using:

pnpm why -F @web llamaindex
Legend: production dependency, optional only, dev only

@web@6.0.1 /home/Dev/web/apps/web

dependencies:
llamaindex 0.6.2
pnpm why -F @web @llamaindex/core
Legend: production dependency, optional only, dev only

@web@6.0.1 /home/Dev/web/apps/web

dependencies:
llamaindex 0.6.2
├─┬ @llamaindex/cloud 0.2.6
│ └── @llamaindex/core 0.2.2 peer
├── @llamaindex/core 0.2.2
├─┬ @llamaindex/groq 0.0.3
│ └─┬ @llamaindex/openai 0.1.4
│   └── @llamaindex/core 0.2.2
└─┬ @llamaindex/openai 0.1.4
  └── @llamaindex/core 0.2.2

@AndreMaz
Contributor Author

@himself65 after googling around and checking the issue that you linked in #1214:

What's the chance that some change after 0.5.20 (the last version that worked without complaining) introduced the CJS/ESM duplication described in this article: https://www.codejam.info/2024/02/esm-cjs-dupe.html?

@himself65
Member

> In PGVectorStore, if I pass my custom PG client, there's no way to define the dimensions or the embedModel

Thanks for the feedback, I didn't consider that case.

@himself65
Member

> @himself65 here are the versions that I'm currently using: […]
I'm guessing you are using a pnpm monorepo. It's very common to have dual-module issues there; you need to check your pnpm-lock.yaml and run pnpm dedupe to clean up the deps.
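
As a quick runtime sanity check (a sketch; it assumes llamaindex re-exports TextNode, and the @llamaindex/core/schema subpath is inferred from the stack trace above, not confirmed), you can compare the class identities the two packages resolve to:

import { TextNode } from "llamaindex";
// The exact subpath is an assumption based on the stack trace.
import { TextNode as CoreTextNode } from "@llamaindex/core/schema";

// If the bundler or package manager resolved two copies of
// @llamaindex/core, this prints false, and instanceof checks
// across the boundary will fail at runtime.
console.log(TextNode === CoreTextNode);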
