
Knowledge - Ingestion fails sometimes with context deadline exceeded error #807

Closed
sangee2004 opened this issue Dec 6, 2024 · 2 comments
Labels
bug Something isn't working knowledge


@sangee2004

When ingesting files from Corp Docs, which contains 600+ files, ingestion of one of the files sometimes fails with this context deadline exceeded error:

failed to ingest file: ingestion failed for at least one file: failed to add documents from file "ws://s3://test-otto8-workspaces/0ebf2d40-18af-4535-bd30-8d0c8645f593/.conversion/Corp Docs/Legal/State Registrations/2024/Arizona/Acorn Labs Inc. - AZ - 2024.pdf.json": failed to embed document c8f57d78-aef2-42b3-9c95-430791f2448a: error sending request(s): retry limit (5) exceeded or failed with non-retriable error(s): #1/5: 502 <> (err: <nil>); #2/5: 502 <> (err: <nil>); #3/5: 502 <> (err: <nil>); #4/5: 502 <> (err: <nil>); #5/5: failed to send request: Post "https://test.otto8.ai/api/llm-proxy/embeddings": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

The error is intermittent: out of the 4 times I ingested Corp Docs, it happened in 2 of the attempts, each time for one file.

From the API:

   {
      "id": "16f58d10230f8b49d84e759619387da4b1034e61bb8fcfc18f055e78c6d4e93d",
      "created": "2024-12-06T20:45:48Z",
      "revision": "1016537",
      "type": "knowledgefile",
      "fileName": "/Corp Docs/Legal/Vendors/Agile SEO/Agile SEO - Services Proposal and MSA - Acorn v3.docx",
      "state": "error",
      "error": "failed to ingest file: ingestion failed for at least one file: failed to add documents from file \"ws://s3://test-otto8-workspaces/0ebf2d40-18af-4535-bd30-8d0c8645f593/.conversion/Corp Docs/Legal/State Registrations/2024/Arizona/Acorn Labs Inc. - AZ - 2024.pdf.json\": failed to embed document c8f57d78-aef2-42b3-9c95-430791f2448a: error sending request(s): retry limit (5) exceeded or failed with non-retriable error(s): #1/5: 502 \u003C\u003E (err: \u003Cnil\u003E); #2/5: 502 \u003C\u003E (err: \u003Cnil\u003E); #3/5: 502 \u003C\u003E (err: \u003Cnil\u003E); #4/5: 502 \u003C\u003E (err: \u003Cnil\u003E); #5/5: failed to send request: Post \"https://test.otto8.ai/api/llm-proxy/embeddings\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)",
      "agentID": "a12c8zm",
      "knowledgeSetID": "kst1-a12c8zm",
      "knowledgeSourceID": "ks1qgtlh",
      "approved": true,
      "url": "https://acorn1-my.sharepoint.com/personal/sheng_acorn_io/_layouts/15/Doc.aspx?sourcedoc=%7BB8CD1B1C-5D0B-462B-891E-D98912EA9D89%7D&file=Agile%20SEO%20-%20Services%20Proposal%20and%20MSA%20-%20Acorn%20v3.docx&action=default&mobileredirect=true",
      "updatedAt": "2024-08-22 21:45:29 +0000 UTC",
      "lastIngestionStartTime": "2024-12-06T21:36:32Z",
      "lastIngestionEndTime": "2024-12-06T21:36:38Z",
      "lastRunIDs": [
        "r1bjwrf",
        "r14ln7q"
      ],
      "sizeInBytes": 307839
    },
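To see which files ended up in the `error` state after a run, the objects returned by the API can be filtered on `state`. A minimal sketch, assuming the objects have already been collected into a plain JSON array (the response envelope and endpoint are not shown in this issue) and decoding only the fields visible above; `knowledgeFile` and `failedFiles` are hypothetical names:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// knowledgeFile mirrors a subset of the fields shown in the API output above.
// The full schema is not assumed.
type knowledgeFile struct {
	ID         string   `json:"id"`
	FileName   string   `json:"fileName"`
	State      string   `json:"state"`
	Error      string   `json:"error"`
	LastRunIDs []string `json:"lastRunIDs"`
}

// failedFiles decodes a JSON array of knowledge files and returns the
// fileName of every entry whose state is "error".
func failedFiles(raw []byte) ([]string, error) {
	var files []knowledgeFile
	if err := json.Unmarshal(raw, &files); err != nil {
		return nil, err
	}
	var failed []string
	for _, f := range files {
		if f.State == "error" {
			failed = append(failed, f.FileName)
		}
	}
	return failed, nil
}

func main() {
	// Illustrative input; "ingested" is a hypothetical non-error state value.
	sample := []byte(`[
	  {"id": "a", "fileName": "/Corp Docs/example.pdf", "state": "ingested"},
	  {"id": "b", "fileName": "/Corp Docs/Legal/Vendors/Agile SEO/Agile SEO - Services Proposal and MSA - Acorn v3.docx", "state": "error"}
	]`)
	failed, err := failedFiles(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(failed)
}
```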

Debug logs:


  "frames": {
    "1733552479": {
      "chatResponseCached": false,
      "currentAgent": {

      },
      "displayText": "Running Knowledge Ingestion from /otto8-tools/knowledge/ingest.gpt",
      "end": "0001-01-01T00:00:00Z",
      "id": "1733552479",
      "input": "{\"dataset\":\"default/kst1-a12c8zm\",\"input\":\".conversion/Corp Docs/Legal/Vendors/Agile SEO/Agile SEO - Services Proposal and MSA - Acorn v3.docx.json\",\"metadata_json\":{\"url\":\"https://acorn1-my.sharepoint.com/personal/sheng_acorn_io/_layouts/15/Doc.aspx?sourcedoc=%7BB8CD1B1C-5D0B-462B-891E-D98912EA9D89%7D\\u0026file=Agile%20SEO%20-%20Services%20Proposal%20and%20MSA%20-%20Acorn%20v3.docx\\u0026action=default\\u0026mobileredirect=true\",\"workspaceFileName\":\".conversion/Corp Docs/Legal/Vendors/Agile SEO/Agile SEO - Services Proposal and MSA - Acorn v3.docx.json\",\"workspaceID\":\"s3://test-otto8-workspaces/35500f29-23a1-4dcf-8063-fd2b11f532f8\"}}",
      "inputContext": null,
      "llmRequest": {
        "command": [
          "/bin/sh",
          "-c",
          "exec ${GPTSCRIPT_TOOL_DIR}/bin/gptscript-go-tool ingest --flows-file=blueprint:otto --dataset ${DATASET} \"ws://${INPUT}\""
        ],
        "input": "{\"dataset\":\"default/kst1-a12c8zm\",\"input\":\".conversion/Corp Docs/Legal/Vendors/Agile SEO/Agile SEO - Services Proposal and MSA - Acorn v3.docx.json\",\"metadata_json\":{\"url\":\"https://acorn1-my.sharepoint.com/personal/sheng_acorn_io/_layouts/15/Doc.aspx?sourcedoc=%7BB8CD1B1C-5D0B-462B-891E-D98912EA9D89%7D\\u0026file=Agile%20SEO%20-%20Services%20Proposal%20and%20MSA%20-%20Acorn%20v3.docx\\u0026action=default\\u0026mobileredirect=true\",\"workspaceFileName\":\".conversion/Corp Docs/Legal/Vendors/Agile SEO/Agile SEO - Services Proposal and MSA - Acorn v3.docx.json\",\"workspaceID\":\"s3://test-otto8-workspaces/35500f29-23a1-4dcf-8063-fd2b11f532f8\"}}"
      },
      "llmResponse": null,
      "output": null,
      "start": "2024-12-06T21:36:37.823380837Z",
      "tool": {
        "arguments": {
          "properties": {
            "Dataset": {
              "description": "Dataset ID",
              "type": "string"
            },
            "Input": {
              "description": "Input File",
              "type": "string"
            }
          },
          "type": "object"
        },
        "credentials": [
          "github.com/gptscript-ai/credentials/model-provider"
        ],
        "description": "Ingest content into a dataset.",
        "id": "/otto8-tools/knowledge/ingest.gpt:Knowledge Ingestion",
        "instructions": "#!${GPTSCRIPT_TOOL_DIR}/bin/gptscript-go-tool ingest --flows-file=blueprint:otto --dataset ${DATASET} \"ws://${INPUT}\"",
        "internalPrompt": null,
        "localTools": {
          "knowledge ingestion": "/otto8-tools/knowledge/ingest.gpt:Knowledge Ingestion"
        },
        "modelName": "llm",
        "name": "Knowledge Ingestion",
        "source": {
          "lineNo": 1,
          "location": "/otto8-tools/knowledge/ingest.gpt"
        },
        "toolMapping": {
          "github.com/gptscript-ai/credentials/model-provider": [
            {
              "reference": "github.com/gptscript-ai/credentials/model-provider",
              "toolID": "https://raw.githubusercontent.com/gptscript-ai/credentials/bd959c8f57a499835b927453645f94bf47973774/model-provider/tool.gpt:GPTScript Model Provider Credential"
            }
          ]
        },
        "workingDir": "/otto8-tools/knowledge"
      },
      "toolResults": 0,
      "type": "callChat",
      "usage": {

      }
    },
    "1733552480": {
      "chatResponseCached": false,
      "currentAgent": {

      },
      "displayText": "",
      "end": "2024-12-06T21:36:37.823789213Z",
      "id": "1733552480",
      "input": "",
      "inputContext": null,
      "llmRequest": {
        "command": [
          "sys.model.provider.credential"
        ],
        "input": ""
      },
      "llmResponse": {
        "err": null,
        "fullOutput": "",
        "output": "{\"env\":{\"OPENAI_API_KEY\":\"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBZ2VudElEIjoiIiwiUnVuSUQiOiJyMTRsbjdxIiwiU2NvcGUiOiJkZWZhdWx0IiwiVGhyZWFkSUQiOiJ0MS1rczFxZ3RsaCIsIldvcmtmbG93SUQiOiIiLCJXb3JrZmxvd1N0ZXBJRCI6IiJ9.GPV58sR1ysyJFNCrOwfPGOb8nlNDDCZkmtbyzEsQP84\",\"OPENAI_BASE_URL\":\"https://test.otto8.ai/api/llm-proxy\"},\"ephemeral\":true}"
      },
      "output": [
        {
          "content": "",
          "subCalls": null
        }
      ],
      "parentID": "1733552479",
      "start": "2024-12-06T21:36:37.823449268Z",
      "tool": {
        "arguments": {
          "type": "object"
        },
        "description": "A credential tool to set the OPENAI_API_KEY and OPENAI_BASE_URL to give access to the default model provider",
        "id": "sys.model.provider.credential",
        "instructions": "#!sys.model.provider.credential",
        "internalPrompt": null,
        "modelName": "llm",
        "name": "sys.model.provider.credential",
        "source": {

        }
      },
      "toolCategory": "credential",
      "toolResults": 0,
      "type": "callFinish",
      "usage": {

      }
    }
  },
  "spec": {
    "synchronous": true,
    "threadName": "t1-ks1qgtlh",
    "input": "{\"dataset\":\"default/kst1-a12c8zm\",\"input\":\".conversion/Corp Docs/Legal/Vendors/Agile SEO/Agile SEO - Services Proposal and MSA - Acorn v3.docx.json\",\"metadata_json\":{\"url\":\"https://acorn1-my.sharepoint.com/personal/sheng_acorn_io/_layouts/15/Doc.aspx?sourcedoc=%7BB8CD1B1C-5D0B-462B-891E-D98912EA9D89%7D\\u0026file=Agile%20SEO%20-%20Services%20Proposal%20and%20MSA%20-%20Acorn%20v3.docx\\u0026action=default\\u0026mobileredirect=true\",\"workspaceFileName\":\".conversion/Corp Docs/Legal/Vendors/Agile SEO/Agile SEO - Services Proposal and MSA - Acorn v3.docx.json\",\"workspaceID\":\"s3://test-otto8-workspaces/35500f29-23a1-4dcf-8063-fd2b11f532f8\"}}",
    "env": [
      "OPENAI_EMBEDDING_MODEL=m1-openai-model-provider-dbb48b3aa923fc46689511082d2f8b18-60708"
    ],
    "tool": "\"knowledge-ingest\"",
    "defaultModel": "llm",
    "timeout": "1h0m0s"
  },
  "status": {
    "state": "error",
    "output": "",
    "endTime": "2024-12-06T21:36:39Z",
    "error": "run encountered an error: failed to read events: context canceled with error output: "
  }
}
sangee2004 added the bug (Something isn't working) and knowledge labels on Dec 6, 2024
iwilltry42 self-assigned this on Dec 13, 2024
@iwilltry42 (Contributor)

The providers have since been rewritten, and we now have retry mechanisms in place. Please retest.

@sangee2004 (Author)

I have not been able to reproduce this error when testing with the latest builds.
