# Prometheus Evaluation on Open-Source Issues #87

@Wes1eyyy

Description

Test Case 1 Summary

Date: July 11, 2025
Test Target: LangChain Issue Processing (Tool Schema Generation Bug)
Result: FAILED - Context Token Limit Exceeded

Repository: langchain-ai/langchain
Issue: #31808 - bug in tool schema generation - missing field description for fields with pydantic model type
URL: langchain-ai/langchain#31808

Error Analysis

Primary Issue

  • Error Type: OpenAI API BadRequestError
  • Root Cause: Context length exceeded model limits
  • Token Usage: 79,472 tokens requested vs 65,536 limit
  • Breakdown: 71,472 tokens in messages + 8,000 completion tokens

Technical Details

openai.BadRequestError: Error code: 400 - {
  'error': {
    'message': "This model's maximum context length is 65536 tokens. 
               However, you requested 79472 tokens (71472 in the messages, 
               8000 in the completion). Please reduce the length of the 
               messages or completion.",
    'type': 'invalid_request_error'
  }
}
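The 400 error above is raised only after the full request reaches the API. A cheap pre-flight estimate can reject or trim oversized prompts locally instead. This is a minimal sketch, not the actual Prometheus code; the 4-characters-per-token heuristic is a rough assumption (DeepSeek-V3 uses its own tokenizer), so a safety margin is kept:

```python
# Hypothetical pre-flight check: estimate prompt size before calling the API
# so a request that would exceed the 65,536-token window fails fast locally
# instead of triggering a 400 BadRequestError after three retries.
MODEL_CONTEXT_LIMIT = 65_536
MAX_COMPLETION_TOKENS = 8_000


def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token. An approximation only;
    the model's real tokenizer may count differently."""
    return max(1, len(text) // 4)


def count_message_tokens(messages: list[dict]) -> int:
    """Sum the estimated tokens across all message contents."""
    return sum(estimate_tokens(m["content"]) for m in messages)


def fits_context(messages: list[dict],
                 completion_tokens: int = MAX_COMPLETION_TOKENS) -> bool:
    """True if prompt + reserved completion fits the model's window."""
    return count_message_tokens(messages) + completion_tokens <= MODEL_CONTEXT_LIMIT
```

In this failure, the equivalent estimate (71,472 prompt + 8,000 completion = 79,472 tokens) would have flagged the request before the first API call.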

System Behavior

  • Retry Attempts: 3/3 failed
  • Component: Context Retrieval Subgraph
  • Final Status: RuntimeError - "Failed to retrieve context after maximum attempts"
  • Container Cleanup: Successful
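The retry behavior observed above (3/3 attempts, then a RuntimeError) can be sketched as follows. This is an illustrative reconstruction, not the actual Context Retrieval Subgraph code; the function and exception names are assumptions:

```python
# Illustrative sketch of the observed retry loop: the context retrieval step
# retries up to three times and raises RuntimeError once all attempts fail.
MAX_ATTEMPTS = 3


def retrieve_context(fetch):
    """Call `fetch` (a hypothetical stand-in for the retrieval step) up to
    MAX_ATTEMPTS times; re-raise as RuntimeError after the final failure."""
    last_err = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return fetch()
        except Exception as err:  # the real code likely narrows this to API errors
            last_err = err
    raise RuntimeError("Failed to retrieve context after maximum attempts") from last_err
```

Note that retrying is futile here: a context-length overflow is deterministic, so all three attempts fail identically. A pre-flight token check would avoid the wasted attempts entirely.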

Impact Assessment

Functional Impact

  • Context Retrieval: Completely failed
  • Bug Analysis: Unable to proceed
  • Patch Generation: Not reached
  • System Stability: No crashes, clean error handling

Performance Metrics

  • Processing Time: ~10 seconds before failure
  • Resource Usage: Within normal limits
  • Error Recovery: Graceful degradation

Recommendations

Immediate Actions

  1. Reduce MAX_INPUT_TOKENS to below 57,000, leaving headroom under the 65,536-token limit after the 8,000-token completion budget
  2. Lower CHUNK_SIZE to optimize context segmentation
  3. Implement context truncation for large issues
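Recommendation 3 could look like the following budget-based truncation. This is a minimal sketch under assumed names (`truncate_context`, `MAX_INPUT_TOKENS`, and the 4-chars-per-token estimate are not taken from the actual configuration):

```python
# Hypothetical context truncation: keep whole retrieved chunks, assumed to be
# ordered most-relevant first, until a rough token budget is exhausted.
MAX_INPUT_TOKENS = 50_000


def truncate_context(chunks: list[str],
                     budget: int = MAX_INPUT_TOKENS,
                     est=lambda t: max(1, len(t) // 4)) -> list[str]:
    """Drop trailing (least relevant) chunks once the estimated token
    budget would be exceeded. `est` is a crude per-chunk token estimator."""
    kept, used = [], 0
    for chunk in chunks:
        cost = est(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept
```

Keeping whole chunks (rather than cutting mid-chunk) preserves coherent code context at the cost of slightly under-filling the budget.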

Configuration Changes

MAX_INPUT_TOKENS=50000
CHUNK_SIZE=6000
MAX_COMPLETION_TOKENS=4000

These values cap a worst-case request at 50,000 + 4,000 = 54,000 tokens, leaving 11,536 tokens of headroom under the 65,536-token limit.

Long-term Solutions

  • Implement intelligent context pruning
  • Add progressive context loading
  • Consider model with larger context window

Test Environment

  • Platform: Docker Compose
  • Model: DeepSeek-V3 (65,536 token limit)
  • Issue Type: Complex LangChain tool schema bug

Metadata

Labels: documentation (Improvements or additions to documentation)
