Description
Do you need to file an issue?
- I have searched the existing issues and this bug is not already filed.
- I believe this is a legitimate bug, not just a question or feature request.
Describe the bug
Preparing to upload PDF file...
Uploaded: 2026test13.pdf, parsing...
Initializing...
Saving PDF: 2026test13.pdf
Parsing PDF exam paper (MinerU)...
Executing question generation workflow...
Parsing PDF with MinerU...
PDF parsed successfully: 2026test13
Extracting reference questions from exam...
Error: Question extraction failed
Error: Question extraction failed
2026-01-24 11:14:25.293 | 🔄 Step 1: parse the PDF exam
2026-01-24 11:14:25.293 | --------------------------------------------------------------------------------
2026-01-24 11:14:26.804 | ✓ Detected MinerU command: mineru
2026-01-24 11:14:26.805 | 📄 PDF file: /app/data/user/question/mimic_papers/mimic_20260124_031425_2026test13/2026test13.pdf
2026-01-24 11:14:26.805 | 📁 Output directory: /app/data/user/question/mimic_papers/mimic_20260124_031425_2026test13/2026test13
2026-01-24 11:14:26.805 | → Starting parsing...
2026-01-24 11:14:26.805 | 🔧 Executing command: mineru -p /app/data/user/question/mimic_papers/mimic_20260124_031425_2026test13/2026test13.pdf -o /app/data/user/question/mimic_papers/mimic_20260124_031425_2026test13/temp_mineru_output
2026-01-24 11:51:58.013 | ✓ MinerU parsing completed!
2026-01-24 11:51:58.016 | 📦 Files saved to: /app/data/user/question/mimic_papers/mimic_20260124_031425_2026test13/2026test13
2026-01-24 11:51:58.017 |
2026-01-24 11:51:58.017 | 📋 Generated files:
2026-01-24 11:51:58.018 | - hybrid_auto/2026test13.md
2026-01-24 11:51:58.018 | - hybrid_auto/2026test13_origin.pdf
2026-01-24 11:51:58.018 | - hybrid_auto/2026test13_model.json
2026-01-24 11:51:58.019 | - hybrid_auto/2026test13_content_list.json
2026-01-24 11:51:58.019 | - hybrid_auto/2026test13_middle.json
2026-01-24 11:51:58.019 | - hybrid_auto/2026test13_content_list_v2.json
2026-01-24 11:51:58.019 | - hybrid_auto/2026test13_layout.pdf
2026-01-24 11:51:58.020 |
2026-01-24 11:51:58.020 | 🔍 Step 2: locating parsed results
2026-01-24 11:51:58.020 | --------------------------------------------------------------------------------
2026-01-24 11:51:58.020 | ✓ Parsed folder: 2026test13
2026-01-24 11:51:58.020 |
2026-01-24 11:51:58.021 | 🔄 Step 3: extract reference questions
2026-01-24 11:51:58.021 | --------------------------------------------------------------------------------
2026-01-24 11:51:58.021 | 📄 No question file found, starting extraction...
2026-01-24 11:51:58.022 | 📁 Paper directory: /app/data/user/question/mimic_papers/mimic_20260124_031425_2026test13/2026test13
2026-01-24 11:51:58.022 | ✗ Error: No markdown file found in /app/data/user/question/mimic_papers/mimic_20260124_031425_2026test13/2026test13
2026-01-24 11:51:58.022 | ✗ Error: Unable to load paper content
2026-01-24 11:51:58.023 | [QuestionAPI] ERROR: Mimic generation failed: Question extraction failed
2026-01-24 11:51:58.031 | INFO: 127.0.0.1:45134 - "GET / HTTP/1.1" 200 OK
2026-01-24 11:51:58.041 | INFO: 127.0.0.1:43894 - "GET / HTTP/1.1" 200 OK
2026-01-24 11:51:58.041 | 2026-01-24 03:51:58,041 INFO reaped unknown pid 4616 (exit status 0)
2026-01-24 11:51:58.042 | INFO: 127.0.0.1:44636 - "GET / HTTP/1.1" 200 OK
2026-01-24 11:51:58.042 | 2026-01-24 03:51:58,042 INFO reaped unknown pid 4623 (exit status 0)
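Reading the log above: MinerU reports its generated files under a `hybrid_auto/` subfolder (for example `hybrid_auto/2026test13.md`), while Step 3 reports that no markdown file was found in the paper directory itself. One possible explanation is that the lookup only checks the top level or expects a different subfolder name; this is a guess from the log, not something confirmed in DeepTutor's source. A minimal sketch of a recursive lookup that would tolerate either layout is below. `find_markdown` is a hypothetical helper name used only for illustration, not an existing DeepTutor function.

```python
from pathlib import Path
from typing import Optional


def find_markdown(paper_dir: str) -> Optional[Path]:
    """Return a .md file found anywhere under paper_dir, or None.

    Hypothetical helper: searches recursively so that output written to
    a backend-specific subfolder (e.g. hybrid_auto/) is still found.
    """
    root = Path(paper_dir)
    if not root.is_dir():
        return None
    # Prefer shallower paths, then sort by name so the choice is deterministic.
    candidates = sorted(root.rglob("*.md"), key=lambda p: (len(p.parts), str(p)))
    return candidates[0] if candidates else None
```

Called on `/app/data/user/question/mimic_papers/mimic_20260124_031425_2026test13/2026test13`, a lookup like this would be expected to pick up `hybrid_auto/2026test13.md`, assuming the file listed in the log is really present on disk.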
Steps to reproduce
No response
Expected Behavior
No response
Related Module
Question Generator
Configuration Used
# ==============================================================================
# DeepTutor Environment Configuration
# ==============================================================================
# Copy this file to .env and fill in the values.
# Required fields are marked with [Required], optional fields with [Optional].
# ==============================================================================
# Server Ports
# ==============================================================================
# [Optional] Backend API server port
BACKEND_PORT=8001
# [Optional] Frontend server port
FRONTEND_PORT=3782
# ==============================================================================
# LLM Configuration (Large Language Model)
# ==============================================================================
# Primary LLM for all AI operations (chat, research, solve, etc.)
# [Required] Provider binding: openai, azure_openai, anthropic,
#   deepseek, openrouter, groq, together, mistral,
#   ollama, lm_studio, vllm, llama_cpp
LLM_BINDING=openai
# [Required] Model name (e.g., gpt-4o, deepseek-chat, claude-3-5-sonnet)
LLM_MODEL=deepseek-chat
# [Required] API key for the LLM provider
LLM_API_KEY=sk-xxx
# [Required] API endpoint URL
LLM_HOST=https://api.deepseek.com/v1
# [Optional] API version (required for Azure OpenAI)
LLM_API_VERSION=
# ==============================================================================
# Embedding Configuration
# ==============================================================================
# Embedding model for RAG (Retrieval-Augmented Generation)
# [Required] Provider: openai, azure_openai, jina,
#   cohere, huggingface, google, ollama, lm_studio
EMBEDDING_BINDING=openai
# [Required] Model name
EMBEDDING_MODEL=text-embedding-v4
# [Required] API key
EMBEDDING_API_KEY=sk-xxx
# [Required] API endpoint URL
EMBEDDING_HOST=https://dashscope.aliyuncs.com/compatible-mode/v1
# [Required] Vector dimensions (must match model output)
EMBEDDING_DIMENSION=1536
# [Optional] API version (for Azure OpenAI)
EMBEDDING_API_VERSION=
# ==============================================================================
# TTS Configuration (Text-to-Speech)
# ==============================================================================
# Optional: Enable audio narration features
# [Optional] Provider: openai, azure_openai
TTS_BINDING=openai
# [Optional] TTS model name
TTS_MODEL=tts-1
# [Optional] API key (can be same as LLM_API_KEY for OpenAI)
TTS_API_KEY=sk-xxx
# [Optional] API endpoint URL
TTS_URL=https://api.openai.com/v1
# [Optional] Voice: alloy, echo, fable, onyx, nova, shimmer
TTS_VOICE=alloy
# [Optional] API version (for Azure OpenAI)
TTS_BINDING_API_VERSION=
# ==============================================================================
# Search Configuration (Web Search)
# ==============================================================================
# Optional: Enable web search capabilities
# [Optional] Provider: perplexity, tavily, serper, jina, exa
SEARCH_PROVIDER=perplexity
# [Optional] API key for your chosen search provider
SEARCH_API_KEY=pplx-xxx
# ==============================================================================
# Cloud Deployment Configuration
# ==============================================================================
# Required when deploying to cloud/remote servers
# [Optional] External API base URL for cloud deployment
# Set this to your server's public URL when deploying remotely
# Example: https://your-server.com:8001 or https://api.yourdomain.com
NEXT_PUBLIC_API_BASE_EXTERNAL=
# [Optional] Direct API base URL (alternative to above)
NEXT_PUBLIC_API_BASE=
# ==============================================================================
# Debug & Development
# ==============================================================================
# [Optional] Disable SSL verification (not recommended for production)
DISABLE_SSL_VERIFY=false
# ==============================
# HuggingFace / MinerU (Optional)
# ==============================
# Use a HuggingFace mirror endpoint (optional)
HF_ENDPOINT=
# HuggingFace cache directory (recommended: mount this in Docker to reuse cache)
HF_HOME=/app/data/hf
# Force offline mode (requires models already downloaded into the cache)
HF_HUB_OFFLINE=1
Logs and screenshots
No response
Additional Information
- DeepTutor Version: ver0.6.0
- Operating System: macOS (running in Docker)
- Python Version:
- Node.js Version:
- Browser (if applicable):
- Related Issues: