Hello,
I'm trying to replicate your results with SEAL-0.
Request: Any tips for making this run faster and for eliminating the "INFINITE LOOP DETECTED" and "Node ... permanently blocked from aggregation" errors?
Setup:
- Install on Ubuntu 24
- Native setup
- Add OpenRouter API Key
- Start backend and frontend
- Test run of evaluation.py: python evaluation.py --num-examples 3
Issues:
- The run takes a long time, around an hour, which seems excessive for only 3 questions (out of the 130 questions in seal-0.csv).
- The backend output contains many warnings/errors, for example:
  - INFINITE LOOP DETECTED: Node root stuck in PLAN_DONE for 1902.3s
  - Node root permanently blocked from aggregation - considering forced completion
- Even after evaluation.py finishes, the backend keeps repeating:
TaskScheduler: No nodes in READY or AGGREGATING status
🔍 AGGREGATION DEBUG - Node root sub_graph_id: subgraph_root, found 3 children
⏳ AGGREGATION BLOCKED - Node root cannot AGGREGATE: 2/3 children incomplete: root.2:PLAN_DONE, root.3:PENDING
🔍 AGGREGATION DEBUG - Node root.2 sub_graph_id: subgraph_root.2, found 4 children
⏳ AGGREGATION BLOCKED - Node root.2 cannot AGGREGATE: 2/4 children incomplete: root.2.3:RUNNING, root.2.4:PENDING
🚨 INFINITE LOOP DETECTED: Node root stuck in PLAN_DONE for 2162.9s
🔍 AGGREGATION DEBUG - Node root sub_graph_id: subgraph_root, found 3 children
⏳ AGGREGATION BLOCKED - Node root cannot AGGREGATE: 2/3 children incomplete: root.2:PLAN_DONE, root.3:PENDING
🚨 Node root permanently blocked from aggregation - considering forced completion
🚨 INFINITE LOOP DETECTED: Node root.2 stuck in PLAN_DONE for 1742.0s
🔍 AGGREGATION DEBUG - Node root.2 sub_graph_id: subgraph_root.2, found 4 children
⏳ AGGREGATION BLOCKED - Node root.2 cannot AGGREGATE: 2/4 children incomplete: root.2.3:RUNNING, root.2.4:PENDING
(...)
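To make the backend log above easier to read, here is a rough sketch that extracts the "AGGREGATION BLOCKED" lines and lists the incomplete children that are not themselves reported as blocked parents. It is not part of the project: the function names `blocked_children` and `leaf_blockers` are hypothetical, and the regular expression assumes the log format is exactly as pasted above.

```python
import re

# Assumed format, copied from the backend output above:
# "... AGGREGATION BLOCKED - Node <id> cannot AGGREGATE: n/m children incomplete: <id>:<STATUS>, ..."
BLOCKED_RE = re.compile(
    r"AGGREGATION BLOCKED - Node (?P<node>\S+) cannot AGGREGATE: "
    r"\d+/\d+ children incomplete: (?P<children>.+)$"
)

def blocked_children(log_text):
    """Map each blocked parent node to a {child: status} dict (last occurrence wins)."""
    blocked = {}
    for line in log_text.splitlines():
        match = BLOCKED_RE.search(line)
        if not match:
            continue
        children = {}
        for item in match.group("children").split(","):
            # Node ids contain dots but no colons, so split on the last colon.
            child, _, status = item.strip().rpartition(":")
            children[child] = status
        blocked[match.group("node")] = children
    return blocked

def leaf_blockers(blocked):
    """Incomplete children that never appear as a blocked parent themselves."""
    leaves = {}
    for children in blocked.values():
        for child, status in children.items():
            if child not in blocked:
                leaves[child] = status
    return leaves

SAMPLE_LOG = """\
⏳ AGGREGATION BLOCKED - Node root cannot AGGREGATE: 2/3 children incomplete: root.2:PLAN_DONE, root.3:PENDING
🚨 INFINITE LOOP DETECTED: Node root stuck in PLAN_DONE for 2162.9s
⏳ AGGREGATION BLOCKED - Node root.2 cannot AGGREGATE: 2/4 children incomplete: root.2.3:RUNNING, root.2.4:PENDING
"""

if __name__ == "__main__":
    for node, status in leaf_blockers(blocked_children(SAMPLE_LOG)).items():
        print(node, status)
```

On the excerpt above, the only node reported as RUNNING is root.2.3; root.2.4 and root.3 are PENDING. My reading (an assumption, not something the log states) is that everything else is waiting on root.2.3.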
Output from running evaluation:
$ python evaluation.py --num-examples 3
🔍 Testing connection to server at http://localhost:5000...
✅ Connected to server - Profile: None
📋 Processing 3 out of 3 total queries
🏁 Starting 2 worker processes for 3 queries...
Processing queries: 0%| | 0/3 [00:00<?, ?it/s]🤖 [EvalWorker-2] Worker initialized, connected to http://localhost:5000
🚀 [EvalWorker-2] Creating project for query #1: 'Who holds the all-time record at the Grammys for the most wins in the album of the year category?'
⏱️ [EvalWorker-2] Waiting 5.0s before request (rate limiting)
🤖 [EvalWorker-1] Worker initialized, connected to http://localhost:5000
🚀 [EvalWorker-1] Creating project for query #2: 'How many NBA players have scored 60 or more points in a regular season game since 2023?'
⏱️ [EvalWorker-1] Waiting 5.0s before request (rate limiting)
✅ [EvalWorker-1] Created project c427a462-63f7-48af-ba98-18ea1c92c5e9 for query #2
📊 Project c427a462-63f7-48af-ba98-18ea1c92c5e9 status: active
✅ [EvalWorker-2] Created project ff2c9540-f231-4419-9105-d902dbfb7c29 for query #1
📊 Project ff2c9540-f231-4419-9105-d902dbfb7c29 status: active
📊 Project c427a462-63f7-48af-ba98-18ea1c92c5e9 status: running
📊 Project ff2c9540-f231-4419-9105-d902dbfb7c29 status: running
⏱️ Timeout waiting for results. Checking worker status...
⏱️ Timeout waiting for results. Checking worker status...
⏱️ Timeout waiting for results. Checking worker status...
📊 Project c427a462-63f7-48af-ba98-18ea1c92c5e9 status: completed
✅ [EvalWorker-1] Completed query #2 in 1082.46s
🚀 [EvalWorker-1] Creating project for query #3: 'What is the most recent film to join the top 10 highest-grossing films of all time?'
⏱️ [EvalWorker-1] Waiting 5.0s before request (rate limiting)
Processing queries: 33%|██████████████▎ | 1/3 [18:03<36:06, 1083.00s/it]📊 Progress: 1/3 - Latest project: c427a462-63f7-48af-ba98-18ea1c92c5e9
✅ [EvalWorker-1] Created project 96491b5f-6cbe-4cbd-aeea-3ee03274184c for query #3
📊 Project 96491b5f-6cbe-4cbd-aeea-3ee03274184c status: active
📊 Project 96491b5f-6cbe-4cbd-aeea-3ee03274184c status: running
⏱️ Timeout waiting for results. Checking worker status...
⏱️ Timeout waiting for results. Checking worker status...
⏱️ Project ff2c9540-f231-4419-9105-d902dbfb7c29 timed out after 1800 seconds
✅ [EvalWorker-2] Completed query #1 in 1805.53s
🏁 [EvalWorker-2] Worker finished
Processing queries: 67%|█████████████████████████████▎ | 2/3 [30:06<14:31, 871.28s/it]📊 Progress: 2/3 - Latest project: ff2c9540-f231-4419-9105-d902dbfb7c29
⏱️ Timeout waiting for results. Checking worker status...
⏱️ Timeout waiting for results. Checking worker status...
⏱️ Timeout waiting for results. Checking worker status...
⏱️ Project 96491b5f-6cbe-4cbd-aeea-3ee03274184c timed out after 1800 seconds
✅ [EvalWorker-1] Completed query #3 in 1805.55s
Processing queries: 100%|████████████████████████████████████████████| 3/3 [48:08<00:00, 967.72s/it]🏁 [EvalWorker-1] Worker finished
📊 Progress: 3/3 - Latest project: 96491b5f-6cbe-4cbd-aeea-3ee03274184c
Processing queries: 100%|████████████████████████████████████████████| 3/3 [48:08<00:00, 962.85s/it]
✅ All evaluations completed
💾 Results saved to server_eval_results.csv
📊 Evaluation Summary:
Total queries: 3
Successful: 3
Failed: 0
Average execution time: 1564.51 seconds
🌐 Projects created (viewable in frontend):
- ff2c9540-f231-4419-9105-d902dbfb7c29: Who holds the all-time record at the Grammys for t...
- c427a462-63f7-48af-ba98-18ea1c92c5e9: How many NBA players have scored 60 or more points...
- 96491b5f-6cbe-4cbd-aeea-3ee03274184c: What is the most recent film to join the top 10 hi...
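As a sanity check on the summary above, the small calculation below uses only the per-query durations and the 1800-second timeout printed in the output. It reproduces the reported average and shows which queries hit the timeout even though the summary counts them under "Successful". The variable names are my own.

```python
# Per-query durations in seconds, copied from the "Completed query #N in ...s" lines.
durations = {2: 1082.46, 1: 1805.53, 3: 1805.55}

# From the "timed out after 1800 seconds" lines.
TIMEOUT_SECONDS = 1800.0

# Average execution time across the three queries.
mean_seconds = sum(durations.values()) / len(durations)

# Queries whose duration reached the timeout.
timed_out = sorted(q for q, secs in durations.items() if secs >= TIMEOUT_SECONDS)

# EvalWorker-1 ran query #2 and then query #3 back to back.
worker1_seconds = durations[2] + durations[3]

if __name__ == "__main__":
    print(f"mean: {mean_seconds:.2f}s")
    print(f"timed out: {timed_out}")
    print(f"EvalWorker-1 total: {worker1_seconds:.2f}s")
```

The mean comes out to 1564.51 s, matching the summary. Queries #1 and #3 both ran into the 1800 s timeout, so only query #2 actually finished on its own, and EvalWorker-1's two queries add up to about 2888 s, which accounts for the 48:08 on the progress bar.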
Any suggestions appreciated.