automatic splitting fails on missing throughput file #8792
It looks like the problem is that a tail step is run without information from the main one. The file
which may have something to do with the odd log
last lines of
After some debugging, I believe the problem is here: CRABServer/src/python/TaskWorker/Actions/PreDAG.py, lines 112 to 127 in 3b2e705.
Line 123 is wrong, it should be instead
That will make the error in this issue go away. But since things usually work, there may be something more to it.
I believe this problem only happens when all
(CRABServer/src/python/TaskWorker/Actions/PreDAG.py, lines 200 to 201 in 3b2e705)
but for reasons obscure to me processFailed is set to False, which makes failed jobs NOT skipped (!!)
(CRABServer/src/python/TaskWorker/Actions/PreDAG.py, lines 123 to 124 in 3b2e705)
while "normally" the default processFailed=True is used.
Surely the variable naming is confusing!
(CRABServer/src/python/TaskWorker/Actions/PreDAG.py, lines 117 to 118 in 3b2e705)
At this point I have no idea why the
But failed jobs have no throughput report, so they can't be used!
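To make the flag semantics described above easier to follow, here is a minimal sketch. It is not the actual PreDAG.py code: the function `completed_jobs`, the `status` mapping and the job ids are hypothetical stand-ins, and the behaviour only mirrors what this thread reports (with the flag set to True, failed jobs are set aside and skipped; with False, they are returned together with the finished ones).

```python
def completed_jobs(status, process_failed=True):
    """Split completed jobs into (usable, set_aside) lists of job ids.

    Hypothetical model of the behaviour described in this thread,
    NOT the real PreDAG.completedJobs implementation.
    `status` maps job id -> state string.
    """
    usable, set_aside = [], []
    for job_id, state in sorted(status.items()):
        if state not in ("finished", "failed"):
            continue  # still idle or running: not completed yet
        if state == "failed" and process_failed:
            set_aside.append(job_id)  # failed job is skipped
            continue
        usable.append(job_id)  # finished, or failed with the flag off
    return usable, set_aside
```

Under these assumptions, calling the function with `process_failed=False` on a stage where every job failed returns all the failed ids as usable, which is exactly the situation where a later lookup of their throughput reports cannot succeed.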
I made that task's DAG complete successfully by rerunning PreDAG manually after changing
estimates = set(self.completedJobs(stage='processing', processFailed=True))
which basically forces submission of a tail job with the same configuration as the processing one (OK here, since the failure was an accidental 8028). But I am still worried that making this change in the code for everybody may trigger problems in different situations which I cannot imagine or test now.
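Independently of which jobs end up in `estimates`, the crash itself comes from reading a throughput file that does not exist. A defensive read could look like the sketch below. Everything in it is an assumption made for illustration: the helper `collect_throughput`, the `job_<id>.json` file naming and the `throughput` key are hypothetical and are not taken from CRABServer.

```python
import json
from pathlib import Path


def collect_throughput(job_ids, report_dir):
    """Read per-job throughput values, tolerating missing or bad reports.

    Hypothetical layout: one JSON file per job named job_<id>.json
    containing a numeric "throughput" key.
    Returns (values, missing): a dict of job id -> float, and the list
    of job ids for which no usable report was found.
    """
    values, missing = {}, []
    for job_id in job_ids:
        path = Path(report_dir) / f"job_{job_id}.json"
        try:
            with path.open() as handle:
                values[job_id] = float(json.load(handle)["throughput"])
        except (FileNotFoundError, KeyError, TypeError, ValueError):
            # No report, malformed JSON or non-numeric value:
            # record the job instead of failing the whole step.
            missing.append(job_id)
    return values, missing
```

The caller can then decide what to do when `values` is empty, for example fall back to the configuration of the processing stage, rather than failing on the first missing file.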
Maybe there are situations where processing jobs fail but still produce a report? E.g. if they hit the time limit? (CRABServer/scripts/TweakPSet.py, lines 209 to 212 in 3b2e705)
Or will they count as successful?
that
No comments. No issue. I am still unsure what to do.
Some (but not all) probe jobs failing together with all processing jobs failing is, all in all, a very rare case.
I have prepared a PR with that fix, but I need to think more about possible side effects.
I found this while looking at a stuck automatic splitting task in the CI pipeline:
https://cmsweb-testbed.cern.ch/crabserver/ui/task/241113_203248%3Acrabint1_crab_20241113_213248