Proxy renewal of user jobs #7307
Replies: 1 comment
-
Hi,
For the upgrade, you should start from https://github.com/DIRACGrid/DIRAC/wiki/DIRAC-8.0
The general answer is that pilot credentials are used to renew the payload proxies. The difference you see between single- and multi-processor jobs seems weird to me. Can you paste the configuration you are using for these cases?
-
Hi,
DIRAC py3 server 7.3.17, py3 pilot jobs and py3 diracos2 client
We are still running quite an old DIRAC version, and my plan is to update in the near future. I might need some help, though, with where and how to start (maybe you can already give me some pointers).
With the current version, we are running into a problem here at SURF, Amsterdam, where km3net.org user jobs are killed after ~24 hours (multi-core jobs) or ~48 hours (single-core jobs). We are almost sure the problem is related to the user proxy expiring because it is not renewed often enough: it is renewed only once for the single-core jobs and not at all for the multi-core jobs. If we run test jobs for another VO, the jobs run perfectly fine until they run out of wall-clock time.
Note that:
A km3net.org VO proxy has a maximum VO attribute lifetime of 24 hours. We think this might be the root of the problem.
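As a rough sanity check of this hypothesis, the arithmetic can be sketched as below. This is only a simplified model, not a description of how DIRAC actually schedules renewals: it assumes each renewal happens close to expiry and grants a fresh, full 24-hour attribute lifetime. The function name `expected_kill_time_hours` is made up for illustration.

```python
# Simplified model (an assumption, not DIRAC's actual logic): every successful
# renewal extends the payload proxy by one full VO attribute lifetime.

def expected_kill_time_hours(attribute_lifetime_h: float, renewals: int) -> float:
    """Hours until the VO attributes lapse after the given number of renewals."""
    return attribute_lifetime_h * (1 + renewals)


VO_ATTRIBUTE_LIFETIME_H = 24  # km3net.org maximum, as stated above

multi_core = expected_kill_time_hours(VO_ATTRIBUTE_LIFETIME_H, renewals=0)
single_core = expected_kill_time_hours(VO_ATTRIBUTE_LIFETIME_H, renewals=1)

print(f"multi-core (no renewal):   ~{multi_core:.0f} h")
print(f"single-core (one renewal): ~{single_core:.0f} h")
```

Under these assumptions the model gives roughly 24 and 48 hours, which matches the kill times we observe.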
Question:
Are pilot proxies also supposed to be renewed, or are they in fact used to renew user proxies? In the latter case the pilot proxies of course have to be valid for long enough (the default is 168 hours, I think).
What we see in the log:
2023-11-23 06:09:54 UTC WorkloadManagement/JobAgent/InProcess INFO: Using Pilot credentials to get a new payload Proxy
We see the above line only for the single-core job, not for the multi-core job.
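For anyone who wants to check their own pilot logs for the same pattern, here is a minimal sketch that counts the renewal messages and extracts their timestamps. The line layout in the regular expression is inferred only from the single log line quoted above, and `find_renewals` is a hypothetical helper, not part of DIRAC.

```python
import re
from datetime import datetime

# Message text taken verbatim from the log line quoted above.
RENEWAL_MSG = "Using Pilot credentials to get a new payload Proxy"

# Assumed layout: "<date> <time> UTC <component> <LEVEL>: <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC (\S+) (\w+): (.*)$"
)


def find_renewals(lines):
    """Return the UTC timestamps of all payload proxy renewal messages."""
    times = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and RENEWAL_MSG in match.group(4):
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return times


sample = [
    "2023-11-23 06:09:54 UTC WorkloadManagement/JobAgent/InProcess INFO: "
    "Using Pilot credentials to get a new payload Proxy",
]
renewals = find_renewals(sample)
print(len(renewals), [t.isoformat() for t in renewals])
```

Running it over the full log of a single-core and a multi-core job would make the difference in renewal counts explicit.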
Hope you can shed some light on the problem.
Ernst