This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

How to gather the job startTime info in PAI? #1978

Closed
JianfeiHu opened this issue Jan 7, 2019 · 5 comments

Comments

@JianfeiHu

Organization Name: MSRAIT
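Short summary about the issue/question: I am trying to gather statistics on how users use the system and to compute how long each user occupies resources. When I fetch the job list from the REST server, I find only createdTime and completedTime. createdTime is the time at which the user submitted the job, not the time at which the job actually started running. Does PAI record the time at which a job actually starts running?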

OpenPAI Environment:

  • OpenPAI version: 0.8.2
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Hardware (e.g. core number, memory size, storage size, GPU type etc.):
  • Others:
@yqwang-ms
Member

yqwang-ms commented Jan 7, 2019

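FrameworkLauncher records the actual run time of the YARN app:
private Long applicationLaunchedTimestamp;
private Long applicationCompletedTimestamp;

For example:
image

The REST server exposes these as:
appLaunchedTime
appCompletedTime
Could you try these two?

(Note that these are only the launch and completion times of the job's last retry.)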
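
For the per-user statistics asked about in the question, here is a minimal Python sketch of how these two fields could be used. It makes several assumptions that the thread does not confirm: that each job record is a dictionary carrying `appLaunchedTime` and `appCompletedTime` as epoch milliseconds, that a missing value is represented as `None`, and that the owner is stored under a key named `username`. The helper names `running_ms` and `usage_by_user` are hypothetical, and the sample records are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Optional


def running_ms(job: Mapping, now_ms: int) -> int:
    """Milliseconds the job's last retry has actually run (assumed epoch-ms fields)."""
    launched: Optional[int] = job.get("appLaunchedTime")
    if launched is None:
        return 0  # assumption: None means the app was never launched
    completed: Optional[int] = job.get("appCompletedTime")
    # Assumption: None means the app is still running, so measure up to now_ms.
    end = completed if completed is not None else now_ms
    return max(0, end - launched)


def usage_by_user(jobs: Iterable[Mapping], now_ms: int) -> Dict[str, int]:
    """Sum the running time of every job per user ("username" is an assumed key)."""
    totals: Dict[str, int] = defaultdict(int)
    for job in jobs:
        totals[job["username"]] += running_ms(job, now_ms)
    return dict(totals)


# Made-up sample records, for illustration only.
jobs = [
    {"username": "alice", "appLaunchedTime": 1_000, "appCompletedTime": 61_000},
    {"username": "alice", "appLaunchedTime": 5_000, "appCompletedTime": None},
    {"username": "bob", "appLaunchedTime": None, "appCompletedTime": None},
]
print(usage_by_user(jobs, now_ms=10_000))
```

Because the two fields describe only the last retry, a total computed this way will undercount jobs that were retried.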

@yqwang-ms yqwang-ms assigned Gerhut and yqwang-ms and unassigned yqwang-ms Jan 7, 2019
@JianfeiHu
Author

JianfeiHu commented Jan 15, 2019

@yqwang-ms I am using this approach to get appLaunchedTime, but querying some jobs produces the following:
http://*5:9086/v1/Frameworks/nni_exp_NftmjP6l_trial_jOcTK

I have masked the IP address for security reasons; please get in touch offline for the job's IP.

@yqwang-ms
Member

This job was probably deleted by someone. Please check whether NNI has any deletion logic on its side.

@scarlett2018 scarlett2018 changed the title from "job startTime" to "FAQ - How to gather the job startTime info in PAI?" Jan 19, 2019
@scarlett2018 scarlett2018 changed the title from "FAQ - How to gather the job startTime info in PAI?" to "How to gather the job startTime info in PAI?" Jan 19, 2019
@scarlett2018
Member

scarlett2018 commented Jan 19, 2019

I'm closing the issue as the original question has been resolved. @JianfeiHu - please follow up with @yds05 or Chi on the nni job log issue.

@yqwang-ms
Member

yqwang-ms commented Aug 7, 2019

@JianfeiHu We have just refined the "time at which a job actually starts running" in frameworkcontroller (so it will be supported in pure K8S PAI soon).

For more details, please check:
microsoft/frameworkcontroller#35

For a job attempt:
WholeDuration = CompletionTime - StartTime
RunningDuration = CompletionTime - RunTime
WaitingDuration = WholeDuration - RunningDuration

For the whole job which may have multiple attempts:
WholeDuration = CompletionTime - StartTime
RunningDuration = Sum_all_attempts (CompletionTime - RunTime)
WaitingDuration = WholeDuration - RunningDuration
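
The formulas above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Attempt` and `Job` classes, their attribute names, and the helper `at` are hypothetical stand-ins for the StartTime, RunTime and CompletionTime values in the formulas, not frameworkcontroller's actual types, and the timestamps are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Attempt:
    """Hypothetical record holding the three timestamps of one job attempt."""
    start_time: datetime       # StartTime in the formulas
    run_time: datetime         # RunTime in the formulas
    completion_time: datetime  # CompletionTime in the formulas

    def whole_duration(self) -> timedelta:
        return self.completion_time - self.start_time

    def running_duration(self) -> timedelta:
        return self.completion_time - self.run_time

    def waiting_duration(self) -> timedelta:
        return self.whole_duration() - self.running_duration()


@dataclass
class Job:
    """Hypothetical record for a whole job that may have several attempts."""
    start_time: datetime
    completion_time: datetime
    attempts: List[Attempt]

    def whole_duration(self) -> timedelta:
        return self.completion_time - self.start_time

    def running_duration(self) -> timedelta:
        # Sum over all attempts of (CompletionTime - RunTime).
        return sum((a.running_duration() for a in self.attempts), timedelta())

    def waiting_duration(self) -> timedelta:
        return self.whole_duration() - self.running_duration()


def at(hour: int, minute: int) -> datetime:
    """Build a made-up timestamp on an arbitrary day."""
    return datetime(2019, 8, 7, hour, minute)


first = Attempt(at(10, 0), at(10, 5), at(10, 20))
second = Attempt(at(10, 21), at(10, 30), at(11, 0))
job = Job(start_time=at(10, 0), completion_time=at(11, 0), attempts=[first, second])

print(first.waiting_duration())  # waiting time of the first attempt
print(job.running_duration())    # running time summed over both attempts
print(job.waiting_duration())    # everything in the job's lifetime that was not running
```

With these definitions, the job-level WaitingDuration also absorbs any gap between one attempt completing and the next one starting, since it is derived from the job's whole duration rather than summed per attempt.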
