P0 - Job Name, UID and Description #3935
If job names can be duplicated, how would the rest server know exactly which job is meant when the user (or client) gives only a Job Name? It is more straightforward to refer to a job by a single identifier, the UID. The job name and description are then used only for display and filtering, and the rest server does not need to cache the mapping from UID to name. Example:
Explain: |
@debuggy See the example: when a name is used, the rest server only checks it against active jobs, i.e. via the k8s APIServer. |
@yqwang-ms Distinguishing active jobs from history jobs increases complexity and brings no benefit. It is better to keep this difference inside the rest server and not expose it to the client. In this design, how does a user get a history job by querying the API server? |
See
|
@yqwang-ms So currently my idea supports the second endpoint
As for the first one
By the way, is there any significant difference between active jobs and history jobs? |
active jobs: jobs in k8s apiserver |
I am inclined to proposal-1, as it at least keeps the job name unchanged. I think we can keep the original solution of using the job name to achieve idempotent job submission, and introduce UIDs just for job history. Job names will not be unique in job history, so the UID will be used there. So based on proposal-1, we will not change the old APIs, but add some enhancements:
With these minor modifications to proposal-1, we stay fully backward compatible while supporting history server job queries. |
So, in this approach the UID is generated by RestServer instead of K8S. So far, we have 3 proposals: |
Another solution:
Pros:
|
@abuccts to help check:
UID related:
|
setup: 6 nodes, 1000 frameworks
there is a cache in apiserver, but the first query with a label selector may be expensive.
there's no such need yet, but we could also use the uid as the pod's hostname in DNS if needed?
one shortcoming is that multiple rest servers for one cluster cannot avoid duplicate job names without persistent storage
better to generate uid in rest server
|
Thanks @abuccts, it seems 1000 frameworks is too few to tell iteration apart from lookup. Could you please also test 30k frameworks (assuming 1k frameworks per day)? You can also check the code to double-check the time complexity of label queries. |
@abuccts, please also run a stress test, e.g. measure the performance with 300k active jobs. |
Thanks @abuccts
Cache maintenance is not easy and will make RestServer stateful: you need to make sure it stays consistent with ApiServer at all times. Let's not try this in the first stage.
Even with persistent storage, these multiple rest servers need to sync with each other, or each handle only its own naming partition/space. So we may lose rest server scalability.
So, there is still a breaking naming change, e.g. - and _ cannot be the first or last character.
The UID needed in the priority class and secret is the K8S-generated framework UID; a RestServer-generated UID cannot be used in its place.
So, it seems that during a K8S POST request, even if K8S generates a conflicting random string, it will not try to generate another one, but just returns 404. |
for 30k frameworks, apiserver times out (requests exceed 1m and return 504) for several hours after the frameworks are created; here are the results after 15 hours:
it takes a lot of time to create 300k frameworks, and etcd times out frequently during creation:
|
So, Proposal-4 seems to have the following cons:
2. Label lookup is 20x slower than name lookup at large scale
3. Labels still have the 63-character length limit
For the API, we need to take idempotence seriously and guarantee it, instead of achieving it only on a best-effort basis. @fanyangCS @abuccts |
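The label constraints raised in this thread (the 63-character limit, and - or _ not being allowed as the first or last character) can be checked before a job is submitted. Below is a minimal sketch based on the documented Kubernetes label-value syntax; the function name `is_valid_label_value` is hypothetical and not part of RestServer.

```python
import re

# Kubernetes label value syntax: either empty, or a string that begins and ends
# with an alphanumeric character, with '-', '_', '.' or alphanumerics in between.
_LABEL_VALUE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")
MAX_LABEL_LENGTH = 63  # maximum length of a single label value


def is_valid_label_value(value: str) -> bool:
    """Return True if `value` could be stored as a Kubernetes label value."""
    if len(value) > MAX_LABEL_LENGTH:
        return False
    # fullmatch ensures the whole string obeys the syntax, not just a prefix.
    return _LABEL_VALUE.fullmatch(value) is not None
```

A job name such as `-my-job` or `my_job_` would be rejected by this check, which is the breaking naming change mentioned earlier.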
Discussed offline; here is the agreement: we still go with Proposal-4, with some adjustments:
We will try to use MD5(UserName + JobName) as the FrameworkName to achieve near-idempotent submission.
The FrameworkName can be computed with MD5, so there is no need to search by label anymore.
We will try to split (UserName + JobName) across multiple labels to get around this limitation. The UID is the FrameworkUID, which is generated by K8S. TBD: Planning: |
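The MD5-based FrameworkName agreed on above could be derived roughly as follows. This is only a sketch: the function name and the `~` separator between user name and job name are assumptions (the thread only says MD5(UserName + JobName)), and the actual RestServer implementation may differ.

```python
import hashlib


def framework_name(user_name: str, job_name: str) -> str:
    """Derive a deterministic framework name from the user name and job name."""
    # Assumption: a separator keeps ("ab", "c") and ("a", "bc") from hashing
    # to the same value; the thread itself does not specify one.
    key = f"{user_name}~{job_name}"
    # The hex digest is 32 lowercase hex characters, well under the 63 limit.
    return hashlib.md5(key.encode("utf-8")).hexdigest()
```

Because the result depends only on the inputs, resubmitting the same (user, job name) pair maps to the same framework name, so the name alone is enough to look the framework up without a label query.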
discussed offline and decided to use labels:
|
regarding the length limit of the job name, I think we can keep the limit at 63 for now. if there is a further requirement, can we extend it beyond 63 by using multiple labels? |
Update job name encoding method, use md5 hash instead. Query job by k8s label selector. Resolve job name related issues in #3935.
yes, it's possible to split a long job name across multiple labels, or we could switch to annotations if we no longer need labels to query legacy jobs (base32 encoded) in the future. |
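Splitting a long job name across multiple labels, as discussed above, might look like the sketch below. The label key prefix `jobName` and both function names are hypothetical; this is an illustration of the idea, not the adopted design.

```python
MAX_LABEL_LENGTH = 63  # Kubernetes limit on a single label value


def split_job_name(job_name: str, key_prefix: str = "jobName") -> dict:
    """Split a job name into numbered label values of at most 63 characters."""
    chunks = [
        job_name[i:i + MAX_LABEL_LENGTH]
        for i in range(0, len(job_name), MAX_LABEL_LENGTH)
    ]
    return {f"{key_prefix}-{index}": chunk for index, chunk in enumerate(chunks)}


def join_job_name(labels: dict, key_prefix: str = "jobName") -> str:
    """Rebuild the job name from the labels written by split_job_name."""
    parts = []
    index = 0
    # Read chunks in order until the next numbered key is missing.
    while f"{key_prefix}-{index}" in labels:
        parts.append(labels[f"{key_prefix}-{index}"])
        index += 1
    return "".join(parts)
```

One caveat with this naive split: a chunk boundary can leave a value that starts or ends with -, _ or ., which is not a valid label value, so a real implementation would need to encode the chunks or adjust the boundaries.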
* Update job name encoding method: use md5 hash instead. Query job by k8s label selector. Resolve job name related issues in #3935.
* Drop legacy jobs compatibility.
|
done. |
If RestServer uses Job UID instead of Job Name as Job Key to serve query:
Pros:
Cons:
Proposal-1:
Job Name to submit idempotently,
Job UID to query uniquely,
Job Description to attach metadata arbitrarily.
UID generated by K8S
Add a new field in PAI Job Spec called description, which can be any string of any reasonable length (<10k); RestServer stores it in a k8s framework annotation.
If the user specifies a job name (they want idempotence), RestServer uses this job name as the k8s framework name to submit, but still uses the k8s framework UID as the job key to serve queries (it may still use the name to serve active job queries).
If the user does not specify a job name (they do not care about idempotence, like Aether), RestServer submits with an empty k8s framework name (k8s auto-generates it if metadata.generateName is set), and uses the k8s framework UID as the job key to serve queries (it may still use the name to serve active job queries).
Example:
active jobs: jobs in k8s apiserver
history jobs: jobs only in elasticsearch
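The two submission paths in Proposal-1 can be sketched as a function that builds the framework metadata. Only `name`, `generateName` and `annotations` are real Kubernetes metadata fields; the annotation key, the `job-` prefix, the function name, and the reading of "<10k" as a character count are assumptions made for illustration.

```python
from typing import Optional

DESCRIPTION_ANNOTATION = "description"  # hypothetical annotation key
MAX_DESCRIPTION_LENGTH = 10_000  # assumed reading of the "<10k" bound


def build_framework_metadata(job_name: Optional[str], description: str = "",
                             name_prefix: str = "job-") -> dict:
    """Build k8s metadata for a framework, following the two Proposal-1 cases."""
    if len(description) >= MAX_DESCRIPTION_LENGTH:
        raise ValueError("description must be shorter than 10k characters")
    metadata = {"annotations": {DESCRIPTION_ANNOTATION: description}}
    if job_name:
        # The user wants idempotence: submit under the given name, so a
        # duplicate submission conflicts with the existing framework.
        metadata["name"] = job_name
    else:
        # The user does not care about idempotence: let k8s generate the name.
        metadata["generateName"] = name_prefix
    return metadata
```

In both cases the framework UID assigned by k8s after creation would be what RestServer returns as the job key for later queries.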
TBD:
Proposal-2:
Job UID to submit idempotently and query uniquely,
Job Description to attach metadata arbitrarily.
UID generated by client
Assume the RestServer client (WebPortal/SDK/RawHttpClient) always generates a unique UID to serve as the current PAI Job Name.
Or RestServer always also checks the history server for conflicts with the current PAI Job Name.
Pros:
In this way, we can merge the concepts JOB_NAME and JOB_UID from Proposal-1 into a single concept: JOB_UID. Furthermore, RestServer does not need to change much; for example, it does not need to store the mapping from JOB_UID to JOB_NAME. So this proposal is simpler and smoother.
Cons:
Example:
Proposal-3:
UID generated by RestServer.
Based on Proposal-1, but the UID is generated by RestServer instead of K8S. RestServer will use it as the k8s framework name to submit if the user does not specify a job name.
#3935 (comment)
Proposal-4:
Based on Proposal-1, but
Job Name to submit idempotently and attach metadata arbitrarily,
Job UID to query uniquely.
UID generated by RestServer or K8S.
#3935 (comment)
Cons are summarized at #3935 (comment)