Invalid node IDs in workflow.status #10107
Comments
I have the same error :)
Node IDs are opaque identifiers, not pod names. Not a bug.
Hi @alexec, thank you for your message. I didn't read this issue properly. He wrote about the node ID.
Ah. That is a bug. Can you raise a separate issue?
Closed per @alexec's comment:
@alexec We have been relying on the node ID being equal to the pod name for the past few years, until we recently tried to upgrade from v3.3.8 to v3.4.7 and hit this issue. So this is a breaking change at minimum. Is there any reason why we can't or shouldn't use the pod name as the node ID? Also, if I'm not mistaken, workflow.status only contains the pod name in failure cases. If the node ID differs from the pod name, how can we get the pod name in non-failure cases? Thanks in advance!
+1 on what @mweibel said. This describes the issue precisely. This regression is blocking us from upgrading to v3.4.
Let's reopen this, since it blocks the upgrade for KFP.
@chensun To unblock your upgrade, you can construct/infer the podName from the nodeId:

```
nodeId: coinflip-tmrzc-3636679105
templateName: flip-coin
podName: coinflip-tmrzc-flip-coin-3636679105
```

This is basically how we get podName in the UI:

```ts
export const getPodName = (workflowName: string, nodeName: string, templateName: string, nodeID: string, version: string): string => {
    // v2 naming applies only when the node has a template name.
    if (version === POD_NAME_V2 && templateName !== '') {
        // The root node's pod is named after the workflow itself.
        if (workflowName === nodeName) {
            return workflowName;
        }
        const prefix = ensurePodNamePrefixLength(`${workflowName}-${templateName}`);
        const hash = createFNVHash(nodeName);
        return `${prefix}-${hash}`;
    }
    // v1 naming (or no template name): the pod name is the node ID.
    return nodeID;
};
```

And here's the podName implementation for the failure case:

```go
func GeneratePodName(workflowName, nodeName, templateName, nodeID string, version PodNameVersion) string {
	// v1 naming: the pod name is the node ID.
	if version == PodNameV1 {
		return nodeID
	}
	// The root node's pod is named after the workflow itself.
	if workflowName == nodeName {
		return workflowName
	}
	// v2 naming: "<workflow>-<template>-<hash>", except for inline nodes,
	// which use only the workflow name as the prefix.
	prefix := workflowName
	if !strings.Contains(nodeName, ".inline") {
		prefix = fmt.Sprintf("%s-%s", workflowName, templateName)
	}
	prefix = ensurePodNamePrefixLength(prefix)
	// The suffix is the 32-bit FNV-1a hash of the node name.
	h := fnv.New32a()
	_, _ = h.Write([]byte(nodeName))
	return fmt.Sprintf("%s-%v", prefix, h.Sum32())
}
```
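The workaround above works because, in the `coinflip-tmrzc` example, the node ID and the v2 pod name share the same numeric hash suffix. Below is a minimal Go sketch of that inference. It assumes the node ID has the form `<workflowName>-<hash>`. It ignores the `.inline` case and the prefix truncation done by `ensurePodNamePrefixLength`. `podNameFromNodeID` is a hypothetical helper, not an Argo API.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// podNameFromNodeID is a hypothetical helper (not an Argo API). It assumes a
// node ID of the form "<workflowName>-<hash>" and rebuilds the v2 pod name
// "<workflowName>-<templateName>-<hash>". It does not handle ".inline" nodes
// or the prefix truncation done by ensurePodNamePrefixLength.
func podNameFromNodeID(workflowName, templateName, nodeID string) (string, error) {
	// Assumption: the root node's ID is the workflow name, and so is its pod name.
	if nodeID == workflowName {
		return workflowName, nil
	}
	prefix := workflowName + "-"
	if !strings.HasPrefix(nodeID, prefix) {
		return "", fmt.Errorf("node ID %q does not start with %q", nodeID, prefix)
	}
	hash := strings.TrimPrefix(nodeID, prefix)
	// The suffix is expected to be the decimal form of a 32-bit hash.
	if _, err := strconv.ParseUint(hash, 10, 32); err != nil {
		return "", fmt.Errorf("node ID %q has no numeric hash suffix: %w", nodeID, err)
	}
	return fmt.Sprintf("%s-%s-%s", workflowName, templateName, hash), nil
}

func main() {
	// Values from the coinflip example above.
	podName, err := podNameFromNodeID("coinflip-tmrzc", "flip-coin", "coinflip-tmrzc-3636679105")
	if err != nil {
		panic(err)
	}
	fmt.Println(podName) // coinflip-tmrzc-flip-coin-3636679105
}
```

This depends entirely on naming internals, so treat it as a stopgap rather than a stable contract.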
I don't think you should rely on
Instead, it might be a good enhancement proposal to also add podName to node statuses for non-failure cases.
@terrytangyuan Thank you for your reply.
This is rather hacky and liable to break with future upgrades. The way Argo constructs pod names is an implementation detail that no user should ever have to reimplement. In fact, even within Argo's codebase, there should be a single source of truth (SOT) for pod name generation. There has been a chain of bugs following the v2 pod naming change. Please consider #10267.
In fact, there's possibly an open bug in the code you pasted. See #11015 and the proposed fix. This again points to the need for a single SOT for pod names.
Can anyone explain why node IDs cannot be the same as pod names (which was the case before v3.4)? Node IDs can be arbitrary UUIDs if you want, but being able to know the k8s pod name, whether or not the node failed, is a super important feature. The missing pod name is a regression and an undocumented breaking change, which should be avoided in minor version bumps.
Yes, I agree it's hacky and should only be a temporary workaround to unblock your release. cc @isubasinghe @JPZ13 @juliev0, who contributed the feature and related fixes. Any thoughts?
Hey Terry. I'm not sure exactly what the question is, but I believe JP did the original work for updating to POD NAMES V2 and has been looking at many of the related bugs since then. (That said, the very first issue I worked on was switching to V2 as the default, and I later fixed an issue related to this. I was never actually told the motivation for the change.)
I don't recall being involved in that conversation either. Perhaps @alexec or @sarabala1979 have more context on the motivation behind the change?
Chiming in - the motivation for switching from node ids to pod names with the workflow template in the pod name was so that a user who is looking through their individual pods using
I'll confirm in the codebase, but if you don't want the pod names to include the workflow template and you need the pod names to just be the node ID, you can set the
Following up, the code path for v1 pod names (node IDs) still exists: https://github.com/argoproj/argo-workflows/blob/master/workflow/util/pod_name.go#L38
@JPZ13 Thanks for your reply. I can see the value of adding more info to the pod name to make it more debug-friendly. I wasn't questioning the change to the pod name itself, but asking why the node ID can't follow the same scheme. I can try the
I very much agree with #10267.
We use Argo heavily at ZG and when upgrading to related to #10267
Agreed. If pod names change with v2, we can't get the pod name from the status. It's really inconvenient.
Pre-requisites
:latest
What happened/what you expected to happen?
Since #8748, pod names use the V2 naming scheme, which includes the template name in the pod name. However, the implementation did not update the Workflow.Status.Nodes map accordingly, so it no longer contains the correct pod name. There is now a disconnect between node IDs and pod names that did not exist before. This makes it impossible to look at an Argo workflow status, take the node ID, and use it to find the pod it belongs to.
This is a follow-up to #9906. I initially thought this was the same case, but apparently it is not.
The workflow below triggers the following:
The pod that runs is named `nodename-bvd45-blender-3928099255`, but the node ID in `workflow.status.nodes` is just `nodename-bvd45-3928099255`.
Version
latest
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
Logs from the workflow controller
Logs from your workflow's wait container