Nomad CSI plugin is started too often #9248
Btw, is there any way for me to kill those allocs now?
Hi @apollo13, sorry to hear you're having trouble. Any chance you can provide debug logs from the shutdown of those allocations? Or the allocation logs, so that we can see if the process is hanging during SIGTERM somehow. Once you've tried to grab those logs, …
@tgross could you tell me how to get those debug logs? I added the client logs in the initial comment, which at least show an error. I'll leave the alloc running till we have more info.
The alloc logs show:
Seems as if it just ignored the keyboard interrupt?
I just tried …
Ah, I see. Well, unfortunately those logs are at … But if nothing is currently logging, that won't help, as … By the way, I would strongly recommend running at …
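For anyone following along, the client agent settings in question are roughly the following; a minimal sketch, assuming a standalone HCL config file loaded by the agent, with `enable_debug` being what exposes the agent pprof endpoints mentioned further down:

```hcl
# Minimal sketch of Nomad client agent settings for troubleshooting.
# The file path is an assumption (any file the agent loads works).
# log_level raises agent verbosity; enable_debug exposes the
# /v1/agent/pprof endpoints. Not intended for production clusters.

log_level    = "DEBUG"
enable_debug = true
```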
That shows me it crashed when getting SIGINT, rather than exiting gracefully (also, not SIGTERM as I said incorrectly earlier). For some applications this would be fine (albeit messy in the logs), but I'm wondering if there are mounts getting bound into the allocation directory that aren't getting released. Nomad is showing the allocation in the "running" state, but it may be that it's stuck trying to run poststop hooks in the allocation runner or task runner. Is the Docker container for the plugin actually still running?
Hi Tim, sadly I don't have any debug logs enabled :) The CSI plugin is so simple that it actually worked on the first try, gg. The application seems to exit immediately with exit code 130 on SIGINT (just tested locally). The containers stopped properly. The only thing that still thinks the alloc is running is Nomad. Since I cannot use …
Any ideas? Otherwise I'll just kill the allocations (if that even works :D)
If you can't get that pprof endpoint information, then given this is a development environment, I'd suggest you send the Nomad client agent a …
Sorry, but I still think that the initial error:
might be something related to the source code (whatever "not found" means here). What I noticed while stopping the job, though, is that the mounts became broken for the other jobs. Since they are FUSE mounts, there seems to be at least some dependency on the container that mounted them. What is the recommended way to allow node plugin updates while existing mounts are in place?
That happens when the client node sends a fingerprint while the allocation running the plugin is terminal but the plugin has already been removed.
Nomad (and k8s, for that matter) only explicitly expects the CSI plugin to be available for communication during the various lifecycle events of the volume (e.g. publish/unpublish). I just double-checked and the CSI specification doesn't call it out, but if the volume disappears when the plugin process exits, that's not really going to work. I don't see any other examples in the driver list that jump out at me as doing this. If there's any possible way to get it so that the plugin process doesn't have to remain resident, that'll be what you want to do here. Edit: while I'm thinking of it, if you haven't already seen the Storage internals doc, it'll probably be helpful for your development process.
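For context, those lifecycle events are driven by allocations claiming a registered volume: roughly, publish when a claiming allocation starts and unpublish when it stops. A rough sketch of such a claim in a job spec follows, with the volume id, group, image, and mount path all being assumptions:

```hcl
# Sketch of a job group claiming a CSI volume (all names are placeholders).
# Starting/stopping allocations of this group is what triggers the
# publish/unpublish calls against the node plugin.
group "app" {
  volume "data" {
    type      = "csi"
    source    = "gluster-vol0"   # registered volume id (assumed)
    read_only = false
  }

  task "web" {
    driver = "docker"

    config {
      image = "busybox:1.32"
    }

    volume_mount {
      volume      = "data"
      destination = "/srv/data"
    }
  }
}
```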
Yeah, I just realized that my problem isn't the mount itself but more generally how glusterfs works:
This means that it has a process running that is bound to the lifetime of the node plugin container. I wonder how other drivers do this (or maybe their mounts work without leaving any processes behind?). On the other hand, it kinda makes sense that a …
Thanks, I have seen it and the plugin mostly works. I guess my main confusion is how to mount something so it "moves" out to the host and is independent of the container as soon as the mount command is finished.
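For later readers: a Nomad node plugin task is normally run privileged so that mounts it performs are visible outside its own mount namespace; a short sketch of the docker driver config, with the image name as a placeholder. As discussed further down, this doesn't change the FUSE situation, because the glusterfs mount still depends on the client daemon inside the container.

```hcl
# Sketch of a CSI node plugin task's docker driver config.
# privileged = true is generally required so mount operations made inside
# the container end up visible on the host. For a FUSE filesystem like
# glusterfs the mount still depends on the userspace daemon inside the
# container, so it disappears when the plugin container stops.
task "plugin" {
  driver = "docker"

  config {
    image      = "example/glusterfs-csi-plugin:latest" # placeholder image
    privileged = true
  }
}
```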
@tgross I managed to get debug logs during a shutdown which left the alloc running. Sadly they do not show much either:
Does this help in any way? I'll try to nuke the job from orbit and retry :)
Good news, I managed to get an operator debug log of the whole thing from all nodes in the cluster. Where can I send it (I'd rather not upload it here)? EDIT:// found the email addr, you've got mail :)
Commenting solely on … (if others also run into this issue):
The fact that mounts are destroyed during node plugin restarts is a known thing for mounts that happen via FUSE and similar; see ceph/ceph-csi#703 for details.
Yes, exactly that. Typically if you mount something on Unix it just stays mounted. This is why you can restart a Nomad client without destroying all the directories we bind-mount to tasks. And it's also why Nomad bugs like #7848 and #8814 exist: Nomad needs to explicitly make sure we clean up the mounts.
Yeah, that's unfortunate... I'm not convinced that CSI is the right use case for something like FUSE if they're talking about running a separate service on the host anyways -- you'd be better off running the whole thing on the host and avoiding all the lifetime management complexities of CSI. (Aside: this is my take on the CSI spec in general but the FUSE use case especially underscores it! 😁 )
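For comparison, the "run the whole thing on the host" route would mean mounting glusterfs on the host yourself (fstab/systemd) and exposing the mountpoint to jobs with a client host_volume instead of CSI; a minimal sketch, with the volume name and path as assumptions:

```hcl
# Sketch of the non-CSI alternative: glusterfs is mounted on the host by the
# operator, and Nomad just hands the existing mountpoint to tasks.
# Volume name and path below are assumptions.
client {
  host_volume "gluster-data" {
    path      = "/mnt/gluster"
    read_only = false
  }
}
```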
Great! Am I right in understanding this means this is a reproducible problem? I dug through these and there are a couple of things that jump out at me. You've got three allocations of interest here, but for purposes of what I'm showing below let's focus on the allocation status recorded for alloc 7e1b11e9-971c-f367-b4ab-d309aca6c9e6:

```json
{
"ID": "7e1b11e9-971c-f367-b4ab-d309aca6c9e6",
"EvalID": "c869b27a-d7d1-52e5-ab5c-413131d02c03",
"Name": "infra.storage.node[0]",
"Namespace": "default",
"NodeID": "8e100f4b-6d1b-ca4a-e8d6-b27a65ddac93",
"NodeName": "nomad01",
"JobID": "infra.storage",
"JobType": "system",
"JobVersion": 0,
"TaskGroup": "node",
"DesiredStatus": "run",
"DesiredDescription": "",
"ClientStatus": "running",
"ClientDescription": "Tasks are running",
"TaskStates": {
"plugin": {
"State": "running",
"Failed": false,
"Restarts": 0,
"LastRestart": "0001-01-01T00:00:00Z",
"StartedAt": "2020-11-03T10:35:33.468602911Z",
"FinishedAt": "0001-01-01T00:00:00Z",
"Events": [
{
"Type": "Received",
"Time": 1604399732979323400,
"DisplayMessage": "Task received by client",
"Details": {},
"Message": "",
"FailsTask": false,
"RestartReason": "",
"SetupError": "",
"DriverError": "",
"DriverMessage": "",
"ExitCode": 0,
"Signal": 0,
"KillReason": "",
"KillTimeout": 0,
"KillError": "",
"StartDelay": 0,
"DownloadError": "",
"ValidationError": "",
"DiskLimit": 0,
"DiskSize": 0,
"FailedSibling": "",
"VaultError": "",
"TaskSignalReason": "",
"TaskSignal": "",
"GenericSource": ""
},
{
"Type": "Task Setup",
"Time": 1604399732981774600,
"DisplayMessage": "Building Task Directory",
"Details": {
"message": "Building Task Directory"
},
"Message": "Building Task Directory",
"FailsTask": false,
"RestartReason": "",
"SetupError": "",
"DriverError": "",
"DriverMessage": "",
"ExitCode": 0,
"Signal": 0,
"KillReason": "",
"KillTimeout": 0,
"KillError": "",
"StartDelay": 0,
"DownloadError": "",
"ValidationError": "",
"DiskLimit": 0,
"DiskSize": 0,
"FailedSibling": "",
"VaultError": "",
"TaskSignalReason": "",
"TaskSignal": "",
"GenericSource": ""
},
{
"Type": "Started",
"Time": 1604399733468598500,
"DisplayMessage": "Task started by client",
"Details": {},
"Message": "",
"FailsTask": false,
"RestartReason": "",
"SetupError": "",
"DriverError": "",
"DriverMessage": "",
"ExitCode": 0,
"Signal": 0,
"KillReason": "",
"KillTimeout": 0,
"KillError": "",
"StartDelay": 0,
"DownloadError": "",
"ValidationError": "",
"DiskLimit": 0,
"DiskSize": 0,
"FailedSibling": "",
"VaultError": "",
"TaskSignalReason": "",
"TaskSignal": "",
"GenericSource": ""
}
]
}
},
"DeploymentStatus": null,
"FollowupEvalID": "",
"RescheduleTracker": null,
"PreemptedAllocations": null,
"PreemptedByAllocation": "",
"CreateIndex": 272481,
"ModifyIndex": 272488,
"CreateTime": 1604399732937911000,
"ModifyTime": 1604399733556201200
}
```

That's just weird and probably not related at all to CSI. Maybe the snapshot of that status was taken before the allocation stopped, though? It might be useful to take a second snapshot right after the problem. Next, if we look at the logs for the shutdown of that allocation:
This mostly looks like an orderly shutdown of the plugin, except of course that the plugin delete fails. Maybe this is an issue of us counting the number of active plugins incorrectly? When you have all 3 running, what does …
So this is the status now (I have recreated the job):
After …:
Redoing the start again:
As you can see, the old allocs are still there in the job and not stopping… I'll see if I can get another (longer) operator dump today.
I am with you on that one, so how do I do that and tell Nomad about the plugin?
Yes, simply running … Edit:// Maybe it is specific to node-only plugins. You should be able to test with any CSI plugin, though, if you set its type to node-only. Nomad should not check the other stuff then, I think…
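For reference, registering a node-only plugin from the task is just a matter of the plugin type; a minimal sketch of the csi_plugin stanza inside the plugin task, with id and mount_dir as placeholders:

```hcl
# Sketch: csi_plugin stanza inside the plugin task. type = "node" tells Nomad
# this task only provides the node service, so no controller plugin is
# expected for the same plugin id. id and mount_dir are placeholders.
csi_plugin {
  id        = "gluster"
  type      = "node"
  mount_dir = "/csi"
}
```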
The important bit is that debug log around shutdown. But if there's no new information there, I'll have to go from what you've been able to give me (thanks again for that!). There's something happening here that isn't in the logs, and I want to see if I can replicate it without your custom plugin.
You can't; that's a limitation of CSI itself, unfortunately. I have some thoughts on how we might be able to provide a better solution (which would violate the CSI spec), but for the moment there's no workaround for that.
If you think access to the plugin would help, I can upload the Docker container; it should only need access to an (unauthenticated) NFS server.
I started reading the CSI spec yesterday, so excuse the question, but why does CSI limit that? Couldn't you just say:
in the Nomad client config, and Nomad could match that up to controllers using the …
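To make that proposal concrete, it would look something like the following; this is purely hypothetical syntax that Nomad does not actually support, sketching a client pointing at an externally managed plugin socket:

```hcl
# HYPOTHETICAL sketch of the proposal above -- NOT valid Nomad configuration.
# The idea: declare a node plugin whose process is managed outside of Nomad
# (e.g. by systemd), identified by its gRPC socket, so plugin restarts don't
# tear down existing mounts.
client {
  # hypothetical stanza and field names
  external_csi_node_plugin "gluster" {
    socket_path = "/var/lib/gluster-csi/csi.sock"
  }
}
```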
@tgross Have you been able to reproduce the issue? Can I help with anything to make sure it ends up fixed in 1.0?
I've been tied up in the recent CVE response, so not yet. I'm working on CSI stuff now, though, so I'm hoping to have some answers for you soon.
I'll try to get a build running to verify. Thanks!
Ok, I haven't managed to test master yet :/
@tgross Good news, I finally managed to test this (and all the other CSI issues you fixed) against the current master (and also compared with the beta), and it seems that #9438 does fix this. I do not think that the other two issues are the cause here (in this particular case). As such I am closing this issue -- thank you for your help!
So glad to hear! 😁
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. |
Nomad version
0.12.5
Operating system and Environment details
Debian 10
Issue
I am currently developing my own CSI plugin. After pushing a new version of the plugin, Nomad seems to be somewhat confused (I have three nodes):
As you can see, version 1 is supposed to be stopped but it continues running (beats me why). The alloc log says:
Nomad Client logs (if appropriate)
Did I forget to implement something? scratches head