0.2.2 borked? #573
@pires I am not able to reproduce the issue. Can you please provide some more details about your environment, configuration, etc.? The job definition had an incorrect check definition, which shouldn't have resulted in a crash, so I will need more information to debug this. The updated job definition is -
Note that when you are running a system job, there is no need to use the
The Nomad client is failing to create the task alloc dir. It would be helpful to have the whole stack trace: there's an error check at https://github.com/hashicorp/nomad/blob/master/client/allocdir/alloc_dir.go#L62 , but it's not clear from the stack trace why the panic occurred.
I just tried with master + the lxc driver, and it's working fine for me. Comments were specific to @pires's issue. I don't think this issue is driver specific.
@diptanu first of all, I set job type as Now, I'm using CoreOS to provide 3 servers and 1 client.
log_level = "INFO"
data_dir = "/var/lib/nomad/data"
bind_addr = "$private_ipv4"
server {
  enabled = true
  bootstrap_expect = 3
}

log_level = "DEBUG"
data_dir = "/var/lib/nomad/data"
bind_addr = "$private_ipv4"
client {
  enabled = true
  options = {
    consul.address = "$private_ipv4"
  }
}

Servers
Client logs on first start:
Client logs after
Client logs after
Client logs after
@pires First of all, sorry: somehow I read that the scheduler was system and not service! And secondly, thanks for the detailed logs. I think I know what's going on here. The reason this is failing is because -
We merged a PR related to how we find users, to get rid of cgo dependencies. @dadgar You might want to take a look at this.
@pires Looks like we were making a syscall to get the user
@diptanu I can confirm there is no such user. Note, though, that the client starts fine at first and becomes ready, and only crashes on a restart that happens after the allocation has failed.
@pires I think the client crashes on restart because it's trying to restore some state regarding the allocdirs, but it is not finding them. This is probably an unrelated bug, which needs to be fixed too. I think the main issue here is that Nomad isn't able to find the @dadgar Since Nomad relies on that user, we shouldn't mark that node as ready if it can't find it. Also, I am wondering whether that user could be made configurable.
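To illustrate the idea of checking for the user up front without cgo, here is a minimal sketch in Go. It is not Nomad's code: `findUser`, `passwdEntry`, and the user name `taskuser` are hypothetical, and it assumes the user is defined in a local passwd-format file such as `/etc/passwd` (users that only exist in LDAP or another NSS backend would not be found this way).

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// passwdEntry holds the passwd fields this sketch cares about.
// Hypothetical type, not part of Nomad.
type passwdEntry struct {
	Name string
	UID  int
	GID  int
}

// findUser scans passwd-formatted content
// (name:password:uid:gid:gecos:home:shell) and returns the entry for
// name. The boolean is false when no well-formed entry matches.
func findUser(passwd, name string) (passwdEntry, bool) {
	for _, line := range strings.Split(passwd, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and comments
		}
		fields := strings.Split(line, ":")
		if len(fields) < 4 || fields[0] != name {
			continue
		}
		uid, errUID := strconv.Atoi(fields[2])
		gid, errGID := strconv.Atoi(fields[3])
		if errUID != nil || errGID != nil {
			continue // malformed numeric fields: treat as no match
		}
		return passwdEntry{Name: name, UID: uid, GID: gid}, true
	}
	return passwdEntry{}, false
}

func main() {
	// Assumption: the user name is illustrative only.
	const taskUser = "taskuser"

	data, err := os.ReadFile("/etc/passwd")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read passwd database:", err)
		os.Exit(1)
	}

	if u, ok := findUser(string(data), taskUser); ok {
		fmt.Printf("user %q found (uid=%d gid=%d); node could be marked ready\n", u.Name, u.UID, u.GID)
	} else {
		fmt.Printf("user %q not found; node should not be marked ready\n", taskUser)
	}
}
```

Running a check like this at client startup would surface the missing user before any allocation is attempted, rather than at task start.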
@diptanu I was wrong, the
@pires So the user I will work on a PR which will use We should be able to release a 0.2.3-rc-1 soon. I will update this thread once we do; please let us know whether that resolves your issue.
I'll wait on 0.2.3-rc1 then. Adding the line to
The following service runs with 0.2.1 but fails with 0.2.2 with no reason to be found:
Also, restarting 0.2.2 crashes:
The stack is just too long.