[BUG] Panic on reconnect #192

Closed
toni-moreno opened this issue Mar 18, 2020 · 14 comments
Labels
bug Something isn't working

Comments

@toni-moreno

Hello, and first of all, thank you for this great project.

I'm using mangos with the REQ (master) / REP (agents) protocol: one master listens for connections and distributes jobs, and N agents connect to the master and do the distributed work.

I built a PoC, as I mentioned in #189, and everything works fine there.

But with real data the master panics after agents have been restarted and try to reconnect.

[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xb81f89]

goroutine 39 [running]:
go.nanomsg.org/mangos/v3.(*Message).Dup(0x0, 0x419ff6)
	/home/vant/go/pkg/mod/go.nanomsg.org/mangos/v3@v3.0.0/message.go:157 +0x29
go.nanomsg.org/mangos/v3/protocol/req.(*socket).send(0xc0001f0240)
	/home/vant/go/pkg/mod/go.nanomsg.org/mangos/v3@v3.0.0/protocol/req/req.go:89 +0x13d
go.nanomsg.org/mangos/v3/protocol/req.(*socket).AddPipe(0xc0001f0240, 0x1004300, 0xc0000974a0, 0x0, 0x0)
	/home/vant/go/pkg/mod/go.nanomsg.org/mangos/v3@v3.0.0/protocol/req/req.go:489 +0x13a
go.nanomsg.org/mangos/v3/internal/core.(*socket).addPipe(0xc00032a000, 0x7f82e205bea0, 0xc000096960, 0x0, 0xc000290100)
	/home/vant/go/pkg/mod/go.nanomsg.org/mangos/v3@v3.0.0/internal/core/socket.go:78 +0x176
go.nanomsg.org/mangos/v3/internal/core.(*listener).serve(0xc000290100)
	/home/vant/go/pkg/mod/go.nanomsg.org/mangos/v3@v3.0.0/internal/core/listener.go:59 +0xfe
created by go.nanomsg.org/mangos/v3/internal/core.(*listener).Listen
	/home/vant/go/pkg/mod/go.nanomsg.org/mangos/v3@v3.0.0/internal/core/listener.go:92 +0x1aa
[Bra] 03-18 18:43:53 [ WARN] Fail to execute command: ./bin/synthetix-manager [] - exit status 2

Is there any workaround to avoid this error?

@gdamore
Contributor

gdamore commented Mar 18, 2020

I can't advise an easy workaround. It looks at first blush like a bug in the req code that is calling Dup() on a nil message. We could have Dup() check for nil, which might be one approach, but it isn't a real fix. I'll try to get to this soon, but it might be a day or two as I'm fairly buried in work on other projects for $dayjob.
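
To illustrate the "have Dup() check for nil" idea mentioned above, here is a minimal sketch. The `message` type and its `dup` method are hypothetical stand-ins written for this example, not the mangos `Message` type or its actual `Dup()` implementation. It only shows that a Go method with a pointer receiver can guard against a nil receiver instead of dereferencing it. As noted above, this hides the symptom rather than explaining why the message is nil.

```go
package main

import "fmt"

// message is a hypothetical stand-in for a reference-counted message.
// It is NOT the mangos Message type.
type message struct {
	body []byte
	refs int
}

// dup returns the same message with its reference count incremented.
// The nil check makes it safe to call on a nil receiver.
func (m *message) dup() *message {
	if m == nil {
		return nil
	}
	m.refs++
	return m
}

func main() {
	var pending *message // no request outstanding

	// Calling a pointer-receiver method on nil is legal in Go;
	// the guard avoids the nil dereference inside the method.
	fmt.Println(pending.dup() == nil)

	m := &message{body: []byte("job"), refs: 1}
	fmt.Println(m.dup().refs)
}
```

The caller still has to handle the nil result, which is why a guard like this is closer to a band-aid than a real fix.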

gdamore added the bug label Mar 18, 2020
@toni-moreno
Author

OK @gdamore, let me test the fix when it's done. And thank you for your fast response!

@gdamore
Contributor

gdamore commented Mar 19, 2020

I see you are using v3.0.0. However, I've made changes since then, which may address this. Stay tuned.

@gdamore
Contributor

gdamore commented Mar 19, 2020

Yeah, I don't think that would have fixed it. I'm still trying to determine how the circumstance arises that the reqMsg is nil at this time.

gdamore added a commit that referenced this issue Mar 19, 2020
@gdamore
Contributor

gdamore commented Mar 19, 2020

Please have a look at branch ged/bug192 -- I'm not sure if it will solve it for you, but I think it might. I'd at least like to know if it doesn't.

@toni-moreno
Author

Sorry @gdamore, something went wrong when I tried to fetch this specific branch.

vant@vant-N2x0WU:~/$ go get go.nanomsg.org/mangos@ged/bug192
go: finding go.nanomsg.org ged/bug192
go: finding go.nanomsg.org/mangos ged/bug192
go get go.nanomsg.org/mangos@ged/bug192: go.nanomsg.org/mangos@ged/bug192: invalid version: version "ged/bug192" invalid: disallowed version string

How can I add this branch version to my Go modules?

@toni-moreno
Author

I get a similar error when fetching from GitHub.

vant@vant-N2x0WU:~/$ go get github.com/nanomsg/mangos@ged/bug192
go: finding github.com ged/bug192
go: finding github.com/nanomsg/mangos ged/bug192
go: finding github.com/nanomsg ged/bug192
go get github.com/nanomsg/mangos@ged/bug192: github.com/nanomsg/mangos@ged/bug192: invalid version: version "ged/bug192" invalid: disallowed version string

@toni-moreno
Author

Using the commit ID seems to work.

vant@vant-N2x0WU:~/$ go get go.nanomsg.org/mangos/v3@6e7dae87a98a5776c8e80268399d5a1e51588ae9
go: finding go.nanomsg.org/mangos/v3 6e7dae87a98a5776c8e80268399d5a1e51588ae9
go: downloading go.nanomsg.org/mangos/v3 v3.0.1-0.20200319045234-6e7dae87a98a
go: extracting go.nanomsg.org/mangos/v3 v3.0.1-0.20200319045234-6e7dae87a98a

I will rebuild and test with this version and give you feedback.
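
For anyone hitting the same "disallowed version string" error: it appears to come from the slash in the branch name ged/bug192, which the go command used here rejected as a version query, while a full commit hash is accepted. After the `go get` above, the go.mod should contain a requirement along these lines. This is a sketch that assumes the module is required directly, and it uses the pseudo-version printed in the output above:

```text
require go.nanomsg.org/mangos/v3 v3.0.1-0.20200319045234-6e7dae87a98a
```

Note also that the commands that failed used the module path without the /v3 suffix, whereas the one that worked used go.nanomsg.org/mangos/v3.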

@toni-moreno
Author

toni-moreno commented Mar 19, 2020

Sorry, the patch does not seem to be the solution. The master panics again when I press Ctrl+C on an agent that is connected to the master and waiting for new jobs.

time="2020-03-19 11:19:15" level=info msg="cron: job processed in [21.364734361s]"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x78 pc=0xb8b3d1]

goroutine 114 [running]:
go.nanomsg.org/mangos/v3.(*Message).Clone(...)
	/home/vant/go/pkg/mod/go.nanomsg.org/mangos/v3@v3.0.1-0.20200319045234-6e7dae87a98a/message.go:129
go.nanomsg.org/mangos/v3/protocol/req.(*socket).send(0xc00025b980)
	/home/vant/go/pkg/mod/go.nanomsg.org/mangos/v3@v3.0.1-0.20200319045234-6e7dae87a98a/protocol/req/req.go:88 +0x131
go.nanomsg.org/mangos/v3/protocol/req.(*context).resendMessage(0xc000166070)
	/home/vant/go/pkg/mod/go.nanomsg.org/mangos/v3@v3.0.1-0.20200319045234-6e7dae87a98a/protocol/req/req.go:171 +0xe8
created by go.nanomsg.org/mangos/v3/protocol/req.(*socket).RemovePipe
	/home/vant/go/pkg/mod/go.nanomsg.org/mangos/v3@v3.0.1-0.20200319045234-6e7dae87a98a/protocol/req/req.go:503 +0x30c
[Bra] 03-19 11:19:30 [ WARN] Fail to execute command: ./bin/synthetix-manager [] - exit status 2

Is there any other test we can do?

@toni-moreno
Author

Hello @gdamore, after a lot of work on this panic, we found that it only happens when the master has sent a message and is waiting for a response from the agents, and one of them is killed (Ctrl+C).

If no message is awaiting a response, the panic doesn't happen.

The agent that triggers the panic could not execute socket.Close() before it was killed.
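
To make the scenario concrete, here is a small self-contained model of it. Every name in it (`requester`, `request`, `reply`, `peerLost`, `sendLocked`) is hypothetical and invented for this sketch; it is not the mangos req implementation and not necessarily how the bug was fixed. It only illustrates the general pattern the traces point at: when a peer goes away while a request is outstanding, the request is resent, and the resend path should check under a lock that there still is an outstanding request before copying it.

```go
package main

import (
	"fmt"
	"sync"
)

// requester is a hypothetical model of a REQ-style socket.
// None of these names come from mangos.
type requester struct {
	mu      sync.Mutex
	pending []byte   // outstanding request body; nil when nothing is in flight
	sent    [][]byte // record of what was sent or resent, for demonstration
}

// request stores body as the outstanding request and sends it.
func (r *requester) request(body []byte) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.pending = body
	r.sendLocked()
}

// reply marks the outstanding request as answered.
func (r *requester) reply() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.pending = nil
}

// peerLost models a peer disconnecting (for example, an agent killed
// with Ctrl+C). It resends the outstanding request, if any.
func (r *requester) peerLost() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.sendLocked()
}

// sendLocked must be called with r.mu held. It checks that a request
// is still outstanding before copying it, so a late call is a no-op.
func (r *requester) sendLocked() {
	if r.pending == nil {
		return
	}
	cp := append([]byte(nil), r.pending...)
	r.sent = append(r.sent, cp)
}

func main() {
	r := &requester{}
	r.request([]byte("job-1"))
	r.peerLost() // request in flight: it is resent
	r.reply()
	r.peerLost() // nothing in flight: no-op instead of a nil dereference
	fmt.Println(len(r.sent))
}
```

In this model the first `peerLost` call resends the job and the second one does nothing, which matches the observation that the panic only shows up while a request is awaiting a response.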

@gdamore
Contributor

gdamore commented Mar 20, 2020 via email

@gdamore
Contributor

gdamore commented Mar 22, 2020

I think I've pushed an update to that branch that should fix it. Please test again (commit is c4b7a01) and let me know.

@toni-moreno
Author

Hello @gdamore. Good news!

It seems to be working fine with one agent running and restarting!

Please let me keep this issue open for a few days so I can test with more than one agent at once.

Thank you very much!

@gdamore
Contributor

gdamore commented Mar 24, 2020

I'm going to go ahead and merge this now.
