Function clause error in Swarm.IntervalTreeClock.fill
on topology change
#126
Comments
I have this issue as well but on 3.3.1
My configuration
We are also having this issue on 3.3.1. Probably related: we experienced the following crash right before the Swarm.IntervalTreeClock.fill error.
{
{
{:badmatch, false},
[
{Swarm.Registry, :new!, 1, [file: 'lib/swarm/registry.ex', line: 99]},
{Swarm.Tracker, :handle_replica_event, 4, [file: 'lib/swarm/tracker/tracker.ex', line: 814]},
{:gen_statem, :call_state_function, 5, [file: 'gen_statem.erl', line: 1240]},
{:gen_statem, :loop_event, 6, [file: 'gen_statem.erl', line: 1012]},
{:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 247]}]},
{:gen_statem, :call, [
Swarm.Tracker, {:track, "<our_id>",
%{mfa: {Cadex.Swarm.Supervisor, :start_event_processor, ["PH06407805_2019"]}}}, :infinity]}
}
Seems like Registry tries to insert a record into ETS with a duplicate key? Some of our processes exit immediately after being started; they assume that there isn't anything left to do.
[swarm on cadex_service@server2] [tracker:handle_cast] received sync request from cadex_service@server1
[swarm on cadex_service@server1] [tracker:cluster_wait] selected sync node: cadex_service@server2
[swarm on cadex_service@server1] [tracker:cluster_wait] found connected nodes: [:"cadex_service@server2"]
[swarm on cadex_service@server1] [tracker:cluster_wait] joining cluster..
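For anyone wondering how a duplicate key could turn into `{:badmatch, false}`: a minimal sketch, assuming the registry asserts `true =` on the result of `:ets.insert_new/2` (I have not checked that this is what `lib/swarm/registry.ex` line 99 does). The table name and key below are made up for the illustration.

```elixir
# Hypothetical table and key, only for illustration; not Swarm's actual table.
table = :ets.new(:demo_registry, [:set, :public])

# The first insert succeeds because the key is not present yet.
true = :ets.insert_new(table, {"worker-1", self()})

# :ets.insert_new/2 returns false when the key already exists, so asserting
# `true =` on a second insert raises a MatchError, which Erlang reports as
# {:badmatch, false}.
result =
  try do
    true = :ets.insert_new(table, {"worker-1", self()})
  rescue
    e in MatchError -> e
  end

IO.inspect(result)
```

If that assumption holds, the crash would mean the same name was registered twice on one node, which fits the duplicate-key guess above.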
We have upgraded to 3.4, and now it seems to derail even more quickly. Besides the IntervalTreeClock errors (343,584 log entries in 6 hours), we also see two other kinds of log entries.
And also, some tracked processes are actually gone:
I'll try to get the states from the system.
I am also wondering what would happen if a node restarts and starts creating processes before it has joined the cluster?
Sorry for diluting this ticket. When we have bursts in data, we still get these warnings:
Not sure how to interpret these, since everything seems to work out fine.
Howdy!
Our error tracker picked up the following error that happened when we deployed our application yesterday.
We're using Swarm (version 3.4.0) to manage a two-node cluster on Kubernetes. When the first node was shut down as part of the rolling deploy, the following ErlangError was raised (I've formatted it a bit to make it easier to read):
It looks like a function clause error where Swarm.IntervalTreeClock.fill({1, 0}, {1, 81714}) was called, but none of the function clauses match:
This is pretty deep in the CRDT weeds so I don't think I'm going to be able to help much in figuring out how that function got called with those arguments, but I'd be happy to help with any other information about the application that can help in fixing this.
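To illustrate what "none of the function clauses match" means here, a toy sketch follows. `ToyClock` is a hypothetical stand-in and not Swarm's code; it assumes events are either a bare integer or a 3-tuple, so a 2-tuple event like `{1, 81714}` falls through every clause.

```elixir
defmodule ToyClock do
  # Hypothetical stand-in, NOT Swarm.IntervalTreeClock. Assumed shapes:
  # an event is either an integer or a 3-tuple {n, left, right}.
  def fill(0, event), do: event
  def fill(_id, n) when is_integer(n), do: n
  def fill({_l, _r}, {n, left, right}), do: {n, left, right}
end

# Same argument shapes as in the report. The id {1, 0} matches the third
# clause, but the 2-tuple event matches no clause, so the call raises.
outcome =
  try do
    ToyClock.fill({1, 0}, {1, 81714})
  rescue
    e in FunctionClauseError -> {:no_clause, e.function, e.arity}
  end

IO.inspect(outcome)
```

If the real module makes a similar assumption about event shapes, the question becomes where a 2-tuple event came from in the first place, rather than why `fill` rejects it.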