Interrupted system call when -L #18

Closed
time-less-ness opened this issue Dec 16, 2014 · 8 comments

Comments

@time-less-ness

I am using CentOS 6.5. I compiled the binary on one system then copied it out to several others. There are Kafka brokers on hadoop-dev[02,03,04] on my cluster. I get proper output, with what looks like valid info, but also some odd errors.

Looking at metadata_list(), which calls metadata_print(), I'm starting to suspect the problem occurs while iterating over the members of the ISR. Perhaps the error comes from my having defined the broker IDs as 2,3,4 (instead of the more typical 0,1,2)?

I'm honestly not sure. I'll keep digging, but if you have any insight, I'd welcome any guidance. Full output follows.

:) ./kafkacat-CentOS-6.5-x86_64 -L -b hadoop-dev03
Metadata for all topics (from broker -1: hadoop-dev03:9092/bootstrap):
3 brokers:
broker 3 at hadoop-dev03:9092
broker 2 at hadoop-dev02:9092
broker 4 at hadoop-dev04:9092
3 topics:
topic "CleanedStream" with 32 partitions:
partition 23, leader 3, replicas: 3,2, isrs: 3,2
partition 17, leader 3, replicas: 3,2, isrs: 3,2
partition 8, leader 3, replicas: 3,4, isrs: 3,4
partition 26, leader 3, replicas: 3,4, isrs: 3,4
partition 11, leader 3, replicas: 3,2, isrs: 3,2
partition 29, leader 3, replicas: 3,2, isrs: 3,2
partition 20, leader 3, replicas: 3,4, isrs: 3,4
partition 2, leader 3, replicas: 3,4, isrs: 3,4
partition 5, leader 3, replicas: 3,2, isrs: 3,2
partition 14, leader 3, replicas: 3,4, isrs: 3,4
partition 13, leader 2, replicas: 2,3, isrs: 2,3
partition 4, leader 2, replicas: 2,4, isrs: 2,4
partition 22, leader 2, replicas: 2,4, isrs: 2,4
partition 31, leader 2, replicas: 2,3, isrs: 2,3
partition 7, leader 2, replicas: 2,3, isrs: 2,3
partition 16, leader 2, replicas: 2,4, isrs: 2,4
partition 25, leader 2, replicas: 2,3, isrs: 2,3
partition 10, leader 2, replicas: 2,4, isrs: 2,4
partition 1, leader 2, replicas: 2,3, isrs: 2,3
partition 19, leader 2, replicas: 2,3, isrs: 2,3
partition 28, leader 2, replicas: 2,4, isrs: 2,4
partition 9, leader 4, replicas: 4,3, isrs: 4,3
partition 18, leader 4, replicas: 4,2, isrs: 4,2
partition 27, leader 4, replicas: 4,3, isrs: 4,3
partition 12, leader 4, replicas: 4,2, isrs: 4,2
partition 3, leader 4, replicas: 4,3, isrs: 4,3
partition 21, leader 4, replicas: 4,3, isrs: 4,3
partition 30, leader 4, replicas: 4,2, isrs: 4,2
partition 15, leader 4, replicas: 4,3, isrs: 4,3
partition 24, leader 4, replicas: 4,2, isrs: 4,2
partition 6, leader 4, replicas: 4,2, isrs: 4,2
partition 0, leader 4, replicas: 4,2, isrs: 4,2
topic "RawStream" with 32 partitions:
partition 23, leader 3, replicas: 3,4, isrs: 3,4
partition 8, leader 3, replicas: 3,2, isrs: 3,2
partition 17, leader 3, replicas: 3,4, isrs: 3,4
partition 26, leader 3, replicas: 3,2, isrs: 3,2
partition 11, leader 3, replicas: 3,4, isrs: 3,4
partition 29, leader 3, replicas: 3,4, isrs: 3,4
partition 2, leader 3, replicas: 3,2, isrs: 3,2
partition 20, leader 3, replicas: 3,2, isrs: 3,2
partition 5, leader 3, replicas: 3,4, isrs: 3,4
partition 14, leader 3, replicas: 3,2, isrs: 3,2
partition 13, leader 2, replicas: 2,4, isrs: 2,4
partition 4, leader 2, replicas: 2,3, isrs: 2,3
partition 31, leader 2, replicas: 2,4, isrs: 2,4
partition 22, leader 2, replicas: 2,3, isrs: 2,3
partition 16, leader 2, replicas: 2,3, isrs: 2,3
partition 7, leader 2, replicas: 2,4, isrs: 2,4
partition 25, leader 2, replicas: 2,4, isrs: 2,4
partition 10, leader 2, replicas: 2,3, isrs: 2,3
partition 1, leader 2, replicas: 2,4, isrs: 2,4
partition 28, leader 2, replicas: 2,3, isrs: 2,3
partition 19, leader 2, replicas: 2,4, isrs: 2,4
partition 9, leader 4, replicas: 4,2, isrs: 4,2
partition 27, leader 4, replicas: 4,2, isrs: 4,2
partition 18, leader 4, replicas: 4,3, isrs: 4,3
partition 21, leader 4, replicas: 4,2, isrs: 4,2
partition 3, leader 4, replicas: 4,2, isrs: 4,2
partition 12, leader 4, replicas: 4,3, isrs: 4,3
partition 30, leader 4, replicas: 4,3, isrs: 4,3
partition 15, leader 4, replicas: 4,2, isrs: 4,2
partition 24, leader 4, replicas: 4,3, isrs: 4,3
partition 6, leader 4, replicas: 4,3, isrs: 4,3
partition 0, leader 4, replicas: 4,3, isrs: 4,3
topic "FirstTest" with 8 partitions:
partition 2, leader 2, replicas: 2,4, isrs: 2,4
partition 5, leader 2, replicas: 2,3, isrs: 2,3
partition 4, leader 4, replicas: 4,2, isrs: 4,2
partition 7, leader 4, replicas: 4,3, isrs: 4,3
partition 1, leader 4, replicas: 4,3, isrs: 4,3
partition 3, leader 3, replicas: 3,4, isrs: 3,4
partition 6, leader 3, replicas: 3,2, isrs: 3,2
partition 0, leader 3, replicas: 3,2, isrs: 3,2
%3|1418754255.111|FAIL|rdkafka#producer-0| hadoop-dev02:9092/2: Failed to connect to broker at hadoop-dev02:9092: Interrupted system call
%3|1418754255.111|FAIL|rdkafka#producer-0| hadoop-dev03:9092/3: Failed to connect to broker at hadoop-dev03:9092: Interrupted system call
%3|1418754255.111|ERROR|rdkafka#producer-0| hadoop-dev02:9092/2: Failed to connect to broker at hadoop-dev02:9092: Interrupted system call
%3|1418754255.111|ERROR|rdkafka#producer-0| hadoop-dev03:9092/3: Failed to connect to broker at hadoop-dev03:9092: Interrupted system call
%3|1418754255.111|FAIL|rdkafka#producer-0| hadoop-dev04:9092/4: Failed to connect to broker at hadoop-dev04:9092: Interrupted system call
%3|1418754255.111|ERROR|rdkafka#producer-0| hadoop-dev04:9092/4: Failed to connect to broker at hadoop-dev04:9092: Interrupted system call

@edenhill
Owner

Is kafkacat exiting cleanly after this? (i.e., $? is 0)

@edenhill
Owner

And can you try running it with .. -d broker to see what's going on?

@time-less-ness
Author

It is exiting cleanly. Debug! Duh. Sorry, I should have thought of that and run it myself. Here's the debug output and the value of $?. I ran it with 2>&1 so that the errors appear interleaved with the normal output. They seem to show up right before the final line of normal output.

[much more normal output precedes]
partition 3, leader 3, replicas: 3,4, isrs: 3,4
partition 6, leader 3, replicas: 3,2, isrs: 3,2 <--2nd to last normal line of output
%7|1418755433.235|BROKERFAIL|rdkafka#producer-0| hadoop-dev03:9092/bootstrap: failed: err: Local: Broker handle destroyed: (errno: Interrupted system call)
%7|1418755433.235|CONNECT|rdkafka#producer-0| hadoop-dev03:9092/3: couldn't connect to ipv4#10.101.51.231:9092: Interrupted system call
%7|1418755433.235|CONNECT|rdkafka#producer-0| hadoop-dev04:9092/4: couldn't connect to ipv4#10.101.51.233:9092: Interrupted system call
%7|1418755433.235|CONNECT|rdkafka#producer-0| hadoop-dev02:9092/2: couldn't connect to ipv4#10.101.51.229:9092: Interrupted system call
%7|1418755433.235|BROKERFAIL|rdkafka#producer-0| hadoop-dev04:9092/4: failed: err: Local: Broker transport failure: (errno: Interrupted system call)
%7|1418755433.235|BROKERFAIL|rdkafka#producer-0| hadoop-dev02:9092/2: failed: err: Local: Broker transport failure: (errno: Interrupted system call)
%7|1418755433.235|BROKERFAIL|rdkafka#producer-0| hadoop-dev03:9092/3: failed: err: Local: Broker transport failure: (errno: Interrupted system call)
%7|1418755433.235|STATE|rdkafka#producer-0| hadoop-dev03:9092/bootstrap: Broker changed state UP -> DOWN
%3|1418755433.235|FAIL|rdkafka#producer-0| hadoop-dev02:9092/2: Failed to connect to broker at hadoop-dev02:9092: Interrupted system call
%3|1418755433.235|ERROR|rdkafka#producer-0| hadoop-dev02:9092/2: Failed to connect to broker at hadoop-dev02:9092: Interrupted system call
%3|1418755433.235|FAIL|rdkafka#producer-0| hadoop-dev04:9092/4: Failed to connect to broker at hadoop-dev04:9092: Interrupted system call
%3|1418755433.235|FAIL|rdkafka#producer-0| hadoop-dev03:9092/3: Failed to connect to broker at hadoop-dev03:9092: Interrupted system call
%3|1418755433.235|ERROR|rdkafka#producer-0| hadoop-dev04:9092/4: Failed to connect to broker at hadoop-dev04:9092: Interrupted system call
%3|1418755433.235|ERROR|rdkafka#producer-0| hadoop-dev03:9092/3: Failed to connect to broker at hadoop-dev03:9092: Interrupted system call
%7|1418755433.235|STATE|rdkafka#producer-0| hadoop-dev02:9092/2: Broker changed state INIT -> DOWN
%7|1418755433.235|STATE|rdkafka#producer-0| hadoop-dev03:9092/3: Broker changed state INIT -> DOWN
%7|1418755433.235|STATE|rdkafka#producer-0| hadoop-dev04:9092/4: Broker changed state INIT -> DOWN
%7|1418755434.235|BROKERFAIL|rdkafka#producer-0| hadoop-dev03:9092/3: failed: err: Local: Broker handle destroyed: (errno: Interrupted system call)
%7|1418755434.235|BROKERFAIL|rdkafka#producer-0| hadoop-dev04:9092/4: failed: err: Local: Broker handle destroyed: (errno: Interrupted system call)
%7|1418755434.235|BROKERFAIL|rdkafka#producer-0| hadoop-dev02:9092/2: failed: err: Local: Broker handle destroyed: (errno: Interrupted system call)
partition 0, leader 3, replicas: 3,2, isrs: 3,2

$ echo $?
0

@edenhill
Owner

Okay, so when kafkacat tells librdkafka to shut down, librdkafka sends a signal to each of its internal broker threads (one thread per broker) to interrupt any ongoing system call immediately, rather than waiting for some arbitrary timeout and stalling the termination. That is what we're seeing here.
Purely cosmetic, but annoying.
I'll make sure this error isn't printed when things are being shut down.
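
The mechanism described above can be reproduced in isolation. The following is a minimal sketch, not librdkafka's actual code: the names `broker_thread`, `wakeup_handler` and `run_shutdown_demo`, the choice of SIGUSR1, and the use of poll() as a stand-in for a blocking connect() are all assumptions made for illustration. A worker thread blocks in a system call, the main thread signals it with pthread_kill(), and the call fails with EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

static atomic_int worker_done;   /* set once the worker has left poll() */
static atomic_int worker_errno;  /* errno the worker saw, 0 if none */

/* No-op handler: its only job is to make the blocked call return. */
static void wakeup_handler(int sig) { (void)sig; }

/* Hypothetical stand-in for a broker thread blocked in connect()/recv(). */
static void *broker_thread(void *arg) {
    (void)arg;
    if (poll(NULL, 0, 5000) == -1)      /* wait up to 5 s on no descriptors */
        atomic_store(&worker_errno, errno);
    atomic_store(&worker_done, 1);
    return NULL;
}

/* Returns the errno observed by the worker, or -1 if setup failed. */
int run_shutdown_demo(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = wakeup_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                    /* deliberately no SA_RESTART */
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    pthread_t tid;
    if (pthread_create(&tid, NULL, broker_thread, NULL) != 0)
        return -1;

    /* Resend until the worker reports back, in case the first signal
       arrives before the worker has entered poll(). */
    struct timespec pause = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    while (!atomic_load(&worker_done)) {
        pthread_kill(tid, SIGUSR1);
        nanosleep(&pause, NULL);
    }
    pthread_join(tid, NULL);
    return atomic_load(&worker_errno);
}
```

Calling `run_shutdown_demo()` from a `main()` and passing the result to strerror() should, on Linux with glibc, give the same "Interrupted system call" text that appears in the logs. Older toolchains may need `-pthread` when compiling.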

@time-less-ness
Author

Yeah, I was starting to get this idea, too.

http://stackoverflow.com/questions/6030310/c-interrupted-system-call-fork-vs-thread

It looks like you just need the thread to handle the EINTR.
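
One way to read "handle the EINTR" is sketched below. This is an assumption about what such handling could look like, not the change that was made in librdkafka: the `terminating` flag, the `io_action` enum, `classify_io_error` and `read_handling_eintr` are all hypothetical names. The idea is to retry an interrupted call during normal operation, stay silent if the interruption is part of a shutdown, and report anything else.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical flag, set by whoever initiates shutdown before signalling. */
static atomic_int terminating;

enum io_action { IO_RETRY, IO_QUIET_EXIT, IO_REPORT };

/* Decide what to do with the errno of a failed system call. */
enum io_action classify_io_error(int err, int is_terminating) {
    if (err == EINTR)
        return is_terminating ? IO_QUIET_EXIT : IO_RETRY;
    return IO_REPORT;
}

/* read() wrapper: returns bytes read (0 on EOF), or -1 with errno set.
   EINTR is retried unless a shutdown is in progress, in which case the
   function returns -1 without logging anything. */
ssize_t read_handling_eintr(int fd, void *buf, size_t len) {
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;
        int err = errno;                /* save before calls that may change it */
        switch (classify_io_error(err, atomic_load(&terminating))) {
        case IO_RETRY:
            continue;                   /* interrupted by an unrelated signal */
        case IO_QUIET_EXIT:
            errno = err;
            return -1;                  /* shutdown: not worth an error line */
        case IO_REPORT:
            fprintf(stderr, "read failed: %s\n", strerror(err));
            errno = err;
            return -1;
        }
    }
}
```

The sketch uses read() rather than connect() on purpose: an interrupted connect() cannot simply be called again, since the connection attempt may continue in the background.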

@edenhill
Owner

Can you update librdkafka and try again?
(If you are using ./bootstrap.sh, simply remove tmp-*)

@time-less-ness
Author

Yep. Problem solved!

@edenhill
Owner

Thanks!
