
a worker paused when occur "Race condition: tried to assign ID opentsdb" #879

Closed
mzw-g opened this issue Oct 10, 2016 · 2 comments
mzw-g commented Oct 10, 2016

Version: OpenTSDB 2.2.0
When a WARN like "Race condition: tried to assign ID opentsdb..." occurs, the worker is paused and never works again.

thread stack:

[3] com.stumbleupon.async.Deferred.doJoin (Deferred.java:1,138)
[4] com.stumbleupon.async.Deferred.joinUninterruptibly (Deferred.java:1,064)
[5] net.opentsdb.uid.UniqueId.getOrCreateId (UniqueId.java:663)
[6] net.opentsdb.core.IncomingDataPoints.rowKeyTemplate (IncomingDataPoints.java:132)
[7] net.opentsdb.core.TSDB.addPointInternal (TSDB.java:785)
[8] java.lang.Float.parseFloat (Float.java:452)
[9] net.opentsdb.tsd.PutDataPointRpc.execute (PutDataPointRpc.java:210)
[10] net.opentsdb.tsd.RpcHandler.handleHttpQuery (RpcHandler.java:283)
[11] net.opentsdb.tsd.RpcHandler.messageReceived (RpcHandler.java:134)
[12] org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream (SimpleChannelUpstreamHandler.java:70)
[13] org.jboss.netty.handler.timeout.IdleStateAwareChannelUpstreamHandler.handleUpstream (IdleStateAwareChannelUpstreamHandler.java:36)
[14] org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream (DefaultChannelPipeline.java:564)
[15] org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream (DefaultChannelPipeline.java:791)
[16] org.jboss.netty.handler.timeout.IdleStateHandler.messageReceived (IdleStateHandler.java:294)
[17] org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream (SimpleChannelUpstreamHandler.java:70)
[18] org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream (DefaultChannelPipeline.java:564)
[19] org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream (DefaultChannelPipeline.java:791)
[20] org.jboss.netty.handler.codec.http.HttpContentEncoder.messageReceived (HttpContentEncoder.java:82)
[21] org.jboss.netty.channel.SimpleChannelHandler.handleUpstream (SimpleChannelHandler.java:88)
[22] org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream (DefaultChannelPipeline.java:564)
[23] org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream (DefaultChannelPipeline.java:791)
[24] org.jboss.netty.handler.codec.http.HttpContentDecoder.messageReceived (HttpContentDecoder.java:108)
[25] org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream (SimpleChannelUpstreamHandler.java:70)
[26] org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream (DefaultChannelPipeline.java:564)
[27] org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream (DefaultChannelPipeline.java:791)
[28] org.jboss.netty.channel.Channels.fireMessageReceived (Channels.java:296)
[29] org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived (HttpChunkAggregator.java:194)
[30] org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream (SimpleChannelUpstreamHandler.java:70)
[31] org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream (DefaultChannelPipeline.java:564)
[32] org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream (DefaultChannelPipeline.java:791)
[33] org.jboss.netty.channel.Channels.fireMessageReceived (Channels.java:296)
[34] org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived (FrameDecoder.java:452)
[35] org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode (ReplayingDecoder.java:536)
[36] org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived (ReplayingDecoder.java:435)
[37] org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream (SimpleChannelUpstreamHandler.java:70)
[38] org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream (DefaultChannelPipeline.java:564)
[39] org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream (DefaultChannelPipeline.java:791)
[40] org.jboss.netty.channel.SimpleChannelHandler.messageReceived (SimpleChannelHandler.java:142)
[41] org.jboss.netty.channel.SimpleChannelHandler.handleUpstream (SimpleChannelHandler.java:88)
[42] net.opentsdb.tsd.ConnectionManager.handleUpstream (ConnectionManager.java:87)
[43] org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream (DefaultChannelPipeline.java:564)
[44] org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream (DefaultChannelPipeline.java:559)
[45] org.jboss.netty.channel.Channels.fireMessageReceived (Channels.java:268)
[46] org.jboss.netty.channel.Channels.fireMessageReceived (Channels.java:255)
[47] org.jboss.netty.channel.socket.nio.NioWorker.read (NioWorker.java:88)
[48] org.jboss.netty.channel.socket.nio.AbstractNioWorker.process (AbstractNioWorker.java:108)
[49] org.jboss.netty.channel.socket.nio.AbstractNioSelector.run (AbstractNioSelector.java:318)
[50] org.jboss.netty.channel.socket.nio.AbstractNioWorker.run (AbstractNioWorker.java:89)
[51] org.jboss.netty.channel.socket.nio.NioWorker.run (NioWorker.java:178)
[52] org.jboss.netty.util.ThreadRenamingRunnable.run (ThreadRenamingRunnable.java:108)
[53] org.jboss.netty.util.internal.DeadLockProofWorker$1.run (DeadLockProofWorker.java:42)
[54] java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1,145)
[55] java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:615)
[56] java.lang.Thread.run (Thread.java:745)

Dump of the com.stumbleupon.async.Deferred:

OpenTSDB I/O Worker #15[3] dump this
this = {
LOG: instance of ch.qos.logback.classic.Logger(id=108)
MAX_CALLBACK_CHAIN_LENGTH: 16383
INIT_CALLBACK_CHAIN_SIZE: 4
PENDING: 0
RUNNING: 1
PAUSED: 2
DONE: 3
state: 0
result: null
callbacks: instance of com.stumbleupon.async.Callback4
next_callback: 0
last_callback: 4
stateUpdater: instance of java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl(id=110)
}

OpenTSDB I/O Worker #15[3] dump this.callbacks
this.callbacks = {
instance of com.stumbleupon.async.Deferred$Continue(id=111), instance of com.stumbleupon.async.Deferred$Continue(id=111), instance of com.stumbleupon.async.Deferred$Signal(id=106), instance of com.stumbleupon.async.Deferred$Signal(id=106)
}
OpenTSDB I/O Worker #15[3] locals
Method arguments:
interruptible = false
timeout = 0
Local variables:
signal_cb = instance of com.stumbleupon.async.Deferred$Signal(id=106)
interrupted = false
OpenTSDB I/O Worker #15[3] dump signal_cb
signal_cb = {
result: instance of com.stumbleupon.async.Deferred$Signal(id=106)
thread: "OpenTSDB I/O Worker #15"
}
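The dump above shows the Deferred stuck in state 0 (PENDING) while worker #15 sits in joinUninterruptibly with timeout = 0, i.e. "wait forever". As a rough analogy (this is not the actual Deferred implementation, just an illustrative sketch using a CountDownLatch), a zero-timeout join parks the calling thread until the callback fires, so a Deferred that never completes hangs the worker indefinitely:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Rough analogy for Deferred.joinUninterruptibly: a join with timeout 0
// waits indefinitely for a result that only arrives when the async
// callback completes. If the underlying operation never fires its
// callback, the joining thread parks forever, matching worker #15 above.
public class JoinSketch {
    private final CountDownLatch done = new CountDownLatch(1);
    private volatile Object result;

    // Called when the async operation completes.
    void callback(Object r) {
        result = r;
        done.countDown();
    }

    // timeoutMillis == 0 means "wait forever".
    Object join(long timeoutMillis) throws Exception {
        if (timeoutMillis == 0) {
            done.await();  // blocks until callback() runs
        } else if (!done.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new Exception("timed out");
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        JoinSketch d = new JoinSketch();
        d.callback("done");
        System.out.println(d.join(0));  // completed Deferred: returns at once
    }
}
```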

all threads

Group system:
(java.lang.ref.Reference$ReferenceHandler)0x67 Reference Handler cond. waiting
(java.lang.ref.Finalizer$FinalizerThread)0x66 Finalizer cond. waiting
(java.lang.Thread)0x65 Signal Dispatcher running
(java.lang.Thread)0x0 Attach Listener running
Group main:
(java.lang.Thread)0x64 OpenTSDB I/O Boss #1 running
(java.lang.Thread)0x63 OpenTSDB I/O Worker #1 running
(java.lang.Thread)0x62 OpenTSDB I/O Worker #2 running
(java.lang.Thread)0x61 OpenTSDB I/O Worker #3 running
(java.lang.Thread)0x60 OpenTSDB I/O Worker #4 running
(java.lang.Thread)0x5f OpenTSDB I/O Worker #5 running
(java.lang.Thread)0x5e OpenTSDB I/O Worker #6 running
(java.lang.Thread)0x5d OpenTSDB I/O Worker #7 running
(java.lang.Thread)0x5c OpenTSDB I/O Worker #8 running
(java.lang.Thread)0x5b OpenTSDB I/O Worker #9 running
(java.lang.Thread)0x5a OpenTSDB I/O Worker #10 running
(java.lang.Thread)0x59 OpenTSDB I/O Worker #11 running
(java.lang.Thread)0x58 OpenTSDB I/O Worker #12 running
(java.lang.Thread)0x57 OpenTSDB I/O Worker #13 running
(java.lang.Thread)0x56 OpenTSDB I/O Worker #14 running
(java.lang.Thread)0x55 OpenTSDB I/O Worker #15 cond. waiting
(java.lang.Thread)0x54 OpenTSDB I/O Worker #16 running
(java.lang.Thread)0x53 OpenTSDB I/O Worker #17 running
(java.lang.Thread)0x52 OpenTSDB I/O Worker #18 running
(java.lang.Thread)0x51 OpenTSDB I/O Worker #19 running
(java.lang.Thread)0x50 OpenTSDB I/O Worker #20 running
(java.lang.Thread)0x4f OpenTSDB I/O Worker #21 running
(java.lang.Thread)0x4e OpenTSDB I/O Worker #22 running
(java.lang.Thread)0x4d OpenTSDB I/O Worker #23 running
(java.lang.Thread)0x4c OpenTSDB I/O Worker #24 running
(java.lang.Thread)0x4b OpenTSDB I/O Worker #25 running
(java.lang.Thread)0x4a OpenTSDB I/O Worker #26 running
(java.lang.Thread)0x49 OpenTSDB I/O Worker #27 running
(java.lang.Thread)0x48 OpenTSDB I/O Worker #28 running
(java.lang.Thread)0x47 OpenTSDB I/O Worker #29 running
(java.lang.Thread)0x46 OpenTSDB I/O Worker #30 running
(java.lang.Thread)0x45 OpenTSDB I/O Worker #31 running
(java.lang.Thread)0x44 OpenTSDB I/O Worker #32 running
(java.lang.Thread)0x43 OpenTSDB I/O Worker #33 running
(java.lang.Thread)0x42 OpenTSDB I/O Worker #34 running
(java.lang.Thread)0x41 OpenTSDB I/O Worker #35 running
(java.lang.Thread)0x40 OpenTSDB I/O Worker #36 running
(java.lang.Thread)0x3f OpenTSDB I/O Worker #37 running
(java.lang.Thread)0x3e OpenTSDB I/O Worker #38 running
(java.lang.Thread)0x3d OpenTSDB I/O Worker #39 running
(java.lang.Thread)0x3c OpenTSDB I/O Worker #40 running
(java.lang.Thread)0x3b OpenTSDB I/O Worker #41 running
(java.lang.Thread)0x3a OpenTSDB I/O Worker #42 running
(java.lang.Thread)0x39 OpenTSDB I/O Worker #43 running
(java.lang.Thread)0x38 OpenTSDB I/O Worker #44 running
(java.lang.Thread)0x37 OpenTSDB I/O Worker #45 running
(java.lang.Thread)0x36 OpenTSDB I/O Worker #46 running
(java.lang.Thread)0x35 OpenTSDB I/O Worker #47 running
(java.lang.Thread)0x34 OpenTSDB I/O Worker #48 running
(java.lang.Thread)0x2 DestroyJavaVM running
(java.lang.Thread)0x1 OpenTSDB Timer TSDB Timer #1 sleeping

@manolama manolama added the bug label Oct 15, 2016
mzw-g commented Oct 21, 2016

This is solved. There was a bug in our own custom implementation of HBaseClient (we wrote a new HBaseClient-like API to talk to a different storage system).
In our HBaseClient.atomicIncrement, we returned an IOException, which is not an instance of HBaseException.
So in the code below, from net.opentsdb.uid.UniqueId.UniqueIdAllocator.call (UniqueId.java:419):

if (arg instanceof Exception) {
        final String msg = ("Failed attempt #" + (randomize_id
                         ? (MAX_ATTEMPTS_ASSIGN_RANDOM_ID - attempt) 
                         : (MAX_ATTEMPTS_ASSIGN_ID - attempt))
                         + " to assign an UID for " + kind() + ':' + name
                         + " at step #" + state);
        if (arg instanceof HBaseException) {
          LOG.error(msg, (Exception) arg);
          hbe = (HBaseException) arg;
          attempt--;
          state = ALLOCATE_UID;  // Retry from the beginning.
        } else {
          LOG.error("WTF?  Unexpected exception!  " + msg, (Exception) arg);
          return arg;  // Unexpected exception, let it bubble up.
        }

That hit the "WTF? Unexpected exception!" branch, so the error was returned instead of retried, and we got the stack above.
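The failure mode boils down to the instanceof dispatch in that errback: only the expected storage exception type takes the retry path, and anything else falls into the "unexpected" branch. A minimal sketch of that pattern (not OpenTSDB code; StorageException here is a hypothetical stand-in for HBaseException):

```java
import java.io.IOException;

// Sketch of type-based error dispatch: only the expected storage
// exception type is retried; any other exception is returned as-is,
// which is what happened when our client surfaced a plain IOException.
public class RetryDispatchSketch {
    // Stand-in for HBaseException: the only type the retry path accepts.
    static class StorageException extends RuntimeException {
        StorageException(String msg, Throwable cause) { super(msg, cause); }
    }

    // Mirrors the instanceof check in UniqueIdAllocator.call.
    static String dispatch(Exception arg) {
        if (arg instanceof StorageException) {
            return "retry";      // retried from ALLOCATE_UID
        }
        return "bubble-up";      // logged as "WTF?" and returned as-is
    }

    public static void main(String[] args) {
        System.out.println(dispatch(new IOException("ICV failed")));
        System.out.println(dispatch(
            new StorageException("ICV failed", new IOException())));
    }
}
```

Under this sketch, the fix on the client side is to wrap storage errors in the exception type the allocator expects, so failed assignments are retried rather than bubbled up.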

Thanks for your attention!

@mzw-g mzw-g changed the title a worker paused when accur "Race condition: tried to assign ID opentsdb" a worker paused when occur "Race condition: tried to assign ID opentsdb" Oct 24, 2016
manolama (Member) commented:

Thanks!
