
Metasync appears to hang #1387

Closed
johnseekins opened this issue Oct 22, 2018 · 9 comments
@johnseekins
Contributor

root@use1-hadoop-4:~# sudo -u opentsdb /usr/share/opentsdb/bin/tsdb version
net.opentsdb.tools 2.3.1 built at revision 1478380 (MINT)
Built on 2018/05/22 00:28:29 +0000 by hobbes@clhbase:/home/hobbes/OFFICIAL/build

Every time I attempt to manually sync metadata, the job runs for a short period and then appears to hang. I say "appears" because the log of uidmeta objects being created stops progressing, but stracing the metasync process still shows activity (though nothing that looks useful).
The log will look similar to:

19:42:22.492 INFO  [MetaSync.call] - Counter exists but meta was null, creating meta data for timeseries [0001D500000100078D00000500002F00000900005600000F00051200001B00092200001C00011000001D000058]
19:42:22.492 INFO  [MetaSync.call] - Counter exists but meta was null, creating meta data for timeseries [0001D500000100085900000500002F00000900005600000F00002500001B00055A00001C00005400001D000058]
19:42:22.492 INFO  [MetaSync.call] - Counter exists but meta was null, creating meta data for timeseries [0001D50000010007CA00000500002F00000900005600000F0000CA00001B00092200001C00011000001D000058]
19:42:22.492 INFO  [MetaSync.call] - Counter exists but meta was null, creating meta data for timeseries [0001D500000100078B00000500002F00000900005600000F00002500001B00055A00001C00005C00001D000058]
19:42:22.493 INFO  [MetaSync.call] - Counter exists but meta was null, creating meta data for timeseries [0001D500000100031100000500000600000900005600001B0007B300001C00055D00001D000064]
19:42:22.493 INFO  [MetaSync.call] - Counter exists but meta was null, creating meta data for timeseries [0001D500000100086500000500009B00000900005600000F0003FB00001B00055A00001C00005400001D000058]
19:42:22.493 INFO  [MetaSync.call] - Counter exists but meta was null, creating meta data for timeseries [0001D50000010009AD00000500002F00000900005600000F00002500001B00092200001C00011000001D000058]
19:42:22.493 INFO  [MetaSync.call] - Counter exists but meta was null, creating meta data for timeseries [0001D80000010007C300000500002F00000900005600000F00030800001B00096D00001C00005400001D000064]
19:42:22.494 INFO  [MetaSync.call] - Counter exists but meta was null, creating meta data for timeseries [0001D500000100090400000500000600000900005600000F0000CA00001B00092200001C00011000001D000058]
19:42:22.494 INFO  [MetaSync.call] - Counter exists but meta was null, creating meta data for timeseries [0001D80000010007CA00000500002F00000900005600000F0000CA00001B00096D00001C00005400001D000064]
19:42:22.494 INFO  [MetaSync.call] - Counter exists but meta was null, creating meta data for timeseries [0001D80000010007F800000500002F00000900005600000F0003FB00001B00096D00001C00005400001D000064]
19:42:22.494 INFO  [MetaSync.call] - Counter exists but meta was null, creating meta data for timeseries [0001D800000100024E00000500000600000900005600001B00096D00001C00005400001D000064]
19:42:22.495 INFO  [MetaSync.call] - Counter exists but meta was null, creating meta data for timeseries [0001D800000100047500000500000600000900005600001B00096D00001C00005400001D000064]
19:42:22.495 WARN  [TSMeta.call] - Found a counter TSMeta column without a meta for TSUID: 0001D4000001000CC800000500002F00000F000CC7
19:42:22.496 INFO  [MetaSync.call] - Counter exists but meta was null, creating meta data for timeseries [0001D4000001000CC800000500002F00000F000CC7]
19:43:01.998 INFO  [UniqueId.allocateUid] - Creating an ID for kind='tagv' name='dattoarray1bfydata9office365sharepoint127.zfssnapshot1540237294887'
19:43:02.000 INFO  [UniqueId.createReverseMapping] - Got ID=8778 for kind='tagv' name='dattoarray1bfydata9office365sharepoint127.zfssnapshot1540237294887'
19:43:02.002 INFO  [UniqueId.done] - Wrote UIDMeta for: dattoarray1bfydata9office365sharepoint127.zfssnapshot1540237294887
19:43:02.003 INFO  [UniqueId.done] - Completed pending assignment for: dattoarray1bfydata9office365sharepoint127.zfssnapshot1540237294887
19:43:02.003 INFO  [UniqueId.allocateUid] - Creating an ID for kind='tagv' name='dattoarraybfydata91271540237294887'
19:43:02.005 INFO  [UniqueId.createReverseMapping] - Got ID=8780 for kind='tagv' name='dattoarraybfydata91271540237294887'
19:43:02.007 INFO  [UniqueId.done] - Wrote UIDMeta for: dattoarraybfydata91271540237294887
19:43:02.008 INFO  [UniqueId.done] - Completed pending assignment for: dattoarraybfydata91271540237294887

From that point on, the log no longer mentions the Meta threads at all...

Each subsequent run of metasync seems to at least be "progressing" in the sense that the UID that shows up at the end of the list is different every time.

Also, each metasync run has at least one appearance of this stacktrace (albeit with a different row mentioned):

22:13:45.269 ERROR [MetaSync.call] - Unknown exception processing row: [KeyValue(key=[5, 0, 0, 1, 91, -64, -88, -32, 0, 0, 1, 0, 0, 1, 0, 0, 5, 0, 0, 6], family="t", qualifier=[41, 91, 45, 27, 48, -53, 52, -101, 56, 91, 60, 27, 63, -37, 67, -101, 71, 91, 75, 27, 78, -37, 82, -101, 86
, 75, 90, 27, 93, -53, 97, -101, 101, 75, 105, 27, 108, -37, 112, -101, 116, 91, 120, 27, 123, -37, 127, -117, -125, 91, -121, 27, -118, -37, -114, -101, -110, 75, -106, 11, -103, -37, -99, -101, -95, 91, -91, 27, -88, -37, -84, -101, -80, 91, -76, 27, -73, -53, -69, -117, -65, 91, -
61, 27, -58, -53, -54, -101, -50, 91, -46, 27, -43, -37, -39, -101, -35, 91], value=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], timestamp=1539357304788)]
java.lang.IllegalMonitorStateException: null
        at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151) ~[na:1.8.0_181]
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261) ~[na:1.8.0_181]
        at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457) ~[na:1.8.0_181]
        at net.opentsdb.tree.TreeBuilder$1ErrorCB.call(TreeBuilder.java:528) ~[tsdb-2.3.1.jar:1478380]
        at net.opentsdb.tree.TreeBuilder$1ErrorCB.call(TreeBuilder.java:524) ~[tsdb-2.3.1.jar:1478380]
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278) [async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257) [async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.callback(Deferred.java:1005) [async-1.4.0.jar:na]
        at net.opentsdb.tree.Tree$1AllTreeScanner.call(Tree.java:566) [tsdb-2.3.1.jar:1478380]
        at net.opentsdb.tree.Tree$1AllTreeScanner.call(Tree.java:543) [tsdb-2.3.1.jar:1478380]
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278) [async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257) [async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.access$300(Deferred.java:430) [async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred$Continue.call(Deferred.java:1366) [async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278) [async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257) [async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.callback(Deferred.java:1005) [async-1.4.0.jar:na]
        at org.hbase.async.HBaseRpc.callback(HBaseRpc.java:720) [asynchbase-1.8.2.jar:na]
        at org.hbase.async.RegionClient.decode(RegionClient.java:1575) [asynchbase-1.8.2.jar:na]
        at org.hbase.async.RegionClient.decode(RegionClient.java:88) [asynchbase-1.8.2.jar:na]
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:500) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [netty-3.10.6.Final.jar:na]
        at org.hbase.async.RegionClient.handleUpstream(RegionClient.java:1230) [asynchbase-1.8.2.jar:na]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.SimpleChannelHandler.messageReceived(SimpleChannelHandler.java:142) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.handler.timeout.IdleStateAwareChannelHandler.handleUpstream(IdleStateAwareChannelHandler.java:36) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.handler.timeout.IdleStateHandler.messageReceived(IdleStateHandler.java:294) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) [netty-3.10.6.Final.jar:na]
        at org.hbase.async.HBaseClient$RegionClientPipeline.sendUpstream(HBaseClient.java:3857) [asynchbase-1.8.2.jar:na]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.10.6.Final.jar:na]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]

I did collect a stack trace (jstack -l <pid>) if that's helpful...
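For reference, `IllegalMonitorStateException` is what `ReentrantLock.unlock()` throws when the calling thread does not hold the lock, which suggests the `TreeBuilder$1ErrorCB` error callback is trying to release a lock that was acquired on a different thread (or never acquired). A minimal sketch of that failure mode:

```java
import java.util.concurrent.locks.ReentrantLock;

public class UnlockWithoutHold {
    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        try {
            // This thread never called lock(), so unlock() throws
            // IllegalMonitorStateException -- the same exception seen in
            // the TreeBuilder$1ErrorCB stack trace above.
            lock.unlock();
        } catch (IllegalMonitorStateException e) {
            System.out.println("lock not held by this thread: " + e);
        }
    }
}
```

Since the error callback blows up instead of releasing the lock on behalf of the thread that actually holds it, the lock can stay held indefinitely, which would explain a later `processAllTrees` call parking forever while waiting to acquire it.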

@johnseekins
Contributor Author

Forgot my config:

root@use1-hadoop-4:~# cat /etc/opentsdb/opentsdb.conf 
tsd.network.port = 4242
tsd.network.bind = 0.0.0.0
tsd.network.tcp_no_delay = true
tsd.network.keep_alive = true
tsd.network.reuse_address = true
# Number of worker threads dedicated to Netty, defaults to # of CPUs * 2
#tsd.network.worker_threads = 8
tsd.network.async_io = true
tsd.http.staticroot = /usr/share/opentsdb/static/
tsd.http.cachedir = /hadoop/0/opentsdb
tsd.http.request.enable_chunked = true
tsd.http.query.allow_delete = true
tsd.http.request.max_chunk = 20097152
tsd.query.filter.expansion_limit = 8192
tsd.core.plugin_path = /usr/share/opentsdb/plugins
tsd.core.auto_create_metrics = true
tsd.core.meta.enable_realtime_uid = true
tsd.core.meta.enable_realtime_ts = false
tsd.core.tree.enable_processing = true
tsd.core.uid.random_metrics = true
tsd.storage.max_tags = 10
tsd.storage.enable_compaction = true
tsd.storage.flush_interval = 10000
tsd.storage.fix_duplicates = true
tsd.storage.salt.width = 1
tsd.storage.salt.buckets = 7
# Name of the HBase table where data points are stored, default is "tsdb"
#tsd.storage.hbase.data_table = tsdb
# Name of the HBase table where UID information is stored, default is "tsdb-uid"
#tsd.storage.hbase.uid_table = tsdb-uid
tsd.storage.hbase.zk_basedir = /hbase-unsecure
tsd.storage.hbase.zk_quorum = use1-hadoop-3.datto.lan:2181,use1-hadoop-5.datto.lan:2181,use1-hadoop-4.datto.lan:2181

@johnseekins
Contributor Author

And here is the stuck thread matching the stacktrace from earlier:

"AsyncHBase I/O Worker #7" #29 prio=5 os_prio=0 tid=0x00007fad20443800 nid=0x25d7 waiting on condition [0x00007facd8af7000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000067cea9b00> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at net.opentsdb.tree.TreeBuilder.processAllTrees(TreeBuilder.java:535)
        at net.opentsdb.core.TSDB.processTSMetaThroughTrees(TSDB.java:1492)
        at net.opentsdb.meta.TSMeta$1TSMetaCB$1FetchNewCB.call(TSMeta.java:569)
        at net.opentsdb.meta.TSMeta$1TSMetaCB$1FetchNewCB.call(TSMeta.java:560)
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278)
        at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257)
        at com.stumbleupon.async.Deferred.handleContinuation(Deferred.java:1313)
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1284)
        at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257)
        at com.stumbleupon.async.Deferred.callback(Deferred.java:1005)
        at com.stumbleupon.async.DeferredGroup.done(DeferredGroup.java:173)
        at com.stumbleupon.async.DeferredGroup.recordCompletion(DeferredGroup.java:142)
        at com.stumbleupon.async.DeferredGroup.access$000(DeferredGroup.java:36)
        at com.stumbleupon.async.DeferredGroup$1Notify.call(DeferredGroup.java:82)
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278)
        at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257)
        at com.stumbleupon.async.Deferred.access$300(Deferred.java:430)
        at com.stumbleupon.async.Deferred$Continue.call(Deferred.java:1366)
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278)
        at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257)
        at com.stumbleupon.async.Deferred.handleContinuation(Deferred.java:1313)
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1284)
        at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257)
        at com.stumbleupon.async.Deferred.callback(Deferred.java:1005)
        at org.hbase.async.HBaseRpc.callback(HBaseRpc.java:720)
        at org.hbase.async.RegionClient.decode(RegionClient.java:1575)
        at org.hbase.async.RegionClient.decode(RegionClient.java:88)
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:500)
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.hbase.async.RegionClient.handleUpstream(RegionClient.java:1230)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.jboss.netty.channel.SimpleChannelHandler.messageReceived(SimpleChannelHandler.java:142)
        at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88)
        at org.jboss.netty.handler.timeout.IdleStateAwareChannelHandler.handleUpstream(IdleStateAwareChannelHandler.java:36)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.jboss.netty.handler.timeout.IdleStateHandler.messageReceived(IdleStateHandler.java:294)
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
        at org.hbase.async.HBaseClient$RegionClientPipeline.sendUpstream(HBaseClient.java:3857)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

@johnseekins
Contributor Author

johnseekins commented Oct 24, 2018

I'll also frequently see messages like
[AsyncHBase I/O Worker #6] MetaSync: Counter exists but meta was null, creating meta data for timeseries [00000500000100038C000005000006]
when we're in this "state".

I assume this is a normal condition where metadata wasn't entirely synced, so the metasync overwrites the broken data.

@johnseekins
Contributor Author

The result of metasync not working is that we can't use template variables in Grafana with OpenTSDB, which is rather frustrating.

@johnseekins
Contributor Author

It seems this is caused by changing the metasync config (or by an initial metasync run crashing).
It also seems to matter whether the TSD daemon itself is trying to generate metadata or not.

Our solution was to completely disable compaction and metadata creation in the TSD daemon and perform both operations through the CLI: compaction through fsck runs (and the tool we wrote that was just accepted in #1391), and metasync on a cron.

@johnseekins
Contributor Author

johnseekins commented Nov 6, 2018

Turns out this doesn't completely resolve the issue.
After carefully ensuring all hosts have consistent config, etc. we still get errors like this:

2018-11-06 17:59:02,903 WARN  [AsyncHBase I/O Worker #3] MetaSync: Failed creating meta for: [B@5d1a7a84 with exception: 
java.lang.IllegalMonitorStateException: null
        at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151) ~[na:1.8.0_112]
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261) ~[na:1.8.0_112]
        at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457) ~[na:1.8.0_112]
        at net.opentsdb.tree.TreeBuilder$1ErrorCB.call(TreeBuilder.java:528) ~[tsdb-2.3.1.jar:1478380]
        at net.opentsdb.tree.TreeBuilder$1ErrorCB.call(TreeBuilder.java:524) ~[tsdb-2.3.1.jar:1478380]
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278) [async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257) [async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.callback(Deferred.java:1005) [async-1.4.0.jar:na]
        at net.opentsdb.tree.Tree$1AllTreeScanner.call(Tree.java:566) [tsdb-2.3.1.jar:1478380]
        at net.opentsdb.tree.Tree$1AllTreeScanner.call(Tree.java:543) [tsdb-2.3.1.jar:1478380]
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278) [async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257) [async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.access$300(Deferred.java:430) [async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred$Continue.call(Deferred.java:1366) [async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278) [async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257) [async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.callback(Deferred.java:1005) [async-1.4.0.jar:na]
        at org.hbase.async.HBaseRpc.callback(HBaseRpc.java:720) [asynchbase-1.8.2.jar:na]
        at org.hbase.async.RegionClient.decode(RegionClient.java:1575) [asynchbase-1.8.2.jar:na]
        at org.hbase.async.RegionClient.decode(RegionClient.java:88) [asynchbase-1.8.2.jar:na]
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:500) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [netty-3.10.6.Final.jar:na]
        at org.hbase.async.RegionClient.handleUpstream(RegionClient.java:1230) [asynchbase-1.8.2.jar:na]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.SimpleChannelHandler.messageReceived(SimpleChannelHandler.java:142) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.handler.timeout.IdleStateAwareChannelHandler.handleUpstream(IdleStateAwareChannelHandler.java:36) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.handler.timeout.IdleStateHandler.messageReceived(IdleStateHandler.java:294) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) [netty-3.10.6.Final.jar:na]
        at org.hbase.async.HBaseClient$RegionClientPipeline.sendUpstream(HBaseClient.java:3857) [asynchbase-1.8.2.jar:na]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.10.6.Final.jar:na]
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.10.6.Final.jar:na]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_112]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_112]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_112]

@johnseekins johnseekins reopened this Nov 6, 2018
@manolama
Member

Ahhhh.... you're using my really ugly tree code huh? Do you really need that enabled? If not and you can disable it that would be your best situation. I'm finally uploading our ElasticSearch meta-data code as part of 3.0 so that will be the best way to sync and search for time series.

@manolama manolama added the bug label Dec 11, 2018
@johnseekins
Contributor Author

Unless you have another solution for making Grafana display nicely filtered graphs from OpenTSDB, we're stuck with this tree code.
As a side note, deleting metrics from OpenTSDB seems to slow down the metasync tool...

@johnseekins
Contributor Author

So...this was likely a problem with our data table. The compaction queue was so backed up that the table couldn't properly close regions, which made scans and other activity break in weird ways.
