This repository has been archived by the owner on Oct 6, 2018. It is now read-only.

Play service crashing after extended period with window open. #38

Closed
ibanner56 opened this issue Oct 15, 2014 · 7 comments

Comments

@ibanner56

The service crashes with the following stack trace:

Uncaught error from thread [play-akka.actor.default-dispatcher-800] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[play]
java.lang.NoClassDefFoundError: common/Util$$anonfun$getPartitionsLogSize$3$$anonfun$apply$19$$anonfun$apply$1$$anonfun$applyOrElse$1
    at common.Util$$anonfun$getPartitionsLogSize$3$$anonfun$apply$19$$anonfun$apply$1.applyOrElse(Util.scala:82)
    at common.Util$$anonfun$getPartitionsLogSize$3$$anonfun$apply$19$$anonfun$apply$1.applyOrElse(Util.scala:81)
    at scala.runtime.AbstractPartialFunction$mcJL$sp.apply$mcJL$sp(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcJL$sp.apply(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcJL$sp.apply(AbstractPartialFunction.scala:25)
    at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
    at scala.util.Try$.apply(Try.scala:161)
    at scala.util.Failure.recover(Try.scala:185)
    at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:387)
    at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:387)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:29)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
    at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:42)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ClassNotFoundException: common.Util$$anonfun$getPartitionsLogSize$3$$anonfun$apply$19$$anonfun$apply$1$$anonfun$applyOrElse$1
    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 23 more
Caused by: java.io.FileNotFoundException: /home/ubuntu/app/kafka-web-console/target/scala-2.10/classes/common/Util$$anonfun$getPartitionsLogSize$3$$anonfun$apply$19$$anonfun$apply$1$$anonfun$applyOrElse$1.class (Too many open files)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:146)
    at sun.misc.URLClassPath$FileLoader$1.getInputStream(URLClassPath.java:1086)
    at sun.misc.Resource.cachedInputStream(Resource.java:77)
    at sun.misc.Resource.getByteBuffer(Resource.java:160)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:436)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    ... 29 more

Here is the function it seems to be failing in, as mentioned at the top of the stack trace (from Util.scala):

  def getPartitionsLogSize(topicName: String, partitionLeaders: Seq[String]): Future[Seq[Long]] = {
    Logger.debug("Getting partition log sizes for topic " + topicName + " from partition leaders " + partitionLeaders.mkString(", "))
    return for {
      clients <- Future.sequence(partitionLeaders.map(addr => Future((addr, Kafka.newRichClient(addr)))))
      partitionsLogSize <- Future.sequence(clients.zipWithIndex.map { tu =>
        val addr = tu._1._1
        val client = tu._1._2
        var offset = Future(0L)
        if (!addr.isEmpty) {
          offset = twitterToScalaFuture(client.offset(topicName, tu._2, OffsetRequest.LatestTime)).map(_.offsets.head).recover {
            case e => Logger.warn("Could not connect to partition leader " + addr + ". Error message: " + e.getMessage); 0L
          }
        }

        client.close()
        offset
      })
    } yield partitionsLogSize
  }
@ibanner56
Author

Note that the line numbers don't correspond to the master branch, since we added a few imports for other tasks. According to the stack trace, the specific lines in our version are:

        if (!addr.isEmpty) {
          offset = twitterToScalaFuture(client.offset(topicName, tu._2, OffsetRequest.LatestTime)).map(_.offsets.head).recover {
            case e => Logger.warn("Could not connect to partition leader " + addr + ". Error message: " + e.getMessage); 0L
          }
        }

@ibanner56
Author

Apparently this is due to the open-file ulimit being too low; 1024 is too small. We're going to raise our ulimit to the maximum for our server and see if that resolves the issue.
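
To see how close the JVM is to that limit from inside the process, the descriptor counters exposed by the JDK can be read directly. This is a minimal sketch, assuming a Unix-like JVM that provides `com.sun.management.UnixOperatingSystemMXBean`; the `FdUsage` object and its `snapshot` method are illustrative names, not part of kafka-web-console.

```scala
import java.lang.management.ManagementFactory
import com.sun.management.UnixOperatingSystemMXBean

// Hypothetical helper, not part of kafka-web-console.
object FdUsage {
  // Returns Some((open, max)) when the JVM exposes Unix descriptor counters, else None.
  def snapshot(): Option[(Long, Long)] =
    ManagementFactory.getOperatingSystemMXBean match {
      case unix: UnixOperatingSystemMXBean =>
        Some((unix.getOpenFileDescriptorCount, unix.getMaxFileDescriptorCount))
      case _ => None
    }

  def main(args: Array[String]): Unit = snapshot() match {
    case Some((open, max)) => println(s"open file descriptors: $open of $max")
    case None              => println("descriptor counters are not available on this JVM")
  }
}
```

Logging this periodically would show whether the open count climbs steadily toward the maximum, which is what a descriptor leak looks like, or merely spikes under load.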

@ibanner56
Author

Raising the ulimit seems to have extended how long we can run before the issue reappears, but the issue still persists.

@guihaojin

I got the same error.

@guihaojin

It looks like the web-console is leaking sockets. I saw the number of TCP connections to the Kafka brokers keep growing as the server runs. I'm not sure whether it's a problem with the web-console or with my Kafka/ZooKeeper setup.
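
One general way to keep per-request clients from accumulating, offered as an assumption and not as a diagnosis of the web-console code, is to tie each client's `close()` to the completion of the request it serves and to skip creating a client when there is no leader address. The sketch below uses only the Scala standard library; the `Client` trait, `newClient`, and the `open` counter are hypothetical stand-ins, not the finagle-kafka API.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical stand-in for a network client; not the finagle-kafka API.
trait Client {
  def latestOffset(topic: String, partition: Int): Future[Long]
  def close(): Unit
}

object ClosingClients {
  // Tracks how many fake clients are currently open, to show that none leak.
  val open = new AtomicInteger(0)

  // Fake factory: addresses starting with "bad" simulate an unreachable broker.
  def newClient(addr: String)(implicit ec: ExecutionContext): Client = {
    open.incrementAndGet()
    new Client {
      def latestOffset(topic: String, partition: Int): Future[Long] =
        if (addr.startsWith("bad")) Future.failed(new RuntimeException("unreachable: " + addr))
        else Future(42L)
      def close(): Unit = { open.decrementAndGet(); () }
    }
  }

  def logSizes(topic: String, leaders: Seq[String])(implicit ec: ExecutionContext): Future[Seq[Long]] =
    Future.sequence(leaders.zipWithIndex.map { case (addr, partition) =>
      if (addr.isEmpty) Future.successful(0L) // no leader known: do not open a client
      else {
        val client = newClient(addr)
        client.latestOffset(topic, partition)
          .recover { case _ => 0L }             // fall back to 0 on failure
          .andThen { case _ => client.close() } // runs on success and on failure
      }
    })

  def main(args: Array[String]): Unit = {
    import ExecutionContext.Implicits.global
    val sizes = Await.result(logSizes("test", Seq("a:9092", "", "bad:9092")), 5.seconds)
    println(sizes)
    println("clients still open: " + open.get)
  }
}
```

Because `andThen` completes its returned future only after the side effect has run, every client has been closed by the time the combined future resolves. Whether the real client's `close()` is synchronous, and whether it is the actual source of the leak reported here, would need to be checked against the library.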

@joelsvensson

I have the same problem with a 3-node ZooKeeper/Kafka cluster.

For each "[debug] application - Getting partition log sizes for topic test from partition leaders" log line, the number of established connections to Kafka grows by 15:

netstat -an | grep ESTA | grep 9092 | wc -l
15
netstat -an | grep ESTA | grep 9092 | wc -l
30
netstat -an | grep ESTA | grep 9092 | wc -l
45
netstat -an | grep ESTA | grep 9092 | wc -l
60
netstat -an | grep ESTA | grep 9092 | wc -l
75

This is from a fresh restart, without browsing the GUI.
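
The samples above grow by a constant amount, which makes it possible to estimate how long a given descriptor limit would last. This is a rough sketch using the counts from this comment and the 1024 limit mentioned earlier in the thread; applying that limit to this cluster is an assumption, and `LeakEstimate` is an illustrative name. Other open files also count against the limit, so the real margin is smaller.

```scala
// Hypothetical helper for back-of-the-envelope estimates; not part of kafka-web-console.
object LeakEstimate {
  // Differences between consecutive samples.
  def deltas(samples: Seq[Int]): Seq[Int] =
    samples.zip(samples.drop(1)).map { case (a, b) => b - a }

  // Number of further polls until `current` reaches or exceeds `limit`,
  // if each poll adds `perPoll` connections (rounded up).
  def pollsUntilLimit(current: Int, perPoll: Int, limit: Int): Int = {
    require(perPoll > 0, "perPoll must be positive")
    if (current >= limit) 0
    else (limit - current + perPoll - 1) / perPoll
  }

  def main(args: Array[String]): Unit = {
    val samples = Seq(15, 30, 45, 60, 75) // netstat counts quoted above
    println("growth per poll: " + deltas(samples).mkString(", "))
    println("polls left before 1024: " + pollsUntilLimit(samples.last, 15, 1024))
  }
}
```

With 15 new connections per poll and 75 already open, sockets alone would reach 1024 after 64 more polls, which fits the earlier observation that raising the ulimit only delays the crash.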

@ibanner56
Author

This is just another version of #30.
