Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More efficient database unmarshaling #775

Merged
merged 2 commits into from
Feb 13, 2019
Merged

More efficient database unmarshaling #775

merged 2 commits into from
Feb 13, 2019

Conversation

rtitle
Copy link
Collaborator

@rtitle rtitle commented Feb 13, 2019

I'm on-call and saw several Leo high CPU alerts on 2/12. I looked at some thread dumps and saw multiple threads here:

"leonardo-akka.actor.default-dispatcher-50705" #319469 prio=5 os_prio=0 tid=0x00007f4a5c04f800 nid=0x6b1d runnable [0x00007f49f6640000]
   java.lang.Thread.State: RUNNABLE
	at scala.collection.mutable.ListBuffer.$plus$eq(ListBuffer.scala:177)
	at scala.collection.mutable.ListBuffer.$plus$eq(ListBuffer.scala:44)
	at scala.collection.generic.Growable.loop$1(Growable.scala:53)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:58)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:50)
	at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:186)
	at scala.collection.immutable.List.$colon$colon$colon(List.scala:128)
	at cats.kernel.instances.ListMonoid.combine(list.scala:81)
	at cats.kernel.instances.ListMonoid.combine(list.scala:79)
	at cats.kernel.instances.TupleInstances$$anon$83.combine(TupleAlgebra.scala:245)
	at cats.kernel.instances.TupleInstances$$anon$83.combine(TupleAlgebra.scala:244)
	at cats.kernel.SemigroupFunctions.maybeCombine(Semigroup.scala:56)
	at cats.kernel.instances.MapMonoid.$anonfun$combine$1(map.scala:33)
	at cats.kernel.instances.MapMonoid$$Lambda$1580/442904133.apply(Unknown Source)
	at scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:157)
	at scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:157)
	at scala.collection.TraversableOnce$$Lambda$68/1374026904.apply(Unknown Source)
	at scala.collection.immutable.Map$Map1.foreach(Map.scala:125)
	at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:157)
	at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:155)
	at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104)
	at cats.kernel.instances.MapMonoid.combine(map.scala:32)
	at cats.kernel.instances.MapMonoid.combine(map.scala:26)
	at cats.Foldable.$anonfun$foldMap$1(Foldable.scala:172)
	at cats.Foldable$$Lambda$1578/1338520970.apply(Unknown Source)
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:122)
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:118)
	at scala.collection.immutable.List.foldLeft(List.scala:86)
	at cats.instances.ListInstances$$anon$1.foldLeft(list.scala:52)
	at cats.instances.ListInstances$$anon$1.foldLeft(list.scala:12)
	at cats.Foldable.foldMap(Foldable.scala:172)
	at cats.Foldable.foldMap$(Foldable.scala:26)
	at cats.instances.ListInstances$$anon$1.foldMap(list.scala:12)
	at cats.Foldable$Ops.foldMap(Foldable.scala:26)
	at cats.Foldable$Ops.foldMap$(Foldable.scala:26)
	at cats.Foldable$ToFoldableOps$$anon$3.foldMap(Foldable.scala:26)
	at org.broadinstitute.dsde.workbench.leonardo.db.ClusterComponent$clusterQuery$.unmarshalFullCluster(ClusterComponent.scala:421)
	at org.broadinstitute.dsde.workbench.leonardo.db.ClusterComponent$clusterQuery$.$anonfun$getActiveClusterByName$4(ClusterComponent.scala:182)
	at org.broadinstitute.dsde.workbench.leonardo.db.ClusterComponent$clusterQuery$$$Lambda$1768/1485543541.apply(Unknown Source)
	at slick.dbio.DBIOAction.$anonfun$map$1(DBIOAction.scala:43)
	at slick.dbio.DBIOAction$$Lambda$1370/1175847072.apply(Unknown Source)
	at slick.basic.BasicBackend$DatabaseDef.$anonfun$runInContextInline$1(BasicBackend.scala:171)
	at slick.basic.BasicBackend$DatabaseDef$$Lambda$1374/582919910.apply(Unknown Source)
	at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:303)
	at scala.concurrent.Future$$Lambda$610/1720451111.apply(Unknown Source)
	at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:37)
	at scala.concurrent.impl.Promise$$Lambda$607/1165895922.apply(Unknown Source)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
	at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
	at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:91)
	at akka.dispatch.BatchingExecutor$BlockableBatch$$Lambda$604/387462873.apply$mcV$sp(Unknown Source)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
	at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
	at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:91)
	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

A couple things stand out:

  1. The code is unmarshaling a DB response in ClusterComponent.getActiveClusterByName. This method is called in 2 places:
    • LeonardoService methods (create, get, update, delete)
    • ClusterDnsCache when it repopulates the cache (I suspect in practice it's mostly coming from here).
  2. It's stuck in ListMonoid. Scala lists are notoriously inefficient for everything except prepending.

So I made the following changes:

  1. ClusterDnsCache no longer calls the expensive getActiveClusterByName because it doesn't need all the extra join information. I added a new method getActiveClusterForDnsCache which doesn't do any joins.
  2. I switched from List to Chain where we are using the list monoid (note to do this I had to upgrade cats). Chain is supposedly much more efficient, though I haven't verified it. See:

Unit tests pass, will verify automation tests.

Have you read CONTRIBUTING.md lately? If not, do that first.

I, the developer opening this PR, do solemnly pinky swear that:

  • I've documented my API changes in Swagger

In all cases:

  • Get a thumbsworth of review and PO signoff if necessary
  • Verify all tests go green
  • Squash and merge; you can delete your branch after this
  • Test this change deployed correctly and works on dev environment after deployment


val workbenchUtilV = "0.3-0e9d080"
val workbenchUtilV = "0.5-6942040"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upgraded workbenchUtil to not pull in an older version of cats.

@@ -97,7 +97,7 @@ class HttpGoogleDataprocDAO(appName: String,
override def getClusterStatus(googleProject: GoogleProject, clusterName: ClusterName): Future[ClusterStatus] = {
val transformed = for {
cluster <- OptionT(getCluster(googleProject, clusterName))
status <- OptionT.pure[Future, ClusterStatus](
status <- OptionT.pure[Future](
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cats API change - OptionT no longer takes the second type parameter. Otherwise it's equivalent.

(googleProjectDAO.isProjectActive(googleProject.value) |@| googleProjectDAO.isBillingActive(googleProject.value))
.map(_ && _)
(googleProjectDAO.isProjectActive(googleProject.value), googleProjectDAO.isBillingActive(googleProject.value))
.mapN(_ && _)
Copy link
Collaborator Author

@rtitle rtitle Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This style is deprecated in cats:

(a |@| b).map(...)

They now recommend:

(a, b).mapN(...)

val errorList = errorRecordOpt.toList
val extList = extensionOpt.toList
val clusterImageList = clusterImageOpt.toList
val scopeList = scopeOpt.toList
Map(clusterRecord -> (instanceList, errorList, labelMap, extList, clusterImageList, scopeList))
Map(clusterRecord -> (Chain.fromSeq(instanceList), Chain.fromSeq(errorList), labelMap, Chain.fromSeq(extList), Chain.fromSeq(clusterImageList), Chain.fromSeq(scopeList)))
Copy link
Collaborator Author

@rtitle rtitle Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's where I'm using Chain, inside the foldMap. Hopefully this is more efficient than ListMonoid. See:

https://typelevel.org/cats/datatypes/chain.html
https://typelevel.org/blog/2018/09/04/chain-replacing-the-list-monoid.html
typelevel/cats#2429

@rtitle
Copy link
Collaborator Author

rtitle commented Feb 13, 2019

fiab-start failed. jenkins retest

@rtitle
Copy link
Collaborator Author

rtitle commented Feb 13, 2019

Cats still isn't happy. From auto tests:

Uncaught error from thread [leonardo-akka.actor.default-dispatcher-18]: cats.instances.package$list$.catsStdInstancesForList()Lcats/TraverseFilter;, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[leonardo]
java.lang.NoSuchMethodError: cats.instances.package$list$.catsStdInstancesForList()Lcats/TraverseFilter;
	at org.broadinstitute.dsde.workbench.google.HttpGoogleIamDAO.updatePolicy(HttpGoogleIamDAO.scala:249)
	at org.broadinstitute.dsde.workbench.google.HttpGoogleIamDAO.$anonfun$addIamRolesForUser$1(HttpGoogleIamDAO.scala:135)
	at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:303)
	at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:37)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
	at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
	at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:91)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
	at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
	at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:91)
	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Will investigate.

EDIT: probably need to update workbench-google in addition to workbench-util

@rtitle rtitle merged commit 991838b into develop Feb 13, 2019
@rtitle rtitle deleted the rt-dns-fix branch February 13, 2019 21:28
Qi77Qi pushed a commit that referenced this pull request Jun 25, 2019
* More efficient database unmarshaling

* Update workbenchGoogle to the latest
jchanbroad pushed a commit that referenced this pull request Jun 27, 2019
* More efficient database unmarshaling

* Update workbenchGoogle to the latest
jdcanas pushed a commit that referenced this pull request Jul 10, 2019
* More efficient database unmarshaling

* Update workbenchGoogle to the latest
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants