Add a test with a large DAG #27
Conversation
@johnynek nice, I have something similar, but split into separate cases.
@dieu @non please take a look. This passes (after a very long time) on my laptop. We should be able to optimize the giant dag case. Right now, we spin a long time constantly recomputing the fanOut function. This won't throw, but it can get very slow. I think we have some quadratic algorithms in play here currently (or possibly worse). We have a number of places where we do a linear search at each step (this should be N^2 or greater if we are looking at each position on the graph to apply a rule). We could possibly keep the memoizations across modifications to the DAG, or possibly find a way to avoid checking every node.
Three issues with this PR currently:

Item 3 seems the most intractable. Knowing the fanout of a node, or at least whether it is used exactly once, not at all, or more than once, seems needed for many optimizations. However, I have not yet found a good algorithm to incrementally update this property. Currently, we just recompute the entire reverse dependency graph every time the graph changes. If we could find an incremental algorithm to compute fanout (or the three-state fanout: 0, 1, >1), I think we will have solved our problems.
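The three-state fanout described above can be sketched in a few lines. This is a toy illustration with assumed names, not dagon's API: given the edge list of the graph, count how many times each node is referenced, capping the count at 2 since rules only need to distinguish 0, 1, and more-than-1. Recomputing this is O(E) on every graph change, which is exactly the cost an incremental algorithm would avoid.

```scala
// Toy sketch (not dagon's API): three-state fanout from an edge list.
// An edge (n, d) means node n depends on node d.
def fanOut[A](edges: List[(A, A)]): Map[A, Int] =
  edges.foldLeft(Map.empty[A, Int]) { case (acc, (_, dep)) =>
    // cap at 2: rules only care about 0, 1, or >1 uses
    acc.updated(dep, math.min(acc.getOrElse(dep, 0) + 1, 2))
  }
```

An incremental version would adjust only the counts of nodes touched by a rewrite, rather than refolding every edge.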
I ran this with yourkit... I think the only way we will get a lot better is maintaining data structures and not recomputing them. Right now, basically all the time is spent in HashMap lookups. It's pretty hard to maintain many of the views, since if a rule changes an Id from one Node to another, that can rewrite a lot of the graph.
Okay, this change gets the giant test passing, but other tests still fail, which I need to fix. This graph turns out to be highly pathological for the current code. We were going in order from smallest Id to largest, but that tends to visit nodes from source to sink. Working from sinks to sources is a better direction, since the rules can see bigger subgraphs and potentially work on those. By leveraging a recursive rule, I took the complexity from O(N^2) to O(N), I think (or maybe from N^3 to N^2). Previously we modified the graph N times, with each of those modifications doing O(N) work. Now we modify it once, do O(N) work on that modification, and then follow up with O(N) work. This hints at a better way to write rules: always apply them top down. That is a bigger change.
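The "apply rules top down in one pass" idea can be sketched on a toy expression type (illustrative names, not dagon's types): rewrite children first, memoize results per node, and apply the rule at each node exactly once, so a whole pass is O(N) rather than rescanning the graph after every single modification.

```scala
import scala.collection.mutable

// Toy AST and a single constant-folding rule
sealed trait E
case class Num(n: Int) extends E
case class Add(l: E, r: E) extends E

def rule(e: E): E = e match {
  case Add(Num(a), Num(b)) => Num(a + b) // fold constants
  case other               => other
}

// One memoized pass: each node is visited and rewritten exactly once.
// (A real implementation would key the memo by Id, not structural equality.)
def rewriteAll(root: E): E = {
  val memo = mutable.HashMap.empty[E, E]
  def go(e: E): E = memo.getOrElseUpdate(e, e match {
    case Num(_)    => rule(e)
    case Add(l, r) => rule(Add(go(l), go(r)))
  })
  go(root)
}
```

Because children are rewritten before the rule fires on the parent, a recursive rule like constant folding collapses an entire chain in one pass.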
@non @dieu could you review this? I think it is mergeable after we figure out the CI issues that are currently making the giant test time out on scalajs. I think the usual approach here is to make the test smaller on scalajs, but that seems to require some sbt hijinks, which of course will cause me to spin out on how much I hate having to deal with that. Actually, maybe just adding a test dependency on catalysts works:
Okay, this passes. @dieu can I get a review?
private val nodeToId: HCache[N, Lambda[t => Option[Id[t]]]] =
  HCache.empty[N, Lambda[t => Option[Id[t]]]]
// Caches polymorphic functions of type Literal[N, T] => Option[Id[T]]
private val litToId: HCache[Literal[N, ?], Lambda[t => Option[Id[t]]]] =
This can potentially be a problem, because Literal represents the structure of a user graph and can be as deep as the user graph. If HCache uses hashCode and equals for lookups, it can lead to a StackOverflow.
I don't think so, since we are not triggering it now. The trick is that we memoize construction of Literal, so we can leverage reference equality on them for fast equals. This works as long as the user uses Memoize, as we have done here, to construct the Literals. I don't think we need to go nuts being defensive; we just need to make sure there is a usable path for users that need giant graph support.
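The memoized-construction trick described above is essentially hash-consing. Here is a minimal sketch with toy types (not dagon's actual API): interned nodes are canonical, so a node's identity is determined by the `eq`-identity of its already-interned children, and structural equality never has to recurse.

```scala
import scala.collection.mutable

sealed trait Lit
final case class ConstL(x: Int) extends Lit
final class PairL(val a: Lit, val b: Lit) extends Lit

object Intern {
  private val consts = mutable.HashMap.empty[Int, ConstL]

  // Children are interned before parents, so a pair is identified by
  // reference equality (eq) of its children -- no deep recursion.
  private final class Key(val a: Lit, val b: Lit) {
    override def hashCode: Int =
      System.identityHashCode(a) * 31 + System.identityHashCode(b)
    override def equals(o: Any): Boolean = o match {
      case k: Key => (k.a eq a) && (k.b eq b)
      case _      => false
    }
  }
  private val pairs = mutable.HashMap.empty[Key, PairL]

  def const(x: Int): ConstL = consts.getOrElseUpdate(x, ConstL(x))
  def pair(a: Lit, b: Lit): PairL =
    pairs.getOrElseUpdate(new Key(a, b), new PairL(a, b))
}
```

Because structurally equal subgraphs come back as the same object, `eq` becomes a sound fast path for equals, which is what makes the HCache lookups cheap.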
Note, hashCode is cached for Literal nodes.
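Caching hashCode on an AST node can be sketched like this (illustrative type, not the actual Literal source): the hash is computed at most once and every later map lookup reuses it.

```scala
final case class FlowNode(name: String, deps: List[FlowNode]) {
  // Computed lazily, at most once; defining hashCode here suppresses
  // the case-class-generated version, which would recompute every call.
  private lazy val cachedHash: Int =
    scala.util.hashing.MurmurHash3.productHash(this)
  override def hashCode: Int = cachedHash
}
```

Note that caching only avoids recomputation: the first computation still recurses into children, which is why deep graphs can still overflow the stack on the very first hashCode call (as the trace below shows).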
Some weird stuff: if I run the test as test:testOnly *DataFlowTest, then the tests pass. But if I run it as test:testOnly *DataFlowTest -- -t "test a giant graph", then I get a StackOverflow:
[info] at com.stripe.dagon.DataFlowTest$Flow$ComposedOM.hashCode(DataFlowTest.scala:196)
[info] at scala.runtime.Statics.anyHash(Statics.java:115)
[info] at scala.util.hashing.MurmurHash3.productHash(MurmurHash3.scala:64)
[info] at scala.util.hashing.MurmurHash3$.productHash(MurmurHash3.scala:211)
[info] at scala.runtime.ScalaRunTime$._hashCode(ScalaRunTime.scala:145)
[info] at com.stripe.dagon.DataFlowTest$Flow$ComposedOM.hashCode(DataFlowTest.scala:196)
[info] at scala.runtime.Statics.anyHash(Statics.java:115)
[info] at scala.util.hashing.MurmurHash3.productHash(MurmurHash3.scala:64)
[info] at scala.util.hashing.MurmurHash3$.productHash(MurmurHash3.scala:211)
[info] at com.stripe.dagon.DataFlowTest$Flow.<init>(DataFlowTest.scala:33)
[info] at com.stripe.dagon.DataFlowTest$Flow$OptionMapped.<init>(DataFlowTest.scala:108)
[info] at com.stripe.dagon.DataFlowTest$Flow$composeOptionMapped$.compose(DataFlowTest.scala:279)
[info] at com.stripe.dagon.DataFlowTest$Flow$composeOptionMapped$.$anonfun$apply$5(DataFlowTest.scala:285)
[info] at com.stripe.dagon.Rule$$anon$1.$anonfun$apply$1(Rule.scala:27)
[info] at com.stripe.dagon.Rule$$anon$1.$anonfun$apply$1(Rule.scala:27)
[info] at com.stripe.dagon.Rule$$anon$1.$anonfun$apply$1(Rule.scala:27)
[info] at com.stripe.dagon.Rule$$anon$1.$anonfun$apply$1(Rule.scala:27)
[info] at com.stripe.dagon.Rule$$anon$1.$anonfun$apply$1(Rule.scala:27)
[info] at com.stripe.dagon.Dag$$anon$2.$anonfun$toFunction$1(Dag.scala:109)
[info] at com.stripe.dagon.Dag.go$1(Dag.scala:155)
[info] at com.stripe.dagon.Dag.$anonfun$applyOnce$1(Dag.scala:157)
[info] at scala.collection.Iterator$$anon$10.next(Iterator.scala:448)
[info] at scala.collection.TraversableOnce.collectFirst(TraversableOnce.scala:145)
[info] at scala.collection.TraversableOnce.collectFirst$(TraversableOnce.scala:132)
[info] at scala.collection.AbstractIterator.collectFirst(Iterator.scala:1417)
[info] at com.stripe.dagon.Dag.applyOnce(Dag.scala:159)
[info] at com.stripe.dagon.Dag.loop$1(Dag.scala:79)
[info] at com.stripe.dagon.Dag.apply(Dag.scala:84)
[info] at com.stripe.dagon.DataFlowTest.$anonfun$new$87(DataFlowTest.scala:863)
That is weird. What do you propose we do about it? The test passes as part of the suite.
By the way, the stack depth can differ depending on how we run it, and it may be that this way of running increases the depth just enough... Note that the stack overflow is in the hashCode method of the test code. We know about this issue already: your AST nodes should have stack-safe hashCode and equals. As mentioned before, this can be made somewhat easier in a follow-up PR.
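One discipline that makes hashCode stack-safe is a sketch like the following (toy types, not the test's actual Flow AST): each node computes its hash eagerly at construction from its children's already-computed hashes. Since children always exist before their parent, every hash computation is O(1) and no call ever recurses into a child.

```scala
sealed abstract class Ast {
  // Set once at construction; reading it never recurses.
  def cachedHash: Int
  override def hashCode: Int = cachedHash
}
final case class LeafA(x: Int) extends Ast {
  val cachedHash: Int = x * 31 + 1
}
final case class BranchA(l: Ast, r: Ast) extends Ast {
  // Children were hashed when they were built, so this is O(1).
  val cachedHash: Int = (l.cachedHash * 31 + r.cachedHash) * 31 + 2
}
```

The same idea applies to equals: compare cachedHash first as a cheap inequality test, then walk the structure with an explicit stack instead of method recursion.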
val lit = toLiteral(node)
try ensureFast(lit)
catch {
  case _: Throwable => //StackOverflowError should work, but not on scala.js
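The fast-path/fallback pattern in the snippet above can be shown on a toy example (illustrative names, not dagon's methods): try the stack-recursive version first, and only fall back to a heap-based loop when the stack blows. Catching Throwable rather than StackOverflowError mirrors the scala.js caveat in the comment.

```scala
sealed trait LL
case object Nil0 extends LL
final case class Cons0(h: Int, t: LL) extends LL

// Fast: plain recursion on the JVM stack; overflows on deep input.
def sizeFast(l: LL): Int = l match {
  case Nil0        => 0
  case Cons0(_, t) => 1 + sizeFast(t)
}

// Slow: an explicit loop, depth-independent.
def sizeSlow(l: LL): Int = {
  var n = 0
  var cur = l
  var done = false
  while (!done) cur match {
    case Nil0        => done = true
    case Cons0(_, t) => n += 1; cur = t
  }
  n
}

// On the jvm this catches StackOverflowError; scala.js surfaces a
// different error, hence the broad Throwable.
def size(l: LL): Int =
  try sizeFast(l)
  catch { case _: Throwable => sizeSlow(l) }
```

The cost of this design is that shallow graphs pay nothing, and only pathologically deep graphs pay the slow path.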
Could we find out which error is happening in scala.js?
Yes, but it isn't the same error on the jvm (it was in the travis CI output). I don't think it is worth making some special build that catches one kind of error (which does not exist on the jvm) on one platform, and another for scalajs.
You could do this with a custom build of this method, but honestly, I don't think it is a problem. If there is a user error, they will hit it in both branches, and there will be an exception in the second branch.
You're right, it's probably not that important.
/*
 * We *non-recursively* use either the fast approach or the slow approach
 */
Memoize.functionK[Literal[N, ?], N](new Memoize.RecursiveK[Literal[N, ?], N] {
Potentially the same problem here with caching Literal.
/*
 * You need a custom equals to avoid stack overflow
 */
override def equals(that: Any) = {
We probably want to give our users some easier way to define this, or avoid using equals and hashCode in our code.
I have a follow-up PR to make this easier, but honestly, this library is basically for compiler-like things. It is not some general-purpose thing that comes up all the time. I don't mind if those users have to think for a few minutes about best practices. Giving up performance across the board (which almost any generic solution will do) to make things easier might not be a good trade.
Note, the current changes are about making it possible to use with arbitrary graphs, i.e. fixing the internal issues IF the user makes sane equals and hashCode methods on their type.
@johnynek I need more time to play with it; I will come back to you in the next 2 days. I want to double check that we don't have a problem with caching.
Thanks. Note we pass the tests which exercise the caching.
Codecov Report
@@ Coverage Diff @@
## master #27 +/- ##
=========================================
- Coverage 85.01% 84.22% -0.8%
=========================================
Files 13 13
Lines 267 374 +107
Branches 18 21 +3
=========================================
+ Hits 227 315 +88
- Misses 40 59 +19
Continue to review full report at Codecov.
One side note: in one of my iterations to avoid recursion, I came up with something like:
and usage:
The general idea is linearizing the tree into a list of items and then processing them one by one; the same way we can do equals and hashCode as well. Not sure if you like it, but for the user it looks easy to write this kind of thing. PS: instead of a stack, we can use
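The linearization idea above can be sketched on a toy tree (illustrative types, not the library's): flatten the tree with an explicit heap-allocated stack and fold the items into a hash, so the JVM stack depth never grows with the tree depth.

```scala
sealed trait Tree
final case class LeafT(x: Int) extends Tree
final case class BranchT(l: Tree, r: Tree) extends Tree

def iterativeHash(root: Tree): Int = {
  var todo: List[Tree] = root :: Nil // explicit stack on the heap
  var acc = 17
  while (todo.nonEmpty) {
    val node = todo.head
    todo = todo.tail
    node match {
      case LeafT(x) => acc = acc * 31 + x
      case BranchT(l, r) =>
        acc = acc * 31 + 1 // tag, so tree shape affects the hash
        todo = l :: r :: todo
    }
  }
  acc
}
```

Equal trees linearize to the same item sequence and thus the same hash; equals can be written the same way by walking two linearizations in lockstep.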
 * This does recursion on the stack, which is faster, but can overflow
 */
protected def ensureFast[T](lit: Literal[N, T]): (Dag[N], Id[T]) =
  findLiteral(lit, lit.evaluate) match {
Should we use the saved version, Literal.evaluateMemo, here to maintain the cache of evaluated literals?
This is a great suggestion. Thanks.
/*
 * This does recursion on the stack, which is faster, but can overflow
 */
protected def ensureFast[T](lit: Literal[N, T]): (Dag[N], Id[T]) =
ensure is basically converting Literal to Expr, right? What do you think about removing the Expr layer and merging Literal and Expr together?
Expr has Ids, which allow us to rewrite nodes; Literal has no Ids. It's not totally obvious to me how to merge them in a useful way. The point of adding the Ids is so we can rewrite the graph without losing the types. If users have to construct the Expr directly, they have a hard problem on their hands (basically they need to write ensure and the Id tracking in user code).
Do you have a concrete idea of how to merge them?
@dieu I have taken your suggestion about sharing the memoization of the Literal-to-N mapping across the entire ensure operation. What do you think about merging this, and you can follow up with more improvements along the lines you are suggesting? I think this addresses the core problem: making it possible to use dagon with giant graphs. Note it also does so in a binary-compatible way.
Sounds good to me.
@dieu take a look...
This reproduces the stack overflow we have seen in scalding.