-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐸 #36
🐸 #36
Conversation
This switches from using @frankmcsherry’s differential dataflow to using @frankmcsherry’s bespoke datalog engine: datafrog. 99% of this commit was authored by the one and only @frankmcsherry.
5% perf for free
src/output/datafrog_opt.rs
Outdated
let requires_1 = iteration.variable("requires_1"); | ||
let requires_2 = iteration.variable("requires_2"); | ||
let requires_bp = iteration.variable("requires_bp"); | ||
let requires_rp = iteration.variable("requires_rp"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apologize for not noticing this before, but each of the requires_
variables can be variable_indistinct()
. We just need requires
itself (without a _
) to be a variable()
without _indistinct()
. This has a solid perf benefit for me, taking the time from 9.43s to 7.99s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid benefits for me as well, 10-15% 👍
Another 10-15% for free
The old code dropped `p` and renamed `q` to `p`, which confused me.
@@ -12,8 +12,8 @@ arg_enum! { | |||
#[derive(Debug, Clone, Copy)] | |||
pub enum Algorithm { | |||
Naive, | |||
TimelyOpt, | |||
LocationInsensitive, | |||
DatafrogOpt, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol, nice
src/output/naive.rs
Outdated
subset.from_join(&subset_2, ®ion_live_at, |&(r2,q),&r1,&()| (r1,r2,q)); | ||
} | ||
|
||
subset_r1p.complete() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to check my understanding here: it seems to me that the subset
variable is not needed. We could just use subset_r1p
instead? (That is, nothing directly joins from subset
; the only use of it is to be mapped into different arrangements, and we could do those maps from subset_r1p
just as easily, right?)
(cc @frankmcsherry)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Answer: yes, I checked elsewhere, and even got a slight win from doing so (in datafrog-opt).
|
||
// since we're using subset mapped ((r, p), r) we can use it directly out of iteration 1 | ||
let subset_r1p = iteration2.variable::<((Region, Point), Region)>("subset_r1p"); | ||
subset_r1p.insert(subset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This "variable" is invariant, right? Do we really need to 'copy' the values from subset
in here, or could we use subset
directly?
(cc @frankmcsherry)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, let me answer my own question: it won't build if we don't, because from_join
wants a Variable
and not a Relation
: presumably that could be modified in datafrog? (Or is this insert
very cheap anyway?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I realize in this specific case we ought to collapse the iteration anyway.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(this is also relevant, it seems to the "redundantly computed index" entries we see below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inserting a sorted relation should be pretty cheap. It will still do linear work, confirming that none of subset
exists in the empty list of prior tuples, which we could totally optimize. But it shouldn't be too expensive either way.
More generally, there a bunch of cases where the difference between Variable
and Relation
and between Key
and (Key, ())
bite us a little. These can be picked off as appropriate; just some repetition in the library code in the interest of squeaking out some perf. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, my mistake. It will not even do the linear work, I think, on account of self.stable
should be empty. I could be wrong about that, and we could work through exactly what happens when you install a relation. Not a great amount of care was used putting the wrappers in place. :)
Final measurements:
Amazing! Great work @frankmcsherry and @lqd! |
This switches from using @frankmcsherry’s differential dataflow to
using @frankmcsherry’s bespoke datalog engine: datafrog.
99% of this commit was authored by the one and only @frankmcsherry.