Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐸 #36

Merged
merged 9 commits into from
May 19, 2018
Merged

🐸 #36

merged 9 commits into from
May 19, 2018

Conversation

lqd
Copy link
Member

@lqd lqd commented May 18, 2018

This switches from using @frankmcsherry’s differential dataflow to
using @frankmcsherry’s bespoke datalog engine: datafrog.

99% of this commit was authored by the one and only @frankmcsherry.

lqd added 2 commits May 18, 2018 23:02
This switches from using @frankmcsherry’s differential dataflow to
using @frankmcsherry’s bespoke datalog engine: datafrog.

99% of this commit was authored by the one and only @frankmcsherry.
5% perf for free
let requires_1 = iteration.variable("requires_1");
let requires_2 = iteration.variable("requires_2");
let requires_bp = iteration.variable("requires_bp");
let requires_rp = iteration.variable("requires_rp");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize for not noticing this before, but each of the requires_ variables can be variable_indistinct(). We just need requires itself (without a _) to be a variable() without _indistinct(). This has a solid perf benefit for me, taking the time from 9.43s to 7.99s.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid benefits for me as well, 10-15% 👍

@@ -12,8 +12,8 @@ arg_enum! {
#[derive(Debug, Clone, Copy)]
pub enum Algorithm {
Naive,
TimelyOpt,
LocationInsensitive,
DatafrogOpt,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, nice

subset.from_join(&subset_2, &region_live_at, |&(r2,q),&r1,&()| (r1,r2,q));
}

subset_r1p.complete()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to check my understanding here: it seems to me that the subset variable is not needed. We could just use subset_r1p instead? (That is, nothing directly joins from subset; the only use of it is to be mapped into different arrangements, and we could do those maps from subset_r1p just as easily, right?)

(cc @frankmcsherry)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answer: yes, I checked elsewhere, and even got a slight win from doing so (in datafrog-opt).


// since we're using subset mapped ((r, p), r) we can use it directly out of iteration 1
let subset_r1p = iteration2.variable::<((Region, Point), Region)>("subset_r1p");
subset_r1p.insert(subset);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "variable" is invariant, right? Do we really need to 'copy' the values from subset in here, or could we use subset directly?

(cc @frankmcsherry)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, let me answer my own question: it won't build if we don't, because from_join wants a Variable and not a Relation: presumably that could be modified in datafrog? (Or is this insert very cheap anyway?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I realize in this specific case we ought to collapse the iteration anyway.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this is also relevant, it seems to the "redundantly computed index" entries we see below)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inserting a sorted relation should be pretty cheap. It will still do linear work, confirming that none of subset exists in the empty list of prior tuples, which we could totally optimize. But it shouldn't be too expensive either way.

More generally, there a bunch of cases where the difference between Variable and Relation and between Key and (Key, ()) bite us a little. These can be picked off as appropriate; just some repetition in the library code in the interest of squeaking out some perf. :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, my mistake. It will not even do the linear work, I think, on account of self.stable should be empty. I could be wrong about that, and we could work through exactly what happens when you install a relation. Not a great amount of care was used putting the wrappers in place. :)

@nikomatsakis nikomatsakis merged commit 3b985d0 into rust-lang:master May 19, 2018
@nikomatsakis
Copy link
Contributor

nikomatsakis commented May 19, 2018

Final measurements:

Version Workers Polonius compile time clap analysis time
⏲️ 1 210s 14.191s
⏲️ 2 210s 11.035s
⏲️ 4 210s 8.898s
⏲️ 8 210s 7.953s
🐸 1 11s 6.843s

Amazing! Great work @frankmcsherry and @lqd!

@lqd lqd deleted the introducing-datafrog branch May 19, 2018 10:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants