Parallelize loading include objects #6501
base: alpha
Conversation
Hi @noahsilas. Have you thought about how to profile this stuff so we can get some before/after benchmarks? Gonna try and digest this.
Codecov Report

```
@@            Coverage Diff            @@
##           master    #6501     +/-  ##
=========================================
+ Coverage   93.87%   93.88%   +0.01%
=========================================
  Files         169      169
  Lines       11968    11961       -7
=========================================
- Hits        11235    11230       -5
+ Misses        733      731       -2
```

Continue to review full report at Codecov.
```javascript
}
// Construct a graph of promises, ensuring that we don't try to load
// any path until its prefix has finished loading.
const promises = { '': Promise.resolve() };
```
I don't understand this idiom. Trying to figure out how I can figure it out :).
This is a little bit tricky, I could probably comment this more clearly!
What this is intending is that entries in `this.include` that have no nesting will start processing right away, by chaining a `then` handler on to this resolved promise.

For an include with no nesting, its entry in `this.include` will be an array with one element: `['user']`, for instance. From that path we extract a prefix, which in this case will be `[].join('.')`, the empty string `''`. We would find the above base case with `promises['']`.
It might be even clearer to write this as an inline default:
```javascript
const promises = {};
this.include.forEach(path => {
  const prefix = path.slice(0, -1).join('.');
  const loadAfter = promises[prefix] || Promise.resolve();
  // ...
});
```
That seems like it is more idiomatic JS, I think I'll probably follow up and switch it over.
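Pulling that idea together into a self-contained sketch (names like `buildIncludeGraph` and `loadPath` are hypothetical illustrations; the real implementation passes more context to each load):

```javascript
// Sketch of the "inline default" variant: top-level paths have the
// empty-string prefix, find no entry in the map, and fall back to an
// already-resolved promise, so they start loading immediately.
function buildIncludeGraph(include, loadPath) {
  const promises = {};
  include.forEach(path => {
    // The prefix of ['post', 'author'] is 'post'; the prefix of a
    // top-level path like ['tag'] is the empty string ''.
    const prefix = path.slice(0, -1).join('.');
    const loadAfter = promises[prefix] || Promise.resolve();
    promises[path.join('.')] = loadAfter.then(() => loadPath(path));
  });
  // Waiting on every node is equivalent to waiting on just the leaves.
  return Promise.all(Object.values(promises));
}
```

With `include=post,post.author,tag`, the `post.author` load chains off the `post` promise, while `post` and `tag` start right away.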
```javascript
// Construct a graph of promises, ensuring that we don't try to load
// any path until its prefix has finished loading.
const promises = { '': Promise.resolve() };
this.include.forEach(path => {
```
I think that having some kind of concurrency limiter would be a plus. Something that mimics what `bluebird.map` offers as an option?

I could imagine a scenario where you have an array of pointers as a field and we're including for each of those objects, and you could end up with a dog pile of queries (this is off the cuff; I'd have to think through how that would actually work...).
I think that the way this works is that we'll end up issuing one query for each include path, regardless of how many objects it is fetching. For each path we gather all the pointers at that path, and then do one subquery with `where: { objectId: { $in: objectIds } }`.
That said, there's a degenerate case where someone is trying to load in tons of include paths that could be dangerous, and a concurrency limiter might be a helpful thing to add. I don't love the idea of adding another promise implementation as a dependency. I could write a similar helper for native promises, but I wonder if someone has already created a nice implementation we can reuse... I'll look around a bit!
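For reference, a concurrency limiter over native promises can be quite small. This is a hypothetical sketch in the spirit of `bluebird.map`'s `concurrency` option, not code from this PR:

```javascript
// Returns a wrapper that runs at most `limit` tasks at a time.
// Each task is a function returning a promise; extra tasks queue up
// and start as earlier ones settle.
function limitConcurrency(limit) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= limit || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => { active--; next(); });
  };
  return task => new Promise((resolve, reject) => {
    queue.push({ task, resolve, reject });
    next();
  });
}
```

Each include's subquery could then be wrapped, e.g. `const limited = limitConcurrency(4); limited(() => includePath(path))`, to cap the fan-out.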
I also suspect that this could be written more naturally as

```javascript
const promises = this.include.reduce(() => {}, Promise.resolve())
```

but I'm still trying to grok, so feel free to disregard... :).
> `const promises = this.include.reduce(() => {}, Promise.resolve())`
I think this is conceptually similar to what's going on in the existing implementation, in that it will wait for each include to finish loading before starting the next one, which leads to the sort of "waterfall" that I'd like to avoid.
The thought behind this implementation is that the include paths form a sort of tree. All the includes that have only one path component can be unblocked at once, since none of them have dependencies on any data that wasn't part of the initial query.
From there, as each of those paths resolves, they can unblock any nested includes that are one path component deeper; for instance, if you `include` `post,post.author,post.category`, then we would start a subquery for pointers at the `post` path immediately, and as soon as that resolves, we would ungate the two subqueries that rely on it: `post.author` and `post.category`.
> I don't love the idea of adding another promise implementation as a dependency.

Yes, I'm certainly not suggesting requiring bluebird, just an example...
I think what you're doing here and the initiative you're taking is awesome. Honestly, I'm just trying to keep up!
My comments, as I've tried to point out, are me just thinking out loud as I try to process what you're presenting. Not that your replies sound defensive AT ALL, just want to make it clear :).
I think I should make a test case or two and step through the code so I can be sure that I understand, so I can then be helpful in reviewing this PR. I'll do it, but it may take me a few days.
Having said all that, here's some feedback on your responses.
1. I don't think you're on the same page with me on using `reduce` vs. `forEach`. Using `reduce` doesn't necessitate waiting before proceeding to the next iteration, and that wasn't what I had in mind. My only point is that if you're declaring a variable that you then populate in a loop, it's a candidate for `reduce` (or `map`).
2. I wasn't thinking explicitly about an array of different types; I was more thinking about an array of a dozen or so pointers of the same class, each of which has three or four pointer fields that are all included, and each of those has another one or two objects to include. In that situation, would this change introduce a potential dog pile? I don't know the answer, and I'd have to make a test and step through it to have any clue if I'm onto something or not... (you may already have an idea, but I don't).
Thank you so much for your feedback! It's definitely hard to convey tone in github comments, and I'm used to leaving reviews on stuff at work where I have preexisting relationships with the reviewers, so getting back into the habit of being a little less terse would probably help me convey my thoughts more effectively, haha.
Thanks for your attention and thoughtfulness on my PRs. I notice your profile says you're in SF - maybe I can get you a coffee or a drink sometime to show my appreciation!
And now, into the responses!
Re: 1
> `const promises = this.include.reduce(() => {}, Promise.resolve())`

Because the initial value is a Promise, I thought you were suggesting using the reducer to create a promise chain, like this:

```javascript
const promise = this.include.reduce(
  (runAfter, path) => runAfter.then(() => includePath(path, ...)),
  Promise.resolve()
);
await promise;
```
But it sounds like you are actually thinking about the `reduce` as a replacement for the usage of `forEach` to construct the object that indexes the promises by path. Is something like this what you're thinking about?

```javascript
const promises = this.include.reduce((acc, path) => {
  const parentDottedPath = path.slice(0, -1).join('.');
  const myDottedPath = path.join('.');
  const runAfter = acc[parentDottedPath] || Promise.resolve();
  acc[myDottedPath] = runAfter.then(() => includePath(path, ...));
  return acc;
}, {});
// note: await the promises themselves, not their path keys
await Promise.all(Object.values(promises));
```
I'm happy to make such a switch. I find myself often waffling on the usage of `reduce` when building up an object: using `reduce` nicely signifies "I am building a single value", which can help readability, but the ergonomics of it feel just a little off to me. I think it might be something about the type of the accumulator not being obvious until you reach the `initialValue` argument, which is sort of dangling when the reducer function is inline this way. Maybe that suggests I should add some type annotations?

```javascript
type PromiseMap = { [dottedPath: string]: Promise<void> };

const promises = this.include.reduce((acc: PromiseMap, path: Array<string>) => {
  // ...
}, {});
```
Let me know what your preferences here are, and I'm happy to incorporate that feedback!
Re: 2
So, it sounds like you're envisioning a scenario like:
```javascript
class Item extends Parse.Object {}

// imagine that there are many "items" with characteristics like this:
const children = _.times(100, idx => new Item({ number: idx }));
const parent = new Item({ number: 0, children });
await parent.save();

// initiate a query that starts prefetches on those big arrays, potentially recursing down
new Parse.Query(Item).include('children.children').find();
```

It seems like you're worried about a scenario like this, where we have huge fan-out in the number of documents fetched via `include`: our base query could get 100 documents, which could each reference 100 documents, which could each reference 100 documents, and that's potentially a million rows that we would be fetching.
This is certainly a legitimate concern! I think it's somewhat alleviated by how this is already implemented. At a high level, when we are doing an include on a path, the code looks roughly like this:
```javascript
async includePath(path, object) {
  const pointers = recursivelyFindPointersAtPath(path, object);
  const groups = _.groupBy(pointers, p => p.className);
  const subqueries = _.map(groups, (pointers, className) => {
    const where = { objectId: { $in: pointers.map(p => p.objectId) } };
    return get(`/classes/${className}`, { where });
  });
  const responses = await Promise.all(subqueries);
  responses.forEach(response => substitutePointers(path, object, response));
}
```
So, for each included path, we do at most one query per included class, regardless of how many included pointers there are. In this step, we're also already letting the number of concurrent queries fan out to the number of distinct classes that we found pointers to in this path. This seems like another good place for a concurrency limit!
While looking at this, I noticed something that strikes me as a potentially related problem that I'd like to address later: it seems like the subquery is going to use the default `limit` of 100 rows. In that scenario, we'll end up dropping pointers to the objects that didn't fit in that batch 😨

But that feels like a sort of separate problem, and is probably one that requires some kind of careful decision. It would definitely be a breaking change if some queries suddenly started returning something like `ParseError.TOO_MANY_INCLUDED_OBJECTS`.
Whew! That's a lot of words about how I'm thinking about these things. I think I'll try to verify the problem around too many included objects and open an issue about it tomorrow so we can discuss options about solving that separately.
Thanks again for your time and energy looking in to this with me! I have a couple high traffic queries that include something like 4 related objects, so I'm personally pretty excited about the potential performance boost here 😁
Oof, looks like I was wrong about the subquery being limited. It seems like the default `limit: 100` is being applied by the `ClassesRouter` when the query is received by the server. This nested query doesn't traverse through that level of the stack, and so doesn't get any limit applied. So, the good news is that `include` should always find all the related documents! The bad news is that with a high degree of fan-out, this could become an enormous fetch.
But that's another issue, right? I'm not worried about that problem; we can't prevent every user from shooting themselves in the foot, and I don't think your good change exacerbates the issue beyond the concurrency limit.

The appropriate tool for large nested objects is a relation, and I think the documentation makes that clear.

My only concern at this point with the change, which we definitely want, is to make sure that when we release this, queries that are maybe sluggish but work don't start bringing down data servers because we're doing 1k queries simultaneously instead of sequentially.
On the loop vs. reduce issue, you clearly understand my thought, so I'll now defer to your judgment. I also appreciate your detailed and thoughtful response.
I'm gonna step through your code today to make sure I'm properly grokking it.
PS: we work DIRECTLY across Market Street from each other. In a couple of weeks when things get back to normal (hopefully), I'd love to meet up. In the meantime, if you have any spare cycles at all, I'd like to add you to the parse-server and js-sdk teams (do you use any of the other SDKs?). Also, you can contact me directly on the parseopensource Slack channel if it's helpful in the meantime.
> But that's another issue, right?

Yeah, the giant fan-out for large include lists is already present and unrelated to this change. I'm still going to think about this a little bit, but I think you're probably right that it's up to users to protect themselves from this. Maybe the follow-up here will be to add a note to the documentation later?
> when we release this, queries that are maybe sluggish but work, don't start bringing down data servers cause we're doing 1k queries simultaneously instead of sequentially
Ok, I think I understand! I think the expected fan-out that this introduces is roughly the largest number of included paths at a given depth. So, `include('a.b.c', 'd.e.f')` should have a concurrency of 2 (since there are exactly two chains of dependencies), while `include('foo', 'bar', 'baz', 'biz')` would have a concurrency of 4 (all of those includes are at depth 1 and can be processed in parallel).
The worst case concurrency should be equal to the number of include paths. As an example, `include('a', 'b.foo', 'b.bar', 'b.baz')` would start with a concurrency of 2 (`a` and `b`); if the query for `a` is still running when the query for `b` completes, that would start the queries for the three fields that are dependent on `b`, and we would have a concurrency of 4.
I think that having hundreds of paths in your `include` is a bit of a degenerate case, but certainly there's nothing stopping someone from using parse-server in that way. Maybe this is another good thing to add to the documentation around the `include` field?
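The per-depth reasoning above can be sanity-checked with a tiny helper (hypothetical, just to illustrate the bound; as the comment notes, overlapping levels can still push the instantaneous concurrency up toward the total path count):

```javascript
// Counts the include paths at each depth and returns the widest level,
// a rough estimate of the steady-state concurrency this change introduces.
function concurrencyByDepth(includePaths) {
  const counts = {};
  includePaths.forEach(path => {
    const depth = path.split('.').length;
    counts[depth] = (counts[depth] || 0) + 1;
  });
  return Math.max(...Object.values(counts));
}
```

For example, `concurrencyByDepth(['a.b.c', 'd.e.f'])` gives 2 and `concurrencyByDepth(['foo', 'bar', 'baz', 'biz'])` gives 4, matching the two cases above.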
One thing I'm noticing as I've been pushing patches on to this branch to make the tests all pass is that the traversal stuff is actually quite complex. I'm thinking that it might be worthwhile to drop that from this PR. I tried a bit more of a minimalistic approach, and that looks something like this: https://github.com/Hustle/parse-server/commits/concurrent-includes-minimal (note particularly 87d6d91). Does that seem like a preferable approach? I think it's sort of hard to interpret some of the trickiness in `replacePointers`, but it certainly seems less likely to introduce subtle breakages. I'd love your thoughts about the different approaches!

I've also added 03c02cd which hopefully will make it easier to step through what's going on in `handleInclude`. Let me know how that reads to you!
I'm mostly working in JS these days, so happy to keep contributing to parse-server and the JS SDK! What does it mean to be on those teams?
My fingers are crossed that things get back to normal fast! Looking forward to meeting you sometime then 😄
I stepped through it and feel real good about it. @noahsilas you ready for me to merge this?
One thing I'm noticing as I've been pushing patches on to this branch to make the tests all pass is that the traversal stuff is actually quite complex. I'm thinking that it might be worthwhile to drop that from this PR. I tried a bit more of a minimalistic approach, and that looks something like this: https://github.com/Hustle/parse-server/commits/concurrent-includes-minimal (note particularly 87d6d91). Does that seem like a preferable approach? I think it's sort of hard to interpret some of the trickiness in replacePointers, but it certainly seems less likely to introduce subtle breakages. Any thoughts?
One available option is for us to hold off a bit on merging this into the mainline, and I can merge a copy of it into @Hustle's fork. We have some monitoring set up on our deployment, and I'll check with @TylerBrock, but I think we should be able to share some of the results from that here. It's unfortunately not quite a true test: we're still using an older version (though I'm going to work on bringing that up to current later). But this code is pretty much unchanged since then, so I think that any changes we see in our branch should be fairly representative of what we would expect with this patch? It's certainly not perfect, as our workload is probably somewhat atypical, but it would be something! Thanks for all your time on this @acinader!
I had really been ignoring the traversal and replacement stuff and just focused on the concurrent queries. I took a quick look at your branch and at this PR but wasn't able to grok it quickly, and I don't have time to study it right now. I will give it a try later today if I can. Is it possible to break the two apart, i.e. will the query concurrency stand alone? And if so, could we merge that at this point? In any event, I'll try and look at that later and get back to you.
The concern is that two replacers might be going at once, and since they rebuild the tree with their modifications, one might clobber the other. But, thinking about it with a fresh brain this morning, I think that the replacement operations are entirely synchronous. I think that in Node.js that means that they won't yield back to the event loop, and shouldn't be interruptible, right? So, maybe we don't need to worry about that as a prerequisite. It does sort of rely on a bit of trickiness: it's a little weird that introducing any async boundary anywhere deep in that function would break things, and it wouldn't necessarily work on exotic runtimes. I think that's probably OK! Having said all that, it feels possible to separate out; I'll spend a little time investigating and testing later today!
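A toy model of that argument (entirely hypothetical, not parse-server code): two replacers triggered by separately-settling promises each perform several synchronous writes, and because synchronous code never yields to the Node.js event loop, the two mutations cannot interleave mid-write.

```javascript
// Two promise handlers mutating the same response tree "at once".
const response = { results: [{ post: null, author: null }] };

function replaceSync(field, value) {
  // Several synchronous steps with no await/then between them; the event
  // loop cannot run the other replacer until this function returns.
  const obj = response.results[0];
  obj[field] = { value };
  obj[field].complete = true;
}

const done = Promise.all([
  Promise.resolve().then(() => replaceSync('post', 1)),
  Promise.resolve().then(() => replaceSync('author', 2)),
]);
```

If an `await` were introduced between the two writes inside `replaceSync`, the handlers could observe each other's half-finished state, which is exactly the fragility noted above.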
I took a stab at replacing the traversal stuff, and that looks like this: Hustle@6bbae28?w=1 It's definitely tricky, but I could follow up separately to simplify that, and it makes the overall diff for concurrent includes (master...Hustle:concurrent-includes-only) simpler to read. How does that feel to folks?
I'd like to parallelize the fetching of included documents, but it would be tricky to do in the world where these methods are cloning the response. Instead of trying to do tricky processing on the responses to merge all the pointers, we can mutate the response in-place.

This kind of in-place mutation can be viewed as more dangerous, particularly with multiple promises trying to run this kind of replacement in a row. However, we have some nice guarantees here:

- Each path passed is unique; we don't need to worry about multiple calls each trying to replace the same documents.
- When we start doing a replacement, all the path prefixes have already been replaced. In the current implementation this is known because we sort the include paths by path length, so if a path `a.b.c` is being processed, then the prefix paths `a` and `a.b` must have already been processed.

This in-place mutation might also have a nice benefit in terms of memory usage; queries with lots of includes would have been rebuilding the result tree multiple times, causing extra allocations. The in-place method avoids this, and so might reduce memory usage for those queries.
When preloading multiple fields, we can get some extra performance by doing multiple fetches simultaneously. Consider a query fetching comments, and requesting that ParseServer preload "post" and "author" pointers:

> GET /classes/Comment?include=post,author

In this case, we first need to fetch the resulting "Comment" documents, but after that we should be able to fetch the documents referenced by the `post` and `author` fields of the results simultaneously, as they don't depend on each other at all. Things get a little trickier when we have nested fields:

> GET /classes/Comment?include=post,post.author,post.category

To resolve this query, we first need to fetch the related posts, and only once we've added the data about those posts into the results tree can we scan it for the `post.author` and `post.category` pointers. But, once that first fetch is completed, we can unblock both of those nested queries!

The solution here is to build what is basically a dependency graph out of promises; each include path blocks while it is waiting for whatever path it depends on to finish loading. Finally, once we have that graph, we return a promise that depends on every node in it. (Aside: technically we only need to depend on leaf nodes, but there shouldn't be any meaningful difference between waiting on the leaves and waiting on the entire graph, and this way we don't have to do any analysis to find the leaves.)

It's possible that for degenerate cases (hundreds of includes in a single query), this could result in performance degradation as many queries are kicked off in parallel. For the more common case of just a few top level includes, this should be a noticeable speed up as we remove a "waterfall" style dependency graph.

Improve readability of `RestQuery.handleInclude`

There's quite a bit of trickiness going on here, so for the benefit of future maintainers, let's add some documentation and clear up some variable names.
- Documented the preconditions that we expect when we're entering this function. These are all met by the current implementation of the caller, but it's helpful to someone trying to verify the correctness of this function.
- Added a lengthier description of what is going on with the map of `promises`.
- Renamed some variables to help make their purpose clearer.
I had a thought that I might not be handling the case where `includePath` doesn't return a promise, but it turns out to work because now I'm only calling that method from inside a promise handler. Let's still commit the test to prevent any regressions in the future.
Force-pushed from 67479fb to c23b09a
Sorry for the long delay here! I've updated this PR to reflect the "minimal" branch I had linked to above, and I'll submit a separate PR relating to the traversal stuff later. (For reference, the original commits in the PR are available here: https://github.com/Hustle/parse-server/tree/concurrent-includes-original)

To add a little more confidence to this change, @Hustle spent several days with a patch very close to this one running in production. It's not quite an exact match (we're slowly upgrading from an older release), but it did give me much more confidence in the correctness and performance characteristics.

At Hustle this didn't give us the big performance boost I had been hoping for: it turns out that each query to Mongo is individually pretty quick, and compared to some of our expensive operations like making HTTP requests to other services, the cost of doing the fetches serially is small enough that it's hardly noticeable for our slowest endpoints. Bummer for me; it means I need to keep working on finding more places to make other perf wins.

The good news is that our monitoring does show this patch is effective at running the fetches in parallel. Here are excerpts from traces of a comparable query.

Before this patch, with serial includes: [trace screenshot]

After this patch, with parallel includes: [trace screenshot]

These two traces aren't exactly apples-to-apples; they're getting different object sets, and we also changed the version of the mongo driver between them, but I think that this shows that there are some query patterns that can see their performance significantly increase with this patch!
Thanks for opening this pull request!
@noahsilas Do you think we can get this ready for merge? It looks very promising.
What do we have to do to get this merged @mtrezza? This unit test is 2x as fast with these changes, and these pointers are only one level down:
Can I start a new PR based on this one, considering it is currently stale?
Yeah, I'd suggest opening a new issue to give this some more visibility; maybe we can break this down into smaller PRs if necessary due to complexity. Then open a new PR.
Great PR to work on, praise guaranteed for whoever gets this merged; I'll pin this and add a bounty |
Little ping here @mtrezza. I worked on a PR (not pushed currently). I managed to implement parallel includes, but after some performance investigation I didn't see a noticeable difference, even with a dedicated test with many includes. My guesses are:
I'll conduct more tests, with a flame graph and further investigation, to confirm whether we truly have a major performance gain. I also think that the original Parse team (from Meta) was not dumb and was totally capable of implementing the parallel include, but they did not do it, maybe on purpose. I'll push my PR (all tests passing) and make it easy to understand. I avoided as much as possible weird patterns and internal structure changes.
The perf benefit may also depend on the cluster type, its current load, and the type of query. A sharded cluster whose nodes are distributed across regions for resilience may see more benefits.
When preloading multiple fields, we can get some extra performance by doing multiple fetches simultaneously.
Consider a query fetching comments, and requesting that ParseServer preload "post" and "author" pointers:
In this case, we first need to fetch the resulting "Comment" documents, but after that we should be able to fetch the documents referenced by the `post` and `author` fields of the results simultaneously, as they don't depend on each other at all.

Things get a little trickier when we have nested fields:
To resolve this query, we first need to fetch the related posts, and only once we've added the data about those posts into the results tree can we scan it for the `post.author` and `post.category` pointers. But, once that first fetch is completed, we can unblock both of those nested queries!

The solution here is to build what is basically a dependency graph out of promises; each include path blocks while it is waiting for whatever path it depends on to finish loading. Finally, once we have that graph, we return a promise that depends on every node in it.

Aside: technically we only need to depend on leaf nodes, but there shouldn't be any meaningful difference between waiting on the leaves and waiting on the entire graph, and this way we don't have to do any analysis to find the leaves.
Implementation Notes:
To make these parallel updates play nicely, I started by switching the code that updates the response with the fetched objects from a style where it clones the response as it inserts the objects to a style where it mutates the original response. Because we know that each include is affecting independent paths in the tree, this should be a safe optimization, and it may have other beneficial effects for performance as we build fewer intermediate representations of the response.
While making that change, I found that it was easier for me to think about when I split the responsibility of traversing the object graph from the responsibility of updating a given node. The traversal helper also served the `findPointers` function nicely!

It's possible that for degenerate cases (hundreds of includes in a single query), this could result in performance degradation as many queries are kicked off in parallel. For the more common case of just a few top level includes, this should be a noticeable speed up as we remove a "waterfall" style dependency graph.