-
-
Notifications
You must be signed in to change notification settings - Fork 4.8k
Parallelize loading include
objects
#6501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: alpha
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -747,30 +747,61 @@ RestQuery.prototype.handleExcludeKeys = function() { | |
}; | ||
|
||
// Augments this.response with data at the paths provided in this.include. | ||
// | ||
// Preconditions: | ||
// - `this.include` is an array of arrays of strings; (in flow parlance, Array<Array<string>>) | ||
// | ||
// - `this.include` is de-duplicated. This ensures that we don't try to fetch | ||
// the same objects twice. | ||
// | ||
// - For each value in `this.include` with length > 1, there is also | ||
// an earlier value for the prefix of that value. | ||
// | ||
// Example: ['a', 'b', 'c'] in the array implies that ['a', 'b'] is also in | ||
// the array, at an earlier position). | ||
// | ||
// This prevents trying to follow pointers on unfetched objects. | ||
RestQuery.prototype.handleInclude = function() { | ||
if (this.include.length == 0) { | ||
return; | ||
} | ||
|
||
var pathResponse = includePath( | ||
this.config, | ||
this.auth, | ||
this.response, | ||
this.include[0], | ||
this.restOptions | ||
); | ||
if (pathResponse.then) { | ||
return pathResponse.then(newResponse => { | ||
this.response = newResponse; | ||
this.include = this.include.slice(1); | ||
return this.handleInclude(); | ||
}); | ||
} else if (this.include.length > 0) { | ||
this.include = this.include.slice(1); | ||
return this.handleInclude(); | ||
} | ||
// The list of includes form a sort of a tree - Each path should wait to | ||
// start trying to load until its parent path has finished loading (so that | ||
// the pointers it is trying to read and fetch are in the object tree). | ||
// | ||
// So, for instance, if we have an include of ['a', 'b', 'c'], that must | ||
// wait on the include of ['a', 'b'] to finish, which must wait on the include | ||
// of ['a'] to finish. | ||
// | ||
// This `promises` object is a map of dotted paths to promises that resolve | ||
// when that path has finished loading into the tree. One special case is the | ||
// empty path (represented by the empty string). This represents the root of | ||
// the tree, which is `this.response` and is already fetched. We set a | ||
// pre-resolved promise at that level, meaning that include paths with only | ||
// one component (like `['a']`) will chain onto that resolved promise and | ||
// are immediately unblocked. | ||
const promises = { '': Promise.resolve() }; | ||
|
||
this.include.forEach(path => { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that having some kind of concurrency limiter would be a plus. Something that mimics what bluebird.map offers as an option? I could imagine a scenario where you have an array of pointers as a field and we're including for each of those objects and you could end up with dog pile of queries (this is off the cuff, I'd have to think through how that would actually work...) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that the way this works is that we'll end up issuing one query for each That said, there's a degenerate case where someone is trying to load in tons of include paths that could be dangerous, and a concurrency limiter might be a helpful thing to add. I don't love the idea of adding another promise implementation as a dependency. I could write a similar helper for native promises, but I wonder if someone has already created a nice implementation we can reuse... I'll look around a bit! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I also suspect that this could be written more naturally as
but i'm still trying to grok, so feel free to disregard.... :). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I think this is conceptually similar to what's going on in the existing implementation, in that it will wait for each include to finish loading before starting the next one, which leads to the sort of "waterfall" that I'd like to avoid. The thought behind this implementation is that the include paths form a sort of tree. All the includes that have only one path component can be unblocked at once, since none of them have dependencies on any data that wasn't part of the initial query. From there, as each of those paths resolves, they can unblock any nested includes that are one path component deeper; for instance, if you are include There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
yes, I'm certainly not suggesting requiring bluebird, just an example... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think what you're doing here and the initiative you're taking is awesome. Honestly, I'm just trying to keep up! My comments, as I've tried to point out, are me just thinking out loud as I try to process what you're presenting. Not that your replies sound defensive AT ALL, just want to make it clear :). I think I should make a test case or two and step through the code so I can be sure that I understand, so I can then be helpful in reviewing this pr -- I'll do it, but it may take me a few days. Having said all that, here's some feedback on your responses.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you so much for your feedback! It's definitely hard to convey tone in github comments, and I'm used to leaving reviews on stuff at work where I have preexisting relationships with the reviewers, so getting back into the habit of being a little less terse would probably help me convey my thoughts more effectively, haha. Thanks for your attention and thoughtfulness on my PRs. I notice your profile says you're in SF - maybe I can get you a coffee or a drink sometime to show my appreciation! And now, into the responses! Re: 1
I think that because of the initial value being a Promise, I thought that you were suggesting using the reducer to create a promise chain, like this: const promise = this.include.reduce(
(runAfter, path) => runAfter.then(() => includePath(path, ...)),
Promise.resolve()
);
await promise; But it sounds like you are actually thinking about the reduce as a replacement for the usage of const promises = this.include.reduce((acc, path) => {
const parentDottedPath = path.slice(0, -1).join('.');
const myDottedPath = path.join('.');
const runAfter = acc[parentDottedPath] || Promise.resolve();
acc[myDottedPath] = runAfter.then(() => includePath(path, ...));
return acc;
}, {});
await Promise.all(Object.keys(promises)); I'm happy to make such a switch. I find myself often waffling on the usage of type PromiseMap = { [dottedPath: string]: Promise<void> }
const promises = this.include.reduce((acc: PromiseMap, path: Array<string>) => {
...
}, {}); Let me know what your preferences here are, and I'm happy to incorporate that feedback! Re: 2So, it sounds like you're envisioning a scenario like: class Item extends Parse.Object { }
// imagine that there are many "items" with characteristics like this:
const children = _.times(100, idx => new Item({ number: idx }));
const parent = new Item({ number: 0, children });
await parent.save();
// initiate a query that starts prefetches on those big arrays, potentially recursing down
new Parse.Query(Item).include('children.children').find(); It seems like you're worried about a scenario like this, where we have huge fan-out in the number of documents fetched via This is certainly a legitimate concern! I think it's somewhat alleviated by how this is already implemented. At a high level, when we are doing an include on a path, the code looks roughly like this: includePath(path, object) {
const pointers = recursivelyFindPointersAtPath(path, object);
const groups = _.groupBy(pointers, p => p.className);
const subqueries = _.map(groups, (pointers, className) => {
const where = { objectId: { $in: pointers.map(p => p.objectId) } };
return get(`/classes/${classname}`, { where });
});
const responses = await Promise.all(subqueries);
responses.forEach(response => substitutePointers(path, object, response));
} So, for each included path, we do at most one query per included class, regardless of how many included pointers there are. In this step, we're also already letting the number of concurrent queries fan out to the number of distinct classes that we found pointers to in this path. This seems like another good place for a concurrency limit! While looking at this, I noticed something that strikes me as potentially a related problem that I'd like to address later - it seems like the subquery is going to use the default But, that feels like a sort of separate problem, and is probably one that requires some kind of careful decision. It would definitely be a breaking change if some queries suddenly started returning something like Whew! That's a lot of words about how I'm thinking about these things. I think I'll try to verify the problem around too many included objects and open an issue about it tomorrow so we can discuss options about solving that separately. Thanks again for your time and energy looking in to this with me! I have a couple high traffic queries that include something like 4 related objects, so I'm personally pretty excited about the potential performance boost here 😁 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oof - looks like I was wrong about the subquery being limited. It seems like the default There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But that's another issue, right? I'm not worried about that problem, we can't prevent every user from shooting themselves in the foot and your good change doesn't exacerbate the issue I don't think beyond the concurrency limit. The appropriate tool for large nested objects is a relation and I think the documentation makes that clear. My only concern at this point with the change, which we definitely want, is to make sure that when we release this, queries that are maybe sluggish but work, don't start bringing down data servers cause we're doing 1k queries simultaneously instead of sequentially. On the loop vs. reduce issue, you clearly understand my thought, so I'll now defer to your judgment. I also appreciate your detailed and thoughtful response. I'm gonna step through your code today to make sure I'm properly groking. PS: we work DIRECTLY across Market street from each other. In a couple of weeks when things get back to normal (hopefully), I'd love to meet up. In the meantime, if you have any spare cycles at all, I'd like to add you to the parse-sever and js-sdk teams (do you use any of the other SDKs?). Also, you can contact me directly on the parseopensource slack channel if it's helpful in the meantime. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yeah, the giant fan-out for large include lists is already present and unrelated to this change. I'm still going to think about this a little bit, but I think you're probably right that it's up to users to protect themselves from this. Maybe the follow up here will be to add a note to the documentation later?
Ok, I think I understand! I think the expected fan-out that this introduces is roughly the largest number of included paths at a given depth. So, The worst case concurrency should be equal to the number of include paths. As an example, I think that having hundreds of paths in your One thing I'm noticing as I've been pushing patches on to this branch to make the tests all pass is that the traversal stuff is actually quite complex. I'm thinking that it might be worthwhile to drop that from this PR. I tried a bit more of a minimalistic approach, and that looks something like this: https://github.com/Hustle/parse-server/commits/concurrent-includes-minimal (note particularly 87d6d91). Does that seem like a preferable approach? I think it's sort of hard to interpret some of the trickiness in I've also added 03c02cd which hopefully will make it easier to step through what's going on in I'm mostly working in JS these days, so happy to keep contributing to parse-server and the JS SDK! What does it mean to be on those teams? My fingers are crossed that things get back to normal fast! Looking forward to meeting you sometime then 😄 |
||
const dottedPath = path.join('.'); | ||
|
||
// Get the promise for the parent path | ||
const parentDottedPath = path.slice(0, -1).join('.'); | ||
const parentPromise = promises[parentDottedPath]; | ||
|
||
// Once the parent promise has resolved, do this path's load step | ||
const loadPromise = parentPromise.then(() => | ||
includePath(this.config, this.auth, this.response, path, this.restOptions) | ||
); | ||
|
||
// Put our promise into the promises map, so child paths can find and chain | ||
// off of it | ||
promises[dottedPath] = loadPromise; | ||
}); | ||
|
||
return pathResponse; | ||
// Wait for all includes to be fetched and merged in to the response tree | ||
return Promise.all(Object.values(promises)); | ||
}; | ||
|
||
//Returns a promise of a processed set of results | ||
|
@@ -821,7 +852,10 @@ RestQuery.prototype.runAfterFindTrigger = function() { | |
|
||
// Adds included values to the response. | ||
// Path is a list of field names. | ||
// Returns a promise for an augmented response. | ||
// | ||
// Modifies the response in place by replacing pointers with fetched objects | ||
// Returns a promise that resolves when all includes have been fetched and | ||
// substituted into the response. | ||
function includePath(config, auth, response, path, restOptions = {}) { | ||
var pointers = findPointers(response.results, path); | ||
if (pointers.length == 0) { | ||
|
@@ -905,19 +939,12 @@ function includePath(config, auth, response, path, restOptions = {}) { | |
return replace; | ||
}, {}); | ||
|
||
var resp = { | ||
results: replacePointers(response.results, path, replace), | ||
}; | ||
if (response.count) { | ||
resp.count = response.count; | ||
} | ||
return resp; | ||
replacePointers(response.results, path, replace); | ||
}); | ||
} | ||
|
||
// Object may be a list of REST-format object to find pointers in, or | ||
// it may be a single object. | ||
// If the path yields things that aren't pointers, this throws an error. | ||
// Path is a list of fields to search into. | ||
// Returns a list of pointers in REST format. | ||
function findPointers(object, path) { | ||
|
@@ -951,8 +978,8 @@ function findPointers(object, path) { | |
// in, or it may be a single object. | ||
// Path is a list of fields to search into. | ||
// replace is a map from object id -> object. | ||
// Returns something analogous to object, but with the appropriate | ||
// pointers inflated. | ||
// | ||
// Performs in-place replacements on object | ||
function replacePointers(object, path, replace) { | ||
if (object instanceof Array) { | ||
return object | ||
|
@@ -964,27 +991,27 @@ function replacePointers(object, path, replace) { | |
return object; | ||
} | ||
|
||
// base case: we're returning a replacement | ||
if (path.length === 0) { | ||
if (object && object.__type === 'Pointer') { | ||
return replace[object.objectId]; | ||
} | ||
return object; | ||
} | ||
|
||
// penultimate case: we're looking at the object holding a reference | ||
// to be mutated | ||
else if (path.length === 1) { | ||
object[path[0]] = replacePointers(object[path[0]], [], replace); | ||
return; | ||
} | ||
|
||
var subobject = object[path[0]]; | ||
if (!subobject) { | ||
return object; | ||
} | ||
var newsub = replacePointers(subobject, path.slice(1), replace); | ||
var answer = {}; | ||
for (var key in object) { | ||
if (key == path[0]) { | ||
answer[key] = newsub; | ||
} else { | ||
answer[key] = object[key]; | ||
} | ||
} | ||
return answer; | ||
|
||
return replacePointers(subobject, path.slice(1), replace); | ||
} | ||
|
||
// Finds a subobject that has the given key, if there is one. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this idiom. trying to figure out how i can figure it out :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little bit tricky, I could probably comment this more clearly!
What this is intending is that entries in
this.includes
that have no nesting will start processing right away, by chaining athen
handler on to this resolved promise.For an include with no nesting, its entry in
this.include
will be an array with one element:['user']
, for instance. From that path we extract aprefix
, which in this case will be[].join('.')
which is the empty string''
. We would find the above base case withpromises['']
.It might be even clearer to write this as an inline default:
That seems like it is more idiomatic JS, I think I'll probably follow up and switch it over.