-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Assistants] Add new indices for efficient querying #810
Conversation
@@ -75,5 +75,7 @@ client.on("open", () => { | |||
assistants.createIndex({ createdBy: 1 }).catch(console.error); | |||
assistants.createIndex({ userCount: 1 }).catch(console.error); | |||
assistants.createIndex({ featured: 1 }).catch(console.error); | |||
assistants.createIndex({ modelId: 1, userCount: -1, createdByName: 1 }).catch(console.error); | |||
assistants.createIndex({ modelId: 1, userCount: -1, featured: 1 }).catch(console.error); | |||
reports.createIndex({ assistantId: 1 }).catch(console.error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have this filter (for hf.co/chat/assistants), which is
const filter: Filter<Assistant> = {
modelId: modelId ?? { $exists: true },
...(!createdByCurrentUser && { userCount: { $gt: 1 } }),
...(createdByName ? { createdByName } : { featured: true }),
};
...
.sort({ userCount: -1 })
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at Assistant.ts
, modelId
is always defined.
So the filter should be like this:
const filter: Filter<Assistant> = {
...(modelId && {modelId}),
...(!createdByCurrentUser && { userCount: { $gt: 1 } }),
...(createdByName ? { createdByName } : { featured: true }),
};
...
.sort({ userCount: -1 })
There is already an index on createdBy
, but createdBy
doesn't exist in Assistant.ts
, there is instead createdByName
and createdById
.
IMO:
- Change the
createdBy
index to{createdById: 1, userCount: -1}
- Change the filter to use
createdById
if possible (by doing a first query to find user matching name). Otherwise add an index{createdByName: 1, userCount: -1}
- Change the
featured
index to{featured: 1, userCount: -1}
- Create an index on
{modelId: 1, userCount: -1}
. No need for a more specific index at the moment, if we add a more specific index the other key should be added beforeuserCount
, eg{modelId: 1, featured: 1, userCount: -1
). It's important that the "sort" key is last.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great suggestions. Will apply those.
if we add a more specific index the other key should be added before userCount, eg {modelId: 1, featured: 1, userCount: -1). It's important that the "sort" key is last.
I don't see any information the "sort" key is last
on the docs here. Why so ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check out this video in their resources: https://learn.mongodb.com/learn/course/mongodb-indexes/lesson-4-working-with-compound-indexes-in-mongodb/learn?page=1
Basically in a compound index you want to:
- have an exact match on key1, key2, ... keyN
- sort on keyN+1
For example, if you define an index {featured: 1, userCount: -1}
, and you have those documents:
{_id: "id0", featured: true, userCount: 1},
{_id: "id1", featured: true, userCount: 1},
{_id: "id2", featured: true, userCount: 2},
{_id: "id3", featured: true, userCount: 3},
{_id: "id4", featured: false, userCount: 1},
{_id: "id5", featured: false, userCount: 2},
{_id: "id6", featured: false, userCount: 3},
The index will have this internal structure:
{
[featured=true]: {
[userCount=3]: ["id3"],
[userCount=2]: ["id2"],
[userCount=1]: ["id1", "id0"]
},
[featured=false]: {
[userCount=3]: ["id6"],
[userCount=2]: ["id5"],
[userCount=1]: ["id4"]
},
}
So if you do the query .find({featured: true}).sort({userCount: -1})
, mongodb will be able to look up the desired results very efficiently: index[featured=true].entries().flatMap(([, v] => v)
.
If the index was defined ({userCount: -1, featured: 1})
the internal index structure would look like this:
{
[userCount=3]: {
[featured=true]: ["id3"],
[featured=false]: ["id6"],
},
[userCount=2]: {
[featured=true]: ["id2"],
[featured=false]: ["id5"],
},
[userCount=1]: {
[featured=true]: ["id1", "id0"],
[featured=false]: ["id4"],
},
}
and MongoDB would have to examine all index keys to return the results:
index
.entries()
.flatMap(
([, v]) => v
.entries().filter(k => k === "featured=true")
.flatMap([, v] => v)
);
It will examine all 7 indexed values instead of only the 4 returned.
This rule - putting "sort" last - is only when we have an exact match on the previous keys. Otherwise there are tradeoffs between the values being already sorted & the number of values examined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general https://learn.mongodb.com/ is an awesome resource to learn from - but the format is videos, not text
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
damn, learned a lot here, thanks 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coyotte508 wdyt 2e87ab5 ?
I don't have enough mongo knowledge to chime in so if it's good for @coyotte508 then it's good for me 😁 Do we need to manually build the indices in atlas or will they be picked up on space rebuild ? |
It will automatically build We should delete the removed indicies manually though |
these indices were deleted: assistants.createIndex({ createdBy: 1 }).catch(console.error);
assistants.createIndex({ featured: 1 }).catch(console.error); |
* [Assistants] Add new indeices for efficient querying * update indices & use `createdById` * stronger typing
Read this comment