Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Assistants] Add new indices for efficient querying #810

Merged
merged 5 commits into from
Feb 15, 2024
Merged

Conversation

mishig25
Copy link
Collaborator

@mishig25 mishig25 commented Feb 9, 2024

@@ -75,5 +75,7 @@ client.on("open", () => {
assistants.createIndex({ createdBy: 1 }).catch(console.error);
assistants.createIndex({ userCount: 1 }).catch(console.error);
assistants.createIndex({ featured: 1 }).catch(console.error);
assistants.createIndex({ modelId: 1, userCount: -1, createdByName: 1 }).catch(console.error);
assistants.createIndex({ modelId: 1, userCount: -1, featured: 1 }).catch(console.error);
reports.createIndex({ assistantId: 1 }).catch(console.error);
Copy link
Collaborator Author

@mishig25 mishig25 Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have this filter (for hf.co/chat/assistants), which is

const filter: Filter<Assistant> = {
	modelId: modelId ?? { $exists: true },
	...(!createdByCurrentUser && { userCount: { $gt: 1 } }),
	...(createdByName ? { createdByName } : { featured: true }),
};
...
.sort({ userCount: -1 })

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at Assistant.ts, modelId is always defined.

So the filter should be like this:

const filter: Filter<Assistant> = {
	...(modelId && {modelId}),
	...(!createdByCurrentUser && { userCount: { $gt: 1 } }),
	...(createdByName ? { createdByName } : { featured: true }),
};
...
.sort({ userCount: -1 })

There is already an index on createdBy, but createdBy doesn't exist in Assistant.ts, there is instead createdByName and createdById.

IMO:

  • Change the createdBy index to {createdById: 1, userCount: -1}
  • Change the filter to use createdById if possible (by doing a first query to find user matching name). Otherwise add an index {createdByName: 1, userCount: -1}
  • Change the featured index to {featured: 1, userCount: -1}
  • Create an index on {modelId: 1, userCount: -1}. No need for a more specific index at the moment, if we add a more specific index the other key should be added before userCount, eg {modelId: 1, featured: 1, userCount: -1). It's important that the "sort" key is last.

Copy link
Collaborator Author

@mishig25 mishig25 Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great suggestions. Will apply those.

if we add a more specific index the other key should be added before userCount, eg {modelId: 1, featured: 1, userCount: -1). It's important that the "sort" key is last.

I don't see any information the "sort" key is last on the docs here. Why so ?

Copy link
Member

@coyotte508 coyotte508 Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check out this video in their resources: https://learn.mongodb.com/learn/course/mongodb-indexes/lesson-4-working-with-compound-indexes-in-mongodb/learn?page=1

Basically in a compound index you want to:

  • have an exact match on key1, key2, ... keyN
  • sort on keyN+1

For example, if you define an index {featured: 1, userCount: -1}, and you have those documents:

{_id: "id0", featured: true, userCount: 1},
{_id: "id1", featured: true, userCount: 1},
{_id: "id2", featured: true, userCount: 2},
{_id: "id3", featured: true, userCount: 3},
{_id: "id4", featured: false, userCount: 1},
{_id: "id5", featured: false, userCount: 2},
{_id: "id6", featured: false, userCount: 3},

The index will have this internal structure:

{
  [featured=true]: {
    [userCount=3]: ["id3"],
    [userCount=2]: ["id2"],
    [userCount=1]: ["id1", "id0"]
  },
 [featured=false]: {
    [userCount=3]: ["id6"],
    [userCount=2]: ["id5"],
    [userCount=1]: ["id4"]
  },
}

So if you do the query .find({featured: true}).sort({userCount: -1}), mongodb will be able to look up the desired results very efficiently: index[featured=true].entries().flatMap(([, v] => v).

If the index was defined ({userCount: -1, featured: 1}) the internal index structure would look like this:

{
  [userCount=3]: {
    [featured=true]: ["id3"],
    [featured=false]: ["id6"],
  },
  [userCount=2]: {
    [featured=true]: ["id2"],
    [featured=false]: ["id5"],
  },
  [userCount=1]: {
    [featured=true]: ["id1", "id0"],
    [featured=false]: ["id4"],
  },
}

and MongoDB would have to examine all index keys to return the results:

index
  .entries()
  .flatMap(
    ([, v]) => v
                   .entries().filter(k => k === "featured=true")
                   .flatMap([, v] => v)
  );

It will examine all 7 indexed values instead of only the 4 returned.

This rule - putting "sort" last - is only when we have an exact match on the previous keys. Otherwise there are tradeoffs between the values being already sorted & the number of values examined.

Copy link
Member

@coyotte508 coyotte508 Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general https://learn.mongodb.com/ is an awesome resource to learn from - but the format is videos, not text

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

damn, learned a lot here, thanks 😁

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mishig25 mishig25 changed the title [Assistants] Add new indeices for efficient querying [Assistants] Add new indices for efficient querying Feb 9, 2024
@nsarrazin
Copy link
Collaborator

I don't have enough mongo knowledge to chime in so if it's good for @coyotte508 then it's good for me 😁

Do we need to manually build the indices in atlas or will they be picked up on space rebuild ?

@coyotte508
Copy link
Member

It will automatically build

We should delete the removed indicies manually though

@mishig25 mishig25 merged commit db943f9 into main Feb 15, 2024
3 checks passed
@mishig25 mishig25 deleted the assistans_index branch February 15, 2024 13:18
@mishig25
Copy link
Collaborator Author

these indices were deleted:

assistants.createIndex({ createdBy: 1 }).catch(console.error);
assistants.createIndex({ featured: 1 }).catch(console.error);

ice91 pushed a commit to ice91/chat-ui that referenced this pull request Oct 30, 2024
* [Assistants] Add new indeices for efficient querying

* update indices & use `createdById`

* stronger typing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants