[Enhancement]: Use indexes for Mongo collections #8

Open
1 task done
jonbarrow opened this issue May 31, 2024 · 0 comments
Labels
approved The topic is approved by a developer enhancement An update to an existing part of the codebase

Comments

@jonbarrow
Member

Checked Existing

  • I have checked the repository for duplicate issues.

What enhancement would you like to see?

When no index supports a query, Mongo performs a full COLLSCAN, scanning every document in the collection to answer it. This can very easily eat all available resources when working with:

  • Large databases
  • Low performance queries
  • Complex aggregations

When there are many documents, and many scans of those documents, Mongo can easily max out CPU usage and remain there. When working with very large documents (such as those which contain files), this can also easily max out memory usage.
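One way to confirm that a given query is falling back to a collection scan is to inspect the plan returned by explain. The helper below is a minimal sketch, assuming the usual layout of `queryPlanner.winningPlan`, where each stage carries a `stage` name and optional `inputStage` / `inputStages` children. The function name `uses_collscan` and both sample plans are hypothetical and hand-written, not real server output.

```python
def uses_collscan(plan: dict) -> bool:
    """Return True if any stage in an explain plan tree is a COLLSCAN."""
    if plan.get("stage") == "COLLSCAN":
        return True
    # A stage may have a single child (inputStage) or several (inputStages).
    children = []
    if "inputStage" in plan:
        children.append(plan["inputStage"])
    children.extend(plan.get("inputStages", []))
    return any(uses_collscan(child) for child in children)


# Hand-written sample plans for illustration only.
unindexed_plan = {"stage": "COLLSCAN", "direction": "forward"}
indexed_plan = {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": "task_id_1_boss_app_id_1"},
}

print(uses_collscan(unindexed_plan))
print(uses_collscan(indexed_plan))
```

With a driver such as PyMongo, the plan would likely come from something like `collection.find(query).explain()["queryPlanner"]["winningPlan"]`, though the exact shape can vary between server versions.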

Any other details to share? (OPTIONAL)

Currently our BOSS server is doing a COLLSCAN over a 7 GB+ collection, which has 880,975 documents, over a million times per day. This is locking up all system resources.
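To put those figures in perspective, here is a rough back-of-the-envelope calculation. It assumes every scan examines every document (no early exit) and treats "over a million" as exactly one million, so the real numbers are likely higher.

```python
DOCUMENTS = 880_975          # documents in the collection (from the issue)
SCANS_PER_DAY = 1_000_000    # "over a million", so treat this as a lower bound
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: each COLLSCAN examines every document in the collection.
docs_examined_per_day = DOCUMENTS * SCANS_PER_DAY
scans_per_second = SCANS_PER_DAY / SECONDS_PER_DAY
docs_examined_per_second = docs_examined_per_day / SECONDS_PER_DAY

print(f"{docs_examined_per_day:,} documents examined per day")
print(f"{scans_per_second:.1f} scans per second")
print(f"{docs_examined_per_second:,.0f} documents examined per second")
```

That works out to roughly 8.8 × 10^11 document reads per day, or on the order of ten million per second, for queries that an index could answer by touching only a handful of entries.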

[Image: Screenshot from 2024-05-30 23-24-06]

I propose we add the following indexes:

Task

  • Compound index on task_id and boss_app_id, as that is the combination we query by most often

File

  • Compound index on task_id and boss_app_id, as that is the combination we query by most often
  • Single field index on name, as we sometimes query by task_id, boss_app_id, and name. This relies on index intersection, if the query planner selects it
  • Single field index on data_id, as we also often query by just this field

CECData

  • Compound index on creator_pid and game_id, as that is the combination we query by most often
  • Single field index on latest_data_id, as we also often query by just this field

CECSlot

  • Compound index on creator_pid and game_id, as that is the combination we query by most often
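The proposal above could be expressed as data plus one small helper. This is only a sketch: it assumes a PyMongo-style database object where `db[name].create_index(keys)` takes a list of `(field, direction)` pairs, that the collection names match the headings above, and that ascending order is acceptable for every field. The names `PROPOSED_INDEXES` and `ensure_indexes` are hypothetical.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING

# Assumption: collection names match the headings used in this issue.
PROPOSED_INDEXES = {
    "Task": [["task_id", "boss_app_id"]],
    "File": [["task_id", "boss_app_id"], ["name"], ["data_id"]],
    "CECData": [["creator_pid", "game_id"], ["latest_data_id"]],
    "CECSlot": [["creator_pid", "game_id"]],
}


def ensure_indexes(db, proposed=PROPOSED_INDEXES):
    """Create every proposed index and return whatever the driver reports."""
    created = []
    for collection_name, indexes in proposed.items():
        for fields in indexes:
            # Each index is a list of (field, direction) pairs, all ascending.
            keys = [(field, ASCENDING) for field in fields]
            created.append(db[collection_name].create_index(keys))
    return created
```

Calling `ensure_indexes(client["boss"])` (the database name here is a guess) would issue seven index builds in total. Building them on the 7 GB+ collection will take some time, so it is probably worth doing outside of peak hours.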

It should be noted that indexes do not come for free, nor does index intersection. Mongo stores indexes on disk, so they will increase our storage usage. Index intersection also has some overhead compared to a query served by a single index, but it should still be better than a full COLLSCAN. We CAN make multiple compound indexes using the same fields, but this stores overlapping index data on disk, which again increases storage costs.
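To keep an eye on the storage cost once the indexes exist, the sizes can be read back from collection statistics. The sketch below assumes a `collStats`-style result that exposes `size` (data bytes) and `indexSizes` (bytes per index name). The function name `index_overhead` is hypothetical, and the sample numbers are invented for illustration, not measurements from BOSS.

```python
def index_overhead(stats: dict) -> dict:
    """Summarise index storage from a collStats-style result."""
    index_sizes = stats.get("indexSizes", {})
    total_index_bytes = sum(index_sizes.values())
    data_bytes = stats.get("size", 0)
    # Avoid dividing by zero for an empty or missing collection.
    ratio = total_index_bytes / data_bytes if data_bytes else 0.0
    return {
        "total_index_bytes": total_index_bytes,
        "index_to_data_ratio": ratio,
        "largest_index": max(index_sizes, key=index_sizes.get, default=None),
    }


# Invented numbers purely for illustration.
sample_stats = {
    "size": 7_000_000_000,
    "indexSizes": {"_id_": 9_000_000, "task_id_1_boss_app_id_1": 12_000_000},
}
print(index_overhead(sample_stats))
```

Comparing this summary before and after adding the indexes would show how much extra disk each one actually costs, which helps decide whether an additional compound index is worth it over relying on intersection.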

@jonbarrow jonbarrow added enhancement An update to an existing part of the codebase awaiting-approval Topic has not been approved or denied labels May 31, 2024
@jonbarrow jonbarrow added approved The topic is approved by a developer and removed awaiting-approval Topic has not been approved or denied labels Jun 1, 2024
Projects
Status: Todo