Enhance Bulk Reindexing with custom_import_scope for Faster Partial Indexing #1707

Open
wants to merge 1 commit into
base: master

@Yasoob01 Yasoob01 commented Mar 3, 2025

This PR enhances Searchkick's bulk reindexing by introducing custom_import_scope, which lets the caller choose which associations are loaded during a reindex. Previously, reindexing loaded every association declared in search_import, even when only a subset was needed, which slowed reindexing and used memory unnecessarily.

With this enhancement, developers can explicitly define which associations to include for a partial reindex, making indexing up to 70% faster in cases where only partial data (e.g., client details) is required.

Key Changes
✅ Added custom_import_scope to limit loaded associations during reindexing.
✅ Ensures backward compatibility with search_import.
✅ Performance boost by avoiding unnecessary data fetches.

Example Usage

1. Full Reindexing (Legacy Approach)
This would load all related data, making reindexing slower.

Searchkick::BulkReindexJob.perform_later(
  class_name: "Vehicle",
  record_ids: [100, 101, 102],
  index_name: "vehicles_index",
  method_name: :make_data
)

Internally, this loads every association declared in the import scope, for example: Vehicle.includes(:make, :model, :variant, inventory: [:user, :city, :tasks])

2. Optimized Partial Reindexing (New Approach)

Searchkick::BulkReindexJob.perform_later(
  class_name: "Vehicle",
  record_ids: [100, 101, 102],
  index_name: "vehicles_index",
  method_name: :make_data,
  custom_import_scope: [:make]
)

Internally, this loads only the make association, for example: Vehicle.includes(:make)
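
To make the difference between the two calls above concrete, here is a minimal sketch of how the eager-loading choice could be made. It is not the code from this PR's diff and not Searchkick's API: import_relation is a hypothetical helper, and FakeRelation and the Vehicle class are stand-ins that only record which associations were requested, so the example runs without Rails. The only names taken from the description are search_import, custom_import_scope, and the Vehicle associations.

```ruby
# Stand-in for an ActiveRecord relation (assumption: illustration only).
# It just records the requested associations and record ids.
class FakeRelation
  attr_reader :included, :ids

  def initialize(included = [], ids = nil)
    @included = included
    @ids = ids
  end

  def includes(*associations)
    FakeRelation.new(included + associations, ids)
  end

  def where(id:)
    FakeRelation.new(included, id)
  end
end

# Stand-in for the Vehicle model used in the examples above.
class Vehicle
  def self.all
    FakeRelation.new
  end

  def self.includes(*associations)
    all.includes(*associations)
  end

  # Mirrors the full import scope quoted in the legacy example.
  def self.search_import
    includes(:make, :model, :variant, inventory: [:user, :city, :tasks])
  end
end

# Hypothetical helper: decide which associations to eager load.
# With no custom_import_scope it falls back to search_import,
# which is the backward-compatible path.
def import_relation(model, record_ids, custom_import_scope: nil)
  relation =
    if custom_import_scope && !custom_import_scope.empty?
      model.includes(*custom_import_scope)
    elsif model.respond_to?(:search_import)
      model.search_import
    else
      model.all
    end
  relation.where(id: record_ids)
end

full    = import_relation(Vehicle, [100, 101, 102])
partial = import_relation(Vehicle, [100, 101, 102], custom_import_scope: [:make])

p full.included    # every association from search_import
p partial.included # only :make
```

Treating an empty custom_import_scope the same as a missing one is a design choice in this sketch; it keeps existing callers on the search_import path unless they opt in explicitly.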

Why This Matters

  1. Faster Reindexing – Avoids loading irrelevant relations.
  2. Lower Memory Usage – Fetches only what's needed.
  3. More Control – Allows selective indexing based on use case.

This update significantly improves performance while ensuring existing functionality remains intact. Let me know if any refinements are needed! 🚀
