
Adding heuristic based batch_mem_size allocation #1300

Closed · wants to merge 2 commits
Conversation

@shikhar1729 commented Oct 18, 2023

This pull request enhances the LogicalGetToSeqScan rule by introducing a heuristic-based approach for setting the batch_mem_size parameter. It does not address a specific issue; instead, it improves how batch_mem_size is calculated.

Changes Made:

  • Heuristics Implementation: Implemented a heuristic-based calculation of the batch_mem_size parameter in the LogicalGetToSeqScan rule. The batch size is derived from the available system memory and the length of the target list, so system resources are used efficiently during batch processing (a short sketch follows this list).

  • Code Modification: Modified the apply method of the LogicalGetToSeqScan rule to incorporate the heuristic; the method now uses the computed batch_mem_size for further processing.

  • Robustness: Because batch_mem_size is adjusted dynamically, the code adapts to different memory conditions and query shapes, making batch processing more stable.
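
The bullets above can be made concrete with a short, self-contained sketch. This is illustrative rather than the exact diff: the function name estimate_batch_mem_size and its parameters are hypothetical, and only the core idea of dividing a fraction of free memory by the number of projected columns comes from this PR.

    import psutil

    def estimate_batch_mem_size(target_list, config_default, memory_fraction=0.5):
        """Hypothetical sketch of the heuristic described in this PR."""
        if target_list is None:
            # No explicit projection: fall back to the configured batch_mem_size.
            return config_default
        # Reserve a fraction of free memory and split it across the projected columns.
        available_memory = psutil.virtual_memory().available
        return int(available_memory * memory_fraction / len(target_list))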

@github-actions bot (Contributor) commented

👋 Hello @shikhar1729, thanks for submitting an EVA DB PR 🙏 To allow your work to be integrated as seamlessly as possible, we advise you to:

  • ✅ Verify that your PR is up-to-date with the georgia-tech-db/eva master branch. If your PR is behind, you can update your code by clicking the 'Update branch' button or by running git pull and git merge master locally.
  • ✅ Verify that all EVA DB Continuous Integration (CI) checks are passing.
  • ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition.

@jarulraj (Member) commented

@shikhar1729 Can you share details on the heuristic in this PR and also add a comment in the code?

@shikhar1729 (Author) commented

@jarulraj : I have updated the PR description and added comments to the code.

@jarulraj (Member) commented

@shikhar1729 Can you explain this heuristic?

    if before.target_list is None:
        batch_mem_size = self.context.db.config.get_value("executor", "batch_mem_size")
    else:
        # Calculate batch_mem_size based on the number of columns and available memory
        num_columns = len(before.target_list)
        batch_mem_size = int(available_memory * memory_fraction / num_columns)

@shikhar1729 (Author) commented

@jarulraj: Here is the explanation:

  1. Getting Available Memory: Retrieve the available system memory with psutil.virtual_memory().available.
  2. Memory Fraction: Allocate half of the available memory (memory_fraction = 0.5) to batch processing.
  3. Calculating batch_mem_size: If the query projects specific columns (before.target_list is not None), compute batch_mem_size = int(available_memory * memory_fraction / len(before.target_list)). If there is no target list (before.target_list is None), batch_mem_size is taken from the configuration.
  4. Bounding batch_mem_size: Clamp batch_mem_size to the range [100, 1000] for efficient processing.

This heuristic adjusts the batch size to the number of projected columns and the available memory, making better use of system resources during processing; a small worked example follows below.
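
A small worked example of the bounding in step 4, with made-up numbers (about 8 GiB of free memory and four projected columns). The max/min clamp is one plausible way to implement the bound; the exact bounding code is not quoted in this thread.

    # Hypothetical numbers to illustrate the bounding step.
    available_memory = 8 * 2**30   # assume roughly 8 GiB of free memory
    memory_fraction = 0.5
    num_columns = 4                # stand-in for len(before.target_list)

    raw_size = int(available_memory * memory_fraction / num_columns)  # 1073741824
    batch_mem_size = max(100, min(1000, raw_size))                    # clamped to 1000
    print(raw_size, batch_mem_size)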

@jarulraj (Member) commented

Closing this PR (subsumed in #1306).

@jarulraj closed this Oct 30, 2023