
Adding heuristic based batch_mem_size allocation #1300

Closed · wants to merge 2 commits
Conversation

@shikhar1729 commented Oct 18, 2023

This pull request enhances the LogicalGetToSeqScan rule by introducing a heuristic-based approach for setting the batch_mem_size parameter. It does not address a specific issue; instead, it improves how batch_mem_size is calculated.

Changes Made:

  • Heuristics Implementation: Implemented a heuristic-based calculation of the batch_mem_size parameter in the LogicalGetToSeqScan rule. The batch size is derived from the available system memory and the length of the target list, so system resources are used efficiently during batch processing (a short sketch follows this list).

  • Code Modification: Modified the apply method of the LogicalGetToSeqScan rule to incorporate the heuristic; the method now uses the computed batch_mem_size for further processing.

  • Robustness: Because batch_mem_size is adjusted dynamically, the code adapts to different memory conditions and query shapes, making batch processing more stable.
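
The bullets above can be made concrete with a short, self-contained sketch. This is illustrative rather than the exact diff: the function name estimate_batch_mem_size and its parameters are hypothetical, and only the core idea of dividing a fraction of free memory by the number of projected columns comes from this PR.

    import psutil

    def estimate_batch_mem_size(target_list, config_default, memory_fraction=0.5):
        """Hypothetical sketch of the heuristic described in this PR."""
        if target_list is None:
            # No explicit projection: fall back to the configured batch_mem_size.
            return config_default
        # Reserve a fraction of free memory and split it across the projected columns.
        available_memory = psutil.virtual_memory().available
        return int(available_memory * memory_fraction / len(target_list))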

@github-actions bot (Contributor) commented

👋 Hello @shikhar1729, thanks for submitting an EVA DB PR 🙏 To allow your work to be integrated as seamlessly as possible, we advise you to:

  • ✅ Verify that your PR is up-to-date with the georgia-tech-db/eva master branch. If your PR is behind, you can update your code by clicking the 'Update branch' button or by running git pull and git merge master locally.
  • ✅ Verify that all EVA DB Continuous Integration (CI) checks are passing.
  • ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition.

@jarulraj (Member) commented

@shikhar1729 Can you share details on the heuristic in this PR and also add a comment in the code?

@shikhar1729 (Author) commented

@jarulraj : I have updated the PR description and added comments to the code.

@jarulraj (Member) commented

@shikhar1729 Can you explain this heuristic?

    if before.target_list is None:
        batch_mem_size = self.context.db.config.get_value("executor", "batch_mem_size")
    else:
        # Calculate batch_mem_size based on the number of columns and available memory
        num_columns = len(before.target_list)
        batch_mem_size = int(available_memory * memory_fraction / num_columns)

@shikhar1729 (Author) commented

@jarulraj: Here is the explanation:

  1. Getting Available Memory: Retrieve the available system memory with psutil.virtual_memory().available.
  2. Memory Fraction: Allocate half of the available memory (memory_fraction = 0.5) to batch processing.
  3. Calculating batch_mem_size: If the query projects specific columns (before.target_list is not None), compute batch_mem_size = int(available_memory * memory_fraction / len(before.target_list)). If there is no target list (before.target_list is None), batch_mem_size is taken from the configuration.
  4. Bounding batch_mem_size: Clamp batch_mem_size to the range [100, 1000] for efficient processing.

This heuristic adjusts the batch size to the number of projected columns and the available memory, making better use of system resources during processing; a small worked example follows below.
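
A small worked example of the bounding in step 4, with made-up numbers (about 8 GiB of free memory and four projected columns). The max/min clamp is one plausible way to implement the bound; the exact bounding code is not quoted in this thread.

    # Hypothetical numbers to illustrate the bounding step.
    available_memory = 8 * 2**30   # assume roughly 8 GiB of free memory
    memory_fraction = 0.5
    num_columns = 4                # stand-in for len(before.target_list)

    raw_size = int(available_memory * memory_fraction / num_columns)  # 1073741824
    batch_mem_size = max(100, min(1000, raw_size))                    # clamped to 1000
    print(raw_size, batch_mem_size)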

@jarulraj (Member) commented

Closing this PR (subsumed in #1306).

@jarulraj closed this Oct 30, 2023