The current prompt_builder.cpp implementation in src/core/functions requires several optimizations and refactoring steps to improve efficiency and memory usage when processing DataChunks. Below are the main issues identified:
Function Naming and Parameter Handling:
Rename GetMaxLengthValues to GetMaxTokenLengthPerAttribute for clarity.
Optimization: Update this function to compute the maximum token length directly over the DataChunk vectors, bypassing the JSON-related overhead.
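A minimal sketch of what GetMaxTokenLengthPerAttribute could look like, assuming columns are modeled as vectors of strings standing in for DataChunk vectors, and assuming a rough 4-characters-per-token heuristic (the real implementation would scan the actual DataChunk and may use a proper tokenizer):

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Rough token estimate: ~4 characters per token (an assumption for
// illustration; a real tokenizer would be more accurate).
static size_t EstimateTokenCount(const std::string &value) {
	return (value.size() + 3) / 4;
}

// Sketch: scan each column's values directly and record the largest
// token estimate per attribute, with no intermediate JSON document.
static std::vector<size_t>
GetMaxTokenLengthPerAttribute(const std::vector<std::vector<std::string>> &columns) {
	std::vector<size_t> max_lengths(columns.size(), 0);
	for (size_t col = 0; col < columns.size(); col++) {
		for (const auto &value : columns[col]) {
			max_lengths[col] = std::max(max_lengths[col], EstimateTokenCount(value));
		}
	}
	return max_lengths;
}
```

The per-attribute maxima can then feed the request-sizing logic described below without ever serializing the chunk.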
Concatenation and String Allocation:
In CombineValues, while += concatenation is currently manageable, the approach could be improved by allocating the full string size upfront to prevent reallocation overhead. This is especially relevant as the code is likely to be reused in scenarios with aggregate data handling.
Loop Optimization: Remove pop_back usage for trimming the last space after concatenation. Instead, concatenate up to N-1 times and handle the last item separately, reducing unnecessary operations.
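The two suggestions above (upfront allocation and dropping the pop_back trim) can be sketched as follows; the function name and signature mirror the CombineValues described in this issue, but the exact interface is an assumption:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch of the suggested CombineValues rewrite: reserve the full size
// upfront, then append the first N-1 values each followed by a space and
// the last value without one, so no pop_back trim is needed.
static std::string CombineValues(const std::vector<std::string> &values) {
	if (values.empty()) {
		return "";
	}
	// Total size: all value lengths plus one separator space between items.
	size_t total = values.size() - 1;
	for (const auto &value : values) {
		total += value.size();
	}
	std::string result;
	result.reserve(total); // single allocation; += below never reallocates
	for (size_t i = 0; i + 1 < values.size(); i++) {
		result += values[i];
		result += ' ';
	}
	result += values.back();
	return result;
}
```

With reserve sized exactly, every += appends into already-owned storage, which matters once this path is reused for aggregate data handling over many rows.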
DataChunk to JSON Transformation:
Avoid mapping entire DataChunks to JSON before chunking. Although JSON is currently used as a temporary solution, the goal is to remove this dependency.
For efficient memory handling, calculate and set the maximum number of tuples per LLM request based on token size limits, allocating memory in advance on the stack.
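One way to derive the tuples-per-request cap from the token budget and the per-attribute maxima; the helper name, the budget parameter, and the worst-case sizing rule are assumptions for illustration:

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Sketch: compute how many tuples fit into one LLM request given the
// model's token budget and the widest observed value per attribute.
// Sizing by the worst-case tuple keeps the bound safe, so downstream
// buffers can be allocated once, in advance.
static size_t MaxTuplesPerRequest(const std::vector<size_t> &max_tokens_per_attribute,
                                  size_t token_budget) {
	// Worst-case tokens for one tuple: sum of each attribute's maximum.
	size_t per_tuple = std::accumulate(max_tokens_per_attribute.begin(),
	                                   max_tokens_per_attribute.end(), size_t(0));
	if (per_tuple == 0) {
		return 0;
	}
	return token_budget / per_tuple;
}
```

Because the result is bounded (DataChunks themselves hold a fixed maximum number of tuples), the per-request staging buffers can use fixed-size, stack-allocated storage rather than growing heap containers.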
Prompt Rendering and Inja Removal:
In the long term, replace Inja for prompt rendering. Implement a custom rendering solution directly with DataChunks to avoid intermediary JSON representations and reduce memory usage.
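A minimal sketch of what an Inja replacement could look like: substituting {{name}} placeholders in the prompt template directly from a row's column values, with no intermediate JSON document. The placeholder syntax mirrors Inja's; the function name and the map-based row representation are assumptions, since the real version would read straight from DataChunk vectors:

```cpp
#include <string>
#include <unordered_map>

// Sketch: scan the template once, copying literal text through and
// replacing each {{key}} with the row's value for that column.
// Unknown keys render as empty strings here; real code might error instead.
static std::string RenderPrompt(const std::string &tmpl,
                                const std::unordered_map<std::string, std::string> &row) {
	std::string out;
	out.reserve(tmpl.size());
	size_t pos = 0;
	while (pos < tmpl.size()) {
		size_t open = tmpl.find("{{", pos);
		if (open == std::string::npos) {
			out.append(tmpl, pos, std::string::npos);
			break;
		}
		size_t close = tmpl.find("}}", open + 2);
		if (close == std::string::npos) {
			out.append(tmpl, pos, std::string::npos);
			break;
		}
		out.append(tmpl, pos, open - pos);       // literal text before {{
		std::string key = tmpl.substr(open + 2, close - open - 2);
		auto it = row.find(key);
		if (it != row.end()) {
			out += it->second;                   // substituted value
		}
		pos = close + 2;                         // continue after }}
	}
	return out;
}
```

A single-pass renderer like this avoids both the JSON materialization and Inja's template parse on every row, at the cost of supporting only plain substitution.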