Improved regressor memory usage by 60% #745
Conversation
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Code Review
The pull request successfully refactors the predict method in TabPFNRegressor to incrementally calculate the sum of estimator outputs, significantly reducing memory usage as demonstrated by the memory_reserved metric. This is a valuable improvement for efficiency. The change involves converting the previous forward method into an iterator _iter_forward_executor and then consuming this iterator in predict for memory-efficient aggregation. The original forward method is also updated to use the new iterator, but it still collects all outputs into lists, which might be intentional for fine-tuning purposes.
```diff
     return logit_to_output(output_type=output_type)

-    def forward(
+    def _iter_forward_executor(
```
@oscarkey friendly ping to review this PR. It's a small change and shouldn't take much time to review. I know you may be busy with other work, so no rush at all. Reviews like this encourage me to make more open-source contributions.
hey @poonai, sorry for the delay. I'll get to both your PRs on Monday :)
Signed-off-by: balaji <rbalajis25@gmail.com>
- Use `_iter_forward_executor` directly instead of the `forward` method
- Transform probabilities across borders inside the loop
- Average ensemble outputs on-the-fly instead of accumulating all outputs

This reduces memory usage by avoiding storage of all intermediate outputs, which is especially beneficial for large `n_estimators`.

Co-Authored-By: glm4.5
Signed-off-by: balaji <rbalajis25@gmail.com>
Force-pushed from 6b78dfd to 962b2a3
@gemini-code-assist review
Code Review
This pull request effectively reduces memory usage in the regressor's predict method by processing estimator outputs iteratively instead of accumulating them in memory. The refactoring is well-executed, extracting the iterative logic into a new _iter_forward_executor method while maintaining the previous forward method's behavior for backward compatibility. I've identified one potential issue regarding the handling of cases where n_estimators is zero, which could lead to a runtime error. My feedback includes a suggestion to make the code more robust against this scenario.
```diff
 if self.average_before_softmax:
-    logits = stacked_logits.log().mean(dim=0).softmax(dim=-1)
+    logits = (averaged_logits / n_estimators).softmax(dim=-1)  # type: ignore
 else:
-    logits = stacked_logits.mean(dim=0)
+    logits = averaged_logits / n_estimators  # type: ignore
```
If _iter_forward_executor yields no items (e.g., if n_estimators=0), averaged_logits will remain None. This will cause a TypeError when it's used in the division on lines 941 or 943. The existing type: ignore comments suppress this potential runtime error.
To make the method more robust, I suggest adding a check to ensure averaged_logits is not None before proceeding with the calculation. This will provide a clearer error message if no estimators were run and allows for the removal of the type: ignore comments.
```diff
-if self.average_before_softmax:
-    logits = (averaged_logits / n_estimators).softmax(dim=-1)  # type: ignore
-else:
-    logits = averaged_logits / n_estimators  # type: ignore
+if averaged_logits is None:
+    raise ValueError("Cannot make predictions, possibly due to `n_estimators=0`.")
+elif self.average_before_softmax:
+    logits = (averaged_logits / n_estimators).softmax(dim=-1)
+else:
+    logits = averaged_logits / n_estimators
```
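The guard suggested above can be demonstrated with a minimal, tensor-free stand-in for the averaging loop. The function name and list-based "logits" here are illustrative only, not the real TabPFN method:

```python
# Hypothetical simplified version of the running-average logic, with the
# suggested None-check guarding against an empty estimator iterator.
def average_logits(per_estimator_logits):
    averaged = None
    n = 0
    for logits in per_estimator_logits:
        # accumulate an element-wise running sum instead of storing all outputs
        averaged = list(logits) if averaged is None else [
            a + l for a, l in zip(averaged, logits)
        ]
        n += 1
    if averaged is None:
        raise ValueError("Cannot make predictions, possibly due to `n_estimators=0`.")
    return [a / n for a in averaged]

print(average_logits([[2.0, 4.0], [4.0, 6.0]]))  # → [3.0, 5.0]
try:
    average_logits([])  # no estimators: raises instead of a confusing TypeError
except ValueError as e:
    print("raised:", e)
```

The explicit `ValueError` turns a latent `TypeError` (dividing `None`) into a clear diagnostic and lets the `type: ignore` comments be dropped.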
Issue
Fixes #354
Motivation and Context
The GitHub issue suggests removing the low-probability borders to reduce the regressor's memory usage. Instead, I refactored the existing code to achieve the same result: the proposed change incrementally accumulates the sum of estimator outputs, rather than storing all estimator outputs and summing them afterwards.
existing code:
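(The original snippet isn't shown here; the following is a tensor-free sketch of the accumulating pattern the PR replaces, with illustrative names rather than the actual TabPFN code.)

```python
# Hypothetical sketch of the old pattern: every per-estimator output is
# stored in a list before averaging, so peak memory grows linearly with
# n_estimators.
def predict_accumulating(estimator_outputs):
    outputs = []                      # holds every per-estimator output
    for out in estimator_outputs:
        outputs.append(out)           # O(n_estimators) memory
    n = len(outputs)
    # element-wise mean over all stored outputs
    return [sum(vals) / n for vals in zip(*outputs)]

print(predict_accumulating([[1.0, 2.0], [3.0, 4.0]]))  # → [2.0, 3.0]
```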
proposed code:
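(Again sketched with illustrative, tensor-free names rather than the actual TabPFN code: a running sum replaces the stored list, so memory stays constant in the number of estimators.)

```python
# Hypothetical sketch of the proposed pattern: keep only a running sum
# and a count, then divide once at the end.
def predict_incremental(estimator_outputs):
    running_sum = None
    n = 0
    for out in estimator_outputs:
        if running_sum is None:
            running_sum = list(out)
        else:
            # element-wise addition; only one output is live at a time
            running_sum = [s + o for s, o in zip(running_sum, out)]
        n += 1
    if running_sum is None:
        raise ValueError("no estimator outputs (n_estimators=0?)")
    return [s / n for s in running_sum]

print(predict_incremental([[1.0, 2.0], [3.0, 4.0]]))  # → [2.0, 3.0]
```

Both versions return the same mean; only the peak memory profile differs.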
Public API Changes
How Has This Been Tested?
I've tested the memory usage by tweaking the fine-tuning example.
previous allocation:
current allocation:
gain
1.021/1.638 * 100 = 62.33%

`memory_reserved` is the total memory allocated during the entire life of the program, while `memory_allocated` represents the current allocation of the program. I assume the accumulated memory is released to the allocator after the `predict` function returns, hence there is no change in `memory_allocated`. I think it's fair to use the gain in the `memory_reserved` metric to argue the benefit of this code change. Please correct me if something is wrong.

Checklist
- [ ] Changelog updated (changelog/README.md), or "no changelog needed" label requested.