
improved regressor memory usage by 60% #745

Open
poonai wants to merge 5 commits into PriorLabs:main from poonai:poonai/optimize_regressor

Conversation

@poonai
Contributor

@poonai poonai commented Jan 22, 2026

Issue

Fixes #354

Motivation and Context

The GitHub issue suggests removing the low-probability borders to reduce the regressor's memory usage. Instead, I refactored the existing code to achieve the same result: the proposed change incrementally accumulates the sum of estimator outputs, rather than collecting every estimator output first and summing them at the end.

existing code:

# existing: every estimator output is kept in a list until the end
outputs = []
for output in iter_estimator_output:
    outputs.append(output)

avg_mean = sum(outputs) / n_estimators

proposed code:

# proposed: keep only a running total, so a single accumulator is held at a time
total = 0
for output in iter_estimator_output:
    total += output

avg_mean = total / n_estimators
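
For concreteness, here is a minimal, self-contained sketch of the same pattern with torch tensors. The names `iter_estimator_output` and `n_estimators` mirror the pseudocode above; everything else is illustrative, not the actual TabPFN code. Only the accumulator and the current output are alive at any point, instead of a list of all per-estimator tensors.

import torch

def average_outputs(iter_estimator_output, n_estimators: int) -> torch.Tensor:
    """Average per-estimator outputs without materializing the full stack."""
    running_sum = None
    for output in iter_estimator_output:
        # Keep a single accumulator tensor instead of appending to a list.
        running_sum = output.clone() if running_sum is None else running_sum + output
    if running_sum is None:
        raise ValueError("No estimator outputs were produced (n_estimators == 0?).")
    return running_sum / n_estimators

# Example: 8 estimators, each producing a (batch, n_borders) logits tensor.
outputs = (torch.randn(4, 1000) for _ in range(8))
avg_mean = average_outputs(outputs, n_estimators=8)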

Public API Changes

  • No Public API changes
  • Yes, Public API changes (Details below)

How Has This Been Tested?

I've tested the memory usage by tweaking the fine-tuning example.

# Imports and constants needed to run this standalone; the constant values here
# are placeholders, use the ones from the fine-tuning example.
import sklearn.datasets
import torch
from sklearn.model_selection import train_test_split

from tabpfn import TabPFNRegressor

RANDOM_STATE = 42
NUM_ESTIMATORS_FINAL_INFERENCE = 8


def main() -> None:
    data = sklearn.datasets.fetch_california_housing(as_frame=True)
    X_all = data.data
    y_all = data.target

    X_train, X_test, y_train, y_test = train_test_split(
        X_all, y_all, test_size=0.1, random_state=RANDOM_STATE
    )

    print(
        f"Loaded {len(X_train):,} samples for training and "
        f"{len(X_test):,} samples for testing."
    )

    # Fit once, then run predict repeatedly and report GPU memory usage.
    base_reg = TabPFNRegressor(
        device=["cuda:0"],
        n_estimators=NUM_ESTIMATORS_FINAL_INFERENCE,
        ignore_pretraining_limits=True,
        inference_config={"SUBSAMPLE_SAMPLES": 50_000},
    )
    base_reg.fit(X_train, y_train)

    for _ in range(5):
        base_pred = base_reg.predict(X_test)

    gib = 1024 ** 3
    print(f"torch.cuda.memory_allocated: {torch.cuda.memory_allocated(0) / gib:f}GB")
    print(f"torch.cuda.memory_reserved: {torch.cuda.memory_reserved(0) / gib:f}GB")
    print(f"torch.cuda.max_memory_reserved: {torch.cuda.max_memory_reserved(0) / gib:f}GB")


if __name__ == "__main__":
    if not torch.cuda.is_available():
        raise RuntimeError(
            "CUDA is not available. Please run the script on a CUDA-enabled GPU."
        )
    main()

previous allocation:

torch.cuda.memory_allocated: 0.046939GB
torch.cuda.memory_reserved: 1.638672GB
torch.cuda.max_memory_reserved: 1.638672GB

current allocation:

torch.cuda.memory_allocated: 0.046939GB
torch.cuda.memory_reserved: 1.021484GB
torch.cuda.max_memory_reserved: 1.021484GB

gain

1.021 / 1.638 * 100 = 62.33, i.e. peak reserved memory drops to about 62% of its previous value (roughly a 38% reduction).

memory_reserved reports the memory currently held by PyTorch's caching allocator (and here it equals max_memory_reserved, i.e. the peak over the run), while memory_allocated reports the memory occupied by live tensors. I assume the tensors accumulated during prediction are released back to the allocator once predict returns, which is why memory_allocated is unchanged. I think it's fair to use the reduction in memory_reserved to argue the benefit of this code change. Please correct me if something is wrong.
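
For what it's worth, a slightly tighter way to read these counters is to reset the peak statistics just before the predict loop, so max_memory_reserved reflects only inference rather than anything allocated during fit. A sketch, reusing base_reg and X_test from the script above:

import torch

torch.cuda.reset_peak_memory_stats(0)  # clear the peak counters for device 0
for _ in range(5):
    base_pred = base_reg.predict(X_test)

gib = 1024 ** 3
print(f"memory_allocated (live tensors):        {torch.cuda.memory_allocated(0) / gib:.3f} GB")
print(f"memory_reserved (caching allocator):    {torch.cuda.memory_reserved(0) / gib:.3f} GB")
print(f"max_memory_reserved (peak since reset): {torch.cuda.max_memory_reserved(0) / gib:.3f} GB")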


Checklist

  • The changes have been tested locally.
  • Documentation has been updated (if the public API or usage changes).
  • A changelog entry has been added (see changelog/README.md), or "no changelog needed" label requested.
  • The code follows the project's style guidelines.
  • I have considered the impact of these changes on the public API.

@poonai poonai requested a review from a team as a code owner January 22, 2026 08:51
@poonai poonai requested review from oscarkey and removed request for a team January 22, 2026 08:51
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request successfully refactors the predict method in TabPFNRegressor to incrementally calculate the sum of estimator outputs, significantly reducing memory usage as demonstrated by the memory_reserved metric. This is a valuable improvement for efficiency. The change involves converting the previous forward method into an iterator _iter_forward_executor and then consuming this iterator in predict for memory-efficient aggregation. The original forward method is also updated to use the new iterator, but it still collects all outputs into lists, which might be intentional for fine-tuning purposes.
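
As a rough illustration of the pattern the review describes (a generator that yields one result per ensemble member, with the old list-based forward built on top of it), here is a self-contained sketch; the class, attributes, and helper below are hypothetical, not the actual TabPFN internals:

from collections.abc import Iterator

import numpy as np
import torch

class RegressorSketch:
    def __init__(self, estimators):
        # Hypothetical: each estimator is a callable returning (borders, logits).
        self.estimators = estimators

    def _iter_forward_executor(self, X) -> Iterator[tuple[np.ndarray, torch.Tensor]]:
        # Yield one (borders, logits) pair at a time so callers can aggregate
        # incrementally instead of stacking every estimator's output.
        for estimator in self.estimators:
            yield estimator(X)

    def forward(self, X):
        # Backward-compatible path (e.g. for fine-tuning): still collects all
        # outputs into lists, now built on top of the iterator.
        borders, logits = [], []
        for b, l in self._iter_forward_executor(X):
            borders.append(b)
            logits.append(l)
        return borders, logits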

     return logit_to_output(output_type=output_type)

-def forward(
+def _iter_forward_executor(
Contributor


medium

There's a typo in the return type hint for _iter_forward_executor. np.ndaarray should be np.ndarray.

    ) -> Iterator[tuple[np.ndarray, torch.Tensor]]:

Contributor Author

@poonai poonai Jan 22, 2026


resolved

@poonai
Contributor Author

poonai commented Jan 29, 2026

@oscarkey friendly ping to review this PR.

This is a small change and shouldn't take long to review. I know you may be busy with other work, so no rush at all. Reviews like this encourage me to make more open-source contributions.

@oscarkey
Contributor

hey @poonai , sorry for the delay. I'll get to both your prs on monday :)

Signed-off-by: balaji <rbalajis25@gmail.com>
- Use _iter_forward_executor directly instead of forward method
- Transform probabilities across borders inside the loop
- Average ensemble outputs on-the-fly instead of accumulating all outputs

This reduces memory usage by avoiding storage of all intermediate outputs, especially beneficial for large n_estimators.

Co-Authored-By: glm4.5
Signed-off-by: balaji <rbalajis25@gmail.com>
@poonai poonai force-pushed the poonai/optimize_regressor branch from 6b78dfd to 962b2a3 Compare February 4, 2026 08:02
@poonai
Contributor Author

poonai commented Feb 4, 2026

@gemini-code-assist review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively reduces memory usage in the regressor's predict method by processing estimator outputs iteratively instead of accumulating them in memory. The refactoring is well-executed, extracting the iterative logic into a new _iter_forward_executor method while maintaining the previous forward method's behavior for backward compatibility. I've identified one potential issue regarding the handling of cases where n_estimators is zero, which could lead to a runtime error. My feedback includes a suggestion to make the code more robust against this scenario.

Comment on lines 940 to +943

     if self.average_before_softmax:
-        logits = stacked_logits.log().mean(dim=0).softmax(dim=-1)
+        logits = (averaged_logits / n_estimators).softmax(dim=-1)  # type: ignore
     else:
-        logits = stacked_logits.mean(dim=0)
+        logits = averaged_logits / n_estimators  # type: ignore
Contributor


high

If _iter_forward_executor yields no items (e.g., if n_estimators=0), averaged_logits will remain None. This will cause a TypeError when it's used in the division on lines 941 or 943. The existing type: ignore comments suppress this potential runtime error.

To make the method more robust, I suggest adding a check to ensure averaged_logits is not None before proceeding with the calculation. This will provide a clearer error message if no estimators were run and allows for the removal of the type: ignore comments.

Suggested change

-    if self.average_before_softmax:
-        logits = stacked_logits.log().mean(dim=0).softmax(dim=-1)
-        logits = (averaged_logits / n_estimators).softmax(dim=-1)  # type: ignore
-    else:
-        logits = stacked_logits.mean(dim=0)
-        logits = averaged_logits / n_estimators  # type: ignore
+    if averaged_logits is None:
+        raise ValueError("Cannot make predictions, possibly due to `n_estimators=0`.")
+    elif self.average_before_softmax:
+        logits = (averaged_logits / n_estimators).softmax(dim=-1)
+    else:
+        logits = averaged_logits / n_estimators



Development

Successfully merging this pull request may close these issues.

Regressor memory increase with each estimator iteration

2 participants