
Allow releasing GPU memory #4668

Closed
RAMitchell opened this issue Jul 16, 2019 · 11 comments

Comments

@RAMitchell
Member

One common piece of feedback we receive about the GPU algorithms is that memory is not released after training. It may be possible to release memory by deleting the booster object, but this is not a great user experience.

See
#4018
#3083
#2663
#3045

The reason we have not implemented this already is that the internal C++ code does not actually know when training is finished. The language bindings drive training one iteration at a time, and I don't believe the GPU training code has any way to tell whether another iteration is expected.
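
For context, this is roughly what the Python binding's training loop looks like (a simplified sketch, not the actual implementation); the C++ core only ever sees one boosting round at a time:

import xgboost as xgb

def simple_train(params, dtrain, num_boost_round):
    # Simplified stand-in for xgboost.train: the binding drives boosting
    # round by round, so the core cannot tell which call is the last one.
    bst = xgb.Booster(params, [dtrain])
    for i in range(num_boost_round):
        bst.update(dtrain, i)
    return bst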

I see a few solutions:

  1. We try to use some heuristic internally to decide if it is a good time to free all memory from inside GBTree.
  2. We implement an API function for cleanup. This function could be specific to GPU memory, or it could just be a general hint for xgboost to delete any working memory or temporary data structures. I do not like this option, as it will propagate through the entire code base: the learner, booster, updaters, and predictor would all have to implement these methods.
  3. We implement a method in the language bindings where the booster object serializes itself and then deserializes from disk. Doing this will clear all temporary data structures and should leave the booster in a usable state to resume training or do prediction.

I am leaning towards option 3), but I think it relies on #3980 to make sure all parameters are correctly saved. It may still be possible to do this with the current serialization without any unexpected side effects from parameters not all being saved.
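
On the Python side, a sketch of option 3) could look something like the following (the helper name is illustrative, and it assumes booster pickling keeps working as it does today):

import pickle

def release_training_memory(booster):
    # Round-trip the booster through serialization; the reloaded copy keeps the
    # trained trees but should drop temporary (GPU) training buffers.
    return pickle.loads(pickle.dumps(booster))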

@trivialfis @sriramch @rongou

@trivialfis
Member

Or we could pass num_boost_round to C++?

@seanthegreat7

For those looking for a quick workaround until this is fixed properly, check my solution here.

@trivialfis
Member

@seanthegreat7 Thanks. That's actually an interesting workaround.
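
For anyone who cannot follow the link, here is a minimal sketch of that style of workaround (not the exact linked code; the function names are illustrative): run the GPU training in a short-lived child process so all device memory is released when the process exits, then reload the saved model in the parent.

import multiprocessing as mp

import xgboost as xgb

def _fit_and_save(params, X, y, model_path):
    clf = xgb.XGBClassifier(**params)
    clf.fit(X, y)
    clf.get_booster().save_model(model_path)  # persist before the process exits

def train_in_child_process(params, X, y, model_path='gpu_model.bin'):
    ctx = mp.get_context('spawn')  # fresh process, so no inherited CUDA state
    p = ctx.Process(target=_fit_and_save, args=(params, X, y, model_path))
    p.start()
    p.join()
    booster = xgb.Booster()
    booster.load_model(model_path)  # the parent only ever holds the CPU-side model
    return booster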

@Lauler

Lauler commented Jul 18, 2019

None of the workarounds seem to be working on Windows 10. I tried deleting and reloading the booster object (it still crashed).

I also tried predicting in a subprocess similar to @seanthegreat7 (but for R instead of Python). The subprocess just ran indefinitely without finishing.

A solution for this issue would be greatly appreciated!

@jtromans

I'm finding this very difficult, especially when performing a wide parameter search in a loop of some kind.

For example:

exp_models= []
for cnt, mdl_version in enumerate(range(200)):
    clf = xgb.XGBClassifier(booster='gbtree', objective='binary:logistic', 
                tree_method='gpu_hist', n_gpus=1, gpu_id=1, n_estimators=30) 
    trained_model = clf.fit(X_train, y_train, verbose=False)
    exp_models.append(trained_model)

This will crash, since I guess each trained_model hangs around on the GPU indefinitely. Alternatively, if I use exp_models.append(trained_model.get_booster().copy()), all is well.
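
For clarity, the variant that works for me looks roughly like this:

exp_models = []
for cnt in range(200):
    clf = xgb.XGBClassifier(booster='gbtree', objective='binary:logistic',
                            tree_method='gpu_hist', n_gpus=1, gpu_id=1, n_estimators=30)
    clf.fit(X_train, y_train, verbose=False)
    # Keep a copied Booster rather than the fitted estimator, so the estimator
    # (and whatever it holds on the GPU) is free to be collected.
    exp_models.append(clf.get_booster().copy())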

However, I'm also running into the same issue when submitting numerous jobs via a Dask scheduler (note: not dask-xgboost).

In both cases I eventually get:

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  parallel_for failed: out of memory

I don't have a view on the best solution, but I would love to see this resolved.

@aviolov

aviolov commented Oct 28, 2019

My hack is to do this:

import pickle
import tempfile

import xgboost

xgbPredictor = xgboost.XGBRegressor(**self.xgb_params)
xgbPredictor.fit(Xs, ys)

# This hack should only be used if tree_method is gpu_hist or gpu_exact.
if self.xgb_params['tree_method'].startswith('gpu'):
    # Round-trip the fitted model through pickle; the reloaded copy no longer
    # holds on to the GPU-side training buffers.
    with tempfile.TemporaryFile() as dump_file:
        pickle.dump(xgbPredictor, dump_file)
        dump_file.seek(0)
        self.predictor_ = pickle.load(dump_file)
else:
    self.predictor_ = xgbPredictor

and it has solved my GPU memory leak.

@paantya

paantya commented Mar 2, 2020

Wouldn't it be easier to implement a function like the one in PyTorch?
For example:
torch.cuda.empty_cache()

@trivialfis
Member

It wouldn't be easier, but that's an option.

@paantya

paantya commented Mar 5, 2020

@trivialfis Do you (or someone else) plan to fix this problem at all?
Is it not like this in Dask and Spark?

@maxmetzger

I am running into this same issue when training many small gpu_hist models.

@trivialfis
Member

Could you please open a new issue?
