Feature interaction constraint for GPU Hist. #4488
Conversation
* Add interaction constraint for GPU_HIST.
I don't really care about gpu_exact and am willing to deprecate the algorithm in the future, probably after we do some work to improve the performance of gpu_hist on sparse data sets.
I'm not sure why VC failed to build the generated host stub from …
@trivialfis I found https://issues.jenkins-ci.org/plugins/servlet/mobile#issue/JENKINS-9104. Looks like we have issues with multiple MSBuild jobs running in parallel.
@hcho3 Could you help work around it? From what Andreas said, it seems to be a matter of setting environment variables.
Let me see if I can improve the stability of the Windows tests.
I have been thinking about that too. Perhaps the feature bundling from LightGBM would help, but it has a memory usage issue, as mentioned by @hcho3 in #4354 (comment).
This PR uses …
I think we need to build a DeviceSplitEvaluator class that shadows the functionality of SplitEvaluator but uses device memory internally and can be updated on the device (e.g. when registering new splits, I do not want to copy memory up and down; this is a very large performance penalty). Do you think this is possible?
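A minimal host-side sketch of the interface shape being asked for here. All names (DeviceSplitEvaluator, ComputeSplitScore, AddSplit) are hypothetical rather than the PR's actual code, and std::vector stands in for device buffers:

```cpp
#include <vector>

// Hypothetical device-resident evaluator: per-node state is allocated once
// up front and mutated in place, so registering a split never copies memory
// between host and device. std::vector stands in for a device buffer here.
class DeviceSplitEvaluator {
 public:
  DeviceSplitEvaluator(int max_nodes, float lower, float upper)
      : lower_(max_nodes, lower), upper_(max_nodes, upper) {}

  // Would run inside the evaluation kernel: adjust the raw gain using the
  // per-node constraint state (constraint logic elided in this sketch).
  float ComputeSplitScore(int nid, int fid, float gain) const {
    (void)nid; (void)fid;
    return gain;
  }

  // Registers a chosen split by propagating the parent's state to its
  // children in place, with no host-device round trip.
  void AddSplit(int nid, int left, int right) {
    lower_[left] = lower_[right] = lower_[nid];
    upper_[left] = upper_[right] = upper_[nid];
  }

 private:
  std::vector<float> lower_, upper_;  // per-node bounds, device-resident in practice
};
```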
For feature interaction constraints (FIC), it won't be necessary, since feature sampling already uses host memory; FIC does not do any extra host-device copying. For other split evaluators, I don't think the current abstract interface of …
It's not strictly true that it's already doing a copy: I think if column sampling is 1.0, the column sampler always returns the same HostDeviceVector and no copy occurs. The cost of introducing a single small memcpy between host and device for each node may be about a 30% increase in runtime.
@RAMitchell Sorry for the ambiguity; I'm aware of this. What I meant is: no more copying than feature sampling already does. The difficulties of implementing this on device are:
Currently I don't have any good ideas for how to meet your requirement, but suggestions are welcome. If it's any consolation, I plan to bring back feature grouping to reduce sparsity (#4501), which should bring some nice improvements for GPU Hist once we support it.
To implement a set I would use a boolean (bit?) vector of length n_features. You should know how much memory to allocate ahead of time, because the maximum number of nodes is always constrained in the GPU algorithm and we know how many interaction sets there are (see the sketch below). Should be fun to try and implement :)

Looking at monotone constraints and feature interaction constraints at a higher level, it seems to me that there is a desire from users to directly influence the optimisation process of tree construction. I wonder if there is a more general way of specifying this as an interface? Probably not, but it's interesting to think about.
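A host-side sketch of that layout, assuming 64-bit words and illustrative names; on the device the backing store would be a GPU buffer rather than std::vector:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit vector of length n_features per node, preallocated because the
// GPU algorithm bounds the node count (e.g. 2^(max_depth+1) - 1).
class FeatureSets {
 public:
  FeatureSets(int max_nodes, int n_features)
      : words_per_node_((n_features + 63) / 64),
        bits_(static_cast<size_t>(max_nodes) * words_per_node_, 0) {}

  // Mark feature `fid` as allowed at node `nid`.
  void Allow(int nid, int fid) { bits_[Word(nid, fid)] |= Mask(fid); }

  // Query whether feature `fid` is allowed at node `nid`.
  bool Allowed(int nid, int fid) const {
    return bits_[Word(nid, fid)] & Mask(fid);
  }

 private:
  size_t Word(int nid, int fid) const {
    return static_cast<size_t>(nid) * words_per_node_ + fid / 64;
  }
  static uint64_t Mask(int fid) { return uint64_t{1} << (fid % 64); }

  int words_per_node_;
  std::vector<uint64_t> bits_;  // a device buffer in the real implementation
};
```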
@RAMitchell Let me wrap up other PRs first so I can focus on this.
Closing for now, will open a new PR when it's ready.
This is still WIP. My approach is to simply drop all features that don't comply with the constraints before entering split evaluation, as sketched below. I'm not sure yet how to merge it with the CPU SplitEvaluator.
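A host-side sketch of that filtering step, under one reading of the constraint rule (a feature is kept only if some interaction set contains it together with every feature already used on the path to the node); all names are illustrative, and the exact semantics follow the CPU SplitEvaluator:

```cpp
#include <set>
#include <vector>

// Filter the candidate features for a node before split evaluation, so
// rejected features never reach the evaluator at all.
std::vector<int> AllowedFeatures(
    const std::vector<std::set<int>>& interaction_sets,
    const std::set<int>& used_on_path,
    const std::vector<int>& candidates) {
  std::vector<int> out;
  for (int fid : candidates) {
    for (const auto& s : interaction_sets) {
      // Keep `fid` if this set contains it plus all features on the path.
      bool ok = s.count(fid) > 0;
      for (int used : used_on_path) {
        if (!s.count(used)) { ok = false; break; }
      }
      if (ok) { out.push_back(fid); break; }
    }
  }
  return out;
}
```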
GPU Exact will be left to another PR. Initially I tried to implement a kernel-side evaluator that rejects features during split evaluation (see 70a5936), which is much more complicated and has now been replaced, but it might still be useful for GPU Exact.
related to #4169
@RAMitchell