Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature interaction constraint for GPU Hist. #4488

Closed
wants to merge 14 commits into from

Conversation

trivialfis
Copy link
Member

@trivialfis trivialfis commented May 22, 2019

This is still WIP, my approach is to just drop all features that don't comply to constraints before entering split evaluation. Not sure about how to merge it with CPU SplitEvaluator yet.

GPU Exact will be left to another PR. Initially I tried to implement a kernel side evaluator that rejects features during split evaluation, see: 70a5936 , which is much more complicated and now replaced, but might still be useful for GPU EXACT.

related to #4169

@RAMitchell

@RAMitchell
Copy link
Member

I don't really care about gpu_exact and am willing to deprecate the algorithm in future. Probably after we do some work to improve the performance of sparse data sets for gpu_hist.

@trivialfis trivialfis marked this pull request as ready for review May 23, 2019 16:26
@trivialfis
Copy link
Member Author

trivialfis commented May 23, 2019

I'm not sure about why VC failed to build generated host stub from host_device_vector.cu.

@hcho3
Copy link
Collaborator

hcho3 commented May 23, 2019

@trivialfis I found https://issues.jenkins-ci.org/plugins/servlet/mobile#issue/JENKINS-9104. Looks like we have issues running multiple MSBuild jobs running in parallel.

@trivialfis
Copy link
Member Author

@hcho3 Could you help working around it? From what Andreas said, it seems to be a matter of setting envs.

@trivialfis trivialfis changed the title [WIP] Feature interaction constraint for GPU Hist. Feature interaction constraint for GPU Hist. May 23, 2019
@hcho3
Copy link
Collaborator

hcho3 commented May 23, 2019

Let me see if I can improve stability of Windows tests

@trivialfis
Copy link
Member Author

trivialfis commented May 24, 2019

@RAMitchell

Probably after we do some work to improve the performance of sparse data sets for gpu_hist.

I have been thinking about that too. Perhaps the feature bundling from lgb would help, but it has memory usage issue as mentioned by @hcho3 in #4354 (comment)

@trivialfis
Copy link
Member Author

This PR uses SplitEvaluator to enable feature interaction support for GPU Hist. This avoids creating another interaction constraint implementation, but still we need to think about how to merge the code path of other split evaluators. One thing worth noting is that, interaction constraint is different from other split evaluators in two ways, it has dynamic size buffer to track tree splits, and it doesn't have to be used during actual split evaluation. So in the future we might want to distinguish it from others.

@RAMitchell
Copy link
Member

I think we need to build a DeviceSplitEvaluator class that shadows the functionality of SplitEvaluator but uses device memory internally and may be updated on the device (e.g. when registering new splits I do not want to copy memory up and down, this is a very large performance penalty). Do you think this is possible?

@trivialfis
Copy link
Member Author

trivialfis commented May 25, 2019

@RAMitchell

For feature interaction constraint (FIC), it won't be necessary since feature sampling has already been using host memory, FIC does not do any extra host device copying.

For other split evaluators, I don't think the current abstract interface of SplitEvaluator is useful, it's just making things more complicated than necessary. There are only 3 of them, technically FIC is better used for feature selection than gradient gain evaluation. So 2 ifs are all we need ...

@RAMitchell
Copy link
Member

Its not strictly true that its already doing a copy - I think if column sampling is 1.0 the column sampler always returns the same HostDeviceVector and no copy occurs. The cost of introducing a single small memcpy between host and device for each node may be about 30% increased runtime.

@trivialfis
Copy link
Member Author

trivialfis commented May 25, 2019

@RAMitchell Sorry for the ambiguity. I'm aware of this. The correct term is: no more than using feature sampling.

The difficulties of implementing this on device are:

  • It needs to use set. Children merge their parents' constraint set during split. Ordering doesn't matter, but no duplication is allowed. Situations like [[0, 2], [1, 3, 4], [5, 6], [3, 5]] where 3 appears in two different set can happen. Say root splits at 1, and its left child splits at 5. During split the two set [1, 3, 4], [3, 5] needs to be merged into child because children should be able to interact with existing parent's split (also this is what the current implementation do).

  • FIC can come in with varying size, from single element [[0]] to all features.

  • Also device memory allocation is inevitable since every node needs to have its own set. Combined with first difficulty, to avoid copying, we need to 1. compute the size of set on device (the information is not copied to host). 2. Allocate memory on host. 3 Come back to device and compute the actual set. I assume this is not much better than copying memory. Also, combined with difficulty 2, the computation won't be very efficient.

Currently I don't have any good idea for how to meet your requirement yet. But suggestions are welcomed. If it's any consolation, I plan to bring back feature grouping to reduce sparsity #4501, which should bring some nice improvement for GPU Hist after we support it.

@RAMitchell
Copy link
Member

To implement a set I would use a boolean (bit?) vector of length n_features. You should know how much memory to allocate ahead of time because the maximum number of nodes is always constrained in the GPU algorithm and we know how many interaction sets there are. Should be fun to try and implement :)

Looking at monotone constraints and feature interaction constraints at a higher level, it seems to me that there is a desire from users to directly influence the optimisation process of tree construction. I wonder if there is a more general way of specifying this as an interface? Probably not but interesting to think about.

@trivialfis
Copy link
Member Author

@RAMitchell Let me wrap up other PRs first so I can focus on this.

@trivialfis
Copy link
Member Author

Closing for now, will open a new PR when it's ready.

@trivialfis trivialfis closed this May 29, 2019
@trivialfis trivialfis deleted the feature-interaction-rebase branch May 29, 2019 15:13
@trivialfis trivialfis mentioned this pull request Jun 6, 2019
3 tasks
@lock lock bot locked as resolved and limited conversation to collaborators Aug 27, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants