Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support column split in multi-target hist #9171

Merged
merged 23 commits into from
May 26, 2023

Conversation

rongou
Copy link
Contributor

@rongou rongou commented May 17, 2023

Also added support for categorical features in approx.

The main challenge is that the expand entry objects contain vector fields, which means the memory is not contiguous, so we cannot allgather the objects in one go. We have to allgather the primitive fields first, then separately allgather the vector fields.

An alternative may be to serialize the expand entry objects into a binary blob/json, do allgather, then deserialize them back. But that might have a bigger performance impact.

Future work may be to push AllgatherV into the communicator. I think we can implement it more efficiently in gRPC.

@rongou
Copy link
Contributor Author

rongou commented May 17, 2023

@trivialfis

@rongou
Copy link
Contributor Author

rongou commented May 18, 2023

@trivialfis @hcho3 any idea on the test failures on macos-11? I can't reproduce it on my linux box.

@trivialfis
Copy link
Member

Unfortunately, I don't have a macos device.

@hcho3
Copy link
Collaborator

hcho3 commented May 18, 2023

Let me try to reproduce it on my Macbook.

@rongou
Copy link
Contributor Author

rongou commented May 19, 2023

I started a mac instance on AWS and was able to reproduce the test failures. I think I know where the failures are coming from, but still need to do some debugging to figure out a fix.

@rongou
Copy link
Contributor Author

rongou commented May 23, 2023

@trivialfis finally got the CI to pass. Please take a look.

@@ -371,7 +369,8 @@ auto AddCategories(std::set<float> const &categories, HistogramCuts *cuts) {
InvalidCategory();
}
auto &cut_values = cuts->cut_values_.HostVector();
auto max_cat = *std::max_element(categories.cbegin(), categories.cend());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trivialfis I think this is a legit bug. If categories is empty, std::max_element returns categories.cend, which leads to undefined behavior when accessed.

Presumably with row-split categories would never be empty, but with column-split it's pretty common.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great if you can leave a comment there so that others don't accidentally remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@trivialfis trivialfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall

@trivialfis trivialfis merged commit 5b69534 into dmlc:master May 26, 2023
@trivialfis trivialfis mentioned this pull request Apr 28, 2023
39 tasks
@rongou rongou deleted the test-colsplit-hist branch September 25, 2023 16:40
@@ -698,6 +698,9 @@ void MetaInfo::Extend(MetaInfo const& that, bool accumulate_rows, bool check_col
this->feature_type_names = that.feature_type_names;
auto &h_feature_types = feature_types.HostVector();
LoadFeatureType(this->feature_type_names, &h_feature_types);
} else if (!that.feature_types.Empty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rongou Could you please help comment on when the meta info has feature types but doesn't have type names?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in some tests we set the feature types directly without setting the type names.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, let me try to run the tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants