support qid format like ranklib for ranking task #2748
Comments
👍 This, unlike the current group data format, would allow distributing ranking on Spark.
liuliang01
pushed a commit
to liuliang01/xgboost
that referenced
this issue
Sep 26, 2017
hcho3
pushed a commit
to liuliang01/xgboost
that referenced
this issue
Jun 29, 2018
hcho3
pushed a commit
that referenced
this issue
Jun 30, 2018
* add qid for #2748
* change names
* change spaces
* change qid to bst_uint type
* change qid type to size_t
* change qid first to SIZE_MAX
* change qid type from size_t to uint64_t
* update dmlc-core
* fix qids name error
* fix group_ptr_ error
* Style fix
* Add qid handling logic to SparsePage
* New MetaInfo format + backward compatibility fix
  Old MetaInfo format (1.0) doesn't contain qid field. We still want to be able to read from MetaInfo files saved in old format. Also, define a new format (2.0) that contains the qid field. This way, we can distinguish files that contain qid and those that do not.
* Update MetaInfo test
* Simply group assignment logic
* Explicitly set qid=nullptr in NativeDataIter
  NativeDataIter's callback does not support qid field. Users of NativeDataIter will need to call setGroup() function separately to set group information.
* Save qids_ in SaveBinary()
* Upgrade dmlc-core submodule
* Add a test for reading qid
* Add contributor
* Check the size of qids_
* Document qid format
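The commit message mentions both per-row `qids_` and `group_ptr_`. As a rough illustration of how one could be derived from the other, here is a minimal sketch, not the code from the commit. The function name `BuildGroupPtr` is hypothetical, and the sketch assumes that rows belonging to the same query are stored contiguously.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, for illustration only (not an XGBoost function).
// Builds group boundaries from per-row query ids, assuming rows of the
// same query are contiguous. Group g spans rows [ptr[g], ptr[g + 1]),
// so the number of groups is ptr.size() - 1.
std::vector<unsigned> BuildGroupPtr(const std::vector<std::uint64_t>& qids) {
  std::vector<unsigned> group_ptr;
  if (qids.empty()) return group_ptr;  // no rows, no groups
  group_ptr.push_back(0);              // first group starts at row 0
  for (std::size_t i = 1; i < qids.size(); ++i) {
    // A change in qid marks the start of a new group.
    if (qids[i] != qids[i - 1]) group_ptr.push_back(static_cast<unsigned>(i));
  }
  group_ptr.push_back(static_cast<unsigned>(qids.size()));  // end sentinel
  return group_ptr;
}
```

With qids `1 1 2 2 2 5`, this yields boundaries `0 2 5 6`, i.e. three groups of sizes 2, 3 and 1.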
CodingCat
pushed a commit
to CodingCat/xgboost
that referenced
this issue
Jul 26, 2018
For ranking problems, we have group files that split the data into query sessions.
But the group file seems to work only on a single machine, not in a row-split distributed environment.
Can we add a 'qid' column like RankLib?
Source code: src/data/data.cc
    // backward compatibility code.
    if (!load_row_split) {
      MetaInfo& info = dmat->info();
      if (MetaTryLoadGroup(fname + ".group", &info.group_ptr) && !silent) {
        LOG(CONSOLE) << info.group_ptr.size() - 1
                     << " groups are loaded from " << fname << ".group";
      }
    }
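To make the request concrete, here is a minimal sketch of reading one RankLib-style line of the form `<label> qid:<id> <index>:<value> ...`. The `RankRow` struct and `ParseRankLine` function are hypothetical names chosen for illustration. They are not part of XGBoost or dmlc-core, and the parsing is deliberately lenient.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type, for illustration only.
struct RankRow {
  float label = 0.0f;
  std::uint64_t qid = 0;
  std::vector<std::pair<unsigned, float>> features;  // (index, value)
};

// Parses one line such as "2 qid:7 1:0.5 3:1.25".
// Returns false if the label or qid token is missing, or if a
// feature token is not of the form index:value.
bool ParseRankLine(const std::string& line, RankRow* out) {
  std::istringstream in(line);
  std::string token;
  if (!(in >> out->label)) return false;
  // The second token must carry the query id.
  if (!(in >> token) || token.compare(0, 4, "qid:") != 0) return false;
  try {
    out->qid = std::stoull(token.substr(4));
    out->features.clear();
    while (in >> token) {
      const std::size_t colon = token.find(':');
      if (colon == std::string::npos) return false;
      out->features.emplace_back(
          static_cast<unsigned>(std::stoul(token.substr(0, colon))),
          std::stof(token.substr(colon + 1)));
    }
  } catch (const std::exception&) {
    return false;  // a number could not be converted
  }
  return true;
}
```

Because every row carries its own query id, a row-split partition of the file still knows which query each row belongs to, whereas a separate `.group` file only lists group sizes for the file as a whole.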