support qid format like ranklib for ranking task #2748

Closed
liuliang01 opened this issue Sep 26, 2017 · 1 comment

Comments

@liuliang01
Contributor

For ranking problems, we have group files that split the data into query sessions.
But the group file seems to work only on a single machine, not in a row-split distributed environment.
Can we add a 'qid' column like RankLib?

Source: src/data/data.cc

    // backward compatibility code.
    if (!load_row_split) {
      MetaInfo& info = dmat->info();
      if (MetaTryLoadGroup(fname + ".group", &info.group_ptr) && !silent) {
        LOG(CONSOLE) << info.group_ptr.size() - 1
                     << " groups are loaded from " << fname << ".group";
      }
    }
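
For reference, RankLib-style input carries the query ID on every row, in the form `<label> qid:<id> <index>:<value> ...`, so no side file is needed. A minimal sketch of parsing one such row is below. `RankRow` and `ParseRankRow` are hypothetical names used only for illustration; they are not part of XGBoost or dmlc-core, and the real parser may handle more cases (comments, weights, and so on).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for one parsed row; not an XGBoost type.
struct RankRow {
  float label = 0.0f;
  uint64_t qid = 0;
  std::vector<std::pair<uint32_t, float>> features;  // (index, value)
};

// Parses "<label> qid:<id> <index>:<value> ...".
// Returns false if the row does not follow that layout.
bool ParseRankRow(const std::string& line, RankRow* out) {
  assert(out != nullptr);
  std::istringstream in(line);
  RankRow row;
  std::string token;
  if (!(in >> row.label)) return false;
  // The second field must be the query ID.
  if (!(in >> token) || token.compare(0, 4, "qid:") != 0) return false;
  try {
    row.qid = std::stoull(token.substr(4));
    while (in >> token) {
      const std::size_t colon = token.find(':');
      if (colon == std::string::npos) return false;
      row.features.emplace_back(
          static_cast<uint32_t>(std::stoul(token.substr(0, colon))),
          std::stof(token.substr(colon + 1)));
    }
  } catch (const std::exception&) {
    return false;  // a field that should be numeric was not
  }
  *out = std::move(row);
  return true;
}
```

Calling `ParseRankRow("3 qid:7 1:0.5 4:2", &row)` would fill `row` with label 3, query ID 7, and two features. A row without the `qid:` field is rejected in this sketch.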

@superbobry
Contributor

👍 This, unlike the current group data format, would allow distributing ranking on Spark.
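
To illustrate the point about distribution: with a qid on every row, each worker can rebuild group boundaries from the rows it holds, without a `.group` file that describes the whole dataset. A minimal sketch follows, assuming rows of the same query are contiguous within a shard and that no query is split across shards. `GroupPtrFromQids` is a hypothetical helper, not the function XGBoost uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds group boundaries from per-row query IDs.
// Group k covers rows [group_ptr[k], group_ptr[k + 1]), so the number of
// groups is group_ptr.size() - 1, as in the log message quoted above.
std::vector<std::size_t> GroupPtrFromQids(const std::vector<uint64_t>& qids) {
  std::vector<std::size_t> group_ptr{0};
  for (std::size_t i = 1; i < qids.size(); ++i) {
    // A change in qid marks the start of a new group.
    if (qids[i] != qids[i - 1]) group_ptr.push_back(i);
  }
  if (!qids.empty()) group_ptr.push_back(qids.size());
  return group_ptr;
}
```

For qids `{1, 1, 2, 2, 2, 5}` this yields boundaries `{0, 2, 5, 6}`, that is, three groups of sizes 2, 3, and 1.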

liuliang01 pushed a commit to liuliang01/xgboost that referenced this issue Sep 26, 2017
hcho3 pushed a commit to liuliang01/xgboost that referenced this issue Jun 29, 2018
hcho3 pushed a commit that referenced this issue Jun 30, 2018
* add qid for #2748

* change names

* change spaces

* change qid to bst_uint type

* change qid type to size_t

* change qid first to SIZE_MAX

* change qid type from size_t to uint64_t

* update dmlc-core

* fix qids name error

* fix group_ptr_ error

* Style fix

* Add qid handling logic to SparsePage

* New MetaInfo format + backward compatibility fix

Old MetaInfo format (1.0) doesn't contain qid field. We still want to be able
to read from MetaInfo files saved in old format. Also, define a new format
(2.0) that contains the qid field. This way, we can distinguish files that
contain qid and those that do not.

* Update MetaInfo test

* Simplify group assignment logic

* Explicitly set qid=nullptr in NativeDataIter

NativeDataIter's callback does not support qid field. Users of NativeDataIter
will need to call setGroup() function separately to set group information.

* Save qids_ in SaveBinary()

* Upgrade dmlc-core submodule

* Add a test for reading qid

* Add contributor

* Check the size of qids_

* Document qid format

hcho3 closed this as completed Jun 30, 2018
CodingCat pushed a commit to CodingCat/xgboost that referenced this issue Jul 26, 2018
lock bot locked as resolved and limited conversation to collaborators Oct 25, 2018