Add qid like ranklib format #2749
include/xgboost/data.h
@@ -50,6 +50,8 @@ struct MetaInfo {
   std::vector<bst_uint> group_ptr;
   /*! \brief weights of each instance, optional */
   std::vector<bst_float> weights;
+  /*! \brief session-id of each instance, optional */
+  std::vector<bst_float> qids;
Are you sure these should be floats?
Thanks, I will change it to bst_uint. Type bst_float also works, but bst_uint should be enough.
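To illustrate the concern behind the question above, here is a minimal sketch, assuming `bst_float` is a 32-bit IEEE-754 `float`. Such a float has a 24-bit significand, so integer IDs above 2^24 are not all representable. The helper name `SurvivesFloatRoundTrip` is hypothetical and is not part of XGBoost.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only (not an XGBoost function).
// Returns true if an integer ID is unchanged after being stored in a
// 32-bit float and converted back.
bool SurvivesFloatRoundTrip(std::uint64_t id) {
  const float as_float = static_cast<float>(id);
  // Guard: values that round up to 2^64 cannot be converted back safely.
  if (as_float >= 18446744073709551616.0f) return false;
  return static_cast<std::uint64_t>(as_float) == id;
}
```

Small IDs round-trip exactly, which is consistent with the remark that a float type "also works" for typical data, but 2^24 + 1 = 16777217 already does not, which is one reason to prefer an integer type for IDs.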
What is the current status of this task?
The Travis check failed because all the changes depend on dmlc/dmlc-core#317.
This PR depends on dmlc/dmlc-core#317, which was merged a few days ago, but the AppVeyor check fails because it clones the old version of dmlc-core:
Pull dmlc-core in your fork and push again.
What is the status of this PR? Now that dmlc-core has been updated to the latest version, we should rebase it.
@liuliang01 @superbobry I went ahead and updated the PR to reflect the latest master. I also added qid handling logic to SparsePage. Can you take a look?
Codecov Report
@@             Coverage Diff              @@
##             master    #2749      +/-   ##
============================================
- Coverage     45.69%   45.67%   -0.02%
  Complexity      228      228
============================================
  Files           166      166
  Lines         12972    13010      +38
  Branches        466      466
============================================
+ Hits           5927     5942      +15
- Misses         6853     6876      +23
  Partials        192      192
Force-pushed from cc23e04 to e6d7851.
The old MetaInfo format (1.0) does not contain the qid field. We still want to be able to read MetaInfo files saved in the old format, so this change also defines a new format (2.0) that contains the qid field. This way, we can distinguish files that contain qid from those that do not.
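The versioning idea described above can be sketched as follows. This is a toy illustration only: the `MiniMeta` struct, the integer version tags, and the byte layout are assumptions made for this example and do not reproduce XGBoost's actual MetaInfo binary format. The point is that the reader checks the version first and leaves `qids` empty for files written in the older format.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical, simplified stand-in for MetaInfo (not the XGBoost type).
struct MiniMeta {
  std::vector<float> labels;
  std::vector<std::uint64_t> qids;  // stays empty when the file has no qid field
};

// Assumed version tags for this sketch: 1 = no qid field, 2 = with qid field.
constexpr std::int32_t kVersionNoQid = 1;
constexpr std::int32_t kVersionWithQid = 2;

template <typename T>
void WriteVec(std::ostream& out, const std::vector<T>& v) {
  const std::uint64_t n = v.size();
  out.write(reinterpret_cast<const char*>(&n), sizeof(n));
  if (n != 0) out.write(reinterpret_cast<const char*>(v.data()), n * sizeof(T));
}

template <typename T>
bool ReadVec(std::istream& in, std::vector<T>* v) {
  std::uint64_t n = 0;
  if (!in.read(reinterpret_cast<char*>(&n), sizeof(n))) return false;
  v->resize(n);
  if (n == 0) return true;
  return static_cast<bool>(
      in.read(reinterpret_cast<char*>(v->data()), n * sizeof(T)));
}

// Writes the version tag first, then the fields that version defines.
void Save(std::ostream& out, const MiniMeta& m,
          std::int32_t version = kVersionWithQid) {
  out.write(reinterpret_cast<const char*>(&version), sizeof(version));
  WriteVec(out, m.labels);
  if (version >= kVersionWithQid) WriteVec(out, m.qids);
}

// Reads either version; old files simply yield an empty qids vector.
bool Load(std::istream& in, MiniMeta* m) {
  std::int32_t version = 0;
  if (!in.read(reinterpret_cast<char*>(&version), sizeof(version))) return false;
  if (version != kVersionNoQid && version != kVersionWithQid) return false;
  if (!ReadVec(in, &m->labels)) return false;
  m->qids.clear();
  if (version == kVersionWithQid) return ReadVec(in, &m->qids);
  return true;
}
```

In this sketch, saving with `kVersionNoQid` and loading the result gives back the labels with an empty `qids`, while an unknown version tag is rejected rather than guessed at.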
NativeDataIter's callback does not support the qid field. Users of NativeDataIter will need to call the setGroup() function separately to set group information.
Will merge after dmlc/dmlc-core#419 is merged.
@CodingCat @yanboliang FYI, the qid support is now part of XGBoost. You can find the documentation at http://xgboost.readthedocs.io/en/latest/input_format.html#query-id-columns
@hcho3 that link 404s for me; could you point me in the right direction for QID work?
* add qid for dmlc#2748
* change names
* change spaces
* change qid to bst_uint type
* change qid type to size_t
* change qid first to SIZE_MAX
* change qid type from size_t to uint64_t
* update dmlc-core
* fix qids name error
* fix group_ptr_ error
* Style fix
* Add qid handling logic to SparsePage
* New MetaInfo format + backward compatibility fix
  Old MetaInfo format (1.0) doesn't contain qid field. We still want to be able to read from MetaInfo files saved in old format. Also, define a new format (2.0) that contains the qid field. This way, we can distinguish files that contain qid and those that do not.
* Update MetaInfo test
* Simply group assignment logic
* Explicitly set qid=nullptr in NativeDataIter
  NativeDataIter's callback does not support qid field. Users of NativeDataIter will need to call setGroup() function separately to set group information.
* Save qids_ in SaveBinary()
* Upgrade dmlc-core submodule
* Add a test for reading qid
* Add contributor
* Check the size of qids_
* Document qid format
Add a qid data format like RankLib's, for ranking tasks.
#2748
Example:
0 qid:1 1:1.0 2:1.0 5:7.0 7:0.0
1 qid:1 1:0.0 2:1.0 5:2.0 7:0.0 8:0.5
1 qid:2 1:0.0 2:1.0 5:2.0 7:0.0 8:1.0
...
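As a rough illustration of how lines in this format relate to ranking groups, the sketch below parses one line and derives group boundaries from consecutive qid values. It is not the parser used by XGBoost or dmlc-core. The `Row` type and the functions `ParseRow` and `GroupPtrFromQids` are hypothetical names introduced for this example, and the sketch assumes that rows belonging to the same query are contiguous.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type for illustration; not an XGBoost or dmlc-core type.
struct Row {
  float label = 0.0f;
  std::uint64_t qid = 0;
  std::vector<std::pair<unsigned, float>> features;  // (index, value)
};

// Parses one line such as "1 qid:2 1:0.0 2:1.0".
// Returns false if the label or qid token is missing or a token is malformed.
bool ParseRow(const std::string& line, Row* out) {
  std::istringstream in(line);
  std::string token;
  if (!(in >> out->label)) return false;
  if (!(in >> token) || token.compare(0, 4, "qid:") != 0) return false;
  out->features.clear();
  try {
    out->qid = std::stoull(token.substr(4));
    while (in >> token) {
      const std::size_t colon = token.find(':');
      if (colon == std::string::npos) return false;
      const unsigned index =
          static_cast<unsigned>(std::stoul(token.substr(0, colon)));
      const float value = std::stof(token.substr(colon + 1));
      out->features.emplace_back(index, value);
    }
  } catch (const std::exception&) {
    return false;  // a number could not be parsed
  }
  return true;
}

// Turns per-row qids into group boundaries: rows [ptr[i], ptr[i+1]) share
// one qid. Assumes rows of the same query are stored next to each other.
std::vector<unsigned> GroupPtrFromQids(const std::vector<std::uint64_t>& qids) {
  std::vector<unsigned> ptr{0};
  for (std::size_t i = 1; i < qids.size(); ++i) {
    if (qids[i] != qids[i - 1]) ptr.push_back(static_cast<unsigned>(i));
  }
  if (!qids.empty()) ptr.push_back(static_cast<unsigned>(qids.size()));
  return ptr;
}
```

For the three example rows above, the qids are 1, 1, 2, so this sketch yields the boundaries 0, 2, 3: one group of two rows followed by one group of a single row.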