Add qid like ranklib format #2749

Merged
merged 22 commits into from
Jun 30, 2018

liuliang01
Contributor

Add a qid (query ID) data format like RankLib's, for ranking tasks.
#2748
example:
0 qid:1 1:1.0 2:1.0 5:7.0 7:0.0
1 qid:1 1:0.0 2:1.0 5:2.0 7:0.0 8:0.5
1 qid:2 1:0.0 2:1.0 5:2.0 7:0.0 8:1.0
...
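
For readers unfamiliar with the layout, each line is a label, a `qid:` field, and then sparse `index:value` pairs. The following is a minimal C++ sketch of a reader for one such line. It is not the parser used by this PR (that lives in dmlc-core); `RankRow` and `ParseRankLine` are hypothetical names, and the sketch assumes the `qid:` field always follows the label directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for one parsed row; not an XGBoost type.
struct RankRow {
  float label = 0.0f;
  std::uint64_t qid = 0;
  std::vector<std::pair<unsigned, float>> features;  // (index, value)
};

// Parse "label qid:N idx:val idx:val ..." into a RankRow.
// Throws std::invalid_argument on malformed input.
inline RankRow ParseRankLine(const std::string& line) {
  std::istringstream in(line);
  RankRow row;
  std::string token;
  if (!(in >> row.label)) {
    throw std::invalid_argument("missing label");
  }
  if (!(in >> token) || token.compare(0, 4, "qid:") != 0) {
    throw std::invalid_argument("missing qid field");
  }
  row.qid = std::stoull(token.substr(4));
  while (in >> token) {
    const std::size_t colon = token.find(':');
    if (colon == std::string::npos) {
      throw std::invalid_argument("bad feature token: " + token);
    }
    row.features.emplace_back(
        static_cast<unsigned>(std::stoul(token.substr(0, colon))),
        std::stof(token.substr(colon + 1)));
  }
  return row;
}
```

Calling `ParseRankLine` on the second example line yields label 1, qid 1, and five feature pairs.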

@@ -50,6 +50,8 @@ struct MetaInfo {
std::vector<bst_uint> group_ptr;
/*! \brief weights of each instance, optional */
std::vector<bst_float> weights;
/*! \brief session-id of each instance, optional */
std::vector<bst_float> qids;
Contributor

Are you sure these should be floats?

Contributor Author

@liuliang01 liuliang01 Sep 28, 2017

Thanks, I will change it to bst_uint. Type bst_float also works, but bst_uint should be enough.
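
One concrete reason to prefer an integer type for IDs: a 32-bit float has a 24-bit significand, so it cannot represent every integer above 2^24. The sketch below illustrates this, assuming `bst_float` is a 32-bit IEEE 754 `float`; `SurvivesFloatRoundTrip` is a hypothetical helper written for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when `id` is unchanged after being stored in a 32-bit float.
inline bool SurvivesFloatRoundTrip(std::uint64_t id) {
  const float as_float = static_cast<float>(id);
  // Guard: values that round up to 2^64 cannot be converted back safely.
  if (as_float >= 18446744073709551616.0f) {
    return false;
  }
  return static_cast<std::uint64_t>(as_float) == id;
}
```

With this helper, 16777216 (2^24) survives the round trip, while 16777217 does not: two distinct query IDs would collapse into one.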

@TheEdoardo93

What is the current status of this task?

@liuliang01
Contributor Author

The Travis check failed because all the changes depend on dmlc-core: dmlc/dmlc-core#317
Can anyone help me review the dmlc-core and xgboost PRs? @TheEdoardo93 @superbobry @tqchen
Thanks a lot!

@liuliang01 liuliang01 closed this Jan 22, 2018
@liuliang01 liuliang01 reopened this Jan 22, 2018
@liuliang01
Contributor Author

This PR depends on dmlc/dmlc-core#317, which was merged a few days ago, but the AppVeyor check fails because it clones the old version of dmlc-core:
Submodule path 'dmlc-core': checked out 'b5bec5481df86e8e6728d8bd80a61d87ef3b2cd5'
How can I use the latest version of dmlc-core for the continuous-integration checks? @tqchen
Thanks!

@tqchen
Member

tqchen commented Jan 23, 2018

Pull dmlc-core in your fork and push again.

@hcho3
Collaborator

hcho3 commented May 17, 2018

What is the status of this PR? Now that dmlc-core has been updated to the latest version, we should rebase it.

@hcho3
Collaborator

hcho3 commented May 24, 2018

@liuliang01 @superbobry I went ahead and updated the PR to reflect the latest master. I also added qid handling logic to SparsePage. Can you take a look?

@codecov-io

codecov-io commented May 24, 2018

Codecov Report

Merging #2749 into master will decrease coverage by 0.01%.
The diff coverage is 42.5%.

@@             Coverage Diff              @@
##             master    #2749      +/-   ##
============================================
- Coverage     45.69%   45.67%   -0.02%     
  Complexity      228      228              
============================================
  Files           166      166              
  Lines         12972    13010      +38     
  Branches        466      466              
============================================
+ Hits           5927     5942      +15     
- Misses         6853     6876      +23     
  Partials        192      192

| Impacted Files | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| include/xgboost/data.h | 83.01% <ø> (ø) | 0 <0> (ø) | ⬇️ |
| src/c_api/c_api.cc | 18.65% <0%> (-0.08%) | 0 <0> (ø) | |
| tests/cpp/data/test_metainfo.cc | 95.65% <100%> (ø) | 0 <0> (ø) | ⬇️ |
| src/data/sparse_page_source.cc | 59.87% <35.71%> (-2.37%) | 0 <0> (ø) | |
| src/data/simple_csr_source.cc | 87.34% <35.71%> (-11.12%) | 0 <0> (ø) | |
| src/data/data.cc | 72.67% <66.66%> (-0.47%) | 0 <0> (ø) | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 480e3fd...f0d9c40.

liuliang09 and others added 14 commits June 28, 2018 22:14
NativeDataIter's callback does not support qid field. Users of NativeDataIter
will need to call setGroup() function separately to set group information.
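
For callers who have per-row query IDs but must supply group information separately, the IDs can be collapsed into per-group row counts. The sketch below is a hypothetical helper, not part of this PR; it assumes rows with the same qid are stored contiguously, and it makes no claim about the exact argument type that setGroup() expects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Collapse per-row query IDs into per-group row counts.
// Assumes rows that share a qid are adjacent in the input.
inline std::vector<unsigned> GroupSizesFromQids(
    const std::vector<std::uint64_t>& qids) {
  std::vector<unsigned> sizes;
  for (std::size_t i = 0; i < qids.size(); ++i) {
    if (i == 0 || qids[i] != qids[i - 1]) {
      sizes.push_back(0);  // a new query group starts at row i
    }
    ++sizes.back();
  }
  // Invariant: every row belongs to exactly one group.
  assert(std::accumulate(sizes.begin(), sizes.end(), std::size_t{0}) ==
         qids.size());
  return sizes;
}
```

For the three example rows in the PR description (qids 1, 1, 2), this produces two groups of sizes 2 and 1.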
@hcho3
Collaborator

hcho3 commented Jun 30, 2018

Will merge after dmlc/dmlc-core#419 is merged.

@hcho3 hcho3 merged commit 0cf88d0 into dmlc:master Jun 30, 2018
@hcho3
Collaborator

hcho3 commented Jun 30, 2018

@CodingCat @yanboliang FYI, the qid support is now part of XGBoost. You can find the documentation at http://xgboost.readthedocs.io/en/latest/input_format.html#query-id-columns

@Helw150

Helw150 commented Jul 25, 2018

@hcho3 that link 404's for me, could you point me in the right direction for QID work?

@hcho3
Collaborator

hcho3 commented Jul 25, 2018

Here it is: http://xgboost.readthedocs.io/en/latest/tutorials/input_format.html#query-id-columns

CodingCat pushed a commit to CodingCat/xgboost that referenced this pull request Jul 26, 2018
* add qid for dmlc#2748

* change names

* change spaces

* change qid to bst_uint type

* change qid type to size_t

* change qid first to SIZE_MAX

* change qid type from size_t to uint64_t

* update dmlc-core

* fix qids name error

* fix group_ptr_ error

* Style fix

* Add qid handling logic to SparsePage

* New MetaInfo format + backward compatibility fix

Old MetaInfo format (1.0) doesn't contain qid field. We still want to be able
to read from MetaInfo files saved in old format. Also, define a new format
(2.0) that contains the qid field. This way, we can distinguish files that
contain qid and those that do not.

* Update MetaInfo test

* Simplify group assignment logic

* Explicitly set qid=nullptr in NativeDataIter

NativeDataIter's callback does not support qid field. Users of NativeDataIter
will need to call setGroup() function separately to set group information.

* Save qids_ in SaveBinary()

* Upgrade dmlc-core submodule

* Add a test for reading qid

* Add contributor

* Check the size of qids_

* Document qid format
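
The backward-compatibility item in the commit list (a new format that carries qid, while old files remain readable) follows a common pattern: write a version tag first, then read the optional section only when the tag says it is present. The sketch below illustrates that pattern under stated assumptions. `TinyMeta`, the integer version tags, and the byte layout are all hypothetical and do not reproduce the real MetaInfo binary format.

```cpp
#include <cassert>
#include <cstdint>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical, simplified record; NOT the real MetaInfo layout.
struct TinyMeta {
  std::vector<float> labels;
  std::vector<std::uint64_t> qids;  // stays empty for files without qid
};

constexpr std::int32_t kVersionNoQid = 1;    // stands in for the old format
constexpr std::int32_t kVersionWithQid = 2;  // stands in for the new format

template <typename T>
void WriteVec(std::ostream& out, const std::vector<T>& v) {
  const std::uint64_t n = v.size();
  out.write(reinterpret_cast<const char*>(&n), sizeof(n));
  if (n != 0) {
    out.write(reinterpret_cast<const char*>(v.data()),
              static_cast<std::streamsize>(n * sizeof(T)));
  }
}

template <typename T>
bool ReadVec(std::istream& in, std::vector<T>* v) {
  std::uint64_t n = 0;
  if (!in.read(reinterpret_cast<char*>(&n), sizeof(n))) return false;
  v->resize(n);
  if (n == 0) return true;
  return static_cast<bool>(
      in.read(reinterpret_cast<char*>(v->data()),
              static_cast<std::streamsize>(n * sizeof(T))));
}

// Always writes the newer format, including the qid section.
inline void Save(std::ostream& out, const TinyMeta& m) {
  out.write(reinterpret_cast<const char*>(&kVersionWithQid),
            sizeof(kVersionWithQid));
  WriteVec(out, m.labels);
  WriteVec(out, m.qids);
}

// Accepts both formats; the qid section is read only for the newer one.
inline bool Load(std::istream& in, TinyMeta* m) {
  std::int32_t version = 0;
  if (!in.read(reinterpret_cast<char*>(&version), sizeof(version))) return false;
  if (version != kVersionNoQid && version != kVersionWithQid) return false;
  if (!ReadVec(in, &m->labels)) return false;
  m->qids.clear();
  if (version == kVersionWithQid) return ReadVec(in, &m->qids);
  return true;  // old format: there is no qid section to read
}
```

The design choice here is that the reader branches on the tag rather than guessing from file length, so a file without qid is distinguishable from a file whose qid list happens to be empty.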
@lock lock bot locked as resolved and limited conversation to collaborators Oct 24, 2018
7 participants