/workspace/xgboost/src/metric/rank_metric.cc:603: Check failed: dat[0] <= dat[1] (1.07837 vs. 1) : AUC-PR: AUC > 1.0 #6561

Closed
pseudotensor opened this issue Dec 31, 2020 · 5 comments · Fixed by #7297

@pseudotensor
Contributor

pseudotensor commented Dec 31, 2020

No dask this time ;), just hyperopt and cudf. Among the hyperparameter combinations tried, a couple of tests happened to stumble onto this error. This is a binary classification problem.

Seems like a bug in AUC-PR.
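
For reference, the failed check asserts that the area under the precision-recall curve is at most 1.0, which must hold because precision and recall are both bounded by 1. The following is a minimal pure-Python sketch of that property, not XGBoost's implementation: `auc_pr` is a hypothetical helper, it ignores sample weights, uses trapezoidal integration, and assumes the curve starts at precision 1.0 for recall 0.

```python
def auc_pr(labels, scores):
    """Area under the precision-recall curve (illustrative sketch only).

    Not XGBoost's algorithm: no sample weights, trapezoidal integration,
    and the curve is assumed to start at (recall=0, precision=1).
    """
    total_pos = sum(1 for y in labels if y == 1)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Sort indices by descending score: each prefix is "predicted positive".
    order = sorted(range(len(scores)), key=lambda i: -scores[i])

    tp = fp = 0
    prev_recall, prev_precision = 0.0, 1.0
    area = 0.0
    i = 0
    while i < len(order):
        # Consume every sample that shares the current score (one threshold).
        threshold = scores[order[i]]
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        # Trapezoid between the previous and the current curve point.
        area += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
    return area


perfect = auc_pr([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
inverted = auc_pr([0, 1], [0.9, 0.1])
```

Each trapezoid has height at most 1 and the recall steps sum to 1, so the result stays in [0, 1]. Values such as 1.07837 therefore point to a problem in the metric computation rather than in the model.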

ERROR: tests/test_models/test_stacking.py::test_xgboost_class_hyperopt_dask_rapids[XGBoostGBMModel2TrueFalseTrue204c3]
------------------------------------------------------------------------------
[gw3] linux -- Python 3.6.10 /home/jon/minicondadai/bin/python
concurrent.futures.process._RemoteTraceback: 
"""
concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/data/jon/h2oai.fullcondatest3/h2oaicore/models.py", line 1626, in objective
    func(X, y, **kwargs_fit_hyper)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/core.py", line 422, in inner_f
    return f(**kwargs)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/sklearn.py", line 926, in fit
    callbacks=callbacks)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/training.py", line 189, in train
    early_stopping_rounds=early_stopping_rounds)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/training.py", line 83, in _train_internal
    if callbacks.after_iteration(bst, i, dtrain, evals):
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/callback.py", line 776, in after_iteration
    bst_eval_set = model.eval_set(evals, epoch, self.feval)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/core.py", line 1343, in eval_set
    ctypes.byref(msg)))
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/core.py", line 189, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [19:12:07] /workspace/xgboost/src/metric/rank_metric.cc:603: Check failed: dat[0] <= dat[1] (1.07837 vs. 1) : AUC-PR: AUC > 1.0
Stack trace:
  [bt] (0) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x54) [0x7f5878b15f64]
  [bt] (1) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(xgboost::metric::EvalAucPR::Eval(xgboost::HostDeviceVector<float> const&, xgboost::MetaInfo const&, bool)+0xc11) [0x7f5878c68b51]
  [bt] (2) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(xgboost::LearnerImpl::EvalOneIter(int, std::vector<std::shared_ptr<xgboost::DMatrix>, std::allocator<std::shared_ptr<xgboost::DMatrix> > > const&, std::vector<std::string, std::allocator<std::string> > const&)+0x4f4) [0x7f5878c3f964]
  [bt] (3) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(XGBoosterEvalOneIter+0x22d) [0x7f5878b1db6d]
  [bt] (4) /home/jon/minicondadai/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7f5a40143630]
  [bt] (5) /home/jon/minicondadai/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x7f5a40142fed]
  [bt] (6) /home/jon/minicondadai/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7f5a402f8f9e]
  [bt] (7) /home/jon/minicondadai/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(+0x139d5) [0x7f5a402f99d5]
  [bt] (8) #_#_h2oai_run_test_test_xgboost_class_hyperopt_dask_rapids_run_subprocessXGBoostGBMModel_fit_model(_PyObject_FastCallDict+0x8b) [0x559eb3d6800b]

@pseudotensor
Contributor Author

If it's not obvious how to fix, I can provide a repro in a few days.

pseudotensor added a commit to h2oai/xgboost that referenced this issue Dec 31, 2020
@pseudotensor
Contributor Author

segfault_xgb140.zip

This fails on the first iteration of the loop, but it does reproduce the AUC-PR problem. The odd thing is that the reported AUC value is different on each run.

(base) jon@pseudotensor:~/h2oai.fullcondatest$ python 815da5e1-d66d-4346-b88f-65bc02e846b0_hyperfit_beforexgbobj.dat.py
0
[01:25:31] WARNING: /workspace/xgboost/src/learner.cc:547: 
Parameters: { model_class_name, monotonicity_constraints } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


Traceback (most recent call last):
  File "815da5e1-d66d-4346-b88f-65bc02e846b0_hyperfit_beforexgbobj.dat.py", line 7, in <module>
    model.fit(X, y, **kwargs)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/core.py", line 422, in inner_f
    return f(**kwargs)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/sklearn.py", line 926, in fit
    callbacks=callbacks)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/training.py", line 189, in train
    early_stopping_rounds=early_stopping_rounds)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/training.py", line 83, in _train_internal
    if callbacks.after_iteration(bst, i, dtrain, evals):
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/callback.py", line 432, in after_iteration
    score = model.eval_set(evals, epoch, self.metric)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/core.py", line 1343, in eval_set
    ctypes.byref(msg)))
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/core.py", line 189, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [01:25:31] /workspace/xgboost/src/metric/rank_metric.cc:603: Check failed: dat[0] <= dat[1] (1.36802 vs. 1) : AUC-PR: AUC > 1.0
Stack trace:
  [bt] (0) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x54) [0x7fc15554bf64]
  [bt] (1) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(xgboost::metric::EvalAucPR::Eval(xgboost::HostDeviceVector<float> const&, xgboost::MetaInfo const&, bool)+0xc11) [0x7fc15569eb51]
  [bt] (2) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(xgboost::LearnerImpl::EvalOneIter(int, std::vector<std::shared_ptr<xgboost::DMatrix>, std::allocator<std::shared_ptr<xgboost::DMatrix> > > const&, std::vector<std::string, std::allocator<std::string> > const&)+0x4f4) [0x7fc155675964]
  [bt] (3) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(XGBoosterEvalOneIter+0x22d) [0x7fc155553b6d]
  [bt] (4) /home/jon/minicondadai/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7fc181be1630]
  [bt] (5) /home/jon/minicondadai/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x7fc181be0fed]
  [bt] (6) /home/jon/minicondadai/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7fc180560f9e]
  [bt] (7) /home/jon/minicondadai/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(+0x139d5) [0x7fc1805619d5]
  [bt] (8) python(_PyObject_FastCallDict+0x8b) [0x565072ae200b]
xgboost.core.XGBoostError: [01:25:28] /workspace/xgboost/src/metric/rank_metric.cc:603: Check failed: dat[0] <= dat[1] (1.22534 vs. 1) : AUC-PR: AUC > 1.0
xgboost.core.XGBoostError: [01:25:24] /workspace/xgboost/src/metric/rank_metric.cc:603: Check failed: dat[0] <= dat[1] (1.04165 vs. 1) : AUC-PR: AUC > 1.0

etc.
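
To compare the out-of-range values across runs, the number can be pulled out of each error message. This is a small sketch that assumes the message format shown in the errors above; `reported_auc` is a hypothetical helper, not part of XGBoost.

```python
import re

# Matches the "(<value> vs. <bound>)" part of the failed check message.
_CHECK_RE = re.compile(
    r"Check failed: dat\[0\] <= dat\[1\] \(([0-9.eE+-]+) vs\. ([0-9.eE+-]+)\)"
)


def reported_auc(message):
    """Return (value, bound) parsed from the error text, or None if absent."""
    match = _CHECK_RE.search(message)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# The three messages quoted above, shortened to the relevant part.
messages = [
    "Check failed: dat[0] <= dat[1] (1.36802 vs. 1) : AUC-PR: AUC > 1.0",
    "Check failed: dat[0] <= dat[1] (1.22534 vs. 1) : AUC-PR: AUC > 1.0",
    "Check failed: dat[0] <= dat[1] (1.04165 vs. 1) : AUC-PR: AUC > 1.0",
]
values = [reported_auc(m)[0] for m in messages]
```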

@pseudotensor
Contributor Author

pseudotensor commented Jan 1, 2021

FYI, the file is named "segfault" because on another system, with this workaround applied (h2oai@8e319d4), the loop eventually segfaults even though every iteration is a fresh fit. I'll file a separate issue.

@JohnZed
Contributor

JohnZed commented Mar 23, 2021

@pseudotensor are you able to use ROC-AUC instead with the current main? #6747 was just merged, which should make ROC-AUC work much better. AUCPR is still planned for an update, but that will take a fair amount of work too, so if ROC-AUC is usable as a workaround it may get you a solution faster.
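
In practice the workaround amounts to evaluating with `auc` rather than `aucpr`. Here is a minimal sketch, assuming the training parameters are kept in a plain dict; `prefer_roc_auc` is a hypothetical helper and not part of XGBoost, while `eval_metric`, `auc`, and `aucpr` are the existing parameter and metric names.

```python
def prefer_roc_auc(params):
    """Return a copy of params with the 'aucpr' metric replaced by 'auc'.

    Hypothetical helper for illustration; the input dict is left untouched.
    """
    swapped = dict(params)
    metric = swapped.get("eval_metric")
    if isinstance(metric, str):
        if metric == "aucpr":
            swapped["eval_metric"] = "auc"
    elif isinstance(metric, (list, tuple)):
        # eval_metric may also be given as several metrics.
        swapped["eval_metric"] = ["auc" if m == "aucpr" else m for m in metric]
    return swapped


params = {"objective": "binary:logistic", "eval_metric": "aucpr"}
safe_params = prefer_roc_auc(params)
multi = prefer_roc_auc({"eval_metric": ["logloss", "aucpr"]})
```

The resulting dict can then be passed to training in place of the original parameters.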

@pseudotensor
Contributor Author

We are just finishing the py38 + rapids 0.19.2 etc. upgrade and will then check whether we still have these dask problems.

Yes, I'll avoid AUCPR for dask for now.
