[jvm-packages] Xgboost4spark 1.1.1 broken and consistently does not work #5848
The size of weight must equal either the number of groups or the number of rows. Since you hit this check, I assume you are not doing ranking. (Line 389 in eb067c1.)
I am not doing ranking.
The number of weights should equal the number of rows, since a weight is defined for each data instance. In later versions of XGBoost we are adding a lot of checks to prevent user errors. If you are using Python or R, XGBoost will even check your parameters.
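As a minimal sketch of that contract against the XGBoost C API (the input file name here is just a placeholder): one weight must be supplied per data instance, so the length of the `weight` array has to match the row count of the DMatrix.

```cpp
#include <xgboost/c_api.h>

#include <vector>

int main() {
  DMatrixHandle dmat;
  // "train.libsvm" is a placeholder input file for illustration.
  XGDMatrixCreateFromFile("train.libsvm", 0, &dmat);

  bst_ulong num_row = 0;
  XGDMatrixNumRow(dmat, &num_row);

  // One weight per data instance: any other length trips the
  // `weights_.Size() == num_row_` check quoted in this issue.
  std::vector<float> weights(num_row, 1.0f);
  XGDMatrixSetFloatInfo(dmat, "weight", weights.data(), num_row);

  XGDMatrixFree(dmat);
  return 0;
}
```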
But I am not using weights...
Because of this bug I cannot upgrade to 1.1.1, or to Spark 3.0, because this error also seems to happen in 1.0.0. Do you have any estimate of when this can be fixed?
Do you have something I can run to reproduce it? The bug you described doesn't show up in our tests.
I uploaded a zip file with the data folder and model folder.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.SparkSession

object XgboostTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().enableHiveSupport().master("local").getOrCreate()
    try {
      val data = spark.read.parquet("/tmp/xgboost_test/data")
      val model = PipelineModel.load("/tmp/xgboost_test/model")
      val predictions = model.transform(data)
      predictions.persist()
      predictions.count()
      predictions.show()
    } finally {
      spark.close()
    }
  }
}
```
Thanks! Let me check that later this week. I'm not familiar with Spark, so it might take some time.
Hi, any progress on this?
Not yet. |
Still nothing?
@ranInc Not yet. |
@ranInc I managed to reproduce the error on my end. |
Full error log (collapsed). I used #5925 to log all invocations of the C API functions. All C API invocations (collapsed).
@trivialfis @RAMitchell We have an issue with the iterator adapter. Consider a CSR batch consisting of 32768 rows whose last row is empty (no non-zero element). The adapter ends up counting one row too few, because the trailing empty row is dropped. We will need to handle empty trailing rows or columns carefully.
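To make the failure mode concrete, here is a standalone sketch (not XGBoost code) of two ways to derive a CSR batch's row count; they disagree exactly when the last row holds no non-zero element:

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
  // CSR batch with 4 rows; row 3 is empty, so the final offset repeats.
  std::vector<std::size_t> offsets{0, 2, 4, 5, 5};
  // Row id owning each of the 5 non-zero entries.
  std::vector<std::size_t> row_of_entry{0, 0, 1, 1, 2};

  std::size_t rows_from_offsets = offsets.size() - 1;       // 4 (correct)
  std::size_t rows_from_entries = row_of_entry.back() + 1;  // 3 (off by one)

  // Prints "4 vs 3", the same off-by-one shape as the failed check above.
  std::cout << rows_from_offsets << " vs " << rows_from_entries << "\n";
  return 0;
}
```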
@hcho3 Glad that you are taking this over.
The most minimal example: apply the following patch to the C++ unit test:

```diff
diff --git tests/cpp/data/test_adapter.cc tests/cpp/data/test_adapter.cc
index de835358..1da2a71c 100644
--- tests/cpp/data/test_adapter.cc
+++ tests/cpp/data/test_adapter.cc
@@ -73,10 +73,11 @@ class CSRIterForTest {
   std::vector<std::remove_pointer<decltype(std::declval<XGBoostBatchCSR>().index)>::type>
       feature_idx_ {0, 1, 0, 1, 1};
   std::vector<std::remove_pointer<decltype(std::declval<XGBoostBatchCSR>().offset)>::type>
-      row_ptr_ {0, 2, 4, 5};
+      row_ptr_ {0, 2, 4, 5, 5};
   size_t iter_ {0};

 public:
+  size_t static constexpr kRows { 4 };  // Test for the last row being empty
   size_t static constexpr kCols { 13 };  // Test for having some missing columns
   XGBoostBatchCSR Next() {
@@ -88,7 +89,7 @@ class CSRIterForTest {
     batch.offset = dmlc::BeginPtr(row_ptr_);
     batch.index = dmlc::BeginPtr(feature_idx_);
     batch.value = dmlc::BeginPtr(data_);
-    batch.size = 3;
+    batch.size = kRows;

     batch.label = nullptr;
     batch.weight = nullptr;
@@ -117,11 +118,11 @@ int CSRSetDataNextForTest(DataIterHandle data_handle,
   }
 }

-TEST(Adapter, IteratorAdaper) {
+TEST(Adapter, IteratorAdapter) {
   CSRIterForTest iter;
   data::IteratorAdapter<DataIterHandle, XGBCallbackDataIterNext,
                         XGBoostBatchCSR> adapter{&iter, CSRSetDataNextForTest};
-  constexpr size_t kRows { 6 };
+  constexpr size_t kRows { 8 };

   std::unique_ptr<DMatrix> data {
       DMatrix::Create(&adapter, std::numeric_limits<float>::quiet_NaN(), 1)
@@ -129,4 +130,5 @@ TEST(Adapter, IteratorAdaper) {
   ASSERT_EQ(data->Info().num_col_, CSRIterForTest::kCols);
   ASSERT_EQ(data->Info().num_row_, kRows);
 }
+
 }  // namespace xgboost
```

Log from the patched test run (collapsed).
The example shows a matrix where row IDs 3 and 7 are empty.
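For reference, assuming the iterator above yields the same 4-row batch twice, here is a sketch of the offsets the concatenated 8-row matrix should end up with; the repeated trailing offsets are exactly what must not be dropped:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only, not the XGBoost implementation: concatenate two CSR
// offset arrays while preserving empty trailing rows.
std::vector<std::size_t> Concat(std::vector<std::size_t> const& a,
                                std::vector<std::size_t> const& b) {
  std::vector<std::size_t> out(a);
  std::size_t shift = out.back();  // shift the second batch by a's entry count
  for (std::size_t i = 1; i < b.size(); ++i) {
    out.push_back(b[i] + shift);
  }
  return out;
}

int main() {
  std::vector<std::size_t> batch{0, 2, 4, 5, 5};  // 4 rows, row 3 empty
  auto offsets = Concat(batch, batch);
  // Rows 3 and 7 are empty, hence the repeated 5 and the repeated 10.
  assert((offsets == std::vector<std::size_t>{0, 2, 4, 5, 5, 7, 9, 10, 10}));
  assert(offsets.size() - 1 == 8);  // the row count must come from offsets
  return 0;
}
```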
The bug also affects building a `SimpleDMatrix` from a text file. Minimal example for `TEST(SimpleDMatrix, FromFile)`:

```diff
diff --git tests/cpp/data/test_simple_dmatrix.cc tests/cpp/data/test_simple_dmatrix.cc
index 691dc854..563a4949 100644
--- tests/cpp/data/test_simple_dmatrix.cc
+++ tests/cpp/data/test_simple_dmatrix.cc
@@ -185,16 +185,21 @@ TEST(SimpleDMatrix, FromCSC) {
 TEST(SimpleDMatrix, FromFile) {
   std::string filename = "test.libsvm";
   CreateBigTestData(filename, 3 * 5);
+  {
+    std::ofstream fo(filename, std::ios::app | std::ios::out);
+    fo << "0\n";
+  }
+  constexpr size_t expected_nrow = 6;
   std::unique_ptr<dmlc::Parser<uint32_t>> parser(
       dmlc::Parser<uint32_t>::Create(filename.c_str(), 0, 1, "auto"));
   auto verify_batch = [](SparsePage const &batch) {
-    EXPECT_EQ(batch.Size(), 5);
+    EXPECT_EQ(batch.Size(), expected_nrow);
     EXPECT_EQ(batch.offset.HostVector(),
-              std::vector<bst_row_t>({0, 3, 6, 9, 12, 15}));
+              std::vector<bst_row_t>({0, 3, 6, 9, 12, 15, 15}));
     EXPECT_EQ(batch.base_rowid, 0);
-    for (auto i = 0ull; i < batch.Size(); i++) {
+    for (auto i = 0ull; i < batch.Size() - 1; i++) {
       if (i % 2 == 0) {
         EXPECT_EQ(batch[i][0].index, 0);
         EXPECT_EQ(batch[i][1].index, 1);
```
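For context, the appended `0` line is a LIBSVM row carrying only a label and no features. Assuming `CreateBigTestData` writes three features per row (consistent with the expected offsets `{0, 3, 6, 9, 12, 15}`), the patched file has this shape, with indices and values elided:

```
<label> i:<v> j:<v> k:<v>
<label> i:<v> j:<v> k:<v>
<label> i:<v> j:<v> k:<v>
<label> i:<v> j:<v> k:<v>
<label> i:<v> j:<v> k:<v>
0
```

The parser must count the final label-only line as a sixth, empty row, hence `expected_nrow = 6` and the repeated final offset `15`.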
Hi, I think I will be able to test it out next week.
Hi, is the fix available to use yet?
@ranInc Yes. You can either wait for the 1.2.0 release or use the SNAPSHOT version.
Hello Hyunsu, I hit the same problem ranInc reported, on 1.1.2. It's tricky: our environment is fixed, so we cannot use 1.2.0, and when I tried to download the handle-empty-rows branch in your fork repo, it could not be found. Would you happen to have any suggestions?
The bug has long been fixed, starting from 1.2.0. Please upgrade to the latest XGBoost; we are not able to support very old versions.
Some models’ predictions fail with the following error:

```
Check failed: weights_.Size() == num_row_ (15363 vs. 15362) : Size of weights must equal to number of rows.
```

The numbers in the error are of course not always the same (but the difference is always 1). The same data/model works on XGBoost 0.9.