Hotfix for serializer #273

Merged
merged 5 commits into from
May 11, 2021
Changes from 2 commits
1 change: 1 addition & 0 deletions include/treelite/tree_impl.h
@@ -925,6 +925,7 @@ ModelImpl<ThresholdType, LeafOutputType>::InitFromPyBuffer(

auto tree_hanlder = [&begin](Tree<ThresholdType, LeafOutputType>& tree) {
tree.InitFromPyBuffer(begin, begin + kNumFramePerTree);
begin += kNumFramePerTree;
Contributor

Since we're capturing begin by reference, won't this iterate begin forward as a side effect of executing the lambda? It's not immediately clear to me why that is the correct thing to do with this handler. Maybe add a comment or refactor such that the lambda returns the next iterator to be processed rather than directly changing begin?

Collaborator Author

> Since we're capturing begin by reference, won't this iterate begin forward as a side effect of executing the lambda?

Yes, this is intended behavior.

> refactor such that the lambda returns the next iterator to be processed

That would not be feasible, since the tree handler for the file stream does not involve iterators.

Contributor

Great! Can we get a comment just to make it clear to future developers that the side-effect is intentional?
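
The pattern discussed in this thread can be sketched in isolation. This is a minimal illustration, not the treelite implementation: `Frame`, `kFramesPerTree`, and `ReadTrees` are hypothetical names, and each "tree" is reduced to the sum of its frames. The handler captures the cursor by reference, so each call consumes one tree's worth of frames and leaves the cursor at the start of the next tree.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not treelite types.
using Frame = int;
constexpr std::ptrdiff_t kFramesPerTree = 2;

// Reads num_tree "trees" from a flat list of frames.
// Each tree is summarized here as the sum of its frames.
std::vector<int> ReadTrees(const std::vector<Frame>& frames, std::size_t num_tree) {
  assert(frames.size() >= num_tree * static_cast<std::size_t>(kFramesPerTree));
  auto begin = frames.cbegin();
  std::vector<int> trees;

  // NOTE: `begin` is captured by reference on purpose. Advancing it is an
  // intended side effect: each call must start where the previous one ended.
  auto tree_handler = [&begin, &trees]() {
    trees.push_back(std::accumulate(begin, begin + kFramesPerTree, 0));
    begin += kFramesPerTree;  // without this, every tree reads the same frames
  };

  for (std::size_t i = 0; i < num_tree; ++i) {
    tree_handler();
  }
  return trees;
}
```

A comment like the `NOTE` above is one way to record that the side effect is intentional.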

};

DeserializeTemplate(num_tree, header_field_handler, tree_hanlder);
52 changes: 27 additions & 25 deletions tests/cpp/test_serializer.cc
@@ -29,7 +29,7 @@ inline void TestRoundTrip(treelite::Model* model) {
auto buffer = model->GetPyBuffer();
std::unique_ptr<treelite::Model> received_model = treelite::Model::CreateFromPyBuffer(buffer);

ASSERT_EQ(TreeliteToBytes(model), TreeliteToBytes(received_model.get()));
ASSERT_TRUE(TreeliteToBytes(model) == TreeliteToBytes(received_model.get()));
}

for (int i = 0; i < 2; ++i) {
@@ -44,7 +44,7 @@ inline void TestRoundTrip(treelite::Model* model) {
std::unique_ptr<treelite::Model> received_model = treelite::Model::DeserializeFromFile(fp);
std::fclose(fp);

ASSERT_EQ(TreeliteToBytes(model), TreeliteToBytes(received_model.get()));
ASSERT_TRUE(TreeliteToBytes(model) == TreeliteToBytes(received_model.get()));
Collaborator Author

@hcho3 hcho3 May 6, 2021

ASSERT_EQ dumps all the raw bytes into a string, which leads to an out-of-memory (OOM) error for this test case, so use ASSERT_TRUE instead.

Contributor

Maybe add a comment to this effect so that someone doesn't naively reintroduce this problem down the line?
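
One way to keep failure output small is to compare the buffers first and report only a short summary of the mismatch. The sketch below is an illustration under stated assumptions, not code from this PR: `FirstMismatch` and `kNoMismatch` are hypothetical names, and it assumes the serialized form is held in a `std::string`. It returns the offset of the first differing byte, so any message built from it stays small regardless of buffer size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sentinel meaning "the two buffers are identical".
constexpr std::size_t kNoMismatch = std::string::npos;

// Hypothetical helper: returns the offset of the first differing byte.
// If one buffer is a strict prefix of the other, the offset is the length
// of the shorter buffer. Returns kNoMismatch when the buffers are equal.
std::size_t FirstMismatch(const std::string& a, const std::string& b) {
  const std::size_t n = std::min(a.size(), b.size());
  const auto its = std::mismatch(
      a.begin(), a.begin() + static_cast<std::ptrdiff_t>(n), b.begin());
  const std::size_t offset = static_cast<std::size_t>(its.first - a.begin());
  if (offset == n && a.size() == b.size()) {
    return kNoMismatch;
  }
  return offset;
}

// Small usage check: the result is a single integer, never the raw bytes.
void FirstMismatchExample() {
  assert(FirstMismatch("abc", "abc") == kNoMismatch);
  assert(FirstMismatch("abc", "abd") == 2u);
}
```

Asserting on the returned offset instead of on the buffers themselves means a failing test would print integers rather than the serialized bytes.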

}
}

@@ -178,7 +178,7 @@ void PyBufferInterfaceRoundTrip_TreeDepth2() {
};
builder->SetModelParam("pred_transform", "sigmoid");
builder->SetModelParam("global_bias", "0.5");
for (int tree_id = 0; tree_id < 2; ++tree_id) {
for (int tree_id = 0; tree_id < 3; ++tree_id) {
std::unique_ptr<frontend::TreeBuilder> tree{
new frontend::TreeBuilder(threshold_type, leaf_output_type)
};
@@ -189,10 +189,10 @@ void PyBufferInterfaceRoundTrip_TreeDepth2() {
tree->SetCategoricalTestNode(1, 0, {0, 1}, true, 3, 4);
tree->SetCategoricalTestNode(2, 1, {0}, true, 5, 6);
tree->SetRootNode(0);
tree->SetLeafNode(3, frontend::Value::Create<LeafOutputType>(3));
tree->SetLeafNode(4, frontend::Value::Create<LeafOutputType>(1));
tree->SetLeafNode(5, frontend::Value::Create<LeafOutputType>(4));
tree->SetLeafNode(6, frontend::Value::Create<LeafOutputType>(2));
tree->SetLeafNode(3, frontend::Value::Create<LeafOutputType>(tree_id + 3));
tree->SetLeafNode(4, frontend::Value::Create<LeafOutputType>(tree_id + 1));
tree->SetLeafNode(5, frontend::Value::Create<LeafOutputType>(tree_id + 4));
tree->SetLeafNode(6, frontend::Value::Create<LeafOutputType>(tree_id + 2));
Comment on lines +196 to +199
Collaborator Author

I crafted the test case so that all the trees will have different leaf outputs. The bug being addressed causes all trees to be identical.
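
To see why distinct leaf outputs matter, consider a deliberately buggy reader that never advances its cursor. The sketch below is illustrative only: `ReadTreesBuggy` and `RoundTripMatches` are hypothetical, and each "tree" is a single integer rather than a treelite tree. With identical trees the round trip still compares equal, so the bug goes unnoticed; per-tree values such as `tree_id + 1` expose it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical buggy reader: it always reads from the start of the buffer
// instead of advancing, so every output "tree" is a copy of the first one.
std::vector<int> ReadTreesBuggy(const std::vector<int>& serialized) {
  std::vector<int> trees;
  for (std::size_t i = 0; i < serialized.size(); ++i) {
    trees.push_back(serialized.front());  // bug: should be serialized[i]
  }
  return trees;
}

// Round-trip check: true if reading back yields the original trees.
bool RoundTripMatches(const std::vector<int>& trees) {
  return ReadTreesBuggy(trees) == trees;
}

void TestDesignExample() {
  // Identical trees: the buggy reader passes the round trip by accident.
  assert(RoundTripMatches({1, 1, 1}));
  // Distinct trees (tree_id + 1): the same bug now fails the round trip.
  assert(!RoundTripMatches({1, 2, 3}));
}
```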

builder->InsertTree(tree.get());
}

@@ -221,28 +221,30 @@ void PyBufferInterfaceRoundTrip_DeepFullTree() {
std::unique_ptr<frontend::ModelBuilder> builder{
new frontend::ModelBuilder(3, 1, false, threshold_type, leaf_output_type)
};
std::unique_ptr<frontend::TreeBuilder> tree{
new frontend::TreeBuilder(threshold_type, leaf_output_type)
};
for (int level = 0; level <= depth; ++level) {
for (int i = 0; i < (1 << level); ++i) {
const int nid = (1 << level) - 1 + i;
tree->CreateNode(nid);
for (int tree_id = 0; tree_id < 3; ++tree_id) {
std::unique_ptr<frontend::TreeBuilder> tree{
new frontend::TreeBuilder(threshold_type, leaf_output_type)
};
for (int level = 0; level <= depth; ++level) {
for (int i = 0; i < (1 << level); ++i) {
const int nid = (1 << level) - 1 + i;
tree->CreateNode(nid);
}
}
}
for (int level = 0; level <= depth; ++level) {
for (int i = 0; i < (1 << level); ++i) {
const int nid = (1 << level) - 1 + i;
if (level == depth) {
tree->SetLeafNode(nid, frontend::Value::Create<LeafOutputType>(1));
} else {
tree->SetNumericalTestNode(nid, (level % 2), "<", frontend::Value::Create<ThresholdType>(0),
true, 2 * nid + 1, 2 * nid + 2);
for (int level = 0; level <= depth; ++level) {
for (int i = 0; i < (1 << level); ++i) {
const int nid = (1 << level) - 1 + i;
if (level == depth) {
tree->SetLeafNode(nid, frontend::Value::Create<LeafOutputType>(tree_id + 1));
} else {
tree->SetNumericalTestNode(nid, (level % 2), "<", frontend::Value::Create<ThresholdType>(0),
true, 2 * nid + 1, 2 * nid + 2);
}
}
}
tree->SetRootNode(0);
builder->InsertTree(tree.get());
}
tree->SetRootNode(0);
builder->InsertTree(tree.get());

std::unique_ptr<Model> model = builder->CommitModel();
TestRoundTrip(model.get());