Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent undefined behavior when passing handle from Treelite to cuML FIL #5849

Merged
merged 9 commits into from
Apr 20, 2024

Conversation

hcho3
Copy link
Contributor

@hcho3 hcho3 commented Apr 16, 2024

Problem. FIL unit tests have been failing when run with the pip wheel:

FAILED test_fil.py::test_fil_skl_classification[2-GradientBoostingClassifier-1-native-2-False-30-1000] - MemoryError: std::bad_alloc
FAILED test_fil.py::test_fil_skl_classification[2-GradientBoostingClassifier-1-native-2-True-30-1000] - MemoryError: std::bad_alloc
FAILED test_fil.py::test_fil_skl_classification[2-GradientBoostingClassifier-1-native-10-False-30-1000] - MemoryError: std::bad_alloc
FAILED test_fil.py::test_fil_skl_classification[2-GradientBoostingClassifier-1-native-10-True-30-1000] - MemoryError: std::bad_alloc
FAILED test_fil.py::test_fil_skl_classification[2-GradientBoostingClassifier-1-native-20-True-30-1000] - MemoryError: std::bad_alloc
FAILED test_fil.py::test_fil_skl_classification[2-GradientBoostingClassifier-10-native-2-False-30-1000] - MemoryError: std::bad_alloc
FAILED test_fil.py::test_fil_skl_classification[2-GradientBoostingClassifier-10-native-2-True-30-1000] - MemoryError: std::bad_alloc
FAILED test_fil.py::test_fil_skl_classification[2-GradientBoostingClassifier-10-native-10-False-30-1000] - MemoryError: std::bad_alloc
FAILED test_fil.py::test_fil_skl_classification[2-GradientBoostingClassifier-10-native-10-True-30-1000] - MemoryError: std::bad_alloc
FAILED test_fil.py::test_fil_skl_classification[2-GradientBoostingClassifier-10-native-20-True-30-1000] - MemoryError: std::bad_alloc
FAILED test_fil.py::test_fil_skl_classification[2-RandomForestClassifier-1-native-2-False-30-1000] - MemoryError: std::bad_alloc

In the meanwhile, the same tests were passing when run with the Conda package.

Diagnosis. When a Treelite model object is passed from the treelite Python package into the cuml Python package, std::string fields get corrupted. This is likely caused by C++ ABI incompatibilities, as different C++ compiler toolchain/version etc are used when building cuML pip wheel and Treelite pip wheel. More precisely, libtreelite in cuml is incompatible with libtreelite in treelite, and it's unsafe to directly pass a handle from one to another.

Solution. Have the treelite Python package export a serialized byte sequence, and have cuML re-construct the Treelite model from the bytes. TL2cgen uses this technique to enable loose coupling between itself and Treelite.

@hcho3 hcho3 requested a review from a team as a code owner April 16, 2024 20:50
@github-actions github-actions bot added the Cython / Python Cython or Python issue label Apr 16, 2024
@hcho3 hcho3 requested a review from a team as a code owner April 16, 2024 20:54
@github-actions github-actions bot added the ci label Apr 16, 2024
@hcho3 hcho3 added non-breaking Non-breaking change improvement Improvement / enhancement to an existing function labels Apr 16, 2024
python/cuml/fil/fil.pyx Outdated Show resolved Hide resolved
python/cuml/fil/fil.pyx Outdated Show resolved Hide resolved
Co-authored-by: jakirkham <jakirkham@gmail.com>
Copy link
Member

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Philip! 🙏

Think we might be able to type these as well. Though please double check the types in case I missed something

python/cuml/fil/fil.pyx Outdated Show resolved Hide resolved
python/cuml/fil/fil.pyx Outdated Show resolved Hide resolved
python/cuml/fil/fil.pyx Outdated Show resolved Hide resolved
Co-authored-by: jakirkham <jakirkham@gmail.com>
@hcho3
Copy link
Contributor Author

hcho3 commented Apr 16, 2024

I will need to make the same fix for the experimental FIL.

python/cuml/fil/fil.pyx Outdated Show resolved Hide resolved
Copy link
Member

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Philip! 🙏

Think the error message could be formatted a bit nicer. Made a few minor suggestions related to this. Not a big deal either way, but maybe this is helpful

python/cuml/fil/fil.pyx Outdated Show resolved Hide resolved
python/cuml/fil/fil.pyx Outdated Show resolved Hide resolved
python/cuml/fil/fil.pyx Outdated Show resolved Hide resolved
@jakirkham jakirkham requested a review from dantegd April 17, 2024 04:11
@hcho3
Copy link
Contributor Author

hcho3 commented Apr 17, 2024

Todo. Check for possible memory leak.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this fix @hcho3!

@hcho3
Copy link
Contributor Author

hcho3 commented Apr 20, 2024

/merge

@rapids-bot rapids-bot bot merged commit c5f7b27 into rapidsai:branch-24.06 Apr 20, 2024
60 checks passed
@hcho3 hcho3 deleted the fix_fil_tests branch April 20, 2024 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Cython / Python Cython or Python issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants