Wrong size of feature_names #2226
Comments
Could you provide the content of X_cols.txt?
It seems there are some non-ASCII characters in the feature names; I think they cause this failure.
Thank you for your answer. I'll have a look whenever possible.
Maybe we can do this on the Python side.
@guolinke I think yes, we can. But why not do this in a more general way on the C++ side? And I guess that not only
@StrikerRUS It is not very straightforward to support non-ASCII characters (e.g. UTF-8) on the C++ side.
@guolinke I didn't mean supporting UTF-8, which would require changing the model file encoding, but raising an error when a non-ASCII symbol is encountered. Is that possible on the C++ side?
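The proposed alternative (reject non-ASCII names with a clear error instead of supporting UTF-8) can be sketched on the Python side as a validation pass over the names before they reach the library. This is a minimal illustration only: `check_ascii_feature_names` is a hypothetical helper, not a LightGBM function, and whatever was eventually implemented may differ.

```python
def check_ascii_feature_names(feature_names):
    """Raise ValueError if any feature name contains a non-ASCII character.

    Hypothetical helper illustrating the proposed check; it is not part
    of the LightGBM API.
    """
    for index, name in enumerate(feature_names):
        # str.isascii() is available since Python 3.7.
        if not name.isascii():
            raise ValueError(
                f"Feature name at index {index} ({name!r}) "
                "contains non-ASCII characters"
            )


# ASCII-only names pass silently.
check_ascii_feature_names(["age", "income"])

# A name with an accented character is rejected with a readable message.
try:
    check_ascii_feature_names(["age", "café"])
except ValueError as err:
    print(err)
```

Failing early with the offending index and name is the point of this design: the user sees which column to rename instead of a generic size-mismatch error later on.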
Is it really an encoding problem? The X_cols.txt attached above contains 2 non-ASCII characters. The c_str function in basic.py uses UTF-8 encoding, so these characters are safely converted to UTF-8-encoded bytes.
The model_to_string function in basic.py calls decode() with no argument, which defaults to UTF-8 in Python 3.
What about the model file encoding?
I don't think it's an encoding problem. I suspect a blank feature name.
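The reasoning above can be illustrated with a simplified round trip: names encoded as UTF-8 on write and decoded with the default codec on read keep their non-ASCII characters, whereas a blank name can silently disappear if the stored list is later split on whitespace. The single-line, space-separated format and the whitespace-skipping parser are assumptions made for illustration; this is not LightGBM's actual serialization code, only one plausible way a blank name produces a size mismatch.

```python
def serialize_names(feature_names):
    # Assumed, simplified stand-in for writing names into a model file:
    # one "feature_names=" line, names joined by spaces, encoded as UTF-8.
    return ("feature_names=" + " ".join(feature_names)).encode("utf-8")


def parse_names(raw):
    # bytes.decode() with no argument uses UTF-8 in Python 3.
    line = raw.decode()
    value = line.split("=", 1)[1]
    # str.split() with no argument drops empty tokens,
    # so a blank feature name vanishes here.
    return value.split()


names = ["age", "café", "", "income"]
raw = serialize_names(names)
parsed = parse_names(raw)

# The non-ASCII name survives the UTF-8 round trip intact...
print("café" in parsed)
# ...but the blank name is lost, so the counts no longer match.
print(len(names), len(parsed))
```

Here the encoding step is lossless, and the mismatch comes entirely from the empty string, which matches the suspicion that a blank name rather than UTF-8 is the culprit.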
This commit reverts 0d59859. Also see:
- microsoft#2226
- microsoft#2478
- microsoft#2229

I reproduced the issue, and as @kidotaka gave us a great survey in microsoft#2226, I conclude that the cause is not UTF-8 but "an empty string (character)". Therefore, I revert "throw error when meet non ascii (microsoft#2229)", whose commit hash is 0d59859, and add support for UTF-8 feature names again.
* Support UTF-8 characters in feature name again

This commit reverts 0d59859. Also see:
- #2226
- #2478
- #2229

I reproduced the issue, and as @kidotaka gave us a great survey in #2226, I conclude that the cause is not UTF-8 but "an empty string (character)". Therefore, I revert "throw error when meet non ascii (#2229)", whose commit hash is 0d59859, and add support for UTF-8 feature names again.

* add tests
* fix check-docs tests
* update
* fix tests
* update .travis.yml
* fix tests
* update test_r_package.sh
* update test_r_package.sh
* update test_r_package.sh
* add a test for R-package
* update test_r_package.sh
* update test_r_package.sh
* update test_r_package.sh
* fix test for R-package
* update test_r_package.sh
* update test_r_package.sh
* update test_r_package.sh
* update test_r_package.sh
* update
* update
* update
* remove unneeded comments
Environment info
Operating System:
CPU/GPU model:
C++/Python/R version:
LightGBM version or commit hash: lightgbm==2.2.3
Error message
Reproducible examples
It is difficult to make it reproducible :)
I've checked various issues related to this, such as:
- #379, which led to merge #426
- #540
Please find below the code I used,
which led to this:
I have a feeling that the problem is linked to #426, but I'm unsure how to solve it.
I'm currently commenting out
feature_name=X_cols,
to avoid the bug in my pipeline.
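Instead of dropping the feature_name argument entirely, one possible workaround is to clean the name list first, replacing blank or whitespace-only entries with a positional placeholder while keeping UTF-8 names as they are. This is a sketch under the assumption, raised later in the thread, that a blank name is the cause; `sanitize_feature_names` and the `unnamed_{}` placeholder pattern are hypothetical choices, not part of LightGBM.

```python
def sanitize_feature_names(feature_names, placeholder="unnamed_{}"):
    """Return a copy of feature_names with blank names replaced.

    Hypothetical helper: names that are empty or whitespace-only become
    placeholder.format(index); all other names (including non-ASCII ones)
    are kept, with leading and trailing whitespace stripped.
    """
    cleaned = []
    for index, name in enumerate(feature_names):
        name = str(name).strip()
        cleaned.append(name if name else placeholder.format(index))
    return cleaned


X_cols = ["age", "", "café", "  "]
print(sanitize_feature_names(X_cols))
```

The cleaned list could then be passed as the feature_name argument of lightgbm.Dataset instead of being commented out. The helper does not check for duplicates, so a placeholder could collide with an existing column name; a real pipeline may want to verify uniqueness afterwards.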