[bug report] prediction difference between R XGBoost model and translated SQL when set base_score #65

Open
JiaxiangBU opened this issue Oct 20, 2019 · 1 comment
Labels
feature a feature request or enhancement

Comments

@JiaxiangBU
Contributor

I used the mtcars dataset to build the reproducible example below.

library(xgboost)
#> Warning: package 'xgboost' was built under R version 3.6.1
library(tidyverse)
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from 
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang
#> Warning: package 'dplyr' was built under R version 3.6.1
train_data <- mtcars %>% 
    rename(y = am)
dtrain <- 
    xgb.DMatrix(
        data = as.matrix(
            train_data %>% select(-y)
        )
        ,label = train_data$y
    )
xgb_model <- xgb.train(
    data=dtrain,
    nrounds = 10,
    seed = 1, 
    max_depth = 1,
    objective = "binary:logistic",
    base_score = mean(train_data$y) # use the label mean as the prior to fix the calibration problem
)
pred_from_model <- predict(xgb_model, newdata = dtrain)
library(sqldf)
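#> Warning: package 'sqldf' was built under R version 3.6.1
#> Loading required package: gsubfn
#> Warning: package 'gsubfn' was built under R version 3.6.1
#> Loading required package: proto
#> Warning: package 'proto' was built under R version 3.6.1
#> Loading required package: RSQLite
#> Warning: package 'RSQLite' was built under R version 3.6.1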
library(tidypredict)
#> Warning: package 'tidypredict' was built under R version 3.6.1
pred_from_tidypredict <- 
tidypredict_sql(xgb_model, dbplyr::simulate_dbi()) %>% 
  paste("select ",.," from mtcars") %>% 
  # cat
    sqldf() %>% 
  pull
(pred_from_model-pred_from_tidypredict) %>% abs %>% mean
#> [1] 0.04692561

Created on 2019-10-20 by the reprex package (v0.3.0)
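
A possible explanation for the gap, offered as an assumption rather than a confirmed diagnosis: with `objective = "binary:logistic"`, XGBoost adds `logit(base_score)` to the summed leaf values before applying the sigmoid, so SQL that treats this offset as zero (the default `base_score = 0.5`) would drift whenever `base_score` is changed. The sketch below uses base R only, and `tree_sum` holds made-up leaf sums, not values taken from the model above.

```r
# Logistic link and its inverse.
sigmoid <- function(x) 1 / (1 + exp(-x))
logit <- function(p) log(p / (1 - p))

# Same base_score as in the reprex: the mean of mtcars$am.
base_score <- mean(mtcars$am)

# Hypothetical summed leaf values for three rows (illustrative only).
tree_sum <- c(-1.2, 0, 0.8)

# Prediction with the base_score offset included in the margin.
p_with_offset <- sigmoid(logit(base_score) + tree_sum)

# Prediction when the offset is assumed to be zero (base_score = 0.5).
p_without_offset <- sigmoid(tree_sum)

# Mean absolute gap between the two, analogous to the check in the reprex.
mean(abs(p_with_offset - p_without_offset))
```

If this is the cause, comparing `predict(xgb_model, dtrain, outputmargin = TRUE)` with the raw score in the generated SQL should show the same constant difference, `logit(base_score)`, on every row.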

@JiaxiangBU JiaxiangBU changed the title [bug report] prediction difference between R model and translated SQL when set base_score [bug report] prediction difference between R XGBoost model and translated SQL when set base_score Oct 20, 2019
JiaxiangBU added a commit to JiaxiangBU/tidypredict that referenced this issue Oct 20, 2019
@JiaxiangBU
Contributor Author

I created pull request #66 to fix this problem.

@topepo topepo added the feature a feature request or enhancement label Apr 3, 2020