xgboost RF bump for n=10M #14
@tqchen says: "I now think the bump in running time was due to cache-line issues, as there are some non-consecutive memory accesses going on in xgboost. Having a larger number of rows could mean a lower cache hit rate, but the impact should not be large, as this has to do with micro-level optimization. I have pushed some optimizations that do prefetching, which should in general improve the speed of xgboost. It would be great if you could run another round of tests."
Thanks. I have to note that the bump in the trend is still likely to exist, but the impact should be limited because of the micro-level effect I mentioned. At least now we know the cause of this phenomenon :)
As for the AUC part, I find that, at least for boosting, treating all the dates and times as integers gives a definitely better result.
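To make the encoding point concrete, here is a minimal sketch of the two options for a single record. The field names (`month`, `day_of_week`, `dep_time`) and the sample values are hypothetical and are not taken from the benchmark data. One plausible reason the integer encoding helps tree boosting (an assumption, not something confirmed in this thread) is that a single split such as `month <= 6` can group adjacent values, whereas a one-hot column can only isolate one level per split.

```python
def encode_as_integers(row):
    """Ordinal encoding: each date/time field becomes one integer feature."""
    return [int(row["month"]), int(row["day_of_week"]), int(row["dep_time"])]


def encode_one_hot(value, n_levels):
    """Categorical encoding for comparison: one 0/1 column per level (1-based)."""
    return [1 if value == level else 0 for level in range(1, n_levels + 1)]


# Hypothetical record; the field names and values are illustrative only.
row = {"month": "8", "day_of_week": "3", "dep_time": "1545"}

int_features = encode_as_integers(row)                  # 3 columns in total
month_one_hot = encode_one_hot(int(row["month"]), 12)   # 12 columns for month alone
```

The integer version keeps the feature matrix narrow (one column per field) and preserves the ordering of months, weekdays, and departure times.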
I think that's a reasonable explanation. I re-ran it and there was a significant improvement for n=10M (from 4800 sec to 3000 sec). The time vs. size curve is still convex, though (see the updated graphs in the README), but your previous comments could explain this.
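For reference, the arithmetic behind that improvement, plus a rough way to check convexity of a time vs. size curve, can be sketched as follows. Only the 4800 sec and 3000 sec figures come from this thread; the function names and the other timing values below are made up purely to exercise the check.

```python
def speedup(before_sec, after_sec):
    """How many times faster the new run is."""
    return before_sec / after_sec


def relative_reduction(before_sec, after_sec):
    """Fraction of the original run time that was saved."""
    return (before_sec - after_sec) / before_sec


def is_convex(sizes, times):
    """True if the slope between consecutive (size, time) points never decreases."""
    points = list(zip(sizes, times))
    slopes = [(t1 - t0) / (n1 - n0)
              for (n0, t0), (n1, t1) in zip(points, points[1:])]
    return all(b >= a for a, b in zip(slopes, slopes[1:]))


# Figures reported above for n=10M.
gain = speedup(4800, 3000)              # 1.6x faster
saved = relative_reduction(4800, 3000)  # 0.375, i.e. 37.5% less time

# Hypothetical timings (only the last value, 3000 sec at n=10M, is from the thread).
sizes = [10_000, 100_000, 1_000_000, 10_000_000]
times = [1, 12, 200, 3000]
curve_is_convex = is_convex(sizes, times)
```

A convex curve here means the cost per additional row grows with n, which is the pattern a falling cache hit rate would produce.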
Moved "something weird happens for the largest data size (n=10M) - the trend for Run time and AUC "breaks", see figures main README" issue from #2 here.