[doc] Update daal4py #1407

Merged
merged 7 commits into from
Aug 18, 2023
Changes from 3 commits
17 changes: 17 additions & 0 deletions doc/daal4py/_templates/layout.html
@@ -0,0 +1,17 @@
{% extends "!layout.html" %}
{% block extrahead %}
<script defer type="text/javascript" src="https://www.intel.com/content/dam/www/global/wap/performance-config.js" ></script>
<script type="text/javascript">
// Configure TMS settings
var wapLocalCode = 'us-en'; // Dynamically set per localized site, see mapping table for values
var wapSection = "scikit-learn"; // WAP team will give you a unique section for your site
// Load TMS
if(document.location.href.includes("intel.github.io/scikit-learn-intelex")){
(function () {
var url = 'https://www.intel.com/content/dam/www/global/wap/tms-loader.js'; // WAP file URL
var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = url;
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s);
})();
}
</script>
{% endblock %}
2 changes: 2 additions & 0 deletions doc/daal4py/algorithms.rst
@@ -18,6 +18,8 @@
Algorithms
##########

.. include:: note.rst

Classification
--------------
See also |onedal-dg-classification|_.
8 changes: 6 additions & 2 deletions doc/daal4py/conf.py
@@ -37,7 +37,7 @@
# -- Project information -----------------------------------------------------

project = 'daal4py'
copyright = '2021, Intel'
copyright = 'Intel'
author = 'Intel'

# The short X.Y version
@@ -80,7 +80,7 @@
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
language = 'en'

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
@@ -200,3 +200,7 @@

# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = True

exclude_patterns = exclude_patterns + [
"note.rst"
]
3 changes: 3 additions & 0 deletions doc/daal4py/contents.rst
@@ -19,6 +19,9 @@
########
Contents
########

.. include:: note.rst

.. toctree::
:maxdepth: 2
:caption: Contents:
3 changes: 3 additions & 0 deletions doc/daal4py/data.rst
@@ -19,6 +19,9 @@
##########
Input Data
##########

.. include:: note.rst

All array arguments to compute functions and to algorithm constructors can be
provided in different formats. daal4py will automatically do its best to work on
the provided data with minimal overhead, most notably without copying the data.
2 changes: 2 additions & 0 deletions doc/daal4py/examples.rst
@@ -18,6 +18,8 @@
Examples
##########

.. include:: note.rst

Below are examples of how to use daal4py in various usage styles.

General usage
3 changes: 3 additions & 0 deletions doc/daal4py/index.rst
@@ -19,6 +19,9 @@
#####################################################
Fast, Scalable and Easy Machine Learning With DAAL4PY
#####################################################

.. include:: note.rst

Daal4py makes your Machine Learning algorithms in Python lightning fast and easy to use. It provides
highly configurable Machine Learning kernels, some of which support streaming input data and/or can
be easily and efficiently scaled out to clusters of workstations. Internally it uses Intel(R)
61 changes: 33 additions & 28 deletions doc/daal4py/model-builders.rst
@@ -19,64 +19,69 @@
###################################################
Model Builders for the Gradient Boosting Frameworks
###################################################

.. include:: note.rst

Introduction
------------------
Gradient boosting on decision trees is one of the most accurate and efficient
machine learning algorithms for classification and regression.
The most popular implementations of it are the XGBoost*,
LightGBM*, and CatBoost* frameworks.
The most popular implementations of it are:

* XGBoost*
* LightGBM*
* CatBoost*

daal4py Model Builders deliver accelerated inference for models trained in
those frameworks. The inference is performed by the oneDAL GBT implementation,
tuned for the best performance on Intel(R) Architecture.

Conversion
----------
The first step is to convert already trained model. There are similar
APIs for different frameworks.
The first step is to convert an already trained model. The
API usage is the same for all frameworks:

XGBoost::

import daal4py as d4p
d4p_model = d4p.get_gbt_model_from_xgboost(xgb_model)
d4p_model = d4p.mb.convert_model(xgb_model)

LightGBM::

import daal4py as d4p
d4p_model = d4p.get_gbt_model_from_lightgbm(lgb_model)
d4p_model = d4p.mb.convert_model(lgb_model)

CatBoost::

import daal4py as d4p
d4p_model = d4p.get_gbt_model_from_catboost(cb_model)
d4p_model = d4p.mb.convert_model(cb_model)

.. note:: Convert the model only once, then reuse it for inference.

Classification and Regression Inference
---------
GBT implementation in daal4py assumes separate APIs for the classification and regression.
Specify the corresponding API and match the corresponding problem
in the initial framework.

Classification::
----------------------------------------

d4p_cls_algo = d4p.gbt_classification_prediction(
nClasses=params['classes_count'],
resultsToEvaluate="computeClassLabels",
fptype='float'
)
The API is the same for classification and regression inference.
Depending on the original model passed to ``convert_model``, ``daal_prediction`` contains either the classification or the regression output.

::

daal_prediction = d4p_model.predict(test_data)

Regression::
d4p_reg_algo = d4p.gbt_regression_prediction()
Here, the ``predict()`` method of ``d4p_model`` is used to make predictions on the ``test_data`` dataset.
The ``daal_prediction`` variable stores the predictions made by the ``predict()`` method.

Next, daal4py algorithm object needs compute method to be called.
Both the data and the previously converted model should be passed with the results of the prediction
available within the ``.prediction`` parameter.
Scikit-learn-style Estimators
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Compute::
You can also use the scikit-learn-style classes ``GBTDAALClassifier`` and ``GBTDAALRegressor`` to convert your models and run inference on them. For example:

d4p_predictions = d4p_reg_algo.compute(X_test, d4p_model).prediction
::

The one-line variant of the same code::
d4p_prediction = d4p.gbt_regression_prediction().compute(X_test, d4p_model).prediction
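import xgboost as xgb
from daal4py.sklearn.ensemble import GBTDAALRegressor

# Train a regular XGBoost model (X, y are your training data)
reg = xgb.XGBRegressor()
reg.fit(X, y)
# Convert it to a daal4py model and run the accelerated inference
d4p_predt = GBTDAALRegressor.convert_model(reg).predict(X)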


Limitations
@@ -97,4 +102,4 @@ Model Builders models conversion
Articles and Blog Posts
---------------------------------

- `Improving the Performance of XGBoost and LightGBM Inference <https://medium.com/intel-analytics-software/improving-the-performance-of-xgboost-and-lightgbm-inference-3b542c03447e>` _
- `Improving the Performance of XGBoost and LightGBM Inference <https://medium.com/intel-analytics-software/improving-the-performance-of-xgboost-and-lightgbm-inference-3b542c03447e>`_
4 changes: 4 additions & 0 deletions doc/daal4py/note.rst
@@ -0,0 +1,4 @@
.. _note:

.. note:: Scikit-learn patching functionality in daal4py was deprecated and moved to a separate package, `Intel(R) Extension for Scikit-learn* <https://github.com/intel/scikit-learn-intelex>`_.
All future patches will be available only in Intel(R) Extension for Scikit-learn*. Use the scikit-learn-intelex package instead of daal4py for the scikit-learn acceleration.
3 changes: 3 additions & 0 deletions doc/daal4py/scaling.rst
@@ -19,6 +19,9 @@
###############################################
Scaling on Distributed Memory (Multiprocessing)
###############################################

.. include:: note.rst

It's Easy
---------
daal4py operates in SPMD style (Single Program Multiple Data), which means your
1 change: 1 addition & 0 deletions doc/daal4py/sklearn.rst
@@ -20,6 +20,7 @@
Scikit-Learn API and patching
#############################


Python interface to efficient Intel(R) oneAPI Data Analytics Library provided by daal4py allows one
to create scikit-learn compatible estimators, transformers, clusterers, etc. powered by oneDAL which
are nearly as efficient as native programs.
3 changes: 3 additions & 0 deletions doc/daal4py/streaming.rst
@@ -19,6 +19,9 @@
##############
Streaming Data
##############

.. include:: note.rst

For large quantities of data it might be impossible to provide all input data at
once. This might be because the data resides in multiple files and merging it is
too costly (or not feasible in other ways). In other cases the data is simply too