Update document for model dump. (#5818)
* Clarify the relationship between dump and save.
* Mention the schema.
trivialfis authored Jun 22, 2020
1 parent 26143ad commit 8104f10
Showing 2 changed files with 38 additions and 30 deletions.
43 changes: 20 additions & 23 deletions doc/tutorials/saving_model.rst
@@ -112,7 +112,7 @@ configuration directly as a JSON string. In Python package:
print(config)
-or
+or in R:

.. code-block:: R
@@ -158,22 +158,9 @@ Will print out something similiar to (not actual output as it's too long for dem
"colsample_bynode": "1",
"colsample_bytree": "1",
"default_direction": "learn",
-"enable_feature_grouping": "0",
-"eta": "0.300000012",
-"gamma": "0",
-"grow_policy": "depthwise",
-"interaction_constraints": "",
-"lambda": "1",
-"learning_rate": "0.300000012",
-"max_bin": "256",
-"max_conflict_rate": "0",
-"max_delta_step": "0",
-"max_depth": "6",
-"max_leaves": "0",
-"max_search_group": "100",
-"refresh_leaf": "1",
-"sketch_eps": "0.0299999993",
-"sketch_ratio": "2",
+...
"subsample": "1"
}
}
@@ -207,13 +194,16 @@ This way users can study the internal representation more closely. Please note
JSON generators make use of locale dependent floating point serialization methods, which
is not supported by XGBoost.

-************
-Future Plans
-************
+*************************************************
+Difference between saving model and dumping model
+*************************************************

-Right now using the JSON format incurs longer serialisation time, we have been working on
-optimizing the JSON implementation to close the gap between binary format and JSON format.
-You can track the progress in `#5046 <https://github.com/dmlc/xgboost/pull/5046>`_.
+XGBoost has a function called ``dump_model`` in Booster object, which lets you to export
+the model in a readable format like ``text``, ``json`` or ``dot`` (graphviz). The primary
+use case for it is for model interpretation or visualization, and is not supposed to be
+loaded back to XGBoost. The JSON version has a `schema
+<https://github.com/dmlc/xgboost/blob/master/doc/dump.schema>`_. See next section for
+more info.
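
The interpretation use case described above can be sketched as follows. This is a minimal illustration and not part of the commit: it assumes each tree in a JSON dump is an object with ``nodeid``, ``split``, ``children`` and ``leaf`` keys, and it uses a hand-written sample tree (hypothetical values, not actual XGBoost output) so that it runs without a trained model. The helper name ``summarize_tree`` is made up for this example.

```python
import json

# Hand-written sample shaped like one entry of the list returned by
# Booster.get_dump(dump_format="json"). The values are hypothetical and
# only illustrate the structure; they are not real XGBoost output.
SAMPLE_TREE = """
{"nodeid": 0, "depth": 0, "split": "f0", "split_condition": 0.5,
 "yes": 1, "no": 2, "missing": 1,
 "children": [
   {"nodeid": 1, "leaf": -0.2},
   {"nodeid": 2, "depth": 1, "split": "f1", "split_condition": 1.5,
    "yes": 3, "no": 4, "missing": 3,
    "children": [
      {"nodeid": 3, "leaf": 0.1},
      {"nodeid": 4, "leaf": 0.4}
    ]}
 ]}
"""


def summarize_tree(node, depth=0):
    """Return (leaf_count, max_depth, split_features) for one dumped tree."""
    if "leaf" in node:
        # Leaf nodes carry a weight and have no children.
        return 1, depth, set()
    leaves, deepest, features = 0, depth, {node["split"]}
    for child in node.get("children", []):
        n, d, f = summarize_tree(child, depth + 1)
        leaves += n
        deepest = max(deepest, d)
        features |= f
    return leaves, deepest, features


tree = json.loads(SAMPLE_TREE)
leaf_count, max_depth, split_features = summarize_tree(tree)
print(leaf_count, max_depth, sorted(split_features))
```

With a real model, the input strings would come from ``get_dump(dump_format="json")``, which returns one string per tree. A summary like this is only for inspection; a model that must be loaded back should be written with ``save_model``.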

***********
JSON Schema
@@ -229,3 +219,10 @@ leaf directly, instead it saves the weights as a separated array.

.. include:: ../model.schema
:code: json
+
+************
+Future Plans
+************
+
+Right now using the JSON format incurs longer serialisation time, we have been working on
+optimizing the JSON implementation to close the gap between binary format and JSON format.
25 changes: 18 additions & 7 deletions python-package/xgboost/core.py
@@ -1444,8 +1444,11 @@ def save_model(self, fname):
The model is saved in an XGBoost internal format which is universal
among the various XGBoost interfaces. Auxiliary attributes of the
-Python Booster object (such as feature_names) will not be saved. To
-preserve all attributes, pickle the Booster object.
+Python Booster object (such as feature_names) will not be saved. See:
+https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html
+for more info.
Parameters
----------
@@ -1460,7 +1463,7 @@ def save_model(self, fname):
raise TypeError("fname must be a string or os_PathLike")

def save_raw(self):
-"""Save the model to a in memory buffer representation
+"""Save the model to a in memory buffer representation instead of file.
Returns
-------
@@ -1479,8 +1482,11 @@ def load_model(self, fname):
The model is loaded from an XGBoost format which is universal among the
various XGBoost interfaces. Auxiliary attributes of the Python Booster
-object (such as feature_names) will not be loaded. To preserve all
-attributes, pickle the Booster object.
+object (such as feature_names) will not be loaded. See:
+https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html
+for more info.
Parameters
----------
@@ -1503,7 +1509,9 @@ def load_model(self, fname):
raise TypeError('Unknown file type: ', fname)

def dump_model(self, fout, fmap='', with_stats=False, dump_format="text"):
-"""Dump model into a text or JSON file.
+"""Dump model into a text or JSON file. Unlike `save_model`, the
+output format is primarily used for visualization or interpretation,
+hence it's more human readable but cannot be loaded back to XGBoost.
Parameters
----------
@@ -1537,7 +1545,9 @@ def dump_model(self, fout, fmap='', with_stats=False, dump_format="text"):
fout.close()

def get_dump(self, fmap='', with_stats=False, dump_format="text"):
-"""Returns the model dump as a list of strings.
+"""Returns the model dump as a list of strings. Unlike `save_model`, the
+output format is primarily used for visualization or interpretation,
+hence it's more human readable but cannot be loaded back to XGBoost.
Parameters
----------
@@ -1547,6 +1557,7 @@ def get_dump(self, fmap='', with_stats=False, dump_format="text"):
Controls whether the split statistics are output.
dump_format : string, optional
Format of model dump. Can be 'text', 'json' or 'dot'.
"""
fmap = os_fspath(fmap)
length = c_bst_ulong()