forked from PaddlePaddle/Paddle
Showing 50 changed files with 16,731 additions and 1 deletion.
## Introduction to the High/Low-level APIs

Paddle currently provides two sets of APIs:

- Low-level API:
  - Highly flexible and relatively mature; models trained with it can be deployed directly for online C++ inference.
  - A large number of models are provided as usage examples, including chapters 7 and 8 of [Book](https://github.com/PaddlePaddle/book) and all chapters of [models](https://github.com/PaddlePaddle/models).
  - Target users: users who already have some understanding of deep learning and need to define custom networks for training, inference, or online deployment.

- High-level API:
  - Simple to use; the first six chapters of [Book](https://github.com/PaddlePaddle/book) provide examples.
  - Not yet mature; for now the interfaces live under [paddle.fluid.contrib](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/contrib).
  - Target users: beginners who want to learn deep learning fundamentals through the Book tutorials.
================
API User Guide
================

.. toctree::
    :titlesonly:

    high_low_level_api.md
    low_level/layers/index.rst
    low_level/executor.rst
    low_level/optimizer.rst
    low_level/metrics.rst
    low_level/model_save_reader.rst
    low_level/inference.rst
    low_level/distributed/index.rst
64 changes: 64 additions & 0 deletions
doc/fluid/api_cn/api_guides/low_level/cluster/cluster_train_data_cn.rst
.. _api_guide_cluster_train_data:

######################################################
Preparing the data reader for distributed training
######################################################

A data-parallel distributed training job usually contains multiple training processes. Each training process handles a portion of the whole dataset, and which portion the current process should read is determined by its unique index (trainer_id) together with the total number of training processes (trainers).

Implementing a cluster_reader to read the distributed training dataset
--------------------------------------------------------------------------

A fairly general approach is to implement a cluster_reader that decides which examples to read based on the number of training processes and the index of the current process:

.. code-block:: python

    import os

    import paddle

    def cluster_reader(reader, trainers, trainer_id):
        def reader_creator():
            for idx, data in enumerate(reader()):
                # keep only the samples assigned to this trainer
                if idx % trainers == trainer_id:
                    yield data
        return reader_creator  # return the wrapping reader, not the original one

    trainers = int(os.getenv("PADDLE_TRAINERS", "1"))
    trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
    train_reader = cluster_reader(paddle.dataset.mnist.train(), trainers, trainer_id)

In the code above, `trainers` and `trainer_id` are the total number of training processes and the index of the current training process; they can be passed to the Python program through environment variables or command-line arguments.
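If you prefer launch arguments over environment variables, the following is a minimal sketch; the flag names are only illustrative and not part of Fluid:

.. code-block:: python

    # A minimal sketch: passing trainers/trainer_id as command-line arguments
    # instead of environment variables. The flag names here are hypothetical.
    import argparse

    import paddle

    parser = argparse.ArgumentParser()
    parser.add_argument("--trainers", type=int, default=1,
                        help="total number of trainer processes")
    parser.add_argument("--trainer_id", type=int, default=0,
                        help="index of the current trainer process")
    args = parser.parse_args()

    # cluster_reader is the helper defined above
    train_reader = cluster_reader(
        paddle.dataset.mnist.train(), args.trainers, args.trainer_id)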
Splitting the training files in advance
-----------------------------------------

Because `cluster_reader` still reads the full dataset, it wastes IO resources and hurts training performance when there are many training processes. An alternative is to split the training data into multiple small files and let each process handle only some of those files. On Linux, for example, the `split <http://man7.org/linux/man-pages/man1/split.1.html>`_ command can be used to split the training data into multiple small files:

.. code-block:: bash

    $ split -d -a 4 -l 100 housing.data cluster/housing.data.
    $ find ./cluster
    cluster/
    cluster/housing.data.0002
    cluster/housing.data.0003
    cluster/housing.data.0004
    cluster/housing.data.0000
    cluster/housing.data.0001
    cluster/housing.data.0005

Once the data has been split, a file_dispatcher function can decide which files each training process needs to read, based on the number of training processes and the current index:

.. code-block:: python

    import glob
    import os

    def file_dispatcher(files_pattern, trainers, trainer_id):
        file_list = glob.glob(files_pattern)
        ret_list = []
        for idx, f in enumerate(file_list):
            # assign files to trainers in a round-robin fashion
            if idx % trainers == trainer_id:
                ret_list.append(f)
        return ret_list

    trainers = int(os.getenv("PADDLE_TRAINERS", "1"))
    trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
    files_pattern = "cluster/housing.data.*"
    my_files = file_dispatcher(files_pattern, trainers, trainer_id)

In the example above, `files_pattern` is a `glob expression <https://docs.python.org/2.7/library/glob.html>`_ for the training files, and wildcards can generally be used.
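To actually consume the dispatched files, a reader over them is still needed. The sketch below is not part of the original text; it reads the files line by line and treats the per-line parsing as a placeholder:

.. code-block:: python

    # A minimal sketch: turn the file list returned by file_dispatcher into a
    # sample-level reader. The parsing of each line is only a placeholder.
    def file_reader(file_list):
        def reader():
            for fname in file_list:
                with open(fname) as f:
                    for line in f:
                        # housing.data stores whitespace-separated floats per line
                        yield [float(tok) for tok in line.split()]
        return reader

    train_reader = file_reader(my_files)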
33 changes: 33 additions & 0 deletions
doc/fluid/api_cn/api_guides/low_level/distributed/async_training.rst
.. _api_guide_async_training:

#####################################
Asynchronous Distributed Training
#####################################

Fluid supports data-parallel asynchronous distributed training. The :code:`DistributeTranspiler` API converts a single-machine network configuration into a :code:`pserver`-side program and a :code:`trainer`-side program that can run on multiple machines. Users run the same piece of code on different nodes and, depending on environment variables or launch arguments, the code takes on the corresponding :code:`pserver` or :code:`trainer` role. Fluid asynchronous training only supports the pserver mode. The main difference from `synchronous training <../distributed/sync_training.html>`_ is that in asynchronous training each trainer's gradients are applied to the parameters independently, whereas in synchronous training the gradients of all trainers are merged before the parameters are updated; consequently, the hyperparameters of synchronous and asynchronous training need to be tuned separately.

Asynchronous distributed training in pserver mode
====================================================

For detailed API usage see :ref:`api_fluid_DistributeTranspiler`; a simple example:

.. code-block:: python

    config = fluid.DistributeTranspilerConfig()
    # configure the transpiler strategy
    config.slice_var_up = False
    t = fluid.DistributeTranspiler(config=config)
    t.transpile(trainer_id,
                program=main_program,
                pservers="192.168.0.1:6174,192.168.0.2:6174",
                trainers=1,
                sync_mode=False)

For an explanation of the parameters above, see `synchronous training <../distributed/sync_training.html>`_.

Note that for asynchronous training you must change the value of :code:`sync_mode`:

- :code:`sync_mode` : whether to train in synchronous mode. The default is True, and omitting the argument also means synchronous training; set it to False for asynchronous training. The role-selection step that follows the transpile call is sketched below.
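As referenced above, the same script usually decides at runtime whether it plays the pserver or the trainer role. The following is a hedged sketch of that branch; the environment variable names are assumptions, and :code:`t` is the transpiler built in the example above:

.. code-block:: python

    # A hedged sketch of role selection after t.transpile(...); the environment
    # variable names below are assumptions, not prescribed by Fluid.
    import os

    role = os.getenv("PADDLE_TRAINING_ROLE", "TRAINER")
    if role == "PSERVER":
        current_endpoint = os.getenv("PADDLE_CURRENT_ENDPOINT")  # e.g. "192.168.0.1:6174"
        pserver_prog = t.get_pserver_program(current_endpoint)
        pserver_startup = t.get_startup_program(current_endpoint, pserver_prog)
        exe = fluid.Executor(fluid.CPUPlace())
        exe.run(pserver_startup)
        exe.run(pserver_prog)        # blocks, serving parameter updates
    else:
        trainer_prog = t.get_trainer_program()
        exe = fluid.Executor(fluid.CPUPlace())
        exe.run(fluid.default_startup_program())
        # ... iterate over the data reader and run trainer_prog ...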
58 changes: 58 additions & 0 deletions
doc/fluid/api_cn/api_guides/low_level/distributed/cpu_train_best_practice.rst
.. _api_guide_cpu_training_best_practice:

################################################
Best practices for distributed CPU training
################################################

Improving the speed of distributed CPU training comes down to two aspects:
1) increasing the computation speed, mainly by raising CPU utilization; 2) increasing the communication speed, mainly by reducing the amount of data that has to be transferred.

Raising CPU utilization
=========================

Raising CPU utilization mainly relies on :code:`ParallelExecutor`, which can make full use of multiple CPUs to speed up computation.

For detailed API usage see :ref:`api_fluid_ParallelExecutor`; a simple example:

.. code-block:: python

    import os

    # configure the execution strategy, mainly the number of threads
    exec_strategy = fluid.ExecutionStrategy()
    exec_strategy.num_threads = 8
    # configure the build strategy; for CPU training, use the Reduce mode
    build_strategy = fluid.BuildStrategy()
    if int(os.getenv("CPU_NUM")) > 1:
        build_strategy.reduce_strategy = fluid.BuildStrategy.ReduceStrategy.Reduce
    pe = fluid.ParallelExecutor(
        use_cuda=False,
        loss_name=avg_cost.name,
        main_program=main_program,
        build_strategy=build_strategy,
        exec_strategy=exec_strategy)

Among the parameters above:

- :code:`num_threads` : the number of threads used for training; it is best kept close to the number of physical CPU cores of the training machine
- :code:`reduce_strategy` : for CPU training, choose fluid.BuildStrategy.ReduceStrategy.Reduce

Common environment variable settings:

- :code:`CPU_NUM` : the number of model replicas; it is best kept consistent with num_threads
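Putting the pieces together, training is then driven through the ParallelExecutor built above. This is only a sketch: train_reader, feeder and avg_cost are assumed to come from the surrounding training script, and CPU_NUM must be exported before the ParallelExecutor is built.

.. code-block:: python

    # A minimal sketch of driving the ParallelExecutor defined above;
    # train_reader, feeder and avg_cost are assumed from the training script.
    for pass_id in range(10):          # number of passes is illustrative
        for data in train_reader():
            loss_value, = pe.run(feed=feeder.feed(data),
                                 fetch_list=[avg_cost.name])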
Improving communication speed
===============================

To reduce the amount of communicated data and increase communication speed, the main technique is sparse updates. Currently, `sparse update <../distributed/sparse_update.html>`_ is mainly supported by :ref:`api_fluid_layers_embedding`.

.. code-block:: python

    data = fluid.layers.data(name='ids', shape=[1], dtype='int64')
    emb = fluid.layers.embedding(input=data, size=[dict_size, 16], is_sparse=True)

Among the parameters above:

- :code:`is_sparse` : configures the embedding to use sparse updates. If the embedding's dict_size is large while each batch of data contains only a few ids, the sparse update mode is recommended. A hedged end-to-end sketch follows below.
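For context, the following hedged sketch wires the sparse embedding into a complete toy classification network; dict_size, the label layout, and the optimizer choice are all assumptions:

.. code-block:: python

    # A hedged sketch of a full sparse-update configuration: a large embedding
    # followed by a small classifier. All sizes here are illustrative.
    dict_size = 10000 * 1000

    ids = fluid.layers.data(name='ids', shape=[1], dtype='int64')
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')

    emb = fluid.layers.embedding(input=ids, size=[dict_size, 16], is_sparse=True)
    predict = fluid.layers.fc(input=emb, size=2, act='softmax')
    cost = fluid.layers.cross_entropy(input=predict, label=label)
    avg_cost = fluid.layers.mean(cost)

    fluid.optimizer.SGD(learning_rate=0.01).minimize(avg_cost)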
11 changes: 11 additions & 0 deletions
doc/fluid/api_cn/api_guides/low_level/distributed/index.rst
======================
Distributed Training
======================

.. toctree::
    :maxdepth: 1

    async_training.rst
    cpu_train_best_practice.rst
    large_scale_sparse_feature_training.rst
42 changes: 42 additions & 0 deletions
...api_cn/api_guides/low_level/distributed/large_scale_sparse_feature_training.rst
.. _api_guide_large_scale_sparse_feature_training:

####################################################
Training models with large-scale sparse features
####################################################

Model configuration and training
==================================

Embedding is widely used in all kinds of network architectures, especially in text-processing models. In some scenarios, such as recommendation systems or search engines, the number of embedding feature ids can be extremely large; once the number of feature ids reaches a certain scale, the embedding parameters become very large, which causes two problems:
1) the embedding parameters are too large to fit in the memory of a single machine, so training is impossible;
2) the normal training mode synchronizes the complete set of parameters in every iteration, so oversized parameters make communication very slow and drag down training speed.

Fluid supports training super-large-scale sparse-feature embeddings with up to hundreds of billions of entries. The embedding parameters are stored only on the parameter servers, and parameter prefetching plus sparse gradient updates greatly reduce the amount of communication and increase communication speed.

This feature only works for distributed training and cannot be used on a single machine.
It needs to be used together with `sparse update <../distributed/sparse_update.html>`_.

Usage: when configuring the embedding, simply add the arguments :code:`is_distributed=True` and :code:`is_sparse=True`.
The parameter :code:`dict_size` defines the total number of ids in the data. The ids can be any values within the int64 range, as long as the total number of ids does not exceed dict_size. Therefore, before configuring it you need to estimate the total number of feature ids in the data.

.. code-block:: python

    emb = fluid.layers.embedding(
        is_distributed=True,
        input=input,
        size=[dict_size, embedding_width],
        is_sparse=True)
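Because dict_size has to cover the total number of distinct ids, a quick pre-pass over the raw data can be used to estimate it before configuring the embedding. This is only a sketch and assumes text files with whitespace-separated integer ids, one sample per line:

.. code-block:: python

    # A minimal sketch for estimating dict_size; assumes text files with
    # whitespace-separated integer feature ids.
    def count_distinct_ids(file_list):
        ids = set()
        for fname in file_list:
            with open(fname) as f:
                for line in f:
                    ids.update(int(tok) for tok in line.split())
        return len(ids)

    dict_size = count_distinct_ids(["train.txt"])  # leave some headroom in practice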
Model saving and inference
============================

When the number of features reaches the hundreds of billions, the parameters are too large for a single machine to hold, so saving and loading the model differ from the normal mode:
1) in the normal mode, the parameters are saved and loaded on the trainer side;
2) in the distributed mode, the parameters are saved and loaded on the pserver side, and each pserver only saves and loads the part of the parameters that corresponds to itself.
83 changes: 83 additions & 0 deletions
doc/fluid/api_cn/api_guides/low_level/distributed/sync_training.rst
.. _api_guide_sync_training:

####################################
Synchronous Distributed Training
####################################

Fluid supports data-parallel synchronous distributed training. The :code:`DistributeTranspiler` API converts a single-machine network configuration into a :code:`pserver`-side program and a :code:`trainer`-side program that can run on multiple machines. Users run the same piece of code on different nodes and, depending on environment variables or launch arguments, the code takes on the corresponding :code:`pserver` or :code:`trainer` role. Fluid synchronous distributed training supports both the pserver mode and the NCCL2 mode, and their API usage differs, so take care.

Distributed training in pserver mode
======================================

For detailed API usage see :ref:`DistributeTranspiler`; a simple example:

.. code-block:: python

    config = fluid.DistributeTranspilerConfig()
    # configure the transpiler strategy
    config.slice_var_up = False
    t = fluid.DistributeTranspiler(config=config)
    t.transpile(trainer_id,
                program=main_program,
                pservers="192.168.0.1:6174,192.168.0.2:6174",
                trainers=1,
                sync_mode=True)

Among the parameters above:

- :code:`trainer_id` : the id of the trainer node, from 0 to n-1, where n is the number of trainer nodes in the current training job
- :code:`program` : the :code:`program` to be transpiled; defaults to :code:`fluid.default_main_program()`
- :code:`pservers` : the list of IP:port endpoints of the pserver nodes in the current training job
- :code:`trainers` : an int, the number of trainer nodes in the current training job. Note:
    * in pserver mode, the number of trainer nodes does not have to match the number of pserver nodes, e.g. 20 pservers with 50 trainers; in a real training job you can tune the numbers of pserver and trainer nodes to find the best performance
    * in NCCL2 mode, this argument is a string specifying the list of IP:port endpoints of the trainer nodes
- :code:`sync_mode` : whether to train in synchronous mode. The default is True, and omitting the argument also means synchronous training
The supported config options include:

- :code:`slice_var_up` : whether to slice a single parameter across multiple pservers for optimization; enabled by default. This option is useful for models with few parameters that still need many nodes, and helps raise the parallelism of the pserver-side computation
- :code:`split_method` : how the transpiler assigns parameters (or parameter slices) to the pservers; the default is "RoundRobin", and "HashName" can also be used
- :code:`min_block_size` : the minimum slice size of a Tensor when parameter slicing is enabled, which prevents RPC request packets from becoming too small; the default is 8192, and this usually does not need to be changed
- :code:`enable_dc_asgd` : whether to enable :code:`DC-ASGD`. This option takes effect in asynchronous training and enables the delay-compensated ASGD algorithm
- :code:`mode` : either "pserver" or "nccl2", selecting pserver-mode or NCCL2-mode distributed training
- :code:`print_log` : whether to enable transpiler debug logs; intended for development and debugging

Common environment variable settings:

- :code:`FLAGS_rpc_send_thread_num` : int, the number of threads used for sending in RPC communication
- :code:`FLAGS_rpc_get_thread_num` : int, the number of threads used for receiving in RPC communication
- :code:`FLAGS_rpc_prefetch_thread_num` : int, the number of prefetch threads used when the distributed lookup table performs RPC communication
- :code:`FLAGS_rpc_deadline` : int, the maximum RPC waiting time in milliseconds; the default is 180000
Distributed training in NCCL2 mode
====================================

The NCCL2-based (collective communication) multi-machine synchronous training mode is only supported on GPU clusters.
For the detailed API description of this part, see :ref:`DistributeTranspiler`.

Note: in NCCL2 mode, the cluster does not need to start pservers; only the trainer nodes need to be launched.

Use the following code to convert the current :code:`Program` into a Fluid :code:`Program` suitable for NCCL2 distributed computation:

.. code-block:: python

    config = fluid.DistributeTranspilerConfig()
    config.mode = "nccl2"
    t = fluid.DistributeTranspiler(config=config)
    t.transpile(trainer_id,
                program=main_program,
                startup_program=startup_program,
                trainers="192.168.0.1:6174,192.168.0.2:6174",
                current_endpoint="192.168.0.1:6174")

Where:

- :code:`trainer_id` : the id of the trainer node, from 0 to n-1, where n is the number of trainer nodes in the current training job
- :code:`program` and :code:`startup_program` : the main program and the startup_program of the Fluid model, respectively
- :code:`trainers` : a string specifying the IPs and ports of all trainers in the current job, used only for NCCL2 initialization (in pserver mode this argument is an int giving the number of trainer nodes)
- :code:`current_endpoint` : the IP and port of the current node in the current job. The sketch after this list shows one way to drive training after the transpile step.
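After the transpile call, the training loop in NCCL2 mode is usually driven through :code:`ParallelExecutor`. The following is a hedged sketch; avg_cost, main_program, feeder and train_reader are assumed to come from the surrounding script, and num_trainers/trainer_id must match the transpile arguments above:

.. code-block:: python

    # A hedged sketch of running NCCL2-mode training with ParallelExecutor;
    # num_trainers and trainer_id must be consistent with t.transpile(...) above.
    trainer_endpoints = "192.168.0.1:6174,192.168.0.2:6174".split(",")

    pe = fluid.ParallelExecutor(
        use_cuda=True,
        loss_name=avg_cost.name,
        main_program=main_program,
        num_trainers=len(trainer_endpoints),
        trainer_id=trainer_id)

    for data in train_reader():
        loss_value, = pe.run(feed=feeder.feed(data),
                             fetch_list=[avg_cost.name])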
.. _api_guide_executor:

####################
Execution Engine
####################

The :code:`Executor` is the execution engine. PaddlePaddle Fluid provides two executors to choose from.
:code:`Executor` implements a simple executor in which all operators are executed sequentially, and users can drive it from a Python script. By default :code:`Executor` is single-threaded; if you want data parallelism, refer to the other executor, :ref:`api_guide_parallel_executor`.

The logic of :code:`Executor` is very simple. It is recommended to first get the model running with :code:`Executor` while debugging, and only then switch to multi-device or even multi-machine computation.

:code:`Executor` accepts a :code:`Place` at construction time, which can be either :ref:`api_fluid_CPUPlace` or :ref:`api_fluid_CUDAPlace`. When it runs, it can choose which :ref:`api_guide_low_level_program` to execute.

For basic usage, see `quick_start_fit_a_line <http://paddlepaddle.org/documentation/docs/zh/1.1/beginners_guide/quick_start/fit_a_line/README.cn.html>`_; for the API Reference, see :ref:`api_fluid_Executor`.
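A minimal end-to-end sketch of the workflow described above; the tiny network is only a placeholder so the example is self-contained:

.. code-block:: python

    # A minimal sketch of the typical Executor workflow; the network below is a
    # placeholder so that the example can run on its own.
    import numpy
    import paddle.fluid as fluid

    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    y_predict = fluid.layers.fc(input=x, size=1)

    place = fluid.CPUPlace()                      # or fluid.CUDAPlace(0)
    exe = fluid.Executor(place)
    exe.run(fluid.default_startup_program())      # initialize parameters once

    out, = exe.run(fluid.default_main_program(),
                   feed={'x': numpy.random.random((4, 13)).astype('float32')},
                   fetch_list=[y_predict])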