Update faq/local/index_en.rst #9947
Conversation
translation version 1.0
doc/v2/faq/local/index_en.rst
Outdated
TBD
.. contents::

1. Reduce Memory Consuming
Consuming -> Consumption
doc/v2/faq/local/index_en.rst
Outdated
1. Reduce Memory Consuming
-------------------

The training procedure of neural networks demands dozens gigabytes of host memory or serval gigabytes of device memory, which is a rather memory consuming work. The memory consumed by PaddlePaddle framework mainly includes:
dozens gigabytes -> dozens of gigabytes
doc/v2/faq/local/index_en.rst
Outdated
Reduce DataProvider cache memory
++++++++++++++++++++++++++

PyDataProvider works under asynchronously mechanism, it loads together with the data fetch and shuffle procedure in host memory:
asynchronously -> asynchronous
doc/v2/faq/local/index_en.rst
Outdated
Data Files -> Host Memory Pool -> PaddlePaddle Training
}

Thus the reduction of the DataProvider cache memory can reduce memory occupancy, meanwhile speed up the data loading procedure before training. However, the size of the memory pool can actually effect the granularity of shuffle,which means a shuffle operation is needed before each data file reading process to ensure the randomness of data when try to reduce the size of the memory pool.
effect -> affect
doc/v2/faq/local/index_en.rst
Outdated
.. literalinclude:: src/reduce_min_pool_size.py

In such way, the memory consuming can be significantly reduced and hence the training procedure can be accelerated. More details are demonstrated in :ref:`api_pydataprovider2`.
In such way -> In this way
memory consuming -> memory consumption
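For readers following along, the file pulled in by the :code:`literalinclude` above shrinks the provider's host memory pool and shuffles on disk instead. A minimal sketch of that pattern, assuming the v2 :code:`@provider` decorator and its :code:`min_pool_size` argument; the input type and the :code:`get_sample_from_line` helper are hypothetical:

.. code-block:: python

    import os
    from paddle.trainer.PyDataProvider2 import provider, dense_vector

    # min_pool_size=0 keeps the host memory pool as small as possible;
    # to preserve randomness, the data file is shuffled on disk before
    # each pass instead of inside the pool.
    @provider(input_types=[dense_vector(10)], min_pool_size=0)
    def process(settings, filename):
        os.system('shuf %s > %s.shuf' % (filename, filename))
        with open('%s.shuf' % filename) as f:
            for line in f:
                yield get_sample_from_line(line)  # hypothetical parsing helper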
doc/v2/faq/local/index_en.rst
Outdated
* Parameters or gradients during training are oversize, which leads to floating overflow during calculation.
* The model failed to convergence and divert to a big value.
* Errors in training data leads to parameters converge to a singularity situation. This may also due to the large scale of input data, which contains millions of parameter values, and that will raise float overflow when operating matrix multiplication.
Errors in training data leads to parameters converge to a singularity situation.
This sentence does not make any sense. What are you trying to say here?
also due to -> also be due to
doc/v2/faq/local/index_en.rst
Outdated
Details can refer to example `machine translation <https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/train.py#L66>`_ 。

The main difference of these two methods are:
of these two -> between these two
doc/v2/faq/local/index_en.rst
Outdated
The main difference of these two methods are: | ||

1. They both block the gradient, but within different occasion,the former one happens when then :code:`optimzier` updates the network parameters while the latter happens when the back propagation computing of activation functions.
but within different occasion
What does this mean?
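To make the distinction concrete: gradient clipping is configured on the optimizer, while error clipping is configured on an individual layer. A minimal sketch against the v2 API, with illustrative threshold values and layer sizes (assumptions, not values from the reviewed text):

.. code-block:: python

    import paddle.v2 as paddle

    # Gradient clipping: applied when the optimizer updates the
    # network parameters.
    optimizer = paddle.optimizer.Adam(
        learning_rate=1e-3,
        gradient_clipping_threshold=25.0)

    # Error clipping: applied to one layer's activation gradient
    # during back propagation.
    data = paddle.layer.data(
        name='x', type=paddle.data_type.dense_vector(64))
    hidden = paddle.layer.fc(
        input=data,
        size=128,
        layer_attr=paddle.attr.ExtraAttr(error_clipping_threshold=10.0))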
doc/v2/faq/local/index_en.rst
Outdated
* Output sequence layer and non sequence layer;
* Multiple output layers process multiple sequence with different length;

Such issue can be avoid by calling infer interface and set :code:`flatten_result=False`. Thus, the infer interface returns a python list, in which
avoid -> avoided
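A short sketch of the suggested call, assuming the v2 :code:`paddle.infer` interface forwards the :code:`flatten_result` keyword mentioned in the text; :code:`output_layers`, :code:`parameters`, and :code:`test_data` are hypothetical names standing in for the user's own objects:

.. code-block:: python

    import paddle.v2 as paddle

    # With flatten_result=False, infer returns a python list holding one
    # numpy matrix per output layer instead of one flattened array.
    results = paddle.infer(
        output_layer=output_layers,
        parameters=parameters,
        input=test_data,
        flatten_result=False)
    for mat in results:
        print(mat.shape)  # inspect each output layer separately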
doc/v2/faq/local/index_en.rst
Outdated
7. Fetch parameters’ weight and gradient during training
-----------------------------------------------

Under certain situations, know the weights of currently training mini-batch can provide more inceptions of many problems. Their value can be acquired by printing values in :code:`event_handler` (note that to gain such parameters when training on GPU, you should set :code:`paddle.event.EndForwardBackward`). Detailed code is as following:
know -> knowing
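A minimal sketch of the handler described above, assuming :code:`parameters` was created via :code:`paddle.parameters.create(cost)` in the surrounding script and that :code:`Parameters.get`/:code:`get_grad` accept a parameter name; the name :code:`fc_layer_0.w0` is hypothetical:

.. code-block:: python

    import paddle.v2 as paddle

    # Print a parameter's weight and gradient at the end of each
    # forward/backward pass; per the note above, EndForwardBackward is
    # the event to hook when training on GPU.
    def event_handler(event):
        if isinstance(event, paddle.event.EndForwardBackward):
            if event.batch_id % 100 == 0:
                w = parameters.get('fc_layer_0.w0')       # weight values
                g = parameters.get_grad('fc_layer_0.w0')  # gradient values
                print('weight mean %f, grad mean %f' % (w.mean(), g.mean()))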
fix #8953