Commit 7123f44

prototype_source/skip_param_init.rst translation (#559)

* prototype_source/skip_param_init.rst translation (#558)

1 parent da57861 commit 7123f44

1 file changed: 50 additions, 57 deletions
Skipping Module Parameter Initialization
========================================

Introduction
------------

When a module is created, its learnable parameters are initialized according
to a default initialization scheme associated with the module type. For example, the `weight`
parameter for a :class:`torch.nn.Linear` module is initialized from a
`uniform(-1/sqrt(in_features), 1/sqrt(in_features))` distribution. If some other initialization
scheme is desired, this has traditionally required re-initializing the parameters
after module instantiation:

::

    from torch import nn

    # Initializes weight from the default distribution: uniform(-1/sqrt(10), 1/sqrt(10)).
    m = nn.Linear(10, 5)

    # Re-initialize weight from a different distribution.
    nn.init.orthogonal_(m.weight)

In this case, the initialization done during construction is wasted computation, and it may be non-trivial if
the `weight` parameter is large.

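To get a sense of the cost, here is a rough sketch that times the redundant default
initialization (the 4096 x 4096 size is an arbitrary choice, picked only to make the
wasted work measurable):

::

    import time

    import torch
    from torch import nn

    # Time the construction alone: the default uniform init runs here
    # and its result will be thrown away by the re-initialization below.
    start = time.perf_counter()
    m = nn.Linear(4096, 4096)
    print(f"construction (incl. default init) took {time.perf_counter() - start:.3f}s")

    nn.init.orthogonal_(m.weight)  # the initialization we actually wanted
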
Skipping Initialization
-----------------------

It is now possible to skip parameter initialization during module construction, avoiding
wasted computation. This is easily accomplished using the :func:`torch.nn.utils.skip_init` function:

::

    from torch import nn
    from torch.nn.utils import skip_init

    m = skip_init(nn.Linear, 10, 5)

    # Example: Do custom, non-default parameter initialization.
    nn.init.orthogonal_(m.weight)

This can be applied to any module that satisfies the conditions described in the
:ref:`Updating` section below. Note that all modules provided by
`torch.nn` satisfy these conditions and thus support skipping init.

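As a quick sanity check (the Conv2d layer and sizes below are arbitrary illustrations,
not part of the original example), the same call works for other `torch.nn` modules,
whose parameters then hold uninitialized memory until explicitly filled:

::

    import torch
    from torch import nn
    from torch.nn.utils import skip_init

    # Parameters are allocated but left uninitialized, so they must be
    # filled in before the module is used.
    conv = skip_init(nn.Conv2d, in_channels=3, out_channels=8, kernel_size=3)
    nn.init.xavier_normal_(conv.weight)
    nn.init.zeros_(conv.bias)
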
.. _Updating:

Updating Modules to Support Skipping Initialization
---------------------------------------------------

Due to the way :func:`torch.nn.utils.skip_init` is implemented (see :ref:`Details`), there are
two requirements that a module must meet to be compatible with the function.
You can opt in to the parameter initialization skipping functionality for your custom module
simply by adhering to these requirements:

1. The module must accept a `device` kwarg in its constructor that is passed to any parameters
or buffers created during construction.

2. The module must not perform any computation on parameters or buffers in its constructor except
initialization (i.e. functions from `torch.nn.init`); see the sketch below for what this rules out.

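As an illustration of what the second requirement forbids, a hypothetical module like the
following (`BadModule` is invented here purely for the sake of example) would not be
compatible: the in-place scaling is not a `torch.nn.init` function, so it is silently
lost when the module is materialized from the meta device:

::

    import torch
    from torch import nn

    class BadModule(torch.nn.Module):
        def __init__(self, size, device=None):
            super().__init__()
            self.weight = nn.Parameter(torch.empty(size, device=device))
            nn.init.uniform_(self.weight)
            # Not an init function: under skip_init this runs on the meta
            # device and its effect disappears after materialization.
            with torch.no_grad():
                self.weight.mul_(2.0)
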
The following example demonstrates a module updated to support the `device`
kwarg by passing it along to any created parameters, buffers, or submodules:

::

    class MyModule(torch.nn.Module):
        def __init__(self, foo, bar, device=None):
            super().__init__()

            # ==== Case 1: Module creates parameters directly. ====
            # Pass device along to any created parameters.
            self.param1 = nn.Parameter(torch.empty((foo, bar), device=device))
            self.register_parameter('param2', nn.Parameter(torch.empty(bar, device=device)))

            # To ensure support for the meta device, avoid using ops except those in
            # torch.nn.init on parameters in your module's constructor.
            with torch.no_grad():
                nn.init.kaiming_uniform_(self.param1)
                nn.init.uniform_(self.param2)

            # ==== Case 2: Module creates submodules. ====
            # Pass device along recursively. All submodules will need to support
            # it as well; this is the case for all torch.nn provided modules.
            self.fc = nn.Linear(bar, 5, device=device)

            # This also works with containers.
            self.linears = nn.Sequential(
                nn.Linear(5, 5, device=device),
                nn.Linear(5, 1, device=device)
            )

            # ==== Case 3: Module creates buffers. ====
            # Pass device along during buffer tensor creation.
            self.register_buffer('some_buffer', torch.ones(7, device=device))

        ...
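
With both requirements satisfied, the module above can be constructed with
initialization skipped; a brief usage sketch (the argument values 4 and 2 are
arbitrary):

::

    m = skip_init(MyModule, 4, 2)

    # Every parameter and buffer now holds uninitialized memory, so
    # initialize whatever will actually be used before running the module.
    nn.init.orthogonal_(m.param1)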

.. _Details:

Implementation Details
----------------------

Behind the scenes, the :func:`torch.nn.utils.skip_init` function is implemented in terms of a two-step pattern:

::

    # 1. Initialize module on the meta device; all torch.nn.init ops have
    # no-op behavior on the meta device.
    m = nn.Linear(10, 5, device='meta')

    # 2. Materialize an uninitialized (empty) form of the module on the CPU device.
    # The result of this is a module instance with uninitialized parameters.
    m.to_empty(device='cpu')

It works by instantiating the module onto a "meta" device, which has tensor shape information
but does not allocate any storage. The `torch.nn.init` ops are specially implemented for this meta device
so that they have no-op behavior. This results in the parameter initialization logic being essentially skipped.

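A small sketch of what the meta device provides; shape and dtype metadata exist, but
there is no backing storage to read values from:

::

    import torch

    t = torch.empty(3, 4, device='meta')
    print(t.shape)   # torch.Size([3, 4]) -- shape metadata is tracked
    print(t.device)  # meta
    # t's values cannot be read; materialize first, e.g. via
    # torch.empty_like(t, device='cpu'), to obtain usable storage.
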
Note that this pattern only works for modules that properly support a `device` kwarg during construction, as
described in :ref:`Updating`.
