Update tutorials (apache#18609)

Update docs according to new Block APIs (apache#18413)

acphile authored and chinakook committed Nov 19, 2020
1 parent 3ead83b commit 4d614bd
Showing 23 changed files with 194 additions and 287 deletions.
7 changes: 3 additions & 4 deletions docs/python_docs/python/api/gluon/index.rst
@@ -33,10 +33,9 @@ one input layer, one hidden layer, and one output layer.
# When instantiated, Sequential stores a chain of neural network layers.
# Once presented with data, Sequential executes each layer in turn, using
# the output of one layer as the input for the next
with net.name_scope():
net.add(gluon.nn.Dense(256, activation="relu")) # 1st layer (256 nodes)
net.add(gluon.nn.Dense(256, activation="relu")) # 2nd hidden layer
net.add(gluon.nn.Dense(num_outputs))
net.add(gluon.nn.Dense(256, activation="relu")) # 1st layer (256 nodes)
net.add(gluon.nn.Dense(256, activation="relu")) # 2nd hidden layer
net.add(gluon.nn.Dense(num_outputs))
.. automodule:: mxnet.gluon
28 changes: 12 additions & 16 deletions docs/python_docs/python/tutorials/extend/custom_layer.md
@@ -111,10 +111,9 @@ Below is an example of how to create a simple neural network with a custom layer

```python
net = gluon.nn.HybridSequential() # Define a Neural Network as a sequence of hybrid blocks
with net.name_scope(): # Used to disambiguate saving and loading net parameters
net.add(Dense(5)) # Add Dense layer with 5 neurons
net.add(NormalizationHybridLayer()) # Add our custom layer
net.add(Dense(1)) # Add Dense layer with 1 neuron
net.add(Dense(5)) # Add Dense layer with 5 neurons
net.add(NormalizationHybridLayer()) # Add our custom layer
net.add(Dense(1)) # Add Dense layer with 1 neuron


net.initialize(mx.init.Xavier(magnitude=2.24)) # Initialize parameters of all layers
@@ -148,12 +147,11 @@ class NormalizationHybridLayer(gluon.HybridBlock):
def __init__(self, hidden_units, scales):
super(NormalizationHybridLayer, self).__init__()

with self.name_scope():
self.weights = self.params.get('weights',
shape=(hidden_units, 0),
allow_deferred_init=True)
self.weights = gluon.Parameter('weights',
shape=(hidden_units, 0),
allow_deferred_init=True)

self.scales = self.params.get('scales',
self.scales = gluon.Parameter('scales',
shape=scales.shape,
init=mx.init.Constant(scales.asnumpy().tolist()), # Convert to regular list to make this object serializable
differentiable=False)
@@ -170,14 +168,13 @@ In the example above 2 sets of parameters are defined:
1. Parameter `scale` is a constant that doesn't change. Its shape is defined during construction.

Notice a few aspects of this code:
* `name_scope()` method is used to add a prefix to parameter names during saving and loading
* Shape is not provided when creating `weights`. Instead, it is going to be inferred from the shape of the input
* `Scales` parameter is initialized and marked as `differentiable=False`.
* `F` backend is used for all calculations
* The dot product is calculated with the `F.FullyConnected()` method instead of `F.dot()`. The former was chosen because it supports automatic shape inference for its inputs while the latter doesn't, which matters if one doesn't want to hard-code all the shapes. At the moment, the best way to learn which operators support automatic inference of input shapes is to browse the C++ implementation of an operator and check whether it calls `SHAPE_ASSIGN_CHECK(*in_shape, fullc::kWeight, Shape2(param.num_hidden, num_input));`
* `hybrid_forward()` method signature has changed. It accepts two new arguments: `weights` and `scales`.

The last peculiarity comes from `HybridBlock`'s support for both imperative and symbolic programming. During the training phase, parameters are passed to the layer by the Apache MXNet framework as additional arguments to the method, because they might need to be converted to a `Symbol` depending on whether the layer was hybridized. One shouldn't use `self.weights` and `self.scales` or `self.params.get` in `hybrid_forward` except to get the shapes of parameters.
The last peculiarity comes from `HybridBlock`'s support for both imperative and symbolic programming. During the training phase, parameters are passed to the layer by the Apache MXNet framework as additional arguments to the method, because they might need to be converted to a `Symbol` depending on whether the layer was hybridized. One shouldn't use `self.weights` and `self.scales` in `hybrid_forward` except to get the shapes of parameters.
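
To make the mechanics concrete, here is a minimal sketch (not part of the tutorial; `TinyDense` and its names are illustrative only) that mirrors the pattern above: the parameter is declared in `__init__` and arrives in `hybrid_forward` as an extra argument.

```python
import mxnet as mx
from mxnet import gluon


class TinyDense(gluon.HybridBlock):
    def __init__(self, hidden_units):
        super(TinyDense, self).__init__()
        self._hidden_units = hidden_units
        # input dimension is left as 0 and inferred on the first forward pass
        self.weights = gluon.Parameter('weights',
                                       shape=(hidden_units, 0),
                                       allow_deferred_init=True)

    def hybrid_forward(self, F, x, weights):
        # `weights` is injected by the framework: an NDArray when running
        # imperatively, a Symbol after hybridize(); don't call self.weights.data() here
        return F.FullyConnected(x, weights, num_hidden=self._hidden_units, no_bias=True)
```

With this sketch, `net = TinyDense(4); net.initialize(); net(mx.nd.ones((2, 8)))` would shape the weight as `(4, 8)` on the first pass.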

Running a forward pass on this network is very similar to the previous example, so instead of doing just one forward pass, let's run the whole training for a few epochs to show that the `scales` parameter doesn't change during training while the `weights` parameter does.

@@ -194,11 +191,10 @@ def print_params(title, net):
print('{} = {}\n'.format(key, value.data()))

net = gluon.nn.HybridSequential() # Define a Neural Network as a sequence of hybrid blocks
with net.name_scope(): # Used to disambiguate saving and loading net parameters
net.add(Dense(5)) # Add Dense layer with 5 neurons
net.add(NormalizationHybridLayer(hidden_units=5,
scales = nd.array([2]))) # Add our custom layer
net.add(Dense(1)) # Add Dense layer with 1 neuron
net.add(Dense(5)) # Add Dense layer with 5 neurons
net.add(NormalizationHybridLayer(hidden_units=5,
scales = nd.array([2]))) # Add our custom layer
net.add(Dense(1)) # Add Dense layer with 1 neuron


net.initialize(mx.init.Xavier(magnitude=2.24)) # Initialize parameters of all layers
2 changes: 1 addition & 1 deletion docs/python_docs/python/tutorials/extend/customop.md
@@ -197,7 +197,7 @@ class DenseBlock(mx.gluon.Block):
def __init__(self, in_channels, channels, bias, **kwargs):
super(DenseBlock, self).__init__(**kwargs)
self._bias = bias
self.weight = self.params.get('weight', shape=(channels, in_channels))
self.weight = gluon.Parameter('weight', shape=(channels, in_channels))

def forward(self, x):
ctx = x.context
@@ -82,7 +82,7 @@ net.add(nn.Conv2D(channels=6, kernel_size=5, activation='relu'),
nn.Dense(10))
```

And then load the saved parameters into GPU 0 directly, or use `net.collect_params().reset_ctx` to change the device.
And then load the saved parameters into GPU 0 directly, or use `net.reset_ctx` to change the device.
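
If one takes the `reset_ctx` route instead, a minimal sketch (an assumption, not part of this diff) could look like this: load on the default CPU context first, then move all parameters over.

```python
# Load on the default (CPU) context, then move every parameter to GPU 0.
net.load_parameters('net.params')
net.reset_ctx(gpu(0))
```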

```{.python .input n=20}
net.load_parameters('net.params', ctx=gpu(0))
@@ -120,7 +120,7 @@ The training loop is quite similar to what we introduced before. The major diffe
# Diff 1: Use two GPUs for training.
devices = [gpu(0), gpu(1)]
# Diff 2: reinitialize the parameters and place them on multiple GPUs
net.collect_params().initialize(force_reinit=True, ctx=devices)
net.initialize(force_reinit=True, ctx=devices)
# Loss and trainer are the same as before
softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
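# Sketch of the loop body (an assumption based on the surrounding tutorial,
# not part of this diff): each batch is split across the two devices, e.g.
#   X_parts = gluon.utils.split_and_load(X, devices)
#   y_parts = gluon.utils.split_and_load(y, devices)
# so that the forward/backward pass runs on every GPU in parallel.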
@@ -170,8 +170,7 @@ Before we go to training, one unique Gluon feature you should be aware of is hyb
finetune_net = resnet50_v2(pretrained=True, ctx=ctx)

# change the last softmax layer since the number of classes is different
with finetune_net.name_scope():
finetune_net.output = nn.Dense(classes)
finetune_net.output = nn.Dense(classes)
finetune_net.output.initialize(init.Xavier(), ctx=ctx)
# hybridize for better performance
finetune_net.hybridize()
@@ -80,11 +80,10 @@ Below, we define a model which has an input layer of 10 neurons, a couple of inn
```python
net = nn.HybridSequential()

with net.name_scope():
net.add(nn.Dense(units=10, activation='relu')) # input layer
net.add(nn.Dense(units=10, activation='relu')) # inner layer 1
net.add(nn.Dense(units=10, activation='relu')) # inner layer 2
net.add(nn.Dense(units=1)) # output layer: notice, it must have only 1 neuron
net.add(nn.Dense(units=10, activation='relu')) # input layer
net.add(nn.Dense(units=10, activation='relu')) # inner layer 1
net.add(nn.Dense(units=10, activation='relu')) # inner layer 2
net.add(nn.Dense(units=1)) # output layer: notice, it must have only 1 neuron

net.initialize(mx.init.Xavier())
```
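
As a quick sanity check (an illustration, not part of the tutorial), the model above can be hybridized and run on a dummy batch; the input width is inferred on the first call:

```python
net.hybridize()                              # compile the graph for speed
x = mx.nd.random.uniform(shape=(4, 10))      # a dummy batch of 4 samples
print(net(x).shape)                          # (4, 1): one output per sample
```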
@@ -342,13 +342,12 @@ Apache MXNet uses lazy evaluation to achieve superior performance. The Python th

## PyTorch module and Gluon blocks

### For new block definition, gluon needs name_scope
### For new block definition, gluon is similar to PyTorch

`name_scope` coerces Gluon to give each parameter an appropriate name, indicating which model it belongs to.

| Function | PyTorch | MXNet Gluon |
|------------------------|-----------------------------------|----------------------------------------------------------------------------|
| New block definition | `class Net(torch.nn.Module):`<br/>&nbsp;&nbsp;&nbsp;&nbsp;`def __init__(self, D_in, D_out):`<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`super(Net, self).__init__()`<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`self.linear = torch.nn.Linear(D_in, D_out)`<br/>&nbsp;&nbsp;&nbsp;&nbsp;`def forward(self, x):`<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`return self.linear(x)` | `class Net(mx.gluon.Block):`<br/>&nbsp;&nbsp;&nbsp;&nbsp;`def __init__(self, D_in, D_out):`<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`super(Net, self).__init__()`<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`with self.name_scope():`<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`self.dense=mx.gluon.nn.Dense(D_out, in_units=D_in)`<br/>&nbsp;&nbsp;&nbsp;&nbsp;`def forward(self, x):`<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`return self.dense(x)` |
| New block definition | `class Net(torch.nn.Module):`<br/>&nbsp;&nbsp;&nbsp;&nbsp;`def __init__(self, D_in, D_out):`<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`super(Net, self).__init__()`<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`self.linear = torch.nn.Linear(D_in, D_out)`<br/>&nbsp;&nbsp;&nbsp;&nbsp;`def forward(self, x):`<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`return self.linear(x)` | `class Net(mx.gluon.Block):`<br/>&nbsp;&nbsp;&nbsp;&nbsp;`def __init__(self, D_in, D_out):`<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`super(Net, self).__init__()`<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`self.dense=mx.gluon.nn.Dense(D_out, in_units=D_in)`<br/>&nbsp;&nbsp;&nbsp;&nbsp;`def forward(self, x):`<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`return self.dense(x)` |
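
Written out in full, the Gluon column of the table corresponds to something like the following sketch (illustrative, using the new Block API adopted throughout this commit):

```python
import mxnet as mx


class Net(mx.gluon.Block):
    def __init__(self, D_in, D_out):
        super(Net, self).__init__()
        # no name_scope() is needed any more: assigning the child block to an
        # attribute registers it (and its parameters) automatically
        self.dense = mx.gluon.nn.Dense(D_out, in_units=D_in)

    def forward(self, x):
        return self.dense(x)
```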

### Parameter and Initializer

@@ -374,15 +373,15 @@ Instead of explicitly declaring the number of inputs to a layer, we can simply s

| Function | PyTorch | MXNet Gluon |
|------------------------|-----------------------------------|----------------------------------------------------------------------------|
| partial-shape <br/> hybridized | Not Available | `net = mx.gluon.nn.HybridSequential()`<br/>`with net.name_scope():`<br/>&nbsp;&nbsp;&nbsp;&nbsp;`net.add(mx.gluon.nn.Dense(10))`<br/>`net.hybridize()` |
| partial-shape <br/> hybridized | Not Available | `net = mx.gluon.nn.HybridSequential()`<br/>`net.add(mx.gluon.nn.Dense(10))`<br/>`net.hybridize()` |
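
A short sketch (illustrative) of the partial-shape behaviour the table refers to: the input width is unknown until the first batch arrives, and the weights are shaped then.

```python
import mxnet as mx

net = mx.gluon.nn.HybridSequential()
net.add(mx.gluon.nn.Dense(10))        # in_units not given: shape is deferred
net.hybridize()
net.initialize()

x = mx.nd.ones((2, 20))               # the input width (20) is only known here
print(net(x).shape)                   # (2, 10); weights get shaped on first call
```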

### SymbolBlock

SymbolBlock can construct a block from a symbol. This is useful for using pre-trained models as feature extractors.

| Function | PyTorch | MXNet Gluon |
|------------------------|-----------------------------------|----------------------------------------------------------------------------|
| SymbolBlock | Not Available | `alexnet = mx.gluon.model_zoo.vision.alexnet(pretrained=True, prefix='model_')`<br/>`out = alexnet(inputs)`<br/>`internals = out.get_internals()`<br/>`outputs = [internals['model_dense0_relu_fwd_output']]`<br/>`feat_model = gluon.SymbolBlock(outputs, inputs, params=alexnet.collect_params())` |
| SymbolBlock | Not Available | `alexnet = mx.gluon.model_zoo.vision.alexnet(pretrained=True)`<br/>`out = alexnet(inputs)`<br/>`internals = out.get_internals()`<br/>`outputs = [internals['model_dense0_relu_fwd_output']]`<br/>`feat_model = gluon.SymbolBlock(outputs, inputs, params=alexnet.collect_params())` |
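
Expanded into a runnable sketch (the intermediate output name below is illustrative only; actual internal names depend on the model and version, so list them first):

```python
import mxnet as mx
from mxnet import gluon

alexnet = mx.gluon.model_zoo.vision.alexnet(pretrained=True)
inputs = mx.sym.var('data')
out = alexnet(inputs)
internals = out.get_internals()
print(internals.list_outputs())        # inspect the available internal outputs

# pick an intermediate output by name (the exact name varies between versions)
outputs = [internals['alexnet0_dense0_relu_fwd_output']]
feat_model = gluon.SymbolBlock(outputs, inputs, params=alexnet.collect_params())
features = feat_model(mx.nd.ones((1, 3, 224, 224)))
```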

## PyTorch optimizer vs Gluon Trainer
### For Gluon API calling zero_grad is not necessary most of the time
@@ -91,8 +91,8 @@ class MyDense(nn.Block):
# in_units: the number of inputs in this layer
super(MyDense, self).__init__(**kwargs)
self.weight = self.params.get('weight', shape=(in_units, units))
self.bias = self.params.get('bias', shape=(units,))
self.weight = gluon.Parameter('weight', shape=(in_units, units))
self.bias = gluon.Parameter('bias', shape=(units,))
def forward(self, x):
linear = nd.dot(x, self.weight.data()) + self.bias.data()
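# Usage sketch (illustrative, not part of this diff): parameters are created by
# assigning gluon.Parameter objects to attributes, then initialized as usual:
#   layer = MyDense(units=3, in_units=5)
#   layer.initialize()
#   layer(nd.random.uniform(shape=(2, 5)))   # -> output of shape (2, 3)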
@@ -102,10 +102,9 @@ Below is an example of how to create a simple neural network with a custom layer

```python
net = gluon.nn.HybridSequential() # Define a Neural Network as a sequence of hybrid blocks
with net.name_scope(): # Used to disambiguate saving and loading net parameters
net.add(Dense(5)) # Add Dense layer with 5 neurons
net.add(NormalizationHybridLayer()) # Add our custom layer
net.add(Dense(1)) # Add Dense layer with 1 neuron
net.add(Dense(5)) # Add Dense layer with 5 neurons
net.add(NormalizationHybridLayer()) # Add our custom layer
net.add(Dense(1)) # Add Dense layer with 1 neuron


net.initialize(mx.init.Xavier(magnitude=2.24)) # Initialize parameters of all layers
@@ -134,12 +133,11 @@ class NormalizationHybridLayer(gluon.HybridBlock):
def __init__(self, hidden_units, scales):
super(NormalizationHybridLayer, self).__init__()

with self.name_scope():
self.weights = self.params.get('weights',
shape=(hidden_units, 0),
allow_deferred_init=True)
self.weights = gluon.Parameter('weights',
shape=(hidden_units, 0),
allow_deferred_init=True)

self.scales = self.params.get('scales',
self.scales = gluon.Parameter('scales',
shape=scales.shape,
init=mx.init.Constant(scales.asnumpy()),
differentiable=False)
@@ -157,14 +155,13 @@ In the example above 2 sets of parameters are defined:

Notice a few aspects of this code:

+ `name_scope()` method is used to add a prefix to parameter names during saving and loading
+ Shape is not provided when creating `weights`. Instead, it is going to be inferred from the shape of the input
+ `Scales` parameter is initialized and marked as `differentiable=False`.
+ `F` backend is used for all calculations
+ The dot product is calculated with the `F.FullyConnected()` method instead of `F.dot()`. The former was chosen because it supports automatic shape inference for its inputs while the latter doesn't, which matters if one doesn't want to hard-code all the shapes. At the moment, the best way to learn which operators support automatic inference of input shapes is to browse the C++ implementation of an operator and check whether it calls `SHAPE_ASSIGN_CHECK(*in_shape, fullc::kWeight, Shape2(param.num_hidden, num_input));`
+ `hybrid_forward()` method signature has changed. It accepts two new arguments: `weights` and `scales`.

The last peculiarity comes from `HybridBlock`'s support for both imperative and symbolic programming. During the training phase, parameters are passed to the layer by the Apache MXNet framework as additional arguments to the method, because they might need to be converted to a `Symbol` depending on whether the layer was hybridized. One shouldn't use `self.weights` and `self.scales` or `self.params.get` in `hybrid_forward` except to get the shapes of parameters.
The last peculiarity comes from `HybridBlock`'s support for both imperative and symbolic programming. During the training phase, parameters are passed to the layer by the Apache MXNet framework as additional arguments to the method, because they might need to be converted to a `Symbol` depending on whether the layer was hybridized. One shouldn't use `self.weights` and `self.scales` in `hybrid_forward` except to get the shapes of parameters.

Running a forward pass on this network is very similar to the previous example, so instead of doing just one forward pass, let's run the whole training for a few epochs to show that the `scales` parameter doesn't change during training while the `weights` parameter does.

@@ -180,11 +177,10 @@ def print_params(title, net):
print('{} = {}\n'.format(key, value.data()))

net = gluon.nn.HybridSequential() # Define a Neural Network as a sequence of hybrid blocks
with net.name_scope(): # Used to disambiguate saving and loading net parameters
net.add(Dense(5)) # Add Dense layer with 5 neurons
net.add(NormalizationHybridLayer(hidden_units=5,
scales = nd.array([2]))) # Add our custom layer
net.add(Dense(1)) # Add Dense layer with 1 neuron
net.add(Dense(5)) # Add Dense layer with 5 neurons
net.add(NormalizationHybridLayer(hidden_units=5,
scales = nd.array([2]))) # Add our custom layer
net.add(Dense(1)) # Add Dense layer with 1 neuron


net.initialize(mx.init.Xavier(magnitude=2.24)) # Initialize parameters of all layers