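The cross-layer connection design in ResNet has inspired several follow-up works. In this section we introduce one of them: the densely connected network (DenseNet) [1]. Its main difference from ResNet is shown in Figure 5.10.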
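To make the contrast concrete, the following minimal sketch uses NumPy arrays in place of real layer outputs. The arrays `x`, `f_x`, and `g_x` are stand-ins introduced here for illustration and are not part of the original code. A ResNet-style connection adds the branch output to the input, so the channel count stays the same; a DenseNet-style connection concatenates the two along the channel axis, so channels accumulate.

```python
import numpy as np

# Stand-in for a feature map in NHWC layout: batch 1, 4x4 spatial, 3 channels
x = np.ones((1, 4, 4, 3))

# ResNet-style: the branch output must match the input shape so they can be added
f_x = np.full((1, 4, 4, 3), 2.0)
res_out = x + f_x

# DenseNet-style: the branch output may have any channel count (here 10)
g_x = np.full((1, 4, 4, 10), 2.0)
dense_out = np.concatenate([x, g_x], axis=-1)

print(res_out.shape)    # channel count unchanged
print(dense_out.shape)  # channel count is 3 + 10
```

Because concatenation keeps the input untouched in the first channels of the output, every later layer can still read the original features directly.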
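The main building blocks of DenseNet are the dense block and the transition layer. The former defines how inputs and outputs are concatenated, while the latter controls the number of channels so that it does not grow too large.
DenseNet uses the "batch normalization, activation, and convolution" structure from the improved version of ResNet. We first implement this structure in the `BottleNeck` class. In the forward computation, we concatenate the input and output of each block along the channel dimension.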
import tensorflow as tf

class BottleNeck(tf.keras.layers.Layer):
    def __init__(self, growth_rate, drop_rate):
        super(BottleNeck, self).__init__()
        self.bn1 = tf.keras.layers.BatchNormalization()
        # 1x1 convolution that maps the input to 4 * growth_rate channels
        self.conv1 = tf.keras.layers.Conv2D(filters=4 * growth_rate,
                                            kernel_size=(1, 1),
                                            strides=1,
                                            padding="same")
        self.bn2 = tf.keras.layers.BatchNormalization()
        # 3x3 convolution that produces growth_rate new channels
        self.conv2 = tf.keras.layers.Conv2D(filters=growth_rate,
                                            kernel_size=(3, 3),
                                            strides=1,
                                            padding="same")
        self.dropout = tf.keras.layers.Dropout(rate=drop_rate)
        # BN -> ReLU -> conv, applied twice, followed by dropout
        self.listLayers = [self.bn1,
                           tf.keras.layers.Activation("relu"),
                           self.conv1,
                           self.bn2,
                           tf.keras.layers.Activation("relu"),
                           self.conv2,
                           self.dropout]

    def call(self, x):
        y = x
        # Iterate over the list itself rather than relying on a `.layers` attribute
        for layer in self.listLayers:
            y = layer(y)
        # Concatenate input and output along the channel (last) axis
        y = tf.keras.layers.concatenate([x, y], axis=-1)
        return y
A dense block consists of multiple `BottleNeck` layers, each producing the same number of output channels.
class DenseBlock(tf.keras.layers.Layer):
    def __init__(self, num_layers, growth_rate, drop_rate=0.5):
        super(DenseBlock, self).__init__()
        self.num_layers = num_layers
        self.growth_rate = growth_rate
        self.drop_rate = drop_rate
        # Stack num_layers bottleneck layers; each adds growth_rate channels
        self.listLayers = []
        for _ in range(num_layers):
            self.listLayers.append(BottleNeck(growth_rate=self.growth_rate,
                                              drop_rate=self.drop_rate))

    def call(self, x):
        # Iterate over the list itself rather than relying on a `.layers` attribute
        for layer in self.listLayers:
            x = layer(x)
        return x
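In the following example, we define a dense block containing 2 convolution blocks, each with 10 output channels. Given an input with 3 channels, we get an output with $3+2\times 10=23$ channels. The number of channels of each convolution block controls how fast the output channel count grows relative to the input channel count, so it is also called the growth rate.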
blk = DenseBlock(2, 10)
X = tf.random.uniform((4, 8, 8, 3))
Y = blk(X)
print(Y.shape)  # (4, 8, 8, 23)
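The channel arithmetic above can be checked without building any layers. The sketch below uses a hypothetical helper, `dense_block_channels` (a name introduced here, not part of the original code), to list the input and output channel counts of each layer in a dense block, assuming every layer concatenates `growth_rate` new channels onto its input.

```python
def dense_block_channels(in_channels, num_layers, growth_rate):
    """Return the (input, output) channel counts of each layer in a dense block."""
    trace = []
    channels = in_channels
    for _ in range(num_layers):
        # Each layer emits growth_rate new channels, concatenated onto its input
        trace.append((channels, channels + growth_rate))
        channels += growth_rate
    return trace

trace = dense_block_channels(in_channels=3, num_layers=2, growth_rate=10)
print(trace)         # [(3, 13), (13, 23)]
print(trace[-1][1])  # 23
```

The second layer already sees 13 input channels, which is why the cost of later layers in a long dense block keeps rising.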
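Since each dense block increases the number of channels, using too many of them leads to an overly complex model. The transition layer is used to control model complexity. It reduces the number of channels with a $1\times1$ convolutional layer and halves the height and width with an average pooling layer of stride 2, which further reduces model complexity.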
class TransitionLayer(tf.keras.layers.Layer):
    def __init__(self, out_channels):
        super(TransitionLayer, self).__init__()
        self.bn = tf.keras.layers.BatchNormalization()
        # 1x1 convolution that reduces the number of channels
        self.conv = tf.keras.layers.Conv2D(filters=out_channels,
                                           kernel_size=(1, 1),
                                           strides=1,
                                           padding="same")
        # Average pooling with stride 2, as described in the text
        # (the code previously used MaxPool2D, which contradicted the description)
        self.pool = tf.keras.layers.AveragePooling2D(pool_size=(2, 2),
                                                     strides=2,
                                                     padding="same")

    def call(self, inputs):
        x = self.bn(inputs)
        x = tf.keras.activations.relu(x)
        x = self.conv(x)
        x = self.pool(x)
        return x
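We apply a transition layer with 10 channels to the output of the dense block in the previous example. The number of output channels is reduced to 10, and the height and width are both halved.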
blk = TransitionLayer(10)
print(blk(Y).shape)  # (4, 4, 4, 10)
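What the transition layer does to the tensor shape can be illustrated with plain NumPy. This is a simplified sketch under stated assumptions, not the layer itself: it omits batch normalization, ReLU, and the convolution bias, and the kernel `w` holds random, hypothetical weights. It shows that a $1\times1$ convolution is a matrix product over the channel axis and that stride-2 average pooling over $2\times2$ windows halves the height and width.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.random((4, 8, 8, 23))  # same shape as the dense block output above
w = rng.random((23, 10))       # hypothetical 1x1 kernel: 23 -> 10 channels

# A 1x1 convolution is a matrix product applied independently at every pixel
z = y @ w                      # shape (4, 8, 8, 10)

# 2x2 average pooling with stride 2: group pixels into 2x2 windows and average
n, h, wd, c = z.shape
pooled = z.reshape(n, h // 2, 2, wd // 2, 2, c).mean(axis=(2, 4))

print(pooled.shape)
```

The reshape trick only works when the height and width are even; a real pooling layer also handles odd sizes through padding.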
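We now construct the DenseNet model. DenseNet first uses the same single convolutional layer and max pooling layer as ResNet. Analogous to the 4 residual blocks that ResNet uses next, DenseNet uses 4 dense blocks. As with ResNet, we can set how many convolutional layers each dense block uses. Here we set it to 4, consistent with the ResNet-18 of the previous section. The number of channels of the convolutional layers in a dense block (that is, the growth rate) is set to 32, so each dense block adds 128 channels.
ResNet reduces the height and width between modules through residual blocks with stride 2. Here we instead use transition layers to halve the height and width and to halve the number of channels.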
class DenseNet(tf.keras.Model):
    def __init__(self, num_init_features, growth_rate, block_layers, compression_rate, drop_rate):
        super(DenseNet, self).__init__()
        # Stem: 7x7 convolution with stride 2, then 3x3 max pooling with stride 2
        self.conv = tf.keras.layers.Conv2D(filters=num_init_features,
                                           kernel_size=(7, 7),
                                           strides=2,
                                           padding="same")
        self.bn = tf.keras.layers.BatchNormalization()
        self.pool = tf.keras.layers.MaxPool2D(pool_size=(3, 3),
                                              strides=2,
                                              padding="same")
        # Track the running channel count to size each transition layer
        self.num_channels = num_init_features
        self.dense_block_1 = DenseBlock(num_layers=block_layers[0], growth_rate=growth_rate, drop_rate=drop_rate)
        self.num_channels += growth_rate * block_layers[0]
        self.num_channels = compression_rate * self.num_channels
        self.transition_1 = TransitionLayer(out_channels=int(self.num_channels))
        self.dense_block_2 = DenseBlock(num_layers=block_layers[1], growth_rate=growth_rate, drop_rate=drop_rate)
        self.num_channels += growth_rate * block_layers[1]
        self.num_channels = compression_rate * self.num_channels
        self.transition_2 = TransitionLayer(out_channels=int(self.num_channels))
        self.dense_block_3 = DenseBlock(num_layers=block_layers[2], growth_rate=growth_rate, drop_rate=drop_rate)
        self.num_channels += growth_rate * block_layers[2]
        self.num_channels = compression_rate * self.num_channels
        self.transition_3 = TransitionLayer(out_channels=int(self.num_channels))
        # The last dense block is followed by global pooling, not a transition layer
        self.dense_block_4 = DenseBlock(num_layers=block_layers[3], growth_rate=growth_rate, drop_rate=drop_rate)
        self.avgpool = tf.keras.layers.GlobalAveragePooling2D()
        # 10 output classes (Fashion-MNIST)
        self.fc = tf.keras.layers.Dense(units=10,
                                        activation=tf.keras.activations.softmax)

    def call(self, inputs):
        x = self.conv(inputs)
        x = self.bn(x)
        x = tf.keras.activations.relu(x)
        x = self.pool(x)
        x = self.dense_block_1(x)
        x = self.transition_1(x)
        x = self.dense_block_2(x)
        x = self.transition_2(x)
        x = self.dense_block_3(x)
        x = self.transition_3(x)
        x = self.dense_block_4(x)
        x = self.avgpool(x)
        x = self.fc(x)
        return x

def densenet():
    return DenseNet(num_init_features=64, growth_rate=32, block_layers=[4, 4, 4, 4],
                    compression_rate=0.5, drop_rate=0.5)

mynet = densenet()
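We print the output shape of each submodule to check that the network is wired correctly:
X = tf.random.uniform(shape=(1, 96, 96, 1))
# Walk through the sublayers in creation order; the ReLU in `call` is skipped,
# which does not affect the shapes
for layer in mynet.layers:
    X = layer(X)
    print(layer.name, 'output shape:\t', X.shape)
Output: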
conv2d_45 output shape: (1, 48, 48, 64)
batch_normalization_45 output shape: (1, 48, 48, 64)
max_pooling2d_5 output shape: (1, 24, 24, 64)
dense_block_6 output shape: (1, 24, 24, 192)
transition_layer_4 output shape: (1, 12, 12, 96)
dense_block_7 output shape: (1, 12, 12, 224)
transition_layer_5 output shape: (1, 6, 6, 112)
dense_block_8 output shape: (1, 6, 6, 240)
transition_layer_6 output shape: (1, 3, 3, 120)
dense_block_9 output shape: (1, 3, 3, 248)
global_average_pooling2d_1 output shape: (1, 248)
dense output shape: (1, 10)
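The shapes printed above can be reproduced by hand. The sketch below uses a hypothetical helper, `trace_densenet` (a name introduced here, not part of the original code), and assumes that every stride-2 layer with "same" padding maps a side length $n$ to $\lceil n/2 \rceil$, that each dense block adds `growth_rate * num_layers` channels, and that each transition layer multiplies the channel count by `compression_rate`.

```python
import math

def trace_densenet(size, num_init_features, growth_rate, block_layers, compression_rate):
    """Return a list of (stage name, height/width, channels) tuples."""
    stages = []
    # Stem: stride-2 convolution, then stride-2 max pooling ("same" padding)
    size = math.ceil(size / 2)
    channels = num_init_features
    stages.append(("conv", size, channels))
    size = math.ceil(size / 2)
    stages.append(("pool", size, channels))
    for i, num_layers in enumerate(block_layers):
        # A dense block keeps the spatial size and adds channels
        channels += growth_rate * num_layers
        stages.append((f"dense_block_{i + 1}", size, channels))
        if i < len(block_layers) - 1:
            # A transition layer compresses channels and halves height and width
            channels = int(compression_rate * channels)
            size = math.ceil(size / 2)
            stages.append((f"transition_{i + 1}", size, channels))
    return stages

stages = trace_densenet(96, 64, 32, [4, 4, 4, 4], 0.5)
for name, size, channels in stages:
    print(f"{name}: {size}x{size}x{channels}")
```

The final stage is $3\times3$ with 248 channels, which global average pooling reduces to a 248-dimensional vector before the dense layer.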
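Because we use a relatively deep network here, in this section we reduce the input height and width from 224 to 96 to simplify computation. Note that the training code below does not resize the data: it feeds the 28×28 Fashion-MNIST images to the network directly.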
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
# Add a channel axis and scale pixel values to [0, 1]
x_train = x_train.reshape((60000, 28, 28, 1)).astype('float32') / 255
x_test = x_test.reshape((10000, 28, 28, 1)).astype('float32') / 255
mynet.compile(loss='sparse_categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])
history = mynet.fit(x_train, y_train,
                    batch_size=64,
                    epochs=5,
                    validation_split=0.2)
test_scores = mynet.evaluate(x_test, y_test, verbose=2)
Output:
Train on 48000 samples, validate on 12000 samples
Epoch 1/5
48000/48000 [==============================] - 244s 5ms/sample - loss: 0.6859 - accuracy: 0.7465 - val_loss: 0.4778 - val_accuracy: 0.8270
Epoch 2/5
48000/48000 [==============================] - 267s 6ms/sample - loss: 0.3933 - accuracy: 0.8541 - val_loss: 0.3478 - val_accuracy: 0.8716
Epoch 3/5
48000/48000 [==============================] - 263s 5ms/sample - loss: 0.3312 - accuracy: 0.8783 - val_loss: 0.3403 - val_accuracy: 0.8720
Epoch 4/5
48000/48000 [==============================] - 240s 5ms/sample - loss: 0.3013 - accuracy: 0.8888 - val_loss: 0.3079 - val_accuracy: 0.8842
Epoch 5/5
48000/48000 [==============================] - 241s 5ms/sample - loss: 0.2783 - accuracy: 0.8974 - val_loss: 0.2962 - val_accuracy: 0.8913
10000/1 - 11s - loss: 0.2877 - accuracy: 0.8848
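If one wanted to train on the 96×96 inputs used in the shape check, the 28×28 Fashion-MNIST images would need to be upsampled first. The sketch below is an assumption for illustration, not the author's pipeline: `resize_nearest` is a hypothetical helper that performs nearest-neighbour resizing in NumPy, and the random `batch` stands in for real images. In a TensorFlow input pipeline a library resize operation would be the more natural choice.

```python
import numpy as np

def resize_nearest(images, new_size):
    """Nearest-neighbour resize of an NHWC batch to new_size x new_size."""
    _, h, w, _ = images.shape
    # For every output row/column, pick the source index it maps back to
    rows = np.arange(new_size) * h // new_size
    cols = np.arange(new_size) * w // new_size
    return images[:, rows][:, :, cols]

# Random stand-in for a batch of two 28x28 grayscale images
batch = np.random.default_rng(0).random((2, 28, 28, 1)).astype("float32")
resized = resize_nearest(batch, 96)
print(resized.shape)
```

Upsampling every image by this factor makes each epoch considerably more expensive, so resizing batch by batch rather than the whole dataset at once keeps memory use manageable.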
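- For cross-layer connections, unlike ResNet, which adds the input to the output, DenseNet concatenates the input and output along the channel dimension.
- The main building blocks of DenseNet are the dense block and the transition layer.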
[1] Huang, G., Liu, Z., Weinberger, K. Q., & van der Maaten, L. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (Vol. 1, No. 2).
Note: apart from the code, this section is essentially the same as the corresponding section of the original book.