A collection of weights I've trained comparing various SE-like channel attention blocks (SE, ECA, GC, etc.), self-attention blocks (bottleneck, halo, lambda), and related non-attn baselines.
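All of the variants below are registered as timm models, so (assuming the weights are published under the model names used in the tables that follow) loading one for inference should look roughly like this minimal sketch:

```python
import timm
import torch

# Hedged sketch: assumes the weights are available under the timm model names
# listed in the tables below (e.g. 'halonet26t', 'lambda_resnet26t', 'sehalonet33ts').
model = timm.create_model('halonet26t', pretrained=True)
model.eval()

with torch.no_grad():
    # most models below use 256x256 inputs (regnetz_b / haloregnetz_b use 224)
    logits = model(torch.randn(1, 3, 256, 256))
print(logits.shape)  # torch.Size([1, 1000])
```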
ResNet-26-T series
- [2, 2, 2, 2] repeat Bottleneck block ResNet architecture
- ReLU activations
- 3 layer stem with 24, 32, 64 chs, max-pool
- avg pool in shortcut downsample
- self-attn blocks replace the 3x3 conv in both blocks of the last stage, and in the second block of the penultimate stage
| model | top1 | top1_err | top5 | top5_err | param_count | img_size | crop_pct | interpolation |
|---|---|---|---|---|---|---|---|---|
| botnet26t_256 | 79.246 | 20.754 | 94.53 | 5.47 | 12.49 | 256 | 0.95 | bicubic |
| halonet26t | 79.13 | 20.87 | 94.314 | 5.686 | 12.48 | 256 | 0.95 | bicubic |
| lambda_resnet26t | 79.112 | 20.888 | 94.59 | 5.41 | 10.96 | 256 | 0.94 | bicubic |
| lambda_resnet26rpt_256 | 78.964 | 21.036 | 94.428 | 5.572 | 10.99 | 256 | 0.94 | bicubic |
| resnet26t | 77.872 | 22.128 | 93.834 | 6.166 | 16.01 | 256 | 0.94 | bicubic |
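The img_size, crop_pct, and interpolation columns describe the eval preprocessing behind these numbers. A minimal sketch of reproducing it with timm's data helpers (assuming the pretrained config carries the same values):

```python
import timm
from timm.data import resolve_data_config, create_transform

model = timm.create_model('lambda_resnet26t', pretrained=True)

# Pull the eval settings (input size, crop pct, interpolation, mean/std) from the
# model's pretrained config; these should match the table columns above.
config = resolve_data_config({}, model=model)
transform = create_transform(**config)
print(config['input_size'], config['crop_pct'], config['interpolation'])
```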
Details:
- HaloNet - 8 pixel block size, 2 pixel halo (overlap), relative position embedding (block/halo windowing sketched below)
- BotNet - relative position embedding
- Lambda-ResNet-26-T - 3d lambda conv, kernel = 9
- Lambda-ResNet-26-RPT - relative position embedding
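To make the halo parameters above concrete, here is a minimal single-head sketch of the blocked local attention a halo block uses: queries come from non-overlapping block_size x block_size windows, keys/values come from the same window expanded by halo_size pixels on each side. This is purely illustrative (no projections, heads, or relative position embedding, unlike the actual timm blocks):

```python
import torch
import torch.nn.functional as F

def halo_attention(x, block_size=8, halo_size=2):
    # x: (B, C, H, W), H and W assumed divisible by block_size
    B, C, H, W = x.shape
    scale = C ** -0.5
    nh, nw = H // block_size, W // block_size

    # queries: non-overlapping block_size x block_size blocks
    q = x.reshape(B, C, nh, block_size, nw, block_size)
    q = q.permute(0, 2, 4, 3, 5, 1).reshape(B, nh * nw, block_size * block_size, C)

    # keys/values: each block plus a halo_size border of neighbouring pixels
    win = block_size + 2 * halo_size
    kv = F.unfold(x, kernel_size=win, stride=block_size, padding=halo_size)
    kv = kv.reshape(B, C, win * win, nh * nw).permute(0, 3, 2, 1)  # (B, blocks, win*win, C)

    attn = (q @ kv.transpose(-2, -1)) * scale   # (B, blocks, bs*bs, win*win)
    out = attn.softmax(dim=-1) @ kv             # (B, blocks, bs*bs, C)

    # fold the blocks back into a (B, C, H, W) feature map
    out = out.reshape(B, nh, nw, block_size, block_size, C)
    return out.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)

print(halo_attention(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```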
Benchmark - RTX 3090 - AMP - NCHW - NGC 21.09
| model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
|---|---|---|---|---|---|---|---|---|---|
| resnet26t | 2967.55 | 86.252 | 256 | 256 | 857.62 | 297.984 | 256 | 256 | 16.01 |
| botnet26t_256 | 2642.08 | 96.879 | 256 | 256 | 809.41 | 315.706 | 256 | 256 | 12.49 |
| halonet26t | 2601.91 | 98.375 | 256 | 256 | 783.92 | 325.976 | 256 | 256 | 12.48 |
| lambda_resnet26t | 2354.1 | 108.732 | 256 | 256 | 697.28 | 366.521 | 256 | 256 | 10.96 |
| lambda_resnet26rpt_256 | 1847.34 | 138.563 | 256 | 256 | 644.84 | 197.892 | 128 | 256 | 10.99 |
Benchmark - RTX 3090 - AMP - NHWC - NGC 21.09
| model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
|---|---|---|---|---|---|---|---|---|---|
| resnet26t | 3691.94 | 69.327 | 256 | 256 | 1188.17 | 214.96 | 256 | 256 | 16.01 |
| botnet26t_256 | 3291.63 | 77.76 | 256 | 256 | 1126.68 | 226.653 | 256 | 256 | 12.49 |
| halonet26t | 3230.5 | 79.232 | 256 | 256 | 1077.82 | 236.934 | 256 | 256 | 12.48 |
| lambda_resnet26rpt_256 | 2324.15 | 110.133 | 256 | 256 | 864.42 | 147.485 | 128 | 256 | 10.99 |
| lambda_resnet26t | Not Supported | | | | | | | | |
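The NCHW vs NHWC tables above differ only in memory layout. A rough, illustrative sketch of how such an AMP inference throughput measurement can be taken (not the exact benchmark setup used for the numbers above):

```python
import time
import timm
import torch

def measure_infer_throughput(name, batch_size=256, img_size=256, channels_last=False, steps=20):
    # Illustrative only; warmup + timed loop under AMP, optionally in NHWC (channels-last) layout.
    model = timm.create_model(name).cuda().eval()
    x = torch.randn(batch_size, 3, img_size, img_size, device='cuda')
    if channels_last:
        model = model.to(memory_format=torch.channels_last)
        x = x.contiguous(memory_format=torch.channels_last)
    with torch.no_grad(), torch.cuda.amp.autocast():
        for _ in range(5):  # warmup
            model(x)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(steps):
            model(x)
        torch.cuda.synchronize()
    return steps * batch_size / (time.perf_counter() - start)

print(measure_infer_throughput('halonet26t', channels_last=True))
```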
ResNeXt-26-T series
- [2, 2, 2, 2] repeat Bottleneck block ResNeXt architecture
- SiLU activations
- grouped 3x3 convolutions in bottleneck, 32 channels per group
- 3 layer stem with 24, 32, 64 chs, max-pool
- avg pool in shortcut downsample
- channel attn (active in blocks without self-attn) between the 3x3 and the last 1x1 conv (see the sketch after this list)
- when active, self-attn blocks replace the 3x3 conv in both blocks of the last stage, and in the second block of the penultimate stage
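A minimal sketch of the kind of channel attention used above, in SE form (ECA and GC compute the per-channel gate differently but sit in the same position in the block); the rd_ratio value is just an illustrative assumption:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Minimal squeeze-and-excite channel attention, of the kind that sits
    between the grouped 3x3 conv and the final 1x1 conv in these blocks."""
    def __init__(self, channels, rd_ratio=0.25):
        super().__init__()
        rd_channels = max(1, int(channels * rd_ratio))
        self.fc1 = nn.Conv2d(channels, rd_channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)
        self.fc2 = nn.Conv2d(rd_channels, channels, kernel_size=1)

    def forward(self, x):
        scale = x.mean(dim=(2, 3), keepdim=True)      # squeeze: global average pool
        scale = self.fc2(self.act(self.fc1(scale)))   # excite: bottleneck MLP
        return x * scale.sigmoid()                    # gate the channels

x = torch.randn(2, 128, 32, 32)
print(SEBlock(128)(x).shape)  # torch.Size([2, 128, 32, 32])
```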
| model | top1 | top1_err | top5 | top5_err | param_count | img_size | crop_pct | interpolation |
|---|---|---|---|---|---|---|---|---|
| eca_halonext26ts | 79.484 | 20.516 | 94.600 | 5.400 | 10.76 | 256 | 0.94 | bicubic |
| eca_botnext26ts_256 | 79.270 | 20.730 | 94.594 | 5.406 | 10.59 | 256 | 0.95 | bicubic |
| bat_resnext26ts | 78.268 | 21.732 | 94.1 | 5.9 | 10.73 | 256 | 0.9 | bicubic |
| seresnext26ts | 77.852 | 22.148 | 93.784 | 6.216 | 10.39 | 256 | 0.9 | bicubic |
| gcresnext26ts | 77.804 | 22.196 | 93.824 | 6.176 | 10.48 | 256 | 0.9 | bicubic |
| eca_resnext26ts | 77.446 | 22.554 | 93.57 | 6.43 | 10.3 | 256 | 0.9 | bicubic |
| resnext26ts | 76.764 | 23.236 | 93.136 | 6.864 | 10.3 | 256 | 0.9 | bicubic |
Benchmark - RTX 3090 - AMP - NCHW - NGC 21.09
| model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
|---|---|---|---|---|---|---|---|---|---|
| resnext26ts | 3006.57 | 85.134 | 256 | 256 | 864.4 | 295.646 | 256 | 256 | 10.3 |
| seresnext26ts | 2931.27 | 87.321 | 256 | 256 | 836.92 | 305.193 | 256 | 256 | 10.39 |
| eca_resnext26ts | 2925.47 | 87.495 | 256 | 256 | 837.78 | 305.003 | 256 | 256 | 10.3 |
| gcresnext26ts | 2870.01 | 89.186 | 256 | 256 | 818.35 | 311.97 | 256 | 256 | 10.48 |
| eca_botnext26ts_256 | 2652.03 | 96.513 | 256 | 256 | 790.43 | 323.257 | 256 | 256 | 10.59 |
| eca_halonext26ts | 2593.03 | 98.705 | 256 | 256 | 766.07 | 333.541 | 256 | 256 | 10.76 |
| bat_resnext26ts | 2469.78 | 103.64 | 256 | 256 | 697.21 | 365.964 | 256 | 256 | 10.73 |
Benchmark - RTX 3090 - AMP - NHWC - NGC 21.09
NOTE: there are performance issues with certain grouped conv configs in channels-last layout; the backward pass in particular is very slow. This also affects RegNet and NFNet networks.
| model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
|---|---|---|---|---|---|---|---|---|---|
| resnext26ts | 3952.37 | 64.755 | 256 | 256 | 608.67 | 420.049 | 256 | 256 | 10.3 |
| eca_resnext26ts | 3815.77 | 67.074 | 256 | 256 | 594.35 | 430.146 | 256 | 256 | 10.3 |
| seresnext26ts | 3802.75 | 67.304 | 256 | 256 | 592.82 | 431.14 | 256 | 256 | 10.39 |
| gcresnext26ts | 3626.97 | 70.57 | 256 | 256 | 581.83 | 439.119 | 256 | 256 | 10.48 |
| eca_botnext26ts_256 | 3515.84 | 72.8 | 256 | 256 | 611.71 | 417.862 | 256 | 256 | 10.59 |
| eca_halonext26ts | 3410.12 | 75.057 | 256 | 256 | 597.52 | 427.789 | 256 | 256 | 10.76 |
| bat_resnext26ts | 3053.83 | 83.811 | 256 | 256 | 533.23 | 478.839 | 256 | 256 | 10.73 |
ResNet-33-T series
- [2, 3, 3, 2] repeat Bottleneck block ResNet architecture
- SiLU activations
- 3 layer stem with 24, 32, 64 chs, no max-pool, 1st and 3rd conv stride 2
- avg pool in shortcut downsample
- channel attn (active in blocks without self-attn) between the 3x3 and the last 1x1 conv
- when active, self-attn blocks replace the 3x3 conv in the last block of stages 2 and 3, and in both blocks of the final stage
- FC 1x1 conv between last block and classifier
The 33-layer models have an extra 1x1 FC layer between the last conv block and the classifier. There are both a non-attention 33-layer baseline (resnet33ts) and a 32-layer baseline without the extra FC (resnet32ts).
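The param_count gap between resnet33ts and resnet32ts in the table below comes from that extra pre-classifier FC. A quick sketch to check the counts (assuming both names are registered in timm):

```python
import timm

# Compare total parameter counts of the 33-layer (extra FC) and 32-layer baselines.
for name in ('resnet32ts', 'resnet33ts'):
    model = timm.create_model(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f'{name}: {n_params / 1e6:.2f}M params')
```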
| model | top1 | top1_err | top5 | top5_err | param_count | img_size | crop_pct | interpolation |
|---|---|---|---|---|---|---|---|---|
| sehalonet33ts | 80.986 | 19.014 | 95.272 | 4.728 | 13.69 | 256 | 0.94 | bicubic |
| seresnet33ts | 80.388 | 19.612 | 95.108 | 4.892 | 19.78 | 256 | 0.94 | bicubic |
| eca_resnet33ts | 80.132 | 19.868 | 95.054 | 4.946 | 19.68 | 256 | 0.94 | bicubic |
| gcresnet33ts | 79.99 | 20.01 | 94.988 | 5.012 | 19.88 | 256 | 0.94 | bicubic |
| resnet33ts | 79.352 | 20.648 | 94.596 | 5.404 | 19.68 | 256 | 0.94 | bicubic |
| resnet32ts | 79.028 | 20.972 | 94.444 | 5.556 | 17.96 | 256 | 0.94 | bicubic |
Benchmark - RTX 3090 - AMP - NCHW - NGC 21.09
| model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
|---|---|---|---|---|---|---|---|---|---|
| resnet32ts | 2502.96 | 102.266 | 256 | 256 | 733.27 | 348.507 | 256 | 256 | 17.96 |
| resnet33ts | 2473.92 | 103.466 | 256 | 256 | 725.34 | 352.309 | 256 | 256 | 19.68 |
| seresnet33ts | 2400.18 | 106.646 | 256 | 256 | 695.19 | 367.413 | 256 | 256 | 19.78 |
| eca_resnet33ts | 2394.77 | 106.886 | 256 | 256 | 696.93 | 366.637 | 256 | 256 | 19.68 |
| gcresnet33ts | 2342.81 | 109.257 | 256 | 256 | 678.22 | 376.404 | 256 | 256 | 19.88 |
| sehalonet33ts | 1857.65 | 137.794 | 256 | 256 | 577.34 | 442.545 | 256 | 256 | 13.69 |
Benchmark - RTX 3090 - AMP - NHWC - NGC 21.09
| model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
|---|---|---|---|---|---|---|---|---|---|
| resnet32ts | 3306.22 | 77.416 | 256 | 256 | 1012.82 | 252.158 | 256 | 256 | 17.96 |
| resnet33ts | 3257.59 | 78.573 | 256 | 256 | 1002.38 | 254.778 | 256 | 256 | 19.68 |
| seresnet33ts | 3128.08 | 81.826 | 256 | 256 | 950.27 | 268.581 | 256 | 256 | 19.78 |
| eca_resnet33ts | 3127.11 | 81.852 | 256 | 256 | 948.84 | 269.123 | 256 | 256 | 19.68 |
| gcresnet33ts | 2984.87 | 85.753 | 256 | 256 | 916.98 | 278.169 | 256 | 256 | 19.88 |
| sehalonet33ts | 2188.23 | 116.975 | 256 | 256 | 711.63 | 179.03 | 128 | 256 | 13.69 |
ResNet-50(ish) models
In Progress
RegNet"Z" series
- RegNetZ-inspired architecture: inverted bottleneck blocks, SE attention, pre-classifier FC; essentially an EfficientNet w/ grouped conv instead of depthwise
- b, c, and d are three different sizes I put together to cover differing flop ranges, not based on the paper (https://arxiv.org/abs/2103.06877) or a search process
- for comparison with RegNetY and the paper's RegNetZ models: at 224x224 the b, c, and d models are 1.45, 1.92, and 4.58 GMACs respectively; c and d are trained at 256 here, so their cost is higher than that (see tables, and the counting sketch below)
- haloregnetz_b uses halo attention for all of the last stage, and interleaved (every 3rd of 4 blocks) in the penultimate stage
- b and c variants use a stem / 1st stage like the paper; d uses a 3-deep tiered stem with 2-1-2 striding
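The GMAC figures above can be reproduced approximately with any MAC counter; here is a sketch using fvcore (an assumption on my part, not necessarily what produced the quoted numbers, and some ops such as the SE gates may be skipped by the counter):

```python
import timm
import torch
from fvcore.nn import FlopCountAnalysis

# fvcore reports multiply-accumulates despite the "Flop" naming, so the totals
# should land near the 1.45 / 1.92 / 4.58 GMAC figures quoted above at 224x224.
for name in ('regnetz_b', 'regnetz_c', 'regnetz_d'):
    model = timm.create_model(name).eval()
    macs = FlopCountAnalysis(model, torch.randn(1, 3, 224, 224)).total()
    print(f'{name}: {macs / 1e9:.2f} GMACs @ 224x224')
```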
ImageNet-1k validation at train resolution
| model | top1 | top1_err | top5 | top5_err | param_count | img_size | crop_pct | interpolation |
|---|---|---|---|---|---|---|---|---|
| regnetz_d | 83.422 | 16.578 | 96.636 | 3.364 | 27.58 | 256 | 0.95 | bicubic |
| regnetz_c | 82.164 | 17.836 | 96.058 | 3.942 | 13.46 | 256 | 0.94 | bicubic |
| haloregnetz_b | 81.058 | 18.942 | 95.2 | 4.8 | 11.68 | 224 | 0.94 | bicubic |
| regnetz_b | 79.868 | 20.132 | 94.988 | 5.012 | 9.72 | 224 | 0.94 | bicubic |
ImageNet-1k validation at optimal test res
| model | top1 | top1_err | top5 | top5_err | param_count | img_size | crop_pct | interpolation |
|---|---|---|---|---|---|---|---|---|
| regnetz_d | 84.04 | 15.96 | 96.87 | 3.13 | 27.58 | 320 | 0.95 | bicubic |
| regnetz_c | 82.516 | 17.484 | 96.356 | 3.644 | 13.46 | 320 | 0.94 | bicubic |
| haloregnetz_b | 81.058 | 18.942 | 95.2 | 4.8 | 11.68 | 224 | 0.94 | bicubic |
| regnetz_b | 80.728 | 19.272 | 95.47 | 4.53 | 9.72 | 288 | 0.94 | bicubic |
Benchmark - RTX 3090 - AMP - NCHW - NGC 21.09
| model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | infer_GMACs | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
|---|---|---|---|---|---|---|---|---|---|---|
| regnetz_b | 2703.42 | 94.68 | 256 | 224 | 1.45 | 764.85 | 333.348 | 256 | 224 | 9.72 |
| haloregnetz_b | 2086.22 | 122.695 | 256 | 224 | 1.88 | 620.1 | 411.415 | 256 | 224 | 11.68 |
| regnetz_c | 1653.19 | 154.836 | 256 | 256 | 2.51 | 459.41 | 277.268 | 128 | 256 | 13.46 |
| regnetz_d | 1060.91 | 241.284 | 256 | 256 | 5.98 | 296.51 | 430.143 | 128 | 256 | 27.58 |
Benchmark - RTX 3090 - AMP - NHWC - NGC 21.09
NOTE: channels-last layout is painfully slow for the backward pass here due to some sort of cuDNN issue
| model | infer_samples_per_sec | infer_step_time | infer_batch_size | infer_img_size | infer_GMACs | train_samples_per_sec | train_step_time | train_batch_size | train_img_size | param_count |
|---|---|---|---|---|---|---|---|---|---|---|
| regnetz_b | 4152.59 | 61.634 | 256 | 224 | 1.45 | 399.37 | 639.572 | 256 | 224 | 9.72 |
| haloregnetz_b | 2770.78 | 92.378 | 256 | 224 | 1.88 | 364.22 | 701.386 | 256 | 224 | 11.68 |
| regnetz_c | 2512.4 | 101.878 | 256 | 256 | 2.51 | 376.72 | 338.372 | 128 | 256 | 13.46 |
| regnetz_d | 1456.05 | 175.8 | 256 | 256 | 5.98 | 111.32 | 1148.279 | 128 | 256 | 27.58 |