%SOM_DEMO1 Basic properties and behaviour of the Self-Organizing Map.
% Contributed to SOM Toolbox 2.0, February 11th, 2000 by Juha Vesanto
% http://www.cis.hut.fi/projects/somtoolbox/
% Version 1.0beta juuso 071197
% Version 2.0beta juuso 030200
clf reset;
figure(gcf)
echo on
clc
% ==========================================================
% SOM_DEMO1 - BEHAVIOUR AND PROPERTIES OF SOM
% ==========================================================
% som_make - Create, initialize and train a SOM.
% som_randinit - Create and initialize a SOM.
% som_lininit - Create and initialize a SOM.
% som_seqtrain - Train a SOM.
% som_batchtrain - Train a SOM.
% som_bmus - Find best-matching units (BMUs).
% som_quality - Measure quality of SOM.
% SELF-ORGANIZING MAP (SOM):
% A self-organizing map (SOM) is a "map" of the training data,
% dense where there is a lot of data and thin where the data
% density is low.
% The map consists of neurons located on a regular map grid.
% The lattice of the grid can be either hexagonal or rectangular.
subplot(1,2,1)
som_cplane('hexa',[10 15],'none')
title('Hexagonal SOM grid')
subplot(1,2,2)
som_cplane('rect',[10 15],'none')
title('Rectangular SOM grid')
% Each neuron (hexagon on the left, rectangle on the right) has an
% associated prototype vector. After training, neighboring neurons
% have similar prototype vectors.
% The SOM can be used for data visualization, clustering (or
% classification), estimation and a variety of other purposes.
pause % Strike any key to continue...
clf
clc
% INITIALIZE AND TRAIN THE SELF-ORGANIZING MAP
% ============================================
% Here are 300 data points sampled from the unit square:
D = rand(300,2);
% The map will be a 2-dimensional grid of size 10 x 10.
msize = [10 10];
% SOM_RANDINIT and SOM_LININIT can be used to initialize the
% prototype vectors in the map. The map size is actually an
% optional argument. If omitted, it is determined automatically
% based on the number of data vectors and the principal
% eigenvectors of the data set. Below, the random initialization
% algorithm is used.
sMap = som_randinit(D, 'msize', msize);
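% As a rough sketch of the idea behind random initialization (an
% assumption about the general approach, not the toolbox source),
% each prototype component can be drawn uniformly between the
% minimum and maximum of the corresponding data variable. The names
% exLo, exHi and exCodebook are hypothetical and are not used
% elsewhere in this demo.
exLo = min(D);                      % per-variable minimum of the data
exHi = max(D);                      % per-variable maximum of the data
exN  = prod(msize);                 % number of map units (10*10 = 100)
exCodebook = repmat(exLo,exN,1) + ...
    rand(exN,size(D,2)).*repmat(exHi-exLo,exN,1);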
% Actually, each map unit can be thought of as having two sets
% of coordinates:
% (1) in the input space: the prototype vectors
% (2) in the output space: the position on the map
% In the two spaces, the map looks like this:
subplot(1,3,1)
som_grid(sMap)
axis([0 11 0 11]), view(0,-90), title('Map in output space')
subplot(1,3,2)
plot(D(:,1),D(:,2),'+r'), hold on
som_grid(sMap,'Coord',sMap.codebook)
title('Map in input space')
% The black dots show positions of map units, and the gray lines
% show connections between neighboring map units. Since the map
% was initialized randomly, the positions in the input space are
% completely disorganized. The red crosses are training data.
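% The two coordinate sets can also be inspected directly. A minimal
% sketch: exInput and exOutput are hypothetical names used only
% here, and SOM_UNIT_COORDS is assumed to return one row of grid
% coordinates per map unit.
exInput  = sMap.codebook;          % input space: prototype vectors
exOutput = som_unit_coords(sMap);  % output space: positions on the grid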
pause % Strike any key to train the SOM...
% During training, the map organizes and folds to the training
% data. Here, the sequential training algorithm is used:
sMap = som_seqtrain(sMap,D,'radius',[5 1],'trainlen',10);
subplot(1,3,3)
som_grid(sMap,'Coord',sMap.codebook)
hold on, plot(D(:,1),D(:,2),'+r')
title('Trained map')
pause % Strike any key to view more closely the training process...
clf
clc
% TRAINING THE SELF-ORGANIZING MAP
% ================================
% To get a better idea of what happens during training, let's look
% at how the map gradually unfolds and organizes itself. To make it
% even clearer, the map is now initialized so that it is away
% from the data.
sMap = som_randinit(D,'msize',msize);
sMap.codebook = sMap.codebook + 1;
subplot(1,2,1)
som_grid(sMap,'Coord',sMap.codebook)
hold on, plot(D(:,1),D(:,2),'+r'), hold off
title('Data and original map')
axis([0 2 0 2]);
% The training is based on two principles:
%
% Competitive learning: the prototype vector most similar to a
% data vector is modified so that it is even more similar to
% it. This way the map learns the position of the data cloud.
%
% Cooperative learning: not only the most similar prototype
% vector, but also its neighbors on the map are moved towards the
% data vector. This way the map self-organizes.
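% The two principles can be sketched as a single sequential update
% step in plain MATLAB. This is an illustration, not the toolbox
% code: the 3 x 3 rectangular grid, the Gaussian neighborhood and
% the names exPos, exM, exX, exAlpha and exRadius are assumptions
% made for this example only.
[exGx,exGy] = ndgrid(1:3,1:3);
exPos = [exGx(:) exGy(:)];          % unit positions on the map grid
exM = rand(9,2);                    % prototype vectors in input space
exX = [0.5 0.5];                    % one data vector
exAlpha = 0.1;                      % learning rate
exRadius = 1;                       % neighborhood radius
exDiff = repmat(exX,9,1) - exM;     % differences x - m_i
% Competition: the prototype nearest to the data vector wins.
[exDummy,exBmu] = min(sum(exDiff.^2,2));
% Cooperation: units close to the winner on the grid share the update.
exGd = sum((exPos - repmat(exPos(exBmu,:),9,1)).^2,2);
exH = exp(-exGd/(2*exRadius^2));    % equals 1 at the winner, decays with grid distance
exMnew = exM + exAlpha*repmat(exH,1,2).*exDiff;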
pause % Strike any key to train the map...
echo off
subplot(1,2,2)
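o = ones(5,1);       % one entry per training step within a call
r = (1-(1:60)/60);   % decreases linearly from 59/60 to 0
for i=1:60,
  % 5 training steps with constant learning rate 0.1; the
  % neighborhood radius 4*r(i)+1 shrinks from about 4.9 to 1
  sMap = som_seqtrain(sMap,D,'tracking',0,...
      'trainlen',5,'samples',...
      'alpha',0.1*o,'radius',(4*r(i)+1)*o);
  som_grid(sMap,'Coord',sMap.codebook)
  hold on, plot(D(:,1),D(:,2),'+r'), hold off
  title(sprintf('%d/300 training steps',5*i))
  axis([0 2 0 2]);
  drawnow
end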
title('Sequential training after 300 steps')
echo on
pause % Strike any key to continue with 3D data...
clf
clc
% TRAINING DATA: THE UNIT CUBE
% ============================
% Above, the map dimension was equal to input space dimension: both
% were 2-dimensional. Typically, the input space dimension is much
% higher than the 2-dimensional map. In this case the map can no
% longer follow the data set perfectly but must find a balance
% between two goals:
% - data representation accuracy
% - data set topology representation accuracy
% Here are 500 data points sampled from the unit cube:
D = rand(500,3);
subplot(1,3,1), plot3(D(:,1),D(:,2),D(:,3),'+r')
view(3), axis on, rotate3d on
title('Data')
% The ROTATE3D command enables you to rotate the picture by
% dragging the pointer over the picture with the left mouse
% button pressed down.
pause % Strike any key to train the SOM...
clc
% DEFAULT TRAINING PROCEDURE
% ==========================
% Above, the initialization was done randomly and training was done
% with the sequential training function (SOM_SEQTRAIN). By default,
% the initialization is linear, and the batch training algorithm is
% used. In addition, the training is done in two phases: first with
% a large neighborhood radius, and then finetuning with a small one.
% The function SOM_MAKE can be used to both initialize and train
% the map using default parameters:
pause % Strike any key to use SOM_MAKE...
sMap = som_make(D);
% Here, the linear initialization is done again, so that
% the results can be compared.
sMap0 = som_lininit(D);
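% The two-phase procedure can also be sketched by hand, starting
% from the linearly initialized map. This is an illustration, not
% what SOM_MAKE does internally: the radii and training lengths
% below are assumed values, not the toolbox defaults, and exR1,
% exR2 and sMapEx are hypothetical names used only here.
exR1 = max(1,max(sMap0.topol.msize)/4);   % assumed initial radius
exR2 = max(1,exR1/4);                     % assumed radius between phases
sMapEx = som_batchtrain(sMap0,D,'radius',[exR1 exR2],...
    'trainlen',5,'tracking',0);           % rough phase, large radius
sMapEx = som_batchtrain(sMapEx,D,'radius',[exR2 1],...
    'trainlen',10,'tracking',0);          % finetuning phase, small radius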
subplot(1,3,2)
som_grid(sMap0,'Coord',sMap0.codebook,...
    'Markersize',2,'Linecolor','k','Surf',sMap0.codebook(:,3))
axis([0 1 0 1 0 1]), view(-120,-25), title('After initialization')
subplot(1,3,3)
som_grid(sMap,'Coord',sMap.codebook,...
    'Markersize',2,'Linecolor','k','Surf',sMap.codebook(:,3))
axis([0 1 0 1 0 1]), view(3), title('After training'), hold on
% Here you can see that the 2-dimensional map has folded into the
% 3-dimensional space in order to be able to capture the whole data
% space.
pause % Strike any key to evaluate the quality of maps...
clc
% BEST-MATCHING UNITS (BMU)
% =========================
% Before going to the quality, an important concept needs to be
% introduced: the Best-Matching Unit (BMU). The BMU of a data
% vector is the unit on the map whose model vector best resembles
% the data vector. In practice, similarity is measured by distance:
% the BMU is the unit whose model vector is closest to the data
% vector. The BMUs can be calculated using the function SOM_BMUS,
% which returns the index of the unit.
% Here the BMU of the origin point is searched for on the
% trained map:
bmu = som_bmus(sMap,[0 0 0]);
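% The same search can be sketched by hand as a minimum over
% Euclidean distances. This assumes no missing values and a mask
% that weights all variables equally; exX, exDist and exBmu are
% hypothetical names used only here.
exX = [0 0 0];
exDist = sum((sMap.codebook - ...
    repmat(exX,size(sMap.codebook,1),1)).^2,2);  % squared distances
[exMin,exBmu] = min(exDist);        % exBmu is the index of the nearest unit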
% Here the corresponding unit is shown in the figure. You can
% rotate the figure to see better where the BMU is.
co = sMap.codebook(bmu,:);
text(co(1),co(2),co(3),'BMU','Fontsize',20)
plot3([0 co(1)],[0 co(2)],[0 co(3)],'ro-')
pause % Strike any key to analyze map quality...
clc
% SELF-ORGANIZING MAP QUALITY
% ===========================
% The maps have two primary quality properties:
% - data representation accuracy
% - data set topology representation accuracy
% The former is usually measured using the average quantization
% error between data vectors and their BMUs on the map. For the
% latter, several measures have been proposed, e.g. the topographic
% error measure: the proportion of data vectors for which the first
% and second BMUs are not adjacent units.
% Both measures have been implemented in the SOM_QUALITY function.
% Here are the quality measures for the trained map:
[q,t] = som_quality(sMap,D)
% And here for the initial map:
[q0,t0] = som_quality(sMap0,D)
% As can be seen, by folding, the SOM has reduced the average
% quantization error, but on the other hand the topology
% representation capability has suffered. By using a larger final
% neighborhood radius in the training, the map becomes stiffer and
% preserves the topology of the data set better.
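% Both measures can be sketched by hand from the BMUs. This is an
% illustration that assumes data without missing values; exB, exQ,
% exQe, exNe and exTe are hypothetical names used only here.
[exB,exQ] = som_bmus(sMap,D,[1 2]); % first and second BMUs with distances
exQe = mean(exQ(:,1));              % average quantization error
exNe = full(som_unit_neighs(sMap.topol));  % 1 where two units are adjacent
exIdx = sub2ind(size(exNe),exB(:,1),exB(:,2));
exTe = mean(exNe(exIdx) ~= 1);      % share of vectors with non-adjacent BMUs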
echo off