-
Notifications
You must be signed in to change notification settings - Fork 18
/
Copy pathclean_artifacts.m
390 lines (362 loc) · 19.6 KB
/
clean_artifacts.m
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
function [EEG,HP,BUR,removed_channels] = clean_artifacts(EEG,varargin)
% All-in-one function for artifact removal, including ASR.
% [EEG,HP,BUR] = clean_artifacts(EEG, Options...)
%
% This function removes flatline channels, low-frequency drifts, noisy channels, short-time bursts
% and incompletely repaird segments from the data. Tip: Any of the core parameters can also be
% passed in as [] to use the respective default of the underlying functions, or as 'off' to disable
% it entirely.
%
% Hopefully parameter tuning should be the exception when using this function -- however, there are
% 3 parameters governing how aggressively bad channels, bursts, and irrecoverable time windows are
% being removed, plus several detail parameters that only need tuning under special circumstances.
%
% Notes:
% * This function uses the Signal Processing toolbox for pre- and post-processing of the data
% (removing drifts, channels and time windows); the core ASR method (clean_asr) does not require
% this toolbox but you will need high-pass filtered data if you use it directly.
% * By default this function will identify subsets of clean data from the given recording to
% enhance the robustness of the ASR calibration phase to strongly contaminated data; this uses
% the Statistics toolbox, but can be skipped/bypassed if needed (see documentation).
%
% In:
% EEG : Raw continuous EEG recording to clean up (as EEGLAB dataset structure).
%
%
% NOTE: The following parameters are the core parameters of the cleaning procedure; they should be
% passed in as Name-Value Pairs. If the method removes too many (or too few) channels, time
% windows, or general high-amplitude ("burst") artifacts, you will want to tune these values.
% Hopefully you only need to do this in rare cases.
%
% ChannelCriterion : Minimum channel correlation. If a channel is correlated at less than this
% value to an estimate based on other channels, it is considered abnormal in
% the given time window. This method requires that channel locations are
% available and roughly correct; otherwise a fallback criterion will be used.
% (default: 0.85)
%
% LineNoiseCriterion : If a channel has more line noise relative to its signal than this value, in
% standard deviations based on the total channel population, it is considered
% abnormal. (default: 4)
%
% BurstCriterion : Standard deviation cutoff for removal of bursts (via ASR). Data portions whose
% variance is larger than this threshold relative to the calibration data are
% considered missing data and will be removed. The most aggressive value that can
% be used without losing much EEG is 3. For new users it is recommended to at
% first visually inspect the difference between the original and cleaned data to
% get a sense of the removed content at various levels. An agressive value is 5
% and a quite conservative value is 20. Default: 5 (from the GUI, default is 20).
%
% WindowCriterion : Criterion for removing time windows that were not repaired completely. This may
% happen if the artifact in a window was composed of too many simultaneous
% uncorrelated sources (for example, extreme movements such as jumps). This is
% the maximum fraction of contaminated channels that are tolerated in the final
% output data for each considered window. Generally a lower value makes the
% criterion more aggressive. Default: 0.25. Reasonable range: 0.05 (very
% aggressive) to 0.3 (very lax).
%
% Highpass : Transition band for the initial high-pass filter in Hz. This is formatted as
% [transition-start, transition-end]. Default: [0.25 0.75].
%
% NOTE: The following are detail parameters that may be tuned if one of the criteria does
% not seem to be doing the right thing. These basically amount to side assumptions about the
% data that usually do not change much across recordings, but sometimes do.
%
% ChannelCriterionMaxBadTime : This is the maximum tolerated fraction of the recording duration
% during which a channel may be flagged as "bad" without being
% removed altogether. Generally a lower (shorter) value makes the
% criterion more aggresive. Reasonable range: 0.15 (very aggressive)
% to 0.6 (very lax). Default: 0.5.
%
% BurstCriterionRefMaxBadChns: If a number is passed in here, the ASR method will be calibrated based
% on sufficiently clean data that is extracted first from the
% recording that is then processed with ASR. This number is the
% maximum tolerated fraction of "bad" channels within a given time
% window of the recording that is considered acceptable for use as
% calibration data. Any data windows within the tolerance range are
% then used for calibrating the threshold statistics. Instead of a
% number one may also directly pass in a data set that contains
% calibration data (for example a minute of resting EEG).
%
% If this is set to 'off', all data is used for calibration. This will
% work as long as the fraction of contaminated data is lower than the
% the breakdown point of the robust statistics in the ASR
% calibration (50%, where 30% of clearly recognizable artifacts is a
% better estimate of the practical breakdown point).
%
% A lower value makes this criterion more aggressive. Reasonable
% range: 0.05 (very aggressive) to 0.3 (quite lax). If you have lots
% of little glitches in a few channels that don't get entirely
% cleaned you might want to reduce this number so that they don't go
% into the calibration data. Default: 0.075.
%
% BurstCriterionRefTolerances : These are the power tolerances outside of which a channel in a
% given time window is considered "bad", in standard deviations
% relative to a robust EEG power distribution (lower and upper
% bound). Together with the previous parameter this determines how
% ASR calibration data is be extracted from a recording. Can also be
% specified as 'off' to achieve the same effect as in the previous
% parameter. Default: [-Inf 5.5].
%
% BurstRejection : 'on' or 'off'. If 'on' reject portions of data containing burst instead of
% correcting them using ASR. Default is 'off'.
%
% WindowCriterionTolerances : These are the power tolerances outside of which a channel in the final
% output data is considered "bad", in standard deviations relative
% to a robust EEG power distribution (lower and upper bound). Any time
% window in the final (repaired) output which has more than the
% tolerated fraction (set by the WindowCriterion parameter) of channel
% with a power outside of this range will be considered incompletely
% repaired and will be removed from the output. This last stage can be
% skipped either by setting the WindowCriterion to 'off' or by taking
% the third output of this processing function (which does not include
% the last stage). Default: [-Inf 7].
%
% FlatlineCriterion : Maximum tolerated flatline duration. In seconds. If a channel has a longer
% flatline than this, it will be considered abnormal. Default: 5
%
% NoLocsChannelCriterion : Criterion for removing bad channels when no channel locations are
% present. This is a minimum correlation value that a given channel must
% have w.r.t. a fraction of other channels. A higher value makes the
% criterion more aggressive. Reasonable range: 0.4 (very lax) - 0.6
% (quite aggressive). Default: 0.45.
%
% NoLocsChannelCriterionExcluded : The fraction of channels that must be sufficiently correlated with
% a given channel for it to be considered "good" in a given time
% window. Applies only to the NoLocsChannelCriterion. This adds
% robustness against pairs of channels that are shorted or other
% that are disconnected but record the same noise process.
% Reasonable range: 0.1 (fairly lax) to 0.3 (very aggressive);
% note that increasing this value requires the ChannelCriterion
% to be relaxed in order to maintain the same overall amount of
% removed channels. Default: 0.1.
%
% MaxMem : The maximum amount of memory in MB used by the algorithm when processing.
% See function asr_process for more information. Default is 64.
%
% Out:
% EEG : Final cleaned EEG recording.
%
% HP : Optionally just the high-pass filtered data.
%
% BUR : Optionally the data without final removal of "irrecoverable" windows.
%
% Examples:
% % Load a recording, clean it, and visualize the difference (using the defaults)
% raw = pop_loadset(...);
% clean = clean_artifacts(raw);
% vis_artifacts(clean,raw);
%
% % Use a more aggressive threshold (passing the parameters in by position)
% raw = pop_loadset(...);
% clean = clean_artifacts(raw,[],2.5);
% vis_artifacts(clean,raw);
%
% % Passing some parameter by name (here making the WindowCriterion setting less picky)
% raw = pop_loadset(...);
% clean = clean_artifacts(raw,'WindowCriterion',0.25);
%
% % Disabling the WindowCriterion and ChannelCriterion altogether
% raw = pop_loadset(...);
% clean = clean_artifacts(raw,'WindowCriterion','off','ChannelCriterion','off');
%
%
% Christian Kothe, Swartz Center for Computational Neuroscience, UCSD
% 2012-09-04
% Copyright (C) Christian Kothe, SCCN, 2012, ckothe@ucsd.edu
%
% This program is free software; you can redistribute it and/or modify it under the terms of the GNU
% General Public License as published by the Free Software Foundation; either version 2 of the
% License, or (at your option) any later version.
%
% This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
% even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
% General Public License for more details.
%
% You should have received a copy of the GNU General Public License along with this program; if not,
% write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
% USA
hlp_varargin2struct(varargin,...
{'chancorr_crit','ChannelCorrelationCriterion','ChannelCriterion'}, 0.8, ...
{'line_crit','LineNoiseCriterion'}, 4, ...
{'burst_crit','BurstCriterion'}, 5, ...
{'fusechanrej','Fusechanrej'}, [], ... % unused in this function
{'channels','Channels'}, [], ...
{'channels_ignore','Channels_ignore'}, [], ...
{'window_crit','WindowCriterion'}, 0.25, ...
{'num_samples','NumSamples'}, 50, ...
{'highpass_band','Highpass'}, [0.25 0.75], ...
{'channel_crit_maxbad_time','ChannelCriterionMaxBadTime'}, 0.5, ...
{'burst_crit_refmaxbadchns','BurstCriterionRefMaxBadChns'}, 0.075, ...
{'burst_crit_reftolerances','BurstCriterionRefTolerances'}, [-inf 5.5], ...
{'distance2','Distance'}, 'euclidian', ...
{'window_crit_tolerances','WindowCriterionTolerances'},[-inf 7], ...
{'burst_rejection','BurstRejection'},'off', ...
{'nolocs_channel_crit','NoLocsChannelCriterion'}, 0.45, ...
{'nolocs_channel_crit_excluded','NoLocsChannelCriterionExcluded'}, 0.1, ...
{'max_mem','MaxMem'}, 64, ...
{'availableRAM_GB','availableRAM_GB'}, NaN, ...
{'flatline_crit','FlatlineCriterion'}, 5);
if ~isnan(availableRAM_GB)
max_mem = availableRAM_GB*1000;
end
% ignore some channels
if ~isempty(channels)
if ~isempty(channels_ignore)
error('Can include or ignore channel but not both at the same time')
end
if ~iscell(channels)
error('Cannot exclude channels without channel labels')
end
oriEEG = EEG;
EEG = pop_select(EEG, 'channel', channels);
oriEEG_without_ignored_channels = EEG;
EEG.event = []; % will be added back later
end
if ~isempty(channels_ignore)
if ~iscell(channels_ignore)
error('Cannot exclude channels without channel labels')
end
oriEEG = EEG;
EEG = pop_select(EEG, 'nochannel', channels_ignore);
oriEEG_without_ignored_channels = EEG;
EEG.event = []; % will be added back later
end
% remove flat-line channels
if ~strcmp(flatline_crit,'off')
disp('Detecting flat line...')
EEG = clean_flatlines(EEG,flatline_crit);
end
% high-pass filter the data
if ~strcmp(highpass_band,'off')
disp('Applying highpass filter...')
EEG = clean_drifts(EEG,highpass_band);
end
if nargout > 1
HP = EEG;
end
% remove noisy channels by correlation and line-noise thresholds
oldSrate = EEG.srate;
EEG.srate = round(EEG.srate);
removed_channels = [];
if ~strcmp(chancorr_crit,'off') || ~strcmp(line_crit,'off') %#ok<NODEF>
if strcmp(chancorr_crit,'off')
chancorr_crit = 0; end
if strcmp(line_crit,'off')
line_crit = 100; end
try
[EEG,removed_channels] = clean_channels(EEG,chancorr_crit,line_crit,[],channel_crit_maxbad_time, num_samples);
catch e
% if strcmp(e.identifier,'clean_channels:bad_chanlocs')
disp('Your dataset appears to lack correct channel locations; using a location-free channel cleaning method.');
[EEG,removed_channels] = clean_channels_nolocs(EEG,nolocs_channel_crit,nolocs_channel_crit_excluded,[],channel_crit_maxbad_time);
% else
% rethrow(e);
% end
end
end
% repair bursts by ASR
if ~strcmp(burst_crit,'off')
if ~strcmpi(distance2, 'euclidian')
BUR = clean_asr(EEG,burst_crit,[],[],[],burst_crit_refmaxbadchns,burst_crit_reftolerances,[], [], true, max_mem);
else
try
BUR = clean_asr(EEG,burst_crit,[],[],[],burst_crit_refmaxbadchns,burst_crit_reftolerances,[], [], false, max_mem);
catch
lasterr
return
end
end
if strcmp(burst_rejection,'on')
% portion of data which have changed
sample_mask = sum(abs(EEG.data-BUR.data),1) < 1e-8;
% find latency of regions
retain_data_intervals = reshape(find(diff([false sample_mask false])),2,[])';
retain_data_intervals(:,2) = retain_data_intervals(:,2)-1;
% remove small intervals
if ~isempty(retain_data_intervals)
smallIntervals = diff(retain_data_intervals')' < 5;
for iInterval = find(smallIntervals)'
sample_mask(retain_data_intervals(iInterval,1):retain_data_intervals(iInterval,2)) = 0;
end
retain_data_intervals(smallIntervals,:) = [];
end
% reject regions
EEG = pop_select(EEG, 'point', retain_data_intervals);
EEG.etc.clean_sample_mask = sample_mask;
% check
try
res = checksamples(EEG);
if res.CleanBoundaryMatch == 0
fprintf(2, 'Your data is fine, but upon double checking the removed portions of data, there is\n');
fprintf(2, 'a small discrepency; if you can reproduce the warning message, send us your dataset.\n');
end
catch, end
else
EEG = BUR;
end
end
EEG.srate = oldSrate;
if nargout > 2
BUR = EEG;
end
% remove irrecoverable time windows based on power
if ~strcmp(window_crit,'off') && ~strcmp(window_crit_tolerances,'off')
disp('Now doing final post-cleanup of the output.');
EEG = clean_windows(EEG,window_crit,window_crit_tolerances);
end
disp('Use vis_artifacts to compare the cleaned data to the original.');
% add back original channels
if ~isempty(channels) || ~isempty(channels_ignore)
if ~isempty(removed_channels)
removed_channels = { oriEEG_without_ignored_channels.chanlocs(removed_channels).labels };
end
% Apply same transformation to the data before removal of channels and data
EEG = eeg_checkset(EEG, 'eventconsistency');
if ~isempty(EEG.event) && isfield(EEG.event, 'type') && isstr(EEG.event(1).type)
disp('Adding back removed channels');
boundaryEvents = eeg_findboundaries(EEG);
% remove again data portions
if ~isempty(boundaryEvents)
boundloc = [ EEG.event(boundaryEvents).latency ];
dur = [ EEG.event(boundaryEvents).duration ];
cumdur = cumsum(dur);
boundloc = boundloc + [0 cumdur(1:end-1) ];
boundloc = [ boundloc; boundloc+dur-1]';
oriEEG = eeg_eegrej(oriEEG, ceil(boundloc));
end
% copy clean data to oriEEG (in case data was corrected
[~,chanInds1, chanInds2] = intersect({ oriEEG.chanlocs.labels }, { EEG.chanlocs.labels });
if size(oriEEG.data,2) ~= size(EEG.data,2)
error('Issue with adding back removed channels. Remove channels, then remove bad portions of data.');
end
oriEEG.data(chanInds1,:) = EEG.data(chanInds2,:);
oriEEG.pnts = EEG.pnts;
EEG = oriEEG;
if ~isempty(removed_channels)
EEG = pop_select(EEG, 'rmchannel', removed_channels);
end
end
end
%creates a structure summarizing if Assumptions 2 and 4 are true or false
%for each dataset in my study
function CleanSampleMismatch = check_rm_samples(EEG)
CleanSampleMismatch.oldLength = length(EEG.etc.clean_sample_mask);
CleanSampleMismatch.newLength = EEG.pnts;
CleanSampleMismatch.maskSum = sum(EEG.etc.clean_sample_mask);
if EEG.pnts ~= sum(EEG.etc.clean_sample_mask) %tests Assumption 2
CleanSampleMismatch.CleanMaskMatch = 0;
else
CleanSampleMismatch.CleanMaskMatch = 1;
end
CleanSampleMismatch.boundaryDuration = 0;
for j = 1: length(EEG.event) %finds total duration of boundary events
if strcmp(EEG.event(j).type, 'boundary')
CleanSampleMismatch.boundaryDuration = CleanSampleMismatch.boundaryDuration + EEG.event(j).duration;
end
end
CleanSampleMismatch.nonBoundaryLength = length(EEG.etc.clean_sample_mask) - CleanSampleMismatch.boundaryDuration;
if EEG.pnts ~= CleanSampleMismatch.nonBoundaryLength %tests Assumption 4
CleanSampleMismatch.CleanBoundaryMatch = 0;
else
CleanSampleMismatch.CleanBoundaryMatch = 1;
end