-
Notifications
You must be signed in to change notification settings - Fork 1
/
README
480 lines (393 loc) · 21.7 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
Memtest86++ v4.20
====================
Table of Contents
=================
1) Introduction
2) Licensing
3) Installation
4) Serial Port Console
5) Online Commands
6) Memory Sizing
7) Error Display
8) Trouble-shooting Memory Errors
9) Execution Time
10) Memory Testing Philosophy
11) Memtest86+ Test Algorithms
12) Individual Test Descriptions
13) Problem Reporting - Contact Information
14) Known Problems
1) Introduction
===============
Memtest86+ is thorough, stand alone memory test for Intel/AMD x86 architecture
systems. BIOS based memory tests are only a quick check and often miss
failures that are detected by Memtest86+.
For updates go to the Memtest86+ web page:
http://www.memtest.org
2) Licensing
============
Memtest86+ is released under the terms of the Gnu Public License (GPL). Other
than the provisions of the GPL there are no restrictions for use, private or
commercial. See: http://www.gnu.org/licenses/gpl.html for details.
Explicit permission for inclusion of Memtest86+ in software compilations and
publications is hereby granted.
3) Installation (Linux Only)
============================
Memtest86+ is a stand alone program and can be loaded from either a disk
partition or from a floppy disk.
To build Memtest86+:
1) Review the Makefile and adjust options as needed.
2) Type "make"
This creates a file named "memtest.bin" which is a bootable image. This
image file may be copied to a floppy disk or lilo may be used to boot this
image from a hard disk partition.
To create a Memtest86+ bootdisk
1) Insert a blank write enabled floppy disk.
2) As root, Type "make install"
To boot from a disk partition via lilo
1) Copy the image file to a permanent location (ie. /memtest).
2) Add an entry in the lilo config file (usually /etc/lilo.conf) to boot
Memtest86+. Only the image and label fields need to be specified.
The following is a sample lilo entry for booting Memtest86+:
image = /memtest
label = memtest
3) As root, type "lilo"
At the lilo prompt enter memtest to boot Memtest86+.
If you encounter build problems a binary image has been included (precomp.bin).
To create a boot-disk with this pre-built image do the following:
1) Insert a blank write enabled floppy disk.
2) Type "make install-precomp"
4) Serial Console
=================
Memtest86+ can be used on PC's equipped with a serial port for the console.
By default serial port console support is not enabled since it slows
down testing. To enable change the SERIAL_CONSOLE_DEFAULT define in
config.h from a zero to a one. The serial console baud rate may also
be set in config.h with the SERIAL_BAUD_RATE define. The other serial
port settings are no parity, 8 data bits, 1 stop bit. All of the features
used by Memtest86+ are accessible via the serial console. However, the
screen sometimes is garbled when the online commands are used.
5) Online Commands
==================
Memtest86+ has a limited number of online commands. Online commands
provide control over caching, test selection, address range and error
scrolling. A help bar is displayed at the bottom of the screen listing
the available on-line commands.
Command Description
ESC Exits the test and does a warm restart via the BIOS.
c Enters test configuration menu
Menu options are:
1) Cache mode
2) Test selection
3) Address Range
4) Memory Sizing
5) Error Summary
6) Error Report Mode
7) ECC Mode
8) Restart
9) Adv. Options
SP Set scroll lock (Stops scrolling of error messages)
Note: Testing is stalled when the scroll lock is
set and the scroll region is full.
CR Clear scroll lock (Enables error message scrolling)
6) Memory Sizing
================
The BIOS in modern PC's will often reserve several sections of memory for
it's use and also to communicate information to the operating system (ie.
ACPI tables). It is just as important to test these reserved memory blocks
as it is for the remainder of memory. For proper operation all of memory
needs to function properly regardless of what the eventual use is. For
this reason Memtest86+ has been designed to test as much memory as is
possible.
However, safely and reliably detecting all of the available memory has been
problematic. Versions of Memtest86+ prior to v0.91 would probe to find where
memory is. This works for the vast majority of motherboards but is not 100%
reliable. Sometimes the memory size is incorrect and worse probing the wrong
places can in some cases cause the test to hang or crash.
Starting in version 0.91 alternative methods are available for determining the
memory size. By default the test attempts to get the memory size from the
BIOS using the "e820" method. With "e820" the BIOS provides a table of memory
segments and identifies what they will be used for. By default Memtest86+
will test all of the ram marked as available and also the area reserved for
the ACPI tables. This is safe since the test does not use the ACPI tables
and the "e820" specifications state that this memory may be reused after the
tables have been copied. Although this is a safe default some memory will
not be tested.
Two additional options are available through online configuration options.
The first option (BIOS-All) also uses the "e820" method to obtain a memory
map. However, when this option is selected all of the reserved memory
segments are tested, regardless of what their intended use is. The only
exception is memory segments that begin above 3gb. Testing has shown that
these segments are typically not safe to test. The BIOS-All option is more
thorough but could be unstable with some motherboards.
The second option for memory sizing is the traditional "Probe" method.
This is a very thorough but not entirely safe method. In the majority of
cases the BIOS-All and Probe methods will return the same memory map.
For older BIOS's that do not support the "e820" method there are two
additional methods (e801 and e88) for getting the memory size from the
BIOS. These methods only provide the amount of extended memory that is
available, not a memory table. When the e801 and e88 methods are used
the BIOS-All option will not be available.
The MemMap field on the display shows what memory size method is in use.
Also the RsvdMem field shows how much memory is reserved and is not being
tested.
7) Error Information
======================
Memtest has two options for reporting errors. The default is to report
individual errors. In BadRAM Patterns mode patterns are created for
use with the Linux BadRAM feature. This slick feature allows Linux to
avoid bad memory pages. Details about the BadRAM feature can be found at:
http://home.zonnet.nl/vanrein/badram
For individual errors the following information is displayed when a memory
error is detected. An error message is only displayed for errors with a
different address or failing bit pattern. All displayed values are in
hexadecimal.
Tst: Test number
Failing Address : Failing memory address
Good: Expected data pattern
Bad: Failing data pattern
Err-Bits: Exclusive or of good and bad data (this shows the
position of the failing bit(s))
Count: Number of consecutive errors with the same address
and failing bits
In BadRAM Patterns mode, Lines are printed in a form badram=F1,M1,F2,M2.
In each F/M pair, the F represents a fault address, and the corresponding M
is a bitmask for that address. These patterns state that faults have
occurred in addresses that equal F on all "1" bits in M. Such a pattern may
capture more errors that actually exist, but at least all the errors are
captured. These patterns have been designed to capture regular patterns of
errors caused by the hardware structure in a terse syntax.
The BadRAM patterns are `grown' increment-ally rather than `designed' from an
overview of all errors. The number of pairs is constrained to five for a
number of practical reasons. As a result, handcrafting patterns from the
output in address printing mode may, in exceptional cases, yield better
results.
8) Trouble-shooting Memory Errors
================================
Please be aware that not all errors reported by Memtest86+ are due to
bad memory. The test implicitly tests the CPU, L1 and L2 caches as well as
the motherboard. It is impossible for the test to determine what causes
the failure to occur. Most failures will be due to a problem with memory.
When it is not, the only option is to replace parts until the failure is
corrected.
Once a memory error has been detected, determining the failing
module is not a clear cut procedure. With the large number of motherboard
vendors and possible combinations of simm slots it would be difficult if
not impossible to assemble complete information about how a particular
error would map to a failing memory module. However, there are steps
that may be taken to determine the failing module. Here are three
techniques that you may wish to use:
1) Removing modules
This is simplest method for isolating a failing modules, but may only be
employed when one or more modules can be removed from the system. By
selectively removing modules from the system and then running the test
you will be able to find the bad module(s). Be sure to note exactly which
modules are in the system when the test passes and when the test fails.
2) Rotating modules
When none of the modules can be removed then you may wish to rotate modules
to find the failing one. This technique can only be used if there are
three or more modules in the system. Change the location of two modules
at a time. For example put the module from slot 1 into slot 2 and put
the module from slot 2 in slot 1. Run the test and if either the failing
bit or address changes then you know that the failing module is one of the
ones just moved. By using several combinations of module movement you
should be able to determine which module is failing.
3) Replacing modules
If you are unable to use either of the previous techniques then you are
left to selective replacement of modules to find the failure.
4) Avoiding allocation
The printing mode for BadRAM patterns is intended to construct boot time
parameters for a Linux kernel that is compiled with BadRAM support. This
work-around makes it possible for Linux to reliably run on defective
RAM. For more information on BadRAM support
for Linux, sail to
http://home.zonnet.nl/vanrein/badram
Sometimes memory errors show up due to component incompatibility. A memory
module may work fine in one system and not in another. This is not
uncommon and is a source of confusion. The components are not necessarily
bad but certain combinations may need to be avoided.
I am often asked about the reliability of errors reported by Memtest86+.
In the vast majority of cases errors reported by the test are valid.
There are some systems that cause Memtest86+ to be confused about the size of
memory and it will try to test non-existent memory. This will cause a large
number of consecutive addresses to be reported as bad and generally there
will be many bits in error. If you have a relatively small number of
failing addresses and only one or two bits in error you can be certain
that the errors are valid. Also intermittent errors are always valid.
All valid memory errors should be corrected. It is possible that a
particular error will never show up in normal operation. However, operating
with marginal memory is risky and can result in data loss and even
disk corruption. You can be sure that Murphy will get you if you know
about a memory error and ignore it.
Memtest86+ can not diagnose many types of PC failures. For example a
faulty CPU that causes Windows to crash will most likely just cause
Memtest86+ to crash in the same way.
9) Execution Time
==================
The time required for a complete pass of Memtest86+ will vary greatly
depending on CPU speed, memory speed and memory size. Memtest86+ executes
indefinitely. The pass counter increments each time that all of the
selected tests have been run. Generally a single pass is sufficient to
catch all but the most obscure errors. However, for complete confidence
when intermittent errors are suspected testing for a longer period is advised.
10) Memory Testing Philosophy
=============================
There are many good approaches for testing memory. However, many tests
simply throw some patterns at memory without much thought or knowledge
of memory architecture or how errors can best be detected. This
works fine for hard memory failures but does little to find intermittent
errors. BIOS based memory tests are useless for finding intermittent
memory errors.
Memory chips consist of a large array of tightly packed memory cells,
one for each bit of data. The vast majority of the intermittent failures
are a result of interaction between these memory cells. Often writing a
memory cell can cause one of the adjacent cells to be written with the
same data. An effective memory test attempts to test for this
condition. Therefore, an ideal strategy for testing memory would be
the following:
1) write a cell with a zero
2) write all of the adjacent cells with a one, one or more times
3) check that the first cell still has a zero
It should be obvious that this strategy requires an exact knowledge
of how the memory cells are laid out on the chip. In addition there is a
never ending number of possible chip layouts for different chip types
and manufacturers making this strategy impractical. However, there
are testing algorithms that can approximate this ideal strategy.
11) Memtest86+ Test Algorithms
=============================
Memtest86+ uses two algorithms that provide a reasonable approximation
of the ideal test strategy above. The first of these strategies is called
moving inversions. The moving inversion test works as follows:
1) Fill memory with a pattern
2) Starting at the lowest address
2a check that the pattern has not changed
2b write the patterns complement
2c increment the address
repeat 2a - 2c
3) Starting at the highest address
3a check that the pattern has not changed
3b write the patterns complement
3c decrement the address
repeat 3a - 3c
This algorithm is a good approximation of an ideal memory test but
there are some limitations. Most high density chips today store data
4 to 16 bits wide. With chips that are more than one bit wide it
is impossible to selectively read or write just one bit. This means
that we cannot guarantee that all adjacent cells have been tested
for interaction. In this case the best we can do is to use some
patterns to insure that all adjacent cells have at least been written
with all possible one and zero combinations.
It can also be seen that caching, buffering and out of order execution
will interfere with the moving inversions algorithm and make less effective.
It is possible to turn off cache but the memory buffering in new high
performance chips can not be disabled. To address this limitation a new
algorithm I call Modulo-X was created. This algorithm is not affected by
cache or buffering. The algorithm works as follows:
1) For starting offsets of 0 - 20 do
1a write every 20th location with a pattern
1b write all other locations with the patterns complement
repeat 1b one or more times
1c check every 20th location for the pattern
This algorithm accomplishes nearly the same level of adjacency testing
as moving inversions but is not affected by caching or buffering. Since
separate write passes (1a, 1b) and the read pass (1c) are done for all of
memory we can be assured that all of the buffers and cache have been
flushed between passes. The selection of 20 as the stride size was somewhat
arbitrary. Larger strides may be more effective but would take longer to
execute. The choice of 20 seemed to be a reasonable compromise between
speed and thoroughness.
12) Individual Test Descriptions
================================
Memtest86+ executes a series of numbered test sections to check for
errors. These test sections consist of a combination of test
algorithm, data pattern and caching. The execution order for these tests
were arranged so that errors will be detected as rapidly as possible.
A description of each of the test sections follows:
Test 0 [Address test, walking ones, no cache]
Tests all address bits in all memory banks by using a walking ones
address pattern. Errors from this test are not used to calculate
BadRAM patterns.
Test 1 [Address test, own address]
Each address is written with its own address and then is checked
for consistency. In theory previous tests should have caught any
memory addressing problems. This test should catch any addressing
errors that somehow were not previously detected.
Test 2 [Moving inversions, ones&zeros]
This test uses the moving inversions algorithm with patterns of all
ones and zeros. Cache is enabled even though it interferes to some
degree with the test algorithm. With cache enabled this test does not
take long and should quickly find all "hard" errors and some more
subtle errors. This section is only a quick check.
Test 3 [Moving inversions, 8 bit pat]
This is the same as test 1 but uses a 8 bit wide pattern of
"walking" ones and zeros. This test will better detect subtle errors
in "wide" memory chips. A total of 20 data patterns are used.
Test 4 [Moving inversions, random pattern]
Test 4 uses the same algorithm as test 1 but the data pattern is a
random number and it's complement. This test is particularly effective
in finding difficult to detect data sensitive errors. A total of 60
patterns are used. The random number sequence is different with each pass
so multiple passes increase effectiveness.
Test 5 [Block move, 64 moves]
This test stresses memory by using block move (movsl) instructions
and is based on Robert Redelmeier's burnBX test. Memory is initialized
with shifting patterns that are inverted every 8 bytes. Then 4MB blocks
of memory are moved around using the movsl instruction. After the moves
are completed the data patterns are checked. Because the data is checked
only after the memory moves are completed it is not possible to know
where the error occurred. The addresses reported are only for where the
bad pattern was found. Since the moves are constrained to a 8MB segment
of memory the failing address will always be lest than 8MB away from the
reported address. Errors from this test are not used to calculate
BadRAM patterns.
Test 6 [Moving inversions, 32 bit pat]
This is a variation of the moving inversions algorithm that shifts the data
pattern left one bit for each successive address. The starting bit position
is shifted left for each pass. To use all possible data patterns 32 passes
are required. This test is quite effective at detecting data sensitive
errors but the execution time is long.
Test 7 [Random number sequence]
This test writes a series of random numbers into memory. By resetting the
seed for the random number the same sequence of number can be created for
a reference. The initial pattern is checked and then complemented and
checked again on the next pass. However, unlike the moving inversions test
writing and checking can only be done in the forward direction.
Test 8 [Modulo 20, ones&zeros]
Using the Modulo-X algorithm should uncover errors that are not
detected by moving inversions due to cache and buffering interference
with the the algorithm. All ones and zeros are used for data patterns.
Test 9 [Bit fade test, 90 min, 2 patterns]
The bit fade test initializes all of memory with a pattern and then
sleeps for 90 minutes. Then memory is examined to see if any memory bits
have changed. All ones and all zero patterns are used. This test takes
3 hours to complete. The Bit Fade test is not included in the normal test
sequence and must be run manually via the runtime configuration menu.
14) Known Problems
==================
Sometimes when booting from a floppy disk the following messages scroll up
on the screen:
X:8000
AX:0212
BX:8600
CX:0201
DX:0000
This the BIOS reporting floppy disk read errors. Either re-write or toss
the floppy disk.
Memtest86+ has no support for multiple CPUs. Memtest86+ should run
without problems, but it will only use one CPU.
Memtest86+ can not diagnose many types of PC failures. For example a
faulty CPU that causes Windows to crash will most likely just cause
Memtest86+ to crash in the same way.
There have been numerous reports of errors in only tests 5 and 8 on Athlon
systems. Often the memory works in a different system or the vendor insists
that it is good. In these cases the memory is not necessarily bad but is
not able to operate reliably at Athlon speeds. Sometimes more conservative
memory timings on the motherboard will correct these errors. In other
cases the only option is to replace the memory with better quality, higher
speed memory. Don't buy cheap memory and expect it to work with an Athlon!
Memtest86+ supports all types of memory. If fact the test has absolutely
no knowledge of the memory type nor does it need to. This not a problem
or bug but is listed here due to the many questions I get about this issue.
Changes in the compiler and loader have caused problems with
Memtest86+ resulting in both build failures and errors in execution. A
binary image (precomp.bin) of the test is included and may be used if
problems are encountered.