Flakyflash is a Linux-based tool for diagnosing and salvaging FAT- and exFAT-formatted flash media having flaky sectors that do not retain data correctly. Note, it is not intended for diagnosing "fake flash" (a.k.a. "fraudulent flash") media; see F3 for that.
Flakyflash works by reading each free data cluster in a FAT or exFAT file system and then re-reading it and comparing the two reads. If they differ, then the cluster is assumed to be "flaky," and Flakyflash marks the cluster as bad in the file allocation table so that file system drivers will not allocate data to it. Only free data clusters (and, as an option, already marked-bad data clusters) are checked. Clusters that are currently in use by files and subdirectory metadata are not touched in any way. Obviously, for the most thorough checking, one should run this tool on a completely empty file system so that every data cluster may be checked for flakiness.
usage: flakyflash [options] <block-device>
options:
-b,--bad-clusters=[<action>,...]
-f,--free-clusters=[<action>,...]
-n,--dry-run
-v,--verbose
actions:
read: read cluster; mark bad if device errors
reread: re-read cluster; mark bad if different
zeroout: issue BLKZEROOUT ioctl on cluster
(elided if a previous "read" found cluster already zeroed)
readzeros: read cluster; mark bad if not zeroed
f3write: fill cluster with reproducible data
(elided if a previous "read" found cluster already correct)
f3read: read cluster; mark bad if subtly changed
secdiscard: issue BLKSECDISCARD ioctl on cluster
discard: issue BLKDISCARD ioctl on cluster
trash: fill cluster with pseudorandom garbage
bad: mark cluster as bad unconditionally
free: mark cluster as free unconditionally
list: write cluster number to stdout
defaults:
--bad-clusters=
--free-clusters=read,reread
If run with the --dry-run
option, Flakyflash does not perform any writes to the media. Any destructive actions specified in --bad-clusters
or --free-clusters
will be ignored with a warning.
If run with the --verbose
option, Flakyflash outputs a human-readable decoding of all standard FAT or exFAT file system superblock fields, including FAT32-specific fields (if applicable) and the fields of the File System Information sector (if present).
For each free cluster (and, optionally, for each bad cluster), Flakyflash performs a user-specified sequence of actions, which may comprise:
-
read
– Reads the cluster from the media. -
reread
– Reads the cluster from the media again. If the data retrieved do not match the data retrieved by the previousread
action (or written by the previouszeroout
,f3write
, ortrash
action), then Flakyflash marks the cluster as bad and continues immediately to the next cluster. If no previous action had established a baseline for the current cluster, thenreread
implicitly performs aread
first. -
zeroout
– Issues aBLKZEROOUT
ioctl
on the cluster, which causes the kernel to overwrite the cluster on the media with null bytes. If a previous action had already established that the cluster contains only null bytes, then thezeroout
action is skipped to avoid excessive writing to the flash media. Thezeroout
action itself establishes that the cluster contains only null bytes, which implies that multiplezeroout
actions in a row will collapse to a single action. -
readzeros
– Reads the cluster from the media. If the data retrieved are not all null bytes, then Flakyflash marks the cluster as bad and continues immediately to the next cluster. This action may be used to check for spurious bit flips (a.k.a. "bit rot") if it is known for certain that the clusters should contain only null bytes, such as if a previous session of Flakyflash had zeroed them out using thezeroout
action. -
f3write
– Overwrites the entire cluster on the media with reproducible data derived from the cluster's offset by the same linear congruential generator as F3 uses. If a previous action had already established that the cluster contains exactly these data, then thef3write
action is skipped to avoid excessive writing to the flash media. Thef3write
action itself establishes that the cluster contains exactly these data, which implies that multiplef3write
actions in a row will collapse to a single action. -
f3read
– Reads the cluster from the media and compares the data retrieved versus the data that thef3write
action would write to the same cluster. If (and only if) more than zero but fewer than one eighth of the bits differ, then Flakyflash marks the cluster as bad and continues immediately to the next cluster. If the cluster appears to contain data that thef3write
action would write to some other cluster, then Flakyflash emits a warning message suggesting that the flash media may be fraudulent. Otherwise, if more than one eighth of the bits differ from the intended data, then Flakyflash emits a warning message indicating that the cluster is corrupted. This action may be used to check for spurious bit flips (a.k.a. "bit rot") if it is known for certain that the clusters were most recently written using thef3write
action. -
secdiscard
– Issues aBLKSECDISCARD
ioctl
on the cluster, which is supposed to cause the cluster to be erased irrecoverably, although hardware support for this is spotty. If the device does not support this action, then Flakyflash will emit an error message and abort with status code 134 (indicatingSIGABRT
). -
discard
– Issues aBLKDISCARD
ioctl
on the cluster, which indicates to the device that the data contents of the cluster are no longer needed. Some devices will immediately begin reading a discarded cluster as all null bytes, whereas others will put the cluster into an indeterminate state. If the device does not support this action, then Flakyflash will emit an error message and abort with status code 134 (indicatingSIGABRT
). -
trash
– Overwrites the entire cluster on the media with pseudorandom garbage generated by the kernel. Thetrash
action establishes a new baseline for the cluster, against which a subsequentreread
action may compare to determine if the write succeeded. -
bad
– Unconditionally marks the cluster as bad and continues immediately to the next cluster. There may be no good use for this action, but it is included for completeness. -
free
– Unconditionally marks the cluster as free. This action may be used to effect retesting of clusters previously marked as bad by specifying it as an action to perform on bad clusters. -
list
– Writes the cluster number to the standard output stream, followed by a newline character. Note that for historical reasons the first data cluster is numbered 2.
To reduce I/O overhead (and potentially also to reduce wear on cheap flash media), Flakyflash logically merges contiguous runs of clusters having the same disposition (free or bad) into chunks and performs the user-specified actions upon a whole chunk at a time. Where possible, Flakyflash attempts to align these chunks to device offsets that are multiples of 8 MiB, taking into account any partition offset. If any action that was attempted upon a multiple-cluster chunk encounters a hardware error (i.e., a system call returns error code EIO
), then Flakyflash reattempts the action (and all remaining actions in the specified sequence) on each cluster in the chunk individually so as to determine exactly which clusters are problematic. If any action that was attempted upon a single cluster encounters a hardware error, then Flakyflash marks the cluster as bad and continues immediately to the next cluster.
Any changes to the FAT are written to the media only once at the completion of testing. Interrupting Flakyflash before it has finished will cause any pending FAT changes to be lost. Note that writes to data clusters are performed during testing and are not deferred to the end.
flakyflash --verbose --bad-clusters= --free-clusters= /dev/sdX1
Only outputs a human-readable decoding of the file system superblocks but does not test any data clusters. Note that the default action list for bad clusters is empty, so the --bad-clusters=
argument may be omitted with the same effect.
flakyflash --free-clusters=read,reread /dev/sdX1
Tests each free data cluster by reading it and re-reading it. Marks as bad any clusters that do not read the same on the second read. Note that the default action list for free clusters is read,reread
, so the --free-clusters=read,reread
argument may be omitted with the same effect.
flakyflash --free-clusters=read,reread,reread,reread /dev/sdX1
Almost the same as the preceding example except that each free data cluster is read a total of four times instead of twice. If any read of the cluster fails to match any other read of the same cluster, then Flakyflash marks the cluster as bad.
flakyflash --free-clusters=read,zeroout,reread /dev/sdX1
Reads each free data cluster, and if it does not already contain only null bytes, then fills the cluster with null bytes. Re-reads the cluster afterward, and if it does not then contain only null bytes, then marks the cluster as bad.
flakyflash --free-clusters=zeroout,reread /dev/sdX1
Almost the same as the preceding example except that every free cluster is overwritten with null bytes unconditionally, without first checking whether the cluster already contains only null bytes.
flakyflash --free-clusters=zeroout,reread,trash,reread,zeroout,reread,discard /dev/sdX1
Really exercises each free data cluster by overwriting it with null bytes, verifying that it reads back as all null bytes, overwriting it with pseudorandom garbage, verifying that the garbage reads back correctly, overwriting it null bytes again, verifying that it reads back as all null bytes, and finally discarding it. This will very likely detect most weak clusters except if the device has an onboard cache.
flakyflash --free-clusters=read,f3write,reread /dev/sdX1
Reads each free data cluster, and if it does not already contain exactly the data that f3write
would write to it, then overwrites the cluster with those deterministic data. Re-reads the cluster afterward, and if it does not then contain the intended data, then marks the cluster as bad. This may be used to prepare the media for a data retention test. After thusly preparing the media, it should be placed in storage for a period of time. Upon retrieval of the media from storage, the next example shall conclude the test.
flakyflash --free-clusters=f3read /dev/sdX1
Reads each free data cluster and verifies that it contains the same deterministic data that the previous example wrote to it. Marks as bad any clusters whose data have changed subtly since they were written. Emits warnings about any clusters whose data have changed more than subtly or whose data were supposed to have been written to some other cluster. These latter cases may indicate that the flash media is fraudulent, and F3 may be further employed to diagnose such media.
flakyflash --bad-clusters=free,read,reread --free-clusters= /dev/sdX1
Retests all data clusters previously marked as bad. Each bad cluster is first marked as free (i.e., not bad) and is then read and re-read. If the two reads do not match, then the cluster is marked as bad again. Note that any changes to the FAT are written to the media only once at the completion of testing, so this will not issue redundant writes to the media.
You'll need GCC 10 or newer to build Flakyflash.
git clone --recurse-submodules https://github.com/whitslack/flakyflash.git
cd flakyflash
make
There is no installation needed, but you can install the compiled binary if you want to:
install -Dt /usr/local/sbin "out/$(gcc -dumpmachine)/flakyflash"