Skip to content

Sparse Distributed Representations

Scott Purdy edited this page Aug 26, 2013 · 6 revisions

Sparse Distributed Representations are binary representations of data comprised of many bits with a small percentage of the bits active (1's). The bits in these representations have semantic meaning and that meaning is distributed across the bits.

SDR Properties

Similarity

Shared bits between two representations indicate semantic similarity because the bits in SDRs each have semantic meaning. More shared bits means more semantic similarity. Here are two representations with some shared semantics, indicated by the shared bits:

0100000000000000000000010000000000000000000000000000000000000000010000..........01000
0000000010000000000000010000000000000000000000000010000000000000010000..........00000
                       ↑                                         ↑

Questions

  1. You want to compare two SDRs, A and B, with a third, C, to say if A or B is more similar. If A has 10 overlapping active bits of of 40, and B has a distinct set of 15, can you say that B is more similar? Or can some overlapping bits indicate more similarity than others?

Storing SDRs

These representations can be stored as a list of the indices of the active bits. This takes up little space because the representations are very sparse.

0100000000000000000000010000000000000000000000000000000000000000010000..........01000
becomes
[1, 23, 75, ..., 2045]

Subsampling

You can also subsample the indices because the semantics of the data are distributed across the bits. If you have a new representation that contains all of the subsampled indices then there is a high likelihood it is the same value (need to justify this more). And even if it isn't the same exact value, it is at least semantically very similar.

[1, 23, 75, ..., 2045] (40 indices)
can be stored as
[23, ...] (10 indices)

Questions

  1. Can we formalize the effect of subsampling? In the case of checking if an SDR is the same as a stored subsample, with what confidence can we say it is the same as a function of the total bits and the number sampled? This is applicable to the Comparing and Fault Tolerance sections below.

Comparing

Comparing two representations is as simple as taking the intersection of the two indices sets. The larger the overlap, the more semantically similar the two values are. This generally works even when subsampling the bits.

Fault Tolerance

SDRs are inherently fault-tolerant for the same reason that subsampling works. The semantics are distributed across the bits so you can discard several active indices without a significant loss in representation.

Dense Representations

Dense representations use a small number of bits and typically assign values to all, or most, of the combinations of 1's and 0's. ASCII, for instance, encodes text characters in eight bits. Each combination of the bits has a specific value assigned to it.

Dense representations are very high capacity and thus good for storing large amounts of data. But they do not share many of the properties of SDRs. They are not fault tolerant as flipping a single bit will result in a completed different value with potentially no shared semantics.

Clone this wiki locally