Associative Memory using a Locality Sensitive Hash (Random Projection + Binarization)
Hashing and Associative Memory
Suppose you have a hash algorithm with d (+1,-1) binary outputs and you apply a weighted sum to those d binary outputs. Then you can store up to d <input,scalar> associations, where the input goes to the hash function and the scalar is the output of the weighted sum.
Only if by chance the hash algorithm outputs binary patterns (for particular inputs) that are linearly dependent on one another will the storage capacity be less than d.
As d increases the probability of such a collision decreases.
The training algorithm simply involves adding the recall error (divided by d) to each weight whose hash bit is +1 and subtracting it from each weight whose hash bit is -1.
The training algorithm is applied to the training data a number of times to drive the recall error to a low value, which amounts to solving a set of up to d simultaneous linear equations.
An input that was not in the training set results in an effectively random (+,-) sum of the weights, which tends to zero-mean, low-level Gaussian noise when d is large (central limit theorem).
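As a concrete illustration, here is a minimal sketch of that training and recall loop. It is my own sketch, assuming NumPy, with SHA-256 used purely as a stand-in "ordinary" hash giving d = 256 (+1,-1) bits per input.

```python
# Minimal sketch of the <input, scalar> associative memory described above.
# Assumptions (mine, not from the post): NumPy, and SHA-256 as a stand-in
# "ordinary" hash giving d = 256 (+1,-1) bits per input.
import hashlib
import numpy as np

d = 256
rng = np.random.default_rng(0)

def hash_bits(x):
    """Hash an input vector to d (+1,-1) bits via SHA-256 of its bytes."""
    digest = hashlib.sha256(np.asarray(x, dtype=np.float64).tobytes()).digest()
    return np.unpackbits(np.frombuffer(digest, dtype=np.uint8)).astype(np.float64) * 2.0 - 1.0

def recall(w, x):
    return w @ hash_bits(x)               # weighted sum of the hash bits

def train(pairs, epochs=100):
    """Drive the recall error to zero by adding/subtracting error/d per weight."""
    w = np.zeros(d)
    for _ in range(epochs):
        for x, target in pairs:
            error = target - recall(w, x)
            w += (error / d) * hash_bits(x)   # +error/d where the bit is +1, -error/d where -1
    return w

# Store 20 <input, scalar> associations (up to d can be stored) and test recall.
pairs = [(rng.standard_normal(16), rng.standard_normal()) for _ in range(20)]
w = train(pairs)
print(recall(w, pairs[0][0]), pairs[0][1])    # stored input: recall matches its scalar
print(recall(w, rng.standard_normal(16)))     # unseen input: zero-mean noise
```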
Locality Sensitive Hash
Using a normal hashing function makes the associations extremely brittle. The slightest change in an input will produce a completely different hash bit pattern and result in a Gaussian noise output rather than the expected output.
The use of a Locality Sensitive Hash (LSH), where only a few bits change for a small change in input, allows you to create a more useful associative memory.
A good choice of LSH is a Random Projection (RP) followed by (+1,-1) binarization.
A suitable and fast RP is simply to apply a fixed random pattern of sign flips to the input data and follow that with a fast Walsh-Hadamard transform. The main restriction is that the input data has to have a length that is a power of 2 (2, 4, 8, 16, ...).
Some example code is here.
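In the same spirit, here is a minimal sketch of that sign-flip plus Walsh-Hadamard LSH (my own, assuming NumPy and a power-of-two input length; not the linked example code):

```python
# Minimal sketch of the suggested LSH: a fixed random sign flip of the input
# followed by a fast Walsh-Hadamard transform, then (+1,-1) binarization.
# Assumes NumPy and an input length d that is a power of 2.
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform, O(d log d), for len(x) a power of 2."""
    x = np.array(x, dtype=np.float64)
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x

d = 256
rng = np.random.default_rng(1)
flips = rng.choice([-1.0, 1.0], size=d)   # fixed random sign-flip pattern

def lsh_bits(x):
    """Random projection (sign flips + WHT) followed by (+1,-1) binarization."""
    return np.sign(fwht(flips * x))

# Nearby inputs share most hash bits; an unrelated input does not.
x = rng.standard_normal(d)
print(np.mean(lsh_bits(x) == lsh_bits(x + 0.01 * rng.standard_normal(d))))  # near 1.0
print(np.mean(lsh_bits(x) == lsh_bits(rng.standard_normal(d))))             # near 0.5
```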
Other Matters of Interest
Storing one <vector,scalar> association results in the weight vector aligning with the binarized hash vector, angular distance zero.
Storing two <vector,scalar> associations forces the hash vectors and the weight vector apart in angle.
As you store more associations the hash vectors are forced further away from the direction of the weight vector (toward 90 degrees), and as a result the magnitude (length) of the weight vector needs to increase to provide a particular scalar output (at 90 degrees a weighted sum outputs only zero).
This makes the weighted sum much more sensitive to minor changes in input.
It also means the weighted sum will produce a hyper-output for any hash vector that happens to align with the direction of the weight vector.
Used under capacity (fewer than d associations) such an associative memory will show some repetition-code-like error correction, with a strong bias in the direction of the weight vector.
Used at capacity, and assuming no dependent collisions, you get perfect recall but with a very strong false response in the direction of the weight vector.
Used over capacity, and with a reduced learning rate (a/d with a < 1, rather than 1/d), you can get approximate recall with Gaussian noise error.
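A rough illustration of that over-capacity regime (my own sketch, assuming NumPy and a plain random-projection LSH rather than the Walsh-Hadamard version above):

```python
# Rough sketch of the over-capacity regime: more than d stored associations,
# trained with a reduced learning rate a/d (a < 1). Recall of stored pairs is
# then only approximate, with roughly zero-mean noise on the errors.
import numpy as np

d = 256
rng = np.random.default_rng(2)
proj = rng.standard_normal((d, d))        # stand-in random-projection LSH

def bits(x):
    return np.sign(proj @ x)              # (+1,-1) binarization

# Over capacity: 300 associations for d = 256 weights.
pairs = [(rng.standard_normal(d), rng.standard_normal()) for _ in range(300)]

a = 0.2                                   # reduced learning rate factor, a < 1
w = np.zeros(d)
for _ in range(100):
    for x, target in pairs:
        w += a * (target - w @ bits(x)) / d * bits(x)

errors = np.array([w @ bits(x) - t for x, t in pairs])
print(errors.mean(), errors.std())        # errors: roughly zero mean, Gaussian-like spread
```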
These weighted sum matters are also of interest in artificial neural network research.
These behaviours become very apparent when you use multiple scalar output associative memories to produce <vector, vector> associations.
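For example, here is a minimal sketch (mine, assuming NumPy) of <vector, vector> associations built from a bank of scalar-output memories sharing one hash, i.e. a weight matrix trained with the same error/d update:

```python
# Minimal sketch of <vector, vector> associations: one scalar associative
# memory per output component (a weight matrix), all sharing a single
# (+1,-1) LSH. Assumes NumPy; the hash is a stand-in random projection.
import numpy as np

d, out_dim = 256, 8
rng = np.random.default_rng(3)
proj = rng.standard_normal((d, d))              # stand-in random-projection LSH

def bits(x):
    return np.sign(proj @ x)

def recall(W, x):
    return W @ bits(x)                          # one weighted sum per output component

def train(pairs, epochs=100):
    W = np.zeros((out_dim, d))
    for _ in range(epochs):
        for x, target in pairs:
            error = target - recall(W, x)       # vector of recall errors
            W += np.outer(error, bits(x)) / d   # the same error/d update, per output
    return W

pairs = [(rng.standard_normal(d), rng.standard_normal(out_dim)) for _ in range(100)]
W = train(pairs)
print(np.max(np.abs(recall(W, pairs[0][0]) - pairs[0][1])))   # small residual for a stored pair
print(recall(W, rng.standard_normal(d)))                      # unseen input: noise vector
```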
Sparse (large scale) associative memory algorithm: https://archive.org/details/sparse-am
Sparse associative memory code in Java: https://archive.org/details/sparse-amjava