Posts

Showing posts from April, 2023

Switch Net 4 (Switch Net N): Combine multiple low-width neural layers with a fast transform.

Switch Net 4. Random sign flipping before a fast Walsh Hadamard transform results in a Random Projection. For almost any input the result is a Gaussian distributed output vector where each output element contains knowledge of all the input elements. With Switch Net 4 the output elements are 2-way switched using (x<0)? as the switching predicate, where x is the input to the switch. The switches are grouped together in units of 4, which together form a small width-4 neural layer. When a particular x>=0, one selected pattern of weights is forward projected with intensity x. When x<0, a different selected pattern of weights is projected, again with intensity x (x being negative this time.) If nothing was projected when x<0 the situation would be identical to using a ReLU. You could view Switch Net 4 as using 2 ReLUs, one with input x and one with input -x. The reason to 2-way switch (or use a +- ReLU) is to avoid early information loss…
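
As a rough illustration (not the author's code), here is a minimal Python/numpy sketch of one width-4 switching unit under the assumption described above: each input x selects one of two length-4 weight patterns via the predicate (x<0)? and projects it forward with intensity x. The names switch_unit_4, w_pos and w_neg are placeholders of mine. The second function writes the same unit as two ReLUs per input.

    import numpy as np

    def switch_unit_4(x4, w_pos, w_neg):
        """One width-4 switching unit: x4 holds the 4 inputs,
        w_pos and w_neg are 4x4 weight patterns (one row per input)."""
        out = np.zeros(4)
        for i, x in enumerate(x4):
            if x >= 0:
                out += x * w_pos[i]   # pattern selected when x >= 0
            else:
                out += x * w_neg[i]   # different pattern, still scaled by x (negative)
        return out

    def switch_unit_4_relu_view(x4, w_pos, w_neg):
        """The same unit viewed as 2 ReLUs per input, one fed x and one fed -x."""
        relu = lambda v: max(v, 0.0)
        out = np.zeros(4)
        for i, x in enumerate(x4):
            out += relu(x) * w_pos[i] - relu(-x) * w_neg[i]
        return out

    rng = np.random.default_rng(0)
    x4 = rng.normal(size=4)
    w_pos, w_neg = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
    assert np.allclose(switch_unit_4(x4, w_pos, w_neg),
                       switch_unit_4_relu_view(x4, w_pos, w_neg))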

Switch Net

Switch Net. Switch Net is a neural network based on using the fast Walsh Hadamard transform (WHT) as a fixed set of layer weights with a parametric switching based activation function. The cost of the WHT is n·log2(n) add/subtracts, as opposed to n squared fused multiply-adds for a conventional dense neural layer's weight calculations. A fixed random pattern of sign flips is applied to the input data so that the first WHT fairly distributes information, with a Gaussian distribution, across its outputs. Then a switching decision is applied to each output of the WHT and the output is multiplied by one weight or another. The section between the red lines is basically a layer and can be repeated a number of times. A final WHT is done to combine the last lot of switched weight outputs. Some example code is here: https://editor.p5js.org/congchuatocmaydangyeu7/sketches/7ekZTwQMF One slight issue is if the switching weights for a particular output have opposite signs, then the result…
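
For concreteness, a minimal Python/numpy sketch of the layer structure described above (fixed random sign flips, then a fast WHT, then a 2-way switched weight per output). The fwht routine, the 1/sqrt(n) scaling and all names are my own choices, not taken from the linked p5.js sketch.

    import numpy as np

    def fwht(x):
        """Fast Walsh Hadamard transform, n*log2(n) add/subtracts."""
        x = x.copy(); n = len(x); h = 1
        while h < n:
            for i in range(0, n, 2 * h):
                a = x[i:i + h].copy(); b = x[i + h:i + 2 * h].copy()
                x[i:i + h] = a + b; x[i + h:i + 2 * h] = a - b
            h *= 2
        return x

    def switch_layer(x, signs, w_pos, w_neg):
        """signs: fixed random +/-1 vector; w_pos, w_neg: the 2-way switched weights."""
        y = fwht(signs * x) / np.sqrt(len(x))          # sign flips + WHT
        return np.where(y >= 0, w_pos * y, w_neg * y)  # switching decision per output

    n = 8
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, n)
    for _ in range(3):                                 # the repeatable section between the red lines
        signs = rng.choice([-1.0, 1.0], n)
        w_pos, w_neg = rng.normal(size=n), rng.normal(size=n)
        x = switch_layer(x, signs, w_pos, w_neg)
    x = fwht(x) / np.sqrt(n)                           # final WHT to combine the last switched outputs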

ReLU as a Switch

ReLU as a Switch. The conventional view: ReLU as a function. A slight confusion in computer science is that switches are purely digital, from switching gates and such. That appears to be the case because the supply voltage is fixed and is the only thing being switched. However, a light switch in your house is binary on/off, yet it connects or disconnects an AC sine wave voltage. A switch is therefore a mixed digital-analog device. When on, the input signal goes through 1 to 1. When off, the output is zero. You can then view ReLU as a switch that is 1 to 1 when on and outputs zero when off. Of course, there is a switching decision to make. In your house you make the switching decision for a light. With ReLU the switching decision is based on the predicate (x>=0)?, where x is the input to the switch. You could supply other predicates for the switching decision if you wanted, but switching at x=0 is very helpful for optimization. A ReLU network with all the…
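
The function view versus the switch view, as a tiny illustrative Python sketch:

    def relu_function(x):
        """The conventional view: ReLU as a function."""
        return max(x, 0.0)

    def relu_as_switch(x):
        """The switch view: the switching decision is the predicate (x >= 0)."""
        switch_on = (x >= 0)
        return x if switch_on else 0.0   # on: signal passes 1 to 1; off: output is zero

    assert all(relu_function(v) == relu_as_switch(v) for v in (-2.0, -0.5, 0.0, 3.0))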

Subrandom Projections

Subrandom Projections. Performing random sign flips on natural data like images before many types of fast transform, such as the FFT or WHT, gives Gaussian noise outputs. Essentially the sign flips convert the input data to a collection of random variables that then get weighted, summed and differenced by the fast transform, giving Gaussian noise by the Central Limit Theorem. Another way of stating the situation is that random sign flips combined with fast transforms create Random Projections (RPs) of the input data. While such RPs have many uses, there are some disadvantages, such as the complete loss of direct rotation, scaling and translation (RST) information about objects in the original image. It would be nice if some of that RST information was kept along with some of the nice information distribution properties of Random Projections. One solution to the problem is to use subrandom (low discrepancy) numbers to decide the sign flips. For example, Additive Recurrence based Subrandom Numbers…
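
A small Python/numpy sketch of the idea, assuming an additive recurrence sequence s_k = frac(s_0 + k*alpha), with alpha the golden ratio conjugate, thresholded at 0.5 to pick each sign flip. The threshold rule and names are my own illustration, not a quote from the post.

    import numpy as np

    def subrandom_signs(n, alpha=0.6180339887498949, s0=0.5):
        """Additive recurrence (low discrepancy) sequence mapped to +/-1 sign flips."""
        s = (s0 + alpha * np.arange(1, n + 1)) % 1.0
        return np.where(s < 0.5, 1.0, -1.0)

    def fwht(x):
        """Fast Walsh Hadamard transform."""
        x = x.copy(); n = len(x); h = 1
        while h < n:
            for i in range(0, n, 2 * h):
                a = x[i:i + h].copy(); b = x[i + h:i + 2 * h].copy()
                x[i:i + h] = a + b; x[i + h:i + 2 * h] = a - b
            h *= 2
        return x

    n = 16
    data = np.linspace(0.0, 1.0, n)                            # stand-in for natural data
    projection = fwht(subrandom_signs(n) * data) / np.sqrt(n)  # subrandom projection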

Associative Memory using a Locality Sensitive Hash (Random Projection + Binarization)

Hashing and Associative Memory. Suppose you have a hash algorithm with d (+1,-1) binary outputs and you apply a weighted sum to those d binary outputs. Then you can store up to d <input,scalar> associations, where the input goes to the hash function and the scalar value is the output of the weighted sum. Only if by chance the hash algorithm outputs 2 binary patterns (for particular inputs) that are not linearly separable will the storage capacity be less than d. As d increases, the probability of such a collision decreases. The training algorithm simply involves adding or subtracting the recall error (divided by d) to the weights, according to whether the hash bit for the particular weight is valued -1 or +1. The training algorithm is applied to the training data a number of times to drive the recall error to a low value, solving a set of d linear simultaneous equations. An input that was not in the training set results in a complete…
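
A minimal Python/numpy sketch of such an associative memory, assuming the hash is a random projection (fixed random sign flips followed by a WHT) binarized to +/-1, as the title suggests. Class and variable names are mine; the update rule follows the description above: add or subtract (recall error)/d to each weight according to the sign of its hash bit.

    import numpy as np

    def fwht(x):
        """Fast Walsh Hadamard transform."""
        x = x.copy(); n = len(x); h = 1
        while h < n:
            for i in range(0, n, 2 * h):
                a = x[i:i + h].copy(); b = x[i + h:i + 2 * h].copy()
                x[i:i + h] = a + b; x[i + h:i + 2 * h] = a - b
            h *= 2
        return x

    class AssociativeMemory:
        def __init__(self, d, seed=0):
            rng = np.random.default_rng(seed)
            self.signs = rng.choice([-1.0, 1.0], d)   # fixed random sign flips
            self.weights = np.zeros(d)
            self.d = d

        def hash(self, x):
            """Locality sensitive hash: random projection + binarization to +/-1."""
            return np.sign(fwht(self.signs * x))

        def recall(self, x):
            """Weighted sum over the d binary hash outputs."""
            return self.hash(x) @ self.weights

        def train(self, pairs, epochs=20):
            for _ in range(epochs):
                for x, target in pairs:
                    error = target - self.recall(x)
                    # With +/-1 hash bits, adding (error/d)*bit to each weight
                    # shifts this pair's recall by the full error.
                    self.weights += (error / self.d) * self.hash(x)

    d = 16
    rng = np.random.default_rng(1)
    pairs = [(rng.normal(size=d), float(t)) for t in range(5)]   # <input, scalar> associations
    mem = AssociativeMemory(d)
    mem.train(pairs)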

Random Projections for Neural Networks

Variance and the CLT. For linear combinations of random variables, a negative sign before a particular variable has no effect on the variance, and of course the Central Limit Theorem is in operation. The Walsh Hadamard transform (WHT) is a set of orthogonal weighted sums (where the weights are +1 or -1, or some constant multiple of those.) And so the variance equation for linear combinations of random variables applies (minus signs not invalidating that), as does the Central Limit Theorem. (Image: 8-Point Walsh Hadamard transform.) Therefore applying the WHT to a sequence of random numbers from the Uniform distribution results in a sequence of numbers from the Gaussian (Normal) distribution. Example code is here: https://editor.p5js.org/siobhan.491/sketches/WhoxMA7pH If you want to use that behavior to generate Gaussians you should remember that the WHT leaves the vector magnitude (length) unchanged (except by a constant c.) The result is all the Gaussians…
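
A short Python/numpy sketch of that observation (separate from the linked p5.js example): transform uniform random numbers with the WHT, scale by 1/sqrt(n), and the outputs are approximately Gaussian while the vector magnitude is preserved.

    import numpy as np

    def fwht(x):
        """Fast Walsh Hadamard transform."""
        x = x.copy(); n = len(x); h = 1
        while h < n:
            for i in range(0, n, 2 * h):
                a = x[i:i + h].copy(); b = x[i + h:i + 2 * h].copy()
                x[i:i + h] = a + b; x[i + h:i + 2 * h] = a - b
            h *= 2
        return x

    n = 1024
    rng = np.random.default_rng(0)
    u = rng.uniform(-1.0, 1.0, n)                 # uniform random inputs
    g = fwht(u) / np.sqrt(n)                      # approximately Gaussian outputs (CLT)
    print(np.linalg.norm(u), np.linalg.norm(g))   # magnitudes agree with the 1/sqrt(n) scaling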

Walsh Hadamard transform - Image Compression.

A random image compressed with the WHT. Basic Image Compression. An early application of the fast Walsh Hadamard transform (WHT) was image compression. It was used on a Mariner spacecraft in 1969 to broadcast pictures back to Earth. The Walsh Hadamard transform, unlike the FFT, is not bandwidth limited. One consequence is that there is no need to use a 2-D WHT for image compression, as is necessary with the FFT. Another consequence is more blocky artifacts than with the FFT. There is some elementary image compression code here: https://editor.p5js.org/siobhan.491/sketches/BQcmEs9GM (Image: 8-Point WHT.) The WHT, when used for image compression, examines the image for certain square-wave-like patterns and extracts those of highest magnitude. The WHT is self-inverse: to decompress, you simply run those high magnitude values through the WHT again. Improved Compression. The intermediate calculations of the fast WHT algorithm are quite wavelet-like and can also advantageously be used for image compression.
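
A toy 1-D Python/numpy version of the basic scheme (not the linked p5.js code): take the WHT, keep only the highest magnitude coefficients, and decompress by applying the same (self-inverse) transform again.

    import numpy as np

    def fwht(x):
        """Fast Walsh Hadamard transform."""
        x = x.copy(); n = len(x); h = 1
        while h < n:
            for i in range(0, n, 2 * h):
                a = x[i:i + h].copy(); b = x[i + h:i + 2 * h].copy()
                x[i:i + h] = a + b; x[i + h:i + 2 * h] = a - b
            h *= 2
        return x

    def compress(signal, keep):
        """Keep only the 'keep' highest magnitude WHT coefficients."""
        coeffs = fwht(signal) / np.sqrt(len(signal))   # normalized so the WHT is self-inverse
        small = np.argsort(np.abs(coeffs))[:-keep]     # indices of the smallest coefficients
        coeffs[small] = 0.0
        return coeffs

    def decompress(coeffs):
        """The WHT is self-inverse: run the kept values through it again."""
        return fwht(coeffs) / np.sqrt(len(coeffs))

    signal = np.array([4.0, 4.1, 4.0, 3.9, 0.0, 0.1, -0.1, 0.0])
    restored = decompress(compress(signal, keep=3))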