Switch Net 4 (Switch Net N): combine multiple low-width neural layers with a fast transform.

Random sign flipping before a fast Walsh Hadamard transform results in a Random Projection. For almost any input the result is an approximately Gaussian distributed output vector in which each output element contains information about all the input elements.
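
As a concrete sketch (assuming Processing/Java style code like the examples linked in the comments, with function names of my own choosing), the sign flip plus transform might look like this:

void wht(float[] x) {
  // In-place fast Walsh Hadamard transform, O(n log n). n must be a power of 2.
  int n = x.length;
  for (int h = 1; h < n; h *= 2) {
    for (int i = 0; i < n; i += 2 * h) {
      for (int j = i; j < i + h; j++) {
        float a = x[j];
        float b = x[j + h];
        x[j] = a + b;       // sum and difference butterflies
        x[j + h] = a - b;
      }
    }
  }
}

void randomProjection(float[] x, boolean[] flip) {
  // Fixed random sign flip (chosen once per net) followed by the fast WHT,
  // scaled by 1/sqrt(n) so the overall vector length is preserved.
  int n = x.length;
  for (int i = 0; i < n; i++) {
    if (flip[i]) x[i] = -x[i];
  }
  wht(x);
  float scale = 1f / (float) Math.sqrt(n);
  for (int i = 0; i < n; i++) x[i] *= scale;
}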

With Switch Net 4 the output elements are 2-way switched using (x < 0?) as the switching predicate, where x is the input to the switch.

The switches are grouped together in units of 4, which together form a small width-4 neural layer. When a particular x >= 0, one selected pattern of weights is forward projected with intensity x. When x < 0, a different selected pattern of weights is forward projected, again with intensity x (x being negative this time).

If nothing were projected when x < 0 the situation would be identical to using a ReLU. You could view the situation in Switch Net 4 as using 2 ReLUs, one with input x and one with input -x.

The reason to 2-way switch (or use a +- ReLU) is to avoid early loss of information about the input before the net has fully processed it, and to allow an output to be constructed through multiple layers without the information bottleneck caused by a ReLU blocking completely when x < 0.
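
Here is a hedged sketch of one width-4 unit, under the assumption that each of the 4 inputs owns two 4-element weight patterns, one used when x >= 0 and the other when x < 0 (the names wPos and wNeg are mine):

float[] switchUnit4(float[] x4, float[][] wPos, float[][] wNeg) {
  // One width-4 switch unit: sign(x4[i]) selects which of the two weight
  // patterns owned by input i is forward projected with intensity x4[i].
  float[] out = new float[4];
  for (int i = 0; i < 4; i++) {
    float v = x4[i];
    float[] pattern = (v >= 0f) ? wPos[i] : wNeg[i];  // 2-way switch on (x < 0?)
    for (int j = 0; j < 4; j++) {
      out[j] += v * pattern[j];                       // forward project with intensity v
    }
  }
  return out;
}

In the two-ReLU view, input i contributes ReLU(x) times one pattern plus ReLU(-x) times the negation of the other, so the unit never discards the magnitude of a negative input.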

Once the width-4 layers have been computed, they are combined using another fast Walsh Hadamard transform.
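
Putting the pieces together (and reusing wht and switchUnit4 from the sketches above), one full Switch Net 4 layer could be read as:

float[] switchNet4Layer(float[] x, float[][][] wPos, float[][][] wNeg) {
  // Width-4 switch units over each group of 4 elements, then a fast WHT to
  // combine their outputs. wPos[g] and wNeg[g] are the 4x4 weight patterns of
  // group g. The 1/sqrt(n) scale could equally be folded into the weights,
  // as the scaling note later in the post suggests.
  int n = x.length;
  float[] out = new float[n];
  for (int g = 0; g < n / 4; g++) {
    float[] x4 = {x[4 * g], x[4 * g + 1], x[4 * g + 2], x[4 * g + 3]};
    float[] y4 = switchUnit4(x4, wPos[g], wNeg[g]);
    System.arraycopy(y4, 0, out, 4 * g, 4);
  }
  wht(out);
  float scale = 1f / (float) Math.sqrt(n);
  for (int i = 0; i < n; i++) out[i] *= scale;
  return out;
}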

The transform does 2 things.

First it is one-to-all. A change in a single input alters all the outputs. That makes it an excellent combining algorithm. It is a one-to-all connectionist device.

Second, it is a set of orthogonal statistical summary measures, where each switching decision and forward projected weight pattern alters those measures in a gradual way.
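
The one-to-all property is easy to check numerically with the wht sketch above: perturb a single input element and every output element changes (by exactly +1 or -1 here, since the transform is linear).

void oneToAllDemo() {
  // Perturb one input element and show that every WHT output changes.
  int n = 8;
  float[] a = new float[n];
  float[] b = new float[n];
  java.util.Random rng = new java.util.Random(1);
  for (int i = 0; i < n; i++) {
    a[i] = rng.nextFloat();
    b[i] = a[i];
  }
  b[3] += 1f;                    // change a single input element
  wht(a);
  wht(b);
  for (int i = 0; i < n; i++) {
    System.out.println(i + ": " + (b[i] - a[i]));  // every difference is +1 or -1
  }
}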

Since the width-4 neural layers already provide 4-way connectivity, you can truncate the later fast Walsh Hadamard transforms by 2 'layers' (butterfly stages).

Shortened Switch Net 4 (for net width 8), with the WHT truncated by 2 layers.

You can add layers by repeating the section between the red marks in the diagram.

When choosing the weights you have to include a scaling factor for the WHT: 1/sqrt(n) for the full WHT and 1/sqrt(n >>> 2), that is 1/sqrt(n/4), for the partial WHT.
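
A hedged sketch of the partial transform, under one plausible reading: skip the two smallest-stride butterfly stages, since each group of 4 adjacent elements has already been mixed by its width-4 layer, and scale by 1/sqrt(n >>> 2) as stated above.

void partialWht(float[] x) {
  // Fast WHT truncated by 2 'layers': the h = 1 and h = 2 butterfly stages are
  // skipped (assumed already covered by the width-4 switch units), so each
  // output mixes n/4 elements and the scale becomes 1/sqrt(n/4).
  int n = x.length;
  for (int h = 4; h < n; h *= 2) {   // a full WHT would start at h = 1
    for (int i = 0; i < n; i += 2 * h) {
      for (int j = i; j < i + h; j++) {
        float a = x[j];
        float b = x[j + h];
        x[j] = a + b;
        x[j + h] = a - b;
      }
    }
  }
  float scale = 1f / (float) Math.sqrt(n >>> 2);
  for (int i = 0; i < n; i++) x[i] *= scale;
}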

Some example code is linked in the comments below.

Of course you could choose a different width for the small-width layers, such as 2, 8 or 16.
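
The same pattern generalizes to other group widths; a sketch with an arbitrary group width k (2, 8, 16, ...) might look like this, with the matching partial WHT then skipping log2(k) stages and scaling by 1/sqrt(n/k) (an extrapolation on my part):

float[] switchLayerK(float[] x, float[][][] wPos, float[][][] wNeg, int k) {
  // Switch layer with group width k. Each input element owns two k-element
  // weight patterns, indexed wPos[group][i] and wNeg[group][i]; the sign of
  // the input selects which pattern is forward projected.
  int n = x.length;
  float[] out = new float[n];
  for (int g = 0; g < n; g += k) {
    for (int i = 0; i < k; i++) {
      float v = x[g + i];
      float[] pattern = (v >= 0f) ? wPos[g / k][i] : wNeg[g / k][i];
      for (int j = 0; j < k; j++) {
        out[g + j] += v * pattern[j];
      }
    }
  }
  return out;
}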

One metric for such networks is the number of switching decisions per parameter.
Conventional dense ReLU neural networks score poorly on that basis.
Switch Net 4 scores better and is easy to train.
Switch Net scores even better, but for technical reasons you may need to use a net with double or 4 times the input data width. Switch Net also seems to generalize better.
It is an engineering decision whether to use Switch Net 4 or Switch Net.
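
To make that metric concrete under the weight layout assumed in the sketches above (and only under that assumption): each width-4 unit holds 4 inputs times 2 patterns times 4 weights = 32 parameters and makes 4 switching decisions, so a Switch Net 4 switch layer makes roughly one switching decision per 8 parameters, whereas a dense width-n ReLU layer makes one per n parameters.

void decisionsPerParameter(int n) {
  // Rough comparison, using the counts from the sketches above rather than
  // an exact audit of any implementation.
  float switchNet4 = n / (8f * n);                 // n decisions, 8n weights per layer
  float denseRelu  = (float) n / ((float) n * n);  // n ReLUs, n*n weights
  System.out.println("Switch Net 4: " + switchNet4 + " decisions per parameter");
  System.out.println("Dense ReLU:   " + denseRelu + " decisions per parameter");
}
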
Switch Net:




Comments

  1. A Walsh Hadamard transform library for CPUs: https://github.com/FALCONN-LIB/FFHT

  2. Switch Net 4 with backpropagation. Written in Processing, a Java like programming language:
    https://archive.org/details/switchnet4bp

  3. Switch Net 4 with backpropagation in FreeBasic:
    https://archive.org/details/switch-net-4-bpfb
    FreeBasic is nearly C, with pointers etc. It should be easy to convert if you wanted to do that. There are also optional Linux AMD64 assembly language speed-ups included.

  4. Switch Net 4 in JavaScript (online editor) : https://editor.p5js.org/siobhan.491/sketches/sHU0dNmWk
