Switch Net 4 (Switch Net N): combine multiple low-width neural layers with a fast transform.


Random sign flipping before a fast Walsh Hadamard transform results in a random projection. For almost any input, the result is a Gaussian distributed output vector in which each output element contains knowledge of all the input elements.
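
As a rough sketch of that idea (illustrative Java, not tied to any particular library), a random projection is just a fixed pattern of random sign flips followed by an in-place fast Walsh Hadamard transform:

import java.util.Random;

public class RandomProjectionSketch {
    // In-place fast Walsh Hadamard transform (unnormalized). x.length must be a power of 2.
    static void wht(float[] x) {
        int n = x.length;
        for (int h = 1; h < n; h *= 2) {
            for (int i = 0; i < n; i += 2 * h) {
                for (int j = i; j < i + h; j++) {
                    float a = x[j], b = x[j + h];
                    x[j] = a + b;      // sum butterfly
                    x[j + h] = a - b;  // difference butterfly
                }
            }
        }
    }

    public static void main(String[] args) {
        int n = 8;
        float[] x = {1f, 0f, 0f, 2f, -1f, 3f, 0f, 0f};
        Random rng = new Random(1234);               // fixed seed -> a fixed, repeatable projection
        float scale = (float) (1.0 / Math.sqrt(n));  // 1/sqrt(n) keeps the vector length unchanged
        for (int i = 0; i < n; i++)
            x[i] *= rng.nextBoolean() ? scale : -scale;  // random sign flip (plus scaling)
        wht(x);                                           // now every output depends on every input
        for (float v : x) System.out.println(v);
    }
}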

With Switch Net 4 the output elements are 2-way switched, using (x < 0) as the switching predicate, where x is the input to the switch.

The switches are grouped together in units of 4, which together form a small width-4 neural layer. When a particular x >= 0, the selected pattern of weights is forward projected with intensity x. When x < 0, a different pattern of weights is selected and forward projected, again with intensity x (x being negative this time.)

If nothing were projected when x < 0, the situation would be identical to using a ReLU. You can view Switch Net 4 as using 2 ReLUs, one with input x and one with input -x.
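
A minimal Java sketch of one width-4 unit, assuming each of the 4 inputs owns two 4-element weight patterns (the exact parameter layout is an implementation choice, the names here are just illustrative):

public class SwitchUnit4Sketch {
    // One width-4 switched unit: 4 inputs in, 4 outputs out.
    // wPos[i] and wNeg[i] are the two 4-element weight patterns owned by input i.
    static float[] switchUnit4(float[] in, float[][] wPos, float[][] wNeg) {
        float[] out = new float[4];
        for (int i = 0; i < 4; i++) {
            float x = in[i];
            float[] w = (x >= 0f) ? wPos[i] : wNeg[i];  // 2-way switch on the sign of x
            for (int k = 0; k < 4; k++)
                out[k] += x * w[k];                     // forward project with intensity x
        }
        // Equivalently out[k] += relu(x) * wPos[i][k] - relu(-x) * wNeg[i][k]:
        // two ReLUs, one fed x and one fed -x, with the sign absorbed into the learned weights.
        return out;
    }

    public static void main(String[] args) {
        float[] in = {0.5f, -1f, 2f, -0.25f};
        // Illustrative weight patterns; in practice these are learned.
        float[][] wPos = {{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}};
        float[][] wNeg = {{0, 0, 0, 1}, {0, 0, 1, 0}, {0, 1, 0, 0}, {1, 0, 0, 0}};
        for (float v : switchUnit4(in, wPos, wNeg)) System.out.println(v);
    }
}

With wNeg fixed at zero the unit reduces to 4 ordinary ReLU neurons, which is the comparison made above.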

The reason to 2-way switch (or use a +- ReLU pair) is to avoid early loss of information about the input before the net has fully processed it, and to allow an output to be constructed through multiple layers without the information bottlenecks caused by a ReLU blocking entirely when x < 0.

Once the width-4 layers have been computed, they are combined using another fast Walsh Hadamard transform.

The transform does 2 things.

First, it is one-to-all: a change in a single input alters all the outputs. That makes it an excellent combining algorithm, a one-to-all connectionist device.

Second, it computes a set of orthogonal statistical summary measures, where each switching decision and forward-projected pattern alters those measures in a gradual way.

Since the width-4 neural layers already provide 4-way connectivity, you can truncate the later fast Walsh Hadamard transforms by 2 'layers' (butterfly stages.)
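
In code, truncating by 2 layers just means skipping the two smallest butterfly strides, assuming each width-4 unit acts on 4 consecutive elements (a sketch, not a fixed layout):

public class TruncatedWhtSketch {
    // Fast Walsh Hadamard transform that skips the first `skip` butterfly stages.
    // skip = 0 gives the full transform; skip = 2 omits the strides (1 and 2) that would
    // only mix elements inside each consecutive group of 4, which the width-4 switched
    // units have already mixed.
    static void whtTruncated(float[] x, int skip) {
        int n = x.length;
        for (int h = 1 << skip; h < n; h *= 2) {
            for (int i = 0; i < n; i += 2 * h) {
                for (int j = i; j < i + h; j++) {
                    float a = x[j], b = x[j + h];
                    x[j] = a + b;
                    x[j + h] = a - b;
                }
            }
        }
    }

    public static void main(String[] args) {
        float[] x = {1f, -2f, 0.5f, 3f, -1f, 0f, 2f, -0.5f};
        whtTruncated(x, 2);   // for n = 8 this leaves a single butterfly stage (stride 4)
        for (float v : x) System.out.println(v);
    }
}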

Shortened Switch Net 4 (for net width 8), with the WHT truncated by 2 layers.

You can add layers by repeating the section between the red marks.

When choosing the weights you have to include a scaling factor for the WHT: 1/sqrt(n) for the full WHT and 1/sqrt(n >>> 2) for the partial WHT.

Some example code is linked in the comments below.
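
For illustration here, a rough, self-contained Java sketch of one forward pass through a shortened width-8 Switch Net 4 layer (assuming consecutive groups of 4 and untrained random weights, so purely a sketch rather than the linked code):

import java.util.Random;

public class SwitchNet4LayerSketch {
    // Unnormalized in-place fast WHT starting at butterfly stride startH.
    // startH = 1 is the full transform, startH = 4 is the transform truncated by 2 layers.
    static void wht(float[] x, int startH) {
        int n = x.length;
        for (int h = startH; h < n; h *= 2) {
            for (int i = 0; i < n; i += 2 * h) {
                for (int j = i; j < i + h; j++) {
                    float a = x[j], b = x[j + h];
                    x[j] = a + b;
                    x[j + h] = a - b;
                }
            }
        }
    }

    public static void main(String[] args) {
        int n = 8;                         // net width (a power of 2)
        Random rng = new Random(42);

        // Fixed random sign flips for the initial random projection,
        // folded together with the 1/sqrt(n) scaling of the full WHT.
        float fullScale = (float) (1.0 / Math.sqrt(n));
        float[] signs = new float[n];
        for (int i = 0; i < n; i++) signs[i] = rng.nextBoolean() ? fullScale : -fullScale;

        // Two 4-element weight patterns per input element, scaled by 1/sqrt(n >>> 2)
        // because the WHT that follows the switched units is truncated by 2 layers.
        float partialScale = (float) (1.0 / Math.sqrt(n >>> 2));
        float[][] wPos = new float[n][4], wNeg = new float[n][4];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < 4; k++) {
                wPos[i][k] = (float) rng.nextGaussian() * partialScale;
                wNeg[i][k] = (float) rng.nextGaussian() * partialScale;
            }

        float[] x = {1f, -2f, 0.5f, 3f, -1f, 0f, 2f, -0.5f};

        // 1. Random projection: sign flip (with scaling) then full WHT.
        for (int i = 0; i < n; i++) x[i] *= signs[i];
        wht(x, 1);

        // 2. Width-4 switched layers over consecutive groups of 4 elements.
        float[] y = new float[n];
        for (int g = 0; g < n; g += 4)
            for (int i = 0; i < 4; i++) {
                float xi = x[g + i];
                float[] w = (xi >= 0f) ? wPos[g + i] : wNeg[g + i]; // 2-way switch
                for (int k = 0; k < 4; k++) y[g + k] += xi * w[k];  // project with intensity x
            }

        // 3. Combine with a WHT truncated by 2 layers (the groups of 4 are already mixed).
        wht(y, 4);

        // To add more layers, repeat steps 2 and 3 on y.
        for (float v : y) System.out.println(v);
    }
}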

Of course, you could choose a different width for the small-width layers, such as 2, 8 or 16.

One metric for such networks is the number of switching decisions per parameter.
Conventional dense ReLU neural networks score poorly on that basis.
Switch Net 4 scores better and is easy to train.
Switch Net scores even better, but for technical reasons you may need to use a net with double or 4 times the input data width. Switch Net also seems to generalize better.
It is an engineering decision whether to use Switch Net 4 or Switch Net.
Switch Net:




Comments

  1. A Walsh Hadamard transform library for CPUs: https://github.com/FALCONN-LIB/FFHT

  2. Switch Net 4 with backpropagation. Written in Processing, a Java like programming language:
    https://archive.org/details/switchnet4bp

  3. Switch Net 4 with backpropagation in FreeBasic:
    https://archive.org/details/switch-net-4-bpfb
    FreeBasic is nearly C, with pointers etc. It should be easy to convert if you wanted to do that. There are also optional Linux AMD64 assembly language speed-ups included.

  4. Switch Net 4 in JavaScript (online editor): https://editor.p5js.org/siobhan.491/sketches/sHU0dNmWk
