Switch Net 4 - reducing the cost of a neural network layer.


The layers in a fully connected artificial neural network don't scale nicely with width.

Width of a neural network.

For width n, the number of multiply-add operations required per layer is n squared. For a layer of width 256 that is 65536 (256*256) multiply-adds. Even with modern hardware, layers cannot be made much wider than that.

Or can they? 

A layer of width 2 only requires 4 operations, width 4 only 16, width 8 only 64. If you could combine k narrow layers of width n into a single much wider layer, you'd end up with a computational advantage.

For example, 64 width-4 layers combined into a width-256 layer would cost 64*16 = 1024 multiply-add operations plus the combining cost.
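As a quick check on the arithmetic, here is a small Python sketch comparing the two counts (the helper names are just illustrative; the combining cost is worked out below):

# Multiply-add cost of one dense layer of width n.
def dense_cost(n):
    return n * n

# Multiply-add cost of n//b independent width-b blocks (combining cost excluded).
def block_cost(n, b):
    return (n // b) * b * b

print(dense_cost(256))     # 65536
print(block_cost(256, 4))  # 1024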

A combining algorithm.

The fast Walsh-Hadamard transform (WHT) can be used as a combiner because a change in any single input causes all of the outputs to vary.


The combining cost is n*log2(n) add-subtract operations, so for a layer of width 256 the combining cost is 2048 add-subtracts. In practice you don't need the full WHT: the first log2(4) = 2 butterfly stages only mix values within each width-4 block, and the dense 4x4 weight matrices can absorb that mixing, so starting from width-4 layers the combining cost drops to n*(log2(n)-2) operations. For a final layer of width 256 (base width 4) that is 256*6 = 1536 combining add-subtracts; adding the 1024 adds and 1024 multiplies done inside the width-4 layers gives a total of 2560 add-subtract operations and 1024 multiply operations, compared to 65536 multiply-add operations for a normal width-256 layer.
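For reference, here is a minimal Python sketch of the in-place fast Walsh-Hadamard transform, plus the truncated version used as the combiner (the function names are my own, not taken from the linked code):

def fwht(x):
    # In-place fast Walsh-Hadamard transform; len(x) must be a power of 2.
    # Cost: n*log2(n) add-subtract operations.
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def combine(x, base=4):
    # Partial WHT: skip the first log2(base) stages, since the dense
    # width-base blocks already mix values within each block.
    # Cost: n*(log2(n) - log2(base)) add-subtract operations.
    n = len(x)
    h = base
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

# A change in one input reaches every output:
print(fwht([1.0, 0, 0, 0, 0, 0, 0, 0]))  # all eight outputs become 1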

The wider the layer, the more favorable the comparison becomes.

WHT example: http://md2020.eu5.org/wht1.html
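Putting the pieces together, here is a hedged Python sketch (using numpy, and not the author's actual code) of the combining step before any activation: 64 independent 4x4 weight matrices act on contiguous blocks of the input, and the remaining WHT stages then spread each block's contribution across the whole output.

import numpy as np

def combined_layer(x, blocks):
    # x: input vector of length n (a power of 2, here 256).
    # blocks: array of shape (n//4, 4, 4), one dense 4x4 weight matrix per block.
    # Cost: (n//4)*16 multiply-adds plus n*(log2(n)-2) add-subtracts.
    n = len(x)
    # Each width-4 layer processes its own slice of the input.
    y = np.concatenate([blocks[i] @ x[4 * i:4 * i + 4] for i in range(n // 4)])
    # Remaining WHT butterfly stages (stride 4, 8, ..., n/2) combine the blocks.
    h = 4
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    return y

rng = np.random.default_rng(0)
out = combined_layer(rng.standard_normal(256), rng.standard_normal((64, 4, 4)))
print(out.shape)  # (256,)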

The actual Switch Net 4 algorithm uses "2 Siding ReLU via its forward connections" as the activation function (or, more accurately, activation scheme).

See: 2 Siding ReLU
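The linked post describes the scheme itself; as a rough, hypothetical sketch of the general idea only (not the exact Switch Net 4 activation), each value can be forwarded through one of two weights depending on its sign, so the layer switches between two linear responses instead of discarding negative values the way plain ReLU does:

import numpy as np

def two_sided_relu_forward(x, w_pos, w_neg):
    # Hypothetical reading of a 2-sided ReLU acting through its forward
    # connections: pick one of two forward weights based on the sign of x.
    return np.where(x >= 0.0, w_pos * x, w_neg * x)

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(two_sided_relu_forward(x, 1.0, 0.0))  # plain ReLU is the special case w_neg = 0
print(two_sided_relu_forward(x, np.ones(5), 0.1 * np.ones(5)))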

An early example of Switch Net 4 is here:

https://discourse.processing.org/t/switch-net-4-neural-network/33220

A somewhat later version:

https://archive.org/details/swnet

A version using subrandom projections for pre- and post-processing:

https://archive.org/details/switchnet4_new

Further Information:

https://archive.org/details/afrozenneuralnetwork

https://archive.org/details/whtebook-archive

https://archive.org/details/zero-curvatue

https://www.kdnuggets.com/2021/07/wht-simpler-fast-fourier-transform-fft.html






Comments

  1. Online JavaScript version of Switch Net 4: https://editor.p5js.org/congchuatocmaydangyeu7/sketches/IIZ9L5fzS
    And for comparison Switch Net Wide:
    https://editor.p5js.org/congchuatocmaydangyeu7/sketches/7ekZTwQMF


