Switch Net 4 - reducing the cost of a neural network layer.
The layers in a fully connected artificial neural network scale poorly with width.
(Image: width of a neural network.)
For width n, the number of multiply-add operations required per layer is n squared. For a layer of width 256 that is 65536 (256*256) multiply-adds. Even with modern hardware, layers cannot be made much wider than that.
Or can they?
A layer of width 2 requires only 4 operations, width 4 only 16, width 8 only 64. If you could combine k width-n layers (n being small) into a new, much wider layer, you would end up with a computational advantage.
For example, 64 width-4 layers combined into a width-256 layer would cost 64*16 = 1024 multiply-add operations plus the combining cost.
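To make the block step concrete, here is a minimal Java sketch (hypothetical code, not taken from the linked implementations) of applying k independent 4x4 weight blocks to a vector of width n = 4k:

    final class BlockLayer {
        // Apply k independent 4x4 weight blocks to a vector of width n = 4k.
        // w[b] is block b's flattened 4x4 matrix, so the total cost is
        // k*16 multiply-adds rather than the n*n a dense layer would need.
        static void apply4x4Blocks(float[] x, float[][] w, float[] out) {
            int k = x.length / 4;
            for (int b = 0; b < k; b++) {
                int o = 4 * b; // offset of this block's 4 inputs/outputs
                for (int r = 0; r < 4; r++) {
                    float sum = 0f;
                    for (int c = 0; c < 4; c++) {
                        sum += w[b][4 * r + c] * x[o + c];
                    }
                    out[o + r] = sum;
                }
            }
        }
    }

On its own this is not a useful layer, because information never crosses block boundaries; that is what the combiner is for.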
A combining algorithm.
The fast Walsh-Hadamard transform (WHT) can be used as a combiner because a change in any single input causes all of the outputs to vary.
The combining cost is n*log2(n) add-subtract operations. For a layer of width 256 the combining cost is 2048 add-subtract operations. In practice you don't need a full WHT: the width-4 layers already mix within groups of 4, so the first two transform stages can be skipped and the combining cost falls to n*(log2(n)-2) operations. For a final layer of width 256 (base width 4) that gives 1536 add-subtract operations plus 1024 multiply-add operations, 2560 operations in total, compared to 65536 multiply-add operations for a normal width-256 layer.
The greater the width, the more the comparison favors this scheme.
WHT example: http://md2020.eu5.org/wht1.html
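For reference, here is a minimal in-place fast WHT in Java (a standard unnormalized formulation, assuming the array length is a power of 2):

    final class FastWHT {
        // In-place, unnormalized fast Walsh-Hadamard transform.
        // data.length must be a power of 2; cost is n*log2(n) add-subtracts.
        static void wht(float[] data) {
            int n = data.length;
            for (int h = 1; h < n; h *= 2) {         // log2(n) stages
                for (int i = 0; i < n; i += 2 * h) { // butterfly groups
                    for (int j = i; j < i + h; j++) {
                        float a = data[j];
                        float b = data[j + h];
                        data[j] = a + b;     // sum
                        data[j + h] = a - b; // difference
                    }
                }
            }
        }
    }

Changing any single entry of data changes every output, which is exactly the mixing property the combiner needs. Starting the outer loop at h = 4 instead of h = 1 gives the partial transform with the n*(log2(n)-2) cost mentioned above.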
The actual Switch Net 4 algorithm uses "2 Siding ReLU via its forward connections" as the activation function (really an activation scheme rather than a single function).
See: 2 Siding ReLU
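Putting the pieces together, here is a hedged sketch of one full layer in Java. The sign-switched weights are one plausible reading of "2 Siding ReLU via its forward connections": each connection carries two weights, and the sign of the incoming value selects which one is applied. Treat the names and details as assumptions rather than the definitive Switch Net 4 implementation:

    final class SwitchNet4Sketch {
        // One layer of width n = 4k: k width-4 blocks with sign-switched
        // weights, combined by a partial Walsh-Hadamard transform.
        // wPos[b] and wNeg[b] are flattened 4x4 matrices; for input column c
        // the weight comes from wPos when x[o+c] >= 0 and from wNeg otherwise
        // (an assumed reading of the 2-sided activation scheme).
        static void layer(float[] x, float[][] wPos, float[][] wNeg, float[] out) {
            int k = x.length / 4;
            for (int b = 0; b < k; b++) {
                int o = 4 * b;
                for (int r = 0; r < 4; r++) {
                    float sum = 0f;
                    for (int c = 0; c < 4; c++) {
                        float v = x[o + c];
                        float w = v >= 0f ? wPos[b][4 * r + c] : wNeg[b][4 * r + c];
                        sum += w * v;
                    }
                    out[o + r] = sum;
                }
            }
            partialWHT(out); // mix the blocks so every output sees every input
        }

        // Partial WHT: the width-4 blocks already mix within groups of 4,
        // so start at stage h = 4, costing n*(log2(n)-2) add-subtracts.
        static void partialWHT(float[] data) {
            int n = data.length;
            for (int h = 4; h < n; h *= 2) {
                for (int i = 0; i < n; i += 2 * h) {
                    for (int j = i; j < i + h; j++) {
                        float a = data[j];
                        float b = data[j + h];
                        data[j] = a + b;
                        data[j + h] = a - b;
                    }
                }
            }
        }
    }

For width 256 this sketch performs 64*16 = 1024 multiply-adds in the block step and 256*6 = 1536 add-subtracts in the combiner, matching the counts given earlier.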
An early example of Switch Net 4 is here:
https://discourse.processing.org/t/switch-net-4-neural-network/33220
A somewhat later version:
https://archive.org/details/swnet
A version using subrandom projections for pre and post-processing:
https://archive.org/details/switchnet4_new
Further Information:
https://archive.org/details/afrozenneuralnetwork
https://archive.org/details/whtebook-archive
https://archive.org/details/zero-curvatue
https://www.kdnuggets.com/2021/07/wht-simpler-fast-fourier-transform-fft.html
Online JavaScript version of Switch Net 4: https://editor.p5js.org/congchuatocmaydangyeu7/sketches/IIZ9L5fzS
And for comparison, Switch Net Wide:
https://editor.p5js.org/congchuatocmaydangyeu7/sketches/7ekZTwQMF