
Showing posts from March, 2023

Switch Net 4 - reducing the cost of a neural network layer.

The layers in a fully connected artificial neural network don't scale nicely with width. For width n, the number of multiply-add operations required per layer is n squared. For a layer of width 256 that is 65536 (256*256) multiply-adds. Even with modern hardware, layers cannot be made much wider than that. Or can they?

A layer of width 2 only requires 4 operations, width 4 only 16, width 8 only 64. If you could combine k width-n layers (n being small) into a new, much wider layer, you'd end up with a computational advantage. For example, 64 width-4 layers combined into a width-256 layer would cost 64*16=1024 multiply-add operations plus the combining cost.

A combining algorithm: the fast Walsh Hadamard transform can be used as a combiner because a change in a single input causes all the outputs to vary. The combining cost is n*log2(n) add-subtract operations. For a layer of width 256 the combining cost is 2048…
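The excerpt above describes the scheme well enough to sketch. Below is a minimal NumPy illustration, not the post's actual implementation: 64 independent width-4 dense blocks followed by a fast Walsh-Hadamard transform that mixes them into a width-256 output. The function names, block size, and random weights are illustrative assumptions.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform of a length-2^k vector.
    Costs n*log2(n) add/subtract operations."""
    x = x.copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def switch_net_layer(x, block_weights):
    """Apply k independent width-n dense blocks, then mix the results with
    the fast Walsh-Hadamard transform so every output depends on every input.
    x has width k*n; block_weights has shape (k, n, n)."""
    k, n, _ = block_weights.shape
    blocks = x.reshape(k, n)
    # k small matrix-vector products: k*n*n multiply-adds in total
    locally_mixed = np.einsum('kij,kj->ki', block_weights, blocks).reshape(-1)
    # global combining step: (k*n)*log2(k*n) adds/subtracts
    return fwht(locally_mixed)

# Example: 64 width-4 blocks -> a width-256 layer.
# Multiply-adds: 64*16 = 1024, plus 256*8 = 2048 adds/subtracts for combining,
# versus 256*256 = 65536 multiply-adds for a plain fully connected layer.
rng = np.random.default_rng(0)
x = rng.standard_normal(256)
W = rng.standard_normal((64, 4, 4))
y = switch_net_layer(x, W)
print(y.shape)  # (256,)
```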

GPT-4 Songs

And perhaps talking directly to OpenAI…

2 Siding ReLU via Forward Projections

The ReLU activation function in neural networks has special properties that allow it to be considered in a different way from other activation functions.

The switching viewpoint: you can view ReLU as non-conducting when x<=0 and fully conducting when x>0. It is a switch which is automatically turned on when x>0 and automatically turned off when x<=0, which is an ideal rectifier in electrical engineering terms. Hence the term Rectified Linear Unit. A switch isn't there when it is on, from the point of view of the flowing electricity (or analogous thing.) Electricity flows through pushed-together switch contacts the same as through the wires to the switch. All the ReLUs in a neural network that are conducting wire together various sub-components of the network. The wiring being complete, the ReLUs become essentially invisible, until one or more ReLUs change state. Since neural networks are computed in a feedforward fashion…
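To make the switching viewpoint concrete, here is a small NumPy sketch (an illustration of the idea, not code from the post): for a fixed input, the conducting ReLUs simply pass their values through, so the two weight matrices collapse into one effective linear map until a switch changes state. The sizes and random weights are assumptions for the example.

```python
import numpy as np

# ReLU written explicitly as a switch: multiply by 1 when conducting
# (x > 0) and by 0 when not.
def relu_as_switch(x):
    return x * (x > 0)

rng = np.random.default_rng(1)
W1 = rng.standard_normal((8, 8))
W2 = rng.standard_normal((8, 8))
x = rng.standard_normal(8)

h = W1 @ x
switch_state = (h > 0).astype(float)   # which switches are on for this input
y = W2 @ (h * switch_state)            # ordinary forward pass through ReLU

# With the switch states frozen, the conducting ReLUs are "invisible":
# they just wire W1 to W2, giving a single effective linear map.
effective_matrix = W2 @ (switch_state[:, None] * W1)
y_linear = effective_matrix @ x

print(np.allclose(y, y_linear))  # True: same output for this switch pattern
```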