2 Siding ReLU via Forward Projections

The ReLU activation function in neural networks has special properties that allow it to be viewed differently from other activation functions.

The Switching Viewpoint

You can view ReLU as non-conducting when x<=0 and fully conducting when x>0.
It is a switch that turns on automatically when x>0 and off automatically when x<=0, which is an ideal rectifier in electrical engineering terms.
Hence the term Rectified Linear Unit.
From the point of view of the flowing electricity (or its analogue here), a switch that is on isn't there at all: current flows through closed switch contacts just as it flows through the wires leading to the switch.
All the ReLUs in a neural network that are conducting wire together various sub-components of the network.
Once the wiring is complete the ReLUs become essentially invisible, until one or more of them change state.
Since neural networks are computed in a feedforward fashion, the wiring together of weighted sums happens in a feedforward way. In the layers already computed, wired-together structures of weighted sums exist. In the current layer, switching decisions are made based on those preexisting structures of weighted sums acting on the input vector.
Those wired-together structures of weighted sums can be simplified to a single weighted sum using linear algebra, if you want to examine the network's response to a particular input.
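
To make that concrete, here is a small plain-JavaScript illustration (the two-layer shape and all the numbers are my own arbitrary example, not taken from the post): for one particular input, each ReLU is read off as an on/off switch, and the wired-together weighted sums collapse into a single effective weight matrix that reproduces the network's output exactly.

// Illustration: y = W2 * relu(W1 * x), with relu applied elementwise.
const W1 = [[ 2.0, -1.0],
            [-1.0,  0.5]];
const W2 = [[ 0.5,  1.0],
            [ 2.0, -1.0]];

const matVec = (M, v) => M.map(row => row.reduce((s, w, j) => s + w * v[j], 0));
const relu   = v => v.map(z => Math.max(0, z));

const x = [1.0, 0.5];

// 1) Ordinary feedforward computation.
const h = matVec(W1, x);          // pre-activations of the hidden layer
const y = matVec(W2, relu(h));    // network output

// 2) Read off the switch states for this particular input...
const gates = h.map(z => (z > 0 ? 1 : 0));

// ...and fold them into a single effective matrix Weff = W2 * diag(gates) * W1.
const Weff = W2.map(row2 =>
  W1[0].map((_, j) =>
    row2.reduce((s, w2k, k) => s + w2k * gates[k] * W1[k][j], 0)
  )
);
const yLinear = matVec(Weff, x);

console.log(y, yLinear); // same output: for this input the switched network is a single weighted sum

Change the input enough and some of the gates flip, giving a different effective matrix; between flips the network behaves as pure linear algebra.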

2 Siding ReLU


(Diagram: five forward-connected weights from neuron N to the next layer.)

Consider the ReLU neuron N in the diagram. The output of N is forward-connected to the next layer by 5 weights, one for each neuron in the next layer.
When the value x (the input to the ReLU function of neuron N) is greater than zero, the pattern of the forward-connected weights is projected into the next layer with intensity x.
When x<=0 nothing is projected into the next layer. The neuron N is then in a non-conducting state and entirely blocks any information trying to flow through it.
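
As a sketch of that projection view (the 5 weight values and the function name below are just illustrative assumptions):

// Illustrative only: neuron N's contribution to the next layer under
// ordinary single-sided ReLU, for an arbitrary example weight pattern.
const forwardWeights = [0.4, -1.2, 0.9, 0.1, -0.3];

function projectSingleSided(x, weights) {
  // x > 0: the weight pattern is projected into the next layer, scaled by x.
  // x <= 0: nothing gets through; the information in x is blocked.
  return x > 0 ? weights.map(w => w * x) : weights.map(() => 0);
}

console.log(projectSingleSided( 2.0, forwardWeights)); // the pattern, at intensity 2
console.log(projectSingleSided(-1.5, forwardWeights)); // all zeros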

Why not have an alternative set of forward-connected weights for when x<=0, and project that different pattern into the next layer with intensity x (x being negative this time)?
(Diagram: an alternative set of 5 forward-connected weights, used when x<=0.)
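
A sketch of the 2-sided variant under the same illustrative assumptions, with a second (equally hypothetical) weight pattern for the negative side:

// Illustrative only: one weight pattern is projected when x > 0 and an
// alternative pattern when x <= 0, in both cases scaled by x itself.
const weightsPositive = [0.4, -1.2,  0.9, 0.1, -0.3]; // used when x > 0
const weightsNegative = [1.1,  0.2, -0.7, 0.5,  0.8]; // used when x <= 0

function projectTwoSided(x, wPos, wNeg) {
  const pattern = x > 0 ? wPos : wNeg;
  return pattern.map(w => w * x); // intensity x (negative on the negative side)
}

console.log(projectTwoSided( 2.0, weightsPositive, weightsNegative)); // positive-side pattern * 2.0
console.log(projectTwoSided(-1.5, weightsPositive, weightsNegative)); // negative-side pattern * -1.5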

Projecting on both sides prevents the information loss caused by ReLUs impeding the flow of information through the network, including the early loss of information before it can be processed.
It also makes the cost landscape during optimization far less rough than when single-sided ReLUs are used, which can greatly help many neural network training/optimization algorithms.

Here is an example of 2 Siding ReLU using the Processing p5.js library. You get a local copy of the code, which you can interact with.
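
A minimal p5.js sketch along these lines (a stand-in illustration with made-up weights, not the original interactive demo) might look like this: move the mouse horizontally to sweep x from -1 to 1 and watch which of the two weight patterns gets projected into the next layer.

// Stand-in illustration. Move the mouse horizontally to sweep x from -1 to 1.
// When x > 0 one weight pattern is projected into the next layer; when x <= 0
// the alternative pattern is projected instead.
const wPos = [0.4, -1.2,  0.9, 0.1, -0.3]; // forward weights for x > 0
const wNeg = [1.1,  0.2, -0.7, 0.5,  0.8]; // alternative weights for x <= 0

function setup() {
  createCanvas(400, 300);
}

function draw() {
  background(240);
  const x = map(mouseX, 0, width, -1, 1);     // input to the ReLU of neuron N
  const pattern = x > 0 ? wPos : wNeg;        // which weight pattern is selected
  const projection = pattern.map(w => w * x); // projected with intensity x

  noStroke();
  fill(0);
  text('x = ' + x.toFixed(2) + (x > 0 ? '  (positive side)' : '  (negative side)'), 10, 20);

  // One bar per neuron in the next layer, drawn about a zero line at y = 150.
  for (let i = 0; i < projection.length; i++) {
    const bx = 60 + i * 60;
    const len = projection[i] * 100;          // signed bar length in pixels
    stroke(0);
    line(bx, 150, bx + 40, 150);
    noStroke();
    fill(x > 0 ? 'steelblue' : 'indianred');
    rect(bx, len >= 0 ? 150 - len : 150, 40, Math.abs(len));
  }
}

With the p5.js library loaded, this runs in the p5.js web editor or any page that includes p5.js.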




The end.