ReLU as a Switch
The conventional view - ReLU as a function
ReLU as a function.
ReLU as a Switch
A slight confusion in computer science is the idea that switches are purely digital devices, an impression that comes from switching gates and the like. It only appears that way because the supply voltage is fixed, and that fixed voltage is the only thing being switched.
However, a light switch in your house is binary on/off, yet it connects or disconnects an AC sine-wave voltage. A switch, therefore, is a mixed digital-analog device.
When on, the input signal passes through 1-to-1. When off, the output is zero.
You can then view ReLU as a switch that passes its input 1-to-1 when on and outputs zero when off. Of course, there is a switching decision to make.
In your house you make the switching decision for a light yourself. With ReLU the switching decision is based on the predicate (x >= 0), where x is the input to the switch.
You could supply other predicates for the switching decision if you wanted, but switching at x = 0 is very helpful for optimization.
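A minimal sketch of this view in NumPy (the function name relu_as_switch and the predicate argument are purely illustrative, not from any library): the switching predicate decides whether the input passes through 1-to-1 or the output is zeroed, and the default predicate (x >= 0) recovers ordinary ReLU.

```python
import numpy as np

def relu_as_switch(x, predicate=lambda v: v >= 0):
    """ReLU viewed as a switch: where the predicate is true the input
    passes through 1-to-1, elsewhere the output is zero."""
    return np.where(predicate(x), x, 0.0)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu_as_switch(x))    # same values as the standard formulation below
print(np.maximum(x, 0.0))   # ordinary ReLU for comparison
```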
A ReLU network with all the switches open. Pa, Pb and Pc are the switching predicates.
A ReLU network with one switch closed.
In the diagram with one switch closed you can see some weighted sums connect together. The switching decision Pb is based on a simple weighted sum of the inputs. The outputs are weighted sums (each with only one connected input in this case) of a weighted sum.
Weighted sums of weighted sums can be simplified via linear algebra to an equivalent simple weighted sum.
Each switching decision in a layer of a ReLU neural network is based on some equivalent simple weighted sum of the input values, though of course the previous layers have decided what that simple weighted sum is via their switching decisions.
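A small sketch of that collapse, assuming a toy 3-input, 4-hidden-unit, 2-output network with random weights (the names W1, W2 and mask are illustrative): once the switching decisions for a given input are fixed, the two layers of weighted sums reduce to one equivalent weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))    # first-layer weights (4 hidden units, 3 inputs)
W2 = rng.normal(size=(2, 4))    # second-layer weights (2 outputs, 4 hidden units)
x  = rng.normal(size=3)         # one input vector

h    = W1 @ x                   # hidden weighted sums
mask = (h >= 0).astype(float)   # switching decisions (x >= 0 predicate per unit)
y    = W2 @ (mask * h)          # ReLU network output for this input

# With the switch states fixed, the weighted sums of weighted sums
# collapse to a single equivalent weighted sum of the raw inputs.
W_eq = W2 @ np.diag(mask) @ W1
print(np.allclose(y, W_eq @ x))  # True
```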
A ReLU neural network with 2 switches closed.
You can also notice some other things, such as the fact that each switch whose output is x projects a pattern of intensity x onto the next layer through its forward-connected weights. Of course, when the switch is open, x = 0 and nothing is projected forward.
You can ask whether the switch should actually be 2-way rather than 1-way, and have it project an alternative pattern, through an alternative set of weights, of intensity x when x < 0.
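A hedged sketch of such a hypothetical 2-way switch layer (W_pos and W_neg are assumed, illustrative weight matrices, not a standard construction): each unit projects intensity x through one set of forward weights when x >= 0 and through an alternative set when x < 0. Setting W_neg to zero recovers the ordinary 1-way ReLU switch, while setting W_neg equal to W_pos gives a plain linear layer.

```python
import numpy as np

def two_way_switch_layer(h, W_pos, W_neg):
    """Hypothetical 2-way switch: unit i projects h[i] * W_pos[:, i]
    onto the next layer when h[i] >= 0, and the alternative pattern
    h[i] * W_neg[:, i] when h[i] < 0."""
    pos = np.where(h >= 0, h, 0.0)
    neg = np.where(h < 0, h, 0.0)
    return W_pos @ pos + W_neg @ neg

rng = np.random.default_rng(1)
h     = rng.normal(size=4)         # hidden-unit values feeding the switches
W_pos = rng.normal(size=(2, 4))    # forward weights used when h[i] >= 0
W_neg = rng.normal(size=(2, 4))    # alternative weights used when h[i] < 0
print(two_way_switch_layer(h, W_pos, W_neg))
```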
You can also view max-pooling as a form of switching.
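In that view, max-pooling closes exactly one switch per pooling window: the largest input is connected through 1-to-1 and the rest are disconnected. A minimal sketch, assuming NumPy (maxpool_as_switch is an illustrative name):

```python
import numpy as np

def maxpool_as_switch(window):
    """Max-pooling as switching: the largest input in the window is
    switched through 1-to-1, the other inputs are switched off."""
    mask = np.zeros_like(window)
    mask[np.argmax(window)] = 1.0   # the switching decision
    return mask @ window            # equals window.max()

w = np.array([0.3, -1.2, 2.5, 0.7])
print(maxpool_as_switch(w), w.max())   # both print 2.5
```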
What a fun perspective! Some people may think of ReLU as a diode, if x is the electric current...