Kenshō - The Artificial Neurons

The functioning of a biological neuron

Dendrites extend from the neuron's cell body much like the branches of a tree extend from its trunk. Unlike tree branches, however, the axon terminals of one neuron connect to the dendrites of other neurons at junctions called synapses.

Signals are received as inputs through these synapses; each input either excites (fires/activates) the cell or inhibits its firing. The synaptic strength of each connection determines how strongly its input contributes to the cell's excitation.

The product of each input and its synaptic strength is summed in the cell body. If this cumulative excitation exceeds a threshold value, the cell fires (is activated) and sends the signal down its axon to other neurons.

Biological neurons and their analogous artificial counterparts

[Figure: a labelled biological neural network and its artificial neural network analogue]

| Biological neuron part | Analogous artificial neuron part or function |
| --- | --- |
| synapse via axon | source of input or destination of output |
| synaptic strength | weight (or strength) of each input for activation |
| cell body | summation block, where the cumulative excitation is computed and compared to the threshold |

The Artificial Neuron Architecture

designed to mimic the first-order characteristics of a biological neuron

A set of inputs is applied, each input representing the output of the neuron it comes from. Each input is multiplied by its corresponding weight (the synaptic strength), and the products are then summed.

You can read The Importance of Synaptic Strength to understand why it is needed, via a real-life analogy from sport.
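
Before formalising this with matrices, here is a minimal sketch of a single artificial neuron in plain Python; the inputs, weights, and threshold are illustrative values only, not taken from any dataset:

inputs = [1.0, 0.0, 1.0]   # outputs of upstream neurons
weights = [0.4, 0.9, 0.3]  # synaptic strengths
threshold = 0.5

# cumulative excitation in the "cell body"
NET = sum(x * w for x, w in zip(inputs, weights))

# fire only if the excitation exceeds the threshold
OUT = 1 if NET > threshold else 0
print(NET, OUT)  # 0.7 1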

Computational Layer

Summation Function or Summation Block

Let a set of inputs $x_{1}, x_{2}, ..., x_{n}$ be represented as a vector $X$, associated with a set of weights $w_{1}, w_{2}, ..., w_{n}$ represented as a vector $W$. Each input is multiplied by its corresponding weight; that is, each element of $X$ is multiplied by the corresponding element of $W$. These products are summed algebraically in the summation block ($∑$), producing an output called NET. In vector notation:

$NET = XW$

Matrix Multiplication

The weight matrix $W$ has $m$ rows and $n$ columns, where $m$ is the number of inputs and $n$ is the number of neurons. Hence, the weight connecting the third input to the second neuron is written $w_{3,2}$. Under this matrix multiplication, $X$ and $NET$ are row vectors in $X \cdot W = NET$.
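
To make the indexing concrete, here is a small sketch; note that NumPy uses 0-based indices, so the weight $w_{3,2}$ (third input to second neuron) lives at W[2, 1]. The shapes here are illustrative:

import numpy

m, n = 4, 3  # say, 4 inputs and 3 neurons
W = numpy.random.random((m, n))

# w_{3,2}: third input (row index 2) to second neuron (column index 1)
w_3_2 = W[2, 1]

X = numpy.random.random((1, m))  # a single input as a row vector
NET = numpy.dot(X, W)            # NET is a 1 x n row vector
print(NET.shape)                 # (1, 3)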

For illustration, let's take the example of the logical OR function.

| Input 1 | Input 2 | Output = Input 1 $\lor$ Input 2 |
| --- | --- | --- |
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |

This function can be represented in matrix form as follows: the two inputs, Input 1 and Input 2, are combined as columns of the matrix $X$. A weight matrix of size 2x1 is generated randomly, so that the resulting NET matrix has dimension 4x1. Using the operations below, we shall observe that the derived output approximates the expected values above for each of the activation functions used.

$$X \cdot W = NET$$$$ \begin{bmatrix} 0 & 0\\ 0 & 1\\ 1 & 0\\ 1 & 1 \end{bmatrix} \cdot \begin{bmatrix} 0.96231994\\ 0.53628568 \end{bmatrix} = \begin{bmatrix} 0\\ 0.53628568\\ 0.96231994\\ 1.49860562 \end{bmatrix}$$

This can be implemented in Python as:

import numpy

# each row of X is one input pair
X = numpy.array([ [0,0],[0,1],[1,0],[1,1] ])

# random 2x1 weight matrix
w1 = numpy.random.random((2,1))

# NET = X . W
numpy.dot(X,w1)

Activation Function or Processing Block

The signal received at NET is further processed by an activation function $F$ to produce the neuron's output signal, OUT. Like a biological neuron, the activation function decides whether the cell fires (activates) or not. $F$ may be a simple linear function, $OUT = K(NET)$ where $K$ is a constant, or a threshold function. Hence,

$OUT = 1$ if $NET > T$

$OUT = 0$ otherwise

where $T$ is a constant threshold value, also known as bias.

Matrix

Checking the matrix output $NET$ and comparing it with the threshold $T$ defined above, taking $T = 0.5$:

$$T(NET) = OUTPUT$$$$T \begin{bmatrix} 0\\ 0.53628568\\ 0.96231994\\ 1.49860562 \end{bmatrix} = \begin{bmatrix} 0\\ 1\\ 1\\ 1 \end{bmatrix}$$
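
As an aside, the loop-free way to apply such a threshold in NumPy is numpy.where; a minimal sketch using the NET values from the example above:

import numpy

NET = numpy.array([[0.0], [0.53628568], [0.96231994], [1.49860562]])
T = 0.5

# elementwise comparison: 1 where NET > T, else 0
OUT = numpy.where(NET > T, 1, 0)
print(OUT)  # [[0] [1] [1] [1]]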

Squashing Function

The processing block $F$ is called a squashing function when it compresses the range of $NET$, so that OUT never exceeds some low limits regardless of the value of NET.

[Figure: a labelled single-layer neural network]

Sigmoid

$F$ is often chosen to be the logistic function or "sigmoid" (which means S-shaped) represented mathematically as: $F(x) = 1/(1+e^{-x})$. Hence, $$OUT = 1/(1+e^{-NET})$$.

The activation function provides a non-linear gain for the artificial neuron: the gain is the ratio of the change in OUT to a small change in NET, which equals the slope of the curve at a given excitation level.
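
For the logistic function this gain has a convenient closed form; differentiating $F(x) = 1/(1+e^{-x})$ gives

$$\frac{d\,OUT}{d\,NET} = F(NET)\,\big(1 - F(NET)\big) = OUT\,(1 - OUT)$$

which peaks at $0.25$ when $NET = 0$ and approaches zero for large $|NET|$, in line with the table below.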

| Excitation value | Shape of the curve | Gain |
| --- | --- | --- |
| large, negative | ~ horizontal | low |
| zero | steep | high |
| large, positive | ~ horizontal | low |
$$sigmoid(NET) = OUTPUT$$$$sigmoid \begin{bmatrix} 0\\ 0.53628568\\ 0.96231994\\ 1.49860562 \end{bmatrix} = \begin{bmatrix} 0.5\\ 0.63094795\\ 0.72358606\\ 0.81736642 \end{bmatrix}$$

Finding by Grossberg (1973)

The activation function's non-linear gain characteristic solves the noise-saturation dilemma: "How can the same network handle both small and large signals?" Small signals need high gain so that they are not lost in noise, while large signals need a diminishing gain so that the network does not saturate; the steep central region of the sigmoid provides the former and its flat tails the latter.

Hyperbolic Tangent

often used by biologists as a mathematical model of nerve-cell activation

It is another commonly used activation function, expressed as:

$$\tanh(NET) = OUTPUT$$$$\tanh \begin{bmatrix} 0\\ 0.53628568\\ 0.96231994\\ 1.49860562 \end{bmatrix} = \begin{bmatrix} 0\\ 0.49017122\\ 0.7453099\\ 0.90489596 \end{bmatrix} \longrightarrow \begin{bmatrix} 0\\ 1\\ 1\\ 1 \end{bmatrix}$$

Difference between the Sigmoid Logistic function and Hyperbolic Tangent function

|  | Sigmoid logistic function | Hyperbolic tangent function |
| --- | --- | --- |
| Mathematical function | $OUT = 1/(1+e^{-NET})$ | $OUT = \tanh(NET)$ |
| Shape | S-shaped, positive throughout | S-shaped, symmetric about the origin |
| Condition for OUT = 0 | $OUT \to 0$ only as $NET \to -\infty$ | $OUT = 0$ at $NET = 0$ |
| Values of OUT | positive (unipolar) | bipolar |
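
The two functions are in fact closely related: $\tanh(x) = 2 \cdot sigmoid(2x) - 1$, i.e. tanh is a rescaled logistic shifted to be symmetric about the origin. A minimal numerical check:

import numpy

def sigmoid(x):
    return 1 / (1 + numpy.exp(-x))

x = numpy.linspace(-3, 3, 7)
print(numpy.allclose(numpy.tanh(x), 2 * sigmoid(2 * x) - 1))  # True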

Limitations with respect to the biological counterpart

  • It overlooks the time delays that affect system dynamics; inputs produce immediate outputs.
  • It overlooks the synchronism and the frequency modulation of biological neurons.

Single Layer Artificial Neural Networks

The number of layers in an artificial neural network is determined by the number of computational layers, that is, the layers that process NET to produce OUT. The circular nodes on the far left only distribute the inputs and perform no computation; hence, they are not considered to constitute a layer.

The set of inputs X has each of its elements connected to each artificial neuron through a separate weight.

Multi Layer Artificial Neural Networks

Multilayer networks may be formed by cascading a group of single layers, the output of one layer providing the input to the next. However, this provides no increase in computational power unless there is a non-linear activation function between layers.

This can be expressed with vectors as:

If there is no non-linear activation function between layers, the output of the product of the input and the initial weight matrix serves as the input to the next weight matrix, that is: $$(XW_{1})W_{2}$$ Since matrix multiplication is associative: $$(XW_{1})W_{2} = X(W_{1}W_{2})$$

Hence, a two-layer linear network is equivalent to a single-layer linear network whose weight matrix is the product of the two weight matrices.
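
Conversely, inserting a non-linearity between the layers breaks this equivalence; a minimal sketch, reusing the sigmoid defined earlier (the weight shapes match the multi-layer example below):

import numpy

def sigmoid(x):
    return 1 / (1 + numpy.exp(-x))

X = numpy.array([[0, 0], [0, 1], [1, 0], [1, 1]])
w01 = numpy.random.random((2, 3))
w02 = numpy.random.random((3, 1))

# with an activation between the layers, the network no longer
# collapses to a single weight matrix
nonlinear = numpy.dot(sigmoid(numpy.dot(X, w01)), w02)
collapsed = numpy.dot(X, numpy.dot(w01, w02))
print(numpy.allclose(nonlinear, collapsed))  # False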

Python Code Implementation

Single Layer Neural Network

In [1]:
import numpy
In [2]:
#input layer
X = numpy.array([ [0,0],[0,1],[1,0],[1,1] ])

#expected output layer
y = numpy.array([[0,1,1,1]]).T

#weight matrix
w1 = numpy.random.random((2,1))

dot_product = numpy.dot(X,w1)

print("X = ",)
print(X)
print("")
print("y = ",)
print(y)
print("")
print("weight matrix w1 = ",)
print(w1)
print("-"*60)
print("dot product of x and w1 = ",)
print(dot_product)
X = 
[[0 0]
 [0 1]
 [1 0]
 [1 1]]

y = 
[[0]
 [1]
 [1]
 [1]]

weight matrix w1 = 
[[0.96231994]
 [0.53628568]]
------------------------------------------------------------
dot product of x and w1 = 
[[0.        ]
 [0.53628568]
 [0.96231994]
 [1.49860562]]
In [3]:
def threshold_value(x, threshold=0.5):
    # elementwise threshold: 1 where x > threshold, else 0
    OUT = numpy.empty([ len(x), len(x[0]) ])
    for m in range(len(x)):
        for n in range(len(x[0])):
            if x[m][n] > threshold:
                OUT[m][n] = 1
            else:
                OUT[m][n] = 0
    return OUT

def sigmoid(x, derivative=False):
    # when derivative=True, x is assumed to be a sigmoid output already
    if derivative:
        return x * (1 - x)
    return 1 / (1 + numpy.exp(-x))

def tanh(x):
    return numpy.tanh(x)

def compute_single_layer(summation, activation='sigmoid'):
    if activation == 'threshold_value':
        layer1 = threshold_value(summation)
    elif activation == 'sigmoid':
        layer1 = sigmoid(summation)
    elif activation == 'tanh':
        layer1 = tanh(summation)
    return layer1
In [4]:
print("-"*60)
print("Output with threshold value activation = ",)
print(compute_single_layer(dot_product,'threshold_value'))
print("-"*60)
print("Output with sigmoid activation = ",)
print(compute_single_layer(dot_product,'sigmoid'))
print("-"*60)
print("Output with tanh activation = ",)
print(compute_single_layer(dot_product,'tanh'))
------------------------------------------------------------
Output with threshold value activation = 
[[0.]
 [1.]
 [1.]
 [1.]]
------------------------------------------------------------
Output with sigmoid activation = 
[[0.5       ]
 [0.63094795]
 [0.72358606]
 [0.81736642]]
------------------------------------------------------------
Output with tanh activation = 
[[0.        ]
 [0.49017122]
 [0.7453099 ]
 [0.90489596]]

Multi Layer Neural Network

In [5]:
#input layer
X = numpy.array([ [0,0],[0,1],[1,0],[1,1] ])

#expected output layer
y = numpy.array([[0,1,1,1]]).T

#weight matrix
w01 = numpy.random.random((2,3))

w02 = numpy.random.random((3,1))

print("X = ",)
print(X)
print("")
print("y = ",)
print(y)
print("")
print("weight matrix w1 = ",)
print(w01)
print("")
print("weight matrix w2 = ",)
print(w02)
X = 
[[0 0]
 [0 1]
 [1 0]
 [1 1]]

y = 
[[0]
 [1]
 [1]
 [1]]

weight matrix w1 = 
[[0.26021624 0.51186679 0.6512579 ]
 [0.35512576 0.03031688 0.61480384]]

weight matrix w2 = 
[[0.37076634]
 [0.65789418]
 [0.87056282]]
Computing (x . w01 ) . w02
In [6]:
layer_1 = numpy.dot(X, w01)
layer_2 = numpy.dot(layer_1, w02)
print("(x . w01 ) . w02 = ",)
print(layer_2)
(x . w01 ) . w02 = 
[[0.        ]
 [0.68683933]
 [1.00019451]
 [1.68703385]]
Computing x . (w01 . w02)
In [7]:
layer_1 = numpy.dot(w01, w02)
layer_2 = numpy.dot(X, layer_1)
print("x . (w01 . w02) = ",)
print(layer_2)
x . (w01 . w02) = 
[[0.        ]
 [0.68683933]
 [1.00019451]
 [1.68703385]]
Hence, it is verified numerically that (x . w01 ) . w02 = x . (w01 . w02).

This implies that the number of layers in a neural network is defined by the number of computational layers, i.e., those applying non-linear activation functions.

References

[1] P. D. Wasserman, Neural Computing: Theory and Practice. New York: Van Nostrand Reinhold, 1989.