Playing with neural networks

The Code and Basic Setup

First, let us consider a simple setup of a neural network. It can be implemented systematically with the following Python code, which generalizes some work found in this GitHub article. The code creates and trains a neural network consisting of n layers (variable 'layers', see CONFIG section) with m neurons each (variable 'neurons_per_layer'); its shape is visualized in Fig. 1.

Network Shape
Fig. 1: Shape of the neural network modeled in the Python code.

The network takes a single parameter as input and produces one output value. It can be tested with the 'predict' method at the end of the code. Running the 'train' method trains the network to reproduce an input number between 0 and 1 as its output, i.e. the identity function. For further explanation of the code structure, first read the article mentioned above, and note that the following lines do nothing other than generalize that training process to any number of hidden layers, not just 0 or 1.

import numpy as np

####### CONFIG ##############
training_set_size = 20
layers = 3 #number of hidden layers + input layer + output layer, needs to be >= 2 (2 means there is no hidden layer)
neurons_per_layer = 10
####### END CONFIG ##########

def nonlin(x,deriv=False): #Sigmoid function, or its derivative if deriv=True
    if deriv:
        return x*(1-x)
    return 1/(1+np.exp(-x))

syn = [None] * (layers + 2)

def train():
    print ("Training: training set size: " + str(training_set_size) + " network: " + str(layers) + " layers " + str(neurons_per_layer) + " neurons/layer")

    #Create training set
    X = np.random.random((training_set_size,1))
    y = np.random.random((training_set_size,1))
    for i in range(training_set_size):
        y[i][0] = X[i][0] * 1  #***TASK*** Here as an example: the identity for numbers between 0 and 1


    #Create random weights for all neurons
    syn[0] = np.array(2*np.random.random((1,neurons_per_layer)) - 1)
    for k in range(1,layers-1):
        syn[k] = np.array(2*np.random.random((neurons_per_layer,neurons_per_layer)) - 1)
    syn[layers - 1] = np.array(2*np.random.random((neurons_per_layer,1)) - 1)

    #Begin of backpropagation learning method using gradient descent
    for j in range(20000): #Number of iterations set to 20000
        l = [None] * (layers + 1)
        l_error = [None] * (layers + 1)
        l_delta = [None] * (layers + 1)
        l[0] = X
        for k in range(1,layers+1):
            l[k] = nonlin(np.dot(l[k-1],syn[k-1]))
        l_error[layers] = y - l[layers]

        if (j% 5000) == 0:
            print ("Error:" + str(np.mean(np.abs(l_error[layers]))))

        l_delta[layers] = l_error[layers]*nonlin(l[layers],deriv=True)

        for k in range(1,layers):
            ki = layers - k

            l_error[ki] = l_delta[ki+1].dot(syn[ki].T)

            l_delta[ki] = l_error[ki] * nonlin(l[ki],deriv=True)

        for k in range(1,layers+1):
            ki = layers - k
            syn[ki] += l[ki].T.dot(l_delta[ki+1])

def predict(arg1): #Method for testing the trained (or untrained) network
    x = np.array([[arg1]])
    ld = [None] * (layers + 2)
    ld[0] = x
    for m in range(1,layers+1):
        ld[m] = nonlin(np.dot(ld[m-1],syn[m-1]))
    return ld[layers]
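To see the method end-to-end, here is a condensed, self-contained version of the training loop above for the 'layers = 2' configuration, with a fixed random seed for reproducibility. The vectorized forward/backward helpers are my restructuring for brevity, not part of the original code:

```python
import numpy as np

np.random.seed(1)  # fixed seed so the run is reproducible

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

neurons, n_train = 10, 20
X = np.random.random((n_train, 1))
y = X.copy()  # the identity task from the article

# two weight matrices, initialized in [-1, 1) as in the article's code
syn = [2 * np.random.random((1, neurons)) - 1,
       2 * np.random.random((neurons, 1)) - 1]

def forward(x):
    # list of layer activations, starting with the input
    l = [x]
    for w in syn:
        l.append(sigmoid(l[-1].dot(w)))
    return l

initial_error = np.mean(np.abs(y - forward(X)[-1]))

for _ in range(20000):  # same iteration count as above
    l = forward(X)
    deltas = [None] * len(syn)
    deltas[-1] = (y - l[-1]) * l[-1] * (1 - l[-1])
    for k in range(len(syn) - 2, -1, -1):
        # propagate the error backwards through the weights
        deltas[k] = deltas[k + 1].dot(syn[k + 1].T) * l[k + 1] * (1 - l[k + 1])
    for k in range(len(syn)):
        syn[k] += l[k].T.dot(deltas[k])

final_error = np.mean(np.abs(y - forward(X)[-1]))
print(initial_error, final_error)
```

Running it, the mean absolute error drops noticeably over the 20000 iterations, matching the behavior of the full code.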

Evaluating Examples

We start by looking at Fig. 2, which shows the prediction results of a 3-layer network with 10 neurons per (input or hidden) layer as a function of the training set size. As in the code shown above, the network was asked to learn the identity function. It may seem that a bigger training set results in a smaller prediction error and therefore in a better outcome. This is a fallacy: examining further examples with larger training sets, we will see the prediction results worsen again.

Fig. 2: Predicted values for different training set sizes and the corresponding prediction error. Note that the exact course of the upper graph, representing the identity function, is a line through the origin with slope 1.

The identity function is a very basic learning example for neural networks, requiring only one input parameter. Another single-input problem would be, e.g., finding the square of a number x (0 < x < 1): \(f(x) = x^2\).
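Switching the training target from the identity to the square function only requires changing the line marked ***TASK*** in the code above; a sketch of the adapted training-set generation:

```python
import numpy as np

training_set_size = 20
X = np.random.random((training_set_size, 1))
y = np.random.random((training_set_size, 1))
for i in range(training_set_size):
    y[i][0] = X[i][0] ** 2  # ***TASK*** square instead of identity
```

Since x lies in (0, 1), the target x^2 also lies in (0, 1) and therefore stays within the range of the sigmoid output, so no further changes are needed.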

It does not take much effort to update the Python code so that the neural network can handle two input parameters, which allows learning processes involving more complex tasks to be examined.
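A sketch of the changes for two inputs, using the variable names from the code above; scaling the addition target into [0, 1] is my assumption, since a sigmoid output cannot exceed 1:

```python
import numpy as np

training_set_size = 20
neurons_per_layer = 10

# the training set now has two input columns instead of one
X = np.random.random((training_set_size, 2))

# e.g. addition as the target, scaled into [0, 1] for the sigmoid output
y = ((X[:, 0] + X[:, 1]) / 2).reshape(-1, 1)

# only the first weight matrix changes shape: (2, m) instead of (1, m)
syn0 = 2 * np.random.random((2, neurons_per_layer)) - 1
```

The rest of the training loop is unchanged, since the matrix products adapt automatically to the wider input.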

These include:

Addition and Multiplication

Watch the following animated GIFs, which show the prediction quality of the neural network as it is trained with an increasing amount of data.

In each GIF the upper heat map shows the actual prediction, while the lower heat map visualizes the squared deviation between this prediction and the exact value obtained by conventional computation without neural structures (simply evaluating the function directly).
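The lower heat map can be computed as described; here is a sketch for the addition case, where `predict_grid` is a hypothetical placeholder standing in for the trained network's output over the grid:

```python
import numpy as np

xs = np.linspace(0, 1, 50)
x1, x2 = np.meshgrid(xs, xs)

exact = x1 + x2                      # exact value by direct computation
predict_grid = exact + 0.05          # placeholder for the network's prediction
squared_deviation = (predict_grid - exact) ** 2  # lower heat map
```

With a real network, `predict_grid` would be filled by evaluating `predict` at every grid point.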

You will see that in these two basic examples the learning process goes well once the training set is big enough. For specific training set sizes there is occasionally a sudden, short-lived deterioration in the result, but the general trend again promises higher accuracy for bigger training sets.

Fig. 3: Prediction and prediction error of a neural network trained for addition.
Fig. 4: Prediction and prediction error of a neural network trained for multiplication.
More Complicated Function with Discontinuity

Now we consider the function that assigns 0 if the squared distance of the two parameters is smaller than 0.1, and 1 otherwise, and try to make the neural network learn it. This task is much harder to train than the two preceding examples involving simple addition and multiplication. An advanced, modified implementation that promises to improve the backpropagation algorithm, and thereby the overall prediction result, may be elucidated in future articles. For now we stick to the Python code shown at the beginning.
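Written out as a target function, this looks as follows; "squared distance" is interpreted here as the squared distance of the point (x1, x2) from the origin, which is an assumption on my part:

```python
def f(x1, x2):
    # 0 inside the region x1^2 + x2^2 < 0.1, 1 outside of it
    return 0.0 if x1 ** 2 + x2 ** 2 < 0.1 else 1.0
```

The jump from 0 to 1 along the boundary is the discontinuity that makes this function hard for the smooth sigmoid network to approximate.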

There are three GIF animations for this problem. The first shows the prediction results and errors for a network consisting of 3 neurons (no hidden layer; an input layer with 2 neurons plus an output layer with 1 neuron). The neural structure used for the second has 13 neurons (2 hidden layers plus the input layer, with 4 neurons per layer, plus the output layer), and for the third GIF there were 31 neurons (4 hidden layers plus the input layer, with 6 neurons per layer, plus the output layer) contributing to the prediction.

Fig. 5: Prediction and prediction error of a 3-neuron network trained for a function with a discontinuity.
Fig. 6: Prediction and prediction error of a 13-neuron network trained for a function with a discontinuity.
Fig. 7: Prediction and prediction error of a 31-neuron network trained for a function with a discontinuity.

Three observations can be written down:

Square Function

For the task of finding the square of a number x (0 < x < 1), \(f(x) = x^2\), we go one step further and additionally examine how the observations made in the previous part persist under different structures and sizes of the neural network (at a constant number of iterations). The number of layers n and the number of neurons per layer m are therefore of interest at this point.

The hue in the following animated diagram expresses the size of the total prediction error made by a neural network with n layers (horizontal axis) and m neurons per layer (vertical axis). The animation shows how this total prediction error depends on n and m as the training set size rises. A blue field means a small (i.e. good) error; a red field refers to a bad prediction result.
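The article does not state how the total prediction error is computed; one plausible metric, sketched here as an assumption, is the mean absolute deviation between prediction and exact value over a random test sample:

```python
import numpy as np

def total_error(predict, target, samples=100):
    # mean absolute deviation between prediction and exact value
    xs = np.random.random(samples)
    return np.mean([abs(predict(x) - target(x)) for x in xs])

square = lambda x: x ** 2
print(total_error(square, square))  # a perfect predictor gives error 0
```

Evaluating this for every (n, m) combination yields one field of the heat map per network shape.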

Fig. 8: Total prediction errors for neural networks of different shapes, trained with various training set sizes.

Like in the preceding paragraph we can make certain observations:

In a similar way we can examine the behavior of the prediction results as a function of the number of iterations while the training set size is held constant. It turns out that the same pattern can be observed, apart from the inversion part.

Fig. 9: Total prediction errors for neural networks of different shapes, trained with various numbers of backpropagation iterations.

But which field in the two diagrams above is the "bluest" one, marking the best neural network structure for the given number of iterations or training units, respectively?

Fig. 10: Neural network shapes that provide the lowest prediction error, as a function of the training set size and the number of backpropagation iterations, respectively.

Aiming for the lowest prediction errors, and taking into account that the lowest prediction error always occurs for networks with a small number of layers n (from 2 up to 5), we can learn from Fig. 10 what is summarized in the following table:

shape of the neural network  | training units | iterations
a lot of neurons per layer   | fewer          | more
just a few neurons per layer | more           | fewer

So adding additional layers to your network does not significantly affect the learning behavior (apart from slowing it down), nor does it affect the quality of the result. But the general trend suggests that by adding more neurons to each layer, you need less training data and more backpropagation iterations to reach the lowest prediction error.