## Training a Neural Network with Python

### Introduction

In the chapter "Running Neural Networks", we programmed a Python class called `NeuralNetwork`. The instances of this class are networks with three layers. When we instantiate an ANN of this class, the weight matrices between the layers are automatically and randomly initialized. It is even possible to run such an ANN on some input, but naturally this doesn't make a lot of sense except for testing purposes: such an ANN cannot provide correct classification results, because its weights are in no way adapted to the expected results. The values of the weight matrices have to be set according to the classification task. We need to improve the weight values, which means that we have to train our network. To train it, we have to implement backpropagation in the `train` method. If you don't understand backpropagation and want to understand it, we recommend going back to the chapter "Backpropagation in Neural Networks".

After knowing and hopefully understanding backpropagation, you are ready to fully understand the `train` method.

The `train` method is called with an input vector and a target vector. The vectors can be one-dimensional; they will automatically be turned into the correct two-dimensional column shape, i.e. `reshape(input_vector.size, 1)` and `reshape(target_vector.size, 1)`. After this, the forward pass is computed (the same computation that the `run` method performs), yielding the result of the network `output_vector_network`. This output may differ from the `target_vector`. We calculate the `output_error` by subtracting the output of the network `output_vector_network` from the `target_vector`.
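
As a quick illustration of this reshaping step, a one-dimensional vector becomes a two-dimensional column vector (the values here are illustrative):

```python
import numpy as np

# a one-dimensional input vector ...
input_vector = np.array([2, 4, 6])
# ... reshaped into a column vector, as the train method does internally:
column = input_vector.reshape(input_vector.size, 1)
print(column.shape)   # (3, 1)
```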

```
import numpy as np
from scipy.stats import truncnorm


@np.vectorize
def sigmoid(x):
    return 1 / (1 + np.e ** -x)

activation_function = sigmoid

def truncated_normal(mean=0, sd=1, low=0, upp=10):
    return truncnorm(
        (low - mean) / sd, (upp - mean) / sd, loc=mean, scale=sd)


class NeuralNetwork:

    def __init__(self,
                 no_of_in_nodes,
                 no_of_out_nodes,
                 no_of_hidden_nodes,
                 learning_rate):
        self.no_of_in_nodes = no_of_in_nodes
        self.no_of_out_nodes = no_of_out_nodes
        self.no_of_hidden_nodes = no_of_hidden_nodes
        self.learning_rate = learning_rate
        self.create_weight_matrices()

    def create_weight_matrices(self):
        """ A method to initialize the weight matrices of the neural network"""
        rad = 1 / np.sqrt(self.no_of_in_nodes)
        X = truncated_normal(mean=0, sd=1, low=-rad, upp=rad)
        self.weights_in_hidden = X.rvs((self.no_of_hidden_nodes,
                                        self.no_of_in_nodes))
        rad = 1 / np.sqrt(self.no_of_hidden_nodes)
        X = truncated_normal(mean=0, sd=1, low=-rad, upp=rad)
        self.weights_hidden_out = X.rvs((self.no_of_out_nodes,
                                         self.no_of_hidden_nodes))

    def train(self, input_vector, target_vector):
        """
        input_vector and target_vector can be tuples, lists or ndarrays
        """
        # make sure that the vectors have the right shape
        input_vector = np.array(input_vector)
        input_vector = input_vector.reshape(input_vector.size, 1)
        target_vector = np.array(target_vector)
        target_vector = target_vector.reshape(target_vector.size, 1)

        output_vector_hidden = activation_function(self.weights_in_hidden @ input_vector)
        output_vector_network = activation_function(self.weights_hidden_out @ output_vector_hidden)

        output_error = target_vector - output_vector_network
        # update the weights between hidden and output layer:
        tmp = output_error * output_vector_network * (1.0 - output_vector_network)
        self.weights_hidden_out += self.learning_rate * (tmp @ output_vector_hidden.T)

        # calculate hidden errors:
        hidden_errors = self.weights_hidden_out.T @ output_error
        # update the weights between input and hidden layer:
        tmp = hidden_errors * output_vector_hidden * (1.0 - output_vector_hidden)
        self.weights_in_hidden += self.learning_rate * (tmp @ input_vector.T)

    def run(self, input_vector):
        """
        running the network with an input vector 'input_vector'.
        'input_vector' can be a tuple, list or ndarray
        """
        # make sure that input_vector is a column vector:
        input_vector = np.array(input_vector)
        input_vector = input_vector.reshape(input_vector.size, 1)
        input4hidden = activation_function(self.weights_in_hidden @ input_vector)
        output_vector_network = activation_function(self.weights_hidden_out @ input4hidden)
        return output_vector_network
```
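
The factor `output_vector_network * (1.0 - output_vector_network)` in the weight update above is the derivative of the sigmoid function, because σ'(x) = σ(x) · (1 − σ(x)). A quick numerical check of this identity:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.e ** -x)

x = 0.5
s = sigmoid(x)
# numerical derivative via central differences:
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
# analytical derivative sigma(x) * (1 - sigma(x)):
analytic = s * (1 - s)
print(abs(numeric - analytic) < 1e-9)   # True
```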

We assume that you save the previous code in a file called `neural_networks1.py`. We will use it under this name in the coming examples.

To test this neural network class, we need train and test data. We create the data with `make_blobs` from `sklearn.datasets`.

```
from sklearn.datasets import make_blobs

n_samples = 300
samples, labels = make_blobs(n_samples=n_samples,
                             centers=([2, 6], [6, 2]),
                             random_state=0)
```

Let us visualize the previously created data:

```
import matplotlib.pyplot as plt

colours = ('green', 'red', 'blue', 'magenta', 'yellow', 'cyan')
fig, ax = plt.subplots()
for n_class in range(2):
    ax.scatter(samples[labels==n_class][:, 0],
               samples[labels==n_class][:, 1],
               c=colours[n_class], s=40, label=str(n_class))
```

We are going to create a train and a test data set:

```
size_of_learn_sample = int(n_samples * 0.8)
learn_data = samples[:size_of_learn_sample]
test_data = samples[size_of_learn_sample:]
```
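
For `n_samples = 300`, an 80/20 split yields 240 training and 60 test samples; a disjoint split can be sketched with a stand-in array (reusing the variable names from above):

```python
import numpy as np

# a stand-in array with the same length as our sample set:
samples = np.arange(300)
size_of_learn_sample = int(300 * 0.8)
learn_data = samples[:size_of_learn_sample]
test_data = samples[size_of_learn_sample:]
print(len(learn_data), len(test_data))   # 240 60
```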

We create a neural network with two input nodes, five hidden nodes and one output node:

```
from neural_networks1 import NeuralNetwork

simple_network = NeuralNetwork(no_of_in_nodes=2,
                               no_of_out_nodes=1,
                               no_of_hidden_nodes=5,
                               learning_rate=0.3)
```

The next step consists in training our network with the samples from our training data:

```
for i in range(size_of_learn_sample):
    simple_network.train(learn_data[i], labels[i])
```

We now have to check how well our network has learned. The network has only one output neuron, so the output values will lie between 0 and 1. Ideally, the output would be 1 for class 1 and 0 for class 0, but due to the sigmoid function neither 0 nor 1 is even a possible result. So we have to map the values between 0 and 1 to the two classes. We use 0.5 as a threshold: every output greater than or equal to 0.5 is considered a 1, and everything smaller is taken as a 0. Now we are capable of comparing the results with the labels:

```
from collections import Counter

evaluation = Counter()
for i in range(size_of_learn_sample):
    point, label = learn_data[i], labels[i]
    res = simple_network.run(point)
    if label == 1:
        if res >= 0.5:
            evaluation["correct"] += 1
        else:
            evaluation["wrong"] += 1
    elif label == 0:
        if res <= 0.5:
            evaluation["correct"] += 1
        else:
            evaluation["wrong"] += 1

print(evaluation)
```

The flaw in the design above is this: if the output value is 0.5 or close to it, the classifier is rather undecided, because the result lies in the middle between the two possible classes. To take this into account, we rewrite the evaluation with a `threshold` parameter and count outputs between `threshold` and `1 - threshold` as undecided:

```
from collections import Counter

def evaluate(data, labels, threshold=0.5):
    evaluation = Counter()
    for i in range(len(data)):
        point, label = data[i], labels[i]
        res = simple_network.run(point)
        if threshold < res < 1 - threshold:
            evaluation["undecided"] += 1
        elif label == 1:
            if res >= 1 - threshold:
                evaluation["correct"] += 1
            else:
                evaluation["wrong"] += 1
        elif label == 0:
            if res <= threshold:
                evaluation["correct"] += 1
            else:
                evaluation["wrong"] += 1
    return evaluation

res = evaluate(learn_data, labels)
res
```

### Neural Network with Bias Nodes

We already introduced the basic idea and necessity of bias nodes in the chapter "Simple Neural Network", in which we focussed on very simple linearly separable data sets. We learned that a bias node is a node that always returns the same output. In other words: it is a node that does not depend on any input, because it has no input. The value of a bias node is often set to one, but it can be set to other values as well, except for zero, which makes no sense for obvious reasons. If a neural network does not have a bias node in a given layer, it will not be able to produce output in the next layer that differs from 0 when the feature values are 0. Generally speaking, bias nodes are used to increase the flexibility of the network to fit the data. Usually, there will be no more than one bias node per layer. The only exception is the output layer, because it makes no sense to add a bias node to this layer.
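
A bias node can be realized by appending the bias value to a layer's output vector, which in turn requires one more column in the following weight matrix. A minimal sketch with an illustrative 2-dimensional input:

```python
import numpy as np

# an illustrative 2-dimensional input as a column vector:
input_vector = np.array([0.7, 0.3]).reshape(2, 1)
bias = 1
# appending the bias value turns the (2, 1) column into a (3, 1) column,
# so the following weight matrix needs one additional column to match:
extended = np.concatenate((input_vector, [[bias]]))
print(extended.shape)   # (3, 1)
```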

Looking at the first two layers of our previously used three-layered neural network, we can see that the weight matrix between the input and the hidden layer needs one additional column, and the bias value has to be appended to the input vector. The situation for the weight matrix between the hidden and the output layer is similar: it also gets one additional column, and the bias value is appended to the output vector of the hidden layer.

The following is a complete Python class implementing our network with bias nodes:

```
import numpy as np
from scipy.stats import truncnorm


@np.vectorize
def sigmoid(x):
    return 1 / (1 + np.e ** -x)

activation_function = sigmoid

def truncated_normal(mean=0, sd=1, low=0, upp=10):
    return truncnorm(
        (low - mean) / sd, (upp - mean) / sd, loc=mean, scale=sd)


class NeuralNetwork:

    def __init__(self,
                 no_of_in_nodes,
                 no_of_out_nodes,
                 no_of_hidden_nodes,
                 learning_rate,
                 bias=None):
        self.no_of_in_nodes = no_of_in_nodes
        self.no_of_hidden_nodes = no_of_hidden_nodes
        self.no_of_out_nodes = no_of_out_nodes
        self.learning_rate = learning_rate
        self.bias = bias
        self.create_weight_matrices()

    def create_weight_matrices(self):
        """ A method to initialize the weight matrices of the neural
        network with optional bias nodes"""
        bias_node = 1 if self.bias else 0
        rad = 1 / np.sqrt(self.no_of_in_nodes + bias_node)
        X = truncated_normal(mean=0, sd=1, low=-rad, upp=rad)
        self.weights_in_hidden = X.rvs((self.no_of_hidden_nodes,
                                        self.no_of_in_nodes + bias_node))
        rad = 1 / np.sqrt(self.no_of_hidden_nodes + bias_node)
        X = truncated_normal(mean=0, sd=1, low=-rad, upp=rad)
        self.weights_hidden_out = X.rvs((self.no_of_out_nodes,
                                         self.no_of_hidden_nodes + bias_node))

    def train(self, input_vector, target_vector):
        """ input_vector and target_vector can be tuples, lists or ndarrays """
        # make sure that the vectors have the right shape
        input_vector = np.array(input_vector)
        input_vector = input_vector.reshape(input_vector.size, 1)
        if self.bias:
            # adding the bias node to the end of the input_vector
            input_vector = np.concatenate((input_vector, [[self.bias]]))
        target_vector = np.array(target_vector)
        target_vector = target_vector.reshape(target_vector.size, 1)

        output_vector_hidden = activation_function(self.weights_in_hidden @ input_vector)
        if self.bias:
            output_vector_hidden = np.concatenate((output_vector_hidden, [[self.bias]]))
        output_vector_network = activation_function(self.weights_hidden_out @ output_vector_hidden)

        output_error = target_vector - output_vector_network
        # update the weights between hidden and output layer:
        tmp = output_error * output_vector_network * (1.0 - output_vector_network)
        self.weights_hidden_out += self.learning_rate * (tmp @ output_vector_hidden.T)

        # calculate hidden errors:
        hidden_errors = self.weights_hidden_out.T @ output_error
        # update the weights between input and hidden layer:
        tmp = hidden_errors * output_vector_hidden * (1.0 - output_vector_hidden)
        if self.bias:
            x = (tmp @ input_vector.T)[:-1, :]  # last row (bias node) cut off
        else:
            x = tmp @ input_vector.T
        self.weights_in_hidden += self.learning_rate * x

    def run(self, input_vector):
        """
        running the network with an input vector 'input_vector'.
        'input_vector' can be a tuple, list or ndarray
        """
        # make sure that input_vector is a column vector:
        input_vector = np.array(input_vector)
        input_vector = input_vector.reshape(input_vector.size, 1)
        if self.bias:
            # adding the bias node to the end of the input_vector
            input_vector = np.concatenate((input_vector, [[self.bias]]))
        input4hidden = activation_function(self.weights_in_hidden @ input_vector)
        if self.bias:
            input4hidden = np.concatenate((input4hidden, [[self.bias]]))
        output_vector_network = activation_function(self.weights_hidden_out @ input4hidden)
        return output_vector_network

    def evaluate(self, data, labels):
        corrects, wrongs = 0, 0
        for i in range(len(data)):
            res = self.run(data[i])
            res_max = res.argmax()
            if res_max == labels[i]:
                corrects += 1
            else:
                wrongs += 1
        return corrects, wrongs
```

```
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

data, labels = make_blobs(n_samples=250,
                          centers=([2, 7.9], [8, 3]),
                          random_state=0)

colours = ('green', 'blue', 'red', 'magenta', 'yellow', 'cyan')
fig, ax = plt.subplots()
for n_class in range(2):
    ax.scatter(data[labels==n_class][:, 0],
               data[labels==n_class][:, 1],
               c=colours[n_class], s=40, label=str(n_class))
```

We assume that you save the class with bias nodes in a file called `neural_networks2.py`, analogously to `neural_networks1.py`:

```
from neural_networks2 import NeuralNetwork

simple_network = NeuralNetwork(no_of_in_nodes=2,
                               no_of_out_nodes=2,
                               no_of_hidden_nodes=10,
                               learning_rate=0.1,
                               bias=1)
simple_network.__dict__
```

```
import numpy as np

labels_one_hot = (np.arange(2) == labels.reshape(labels.size, 1))
labels_one_hot = labels_one_hot.astype(float)

for i in range(len(data)):
    simple_network.train(data[i], labels_one_hot[i])

simple_network.evaluate(data, labels)
```
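
The one-hot encoding above relies on NumPy broadcasting: comparing a column of labels with `np.arange(2)` produces a boolean matrix with exactly one `True` per row, which the cast to `float` turns into 0.0 and 1.0. A small sketch with illustrative labels:

```python
import numpy as np

labels = np.array([0, 1, 1, 0])
one_hot = (np.arange(2) == labels.reshape(labels.size, 1)).astype(float)
print(one_hot)
# [[1. 0.]
#  [0. 1.]
#  [0. 1.]
#  [1. 0.]]
```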
