Machine Learning with Python: Neural Networks with Scikit


Multi-layer Perceptron


We will start with examples using the multilayer perceptron (MLP). The MLP is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers, and each layer is fully connected to the following one. The nodes of the layers are neurons with nonlinear activation functions, except for the nodes of the input layer. There can be one or more non-linear hidden layers between the input and the output layer.

from sklearn.neural_network import MLPClassifier
X = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]
y = [0, 0, 0, 1]
clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(5, 2), random_state=1)
print(clf.fit(X, y))                         
MLPClassifier(activation='relu', alpha=1e-05, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(5, 2), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=1, shuffle=True,
       solver='lbfgs', tol=0.0001, validation_fraction=0.1, verbose=False,
       warm_start=False)

The following diagram depicts the neural network that we have trained for our classifier clf. We have two input nodes $X_0$ and $X_1$, called the input layer, and one output neuron 'Out'. We have two hidden layers: the first one with the neurons $H_{00}$ ... $H_{04}$ and the second one consisting of $H_{10}$ and $H_{11}$. Each neuron of the hidden layers and the output neuron possesses a corresponding bias, i.e. $B_{00}$ is the bias corresponding to the neuron $H_{00}$, $B_{01}$ the bias corresponding to $H_{01}$ and so on.

Each neuron of the hidden layers receives the output from every neuron of the previous layer and transforms these values with a weighted linear summation $$\sum_{i=0}^{n-1}w_ix_i = w_0x_0 + w_1x_1 + ... + w_{n-1}x_{n-1}$$ where n is the number of neurons of the previous layer and $w_i$ corresponds to the ith component of the weight vector. A non-linear activation function $$g(\cdot):R \rightarrow R$$ like the hyperbolic tangent is then applied to this sum to produce the neuron's output. The output layer receives the values from the last hidden layer and transforms them in the same way.
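
The following lines are a minimal numerical sketch of this computation for a single neuron; the input values, the weights and the bias are made up purely for illustration:

import numpy as np
x = np.array([0.5, 1.0])        # outputs of the previous layer (made up)
w = np.array([0.2, -0.7])       # weight vector of the neuron (made up)
b = 0.1                         # bias of the neuron (made up)
summation = np.dot(w, x) + b    # weighted linear summation
output = np.tanh(summation)     # non-linear activation g, here tanh
print(summation, output)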

%matplotlib inline
from IPython.display import Image
Image(filename='images/mlp_example_layer.png')

[Diagram of the trained network: input nodes $X_0$ and $X_1$, the hidden layers $H_{00}$ ... $H_{04}$ and $H_{10}$, $H_{11}$ with their biases, and the output neuron 'Out']

The attribute coefs_ contains a list of weight matrices for every layer. The weight matrix at index i holds the weights between the layer i and layer i + 1.

print(clf.coefs_)
The Python code above returned the following:
[array([[-0.14203691, -1.18304359, -0.85567518, -4.53250719, -0.60466275],
        [-0.69781111, -3.5850093 , -0.26436018, -4.39161248,  0.06644423]]),
 array([[ 0.29179638, -0.14155284],
        [ 4.02666592, -0.61556475],
        [-0.51677234,  0.51479708],
        [ 7.37215202, -0.31936965],
        [ 0.32920668,  0.64428109]]),
 array([[-4.96774269],
        [-0.86330397]])]
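
The shapes of these weight matrices reflect the architecture of the network: two input nodes, five neurons in the first hidden layer, two neurons in the second hidden layer and one output neuron. We can verify this directly:

for i, coef in enumerate(clf.coefs_):
    # weight matrix between layer i and layer i + 1
    print(i, coef.shape)

This prints the shapes (2, 5), (5, 2) and (2, 1).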

The summation formula of the neuron $H_{00}$ is defined by:

$$w_0x_0 + w_1x_1 + w_{B_{00}} \cdot B_{00}$$

which can be written as

$$w_0x_0 + w_1x_1 + w_{B_{00}}$$

because $B_{00} = 1$.

We can get the values for $w_0$ and $w_1$ from clf.coefs_ like this:

$w_0 =$ clf.coefs_[0][0][0] and $w_1 =$ clf.coefs_[0][1][0]

print("w0 = ", clf.coefs_[0][0][0])
print("w1 = ", clf.coefs_[0][1][0])
w0 =  -0.142036912678
w1 =  -0.697811114978

The weight vector of $H_{00}$ can be accessed with

clf.coefs_[0][:,0]
The above code returned the following:
array([-0.14203691, -0.69781111])

We can generalize the above to access a neuron $H_{ij}$ in the following way:

for i in range(len(clf.coefs_)):
    number_neurons_in_layer = clf.coefs_[i].shape[1]
    for j in range(number_neurons_in_layer):
        weights = clf.coefs_[i][:,j]
        print(i, j, weights, end=", ")
        print()
    print()
0 0 [-0.14203691 -0.69781111], 
0 1 [-1.18304359 -3.5850093 ], 
0 2 [-0.85567518 -0.26436018], 
0 3 [-4.53250719 -4.39161248], 
0 4 [-0.60466275  0.06644423], 
1 0 [ 0.29179638  4.02666592 -0.51677234  7.37215202  0.32920668], 
1 1 [-0.14155284 -0.61556475  0.51479708 -0.31936965  0.64428109], 
2 0 [-4.96774269 -0.86330397], 

intercepts_ is a list of bias vectors, where the vector at index i represents the bias values added to layer i+1.

print(clf.intercepts_)
The above Python code returned the following output:
[array([-0.14962269, -0.59232707, -0.5472481 ,  7.02667699, -0.87510813]),
 array([-3.61417672, -0.76834882]),
 array([ 8.48188176])]
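
With the weights from coefs_ and the bias values from intercepts_ we can reproduce the summation of the neuron $H_{00}$ by hand. The following lines are a minimal sketch of this computation for the input sample [1., 1.], using the rectified linear unit ('relu'), which is the default activation function of MLPClassifier:

import numpy as np
x = np.array([1., 1.])             # input sample
w = clf.coefs_[0][:, 0]            # weights w0 and w1 of H00
b = clf.intercepts_[0][0]          # bias value of H00
summation = np.dot(w, x) + b       # weighted linear summation plus bias
output = np.maximum(0, summation)  # 'relu' activation applied to the sum
print(summation, output)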

The main reason why we train a classifier is to predict results for new samples. We can do this with the predict method. The method returns the predicted class for a sample, in our case a "0" or a "1":

result = clf.predict([[0, 0], [0, 1], 
                      [1, 0], [0, 1], 
                      [1, 1], [2., 2.],
                      [1.3, 1.3], [2, 4.8]])

Instead of just looking at the class results, we can also use the predict_proba method to get the probability estimates.

prob_results = clf.predict_proba([[0, 0], [0, 1], 
                                  [1, 0], [0, 1], 
                                  [1, 1], [2., 2.], 
                                  [1.3, 1.3], [2, 4.8]])
print(prob_results)
[[  1.00000000e+000   5.25723951e-101]
 [  1.00000000e+000   3.71534882e-031]
 [  1.00000000e+000   6.47069178e-029]
 [  1.00000000e+000   3.71534882e-031]
 [  2.07145538e-004   9.99792854e-001]
 [  2.07145538e-004   9.99792854e-001]
 [  2.07145538e-004   9.99792854e-001]
 [  2.07145538e-004   9.99792854e-001]]

prob_results[i][0] gives us the probability for class 0, i.e. a "0", and prob_results[i][1] the probability for a "1"; i corresponds to the ith sample.
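
Each row of prob_results contains the probabilities of the two classes and sums to 1; taking the index of the larger value per row gives the same class labels that predict returns. A short sketch:

import numpy as np
# the index of the largest probability per row is the predicted class
print(np.argmax(prob_results, axis=1))
# the two probabilities in every row add up to 1
print(np.sum(prob_results, axis=1))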

Another Example

We will populate two clusters (class 0 and class 1) in a two-dimensional space.

import numpy as np
from matplotlib import pyplot as plt
npoints = 50
X, Y = [], []
# class 0
X.append(np.random.uniform(low=-2.5, high=2.3, size=(npoints,)) )
Y.append(np.random.uniform(low=-1.7, high=2.8, size=(npoints,)))
# class 1
X.append(np.random.uniform(low=-7.2, high=-4.4, size=(npoints,)) )
Y.append(np.random.uniform(low=3, high=6.5, size=(npoints,)))
learnset = []
learnlabels = []
for i in range(2):
    # adding points of class i to learnset
    points = zip(X[i], Y[i])
    for p in points:
        learnset.append(p)
        learnlabels.append(i)
npoints_test = 3 * npoints
TestX = np.random.uniform(low=-7.2, high=5, size=(npoints_test,)) 
TestY = np.random.uniform(low=-4, high=9, size=(npoints_test,))
testset = []
points = zip(TestX, TestY)
for p in points:
    testset.append(p)
colours = ["b", "r"]
for i in range(2):
    plt.scatter(X[i], Y[i], c=colours[i])
plt.scatter(TestX, TestY, c="g")
plt.show()

We will train an MLPClassifier for our two classes:

from sklearn.neural_network import MLPClassifier
# mlp = MLPClassifier(hidden_layer_sizes=(100, 100), max_iter=400, alpha=1e-4,
#                     solver='sgd', verbose=10, tol=1e-4, random_state=1)
mlp = MLPClassifier(hidden_layer_sizes=(20, 3), max_iter=150, alpha=1e-4,
                    solver='sgd', verbose=10, tol=1e-4, random_state=1,
                    learning_rate_init=.1)
mlp.fit(learnset, learnlabels)
print("Training set score: %f" % mlp.score(learnset, learnlabels))
# our testset is unlabelled, so we cannot compute a test score for it
mlp.classes_
Iteration 1, loss = 0.47209614
Iteration 2, loss = 0.44614294
Iteration 3, loss = 0.41336245
Iteration 4, loss = 0.37903617
Iteration 5, loss = 0.34893492
Iteration 6, loss = 0.31801372
Iteration 7, loss = 0.28795204
Iteration 8, loss = 0.25973898
Iteration 9, loss = 0.23339132
Iteration 10, loss = 0.20923182
Iteration 11, loss = 0.18742655
Iteration 12, loss = 0.16785779
Iteration 13, loss = 0.15037921
Iteration 14, loss = 0.13479158
Iteration 15, loss = 0.12095939
Iteration 16, loss = 0.10880727
Iteration 17, loss = 0.09810485
Iteration 18, loss = 0.08870370
Iteration 19, loss = 0.08049147
Iteration 20, loss = 0.07329201
Iteration 21, loss = 0.06696649
Iteration 22, loss = 0.06140222
Iteration 23, loss = 0.05650041
Iteration 24, loss = 0.05217473
Iteration 25, loss = 0.04835234
Iteration 26, loss = 0.04497095
Iteration 27, loss = 0.04196786
Iteration 28, loss = 0.03929475
Iteration 29, loss = 0.03690869
Iteration 30, loss = 0.03477277
Iteration 31, loss = 0.03285525
Iteration 32, loss = 0.03112890
Iteration 33, loss = 0.02957041
Iteration 34, loss = 0.02815974
Iteration 35, loss = 0.02687962
Iteration 36, loss = 0.02571506
Iteration 37, loss = 0.02465300
Iteration 38, loss = 0.02368203
Iteration 39, loss = 0.02279213
Iteration 40, loss = 0.02197453
Iteration 41, loss = 0.02122149
Iteration 42, loss = 0.02052625
Iteration 43, loss = 0.01988283
Iteration 44, loss = 0.01928600
Iteration 45, loss = 0.01873112
Iteration 46, loss = 0.01821413
Iteration 47, loss = 0.01773141
Iteration 48, loss = 0.01727976
Iteration 49, loss = 0.01685633
Iteration 50, loss = 0.01645859
Iteration 51, loss = 0.01608425
Iteration 52, loss = 0.01573129
Iteration 53, loss = 0.01539788
Iteration 54, loss = 0.01508238
Iteration 55, loss = 0.01478333
Iteration 56, loss = 0.01449938
Iteration 57, loss = 0.01422935
Iteration 58, loss = 0.01397216
Iteration 59, loss = 0.01372683
Iteration 60, loss = 0.01349248
Iteration 61, loss = 0.01326831
Iteration 62, loss = 0.01305360
Iteration 63, loss = 0.01284768
Iteration 64, loss = 0.01264995
Iteration 65, loss = 0.01245986
Iteration 66, loss = 0.01227692
Iteration 67, loss = 0.01210067
Iteration 68, loss = 0.01193067
Iteration 69, loss = 0.01176657
Iteration 70, loss = 0.01160798
Iteration 71, loss = 0.01145461
Iteration 72, loss = 0.01130613
Iteration 73, loss = 0.01116229
Iteration 74, loss = 0.01102282
Iteration 75, loss = 0.01088750
Iteration 76, loss = 0.01075610
Iteration 77, loss = 0.01062842
Iteration 78, loss = 0.01050428
Iteration 79, loss = 0.01038351
Iteration 80, loss = 0.01026593
Iteration 81, loss = 0.01015136
Iteration 82, loss = 0.01003970
Iteration 83, loss = 0.00993082
Iteration 84, loss = 0.00982460
Iteration 85, loss = 0.00972093
Iteration 86, loss = 0.00961971
Iteration 87, loss = 0.00952082
Iteration 88, loss = 0.00942417
Iteration 89, loss = 0.00932969
Training loss did not improve more than tol=0.000100 for two consecutive epochs. Stopping.
Training set score: 1.000000
The previous Python code returned the following output:
array([0, 1])
predictions = mlp.predict(testset)
predictions
The previous Python code returned the following output:
array([1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,
       1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1,
       1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1,
       1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0,
       1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1,
       1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0,
       0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0])
testset = np.array(testset)
# plot the training clusters in light colours
colours = ['#C0FFFF', "#FFC8C8"]
for i in range(2):
    plt.scatter(X[i], Y[i], c=colours[i])
# plot the test points as diamonds, coloured with their predicted class
colours = ["b", "r"]
for i in range(2):
    cls = testset[predictions==i]
    Xt, Yt = zip(*cls)
    plt.scatter(Xt, Yt, marker="D", c=colours[i])
plt.show()
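
Once the classifier is trained, arbitrary new points in the plane can be classified in the same way. A small sketch with two made-up points:

# two made-up points, one near each cluster
new_points = [[-6.0, 5.0], [0.5, 1.0]]
print(mlp.predict(new_points))
print(mlp.predict_proba(new_points))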