Build Your First Neural Network With Python And Keras



Keras is currently one of the most popular software frameworks for deep learning in Python.

It is a higher-level API that makes it extremely simple to build deep neural networks on top of backends such as TensorFlow, Theano, and CNTK.

In this example, we will use the MNIST dataset that is included with the Keras package.

If you don’t have Keras installed already, you can install it easily with pip.
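
For example, assuming you are working in a standard Python environment with pip available, the following command installs Keras. Depending on your Keras version, you may also need to install a backend such as TensorFlow (pip install tensorflow):

pip install keras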

Once our programming environment is set up, let's first import Keras.

import keras

Next, we need to access the MNIST dataset. The MNIST dataset is a collection of images of handwritten digits ranging from 0 through 9.

The images are 28 by 28 pixels and are labeled with the correct digit. There are 60,000 training images and 10,000 test images in the set. You can learn more about MNIST from its official database page.

To see what the handwritten digits actually look like, we can plot a few examples ourselves once the data is loaded (see the short sketch below).

After your initial Keras import, let's add the following.

from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

The code above loads each image's pixel values into the variables train_images and test_images. The variables train_labels and test_labels store the correct digit for each associated image.
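
Before moving on, it can help to inspect what we just loaded. Below is a minimal sketch; the plotting portion assumes you have matplotlib installed:

# each image is a 28x28 array of pixel intensities
print(train_images.shape)  # (60000, 28, 28)
print(test_images.shape)   # (10000, 28, 28)
print(train_labels[:5])    # e.g. [5 0 4 1 9]

import matplotlib.pyplot as plt

# display the first training image along with its label
plt.imshow(train_images[0], cmap='gray')
plt.title(str(train_labels[0]))
plt.show()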

Next up, let's start building our Keras neural network. The one we build in this post will have only a single hidden layer to keep things simple.


from keras import models
from keras import layers

# a Sequential model is a linear stack of layers
model = models.Sequential()
# hidden layer: 512 units with ReLU activation, taking flattened 784-pixel images
model.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
# output layer: one softmax probability for each digit class (0-9)
model.add(layers.Dense(10, activation='softmax'))

In the above code, we first imported models and layers from Keras to keep our model-building code concise. Next, we create a Sequential model and store it in the variable ‘model’.

The Sequential model is just a linear stack of layers that pass data from one layer to the next with transformations at each step.

The first layer we add is a dense layer with 512 units. It uses the ReLU activation function and an input shape of (28 * 28,), meaning each sample is a flattened vector of 784 pixel values; the number of samples we pass in is handled automatically.

The dense layer is the standard fully connected layer in Keras, mapping a representation of the input data to the output. The 512 units in our dense layer help the model learn a representation of the data; the unit count is a hyperparameter that can be tuned to best fit a given dataset.

ReLU is a popular activation function that gives good results and fast training times. We won't go into too much detail on it in this post, but it simply outputs its input when positive and zero otherwise: ReLU(x) = max(0, x).
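
As a minimal illustration in plain Python (no Keras required), ReLU just clips negative values to zero:

def relu(x):
    # pass positive values through unchanged, zero out negatives
    return max(0.0, x)

print([relu(x) for x in [-2.0, -0.5, 0.0, 1.5, 3.0]])
# [0.0, 0.0, 0.0, 1.5, 3.0]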

After the initial 512-unit dense layer, we add a second dense layer with 10 output units and a softmax activation function.

The softmax activation gives us a probability for each of the 10 classes that the input could belong to. In our case, the 10 output probabilities represent the probability of each digit 0-9.
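
To make that concrete, here is a small NumPy sketch of softmax (the three scores below are arbitrary example values, not output from our model); note how the results all lie between 0 and 1 and sum to 1:

import numpy as np

def softmax(z):
    # subtract the max for numerical stability, then normalize the exponentials
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)        # approximately [0.659 0.242 0.099]
print(probs.sum())  # 1.0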

Keras automatically infers the input shape of every layer after the first, so we don't have to specify that the second layer receives 512 inputs.

After setting up our model architecture with two dense layers, we are ready to compile.

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

In the above code, we compiled our Sequential model with a loss function of categorical cross-entropy, the RMSprop optimizer, and accuracy as an evaluation metric.

Categorical cross-entropy is a standard loss function for multi-class classification problems where each example belongs to exactly one class.
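
As a rough sketch of what this loss computes (using NumPy with made-up numbers; Keras' actual implementation differs in its details), categorical cross-entropy reduces to the negative log of the probability the model assigned to the true class:

import numpy as np

# one-hot true label for the digit 6
y_true = np.zeros(10)
y_true[6] = 1.0

# hypothetical predicted probabilities from a softmax layer
y_pred = np.full(10, 0.02)
y_pred[6] = 0.82

# only the true class term survives the sum: loss = -log(0.82)
loss = -np.sum(y_true * np.log(y_pred))
print(loss)  # approximately 0.198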

RMSprop is a solid general-purpose optimizer; it adapts the learning rate for each weight as the model trains.

Accuracy, the fraction of examples classified correctly, will tell us how well our model performs; measured on the test set, it tells us how well the model generalizes to unseen data.


Before we can train our model on the MNIST dataset, we need to preprocess the data so that it can be fed into the network.

Preprocessing must be applied identically to both the train and test sets so that they stay consistent.

First, we reshape our image data, currently 28×28 per image, by flattening it. Each image becomes a vector of 784 values.

Next, we scale our pixel values to be between 0 and 1. They are currently integers between 0 and 255, so we first convert them to floats and then divide by 255.


from keras.utils import to_categorical

# reshape our training image data by flattening each 28x28 image into a 784-length vector
train_images = train_images.reshape((60000, 28 * 28))
# scale our pixel values so that they fall between 0 and 1
train_images = train_images.astype('float32') / 255

# reshape our testing image data the same way
test_images = test_images.reshape((10000, 28 * 28))
# scale the test pixel values identically
test_images = test_images.astype('float32') / 255

# one-hot encoding our labels
# e.g. a label of 6 is transformed to the list
# [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
# all values are 0 except at the index of the digit it represents

# encode our train labels into one-hot vectors
train_labels = to_categorical(train_labels)
# encode our test labels
test_labels = to_categorical(test_labels)
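
As a quick sanity check of the encoding (a minimal sketch, reusing the to_categorical import from above), an integer label becomes a one-hot vector:

print(to_categorical([6], num_classes=10))
# [[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]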


Lastly, we can fit our neural network to the processed training data. Once the model is fit, it can be used to predict labels for new data.

In our case, we will evaluate the model on the test images and their respective labels.


model.fit(train_images, train_labels, epochs=5, batch_size=128)
test_loss, test_acc = model.evaluate(test_images, test_labels)

We fit our model on the training images and labels by running through the entire training set 5 times (5 epochs), updating the model's weights after every batch of 128 training examples; with 60,000 training images, that is about 469 weight updates per epoch.

After fitting, we can take a look at how our model is doing by testing it on unseen data.

The fitted model predicts labels for the test_images, and those predictions are then compared with the actual labels to determine accuracy.

The results of this evaluation are stored in the test_loss and test_acc variables.
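
If you also want to inspect individual predictions rather than just aggregate metrics, here is a short sketch (assuming NumPy and the preprocessed test_images and test_labels from above):

import numpy as np

# predicted probability distributions for the first 5 test images
predictions = model.predict(test_images[:5])

# the most probable digit for each image
print(np.argmax(predictions, axis=1))
# the ground-truth digits, recovered from the one-hot labels
print(np.argmax(test_labels[:5], axis=1))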

You should be able to reach a test accuracy above 0.97 (97%) even with this simple architecture.

The complete code for this post can be found below:


import keras
from keras import models
from keras import layers
from keras.datasets import mnist
from keras.utils import to_categorical


(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
model.add(layers.Dense(10, activation='softmax'))

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

model.fit(train_images, train_labels, epochs=5, batch_size=128)

test_loss, test_acc = model.evaluate(test_images, test_labels)