Convolutional Neural Networks (CNNs) are the foundation of deep learning for computer vision, including tasks such as image classification and object detection. TensorFlow lets you build CNN architectures with a great deal of flexibility, but it can be challenging at first.
Concepts Required to Understand CNN on TensorFlow
What is a tensor?
A tensor is the basic data structure used to represent deep learning data. It is a multidimensional array whose number of dimensions (its rank) depends on the data it holds: a scalar has rank 0, a vector has rank 1 and a matrix has rank 2. For example, a 3-dimensional tensor is a “cube” storing values along three axes, such as the height, width and color channels of an image.
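As a rough illustration, the sketch below models a tensor as nested Python lists and derives its shape and rank. It uses plain Python rather than TensorFlow, and the helper name shape_of is hypothetical; in TensorFlow itself, a tensor reports the same information through its shape attribute.

```python
def shape_of(nested):
    """Return the shape of a regular (non-ragged) nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return tuple(shape)


# A batch of 2 grayscale images, each 3 pixels high and 4 pixels wide.
batch = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]

shape = shape_of(batch)
rank = len(shape)  # the rank is the number of axes
print(shape, rank)
```

A plain number has an empty shape and rank 0, a flat list has rank 1, and each extra level of nesting adds one axis.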
What is a computational graph?
The TensorFlow computational graph represents the flow of operations that occur during training of a deep learning model: each node is an operation, and each edge carries a tensor from one operation to the next. For CNN models, the computational graph can be quite complex. You can visualize your model’s computational graph using TensorBoard.
What is a constant?
A constant stores a value that does not change. Use it for nodes that must stay the same throughout model training. A constant does not take inputs.
What is a placeholder?
A placeholder is used to feed input when running a model. It has no value of its own when the graph is built; you supply the value at runtime, each time you run the computational graph.
What is a variable?
A variable holds a value that can change while the graph runs. Variables add trainable parameters, such as weights and biases, to the graph, and these are updated during training.
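To make the difference between these node types concrete, here is a toy sketch in plain Python. It is not TensorFlow's API: the classes Constant, Placeholder, Variable, Add and Multiply are hypothetical stand-ins that only mimic how a graph is built first and evaluated later with fed values.

```python
class Constant:
    """A node whose value is fixed when the graph is built."""
    def __init__(self, value):
        self.value = value

    def evaluate(self, feed):
        return self.value


class Placeholder:
    """A node with no value of its own; it is fed at run time."""
    def __init__(self, name):
        self.name = name

    def evaluate(self, feed):
        return feed[self.name]


class Variable:
    """A node holding state that training is allowed to update."""
    def __init__(self, initial):
        self.value = initial

    def evaluate(self, feed):
        return self.value


class Add:
    """An operation node that sums its two input nodes."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, feed):
        return self.left.evaluate(feed) + self.right.evaluate(feed)


class Multiply:
    """An operation node that multiplies its two input nodes."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, feed):
        return self.left.evaluate(feed) * self.right.evaluate(feed)


# Build the graph y = w * x + b once; nothing is computed yet.
x = Placeholder("x")   # input, supplied on every run
w = Variable(0.5)      # trainable weight
b = Constant(1.0)      # fixed offset
y = Add(Multiply(w, x), b)

first = y.evaluate({"x": 4.0})   # 0.5 * 4.0 + 1.0
w.value = 2.0                    # stands in for an update made by a training step
second = y.evaluate({"x": 4.0})  # 2.0 * 4.0 + 1.0
print(first, second)
```

The same graph gives a different result after the variable changes, while the constant stays fixed and the placeholder takes whatever value is fed for that run.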
TensorFlow API levels
TensorFlow lets you work directly with tensors to build a neural network from the ground up. However, these low-level APIs can be quite complex, so TensorFlow recommends working with the higher-level Estimators API. An Estimator is an object, at a higher level of abstraction, that wraps a model and handles its training, evaluation and prediction.
TensorFlow Object Detection
TensorFlow provides the Object Detection API, an open source framework built on top of TensorFlow for constructing, training and deploying object detection models.
Basic TensorFlow CNN Example: Using MNIST Dataset with Estimators
A great way to get started with CNN on TensorFlow is to work with examples based on standard datasets. These datasets are built into TensorFlow and will give you predictable results, helping you learn to run and tune a model.
The TensorFlow MNIST example builds an Estimator that creates a Convolutional Neural Network, which can classify handwritten digits in the MNIST dataset. Below are the general steps.
Architecture:
- Convolutional layer with 32 5×5 filters
- Pooling layer with 2×2 filter
- Convolutional layer with 64 5×5 filters
- Pooling layer with 2×2 filter
- Dense layer with 1024 neurons
- Dense layer with 10 neurons, to predict the digit for the current image
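To see how the image shrinks as it passes through these layers, the sketch below traces the tensor shape through the architecture above. It is plain Python, not TensorFlow code, and it rests on assumptions the list does not state: a 28×28×1 input, stride-1 convolutions with 'same' padding, and pooling with a stride of 2. The helper names conv_same and pool are hypothetical.

```python
def conv_same(height, width, filters):
    """Stride-1 convolution with 'same' padding keeps height and width."""
    return height, width, filters


def pool(height, width, channels, size=2, stride=2):
    """Unpadded pooling: each axis becomes floor((n - size) / stride) + 1."""
    out_h = (height - size) // stride + 1
    out_w = (width - size) // stride + 1
    return out_h, out_w, channels


shape = (28, 28, 1)                           # one grayscale MNIST image
trace = [shape]

shape = conv_same(shape[0], shape[1], 32)     # 32 filters of 5x5
trace.append(shape)
shape = pool(*shape)                          # 2x2 pooling
trace.append(shape)
shape = conv_same(shape[0], shape[1], 64)     # 64 filters of 5x5
trace.append(shape)
shape = pool(*shape)                          # 2x2 pooling
trace.append(shape)

flat = shape[0] * shape[1] * shape[2]         # inputs to the 1024-neuron dense layer
for step in trace:
    print(step)
print("flattened size:", flat)
```

Under these assumptions, the flattened size is what the first dense layer receives; with different padding or strides the numbers change.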
Process:
- Build the input layer using the reshape() function.
- Build the convolutional/pooling layers using the layers.conv2d() and layers.max_pooling2d() functions.
- Build the dense layers using the layers.dense() function.
- Generate predictions by running the softmax() function.
- Calculate loss by running the losses.sparse_softmax_cross_entropy() function.
- Configure the training operation using the optimizer.minimize() function.
- Add an evaluation metric using the tf.metrics.accuracy() function.
- Load data using the mnist.load_data() function.
- Define an Estimator for the custom model (the example provides a ready-made estimator for MNIST data).
- Train the model by running the train() function on the Estimator object.
- Evaluate the model on the MNIST evaluation images using the evaluate() function.
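The prediction and loss steps above can be sketched in plain Python to show what softmax and sparse softmax cross-entropy compute. This is an illustrative re-implementation, not the TensorFlow functions themselves; the names softmax and sparse_cross_entropy are hypothetical and the logits are made-up values.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_cross_entropy(logits, label):
    """Negative log-probability of the true class, given an integer label."""
    return -math.log(softmax(logits)[label])


# Hypothetical output of the final 10-neuron layer for one image.
logits = [0.0] * 10
logits[7] = 4.0

probs = softmax(logits)
predicted = probs.index(max(probs))             # most likely digit
loss_correct = sparse_cross_entropy(logits, 7)  # true label matches the prediction
loss_wrong = sparse_cross_entropy(logits, 3)    # true label disagrees with it
print(predicted, loss_correct, loss_wrong)
```

"Sparse" here means the label is a single integer class index rather than a one-hot vector. The loss is small when the network assigns high probability to the true digit and large otherwise, which is the signal the optimizer minimizes.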
Example code and tutorial: https://www.tensorflow.org/tutorials/estimator/keras_model_to_estimator
Intermediate TensorFlow CNN Example: Fashion-MNIST Dataset with Estimators
This is a slightly more advanced example using 28×28 grayscale images of 70,000 fashion products in 10 categories. The dataset was presented in an article by Xiao, Rasul and Vollgraf. The example treats it as external to TensorFlow, so you’ll need to import it and perform some pre-processing.
Architecture: The model uses three convolutional layers:
- Convolutional layer with 32 3×3 filters
- Max pooling layer with 2×2 filter
- Convolutional layer with 64 3×3 filters
- Max pooling layer with 2×2 filter
- Convolutional layer with 128 3×3 filters
- Max pooling layer with 2×2 filter
- Flattening
- Dense layer with 128 neurons
- Output layer with 10 neurons corresponding to the 10 fashion categories
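For a sense of the model's size, the sketch below counts trainable parameters for this architecture in plain Python. It assumes 3×3 kernels, a single input channel, and pooling that rounds up (28 → 14 → 7 → 4), so the flattened input to the dense layer is 4×4×128; none of these details are stated in the list above, and the helper names are hypothetical.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel convolution."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return inputs * units + units


layers = {
    "conv1": conv_params(3, 1, 32),
    "conv2": conv_params(3, 32, 64),
    "conv3": conv_params(3, 64, 128),
    "dense": dense_params(4 * 4 * 128, 128),  # assumes a 4x4x128 flattened input
    "output": dense_params(128, 10),
}

total = sum(layers.values())
for name, count in layers.items():
    print(name, count)
print("total", total)
```

Most of the parameters sit in the first dense layer, which is typical for small CNNs of this shape.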
Process:
- Load data and one-hot encode the labels, so that each label becomes a 10-element vector matching the output layer.
- Reshape images to 28x28x1.
- Define network parameters and placeholders.
- Build the network architecture using the conv_net() function, which relies on conv2d() and maxpool2d().
- Add loss and optimizer nodes.
- Add an evaluation node.
- Train and test the model.
- Plot accuracy and loss between training and validation data.
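The two pre-processing steps at the start of this process can be sketched in plain Python. This is an illustration under stated assumptions, not the tutorial's code: it assumes each image arrives as a flat list of 784 pixel values and each label as an integer from 0 to 9, and the helper names one_hot and reshape_image are hypothetical.

```python
def one_hot(label, num_classes=10):
    """Encode an integer class label as a vector with a single 1.0."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


def reshape_image(flat_pixels, height=28, width=28):
    """Turn a flat list of height*width pixels into height x width x 1."""
    if len(flat_pixels) != height * width:
        raise ValueError("unexpected number of pixels")
    return [
        [[flat_pixels[row * width + col]] for col in range(width)]
        for row in range(height)
    ]


flat = [i / 783 for i in range(784)]  # made-up pixel values in [0, 1]
image = reshape_image(flat)
encoded = one_hot(3)

print(len(image), len(image[0]), len(image[0][0]))
print(encoded)
```

The trailing dimension of size 1 is the single grayscale channel, which convolutional layers expect even when there is only one.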
Example code and tutorial: https://indiantechwarrior.com/cnn-based-image-classification-model-in-keras/
Advanced TensorFlow CNN Example: CIFAR10 without Estimators
This example shows how to build a CNN on TensorFlow without an Estimator, using lower-level APIs that give you much more control over network structure and parameters.
In this example, you classify an RGB 32×32 pixel image across 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The example includes a multi-GPU version which will show you how to scale up your model.
Architecture: Alternating convolutions and nonlinearities, followed by fully connected layers, ending with a softmax classifier.
Process:
- Crop images to 24×24 pixels and apply random distortions to artificially increase the dataset size.
- Use the inference() function to compute predictions.
- Train the model using standard gradient descent (see our in-depth guide on Backpropagation), to minimize the cross-entropy loss computed on the softmax output (see our guide on activation functions).
- Launch the model using the training script, which reports total loss every 10 steps and the processing speed for the last batch of data.
- Evaluate the model using the evaluation script. It tests the model on all 10,000 images in the evaluation set of CIFAR-10, and displays accuracy.
- Train the same model on multiple GPUs by running the separate GPU training script. This creates several replicas of the model and runs each of them on a different GPU, on a subset of the training data.
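The cropping and distortion step at the start of this process can be sketched in plain Python. This is only an illustration: it assumes the distortions are a random crop followed by a random horizontal flip, which the text above does not specify, and the helper names random_crop, random_flip and distort are hypothetical. The placeholder pixels record their source position so the effect is easy to inspect.

```python
import random


def random_crop(image, size, rng):
    """Cut a size x size window at a random position from a 2-D grid of pixels."""
    top = rng.randint(0, len(image) - size)
    left = rng.randint(0, len(image[0]) - size)
    return [row[left:left + size] for row in image[top:top + size]]


def random_flip(image, rng):
    """Mirror the image left to right with probability 0.5."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return image


def distort(image, rng, size=24):
    """Apply a random crop followed by a random horizontal flip."""
    return random_flip(random_crop(image, size, rng), rng)


rng = random.Random(0)  # seeded so repeated runs produce the same crops
# A 32x32 stand-in image whose "pixels" are their own (row, column) positions.
image = [[(r, c) for c in range(32)] for r in range(32)]

sample = distort(image, rng)
print(len(sample), len(sample[0]), sample[0][0])
```

Because each call picks a new window and flip, the same source image yields many slightly different training samples, which is how this kind of distortion increases the effective dataset size.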
Example code and tutorial: https://www.tensorflow.org/tutorials/images/cnn