The widespread adoption of Convolutional Neural Networks (CNNs) has driven progress in deep learning for computer vision, and especially in object detection. Architectures such as Faster R-CNN, R-FCN, Multibox, SSD, and YOLO provide a framework for modern object detectors.
TensorFlow, a deep learning framework, lets you build Faster R-CNN architectures that automatically detect objects in images.
Overview of R-CNN Algorithms for Object Detection
There are four main variants in the R-CNN family: R-CNN, Fast R-CNN, Faster R-CNN, and Mask R-CNN. Each variant attempts to speed up or improve on the object detection results of its predecessor.
Learn more about the R-CNN Models at Computer Vision with Deep Learning.
Running Faster R-CNN on TensorFlow: Typical Steps
The following is a general process that many practitioners use to run Faster R-CNN on TensorFlow:
Step | TensorFlow Documentation |
--- | --- |
1. A ConvNet produces a feature map from the input image. | Build a Convolutional Neural Network using Estimators |
2. A region proposal network (RPN) is applied to the feature map and returns object proposals, each with an objectness score. | Image segmentation with tf.keras |
3. An RoI pooling layer is applied to these proposals to produce a small feature map of fixed size for each one. | tf.nn.max_pool |
4. The pooled features are passed to fully connected layers, followed by a softmax layer that classifies each proposal and a linear regression layer that refines its bounding box. | tf.contrib.layers.fully_connected (TensorFlow 1.x only; tf.contrib was removed in TensorFlow 2.x, where tf.keras.layers.Dense fills this role) |
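To make step 3 concrete, here is a minimal pure-Python sketch of RoI max pooling. It assumes a single-channel feature map stored as a list of rows and a backbone stride of 16 (a common value for VGG16's conv5 features, assumed here rather than read from the repository). `roi_max_pool` is a hypothetical helper, not part of TensorFlow or the repository.

```python
import math
from typing import List, Sequence


def roi_max_pool(feature_map: List[List[float]],
                 roi: Sequence[float],
                 output_size: int = 2,
                 stride: int = 16) -> List[List[float]]:
    """Max-pool one proposal into an output_size x output_size grid.

    feature_map: single-channel map as a list of rows (H x W).
    roi: (x1, y1, x2, y2) in input-image pixel coordinates.
    stride: image-to-feature-map downsampling factor (assumed 16).
    """
    height, width = len(feature_map), len(feature_map[0])
    # Project the proposal from image coordinates onto the feature map.
    x1, y1 = math.floor(roi[0] / stride), math.floor(roi[1] / stride)
    x2, y2 = math.ceil(roi[2] / stride), math.ceil(roi[3] / stride)
    # Clip to the map and keep at least one cell in each direction.
    x1, y1 = max(0, min(x1, width - 1)), max(0, min(y1, height - 1))
    x2, y2 = max(x1 + 1, min(x2, width)), max(y1 + 1, min(y2, height))

    bin_h = (y2 - y1) / output_size
    bin_w = (x2 - x1) / output_size
    pooled = []
    for i in range(output_size):
        row = []
        for j in range(output_size):
            # Integer bounds of this bin; every bin covers at least one cell.
            ys = y1 + math.floor(i * bin_h)
            ye = max(ys + 1, y1 + math.ceil((i + 1) * bin_h))
            xs = x1 + math.floor(j * bin_w)
            xe = max(xs + 1, x1 + math.ceil((j + 1) * bin_w))
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    fm = [[4 * y + x for x in range(4)] for y in range(4)]
    print(roi_max_pool(fm, (0, 0, 64, 64)))  # [[5, 7], [13, 15]]
    print(roi_max_pool(fm, (0, 0, 64, 32)))  # [[1, 3], [5, 7]]
```

Whatever the size of the proposal, the result is always `output_size` by `output_size`, which is what lets the fully connected layers in step 4 accept proposals of any shape. Real implementations apply the same idea per channel on batched tensors.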
Faster R-CNN with TensorFlow and Keras
Let's begin with the code for training and testing our Faster R-CNN network.
First, clone the repository containing the working code:
git clone https://github.com/indiantechwarrior/faster_rcnn_tensorflow.git
cd faster_rcnn_tensorflow
Install requirements
pip install -r requirements.txt
The pretrained weights are checked in with the repository and are also available from the link below. Starting from ImageNet-pretrained VGG16 weights significantly speeds up training.
https://drive.google.com/file/d/1IgxPP0aI5pxyPHVSM2ZJjN1p9dtE4_64/view?usp=sharing
# create the directories, then place the pretrained model in the vgg directory
mkdir models
mkdir vgg
Now let's test object detection with the pretrained Faster R-CNN model.
Open the project in PyCharm and set the --path and --write arguments in the run configuration, or run the following command directly from the command line:
python test_frcnn.py --path /home/name/PycharmProjects/faster_rcnn_tensorflow/images/ --write /home/name/PycharmProjects/faster_rcnn_tensorflow/results
This picks up every image in the --path folder and runs object detection on it. Each detection is printed to the console with its confidence score, and a rectangle is drawn around each detected object. The annotated output images are written to the --write (results) folder.
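Before rectangles are drawn, detectors usually discard low-confidence detections and remove duplicate boxes with non-maximum suppression (NMS). The sketch below shows greedy NMS in plain Python. The 0.7 score threshold, the 0.5 IoU threshold, and the function names are illustrative assumptions, not values taken from test_frcnn.py.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes: List[Box], scores: List[float],
                      score_thresh: float = 0.7,
                      iou_thresh: float = 0.5) -> List[int]:
    """Return indices of kept boxes: drop low scores, then apply greedy NMS."""
    # Highest-scoring candidates are considered first.
    order = sorted((i for i, s in enumerate(scores) if s >= score_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
    scores = [0.9, 0.8, 0.75, 0.5]
    print(filter_detections(boxes, scores))  # [0, 2]
```

In a multi-class detector, this filtering is typically applied separately for each class.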
Now let's train the Faster R-CNN model on custom data using transfer learning.
First, download the ImageNet-pretrained weights for the backbone you plan to use:
# for VGG16
wget https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5
# for mobilenetv1
wget https://github.com/fchollet/deep-learning-models/releases/download/v0.6/mobilenet_1_0_224_tf.h5
# for mobilenetv2
wget https://github.com/JonathanCMitchell/mobilenet_v2_keras/releases/download/v1.1/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224.h5
# for resnet 50
wget https://github.com/fchollet/deep-learning-models/releases/download/v0.1/resnet50_weights_tf_dim_ordering_tf_kernels.h5
Place the downloaded weights in a newly created directory named pretrain:
mkdir pretrain
Other pretrained TensorFlow models are available at:
https://github.com/fchollet/deep-learning-models/releases/
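Training on custom data with `-o simple` requires an annotation file. The sketch below assumes a format of one box per line, `filepath,x1,y1,x2,y2,class_name`; this format is an assumption, so check the repository's parser before relying on it. The hypothetical `parse_simple_annotations` function groups boxes by image, counts boxes per class, and adds a background class `bg` with a count of 0, mirroring the shape of the "Training images per class" and "Num classes (including bg)" lines in the training log later in this section.

```python
import csv
import io
from collections import Counter, defaultdict
from typing import Dict, List, TextIO, Tuple


def parse_simple_annotations(stream: TextIO) -> Tuple[Dict[str, List[dict]], Dict[str, int]]:
    """Parse lines of the (assumed) form 'filepath,x1,y1,x2,y2,class_name'.

    Returns the boxes grouped by image path, plus a per-class box count
    that also includes a background class 'bg' with a count of 0.
    """
    images: Dict[str, List[dict]] = defaultdict(list)
    class_counts: Counter = Counter()
    for row in csv.reader(stream):
        if not row:
            continue  # skip blank lines
        path, x1, y1, x2, y2, class_name = (field.strip() for field in row)
        images[path].append({"class": class_name,
                             "bbox": (int(x1), int(y1), int(x2), int(y2))})
        class_counts[class_name] += 1
    counts = dict(class_counts)
    counts.setdefault("bg", 0)  # background is never annotated explicitly
    return dict(images), counts


if __name__ == "__main__":
    sample = io.StringIO("img/001.jpg,10,20,110,220,Car\n"
                         "img/001.jpg,50,60,90,120,Pedestrian\n"
                         "img/002.jpg,5,5,40,80,Car\n")
    images, counts = parse_simple_annotations(sample)
    print(counts)  # {'Car': 2, 'Pedestrian': 1, 'bg': 0}
    print("Num classes (including bg) =", len(counts))  # Num classes (including bg) = 3
```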
Light training: train only the RPN (region proposal network) instead of the complete network, using transfer learning
python train_rpn.py --network vgg -o simple -p /path/to/your/dataset/
Epoch 1/20
100/100 [==============================] - 57s 574ms/step - loss: 5.2831 - rpn_out_class_loss: 4.8526 - rpn_out_regress_loss: 0.4305 - val_loss: 4.2840 - val_rpn_out_class_loss: 3.8344 - val_rpn_out_regress_loss: 0.4496
Epoch 2/20
100/100 [==============================] - 51s 511ms/step - loss: 4.1171 - rpn_out_class_loss: 3.7523 - rpn_out_regress_loss: 0.3649 - val_loss: 4.5257 - val_rpn_out_class_loss: 4.1379 - val_rpn_out_regress_loss: 0.3877
Epoch 3/20
100/100 [==============================] - 49s 493ms/step - loss: 3.4928 - rpn_out_class_loss: 3.1787 - rpn_out_regress_loss: 0.3142 - val_loss: 2.9241 - val_rpn_out_class_loss: 2.5502 - val_rpn_out_regress_loss: 0.3739
Epoch 4/20
80/100 [=======================>......] - ETA: 9s - loss: 2.8467 - rpn_out_class_loss: 2.5729 - rpn_out_regress_loss: 0.2738
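Each RPN progress line reports the total loss alongside its two components. In the epoch 1 line above, `loss` (5.2831) equals `rpn_out_class_loss` plus `rpn_out_regress_loss` (4.8526 + 0.4305), and later epochs agree up to rounding. The sketch below parses these lines with a regular expression over the `name: value` pairs Keras prints; `parse_keras_metrics` is a hypothetical helper, not part of Keras or the repository.

```python
import re
from typing import Dict

# Matches "name: value" pairs such as "rpn_out_class_loss: 4.8526".
METRIC_PATTERN = re.compile(r"([A-Za-z_]+): ([0-9]+(?:\.[0-9]+)?)")


def parse_keras_metrics(line: str) -> Dict[str, float]:
    """Extract metric names and values from one Keras progress-bar line."""
    return {name: float(value) for name, value in METRIC_PATTERN.findall(line)}


epoch_1 = ("100/100 [==============================] - 57s 574ms/step - loss: 5.2831 "
           "- rpn_out_class_loss: 4.8526 - rpn_out_regress_loss: 0.4305 "
           "- val_loss: 4.2840 - val_rpn_out_class_loss: 3.8344 "
           "- val_rpn_out_regress_loss: 0.4496")
metrics = parse_keras_metrics(epoch_1)
# The total RPN loss is the classification loss plus the regression loss (up to rounding).
assert abs(metrics["loss"] - (metrics["rpn_out_class_loss"] + metrics["rpn_out_regress_loss"])) < 1e-3
assert abs(metrics["val_loss"]
           - (metrics["val_rpn_out_class_loss"] + metrics["val_rpn_out_regress_loss"])) < 1e-3
```

On in-progress lines such as the epoch 4 line, the pattern also captures the `ETA` field, so filter out any keys you do not need.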
Other options for --network are resnet50, mobilenetv1, and vgg19.
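Training the RPN on its own still needs a label for every anchor. The original Faster R-CNN paper places anchors at three scales (128, 256, 512 pixels) and three aspect ratios (1:1, 1:2, 2:1) on every feature-map cell, labels an anchor positive when its IoU with a ground-truth box is above 0.7, and negative when its IoU with every ground-truth box is below 0.3; all other anchors are ignored. The sketch below applies those paper values. The function names and the stride of 16 are assumptions, the repository's own settings may differ, and the paper's extra rule (the anchor with the highest IoU for each ground-truth box is also positive) is omitted for brevity.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def generate_anchors(feat_h: int, feat_w: int, stride: int = 16,
                     scales: Sequence[float] = (128, 256, 512),
                     ratios: Sequence[float] = (0.5, 1.0, 2.0)) -> List[Box]:
    """Centre len(scales) * len(ratios) anchors on every feature-map cell."""
    anchors: List[Box] = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # Area stays scale**2 while height / width equals ratio.
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def label_anchors(anchors: List[Box], gt_boxes: List[Box],
                  pos_thresh: float = 0.7, neg_thresh: float = 0.3) -> List[int]:
    """Label each anchor 1 (object), 0 (background) or -1 (ignored in the loss)."""
    labels = []
    for anchor in anchors:
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best > pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    return labels


if __name__ == "__main__":
    gt = [(0, 0, 10, 10)]
    print(label_anchors([(0, 0, 10, 10), (0, 0, 10, 6), (20, 20, 30, 30)], gt))  # [1, -1, 0]
```

The `rpn_out_class_loss` in the log above is computed against labels of this kind, and ignored anchors (-1) contribute nothing to it.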
Heavy training: train the whole Faster R-CNN network
# sample command
python train_frcnn.py --network vgg -o simple -p /path/to/your/dataset/
# using the RPN trained in the previous step makes training more stable
python train_frcnn.py --network vgg -o simple -p /path/to/your/dataset/ --rpn models/rpn/rpn.vgg.weights.36-1.42.hdf5
# sample command to train PASCAL_VOC dataset:
python train_frcnn.py -p ../VOCdevkit/ --lr 1e-4 --opt SGD --network vgg --elen 1000 --num_epoch 100 --hf
# this may take about 12 hours on a GPU
# add --load yourmodelpath if you want to resume training.
python train_frcnn.py --network vgg -o simple -p /path/to/your/dataset/ --load model_frcnn.hdf5
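Both the RPN (`rpn_regr`) and the detector head (`detector_regr`) in the training output that follows learn box regression targets rather than raw coordinates. The sketch below uses the standard Faster R-CNN parameterization: center offsets divided by the reference box's width and height, plus log-space width and height ratios. The function names are hypothetical, and the repository may additionally scale these targets (for example by per-coordinate constants), which this sketch leaves out.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def _center_size(box: Box) -> Tuple[float, float, float, float]:
    """Convert corner coordinates to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + w / 2, y1 + h / 2, w, h


def encode(anchor: Box, gt: Box) -> Tuple[float, float, float, float]:
    """Regression targets (tx, ty, tw, th) that move `anchor` onto `gt`."""
    xa, ya, wa, ha = _center_size(anchor)
    x, y, w, h = _center_size(gt)
    return (x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha)


def decode(anchor: Box, deltas: Sequence[float]) -> Box:
    """Apply predicted (tx, ty, tw, th) to `anchor`, returning corner coordinates."""
    xa, ya, wa, ha = _center_size(anchor)
    tx, ty, tw, th = deltas
    x, y = xa + tx * wa, ya + ty * ha
    w, h = wa * math.exp(tw), ha * math.exp(th)
    return x - w / 2, y - h / 2, x + w / 2, y + h / 2


if __name__ == "__main__":
    anchor, gt = (0.0, 0.0, 10.0, 10.0), (2.0, 4.0, 12.0, 24.0)
    targets = encode(anchor, gt)
    print(targets)                  # approximately (0.2, 0.9, 0.0, 0.693)
    print(decode(anchor, targets))  # approximately (2.0, 4.0, 12.0, 24.0)
```

Decoding is the exact inverse of encoding, so at test time the network's predicted deltas are applied to each anchor or proposal with `decode`.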
Sample training output (from a run that used the mobilenetv1 backbone on a custom dataset):
Using TensorFlow backend.
Parsing annotation files
Training images per class:
{'Car': 1357, 'Cyclist': 182, 'Pedestrian': 5, 'bg': 0}
Num classes (including bg) = 4
Config has been written to config.pickle, and can be loaded when testing to ensure correct results
Num train samples 401
Num val samples 88
loading weights from ./pretrain/mobilenet_1_0_224_tf.h5
loading previous rpn model..
no previous model was loaded
Starting training
Epoch 1/200
100/100 [==============================] - 150s 2s/step - rpn_cls: 4.5333 - rpn_regr: 0.4783 - detector_cls: 1.2654 - detector_regr: 0.1691
Mean number of bounding boxes from RPN overlapping ground truth boxes: 1.74
Classifier accuracy for bounding boxes from RPN: 0.935625
Loss RPN classifier: 4.244322432279587
Loss RPN regression: 0.4736669697239995
Loss Detector classifier: 1.1491613787412644
Loss Detector regression: 0.20629869312047958
Elapsed time: 150.15273475646973
Total loss decreased from inf to 6.07344947386533, saving weights
Epoch 2/200
Average number of overlapping bounding boxes from RPN = 1.74 for 100 previous iterations
38/100 [==========>...................] - ETA: 1:24 - rpn_cls: 3.2813 - rpn_regr: 0.4576 - detector_cls: 0.8776 - detector_regr: 0.1826
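The end-of-epoch summary reports four losses, and the "Total loss" line is their sum: 4.244 + 0.474 + 1.149 + 0.206 ≈ 6.073. Weights are saved only when this total beats the best value so far, and the best value starts at infinity, which is why the first epoch reports a decrease "from inf". The sketch below reproduces that bookkeeping in plain Python; `CheckpointTracker` is a hypothetical name, and actually writing the weights is left to the caller.

```python
import math
from typing import Dict


class CheckpointTracker:
    """Track the best total loss seen so far and decide when to save weights."""

    def __init__(self) -> None:
        self.best_loss = math.inf  # nothing saved yet, so any finite loss is an improvement

    def update(self, losses: Dict[str, float]) -> bool:
        """Sum the per-epoch losses; return True if weights should be saved."""
        total = sum(losses.values())
        if total < self.best_loss:
            print(f"Total loss decreased from {self.best_loss} to {total}, saving weights")
            self.best_loss = total
            return True
        return False


if __name__ == "__main__":
    tracker = CheckpointTracker()
    # End-of-epoch values copied from the epoch 1 summary in the log above.
    epoch_1 = {
        "rpn_cls": 4.244322432279587,
        "rpn_regr": 0.4736669697239995,
        "detector_cls": 1.1491613787412644,
        "detector_regr": 0.20629869312047958,
    }
    tracker.update(epoch_1)  # prints a "decreased from inf" message and returns True
```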
Setting up the VOC2007 training dataset
Download the dataset and extract it:
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xf VOCtrainval_06-Nov-2007.tar
tar -xf VOCtest_06-Nov-2007.tar
tar -xf VOCtrainval_11-May-2012.tar
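After extraction, each image in VOCdevkit/VOC2007 has an XML annotation file under Annotations/, and ImageSets/Main/ holds split files listing image IDs. (In the run shown below, ImageSets/Main/test.txt could not be found, and the log reports 0 validation samples.) The sketch below counts annotated objects per class from VOC-style XML using only `xml.etree.ElementTree`. `count_voc_objects` is a hypothetical helper rather than the repository's parser, and the inline XML is a made-up example, not a real VOC file.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Iterable


def count_voc_objects(xml_docs: Iterable[str], skip_difficult: bool = False) -> Counter:
    """Count annotated objects per class across VOC-style XML documents."""
    counts: Counter = Counter()
    for doc in xml_docs:
        root = ET.fromstring(doc)
        for obj in root.iter("object"):
            name = obj.findtext("name", default="").strip()
            difficult = obj.findtext("difficult", default="0").strip() == "1"
            # Optionally leave out objects VOC marks as difficult.
            if name and not (skip_difficult and difficult):
                counts[name] += 1
    return counts


# Made-up annotation in the VOC layout (not taken from the dataset).
SAMPLE = """<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>car</name>
    <difficult>0</difficult>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>90</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <difficult>1</difficult>
    <bndbox><xmin>120</xmin><ymin>15</ymin><xmax>160</xmax><ymax>130</ymax></bndbox>
  </object>
</annotation>"""

if __name__ == "__main__":
    print(count_voc_objects([SAMPLE]))  # Counter({'car': 1, 'person': 1})
```

With the extracted dataset, you would pass the text of each file in VOCdevkit/VOC2007/Annotations/ instead of the inline sample.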
Then run training:
python train_frcnn.py --network mobilenetv1 -p ./VOCdevkit
Using TensorFlow backend.
data path: ['VOCdevkit/VOC2007']
Parsing annotation files
[Errno 2] No such file or directory: 'VOCdevkit/VOC2007/ImageSets/Main/test.txt'
Training images per class:
{'aeroplane': 331,
'bg': 0,
'bicycle': 418,
'bird': 599,
'boat': 398,
'bottle': 634,
'bus': 272,
'car': 1644,
'cat': 389,
'chair': 1432,
'cow': 356,
'diningtable': 310,
'dog': 538,
'horse': 406,
'motorbike': 390,
'person': 5447,
'pottedplant': 625,
'sheep': 353,
'sofa': 425,
'train': 328,
'tvmonitor': 367}
Num classes (including bg) = 21
Config has been written to config.pickle, and can be loaded when testing to ensure correct results
Num train samples 5011
Num val samples 0
Instructions for updating:
Colocations handled automatically by placer.
loading weights from ./pretrain/mobilenet_1_0_224_tf.h5
loading previous rpn model..
no previous model was loaded
Starting training
Epoch 1/200
Instructions for updating:
Use tf.cast instead.
23/1000 [..............................] - ETA: 43:30 - rpn_cls: 7.3691 - rpn_regr: 0.1865 - detector_cls: 3.0206 - detector_regr: 0.3050
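Both training logs note that the configuration is written to `config.pickle` so it can be reloaded at test time. The point is to persist everything the test script must reproduce exactly, such as the class mapping and the backbone name. The sketch below round-trips a hypothetical `TrainConfig` dataclass through `pickle`; its fields and the class-index assignment are illustrative and do not describe the repository's actual config object.

```python
import os
import pickle
import tempfile
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class TrainConfig:
    """Hypothetical subset of the settings the test script must reproduce."""
    network: str = "vgg"
    class_mapping: Dict[str, int] = field(default_factory=dict)


def save_config(config: TrainConfig, path: str) -> None:
    """Pickle the config as a plain dict so loading does not depend on where the class lives."""
    with open(path, "wb") as f:
        pickle.dump(asdict(config), f)


def load_config(path: str) -> TrainConfig:
    """Rebuild the config; only unpickle files you trust, since pickle can run code."""
    with open(path, "rb") as f:
        return TrainConfig(**pickle.load(f))


if __name__ == "__main__":
    config = TrainConfig(network="mobilenetv1",
                         class_mapping={"Car": 0, "Cyclist": 1, "Pedestrian": 2, "bg": 3})
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config.pickle")
        save_config(config, path)
        assert load_config(path) == config
```

Storing a plain dict rather than the class instance keeps the file loadable even if the config class is later moved or renamed.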