Building Faster R-CNN on TensorFlow

The widespread adoption of Convolutional Neural Networks (CNNs) has driven progress in deep learning for computer vision, and especially in object detection. Architectures such as Faster R-CNN, R-FCN, Multibox, SSD, and YOLO provide a framework for modern object detectors.

TensorFlow, a deep learning framework, lets you build Faster R-CNN architectures that automatically recognize objects in images.

Overview of R-CNN Algorithms for Object Detection

There are four types of R-CNN: R-CNN, Fast R-CNN, Faster R-CNN, and Mask R-CNN. Each type attempts to optimize, speed up, or enhance object detection results.

Learn more about the R-CNN Models at Computer Vision with Deep Learning.

Running Faster R-CNN on TensorFlow: Typical Steps

The following is a general process many practitioners use to run Faster R-CNN on TensorFlow:

Step	TensorFlow Documentation
A ConvNet produces a feature map from the input image.	Build a Convolutional Neural Network using Estimators
A region proposal network (RPN) is applied to this feature map. The RPN returns object proposals along with their objectness scores.	Image segmentation with tf.keras
A RoI pooling layer is applied to these proposals to produce a small feature map of fixed size for each one.	tf.nn.max_pool
The pooled proposals are passed to a fully connected layer, which feeds a softmax layer and a linear regression layer. These classify each object and output its bounding box.	tf.contrib.layers.fully_connected
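
To make the RoI pooling step more concrete, here is a minimal NumPy sketch of max-pooling one region of a feature map down to a fixed size. It is not taken from the repository used later in this article; the function name roi_max_pool, the (x1, y1, x2, y2) box convention, and the way bins are split are all assumptions chosen for illustration.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one region of interest to a fixed output size.

    feature_map: array of shape (H, W, C).
    roi: (x1, y1, x2, y2) in feature-map cells, x2/y2 exclusive (assumed convention).
    The region must contain at least one cell.
    """
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2, :]
    out_h, out_w = output_size

    # Split the region's rows and columns into roughly equal bins.
    row_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    col_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)

    pooled = np.zeros((out_h, out_w, feature_map.shape[2]), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Make sure every bin covers at least one cell.
            r0, r1 = row_edges[i], max(row_edges[i + 1], row_edges[i] + 1)
            c0, c1 = col_edges[j], max(col_edges[j + 1], col_edges[j] + 1)
            pooled[i, j] = region[r0:r1, c0:c1].max(axis=(0, 1))
    return pooled

# Regions of different sizes all come out with the same shape.
features = np.arange(16, dtype=np.float32).reshape(4, 4, 1)
print(roi_max_pool(features, (0, 0, 4, 4)).shape)
print(roi_max_pool(features, (1, 0, 4, 3)).shape)
```

Whatever the size of the proposal, the output has the same shape, which is what lets the fully connected layers that follow accept proposals of any size.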

Faster R-CNN TensorFlow and Keras

Let's begin with the code for testing and training our Faster R-CNN network.

As a first step, clone the repository with the working code:

git clone https://github.com/indiantechwarrior/faster_rcnn_tensorflow.git
cd faster_rcnn_tensorflow

Install requirements

pip install -r requirements.txt

Pretrained weights are checked in with the repository; a download link is given below. Starting from ImageNet-pretrained VGG16 weights significantly speeds up training.

https://drive.google.com/file/d/1IgxPP0aI5pxyPHVSM2ZJjN1p9dtE4_64/view?usp=sharing

# place pretrain model in vgg dir.
mkdir models
mkdir vgg

Now let's test object detection with the pretrained Faster R-CNN model.

Open the project in PyCharm and set the --path variable, or run the command below directly from the command line:

python test_frcnn.py --path /home/name/PycharmProjects/faster_rcnn_tensorflow/images/ --write /home/name/PycharmProjects/faster_rcnn_tensorflow/results

This picks up every image in the folder and runs object detection on it. The detections and their confidence scores are printed to the console, and a rectangle is drawn around each detected object. The output images are written to the results folder.

Now let's train the Faster R-CNN model on custom data using transfer learning.

First, download the pretrained ImageNet weights for the backbone you plan to use:

# for VGG16
wget https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5

# for mobilenetv1
wget https://github.com/fchollet/deep-learning-models/releases/download/v0.6/mobilenet_1_0_224_tf.h5

# for mobilenetv2
wget https://github.com/JonathanCMitchell/mobilenet_v2_keras/releases/download/v1.1/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224.h5

# for resnet 50
wget https://github.com/fchollet/deep-learning-models/releases/download/v0.1/resnet50_weights_tf_dim_ordering_tf_kernels.h5

Place the downloaded weights in a newly created directory named pretrain:

mkdir pretrain
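
Before starting a long training run, it can help to check that the weight file you expect is actually in the pretrain directory. The sketch below is a hypothetical helper, not part of the repository: the file names come from the download URLs above, but the dictionary keys and the function names weights_path and find_weights are illustrative assumptions.

```python
from pathlib import Path

# File names taken from the download URLs above.
# The keys are illustrative labels, not necessarily the repository's option names.
WEIGHT_FILES = {
    "vgg": "vgg16_weights_tf_dim_ordering_tf_kernels.h5",
    "mobilenetv1": "mobilenet_1_0_224_tf.h5",
    "mobilenetv2": "mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224.h5",
    "resnet50": "resnet50_weights_tf_dim_ordering_tf_kernels.h5",
}

def weights_path(network, pretrain_dir="pretrain"):
    """Return the path where the weights for `network` are expected."""
    return Path(pretrain_dir) / WEIGHT_FILES[network]

def find_weights(network, pretrain_dir="pretrain"):
    """Return the weights path if the file exists, otherwise None."""
    path = weights_path(network, pretrain_dir)
    return path if path.is_file() else None

if __name__ == "__main__":
    for name in WEIGHT_FILES:
        status = "found" if find_weights(name) else "missing"
        print(f"{name}: {weights_path(name)} ({status})")
```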

Other pretrained TensorFlow models are available at the link below.

https://github.com/fchollet/deep-learning-models/releases/

Light training: train only the RPN (region proposal network) instead of the complete network, using transfer learning

python train_rpn.py --network vgg -o simple -p /path/to/your/dataset/

Epoch 1/20
100/100 [==============================] - 57s 574ms/step - loss: 5.2831 - rpn_out_class_loss: 4.8526 - rpn_out_regress_loss: 0.4305 - val_loss: 4.2840 - val_rpn_out_class_loss: 3.8344 - val_rpn_out_regress_loss: 0.4496
Epoch 2/20
100/100 [==============================] - 51s 511ms/step - loss: 4.1171 - rpn_out_class_loss: 3.7523 - rpn_out_regress_loss: 0.3649 - val_loss: 4.5257 - val_rpn_out_class_loss: 4.1379 - val_rpn_out_regress_loss: 0.3877
Epoch 3/20
100/100 [==============================] - 49s 493ms/step - loss: 3.4928 - rpn_out_class_loss: 3.1787 - rpn_out_regress_loss: 0.3142 - val_loss: 2.9241 - val_rpn_out_class_loss: 2.5502 - val_rpn_out_regress_loss: 0.3739
Epoch 4/20
 80/100 [=======================>......] - ETA: 9s - loss: 2.8467 - rpn_out_class_loss: 2.5729 - rpn_out_regress_loss: 0.2738

Other network options are: resnet50, mobilenetv1, vgg19
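
In the sample output above, the reported loss for each epoch matches the sum of the RPN classification and regression losses, up to rounding. The following sketch parses one progress line and checks that relationship. The function names parse_metrics and loss_is_sum are assumptions for illustration, and the regular expression only targets lines formatted like the ones shown here.

```python
import re

def parse_metrics(log_line):
    """Extract 'name: value' pairs from a Keras-style progress line."""
    # Lowercase names only, so fields such as 'ETA: 9s' are skipped.
    pairs = re.findall(r"([a-z_]+): (\d+\.\d+)", log_line)
    return {name: float(value) for name, value in pairs}

def loss_is_sum(metrics, prefix="", tolerance=1e-3):
    """Check that the total loss equals class loss + regression loss."""
    total = metrics[prefix + "loss"]
    parts = (metrics[prefix + "rpn_out_class_loss"]
             + metrics[prefix + "rpn_out_regress_loss"])
    return abs(total - parts) < tolerance

# First epoch line from the sample output above.
line = ("100/100 [==============================] - 57s 574ms/step"
        " - loss: 5.2831 - rpn_out_class_loss: 4.8526"
        " - rpn_out_regress_loss: 0.4305 - val_loss: 4.2840"
        " - val_rpn_out_class_loss: 3.8344 - val_rpn_out_regress_loss: 0.4496")

metrics = parse_metrics(line)
print(loss_is_sum(metrics), loss_is_sum(metrics, prefix="val_"))
```

Tracking the two components separately shows whether a change in total loss comes from the objectness classifier or from the box regression.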

Heavy training on the whole Faster R-CNN network

# sample command
python train_frcnn.py --network vgg -o simple -p /path/to/your/dataset/

# Using the RPN trained in the previous step makes training more stable.
python train_frcnn.py --network vgg -o simple -p /path/to/your/dataset/ --rpn models/rpn/rpn.vgg.weights.36-1.42.hdf5

# Sample command to train on the PASCAL VOC dataset
# (this may take about 12 hours on a GPU):
python train_frcnn.py -p ../VOCdevkit/ --lr 1e-4 --opt SGD --network vgg --elen 1000 --num_epoch 100 --hf

# Add --load yourmodelpath to resume training.
python train_frcnn.py --network vgg16 -o simple -p /path/to/your/dataset/ --load model_frcnn.hdf5

Using TensorFlow backend.
Parsing annotation files
Training images per class:
{'Car': 1357, 'Cyclist': 182, 'Pedestrian': 5, 'bg': 0}
Num classes (including bg) = 4
Config has been written to config.pickle, and can be loaded when testing to ensure correct results
Num train samples 401
Num val samples 88
loading weights from ./pretrain/mobilenet_1_0_224_tf.h5
loading previous rpn model..
no previous model was loaded
Starting training
Epoch 1/200
100/100 [==============================] - 150s 2s/step - rpn_cls: 4.5333 - rpn_regr: 0.4783 - detector_cls: 1.2654 - detector_regr: 0.1691  
Mean number of bounding boxes from RPN overlapping ground truth boxes: 1.74
Classifier accuracy for bounding boxes from RPN: 0.935625
Loss RPN classifier: 4.244322432279587
Loss RPN regression: 0.4736669697239995
Loss Detector classifier: 1.1491613787412644
Loss Detector regression: 0.20629869312047958
Elapsed time: 150.15273475646973
Total loss decreased from inf to 6.07344947386533, saving weights
Epoch 2/200
Average number of overlapping bounding boxes from RPN = 1.74 for 100 previous iterations
 38/100 [==========>...................] - ETA: 1:24 - rpn_cls: 3.2813 - rpn_regr: 0.4576 - detector_cls: 0.8776 - detector_regr: 0.1826
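
The log line about bounding boxes from the RPN overlapping ground truth boxes refers to how well proposals match the annotated objects. Overlap between two boxes is commonly measured with intersection over union (IoU). The sketch below shows that calculation; the function name iou and the (x1, y1, x2, y2) box format are assumptions, and the threshold this repository uses to count a box as overlapping is not stated in this article.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the intersection rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

# A proposal shifted by half its size against a ground truth box.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

An IoU of 1.0 means the boxes coincide and 0.0 means they do not touch, so a higher mean overlap count in the log suggests the RPN is proposing more boxes that land on real objects.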

Setting up the dataset for VOC2007 training

Download and extract the dataset:

wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xf VOCtrainval_06-Nov-2007.tar
tar -xf VOCtest_06-Nov-2007.tar
tar -xf VOCtrainval_11-May-2012.tar

Then run training:

python train_frcnn.py --network mobilenetv1 -p ./VOCdevkit

Using TensorFlow backend.
data path: ['VOCdevkit/VOC2007']
Parsing annotation files
[Errno 2] No such file or directory: 'VOCdevkit/VOC2007/ImageSets/Main/test.txt'
Training images per class:
{'aeroplane': 331,
 'bg': 0,
 'bicycle': 418,
 'bird': 599,
 'boat': 398,
 'bottle': 634,
 'bus': 272,
 'car': 1644,
 'cat': 389,
 'chair': 1432,
 'cow': 356,
 'diningtable': 310,
 'dog': 538,
 'horse': 406,
 'motorbike': 390,
 'person': 5447,
 'pottedplant': 625,
 'sheep': 353,
 'sofa': 425,
 'train': 328,
 'tvmonitor': 367}
Num classes (including bg) = 21
Config has been written to config.pickle, and can be loaded when testing to ensure correct results
Num train samples 5011
Num val samples 0
Instructions for updating:
Colocations handled automatically by placer.
loading weights from ./pretrain/mobilenet_1_0_224_tf.h5
loading previous rpn model..
no previous model was loaded
Starting training
Epoch 1/200
Instructions for updating:
Use tf.cast instead.
  23/1000 [..............................] - ETA: 43:30 - rpn_cls: 7.3691 - rpn_regr: 0.1865 - detector_cls: 3.0206 - detector_regr: 0.3050
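
A per-class tally like the one in the log above can be produced by reading the object names out of the PASCAL VOC annotation files, which store each labeled object as an object element with a name child. The sketch below counts object names from annotation XML text. It is an illustrative helper, not the repository's parser, and the function name count_objects is an assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def count_objects(annotation_xml_texts):
    """Count labeled objects per class across VOC-style annotation XML strings."""
    counts = Counter()
    for xml_text in annotation_xml_texts:
        root = ET.fromstring(xml_text)
        for obj in root.iter("object"):
            # Only the object's direct <name> child holds the class label.
            name = obj.findtext("name")
            if name:
                counts[name.strip()] += 1
    return counts

# Two tiny hand-written annotations in the VOC layout (made-up sample data).
sample_annotations = [
    "<annotation><object><name>car</name></object>"
    "<object><name>person</name></object></annotation>",
    "<annotation><object><name>car</name></object></annotation>",
]

print(dict(count_objects(sample_annotations)))
```

To run this on the extracted dataset, you would read each XML file under the Annotations folder of VOCdevkit/VOC2007 and pass the file contents to count_objects.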
