Summer Research Fellowship Programme of India's Science Academies

Incremental training for image classification of unseen objects

Harshil Rakesh Jain

Department of Computer Science and Engineering, IIT Gandhinagar, Palaj, Gandhinagar, Gujarat, India, 382355

Prof. S.K. Nandy

Convener, CAD Lab, Department of Computational and Data Sciences, IISc Bengaluru, CV Raman Road, Karnataka, 560012


Object detection is a computer vision technique for locating instances of semantic objects of a given class in digital images and videos. It is the backbone of many practical applications of computer vision, including image retrieval, self-driving cars, face recognition, object tracking and video surveillance, and therefore plays a role in many fields today. Object detection can be achieved through traditional machine learning approaches, which rely on hand-crafted features such as histogram of oriented gradients (HOG) or scale-invariant feature transform (SIFT), and through deep learning approaches, which fall into two broad categories. The first is two-stage architectures based on region proposals (R-CNN, Fast R-CNN and Faster R-CNN); the second is single-shot detectors, which include You Only Look Once (YOLO) and Single Shot MultiBox Detector (SSD). R-CNN and its derivatives first use a region proposal step to obtain a list of probable locations of objects in the image, and then pass each of these proposals through the detection layers. This increases the running time of the overall algorithm. In contrast, algorithms like YOLO and SSD pass the whole image through their respective networks once and detect objects in a single shot. Thus, YOLO and SSD are considerably faster than R-CNN and its derivatives. YOLO uses Darknet for feature extraction followed by convolutional layers for object localization, while SSD uses VGG-16 for feature extraction. Although object detection is gaining the attention of the research community, most work has concentrated on improving current object detection algorithms. Detection of objects from unseen classes, on which the networks were never trained, has been overlooked.
In this work, an attempt has been made to understand the YOLO architecture and answer various questions related to it, and to extend existing single-shot detectors such as YOLO and SSD to classify unseen classes in real time through incremental learning. This can prove very useful, as it is very difficult to retrain these large convolutional networks in real time whenever new classes are added.
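
The idea of adding a class without retraining the whole network can be illustrated with a minimal sketch. It is not the method developed in this work: it assumes that feature vectors have already been extracted by a frozen backbone (the two-dimensional vectors below are made up for illustration) and it uses a nearest-class-mean rule, a simple stand-in technique chosen for brevity. The class name NearestClassMean and its methods are hypothetical.

```python
import math


class NearestClassMean:
    """Toy incremental classifier over fixed, pre-extracted feature vectors."""

    def __init__(self):
        self.sums = {}    # label -> running sum of feature vectors
        self.counts = {}  # label -> number of examples seen

    def add_examples(self, label, features):
        """Fold new examples into one class mean; other classes are untouched."""
        for vec in features:
            if label not in self.sums:
                self.sums[label] = [0.0] * len(vec)
                self.counts[label] = 0
            self.sums[label] = [s + x for s, x in zip(self.sums[label], vec)]
            self.counts[label] += 1

    def predict(self, vec):
        """Return the label whose mean feature vector is closest to vec."""
        def dist(label):
            mean = [s / self.counts[label] for s in self.sums[label]]
            return math.dist(mean, vec)
        return min(self.sums, key=dist)


clf = NearestClassMean()
clf.add_examples("cat", [[1.0, 0.0], [0.9, 0.1]])
clf.add_examples("dog", [[0.0, 1.0], [0.1, 0.9]])
# A previously unseen class arrives later; only its own mean is computed.
clf.add_examples("bird", [[1.0, 1.0], [0.9, 1.1]])
print(clf.predict([0.95, 1.0]))
```

Under these assumptions, adding the "bird" class costs one pass over its own examples, and the statistics stored for "cat" and "dog" are left unchanged.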

Keywords: object detection, YOLO, convolutional neural networks, incremental learning, unseen classes, computer vision
