Summer Research Fellowship Programme of India's Science Academies

Granular computing, rough theory and rule base for object tracking

G. Vandana

Student, Dept. of Electronics and Communication (III Year), National Institute of Engineering, Mysore

Prof. Sankar K Pal,

Fellow of the Indian National Science Academy; Fellow of IEEE, National Science Chair; Distinguished Scientist and Former Director, Indian Statistical Institute, Kolkata - 700108


A spatio-temporal segmentation approach was used for object tracking in video sequences. A moving object was tracked using a partially supervised method. Rough sets were used to handle the uncertainty in the data. Techniques for spatial and temporal segmentation based on the formation of granules are discussed. The objective was to achieve a technique close to natural granulation. A rule base was used to improve tracking accuracy.

Keywords: rough theory, granules, spatial segmentation, temporal segmentation, rule base


Object tracking is an important task in computer vision. Object tracking is required in several fields of computer vision like surveillance and gesture recognition. The important task in object tracking is to effectively segment (identify) the object from the background. But in images, the object-background boundaries are ill-defined. The concepts of Rough sets and Rough Entropy were found to give an optimum threshold for segmentation thereby reducing vagueness.

Spatial and temporal information about the object in a video sequence plays an important role in object tracking. Spatial information can be used to study the homogeneous parts of the image. This is done by grouping similar points into a clump or granule. A granule is defined as a clump of objects (points), in the universe of discourse, drawn together by indistinguishability, similarity, proximity, or functionality [1]. Unequally sized granules were found to be more effective in identifying the segments of an image. Two methods were studied, Quad-Tree Decomposition and granulation based on neighbourhood sets, after which an appropriate method was chosen for spatial segmentation.

Temporal segmentation involves the study of the change in information from one frame to the next in a video sequence. The three-point estimation technique was found to be memory efficient and simple in separating the foreground from the background. Often, however, the information about the tracked object keeps changing due to changes in illumination or shape.

This change in information was studied and used to construct a Rule Base for tracking. Usually, the parameters studied are the changes in texture, colour and shape. Here, the Rule Base considers changes in colour and depth values. The depth values (D) are obtained from a Kinect sensor, where the depth of the moving object(s) is lower than that of the background. The dataset was obtained from ChaLearn 2011 [2], which consists of RGB and depth videos of various hand movements recorded with the Kinect camera.


The research carried out in the field of Rough Sets and Granular Computing by Prof. Sankar K Pal inspired me to pursue this project. His papers on object detection and extraction using these methods gave me an insight into how object tracking can be modelled using the spatial and temporal data present in video frames. The paper on neighbourhood granules and rough rule base helped in constructing a Rule Base.



Rough Entropy and Granules for object extraction

Rough set theory helps in identifying the boundaries present in an image using upper and lower approximations. The upper and lower approximations of a set X are defined as

Upper approximation: $\overline{R}(X) = \{Y \in U/R : Y \cap X \neq \varnothing\}$

Lower approximation: $\underline{R}(X) = \{Y \in U/R : Y \subseteq X\}$
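As a small illustration of these two definitions, the sketch below computes both approximations for a toy universe. The function name `approximations`, the eight-pixel universe and its four granules are assumptions made for illustration only; they are not taken from the project code. Each approximation is returned as the union of the qualifying blocks.

```python
def approximations(partition, target):
    """Return (lower, upper) approximations of `target` under `partition`.

    `partition` holds the disjoint blocks Y of U/R; `target` is the set X.
    Each approximation is returned as the union of the qualifying blocks.
    """
    target = set(target)
    lower, upper = set(), set()
    for block in partition:
        block = set(block)
        if block <= target:   # Y is a subset of X: certainly part of X
            lower |= block
        if block & target:    # Y intersects X: possibly part of X
            upper |= block
    return lower, upper


# Hypothetical universe of 8 pixels grouped into 4 granules of 2 pixels each.
granules = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]
object_pixels = {1, 2, 3, 4}

lower, upper = approximations(granules, object_pixels)
boundary = upper - lower  # granules that straddle the object boundary
```

In this toy case only the granule {2, 3} lies entirely inside the object, while {0, 1} and {4, 5} straddle its boundary, so the boundary region is {0, 1, 4, 5}.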

A measure called Rough Entropy, when maximised, gives the threshold to effectively segment the image [1]. A granule in an M x N image (I) can be defined as a non-overlapping window of size $m_i \times n_i$ (where i denotes the ith granule). Granules are then classified as belonging to the object (O) or the background (B) based on the threshold (T) obtained. The information in the entire image (I) can now be viewed as a collection of granules. Figure 1 below illustrates these concepts.

Figure 1: Granules and Rough Entropy. Panels: rough set on images; a scanned newspaper with low illumination; segmented image obtained with threshold T.

Here, the threshold T and the granule size are estimated by observing the bimodal histogram of the image, as described in [1]. It can be seen that maximising Rough Entropy enhances the quality of the image and text.


Spatial segmentation

Video frames (images) usually contain heterogeneous data. Spatial segmentation is addressed here using two techniques, both of which granulate the image (group its homogeneous parts). Quad-tree decomposition is a region splitting (split and merge) method, and neighbourhood sets is a region growing technique.

Quad-Tree decomposition

Quad-tree decomposition is a recursive process that keeps dividing the image into 4 equal parts to form granules. A block is divided further as long as the difference between its maximum and minimum grey levels is greater than a threshold computed from Q3 and Q1, the third and first quartiles of the grey-level distribution of the image. The resulting granules, which are formed using image statistics, are unequal in size.
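The recursive splitting can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `quadtree_granules`, the `min_size` stopping parameter and the example image are assumptions, and the threshold is passed in as a plain number (50 below is arbitrary) rather than computed from the quartiles. The sketch also assumes a square image whose side is a power of two.

```python
def quadtree_granules(img, threshold, min_size=1):
    """Split a square grey-level image (list of lists) into unequal granules.

    Returns a list of (row, col, size) blocks. A block is split into four
    equal quadrants while (max - min) of its grey levels exceeds `threshold`.
    """
    granules = []

    def split(r, c, size):
        values = [img[i][j]
                  for i in range(r, r + size)
                  for j in range(c, c + size)]
        if size > min_size and max(values) - min(values) > threshold:
            half = size // 2
            for dr in (0, half):
                for dc in (0, half):
                    split(r + dr, c + dc, half)
        else:
            granules.append((r, c, size))  # homogeneous enough: keep as a granule

    split(0, 0, len(img))
    return granules


# Hypothetical 4 x 4 image: uniform background with one bright pixel.
image = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 200, 10],
    [10, 10, 10, 10],
]
blocks = quadtree_granules(image, threshold=50)
```

Only the quadrant containing the bright pixel is split further, so the homogeneous regions stay as larger granules and the granules end up unequal in size.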

Neighbourhood sets

The main concept of neighbourhood sets is the formation of granules based on the information of the neighbours around a point. The neighbourhood of a point $x_i \in U$, $\mathfrak{N}(x_i)$, is defined as

$\mathfrak{N}(x_i) = \{x_j \in U : \Delta(x_i, x_j) \leq \delta\}$ [3],

where $\Delta$ is the distance function and $\delta$ is the threshold. A granule is thus a point $x_i$ together with its neighbourhood $\mathfrak{N}(x_i)$. The granules formed are unequal in size and may sometimes overlap. Here, a 4-neighbourhood is used to form granules.

Spatio-Colour Granules

The spatial and colour nearness of pixels is used to form spatio-colour granules in an image using the rule

$\mathfrak{N}_{sp\text{-}clr}(x_i) = \bigcup \{x_j \in U : |color(x_i) - color(x_j)| < Thr\}$

Here $x_i$ is a point that is not contained in any other granule. Overlapping occurs only if the neighbourhood properties merge. The number of granules formed depends on the threshold chosen. The accuracy (number of segments) decreases as the value of Thr increases. A 256 x 256 greyscale image of Lena is used; the spatio-colour granules formed are shown below in Fig 2.

Figure 2: Spatio-colour granules formed at different values of Thr. Panels: original image; Threshold = 10.
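One way to read the spatio-colour rule is as seeded region growing over a 4-neighbourhood. The sketch below is written under that assumption: a pixel joins a granule when it is 4-connected to it and its grey level differs from the seed's by less than Thr. The function name `spatio_colour_granules` and the 3 x 3 example image are hypothetical, and overlapping granules are not modelled here.

```python
from collections import deque


def spatio_colour_granules(img, thr):
    """Label 4-connected granules whose grey levels stay within `thr` of the seed.

    Returns (labels, count), where labels[r][c] is the granule index of a pixel.
    """
    rows, cols = len(img), len(img[0])
    labels = [[-1] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue  # already inside a granule, so it cannot be a seed
            seed = img[r][c]
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                i, j = queue.popleft()
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if (0 <= ni < rows and 0 <= nj < cols
                            and labels[ni][nj] == -1
                            and abs(img[ni][nj] - seed) < thr):
                        labels[ni][nj] = count
                        queue.append((ni, nj))
            count += 1
    return labels, count


# Hypothetical 3 x 3 grey-level image with three visually distinct regions.
image = [
    [10, 12, 90],
    [11, 13, 95],
    [50, 52, 93],
]
_, fine = spatio_colour_granules(image, thr=10)    # small Thr: more granules
_, coarse = spatio_colour_granules(image, thr=50)  # large Thr: fewer granules
```

On this toy image the smaller threshold yields three granules and the larger one two, which matches the observation that the number of segments falls as Thr grows.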

Temporal Segmentation

The change in information from the previous frames to the current frame constitutes temporal information. The three-point estimation technique, which estimates the foreground and background using measures such as the mean and median, was studied [4].

Here, the frame difference Temp_Val = $|f_t - f_{t-1}|$ was used for temporal segmentation. If video frame t is of size M x N, then Temp_Val is also an M x N matrix, consisting of the intensity differences between the t-th and (t-1)th frames.
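The frame difference can be sketched on two small grey-level frames as below. The helper `changed_mask` and its `threshold` argument are assumptions added for illustration (a tolerance for sensor noise); the text itself only defines Temp_Val.

```python
def frame_difference(frame_t, frame_prev):
    """Return Temp_Val = |f_t - f_(t-1)| element-wise for two M x N frames."""
    return [[abs(a - b) for a, b in zip(row_t, row_prev)]
            for row_t, row_prev in zip(frame_t, frame_prev)]


def changed_mask(temp_val, threshold):
    """Mark pixels whose intensity change exceeds `threshold` (hypothetical helper)."""
    return [[1 if v > threshold else 0 for v in row] for row in temp_val]


# Two hypothetical 2 x 3 grey-level frames; only the middle column changes much.
prev_frame = [[10, 10, 10],
              [10, 10, 10]]
curr_frame = [[10, 60, 10],
              [10, 55, 12]]

temp_val = frame_difference(curr_frame, prev_frame)
mask = changed_mask(temp_val, threshold=5)
```

Temp_Val has the same M x N shape as the frames, and the mask is one possible way to turn it into the 1 (change) / 0 (no change) temporal feature.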

Rule Base

A rule base is designed to effectively segment the object and the background in a video frame. To generate a rule base at the pixel level, the initial object(s) and background in the first frame need to be labelled. This labelling is done using the Rough Entropy concepts given in Section 3.1. The depth values and the colour intensities of the object are noted. Unknown object(s) in later frames are then tracked using the features below.

• Temporal features: The frame differences in the RGB-D feature space are taken as $T_{RGB}$ and $T_D$ respectively. In Table 1 below, 1 indicates a change and 0 indicates no change.
• Colour features: The colour and depth values of the object are taken as $RGB_v$ and $D_v$. Here, W indicates within the model while Ou indicates outside the model.

The conditions for evaluating the status of a moving object were studied [3]. The conditions selected for a pixel to be considered Object (O) or Background (B) are given below in Table 1:

Table 1: Rule Base to evaluate background and object pixels

T_RGB   T_D   RGB_v   D_v   Class
1       1     W       W     O
0       0     Ou      Ou    B
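The two rules in Table 1 can be expressed as a per-pixel function. In the sketch below the names `pixel_features` and `classify_pixel`, the (low, high) range tuples and the `diff_threshold` value are all assumptions for illustration. Feature combinations not listed in Table 1 are returned as undecided (`None`) rather than guessed.

```python
def pixel_features(colour, depth, colour_diff, depth_diff,
                   colour_range, depth_range, diff_threshold):
    """Build (T_RGB, T_D, RGB_v within?, D_v within?) for one pixel.

    `colour_range` and `depth_range` are hypothetical (low, high) ranges noted
    for the object in the first frame; `diff_threshold` is an assumed tolerance.
    """
    t_rgb = 1 if colour_diff > diff_threshold else 0
    t_d = 1 if depth_diff > diff_threshold else 0
    rgb_within = colour_range[0] <= colour <= colour_range[1]
    d_within = depth_range[0] <= depth <= depth_range[1]
    return t_rgb, t_d, rgb_within, d_within


def classify_pixel(t_rgb, t_d, rgb_within, d_within):
    """Apply the two rules of Table 1; other combinations stay undecided."""
    if t_rgb == 1 and t_d == 1 and rgb_within and d_within:
        return "O"   # changed in both spaces and inside the object model
    if t_rgb == 0 and t_d == 0 and not rgb_within and not d_within:
        return "B"   # unchanged in both spaces and outside the object model
    return None


# Hypothetical object model and two sample pixels.
model = dict(colour_range=(150, 220), depth_range=(40, 90), diff_threshold=10)
hand = classify_pixel(*pixel_features(180, 60, 40, 25, **model))
wall = classify_pixel(*pixel_features(30, 200, 0, 0, **model))
```

Here the first pixel satisfies the object rule and the second the background rule; a pixel that changes in colour but not in depth, for example, would be left undecided by this sketch.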


Spatial information for tracking

The tracker is designed based on Fig 3 given below, where

• The (t-1)th and (t-2)th frames are considered to design the tracker for the (t)th frame.
• The centroids at the (t-1)th and (t-2)th frames are $(x_c, y_c)$ and $(x_c', y_c')$ respectively.
• The Euclidean distance between the two centroids is s. Here, $d_x = x_c - x_c'$ and $d_y = y_c - y_c'$.
• The size of the tracker, $T_x, T_y$, is determined by s, while the direction of the tracker is determined by the signs of $d_x$ and $d_y$.
Figure 3: Pictorial representation of the tracker

Object tracking

Initially, the desired tracker is drawn around the object of interest in the form of a rectangle. The size of the tracker is $(T_x, T_y)$, where $T_x = T_{xu} + T_{xl}$ and $T_y = T_{yu} + T_{yl}$, as shown in Fig 3. The background and object are labelled initially using the Rough Entropy method. The colour range and the depth range of the object are noted.

The algorithm for object tracking in a video frame is given below:

• Step 1: Convert the input colour (RGB) image (I) to a grey-level image (Y) using the equation Y = 0.3R + 0.59G + 0.11B.
• Step 2: Predict the tracker at the (t+1)th frame with the help of the equations in Section 4.1 [4]. If $d_x > 0$, $T_{xu} = T_{xu} + 1.25s$ and $T_{xl} = T_{xl} - 0.75s$; else $T_{xu} = T_{xu} + 0.75s$ and $T_{xl} = T_{xl} - 1.25s$. If $d_y > 0$, $T_{yu} = T_{yu} + 1.25s$ and $T_{yl} = T_{yl} - 0.75s$; else $T_{yu} = T_{yu} + 0.75s$ and $T_{yl} = T_{yl} - 1.25s$.
• Step 3: For every pixel within the tracker, apply the Rule Base (at the pixel level) based on Table 1, considering the (t-1)th and (t+1)th frames.
• Step 4: The Rule Base identifies the object within the predicted tracker; the tracker is then redefined around the detected object.
• Step 5: Repeat steps 1-4 to track the moving object through the video sequence.
Figure 4: Flowchart of the tracking algorithm
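The tracker prediction in the algorithm above can be sketched as a single function. The name `predict_tracker` and the tuple layout (T_xu, T_xl, T_yu, T_yl) are assumptions for illustration; the sketch applies the update equations as stated and does not interpret the four quantities beyond that.

```python
import math


def predict_tracker(tracker, centroid_prev, centroid_prev2):
    """Update (T_xu, T_xl, T_yu, T_yl) from the last two object centroids.

    `centroid_prev` is (x_c, y_c) at frame t-1 and `centroid_prev2` is
    (x_c', y_c') at frame t-2; s is the Euclidean distance between them.
    """
    dx = centroid_prev[0] - centroid_prev2[0]
    dy = centroid_prev[1] - centroid_prev2[1]
    s = math.hypot(dx, dy)

    t_xu, t_xl, t_yu, t_yl = tracker
    # The sign of dx (dy) decides which pair of x (y) update factors is used.
    if dx > 0:
        t_xu, t_xl = t_xu + 1.25 * s, t_xl - 0.75 * s
    else:
        t_xu, t_xl = t_xu + 0.75 * s, t_xl - 1.25 * s
    if dy > 0:
        t_yu, t_yl = t_yu + 1.25 * s, t_yl - 0.75 * s
    else:
        t_yu, t_yl = t_yu + 0.75 * s, t_yl - 1.25 * s
    return t_xu, t_xl, t_yu, t_yl


# Hypothetical values: the centroid moved by (3, 4) pixels, so s = 5.
predicted = predict_tracker((20, 20, 30, 30),
                            centroid_prev=(13, 14),
                            centroid_prev2=(10, 10))
```

With both displacements positive, each "u" quantity grows by 1.25s = 6.25 and each "l" quantity shrinks by 0.75s = 3.75 in this example.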


Results on Frames

• Tracked frames of hand movements

Tracking was performed on different kinds of hand movements. The video frames are of size 240 x 320 at 10 fps. In Fig 5, the first row of images shows a woman moving her arm in a circular motion. Similarly, the second row shows a man moving his arm from one side to the other. In both cases, the shape of the tracked object (the hand) changes. The algorithm nevertheless handles this, and the tracked object remains inside the rectangular tracker.

Figure 5: Tracking results. Row 1 (M_9): Frame 13, Frame 23, Frame 64. Row 2 (M_1): Frame 13, Frame 22, Frame 38.

Tracking Videos

• Video 1: Tracking of a moving car on a highway.

In Video 1, the moving car is tracked successfully. The targeted car remains in the tracker throughout its motion in the video. Quad-Tree Decomposition is used for spatial segmentation and the three-point estimation technique for temporal segmentation. Object detection is done with the help of measures such as the mean and max_dev of the RGB channels.

Tracking of a car

• Video 2 & 3: Tracking of hand movements.

The hand movements in videos K_1 and M_9 are successfully tracked using the pixel-level Rule Base method. In both cases, the hand remains inside the rectangular tracker.

Hand Movement (K_1)
Hand Movement (M_9)


The granule size and the threshold for an image were estimated from its bimodal histogram to improve image quality. The Rough Entropy concept was used to label the object and background in an image.

Spatial information from a video frame was used to predict the position of the tracker and to identify the homogeneous parts of an image (granules). The background and foreground were separated using the three-point estimation technique and the frame difference method. When the frame difference method was used along with other parameters such as depth (D) in a Rule Base, many aspects of the moving object in a video could be accounted for. This made the process of object detection much faster. The tracker was able to handle changes in illumination and in the shape of the tracked object. It is applicable to simple surveillance videos where the position of the camera is fixed and occlusions do not occur. The advantages of granules in video processing include:

• The number of classes need not be predefined, as it must be in K-means clustering.
• In the Neighbourhood Pixel method, the threshold need not be found.

In this article, the Rule Base was applied at the pixel level. The Rule Base can instead be constructed using spatio-colour and temporal value granules. The concept of overlapping granules can then be used to improve accuracy. Deep-learning methods can be applied for tracking in complex scenarios.


I am grateful to the Indian Academy of Sciences for giving me the opportunity to pursue the Summer Research Fellowship (SRF-2020) under Prof. Sankar K Pal (Padma Shri), former Director, Indian Statistical Institute, Kolkata. I am humbled and thankful to have been an intern under such an experienced and senior professor, who guided me throughout. This internship gave me an insight into Granular Computing techniques, which have huge potential in Digital Image and Video Processing. It also made me aware of the research opportunities in the Soft Computing domain. I would also like to thank Ms. Debarati Chakraborty, who assisted me and gave timely guidance in my project.

My special thanks to my college Principal, Dr. Rohini Nagapadma, who encouraged me to pursue this research internship.


[1] Sankar K. Pal, B. Uma Shankar, Pabitra Mitra, 2005, Granular computing, rough entropy and object extraction, Pattern Recognition Letters, vol. 26, no. 16, pp. 2509-2517

[2] https://gesture.chalearn.org/data/cgd2011

[3] Debarati Bhunia Chakraborty, Sankar K. Pal, 2015, Neighborhood granules and rough rule-base in tracking, Natural Computing, vol. 15, no. 3, pp. 359-370

[4] Debarati Chakraborty, B. Uma Shankar, Sankar K. Pal, 2013, Granulation, rough entropy and spatiotemporal moving object detection, Applied Soft Computing, vol. 13, no. 9, pp. 4001-4009


• Fig 1 a: S.K. Pal et al. / Pattern Recognition Letters 26 (2005) 2509–2517
• Fig 3: D. Chakraborty et al. / Applied Soft Computing 13 (2013) 4001–4009
• Video 2: https://vimeo.com/455381838