Abandoned Object Detection Using Dual Background Model from Surveillance Videos

With the increase in security threats and concerns, the detection of suspicious activities in public areas has attracted enormous attention. In general, video processing systems have been employed for post-event analysis. However, there is a need to build an intelligent video surveillance system that can help prevent such events. The proposed system detects abandoned objects in surveillance videos using a dual background model. The video is divided into frames, which are then pre-processed. The dual background method maintains two backgrounds, called the buffered background and the current background, in order to separate foreground objects from the background. Foreground blobs are generated by subtracting the two backgrounds and are then tracked to detect abandoned objects. Tracking is done by maintaining a Track set, which holds the blobProperties (Area, Centroid, Major Axis Length, Minor Axis Length and Convex Area), together with two separate count sets. The system is tested on various publicly available videos.


II. RELATED WORK
Wahyono, Alexander Filonenko, and Kang-Hyun Jo [3] have presented their work on abandoned object detection from crowded scenes of surveillance videos using an adaptive dual background model. In this paper, a new framework is presented to detect abandoned objects using dual background model subtraction. The major contributions of their work are: (i) a new background model based on statistical information of image intensity; (ii) dual background model subtraction to extract candidate abandoned regions, which is robust against lighting changes; (iii) a matching-based tracking algorithm to detect abandoned objects under occlusion; and (iv) integration of human and vehicle detection to classify human, vehicle and other objects.

Quan Wei, Zhang Yuqiang, Ge Wei and LI Hialan [8] have presented their research on a stationary object detection technique based on dual background. The stationary object detection algorithm proposed in this paper uses dual background subtraction to obtain the foreground image, based on approximated median filtering with an adaptive threshold, and detects stationary objects through morphological processing. The target detection algorithm uses the difference between the current background and the buffer background to detect stationary objects, and then analyses the connected regions to extract the stationary target.

Rajesh Kumar Tripathi, Anand Singh Jalal and Charul Bhatnagar [10] have presented a framework for abandoned object detection from video surveillance. In their method, foreground objects are extracted by background subtraction, where background modelling is done through the running average method. Static objects are detected using contour features of the foreground objects in consecutive frames. Edge-based object recognition is used to classify the detected static objects into human and non-human objects.

A. Singh, S. Sawan, M. Hanmandlu, V.K. Madasu and B.C. Lovell [12] have presented an abandoned object detection system based on dual background segmentation. The system is based on a simplistic and intuitive mathematical model and consists of a novel self-adaptive dual background subtraction technique based on the approximate median model framework. Tracking is performed on the detected block. A track set is maintained with three variables: blobProperties, hitCount and miscount. If hitCount goes above a user-defined threshold value, an alarm is triggered indicating that an abandoned object has been detected.

Kevin Lin, Shen-Chi Chen, Chu-Song Chen, Daw-Tung Lin and Yi-Ping Hung [18] have presented their work on abandoned object detection via temporal consistency modelling and back-tracking verification for visual surveillance. A temporal dual-rate foreground integration method is proposed for static-foreground estimation in single-camera video images. The method then introduces a simple pixel-based finite-state machine (PFSM) model that uses temporal transition information to identify the static foreground based on the sequence pattern of each object pixel. The merits of their system include the dual-rate background modelling framework with temporal consistency, which performs better than single-image based double background models. It is superior in handling temporary occlusions and remains highly efficient to implement.
III. PROPOSED APPROACH
In the proposed system, abandoned object detection is carried out across various scenarios. The abstract model of the proposed system is shown in Fig. 2.

A. Pre-Processing
A video clip is provided as input to the system and is split into frames for further processing. Each extracted frame is pre-processed: it is resized, converted from RGB to gray-scale, and then passed through a median filter to remove noise while preserving edges.
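As an illustration, the gray-scale conversion and median filtering steps can be sketched in Python as follows. This is a minimal sketch and not the MATLAB implementation used in the experiments. It assumes a frame stored as rows of (R, G, B) tuples, the common luminance weights 0.299, 0.587 and 0.114, and a 3 x 3 filter window; resizing is omitted, and the function names to_gray and median_filter are hypothetical.

```python
from statistics import median_low


def to_gray(frame_rgb):
    """Convert rows of (R, G, B) tuples to gray-scale intensities in 0..255."""
    # Luminance weights are an assumption; the paper does not state them.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in frame_rgb]


def median_filter(gray, k=3):
    """Apply a k x k median filter to a gray-scale image.

    Border pixels use only the part of the window that lies inside the image.
    """
    h, w, r = len(gray), len(gray[0]), k // 2
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [gray[y][x]
                      for y in range(max(0, i - r), min(h, i + r + 1))
                      for x in range(max(0, j - r), min(w, j + r + 1))]
            # median_low always returns a value that occurs in the window.
            out[i][j] = median_low(window)
    return out
```

A single bright outlier surrounded by uniform pixels is replaced by the surrounding value, which is the salt-and-pepper noise suppression that the median filter is used for here.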

B. Dual Background Generation
In the proposed approach, a dual background concept is used rather than the conventional single-background subtraction method. Two different backgrounds are maintained: the buffered background and the current background. The buffered background is initialized with the first frame of the input video; it is stored and never updated. The current background is also initialized with the first frame, and subsequently each of its pixels is compared with the corresponding pixel of the next incoming frame and updated accordingly. The update strategy is expressed in terms of CB, the pixel value of the current background, I, the pixel value of each frame that has been read, and t, which represents time.
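One common way to realize such a per-pixel comparison is the approximate median rule used in the dual-background systems of [8] and [12], in which each current-background pixel moves one gray level toward the incoming frame pixel. The sketch below assumes that rule; it may differ from the exact update used in the proposed system. The function name update_current_background and the step parameter are hypothetical.

```python
def update_current_background(cb, frame, step=1):
    """Return a new current background moved one step toward the frame.

    Assumed rule (approximate median), applied per pixel:
      CB increases by `step` if I > CB,
      CB decreases by `step` if I < CB,
      CB is unchanged       if I == CB.
    """
    updated = []
    for cb_row, frame_row in zip(cb, frame):
        new_row = []
        for cb_px, px in zip(cb_row, frame_row):
            if px > cb_px:
                new_row.append(min(cb_px + step, 255))
            elif px < cb_px:
                new_row.append(max(cb_px - step, 0))
            else:
                new_row.append(cb_px)
        updated.append(new_row)
    return updated


# Both backgrounds start from the first frame; only the current one evolves.
first_frame = [[100, 100], [100, 100]]
buffered_background = [row[:] for row in first_frame]
current_background = [row[:] for row in first_frame]
```

Under this assumed rule, an object that stays in place is absorbed into the current background gradually, while the buffered background keeps the original scene, so the two backgrounds diverge where something static has appeared.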

C. Object Detection
To detect objects, the difference between the current background and the buffered background is calculated every 10 seconds. The image pixel values are traversed from top to bottom and from left to right. With $CB_{i,j}$ as the current background pixel value and $BB_{i,j}$ as the buffered background pixel value, the background subtraction result $B_{i,j}$ is the difference between the two at pixel $(i, j)$. Once this difference is calculated, it is binarized against a threshold so as to detect the corresponding suspicious activity: the value 1 is assigned to pixels classified as foreground and 0 to those classified as background. Foreground pixels can then be grouped into connected regions by means of connectivity properties.
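The subtraction, binarization and grouping steps can be sketched as follows. This is a sketch under stated assumptions rather than the system's actual code: it takes the absolute difference of the two backgrounds, uses an illustrative threshold of 30 gray levels, and groups foreground pixels with 8-connectivity. None of these choices is specified in the text, and the function names are hypothetical.

```python
from collections import deque


def binarize_difference(cb, bb, threshold=30):
    """Return 1 where |CB - BB| exceeds the threshold (foreground), else 0."""
    return [[1 if abs(c - b) > threshold else 0
             for c, b in zip(cb_row, bb_row)]
            for cb_row, bb_row in zip(cb, bb)]


def label_blobs(mask):
    """Group foreground pixels into blobs; return one pixel list per blob."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for i in range(h):            # top to bottom
        for j in range(w):        # left to right
            if mask[i][j] != 1 or seen[i][j]:
                continue
            # Breadth-first flood fill over the 8-connected neighbourhood.
            queue, pixels = deque([(i, j)]), []
            seen[i][j] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blobs.append(pixels)
    return blobs
```

Each returned pixel list corresponds to one candidate blob, from which properties such as area and centroid can then be measured.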

D. Object Tracking
The previous step yields a binary image, which is divided into a number of legitimate blobs, i.e., rectangular regions enclosing continuous regions of foreground. First, the blobs are generated along with their properties, such as area, centroid and position, and then the tracking algorithm is applied. Mathematically, it is assumed that blob analysis generates $N$ blobs, where blob $n$ has enclosing region $R_n(t, l, h, w)$, area $A_n$, centroid $C_n(i, j)$, major axis length $Maj_n$, minor axis length $Min_n$ and convex area $CA_n$. Here $t$ is the top position value of the pixel, $l$ is the left position value of the pixel, $h$ is the height of the blob and $w$ is the width of the blob, with $1 \le n \le N$.

A set $T$ of tracked blobs is maintained, where $M$ is the number of blobs tracked. Two count arrays, count1 and count2, are also maintained. count1 is the set of all individual blobs detected throughout the video frames, recording for each blob the total number of times it appeared, its centroid values, frame number, etc. count2 holds the blob records for the current 10 seconds, including blobs that were not being tracked previously. Any new blob detected in count2 is added to count1 if it is not already present.

The next step in object detection is to track the different blobs in order to decide which of them correspond to abandoned objects. Tracking involves the following steps:

- Create a set, Track, whose elements have six properties: Area, Centroid, Bounding Box, Major Axis Length, Minor Axis Length and Convex Area, collectively called blobProperties. The Track set stores the properties of all blobs detected at every 10-second interval. Two count sets are also maintained.
- For the first 10 seconds, add the centroid values of each identified blob to the set count1.
- For every subsequent 10-second interval, analyze the incoming image for all blobs and store their centroid values in the second count set, count2. If count2 introduces blobs that are not present in count1, add them to count1.
- After each 10-second interval has elapsed, compare the new entries in the Track set with the previous ones, checking whether all the blobProperties (Centroid, Area, Major Axis Length, Minor Axis Length and Convex Area) of two blobs are equal. If a match is found, the count value is incremented in count1; otherwise the count is retained as it is.
- If the count reaches a threshold (here 2, indicating 20 seconds), the detected blob is marked with a red boundary, indicating that it is a potentially abandoned object.

These steps are repeated until there are no more incoming images.
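The tracking steps above can be sketched as follows. This is a simplified illustration, not the MATLAB implementation: each blob is represented as a dictionary of blobProperties with hypothetical key names, count1 is reduced to a mapping from centroid to count, a blob's count is assumed to start at 1 on its first appearance, and matching uses exact equality of all properties as described in the steps.

```python
def track_interval(track_prev, blobs_now, count1, threshold=2):
    """Process one 10-second interval and return flagged centroids.

    track_prev: blobProperties dicts stored for the previous interval
    blobs_now:  blobProperties dicts measured in the current interval
    count1:     dict mapping centroid -> count, updated in place
    """
    # count2 holds the centroids seen in the current interval only.
    count2 = [blob["centroid"] for blob in blobs_now]
    for centroid in count2:
        # New blobs enter count1 with a count of 1 (assumed starting value).
        count1.setdefault(centroid, 1)

    abandoned = []
    for blob in blobs_now:
        # A match requires every property to equal a previous entry.
        if any(blob == prev for prev in track_prev):
            count1[blob["centroid"]] += 1
        if count1[blob["centroid"]] >= threshold:
            abandoned.append(blob["centroid"])
    return abandoned


def run_tracking(intervals, threshold=2):
    """Run the tracker over a list of per-interval blob lists."""
    track_prev, count1, flagged = [], {}, []
    for blobs_now in intervals:
        flagged.append(track_interval(track_prev, blobs_now, count1, threshold))
        track_prev = blobs_now
    return flagged
```

With the default threshold of 2, a blob is flagged in the second consecutive interval in which it appears unchanged, which corresponds to 20 seconds. Exact equality is taken directly from the description; a practical system might prefer a small tolerance on each property, which is a design choice this sketch does not make.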

A. Implementation Platform Details
The hardware and software specifications of the platform on which the proposed approach is implemented and tested are given below:

B. Tools and Technology
The whole approach is implemented in MATLAB R2012a. MATLAB is a fourth-generation programming language that provides a multi-paradigm numerical computing environment. The functions of the Computer Vision System Toolbox provide facilities to design and simulate video processing systems, including object detection and tracking, feature extraction and matching, and motion estimation.

C. Dataset Design
To evaluate the presented work, surveillance video datasets were collected from various sources. The work has been evaluated using 16 video sequences covering various scenarios: indoor, outdoor, detection at night and crowded scenes. The duration, length, frame rate and scenario of these video sequences are specified in Table I below.

D. Experimental Results
The video1 sequence, which contains an outdoor scenario, is chosen for testing. The video lasts 73 seconds and has 2189 frames at a frame rate of 29.97 frames per second. As per the ground truth, one abandoned object is present in the video. The first phase of the system extracts the first frame, resizes it, converts it from RGB to gray-scale and uses it to initialize both the buffered background and the current background.
The next phase is to run the video in segments of 10 seconds. Since the video lasts 73 seconds, 8 segments are obtained. Frame extraction is carried out and, for each segment (i.e., for every 10 seconds of the video), the pixel values of each incoming video frame are compared with those of the current background. Figure 3 shows the experimental result for the last segment, since the abandoned object is detected in that segment. Fig. 3(a) shows the buffered background, Fig. 3(b) shows the updated current background and Fig. 3(c) shows the binarized difference image. Three blobs are obtained, whose blobProperties are listed in Table II below. The values in array count2 are updated as shown in Table III below.

The centroids of blob1, blob2 and blob3 match the centroids of the 67th, 70th and 71st entries in the set count1, respectively. Those entries are therefore overwritten with the new values, and no new entries are added to count1. Table IV below shows the updated count1 set from the 67th position onwards.

The blobProperties of the newly obtained blobs are then compared with those obtained in the previous segment. Table V shows the comparison of the three blobs along with their respective matches from the previous segment. All three blobs in this segment match previously obtained blobs in terms of area, centroid, major axis length, minor axis length and convex area. The count of blob1 in set count1 is therefore incremented by 1, making a total of 2. Similarly, the counts of blob2 and blob3 are incremented by 1. Table VI below shows the incremented count values.

The threshold set in the proposed system for labelling an object as abandoned is 20 seconds. A count of 2 corresponds to 20 seconds, meaning that the object has been left abandoned for 20 seconds. Once an object is detected as abandoned, the boundary surrounding it is changed to red. Figure 4 shows the three identified objects, two marked with a green bounding box and one marked with a red bounding box, indicating that it is abandoned.

Table VII shows the summary of the detection results for the outdoor, indoor and night scenarios. A total of 7 videos are used, and the ground truth for every video contains one abandoned object. Here, GT = Ground Truth, TN = True Negative, TP = True Positive, FN = False Negative and FP = False Positive. The detection results in Table VII show that the abandoned objects are successfully detected in all the video sequences, with a precision of 77.78% and a recall of 100%. Only two false positives are obtained, from video sequence 5, mainly due to an unclean buffered background.

Similarly, the experiment performed on the crowded scenario is shown in Table VIII below. A total of 9 videos are used, each containing a crowded scene. The ground truth for every video contains one abandoned object, except for videos 4 and 5, which contain no abandoned object. Table VIII shows that all the abandoned objects are accurately detected, but a few false positives are also detected because of the crowded nature of the videos and an unclear first frame, which results in an unclean buffered background. The precision observed is 53.85% while the recall remains 100%. Combining all the video sequences, the overall precision is calculated to be 63.63% and the recall is 100%.
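For reference, the precision and recall figures follow the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The short sketch below recomputes the values for the outdoor, indoor and night set from the counts stated in the text (7 abandoned objects all detected, 2 false positives, no misses). The function name precision_recall is hypothetical, and the crowded set is not recomputed because its false-positive count is not stated in the text.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) as percentages rounded to two decimals."""
    precision = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    recall = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    return round(precision, 2), round(recall, 2)


# Outdoor, indoor and night videos: TP = 7, FP = 2, FN = 0.
print(precision_recall(tp=7, fp=2, fn=0))  # (77.78, 100.0)
```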

V. CONCLUSION AND FUTURE WORK
This paper presents an approach based on a dual background model that detects abandoned objects in surveillance videos. The dual background model maintains two separate backgrounds, one called the buffered background and the other called the current background.

Experiments are performed on various video sequences for different scenarios, including indoor, outdoor, night and crowded scenes. For the video sequences with outdoor, indoor and night scenarios, the experimental results show a precision of 77.78% and a recall of 100%. For the crowded scenario, the measured precision is 53.85% and the measured recall is 100%. The overall precision across all the video sequences is 63.63% with 100% recall.

Although the proposed system works well, in future it can be extended to work better in densely crowded places. Occlusion of objects continues to be an issue; it could be addressed by improving the way the buffered background is updated, thus enhancing object detection by the system.