An Approach to a Video Search Engine

Whenever we come across the term Big Data, a single thought comes to mind: a huge amount of data, so large and complex that it cannot be processed with traditional database management systems. Today roughly 80 percent of the data on the internet is unstructured, mainly images and videos. To manage such large amounts of data we use Hadoop. In this paper we cover video analytics using Hadoop. Video analytics is in high demand today: video surveillance, now common even at home, produces a lot of data, which raises the problems of storing it and then analyzing it. We discuss techniques to store large amounts of data and to analyze it. For image processing and comparison we use HIPI (Hadoop Image Processing Interface). Image comparison is the first phase of video analytics.


I. INTRODUCTION
Among the examples of Big Data, video analytics is one of the best. Video surveillance surrounds us: malls, banks and even small shops have security cameras. Cameras are everywhere for security purposes, but few people think about how to store and process the huge amount of data they produce. Everyone is looking for a better way to store and access it.
Consider a simple scenario: we manage a four-story mall with 4 surveillance cameras per story, 16 cameras in total. Imagine all these cameras running 24x7 and how much data they generate. According to a survey, more than 80% of this data is used only once and is of no further use. Around 1 terabyte of data is generated in a single day, which is too much to handle with traditional database management tools. Big data techniques, which are designed to handle large amounts of data, are the efficient choice here.
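To make the scale of this scenario concrete, the short Java program below works through the arithmetic. It assumes that 1 TB means 10^12 bytes and that the roughly 1 TB per day is split evenly across the 16 cameras; both are assumptions for illustration, not measurements from the mall.

```java
// Back-of-the-envelope storage estimate for the mall scenario.
// Assumptions: 1 TB = 10^12 bytes (decimal), and the ~1 TB/day total
// is spread evenly over all cameras.
public class StorageEstimate {
    static final int FLOORS = 4;
    static final int CAMERAS_PER_FLOOR = 4;
    static final double BYTES_PER_DAY = 1e12;          // ~1 TB per day for the whole mall
    static final int SECONDS_PER_DAY = 24 * 60 * 60;

    static int cameras() {
        return FLOORS * CAMERAS_PER_FLOOR;
    }

    /** Gigabytes (10^9 bytes) recorded by one camera in one day. */
    static double gbPerCameraPerDay() {
        return BYTES_PER_DAY / cameras() / 1e9;
    }

    /** Average bit rate of one camera stream, in megabits per second. */
    static double mbitPerSecondPerCamera() {
        return BYTES_PER_DAY * 8 / cameras() / SECONDS_PER_DAY / 1e6;
    }

    /** Terabytes accumulated by all cameras over the given number of days. */
    static double tbAfterDays(int days) {
        return BYTES_PER_DAY * days / 1e12;
    }

    public static void main(String[] args) {
        System.out.printf("cameras: %d%n", cameras());
        System.out.printf("per camera per day: %.1f GB%n", gbPerCameraPerDay());
        System.out.printf("per camera bit rate: %.2f Mbit/s%n", mbitPerSecondPerCamera());
        System.out.printf("after 30 days: %.0f TB%n", tbAfterDays(30));
    }
}
```

Under these assumptions each camera records 62.5 GB per day, an average of about 5.8 Mbit/s per stream, and a month of footage from the whole mall reaches about 30 TB.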
Currently, big data techniques are mostly applied to structured data such as Facebook or stock market data; social media interactions such as comments, likes and shares have a structured format. Recent studies, however, show that big data is moving toward unstructured data in image and video formats. Everyone wants fast results and fast updates, which are achievable with text-based data but not with content-based data. We therefore need to perform analysis on content-based data. Video analytics is what today's world demands.

II. RELATED WORK
In this section we discuss Big Data and Hadoop: their advantages and disadvantages, and which techniques are more relevant than others. The main focus is how to store and access video data and apply different algorithms to it. Myoungjin Kim [3] et al. (2013) discussed a "Hadoop Distributed Video Transcoding System in Cloud Computing", which focuses on transcoding videos for different platforms; their experimental results show increased processing speed and quality. Video analytics mainly involves tasks such as motion detection and feature extraction, among others; see Tao LUO [4]. Shweta Pandey and Vrinda Tokekar [6] (2014) explain various aspects of big data and its different domains. In their survey they explain how jobs are collected and how MapReduce processing works. Many "scheduling algorithms" exist for processing jobs in the MapReduce framework.
For learning how MapReduce works, the best explanation is given by Amrit Pal and Pinki Agrawal [2] (2014), who walk through the word count program: they first create a large data set and store it in the Hadoop Distributed File System. E. Dede [6] et al. (2014) explain how to use "NoSQL with the Hadoop MapReduce framework".
Another common area of video analysis is traffic, where 24x7 recording generates huge amounts of data. In [7], the authors discuss vehicle detection, for which they implemented a "rule-based reasoning" algorithm.
For several years TRECVid has been working on different video analytics methods. Every year it organizes a workshop that thoroughly examines different techniques for working with video; tasks are assigned to participants, and the workshop provides a stable environment for sharing information. The tasks cover techniques such as shot boundary detection, detection of semantic concepts within shots, video search, video summarization, surveillance event detection in CCTV footage, and television news story segmentation.
Today we get data from every source, whether YouTube, social media or others. This amounts to tons of data that is very difficult to store and later process, because it is unstructured and therefore difficult to extract and transform [1].
This section thus shows how important big data and the Hadoop framework are for analyzing and processing such data. It is a growing field of modern computer science that needs in-depth research into processing data at this scale.

III. PROPOSED METHODOLOGIES
Our work focuses on video comparison rather than full video analytics: if we can compare videos successfully, the same methodology can then be used for video analytics. Since the project is built on Hadoop, we first introduce the terminology we use.

Hadoop:
One might ask: what is Hadoop, why use it for video comparison, and is it a new method for video analysis? Hadoop is neither a methodology nor a technique. It is a Java-based framework for processing large data sets on a distributed system. The distributed design is Hadoop's main strength: because the work is spread across many machines, large amounts of data can be processed in less time. Data does not need to be preprocessed into a fixed schema as in earlier database management techniques, which makes Hadoop more flexible. On a single machine, a hardware failure stops the data processing, but in Hadoop's distributed environment, if one system goes down the remaining systems take over its work and process it. We can also add as many systems as needed. Hadoop processes data with the MapReduce algorithm.
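The idea that "the rest will distribute the work" when one system goes down can be illustrated with a toy scheduler. This is a hypothetical sketch, not Hadoop's scheduler (real Hadoop uses YARN, data locality and task re-execution, which are far more involved): tasks are assigned round-robin, and any task whose node is down is handed to the next live node.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of fault tolerance by redistribution: tasks are spread over
// worker nodes, skipping nodes that are known to be down.
// All names here are hypothetical and do not come from the Hadoop API.
public class RetryScheduler {

    /** Assigns each task to a live node, moving past nodes listed as failed. */
    static Map<String, String> assign(List<String> tasks, List<String> nodes, Set<String> failed) {
        Map<String, String> placement = new LinkedHashMap<>();
        for (int i = 0; i < tasks.size(); i++) {
            // Start from the round-robin choice and move on while the node is down.
            for (int attempt = 0; attempt < nodes.size(); attempt++) {
                String node = nodes.get((i + attempt) % nodes.size());
                if (!failed.contains(node)) {
                    placement.put(tasks.get(i), node);
                    break;
                }
            }
            if (!placement.containsKey(tasks.get(i))) {
                throw new IllegalStateException("no live node for " + tasks.get(i));
            }
        }
        return placement;
    }

    public static void main(String[] args) {
        List<String> tasks = List.of("split-0", "split-1", "split-2", "split-3");
        List<String> nodes = List.of("node-1", "node-2", "node-3");
        System.out.println(assign(tasks, nodes, Set.of()));
        // node-2 goes down: its share of the work moves to the remaining nodes.
        System.out.println(assign(tasks, nodes, Set.of("node-2")));
    }
}
```

With node-2 down, split-1 runs on node-3 instead, and every task still completes.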

Map Reduce:
Every programming framework needs some logical concept or algorithm to process data. In Hadoop, big data is processed with the MapReduce algorithm, which performs two operations, Map and Reduce. Map takes data from different places and converts it into a set of key/value tuples. Reduce takes the output of Map as its input and combines those tuples into a smaller set of tuples. Reduce always runs after Map. The following figure shows the MapReduce process: Figure 2 Process of Map Reduce [1]. Consider an example MapReduce program, shown in figure [2]. Three sentences are taken as input and split into three parts called input splits; an input split is the piece of input consumed by a single map task. Each split is passed to a map function to produce the intermediate output. In the shuffling phase, values with the same key are brought together in one place. The mapped and shuffled data is then the input to the reducer, which combines it to produce the final output.
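The word count flow described above (splits, map, shuffle, reduce) can be simulated in memory with plain Java. This sketch does not use the Hadoop API, and the three input sentences are hypothetical stand-ins for the ones in the figure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of the MapReduce word count:
// input splits -> map -> shuffle (group by key) -> reduce.
public class WordCountSketch {

    record Pair(String key, int value) {}

    /** Map phase: emit (word, 1) for every word in one input split. */
    static List<Pair> map(String split) {
        List<Pair> out = new ArrayList<>();
        for (String word : split.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new Pair(word, 1));
            }
        }
        return out;
    }

    /** Shuffle phase: bring all values with the same key together. */
    static Map<String, List<Integer>> shuffle(List<Pair> mapped) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Pair p : mapped) {
            grouped.computeIfAbsent(p.key(), k -> new ArrayList<>()).add(p.value());
        }
        return grouped;
    }

    /** Reduce phase: sum the counts collected for each word. */
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, ones) ->
                result.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return result;
    }

    static Map<String, Integer> wordCount(List<String> splits) {
        List<Pair> mapped = new ArrayList<>();
        for (String split : splits) {
            mapped.addAll(map(split));   // each split is handled by one map call
        }
        return reduce(shuffle(mapped));
    }

    public static void main(String[] args) {
        List<String> splits = List.of("Deer Bear River", "Car Car River", "Deer Car Bear");
        System.out.println(wordCount(splits));
    }
}
```

For these sentences the program prints {bear=2, car=3, deer=2, river=2}. In real Hadoop the map calls run in parallel on different nodes and the shuffle moves data over the network; here everything runs in one process.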

Hadoop Distributed File System:
HDFS is designed to store and handle very large data sets. It has a NameNode, which works as the master, and DataNodes, which act as slaves. The NameNode holds the metadata about all the DataNodes and the blocks they store, while the DataNodes serve read and write requests from clients.
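The master/slave split can be pictured with a toy model of a read request: the NameNode only knows which blocks make up a file and where their replicas live, and the client then reads each block from a live DataNode. The class and method names below are hypothetical and are not the Hadoop API; the tiny block size and the simple placement rule are chosen only for illustration (real HDFS uses much larger blocks and rack-aware placement).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of HDFS metadata and reads (hypothetical names, not the Hadoop API).
public class HdfsReadSketch {

    /** Slave: stores block contents. */
    static class DataNode {
        final String name;
        final Map<String, byte[]> blocks = new HashMap<>();
        boolean alive = true;
        DataNode(String name) { this.name = name; }
    }

    /** Master: stores only metadata (file -> blocks, block -> replica holders). */
    static class NameNode {
        final Map<String, List<String>> fileToBlocks = new HashMap<>();
        final Map<String, List<DataNode>> blockToNodes = new HashMap<>();
    }

    /** Splits a file into fixed-size blocks and replicates each block on several DataNodes. */
    static void write(NameNode nn, List<DataNode> nodes, String file, byte[] data,
                      int blockSize, int replication) {
        int copies = Math.min(replication, nodes.size());
        List<String> blockIds = new ArrayList<>();
        for (int off = 0, i = 0; off < data.length; off += blockSize, i++) {
            String id = file + "#blk" + i;
            byte[] block = Arrays.copyOfRange(data, off, Math.min(off + blockSize, data.length));
            List<DataNode> holders = new ArrayList<>();
            for (int r = 0; r < copies; r++) {
                DataNode dn = nodes.get((i + r) % nodes.size());   // simple placement rule
                dn.blocks.put(id, block);
                holders.add(dn);
            }
            nn.blockToNodes.put(id, holders);
            blockIds.add(id);
        }
        nn.fileToBlocks.put(file, blockIds);
    }

    /** Read request: ask the NameNode for block locations, then read from a live DataNode. */
    static byte[] read(NameNode nn, String file) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String id : nn.fileToBlocks.get(file)) {
            DataNode source = nn.blockToNodes.get(id).stream()
                    .filter(dn -> dn.alive)
                    .findFirst()
                    .orElseThrow(() -> new IllegalStateException("all replicas lost: " + id));
            out.writeBytes(source.blocks.get(id));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        List<DataNode> nodes = List.of(new DataNode("dn-1"), new DataNode("dn-2"), new DataNode("dn-3"));
        NameNode nn = new NameNode();
        byte[] data = "sequence-file-contents".getBytes(StandardCharsets.UTF_8);
        write(nn, nodes, "image.seq", data, 4, 2);
        nodes.get(0).alive = false;   // simulate a DataNode failure
        System.out.println(new String(read(nn, "image.seq"), StandardCharsets.UTF_8));
    }
}
```

Because every block has two replicas on different nodes, the file can still be read in full after one DataNode fails.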
The following figure [4] shows the architecture of a client read request. Figure 4 Read Request in Map reduce [3]
We performed image comparison based on an image duplicate finder. First, we take the video file and use FFMPEG to extract frames from it in the desired image format. Because Hadoop handles a large number of small image files inefficiently, we first convert the frames into the sequence file format.
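The frame-extraction step can be driven from Java. The sketch below builds an FFmpeg command line and runs it only on request; the input name cctv.mp4, the output pattern frames/frame_%04d.jpg and the rate of one frame per second are hypothetical choices, since the paper does not give its FFmpeg options.

```java
import java.io.IOException;
import java.util.List;

// Builds (and optionally runs) an FFmpeg command that extracts frames from a
// video into numbered image files. File names and sampling rate are assumptions.
public class FrameExtractor {

    /** Produces: ffmpeg -i <video> -vf fps=<rate> <outDir>/frame_%04d.<ext> */
    static List<String> ffmpegCommand(String video, int framesPerSecond, String outDir, String ext) {
        return List.of("ffmpeg", "-i", video,
                "-vf", "fps=" + framesPerSecond,          // sample N frames per second
                outDir + "/frame_%04d." + ext);           // numbered output images
    }

    /** Runs the command; needs ffmpeg on the PATH and an existing output directory. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).inheritIO().start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = ffmpegCommand("cctv.mp4", 1, "frames", "jpg");
        System.out.println(String.join(" ", cmd));
        if (args.length > 0 && args[0].equals("--run")) {
            System.exit(run(cmd));
        }
    }
}
```

Sampling at a low, fixed rate keeps the number of frames manageable for surveillance footage, where consecutive frames are often nearly identical; the rate is a tuning choice.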
Next, we store the sequence file in the Hadoop Distributed File System. The MapReduce program can be written in the language of our choice; a Java program is compiled and packaged as a jar file, because Hadoop runs Java jobs from jar files.

Principle of Operation:
The process of video analytics using Hadoop can be summarized as follows. The basic idea is to process the video file using Hadoop Streaming. The input is stored in HDFS and the results are also written to HDFS. The map function performs the image processing.
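The image duplicate finder mentioned above can be sketched in MapReduce style. The paper does not spell out its comparison, so this sketch assumes an exact byte-level comparison: map emits (MD5 digest of the frame bytes, frame name), the shuffle groups frames with the same digest, and reduce keeps one frame per digest. It is a plain-Java simulation that uses neither HIPI nor the Hadoop API, and the frame names and contents are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// MapReduce-style exact duplicate detection for frames (assumed technique:
// MD5 over raw bytes, so frames differing by a single byte count as distinct).
public class DuplicateFrameFinder {

    /** Map key: hex MD5 digest of the frame bytes. */
    static String md5Hex(byte[] imageBytes) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(imageBytes);
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Shuffle: group frame names that share a digest, keeping first-seen order. */
    static Map<String, List<String>> groupByDigest(Map<String, byte[]> frames) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        frames.forEach((name, bytes) ->
                groups.computeIfAbsent(md5Hex(bytes), k -> new ArrayList<>()).add(name));
        return groups;
    }

    /** Reduce: emit the first frame name for each distinct digest. */
    static List<String> uniqueFrames(Map<String, byte[]> frames) {
        List<String> unique = new ArrayList<>();
        for (List<String> names : groupByDigest(frames).values()) {
            unique.add(names.get(0));
        }
        return unique;
    }

    public static void main(String[] args) {
        // Stand-in byte arrays; in practice these would be frames read from the sequence file.
        Map<String, byte[]> frames = new TreeMap<>();
        frames.put("frame_0001.jpg", "static scene".getBytes(StandardCharsets.UTF_8));
        frames.put("frame_0002.jpg", "static scene".getBytes(StandardCharsets.UTF_8));
        frames.put("frame_0003.jpg", "person enters".getBytes(StandardCharsets.UTF_8));
        System.out.println(uniqueFrames(frames));
    }
}
```

Exact hashing only catches byte-identical frames; re-encoded or slightly changed frames would need a similarity measure instead, which is where an image library such as HIPI comes in.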
Working with Hadoop requires running a few commands. Before starting a Hadoop job, we first check that we can connect to the local host over SSH, since Hadoop's start scripts use SSH to launch the daemons:

$ ssh localhost
Once SSH to the local host works, we format the NameNode so that HDFS starts from an empty file system. Formatting erases any existing HDFS metadata, so it is done only when setting up a new installation:

$ hadoop namenode -format
To start Hadoop we execute the following command:

$ start-all.sh

We then confirm that all the Hadoop daemons are running by executing the jps command. Next, we extract frames from the video file into a particular directory. This directory is archived into a tar file, which is then converted into a sequence file, for example:

$ java -jar tar-to-seq.jar image.tar image.seq

We then transfer this sequence file to an HDFS directory. With the MapReduce code written, we compile it and create a jar file. To run the job on Hadoop, we execute a command of the following form in the terminal, where the jar and main class names are placeholders for our own:

$ hadoop jar <job>.jar <MainClass> image.seq output/unique
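The reason for converting image.tar into image.seq is to pack many small frame files into one large file of (name, bytes) records. The sketch below shows that idea with plain Java streams. It is a simplified stand-in and NOT the Hadoop SequenceFile binary format; a real job would use org.apache.hadoop.io.SequenceFile or the tar-to-seq tool shown above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Packs frames into one container of (file name, bytes) records and unpacks them.
// Simplified illustration only; the record layout here is an assumption.
public class FramePacker {

    static byte[] pack(Map<String, byte[]> frames) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(frames.size());                 // record count
            for (Map.Entry<String, byte[]> e : frames.entrySet()) {
                out.writeUTF(e.getKey());                // key: frame file name
                out.writeInt(e.getValue().length);       // value length in bytes
                out.write(e.getValue());                 // value: raw image bytes
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Map<String, byte[]> unpack(byte[] packed) {
        Map<String, byte[]> frames = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(packed))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                byte[] bytes = new byte[in.readInt()];
                in.readFully(bytes);
                frames.put(name, bytes);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return frames;
    }

    public static void main(String[] args) {
        Map<String, byte[]> frames = new LinkedHashMap<>();
        frames.put("frame_0001.jpg", "jpeg-bytes-1".getBytes(StandardCharsets.UTF_8));
        frames.put("frame_0002.jpg", "jpeg-bytes-2".getBytes(StandardCharsets.UTF_8));
        byte[] packed = pack(frames);
        System.out.println(unpack(packed).keySet() + " packed into " + packed.length + " bytes");
    }
}
```

A single large file of records suits HDFS better than thousands of tiny files, because each file costs NameNode metadata and each small file would otherwise tend to become its own map task.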

IV. EXPECTED RESULT/OUTCOME
Our main goal is a system that can process video data. To develop it we use Ubuntu 14.04, the Hadoop framework and Eclipse, with Java as the programming language. Images are extracted with the FFMPEG tool and then converted to the sequence file format. The MapReduce program is written in Java. The data is first split and then processed by the map function. A classifier technique categorizes the data into different sets. Finally, the reduce function is applied and the output is written to an HDFS folder.
Below are screenshots of the system we set out to develop. Figure 9 shows the terminal used to execute the commands, including how to see where the output and log files are stored. Figure 7 shows a summary of the project: the heap size, the number of job submissions, the number of nodes performing the operation, and more. Figure 8 shows the status of a completed job: when it started, the total number of map and reduce tasks, and their results.

V. FUTURE ENHANCEMENT
Big data is a major area for computer science students, and more research is needed. Hadoop provides fast processing of large data sets and can handle both structured and unstructured data, which makes it a good option for video analytics. Packing the input into sequence files lets Hadoop process large amounts of image data quickly.
Different approaches can be tried for video analytics. For example, we could also compare video subtitles, which would give the exact time at which our images match. This approach could make video processing more effective and faster.
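The subtitle idea can be sketched as follows: parse an SRT subtitle file and return the start times of the cues that contain a search phrase, so that a match can be located in time. This assumes well-formed SRT input (an index line, a "HH:MM:SS,mmm --> HH:MM:SS,mmm" timing line, then text lines, with blank lines between cues); the sample cues and phrase are made up, and linking these times to matched frames is left as described in the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Finds when a phrase is spoken by searching SRT subtitle cues.
public class SubtitleSearch {

    record Cue(long startMillis, String text) {}

    // Start time of a cue, e.g. "00:01:10,500 -->"
    private static final Pattern TIME = Pattern.compile("(\\d{2}):(\\d{2}):(\\d{2}),(\\d{3})\\s*-->");

    static List<Cue> parse(String srt) {
        List<Cue> cues = new ArrayList<>();
        String normalized = srt.replace("\r\n", "\n").trim();
        for (String block : normalized.split("\n\\s*\n")) {       // cues are separated by blank lines
            String[] lines = block.split("\n");
            if (lines.length < 3) {
                continue;   // need an index line, a timing line and at least one text line
            }
            Matcher m = TIME.matcher(lines[1]);
            if (!m.find()) {
                continue;
            }
            long start = ((Long.parseLong(m.group(1)) * 60 + Long.parseLong(m.group(2))) * 60
                    + Long.parseLong(m.group(3))) * 1000 + Long.parseLong(m.group(4));
            String text = String.join(" ", Arrays.copyOfRange(lines, 2, lines.length));
            cues.add(new Cue(start, text));
        }
        return cues;
    }

    /** Start times (ms) of all cues whose text contains the phrase, ignoring case. */
    static List<Long> find(List<Cue> cues, String phrase) {
        String needle = phrase.toLowerCase();
        List<Long> hits = new ArrayList<>();
        for (Cue c : cues) {
            if (c.text().toLowerCase().contains(needle)) {
                hits.add(c.startMillis());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String srt = "1\n00:00:01,000 --> 00:00:04,000\nWelcome to the mall.\n\n"
                + "2\n00:01:10,500 --> 00:01:13,000\nThe red car is parked outside.\n";
        System.out.println(find(parse(srt), "red car"));
    }
}
```

Text search over subtitles is far cheaper than comparing frames, so it could narrow down the time ranges in which frame comparison is then run.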