Automated Real-time Gesture Recognition using Hand Motion Trajectory

Abstract—In today's busy, technology-driven world, gestures play a vital role in everyday life, conveying data or commands to machines through nothing more than a wave of the hand and thereby automating processes. Gesture recognition is a branch of Human Computer Interaction (HCI). In recent years, many algorithms and methodologies have been developed for gesture recognition, moving toward a touch-less interface between human and computer. Many of these approaches require high-end cameras, for example the Kinect motion-capture camera used in console gaming. This paper focuses on using an ordinary web camera to recognize gestures as robustly and accurately as possible. The proposed methodology applies a deep learning algorithm to learn the features of the gestures and then classify them correctly in real time. The main target is to recognize letters of the English alphabet (such as A, B, C, and D) drawn by waving a hand in front of a laptop web camera with 5-megapixel resolution. First, an object of a specific color is detected to track the movement; skin-color detection is then used to track the movement of the human hand effectively. A Deep Belief Neural Network (DBNN) learns the gestures in the training phase, using a manually created database of 11 characters with 4 samples each. Finally, the gestures are recognized in real time by the trained deep learning network. The proposed methodology, incorporating the Deep Belief Neural Network, achieves a success rate of 95.5%, which is significantly higher than the previously published recognition rate of 92.3% for real-time gesture recognition.

I. INTRODUCTION

Hand gestures are a more natural input channel for many computer users than the keyboard or mouse. In the future, vision-based interpretation devices may replace the keyboard and mouse, or even eliminate their use, for communicating with a computer or laptop. Communication between humans is accomplished through several media: facial and body expressions, gestures, and speech. The main advantage of using hand gestures to interact with a computer is that it is a vision-based, non-contact input method rather than a contact-based technique. As an exciting part of human-computer interaction, hand gesture recognition needs to be robust for real-life applications, but the complex structure of the human hand makes it difficult to track and interpret. Beyond the gesture itself, further problems arise from the flexibility and variability of the human hand's structure and from the shapes of the gestures. The goal of detecting and recognizing gestures accurately despite variations in illumination conditions and background noise has motivated the development of a robust, vision-based application that operates objects via hand gesture recognition. Such an application offers more effective and user-friendly devices for human-computer interaction through hand gestures.

Hand gestures can replace the mouse for controlling the movement of a virtual object, but detection is complex, and a noisy environment has a major impact on recognition performance and on the detection of human hand gestures. It is therefore desirable to design a low-cost system that uses just a webcam to capture the hand as input. Virtual objects can be operated by mapping predefined hand gestures to commands: the user performs an action as a command in a smart system, which executes it to fulfil a real-life requirement.

The work in this paper is divided into three stages: 1) feature extraction, 2) training, and 3) testing. Section II reviews the work done by various researchers in the field of gesture recognition. Section III describes the complete technique and methodology used for real-time gesture recognition. Section IV presents the results obtained with the proposed technique, and Section V concludes the work.
II. RELATED WORK

In this section, the work done in the last few years in the field of human-computer interaction via gesture recognition is examined. Table 1 compares different hand gesture recognition techniques, listing each technique with its reported recognition rate. Table 2 summarizes video processing techniques that can be applied to the frames of a video, which is useful when processing frames taken from the camera in real time.

Hong et al. [2] considered gestures affected by spatio-temporal variations. Such continuous gestures were recognized after segmentation. Two kinds of gestures were studied: the first used outline extraction from two-arm movements, and the second derived a feature vector from a single hand movement. A multi-scale gesture model was also proposed, presenting three approaches that differ in how the gesture end points are localized. The first approach found the end points by a multi-scale search with a motion detection strategy; the second located the finger end points approximately using active time covering; the third was based on dynamic programming. Recognition rates of 88% to 96% were achieved for hand gesture recognition with these three approaches, and the third proved the best for recognizing hand gestures in continuous video streams.

Cosio et al. [3] performed gesture classification and recognition using an artificial neural network. A Wii remote, rotated in the X, Y and Z directions, supplied the gestures. Recognition was processed in two levels to minimize memory consumption and computational cost: an accelerometer-based gesture recognition method with verification in the first level, and in the second level a fuzzy automata algorithm for gesture recognition without any signal processing. The k-means and Fast Fourier Transform algorithms were then used for filtering and normalization of the data. With a Dynamic Bayesian Network, the recognition accuracy increased to 95%.

Dominio et al. [4] introduced a novel hand gesture recognition scheme that exploits the depth information of images taken from depth cameras; a set of 3-dimensional features allows complex gestures to be recognized reliably. The system has three main steps. First, the hand samples are segmented from the background, with the palm, wrist and fingers as sub-parts of the segmented hand. Second, four types of features are extracted from the segmentation: the first two sets are the distances and elevations of the fingertips relative to the palm center, the third set contains curvature features of the hand outline, and the fourth set is based on the geometry of the palm region. Finally, an SVM classifier applied to the constructed feature vectors identifies the hand gesture performed in front of the camera, achieving 95% accuracy.

Peng et al. [5] proposed a system that collects daily information on hand movements from the internet.
Principal component analysis was used for hand identification, with the YCbCr color space for skin-color detection and the CAMSHIFT algorithm for detecting and tracking the hand gestures. Skin detection located the position and region of the hand, and detection of the skin region continued until the tracking trigger condition was satisfied. PCA was used for segmentation and normalization. Experiments showed an accuracy of 93.1% for hand gesture recognition, with a processing time between 0.1 s and 0.3 s per frame.

Gharasuie et al. [6] proposed a system to identify the numbers 0 to 9 from dynamic hand gestures, in two steps: pre-processing first and classification second. Two classes of gestures were distinguished, link gestures and key gestures, with key gestures used to identify the link gestures in a continuous gesture stream. A discrete HMM (DHMM) was used for classification and for finding the path between any two points in a continuous gesture, and the Baum-Welch algorithm trained the DHMM. Average identification rates of 93.84% to 97.34% were obtained with the HMM.

Ren et al. [7] built a robust, part-based gesture identification system that considers only the fingers of the hand, using an economical depth camera, the Kinect sensor. Kinect sensors capture large objects easily but, because of their low resolution, find it hard to identify the hand. To cope with the noisy hand images captured by the Kinect, a distance metric known as the Finger Earth Mover's Distance (FEMD) was proposed to match the fingers; the whole hand was not considered, which manages the noise associated with detection. Since FEMD can differentiate even the smallest differences between hand gestures, the system proved to work efficiently in uncontrolled environments, and the experimental results show an accuracy of 93.2%.

Different approaches and algorithms from the research of the past few years on real-time gesture recognition have thus been scrutinized. The main observation is that, with high-end cameras such as the Kinect and 3-D depth cameras, gesture recognition with HMM and SVM algorithms gives good results, but such 3-D and special cameras require high investment. The main aim of this project is therefore to recognize hand gestures with a high success rate and low cost at the same time, filling the gap by implementing a robust gesture recognition system with a low-cost camera such as a web camera.
After reviewing the earlier work in this area, the problem addressed in this paper can be stated. "Hand Gesture Recognition Using Camera" is based on the concept of image processing. In recent years, much gesture recognition work has been done with Kinect sensors or HD cameras, but such cameras and sensors are costly. The present work focuses on reducing the cost and improving the robustness of the proposed system by using a simple web camera. The main target is to identify the character drawn by a simple hand motion; to analyze the hand movement, skin-based object detection is performed and the hand's movement is tracked frame by frame.
The main objectives of the present work are:
1. To create a robust gesture recognition algorithm for real-time application using a web camera and the MATLAB software tool.
2. To recognize at least four characters, such as 'L', 'Z', 'O' and 'N'.
3. To utilize DBNN, a deep learning method, for learning and classifying the motion performed in front of the camera.
Section III describes the methodology and computational technique.
III. METHODOLOGY

Almost all gesture recognition methodologies have three major implementation steps. The first is detection of the object, which can be the hand or any specific object; the motive of this stage is to identify the object in the image so that it can be tracked further. Various constraints related to environmental noise must be resolved to make sure that a contour is found on the object to be detected, in our case the hand: if the hand is extracted precisely, recognition accuracy improves. Common image problems include variation in illumination, poor resolution and noise. A noise-free environment and a better-resolution camera can reduce these problems, but when the gesture recognition system works in a real environment such conditions are hard to control. Hence, image processing methodologies are a good way to handle these image-related constraints and to develop an adaptive, robust gesture recognition system.

The motive of this paper is to use real-time motion as the gesture to be identified. There is no image-specific criterion for choosing the gesture; the hand movement in a specific pattern is used to detect and recognize it. The complete sequence of steps for implementing real-time gesture recognition is shown in figure 1, and the flow divides into two parts, training and testing. The sequence is as follows (a minimal sketch of the acquisition loop is given after this list):
1. The image acquisition device is switched on and frames are captured continuously.
2. As soon as motion is detected in a skin region, the skin detection algorithm is executed and masked images are created.
3. The hand portion is traced to mark the gesture.
4. If the gesture is drawn correctly, it is saved for further processing.
5. The next step is feature extraction.
6. After feature extraction, the features of the gesture are saved in .mat format.
7. Once all gesture features have been extracted and the database saved, the deep neural network is trained on the collected features for the respective gestures.
8. At run time, a gesture is drawn and the trained neural network predicts which particular gesture it is.

The real-time hand gesture recognition system proposed in this paper has certain environmental requirements that must be fulfilled to achieve accurate results with a good success rate. The main problem when working on a real system is to obtain noise-free skin color and to extract the hand from that skin region. For better detection of the hand, a color-based approach is used with a background as stable as possible, meaning that the background should not contain much of the same color as the skin. Our second approach is specific-color object tracing, in which an object of a color not present in the background is used to trace the gesture before identifying it. If the gesture is drawn correctly, this approach is robust, since it maps the drawn gesture onto a white-background image with the gesture traced in blue. To get the maximum performance out of the proposed methodology, the background should therefore be as stable and noise-free as possible.
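To make steps 1 to 3 concrete, the following is a minimal MATLAB sketch of the acquisition and skin-trigger loop. It assumes the MATLAB Support Package for USB Webcams and the Image Processing Toolbox; the YCbCr skin thresholds and the frame count are illustrative values, not the exact parameters of the present system.

```matlab
% Minimal sketch of the acquisition / skin-trigger loop (steps 1-3).
cam = webcam;                       % open the default web camera
trajectory = zeros(0, 2);           % centroids of the hand, frame by frame
for k = 1:300                       % capture frames for roughly 10 s at 30 fps
    rgb  = snapshot(cam);
    ycc  = rgb2ycbcr(rgb);          % skin tones cluster in the Cb/Cr planes
    mask = ycc(:,:,2) >= 77 & ycc(:,:,2) <= 127 & ...
           ycc(:,:,3) >= 133 & ycc(:,:,3) <= 173;   % illustrative thresholds
    mask = medfilt2(mask, [5 5]);   % suppress speckle noise
    mask = bwareaopen(mask, 500);   % drop small non-hand blobs
    if any(mask(:))                 % motion with a skin region detected
        s = regionprops(mask, 'Centroid', 'Area');
        [~, i] = max([s.Area]);     % track the largest skin blob
        trajectory(end+1, :) = s(i).Centroid; %#ok<AGROW>
    end
end
clear cam                           % release the camera
```

The accumulated trajectory is what is later rasterized onto a white background to form the gesture image.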

C. Deep Neural Networks
The benefits of pre-training a deep neural network (DNN) have been covered broadly [9], [10], and the models discussed so far can be modified slightly to construct a traditional feedforward neural network. Given a DBN trained on a specific dataset, an additional linear or logistic regression layer can be added on top of the model; the dataset's labels then serve as the training target for this top layer, and traditional backpropagation, as designed for neural networks, is applied with regularization of the parameter weights. This yields the best results on the test dataset.
D. Training a Better Neural Network

A few optimizations of the backpropagation algorithm, covered in Nielsen [11], are used for fine-tuning the DNN: a better error/loss function improves learning at the extremes of the activation functions, and weight decay prevents the weights from growing too large.
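As an illustration only (the paper gives no code), this is how the two optimizations could be configured on a MATLAB network object; the regularization ratio is an assumed value, and `deepnet` stands for the network assembled later.

```matlab
% Illustrative configuration of loss function and weight decay on a
% MATLAB network object; the regularization ratio 1e-4 is an assumption.
deepnet.performFcn = 'crossentropy';         % better gradients at activation extremes
deepnet.performParam.regularization = 1e-4;  % weight decay penalizes large weights
```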
E. Learning of a Network

Training a DBN is made simple by its composition from RBMs, which are trained in an unsupervised way. In practice the RBM training time governs the overall DBN training time, but the code remains simple. Class labels are required for classification DBNs (CDBNs) only during training of the top layer. A training period therefore proceeds as follows: first the bottom layer is trained; the dataset is then propagated through the learned RBM, and the transformed dataset becomes the training data for the next RBM. This continues until the dataset has been propagated through the last trained RBM, where the labels are concatenated with the transformed dataset and used to train the top-layer associative memory.
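For concreteness, here is a sketch of one layer of this greedy pre-training as a CD-1 (one-step contrastive divergence) update in plain MATLAB. The function name, learning rate and initialization are illustrative, not taken from the paper.

```matlab
% Sketch of CD-1 training for a single RBM layer, as used in the
% greedy layer-wise pre-training described above.
function [W, bh, bv] = rbm_cd1(X, nHidden, epochs, lr)
    [n, nVisible] = size(X);             % X: one training vector per row
    W  = 0.01 * randn(nVisible, nHidden);
    bh = zeros(1, nHidden);              % hidden biases
    bv = zeros(1, nVisible);             % visible biases
    sigm = @(z) 1 ./ (1 + exp(-z));
    for e = 1:epochs
        % positive phase: hidden probabilities and sampled states from the data
        ph = sigm(X * W + bh);
        h  = ph > rand(size(ph));
        % negative phase: one Gibbs step reconstructs the data (CD-1)
        pv  = sigm(h * W' + bv);
        ph2 = sigm(pv * W + bh);
        % contrastive-divergence parameter updates
        W  = W  + lr * (X' * ph - pv' * ph2) / n;
        bh = bh + lr * mean(ph - ph2, 1);
        bv = bv + lr * mean(X - pv, 1);
    end
end
```

The transformed dataset passed to the next RBM is then `sigm(X * W + bh)`, matching the layer-by-layer propagation described above.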

F. Creating a Deep Neural Network
Using the pre-trained weights and hidden biases from a DBN, the DNN model makes it a simple matter to extract the significant constituents from a trained DBN and add the missing ones, producing a DNN ready to train. The DNN constructor takes a DBN and the number of classes in the target dataset as its arguments. These form the weights and biases for each layer of the new DNN, and a final top layer with an n-output softmax is added, where n is the number of classes in the target dataset.
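The paper does not show the constructor itself; the sketch below is one way it could look, under the assumption that the DBN is stored as cell arrays of layer weights and hidden biases (the struct and field names are hypothetical).

```matlab
% Sketch of the DNN constructor described above: copy the pre-trained
% weights and biases, then append a randomly initialized n-class
% softmax top layer. Struct/field names are assumptions.
function dnn = dbn_to_dnn(dbn, nClasses)
    dnn.W = dbn.W;                    % cell array of pre-trained weights
    dnn.b = dbn.bh;                   % cell array of hidden biases
    nTop  = size(dnn.W{end}, 2);      % width of the last hidden layer
    dnn.W{end+1} = 0.01 * randn(nTop, nClasses);  % new softmax layer
    dnn.b{end+1} = zeros(1, nClasses);
end
```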
G. Learning a Deep Neural Network

The backpropagation algorithm is used to train the pre-trained top layer of weights: because the lower layers are initialized from pre-training rather than at random, backpropagation starts with weights already close to ideal for the entire network, instead of producing effectively random outputs. The learning algorithm for a DNN is, however, not as space-efficient as that of the RBM, since the output of every unit in every layer must be retained for backpropagation. Memory usage is not an issue with smaller batch sizes, and it grows only linearly with batch size. In the current implementation there is no early stopping for the backpropagation training; the number of epochs is specified by the user, along with the other hyper-parameters given in Table 3.

Fig. 4 shows the complete flow diagram of the training and testing process using the DBNN (Deep Belief Neural Network). The input image has 28x28 resolution, 784 pixels in total, and is processed through the three major hidden-layer stages of the DBNN, after which a trained (learned) vector is obtained. In the testing process, the trained network is given a query image; the image is passed through the trained DBNN, which returns the probabilities of all the gestures, and the gesture with the maximum probability is taken as the gesture of the query image.
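A minimal sketch of this query path, assuming `gestureImg` is the binary gesture image captured at run time and `deepnet` is the trained network object (MATLAB network objects can be called directly on an input vector):

```matlab
% Query path: flatten the 28x28 gesture image, push it through the
% trained network, and report the class with the highest probability.
img   = imresize(gestureImg, [28 28]);  % normalize to the 28x28 input size
x     = double(img(:));                 % flatten to a 784x1 vector
probs = deepnet(x);                     % softmax probabilities for every gesture
[~, gestureId] = max(probs);            % most probable gesture is the answer
```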

IV. RESULTS

A. Real-time Gesture Recognition
In this work, a deep belief neural network is used to recognize the gesture drawn in front of the camera in real time. Various preliminary experiments were carried out to accomplish the task; the steps are discussed below.

B. Object Detection Based on Color

Figure 5 illustrates the color object detection output: the top image shows the detected blue object with a circle marked at its center, and the bottom image is the corresponding mask. This tracing is used to mark the gesture drawn on a white background; the white-background gesture image is then treated as one of the samples used to train the network to recognize the gesture.
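A minimal sketch of such a blue-object detector follows, with `cam` opened as in the Section III sketch; the channel-difference threshold and minimum blob size are illustrative values.

```matlab
% Sketch of the blue-object detector used for trajectory marking.
rgb   = snapshot(cam);
diffB = double(rgb(:,:,3)) - max(double(rgb(:,:,1)), double(rgb(:,:,2)));
mask  = diffB > 50;                    % keep strongly blue pixels only
mask  = bwareaopen(mask, 300);         % remove small blue speckles
s     = regionprops(mask, 'Centroid', 'Area');
if ~isempty(s)
    [~, i] = max([s.Area]);            % largest blue blob is the object
    center = s(i).Centroid;            % point to append to the trajectory
end
```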
C. Hand Detection Using Skin Detection

The second experiment uses a skin detection algorithm to detect the hand, removing the dependency on the presence of a specific-color object. In this experiment, shown in figs. 6 and 7, the skin itself is the object to be traced. With the help of filtering methods such as median filtering and MATLAB's bwareaopen function, the noise in the skin-color mask is removed to obtain a smooth detection of the hand and, later, a smooth tracing of the hand movement.

D. Database Creation

Using this database creation method, a database of the 11 characters was created and saved in the respective folders, as shown in fig. 8.

E. Training Using Autoencoders

Each unit of an autoencoder consists of an encoder and a decoder. The input image is of size 28x28, giving a 784-dimensional vector as the training data, as shown on the input side of fig. 10. In the first stage of the neural network the dimensionality is reduced to 100, using the default values of the weights (w) and biases (b). After passing through the second hidden layer, the vectors are further reduced from 100 to 50 dimensions. The features are thus compressed at each successive hidden unit of the deep learning network, reducing the complexity. In these hidden layers, unsupervised training is done through the encoders. The final layer, generally referred to as the softmax layer, is trained to classify the input vectors received from the previous layer into the various character classes. As shown in fig. 10, the 50-dimensional vectors are classified into 10 different classes, which gives the desired output.
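The fig. 10 pipeline maps directly onto MATLAB's stacked-autoencoder workflow. The following sketch assumes the Deep Learning Toolbox autoencoder API; `X` is a 784 x N matrix of flattened 28x28 gesture images, `T` the corresponding one-hot label matrix, and the epoch counts are illustrative.

```matlab
% Sketch of the 784 -> 100 -> 50 -> softmax pipeline of fig. 10.
ae1 = trainAutoencoder(X, 100, 'MaxEpochs', 200);   % 784 -> 100 features
f1  = encode(ae1, X);                               % propagate the dataset
ae2 = trainAutoencoder(f1, 50, 'MaxEpochs', 100);   % 100 -> 50 features
f2  = encode(ae2, f1);
smx = trainSoftmaxLayer(f2, T, 'MaxEpochs', 100);   % 50 -> class probabilities
deepnet = stack(ae1, ae2, smx);                     % assemble the deep network
deepnet = train(deepnet, X, T);                     % supervised fine-tuning
```

The final `train` call performs the backpropagation fine-tuning of the whole stack described in Section III.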
F. Testing of the Gestures

Another GUI was created for testing the gestures. In this GUI, as soon as the start button is pressed, a camera interface opens and a timer starts. The timer is set to 10 seconds, so the user has 10 seconds to draw a gesture and have it recognized. For the test, four gestures drawn within the 10-second window were correctly recognized using the GUI.

In figs. 15 and 16, the hand of a user is detected by the system using the skin detection method, and the characters drawn are correctly recognized, as shown in the figures. A binary image is then created for the hand: as shown in figs. 17, 18 and 19, the skin color is detected, the white region of the binary image corresponds to the user's hand, and the gesture is recognized.
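A sketch of the GUI's 10-second test window is given below; `drawTrajectory` is a hypothetical helper that rasterizes the tracked centroids onto a white background, and classification then follows the query path shown in Section III.

```matlab
% Sketch of the 10-second drawing window used by the test GUI.
tic;                                      % start the 10-second timer
trajectory = zeros(0, 2);
while toc < 10
    % capture a frame, detect the skin/object blob and append its
    % centroid, exactly as in the acquisition loop of Section III
end
gestureImg = drawTrajectory(trajectory);  % binary gesture image (assumed helper)
```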

H. Comparison with Published Results

Table 4 compares the present work, which achieves a recognition rate (R.R.) of 95.5%, with the published results of [13], which report an R.R. of 92.3% for gesture recognition. The difference of approximately 3% in recognition rate validates the present work.

V. CONCLUSION

The conclusions of the present work are listed as follows.
1. A set of 11 characters was considered in this paper, and all 11 were successfully recognized in the real-time implementation.
2. The task was completed using the Deep Belief Neural Network technique, which is fast and gives good results. The DBNN was used to learn the gestures and then recognize them in real time; the system was trained on the manually created database, and the network obtained after training is used to detect the gestures quickly and accurately.
3. The process is fully automated: the user only has to wave a hand in front of the image acquisition device. The recognition rate of the system is 95.5 percent. In terms of execution time, for one-character recognition the system gives the user 10 s to wave the hand in front of the camera and draw a proper character, and it returns the recognized character within 1.12 s after the character is drawn.
4. Comparing the recognition rate of 95.5% for the present work with the already published technique's 92.3%, an improvement of approximately 3% is achieved.
5. The system can be used to run and control any robotic mechanism via tracking of the user's hand gestures. Furthermore, various processes could be automated by interfacing the system with the respective process to be controlled; for example, recognition of the character "A" could command a mechanism to move in a specific direction.