Review: Support Vector Machines in Pattern Recognition

Abstract: SVM is extensively used in pattern recognition because of its ability to classify unseen data and its good generalization performance. Several algorithms and models that use SVM for classification have been proposed for pattern recognition, and these models have demonstrated the efficiency of SVM in this field. Researchers have compared their SVM results with those of traditional empirical risk minimization techniques, such as Artificial Neural Networks and Decision Trees. In the studies surveyed, SVM generally outperformed these techniques. Different variants of SVM have also been developed to enhance performance. In this paper, SVM is briefly introduced, and some of its pattern recognition applications are surveyed and summarized.

Each data point $x_i$ belongs to one of two classes, labelled $y_i \in \{-1, +1\}$. The goal is to define a hyperplane that maximizes the distance between the two class boundaries and divides all the input data into two classes, with all points of one class falling on one side of the hyperplane and all points of the other class falling on the other side.

A. Linearly Separable Case
In this case, there exists at least one hyperplane that linearly separates the two classes, so that the data points of each class fall on their own side of it. The goal is to find the hyperplane that maximizes the distance between the two class boundaries. The data points of each class closest to the hyperplane are called Support Vectors. Fig. 1 depicts the linear SVM for the separable case. Let us define,
$$ w \cdot x_i + b \ge +1 \quad \text{for } y_i = +1, $$
$$ w \cdot x_i + b \le -1 \quad \text{for } y_i = -1. $$
The above set of equations can be generalized as,
$$ y_i (w \cdot x_i + b) \ge 1, \quad \forall i. \tag{1} $$
The equation for a separating hyperplane $(w, b)$ can be derived as
$$ w \cdot x + b = 0. $$
The distance between the two marginal hyperplanes is
$$ \frac{2}{\|w\|}. $$
Hence, the optimum separating hyperplane can be considered as the solution to the problem of maximizing $1/\|w\|$ subject to constraint (1), which, by mathematical convention, can be formalized as,
$$ \min_{w,\, b} \ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1, \quad \forall i. \tag{3} $$
Instead of solving (3) in primal form, one can solve its dual form by using Lagrange multipliers [16] and applying the Karush-Kuhn-Tucker (KKT) conditions.
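The quantities in this subsection can be checked numerically on a small example. The following is a minimal sketch rather than a training procedure: the toy data points and the candidate hyperplane `(w, b)` are hypothetical values chosen for illustration, and the hyperplane is assumed to be given instead of being obtained from the dual problem. The sketch verifies that every point satisfies $y_i (w \cdot x_i + b) \ge 1$, computes the distance $2/\|w\|$ between the marginal hyperplanes, and lists the points that lie exactly on a marginal hyperplane (the support vectors).

```python
import math

# Toy 2-D training set (hypothetical values chosen for illustration).
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
y = [+1, +1, -1, -1]

# Candidate hyperplane (w, b); assumed to be given, not learned here.
w = (0.5, 0.5)
b = -1.0


def decision(x):
    """Signed score w . x + b of a point x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# y_i (w . x_i + b) for every training point.
functional_margins = [yi * decision(xi) for xi, yi in zip(X, y)]

# Separability constraint: every functional margin must be at least 1.
satisfies_constraint = all(m >= 1.0 - 1e-12 for m in functional_margins)

# Distance between the two marginal hyperplanes w . x + b = +1 and -1.
margin_width = 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Support vectors: points whose functional margin equals 1.
support_vectors = [xi for xi, m in zip(X, functional_margins) if abs(m - 1.0) < 1e-9]

print(satisfies_constraint, margin_width, support_vectors)
```

In this example only the points (2, 2) and (0, 0) lie on the marginal hyperplanes; moving or removing the other two points (while keeping them outside the margin) would leave the hyperplane unchanged.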

B. Linearly Non-Separable Case
In real-life data, however, the data points are not linearly separable in most cases. A maximal margin hyperplane that minimizes the misclassification error can still be determined by introducing a positive slack variable $\xi_i$ in constraint (3).
The variable $\xi_i$ exceeds unity in case of an error, so $\sum_i \xi_i$ is an upper bound on the total number of misclassification errors.
Hence, the objective function in (3) becomes,
$$ \min_{w,\, b,\, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_i \xi_i \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad \forall i. \tag{4} $$
Here, $C$ is a parameter chosen by the user; a larger $C$ assigns a higher penalty to errors. Solving problem (4) gives the Generalized Optimum Separating Hyperplane. Fig. 2 illustrates SVM for the linearly non-separable case.
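The role of the slack variables can be made concrete with a short calculation. The sketch below assumes a fixed hyperplane `(w, b)`, a penalty `C`, and four labelled points, all hypothetical values chosen for illustration. For a fixed hyperplane, the smallest slack that satisfies the relaxed constraint is $\max(0,\ 1 - y_i (w \cdot x_i + b))$, which is what the helper `slack` computes. The example shows one point inside the margin (slack between 0 and 1) and one misclassified point (slack above 1).

```python
# Hypothetical hyperplane and penalty parameter, chosen for illustration only.
w = (1.0, 0.0)
b = 0.0
C = 10.0

# (x_i, y_i) pairs: correctly classified, inside the margin, misclassified,
# and correctly classified again.
data = [
    ((2.0, 0.0), +1),
    ((0.5, 1.0), +1),
    ((1.5, -1.0), -1),
    ((-3.0, 0.0), -1),
]


def slack(x, label):
    """Smallest slack satisfying label * (w . x + b) >= 1 - slack, slack >= 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - label * score)


slacks = [slack(x, label) for x, label in data]

# Soft-margin objective: (1/2) ||w||^2 + C * (sum of slacks).
objective = 0.5 * sum(wi * wi for wi in w) + C * sum(slacks)

# A slack above 1 means the point is on the wrong side of the hyperplane.
misclassified = [s > 1.0 for s in slacks]

print(slacks, objective, misclassified)
```

Here a single point is misclassified while the slacks sum to 3.0, which is consistent with the sum of slacks being an upper bound on the number of training errors.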

C. Non Linear Kernels
In most practical cases, linear separation of the input data points is too restrictive. In those cases, the input space is mapped to a higher dimensional feature space using a kernel function, where the feature vectors can be linearly separated by a hyperplane.
In order to accomplish separability, the input data are non-linearly mapped into a higher dimensional feature space, such as $\Phi: \mathbb{R}^n \to \mathbb{R}^P$, where $n < P$. Fig. 3 depicts the mapping of an input data sample into feature space by the $\Phi$ transformation. The training is then performed on the data obtained from the dot product $\Phi(x_i) \cdot \Phi(x_j)$. However, $\Phi$ is not known a priori, and computing the dot product of the mapping functions is very expensive and complex. Using Mercer's theorem [1] for positive definite functions,
$$ K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j). $$
As can be seen, it is not necessary to know the mapping function $\Phi$ in order to calculate the dot products in feature space. Knowing only the input data and the kernel function is enough to carry out the training. Some popular kernel functions are stated below [3].
1. Polynomial kernel,
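The claim that the mapping function need not be known explicitly can be checked on a small case. The sketch below is an illustration under stated assumptions: it uses the homogeneous degree-2 polynomial kernel $(x \cdot z)^2$ on two-dimensional inputs, which is one common form and not necessarily the parameterization used in [3], and the names `poly_kernel` and `phi` are hypothetical. For this kernel an explicit feature map into three dimensions is known, so the kernel value can be compared with the dot product taken in feature space.

```python
import math


def poly_kernel(x, z, degree=2):
    """Homogeneous polynomial kernel (x . z)^degree (assumed form)."""
    return sum(a * b for a, b in zip(x, z)) ** degree


def phi(x):
    """Explicit degree-2 feature map for 2-D input: R^2 -> R^3."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


x = (1.0, 2.0)
z = (3.0, -1.0)

# Kernel evaluated directly in the 2-D input space.
kernel_value = poly_kernel(x, z)

# Same quantity computed as a dot product in the 3-D feature space.
explicit_value = sum(a * b for a, b in zip(phi(x), phi(z)))

print(kernel_value, explicit_value)
```

Both computations give the same value up to floating-point rounding, but the kernel needs only one dot product in the input space, whereas the explicit route grows with the dimension of the feature space.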

III. APPLICATIONS OF SUPPORT VECTOR MACHINE
SVMs are extensively used for pattern recognition. Researchers have proposed and developed many methods and techniques to solve pattern recognition problems using SVM. In this section, some existing methods of pattern classification are roughly categorized based on their purpose.

A. Object Detection And Recognition
Object detection deals with detecting objects of a certain class (e.g., animals, trees, vehicles) in an image or a video. SVM comes in handy for the automatic detection of objects of such classes when trained with proper training data. One such method was proposed and developed by Nakajima et al. [19], who defined a multi-class classification problem for people recognition and pose estimation. The authors used pair-wise and DDAG multi-class SVMs with a linear kernel. They experimented on 640 images: 40 images for each of four poses of each of four selected people. For people and pose recognition, two features (colour histogram and local shape) were selected and tested. Results showed that the local shape feature outperformed the colour histogram feature.
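The two multi-class schemes named above both combine binary classifiers trained on pairs of classes, and differ only in how the pairwise decisions are combined at test time. The sketch below illustrates that difference and is not the implementation of [19]: the class prototypes in `CENTROIDS` are hypothetical, and `pairwise_decide` is a stand-in that picks the nearer prototype (a linear decision rule) where a real system would call a trained binary SVM.

```python
from itertools import combinations

# Hypothetical class prototypes; a real system would train one SVM per class pair.
CENTROIDS = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (0.0, 4.0), "D": (4.0, 4.0)}


def pairwise_decide(x, a, b):
    """Stand-in for a trained binary SVM separating class a from class b.

    Picks the nearer prototype, which is a linear rule but NOT a trained SVM.
    """
    def sq_dist(c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, CENTROIDS[c]))

    return a if sq_dist(a) <= sq_dist(b) else b


def predict_by_voting(x, classes):
    """Pair-wise scheme: every class pair votes; the most-voted class wins."""
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[pairwise_decide(x, a, b)] += 1
    return max(classes, key=lambda c: votes[c])


def predict_by_ddag(x, classes):
    """DDAG scheme: each node compares the first and last remaining class
    and eliminates the loser, so k classes need only k - 1 evaluations."""
    remaining = list(classes)
    while len(remaining) > 1:
        first, last = remaining[0], remaining[-1]
        if pairwise_decide(x, first, last) == first:
            remaining.pop()      # 'last' is eliminated
        else:
            remaining.pop(0)     # 'first' is eliminated
    return remaining[0]


classes = ["A", "B", "C", "D"]
sample = (3.0, 0.5)
print(predict_by_voting(sample, classes), predict_by_ddag(sample, classes))
```

With four classes, voting evaluates all six pairwise classifiers, while the DDAG evaluates three; both schemes reuse the same set of trained pairwise classifiers.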
Roobaert and Van Hulle [22] developed a model for recognizing 3D objects using SVMs. Their experiment was based on the COIL object database, which contains 7200 images of 100 objects, with 72 different views of each object. They reported results for different numbers of training views and found that the performance of the model decreases when fewer than 18 training views are used. Pontil and Verri's [20] work is similar to [22]. They also used the COIL object database and applied linear SVMs to aspect-based 3D object recognition from a single view. Unlike [22], their experiment was performed without feature extraction, data reduction or pose estimation, and the test images contained noise, occlusion and pixel shifts. Nevertheless, their results showed very good performance.
Pittore et al. [21] proposed a system to detect the presence of moving people. They represented each event using an SVM for regression and recognized the trajectory of visual dynamic events from an image sequence using an SVM classifier. The authors named their system VIDERE (VIsual Dynamic Event REcognition). Gao et al. [6] proposed an SVM-based algorithm that distinguishes moving vehicles from shadows using a shadow and headlight elimination technique. They reduced the problem to a simple two-class classification scenario.

B. Handwritten Character Recognition
In handwritten character recognition, a computer receives input data from sources such as paper documents, photographs, touch-screens and other devices, and interprets them as characters, letters or words of a language. SVM has been found to be very effective compared to other learning algorithms in recognizing handwritten characters. A major problem in handwritten character recognition is the huge variability and distortion of the patterns. Choisy and Belaid [5] proposed a model to recognize the words on French bank cheques. The authors used an NSHP-HMM for the local view and an SVM for the global view.
Gorgevik et al. [7] used SVMs for handwritten digit recognition. They tested their model with a single SVM classifier and with two different SVM classifiers whose results were combined using rule-based reasoning. Their experimental results show that the single SVM classifier is more efficient than the rule-based combination in recognizing handwritten digits. Teow et al. [23] proposed a digit recognition system that uses a linear SVM classifier on extracted features that are biologically plausible, linearly separable and semantically clear.

C. Face Recognition
Face recognition is a well-established field of research that deals with identifying or verifying a person from a digital image or a frame from a video source. Many high-performance face recognition systems using SVM have been developed to date. Among them, Guo et al. [8] proposed a multi-class SVM classifier for face recognition. They also compared the results of their model with Nearest Center (NC), Hidden Markov Model (HMM), Convolutional Neural Network (CNN), and Nearest Feature Line (NFL) classifiers. The input dataset was first normalized using PCA, and the normalized data were then input to the SVM classifier. The SVM classifier outperformed the others, with an error rate of 3% on the ORL face database. In another model, Kim et al. [15] explored the spatial relationship among potential eye, nose and mouth objects for face recognition using a modified SVM local correlation kernel. They compared their proposed kernel with existing kernels. It performed better than the others, with an error rate of 2% when tested on the ORL database.
A component-based method was proposed by Heisele et al. [12], and its performance was compared with that of two global methods for face recognition using one-to-others SVMs. Huang et al. [13] proposed a model that generated a large number of synthetic face images for training the system. The 3D models were rendered to train the system under various poses and illumination conditions. In their component-based system, a single feature vector was formed by extracting and combining the facial components, and this vector was then classified by the SVMs. The comparison showed that the component-based method performed better than the two global methods.

D. Speaker Recognition And Speech Recognition
Discriminative classifiers and generative model classifiers are the two most popular techniques in speaker and speech recognition. SVMs belong to the discriminative classifiers. SVMs were used by Bengio and Mariethoz [2] for speaker verification. They performed their experiments on different datasets. Instead of the classical thresholding rule, an SVM decides whether to accept or reject a speaker. Wan and Campbell [24] proposed a new technique in which they normalized the traditional polynomial kernel and used it with SVMs for speech recognition.
Some researchers applied SVM to visual speech recognition [9], [10]. Visual speech recognition recognizes speech by reading the lips of the speaker. In visual speech recognition, a particular sound uttered by the speaker is described by a generic facial image, called a viseme. Each viseme is described by an SVM. The SVMs were used as nodes in the Viterbi algorithm to model the temporal character of speech. Performance was evaluated by experimenting on the audio-visual dataset Tulip 1, on the task of recognizing the first four digits in English [9], [10].

E. Some Other Applications
There are many other applications of SVM in pattern recognition problems. Nonlinear SVM was used by Moghaddam and Yang [17], [25] for gender classification. They used the FERET face dataset for their experiment. The training set of the FERET dataset contains 1496 images (793 males and 713 females) and the test set contains 259 images (133 males and 126 females). They used a five-fold cross-validation technique for training and testing each classifier on the face images. SVM outperformed some existing traditional classifiers, with an error rate of 3.4%.
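Five-fold cross-validation, as mentioned above, partitions the data into five disjoint folds and uses each fold once for testing while the remaining four are used for training. The following is a minimal sketch of the index bookkeeping only, under stated assumptions: the function name `k_fold_indices` and the sample count of 12 are hypothetical, no shuffling or class stratification is performed, and the exact protocol of [17], [25] is not reproduced here.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    indices = list(range(n_samples))
    # Spread the remainder over the first (n_samples % k) folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(12, k=5))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

In a full experiment, a classifier would be trained on `train_idx` and evaluated on `test_idx` in each round, and the reported error rate would be the average over the five rounds.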
Gutta et al. [11] used SVM to classify face poses. The FERET database was used for their experiment, and the experimental results showed 100% accuracy. Huang et al. [14] also performed face pose detection using SVM, classifying face poses into three categories. Yao et al. [26] used a multi-class SVM for fingerprint classification. A combination of flat and structured representations of the fingerprint features was used to train the SVMs, and the results showed good performance of the model. SVMs are also widely used in detecting intrusions into a network or a host computer. In a model proposed by S. Mukkamala [18], SVM light was used to design an intrusion detection system. In a comparison of SVM with ANN, SVM was found to be more efficient than ANN in terms of training time and detection time. The experiment was performed on the benchmark DARPA dataset, and the authors claimed above 99% accuracy for both systems.

IV. CONCLUSION
In this paper, we have briefly presented SVM and discussed some of its applications in pattern recognition problems. Because of their excellent generalization performance, SVMs are extensively used to solve various pattern recognition problems, and some of these applications are presented in this paper. Initially, SVM was applied to two-class classification problems. To obtain more specific results, SVMs were later extended to multi-class classification problems as well. In some cases, different variants of the original SVM are also applied to yield better performance.
To demonstrate the superiority of SVM in pattern recognition problems, some authors have compared the performance of SVM with that of traditional empirical risk minimization techniques. In those cases, SVM outperformed the other techniques in terms of training and testing time and classification accuracy, owing to its structural risk minimization principle. Some researchers have experimented with different SVM kernels and compared their performance. With proper selection of the kernel, balanced training data and relevant feature selection, SVMs yield excellent results. SVMs are also widely used in other classification problems and have been shown to deliver excellent performance. This is an active field of research, and many dimensions are yet to be explored. Although SVM performs excellently on classification problems, it can still be improved by better techniques for finding the optimum parameters. Future work could also propose online-learning-based algorithms for real-life applications.