Indian Sign Language Recognition System

Abstract: Hearing people can easily interact and communicate with one another, but people with hearing and speech disabilities face difficulty communicating with hearing people without a translator. Sign language thus becomes a barrier to communication between them and hearing people. People with hearing and speech disabilities depend heavily on non-verbal communication that involves hand gestures. A system that recognizes sign language would therefore be of significant benefit to them. In this paper, a method is proposed for the automatic recognition of finger spelling in Indian Sign Language. The sign, in the form of a gesture image, is given as input to the system, and several processing steps are then applied to it. First, a segmentation phase based on skin colour detects the shape of the sign. The detected region is then transformed into a binary image. Next, the Euclidean distance transform is applied to the binary image, and row and column projection is applied to the distance-transformed image. For feature extraction, central moments along with Hu's moments are used. For classification, a neural network and SVM are used.


I. INTRODUCTION
Sign language is widely used by people with hearing and speech disabilities as a medium of communication. A sign language is composed of gestures formed by different hand shapes, movements and orientations, as well as facial expressions. People with hearing and speech disabilities use these gestures to express their thoughts. They face a communication barrier when interacting with hearing people in public places such as banks, hospitals and post offices. Sometimes a deaf person needs the help of a sign language interpreter to translate between them and hearing people. However, this is costly, and an interpreter cannot be available throughout a deaf person's life. A system that automatically recognizes sign language gestures is therefore needed. Such a system would narrow the gap between deaf and hearing people in society. The sign language in use at a particular place depends on the culture and spoken language of that place. Indian Sign Language (ISL) is used by the deaf community in India. ISL is a standard, well-developed means of communication for hearing-impaired people in India, with finger spelling based on the English alphabet. Different signs are used for the different letters of the alphabet in ISL. It consists of both word-level gestures and finger spelling. This paper presents a method for the automatic recognition of static gestures in the ISL alphabet. The signs considered for recognition cover 17 letters of the English alphabet.
In the proposed approach, the main focus is on the real-time classification and recognition of Indian Sign Language signs given by the user. Thus, the speed and simplicity of the algorithm are important. The approach segments the hand based on skin colour statistics, converts the segmented image into a binary image, and applies feature extraction to the binary image. The feature extraction techniques used are the distance transform, the Discrete Fourier Transform, and central moments, which are a property of probability distributions. Fig. 1 shows the gestures used for all alphabets in the Indian Sign Language.

Sign language recognition is an important application of gesture recognition. There are two different approaches to it [6]:
1) Glove based approach: The signer is required to wear a sensor glove or a coloured glove. The glove simplifies the segmentation phase. The limitation of this approach is that the signer must wear the sensor hardware, including the glove, during the entire operation.
2) Vision based approach: Image processing algorithms are used to detect and track the hand signs as well as the signer's facial expressions. This approach is simpler for the signer, since no additional hardware needs to be worn. The proposed system uses the vision based approach.

II. RELATED WORK
Adithya V, Vinod P. R and Usha Gopalakrishnan [1] presented Artificial Neural Network Based Method for Indian Sign Language Recognition. For segmentation, the RGB colour space is transformed into the YCbCr colour space, and skin-coloured pixels in the input image are identified by thresholding based on the distribution of skin colour in the YCbCr colour space. Segmentation produces a binary image in which skin pixels are white and the background is black. For feature extraction, a distance transform is computed, row and column projections are applied to the distance-transformed image, and Fourier descriptors are computed from the row and column projections. Central moments are also calculated.
Anchal Sood and Anju Mishra [7] presented AAWAAZ: A Communication System for Deaf and Dumb. For segmentation they used a Hue-Saturation-Value (HSV) histogram. For feature extraction the Harris algorithm is used. For feature matching and recognition, the features of the standard images are extracted in advance and stored in the dataset as N×2 matrices in a .mat file. The feature matrix of the query image is matched against that of every image in the dataset, and the minimum distance between the matched features gives the desired result.
Shreyashi Narayan Sawant and M. S. Kumbhar [8] presented Real Time Sign Language Recognition using PCA. For data acquisition, 260 images are used, 10 for each of 26 signs. Segmentation uses Otsu's method. Noise is removed from the images with morphological filtering to obtain the contour. The main feature used is the principal component. In the recognition phase, the subject gesture is normalized with respect to the average gesture and then projected onto the gesture space using the eigenvector matrix. Finally, the Euclidean distance is calculated between this projection and all the known projections obtained during the training phase, and the one at minimum distance is chosen as the recognized sign. The recognized sign is converted to the appropriate text and voice.
Suriya M, Sathyapriya N, Srinithi M and Yesodha V [9] presented Survey on Real Time Sign Language Recognition System: An LDA Approach. Segmentation uses Otsu's method. The main feature used is the principal component. A KNN classifier is used for classification, and similarity measures such as Euclidean distance, city block metric, cosine similarity and correlation are used to evaluate the performance of the classifiers.
Madhuri Sharma, Ranjna Pal and Ashok Kumar Sahoo [10] presented Indian Sign Language Recognition Using Neural Networks and KNN Classifiers. They use the first-derivative Sobel edge detector, which computes the gradient from the discrete differences between the rows and columns of a 3×3 neighbourhood. The feature extraction techniques used are direct pixel values and hierarchical centroids. Two classifiers are used: K-Nearest Neighbour (KNN) and the neural network pattern recognition tool.

III. PROPOSED APPROACH
A. Image Acquisition
Image acquisition is the operation of capturing images of the hand gestures that represent the different signs. In this system, a publicly available dataset is used for training and testing. The dataset contains 17 different signs. Every image in the dataset has a resolution of 320×240; a single, fixed resolution lowers the computational effort required for processing. The signs used in the system are A, B, D, E, F, G, H, J, K, O, P, Q, S, T, X, Y, Z [13].

B. Hand Object Detection
1) Hand Segmentation:
For skin detection, an adaptive probabilistic model is used. In this model, manually annotated skin and background images are used to create 32×32×32 RGB colour histograms of skin and background appearance; these histograms are normalized and used as probabilistic models of the skin and background [11]-[12]. First, the intensity levels of the RGB input image are adjusted. The adjusted image is then passed to the skin detection model, which detects the skin region and converts it into a binary image. The skin value is determined and normalized, and the skin area is detected from it.
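The histogram model described above can be sketched roughly as follows. This is a minimal illustration and not the implementation of [11]-[12]: it assumes 8-bit RGB pixels, stores the 32×32×32 histograms sparsely as dictionaries, and labels a pixel as skin when its skin likelihood exceeds its background likelihood. The names `bin_index`, `build_histogram`, `segment` and the `threshold` parameter are hypothetical, and the adaptive part of the model and the intensity adjustment are omitted.

```python
from collections import Counter

BINS = 32                # bins per colour channel, as stated in the text
STEP = 256 // BINS       # 8 intensity levels fall into each bin


def bin_index(pixel):
    """Map an (R, G, B) pixel with 0-255 channels to its histogram bin."""
    r, g, b = pixel
    return (r // STEP, g // STEP, b // STEP)


def build_histogram(pixels):
    """Return a normalized 32x32x32 colour histogram as a sparse dict."""
    counts = Counter(bin_index(p) for p in pixels)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def segment(image, skin_hist, bg_hist, threshold=1.0):
    """Binary mask: 1 where P(rgb | skin) > threshold * P(rgb | background).

    `threshold` is an assumed tuning parameter, not a value from the paper.
    Colours seen in neither histogram are labelled background.
    """
    mask = []
    for row in image:
        mask_row = []
        for pixel in row:
            key = bin_index(pixel)
            p_skin = skin_hist.get(key, 0.0)
            p_bg = bg_hist.get(key, 0.0)
            mask_row.append(1 if p_skin > threshold * p_bg else 0)
        mask.append(mask_row)
    return mask
```

Here `skin_hist` and `bg_hist` would be built once from the annotated training pixels, and `segment` would then be applied to each input image, represented as a list of rows of (R, G, B) tuples.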
2) Filter and Noise Removal: The resulting binary image may contain noise and segmentation errors. Filtering and morphological operations are performed on the binary image to reduce them. An image morphology algorithm that performs erosion and dilation is used to eliminate the noise.
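As an illustration of erosion followed by dilation (a morphological opening), the sketch below works on a binary mask stored as a list of rows of 0/1 values. The 3×3 square structuring element and the treatment of pixels outside the image as background are assumptions; the paper does not state which structuring element or operation order is used.

```python
OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def _neighbours(mask, y, x):
    """Yield the 3x3 neighbourhood of (y, x); outside the image counts as 0."""
    h, w = len(mask), len(mask[0])
    for dy, dx in OFFSETS:
        ny, nx = y + dy, x + dx
        yield mask[ny][nx] if 0 <= ny < h and 0 <= nx < w else 0


def erode(mask):
    """A pixel stays 1 only if every pixel in its 3x3 neighbourhood is 1."""
    return [[int(all(_neighbours(mask, y, x))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def dilate(mask):
    """A pixel becomes 1 if any pixel in its 3x3 neighbourhood is 1."""
    return [[int(any(_neighbours(mask, y, x))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def opening(mask):
    """Erosion followed by dilation: removes specks smaller than 3x3."""
    return dilate(erode(mask))
```

Opening removes isolated noise pixels while restoring the size of larger regions; applying dilation before erosion (a closing) would instead fill small holes inside the hand region.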
3) Feature Extraction: After segmentation and pre-processing, a binary image is obtained that contains the shape of the hand representing a particular sign. To classify this image, certain features are extracted from it. Shape is an important visual feature of an object, so a shape feature is used in this work. The proposed shape feature is derived from the distance transform of the binary image [1].
Distance Transformation: The distance transform is a derived representation of an image that is normally applied to binary images [1], so the image must first be converted to binary form. A binary image contains object pixels and non-object (background) pixels. Applying the distance transform to such an image gives another image of the same size in which each pixel value is replaced by the distance from that pixel to its nearest background pixel. The result is a gray-scale image in which the intensity of the foreground region corresponds to the distance from the closest boundary pixel. In the proposed work, the distance transform is computed using the Euclidean distance.
Row and column projection: Two vectors are computed from the distance-transformed image.
-The first is the row vector, say R, in which each element is the sum of the non-zero pixel values of the corresponding row of the distance-transformed image.
-The second is the column vector, say C, in which each element is the sum of the non-zero pixel values of the corresponding column of the distance-transformed image.
-Together these give the 1-D row projection and column projection vectors, which represent the shape of the hand in the input image. These vectors are taken as the shape descriptor. They represent the shape of the hand locally but are sensitive to noise, so they have to be processed further to make them robust.
Central moments: The higher-order central moments are related only to the shape and spread of a probability distribution, not to its location. For a real-valued random variable X, the k-th moment about the mean, or k-th central moment, is given by $\mu_k = E\left[(X - E[X])^k\right]$, where E denotes the expectation operator. The zeroth central moment $\mu_0$ is one. The first central moment $\mu_1$ is zero. The second central moment $\mu_2$ is the variance, usually denoted $\sigma^2$, where $\sigma$ is the standard deviation of the distribution. The third central moment $\mu_3$ is used to define skewness and the fourth central moment $\mu_4$ is used to define kurtosis.
Hu moments: Hu's invariant moments [2] are calculated from the geometrical moments of the hand region. The first six Hu moments are invariant to translation, scale and rotation. The seventh Hu moment is skew invariant: it changes sign under reflection, which helps to distinguish between mirrored images.
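The distance transform, the projection vectors and the central moments can be sketched together as below. This is a brute-force illustration that is only practical for small images, not the authors' implementation. It assumes the mask contains at least one background pixel, and it assumes that the central moments are taken over the elements of a projection vector treated as equally likely samples, which the text does not spell out. The function names are hypothetical.

```python
import math


def distance_transform(mask):
    """Euclidean distance from each object pixel to the nearest background pixel.

    Background pixels get 0. Brute force, so the cost grows with the square
    of the pixel count. Assumes the mask has at least one background pixel.
    """
    h, w = len(mask), len(mask[0])
    background = [(y, x) for y in range(h) for x in range(w) if not mask[y][x]]
    return [
        [min(math.hypot(y - by, x - bx) for by, bx in background)
         if mask[y][x] else 0.0
         for x in range(w)]
        for y in range(h)
    ]


def projections(dist):
    """Row vector R and column vector C of a distance-transformed image.

    Summing all values equals summing the non-zero values, since
    background pixels contribute 0.
    """
    row_vector = [sum(row) for row in dist]
    col_vector = [sum(col) for col in zip(*dist)]
    return row_vector, col_vector


def central_moment(values, k):
    """k-th central moment, mu_k = E[(X - E[X])^k], over equally weighted values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)
```

A feature vector could then be assembled from, for example, `central_moment(R, k)` and `central_moment(C, k)` for k = 2, 3, 4; which orders the paper actually uses is not stated.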

4) Classification:
The feature vector obtained from the feature extraction step is used as the input to the classifier that recognizes the sign. An artificial neural network is used as the classification tool. Classification involves two phases: a training phase and a testing phase.
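The two phases can be illustrated with a deliberately simplified stand-in for the classifier: a single-layer softmax network trained by batch gradient descent. The paper does not describe its network architecture, so the layer structure, the learning rate, the epoch count and the function names below are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, biases, x):
    """Class probabilities for feature vector x."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return softmax(scores)


def train(samples, labels, n_classes, epochs=500, lr=0.5):
    """Training phase: minimize cross-entropy with batch gradient descent."""
    n_features = len(samples[0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    biases = [0.0] * n_classes
    n = len(samples)
    for _ in range(epochs):
        grad_w = [[0.0] * n_features for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, y in zip(samples, labels):
            probs = forward(weights, biases, x)
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == y else 0.0)
                grad_b[c] += err
                for j in range(n_features):
                    grad_w[c][j] += err * x[j]
        for c in range(n_classes):
            biases[c] -= lr * grad_b[c] / n
            for j in range(n_features):
                weights[c][j] -= lr * grad_w[c][j] / n
    return weights, biases


def predict(model, x):
    """Testing phase: index of the most probable class for x."""
    weights, biases = model
    probs = forward(weights, biases, x)
    return probs.index(max(probs))
```

In this sketch each sample would be the feature vector of one training image and each label the index of its sign; `predict` would then be called on the feature vectors of the held-out test images.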

IV. IMPLEMENTATION DETAILS
A. Implementation Platform Details
The hardware and software specifications of the platform on which the proposed approach was implemented and tested are given below:

B. Dataset Details
The dataset is acquired from the internet, and all of its images have a black background. Each input image contains only the sign language gesture; no other skin area is present. A total of 848 images of dimension 320×240 are used. See Fig. 2.