OpenCV Based Disease Identification of Mango Leaves

Abstract: This paper aims at classifying and identifying diseases of mango leaves for Indian agriculture. The K-means algorithm is chosen for disease segmentation, and disease classification and identification are carried out using an SVM classifier. Disease identification based on the analysis of patches or discoloration of the leaf works well for some plant diseases, but diseases that deform the leaf shape cannot be identified by the same method. For these, leaf-shape-based disease identification has to be performed. Based on this analysis, two topics are addressed in this paper: disease identification from leaf discoloration and disease identification from leaf shape.


I. INTRODUCTION
Mango (Mangifera indica) is one of the most important fruit crops cultivated in India. It is exported to many countries as raw or ripe fruit and as processed products such as ripe mango slices, juice, and raw mango pickle. Mango is rich in vitamins A and C and has medicinal value in traditional Indian Ayurvedic medicine, and mango leaves are widely used in rituals because they show antibacterial activity against gram-positive bacteria. In recent times the export value of Indian mango has declined due to the uncontrolled use of pesticides, so it is the right time for researchers to develop methods for early identification of diseases and to limit the use of dangerous pesticides that threaten human health. Common diseases of mango include gall midge infestation, black mildew, hopper attack, mango malformation disease, pulp weevil, stem miner, anthracnose, and alternaria leaf spot. These diseases are caused by insects and by bacterial, fungal, and viral infections, and they reduce crop yield by affecting the leaves, flowers, fruits, and stem. Infection of the leaves blocks photosynthesis and, in due course, causes the plant to die. Identification of diseases or deficiencies is usually carried out by farmers through frequent monitoring of the leaves, flowers, fruits, or stem. Small-scale farmers can usually identify a disease early and control the insects with organic pesticides or with a minimal amount of chemical pesticides. For large-scale farmers, frequent monitoring and early identification are not practical, which results in severe outbreaks of disease and pest growth that cannot be controlled by organic means. In this situation farmers are forced to use poisonous chemicals to eradicate the disease and retain the crop yield.
This problem can be addressed by automating the monitoring process using advanced image processing techniques. The proposed work aims to make the automated system readily available to farmers on low-cost devices and boards such as Android devices and the Raspberry Pi. The steps involved in disease detection are digital image acquisition, image pre-processing (noise removal, color transformation, and histogram equalization), k-means segmentation, feature extraction, and classification using the support vector machine (SVM), which is a supervised learning algorithm. These steps hold good for disease identification based on leaf discoloration. For disease identification based on leaf deformation or leaf shape, a separate learning procedure for leaf deformation analysis is needed. In this case the image pre-processing step is slightly different: the pre-processed image is converted to a binary image for leaf shape identification. After morphological feature extraction, principal component analysis (PCA) is applied to extract the relevant features. Classification is again done using the SVM classification algorithm. During learning, the disease name and the corresponding feature vectors are added to the Matlab or OpenCV database. Different segmentation and classification algorithms were studied for disease clustering and identification. K-means is a simple and robust segmentation algorithm that suits low-cost development; it is an unsupervised learning method for clustering problems. Disease classification is achieved with an SVM classifier because it is simpler to implement in both the Matlab and OpenCV libraries than back-propagation neural networks. Leaf-shape-based disease identification is carried out using both spatial-domain and frequency-domain image processing.
In spatial-domain image processing, basic geometric features of the leaf such as diameter, area, perimeter, physiological length, and physiological width are identified. Based on these geometric features, digital morphological features such as rectangularity and circularity are calculated to identify the diseased leaf. The frequency-domain analysis of the diseased leaf is done using the elliptic Fourier descriptor (EFD), also called elliptic Fourier analysis (EFA) [6]. The chain code of the leaf shape contour is extracted and the Fourier transform is applied to the extracted chain code. The number of harmonics used for the analysis is user definable, and each harmonic has four coefficients known as Fourier descriptors. More harmonics give a more precise description, but usually ten harmonics are required to identify the leaf shape or disease accurately. This method is used only in the Matlab-based implementation. For the Android-based application, the morphological feature extraction technique is used because no ready-made elliptic Fourier descriptor routine is available in the OpenCV libraries. The algorithm is implemented using both Matlab and the OpenCV libraries, and the results are compared. The OpenCV libraries are open source, and an OpenCV implementation can run on Android devices and on other low-cost open hardware development boards such as the Raspberry Pi and BeagleBone. The OpenCV implementation is therefore fine-tuned so that its results are comparable to those of the Matlab implementation and it can be made easily available to farmers.
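The SVM classification step named above can be illustrated with a minimal sketch. It is not the implementation used in this work: it is a two-class linear SVM trained by batch sub-gradient descent on the hinge loss, written in plain Python so that it is self-contained. The function names, the hyper-parameters (`lam`, `lr`, `epochs`), and the toy feature vectors are all assumptions made for illustration. In practice a library SVM, such as the one in the OpenCV `ml` module, would be used, and several diseases would need a multi-class scheme such as one-vs-rest.

```python
def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=200):
    """Train a two-class linear SVM (labels +1 / -1) by sub-gradient descent."""
    d, n = len(samples[0]), len(samples)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the L2 regulariser
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Made-up, already centred 2-D feature vectors: +1 = healthy, -1 = diseased.
features = [(2, 2), (3, 2), (2, 3), (-2, -2), (-3, -2), (-2, -3)]
classes = [1, 1, 1, -1, -1, -1]
weights, bias = train_linear_svm(features, classes)
print([predict(weights, bias, x) for x in features])
```

The feature vectors passed to such a classifier would be the hue-based or shape-based features described in the following sections.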

II. STEPS IN DISEASE IDENTIFICATION
A. Image acquisition
The diseased leaf image is acquired using a digital camera interfaced to the Raspberry Pi hardware or using a smartphone camera. For both learning and classification, the image is acquired from a uniform distance with sufficient lighting. The image background should provide proper contrast to the leaf color. The mango leaf disease dataset was prepared with both black and white backgrounds. In a comparative study the black background gave better results, and hence it is used for disease identification of mango leaves.
B. Image pre-processing
The image acquired with the digital camera is pre-processed by noise removal with an averaging filter, color transformation, and histogram equalization. The color transformation step converts the RGB image to the HSI (hue, saturation, and intensity) representation, as this color space is based on human perception. Hue refers to the dominant color attribute as perceived by a human observer. Saturation refers to the amount of white light added to the hue. Intensity refers to the amplitude of the light. After the RGB to HSI conversion, only the hue component of the image is considered for analysis, as it carries the required information. The S and I components are ignored, as they do not give any significant information [2].
Masking green pixels: Since most green pixels belong to the healthy part of the leaf and add no value to disease identification, the green pixels of the leaf are removed by masking, which significantly reduces processing time [2]. The masking is achieved by computing the intensity value of the green pixels; if the intensity is less than a predefined threshold value, the RGB components of that pixel are set to zero.
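The RGB to HSI color transformation in subsection B can be sketched per pixel as follows. This is a minimal illustration that assumes the standard geometric HSI formulas and 8-bit RGB input; the function name `rgb_to_hsi` and the convention of reporting hue as 0 for grey pixels are assumptions, not taken from the original implementation.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert one 8-bit RGB pixel to (hue in degrees, saturation 0..1, intensity 0..255)."""
    total = r + g + b
    intensity = total / 3.0
    saturation = 0.0 if total == 0 else 1.0 - 3.0 * min(r, g, b) / total
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        hue = 0.0  # grey pixel: hue is undefined, reported as 0 by convention
    else:
        # Clamp to [-1, 1] to guard against rounding before taking acos.
        theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        hue = 360.0 - theta if b > g else theta
    return hue, saturation, intensity


# Only the hue is kept for the later analysis, as described in the text.
hue, _, _ = rgb_to_hsi(40, 160, 30)  # a greenish pixel (made-up values)
print(round(hue, 1))
```

Applied to every pixel, this yields the hue image that is passed on to segmentation.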
Green pixel masking is an optional step in our disease identification technique, as the diseased part of the leaf can be completely isolated in the segmentation process.
C. Segmentation
There are different image segmentation techniques, such as threshold-based, edge-based, cluster-based, and neural-network-based segmentation. One of the most efficient is the clustering approach, which has several variants, including k-means clustering, fuzzy C-means clustering, and subtractive clustering. K-means is one of the most widely used clustering algorithms.
[...] features. Geometric features that are considered for disease identification based on leaf shape are rectangularity, sphericity, circularity, aspect ratio, convex area ratio, convex perimeter ratio, and form factor. The invariant moment features, namely regional moments of inertia and the angle code histogram, are not used in our project [3][5]. The digital morphological features are derived from a basic set of geometric features such as diameter, physiological length and width, leaf area, and perimeter [5].
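Returning to the segmentation step in subsection C, the k-means idea can be sketched on a list of hue values. This is a minimal one-dimensional illustration, not the implementation used in this work: the function name `kmeans_1d`, the evenly spread initial centroids, and the sample hue values are assumptions, and the circular nature of hue (wrap-around at 360 degrees) is ignored for simplicity.

```python
def kmeans_1d(values, k=2, iters=50):
    """Cluster scalar values (e.g. pixel hues) into k groups with plain k-means."""
    lo, hi = min(values), max(values)
    # Deterministic initialisation: centroids spread evenly over the value range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids, labels


# Made-up hues: around 120 degrees for healthy green, around 30 degrees for brown lesions.
hues = [118, 120, 122, 28, 30, 32]
centres, labels = kmeans_1d(hues, k=2)
print(centres, labels)
```

On a real image the same two steps run over all pixels, and the cluster whose centroid lies farthest from green is taken as the candidate diseased region.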
With the large number of variables (morphological features of the leaf) mentioned above, the dispersion matrix becomes too large to study and interpret properly. It is quite possible that some of the variables under study are correlated or carry the same meaning, which implies that the data are redundant. Keeping such variables increases the size of the dataset without adding much information. For example, there are 220 possible three-dimensional scatter plot combinations with 12 variables, compared to 35 combinations with 7 variables. Removing the redundant data with principal component analysis (PCA) therefore reduces the processing overhead drastically and yields a more meaningful feature set for classifying the disease under consideration. PCA uses simple statistical calculations: the mean-subtracted data, the correlation matrix of the variables under consideration, the eigenvalues and eigenvectors of the correlation matrix, and finally the projection of the data onto the selected eigenvectors. This procedure converts the correlated variables into a set of uncorrelated variables called principal components. The number of retained principal components is smaller than the number of original variables.
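The PCA steps listed above (mean subtraction, correlation matrix, eigen-decomposition, projection) can be sketched in plain Python. To stay self-contained, this sketch finds only the leading principal component, using power iteration in place of a full eigen-solver, which is a simplification. The function names and the sample data are assumptions, and the code assumes that no feature is constant (a constant feature would give a zero standard deviation).

```python
import math


def standardise(rows):
    """Subtract the mean of each feature and divide by its standard deviation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardised data."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)] for i in range(d)]


def leading_component(corr, iters=500):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    d = len(corr)
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigenvalue, v


# Two made-up features that are perfectly correlated, i.e. redundant.
rows = [(4.0, 2.0), (6.0, 3.0), (8.0, 4.0), (10.0, 5.0)]
z = standardise(rows)
eigenvalue, direction = leading_component(correlation_matrix(z))
scores = [sum(a * b for a, b in zip(r, direction)) for r in z]  # projected data
print(eigenvalue / len(direction))  # share of total variance kept by one component
```

Because the two sample features are redundant, a single component carries essentially all of the variance, which is the reduction that PCA is used for here.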

1) Aspect Ratio (AR):
The aspect ratio is the ratio of the maximum length $L_{max}$ to the minimum length $L_{min}$ of the minimum bounding rectangle (MBR): $AR = L_{max} / L_{min}$.

2) Rectangularity (R):
Rectangularity measures the similarity of the leaf to a rectangle. It is the ratio of the region-of-interest (ROI) area $A_{ROI}$ to the MBR area: $R = A_{ROI} / (L_{max} \cdot L_{min})$.

3) Circularity (C):
Circularity is based on the bounding points of the ROI. It is the ratio of the mean distance between the center of the ROI and all of the bounding points ($\mu_R$) to the quadratic mean deviation of that distance ($\sigma_R$): $C = \mu_R / \sigma_R$.

4) Convex Area Ratio (CAR):
The convex area ratio is the ratio of the ROI area ($A_{ROI}$) to the convex hull area ($A_C$): $CAR = A_{ROI} / A_C$.

5) Convex Perimeter Ratio (CPR):
The convex perimeter ratio is the ratio of the ROI perimeter ($P_{ROI}$) to the convex hull perimeter ($P_C$): $CPR = P_{ROI} / P_C$.

6) Sphericity (S):
Sphericity is the ratio of the radius of the incircle of the ROI ($r_i$) to the radius of the excircle of the ROI ($r_c$): $S = r_i / r_c$.

7) Form Factor (FF):
Form factor is a well-known shape descriptor given by $FF = 4 \pi A_{ROI} / P_{ROI}^2$, where $A_{ROI}$ is the ROI area and $P_{ROI}$ is the ROI perimeter.
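The seven morphological features listed above can be computed from the basic geometric measurements as in the following sketch. It assumes that the basic measurements (MBR side lengths, areas, perimeters, and radii) have already been extracted from the binary leaf image. The function and parameter names are hypothetical, the form factor uses the standard definition $4 \pi A / P^2$, and the quadratic mean deviation in the circularity is taken to be the population standard deviation, which is an assumption.

```python
import math


def shape_features(l_max, l_min, area, perimeter, hull_area, hull_perimeter, r_in, r_ex):
    """Morphological features from basic geometric measurements of the leaf ROI."""
    return {
        "AR": l_max / l_min,                        # aspect ratio of the MBR
        "R": area / (l_max * l_min),                # rectangularity
        "CAR": area / hull_area,                    # convex area ratio
        "CPR": perimeter / hull_perimeter,          # convex perimeter ratio
        "S": r_in / r_ex,                           # sphericity
        "FF": 4 * math.pi * area / perimeter ** 2,  # form factor
    }


def circularity(boundary, center):
    """Mean center-to-boundary distance divided by its standard deviation."""
    dists = [math.dist(p, center) for p in boundary]
    mu = sum(dists) / len(dists)
    sigma = math.sqrt(sum((d - mu) ** 2 for d in dists) / len(dists))
    return math.inf if sigma == 0 else mu / sigma  # a perfect circle has sigma = 0


# Made-up example: a 4 x 2 rectangular region that fills its own bounding box.
feats = shape_features(4, 2, 8, 12, 8, 12, 1, math.sqrt(5))
print({name: round(value, 3) for name, value in feats.items()})
```

A deformed leaf changes several of these ratios at once, which is why they are combined into one feature vector before PCA and classification.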
Elliptic Fourier analysis (EFA) is another popular method for leaf shape identification. In contrast to morphological feature extraction, which works in the spatial domain, EFA analyses the leaf shape in the frequency domain. In EFA the chain code of the leaf contour is first extracted [6]. Using the chain code extracted as shown in Fig. 8, a Fourier series expansion of the closed contour of the binarized leaf image is obtained separately for the x and y coordinates, in the standard elliptic Fourier form:

$x(t) = A_0 + \sum_{n=1}^{N} \left[ a_n \cos\frac{2 n \pi t}{T} + b_n \sin\frac{2 n \pi t}{T} \right]$

$y(t) = C_0 + \sum_{n=1}^{N} \left[ c_n \cos\frac{2 n \pi t}{T} + d_n \sin\frac{2 n \pi t}{T} \right]$

Here $t$ is the arc length along the contour, $T$ is the total contour length, and $N$ is the number of harmonics. The harmonic coefficients $a_n$, $b_n$, $c_n$ and $d_n$ of each harmonic are obtained using the formulas defined in [6]. Principal component analysis is then applied to select uncorrelated combinations of the harmonic coefficients. The resulting feature vectors are used for the classification and identification of the leaf deformation. In this work EFA is implemented only in Matlab, as no ready-made library function for it is available in OpenCV.
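The computation of the harmonic coefficients from a chain code can be sketched as follows. This is a minimal illustration that assumes the commonly used Kuhl and Giardina formulation of elliptic Fourier descriptors, which may differ in detail from the formulas in [6]. The direction numbering in `MOVES`, the function names, and the sample contour are assumptions, and no normalization for size, rotation, or starting point is applied.

```python
import math

# Freeman 8-direction moves. This numbering (0 = +x, then counter-clockwise)
# is an assumption; conventions differ between tools.
MOVES = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def chain_code_to_steps(chain):
    """Turn a chain code into per-step displacements (dx, dy)."""
    return [MOVES[c] for c in chain]


def elliptic_fourier(steps, harmonics=10):
    """Return [(a_n, b_n, c_n, d_n)] for n = 1..harmonics of a closed contour."""
    dt = [math.hypot(dx, dy) for dx, dy in steps]  # length of each step (must be > 0)
    t = [0.0]
    for step in dt:
        t.append(t[-1] + step)  # cumulative arc length
    period = t[-1]              # total contour length T
    coeffs = []
    for n in range(1, harmonics + 1):
        k = 2.0 * n * math.pi / period
        scale = period / (2.0 * n * n * math.pi ** 2)
        a = b = c = d = 0.0
        for i, (dx, dy) in enumerate(steps):
            dcos = math.cos(k * t[i + 1]) - math.cos(k * t[i])
            dsin = math.sin(k * t[i + 1]) - math.sin(k * t[i])
            a += dx / dt[i] * dcos
            b += dx / dt[i] * dsin
            c += dy / dt[i] * dcos
            d += dy / dt[i] * dsin
        coeffs.append((scale * a, scale * b, scale * c, scale * d))
    return coeffs


# Made-up contour: a 4 x 4 pixel square traced counter-clockwise.
square = [0] * 4 + [2] * 4 + [4] * 4 + [6] * 4
for n, harmonic in enumerate(elliptic_fourier(chain_code_to_steps(square), harmonics=3), 1):
    print(n, [round(value, 3) for value in harmonic])
```

With ten harmonics, as suggested in the Introduction, each leaf contour is described by 40 coefficients, which are then reduced by PCA.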

IV. FUTURE WORK
Future research aims to resolve the drawbacks of the current work and to add new capabilities to the project. The main drawback of the current work is that the image processing depends on the background: during image acquisition, the background must be arranged so that it provides sufficient contrast to the leaf and to the diseased part of the leaf, so that proper segmentation can be achieved. Future work aims at real-time image acquisition directly in the agricultural field, with an implementation that is able to distinguish the leaf area from the background. A robotic arm is planned as a further advancement of this research. It would provide an autonomous robot able to survey the agricultural field and identify plant diseases. The identified diseases would be automatically added to the disease database and communicated to the farmer. The vehicle carrying the robotic arm could be a tractor in fields with widely spaced crops such as mango. For other crops, where a tractor cannot maneuver inside the field, a line-following robot (implemented as a mini project of the course) carries the robotic arm. No special arrangement is needed to create lines inside the field for the robot to follow: the drip irrigation pipes already laid in the field provide the direction for the robot's movement.

V. CONCLUSION
A comparative study of disease classification with the Matlab and OpenCV implementations was carried out, and the OpenCV implementation was fine-tuned to obtain results similar to those of the Matlab implementation. The OpenCV implementation can be made easily available to farmers in the form of an Android app or a low-cost application-specific board. Leaf-shape-based deformation analysis is implemented in Matlab, and it provides better results in distinguishing deformed leaves from normal leaves.