erstood, because the output of <i>h_θ(x)</i> lies in the interval (0, 1) and can be read as the probability that the data belongs to a certain category, for example:</p><p id="2248"><i>h_θ(x) &lt; 0.5</i> indicates that the current data is of Class A;</p><p id="b045"><i>h_θ(x) &gt; 0.5</i> indicates that the current data is of Class B.</p><p id="89e2">So we can regard the output of the sigmoid function as the probability that a sample belongs to a class. With the above formula, what we need to do next is to estimate the parameter <i>θ</i>.</p><p id="10bc">First of all, the value of the function <i>h_θ(x)</i> has a special meaning: it represents the probability that the result takes the value 1. The probabilities of Class 1 and Class 0 for the input <i>x</i> are therefore [10]:</p><figure id="496d"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*CpCCft9iRuFGQBvMSLC1kw.png"><figcaption></figcaption></figure><figure id="868b"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*sqtAMqzawwoH5vmdmBILXA.png"><figcaption></figcaption></figure><p id="f17d">respectively.</p><p id="eafe">Maximum likelihood estimation:</p><p id="5f3a">Based on the above formula, we can use the maximum likelihood estimation method from probability theory to derive the cost function. 
First, we get the probability function as follows [11]:</p><figure id="946c"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*BnJBKQ-aLGbRzc5zTpwMBQ.png"><figcaption></figcaption></figure><p id="4a7b">Because the <i>m</i> samples are independent, their joint distribution can be expressed as the product of the individual distributions, and the likelihood function is [11]:</p><figure id="085e"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*arkYrgage24WcaNMxgmUXA.png"><figcaption></figcaption></figure><figure id="6c70"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*gnGMViRiIxI5OctyPYXaPQ.png"><figcaption></figcaption></figure><p id="b444">Taking the logarithm gives the log-likelihood function [12]:</p><figure id="173f"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*IpTMJjpLjuNPVU035rmOmg.png"><figcaption></figcaption></figure><figure id="d3a1"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*6649lKKe5eqIEIiqlfFMPw.png"><figcaption></figcaption></figure><p id="cef7">Maximum likelihood estimation finds the value of <i>θ</i> that maximizes <i>L(θ)</i>, which can be solved by the gradient ascent method. 
Let’s change it a little bit [12]:</p><figure id="3b70"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*8yZ5XXA-bm6fIzfHd5X2Jw.png"><figcaption></figcaption></figure><p id="733d">Since the log-likelihood is multiplied by the negative coefficient <i>−1/m</i>, the maximization becomes a minimization, and the gradient descent algorithm can be used to solve for the parameters.</p><h1 id="19ae">3.1.3 Support vector machine</h1><p id="8be1">For the support vector machine (SVM), we wish to minimize:</p><figure id="7830"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*7fdWN2O4BUUn9xH9mY2Imw.png"><figcaption></figcaption></figure><p id="4911">where</p><figure id="c29b"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*lFgdX6tG3dku8jilKS77sg.png"><figcaption></figcaption></figure><p id="ab20">This is the mathematical definition of the support vector machine. It resembles the cost function <i>J(θ)</i>, with a regularization term added on the right. In short, training an SVM means minimizing the above formula to obtain the parameter <i>θ</i> for a given constant <i>C</i>.</p><figure id="baa9"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*1Q60FmsQxCoslUw4CKyjDQ.png"><figcaption></figcaption></figure><p id="5326">Given the parameter <i>θ</i>, this is the form of the hypothesis function of the support vector machine.</p><p id="9bd3">Solving this optimization problem yields a very interesting decision boundary, the SVM decision boundary:</p><figure id="0ca6"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*CahyCZzmDnbzgtmfwDw_LQ.png"><figcaption></figcaption></figure><p id="5c12">This is the boundary that classifies the samples.</p><h1 id="f261">3.2 Artificial neural networks (ANN)</h1><p id="0854">An ANN is a model that can learn patterns from data by simulating the human nervous system [13]. 
ANNs are applied in various scientific areas and to engineering problems because they can approximate non-linear functions without being explicitly programmed.</p><p id="849c">The architecture of ANN used in this paper is the most popular one, the feedforward artificial neural network. It has three kinds of layers: an input layer, one or several hidden layers and an output layer. The basic elements of each layer are called neurons. The letter recognition dataset has 16 attributes as input features and 26 capital letters (A to Z) as output labels, which corresponds to 16 inputs and 26 outputs. Figure 2 is a typical simplified structure of an ANN with 2 inputs, 3 hidden neurons and 1 output neuron.</p><figure id="70bf"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*EU2MRbiE9b4vwngYYMl_6w.png"><figcaption>Figure 2. Typical neural network [14]</figcaption></figure><p id="5fcf">All of the above neurons have the same structure (Figure 3) and consist of two units: a sum unit and a function unit. The weighted input values are summed into a single value <i>x</i>. The function applied to <i>x</i> to produce the output of the neuron is called the output function or activation function and is denoted <i>f(x)</i>.</p><figure id="1b88"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*lGhWeMQTJOag_CyMapYj7Q.png"><figcaption>Figure 3. Structure of neuron [15]</figcaption></figure><p id="8d97">There are many output functions, including the identity function and simple linear functions. 
We use the most common one, the sigmoid function introduced earlier.</p><p id="fe58">The output of each neuron can then be calculated with the following formula, based on the aforementioned typical ANN structure [16]:</p><figure id="c518"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*MJ1QX1R2TPh1wMiulX38Wg.png"><figcaption></figcaption></figure><p id="0551">where:</p><figure id="dea3"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*sKDslY_AIEWyMcQcR8wVFw.png"><figcaption></figcaption></figure><p id="6e66">The example ANN performs only binary classification, since it has a single output neuron. Our network has 26 output neurons, each allocated to one class (the capital letters from A to Z). Since each output neuron applies a sigmoid function, the output of our ANN is 26 numbers between 0 and 1. The predicted class is the one whose neuron has the highest value.</p><p id="4a4f">To differentiate the output classes in the ANN, different weights are assigned to the neuron inputs. Together with the input values, these weights determine the output of the neuron, which the sigmoid function maps to a value between 0 and 1. The purpose of network training is to find the weight values that give the most accurate classification.</p><p id="433c">The training algorithm is called backpropagation; it updates all the weights according to the difference between the actual response and the response of the network. 
Training runs for a given number of iterations, until the weights give a satisfactory result.</p><h1 id="41c8">4 Overview of SVM and ANN recognition system</h1><p id="a946">The proposed recognition system for the SVM and neural network was implemented using R 3.4.3.</p><h1 id="4676">4.1 SVM system</h1><p id="2533">The flow of the proposed SVM recognition system is shown in Figure 4.</p><figure id="551f"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*zjNoW7bh14dSTExasB0Atg.png"><figcaption>Figure 4. The flow of the proposed SVM recognition system</figcaption></figure><h1 id="86c8">4.1.1 Preparation of data</h1><p id="02f0">We read the data into R and confirm that it has 16 features, which describe each example of a letter class.</p><figure id="3534"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*fJ5AoxcQyXlOfDcOgO395A.png"><figcaption></figcaption></figure><p id="ec9b">Each feature is an integer. Some of these integer variables have quite a wide range, which suggests that the data should be standardized or normalized; fortunately, the R package used to fit the support vector machine model rescales the data automatically.</p><p id="b3a0">Now we enter the training and testing stage of the machine learning process. We use the first 16000 records to build the models and the remaining 4000 records to test them. We create training and test data frames. The code is as follows:</p><figure id="41e8"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*TCiiPUfq426iY1mgrp-YrQ.png"><figcaption></figcaption></figure><h1 id="e624">4.1.2 Training SVM</h1><p id="22c2">The e1071 package from the Statistics Department of Vienna University of Technology provides an R interface to the LIBSVM library; we call its svm() function on the training data. 
We start by training a simple linear support vector machine classifier and use the linear option to specify a linear kernel function.</p><figure id="f895"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*K40B7DXmVPPEbw6QjtuCGw.png"><figcaption></figcaption></figure><p id="a333">Depending on computer performance, this operation may take some time to complete. When it is finished, type the name of the model to see some basic information about the training parameters and the fit of the model.</p><figure id="5e6e"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*rk_otZfyUnZLIvnw4pfdIQ.png"><figcaption></figcaption></figure><p id="3825">This information hardly tells us how well the model works in the real world. Therefore, we need to study the performance of the model on the test data set to determine whether it generalizes well to unseen data.</p><h1 id="22b1">4.1.3 Evaluating the performance of the model</h1><p id="43bf">The predict() function allows us to make predictions on the test data using the letter classification model.</p><figure id="54b0"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*Co0E8Nih_8Of-htEyYlv_Q.png"><figcaption></figcaption></figure><p id="5051">We need to compare the predicted values with the true values in the test data. For this purpose, we use the table() function. Only part of the table is shown below.</p><figure id="3fe5"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*doPsOPqm-b91h9aZJ1QdoQ.png"><figcaption></figcaption></figure><figure id="42bd"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*Ou9CRmVVS4sVdrbPi6Nr8Q.png"><figcaption></figcaption></figure><p id="5973">The diagonal values 137, 151, 126, 127, 140, 127 and 132 are the numbers of records for which the predicted value matches the true value. The numbers of mistakes are listed as well. 
For example, the value 3 in the Z row and the T column indicates that there are 3 cases where the letter T is mistaken for the letter Z.</p><p id="5794">Looking at each type of error individually may reveal interesting patterns about specific letters that are difficult for the model to identify, but it is also time consuming. Therefore, we simplify the evaluation by calculating the overall accuracy, that is, we consider only whether each predicted letter is correct or incorrect and ignore the type of error.</p><p id="71b6">The following command returns a vector of TRUE or FALSE values that indicate whether the letter predicted by the model matches the true letter in the test data.</p><figure id="d3c6"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*npQId1WBPIwgsyN4PuacuQ.png"><figcaption></figcaption></figure><p id="91ce">Using the table() function, we see that of the 4000 test records, 3364 letters are correctly identified by the classifier.</p><figure id="e6c2"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*GA1vnutdXjWCGyCMVJB_NA.png"><figcaption></figcaption></figure><p id="221f">As a percentage, the accuracy is about 84.1%:</p><figure id="231e"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*sOE1xSDmobwdLApEc1Tpqg.png"><figcaption></figcaption></figure><h1 id="5e69">4.1.4 Model performance tuning</h1><p id="73bf">The support vector machine above uses a simple linear kernel function. By using a more complex kernel function, we can map the data to a higher dimensional space and possibly obtain a better fit. A popular practice is to start with the Gaussian RBF kernel function, because it has been shown to work well for many types of data. We can use the svm() function to train an RBF-based support vector machine. 
The default kernel function of the svm() function in e1071 is kernel = "radial" (RBF), so it does not need to be set. The code is as follows:</p><figure id="331a"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*1Y5DYiC9tqb2rz1luIYr9Q.png"><figcaption></figcaption></figure><p id="846c">Then, we predict as before:</p><figure id="a03f"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*4j6aNXwCqA7HmApkfdHwSw.png"><figcaption></figcaption></figure><p id="e8d4">Finally, we compare the accuracy with our linear support vector machine.</p><figure id="e517"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*QZ-z1yypIblivjqIc3BoRQ.png"><figcaption></figcaption></figure><p id="4e8c">By changing the kernel function, we improved the accuracy of the character recognition model from 84.1% to 94.2%.</p><h1 id="8001">4.2 ANN system</h1><p id="1ce8">The nnet package was used to build a feedforward artificial neural network with one hidden layer of sigmoid neurons. Figure 5 shows the training and testing procedure with the nnet package in R.</p><figure id="e0f4"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*fF_Ne_WlbsDyKtvsUWIVSQ.png"><figcaption><i>Figure 5. The flow of the proposed ANN recognition system</i></figcaption></figure><h1 id="8704">4.2.1 Installing nnet package & dataset classification</h1><p id="9c54">First, the nnet package is installed and loaded in RStudio. Then the source data is loaded and divided into trainset and testset, with the corresponding records described in section 2. 
The function is realized in R as follows:</p><blockquote id="3bcc"><p>install.packages("nnet")</p></blockquote><blockquote id="4940"><p>library(nnet)</p></blockquote><blockquote id="ce6d"><p>letters &lt;- read.csv("~/ML/letterdata.csv")</p></blockquote><blockquote id="3f9f"><p>trainset &lt;- letters[1:16000, ]</p></blockquote><blockquote id="db62"><p>testset &lt;- letters[16001:20000, ]</p></blockquote><h1 id="440e">4.2.2 Fit ANN with nnet</h1><p id="1b6e">Although many parameters are fixed in the nnet package, some parameters can still be tuned to maximize the classification accuracy. Table 2 shows the meanings and default values of the arguments used in our training procedure.</p><figure id="a5f9"><div><div><img class="ratio" src="http://placehold.it/16x9"><iframe class="" src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fairtable.com%2Fembed%2FshrLKjrzdaAOUufM9&url=https%3A%2F%2Fairtable.com%2FshrLKjrzdaAOUufM9&image=https%3A%2F%2Fstatic.airtable.com%2Fimages%2Foembed%2Fairtable.png&key=a19fcc184b9711e1b4764040d3dc5c07&type=text%2Fhtml&schema=airtable" allowfullscreen="" frameborder="0" height="533" width="800"></iframe></div></div></figure><p id="05c3">Twelve different neural network architectures were chosen by combining these parameters. Note that for an ANN with large inputs, the parameter <i>rang</i> should be chosen so that <i>rang</i> · max(|<i>x</i>|) is about 1 [17].</p><p id="1b07">In this experiment, we set <i>rang</i> based on this formula in networks 11 and 12.</p><figure id="a28b"><div><div><img class="ratio" src="http://placehold.it/16x9"><iframe class="" src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fairtable.com%2Fembed%2FshrymyqHCec27pzmu&url=https%3A%2F%2Fairtable.com%2FshrymyqHCec27pzmu&image=https%3A%2F%2Fstatic.airtable.com%2Fimages%2Foembed%2Fairtable.png&key=a19fcc184b9711e1b4764040d3dc5c07&type=text%2Fhtml&schema=airtable" allowfullscreen="" frameborder="0" height="533" width="800"></iframe></div></div></figure><p id="d6ee">The training function is realized in R as follows:</p><blockquote id="c43d"><p>letters.nn = nnet(letter ~ ., data = trainset, rang = 0.1, decay = 5e-4, size = 20, maxit = 5000)</p></blockquote><h1 id="db7c">4.2.3 Prediction using ANN</h1><p id="f783">Then we use the trained ANN model to make predictions on testset. The result will be explained in detail in section 5. 
The predicting function is realized in R as follows:</p><blockquote id="53e4"><p>## testset</p></blockquote><blockquote id="6816"><p>letters.predict = predict(letters.nn, testset, type = "class")</p></blockquote><h1 id="8d51">4.2.4 Performance evaluation</h1><p id="b202">In this part, we evaluate the performance of the network model. The comparative analysis is presented in section 6. In R this is realized using the function confusionMatrix from the caret package as follows:</p><blockquote><p>library(caret)</p></blockquote><blockquote id="1e3a"><p># use union to ensure similar levels</p></blockquote><blockquote id="9989"><p>u = union(testset$letter, letters.predict)</p></blockquote><blockquote id="ba24"><p>nn.table = table(factor(letters.predict, u), factor(testset$letter, u))</p></blockquote><blockquote id="7166"><p># Evaluate the result</p></blockquote><blockquote id="fa97"><p>confusionMatrix(nn.table)</p></blockquote><h1 id="52f0">4.2.5 Plotting ANN</h1><p id="a879">Figure 6 shows one of the ANN structures adopted in our test, with 20 neurons in the hidden layer. Note the bias neurons B1 and B2 added to the hidden layer and the output layer, which allow the neural network to shift the output value as needed. Larger weights are drawn with thicker lines, and color indicates sign: black and grey represent positive and negative weights, respectively [18].</p><figure id="5a97"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*FZuPgEV1bQlCa99-VCowAA.png"><figcaption><i>Figure 6. One proposed ANN structure</i></figcaption></figure><p id="5e5b">This is realized with the function plotnet in R as follows:</p><blockquote id="0d46"><p>#plot nn</p></blockquote><blockquote id="9b74"><p>library(NeuralNetTools)</p></blockquote><blockquote id="bef0"><p>plotnet(letters.nn)</p></blockquote><h1 id="004a">5 Results & Discussion</h1><h1 id="6bbc">5.1 SVM system evaluation</h1><p id="157c">The following table shows the single-character recognition results of the two SVM classifiers. 
The statistics show the recognition accuracy for each of the letters A to Z. For the SVM with the radial basis function kernel, the recognition accuracy exceeds 90% for 23 of the 26 letters, a clearly better performance than the linear SVM.</p><figure id="af7c"><div><div><img class="ratio" src="http://placehold.it/16x9"><iframe class="" src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fairtable.com%2Fembed%2Fshr4hGliNU0beDz1q&url=https%3A%2F%2Fairtable.com%2Fshr4hGliNU0beDz1q&image=https%3A%2F%2Fstatic.airtable.com%2Fimages%2Foembed%2Fairtable.png&key=a19fcc184b9711e1b4764040d3dc5c07&type=text%2Fhtml&schema=airtable" allowfullscreen="" frameborder="0" height="533" width="800"></iframe></div></div></figure><p id="c5cf">Table 5 presents the recognition results of the two SVM systems with linear and radial basis function (RBF) kernels; the SVM classifier with the RBF kernel shows better accuracy.</p><figure id="97f2"><div><div><img class="ratio" src="http://placehold.it/16x9"><iframe class="" src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fairtable.com%2Fembed%2FshrGMKipcBjc6dqMI&url=https%3A%2F%2Fairtable.com%2FshrGMKipcBjc6dqMI&image=https%3A%2F%2Fstatic.airtable.com%2Fimages%2Foembed%2Fairtable.png&key=a19fcc184b9711e1b4764040d3dc5c07&type=text%2Fhtml&schema=airtable" allowfullscreen="" frameborder="0" height="533" width="800"></iframe></div></div></figure><p id="7f09">Figure 7 shows the recognition accuracy of the two SVM approaches above. 
The SVM with the RBF kernel performs clearly better than the linear SVM, by more than ten percentage points.</p><figure id="164c"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*1sOUcA-SZ648mIYqNzpj8g.png"><figcaption><i>Figure 7. Comparing the recognition accuracy for SVM with different kernel functions</i></figcaption></figure><h1 id="e80b">5.2 ANN system evaluation</h1><p id="b241">Table 6 shows the prediction results of the 12 ANNs. The statistics show the recognition accuracy for each of the letters A to Z and the total result. We can conclude that network 10 gives the best performance.</p><figure id="7efd"><div><div><img class="ratio" src="http://placehold.it/16x9"><iframe class="" src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fairtable.com%2Fembed%2FshruMmudhMFoLNUX8&url=https%3A%2F%2Fairtable.com%2FshruMmudhMFoLNUX8&image=https%3A%2F%2Fstatic.airtable.com%2Fimages%2Foembed%2Fairtable.png&key=a19fcc184b9711e1b4764040d3dc5c07&type=text%2Fhtml&schema=airtable" allowfullscreen="" frameborder="0" height="533" width="800"></iframe></div></div></figure><p id="7a9d">Figure 8 shows the total recognition accuracy of all 12 ANN approaches. 
However, only one of them achieves a recognition accuracy above 90%.</p><figure id="0b1b"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*RbTnrbCMZpR04qKMlmpaMA.png"><figcaption><i>Figure 8. Comparing the recognition accuracy for 12 ANN approaches</i></figcaption></figure><h1 id="4353">5.3 Compare ANN & SVM</h1><p id="0708">The result in Table 7 shows that the SVM with a radial basis function (RBF) kernel performs better than the artificial neural network.</p><figure id="6266"><div><div><img class="ratio" src="http://placehold.it/16x9"><iframe class="" src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fairtable.com%2Fembed%2Fshr4sjjzrMEZevJtX&url=https%3A%2F%2Fairtable.com%2Fshr4sjjzrMEZevJtX&image=https%3A%2F%2Fstatic.airtable.com%2Fimages%2Foembed%2Fairtable.png&key=a19fcc184b9711e1b4764040d3dc5c07&type=text%2Fhtml&schema=airtable" allowfullscreen="" frameborder="0" height="533" width="800"></iframe></div></div></figure><p id="786f">Figure 9 visualizes the comparison.</p><figure id="32cb"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*dl6HyPspqyKNwpg0kF3Zdg.png"><figcaption><i>Figure 9. Comparing the recognition accuracy of SVM and ANN</i></figcaption></figure><p id="7636">In our report, we use 2 support vector machines and 12 neural network classifiers. The support vector machine classifier with a linear kernel function identifies letters with an accuracy of 84.1%, which is better than 9 of the neural network classifiers, while the support vector machine classifier with the RBF kernel function reaches an accuracy of 94.2%, which exceeds all 12 neural network classifiers. 
This result suggests that SVM has an advantage for small samples, and it is indeed one of the best classifiers.</p><p id="e659">The support vector machine is based on statistical theory, so it has a strict theoretical and mathematical basis, unlike neural network structure design, which relies on the experience and prior knowledge of the designer. Compared with neural network learning methods, the support vector machine has the following advantages:</p><p id="9071">1) The support vector machine (SVM) is based on the SRM (structural risk minimization) principle and has good generalization ability.</p><p id="2b5d">2) By constructing a kernel function, the nonlinear problem in the input space is mapped to a high-dimensional space, where a linear function is constructed.</p><p id="37a1">3) The algorithm can be transformed into a convex optimization problem, which guarantees the global optimality of the solution and avoids the local minimum problem that the neural network cannot solve.</p><p id="acb2">4) The support vector machine has a strict theoretical and mathematical foundation, avoiding the empirical components of neural network implementation.</p><h1 id="03d9">6 Conclusion</h1><p id="34cf">The support vector machine (SVM) is a new generation learning machine based on statistical learning theory. It has many attractive features. It is superior to the traditional artificial neural network in function expression, generalization and learning efficiency.</p><p id="4730">The SVM solves for the support vectors by quadratic programming, which involves computing a matrix of order m (m is the number of samples). 
When m is large, the storage and calculation of the matrix will consume a large amount of machine memory and operation time.</p><p id="d8e9">The main improvements addressing this problem include J. Platt’s SMO algorithm, T. Joachims’ PCGC, Zhang’s CSVM, and O. L. Mangasarian’s SOR.</p><p id="17f4">In this report, we have studied two machine learning methods that offer great potential in letter recognition and classification. The results show that, compared with artificial neural networks (ANN), the support vector machine (SVM) has better approximation and generalization ability. Future research on applications of the support vector machine (SVM) should spur greater interest in various fields because of these advantages.</p><h1 id="14a5">7 Bibliography</h1><p id="2cf6">[1] M. R. Phangtriastu, Jeklin Harefa and Dian Felita Tanoto, “Comparison Between Neural Network and Support Vector Machine in Optical Character Recognition,” <i>Procedia Computer Science, </i>vol. 116, pp. 351–357, 2017.</p><p id="d8e8">[2] Malon C, Uchida S and Suzuki M, “Mathematical symbol recognition with support vector machines,” 2008. [Online]. Available: <a href="http://www.sciencedirect.com/science/article/pii/S0167865508000603.">http://www.sciencedirect.com/science/article/pii/S0167865508000603.</a></p><p id="360b">[3] Rao NV and Pradesh A, “OPTICAL CHARACTER RECOGNITION TECHNIQUE,” <i>Technology AI, </i>vol. 83, no. 2, 2016.</p><p id="9a19">[4] Mahto MK, Bhatia K and Sharma RK, “Combined horizontal and vertical projection feature extraction technique for Gurmukhi handwritten character recognition,” in <i>2015 International Conference on Advances in Computer Engineering and Applications</i>, 2015.</p><p id="8b19">[5] M. OVR, “Zoning based Devanagari Character Recognition,” vol. 27, no. 4, p. 21–5, 2011.</p><p id="827e">[6] C. L. Blake and C. J. Merz, “UCI repository of machine learning databases,” 1998. [Online]. 
Available: <a href="https://archive.ics.uci.edu/ml/datasets/letter+recognition.">https://archive.ics.uci.edu/ml/datasets/letter+recognition.</a></p><p id="8050">[7] J. Bell, “Support vector machines,” <i>Machine Learning: Hands-On for Developers and Technical Professionals, </i>pp. 139–160, 2014.</p><p id="60a8">[8] N. M. Nasrabadi, “Pattern recognition and machine learning,” <i>Journal of Electronic Imaging, </i>vol. 16, no. 4, p. 049901, 2007.</p><p id="8a01">[9] R. E. Schapire, “The boosting approach to machine learning: An overview,” <i>Nonlinear estimation and classification, </i>pp. 149–171, 2003.</p><p id="dbcd">[10] Dreiseitl, Stephan and Lucila Ohno-Machado, “Logistic regression and artificial neural network classification models: a methodology review,” <i>Journal of biomedical informatics, </i>vol. 35, no. 5–6, pp. 352–359, 2002.</p><p id="c4e1">[11] C. Robert, “Machine learning, a probabilistic perspective,” pp. 62–63, 2014.</p><p id="c713">[12] Jordan, Michael I. and Tom M. Mitchell, “Machine learning: Trends, perspectives, and prospects,” <i>Science, </i>vol. 349, no. 6245, pp. 255–260, 2015.</p><p id="16f8">[13] Gazzah, Sami and Najoua Ben Amara, “Neural networks and support vector machines classifiers for writer identification using Arabic script,” <i>International Arab Journal of Information Technology (IAJIT), </i>vol. 5, p. 1, 2008.</p><p id="35f8">[14] V. D. Do and Dong-Min Woo, “Handwritten Character Recognition Using Feedforward Artificial Neural Network,” in <i>7th International Conference on Latest Trends in Engineering & Technology</i>, Pretoria, 2015.</p><p id="8b00">[15] Kumar, Parveen, N. Sharma and A. Rana, “Handwritten Character Recognition using Different Kernel based SVM Classifier and MLP Neural Network (A COMPARISON),” <i>International Journal of Computer Applications, </i>vol. 53, no. 11, 2012.</p><p id="649b">[16] S. Arora, D. Bhattacharjee, M. Nasipuri, L. Malik, M. Kundu and D. K. Basu, “Performance comparison of SVM and ANN for handwritten Devnagari character recognition,” arXiv:1006.5902, 2010.</p><p id="54b4">[17] Brian Ripley and William Venables, “Feed-Forward Neural Networks and Multinomial Log-Linear Models,” 2 2 2016. [Online]. Available: <a href="https://cran.r-project.org/web/packages/nnet/nnet.pdf.">https://cran.r-project.org/web/packages/nnet/nnet.pdf.</a> [Accessed 4 6 2018].</p><p id="c900">[18] S. Thompson, “Visualizing neural networks in R — update,” 4 3 2013. [Online]. Available: <a href="https://beckmw.wordpress.com/tag/nnet/.">https://beckmw.wordpress.com/tag/nnet/.</a> [Accessed 4 6 2018].</p></article></body>

Performance Comparison of SVM and ANN for Handwritten Character Recognition

Jinqi Ge

Harry Zheng

Abstract

Support Vector Machines (SVM) and Artificial Neural Networks (ANN) are among the most popular methods applied in various kinds of pattern recognition. It is never an easy task for a machine to recognize letters, numbers and figures as human beings do. Character recognition has become a challenging and fascinating topic in the field of image processing and machine learning. In this paper, we propose to recognize handwritten characters with a feedforward neural network and an SVM classifier. The letter recognition dataset is used for training the SVM and the ANN. Both methods are divided into a training phase and a testing phase. A comparative analysis and evaluation of the two classifiers is presented. The experimental results suggest that, compared with ANN, SVM with an RBF kernel reaches a higher accuracy, with an average rate of 94.2%.

3.1 Support Vector Machines(SVM)

3.1.1 The cost function

3.1.2 Logistic regression

3.1.3 Support vector machine

3.2 Artificial neural networks(ANN)

4 OVERVIEW OF SVM AND ANN RECOGNITION SYSTEM

4.1 SVM system

4.1.1 Preparation of data

4.1.2 Training SVM

4.1.3 Evaluating the performance of the model

4.1.4 Model performance tuning

4.2 ANN system

4.2.1 Installing nnet package & dataset classification

4.2.2 Fit ANN with nnet

4.2.3 Prediction using ANN

4.2.4 Performance evaluation

4.2.5 Plotting ANN

5 RESULTS & DISCUSSION

5.1 SVM system evaluation

5.2 ANN system evaluation

5.3 Compare ANN& SVM

6 CONCLUSION

7 BIBLIOGRAPHY

1 Introduction

Recently, handwritten letter recognition has become one of the most challenging and popular research topics in the area of pattern recognition and artificial intelligence. This is due to the variety of handwriting styles, the lack of supporting information about the source and the variations between different writers [1]. For these reasons, handwritten letters are very hard for a machine to identify.

Several recent methods and techniques have been proposed to decrease the processing time and give a more accurate recognition rate. In 2008, Malon and his team conducted research on mathematical symbol recognition with SVM [2]. A Support Vector Machine was adopted to classify several experimental mathematical symbols and improved the performance of the InftyReader method. The authors claimed that the SVM could bring the error rate down to 41%.

Another study comes from Rao and his team in 2016, who proposed a modified backpropagation-based method for Optical Character Recognition (OCR) [3]. This method dramatically reduced the error and reportedly reached an accuracy of 100% in OCR.

In the next study, research on letter recognition was proposed by Mahto, Bhatia and Sharma in 2015 [4]. To classify handwritten Gurmukhi letters, they introduced a new technique that combines horizontal and vertical projection feature extraction. This empirical study uses Support Vector Machines with linear and polynomial kernels and k-NN (k = 1, 3, 5, 7) to recognize the handwritten letters. The linear SVM classifier gives the best accuracy, 98.06%, among the different kernels.

Other researchers, Murthy and Hanmandlu, developed a new zoning-based feature extraction method for handwritten letter recognition [5]. To achieve the best performance, they included the label of the black pixel location. The proposed SVM method was claimed to give an accuracy of 98.5%.

Based on these previous works, this paper aims to compare two of the most popular classifiers, an artificial neural network (ANN) without feature extraction and a Support Vector Machine (SVM). The proposed ANN model in this paper is a standard feedforward neural network. For the SVM, we use both a linear and an RBF kernel for comparison. Both models consist of two parts, a training phase and a testing phase. In the training phase, the attributes in the chosen dataset are used to train the model without feature extraction. In the testing phase, the ANN and SVM classifiers are used to recognize the letters.

Our experiments aim to classify the 26 letters of the English alphabet. For this purpose, our recognition system needs a one-to-one correspondence between its outputs and the 26 letters. In the ANN experiments we analysed and compared different approaches with various numbers of iterations and hidden neurons. The proposed recognition method gives better recognition accuracy after comparing the results.

In the rest of the paper, we first introduce the chosen dataset in section 2. Then, in section 3, we present a brief description of the SVM and ANN classifiers. In section 4, we present an overview of our experimental system design. Section 5 presents the experimental results and a comparative analysis. Conclusions and future work are given in the last section.

2 Dataset

The character data, the “Letter Recognition Data Set”, was taken from the UCI Machine Learning Repository [6]. The sample character images were derived from 20 different fonts, and the letters were randomly distorted to produce 20,000 unique records. Each record was converted to 16 numerical attributes, represented by integers ranging from 0 to 15. Table 1 shows all the information on the letter and the 16 attributes. We split the dataset of 20,000 records into two parts: the training dataset (16,000 records, 80% of the total) and the testing dataset (4,000 records, 20% of the total).

3 Methodology

3.1 Support Vector Machines(SVM)

3.1.1 The cost function

The cost function (in some places called the loss function) is very important in every machine learning algorithm. This is because training a model is the process of optimizing the cost function: the partial derivative of the cost function with respect to each parameter is the gradient used in gradient descent, and a regularization term is added to the cost function to prevent overfitting [7]. As one studies the related algorithms, the understanding of the cost function also deepens. Here is a brief summary.

1. What’s the cost function?

Suppose that the training sample is (x, y), the model is h and the parameter is θ, with h(θ) = θ^T x (θ^T is the transpose of θ).

1) In general, any function that can measure the difference between the value predicted by the model h(θ) and the true value y can be called a cost function C(θ). If there are many samples, the cost over all samples can be averaged and recorded as J(θ). Therefore, it is easy to get the following properties of the cost function [8].

For each algorithm, the cost function is not unique.
The cost function is a function of the parameter θ.
The total cost function J(θ) can be used to evaluate the quality of the model. The smaller the cost function, the better the model and parameters fit the training samples (x, y).
J(θ) is a scalar.

2) Once we have chosen the model h, everything that follows is training the parameter θ of the model. So when does model training end? The cost function is again involved: because the cost function is used to measure the model, our goal is, of course, to get the best model. Therefore, training the parameters means changing θ continuously to obtain a smaller J(θ). Ideally, when we reach the minimum of the cost function J, we obtain the optimal parameter θ, that is, the θ that minimizes J(θ) [8].

For example, J(θ) = 0 indicates that our model perfectly matches the observed data without any error.

3) In the process of optimizing the parameter θ, the most commonly used method is gradient descent. The gradient here is the vector of partial derivatives of the cost function J(θ) with respect to θ1, θ2, …, θn. Because partial derivatives are needed, we get another property of the cost function:

When choosing a cost function, it is best to select one that is differentiable with respect to the parameter θ (if the total differential exists, the partial derivatives must exist).

2. Common forms of cost functions

In logistic regression, the most commonly used cost function is the cross entropy.
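Assuming the standard binary cross entropy for a hypothesis h_θ with labels y ∈ {0, 1}, the cost referred to here can be written as:

```latex
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta\!\left(x^{(i)}\right) + \left(1 - y^{(i)}\right) \log\!\left(1 - h_\theta\!\left(x^{(i)}\right)\right) \right]
```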

where

m: the number of training samples;

hθ(x): the y value predicted from the parameter θ and x;

y: the y value of the original training sample, that is, the standard answer;

Superscript (i): the i-th sample.

The cost function measures the difference between the model's predicted value h(θ) and the standard answer y, so the total cost function J is a function of h(θ) and y, that is, J = f(h(θ), y). Because y is given in the training samples and h(θ) is determined by θ, a change in the model parameter θ ultimately leads to a change in J. Different values of θ give different predictions h(θ) and hence different values of the cost function J. The chain of dependence is:

θ→h(θ)→J(θ)

A change in θ changes h(θ), which in turn changes the value of J(θ).
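The chain θ → h(θ) → J(θ) can be illustrated with a minimal R sketch. It assumes the sigmoid hypothesis introduced in section 3.1.2 and the standard cross-entropy cost; the toy data and the function names `sigmoid` and `cross_entropy` are made up for illustration and are not taken from the experiments.

```r
# Sigmoid hypothesis: h_theta(x) = 1 / (1 + exp(-theta^T x))
sigmoid <- function(z) 1 / (1 + exp(-z))

# Cross-entropy cost J(theta), averaged over the m rows of X
cross_entropy <- function(theta, X, y) {
  h <- sigmoid(X %*% theta)                   # theta -> h(theta)
  -mean(y * log(h) + (1 - y) * log(1 - h))    # h(theta) -> J(theta)
}

# Toy data (hypothetical): an intercept column plus one feature
X <- cbind(1, c(-2, -1, 1, 2))
y <- c(0, 0, 1, 1)

J_zero <- cross_entropy(c(0, 0), X, y)  # every prediction is 0.5
J_good <- cross_entropy(c(0, 2), X, y)  # predictions lean toward the labels
```

Changing θ from (0, 0) to (0, 2) moves every prediction toward its label, so the second cost is smaller than the first.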

3.1.2 Logistic regression

Before introducing logistic regression, we first briefly describe linear regression. The main idea of linear regression is to fit a straight line through historical data and use this line to predict new data.

We know that the formula for linear regression is as follows [9]:

For logistic regression, the idea is also based on linear regression (logistic regression belongs to the family of generalized linear models). The formula is as follows:
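In the usual notation, and assuming a feature vector x with parameter vector θ as in section 3.1.1, the linear-regression hypothesis and its logistic counterpart can be written as:

```latex
h_\theta(x) = \theta^{T} x \quad \text{(linear regression)}
\qquad
h_\theta(x) = g\!\left(\theta^{T} x\right) = \frac{1}{1 + e^{-\theta^{T} x}} \quad \text{(logistic regression)}
```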

In particular, the function g(z) = 1 / (1 + e^(−z)) is known as the sigmoid function. We can see that the logistic regression algorithm passes the result of the linear function through the sigmoid function.

Figure 1 shows the sigmoid function [10]:

We can see that the output of the sigmoid lies in (0, 1), with a middle value of 0.5. This makes the meaning of the earlier formula h_θ(x) easy to understand: because its output lies in (0, 1), it can be read as the probability that the data belongs to a certain category. For example:

h_θ(x) < 0.5 indicates that the current data is of Class A;

h_θ(x) > 0.5 indicates that the current data is of Class B.
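As a small illustration of this thresholding rule, the following R sketch evaluates the sigmoid at a few made-up inputs and assigns a class at the 0.5 cut-off. The input values and the name `predicted` are hypothetical.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# A few made-up values of theta^T x
z <- c(-5, -1, 1, 5)
h <- sigmoid(z)        # all values lie strictly between 0 and 1

# Threshold at 0.5: below -> Class A, above -> Class B
predicted <- ifelse(h < 0.5, "Class A", "Class B")
```

Note that sigmoid(0) is exactly 0.5, so negative inputs map to Class A and positive inputs to Class B.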

So we can regard the output of the sigmoid function as the probability that a sample belongs to a class. With the above formula, what we need to do next is to estimate the parameter θ.

First of all, the value of the function h_θ(x) has a special meaning: it represents the probability that the result takes the value 1. So, for the input x, the probabilities of class 1 and class 0 are P(y = 1 | x; θ) = h_θ(x) and P(y = 0 | x; θ) = 1 − h_θ(x), respectively [10].

Maximum likelihood estimation:

Based on the above formula, we can use maximum likelihood estimation from probability theory to derive the cost function. First, we write the probability function as follows [11]:

Because the m samples are independent, their joint distribution can be expressed as the product of the individual distributions, and the likelihood function is [11]:

Taking the logarithm gives the log-likelihood function [12]:

Maximum likelihood estimation seeks the value of θ that maximizes L(θ), which can be solved by the gradient ascent method. Let’s change it a little bit [12]:
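Written out under the standard logistic-regression assumptions (labels y ∈ {0, 1} and m independent samples), the steps from the per-sample probability to the cost are as follows. The symbol ℓ(θ) for the log-likelihood is notation chosen here.

```latex
\begin{aligned}
P(y \mid x; \theta) &= h_\theta(x)^{y} \, \bigl(1 - h_\theta(x)\bigr)^{1 - y} \\
L(\theta) &= \prod_{i=1}^{m} h_\theta\!\left(x^{(i)}\right)^{y^{(i)}} \bigl(1 - h_\theta\!\left(x^{(i)}\right)\bigr)^{1 - y^{(i)}} \\
\ell(\theta) = \log L(\theta) &= \sum_{i=1}^{m} \Bigl[ y^{(i)} \log h_\theta\!\left(x^{(i)}\right) + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta\!\left(x^{(i)}\right)\bigr) \Bigr] \\
J(\theta) &= -\frac{1}{m} \, \ell(\theta)
\end{aligned}
```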

Since the log-likelihood is multiplied by the negative coefficient −1/m, maximizing it becomes minimizing J(θ), and the gradient descent algorithm can be used to solve for the parameters.
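A minimal gradient-descent sketch in R, assuming the cross-entropy cost J(θ) = −(1/m) log L(θ), whose gradient is (1/m) Xᵀ(h − y). The toy data, the learning rate `alpha` and the iteration count are illustrative assumptions, not values from the experiments.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Cross-entropy cost J(theta)
cost_J <- function(theta, X, y) {
  h <- sigmoid(X %*% theta)
  -mean(y * log(h) + (1 - y) * log(1 - h))
}

# Gradient of J(theta): (1/m) * t(X) %*% (h - y)
grad_J <- function(theta, X, y) {
  h <- sigmoid(X %*% theta)
  as.vector(t(X) %*% (h - y)) / nrow(X)
}

# Toy, non-separable data (hypothetical): intercept column plus one feature
X <- cbind(1, c(-2, -1, 0, 1, 2, 3))
y <- c(0, 0, 1, 0, 1, 1)

theta <- c(0, 0)   # starting point
alpha <- 0.1       # learning rate (assumed value)
for (step in 1:2000) {
  theta <- theta - alpha * grad_J(theta, X, y)   # move against the gradient
}

J_start <- cost_J(c(0, 0), X, y)
J_end   <- cost_J(theta, X, y)
```

Each step moves θ against the gradient, so the cost after training is lower than at the starting point.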

3.1.3 Support vector machine

For the support vector machine (SVM), we wish to minimize:

where

This is the mathematical definition of the support vector machine. It looks like the cost function J(θ), but with a regularization term added on the right. In short, training an SVM is the problem of minimizing the above formula for a chosen constant C, thus obtaining the parameter θ.

The parameter θ determines the hypothesis function of the support vector machine.

When you solve this optimization problem and minimize the function over its variables, you get a very interesting decision boundary, the SVM decision boundary:
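One common way of writing this objective, assumed here because it matches the description (a cost term resembling J(θ) plus a regularization term on the right), is the following. The names cost_1 and cost_0 denote hinge-style replacements for the two logistic loss terms and are notation chosen here, not necessarily the original one.

```latex
\min_{\theta} \; C \sum_{i=1}^{m} \Bigl[ y^{(i)} \, \mathrm{cost}_1\!\left(\theta^{T} x^{(i)}\right) + \bigl(1 - y^{(i)}\bigr) \, \mathrm{cost}_0\!\left(\theta^{T} x^{(i)}\right) \Bigr] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^{2},
\qquad
\mathrm{cost}_1(z) = \max(0,\, 1 - z), \quad \mathrm{cost}_0(z) = \max(0,\, 1 + z)
```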

This is the boundary that classifies the samples.

3.2 Artificial neural networks (ANN)

An ANN is a model that can learn patterns from data by simulating the human nervous system [13]. ANNs are applied in various scientific areas and to engineering problems because they can approximate any non-linear function without being explicitly programmed.

The architecture of the ANN used in this paper is the most popular one, the feedforward artificial neural network. It has three kinds of layers: an input layer, one or several hidden layers and an output layer. The basic elements of each layer are called neurons. The letter recognition dataset has 16 attributes as input features and 26 capital letters (A to Z) as output labels, which corresponds to 16 inputs and 26 outputs. Figure 2 shows a typical simplified ANN structure with 2 inputs, 3 hidden neurons and 1 output neuron.

All the above neurons have the same structure (Figure 3) and consist of two units: a sum unit and a function unit. All of the weighted input values are summed into a single value x. The function applied to this sum to produce the neuron's output is called the output function or activation function, and is represented by f(x).

There are many output functions, including the identity function or a simple linear function. We use the most common one, the sigmoid function, which was introduced above.

The output of each neuron can then be calculated with the following formula, based on the typical ANN structure mentioned above [16]:

where:

The example ANN performs only binary classification, since it has just one output neuron. Our study has 26 output neurons, each allocated to a class (capital letters from A to Z). Since each output neuron applies one sigmoid function, the output of our ANN is 26 numbers between 0 and 1. The recognized class is then the one whose neuron has the highest value.

To differentiate the output classes in the ANN, different weights are assigned to the neuron inputs. Together with the input values, the weights determine the neuron’s output between 0 and 1 through the sigmoid function. The purpose of network training is to find the weight values that give the most accurate classification.

The training algorithm is called backpropagation, which updates all the weights using the differences between the actual response and the response computed by the network. Training proceeds for a given number of iterations, until it outputs a satisfactory result with appropriate weights.
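The forward computation of the simplified 2-3-1 network, and the highest-output rule for 26 classes, can be sketched in R as follows. All weights, biases and output activations are made-up values, and `layer_forward` is a hypothetical helper name.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# One fully connected layer: each neuron sums its weighted inputs plus a bias,
# then applies the sigmoid activation
layer_forward <- function(input, W, b) sigmoid(as.vector(W %*% input + b))

# 2-3-1 network as in the simplified example; all weights are made-up values
W_hidden <- matrix(c( 0.5, -0.3,
                      0.8,  0.2,
                     -0.6,  0.9), nrow = 3, byrow = TRUE)
b_hidden <- c(0.1, -0.1, 0.0)
W_out <- matrix(c(1.0, -1.0, 0.5), nrow = 1)
b_out <- 0.2

x <- c(1, 2)
hidden <- layer_forward(x, W_hidden, b_hidden)   # three hidden activations
output <- layer_forward(hidden, W_out, b_out)    # one number in (0, 1)

# With 26 output neurons, the predicted letter is the one with the largest output
scores <- setNames(rep(0.1, 26), LETTERS)        # hypothetical output activations
scores["Q"] <- 0.9
predicted_letter <- names(which.max(scores))
```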

4 Overview of SVM and ANN recognition system

The proposed recognition system for the SVM and neural network was implemented using R 3.4.3.

4.1 SVM system

The flow of the proposed SVM recognition system is shown in Figure 4.

Figure 4. The flow of the proposed SVM recognition system

4.1.1 Preparation of data

Read the data into R and confirm that the data has 16 features, which define each example of the letter class.

In this case, each feature is an integer. Some of these integer variables appear to have quite wide ranges, which suggests the need for standardization or normalization of the data. Fortunately, the R package used to fit the support vector machine model rescales the data automatically.

Now we enter the training and testing stage of the machine learning process. We use the first 16,000 records to build the models and the remaining 4,000 records to test them. We create training and test data frames. The code is as follows:
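A minimal sketch of this split, assuming the data frame has one `letter` column followed by 16 integer attributes. To stay runnable without the CSV file, it builds a synthetic stand-in of the same shape; the names `letters_df`, `letters_train` and `letters_test` are chosen here for illustration.

```r
# In the actual workflow the data would be read from the CSV file, e.g.
# letters_df <- read.csv("~/ML/letterdata.csv", stringsAsFactors = TRUE)
# Here a synthetic stand-in with the same shape keeps the sketch runnable.
set.seed(1)
letters_df <- data.frame(
  letter = factor(sample(LETTERS, 20000, replace = TRUE), levels = LETTERS),
  matrix(sample(0:15, 20000 * 16, replace = TRUE), ncol = 16)
)

str(letters_df[, 1:4])                       # confirm the column types

letters_train <- letters_df[1:16000, ]       # first 16,000 records
letters_test  <- letters_df[16001:20000, ]   # remaining 4,000 records
```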

4.1.2 Training SVM

The e1071 package from the Department of Statistics of the Vienna University of Technology provides an R interface to the LIBSVM library. We call its svm() function on the training data. We start by training a simple linear support vector machine classifier, using the kernel = "linear" option to specify a linear kernel function.

Depending on computer performance, this operation may take some time to complete. When it is finished, type the name of the model to see some basic information about the training parameters and the fit of the model.

This information tells us little about how well the model works in the real world. Therefore, we need to study the performance of the model on the test data set to determine whether it generalizes well to unseen data.
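The training call can be sketched as follows. Because the letter data file is not assumed to be available here, the built-in iris data set serves as a stand-in; with the letter data the call would take the form `svm(letter ~ ., data = letters_train, kernel = "linear")`, where `letters_train` is a hypothetical name for the training data frame.

```r
library(e1071)   # R interface to LIBSVM

# Stand-in data: the built-in iris set replaces the letter data here
set.seed(42)
idx   <- sample(nrow(iris), 120)
train <- iris[idx, ]
test  <- iris[-idx, ]

# Linear-kernel SVM; svm() scales the features by default (scale = TRUE)
linear_model <- svm(Species ~ ., data = train, kernel = "linear")

# Printing the model shows the kernel, the cost and the number of support vectors
linear_model
```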

4.1.3 Evaluating the performance of the model

The predict() function allows us to make predictions on the test data using the letter classification model.

We need to compare the predicted values with the true values in the test data. For this purpose, we use the table() function. Only part of the table is shown below.

The diagonal values 137, 151, 126, 127, 140, 127 and 132 give the numbers of records for which the predicted value matches the true value. The numbers of mistakes are also listed. For example, the value 3 in row Z and column T indicates that there are 3 cases where the letter T is mistaken for the letter Z.

Looking at each type of error individually may reveal interesting patterns about specific letters that are difficult for the model to identify, but it is also time consuming. Therefore, we can simplify our evaluation by calculating the overall accuracy, that is, considering only whether each prediction is correct or incorrect and ignoring the type of error.

The following command returns a vector of TRUE or FALSE values to indicate whether the letter predicted by the model matches the real letter in the test data.

Using the table() function, we see that of the 4,000 test records, 3,364 letters are correctly identified by the classifier.

As a percentage, the accuracy is about 84.1%:
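The evaluation steps above can be sketched in base R. The `actual` and `predicted` vectors are small made-up stand-ins for the true test letters and the output of predict(); only the last line uses the counts reported in the text.

```r
# Toy stand-ins for the true and predicted letters (hypothetical values)
lv        <- c("A", "B", "T", "Z")
actual    <- factor(c("A", "B", "T", "T", "Z", "Z"), levels = lv)
predicted <- factor(c("A", "B", "T", "Z", "Z", "Z"), levels = lv)

# Confusion table: rows are predictions, columns are true letters
confusion <- table(predicted, actual)

# TRUE where the prediction matches the true letter
agreement <- predicted == actual
table(agreement)
accuracy <- mean(agreement)

# The same arithmetic for the counts reported in the text
reported_accuracy <- 3364 / 4000   # 0.841
```

In this toy table, the entry in row Z and column T counts one case where a T was mistaken for a Z.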

4.1.4 Model performance tuning

The previous support vector machine uses a simple linear kernel function. By using a more complex kernel function, we can map the data to a higher-dimensional space and possibly obtain a better fit. A popular practice is to start with the Gaussian RBF kernel function, because it has been shown to work well for many types of data. We can use the svm() function to train an RBF-based support vector machine. The default kernel of the svm() function in e1071 is kernel = "radial" (RBF), so it does not need to be set. The code is as follows:

Then, we predict as before:

Finally, we compare the accuracy with our linear support vector machine.

By changing the kernel function, we improved the accuracy of the character recognition model from 84.1% to 94.2%.
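The kernel comparison can be sketched in the same way. As before, the built-in iris data set is only a stand-in for the letter data, so the accuracies it produces say nothing about the 84.1% and 94.2% reported above; `accuracy` is a hypothetical helper.

```r
library(e1071)

# Stand-in data: the built-in iris set replaces the letter data here
set.seed(42)
idx   <- sample(nrow(iris), 120)
train <- iris[idx, ]
test  <- iris[-idx, ]

linear_model <- svm(Species ~ ., data = train, kernel = "linear")
rbf_model    <- svm(Species ~ ., data = train)   # kernel defaults to "radial"

# Share of test records whose predicted class equals the true class
accuracy <- function(model) mean(predict(model, test) == test$Species)
results  <- c(linear = accuracy(linear_model), rbf = accuracy(rbf_model))
results
```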

4.2 ANN system

The nnet package was adopted to implement a feedforward artificial neural network with one hidden layer of sigmoid neurons. Figure 5 shows the training and testing procedure using the nnet package in R.

*Figure 5. The flow of the proposed ANN recognition system*

4.2.1 Installing nnet package & dataset classification

First, the nnet package should be installed and loaded in RStudio. Then we load the source data and split it into trainset and testset, with the record counts described in section 2. This is done in R as follows:

install.packages("nnet")  # the package name must be quoted

library(nnet)

# stringsAsFactors = TRUE keeps the letter column a factor in R 4.0 and later

letters <- read.csv("~/ML/letterdata.csv", stringsAsFactors = TRUE)

trainset <- letters[1:16000, ]

testset <- letters[16001:20000, ]

4.2.2 Fit ANN with nnet

Although many parameters are fixed in the nnet package, some parameters can still be tuned to maximize classification accuracy. Table 2 shows the meanings and default values of the arguments used in our training procedure.

Twelve different neural network architectures were chosen based on combinations of these parameters. Note that for an ANN with large input values, the parameter rang should be chosen so that rang × max(|x|) is approximately 1.

In this experiment, we set rang based on this formula in neural networks 11 and 12.
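As a worked example of this rule of thumb, which is taken here from the nnet documentation (rang × max(|x|) ≈ 1), the attributes of the letter data range from 0 to 15, so the suggested value can be computed as below. The variable names are illustrative.

```r
# Attributes in the letter data range from 0 to 15 (section 2)
max_abs_input <- 15

# Rule of thumb: rang * max(|x|) should be about 1
rang_suggested <- 1 / max_abs_input   # roughly 0.067
```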

The training function is realized in R as follows:

letters.nn = nnet(letter ~ .,data = trainset,rang = 0.1,decay = 5e-4,size = 20,maxit = 5000)

4.2.3 Prediction using ANN

Then we use the trained ANN model to make predictions on testset. The results are explained in detail in section 5. The prediction is done in R as follows:

## testset

letters.predict = predict(letters.nn, testset, type = "class")

4.2.4 Performance evaluation

In this part, we evaluate the performance of the network model. The comparative analysis is presented in section 5. In R this is done using the function confusionMatrix from the caret package as follows:

library(caret)  # provides confusionMatrix()

# use union to ensure both factors share the same levels

u = union(testset$letter,letters.predict)

nn.table = table(factor(letters.predict, u), factor(testset$letter, u))

# Evaluate the result

confusionMatrix(nn.table)

4.2.5 Plotting ANN

Figure 6 shows one of the ANN structures adopted in our test, with 20 neurons in the hidden layer. Note the bias neurons B1 and B2, which feed the hidden layer and the output layer and allow the neural network to shift the output value. Larger weights are shown with thicker lines, and color indicates sign: black and grey represent positive and negative weights respectively [18].

This is done with the function plotnet in R as follows:

#plot nn

library(NeuralNetTools)

plotnet(letters.nn)

5 Results & Discussion

5.1 SVM system evaluation

The following table shows the prediction results for single-character recognition with the 2 SVM classifiers. The statistics show the recognition accuracy for each of the letters A to Z. For the SVM with a radial basis function kernel, 23 of the 26 per-letter accuracies are over 90%, a clearly better performance than the linear SVM.

Table 5 presents the recognition results of the two SVM systems, with linear and radial basis function (RBF) kernels; the SVM classifier with the RBF kernel shows better accuracy.

Figure 7 shows the recognition accuracy of the two SVM approaches above. The SVM with the RBF kernel performs clearly better than the linear SVM, by about ten percentage points.

*Figure 7 Comparing the recognition accuracy for SVM with different kernel functions*

5.2 ANN system evaluation

Table 6 shows the prediction results of the 12 ANNs. The statistics show the recognition accuracy for each of the letters A to Z and the total result. We can conclude that network 10 presents the best performance.

Figure 8 shows the overall recognition accuracy of all 12 ANN approaches. Only one of them obtains a recognition accuracy over 90%.

*Figure 8 Comparing the recognition accuracy for 12 ANN approaches*

5.3 Compare ANN & SVM

The results in Table 7 show that the SVM with a radial basis function (RBF) kernel performs better than the artificial neural network.

Figure 9 visualizes the comparison.

*Figure 9 Comparing the recognition accuracy of SVM and ANN*

In our report, we use 2 support vector machines and 12 neural network classifiers. The support vector machine classifier with a linear kernel identifies letters with an accuracy of 84.1%, which is better than 9 of the neural network classifiers, while the support vector machine with the RBF kernel reaches an accuracy of 94.2%, which exceeds all 12 neural network classifiers. This result suggests that SVM has an advantage for small samples, and it is indeed one of the best classifiers.

The support vector machine is based on statistical learning theory, so it has a rigorous theoretical and mathematical basis, unlike neural networks, whose structure design relies on the designer’s experience and prior knowledge. Compared with neural network learning methods, the support vector machine has the following advantages:

1) The support vector machine (SVM) is based on the structural risk minimization (SRM) principle and therefore has good generalization ability.

2) The kernel function maps a nonlinear problem in the input space to a high-dimensional space, where a linear function is constructed.

3) The algorithm can be transformed into a convex optimization problem, which guarantees the global optimality of the solution and avoids the local minimum problem that neural networks cannot avoid.

4) The support vector machine has a rigorous theoretical and mathematical foundation, avoiding the empirical components of neural network implementation.

6 Conclusion

The support vector machine (SVM) is a new-generation learning machine based on statistical learning theory. It has many attractive features. It is superior to the traditional artificial neural network in function expression, generalization and learning efficiency.

The SVM finds its support vectors by quadratic programming, which involves computation with a matrix of order m (m is the number of samples). When m is large, the storage and calculation of the matrix consume a large amount of machine memory and computation time.

The main improvements addressing this problem include J. Platt’s SMO algorithm, T. Joachims’ PCGC, Zhang’s CSVM, and O. L. Mangasarian’s SOR.

In this report, we have studied two machine learning methods with great potential in letter recognition and classification. The results show that, compared with artificial neural networks (ANN), the support vector machine (SVM) has better approximation and generalization ability. Future research on applications of the SVM will spur greater interest in various fields because of its unique strengths.

7 Bibliography

[1] M. R. Phangtriastu, Jeklin Harefa and Dian Felita Tanoto, “Comparison Between Neural Network and Support Vector Machine in Optical Character Recognition,” Procedia Computer Science, vol. 116, pp. 351–357, 2017.

[2] Malon C, Uchida S and Suzuki M, “Mathematical symbol recognition with support vector machines,” 2008. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167865508000603.

[3] Rao NV and Pradesh A, “OPTICAL CHARACTER RECOGNITION TECHNIQUE,” Technology AI, vol. 83, no. 2, 2016.

[4] Mahto MK, Bhatia K and Sharma RK, “Combined horizontal and vertical projection feature extraction technique for Gurmukhi handwritten character recognition,” in 2015 International Conference on Advances in Computer Engineering and Applications, 2015.

[5] M. OVR, “Zoning based Devanagari Character Recognition,” vol. 27, no. 4, p. 21–5, 2011.

[6] C. L. Blake and C. J. Merz, “UCI repository of machine learning databases,” 1998. [Online]. Available: https://archive.ics.uci.edu/ml/datasets/letter+recognition.

[7] J. Bell, “Support vector machines,” Machine Learning: Hands-On for Developers and Technical Professionals, pp. 139–160, 2014.

[8] N. M. Nasrabadi, “Pattern recognition and machine learning,” Journal of electronic imaging, vol. 049901, no. 16, p. 4, 2007.

[9] R. E. Schapire, “The boosting approach to machine learning: An overview,” Nonlinear estimation and classification, pp. 149–171, 2003.

[10] Dreiseitl, Stephan and Lucila Ohno-Machado, “Logistic regression and artificial neural network classification models: a methodology review,” Journal of biomedical informatics, vol. 35, no. 5–6, pp. 352–359, 2002.

[11] C. Robert, “Machine learning, a probabilistic perspective,” pp. 62–63, 2014.

[12] Jordan, Michael I. and Tom M. Mitchell, “Machine learning: Trends, perspectives, and prospects,” Science, vol. 349, no. 6245, pp. 255–260, 2015.

[13] Gazzah, Sami and Najoua Ben Amara, “Neural networks and support vector machines classifiers for writer identification using Arabic script,” International Arab Journal of Information Technology (IAJIT), vol. 5, p. 1, 2008.

[14] V. D. Do and Dong-Min Woo, “Handwritten Character Recognition Using Feedforward Artificial Neural Network,” in 7th International Conference on Latest Trends in Engineering & Technology, Pretoria, 2015.

[15] Kumar, Parveen, N. Sharma and A. Rana, “Handwritten Character Recognition using Different Kernel based SVM Classifier and MLP Neural Network (A COMPARISON),” International Journal of Computer Applications, vol. 53, no. 11, 2012.

[16] S. D. B. M. N. L. M. M. K. a. D. K. B. Arora, “Performance comparison of SVM and ANN for handwritten Devnagari character recognition,” vol. 1006, no. 5902, 2010.

[17] Brian Ripley and William Venables, “Feed-Forward Neural Networks and Multinomial Log-Linear Models,” 2 2 2016. [Online]. Available: https://cran.r-project.org/web/packages/nnet/nnet.pdf. [Accessed 4 6 2018].

[18] S. Thompson, “Visualizing neural networks in R — update,” 4 3 2013. [Online]. Available: https://beckmw.wordpress.com/tag/nnet/. [Accessed 4 6 2018].

Performance Comparison of SVM and ANN for Handwritten Character Recognition
