Abhay Parashar


Build & Deploy Diabetes Prediction app using Flask, ML and Heroku

End-to-end machine learning project, from training a model to deploying it on Heroku

Photo by CHUTTERSNAP on Unsplash

Welcome, my friend,

According to CNBC, the trendiest jobs for the upcoming decade are data scientist and machine learning engineer. This is a great time to learn some machine learning algorithms and build projects with them.

Plenty of tutorials and blogs explain different types of machine learning algorithms, but most of them do not show you how to build a project with those algorithms and then deploy it.

Don’t worry. In this blog, we are going to create an end-to-end machine learning project and then deploy it on Heroku.

We are going to complete it in four steps:

  1. Creating a model using machine learning
  2. Creating a web app using Flask and connecting it to the model
  3. Committing the project to GitHub
  4. Deploying the model on Heroku

Before Getting Started, Let’s Set Up Our Environment

  1. Download the latest version of Python.

  2. Install the required packages. All of them can be installed with pip from the command prompt (terminal):

pip install pandas numpy matplotlib scikit-learn seaborn

  3. Install Jupyter Notebook:

pip install notebook
jupyter notebook   # starts Jupyter in your browser

After completing all three steps, let’s start working on the project.

Open a new notebook in Jupyter and follow along with the steps below.

1. Creating a Model Using Machine Learning

Import the necessary libraries

# Importing libraries
import numpy as np
np.random.seed(42)  # fix NumPy's global seed so the output stays the same between runs
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# Show plots inside the notebook (keep comments off the magic line, where they can be misread as arguments)
%matplotlib inline
# Models
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
# Evaluation
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
from sklearn.metrics import precision_score, recall_score, f1_score
try:
    from sklearn.metrics import plot_roc_curve  # available in scikit-learn < 1.2
except ImportError:
    # plot_roc_curve was removed in scikit-learn 1.2;
    # RocCurveDisplay.from_estimator accepts the same (estimator, X, y) arguments
    from sklearn.metrics import RocCurveDisplay
    plot_roc_curve = RocCurveDisplay.from_estimator
# Warnings
from warnings import filterwarnings
filterwarnings("ignore")  # hide all warnings (e.g. convergence warnings)

Load the dataset

The dataset is available on Kaggle, but we are going to load it from my GitHub repository.

data = pd.read_csv("https://raw.githubusercontent.com/Abhayparashar31/Diabetes-prediction/master/diabetes.csv")

About the Dataset

The dataset consists of several medical predictor (independent) variables and one target (dependent) variable, Outcome. The independent variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.

Columns

Pregnancies: Number of times pregnant
Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
BloodPressure: Diastolic blood pressure (mm Hg)
SkinThickness: Triceps skinfold thickness (mm)
Insulin: 2-hour serum insulin (mu U/ml)
BMI: Body mass index (weight in kg/(height in m)²)
DiabetesPedigreeFunction: Provides some data on the diabetes mellitus history in relatives and the genetic relationship of those relatives to the patient
Age: Age (years)
Outcome: Class variable (0 or 1); 268 of the 768 rows are 1, the others are 0
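The BMI column follows the formula given in its description: weight in kilograms divided by the square of height in metres. A minimal sketch; the function name `bmi` and the sample person are illustrative only and do not come from the dataset:

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# A hypothetical 70 kg person who is 1.75 m tall
print(round(bmi(70, 1.75), 1))  # 22.9
```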

Task

Build a machine learning model that accurately predicts whether or not the patients in the dataset have diabetes.
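Because the classes are imbalanced (268 of the 768 rows have Outcome 1), it helps to know the accuracy of a model that ignores every feature and always predicts the majority class, "no diabetes". A useful model should beat this number. A small sketch with scikit-learn's `DummyClassifier`; the label array is rebuilt from the class counts quoted above, and the all-zero feature matrix is a placeholder because the dummy model never looks at the features:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Rebuild a label vector with the class counts from the dataset description
y_counts = np.array([1] * 268 + [0] * 500)
X_placeholder = np.zeros((len(y_counts), 1))  # features are ignored by the dummy model

baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_placeholder, y_counts)
print(baseline.score(X_placeholder, y_counts))  # 500 / 768, roughly 0.651
```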

EDA on Dataset

print(data.shape)          ### Return the shape of data (rows, columns)
print(data.ndim)           ### Return the number of dimensions of data
print(data.size)           ### Return the size of data (rows * columns)
print(data.isna().sum())   ### Return the number of NaN values in each column
data.info()                ### Print a concise summary of the DataFrame (info() prints directly and returns None)
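A quick way to read these outputs: `size` is rows × columns (for 768 rows and 9 columns that is 6,912), and `isna()` only counts real `NaN` entries. A value stored as 0 is not counted as missing, even for a column such as blood pressure or BMI where 0 is not a plausible measurement. The sketch below uses a tiny made-up DataFrame, not the real dataset, to show both points and adds a zero count per column as a cheap extra check:

```python
import numpy as np
import pandas as pd

# Tiny made-up frame with some of the dataset's column names (values are illustrative only)
df = pd.DataFrame({
    "Glucose": [148, 85, 0],
    "BloodPressure": [72, np.nan, 64],
    "BMI": [33.6, 26.6, 0.0],
})

print(df.shape, df.ndim, df.size)  # (3, 3) 2 9 -> size == rows * columns
print(df.isna().sum())             # only the real NaN in BloodPressure is counted
print((df == 0).sum())             # zeros that might hide missing measurements
```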

Let’s visualise some of the columns and compare them.

data["Outcome"].value_counts().plot(kind="bar",color=["salmon","deeppink"])
plt.xticks(np.arange(2), ('No Diabetes', 'Diabetes'),rotation=0);
Image by Author
# Comparing Glucose with the Outcome
pd.crosstab(data.Glucose[::15],data.Outcome).plot(kind="bar",figsize=(18,8),color=["yellow","deeppink"])
plt.ylabel("people");
plt.xticks(rotation=0);
plt.legend(['No Diabetes', 'Diabetes']);
Image by Author
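`pd.crosstab` counts how often each combination of values occurs: every distinct glucose value becomes a row and each Outcome class a column, which is what the bar chart above draws. The `[::15]` slice keeps only every 15th row so that the chart stays readable. A toy example with made-up values, not taken from the dataset:

```python
import pandas as pd

# Made-up glucose readings and outcomes
glucose = pd.Series([100, 100, 120, 120, 120, 150], name="Glucose")
outcome = pd.Series([0, 1, 0, 0, 1, 1], name="Outcome")

# Each cell holds the number of rows with that (Glucose, Outcome) pair;
# combinations that never occur are filled with 0
table = pd.crosstab(glucose, outcome)
print(table)
```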
# Plot blood pressure against age, highlighting the entries that have diabetes
plt.figure(figsize=(10,6))
# Scatter with positive example
plt.scatter(data.Age[data.Outcome==1],data.BloodPressure[data.Outcome==1],c="Red");
# Scatter with negative example
plt.scatter(data.Age[data.Outcome==0],data.BloodPressure[data.Outcome==0],c="lightblue");
# Add some helpful info
plt.title("Diabetes in function of Age and Blood pressure")
plt.xlabel("Age")
plt.ylabel("Blood Pressure")
plt.legend(["Diabetes","No Diabetes"]);
Image by Author
## Pairplotting of dataframe
import seaborn as sns
sns.set(style="ticks", color_codes=True)
sns.pairplot(data,hue='Outcome',palette='gnuplot');
Image by Author
# Histograms of all feature columns when the Outcome is 1 (has diabetes)
fig, ax = plt.subplots(nrows=4, ncols=2, figsize=(12, 10))
fig.tight_layout(pad=3.0)
ax[0,0].set_title('Glucose')
ax[0,0].hist(data.Glucose[data.Outcome==1]);
ax[0,1].set_title('Pregnancies')
ax[0,1].hist(data.Pregnancies[data.Outcome==1]);
ax[1,0].set_title('Age')
ax[1,0].hist(data.Age[data.Outcome==1]);
ax[1,1].set_title('Blood Pressure')
ax[1,1].hist(data.BloodPressure[data.Outcome==1]);
ax[2,0].set_title('Skin Thickness')
ax[2,0].hist(data.SkinThickness[data.Outcome==1]);
ax[2,1].set_title('Insulin')
ax[2,1].hist(data.Insulin[data.Outcome==1]);
ax[3,0].set_title('BMI')
ax[3,0].hist(data.BMI[data.Outcome==1]);
ax[3,1].set_title('Diabetes Pedigree Function')
ax[3,1].hist(data.DiabetesPedigreeFunction[data.Outcome==1]);
Image by Author
# Correlation matrix between columns
## It shows the correlation (positive or negative) between every pair of numeric columns
corr_matrix = data.corr()
fig, ax = plt.subplots(figsize=(15,10))
ax = sns.heatmap(corr_matrix, annot=True, linewidth=0.5, fmt=".2f", cmap="YlGnBu", ax=ax)
Image by Author
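By default `DataFrame.corr()` computes the Pearson correlation coefficient: the covariance of two columns divided by the product of their standard deviations, so every cell lies between -1 and 1. The sketch below checks that definition against pandas on two made-up columns (the values are illustrative, not rows from the dataset):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Glucose": [85, 89, 116, 137, 148, 183],
                   "Age": [31, 21, 30, 33, 50, 32]})  # made-up values

g = df["Glucose"].to_numpy(dtype=float)
a = df["Age"].to_numpy(dtype=float)
# Pearson r = cov(g, a) / (std(g) * std(a)); the 1/n factors cancel out
r_manual = ((g - g.mean()) * (a - a.mean())).sum() / np.sqrt(
    ((g - g.mean()) ** 2).sum() * ((a - a.mean()) ** 2).sum())

r_pandas = df.corr().loc["Glucose", "Age"]
print(round(r_manual, 4), round(r_pandas, 4))  # the two values agree
```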

Modeling and Training

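# Random data shuffling
# note: sample() returns a shuffled copy; it is not assigned here, so `data` keeps its order.
# train_test_split below shuffles the rows by default anyway.
data.sample(frac=1)
# Splitting the data into features (X) and target (y)
X = data.drop("Outcome", axis=1)
y = data["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)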
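With `test_size=0.2`, scikit-learn rounds the test-set size up, so 768 rows become 154 test rows and 614 training rows. That also means one extra correct prediction moves test accuracy by about 0.65 percentage points (1/154), which is worth keeping in mind when comparing the scores below. The sketch uses placeholder arrays of the same length; `stratify` and `random_state=42` are additions not used in the article, shown as one option for keeping the 268/500 class ratio similar in both parts and making the split repeatable:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the dataset's size and class counts (268 positives of 768)
y_demo = np.array([1] * 268 + [0] * 500)
X_demo = np.zeros((len(y_demo), 8))  # 8 predictor columns; values are irrelevant here

X_demo_train, X_demo_test, y_demo_train, y_demo_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42)

print(len(X_demo_train), len(X_demo_test))  # 614 154
# Positive rate in each part stays close to 268 / 768
print(round(y_demo_train.mean(), 3), round(y_demo_test.mean(), 3))
```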

We are going to train models with four algorithms: 1. Logistic Regression, 2. KNN, 3. Random Forest Classifier, 4. Support Vector Machine.

## Build a model (Logistic Regression)
from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression(random_state=0)
log_reg.fit(X_train, y_train);
## Evaluate the model (store the score separately so the fitted model is not overwritten)
log_reg_score = log_reg.score(X_test, y_test)
## Build a model (KNN)
knn = KNeighborsClassifier()
knn.fit(X_train, y_train);
## Evaluate the model
knn_score = knn.score(X_test, y_test)
## Build a model (Random Forest Classifier)
clf = RandomForestClassifier()
clf.fit(X_train, y_train);
## Evaluate the model
clf_score = clf.score(X_test, y_test)
## Build a model (Support Vector Machine)
svm = SVC()
svm.fit(X_train, y_train);
## Evaluate the model
svm_score = svm.score(X_test, y_test)

Let’s visualize the test-set accuracy of all the models

model_compare = pd.DataFrame({"Logistic Regression": log_reg_score,
                              "KNN": knn_score,
                              "Random Forest Classifier": clf_score,
                              "Support Vector Machine": svm_score},
                             index=["accuracy"])
model_compare.T.plot.bar(figsize=(15,10));
##############OUTPUT###############

          Logistic Regression       KNN  Random Forest Classifier       SVM
accuracy             0.818182  0.772727                  0.798701  0.818182
Image by Author

Here we can see that SVM and Logistic Regression tie for the best result, with an accuracy of about 81.8%. We can try to improve the accuracy further using hyperparameter tuning.
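Because the test set has only 154 rows, the gaps between these accuracies amount to just a few predictions (0.818 vs 0.799 is three rows). A steadier comparison averages several splits with `cross_val_score`, which the article already imports. The sketch below runs 5-fold cross-validation for the same four model types on synthetic data from `make_classification`, an assumption standing in for the diabetes features so the snippet runs on its own; in the notebook you would pass `X` and `y` instead:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in with 768 rows and 8 features, like the diabetes data
X_demo, y_demo = make_classification(n_samples=768, n_features=8, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Random Forest Classifier": RandomForestClassifier(random_state=42),
    "Support Vector Machine": SVC(),
}

cv_results = {}
for name, model in models.items():
    scores = cross_val_score(model, X_demo, y_demo, cv=5)  # accuracy on each of the 5 folds
    cv_results[name] = scores
    print(f"{name}: mean {scores.mean():.3f}, std {scores.std():.3f}")
```

Reporting the standard deviation next to the mean shows how much of a gap between two models could simply be split-to-split noise.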

Improving accuracy using Hyperparameter tuning

We are going to use both GridSearchCV and RandomizedSearchCV for hyperparameter tuning.

For logistic regression, two hyperparameters that are easy to tune are C (the inverse of the regularization strength) and solver.
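Small values of `C` penalise large coefficients heavily, while large values let the model fit the training data more closely. `np.logspace(-4, 4, 20)` spreads 20 candidates evenly on a log scale from 0.0001 to 10,000, so every order of magnitude gets tried. A sketch on synthetic data (an assumption, not the diabetes set) showing both points:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

c_values = np.logspace(-4, 4, 20)
print(f"{c_values[0]:g} {c_values[-1]:g}")  # 0.0001 10000

X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

# Compare the size of the learned coefficients under strong and weak regularization
coef_norms = {}
for C in (1e-3, 1e3):
    model = LogisticRegression(C=C, solver="liblinear").fit(X_demo, y_demo)
    coef_norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C:g}: coefficient norm {coef_norms[C]:.3f}")
# The strongly regularized model (C=0.001) ends up with much smaller weights
```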

Hyperparameter Tuning using RandomizedSearchCV

# Create a hyperparameter grid for LogisticRegression
log_reg_grid = {"C": np.logspace(-4, 4, 20),"solver": ["liblinear"]}
# Tune LogisticRegression
np.random.seed(42)
# Setup random hyperparameter search for LogisticRegression
rs_log_reg = RandomizedSearchCV(LogisticRegression(),
                                  param_distributions=log_reg_grid,
                                  cv=5,
                                  n_iter=20,
                                  verbose=True)
# Fit random hyperparameter search model for LogisticRegression
rs_log_reg.fit(X_train, y_train)
score = rs_log_reg.score(X_test,y_test)
print(score*100)
##########OUTPUT###########
83.11688311688312

Great, using RandomizedSearchCV we have increased the accuracy by about 1.3 percentage points (from 81.8% to 83.1%).
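With 20 fixed C values and n_iter=20, the randomized search above ends up trying every value, so it behaves much like a grid search. A sketch of a variant that takes advantage of random search, assuming synthetic data and scipy's loguniform distribution, samples C from a continuous log-uniform range instead, and then inspects the winning parameters.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Sample C from a log-uniform distribution on [1e-4, 1e4] instead of a fixed list.
param_distributions = {"C": loguniform(1e-4, 1e4), "solver": ["liblinear"]}

rs = RandomizedSearchCV(
    LogisticRegression(),
    param_distributions=param_distributions,
    n_iter=20,
    cv=5,
    random_state=42,  # reproducible sampling without touching np.random's global seed
)
rs.fit(X_train, y_train)

print(rs.best_params_)                     # the sampled C that scored best in CV
print(round(rs.best_score_, 3))            # its mean cross-validated accuracy
print(round(rs.score(X_test, y_test), 3))  # accuracy of the refitted model on the test set
```

best_score_ is measured on the cross-validation folds of the training set, so it can differ from the test-set score printed on the last line.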

Hyperparameter Tuning using GridSearchCV

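import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
log_reg_grid = {'C': np.logspace(-4,4,30),
"solver":["liblinear"]}
#set up the grid search cv
gs_log_reg = GridSearchCV(LogisticRegression(),
                          param_grid=log_reg_grid,
                          cv=5,
                          verbose=True)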
#fit grid search cv
gs_log_reg.fit(X_train,y_train)
score = gs_log_reg.score(X_test,y_test)
print(score*100)
########OUTPUT#########
83.76623376623377

Great, using GridSearchCV we have increased the accuracy by about 1.9 percentage points (from 81.8% to 83.8%).
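Unlike the randomized search, GridSearchCV is exhaustive: 30 values of C times 1 solver gives 30 candidates, and with cv=5 that is 150 fits plus one final refit on the full training data. A minimal sketch on synthetic data (an assumption for illustration) that loads cv_results_ into a DataFrame to see how each C ranked:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=768, n_features=8, random_state=0)

grid = {"C": np.logspace(-4, 4, 30), "solver": ["liblinear"]}
gs = GridSearchCV(LogisticRegression(), param_grid=grid, cv=5)
gs.fit(X, y)

# One row per candidate: 30 C values x 1 solver.
results = pd.DataFrame(gs.cv_results_)
print(len(results))  # 30

# Best candidates first, with their mean and spread across the 5 folds.
top = results.sort_values("rank_test_score")[["param_C", "mean_test_score", "std_test_score"]]
print(top.head())
```

If several C values share nearly the same mean score, the spread across folds is a reminder that the "best" C is not sharply determined.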

The best model is the Logistic Regression tuned with GridSearchCV, with about 83.8% accuracy on the test set.

Evaluate the model

Let’s first predict on X_test

y_preds = gs_log_reg.predict(X_test)
y_preds
######OUTPUT#########
array([0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0,0,0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0,0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1,0, 0],dtype=int64)

Let’s look at the confusion matrix, accuracy score, classification report, and ROC curve.

Confusion Matrix

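import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
sns.set(font_scale=2)
# Rows of confusion_matrix are true labels, columns are predicted labels
sns.heatmap(confusion_matrix(y_test,y_preds), annot=True,cbar=False, fmt='g')
plt.xlabel("Predicted label")
plt.ylabel("True label");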
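For a binary problem, the four cells of the heatmap can be unpacked directly, which makes it easy to compute precision and recall by hand. The labels below are a tiny made-up example, not the article's predictions.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels for illustration (1 = diabetic, 0 = not diabetic).
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]

# For labels [0, 1] the matrix is laid out as
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 2 1 1 2

precision = tp / (tp + fp)  # of the patients predicted diabetic, the share who are
recall = tp / (tp + fn)     # of the diabetic patients, the share the model caught
print(precision, recall)
```

For a diabetes screen, the FN cell (diabetic patients predicted healthy) is usually the one worth watching most closely.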

Accuracy Score

from sklearn.metrics import accuracy_score
print(accuracy_score(y_test,y_preds))
#######OUTPUT########
0.8376

Classification Report

from sklearn.metrics import classification_report
print(classification_report(y_test, y_preds))
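classification_report returns a formatted string by default. Passing output_dict=True returns the same numbers as a nested dictionary, which is handier if you want to log or compare specific metrics. The labels here are the same made-up example used for the confusion matrix, not the article's results.

```python
from sklearn.metrics import classification_report

# Made-up labels for illustration (1 = diabetic, 0 = not diabetic).
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]

# Keys are the class labels as strings ("0", "1") plus the summary rows
# such as "accuracy", "macro avg" and "weighted avg".
report = classification_report(y_true, y_pred, output_dict=True)
print(report["1"]["recall"])  # recall for the diabetic class
print(report["accuracy"])     # overall accuracy
```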

ROC Curve

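# plot_roc_curve was removed in scikit-learn 1.2; RocCurveDisplay (scikit-learn >= 1.0) replaces it
from sklearn.metrics import RocCurveDisplay
RocCurveDisplay.from_estimator(gs_log_reg,X_test,y_test)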
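The ROC curve is built from predicted probabilities rather than hard 0/1 predictions, and the area under it (AUC) equals the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative one. A small sketch with made-up scores; for the real model the scores would come from gs_log_reg.predict_proba(X_test)[:, 1].

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1])
# Made-up probabilities for class 1 (diabetic).
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

# One (FPR, TPR) point per distinct threshold; these are what the ROC plot draws.
fpr, tpr, thresholds = roc_curve(y_true, y_scores)

# 3 of the 4 (positive, negative) pairs are ranked correctly.
auc = roc_auc_score(y_true, y_scores)
print(auc)  # 0.75
```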

Save and Load the model

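import pickle
# Save trained model to file (the with-block closes the file afterwards)
with open("Diabetes.pkl", "wb") as f:
    pickle.dump(gs_log_reg, f)
# Load it back and check that it still works
with open("Diabetes.pkl", "rb") as f:
    loaded_model = pickle.load(f)
loaded_model.predict(X_test)
loaded_model.score(X_test,y_test)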
#######OUTPUT########
0.8376623376623377
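scikit-learn's documentation also suggests joblib as an alternative to pickle for persisting estimators. A minimal round-trip sketch, assuming a small model trained on synthetic data and a temporary directory in place of the project folder. As with pickle, only load files you trust, and load them with the same scikit-learn version that saved them.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Diabetes.joblib")
    joblib.dump(model, path)    # write the fitted estimator to disk
    loaded = joblib.load(path)  # read it back
    same = np.array_equal(model.predict(X), loaded.predict(X))

print(same)  # True
```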

2. Creating a web app using Flask and connecting it to the model

To create the web app, let’s first prepare this folder structure:

diabetes(root)
    |____templates
            |___index.html
    |____static
            |____css
            |_____js
    |____app.py
    |_____Diabetes.pkl

Download the templates and static directories from my GitHub.

Let’s create app.py.
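The app.py code itself is not reproduced in this copy of the article. The following is a minimal sketch of what such a file could look like, not the author's version. Assumptions: the form sends one field per feature, named after the standard Pima diabetes columns; the hypothetical /predict route returns JSON instead of re-rendering index.html, because the template's variables are not shown here; and the model is loaded lazily from Diabetes.pkl so the app can start even before the first request.

```python
import pickle

import pandas as pd
from flask import Flask, jsonify, render_template, request

# Assumed feature order: the standard Pima diabetes columns without "Outcome".
# Adjust this list if your training DataFrame had different columns.
FEATURES = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
            "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]
MODEL_PATH = "Diabetes.pkl"  # relative to the folder the app is started from

app = Flask(__name__)


def get_model():
    """Load the pickled model on first use and cache it in app.config."""
    if "MODEL" not in app.config:
        with open(MODEL_PATH, "rb") as f:
            app.config["MODEL"] = pickle.load(f)
    return app.config["MODEL"]


@app.route("/")
def home():
    # Serves templates/index.html (the form from the author's repository).
    return render_template("index.html")


@app.route("/predict", methods=["POST"])
def predict():
    # Hypothetical endpoint: expects one numeric form field per feature name.
    try:
        values = [float(request.form[name]) for name in FEATURES]
    except (KeyError, ValueError):
        return jsonify(error="Please provide a numeric value for every field."), 400
    # Same column names as in training, so the model sees the features in order.
    row = pd.DataFrame([values], columns=FEATURES)
    prediction = int(get_model().predict(row)[0])
    return jsonify(prediction=prediction)
```

To start it with python app.py as described below, append the usual two-line guard at the bottom of the file: if __name__ == "__main__": followed by app.run(debug=True). The gunicorn command in the Procfile imports app directly and does not need that guard.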

Now let’s run our app on localhost.

Open CMD, go to the root (diabetes) folder, and run the app with python app.py. You will see a message like this 👇

“Image by Author”

Open the URL in any browser and test the app with some sample inputs.

3. Commit the Project to GitHub

Before committing the project to GitHub, we are going to create two more files.

1. Procfile: Heroku apps include a Procfile that specifies the commands the app executes on startup. Here, gunicorn app:app tells gunicorn to serve the Flask object named app defined in app.py.

web: gunicorn app:app

2. requirements.txt: The requirements.txt file specifies which Python packages are required to run the project.

Flask==1.1.1
gunicorn==19.9.0
itsdangerous==1.1.0
Jinja2==2.10.1
MarkupSafe==1.1.1
Werkzeug==0.15.5
numpy>=1.9.2
scipy>=0.15.1
scikit-learn>=0.18
matplotlib>=1.4.3
pandas>=0.19
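scikit-learn does not guarantee that a model pickled with one version loads correctly in another, and scikit-learn>=0.18 lets Heroku install whatever version is newest. One option is to pin the exact versions you trained with. A small sketch, assuming Python 3.8+ (for importlib.metadata) and the package list below, prints name==version lines for the current environment that you could paste into requirements.txt.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as pip knows them; adjust to what app.py actually imports.
PACKAGES = ["Flask", "gunicorn", "numpy", "scipy", "scikit-learn", "pandas"]


def pinned_requirements(packages):
    """Return 'name==version' lines for the packages installed in this environment."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Keep a visible note rather than silently dropping the package.
            lines.append(f"# {name} is not installed here")
    return lines


print("\n".join(pinned_requirements(PACKAGES)))
```

Run it in the same environment where you trained and pickled the model, so the pinned scikit-learn version matches Diabetes.pkl.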

After that, go to your GitHub account, upload the files to a repository, and commit them to the branch.

“Image by Author”

4. Deploy the model using Heroku

Visit Heroku, create a free account, and log in.

“Image by Author”
“Image by Author”

You can find all the source code on my GitHub profile, Abhayparashar31.

Thanks for reading 😄

Thanks For Reading Till Here, If You Like My Content and Want To Support Me The Best Way is —

  1. Follow Me On Medium.
  2. Connect With Me On LinkedIn.
  3. Become a Medium Member With The Cost of One Pizza Using My Referral Link. A small part of your membership fee will go to me.
  4. Subscribe To My Email List To Never Miss An Article From Me.