Rashida Nasrin Sucky

Summary

This article provides a step-by-step guide for beginners to develop their first neural network using PyTorch, focusing on heart disease prediction with a dataset from Kaggle.

Abstract

The article, aimed at beginners in deep learning, introduces the development of a neural network using the PyTorch framework. It starts with the importation of necessary packages and preprocessing of the 'Heart.csv' dataset from Kaggle, converting categorical data to numeric form. The author then splits the data into training and test sets and transforms them into torch tensors, preparing them for modeling. A simple neural network with two hidden layers is defined, and the model is trained over 150 epochs using the Adam optimizer and binary cross-entropy loss function. The training process involves batch-wise updates with a batch size of 64, and the article emphasizes the importance of resetting gradients before each backward pass. The model's performance is evaluated on both the training and test datasets, achieving 100% accuracy. The author concludes by noting the manual nature of model training in PyTorch compared to TensorFlow, advocating for the learning of such processes for greater control in industry and research applications.

Opinions

  • The author believes it is beneficial for deep learning practitioners to be comfortable with both TensorFlow and PyTorch.
  • The choice of the number of neurons in the hidden layers is presented as a hyperparameter that requires trial and error to optimize.
  • The author suggests that the manual training process in PyTorch, although potentially more labor-intensive, offers more control and is preferred in certain industry and research contexts.
  • The article promotes the idea that understanding the underlying processes of model training is important for deep learning practitioners.
  • The author expresses satisfaction with the model's performance, achieving perfect accuracy on the test dataset.
Photo by M.T ElGassier on Unsplash

Developing Your First Neural Network in PyTorch

A Complete Step-by-Step Process for Beginners

I have been working and writing tutorials in the deep learning space for a while now, focusing mostly on TensorFlow. But PyTorch is another very widely used deep learning package out there. I think it is a good idea to be comfortable with both packages, so I decided to make tutorials on PyTorch as well.

In that context, this tutorial will build a neural network in PyTorch for beginners. We will work on a project and go through it step by step.

The Heart.csv dataset from Kaggle will be used for this tutorial. Please feel free to download the dataset and follow along:

Heart Attack Analysis & Prediction Dataset (kaggle.com)

This is a public dataset with CC0: Public Domain License.

Let’s dive in!

I would like to start by importing the necessary packages:

import pandas as pd 
import numpy as np 
import torch 
import torch.nn as nn 
from sklearn.model_selection import train_test_split 

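Next, read the dataset into a pandas DataFrame. I am assuming here that the downloaded file is saved as 'heart.csv' in the working directory; adjust the path to wherever you saved it:

df = pd.read_csv('heart.csv')
df.head()
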
There are a few columns that have the data type 'object'. Before any modeling, those columns should be converted to numeric form.

# convert each 'object' (string) column to numeric category codes
for i in df.columns: 
  if df[i].dtype == 'object':
    df[i] = df[i].astype('category').cat.codes 
df

Output:

As you can see, all the data are in numeric form now.

The last column is ‘HeartDisease’, which has two unique values: 0 and 1. That is the target variable, which means the goal of this exercise is to predict HeartDisease from the other parameters available in the table.
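
If you want to confirm that there are only two classes, a quick optional check is to count the rows per class:

df['HeartDisease'].value_counts()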

Defining the training and target variables for the model:

X = df.drop(columns=['HeartDisease'])
y = df['HeartDisease']
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=21)

To use the data in PyTorch models, it needs to be in torch tensor form. But x_train and x_test are DataFrames, and a DataFrame cannot be converted to a tensor directly. So, convert the data to NumPy arrays first and then to tensors:

x_train, x_test, y_train, y_test = np.array(x_train), np.array(x_test), np.array(y_train), np.array(y_test)

trainX = torch.from_numpy(x_train).float()
testX = torch.from_numpy(x_test).float()
trainY = torch.from_numpy(y_train).float() 
testY = torch.from_numpy(y_test).float() 

The data is ready.
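
As a quick sanity check, you can print the tensor shapes; the exact numbers depend on the dataset and the 70/30 split:

print(trainX.shape, trainY.shape)
print(testX.shape, testY.shape)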

Model Development

As this is for beginners, we will go for a simple neural network.

A neural network is a sequence of layers. We will build a simple model with two hidden layers. Let’s see what the model looks like first, and then I will explain it.

class HeartDisease(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden1 = nn.Linear(11, 128)
        self.act1 = nn.ReLU()
        self.hidden2 = nn.Linear(128, 64)
        self.act2 = nn.ReLU()
        self.output = nn.Linear(64, 1)
        self.act_output = nn.Sigmoid()
 
    def forward(self, x):
        x = self.act1(self.hidden1(x))
        x = self.act2(self.hidden2(x))
        x = self.act_output(self.output(x))
        return x

This is a very simple neural network with two hidden layers. The first hidden layer has an input size of 11 and an output size of 128. Here, 11 is the number of features, that is, the number of columns used as training features, and 128 is the number of neurons in the first hidden layer. I chose 128 myself; you can try other numbers as well. The number of neurons is a hyperparameter that needs to be figured out, mostly through a lot of trial and error.

The output of hidden1 must match the input of hidden2. So, the input of hidden2 is 128, and I chose its output to be 64. Finally, in the output layer, the input is 64 and the output size is 1, as this is a binary classification. If you had 10 classes, the output size would be 10, as sketched below.
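
For illustration only, here is what a hypothetical 10-class setup could look like; the names multi_output and multi_loss_fn are my own, and this is not part of this project. With multiple classes, you would typically feed the raw outputs (logits) to CrossEntropyLoss, which applies softmax internally:

multi_output = nn.Linear(64, 10)      # one output per class
multi_loss_fn = nn.CrossEntropyLoss() # expects raw logits and integer labels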

In classification problems, the output needs to be passed through an activation function that produces a probability between 0 and 1, which can then be rounded to 0 or 1.
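
For example, rounding maps a sigmoid output of 0.7 to class 1 and 0.3 to class 0:

probs = torch.tensor([0.3, 0.7]) # example sigmoid outputs
print(probs.round())             # tensor([0., 1.])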

The number of hidden layers is also decided by trial and error.

Then, the forward method passes the input through the layers defined in __init__ and returns the output, which is our predicted value.

Printing the model:

model = HeartDisease()
print(model)

Output:

HeartDisease(
  (hidden1): Linear(in_features=11, out_features=128, bias=True)
  (act1): ReLU()
  (hidden2): Linear(in_features=128, out_features=64, bias=True)
  (act2): ReLU()
  (output): Linear(in_features=64, out_features=1, bias=True)
  (act_output): Sigmoid()
)

Here is the loss function and the optimizer:

import torch.optim as optim
loss_fn = nn.BCELoss() #binary cross entropy 
optimizer = optim.Adam(model.parameters(), lr=0.001)
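
As a side note, an alternative that is often numerically more stable is to drop the final Sigmoid from the model and use BCEWithLogitsLoss, which combines the Sigmoid and the binary cross-entropy in one operation. This is just an option, not what this tutorial uses:

alt_loss_fn = nn.BCEWithLogitsLoss() # expects raw outputs (logits), no Sigmoid layer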

Here comes the model training part. I trained the model for 150 epochs with a batch size of 64. In each epoch, the predicted labels are calculated batch by batch using the model we defined, and the loss is calculated from the predicted and true labels.

The tricky part is that we should reset the gradients to zero before each backward pass. Otherwise, the gradients from the previous batch will accumulate into the current one, and the model training will not be correct.

epochs = 150 
batch_size = 64 

for epoch in range(epochs):
  for i in range(0, len(trainX), batch_size):
    Xbatch = trainX[i:i+batch_size]
    y_pred = model(Xbatch)
    ybatch = trainY[i:i+batch_size]
    loss = loss_fn(torch.flatten(y_pred), ybatch)
    optimizer.zero_grad() # reset gradients left over from the previous batch
    loss.backward()       # backpropagate the loss
    optimizer.step()      # update the weights
  print(f'Finished epoch {epoch}, latest loss {loss}')

Output:

Finished epoch 0, latest loss 0.46334463357925415
Finished epoch 1, latest loss 0.5276321172714233
Finished epoch 2, latest loss 0.5331380367279053
Finished epoch 3, latest loss 0.5323242545127869
...
...
...
Finished epoch 147, latest loss 0.16034317016601562
Finished epoch 148, latest loss 0.14931809902191162
Finished epoch 149, latest loss 0.15581083297729492

I showed only a few of the loss printouts from the model training, to give a sense of how the loss gradually went down. Now, it’s time to check the performance of the model.

Model’s prediction accuracy on the test data:

with torch.no_grad():
  y_pred = model(testX)
# flatten the (N, 1) predictions to match testY's shape, then
# take the mean of the correct/incorrect comparisons
accuracy = (torch.flatten(y_pred).round() == testY).float().mean().item()
accuracy 

Output:

1.0

The prediction accuracy on the training data:

with torch.no_grad():
  y_pred = model(trainX)
accuracy = (torch.flatten(y_pred).round() == trainY).float().mean().item()
accuracy 

Output:

1.0

Wow! The accuracy is 100% for both training and test data.

Conclusion

If you are a TensorFlow user, model training in PyTorch may feel like too manual a process. But in industry and in research, many people like this manual training process because it gives a lot of control. I feel we should at least learn the process so that we can use it when necessary.

More Reading

A Complete Exploratory Data Analysis in Python | by Rashida Nasrin Sucky | Oct, 2023 | Towards AI (medium.com)

Twitter Sentiment Analysis in Python — Sklearn | Natural Language Processing | by Rashida Nasrin Sucky | Nov, 2023 | Towards AI (medium.com)

TensorFlow Model Training Using GradientTape | by Rashida Nasrin Sucky | Oct, 2023 | Towards Data Science (medium.com)

Converting Texts to Numeric Form with TfidfVectorizer: A Step-by-Step Guide | by Rashida Nasrin Sucky | Oct, 2023 | Towards Data Science (medium.com)

Complete Implementation of a Mini VGG Network for Image Recognition | by Rashida Nasrin Sucky | Towards Data Science (medium.com)
