Plotting the Learning Curve with a Single Line of Code
To see how much your model benefits from adding more training data

The Learning Curve is another great tool to have in any data scientist’s toolbox. It is a visualization technique that can be used to see how much our model benefits from adding more training data. It shows the relationship between the training score and the cross-validation (test) score of a machine learning model as the number of training samples varies. Cross-validation is generally used to compute these scores when plotting the learning curve.
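To make the idea concrete, here is a minimal sketch of the computation behind a learning curve, using scikit-learn's `learning_curve` function (which Yellowbrick's visualizer builds on). The model (`SVC()`), the five training-set fractions, and `cv=5` are our own illustrative choices, not settings taken from this article.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import learning_curve
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# For each training-set size, the model is fit on that many samples of every
# training fold and scored on the training subset and on the held-out fold.
train_sizes, train_scores, val_scores = learning_curve(
    SVC(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # 10%, 32.5%, ..., 100% of a training fold
    cv=5,
    scoring="accuracy",
    shuffle=True,      # shuffle before taking the first n samples of each fold
    random_state=0,
)

# Rows are training sizes, columns are CV folds; average across folds.
for n, tr, va in zip(train_sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"{n:4d} samples  train={tr:.3f}  cross-validation={va:.3f}")
```

Plotting the two mean columns against `train_sizes` gives the learning curve itself.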
A good ML model fits the training data very well and also generalizes to new input data. Sometimes, an ML model may require more training instances in order to generalize to new input data. Adding more training data will sometimes help the model generalize, but not always! By looking at a model’s learning curve, we can decide whether adding more training data is likely to produce a more generalizable model.
Plotting the learning curve typically requires writing many lines of code and takes more time. But, thanks to the Python Yellowbrick library, things are much easier now! By using it properly, we can plot the learning curve with just a single line of code! In this article, we will discuss how to plot the learning curve with Yellowbrick and learn how to interpret it.
Prerequisites
To get the most out of today’s content, I recommend reading the “Using k-fold cross-validation for evaluating a model’s performance” section of my article “k-fold cross-validation explained in plain English”.
In addition, familiarity with the Support Vector Machine and Random Forest algorithms is helpful, because today we plot learning curves for models built with those algorithms. If you’re not familiar with them, just read my earlier articles on them.
Installing Yellowbrick
Yellowbrick doesn’t come with the default Anaconda installer, so you need to install it manually. To install it, open your Anaconda prompt and run the following command.
pip install yellowbrick
If that didn’t work for you, try the following with the --user flag.
pip install yellowbrick --user
Or you can also try it with the conda-forge channel.
conda install -c conda-forge yellowbrick
Or try it with the DistrictDataLabs channel.
conda install -c districtdatalabs yellowbrick
Any of the above methods will install the latest version of Yellowbrick.
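To confirm that the installation worked, you can ask Python which version is installed. This is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of Yellowbrick.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Hypothetical helper: return the installed version of `package`, or None if missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


yb = installed_version("yellowbrick")
print(f"Yellowbrick {yb} is installed" if yb else "Yellowbrick is not installed")
```

Run it in the same environment (for example, the same Anaconda environment or Jupyter kernel) in which you ran the install command; otherwise it may report that Yellowbrick is missing.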
Plotting the learning curve
Now, consider the following example code, where we plot the learning curves of an SVM and a Random Forest classifier using Scikit-learn’s built-in breast cancer dataset. That dataset has 30 features and 569 samples. Let’s see whether adding more data will help the SVM and Random Forest models generalize to new input data.









