Summary

The web content introduces FLAML, an open-source Python library designed to automate the selection of the best machine learning model for both classification and regression tasks efficiently and economically.

Abstract

The article titled "Automating Machine Learning Using FLAML" delves into the use of the FLAML library for streamlining the machine learning model selection process. It emphasizes the difficulty and resource-intensive nature of choosing the optimal model from a wide array of algorithms, and presents FLAML as a solution that is both fast and lightweight. The author guides readers through the installation and usage of FLAML, demonstrating its application on the Iris dataset for classification and the Boston dataset for regression. The library's ability to automatically identify the best machine learning model and hyperparameters is showcased, with the Extra Tree Estimator and a linear model being identified as the top performers for the respective datasets. The article concludes with an invitation for readers to apply FLAML to their own datasets and share their experiences, and it acknowledges the collaboration with Piyush Ingale in its creation.

Opinions

The author expresses that selecting the best machine learning model is challenging and resource-intensive, suggesting a need for automation tools like FLAML.
FLAML is portrayed as an efficient and economical solution for automating the model selection process, with the potential to save time and computational resources.
The author provides a positive endorsement of FLAML by demonstrating its effectiveness on well-known datasets and encouraging readers to explore its capabilities further.
The article implies that FLAML's design is user-friendly, as evidenced by the straightforward installation and usage instructions provided.
By showcasing the best models and hyperparameters found for the example datasets, the author conveys confidence in FLAML's performance and accuracy.
The article's co-creation with Piyush Ingale reflects a collaborative approach to writing about and promoting FLAML.
The author's willingness to engage with readers, as indicated by the invitation to share comments and the provision of contact information, reflects a commitment to community and knowledge sharing within the data science field.

Automating Machine Learning Using FLAML

Using FLAML to Automate the Machine Learning Process

Machine Learning is a process in which we try to solve real-life business problems using a variety of algorithms. Creating a Machine Learning model is easy, but selecting the model that performs best on our data, in terms of both generalization and performance, is a difficult task.

There is a wide variety of Machine Learning algorithms for both Regression and Classification. The choice among them depends on the kind of problem we are trying to solve, but comparing them is a process with high computational cost that takes considerable time and effort. Several Python libraries can automate the selection of the best Machine Learning model efficiently; one such library is FLAML.

FLAML is a lightweight, open-source Python library that finds the best Machine Learning model automatically, efficiently, and economically. It is fast, which saves time, and it has a lightweight design.
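To make "automatic and economical" model selection more concrete, here is a minimal sketch of budget-constrained search over several candidate learners. It is not FLAML's algorithm (FLAML's actual search strategy is cost-aware and considerably more sophisticated than random sampling); the function `budgeted_search` and the toy learners are hypothetical, and each "learner" is just a function that maps one hyperparameter value to a validation loss instead of training a real model.

```python
import random
import time
from typing import Callable, Dict, Tuple

# A "learner" here is a hypothetical stand-in: it maps one hyperparameter
# value to a validation loss (lower is better). Real AutoML trains a model.
Learner = Callable[[float], float]


def budgeted_search(learners: Dict[str, Learner], time_budget: float,
                    max_trials: int = 100, seed: int = 0) -> Tuple[str, float, float]:
    """Random search over (learner, hyperparameter) pairs.

    Stops when the time budget (seconds) or the trial cap is reached and
    returns (best learner name, best hyperparameter, best loss).
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + time_budget
    best: Tuple[str, float, float] = ("", float("nan"), float("inf"))
    names = sorted(learners)
    for _ in range(max_trials):
        if time.monotonic() >= deadline:
            break  # budget exhausted: keep the best result found so far
        name = rng.choice(names)
        config = rng.uniform(0.0, 1.0)  # sample one hyperparameter in [0, 1]
        loss = learners[name](config)
        if loss < best[2]:
            best = (name, config, loss)
    return best


# Toy learners with known optima: "tree" reaches 0.10 at 0.5, "linear" 0.30 at 0
toy = {
    "linear": lambda c: 0.30 + 0.1 * c,
    "tree": lambda c: 0.10 + (c - 0.5) ** 2,
}
name, config, loss = budgeted_search(toy, time_budget=1.0, max_trials=200)
print("best learner:", name, "config:", round(config, 3), "loss:", round(loss, 3))
```

The trade-off the budget controls is visible here: a larger `time_budget` or `max_trials` explores more candidates, while a tiny budget returns whatever was best when time ran out.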

In this article, we will explore FLAML and its functionalities.

Let’s get started…

Installing required libraries

We will start by installing FLAML with pip, using the command below. (In newer FLAML releases the AutoML dependencies are an optional extra, so you may need pip install "flaml[automl]" instead.)

pip install flaml
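If you want to confirm that the installation worked before going further, the standard library can report the installed version of a package. The helper name `installed_version` is illustrative; the only assumption is that FLAML is installed under the distribution name `flaml`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


flaml_version = installed_version("flaml")
print("FLAML version:", flaml_version or "not installed")
```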

Importing required libraries

In this step, we will import the AutoML class, which is all we need from FLAML; the datasets are imported from scikit-learn in the steps where they are used.

from flaml import AutoML

Solving Classification Problem

Now we will start by solving a classification problem. The data we will use here is the famous Iris dataset, which can be loaded directly from scikit-learn. Let's start creating the model.

# Loading the dataset
from sklearn.datasets import load_iris

Next, we create an AutoML instance and define its settings: a time budget for the search, the metric to optimize, and the type of task.

automl = AutoML()

automl_settings = {
    "time_budget": 10,  # in seconds
    "metric": 'accuracy',
    "task": 'classification'
}

Next, we will load the data and fit the AutoML instance on it. Finally, we will use the fitted model to make predictions and inspect the best model found.

X_train, y_train = load_iris(return_X_y=True)
# Train with labeled input data
automl.fit(X_train=X_train, y_train=y_train,
           **automl_settings)

# Predict class probabilities on the training data; for Iris the shape is (150, 3)
print(automl.predict_proba(X_train).shape)
# Print the best model found
print(automl.model)

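The printed model shows that the Extra Trees estimator is the best model found for this data (because the search is time-limited, the selected model can differ between runs and machines). Now let us print the best learner, its hyperparameter configuration, the validation accuracy, and the training time of the best run.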

print('Best ML learner:', automl.best_estimator)
print('Best hyperparameter config:', automl.best_config)
print('Best accuracy on validation data: {0:.4g}'.format(1-automl.best_loss))
print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))
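The accuracy line relies on the fact that FLAML minimizes a loss; for a metric that should be maximized, such as accuracy, the reported `best_loss` is 1 minus the metric, which is why the code prints `1-automl.best_loss`. The plain-Python sketch below reproduces that arithmetic on made-up labels; the helper names are hypothetical and no FLAML call is involved.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def accuracy_loss(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Loss to minimize: 1 - accuracy."""
    return 1.0 - accuracy(y_true, y_pred)


# Made-up labels for three Iris-like classes (0, 1, 2); one prediction is wrong
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 0, 1, 2, 2, 2, 2, 1]

loss = accuracy_loss(y_true, y_pred)
print("loss:", loss, "accuracy recovered as 1 - loss:", 1 - loss)
```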

We will follow the same process for the Regression problem.

Solving Regression Problem

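Now we will solve a regression problem. The data we will use here is the famous Boston Housing dataset, which scikit-learn provided through load_boston. Note that load_boston was removed in scikit-learn 1.2, so on newer versions another regression dataset has to be substituted. We can follow exactly the same process as we did for the Classification problem.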

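# load_boston was removed in scikit-learn 1.2; on newer versions fall back
# to the bundled diabetes regression dataset instead
try:
    from sklearn.datasets import load_boston as load_regression_data
except ImportError:
    from sklearn.datasets import load_diabetes as load_regression_data

automl = AutoML()

automl_settings = {
    "time_budget": 10,  # in seconds
    "metric": 'r2',
    "task": 'regression'
}
X_train, y_train = load_regression_data(return_X_y=True)
# Train with labeled input data
automl.fit(X_train=X_train, y_train=y_train,
           **automl_settings)
# Predict on the training data: one value per sample
print(automl.predict(X_train).shape)
# Print the best model found
print(automl.model)

print('Best ML learner:', automl.best_estimator)
print('Best hyperparameter config:', automl.best_config)
# For the r2 metric, 1 - best_loss is the R^2 score, not an accuracy
print('Best r2 on validation data: {0:.4g}'.format(1-automl.best_loss))
print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))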

The output shows the best model and hyperparameters for the Regression problem as well.
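With "metric": 'r2', the expression `1-automl.best_loss` yields the coefficient of determination R² on the validation data rather than an accuracy, since FLAML's loss for r2 is 1 - R². As a reminder of what that number means, here is a plain-Python sketch of R² = 1 - SS_res / SS_tot on made-up values; the function name `r_squared` is hypothetical.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two or more paired values")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual error
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total variance
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
perfect = r_squared(y_true, y_true)       # predictions match exactly: 1.0
baseline = r_squared(y_true, [6.0] * 4)   # always predicting the mean: 0.0
print(perfect, baseline, "loss for a perfect model:", 1 - perfect)
```

An R² of 1 means a loss of 0, and a model no better than predicting the mean scores 0 (loss 1); R² can even go negative for models worse than that baseline.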

In the same way, you can apply this process to your own dataset to find the best model and hyperparameters for your problem. Try it with different datasets, and let me know your comments in the response section.
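To reuse the process on your own data, the two settings dictionaries from this article can be generated by a small helper. This is a hypothetical convenience function, not part of FLAML; it only encodes the defaults used here (accuracy for classification, r2 for regression, a 10-second budget) and lets you override the metric or the budget.

```python
from typing import Dict, Optional, Union

# Metrics used in this article for each task type
DEFAULT_METRICS = {"classification": "accuracy", "regression": "r2"}


def make_automl_settings(task: str, time_budget: float = 10,
                         metric: Optional[str] = None) -> Dict[str, Union[str, float]]:
    """Build a settings dict to unpack into AutoML.fit(**settings)."""
    if task not in DEFAULT_METRICS:
        raise ValueError(f"task {task!r} not covered by this helper; "
                         f"expected one of {sorted(DEFAULT_METRICS)}")
    if time_budget <= 0:
        raise ValueError("time_budget must be positive (seconds)")
    return {
        "time_budget": time_budget,
        "metric": metric or DEFAULT_METRICS[task],
        "task": task,
    }


settings = make_automl_settings("regression", time_budget=30)
print(settings)
# With FLAML installed, the dict is used exactly as in the article:
# automl = AutoML()
# automl.fit(X_train=X_train, y_train=y_train, **settings)
```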

This article is in collaboration with Piyush Ingale.

Before You Go

Thanks for reading! If you want to get in touch with me, feel free to reach me by email or on my LinkedIn profile. You can view my GitHub profile for different data science projects and package tutorials. Also, feel free to explore my profile and read other articles I have written on Data Science.