Gencay I.

scikit-learn Cheat Sheet: Functions for Machine Learning

Mastering Machine Learning with Python and scikit-learn: A Comprehensive Guide for Data Scientists and AI Enthusiasts


Introduction

It is no secret that data science and machine learning have become essential components of the modern business landscape. With the rise of artificial intelligence and the increasing demand for data-driven insights, more and more companies are turning to these powerful tools to gain a competitive edge. Fortunately, Python has emerged as the language of choice for many data scientists, and the scikit-learn library provides a comprehensive set of tools for building and deploying machine learning models.

In this article, we will explore 50 of the most useful functions and classes provided by scikit-learn for machine learning tasks. From data preprocessing to model selection, they cover a wide range of techniques and methodologies for solving real-world problems. To make each one easy to follow, the examples use scikit-learn's built-in Iris dataset and build on one another: later snippets reuse variables such as X_train_scaled and y_train_encoded that earlier snippets define, so run them in order.

Sounds fantastic? Now, for the surprise: many of these functions are easy to use and require only a few lines of code to implement. Whether you are a seasoned data scientist or just starting out, this cheat sheet will help you become more familiar with the powerful tools available in scikit-learn and enable you to accelerate your data science and machine learning projects.

So, grab your favorite beverage, sit back, and let’s dive into the world of scikit-learn!

train_test_split

This function is used to split a dataset into training and testing sets. It takes in the dataset, the target variable, and the size of the testing set as parameters.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3)
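
The split above is random, so it changes on every run. As a small sketch of two optional arguments that the example does not use, `random_state` and `stratify` (both are existing `train_test_split` parameters; the seed 42 is an arbitrary choice), the following makes the split reproducible and keeps the class proportions equal in both sets:

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# random_state fixes the shuffle; stratify keeps the class balance in both splits
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)

print(X_train.shape, X_test.shape)   # (105, 4) (45, 4)
class_counts = Counter(y_test.tolist())
print(class_counts)                  # 15 test samples for each of the 3 classes
```

Iris has 150 rows with 50 per class, so a 30% stratified test set holds 45 rows, 15 from each class.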

StandardScaler

This function is used to standardize the dataset by subtracting the mean and dividing by the standard deviation. It is often used to prepare data for algorithms that require standardized input.

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
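
To see what standardization does, here is a minimal check, assuming the same Iris data (the `random_state=0` seed is an arbitrary addition). After `fit_transform`, each training column should have a mean of roughly 0 and a standard deviation of roughly 1, while the test set is transformed with the training statistics and so will only be close to those values:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train, X_test = train_test_split(load_iris().data, test_size=0.3, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean and std from the training set
X_test_scaled = scaler.transform(X_test)        # reuses the training mean and std

train_means = X_train_scaled.mean(axis=0)
train_stds = X_train_scaled.std(axis=0)
print(np.round(train_means, 6))
print(np.round(train_stds, 6))
```

Fitting the scaler on the training set only, as the original example does, keeps information from the test set out of the preprocessing step.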

MinMaxScaler

This function is used to scale the dataset to a specific range (usually 0 to 1). It is often used to prepare data for algorithms that require inputs within a certain range.

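from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
# Note: this overwrites the standardized arrays from the StandardScaler example,
# so the snippets below work with min-max scaled features
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)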

LabelEncoder

This function is used to encode categorical variables as integers. It is often used to prepare data for algorithms that cannot handle categorical variables.

from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
y_train_encoded = encoder.fit_transform(y_train)
y_test_encoded = encoder.transform(y_test)
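
The Iris targets are already integers, so the encoding above leaves them unchanged. A short sketch with string labels (a made-up list, used only for illustration) shows the mapping more clearly, along with `inverse_transform` for converting codes back to the original labels:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical string labels for illustration
species = ["setosa", "virginica", "versicolor", "setosa"]

encoder = LabelEncoder()
codes = encoder.fit_transform(species)

print(list(encoder.classes_))  # ['setosa', 'versicolor', 'virginica'] (sorted)
print(codes)                   # [0 2 1 0]

# Map integer codes back to the original labels
decoded = encoder.inverse_transform([2, 0])
print(list(decoded))           # ['virginica', 'setosa']
```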

OneHotEncoder

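This function is used to encode categorical variables as binary (one-hot) vectors, with one column per category. It is often used to prepare data for algorithms that require binary inputs. By default it returns a sparse matrix.

from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder()
# Store the result under new names so the integer labels in
# y_train_encoded stay available for the classifiers below
y_train_onehot = encoder.fit_transform(y_train.reshape(-1,1))
y_test_onehot = encoder.transform(y_test.reshape(-1,1))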
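
As a small illustration of the output format, here is a sketch on a made-up column of color names (the data is hypothetical). `OneHotEncoder` returns a sparse matrix by default, so `.toarray()` is called to print a dense version:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column, reshaped to (n_samples, 1)
colors = np.array(["red", "green", "blue", "green"]).reshape(-1, 1)

encoder = OneHotEncoder()
onehot = encoder.fit_transform(colors).toarray()  # dense copy of the sparse result

print(list(encoder.categories_[0]))  # ['blue', 'green', 'red'] (sorted)
print(onehot)
# [[0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]
#  [0. 1. 0.]]
```

Each row contains exactly one 1, in the column of that row's category.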

DecisionTreeClassifier

This function is used to create a decision tree model. It takes in the training data and labels as parameters.

from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier()
clf.fit(X_train_scaled, y_train_encoded)

RandomForestClassifier

This function is used to create a random forest model. It takes in the training data and labels as parameters.

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier()
clf.fit(X_train_scaled, y_train_encoded)
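
Fitting is only half of the job. As a minimal sketch of scoring a fitted classifier on the held-out split, `accuracy_score` compares predictions with the true test labels. The `random_state` values are arbitrary additions for reproducibility, and the features are left unscaled here because tree ensembles do not depend on feature scale:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)               # predicted class for each test row
accuracy = accuracy_score(y_test, y_pred)  # fraction of correct predictions
importances = clf.feature_importances_     # one value per feature, summing to 1

print(f"test accuracy: {accuracy:.3f}")
print(importances)
```

The same `predict` and `accuracy_score` pattern applies to the other classifiers in this cheat sheet.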

KMeans

This function is used to create a K-means clustering model. It takes in the dataset and the number of clusters as parameters.

from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3)
kmeans.fit(X_train_scaled)

LinearRegression

This function is used to create a linear regression model. It takes in the training data and labels as parameters.

from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(X_train_scaled, y_train)

LogisticRegression

This function is used to create a logistic regression model. It takes in the training data and labels as parameters.

from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()
clf.fit(X_train_scaled, y_train_encoded)

SVM

This function is used to create a support vector machine model. It takes in the training data and labels as parameters.

from sklearn.svm import SVC
clf = SVC()
clf.fit(X_train_scaled, y_train_encoded)

NaiveBayes

This function is used to create a Naive Bayes model. It takes in the training data and labels as parameters.

from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
clf.fit(X_train_scaled, y_train_encoded)

GridSearchCV

This function is used to perform a grid search to find the best hyperparameters for a model. It takes in the model, the hyperparameter grid, and the cross-validation strategy as parameters.

from sklearn.model_selection import GridSearchCV
param_grid = {'n_estimators': [10, 50, 100], 'max_depth': [2, 4, 8]}
grid_search = GridSearchCV(RandomForestClassifier(), param_grid, cv=5)
grid_search.fit(X_train_scaled, y_train_encoded)
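
After fitting, the search object exposes the winning configuration. The sketch below uses a smaller grid than the one above so it runs quickly (the grid values and `random_state=0` are arbitrary choices) and reads `best_params_`, `best_score_`, and `best_estimator_`:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Every combination in the grid is scored with 5-fold cross-validation
param_grid = {"n_estimators": [10, 50], "max_depth": [2, 4]}
grid_search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
grid_search.fit(X, y)

best_params = grid_search.best_params_    # dict with the best value for each key
best_score = grid_search.best_score_      # mean cross-validated accuracy of that setting
best_model = grid_search.best_estimator_  # refit on all of X, y with the best params

print(best_params)
print(round(best_score, 3))
```

Because `refit=True` is the default, `best_model` can be used for prediction directly.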

Pipeline

This function is used to create a pipeline of data preprocessing and modeling steps. It takes in a list of tuples, where each tuple contains a name for the step and the corresponding function.

from sklearn.pipeline import Pipeline
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', RandomForestClassifier())
])
pipe.fit(X_train, y_train)
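
A fitted pipeline behaves like a single estimator. In this sketch (the seeds are arbitrary additions), `score` applies the scaler learned from the training data to the test data before the classifier predicts, so the raw arrays can be passed in directly:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", RandomForestClassifier(random_state=0)),
])
pipe.fit(X_train, y_train)           # fits the scaler, then the classifier

accuracy = pipe.score(X_test, y_test)        # scales X_test internally, then predicts
scaler_means = pipe.named_steps["scaler"].mean_  # statistics learned from X_train only

print(f"test accuracy: {accuracy:.3f}")
print(scaler_means)
```

Individual steps stay accessible by name through `named_steps`, which is handy for inspecting what each stage learned.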

PCA

This function is used to perform principal component analysis on the dataset. It takes in the number of components to keep as a parameter.

from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train_scaled)
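
To judge how much information two components retain, the fitted object exposes `explained_variance_ratio_`. A minimal sketch, assuming the full standardized Iris data rather than the training split:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Share of the total variance captured by each kept component, largest first
ratio = pca.explained_variance_ratio_
print(X_pca.shape)          # (150, 2)
print(ratio, ratio.sum())
```

If the summed ratio is low for your data, that is a hint to keep more components.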

TSNE

This function is used to perform t-distributed stochastic neighbor embedding on the dataset. It takes in the number of dimensions to embed the data into as a parameter.

from sklearn.manifold import TSNE
tsne = TSNE(n_components=2)
X_train_tsne = tsne.fit_transform(X_train_scaled)

GradientBoostingClassifier

This function is used to create a gradient boosting classifier. It takes in the training data and labels as parameters.

from sklearn.ensemble import GradientBoostingClassifier
clf = GradientBoostingClassifier()
clf.fit(X_train_scaled, y_train_encoded)

AdaBoostClassifier

This function is used to create an AdaBoost classifier. It takes in the training data and labels as parameters.

from sklearn.ensemble import AdaBoostClassifier
clf = AdaBoostClassifier()
clf.fit(X_train_scaled, y_train_encoded)

Lasso

This function is used to perform Lasso regression. It takes in the training data and labels as parameters.

from sklearn.linear_model import Lasso
reg = Lasso()
reg.fit(X_train_scaled, y_train)

Ridge

This function is used to perform Ridge regression. It takes in the training data and labels as parameters.

from sklearn.linear_model import Ridge
reg = Ridge()
reg.fit(X_train_scaled, y_train)

ElasticNet

This function is used to perform Elastic Net regression. It takes in the training data and labels as parameters.

from sklearn.linear_model import ElasticNet
reg = ElasticNet()
reg.fit(X_train_scaled, y_train)

SGDClassifier

This function is used to create a stochastic gradient descent classifier. It takes in the training data and labels as parameters.

from sklearn.linear_model import SGDClassifier
clf = SGDClassifier()
clf.fit(X_train_scaled, y_train_encoded)

KernelPCA

This function is used to perform kernel principal component analysis on the dataset. It takes in the kernel function and the number of components to keep as parameters.

from sklearn.decomposition import KernelPCA
kpca = KernelPCA(kernel='rbf', n_components=2)
X_train_kpca = kpca.fit_transform(X_train_scaled)

IsolationForest

This function is used to create an isolation forest model for anomaly detection. It takes in the contamination level and the random seed as parameters.

from sklearn.ensemble import IsolationForest
clf = IsolationForest(contamination=0.1, random_state=42)
clf.fit(X_train_scaled)
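
Once fitted, the model labels points with `predict`, which returns 1 for inliers and -1 for anomalies. A small sketch on the raw Iris features (used here only as convenient sample data):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import IsolationForest

X = load_iris().data

clf = IsolationForest(contamination=0.1, random_state=42)
clf.fit(X)

pred = clf.predict(X)               # 1 = inlier, -1 = anomaly
scores = clf.decision_function(X)   # negative scores correspond to anomalies
n_outliers = int(np.sum(pred == -1))

print(n_outliers, "of", len(X), "points flagged as anomalies")
```

With `contamination=0.1`, the decision threshold is chosen so that roughly 10% of the training points are flagged.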

DBSCAN

This function is used to perform density-based spatial clustering of applications with noise (DBSCAN) on the dataset. It takes in the minimum number of samples and the radius of the neighborhood as parameters.

from sklearn.cluster import DBSCAN
dbscan = DBSCAN(min_samples=5, eps=0.5)
dbscan.fit(X_train_scaled)
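
Unlike K-means, DBSCAN decides the number of clusters itself and marks points that belong to no cluster with the label -1. A minimal sketch for reading the result, assuming the full standardized Iris data:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)

dbscan = DBSCAN(min_samples=5, eps=0.5).fit(X)
labels = dbscan.labels_   # cluster index per sample, -1 means noise

# Count clusters without counting the noise label
n_clusters = len(set(labels.tolist()) - {-1})
n_noise = int(np.sum(labels == -1))

print("clusters:", n_clusters, "noise points:", n_noise)
```

Both counts depend heavily on `eps` and `min_samples`, so it is worth trying a few values on your own data.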

AgglomerativeClustering

This function is used to perform hierarchical clustering on the dataset. It takes in the number of clusters and the linkage method as parameters.

from sklearn.cluster import AgglomerativeClustering
agg = AgglomerativeClustering(n_clusters=3, linkage='ward')
agg.fit(X_train_scaled)

KernelDensity

This function is used to estimate the probability density function of the dataset using a kernel density estimator. It takes in the kernel function and the bandwidth as parameters.

from sklearn.neighbors import KernelDensity
kde = KernelDensity(kernel='gaussian', bandwidth=0.1)
kde.fit(X_train_scaled)

GaussianMixture

This function is used to perform Gaussian mixture modeling on the dataset. It takes in the number of components and the covariance type as parameters.

from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=3, covariance_type='full')
gmm.fit(X_train_scaled)

NearestNeighbors

This function is used to perform nearest neighbor searches on the dataset. It takes in the number of neighbors and the distance metric as parameters.

from sklearn.neighbors import NearestNeighbors
nn = NearestNeighbors(n_neighbors=5, metric='euclidean')
nn.fit(X_train_scaled)
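
The fitted index is queried with `kneighbors`, which returns the distances and the row indices of the closest training points. In this sketch the query is the first Iris row itself, so its nearest neighbor lies at distance 0:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import NearestNeighbors

X = load_iris().data

nn = NearestNeighbors(n_neighbors=5, metric="euclidean")
nn.fit(X)

# Query with one sample; both results have shape (n_queries, n_neighbors)
distances, indices = nn.kneighbors(X[:1])

print(distances)  # sorted from nearest to farthest
print(indices)    # row positions of the neighbors in X
```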

KNeighborsClassifier

This function is used to create a K-nearest neighbors classifier. It takes in the training data and labels as parameters.

from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier()
clf.fit(X_train_scaled, y_train_encoded)

LDA

This function is used to perform linear discriminant analysis on the dataset. It takes in the number of components to keep as a parameter.

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(X_train_scaled, y_train_encoded)

QDA

This function is used to perform quadratic discriminant analysis on the dataset.

from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
qda = QuadraticDiscriminantAnalysis()
qda.fit(X_train_scaled, y_train_encoded)

RANSACRegressor

This function is used to perform RANSAC (random sample consensus) regression on the dataset. It takes in the base estimator and the maximum number of random trials as parameters.

from sklearn.linear_model import RANSACRegressor
from sklearn.linear_model import LinearRegression
# The keyword is `estimator` in current scikit-learn; it was named `base_estimator` before version 1.1
ransac = RANSACRegressor(estimator=LinearRegression(), max_trials=100)
ransac.fit(X_train_scaled, y_train)
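
RANSAC is most useful when some targets are badly corrupted. The sketch below builds a synthetic line (slope 3, intercept 2, both made up for illustration) and shifts every tenth target by 100. The estimator is passed positionally so that the call does not depend on the keyword name:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

# Synthetic data: y = 3x + 2 plus small noise, with every tenth target corrupted
rng = np.random.RandomState(0)
X = np.arange(50, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0 + rng.normal(scale=0.5, size=50)
y[::10] += 100.0

ransac = RANSACRegressor(LinearRegression(), max_trials=100, random_state=0)
ransac.fit(X, y)

slope = ransac.estimator_.coef_[0]   # final model, refit on the inliers only
inlier_mask = ransac.inlier_mask_    # True where a sample was treated as an inlier

print(f"recovered slope: {slope:.2f}")
print("inliers:", int(inlier_mask.sum()), "of", len(y))
```

An ordinary `LinearRegression` on the same data would be pulled toward the corrupted points, while the RANSAC fit should stay close to the underlying slope.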

GradientBoostingRegressor

This function is used to create a gradient boosting regression model. It takes in the training data and labels as parameters.

from sklearn.ensemble import GradientBoostingRegressor
reg = GradientBoostingRegressor()
reg.fit(X_train_scaled, y_train)

AdaBoostRegressor

This function is used to create an AdaBoost regression model. It takes in the training data and labels as parameters.

from sklearn.ensemble import AdaBoostRegressor
reg = AdaBoostRegressor()
reg.fit(X_train_scaled, y_train)

SVR

This function is used to create a support vector regression model. It takes in the training data and labels as parameters.

from sklearn.svm import SVR
reg = SVR()
reg.fit(X_train_scaled, y_train)

DecisionTreeRegressor

This function is used to create a decision tree regression model. It takes in the training data and labels as parameters.

from sklearn.tree import DecisionTreeRegressor
reg = DecisionTreeRegressor()
reg.fit(X_train_scaled, y_train)

RandomForestRegressor

This function is used to create a random forest regression model. It takes in the training data and labels as parameters.

from sklearn.ensemble import RandomForestRegressor
reg = RandomForestRegressor()
reg.fit(X_train_scaled, y_train)

PolynomialFeatures

This function is used to generate polynomial features from the dataset. It takes in the degree of the polynomial as a parameter.

from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train_scaled)
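
A tiny worked example makes the generated columns easier to see. For a single made-up sample [a, b] = [2, 3], degree 2 produces the terms 1, a, b, a², ab, b²:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])   # one sample with features a=2, b=3

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Columns: 1, a, b, a^2, a*b, b^2
print(X_poly)   # [[1. 2. 3. 4. 6. 9.]]
```

The number of columns grows quickly with the degree and the number of input features, so higher degrees are usually paired with regularization.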

TruncatedSVD

This function is used to perform truncated singular value decomposition on the dataset. It takes in the number of components to keep as a parameter.

from sklearn.decomposition import TruncatedSVD
svd = TruncatedSVD(n_components=2)
X_train_svd = svd.fit_transform(X_train_scaled)

NMF

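This function is used to perform non-negative matrix factorization on the dataset. It takes in the number of components to extract as a parameter. The input must not contain negative values.

from sklearn.decomposition import NMF
nmf = NMF(n_components=2)
# NMF rejects negative inputs; X_train_scaled works here because it was
# last produced by MinMaxScaler (values in [0, 1]), not StandardScaler
X_train_nmf = nmf.fit_transform(X_train_scaled)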

Binarizer

This function is used to binarize the dataset based on a threshold value. It takes in the threshold value as a parameter.

from sklearn.preprocessing import Binarizer
binarizer = Binarizer(threshold=0.5)
X_train_binarized = binarizer.fit_transform(X_train_scaled)  
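
The threshold rule is strict: values greater than the threshold become 1, and everything else, including values equal to the threshold, becomes 0. A short sketch on made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import Binarizer

X = np.array([[0.2, 0.5, 0.9],
              [1.5, -0.3, 0.51]])

X_bin = Binarizer(threshold=0.5).fit_transform(X)
print(X_bin)
# [[0. 0. 1.]
#  [1. 0. 1.]]
```

Note that 0.5 maps to 0 while 0.51 maps to 1.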

LabelBinarizer

This function is used to binarize class labels in a one-vs-all fashion, so that each class gets its own 0/1 column. It is often used to prepare targets for algorithms that require binary inputs.

from sklearn.preprocessing import LabelBinarizer
binarizer = LabelBinarizer()
y_train_binarized = binarizer.fit_transform(y_train)  
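
Here is a small sketch with made-up string labels that shows the resulting indicator matrix and the way back via `inverse_transform`:

```python
from sklearn.preprocessing import LabelBinarizer

# Hypothetical labels for illustration
labels = ["cat", "dog", "bird", "cat"]

binarizer = LabelBinarizer()
y_bin = binarizer.fit_transform(labels)

print(list(binarizer.classes_))  # ['bird', 'cat', 'dog'] (sorted)
print(y_bin)
# [[0 1 0]
#  [0 0 1]
#  [1 0 0]
#  [0 1 0]]

# Recover the original labels from the indicator matrix
recovered = binarizer.inverse_transform(y_bin)
print(list(recovered))           # ['cat', 'dog', 'bird', 'cat']
```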

MultiLabelBinarizer

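This function is used to encode multi-label targets, where each sample carries a collection of labels, as a binary indicator matrix with one column per label. It expects one list, set, or tuple of labels per sample, so single integer labels must be wrapped in a list first.

from sklearn.preprocessing import MultiLabelBinarizer
binarizer = MultiLabelBinarizer()
# Wrap each single label in a list, since every sample must be a collection of labels
y_train_binarized = binarizer.fit_transform([[label] for label in y_train])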
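
`MultiLabelBinarizer` is meant for samples that can carry several labels at once, and its input is one collection of labels per sample. A small sketch with made-up movie genres:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical multi-label data: each sample has a list of genres
genres = [["comedy", "drama"], ["action"], ["drama"]]

binarizer = MultiLabelBinarizer()
y_multi = binarizer.fit_transform(genres)

print(list(binarizer.classes_))  # ['action', 'comedy', 'drama'] (sorted)
print(y_multi)
# [[0 1 1]
#  [1 0 0]
#  [0 0 1]]
```

The first row has two 1s because that sample belongs to both "comedy" and "drama".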

LabelPropagation

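This function is used to perform label propagation on the dataset, a semi-supervised technique that spreads known labels to unlabeled samples. It takes in the kernel function and the number of iterations as parameters. Unlabeled samples are marked with -1 in the label array.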

from sklearn.semi_supervised import LabelPropagation
propagation = LabelPropagation(kernel='knn', max_iter=100)
propagation.fit(X_train_scaled, y_train)

LabelSpreading

This function is used to perform label spreading on the dataset. It takes in the kernel function and the number of iterations as parameters.

from sklearn.semi_supervised import LabelSpreading
spreading = LabelSpreading(kernel='knn', max_iter=100)
spreading.fit(X_train_scaled, y_train)
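
Both semi-supervised estimators become interesting when part of the labels is missing. As a rough sketch, the code below hides about 70% of the Iris labels by setting them to -1 (the fraction and the seed are arbitrary choices), fits `LabelSpreading`, and reads the inferred labels from `transduction_`:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)

# Hide a random subset of labels; -1 marks a sample as unlabeled
rng = np.random.RandomState(0)
hidden = rng.rand(len(y)) < 0.7
y_partial = y.copy()
y_partial[hidden] = -1

spreading = LabelSpreading(kernel="knn", n_neighbors=7, max_iter=100)
spreading.fit(X, y_partial)

inferred = spreading.transduction_   # label assigned to every sample
accuracy = float((inferred[hidden] == y[hidden]).mean())

print(f"accuracy on the hidden labels: {accuracy:.3f}")
```

`LabelPropagation` accepts the same -1 convention and exposes the same `transduction_` attribute.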

CalibratedClassifierCV

This function is used to calibrate the probabilities of a classifier. It takes in the base classifier and the calibration method as parameters.

from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()
calibrated_clf = CalibratedClassifierCV(clf, cv=5, method='sigmoid')
calibrated_clf.fit(X_train_scaled, y_train_encoded)

DummyClassifier

This function is used to create a dummy classifier that predicts using a simple strategy. It takes in the strategy as a parameter.

from sklearn.dummy import DummyClassifier
dummy = DummyClassifier(strategy='most_frequent')
dummy.fit(X_train_scaled, y_train_encoded)
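
The point of a dummy classifier is to give a baseline that any real model should beat. A minimal sketch comparing it with logistic regression on a stratified Iris split (the seed and `max_iter=1000` are arbitrary choices):

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Baseline: always predicts the most frequent training class
dummy = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

baseline_acc = dummy.score(X_test, y_test)
model_acc = model.score(X_test, y_test)

print(f"baseline: {baseline_acc:.3f}  model: {model_acc:.3f}")
```

Since the three Iris classes are balanced, the baseline lands near one third, which puts the real model's accuracy into perspective.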

Conclusion

In conclusion, we have covered 50 of the most useful functions provided by scikit-learn for machine learning tasks.

These functions cover a wide range of techniques and methodologies, making it easier for you to solve real-world problems and accelerate your data science projects.

If you want to stay up-to-date with the latest news and trends in data science, machine learning, and AI, then I encourage you to subscribe to my newsletter.

By subscribing to my newsletter, you will receive regular updates on new articles, tutorials, and resources that can help you improve your skills and stay ahead of the curve. You can subscribe by filling out the following forms:

Here is my NumPy cheat sheet.

Here is the source code of the “How to be a Billionaire” data project.

Here is the source code of the “Classification Task with 6 Different Algorithms using Python” data project.

Here is the source code of the “Decision Tree in Energy Efficiency Analysis” data project.

Here is the source code of the “DataDrivenInvestor 2022 Articles Analysis” data project.

Thanks for reading!

If you are not yet a member of Medium and are eager to learn by reading, here is my referral link.

“Machine learning is the last invention that humanity will ever need to make.” Nick Bostrom

Scikit Learn
Python
Machine Learning
AI
Data Science