Diego Degese

Summary

This article explores the use of KMeans clustering as a technique for identifying trends in financial markets.

Abstract

The article discusses the application of the KMeans clustering algorithm as a technique for trend detection in financial markets. KMeans is an iterative algorithm that partitions data points into K clusters based on similarity. The key steps of the algorithm involve random initialization, assignment, update of centroids, and repetition until convergence. The article also covers the implementation of trend detection with KMeans, including importing required libraries, defining methods for predicting values, training the KMeans model, and evaluating its performance. The results of the KMeans model demonstrate a balanced performance in predicting both upward and downward trends.

Opinions

  • KMeans is an efficient and effective method for grouping data points with similar characteristics.
  • The transformation pred_y = ~pred_y + 2 was applied to correct the inversion of 0s and 1s in the groups, as the assignment of categories to 0s or 1s is not known beforehand in unsupervised learning.
  • The KMeans model had slightly higher precision and F1-Score for the 'UP' category (0.64 and 0.63) than for 'DOWN' (0.59 and 0.60), suggesting its 'UP' predictions are somewhat more reliable.
  • The overall accuracy of the KMeans model was approximately 62%, meaning it correctly predicted the trend in about 62% of the test dataset instances.
  • The author recommends comparing the performance of the KMeans-based trend detection with previously explored models using the F1-Score as an evaluation metric.
  • The author emphasizes that the information provided is for educational purposes only and should not be construed as investment advice.
  • The author provides links to download the full source code and colab notebook of this article, as well as their social media profiles.

Exploring Trend Detection with 1-Minute Data in the First 15 Minutes of Each Day (Part 4)

In this fourth part of our trend detection series, we will explore the application of KMeans clustering as a technique for identifying trends in financial markets.

Image from freeshows.ru

KMeans is a popular unsupervised learning algorithm that clusters data points into distinct groups based on similarity. We will explain how KMeans works and then, reusing the baseline methods from Part 1, try to improve our trend detection.

Understanding KMeans Clustering

KMeans clustering is an iterative algorithm that aims to partition data points into K clusters, where each data point belongs to the cluster with the nearest mean (centroid). The algorithm’s key steps are as follows:

  1. Random Initialization: K initial centroids are randomly chosen from the data points.
  2. Assignment: Each data point is assigned to the cluster whose centroid is closest to it. This is done based on the Euclidean distance between the data point and the centroid.
  3. Update Centroids: After the assignment step, the centroids of the clusters are updated as the mean of the data points within each cluster.
  4. Repeat: Steps 2 and 3 are repeated until the centroids converge and the clusters stabilize.

The final result is a set of K clusters, and each data point is associated with the cluster to which it belongs. KMeans is an efficient and effective method for grouping data points with similar characteristics.
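
The four steps above can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation: the function names kmeans and squared_distance, the toy points, and the seed and max_iter parameters are all assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Toy KMeans following steps 1-4; returns (centroids, labels)."""
    rng = random.Random(seed)

    # Step 1: pick K distinct data points as the initial centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = []

    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]

        # Step 3: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if not members:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
                continue
            dims = len(members[0])
            new_centroids.append(
                tuple(sum(m[d] for m in members) / len(members) for d in range(dims))
            )

        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids

    return centroids, labels


# Hypothetical toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Which group ends up as cluster 0 and which as cluster 1 depends on the points drawn as initial centroids, so the cluster numbers carry no meaning by themselves.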


Implementing trend detection with KMeans

In order to streamline our code and ensure consistent data processing, we will employ the same baseline methods and dataset as used in Part 1. By leveraging the clustering capabilities of KMeans, we aim to enhance the ability to identify and distinguish different market trends.

Let’s walk through the code for trend detection with KMeans clustering.

Importing Required Libraries

from sklearn.cluster import KMeans

We start by importing the KMeans class from sklearn (scikit-learn), a widely used machine-learning library for Python.

Defining the method to predict the values

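def get_predicted_values(model, features):
    """Return the cluster ids for `features`, relabeled as 0 = DOWN and 1 = UP."""

    pred_y = model.predict(features)

    # KMeans numbers its clusters arbitrarily; in this run they came out inverted,
    # so swap 0s and 1s. For integers ~x == -x - 1, hence ~x + 2 == 1 - x.
    pred_y = ~pred_y + 2

    return pred_y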

In this step, we define a method that receives the model and the features and returns all the predicted values.

Since KMeans is an unsupervised learning algorithm, it does not know which cluster corresponds to which category: the cluster numbers 0 and 1 are assigned arbitrarily. I manually inspected the data and found that the 0s and 1s were inverted relative to our labels, so I corrected the inversion by applying the transformation pred_y = ~pred_y + 2. For integer arrays the bitwise NOT ~x equals -x - 1, so this expression is equivalent to 1 - pred_y.
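
As an alternative to inspecting the output by hand, the orientation could be chosen programmatically by checking which mapping agrees with more of the training labels. The sketch below is only an assumption of how that might look, not the code used in this series; align_binary_labels and the sample lists are hypothetical. It also checks that ~x + 2 and 1 - x give the same result for 0 and 1.

```python
def align_binary_labels(cluster_ids, true_labels):
    """Return (aligned_ids, flipped), keeping the orientation that matches more labels."""
    matches = sum(1 for c, t in zip(cluster_ids, true_labels) if c == t)
    # Flip when the inverted ids would agree with more labels than the raw ids.
    flipped = matches < len(true_labels) - matches
    aligned = [1 - c for c in cluster_ids] if flipped else list(cluster_ids)
    return aligned, flipped


# The bitwise trick from the article and plain subtraction agree on 0 and 1.
assert all(~x + 2 == 1 - x for x in (0, 1))

# Hypothetical cluster ids and labels, for illustration only.
cluster_ids = [0, 0, 1, 1, 0, 1]
true_labels = [1, 1, 0, 0, 1, 1]
aligned, flipped = align_binary_labels(cluster_ids, true_labels)
print(flipped)  # True
print(aligned)  # [1, 1, 0, 0, 1, 0]
```

If this approach were used, the flip decision would have to be made once on the training data and then reused unchanged for the validation and test sets, so that their labels do not leak into the predictions.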

Training the KMeans Clustering Model

# Train Model
model = KMeans(n_clusters=2, random_state=42, n_init="auto")
model.fit(train_x)

In this step, we create a KMeans clustering model using the KMeans class from sklearn. We specify that we want to create 2 clusters (n_clusters=2) since we are interested in separating the data into two trends: 'UP' and 'DOWN'. We set random_state for reproducibility. n_init controls how many times the algorithm is run with different initial centroids, keeping the run with the lowest inertia; with n_init="auto", scikit-learn chooses that number from the initialization method (a single run for the default init="k-means++", 10 runs for init="random").

We then train the KMeans Clustering model using the training data. The model will determine two clusters based on the features of the data.
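
To illustrate why running the algorithm from several starting points can matter, here is a small pure-Python sketch on one-dimensional data. It is an illustration of the general restart idea only, not scikit-learn's code; lloyd_1d, best_of_restarts, and the sample values are hypothetical names and data chosen for this example.

```python
import random


def lloyd_1d(values, k, rng, max_iter=50):
    """One KMeans run on 1-D data from a random start; returns (inertia, centroids)."""
    centroids = rng.sample(values, k)
    for _ in range(max_iter):
        # Assign every value to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: (v - centroids[c]) ** 2)
            clusters[nearest].append(v)
        # Recompute each centroid as the mean of its members.
        updated = [
            sum(members) / len(members) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    # Inertia: sum of squared distances to the closest centroid.
    inertia = sum(min((v - c) ** 2 for c in centroids) for v in values)
    return inertia, sorted(centroids)


def best_of_restarts(values, k, n_init, seed=42):
    """Repeat the run n_init times and keep the lowest-inertia result."""
    rng = random.Random(seed)
    return min(lloyd_1d(values, k, rng) for _ in range(n_init))


# Hypothetical data with three obvious groups.
values = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
single = best_of_restarts(values, k=3, n_init=1)
best = best_of_restarts(values, k=3, n_init=10)
print(single[0], best[0])
```

Because both calls share the same seed, the 10-run result can never have a higher inertia than the single run; a poor starting point can leave a single run stuck in a worse grouping.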

Evaluating the KMeans Clustering Model and Saving Results

# Predict and save train values
pred_y = get_predicted_values(model, train_x)
save_result(train_y, pred_y, 'kmeans.train.csv.gz')

# Predict and save validation values
pred_y = get_predicted_values(model, val_x)
save_result(val_y, pred_y, 'kmeans.val.csv.gz')

# Predict, show and save test values
pred_y = get_predicted_values(model, test_x)
show_result(test_y, pred_y)
save_result(test_y, pred_y, 'kmeans.test.csv.gz')

Finally, we predict the trend values for the training, validation, and test datasets and save each set of results in a separate compressed CSV file for later analysis. For the test dataset, we also display the confusion matrix and the classification report.

KMeans Clustering Model Results

********************* RESULT TEST **********************
* Confusion Matrix (Top: Predicted - Left: Real)
[[153  96]
 [107 173]]
* Classification Report
              precision    recall  f1-score   support

           0       0.59      0.61      0.60       249
           1       0.64      0.62      0.63       280

    accuracy                           0.62       529
   macro avg       0.62      0.62      0.62       529
weighted avg       0.62      0.62      0.62       529

For the trend category ‘DOWN’ (class 0):

  • The precision is approximately 59%, indicating that out of all the instances the model classified as ‘DOWN’, 59% were actually ‘DOWN’.
  • The recall is approximately 61%, indicating that the model correctly captures 61% of all actual ‘DOWN’ trends.
  • The F1-Score for the ‘DOWN’ category is around 60%, representing the harmonic mean of precision and recall.

For the trend category ‘UP’ (class 1):

  • The precision is approximately 64%, indicating that out of all the instances the model classified as ‘UP’, 64% were actually ‘UP’.
  • The recall is approximately 62%, indicating that the model correctly captures 62% of all actual ‘UP’ trends.
  • The F1-Score for the ‘UP’ category is around 63%, representing the harmonic mean of precision and recall.

The overall accuracy of the KMeans model is approximately 62%, which means that it correctly predicted the trend (either ‘UP’ or ‘DOWN’) in about 62% of the test dataset instances.
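
As a quick arithmetic check, the figures in the classification report can be recomputed from the confusion matrix. The helper class_metrics is a hypothetical name introduced for this sketch; only the four counts come from the results above.

```python
# Confusion matrix from the test run: rows are real classes, columns are predicted.
confusion = [[153, 96],
             [107, 173]]


def class_metrics(matrix, cls):
    """Precision, recall and F1 for one class of a square confusion matrix."""
    true_pos = matrix[cls][cls]
    predicted = sum(row[cls] for row in matrix)  # column total
    actual = sum(matrix[cls])                    # row total (support)
    precision = true_pos / predicted
    recall = true_pos / actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


for cls, name in enumerate(["DOWN", "UP"]):
    precision, recall, f1 = class_metrics(confusion, cls)
    print(f"{name}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# DOWN: precision=0.59 recall=0.61 f1=0.60
# UP: precision=0.64 recall=0.62 f1=0.63

correct = sum(confusion[i][i] for i in range(len(confusion)))
total = sum(sum(row) for row in confusion)
accuracy = correct / total
print(f"accuracy={accuracy:.2f}")  # accuracy=0.62
```

For example, the ‘DOWN’ precision is 153 / (153 + 107), and the accuracy is (153 + 173) / 529.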

KMeans clustering shows a fairly balanced performance across upward and downward trends. Precision and F1-Score are slightly higher for the ‘UP’ category (0.64 and 0.63) than for ‘DOWN’ (0.59 and 0.60), so its ‘UP’ predictions are somewhat more reliable; whether that translates into profitable trades is not something these metrics measure.

Conclusion

In this fourth part, we explored the application of KMeans clustering as a technique for trend detection.

In a future part of our series, we will compare the performance of the KMeans-based trend detection with the previously explored models. Using the F1-Score as our evaluation metric, we aim to identify the most accurate and reliable approach for detecting trends in financial markets.


Download the full source code and the colab notebook of this article from here

Twitter / X: https://twitter.com/diegodegese
LinkedIn: https://www.linkedin.com/in/ddegese
GitHub: https://github.com/crapher

Disclaimer: Investing in the stock market involves risk and may not be suitable for all investors. The information provided in this article is for educational purposes only and should not be construed as investment advice or a recommendation to buy or sell any particular security. Always do your own research and consult with a licensed financial advisor before making any investment decisions. Past performance is not indicative of future results.
