Exploring Trend Detection with 1-Minute Data in the First 15 Minutes of Each Day (Part 4)

In this fourth part of our trend detection series, we will explore the application of KMeans clustering as a technique for identifying trends in financial markets.

KMeans is a popular unsupervised learning algorithm that groups data points into distinct clusters based on similarity. We will explain how KMeans works and then apply it, on top of the baseline methods from the earlier parts, to try to improve our trend detection.

Understanding KMeans Clustering

KMeans clustering is an iterative algorithm that aims to partition data points into K clusters, where each data point belongs to the cluster with the nearest mean (centroid). The algorithm’s key steps are as follows:

1. Random Initialization: K initial centroids are randomly chosen from the data points.
2. Assignment: Each data point is assigned to the cluster whose centroid is closest to it, measured by the Euclidean distance between the data point and the centroid.
3. Update Centroids: After the assignment step, each centroid is updated to the mean of the data points within its cluster.
4. Repeat: Steps 2 and 3 are repeated until the centroids converge and the clusters stabilize.

The final result is a set of K clusters, with each data point assigned to exactly one of them. KMeans is fast and simple, but because it relies on Euclidean distance its output depends on how the features are scaled, and different initial centroids can lead to different clusters.
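To make the four steps concrete, here is a minimal NumPy sketch of the algorithm. The function name kmeans, its parameters, and the toy data are my own illustrative choices; this is not the scikit-learn implementation used later in the article, which is more robust.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Plain KMeans on an array of shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid (Euclidean distance)
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy data: two well-separated blobs of 50 points each
rng = np.random.default_rng(42)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])
labels, centroids = kmeans(points, k=2)
print(centroids)  # one centroid near (0, 0) and one near (5, 5), in either order
```

Which blob ends up as cluster 0 and which as cluster 1 depends on the random initialization, and that is exactly the labeling issue we will have to handle when mapping clusters to trends.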

Before I continue sharing all the information, if you enjoy reading my articles, please hit the follow button — Diego Degese

Implementing trend detection with KMeans

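To streamline our code and ensure consistent data processing, we will use the same baseline methods and dataset as in Part 1. The data splits (train_x, train_y, and so on) and the save_result and show_result helpers that appear in the code below are not defined in this part; they come from that baseline. By leveraging the clustering capabilities of KMeans, we aim to improve our ability to identify and distinguish different market trends.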

Exploring Trend Detection with 1-Minute Data in the First 15 Minutes of Each Day (Part 1)


Let’s explore the code implementation of trend detection with KMeans clustering.

Importing Required Libraries

from sklearn.cluster import KMeans

We start by importing the KMeans class from scikit-learn (imported as sklearn), a widely used machine-learning library in Python.

Defining the method to predict the values

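def get_predicted_values(model, features):
    # Assign each sample to its nearest cluster (0 or 1)
    pred_y = model.predict(features)

    # Swap the cluster ids (0 -> 1 and 1 -> 0): in this example the clusters came out
    # on the opposite side of the DOWN/UP labels. For integers, ~x + 2 equals 1 - x.
    pred_y = ~pred_y + 2

    return pred_y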

In this step, we define a method that receives the model and the features and returns all the predicted values.

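Since KMeans is an unsupervised learning algorithm, it does not know which cluster corresponds to ‘DOWN’ (0) and which to ‘UP’ (1); the cluster ids are arbitrary. I manually inspected the data and found that, for this model, the 0s and 1s were inverted relative to the labels. I corrected the inversion by applying the transformation pred_y = ~pred_y + 2, which for integer labels is equivalent to 1 - pred_y. Keep in mind that the mapping can change if the data or the random_state changes, so it should be rechecked after retraining.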
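As an alternative to manual inspection, the cluster-to-trend mapping can be derived from the training labels. The sketch below is not what this article's code does; the helper names build_cluster_to_label_map and apply_map are hypothetical, and the arrays are toy values. The idea is to give each cluster the label that is most frequent among its members.

```python
import numpy as np

def build_cluster_to_label_map(cluster_ids, true_labels):
    """Map each cluster id to the most frequent true label among its members."""
    mapping = {}
    for cluster in np.unique(cluster_ids):
        members = true_labels[cluster_ids == cluster]
        values, counts = np.unique(members, return_counts=True)
        mapping[int(cluster)] = int(values[counts.argmax()])
    return mapping

def apply_map(cluster_ids, mapping):
    """Translate raw cluster ids into trend labels using the learned mapping."""
    return np.array([mapping[int(c)] for c in cluster_ids])

# Toy example: cluster 0 mostly contains label 1, cluster 1 mostly contains label 0
cluster_ids = np.array([0, 0, 0, 1, 1, 1, 1, 0])
true_labels = np.array([1, 1, 0, 0, 0, 0, 1, 1])

mapping = build_cluster_to_label_map(cluster_ids, true_labels)
aligned = apply_map(cluster_ids, mapping)
print(mapping)  # {0: 1, 1: 0}
print(aligned)  # [1 1 1 0 0 0 0 1]
```

If you try this, build the mapping from the training split only and reuse it unchanged on the validation and test splits, so no information leaks from the evaluation data. Note also that both clusters can end up mapped to the same label when one class dominates everywhere, which is a sign the clusters do not separate the trends.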

Training the KMeans Clustering Model

# Train Model
model = KMeans(n_clusters=2, random_state=42, n_init="auto")
model.fit(train_x)

In this step, we create a KMeans clustering model using the KMeans class from sklearn. We specify that we want 2 clusters (n_clusters=2) since we are interested in separating the data into two trends: 'UP' and 'DOWN'. We set random_state for reproducibility. The n_init parameter controls how many times the algorithm is run with different initial centroids, keeping the run with the lowest inertia. With n_init="auto" (available in scikit-learn 1.2 and later), scikit-learn chooses that number from the initialization method: a single run for the default k-means++ initialization, and 10 runs for random initialization.

We then train the KMeans Clustering model using the training data. The model will determine two clusters based on the features of the data.
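To see what a fitted model exposes, here is a small sketch on synthetic data. The array toy_x is a made-up stand-in for train_x, and I pass an explicit n_init=10 only so the snippet also runs on scikit-learn versions older than 1.2.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for train_x: 200 samples, 3 features, two shifted groups
rng = np.random.default_rng(0)
toy_x = np.vstack([
    rng.normal(loc=-1.0, scale=0.3, size=(100, 3)),
    rng.normal(loc=1.0, scale=0.3, size=(100, 3)),
])

toy_model = KMeans(n_clusters=2, random_state=42, n_init=10)
toy_model.fit(toy_x)

print(toy_model.cluster_centers_)      # one row per cluster, one column per feature
print(toy_model.inertia_)              # sum of squared distances to the nearest centroid
print(np.bincount(toy_model.labels_))  # number of training samples in each cluster
```

Checking the cluster sizes is a quick sanity test: if one cluster holds nearly all the samples, the two clusters are unlikely to correspond to 'UP' and 'DOWN'.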

Evaluating the KMeans Clustering Model and Saving Results

# Predict and save train values
pred_y = get_predicted_values(model, train_x)
save_result(train_y, pred_y, 'kmeans.train.csv.gz')

# Predict and save validation values
pred_y = get_predicted_values(model, val_x)
save_result(val_y, pred_y, 'kmeans.val.csv.gz')

# Predict, show and save test values
pred_y = get_predicted_values(model, test_x)
show_result(test_y, pred_y)
save_result(test_y, pred_y, 'kmeans.test.csv.gz')

Finally, we evaluate the KMeans model’s performance on the training, validation, and test datasets. We predict the trend values for each dataset and save the results in separate CSV files for later analysis.
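If you do not have the Part 1 helpers at hand, the sketch below shows one way a report like the one in the next section could be produced with scikit-learn. The function show_result_sketch is a hypothetical stand-in, not the actual show_result from Part 1, and the label arrays are toy values.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

def show_result_sketch(true_y, pred_y):
    """Hypothetical stand-in for show_result: print the metrics, return the matrix."""
    matrix = confusion_matrix(true_y, pred_y)  # rows: real, columns: predicted
    print("* Confusion Matrix (Top: Predicted - Left: Real)")
    print(matrix)
    print("* Classification Report")
    print(classification_report(true_y, pred_y))
    return matrix

# Toy labels, only to exercise the function
true_y = np.array([0, 0, 1, 1, 1, 0])
pred_y = np.array([0, 1, 1, 1, 0, 0])
matrix = show_result_sketch(true_y, pred_y)
```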

KMeans Clustering Model Results

********************* RESULT TEST **********************
* Confusion Matrix (Top: Predicted - Left: Real)
[[153  96]
 [107 173]]
* Classification Report
              precision    recall  f1-score   support

           0       0.59      0.61      0.60       249
           1       0.64      0.62      0.63       280

    accuracy                           0.62       529
   macro avg       0.62      0.62      0.62       529
weighted avg       0.62      0.62      0.62       529

For the trend category ‘DOWN’ (class 0):

The precision is approximately 59%, meaning that 59% of the instances classified as ‘DOWN’ were actually ‘DOWN’.
The recall is approximately 61%, indicating that the model correctly captures 61% of all actual ‘DOWN’ trends.
The F1-Score for the ‘DOWN’ category is around 60%, representing the harmonic mean of precision and recall.

For the trend category ‘UP’ (class 1):

The precision is approximately 64%, meaning that 64% of the instances classified as ‘UP’ were actually ‘UP’.
The recall is approximately 62%, indicating that the model correctly captures 62% of all actual ‘UP’ trends.
The F1-Score for the ‘UP’ category is around 63%, representing the harmonic mean of precision and recall.

The overall accuracy of the KMeans model is approximately 62%, which means that it correctly predicted the trend (either ‘UP’ or ‘DOWN’) in about 62% of the test dataset instances.
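All of these figures can be recomputed directly from the confusion matrix printed above. The short sketch below does that with NumPy; the variable names are my own.

```python
import numpy as np

# Confusion matrix from the test output (rows: real, columns: predicted)
cm = np.array([[153, 96],
               [107, 173]])

correct = np.diag(cm)                 # correctly predicted DOWN and UP instances
precision = correct / cm.sum(axis=0)  # divide by column totals (predicted counts)
recall = correct / cm.sum(axis=1)     # divide by row totals (real counts)
f1 = 2 * precision * recall / (precision + recall)
accuracy = correct.sum() / cm.sum()

print(np.round(precision, 2))  # [0.59 0.64]
print(np.round(recall, 2))     # [0.61 0.62]
print(np.round(f1, 2))         # [0.6  0.63]
print(round(accuracy, 2))      # 0.62
```

For example, the ‘DOWN’ precision is 153 / (153 + 107) ≈ 0.59, and the accuracy is (153 + 173) / 529 ≈ 0.62.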

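The KMeans clustering shows balanced performance across upward and downward trends. Precision and F1-Score are slightly higher for the ‘UP’ category (0.64 and 0.63 versus 0.59 and 0.60 for ‘DOWN’), so the model is marginally better at identifying upward trends. These metrics measure classification quality only; they do not show whether trading on the predictions would be profitable.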


Conclusion

In this fourth part, we explored the application of KMeans clustering as a technique for trend detection.

In a future part of our series, we will compare the performance of the KMeans-based trend detection with the previously explored models. Using the F1-Score as our evaluation metric, we aim to identify the most accurate and reliable approach for detecting trends in financial markets.


Download the full source code and the colab notebook of this article from here

Twitter / X: https://twitter.com/diegodegese LinkedIn: https://www.linkedin.com/in/ddegese Github: https://github.com/crapher

Disclaimer: Investing in the stock market involves risk and may not be suitable for all investors. The information provided in this article is for educational purposes only and should not be construed as investment advice or a recommendation to buy or sell any particular security. Always do your own research and consult with a licensed financial advisor before making any investment decisions. Past performance is not indicative of future results.
