Suhail Saqan

Summary

The article discusses the application of machine learning, specifically K-means clustering, to identify support and resistance lines in stock trading.

Abstract

The article "Using Machine Learning to Locate Support and Resistance Lines for Stocks" explains the concept of support and resistance lines as price levels where stock prices may reverse due to a concentration of market interest. It advocates for the use of machine learning, particularly unsupervised classification methods like K-means clustering, to detect these critical price levels by recognizing patterns in historical stock data without explicit programming instructions. The author uses Python and the Yahoo Finance API to gather stock data, applies K-means clustering, and determines the optimal number of clusters (support and resistance lines) using the Elbow and Silhouette methods. The Elbow method suggests an optimal K value of 4, while the Silhouette method indicates a K value of 3. The article concludes that using both methods provides a more robust identification of support and resistance lines, which can be crucial for trading strategies.

Opinions

  • The author believes that machine learning is more suitable for identifying support and resistance lines than traditional algorithmic methods because it can autonomously recognize patterns in data.
  • The preference for unsupervised classification, such as K-means clustering, is based on its ability to find undetected patterns without pre-existing labels and minimal human intervention.
  • The author suggests that the Elbow method, when visually represented, can indicate the optimal number of clusters by identifying a sudden decrease in the rate of change of the average distance to the centroid.
  • The Silhouette method is presented as a complementary technique to evaluate the quality of the clusters by measuring how similar a point is to its own cluster compared to others, with a higher score indicating better-defined clusters.
  • The author recommends using both the Elbow and Silhouette methods to ensure the selection of the most optimal number of clusters for K-means clustering in financial data analysis.
  • The article promotes the use of the Yahoo Finance API for data collection due to its ease of use and the ability to obtain data at various intervals, which can influence the strength of identified support and resistance areas.

Using Machine Learning to Locate Support and Resistance Lines for Stocks

Support and Resistance

Support and Resistance lines are price levels of an asset at which the price action may stop and/or reverse because a large number of investors are interested in trading at those levels. They can be detected using the stock's historical data. You can read this article for more information.

Machine Learning

I chose Machine Learning for this task because it tends to be more appropriate than giving the computer a fixed set of rules to apply to the data. With Machine Learning, the computer uses the data itself to recognize correlations and patterns. For example, if you give the computer a series of a stock's prices in which the price reaches a certain level several times but keeps getting rejected there, it should be able to classify this pattern. There are also two types of these rejections: one while the price is moving up (resistance) and one while it is moving down (support). One way to handle this is unsupervised classification.

Unsupervised classification is a type of machine learning that looks for previously undetected patterns in a data set with no pre-existing labels and minimal human supervision. The computer finds similarities among the data points and groups them into different clusters.

In this example we will use K-means clustering. In simple terms, it partitions the data into K clusters (collections of data points grouped together because of certain similarities), where K is the number of centroids we want in the dataset. A centroid is the imaginary or real location representing the center of a cluster, and each data point is assigned to the cluster whose centroid is nearest.
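To make the idea concrete, here is a minimal from-scratch sketch of K-means (Lloyd's algorithm) on one-dimensional price data. It uses only NumPy and is not the article's code. The function name `kmeans_1d` is hypothetical, and the quantile-based starting centroids are an assumption chosen to keep the example deterministic. scikit-learn's `KMeans`, which the article imports, also uses k-means++ initialization and several restarts.

```python
import numpy as np


def kmeans_1d(prices, k, n_iter=100):
    """Hypothetical helper: cluster 1-D prices into k groups, return (centroids, labels)."""
    x = np.asarray(prices, dtype=float)
    # Deterministic start: spread the k initial centroids across the price quantiles.
    centroids = np.quantile(x, np.linspace(0, 1, k + 2)[1:-1])
    for _ in range(n_iter):
        # Assignment step: each price joins the cluster of its nearest centroid.
        labels = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
        # Update step: move each centroid to the mean of its prices (keep it if the cluster is empty).
        new_centroids = np.array(
            [x[labels == j].mean() if np.any(labels == j) else centroids[j] for j in range(k)]
        )
        if np.allclose(new_centroids, centroids):
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    return centroids, labels


# Three tight groups of prices around 10, 20 and 30.
prices = [10.0, 10.1, 9.9, 20.0, 20.1, 19.9, 30.0, 30.2, 29.8]
centroids, labels = kmeans_1d(prices, k=3)
print(np.round(centroids, 2))  # → [10. 20. 30.]
```

With scikit-learn, the equivalent is `KMeans(n_clusters=3).fit(X).cluster_centers_`, where `X` has shape `(n_samples, 1)`.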

K-means Clustering

Python

I will use the Yahoo Finance API to download the data. It offers data at many different intervals; here I use the 1-minute interval for a single day. Support and resistance areas can appear on any interval you look at, and the longer the interval, the stronger they tend to be.

The first step is to import the Python libraries we need: sklearn, yfinance, pandas, numpy, and matplotlib. Next, we define the start and end dates (I picked the day I wrote this article) and the ticker, then pass them to yfinance. We also store the low and high prices in separate variables.

When you print the data, it should look something like this: the date and time, followed by the open, high, low, close, and volume for each interval.

Stock Data
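The data step described above can be sketched as follows. This assumes yfinance's `Ticker.history` method, since the article does not show which yfinance call it used. Yahoo serves 1-minute bars only for recent dates, so the start and end dates must fall within that window. The hypothetical helper `high_low_columns` reshapes the highs and lows into the 2-D `(n_samples, 1)` arrays that scikit-learn estimators expect.

```python
import numpy as np


def download_intraday(ticker, start, end, interval="1m"):
    """Fetch OHLCV bars with yfinance (requires network access)."""
    import yfinance as yf  # imported lazily so the helper below works offline

    # Returns a pandas DataFrame with Open, High, Low, Close and Volume columns (among others).
    return yf.Ticker(ticker).history(start=start, end=end, interval=interval)


def high_low_columns(data):
    """Hypothetical helper: split highs and lows into (n_samples, 1) float arrays."""
    highs = np.asarray(data["High"], dtype=float).reshape(-1, 1)
    lows = np.asarray(data["Low"], dtype=float).reshape(-1, 1)
    return highs, lows


# Offline usage with a tiny stand-in for the DataFrame:
highs, lows = high_low_columns({"High": [101.2, 101.5, 100.9], "Low": [100.8, 101.0, 100.4]})
print(highs.shape, lows.shape)  # → (3, 1) (3, 1)

# Online usage (the ticker and dates are placeholders; use dates from the last few days):
# data = download_intraday("AAPL", "2024-01-02", "2024-01-03")
# highs, lows = high_low_columns(data)
```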

How do we figure out the best number of clusters to split our data into?

As discussed earlier, we need to choose the value of K, and this becomes harder as the dimension of the data increases. Two popular methods for this are the Elbow Method and the Silhouette Method. I will use both to demonstrate and compare them.

1. The Elbow Method:

In this method, we pick a range of K values and run K-means clustering for each one. For each K, we measure how far the points in each cluster are from their centroid and plot that against K. We then pick the optimal value of K from the plot.
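As a worked check, the sketch below computes inertia, the quantity usually plotted in the Elbow Method, for one-dimensional data and a given set of centroids. It returns both the sum, which is what scikit-learn stores in `KMeans.inertia_`, and the mean. The two differ only by a constant factor (the number of points), so they produce the same elbow. The function name `inertia` is hypothetical.

```python
import numpy as np


def inertia(prices, centroids):
    """Hypothetical helper: (sum, mean) of squared distances to the nearest centroid."""
    x = np.asarray(prices, dtype=float)
    c = np.asarray(centroids, dtype=float)
    # Squared distance from every point to every centroid; keep the closest one.
    nearest_sq = np.min((x[:, None] - c[None, :]) ** 2, axis=1)
    return nearest_sq.sum(), nearest_sq.mean()


total, mean = inertia([1.0, 2.0, 9.0, 10.0], centroids=[1.5, 9.5])
print(total, mean)  # → 1.0 0.25
```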

The picture below shows Inertia plotted against K. Inertia is the sum of squared distances between each point and its closest centroid; dividing by the number of points gives the mean squared distance, which produces the same curve shape. In simpler terms, the graph shows how far the points are from their cluster centers as the number of clusters grows.

As you can see, inertia decreases as the number of clusters increases: with more centroids, points generally end up closer to their nearest center. An inertia of 0 would mean every point sits exactly on its cluster center, for example when K equals the number of distinct points. So rather than minimizing inertia, we look for the point where its rate of decrease drops off suddenly, the "elbow" of the curve.

Using the graph, we can conclude that 4 is an optimal value for K.

Inertia vs K
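Reading the elbow off a plot is a judgment call. One simple way to automate it is to choose the K where the drop in inertia shrinks the most. That is the largest second difference, I(K-1) - 2I(K) + I(K+1). This is shown as an illustrative heuristic, not the author's procedure. The function name is hypothetical, and the inertia values are made-up numbers, not the article's results.

```python
def elbow_k(k_values, inertias):
    """Hypothetical helper: return the K with the largest second difference of inertia."""
    best_k, best_bend = None, float("-inf")
    # Interior points only: each needs a neighbour on both sides.
    for i in range(1, len(k_values) - 1):
        drop_before = inertias[i - 1] - inertias[i]  # gain from adding this cluster
        drop_after = inertias[i] - inertias[i + 1]   # gain from adding one more
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = k_values[i], bend
    return best_k


ks = [1, 2, 3, 4, 5, 6]
made_up_inertias = [900.0, 700.0, 500.0, 100.0, 80.0, 70.0]
print(elbow_k(ks, made_up_inertias))  # → 4
```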

In effect, we are picking the value of K that separates the data into clusters best. We tested K = 2, 3, and 4, and as you can see in this picture, K = 4 fits best.

K-Means Clustering

2. The Silhouette Method

The silhouette coefficient of a data point o is defined as:

$$ s(o) = \frac{b(o) - a(o)}{\max\{a(o),\, b(o)\}} $$

where:

  • s(o) is the silhouette coefficient of the data point o
  • a(o) is the average distance between o and all the other data points in the cluster to which o belongs
  • b(o) is the minimum average distance from o to all clusters to which o does not belong

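The silhouette coefficient is a value between -1 and 1. A value near 1 means the point sits well inside its own cluster and far from the neighboring cluster. A value near 0 means the point lies close to the boundary between two clusters. A negative value means the point is, on average, closer to another cluster than to its own, so it has probably been assigned to the wrong cluster.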

The silhouette value measures how similar a point is to its own cluster (cohesion) compared to other clusters (separation).

The overall silhouette score for a given K is the average of the coefficients of all the points. After calculating it for each K value, we pick the K with the highest score. As you can see in the picture below, K = 3 gave the highest silhouette score for both the high (red) and low (blue) stock prices.

Silhouette Scores vs K
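The selection step can be sketched in NumPy as follows, assuming one-dimensional price data. `mean_silhouette_1d` applies the formula for s(o) to every point and averages the results. `best_k_by_silhouette` runs a small K-means for each candidate K and keeps the K with the highest score. Several things are assumptions of this sketch: both function names, the quantile initialization, and the rule that a point alone in its cluster scores 0 (scikit-learn uses the same rule). In scikit-learn, the score itself is available as `sklearn.metrics.silhouette_score(X, labels)`.

```python
import numpy as np


def kmeans_1d(x, k, n_iter=100):
    """Small Lloyd's-algorithm K-means on 1-D data; returns cluster labels."""
    centroids = np.quantile(x, np.linspace(0, 1, k + 2)[1:-1])
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
        new = np.array([x[labels == j].mean() if np.any(labels == j) else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels


def mean_silhouette_1d(x, labels):
    """Average of s(o) = (b(o) - a(o)) / max(a(o), b(o)) over all points."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    dist = np.abs(x[:, None] - x[None, :])  # pairwise distances
    scores = []
    for i in range(len(x)):
        own = labels == labels[i]
        if own.sum() == 1:
            scores.append(0.0)  # singleton cluster: s(o) is taken to be 0
            continue
        a = dist[i, own].sum() / (own.sum() - 1)  # mean distance to the other members
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        m = max(a, b)
        scores.append((b - a) / m if m > 0 else 0.0)
    return float(np.mean(scores))


def best_k_by_silhouette(prices, k_values=range(2, 7)):
    """Return the K with the highest mean silhouette score, plus all scores."""
    x = np.asarray(prices, dtype=float)
    scores = {k: mean_silhouette_1d(x, kmeans_1d(x, k)) for k in k_values}
    return max(scores, key=scores.get), scores


best, scores = best_k_by_silhouette([10.0, 10.1, 9.9, 20.0, 20.1, 19.9, 30.0, 30.2, 29.8])
print(best)  # → 3
```

In the article's setting, this would be run once on the highs and once on the lows.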

Elbow vs Silhouette

Once we have a K value from each method, we use the center (centroid) of each cluster as a support or resistance level for our stock.
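Once the cluster centers are known, turning them into lines is straightforward. The sketch below shows one reasonable way to do it; it is an assumption, not the article's code. Levels below the latest price are labeled support and levels at or above it resistance. `plot_levels` draws each level as a horizontal line with matplotlib's `axhline`. Both function names are hypothetical.

```python
import numpy as np


def split_levels(centroids, last_price):
    """Hypothetical helper: sort cluster centers into support (below) and resistance (at/above)."""
    # np.ravel also handles scikit-learn's (k, 1) cluster_centers_ array.
    levels = sorted(float(c) for c in np.ravel(centroids))
    support = [lvl for lvl in levels if lvl < last_price]
    resistance = [lvl for lvl in levels if lvl >= last_price]
    return support, resistance


def plot_levels(close_prices, support, resistance):
    """Draw the price series with support (green) and resistance (red) lines."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    plt.plot(close_prices, color="black", linewidth=1, label="close")
    for lvl in support:
        plt.axhline(lvl, color="green", linestyle="--")
    for lvl in resistance:
        plt.axhline(lvl, color="red", linestyle="--")
    plt.legend()
    plt.show()


support, resistance = split_levels([30.0, 10.0, 20.0], last_price=22.0)
print(support, resistance)  # → [10.0, 20.0] [30.0]
```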

The Silhouette method gave K = 3 and the Elbow method gave K = 4, so those are the numbers of supports and resistances we plot on the respective graphs.

Elbow Method
Silhouette Method

Although the two methods gave different results, the charts suggest that the Elbow method drew better-placed supports and resistances. It is best to use both methods as a cross-check when selecting the optimal number of clusters for K-means clustering.

Data Science
Machine Learning
Stock Market
Trading
Python