avatarMr. Q

Free AI web copilot to create summaries, insights and extended knowledge, download it at here

6171

Abstract

caption></figure><p id="20b4">We did the copy here because we will do many computations based on this sub DataFrame. We also sorted by the index (dates), as we need the time series in ascending order. The API may have done the job, but it’s good to make sure.</p><h2 id="9ca3">Period Percentage Change and Log Return</h2><p id="2a24">Pandas is a good tool as it provides us with all Excel can do (and more) in an easy way. In Excel, we could calculate row by row percentage change by referencing the ranges and doing the simple math, then fill down the formulas. In most programming languages, we have to do a loop. <b>With Pandas, it is easier!</b></p><figure id="9462"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*2836uPRdxJz3MSRvdKFzdA.png"><figcaption></figcaption></figure><p id="78dc">In our example, we took the column “<b>close</b>”, then apply the function “<b>pct_change</b>”, which is used to create a new column “<b>pct_chg_daily</b>”. Notice that the first value of the column is <b>NaN </b>as the function “pct_change” starts from the second row then automatically rolling down. This is also why we need the time series in ascending order. We could drop the NaN by using the function “<b>dropna</b>”.</p><figure id="ec5a"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*FoZsbqauxcT5YNwOwpKBEw.png"><figcaption></figcaption></figure><p id="5b0e">In many financial analyses, we need log returns, i.e. we take the log of each number then calculate net change. To do so, we need another package <b>Numpy</b>, which provides mathematical and statistical functions. Let’s import the package and give it a short name “<b>np</b>”.</p><figure id="21b3"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*sX3uJWNkyNFcCupJSnjofQ.png"><figcaption></figcaption></figure><p id="c55f">We now can use the function “<b>np.log</b>” to the column “close”, then apply the net change function “<b>diff</b>”.</p><figure id="5673"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*Y8F3kxvMo5iS9cHzvlawyg.png"><figcaption></figcaption></figure><p id="976a">Like the function “<b>pct_change</b>”, “<b>diff</b>” starts from the second row and calculate the net change on a rolling basis.</p><p id="de04">What if we want to calculate 5 days percentage changes? One of the easiest ways is to use the function “<b>shift</b>”.</p><figure id="5e01"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*SOhulMYG-FdsVl92ilVHsA.png"><figcaption></figcaption></figure><p id="dbb0">The function “<b>shift</b>” will take the previous value down by the number of rows given as the input. Here we created one new column “<b>close_5d</b>” with the shifted values, for example, 216.62 is the close on 2015–06–22, which is moved down to align with the date 2015–06–26.</p><p id="433c">We then can do the simple math for the percentage change.</p><figure id="0aea"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*xb46jMsTTSGwPXOFeADjDQ.png"><figcaption></figcaption></figure><h2 id="91c8">Rolling</h2><p id="3d3e">Some time-series computation is based on a rolling window. For example, if we need a simple moving average, we need to take a window (a sub-series) to calculate the average, then roll it down with the fixed size of the window. We can easily achieve so with the function “<b>rolling</b>”.</p><figure id="b291"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*ennWhWcTWtoRrjeXwrbeSQ.png"><figcaption></figcaption></figure><p id="1420">Here we used the function “<b>rolling</b>” with the column “close”. We set the window size to be 5, so it will take the first 5 days to calculate the average with the function “<b>mean</b>”, then move down one day with another 5 days window. We then created a new column “SMA5”.</p><p id="8751">The function “rolling” works with the function “apply” too for more advanced computations. For example, what if we would like to do the linear regression on the rolling basis and get the slope of the regression line, i.e. rolling slope. We can use “<b>rolling</b>” with the function “<b>apply</b>”.</p><figure id="65cb"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*72jC6Y63Lup8wX0fxKYBjg.png"><figcaption></figcaption></figure><p id="f4c1">To use the function “<b>apply</b>” we need to define a custom function that we would use first. We created the function “<b>regression</b>”. Like other function for “<b>apply</b>”, it should take one input and return one value as output. Within the function, we used the “<b>np.ployfit</b>” function (remember np is the short name we gave to import NumPy), which does the regression. Three parameters needed: “<b>x</b>”, “<b>y</b>” and “<b>degree</b>” (with value 1, it does linear regression). Here the little trick we did is creating an integer series given the length of the “<b>y</b>” series (like 0, 1, 2, 3, 4…).</p><h2 id="0efc">Correlation</h2><p id="ab84">Correlation is a common thing to look at when you work with time-series data. For example, the correlation matrix helps us to adjust holdings, as we don’t like concentration risk. The rolling correlation could be a good market indicator for trading. Let’s have a look at how we can do so with DataFrame.</p><p id="4e7d">To do so, let’s prepare another time series data for “<b>IBM</b>”. First, download the daily bars of “IBM”.</p><figure id="a876"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*Xny62Fk05dtLfSSrpuDgrQ.png"><figcaption></figcaption></figure><p id="1659">Second, we can join the column “close” from both DataFrames. Here we used the function “<b>join</b>”. It is like function “<b>merge</b>” (we learnt it in the last session) to combine two DataFrames. With “<b>merge</b>” we need to specify the matching column (the parameter “on”), but with “<b>join</b>” we match the index if we leave the parameter “on” as default.</p><figure id="d9e7"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*7-zAbpZvSFBe2j5sHg_S2w.png"><figcaption></figcaption></figure><p id="987d">We used the “<b>inner</b>” join (you may check the last session for detail). We need the dates to

Options

be aligned for both time series. We then applied the left and right suffix to avoid the conflict column name “<b>close</b>”.</p><p id="78db">Third, we compute the log return series based on the joined DataFrame and keep the log return columns only.</p><figure id="ba68"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*T2uUP2HFGbRhBmXM-pkP4w.png"><figcaption></figcaption></figure><p id="8948">Finally, we can calculate the correlation matrix by just calling the function “<b>corr</b>”.</p><figure id="1832"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*I20nFPjjG5Id0gZXuBZ0qg.png"><figcaption></figcaption></figure><p id="8037">For singe correlation number we can do the following.</p><figure id="8caf"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*SetQEIvCKvvB-bV-Zj9GvQ.png"><figcaption></figcaption></figure><p id="c05e">Take the column “<b>ret_spy</b>” and apply the function “corr” with another column “<b>ret_ibm</b>”.</p><p id="9322">If we would like to calculate the rolling 30 days correlation between “IBM” and “SPY”, we could use the function “<b>rolling</b>”.</p><figure id="0b3a"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*ZgfmTUnv9--UugEewtEktQ.png"><figcaption></figcaption></figure><p id="7147">Take the column “<b>ret_spy</b>” and apply the function “<b>rolling</b>” to create the rolling window. Then use the function “<b>corr</b>” with another column “<b>ret_ibm</b>”.</p><h2 id="caea">Technical Analysis</h2><p id="affd">The final topic of this session is about Technical Analysis. This is one of the common analysis of trading. We could use <b>Numpy </b>and <b>Pandas </b>to create the study, but since we have the package already, why not use it directly. Let’s start with installing the package “<b>ta</b>” and import it, as it’s not included as default.</p><figure id="9121"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*dQBmw9Bd8jn81SbVKUXNsQ.png"><figcaption></figcaption></figure><figure id="6f55"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*3PJR9n3ingT4lEGHOYeWpw.png"><figcaption></figcaption></figure><p id="a730">The exponential moving average is the base for many other indicators, so even we would like to create our studies, we can use the package “ta” and use those basic building components. Let’s do one example to calculate 20 days exponential moving average.</p><figure id="8649"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*lZ9-p_r5wTz4mVQzhPmDiw.png"><figcaption></figcaption></figure><figure id="1e64"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*Xcwpi4TIIVdlcxougBRzDg.png"><figcaption></figcaption></figure><p id="764b">With the first line of code, we created the exponential moving average indicator, which is under the module “<b>trend</b>”. The package “<b>ta</b>” groups the studies into different categories. The indicator created (variable “<b>ema20</b>”) is an object, where we can call the function “<b>ema_indicator</b>” to create the time series and add one more column “<b>ema20</b>” into our original DataFrame.</p><p id="a51c">Let’s have a look at another example, Ichimoku.</p><figure id="4165"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*MtY-i4bxRqmAH_y2ORwnYQ.png"><figcaption></figcaption></figure><figure id="10bf"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*bzpIKXOLAR4q_Sfk3FmnCQ.png"><figcaption></figcaption></figure><p id="501a">Again, the first line created the Ichimoku study object. We then used two functions “<b>ichimoku_base_line</b>” and “<b>ichimoku_conversion_line</b>” to get the base and conv columns into our DataFrame.</p><p id="6d28">For more details, we can check the documentation.</p><div id="da6f" class="link-block"> <a href="https://github.com/bukosabino/ta"> <div> <div> <h2>bukosabino/ta</h2> <div><h3>It is a Technical Analysis library to financial time series datasets (open, close, high, low, volume). You can use it…</h3></div> <div><p>github.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*t8Del-8IZOD1yFx5)"></div> </div> </div> </a> </div><div id="5204" class="link-block"> <a href="https://technical-analysis-library-in-python.readthedocs.io/en/latest/"> <div> <div> <h2>Welcome to Technical Analysis Library in Python’s documentation! — Technical Analysis Library in…</h2> <div><h3>It is a Technical Analysis library to financial time series datasets (open, close, high, low, volume). You can use it…</h3></div> <div><p>technical-analysis-library-in-python.readthedocs.io</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/)"></div> </div> </div> </a> </div><p id="1e14">Great job! We’ve done the essential time series based analysis with <b>Pandas </b>and <b>Numpy</b>. We also learnt a new package “<b>ta</b>” for technical analysis. Next session will be our last session with the series “<b>Python Tools</b>”. We will check out essential charting with <b>Matplotlib</b>.</p><div id="e660" class="link-block"> <a href="https://readmedium.com/python-for-financial-analysis-series-python-tools-day-6-91eb5ce20f78"> <div> <div> <h2>Python for Financial Analysis Series — Python Tools Day 6</h2> <div><h3>Let’s Matplotlib! This is the last session of the series. We will do the essential data visualization with Matplotlib…</h3></div> <div><p>medium.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*Mtq4G8c6vkHOeNtUQbkYqA.jpeg)"></div> </div> </div> </a> </div></article></body>

Python for Financial Analysis Series — Python Tools Day 5

It’s all about essential time series analysis in this session, from log returns, rolling based statistics to technical analysis. We will also learn how to use IEX (one of the most popular financial data cloud) to retrieve historical daily prices.

IMPORTANT UPDATE

Since Microsoft Azure Notebooks has moved to GitHub driven, which may be challenging for a beginner to start with. I have put all the material under Google Colab. It is a very similar Jupyter Notebook environment and you may check it out by click here.

All my code and files are shared and you may click here to open.

Data API (IEX Cloud)

Before we start, we need the time series data. One of the most common time-series data is historical prices. More precisely, it’s historical bars, i.e. open, high, low and close. In today’s session, we will use the historical daily bars of SPY (Biggest S&P 500 ETF) as an example. We will request the data dynamically using one of the popular financial data APIs, IEX Cloud.

IEX Cloud offers API to access a huge amount of financial data, from simple stock market data to corporate actions, fundamentals, company researches, FX, commodity, economics and crypto. It also provides premium data such as Machine Learning Stock Rankings.

Such a good product won’t be free, but for our learning purpose, we just need to create an account and skip any subscriptions for free testing data.

We can register for free testing data!

Let’s go to the register page and create a free account: https://iexcloud.io/cloud-login#/register/

When asked to select a plan, please skip (ignore). We don’t have to have any plan to test the API with sample data.

Once done the register, we can log in and go to the Console. First, we need to turn the “SANDBOX TESTING” on, then to go to the “API Tokens” tab to copy the “PUBLISHABLE” token.

That’s all set!

One small note about API vs SDK. In general, API (Application Programming Interface) is a kind of service to provide certain data or function. On the other hand, the SDK (Software Development Kit) is a language-specific package/library (on top of the API) to provide easy ways to code with specific languages. For example, we have IEX Cloud data API and it has Python SDK.

Download Data

Back to Microsoft Azure Notebooks, in our project, we need to install the IEX Cloud Python SDK for data download. To do so, we can use the command “!pip install”.

pyEx” is the Python SDK for IEX Cloud APIs. Python won’t be useful without open source packages, for example, Pandas, which we have been using in the last four sessions. The reason why we could import Pandas is that Microsoft Azure Notebooks has already installed it. If the package is not installed as default, we need to use PIP to install it (PIP is a Python package management tool). We can use the command “pip install” with the package name (optionally with version) to install the package. In most managed Jupyter environments, such as Microsoft Azure Notebooks and Google Colab, we can use “!pip install”. In our example, we installed “pyEX”.

Next step let’s import the “pyEX”.

We then need to initialize the connection for data downloading.

Notice that you need your API token and the version is the sandbox. Once the connection is ready, we can download the data.

The function “chartDF” takes the ticker “SPY” with time frame “5y” and returns the DataFrame of the historical daily bars. If you would like to explore more functions, please check the document link below.

There are many columns we won’t use today, so let’s only take open, high, low and close.

We did the copy here because we will do many computations based on this sub DataFrame. We also sorted by the index (dates), as we need the time series in ascending order. The API may have done the job, but it’s good to make sure.

Period Percentage Change and Log Return

Pandas is a good tool as it provides us with all Excel can do (and more) in an easy way. In Excel, we could calculate row by row percentage change by referencing the ranges and doing the simple math, then fill down the formulas. In most programming languages, we have to do a loop. With Pandas, it is easier!

In our example, we took the column “close”, then apply the function “pct_change”, which is used to create a new column “pct_chg_daily”. Notice that the first value of the column is NaN as the function “pct_change” starts from the second row then automatically rolling down. This is also why we need the time series in ascending order. We could drop the NaN by using the function “dropna”.

In many financial analyses, we need log returns, i.e. we take the log of each number then calculate net change. To do so, we need another package Numpy, which provides mathematical and statistical functions. Let’s import the package and give it a short name “np”.

We now can use the function “np.log” to the column “close”, then apply the net change function “diff”.

Like the function “pct_change”, “diff” starts from the second row and calculate the net change on a rolling basis.

What if we want to calculate 5 days percentage changes? One of the easiest ways is to use the function “shift”.

The function “shift” will take the previous value down by the number of rows given as the input. Here we created one new column “close_5d” with the shifted values, for example, 216.62 is the close on 2015–06–22, which is moved down to align with the date 2015–06–26.

We then can do the simple math for the percentage change.

Rolling

Some time-series computation is based on a rolling window. For example, if we need a simple moving average, we need to take a window (a sub-series) to calculate the average, then roll it down with the fixed size of the window. We can easily achieve so with the function “rolling”.

Here we used the function “rolling” with the column “close”. We set the window size to be 5, so it will take the first 5 days to calculate the average with the function “mean”, then move down one day with another 5 days window. We then created a new column “SMA5”.

The function “rolling” works with the function “apply” too for more advanced computations. For example, what if we would like to do the linear regression on the rolling basis and get the slope of the regression line, i.e. rolling slope. We can use “rolling” with the function “apply”.

To use the function “apply” we need to define a custom function that we would use first. We created the function “regression”. Like other function for “apply”, it should take one input and return one value as output. Within the function, we used the “np.ployfit” function (remember np is the short name we gave to import NumPy), which does the regression. Three parameters needed: “x”, “y” and “degree” (with value 1, it does linear regression). Here the little trick we did is creating an integer series given the length of the “y” series (like 0, 1, 2, 3, 4…).

Correlation

Correlation is a common thing to look at when you work with time-series data. For example, the correlation matrix helps us to adjust holdings, as we don’t like concentration risk. The rolling correlation could be a good market indicator for trading. Let’s have a look at how we can do so with DataFrame.

To do so, let’s prepare another time series data for “IBM”. First, download the daily bars of “IBM”.

Second, we can join the column “close” from both DataFrames. Here we used the function “join”. It is like function “merge” (we learnt it in the last session) to combine two DataFrames. With “merge” we need to specify the matching column (the parameter “on”), but with “join” we match the index if we leave the parameter “on” as default.

We used the “inner” join (you may check the last session for detail). We need the dates to be aligned for both time series. We then applied the left and right suffix to avoid the conflict column name “close”.

Third, we compute the log return series based on the joined DataFrame and keep the log return columns only.

Finally, we can calculate the correlation matrix by just calling the function “corr”.

For singe correlation number we can do the following.

Take the column “ret_spy” and apply the function “corr” with another column “ret_ibm”.

If we would like to calculate the rolling 30 days correlation between “IBM” and “SPY”, we could use the function “rolling”.

Take the column “ret_spy” and apply the function “rolling” to create the rolling window. Then use the function “corr” with another column “ret_ibm”.

Technical Analysis

The final topic of this session is about Technical Analysis. This is one of the common analysis of trading. We could use Numpy and Pandas to create the study, but since we have the package already, why not use it directly. Let’s start with installing the package “ta” and import it, as it’s not included as default.

The exponential moving average is the base for many other indicators, so even we would like to create our studies, we can use the package “ta” and use those basic building components. Let’s do one example to calculate 20 days exponential moving average.

With the first line of code, we created the exponential moving average indicator, which is under the module “trend”. The package “ta” groups the studies into different categories. The indicator created (variable “ema20”) is an object, where we can call the function “ema_indicator” to create the time series and add one more column “ema20” into our original DataFrame.

Let’s have a look at another example, Ichimoku.

Again, the first line created the Ichimoku study object. We then used two functions “ichimoku_base_line” and “ichimoku_conversion_line” to get the base and conv columns into our DataFrame.

For more details, we can check the documentation.

Great job! We’ve done the essential time series based analysis with Pandas and Numpy. We also learnt a new package “ta” for technical analysis. Next session will be our last session with the series “Python Tools”. We will check out essential charting with Matplotlib.

Python
Data Science
Programming
Finance
Recommended from ReadMedium