alpha2phi


Python — Time Series Data with Pandas

Numeric, categorical and time series data are the types of data we commonly deal with during exploratory data analysis. In this article I will go through some basic operations on time series data with pandas.

The notebook for this article can be found here.

Generating Time Series Data

# Generate data with time
from datetime import datetime, timedelta
import numpy as np
import pandas as pd
ts = pd.Series(np.random.rand(100), index=pd.date_range(datetime.now(), periods=100))
ts
2020-11-30 08:53:58.271878    0.566087
2020-12-01 08:53:58.271878    0.906584
2020-12-02 08:53:58.271878    0.512919
2020-12-03 08:53:58.271878    0.878789
2020-12-04 08:53:58.271878    0.942902
                                ...   
2021-03-05 08:53:58.271878    0.983846
2021-03-06 08:53:58.271878    0.289516
2021-03-07 08:53:58.271878    0.840058
2021-03-08 08:53:58.271878    0.519680
2021-03-09 08:53:58.271878    0.506116
Freq: D, Length: 100, dtype: float64

To drop the time-of-day component, so that every timestamp falls on midnight, set normalize to True.

# Generate data without time (normalize = True)
ts = pd.Series(np.random.rand(100), index=pd.date_range(datetime.now(), periods=100, normalize=True))
ts
2020-11-30    0.403108
2020-12-01    0.386055
2020-12-02    0.904074
2020-12-03    0.705386
2020-12-04    0.527159
                ...   
2021-03-05    0.271150
2021-03-06    0.851113
2021-03-07    0.778842
2021-03-08    0.380553
2021-03-09    0.552797
Freq: D, Length: 100, dtype: float64
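
Because datetime.now() and np.random give different results on every run, the outputs above will not match yours exactly. Here is a minimal sketch of a repeatable variant. The fixed start timestamp and the seed are assumptions chosen purely for illustration; they are not part of the original notebook. It also shows what normalize=True does to the index.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed start time and seed, chosen only so the result is repeatable.
start = pd.Timestamp("2020-11-30 08:53:58")
rng = np.random.default_rng(seed=42)

# Without normalize, every timestamp keeps the time-of-day of the start value.
idx_with_time = pd.date_range(start, periods=5)

# With normalize=True, the start is snapped to midnight before the range is built.
idx_midnight = pd.date_range(start, periods=5, normalize=True)

ts = pd.Series(rng.random(len(idx_midnight)), index=idx_midnight)

print(idx_with_time[0])   # 2020-11-30 08:53:58
print(idx_midnight[0])    # 2020-11-30 00:00:00
```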

You can also generate ranges at other frequencies. Refer to the frequency offset aliases.

For example, to generate business month-end dates:

# You can generate data using different frequency offset aliases
# https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases

# dt is never defined in the article; a midnight start date is assumed here
dt = datetime(2020, 11, 29)
# BM - business month end (deprecated since pandas 2.2 in favor of 'BME')
pd.date_range(start=dt, end=dt+timedelta(days=365), freq='BM')
DatetimeIndex(['2020-11-30', '2020-12-31', '2021-01-29', '2021-02-26','2021-03-31', '2021-04-30', '2021-05-31', '2021-06-30','2021-07-30', '2021-08-31', '2021-09-30', '2021-10-29'], dtype='datetime64[ns]', freq='BM')
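
To see what "business month end" means in practice, here is a small sketch. It assumes a fixed start date (my choice, for repeatability) and uses the pd.offsets.BMonthEnd offset object instead of the alias string, so it does not depend on how the alias is spelled in your pandas version.

```python
import pandas as pd

# Hypothetical fixed start date so the result is repeatable.
start = pd.Timestamp("2020-11-29")
end = start + pd.Timedelta(days=365)

# BMonthEnd is the offset object behind the business-month-end alias.
bm_dates = pd.date_range(start=start, end=end, freq=pd.offsets.BMonthEnd())

# Every business month end falls on a weekday (Monday=0 ... Friday=4).
print(bm_dates.dayofweek.max() <= 4)

# January 2021 ends on a Sunday, so its business month end is Friday the 29th.
print(bm_dates[2])  # 2021-01-29 00:00:00
```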

Generate a range with a 90-minute step:

# Every 90 minutes ('H' and 'T' are deprecated since pandas 2.2; '90min' is the equivalent spelling)
pd.date_range(start=dt, end=dt+timedelta(days=3), freq='1H30T')
DatetimeIndex(['2020-11-29 00:00:00', '2020-11-29 01:30:00',
               '2020-11-29 03:00:00', '2020-11-29 04:30:00',
               '2020-11-29 06:00:00', '2020-11-29 07:30:00',
               '2020-11-29 09:00:00', '2020-11-29 10:30:00',
               '2020-11-29 12:00:00', '2020-11-29 13:30:00',
               '2020-11-29 15:00:00', '2020-11-29 16:30:00',
               '2020-11-29 18:00:00', '2020-11-29 19:30:00',
               '2020-11-29 21:00:00', '2020-11-29 22:30:00',
               '2020-11-30 00:00:00', '2020-11-30 01:30:00',
               '2020-11-30 03:00:00', '2020-11-30 04:30:00',
               '2020-11-30 06:00:00', '2020-11-30 07:30:00',
               '2020-11-30 09:00:00', '2020-11-30 10:30:00',
               '2020-11-30 12:00:00', '2020-11-30 13:30:00',
               '2020-11-30 15:00:00', '2020-11-30 16:30:00',
               '2020-11-30 18:00:00', '2020-11-30 19:30:00',
               '2020-11-30 21:00:00', '2020-11-30 22:30:00',
               '2020-12-01 00:00:00', '2020-12-01 01:30:00',
               '2020-12-01 03:00:00', '2020-12-01 04:30:00',
               '2020-12-01 06:00:00', '2020-12-01 07:30:00',
               '2020-12-01 09:00:00', '2020-12-01 10:30:00',
               '2020-12-01 12:00:00', '2020-12-01 13:30:00',
               '2020-12-01 15:00:00', '2020-12-01 16:30:00',
               '2020-12-01 18:00:00', '2020-12-01 19:30:00',
               '2020-12-01 21:00:00', '2020-12-01 22:30:00',
               '2020-12-02 00:00:00'],
              dtype='datetime64[ns]', freq='90T')
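
As a quick sanity check on the 90-minute range, the sketch below rebuilds it with a fixed start date (an assumption for repeatability) and two equivalent ways of expressing the step: the '90min' alias and a pd.Timedelta.

```python
import pandas as pd

start = pd.Timestamp("2020-11-29")  # hypothetical fixed start, at midnight
end = start + pd.Timedelta(days=3)

# "90min" and a 90-minute Timedelta describe the same fixed step.
by_alias = pd.date_range(start=start, end=end, freq="90min")
by_timedelta = pd.date_range(start=start, end=end, freq=pd.Timedelta(minutes=90))

# 72 hours / 1.5 hours = 48 steps, plus the starting timestamp.
print(len(by_alias))                   # 49
print(by_alias.equals(by_timedelta))   # True
```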

Handling Time Zone

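Attach a time zone to naive timestamps using tz_localize. This labels the existing wall-clock times with a zone; it does not shift them.

# Localize the naive index to UTC
ts_utc = ts.tz_localize('UTC')
print(ts_utc.index.tz)
# Localize the same naive index to other time zones
ts_sgt = ts.tz_localize('Asia/Singapore')
print(ts_sgt.index.tz)
ts_london = ts.tz_localize('Europe/London')
print(ts_london.index.tz)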

To convert between time zones, call tz_convert on a timestamp that already has a time zone:

# Convert between timezone
utc_time = pd.Timestamp('2020-12-01 06:00', tz='utc')
print(f"UTC time {utc_time}")

shanghai_time = utc_time.tz_convert('Asia/Shanghai')
print(f"Shanghai time: {shanghai_time}")
UTC time 2020-12-01 06:00:00+00:00
Shanghai time: 2020-12-01 14:00:00+08:00
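
The difference between tz_localize and tz_convert is easy to mix up, so here is a small sketch on a single made-up timestamp. Localizing keeps the clock reading and attaches a zone; converting keeps the instant and changes the clock reading.

```python
import pandas as pd

naive = pd.Timestamp("2020-12-01 06:00")  # no time zone attached

# tz_localize attaches a zone; the wall-clock reading is unchanged.
as_utc = naive.tz_localize("UTC")
as_sgt = naive.tz_localize("Asia/Singapore")

# tz_convert needs an aware timestamp and changes the wall-clock reading,
# while still pointing at the same instant.
utc_in_sgt = as_utc.tz_convert("Asia/Singapore")

print(as_utc)                # 2020-12-01 06:00:00+00:00
print(as_sgt)                # 2020-12-01 06:00:00+08:00
print(utc_in_sgt)            # 2020-12-01 14:00:00+08:00
print(as_utc == utc_in_sgt)  # True: same instant
print(as_utc == as_sgt)      # False: eight hours apart
```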

Resampling

You can resample time series data to a different frequency. For example, the following daily series can be rolled up to monthly or quarterly totals.

dt_ranges = pd.date_range(datetime.now(), periods=365, freq='D', normalize=True) 
ts = pd.Series(np.random.randn(len(dt_ranges)), index=dt_ranges) 
ts
2020-11-30   -0.324954
2020-12-01   -1.149071
2020-12-02   -1.766691
2020-12-03    1.114326
2020-12-04    0.454338
                ...   
2021-11-25   -0.363843
2021-11-26    0.815752
2021-11-27    2.276535
2021-11-28    0.964677
2021-11-29    1.079887
Freq: D, Length: 365, dtype: float64

Resample to monthly:

# Roll up to month ('M' is deprecated since pandas 2.2 in favor of 'ME')
ts.resample('M').sum()
2020-11-30   -0.324954
2020-12-31    1.414100
2021-01-31    4.912475
2021-02-28   -1.587236
2021-03-31   -7.061592
2021-04-30    8.470704
2021-05-31   -3.302751
2021-06-30    9.050472
2021-07-31    3.256410
2021-08-31    7.495364
2021-09-30    4.506512
2021-10-31    6.600639
2021-11-30    0.966138
Freq: M, dtype: float64

Resample to quarterly:

# Roll up to quarter ('Q' is deprecated since pandas 2.2 in favor of 'QE')
ts.resample('Q').sum()
2020-12-31     1.089146
2021-03-31    -3.736353
2021-06-30    14.218426
2021-09-30    15.258287
2021-12-31     7.566778
Freq: Q-DEC, dtype: float64
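
With random data it is hard to tell whether the roll-up is right. The sketch below uses a series of ones instead, so each sum is simply a count of days. The offset objects (pd.offsets.MonthEnd, pd.offsets.QuarterEnd) are my substitution for the alias strings, chosen so the example does not depend on alias spelling. The data is made up for illustration.

```python
import numpy as np
import pandas as pd

# One value of 1.0 per day for the first quarter of 2021, so each sum is a day count.
days = pd.date_range("2021-01-01", "2021-03-31", freq="D")
ones = pd.Series(np.ones(len(days)), index=days)

# Month-end and quarter-end (December year end) bins, given as offset objects.
monthly = ones.resample(pd.offsets.MonthEnd()).sum()
quarterly = ones.resample(pd.offsets.QuarterEnd(startingMonth=12)).sum()

print(monthly.tolist())     # [31.0, 28.0, 31.0]
print(quarterly.tolist())   # [90.0]
```

The quarterly total equals the sum of its three monthly totals, which is the same relationship you can check in the random output above.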

Downsampling

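Downsampling reduces the frequency of the data, e.g. from one observation per minute to one per 5 minutes.

Generate data for every minute:

# 'T' is the minute alias (deprecated since pandas 2.2 in favor of 'min')
dt_ranges = pd.date_range(datetime.now(), periods=100, freq='T', normalize=True)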
ts = pd.Series(np.random.randn(len(dt_ranges)), index=dt_ranges)
ts
2020-11-30 00:00:00    0.088538
2020-11-30 00:01:00    0.632649
2020-11-30 00:02:00    1.060944
2020-11-30 00:03:00   -1.153388
2020-11-30 00:04:00   -0.363503
                         ...   
2020-11-30 01:35:00   -2.070810
2020-11-30 01:36:00    1.414299
2020-11-30 01:37:00   -0.337969
2020-11-30 01:38:00    0.076367
2020-11-30 01:39:00   -0.744389
Freq: T, Length: 100, dtype: float64

Downsample to 5-minute bins. With closed='right' and label='right', each bin includes its right edge and is labelled by it:

ts.resample('5min', closed='right', label='right').sum()
2020-11-30 00:00:00    0.088538
2020-11-30 00:05:00   -1.234836
2020-11-30 00:10:00   -0.531198
2020-11-30 00:15:00   -0.723730
2020-11-30 00:20:00    2.609681
2020-11-30 00:25:00    4.472404
2020-11-30 00:30:00   -1.870135
2020-11-30 00:35:00   -1.153896
2020-11-30 00:40:00    1.737836
2020-11-30 00:45:00    1.967800
2020-11-30 00:50:00   -1.921246
2020-11-30 00:55:00    1.024230
2020-11-30 01:00:00   -1.191950
2020-11-30 01:05:00   -0.739175
2020-11-30 01:10:00    0.383030
2020-11-30 01:15:00    1.728960
2020-11-30 01:20:00   -1.539351
2020-11-30 01:25:00    1.180227
2020-11-30 01:30:00   -2.921451
2020-11-30 01:35:00   -0.144566
2020-11-30 01:40:00    0.408309
Freq: 5T, dtype: float64
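
Notice that the first bin in the output above contains only the 00:00:00 value. The following sketch, using made-up values 0 to 9, compares the default binning with closed='right', label='right' to show why.

```python
import numpy as np
import pandas as pd

minutes = pd.date_range("2020-11-30", periods=10, freq="min")
ts_min = pd.Series(np.arange(10), index=minutes)  # values 0..9, one per minute

# Default for "5min": bins are [00:00, 00:05) and [00:05, 00:10),
# each labelled by its left edge.
left = ts_min.resample("5min").sum()

# closed="right", label="right": bins are (23:55, 00:00], (00:00, 00:05], (00:05, 00:10],
# each labelled by its right edge, so the first bin holds only the 00:00 value.
right = ts_min.resample("5min", closed="right", label="right").sum()

print(left.tolist())    # [10, 35]
print(right.tolist())   # [0, 15, 30]
```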

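You can also aggregate each bin with ohlc(), which returns the open (first), high (maximum), low (minimum) and close (last) value of the bin: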

# OHLC resampling
ts.resample('5min').ohlc()
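
Here is a small sketch of what ohlc() returns, using ten made-up values at one-minute intervals so the result can be checked by eye.

```python
import pandas as pd

minutes = pd.date_range("2020-11-30", periods=10, freq="min")
prices = pd.Series([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], index=minutes)  # made-up values

# One row per 5-minute bin, with columns open, high, low, close.
bars = prices.resample("5min").ohlc()
print(bars)

# First bin (3, 1, 4, 1, 5): open 3, high 5, low 1, close 5.
# Second bin (9, 2, 6, 5, 3): open 9, high 9, low 2, close 3.
```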

Upsampling

Generate a monthly time series.

# 'M' is the month-end alias (deprecated since pandas 2.2 in favor of 'ME')
dt_ranges = pd.date_range(datetime.now(), periods=12, freq='M', normalize=True)
ts = pd.Series(np.random.randn(len(dt_ranges)), index=dt_ranges)
ts
2020-11-30    0.422532
2020-12-31    0.116354
2021-01-31   -0.723405
2021-02-28    0.659457
2021-03-31    0.535598
2021-04-30    2.306383
2021-05-31    1.431278
2021-06-30    0.269275
2021-07-31    0.375808
2021-08-31    0.816700
2021-09-30    0.227021
2021-10-31   -0.269146
Freq: M, dtype: float64

Upsample to daily with forward fill, so each new day repeats the most recent month-end value:

ts.resample('D').ffill()
2020-11-30    0.422532
2020-12-01    0.422532
2020-12-02    0.422532
2020-12-03    0.422532
2020-12-04    0.422532
                ...   
2021-10-27    0.227021
2021-10-28    0.227021
2021-10-29    0.227021
2021-10-30    0.227021
2021-10-31   -0.269146
Freq: D, Length: 336, dtype: float64
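
Forward fill is only one way to fill the gaps that upsampling creates. As a rough comparison, the sketch below upsamples three made-up observations and fills the gaps in a few different ways (asfreq, ffill, bfill and interpolate on the resampler).

```python
import pandas as pd

# Three observations two days apart (made-up values).
sparse = pd.Series(
    [1.0, 3.0, 5.0],
    index=pd.date_range("2021-01-01", periods=3, freq="2D"),
)

daily = sparse.resample("D")

gaps = daily.asfreq()         # new days are left as NaN
forward = daily.ffill()       # repeat the previous observation
backward = daily.bfill()      # use the next observation
linear = daily.interpolate()  # straight line between observations

print(gaps.tolist())      # [1.0, nan, 3.0, nan, 5.0]
print(forward.tolist())   # [1.0, 1.0, 3.0, 3.0, 5.0]
print(backward.tolist())  # [1.0, 3.0, 3.0, 5.0, 5.0]
print(linear.tolist())    # [1.0, 2.0, 3.0, 4.0, 5.0]
```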

Moving Average

Generate a daily time series covering 30 days.

dt_ranges = pd.date_range(datetime.now(), periods=30, freq='D', normalize=True)
ts = pd.Series(np.random.randint(150,200, len(dt_ranges)), index=dt_ranges)
ts
2020-11-30    161
2020-12-01    165
2020-12-02    174
2020-12-03    195
2020-12-04    156
2020-12-05    154
2020-12-06    173
2020-12-07    150
2020-12-08    176
2020-12-09    184
2020-12-10    153
2020-12-11    197
2020-12-12    177
2020-12-13    170
2020-12-14    183
2020-12-15    199
2020-12-16    160
2020-12-17    172
2020-12-18    154
2020-12-19    196
2020-12-20    152
2020-12-21    186
2020-12-22    173
2020-12-23    155
2020-12-24    176
2020-12-25    196
2020-12-26    195
2020-12-27    164
2020-12-28    172
2020-12-29    193
Freq: D, dtype: int64

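Plot the series together with its moving average and exponentially weighted moving average.

ts.plot()
# 5-day simple moving average
ts.rolling(5).mean().plot()
# Exponentially weighted moving average; the positional 5 is com (center of mass), so alpha = 1 / (1 + 5)
ts.ewm(5).mean().plot()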
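
To make the two averages less of a black box, here is a sketch that computes them on five made-up numbers and checks the exponentially weighted result against a hand-written recursion. Note the assumption: it passes adjust=False so the simple recursion applies, whereas the plot above uses the pandas default (adjust=True), which weights the early points slightly differently.

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Simple moving average over 3 points; the first two windows are incomplete, so NaN.
sma = values.rolling(3).mean()

# ewm(com=1) gives alpha = 1 / (1 + com) = 0.5. With adjust=False the result follows
# y[t] = (1 - alpha) * y[t-1] + alpha * x[t], starting from y[0] = x[0].
ewma = values.ewm(com=1, adjust=False).mean()

# Hand-rolled version of the same recursion.
alpha = 0.5
manual = [values.iloc[0]]
for x in values.iloc[1:]:
    manual.append((1 - alpha) * manual[-1] + alpha * x)

print(sma.tolist())    # [nan, nan, 2.0, 3.0, 4.0]
print(ewma.tolist())   # [1.0, 1.5, 2.25, 3.125, 4.0625]
```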

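Do also check out the following articles:

RPA and Web Scraping using Jupyter (https://alpha2phi.medium.com/rpa-and-web-scraping-using-jupyter-7a9e58b0da06)
Python — Categorical Data with Pandas (https://readmedium.com/python-categorical-data-with-pandas-ea2d1a6eda9b)
Image Classification: CLIP and ResNext (https://readmedium.com/image-classification-clip-and-resnext-982c2674d9b5)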
