Soner Yıldırım



5 Cool Ways to Enrich ML Models with Open Data for Free: An In-depth Review of Python Libraries

With code examples

Photo by Towfiqu barbhuiya on Unsplash

Machine learning algorithms are used by numerous businesses to solve a variety of forecasting problems. Predicting the values of time series data is a common task with clear potential to create business value. For example, a typical task in retail is to forecast sales in individual stores or in certain categories of goods. Another example is forecasting the demand for rail and air tickets to certain destinations.

All these forecasting problems are strongly tied to people's behavior, which is influenced by several factors such as weather, holidays, the state of the economy in the country, and global processes. These factors must be taken into account if you want to build a strong predictive model that produces accurate and robust results.

The number one requirement to create such a model is, of course, data. In this article, we will focus on the task of collecting data and explore 5 popular libraries that gather necessary data for given dates. Moreover, based on the obtained data, we will construct derived features and enrich the training set with them.

We will do an in-depth review of 5 of the most popular libraries that provide access to different data types:

  • 📚holidays — holidays in different countries
  • 📚yfinance — stock data from Yahoo Finance
  • 📚meteostat — weather data from weather stations around the world
  • 📚pandas-datareader — stock data and economic statistics from many sources around the world
  • 📚upgini — ready-made features based on many sources

All these libraries can be installed via pip:

pip install holidays yfinance meteostat pandas-datareader upgini

Creating the training data

Let’s imagine we need to solve a very typical problem: Predict the volume of sales for different categories of products in different stores across several countries. We will start with generating a dataset that resembles real sales statistics using Pandas.

1️⃣ The first step is to set a date range, list of countries, stores, and product categories. We will indicate the countries in the form of two-letter ISO codes since most of the aforementioned libraries work exactly with them.

import pandas as pd
start_date = pd.Timestamp("2015-01-01")
end_date = pd.Timestamp("2018-12-31")
countries = ["FI", "SE", "NO"]
stores = ["DataMart", "DataRama"]
products = ["Sticker", "Mug", "Hat"]

2️⃣ The second step is to create a dataset from all combinations of the listed attributes:

index = pd.MultiIndex.from_product(
 [pd.date_range(start_date, end_date), countries, stores, products],
 names=["date", "country", "store", "product"],
)
df = index.to_frame().reset_index(drop=True)
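To make the "all combinations" step concrete, here is a minimal sketch on a much smaller toy input. The three-day range and the shortened lists are assumptions chosen only to keep the output readable; the call pattern is the same as above.

```python
import pandas as pd

# Toy inputs (hypothetical, much smaller than the article's dataset).
dates = pd.date_range("2015-01-01", "2015-01-03")
toy_countries = ["FI", "SE"]
toy_products = ["Sticker", "Mug"]

toy_index = pd.MultiIndex.from_product(
    [dates, toy_countries, toy_products],
    names=["date", "country", "product"],
)
# Turn the index levels into ordinary columns.
toy_df = toy_index.to_frame().reset_index(drop=True)

# One row per combination: 3 dates * 2 countries * 2 products.
print(len(toy_df))
print(toy_df.head())
```

By the same arithmetic, the full dataset above has 1461 days × 3 countries × 2 stores × 3 products = 26,298 rows.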

3️⃣ We are ready to generate the sales values. To keep the numbers from being completely random, we will add seasonal and weekly trends, and average sales that vary by country, store, and product. Finally, we will add some random noise. In this step, we will be using functions from the math and random libraries of Python.

from math import pi, sin
from random import random, seed
country_factor = {"FI": 1, "SE": 2, "NO": 3}
store_factor = {"DataMart": 1, "DataRama": 2}
product_factor = {"Sticker": 1, "Mug": 2, "Hat": 3}
seed(0)
def fake_sells(date, country, store, product):
    return int(
        50
        * country_factor[country]
        * store_factor[store]
        * product_factor[product]
        * (1 + 0.25 * sin(2 * pi * date.day_of_year / 365))
        * (1 + 0.1 * sin(2 * pi * date.day_of_week / 7))
        * (1 + 0.4 * random())
    )
df["num_sold"] = df.apply(
    lambda r: fake_sells(
        r["date"], r["country"], r["store"], r["product"]
    ),
    axis="columns",
)

It is important to emphasize that the synthetic data doesn’t contain all the patterns occurring in reality but it will be sufficient to demonstrate the process of enrichment with external features.
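For larger datasets, the row-by-row apply can become slow. The sketch below shows a vectorized variant of the same formula for a single country/store/product combination. It is an illustration under assumptions: it uses NumPy's random generator instead of Python's random module, so it will not reproduce the exact numbers generated above, and the base level of 50 corresponds to all three factors being equal to 1.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2015-01-01", "2015-12-31")
rng = np.random.default_rng(0)  # assumption: NumPy generator, not random.seed

day_of_year = dates.dayofyear.to_numpy()
day_of_week = dates.dayofweek.to_numpy()

# Same multiplicative components as in fake_sells.
seasonal = 1 + 0.25 * np.sin(2 * np.pi * day_of_year / 365)
weekly = 1 + 0.1 * np.sin(2 * np.pi * day_of_week / 7)
noise = 1 + 0.4 * rng.random(len(dates))

base = 50  # all country/store/product factors taken as 1
num_sold = (base * seasonal * weekly * noise).astype(int)

print(num_sold[:7])
```

Because every component is bounded, the generated values always stay within a fixed range, which is a quick sanity check for this kind of synthetic data.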

✅ We have completed generating the data. Let’s take a look at the first 5 rows of the DataFrame:

df.head()
(image by author)

The first library we will cover is holidays.

📚holidays

It’s natural to expect that holidays have a significant impact on sales. The most popular Python library for getting holiday data is holidays, which allows you to get dates and names of holidays for 86 countries, including those 3 in our dataset.

❗A notable feature of this library is that the holiday schedule is hard coded directly into the library code. On the one hand, this allows you to use it without access to the Internet. However, keep in mind that you will need to update the library each time you want to update information about holidays.

The easiest way to use it is to simply add a column with the names of the holidays for each record. In order not to change the original dataset, we will work with its copy.

🖥️ Demo

import holidays
enriched_df = df.copy()
# Look up the holiday name (or None) for each row's country and date.
enriched_df["holiday_name"] = enriched_df.apply(
    lambda r: holidays.country_holidays(r["country"]).get(r["date"]),
    axis="columns",
)

Most machine learning libraries can’t use a column with string values directly as a feature. Let’s add a flag for the presence of a holiday for each day:

enriched_df["holiday"] = \
enriched_df["holiday_name"].notna().astype("float")

We will expand the holiday names into one-hot encoded features using the get_dummies function of Pandas:

enriched_df = pd.get_dummies(
    enriched_df, columns=["holiday_name"], prefix="holiday"
)
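The two transformations above can be checked on a tiny hand-made frame. The holiday names below are made up for illustration; the real library returns its own names for each country.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "date": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-12-25"]),
        # Hypothetical names; None means "no holiday on that day".
        "holiday_name": ["New Year", None, "Christmas"],
    }
)

# 1.0 if any holiday falls on the day, 0.0 otherwise.
toy["holiday"] = toy["holiday_name"].notna().astype("float")

# One indicator column per distinct holiday name; missing names get no column.
toy = pd.get_dummies(toy, columns=["holiday_name"], prefix="holiday")

print(toy)
```

The row without a holiday ends up with a zero flag and no active indicator column.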

The features obtained this way contain information on the exact days of the holidays. However, we also observe the effect of a holiday on the sales in the days before and after the holiday.

To construct such features, we need to collect information about holidays not only for the dates presented in the dataset, but also for the dates adjacent to them:

holidays_df = pd.DataFrame(
    [(
        date, 
        country, 
        holidays.country_holidays(country).get(date))
        for country in df["country"].unique()
        for date in pd.date_range(
            start_date - pd.Timedelta(3, "D"), 
            end_date + pd.Timedelta(14, "D")
        )],
    columns=["date", "country", "holiday_name"],
)

We will now transform the names of the holidays into binary features as we have done before:

holidays_df["holiday"] = \
holidays_df["holiday_name"].notna().astype("float")
holidays_df = pd.get_dummies(
    holidays_df, columns=["holiday_name"], prefix="holiday"
)

Let’s construct features that indicate the presence of a particular holiday in the previous 1 or 3 days and in the next 1, 3, 7 or 14 days:

def holiday_window(column, window_size, shift):
    # Rolling max computed per country, so windows never cross country
    # boundaries; transform keeps the row order of holidays_df.
    return holidays_df.groupby("country")[column].transform(
        lambda s: s.astype("float").rolling(window_size).max().shift(shift)
    )

holiday_columns = [
    column for column in holidays_df.columns
    if column not in ["date", "country"]
]
back_features_df = pd.DataFrame(
    {
        f"{column}_d{window_size}_back": holiday_window(
            column, window_size, 1
        )
        for column in holiday_columns
        for window_size in [1, 3]
    }
)
ahead_features_df = pd.DataFrame(
    {
        f"{column}_d{window_size}_ahead": holiday_window(
            column, window_size, -window_size
        )
        for column in holiday_columns
        for window_size in [1, 3, 7, 14]
    }
)
holidays_features_df = pd.concat(
    [
        holidays_df[["date", "country"]], 
        back_features_df, 
         ahead_features_df
    ],
    axis=1,
)
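To see which days each window covers, here is a small sketch on a single made-up week with one holiday on the fourth day. It uses a 3-day window in both directions.

```python
import pandas as pd

# One week of a holiday flag; the holiday falls on 2015-01-04 (made-up data).
flag = pd.Series(
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    index=pd.date_range("2015-01-01", periods=7),
)

# Was there a holiday in the previous 3 days (excluding today)?
back_3 = flag.rolling(3).max().shift(1)

# Will there be a holiday in the next 3 days (excluding today)?
ahead_3 = flag.rolling(3).max().shift(-3)

print(pd.DataFrame({"holiday": flag, "d3_back": back_3, "d3_ahead": ahead_3}))
```

The three days after the holiday get a positive "back" flag and the three days before it get a positive "ahead" flag. The NaN values at the edges are the reason the holiday calendar was collected for a few extra days on both sides of the date range.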

We can now enrich our dataset with these features:

enriched_df = enriched_df.merge(
    holidays_features_df, on=["date", "country"]
)
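One thing to keep in mind is that merge performs an inner join by default, so rows without a match in the feature table are silently dropped. The sketch below, on made-up data, shows a more defensive variant with a left join and a key uniqueness check. Whether you want this behavior depends on your pipeline, so treat it as an option rather than a requirement.

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "date": pd.to_datetime(["2015-01-01", "2015-01-01", "2015-01-02"]),
        "country": ["FI", "SE", "FI"],
        "num_sold": [10, 20, 30],
    }
)
# Hypothetical feature table that is missing 2015-01-02 for FI.
features = pd.DataFrame(
    {
        "date": pd.to_datetime(["2015-01-01", "2015-01-01"]),
        "country": ["FI", "SE"],
        "holiday_d1_ahead": [0.0, 1.0],
    }
)

merged = sales.merge(
    features,
    on=["date", "country"],
    how="left",               # keep every sales row
    validate="many_to_one",   # fail if feature keys are duplicated
)

print(merged)
```

With the left join, the unmatched row is kept and its feature value is NaN, which makes gaps in the external data visible instead of shrinking the training set.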

📚yfinance

Shopping behavior is also directly related to the state of the economy and finance in the country and the world. The most accessible and promptly updated indicators of this state are stock indices. To obtain this information, we will use the yfinance library.

To get the data, we need to define a set of indices and a time interval. For demonstration purposes, we will get only the main American and European stock indices (S&P500, NASDAQ, Dow Jones, STOXX), commodity prices (gold, silver, oil, gas) and currencies (dollar index, euro/dollar exchange rate).

🖥️ Demo

tickers = {
    "^GSPC": "snp500",
    "^IXIC": "nasdaq",
    "^DJI": "dow_jones",
    "^STOXX": "stoxx",
    "GC=F": "gold",
    "SI=F": "silver",
    "CL=F": "crude_oil",
    "NG=F": "natural_gas",
    "DX-Y.NYB": "usd",
    "EUR=X": "eur",
}

Later we will build features by aggregating data over long windows (up to a year), so let’s download the data for a correspondingly extended time period:

import yfinance as yf
start_date = pd.Timestamp("2015-01-01")
end_date = pd.Timestamp("2018-12-31")
forecast_shift = pd.Timedelta(7, "D")
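yfinance_df = yf.download(
    list(tickers),
    start=start_date - forecast_shift - pd.Timedelta(365 + 7, "D"),
    end=end_date,
    # Recent yfinance releases adjust prices by default and drop the
    # "Adj Close" column; request unadjusted data to keep it.
    auto_adjust=False,
)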

Unlike holidays, stock price data is not known in advance. We will predict sales a week ahead, which is why we have added a forecast shift of 7 days: the model must be fitted on features taken from the forecast date minus 7 days.

Several indicators are available for each trading day, such as the opening price or trading volume. We will keep only the adjusted closing price. Let’s also rename the columns for better readability:

yfinance_df = yfinance_df["Adj Close"]
yfinance_df = yfinance_df.rename(columns=tickers)
yfinance_df.index.name = "date"

There is no stock data for non-working days. Let’s fill them with the previous available values:

# fillna(method="ffill") is deprecated in recent pandas; ffill() is equivalent.
yfinance_df = yfinance_df.ffill()
yfinance_df = yfinance_df.asfreq("D", method="ffill")
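The effect of the daily forward fill is easiest to see on a toy series. The prices below are made up; the dates are a Friday and the following Monday.

```python
import pandas as pd

# Friday and Monday quotes only; the weekend has no trading data.
prices = pd.Series(
    [100.0, 103.0],
    index=pd.to_datetime(["2015-01-02", "2015-01-05"]),
    name="snp500",
)

# Reindex to calendar days and carry the last known value forward.
daily = prices.asfreq("D", method="ffill")

print(daily)
```

Saturday and Sunday receive Friday's value, so every calendar date in the sales data can later find a matching row.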

Only data known in advance can be used to build forecasts for the future. Let’s shift the dates to our forecasting horizon:

yfinance_df.index += forecast_shift
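The following sketch illustrates what the index shift achieves, using made-up gold prices: after the shift, a sales row dated January 8 is joined with the price that was actually known on January 1.

```python
import pandas as pd

forecast_shift = pd.Timedelta(7, "D")

# Made-up prices indexed by the day they were observed.
prices = pd.DataFrame(
    {"gold": [1200.0, 1210.0, 1190.0]},
    index=pd.date_range("2015-01-01", periods=3, name="date"),
)

# Relabel each observation with the date it will be used to forecast.
features = prices.copy()
features.index = features.index + forecast_shift

sales = pd.DataFrame(
    {"date": pd.to_datetime(["2015-01-08"]), "num_sold": [42]}
)
joined = sales.merge(features, on="date", how="left")

print(joined)
```

This keeps future information out of the training set: no row ever sees a price from less than 7 days before its own date.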

In addition to the raw values, let’s also create some derivatives:

  • The ratio of the current value to the average for the past 7 days
  • The ratio of the average for 7 days to the average for the year
  • The ratio of the average for 7 days to the same period of the last year

yfinance_features_df = pd.concat(
    [
        yfinance_df,
        (
            yfinance_df / yfinance_df.rolling(7).mean()
        ).add_suffix("_1d_to_7d"),
        (
            yfinance_df.rolling(7).mean() / 
            yfinance_df.rolling(365).mean()
        ).add_suffix("_7d_to_1y"),
        (
            yfinance_df.rolling(7).mean() /  
            yfinance_df.shift(365).rolling(7).mean()
        ).add_suffix("_7d_to_7d_1y_shift"),
    ],
    axis=1,
).dropna()
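Here is the first of these ratios on a toy series, with a 3-step window standing in for the 7-day one so that the numbers are easy to verify by hand.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], name="gold")
window = 3  # short stand-in for the 7-day window

# Current value divided by the mean of the last `window` values
# (the window includes the current value).
ratio = (s / s.rolling(window).mean()).rename("gold_1d_to_3d")

print(ratio)
```

The first window - 1 values are NaN because the window is not yet full. The same happens with the yearly windows, which is why the data was downloaded for an extra year and why the final dropna call is needed.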

Finally, we will enrich the original dataset with these features:

enriched_df = df.merge(yfinance_features_df, on="date")

The enriched dataset has 45 features. You can view them using the head method or the columns attribute of the DataFrame.

📚meteostat

The weather conditions certainly affect the sales of commodities in many categories, from soft drinks to heating devices. The meteostat library allows you to get historical data from many weather stations around the world.

Among the available indicators are the average, maximum and minimum temperature per day, atmospheric pressure, the amount of precipitation and wind speed.

Since we don’t have exact coordinates of stores, we’ll take the average values for each country. For these purposes, the library provides the ability to request a list of all weather stations in the country, as well as an easy-to-use method for aggregating their readings.

To obtain more stable features, we will keep only stations that have a continuous history of daily observations during the time period we are interested in.

🖥️ Demo

from meteostat import Daily, Stations
 
meteostat_dfs = []
 
for country in df["country"].unique():
    stations = Stations().region(country).fetch()
    stations = stations[
        (stations["daily_start"] <= start_date - forecast_shift)
        & (stations["daily_end"] >= end_date)
    ]
    meteostat_df = (
        Daily(stations, start_date - forecast_shift, end_date)
        .normalize()
        .aggregate(spatial=True)
        .fetch()
    )
    meteostat_df.index.name = "date"
    meteostat_df["country"] = country
    meteostat_dfs.append(meteostat_df)
 
meteostat_features_df = pd.concat(meteostat_dfs)

❗Unfortunately, the library incorrectly calculates the average of wind direction when aggregating over several stations. In addition, some of the columns may contain a lot of missing values. Let’s keep only well-filled columns:

meteostat_features_df = meteostat_features_df[
    [
        "country", 
        "tavg", 
        "tmin", 
        "tmax", 
        "prcp", 
        "snow", 
        "wspd", 
        "pres"
    ]
]
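Wind direction is an angle, which is why it needs special care when averaging. The sketch below is not the library's code; it only illustrates, on two made-up readings, how a plain arithmetic mean of angles can point in the opposite direction, and how a circular mean based on unit vectors avoids that.

```python
from math import atan2, cos, degrees, radians, sin


def circular_mean_deg(angles):
    """Mean direction of angles given in degrees, wrapped to 0-360."""
    # Average the unit vectors instead of the raw angle values.
    x = sum(cos(radians(a)) for a in angles)
    y = sum(sin(radians(a)) for a in angles)
    return degrees(atan2(y, x)) % 360


# Two hypothetical stations reporting roughly northerly wind.
readings = [340.0, 40.0]

naive = sum(readings) / len(readings)
circular = circular_mean_deg(readings)

print(naive)
print(round(circular, 6))
```

The naive mean points almost due south, while the circular mean stays close to north, between the two readings. If you need wind direction as a feature, computing it this way from per-station data is one option.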

We will add a date shift for our forecasting horizon:

meteostat_features_df.index += forecast_shift

Finally, we can enrich the original dataset with weather features:

enriched_df = df.merge(
    meteostat_features_df, 
    on=["date", "country"]
)

📚pandas-datareader

Pandas-datareader provides an interface for accessing a large number of data sources. Currently, 16 active sources are supported. 11 of them contain mainly stock and financial data and the remaining 5 contain various economic statistics.

❗Some sources limit the amount of data available for free. Be prepared to register and obtain personal API keys.

We have already used the yfinance library for stock price data, so we turn to economic statistics and get the Consumer Price Index (CPI). It indicates the level of prices for different categories of consumer goods and services, which directly affects people’s purchasing power.

CPI values are updated monthly. In addition to the raw value of this indicator, the change relative to the previous month and to the corresponding month of the previous year is also available.

❗CPI can be obtained from several sources but it’s not very easy. We first have to look for it on each site. This is the downside of using the pandas-datareader library: it doesn’t provide a single searchable catalog of available datasets. In this case, it turned out that EconDB would be a convenient source for CPI.

Let’s construct the request by specifying the name of the dataset (IMF_CPI), the country filter, the data frequency (M for monthly), and the date range. We should take into account that data for any month becomes available only in the following month. Therefore, we will extend the range by 1 additional month.

🖥️ Demo

import pandas_datareader.data as web

# Start one month earlier, at the first day of that month.
cpi_start = start_date - forecast_shift - pd.DateOffset(months=1, day=1)
datareader_df = web.DataReader(
    "&".join(
        [
            "dataset=IMF_CPI",
            f"REF_AREA=[{','.join(countries)}]",
            "FREQ=[M]",
            f"from={cpi_start:%Y-%m-%d}",
            f"to={end_date:%Y-%m-%d}",
        ]
    ),
    "econdb",
)

The downloaded dataset contains only entries for the first day of each month. To be able to join it on any date, we will fill in the rest of the dates of the month with the same value. We will also shift the dates according to the data availability at the time of the forecast:

datareader_df.index += pd.DateOffset(months=1)
datareader_df = datareader_df.asfreq("D", method="ffill")
datareader_df.index += forecast_shift
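The first two of these steps can be traced on a toy monthly series. The CPI values below are made up, and the forecast shift is left out to keep the dates easy to follow.

```python
import pandas as pd

# Made-up monthly CPI values, labeled with the first day of each month.
cpi = pd.Series(
    [100.0, 101.0, 102.0],
    index=pd.to_datetime(["2018-01-01", "2018-02-01", "2018-03-01"]),
    name="cpi",
)

# A month's value is only published during the following month.
cpi.index = cpi.index + pd.DateOffset(months=1)

# Repeat each monthly value for every day until the next one arrives.
daily = cpi.asfreq("D", method="ffill")

print(daily.loc[pd.Timestamp("2018-02-15")])
print(len(daily))
```

Every day in February now carries the January value. Note that asfreq stops at the last labeled date, so the final month contributes only a single day; this is one reason to request a date range that extends beyond the period you need.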

Let’s move countries from the columns to the rows using the stack method and rename the columns:

datareader_features_df = datareader_df.stack("Reference Area")
datareader_features_df = datareader_features_df.droplevel(
   ["Frequency", "Scale"], axis="columns"
)
datareader_features_df = datareader_features_df.rename_axis(
   ("date", "country")
)

Finally, let’s replace the country names with ISO codes:

country_codes = {"Finland": "FI", "Sweden": "SE", "Norway": "NO"}
datareader_features_df = datareader_features_df.reset_index()
datareader_features_df["country"] = \
datareader_features_df["country"].replace(
    country_codes
)
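The reshaping from one column per country to one row per date and country can be sketched on a small made-up table. For simplicity, this sketch uses melt on single-level columns rather than stack on the multi-level columns returned by the data source, so it shows the idea rather than the exact calls above.

```python
import pandas as pd

# Made-up CPI values in wide form: one column per country.
wide = pd.DataFrame(
    {"Finland": [100.0, 100.5], "Sweden": [200.0, 201.0]},
    index=pd.to_datetime(["2018-01-01", "2018-02-01"]),
)
wide.index.name = "date"

# Wide to long: one row per (date, country) pair.
long = wide.reset_index().melt(
    id_vars="date", var_name="country", value_name="cpi"
)

# Replace country names with the ISO codes used in the sales data.
long["country"] = long["country"].replace({"Finland": "FI", "Sweden": "SE"})

print(long)
```

The resulting frame has the same date and country keys as the sales dataset, which is what makes the merge possible.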

We can now enrich the original dataset:

enriched_df = df.merge(
    datareader_features_df, 
    on=["date", "country"]
)

The enriched dataset has 131 features. You can view them using the head method or the columns attribute of the DataFrame.

❗Here is a list of things you should keep in mind before starting to work with pandas-datareader:

  • The data loading speed is highly dependent on the source and varies from seconds to minutes. It is sometimes interrupted due to a timeout.
  • You need to deal with each data source separately: what attributes are available, whether registration is required, query syntax, etc.
  • No guarantees: the library is a connector and is not responsible for the availability of data.
  • It’s impossible to get different data for comparison and evaluation with one request. You need to query each indicator separately.

📚upgini

Upgini aggregates a lot of open data sources with a wide range of keys and their combinations (dates, countries, postal codes, IP addresses, phone numbers, emails). In this review, we will focus only on date and country. For these keys, Upgini provides access to all the data types discussed above, as well as the schedule of major international and national events such as the Olympic Games and elections.

The main differences between using Upgini and collecting features from separate sources on your own are:

  • All sources are requested simultaneously with a single request.
  • Joining by keys occurs automatically.
  • The result of enrichment is not raw data, but normalized, cleaned up and transformed features, ready for use in any model.
  • The service returns only those features that contain useful information for the specified target prediction.

Thanks to these advantages, using Upgini comes down to just a few lines of code.

🖥️ Demo

from upgini import FeaturesEnricher, SearchKey
X = df.drop(columns="num_sold")
y = df["num_sold"]
enricher = FeaturesEnricher(
    search_keys={
        "date": SearchKey.DATE,
        "country": SearchKey.COUNTRY,
    }
)
enriched_df = enricher.fit_transform(X, y, keep_input=True)

As a result, we get the original dataset enriched with more than a hundred features from different sources. Each returned feature passes a selection stage. The selection criterion is the feature’s contribution to the prediction of the target variable that we passed to the fit call. Additionally, SHAP values are returned for each feature and can be used for further independent feature selection.

As an outcome, with a single request, Upgini performs many steps of a standard ML pipeline at once: data extraction, data cleaning, feature engineering, and feature selection.

Comparison and conclusions

holidays

😊 Pros

  • Works without internet access
  • Quick responses

👎 Cons

  • Historical holiday schedules are often incorrect
  • No information about ad hoc holiday transfers
  • No information on other non-working days, including one-time events such as COVID lockdowns
  • No holiday categorization or matching of holiday names in different languages
  • No information about holidays that are working days
  • Limited country coverage

💥 Usage recommendations

Before using this library, make sure that the countries you are interested in are included in the list of supported ones. If you need information about the full working / weekend schedule, then you’ll have to add the necessary information yourself.

The library can be used if you only need the dates and names of holidays and your requirements for data completeness and accuracy are not very high. If you need consistent historical information on holiday schedules and holiday transfers, for example to train machine learning algorithms, it’s better to use other tools.
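
Adding the missing schedule information yourself can be as simple as keeping your own sets of exceptions next to the holiday list. Here is a minimal sketch using only the standard library. The function name, the parameter names and all the dates are hypothetical; `public_holidays` can be any container of dates.

```python
from datetime import date


def is_working_day(
    day: date,
    public_holidays,
    extra_days_off=frozenset(),
    working_weekends=frozenset(),
) -> bool:
    """Combine a holiday list with manually maintained schedule exceptions."""
    if day in working_weekends:  # weekend days declared working (transfers)
        return True
    if day in public_holidays or day in extra_days_off:
        return False
    return day.weekday() < 5  # Monday=0 ... Friday=4


# Made-up example data for one country.
public_holidays = {date(2021, 1, 1)}   # New Year's Day (a Friday)
extra_days_off = {date(2021, 1, 4)}    # hypothetical one-time day off (a Monday)
working_weekends = {date(2021, 1, 9)}  # hypothetical working Saturday

for d in (date(2021, 1, 1), date(2021, 1, 4), date(2021, 1, 5), date(2021, 1, 9)):
    print(d, is_working_day(d, public_holidays, extra_days_off, working_weekends))
```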

yfinance

😊 Pros

  • Quick responses
  • Intraday data
  • Access to most of the world’s exchanges
  • Data for almost 100 years

👎 Cons

  • Use of Yahoo Finance data for commercial purposes is prohibited
  • No API access guarantee
  • Intraday data is only available for the last 60 days
  • Impossible to get a full list of available tickers
  • Impossible to download all the instruments from a certain exchange or index without enumerating each of them

💥 Usage recommendations

The library can be used for research or educational purposes only. If you are engaged in trading strategies or other tasks related to finance, this type of data will be useful for your purposes. But in any case, you’ll have to transform raw data into features suitable for use in machine learning models.
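
As an illustration of that last point, here is a minimal sketch that turns a raw price series into a few model-ready features. The prices are made up, and the choice of features (daily return, 3-day moving average, distance from that average) is only an example, not a recommendation.

```python
import pandas as pd

# Toy closing prices; a real series would come from the downloaded data.
prices = pd.DataFrame(
    {"Close": [100.0, 102.0, 101.0, 105.0, 107.0]},
    index=pd.date_range("2022-01-03", periods=5, freq="D"),
)

features = pd.DataFrame(index=prices.index)
features["return_1d"] = prices["Close"].pct_change()
features["sma_3"] = prices["Close"].rolling(3).mean()
features["close_to_sma_3"] = prices["Close"] / features["sma_3"] - 1

# Shift by one row so each date only sees information from earlier dates,
# which avoids leaking the same day's close into the features.
features = features.shift(1)
print(features)
```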

meteostat

😊 Pros

  • No restrictions on the use of data, including commercial purposes (except for the distribution of the original data without modifications)
  • Global coverage
  • Over 200 years of data history
  • Local caching to disk to optimize repeated requests

👎 Cons

  • Even if you specify a short time interval, the history of observations for the weather station is downloaded entirely, which affects the speed of requests
  • Weather stations are unevenly distributed and the accuracy of data for given coordinates will vary greatly in different parts of the world
  • Lots of missing data; some of the gaps are filled with simulated data
  • Calculation of the average wind direction when averaging data from several weather stations is implemented incorrectly
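
Wind direction is an angle, so a plain arithmetic mean gives misleading results near north: averaging 350° and 10° that way yields 180° instead of 0°. A common alternative is the circular mean, sketched below with the standard library. This is a generic illustration, not meteostat’s code, and the function name is hypothetical.

```python
import math
from typing import Optional, Sequence


def mean_wind_direction(directions_deg: Sequence[float]) -> Optional[float]:
    """Circular mean of directions in degrees; None if the mean is undefined."""
    if not directions_deg:
        raise ValueError("need at least one direction")
    # Average the unit vectors instead of the raw angles.
    sin_sum = sum(math.sin(math.radians(d)) for d in directions_deg)
    cos_sum = sum(math.cos(math.radians(d)) for d in directions_deg)
    if math.hypot(sin_sum, cos_sum) < 1e-9:
        return None  # e.g. exactly opposite directions cancel out
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0


north = mean_wind_direction([350.0, 10.0])       # close to north (0 or 360)
south_east = mean_wind_direction([90.0, 180.0])  # close to 135
undefined = mean_wind_direction([0.0, 180.0])    # opposite winds: None
print(north, south_east, undefined)
```

This version weights every station equally. If wind speed is available, weighting each unit vector by speed is a natural extension.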

💥 Usage recommendations

The library provides maximum data coverage both geographically and chronologically, but the quality suffers due to the large number of gaps in the data. Be careful when choosing the nearest weather station to the coordinates you need. It may be better to choose a more distant weather station with better data completeness.
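
One way to express this trade-off is to filter stations by data completeness first and only then pick the closest one. The sketch below assumes you have already computed, for each candidate station, its distance and the share of non-missing observations. The `Station` class, its fields and the thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Station:
    station_id: str
    distance_km: float   # distance from the point of interest
    completeness: float  # share of non-missing observations, 0..1


def pick_station(
    stations: List[Station],
    min_completeness: float = 0.9,
    max_distance_km: float = 100.0,
) -> Optional[Station]:
    """Return the closest station that is complete enough, or None."""
    candidates = [
        s
        for s in stations
        if s.completeness >= min_completeness and s.distance_km <= max_distance_km
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.distance_km)


stations = [
    Station("A", distance_km=5.0, completeness=0.40),   # nearest, but sparse
    Station("B", distance_km=30.0, completeness=0.97),
    Station("C", distance_km=60.0, completeness=0.99),
]
best = pick_station(stations)
print(best)
```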

There are no weather forecasts in meteostat and the data is updated with a delay, so if you need the most recent data or forecasts for upcoming dates, you should look for another data source.

pandas-datareader

😊 Pros

  • Wide range of sources including data aggregators
  • Access to high-quality commercial stock data sources

👎 Cons

  • No unified API for accessing different sources. To make certain queries you’ll have to study the source sites
  • No guarantee that the source will work
  • Different licensing restrictions on the use of data for different sources, up to paid access for some of them
  • Impossible to get a list of available data either for individual sources or for all sources at once (you need to study the source sites yourself)
  • Many data columns are duplicated in different sources, but with different completeness, updating frequency, and depth of available history
  • Impossible to request multiple datasets with a single query
  • Download speed varies greatly between sources, up to the inability to get some data due to timeouts

💥 Usage recommendations

This library provides the widest coverage of financial, economic, and demographic statistics at the country level. This kind of data can be very useful for building long-term economic models. But for typical business problems, country-level granularity and monthly or annual measurement frequency are not enough. If you need to plan the supply of goods to different stores for the next two weeks, it’s better to look for more localized and more frequent data than global macroeconomic trends.

You should also take into account the difficulty in finding relevant data. If you don’t know exactly what kind of information you need, then pandas-datareader probably won’t work for you.

upgini

😊 Pros

  • Wide variety of data types
  • Localization of data down to a country level or postal code where possible
  • A lot of extra data keys for enrichment — “country”, “zip code”, “phone number”, “hashed email”, “IP address”
  • Ready-made features for ML models
  • Simple Scikit-Learn compatible API to get all the data with a single request
  • Automatic selection of relevant features for the target variable
  • SHAP values for collected features to facilitate further selection steps
  • Convenient enrichment of new data batches with the saved search result

👎 Cons

  • Relatively slow responses (several minutes)
  • You can’t choose the types of requested data or specific features yourself, all of them are automatically selected and ranked
  • Features are always selected for a specific target variable, so the enrichment results can’t be reused for another task
  • There’s no detailed description of the features, which makes the interpretation difficult for some of them

💥 Usage recommendations

If your goal is to train machine learning models, then this library is just perfect, thanks to ready-made relevant features and transparent integration into Scikit-Learn.


Thank you for reading. Please let me know if you have any feedback.
