Building a Budget News-Based Algorithmic Trader? Well, Then You Need Hard-to-Find Data
My story of building an algorithmic trader for $0 by analyzing free APIs, datasets, and web scrapers. Part 1: APIs
Algorithmic trading, using either news or stock signals, has blossomed in popularity over recent years. An entire industry has developed, from giants like Bloomberg and Webhose to thousands of smaller companies, all vying to offer the fastest, most accurate, and most expansive news coverage. The problem with most of these services, however, is that they target large firms and therefore often cost an individual hundreds of dollars, pricing themselves out of most people’s budgets or, at the very least, eating away at potential returns. For this reason, I decided to piece together the most realistic solution for at-home developers. After exploring dozens of sources, I narrowed my search down to the 10 most viable options. None of these 10 free sources is a home run for algorithmic trading on its own, but it is quite realistic to think that a combination of them can get most people to a great place! In Part 1, I cover the first 5, which are all APIs. Part 2, on datasets and web scrapers, can also be accessed here!
APIs
APIs are the favorite way for most platforms to deliver consistent news data programmatically to users. A user sends a request to an endpoint, for example asking for the 10 most recent articles that have to do with Apple, and receives a response containing the matching data.
Pros:
Fast Programmatic Access — Most APIs will return the requested data within a couple hundred milliseconds, allowing users to quickly gather data
Tailored — APIs tend to have good documentation and if built well are able to return exactly what the user is looking for. For example, I may be able to tell the API I want exactly this many articles from this news source from these dates. Other methods such as bulk downloads may require far more extraction to get to that level of granularity
Real-time — Most news APIs serve real-time data, so when you send a request, you get the most recent information available. This can be essential for adapting to changing market conditions.
Cons:
Usage Restrictions — Most APIs use a pay-per-use system. This means you will likely be paying on a recurring basis if you use anything above the free tier, which may not meet the needs of a complex system. Many also throttle the number of requests you can make over a period of time.
Lack of Historical Data — Because news APIs are built around making the most recent news accessible, their backlogs of historical data tend to be shallow. This can be a problem when ML algorithms need years of historical data before they begin to perform well.
Because the positives of APIs are so overwhelming, 5 of the 10 best sources I found are APIs. Below, I expand on each of them!
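Since many APIs throttle how often you can call them, it can help to track your own usage on the client side so the cap never catches you by surprise. Here is a minimal sketch of a daily quota counter. The class name `DailyQuota` and its method are my own illustration and not part of any API discussed here, and it assumes the cap resets on a fixed 24-hour window, which may not match how a given provider counts.

```python
import time


class DailyQuota:
    """Counts calls against a fixed daily cap (hypothetical helper)."""

    SECONDS_PER_DAY = 86_400

    def __init__(self, max_calls_per_day, clock=time.time):
        self.max_calls = max_calls_per_day
        self.clock = clock              # injectable so the class is easy to test
        self.window_start = clock()
        self.calls = 0

    def try_acquire(self):
        """Return True and count the call if there is quota left, else False."""
        now = self.clock()
        # Assumption: the quota resets 24 hours after the window opened.
        if now - self.window_start >= self.SECONDS_PER_DAY:
            self.window_start = now
            self.calls = 0
        if self.calls >= self.max_calls:
            return False
        self.calls += 1
        return True
```

Calling `try_acquire()` before each request lets the rest of the program skip or postpone a call instead of burning through the free tier.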
News API
Pros:
High usage cap — 500 requests per day is a fantastic free tier, and the average developer will never need to worry about running into that cap!
Accurate articles with tons of metadata — Requests about a topic will often return hundreds of articles per day and contain information such as source, author, title, description, publish date, and article content. This provides a lot of information per request.
Cons:
Built-in delay — The free tier includes a 15-minute delay on all returned articles. This could be a deal-breaker or not matter at all, depending on the type of trading.
Mediocre Backlog — The free tier backlog goes back 1 month (the paid tier goes back 24 months), meaning finding that coveted historical data may not be possible.
Prohibitive cost beyond free tier — If you were willing to shell out $10 a month for higher usage and premium support, that is not an option with News API. Their next plan up, which includes 250,000 requests and no news delay, costs $449, well outside the range of most developers.
Overall:
When you start googling for financial news APIs, one of the first results, and conveniently, one of the best, is News API. It is simple to use and has a great free tier for the average developer! If you can fill your historic data by other means and don’t mind the delay, News API is hard to beat!
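To give a feel for what a request looks like, here is a rough sketch that builds a News API query URL for a topic over a recent date range. It assumes the v2 `everything` endpoint and the parameter names `q`, `from`, `to`, `sortBy`, `pageSize`, and `apiKey` as I understand them from the News API documentation, so double-check them against the current docs. The function name and the placeholder key are my own, and the sketch only builds the URL rather than sending it.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Assumed endpoint; verify against the current News API documentation.
NEWSAPI_EVERYTHING = "https://newsapi.org/v2/everything"


def build_everything_url(query, api_key, days_back=7, page_size=100, today=None):
    """Build a query URL for articles about `query` from the last `days_back` days."""
    today = today or date.today()
    params = {
        "q": query,
        "from": (today - timedelta(days=days_back)).isoformat(),
        "to": today.isoformat(),
        "sortBy": "publishedAt",   # newest articles first
        "pageSize": page_size,
        "apiKey": api_key,
    }
    return f"{NEWSAPI_EVERYTHING}?{urlencode(params)}"


# "YOUR_API_KEY" is a placeholder, not a working key.
print(build_everything_url("Apple", "YOUR_API_KEY"))
```

The resulting URL can be passed to whatever HTTP client you prefer. On the free tier, `days_back` would need to stay within the one-month backlog mentioned above.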
Bing News Search (via Rapid API)
Pros:
Real-Time — Given a query, returns fast, real-time news for a specific search topic or news category
Great pricing beyond free tier — Each request above the 100 requests/day limit costs only $0.005. This gives a lot of flexibility if more requests are needed on some days than on others!
Cons:
Nonexistent Backlog — There is no way to specify a time range for your search results. This makes finding data from even yesterday very difficult, so this API is poorly suited to retrieving historical data.
Overall:
I consider this a very average API. The metadata returned is good but not great, and the free tier of 100 requests/day is middle of the road as well. It does, however, provide the invaluable upside of being real-time, at the cost of having no historical results. It also comes with the reliability of being maintained by Microsoft. If all you care about is reliable real-time results, this API gets the job done for its price range!
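To see what going over the free tier actually costs, here is a quick sketch of the arithmetic using the numbers quoted for this API (100 free requests per day, $0.005 per extra request). The function name is my own, and it assumes overage is billed per request with no rounding or minimum charge, which I have not verified.

```python
from decimal import Decimal


def overage_cost(requests_made, free_quota=100, price_per_extra=Decimal("0.005")):
    """Dollar cost of the requests that exceed the free daily quota."""
    extra = max(0, requests_made - free_quota)
    # Decimal avoids floating-point noise when multiplying small prices.
    return extra * price_per_extra


print(overage_cost(80))    # a day under the cap
print(overage_cost(150))   # a day with 50 extra requests
```

Under these assumptions, a busy day with 150 requests costs 50 × $0.005 = $0.25, which is why the pay-as-you-go pricing feels so forgiving.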
Contextual Web Search (via Rapid API)
Pros:
High usage cap — 10,000 requests per month is a fantastic free tier. On top of that, once the limit is hit, requests only cost $0.0005 each, 1/10 the cost of Bing News Search requests!
Pretty good backlog — The news search API comes with optional to and from date parameters, and I was able to access data as far back as 1 year, which is pretty excellent for a freemium API. It may not be enough data for complex traders, but provides a reasonable start!
Real-Time — The search will automatically return the newest results unless otherwise specified, making real-time data easy to obtain!
Cons:
Relevancy Concerns — While testing this API, if I searched for a specific ticker symbol, most of the results would be relevant, but the occasional irrelevant result would sneak in. For example, when searching for news related to AAPL, one of the articles I received was entitled, “Is Mariah Carey Too Loose With Her Kids?” Clearly this is not the data anyone would want trickling into their ML algorithms!
High Latency — The average latency reported by the API is around 4000ms, and in my personal testing even that seemed generous. This can limit how much data you are able to collect under heavy usage.
Reliability Issues — The reported reliability of the API is 95%, which is not bad for a free service (though still lower than most). In my testing, however, the number of random failures was noticeable, and this is not something I consistently experienced with any of the other APIs.
Overall:
The standout feature of this API is the ability to selectively collect news data going back a year, and the free tier provides enough usage to easily fill in a backlog (each call can return up to 50 articles). It can also provide real-time results, making this API look like a winner at surface level. However, beyond the surface, this API has reliability and relevancy flaws that take a bit of the shine off. If you do not mind sifting through possibly messy data and implementing failure checks, this API may be the best one-size-fits-all solution!
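The two workarounds mentioned here, failure checks and sifting out irrelevant results, can be sketched in a few lines. This is only one possible approach, not something the API provides: the function names are mine, the retry simply doubles its wait after each failure, and the relevance check assumes each article is a dict with `title` and `description` fields, which may not match the real response shape.

```python
import time


def call_with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise                      # out of attempts, surface the error
            sleep(base_delay * 2 ** attempt)


def is_relevant(article, keywords):
    """Keep an article only if its title or description mentions a keyword."""
    # Assumed field names; adjust to whatever the API actually returns.
    text = f"{article.get('title', '')} {article.get('description', '')}".lower()
    return any(keyword.lower() in text for keyword in keywords)


articles = [
    {"title": "Apple unveils new iPhone", "description": "AAPL shares rise"},
    {"title": "Is Mariah Carey Too Loose With Her Kids?", "description": ""},
]
kept = [a for a in articles if is_relevant(a, ["apple", "aapl"])]
print([a["title"] for a in kept])
```

A plain keyword match like this is crude and will miss articles that discuss a company without naming it, but it is enough to keep the most obvious noise out of a dataset.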
Bloomberg Market and Financial News (via Rapid API)
Pros:
Quality News with ample depth and metadata — The API is broken down into two main features. With the first, you input a ticker, and the 10 most recent articles are returned with their titles and unique keys. With the second, you input one of these unique keys and retrieve the full article and its metadata. The metadata is very in-depth and includes tags, associated topics, the article broken into components, and much more.
News is specifically sorted by ticker — Most of the other APIs are web searches for news. Bloomberg specifically tags each article only with its related tickers. This means the news received is always relevant to the desired ticker!
Real-time — Provides the most recent articles within milliseconds of when they were made public!
Good pricing beyond free tier — The pro tier is $10/month, which is a very reasonable rate for a developer and includes 10,000 requests per month, plus a reasonable $0.002/request rate beyond the cap.
Cons:
Free Tier is insufficient — The free tier only includes 500 requests per month, which breaks down to fewer than 17 per day. For most systems, this is far from sufficient, especially considering it takes 2 requests with this API to get to the in-depth metadata for any individual article.
Nonexistent backlog — Bloomberg excels in providing fast-updating financial news. The tradeoff is that there is currently no way to retrieve historical data.
Overall:
Bloomberg has been a leader in financial news for years, and that is well reflected within this API. It is the clear winner if accuracy, speed, and depth of articles are what is important. The overwhelming strength in this area comes at the cost of other aspects. The lack of a backlog and being almost forced into paying for a premium account weigh on what is otherwise a superb API!
Yahoo Finance API (via Rapid API)
Pros:
News is specifically sorted by ticker — Like Bloomberg, the Yahoo Finance API takes a ticker symbol and returns a list of the latest 10 articles specifically tagged as relevant to that company. Yahoo Finance also has the added capability to simply return the latest 10 general finance articles!
Lists include tons of metadata — When asking for the 10 most recent articles, a great amount of detail is included for each item in the list! This includes the title, summary, entities, and content of the article!
Real-time — Provides the most recent articles within milliseconds of when they were made public!
Good pricing beyond free tier — Just like Bloomberg, the pro tier is $10/month, which is a very reasonable rate for a developer and includes 10,000 requests per month, plus a reasonable $0.002/request rate beyond the cap.
Cons:
Nonexistent backlog — This API excels in real-time updates, so there is no programmatic access to backlogs.
Underwhelming free tier — Like Bloomberg, this API gives access to 500 calls per month for free. While this is almost unusable with Bloomberg, the amount of metadata returned with each list retrieval makes 500/month more friendly. This essentially gives access to 5,000 in-depth articles per month, which is slightly underwhelming but usable for smaller projects!
Overall:
This API behaves and feels very similar to the one provided by Bloomberg, carrying great value in accuracy, speed, and depth of articles. While Yahoo has slightly less depth and metadata per article, it makes up for this by having a friendlier free tier, and requires far fewer API calls to retrieve similar amounts of information! Overall, it stacks up well when compared to the rest of the APIs on this list!
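To put the free-tier difference in numbers, here is a small back-of-the-envelope sketch. It assumes every list call returns 10 articles you have not seen before, that Bloomberg needs one extra request per article for the full content, and that Yahoo includes the content in the list itself. The function names are mine, and the Bloomberg result is my own estimate under those assumptions, not a figure from either API.

```python
def full_articles_two_step(request_budget, list_size=10):
    """In-depth articles when each list call needs one detail call per article."""
    # One full batch costs 1 list call + list_size detail calls.
    full_batches, leftover = divmod(request_budget, list_size + 1)
    # Leftover requests buy one more list call plus (leftover - 1) detail calls.
    return full_batches * list_size + max(0, leftover - 1)


def full_articles_one_step(request_budget, list_size=10):
    """In-depth articles when the list call already includes the content."""
    return request_budget * list_size


FREE_REQUESTS_PER_MONTH = 500
print(full_articles_two_step(FREE_REQUESTS_PER_MONTH))  # Bloomberg-style, two requests deep
print(full_articles_one_step(FREE_REQUESTS_PER_MONTH))  # Yahoo-style, one request deep
```

Under these assumptions, the same 500 free requests yield about 454 in-depth articles from the two-step flow versus the 5,000 mentioned above for Yahoo, which is where the "friendlier free tier" impression comes from.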
Overall Impression of APIs
All of these APIs have their strengths and weaknesses. None of them provides the perfect solution, but most of these provide a ton of value at low to no cost. All of these APIs are usable, and I think the decision mainly comes down to what the individual values as a strength! Any of these solutions will likely have to be paired with supplemental data, or users will have to collect data over a long period of time to have any use in a trading algorithm.
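If you do end up pairing several sources or collecting data over time, the same article will show up more than once. Here is a minimal sketch of merging batches and dropping duplicates by URL. The `url` and `published_at` field names are placeholders of my own, since each API above names its fields differently, and the sort assumes the timestamps are ISO 8601 strings in the same timezone.

```python
def merge_articles(*batches):
    """Merge article lists from several sources, keeping one copy per URL."""
    unique = {}
    for batch in batches:
        for article in batch:
            # Normalize lightly so trailing slashes do not create duplicates.
            key = article["url"].strip().rstrip("/")
            unique.setdefault(key, article)   # first copy seen wins
    # ISO 8601 strings in one timezone sort chronologically as plain text.
    return sorted(unique.values(), key=lambda a: a["published_at"])


source_a = [{"url": "https://example.com/a/", "published_at": "2020-01-02T10:00:00Z"}]
source_b = [
    {"url": "https://example.com/a", "published_at": "2020-01-02T10:00:00Z"},
    {"url": "https://example.com/b", "published_at": "2020-01-01T09:00:00Z"},
]
merged = merge_articles(source_a, source_b)
print([a["url"] for a in merged])
```

Running something like this on a schedule and appending the result to a file or database is one simple way to build up the historical backlog that none of the free tiers provide on their own.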
Evaluating the APIs took much longer than expected. In order to prevent this from being a 20-minute read, I will be covering datasets and web scrapers in part 2 of this series, which can be accessed here!
That is all for now! I hope this read takes a lot of the pain and suffering out of sifting through the dozens of online platforms!
If you enjoyed this article and would like to read more that I have written, consider following or checking out other articles below!
