Summary

The website content discusses Reddit's decision to implement new API pricing, leading to widespread subreddit blackouts and developer backlash due to the perceived high cost and potential issues with legacy infrastructure.

Abstract

The author of the article describes a situation where Reddit's new API pricing model has caused significant disruption in the community. The pricing, set at 24 cents per 1000 API calls, is seen as exorbitant, especially when compared to other services like Firestore or GPT-3. This change has prompted many subreddits to go private or shut down permanently, as third-party tools that rely on the API are crucial for moderation and user experience. The author speculates that Reddit's decision may be due to the costs associated with maintaining outdated infrastructure, exacerbated by the increased traffic from AI scraping tools. While some accuse Reddit of greed, the author leans towards the explanation that Reddit is facing genuine financial pressures due to technical debt, rather than intentional profiteering. The situation has led to a significant amount of negative sentiment towards Reddit, with the community expressing their discontent through downvotes and public criticism.

Opinions

The author believes that Reddit's new API pricing is excessively high and not justified by the service provided.
There is a sentiment that Reddit's decision to change the API pricing is causing unnecessary harm to the community and the ecosystem of third-party tools.
The author suggests that Reddit's costs could be due to legacy infrastructure that has not been updated, which may be a common issue for long-standing tech companies.
The author seems to empathize with Reddit's potential technical and financial challenges, while also acknowledging the community's frustration and the impact on accessibility and non-commercial apps.
There is an implication that Reddit may not have adequately prepared for the increased load caused by AI and web scraping tools, leading to this crisis point.
The author expresses concern that this could set a precedent for other web services as AI-driven web scraping becomes more prevalent.

Oh, Reddit, What Have You Done This Time?

So in my RSS reader I have implemented server fetching. How it currently works: every hour, an admin account copies in all of the feeds from every user, fetches all the stories, and then pushes those stories back into the individual user accounts.
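A rough sketch of that hourly cycle, not the actual implementation: the function name `hourly_sync`, the in-memory dictionaries, and the injected `fetch_stories` callable are all assumptions made for illustration. The point it shows is that each distinct feed is fetched once, no matter how many users subscribe to it.

```python
from typing import Callable, Dict, List, Set


def hourly_sync(
    user_feeds: Dict[str, Set[str]],
    fetch_stories: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Hypothetical model of the admin-account fetch cycle."""
    # Step 1: the "admin account" holds the union of every user's feeds.
    admin_feeds: Set[str] = set().union(*user_feeds.values())

    # Step 2: fetch each distinct feed exactly once.
    stories_by_feed = {feed: fetch_stories(feed) for feed in sorted(admin_feeds)}

    # Step 3: push the fetched stories back to each user who follows the feed.
    return {
        user: [story for feed in sorted(feeds) for story in stories_by_feed[feed]]
        for user, feeds in user_feeds.items()
    }
```

If two users follow the same feed, `fetch_stories` is still called only once for it, which is where the saving on requests comes from.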

I was thinking of writing an entire blog post about why I do it this way. Long story short: I can reuse most of my code. So this is all running on a virtual private server. And it runs well. I’ve actually configured it to print some stuff to a log file, and on June 11th at 7 pm Eastern I started noticing a lot of 403 errors.

The Reddit Blackout

So, obviously, 403 is the error code that you get when a subreddit goes private and you can no longer access the feed’s content. I just didn’t expect it to happen so early. Especially not at 7 pm Eastern on June 11th, which is 11 pm UTC.
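For illustration, here is a minimal sketch of how a fetcher could tally status codes per run so that a sudden wave of 403s stands out. The function name `tally_statuses` and the `(feed_url, status)` input shape are assumptions, not the format of the real log. Note that 403 is the generic HTTP "Forbidden" status, so it signals that access was refused rather than proving a subreddit went private.

```python
from collections import Counter
from typing import Dict, List, Tuple


def tally_statuses(results: List[Tuple[str, int]]) -> Tuple[Dict[int, int], List[str]]:
    """Count HTTP status codes and list the feeds that answered 403.

    `results` is a hypothetical list of (feed_url, http_status) pairs
    collected during one fetch run.
    """
    counts = Counter(status for _url, status in results)
    forbidden = [url for url, status in results if status == 403]
    return dict(counts), forbidden
```

Comparing the 403 count from one hourly run to the next would be enough to notice a blackout starting.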

And since then even more subreddits have joined our Eurasian/Australian friends in going private in protest of the new Reddit API pricing. And some are actually going permanently dark because they say that third-party Reddit tools are the only things that make moderation bearable.

Subreddits like /r/FlutterDev. Shame, I never did like that sub much because people would just downvote people for no reason. But not like this.

So with half of Reddit shut down there’s not much to do but twiddle our thumbs. But we can talk about it. So let’s do that.

How Did We Get Here?

So let’s take a look at this new API. It will cost 24 cents per 1000 API calls. That is… how do I put this? A sh*t ton of money.

To put this in perspective, I have complained about GPT-3’s pricing. I thought it was too high, so I was happy to learn that GPT-3.5 would be a tenth of the price.

The Biggest Problem Behind GPT Has Been Solved

So this is a little late because I didn’t know if I was going to write anything about it. Because I was developing my…

medium.com

So how much does GPT-3 cost? It costs 2 cents per 1000 tokens. Now I know it’s not exactly the same: a single request could use hundreds of tokens. But it costs an order of magnitude less money to run a few large language model queries (which are probably running on some supercomputer somewhere) than it does to run 1000 cheap HTTP GET requests.

OK, let’s look at something else which I use: Firestore. How much does a Firestore API call cost? Well, they price it using reads, writes, and deletes. The most expensive of these operations is a write. It costs 10 cents, which is a bit under half of Reddit’s API price. But this doesn’t get you 1000 writes, it gets you 100,000. So every Reddit API request is roughly 240x more expensive than a Firestore write (24 cents per 1000 versus 10 cents per 100,000). So is one Reddit API call doing 240 writes’ worth of work, or about three times that many reads? It could be if they’re not paginating their queries properly. Hmm… how do you paginate a query for a forum thread?
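As a rough check on the numbers, here is a small sketch that uses only the prices quoted above: 24 cents per 1000 Reddit API calls, 2 cents per 1000 GPT-3 tokens, and 10 cents per 100,000 Firestore writes. The variable names are made up for illustration, and the GPT-3 line compares one call to one token, which is not an apples-to-apples unit.

```python
from fractions import Fraction

# Unit prices in cents, taken from the figures quoted in the text.
REDDIT_CENTS_PER_CALL = Fraction(24, 1_000)
GPT3_CENTS_PER_TOKEN = Fraction(2, 1_000)
FIRESTORE_CENTS_PER_WRITE = Fraction(10, 100_000)

# How many times more one Reddit API call costs than each alternative unit.
reddit_vs_firestore_write = REDDIT_CENTS_PER_CALL / FIRESTORE_CENTS_PER_WRITE
reddit_vs_gpt3_token = REDDIT_CENTS_PER_CALL / GPT3_CENTS_PER_TOKEN

print(f"Reddit call vs Firestore write: {reddit_vs_firestore_write}x")
print(f"Reddit call vs GPT-3 token: {reddit_vs_gpt3_token}x")
```

With these inputs one Reddit call costs 240 times as much as one Firestore write and 12 times as much as one GPT-3 token. `Fraction` is used instead of floats so the ratios come out exact.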

Anyways so what is Reddit’s justification for this cost?

On 4/18, we shared that we would update access to the API, including premium access for third parties who require additional capabilities and higher usage limits. Reddit needs to be a self-sustaining business, and to do that, we can no longer subsidize commercial entities that require large-scale data use.

So we have two cases. Either they are lying and being greedy, or they’re telling the truth. And I actually think that Reddit is telling the truth here. There have been some he-said-she-said situations, especially concerning a phone call with the dev of

And it’s clear people are mad. Everywhere you look people are saying bad things about Reddit. Even in the AMA to talk about the changes there are a lot of undeserved downvotes. Like look at this comment:

We are working with RedReader and Dystopia to make sure they have access and will continue to work with others. We’ll review requests to ensure that the app is non-commercial and focused on accessibility needs. Approved apps can use the Data API for free. For our own apps, there is no excuse. We will do better.

Pretty positive comment. Very upvote worthy. So how many upvotes does it have? -950. At the time of writing. Oh wait, -953. Now it’s back at -950. Now it’s -949.

Well, the point is I don’t think a lot of this hate is well deserved. It’s possible that Reddit is lying about having to be profitable, but I don’t think so. Apparently this whole thing was kicked off by machine learning models scraping Reddit’s content, and you don’t cause a mass blackout (with some subreddits permanently going dark) for no reason.

What I think is going on is that Reddit was built on ageing infrastructure and it literally costs them something like 24 cents per 1000 API calls. Because Reddit was founded in 2005. That’s a long time ago.

Of course there have been a lot of changes made since then but how much of Reddit is still built on legacy stuff? Like Amazon recently revealed that upgrading one of its old systems resulted in a 90% reduction in costs.

Amazon Has Broken Free Of The Cult Of Encapsulation

So this post has gone viral in developer circles.

medium.com

It’s possible that Reddit has the same problem. Just worse. It’s the most likely reason for the API costs. Or it could be that Reddit’s CEO is just grossly incompetent.

And it’s probably only now coming to light because of all these new web scraping tools and the rise of AI. There’s another post I read more recently about how an image web scraping tool is causing problems.

An AI Scraping Tool Is Overwhelming Websites With Traffic

"It is sad that several of you are not understanding the potential of AI and open AI and as a consequence have decided…

www.vice.com

Final Thoughts

This is the most likely explanation for the new API pricing: Reddit is paying way too much to support legacy infrastructure, and until now they’ve been able to eat the costs because most of their traffic is made up of actual users who look at ads and pay for various perks. But with AI scraping, Reddit has decided eating the costs is no longer sustainable.

If this is true, and I don’t see why it wouldn’t be, it actually makes me feel bad for Reddit. But, I mean, they should’ve seen this coming. If you don’t upgrade your legacy infrastructure it could come back to haunt you, as appears to be the case with Reddit.

I just hope this doesn’t become a trend. Because as AI becomes bigger and bigger we’re going to see web scraping increase. A lot of web services assume most of their users are real people. Well, that may not be true anymore. And as the tide goes out we’re going to see who’s been swimming naked.

If you liked this article consider following me on one of my publications: Lost But Coding (for programming content) or The Rest Of The Story (for everything else). You can do so with my RSS reader available on iOS (and Apple Silicon Macs) and Android.

Join Medium with my referral link - Andrew Zuo

As a Medium member, a portion of your membership fee goes to writers you read, and you get full access to every story…

andrewzuo.com