Hashtags in the Sky

So Bluesky wants to change how hashtags work

Since quitting Twitter in April, I’ve become active on Mastodon, Bluesky and T2. None are perfect substitutes for Twitter (even as some aspire to be) — but the battle to define the future of the social web is often preoccupied with redefining elements of its past.

And with reason: redos are rare in the tech world. Platforms ossify and enshittify. Network effects thwart effective social graph portability. Upstarts fake it until they… get caught. It’s hard to unentrench the entrenched — especially when doing so depends upon overcoming our thumbs’ muscle memory.

And while some cynics decry the sameness of today’s Twitter successors, relitigating bedrock aspects of social media may win over some refugees who are seeking a fresh take.

But regardless, fifteen years on, we should revisit some of the assumptions passed down from the proto social web (e.g. as pertains to identity, moderation, portability, encryption, interoperability, etc).

While decentralization isn’t something most people care about, promoting competition should be. Participation in the common discourse depends on it — especially as we pendulate from centralization to disaggregated digital social spheres.

While I have more to say about the fediverse generally, today my focus is the future of hashtag support on Bluesky. While still invite-only, its million-member waitlist implies that its design decisions may have broad cultural implications, especially as Bluesky product developer and protocol engineer Paul Frazee continues to work in public and has shared several proposals to solicit public feedback.

One of those proposals concerns Bluesky’s support for hashtags — a topic of interest and familiar controversy. You can read it here:

proposals/0003-hashtags at main · bluesky-social/proposals

Now, I’ll step back and acknowledge that, as always, I’m just a man on the internet spouting off and sharing my ideas. Even though I birthed the hashtag phenomenon in 2007, the use of hashtags has far exceeded my control or influence.

Each platform that chooses to adopt hashtags can and will make its own decisions about how and when to support them — if at all. And generally I acknowledge and respect their right to do so.

That said, as the moral godfather of hashtags, it’s my duty to keep alive the spark that gave rise to the phenomenon, and to share what I’ve learned from my nearly seventeen years observing them evolve.

More than anyone else, I have a unique perspective on the original intention, context, and goals we were trying to solve with the hashtag. I can also speak to the compromises we made due to technological and media limitations, and the behavioral barriers present in those earliest innings of the social web.

Having started my work on the social web in 2004, I can also speak to the period that preceded the rise of the hashtag, when many competing proposals were in play (you can read about them on the Twitter Fan Wiki that I ran back then). That is to say, some of the proposals being put forth today were tried before and dispensed with, which means there’s in fact prior art that we can examine, and which I will reference in my commentary.

Generative AI has no idea how to render a #

The problem with the problems with hashtags

It’s worth noting that for all the problems with hashtags (real and perceived), critics lack unanimity. Helpfully, Paul has enumerated his particular problems with hashtags:

Spam: lurking on the flipside of a click on a hashtag is often a morass of spam and nonsense which, generally, sucks.
Accessibility and appearance: screen readers barf when confronted with wordswithoutspaces, and hashtags can be hard to read for the visually equipped. This is especially true when CamelCase is not applied (e.g. “#therapist”, “#nowthatcherisdead”, “#susanalbumparty”). When hyperlinked, hashtags can distract.
Targeting content: a little squishy, but Paul suggests that hashtags could be used to “[speak] to different audiences”. This aligns with my original “tag channels” proposal, but then he suggests that Bluesky may need to invest in a tool to “solve it”… which sounds like groups or forums?

Setting aside the longstanding accessibility issue, I’m not swayed by the remainder, but let’s consider each in good faith:

Spam is a platform problem, orthogonal to hashtags. Email addresses don’t solve email spam, nor should they. RFC 821 specifies the routing of compliant messages, not what to do when that protocol is abused. Spam is a ranking and moderation challenge — more relevant to the Labeling and Moderation Controls proposal.
As for accessibility, my understanding is that screen readers can handle CamelCase words, including hashtags. Not applying CamelCase to multi-word hashtags is the effective equivalent of leaving off the text description for uploaded media. Even worse, some platforms intentionally downcase hashtags — overriding author-provided accessibility hints.

Solving the right problems?

So if those are Paul’s key problems, what does he propose to address them?

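High level, he offers five proposals: visually separate the tags, allow spaces in tags, use curated tag search results, mute words and hashtags, and make hashtags opt-in.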

Let’s take these one by one. As I do, note that I’m balancing my own priorities—which may or may not align with yours (or Paul’s)—with consideration for the challenges Paul has enumerated.

To wit, one of the greatest challenges with “fixing the problems with hashtags” is definitively rebalancing a set of tradeoffs. Hashtags, if anything, represent a bundle of affordances and accommodations that work for different users, audiences, scenarios, and functionality that has evolved over time. And in so doing, they’ve deprioritized others. My priorities will reveal themselves in context.

1. Visually separate the tags

Separating hashtags from the post body is a common and popular proposal. Especially for users who come from platforms where post tags are already separate (e.g. WordPress, Tumblr), this seems like an obvious and familiar improvement:

And yet, my hashtag proposal specifically eschewed Flickr’s 2006 design, which separated the photo description from its tags, for reasons.

To keep data clean, useful, and easily malleable, it’s much preferable to separate each type of data:

And yet, for each additional interface you add to collect some idiosyncratic datum, complexity increases for the entire user base, and the onboarding slope gets steeper for new users. And as you scale to the masses, you’ll eventually encounter users who will be much less technically literate than the early adopters. Your cleverness will chew your face off.

If you don’t care about growth or becoming a mass medium, then fine: add all the complexity you like. But if you aspire to grow large and tall, then breaking out metadata into a separate field will produce diminishing returns.

And if you seek to become a decentralized platform with a wide network of clients, then you must prioritize simplicity that promotes interoperability.

The more you complexify upstream, the harder you make it for downstream clients to follow along. If they don’t grok what you’re trying to do (or how it might serve their needs), they will fall out of conformance and risk fracturing the network or its interface semantics. This is already happening on Mastodon.

I call this squeezing the complexity balloon: regardless of what you do, there will always be an absolute amount of complexity in a system, and if you attempt to alleviate it in one dimension, in actuality, you’ve likely just squeezed it into another dimension.

This is why I embedded the “meta with the meat”. I anticipated that lazy developers would manhandle any metadata that wasn’t part of the tweet payload itself.

I’d seen it happen on earlier photo sharing networks. It wasn’t uncommon for apps to drop EXIF data or compress images to reduce filesize without user consent. Heck, Instagram did this at scale! While the omission of such data might result in a faster experience, it also interfered with the sharer’s intent.

Thus relegating hashtags to a subordinate interface may doom them to a similar fate. In plenty of cases this might not amount to much, but in some essential cases, it may render the benefits of hashtags moot for ad hoc coordination and affiliation on decentralized networks, which was their original purpose.

Hashtags are resilient because they represent semaphoric memetic behavior. That is, anyone can learn to use this digital signaling shorthand simply by observing other people, as people do with various TikTok dance trends. Hashtags require intent, but no specific technical knowledge, and the more prevalent their use in culture, the more pervasive they become.

Hashtags are a benevolent digital weed.

Metadata that is relegated to second-class citizen status doesn’t produce the same effects. Moreover, and related to the next proposal, as a bridging technology, hashtags unite what’s happening in the real world with the digital, and reducing the loss of fidelity between both is a top priority for me.

All the way back to the origins — when I wanted to tag my SXSW and BarCamp tweets—I needed a typographic hack that I could use IRL with minimal loss of meaning when brought to social media.

Seeing this at Pride this weekend reinforced this priority:

If hashtags were shoved in a box separate from the main content, would people still use them in these contexts?

2. Allow spaces in tags

When I worked on Google+, a pitched battle broke out to make hashtags more “legible”.

“We’re Google”, the engineers mused. “We can reset the standard. We can add spaces to hashtags so they don’t look so… stupid and pedestrian. And it’ll be easy: just enclose a multiword hashtag between two hashes and you’re done.”

Sure, by some definition of “done”.

The problem, as I pointed out at the time, is that this would break compatibility and interoperability across the social web — balkanizing Google+ posts, leading to decoherence and breaking trending topics.

“But we’re Google,” they argued. “We’ll figure it out!”

But could they have predicted #BlackLivesMatter, #MeToo, #ArabSpring, or others? Would they have had the same cultural impact had they been treated as a set of space-separated words? Would #Black Lives Matter# have been as visually powerful here?

Paul builds upon the separate Tumblr-style tag proposal and then goes on to suggest that spaces could be subbed in place of user provided underscores (“_”) in multiword tags:

When tags are placed in a separate field this is fairly easy. To handle them inline, we propose using an underscore (_) to indicate the space. The composer would convert the underscores to actual spaces before uploading.

And again, I’m for improving hashtag accessibility—but against inhibiting participation and coherence.

#black_lives_matter is not the same thing as #BlackLivesMatter. How would these variants be spoken by radio or TV announcers? And if the hash prefix is dropped:

The defense rested its case Tuesday in Harvey Weinstein’s rape trial without the disgraced Hollywood mogul taking the witness stand, setting the stage for closing arguments in a landmark me too trial punctuated by graphic testimony from six accusers.

Original:

The defense rested its case Tuesday in Harvey Weinstein’s rape trial without the disgraced Hollywood mogul taking the witness stand, setting the stage for closing arguments in a landmark #MeToo trial punctuated by graphic testimony from six accusers.

The hashtag is a common currency identifier. Each can become a digital town square, or a linguistic inner tube.

Supporting spaces in hashtags privileges the digital context, to the detriment of auditory, broadcast, and print mediums. But some of the hashtag’s value lies in its ability to be used across mediums. Hashtags should not be relegated to internet backwaters.

My proposal for improving hashtag accessibility:

If the primary intent of adding spaces is to prevent screen readers from barfing on hashtags, then solve that.

Crowd- or author-sourced hashtag accessibility hints, like descriptions for uploaded media, seem like an obvious path, with prior examples in the wild.

And platforms could throw some money at improving screen readers’ treatment of hashtags; leave the hashtags alone and serve ARIA-style hints so screen readers can read them correctly. Let’s avoid solutions that require retraining the entire media apparatus on how to speak a hashtag verbally.

And for the love of the goddess, don’t fall into the Google hubris of trying to add spaces to hashtags when doing so will only minimize their effectiveness as a tool for cross-platform coordination, protest, and providing a vehicle for unheard voices to rise up in unison.
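
To sketch what an author-preserving accessibility hint could look like, here is a minimal Python example that derives a spoken label from a CamelCase hashtag, the kind of text a client might expose to assistive technology. The function name, the "hashtag" prefix, and the ASCII-only splitting rule are all my own assumptions for illustration; this is not how Bluesky or any particular screen reader works.

```python
import re

# Hypothetical word splitter (ASCII only): acronyms, capitalized or
# lowercase words, and digit runs each count as one spoken word.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def spoken_label(hashtag: str) -> str:
    """Return an illustrative spoken form of a hashtag."""
    body = hashtag.lstrip("#")
    words = _WORD.findall(body)
    # The visible hashtag text is left untouched; only the hint changes.
    return "hashtag " + " ".join(words) if words else "hashtag"


print(spoken_label("#BlackLivesMatter"))
print(spoken_label("#nowthatcherisdead"))
```

Note that the all-lowercase tag comes back as a single unbroken word: once the author's CamelCase is thrown away, there is nothing left for a simple rule like this to split on.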

3. Use curated tag search results

Let’s reframe this one as applying Bluesky’s composable moderation concepts to improve hashtag search results.

Just as you can follow custom algorithms or those provided by the platform, why not use the same technique to improve the search experience?

This proposal really doesn’t have much to do with hashtags, though it does make me wonder if Bluesky would collapse hashtags with spaces into tags without spaces, more or less nullifying the previous proposal. That is, would #my_new_tag normalize down to #mynewtag? If not, there would be twice as much content to moderate, and people will just use both tags in their posts to maximize reach. This is the problem with introducing complexity: people will always optimize their behavior to whatever benefits themselves the most with the minimal amount of effort.
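
To make the normalization question concrete, here is a minimal Python sketch. The composer step mirrors the underscore-to-space conversion described in the proposal; the search key is purely my own hypothetical, since the proposal doesn't say whether or how variants would be collapsed.

```python
def composer_convert(inline_tag: str) -> str:
    """Mimic the proposed composer step: underscores become spaces."""
    return inline_tag.lstrip("#").replace("_", " ")


def search_key(tag: str) -> str:
    """Hypothetical search key: drop '#', spaces, and underscores, ignore case."""
    body = tag.lstrip("#")
    return "".join(ch for ch in body if ch not in " _").casefold()


variants = ["#my_new_tag", "#mynewtag", "#MyNewTag", "my new tag"]

# With a key like this, every variant lands in the same search bucket.
print({search_key(v) for v in variants})

# Without it, the converted tag and the unspaced tag are distinct strings.
print(composer_convert("#my_new_tag") == "mynewtag")
```

If search collapses the variants like this, the spaces are cosmetic; if it doesn't, there are two tags to moderate where there used to be one.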

4. Mute words and hashtags

I appreciate that Paul acknowledges, “This isn’t a new idea (at all)” — because it’s not. It was in my original 2007 proposal, just never implemented.

The idea is simple enough and I support it, though I think TikTok has the experience of content recommendation nailed down.

Put simply, in the future, you (i.e. your behavior) are the query.

You shouldn’t need to provide explicit signals like following accounts or liking posts to get good recommendations. Just search, watch and interact with content, and the system will adapt to your implicit signals.

But sure, in the case where you want to inhibit certain content from appearing in your feed(s) (or you want to take a break from it), you should be able to mute hashtags globally or from certain people or within certain algorithms.

Personally, I like how Spotify has made it possible to “Exclude from your taste profile” arbitrary content on the service. In my case, all the sleep music I listen to should NOT impact my Discover Weekly!

Spotify’s “Exclude from your taste profile” menu option

So: click on a hashtag and choose “Show me less of this” or “Never show me content with this tag”. Done.
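
As a rough sketch of how muting globally, per person, or per algorithm could be modeled, here is a small Python example. The MuteRule type, its field names, and the matching logic are hypothetical and only meant to illustrate the three scopes described above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class MuteRule:
    tag: str
    author: Optional[str] = None  # None means "from anyone"
    feed: Optional[str] = None    # None means "in every feed"


def _norm(tag: str) -> str:
    # Assumption: muting ignores the '#' prefix and letter case.
    return tag.lstrip("#").casefold()


def is_muted(post_tags: Iterable[str], author: str, feed: str,
             rules: Iterable[MuteRule]) -> bool:
    """True if any rule matches one of the post's tags in this context."""
    tags = {_norm(t) for t in post_tags}
    return any(
        _norm(r.tag) in tags
        and r.author in (None, author)
        and r.feed in (None, feed)
        for r in rules
    )


rules = [
    MuteRule("#SXSW"),                    # muted everywhere
    MuteRule("#sleep", feed="discover"),  # muted in one feed only
    MuteRule("#hottake", author="bob"),   # muted from one person only
]

print(is_muted(["#sxsw"], "alice", "home", rules))
print(is_muted(["#sleep"], "alice", "home", rules))
```

The second rule is the Spotify case: the content still exists and still shows up elsewhere, it just stops influencing one particular feed.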

5. Opt-in hashtags

The last proposal relates to targeting content to specific audiences based on their interests. Typically this is solved on social platforms with some form of groups or forums, which is an architectural rather than a memetic behavioral solution.

Paul wants his cake and to eat it too. Cool.

Hashtags provide linguistic liquidity in that they make it very easy for people to take a shotgun approach to labeling content that can find its way to relevant audiences. By applying labels to the content itself (rather than depositing the content into depositories), downstream search engines or aggregators can collate same-tagged content and present it to interested users, ad hoc. The problem with this, which is best articulated in Buster Benson’s Stock and Flow, is that the content that pools together in this method then passes by because the organizing structure (a feed) is temporary.

Forums and groups, in contrast, are persistent digital spaces. People put things in them and can revisit them later.

You can add hashtags to content that is shared into these spaces, but each space contains a copy of the original, with its own comment thread or interaction ledger.

This pattern is common on Reddit since it lacks a global feed. All content exists within a subreddit. This leads people to post the same content to multiple subreddits to maximize reach, creating a kind of low-engagement spam because the original poster’s (OP’s) attention rarely travels downstream with the copies.

When I worked on Google+ (specifically Circles) I proposed a simple three-tiered way to think about targeting content: using an email-style composer, you’d first select the audience and then compose your message. You could share to individuals separately (similar to Instagram Stories now), groups singularly, or publicly, to audiences unknown. Hashtags could be added independently to the content, especially because hashtags should apply to content rather than define audiences or access permissions.
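
A minimal Python sketch of that separation might look like the following. The type and function names are hypothetical, and this is not how Google+ was actually built; it only illustrates the idea that the audience decides who can see a post, while hashtags are read from the content and never affect access.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List


class Audience(Enum):
    INDIVIDUALS = "individuals"
    GROUP = "group"
    PUBLIC = "public"


@dataclass(frozen=True)
class Post:
    text: str
    audience: Audience
    # Assumption: for GROUP posts this holds the resolved group members.
    recipients: FrozenSet[str] = frozenset()

    @property
    def hashtags(self) -> List[str]:
        # Tags live in the text itself, independent of the audience.
        return re.findall(r"#\w+", self.text)


def can_view(post: Post, viewer: str) -> bool:
    """Visibility depends only on the audience, never on hashtags."""
    return post.audience is Audience.PUBLIC or viewer in post.recipients


private = Post("Lunch at #BarCamp?", Audience.INDIVIDUALS, frozenset({"alice"}))
public = Post("Heading to #BarCamp", Audience.PUBLIC)

print(private.hashtags, can_view(private, "bob"))
print(public.hashtags, can_view(public, "bob"))
```

Both posts carry the same tag, but only one of them is visible to a stranger. The tag describes the content; it doesn't grant or restrict access.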

Paul acknowledges that “opting out of hashtags is the simpler option”, and so the ability to mute tags on Bluesky seems like a more straightforward approach than using them to target content exclusively.

So, to conclude

Ultimately, Paul seems to shrug after writing up his proposals:

There’s been a lot of discussion about whether hashtags could be replaced by something better, but nothing has really arrived that feels like a better fit. Hashtags are well-understood and simple. It seems like the challenge is just getting real value out of them.

Yes, yes indeed. It is just a matter of getting real value out of them, and leaving them and how people type them into text boxes alone.

I’m all for finding ways to make hashtags universally more accessible in every context in which they appear, but I will also fight to preserve their utility in galvanizing mycelial movements that need coherence to gain purchase in the public consciousness.

I appreciate the fervor and debate around hashtags and continue to welcome it. As I’ve always said, hashtags are the stupidest idea that could possibly work — and after 16 years, they still do.

Are there better ways to provoke panmedia inclusive digital conversations that cross the chasm into the real world? If you’ve got an idea, now’s the time to speak up!