Quinn Madison

Tuning out Toxic Comments, with the Help of AI

Can a machine learning-powered moderation tool make the internet a healthier, safer place?

Illustration by Niv Bavarsky

This is the second article in a series that shows how the practices and principles of the People + AI Guidebook are reflected in the design and development of human-centered AI products at Google. It was written in collaboration with the Jigsaw team.

There’s so much potential in online interactions. They can be positive — you might learn something fascinating, perhaps meet a remarkable person — or they can be negative, even harmful. According to a Pew Research Center study, about 41 percent of American adults have experienced online harassment, most commonly on social media. A significant portion of those people — 23 percent — reported that their most recent experience happened in a comments section.

A single toxic comment can make someone turn away from a discussion. Even seeing toxic comments directed at others can discourage meaningful conversations from unfolding, by making people less likely to join the dialogue in the first place. The Pew Research Center report revealed that 27 percent of Americans decided not to post something after witnessing online harassment.

The power to moderate comment sections — to identify, reduce or eradicate toxic comments — has historically been granted to platform moderators. But what if, instead of relying on moderators, people could control for themselves the comments they see?

We wanted to make it happen, so we designed and built Tune.

Introducing Tune

Tune is an AI-powered Chrome extension that lets users moderate toxicity in their comment threads. It’s designed to give users (not platform moderators) moment-to-moment control over the tenor of comments they see. With Tune, users can turn down the “volume” on toxic comments entirely (zen mode) or allow certain types of toxicity (profanity, for example) to remain visible.

We built Tune on Perspective, an API we developed that uses machine learning to spot abusive language. Tune works on the commenting platform Disqus as well as on Facebook, Twitter, YouTube, and Reddit. It’s open source (find it on GitHub) and part of the Conversation AI research project, which aims to help increase participation, quality, and empathy in online conversations at scale.
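As a rough illustration of the kind of signal Tune relies on, here is a minimal sketch of scoring a single comment with the public Perspective API. The request shape follows the API’s AnalyzeComment method; the API key handling and the single TOXICITY attribute are simplifications for illustration, not Tune’s actual implementation.

```typescript
// Minimal sketch: score one comment's toxicity with the Perspective API.
// Assumes you have your own API key; Tune's real request logic is more involved.
const API_URL =
  "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze";

async function scoreComment(text: string, apiKey: string): Promise<number> {
  const response = await fetch(`${API_URL}?key=${apiKey}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      comment: { text },
      requestedAttributes: { TOXICITY: {} },
    }),
  });
  const data = await response.json();
  // summaryScore.value is a score between 0.0 and 1.0; higher means the model
  // thinks the comment more closely resembles ones people rated as toxic.
  return data.attributeScores.TOXICITY.summaryScore.value;
}
```

Roughly speaking, a score near 1.0 means the model considers the comment very likely to be perceived as toxic, and a tool like Tune can compare that score against the threshold a user has chosen.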

Tune, in action

Empowering people

Designing Tune required extensive user research, deep empathy for users, and ongoing collaboration between product managers, engineers, and UX designers. From the start, we approached the problem with a user-centered focus by asking:

How might a machine learning-powered moderation tool empower individual users as they read comments?

We thought about how best to build trust, how to design an interface that would offer users control, and how to allow for feedback. Above all, we wanted users to feel empowered to change the toxicity level of the comments seen in their feeds.

Three design goals arose: Build user trust, give users control, and design for transparency.

Finding a metaphor

First, we had to set user expectations and make it easy for each new user to quickly grasp how Tune works, so we searched for an easily understandable metaphor.

Our research revealed that users value control. Tune shouldn’t come across as something that works by magic; being perceived as transparent, we found, would engender user trust. The idea of volume as a metaphor took hold, and with it came the notion of a volume dial. This familiar object reinforced the message we wanted to send: that each user can take control, make “volume” adjustments, and explore what works well for them.

Understanding errors

Perspective was originally developed to enable publishers of all sizes (The New York Times, The Economist, and others) to set toxicity thresholds for their platforms. But we wanted Tune to serve end users, empowering individuals to set thresholds for themselves.

That said, the output of ML models isn’t always easy for end users to understand. And no matter how rigorous our ML model, we knew errors would still occur. There’s no exact calculus for defining toxic language, and toxicity isn’t always obvious or universal. We knew our model was likely to classify certain comments in ways that differed from user expectations.

Some comments obviously intend to troll, insult, or provoke the person on the receiving end. But most toxic comments are subtle and ambiguous — not extreme. And in many cases, harmful conversation isn’t caused by the substance of a comment but by the tone in which the ideas are conveyed.

With that in mind, we designed the UI to support transparency: Comments are visible as users turn their “volume” dial to increase or decrease their toxicity thresholds. Users can easily see how the adjustment impacts what types of comments remain visible.
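One way to picture the dial’s effect: every comment carries a model score, and the dial position sets the threshold that decides what stays visible. The sketch below is purely illustrative; the data shapes, function names, and per-category toggles are assumptions, not Tune’s actual code.

```typescript
// Illustrative only: hide comments whose toxicity score exceeds the user's
// "volume" setting. Names and structure are hypothetical, not Tune's code.
interface ScoredComment {
  text: string;
  toxicity: number;     // overall model score, 0.0 (benign) to 1.0 (very likely toxic)
  categories: string[]; // labels from the model, e.g. ["PROFANITY"]
}

interface DialSettings {
  threshold: number;              // dial position: low values hide more, high values show more
  allowedCategories: Set<string>; // categories the user chose to keep visible regardless
}

function visibleComments(comments: ScoredComment[], dial: DialSettings): ScoredComment[] {
  return comments.filter((comment) => {
    // Keep comments whose only flagged categories the user explicitly allowed
    // (for example, a user who doesn't mind profanity).
    if (comment.categories.some((c) => dial.allowedCategories.has(c))) {
      return true;
    }
    // Otherwise, show the comment only if its score is at or under the dial threshold.
    return comment.toxicity <= dial.threshold;
  });
}
```

Because the filter runs every time the dial moves, users can immediately see which kinds of comments appear or disappear at each setting, which is the transparency the UI is meant to support.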

People + AI Guidebook principles

Designing Tune required trial and error and a commitment to human-centered AI design. Our users needed insight into what was happening, so they could trust Tune. They also needed control over toxicity thresholds, so they could adjust in real time.

People + AI Guidebook principles were foundational to the design and are evident in the final product. They are:

Explainability and trust

Optimize for trust. Explain predictions, recommendations, and other AI output to users. The right level of explanation helps users understand how an ML system works. When users have clear mental models of the system’s capabilities and limits, they can understand how and when it can help accomplish their goals.

Feedback and control

Understand when your users want to maintain control and when they’d appreciate ML automation. We didn’t automate control entirely. We made sure, for example, that users could minimize toxicity themselves by turning down the dial.

Design to engender feedback, and then align that feedback with model improvement. Ask the right questions at the right level of detail (don’t overwhelm users with questions, and avoid wordiness). Ask at the right moment: when Tune didn’t perform as expected, it was key to enable users to give feedback immediately.
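The Perspective API exposes a SuggestCommentScore method for reporting corrected scores, which is one way feedback like this could flow back toward model improvement. The sketch below assumes that method’s request shape; the wrapper function and how Tune actually records user feedback are assumptions, not a description of the shipped product.

```typescript
// Sketch: report a user's disagreement with a toxicity score back to the
// Perspective API's SuggestCommentScore method. How Tune gathers and submits
// feedback in practice is an assumption here.
const SUGGEST_URL =
  "https://commentanalyzer.googleapis.com/v1alpha1/comments:suggestscore";

async function reportMisclassification(
  text: string,
  userScore: number, // the score the user believes is right, 0.0-1.0
  apiKey: string
): Promise<void> {
  await fetch(`${SUGGEST_URL}?key=${apiKey}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      comment: { text },
      attributeScores: {
        TOXICITY: { summaryScore: { value: userScore } },
      },
    }),
  });
}
```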

Errors and graceful failure

Set user expectations for “failure” (when the system, for example, classifies a comment as toxic but the user disagrees), and provide paths forward when the system fails. Design and build with the knowledge that errors happen and can help your ML model learn from users. Design an error experience that’s user-centered: the product needs to provide ways for users to continue their task and help the model improve.

Parting words

We’re proud of what we’ve accomplished so far, but we don’t view Tune as a finished product. We see it instead as an ongoing experiment that empowers users (not platform moderators) to set their own thresholds for what they see. We want Tune to enable a healthier internet — one where toxicity can be dialed down and where people can feel safe.

Quinn Madison leads content strategy for People + AI Research (PAIR) core operations and is a coauthor of People + AI Guidebook.
