Elaine Lu

Summary

This article discusses the reasons behind the high failure rate of AI projects, highlighting six primary factors that contribute to their lack of success.

Abstract

The article presents an in-depth analysis of the reasons behind the high failure rate of AI projects, which is estimated to be around 85%. The author highlights six primary factors contributing to this issue: (1) AI systems not solving the right problem, often due to siloed data scientists and lack of designer involvement; (2) AI innovation gap, caused by a lack of effective ideation and mental models; (3) AI systems failing to achieve good enough performance and not being useful, particularly when AI is not suited for the task; (4) people missing low-hanging fruit, overlooking simple AI opportunities that align with user and business goals; (5) AI systems not generating enough value, with high costs and low returns; (6) ethics, bias, and societal harm, including issues of bias in training data, privacy, and transparency.

Bullet points

  • 85% of AI projects fail before or after deployment, double the rate for software.
  • Data scientists and designers often work in silos, leading to projects unlikely to deliver transformative change and value.
  • AI systems may not be useful when they struggle to achieve satisfactory performance or are not suited for the task.
  • AI innovation gap is caused by a lack of effective ideation and mental models in data science and design.
  • Simple AI opportunities that align with user and business goals are often overlooked.
  • AI systems may not generate enough value due to high costs and low returns.
  • Ethics, bias, and societal harm are significant issues in AI, including bias in training data, privacy, and transparency.

Why Do AI Projects Fail?

85% of AI projects fail; here are 6 reasons why

by Elaine Lu

Beneath aspirational headlines and explosive technical advances, AI projects have a poor track record. Gartner and HBR estimate that up to 85% of AI projects fail before or after deployment, double the rate for software.

It’s well known that AI is harder to deploy than software: AI has nondeterministic outcomes, and its capabilities are uncertain in the hands of users. Unintended consequences post-deployment result in bad press and loss of user trust in AI systems. The costs and ROI are also hard to justify upfront.

These are AI systems including and beyond generative AI: recommender systems, self-driving cars, computer vision, healthcare diagnosis, autonomous robots, financial risk assessment, and more.

Collected from sources spanning years of academic and industry research, the detailed reasons are below:

(1) AI systems don’t solve the right problem
(2) AI innovation gap
(3) AI systems can’t achieve good enough performance and are not useful
(4) People miss low-hanging fruit
(5) AI systems don’t generate enough value
(6) Ethics, bias, societal harm

1. AI systems don’t solve the right problem

“Data scientists come up with things people don’t want. Designers come up with things that can’t be built.” John Zimmerman, CMU

Data scientists and technical experts are often siloed, rarely interacting with the rest of the business, and therefore come up with projects unlikely to deliver transformative change and value.

Designers are best positioned to make a technology useful and usable because they approach problems from the perspective of users. But they (1) are generally not invited to participate until the end of projects, when functional decisions have already been made, and (2) lack abstractions of AI capabilities to work with; AI research mostly focuses on mechanisms (how something works technically), not capabilities (what it can do for people, and therefore what can become a product).

Data Scientists work in the space of improving and inventing mechanisms. Designers and Product Managers work in the space of transforming a capability into a desirable product.

Procter & Gamble temporarily embedded data scientists in business units. Accenture Song explored AI innovation with data science and design. Researchers have run successful AI ideation sessions with teams of Data Scientists, Designers, domain experts. More research has been conducted since: “design participants appeared most successful when they engaged in ongoing collaboration with data scientists to help envision what to make and when they embraced a data-centric culture...”

Others propose prioritizing AI use cases first by impact, second by risk, third by data: “Do we have the data to do this use case? Do we have permission to use it? Is it clean enough to be useful? If we don’t get past this step, we don’t start. Find another use case.”

2. AI-innovation gap

Technical advances are often followed by design innovation. A new enabling technology leads people to envision many new forms of the technology, incorporated into different aspects of life.

Engineers create new technologies that allow new capabilities. Designers do not invent new technologies; instead, they create novel assemblies of known technologies (Louridas, 1999).

However, the past 60 years of AI research have not experienced the wealth of design innovation that other technologies have, partially because AI has traditionally been invisible: it proactively serves results and is difficult to detect in use, and therefore difficult to design with.

Generative AI has brought AI central to the interface: today’s AI revolution is about UX. People directly interact with AI models, and tune results back and forth. However, other forms of AI are infrequently discussed, generally going unnoticed.

Researchers refer to this as the ‘AI-innovation gap,’ commonly attributed to the symptoms below.

Two Symptoms of Data Science

(1) Lack of effective ideation. Most Data Scientists’ time is spent on analysis, model building, and extracting valuable insights from data. Relatively little time is spent on ideation: discovering the right problems to solve with data. Instead of immediately focusing on one idea, what are a hundred alternative ideas? That may be the better question to ask in the first place.

(2) Lack of effective mental models and ways to communicate them. The onus typically falls on AI developers to explain data science concepts to their collaborators. Communication gaps occur in practice, especially when conveying technical knowledge and its value. Barr Moses notes, ‘the harsh reality is not even the creators of AI are totally sure how it works’. Solutions have been proposed: sensemaking frameworks and shared mental models.

And Two Symptoms of Design

(1) Designers don’t notice obvious places where ML can improve UX. Starbucks has a gift-card payment model in its app. The app could automatically land you on the payments screen when it detects you are in a store, yet users are still forced to navigate to the payments section repeatedly. This is not hard to do, but it is an unnoticed place where AI can offer great convenience.

(2) Designers have clichéd knowledge of AI, driven by media hype and criticism. Designers tend to come up with capabilities inspired by science fiction that AI cannot realistically deliver, rather than shippable AI products with less risk. The perceptions of end users are also influenced by media: thinking AI will replace them, act human, or work deterministically. This results in two mutually adaptive agents to design for: the probabilistic AI system, and a user with unclear AI expectations and mental models.

3. AI systems can’t achieve good enough performance and are not useful

Several scenarios exist where AI systems struggle to achieve satisfactory performance, resulting in systems that may not be useful.

Looking closer, these are really situations where AI is not suited for the task, and handoff to a human partner is a more promising choice:

Healthcare diagnosis for rare diseases. Healthcare clinical decision systems offer predictions for common scenarios. But doctors really want assistance during uncommon scenarios, exactly where AI performs poorly.

Automated content moderation for semantic nuances. AI content moderation systems may struggle to accurately identify nuanced language: satire, sarcasm, cultural references. Systems may remove content that does not actually violate guidelines, when it should instead be flagged for a person to confirm.

Autonomous vehicles responding to environmental changes. Vehicles are trained to stop at stop signs. But when a crossing guard unexpectedly holds up a stop sign in the middle of the street, the vehicle stops and doesn’t know how to proceed. Systems struggle with novel situations, including unpredictable weather and obstacles that change before they can react, and negative space such as a ditch in the ground.

Instead, apply AI where AI uniquely excels:

(1) Mundane, repetitive data-entry tasks where AI mitigates human error: taking notes, searching for information, sorting photo albums by person, time, and location.

(2) Sorting large datasets at scale and providing insights, patterns, and recommendations. Examples include meeting summaries, tagging moments in time based on keywords or visuals, and finding a foreign spy balloon by looking through the world’s data.

(3) Fast, real-time tasks people mentally or physically cannot do: autonomous robots routing through warehouses, or maneuvering to remote places in the world to capture data, inspect infrastructure, and spare soldiers from putting themselves in harm’s way.

(4) Human-AI collaboration. Co-pilots, agents, AI partners, whichever term you like. This concept originates from supervising robots. The Sheridan-Verplank scale describes levels of human-machine interaction, from no autonomy to fully autonomous machines. For robots, research shows the middle levels don’t work: there is high cognitive load for a person transitioning from supervising the machine to fully taking over the task. Even in computer interfaces, designing effective handoffs prepares people to anticipate, then participate in, tasks.
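To make the handoff idea concrete, here is a minimal sketch in Python of the pattern the content-moderation example above calls for: act autonomously only when the model is confident, and otherwise flag the case for a person to confirm. The `Prediction` type, threshold value, and function name are hypothetical illustrations, not any particular product’s implementation.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    """Hypothetical output of any classifier: a label plus a confidence in [0, 1]."""
    label: str
    confidence: float

def route_decision(prediction: Prediction, threshold: float = 0.9) -> str:
    """Act autonomously only above the confidence threshold;
    below it, hand off to a human reviewer (the 'AI intern' pattern)."""
    if prediction.confidence >= threshold:
        return f"auto-apply: {prediction.label}"
    return "flag for human review"

# A sarcastic post scored as a 'violation' with low confidence gets flagged
# for a person to confirm rather than being removed outright.
print(route_decision(Prediction("violation", 0.62)))  # flag for human review
print(route_decision(Prediction("violation", 0.97)))  # auto-apply: violation
```

The threshold itself is a product decision: it trades machine autonomy against reviewer workload, which is exactly where data scientists and designers need to collaborate.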

4. People miss low-hanging fruit

There are simple AI opportunities aligned with user and business goals, but companies don’t build them.

Instagram wants influencers to post to attract views, but it does not learn which tags influencers frequently use, forcing influencers to type the same tags repeatedly. This is not hard to build, but it is an unnoticed AI opportunity.
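As a hedged sketch of how low this fruit hangs, the snippet below suggests a creator’s most frequently used tags from their own post history. The data and function name are hypothetical; this is an illustration, not Instagram’s feature.

```python
from collections import Counter

def suggest_tags(past_post_tags: list[list[str]], top_k: int = 5) -> list[str]:
    """Suggest the tags a creator uses most often, so they never retype them."""
    counts = Counter(tag for post in past_post_tags for tag in post)
    return [tag for tag, _ in counts.most_common(top_k)]

# Hypothetical post history for one creator
history = [
    ["#travel", "#foodie", "#nyc"],
    ["#travel", "#nyc", "#coffee"],
    ["#travel", "#foodie"],
]
print(suggest_tags(history, top_k=3))  # ['#travel', '#foodie', '#nyc']
```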

Self-parking cars serve only a small group of users who are afraid to park in tight spaces. Building AI for a car to park itself is very difficult. But it starts with a simple prediction that could ship on every car: is this space big enough? There is a mindset of building the AI capabilities that are hard, while the low-hanging fruit goes unbuilt.
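A minimal sketch of that “is this space big enough?” prediction, assuming hypothetical gap measurements from the car’s existing parking sensors and an assumed rule of thumb (a learned model could replace the rule; the point is how little is needed to deliver value):

```python
def space_is_big_enough(gap_length_m: float, gap_depth_m: float,
                        car_length_m: float = 4.5, car_width_m: float = 1.9) -> bool:
    """Rule-of-thumb check for parallel parking: the gap needs the car's
    length plus maneuvering margin, and enough depth for the car's width."""
    maneuvering_margin_m = 0.8   # assumed margin; a real system would calibrate this
    side_clearance_m = 0.2       # assumed clearance
    return (gap_length_m >= car_length_m + maneuvering_margin_m
            and gap_depth_m >= car_width_m + side_clearance_m)

# Hypothetical sensor readings
print(space_is_big_enough(gap_length_m=5.6, gap_depth_m=2.3))  # True
print(space_is_big_enough(gap_length_m=4.8, gap_depth_m=2.3))  # False
```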

Even simple AI can make headlines. Built by ex-colleagues: crash detection, where AI can save lives and call for help; handwashing detection, keeping you safe and hygienic during COVID; predictive maintenance, preventing power outages and keeping the world’s infrastructure running; autocorrect, so you never again send an embarrassing text.

5. AI systems don’t generate enough value

High cost to build and maintain, low returns.

Alexa launched with the intention for people to order more products from Amazon. Instead, people used Alexa to play music. Now every time people ask Alexa to play music instead of making purchases, Amazon loses money.

Hundreds of AI tools were built to catch COVID. Research teams around the world stepped up to help, and software was developed to allow hospitals to diagnose or triage patients faster. However, none of them worked, and some were harmful. Many problems were linked to poor data. This led to reduced adoption and investment issues: an example of high AI investment with low service value.

Similarly, IBM Watson for Oncology was a technical success in developing AI capable of processing and recommending cancer treatments. But it did not generate sufficient ROI due to high development costs, integration challenges, and limited impact on improving treatment outcomes beyond existing medical knowledge: suboptimal for financial returns.

Design Considerations

Inflated expectations lead to disappointment when AI does not accomplish tasks as promised. But those tasks may not be achievable yet with the available data. Instead, design AI for what it is capable of. Suitable situations include when (1) there is low risk of getting results wrong, or mitigations exist; (2) machines can minimize human error; (3) AI is expected to be an intern, not an expert; and (4) having an AI intern is better than nothing.

More from Designing Successful AI Products and Services.

6. Ethics, bias, societal harm

AI ethics is broad but includes three central themes: bias, privacy, transparency.

(1) Bias occurs when available training data does not accurately represent the population the AI intends to serve. For example, AI can help recruiters screen resumes. But systems learn to prioritize resumes that exhibit patterns found in the resumes and educational backgrounds of past successful, predominantly male candidates, downgrading underrepresented groups. AI can assess the feasibility of diagnosing a type of cancer that affects both men and women. But if the company has data on men only, the models will provide biased results for women.

(2) Privacy requires that AI models are secure and won’t leak personal data. Unauthorized access, misuse, or breaches could lead to identity theft and discrimination, among other harms. Stanford HAI presents three suggestions to mitigate data privacy risks, and Ann Cavoukian’s Privacy by Design principles offer a framework for building privacy protections into systems from the start.

(3) Transparency suggests users need to understand how AI models work to evaluate their strengths, limitations, and functionality. The field of explainable AI (XAI) grew out of AI increasingly supporting high-consequence decisions, from healthcare to criminal justice. Now ‘transparency’ is broadly used as a design pattern to level-set user expectations. More from Joseph George Lewis on XAI. Transparency also allows people to understand, trust, and effectively manage their intelligent partners, especially in critical situations where decisions are otherwise limited by the machine’s inability to explain its reasoning and actions.
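As one concrete flavor of transparency, here is a minimal sketch using scikit-learn’s permutation importance to report which input features drive a model’s predictions. It uses synthetic data and a generic classifier purely for illustration; it is one XAI technique among many, not a complete transparency solution.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic stand-in for a high-consequence tabular dataset.
X, y = make_classification(n_samples=500, n_features=6,
                           n_informative=3, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X, y)

# Permutation importance: how much does shuffling each feature hurt accuracy?
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, mean_drop in enumerate(result.importances_mean):
    print(f"feature_{i}: accuracy drop when shuffled ~ {mean_drop:.3f}")
```

Surfacing importances like these to users is only a start; the two distinctions below separate this kind of post-hoc explanation from models that are understandable by construction.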

Two Distinctions

(1) Transparent systems create models that can explain their reasoning in a way that humans can comprehend.

(2) Interpretable systems create models that may not be transparent but are still understandable and predictable, recognizing that deep learning techniques in particular can be highly complex and difficult to explain.

HCI and AI research have produced many user-centered algorithm visualizations, interfaces, and toolkits to assist AI literacy: TensorFlow playground, Wekinator, AI Explainability 360 Toolkit, Google’s Facets.

Two User Types

(1) User-centric AI advocates for systems designed to be accessible to individuals without AI technical knowledge or expertise. Products that embody this perspective: Google Teachable Machine, Apple’s CreateML, Microsoft’s Lobe.ai, Generative AI tools. From research: how non-experts build models, collaborative ML with families.

(2) Expert-centric AI develops AI systems for domain experts with deep knowledge. These systems are typically complex and customizable: think GitHub Copilot or Salesforce Einstein. They are designed for experts familiar with specific workflows and taxonomies, while novices and non-experts get lost figuring out how they work.

Data science and design have grown up from different disciplines, but they are increasingly connected in AI products, where AI experiences come ever closer to mediating user experiences.

Industry (for-profit) is not good at critical thinking; it makes useful things that carry unintended consequences. Academia (non-profit) is good at critical thinking, acting as well-informed critics, and supplies many of the sources cited in this piece.

All are needed for AI.

Thanks for reading. I’d love to know your thoughts.

Elaine writes about design, AI, emergent tech. Follow for more, connect on LinkedIn.

AI
Machine Learning
Human Centered Design
Data Science
Editors Pick