AI vs. HI

AI Copyright Infringement Accusations: NY Times vs. AI Chatbots — Fair-use

Billions of dollars, and the survival of some large language models and their training data, are at stake

This story will help you understand a major lawsuit filed against AI-generated writing and how the suit came about. I have been a professional writer, editor, and publisher for half a century and am well-versed in AI-HI issues. For a year, since the descent of AI bots upon our world, I have gathered facts from stakeholders and followed developments impacting every human creative worker.

The New York Times is the first major media publisher to file suit against OpenAI and its major investor, Microsoft. In the upcoming year, there will probably be many more suits filed. Large language model platforms like OpenAI, owner of ChatGPT, have been accused for nearly a year of scraping creative products willy-nilly from the web and other sources, regardless of copyrights and without regard to the origins of the written work or images.

Remember that some sites, like Google Books for example, contain copies of print works or excerpts not available elsewhere on the web. So writings scraped by AI conceivably include hundreds of thousands of books that were never intended to be web content. Whether sites like Google Books have misstepped is outside the scope of this article.

Such allegedly purloined material has been used to train AI bots to regurgitate information to platform users in response to prompts. Many individual artists, writers, and creators have already filed suits or are preparing them. The results of such suits will be interesting. I doubt they’ll be limited to OpenAI; many platforms are under scrutiny.

Think of it like this: You or I, as entrepreneurs, start up an information website. Our intent is to give users a one-stop portal where they can find out anything they ever need to know about everything. Users log in and pay us money. They may then ask us anything, and we will access all other known or unknown writings worldwide to glean answers or provide stories for the user.

We don’t get permission from the original writers or even let them know we’re scarfing up their stuff. How long do you think we will last before the first “cease and desist” letter arrives from a publisher or writer we plagiarized?

What the plaintiff says

NYT claims it’s the first major U.S. media organization to sue OpenAI and Microsoft, the powers behind ChatGPT and Copilot (AKA Bing AI), over infringement of the newspaper’s intellectual properties.

In Manhattan federal court, NYT charged that the named defendants took a “free ride on The Times’s massive investment in its journalism” by creating alternative ways to deliver information to users.

“There is nothing ‘transformative’ about using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it.” — from The New York Times.

Neither Microsoft nor OpenAI issued an immediate response. They have to cogitate carefully to conjure up a company line. Though these two companies and many others allegedly engage in repurposing other humans’ existing creative work, all make noises about the concept of “fair use.”

An explanation of “fair use”

My understanding comes from reading the exact rules from the US Copyright Office and asking questions to clarify some items. Here is what I found in Section 107 of the US Copyright Act.

The following four factors must be considered to determine fair use:

Purpose and character of the use — Courts review whether the accused party used the work for nonprofit educational and noncommercial purposes. Not all nonprofit education and noncommercial uses are fair. Not all commercial uses are deemed “other than fair.” Additional factors play in. For example, transformative uses are more likely to be considered fair. Transformative means the user has added further value, purpose, or an enhanced nature, different from the intended original use of the work.

If AI platforms just compile and regurgitate information gathered, written, edited, and disseminated by a writer or publisher, is that transformative?

Nature of the copyrighted work — This factor analyzes the degree to which the work copied relates to supporting and encouraging creative expression.

The US Copyright Office has specified its support of human creativity. A bot compiling info and hacking it up again may not support human creativity.

The portion used in relation to the copyrighted work as a whole — Courts look at both quantity and quality of copyrighted material used. If the use includes a large portion of the copyrighted work, fair use is less likely to be found.

It should be noted that AI companies have made it a point to say they scraped everything they could find on the interweb. Hardly seems fair to me, but what do I know?

Effect of the use on the potential market for or value of the copyrighted work — Does the repurposing by bots harm the existing or future marketability of the copyright owner’s original work and to what extent? For example, does it displace sales of the original or shrink the viability of human creative skills? The courts consider whether such use could cause substantial harm if it were to become widespread.

This one seems like a no-brainer as many creative workers and their products have already been widely displaced by work cobbled together on AI platforms.

Other factors may be considered by a court. Courts evaluate fair use claims on a case-by-case basis, and outcomes are fact-specific. There is no formula to decide a specific number of words, lines, pages, copies, or percentages of work that may be used without permission.

U.S. Copyright Office Fair Use Index

The goal of the Index is to make the principles and application of fair use more accessible and understandable to the…

www.copyright.gov

The New York Times, a 172-year-old publication facing — as are all media publications — layoffs, cutbacks, and decreased viability, estimated damages in the “billions of dollars.” They will also petition for large language models and training sets that incorporate NYT material to be destroyed.

Meanwhile, individual writers and creative workers have sued to limit the ability of AI services to scrape creative material from the universe in general without compensation or attribution for the makers.

There is no guarantee that any court will side with the Humans.

Backstory and some things to consider

A number of AI companies still scrape information online from millions of sources, articles, blogs, and websites to train generative AI chatbots. Those entities have attracted billions of dollars in investments, and some have gained huge profits.

Human creative workers, including David Baldacci, Jonathan Franzen, John Grisham, Michael Chabon, Ayelet Waldman, Matthew Klam, Sarah Andersen, Kelly McKernan, Karla Ortiz, Scott Turow, and others, have filed lawsuits claiming that AI systems might have co-opted tens of thousands of their books and other works.

Getty Images reported finding more than 15,000 of its images in the Stable Diffusion dataset. Getty filed a suit.

Suits are pending against Midjourney Inc, DeviantArt Inc, DreamUp, Stability A.I. Ltd, Stable Diffusion, and other text-to-image tools.

The NYT suit lists several near-verbatim regurgitations of Times articles from OpenAI and Microsoft chatbots. The newspaper stated that “such infringements threaten high-quality journalism by reducing readers’ perceived need to visit its website, reducing traffic, and potentially cutting into advertising and subscription revenue.”

It also said the defendants’ chatbots make it harder for readers to distinguish fact from fiction, including when their technology falsely attributes information to the newspaper. In AI terms, false or untrue responses are known as hallucinations, according to The Times. In real life, we might call them screw-ups or untruths.

Numerous plaintiffs tried to negotiate fair resolutions in discussions and proposals prior to the filing of lawsuits in the AI battle. All fell on deaf ears.

The case cited in this story is New York Times Co. v. Microsoft Corp. et al., U.S. District Court, Southern District of New York, No. 23-11195.

If you’re interested in this unfolding AI vs. HI situation, please subscribe to my stories, wherein you’ll find much more information about copyrights and AI. Or visit my profile page.

Maryan Pelland OnText.com - Medium


✍ Published by Dr. Gabriella Korosi at Dancing Elephant Press.