Google Bard copyright: Daily Mail to challenge tech giant’s use of thousands of its articles to train AI chatbot

The Daily Mail prepares to take on Google in an artificial intelligence copyright fight in the latest move against AI giants

The Daily Mail has claimed that Google used a cache of hundreds of thousands of the Mail’s news articles to train its artificial intelligence (AI) chatbot, Bard. The Mail alleges that DeepMind, Google’s AI arm, used these articles without permission to train Bard to have realistic conversations with humans.

Bard is one of the main rivals to ChatGPT, another AI chatbot, developed by OpenAI. ChatGPT was trained on a large collection of books, articles, and web pages, including Wikipedia.

Chatbots are fed a huge amount of information to enable the software to engage in realistic simulated conversations with human users, answer complicated questions, and even tell jokes.

Google is believed to have used around one million articles from the Mail and American news site CNN to develop Bard, with around 750,000 of these coming from the Mail. This makes up just a small part of the information used to train Bard, with more content allegedly coming from at least 200 million materials that are protected by copyright.

Google has come under fire from the Mail because the news giant says that it did not give DeepMind permission to use its articles for this purpose, but there’s a bigger issue at play here than alleged copyright infringement.

Daily Mail considers a legal challenge to Google over use of news articles to train Bard chatbot

Is the Daily Mail suing Google over copyright infringement?

Daily Mail and General Trust is believed to be considering legal action against Google and to have sought legal advice. 

It has been claimed that Google used the articles because they feature a list of bullet points at the top. Google is alleged to have removed certain words from the bullet lists and challenged Bard to fill in the gaps.
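
The fill-in-the-gaps exercise described above can be illustrated with a short Python sketch. This is not Google's code, and nothing here is drawn from the claim itself: the function names, the `____` gap marker, and the rule of blanking every third word are all assumptions made purely for illustration.

```python
from typing import List, Tuple

MASK = "____"  # hypothetical gap marker, chosen for this illustration


def make_cloze(bullet: str, every: int = 3) -> Tuple[str, List[str]]:
    """Blank out every `every`-th word of a bullet point.

    Returns the gapped text and the removed words (the expected answers).
    Blanking at a fixed interval is an assumption; the article does not
    say how the removed words were chosen.
    """
    gapped: List[str] = []
    answers: List[str] = []
    for position, word in enumerate(bullet.split(), start=1):
        if position % every == 0:
            answers.append(word)
            gapped.append(MASK)
        else:
            gapped.append(word)
    return " ".join(gapped), answers


def score(predictions: List[str], answers: List[str]) -> float:
    """Fraction of gaps filled correctly, ignoring letter case."""
    if not answers:
        return 0.0
    correct = sum(
        1 for guess, answer in zip(predictions, answers)
        if guess.lower() == answer.lower()
    )
    return correct / len(answers)


if __name__ == "__main__":
    text, answers = make_cloze("Daily Mail considers legal action against Google")
    print(text)     # the bullet point with gaps for a model to fill
    print(answers)  # the words a model would be asked to recover
```

A model's guesses for the gaps could then be passed to `score` alongside the removed words to measure how many it recovered.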

If the Mail does follow through with action against Google, it would be the second major claim launched against an AI giant this year. Getty Images is currently involved in a case against Stability AI, the company behind the image-generation AI Stable Diffusion, which it alleges used 12 million of Getty’s copyrighted photos to train the software.

Google is alleged to have used around 750,000 Daily Mail articles to train Bard

Additionally, a California class action lawsuit has been filed against Google, alleging that DeepMind used millions of American users’ data to train Bard. The lawsuit alleges that among the data that Google used were materials “explicitly protected by copyright”.

The lawsuit also claims that Google raided “the entire internet to take anything it could, whether contributed on Google platforms or not, and without regard for the privacy, property, and consumer protection interests of hundreds of millions of Americans.” 

The News Media Association (NMA) has voiced suspicions that OpenAI may have also used news stories to train ChatGPT, though this has not been confirmed.

Why is the Daily Mail copyright battle with Google important?

These early challenges to how tech companies harvest data to feed artificial intelligence software could set the tone for years to come. The way that AI is currently being developed threatens to weaken copyright law internationally, and could diminish the value of the work produced by news sites, artists, and creators of all kinds.

However, if the Mail does launch a legal challenge against Google and is successful in winning significant damages, it may open the floodgates for other media companies that believe their copyrighted work was used without permission to launch similar legal challenges.

If the costs incurred start to climb, then companies developing AI may have to become more selective in the data they use to train their software, or pay a fair price for the material they do use.
