The world’s biggest trade publisher has changed the wording on its copyright pages to help protect authors’ intellectual property from being used to train large language models (LLMs) and other artificial intelligence (AI) tools, The Bookseller can exclusively reveal.
Penguin Random House (PRH) has amended its copyright wording across all imprints globally, confirming it will appear “in imprint pages across our markets”. The new wording states: “No part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems”, and will be included in all new titles and any backlist titles that are reprinted.
The statement also “expressly reserves [the titles] from the text and data mining exception”, in accordance with a European Parliament directive.
The move specifically to ban the use of its titles by AI firms for the development of chatbots and other digital tools comes amid a slew of copyright infringement cases in the US and reports that large tranches of pirated books have already been used by tech companies to train AI tools. In 2024, several academic publishers including Taylor & Francis, Wiley and Sage have announced partnerships to license content to AI firms.
PRH is believed to be the first of the Big Five anglophone trade publishers to amend its copyright information to reflect the acceleration of AI systems and the alleged reliance by tech companies on using published work to train language models.
PRH UK c.e.o. Tom Weldon told PRH staff in August that the business “will vigorously defend the intellectual property that belongs to our authors and artists”, but added that it would also “innovate responsibly” and “use generative AI tools selectively and responsibly, where we see a clear case that they can advance our goals”.
The Authors’ Licensing and Collecting Society, which recently ran a survey of its members to understand their views on AI, welcomed PRH’s update. C.e.o. Barbara Hayes said: “It is encouraging to see major publishers like PRH adopt new wording in their printed materials that reaffirms the principle of copyright and explicitly forbids technology companies from using copyrighted works to train their AI models. We hope more publishers follow [PRH’s] lead and that those companies developing such models take urgent notice.”
The Society of Authors also said the change in PRH copyright pages was a “welcome move” but added that the current wording didn’t go far enough, as author contracts also needed to be amended. The SoA’s c.e.o. Anna Ganley said: “There is no standard ‘All rights reserved’ wording and even the most basic notice covers all uses. Having said that, we’re pleased to see publishers starting to add to the ‘All rights reserved’ notice to explicitly exclude the use of a work for the purpose of training [generative AI], as it provides greater clarity and helps to explain to readers what cannot be done without rights-holder consent.”
She added: “In addition to this change, we now hope to see changes in publishing contracts too, and proper safeguards to be added, as we believe it is equally important that publishers guarantee to creators that their consent will be sought before the publisher uses—or allows the use of—generative AI in association with the production of the work—for example, for purposes of narrating, translating, images, cover design—and before the publisher grants any access to, or use of, the work by an AI system.”
The SoA also said it would continue to lobby “for a legal framework that honours existing and developing market solutions for AI technologies, namely proper and transparent licensing that ensures creators and rightsholders are paid for the use of their works. It is encouraging to see the entire creative industries united in defending this foundational principle of our creative economy.”
Publishing lawyer Chien‑Wei Lui, senior associate at Fox Williams LLP, told The Bookseller that “the chances of an AI platform providing an output that is, in itself, a copy or infringement of an author’s work, is incredibly low.”
She said it was the training of LLMs “which is the infringing action, and publishers should be ensuring they can control that action for the benefit of themselves and their authors”.
Lui pointed out that the publishing industry was still trying to establish best practice amid a rapidly changing generative AI landscape. “The more training that is being done on a non-contractual/licence basis, the greater the risk that author content is being devalued,” she said. “Why would a platform pay to license content for training purposes if it suspects that content is already ‘out there’? While the acceleration of generative AI has posed existential questions for the publishing industry, the more prosaic concern is that if your content is being used for training without consent, that is revenue that both the publisher and author are missing out on.”
Lui added: “Publishers need to ensure they understand all the tools at their disposal to limit the ability for third parties to use their content for training purposes. Having a clear and advertised statement about reserving all training and text and data mining rights, for example, is helpful.”
She pointed out that several publishers had written cease and desist letters to some of the larger LLM platforms, but she suggested taking practical steps to prevent content being scraped or used for training. “Many of the AI platforms have published guides on how users can ‘opt out’ of having their content used for AI training, and these are readily available on the internet. In addition, there are ways by which you can prevent your website content from being scraped by using a robots.txt file. Furthermore, new machine-readable text and data mining licences are being created so that any machine coming to scrape your content can be directed to a legitimate means of access (paid for or otherwise),” she said.
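As a rough illustration of the robots.txt approach Lui mentions, the Python sketch below checks how a sample file would be read by a crawler that honours it. The file contents, the `is_allowed` helper and the example.com address are assumptions made for illustration. GPTBot and CCBot are the crawler names used by OpenAI and Common Crawl, but any real list of names should be checked against each platform's own opt-out guide.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two crawlers associated with AI training
# and leave the site open to every other user agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/books/sample-chapter"  # placeholder URL
    for agent in ("GPTBot", "CCBot", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

A robots.txt file is a request rather than a technical barrier, so it only stops crawlers that choose to comply with it.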
The Bookseller approached other leading publishers to ask if they had changed or planned to amend their copyright information in view of the challenge that AI poses. Pan Macmillan, Hachette and Simon & Schuster declined to comment, while Faber could not be reached for comment. However, The Bookseller understands that Faber has recently adopted an “AI policy” that prohibits freelancers working with its authors’ books from copying any of the information into an AI programme “for the purposes of editing, checking, extraction or any other purpose”.
PRH’s copyright statement in full reads: “Penguin Random House values and supports copyright. Copyright fuels creativity, encourages diverse voices, promotes freedom of expression and supports a vibrant culture. Thank you for purchasing an authorised edition of this book and for respecting intellectual property laws by not reproducing, scanning or distributing any part of it by any means without permission. You are supporting authors and enabling Penguin Random House to continue to publish books for everyone. No part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems. In accordance with Article 4(3) of the Digital Single Market Directive 2019/790, Penguin Random House expressly reserves this work from the text and data mining exception.”