Meta has used millions of pirated books to develop its AI programmes, as reported in the Atlantic, provoking outcry from many writers and organisations such as the Society of Authors (SoA).
The American magazine has published a searchable database of more than 7.5 million books and 81 million research papers. This data set, called Library Genesis or ‘LibGen’ for short, is full of pirated material, and all of it has been used to develop AI systems by tech giant Meta.
According to the Atlantic, court documents show that staff at Meta discussed licensing books and research papers lawfully but chose instead to use stolen work because it was faster and cheaper. Meta argued that it could then rely on the US’s ‘fair use exception’ defence if it was challenged legally. The SoA released a statement that said: “Given that Meta Platforms, Inc, the parent company of Facebook, Instagram and WhatsApp, has a market capitalisation of £1.147 trillion, this is appalling behaviour.”
The organisation added: “It is not yet clear whether scraping from copyright works without permission is unlawful under the US fair use exception to copyright, but if that scraping is for commercial purposes (which what Meta is doing surely is) it cannot be fair use. Under the UK fair dealing exception to copyright, there is no question that scraping is unlawful without permission.”
The society is calling for Meta to compensate the rightsholders of all the works it has been exploiting. It said there was a “need to see strong legislation from governments to uphold and strengthen copyright law, ensure transparency and fair payment, and to penalise big tech companies that ride roughshod over the law”.
The SoA’s CEO Anna Ganley said: “Rather than ask permission and pay for these copyright-protected materials, AI companies are knowingly choosing to steal them in the race to dominate the market. This is shocking behaviour by big tech that is currently being enabled by governments who are not intervening to strengthen and uphold current copyright protections. As part of the Creative Rights in AI Coalition, the SoA has been at the heart of the fight and is continuing to lobby against these unlawful and exploitative activities.”
On Bluesky, historian Greg Jenner said that “copyright law is being utterly trampled on, over and over”.
Author and former publisher Harriet Evans apparently told the SoA that she has found all of her books in the data set, including one that has yet to be published.
Author Nadine Matheson said: “I found 13 of my books, including my traditionally published works, translated editions, and my self-published book, in the LibGen database, which means Meta has likely used them for AI training without permission. You can search for your own books there to see if they’ve been stolen, too.
“If you haven’t already, contact your trade unions, writers guilds and societies, wherever you are in the world. Reach out to Meta (for whatever good that will do). But one thing is certain: we will not take this lying down.”
The SoA offered advice for authors who find their books in the register.
Catriona MacLeod Stevenson, general counsel and deputy CEO of the Publishers Association, said: “While we have long suspected that illegal pirate websites have been used in the training of LLMs, court documents reported by The Atlantic show that Meta employees were actively encouraged to download and use LibGen’s more than 7.5 million books and 81 million research papers to use to train its LLMs.
“This is infringement of authors’ and publishers’ copyright on a massive scale, and should not go unchallenged. The Publishers Association and its members are actively considering their next steps in this regard. Publishers – and other creative sectors – have said many times before, big tech companies can afford to pay for the content they use and should do so. There is a simple way to access the high-quality content AI developers wish to use to train LLMs, and that is paying for it, just as they pay for the electricity they use, in the ordinary course of doing business.
“As the UK Government considers the thousands of responses to its Copyright and AI Consultation, now is the time to make it clear that companies such as Meta need to be transparent about the copyright-protected works they have used and wish to use, and enter in good faith into licensing discussions so that rightsholders can be remunerated for their work.”
The Bookseller has contacted Meta for a comment.