Dan Conway, c.e.o. of the Publishers Association, has told MPs that Large Language Models (LLMs) used in Artificial Intelligence (AI) are breaking the law on a “massive scale”.
The Communications & Digital Committee gathered in Parliament on Tuesday afternoon (7th November) to discuss the nature of LLMs. As well as Conway, the experts giving evidence included Dr Hayleigh Bosher, associate dean and reader in intellectual property law at Brunel University London; Arnav Joshi, senior associate at Clifford Chance; and Richard Mollet, head of European government affairs at multinational analytics company RELX.
Conway made clear his concern that the technology infringes copyright law. “Large language models... are a force for good and hugely exciting, and the creative industries will be innovating alongside this innovation,” he said.
“But the truth is that the market conditions that are currently out there mean that AI is not being developed in a safe, reliable or ethical way. And that’s because Large Language Models are infringing on an absolutely massive scale. We know this in the publishing industry because of the Books3 database [as exposed by The Atlantic], which contains 120,000 pirated book titles that have been ingested by these Large Language Models. And we also know because of the output of the models, what’s coming out the other end, that the content has to be books content.
“They [the LLMs] aren’t currently compliant with IP law. We’ve had conversations with technical experts about the processes undergone by the Large Language Models, and it’s our contention to the committee that these models do infringe copyright at multiple stages, in how they collect, how they store and how they handle data, so copyright law is being broken on a massive scale.”
Bosher appeared to agree with Conway’s assertion that breaches are taking place. She told the committee: “I think the principles of when you need a licence and when you don’t are clear: to make a reproduction of a copyright-protected work would require a licence, and [doing so] without permission would otherwise be infringement, and that’s what AI does at different steps of the process... we’re in a position where some AI tech developers are arguing a different interpretation of the law.”
Conway said: “We think it’s quite clear what should be happening. There should be a process of permission, transparency, remuneration and attribution.
“Really that’s about licensing; what we need is market-based solutions for licensing which are as seamless as possible, flexible, and can make sure that this access point for AI systems to data is done in the best possible way. So this is probably a combination of direct licensing... or a collective licensing model, which could be particularly helpful for smaller businesses on both the AI development side and the rights-holder side.
“That’s not on the market yet, but this is where we need to look at... how do we improve the licensing system so these deals can be done, so we can ensure the right information and the right creative works are going into these machines and we get the right outputs at the other end.”

Bosher agreed with Conway that clearer licensing structures are needed.
In terms of encouraging change, Conway said: “We still support a voluntary process and we’d like to see a high-level set of principles from government saying copyright and transparency apply... there are lots of global models out there we could pick and choose from on what would work best.
“I would support a voluntary approach but very much backed up by a legislative handbrake if those voluntary conversations fall apart.”
Last month, publishing trade bodies urged the government to put in place “tangible solutions” to protect the “human creativity” behind AI and asked for “acknowledgement of and recompense for the copyright infringement that has already happened”.