Navigating the Copyright Conundrum in AI

Apr 10, 2024

Read time: 2 minutes

In an era where artificial intelligence (AI) is reshaping industries and redefining our interaction with technology, a pivotal question emerges: How do we balance innovation with the rights of content creators?

A recent New York Times article brings to light a contentious issue: OpenAI's use of YouTube videos to train its models, potentially infringing on copyright laws.

Now, you'd expect Google, which owns YouTube, to be up in arms, right?

No.

It turns out Google also used YouTube content to train its own models, which may have breached its own copyright agreement with content creators. <insert mind blown emoji here>

This leads us into a dialogue on the ethical boundaries of AI development.

The Paradox of Progress

Big tech companies, no strangers to scrutiny over data usage, find themselves at a crossroads. The advancement of AI relies heavily on data: the more diverse and comprehensive the data, the better the AI's performance.

You can see the scaling research about this here if you're interested.

Yet this requirement for data runs into a number of barriers, one of which is the finite supply of freely available, usable data.

As we venture forward, the question of where to source fresh data becomes part of the arms race in which all of big tech is now competing.

The Legal Battlefield

The intricacies of copyright law and its applicability to AI training present a legal challenge that has already reached the courtroom.

The New York Times' lawsuit against Microsoft and OpenAI is a good example of the tension between content creators and publishers on one side and AI developers on the other.

The defence of 'fair use' or transformative use is a nuanced argument, highlighting the probable need for new laws and a legal framework that evolves along with technology.

The Future of Data Acquisition

The quest for new data sources is the next battlefield for AI. Currently available data is being used up, so alternative sources, including synthetic data, are emerging as a possible option.

But this approach is not without its pitfalls. Synthetic data created by flawed models carries their biases and inaccuracies with it, and training on it risks producing more of the same.

Innovative solutions, such as using one AI to generate data and another to evaluate its quality, are among the proactive approaches being tested. However, these are unproven at the time of writing.

Join the Conversation

Copyright within AI comprises legal, ethical, and technical challenges. Each piece is important to understanding the broader implications for creators, innovators, and consumers alike.

As we stand at this juncture, the need for informed dialogue has never been more critical.

AI Law Webinar: Your Next Step

If you are interested in finding out more, sign up for our upcoming webinar, presented in collaboration with Hybrid Legal. Here, we'll unpack these complex issues in straightforward terms, offering clarity and insight into a subject that sits at the heart of AI's future.
