Can AI Content be Attributed to Original Authors?

In a discussion with fellow authors, we touched on AI and the possibility of the EU reining it in.

(original art created by Giovanni – not an AI!)

The EU is focused on ‘life critical’ issues like health care and the financial system:

https://www.mckinsey.com/capabilities/quantumblack/our-insights/what-the-draft-european-union-ai-regulations-mean-for-business

Those systems will be asked to provide ‘visibility’ into where they get their raw data from.

But other systems that create text output, image output, games, and so on aren’t going to need that accountability.

I doubt the EU is even going to succeed on that first metric. Nearly all AI deep learning systems are ‘black box’. We feed the systems enormous amounts of data and there’s no way for the system to know which particular piece influenced the ranking of which particular word. It’s a sea of data. The public web has about 5 billion pages right now.

Let’s say you generated an article on mindfulness that was 400 words long. You scan those 5 billion pages for any mention of mindfulness. Even if you could tag the first output word in the article with 10,000 sources, it would never make sense to associate all 10,000 sources with word #1, and then 10,000 sources (perhaps entirely different pages) with word #2, and so on. It would be completely unsustainable.

That’s the challenge even in the health care and financial areas. The idea is you’re building a consensus across billions of pages about what “the most likely” first word is, “the most likely” second word, and so on. And how would you even give credit, when so many sources are random blogs with anonymous owners? And when so many of those blogs have stolen their content from other people?

I really don’t see how any attribution at all could be feasible. A single 400-word AI article on mindfulness could have frequency data drawn from 4,000,000 individual pages, and then a responsible entity would have to track down the ‘legal original owner’ of the content of every one of those 4,000,000 pages.
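To show where that 4,000,000 comes from, here is a rough back-of-the-envelope sketch in Python. It uses only the illustrative figures from this post (400 words, 10,000 sources per word, about 5 billion public pages) and assumes the worst case where every word draws on an entirely different set of pages. The function and variable names are made up for this example.

```python
# Back-of-the-envelope tally using the illustrative figures from this post.
# Assumption: every word draws on a completely different set of source pages,
# so this is an upper bound, not a measurement of any real AI system.

WORDS_IN_ARTICLE = 400
SOURCES_PER_WORD = 10_000
PUBLIC_WEB_PAGES = 5_000_000_000  # rough figure quoted above


def attribution_records(words: int, sources_per_word: int) -> int:
    """Number of word-to-source links someone would have to track."""
    return words * sources_per_word


total = attribution_records(WORDS_IN_ARTICLE, SOURCES_PER_WORD)
share = total / PUBLIC_WEB_PAGES

print(f"{total:,} pages to trace for one article")   # 4,000,000
print(f"{share:.2%} of the public web")              # 0.08%
```

That is for one short article. Every additional article adds its own pile of pages whose owners would need to be found.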

ASCAP, for music, only works because each musician has specifically registered an entire song with ASCAP, including their name, their contact information, their bank information, and a certification that they 100% wrote and own that sequence of notes. Even then, though, a LONG portion of the song (15 seconds in general) has to be a match before the ASCAP system gives them credit. An ASCAP artist CANNOT get credit just because their song has an E flat and someone else has an E flat. That “single word” match is the exact thing happening with AI’s word-by-word construction.

Imagine if you tried to get ASCAP to give every single musician who had an E flat in their song “credit” if an AI music program used an E flat in their AI song. Then go beyond it, and imagine someone wanted every single musician in the world who has ever posted a piece on the internet (known or unknown to ASCAP) to somehow get legal credit if their song has an E flat and an AI program used an E flat in their piece …
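The difference between a long matching passage and a single shared note can be sketched in a few lines of Python. This is only an illustration of the idea, not how ASCAP or any real matching system works: the note sequences are invented, and the threshold of 8 notes is a hypothetical stand-in for the "long portion" requirement described above.

```python
# Illustration only: credit requires a long contiguous match, not one shared note.
# MIN_RUN and the note sequences below are hypothetical, chosen for this example.

MIN_RUN = 8  # hypothetical minimum number of consecutive matching notes


def longest_shared_run(a: list[str], b: list[str]) -> int:
    """Length of the longest contiguous run of notes that appears in both songs."""
    best = 0
    # prev[j] holds the length of the shared run ending at the previous note
    # of `a` and at b[j - 1].
    prev = [0] * (len(b) + 1)
    for note_a in a:
        curr = [0] * (len(b) + 1)
        for j, note_b in enumerate(b, start=1):
            if note_a == note_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def earns_credit(registered: list[str], other: list[str]) -> bool:
    """True only if the two songs share a run of at least MIN_RUN notes."""
    return longest_shared_run(registered, other) >= MIN_RUN


registered = ["Eb", "G", "Bb", "C", "Bb", "G", "Eb", "F", "G", "Ab"]
close_copy = ["C", "Eb", "G", "Bb", "C", "Bb", "G", "Eb", "F", "G"]
just_an_e_flat = ["C", "D", "Eb", "F#", "A"]

print(earns_credit(registered, close_copy))      # True: 9 notes in a row match
print(earns_credit(registered, just_an_e_flat))  # False: only single notes match
```

Word-by-word AI output looks like the second case: lots of isolated overlaps with millions of sources, and no long run that points back to one owner.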
