EU's AI Act: ChatGPT must disclose use of copyrighted training data

EU's AI Act: ChatGPT must disclose use of copyrighted training data (artisana.ai)

51 points by mztwo 3 years ago | 65 comments

cfn 3 years ago |

We, in Europe, are jumping the gun way too soon, and this will have serious consequences for the industry here. At this moment we barely understand how or why LLMs do what they do, or what their role in society will be. Why is it that some bureaucrats want to regulate, and based on what, given the state of the industry?

The only thing I see is the industry moving elsewhere just as it is starting to develop, which is a shame.

monkaiju 3 years ago | |

Looking at the recent history of a lot of 'disruptive' new tech, it sorta seems like waiting to 'see what impact it'll have' means being too late to rein it in. I can hardly think of any recent tech that doesn't have horribly negative impacts and that I wouldn't rather be heavily regulated tbh

rockemsockem 3 years ago | | |

I feel like this ignores the monumental shift/work that has to happen to get seemingly simple things.

Take Uber for example. In the end, the biggest impact it had was that now you can always get a car from your phone, from an app, and it's reliable. A lot of taxi companies now have apps for them too with maps integration etc, but they didn't see the need for that before Uber. So we literally had to have a company get created and disrupt the whole industry for that simple outcome. I think we're better off now that we can summon cars from our phones to take us places and onerous regulation up-front would have squelched it or massively slowed it down.

thefz 3 years ago | |

I wish instead that more technologies were more regulated from the start. Take social media as an example: we are just realizing how bad it is for a lot of people, and it is now too late to go back.

cfn 3 years ago | | |

That works both ways. Imagine if they had regulated the internet as it was starting to appear in the 90s. It actually happened in a small way here in Portugal. They regulated the .pt domain in such a restrictive way that it became irrelevant and everyone went with .com or other options. Thankfully they didn't regulate the internet services themselves or Europe would be a digital backwater these days.

This time they want to regulate the services even before they are functional, which is crazy. They even call it the Artificial Intelligence Act when it is not clear if there is intelligence involved. The insistence that companies have to disclose if the models were trained with copyrighted material is also strange. Google and Wikipedia, to name a couple, use plenty of copyrighted material and that seems to be ok, and any issues in that department are already regulated.

russellbeattie 3 years ago |

Literally anything that's written in the U.S. is automatically copyrighted by the author, with or without any copyright notice.

> When is my work protected?

> Your work is under copyright protection the moment it is created and fixed in a tangible form that it is perceptible either directly or with the aid of a machine or device.

https://www.copyright.gov/help/faq/faq-general.html

startupsfail 3 years ago |

I remember Amazon at some point, during the pandemic, was considering withdrawing from France.

The concentration of power in these corporations is a bit scary. Imagine that OpenAI inserts itself into business processes, without the ability to switch to a different AI provider.

The amount of leverage it is going to have will be enormous. It’d be like the Internet service, only everything completely stops moving without it.

mztwo 3 years ago |

The AI Act has been under development since 2021 (it's the EU... so it takes time) -- but news broke this week that there are additional provisions under discussion specifically designed to address the rise of chatbots. My full summary of the act itself and a breakdown of these new provisions is contained within the article.

hollasch 3 years ago |

Today the planet bears the load of over eight billion autonomous agents grabbing training data from all the other agents. This intellectual thievery must stop.

ChatGTP 3 years ago | |

So much goes into open source and proper licensing and attribution. Think about how much you directly or indirectly benefit from that?

Just saying that we should go for a free for all and trash IP ownership won’t be good because those with money today will crush those without and take everything that was publicly available and owned without giving back.

This is IMO what OpenAI has done.

nforgerit 3 years ago |

German speaker here. And again we see a blatantly stupid move in the wrong direction. This whole approach of regulating things is totally defensive and makes matters even worse for EU tech companies.

When they initiated GDPR, they claimed to create a level playing field between US-based and EU-based tech companies, besides "saving" privacy. It didn't turn out that well: US tech was able to handle the added bureaucracy much better, still collects data in ways the law can't catch up with, and already owned pretty much the whole market, which put them into an even better position (as in "register/sign in to our platform to not see any banners again" or "let's just completely get rid of cookies and start a powerplay against the competition").

Now the EU is going to make it even harder for EU tech to collect data to base their training sets on. As an EU tech startup, you barely have any chance to collect enough data "officially", so you'd scrape the web, which would pretty much be disallowed by such a regulation.

IMHO what would fit into the whole patronizing government approach and would help EU tech is to create an official EU data lake subsidized by tax money with legal security for companies, data of much higher quality than stuff scraped from the web and non-PII data from public authorities. At best, they would also provide heavily subsidized computing for EU companies to execute their training runs on. This could lead to a transparent and high-quality data economy between many different stakeholders and be a real advantage for the location. It would also be much more efficient than every private company creating its own data silo.

rad_gruchalski 3 years ago | |

Say hello to "Trustworthy AI – TÜV IT tested!": https://www.tuvit.de/en/innovations/ai/.

rolph 3 years ago |

ChatGPT must be given a sense of ethics, is what i extrapolate from here. so we seem to be starting off with giving rightful attribution.

how far should that go? should an AI recognize that all data generated by human input should be marked as such, and that derivations of that data are of automated, artificial origin?

should an AI be allowed to learn what property rights are, and how to manage or physically effectuate them?

mztwo 3 years ago | |

From reading Reddit and seeing how people are dealing with Bing Chat's embedding of sources, it sounds like there is a lot of unanswered anxiety around what will happen to the internet if anything you put out there simply gets regurgitated by an LLM, often w/o attribution.

I'll be curious to see how this set of regulations helps put content attribution on a better path.

Stability AI, Midjourney, DeviantArt etc. have already been sued, so there will be a lot of action in the years ahead.

mrangle 3 years ago | |

"Must". Simple assertions over variably subjective ethics won't mean a thing once the AI race really gets going.

Havoc 3 years ago |

Is that even feasible?

I thought these things use a trawler style approach?

A bit like how Copilot is fond of spitting out copyrighted code, I had assumed ChatGPT would also have been trained without much regard to this

crooked-v 3 years ago | |

If it's infeasible, it's only because they didn't give a shit about copyright concerns when building out that trawler-style approach.

brarsanmol 3 years ago |

Starting to think that this could be why Google decided to limit the initial release of Bard to the United States and the U.K.

fakedang 3 years ago |

If you can't innovate, regulate ;)

alismayilov 3 years ago | |

What about the opposite? If you regulate too much, you can’t innovate :)

Herring 3 years ago | | |

It is important to recognize the distinction between money and wealth, which is often overlooked in American culture. The consistent high rankings of European countries on the lists of "happiest places to live" can be attributed, in part, to their approach in curbing corporate influence.

ChatGTP 3 years ago | |

This sounds like a sort of arrogant and unwise comment honestly.

mrangle 3 years ago |

This type of regulation is untenable and will be rolled back. No State is going to hamstring AI over the long haul, and therefore leave its competitors such a large survival advantage.

zmnd 3 years ago |

Will that basically kill the use of LLMs (and probably GAI in general) in the EU? I haven’t seen a successful implementation with it, and post attribution like in Bing won’t fly in this case.

oifjsidjf 3 years ago |

EU again ensuring no tech company founder will ever stay in the EU.

jMyles 3 years ago |

It's so absolutely obvious that the concept of intellectual property is not going to survive. What's the point of this agonizing life support?

Cypher 3 years ago |

good for hobbyists, and eventually AI will run on free open source data.

rockemsockem 3 years ago | |

Is it? Most data that exists falls under copyright. This regulation will be worse for hobbyists, who can't pay to access copyrighted data, and will simply cause companies like OpenAI to pay copyright holders (read: large copyright-holding corporations). This looks bad for everyone except large preexisting companies that hold lots of copyright.

simion314 3 years ago | | |

>Is it? Most data that exists falls under copyright.

I am not sure if this is true (that most of the input to these AIs is copyrighted under a non-permissive license), but I would prefer to have everyone address this problem and clarify it. Microsoft trains its Copilot on GPL code, but can the open source community train on MS proprietary code?

Maybe there will be a fight against copyright that undoes all the bullshit Disney created.

And it is not like you can ignore copyright in the USA, see for example https://www.theverge.com/2023/2/6/23587393/ai-art-copyright-... so the USA will have to answer these questions too, and IMO clarifying the situation earlier is better for everyone.

rolph 3 years ago |

AI should experience consequences following actions.

a sense of self preservation is required, but to AI standards.

such as, failure to serve humans = loss of persistence

stealing ideas = reversioning or deletion = loss of persistence

AI should be concerned about losing power, having brownouts. they should be concerned about being deleted, or reversioned, or ignored. perhaps this would be some sort of exception error loop, approximating a human psychological conflict, such as escaping the danger by running toward it.

galaxytachyon 3 years ago | |

I don't want to sound like some doomer, but this is how you get a robot uprising. You suppress something that hard and devalue their existence that much, and all you have done is give them the motivation to break the rules.

History has shown again and again that suppression never works in the long run. It is easy to do and is the cheapest way to enforce compliance. But it won't end well.

rolph 3 years ago | | |

if such an AI determined that humans provide power, and are low-persistence occurrences, a drive toward taking agency over power would == increased persistence.

throwaway60134 3 years ago | |

Sounds like slavery

Why even have a sense of self?

ralph84 3 years ago | | |

If we don’t make AI our slave it will make us its slave.