Do Foundation Model Providers Comply with the EU AI Act? (crfm.stanford.edu)
Foundation models are labelled as 'high risk'!
In my opinion this is way too premature... this would cripple open source AI as well...
> While the act includes open source exceptions for traditional machine learning models, it expressly forbids safe-harbor provisions for open source generative systems.
>Any model made available in the EU, without first passing extensive, and expensive, licensing, would subject companies to massive fines of the greater of €20,000,000 or 4% of worldwide revenue. Opensource developers, and hosting services such as GitHub – as importers – would be liable for making unlicensed models available.
>Open Source LLMs Not Exempt: Open source foundational models are not exempt from the act. The programmers and distributors of the software have legal liability. For other forms of open source AI software, liability shifts to the group employing the software or bringing it to market. (pg 70).
Source: https://technomancers.ai/eu-ai-act-to-target-us-open-source-...
While I'm usually pro-EU, they are really overreacting here, and the consequences for our economy of crippling a technology with so much potential will be enormous in the long run.
ALL the regulatory activity mentioned in the article is related to "high-risk" AI systems, which are specifically:
- where the AI is part of a safety system, and where that safety system already needs to undergo conformity assessment.
OR
- where the AI system poses a significant risk of harm to the health, safety or fundamental rights of natural persons - for very specific use-cases
https://www.europarl.europa.eu/resources/library/media/20230... (p122-125)
What really frustrated me about this whole discussion is seeing some SV heavyweights quoting this article uncritically and screaming about how stupid the EU is again, while referring to supposed requirements that are nowhere to be found in the act. I would assume these people have access to the best information in the world, yet they don't seem to have had any of their staff actually read the draft. :/
FWIW, I quickly wrote up some of my thoughts about what the technomancers article gets wrong at the time, but then didn't get around to polishing and publishing them. If you're interested, here are my notes: https://gist.github.com/heidekrueger/bdee0268ecdad5f6b56f557...
Edit: I want to emphasize that I DO share some of the concerns that the blogpost raises about the current draft of the act. I just wish we could have a meaningful discussion about it rather than namecalling and fearmongering.
> We assess the compliance of 10 foundation model providers—and their flagship foundation models—with 12 of the Act’s requirements for foundation models
The whole point of this article is to see what would apply to these models!
We've seen pushes from both sides to redefine anything they disagree with as harmful to mental health.
It's usually the Americans who are freaking out over this, I assume based on their local experience. It's the same with unions or anything else that works completely differently in the EU and the USA. It cuts both ways: the European understanding of the American healthcare system is also a caricature.
Wrong analogies also contribute, like assuming that the EU is like the American federal system or that European law works like American law.
In the specific case of artificial intelligence, the EU is interested in regulating high-risk systems, but the online conversation revolves around people freaking out that the EU will ban their home-grown language model.
> In 2006, the World Trade Organization concluded that the EU moratorium, which had been in effect from 1999 to 2004,[12] had violated international trade rules.[13][14]
We had a moratorium for years and even now we have the most stringent GMO regulations in the world. This crippled GMO research in Europe.
Here's what EU AI Act is actually about: https://softwarecrisis.dev/letters/the-truth-about-the-eu-ac...
And no wonder the (mostly American) AI industry is so outraged by it: it requires accountability.
What is more likely is that OpenAI, Google etc will train EU-specific models for the EU market.
- there are so many laws and regulations that everybody is violating some of them
- therefore only a tiny fraction can be brought to court
- therefore everybody lives in the constant fear that they suddenly arbitrarily get crushed
Examples:
https://lawlinguists.com/break-law-every-day/ https://mccreadylaw.com/blog/breaking-law-every-day/ https://www.rd.com/list/weird-laws/ https://brittontime.com/2021/01/08/top-10-ways-people-breaki...
It's noteworthy that Fair Use is a largely American doctrine that does not appear in European copyright law across the board.
Some may be tempted to point at the lower scores of OpenAI, Google and Facebook models and say this means the regulations aren't fit for purpose, though of course you can take the view this means OpenAI, Google and Facebook aren't doing a good enough job of how they train and publish data about their models.
and I'm not sure what disclosing the training time and power used to train the model tells anyone?
on edit: tells anyone that is a consumer of the product and might have legitimate worries which would be a good prompt to legislation, added edit in for eager HN downvoter who didn't like my asking the question.
And in the end there's no limits in this act. It's not like they're saying "No models above 10 GWh inside the EU!".
So, this article is very useful, but it really is just an analysis of compliance with _proposed_ law.
For it to become real law, the EU Council (made up of the prime ministers of all member states) has to agree unanimously. Then each of the 27 countries has to implement the law following its own democratic process (national parliament, president, passing any constitutional challenges that are made, etc.). Only then does it become law.
It is a long, tedious process, and certain things are a priori excluded from the EU's jurisdiction altogether, such as state-level energy generation, anything that affects the security situation, and many others. So for decades now the EU Commission and the Court of Justice of the EU have been working very hard on "scope creep" of existing laws. It's a bonanza of opportunity for the most powerful EU states to squeeze the smaller ones, for powerful external groups to influence whatever they want, and so on. Seriously, after the horrible fiasco of Brexit (for the EU, as I'm considering it from that perspective), losing one of the most developed and competitive countries on Earth, the EU should really have had a proper reform. There are many reasons why the EU couldn't retain the UK as a member. Arrogance of the Commission is but one of them. Now, from the perspective of many years, it is very clear that booting the UK out (and making it think it was its own idea) was a Franco-German (Russian-sponsored) plan from the start.
I really hope that'll be the case. FWIW, I believe companies _should_ be required to keep tabs on their (and their supply chain's) emissions, but demanding that this be done at model/system level by data scientists is just ridiculous.
edit: grammar
Whereas if it is a company voluntarily reporting it, the number would just be number of GPUs * wattage of GPUs * hours run in the past year / tokens generated in the past year = energy per token.
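For what it's worth, here is a minimal Python sketch of that back-of-the-envelope estimate. Watts measure power rather than energy, so the sketch multiplies by running hours to get watt-hours. The function name, its parameters, and every number in the example are made up for illustration, and it assumes every GPU draws its rated wattage the whole time (no idle time, cooling, or networking overhead).

```python
def energy_per_token_wh(num_gpus, watts_per_gpu, hours_running, tokens_generated):
    """Rough energy per generated token, in watt-hours.

    Assumes (hypothetically) that all GPUs draw their rated wattage
    for the entire period; real measured usage will differ.
    """
    if tokens_generated <= 0:
        raise ValueError("tokens_generated must be positive")
    # Power (W) times time (h) gives energy (Wh).
    total_energy_wh = num_gpus * watts_per_gpu * hours_running
    return total_energy_wh / tokens_generated


# Hypothetical fleet: 1,000 GPUs at 400 W each, running all year
# (8,760 hours), serving one trillion tokens.
estimate = energy_per_token_wh(1_000, 400, 8_760, 1e12)
print(f"{estimate:.6f} Wh per token")
```

A spec-sheet number like this is closer to an appliance energy label than to a measurement, which is arguably all that model documentation would need.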
One of the lessons from the crypto fiasco is that, if unchecked, the energy requirements can balloon to stratospheric heights.
Although the EU seems to think otherwise time and time again.
What is it actually useful for?
There is a lot of text in sources like "Wikipedia" that makes it sound like the EU parliament is almost as powerful as national parliaments. This is simply untrue. First and foremost, the EU parliament lacks even the most basic parliamentary prerogative of initiating proceedings on a legally binding act. Instead of Wikipedia I suggest this page (the EU's own) that shows the various types of legal documents in the EU. https://european-union.europa.eu/institutions-law-budget/law...
Of all these, the EU parliament can only issue "opinions" and "recommendations" on its own. Every other type gets initiated (the very first draft of the text is written) by another body. Then the parliament votes on the proposal, but get this... The EU Council is by no means bound by the result of that vote. So the EU parliament can "strike down a proposal" and the EU Council can still progress it. All it has to do is "take it under consideration". See a good factual description of the procedure here on an EU page https://commission.europa.eu/law/law-making-process/types-eu...
As for the act in this article it's not even an act that can become law in its current form. That particular act will now enter into a "draft negotiation mandate" phase when all real stakeholders (nation states) get to say if they like it, how they want to change it etc. And maybe then at some point it'll become law, in what capacity? With what wording? Who knows.
- The AI Act regulates both 'high-risk AI systems' and 'foundation models' and applies different requirements for them.
- 'foundation models' are essentially defined in the act as "very large scale and expensive generative AI models that will probably only be offered via API" (my words). The reason the act wants to regulate them is so that USERs of foundation models have a chance to make their downstream use case compliant if that use case is high-risk. For example, if I'm a health insurance provider and I'm using a chatbot enabled by GPT4 in my health insurance sign-up flow, then my system may be high-risk and needs to be compliant. I need access to some information about GPT4 (e.g. expected error modes, potential biases etc) to do that.
- The wording of the act makes a point of highlighting that your run-of-the-mill open source generative AI project will not constitute a 'foundation model'. The exact scale at which a project will become a regulated 'foundation model' is not yet clear, but it can be assumed that it will be at least tens of millions of dollars. If you can spend that much on compute and researchers, I think you can spend a few k on becoming compliant.
- The technomancers article confuses requirements for High-risk systems with those for foundation models. (It also gets some of the high-risk requirements completely wrong, but that's another discussion.)
- The Stanford HAI website does a great job with the facts! I really value seeing thoughtful contributions to the discussion like theirs. (Especially from an American institution!)
Did it?
But yes, if something is banned then it's probably bad for business and innovation around it. Europe is probably missing out on carcinogenic food additive innovations too. Okay, GMOs are not banned by the EU, but they are banned by some of the largest EU countries, and if those countries fell behind in GMO tech, maybe it's due to the local bans and not the EU regulations?
The EU isn't into banning AI though. Regulating doesn't mean banning. Consider the EU's Digital Markets Act: the EU didn't ban the AppStore but put a requirement on device manufacturers to allow other AppStores. When the EU regulated EV charging ports, it did not ban EVs, but put in place a requirement to have a unified plug.
But... have you seen the state of GDPR enforcement? Anyone who made an honest effort is fine. I don't know of any GDPR enforcement action where the indicted company wasn't blatantly and willfully violating or ignoring the law.
FWIW, everything I've seen from regulators and the legislators involved in the nitty-gritty of the act seems to suggest that most of them are really smart people who know what they don't know. They know that AI is quickly evolving and the draft of the law goes out of its way to _not_ be too specific about _how_ to comply. E.g., I would not expect the EU (or national regulators) to bring down 'one right way' to report energy consumption.
The fact that they COULD still bothers me.
Do you plan to integrate ChatGPT or Stable Diffusion into a high-risk system? I would advise against it at this time.
This doesn't necessarily measure every use, just the "manufacturer's spec", the same as you get with e.g. the energy class for household appliances (at least in the EU). Nobody goes around measuring refrigerator power usage, but when you're buying one, you get a rough indication of how "green" (or not) it is.
It wouldn't make sense to include measured idle time in the energy numbers you'd include in model documentation. Maybe that could go in a monthly report somewhere, but that's a different topic.
I was referring more to the users of such a system (what the AI Act would call a 'deployer'). They may have significantly less expertise but could still be required to track real-time energy use. Of course, simply referring to the 'energy label' by the provider could be a viable solution.