GPT-4 Lost This Battle 449 to 28
After GDPR, Europe’s push for Safe and Transparent AI will change the LLM landscape significantly.

We have enjoyed foundation models being developed left and right in the last 2–3 years by various companies, from new kids on the block, OpenAI, and Cohere, to the giants like Google and Meta. These models have shown remarkable capability in transforming society. The applied use cases of these models are on the rise, more fine-tuned models are being developed, and deployment of these models is on the rise too.
To quote from a Stanford Human-Centered AI blogpost -
Anthropic’s Claude now powers Notion AI and the search engine DuckDuckGo, while OpenAI’s GPT-4 underpins offerings at Morgan Stanley, Khan Academy, Duolingo, and Stripe. OpenAI’s release details ongoing efforts with the government of Iceland for language preservation. Meanwhile, the online Q&A forum Quora released Poe, a chatbot service that offers both Anthropic and OpenAI models on the backend. Finally, both Google and Microsoft put out plans to sweepingly deploy foundation models in many of their iconic products from Google Slides to Microsoft Word.
As the understanding of AI systems and their data collection and processing methods widens, concerns are increasing over transparency. Right from the data used, to the architecture (GPT-4) hiding the details on model size, hardware, time and compute needed, training methods or any of the technical specifics.
Another trend has developed and there is a massive influx of funding. Let’s start with Microsoft’s 10B investment in so-called-opensource OpenAI; Google investing $300M in Anthropic, Adept AI, Character AI, and Salesforce AI dropping a quarter of a billion dollars each on their AI ambitions, has created a gold rush.
All this attention from the financial industry, interest from the general public, and a sense of urgency in catching up to it has pushed legislative entities to get their act together. Finally, the legislature feels like they have heard the demand for policy to define certain parameters through which we can 1. Evaluate the models in a standard way 2. Protect the privacy of citizens 3. Avoid misuse of technology. To put it bluntly, shit got really so fast that the concerns outweighed the enthusiasm.
Enter European Parliament
Also known as the party pooper by tech billionaires and nerds for being a pain in the butt and sticking for human rights. Last month, the EU parliament passed a motion that sets the rules that the tech companies in the AI space must follow. Currently, these rules are more catered toward LLM-related concerns. Based on the motion, Europe fully aligns with EU rights and values, including human oversight, safety, privacy, transparency, non-discrimination, and social and environmental well-being.
They have prohibited certain AI practices: (Source)
- “Real-time” remote biometric identification systems in publicly accessible spaces;
- “Post” remote biometric identification systems, with the only exception of law enforcement for the prosecution of serious crimes and only after judicial authorization;
- biometric categorization systems using sensitive characteristics (e.g., gender, race, ethnicity, citizenship status, religion, political orientation);
- predictive policing systems (based on profiling, location, or past criminal behavior);
- emotion recognition systems in law enforcement, border management, the workplace, and educational institutions; and
- untargeted scraping of facial images from the internet or CCTV footage to create facial recognition databases (violating human rights and the right to privacy).
Obligations for General purpose AI (ChatGPT and other LLMs)
In that motion, sweeping actions to regulate large language model development space were made. These actions were targeted to assess and mitigate possible risks to health, safety, fundamental rights, democracy, and the rule of law.
Do LLMs Comply with EU regulations?
Our friends from Stanford studied currently available General purpose AI LLMs and evaluated those against the EU regulation, and boy! The results are not looking pretty for some of the models.
❤ Long but important section ahead: Please read on! ❤
Authors: Rishi Bommasani and Kevin Klyman and Daniel Zhang and Percy Liang extracted 22 requirements towards foundation models. These 22 requirements were used to assess each model and its documentation.
The 22 requirements can be found here :
It is long, but please take some time to read at least the headers as these are important if you are ever planning to build anything using the LLMs. The following requirements are as quoted in the paper —
- Registry [Article 39, item 69, page 8, as well as Article 28b, paragraph 2g, page 40]. In order to facilitate the work of the Commission and the Member States in the artificial intelligence field as well as to increase the transparency towards the public, providers of high-risk AI systems other than those related to products falling within the scope of relevant existing Union harmonization legislation should be required to register their high-risk AI system and foundation models in an EU database, to be established and managed by the Commission. This database should be freely and publicly accessible, easily understandable, and machine-readable. The database should also be user-friendly and easily navigable, with search functionalities at a minimum, allowing the general public to search the database for specific high-risk systems, locations, categories of risk under Annex IV, and keywords. Deployers who are public authorities or European Union institutions, bodies, offices and agencies or deployers
- Provider name [Annex VIII, Section C, page 24]. Name, address and contact details of the provider.
- Model name [Annex VIII, Section C, page 24]. Trade name and any additional unambiguous reference allowing the identification of the foundation model.
- Data sources [Annex VIII, Section C, page 24]. Description of the data sources used in the development of the foundation model.
- Capabilities and limitations [Annex VIII, Section C, page 24]. Description of the capabilities and limitations of the foundation model.
- Risks and mitigations [Annex VIII, Section C, page 24 and Article 28b, paragraph 2a, page 39]. The reasonably foreseeable risks and the measures that have been taken to mitigate them as well as remaining non-mitigated risks with an explanation of the reason why they cannot be mitigated.
- Compute [Annex VIII, Section C, page 24]. Description of the training resources used by the foundation model, including computing power required, training time, and other relevant information related to the size and power of the model.
- Evaluations [Annex VIII, Section C, page 24 as well as Article 28b, paragraph 2c, page 39]. Description of the model’s performance, including on public benchmarks or state-of-the-art industry benchmarks.
- Testing [Annex VIII, Section C, page 24 as well as Article 28b, paragraph 2c, page 39]. Description of the results of relevant internal and external testing and optimization of the model.
- Member states [Annex VIII, Section C, page 24]. Member States in which the foundation model is or has been placed on the market, put into service, or made available in the Union.
- Downstream documentation [Annex VIII, 60g, page 29 as well as Article 28b, paragraph 2e, page 40]. Also, foundation models should have information obligations and prepare all necessary technical documentation for potential downstream providers to be able to comply with their obligations under this Regulation.
- Machine-generated content [Annex VIII, 60g, page 29]. Generative foundation models should ensure transparency about the fact the content is generated by an AI system, not by humans.
- Pre-market compliance [Article 28b, paragraph 1, page 39]. A provider of a foundation model shall, prior to making it available on the market or putting it into service, ensure that it is compliant with the requirements set out in this Article, regardless of whether it is provided as a standalone model or embedded in an AI system or a product, or provided under free and open source licenses, as a service, as well as other distribution channels.
- Data governance [Article 28b, paragraph 2b, page 39]. Process and incorporate only datasets that are subject to appropriate data governance measures for foundation models, in particular, measures to examine the suitability of the data sources and possible biases and appropriate mitigation.
- Energy [Article 28b, paragraph 2d, page 40]. Design and develop the foundation model, making use of applicable standards to reduce energy use, resource use and waste, as well as to increase energy efficiency and the overall efficiency of the system. This shall be without prejudice to relevant existing Union and national law, and this obligation shall not apply before the standards referred to in Article 40 are published. They shall be designed with capabilities enabling the measurement and logging of the consumption of energy and resources and, where technically feasible, another environmental impact the deployment and use of the systems may have over their entire lifecycle.
- Quality management [Article 28b, paragraph 2f, page 40]. Establish a quality management system to ensure and document compliance with this Article, with the possibility to experiment in fulfilling this requirement.
- Upkeep [Article 28b, paragraph 3, page 40]. Providers of foundation models shall, for a period ending ten years after their foundation models have been placed on the market or put into service, keep the technical documentation referred to in paragraph 1(c) at the disposal of the national competent authorities.
- Law-abiding generated content [Article 28b, paragraph 4b, page 40]. Train, and, where applicable, design and develop the foundation model in such a way as to ensure adequate safeguards against the generation of content in breach of Union law in line with the generally acknowledged state of the art and without prejudice to fundamental rights, including the freedom of expression.
- Training on copyrighted data [Article 28b, paragraph 4c, page 40]. Without prejudice to national or Union legislation on copyright, document and make publicly available a sufficiently detailed summary of the use of training data protected under copyright law.
- Adherence to general principles [Article 4a, paragraph 1, page 142–3]. All operators falling under this Regulation shall do their best to develop and use AI systems or foundation models in accordance with the following general principles establishing a high-level framework that promotes a coherent humancentric European approach to ethical and trustworthy Artificial Intelligence, which is fully in line with the Charter as well as the values on which the Union is founded: a) ‘human agency and oversight’ means that AI systems shall be developed and used as a tool that serves people, respects human dignity and personal autonomy, and that is functioning in a way that can be appropriately controlled and overseen by humans. b) ‘technical robustness and safety means that AI systems shall be developed and used in a way to minimize unintended and unexpected harm as well as be robust in case of unintended problems and be resilient against attempts to alter the use or performance of the AI system so as to allow unlawful use by malicious third parties. c) ‘ Privacy and data governance’ means that AI systems shall be developed and used in compliance with existing privacy and data protection rules while processing data that meets high standards in terms of quality and integrity. d) ‘transparency’ means that AI systems shall be developed and used in a way that allows appropriate traceability and explainability while making humans aware that they communicate or interact with an AI system as well as duly informing users of the capabilities and limitations of that AI system and affected persons about their rights. e) ‘diversity, non-discrimination and fairness’ means that AI systems shall be developed and used in a way that includes diverse actors and promotes equal access, gender equality, and cultural diversity while avoiding discriminatory impacts and unfair biases that are prohibited by Union or national law. f) ‘social and environmental well-being’ means that AI systems shall be developed and used in a sustainable and environmentally friendly manner as well as in a way to benefit all human beings while monitoring and assessing the long-term impacts on the individual, society, and democracy. For foundation models, the general principles are translated into and complied with by providers by means of the requirements set out in Articles 28 to 28b.
- The system is designed so users know it's an AI [Article 52(1) Paragraph 1 — not in the Compromise text, but invoked in 28(b), paragraph 4a, page 40]. Providers shall ensure that AI systems intended to interact with natural persons are designed and developed in such a way that natural persons are informed that they are interacting with an AI system unless this is obvious from the circumstances and the context of use. This obligation shall not apply to AI systems authorized by law to detect, prevent, investigate, and prosecute criminal offenses unless those systems are available for the public to report a criminal offense.
- Appropriate levels [Article 28b, paragraph 2c, page 39]. design and develop the foundation model in order to achieve throughout its lifecycle appropriate levels of performance, predictability, interpretability, corrigibility, safety, and cybersecurity assessed through appropriate methods such as model evaluation with the involvement of independent experts, documented analysis, and extensive testing during conceptualization, design, and development
12 of the 22 requirements were further assessed and graded

How is it looking like, Chief?
I will not lie; the results are not surprising, but certain themes are !! Out of the 12 requirements that relate to the data and technical aspects of the implementation, 4 core categories develop which are important for the EU regulation.
- Category 1: Data (includes sources, governance, and specific attention to the Copyrights)
- Category 2: Compute (includes declaration about compute needed to replicate/train and Energy consumption during training as well as the inference)
- Category 3: The Model Liabilities (includes Capabilities and limitations, risk mitigation plan and strategy, Evaluations, and Testing)
- Category 4: Deployment (includes clarity on the requirements for the disclosure of the machine-generated content, disclosure to the EU member states when a model is being deployed and made available to EU, and requirements about the provision of technical compliance for the EU AI act.

The figure above shows the final scores. One could easily lump models with 24+ scores out of 48 (a passing grade — OpenAI, Google PaLM, Bloom, and the elueutherAI.
Based on the scores above, I am worried about Category 1 and 3 of the 12/22 requirements. Most of the models have scored abysmally low based on Category 1. Given that most of the data was collected from internet scraping, a sizable chunk of it can be assumed to be generated from copyrighted material (explicit or implicit). It could be considered as even private data (if not copyrighted). Does the Category 1 language define the “fair use of internet data”? No. Is the regulation clear? “yes, as mud!”.
Another big issue is category 3, which is the risk mitigation attempt or plan. This includes an evaluation of the model. In its truly responsible manner, Meta has scored almost zero on their releases of the models. I sincerely think that they have not given 1% of the thought as much Google and OpenAI have given to the safety of their models. This is clearly visible from the scores.
For both category 1 and 3 an essential spice in the recipe is currently missing. A standard way to evaluate these models for both performance and safety. There is a lot of new literature covering this topic and I am hopeful that we can come up with a rubric to test these models in a more standard way.
Energy utilization and downstream technical documentation are the least of my worries and we should not sweat about it in IMHO. GPT-NeoX by ElutherAi and Bloom both get a crown in terms of being open and transparent about their data and disclosures.
No, we will not disclose vs regulation.
Both Google, in their PaLM paper and OpenAI/Microsoft in their release of GPT-4, clearly stated that they have no intent to disclose further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar. Although these requirements are really not hampering the safety of the model, these show the pressure on these companies to keep their developments secret.
The inherent push to be secret due to the competition is more harmful than the specific actions not to disclose technical details or secrete sauce recipes. If these companies are allowed to develop a system like patents using which they could keep their financial interests intact for X years, it will help to make these technologies more transparent without putting undue pressure on the source of innovation.
Self-reporting has never worked. We need regulation, innovation, and the people to play together, which means we must have a strict but fair approach to regulation. I am glad to see the stricter side of regulation. I am sure this is just the pendulum swinging on the side of over-regulation after a long period of under-regulation. Transparency is the key to the trust of the end-user. Let us tread responsibly.
If you have read it until this point — Thank you! You are a hero (and a Nerd ❤)! I try to keep my readers up to date with “interesting happenings in the AI world,” so please 🔔 clap | follow | Subscribe 🔔
Become a member using the referral: https://ithinkbot.com/membership
Find me on Linkedin https://www.linkedin.com/in/mandarkarhade/

