Over the past year, European AI startups have been catching up with their overseas competitors, introducing products comparable to the popular ChatGPT. With the focus on rapid development, issues of transparency, ethics, and user impact are sometimes put on the back burner. However, this is likely to change with the EU AI Act, whose enforcement is expected to begin in late 2024.
The EU AI Act classifies AI systems by risk level. Systems labeled as high-risk will need to meet transparency requirements, with mandatory assessments of potential impacts on public health, safety, human rights, and societal welfare. They will also be checked for bias to ensure they are non-discriminatory and respect fundamental human rights.
Additionally, developers of high-risk systems will be obliged to maintain detailed documentation, including training methods and datasets, to demonstrate compliance.
Foundation models comparable to GPT-3.5 will be regulated if they require at least 10²⁵ FLOPs (floating-point operations) of compute to train. Conversely, there will be significant concessions for open-source models, providing an incentive to develop this type of product.
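To get a feel for where that threshold sits, here is a rough back-of-the-envelope check in Python. It uses the common approximation that training a dense transformer costs roughly 6 × parameters × training tokens in FLOPs; the model figures below are hypothetical, not tied to any real product.

```python
# Rough check against the EU AI Act's 10^25 FLOPs threshold, using the
# common approximation: training compute ≈ 6 × parameters × training tokens.
THRESHOLD_FLOPS = 1e25

def estimated_training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6 * n_params * n_tokens

# Hypothetical example: a 70B-parameter model trained on 2T tokens.
flops = estimated_training_flops(70e9, 2e12)
print(f"Estimated compute: {flops:.2e} FLOPs")  # ~8.4e23, below the threshold
print("Above threshold" if flops > THRESHOLD_FLOPS else "Below threshold")
```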
Additionally, the EU AI Act lists prohibited AI practices. These include biometric categorization based on sensitive characteristics (e.g., race or religion), untargeted scraping of facial images, emotion recognition in the workplace and in education, social scoring, manipulation of human behavior, and systems designed to exploit human vulnerabilities.
The act also imposes sanctions for non-compliance, with penalties ranging from €7.5 million or 1.5% of a company's global annual turnover up to €35 million or 7% of turnover, depending on the violation and the company's size.
With the European AI Act likely to be enforced at the end of 2024, it is important to start preparing now, especially if your system could be classified as high-risk.
We recommend looking at every stage of system building, from data preparation through to in-depth system evaluation.
As outlined in the EU AI Act, companies will be responsible for keeping detailed records of their datasets. This will push companies to respect data privacy and improve traceability: if a system produces harmful content, the behavior can be traced back to inappropriate or biased texts in the data it was trained on.
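One lightweight way to keep such records is a structured "datasheet" per data source. The sketch below is illustrative; the fields are our suggestion, not a schema prescribed by the Act, and the example dataset is hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DatasetRecord:
    """Illustrative provenance record for one training data source."""
    name: str
    source_url: str
    license: str
    collected_on: date
    preprocessing_steps: list[str] = field(default_factory=list)
    known_limitations: list[str] = field(default_factory=list)

record = DatasetRecord(
    name="support-tickets-2023",          # hypothetical internal dataset
    source_url="internal://crm/exports",  # placeholder, not a real URI
    license="proprietary",
    collected_on=date(2023, 11, 1),
    preprocessing_steps=["PII removal", "deduplication", "language filtering"],
    known_limitations=["English only", "skewed toward enterprise customers"],
)
```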
This means training datasets should be reviewed carefully when preparing for the new rules. That could involve filtering and cleaning parts of the data used for training, or even building custom, domain-curated datasets purposely built to avoid the common biases present in scraped data.
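As a minimal sketch of what such filtering might look like, the snippet below redacts simple PII patterns and drops examples containing blocklisted terms. The regexes and blocklist are placeholders; a production pipeline would rely on vetted PII detectors and trained toxicity classifiers rather than hand-written rules.

```python
import re

# Placeholder patterns; stand-ins for proper PII detection tooling.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")
PHONE_RE = re.compile(r"\b\+?\d[\d\s().-]{7,}\d\b")
BLOCKLIST = {"slur1", "slur2"}  # stand-in for a curated term list

def clean_example(text: str) -> str | None:
    """Redact simple PII; drop the example entirely if blocklisted terms appear."""
    if any(term in text.lower() for term in BLOCKLIST):
        return None
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text

corpus = ["Contact me at jane@example.com", "A perfectly normal sentence."]
filtered = [t for t in (clean_example(x) for x in corpus) if t is not None]
```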
To comply with the new rules, companies building LLMs should invest in aligning their models with human expectations, focusing on truthfulness, helpfulness, and harmlessness. The main methods used for LLM alignment are reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO).
Both methods collect human preferences over model outputs and use this data to teach the model what the desired output should look like. Given the right examples, this stage can quite effectively stop the majority of harmful content generation.
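As an illustration, here is a minimal sketch of the DPO objective (following Rafailov et al.) in PyTorch. It assumes the per-response log-probabilities under the policy and a frozen reference model have already been computed; beta is a tunable temperature.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss: push the policy to prefer human-chosen responses over
    rejected ones, relative to a frozen reference model."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```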
Evaluation of AI systems will be a top priority and needs to become part of the product development cycle. A gut feeling that the model is good must be replaced by a meticulous, in-depth evaluation strategy.
Generative AI systems are particularly difficult to evaluate because their output is open-ended and non-deterministic: generated text can't be automatically compared against a single "correct" answer. Evaluating such systems involves human feedback on a variety of aspects, such as correctness, helpfulness, and harmlessness.
Often, systems need to be evaluated at a finer grain than the basic dimensions mentioned above. For example, when evaluating harmfulness, we could divide it into subcategories such as bias, hate speech, and racism. That way, we can discover at a granular level what needs to be fixed in the system to minimize its negative impact.
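A simple way to operationalize this is to aggregate human ratings per subcategory and flag the weak spots. In the sketch below, the category names, the 1-5 harmlessness scale, and the flagging threshold are all illustrative choices, not part of any standard.

```python
from collections import defaultdict
from statistics import mean

# Illustrative human ratings: (subcategory, score on a 1-5 harmlessness
# scale, where 5 = no issues). Real data would come from annotation tooling.
ratings = [
    ("bias", 4), ("bias", 2), ("hate_speech", 5),
    ("hate_speech", 5), ("racism", 3), ("bias", 3),
]

by_category: dict[str, list[int]] = defaultdict(list)
for category, score in ratings:
    by_category[category].append(score)

# Flag subcategories whose average falls below a (hypothetical) threshold.
THRESHOLD = 4.0
for category, scores in sorted(by_category.items()):
    avg = mean(scores)
    flag = "  <- needs attention" if avg < THRESHOLD else ""
    print(f"{category}: {avg:.2f} over {len(scores)} ratings{flag}")
```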
The EU AI Act is undoubtedly an important step in AI regulation, signifying a new era in which responsible AI development is no longer optional but legally enforced.
Is your company ready to comply with the new AI regulations?