Data governance as the foundation of trustworthy AI

The quality of artificial intelligence stands and falls with the quality of its data.

This seemingly banal insight (‘garbage in, garbage out’) has prompted European legislators to formulate comprehensive data governance requirements for high-risk AI systems in the AI Regulation. Article 10 of the AI Regulation establishes a legal framework that goes far beyond technical aspects and will confront providers of AI systems with new challenges.

From the technical to the legal dimension

AI systems are based on machine-learning models, which absorb the statistical properties of their training data during the training process. If these data sets contain gaps, errors or distortions, this is inevitably reflected, directly or indirectly, in the results – for example in incorrect predictions or discriminatory decisions.
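This mechanism can be illustrated with a deliberately simple sketch (all data, feature names and labels are hypothetical): a model that merely memorises the label statistics of its training data will faithfully reproduce whatever distortion those data contain.

```python
from collections import Counter

# Hypothetical, skewed training set: group "B" only ever appears
# with the label "reject" – a distortion in the data itself.
training_data = [("A", "approve")] * 90 + [("B", "reject")] * 10

def train(data):
    """A naive 'model' that memorises the most common label per feature."""
    by_feature = {}
    for feature, label in data:
        by_feature.setdefault(feature, Counter())[label] += 1
    return {f: counts.most_common(1)[0][0] for f, counts in by_feature.items()}

model = train(training_data)
# The model reproduces the skew of its training data: group "B"
# is always rejected, because that is all the data ever showed it.
print(model)  # → {'A': 'approve', 'B': 'reject'}
```

Real machine-learning models generalise rather than memorise, but the underlying point is the same: the statistical patterns of the training data, including their flaws, determine the behaviour of the system.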

Practical compliance requirements

Art. 10 AI Regulation translates this technical reality into specific legal obligations: where high-risk AI systems are trained with data, they must be developed on the basis of training, validation and test data sets that meet the quality requirements of Art. 10 (2) to (5) AI Regulation. This includes, in particular, assessing the data sets for availability, quantity and suitability, and examining them for relevant data gaps or other deficiencies.

In addition, the data sets must be sufficiently representative and, to the best extent possible, accurate and complete. It must also be ensured that the data sets correspond to the geographical, contextual, behavioural or functional conditions under which the high-risk AI system is intended to be used.
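In practice, such a representativeness assessment can start with a simple distribution comparison. The following sketch is purely illustrative (the attribute, the shares and the 5% tolerance are assumptions, not taken from the Regulation): it compares the share of each subgroup in a data set against a reference population and flags deviations above a tolerance.

```python
def representativeness_gaps(dataset_shares, population_shares, tolerance=0.05):
    """Flag subgroups whose share in the data set deviates from the
    reference population by more than `tolerance` (illustrative check)."""
    gaps = {}
    for group, expected in population_shares.items():
        observed = dataset_shares.get(group, 0.0)
        if abs(observed - expected) > tolerance:
            gaps[group] = round(observed - expected, 3)
    return gaps

# Hypothetical age distribution of a training set vs. the intended
# user population: younger users are over-represented, older ones missing.
dataset = {"18-30": 0.52, "31-50": 0.38, "51+": 0.10}
population = {"18-30": 0.30, "31-50": 0.40, "51+": 0.30}
print(representativeness_gaps(dataset, population))
# → {'18-30': 0.22, '51+': -0.2}
```

A check of this kind does not by itself satisfy Art. 10, but documenting such comparisons is one building block of the evidence a provider will need.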

Furthermore, providers must document their decision-making basis for data selection in a comprehensible manner, provide complete evidence of the data collection procedures and the origin of the data, and make all data preparation processes – such as annotation, cleansing or aggregation – transparent. Of particular importance is the obligation to carry out systematic bias analysis, especially when data outputs are used as inputs for downstream processes, which can lead to feedback effects.

Consequences for business practice

The forthcoming (legal) requirements for data governance make it clear that deploying AI is not only a technical task but, to a large extent, a legal and organisational one. Early investment in robust data management processes and continuous documentation is therefore essential – both to fulfil regulatory obligations and to create a trustworthy basis for the use of AI.

Our KWR Data Protection Team will be happy to support you in establishing data governance as an integral part of your AI compliance, thereby paving the way for legally compliant, high-performance and responsible AI systems.

Your contact