Since May 27, 2025, Meta has been training its artificial intelligence models, including Meta AI, on the public data of European users of its social networks Facebook, Instagram and Threads. This processing is carried out without prior consent, although users may object to it via an opt-out mechanism.
In a press release, Meta justifies this processing of personal data by its legitimate interest, in particular to allow its generative AI models to "understand the incredible and diverse nuances and complexities that make up European communities" and the "distinct ways in which different countries use humor and sarcasm on our products".
This announcement led several organizations, including the NGO NOYB, to file complaints with 11 personal data protection authorities, including the DPC in Ireland. Their main argument is that Meta should have set up an opt-in mechanism to obtain individuals' consent, rather than a mere opt-out.
Can legitimate interest, under the GDPR, be sufficient to justify training an AI on personal data without prior consent?
The issue was examined in depth by the European Data Protection Board (EDPB) in an opinion delivered on December 17, 2024. The CNIL then clarified this position by updating its practical fact sheet on the use of legitimate interest as a legal basis for artificial intelligence systems. Together, these documents provide a useful analytical framework for understanding the conditions under which such processing is lawful.
Before relying on this legal basis to train an AI using personal data, the CNIL and the EDPB recommend carrying out a three-step analysis and documenting it internally.
The first step is to identify a legitimate interest, that is to say an interest that is real, lawful, clearly defined and sufficiently concrete. The challenge here is not to invoke a hypothetical intention, but to demonstrate a specific purpose connected with the company's activity. For example, developing a conversational AI or improving an existing service may be admissible objectives. Conversely, wanting to train an AI for prohibited purposes (profiling minors, automated surveillance, etc.) is incompatible from the outset with the legal basis of legitimate interest.
The second step is to assess the necessity of the proposed processing. It is not enough for the processing to be useful: there must be no less intrusive alternative capable of achieving the same result. This requires genuine reflection: is it possible to limit the data used? To do without identifying data? To rely on synthetic or aggregated data? When it comes to AI, this step is often overlooked, even though it is crucial.
Finally, the third step is the most delicate: it involves balancing the interest pursued by the data controller against the rights and freedoms of the persons concerned. This is where the supervisory authorities require the risks to be weighed, asking in particular about the volume of data used, its sensitivity, the number of persons concerned, the risks to privacy and reputation, the opacity of the processing, and so on.
Note that the greater the collective benefit of the processing (e.g. improved care, useful innovation), the easier it is to justify. Conversely, if the purpose of the processing is purely commercial, reinforced measures will be needed to protect the persons concerned.
At this stage, it is also necessary to consider the following aspects:
Warning: Contrary to what Meta claims, the EDPB opinion is general in scope and in no way confirms that Meta's reasoning and approach comply with the GDPR. The test still has to be actually applied to Meta's processing.
1. Legitimate interest identified: Meta invokes a clear objective: training its AI models to offer conversational features adapted to European cultural and linguistic specificities. This is indeed a real, lawful and sufficiently determined interest.
2. Necessity of the processing: The question arises, however, whether less intrusive means could have been used. According to NOYB, an AI model can be trained on openly accessible online sources, without using personal data from social networks. At this stage, the demonstration of necessity therefore appears less well-founded.
3. Balancing of interests: This is undoubtedly the most delicate step. The risks for the persons concerned are real: invasion of privacy, reputational harm, loss of control over their data. Do users reasonably expect their posts, sometimes ephemeral or private, to be reused for AI training purposes? That is doubtful, even where the profiles are "public."
Meta highlights the following guarantees:
However, several complementary measures could have strengthened the compliance and transparency of the processing, in particular:
All these elements show that, even if the interest pursued by Meta is legitimate, the balance with the rights of individuals remains fragile and could be contested.
On May 17, NOYB sent Meta a formal cease-and-desist letter demanding that it immediately stop using European users' personal data to train its AIs. NOYB denounces a lack of information and fears that users' rights will be compromised once their data is integrated into the models. A Europe-wide collective action is being considered, with damages potentially running into the billions of euros.
This emblematic case could well serve as a template for future large-scale AI projects. In the coming months, it will illustrate both the red lines that must not be crossed and those that have yet to be drawn.
***
To find out more about the obligations associated with launching an artificial intelligence model and the requirements of the GDPR, we invite you to consult our article "Key steps to bring your tools into AI & GDPR compliance", available on our blog.
The firm's IT/Data team can help you secure your practices and integrate these requirements into your projects. Contact us today to anticipate regulatory changes and bring your tools into compliance.
Clémentine Beaussier, partner lawyer at Squair