🔵 UNPOPULAR OPINION: the GDPR also applies when creating and training AI datasets - and most tech companies ignore it. This must change. Read this:
As @CNIL's infographic below shows, regardless of the data source, data protection law must be observed when creating a training dataset.
A reminder that Article 6 of the GDPR sets out the lawful bases for processing personal data:
- consent
- contract
- legal obligation
- vital interest
- public interest
- legitimate interest
Most AI companies developing large language models today rely on legitimate interest to scrape data from the web and train their models.
However, despite seeming like an "easy" option, legitimate interest has its own legal requirements, including the three-part test (purpose, necessity, balancing), transparency, data minimization, and storage limitation.
Most tech companies developing AI today don't comply with any of these (and I have not yet mentioned data subjects' rights and other data protection principles).
With generative AI and large language model-based capabilities being integrated quickly and ubiquitously into everyday applications, data protection law must be implemented and made effective (or the privacy rights and advances that took so much time and effort to achieve will be undermined).
Privacy matters, ALSO when AI is involved.
Join our 4-week Privacy & AI Bootcamp on January 31st and learn more about it.