Rethinking Data Protection in the (Generative) Artificial Intelligence Era
The (generative) artificial intelligence (AI) era has profoundly reshaped the meaning and value of data. No longer confined to static content, data now permeates every stage of the AI lifecycle, from the training samples that shape model parameters to the prompts and outputs that drive real-world model deployment. This shift renders traditional notions of data protection insufficient, while the boundaries of what needs safeguarding remain poorly defined. Failing to safeguard data in AI systems can inflict societal and individual harms, underscoring the urgent need to clearly delineate the scope of data protection and rigorously enforce it. In this perspective, we propose a four-level taxonomy, comprising non-usability, privacy preservation, traceability, and deletability, that captures the diverse protection needs arising in modern (generative) AI models and systems. Our framework offers a structured understanding of the trade-offs between data utility and control, spanning the entire AI pipeline, including training datasets, model weights, system prompts, and AI-generated content. We analyze representative technical approaches at each level and reveal regulatory blind spots that leave critical assets exposed. By offering a structured lens to align future AI technologies and governance with trustworthy data practices, we underscore the urgency of rethinking data protection for modern AI techniques and provide timely guidance for developers, researchers, and regulators alike.
Yiming Li, Shuo Shao, Yu He, Junfeng Guo, Tianwei Zhang, Zhan Qin, Pin-Yu Chen, Michael Backes, Philip Torr, Dacheng Tao, Kui Ren
Subject areas: Computing Technology; Computer Technology
Yiming Li, Shuo Shao, Yu He, Junfeng Guo, Tianwei Zhang, Zhan Qin, Pin-Yu Chen, Michael Backes, Philip Torr, Dacheng Tao, Kui Ren. Rethinking Data Protection in the (Generative) Artificial Intelligence Era [EB/OL]. (2025-07-03) [2025-07-21]. https://arxiv.org/abs/2507.03034.