Perturbed Public Voices (P$^{2}$V): A Dataset for Robust Audio Deepfake Detection
Current audio deepfake detectors cannot be trusted. While they excel on controlled benchmarks, they fail when tested in the real world. We introduce Perturbed Public Voices (P$^{2}$V), an IRB-approved dataset capturing three critical aspects of malicious deepfakes: (1) identity-consistent transcripts via LLMs, (2) environmental and adversarial noise, and (3) state-of-the-art voice cloning (2020-2025). Experiments reveal alarming vulnerabilities in 22 recent audio deepfake detectors: models trained on current datasets lose 43% of their performance when tested on P$^{2}$V, with performance measured as the mean of the F1 score on deepfake audio, AUC, and 1-EER. Simple adversarial perturbations induce up to 16% performance degradation, while advanced cloning techniques reduce detectability by 20-30%. In contrast, P$^{2}$V-trained models maintain robustness against these attacks while generalizing to existing datasets, establishing a new benchmark for robust audio deepfake detection. P$^{2}$V will be publicly released upon acceptance at a conference or journal.
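To make the composite metric named in the abstract concrete, below is a minimal Python sketch (assuming scikit-learn) that averages the F1 score on the deepfake class, ROC AUC, and 1-EER. The function name, the fixed 0.5 decision threshold, and the EER estimation via the nearest ROC crossing are our illustrative assumptions, not the paper's released evaluation code.

```python
# Sketch of the composite detection score: mean of F1 (deepfake class),
# ROC AUC, and 1 - EER. Names and threshold are illustrative assumptions.
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score, roc_curve

def composite_detection_score(y_true, y_score, threshold=0.5):
    """y_true: 1 = deepfake, 0 = real; y_score: detector's deepfake probability."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    f1_fake = f1_score(y_true, y_pred, pos_label=1)  # F1 on the deepfake class
    auc = roc_auc_score(y_true, y_score)
    # EER: the operating point where the false-positive rate equals the
    # false-negative rate (1 - TPR); approximated by the nearest ROC crossing.
    fpr, tpr, _ = roc_curve(y_true, y_score)
    fnr = 1.0 - tpr
    eer = fpr[np.nanargmin(np.abs(fnr - fpr))]
    return (f1_fake + auc + (1.0 - eer)) / 3.0

# Usage with a hypothetical detector's scores on six clips:
print(composite_detection_score([1, 0, 1, 1, 0, 0],
                                [0.9, 0.2, 0.7, 0.4, 0.1, 0.6]))
```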
Chongyang Gao, Marco Postiglione, Isabel Gortner, Sarit Kraus, V. S. Subrahmanian
Chongyang Gao, Marco Postiglione, Isabel Gortner, Sarit Kraus, V. S. Subrahmanian. Perturbed Public Voices (P$^{2}$V): A Dataset for Robust Audio Deepfake Detection. arXiv:2508.10949, 13 Aug 2025. https://arxiv.org/abs/2508.10949