Vo-Ve: An Explainable Voice-Vector for Speaker Identity Evaluation
In this paper, we propose Vo-Ve, a novel voice-vector embedding that captures speaker identity. Unlike conventional speaker embeddings, Vo-Ve is explainable, as it contains the probabilities of explicit voice attribute classes. Through extensive analysis, we demonstrate that Vo-Ve not only evaluates speaker similarity competitively with conventional techniques but also provides an interpretable explanation in terms of voice attributes. We strongly believe that Vo-Ve can enhance evaluation schemes across various speech tasks due to its high-level explainability.
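The abstract states that each entry of Vo-Ve is a probability for an explicit voice attribute class, which is what allows a similarity judgment to be traced back to specific attributes. The Python sketch below illustrates that idea under stated assumptions. The attribute names and classes in `ATTRIBUTES` are hypothetical. So is the choice of cosine similarity as the overall score and of per-attribute total variation distance as the "explanation". None of these is taken from the paper's actual taxonomy, model, or metric.

```python
import math

# HYPOTHETICAL attribute taxonomy for illustration only; the paper's real
# attributes and classes are not given in this abstract.
ATTRIBUTES = {
    "gender": ["female", "male"],
    "age": ["young", "middle", "old"],
    "pitch": ["low", "mid", "high"],
}


def build_voice_vector(attr_probs):
    """Concatenate per-attribute class probabilities into one flat vector."""
    vec = []
    for attr, classes in ATTRIBUTES.items():
        probs = attr_probs[attr]
        # Each attribute block must be a probability distribution over its classes.
        if len(probs) != len(classes) or abs(sum(probs) - 1.0) > 1e-6:
            raise ValueError(f"{attr}: expected a distribution over {len(classes)} classes")
        vec.extend(probs)
    return vec


def cosine_similarity(a, b):
    """Overall speaker-similarity score (assumed metric): cosine of the angle between vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def explain(attr_probs_a, attr_probs_b):
    """Per-attribute total variation distance (assumed): 0 = same distribution, 1 = disjoint."""
    return {
        attr: 0.5 * sum(abs(p - q) for p, q in zip(attr_probs_a[attr], attr_probs_b[attr]))
        for attr in ATTRIBUTES
    }


# Usage: two hypothetical voices that differ mainly in perceived gender.
speaker_a = {"gender": [0.9, 0.1], "age": [0.2, 0.7, 0.1], "pitch": [0.1, 0.3, 0.6]}
speaker_b = {"gender": [0.1, 0.9], "age": [0.2, 0.7, 0.1], "pitch": [0.1, 0.3, 0.6]}
va, vb = build_voice_vector(speaker_a), build_voice_vector(speaker_b)
score = cosine_similarity(va, vb)
reasons = explain(speaker_a, speaker_b)
print(f"similarity={score:.3f}", reasons)
```

Because each vector block maps to a named attribute, a lower score can be attributed to a specific attribute (here, `gender`). An opaque speaker embedding does not allow this kind of breakdown.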
Jaejun Lee, Kyogu Lee
Communications
Jaejun Lee, Kyogu Lee. Vo-Ve: An Explainable Voice-Vector for Speaker Identity Evaluation [EB/OL]. (2025-06-24) [2025-07-21]. https://arxiv.org/abs/2506.19446.