MMMOS: Multi-domain Multi-axis Audio Quality Assessment
Accurate audio quality estimation is essential for developing and evaluating audio generation, retrieval, and enhancement systems. Existing non-intrusive assessment models predict a single Mean Opinion Score (MOS) for speech, which merges diverse perceptual factors and fails to generalize beyond speech. We propose MMMOS, a no-reference, multi-domain audio quality assessment system that estimates four orthogonal axes (Production Quality, Production Complexity, Content Enjoyment, and Content Usefulness) across speech, music, and environmental sounds. MMMOS fuses frame-level embeddings from three pretrained encoders (WavLM, MuQ, and M2D) and evaluates three aggregation strategies with four loss functions. By ensembling the top eight models, MMMOS achieves a 20-30% reduction in mean squared error and a 4-5% increase in Kendall's τ over the baseline, ranks first on six of eight Production Complexity metrics, and places among the top three on 17 of 32 challenge metrics.
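The reported gains are measured in mean squared error (MSE) and Kendall's τ after ensembling eight models. The sketch below illustrates those two metrics on a toy ensemble. It assumes the ensemble simply averages per-clip predictions; the abstract does not say how the eight models are combined, and the helper names and scores are hypothetical, not results from the paper. For brevity it computes Kendall's τ-a; `scipy.stats.kendalltau` computes τ-b by default, which additionally corrects for ties.

```python
from itertools import combinations


def ensemble_mean(predictions):
    """Average per-clip scores across models.

    predictions: one list of scores per model, all the same length.
    Averaging is an assumed ensembling rule, not necessarily the one MMMOS uses.
    """
    n_models = len(predictions)
    return [sum(clip_scores) / n_models for clip_scores in zip(*predictions)]


def mse(pred, target):
    """Mean squared error between predicted and reference scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) // 2)


if __name__ == "__main__":
    # Hypothetical scores on one axis (e.g. Production Quality) for four clips
    # from three hypothetical models; these are not data from the paper.
    predictions = [
        [3.0, 4.0, 2.0, 5.0],
        [3.5, 4.5, 2.5, 4.5],
        [2.5, 3.5, 1.5, 5.5],
    ]
    targets = [3.2, 3.8, 2.1, 4.9]

    ensemble = ensemble_mean(predictions)
    print("ensemble:", ensemble)
    print("MSE:", round(mse(ensemble, targets), 4))
    print("Kendall tau-a:", kendall_tau_a(ensemble, targets))
```

In a multi-axis setting such as this one, the same metrics would be computed separately for each of the four axes.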
Yi-Cheng Lin, Jia-Hung Chen, Hung-yi Lee
Electronic Technology Applications
Yi-Cheng Lin, Jia-Hung Chen, Hung-yi Lee. MMMOS: Multi-domain Multi-axis Audio Quality Assessment [EB/OL]. (2025-07-05) [2025-07-16]. https://arxiv.org/abs/2507.04094.