Handling Numeric Expressions in Automatic Speech Recognition

Source: arXiv
Abstract

This paper addresses the problem of correctly formatting numeric expressions in automatic speech recognition (ASR) transcripts. This is challenging because the expected transcript format depends on the context, e.g., 1945 (year) vs. 19:45 (timestamp). We compare cascaded and end-to-end approaches to recognizing and formatting numeric expressions such as years, timestamps, currency amounts, and quantities. For the end-to-end approach, we employ a data generation strategy that uses a large language model (LLM) together with a text-to-speech (TTS) model to generate adaptation data. The results on our test data set show that while LLM-based approaches perform well in recognizing formatted numeric expressions, adapted end-to-end models offer competitive performance with the advantage of lower latency and inference cost.
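
To make the context dependence concrete, the following is a minimal sketch of why the same spoken words need different written forms. It is a toy rule-based formatter written for illustration only, not the cascaded or end-to-end method evaluated in the paper; the function names and the `kind` label (assumed to come from some upstream context classifier) are hypothetical.

```python
# Illustrative toy only: NOT the paper's method. All names here are hypothetical.
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * (i + 2) for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split())}


def parse_two_digit(words):
    """Parse a spoken number below 100, e.g. ['forty', 'five'] -> 45."""
    if len(words) == 1 and words[0] in UNITS:
        return UNITS[words[0]]
    if len(words) == 1 and words[0] in TENS:
        return TENS[words[0]]
    if (len(words) == 2 and words[0] in TENS
            and 1 <= UNITS.get(words[1], 0) <= 9):
        return TENS[words[0]] + UNITS[words[1]]
    raise ValueError(f"cannot parse {words!r}")


def split_pair(spoken):
    """Split e.g. 'nineteen forty five' into the two numbers (19, 45)."""
    words = spoken.lower().split()
    for i in range(1, len(words)):
        try:
            return parse_two_digit(words[:i]), parse_two_digit(words[i:])
        except ValueError:
            continue
    raise ValueError(f"cannot split {spoken!r}")


def format_numeric(spoken, kind):
    """Format the spoken pair according to an assumed context label."""
    first, second = split_pair(spoken)
    if kind == "year":
        return f"{first}{second:02d}"
    if kind == "time":
        return f"{first}:{second:02d}"
    raise ValueError(f"unknown kind {kind!r}")


print(format_numeric("nineteen forty five", "year"))
print(format_numeric("nineteen forty five", "time"))
```

The hard part in practice is deciding the context label from the surrounding transcript, which is the step the approaches compared in the abstract are meant to handle; this sketch simply takes that label as given.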

Christian Huber, Alexander Waibel

Computing technology, computer technology

Christian Huber, Alexander Waibel. Handling Numeric Expressions in Automatic Speech Recognition [EB/OL]. (2025-06-23) [2025-07-16]. https://arxiv.org/abs/2408.00004.
