Handling Numeric Expressions in Automatic Speech Recognition

Abstract

This paper addresses the problem of correctly formatting numeric expressions in automatic speech recognition (ASR) transcripts. This is challenging because the expected transcript format depends on the context, e.g., 1945 (year) vs. 19:45 (timestamp). We compare cascaded and end-to-end approaches to recognizing and formatting numeric expressions such as years, timestamps, currency amounts, and quantities. For the end-to-end approach, we employ a data generation strategy that uses a large language model (LLM) together with a text-to-speech (TTS) model to generate adaptation data. The results on our test data set show that while LLM-based approaches perform well in recognizing formatted numeric expressions, adapted end-to-end models offer competitive performance with the advantage of lower latency and inference cost.
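
The context dependence mentioned in the abstract (1945 vs. 19:45) can be made concrete with a toy sketch. This is not the method from the paper: the function names and the `kind` label are hypothetical, and the label is passed in by hand here, whereas a cascaded or end-to-end system would have to infer it from the surrounding speech.

```python
# Toy illustration only: the same spoken words map to different written forms
# depending on an externally supplied context label (`kind`, an assumption).
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}


def two_digit_groups(words):
    """Turn spoken words into two-digit values, e.g. nineteen forty five -> [19, 45]."""
    groups = []
    i = 0
    while i < len(words):
        word = words[i]
        if word in TENS:
            value = TENS[word]
            # A following unit 1-9 joins the tens word: "forty five" -> 45.
            if i + 1 < len(words) and 1 <= UNITS.get(words[i + 1], 0) <= 9:
                value += UNITS[words[i + 1]]
                i += 1
            groups.append(value)
        elif word in UNITS:
            groups.append(UNITS[word])
        else:
            raise ValueError(f"unsupported word: {word!r}")
        i += 1
    return groups


def format_numeric(spoken, kind):
    """Format a two-group spoken number as a year or a timestamp."""
    groups = two_digit_groups(spoken.lower().split())
    if len(groups) != 2:
        raise ValueError("this sketch only handles two-group expressions")
    first, second = groups
    if kind == "year":
        return f"{first}{second:02d}"
    if kind == "time":
        return f"{first:02d}:{second:02d}"
    raise ValueError(f"unknown kind: {kind!r}")


print(format_numeric("nineteen forty five", "year"))  # 1945
print(format_numeric("nineteen forty five", "time"))  # 19:45
```

The two calls take identical word sequences and differ only in the label, which is the ambiguity the compared approaches have to resolve from context.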

Authors: Christian Huber, Alexander Waibel

Subject classification: Computing technology, computer technology

Recommended citation: Christian Huber, Alexander Waibel. Handling Numeric Expressions in Automatic Speech Recognition [EB/OL]. (2025-06-23) [2025-07-16]. https://arxiv.org/abs/2408.00004.
