Tutorial: $φ$-Transductions in OpenFst via the Gallic Semiring

Source: arXiv
Abstract

OpenFst, a popular finite-state transducer library, supports $φ$-transitions but, due to an implementation constraint, they cannot be used with transducers in a straightforward way. In this short tutorial, we describe how one can use other functionality provided by OpenFst (namely, the Gallic semiring) to correctly implement $φ$-transductions, and we demonstrate the approach by implementing the MaxMatch (WordPiece) tokenization algorithm (Devlin et al., 2019; Song et al., 2021). Accompanying self-contained code examples are available at https://www.openfst.org/twiki/pub/Contrib/FstContrib/phi_transduction_tutorial_code.tgz
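
For readers unfamiliar with MaxMatch, the following is a minimal sketch of greedy longest-match-first tokenization in plain Python. It does not use OpenFst and is not the authors' code. The function name `max_match`, the toy vocabulary, the `##` continuation prefix, and the `[UNK]` fallback token are assumptions chosen for illustration, following the common WordPiece convention.

```python
def max_match(word, vocab, unk="[UNK]", cont="##"):
    """Greedy longest-match-first (MaxMatch) tokenization of a single word.

    Illustrative sketch only: `vocab` is a set of pieces, and pieces that do
    not start the word are assumed to carry the `cont` prefix.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking on each miss.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = cont + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No vocabulary piece matches here: the whole word becomes unknown.
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary, used only for this example.
vocab = {"un", "##aff", "##able", "##a", "##ff"}
print(max_match("unaffable", vocab))  # ['un', '##aff', '##able']
```

The inner loop's "shrink and retry" step is the fallback behaviour that a $φ$-transition (a failure arc taken only when no ordinary arc matches) is meant to encode in an automaton; how the tutorial realizes this with the Gallic semiring is described in the paper itself, not in this sketch.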

Marco Cognetta, Cyril Allauzen

Subject: Computing technology, computer technology

Marco Cognetta, Cyril Allauzen. Tutorial: $φ$-Transductions in OpenFst via the Gallic Semiring [EB/OL]. (2025-06-22) [2025-07-17]. https://arxiv.org/abs/2506.17942.