Tutorial: $\phi$-Transductions in OpenFst via the Gallic Semiring
OpenFst, a popular finite-state transducer library, supports $\phi$-transitions but, due to an implementation constraint, they cannot be used with transducers in a straightforward way. In this short tutorial, we describe how one can use other functionality provided by OpenFst (namely, the Gallic semiring) to correctly implement $\phi$-transductions and demonstrate it by implementing the MaxMatch (WordPiece) tokenization algorithm (Devlin et al., 2019; Song et al., 2021). Accompanying self-contained code examples are provided: https://www.openfst.org/twiki/pub/Contrib/FstContrib/phi_transduction_tutorial_code.tgz
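For readers unfamiliar with MaxMatch, the greedy longest-match-first rule it refers to can be sketched in plain Python. This is only an illustration of the tokenization behavior, not the $\phi$-transduction construction from the tutorial and not OpenFst code. The function name `max_match`, the `##` continuation prefix, the `[UNK]` fallback token, and the toy vocabulary are all assumptions chosen to mirror common WordPiece conventions.

```python
def max_match(word, vocab, unk="[UNK]", cont="##"):
    """Greedy longest-match-first tokenization of a single word.

    Hypothetical helper for illustration: at each position, take the
    longest vocabulary entry that matches; non-initial pieces carry the
    continuation prefix. If no piece matches at some position, the whole
    word maps to the unknown token.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = cont + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary (assumed, for demonstration only).
toy_vocab = {"un", "aff", "##aff", "##a", "##able"}
tokens = max_match("unaffable", toy_vocab)
unknown = max_match("xyz", toy_vocab)
```

The fall-back step in the inner loop, where the candidate is shortened after a failed lookup, is the behavior that a failure ($\phi$) transition is typically used to encode in an automaton-based formulation.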
Marco Cognetta, Cyril Allauzen
Computing technology, computer technology
Marco Cognetta, Cyril Allauzen. Tutorial: $\phi$-Transductions in OpenFst via the Gallic Semiring [EB/OL]. (2025-06-22) [2025-07-17]. https://arxiv.org/abs/2506.17942.