Title Predicting the Out-of-Vocabulary Rate and the Required Vocabulary Size for Speech Processing Applications
Authors Johannes Müller, Holger Stahl, Manfred Lang
Type Scientific Conference Paper
Abstract This paper describes an approach for predicting both the vocabulary size and the resulting out-of-vocabulary rate (OOV-rate) for a hypothetical extension of an existing text corpus. The original corpus is split into two sub-corpora, and the vocabulary and OOV-rate are determined for that particular split. Average values are calculated over all combinations of sub-corpora and can be approximated by analytic functions. These functions make it easy to predict the vocabulary size and the OOV-rate, with a relative prediction error below 4.6%.
Keywords: out-of-vocabulary rate, OOV-rate, vocabulary size, text corpus, test corpus, training corpus
Reference Proceedings ICSLP 96 (Philadelphia, USA, 1996), pp. 658-661
Year 1996
Language English
Download Scientific Conference Paper as PDF file (48 kByte)
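
The splitting procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `split_stats` and `average_stats`, the sentence-level splitting, and the toy corpus are all hypothetical. It also samples random splits instead of enumerating all combinations of sub-corpora, which is an assumption made to keep the example small. The analytic functions used for the final approximation are not given in the abstract, so that step is left out.

```python
import random
from statistics import mean


def split_stats(sentences, train_idx):
    """Vocabulary size and OOV rate for one train/test split (hypothetical helper)."""
    train_set = set(train_idx)
    # Vocabulary: every distinct word seen in the training sub-corpus.
    vocab = {w for i in train_set for w in sentences[i]}
    # Test sub-corpus: all tokens of the remaining sentences.
    test_tokens = [w for i, s in enumerate(sentences) if i not in train_set for w in s]
    if not test_tokens:
        return len(vocab), 0.0
    oov = sum(1 for w in test_tokens if w not in vocab)
    return len(vocab), oov / len(test_tokens)


def average_stats(sentences, train_size, n_splits=200, seed=0):
    """Average both values over randomly sampled splits (assumption: sampling,
    not the full enumeration of all sub-corpus combinations)."""
    rng = random.Random(seed)
    results = [
        split_stats(sentences, rng.sample(range(len(sentences)), train_size))
        for _ in range(n_splits)
    ]
    return mean(v for v, _ in results), mean(r for _, r in results)


# Toy corpus, invented for illustration only.
corpus = [
    "show me flights to boston".split(),
    "show me flights to denver".split(),
    "list fares to boston".split(),
    "list flights from denver".split(),
]

for k in range(1, len(corpus)):
    avg_vocab, avg_oov = average_stats(corpus, k)
    print(k, round(avg_vocab, 2), round(avg_oov, 3))
```

The averaged points, one per training size, are the kind of data to which a curve could then be fitted in order to extrapolate beyond the size of the existing corpus.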

Created on 21.02.2007, last modified on 09.01.2008