Modeling German Word Order Acquisition via Bayesian Inference

Annika Heuser and Polina Tsvilodub, 2021

Published in Proceedings of the Society for Computation in Linguistics 2021

Abstract: Perfors et al. (2011) introduced a Bayesian model selection system as a model of child language acquisition. Using English child language input, they demonstrated that, without any prior bias, such a system would prefer a probabilistic context-free grammar (PCFG) that uses hierarchical structure over a regular grammar representing linear phrase structure. However, the system of Perfors et al. (2011) is limited as a computational model because it can only compare PCFGs that all parse the same data, and that data is likely to include errors, especially in large data sets. In fact, in the German child language corpus that we consider, transcription and part-of-speech (POS) tagging introduce so many errors that the corpus no longer appears representative of a child's input. This shows that it is misleading to assume such corpora are always appropriate for computational modeling of child language acquisition. Here we propose a method for comparing syntactic hypotheses that are compatible with different subsets of a child-directed speech corpus: we identify how differing subset sizes affect the likelihood component of the Perfors-type Bayesian model selection scheme, and we counter those effects. We apply this approach in a case study of word-order acquisition in German.
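
To illustrate why subset size matters for the likelihood term, the following minimal sketch scores two hypothetical grammars that parse different numbers of sentences. All numbers, names, and the per-sentence averaging are assumptions made for illustration; the averaging is only one conceivable way to offset the size effect and is not necessarily the correction proposed in the paper.

```python
import math

def log_posterior(log_prior, sentence_log_probs):
    """Unnormalized log posterior: log P(G) + sum of log P(sentence | G)."""
    return log_prior + sum(sentence_log_probs)

def per_sentence_log_likelihood(sentence_log_probs):
    """Average log-likelihood per sentence (illustrative normalization only)."""
    return sum(sentence_log_probs) / len(sentence_log_probs)

# Hypothetical data: grammar A parses 4 sentences, grammar B parses only 2.
# Each grammar assigns the same probability (0.1) to every sentence it parses.
grammar_a = [math.log(0.1)] * 4
grammar_b = [math.log(0.1)] * 2
log_prior = math.log(0.5)  # equal prior for both grammars (assumption)

score_a = log_posterior(log_prior, grammar_a)
score_b = log_posterior(log_prior, grammar_b)

# The grammar covering fewer sentences gets the higher score purely because
# its likelihood is a product of fewer probabilities.
print(score_b > score_a)

# After averaging, the two grammars are indistinguishable, as expected when
# they fit each parsed sentence equally well.
print(math.isclose(per_sentence_log_likelihood(grammar_a),
                   per_sentence_log_likelihood(grammar_b)))
```

Because a likelihood is a product of per-sentence probabilities, it shrinks as more sentences are included, so an uncorrected comparison favors whichever hypothesis is evaluated on the smaller subset.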

Recommended citation: Heuser, A., & Tsvilodub, P. (2021). Modeling German Word Order Acquisition via Bayesian Inference. In Proceedings of the Society for Computation in Linguistics 2021.