Quantification of stylistic differences in human- and ASR-produced transcripts of African American English

Annika Heuser, Tyler Kendall, Miguel Del Rio, Quinten McNamara, Nischal Bhandari, Corey Miller, and Migüel Jetté, 2024

Published in Proceedings of Interspeech 2024

Abstract: Common measures of accuracy used to assess the performance of automatic speech recognition (ASR) systems, as well as human transcribers, conflate multiple sources of error. Stylistic differences, such as verbatim vs non-verbatim, can play a significant role in ASR performance evaluation when differences exist between training and test datasets. The problem is compounded for speech from underrepresented varieties, where the speech to orthography mapping is not as standardized. We categorize the kinds of stylistic differences between 6 transcription versions, 4 human- and 2 ASR-produced, of 10 hours of African American English (AAE) speech. Focusing on verbatim features and AAE morphosyntactic features, we investigate the interactions of these categories with how well transcripts can be compared via word error rate (WER). The results, and overall analysis, help clarify how ASR outputs are a function of the decisions made by the training data’s human transcribers.
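
The abstract does not say how WER was computed for the study, so the following is only a minimal sketch under the standard definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `wer`, the lowercase-and-split normalization, and both example transcripts are assumptions made for illustration; none of them comes from the paper or its data.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical utterance, invented for illustration (not from the study's data).
verbatim = "um i i was going to the the store"   # keeps filler and repetitions
non_verbatim = "i was going to the store"        # cleaned-up style

print(wer(verbatim, non_verbatim))   # verbatim transcript as reference
print(wer(non_verbatim, verbatim))   # non-verbatim transcript as reference
```

In this toy case the two transcripts differ only in one filler and two repetitions, yet the score is 3/9 with the verbatim version as reference and 3/6 with the non-verbatim version as reference, although neither transcript misrecognizes a word. That is the kind of stylistic effect on WER the abstract describes.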

Recommended citation: Heuser, A., Kendall, T., Del Rio, M., McNamara, Q., Bhandari, N., Miller, C., & Jetté, M. (2024). Quantification of stylistic differences in human- and ASR-produced transcripts of African American English. In Proceedings of Interspeech 2024.