Toward Improved Dialect Modeling – Malcah Yaeger-Dror (University of Arizona, Cognitive Science)

February 12, 2002 all-day

Speech technology (including synthesis, speech recognition and speaker verification) has made significant advances in recent years in laboratories and in field applications, but speech recognition can still degrade when the test data do not match the training data well: for example, when the test data include dialects that are not included in the original sample, or when the speech collected from certain speakers does not match the way they normally speak. ‘Non-mainstream dialects’ in particular are under-represented because they are more difficult to collect using ‘standard’ channels. For those who speak dialects not represented in the training data, this is a serious impediment to the goal of universal access. Such an impediment can have broad-reaching consequences, since it can affect access to education and even to telephone information systems. In fact, appropriate corpora for developing more adequate recognition strategies are so sparse that it is difficult even to assess how bad the current situation is, or to evaluate new modeling techniques. One impediment to devising a corpus that permits better modeling of dialect is that those who understand dialect and ‘style’ differences are often not versed in speech technology, and vice versa. This paper will address how better understanding between these groups can permit researchers to gather appropriate speech, so that adequate recognition strategies can be devised for all speakers of English, both by choosing speakers from a broader range of dialects and by collecting the speech in an appropriate setting. After a short discussion of ‘dialect’ and ‘style’ (Eckert and Rickford 2001, Yaeger-Dror & Hall-Lew 2000(A)), the paper will propose how to take better advantage of a corpus that is already available and that meets the criteria that appear to be needed for better dialect modeling.
The paper will propose that, if appropriately labeled and coded with respect to dialect and demographic variables, at least one corpus presently available could be quite helpful in improving dialect recognition. A subset of the phone calls from Call Friend Southern American English appears to meet these criteria, both because the speech style is natural and conversational and because the speakers represent non-mainstream dialects for which recognition is at present very inadequate. We will conclude that better modeling of dialect effects across age groups, dialect groups, and sex should greatly advance the goal of universal speech access.

References

Eckert, P. and J. Rickford. 2001. Style and Sociolinguistic Variation. CUP.

Yaeger-Dror, M. 2001. Primitives for the analysis of ‘style’. In P. Eckert and J. Rickford, 170-185.

Yaeger-Dror, M. and L. Hall-Lew. 2000. Prosodic prominence on negation in various registers of US English. Journal of the Acoustical Society of America 108:2468 (A).

Center for Language and Speech Processing