Title: Modeling visual co-articulation for large vocabulary continuous visual speech recognition
Authors: A. Mashari (Contact), J. Sison, C. Neti, G. Potamianos, J. Luettin

Although large vocabulary continuous speech recognition (LVCSR) has been addressed with remarkable success by audio-based systems over the past few decades, acoustic noise remains a major obstacle. Visual information is a promising source of improvement for LVCSR. It is unaffected by acoustic noise and carries complementary information. For example, voicing is easily detected in the audio, whereas place of articulation is more readily detected in the video. This work is part of a larger research project [1] that explored HMM-based audio-visual LVCSR systems. One important issue in such systems is modeling the context dependence of speech units (phone models) in order to capture co-articulation. In this work, we explore visually meaningful co-articulation effects modeled with decision trees and present preliminary results obtained with the HTK toolkit. Although the visual decision trees differ substantially from the audio ones, they do not significantly improve the performance of the audio-visual LVCSR system.

[1] C. Neti, G. Potamianos, J. Luettin, I. Matthews, D. Vergyri, H. Glotin, J. Sison, A. Mashari, and J. Zhou, "Audio-visual speech recognition," Final Workshop 2000 Report, CLSP, Johns Hopkins University, Baltimore, 2000.
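As an illustration of the decision-tree approach to context dependence described in the abstract, the sketch below shows one split step of likelihood-based tree clustering. This is the style of clustering HTK performs in its HHEd tool. It assumes each context-dependent state is summarized by an occupancy count and a diagonal Gaussian, and it picks the phonetic question whose yes/no partition most increases the log-likelihood. The names `StateStats`, `QUESTIONS`, `pooled_loglik` and `best_split` are hypothetical. The viseme-style question set (bilabial, labiodental, rounded) and the one-dimensional "lip opening" toy data are illustrative assumptions, not the authors' question set or data.

```python
import math
from dataclasses import dataclass

VAR_FLOOR = 1e-6  # guards against zero or negative variance from rounding


@dataclass
class StateStats:
    """Sufficient statistics for one context-dependent state (diagonal Gaussian)."""
    left: str           # left-context phone
    right: str          # right-context phone
    count: float        # state occupancy (number of frames)
    mean: list[float]   # per-dimension mean
    var: list[float]    # per-dimension variance


# Hypothetical viseme-style context questions (illustrative only).
QUESTIONS = {
    "L-Bilabial": lambda s: s.left in {"p", "b", "m"},
    "L-Labiodental": lambda s: s.left in {"f", "v"},
    "R-Rounded": lambda s: s.right in {"uw", "ow", "w"},
}


def pooled_loglik(states):
    """Log-likelihood of all frames in `states` under one pooled ML diagonal Gaussian."""
    n = sum(s.count for s in states)
    total = 0.0
    for d in range(len(states[0].mean)):
        mu = sum(s.count * s.mean[d] for s in states) / n
        second = sum(s.count * (s.var[d] + s.mean[d] ** 2) for s in states) / n
        var = max(second - mu ** 2, VAR_FLOOR)
        # With the ML variance, the sum of squared normalized residuals equals n.
        total += math.log(2 * math.pi * var) + 1.0
    return -0.5 * n * total


def best_split(states):
    """Return (question, gain) with the largest log-likelihood gain of a binary split."""
    parent = pooled_loglik(states)
    best = (None, 0.0)
    for name, ask in QUESTIONS.items():
        yes = [s for s in states if ask(s)]
        no = [s for s in states if not ask(s)]
        if not yes or not no:
            continue  # the question does not partition this node
        gain = pooled_loglik(yes) + pooled_loglik(no) - parent
        if gain > best[1]:
            best = (name, gain)
    return best


# Toy one-dimensional "lip opening" statistics for one vowel state in six contexts.
toy_states = [
    StateStats("p", "t", 100, [0.20], [0.05]),
    StateStats("b", "d", 80, [0.25], [0.05]),
    StateStats("m", "uw", 60, [0.22], [0.05]),
    StateStats("t", "t", 120, [0.80], [0.05]),
    StateStats("s", "ow", 90, [0.75], [0.05]),
    StateStats("f", "d", 50, [0.50], [0.05]),
]

question, gain = best_split(toy_states)
print(question, round(gain, 1))  # the bilabial question wins on this toy data
```

Because each split is scored from pooled sufficient statistics alone, candidate questions can be compared without revisiting the training frames. A full tree builder would apply `best_split` recursively and stop when the gain or the node occupancy falls below a threshold.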