Brian Roark (Google AI): Romanization, non-standard orthography and text entry

When:

July 12, 2018 @ 10:30 am – 11:30 am

Abstract

In this talk, we discuss issues in natural language modeling for text entry in languages that use noisy (i.e., non-standard) romanization strategies, with a particular focus on languages that use Indic scripts. We describe romanization strategies and present data indicating that this sort of romanization typically amounts to a rough phonetic transcription. We present Gboard keyboards that use models very similar to widely used grapheme-to-phoneme models. We also discuss language modeling of romanized text directly.

Bio

Brian Roark is a computational linguist working on various topics in natural language processing. His research interests include syntactic parsing of text and speech; language modeling for automatic speech recognition and other applications; supervised and unsupervised learning of language and parsing models; and text entry, accessibility, and augmentative and alternative communication (AAC).

Before joining Google as a research scientist in 2013, he was a faculty member for 9 years in the Center for Spoken Language Understanding (CSLU) at Oregon Health & Science University (OHSU) – part of what used to be the Oregon Graduate Institute (OGI). Before that, he was in the Speech Algorithms Department at AT&T Labs – Research from 2001–2004. He received his PhD in the Department of Cognitive and Linguistic Sciences at Brown University in 2001.

Center for Language and Speech Processing