Advances in open-domain Large Language Models (LLMs), beginning with BERT and continuing more recently with GPT-4, PaLM, and LLaMA, have driven dramatic improvements in conversational systems. These improvements include an unprecedented breadth of conversational interaction between humans and machines while matching, and sometimes surpassing, the accuracy of systems trained specifically for known, closed domains. However, many applications still require higher accuracy than pre-trained LLMs can provide, and many studies are underway to close this gap. Broadly speaking, these methods assume the pre-trained models are fixed (due to cost and time) and instead pursue various augmentation methods, including prompting strategies and model adaptation/fine-tuning.
One augmentation strategy leverages the context of the conversation: who the participants are and what is known about them (personal context), what was just said (dialogue context), where the conversation is taking place (geo context), what time of day and season it is (time context), and so on. A particularly powerful form of context is the visual setting shared by the human(s) and the machine. The shared visual scene may come from a device (phone, smart glasses) or be represented on a screen (browser, maps, etc.). The elements in the visual context can be exploited to ground the natural language conversational interaction, thereby changing the priors of certain concepts and increasing the accuracy of the system. In this talk, I will present some of my historical work in this area as well as my recent work in the AI Virtual Assistant (AVA) Lab at Georgia Tech.
Dr. Larry Heck is a Professor with a joint appointment in the School of Electrical and Computer Engineering and the School of Interactive Computing at the Georgia Institute of Technology. He holds the Rhesa S. Farmer Distinguished Chair of Advanced Computing Concepts and is a Georgia Research Alliance Eminent Scholar. He received the BSEE from Texas Tech University (1986) and the MSEE and PhD EE from the Georgia Institute of Technology (1989, 1991). He is a Fellow of the IEEE, was inducted into the Academy of Distinguished Engineering Alumni at Georgia Tech, and received the Distinguished Engineer Award from the Texas Tech University Whitacre College of Engineering. He was a Senior Research Engineer with SRI (1992-98), Vice President of R&D at Nuance (1998-2005), Vice President of Search and Advertising Sciences at Yahoo! (2005-2009), Chief Scientist of the Microsoft Speech products and Distinguished Engineer in Microsoft Research (2009-2014), Principal Scientist with Google Research (2014-2017), and CEO of Viv Labs and SVP at Samsung (2017-2021).