Language Processing in the Web Era – Kuansan Wang (Microsoft)
Natural language processing (NLP) has been dominated by statistical, data-driven approaches. The massive amount of data available, especially from the Web, has further fueled progress in this area. Over the past decades, it has been widely reported that simple methods can often outperform more complicated systems when trained with large amounts of data. In deploying many web-scale applications, however, we regularly find that the size of the training data is just one of several factors that contribute to the success of the applications. In this talk, we will use real-world applications to illustrate the important design considerations in web-scale NLP:
- Rudimentary multilingual capabilities to cope with the global nature of the web
- Versatile modeling of the diverse styles of language used in web documents
- Fast adaptation to keep pace with the changes of the web
- Few heuristics to ensure system generalizability and robustness
- Possibilities for efficient implementations with minimal manual effort
Dr. Kuansan Wang is a Principal Researcher at Microsoft Research, Redmond, WA, where he currently manages the Human Intelligence Technology Group in the Internet Service Research Center. He joined Microsoft Research in 1998 with the Speech Technology Group, conducting research in spoken language understanding and dialog systems. He was responsible for architecting many speech products from Microsoft, ranging from desktop, embedded and server applications to mobile and cloud-based services. His research outcomes, disclosed in more than 60 US and European patents and applications, have been adopted in three ISO, three W3C and four ECMA standards. He has also served as an organizing member, reviewer and panelist at WWW, ICASSP, InterSpeech, ACL and various workshops in speech, language and web research areas.
Dr. Wang received a B.S. from National Taiwan University and an M.S. and Ph.D. from the University of Maryland, College Park, all in Electrical Engineering. Prior to joining Microsoft, he was a Member of Technical Staff at AT&T/Lucent Bell Labs in Murray Hill, NJ, and at the NYNEX/Verizon Science and Technology Center in White Plains, NY.