Big Data Goes Mobile – Kenneth Church (IBM)

February 22, 2013 all-day

What is “big”? Time & Space? Expense? Pounds? Power? Size of machine? Size of market? We will discuss many of these dimensions, but focus on throughput and latency (mobility of data). If our clouds can’t import and export data at scale, they may turn into roach motels, where data can check in but can’t check out. DataScope is designed to make it easy to import and export hundreds of terabytes of data on disks. Amdahl’s Laws have stood the test of time remarkably well. These laws explain how to balance memory, cycles and IO. There is an opportunity to extend them so that systems are balanced for mobility as well.
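The abstract does not spell out Amdahl’s Laws. As they are commonly stated, a balanced system has roughly one byte of memory and one bit per second of IO for each instruction per second. The sketch below computes those two ratios and adds a naive transfer-time estimate, the kind of mobility figure the talk proposes to balance alongside them. The helper names and the hardware figures are hypothetical illustrations, not taken from the talk or from DataScope.

```python
SECONDS_PER_DAY = 86_400


def io_ratio(io_bits_per_sec: float, instructions_per_sec: float) -> float:
    """Amdahl IO ratio: bits/s of IO per instruction/s (about 1 when balanced)."""
    return io_bits_per_sec / instructions_per_sec


def memory_ratio(memory_bytes: float, instructions_per_sec: float) -> float:
    """Amdahl memory ratio: bytes of memory per instruction/s (about 1 when balanced)."""
    return memory_bytes / instructions_per_sec


def transfer_days(data_bytes: float, link_bits_per_sec: float) -> float:
    """Idealized days to move data over a link at full rate (no protocol overhead)."""
    return data_bytes * 8 / link_bits_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    cpu = 1e9  # hypothetical machine: 1e9 instructions per second
    print(f"IO ratio: {io_ratio(1e9, cpu):.2f}")  # hypothetical 1 Gbit/s of IO
    print(f"Memory ratio: {memory_ratio(1e9, cpu):.2f}")  # hypothetical 1 GB of RAM
    print(f"100 TB over 1 Gbit/s: {transfer_days(100e12, 1e9):.1f} days")
```

Even a machine that is perfectly balanced by the first two ratios needs more than nine days, under these idealized assumptions, to move 100 TB through a 1 Gbit/s link. This illustrates why data mobility might deserve its own balance rule.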
Ken is currently at IBM, working on Siri-like applications of speech on phones. Before that, he was the Chief Scientist of the HLTCOE at JHU. He has also worked at Microsoft and AT&T. Education: MIT (undergraduate and graduate). He enjoys working with large datasets. Back in the 1980s, we thought that the Associated Press newswire (1 million words per week) was big, but he has since had the opportunity to work with much larger datasets, such as AT&T’s billing records and Bing’s web logs. He has worked on many topics in computational linguistics, including web search, language modeling, text analysis, spelling correction, word-sense disambiguation, terminology, translation, lexicography, compression, speech (recognition and synthesis) and OCR. He has also worked on applications that go well beyond computational linguistics, such as revenue assurance and virtual integration (using screen scraping and web crawling to integrate systems that traditionally don’t talk to each other as well as they could, such as billing and customer care). Service: past president of ACL and former president of SIGDAT (the organization that organizes EMNLP).

Center for Language and Speech Processing