Zhifei Li (PhD ’10) believes that computers should be able to see, hear, feel, and move. But, he adds, mostly they need to be able to speak to us and respond intelligently to what we say.
“Technology exists to make a positive impact on people’s lives, and to do that, we need to enable better human-machine interaction,” says Li, who grew up dreaming of being a scientist or engineer while working the fields on his family’s farm in China’s mountainous Hunan province.
Sometimes, childhood dreams do come true. Now 39, Li is founder and CEO of Mobvoi, a Beijing-based start-up making headlines not only for its innovative work building the “next generation” mobile voice search engine but also, more recently, for TicWatch, China’s answer to Apple Watch and Android Wear.
Billed as a quintessentially Chinese device, the sleek, stylish TicWatch (about $160 U.S.) lets wearers use their voices to do everything from finding the closest karaoke bar open past 2 a.m. to remotely controlling the dishwashers in their apartments. To wake the device, users simply say “Ni hao Wenwen” (“ni hao” is Chinese for “hello”), and it springs to life. TicWatch also includes a sensitive touchscreen, as well as a so-called “Tickle strip” slide sensor on the side, increasing its functionality. In April, Li and Mobvoi held “CreaTic,” the first of many hackathons aimed at ensuring that TicWatch and its operating system, TicWear (launched in 2014), remain game-changers.
“This is only the beginning,” promises Li, whose nickname in China is “Watch Bro.” “We are looking at many new ideas, including ‘Pace,’ a music app that can select music based on the cadence of your steps.”
Li envisions a world where all of our “smart” devices – from smart-watches to cars to home appliances to, eventually, robots – are connected and can sort through all the fragmented data they collect about where we go, what we do, and what we want, so that they not only anticipate what we need but also provide it.
“Once these dots are connected, the value [of our devices] will really shine,” he predicts.
Li’s interest in artificial intelligence technologies that will enable people to have fast and natural interactions with mobile devices led him to the Whiting School of Engineering, and the Center for Language and Speech Processing, for his doctorate.
“Johns Hopkins was my ideal choice because CLSP is one of the best in its field,” he said. “At CLSP, I got to work with and for the best scholars in the world across different fields, including Sanjeev Khudanpur of electrical and computer engineering and Jason Eisner and David Yarowsky of computer science. The high standard of the research culture there is the thing I treasure most about my time there.”
While at WSE, Li built Joshua, a widely used open-source software package for machine translation. After receiving his PhD, Li went to work for Google as a research scientist, where he created Google’s mobile offline translation system, which now supports 90 languages and serves hundreds of millions of users.
But in 2012, with a “heart devoted to making an impact on my own country using the knowledge that I had accumulated,” Li returned to China and started Mobvoi. In 2013, his former Google colleague Xin Lei, a speech recognition researcher, joined the start-up as chief technology officer. Among the company’s first projects was Chumen Wenwen, the Chinese version of Siri, followed by TicWear and TicWatch.
“We are really only in the early stages of what artificial intelligence and human-machine interaction are capable of,” he said.
This article originally appeared in the Winter 2016 issue of Johns Hopkins Engineering magazine.