Xuankai Chang receives Best Paper Award at IEEE’s Automatic Speech Recognition and Understanding Workshop

January 9, 2020

Xuankai Chang, a graduate student in the Department of Electrical and Computer Engineering who is also a member of the Center for Language and Speech Processing, admits that he had not given much thought to the idea of winning the Best Paper award at IEEE’s Automatic Speech Recognition and Understanding Workshop in Singapore.

Although he had been nominated for the award and had received plenty of positive feedback from attendees throughout the conference, there was a lot of competition for the top honor. His hope was that his paper, titled “MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition,” would win the conference’s Best Student Paper honor.

When the awards were announced, Chang was a little down when his name was not called for Best Student Paper. That disappointment was short-lived, though: just a few minutes later, it was revealed that his paper had not won Best Student Paper because it had been selected as the workshop’s best overall paper.

“I was surprised to hear the title of my paper,” Chang said of the announcement.

Chang’s paper focuses on the cocktail party problem, in which the speech of a target speaker is entangled with noise from interfering speakers. Essentially, the problem concerns the ability to listen to and understand speech clearly when there are additional disruptive sounds in an area.

Historically, machines have outperformed humans in speech recognition tasks in ideal, noise-free conditions. When there are distracting sounds, however, humans handle such cases easily while machines struggle. Chang’s research presents a new technique that could help machines catch up to humans in that area.

“In our paper, we proposed a new model to utilize the microphone-array signals to recognize the speech signals in which multiple speakers are speaking simultaneously and achieved surprisingly good results,” Chang said. “Our work can be used to enhance the performance of speech recognition systems in the complex cases which are common in real life, for example, intelligent home devices or a meeting transcription system.”

Chang believes the award signals that others in the field accept the work he has been doing. He hopes his work draws more attention to the cocktail party problem, and that his method could result in a better experience when using intelligent devices as well as hearing aids.

As for the future of his work, he refers to being at Johns Hopkins as the “key for this research.”

“I began to work on this project when I started the collaboration with the group at Johns Hopkins,” Chang said. “Throughout the project, I have had a lot of valuable discussions with my advisor, [ECE associate research professor] Shinji Watanabe, and other friends in our group, as well as in CLSP. Even before attending the conference, they gave me helpful advice on how to improve my presentation. I’m thankful for all the help and guidance they’ve given me during this process.”

