Piotr Żelasko paper featured in ACL newsletter
A paper by Piotr Żelasko, an assistant research scientist at the Center for Language and Speech Processing, was featured in the September edition of an Association for Computational Linguistics newsletter that shares recent developments in computational typology and multilingual natural language processing.
“The paper’s inclusion says that the contribution is interesting to the broader community of speech and language scientists trying to understand how the world’s languages are connected together,” Żelasko said. “While there is a fair deal of past research on multilingual speech systems, I think it can be safely stated that there is still much to be discovered.”
Titled “That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages,” the paper grew out of Żelasko’s work in the group of Najim Dehak, an assistant professor in the Department of Electrical and Computer Engineering. The research is part of a larger project focused on discovering the phonetic inventories of the world’s languages, carried out by Dehak’s group in collaboration with Mark Hasegawa-Johnson’s team at the University of Illinois and Odette Scharenborg’s group at Delft University of Technology.
The paper explores whether an automatic speech recognition (ASR) system trained on 13 languages at once can learn something universal about their phonetics. The research found that even though some of the languages differ considerably from one another, the system still benefited when multilingual recordings were used together to train a phone recognition model.
“We discovered that practically all speech units, which are known as phones, are getting recognized better in the multilingual model, even the ones that existed in just a single language,” Żelasko said. “It was initially puzzling, but ended up making sense. There were other phones in the training data that share all but one articulatory features with the language-unique ones. So, even though the speech sounds are different, there is some degree of universality in them and we see that in the results.”
Żelasko sees this as a very promising result for speech recognition engineers building systems for languages that have few recordings available as training resources.
A shortage of speech data is a major challenge for under-resourced languages, and Żelasko believes the result shows that this multilingual approach could work better than the methods relied on in recent years. The research could therefore make it easier for engineers to build speech recognition software for those languages.
“In recent years, AI practitioners have overcome these issues with transfer learning, which is using out-of-domain data to pre-train a model, and then what little in-domain data is available to fine-tune it,” Żelasko said. “While this approach doesn’t seem to have gained as much traction in the speech community as in NLP, for example, our results suggest that using out-of-language training data could yield major improvements for speech systems.”
For Żelasko, coming to Hopkins was key to carrying out this research. He joined Hopkins from his native Poland in January, but already likes the collaborative spirit that Dehak promotes, both among group members and with those outside the institution.
“After spending several years in the industry, I’m very happy to have joined Najim Dehak’s team at JHU. The group has been very supportive of each other, especially during these difficult pandemic times,” Żelasko said. “Collaboration with my co-authors, including Laureano Moro-Velazquez from our team, was paramount in designing the experiments and making sense of the results.”