Multilingual and code-switching speech recognition are important challenges due to the growing adoption of personal assistant devices and smartphones. With the rise of globalisation, there is an increasing demand for multilingual ASR that handles language and dialectal variation in spoken content. Recent studies show that multilingual systems can outperform monolingual ones.
The prevalence of code-switching in spoken content has forced automatic speech recognition (ASR) systems to handle mixed input. Yet designing a code-switching ASR (CS-ASR) system poses many challenges, mainly due to data scarcity, the complexity and mismatch of grammatical structures, and the unbalanced distribution of language usage.
We propose to study the multilingual and code-switching phenomena in two frequently spoken languages (English and Arabic), along with low-resource and indigenous languages:
– Arabic and English: 1,000 hours of multi-dialectal Arabic data (MGB-2), English (TED-LIUM, LibriSpeech).
– Minority languages of English-speaking countries: Gaelic (British Isles), Maori (New Zealand) and many indigenous languages of the US and Canada.
– Languages of sub-Saharan Africa, e.g. Zulu, Xhosa, Yoruba.
The code-switching study will cover both intersentential switching (between utterances) and intrasentential switching (within an utterance). The evaluation of the designed system and the analysis of the phenomena will be based on real test cases, collected from real meetings and interviews.
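The distinction between the two kinds of switching can be illustrated with a minimal Python sketch. It rests on a strong simplifying assumption that is not part of the proposal: the language of a token is guessed from the Unicode script of its first letter (Arabic script vs. Latin script), which only works for Arabic-English text written in native scripts. The function names are hypothetical and chosen for illustration only.

```python
import unicodedata


def token_script(token):
    """Guess a token's language from the script of its first letter.

    Assumption for illustration: Arabic script means Arabic ('ar') and
    Latin script means English ('en'). Tokens with no letters return None.
    """
    for ch in token:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith("ARABIC"):
                return "ar"
            if name.startswith("LATIN"):
                return "en"
    return None


def utterance_languages(utterance):
    """Return the per-token language tags of an utterance, skipping untagged tokens."""
    tags = (token_script(tok) for tok in utterance.split())
    return [tag for tag in tags if tag is not None]


def intrasentential_switches(utterance):
    """Count language changes between adjacent tokens inside one utterance."""
    langs = utterance_languages(utterance)
    return sum(1 for a, b in zip(langs, langs[1:]) if a != b)


def intersentential_switches(utterances):
    """Count utterance boundaries where the language changes.

    A boundary counts as a switch when the last tagged token of one
    utterance and the first tagged token of the next differ in language.
    """
    tagged = [langs for langs in map(utterance_languages, utterances) if langs]
    return sum(1 for prev, nxt in zip(tagged, tagged[1:]) if prev[-1] != nxt[0])
```

Script-based tagging cannot handle transliterated text or language pairs that share a script, so a real analysis would need a proper language identification step at the word level.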
Our proposal for the summer workshop focuses on investigating novel techniques to build practical large vocabulary continuous speech recognition (LVCSR) systems capable of dealing with both monolingual and code-switching spoken utterances. We aim to explore data augmentation and state-of-the-art modelling techniques, using transfer learning and self-supervised learning, to deal with the lack of balanced transcribed data for multilingual and code-switching speech. Moreover, we aim to address the challenge of evaluating code-switching ASR output.
The summer school will include four work packages running simultaneously while sharing outcomes to achieve desired goals. The work packages are:
WP1: Design a multilingual ASR with code-switching capabilities. In this WP, we will focus on pre-trained models and self-supervised models.
WP2: Handle low-resource languages/dialects by generating synthetic code-switching data, covering both synthetic text and synthetic speech, while respecting language-dependent constructions and switching triggers.
WP3: Build a robust evaluation measure that accounts for mixed-script system output. Current systems are evaluated based on transliterated word error rate and character error rate. This method lacks generalization, especially when there is code-mixing within the same word.
WP4: Understand where and why code-switching happens by analysing code-switching points in both system output and human speech. This will address questions such as: which complex social factors are involved, whether the dominant language is the one used for education and literacy, and whether code-switching is topic- or domain-dependent. This WP aims to offer more insight into the challenge rather than to build a better system with improved accuracy.
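For reference, the word error rate and character error rate mentioned under WP3 can be sketched as follows. This is a minimal illustration of the standard edit-distance definitions, not the measure the workshop intends to build. It does no transliteration or text normalisation, and dropping whitespace before computing the character error rate is an assumption made here (conventions differ).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, with whitespace removed (an assumption of this sketch)."""
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

With these plain definitions, a hypothesis that renders an Arabic reference word in Latin script (for example "souq" for "سوق") is scored as entirely wrong at both the word and character level, which is the kind of script mismatch that motivates WP3.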
Injy Khairy Hamed
Hamdy S. Hussein