Xinyu Crystina Zhang (University of Waterloo) – “Information Seeking Beyond English”
3400 N CHARLES ST
Abstract
Pretrained language models have brought revolutionary progress to information seeking in the English-speaking world. While this advance is exciting, transferring such progress to non-English languages, especially lower-resource ones, presents new challenges that require developing new resources and methodologies. In this talk, I will present my research on building effective information-seeking systems for non-English speakers. I will begin by introducing the benchmarks and datasets developed to support the evaluation and training of multilingual search systems. These resources have since become widely adopted within the community and have enabled the development of effective multilingual embedding models. The next part of the talk will share the best training practices we found in developing such models, including strategies for enhancing backbone models and surprising transfer effects across languages. Building on these foundations, my work expanded to understanding how language models process multilingual text and facilitate knowledge transfer across languages. The talk will conclude with a vision for the future of multilingual language model development, with the goal of adapting these models to unseen languages with minimal data and resource requirements, thus bridging the gap for underrepresented linguistic communities.
Bio
Crystina is a PhD candidate at the University of Waterloo, where she is advised by Prof. Jimmy Lin. Crystina’s research focuses on enhancing search systems in multilingual scenarios, with work featured at top NLP and IR conferences and journals, such as TACL, TOIS, ACL, and SIGIR. She hosted competitions on multilingual retrieval at the WSDM Cup 2022 and FIRE 2023, and received outstanding paper awards at EMNLP 2024 and a best paper nomination at SIGIR 2024. She has interned at Google DeepMind, Cohere, the Max Planck Institute for Informatics, and NAVER. Prior to graduate school, she received her bachelor’s degree in computer science from the Hong Kong University of Science and Technology (HKUST) in 2020.