Alzheimer’s disease (AD) is the most common neurodegenerative disorder. It generally deteriorates memory function, then language, then executive function to the point where simple activities of daily living (ADLs) become difficult (e.g. taking medicine or turning off a stove). Parkinson’s disease (PD) is the second most common neurodegenerative disease, also primarily affecting individuals of advanced age. Its cardinal symptoms include akinesia, tremor, rigidity, and postural imbalance. Together, AD and PD afflict approximately 55 million people, and there is no cure. Currently, professional or informal caregivers look after these individuals, either at home or in long-term care facilities. Caregiving is already a great, expensive burden on the system, but things will soon become far worse. Populations of many nations are aging rapidly and, with over 12% of people above the age of 65 having either AD or PD, incidence rates are set to triple over the next few decades.
Monitoring and assessment are vital, but current models are unsustainable. Patients need to be monitored regularly (e.g. to check if medication needs to be updated), which is expensive, time-consuming, and especially difficult when travelling to the closest neurologist is unrealistic. Monitoring patients using non-intrusive sensors to collect data during ADLs from speech, gait, and handwriting, can help to reduce the burden.
Our goal is to design and evaluate a system for monitoring neurodegenerative disorders over the phone, as illustrated in Figure 1. The doctor or an automatic system can contact the patient either on a regular basis or upon detecting a significant change in the patient’s behavior. During this interaction, the doctor or the automatic system has access to all the sensor-based evaluations of the patient’s ADLs and his/her biometric data (already stored on the server). The doctor/system can remind these individuals to take their medicine, can initiate dedicated speech-based tests, and can recommend medication changes or face-to-face meetings in the clinic.
The analysis is based on the fact that these disorders cause characteristic lexical, syntactic, and spectro-temporal acoustic deviations from non-pathological speech, and in a similar way, deviations from non-pathologic gait and handwriting. We will learn temporal models combining non-intrusive sensor data-based tests to allow a monitoring system to initiate intervention by the medical expert or automatic analysis system.
We have uncovered specific differences between pathological speech and normative control speech using existing feature extraction code (in Matlab and Python), which will be provided to the workshop by team members, to extract over 400 acoustic (e.g. pitch variation, phonation rate, wavelet, spectral and cepstral analyses, harmonicity, stability, and periodicity (e.g. jitter, shimmer, correlation dimension, and recurrence period density entropy)), lexico-syntactic (e.g. noun-to-pronoun ration, Yngve depth), semantic (e.g. word precision), phonetic (e.g. speaking rate and pitch dynamics) [Trevino et al, 2011), phonological [Cernak et al., 2016], conversation-based, and articulatory [Yu et al, 2014-15] features from audio and associated transcripts.
Our team has obtained very high accuracy in binary classification between pathological and non-pathological speech, as shown in Figure 2. Analysis has also shown the heterogeneity of pathological speech. For example, our recent work has shown that speakers with AD can be individually clustered according to either differences in fluency or in semantics [Fraser et al., 2016]. Our team has also, using phoneme-dependent and articulatory coordination-based features, with audio from elderly subjects over a 4 year period, achieved an average equal error rate (EER) of about 13.5% in detection of mild cognitive impairment [Yu et al, 2015]. This same speech feature methodology has been used with promise in detecting PD [Williamson et al, 2015]. In addition, we are working on discriminating between people with dementia and people with other functional memory disorders using computer-aided conversational analysis. Having automatic, home-based screening for such cases would enable clinicians to profile and track changes in speech and language over time, and hence give a more nuanced picture compared to the ‘snapshot’ obtained when patients go for a checkup. Given this strong starting point, we intend to train new models to answer deeper questions of discriminability, as detailed below.
Phil Green is leading the CloudCAST initiative, which is an international effort to develop and distribute speech technology “in the cloud” for clinicians and therapists worldwide. CloudCAST includes new data collection and merging of UA-Speech and TORGO databases of dysarthric speech, from UIUC and Toronto, respectively. CloudCAST will also help to disseminate the results of the workshop to enrolled clinicians directly. We will also use the publicly-available DementiaBank database of individuals with (N=198) and without (N=200) Alzheimer’s disease describing pictures verbally, as well as language assessment tasks being collected online from normative speakers (currently, N>80) and speakers with dementia (currently, N=30) from Toronto (Talk2Me). Talk2Me uses the same MySQL schema as CloudCAST and is being supported by the AGE-WELL National Centres of Excellence, the Alzheimer’s Society, and the Scarborough Centre for Healthy Communities, including with ongoing data collection. Importantly, none of these data will require HIPAA training or ethics board approval. From Antioquia, we currently have two databases, (1) PC-GITA [Orozco 2015] which contains recordings of 100 speakers, 50 with PD and 50 age- and gender-matched healthy controls, and (2) ongoing data collection with recordings of speech, gait, and handwriting of currently 30 patients with PD.
For both databases we have demographics, MDS-UPDRS-III, and Hoehn&Yahr measures. By the time of the workshop the second collection will include all-day recordings with dynamics given medicine intake and professional evaluation of speech with the standardized Frenchay assessment. Finally, the Ontario Neurodegenerative Disease Research Initiative (ONDRI) provides a full battery of assessments, including speech, depression scores, and neuroimaging for 150 people with AD and 150 with PD. MIT is also providing 200 elderly audio/cognitive assessments as part of Thomas Quatieri’s research into MCI; this will continue in collaboration with the MIT Brain and Cognitive Science Department, McLean Hospital, and MGH.
Since assessment should be automated fully, our group is already adapting automatic speech recognition (ASR) tools for this task. Specifically, we have adapted models in Kaldi using maximum likelihood linear regression to older voices both with and without dementia, and are currently exploring the relationships between ASR and assessment accuracies. We are also using the ASR system in IBM Watson as part of Frank Rudzicz’s involvement in IBM’s Academic Initiative, and a deeper collaboration through AGE-WELL. We are naturally interested in working with other large commercial ASR systems.
|Elmar Noeth||University of Erlangen (Germany)|
|Frank Rudzicz||Toronto Rehabilitation Institute (Canada)|
|Heidi Christensen||University of Sheffield (UK)|
|Juan Rafael Orozco-Arroyave||University of Antioquia (Colombia)|
|Hamidreza Chinaei||University of Toronto|
|Juan Camilo Vasquez-Correa||University of Antioquia (Colombia)|
|Maria Yancheva||University of Toronto (Canada)|
|Phani Shankar Nidadavolu||Johns Hopkins University|
|Julius Hannink||University of Erlangen (Germany)|
|Alyssa Vann||Stanford University|
|Nikolai Vogler||University of California-Irvine|
|Senior Affiliates (Part-time Members)|
|Tobias Bocklet||University of Erlangen (Germany)|