Natural Language Processing for Health – Guergana Savova (Harvard)
Baltimore, MD, 21218
There is an abundance of health-related free text that can be used for a variety of immediate biomedical applications: phenotyping for Genome-Wide Association Studies (GWAS), clinical point of care, patient-powered applications, and biomedical research. The presentation will cover current research problems in Natural Language Processing (NLP) relevant to health applications, such as event and temporal expression discovery and the linking of events to create timelines of patients' clinical histories. Applications of NLP to biomedical problems will be discussed within the framework of national networks such as Electronic Medical Records and Genomics (eMERGE), the Pharmacogenomics Research Network (PGRN), Informatics for Integrating the Biology and the Bedside (i2b2), and the Patient-Centered Outcomes Research Institute (PCORI).
Dr. Guergana Savova is an Assistant Professor at Harvard Medical School and Boston Children's Hospital. Her research interests are in natural language processing (NLP), especially as applied to the text generated by physicians (the clinical narrative), an area usually referred to as clinical NLP. She has been creating gold-standard annotated resources based on computable definitions and developing methods for computable solutions. The focus of Dr. Savova's research is higher-level semantic and discourse processing of the clinical narrative, which includes tasks such as named entity recognition, event recognition, and relation detection and classification, including coreference and temporal relations. Her methods are mostly machine-learning based, spanning supervised, lightly supervised, and completely unsupervised approaches.
Dr. Savova's research with her collaborators has led to the creation of the clinical Text Analysis and Knowledge Extraction System (cTAKES; ctakes.apache.org). cTAKES is an information extraction system comprising a number of NLP components. It has been applied to mine the data within the clinical narrative for a number of biomedical use cases in networks such as i2b2, PGRN, and eMERGE. Within Informatics for Integrating the Biology and the Bedside (i2b2), cTAKES has been used to extract patient characteristics for determining each patient's status with respect to a specific phenotype (Multiple Sclerosis, Inflammatory Bowel Disease, Type 2 Diabetes). Within the Pharmacogenomics Research Network (PGRN), cTAKES has been applied to automatically determine patients' disease activity and to distinguish responders from non-responders to a specific treatment. Within Electronic Medical Records and Genomics (eMERGE), cTAKES has been applied to automatically identify patients with Peripheral Arterial Disease.