Protein folding and parsing

Julia Hockenmaier, University of Pennsylvania

April 17, 2007

View Seminar Video


We know that adult speakers of a language have no problem understanding newspapers in that language, and that proteins fold spontaneously into specific three-dimensional structures. However, a sentence in the Wall Street Journal may have millions of possible grammatical analyses, and a protein may have millions of possible structures. As computer scientists who want to design systems that can either parse natural language or predict the folded structure of proteins, we are faced with two very similar search problems: In both cases, we want to find the optimal structure of an input sequence among an exponential number of possible alternatives. In this talk, I will demonstrate how CKY, a standard dynamic programming algorithm that is normally used in natural language parsing, can be adapted to give us novel insights into the protein folding problem. If we assume that folding is a greedy, hierarchical search for lowest-energy structures, CKY provides an efficient way to find all direct folding routes. I will also show that we can extend CKY to construct a Markov chain model of the entire folding process, and that this Markov chain may explain an apparent contradiction between what experimentalists observe in a test tube and what many theorists predict.


Julia Hockenmaier is a postdoc with Aravind Joshi at the University of Pennsylvania, and also a frequent visitor to Ken Dill's lab at the University of California at San Francisco. Her research areas are natural language processing (computational linguistics) and computational biology, specifically natural language parsing and protein folding.