Weak and Strong Learning of Context-Free Grammars
Alexander Clark, Royal Holloway University of London
September 11, 2012
Rapid progress has been made in the last few years in the 'unsupervised' learning of context-free grammars using distributional techniques: a core challenge for theoretical linguistics and NLP. However, these techniques are of limited value on their own, because they yield only weak results -- we learn a grammar that generates the right strings, but not necessarily one that assigns the right structures.
In this talk we will look at various ways of moving from weak learning algorithms to strong algorithms that provably learn the correct structures as well. Of course, in order to do this we need a mathematically precise notion of syntactic structure. We will present a new theoretical approach to this, based on considering transformations of grammars through morphisms of the algebraic structures that interpret them. Under this model, the simplest/smallest grammar for a language will always use a certain set of syntactic categories and a certain set of lexical categories; these categories are drawn from the syntactic concept lattice, which underlies several weak learning algorithms for CFGs. Under mild Bayesian assumptions, we can therefore restrict attention to grammars that use these categories, and this leads to some nontrivial predictions about the nature of syntactic structure in natural languages.
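As background to the abstract's mention of the syntactic concept lattice, here is a minimal illustrative sketch (not taken from the talk) of the Galois connection between sets of strings and sets of contexts on which that lattice is built. The language { a^n b^n : n >= 1 }, the candidate substrings, and the contexts are all hand-picked assumptions for illustration; a concept is a pair (S, C) where C is exactly the set of contexts shared by S and S is exactly the set of strings accepted by C.

```python
# Illustrative sketch of the Galois connection behind the syntactic
# concept lattice, for the toy language L = { a^n b^n : n >= 1 }.
# Strings, contexts, and the language are hand-picked assumptions.

def in_lang(w):
    """Membership test for L = { a^n b^n : n >= 1 }."""
    n = len(w) // 2
    return len(w) >= 2 and w == "a" * n + "b" * n

# Finite candidate substrings and contexts (l, r), chosen by hand.
strings = ["ab", "aabb", "a", "b", "aab", "abb"]
contexts = [("", ""), ("a", "b"), ("", "b"), ("a", ""), ("aa", "bb")]

def shared_contexts(S):
    """Polar map strings -> contexts: contexts that accept every string in S."""
    return {(l, r) for (l, r) in contexts
            if all(in_lang(l + w + r) for w in S)}

def accepted_strings(C):
    """Polar map contexts -> strings: strings accepted by every context in C."""
    return {w for w in strings
            if all(in_lang(l + w + r) for (l, r) in C)}

# The closure of {"ab"} under the two polar maps gives a concept's extent:
closure = accepted_strings(shared_contexts({"ab"}))
print(sorted(closure))  # "ab" and "aabb" share the same contexts here
```

The closed sets of strings obtained this way, ordered by inclusion, form the complete lattice referred to in the abstract; with a finite sample standing in for the membership oracle, the same construction drives the weak distributional learners mentioned above.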
Alexander Clark is in the Department of Computer Science at Royal Holloway, University of London. His research interests are in grammatical inference, theoretical and mathematical linguistics, and unsupervised learning. He is currently president of SIGNLL and chair of the steering committee of the ICGI; his book coauthored with Shalom Lappin, 'Linguistic Nativism and the Poverty of the Stimulus', was published by Wiley-Blackwell in 2011.