| Workshop 2005 | Saturday, July 4, 2009 |
|
Problem Definition: The proposed project will tackle the problem of parsing Arabic dialects. Parsing is an important component of many advanced NLP systems and has also been shown to be useful for language modeling in automatic speech recognition (ASR). As is well known, Arabic exhibits diglossia, i.e., the coexistence of two forms of the language: a high variety with a standard orthography and sociopolitical clout that is not natively spoken by anyone (Modern Standard Arabic, MSA), and low varieties that are primarily spoken and lack writing standards (the Arabic dialects). The dialects and MSA form a continuum of variation at the lexical, phonological, morphological, and syntactic levels. Important resources are currently available for MSA, along with much ongoing NLP work; for example, there are several syntactic and semantic parsers for MSA. However, Arabic dialect resources and NLP research are still in their infancy. There are linguistic studies of Arabic dialectal syntax, but there is no language engineering work (such as computational grammars). There are no parallel written corpora between any of the dialects and any other language, including MSA. Thus, most parsing techniques that rely on supervised (in the canonical sense) machine learning do not apply, since there is not enough annotated data to learn from. We would like to leverage existing resources and tools for MSA in order to parse Arabic dialects using both symbolic techniques and machine learning approaches.
|||
| Team Members: | |||
| Owen Rambow | Team Leader | Columbia University | rambow at cs dot columbia dot edu |
| Rebecca Hwa | Senior Researcher | University of Pittsburgh | hwa at cs dot pitt dot edu |
| David Chiang | Senior Researcher | University of Maryland | dchiang at umiacs dot umd dot edu |
| Nizar Habash | Senior Researcher | Columbia University | nizar at NizarHabash dot com |
| Khalil Sima'an | Senior Researcher | University of Amsterdam | simaan at science dot uva dot nl |
| Mona Diab | Senior Researcher | Columbia University | mdiab at cs dot columbia dot edu |
| Roger Levy | Graduate Student | Stanford University | rog at stanford dot edu |
| Carol Nichols | Graduate Student | University of Pittsburgh | cln23 at cs dot pitt dot edu |
| Safiullah Shareef | Undergraduate Student | Johns Hopkins University | safi at jhu dot edu |
| Vincent Lacey | Undergraduate Student | Georgia Tech | gtg813b at mail dot gatech dot edu |
| Technical Contact: Owen Rambow, Computer Science Department, Columbia University |
Administrative Contact: 2005 Summer Workshop, Center for Language and Speech Processing, Johns Hopkins University |
||
| The Center for Language and Speech Processing, The Johns Hopkins University, 3400 North Charles Street, Barton Hall, Baltimore, MD 21218 | |||||
| Telephone: (410) 516-4237 | Fax: (410) 516-5050 | E-mail: clsp@clsp.jhu.edu | |||