Treebanks, Linguistic Theories and Applications

Petya Osenova (Sofia University “St. Kliment Ohridski”, Bulgaria) and Kiril Simov (Bulgarian Academy of Sciences, Bulgaria)

The course introduces syntax in contemporary linguistic theories through syntactically annotated resources, called ‘treebanks’, and their main NLP applications. Treebanking has been an active field for more than 30 years. The trade-off between linguistic grammars and applied syntactic corpora has taken many forms, such as attempts at linguistic neutrality or commitment to a specific linguistic theory; underspecification or loss of information; scaling with automatic parsers, etc.

The course will outline the design of annotation schemes as reflections of linguistic theories; the relation of syntactic information to other types of linguistic knowledge, such as morphology and semantics; the training of parsers on treebanks; and the applications of treebanks to Machine Translation, Information Retrieval, Word Sense Disambiguation, etc. We will trace treebank endeavors from monolingual to multilingual architectures and will pay special attention to the Universal Dependencies initiative as a mega-lingual project supported by Google.