Language and Computation Courses
Word Vector Space Specialisation
Ivan Vulić (University of Cambridge, UK) and Nikola Mrkšić (University of Cambridge, UK)
Word representation learning has become a research area of central importance in modern NLP. The most pervasive representation techniques are still grounded in the distributional hypothesis, as they are learned from co-occurrence information in large corpora, and coalesce various types of information (e.g., similarity vs. relatedness vs. association). Specialising vector spaces to maximise their content with respect to one key property while mitigating others has become an active research topic. Proposed approaches fall into two broad categories:
- Unsupervised methods which learn from raw textual corpora in more sophisticated ways (e.g. using context selection and attention); and
- Knowledge-base driven approaches which exploit available resources to encode external information into distributional vector spaces.
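To make the second family of approaches concrete, the sketch below shows a retrofitting-style update in the spirit of knowledge-base driven specialisation: each word vector is iteratively pulled toward the vectors of its neighbours in an external lexical resource while staying close to its original distributional position. The toy vectors, the synonym graph, and all parameter values are illustrative assumptions, not part of the course materials.

```python
import numpy as np

# Toy 2-d distributional vectors (hypothetical, for illustration only).
vectors = {
    "cheap": np.array([1.0, 0.0]),
    "inexpensive": np.array([0.0, 1.0]),
    "pricey": np.array([0.9, 0.1]),
}

# Hypothetical synonymy constraints drawn from an external resource.
synonyms = {
    "cheap": ["inexpensive"],
    "inexpensive": ["cheap"],
    "pricey": [],
}

def retrofit(vectors, synonyms, iterations=10, alpha=1.0, beta=1.0):
    """Retrofitting-style specialisation: move each vector toward the
    average of its constraint neighbours, weighted against a pull back
    to its original (purely distributional) position."""
    original = {w: v.copy() for w, v in vectors.items()}
    new = {w: v.copy() for w, v in vectors.items()}
    for _ in range(iterations):
        for w, neighbours in synonyms.items():
            if not neighbours:
                continue  # words without constraints keep their vectors
            neighbour_sum = sum(new[n] for n in neighbours)
            new[w] = (alpha * original[w] + beta * neighbour_sum) / (
                alpha + beta * len(neighbours)
            )
    return new

specialised = retrofit(vectors, synonyms)
```

After a few iterations the constrained pair ("cheap", "inexpensive") ends up closer together than in the original space, while the unconstrained word is untouched; this is the basic mechanism by which external lexical knowledge reshapes a distributional vector space.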
This one-week introductory course presents recent methods for constructing vector spaces specialised for a range of downstream NLP applications to students and researchers. We will deliver a detailed survey of the proposed methods and discuss best practices for their intrinsic and application-oriented evaluation.