Introduction to Linked Open Data in Linguistics

Language and Computation Courses

Introductory Course

Thierry Declerck (DFKI GmbH, Germany and ACDH-ÖAW, Austria) and John P. McCrae (The National University of Ireland Galway, Ireland)

Week 2, 9:00 – 10:30, Room 255, Floor 2

Publishing language resources under open licenses and linking them together has been an area of increasing interest in academic circles, including applied linguistics, lexicography, natural language processing and information technology. The aim is to facilitate the exchange of knowledge and information across disciplinary boundaries as well as between academia and the IT business.

So far, this development has been discussed in workshops and datathons, and it has been at the core of the work conducted within the W3C Ontology-Lexica Community Group, whose final report was published in May 2016 (Lexicon Model for Ontologies: Community Report, 10 May 2016). We see this development as an important step towards making linguistic data:

  1. easily and uniformly queryable,
  2. interoperable and
  3. sharable over the Web using open standards such as the HTTP protocol and the RDF data model.
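
To make these three properties concrete, here is a minimal sketch in plain Python (no RDF library). It assumes hypothetical example.org identifiers and a hypothetical helper named match. Data is stored as subject-predicate-object triples whose names are HTTP URIs, and one pattern-matching function queries all of them in the same way. It illustrates the idea behind the RDF data model only; it is not a real triple store and not SPARQL.

```python
# Minimal sketch of the RDF idea: every statement is a
# (subject, predicate, object) triple, and names are HTTP URIs.
# All example.org URIs below are hypothetical, for illustration only.

EX = "http://example.org/lexicon/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Language-tagged labels are modelled here as (text, language) pairs.
triples = {
    (EX + "cat", RDFS_LABEL, ("cat", "en")),
    (EX + "cat", RDFS_LABEL, ("Katze", "de")),
    (EX + "dog", RDFS_LABEL, ("dog", "en")),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# The same query mechanism works for any resource and any property.
cat_labels = [o for (_, _, o) in match(triples, subject=EX + "cat")]
print(cat_labels)
```

Because every resource is named by a URI, triples from different datasets can be merged into one store and queried with the same function, which is the sense of "interoperable" used above.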

While it has been shown that linked data has significant value for the management of language resources on the Web, the practice is still far from being an accepted standard in the community. It is therefore important that we continue to push the development and adoption of linked data technologies among creators of language resources, and also within curricula at universities and summer schools.

The main goal of this proposed ESSLLI 2018 course is to give people in the field of computational linguistics practical skills in linked data and semantic technologies as applied to linguistic and lexical data. After developing a short initial ontology, participants will learn step by step how to represent multilingual data with their ontology and how to ground it linguistically. We will introduce a variety of state-of-the-art multilingual representation formats and application scenarios in which to leverage and exploit multilingual semantic data. Finally, we will detail how to connect lexical and corpus resources using the NLP Interchange Format (NIF, http://persistence.uni-leipzig.org/nlp2rdf/). At the end of the class, participants will be able to use Linguistic Linked Open Data (LLOD) for the semantic representation of linguistic data. Students will also become familiar with best practices for publishing their own linguistic data in the Linguistic Linked Data cloud (guidelines resulting from a past European Supporting Action, LIDER: http://www.lider-project.eu/).
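
As a rough illustration of what "grounding an ontology linguistically" can look like, the sketch below builds two lexical entries (English and German) that point to the same ontology concept, and prints them as N-Triples. The class and property names (LexicalEntry, canonicalForm, writtenRep, sense, reference) come from the OntoLex-Lemon vocabulary. The example.org namespaces, the helper functions and the naming scheme for form and sense URIs are assumptions made for this example, not part of the course material.

```python
# Sketch: two lexical entries (English, German) grounded in one ontology
# concept, using OntoLex-Lemon terms and serialized as N-Triples.
# The example.org URIs are hypothetical placeholders.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
LEX = "http://example.org/lexicon/"      # hypothetical lexicon namespace
ONTO = "http://example.org/ontology/"    # hypothetical ontology namespace


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(text, lang):
    """Build a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"@' + lang


def lexical_entry(entry_id, written_rep, lang, concept):
    """Return the triples linking one word form to an ontology concept."""
    entry = LEX + entry_id
    form = entry + "_form"    # assumed naming scheme for the form URI
    sense = entry + "_sense"  # assumed naming scheme for the sense URI
    return [
        (uri(entry), uri(RDF_TYPE), uri(ONTOLEX + "LexicalEntry")),
        (uri(entry), uri(ONTOLEX + "canonicalForm"), uri(form)),
        (uri(form), uri(ONTOLEX + "writtenRep"), literal(written_rep, lang)),
        (uri(entry), uri(ONTOLEX + "sense"), uri(sense)),
        (uri(sense), uri(ONTOLEX + "reference"), uri(concept)),
    ]


def to_ntriples(triples):
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = lexical_entry("cat_en", "cat", "en", ONTO + "Cat")
triples += lexical_entry("Katze_de", "Katze", "de", ONTO + "Cat")
ntriples = to_ntriples(triples)
print(ntriples)
```

Both entries reference the same concept URI, so an application can move from the English word to the German one through the ontology rather than through a direct translation link.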

Both instructors of this proposed course have spent recent years investigating the interface between lexical data and knowledge representation systems, and can refer to a number of joint publications on various aspects of this intersection of ontologies and natural language resources. John was a driving force behind the development of the Lexicon Model for Ontologies (lemon) and its further development in the context of the W3C Ontology-Lexica Community Group (see the sections “practical information” and “references” below). Thierry has considerable experience in connecting the field of lexicography with the LLOD, and he has taught other, related topics at three past ESSLLIs. John and Thierry have also taught these topics at recent Linguistic Linked Data and Semantic Technology summer schools (Eurolan, http://eurolan.info.uaic.ro/2015/) and datathons (LIDER datathon, http://datathon2017.retele.linkeddata.es/).

Course Material:

Day 1/2: About Language Resources and RDF Vocabularies and SPARQL Intro

Day 3: Introduction to the OntoLex-Lemon Model and Lemon Design Patterns

Day 4: SPARQLing with GraphDB

Day 5: Linked Open Data Cloud and WordNets and TEI-LEX