Deep semantic annotation refers to the identification and markup of entities and relations that are not overtly present or syntactically transparent in a linguistic utterance or text. This includes: the labeling of semantic roles for explicit arguments (a well-studied task); the identification of semantically entailed implicit arguments in text; the identification of type coercion contexts; and the identification of presuppositions and implicatures arising from both compositional constructions and discourse contexts. We discuss how the "task decomposition" methodology can reduce a difficult annotation problem to a set or sequence of simpler tasks with high inter-annotator agreement (IAA). In particular, we demonstrate how the decomposition process makes these tasks amenable to crowdsourcing platforms such as Amazon Mechanical Turk.
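The abstract does not name the agreement statistic it uses, so the following is only an illustration under an assumption: it measures IAA with Cohen's kappa, a common chance-corrected measure for two annotators labeling the same items. The point is that each simple subtask produced by decomposition can be scored on its own. The example labels are hypothetical yes/no answers to a single subtask question.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical answers to one decomposed subtask, e.g.
# "Does this verb have an implicit argument here?"
annotator_1 = ["yes", "yes", "no", "no", "yes", "no"]
annotator_2 = ["yes", "no", "no", "no", "yes", "no"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # → 0.667
```

Here the two annotators agree on 5 of 6 items, but chance agreement is 0.5, so kappa is 2/3. Scoring each subquestion separately shows which step of a decomposed task needs sharper guidelines.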
Non-overt semantic relations are the most difficult dependencies for automatic methods in computational linguistics to identify. Semantic roles are perhaps the best studied of these, but many other important relations have not been adequately investigated for the purposes of annotation and resource development.
These include coercion, argument selection, and type-shifting phenomena; presupposition; event causation; and entailment. This tutorial shows how such deep semantic phenomena can be captured by using "task decomposition" when performing annotation.
Full-scale, human-like interpretation and inference over natural language text remains an elusive goal, as evidenced by recent community challenges on entailment, temporal reasoning, semantic role labeling, and other tasks. This is largely because such tasks require the ability to identify and manipulate information that is typically not associated with any surface expression. The same property also makes it hard to produce consistent, reliable, and easily verifiable annotations.
In this tutorial, we will cover techniques for adding such information to text in the form of deep semantic annotation, with a particular focus on methods for decomposing these annotation tasks into a series of manageable, hierarchically arranged subproblems, in which annotator decisions follow a sequence of intuitively well-defined choices.
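To make the idea of "a sequence of intuitively well-defined choices" concrete, here is a minimal sketch that encodes a decomposed annotation task as a tree of simple questions. Each answer routes the annotator to the next question or to a final label. The tree, the type-coercion questions, and the label names (`no_coercion`, `coercion_with_implied_event`, `coercion_unresolved`) are hypothetical and serve only as illustration. They are not the tutorial's actual annotation scheme.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Question:
    prompt: str
    # Each allowed answer leads to a follow-up Question or a final label (str).
    branches: Dict[str, Union["Question", str]]


def annotate(node, answer_fn: Callable[[str], str]) -> Tuple[str, List[Tuple[str, str]]]:
    """Walk the tree one simple question at a time; return the label and the path."""
    path = []
    while isinstance(node, Question):
        answer = answer_fn(node.prompt)
        if answer not in node.branches:
            raise ValueError(f"unexpected answer {answer!r} to {node.prompt!r}")
        path.append((node.prompt, answer))
        node = node.branches[answer]
    return node, path


# Hypothetical decomposition of a type-coercion judgment into yes/no steps.
COERCION_TREE = Question(
    "Does the predicate normally select an event (e.g. 'begin', 'enjoy')?",
    {
        "no": "no_coercion",
        "yes": Question(
            "Is the argument in the text an entity rather than an event?",
            {
                "no": "no_coercion",
                "yes": Question(
                    "Can an implied activity be recovered (e.g. reading the book)?",
                    {"yes": "coercion_with_implied_event", "no": "coercion_unresolved"},
                ),
            },
        ),
    },
)

# Scripted answers for "She began the book."
scripted = iter(["yes", "yes", "yes"])
label, path = annotate(COERCION_TREE, lambda prompt: next(scripted))
print(label)  # → coercion_with_implied_event
```

Because each node asks a single narrow question, a design like this could post each question as its own crowdsourcing microtask and score agreement per question instead of per final label. This is a design choice assumed here, not one stated in the abstract.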
Target Audience: Those interested in developing semantically annotated datasets for use in machine learning.
Duration of Tutorial: Half day