Student Seminar – Anton Belyy: “Systems for Human-AI Cooperation on Collecting Semantic Annotations”

March 11, 2022 @ 12:00 pm – 1:15 pm
Virtual Seminar


We consider the problem of data collection for semantically rich NLU tasks, where the detailed semantics of documents (or utterances) are captured using a complex meaning representation. Previously, data collection for such tasks was handled either at the cost of extensive annotator training (e.g., in FrameNet or PropBank) or by simplifying the meaning representation (e.g., in QA-SRL or Overnight). In this talk, we present two systems [1, 2] that aim to support fast, accurate, and expressive semantic annotation by pairing human workers with a trained model in the loop.
The first system, called Guided K-best [1], is an annotation toolkit for conversational semantic parsing. Instead of typing annotations from scratch, data specialists choose the correct parse from the K-best output of a few-shot prototyped model. Because the K-best list can be large (e.g., K=100), we guide the annotators’ exploration of the list via explainable hierarchical clustering. In addition, we experiment with RoBERTa-based reranking of the K-best list to recalibrate the few-shot model’s ranking. The final system lets annotators label data up to 35% faster than standard, non-guided K-best selection and improves the few-shot model’s top-1 accuracy by up to 18%.

The second system, called SchemaBlocks [2], is an annotation toolkit for schemas, or structured descriptions of frequent real-world scenarios (e.g., cooking a meal). It represents schemas in the annotation UI as nested blocks. Using a novel Causal ARM model, we further speed up the annotation process and guide data specialists towards expressive and diverse schemas. As part of this work, we collect 232 schemas and evaluate their internal coherence and their coverage on large-scale newswire corpora.
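As a rough illustration of the reranking idea only (not the authors' implementation), the sketch below rescores a hypothetical K-best list of candidate parses with a stand-in scoring function and promotes the best-scoring candidate to top-1; in the actual system the scores would come from a RoBERTa-based model, and the parses and scores here are invented for the example.

```python
# Minimal sketch of K-best reranking. The candidate parses, parser scores,
# and reranker scores below are all hypothetical placeholders.

def rerank(candidates, scorer):
    """Return candidates sorted by descending reranker score."""
    return sorted(candidates, key=scorer, reverse=True)

# Hypothetical K-best output of a few-shot parser: (parse, parser score).
kbest = [
    ("(plan (create_event))", 0.41),  # parser's top-1, but incorrect
    ("(plan (find_event))", 0.35),    # the correct parse, ranked second
    ("(plan (delete_event))", 0.24),
]

# Stand-in for a learned reranker: a fixed score per candidate parse.
rerank_scores = {
    "(plan (create_event))": 0.2,
    "(plan (find_event))": 0.7,
    "(plan (delete_event))": 0.1,
}

reranked = rerank([parse for parse, _ in kbest],
                  lambda parse: rerank_scores[parse])
print(reranked[0])  # the correct parse now sits at top-1
```

After reranking, the annotator's cheapest action (accepting the top candidate) yields the correct parse, which is the mechanism behind the top-1 accuracy gains described above.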


Center for Language and Speech Processing