Kristian Woodsend: "Optimize what you write, satisfy how you write it: Integer linear programming models for text rewriting"

Speaker: Kristian Woodsend (Institute for Language, Cognition and Computation, University of Edinburgh)

Date: Tuesday, 22 January 2013

Time: 10:30

Venue: Room 1.03, ETSI Informática, UNED

Abstract

Recent years have witnessed increased interest in data-driven methods for text rewriting, e.g., writing a document in a simpler style, or a sentence in a more concise manner. It is frequently the case, when performing inference in these natural language tasks, that the decisions involved are mutually dependent. Local decision makers (such as machine-learning classifiers) have a role to play, but in order to make coherent decisions, it is essential that inference takes these interdependencies into account. I will be giving a tutorial on how to develop Integer Linear Programming (ILP) models for inference, using models that we developed for text rewriting as examples. In these models, we combined the rules and predictions obtained from data-driven and machine learning methods with declarative knowledge expressed as constraints.
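As a rough illustration of the idea (not one of the models from the talk), the sketch below sets up a tiny 0-1 integer linear program for choosing which phrases to keep in a rewritten sentence. The phrases, their scores (standing in for scaled local classifier predictions), the word budget and the dependency pairs are all hypothetical. To stay self-contained it enumerates every binary assignment by brute force rather than calling an ILP solver, which is only feasible for toy-sized problems.

```python
from itertools import product

# Hypothetical candidates: (phrase, local score scaled to an integer, length in words).
candidates = [
    ("the committee", 9, 2),
    ("which met on Tuesday", 3, 4),
    ("approved the budget", 8, 3),
    ("after a long debate", 4, 4),
]

# Declarative constraints (assumed for illustration):
# (child, parent) means the child phrase may be kept only if its parent is kept,
# i.e. the linear constraint x[child] <= x[parent].
dependencies = [(1, 0), (3, 2)]
MAX_WORDS = 9  # length budget: sum of length_i * x_i <= MAX_WORDS


def solve(candidates, dependencies, max_words):
    """Maximise sum(score_i * x_i) over binary x subject to the constraints.

    Exhaustive enumeration stands in for a real ILP solver here.
    """
    best_value, best_x = None, None
    for x in product((0, 1), repeat=len(candidates)):
        # Length budget constraint.
        if sum(xi * c[2] for xi, c in zip(x, candidates)) > max_words:
            continue
        # Dependency constraints between decisions.
        if any(x[child] > x[parent] for child, parent in dependencies):
            continue
        value = sum(xi * c[1] for xi, c in zip(x, candidates))
        if best_value is None or value > best_value:
            best_value, best_x = value, x
    return best_x, best_value


selection, value = solve(candidates, dependencies, MAX_WORDS)
kept = [c[0] for xi, c in zip(selection, candidates) if xi]
print(kept, value)
```

The objective carries the local predictions, while the budget and dependency constraints encode the interdependencies between decisions; a practical system would hand the same formulation to a dedicated ILP solver.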

In the second part, I will go on to describe our application of these techniques to two rather old and well-studied text generation problems: simplification and multi-document summarization. Leveraging large-scale corpora such as Wikipedia, we automatically induced a quasi-synchronous tree-substitution grammar, a formalism that can naturally capture structural mismatches and complex rewrite operations. I will then present ILP models that select the most appropriate content from the space of possible rewrites generated by the grammar. Finally, I will present experimental results to show that this approach is able to produce grammatical and meaningful output.

Joint work with Mirella Lapata.

 

Bio

Kristian Woodsend is currently a researcher at the University of Edinburgh. He has been working with Prof Mirella Lapata on natural language generation, on tasks such as summarization, generating highlights and captions, and simplification. This work has involved combining machine learning techniques with integer linear programming optimization methods, which are able to explore the whole solution space efficiently and find the global optimum. Previously, he earned his PhD on large-scale numerical optimization methods for training support vector machines. Before that, he spent several years developing software for mobile phones.

 

Venue

Room 1.03
ETSI Informática, UNED
c/ Juan del Rosal, 16
Ciudad Universitaria
28040 Madrid

 