Enrique Amigó: "The Fundamental Properties of Similarity and their Application to Text Processing and Information Access Tasks"

Speaker: Enrique Amigó (NLP&IR-UNED)

Date: Tuesday, May 28, 2013

Time: 12:00

Venue: Room 1.03, ETSI Informática, UNED

Abstract

Many Natural Language Processing tasks can be seen as a problem of defining similarity measures between texts (e.g. document/document, query/document, sentence/sentence, etc.). A crucial issue is then to find the most appropriate similarity measure or combination of measures. The standard development cycle consists of optimizing measures against test collections with human assessments. However, a common problem with this methodology is that evaluation results change considerably across datasets.

In this talk, we present three theorems that any similarity measure satisfies, supported by empirical evidence and theoretical proofs. These theorems explain multiple phenomena observed in NLP tasks, such as the high predictive power of text output evaluation measures at the system level, the decreasing nature of precision/recall curves in IR tasks, the pooling bias phenomenon in TREC corpora, the high performance of combining diverse systems, and the unexpectedly high performance of voting methods in machine learning scenarios, among others. In addition, the theorems provide an unsupervised method for combining evaluation measures, predicting the clustering threshold in grouping tasks, and predicting the average relevance of rankings produced by IR systems.

 

Bio

Enrique Amigó is a researcher and lecturer at UNED's NLP&IR Group. His main research interests focus on evaluation metrics applied to Natural Language Processing tasks.

 

Venue

Room 1.03
ETSI Informática, UNED
c/ Juan del Rosal, 16
Ciudad Universitaria
28040 Madrid


Materials

Presentation: "The Fundamental Properties of Similarity and their Application to Text Processing and Information Access Tasks"


 