Applying Machine Learning Diversity Metrics to Data Fusion in Information Retrieval

David Leonard, David Lillis, Fergus Toolan, Lusheng Zhang, Rem W. Collier and John Dunnion

In P. Clough, C. Foley, C. Gurrin, G. J. F. Jones, W. Kraaij, H. Lee, and V. Murdock, editors, Advances in Information Retrieval - 33rd European Conference on IR Research, ECIR 2011, Dublin, Ireland, April 18-21, 2011, Proceedings, volume 6611 of Lecture Notes in Computer Science, pages 695–698. Springer Berlin Heidelberg, Dublin, Ireland, 2011.

Abstract

The Supervised Machine Learning task of classification has parallels with Information Retrieval (IR): in each case, items (documents in the case of IR) must be categorised into discrete classes (relevant or non-relevant in the case of IR). A parallel can therefore also be drawn between classifier ensembles, where evidence from multiple classifiers is combined to achieve a superior result, and the IR data fusion task. This paper presents preliminary experimental results on the applicability of classifier ensemble diversity metrics to data fusion. Initial results indicate a relationship between the quality of the fused result set, as measured by Mean Average Precision (MAP), and the diversity of its inputs.
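
To make the parallel concrete, the following is a minimal sketch of how one pairwise classifier diversity metric might be applied to two IR systems. The abstract does not name the metrics used in the paper, so the choice of the disagreement measure here is an assumption, as are the function name, the document identifiers and the toy data. Each system is treated as a binary classifier that labels a document relevant if it retrieves it, and the measure is the fraction of documents on which exactly one of the two systems is correct.

```python
def disagreement(retrieved_a, retrieved_b, relevant, pool):
    """Disagreement measure between two systems viewed as binary classifiers.

    Hypothetical helper for illustration only (not taken from the paper).
    A system "classifies" a document as relevant when it retrieves it.
    Returns the fraction of pooled documents on which exactly one of the
    two systems agrees with the relevance judgments.
    """
    if not pool:
        raise ValueError("pool must contain at least one document")
    only_one_correct = 0
    for doc in pool:
        truth = doc in relevant
        a_correct = (doc in retrieved_a) == truth
        b_correct = (doc in retrieved_b) == truth
        if a_correct != b_correct:
            only_one_correct += 1
    return only_one_correct / len(pool)


# Toy data (assumed): five pooled documents, two of them judged relevant.
pool = {"d1", "d2", "d3", "d4", "d5"}
relevant = {"d1", "d2"}
run_a = {"d1", "d3"}        # documents retrieved by system A
run_b = {"d1", "d2", "d4"}  # documents retrieved by system B

score = disagreement(run_a, run_b, relevant, pool)
self_score = disagreement(run_a, run_a, relevant, pool)
```

With this toy data the two systems disagree in correctness on d2, d3 and d4, so the score is 3/5, while a system compared with itself scores 0. A study along the lines described in the abstract would compute such a score for the inputs to a fusion run and compare it with the MAP of the fused result set.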