The top-k retrieval problem aims to find the optimal set of k documents from a number of relevant documents given the user’s query.
The key issue is to balance the relevance and diversity of the top-k search results. In this paper, we address this problem using Facility Location
Analysis taken from Operations Research, where the locations of facilities are optimally chosen according to some criteria. We show how this
analysis technique is a generalization of state-of-the-art retrieval models for diversification (such as the Modern Portfolio Theory for Information
Retrieval), which treat the top-k search results like “obnoxious facilities” that should be dispersed as far as possible from each other. However,
Facility Location Analysis suggests that the top-k search results could be treated like “desirable facilities” to be placed as close as possible to their
customers. This leads to a new top-k retrieval model where the best representatives of the relevant documents are selected. In a series of
experiments conducted on two TREC diversity collections, we show that significant improvements can be made over the current state-of-the-art
through this alternative treatment of the top-k retrieval problem.
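As a rough illustration of the "desirable facilities" view, the sketch below greedily picks k documents so that every candidate document lies close to at least one picked representative (a k-median style objective). The abstract does not state the paper's actual objective or algorithm, so the function name `greedy_k_median`, the pairwise distance matrix `dist`, and the greedy strategy itself are assumptions made for illustration only.

```python
from typing import List, Sequence


def greedy_k_median(dist: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedily pick k 'facility' documents (hypothetical helper).

    dist[i][j] is an assumed pairwise distance between documents i and j.
    Each step adds the document that most reduces the total distance from
    every document (customer) to its nearest picked facility.
    """
    n = len(dist)
    selected: List[int] = []
    # nearest[i] = distance from document i to its closest selected facility
    nearest = [float("inf")] * n
    for _ in range(min(k, n)):
        best_doc, best_cost = -1, float("inf")
        for c in range(n):
            if c in selected:
                continue
            # Total cost if candidate c were added to the selected set
            cost = sum(min(nearest[i], dist[i][c]) for i in range(n))
            if cost < best_cost:
                best_doc, best_cost = c, cost
        selected.append(best_doc)
        nearest = [min(nearest[i], dist[i][best_doc]) for i in range(n)]
    return selected


# Toy usage: six documents on a line, forming two clusters.
points = [0, 1, 2, 10, 11, 12]
toy_dist = [[abs(a - b) for b in points] for a in points]
picked = greedy_k_median(toy_dist, 2)
```

On the toy data the two picks fall in different clusters, so each cluster has a nearby representative. A dispersion ("obnoxious facilities") objective would instead push the picks as far apart as possible.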
G. Zuccon, L. Azzopardi, D. Zhang, and J. Wang, "Top-k Retrieval using Facility Location Analysis," in ECIR Best Paper, 2012.
@inproceedings{ecir2012,
author = {Guido Zuccon and Leif Azzopardi and Dell Zhang and Jun Wang},
title = {Top-k Retrieval using Facility Location Analysis},
booktitle = {ECIR Best Paper},
year = {2012},
url={http://www.dcs.bbk.ac.uk/~dell/publications/dellzhang_ecir2012.pdf}
}

The Efficient Frontier in document ranking
Jun Wang, Mean-Variance Analysis: A New Document Ranking Theory in Information Retrieval, ECIR2009
This paper concerns document ranking in information retrieval, particularly collaborative filtering and recommender systems. In information retrieval systems, the widely accepted probability ranking principle (PRP) suggests that, for optimal retrieval, documents should be ranked in order of decreasing probability of relevance. In this paper, we present a new document ranking paradigm, arguing that a better, more general solution is to optimize the top-n ranked documents as a whole, rather than ranking them independently. Inspired by the Modern Portfolio Theory in finance, we quantify a ranked list of documents on the basis of its expected overall relevance (mean) and its variance; the latter serves as a measure of risk, which has rarely been studied for document ranking in the past. Through the analysis of the mean and variance, we show that an optimal rank order is the one that maximizes the overall relevance (mean) of the ranked list at a given risk level (variance). Based on this principle, we then derive an efficient document ranking algorithm. It extends the PRP by considering both the uncertainty of relevance predictions and correlations between retrieved documents. Furthermore, we quantify the benefits of diversification, and theoretically show that diversifying documents is an effective way to reduce the risk of document ranking. Experimental results on the collaborative filtering problem confirm the theoretical insights with improved recommendation performance, e.g., a gain of over 300% over PRP-based ranking in user-based recommendation.
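The mean-variance trade-off described above can be sketched as a greedy ranker: at each rank position, pick the document whose expected relevance, minus a risk penalty built from its own variance and its covariance with the documents already ranked, is largest. This is a minimal sketch and not necessarily the paper's exact algorithm; the function name `mean_variance_rank`, the risk parameter `b`, and the `1 / (position + 1)` rank weights are assumptions that do not appear in the abstract.

```python
from typing import List, Optional, Sequence


def mean_variance_rank(
    mean: Sequence[float],
    cov: Sequence[Sequence[float]],
    b: float,
    n: int,
    weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Greedy top-n ranking under an assumed mean-variance objective.

    mean[d]   : predicted relevance of document d
    cov[d][e] : covariance between the relevance estimates of d and e
    b         : risk aversion (assumed parameter); b = 0 ranks by mean only
    """
    m = len(mean)
    n = min(n, m)
    if weights is None:
        # Hypothetical rank discount: earlier positions weigh more.
        weights = [1.0 / (pos + 1) for pos in range(n)]
    ranked: List[int] = []
    remaining = set(range(m))
    for pos in range(n):
        def score(d: int) -> float:
            # Own variance, discounted by the weight of this position
            penalty = weights[pos] * cov[d][d]
            # Covariance with every document already placed in the list
            penalty += 2.0 * sum(
                weights[i] * cov[ranked[i]][d] for i in range(pos)
            )
            return mean[d] - b * penalty

        best = max(sorted(remaining), key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Toy usage: documents 0 and 1 are highly correlated, document 2 is not.
toy_mean = [0.9, 0.85, 0.5]
toy_cov = [
    [0.04, 0.038, 0.0],
    [0.038, 0.04, 0.0],
    [0.0, 0.0, 0.04],
]
by_mean_only = mean_variance_rank(toy_mean, toy_cov, b=0.0, n=3)
risk_averse = mean_variance_rank(toy_mean, toy_cov, b=10.0, n=2)
```

With `b = 0` the ranking follows the predicted relevance alone, as under the PRP. With a large `b`, the second position goes to the uncorrelated document 2 rather than the near-duplicate document 1, which is the diversification effect the abstract describes.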
@INPROCEEDINGS{Wang:ecir2009:1,
AUTHOR = {Jun Wang},
TITLE = {{M}ean-Variance Analysis: A New Document Ranking Theory in Information Retrieval},
BOOKTITLE = {Proc. of European Conference on Information Retrieval (ECIR 2009)},
YEAR = {2009}
}
The ECIR Paper (PDF) 
Title: Relevance Models for Collaborative Filtering
defended April 2008
Doctoral Consortium Award, SIGIR2006
Abstract:

Collaborative filtering is the common technique of predicting the
interests of a user by collecting preference information from many
users. Although it is generally regarded as a key information
retrieval technique, its relation to the existing information
retrieval theory is unclear. This thesis shows how the development of
collaborative filtering can gain many benefits from information
retrieval theories and models. It brings the notion of relevance into
collaborative filtering and develops several relevance models for
collaborative filtering. Besides dealing with user profiles that are
obtained by explicitly asking users to rate information items, the
relevance models can also cope with the situations where user profiles
are implicitly supplied by observing user interactions with a
system. Experimental results complement the theoretical insights with
improved recommendation accuracy for both item relevance ranking and
user rating prediction. Furthermore, the approaches are more than just
analogy: our derivations of the unified relevance model show that
popular user-based and item-based approaches represent only a partial
view of the problem, whereas a unified view that brings these partial
views together gives better insights into their relative importance
and how retrieval can benefit from their combination.
A PDF version can be downloaded from here
@PhdThesis{wangphdthesis,
author = {Jun Wang},
title = {Relevance Models for Collaborative Filtering},
school = {Delft University of Technology},
year = {2008},
address = {Delft, The Netherlands},
month = {April},
url = {},
}
Jun Wang, 2008