Title: Relevance Models for Collaborative Filtering
defended April 2008
Doctoral Consortium Award, SIGIR2006
Abstract:

Collaborative filtering is the common technique of predicting the
interests of a user by collecting preference information from many
users. Although it is generally regarded as a key information
retrieval technique, its relation to the existing information
retrieval theory is unclear. This thesis shows how the development of
collaborative filtering can gain many benefits from information
retrieval theories and models. It brings the notion of relevance into
collaborative filtering and develops several relevance models for
collaborative filtering. Besides dealing with user profiles that are
obtained by explicitly asking users to rate information items, the
relevance models can also cope with situations where user profiles
are implicitly supplied by observing user interactions with a
system. Experimental results complement the theoretical insights with
improved recommendation accuracy for both item relevance ranking and
user rating prediction. Furthermore, the approaches are more than just
an analogy: our derivations of the unified relevance model show that
popular user-based and item-based approaches represent only a partial
view of the problem, whereas a unified view that brings these partial
views together gives better insights into their relative importance
and how retrieval can benefit from their combination.
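
To make the "partial views" concrete, here is a minimal Python sketch of a user-based prediction, an item-based prediction, and a simple combination of the two. It is not the unified relevance model derived in the thesis: the linear blend, the weight `lam`, the cosine similarity, the function names, and the toy ratings are all assumptions chosen only for illustration.

```python
from math import sqrt

# Toy ratings (user -> {item: rating}); made-up data for illustration only.
RATINGS = {
    "ann": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 5, "d": 4},
    "cat": {"a": 1, "b": 5, "d": 2},
}


def cosine(x, y):
    """Cosine similarity between two sparse rating dicts (missing entries count as 0)."""
    shared = set(x) & set(y)
    if not shared:
        return 0.0
    dot = sum(x[k] * y[k] for k in shared)
    norm_x = sqrt(sum(v * v for v in x.values()))
    norm_y = sqrt(sum(v * v for v in y.values()))
    return dot / (norm_x * norm_y)


def item_column(item):
    """All ratings given to one item, as user -> rating."""
    return {u: r[item] for u, r in RATINGS.items() if item in r}


def weighted_mean(pairs):
    """Similarity-weighted mean of (weight, rating) pairs; None if no weight."""
    total = sum(w for w, _ in pairs)
    if total == 0:
        return None
    return sum(w * r for w, r in pairs) / total


def user_based(user, item):
    """Partial view 1: ratings of the same item by similar users."""
    pairs = [(cosine(RATINGS[user], r), r[item])
             for u, r in RATINGS.items() if u != user and item in r]
    return weighted_mean(pairs)


def item_based(user, item):
    """Partial view 2: the same user's ratings of similar items."""
    target = item_column(item)
    pairs = [(cosine(target, item_column(i)), r)
             for i, r in RATINGS[user].items() if i != item]
    return weighted_mean(pairs)


def blended(user, item, lam=0.5):
    """Assumed linear blend of the two views (lam=1: user-based only)."""
    return lam * user_based(user, item) + (1 - lam) * item_based(user, item)


if __name__ == "__main__":
    print(user_based("ann", "d"), item_based("ann", "d"), blended("ann", "d"))
```

Setting `lam` to 1 or 0 recovers either partial view on its own, which makes it easy to compare how much each source of evidence contributes for a given user and item.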
A PDF version of the thesis is available for download.
@PhdThesis{wangphdthesis,
author = {Jun Wang},
title = {Relevance Models for Collaborative Filtering},
school = {Delft University of Technology},
year = {2008},
address = {Delft, The Netherlands},
month = {April},
url = {},
}
Jun Wang, 2008

Most retrieval models estimate the relevance of each document to a query and rank the documents accordingly. However, such an approach ignores the uncertainty associated with the estimates of relevancy. If a high estimate of relevancy also has a high uncertainty, then the document may be very relevant or not relevant at all. Another document may have a slightly lower estimate of relevancy but a much lower uncertainty. In such a circumstance, should the retrieval engine risk ranking the first document highest, or should it choose a more conservative (safer) strategy that gives preference to the second document? There is no definitive answer to this question, as it depends on the risk preferences of the user and the information retrieval system. In this paper we present a general framework for modeling uncertainty and introduce an asymmetric loss function with a single parameter that can model the level of risk the system is willing to accept. By adjusting the risk preference parameter, our approach can effectively adapt to users’ different retrieval strategies. We apply this asymmetric loss function to a language modeling framework and obtain a practical risk-aware document scoring function. Our experiments on several TREC collections show that our “risk-averse” approach significantly improves the Jelinek-Mercer smoothing language model, and that a combination of our “risk-averse” approach and the Jelinek-Mercer smoothing method generally outperforms the Dirichlet smoothing method. Experimental results also show that the “risk-averse” approach, even without smoothing from the collection statistics, performs as well as three commonly adopted retrieval models, namely the Jelinek-Mercer and Dirichlet smoothing methods and the BM25 model.
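
The two-document dilemma described above can be illustrated with a short Python sketch. The abstract does not spell out the loss function, so this example assumes the standard LINEX asymmetric loss, L(d) = exp(b·d) − b·d − 1 with d = estimate − true relevance, and a Gaussian belief over each document's relevance; under those assumptions the loss-minimizing score is mean − b·variance/2. The function names, the parameter name `b`, and the numbers are hypothetical, and the scoring function in the paper may differ.

```python
def risk_adjusted_score(mean, variance, b):
    """Point estimate minimizing expected LINEX loss for a Gaussian belief.

    Assumption for illustration: relevance ~ Normal(mean, variance).
    b > 0 penalizes overestimation more (risk-averse),
    b < 0 penalizes underestimation more (risk-seeking),
    b = 0 falls back to ranking by the mean alone.
    """
    return mean - 0.5 * b * variance


def rank(docs, b):
    """Order document names by risk-adjusted score, best first."""
    return sorted(
        docs,
        key=lambda name: risk_adjusted_score(docs[name][0], docs[name][1], b),
        reverse=True,
    )


# Hypothetical (mean, variance) relevance estimates for two documents.
DOCS = {
    "risky": (0.80, 0.20),  # higher estimate, high uncertainty
    "safe": (0.70, 0.01),   # slightly lower estimate, low uncertainty
}

if __name__ == "__main__":
    for b in (-2.0, 0.0, 2.0):
        print(b, rank(DOCS, b))
```

With these made-up numbers, a risk-neutral or risk-seeking setting ranks the uncertain document first, while a sufficiently risk-averse setting prefers the safer one; the single parameter `b` is what moves the ranking between the two strategies.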
Our work on this topic has been accepted at SIGIR 2009 and ECIR 2009.
@INPROCEEDINGS{Wang:sigir2009:2,
AUTHOR = {Jianhan Zhu and Jun Wang and Michael Taylor and Ingemar Cox},
TITLE = {Risky Business: Modeling and Exploiting Uncertainty in Information Retrieval},
BOOKTITLE = {SIGIR09 Full Paper},
YEAR = {2009}
}
@INPROCEEDINGS{Wang:ecir2009:2,
AUTHOR = {Jianhan Zhu and Jun Wang and Michael J Taylor and Ingemar Cox},
TITLE = {Risk-aware Information Retrieval},
BOOKTITLE = {Proc. of European Conference on Information Retrieval (ECIR 2009)},
YEAR = {2009}
}
