A Unified Relevance Retrieval Model by Eliteness Hypothesis

June 16th, 2011 No comments
  • G. Jagadeesh, S. E. Robertson, and J. Wang, "A Unified Relevance Retrieval Model by Eliteness Hypothesis," in ArXiv e-prints (Working Paper), 2011. bibtex
    Go to document
    @inproceedings{unifiedmodel-ictir,
      author = {Gorla. Jagadeesh and S.E. Robertson and Jun Wang},
      title = {A Unified Relevance Retrieval Model by Eliteness Hypothesis},
      booktitle = {ArXiv e-prints (Working Paper)},
      year = {2011},
      Howpublished={ArXiv e-prints, \url{http://arxiv.org/abs/1106.2946}},
      pdf={http://arxiv.org/pdf/1106.2946v6.pdf},
      }
In this paper, an Eliteness Hypothesis for information retrieval is proposed, where we define two generative processes to create information items and queries. By assuming the deterministic relationships between the eliteness of terms and relevance, we obtain a new theoretical retrieval framework. The resulting ranking function is a unified one as it is capable of using available relevance information on both the document and the query, which is otherwise unachievable by existing retrieval models. Our preliminary experiment on a simple ranking function has demonstrated the potential of the approach.
Categories: Information Retrieval Models Tags:

ECIR2011 Tutorial: Risk Management in Information Retrieval

November 1st, 2010 Comments off
Risk Management

Risk Management (image courtesy of matrixsafety.com.au)

Jun Wang, University College London and Kevyn Collins-Thompson, Microsoft Research

  • J. Wang and K. Collins-Thompson, "ECIR2011 Tutorial: Risk Management in Information Retrieval," in ECIR, 2011. bibtex
    Go to document
    @inproceedings{ecirtutorial,
      author = {Jun Wang and Kevyn Collins-Thompson},
      title = {ECIR2011 Tutorial: Risk Management in Information Retrieval},
      booktitle = {ECIR},
      year = {2011},
      pdf={http://web4.cs.ucl.ac.uk/staff/jun.wang/papers/2011-ECIRTutorial-RiskIR.pdf},
      }

1. Course objectives

Risk modelling and management are a new concept in Information Retrieval (IR) modelling. The new way of  thinking has significantly departed from the classic information retrieval methodologies originated from the Probability Ranking Principle, the Robertson- Spärck Jones model (the resulting BM25 formula), and the statistical language models. The recent research  in risk-aware IR models has been made by taking an analogy with the financial risk management. It has been demonstrated that the idea about uncertainty and retrieval risks, as well as the resulting mathematical modelling tools, not only provide theoretical explanations of some empirical retrieval results (e.g., the need for diversification, the trade-off between MAP and MRR, and the justification for pseudo-relevance feedback), but also help us develop useful retrieval techniques such as risk-aware query expansion and optimal document ranking.

The tutorial is aimed at providing a comprehensive introduction of the emerging risk modelling and management techniques in information retrieval systems. While the basic theories (such as portfolio theory of IR and mean-variance analysis) and risk models of information retrieval are covered, the tutorial is also focused on the resulting practical algorithms of query expansion, relevance ranking, IR metric optimization as well as their performance evaluations. Practical IR applications such as Web search engines, multimedia retrieval, and collaborative filtering will also be covered, as well as discussion of new opportunities for future research and applications.

2. Its relevance to the information retrieval community

The theoretical research and mathematical modelling of IR systems are the key driver behind the development of the information retrieval (IR) field. The first theoretical study of information retrieval can be traced back to Maron and Kuhns’ Probabilistic Indexing paper in 1960, where the classic school of thinking about IR systems was established [1]. We call it individualism here as the goal is to come up with a relevance score for each of the documents, and to rank them individually with respect to those scores [2, 3]. The two different document-oriented and query oriented views on how to assign a probability of relevance of a document to a user need has resulted in two different types of practical models. The Robertson- Spärck Jones model (the resulting BM25 formula) takes the query-oriented view (or need-oriented view), assuming a given information need and choosing the query representation in order to select relevant documents against others [4], while in the document-oriented view, the language models aim to choose the appropriate document representation to match queries or judge its relevance [5, 6].

The individualism has drawbacks [7]. Recently, there has been a new type of theoretical research that uses a probabilistic framework to move beyond the Probability Ranking Principle and determine optimal ranking algorithms by considering the finial-ranking context [8, 9, 10] or regarding the returned documents as a whole [11]. Portfolio theory of information retrieval has emerged as a new theory in IR modelling and provided a sounded theoretical framework and more powerful mathematical tool for us to understand and analyze retrieval processes such as query expansion [12,13], document ranking [14, 15], and evaluation [16, 17]. Drawing an analogy with finance and financial risk modelling [18], it has been realized that ranking documents under uncertainty is not just about picking individual relevant documents (Individualism), but about choosing the right combination of relevant documents [14]. Here, we name this new information retrieval modelling methodology Portfolioism. This motivates the IR researchers to quantify a ranked list of documents on the basis of its expected overall relevance (mean) and its variance, where the latter serves as a measure of risk.

The risk modelling and portfolio theory of IR provide theoretical explanations of some empirical retrieval results such as the need for diversification, the trade-off between MAP and MRR, and the justification for pseudo-relevance feedback. They also guide us to develop many useful retrieval techniques. For example, risk modelling has been successfully applied in query expansion [12, 13], and mean-variance analysis has been adopted for evaluation [16, 17]. The latest development in risk modelling has been used to directly maximize the expectation of various standard metrics (average precision, mean reciprocal rank, discounted cumulative gain, etc) [19].

3. The tutorial content and how the tutorial will be structured

Theme 1 The classic risk modelling era (Individualism)– 30 minutes (It should be emphasized that many previous tutorials on information retrieval models were limited only to theme 1 (mostly 1 and 2). To have a complete and consistent view, we will quickly spend half hour on it and yet provide a rather fresh discussion and new insights from the novel risk-modeling viewpoint.)

  1. Probability Ranking Principle [1, 2, 3, 7]
  2. Classic IR models [4, 5]
  3. Handling Ranking Context [8, 9, 10]

Theme 2 The Modern risk modelling era (Portfolioism) - 120 minutes

  1. Ranking, IR metric optimization, and mean-variance analysis [14, 15, 19]
  2. Risk-aware Query expansion [12, 13]
  3. Evaluation [16, 17]
  4. Other applications (multimedia retrieval, collaborative filtering, etc.) [11, 24, 25]

Theme 3 Relation to other recent developments – 30 minutes

  1. Robust classification [26, 27]
  2. Diversification and Multi-armed Bandit Machine [20, 21, 22]

10. The relation to Quantum Probability Ranking Principle [23]

11. Future opportunities

4. Planned course materials

Slides, handouts, and demos.

5. Key references

[1]    M. E. Maron and J. L. Kuhns. On relevance, probabilistic indexing and information retrieval. J. ACM, 7(3), 1960.

http://portal.acm.org/ft_gateway.cfm?id=321035&type=pdf&CFID=15232325&CFTOKEN=45719553

[2]    W. S. Cooper. The inadequacy of probability of usefulness as a ranking criterion for retrieval system output. University of California, Berkeley, 1971.

[3]    S. E. Robertson and N. Belkin. Ranking in principle. Journal of Documentation, 1978.

[4]    S. E. Robertson and K. Spärck Jones. Relevance weighting of search terms. Journal of the American Society for Information Science, 27(3):129–46, 1976.

[5]    D. Hiemstra. Using language models for information retrieval. PhD thesis, Univ.Twente, 2001.

[6]    J. Lafferty and C. Zhai. Document language models, query models, and risk minimization for information retrieval. In SIGIR, 2001.

[7]    M. D. Gordon and P. Lenk. A utility theoretic examination of the probability ranking principle in information retrieval. JASIS, 42(10):703–714, 1991.

[8]    C. Zhai and J. D. Lafferty. A risk minimization framework for information retrieval. Inf. Process. Manage., 42(1):31–55, 2006.

[9]    H. Chen and D. R. Karger. Less is more: probabilistic models for retrieving fewer relevant documents. In SIGIR, 2006.

[10] J. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR, 1998.

[11] Jun Wang. “Mean-variance analysis: A new document ranking theory in information retrieval. In Proc. of European Conference on Information Retrieval (ECIR 2009), 2009.

[12] K. Collins-Thompson. “Estimating robust query models with convex optimization”. Advances in Neural Information Processing Systems 21  (NIPS), 2008

[13] K. Collins-Thompson. “Reducing the risk of query expansion via robust constrained optimization”.  in CIKM, 2009

[14] Jun Wang and Jianhan Zhu. Portfolio theory of information retrieval. In SIGIR09, 2009.

[15] Jianhan Zhu, Jun Wang, Michael Taylor, and Ingemar Cox. Risky business: Modeling and exploiting uncertainty in information retrieval. In SIGIR09, 2009.

[16] K. Collins-Thompson. “Accounting for stability of retrieval algorithms using risk-reward curves”. Proceedings of SIGIR 2009 Workshop on the Future of Evaluation in Information Retrieval, Boston. pg. 27-28.

[17] J. Zhu, J. Wang, and V. Vinay, “Topic (Query) Selection for IR Evaluation,” in SIGIR09

[18] H. Markowitz. Portfolio selection. Journal of Finance, 1952.

[19] J. Wang and J. Zhu, “On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics,” in SIGIR10 Full Paper, 2010

[20] A. Slivkins, F. Radlinski and S. Gollapudi, Learning Optimally Diverse Rankings over Large Document Collections, ICML 2010.

[21] Thorsten Joachims, Thomas Hofmann, Yisong Yue, Chun-Nam Yu, Predicting Structured Objects with Support Vector Machines, Communications of the ACM (CACM)

[22] C. L. A. Clarke, M. Kolla, G. V. Cormack, O. Vechtomova, A. Ashkan, S. B¨uttcher, and I. MacKinnon. Novelty and diversity in information retrieval evaluation. In SIGIR, 2008.

[23]  G. Zuccon, L. Azzopardi, Using the Quantum Probability Ranking Principle to Rank Interdependent Documents, in ECIR 2010

[24] Robin Aly, Aiden Doherty, Djoerd Hiemstra, Alan Smeaton. “Beyond Shot Retrieval: Searching for Broadcast News Items Using Language Models of Concepts’’. In ECIR, 2010

[25] Portfolio Theory of Multimedia Fusion, X.Y. Wang and M.S. Kankanhalli, ACM International Conference on Multimedia (ACMMM 2010), 2010.

[26] Lanckriet, G.R.G., El Ghaoui, L., Bhattacharyya, C., Jordan, M.I. (2002). A Robust Minimax Approach to Classification . Journal of Machine Learning Research, Vol. 3, pp. 555-582.

[27] Dredze, M., Crammer, K., and Pereira, F. 2008. Confidence-weighted linear classification. In Proceedings of the 25th international Conference on Machine Learning (Helsinki, Finland, July 05 – 09, 2008). ICML ’08, vol. 307. ACM, New York, NY, 264-271.

ACM RecSys10: Optimizing multiple objectives in collaborative filtering

October 20th, 2010 Comments off

This paper is about the utility of making personalized recommendations. While it is important to accurately predict the target user’s preference, in practice the accuracy should not be the only concern; a useful recommender system needs to consider the user’s utility or satisfaction of fulfilling a certain information seeking task. For example, recommending popular items (products) is unlikely to result in more gain than discovering insignificant (“long tail”) yet liked items because the popular ones might be already known to the user. Equally, recommending items that are out of stock would be frustrating for both the user and system if the system is employed to discover items to purchase. Thus, it is important to have a flexible recommendation framework that takes into account additional recommendation goals meanwhile minimizing the performance loss in order to provide greater adjustability and a better user experience.

To achieve this, in this paper, we propose a general recommendation optimization framework that not only considers the predicted preference scores (e.g. ratings) but also deals with additional operational or resource related recommendation goals. Using this framework we demonstrate through realistic examples how to expand existing rating prediction algorithms by biasing the recommendation depending on other external factors such as the availability, profitability or usefulness of an item. Our experiments on real data sets demonstrate that this framework is indeed able to cope with multiple objectives with minor performance loss.

http://portal.acm.org/citation.cfm?id=1864708.1864723

ACM SIGIR2010 Optimal Ranker

August 11th, 2010 Comments off
Optimal Ranker

Optimal Ranker

The slides of  the SIGIR talk will be put online.


A demo of portfolio theory of movie recommendation

May 10th, 2010 Comments off

Picture 1

Thanks to David Stefan, a movie recommendation demo system is up and running. This is a 4th year final project and intended to demo the effectiveness of using portfolio theory to recommender systems.

To launch the application, click here. (if you found any bugs, do drop us an email!)

On Statistical Analysis and Optimization of IR Effectiveness Metrics

April 9th, 2010 Comments off
b

The two distinct stages in the statistical document ranking process.

a

By adjusting the correlation between documents from -0.2 to 1.0, the gain on performance for Average Precision, DCG (Discounted Cumulative Gain), and RR (Reciprocal Rank), respectively.

This paper proposes a probabilistic framework for optimizing IR
effectiveness metrics during retrieval. If the effectiveness of a
ranked list can be defined by an IR metric, the optimal rank
decision should be chosen by optimizing the metric. During retrieval
the relevance of documents is, however, not known a priori, and the
resulting optimization objective function is the expected value of
that IR metric with respect to a probability measure of the
documents’ relevance. To have a general probabilistic optimization
framework, we use the joint probability of relevance to
measure the uncertainty of all the documents’ relevance in the
collection as a whole, and statistically analyze the expected values
of IR metrics under such uncertainty. The analytical forms of
expected IR metrics show some interesting properties that have not
been revealed before. Our analysis and optimization framework do not
assume a particular (relevance) retrieval model and metric, making
it applicable to many existing IR models and metrics. The
experiments demonstrate its significance in adapting to IR metrics
for the text retrieval task.

  • J. Wang and J. Zhu, "On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics," in Proc. of the Annual International ACM SIGIR Conference on Research and Development on Information Retrieval (SIGIR), 2010. bibtex
    Go to document
    @INPROCEEDINGS{Wang:sigir2010:1,
      author = {Jun Wang and Jianhan Zhu},
      TITLE = {On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics},
      BOOKTITLE = {Proc. of the Annual International ACM SIGIR Conference on Research and Development on Information Retrieval (SIGIR)},
      pdf={http://web4.cs.ucl.ac.uk/staff/jun.wang/papers/2010-sigir10-optranker.pdf},
      YEAR = {2010}
    }

ECIR 2010 Group Photo, Bletchley Park

March 31st, 2010 No comments

ECIR Tour of Bletchley Park

Categories: ECIR2010, Events Tags:

Directional Error Based Collaborative Filtering

December 18th, 2009 No comments

Goal-driven Collaborative Filtering – A Directional Error Based Approach, ECIR10

Tamas Jambor and Jun Wang

  • T. Jambor and J. Wang, "Goal-driven Collaborative Filtering: A Directional Error Based Approach," in Proc. of European Conference on Information Retrieval (ECIR 2010). Oral Presentation, 2010. bibtex
    Go to document Go to document
    @INPROCEEDINGS{Wang:ecir2010,
      author = {Tamas Jambor and Jun Wang},
      TITLE = {Goal-driven Collaborative Filtering: A Directional Error Based Approach},
      BOOKTITLE = {Proc. of European Conference on Information Retrieval (ECIR 2010). Oral Presentation},
      pdf={http://web4.cs.ucl.ac.uk/staff/jun.wang/papers/2010-ecir10-gdcf.pdf},
      url={http://www.springerlink.com/content/d67350166q173675/},
      YEAR = {2010}
    }
directionalerror

Directional Risk Preference of Recommendation Prediction

Collaborative filtering is one of the most effective techniques for making personalized content recommendation. In the literature, a common experimental setup in the modeling phase is to minimize,
either explicitly or implicitly, the (expected) error between the predicted ratings and the true user ratings, while in the evaluation phase, the resulting model is again assessed by that error. In this paper,  we argue that defining an error function that is fixed across rating scales is however limited, and different applications may have different recommendation goals thus error functions. For example, in some cases, we might be more concerned about the highly predicted items than the ones with low ratings (precision minded), while in other cases, we want to make sure not to miss any highly
rated items (recall minded). Additionally, some applications might require to produce a top-N recommendation list, where the rank-based performance measure becomes valid. To address this issue, we propose a flexible optimization framework that can adapt to individual recommendation goals. We introduce a Directional Error Function to capture the cost (risk) of each individual  predictions, and it can be learned from the specified performance measures at hand. Our preliminary experiments on a real data set demonstrate that significant performance gains have been achieved.

CNIKM 2009 Papers online

December 8th, 2009 Comments off

The workshop proceeding have been online.

http://www.informatik.uni-trier.de/~ley/db/conf/cikm/cnikm2009.html

The call for papers attracted 19 submissions from 12 countries (Australia, Brazil, Canada, China, Germany, India, Korea, Netherlands, Singapore, Sweden, Switzerland and United Kingdom). The program committee accepted 9 full papers and 3 short papers that cover a variety of topics, including community detection, information spread, centrality analysis, link prediction, peer-to-peer networks and recommender systems. In addition, the program includes a keynote speech by Dr.  Jure Leskovec of Stanford University on the clustering structure of very large networks. We hope that these proceedings will serve as a valuable reference for researchers and practitioners in this area.

Categories: CIKM2009 Tags:

Invited Talk from Microsoft Research, Thursday, 17 Sept, 2-3pm

September 15th, 2009 No comments

Title:

A convex optimization framework for stable, selective query expansion
Time
: Thursday, 17 Sept 2009, 2-3pm
Location: South Wing Garwood LT, University College London
http://www.ucl.ac.uk/efd/roombooking/building-location/?id=012

Invited Speaker: Dr. Kevyn Collins-Thompson,  Microsoft Research, USA

Abstract: Query expansion is a long-studied approach for improving retrieval effectiveness by enhancing the user’s original query with additional related words. Even state-of-the-art expansion methods, however, have problems that have limited their deployment in real-world scenarios. For example, current techniques do a poor job of balancing risk and reward: they are optimized to perform well on average, but have high variance across queries, significantly hurting performance in some cases. Inspired by portfolio risk models in computational finance, I’ll discuss how the risk of query expansion algorithms can be reduced by casting query expansion in terms of constrained convex optimization over a word graph. Our approach has several advantages. First, it provides an effective way to balance risk and reward. Results across multiple standard test collections show consistent reductions in the number and magnitude of expansion failures with no loss in the strong average-case gain of the underlying expansion algorithm. Second, we obtain a powerful framework for integrating multiple sources of domain knowledge by encoding them as simultaneous optimization constraints. Third, while existing methods typically select expansion terms individually, our approach jointly optimizes over expansion terms as a set, allowing us to capture and reduce set-based phenomena such as aspect imbalance, a traditional source of query drift. Finally, we obtain a natural way to do selective expansion, so that our algorithm automatically reduces or avoids query expansion in risky scenarios. Our approach is easy to apply to existing query expansion algorithms and has fast, real-time implementations.

Bio:

Kevyn Collins-Thompson is a Researcher in the Context, Learning and
User Experience for Search (CLUES) group at Microsoft Research.   He
completed his PhD at Carnegie Mellon University in 2008, where his
advisor was Jamie Callan.  His thesis developed theoretical models,
algorithms, and evaluation methods that account for risk in
information retrieval, focusing on query expansion.  His research
interests include robust algorithms for IR, modeling temporal or
evolutional dynamics of text, and understanding how the brain acquires
language skills.  Kevyn also has more than ten years of industry
experience as a software developer and manager, responsible for
shipping advanced features in Office, Windows, Tablet PC, and many
other products.

Categories: Events Tags: