BCS IRSG Tutorial: Information Retrieval & Data Analytics
http://irsg.bcs.org/SearchSolutions/2011/ss2011tutorials.php 
09:00-12:30 Tuesday 15th November, Covent Garden
The tutorial is aimed at providing an up-to-date introduction of information retrieval and data analytics (data mining) techniques. It is about how to automatically find relevant information from large-scale data collections and subsequently extract meaningful patterns from it. While the basic concepts and statistical methods of information retrieval and data mining are covered, the course is primarily focused on practical algorithms of textual document indexing, relevance ranking, web usage mining, text analytics, as well as their performance evaluations. Practical retrieval and data mining applications such as web search engines, personalisation and recommender systems, the mining of frequent patterns, associations and correlations, anomaly detection will also be covered.
DDR-2012: Diversity in Document Retrieval
OverviewWhen an ambiguous query is received, a sensible approach is for the information retrieval (IR) system to diversify the results retrieved for this query, in the hope that at least one of the interpretations of the query intent will satisfy the user. Diversity is an increasingly important topic, of interest to both academic researchers (such as participants in the TREC Web and Blog track diversity tasks), as well as to search engines professionals. In this workshop, we solicit submissions both on approaches and models for diversity, the evaluation of diverse search results, and on applications and presentation of diverse search results.
Important Dates
5th December: Papers due
10th January: Notification of Acceptance
17th January: Camera-Ready papers due
12th February: DDR-2012 Workshop
Location
DDR 2012 will be co-located with the 5th ACM International Conference
on Web Search and Data Mining (WSDM 2012) in Seattle, on 12th February
2012.
Organisers
Craig Macdonald, University of Glasgow, UK
Jun Wang, University College London, UK
Charles Clarke, University of Waterloo, Canada
CIKM11 Tutorial
-
J. Wang and K. Collins-Thompson, "CIKM2011 Tutorial: Statistical Information Retrieval Modelling: From Probability Ranking Principle to recent advances in diversity, Portfolio Theory, and beyond," in CIKM, 2011. bibtex
@inproceedings{cikmtutorial,
author = {Jun Wang and Kevyn Collins-Thompson},
title = {CIKM2011 Tutorial: Statistical Information Retrieval Modelling: From Probability Ranking Principle to recent advances in diversity, Portfolio Theory, and beyond},
booktitle = {CIKM},
year = {2011},
pdf={http://web4.cs.ucl.ac.uk/staff/jun.wang/blog/},
}
- This is version 2 of a tutorial presented in ECIR2011, now with a much more focus on general retrieval modelling. Any suggestions are welcome!
DiveRS 2011 – International Workshop on Novelty and Diversity in Recommender Systems
http://ir.ii.uam.es/divers2011/
DiveRS 2011 aims to gather researchers and practitioners interested in the role of novelty and diversity in recommender systems. The workshop seeks to advance towards a better understanding of what novelty and diversity are, how they can improve the effectiveness of recommendation methods and the utility of their outputs. We aim to identify open problems, relevant research directions, and opportunities for innovation in the recommendation business. The workshop seeks to stir further interest for these topics in the community, and stimulate the research and progress in this area.
We welcome the participation of researchers, students, and practitioners in the Recommender Systems community and related areas such as Information Retrieval, Data Mining, Machine Learning, and Human-Computer Interaction, working in different application domains, working on or interested in the workshop topics.
A Unified Relevance Retrieval Model by Eliteness Hypothesis
-
G. Jagadeesh, S. E. Robertson, and J. Wang, "A Unified Relevance Retrieval Model by Eliteness Hypothesis," in ArXiv e-prints (Working Paper), 2011. bibtex
@inproceedings{unifiedmodel-ictir,
author = {Gorla. Jagadeesh and S.E. Robertson and Jun Wang},
title = {A Unified Relevance Retrieval Model by Eliteness Hypothesis},
booktitle = {ArXiv e-prints (Working Paper)},
year = {2011},
Howpublished={ArXiv e-prints, \url{http://arxiv.org/abs/1106.2946}},
pdf={http://web4.cs.ucl.ac.uk/staff/jun.wang/papers/2011-ictir-unifiedmodel.pdf},
}
ECIR2011 Tutorial: Risk Management in Information Retrieval

Risk Management (image courtesy of matrixsafety.com.au)
Jun Wang, University College London and Kevyn Collins-Thompson, Microsoft Research
-
J. Wang and K. Collins-Thompson, "ECIR2011 Tutorial: Risk Management in Information Retrieval," in ECIR, 2011. bibtex
@inproceedings{ecirtutorial,
author = {Jun Wang and Kevyn Collins-Thompson},
title = {ECIR2011 Tutorial: Risk Management in Information Retrieval},
booktitle = {ECIR},
year = {2011},
pdf={http://web4.cs.ucl.ac.uk/staff/jun.wang/papers/2011-ECIRTutorial-RiskIR.pdf},
}
1. Course objectives
Risk modelling and management are a new concept in Information Retrieval (IR) modelling. The new way of thinking has significantly departed from the classic information retrieval methodologies originated from the Probability Ranking Principle, the Robertson- Spärck Jones model (the resulting BM25 formula), and the statistical language models. The recent research in risk-aware IR models has been made by taking an analogy with the financial risk management. It has been demonstrated that the idea about uncertainty and retrieval risks, as well as the resulting mathematical modelling tools, not only provide theoretical explanations of some empirical retrieval results (e.g., the need for diversification, the trade-off between MAP and MRR, and the justification for pseudo-relevance feedback), but also help us develop useful retrieval techniques such as risk-aware query expansion and optimal document ranking.
The tutorial is aimed at providing a comprehensive introduction of the emerging risk modelling and management techniques in information retrieval systems. While the basic theories (such as portfolio theory of IR and mean-variance analysis) and risk models of information retrieval are covered, the tutorial is also focused on the resulting practical algorithms of query expansion, relevance ranking, IR metric optimization as well as their performance evaluations. Practical IR applications such as Web search engines, multimedia retrieval, and collaborative filtering will also be covered, as well as discussion of new opportunities for future research and applications.
2. Its relevance to the information retrieval community
The theoretical research and mathematical modelling of IR systems are the key driver behind the development of the information retrieval (IR) field. The first theoretical study of information retrieval can be traced back to Maron and Kuhns’ Probabilistic Indexing paper in 1960, where the classic school of thinking about IR systems was established [1]. We call it individualism here as the goal is to come up with a relevance score for each of the documents, and to rank them individually with respect to those scores [2, 3]. The two different document-oriented and query oriented views on how to assign a probability of relevance of a document to a user need has resulted in two different types of practical models. The Robertson- Spärck Jones model (the resulting BM25 formula) takes the query-oriented view (or need-oriented view), assuming a given information need and choosing the query representation in order to select relevant documents against others [4], while in the document-oriented view, the language models aim to choose the appropriate document representation to match queries or judge its relevance [5, 6].
The individualism has drawbacks [7]. Recently, there has been a new type of theoretical research that uses a probabilistic framework to move beyond the Probability Ranking Principle and determine optimal ranking algorithms by considering the finial-ranking context [8, 9, 10] or regarding the returned documents as a whole [11]. Portfolio theory of information retrieval has emerged as a new theory in IR modelling and provided a sounded theoretical framework and more powerful mathematical tool for us to understand and analyze retrieval processes such as query expansion [12,13], document ranking [14, 15], and evaluation [16, 17]. Drawing an analogy with finance and financial risk modelling [18], it has been realized that ranking documents under uncertainty is not just about picking individual relevant documents (Individualism), but about choosing the right combination of relevant documents [14]. Here, we name this new information retrieval modelling methodology Portfolioism. This motivates the IR researchers to quantify a ranked list of documents on the basis of its expected overall relevance (mean) and its variance, where the latter serves as a measure of risk.
The risk modelling and portfolio theory of IR provide theoretical explanations of some empirical retrieval results such as the need for diversification, the trade-off between MAP and MRR, and the justification for pseudo-relevance feedback. They also guide us to develop many useful retrieval techniques. For example, risk modelling has been successfully applied in query expansion [12, 13], and mean-variance analysis has been adopted for evaluation [16, 17]. The latest development in risk modelling has been used to directly maximize the expectation of various standard metrics (average precision, mean reciprocal rank, discounted cumulative gain, etc) [19].
3. The tutorial content and how the tutorial will be structured
Theme 1 The classic risk modelling era (Individualism)– 30 minutes (It should be emphasized that many previous tutorials on information retrieval models were limited only to theme 1 (mostly 1 and 2). To have a complete and consistent view, we will quickly spend half hour on it and yet provide a rather fresh discussion and new insights from the novel risk-modeling viewpoint.)
- Probability Ranking Principle [1, 2, 3, 7]
- Classic IR models [4, 5]
- Handling Ranking Context [8, 9, 10]
Theme 2 The Modern risk modelling era (Portfolioism) - 120 minutes
- Ranking, IR metric optimization, and mean-variance analysis [14, 15, 19]
- Risk-aware Query expansion [12, 13]
- Evaluation [16, 17]
- Other applications (multimedia retrieval, collaborative filtering, etc.) [11, 24, 25]
Theme 3 Relation to other recent developments – 30 minutes
- Robust classification [26, 27]
- Diversification and Multi-armed Bandit Machine [20, 21, 22]
10. The relation to Quantum Probability Ranking Principle [23]
11. Future opportunities
4. Planned course materials
Slides, handouts, and demos.
5. Key references
[1] M. E. Maron and J. L. Kuhns. On relevance, probabilistic indexing and information retrieval. J. ACM, 7(3), 1960.
http://portal.acm.org/ft_gateway.cfm?id=321035&type=pdf&CFID=15232325&CFTOKEN=45719553
[2] W. S. Cooper. The inadequacy of probability of usefulness as a ranking criterion for retrieval system output. University of California, Berkeley, 1971.
[3] S. E. Robertson and N. Belkin. Ranking in principle. Journal of Documentation, 1978.
[4] S. E. Robertson and K. Spärck Jones. Relevance weighting of search terms. Journal of the American Society for Information Science, 27(3):129–46, 1976.
[5] D. Hiemstra. Using language models for information retrieval. PhD thesis, Univ.Twente, 2001.
[6] J. Lafferty and C. Zhai. Document language models, query models, and risk minimization for information retrieval. In SIGIR, 2001.
[7] M. D. Gordon and P. Lenk. A utility theoretic examination of the probability ranking principle in information retrieval. JASIS, 42(10):703–714, 1991.
[8] C. Zhai and J. D. Lafferty. A risk minimization framework for information retrieval. Inf. Process. Manage., 42(1):31–55, 2006.
[9] H. Chen and D. R. Karger. Less is more: probabilistic models for retrieving fewer relevant documents. In SIGIR, 2006.
[10] J. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR, 1998.
[11] Jun Wang. “Mean-variance analysis: A new document ranking theory in information retrieval. In Proc. of European Conference on Information Retrieval (ECIR 2009), 2009.
[12] K. Collins-Thompson. “Estimating robust query models with convex optimization”. Advances in Neural Information Processing Systems 21 (NIPS), 2008
[13] K. Collins-Thompson. “Reducing the risk of query expansion via robust constrained optimization”. in CIKM, 2009
[14] Jun Wang and Jianhan Zhu. Portfolio theory of information retrieval. In SIGIR09, 2009.
[15] Jianhan Zhu, Jun Wang, Michael Taylor, and Ingemar Cox. Risky business: Modeling and exploiting uncertainty in information retrieval. In SIGIR09, 2009.
[16] K. Collins-Thompson. “Accounting for stability of retrieval algorithms using risk-reward curves”. Proceedings of SIGIR 2009 Workshop on the Future of Evaluation in Information Retrieval, Boston. pg. 27-28.
[17] J. Zhu, J. Wang, and V. Vinay, “Topic (Query) Selection for IR Evaluation,” in SIGIR09
[18] H. Markowitz. Portfolio selection. Journal of Finance, 1952.
[19] J. Wang and J. Zhu, “On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics,” in SIGIR10 Full Paper, 2010
[20] A. Slivkins, F. Radlinski and S. Gollapudi, Learning Optimally Diverse Rankings over Large Document Collections, ICML 2010.
[21] Thorsten Joachims, Thomas Hofmann, Yisong Yue, Chun-Nam Yu, Predicting Structured Objects with Support Vector Machines, Communications of the ACM (CACM)
[22] C. L. A. Clarke, M. Kolla, G. V. Cormack, O. Vechtomova, A. Ashkan, S. B¨uttcher, and I. MacKinnon. Novelty and diversity in information retrieval evaluation. In SIGIR, 2008.
[23] G. Zuccon, L. Azzopardi, Using the Quantum Probability Ranking Principle to Rank Interdependent Documents, in ECIR 2010
[24] Robin Aly, Aiden Doherty, Djoerd Hiemstra, Alan Smeaton. “Beyond Shot Retrieval: Searching for Broadcast News Items Using Language Models of Concepts’’. In ECIR, 2010
[25] Portfolio Theory of Multimedia Fusion, X.Y. Wang and M.S. Kankanhalli, ACM International Conference on Multimedia (ACMMM 2010), 2010.
[26] Lanckriet, G.R.G., El Ghaoui, L., Bhattacharyya, C., Jordan, M.I. (2002). A Robust Minimax Approach to Classification . Journal of Machine Learning Research, Vol. 3, pp. 555-582.
[27] Dredze, M., Crammer, K., and Pereira, F. 2008. Confidence-weighted linear classification. In Proceedings of the 25th international Conference on Machine Learning (Helsinki, Finland, July 05 – 09, 2008). ICML ’08, vol. 307. ACM, New York, NY, 264-271.
ACM RecSys10: Optimizing multiple objectives in collaborative filtering

This paper is about the utility of making personalized recommendations. While it is important to accurately predict the target user’s preference, in practice the accuracy should not be the only concern; a useful recommender system needs to consider the user’s utility or satisfaction of fulfilling a certain information seeking task. For example, recommending popular items (products) is unlikely to result in more gain than discovering insignificant (“long tail”) yet liked items because the popular ones might be already known to the user. Equally, recommending items that are out of stock would be frustrating for both the user and system if the system is employed to discover items to purchase. Thus, it is important to have a flexible recommendation framework that takes into account additional recommendation goals meanwhile minimizing the performance loss in order to provide greater adjustability and a better user experience.
To achieve this, in this paper, we propose a general recommendation optimization framework that not only considers the predicted preference scores (e.g. ratings) but also deals with additional operational or resource related recommendation goals. Using this framework we demonstrate through realistic examples how to expand existing rating prediction algorithms by biasing the recommendation depending on other external factors such as the availability, profitability or usefulness of an item. Our experiments on real data sets demonstrate that this framework is indeed able to cope with multiple objectives with minor performance loss.
http://portal.acm.org/citation.cfm?id=1864708.1864723
ACM SIGIR2010 Optimal Ranker

Optimal Ranker
The slides of the SIGIR talk will be put online.
A demo of portfolio theory of movie recommendation
Thanks to David Stefan, a movie recommendation demo system is up and running. This is a 4th year final project and intended to demo the effectiveness of using portfolio theory to recommender systems.
To launch the application, click here. (if you found any bugs, do drop us an email!)


