New MSc and MRes Programme on Web Science

January 16th, 2012
The Computer Science department is launching MSc and MRes courses in Web Science. These programmes will give students a knowledge and understanding of the fundamental principles and technological components of the World Wide Web, preparing them for a career in scientific research or in Internet-based industries. Topics will include multimedia information processing, information search and retrieval, data mining and knowledge acquisition, and large-scale distributed data analytics.
For more information on Web Science please see here:

BCS IRSG Tutorial: Information Retrieval & Data Analytics

September 23rd, 2011

http://irsg.bcs.org/SearchSolutions/2011/ss2011tutorials.php

09:00-12:30 Tuesday 15th November, Covent Garden

The tutorial aims to provide an up-to-date introduction to information retrieval and data analytics (data mining) techniques. It is about how to automatically find relevant information in large-scale data collections and subsequently extract meaningful patterns from them. While the basic concepts and statistical methods of information retrieval and data mining are covered, the course focuses primarily on practical algorithms for textual document indexing, relevance ranking, web usage mining, and text analytics, as well as their performance evaluation. Practical retrieval and data mining applications such as web search engines, personalisation and recommender systems, the mining of frequent patterns, associations and correlations, and anomaly detection will also be covered.

DDR-2012: Diversity in Document Retrieval

September 6th, 2011
Overview
When an ambiguous query is received, a sensible approach is for the information retrieval (IR) system to diversify the results retrieved for this query, in the hope that at least one of the interpretations of the query intent will satisfy the user. Diversity is an increasingly important topic, of interest both to academic researchers (such as participants in the TREC Web and Blog track diversity tasks) and to search engine professionals. In this workshop, we solicit submissions on approaches and models for diversity, on the evaluation of diverse search results, and on applications and presentation of diverse search results.
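
One common way to realise this kind of diversification is a greedy re-ranking that trades relevance against redundancy, in the spirit of maximal marginal relevance (MMR). The sketch below is illustrative only and is not taken from the workshop material: the function name `mmr_rerank`, the `trade_off` parameter, and the toy scores are all assumptions.

```python
def mmr_rerank(relevance, similarity, k, trade_off=0.7):
    """Greedy MMR-style re-ranking (illustrative sketch).

    relevance:  list of relevance scores, one per candidate document
    similarity: square list-of-lists with pairwise document similarities
    k:          number of documents to return
    trade_off:  1.0 ranks by relevance only; lower values penalise redundancy
    """
    selected = []
    remaining = list(range(len(relevance)))
    while remaining and len(selected) < k:
        def score(i):
            # Redundancy = similarity to the closest already-selected document.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return trade_off * relevance[i] - (1 - trade_off) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): documents 0 and 1 are near-duplicates covering one
# interpretation of the query; document 2 covers a different interpretation.
relevance = [0.9, 0.85, 0.5]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
print(mmr_rerank(relevance, similarity, k=2, trade_off=1.0))  # [0, 1]
print(mmr_rerank(relevance, similarity, k=2, trade_off=0.5))  # [0, 2]
```

With `trade_off=0.5` the less relevant but dissimilar document displaces the near-duplicate, so the top two results cover both interpretations.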

Important Dates

5th December: Papers due

10th January: Notification of Acceptance

17th January: Camera-Ready papers due

12th February: DDR-2012 Workshop

Location

DDR 2012 will be co-located with the 5th ACM International Conference on Web Search and Data Mining (WSDM 2012) in Seattle, on 12th February 2012.

Organisers

Craig Macdonald, University of Glasgow, UK

Jun Wang, University College London, UK

Charles Clarke, University of Waterloo, Canada


CIKM11 Tutorial

June 30th, 2011
  • J. Wang and K. Collins-Thompson, "CIKM2011 Tutorial: Statistical Information Retrieval Modelling: From Probability Ranking Principle to recent advances in diversity, Portfolio Theory, and beyond," in CIKM, 2011.
    @inproceedings{cikmtutorial,
      author = {Jun Wang and Kevyn Collins-Thompson},
      title = {CIKM2011 Tutorial: Statistical Information Retrieval Modelling: From Probability Ranking Principle to recent advances in diversity, Portfolio Theory, and beyond},
      booktitle = {CIKM},
      year = {2011},
      pdf={http://web4.cs.ucl.ac.uk/staff/jun.wang/blog/},
      }
  • This is version 2 of a tutorial presented at ECIR2011, now with much more focus on general retrieval modelling. Any suggestions are welcome!
Abstract
Statistical modelling of Information Retrieval systems is a key driving force in the development of the information retrieval (IR) field. The objective of this tutorial is to provide a comprehensive and up-to-date introduction to statistical Information Retrieval modelling. Unlike many other theoretical IR tutorials offered in the past, we take a fresh and systematic perspective from the viewpoint of portfolio theory of information retrieval and risk management. A unified treatment and new insights will be given to reflect the recent development of considering the ranked retrieval results as a whole. Recent research progress in diversification, risk management, and the portfolio theory of information retrieval will be covered, in addition to classic methods such as Maron and Kuhns’ Probabilistic Indexing, the Robertson-Spärck Jones model (the resulting BM25 formula) and language modelling approaches. The tutorial will also review the resulting practical algorithms for risk-aware query expansion, diverse ranking, and IR metric optimization, as well as their performance evaluation. Practical IR applications such as web search engines, multimedia retrieval, and collaborative filtering will also be introduced, together with a discussion of new opportunities for future research and applications at the intersection of information retrieval, knowledge management, and databases.

DiveRS 2011 – International Workshop on Novelty and Diversity in Recommender Systems

June 17th, 2011

http://ir.ii.uam.es/divers2011/

DiveRS 2011 aims to gather researchers and practitioners interested in the role of novelty and diversity in recommender systems. The workshop seeks to advance towards a better understanding of what novelty and diversity are, and of how they can improve the effectiveness of recommendation methods and the utility of their outputs. We aim to identify open problems, relevant research directions, and opportunities for innovation in the recommendation business. The workshop seeks to stir further interest in these topics in the community, and to stimulate research and progress in this area.

We welcome the participation of researchers, students, and practitioners in the Recommender Systems community and related areas such as Information Retrieval, Data Mining, Machine Learning, and Human-Computer Interaction, who work in different application domains and are working on, or interested in, the workshop topics.


A Unified Relevance Retrieval Model by Eliteness Hypothesis

June 16th, 2011
  • G. Jagadeesh, S. E. Robertson, and J. Wang, "A Unified Relevance Retrieval Model by Eliteness Hypothesis," in ArXiv e-prints (Working Paper), 2011.
    @inproceedings{unifiedmodel-ictir,
      author = {Gorla. Jagadeesh and S.E. Robertson and Jun Wang},
      title = {A Unified Relevance Retrieval Model by Eliteness Hypothesis},
      booktitle = {ArXiv e-prints (Working Paper)},
      year = {2011},
      Howpublished={ArXiv e-prints, \url{http://arxiv.org/abs/1106.2946}},
      pdf={http://web4.cs.ucl.ac.uk/staff/jun.wang/papers/2011-ictir-unifiedmodel.pdf},
      }
In this paper, an Eliteness Hypothesis for information retrieval is proposed, in which we define two generative processes to create information items and queries. By assuming deterministic relationships between the eliteness of terms and relevance, we obtain a new theoretical retrieval framework. The resulting ranking function is a unified one, as it is capable of using available relevance information on both the document and the query, which is otherwise unachievable with existing retrieval models. Our preliminary experiment on a simple ranking function has demonstrated the potential of the approach.

ECIR2011 Tutorial: Risk Management in Information Retrieval

November 1st, 2010

Risk Management (image courtesy of matrixsafety.com.au)

Jun Wang, University College London and Kevyn Collins-Thompson, Microsoft Research

  • J. Wang and K. Collins-Thompson, "ECIR2011 Tutorial: Risk Management in Information Retrieval," in ECIR, 2011.
    @inproceedings{ecirtutorial,
      author = {Jun Wang and Kevyn Collins-Thompson},
      title = {ECIR2011 Tutorial: Risk Management in Information Retrieval},
      booktitle = {ECIR},
      year = {2011},
      pdf={http://web4.cs.ucl.ac.uk/staff/jun.wang/papers/2011-ECIRTutorial-RiskIR.pdf},
      }

1. Course objectives

Risk modelling and management is a new concept in Information Retrieval (IR) modelling. This new way of thinking departs significantly from the classic information retrieval methodologies that originate from the Probability Ranking Principle, the Robertson-Spärck Jones model (the resulting BM25 formula), and statistical language models. Recent research in risk-aware IR models draws an analogy with financial risk management. It has been demonstrated that ideas about uncertainty and retrieval risk, together with the resulting mathematical modelling tools, not only provide theoretical explanations of some empirical retrieval results (e.g., the need for diversification, the trade-off between MAP and MRR, and the justification for pseudo-relevance feedback), but also help us develop useful retrieval techniques such as risk-aware query expansion and optimal document ranking.

The tutorial aims to provide a comprehensive introduction to the emerging risk modelling and management techniques in information retrieval systems. While the basic theories (such as the portfolio theory of IR and mean-variance analysis) and risk models of information retrieval are covered, the tutorial also focuses on the resulting practical algorithms for query expansion, relevance ranking, and IR metric optimization, as well as their performance evaluation. Practical IR applications such as Web search engines, multimedia retrieval, and collaborative filtering will also be covered, together with a discussion of new opportunities for future research and applications.

2. Its relevance to the information retrieval community

Theoretical research and mathematical modelling of IR systems are the key drivers behind the development of the information retrieval (IR) field. The first theoretical study of information retrieval can be traced back to Maron and Kuhns’ Probabilistic Indexing paper in 1960, where the classic school of thinking about IR systems was established [1]. We call it individualism here, as the goal is to come up with a relevance score for each document and to rank the documents individually with respect to those scores [2, 3]. The document-oriented and query-oriented views on how to assign a probability of relevance of a document to a user need have resulted in two different types of practical models. The Robertson-Spärck Jones model (the resulting BM25 formula) takes the query-oriented view (or need-oriented view), assuming a given information need and choosing the query representation in order to select relevant documents against others [4], while in the document-oriented view, the language models aim to choose the appropriate document representation to match queries or judge its relevance [5, 6].
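
For concreteness, a commonly used form of the BM25 ranking function can be sketched as follows. This is a minimal illustration rather than the exact formulation of [4]: the function name `bm25_score`, the defaults `k1=1.2` and `b=0.75`, and the toy collection statistics are assumptions. The IDF term is the Robertson-Spärck Jones weight for the case where no relevance information is available. Note that each document receives its score independently of the others, which is exactly the individualism described above.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Score one document against a query with a common BM25 variant.

    doc_freq maps a term to the number of documents containing it.
    k1 controls term-frequency saturation, b controls length normalisation.
    """
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        n = doc_freq.get(term, 0)
        if n == 0 or tf[term] == 0:
            continue  # term absent from the collection or from this document
        # RSJ weight without relevance information; it becomes negative
        # for terms that occur in more than half of the documents.
        idf = math.log((num_docs - n + 0.5) / (n + 0.5))
        length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
        score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
    return score


# Hypothetical collection statistics: 100 documents, "risk" occurs in 5.
doc_freq = {"risk": 5}
one_match = bm25_score(["risk"], ["risk", "model"], doc_freq, 100, 2.0)
two_matches = bm25_score(["risk"], ["risk", "risk"], doc_freq, 100, 2.0)
print(one_match < two_matches)  # True
```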

Individualism has drawbacks [7]. Recently, there has been a new type of theoretical research that uses a probabilistic framework to move beyond the Probability Ranking Principle and determine optimal ranking algorithms by considering the final ranking context [8, 9, 10] or regarding the returned documents as a whole [11]. Portfolio theory of information retrieval has emerged as a new theory in IR modelling, providing a sound theoretical framework and more powerful mathematical tools for understanding and analyzing retrieval processes such as query expansion [12, 13], document ranking [14, 15], and evaluation [16, 17]. Drawing an analogy with finance and financial risk modelling [18], it has been realized that ranking documents under uncertainty is not just about picking individual relevant documents (Individualism), but about choosing the right combination of relevant documents [14]. Here, we name this new information retrieval modelling methodology Portfolioism. This motivates IR researchers to quantify a ranked list of documents on the basis of its expected overall relevance (mean) and its variance, where the latter serves as a measure of risk.
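
To make the mean and variance of a ranked list concrete, the sketch below computes both for a rank-weighted list and greedily builds a ranking that maximises mean minus a risk penalty. It is a simplified illustration under stated assumptions, not the exact algorithm of [14]: the function names, the `1/(rank)` position weights, the `risk_aversion` parameter, and the toy relevance estimates and covariances are all hypothetical.

```python
def portfolio_stats(ranking, mean, cov, weights):
    """Expected overall relevance and its variance for a ranked list.

    ranking: list of document ids in rank order
    mean:    mean[d] is the expected relevance of document d
    cov:     cov[d][e] is the covariance between relevance scores of d and e
    weights: weights[p] is the importance of rank position p
    """
    expected = sum(weights[p] * mean[d] for p, d in enumerate(ranking))
    variance = sum(weights[p] * weights[q] * cov[d][e]
                   for p, d in enumerate(ranking)
                   for q, e in enumerate(ranking))
    return expected, variance


def greedy_mean_variance_rank(mean, cov, k, risk_aversion=1.0, weights=None):
    """Fill rank positions one at a time, maximising mean - b * variance."""
    if weights is None:
        # Hypothetical rank discount: top positions matter more.
        weights = [1.0 / (pos + 1) for pos in range(k)]
    ranking = []
    candidates = set(range(len(mean)))
    while candidates and len(ranking) < k:
        def objective(d):
            expected, variance = portfolio_stats(ranking + [d], mean, cov, weights)
            return expected - risk_aversion * variance
        best = max(sorted(candidates), key=objective)
        ranking.append(best)
        candidates.remove(best)
    return ranking


# Toy estimates (hypothetical): documents 0 and 1 are strongly positively
# correlated; document 2 is less relevant but negatively correlated with both.
mean = [0.8, 0.78, 0.6]
cov = [
    [0.04, 0.036, -0.02],
    [0.036, 0.04, -0.02],
    [-0.02, -0.02, 0.04],
]
print(greedy_mean_variance_rank(mean, cov, k=2, risk_aversion=0.0))  # [0, 1]
print(greedy_mean_variance_rank(mean, cov, k=2, risk_aversion=5.0))  # [0, 2]
```

With no risk penalty the list is ordered by expected relevance alone; with a penalty, the negatively correlated document is preferred at rank two because it lowers the variance of the list as a whole.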

Risk modelling and the portfolio theory of IR provide theoretical explanations of some empirical retrieval results such as the need for diversification, the trade-off between MAP and MRR, and the justification for pseudo-relevance feedback. They also guide us in developing many useful retrieval techniques. For example, risk modelling has been successfully applied in query expansion [12, 13], and mean-variance analysis has been adopted for evaluation [16, 17]. The latest development in risk modelling has been used to directly maximize the expectation of various standard metrics (average precision, mean reciprocal rank, discounted cumulative gain, etc.) [19].

3. The tutorial content and how the tutorial will be structured

Theme 1 The classic risk modelling era (Individualism) – 30 minutes (Many previous tutorials on information retrieval models were limited to Theme 1, mostly items 1 and 2. For a complete and consistent view, we will spend half an hour on it, while providing a fresh discussion and new insights from the novel risk-modelling viewpoint.)

  1. Probability Ranking Principle [1, 2, 3, 7]
  2. Classic IR models [4, 5]
  3. Handling Ranking Context [8, 9, 10]

Theme 2 The Modern risk modelling era (Portfolioism) - 120 minutes

  1. Ranking, IR metric optimization, and mean-variance analysis [14, 15, 19]
  2. Risk-aware Query expansion [12, 13]
  3. Evaluation [16, 17]
  4. Other applications (multimedia retrieval, collaborative filtering, etc.) [11, 24, 25]

Theme 3 Relation to other recent developments – 30 minutes

  1. Robust classification [26, 27]
  2. Diversification and Multi-armed Bandit Machine [20, 21, 22]
  3. The relation to Quantum Probability Ranking Principle [23]
  4. Future opportunities

4. Planned course materials

Slides, handouts, and demos.

5. Key references

[1]    M. E. Maron and J. L. Kuhns. On relevance, probabilistic indexing and information retrieval. J. ACM, 7(3), 1960.

http://portal.acm.org/ft_gateway.cfm?id=321035&type=pdf&CFID=15232325&CFTOKEN=45719553

[2]    W. S. Cooper. The inadequacy of probability of usefulness as a ranking criterion for retrieval system output. University of California, Berkeley, 1971.

[3]    S. E. Robertson and N. Belkin. Ranking in principle. Journal of Documentation, 1978.

[4]    S. E. Robertson and K. Spärck Jones. Relevance weighting of search terms. Journal of the American Society for Information Science, 27(3):129–46, 1976.

[5]    D. Hiemstra. Using language models for information retrieval. PhD thesis, Univ.Twente, 2001.

[6]    J. Lafferty and C. Zhai. Document language models, query models, and risk minimization for information retrieval. In SIGIR, 2001.

[7]    M. D. Gordon and P. Lenk. A utility theoretic examination of the probability ranking principle in information retrieval. JASIS, 42(10):703–714, 1991.

[8]    C. Zhai and J. D. Lafferty. A risk minimization framework for information retrieval. Inf. Process. Manage., 42(1):31–55, 2006.

[9]    H. Chen and D. R. Karger. Less is more: probabilistic models for retrieving fewer relevant documents. In SIGIR, 2006.

[10] J. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR, 1998.

[11] Jun Wang. Mean-variance analysis: A new document ranking theory in information retrieval. In Proc. of European Conference on Information Retrieval (ECIR 2009), 2009.

[12] K. Collins-Thompson. “Estimating robust query models with convex optimization”. Advances in Neural Information Processing Systems 21  (NIPS), 2008

[13] K. Collins-Thompson. “Reducing the risk of query expansion via robust constrained optimization”.  in CIKM, 2009

[14] Jun Wang and Jianhan Zhu. Portfolio theory of information retrieval. In SIGIR09, 2009.

[15] Jianhan Zhu, Jun Wang, Michael Taylor, and Ingemar Cox. Risky business: Modeling and exploiting uncertainty in information retrieval. In SIGIR09, 2009.

[16] K. Collins-Thompson. “Accounting for stability of retrieval algorithms using risk-reward curves”. Proceedings of SIGIR 2009 Workshop on the Future of Evaluation in Information Retrieval, Boston. pg. 27-28.

[17] J. Zhu, J. Wang, and V. Vinay, “Topic (Query) Selection for IR Evaluation,” in SIGIR09

[18] H. Markowitz. Portfolio selection. Journal of Finance, 1952.

[19] J. Wang and J. Zhu, “On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics,” in SIGIR10 Full Paper, 2010

[20] A. Slivkins, F. Radlinski and S. Gollapudi, Learning Optimally Diverse Rankings over Large Document Collections, ICML 2010.

[21] Thorsten Joachims, Thomas Hofmann, Yisong Yue, Chun-Nam Yu, Predicting Structured Objects with Support Vector Machines, Communications of the ACM (CACM)

[22] C. L. A. Clarke, M. Kolla, G. V. Cormack, O. Vechtomova, A. Ashkan, S. Büttcher, and I. MacKinnon. Novelty and diversity in information retrieval evaluation. In SIGIR, 2008.

[23]  G. Zuccon, L. Azzopardi, Using the Quantum Probability Ranking Principle to Rank Interdependent Documents, in ECIR 2010

[24] Robin Aly, Aiden Doherty, Djoerd Hiemstra, Alan Smeaton. “Beyond Shot Retrieval: Searching for Broadcast News Items Using Language Models of Concepts”. In ECIR, 2010

[25] Portfolio Theory of Multimedia Fusion, X.Y. Wang and M.S. Kankanhalli, ACM International Conference on Multimedia (ACMMM 2010), 2010.

[26] Lanckriet, G.R.G., El Ghaoui, L., Bhattacharyya, C., Jordan, M.I. (2002). A Robust Minimax Approach to Classification . Journal of Machine Learning Research, Vol. 3, pp. 555-582.

[27] Dredze, M., Crammer, K., and Pereira, F. 2008. Confidence-weighted linear classification. In Proceedings of the 25th international Conference on Machine Learning (Helsinki, Finland, July 05 – 09, 2008). ICML ’08, vol. 307. ACM, New York, NY, 264-271.

ACM RecSys10: Optimizing multiple objectives in collaborative filtering

October 20th, 2010

This paper is about the utility of making personalized recommendations. While it is important to accurately predict the target user’s preference, in practice accuracy should not be the only concern; a useful recommender system needs to consider the user’s utility or satisfaction in fulfilling a certain information seeking task. For example, recommending popular items (products) is unlikely to result in more gain than discovering insignificant (“long tail”) yet liked items, because the popular ones might already be known to the user. Equally, recommending items that are out of stock would be frustrating for both the user and the system if the system is employed to discover items to purchase. Thus, it is important to have a flexible recommendation framework that takes into account additional recommendation goals while minimizing the performance loss, in order to provide greater adjustability and a better user experience.

To achieve this, in this paper, we propose a general recommendation optimization framework that not only considers the predicted preference scores (e.g. ratings) but also deals with additional operational or resource-related recommendation goals. Using this framework we demonstrate through realistic examples how to expand existing rating prediction algorithms by biasing the recommendation depending on other external factors such as the availability, profitability or usefulness of an item. Our experiments on real data sets demonstrate that this framework is indeed able to cope with multiple objectives with minor performance loss.
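
The idea of biasing predicted ratings by external factors can be illustrated with a simple weighted re-scoring. This is only a sketch of the general idea, not the optimization framework proposed in the paper: the function name `rerank_with_objectives`, the linear combination of objectives, the item names, and all scores and weights are assumptions made for illustration.

```python
def rerank_with_objectives(predicted, objectives, weights, top_n):
    """Re-rank items by predicted rating plus weighted extra objectives.

    predicted:  dict item -> predicted rating
    objectives: dict objective name -> dict item -> score in [0, 1]
    weights:    dict objective name -> importance of that objective
    """
    def utility(item):
        bonus = sum(weights[name] * scores.get(item, 0.0)
                    for name, scores in objectives.items())
        return predicted[item] + bonus
    return sorted(predicted, key=utility, reverse=True)[:top_n]


# Hypothetical catalogue: the best-rated item is out of stock, and the
# "niche" item is a long-tail title the user is less likely to know already.
predicted = {"blockbuster": 4.6, "niche": 4.4, "sold_out": 4.8}
objectives = {
    "in_stock": {"blockbuster": 1.0, "niche": 1.0, "sold_out": 0.0},
    "long_tail": {"blockbuster": 0.0, "niche": 1.0, "sold_out": 0.2},
}

# Rating prediction only.
print(rerank_with_objectives(predicted, objectives,
                             {"in_stock": 0.0, "long_tail": 0.0}, top_n=2))
# ['sold_out', 'blockbuster']

# Availability and long-tail discovery taken into account.
print(rerank_with_objectives(predicted, objectives,
                             {"in_stock": 1.0, "long_tail": 0.5}, top_n=2))
# ['niche', 'blockbuster']
```

The weights make the trade-off explicit: setting them to zero recovers the pure rating-based ranking, so the accuracy cost of each additional goal can be examined directly.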

http://portal.acm.org/citation.cfm?id=1864708.1864723

ACM SIGIR2010 Optimal Ranker

August 11th, 2010

The slides of the SIGIR talk will be put online.


A demo of portfolio theory of movie recommendation

May 10th, 2010

Thanks to David Stefan, a movie recommendation demo system is up and running. It is a fourth-year final project, intended to demonstrate the effectiveness of applying portfolio theory to recommender systems.

To launch the application, click here. (If you find any bugs, do drop us an email!)