Archive

Archive for the ‘Tutorials: Collaborative Filtering’ Category

CIKM2013 Full Paper: Interactive Collaborative Filtering

July 18th, 2013 Comments off

Interactive Recommendation: a ‘chicken- or-the-egg’ problem.

In this paper, we study collaborative filtering (CF) in an interactive setting, in which a recommender system continuously recommends items to individual users and receives interactive feedback. Whilst users enjoy sequential recommendations, the recommendation predictions are constantly refined using up-to-date feedback on the recommended items. Bringing the interactive mechanism back to CF process is fundamental because the ultimate goal for a recommender system is about the discovery of interesting items for individual users and yet users’ personal preferences and contexts evolve over time during the interactions with the system. This requires us not to distinguish between the stages of collecting information to construct the user profile and making recommendations, but to seamlessly integrate these stages together during the interactive process, with the goal of maximizing the overall recommendation accuracy throughout the interactions. This mechanism naturally addresses the cold-start problem as any user can immediately receive sequential recommendations without providing ratings beforehand. We formulate Interactive CF with the probabilistic matrix factorization (PMF) framework, and leverage several exploitation-exploration algorithms to select items, including the empirical Thompson sampling and upper-confidence-bound-based algorithms. We conduct our experiment on cold-start users as well as warm-start users with drifting taste. Results show that the proposed methods have significant improvements over several strong baselines for the MovieLens, EachMovie and Netflix datasets.

Post to Twitter

ACM RecSys13: To Personalize or Not: A Risk Management Perspective

July 16th, 2013 Comments off

Performance comparison of personalized and non-personalized recommendations.

Personalization techniques have been widely adopted in many recommender systems. However, experiments on real-word datasets show that for some users in certain contexts, personalized recommendations do not necessarily perform better than recommendations that rely purely on popularity. Broadly, this can be interpreted by the fact that the parameters of a personalization model are usually estimated from sparse data; the resulting personalized prediction, despite of its low bias, is often volatile. In this paper, we study the problem further by investigating into the ranking of recommendation lists. From a risk management and portfolio retrieval perspective, there is no difference between the popularity-based and the personalized ranking as both of the recommendation outputs can be represented as the trade-off between expected relevance (reward) and associated uncertainty (risk).  Through our analysis, we discover common scenarios and provide a technique to predict whether personalization will fail. Besides the theoretical understanding, our experimental results show that the resulting switch algorithm, which decides whether or not to personalize, outperforms the mainstream recommendation algorithms.

Post to Twitter

WWW2012 Full Paper: Using Control Theory for Stable and Efficient Recommender Systems

February 16th, 2012 No comments

The aim of a web-based recommender system is to provide highly accurate and up-to-date recommendations to its users; in

A mass attached to a spring and damper

practice, it will hope to retain its users over time. However, this raises unique challenges. To achieve complex goals such as keeping the recommender model up- to-date over time, we need to consider a number of external requirements. Generally, these requirements arise from the physical nature of

A controlled recommender system

the system, for instance the available computational resources. Ideally, we would like to design a system that does not deviate from the required outcome. Modeling such a system over time requires to describe the internal dynamics as a combination of the underlying recommender model and the its users’ behavior. We propose to solve this problem by applying the principles of modern con- trol theory—a powerful set of tools to deal with dynamical systems—to construct and maintain a stable and robust recommender system for dynamically evolving environments. In particular, we introduce a design principle by focusing on the dynamic relationship between the recommender sys- tem’s performance and the number of new training samples the system requires. This enables us to automate the control other external factors such as the system’s update frequency. We show that, by using a Proportional-Integral-Derivative controller, a recommender system is able to automatically and accurately estimate the required input to keep the out- put close to a pre-defined requirements. Our experiments on a standard rating dataset show that, by using a feedback loop between system performance and training, the trade- off between the effectiveness and efficiency of the system can be well maintained. We close by discussing the widespread applicability of our approach to a variety of scenarios that recommender systems face.

[bibtex file=mypublications.bib key=www2012]

Post to Twitter

ACM RecSys10: Optimizing multiple objectives in collaborative filtering

October 20th, 2010 Comments off

This paper is about the utility of making personalized recommendations. While it is important to accurately predict the target user’s preference, in practice the accuracy should not be the only concern; a useful recommender system needs to consider the user’s utility or satisfaction of fulfilling a certain information seeking task. For example, recommending popular items (products) is unlikely to result in more gain than discovering insignificant (“long tail”) yet liked items because the popular ones might be already known to the user. Equally, recommending items that are out of stock would be frustrating for both the user and system if the system is employed to discover items to purchase. Thus, it is important to have a flexible recommendation framework that takes into account additional recommendation goals meanwhile minimizing the performance loss in order to provide greater adjustability and a better user experience.

To achieve this, in this paper, we propose a general recommendation optimization framework that not only considers the predicted preference scores (e.g. ratings) but also deals with additional operational or resource related recommendation goals. Using this framework we demonstrate through realistic examples how to expand existing rating prediction algorithms by biasing the recommendation depending on other external factors such as the availability, profitability or usefulness of an item. Our experiments on real data sets demonstrate that this framework is indeed able to cope with multiple objectives with minor performance loss.

http://portal.acm.org/citation.cfm?id=1864708.1864723

Post to Twitter

A demo of portfolio theory of movie recommendation

May 10th, 2010 Comments off

Picture 1

Thanks to David Stefan, a movie recommendation demo system is up and running. This is a 4th year final project and intended to demo the effectiveness of using portfolio theory to recommender systems.

To launch the application, click here. (if you found any bugs, do drop us an email!)

Post to Twitter

Directional Error Based Collaborative Filtering

December 18th, 2009 No comments

Goal-driven Collaborative Filtering – A Directional Error Based Approach, ECIR10

Tamas Jambor and Jun Wang

[bibtex file=mypublications.bib key=Wang:ecir2010]

directionalerror

Directional Risk Preference of Recommendation Prediction

Collaborative filtering is one of the most effective techniques for making personalized content recommendation. In the literature, a common experimental setup in the modeling phase is to minimize,
either explicitly or implicitly, the (expected) error between the predicted ratings and the true user ratings, while in the evaluation phase, the resulting model is again assessed by that error. In this paper,  we argue that defining an error function that is fixed across rating scales is however limited, and different applications may have different recommendation goals thus error functions. For example, in some cases, we might be more concerned about the highly predicted items than the ones with low ratings (precision minded), while in other cases, we want to make sure not to miss any highly
rated items (recall minded). Additionally, some applications might require to produce a top-N recommendation list, where the rank-based performance measure becomes valid. To address this issue, we propose a flexible optimization framework that can adapt to individual recommendation goals. We introduce a Directional Error Function to capture the cost (risk) of each individual  predictions, and it can be learned from the specified performance measures at hand. Our preliminary experiments on a real data set demonstrate that significant performance gains have been achieved.

Post to Twitter

Collaborative Filtering Explained

July 8th, 2009 Comments off

Collaborative filtering (CF) is concerned with predicting how likely a specific user will like certain information items (books, movies, music items, web pages, etc). As the term “collaborative” probably implies, the prediction has to rely on a collection of other (similar) users’ preferences, which have been collaboratively collected.

Netflix DVD Recommendation

Netflix DVD Recommendation

One of the popular applications of collaborative filtering is personalized content recommendation. A typical example is movie recommendation, where a user is explicitly asked to rate what he or she has liked or disliked in the past. After rating a few movie items, the recommendation engine would be able to produce a prediction about the user’s ratings of unseen movie items, by looking at other (similar) users’ past ratings for the movies items in question. In this case, users have to explicitly provide their ratings for movies items before hand, e.g., give 1 star for the lowest rating (most hated)  and 5 stars for the highest rating (most liked).

Amazons Book Recommendation

Amazon's Book Recommendation

End users sometimes are too lazy to provide such ratings. Alternatively, it is possible for a recommendation system to “non-intrusively” observe the users choices and infer their preferences from the historic interaction data.  For example, Amazon’s book recommendation requires  an injection of the users’ purchase data – “Customers Who Bought This Item Also Bought” that.

In either case, if we look at the prediction problem from a conceptual level, it is very much like Web retrieval in that it needs to calculate the correspondence (called relevance) between a user information need (in our case, a user preference or predefined preferable topics) and an information item (e.g., a movie or a book).  In text retrieval, the correspondence is usually calculated by looking at content descriptions, e.g., how many and how frequent the query terms occur with a document. Likewise, when we make personalized recommendations, users’ unseen preference can be predicted by examining the matching between the textual descriptions of information needs and information items. This is called content-based filtering.

Collaborative filtering, on the other hand, does not necessarily need textual descriptions. It makes personalized recommendations by aggregating the opinions and preferences of previous users. Originally, the idea of collaborative filtering was derived from heuristics. Goldberg et. al coined the term, collaborative filtering, when developing an automatic filtering system for electronic mail, called Tapestry, at Xerox Palo Alto Research. In the paper published in CACM 1992

[bibtex file=ir.bib key=cf1992]

it reads:

“Collaborative filtering simply means that people collaborate to help one another perform filtering by recording their reactions to documents they read. Such reactions may be that a document was particularly interesting (or particularly uninteresting)…”

This reminds me a story in a well-known ancient Chinese historical book, Zhan Guo Ce (English title  “Strategies of the Warring States”).

[bibtex file=ir.bib key=zgc]

In VOL. 10 CH’I III, it reads:

Shun-yu K’un in one day introduced seven scholars to King Hsuan. The King said: ‘Come sir. I have heard that if there is a single scholar in a thousand li (a distance measure) they are standing shoulder to shoulder, if there is a single sage in a hundred generations it is as if they came following on each other’s heels. Are not scholars then very numerous?’ Shun-yu K’un said:’ Not so. Birds of a feather flock together. Like footed animals walk in company. If now you seek for ch’ai-hu or chieh-keng (medical herbs grown usually in mountains) in bogs and marshes, you will not get any in several generations. All things have their associates. Now I am an associate of able men. If Your Majesty ask me for scholars it is like scooping water out of the river and getting fire from tinder. If I am to introduce scholars again will they be only seven?’ “

Basically the story tells us:

“Things of a kind come together; People of a mind fall into the same group.”

Let us first formulate collaborative filtering by grouping people (users).

User-based Collaborative Filtering

User-based Collaborative Filtering

User-based approaches

The user-based approaches assume that:

users who have similar preferences in the past are likely to have similar preferences in the future, and the more similar they are, the more likely they would agree with each other in the future.

The preference prediction is therefore calculated by weighted-averaging of the ratings from similar users.

(To Be Continued)

Post to Twitter

Collaborative Filtering Resources

June 17th, 2009 Comments off

Collaborative filtering (CF) is concerned with making recommendations about information items (books, movies, music items, web-pages, etc) to a specific user,  based on his/her preference indicated in the past.  Users having similar preferences in the past is likely to share similar preferences in the future.  Given the user, information items can be filtered in/out regarding to the behaviors of his or her similar users.

User profiles can be explicitly obtained by asking users to rate items that they know. However these explicit ratings are hard to gather in a real system. It is highly desirable to infer user preferences from implicit observations of user interactions with a system.

One can explicitly ask users to rate what they have used/purchased. Such a profile is filled explicitly by the users ratings. An implicit profile is based on passive observation and contains users historic interaction data.

In this page, I collected some useful online materials for collaborative filtering research.

Research Software

- CoFE: a java based collaborative filtering engine. http://eecs.oregonstate.edu/iis/CoFE/

- Suggest Top-N recommendation engine: it implements the item-based and user-based collaborative filtering algorithms. Only lib files, no source codes included. http://www-users.cs.umn.edu/~karypis/suggest/

- C/Matlab Toolkit: a Matlab implementation of some collaborative filtering algorithms, including memory-user-based, personality diagnosis method (see Pennock et FL., 2000) etc. http://www-2.cs.cmu.edu/~lebanon/IR-lab.htm

- Matlab code for Canny’s factor analysis based collaborative filtering. www.cs.berkeley.edu/~jfc/’mender/.

- Taste is a collaborative filtering engine for Java. http://taste.sourceforge.net/

Data sets

Explicit Rating Data Sets:

- Movielens Movie Rating Data Set. http://www.grouplens.org/

- Jester Joke Rating Data Set. http://www.ieor.berkeley.edu/~goldberg/jester-data/

- Book-Crossing Book Rating Data Set. http://www.informatik.uni-freiburg.de/~cziegler/BX/

- Parliament Voting. http://ucdata.berkeley.edu:7101/new_web/VoteWorld/voteworld/datasets.html

- Online Dating Data Set. http://www.ksi.ms.mff.cuni.cz/~petricek/data/ It contains user ratings from an online dating we site: libimseti.cz. Courtesy of Vaclav Petricek.

Implicit Rating Data Sets:

- Audioscrobblers Music Play-list Data-sets.The Audioscrobbler dataset collects the play-lists of the users in a one-line community (http://www.audioscrobbler.com/) by using a plug-in in the users’ media players such as Winamp, iTunes, XMMS etc. The plug-ins send the title and artist of every song users play to the Audioscrobbler server, which updates the user’s musical profile with the new songs. In the database, the user’s profile is recorded as a form of co-occurrence pair like {userID,itemID} pair. The pair means a user {userID} has played a\

- AOL Web search query: http://www.gregsadetsky.com/aol-data/

Collaborative Filtering Bibliography

Memory-based

- Unifying User-based and Item-based Collaborative Filtering Approaches by Similarity Fusion (2006). Appear in SIGIR 2006. http://ict.ewi.tudelft.nl/pub/jun/sigir06_similarityfuson.pdf

- Scalable collaborative filtering using cluster-based smoothing (2005). http://doi.acm.org/10.1145/1076034.1076056

- An automatic weighting scheme for collaborative filtering (2004). http://doi.acm.org/10.1145/1008992.1009051

- Item-based Collaborative Filtering Recommendation Algorithms (2001). http://www10.org/cdrom/papers/519/

- Evaluation of Item-Based Top-N Recommendation Algorithms (2001). http://www-users.cs.umn.edu/~karypis/publications/Papers/PDF/itemrs.pdf

- A regression-based approach for scaling-up personalized recommender systems in e-commerce (2000). http://nas.cl.uh.edu/boetticher/ML_DataMining/vucetic.pdf

- Collaborative Filtering by Personality Diagnosis: A Hybrid Memory- and Model-Based Approach (1999). http://research.microsoft.com/~horvitz/cfpd.htm

- An algorithmic framework for performing collaborative filtering (1999).

- Empirical Analysis of Predictive Algorithms for Collaborative Filtering (1998). http://research.microsoft.com/research/pubs/view.aspx?tr_id=166

- Grouplens: Applying Collaborative Filtering to Usenet News (1997).  http://www.ics.uci.edu/~pratt/courses/papers/p77-konstan.pdf

- Social Information Filtering: Algorithms for Automating “Word of Mouth” (1995). http://citeseer.ist.psu.edu/195430.html

- Grouplens: an open architecture for collaborative filtering of netnews (1994). http://doi.acm.org/10.1145/192844.192905

- Using collaborative filtering to weave an information tapestry (1992). http://citeseer.ist.psu.edu/context/1727112/0

Relevance Models

- Unified Relevance Models for Rating Prediction in Collaborative Filtering (2008). http://ict.ewi.tudelft.nl/pub/jun/URM-Wang-TOIS.pdf

- A User-Item Relevance Model for Log-based Collaborative Filtering (2006). http://ict.ewi.tudelft.nl/pub/jun/ecir06.pdf

- Relevance Feedback Models for Recommendation (2006). http://acl.ldc.upenn.edu/W/W06/W06-1653.pdf

Latent Class Models

- A study of Mixture Models for Collaborative Filtering (2006). http://www.cs.cmu.edu/~lsi/Paper_JIR_Si.pdf

- Two-way latent grouping model for user preference prediction (2005). http://eprints.pascal-network.org/archive/00001005/01/uai05.pdf

- The Multiple Multiplicative Factor Model For Collaborative Filtering (2004). http://www.machinelearning.org/proceedings/icml2004/papers/363.pdf

- Collaborative filtering: a machine learning perspective (2004). http://citeseer.ist.psu.edu/marlin04collaborative.html

- Flexible mixture model for collaborative filtering (2003). http://www.hpl.hp.com/conferences/icml2003/papers/183.pdf

- Latent class models for collaborative filtering (1999). http://portal.acm.org/citation.cfm?id=687583

Matrix Factorization

- Fast Maximum Margin Matrix Factorization for Collaborative Prediction (2005). http://people.csail.mit.edu/jrennie/papers/icml05-mmmf.pdf

- Eigentaste: A constant time collaborative filtering algorithm (2001). (Using PCA) http://www.ieor.berkeley.edu/~goldberg/pubs/eigentaste.pdf

- Application of Dimensionality Reduction in Recommender System — A Case Study (2000). http://citeseer.ist.psu.edu/sarwar00application.html

- Collaborative filtering with privacy via factor analysis (1999). (Using factor analysis) http://www.cs.berkeley.edu/~jfc/papers/02/SIGIR02.pdf

- Learning collaborative information filters (1998). (using SVD) http://www.ics.uci.edu/~pazzani/Publications/MLC98.pdf

Clustering

- A maximum entropy approach to collaborative filtering in dynamic, sparse, high dimensional domains (2002). http://research.yahoo.com/publication/OR-2003-007.pdf

- Clustering Methods for Collaborative Filtering (1998). http://citeseer.ist.psu.edu/ungar98clustering.html

- A Formal Statistical Approach to Collaborative Filtering (1998). http://citeseer.ist.psu.edu/387035.html

- A Scalable Collaborative Filtering Framework based on Co-clustering (2005). http://hercules.ece.utexas.edu/~srujana/papers/icdm05.pdf

- Model-based Overlapping Co-Clustering. http://www.siam.org/meetings/sdm06/workproceed/Text%20Mining/shafiei16.pdf

Transitive Associations

- Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering (2004). http://doi.acm.org/10.1145/963770.963775

Trust Inference

- Improving Collaborative Filtering with Trust-based Metrics (2006). http://doi.acm.org/10.1145/1141277.1141717

- Alleviating the Sparsity Problem of Collaborative Filtering Using Trust Inferences. http://www.ics.forth.gr/isl/publications/paperlink/LNCS_Formatted_iTrust_34770228.pdf

Perception-based

- Online ranking/collaborative filtering using the perception algorithm (2003).

Combining Content-based and Collaborative Filtering

- A Unified Recommendation Framework Based on Probabilistic Relational Models (2005). http://www.stern.nyu.edu/ciio/WorkOnline/IS20042005/0217-01.pdf

- Unifying Collaborative and Content-Based Filtering (2004). http://www.cs.brown.edu/people/th/publications.html

- Collaborative Ensemble Learning: Combining Collaborative and Content-Based Information Filtering via Hierarchical Bayes (2003). http://www.dbs.informatik.uni-muenchen.de/~yu_k

- Content-Boosted Collaborative Filtering (2001).

Distributed Collaborative Filtering

- Personalization of a peer-to-peer television system (2006). http://ict.ewi.tudelft.nl/pub/jun/euroitv06.pdf

- Distributed Collaborative Filtering for Peer-to-Peer File Sharing Systems (2006). http://ict.ewi.tudelft.nl/pub/jun/sac06.pdf

- Pocketlens: Toward a Personal Recommender System (2004). http://doi.acm.org/10.1145/1010614.1010618

Other issues

- Being Accurate is Not Enough: How Accuracy Metrics have hurt Recommender Systems (2006). http://www.grouplens.org/papers/pdf/mcnee-chi06-acc.pdf

- A collaborative filtering algorithm and evaluation metric that accurately model the user experience (2004). http://doi.acm.org/10.1145/1008992.1009050

- Evaluating collaborative filtering recommender systems (2004). http://doi.acm.org/10.1145/963770.963772

Related Information Retrieval Papers

In general, collaborative filtering is formulated as a self-contained problem, apart from classic approaches for text retrieval, e.g. RSJ models and language models. However, the collaborative filtering problem can be treated as a prediction problem – a prediction of the relevance between user and item (see <a href=”http://ict.ewi.tudelft.nl/pub/jun/ecir06.pdf”>user-item relevance models</a>). Under this veiw, the instant benefits are gained from the current advances in these text retrieval models. We found the following papers are pretty interesting and are related to the collaborative filtering problem.

- Query Chains: Learning to Rank from Implicit Feedback (2005). http://www.cs.cornell.edu/%7Efilip/papers/Radlinski05QueryChains.pdf

- On Event Spaces and Probabilistic Models in Information Retrieval (2005).

- Probabilistic relevance models based on document and query generation (2003).

- Novelty and redundancy detection in adaptive filtering (2002). http://doi.acm.org/10.1145/564376.564393

- Term-specific smoothing for the language modeling approach to information retrieval: the importance of a query term (2002).

- Exact Maximum Likelihood Estimation for Word Mixtures (2002). http://www-2.cs.cmu.edu/~yiz/research/paper/icml2002.ps

- A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval (2001).

- Document language models, query models, and risk minimization for information retrieval (2001).

- Information Retrieval as Statistical Translation (1999). http://www.informedia.cs.cmu.edu/documents/irast-final.pdf

- Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval (1998).

- Relevance weighting of search terms (1976).

Related Machine Learning Papers

- On Combining Classifiers (1998). http://ieeexplore.ieee.org/iel4/34/14695/00667881.pdf

- On the Choice of Smoothing Parameters for Parzen Estimators of Probability Density Functions (1976).

- Spectral clustering for multi-type relational data (2006). http://portal.acm.org/citation.cfm?id=1143918

- Hierarchical Bayesian Models for Applications in Information Retrieval (2003). http://www.cs.berkeley.edu/~jordan/papers/jordan-valencia.pdf

- A Hierarchical Latent Variable Model for Data Visualization (1998). http://citeseer.ist.psu.edu/bishop98hierarchical.html – Combining Labeled and Unlabeled Data with Co-Training (1998). http://citeseer.ist.psu.edu/47625.html

- Enhancing Supervised Learning with Unlabeled Data (2000). http://citeseer.ist.psu.edu/goldman00enhancing.html

Post to Twitter