User talk:Ahsh: Difference between revisions

From statwiki
Jump to navigation Jump to search
Line 5: Line 5:


== '''CoFi_RANK: Maximum Margin Matrix Factorization for Collaborative Ranking''' ==
== '''CoFi_RANK: Maximum Margin Matrix Factorization for Collaborative Ranking''' ==


== Problem Statement: ==
== Problem Statement: ==

Revision as of 00:04, 28 July 2009

Welcome to Wiki Course Notes! We hope you will contribute much and well. You will probably want to read the help pages. Again, welcome and have fun! WikiSysop 20:18, 25 July 2009 (UTC)

CoFi_RANK: Maximum Margin Matrix Factorization for Collaborative Ranking

Problem Statement:

The underlying intelligent tools behind the webshoppers such as Amazon, Netflix, and Apple learn a suggestion function based on the the current user's and the others ratings in order to offer personalized recommendations. To this end, collaborative filtering provided a promising approach in which the rating patterns (of the products) by the current user and the others are used to estimate rates (or ranking) for unrated items. The task is more challenging once the user is unknown for the system (i.e., there is not any rating records from this user). Two different strategies might be incorporated for offering the recommendation list: rating or ranking. Ranking is different from rating in which the set of recommendations is obtaind directly, rather than first finding the rates and then sort them accordingly. For collaborative ratings, Maximum Marging Matrix Factorization (MMMF) had a promising result for estimating the unknown rates. This paper extends the use of MMMF for collaborative ranking. Since the top ranked items (products) are offered to the user, it is more important to predict what a user might like than what he/she dislikes. In other words, the items with higher rank should be ranked more accuratly than the last ones.


Objectives:

The algorithm should

1- directly optimize the ranking scores, 2- be adaptable to different scores, 3- not need any features extraction besides the actual ratings, 4- be scalable and parralizable with large number of items and users.

Definitions:

Polya-Littlewood-Hardy inequality

For any two vectors a, b , their inner product is maximized when a, b are sorted in the same order. That is [math]\displaystyle{ \lt a,b\gt \lt = \lt sort(a),sort(b)\gt }[/math]

Normalized Discounted Comulative Gain

Consider the rating vector [math]\displaystyle{ y\in\{1,...,r\}^{n} }[/math], and [math]\displaystyle{ \pi }[/math] the permutation of the rating vector. For the trucation threshold [math]\displaystyle{ k }[/math] (the number of recommendations), the Discounted Comulative Gains (DCG) score is defined as: [math]\displaystyle{ DCG@k(y,\pi)=\sum_{i=1}^{k}\frac{2^{y_{\pi_i}-1}}{log(i+2)} }[/math] Given the permutation [math]\displaystyle{ \pi_s }[/math] which sorts the [math]\displaystyle{ y }[/math] in decreasing order, the Normalized Discounted Comulative Gains (NDCG) score is given by:[math]\displaystyle{ NDCG@k(y,\pi)=\frac{DCG@k(y,\pi)}{DCG@k(y,\pi_{s})} }[/math]

According to the Polya-Littlewood-Hardy inequality, the DCG has the highest score when the [math]\displaystyle{ y }[/math] is decreasingly ordered. Note that the weighting vector [math]\displaystyle{ \frac{1}{log(i+2)} }[/math] is a monotinically decreasing function.

Problem Statement

Estimate a matrix [math]\displaystyle{ F\in R^{m\times u} }[/math] and use the values [math]\displaystyle{ F_{ij} }[/math] for ranking the item [math]\displaystyle{ j }[/math] by user [math]\displaystyle{ i }[/math]. Therefore,given a matrix [math]\displaystyle{ Y }[/math] of known ratings (in which row i shows the ratings of a user for item j) we are aiming to maxmize the performance of [math]\displaystyle{ F }[/math] defined as: [math]\displaystyle{ R(F,Y)=\sum_{i=1}^{u}NDCG@k(\prod^i,Y^i) }[/math] in which [math]\displaystyle{ \prod^i }[/math] is argsort(-F^i), i.e., the permutation that sorts F in decreasing order. The performance function is discritized (piecewise constant) and, thus, non-convex. In this paper, the structured estimation is used to convert this non-convex problem to a convex (upper bound) minimization. This convesion consists of three steps: 1- Converting [math]\displaystyle{ NDCG(\pi,y) }[/math] into a loss using the regret 2- Linear mapping of rating vector in sorted order 3- Using the convex upper-bound technique to combine the regret and linear map into a convex upper-bound minimization.

(1) Regret Convesion

Instead of maximizing NDCG(\pi,y), we minimize the non-negative regret function [math]\displaystyle{ \delta(\pi,y):=1-NDCG(\pi,y) }[/math] in which vanishes at [math]\displaystyle{ \pi=\pi_{s} }[/math] .

(2) Linear Mapping

Using the Polya-Littlewood-Hardy inequality, we define the linear mapp [math]\displaystyle{ \psi }[/math] to encode the permutation[math]\displaystyle{ \pi=argsort(f) }[/math]. [math]\displaystyle{ \psi(\pi,f):=\lt c,f_{\pi}\gt }[/math]. In this mapping, the [math]\displaystyle{ c }[/math] is a monotonically decreasing function (e.g., [math]\displaystyle{ c_i=(i+1)^{-0.25} }[/math] ).

(3)Convex Upper Bound

The follwoing lemma explains how to find a convex upper-bounds on non-convex optimization.

Lemma==

Assume that [math]\displaystyle{ \psi }[/math] is the abovementioned linear map, and [math]\displaystyle{ \pi^{\ast}:=argsort(-f) }[/math] is the ranking induced by [math]\displaystyle{ f }[/math]. Then, the follwoing loss function [math]\displaystyle{ l(f,y) }[/math] is convex in [math]\displaystyle{ f }[/math] and it satisfies [math]\displaystyle{ l(f,y)\gt = \delta(y,\pi^{\ast}) }[/math]. [math]\displaystyle{ l(f,y):=\max\underline{\pi}[\Delta (\pi,y)+\lt c,f_{\pi}\gt ] }[/math]