Maximum-Margin Matrix Factorization

Assume Y is an n-by-m matrix containing n users' preferences about m movies, such that <math>Y_{ij} = +1</math> if user i likes movie j, and <math>Y_{ij} = -1</math> if he dislikes it. Due to lack of knowledge about the users' opinions, Y is only partially observable: some of its cells contain <math>\pm 1</math> values and the other cells are unknown. The main goal is to find a matrix X that preserves the knowledge in Y and predicts the values of its unknown cells.
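As an illustration (our own sketch, not taken from the paper), such a partially observed sign matrix can be represented with a small array; the use of 0 to mark unobserved cells is a convention chosen here for convenience:

<pre>
import numpy as np

# Illustrative example: 4 users x 5 movies.
# +1 = likes, -1 = dislikes, 0 marks an unobserved cell
# (the 0 convention is our own choice for this sketch).
Y = np.array([
    [+1,  0, -1, +1,  0],
    [ 0, +1, -1,  0, +1],
    [+1, +1,  0, -1,  0],
    [ 0,  0, +1, -1, -1],
])

observed = Y != 0          # boolean mask of known entries
print(observed.sum(), "of", Y.size, "entries are observed")
</pre>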
 
Predicting the unknown values in this problem is possible because the rows and columns of Y are assumed to be related to each other. One can express this ''relation'' through the rank of X; in other words, the rank of X indicates the number of '''features''' affecting the values in Y. Therefore, minimizing the number of features, i.e. the rank of X, amounts to finding a ''simple'' relation underlying the given knowledge of Y. In addition, keeping the number of features low is one way to avoid over-fitting in the prediction process.
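To make the feature interpretation concrete, any rank-k matrix X can be written as a product of two factor matrices,

<math>X = UV^T, \qquad U \in \mathbb{R}^{n \times k}, \quad V \in \mathbb{R}^{m \times k},</math>

where row i of U describes user i by k feature weights and row j of V describes movie j in terms of the same k features, so each prediction <math>X_{ij}</math> is the inner product of a user's and a movie's feature vectors.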
 
In the first step, assume we have Y with no unknowns, and the goal is to find X as a simple (low-rank) representation of Y. By choosing the sum-squared error loss function, one can easily use the SVD technique to minimize the loss and find X. Although this solution is suitable for lowering the rank, it does not work when Y has unknown entries, because the resulting optimization problem has several local minima.
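As a sketch of this fully observed case (our own illustration in numpy, not code from the paper), the truncated SVD gives the best rank-k approximation of Y under sum-squared error, by the Eckart–Young theorem:

<pre>
import numpy as np

def low_rank_approx(Y, k):
    """Best rank-k approximation of a fully observed matrix Y
    under sum-squared error, via truncated SVD (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Fully observed toy matrix: here Y itself has rank 1,
# so the rank-1 approximation X reproduces it exactly.
Y = np.array([[+1., +1., -1.],
              [+1., +1., -1.],
              [-1., -1., +1.]])
X = low_rank_approx(Y, k=1)
print(np.round(X, 2))
</pre>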
