Wide and Deep Learning for Recommender Systems

Presented by

Junbin Pan

Introduction

This paper presents Wide & Deep Learning, an architecture that jointly trains wide linear models and deep neural networks. In the past, deep neural networks, which are good at generalization, and generalized linear models with nonlinear feature transformations, which are good at memorization, have each been widely used in recommender systems. Combining the benefits of the two models can achieve both memorization and generalization at the same time. By jointly training wide linear models and deep neural networks, the paper demonstrates that the proposed Wide & Deep Learning outperforms wide-only and deep-only models for app recommendation on Google Play, a commercial mobile app store with over one billion active users and over one million apps.

Related Work

1. Generalized linear models like logistic regression are trained on binarized sparse features obtained by one-hot encoding, together with cross-product feature transformations, to achieve memorization. However, these models do not generalize to query-item feature pairs that never appeared in the training data; that is, they lack generalization.

2. Embedding-based models like factorization machines (S. Rendle, 2012) or deep neural networks learn a low-dimensional dense embedding vector for each query and item feature, which lets them generalize to query-item feature pairs that have never been seen before, with less feature-engineering effort. However, when the query-item matrix is sparse and high-rank, it is hard to learn an effective low-dimensional representation, so these models lack memorization. A small sketch contrasting the two approaches is given after this list.
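
To make the contrast concrete, the snippet below is a minimal sketch, not the paper's implementation: the feature names user_installed_app and impression_app follow the paper's running example, while the vocabulary, embedding dimension, and helper functions are hypothetical. The wide side builds binarized features plus a cross-product feature that only fires for an exact co-occurrence (memorization); the deep side maps each categorical value to a dense embedding, so even a pair never observed together still receives a score (generalization).

# Minimal sketch contrasting wide (cross-product) and deep (embedding) features.
# Feature names follow the paper's example; everything else is hypothetical.
import numpy as np

# --- Wide side: binarized sparse features and a cross-product feature. ---
def one_hot_name(feature, value):
    """Name of a binarized sparse feature, e.g. 'impression_app=pandora'."""
    return f"{feature}={value}"

def cross_product_name(example, f1, f2):
    """Cross-product transformation AND(f1=v1, f2=v2): the resulting binary
    feature is 1 only when both constituent features are active, so the
    wide model can memorize this exact co-occurrence."""
    return f"AND({f1}={example[f1]}, {f2}={example[f2]})"

example = {"user_installed_app": "netflix", "impression_app": "pandora"}
wide_features = [one_hot_name(f, v) for f, v in example.items()]
wide_features.append(cross_product_name(example, "user_installed_app", "impression_app"))
print(wide_features)
# ['user_installed_app=netflix', 'impression_app=pandora',
#  'AND(user_installed_app=netflix, impression_app=pandora)']

# --- Deep side: each categorical value gets a dense low-dimensional embedding. ---
rng = np.random.default_rng(0)
vocab = ["netflix", "pandora", "spotify"]          # hypothetical vocabulary
embed_dim = 4
embedding_table = {app: rng.normal(size=embed_dim) for app in vocab}

# Even a pair never seen together in training still produces a score,
# because both items have embeddings learned from other interactions.
score = embedding_table["netflix"] @ embedding_table["spotify"]
print(f"generalized score for an unseen pair: {score:.3f}")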

Motivation

Can we build a model that achieves both memorization and generalization? This question motivates the idea of jointly training wide and deep models, namely Wide & Deep Learning.

The generalization of linear models with cross-product transformations can be improved by adding features that are less granular, but this requires a lot of manual feature engineering. On the other hand, embedding-based models can over-generalize when the query-item interactions are sparse; linear models with cross-product feature transformations can memorize such exception rules with a small number of parameters.

Thus, the way to handle both problems is to combine a wide model and a deep model and train them together. This is the Wide & Deep Learning framework proposed by Heng-Tze Cheng et al. [1]: a wide linear component with cross-product feature transformations (for memorization) and a deep neural network component built on feature embeddings (for generalization) are trained jointly, with both components contributing to a single prediction.
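
The combined prediction can be sketched as follows. This is a minimal illustration with hypothetical variable names, assuming the wide features and the deep network's final hidden activations have already been computed; it is not the paper's implementation. The point is that the wide and deep logits are summed into a single logit and trained against one logistic loss, so the two components are optimized jointly rather than ensembled after the fact.

# Minimal sketch of the joint Wide & Deep prediction (hypothetical names):
# P(Y=1 | x) = sigmoid(w_wide . [x, phi(x)] + w_deep . a_final + b)
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def wide_deep_predict(x_wide, a_deep, w_wide, w_deep, b):
    """Sum of the wide logit and the deep logit plus a bias, fed to one
    sigmoid; both weight vectors receive gradients from the same log loss,
    which is what makes the training joint."""
    return sigmoid(w_wide @ x_wide + w_deep @ a_deep + b)

# Toy dimensions: 5 wide features (raw + cross-product), 3 deep activations.
rng = np.random.default_rng(0)
x_wide = np.array([1.0, 0.0, 1.0, 0.0, 1.0])  # active binarized features
a_deep = rng.normal(size=3)                    # final hidden layer of the deep part
p = wide_deep_predict(x_wide, a_deep,
                      w_wide=rng.normal(size=5),
                      w_deep=rng.normal(size=3),
                      b=0.0)
print(f"P(Y=1 | x) = {p:.3f}")

In the paper, the wide component is optimized with the FTRL algorithm with L1 regularization [4] and the deep component with AdaGrad [2], with both parts receiving gradients from the same loss at each mini-batch step.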

Model Architecture

abc

Model Results

abc

Conclusion

abc

Critiques

abc

References

[1] H.-T. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir, R. Anil, Z. Haque, L. Hong, V. Jain, X. Liu, and H. Shah. Wide & Deep learning for recommender systems. In Proc. 1st Workshop on Deep Learning for Recommender Systems (DLRS), pages 7–10, 2016.

[2] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159, July 2011.

[3] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2016.

[4] H. B. McMahan. Follow-the-regularized-leader and mirror descent: Equivalence theorems and L1 regularization. In Proc. AISTATS, 2011.

[5] T. Mikolov, A. Deoras, D. Povey, L. Burget, and J. H. Cernocky. Strategies for training large scale neural network language models. In IEEE Automatic Speech Recognition & Understanding Workshop, 2011.

[6] S. Rendle. Factorization machines with libFM. ACM Trans. Intell. Syst. Technol., 3(3):57:1–57:22, May 2012.

[7] J. J. Tompson, A. Jain, Y. LeCun, and C. Bregler. Joint training of a convolutional network and a graphical model for human pose estimation. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, NIPS, pages 1799–1807, 2014.

[8] H. Wang, N. Wang, and D.-Y. Yeung. Collaborative deep learning for recommender systems. In Proc. KDD, pages 1235–1244, 2015.

[9] B. Yan and G. Chen. AppJoy: Personalized mobile application discovery. In MobiSys, pages 113–126, 2011.