Wide and Deep Learning for Recommender Systems
This paper presents a jointly trained wide linear models and deep neural networks architecture - Wide & Deep Learning. In the past, deep neural networks which is good at generalization and generalized linear models with nonlinear feature transformations methods which is good at memorization are widely used in the recommender system. However, combining the benefits of the two models can achieve both memorization and generalization at the same time in recommender system. With jointly training wide linear models and deep neural networks, this paper has demonstrated that a newly proposed Wide & Deep learning outperforms wide-only and deep-only models in recommender systems under the Google Play app with over one billion active users and over one million apps.
1. Embedding-based models like factorization machines (S. Rendle, 2012) or deep neural networks learns a low-dimensional dense embedding vector for each query and item feature to generalize on query-item feature pairs that have never been seen before by learning, but with less work on feature engineering. However, under a sparse and high rank query-item matrix, it is hard to learn the low-dimensional representation for the query-item matrix, which lacks in memorization.
2. Generalized linear models like logistic regression are trained on binarized sparse features with one-hot encoding using cross-product transformation to achieve memorization, but these models do not generalize to unseen query-item feature pairs.
 J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159, July 2011.
 K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2016.
 H. B. McMahan. Follow-the-regularized-leader and mirror descent: Equivalence theorems and l1 regularization. In Proc. AISTATS, 2011.
 T. Mikolov, A. Deoras, D. Povey, L. Burget, and J. H. Cernocky. Strategies for training large scale neural network language models. In IEEE Automatic Speech Recognition & Understanding Workshop, 2011.
 S. Rendle. Factorization machines with libFM. ACM Trans. Intell. Syst. Technol., 3(3):57:1–57:22, May 2012.abc
 J. J. Tompson, A. Jain, Y. LeCun, and C. Bregler. Joint training of a convolutional network and a graphical model for human pose estimation. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, NIPS, pages 1799–1807. 2014.
 H. Wang, N. Wang, and D.-Y. Yeung. Collaborative deep learning for recommender systems. In Proc. KDD, pages 1235–1244, 2015.
 B. Yan and G. Chen. AppJoy: Personalized mobile application discovery. In MobiSys, pages 113–126, 2011.