stat946F18: Difference between revisions

(269 intermediate revisions by 41 users not shown)
Line 1: Line 1:
== [[F18-STAT946-Proposal| Project Proposal ]] ==
=Paper presentation=
[https://goo.gl/forms/8NucSpF36K6IUZ0V2 Your feedback on presentations]
= Record your contributions here [https://docs.google.com/spreadsheets/d/1SxkjNfhOg_eXWpUnVHuIP93E6tEiXEdpm68dQGencgE/edit?usp=sharing]=
Use the following notations:
P: You have written a summary/critique on the paper.
T: You had a technical contribution on a  paper (excluding the paper that you present).
E: You had an editorial contribution on a  paper (excluding the paper that you present).
{| class="wikitable"
{| class="wikitable"


Line 11: Line 32:
|width="30pt"|Link to the summary
|-
|-
|Feb 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [http://wikicoursenote.com/wiki/Stat946f15/Sequence_to_sequence_learning_with_neural_networks#Long_Short-Term_Memory_Recurrent_Neural_Network Summary]
|Feb 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat946w18/Unsupervised_Machine_Translation_Using_Monolingual_Corpora_Only Summary]
|-
|Oct 25 || Dhruv Kumar || 1 || Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs || [https://openreview.net/pdf?id=rkRwGg-0Z Paper] ||
[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat946F18/Beyond_Word_Importance_Contextual_Decomposition_to_Extract_Interactions_from_LSTMs Summary]
[https://wiki.math.uwaterloo.ca/statwiki/images/e/ea/Beyond_Word_Importance.pdf Slides]
|-
|Oct 25 || Amirpasha Ghabussi || 2 || DCN+: Mixed Objective And Deep Residual Coattention for Question Answering || [https://openreview.net/pdf?id=H1meywxRW Paper] ||
[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=DCN_plus:_Mixed_Objective_And_Deep_Residual_Coattention_for_Question_Answering Summary]
|-
|Oct 25 || Juan Carrillo || 3 || Hierarchical Representations for Efficient Architecture Search  || [https://arxiv.org/abs/1711.00436 Paper]  ||
[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat946F18/Hierarchical_Representations_for_Efficient_Architecture_Search Summary]
[https://wiki.math.uwaterloo.ca/statwiki/images/1/15/HierarchicalRep-slides.pdf Slides]
|-
|Oct 30 ||  Manpreet Singh Minhas || 4 || End-to-end Active Object Tracking via Reinforcement Learning || [http://proceedings.mlr.press/v80/luo18a/luo18a.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=End_to_end_Active_Object_Tracking_via_Reinforcement_Learning Summary]
|-
|Oct 30 ||  Marvin Pafla || 5 || Fairness Without Demographics in Repeated Loss Minimization ||  [http://proceedings.mlr.press/v80/hashimoto18a.html Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Fairness_Without_Demographics_in_Repeated_Loss_Minimization Summary]
|-
|Oct 30 ||  Glen Chalatov || 6 || Pixels to Graphs by Associative Embedding || [http://papers.nips.cc/paper/6812-pixels-to-graphs-by-associative-embedding Paper] ||
[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Pixels_to_Graphs_by_Associative_Embedding Summary]
|-
|Nov 1 ||  Sriram Ganapathi Subramanian || 7 ||Differentiable plasticity: training plastic neural networks with backpropagation || [http://proceedings.mlr.press/v80/miconi18a.html Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat946F18/differentiableplasticity Summary]
[https://wiki.math.uwaterloo.ca/statwiki/images/3/3c/Deep_learning_course_presentation.pdf Slides]
|-
|Nov 1 ||  Hadi Nekoei || 8 || Synthesizing Programs for Images using Reinforced Adversarial Learning ||  [http://proceedings.mlr.press/v80/ganin18a.html Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Synthesizing_Programs_for_Images_usingReinforced_Adversarial_Learning Summary]
[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:Synthesizing_Programs_for_Images_using_Reinforced_Adversarial_Learning.pdf Slides]
|-
|Nov 1 ||  Henry Chen || 9 || DeepVO: Towards end-to-end visual odometry with deep Recurrent Convolutional Neural Networks || [https://ieeexplore.ieee.org/abstract/document/7989236 Paper] ||
[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=DeepVO_Towards_end_to_end_visual_odometry_with_deep_RNN Summary]
[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:DeepVO_Presentation_Henry.pdf Slides]
|-
|Nov 6 ||  Nargess Heydari || 10  ||Wavelet Pooling For Convolutional Neural Networks || [https://openreview.net/pdf?id=rkhlb8lCZ Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat946w18/Wavelet_Pooling_For_Convolutional_Neural_Networks Summary] [https://wiki.math.uwaterloo.ca/statwiki/images/1/1a/Wavelet_Pooling_for_Convolutional_Neural_Networks.pptx Slides]
|-
|Nov 6 ||  Aravind Ravi || 11 || Towards Image Understanding from Deep Compression Without Decoding || [https://openreview.net/forum?id=HkXWCMbRW Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat946w18/Towards_Image_Understanding_From_Deep_Compression_Without_Decoding Summary]
[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:DL_STAT946_PPT_AravindRavi.pdf Slides]
|-
|Nov 6 ||  Ronald Feng || 12  || Learning to Teach || [https://openreview.net/pdf?id=HJewuJWCZ Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_to_Teach Summary]
[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:946_L2T_slides.pdf Slides]
|-
|Nov 8 ||  Neel Bhatt || 13 || Annotating Object Instances with a Polygon-RNN || [https://www.cs.utoronto.ca/~fidler/papers/paper_polyrnn.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Annotating_Object_Instances_with_a_Polygon_RNN Summary] [https://wiki.math.uwaterloo.ca/statwiki/images/a/af/ANNOTATING_OBJECT_INSTANCES_NEEL_BHATT.pdf Slides]
|-
|Nov 8 ||  Jacob Manuel || 14 || Co-teaching: Robust Training Deep Neural Networks with Extremely Noisy Labels || [https://arxiv.org/pdf/1804.06872.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Co-Teaching Summary] [https://wiki.math.uwaterloo.ca/statwiki/images/3/33/Co-Teaching.pdf Slides]
|-
|Nov 8 ||  Charupriya Sharma|| 15 || A Bayesian Perspective on Generalization and Stochastic Gradient Descent||  [https://openreview.net/pdf?id=BJij4yg0Z Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=A_Bayesian_Perspective_on_Generalization_and_Stochastic_Gradient_Descent Summary]
|-
|Nov 13 || Sagar Rajendran  || 16 || Zero-Shot Visual Imitation || [https://openreview.net/pdf?id=BkisuzWRW Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Zero-Shot_Visual_Imitation Summary]
|-
 
|Nov 13 || Ruijie Zhang || 17 || Searching for Efficient Multi-Scale Architectures for Dense Image Prediction || [https://arxiv.org/pdf/1809.04184.pdf Paper]|| [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Searching_For_Efficient_Multi_Scale_Architectures_For_Dense_Image_Prediction Summary]
|-
|Nov 13 || Neil Budnarain  || 18 || Predicting Floor Level For 911 Calls with Neural Networks and Smartphone Sensor Data || [https://openreview.net/pdf?id=ryBnUWb0b Paper]  || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Predicting_Floor_Level_For_911_Calls_with_Neural_Network_and_Smartphone_Sensor_Data  Summary]
|-
|Nov 15 ||  Zheng Ma || 19 || Reinforcement Learning of Theorem Proving  ||  [https://arxiv.org/abs/1805.07563 Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Reinforcement_Learning_of_Theorem_Proving Summary] [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:zheng_946_presentation.pdf Slides]
|-
|Nov 15 || Abdul Khader Naik  || 20 || Multi-View Data Generation Without View Supervision || [https://openreview.net/pdf?id=ryRh0bb0Z Paper]  || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=MULTI-VIEW_DATA_GENERATION_WITHOUT_VIEW_SUPERVISION Summary]
|-
|Nov 15 || Johra Muhammad Moosa  || 21 || Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin || [https://papers.nips.cc/paper/7255-attend-and-predict-understanding-gene-regulation-by-selective-attention-on-chromatin.pdf Paper]  || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Attend_and_Predict:_Understanding_Gene_Regulation_by_Selective_Attention_on_Chromatin Summary] [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:Attend_and_Predict.pdf Slides]
|-
|Nov 20 || Zahra Rezapour Siahgourabi  || 22 ||Robot Learning in Homes: Improving Generalization and Reducing Dataset Bias  ||[https://arxiv.org/pdf/1807.07049 Paper]  ||
[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Robot_Learning_in_Homes:_Improving_Generalization_and_Reducing_Dataset_Bias Summary]
|-
|Nov 20 || Shubham Koundinya  || 23 || Countering Adversarial Images Using Input Transformations ||[https://openreview.net/pdf?id=SyJ7ClWCb Paper]  ||
[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Countering_Adversarial_Images_Using_Input_Transformations Summary]
[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:Countering_Adversarial_Images.pdf Slides]
|-
|Nov 20 || Salman Khan  || 24 || Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples || [http://proceedings.mlr.press/v80/athalye18a.html Paper]  || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Obfuscated_Gradients_Give_a_False_Sense_of_Security_Circumventing_Defenses_to_Adversarial_Examples Summary]
|-
|Nov 22 ||Soroush Ameli  || 25 || Learning to Navigate in Cities Without a Map ||  [https://arxiv.org/abs/1804.00168 Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_to_Navigate_in_Cities_Without_a_Map Summary]
|-
|-
|Oct 25 || Jia Chen || 1|| ||   ||  
|Nov 22 ||Ivan Li || 26 || Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction || [https://arxiv.org/pdf/1802.05451v3.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Mapping_Images_to_Scene_Graphs_with_Permutation-Invariant_Structured_Prediction Summary]
|-
|-
|Oct 25 ||   || 2|| ||   ||  
|Nov 22 ||Sigeng Chen || 27 ||Conditional Neural Processes || [http://proceedings.mlr.press/v80/garnelo18a/garnelo18a.pdf Paper] ||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=conditional_neural_process Summary]
|-
|-
|Oct 25 ||   || 3|| ||   ||  
|Nov 27 || Aileen Li  || 28 || Unsupervised Neural Machine Translation ||[https://openreview.net/pdf?id=Sy2ogebAW Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Unsupervised_Neural_Machine_Translation Summary]
|-
|-
|Oct 30 || Manpreet Singh Minhas || 1|| ||   ||  
|Nov 27 ||Xudong Peng  || 29 || Visual Reinforcement Learning with Imagined Goals || [https://arxiv.org/abs/1807.04742 Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Visual_Reinforcement_Learning_with_Imagined_Goals Summary]
|-
|-
|Nov 1 || Sriram Ganapathi Subramanian || 1|| Mean Field Multi-Agent Reinforcement Learning || [http://proceedings.mlr.press/v80/yang18d.html Paper] ||  
|Nov 27 ||Xinyue Zhang  || 30 || Policy Optimization with Demonstrations || [http://proceedings.mlr.press/v80/kang18a/kang18a.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=policy_optimization_with_demonstrations Summary]
|-
|-
|Nov 13 || Sagar Rajendran  || 1|| Zero-Shot Visual Imitation || [https://openreview.net/pdf?id=BkisuzWRW Paper] ||
|-
|-
|Nov 13 ||  || 2||  ||   ||  
|Nov 29 ||Junyi Zhang   || 31 || Autoregressive Convolutional Neural Networks for Asynchronous Time Series || [http://proceedings.mlr.press/v80/binkowski18a/binkowski18a.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat946F18/Autoregressive_Convolutional_Neural_Networks_for_Asynchronous_Time_Series Summary]
[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:SOCNN.pdf Slides]
|-
|-
|Nov 15 ||  || 3|| ||   ||  
|Nov 29 ||Travis Bender   || 32 || ShakeDrop Regularization || [https://arxiv.org/pdf/1802.02375.pdf Paper]  || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=ShakeDrop_Regularization Summary] [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:ShakeDrop_Regularization.pdf Slides]
|-
|-
|Nov 15 || Abdul Khader Naik || 4||  ||   ||  
|Nov 29 ||Patrick Li || 33 || Dynamic Routing Between Capsules || [https://arxiv.org/pdf/1710.09829.pdf Paper] ||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=CapsuleNets Summary] [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:STAT946_Presentation1.pdf Slides]
|-
|-
|Nov 20 || Zahra Rezapour Siahgourabi  || 19|| ||   ||  
|Nov 30 || Jiazhen Chen  || 34 || Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning || [https://arxiv.org/abs/1809.02121 Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=learn_what_not_to_learn Summary]
|-
|-
|Nov 20 || Shubham Koundinya || 6|| ||   ||  
|Nov 30 || Gaurav Sahu || 35 || Fix your classifier: the marginal value of training the last weight layer || [https://openreview.net/pdf?id=S1Dh8Tg0- Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Fix_your_classifier:_the_marginal_value_of_training_the_last_weight_layer Summary]
|-
|-
|Nov 22 ||Soroush Ameli  || 22|| ||   ||  
|Nov 23 || Kashif Khan || 36 || Wasserstein Auto-Encoders || [https://arxiv.org/pdf/1711.01558.pdf Paper]  || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Wasserstein_Auto-encoders Summary]
|-
|-
|Nov 27 || Aileen Li || 8|| Visual Interaction Networks: Learning a Physics Simulator from Video ||[http://papers.nips.cc/paper/7040-visual-interaction-networks-learning-a-physics-simulator-from-video.pdf Paper]   ||  
|Nov 23 || Shala Chen || 37 || A Neural Representation of Sketch Drawings || [https://arxiv.org/pdf/1704.03477.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_neural_representation_of_sketch_drawings Summary]
|-
|-
|Nov 27 ||Xudong Peng  || 9|| ||   ||  
|Nov 30 || Ki Beom Lee || 38 || Detecting Statistical Interactions from Neural Network Weights|| [https://openreview.net/forum?id=ByOfBggRZ Paper] ||
[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=DETECTING_STATISTICAL_INTERACTIONS_FROM_NEURAL_NETWORK_WEIGHTS Summary]
|-
|-
|Nov 27 ||Xinyue Zhang  || 10|| A Distributional Perspective on Reinforcement Learning || [http://proceedings.mlr.press/v70/bellemare17a/bellemare17a.pdf Paper] ||  
|Nov 23 || Wesley Fisher || 39 || Deep Reinforcement Learning in Continuous Action Spaces: a Case Study in the Game of Simulated Curling || [http://proceedings.mlr.press/v80/lee18b/lee18b.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Deep_Reinforcement_Learning_in_Continuous_Action_Spaces_a_Case_Study_in_the_Game_of_Simulated_Curling Summary]
|-
|-
|Nov 29 ||Junyi Zhang  || 11|| ||   ||  
|Nov 30 || Ahmed Afify || 40 ||Don't Decay the Learning Rate, Increase the Batch Size || [https://openreview.net/pdf?id=B1Yy1BxCZ  Paper]|| [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=DON'T_DECAY_THE_LEARNING_RATE_,_INCREASE_THE_BATCH_SIZE Summary]
|-
|-
|Nov 29 ||  || 12||  ||  ||

Latest revision as of 02:22, 2 December 2018

Project Proposal

Paper presentation

Your feedback on presentations


Record your contributions here [1]

Use the following notations:

P: You have written a summary/critique on the paper.

T: You had a technical contribution on a paper (excluding the paper that you present).

E: You had an editorial contribution on a paper (excluding the paper that you present).




{| class="wikitable"
|Date || Name || Paper number || Title || Link to the paper || Link to the summary
|-
|Feb 15 (example) || Ri Wang || || Sequence to sequence learning with neural networks. || Paper || Summary
|-
|Oct 25 || Dhruv Kumar || 1 || Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs || Paper || Summary Slides
|-
|Oct 25 || Amirpasha Ghabussi || 2 || DCN+: Mixed Objective And Deep Residual Coattention for Question Answering || Paper || Summary
|-
|Oct 25 || Juan Carrillo || 3 || Hierarchical Representations for Efficient Architecture Search || Paper || Summary Slides
|-
|Oct 30 || Manpreet Singh Minhas || 4 || End-to-end Active Object Tracking via Reinforcement Learning || Paper || Summary
|-
|Oct 30 || Marvin Pafla || 5 || Fairness Without Demographics in Repeated Loss Minimization || Paper || Summary
|-
|Oct 30 || Glen Chalatov || 6 || Pixels to Graphs by Associative Embedding || Paper || Summary
|-
|Nov 1 || Sriram Ganapathi Subramanian || 7 || Differentiable plasticity: training plastic neural networks with backpropagation || Paper || Summary Slides
|-
|Nov 1 || Hadi Nekoei || 8 || Synthesizing Programs for Images using Reinforced Adversarial Learning || Paper || Summary Slides
|-
|Nov 1 || Henry Chen || 9 || DeepVO: Towards end-to-end visual odometry with deep Recurrent Convolutional Neural Networks || Paper || Summary Slides
|-
|Nov 6 || Nargess Heydari || 10 || Wavelet Pooling For Convolutional Neural Networks || Paper || Summary Slides
|-
|Nov 6 || Aravind Ravi || 11 || Towards Image Understanding from Deep Compression Without Decoding || Paper || Summary Slides
|-
|Nov 6 || Ronald Feng || 12 || Learning to Teach || Paper || Summary Slides
|-
|Nov 8 || Neel Bhatt || 13 || Annotating Object Instances with a Polygon-RNN || Paper || Summary Slides
|-
|Nov 8 || Jacob Manuel || 14 || Co-teaching: Robust Training Deep Neural Networks with Extremely Noisy Labels || Paper || Summary Slides
|-
|Nov 8 || Charupriya Sharma || 15 || A Bayesian Perspective on Generalization and Stochastic Gradient Descent || Paper || Summary
|-
|Nov 13 || Sagar Rajendran || 16 || Zero-Shot Visual Imitation || Paper || Summary
|-
|Nov 13 || Ruijie Zhang || 17 || Searching for Efficient Multi-Scale Architectures for Dense Image Prediction || Paper || Summary
|-
|Nov 13 || Neil Budnarain || 18 || Predicting Floor Level For 911 Calls with Neural Networks and Smartphone Sensor Data || Paper || Summary
|-
|Nov 15 || Zheng Ma || 19 || Reinforcement Learning of Theorem Proving || Paper || Summary Slides
|-
|Nov 15 || Abdul Khader Naik || 20 || Multi-View Data Generation Without View Supervision || Paper || Summary
|-
|Nov 15 || Johra Muhammad Moosa || 21 || Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin || Paper || Summary Slides
|-
|Nov 20 || Zahra Rezapour Siahgourabi || 22 || Robot Learning in Homes: Improving Generalization and Reducing Dataset Bias || Paper || Summary
|-
|Nov 20 || Shubham Koundinya || 23 || Countering Adversarial Images Using Input Transformations || Paper || Summary Slides
|-
|Nov 20 || Salman Khan || 24 || Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples || Paper || Summary
|-
|Nov 22 || Soroush Ameli || 25 || Learning to Navigate in Cities Without a Map || Paper || Summary
|-
|Nov 22 || Ivan Li || 26 || Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction || Paper || Summary
|-
|Nov 22 || Sigeng Chen || 27 || Conditional Neural Processes || Paper || Summary
|-
|Nov 27 || Aileen Li || 28 || Unsupervised Neural Machine Translation || Paper || Summary
|-
|Nov 27 || Xudong Peng || 29 || Visual Reinforcement Learning with Imagined Goals || Paper || Summary
|-
|Nov 27 || Xinyue Zhang || 30 || Policy Optimization with Demonstrations || Paper || Summary
|-
|Nov 29 || Junyi Zhang || 31 || Autoregressive Convolutional Neural Networks for Asynchronous Time Series || Paper || Summary Slides
|-
|Nov 29 || Travis Bender || 32 || ShakeDrop Regularization || Paper || Summary Slides
|-
|Nov 29 || Patrick Li || 33 || Dynamic Routing Between Capsules || Paper || Summary Slides
|-
|Nov 30 || Jiazhen Chen || 34 || Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning || Paper || Summary
|-
|Nov 30 || Gaurav Sahu || 35 || Fix your classifier: the marginal value of training the last weight layer || Paper || Summary
|-
|Nov 23 || Kashif Khan || 36 || Wasserstein Auto-Encoders || Paper || Summary
|-
|Nov 23 || Shala Chen || 37 || A Neural Representation of Sketch Drawings || Paper || Summary
|-
|Nov 30 || Ki Beom Lee || 38 || Detecting Statistical Interactions from Neural Network Weights || Paper || Summary
|-
|Nov 23 || Wesley Fisher || 39 || Deep Reinforcement Learning in Continuous Action Spaces: a Case Study in the Game of Simulated Curling || Paper || Summary
|-
|Nov 30 || Ahmed Afify || 40 || Don't Decay the Learning Rate, Increase the Batch Size || Paper || Summary
|}