Learning Combinatorial Optimzation

From statwiki
Revision as of 13:14, 20 March 2018 by A33chow (talk | contribs) (Model)
Jump to: navigation, search

Learning Combinatorial Optimization Algorithms Over Graphs

Group Members

Abhi (Graph Theory),

Alvin (Reinforcement Learning/actual paper)

Pranav (actual paper),

Daniel (Conclusion: performance, adv, disadv, criticism)

Introduction and Problem Motivation

1) Graph Theory (MLP, TSP, Maxcut) - Common Problems to Solve are: Minimum Vertex Cover: Given a ‘graph’ G, find the minimum number of vertices to tick, so that every single edge is covered. G=(V,E,w). Where G is the Graph, V are the vertices, E is the edge, and w is the set of weights for the edges

Maximum Cut: Given a ‘graph’ G,

Travelling Salesman Problem

2) Reinforcement Learning - The core concept of Reinforcement Learning is to consider a partially observable Markov Decision Process, and a A Markov decision process is a 5-tuple [math](S,A,P_\cdot(\cdot,\cdot),R_\cdot(\cdot,\cdot),\gamma)[/math], where

  • [math]S[/math] is a finite set of states (they do not have to be, but for the purpose of this paper, we assume for it to be),
  • [math]A[/math] is a finite set of actions (generally only feasible actions) (alternatively, [math]A_s[/math] is the finite set of actions available from state [math]s[/math]),
  • [math]P_a(s,s') = \Pr(s_{t+1}=s' \mid s_t = s, a_t=a)[/math] is the probability that action [math]a[/math] in state [math]s[/math] at time [math]t[/math] will lead to state [math]s'[/math] at time [math]t+1[/math],
  • [math]R_a(s,s')[/math] is the immediate reward (or expected immediate reward) received after transitioning from state [math]s[/math] to state [math]s'[/math], due to action [math]a[/math], furthermore, it is between two consecutive time periods
  • [math]\gamma \in [0,1][/math] is the discount factor, which represents the difference in importance between future rewards and present rewards.

In Reinforcement Learning, the rules are generally stochastic, which means that we associate a probability with choosing an action as opposed to deterministic choice of an action. Some other talks have elucidated about this, however, in detail, the idea is that, to maintain exploration-exploitation tradeoffs it's a good idea to have a list of probabilities as opposed to random values.


Testing $X_i$ = 50


The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called structure2vec (S2V). S2V takes a graph as input and converts the properties of the nodes in the graph as features. Some of these properties or features include a node’s graph neighbourhood which may or may not be useful depending on the problem. In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, however it may not be as useful in problems such as Traveling Salesman Problem. Another criticism for the paper is in their choice of reinforcement learning algorithm. The authors decide to use the Deep Q Learning (DQN) algorithm in their experiments and tests. However, they did not consider using Asynchronous Advantage Actor Critic (A3C) which is a fast and popular Reinforcement learning algorithm that provides an simple and lightweight advantage to its processing.


The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. Where the problem structure remains largely the same except for specific data values. Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.


Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. In Neural Information Processing Systems, 2017