http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34936 Learning Combinatorial Optimzation 2018-03-21T01:33:36Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper),<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, advantages, disadvantages, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> <br /> A problem is in NP if a proposed solution can be verified in polynomial time, even though no polynomial-time algorithm for finding a solution is known.<br /> The current approaches to tackling NP-hard combinatorial optimization problems are hand-crafted heuristics or approximation algorithms. While these approaches have been adequate, they require specific domain knowledge for each individual problem, or additional trial and error in navigating the tradeoff between the accuracy and the efficiency of the heuristic. However, if these problems are repeatedly solved, differing only in their data values, perhaps we could learn the heuristics and thereby automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph theory is the study of mathematical structures used to model relations between objects. A graph is made up of a set of vertices (points) and a set of edges (lines).<br /> These problems share a common notation: <br /> G=(V,E,w)<br /> <br /> where G is the graph, V is the set of vertices, E is the set of edges, and w is the set of edge weights.<br /> <br /> <br /> The problems which the paper addresses are:<br /> <br /> Minimum Vertex Cover: Given a graph G.
If there exists a set of vertices S such that every edge touches at least one vertex in S, then S is a vertex cover. In the minimum vertex cover problem, the goal is to find a vertex cover S of minimum size.<br /> A quick example is shown below: when the two red vertices are picked, every edge touches a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a graph G, a cut splits the vertices into two parts, S and T. In the max cut problem, the goal is to choose the cut so that the number of edges with one endpoint in S and the other in T is maximized.<br /> A quick example is shown below: when the two pink vertices are picked, the number of edges crossing between S and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: Given a list of cities and the distances between them, how can you visit every city and return to the origin as quickly as possible? This problem can be represented as a graph, where the vertices (V) represent the cities and the edge weights (w) represent the distances between the cities. A small example is shown below.<br /> <br /> [[File:Sales.png]]<br /> <br /> == 2. Example Problems ==<br /> <br /> One of the most common problems encountered today is the Travelling Salesman Problem. Companies like eBay and FedEx spend millions of dollars looking for better solutions. The basic premise is that a salesman in a city wants to go to people's doorsteps and sell them products; what is the best route for doing so? Many algorithms devised in the field of Combinatorics and Optimization can be used to deal with this problem.
For example, one solution might be to always visit the next nearest house and try to sell the product there; another might be to first get a list of candidates likely to purchase your products, visit all of those first, and then fall back to the original strategy. Problems such as this take a long time to solve exactly, and they are examples of the problems studied in graph theory.<br /> <br /> == 3. Representation ==<br /> The representation chosen for the graph, as mentioned above, is known as struct2vec. The intuitive idea is very similar to word2vec, a popular model for encoding words into a low-dimensional space. We begin by explaining &lt;math&gt; \hat{Q} &lt;/math&gt;, which can be thought of as summarizing the state of the graph at a given point in time. In the reinforcement learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of as a measure of quality; in this case it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard; in fact, one may always argue that there is some property of the graph that the algorithm has failed to capture. We elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology that could represent how to traverse it using the greedy algorithm. The paper makes this claim without justification; the struct2vec paper claims that measuring similarity in vertex degrees captures structural relationships between the nodes.
However, it is not impossible to think of a mathematical or intuitive counterexample to this claim, especially for the TSP and max cut.<br /> <br /> [[File:s2vimage1.png]]<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of } v&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non-linear mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of node } v&lt;/math&gt;<br /> <br /> This mapping is explicitly given by &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} \text{relu}(\theta_4 \cdot w(v, u))) &lt;/math&gt;. There are a few interesting facts about this formula. One is the use of summations, which makes the update invariant to the order of the nodes, presumably because the authors believe node order is not relevant; however, this is somewhat at odds with the baseline they compare against, which is location/order dependent. One might also ask what happens if the topology allows updates to depend on future updates; this is why the update is run for several iterations (the paper suggests 4 as a reasonable number). It makes sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, nodes become dependent on other nodes that are farther away.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. 
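The update rule above can be sketched in a few lines of numpy. This is a toy illustration only, not the paper's implementation: the graph, the parameter values, and the embedding dimension p are all made up here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy weighted graph (hypothetical): adjacency lists and symmetric edge weights.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
w = {(u, v): 1.0 for u in neighbors for v in neighbors[u]}

p = 4                    # embedding dimension (arbitrary choice for the sketch)
n = len(neighbors)

# Parameters: theta1, theta4 in R^p; theta2, theta3 in R^{p x p}.
theta1 = rng.normal(size=p)
theta2 = rng.normal(size=(p, p))
theta3 = rng.normal(size=(p, p))
theta4 = rng.normal(size=p)

x = np.zeros(n)          # node feature x_v; set to 1 once v enters the partial solution
x[0] = 1.0               # pretend node 0 was already selected
mu = np.zeros((n, p))    # embeddings initialised to zero

relu = lambda z: np.maximum(z, 0.0)

T = 4                    # number of propagation rounds suggested by the paper
for _ in range(T):
    mu_new = np.zeros_like(mu)
    for v, nbrs in neighbors.items():
        agg_mu = sum(mu[u] for u in nbrs)                    # sum of neighbour embeddings
        agg_w = sum(relu(theta4 * w[(v, u)]) for u in nbrs)  # edge-weight term
        mu_new[v] = relu(theta1 * x[v] + theta2 @ agg_mu + theta3 @ agg_w)
    mu = mu_new          # synchronous update across all nodes

print(mu.shape)          # (4, 4): one p-dimensional embedding per node
```

Because each round only mixes information between direct neighbours, running T rounds lets a node's embedding depend on nodes up to T hops away, which is the point made in the paragraph above.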
Now, with this information about our graph, we can compute the estimated value of pursuing a particular action:<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} \text{relu}([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;<br /> <br /> where &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;.<br /> <br /> == 4. Training ==<br /> <br /> Formulation of the reinforcement learning problem:<br /> <br /> 1. States - S is the state of the graph at a given time, obtained through the sequence of actions taken so far.<br /> <br /> 2. Transitions - Transitioning to another node; the last selected node is tagged with feature x = 1.<br /> <br /> 3. Actions - An action is the selection of a node of the graph that isn't already part of the sequence of actions. Actions are represented by the nodes' p-dimensional embeddings.<br /> <br /> 4. Rewards - The reward function is defined as the change in cost after an action.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G)&lt;/math&gt; <br /> <br /> which represents the change in cost evaluated from the previous state to the new state.<br /> <br /> For the three optimization problems (MVC, MAXCUT, TSP), we have different formulations of the reinforcement learning components (states, transitions, actions, rewards):<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> &lt;math&gt;\text{Learning Algorithm:} &lt;/math&gt;<br /> <br /> To learn the parameters, a combination of n-step Q-learning and fitted Q-iteration is applied.<br /> <br /> One-step Q-learning updates the function's parameters at each step by performing a gradient step to minimize the squared loss<br /> <br /> &lt;math&gt; (y - \hat{Q}(h(S_t),v_t; \Theta))^2 &lt;/math&gt; where <br /> <br /> &lt;math&gt; y = \gamma \max_{v'} \hat{Q}(h(S_{t+1}),v';\Theta) + r(S_t,v_t) &lt;/math&gt;. 
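The readout &lt;math&gt;\hat{Q}&lt;/math&gt; and the one-step target can be made concrete with a short numpy sketch. Everything here is a stand-in: the embeddings are random rather than the result of struct2vec propagation, and the reward, discount factor, and candidate set are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 4, 5

# Stand-in for the T-round struct2vec embeddings of n nodes.
mu = rng.normal(size=(n, p))

# Readout parameters: theta5 in R^{2p}; theta6, theta7 in R^{p x p}.
theta5 = rng.normal(size=2 * p)
theta6 = rng.normal(size=(p, p))
theta7 = rng.normal(size=(p, p))

relu = lambda z: np.maximum(z, 0.0)

def q_hat(mu, v):
    """theta5^T relu([theta6 * sum_u mu_u, theta7 * mu_v]): graph summary + node summary."""
    pooled = theta6 @ mu.sum(axis=0)   # graph-level term, pooled over all nodes
    local = theta7 @ mu[v]             # node-level term for the candidate action v
    return theta5 @ relu(np.concatenate([pooled, local]))

# One-step Q-learning target: y = r(S_t, v_t) + gamma * max_{v'} q_hat(S_{t+1}, v')
gamma, r = 0.9, -1.0                   # hypothetical discount and reward
candidates = [1, 2, 3]                 # nodes not yet in the partial solution
y = r + gamma * max(q_hat(mu, v) for v in candidates)

v_t = 0                                # the action actually taken
squared_loss = (y - q_hat(mu, v_t)) ** 2
```

In training, a gradient step on `squared_loss` with respect to all seven parameter groups would be taken; the sketch only evaluates the loss.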
<br /> <br /> Generalizing to n-step Q-learning addresses the issue of delayed rewards, where the immediate reward may be a poor signal of an action's value; in such instances, one-step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-iteration: this yields faster learning convergence when used on top of a neural network. In contrast to updating the Q function sample by sample, it updates the function with batches of samples drawn from a data set instead of singular samples.<br /> <br /> [[File:Algorithm_Q-learning.png]]<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve on current methods of solving graph optimization problems. However, the graph embedding network the authors use, struct2vec (S2V), takes a graph as input and encodes the properties of its nodes as features. This entails picking a fixed output dimension, which may compromise the expressiveness of the node representation, i.e., we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node's neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, but it may not be as useful in problems such as the Travelling Salesman Problem. <br /> <br /> Another criticism of the paper is the choice of algorithms to compare against. While it is entirely up to the authors what they compare their algorithm against, it does seem strange to compare their reinforcement learning algorithm against some of the weaker insertion heuristics for the TSP. In particular, a couple of those insertion heuristics are known to underperform, i.e., to produce tours far from optimal, and it would be very helpful to indicate this to the audience. A similar concern applies to pointer networks as a benchmark.<br /> <br /> == 6. 
Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. Where the problem structure remains largely the same except for specific data values. Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. In Neural Information Processing Systems, 2017<br /> <br /> https://blog.acolyer.org/2017/09/15/struc2vec-learning-node-representations-from-structural-identity/</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34935 Learning Combinatorial Optimzation 2018-03-21T01:33:20Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> <br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. 
While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). 
In a Max cut problem, the goal is to cut the graph in such a way that the number of edges that are touching both vertices of S and T at the same time is maximized.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: In the travelling Salesman problem, if you have a list of cities and their distance between each other, how can you navigate through each of the cities so that you'll visit every city, and return back to the origin as quick as possible. This problem can be represented in a Graph, where each of the vertices (V) represents the cities, and each of the edges (E) represents the distance between each of the cities. A small example is shown below.<br /> <br /> [[File:Sales.png]]<br /> <br /> == 2. Example Problems ==<br /> <br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> == 3. 
Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> [[File:s2vimage1.png]]<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. 
&lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> <br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning:<br /> <br /> 1. States - S is the state of the graph at a given time which is obtained through an action<br /> <br /> 2. Transition - Transitioning to another node; Tag the node that was last used with feature x = 1<br /> <br /> 3. 
Actions - An action is a node of the graph that isn't part of the sequence of actions. Actions are p-dimensional nodes.<br /> <br /> 4. Rewards - Reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; <br /> <br /> This represents change in cost evaluated from previous state to new state<br /> <br /> For the three optimization problems: MVC, MAXCUT, TSP; we have different formulations of reinforcement learning (States, Transitions, Actions, Rewards)<br /> <br /> [[File:reinforcement_learning.png]] Fig. 1<br /> <br /> &lt;math&gt;\text{Learning Algorithm:} &lt;/math&gt;<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. <br /> <br /> &lt;math&gt; (y- Q(h(S_t),v_t; \theta))^2 &lt;/math&gt; where <br /> <br /> &lt;math&gt; y = (\gamma max_v Q(h(S_t+1),v';\theta) + r(S_t,v_t) &lt;/math&gt;. <br /> <br /> Generalizing to n-step Q learning, it addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal. With this instance, one- step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-learning: Is a faster learning convergence when used on top of a neural network. In contrast to updating Q function sample by sample, it updates function with batches of samples from data set instead of singular samples.<br /> <br /> [[File:Algorithm_Q-learning.png]]<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V). 
S2V takes a graph as input and converts the properties of the nodes in the graph as features. This entails picking a fixed output dimension which may or may not compromise on the expressibility of the node, i.e. we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, however it may not be as useful in problems such as Traveling Salesman Problem. <br /> <br /> Another criticism for the paper is the choice of algorithms to compare against. While it is completely up to the authors to choose what they compare their algorithms against, it does seem strange to compare their Reinforcement Learning algorithm to some of the worse insertion heuristics for the TSP. In particular, there are a couple of insertion heuristics that underperform, i.e., choose far below suboptimal tours. It would be very helpful to indicate this to the audience reading the paper. Similarly, for pointer networks being the benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. Where the problem structure remains largely the same except for specific data values. Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. 
In Neural Information Processing Systems, 2017<br /> <br /> https://blog.acolyer.org/2017/09/15/struc2vec-learning-node-representations-from-structural-identity/</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34934 Learning Combinatorial Optimzation 2018-03-21T01:32:52Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> <br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. 
If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). In a Max cut problem, the goal is to cut the graph in such a way that the number of edges that are touching both vertices of S and T at the same time is maximized.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: In the travelling Salesman problem, if you have a list of cities and their distance between each other, how can you navigate through each of the cities so that you'll visit every city, and return back to the origin as quick as possible. This problem can be represented in a Graph, where each of the vertices (V) represents the cities, and each of the edges (E) represents the distance between each of the cities. A small example is shown below.<br /> <br /> [[File:Sales.png]]<br /> <br /> == 2. Example Problems ==<br /> <br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. 
For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. 
However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> [[File:s2vimage1.png]]<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. 
Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> <br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning:<br /> <br /> 1. States - S is the state of the graph at a given time which is obtained through an action<br /> <br /> 2. Transition - Transitioning to another node; Tag the node that was last used with feature x = 1<br /> <br /> 3. Actions - An action is a node of the graph that isn't part of the sequence of actions. Actions are p-dimensional nodes.<br /> <br /> 4. Rewards - Reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; <br /> <br /> This represents change in cost evaluated from previous state to new state<br /> <br /> For the three optimization problems: MVC, MAXCUT, TSP; we have different formulations of reinforcement learning (States, Transitions, Actions, Rewards) (Fig. 1)<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> (Fig. 1)<br /> <br /> &lt;math&gt;\text{Learning Algorithm:} &lt;/math&gt;<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. <br /> <br /> &lt;math&gt; (y- Q(h(S_t),v_t; \theta))^2 &lt;/math&gt; where <br /> <br /> &lt;math&gt; y = (\gamma max_v Q(h(S_t+1),v';\theta) + r(S_t,v_t) &lt;/math&gt;. 
<br /> <br /> n-step Q-learning generalizes this to address the issue of delayed rewards: when the final objective is only revealed after many steps, an immediate valuation of the reward may be uninformative, and a one-step update of the parameters may not be optimal, so rewards are instead accumulated over n steps before the target is formed.<br /> <br /> Fitted Q-iteration: gives faster learning convergence when used with a neural network function approximator. In contrast to updating the Q-function sample by sample, it updates the function with batches of samples drawn from the dataset rather than with singular samples.<br /> <br /> [[File:Algorithm_Q-learning.png]]<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that combines reinforcement learning and graph embedding to improve on current methods of solving graph optimization problems. However, the graph embedding network the authors use, struct2vec (S2V), takes a graph as input and converts the properties of its nodes into features. This entails picking a fixed output dimension, which may compromise the expressiveness of the node representation; i.e., we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, but it may be less useful in problems such as the Travelling Salesman Problem. <br /> <br /> Another criticism of the paper is the choice of algorithms to compare against. While the choice of baselines is entirely up to the authors, it does seem strange to compare their reinforcement learning algorithm against some of the weaker insertion heuristics for the TSP. In particular, a couple of those insertion heuristics are known to underperform, i.e., to produce tours far from optimal, and it would have been helpful to point this out to the reader. A similar remark applies to the choice of pointer networks as a benchmark.<br /> <br /> == 6. 
Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. Where the problem structure remains largely the same except for specific data values. Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. In Neural Information Processing Systems, 2017<br /> <br /> https://blog.acolyer.org/2017/09/15/struc2vec-learning-node-representations-from-structural-identity/</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34933 Learning Combinatorial Optimzation 2018-03-21T01:31:50Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> <br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. 
While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). 
In a Max cut problem, the goal is to cut the graph in such a way that the number of edges that are touching both vertices of S and T at the same time is maximized.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: In the travelling Salesman problem, if you have a list of cities and their distance between each other, how can you navigate through each of the cities so that you'll visit every city, and return back to the origin as quick as possible. This problem can be represented in a Graph, where each of the vertices (V) represents the cities, and each of the edges (E) represents the distance between each of the cities. A small example is shown below.<br /> <br /> [[File:Sales.png]]<br /> <br /> == 2. Example Problems ==<br /> <br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> == 3. 
Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> [[File:s2vimage1.png]]<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. 
&lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> <br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning:<br /> <br /> 1. States - S is the state of the graph at a given time which is obtained through an action<br /> <br /> 2. Transition - Transitioning to another node; Tag the node that was last used with feature x = 1<br /> <br /> 3. 
Actions - An action is a node of the graph that isn't part of the sequence of actions. Actions are p-dimensional nodes.<br /> <br /> 4. Rewards - Reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; <br /> <br /> This represents change in cost evaluated from previous state to new state<br /> <br /> For the three optimization problems: MVC, MAXCUT, TSP; we have different formulations of reinforcement learning (States, Transitions, Actions, Rewards)<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> &lt;math&gt;\text{Learning Algorithm:} &lt;/math&gt;<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. <br /> <br /> &lt;math&gt; (y- Q(h(S_t),v_t; \theta))^2 &lt;/math&gt; where <br /> <br /> &lt;math&gt; y = (\gamma max_v Q(h(S_t+1),v';\theta) + r(S_t,v_t) &lt;/math&gt;. <br /> <br /> Generalizing to n-step Q learning, it addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal. With this instance, one- step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-learning: Is a faster learning convergence when used on top of a neural network. In contrast to updating Q function sample by sample, it updates function with batches of samples from data set instead of singular samples.<br /> <br /> [[File:Algorithm_Q-learning.png]]<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V). 
S2V takes a graph as input and converts the properties of the nodes in the graph as features. This entails picking a fixed output dimension which may or may not compromise on the expressibility of the node, i.e. we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, however it may not be as useful in problems such as Traveling Salesman Problem. <br /> <br /> Another criticism for the paper is the choice of algorithms to compare against. While it is completely up to the authors to choose what they compare their algorithms against, it does seem strange to compare their Reinforcement Learning algorithm to some of the worse insertion heuristics for the TSP. In particular, there are a couple of insertion heuristics that underperform, i.e., choose far below suboptimal tours. It would be very helpful to indicate this to the audience reading the paper. Similarly, for pointer networks being the benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. Where the problem structure remains largely the same except for specific data values. Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. 
In Neural Information Processing Systems, 2017<br /> <br /> https://blog.acolyer.org/2017/09/15/struc2vec-learning-node-representations-from-structural-identity/</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34932 Learning Combinatorial Optimzation 2018-03-21T01:31:26Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> <br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. 
If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). In a Max cut problem, the goal is to cut the graph in such a way that the number of edges that are touching both vertices of S and T at the same time is maximized.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: In the travelling Salesman problem, if you have a list of cities and their distance between each other, how can you navigate through each of the cities so that you'll visit every city, and return back to the origin as quick as possible. This problem can be represented in a Graph, where each of the vertices (V) represents the cities, and each of the edges (E) represents the distance between each of the cities. A small example is shown below.<br /> <br /> [[File:Sales.png]]<br /> <br /> == 2. Example Problems ==<br /> <br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. 
For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. 
However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> [[File:s2vimage1.png]]<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. 
Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> <br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning:<br /> <br /> 1. States - S is the state of the graph at a given time which is obtained through an action<br /> <br /> 2. Transition - Transitioning to another node; Tag the node that was last used with feature x = 1<br /> <br /> 3. Actions - An action is a node of the graph that isn't part of the sequence of actions. Actions are p-dimensional nodes.<br /> <br /> 4. Rewards - Reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; <br /> <br /> This represents change in cost evaluated from previous state to new state<br /> <br /> For the three optimization problems: MVC, MAXCUT, TSP; we have different formulations of reinforcement learning (States, Transitions, Actions, Rewards)<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> &lt;math&gt;\text{Learning Algorithm:}; &lt;/math&gt;<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. <br /> <br /> &lt;math&gt; (y- Q(h(S_t),v_t; \theta))^2 &lt;/math&gt; where <br /> <br /> &lt;math&gt; y = (\gamma max_v Q(h(S_t+1),v';\theta) + r(S_t,v_t) &lt;/math&gt;. 
<br /> <br /> Generalizing to n-step Q learning, it addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal. With this instance, one- step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-learning: Is a faster learning convergence when used on top of a neural network. In contrast to updating Q function sample by sample, it updates function with batches of samples from data set instead of singular samples.<br /> <br /> [[File:Algorithm_Q-learning.png]]<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V). S2V takes a graph as input and converts the properties of the nodes in the graph as features. This entails picking a fixed output dimension which may or may not compromise on the expressibility of the node, i.e. we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, however it may not be as useful in problems such as Traveling Salesman Problem. <br /> <br /> Another criticism for the paper is the choice of algorithms to compare against. While it is completely up to the authors to choose what they compare their algorithms against, it does seem strange to compare their Reinforcement Learning algorithm to some of the worse insertion heuristics for the TSP. In particular, there are a couple of insertion heuristics that underperform, i.e., choose far below suboptimal tours. It would be very helpful to indicate this to the audience reading the paper. Similarly, for pointer networks being the benchmark.<br /> <br /> == 6. 
Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. Where the problem structure remains largely the same except for specific data values. Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. In Neural Information Processing Systems, 2017<br /> <br /> https://blog.acolyer.org/2017/09/15/struc2vec-learning-node-representations-from-structural-identity/</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34931 Learning Combinatorial Optimzation 2018-03-21T01:31:07Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> <br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. 
While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). 
In a Max cut problem, the goal is to cut the graph in such a way that the number of edges that are touching both vertices of S and T at the same time is maximized.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: In the travelling Salesman problem, if you have a list of cities and their distance between each other, how can you navigate through each of the cities so that you'll visit every city, and return back to the origin as quick as possible. This problem can be represented in a Graph, where each of the vertices (V) represents the cities, and each of the edges (E) represents the distance between each of the cities. A small example is shown below.<br /> <br /> [[File:Sales.png]]<br /> <br /> == 2. Example Problems ==<br /> <br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> == 3. 
Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> [[File:s2vimage1.png]]<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. 
&lt;math&gt;x_v - \text{Current features of node } v&lt;/math&gt;<br /> <br /> This mapping is given explicitly by &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} \text{relu}(\theta_4 \cdot w(v, u))) &lt;/math&gt;. There are a few interesting facts about this formula. One is the use of summations over neighbours, which makes the update invariant to node ordering, presumably because the authors believe the order of the nodes is not relevant; this is somewhat at odds with the baselines they compare against, which are location/order dependent. One might also ask what happens if the topology makes an update depend on updates that have not happened yet; this is why the update is run over several iterations (the paper presents 4 as a decent number). It makes sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, each node's embedding comes to depend on nodes farther and farther away.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. Now, with this representation of the graph, we can compute the estimated value of pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} \text{relu}([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> <br /> == 4. Training ==<br /> <br /> Formulation of Reinforcement Learning:<br /> <br /> 1. States - S is the state of the graph at a given time, reached through a sequence of actions<br /> <br /> 2. Transition - Transitioning to another node; the last-selected node is tagged with feature x = 1<br /> <br /> 3.
Actions - An action is the selection of a node of the graph that is not yet part of the sequence of actions; each candidate node is represented by its p-dimensional embedding.<br /> <br /> 4. Rewards - The reward function is defined as the change in the cost function after taking an action.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G)&lt;/math&gt; <br /> <br /> This represents the change in cost evaluated from the previous state to the new state.<br /> <br /> For the three optimization problems (MVC, MAXCUT, TSP) we have different instantiations of these components (States, Transitions, Actions, Rewards):<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> Learning Algorithm:<br /> <br /> To learn the parameters, a combination of n-step Q-learning and fitted Q-iteration is used.<br /> <br /> 1-step Q-learning: this updates the function's parameters at each step of an episode by performing a gradient step that minimizes the squared loss <br /> <br /> &lt;math&gt; (y - \hat{Q}(h(S_t),v_t; \Theta))^2 &lt;/math&gt; where <br /> <br /> &lt;math&gt; y = \gamma \max_{v'} \hat{Q}(h(S_{t+1}),v';\Theta) + r(S_t,v_t) &lt;/math&gt;. <br /> <br /> Generalizing to n-step Q-learning addresses the issue of delayed rewards: an immediate reward may be a poor estimate of an action's value, in which case one-step updates of the parameters may not be optimal.<br /> <br /> Fitted Q-iteration: this gives faster learning convergence when used with a neural network function approximator. In contrast to updating the Q-function sample by sample, it updates the function with batches of samples drawn from the dataset rather than with singular samples.<br /> <br /> [[File:Algorithm_Q-learning.png]]<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve on current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V).
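The one-step Q-learning gradient update described in the training section above can be sketched in a few lines. This is a toy illustration only: the linear value function and hand-made feature vectors below stand in for the struct2vec embedding &lt;math&gt; \hat{Q} &lt;/math&gt;, and all names and numeric values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Minimize the squared TD loss (y - Q(h(S_t), v_t))^2 with target
#   y = r(S_t, v_t) + gamma * max_{v'} Q(h(S_{t+1}), v').
# Q is a simple linear function of a feature vector phi(S, v), standing in
# for the learned graph embedding.

def q_value(theta, phi):
    """Linear value estimate: Q(S, v) = theta . phi(S, v)."""
    return float(theta @ phi)

def td_target(theta, reward, next_phis, gamma=0.9):
    """y = reward + gamma * best next-action value (no future value at a terminal state)."""
    if not next_phis:  # terminal: no remaining actions
        return reward
    return reward + gamma * max(q_value(theta, p) for p in next_phis)

def q_learning_step(theta, phi_t, reward, next_phis, gamma=0.9, lr=0.05):
    """One gradient step on (y - Q)^2; for a linear Q the gradient w.r.t.
    theta is -(y - Q) * phi_t (the constant factor 2 is folded into lr)."""
    td_error = td_target(theta, reward, next_phis, gamma) - q_value(theta, phi_t)
    return theta + lr * td_error * phi_t

# Example: a terminal transition with reward -1 (e.g. one more node added to
# a vertex cover) nudges the weights toward predicting that cost.
theta = np.zeros(2)
theta = q_learning_step(theta, np.array([1.0, 0.0]), reward=-1.0, next_phis=[])
print(theta)  # first weight moves to -0.05, second stays 0
```

In the actual algorithm the gradient flows through all seven θ parameters of the embedding network rather than a single linear weight vector, but the loss and target have exactly this shape.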
S2V takes a graph as input and encodes the properties of each node as a feature vector. This entails picking a fixed output dimension, which may compromise the expressiveness of the node representation, i.e. we may lose information by choosing the output dimension arbitrarily. In particular, knowing a node's neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut; however, it may not be as useful in problems such as the Travelling Salesman Problem. <br /> <br /> Another criticism of the paper is the choice of algorithms to compare against. While it is entirely up to the authors to choose their baselines, it does seem strange to compare their Reinforcement Learning algorithm against some of the weaker insertion heuristics for the TSP. In particular, a couple of these insertion heuristics are known to underperform, i.e., to produce tours far from optimal, and it would have been helpful to point this out to the reader. A similar concern applies to the use of pointer networks as a benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose targets NP-hard graph optimization problems with a large number of instances to be solved, where the problem structure remains largely the same except for specific data values. Such cases are common in industry, where large tech companies process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up individual real-time requests. Through their experiments and performance results, the paper shows that this approach could lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs.
In Neural Information Processing Systems, 2017<br /> <br /> https://blog.acolyer.org/2017/09/15/struc2vec-learning-node-representations-from-structural-identity/</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34930 Learning Combinatorial Optimzation 2018-03-21T01:30:17Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> <br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. 
If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). In a Max cut problem, the goal is to cut the graph in such a way that the number of edges that are touching both vertices of S and T at the same time is maximized.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: In the travelling Salesman problem, if you have a list of cities and their distance between each other, how can you navigate through each of the cities so that you'll visit every city, and return back to the origin as quick as possible. This problem can be represented in a Graph, where each of the vertices (V) represents the cities, and each of the edges (E) represents the distance between each of the cities. A small example is shown below.<br /> <br /> [[File:Sales.png]]<br /> <br /> == 2. Example Problems ==<br /> <br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. 
For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. 
However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> [[File:s2vimage1.png]]<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. 
Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> <br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning:<br /> <br /> 1. States - S is the state of the graph at a given time which is obtained through an action<br /> <br /> 2. Transition - Transitioning to another node; Tag the node that was last used with feature x = 1<br /> <br /> 3. Actions - An action is a node of the graph that isn't part of the sequence of actions. Actions are p-dimensional nodes.<br /> <br /> 4. Rewards - Reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; <br /> <br /> This represents change in cost evaluated from previous state to new state<br /> <br /> For the three optimization problems: MVC, MAXCUT, TSP; we have different formulations of reinforcement learning (States, Transitions, Actions, Rewards)<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> Learning Algorithm:<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. <br /> <br /> &lt;math&gt; (y- Q(h(S_t),v_t; \theta))^2 &lt;/math&gt; where <br /> <br /> &lt;math&gt; y = (\gamma max_v Q(h(S_t+1),v';\theta) + r(S_t,v_t) &lt;/math&gt;. 
<br /> <br /> Generalizing to n-step Q learning, it addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal. With this instance, one- step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-learning: Is a faster learning convergence when used on top of a neural network. In contrast to updating Q function sample by sample, it updates function with batches of samples from data set instead of singular samples.<br /> <br /> [[File:Algorithm_Q-learning.png]]<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V). S2V takes a graph as input and converts the properties of the nodes in the graph as features. This entails picking a fixed output dimension which may or may not compromise on the expressibility of the node, i.e. we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, however it may not be as useful in problems such as Traveling Salesman Problem. <br /> <br /> Another criticism for the paper is the choice of algorithms to compare against. While it is completely up to the authors to choose what they compare their algorithms against, it does seem strange to compare their Reinforcement Learning algorithm to some of the worse insertion heuristics for the TSP. In particular, there are a couple of insertion heuristics that underperform, i.e., choose far below suboptimal tours. It would be very helpful to indicate this to the audience reading the paper. Similarly, for pointer networks being the benchmark.<br /> <br /> == 6. 
Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. Where the problem structure remains largely the same except for specific data values. Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. In Neural Information Processing Systems, 2017<br /> <br /> https://blog.acolyer.org/2017/09/15/struc2vec-learning-node-representations-from-structural-identity/</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34927 Learning Combinatorial Optimzation 2018-03-21T01:25:18Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> (work in progress)<br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. 
The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. 
If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). In a Max cut problem, the goal is to cut the graph in such a way that the number of edges that are touching both vertices of S and T at the same time is maximized.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: In the travelling Salesman problem, if you have a list of cities and their distance between each other, how can you navigate through each of the cities so that you'll visit every city, and return back to the origin as quick as possible. This problem can be represented in a Graph, where each of the vertices (V) represents the cities, and each of the edges (E) represents the distance between each of the cities. A small example is shown below.<br /> <br /> [[File:Sales.png]]<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. 
In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> [[File:s2vimage1.png]]<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. 
There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> <br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning:<br /> <br /> 1) States - S is the state of the graph at a given time which is obtained through an action<br /> <br /> 2) Transition - Transitioning to another node; Tag the node that was last used with feature x = 1<br /> <br /> 3) Actions - An action is a node of the graph that isn't part of the sequence of actions. 
Actions are p-dimensional nodes.<br /> <br /> 4) Rewards - Reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; <br /> <br /> This represents change in cost evaluated from previous state to new state<br /> <br /> 5) Policy : <br /> <br /> For the three optimization problems: MVC, MAXCUT, TSP; we have different formulations of reinforcement learning (States, Transitions, Actions, Rewards)<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> Learning Algorithm:<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. <br /> <br /> &lt;math&gt; (y- Q(h(S_t),v_t; \theta))^2 &lt;/math&gt; where <br /> <br /> &lt;math&gt; y = (\gamma max_v Q(h(S_t+1),v';\theta) + r(S_t,v_t) &lt;/math&gt;. <br /> <br /> Generalizing to n-step Q learning, it addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal. With this instance, one- step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-learning: Is a faster learning convergence when used on top of a neural network. In contrast to updating Q function sample by sample, it updates function with batches of samples from data set instead of singular samples.<br /> <br /> [[File:Algorithm_Q-learning.png]]<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V). S2V takes a graph as input and converts the properties of the nodes in the graph as features. 
This entails picking a fixed output dimension which may or may not compromise on the expressibility of the node, i.e. we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, however it may not be as useful in problems such as Traveling Salesman Problem. <br /> <br /> Another criticism for the paper is the choice of algorithms to compare against. While it is completely up to the authors to choose what they compare their algorithms against, it does seem strange to compare their Reinforcement Learning algorithm to some of the worse insertion heuristics for the TSP. In particular, there are a couple of insertion heuristics that underperform, i.e., choose far below suboptimal tours. It would be very helpful to indicate this to the audience reading the paper. Similarly, for pointer networks being the benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. Where the problem structure remains largely the same except for specific data values. Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. 
In Neural Information Processing Systems, 2017<br /> <br /> https://blog.acolyer.org/2017/09/15/struc2vec-learning-node-representations-from-structural-identity/</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:Algorithm_Q-learning.png&diff=34926 File:Algorithm Q-learning.png 2018-03-21T01:24:25Z <p>A33chow: </p> <hr /> <div></div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34925 Learning Combinatorial Optimzation 2018-03-21T01:23:47Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> (work in progress)<br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. 
A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). 
In the max cut problem, the goal is to choose the cut so that the number of edges with one endpoint in S and the other in T is maximized.<br /> A quick example is below: when the 2 pink vertices are picked, the number of edges crossing between S and T is maximized, and the maximum value for this instance is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: Given a list of cities and the distances between them, find the shortest route that visits every city and returns to the origin. This problem can be represented as a graph, where the vertices (V) represent the cities and the edge weights represent the distances between the cities. A small example is shown below.<br /> <br /> [[File:Sales.png]]<br /> <br /> == 2. Example Problems ==<br /> <br /> == 3. Representation ==<br /> The representation chosen for the graph, as mentioned above, is known as struct2vec. The intuition is similar to word2vec, a popular model for embedding words into a low-dimensional space. We begin by explaining &lt;math&gt; \hat{Q} &lt;/math&gt;, which can be thought of as summarizing the state of the graph at a given point in time. 
In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of as a measure of quality; here it can be interpreted the same way, with the quality of the current graph state reflecting how much cost we have avoided.<br /> <br /> Representing complex structures is extremely hard, however; one can always argue that there is some property of the graph the algorithm has failed to capture. We elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology that could indicate how to traverse it with a greedy algorithm. The paper makes this claim without justification: the struct2vec paper claims that measuring similarity in vertex degrees captures structural relationships between nodes. However, it is not hard to think of mathematical or intuitive counterexamples, especially for the TSP and max cut.<br /> <br /> [[File:s2vimage1.png]]<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of } v&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non-linear mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of node } v&lt;/math&gt;<br /> <br /> This formula is given explicitly by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} \text{relu}(\theta_4 \cdot w(v, u))) &lt;/math&gt;. 
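As a concrete toy illustration, the update above can be sketched in Python with numpy. The parameter shapes follow the dimensions stated below (&lt;math&gt;\theta_1, \theta_4&lt;/math&gt; are p-vectors; &lt;math&gt;\theta_2, \theta_3&lt;/math&gt; are p&times;p matrices), but the variable names and the explicit loop over nodes are our own simplification of the batched computation:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def s2v_update(mu, x, W, neighbors, theta1, theta2, theta3, theta4):
    """One synchronous round of the embedding update
    mu_v <- relu(theta1*x_v + theta2 @ sum_u mu_u + theta3 @ sum_u relu(theta4*w(v,u)))
    where u ranges over the neighbours of v."""
    new_mu = np.zeros_like(mu)
    for v in range(mu.shape[0]):
        nbrs = neighbors[v]
        nbr_sum = mu[nbrs].sum(axis=0)        # sum of neighbour embeddings
        w_sum = np.zeros_like(theta1)         # sum of edge-weight messages
        for u in nbrs:
            w_sum += relu(theta4 * W[v, u])
        new_mu[v] = relu(theta1 * x[v] + theta2 @ nbr_sum + theta3 @ w_sum)
    return new_mu

# Tiny path graph 0-1-2 with embedding dimension p = 4
rng = np.random.default_rng(0)
p = 4
mu = np.zeros((3, p))                         # embeddings start at zero
x = np.array([1.0, 0.0, 1.0])                 # per-node scalar features
W = np.array([[0., 2., 0.], [2., 0., 3.], [0., 3., 0.]])
neighbors = {0: [1], 1: [0, 2], 2: [1]}
theta1, theta4 = rng.standard_normal(p), rng.standard_normal(p)
theta2, theta3 = rng.standard_normal((p, p)), rng.standard_normal((p, p))
for _ in range(4):                            # T = 4 rounds, as the paper suggests
    mu = s2v_update(mu, x, W, neighbors, theta1, theta2, theta3, theta4)
```

Because each round only mixes information between direct neighbours, running T rounds lets a node's embedding depend on its T-hop neighbourhood.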
There are a few interesting aspects of this formula. The use of summations makes the update invariant to node ordering, presumably because the authors believe the order of the nodes is not relevant; this is somewhat at odds with the baselines they compare against, which are location/order dependent. One might also ask what happens when the topology makes an update depend on updates that have not yet occurred; this is why the update is repeated over several iterations (the paper suggests 4 as a reasonable number). It makes sense that as the value of &lt;math&gt;T&lt;/math&gt; increases, each node comes to depend on nodes that are farther and farther away.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. With this new information about the graph, we can compute the estimated value of pursuing a particular action:<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} \text{relu}([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;,<br /> <br /> where &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;.<br /> <br /> == 4. Training ==<br /> <br /> Reinforcement learning formulation:<br /> <br /> 1) States - S is the state of the graph at a given time, reached through the actions taken so far.<br /> <br /> 2) Transitions - Moving to another node; the node last used is tagged with feature x = 1.<br /> <br /> 3) Actions - An action is a node of the graph that is not yet part of the current sequence of actions. 
Actions are represented by their p-dimensional node embeddings.<br /> <br /> 4) Rewards - The reward is the change in cost after taking an action.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G)&lt;/math&gt; <br /> <br /> which is the change in cost evaluated from the previous state to the new state.<br /> <br /> 5) Policy - A deterministic greedy policy is used: select the node that maximizes &lt;math&gt; \hat{Q} &lt;/math&gt;.<br /> <br /> For the three optimization problems (MVC, MAXCUT, TSP) we have different instantiations of these components (states, transitions, actions, rewards):<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> Learning Algorithm:<br /> <br /> To learn the parameters, n-step Q-learning and fitted Q-iteration are used.<br /> <br /> 1-step Q-learning updates the parameters at each step by a gradient step that minimizes the squared loss<br /> <br /> &lt;math&gt; (y - \hat{Q}(h(S_t),v_t; \Theta))^2, &lt;/math&gt; where <br /> <br /> &lt;math&gt; y = \gamma \max_{v'} \hat{Q}(h(S_{t+1}),v';\Theta) + r(S_t,v_t) &lt;/math&gt;. <br /> <br /> Generalizing to n-step Q-learning addresses the issue of delayed rewards: when the immediate reward is a poor signal of an action's value, one-step parameter updates may not be effective.<br /> <br /> Fitted Q-iteration gives faster learning convergence when used with a neural network: instead of updating the Q function sample by sample, it updates the function with batches of samples drawn from a dataset.<br /> <br /> [[File:Algorithm_Q-learning.jpg]]<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that combines reinforcement learning and graph embedding to improve on current methods of solving graph optimization problems. However, the graph embedding network the authors use, struct2vec (S2V), takes a graph as input and converts the properties of its nodes into features. 
This entails picking a fixed output dimension, which may compromise the expressiveness of the node representation; that is, we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node's neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, but it may not be as useful in the Travelling Salesman Problem. <br /> <br /> Another criticism of the paper is the choice of algorithms to compare against. While the choice of baselines is entirely up to the authors, it does seem strange to compare their reinforcement learning algorithm against some of the weaker insertion heuristics for the TSP. In particular, a couple of these insertion heuristics are known to produce tours far from optimal, and it would have been helpful to point this out to the reader. A similar concern applies to using pointer networks as the benchmark.<br /> <br /> == 6. Conclusions ==<br /> The machine learning framework the authors propose targets NP-hard graph optimization problems with a large number of instances to be solved, where the problem structure remains largely the same and only the data values differ. Such cases are common in industry, where large tech companies process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up individual real-time requests. Through their experiments and performance results, the paper shows that this approach could lead to faster development and better runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. 
In Neural Information Processing Systems, 2017<br /> <br /> https://blog.acolyer.org/2017/09/15/struc2vec-learning-node-representations-from-structural-identity/</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34916 Learning Combinatorial Optimzation 2018-03-21T00:51:18Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> (work in progress)<br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. 
While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). 
In a Max cut problem, the goal is to cut the graph in such a way that the number of edges that are touching both vertices of S and T at the same time is maximized.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: In the travelling Salesman problem, if you have a list of cities and their distance between each other, how can you navigate through each of the cities so that you'll visit every city, and return back to the origin as quick as possible. This problem can be represented in a Graph, where each of the vertices (V) represents the cities, and each of the edges (E) represents the distance between each of the cities. A small example is shown below.<br /> <br /> [[File:Sales.png]]<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. 
In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> TODO: Insert image about how good/bad struct2vec is<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. 
There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> <br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning:<br /> <br /> 1) States - S is a sequence of actions on a graph.<br /> <br /> 2) Transition - transitioning to another node; Tag the node that was last used with feature x = 1<br /> <br /> 3) Actions - an action is a node of the graph that isn't part of the sequence of actions. 
Actions are p-dimensional nodes.<br /> <br /> 4) Rewards - reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; <br /> <br /> This represents change in cost evaluated from previous state to new state<br /> <br /> For the three optimization problems: MVC, MAXCUT, TSP; we have different formulations of reinforcement learning (States, Transitions, Actions, Rewards)<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> Learning Algorithm:<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. <br /> <br /> &lt;math&gt; (y- Q(h(S_t),v_t; \theta))^2 &lt;/math&gt; where <br /> <br /> &lt;math&gt; y = (\gamma max_v Q(h(S_t+1),v';\theta) + r(S_t,v_t) &lt;/math&gt;. <br /> <br /> Generalizing to n-step Q learning, it addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal. With this instance, one- step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-learning: Is a faster learning convergence when used on top of a neural network. In contrast to updating Q function sample by sample, it updates function with batches of samples from data set instead of singular samples.<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V). S2V takes a graph as input and converts the properties of the nodes in the graph as features. 
This entails picking a fixed output dimension which may or may not compromise on the expressibility of the node, i.e. we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, however it may not be as useful in problems such as Traveling Salesman Problem. <br /> <br /> Another criticism for the paper is the choice of algorithms to compare against. While it is completely up to the authors to choose what they compare their algorithms against, it does seem strange to compare their Reinforcement Learning algorithm to some of the worse insertion heuristics for the TSP. In particular, there are a couple of insertion heuristics that underperform, i.e., choose far below suboptimal tours. It would be very helpful to indicate this to the audience reading the paper. Similarly, for pointer networks being the benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. Where the problem structure remains largely the same except for specific data values. Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. 
In Neural Information Processing Systems, 2017</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34915 Learning Combinatorial Optimzation 2018-03-21T00:51:08Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> (work in progress)<br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. 
While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). 
In a Max cut problem, the goal is to cut the graph in such a way that the number of edges that are touching both vertices of S and T at the same time is maximized.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: In the travelling Salesman problem, if you have a list of cities and their distance between each other, how can you navigate through each of the cities so that you'll visit every city, and return back to the origin as quick as possible. This problem can be represented in a Graph, where each of the vertices (V) represents the cities, and each of the edges (E) represents the distance between each of the cities. A small example is shown below.<br /> <br /> [[File:Sales.png]]<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. 
In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> TODO: Insert image about how good/bad struct2vec is<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. 
There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> <br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning<br /> <br /> 1) States - S is a sequence of actions on a graph.<br /> <br /> 2) Transition - transitioning to another node; Tag the node that was last used with feature x = 1<br /> <br /> 3) Actions - an action is a node of the graph that isn't part of the sequence of actions. 
Actions are p-dimensional nodes.<br /> <br /> 4) Rewards - reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; <br /> <br /> This represents change in cost evaluated from previous state to new state<br /> <br /> For the three optimization problems: MVC, MAXCUT, TSP; we have different formulations of reinforcement learning (States, Transitions, Actions, Rewards)<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> Learning Algorithm:<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. <br /> <br /> &lt;math&gt; (y- Q(h(S_t),v_t; \theta))^2 &lt;/math&gt; where <br /> <br /> &lt;math&gt; y = (\gamma max_v Q(h(S_t+1),v';\theta) + r(S_t,v_t) &lt;/math&gt;. <br /> <br /> Generalizing to n-step Q learning, it addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal. With this instance, one- step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-learning: Is a faster learning convergence when used on top of a neural network. In contrast to updating Q function sample by sample, it updates function with batches of samples from data set instead of singular samples.<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V). S2V takes a graph as input and converts the properties of the nodes in the graph as features. 
This entails picking a fixed output dimension which may or may not compromise on the expressibility of the node, i.e. we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, however it may not be as useful in problems such as Traveling Salesman Problem. <br /> <br /> Another criticism for the paper is the choice of algorithms to compare against. While it is completely up to the authors to choose what they compare their algorithms against, it does seem strange to compare their Reinforcement Learning algorithm to some of the worse insertion heuristics for the TSP. In particular, there are a couple of insertion heuristics that underperform, i.e., choose far below suboptimal tours. It would be very helpful to indicate this to the audience reading the paper. Similarly, for pointer networks being the benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. Where the problem structure remains largely the same except for specific data values. Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. 
In Neural Information Processing Systems, 2017</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34914 Learning Combinatorial Optimzation 2018-03-21T00:50:40Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> (work in progress)<br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. 
While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). 
In a Max cut problem, the goal is to cut the graph in such a way that the number of edges that are touching both vertices of S and T at the same time is maximized.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: In the travelling Salesman problem, if you have a list of cities and their distance between each other, how can you navigate through each of the cities so that you'll visit every city, and return back to the origin as quick as possible. This problem can be represented in a Graph, where each of the vertices (V) represents the cities, and each of the edges (E) represents the distance between each of the cities. A small example is shown below.<br /> <br /> [[File:Sales.png]]<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. 
In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> TODO: Insert image about how good/bad struct2vec is<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. 
There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> <br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning<br /> <br /> 1) &lt;math&gt; \text {States - S is a sequence of actions on a graph.}<br /> <br /> 2) \text {Transition - transitioning to another node; Tag the node that was last used with feature x = 1}<br /> <br /> 3) Actions - an action is a node of the graph that isn't part of the sequence of actions. 
Actions are represented by their p-dimensional node embeddings.<br /> <br /> 4) Rewards - the reward is the change in cost after taking an action and transitioning.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G)&lt;/math&gt; <br /> <br /> which is the change in cost evaluated from the previous state to the new state.<br /> <br /> For the three optimization problems (MVC, Max-Cut, TSP) we have different instantiations of the states, transitions, actions, and rewards.<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> Learning Algorithm:<br /> <br /> To learn the parameters, the authors apply n-step Q-learning and fitted Q-iteration.<br /> <br /> Q-learning: in the 1-step case, the parameters are updated at each step by a gradient step minimizing the squared loss <br /> <br /> &lt;math&gt; (y - \hat{Q}(h(S_t), v_t; \Theta))^2 &lt;/math&gt; where <br /> <br /> &lt;math&gt; y = \gamma \max_{v'} \hat{Q}(h(S_{t+1}), v'; \Theta) + r(S_t, v_t) &lt;/math&gt;. <br /> <br /> Generalizing to n-step Q-learning addresses the issue of delayed rewards: when the immediate reward is a poor signal of an action's value, one-step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-iteration: yields faster convergence when used with a neural network. Instead of updating the Q-function sample by sample, it updates it on batches of samples drawn from a dataset of experience.<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that combines reinforcement learning with graph embedding to improve on current methods for solving graph optimization problems. However, the graph embedding network the authors use, structure2vec (S2V), takes a graph as input and encodes the properties of its nodes as features. 
This entails picking a fixed output dimension, which may compromise the expressiveness of the node representation, i.e., we may lose information by choosing the output dimension arbitrarily. In particular, knowing a node's neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, but it may be less useful in problems such as the Travelling Salesman Problem. <br /> <br /> Another criticism of the paper is the choice of algorithms to compare against. While it is entirely up to the authors what they compare their algorithm against, it does seem strange to pit their reinforcement learning algorithm against some of the weaker insertion heuristics for the TSP. In particular, a couple of these insertion heuristics are known to produce tours far from optimal, and it would have been helpful to indicate this to the reader. A similar concern applies to the choice of pointer networks as a benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose targets NP-hard graph optimization problems with a large number of instances to solve, where the problem structure remains largely the same and only the data values differ. Such cases are common in industry, where large tech companies process millions of requests per second and can afford expensive pre-computation if it speeds up individual real-time requests. Through their experiments and performance results, the paper shows that this approach could lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. 
In Neural Information Processing Systems, 2017</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34913 Learning Combinatorial Optimzation 2018-03-21T00:50:21Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> (work in progress)<br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. 
While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). 
In a Max cut problem, the goal is to cut the graph in such a way that the number of edges that are touching both vertices of S and T at the same time is maximized.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: In the travelling Salesman problem, if you have a list of cities and their distance between each other, how can you navigate through each of the cities so that you'll visit every city, and return back to the origin as quick as possible. This problem can be represented in a Graph, where each of the vertices (V) represents the cities, and each of the edges (E) represents the distance between each of the cities. A small example is shown below.<br /> <br /> [[File:Sales.png]]<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. 
In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> TODO: Insert image about how good/bad struct2vec is<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. 
There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> <br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning<br /> <br /> 1) &lt;math&gt; \text {States - S is a sequence of actions on a graph. }<br /> <br /> 2) &lt;math&gt; \text {Transition - transitioning to another node; Tag the node that was last used with feature x = 1}<br /> <br /> 3) Actions - an action is a node of the graph that isn't part of the sequence of actions. 
Actions are p-dimensional nodes.<br /> <br /> 4) Rewards - reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; <br /> <br /> This represents change in cost evaluated from previous state to new state<br /> <br /> For the three optimization problems: MVC, MAXCUT, TSP; we have different formulations of reinforcement learning (States, Transitions, Actions, Rewards)<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> Learning Algorithm:<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. <br /> <br /> &lt;math&gt; (y- Q(h(S_t),v_t; \theta))^2 &lt;/math&gt; where <br /> <br /> &lt;math&gt; y = (\gamma max_v Q(h(S_t+1),v';\theta) + r(S_t,v_t) &lt;/math&gt;. <br /> <br /> Generalizing to n-step Q learning, it addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal. With this instance, one- step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-learning: Is a faster learning convergence when used on top of a neural network. In contrast to updating Q function sample by sample, it updates function with batches of samples from data set instead of singular samples.<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V). S2V takes a graph as input and converts the properties of the nodes in the graph as features. 
This entails picking a fixed output dimension which may or may not compromise on the expressibility of the node, i.e. we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, however it may not be as useful in problems such as Traveling Salesman Problem. <br /> <br /> Another criticism for the paper is the choice of algorithms to compare against. While it is completely up to the authors to choose what they compare their algorithms against, it does seem strange to compare their Reinforcement Learning algorithm to some of the worse insertion heuristics for the TSP. In particular, there are a couple of insertion heuristics that underperform, i.e., choose far below suboptimal tours. It would be very helpful to indicate this to the audience reading the paper. Similarly, for pointer networks being the benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. Where the problem structure remains largely the same except for specific data values. Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. 
In Neural Information Processing Systems, 2017</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34912 Learning Combinatorial Optimzation 2018-03-21T00:49:56Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> (work in progress)<br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. 
While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). 
In a Max cut problem, the goal is to cut the graph in such a way that the number of edges that are touching both vertices of S and T at the same time is maximized.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: In the travelling Salesman problem, if you have a list of cities and their distance between each other, how can you navigate through each of the cities so that you'll visit every city, and return back to the origin as quick as possible. This problem can be represented in a Graph, where each of the vertices (V) represents the cities, and each of the edges (E) represents the distance between each of the cities. A small example is shown below.<br /> <br /> [[File:Sales.png]]<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. 
In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> TODO: Insert image about how good/bad struct2vec is<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. 
There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> <br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning<br /> <br /> 1) &lt;math&gt; \text {States - S is a sequence of actions on a graph. &lt;\math&gt;}<br /> <br /> 2) Transition - transitioning to another node; Tag the node that was last used with feature x = 1<br /> <br /> 3) Actions - an action is a node of the graph that isn't part of the sequence of actions. 
Actions are p-dimensional nodes.<br /> <br /> 4) Rewards - reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; <br /> <br /> This represents change in cost evaluated from previous state to new state<br /> <br /> For the three optimization problems: MVC, MAXCUT, TSP; we have different formulations of reinforcement learning (States, Transitions, Actions, Rewards)<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> Learning Algorithm:<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. <br /> <br /> &lt;math&gt; (y- Q(h(S_t),v_t; \theta))^2 &lt;/math&gt; where <br /> <br /> &lt;math&gt; y = (\gamma max_v Q(h(S_t+1),v';\theta) + r(S_t,v_t) &lt;/math&gt;. <br /> <br /> Generalizing to n-step Q learning, it addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal. With this instance, one- step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-learning: Is a faster learning convergence when used on top of a neural network. In contrast to updating Q function sample by sample, it updates function with batches of samples from data set instead of singular samples.<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V). S2V takes a graph as input and converts the properties of the nodes in the graph as features. 
This entails picking a fixed output dimension which may or may not compromise on the expressibility of the node, i.e. we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, however it may not be as useful in problems such as Traveling Salesman Problem. <br /> <br /> Another criticism for the paper is the choice of algorithms to compare against. While it is completely up to the authors to choose what they compare their algorithms against, it does seem strange to compare their Reinforcement Learning algorithm to some of the worse insertion heuristics for the TSP. In particular, there are a couple of insertion heuristics that underperform, i.e., choose far below suboptimal tours. It would be very helpful to indicate this to the audience reading the paper. Similarly, for pointer networks being the benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. Where the problem structure remains largely the same except for specific data values. Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. 
In Neural Information Processing Systems, 2017</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34911 Learning Combinatorial Optimzation 2018-03-21T00:48:55Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> (work in progress)<br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. 
While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). 
In a Max cut problem, the goal is to cut the graph in such a way that the number of edges that are touching both vertices of S and T at the same time is maximized.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: In the travelling Salesman problem, if you have a list of cities and their distance between each other, how can you navigate through each of the cities so that you'll visit every city, and return back to the origin as quick as possible. This problem can be represented in a Graph, where each of the vertices (V) represents the cities, and each of the edges (E) represents the distance between each of the cities. A small example is shown below.<br /> <br /> [[File:Sales.png]]<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. 
In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of as a measure of quality; it can be thought of that way here too, where the quality of the graph state represents how much cost we have avoided so far.<br /> <br /> But representing complex structures is extremely hard; in fact, one may always argue that there is some property of the graph that the algorithm has failed to capture. We will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. The paper makes this claim without much justification: the struct2vec paper claims that measuring similarity in vertex degrees captures structural relationships between nodes. However, it is not impossible to think of a mathematical or intuitive counterexample, especially when it comes to TSP or max cut.<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of } v&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non-linear mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of node } v&lt;/math&gt;<br /> <br /> This mapping is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} \text{relu}(\theta_4 \cdot w(v, u)))&lt;/math&gt;.
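To make this update concrete, here is a toy NumPy sketch of several synchronous rounds of the rule above; the graph, edge weights, node features, and parameter values are random placeholders (not a trained model):

```python
# Toy sketch of the struct2vec-style embedding update:
#   mu_v <- relu(theta1 * x_v + theta2 @ sum_u mu_u + theta3 @ sum_u relu(theta4 * w(v,u)))
# Parameter shapes follow the text: theta1, theta4 in R^p; theta2, theta3 in R^{p x p}.
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def update_embeddings(mu, x, adj, w, th1, th2, th3, th4):
    """One synchronous round of the update for every node v."""
    n, p = mu.shape
    new_mu = np.zeros_like(mu)
    for v in range(n):
        nbrs = [u for u in range(n) if adj[v][u]]
        nbr_sum = sum(mu[u] for u in nbrs)                 # input to the theta2 term
        edge_sum = sum(relu(th4 * w[v][u]) for u in nbrs)  # input to the theta3 term
        new_mu[v] = relu(th1 * x[v] + th2 @ nbr_sum + th3 @ edge_sum)
    return new_mu

rng = np.random.default_rng(0)
n, p, T = 5, 8, 4                     # 5 nodes, embedding size p=8, T=4 rounds
adj = np.array([[0, 1, 1, 0, 0],
                [1, 0, 1, 0, 0],
                [1, 1, 0, 1, 0],
                [0, 0, 1, 0, 1],
                [0, 0, 0, 1, 0]])
w = rng.random((n, n)); w = (w + w.T) / 2   # symmetric edge weights
x = rng.random(n)                           # scalar node feature x_v
th1, th4 = rng.standard_normal(p), rng.standard_normal(p)
th2, th3 = rng.standard_normal((p, p)), rng.standard_normal((p, p))

mu = np.zeros((n, p))
for _ in range(T):
    mu = update_embeddings(mu, x, adj, w, th1, th2, th3, th4)
print(mu.shape)  # -> (5, 8)
```

After T rounds, each node's embedding aggregates information from nodes up to T hops away, which is the point the discussion below makes about increasing T.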
There are a few interesting facts about this formula. One is the use of summations, which makes the update order-invariant, presumably because the order of the nodes is not relevant; however, this is somewhat at odds with the baseline they compare against (which is location-/order-dependent). One might also ask what happens if the topology requires updates to depend on other updates; this is why the update is repeated over several iterations (the paper suggests 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, nodes become dependent on other nodes that are farther away.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. Now, with this new information about our graphs, we can compute the estimated value of pursuing a particular action:<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} \text{relu}([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;<br /> <br /> where &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;.<br /> <br /> == 4. Training ==<br /> <br /> Formulation of the reinforcement learning problem:<br /> <br /> 1) States - S is a sequence of actions (nodes) on a graph.<br /> <br /> 2) Transitions - transitioning to another node; the node last added is tagged with feature x = 1.<br /> <br /> 3) Actions - an action is a node of the graph that is not already part of the sequence of actions.
Actions are represented by their p-dimensional node embeddings.<br /> <br /> 4) Rewards - the reward function is defined as the change in cost after taking an action.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G)&lt;/math&gt; <br /> <br /> This represents the change in cost evaluated from the previous state to the new state.<br /> <br /> For the three optimization problems (MVC, MAXCUT, TSP), we have different formulations of the reinforcement learning components (States, Transitions, Actions, Rewards):<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> Learning Algorithm:<br /> <br /> To learn the parameters, a combination of n-step Q-learning and fitted Q-iteration is applied.<br /> <br /> Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss<br /> <br /> &lt;math&gt; (y - \hat{Q}(h(S_t),v_t; \Theta))^2 &lt;/math&gt; where <br /> <br /> &lt;math&gt; y = \gamma \max_{v'} \hat{Q}(h(S_{t+1}),v';\Theta) + r(S_t,v_t) &lt;/math&gt;. <br /> <br /> Generalizing to n-step Q-learning addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal; in such cases, one-step updating of the parameters may not be optimal either.<br /> <br /> Fitted Q-iteration: This gives faster learning convergence when used with a neural network. In contrast to updating the Q function sample by sample, it updates the function with batches of samples drawn from the dataset.<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V). S2V takes a graph as input and encodes the properties of the nodes in the graph as features.
This entails picking a fixed output dimension, which may compromise the expressiveness of the node representation; i.e., we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut; however, it may not be as useful in problems such as the Travelling Salesman Problem. <br /> <br /> Another criticism of the paper is the choice of algorithms to compare against. While it is completely up to the authors to choose what they compare their algorithm against, it does seem strange to compare their reinforcement learning algorithm to some of the worse insertion heuristics for the TSP. In particular, there are a couple of insertion heuristics that underperform, i.e., produce tours far from optimal. It would be very helpful to indicate this to the audience reading the paper. A similar concern applies to using pointer networks as the benchmark.<br /> <br /> == 6. Conclusions ==<br /> The machine learning framework the authors propose is a solution for NP-hard graph optimization problems that have a large number of instances to be computed, where the problem structure remains largely the same except for specific data values. Such cases are common in industry, where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results, the paper has shown that this solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs.
In Neural Information Processing Systems, 2017</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34910 Learning Combinatorial Optimzation 2018-03-21T00:47:59Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> (work in progress)<br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. 
While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). 
In a Max cut problem, the goal is to cut the graph in such a way that the number of edges that are touching both vertices of S and T at the same time is maximized.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: In the travelling Salesman problem, if you have a list of cities and their distance between each other, how can you navigate through each of the cities so that you'll visit every city, and return back to the origin as quick as possible. This problem can be represented in a Graph, where each of the vertices (V) represents the cities, and each of the edges (E) represents the distance between each of the cities. A small example is shown below.<br /> <br /> [[File:Sales.png]]<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. 
In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> TODO: Insert image about how good/bad struct2vec is<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. 
There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> <br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning<br /> <br /> 1) \textStates - S is a sequence of actions on a graph.<br /> <br /> 2) Transition - transitioning to another node; Tag the node that was last used with feature x = 1<br /> <br /> 3) Actions - an action is a node of the graph that isn't part of the sequence of actions. 
Actions are p-dimensional nodes.<br /> <br /> 4) Rewards - reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; <br /> <br /> This represents change in cost evaluated from previous state to new state<br /> <br /> For the three optimization problems: MVC, MAXCUT, TSP; we have different formulations of reinforcement learning (States, Transitions, Actions, Rewards)<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> Learning Algorithm:<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. <br /> <br /> &lt;math&gt; (y- Q(h(S_t),v_t; \theta))^2 &lt;/math&gt; where <br /> <br /> &lt;math&gt; y = (\gamma max_v Q(h(S_t+1),v';\theta) + r(S_t,v_t) &lt;/math&gt;. <br /> <br /> Generalizing to n-step Q learning, it addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal. With this instance, one- step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-learning: Is a faster learning convergence when used on top of a neural network. In contrast to updating Q function sample by sample, it updates function with batches of samples from data set instead of singular samples.<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V). S2V takes a graph as input and converts the properties of the nodes in the graph as features. 
This entails picking a fixed output dimension which may or may not compromise on the expressibility of the node, i.e. we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, however it may not be as useful in problems such as Traveling Salesman Problem. <br /> <br /> Another criticism for the paper is the choice of algorithms to compare against. While it is completely up to the authors to choose what they compare their algorithms against, it does seem strange to compare their Reinforcement Learning algorithm to some of the worse insertion heuristics for the TSP. In particular, there are a couple of insertion heuristics that underperform, i.e., choose far below suboptimal tours. It would be very helpful to indicate this to the audience reading the paper. Similarly, for pointer networks being the benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. Where the problem structure remains largely the same except for specific data values. Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. 
In Neural Information Processing Systems, 2017</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34909 Learning Combinatorial Optimzation 2018-03-21T00:46:43Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> (work in progress)<br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. 
While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). 
In a Max cut problem, the goal is to cut the graph in such a way that the number of edges that are touching both vertices of S and T at the same time is maximized.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: In the travelling Salesman problem, if you have a list of cities and their distance between each other, how can you navigate through each of the cities so that you'll visit every city, and return back to the origin as quick as possible. This problem can be represented in a Graph, where each of the vertices (V) represents the cities, and each of the edges (E) represents the distance between each of the cities. A small example is shown below.<br /> <br /> [[File:Sales.png]]<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. 
In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> TODO: Insert image about how good/bad struct2vec is<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. 
There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> <br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning<br /> <br /> 1) States - S is a sequence of actions on a graph.<br /> <br /> 2) Transition - transitioning to another node; Tag the node that was last used with feature x = 1<br /> <br /> 3) Actions - an action is a node of the graph that isn't part of the sequence of actions. 
Actions are p-dimensional nodes.<br /> <br /> 4) Rewards - reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; <br /> <br /> This represents change in cost evaluated from previous state to new state<br /> <br /> For the three optimization problems: MVC, MAXCUT, TSP; we have different formulations of reinforcement learning (States, Transitions, Actions, Rewards)<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> Learning Algorithm:<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. <br /> <br /> &lt;math&gt; (y- Q(h(S_t),v_t; \theta))^2 &lt;/math&gt; where <br /> <br /> &lt;math&gt; y = (\gamma max_v Q(h(S_t+1),v';\theta) + r(S_t,v_t) &lt;/math&gt;. <br /> <br /> Generalizing to n-step Q learning, it addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal. With this instance, one- step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-learning: Is a faster learning convergence when used on top of a neural network. In contrast to updating Q function sample by sample, it updates function with batches of samples from data set instead of singular samples.<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V). S2V takes a graph as input and converts the properties of the nodes in the graph as features. 
This entails picking a fixed output dimension which may or may not compromise on the expressibility of the node, i.e. we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, however it may not be as useful in problems such as Traveling Salesman Problem. <br /> <br /> Another criticism for the paper is the choice of algorithms to compare against. While it is completely up to the authors to choose what they compare their algorithms against, it does seem strange to compare their Reinforcement Learning algorithm to some of the worse insertion heuristics for the TSP. In particular, there are a couple of insertion heuristics that underperform, i.e., choose far below suboptimal tours. It would be very helpful to indicate this to the audience reading the paper. Similarly, for pointer networks being the benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. Where the problem structure remains largely the same except for specific data values. Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. 
In Neural Information Processing Systems, 2017</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34903 Learning Combinatorial Optimzation 2018-03-21T00:42:47Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> (work in progress)<br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. 
While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). 
In a Max cut problem, the goal is to cut the graph in such a way that the number of edges that are touching both vertices of S and T at the same time is maximized.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: In the travelling Salesman problem, if you have a list of cities and their distance between each other, how can you navigate through each of the cities so that you'll visit every city, and return back to the origin as quick as possible. This problem can be represented in a Graph, where each of the vertices (V) represents the cities, and each of the edges (E) represents the distance between each of the cities.<br /> <br /> [[File:Sales.png]]<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. 
This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> TODO: Insert image about how good/bad struct2vec is<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. 
Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> <br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning<br /> 1) States - S is a sequence of actions on a graph.<br /> 2) Transition - transitioning to another node; Tag the node that was last used with feature x = 1<br /> 3) Actions - an action is a node of the graph that isn't part of the sequence of actions. Actions are p-dimensional nodes.<br /> 4) Rewards - reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; This represents change in cost evaluated from previous state to new state<br /> <br /> For the three optimization problems: MVC, MAXCUT, TSP; we have different formulations of reinforcement learning (States, Transitions, Actions, Rewards)<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> Learning Algorithm:<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. <br /> &lt;math&gt; (y- Q(h(S_t),v_t; \theta))^2 &lt;/math&gt; where <br /> &lt;math&gt; y = (\gamma max_v Q(h(S_t+1),v';\theta) + r(S_t,v_t) &lt;/math&gt;<br /> Generalizing to n-step Q learning, it addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal. 
With this instance, 1 step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-learning: Is a faster learning convergence when used on top of a neural network. In contrast to updating Q function sample by sample, it updates function with batches of samples from data set instead of singular samples.<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V). S2V takes a graph as input and converts the properties of the nodes in the graph as features. This entails picking a fixed output dimension which may or may not compromise on the expressibility of the node, i.e. we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, however it may not be as useful in problems such as Traveling Salesman Problem. <br /> <br /> Another criticism for the paper is the choice of algorithms to compare against. While it is completely up to the authors to choose what they compare their algorithms against, it does seem strange to compare their Reinforcement Learning algorithm to some of the worse insertion heuristics for the TSP. In particular, there are a couple of insertion heuristics that underperform, i.e., choose far below suboptimal tours. It would be very helpful to indicate this to the audience reading the paper. Similarly, for pointer networks being the benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. 
Where the problem structure remains largely the same except for specific data values. Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. In Neural Information Processing Systems, 2017</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34900 Learning Combinatorial Optimzation 2018-03-21T00:42:24Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> (work in progress)<br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. 
For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. 
In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). In a Max cut problem, the goal is to cut the graph in such a way that the number of edges that are touching both vertices of S and T at the same time is maximized.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: In the travelling Salesman problem, if you have a list of cities and their distance between each other, how can you navigate through each of the cities so that you'll visit every city, and return back to the origin as quick as possible. This problem can be represented in a Graph, where each of the vertices (V) represents the cities, and each of the edges (E) represents the distance between each of the cities.<br /> <br /> [[File:Sales]]<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. 
In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> TODO: Insert image about how good/bad struct2vec is<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. 
There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> <br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning<br /> 1) States - S is a sequence of actions on a graph.<br /> 2) Transition - transitioning to another node; Tag the node that was last used with feature x = 1<br /> 3) Actions - an action is a node of the graph that isn't part of the sequence of actions. 
Actions are p-dimensional nodes.<br /> 4) Rewards - reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; This represents change in cost evaluated from previous state to new state<br /> <br /> For the three optimization problems: MVC, MAXCUT, TSP; we have different formulations of reinforcement learning (States, Transitions, Actions, Rewards)<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> Learning Algorithm:<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. <br /> &lt;math&gt; (y- Q(h(S_t),v_t; \theta))^2 &lt;/math&gt; where <br /> &lt;math&gt; y = (\gamma max_v Q(h(S_t+1),v';\theta) + r(S_t,v_t) &lt;\math&gt;<br /> Generalizing to n-step Q learning, it addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal. With this instance, 1 step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-learning: Is a faster learning convergence when used on top of a neural network. In contrast to updating Q function sample by sample, it updates function with batches of samples from data set instead of singular samples.<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V). S2V takes a graph as input and converts the properties of the nodes in the graph as features. This entails picking a fixed output dimension which may or may not compromise on the expressibility of the node, i.e. 
we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, however it may not be as useful in problems such as Traveling Salesman Problem. <br /> <br /> Another criticism for the paper is the choice of algorithms to compare against. While it is completely up to the authors to choose what they compare their algorithms against, it does seem strange to compare their Reinforcement Learning algorithm to some of the worse insertion heuristics for the TSP. In particular, there are a couple of insertion heuristics that underperform, i.e., choose far below suboptimal tours. It would be very helpful to indicate this to the audience reading the paper. Similarly, for pointer networks being the benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. Where the problem structure remains largely the same except for specific data values. Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. In Neural Information Processing Systems, 2017</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34897 Learning Combinatorial Optimzation 2018-03-21T00:40:55Z <p>A33chow: /* 3. 
Representation */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> (work in progress)<br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. 
However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). In a Max cut problem, the goal is to cut the graph in such a way that the number of edges that are touching both vertices of S and T at the same time is maximized.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: Given a G, how should a salesman go about navigating between the edges (roads), in order to maximize his potential sales?<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. 
The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> TODO: Insert image about how good/bad struct2vec is<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. 
There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> <br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning \n<br /> 1) States - S is a sequence of actions on a graph.<br /> 2) Transition - transitioning to another node; Tag the node that was last used with feature x = 1<br /> 3) Actions - an action is a node of the graph that isn't part of the sequence of actions. 
Actions are p-dimensional nodes.<br /> 4) Rewards - reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; This represents change in cost evaluated from previous state to new state<br /> <br /> For the three optimization problems: MVC, MAXCUT, TSP; we have different formulations of reinforcement learning (States, Transitions, Actions, Rewards)<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> Learning Algorithm:<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. <br /> &lt;math&gt; (y- Q(h(S_t),v_t; \theta))^2 &lt;/math&gt; <br /> Generalizing to n-step Q learning, it addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal. With this instance, 1 step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-learning: Is a faster learning convergence when used on top of a neural network. In contrast to updating Q function sample by sample, it updates function with batches of samples from data set instead of singular samples.<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V). S2V takes a graph as input and converts the properties of the nodes in the graph as features. This entails picking a fixed output dimension which may or may not compromise on the expressibility of the node, i.e. we may lose information by arbitrarily choosing an output dimension. 
In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut; however, it may not be as useful in problems such as the Travelling Salesman Problem. <br /> <br /> Another criticism of the paper is the choice of algorithms to compare against. While it is entirely up to the authors to choose their baselines, it does seem strange to compare their reinforcement learning algorithm against some of the weaker insertion heuristics for the TSP. In particular, a couple of the chosen insertion heuristics are known to underperform, i.e., to produce tours far from optimal, and it would have been helpful to indicate this to the reader. A similar remark applies to the use of pointer networks as a benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large number of instances to be solved, where the problem structure remains largely the same except for specific data values. Such cases are common in industry, where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results, the paper shows that this solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. In Neural Information Processing Systems, 2017</div>
Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> (work in progress)<br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. 
However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). In a Max cut problem, the goal is to cut the graph in such a way that the number of edges that are touching both vertices of S and T at the same time is maximized.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: Given a G, how should a salesman go about navigating between the edges (roads), in order to maximize his potential sales?<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. 
The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> TODO: Insert image about how good/bad struct2vec is<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. 
There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning \n<br /> 1) States - S is a sequence of actions on a graph.<br /> 2) Transition - transitioning to another node; Tag the node that was last used with feature x = 1<br /> 3) Actions - an action is a node of the graph that isn't part of the sequence of actions. 
Actions are p-dimensional nodes.<br /> 4) Rewards - reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; This represents change in cost evaluated from previous state to new state<br /> <br /> For the three optimization problems: MVC, MAXCUT, TSP; we have different formulations of reinforcement learning (States, Transitions, Actions, Rewards)<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> Learning Algorithm:<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. <br /> &lt;math&gt; (y- Q(h(S_t),v_t; \theta))^2 &lt;/math&gt; <br /> Generalizing to n-step Q learning, it addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal. With this instance, 1 step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-learning: Is a faster learning convergence when used on top of a neural network. In contrast to updating Q function sample by sample, it updates function with batches of samples from data set instead of singular samples.<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V). S2V takes a graph as input and converts the properties of the nodes in the graph as features. This entails picking a fixed output dimension which may or may not compromise on the expressibility of the node, i.e. we may lose information by arbitrarily choosing an output dimension. 
In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, however it may not be as useful in problems such as Traveling Salesman Problem. <br /> <br /> Another criticism for the paper is the choice of algorithms to compare against. While it is completely up to the authors to choose what they compare their algorithms against, it does seem strange to compare their Reinforcement Learning algorithm to some of the worse insertion heuristics for the TSP. In particular, there are a couple of insertion heuristics that underperform, i.e., choose far below suboptimal tours. It would be very helpful to indicate this to the audience reading the paper. Similarly, for pointer networks being the benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. Where the problem structure remains largely the same except for specific data values. Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. In Neural Information Processing Systems, 2017</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34895 Learning Combinatorial Optimzation 2018-03-21T00:39:27Z <p>A33chow: /* 4. 
Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> (work in progress)<br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. 
However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). In a Max cut problem, the goal is to cut the graph in such a way that the number of edges that are touching both vertices of S and T at the same time is maximized.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: Given a G, how should a salesman go about navigating between the edges (roads), in order to maximize his potential sales?<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. 
The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> TODO: Insert image about how good/bad struct2vec is<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. 
There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning \n<br /> 1) States - S is a sequence of actions on a graph.<br /> 2) Transition - transitioning to another node; Tag the node that was last used with feature x = 1<br /> 3) Actions - an action is a node of the graph that isn't part of the sequence of actions. 
Actions are p-dimensional nodes.<br /> 4) Rewards - reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; This represents change in cost evaluated from previous state to new state<br /> <br /> For the three optimization problems: MVC, MAXCUT, TSP; we have different formulations of reinforcement learning (States, Transitions, Actions, Rewards)<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> Learning Algorithm:<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. <br /> &lt;math&gt; (y- Q(h(S_t),v_t; theta))^2 &lt;/math&gt; <br /> Generalizing to n-step Q learning, it addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal. With this instance, 1 step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-learning: Is a faster learning convergence when used on top of a neural network. In contrast to updating Q function sample by sample, it updates function with batches of samples from data set instead of singular samples.<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V). S2V takes a graph as input and converts the properties of the nodes in the graph as features. This entails picking a fixed output dimension which may or may not compromise on the expressibility of the node, i.e. we may lose information by arbitrarily choosing an output dimension. 
In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, however it may not be as useful in problems such as Traveling Salesman Problem. <br /> <br /> Another criticism for the paper is the choice of algorithms to compare against. While it is completely up to the authors to choose what they compare their algorithms against, it does seem strange to compare their Reinforcement Learning algorithm to some of the worse insertion heuristics for the TSP. In particular, there are a couple of insertion heuristics that underperform, i.e., choose far below suboptimal tours. It would be very helpful to indicate this to the audience reading the paper. Similarly, for pointer networks being the benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. Where the problem structure remains largely the same except for specific data values. Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. In Neural Information Processing Systems, 2017</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34894 Learning Combinatorial Optimzation 2018-03-21T00:38:59Z <p>A33chow: /* 4. 
Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> (work in progress)<br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. 
However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G. A cut is the splitting of a graph into 2 parts (S and T). In a Max cut problem, the goal is to cut the graph in such a way that the number of edges that are touching both vertices of S and T at the same time is maximized.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and T is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: Given a G, how should a salesman go about navigating between the edges (roads), in order to maximize his potential sales?<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. 
The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> TODO: Insert image about how good/bad struct2vec is<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. 
There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning \n<br /> 1) States - S is a sequence of actions on a graph.<br /> 2) Transition - transitioning to another node; Tag the node that was last used with feature x = 1<br /> 3) Actions - an action is a node of the graph that isn't part of the sequence of actions. 
Each action (node) is represented by its p-dimensional embedding.<br /> 4) Rewards - the reward function is defined as the change in cost after the action and movement.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G),&lt;/math&gt; which represents the change in cost evaluated from the previous state to the new state.<br /> <br /> For the three optimization problems (MVC, Max-Cut, TSP) there are different formulations of the reinforcement learning components (states, transitions, actions, rewards):<br /> <br /> [[File:reinforcement_learning.png]]<br /> <br /> Learning Algorithm:<br /> <br /> To learn the parameters, a combination of n-step Q-learning and fitted Q-iteration is used.<br /> <br /> One-step Q-learning: this updates the function's parameters at each step by performing a gradient step to minimize the squared loss <br /> &lt;math&gt; (y - \hat{Q}(h(S_t), v_t; \Theta))^2. &lt;/math&gt; <br /> Generalizing to n-step Q-learning addresses the issue of delayed rewards, where an immediate valuation of rewards may not be accurate; in such cases, updating the parameters after a single step may not be optimal.<br /> <br /> Fitted Q-iteration: this yields faster learning convergence when used on top of a neural network. In contrast to updating the Q function sample by sample, it updates the function with batches of samples drawn from a dataset rather than with single samples.<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use, struct2vec (S2V), takes a graph as input and converts the properties of its nodes into features. This entails picking a fixed output dimension, which may compromise the expressiveness of the node representation; i.e., we may lose information by arbitrarily choosing an output dimension. 
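The training procedure of Section 4 (n-step targets plus batched squared-loss updates) can be sketched as follows. This is an illustration only, not the paper's training code: a hypothetical linear &lt;math&gt;\hat{Q}(s,v) = \theta \cdot \phi(s,v)&lt;/math&gt; over hand-made features stands in for the S2V network, and the names <code>nstep_target</code>, <code>fitted_q_step</code>, and <code>phi</code> are invented for the sketch.

```python
import numpy as np

def nstep_target(rewards, next_q_max, n, gamma=1.0):
    """n-step Q-learning target:
    y = r_t + gamma*r_{t+1} + ... + gamma^{n-1}*r_{t+n-1} + gamma^n * max_v' Qhat(S_{t+n}, v')."""
    y = sum(gamma**i * r for i, r in enumerate(rewards[:n]))
    return y + gamma**n * next_q_max

def fitted_q_step(theta, batch, lr=0.01):
    """One fitted-Q update on a batch of (phi, y) pairs: a gradient step on the
    squared loss (y - Qhat)^2 with a linear Qhat(s, v) = theta . phi."""
    grad = np.zeros_like(theta)
    for phi, y in batch:
        q = theta @ phi
        grad += -2.0 * (y - q) * phi   # d/dtheta of (y - theta.phi)^2
    theta = theta - lr * grad / len(batch)
    return theta
```

The n-step target folds the intermediate rewards into y before the gradient step, which is how delayed rewards enter the update; looping over a sampled batch rather than a single transition mirrors the batched updates of fitted Q-iteration.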
In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut; however, it may not be as useful in problems such as the Travelling Salesman Problem. <br /> <br /> Another criticism of the paper is the choice of algorithms to compare against. While it is entirely up to the authors to choose their baselines, it does seem strange to compare their reinforcement learning algorithm against some of the weaker insertion heuristics for the TSP. In particular, a couple of those insertion heuristics are known to underperform, i.e., to produce tours far from optimal, and it would be helpful to indicate this to the reader. A similar concern applies to using pointer networks as the benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose targets NP-hard graph optimization problems with a large number of instances to be computed, where the problem structure remains largely the same except for specific data values. Such cases are common in industry, where large tech companies process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up individual real-time requests. Through their experiments and performance results, the paper shows that this solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. In Neural Information Processing Systems, 2017</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:reinforcement_learning.png&diff=34889 File:reinforcement learning.png 2018-03-21T00:28:27Z <p>A33chow: A33chow uploaded a new version of File:reinforcement learning.png</p> <hr /> <div></div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34888 Learning Combinatorial Optimzation 2018-03-21T00:26:54Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> (work in progress)<br /> One of the most common problems encountered today is the Travelling Salesman Problem. Companies like eBay and FedEx are currently spending millions of dollars looking for the best solutions. The basic premise is that there is a salesman in a city who wants to go to people's doorsteps and sell them products; what is the best way to do it? There are many algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. For example, one solution might be to always visit the next-nearest house and try to sell the product there; another might be to first compile a list of candidates likely to actually purchase your products, visit all of those first, and then fall back to the original approach. 
A problem such as this takes a lot of time to solve; it is an example of the kinds of problems studied in graph theory.<br /> <br /> (work in progress)<br /> A problem is in NP if a candidate solution can be checked in polynomial time; NP-hard problems are at least as hard as every problem in NP, and no polynomial-time algorithm for them is known.<br /> The current approaches to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. While these approaches have been adequate, they require specific domain knowledge for each individual problem, or additional trial and error in determining the tradeoff between finding an accurate and an efficient heuristic function. However, if these problems are repeatedly solved, differing only in data values, perhaps we could learn the heuristics and so automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph theory is the study of mathematical structures used to model relations between objects. All graphs are made up of a set of vertices (points) and edges (lines).<br /> These problems have a common notation: <br /> G=(V,E,w)<br /> <br /> where G is the graph, V is the set of vertices, E is the set of edges, and w is the set of edge weights.<br /> <br /> <br /> The problems which the paper addresses are:<br /> <br /> Minimum Vertex Cover: Given a graph G, if there exists a set of vertices S such that every edge touches at least one vertex in S, then S is called a vertex cover. In the minimum vertex cover problem, the goal is to find the smallest possible S.<br /> A quick example is below: when the 2 red vertices are picked, every single edge touches a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a graph G, find some vertices and put them in a set S. 
The goal is to maximize the number of edges that have one end touching a vertex in S and the other end touching a vertex outside of S.<br /> A quick example is below: when the 2 pink vertices are picked, the number of edges with one end in S and the other end outside of S is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: Given a graph G, how should a salesman navigate along the edges (roads) so as to visit every city and return to the start as quickly as possible?<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea here is very similar to word2vec, a popular model for embedding words into a low-dimensional space. We begin by explaining &lt;math&gt; \hat{Q} &lt;/math&gt;, which can be thought of as summarizing the state of a graph at a given point in time. In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of as a measure of quality; it can be interpreted that way here too, with the quality of the graph representing how much cost we have avoided.<br /> <br /> Representing complex structures is extremely hard, however; one may always argue that there is some property of the graph that the algorithm has failed to capture. We elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology that could represent how to traverse it using the greedy algorithm. The paper makes this claim without justification: the struct2vec paper claims that measuring similarity in vertex degrees captures structural relationships between nodes. 
However, it is not impossible to construct a mathematical or intuitive counterexample, especially for the TSP or Max-Cut.<br /> <br /> TODO: Insert image about how good/bad struct2vec is<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of } v&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non-linear mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This mapping is explicitly given by &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} \text{relu}(\theta_4 \cdot w(v, u))) &lt;/math&gt;. A few facts about this formula are worth highlighting. One is the use of summations, which makes the update order-invariant, presumably because the authors believe the order of the nodes is not relevant; however, this is somewhat contradictory to the baseline they compare against, which is location-dependent (order-dependent). One might also ask what happens if the topology requires an update to depend on updates that have not yet occurred; this is why the update is applied over several iterations (the paper presents 4 as a decent number). It makes sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, nodes become dependent on other nodes that are very far away from them.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. 
Now, with this new information about our graphs, we can compute the estimated value of pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} \text{relu}([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> == 4. Training ==<br /> <br /> Formulation of Reinforcement Learning:<br /> 1) States - S is a sequence of actions (nodes selected so far) on a graph.<br /> 2) Transitions - moving to another node; the node that was last selected is tagged with feature x = 1.<br /> 3) Actions - an action is a node of the graph that is not yet part of the sequence of actions; each node is represented by its p-dimensional embedding.<br /> 4) Rewards - the reward function is defined as the change in cost after taking an action.<br /> <br /> More specifically,<br /> <br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; this represents the change in cost evaluated from the previous state to the new state.<br /> <br /> For the three optimization problems (MVC, Max-Cut, TSP) we have different formulations of the states, transitions, actions, and rewards:<br /> [[File:reinforcement_learning.png]]<br /> <br /> Learning Algorithm:<br /> <br /> To learn the parameters, a combination of n-step Q-learning and fitted Q-iteration is used.<br /> <br /> n-step Q-learning: Standard (1-step) Q-learning updates the function's parameters at each step by performing a gradient step that minimizes the squared loss. Generalizing to n-step Q-learning addresses the issue of delayed rewards, where an immediate valuation of rewards may not be accurate; in such cases, updating the parameters after a single step may not be optimal.<br /> <br /> Fitted Q-iteration: This yields faster convergence when used with a neural network function approximator. 
In contrast to updating the Q-function sample by sample, it updates the function with batches of samples drawn from the dataset.<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use, struct2vec (S2V), takes a graph as input and encodes the properties of its nodes as features. This entails picking a fixed output dimension, which may compromise the expressiveness of the node representation, i.e., we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node's neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut; however, it may not be as useful in problems such as the Travelling Salesman Problem. <br /> <br /> Another criticism of the paper is the choice of algorithms to compare against. While it is entirely up to the authors what they compare their algorithm against, it does seem strange to compare their reinforcement learning algorithm to some of the worse insertion heuristics for the TSP. In particular, a couple of the insertion heuristics are known to underperform, i.e., to produce tours far from optimal, and it would be helpful to indicate this to readers. The same applies to the use of pointer networks as a benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large number of instances to be computed, where the problem structure remains largely the same except for specific data values. 
Such cases are common in industry, where large tech companies process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results, the authors show that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. In Neural Information Processing Systems, 2017</div>
For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. 
In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G, find some vertices and put them in set S. The goal is to maximize the number of edges that touch a vertex in S in one end, and another vertex outside of S in another end.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and another not in S is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: Given a G, how should a salesman go about navigating between the edges (roads), in order to maximize his potential sales?<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. 
This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> TODO: Insert image about how good/bad struct2vec is<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. 
Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning<br /> 1) States - S is a sequence of actions on a graph.<br /> 2) Transition - transitioning to another node; Tag the node that was last used with feature x = 1<br /> 3) Actions - an action is a node of the graph that isn't part of the sequence of actions. Actions are p-dimensional nodes.<br /> 4) Rewards - reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; This represents change in cost evaluated from previous state to new state<br /> <br /> For the three optimization problems: MVC, MAXCUT, TSP; we have different formulations of reinforcement learning (states, transitions, . <br /> [[File:reinforcement_learning.png]]<br /> <br /> learning Algorithm:<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. Generalizing to n-step Q learning, it addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal. With this instance, 1 step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-learning: Is a faster learning convergence when used on top of a neural network. 
In contrast to updating Q function sample by sample, it updates function with batches of samples from data set instead of singular samples.<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V). S2V takes a graph as input and converts the properties of the nodes in the graph as features. This entails picking a fixed output dimension which may or may not compromise on the expressibility of the node, i.e. we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, however it may not be as useful in problems such as Traveling Salesman Problem. <br /> <br /> Another criticism for the paper is the choice of algorithms to compare against. While it is completely up to the authors to choose what they compare their algorithms against, it does seem strange to compare their Reinforcement Learning algorithm to some of the worse insertion heuristics for the TSP. In particular, there are a couple of insertion heuristics that underperform, i.e., choose far below suboptimal tours. It would be very helpful to indicate this to the audience reading the paper. Similarly, for pointer networks being the benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. Where the problem structure remains largely the same except for specific data values. 
Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. In Neural Information Processing Systems, 2017</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:reinforcement_learning.png&diff=34886 File:reinforcement learning.png 2018-03-21T00:25:10Z <p>A33chow: </p> <hr /> <div></div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34885 Learning Combinatorial Optimzation 2018-03-21T00:23:28Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> (work in progress)<br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. 
For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. 
In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G, find some vertices and put them in set S. The goal is to maximize the number of edges that touch a vertex in S in one end, and another vertex outside of S in another end.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and another not in S is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: Given a G, how should a salesman go about navigating between the edges (roads), in order to maximize his potential sales?<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. 
This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> TODO: Insert image about how good/bad struct2vec is<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. 
Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning<br /> 1) States - S is a sequence of actions on a graph.<br /> 2) Transition - transitioning to another node; Tag the node that was last used with feature x = 1<br /> 3) Actions - an action is a node of the graph that isn't part of the sequence of actions. Actions are p-dimensional nodes.<br /> 4) Rewards - reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; This represents change in cost evaluated from previous state to new state<br /> <br /> For the three optimization problems: MVC, MAXCUT, TSP; we have different formulations of reinforcement learning (states, transitions, . <br /> [[File:Test12345.jpg]]<br /> <br /> learning Algorithm:<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. Generalizing to n-step Q learning, it addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal. With this instance, 1 step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-learning: Is a faster learning convergence when used on top of a neural network. 
In contrast to updating Q function sample by sample, it updates function with batches of samples from data set instead of singular samples.<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V). S2V takes a graph as input and converts the properties of the nodes in the graph as features. This entails picking a fixed output dimension which may or may not compromise on the expressibility of the node, i.e. we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, however it may not be as useful in problems such as Traveling Salesman Problem. <br /> <br /> Another criticism for the paper is the choice of algorithms to compare against. While it is completely up to the authors to choose what they compare their algorithms against, it does seem strange to compare their Reinforcement Learning algorithm to some of the worse insertion heuristics for the TSP. In particular, there are a couple of insertion heuristics that underperform, i.e., choose far below suboptimal tours. It would be very helpful to indicate this to the audience reading the paper. Similarly, for pointer networks being the benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. Where the problem structure remains largely the same except for specific data values. 
Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. In Neural Information Processing Systems, 2017</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34884 Learning Combinatorial Optimzation 2018-03-21T00:22:52Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> (work in progress)<br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. 
For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. 
In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G, find some vertices and put them in set S. The goal is to maximize the number of edges that touch a vertex in S in one end, and another vertex outside of S in another end.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and another not in S is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: Given a G, how should a salesman go about navigating between the edges (roads), in order to maximize his potential sales?<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. 
The paper makes this claim without justification: the struct2vec paper argues that similarity in vertex degrees captures structural relationships between nodes. However, it is not hard to construct mathematical or intuitive counterexamples to this claim, especially for the TSP and Max-Cut.<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of } v&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Nonlinear mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current feature of node } v&lt;/math&gt;<br /> <br /> This mapping is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} \text{relu}(\theta_4 \cdot w(v, u))) &lt;/math&gt;. A few aspects of this formula are worth noting. First, the use of summations makes the update invariant to node ordering, presumably because the authors believe the order of the nodes is not relevant; this is somewhat at odds with the baseline they compare against, which is order dependent. Second, one might ask what happens when the topology makes a node's update depend on updates that have not yet propagated; this is why the update is applied over several iterations (the paper suggests 4 as a reasonable number). As &lt;math&gt;T&lt;/math&gt; increases, each node's embedding comes to depend on nodes that are farther and farther away.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. 
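To make the update rule concrete, here is a minimal numpy sketch of the iteration above. The random graph, node features, and parameter values are illustrative assumptions (the paper learns the parameters rather than sampling them); only the update formula itself follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)

p, n, T = 4, 5, 4  # embedding dim, node count, update iterations (paper suggests T = 4)

# Illustrative random undirected graph: adj[v, u] = 1 iff (v, u) is an edge.
upper = np.triu(rng.random((n, n)) < 0.5, k=1).astype(float)
adj = upper + upper.T
raw = rng.random((n, n))
w = adj * (raw + raw.T) / 2.0    # symmetric edge weights w(v, u)

x = rng.random(n)                # scalar node feature x_v (e.g. a "selected" tag)
mu = np.zeros((n, p))            # node embeddings mu_v, initialised to zero

theta1 = rng.random(p)           # in R^p
theta2 = rng.random((p, p))      # in R^{p x p}
theta3 = rng.random((p, p))      # in R^{p x p}
theta4 = rng.random(p)           # in R^p

def relu(z):
    return np.maximum(z, 0.0)

# T rounds of the synchronous update:
# mu_v <- relu(theta1 * x_v + theta2 * sum_u mu_u + theta3 * sum_u relu(theta4 * w(v, u)))
for _ in range(T):
    new_mu = np.empty_like(mu)
    for v in range(n):
        nbrs = np.flatnonzero(adj[v])
        agg_mu = theta2 @ mu[nbrs].sum(axis=0)                           # neighbour embeddings
        agg_w = theta3 @ relu(np.outer(w[v, nbrs], theta4)).sum(axis=0)  # edge-weight term
        new_mu[v] = relu(theta1 * x[v] + agg_mu + agg_w)
    mu = new_mu

print(mu.shape)  # prints (5, 4): one p-dimensional embedding per node
```

Because both aggregation terms are sums over the (unordered) neighbour set, the result is the same under any permutation of the nodes, which is the order-invariance noted above.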
Now, with this information about our graph, we compute the estimated value of taking a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} \text{relu}([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;<br /> <br /> where &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;.<br /> == 4. Training ==<br /> <br /> Reinforcement learning formulation:<br /> 1) States - a state S is a sequence of actions (selected nodes) on a graph.<br /> 2) Transitions - transitioning to another node tags the node last selected with feature x = 1.<br /> 3) Actions - an action is a node of the graph that is not yet part of the action sequence; each action is represented by its p-dimensional node embedding.<br /> 4) Rewards - the reward function is defined as the change in cost after an action and transition.<br /> <br /> More specifically,<br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G),&lt;/math&gt; which represents the change in cost from the previous state to the new state.<br /> <br /> For the three optimization problems (MVC, Max-Cut, and TSP), these components of the reinforcement learning formulation (states, actions, transitions, and rewards) are instantiated differently. <br /> <br /> <br /> Learning algorithm:<br /> <br /> To learn the parameters, a combination of n-step Q-learning and fitted Q-iteration is used.<br /> <br /> n-step Q-learning: standard (1-step) Q-learning updates the function's parameters at every step by performing a gradient step that minimizes the squared loss. Generalizing to n-step Q-learning addresses the issue of delayed rewards, where an immediate valuation of rewards may be misleading, so that 1-step parameter updates may not be optimal.<br /> <br /> Fitted Q-iteration: this yields faster learning convergence when used on top of a neural network. 
Rather than updating the Q function sample by sample, it updates the function with batches of samples drawn from the dataset.<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that combines reinforcement learning and graph embedding to improve on current methods of solving graph optimization problems. However, the graph embedding network the authors use, struct2vec (S2V), takes a graph as input and encodes the properties of its nodes as features. This entails picking a fixed output dimension, which may compromise the expressiveness of the node representation; i.e., we may lose information by choosing an output dimension arbitrarily. In particular, knowing a node's neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, but it may not be as useful in problems such as the Travelling Salesman Problem. <br /> <br /> Another criticism of the paper is the choice of algorithms to compare against. While it is entirely up to the authors what they compare their algorithm with, it does seem strange to compare their reinforcement learning algorithm against some of the weaker insertion heuristics for the TSP. In particular, a couple of those insertion heuristics are known to underperform, i.e., to produce tours far from optimal, and it would have been helpful to indicate this to the reader. A similar concern applies to using pointer networks as the benchmark.<br /> <br /> == 6. Conclusions ==<br /> The machine learning framework the authors propose addresses NP-hard graph optimization problems with a large number of instances to be solved, where the problem structure remains largely the same except for specific data values. 
Such cases are common in industry, where large tech companies process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up individual real-time requests. Through their experiments and performance results, the authors show that their solution could lead to faster development and improved runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. In Neural Information Processing Systems, 2017.</div>
For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. 
In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G, find some vertices and put them in set S. The goal is to maximize the number of edges that touch a vertex in S in one end, and another vertex outside of S in another end.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and another not in S is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: Given a G, how should a salesman go about navigating between the edges (roads), in order to maximize his potential sales?<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. 
This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> TODO: Insert image about how good/bad struct2vec is<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. 
Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning<br /> 1) States - S is a sequence of actions on a graph.<br /> 2) Transition - transitioning to another node; Tag the node that was last used with feature x = 1<br /> 3) Actions - an action is a node of the graph that isn't part of the sequence of actions. Actions are p-dimensional nodes.<br /> 4) Rewards - reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; This represents change in cost evaluated from previous state to new state<br /> <br /> For the three optimization problems: MVC, MAXCUT, TSP; we have different formulations of reinforcement learning (states, transitions, . <br /> /u4/a33chow/Desktop/Screen Shot 2018-03-20 at 8.19.21 PM.png<br /> [[File:Test.png]]<br /> <br /> <br /> learning Algorithm:<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. Generalizing to n-step Q learning, it addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal. With this instance, 1 step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-learning: Is a faster learning convergence when used on top of a neural network. 
In contrast to updating Q function sample by sample, it updates function with batches of samples from data set instead of singular samples.<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V). S2V takes a graph as input and converts the properties of the nodes in the graph as features. This entails picking a fixed output dimension which may or may not compromise on the expressibility of the node, i.e. we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, however it may not be as useful in problems such as Traveling Salesman Problem. <br /> <br /> Another criticism for the paper is the choice of algorithms to compare against. While it is completely up to the authors to choose what they compare their algorithms against, it does seem strange to compare their Reinforcement Learning algorithm to some of the worse insertion heuristics for the TSP. In particular, there are a couple of insertion heuristics that underperform, i.e., choose far below suboptimal tours. It would be very helpful to indicate this to the audience reading the paper. Similarly, for pointer networks being the benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. Where the problem structure remains largely the same except for specific data values. 
Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. In Neural Information Processing Systems, 2017</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34882 Learning Combinatorial Optimzation 2018-03-21T00:20:10Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Introduction),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> (work in progress)<br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. Companies like Ebay, and Fedex are currently spending millions of dollar trying to look for the best solution. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. 
For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> A problem which is NP is defined as a problem, where all possible paths can be taken, but the time to solve this problem is some sort of a polynomial function.<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is the study of mathematical structures used to model relation between objects. All graphs are made up of a series of vertices (points), edges (lines).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the set of edges, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G. If there exists a set of vertices (S) so that every edge touches at least 1 point in S, then we can say that every element in S is a vertex cover. 
In a minimum vertex cover problem, the goal is to find the minimum possible size of S.<br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a G, find some vertices and put them in set S. The goal is to maximize the number of edges that touch a vertex in S in one end, and another vertex outside of S in another end.<br /> A quick example of this is below, you can see that when those 2 pink vertices are ticked, the number of edges that touch a vertex in S, and another not in S is maximized, and the maximum value for this solution is 8.<br /> <br /> [[File:Maxcut.png]]<br /> <br /> Travelling Salesman Problem: Given a G, how should a salesman go about navigating between the edges (roads), in order to maximize his potential sales?<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> The choice of representation for the graph, as mentioned above, is known as struct2vec. The intuitive idea of this is very similar to word2vec which is a very popular model for encoding words into a low dimensional space. We begin with explaining &lt;math&gt; \hat{Q} &lt;/math&gt; which can be thought of summarizing the state of a graph at a given point in time. In the Reinforcement Learning literature, &lt;math&gt; \hat{Q} &lt;/math&gt; is often thought of a measure of quality, in this case, it can be thought of that way too, where the quality of the graph represents how much cost we have avoided.<br /> <br /> But representing complex structures is extremely hard, in fact one may always argue that there is some property of the graph that the algorithm has failed to capture, we will elaborate on this in the criticisms of the paper.<br /> <br /> Struct2Vec aims to gather information about the graph topology which could represent how to traverse it using the greedy algorithm. 
This is a claim that the paper makes without any justification, they claim in the struct2vec paper that measuring similarity in the degrees in vertices represents structural relationships between the nodes. However, it is not impossible to think of a mathematical or intuitive counterexample to this problem, especially when it comes to the TSP/maxcut.<br /> <br /> TODO: Insert image about how good/bad struct2vec is<br /> <br /> &lt;math&gt;\mu_v^{t + 1} \leftarrow \mathcal{F}(x_v, \{\mu_u\}_{u \in \mathcal{N}(v)}, \{w(v, u)\}_{u \in \mathcal{N}(v)}; \Theta)&lt;/math&gt;<br /> <br /> Where:<br /> <br /> 1. &lt;math&gt;\mathcal{N}(v) - \text{Neighborhood of v}&lt;/math&gt;<br /> <br /> 2. &lt;math&gt;\mathcal{F} - \text{Non Linear Mapping}&lt;/math&gt; <br /> <br /> 3. &lt;math&gt;x_v - \text{Current features of the nodes}&lt;/math&gt;<br /> <br /> This formula is explicitly given by: &lt;math&gt; \mu_v^{t + 1} \leftarrow \text{relu}(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{t} + \theta_3 \sum_{u \in \mathcal{N}(v)} relu(\theta_4 \cdot w(v, u)) &lt;/math&gt;. There are a few interesting facts about this formula, one of them is: the fact that they've used summations, which makes the algorithm order invariant, possibly because they believe the order of the nodes is not really relevant, however, this is fairly contradictory to the baseline they compare against (which is location dependent/order dependent). Also, one might ask the question that, what if the topology allows for updates to be dependent on future updates. This is why it happens over several iterations (the paper presents 4 as a decent number). It does make sense that as we increase the value of &lt;math&gt;T&lt;/math&gt;, we will see that nodes are dependent on other nodes they are very far away from.<br /> <br /> We also highlight the dimensions of the parameters: &lt;math&gt; \theta_1, \theta_4 \in \mathbb{R}^p, \theta_2, \theta_3 \in \mathbb{R}^{p \times p}&lt;/math&gt;. 
Now, with this new information about our graphs, we must compute our estimated value function for pursuing a particular action.<br /> <br /> &lt;math&gt; \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \theta_7 \mu_v^{(T)}]) &lt;/math&gt;.<br /> <br /> Where, &lt;math&gt; \theta_5 \in \mathbb{R}^{2p}, \theta_6, \theta_7 \in \mathbb{R}^{p \times p}&lt;/math&gt;. <br /> <br /> Finally, &lt;math&gt; \Theta = \{\theta_i\}_{i=1}^{7}&lt;/math&gt;<br /> == 4. Training ==<br /> <br /> Formulating of Reinforcement learning<br /> 1) States - S is a sequence of actions on a graph.<br /> 2) Transition - transitioning to another node; Tag the node that was last used with feature x = 1<br /> 3) Actions - an action is a node of the graph that isn't part of the sequence of actions. Actions are p-dimensional nodes.<br /> 4) Rewards - reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> &lt;math&gt;r(S,v) = c(h(S'),G) - c(h(S),G);&lt;/math&gt; This represents change in cost evaluated from previous state to new state<br /> <br /> For the three optimization problems: MVC, MAXCUT, TSP; we have different formulations of reinforcement learning (states, transitions, . <br /> /u4/a33chow/Desktop/Screen Shot 2018-03-20 at 8.19.21 PM.png<br /> [[File:Example.jpg]]<br /> <br /> <br /> learning Algorithm:<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. Generalizing to n-step Q learning, it addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal. With this instance, 1 step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-learning: Is a faster learning convergence when used on top of a neural network. 
In contrast to updating Q function sample by sample, it updates function with batches of samples from data set instead of singular samples.<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called struct2vec (S2V). S2V takes a graph as input and converts the properties of the nodes in the graph as features. This entails picking a fixed output dimension which may or may not compromise on the expressibility of the node, i.e. we may lose information by arbitrarily choosing an output dimension. In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut, however it may not be as useful in problems such as Traveling Salesman Problem. <br /> <br /> Another criticism for the paper is the choice of algorithms to compare against. While it is completely up to the authors to choose what they compare their algorithms against, it does seem strange to compare their Reinforcement Learning algorithm to some of the worse insertion heuristics for the TSP. In particular, there are a couple of insertion heuristics that underperform, i.e., choose far below suboptimal tours. It would be very helpful to indicate this to the audience reading the paper. Similarly, for pointer networks being the benchmark.<br /> <br /> == 6. Conclusions ==<br /> TODO: Add the tables from the paper for comparisons<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems that have a large amount of instances that need to be computed. Where the problem structure remains largely the same except for specific data values. 
Such cases are common in the industry where large tech companies have to process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. In Neural Information Processing Systems, 2017</div> A33chow http://wiki.math.uwaterloo.ca/statwiki/index.php?title=Learning_Combinatorial_Optimzation&diff=34838 Learning Combinatorial Optimzation 2018-03-20T21:08:33Z <p>A33chow: /* 4. Training */</p> <hr /> <div>Learning Combinatorial Optimization Algorithms Over Graphs<br /> <br /> <br /> == Group Members ==<br /> <br /> Abhi (Graph Theory),<br /> <br /> Alvin (actual paper)<br /> <br /> Pranav (actual paper),<br /> <br /> Daniel (Conclusion: performance, adv, disadv, criticism)<br /> <br /> == 1. Introduction and Problem Motivation ==<br /> (work in progress)<br /> One of the most common problems encountered today is a problem known as The Travelling Salesman Problem. The basic premise is that there is a salesman in a city, and he wants to go to people's doorsteps and sell them products, what is the best way to do it? There are a lot of different algorithms devised in the field of Combinatorics and Optimization that can be used to deal with this problem. For example, one solution might be to always visit the next nearest house, and try to sell the product to them, another solution might be to first get a list of possible candidates that may actually purchase your products, and try to visit all of those first, and then go to the original solution. 
A problem such as this takes a lot of time in order to solve, this problem is an example of a group of problems known as Graph Theory.<br /> <br /> (work in progress)<br /> The current approach to tackling NP-hard combinatorial optimization problems are good heuristics or approximation algorithms. While these approaches have been adequate, it requires specific domain-knowledge behind each individual problem or additional trial-and-error in determining the tradeoff being finding an accurate or efficient heuristics function. However, if these problems are repeated solved, differing only in data values, perhaps we could apply learning on heuristics such that we automate this tedious task.<br /> <br /> === a) Graph Theory ===<br /> Graph Theory is a set of problems where some sort of Spatial Analysis is needed in order to come up with a solution. Generally it is a shape which has several points (vertices), that are connected to each other through a series of edges(nodes).<br /> These problems have a common notation of: <br /> G=(V,E,w)<br /> <br /> Where G is the Graph, V are the vertices, E is the edge, and w is the set of weights for the edges<br /> <br /> <br /> The problems which the paper is trying to address are:<br /> <br /> Minimum Vertex Cover: Given a ‘graph’ G, find the minimum number of vertices to tick, so that every single edge is covered. <br /> A quick example of this is below, you can see that when those 2 red vertices are ticked, every single edge is now touching a red vertex.<br /> <br /> [[File:MVC2.png]]<br /> <br /> Maximum Cut: Given a ‘graph’ G,<br /> <br /> Travelling Salesman Problem<br /> <br /> == 2. Example Problems ==<br /> Testing<br /> $X_i$ = 50<br /> <br /> == 3. Representation ==<br /> <br /> == 4. 
Training ==<br /> <br /> Formulating of Reinforcement learning<br /> 1) States - S is a sequence of actions on a graph.<br /> 2) Movement - transitioning to another node; Tag the node that was last used with feature x = 1<br /> 3) Actions - Is a node of the graph that isn't part of the sequence of actions.<br /> 4) Rewards - reward function is defined as change in cost after action and movement.<br /> <br /> More specifically,<br /> r(S,v) = c(h(S'),G) - c(h(S),G); This represents change in cost evaluated from previous state to new state<br /> <br /> learning Algorithm:<br /> <br /> To perform learning of the parameters, application of n-step Q learning and fitted Q-iteration is used.<br /> <br /> Stepped Q-learning: This updates the function's parameters at each step by performing a gradient step to minimize the squared loss of the function. Generalizing to n-step Q learning, it addresses the issue of delayed rewards, where an immediate valuation of rewards may not be optimal. With this instance, 1 step updating of the parameters may not be optimal.<br /> <br /> Fitted Q-learning: Is a faster learning convergence when used on top of a neural network. In contrast to updating Q function sample by sample, it updates function with batches of samples from data set instead of singular samples.<br /> <br /> == 5. Results and Criticisms ==<br /> The paper proposes a solution that uses a combination of reinforcement learning and graph embedding to improve current methods of solving graph optimization problems. However, the graph embedding network the authors use is called structure2vec (S2V). S2V takes a graph as input and converts the properties of the nodes in the graph as features. Some of these properties or features include a node’s graph neighbourhood which may or may not be useful depending on the problem. 
In particular, knowing a node’s neighbourhood is useful in problems such as Minimum Vertex Cover or Maximum Cut; however, it may not be as useful in problems such as the Travelling Salesman Problem. <br /> Another criticism of the paper is the choice of reinforcement learning algorithm. The authors use the Deep Q-Learning (DQN) algorithm in their experiments and tests. However, they did not consider Asynchronous Advantage Actor-Critic (A3C), a fast and popular reinforcement learning algorithm that is simple and lightweight.<br /> <br /> == 6. Conclusions ==<br /> The machine learning framework the authors propose is a solution to NP-hard graph optimization problems for which a large number of instances must be computed, where the problem structure remains largely the same and only specific data values differ. Such cases are common in industry, where large tech companies process millions of requests per second and can afford to invest in expensive pre-computation if it speeds up real-time individual requests. Through their experiments and performance results the paper has shown that their solution could potentially lead to faster development and increased runtime efficiency of algorithms for graph problems.<br /> == 7. Source == <br /> Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song. Learning Combinatorial Optimization Algorithms over Graphs. In Neural Information Processing Systems, 2017</div> A33chow