Difference between revisions of "Graph Structure of Neural Networks"

Revision as of 15:26, 15 November 2020

Presented By

Xiaolan Xu, Robin Wen, Yue Weng, Beizhen Chang

Introduction

The paper develops a new way of representing a neural network as a graph, which the authors call a relational graph. The key insight is to focus on message exchange, rather than just on directed data flow. As a simple example, for a fixed-width fully-connected layer, one input channel and one output channel can be represented together as a single node, and an edge in the relational graph represents the message exchange between the two nodes (Figure 1(a)).

Relational Graph

Parameter Definition

(1) Clustering Coefficient

(2) Average Path Length

Experimental Setup (Section 4 in the paper)

Discussions and Conclusions

Section 5 of the paper summarizes the experimental results obtained by sampling and analyzing many different relational graphs.

Result2 441 2020Group16.png

1. Neural network performance depends on its structure

In the experiments, top-1 error is used to measure model performance. Each relational graph is characterized by two graph measures: average path length and clustering coefficient. Heat maps illustrate how predictive performance varies across the possible values of these two measures. In the heat maps (Figure 4(a)(c)(f) of the paper), darker areas represent smaller top-1 error, meaning that the corresponding models perform better than those in lighter areas. Compared with the complete graph, which has average path length A = 1 and clustering coefficient C = 1, the best-performing relational graph outperforms the complete graph baseline by 1.4% top-1 error for MLP on CIFAR-10, and by 0.5% to 1.2% for models on ImageNet. This indicates that the predictive performance of a neural network depends on its graph structure, and in particular that the complete graph does not always perform best.
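
The statement that the complete graph has A = 1 and C = 1 can be checked directly from the standard definitions of the two measures. The following is a minimal sketch, assuming an undirected, connected graph stored as a dictionary of neighbour sets; the function names and the `ring_graph` comparison are illustrative and do not come from the paper.

```python
from collections import deque
from itertools import combinations


def average_clustering(adj):
    """Mean over nodes of the fraction of neighbour pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # convention: nodes with fewer than two neighbours contribute 0
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path distance over all ordered pairs of distinct nodes.

    Assumes the graph is connected; distances come from breadth-first search.
    """
    n = len(adj)
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))


def complete_graph(n):
    """Every node is linked to every other node."""
    return {i: {j for j in range(n) if j != i} for i in range(n)}


def ring_graph(n):
    """Each node is linked only to its two neighbours on a cycle (illustrative contrast)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
```

For a complete graph both functions return 1, which is the baseline corner of the heat map. A sparse graph such as a six-node ring has clustering coefficient 0 and a longer average path length.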


2. Sweet spot where performance is significantly improved

To reduce the effect of training noise, the 3942 sampled graphs are grouped into 52 bins, and each bin is colored by the average performance of the graphs that fall into it. In the resulting heat map, the well-performing graphs cluster in a region that the paper calls the “sweet spot”, marked by the red rectangle.
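
The aggregation step can be sketched as follows. This is only an illustration of binning and averaging: the bin edges are hypothetical, since the exact layout of the 52 bins is not given here, and the function names are not from the paper.

```python
from bisect import bisect_right
from collections import defaultdict


def bin_index(x, edges):
    """Index of the bin containing x, or None if x lies outside the edges.

    edges must be sorted; the last bin includes its right edge.
    """
    if x < edges[0] or x > edges[-1]:
        return None
    return min(bisect_right(edges, x) - 1, len(edges) - 2)


def bin_average(points, c_edges, l_edges):
    """Average top-1 error per (clustering coefficient, average path length) bin.

    points: iterable of (clustering_coefficient, average_path_length, top1_error)
    c_edges, l_edges: hypothetical bin edges chosen by the caller
    Returns a dict mapping (c_bin, l_bin) to the mean error of graphs in that bin.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, l, err in points:
        i, j = bin_index(c, c_edges), bin_index(l, l_edges)
        if i is None or j is None:
            continue  # graph falls outside the binned region
        sums[(i, j)] += err
        counts[(i, j)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Averaging within a bin smooths out run-to-run variation of individual graphs, which is why a cluster of low-error bins is easier to see in the coarse heat map than in the raw one.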

3. Relationship between neural network’s performance and parameters

The heat map shows no significant jumps in performance for small changes in clustering coefficient or average path length, so performance is approximately a smooth function of the two measures. When one measure is held fixed within a small range, performance can be plotted against the other, and a second-degree polynomial regression is used to visualize the overall trend. Both clustering coefficient and average path length are indicative of neural network performance, each showing a U-shaped relationship.
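
A second-degree polynomial fit of this kind can be sketched with ordinary least squares. The code below is a minimal pure-Python version that solves the normal equations; it is an assumption about how such a trend line could be computed, not the authors' implementation, and `fit_quadratic` and `u_shape_minimum` are illustrative names.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    # Power sums that appear in the normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix for the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [p - factor * q for p, q in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def u_shape_minimum(a, b):
    """x-position of the minimum of a U-shaped quadratic (requires a > 0)."""
    if a <= 0:
        raise ValueError("the fitted curve is not U-shaped")
    return -b / (2 * a)
```

With top-1 error on the vertical axis, a positive leading coefficient `a` corresponds to the U-shape described above, and `u_shape_minimum` gives the value of the graph measure at which the fitted error is lowest.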

4. Consistency among many different tasks and datasets

The results are consistent both qualitatively and quantitatively. Across multiple architectures and datasets, graphs with clustering coefficient within [0.1, 0.7] and average path length within [1.5, 3] consistently outperform the baseline complete graph.

Results are also correlated across datasets and architectures for graphs with similar clustering coefficient and average path length. The paper notes that although ResNet-34 is much more complex than a 5-layer MLP, a fixed set of relational graphs performs similarly in both settings, with a Pearson correlation of 0.658; the p-value under the null hypothesis is less than 10^-8.
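
The Pearson correlation used for this comparison can be computed from two equally long lists of values, for example the per-bin average performance in two settings. The sketch below implements the textbook formula; it does not compute the p-value, and the function name is illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

A value near 1 means that bins that do well in one setting also tend to do well in the other; the reported 0.658 is a moderate-to-strong positive correlation on this scale.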

5. Top architectures can be identified efficiently

6. Well-performing neural networks have graph structures surprisingly similar to those of real biological neural networks

Critique