A Fair Comparison of Graph Neural Networks for Graph Classification

Presented By

Jaskirat Singh Bhatia

Background

Experimental reproducibility and replicability are critical topics in machine learning, and authors have repeatedly raised concerns about their absence in scientific publications, with the aim of improving the quality of the field. Recently, the graph representation learning field has attracted the attention of a wide research community, which has resulted in a large stream of works. In particular, several Graph Neural Network models have been developed to tackle graph classification. However, the experimental procedures in these works often lack rigour and are hard to reproduce. The authors attempted to reproduce the results of such experiments in order to address the ambiguity of experimental procedures and the impossibility of reproducing reported results. They also standardized the experimental environment so that results can be reproduced when using it.

Graph Neural Networks

1. A neural network which takes a graph as input.

2. Tasks include classifying the whole graph or finding a missing edge/node in the graph.
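
To make the setup concrete, the following is a minimal sketch of a GCN-style graph classifier in PyTorch. The architecture, layer sizes, and simple mean pooling are illustrative assumptions, not the specific models compared in the paper.

import torch
import torch.nn as nn

class SimpleGraphClassifier(nn.Module):
    # Illustrative GCN-style network: each layer propagates node features over a
    # row-normalized adjacency matrix, and node embeddings are mean-pooled into a
    # single graph embedding used for classification.
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden_dim)
        self.lin2 = nn.Linear(hidden_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, adj, x):
        # adj: (n, n) adjacency with self-loops; x: (n, in_dim) node features
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        adj_norm = adj / deg                          # simple row normalization
        h = torch.relu(self.lin1(adj_norm @ x))       # message passing + transform
        h = torch.relu(self.lin2(adj_norm @ h))
        hg = h.mean(dim=0)                            # pool nodes -> graph embedding
        return self.classifier(hg)                    # class logits for the whole graph

# Toy usage: a 4-node path graph with 3-dimensional node features
edges = torch.tensor([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=torch.float)
adj = edges + torch.eye(4)
x = torch.randn(4, 3)
model = SimpleGraphClassifier(in_dim=3, hidden_dim=16, num_classes=2)
logits = model(adj, x)   # shape: (num_classes,)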

Problems in Papers

Some of the most common reproducibility problems encountered in this field concern hyper-parameter selection and the correct usage of data splits for model selection versus model assessment. Moreover, the evaluation code is sometimes missing or incomplete, and experiments are not standardized across different works in terms of node and edge features.

These issues easily generate doubts and confusion among practitioners who need a fully transparent and reproducible experimental setting. In fact, the evaluation of a model goes through two distinct phases: model selection on the validation set and model assessment on the test set. Failing to keep these phases well separated can lead to over-optimistic and biased estimates of the true performance of a model, making it hard for other researchers to present competitive results without following the same ambiguous evaluation procedures.

Risk Assessment and Model Selection

Risk Assessment

The goal of risk assessment is to provide an estimate of the performance of a class of models. When a test set is not explicitly given, a common way to proceed is to use k-fold cross-validation. Since model selection is performed independently for each training/test split, different splits may yield different “best” hyper-parameter configurations; this is why the estimate refers to the performance of a class of models rather than of a single configuration, as sketched below.
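
As a rough illustration of this protocol (not the authors' code), the sketch below runs an outer k-fold cross-validation and repeats model selection from scratch inside every training fold; select_hyperparameters and build_model are hypothetical helpers standing in for an inner model-selection routine and a model factory.

import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

def assess_risk(X, y, select_hyperparameters, build_model, k=10):
    # Outer k-fold cross-validation: each fold gives one estimate of test
    # performance, and hyper-parameters are re-selected on every training fold.
    outer = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in outer.split(X, y):
        X_tr, y_tr = X[train_idx], y[train_idx]
        X_te, y_te = X[test_idx], y[test_idx]
        # Model selection uses only the training portion (e.g. via an inner
        # validation split), so each fold may pick a different "best" configuration.
        best_config = select_hyperparameters(X_tr, y_tr)
        model = build_model(best_config).fit(X_tr, y_tr)
        scores.append(accuracy_score(y_te, model.predict(X_te)))
    # The average over folds estimates the performance of the class of models,
    # not of one fixed hyper-parameter configuration.
    return float(np.mean(scores)), float(np.std(scores))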

Model Selection

The goal of model selection, or hyper-parameter tuning, is to choose, among a set of candidate hyper-parameter configurations, the one that works best on a specific validation set.
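
A minimal sketch of model selection by grid search over a validation set is given below; build_model, the grid values, and the use of accuracy as the selection score are illustrative assumptions rather than the paper's actual search space.

from itertools import product
from sklearn.metrics import accuracy_score

def select_on_validation_set(X_tr, y_tr, X_val, y_val, build_model, grid):
    # Train one model per candidate configuration on the training set and
    # keep the configuration that scores best on the validation set.
    best_config, best_score = None, -float("inf")
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        model = build_model(config).fit(X_tr, y_tr)
        score = accuracy_score(y_val, model.predict(X_val))
        if score > best_score:
            best_config, best_score = config, score
    return best_config

# Example candidate grid (illustrative values only)
grid = {"hidden_dim": [32, 64], "learning_rate": [1e-2, 1e-3]}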