ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Revision as of 19:28, 2 November 2020

Presented by

Maziar Dadbin

Introduction

In this paper, the authors modify the BERT model, and the result is ALBERT, a model that outperforms BERT on the GLUE, SQuAD, and RACE benchmarks. Notably, the best-performing ALBERT configuration still has fewer parameters than BERT-large. Two of the changes, factorized embedding parameterization and cross-layer parameter sharing, are parameter-reduction methods. The authors also introduce a new inter-sentence coherence loss, sentence-order prediction (SOP), which replaces BERT's next-sentence prediction (NSP) loss. The last change is removing dropout from the model.
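The Python sketch below makes these ideas concrete. It is not taken from the authors' code. It does three things:

* It counts embedding weights with and without factorization, comparing V × E + E × H against V × H.
* It compares a 12-layer encoder with and without cross-layer sharing. The per-layer estimate is rough and ignores biases and LayerNorm.
* It builds one sentence-order prediction example.

The sizes (V = 30,000, E = 128, H = 4,096), the helper names, the 50/50 swap probability, and the 1/0 label convention are all assumptions chosen for illustration.

```python
import random
from typing import Optional, Tuple

# Illustrative sizes (assumptions): vocabulary V, hidden size H, embedding size E.
VOCAB_SIZE = 30_000
HIDDEN_SIZE = 4_096
EMBED_SIZE = 128


def embedding_params(vocab: int, hidden: int, embed: Optional[int] = None) -> int:
    """Weights in the token-embedding block.

    Unfactorized (BERT-style): one V x H matrix maps tokens into the hidden space.
    Factorized (ALBERT-style): a V x E lookup table followed by an E x H projection.
    """
    if embed is None:
        return vocab * hidden
    return vocab * embed + embed * hidden


def transformer_layer_params(hidden: int, ffn_mult: int = 4) -> int:
    """Rough per-layer weight count, ignoring biases and LayerNorm:
    four H x H attention projections (Q, K, V, output) plus two
    feed-forward matrices of size H x (ffn_mult * H)."""
    return 4 * hidden * hidden + 2 * hidden * (ffn_mult * hidden)


def encoder_params(num_layers: int, hidden: int, share_layers: bool) -> int:
    """With cross-layer sharing, every layer reuses one set of weights,
    so the count no longer grows with depth."""
    per_layer = transformer_layer_params(hidden)
    return per_layer if share_layers else num_layers * per_layer


def make_sop_example(seg_a: str, seg_b: str, rng: random.Random) -> Tuple[str, str, int]:
    """Sentence-order prediction: two consecutive segments from the same
    document are kept in their original order (label 1, assumed convention)
    or swapped (label 0). The 50/50 split is an assumption."""
    if rng.random() < 0.5:
        return seg_a, seg_b, 1
    return seg_b, seg_a, 0


if __name__ == "__main__":
    print(f"embedding, V x H:         {embedding_params(VOCAB_SIZE, HIDDEN_SIZE):,}")  # 122,880,000
    print(f"embedding, V x E + E x H: {embedding_params(VOCAB_SIZE, HIDDEN_SIZE, EMBED_SIZE):,}")  # 4,364,288

    layers = 12
    print(f"encoder, 12 separate layers: {encoder_params(layers, HIDDEN_SIZE, False):,}")  # 2,415,919,104
    print(f"encoder, 1 shared layer:     {encoder_params(layers, HIDDEN_SIZE, True):,}")  # 201,326,592

    rng = random.Random(0)
    print(make_sop_example("The cat sat down.", "Then it fell asleep.", rng))
```

The arithmetic shows why the two reductions help. Factorization decouples the vocabulary-sized matrix from H, so H can grow without inflating the embeddings. Sharing makes encoder size independent of depth. Neither reduction changes the amount of computation per forward pass.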
