Difference between revisions of "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations"


Revision as of 17:53, 2 November 2020

Presented by

Maziar Dadbin

Introduction

Motivation

Model details

Factorized embedding parameterization

Cross-layer parameter sharing

Inter-sentence coherence loss

Removing dropout