ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Revision as of 18:53, 2 November 2020

Presented by

Maziar Dadbin

Introduction

Motivation

Model details

Factorized embedding parameterization

Cross-layer parameter sharing

Inter-sentence coherence loss

Removing dropout