ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Revision as of 19:05, 2 November 2020 by Mdadbin (talk | contribs)

Presented by

Maziar Dadbin

Introduction

Motivation

Model details

Factorized embedding parameterization

Cross-layer parameter sharing

Inter-sentence coherence loss

Relationship between convexity and smoothness

Removing dropout