Roberta: Difference between revisions
Jump to navigation
Jump to search
(Created page with "== Presented by == Danial Maleki == Introduction == Self-training methods in the NLP domain(Natural Language Processing) like ELMo[1], GPT[2], BERT[3], XLM[4], and XLNet[5] h...") |
No edit summary |
||
Line 3: | Line 3: | ||
== Introduction == | == Introduction == | ||
Self-training methods in the NLP domain(Natural Language Processing) like ELMo[1], GPT[2], BERT[3], XLM[4], and XLNet[5] have shown significant improvements, but knowing which part the methods have the most contribution is challenging to determine. Roberta is a replication of BERT pretraining which is trying to investigate the effects of hyperparameters tuning and training set size. | Self-training methods in the NLP domain(Natural Language Processing) like ELMo[1], GPT[2], BERT[3], XLM[4], and XLNet[5] have shown significant improvements, but knowing which part the methods have the most contribution is challenging to determine. Roberta is a replication of BERT pretraining which is trying to investigate the effects of hyperparameters tuning and training set size. In summary, what they did can be categorized by (1) they modified some BERT design choices and training schemes. (2) they used a new set of new datasets. These 2 modification categories help them to improve performance on the down stream tasks. | ||
== Background == | == Background == | ||
Roberta is a |
Revision as of 03:45, 29 November 2020
Presented by
Danial Maleki
Introduction
Self-training methods in the NLP domain(Natural Language Processing) like ELMo[1], GPT[2], BERT[3], XLM[4], and XLNet[5] have shown significant improvements, but knowing which part the methods have the most contribution is challenging to determine. Roberta is a replication of BERT pretraining which is trying to investigate the effects of hyperparameters tuning and training set size. In summary, what they did can be categorized by (1) they modified some BERT design choices and training schemes. (2) they used a new set of new datasets. These 2 modification categories help them to improve performance on the down stream tasks.
Background
Roberta is a