The model is tweaked to maximise the likelihood of the training data (maximum likelihood estimation).
Autoregression is to predict the next word in sequence. Masked training is to mask out part of sentence and let the model to fill in the blanks.
NSP next sentence prediction let the model to predicts whether a pair of sentence appear consecutively in the training corpus. This let model to fine tune narrative flow and coherence.
The training cost is high but the inference cost is less about 4.5 to 1
No comments:
Post a Comment