Transformer
1 min read
Updated:
The fundamental ideas behind the Transformer are the encoder-decoder architecture and the attention mechanism. Attention is the core mechanism that every Transformer model uses, but the architecture itself has evolved in three directions. BERT was built on the encoder only. Few models are actually built on both the encoder and the decoder. Nowadays, most models are decoder-only, such as GPT, Claude, and LLaMA. A minimal sketch of the attention computation is shown below.
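As a rough illustration of that core mechanism, here is a minimal NumPy sketch of scaled dot-product attention, the variant used in the original Transformer: each output is a weighted sum of the values, with weights given by softmax(QK^T / sqrt(d_k)). The function name and toy dimensions are hypothetical, chosen just for this example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V  # weighted sum of value vectors

# Toy example (hypothetical sizes): 3 tokens, model dimension 4.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)  # (3, 4)
```

In self-attention, as in the example, Q, K, and V all come from the same token sequence; encoder-decoder models additionally use cross-attention, where the queries come from the decoder and the keys and values from the encoder.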