Transformer distilled, Part 1 of 2
July 1, 2022
An overview of the core mathematical components of the Transformer architecture, covering scaled dot-product attention, softmax, multi-head attention, linear layers, and learned embeddings.