Recap on ML

Transformer distilled, Part 1 of 2

qte77 · July 1, 2022

ml   theory   transformer   dot-product   softmax   attention   linear   embedding

Scaled dot-product
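The core operation is scaled dot-product attention: query/key similarities are divided by the square root of the key dimension before the softmax, which keeps the dot products from saturating the softmax as dimensionality grows. A minimal NumPy sketch (function name, shapes, and random inputs are illustrative, not from this post):

```python
import numpy as np

def scaled_dot_product(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V for single-head, unbatched inputs."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                 # query/key similarities, scaled
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                              # weighted sum of values

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))  # 4 queries, d_k = 8
k = rng.normal(size=(4, 8))
v = rng.normal(size=(4, 8))
out = scaled_dot_product(q, k, v)
print(out.shape)  # (4, 8)
```

Each output row is a convex combination of the value rows, with mixing weights given by the softmaxed similarities.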

Softmax and multi-head attention
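Multi-head attention runs several scaled dot-product attentions in parallel on lower-dimensional projections, then concatenates the heads and applies an output projection. A self-contained sketch under assumed shapes (self-attention, no batching or masking; the weight-matrix names are placeholders):

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, n_heads, wq, wk, wv, wo):
    """Split d_model across n_heads, attend per head, concatenate, project."""
    seq, d_model = x.shape
    d_head = d_model // n_heads
    # Project, then reshape to (heads, seq, d_head)
    q = (x @ wq).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    k = (x @ wk).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    v = (x @ wv).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    heads = softmax(scores) @ v                          # (heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq, d_model)
    return concat @ wo

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))                    # seq = 5, d_model = 16
wq, wk, wv, wo = (rng.normal(size=(16, 16)) for _ in range(4))
y = multi_head_attention(x, n_heads=4, wq=wq, wk=wk, wv=wv, wo=wo)
print(y.shape)  # (5, 16)
```

Because each head attends in its own subspace, different heads can specialize in different relations between positions.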

Linear layers
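Between attention blocks, the Transformer applies position-wise feed-forward layers: the same affine map at every sequence position. A minimal sketch assuming the usual two-layer form with a ReLU and an expanded inner dimension (dimensions here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, w, b):
    """Affine map y = xW + b, applied independently at each position."""
    return x @ w + b

d_model, d_ff = 16, 64                       # inner dim is typically wider
w1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
w2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)

x = rng.normal(size=(5, d_model))            # seq = 5
h = np.maximum(0.0, linear(x, w1, b1))       # ReLU nonlinearity
y = linear(h, w2, b2)                        # project back to d_model
print(y.shape)  # (5, 16)
```

Because the weights are shared across positions, this is equivalent to a 1x1 convolution over the sequence.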

Learned Embeddings
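Token embeddings are a learned lookup table: one trainable vector per vocabulary entry, indexed by token id. A sketch with assumed vocabulary and model sizes (the sqrt(d_model) scaling follows "Attention Is All You Need"):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 100, 16

# Learned lookup table: one trainable row per vocabulary entry.
embedding = rng.normal(size=(vocab_size, d_model))

token_ids = np.array([3, 14, 15, 92])                # illustrative token ids
vectors = embedding[token_ids] * np.sqrt(d_model)    # scale embeddings as in the paper
print(vectors.shape)  # (4, 16)
```

During training, gradients flow only into the rows that were looked up, so each vocabulary entry's vector is updated only when that token appears.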