Transformer distilled, Part 2 of 2

qte77 · August 1, 2022

ml   theory   transformer   regularization   attention   embedding   encoding

Regularization
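In the original Transformer, regularization comes mainly from two places: dropout applied to each sub-layer's output before the residual add-and-norm (and to the sums of embeddings and positional encodings), and label smoothing on the training targets. A minimal PyTorch sketch of both, assuming the paper's rate of 0.1; the module and variable names are illustrative, not from any particular codebase:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SublayerConnection(nn.Module):
    """Residual connection with dropout on the sub-layer output,
    followed by layer norm (post-norm, as in the original paper)."""
    def __init__(self, d_model: int, p_drop: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(p_drop)

    def forward(self, x, sublayer):
        # Dropout hits Sublayer(x) before the residual add and the norm.
        return self.norm(x + self.dropout(sublayer(x)))

block = SublayerConnection(d_model=512)
x = torch.randn(2, 10, 512)
y = block(x, nn.Linear(512, 512))   # any sub-layer mapping d_model -> d_model

# Label smoothing: move a little target probability mass (here 0.1)
# from the correct class onto all other classes.
logits = torch.randn(8, 1000)                  # (batch, vocab_size)
targets = torch.randint(0, 1000, (8,))
loss = F.cross_entropy(logits, targets, label_smoothing=0.1)
```

Label smoothing makes the model less confident and hurts perplexity, but in the original paper it improved accuracy and BLEU, which is why it is kept.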

Self- vs Cross-Attention
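Mechanically, the two differ only in where queries, keys and values come from: self-attention draws all three from the same sequence, while cross-attention (the encoder-decoder attention in the original Transformer) takes queries from the decoder and keys/values from the encoder output. A minimal sketch, assuming PyTorch's nn.MultiheadAttention and reusing one module for both calls purely for brevity (a real decoder uses separate layers):

```python
import torch
import torch.nn as nn

d_model, n_heads = 512, 8
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

src = torch.randn(2, 10, d_model)   # encoder states (batch, src_len, d_model)
tgt = torch.randn(2, 7, d_model)    # decoder states (batch, tgt_len, d_model)

# Self-attention: Q, K and V all come from the same sequence.
self_out, _ = attn(query=tgt, key=tgt, value=tgt)

# Cross-attention: Q from the decoder, K and V from the encoder output.
cross_out, _ = attn(query=tgt, key=src, value=src)

print(self_out.shape, cross_out.shape)   # both (2, 7, 512)
```

The output always has the query's sequence length, so cross-attention lets a 7-token target attend over a 10-token source without changing its own shape.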

Adding vs concatenating positional encoding
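The original Transformer adds the (sinusoidal or learned) positional encoding to the token embedding, so positions and content share the same d_model-dimensional space and the model width is unchanged. Concatenating instead gives positions their own dimensions, keeping the two signals in separate subspaces at the cost of a wider model (or an extra projection back to d_model). A minimal sketch of both, assuming PyTorch; the 512/64 split for the concatenated variant is an arbitrary choice for illustration:

```python
import math
import torch

def sinusoidal_encoding(seq_len: int, dim: int) -> torch.Tensor:
    """Standard sinusoidal positional encoding, shape (seq_len, dim)."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    div = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / dim))
    pe = torch.zeros(seq_len, dim)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

seq_len, d_model, d_pos = 10, 512, 64
tok = torch.randn(2, seq_len, d_model)                  # token embeddings

# Adding: positions share the embedding space, width stays d_model.
added = tok + sinusoidal_encoding(seq_len, d_model)     # (2, 10, 512)

# Concatenating: positions get their own dimensions, width grows to
# d_model + d_pos (a later projection can map it back if needed).
pe = sinusoidal_encoding(seq_len, d_pos).unsqueeze(0).expand(tok.size(0), -1, -1)
concat = torch.cat([tok, pe], dim=-1)                   # (2, 10, 576)
```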
