Transformer distilled, Part 2 of 2
qte77 · August 1, 2022
Tags: ml, theory, transformer, regularization, attention, embedding, encoding

Contents:
- Regularization
- Self- vs Cross-Attention
- Adding vs concatenating positional encoding