Transformer pro and contra
O(n^2)
O(1)
Parallelization
Transfer learning
Pre-training
Python matmul() vs einsum()
matmul() vs einsum()
Theory
Case studies
Covariance Shift and Inductive Priors
Covariance Shift
Regularization
Dropout
Label Smoothing
Regularization vs Normalization
Inductive Priors
Python Iterators and Generators
Transformer distilled, Part 2 of 2
Regularization
Self- vs Cross-Attention
Adding vs concatenating positional encoding