SegFormer Quantization, Part 1: Short Intro and Rationale
Purpose
- Quantize to reduce SPACE and study its effect on TIME and model quality (a first sketch follows this list)
- Research different quantization schemes on pre-trained models
- Use HuggingFace built-in or custom functions
- If HuggingFace is insufficient, use PyTorch Hub or TensorFlow Hub
- If all else fails, fall back to low-level PyTorch
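To make the SPACE point concrete, here is a minimal sketch that applies PyTorch dynamic quantization to a pre-trained SegFormer from HuggingFace and compares serialized sizes. The checkpoint name is an assumption (one public example; any SegFormer checkpoint should behave similarly); the TIME and quality measurements come later in the series.

```python
# A minimal sketch: dynamic quantization of a SegFormer's Linear layers,
# comparing serialized model sizes. The checkpoint is one public example.
import io

import torch
from transformers import SegformerForSemanticSegmentation

model = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/segformer-b0-finetuned-ade-512-512"
)
model.eval()

# Dynamic quantization: Linear weights stored as int8, activations
# quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

def state_dict_mb(m: torch.nn.Module) -> float:
    """Serialized size of a model's parameters in megabytes."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 2**20

print(f"fp32 model:  {state_dict_mb(model):.1f} MB")
print(f"int8 linear: {state_dict_mb(quantized):.1f} MB")
```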
What
- Overcoming and recording difficulties along the way
- From PoC to MVP
- Keep everything as generic as possible using jupytext and papermill (a usage sketch follows this list)
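For the jupytext/papermill point, here is a minimal sketch of how the two could be combined. All file names and parameter names below are hypothetical placeholders, not this series' actual notebooks.

```python
# A minimal sketch of the jupytext + papermill workflow; file and parameter
# names are hypothetical placeholders.
from pathlib import Path

import jupytext
import papermill as pm

# jupytext: keep the source as a plain .py script, render it to .ipynb.
nb = jupytext.read("quantize_segformer.py")
jupytext.write(nb, "quantize_segformer.ipynb")

# papermill: execute the notebook with injected parameters, so one generic
# notebook can be re-run per checkpoint / per quantization scheme.
Path("out").mkdir(exist_ok=True)
pm.execute_notebook(
    "quantize_segformer.ipynb",
    "out/quantize_segformer_qint8.ipynb",
    parameters={
        "checkpoint": "nvidia/segformer-b0-finetuned-ade-512-512",
        "dtype": "qint8",
    },
)
```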
How
To come:
- Using PyTorch quantization capabilities such as quant/dequant layers, the torch.qint32 dtype, quantize_fx, and QConfigMapping (see the sketch after this list)
- Task-specific distributions of weights/biases, activations, and gradients
- Use the learned task-specific distributions as initialisation
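As referenced in the list above, a minimal sketch of the quantize_fx + QConfigMapping workflow, shown on a toy module rather than SegFormer (that wiring is the subject of later parts). The default qconfig here uses qint8 weights and quint8 activations; qint32 variants and task-specific observer choices are among the topics to come. The calibration loop is where the task-specific activation distributions get observed.

```python
# A minimal sketch of FX graph-mode quantization on a toy module.
import torch
from torch.ao.quantization import QConfigMapping, get_default_qconfig
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 8),
).eval()

# QConfigMapping decides which observers watch which layers; the observed
# weight/activation distributions determine the quantization scales.
qconfig_mapping = QConfigMapping().set_global(get_default_qconfig("fbgemm"))

example_inputs = (torch.randn(4, 16),)
prepared = prepare_fx(model, qconfig_mapping, example_inputs)

# Calibration: run representative (task-specific) data through the observers.
with torch.inference_mode():
    for _ in range(8):
        prepared(torch.randn(4, 16))

# convert_fx inserts the quant/dequant ops and swaps in quantized kernels.
quantized = convert_fx(prepared)
print(quantized)
```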