SegFormer Part 3, Quantization Description

qte77 · May 5, 2024

Description of Quantization of pre-trained Image Transformers

Load quantized model versions for QAT and compare their space (memory footprint) and time (latency) requirements, e.g. with the helpers sketched below.
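
A minimal sketch of how the space/time comparison could be done; the helper names and the measurement approach are my own assumptions, not from the post:

import time
import torch

def footprint_mib(model: torch.nn.Module) -> float:
    """Approximate size of parameters and buffers in MiB."""
    n_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    n_bytes += sum(b.numel() * b.element_size() for b in model.buffers())
    return n_bytes / 2**20

@torch.no_grad()
def latency_ms(model: torch.nn.Module, inputs: dict, runs: int = 10) -> float:
    """Average forward-pass time over several runs, in milliseconds."""
    model.eval()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        model(**inputs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) * 1000 / runs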

8-bit quantization with bitsandbytes

From the LLM.int8() paper and its GitHub source; see also the 8-bit Hugging Face inference example. The relevant bitsandbytes APIs (see the sketch after the list):

  • optimizer
    • bnb.optim.Adam8bit(...)
    • bnb.nn.Embedding(...)
  • inference
    • linear = bnb.nn.Linear8bitLt(...)
    • modes: mixed-precision, int8, or the full LLM.int8() method
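
A minimal sketch of how these drop-in replacements are constructed; the toy model and layer sizes below are placeholders of mine, and bitsandbytes needs a CUDA GPU to actually run the int8 kernels:

import torch.nn as nn
import bitsandbytes as bnb

# Hypothetical toy model; the sizes are placeholders
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# Training: drop-in 8-bit optimizer (pair with bnb.nn.Embedding for token embeddings)
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)

# Inference: replace nn.Linear with an 8-bit linear layer
linear_int8 = bnb.nn.Linear8bitLt(256, 10, has_fp16_weights=False)                    # plain int8
linear_llmint8 = bnb.nn.Linear8bitLt(256, 10, has_fp16_weights=False, threshold=6.0)  # full LLM.int8() with outlier decomposition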

On the transformers side, BitsAndBytesConfig offers the corresponding configuration support, e.g. for 4-bit NF4 quantization:

import torch
from transformers import BitsAndBytesConfig

# Alternative: plain 4-bit quantization with bfloat16 compute dtype
# quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

# 4-bit NormalFloat (NF4) quantization
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
)
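
Continuing from the config above, a minimal loading sketch; the checkpoint name and device_map usage are illustrative assumptions (accelerate installed, CUDA available), and since the bitsandbytes integration in transformers is mostly exercised on LLMs, behaviour with SegFormer should be verified:

from transformers import AutoModelForSemanticSegmentation

# Assumed example checkpoint; any SegFormer checkpoint on the Hub could be used
model = AutoModelForSemanticSegmentation.from_pretrained(
    "nvidia/segformer-b0-finetuned-ade-512-512",
    quantization_config=nf4_config,  # defined above
    device_map="auto",
)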
