Difficulties while Quantizing
Load versions for QAT and compare SPACE/TIME
ValueError: SegformerForImageClassification does not support device_map='auto'. To implement support, the model class needs to implement the _no_split_modules attribute.
and
ValueError: SegformerForImageClassification does not support device_map='sequential'. To implement support, the model class needs to implement the _no_split_modules attribute.
- `from_pretrained()` supports `device_map='auto'`, but `SegformerForImageClassification` does not
  - Solution: pass `device_map=0` (`cuda:0`) as the default `device_map` parameter to `SegformerForImageClassification.from_pretrained()`
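A minimal sketch of this workaround (the `nvidia/mit-b0` checkpoint name and the fp16 dtype are example choices, not from the original notes):

```python
import torch
from transformers import SegformerForImageClassification

# Pin the whole model to one GPU instead of device_map='auto';
# device_map=0 is equivalent to "cuda:0".
model = SegformerForImageClassification.from_pretrained(
    "nvidia/mit-b0",            # example checkpoint
    device_map=0,
    torch_dtype=torch.float16,  # assumption: model is loaded in fp16
)
```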
Test model with example input
RuntimeError: Input type (float) and bias type (c10::Half) should be the same
- If the input is not halved
  - Solution: `copy()` the input dict and `half()` the `pixel_values`
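A minimal sketch, assuming `inputs` is the dict produced by the model's image processor with `return_tensors="pt"`:

```python
# Copy so the original inputs stay float32, then cast only pixel_values
# to fp16 to match the half-precision model weights.
inputs_fp16 = inputs.copy()
inputs_fp16["pixel_values"] = inputs_fp16["pixel_values"].half()
```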
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
- If the input is halved but still on the CPU
  - Raised from /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py
  - Solution: move `inputs["pixel_values"]` to the `cuda` device
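Continuing the sketch above: half-precision conv2d has no CPU kernel, so the halved tensor also has to live on the GPU before calling the model:

```python
# slow_conv2d has no Half implementation on the CPU, so run on CUDA.
inputs_fp16["pixel_values"] = inputs_fp16["pixel_values"].to("cuda")
outputs = model(**inputs_fp16)
```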
 
UserWarning: Input type into Linear4bit is torch.float16, but bnb_4bit_compute_type=torch.float32 (default). This will lead to slow inference or training speed.
- Solution (TODO): use a `BitsAndBytesConfig`:
  `BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=<dtype>)`
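A sketch of that config; choosing `torch.float16` as the compute dtype is an assumption that matches the fp16 inputs used above:

```python
import torch
from transformers import BitsAndBytesConfig, SegformerForImageClassification

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # avoid the float32 default
)
model_4bit = SegformerForImageClassification.from_pretrained(
    "nvidia/mit-b0",                       # example checkpoint
    quantization_config=bnb_config,
    device_map=0,
)
```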
Training
RuntimeError: Input type (float) and bias type (c10::Half) should be the same
- Error when training the int8 and int4 models without adapting the input
  - See: PyTorch cpp `c10::Half`, "Introducing the Half type!", NumPy data types
  - Solution: a `collate_fn` that applies `tensor.half()`
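A sketch of such a `collate_fn`; the `pixel_values`/`label` keys are assumptions about the dataset format:

```python
import torch

def collate_fn(batch):
    # Cast pixel values to fp16 so they match the quantized model's
    # half-precision parameters; labels stay integer.
    pixel_values = torch.stack([ex["pixel_values"] for ex in batch]).half()
    labels = torch.tensor([ex["label"] for ex in batch])
    return {"pixel_values": pixel_values, "labels": labels}

# Usage sketch with the HF Trainer:
# trainer = Trainer(model=model, data_collator=collate_fn, ...)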
ValueError: The model you want to train is loaded in 8-bit precision.  if you want to fine-tune an 8-bit model, please make sure that you have installed bitsandbytes>=0.37.2.
- Error when calling `Trainer()` despite having `bitsandbytes>=0.37.2` installed and imported, e.g. `%pip list | grep bitsandbytes` yields `bitsandbytes 0.41.1`
  - Solution: TODO
 
UserWarning: You are calling save_pretrained to a 8-bit converted model you may likely encounter unexepected behaviors. If you want to save 8-bit models, make sure to have bitsandbytes>0.37.2 installed.
- Warning and error when saving the 8-bit quantized model
 
NotImplementedError: You are calling save_pretrained on a 4-bit converted model. This is currently not supported
- Error when saving the 4-bit quantized model
 
RuntimeError: Loading a quantized checkpoint into non-quantized Linear8bitLt is not supported. Please call module.cuda() before module.load_state_dict()
- TODO; the error message itself suggests calling `module.cuda()` before `module.load_state_dict()`, as sketched below
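A sketch following the hint in the error message; the layer shape and checkpoint file name are hypothetical:

```python
import torch
import bitsandbytes as bnb

module = bnb.nn.Linear8bitLt(768, 768, has_fp16_weights=False)
module.cuda()  # quantizes the layer; must happen before load_state_dict()
module.load_state_dict(torch.load("linear8bit_checkpoint.pt"))
```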
 
Designing a device map
- Designing a device map with HF Accelerate; supported defaults: "auto", "balanced", "balanced_low_0", "sequential", or a custom map via `accelerate.infer_auto_device_map`
- Solution: for now use `device_map=0` or `device_map={'': torch.cuda.current_device()}`
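A sketch of inferring a custom device map with Accelerate; the checkpoint name and memory limits are example values:

```python
import torch
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, SegformerForImageClassification

config = AutoConfig.from_pretrained("nvidia/mit-b0")   # example checkpoint
with init_empty_weights():
    empty_model = SegformerForImageClassification(config)

device_map = infer_auto_device_map(
    empty_model,
    max_memory={0: "4GiB", "cpu": "16GiB"},            # example limits
)

# Fallback used above: put every module on the current GPU.
# device_map = {"": torch.cuda.current_device()}
```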