Interested in boosting quantized LLM performance with QAT?
Check out our latest work on Low-Rank Quantization-Aware Training (LR-QAT), which can train 7B LLMs on a single consumer-grade GPU with just 24GB of memory.
New work with @yell1337 and @delchia
arxiv.org/abs/2406.06385