Learn to compress instruction-tuned language models using FP8, GPTQ, and SmoothQuant quantization techniques with llmcompressor, and benchmark their performance.
Google introduces TurboQuant, a new compression algorithm that reduces LLM key-value cache memory by 6x and delivers a speedup of up to 8x without accuracy loss.