PagedAttention emerges as a key answer to the GPU memory bottleneck in serving large language models: by storing the KV cache in fixed-size blocks rather than contiguous buffers, it cuts memory fragmentation, enabling more efficient memory usage and higher concurrency in AI inference systems.
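As a rough illustration of the idea (a minimal sketch, not the actual vLLM implementation; all names here are hypothetical), a paged KV cache maps each sequence's logical token positions to physical blocks through a per-sequence block table, so memory is claimed on demand instead of reserved up front:

```python
# Sketch of a paged KV cache: logical token slots map to fixed-size
# physical blocks via a per-sequence block table, so blocks are
# allocated lazily and returned to a shared pool when a request ends.
BLOCK_SIZE = 16  # tokens per block (illustrative value)

class PagedKVCache:
    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))  # pool of physical block ids
        self.block_tables = {}  # seq_id -> list of physical block ids

    def append_token(self, seq_id, pos):
        """Reserve a physical slot for token `pos` of sequence `seq_id`."""
        table = self.block_tables.setdefault(seq_id, [])
        if pos // BLOCK_SIZE >= len(table):  # logical block not mapped yet
            table.append(self.free_blocks.pop())
        block = table[pos // BLOCK_SIZE]
        return block * BLOCK_SIZE + pos % BLOCK_SIZE  # physical slot index

    def free(self, seq_id):
        """Return a finished sequence's blocks to the pool for reuse."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))

cache = PagedKVCache(num_blocks=64)
slots = [cache.append_token("req-0", p) for p in range(20)]
# 20 tokens occupy only 2 blocks; nothing else is reserved for this request.
cache.free("req-0")  # blocks immediately available to other requests
```

Because sequences only hold the blocks they actually use, many more concurrent requests fit in the same GPU memory than with contiguous per-request preallocation.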
Researchers from Meta, Cornell, and CMU introduce TinyLoRA, a 13-parameter fine-tuning method that achieves 91.8% accuracy on GSM8K using Qwen2.5-7B.
Yann LeCun has raised $1 billion for his new startup AMI Labs, marking Europe's largest seed funding round ever. Investors are betting on his vision for AI beyond LLMs.