Tag
14 articles
Google introduces TurboQuant, a new compression algorithm that reduces LLM key-value cache memory by 6x and delivers up to 8x speedup without accuracy loss.
Arena, formerly LM Arena, has emerged as the de facto public leaderboard for frontier LLMs, influencing funding, launches, and PR cycles in the AI industry.
Learn how to build AI pipelines that produce reliable, structured outputs using Outlines and Pydantic. Understand how type-safe and schema-constrained LLM pipelines work and why they matter for real-world AI applications.
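The schema-constrained idea can be sketched with Pydantic alone: define a schema as a model class, then validate the raw JSON an LLM emits against it (in the full pipeline, Outlines would constrain generation to this schema up front). The field names here are illustrative, not from the tutorial:

```python
from pydantic import BaseModel


# Hypothetical schema for a structured LLM output; fields are illustrative.
class ArticleSummary(BaseModel):
    title: str
    topic: str
    word_count: int


# Raw JSON as an LLM might return it; validation enforces types and fields.
raw = '{"title": "TurboQuant", "topic": "compression", "word_count": 120}'
summary = ArticleSummary.model_validate_json(raw)
print(summary.topic)       # "compression"
print(summary.word_count)  # 120, parsed as int, not str
```

A malformed or mistyped response raises a `ValidationError` instead of silently flowing downstream, which is the "reliable, structured outputs" property the tutorial is about.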
OpenAI introduces IH-Challenge, a training method that improves instruction hierarchy in frontier LLMs, enhancing safety steerability and resistance to prompt injection attacks.
This explainer explores NVIDIA's Nemotron-Terminal, a systematic data engineering pipeline for scaling LLMs in terminal environments. Learn how it addresses the critical bottleneck of training data for autonomous AI agents.
LangWatch open-sources a platform to evaluate and trace AI agents, addressing the non-determinism challenge in LLM-based systems.
A new tutorial demonstrates how to build a stable and efficient QLoRA fine-tuning pipeline using Unsloth, addressing common Colab issues and enabling resource-efficient LLM training.
Alibaba open-sources CoPaw, a high-performance personal agent workstation designed to help developers scale multi-channel AI workflows and memory management.
Kwai AI's SRPO framework slashes LLM RL post-training steps by 90% while matching DeepSeek-R1 performance in math and code. This two-stage RL approach with history resampling overcomes GRPO limitations.
Sakana AI introduces Doc-to-LoRA and Text-to-LoRA, hypernetwork techniques that enable instant long-context internalization and zero-shot LLM adaptation via natural language instructions.
Researchers from PSU and Duke University develop a framework that automatically identifies which agent in an LLM multi-agent system caused a task failure and when the failure occurred.
New Google AI research introduces the Deep-Thinking Ratio, a method that improves LLM accuracy while cutting inference costs in half. It challenges the traditional belief that longer reasoning chains lead to better outcomes.