Tag

#LLM

14 articles

Google Introduces TurboQuant: A New Compression Algorithm that Reduces LLM Key-Value Cache Memory by 6x and Delivers Up to 8x Speedup, All with Zero Accuracy Loss

Google introduces TurboQuant, a new compression algorithm that reduces LLM key-value cache memory by 6x and delivers up to 8x speedup without accuracy loss.

Mar 24

The leaderboard “you can’t game,” funded by the companies it ranks

Arena, formerly LM Arena, has emerged as the de facto public leaderboard for frontier LLMs, influencing funding, launches, and PR cycles in the AI industry.

Mar 18

How to Build Type-Safe, Schema-Constrained, and Function-Driven LLM Pipelines Using Outlines and Pydantic

Learn how to build AI pipelines that produce reliable, structured outputs using Outlines and Pydantic. Understand how type-safe and schema-constrained LLM pipelines work and why they matter for real-world AI applications.

Mar 14

Improving instruction hierarchy in frontier LLMs

OpenAI introduces IH-Challenge, a training method that improves instruction hierarchy in frontier LLMs, enhancing safety steerability and resistance to prompt injection attacks.

Mar 10

NVIDIA AI Releases Nemotron-Terminal: A Systematic Data Engineering Pipeline for Scaling LLM Terminal Agents

This explainer explores NVIDIA's Nemotron-Terminal, a systematic data engineering pipeline for scaling LLMs in terminal environments. Learn how it addresses the critical bottleneck of training data for autonomous AI agents.

Mar 10

tools

LangWatch Open Sources the Missing Evaluation Layer for AI Agents to Enable End-to-End Tracing, Simulation, and Systematic Testing

LangWatch open-sources a platform to evaluate and trace AI agents, addressing the non-determinism challenge in LLM-based systems.

Mar 4

tools

How to Build a Stable and Efficient QLoRA Fine-Tuning Pipeline Using Unsloth for Large Language Models

A new tutorial demonstrates how to build a stable and efficient QLoRA fine-tuning pipeline using Unsloth, addressing common Colab issues and enabling resource-efficient LLM training.

Mar 3

tools

Alibaba Team Open-Sources CoPaw: A High-Performance Personal Agent Workstation for Developers to Scale Multi-Channel AI Workflows and Memory

Alibaba open-sources CoPaw, a high-performance personal agent workstation designed to help developers scale multi-channel AI workflows and memory management.

Mar 1

Can GRPO Be 10x More Efficient? Kwai AI’s SRPO Suggests Yes

Kwai AI's SRPO framework slashes LLM RL post-training steps by 90% while matching DeepSeek-R1 performance in math and code. This two-stage RL approach with history resampling overcomes GRPO limitations.

Feb 27

Sakana AI Introduces Doc-to-LoRA and Text-to-LoRA: Hypernetworks that Instantly Internalize Long Contexts and Adapt LLMs via Zero-Shot Natural Language

Sakana AI introduces Doc-to-LoRA and Text-to-LoRA, hypernetwork techniques that enable instant long-context internalization and zero-shot LLM adaptation via natural language instructions.

Feb 27

Which Agent Causes Task Failures and When? Researchers from PSU and Duke Explore Automated Failure Attribution in LLM Multi-Agent Systems

Researchers from PSU and Duke University develop a framework to automatically identify which agent in an LLM multi-agent system causes task failures and when the failure occurs.

Feb 26

research

New Google AI Research Proposes Deep-Thinking Ratio to Improve LLM Accuracy While Cutting Total Inference Costs by Half

New research from Google AI introduces the Deep-Thinking Ratio, a method to improve LLM accuracy while cutting inference costs by half. It challenges the common assumption that longer reasoning chains lead to better outcomes.

Feb 23