Tag
13 articles
London-based AI chip startup Fractile has raised $220 million to bring its in-memory computing inference chip to production, with support from Accel and Pat Gelsinger.
Meta and Stanford researchers introduce the Fast Byte Latent Transformer, reducing inference memory bandwidth by over 50% without subword tokenization.
Learn how to set up and use TokenSpeed, an open-source LLM inference engine optimized for agentic workloads, with step-by-step instructions for beginners.
This article explains the concept of inference optimization in AI, why it's critical for modern AI deployment, and how companies like Nebius are investing heavily in this area.
This article explains Google's strategy to challenge Nvidia in AI inference by building a diverse chip supply chain with four partners, spanning multiple chip generations from Ironwood to TPU v8.
Learn how to simulate AI chip inference execution with custom memory processing units and inference-optimized architectures, similar to Google's custom chips with Marvell.
NVIDIA releases AITune, an open-source toolkit that automatically identifies the fastest inference backend for PyTorch models, streamlining deployment and enhancing performance.
This article explains how sigmoid and ReLU activation functions affect geometric context preservation in neural networks, and why this matters for inference accuracy.
AI chip startup Rebellions raises $400 million at a $2.3B valuation, positioning itself as a challenger to Nvidia's dominance in AI inference hardware.
Learn how to deploy machine learning models on Nvidia's new Vera Rubin platform with dedicated Groq 3 LPX inference chips using Docker containers and ONNX export.
Meta has unveiled four new generations of custom AI chips aimed at reducing inference costs and decreasing reliance on external GPU suppliers like Nvidia and AMD.
Learn about SPCT (Self-Principled Critique Tuning), a new method developed by DeepSeek AI that improves the scalability of reward models during inference, making AI systems more efficient and cost-effective.