Tag

#inference

18 articles

Nvidia teams up with chip rival d-Matrix instead of fighting it

Nvidia has chosen to collaborate with chip rival d-Matrix, combining their technologies into a joint system for running AI inference. The partnership marks a strategic shift toward cooperation in the competitive AI chip market.

Jul 8

Hot French startup ZML releases free product to speed inference across lots of AI chips

Learn how to use ZML's open-source inference optimization software to accelerate AI model execution across multiple hardware platforms, demonstrating performance improvements through practical implementation.

Jul 7

tech

Nvidia rival Etched raises $800M with backing from Jane Street and a TSMC-linked fund

Learn how to build and optimize AI inference pipelines using TensorFlow, similar to what companies like Etched are developing for specialized AI chips.

Jun 30

tech

Why Wall Street thinks US memory maker Micron is the next Nvidia

Learn to build and test AI inference systems that demonstrate how Micron's memory technology impacts AI model performance, simulating the advantages that make Micron a potential rival to Nvidia.

Jun 28

tech

After Nvidia’s $20B not-acqui-hire, AI chip startup Groq reportedly raising $650M

AI chip startup Groq is reportedly raising $650 million in internal funding as it pivots from general hardware to focus on AI inference, the process of running trained models to generate responses to prompts.

May 29

tech

Fractile raises $220m to take its in-memory-compute inference chip into production

London-based AI chip startup Fractile has raised $220 million to bring its in-memory computing inference chip to production, with support from Accel and Pat Gelsinger.

May 13

Meta and Stanford Researchers Propose Fast Byte Latent Transformer That Reduces Inference Memory Bandwidth by Over 50% Without Tokenization

Meta and Stanford researchers introduce the Fast Byte Latent Transformer, reducing inference memory bandwidth by over 50% without subword tokenization.

May 11

LightSeek Foundation Releases TokenSpeed, an Open-Source LLM Inference Engine Targeting TensorRT-LLM-Level Performance for Agentic Workloads

Learn how to set up and use TokenSpeed, an open-source LLM inference engine optimized for agentic workloads, with step-by-step instructions for beginners.

May 9

Nebius paid $643 million for 20 people because inference is where the money is

This article explains the concept of inference optimization in AI, why it's critical for modern AI deployment, and how companies like Nebius are investing heavily in this area.

May 1

Google is building a four-partner chip supply chain to challenge Nvidia in AI inference

This article explains Google's strategy to challenge Nvidia in AI inference by building a diverse chip supply chain with four partners, spanning multiple chip generations from Ironwood to TPU v8.

Apr 20

tech

Google is in talks with Marvell to build custom AI inference chips as it diversifies beyond Broadcom

Learn how to simulate AI chip inference execution with custom memory processing units and inference-optimized architectures, similar to Google's custom chips with Marvell.

Apr 19

tools

NVIDIA Releases AITune: An Open-Source Inference Toolkit That Automatically Finds the Fastest Inference Backend for Any PyTorch Model

NVIDIA releases AITune, an open-source toolkit that automatically identifies the fastest inference backend for PyTorch models, streamlining deployment and enhancing performance.

Apr 10