Tag
5 articles
This article explains CuPy, a GPU-accelerated Python library for high-performance numerical computing, and how it leverages CUDA kernels, streams, and sparse matrices for machine learning workloads.
NVIDIA has released cuda-oxide, an experimental Rust-to-CUDA compiler backend that compiles SIMT GPU kernels directly to PTX, streamlining GPU development for Rust developers.
NVIDIA's CUDA software platform has become a critical competitive advantage, creating a formidable barrier that rivals traditional hardware superiority in the AI industry.
Learn to implement sparse matrix operations using CUDA kernels to achieve speedups of 20.5% in inference and 21.9% in training for LLMs, following the TwELL approach by Sakana AI and NVIDIA.
Learn how to run a tiny but powerful AI model, the Bonsai 1-bit LLM, on your computer using CUDA and the GGUF model format.