Tag
4 articles
Nvidia sets new MLPerf records with 288 GPUs while AMD and Intel pursue different strategic paths in AI hardware competition.
AI models like GPT-5 and Gemini 3 Pro can confidently describe images they've never seen, and current benchmarks fail to detect this issue. A Stanford study highlights the dangers of AI hallucinations and calls for new evaluation methods.
This article explains the trade-offs in AI language model performance, focusing on how models like Grok 4.20 reduce hallucinations but lag behind top-tier models in benchmarks.
This article explains how current AI agent benchmarks focus narrowly on coding tasks, ignoring 92% of the US labor market, and why this limits the real-world applicability of AI systems.