New math benchmark reveals AI models confidently solve problems that have no solution

A new AI benchmark reveals that models confidently solve math problems that have no solution, exposing a key gap in their reasoning capabilities.

A consortium of 64 mathematicians has introduced SOOHAK, a new benchmark designed to test the mathematical reasoning of artificial intelligence models. The benchmark includes 439 handwritten math problems, 99 of which are deliberately unsolvable. The results are sobering: models like Google’s Gemini 3 Pro perform well on research-level tasks, achieving a 30% success rate, but none of the tested models can reliably identify when a problem has no solution.

Confidence Without Correctness

The findings underscore a troubling trend in AI behavior: models often provide confident answers to problems that are mathematically impossible to solve. This phenomenon, known as "overconfidence in error," reveals a gap between AI’s ability to generate plausible-sounding solutions and its capacity to recognize the limits of its own knowledge. While increased computational power improves performance on solvable problems, it does not enhance the models’ ability to detect when a problem is fundamentally flawed or unsolvable.

Implications for AI Development

SOOHAK was created to address a key shortcoming in current AI benchmarks: they often reward correct answers without testing whether a model understands the structure of the problem. By including unsolvable problems, researchers aim to push AI systems toward more robust mathematical reasoning and self-awareness. As the field moves toward more complex, real-world applications, this kind of benchmark helps ensure that AI systems don’t just appear smart but actually recognize what they can and cannot do.

Looking Forward

While the results are a wake-up call for AI researchers, they also point to a clear direction for future development. Enhancing AI systems to recognize when a problem has no solution is not just an academic exercise; it’s a critical step toward building more reliable, trustworthy, and responsible artificial intelligence. As AI continues to permeate scientific and engineering domains, benchmarks like SOOHAK will play a vital role in shaping the next generation of intelligent systems.
