AI Benchmark Results Dashboard

21h

HackerRank Introduces New Benchmark to Assess Advanced AI Models

The ASTRA Benchmark consists of multi-file, project-based problems designed to mimic real-world coding tasks. The intent of the HackerRank ASTRA Benchmark is to determine the correctness and ...

Hosted on MSN · 18d

OpenAI Accused of Manipulating Benchmark Results as Chinese Models Close AI Performance Gap

It was recently revealed that OpenAI secretly funded and accessed data related to the FrontierMath AI benchmark ... many are now suspicious of those results. Crucially, the furor over FrontierMath ...

TechCrunch · 19d

Even some of the best AI can’t beat this new benchmark

a company that provides a number of data labeling and AI development services, have released a challenging new benchmark for frontier AI systems. The benchmark, called Humanity’s Last Exam ...

TechCrunch · 23d

AI benchmarking organization criticized for waiting to disclose funding from OpenAI

An organization developing math benchmarks for AI didn’t disclose that it had received funding from OpenAI until relatively recently, drawing allegations of impropriety from some in the AI ...

The Victoria Advocate · 20d

CAIS and Scale AI Unveil Results of "Humanity's Last Exam," a Groundbreaking New Benchmark

and Scale AI today announced the results of a groundbreaking new AI benchmark that was designed to test the limits of AI knowledge and whether the models are capable of chain-of-thought reasoning.
