Start AI Test Benchmark

2d on MSN

These researchers used NPR Sunday Puzzle questions to benchmark AI ‘reasoning’ models

Researchers used questions from the NPR Sunday Puzzle challenge to build a benchmark to test AI 'reasoning' models.

7d on MSN

OpenAI’s deep research can complete 26% of Humanity’s Last Exam—a benchmark for the frontier of human knowledge

OpenAI’s o1 and DeepSeek’s R1 models, which previously sat atop the leaderboard, could only get through roughly 9% of the ...

Yahoo Finance 8d

HackerRank Introduces New Benchmark to Assess Advanced AI Models

The intent of the HackerRank ASTRA Benchmark is to determine the correctness and consistency of an AI model’s coding ability ... standard deviation. Wide test case coverage: ASTRA’s dataset ...

Hosted on MSN 1d

Elon Musk's Grok 3 is now available, beats ChatGPT in some benchmarks — LLM took 10x more compute to train versus Grok 2

Elon Musk just launched Grok 3, the latest version of xAI’s LLM that was trained at the Colossus Supercluster in Memphis, ...

1d on MSN

Musk's xAI releases artificial intelligence model Grok 3, claims better performance than rivals in early testing

Elon Musk’s artificial intelligence company xAI has unveiled its latest AI chatbot, Grok 3, as its competition with OpenAI ...

decrypt 5d

New Open Source AI Model Rivals DeepSeek's Performance—With Far Less Training Data

OpenThinker-32B achieved benchmark-beating results using just 14% of the data its Chinese competitor needed, marking a win ...

Morningstar 5d

Aquant's 2025 Field Service Benchmark Report Reveals AI Enabling 39% Faster Machinery Repairs and More

NEW YORK, Feb. 13, 2025 (GLOBE NEWSWIRE) -- Aquant, an AI platform built for servicing complex machinery, released its highly anticipated 2025 Field Service Benchmark Report, offering an in-depth ...

Mena FN 8d

HackerRank Introduces New Benchmark to Assess Advanced AI Models

(MENAFN - GlobeNewsWire - Nasdaq) Industry Leader Known for Software Development Skills Expertise Introduces Real-World Benchmark of AI Software Development Capabilities. CUPERTINO, Calif., ...

Business Insider 8d

HackerRank Introduces New Benchmark to Assess Advanced AI Models

The intent of the HackerRank ASTRA Benchmark is to determine the correctness and consistency of an AI model’s coding ability in relation to practical applications. “With the ASTRA Benchmark ...
