AI Performance Benchmark

17h

Which AI agent is the best? This new leaderboard can tell you

On Wednesday, Galileo launched an Agent Leaderboard on Hugging Face, an open-source AI platform where users can build, train, access, and deploy AI models. The leaderboard is meant to help people ...

17h

Leaked AMD Strix Halo benchmark sounds too good to be true

Integrated graphics cards have been fighting an uphill battle for many years, often failing to achieve anything near what ...

8d on MSN

These researchers used NPR Sunday Puzzle questions to benchmark AI ‘reasoning’ models

Researchers used questions from the NPR Sunday Puzzle challenge to build a benchmark to test AI 'reasoning' models.

23h

Benchmarks Are Breaking Investment Management (And That's Our Opportunity)

How Goodhart’s Law Reveals the Opportunity in Long-Term Innovation Investing, and why traditional performance metrics may be ...

Decrypt 1d

New Open Source AI Model Rivals DeepSeek's Performance—With Far Less Training Data

OpenThinker-32B achieved benchmark-beating results using just 14% of the data its Chinese competitor needed, marking a win ...

11h on MSN

3DMark benchmarks show off AMD's big daddy Strix Halo laptop chip in action and I'm a little underwhelmed

Strix Halo, AMD's upcoming and extremely large APU, has finally seen some benchmarks in 3DMark Time Spy. These early results ...

Aquant's 2025 Field Service Benchmark Report Reveals AI Enabling 39% Faster Machinery Repairs and More

New report reveals one-third of service queries are solvable without a professional’s help and can be solved through self-service. NEW YORK, Feb. 13, 2025 (GLOBE NEWSWIRE) -- Aquant, an AI platform ...

Yahoo Finance 10d

Paritii Launches The Parity Benchmark: A Game-Changer in AI Fairness Evaluation

Using over 520 carefully designed questions, the benchmark assesses how well AI models handle both factual ... but showed weaker performance in reasoning-heavy bias questions.

11d

OpenAI o3-mini vs DeepSeek R1 : AI Coding Comparison

Discover the strengths and weaknesses of o3-mini and DeepSeek R1 in this detailed AI model comparison of its coding skills ...

Techopedia 1d

Kimi AI 1.5: New Chinese AI Model Beats ChatGPT & DeepSeek

Just days after DeepSeek R1 made headlines, Moonshot AI introduced Kimi AI 1.5, a model already touted as superior to OpenAI’s ...
