Benchmark Model - Search News

57m

Stock market today: Dow, S&P 500, Nasdaq futures jump as stocks head for steep weekly losses

The risk of a US government shutdown has eased but investors stayed on watch for the next move in an escalating trade war.

23h

Testing The Limits: Three Ways AI Benchmarks Are Evolving

When it comes to real-world evaluation, appropriate benchmarks need to be carefully selected to match the context of AI ...

Analytics Insight · 1d

Gemma 3: Google’s New AI Beats OpenAI’s o3-mini and DeepSeek-V3

Google has launched Gemma 3, the third generation of its open-source AI models. The model is better than rivals like DeepSeek ...

MIT Technology Review · 1d

Gemini Robotics uses Google’s top language model to make robots more useful

Google DeepMind has released a new model, Gemini Robotics, that combines its best large language model with robotics. Plugging in the LLM seems to give robots the ability to be more dexterous, work ...

OpenAI’s newest developer AI brings search capabilities to AI agents

When using the Responses API to create an AI agent, developers can choose from two models: GPT-4o search and GPT-4o mini search.

MIT Technology Review · 2d

These new AI benchmarks could help make models less biased

New AI benchmarks could help developers reduce bias in AI models, potentially making them fairer and less likely to cause harm. The research, from a team based at Stanford, was posted to the arXiv ...

AMD Ryzen 9 9950X3D: Everything You Need To Know

Everything you need to know about AMD's flagship Ryzen 9 9950X3D processor as reviews go live and availability begins March 12 ...

Macworld · 3d

15-inch MacBook Air (M4) review: A Mac rhapsody in (sky) blue

Apple offers three standard configurations each for the 13- and 15-inch MacBook Air. The 15-inch MacBook Air in this review ...

Decrypt · 3d

China's Manus AI Challenges OpenAI's $200 Agent—If You Can Get an Invite

A new autonomous AI agent platform claims benchmark superiority over Deep Research, even as skeptics question its legitimacy.

This new AI benchmark measures how much models lie

Researchers behind the MASK benchmark found that more knowledge doesn't mean more 'moral virtue.' See which model lies the ...

Chatbots Are Academically Dishonest

Even by the AI industry’s frenetic standards, 2025 has been dizzying. OpenAI, Anthropic, Google, and xAI have all released ...
