LLM Latest Benchmark Results

News

15d

Grok 4 leapfrogs Claude and DeepSeek in LLM rankings, despite safety concerns

Grok 4 by xAI was released on July 9, and it's surged ahead of competitors like DeepSeek and Claude at LMArena, a leaderboard ...

dbta 1y

Deci Unveils Latest LLM, Sets New Benchmarks in Accuracy

Deci, the deep learning company harnessing AI to build AI, is adding a large language model, DeciLM-7B, to its suite of innovative generative AI models, setting new benchmarks in accuracy and ...

VentureBeat 1y

Nvidia, Intel claim new LLM training speed records in new MLPerf 3.1 ...

LLM training gets an oversized boost that is beating Moore’s Law. Of particular note among all the results in the MLPerf Training 3.1 benchmark are the numbers on large language model (LLM) training.

VentureBeat 1y

LiveBench is an open LLM benchmark using contamination ... - VentureBeat

Called LiveBench, it’s a general-purpose LLM benchmark that offers test data free of contamination, which tends to happen with a dataset when more models use it for training purposes.

datanami.com 1y

Groq Shows Promising Results in New LLM Benchmark ... - Datanami

MOUNTAIN VIEW, Calif., Feb. 13, 2024 — Groq, a generative AI solutions company, is the winner in the latest large language model (LLM) benchmark by ArtificialAnalysis.ai, besting eight top cloud ...

insideHPC 8mon

MLCommons Launches LLM Safety Benchmark | Inside HPC & AI News

Dec. 4, 2024 — MLCommons today released AILuminate, a safety test for large language models. The v1.0 benchmark – which provides a series of safety grades for the most widely-used LLMs – is the first ...

datanami.com 2mon

Indico Data Launches LLM Benchmark Site for Document Understanding

“Indico has been committed to fostering transparency and trust within the AI industry since our founding,” stated Tom Wilde, CEO of Indico Data. “Our latest initiative, the LLM benchmark site, fills a ...

India Today on MSN 11d

Sam Altman says OpenAI LLM achieved IMO gold-level Math skills, GPT-5 launch coming soon

An OpenAI experimental model has achieved gold medal-level performance at the 2025 International Math Olympiad, marking a ...

Business Wire 1mon

Simbian Announces Industry’s First Benchmark to Comprehensively ...

Simbian’s AI SOC LLM Leaderboard is the industry’s first and only benchmark that measures LLMs on autonomous end-to-end investigation of alerts, utilizing the above skills.

Business Wire 9mon

Summary: Cognite launches Cognite Atlas AI™ LLM & SLM Benchmark ...

Cognite, the global leader in AI for industry, today announced the launch of the Cognite Atlas AI™ LLM & SLM Benchmark Report for Industrial Ag ...

Enid News & Eagle 1mon

Simbian Announces Industry’s First Benchmark to Comprehensively ...

Simbian’s AI SOC Agent measured LLM performance for autonomous alert investigation, including tasks of diverse skills. All top-tier LLMs completed over 60% of the tasks but left a gap for ...
