LLM Latest Benchmark Results

News

Benchmarks for AI in Software Engineering

Benchmarks drive many areas of research forward, and this is indeed the case for two areas of research that I engage with: ...

dbta · 1y

Deci Unveils Latest LLM, Sets New Benchmarks in Accuracy

Deci, the deep learning company harnessing AI to build AI, is adding a large language model, DeciLM-7B, to its suite of innovative generative AI models, setting new benchmarks in accuracy and ...

14d on MSN

Grok 4 leapfrogs Claude and DeepSeek in LLM rankings, despite safety concerns

Grok 4 by xAI was released on July 9, and it's surged ahead of competitors like DeepSeek and Claude at LMArena, a leaderboard ...

VentureBeat · 1y

Nvidia, Intel claim new LLM training speed records in new MLPerf 3.1 ...

LLM training gets an oversized boost that is beating Moore’s Law. Of particular note among all the results in the MLPerf Training 3.1 benchmark are the numbers on large language model (LLM) training.

SiliconANGLE · 7mon

MLCommons releases new AILuminate benchmark for measuring AI model ...

The benchmark uses AI models to automate the task of analyzing LLM responses. The evaluation models deliver their findings in the form of an automatically-generated report.

datanami.com · 1y

Groq Shows Promising Results in New LLM Benchmark ... - Datanami

MOUNTAIN VIEW, Calif., Feb. 13, 2024 — Groq, a generative AI solutions company, is the winner in the latest large language model (LLM) benchmark by ArtificialAnalysis.ai, besting eight top cloud ...

VentureBeat · 1y

Nvidia triples and Intel doubles generative AI inference performance on ...

There are more than 8,500 performance results in the MLCommons' latest benchmark, testing all manner of combinations and permutations of hardware, software and AI inference use cases.

datanami.com · 2mon

Indico Data Launches LLM Benchmark Site for Document Understanding

“Indico has been committed to fostering transparency and trust within the AI industry since our founding,” stated Tom Wilde, CEO of Indico Data. “Our latest initiative, the LLM benchmark site, fills a ...

SiliconANGLE · 1y

Nvidia and Intel set new standards for AI performance in MLPerf 4.0 ...

In the GPT-J LLM text summarization benchmark, the latest Xeon chip showed that it was 1.9 times faster than its predecessor.

The Economist · 1y

GPT, Claude, Llama? How to tell which AI model is best

And on July 24th, the day after Llama 3.1’s debut, Mistral, a French AI startup, announced Mistral Large 2, its latest LLM, with—you’ve guessed it—yet another table of benchmarks.
