Large Language Models - AI-360

Anthropic Urges Swift Government Action on AI Regulation, Citing Rising Risks

"AI systems went from solving 2% of real coding problems to 49% in one year. Anthropic warns the window for safe regulation is closing fast."

by AI-360

01 OpenAI

New SimpleQA Benchmark Aims to Test Language Models' Factual Accuracy

OpenAI's SimpleQA benchmark comprises 4,326 factual questions with a 3% error rate. GPT-4o scores under 40%. Larger models score higher, while deeper-thinking models more often decline to answer.

by AI-360

01 Stanford

ChatGPT Outperforms Physicians in Medical Diagnostic Reasoning Steps, Stanford Study Shows

"ChatGPT-4 scored 92 in clinical reasoning vs physicians' 74-76. AI-assisted doctors completed diagnoses 1+ minute faster but showed no accuracy gains."

by AI-360

01 Mistral

Mistral AI Launches "Ministraux"

Ministral 3B and 8B models outperform larger peers, support 128k context, and enable on-device AI for robotics and local analytics. Pricing from $0.04/million tokens.

by AI-360

01 Stanford

AI Models Show Promise in Evaluating and Optimising Educational Content

Stanford study: LLMs can assess and optimise educational materials, replicating learning effects and generating content preferred by human teachers.

by AI-360

01 NVIDIA

NVIDIA Boosts LLM Inference Performance

NVIDIA's LLM optimisations achieved a 3.5x latency improvement for Llama 70B in under a year. The Blackwell platform shows a 4x performance boost and the first use of FP4 precision in MLPerf.

by AI-360