Large Language Models
"AI systems went from solving 2% of real coding problems to 49% in one year. Anthropic warns the window for safe regulation is closing fast."
OpenAI's SimpleQA benchmark poses 4,326 factual questions, with an estimated 3% error rate in its own answer key. GPT-4o scores under 40%; larger models tend to do better, while reasoning-focused models more often decline to answer rather than guess.
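Declining to answer can look like failure or like calibration, depending on the metric. A minimal sketch of the distinction, with illustrative data and function names of my own (not OpenAI's evaluation code): overall accuracy penalises abstention, while accuracy-given-attempted does not.

```python
def score(answers):
    """answers: list of 'correct', 'incorrect', or 'not_attempted'.
    Returns (overall accuracy, accuracy on attempted questions only)."""
    attempted = [a for a in answers if a != "not_attempted"]
    correct = answers.count("correct")
    overall = correct / len(answers)
    given_attempted = correct / len(attempted) if attempted else 0.0
    return overall, given_attempted

# Illustrative comparison: a cautious model that declines on hard questions
# vs a model that always guesses.
cautious = ["correct"] * 30 + ["not_attempted"] * 60 + ["incorrect"] * 10
guesser = ["correct"] * 38 + ["incorrect"] * 62

print(score(cautious))  # (0.3, 0.75)
print(score(guesser))   # (0.38, 0.38)
```

The cautious model scores lower overall but is far more reliable on the questions it does attempt, which is the trade-off the benchmark surfaces.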
"ChatGPT-4 scored 92 in clinical reasoning vs physicians' 74-76. AI-assisted doctors completed diagnoses 1+ minute faster but showed no accuracy gains"
Ministral 3B and 8B models outperform larger peers, support 128k context, and enable on-device AI for robotics and local analytics. Pricing from $0.04/million tokens.
Stanford study: LLMs can assess and optimise educational materials, replicating learning effects and generating content preferred by human teachers.
NVIDIA's inference optimisations delivered a 3.5x latency improvement for Llama 70B in under a year. The Blackwell platform shows a 4x performance boost and the first use of FP4 precision in MLPerf.