Large Language Models
"The search uses fine-tuned GPT-4 with novel synthetic data generation. Users can trigger web searches automatically or manually, with linked source citations."
"AI systems went from solving 2% of real coding problems to 49% in one year. Anthropic warns the window for safe regulation is closing fast."
OpenAI's SimpleQA benchmark poses 4,326 factual questions, with a 3% error rate in the dataset. GPT-4o scores under 40%; larger models answer more accurately, while reasoning-focused models more often decline to answer.
"ChatGPT-4 scored 92 in clinical reasoning vs physicians' 74-76. AI-assisted doctors completed diagnoses 1+ minute faster but showed no accuracy gains."
Ministral 3B and 8B models outperform larger peers, support 128k context, and enable on-device AI for robotics and local analytics. Pricing from $0.04/million tokens.
Stanford study: LLMs can assess and optimise educational materials, replicating learning effects and generating content preferred by human teachers.