15 posts

Large Language Models

Latest posts
New SimpleQA Benchmark Aims to Test Language Models' Factual Accuracy

OpenAI's SimpleQA tests models on 4,326 factual questions and has an estimated error rate of about 3%. GPT-4o scores under 40%. Larger models perform better, while models that spend more time reasoning are more likely to decline to answer than to guess.

by AI-360
ChatGPT Outperforms Physicians in Medical Diagnostic Reasoning Steps, Stanford Study Shows

GPT-4 scored 92 on clinical reasoning, compared with 74 to 76 for physicians. Physicians assisted by AI completed diagnoses more than a minute faster but showed no gain in accuracy.

by AI-360
Mistral AI Launches "Ministraux"

The Ministral 3B and 8B models outperform larger peers, support a 128k-token context window, and enable on-device AI for robotics and local analytics. Pricing starts at $0.04 per million tokens.

by AI-360
AI Models Show Promise in Evaluating and Optimising Educational Content

A Stanford study finds that LLMs can assess and optimise educational materials, replicating established learning effects and generating content that human teachers prefer.

by AI-360
NVIDIA Boosts LLM Inference Performance

NVIDIA's software optimisations improved Llama 70B inference latency by 3.5x in under a year. The Blackwell platform delivered up to 4x higher performance and marked the first use of FP4 precision in MLPerf.

by AI-360
Anthropic Outlines AI Safeguards for 2024 U.S. Elections

Anthropic is implementing measures to prevent misuse of AI in the 2024 U.S. elections, including policy updates, detection systems, and redirects to voting information.

by AI-360