Stanford
LLMs answer consistently on neutral topics like Thanksgiving but become less consistent on controversial issues. Larger models are more reliable than smaller ones.
While software security has reporting infrastructure and bug bounties, AI systems lack similar frameworks for third-party evaluation and protection.
Stanford's fellowship places tech experts in government roles. One fellow helped write AI laws, while others improved public services and training.
ChatGPT-4 scored 92 in clinical reasoning versus physicians' 74-76. AI-assisted doctors completed diagnoses more than a minute faster but showed no accuracy gains.
Stanford's RegLab uses AI to identify racial covenants in 5M+ property deeds. The system saves 86,500 person-hours, costs 2% as much as proprietary models, and reveals patterns in the deeds.
Stanford study: LLMs can assess and optimise educational materials, replicating learning effects and generating content preferred by human teachers.