AI-360
OpenAI's o1-mini nearly matches larger models on STEM tasks at 80% lower cost than o1-preview. It scores 70% on AIME, reaches the 86th percentile on Codeforces, and is 3-5x faster than o1-preview.
OpenAI's new o1 models excel in science, coding, and math. They solved 83% of problems on an IMO qualifying exam, up from GPT-4o's 13%.
OpenAI's o1 model excels in math, coding, and science, outperforming human PhD-level experts on GPQA Diamond. It uses an internal "chain of thought" for complex reasoning.
OpenAI's o1 system card details the model's safety assessments. The model uses chain-of-thought reasoning and shows improved performance on safety benchmarks.
Google's DataGemma uses Data Commons to reduce LLM hallucinations. Its RIG (Retrieval-Interleaved Generation) and RAG (Retrieval-Augmented Generation) approaches improve the factual accuracy of AI responses.
Anadol fine-tunes Llama models with 500M+ nature images. "Open source information is like gold for art making," he says, emphasising AI's artistic potential.