OpenAI has released o1-mini, a new cost-efficient AI model optimised for STEM reasoning, offering comparable performance to its larger counterpart at a fraction of the cost.
Launched on September 12 for tier 5 API users, the specialised model excels at STEM tasks, particularly mathematics and coding, and is 80% cheaper than OpenAI o1-preview.
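To make the "80% cheaper" figure concrete, here is a minimal sketch of the arithmetic. The reference price below is a made-up placeholder, not a published OpenAI rate, and the function name is purely illustrative.

```python
def price_after_discount(reference_price: float, percent_cheaper: float) -> float:
    """Return the price of a model that is `percent_cheaper` percent cheaper
    than a model costing `reference_price` (same unit, e.g. per million tokens)."""
    return reference_price * (100 - percent_cheaper) / 100


# Hypothetical reference price for the larger model (placeholder value only).
hypothetical_preview_price = 10.0

mini_price = price_after_discount(hypothetical_preview_price, 80)
print(f"Reference price: {hypothetical_preview_price:.2f}")
print(f"80% cheaper:     {mini_price:.2f}")
print(f"Price ratio:     {hypothetical_preview_price / mini_price:.1f}x")
```

In other words, an 80% reduction means paying one fifth of the reference price, so the same budget covers about five times as many tokens.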
The new model's performance in high-level mathematics is particularly noteworthy. On the American Invitational Mathematics Examination (AIME), o1-mini scored 70.0%, close to the 74.4% achieved by the larger o1 model. That score places o1-mini roughly among the top 500 U.S. high school students.
o1-mini is also strong at coding. On the Codeforces competition platform, it achieved an Elo rating of 1650, surpassing o1-preview's 1258 and approaching o1's 1673. This rating puts o1-mini at about the 86th percentile of programmers competing on the platform.
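For a rough sense of what these rating gaps mean, the sketch below applies the standard Elo expected-score formula to the three reported ratings. This is an illustration under an assumption: Codeforces uses its own rating system, so the classic formula with a 400-point scale is only an approximation, and the models never actually played head-to-head matches.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B (0 to 1).

    Assumes the classic logistic formula with a 400-point scale; the
    platform's real rating system may differ.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Ratings as reported in the article.
ratings = {"o1": 1673, "o1-mini": 1650, "o1-preview": 1258}

for opponent in ("o1-preview", "o1"):
    expected = elo_expected_score(ratings["o1-mini"], ratings[opponent])
    print(f"o1-mini vs {opponent}: expected score {expected:.3f}")
```

Under this approximation, the 392-point lead over o1-preview corresponds to a heavily lopsided matchup, while the 23-point gap to o1 is close to even.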
While o1-mini shines in STEM-related tasks, it does have limitations. The model's performance in areas requiring broad world knowledge, such as the MMLU (Massive Multitask Language Understanding) benchmark, lags behind its larger counterparts. OpenAI acknowledges this trade-off, noting that o1-mini's specialisation allows for its increased efficiency in targeted applications.
The release of o1-mini also brings improvements in processing speed. In a comparison on a word reasoning question, GPT-4o did not answer correctly, while both o1-mini and o1-preview did, with o1-mini reaching the answer around 3-5 times faster than o1-preview.
Safety remains a priority for OpenAI. The company reports that o1-mini shows 59% higher jailbreak robustness than GPT-4o on an internal version of the StrongREJECT dataset, indicating improved resistance to potential misuse.
Specialised models like o1-mini represent a new direction in balancing performance with cost-efficiency. OpenAI's focus on STEM capabilities in this release suggests a growing trend towards task-specific AI solutions.