OpenAI has unveiled "deep research," a groundbreaking agentic capability that conducts multi-step internet research for complex tasks, compressing work that would take human analysts many hours into tens of minutes. The new tool, powered by a version of the upcoming OpenAI o3 model optimised for web browsing and data analysis, represents a significant advancement in AI's ability to synthesise knowledge from diverse online sources.

Deep research autonomously discovers, reasons about, and consolidates insights from across the web, handling complex queries that require extensive context-gathering. The agent can analyse hundreds of online sources, including text, images, and PDFs, and produce comprehensive analyst-level reports with proper citations.

"The ability to synthesise knowledge is a prerequisite for creating new knowledge," states OpenAI, highlighting how this capability advances their broader goal of developing AGI capable of producing novel scientific research.

The tool was built specifically for professionals conducting intensive knowledge work in domains like finance, science, policy, and engineering who require thorough, precise, and reliable research. It is also applicable to consumer use cases that demand careful research, such as personalised product recommendations.

Deep research extends OpenAI's reasoning capabilities through an advanced AI agent that can use a browser and Python tools, reacting to the information it encounters. The system was trained on real-world tasks requiring browser and Python tool use, leveraging the reinforcement learning methods behind OpenAI's o1 reasoning model.
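At a high level, an agent of this kind alternates between deciding on a next step and invoking a tool, folding each result back into its working context before planning again. The sketch below is a purely illustrative Python skeleton of that loop with stubbed browsing and code-execution tools; the class, method names, and two-step stopping rule are hypothetical and not taken from OpenAI's system.

```python
# Conceptual sketch of an agentic research loop (not OpenAI's implementation).
# The agent interleaves planning with tool calls — a stub "browser" and a stub
# Python executor here — until it judges it has enough material for a report.

from dataclasses import dataclass, field


@dataclass
class ResearchAgent:
    """Illustrative agent that alternates between planning and tool use."""
    notes: list[str] = field(default_factory=list)

    def browse(self, query: str) -> str:
        # Placeholder for a real web-browsing tool.
        return f"[search results for: {query}]"

    def run_python(self, code: str) -> str:
        # Placeholder for a sandboxed Python tool used for data analysis.
        return f"[output of: {code}]"

    def plan_next_step(self, task: str) -> tuple[str, str] | None:
        # A real system would let the model choose the next action based on
        # everything gathered so far; this toy version stops after two steps.
        if len(self.notes) == 0:
            return ("browse", task)
        if len(self.notes) == 1:
            return ("run_python", "summarise(gathered_data)")
        return None  # enough context gathered

    def research(self, task: str) -> str:
        while (step := self.plan_next_step(task)) is not None:
            tool, argument = step
            result = self.browse(argument) if tool == "browse" else self.run_python(argument)
            self.notes.append(result)  # react to what was just found
        return "Report with citations:\n" + "\n".join(self.notes)


if __name__ == "__main__":
    agent = ResearchAgent()
    print(agent.research("Compare recent benchmarks for reasoning models"))
```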

When users submit queries through ChatGPT, they select the "deep research" option and can attach contextual files or spreadsheets. A sidebar shows the steps taken and sources used, with research runs typically taking 5 to 30 minutes. Users receive a notification when the research is complete, allowing them to work on other tasks during processing.

A key differentiator from standard GPT models is deep research's ability to conduct extensive exploration and properly cite each claim, transforming quick summaries into well-documented, verified answers usable as work products. The system will soon incorporate embedded images, data visualisations, and other analytic outputs for enhanced clarity.

Evaluations show the new system delivers exceptional performance on complex tasks. On "Humanity's Last Exam," a test of expert-level questions across 100+ subjects, the model powering deep research scored 26.6% accuracy—a substantial improvement over previous models like OpenAI o1 (9.1%) and Claude 3.5 Sonnet (4.3%).

The system also achieved state-of-the-art results on GAIA, a benchmark for real-world AI tasks requiring reasoning, multi-modal fluency, and tool use. Internal assessments found that deep research automated multiple hours of difficult, manual investigation work across various domains.

Because the capability is compute-intensive, deep research is currently available only to Pro users, limited to 100 queries per month, with expansion planned for Plus, Team, and Enterprise users. A faster, more cost-effective version powered by a smaller model is in development to increase rate limits for all paid users.

Future enhancements include integration with specialised data sources, expanding access to subscription-based or internal resources for more robust and personalised outputs. OpenAI envisions combining deep research with its "Operator" tool to create a comprehensive agentic system capable of both online investigation and real-world action, enabling increasingly sophisticated task completion.

