Introducing deep research
An agent that uses reasoning to synthesise large amounts of online information and complete multi-step research tasks for you. Available to Pro users today, with Plus and Team users next.
Today OpenAI is launching deep research in ChatGPT, a new agentic capability that conducts multi-step research on the internet for complex tasks. It accomplishes in tens of minutes what would take a human many hours.
Deep research is OpenAI’s next agent: given a prompt, ChatGPT will find, analyse, and synthesise hundreds of online sources to produce a comprehensive report at the level of a research analyst. It uses reasoning to search, interpret, and analyse large volumes of text, images, and PDFs on the internet, changing course as needed in response to the information it encounters. It is powered by a version of the upcoming OpenAI o3 model that is optimised for web browsing and data analysis.
The ability to synthesise existing knowledge is a prerequisite for creating new knowledge. For this reason, deep research marks a significant step toward OpenAI’s broader goal of developing AGI, which OpenAI has long envisioned as capable of producing novel scientific research.
Why OpenAI built deep research
Deep research is built for people who do intensive knowledge work in fields like finance, science, policy, and engineering and who need thorough, precise, and reliable research. It can be equally useful for discerning shoppers looking for highly personalised recommendations on purchases that typically require careful research, such as cars, appliances, and furniture. Every output is fully documented, with clear citations and a summary of its reasoning, which makes the information easy to reference and verify. It is particularly effective at finding niche, non-intuitive information that would otherwise require browsing a large number of websites. By letting you offload and speed up complex, time-consuming web research with a single query, deep research frees up valuable time.
Deep research independently discovers, reasons about, and consolidates information from across the web. To do this, it was trained on real-world tasks requiring browser and Python tool use, using the same reinforcement learning methods behind OpenAI o1, OpenAI’s first reasoning model. Although o1 shows impressive capabilities in coding, mathematics, and other technical domains, many real-world problems demand extensive context and information gathered from diverse online sources. Deep research builds on those reasoning capabilities to close that gap, allowing it to take on the kinds of problems people face at work and in everyday life.
How to use deep research
In ChatGPT, select “deep research” in the message composer and enter your query. Tell ChatGPT what you need, whether that is a competitive analysis of streaming platforms or a personalised report on the best commuter bike. You can attach files or spreadsheets to add context to your question. Once it starts running, a sidebar appears with a summary of the steps taken and the sources used.
Deep research can take anywhere from five to thirty minutes to complete its work, because it needs that time to dive deep into the web. In the meantime, you can step away or work on other tasks; you will be notified when the research is finished. The final output arrives as a report within the chat. Over the next few weeks, OpenAI will also add embedded images, data visualisations, and other analytical outputs to these reports for additional clarity and context.
Compared with deep research, GPT-4o is the better fit for real-time, multimodal conversations. For multi-faceted, domain-specific questions where depth and detail are critical, deep research’s ability to explore extensively and cite each claim is the difference between a quick summary and a well-documented, verified answer that can be used as a work product.
How it works
Deep research was trained with end-to-end reinforcement learning on hard browsing and reasoning tasks across a range of domains. Through that training, it learned to plan and execute a multi-step trajectory to find the data it needs, backtracking and reacting to real-time information where necessary. The model can also browse user-uploaded files, plot and iterate on graphs using the Python tool, embed both generated graphs and images from websites in its responses, and cite specific sentences or passages from its sources. As a result of this training, it reaches new highs on a number of public evaluations focused on real-world problems.
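The browse, backtrack, and cite behaviour described above can be illustrated with a deliberately small sketch. This is not OpenAI's implementation, which is a trained model rather than a hand-written loop; the in-memory `TOY_WEB`, the `Report` class, and the `research` function are all hypothetical names invented for illustration. It walks a tiny link graph breadth-first, moves on from pages that turn out to be irrelevant, and records which page each finding came from.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical in-memory "web": page id -> (page text, outgoing links).
TOY_WEB = {
    "start": ("Overview page; details are elsewhere.", ["dead_end", "specs"]),
    "dead_end": ("Nothing relevant here.", []),
    "specs": ("The frame weighs 9 kg.", ["review"]),
    "review": ("Reviewers confirm the 9 kg figure.", []),
}


@dataclass
class Report:
    findings: list = field(default_factory=list)  # (page_id, passage) citations
    visited: list = field(default_factory=list)   # pages in the order they were read


def research(start, keyword, web=TOY_WEB, max_steps=10):
    """Browse breadth-first from `start`, citing every page that mentions `keyword`."""
    report = Report()
    frontier = deque([start])
    seen = {start}
    while frontier and len(report.visited) < max_steps:
        page = frontier.popleft()
        text, links = web[page]
        report.visited.append(page)
        if keyword in text:
            # Keep the source alongside the passage so the claim can be verified.
            report.findings.append((page, text))
        for link in links:
            if link not in seen:  # never re-read a page
                seen.add(link)
                frontier.append(link)
    return report


if __name__ == "__main__":
    result = research("start", "9 kg")
    for source, passage in result.findings:
        print(f"[{source}] {passage}")
```

A fixed breadth-first order is used only to keep the example short; the article describes a model that decides its next step by reasoning over what it has just read.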
Humanity’s Last Exam
On Humanity’s Last Exam, a recently released evaluation that tests AI on expert-level questions across a broad range of subjects, the model powering deep research sets a new high of 26.6% accuracy. The test consists of more than 3,000 multiple-choice and short-answer questions across more than 100 subjects, from linguistics to rocket science and from classics to ecology. Compared with OpenAI o1, the largest gains were in chemistry, mathematics, and the humanities and social sciences. The model powering deep research showed a human-like approach by effectively seeking out specialised information when necessary.
Model | Accuracy (%) |
---|---|
GPT-4o | 3.3 |
Grok-2 | 3.8 |
Claude 3.5 Sonnet | 4.3 |
Gemini Thinking | 6.2 |
OpenAI o1 | 9.1 |
DeepSeek-R1 | 9.4 |
OpenAI o3-mini (medium) | 10.5 |
OpenAI o3-mini (high) | 13.0 |
OpenAI deep research | 26.6 |
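
To put the table in perspective, the short sketch below computes the gap between the deep research model and the other entries. The scores are copied from the table above; the helper names `runner_up` and `gain_over` are hypothetical and exist only for this illustration.

```python
# Accuracy (%) on Humanity's Last Exam, copied from the table above.
HLE_ACCURACY = {
    "GPT-4o": 3.3,
    "Grok-2": 3.8,
    "Claude 3.5 Sonnet": 4.3,
    "Gemini Thinking": 6.2,
    "OpenAI o1": 9.1,
    "DeepSeek-R1": 9.4,
    "OpenAI o3-mini (medium)": 10.5,
    "OpenAI o3-mini (high)": 13.0,
    "OpenAI deep research": 26.6,
}


def runner_up(model, scores=HLE_ACCURACY):
    """Return the best-scoring entry other than `model`."""
    others = {name: acc for name, acc in scores.items() if name != model}
    return max(others, key=others.get)


def gain_over(model, baseline, scores=HLE_ACCURACY):
    """Return (percentage-point gain, ratio) of `model` over `baseline`."""
    diff = scores[model] - scores[baseline]
    ratio = scores[model] / scores[baseline]
    return round(diff, 1), round(ratio, 2)


if __name__ == "__main__":
    top = "OpenAI deep research"
    second = runner_up(top)
    print(second, gain_over(top, second))
    print("OpenAI o1", gain_over(top, "OpenAI o1"))
```

By these figures, the deep research model leads the next-best entry, OpenAI o3-mini (high), by 13.6 percentage points, roughly double its score, and scores just under three times OpenAI o1.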
GAIA
On GAIA, a public benchmark that evaluates AI on real-world questions, the model powering deep research reaches a new state of the art (SOTA) and tops the external leaderboard. The tasks span several levels of difficulty, and completing them requires reasoning, multimodal fluency, web browsing, and tool-use proficiency.
Limitations
Deep research unlocks significant new capabilities, but it is still early and has limitations. According to internal evaluations, it can sometimes hallucinate facts in responses or make incorrect inferences, though at a notably lower rate than existing ChatGPT models. It may struggle to distinguish authoritative information from rumours, and it currently shows weak confidence calibration, often failing to convey uncertainty accurately. At launch, reports and citations may contain minor formatting errors, and tasks may take longer to start. OpenAI expects all of these issues to improve quickly with more usage and time.
Access
Deep research in ChatGPT is currently very compute-intensive. The longer it takes to research a query, the more inference compute is required. OpenAI is launching today with a version optimised for Pro users, with up to 100 queries per month. Plus and Team users will get access next, followed by Enterprise. OpenAI is still working on bringing access to users in the United Kingdom, Switzerland, and the European Economic Area.
All paid users will soon get significantly higher rate limits when OpenAI releases a faster, more cost-effective version of deep research, powered by a smaller model that still delivers high-quality results.
In the coming weeks and months, OpenAI will work on the technical infrastructure, closely monitor the current release, and conduct even more rigorous testing. This is in line with its principle of iterative deployment. If all safety checks continue to meet its release standards, OpenAI expects to release deep research to Plus users in about a month.
What comes next?
Deep research is available today on ChatGPT’s web platform and will roll out to the mobile and desktop apps later this month. Currently, deep research can access the public web and any uploaded files. In the future, you will be able to connect it to more specialised data sources, extending its access to subscription-based or internal resources and making its output even more robust and personalised.
Looking further ahead, OpenAI envisions agentic experiences coming together in ChatGPT for asynchronous, real-world research and execution. The combination of deep research, which can carry out asynchronous online investigation, and Operator, which can take real-world action, will enable ChatGPT to carry out increasingly sophisticated tasks for you.