February 26, 2026

Introducing the You.com Research API—#1 on DeepSearchQA

Bar chart showing model accuracy on DeepSearchQA; Frontier leads at 83.67%, followed by others ranging from 81.9% down to the lowest score of 21.33%.

TLDR: Today, we are launching our Research API, a new deep search offering that ranks #1 on DeepSearchQA with the highest publicly reported accuracy (83.67%) and F1 score (93.16%). Our scalable research compute capabilities let us push the frontier of accuracy and latency across both simple and complex tasks, as shown by our top performance across industry benchmarks ranging from SimpleQA to DeepSearchQA.

Our Approach

After significant experimentation and technical deep dives, we designed a robust harness that minimizes human steering while maximizing agent autonomy. Our harness drives the most powerful models to effectively navigate problems of varying complexity. In tandem, we ensure that the data we retrieve and extract is of the highest quality and relevance. This approach powers our Research API, achieving state-of-the-art results.

Chart of the highest accuracy endpoint and mode combination across all available SKUs.

How does this play out in the real world? The research path our Research API takes varies significantly based on the type of query. A mathematics question follows a different trajectory than a compliance question or a competitive landscape analysis.

This robust harness, combined with super-charged tooling, was designed with careful attention to what data to include, ensuring that Research performs well.

Our Search API as the Core Primitive

We have heavily invested in mapping and understanding public web content via our Search, Contents, and Live News APIs. Research is designed to leverage these core APIs as tools in the best way possible. This cuts down on wasted calls and gives the model cleaner inputs, leading to increased efficiency. 

Research works to ensure sources retrieved are appropriate for the task at hand, in terms of freshness, diversity, and other core query-specific qualities across its depth-focused exploration. 

Managing Context

Deep research generates far more information than any frontier LLM's context window can hold. We built context-masking and compaction strategies that let Research operate well beyond those limits, maintaining coherent reasoning across hundreds or thousands of turns without losing track of what it found, what it verified, and what is still unresolved.
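The core idea behind compaction can be sketched in a few lines. This is an illustrative toy under our own assumptions, not You.com's actual implementation; every name here (TOKEN_BUDGET, estimate_tokens, summarize, compact) is hypothetical:

```python
# Illustrative sketch of token-budget compaction: when accumulated turns
# exceed the budget, fold the oldest half of the history into one summary
# turn, keeping recent turns verbatim. All names here are hypothetical.

TOKEN_BUDGET = 8_000  # assumed working-context limit for this sketch

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token.
    return max(1, len(text) // 4)

def summarize(turns: list[str]) -> str:
    # Placeholder for an LLM-generated summary of the older turns.
    return f"[summary of {len(turns)} earlier turns]"

def compact(history: list[str]) -> list[str]:
    """Repeatedly compress the oldest half of the history into a single
    summary turn until the estimated token count fits the budget."""
    while sum(estimate_tokens(t) for t in history) > TOKEN_BUDGET and len(history) > 4:
        half = len(history) // 2
        history = [summarize(history[:half])] + history[half:]
    return history

# 50 long turns blow past the budget; compaction folds them down.
history = compact([f"turn {i}: " + "finding " * 200 for i in range(50)])
```

The same principle scales to tracking what was found, verified, and left unresolved: those ledgers survive compaction because they live in the summary, not in the raw turns.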

Your Constraints and Choices

The Research API receives a budget based on the research_effort tier you choose—lite, standard, deep, exhaustive, or frontier. Agent scaling is the primary mechanism that allows us to push the frontier across cost, accuracy, and latency. The system plans its approach around your budget and allocates effort where required to ensure all constraints are met. As an example, the system will spend more time verifying high-stakes, ambiguous claims versus clear, well-sourced facts.

For particularly long-horizon deep research tasks, Research will run more than 1,000 reasoning turns and expend up to 10 million tokens on a single query. This design is what makes a wide range of latency and accuracy tradeoffs possible.

Pushing the Frontier

To showcase our Research API’s capabilities, we benchmarked across a breadth of industry standard search and research benchmarks, including SimpleQA, FRAMES, BrowseComp, and DeepSearchQA. These benchmarks include everything from simple, single-hop questions to more complicated, multi-hop questions, highlighting the flexibility of our Research API.

SimpleQA (OpenAI)

Our deep and exhaustive research efforts achieve the highest accuracy on SimpleQA, with lower latency than other APIs in the same accuracy range.

SimpleQA is a 4,326-question benchmark of short, fact-seeking questions designed to test factual accuracy on single-hop lookups.

Scatter plot showing accuracy vs. latency for multiple AI models, highlighting You.com models with higher accuracy and lower response times.

FRAMES (Google) 

Our exhaustive research effort is in the highest accuracy range, with latency at 1/7th that of providers in the same accuracy range.

FRAMES is an 824-question benchmark testing factuality, retrieval accuracy, and multi-step reasoning across questions that require synthesizing information from multiple sources.

Scatterplot comparing AI models, showing accuracy versus log latency, with You.com models clustered in the high‑accuracy, low‑latency region.

BrowseComp (OpenAI) 

Our exhaustive research effort achieves the second-highest accuracy, with latency at 1/4th that of the offering with the highest accuracy.

BrowseComp is a 1,266-question benchmark evaluating whether a system can find specific, verifiable facts that require navigating and extracting information across multiple web pages.

Scatterplot of model accuracy vs latency; highest accuracy is pro at 46.53%, with you.com exhaustive shown as the top balanced performer.

DeepSearchQA (Google DeepMind) 

Our frontier research effort achieves the highest accuracy and F1 scores in the industry. 

To obtain F1 and latency scores across benchmarks (which are not often publicly reported), we ran various research APIs on the full DeepSearchQA set. We also report the top accuracies reported in the industry for comparison purposes.

DeepSearchQA is a 900-prompt dataset evaluating agents on difficult multi-step information-seeking tasks across 17 fields.

Bar chart titled DeepSearchQA comparing model accuracies, showing You.com Frontier highest at 83.67% and others decreasing down to 21.33%.
Scatterplot comparing DeepSearchQA models, showing F1 scores versus log latency, with several you.com variants achieving higher F1 at different speeds.

Research API Details

To provide developers with flexibility based on accuracy, cost, and latency requirements, Research offers a research_effort parameter, which controls how much research compute is utilized to generate a response. 

For the query, “Which global cities improved air quality the most over the past 10 years, and what measurable actions contributed?” we show the variation between the standard and exhaustive research efforts below:

Shell
curl --request POST \
  --url https://api.you.com/v1/research \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: API_KEY' \
  --data '{
    "input": "Which global cities improved air quality the most over the past 10 years, and what measurable actions contributed?",
    "research_effort": "standard"
  }'
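The same call can be made from Python with only the standard library. The endpoint, headers, and body fields mirror the curl example above; the helper name build_research_request and the placeholder API_KEY are ours:

```python
# Build the same POST request as the curl example, using only the stdlib.
import json
import urllib.request

def build_research_request(query: str, effort: str = "standard",
                           api_key: str = "API_KEY") -> urllib.request.Request:
    body = json.dumps({"input": query, "research_effort": effort}).encode("utf-8")
    return urllib.request.Request(
        url="https://api.you.com/v1/research",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )

req = build_research_request(
    "Which global cities improved air quality the most over the past 10 years, "
    "and what measurable actions contributed?"
)
# response = urllib.request.urlopen(req)  # uncomment to send; needs a real key
```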

Here are example responses from the Research API (which are abridged for display purposes):

research_effort = standard:
{
  "output": {
    "content": "Global assessments show that the largest PM2.5 declines since 2010 have been concentrated in East China, parts of the eastern United States, Europe, and Japan, with steeply rising gains in these megacities and cities with aggressive traffic-emission controls such as London [[2, 3, 9, 11]].\n\n1) **Beijing (China)** — PM2.5 fell from ~89–90 µg/m³ in 2013 to ~58 µg/m³ in 2017 (about 35–38% in five years), with evidence from both satellite and surface observations [[4, 5]].\n\n**Key drivers** included coal phase-down, industrial controls, stricter vehicle-fuel standards, and regional enforcement [[6, 7, 8]].\n\n2) **Chinese cluster of cities (BTH / YRD / PRD)** — China's population-weighted PM2.5 fell ~32% from 2013–2017, with the largest modeled decline in Beijing–Tianjin–Hebei (~33% across 367 cities), observed PM2.5 fell ~44% from 2013–2019 [[10]].\n\n3) **London (UK)** — London achieved major NO2 reductions linked to ULEZ/LEZ policies, with PM2.5 trends slower but studies show accelerated declines after ULEZ implementation and meaningful reductions versus no-ULEZ scenarios [[11, 12, 13, 14]].\n\n...",
    "content_type": "text",
    "sources": [
      {
        "url": "https://pubmed.ncbi.nlm.nih.gov/36356738/",        "title": "Trends in urban air pollution over the last two decades: A global perspective - PubMed",
        "snippets": [
          "At global scale, PM2.5 exposures declined slightly from 2000 to 2019...",
          "A 3–5-fold reduction in PM2.5 exposure was observed in the Eastern US, Europe, Southeast China, and Japan..."
        ]
      }
    ]
  }
}
research_effort = exhaustive:
{
  "output": {
    "content": "There is no single definitive "top 10" ranking, but several independent peer‑reviewed studies and case studies point to a small set of cities and regions where PM2.5 fell the fastest in the last decade. These places all share two features: (1) large, quantified reductions in PM2.5 or NO2 are documented, and (2) specific policies can be tied to these improvements.\n\n1) **Beijing (and other major Chinese cities)**\nBeijing's PM2.5 fell from roughly 89–102 µg/m³ in 2013 to about 32–39 µg/m³ by 2023, implying approximately 60–65% reduction over about a decade [[1, 2, 3]]. National analyses also show that China's clean‑air policies across hundreds of Chinese cities reduced PM2.5 pollutants, with an estimated 30% of China's air‑quality PM2.5 declines due to policy.\n\n**Key policies:** coal-to-gas transitions, multi-sector coordination, industrial restructuring, tighter emission standards, regional coordination.\n\nAtmospheric modeling indicates most of Beijing's 2013–2017 PM2.5 improvement was due to emissions reductions rather than weather variation [[2, 8]].\n\n2) **Seoul metropolitan area (Seoul, Incheon, Gyeonggi)**\nThe Seoul region has strong evidence of long-term emissions reductions tied to policy, including UNEP reports of large reductions in fine particulate emissions (about 75% reduction in Seoul's emitted PM2.5 mass) and large ambient declines over 2005–2020 [[14]]. Measures included vehicle emission controls, CNG buses, industrial restructuring.\n\n3) **London (ULEZ, LEZ, congestion charging)**\nLondon achieved major NO2 and PM reductions associated with ULEZ/LEZ policies. Key actions included vehicle emissions standards, cleaner buses, incentives, and industrial controls. The response aligns with available studies.\n\n...",
    "content_type": "text",
    "sources": [
      {
        "url": "https://sustainablemobility.iclei.org/air-pollution-beijing/",        "title": "Clearing the skies: how Beijing tackled air pollution & what lies ahead - ICLEI Sustainable Mobility",
        "snippets": [
          "China played a vital role, accounting for three-quarters of global air pollution reductions from 2013–2020...",
          "The annual average PM2.5 concentrations decreased..."
        ]
      }
    ]
  }
}

In both instances, the response includes an answer with inline citations, along with source attribution and full citations. The higher research_effort, exhaustive, spends more effort identifying additional cities and more granular data, and completes more thorough cross-referencing to ensure validity.

The structure of the response is simple and prioritizes ease of use for downstream workflows.
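A minimal sketch of consuming that shape in a downstream workflow, mapping inline markers like [[1]] back to their sources. The URL, title, and text here are stand-ins for illustration, not real responses:

```python
# Parse a Research API response of the shape shown above and build a
# citation-number -> URL map. The raw payload here is a stand-in example.
import json

raw = """
{
  "output": {
    "content": "Beijing's PM2.5 fell sharply over the decade [[1]].",
    "content_type": "text",
    "sources": [
      {"url": "https://example.org/study", "title": "Example study",
       "snippets": ["PM2.5 declined..."]}
    ]
  }
}
"""

data = json.loads(raw)
answer = data["output"]["content"]
# Inline markers are 1-indexed, so number the sources accordingly.
citations = {i + 1: s["url"] for i, s in enumerate(data["output"]["sources"])}

print(answer)
print(citations[1])  # https://example.org/study
```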

Pricing

research_effort        USD per 1000 requests   Latency
lite                   $6.50                   <2s
standard (default)     $50                     ~10–30s
deep                   $100                    <120s
exhaustive             $300                    <300s
frontier*              >$2000                  >1,000s

*Contact You.com for usage

Pricing is fixed per tier.
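As a quick sanity check on budgeting, per-request cost follows directly from the table above (frontier is omitted since its pricing is custom; the helper name is ours):

```python
# Back-of-the-envelope cost estimates from the per-1000-request tier pricing.
PRICE_PER_1000 = {"lite": 6.50, "standard": 50.0, "deep": 100.0, "exhaustive": 300.0}

def cost_usd(effort: str, requests: int) -> float:
    """Total cost in USD for a batch of requests at a given effort tier."""
    return PRICE_PER_1000[effort] / 1000 * requests

# e.g. 10,000 standard-effort requests
print(f"${cost_usd('standard', 10_000):,.2f}")  # $500.00
```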

Getting Started

1. Sign up and get your API key. No credit card required for testing.

2. Full documentation: docs.you.com/api-reference/research/v1-research

3. Full eval code coming soon. We want you to run it.
