Alibaba Releases Tongyi DeepResearch — the open-source agent built for deep, long-horizon web research
Release date: September 2025 (announced Sept 17, 2025)
Short summary: Alibaba’s Tongyi Lab open-sourced Tongyi DeepResearch, a 30B-parameter Mixture-of-Experts (MoE) web agent designed specifically for long-horizon, multi-step information seeking. It activates only ~3B parameters per token, claims state-of-the-art results on specialized agentic search benchmarks, and ships with the code, framework, and download links for researchers and developers. (VentureBeat)
What is Tongyi DeepResearch?
Tongyi DeepResearch is an agentic large language model (LLM) and web-agent framework from Alibaba’s Tongyi Lab aimed at “deep research” tasks: multi-step queries that require browsing, extracting facts from many sources, synthesizing structured answers, and iterating like a human researcher. Rather than a general chat assistant, it is explicitly engineered to run extended information-seeking policies (planner → search → extract → synthesize loops) and to scale computation efficiently using a Mixture-of-Experts (MoE) architecture. (Tongyi DeepResearch)
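To make that loop concrete, here is a minimal sketch of a planner → search → extract → synthesize cycle. All names here (ResearchAgent, Finding, the llm and search callables) are illustrative assumptions, not Tongyi DeepResearch’s actual API:

```python
# Minimal sketch of a planner -> search -> extract -> synthesize loop.
# ResearchAgent, Finding, llm, and search are illustrative names only;
# this is NOT Tongyi DeepResearch's actual interface.

from dataclasses import dataclass, field

@dataclass
class Finding:
    url: str
    snippet: str

@dataclass
class ResearchAgent:
    llm: callable              # llm(prompt: str) -> str
    search: callable           # search(query: str) -> list[Finding]
    max_steps: int = 8
    notes: list = field(default_factory=list)

    def run(self, question: str) -> str:
        for _ in range(self.max_steps):
            # Planner: decide the next query (or stop) given notes so far.
            plan = self.llm(
                f"Question: {question}\nNotes so far: {self.notes}\n"
                "Reply with the next search query, or DONE."
            )
            if plan.strip() == "DONE":
                break
            # Search + extract: fetch results and keep cited snippets.
            for f in self.search(plan):
                self.notes.append((f.url, f.snippet))
        # Synthesize: produce a final answer with inline citations.
        return self.llm(
            f"Question: {question}\nEvidence: {self.notes}\n"
            "Write a sourced answer, citing URLs."
        )
```

Real systems layer retries, deduplication, and citation checking onto this skeleton, but the planner/search/synthesize division is the core pattern.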
Key engineering highlights published and demonstrated by Tongyi Lab:
- 30 billion total parameters in the full model, but an MoE routing design so only ~3 billion parameters are activated per token, a design that reduces inference compute while preserving expressivity. (GitHub)
- Agentic framework and tooling (planner, web interface, extraction modules) open-sourced alongside model weights and example agents. The project isn’t just a checkpoint; it’s a complete stack for building web-research agents. (Tongyi DeepResearch)
- Benchmarks and claims: Tongyi DeepResearch reports SOTA results on agentic search and web reasoning benchmarks such as BrowseComp, WebWalkerQA, xbench-DeepSearch, and Humanity’s Last Exam. The lab published benchmark numbers and comparison tables in its blog and repository. (Hugging Face)
Why the architecture choice matters: MoE and “activated” parameters
Two related trends underpin this release:
- Mixture-of-Experts (MoE) architectures let a very large model contain many specialized sub-modules (“experts”) while routing only a subset to process any given token. That means models can be big in capacity but cheaper per token to run. Tongyi DeepResearch’s 30B total / 3B activated split follows the efficient-compute pattern used by other recent large models. This tradeoff is especially attractive for agentic tasks where token context can be long and diverse: you want broad capability without linear increases in cost. A toy routing layer after this list shows the mechanism. (GitHub)
- Specialization for long-horizon tasks. Deep research tasks require multi-step planning, remembering what was found earlier, following threads across pages, and maintaining citation fidelity. That is different from single-prompt chat completions. By co-designing the model with the framework (planners, retrievers, extractors), Tongyi DeepResearch aims to be both robust at reasoning across many sources and efficient when deployed. (Tongyi DeepResearch)
How it performs — benchmarks and comparisons
Tongyi Lab released benchmark results comparing its agent against other state-of-the-art web agents. On multiple agentic benchmarks (BrowseComp-EN and -ZH, WebWalkerQA, GAIA, xbench-DeepSearch, Humanity’s Last Exam), Tongyi DeepResearch reports top scores and sometimes substantial gains over existing public agents. Independent commentary and third-party coverage highlight that Tongyi’s approach narrows the gap with, and sometimes exceeds, comparable systems that rely on much larger dense models or proprietary toolchains. (Hugging Face)
Important context when reading benchmarks:
- Benchmarks for web agents differ from traditional LLM benchmarks (e.g., MMLU): they measure things like multi-step retrieval fidelity, citation accuracy, ability to follow tool instructions, and end-task synthesis quality.
- Results are often sensitive to evaluation setup (search APIs, allowed tool chains, prompt engineering), so while Tongyi’s numbers are impressive, cross-lab reproducibility and deeper audits matter. Multiple outlets note strong scores but also call for broader community validation. (VentureBeat)
Open-source release: what’s available and where to get it
Alibaba Tongyi Lab has made available:
- Model checkpoints (the MoE model weights) and smaller exported formats on GitHub, Hugging Face, and related model hubs.
- The agent framework, example planners, web-wrapping tools, and scripts to run the agent locally or on private infrastructure. (GitHub)
Practical notes for developers:
- The GitHub repository contains the research paper, training details, inference recipes, and code for the planner/extractor modules. This enables replication, fine-tuning, or integrating the agent into production pipelines. (GitHub)
- Hugging Face hosts convenient model cards and downloadable artifacts for researchers who prefer that flow. There are also community conversions (e.g., GGUF) to help run the model on diverse runtimes; a minimal loading sketch follows this list. (Hugging Face)
Use cases: where Tongyi DeepResearch can be applied
Tongyi DeepResearch targets tasks that benefit from agentic behavior and sustained information synthesis. Examples include:
- Legal and regulatory research: walk multiple sources, extract statutes and precedents, and synthesize a memo with inline citations. The planner-search cycle maps well to document-level legal workflows. (Tongyi DeepResearch)
- Scientific literature review: follow citations, fetch papers, summarize experimental methods, and create structured comparison tables. The model’s long-horizon design suits multi-document synthesis. (Tongyi DeepResearch)
- Competitive intelligence and business research: gather and reconcile facts from news, filings, and blogs across time, generating evidence-backed briefings. (Tongyi DeepResearch)
- Automated travel planning, product research, or multi-step assistance: any domain where the agent must chain dozens (or hundreds) of micro-searches into a coherent deliverable. (Apidog)
Because the project is open source, organizations can also adapt the agent to private data sources (enterprise search indexes, internal docs) and tune the planner/extractor behavior for compliance or domain specificity, as the sketch below illustrates. (Tongyi DeepResearch)
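One way to do that is to swap the agent’s search step for one backed by an internal index. The adapter below is a hypothetical example: the index name, field names, and Finding type are illustrative assumptions, and it pairs with the loop sketch shown earlier:

```python
# Hypothetical adapter pointing an agent's search step at an internal
# Elasticsearch index instead of the public web. Index and field names
# are illustrative assumptions.

from dataclasses import dataclass
from elasticsearch import Elasticsearch

@dataclass
class Finding:
    url: str
    snippet: str

def make_internal_search(es: Elasticsearch, index: str = "internal-docs"):
    """Return a search(query) callable a planner loop can use."""
    def search(query: str) -> list[Finding]:
        resp = es.search(index=index, query={"match": {"body": query}}, size=5)
        return [
            Finding(url=hit["_id"], snippet=hit["_source"]["body"][:500])
            for hit in resp["hits"]["hits"]
        ]
    return search

# e.g.: agent = ResearchAgent(llm=my_llm,
#                             search=make_internal_search(Elasticsearch("http://localhost:9200")))
```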
Business and industry significance
Alibaba’s open-sourcing of a high-performing agent matters for several industry reasons:
- Leveling the playing field: previously, top agentic systems were mostly available only via proprietary APIs (e.g., some offerings from US labs). An open agent that performs competitively gives researchers, startups, and enterprises an alternative they can run on premises or modify. That has implications for competition, innovation, and vendor lock-in. (VentureBeat)
- Cost-performance tradeoffs: the MoE design offers an attractive model for organizations that need powerful agentic behavior but want to manage inference costs. A workload that activates ~3B parameters per token is materially cheaper than invoking a dense 100B-parameter model on every token, especially at scale; the back-of-envelope calculation after this list makes the gap concrete. (GitHub)
- Cloud and chip ecosystems: Alibaba Cloud already sells Qwen-family models and enterprise AI offerings. This release strengthens Alibaba’s cloud AI stack and may accelerate enterprise adoption of its cloud GPUs and accelerators, a point picked up in financial and industry reporting. (Alibaba Cloud)
Safety, reliability and ethical considerations
Open-sourcing a capable agentic model has clear benefits — reproducibility, inspection, and innovation — but it also raises safety and misuse concerns. Tongyi Lab acknowledges these tradeoffs and includes various guardrails and evaluation artifacts, but the broader community must remain vigilant:
- Hallucination and citation fidelity: web agents can hallucinate plausible but false statements if extractors or retrievers return noisy sources. Evaluation should prioritize verifiable citations and provenance. (Tongyi DeepResearch)
- Manipulation and automation risk: a capable agent that can read, synthesize, and act on the web can be repurposed for content scraping at scale, automated disinformation pipelines, or abusive automation. Responsible deployment, rate limits, and use-case governance are essential. (VentureBeat)
- Data privacy and IP: integrating proprietary corpora or paywalled sources into such agents raises legal and contractual questions. Enterprises adopting Tongyi DeepResearch need to audit rights and compliance. (Tongyi DeepResearch)
Public release invites the community to audit, build mitigations, and stress-test the model, a net positive if paired with rigorous responsible-AI practices and external audits. Several commentary pieces call for independent benchmarks and red-teaming to validate claims and probe failure modes. (VentureBeat)
How researchers and developers can get started
- Read the paper and repo. Start with the Tongyi Lab blog post and GitHub repository, which describe the design, training recipes, and evaluation. These documents contain reproducibility details you’ll need before running or adapting the model. (Tongyi DeepResearch)
- Download checkpoints and examples. Hugging Face hosts model artifacts and community conversions if you want to try inference locally or in cloud instances. Look for recommended inference scripts and example planner configurations. (Hugging Face)
- Plan infrastructure. Because the model is MoE-based, your runtime needs to support MoE routing or accept converted dense variants. Evaluate memory, compute, and latency tradeoffs for your target workload. (GitHub)
- Audit and benchmark. Run the model on your own test suites (citation accuracy, hallucination rates, latency, cost) and consider red-teaming before exposing it to end users; a minimal citation-fidelity check follows this list. (VentureBeat)
Limitations and open questions
While the release is important, several limitations and open questions remain:
- Reproducibility across environments. MoE models can be sensitive to routing implementations; reproducing the exact published results may require precise toolchains and compute. The community will test this. (GitHub)
- Benchmarking nuance. Agentic benchmark evaluation is still an evolving science. Differences in search backends, allowed plugins, and evaluation metrics complicate head-to-head claims. Independent third-party evaluations will be informative. (VentureBeat)
- Operational complexity. Deploying and maintaining web agents in production is nontrivial: retriever freshness, rate limits on target websites, source reliability, and compliance are operational concerns enterprises must address. (Tongyi DeepResearch)
The bigger picture: what this means for the agent era
Tongyi DeepResearch is a vivid marker of how the agent era is evolving:
- We’re shifting from monolithic chat models to systems (models plus toolchains plus planners) designed to solve extended tasks. Alibaba’s release demonstrates that open-source teams can assemble these systems and publish competitive results. (Tongyi DeepResearch)
- Efficiency matters. MoE architectures and clever activation strategies are letting teams trade total capacity for per-token cost, democratizing access to agentic capabilities. (GitHub)
- Open releases accelerate research and scrutiny. An open DeepResearch-class agent invites security researchers, academics, and practitioners to validate claims, reproduce results, and propose fixes, a crucial step toward accountable agent development. (Futunn News)
Conclusion — a practical, open foundation for deep web research
Alibaba’s Tongyi DeepResearch is more than a model checkpoint: it’s an attempt to package an entire agentic research stack and hand it to the community. The combination of a 30B MoE model with low per-token activation, open tooling for planning and extraction, and competitive benchmark results makes it a noteworthy milestone in the agent landscape. For enterprises and researchers, it represents a reachable baseline for experimenting with long-horizon, multi-document workflows without being locked into proprietary APIs. For the wider public, it raises the usual mix of promise (democratized capability, reproducibility) and concern (misuse risk, hallucination, operational complexity) that accompanies any open release of powerful AI systems. (Tongyi DeepResearch, Hugging Face)