Alibaba trains AI search skills without live web queries

Alibaba’s research lab Tongyi has introduced ZeroSearch, a method for training large language models to use search without calling real search engines during training. The approach uses a second language model to simulate search results, giving researchers more control while lowering costs in the experiments described.

Alibaba’s research lab Tongyi has introduced ZeroSearch, a training method designed to teach large language models how to handle search tasks without depending on live web searches during training.

The idea is simple in outline but important in practice: instead of sending every training query to a real search engine, ZeroSearch uses another language model to generate short simulated search results. Those results can be useful or useless, and clear or hard to interpret, depending on how the researchers want the training process to behave.

Why search training matters

Chatbots can answer many questions from what they already know, but that built-in knowledge is not always enough. For questions that need up-to-date facts or several pieces of information, a model has to learn when to search, how to write a query, and how to use the information it receives.

Many current approaches train this skill with reinforcement learning, or RL, while relying on actual search engines such as Google. Alibaba’s team says that setup is expensive, hard to control, and difficult to scale.

ZeroSearch changes the training environment. The model still practices search behavior, but the search results come from a simulated source rather than an external search service. That gives the researchers control over both the availability and the difficulty of the information returned.

How ZeroSearch simulates the web

In the system described, Qwen-2.5 is the main language model being trained. During each round, it first decides whether more information is needed. If it decides to search, it creates a query and sends that query to the simulation model.

The simulation model then produces short documents in response. These documents can contain relevant information, or they can be intentionally irrelevant. The main model reviews the generated material, forms an answer, and receives feedback through RL.
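The loop described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not Alibaba's code: the names `Step`, `rollout`, `reward`, `toy_policy`, and `toy_simulator` are hypothetical, and the exact-match reward is an assumed stand-in for whatever feedback signal the team actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One decision by the model being trained: search again or answer."""
    query: Optional[str] = None   # set when the model wants to search
    answer: Optional[str] = None  # set when the model is ready to answer


def rollout(question: str,
            policy: Callable[[str, List[str]], Step],
            simulate_search: Callable[[str], List[str]],
            max_turns: int = 4) -> str:
    """Run one episode: search zero or more times, then answer."""
    context: List[str] = []
    for _ in range(max_turns):
        step = policy(question, context)
        if step.answer is not None:
            return step.answer
        # No live web call: documents come from the simulation model.
        context.extend(simulate_search(step.query))
    return ""  # ran out of turns without answering


def reward(prediction: str, gold: str) -> float:
    """Assumed toy reward: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if prediction.strip().lower() == gold.strip().lower() else 0.0


# Hypothetical stand-ins so the sketch runs end to end.
def toy_policy(question: str, context: List[str]) -> Step:
    if not context:
        return Step(query=question)  # nothing retrieved yet, so search
    return Step(answer=context[0].split(" is ")[-1].rstrip("."))


def toy_simulator(query: str) -> List[str]:
    return ["The capital of France is Paris."]


prediction = rollout("What is the capital of France?", toy_policy, toy_simulator)
score = reward(prediction, "Paris")
```

In a real setup, `policy` would be the language model being trained and `score` would feed the RL update. The point of the sketch is that `simulate_search` can be swapped freely without touching the rest of the loop.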

The process is structured so that the task does not stay equally easy from start to finish. Early in training, the simulated results are helpful. Over time, their quality is gradually degraded, a form of curriculum learning.

That gradual change is central to the method. It pushes the model to do more than copy obvious answers from clean results. It has to learn how to reason from information that may be incomplete, unclear, or conflicting, which is closer to the challenge of real search.
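One way to picture such a curriculum is a schedule that raises the chance of an unhelpful document as training progresses. The sketch below is an assumption for illustration only: the article does not give the schedule, so the function names, the exponential ramp, and the default values (`p_start`, `p_end`, `base`) are all hypothetical.

```python
import random


def noise_probability(step: int, total_steps: int,
                      p_start: float = 0.0, p_end: float = 0.75,
                      base: float = 4.0) -> float:
    """Chance that a simulated document is unhelpful at a given step.

    Assumed shape: slow at first, faster later. `base` must not equal 1.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    ramp = (base ** progress - 1.0) / (base - 1.0)  # 0 at start, 1 at end
    return p_start + ramp * (p_end - p_start)


def pick_document_quality(step: int, total_steps: int,
                          rng: random.Random) -> str:
    """Sample whether the next simulated document is useful or noisy."""
    p_noisy = noise_probability(step, total_steps)
    return "noisy" if rng.random() < p_noisy else "useful"
```

With these assumed defaults, every document at step 0 is useful, and by the final step about three in four are noisy. A linear ramp would work just as well for illustration; the key property is only that difficulty rises over time.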

The simulation model is also prepared in advance. It is fine-tuned to generate both “useful” and “useless” search results, with the distinction controlled through subtle prompt changes.
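To make the idea of a small prompt change concrete, here is a hypothetical prompt builder in which a single word flips the requested quality. The wording of the prompt, the function name `build_simulation_prompt`, and the `num_docs` parameter are invented for illustration and are not taken from Alibaba's release.

```python
def build_simulation_prompt(query: str, useful: bool, num_docs: int = 5) -> str:
    """Build a prompt for the simulation model; one word sets the quality."""
    quality = "useful" if useful else "noisy"
    return (
        f"You are a search engine. For the query below, write {num_docs} short "
        f"documents that are {quality} for answering it.\n"
        f"Query: {query}"
    )


helpful_prompt = build_simulation_prompt("Who voices Smokey the Bear?", useful=True)
noisy_prompt = build_simulation_prompt("Who voices Smokey the Bear?", useful=False)
```

Because the two prompts differ in only one word, the training loop can switch between easy and hard results without changing anything else about the request.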

Multi-step questions are the key test

A major goal of ZeroSearch is to help a model manage searches that require more than one step. Some questions cannot be answered by a single lookup because the first answer becomes the input for the next search.

The researchers' example asks: "Who is the spouse of the person who voices Smokey the Bear?" In that test, the simulated search first identifies Sam Elliott as the voice actor. The model then performs a second simulated search for Sam Elliott’s spouse and finds Katharine Ross.

The important behavior is not only retrieving those names. The model has to break the original question into sub-questions, keep track of the intermediate result, and combine the findings into one final answer.
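The chaining itself is easy to show in code. In this sketch a small dictionary stands in for the simulation model, and the two names come from the example above. Everything else, including `SIMULATED_RESULTS`, `answer_two_hop`, and the query phrasing, is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the simulation model: canned answers per query.
SIMULATED_RESULTS: Dict[str, str] = {
    "voice of Smokey the Bear": "Sam Elliott",
    "spouse of Sam Elliott": "Katharine Ross",
}


def answer_two_hop(first_query: str,
                   second_template: str) -> Tuple[str, List[str]]:
    """Chain two lookups, feeding the first result into the second query."""
    trace: List[str] = []
    intermediate = SIMULATED_RESULTS[first_query]
    trace.append(f"{first_query} -> {intermediate}")
    # The intermediate result becomes part of the next search query.
    second_query = second_template.format(entity=intermediate)
    final = SIMULATED_RESULTS[second_query]
    trace.append(f"{second_query} -> {final}")
    return final, trace


final_answer, trace = answer_two_hop("voice of Smokey the Bear",
                                     "spouse of {entity}")
```

Here the decomposition is hard-coded. What ZeroSearch aims to train is the model's ability to come up with those two sub-questions, and the decision to search a second time, on its own.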

That type of multi-step search is one of the clearest reasons to train search behavior directly. A model that can decide when to search again, rather than stopping after the first result, is better suited to questions that require linked pieces of information.

Lower costs and tighter control

The cost comparison in the experiments is one of the most concrete arguments for ZeroSearch. Running 64,000 Google searches through SerpAPI, a third-party search API, cost about $586 in API fees. Running the simulation model on four rented AWS A100 GPUs cost $71 in compute time, roughly one-eighth as much.
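The arithmetic behind that comparison is short enough to check directly. The figures below are the rounded ones quoted above; the variable names are just for this sketch, and the calculation ignores anything not mentioned in the comparison, such as the cost of fine-tuning the simulation model.

```python
SEARCHES = 64_000
API_COST_USD = 586.0   # SerpAPI fees, as quoted (approximate)
SIM_COST_USD = 71.0    # four rented A100 GPUs, as quoted

api_cost_per_search = API_COST_USD / SEARCHES
sim_cost_per_search = SIM_COST_USD / SEARCHES
cost_ratio = API_COST_USD / SIM_COST_USD
savings_fraction = 1.0 - SIM_COST_USD / API_COST_USD

print(f"API:        ${api_cost_per_search:.4f} per search")
print(f"Simulation: ${sim_cost_per_search:.4f} per search")
print(f"API is {cost_ratio:.1f}x more expensive; savings {savings_fraction:.0%}")
```

On these numbers, the simulated setup comes out a little over eight times cheaper, a saving of about 88 percent.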

Cost is only part of the benefit. A simulated search system is always available, returns responses in a consistent style, and can be made easier or harder as needed. According to the team, that makes training more predictable and robust.

This matters because real search engines are not designed primarily as training environments for language models. Their results can vary, they come with external costs, and researchers have less control over the exact type of difficulty the model sees.

With simulation, the training process becomes more like a controlled practice field. The researchers can decide whether the model sees helpful documents, noisy documents, or documents that force it to separate useful information from useless material.

Benchmark results and model size

Alibaba’s team evaluated ZeroSearch on seven well-known question-answering benchmarks, including Natural Questions, TriviaQA, and HotpotQA. The method matched or outperformed approaches trained with real Google searches, especially when the simulation model had 14 billion parameters.

Smaller models with 7 billion parameters also performed well. The results make clear, however, that size was not the only factor. Fine-tuning the simulation model for the task mattered a great deal, while models controlled only by prompts performed much worse.

That finding points to a practical lesson for search-assistant training. It is not enough to place a generic model behind a search-like interface and rely on prompts alone. The model generating simulated results has to learn the difference between useful and useless search output in a way that supports the larger training process.

Alibaba has released some of its models on Hugging Face, and more details and code are available on GitHub. For teams working on AI search assistants, ZeroSearch presents a notable alternative to live-search training: keep the search behavior, remove the dependency on real search engines during training, and use simulation to control difficulty at scale.