**Yiming Wang$^{2, \star, \spadesuit}$ Da Yin$^{1, \star, \spadesuit, \heartsuit}$ Yuedong Cui$^{1, \star}$ Ruichen Zheng$^{1, \star}$ Zhiqian Li$^1$**
**Zongyu Lin$^1$ Di Wu$^1$ Xueqing Wu$^1$ Chenchen Ye$^1$ Yu Zhou$^1$ Kai-Wei Chang$^1$**
$^1$UCLA $^2$Harvard University $^\star$Co-First Authors $^\spadesuit$Co-Lead$_{\textrm{Alphabetical Order}}$ $^\heartsuit$Equal Advising
Last Updated on Oct 16, 2025 | 📄: arXiv | GitHub | 🤗: Hugging Face
We welcome questions and feedback, and are open to discussion and collaboration! Please feel free to contact us at [email protected] and [email protected].
TL;DR
Can we train UI agents with only a small amount of experience in real environments, or even without any?
Figure 1: Performance highlights of UI-Simulator and UI-Simulator-Grow, which it empowers. In particular, UI-Simulator can outperform the same data collection process run on real-world environments, and UI-Simulator-Grow delivers a more rapid scaling trend than UI-Simulator.
We observe that most digital UI environments, including web, mobile, and desktop, can be represented as structured textual accessibility trees. Pre-training on front-end code and procedural knowledge makes LLMs well suited as backbone models for synthesizing plausible UI states and the state transitions triggered by user actions.
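As a concrete illustration, here is a minimal Python sketch of how a UI state might be serialized into the kind of indented textual accessibility tree a simulator could consume. The node roles, labels, and rendering format are illustrative assumptions, not the project's actual schema.

```python
# Minimal sketch: serialize a UI state as an indented textual accessibility tree.
# Node roles, names, and the exact text format are illustrative assumptions.

from dataclasses import dataclass, field
from typing import List


@dataclass
class A11yNode:
    role: str                                   # e.g. "button", "textbox", "link"
    name: str = ""                              # accessible name / visible label
    children: List["A11yNode"] = field(default_factory=list)

    def render(self, depth: int = 0) -> str:
        """Flatten the tree into the indented textual form an LLM would read."""
        line = ("  " * depth + f"[{self.role}] {self.name}").rstrip()
        return "\n".join([line] + [c.render(depth + 1) for c in self.children])


# Example: a tiny fragment of a shopping page rendered as an accessibility tree.
page = A11yNode("RootWebArea", "One Stop Market", [
    A11yNode("searchbox", "Search products"),
    A11yNode("button", "Search"),
    A11yNode("link", "My Cart (2 items)"),
])
print(page.render())
```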
Each transition follows a multi-step pipeline that guides the world simulator to anticipate the outcome of an action, infer a coherent and diverse next state, and render it into a structured format.
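Below is a rough sketch of what such a pipeline could look like in code, assuming a generic `llm(prompt) -> str` completion function; the prompt wording and step boundaries are our own illustration rather than UI-Simulator's actual prompts.

```python
# Rough sketch of the multi-step transition pipeline: anticipate the outcome,
# infer the next state, then render it into a structured accessibility tree.
# `llm` is an assumed generic text-completion callable, not a specific API.

from typing import Callable


def simulate_transition(llm: Callable[[str], str],
                        current_state: str,
                        action: str) -> str:
    """Return the next UI state as a textual accessibility tree."""
    # Step 1: anticipate the high-level outcome of the user action.
    outcome = llm(
        f"Current UI state:\n{current_state}\n\n"
        f"User action: {action}\n"
        "Briefly describe what should happen after this action."
    )

    # Step 2: infer a coherent (and diverse) next state in free-form text.
    next_state_draft = llm(
        f"Current UI state:\n{current_state}\n\n"
        f"Action: {action}\nExpected outcome: {outcome}\n"
        "Describe the full content of the resulting page."
    )

    # Step 3: render the draft into the structured accessibility-tree format.
    next_state = llm(
        "Rewrite the following page description as an indented "
        f"accessibility tree:\n{next_state_draft}"
    )
    return next_state
```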