Arena began as a UC Berkeley research project in 2023. It is now a fast-growing company built around one of the AI industry’s most-watched model leaderboards.
Just eight months after launching its commercial service, Arena has reached $100 million in annualized run-rate revenue. The milestone shows how demand for AI model evaluation has moved from a public research function into a large commercial market.
From public leaderboard to paid evaluation platform
Arena is best known for its crowdsourced AI model performance leaderboard. The leaderboard is generated from over 10 million user evaluations, making it a widely followed signal of how different AI systems perform across tasks.
The consumer experience is simple. A user enters a prompt, Arena sends it to two models, and the user then chooses which response did a better job. Those comparisons help shape the rankings that have made Arena a reference point for model performance.
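To make the step from pairwise votes to a ranking concrete, here is a minimal sketch using an Elo-style update. The article does not describe Arena's actual scoring method, so the Elo choice is an assumption. The model names, the starting rating of 1000, and the step size K = 32 are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict

K = 32          # assumed update step size (hypothetical, not from the article)
BASE = 1000.0   # assumed starting rating for every model (hypothetical)

def expected_score(r_a, r_b):
    """Probability that A is preferred over B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def rate(votes):
    """Turn (winner, loser) votes into ratings, processing votes in order."""
    ratings = defaultdict(lambda: BASE)
    for winner, loser in votes:
        e_w = expected_score(ratings[winner], ratings[loser])
        delta = K * (1.0 - e_w)   # a surprising win moves ratings further
        ratings[winner] += delta
        ratings[loser] -= delta   # zero-sum: total rating is conserved
    return dict(ratings)

# Each tuple is one user judgment: (preferred model, other model).
votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]
leaderboard = sorted(rate(votes).items(), key=lambda kv: kv[1], reverse=True)
for name, score in leaderboard:
    print(f"{name}: {score:.1f}")
```

With only three votes the scores are noisy. A scheme like this becomes informative only at the scale of millions of comparisons, which is why the volume of user evaluations matters.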
The public leaderboard remains free to use. Arena’s business, however, began in September, when the company introduced AI Evaluations.
AI Evaluations gives model labs and enterprises deeper performance analytics gathered from Arena’s community. That product turns the same evaluation flow that attracts public users into a commercial service for organizations trying to understand how models behave in practice.
Why the revenue number matters
Arena’s growth is notable because the company is still widely associated with its research roots. Its co-founder and CEO, Anastasios Angelopoulos, told TechCrunch: “A lot of people don’t even understand that our business is making any money at all; people still see us as an open source project.”
The company calls the milestone ARR, a term that traditionally stands for annual recurring revenue. Angelopoulos clarified that Arena charges customers for “consumption,” meaning the revenue is usage-based rather than recurring.
That distinction matters. The $100 million figure shows the annualized scale of customer spending, but it does not mean Arena has the same subscription-style revenue profile that many software companies imply when they use ARR.
Even with that caveat, the growth curve is sharp. When Arena announced in January that it had raised a $150 million Series A at a post-money valuation of $1.7 billion, its annualized revenue was $30 million. It has since more than tripled, reaching $100 million just months later.
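The arithmetic behind these run-rate figures can be sketched in a few lines. The article does not say what window Arena annualizes over, so the monthly basis here is an assumption, and the function names are hypothetical.

```python
def annualized_run_rate(period_revenue, periods_per_year=12):
    """Scale one period's revenue to a full year (assumes a monthly period)."""
    return period_revenue * periods_per_year

def growth_multiple(start, end):
    """How many times larger the end figure is than the start figure."""
    return end / start

january_run_rate = 30_000_000    # annualized revenue at the Series A, per the article
current_run_rate = 100_000_000   # annualized revenue now, per the article

multiple = growth_multiple(january_run_rate, current_run_rate)
implied_monthly = current_run_rate / 12   # monthly revenue implied by the run rate

print(f"growth multiple: {multiple:.2f}x")
print(f"implied monthly revenue: ${implied_monthly:,.0f}")
```

Because a run rate extrapolates from a short window, it says nothing about whether customers will keep spending at that pace. That is the caveat attached to consumption-based revenue.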
Post-training demand is lifting the market
Arena’s rise is tied to a broader push among AI providers to improve model performance after initial training. As those providers try to squeeze more out of their models, their appetite for post-training optimization services continues to surge.
Angelopoulos said Arena does not have direct competitors. Yupp, another crowdsourced AI model-picking startup, shut down in March. Still, Arena competes “for the same dollar” with human labeling startups such as Mercor, Surge, and Scale AI, which assist model makers in refining AI during post-training.
The wider market context is significant. Handshake’s gross annualized revenue from AI training has climbed from $550 million in January to nearly $1 billion, The Information reported in April. Mercor’s annualized revenue also topped $1 billion earlier this year, up from $500 million last September, according to The Information.
Those figures point to a common theme: companies building AI models are spending heavily on ways to measure, compare, refine, and improve performance. Arena’s advantage is that it sits at the intersection of public model discovery and private model analytics.
What Arena evaluates
Arena ranks models on a variety of tasks. Key categories include text, coding, vision, and image generation.
The company has also introduced Agent Mode for complex, long-running workflows. That addition reflects the broader move from evaluating isolated prompts toward testing how AI systems perform on more involved tasks.
For model labs and enterprises, that kind of evaluation can be valuable because raw model capability is only part of the question. Buyers and builders also need to understand where one model performs better than another, how performance changes by task, and how users judge output quality in side-by-side comparisons.
Arena’s public interface creates a steady stream of judgments from users, many of whom are drawn by early access to the latest, often unreleased, AI models. Its commercial product packages deeper insight from that activity for customers that need more than a public rank.
The company behind the leaderboard
Arena originated at UC Berkeley and was co-founded by Anastasios Angelopoulos, Wei-Lin Chiang, and Ion Stoica. Chiang, a fellow UC Berkeley postdoctoral student, serves as CTO. Stoica is a UC Berkeley professor and Databricks co-founder who advised the project before it incorporated as a company in April 2025.
The company has raised a total of $250 million from investors. Its backers include Felicis, Andreessen Horowitz, The House Fund, LDVP, Kleiner Perkins, Lightspeed Venture Partners, Laude Ventures, and UC Investments.
Arena’s story shows how infrastructure around AI evaluation is becoming valuable in its own right. The models may get the attention, but the systems that compare them, test them, and explain their strengths are becoming a business category of their own.