TechCrunch AI January 16, 2025 NEUTRAL

MiniMax pushes AI model race with 4 million-token context

MiniMax introduced three new AI models for text, multimodal understanding and speech generation. The standout is MiniMax-Text-01, which the company says has a 4 million-token context window and can analyze around 3 million words at once.

This is mostly a competitive model launch, with only mild concerns around synthetic media and expanding AI capability.

MiniMax has introduced a new set of AI models as Chinese companies continue to release systems that aim to compete with models from OpenAI and other U.S.-based AI firms.

The Alibaba- and Tencent-backed startup debuted MiniMax-Text-01, MiniMax-VL-01, and T2A-01-HD. Together, the releases cover text, image-and-text understanding, and speech generation, while also raising familiar questions about openness, licensing, synthetic media, and AI export restrictions.

What MiniMax Released

The three models have different roles. MiniMax-Text-01 is a text-only model. MiniMax-VL-01 can work with both images and text. T2A-01-HD is built for audio generation, specifically speech.

MiniMax is not a small entrant. The company has raised around $850 million in venture capital and is valued at more than $2.5 billion. It was founded in 2021 by former employees of SenseTime, one of China’s largest AI firms.

The company already has consumer and creator-facing products. Its apps include Talkie, an AI-powered role-playing platform along the lines of Character AI, and Hailuo, which hosts its text-to-video models.

For developers and AI users, the practical point is that MiniMax is trying to cover several major AI formats at once:

MiniMax-Text-01 for text generation and long-input analysis.
MiniMax-VL-01 for multimodal tasks involving images and text.
T2A-01-HD for generated speech and voice cloning.

The Text Model’s Biggest Claim

MiniMax says MiniMax-Text-01 has 456 billion parameters. Parameters roughly correspond to a model’s problem-solving ability, and models with more parameters generally perform better than models with fewer.

The company claims MiniMax-Text-01 performs better than models such as Google’s recently unveiled Gemini 2.0 Flash on benchmarks like MMLU and SimpleQA. MMLU poses multiple-choice questions across dozens of academic subjects, math among them, while SimpleQA tests short fact-based questions.

The most striking feature is the context window. A model’s context window is the input it can consider before generating output. In simple terms, it defines how much material the model can keep in view while responding.

MiniMax-Text-01 has a context window of 4 million tokens. According to MiniMax, that means it can analyze around 3 million words in one go, or just over five copies of 'War and Peace.' The source also notes that this context window is roughly 31 times the size of GPT-4o’s and Llama 3.1’s.
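
The figures in that comparison can be checked with a few lines of arithmetic. The sketch below is illustrative only: the 4 million-token and 3 million-word numbers come from MiniMax's claim, while the 128,000-token window for GPT-4o and Llama 3.1 and the roughly 587,000-word length of 'War and Peace' are assumptions not stated in the source.

```python
# Figures quoted in the article.
CONTEXT_TOKENS = 4_000_000  # MiniMax-Text-01 context window, per MiniMax
WORDS_CLAIMED = 3_000_000   # "around 3 million words"

# Assumptions, not stated in the article.
OTHER_CONTEXT_TOKENS = 128_000  # commonly cited window for GPT-4o and Llama 3.1
WAR_AND_PEACE_WORDS = 587_000   # approximate; varies by edition and translation

# Implied average number of words carried by one token.
words_per_token = WORDS_CLAIMED / CONTEXT_TOKENS

# How many times larger the MiniMax window is than the assumed 128K window.
window_ratio = CONTEXT_TOKENS / OTHER_CONTEXT_TOKENS

# How many copies of the novel fit in the claimed word budget.
book_copies = WORDS_CLAIMED / WAR_AND_PEACE_WORDS

print(f"words per token: {words_per_token:.2f}")
print(f"window ratio: {window_ratio:.2f}x")
print(f"copies of War and Peace: {book_copies:.2f}")
```

Under those assumptions the ratio comes out a little above 31 and the book count a little above five, which lines up with the figures quoted above.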

That matters because long-context models can handle much larger documents, codebases, transcripts, or collections of text without splitting them into smaller parts. The source does not say how MiniMax-Text-01 performs in real-world use, but the context length alone makes it a notable release.
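
To make the splitting point concrete, here is a rough sketch of how many pieces a body of text would need to be cut into for a given context window. The function names are hypothetical, and the 0.75 words-per-token ratio is an assumption derived from MiniMax's own 3 million words per 4 million tokens figure; real tokenizers vary by language and content.

```python
import math

# Assumption: 3 million words / 4 million tokens, per MiniMax's claim.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Estimate the token count for a text of the given word count."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def chunks_needed(word_count: int, context_tokens: int,
                  reserved_tokens: int = 0) -> int:
    """Return how many chunks are needed to fit the text in the window.

    reserved_tokens is room set aside for instructions and the reply.
    """
    usable = context_tokens - reserved_tokens
    if usable <= 0:
        raise ValueError("reserved_tokens must be smaller than context_tokens")
    return max(1, math.ceil(estimate_tokens(word_count) / usable))


# A hypothetical 1 million-word collection of transcripts.
corpus_words = 1_000_000
print(chunks_needed(corpus_words, 128_000))    # assumed 128K-token window
print(chunks_needed(corpus_words, 4_000_000))  # MiniMax-Text-01's claimed window
```

With these numbers, the hypothetical corpus would need 11 chunks at a 128,000-token window and a single pass at 4 million tokens. The estimate says nothing about answer quality at that length, which the source does not address.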

Vision, Charts and Speech

MiniMax-VL-01 is the company’s image-and-text model. MiniMax says it rivals Anthropic’s Claude 3.5 Sonnet on evaluations that require multimodal understanding.

One example is ChartQA, which asks models to answer questions about graphs and diagrams. The source gives a sample task: 'What is the peak value of the orange line in this graph?'

The comparison is mixed. MiniMax says MiniMax-VL-01 is competitive with Claude 3.5 Sonnet on some multimodal evaluations, but it does not quite beat Gemini 2.0 Flash on many of those tests. OpenAI’s GPT-4o and the open model InternVL2.5 also beat it on several.

T2A-01-HD takes MiniMax into speech generation. It can generate a synthetic voice with adjustable cadence, tone, and tenor in around 17 different languages, including English and Chinese. It can also clone a voice from just 10 seconds of an audio recording.

MiniMax did not publish benchmark results comparing T2A-01-HD with other audio-generating models. The source says that, to the reporter’s ear, its outputs sound on par with audio models from Meta and startups like PlayAI.

Open Availability With Limits

MiniMax is making some of the new models available beyond its own platforms, but the details matter. MiniMax-Text-01 and MiniMax-VL-01 can be downloaded from GitHub and the AI dev platform Hugging Face.

T2A-01-HD is different. It is available only through MiniMax’s API and Hailuo AI platform.

The text and vision models are described as openly available, but that does not mean they are fully open source. MiniMax has not released the components, such as training data, needed to re-create them from scratch.

The models are also under MiniMax’s restrictive license. That license prohibits developers from using the models to improve rival AI models. It also requires platforms with more than 100 million monthly active users to request a special license from MiniMax.

For teams evaluating AI models, this distinction is important. Download access can make a model easier to test, but licensing terms and missing training components can still limit how it can be used, studied, or built upon.

Controversy and Export Pressure

MiniMax’s product history has already drawn attention. Talkie was pulled from Apple’s App Store in December 2024 for unspecified 'technical' reasons. The app features AI avatars of public figures, including Donald Trump, Taylor Swift, Elon Musk, and LeBron James, none of whom appear to have consented to being featured in it.

MiniMax’s video tools have also faced scrutiny. In December, Broadcast magazine reported that MiniMax’s video generators can reproduce the logos of British television channels, suggesting that the models were trained on content from those channels. MiniMax is also reportedly being sued by iQiyi, a Chinese video streaming service that alleges MiniMax illicitly trained on iQiyi’s copyrighted recordings.

The timing of the release adds another layer. MiniMax’s new models arrived days after the outgoing Biden administration proposed harsher export rules and restrictions on AI technologies for Chinese ventures.

Companies in China were already prevented from buying advanced AI chips. If the new rules go into effect as written, companies would face stricter caps on both semiconductor technology and the models needed to bootstrap sophisticated AI systems.

On Wednesday, the Biden administration announced additional measures focused on keeping sophisticated chips out of China. Chip foundries and packaging companies that want to export certain chips will face broader license requirements unless they exercise greater scrutiny and due diligence to prevent their products from reaching Chinese clients.

MiniMax’s releases show how quickly AI development is moving even under pressure. The company is presenting competitive claims in text and multimodal AI, pushing a large context window, and offering speech tools with voice cloning, while operating inside a tightening technology environment.