A new AI speech model called Dia is trying to bring podcast-style voice generation closer to everyday developers and creators. Built by two Korea-based undergrads behind Nari Labs, the model is openly available and designed to turn written scripts into two-way audio conversations.
The pitch is simple: give users more control over synthetic voices than a typical automated speech tool, while keeping the model accessible enough to run outside a giant cloud platform. The result is a project that looks technically impressive, but also highlights the unresolved risks around voice cloning, impersonation, and training data.
What Nari Labs says Dia can do
Dia is a 1.6 billion parameter AI speech model. Parameters are the internal variables a model uses to make predictions, and models with more parameters generally perform better.
The model can generate dialogue from a script, which makes it useful for podcast-style clips and conversational audio. Users can customize speaker tones and add nonverbal or messy human details such as disfluencies, coughs, laughs, and other cues.
That emphasis matters because many AI voice systems are judged not only on clarity, but on whether they can create speech that feels conversational. A polished synthetic voice can still sound artificial if it lacks interruptions, pauses, laughter, or changes in delivery.
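To make the script-driven workflow concrete, here is a minimal sketch of how a two-speaker script with nonverbal cues might be assembled as plain text. It assumes a tagging convention of "[S1]" and "[S2]" for speakers and parenthesized cues such as "(laughs)", which is our understanding of the project's public README rather than something stated in this article. The helper name build_script is hypothetical, and the sketch only builds a string; it does not call Dia.

```python
from typing import List, Tuple

# Assumption: speakers are marked "[S1]" / "[S2]" and nonverbal cues are
# written in parentheses, e.g. "(laughs)". Check the project README before
# relying on this format.
def build_script(turns: List[Tuple[int, str]]) -> str:
    """Join (speaker_number, text) turns into one tagged script string."""
    parts = []
    for speaker, text in turns:
        if speaker not in (1, 2):
            raise ValueError("this sketch supports two speakers only")
        parts.append(f"[S{speaker}] {text.strip()}")
    return " ".join(parts)

script = build_script([
    (1, "Welcome back to the show."),
    (2, "Thanks for having me. (laughs)"),
    (1, "So, um, where should we start?"),
])
print(script)
```

Disfluencies such as "um" are simply written into the text of a turn, which is what gives the author control over how messy or polished the conversation sounds.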
Nari Labs co-founder Toby Kim said the work was inspired by Google’s NotebookLM. According to Kim, he and his fellow co-founder started learning about speech AI three months ago and wanted to build a model that gave users more control over generated voices and "freedom in the script."
An openly available model with modest hardware needs
Dia is available through Hugging Face and GitHub. That matters because it places the model in front of AI developers who are already used to downloading, testing, modifying, and building around open tools.
The model can run on most modern PCs with at least 10GB of VRAM. For a voice AI system that generates dialogue, that makes Dia more reachable than tools that require specialized hosted infrastructure for every experiment.
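A rough back-of-the-envelope calculation shows why a 1.6 billion parameter model is plausible on a consumer GPU. The sketch below estimates the memory needed for the weights alone at two common numeric precisions. Which precision Dia actually uses is an assumption here, since the article does not say, and the estimate leaves out activations, the audio decoding stage, and framework overhead, all of which add to the total.

```python
PARAMS = 1.6e9  # parameter count stated for Dia

# Bytes needed to store one parameter at each precision (assumed options).
BYTES_PER_PARAM = {"float32": 4, "float16": 2}

def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

for dtype, size in BYTES_PER_PARAM.items():
    print(f"{dtype}: {weight_memory_gb(PARAMS, size):.1f} GB")
```

Under these assumptions the weights take roughly 3.2 GB to 6.4 GB, so the stated 10GB figure leaves headroom for everything else the model needs at inference time.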
Nari used Google’s TPU Research Cloud program to train Dia. The program gives researchers free access to Google’s TPUs, the company’s chips for AI workloads.
Dia generates a random voice unless the user prompts it with a description of the intended style. It can also clone a person’s voice, a feature that makes synthetic speech more flexible but also far easier to misuse.
Why the model is getting attention
The market around synthetic speech is already crowded. ElevenLabs is one of the largest players, while other challengers include PlayAI and Sesame.
Investor interest is also strong. According to PitchBook, startups developing voice AI tech raised over $398 million in VC funding last year.
Dia’s appeal comes from a mix of access, capability, and timing. It is openly available, it can run on consumer hardware with at least 10GB of VRAM, and it targets a familiar use case: generating podcast-like conversations from written material.
In TechCrunch’s brief testing through Nari’s web demo, Dia generated two-way chats about any subject without resistance. The report described the voice quality as competitive with other tools and said the voice cloning function was among the easiest the reporter had tried.
For developers, that combination could make Dia useful as a base model for experiments in narration, scripted dialogue, audio interfaces, and voice-enabled products. For listeners, it points toward a future in which generated conversations become easier to produce and harder to distinguish from recordings made by people.
The same features create obvious risks
Dia also appears to offer little in the way of safeguards. That is a serious issue for any AI speech model, especially one that can clone voices and produce convincing dialogue.
The TechCrunch report notes that it would be trivially easy to create disinformation or a scammy recording with the model. Nari’s project pages discourage using Dia to impersonate, deceive, or conduct illicit campaigns, but the group says it "isn’t responsible" for misuse.
That puts Dia in the middle of a broader tension in AI voice technology. Open access can help researchers, builders, and hobbyists move faster. But the same openness can lower the barrier for people who want to create misleading audio.
The problem is especially sharp with voice cloning. A tool that can reproduce a person’s voice may be useful for authorized creative work, accessibility, or prototyping. But without meaningful safeguards, it can also support impersonation and fraud.
Training data remains unanswered
Nari has not disclosed which data it scraped to train Dia. That leaves an important gap in understanding how the model was built and what material may have shaped its output.
The report also notes that Dia may have been developed using copyrighted content. A commenter on Hacker News observed that one sample sounded like the hosts of NPR’s "Planet Money" podcast.
Training AI models on copyrighted content is common, but legally disputed. Some AI companies argue that fair use protects model training. Rights holders argue that fair use does not apply in that context.
For Dia, the unresolved data question matters because voice models are judged partly by how closely they can reproduce the sound and structure of real speech. If training material includes recognizable voices, formats, or shows, the line between learning general speech patterns and imitating specific works can become difficult to draw.
What comes next for Dia
Kim said Nari’s plan is to build a synthetic voice platform with a "social aspect" on top of Dia and future, larger models. The group also intends to release a technical report for Dia.
Nari plans to expand Dia’s language support beyond English. That would broaden the model’s potential user base and make the safety and data questions even more important.
For now, Dia is a clear example of how quickly AI speech tools are moving. A small team with limited prior speech AI experience has released a model that can generate convincing, customizable dialogue and run on accessible hardware. The achievement is notable, but so are the unanswered questions that come with it.