Meta's Fundamental AI Research (FAIR) team has introduced Omnilingual ASR, an automatic speech recognition system designed to transcribe spoken language across more than 1,600 languages. The project targets a long-running gap in speech AI: many languages have little or no support because they lack large collections of transcribed audio.
The release matters because automatic speech recognition is often strongest where data is already abundant. Meta's new system is aimed at a wider language map, including languages that have historically been left outside mainstream AI tools.
Why language coverage is the central issue
Most speech recognition tools have concentrated on a few hundred well-resourced languages. Those are the languages with enough transcribed speech to train and test systems at scale.
That leaves a large gap. More than 7,000 languages are spoken worldwide, and thousands of them receive little or no AI support. Omnilingual ASR is built around that problem: not just improving transcription for languages already covered, but expanding access to languages that have not had meaningful support.
Meta says 500 of the 1,600 supported languages have never been covered by any AI system before. That is one of the strongest signals of what FAIR is trying to accomplish with the release. The goal is not only a larger model family, but a step toward a "universal transcription system" that could help break down global language barriers.
How well Omnilingual ASR performs
The system's results depend heavily on the amount of training data available for each language. That is an important limitation, because the central challenge for underrepresented languages is often the lack of usable audio and transcripts.
According to Meta, Omnilingual ASR achieves a character error rate (CER) below 10 percent for 78 percent of the 1,600 languages tested. The results improve sharply when more training audio is available: among languages with at least ten hours of training audio, 95 percent reach that mark or better.
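Character error rate is commonly defined as the character-level edit distance (substitutions, deletions and insertions) between a system's transcript and a reference transcript, divided by the number of characters in the reference. The Python sketch below illustrates that standard definition only; it is not Meta's evaluation code, and any text normalization Meta applies before scoring (casing, punctuation, Unicode handling) is not described here and is left out.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance between a reference and a hypothesis,
    computed with the classic dynamic-programming recurrence."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,          # deletion: reference char missing from hypothesis
                curr[j - 1] + 1,      # insertion: extra char in hypothesis
                prev[j - 1] + cost,   # substitution (or match when cost == 0)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER in percent: edit distance divided by reference length, times 100."""
    if not reference:
        raise ValueError("reference transcript must be non-empty")
    return 100.0 * levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One wrong character out of 11 reference characters.
    cer = character_error_rate("hello world", "hello wurld")
    print(round(cer, 2))  # 9.09
    print(cer < 10)       # True: under the 10 percent threshold
```

In this example a single substituted character in an 11-character reference gives a CER of about 9.1 percent, just under the 10 percent threshold used in Meta's reported figures.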
Results are more mixed for languages with very limited data, but still notable. Among "low-resource" languages with less than ten hours of audio, 36 percent fall below the 10 percent character error rate threshold.
Those figures show the practical tradeoff. Omnilingual ASR can cover a much broader set of languages than typical speech recognition systems, but recognition quality is still tied to the data available. More audio and transcripts generally mean stronger transcription performance.
A corpus for underrepresented languages
Alongside the model release, Meta has also released the Omnilingual ASR Corpus. The corpus is a large dataset of transcribed speech in 350 underrepresented languages.
The dataset is available under a Creative Commons (CC-BY) license. Its purpose is to support further research and real-world use, especially for developers and researchers who want to build or adapt speech recognition models for specific local needs.
This is a significant part of the release because models alone do not solve the entire problem. Speech recognition systems need examples of spoken language paired with text. By making a corpus available for underrepresented languages, Meta is giving researchers and developers material they can use as a foundation for more targeted work.
Bring Your Own Language and in-context learning
One of the most important features in Omnilingual ASR is called "Bring Your Own Language." It uses in-context learning, adapting a technique from large language models to speech recognition.
The idea is direct: users provide a few paired audio and text samples for a new language. The system then learns from those examples without requiring retraining or heavy computing resources.
Meta says this approach could, in theory, extend Omnilingual ASR to more than 5,400 languages, which would push the system far beyond current industry standards.
The quality for minimally supported languages does not yet match fully trained systems. Still, the feature changes what is possible for communities that previously had no practical speech recognition access. Instead of waiting for a large training dataset, they may be able to start with a smaller set of examples.
Open-source models and deployment choices
Meta is releasing Omnilingual ASR as open source under the Apache 2.0 license. Researchers and developers can use, modify, and build on the models, including for commercial use. The datasets are available under a CC-BY license.
The Omnilingual ASR family includes multiple model sizes. The range starts with a lightweight 300 million parameter version for low-power devices and extends to a 7 billion parameter version for "top-tier accuracy." All models are built on FAIR's PyTorch-based fairseq2 framework, and a demo is available.
That range gives developers different options depending on the setting. A smaller model can be relevant where power or device constraints matter. A larger model is positioned for the strongest accuracy in the family.
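A rough way to see why the 300 million parameter model suits low-power devices is to estimate how much memory the weights alone occupy. The Python sketch below assumes common numeric precisions (4 bytes per parameter for fp32, 2 for fp16/bf16, 1 for int8); which precisions Meta actually ships is not stated here, and the estimate ignores activations, audio buffers and runtime overhead.

```python
# Assumed storage cost per parameter for common precisions (weights only).
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1}

# The two ends of the Omnilingual ASR size range mentioned above.
MODEL_SIZES = {"300M": 300_000_000, "7B": 7_000_000_000}


def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for model, params in MODEL_SIZES.items():
        estimates = ", ".join(
            f"{precision}: {weight_memory_gb(params, nbytes):.1f} GB"
            for precision, nbytes in BYTES_PER_PARAM.items()
        )
        print(f"{model} -> {estimates}")
```

Under these assumptions the smallest model needs roughly 0.6 GB at 16-bit precision, while the 7 billion parameter model needs about 14 GB, which is why the larger model is better suited to servers or high-end GPUs than to phones or embedded hardware.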
The broader direction is clear: Meta is trying to make speech recognition less dependent on the small group of languages that already have abundant training data. Omnilingual ASR does not remove the importance of data, but it expands the number of languages that can be reached today and offers a path for adding more through examples.