Google Research has released code for Health Acoustic Representations, known as HeAR, giving researchers access to an AI system built to analyze coughs and breathing sounds for health-related signals. The work points to a future where short audio recordings could help support screening, especially when access to conventional testing is limited.
HeAR is not being presented as a ready diagnostic product. The researchers stress that it is still a research tool, and any medical use would require clinical validation. Still, its benchmark results show why audio-based health AI is attracting attention.
How HeAR learns from sound
HeAR was developed by Google Research to turn short acoustic recordings into useful representations for health analysis. It was trained on over 300 million short audio clips from non-copyrighted YouTube videos, using self-supervised learning.
The system is based on the Transformer architecture. During training, parts of audio spectrograms were hidden, and the neural network learned to reconstruct the missing sections. That process helped HeAR create compact representations of audio data that can contain relevant health information.
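The masking step described above can be illustrated with a minimal sketch. It covers only the input preparation (hiding tiles of a spectrogram), not the Transformer or the reconstruction loss. The function name `mask_patches`, the tile size, and the 75% mask ratio are illustrative assumptions, not HeAR's actual settings.

```python
import random


def mask_patches(spectrogram, patch=4, mask_ratio=0.75, seed=0):
    """Hide a random subset of patch x patch tiles in a 2D spectrogram.

    Returns (masked_copy, hidden_tiles); hidden cells are set to 0.0.
    patch and mask_ratio are illustrative values, not HeAR's settings.
    """
    rows, cols = len(spectrogram), len(spectrogram[0])
    # Top-left corner of every tile in the time-frequency grid.
    tiles = [(r, c) for r in range(0, rows, patch) for c in range(0, cols, patch)]
    rng = random.Random(seed)
    hidden = rng.sample(tiles, int(len(tiles) * mask_ratio))

    masked = [row[:] for row in spectrogram]  # leave the input untouched
    for r0, c0 in hidden:
        for r in range(r0, min(r0 + patch, rows)):
            for c in range(c0, min(c0 + patch, cols)):
                masked[r][c] = 0.0
    return masked, hidden


# Toy 8 x 16 "spectrogram" filled with ones.
spec = [[1.0] * 16 for _ in range(8)]
masked, hidden = mask_patches(spec)
```

In a setup like this, a model would be trained to predict the original values of the hidden tiles from the visible ones, which is what pushes it to learn the structure of the sound.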
This matters because coughs and breathing patterns can carry information that is difficult to evaluate at scale with human listening alone. A model that can learn broadly from sound may then be tested on specific health-related tasks, such as cough classification or lung function estimation.
Google published its findings in March 2024 and has now released the code for other researchers to use. Researchers can also request the trained HeAR model and an anonymized version of the CIDRZ dataset, which contains cough audio data, from Google.
Where the model performed best
Researchers evaluated HeAR on 33 tasks from 6 datasets. Those tasks included recognizing health-related sounds, classifying cough recordings, and estimating lung function values.
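Across most benchmarks, HeAR outperformed existing audio AI models. One of the most notable results involved tuberculosis detection from cough sounds. HeAR achieved an AUROC (area under the receiver operating characteristic curve) of 0.739, on a scale where 0.5 corresponds to chance and 1.0 to perfect separation of positive and negative cases. The second-best model, TRILL, reached 0.652.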
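For readers unfamiliar with AUROC, the following sketch shows one way to compute it: as the fraction of positive/negative pairs in which the positive case receives the higher score. The labels and scores are made-up toy values, not data from the HeAR evaluation, and the function name `auroc` is chosen here for illustration.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 1 = condition present, 0 = absent (hypothetical values).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.4]
print(round(auroc(labels, scores), 3))  # 8 of 9 pairs ranked correctly
```

Because the metric depends only on how cases are ranked, it says nothing about which score threshold a screening program would actually use.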
The authors see potential in using AI cough analysis to identify people in resource-poor areas who need further testing. That is an important distinction: the model is framed as a possible screening aid, not as a replacement for medical diagnosis.
HeAR also showed promise in estimating lung function parameters from smartphone recordings. These included FEV1, the forced expiratory volume in one second, and FVC, the forced vital capacity.
For FEV1, HeAR had an average error of 0.418 liters, compared with 0.479 liters for the best comparison method. Results like this could lead to new, accessible screening tools for lung diseases such as COPD.
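To put the two FEV1 figures in proportion, the sketch below computes the relative reduction in error. It assumes that "average error" means mean absolute error in liters, which the article does not state explicitly; the helper names are illustrative.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired measurements."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_reduction(baseline_error, new_error):
    """Fraction by which new_error is lower than baseline_error."""
    return (baseline_error - new_error) / baseline_error


# Figures reported for FEV1: best comparison method vs. HeAR, in liters.
fev1_reduction = relative_reduction(0.479, 0.418)
print(f"{fev1_reduction:.1%}")  # about 12.7%
```

An error of roughly 0.4 liters is still substantial relative to typical adult FEV1 values of a few liters, which is one reason the result is framed as promising rather than clinically ready.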
Why cough and breathing analysis could matter
The main appeal of an approach like HeAR is accessibility. Coughs and breathing sounds can be captured with devices people already use, including smartphones. If a model can extract useful signals from those recordings, it could help expand screening beyond settings with specialized equipment.
The tuberculosis result is especially relevant because the authors highlight potential use in resource-poor areas. In that context, audio analysis could help flag people who should receive further testing. The value would come from prioritization and reach, not from replacing clinical evaluation.
The lung function work follows the same logic. Estimating FEV1 and FVC from smartphone recordings could make early screening more accessible. For diseases such as COPD, easier screening tools could help bring attention to people who might otherwise go untested.
There are several practical implications for researchers:
- HeAR can be tested against health-related audio tasks beyond its initial benchmarks.
- The released code makes it easier to reproduce, inspect, and extend the approach.
- The trained model and anonymized CIDRZ dataset can support further work on cough audio data.
- Benchmark gains must still be separated from clinical readiness.
The limits are still significant
Despite the strong benchmark results, the researchers stress that HeAR is still a research tool. Any diagnostic applications would require clinical validation. That requirement is central because performance on datasets does not automatically prove that a system is suitable for real-world medical decisions.
The system also has technical limitations. One current limitation is that HeAR can process only two-second audio clips. That constraint matters because real cough and breathing recordings may vary in length, quality, and recording conditions.
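One common workaround for a fixed input length is to cut a longer recording into windows of that length. The sketch below splits a list of samples into non-overlapping two-second clips and zero-pads the last one. The 16 kHz sample rate, the padding strategy, and the function name `split_into_clips` are assumptions for illustration; the article does not say how longer recordings should be prepared for HeAR.

```python
def split_into_clips(samples, sample_rate=16000, clip_seconds=2.0):
    """Cut a recording into fixed-length clips, zero-padding the final one.

    sample_rate is an assumed value; clip_seconds matches the two-second
    limit mentioned in the text.
    """
    clip_len = int(sample_rate * clip_seconds)
    if clip_len <= 0:
        raise ValueError("clip length must be at least one sample")
    clips = []
    for start in range(0, len(samples), clip_len):
        clip = list(samples[start:start + clip_len])
        clip += [0.0] * (clip_len - len(clip))  # pad a short final clip
        clips.append(clip)
    return clips


# A 5-second recording at the assumed 16 kHz rate yields three clips.
recording = [0.0] * (5 * 16000)
clips = split_into_clips(recording)
```

Each clip could then be passed to the model separately, with the per-clip outputs averaged or otherwise combined; how best to combine them is an open design choice.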
Google plans to use techniques such as model distillation and quantization to make HeAR more efficient for direct use on mobile devices. Those techniques are presented as a path toward more practical deployment, especially where on-device processing could be useful.
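As a rough illustration of what quantization means, the sketch below maps floating-point weights to 8-bit integers with a single shared scale factor. This is a generic symmetric int8 scheme chosen for simplicity; it is not Google's method, and the article gives no details on how HeAR would be quantized. The function names and example weights are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical weights, not taken from any real model.
weights = [0.5, -1.27, 0.0, 1.27]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
```

Storing one byte per weight instead of four shrinks the model and can speed up inference on phones, at the cost of a small rounding error in each weight.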
The Stop TB Partnership, a UN-hosted organization that aims to end tuberculosis by 2030, supports this approach. That support aligns with the broader idea that audio-based AI could help identify people who need further testing in places where screening resources are limited.
What comes next for HeAR
The release of HeAR code shifts the project from a published research result into something other researchers can work with directly. That can help test whether its benchmark performance holds across more settings, recordings, and health-related tasks.
For now, the key point is balance. HeAR shows that AI systems trained on large-scale audio can learn representations useful for health assessment. It also shows that promising screening ideas still need careful validation before they are used in diagnosis.
If future research supports the early results, cough and breathing audio could become a more important input for accessible health screening. At present, HeAR’s role is clear: it is a research system with promising results, released code, and important limits still to resolve.