Designing better metal alloys is often slowed by a basic problem: companies cannot know how a material will perform until they make it and test it. MIT researchers say they have found a more accurate way to model metals whose internal chemistry is complex, using machine-learning training datasets built to reflect a wider range of atomic environments.
The work focuses on metallic alloys, but senior author Rodrigo Freitas says the approach could also be adapted to other materials, including semiconductors. The goal is not a single product or one narrow use case. It is a modeling method that could help materials researchers make better predictions before expensive experimentation begins.
Why metal alloys are hard to simulate
Material behavior depends heavily on how chemical elements are arranged inside a solid. Two materials can contain the same elements, yet behave very differently if those elements are organized in different ways. One arrangement may produce brittleness, while another may allow deformation without breaking.
To capture those differences, researchers need simulations that describe materials atom by atom. Those simulations depend on models of how atoms interact with each other. Over the last two decades, machine learning has become the most accurate way to build those models.
The challenge is that many machine-learning models work best when a material has a highly ordered internal pattern. Most solid materials do not fit that neat structure. Their chemical arrangements are disordered, and they can vary from one region to another.
“The real challenge in our field is modeling these chemically disordered phases,” Freitas says. “Chemical disorder means there’s a huge variety of local chemical environments, which is hard for the machine-learning model to learn. This is a problem because every single metal we use in practice is chemically disordered.”
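To make the idea of "variety of local chemical environments" concrete, here is a minimal Python sketch. It is a toy illustration, not the researchers' code: the one-dimensional periodic chain, the two species labels, the window radius, and the function name local_environments are all assumptions made for this example. It counts how many distinct neighborhoods appear in an ordered arrangement and in a shuffled arrangement of the same atoms.

```python
import random
from collections import Counter


def local_environments(chain, radius=1):
    """Count each neighborhood (a window of 2*radius + 1 sites) on a periodic chain."""
    n = len(chain)
    return Counter(
        tuple(chain[(i + d) % n] for d in range(-radius, radius + 1))
        for i in range(n)
    )


# Hypothetical toy systems: the same 50 A atoms and 50 B atoms, arranged two ways.
ordered = ["A", "B"] * 50          # perfectly alternating pattern
disordered = ordered.copy()
random.Random(0).shuffle(disordered)  # same composition, scrambled positions

# The alternating chain has exactly 2 distinct environments at any radius.
print("ordered:   ", len(local_environments(ordered, radius=2)))
# The shuffled chain shows many more (at most 2**5 = 32 for this window size).
print("disordered:", len(local_environments(disordered, radius=2)))
```

In a real alloy the neighborhoods are three-dimensional and involve more elements, so the number of possible environments grows far faster than in this toy chain.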
The training data problem
The MIT team’s central issue was not simply model size. It was whether the model had seen enough useful examples of the many local chemical environments inside disordered materials.
Current leading approaches can rely on brute force to build training data. According to the source article, that can require more than 100,000 hours of computation for a single material. Even after that effort, the resulting data may not transfer well when researchers change the material’s composition.
Freitas’ group had previously developed a way to measure the chemical complexity of solid materials by studying the frequency and spacing of small groups of atoms. For the new study, the researchers used that capability to create stronger training datasets for machine-learning models.
Their method uses information theory to generate datasets that include a broader mix of local chemical environments. The approach swaps atoms in samples to reduce repetition, replacing redundant examples with chemical environments the model might otherwise miss.
“We kept optimizing the training set so it captured as many different local environments as possible,” Freitas says. “If the same kind of environment showed up many times, we replaced redundant examples with ones the model hadn’t seen before. That makes the training set much more informative because each example adds something new.”
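The swap-and-replace idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the team's published algorithm: it uses a one-dimensional periodic chain, treats each three-site window as a "local environment," scores diversity with Shannon entropy, and accepts a swap only if the score rises. The function names and parameters (environment_counts, shannon_entropy, diversify, n_steps) are hypothetical.

```python
import math
import random
from collections import Counter


def environment_counts(chain, radius=1):
    """Count each neighborhood window on a periodic chain."""
    n = len(chain)
    return Counter(
        tuple(chain[(i + d) % n] for d in range(-radius, radius + 1))
        for i in range(n)
    )


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the environment frequency distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def diversify(chain, n_steps=2000, radius=1, seed=0):
    """Greedy search: swap two unlike atoms, keep the swap only if entropy increases."""
    rng = random.Random(seed)
    chain = list(chain)
    best = shannon_entropy(environment_counts(chain, radius))
    for _ in range(n_steps):
        i, j = rng.sample(range(len(chain)), 2)
        if chain[i] == chain[j]:
            continue  # swapping identical atoms changes nothing
        chain[i], chain[j] = chain[j], chain[i]
        trial = shannon_entropy(environment_counts(chain, radius))
        if trial > best:
            best = trial  # the swap added new or rarer environments
        else:
            chain[i], chain[j] = chain[j], chain[i]  # revert a redundant swap
    return chain, best


# Hypothetical starting sample: fully segregated, so most environments repeat.
start = ["A"] * 20 + ["B"] * 20
start_entropy = shannon_entropy(environment_counts(start))
optimized, final_entropy = diversify(start)
print(f"entropy before: {start_entropy:.3f} bits, after: {final_entropy:.3f} bits")
```

Because atoms are only swapped, never added or removed, the composition of the sample stays fixed while the mix of environments becomes more even. That is the sense in which each example "adds something new."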
What the MIT team showed
In a new paper in Science Advances, the researchers reported that their approach could predict material properties for a diverse group of metal alloys under a range of conditions. Models trained on their datasets performed more accurately than models trained through random sampling or another popular sampling method.
The team also compared its approach with much larger models created by companies like Google and Microsoft. Using a set of machine-learning models, the researchers found that models trained on the MIT datasets were more accurate for the alloy cases they studied.
That result matters because accurate atom-by-atom simulation depends on whether a model can describe chemical bonding in a realistic way. Freitas frames the issue plainly: if a model cannot represent those interactions well, it may still offer broad insight, but it cannot reliably say what will happen to specific materials in real use.
The research team included first author Killian Sheriff PhD ’26; MIT PhD students Daniel Xiao and Yifan Cao; University of Sheffield Senior Lecturer Lewis R. Owen; and senior author Rodrigo Freitas, MIT’s TDK Career Development Professor in Materials Science and Engineering.
From alloy phases to industrial decisions
The method works partly because it captures hidden patterns in the sample data. The researchers describe those patterns as “subtle energetic biases toward certain local chemical configurations.”
Those small energetic differences can shape which phases form in an alloy, how phases change with temperature and composition, and what properties the finished material will have. That makes the work relevant to phase diagrams, which map stable phases across different temperatures and chemical compositions.
Daniel Xiao led simulations showing that the team’s models could predict phase diagrams that closely matched experimental data. The team also used Owen’s experimental data to compare simulations with real measurements of atomic ordering in alloys.
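One standard way to put a number on atomic ordering is the Warren-Cowley short-range order parameter, which compares how often a B atom sits next to an A atom with what a random mixture would give. The article does not say which measure the team or Owen's experiments used, so the Python sketch below is only an illustration of the concept, on an assumed one-dimensional periodic chain with nearest neighbors; the function name warren_cowley is hypothetical.

```python
def warren_cowley(chain, a="A", b="B"):
    """Nearest-neighbor Warren-Cowley parameter: 1 - P(B next to A) / c_B.

    0 means random mixing, negative values mean A prefers B neighbors
    (ordering), and positive values mean like atoms cluster together.
    """
    n = len(chain)
    c_b = chain.count(b) / n  # overall fraction of B atoms
    neighbor_slots = 0
    b_neighbors = 0
    for i, species in enumerate(chain):
        if species != a:
            continue
        for d in (-1, 1):  # left and right neighbors on a periodic chain
            neighbor_slots += 1
            if chain[(i + d) % n] == b:
                b_neighbors += 1
    p_ab = b_neighbors / neighbor_slots
    return 1.0 - p_ab / c_b


alternating = ["A", "B"] * 4          # every A is surrounded by B
segregated = ["A"] * 4 + ["B"] * 4    # A and B form separate blocks

print(warren_cowley(alternating))  # -1.0 (strong ordering)
print(warren_cowley(segregated))   # 0.5 (clustering)
```

Comparing a simulated value like this with a measured one is one way a simulation of atomic ordering can be checked against experiment.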
Phase diagrams are not just academic charts. They are tools used to connect materials modeling with processing decisions. If a team is welding, casting, or heat-treating an alloy, it needs to understand which phases are likely to appear under different conditions.
“Phase diagrams are one of the main ways people connect materials modeling to real processing decisions,” Freitas says. “If you are welding, casting, or heat-treating an alloy, you need to know which phases are likely to form under different conditions. Our goal is to make these kinds of predictions accurate enough, and accessible enough, that they become part of how people design materials.”
Why it matters for materials innovation
Companies in aerospace, energy, and computing are continually looking for materials that can improve performance. But the need to manufacture and test candidate materials adds time and cost, especially when experimentation is expensive.
A more accurate simulation workflow could change how early materials decisions are made. Based on the source article, the MIT approach does not remove the need for real-world testing. It does, however, point toward simulations that can better reflect the chemistry of complex metal alloys before researchers commit to costly experiments.
Freitas says the method is not tied to one application. The same approach could be used to create new sustainable steels, new materials for aerospace, and other materials where chemical disorder makes prediction difficult.