Cement production generates greenhouse gas emissions, driven by energy-intensive clinker production and limestone calcination¹. The conventional approach to reducing these emissions is to replace clinker with supplementary cementitious materials such as coal fly ash and blast furnace slag. However, the availability of these traditional substitutes has declined by 37% over two decades as coal plants close and steel recycling increases². This supply constraint, combined with rising demand for sustainable construction, makes it necessary to identify new materials that can undergo similar cement-like reactions.

Machine learning offers a systematic way to screen large materials databases and has shown promise in other applications³. Recently, a team led by Soroush Mahjoubi and Elsa Olivetti at the Massachusetts Institute of Technology reported a comprehensive framework that combines natural language processing with predictive modelling to explore cement alternatives (https://doi.org/10.1038/s43246-025-00820-4)⁴. The team mined over 5.7 million scientific papers to extract the chemical compositions of more than 14,000 materials, then used fine-tuned large language models to classify these materials into 19 categories. A neural network trained on this database predicts three critical reactivity metrics: heat release, Ca(OH)₂ consumption, and bound water content, achieving R² values exceeding 0.85.
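
For readers less familiar with the metric, R² is the coefficient of determination: one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below shows how it is commonly computed for one predicted quantity. The function name and the sample numbers are illustrative assumptions and are not taken from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination for paired observed/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative heat-release values in J/g (made up for this example).
    observed = [100.0, 200.0, 300.0]
    predicted = [110.0, 190.0, 310.0]
    print(r_squared(observed, predicted))
```

A value of 1 means the predictions match the observations exactly, while a value of 0 means the model does no better than always predicting the mean.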

This approach reveals unexpected diversity among reactive materials. Construction and demolition wastes, including recycled ceramics and concrete, exhibit heat releases of up to 450 J/g, comparable to traditional pozzolans (the broad class of siliceous and aluminous cement-forming materials). Municipal solid waste incineration ash and various biomass ashes (rice husk, sugarcane bagasse, wood) also demonstrate significant pozzolanic behaviour. Among mine tailings, copper and zinc varieties show particularly promising reactivity profiles. The report indicates that these secondary materials could collectively replace 68% of global cement production (Fig. 1).

However, not all regions have access to industrial byproducts, which makes the discovery of reactive natural materials particularly significant. By applying their model to a global geochemical database of over 1 million rock samples, the team identified 25 rock types with significant reactivity when mechanically activated. Ignimbrite and silicic tuff show the highest reactive-to-total sample ratios (~25%), while more abundant rocks such as rhyolite and andesite display lower ratios but offer greater global availability. These reactive rocks are concentrated in tectonically active regions, including the Andes, the Great Rift Valley, and the Pacific Ring of Fire, providing regional alternatives where industrial byproducts are scarce.

Fig. 1: Reactivity mapping of cementitious materials.

a, Heat release versus Ca(OH)₂ consumption for different material types. b, Density distribution of the 18 material classes in reactivity space. Grey lines mark critical thresholds: materials releasing >120 J/g are considered reactive (versus inert), while those consuming >50 g/100 g Ca(OH)₂ exhibit pozzolanic behaviour. Colour intensity indicates sample frequency. Reproduced from Communications Materials under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (2025)⁴.
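
The two thresholds in the caption can be read as independent checks on each sample. The following is a minimal sketch of that reading: the threshold values come from the caption, while the `Sample` type, the function name, and the example numbers are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass

# Thresholds quoted in the Fig. 1 caption.
REACTIVE_HEAT_J_PER_G = 120.0      # heat release above this: reactive (versus inert)
POZZOLANIC_CH_G_PER_100G = 50.0    # Ca(OH)2 consumption above this: pozzolanic


@dataclass
class Sample:
    name: str              # hypothetical label
    heat_release: float    # J/g
    ch_consumption: float  # g Ca(OH)2 per 100 g


def reactivity_flags(sample):
    """Return (is_reactive, is_pozzolanic) using the caption's two thresholds."""
    is_reactive = sample.heat_release > REACTIVE_HEAT_J_PER_G
    is_pozzolanic = sample.ch_consumption > POZZOLANIC_CH_G_PER_100G
    return is_reactive, is_pozzolanic


if __name__ == "__main__":
    # Made-up values for illustration only.
    for s in (Sample("example A", 450.0, 80.0), Sample("example B", 60.0, 10.0)):
        print(s.name, reactivity_flags(s))
```

Both comparisons are strict, matching the ">" signs in the caption, so a sample sitting exactly on a threshold is not flagged.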

The technical achievement of this study required overcoming significant challenges in data scarcity and variability. ‘Much of the existing data on cement substitutes is scattered, inconsistent, and incomplete, especially regarding key physical properties like amorphous content’, notes Dr Mahjoubi. To overcome this, the team developed a specialised neural network with multiple prediction pathways that can fill in missing data while maintaining accuracy, an essential feature when dealing with the incomplete records common in materials research. This approach enabled the researchers to capture the complex chemical and physical interactions governing cement reactivity while managing the inherent uncertainty in the available data.
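
The article does not describe the network's internals, so the following is not the team's architecture. It is a sketch of one generic technique for incomplete records: each input feature is paired with a mask flag so that a downstream model can distinguish a missing value from a true zero. The feature names and the `encode_record` helper are assumptions made for illustration.

```python
# Hypothetical feature list; the study's actual inputs may differ.
FEATURES = ("SiO2", "CaO", "amorphous_content")


def encode_record(record, features=FEATURES):
    """Encode a possibly incomplete record as values followed by mask flags.

    Missing features get a placeholder value of 0.0 and a mask flag of 0.0;
    present features keep their value and get a mask flag of 1.0.
    """
    values, mask = [], []
    for name in features:
        value = record.get(name)
        if value is None:
            values.append(0.0)
            mask.append(0.0)
        else:
            values.append(float(value))
            mask.append(1.0)
    return values + mask


if __name__ == "__main__":
    # A record with no reported amorphous content (made-up numbers).
    print(encode_record({"SiO2": 55.0, "CaO": 10.0}))
```

The resulting vector has a fixed length regardless of which measurements were reported, which is what allows records with different gaps to be used in the same training set.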

Implementing these alternatives at scale could reduce global greenhouse gas emissions by 3%, equivalent to removing 260 million vehicles from roads. Many identified materials require only mechanical activation through grinding, avoiding the energy-intensive thermal processing needed for other cement substitutes. The geographic distribution of natural precursors also addresses regional disparities in access to sustainable construction materials. ‘Experimental validation of some of the most promising candidate materials is a critical next step’, emphasises Dr Mahjoubi. Importantly, the machine learning framework, able to rapidly screen materials based on chemical and physical properties, provides a valuable foundation for expanding the circular economy in construction. ‘Future work will also aim to integrate knowledge about cement hydration kinetics to further improve the interpretability and accuracy of predictions’, comments Dr Mahjoubi. As infrastructure demands continue to grow globally, data-driven approaches offer a promising pathway to maintain material performance while significantly reducing the environmental impact of concrete production.