The enormous amounts of research data produced daily in materials science can be of tremendous value if they are characterized comprehensively. For this, a FAIR (Findable, Accessible, Interoperable, and Reusable) data infrastructure is imperative. In an article published as a Nature Perspective, Scheffler et al. describe how to make data findable and AI-ready in order to actively shape the future of materials science.
One of the most important research goals in materials science today is the improvement of existing or even the discovery of new materials to solve societal problems such as the energy crisis (e.g. through improved catalysts). In addition to the targeted high-throughput screening of materials with respect to their properties, the evaluation of already existing material data is a promising approach to find materials with certain desired properties. In this context, it is particularly interesting to be able to use data from synthetic, experimental and theoretical studies for deeper analysis and training of AI models.
In a newly accepted Nature Perspective article, Scheffler and colleagues describe the challenges of establishing a FAIR (Findable, Accessible, Interoperable, and Reusable) data infrastructure. Most research data today are neither findable nor interoperable, because metadata, ontologies, and workflows from different research groups are often not sufficiently characterized and thus not comparable. Scientific data therefore need to be published in a well-characterized, “clean” way. Only then can data be easily shared and explored using data analysis and artificial intelligence (AI) methods. The long-term goal is the development of materials “maps” that will enable the prediction of new materials with desired properties.
The authors describe the value of large-scale data collections, digital repositories, and new concepts and methods of data analysis, such as those being developed and used in the Novel Materials Discovery (NOMAD) lab. They emphasize the importance of convincing the scientific community to share their data, and note that metadata standards and ontologies still need to be developed. In addition, data-centric materials science requires a complex, well-supported infrastructure and efficient tools for data processing and analysis (Fig. 1).
To prepare for tomorrow's research, consortia like FAIRmat are using successful, real-world examples from everyday data-centric research to demonstrate that the concept actually works and to educate future scientists and engineers about the importance of FAIR scientific data management and control. The authors describe how FAIRmat addresses the challenge of establishing domain-specific data storage solutions (e.g., NOMAD Oases) in materials science. Participating scientists can manage their data in a central metadata repository and explore them hierarchically by properties in a common encyclopedia. Practical tools for data analysis are also provided (e.g., the NOMAD Artificial Intelligence Toolkit).
The FAIR data infrastructure will be critical for reproducible synthesis, experimental disciplines, and theory and computation. To realize the data revolution, existing research approaches such as experiment, theory, and numerical simulations need to be complemented by data mining, new analysis methods (especially AI), and visualization. This will open up new horizons for research in basic and engineering sciences and reach out to industry and society.
Ultimately, however, it is crucial to convince researchers across the scientific community to move from the role of observer to that of active participant in building the FAIR data infrastructure. Otherwise, they may miss important developments in materials science and fall behind.
Read the full article here:
M. Scheffler, M. Aeschlimann, M. Albrecht, T. Bereau, H.-J. Bungartz, C. Felser, M. Greiner, A. Groß, C. T. Koch, K. Kremer, W. E. Nagel, M. Scheidgen, C. Wöll, and C. Draxl
FAIR data – new horizons for materials research.
Nature 604, 635–642 (2022)