Sponsor: Air Force Office of Scientific Research (AFOSR), under the Department of Defense (DOD), under Award No. FA9550-12-1-0458; and National Institute of Standards and Technology (NIST), under Award No. 70NANB14H012.
Project Team Members:
University of Michigan-Ann Arbor
Georgia Institute of Technology
Data Centered Materials Knowledge Discovery
This page gives a general overview of our projects in materials informatics: the application of data mining technologies to accelerate and enhance materials knowledge discovery. Individual projects, each developed to solve a specific materials modeling problem, are described on their own pages, listed in the Project Outline below.
Data mining for materials discovery is concerned with casting materials science problems in a statistical framework and learning models that describe observations about the processing, structure, and properties of materials. The extraction of microstructure-property relationships lies at the basis of nearly all cutting-edge applications of Materials Science and Engineering, whose goal is to develop advanced materials for industrial and military purposes using experimental and computational methodologies. The massive amount of experimental and simulation data produced by modern characterization instruments and computational platforms introduces many challenges in terms of scalability, data storage, complexity, high dimensionality, interpretation, and retrieval. This makes it imperative to employ advanced methods for efficient data storage, retrieval, and analysis, and it creates opportunities for high performance data mining in materials informatics.
Large-scale materials databases provide unprecedented opportunities for both supervised learning (e.g. classification, regression) and unsupervised learning (e.g. clustering, feature learning). Advanced modeling techniques, combined with data mining optimization and validation methodologies, allow us to identify strong predictor variables for the outcome of interest (here, a microstructure or a property of a material) and to construct a model for predicting that outcome. Such techniques combine multiple predictor variables into a predictive model trained on supervised data (with known labels/outcomes), which can then be used to predict the labels of future test instances.
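The supervised workflow described above (identify strong predictors, fit a model on labeled data, predict held-out instances) can be sketched in a few lines. This is only an illustration under stated assumptions: the data are synthetic stand-ins rather than real materials data, the three "descriptors" are hypothetical, and the helper names (`rank_predictors`, `fit_line`) are ours, not part of any project software.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def rank_predictors(rows, targets):
    """Return column indices sorted by |correlation| with the target, strongest first."""
    n_features = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], targets)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx


# Synthetic stand-in data (assumption): three hypothetical microstructure
# descriptors per sample; only descriptor 1 actually drives the property.
rng = random.Random(0)
rows = [[rng.random() for _ in range(3)] for _ in range(200)]
targets = [4.0 * r[1] + 1.0 + rng.gauss(0, 0.05) for r in rows]

# Split into labeled training data and held-out test instances.
train_rows, test_rows = rows[:150], rows[150:]
train_y, test_y = targets[:150], targets[150:]

# Identify the strongest predictor and fit a one-variable model on it.
ranking = rank_predictors(train_rows, train_y)
best = ranking[0]
slope, intercept = fit_line([r[best] for r in train_rows], train_y)

# Validate on instances the model has not seen.
errors = [abs(slope * r[best] + intercept - y) for r, y in zip(test_rows, test_y)]
mean_abs_error = statistics.fmean(errors)
```

A single-variable linear fit keeps the sketch dependency-free; in practice one would combine several predictors and use cross-validation rather than a single split.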
The project aims to create breakthrough concepts and methodologies for elucidating the microstructure-properties link to enable materials design by bringing together cutting-edge theories and techniques from materials science, mathematics, and information science. Three grand challenges have been identified, around which our research efforts are built:
- We aim to establish a standardized methodology, grounded in sound mathematics, for acquiring, storing, analyzing, modeling, and querying "beyond 3-D" materials data, taking full account of the potential sparsity of such data as well as the associated uncertainties and variabilities;
- We aim to employ advanced stochastic/probabilistic models that allow not only for the description of the "average" microstructure, but also for the inclusion of rare events (large deviations), and to set up the proper data structures to enable such a stochastic description; and
- We aim to employ advanced data mining approaches, constrained by accurate mathematical models and accounting for variability, to instantiate large numbers of digital microstructures to search for an optimal microstructure and its process path, to achieve a desired property combination.
The problems we address in materials knowledge discovery center on the following questions:
- In materials design, how should the design space best be represented? Which variables should be selected as key design factors? The microstructure representation learning project addresses these questions.
- Given a set of design variables, how can we carry out a design optimization that yields a desired (often optimal) material property? The microstructure optimization design project addresses this question.
- Supposing a causal relationship from microstructure to property exists, what are the different ways of modeling that relationship from an agnostic, data-centered point of view? See the collection of projects on predictive modeling in materials science.
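To make the optimization question above concrete, here is a minimal sketch of searching a constrained design space for the design with the best property value. Everything in it is an assumption for illustration: the design vector is taken to be three fractions that sum to 1, `toy_property` is a made-up property model, and plain random search stands in for whatever optimizer a real project would use.

```python
import random

# Assumed optimum, used only to construct the toy property model below.
TARGET = [0.6, 0.3, 0.1]


def sample_design(rng, n_vars):
    """Draw a random design vector of non-negative fractions that sum to 1."""
    raw = [rng.random() for _ in range(n_vars)]
    total = sum(raw)
    return [v / total for v in raw]


def toy_property(design):
    """Hypothetical property model: largest (zero) when the design equals TARGET."""
    return -sum((d - t) ** 2 for d, t in zip(design, TARGET))


def random_search(n_candidates, seed=0):
    """Evaluate n_candidates random designs and return the best (design, value)."""
    rng = random.Random(seed)
    best_design, best_value = None, float("-inf")
    for _ in range(n_candidates):
        design = sample_design(rng, len(TARGET))
        value = toy_property(design)
        if value > best_value:
            best_design, best_value = design, value
    return best_design, best_value


best_design, best_value = random_search(5000)
```

Sampling only designs that already satisfy the sum-to-one constraint avoids wasting evaluations on infeasible candidates; the choice of design variables determines how easy such sampling is, which is one reason representation and optimization are treated as linked questions.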
Publications
- Ruoqian Liu, Logan Ward, Ankit Agrawal, Wei-keng Liao, Chris Wolverton, and Alok Choudhary. Deep Learning for Chemical Compound Stability Prediction. In Proceedings of the Workshop on Large-scale Deep Learning for Data Mining, held in conjunction with the SIGKDD Conference on Knowledge Discovery and Data Mining, August 2016. (pdf)
- Ruoqian Liu, Ankit Agrawal, Wei-keng Liao, Alok Choudhary, and Zhengzhang Chen. Pruned Search: A Machine Learning Based Meta-Heuristic Approach for Constrained Continuous Optimization. In the Eighth International Conference on Contemporary Computing (IC3), August 2015. (pdf)
- Ruoqian Liu, Abhishek Kumar, Zhengzhang Chen, Ankit Agrawal, Veera Sundararaghavan, and Alok Choudhary. A predictive machine learning approach for microstructure optimization and materials design. Scientific Reports, 5:11551, Macmillan Publishers Limited, June 2015. (pdf)
- Hongyi Xu, Ruoqian Liu, Alok Choudhary, and Wei Chen. A Machine Learning-Based Design Representation Method for Designing Heterogeneous Microstructures. Journal of Mechanical Design, 137(5):051403–051403, ASME, May 2015. (pdf)
- Ruoqian Liu, Yuksel Yabansu, Ankit Agrawal, Surya Kalidindi, and Alok Choudhary. Machine learning approaches for elastic localization linkages in high-contrast composite materials. Integrating Materials and Manufacturing Innovation, 4(1):1–17, 2015. (pdf)
- Ruoqian Liu, Ankit Agrawal, Wei-keng Liao, and Alok Choudhary. Search Space Preprocessing in Solving Complex Optimization Problems. In the Workshop on Complexity for Big Data held in conjunction with the IEEE International Conference on Big Data, October 2014. (pdf)
- Hongyi Xu, Ruoqian Liu, Alok Choudhary, and Wei Chen. A Machine Learning-Based Design Representation Method for Designing Heterogeneous Microstructures, (Design Automation (DAC) Best Paper). In the ASME International Design Engineering Technical Conferences, August 2014. (pdf)
- Ruoqian Liu, Zhengzhang Chen, Tony Fast, Surya Kalidindi, Ankit Agrawal, and Alok Choudhary. Predictive Modeling in Characterizing Localization Relationships. In the TMS Annual Meeting & Exhibition, Symposium of Data Analytics for Materials Science and Manufacturing, February 2014. (pdf unavailable)
- Ruoqian Liu, Abhishek Kumar, Zhengzhang Chen, Ankit Agrawal, Veera Sundararaghavan, and Alok Choudhary. A Data Mining Approach in Structure-Property Optimization. In the TMS Annual Meeting & Exhibition, Symposium of Data Analytics for Materials Science and Manufacturing, February 2014. (pdf unavailable)
Software Download
Software developed for materials discovery is mostly problem-specific and requires domain data that are not publicly available. That said, we strive to keep the components of the pipeline modular wherever we can, and to extend them so that they are suitable for general use. The general-purpose software packages derived from the projects above can be accessed below.
- Pruned-Search: a general optimization package that performs search space reduction given an objective function and mathematical constraints. See the documentation on this project page.
- deuNet: a general Python-based deep learning package that uses a Theano backend, with GPU support through cuDNN. It is a self-contained deep learning package, and its usage is demonstrated on this project page.
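As a rough illustration of what "search space reduction given an objective function and mathematical constraints" can mean, the sketch below samples the full box, keeps the best feasible points, and shrinks the bounds to their bounding box before searching again. This is a generic stand-in and not the Pruned-Search algorithm or its API; the objective, the constraint, and all function names are hypothetical.

```python
import random


def objective(x, y):
    """Hypothetical objective to minimize; its minimum is at (2, -1)."""
    return (x - 2.0) ** 2 + (y + 1.0) ** 2


def feasible(x, y):
    """Hypothetical constraint: points must lie inside a disk of radius 4."""
    return x * x + y * y <= 16.0


def sample_box(rng, bounds, n):
    """Draw n feasible points uniformly from the axis-aligned box `bounds`."""
    points = []
    while len(points) < n:
        p = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        if feasible(*p):
            points.append(p)
    return points


def reduce_bounds(points, keep_fraction=0.1):
    """Shrink the search box to the bounding box of the best-scoring samples."""
    ranked = sorted(points, key=lambda p: objective(*p))
    elite = ranked[: max(2, int(len(ranked) * keep_fraction))]
    n_dims = len(elite[0])
    return [(min(p[i] for p in elite), max(p[i] for p in elite)) for i in range(n_dims)]


rng = random.Random(0)
full_bounds = [(-10.0, 10.0), (-10.0, 10.0)]

# Pilot pass over the full space, then reduce the space around the best samples.
pilot = sample_box(rng, full_bounds, 500)
reduced_bounds = reduce_bounds(pilot)

# Spend the remaining budget only inside the reduced region.
refined = sample_box(rng, reduced_bounds, 500)
best = min(pilot + refined, key=lambda p: objective(*p))
```

The reduced box concentrates later evaluations where good solutions were already observed, at the risk of discarding regions the pilot sample happened to miss; a larger `keep_fraction` trades speed for a lower chance of excluding the optimum.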
This work is supported by AFOSR (Air Force Office of Scientific Research), Department of Defense (DOD) under Award No. FA9550-12-1-0458; and by National Institute of Standards and Technology (NIST), under Award No. 70NANB14H012.