This repository contains code for calculating descriptors, training separate machine learning models for formation energy and band gap prediction, and selecting the model with the minimum mean squared error (MSE) to make predictions on novel perovskite materials.
The method combines classical descriptors such as ionic radii and electronegativity, empirical factors such as the octahedral and tolerance factors, and pymatgen features into a feature set for each material. Thousands of machine learning models are then trained and evaluated on these features, and the model with the lowest MSE is selected.
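
As an illustration of the empirical factors mentioned above, the sketch below computes the Goldschmidt tolerance factor, t = (r_A + r_X) / (sqrt(2) (r_B + r_X)), and the octahedral factor, mu = r_B / r_X, for an ABX3 perovskite. The function names and the SrTiO3 radii are assumptions chosen for the example and are not taken from Classic-Features_FE.py.

```python
import math


def tolerance_factor(r_a: float, r_b: float, r_x: float) -> float:
    """Goldschmidt tolerance factor t = (r_A + r_X) / (sqrt(2) * (r_B + r_X))."""
    return (r_a + r_x) / (math.sqrt(2) * (r_b + r_x))


def octahedral_factor(r_b: float, r_x: float) -> float:
    """Octahedral factor mu = r_B / r_X."""
    return r_b / r_x


# Illustrative Shannon ionic radii in angstroms for SrTiO3 (assumed values,
# not read from this repository): Sr2+ (XII), Ti4+ (VI), O2- (VI).
r_sr, r_ti, r_o = 1.44, 0.605, 1.40

t = tolerance_factor(r_sr, r_ti, r_o)
mu = octahedral_factor(r_ti, r_o)
print(f"t = {t:.3f}, mu = {mu:.3f}")
```

A tolerance factor close to 1 is commonly associated with an ideal cubic perovskite, which is why both factors are useful inputs alongside the raw radii.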
The selected model is then used to predict the formation energy and band gap of unexplored perovskite configurations. The results are analyzed through error plots, feature importance plots, and performance metrics to validate the predictive model.
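
The performance metrics can be made concrete with a small sketch. It assumes the metrics of interest are MSE, mean absolute error (MAE), and R^2; only MSE is named in this README, so the other two are assumptions. The function name `regression_metrics` and the sample values are placeholders, not results from this repository.

```python
from statistics import fmean


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE and R^2 for two equal-length sequences of floats."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = fmean([r * r for r in residuals])
    mae = fmean([abs(r) for r in residuals])
    mean_true = fmean(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "mae": mae, "r2": r2}


# Placeholder values for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Computing the same metrics on the training and test sets separately makes it easier to spot overfitting before a model is used for screening.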
This framework provides an efficient and scalable approach to screening for promising lead-free perovskite materials, whose predicted properties can then be explored further and validated with density functional theory (DFT) calculations.

Prerequisites

The required packages are listed in the requirements.txt file. Install them with:

    pip install -r requirements.txt

Code Files
- Classic-Features_FE.py: Calculates classical descriptors for the elements involved, such as:
  Radius of
- pymatgen_descriptors.py: Uses pymatgen to generate additional elemental descriptors for machine learning.
- ML_saveCSV_FE_V2.py: Trains thousands of machine learning models with different parameters and calculates performance metrics for each, so that the model with the lowest MSE can be found.
- Find-min-mse_test_FE.py: Finds the model with the minimum MSE and saves it for further use.
- Pred-Load-FE.py: Loads the selected model to predict the formation energy of materials.
- Bar-plot-fea-impo_FE.py: Plots the feature importances of the selected model to help interpret the key descriptors.
- MLErrorPlot_FE.py: Creates error plots comparing predicted and actual values for both the training and test datasets, giving a visual picture of model performance.
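
The minimum-MSE selection step can be sketched as follows. This is a minimal example, assuming the training script writes one row per model to a CSV file; the column names (`model_id`, `n_estimators`, `max_depth`, `mse_test`), the helper `find_min_mse`, and all values are hypothetical and may differ from what ML_saveCSV_FE_V2.py and Find-min-mse_test_FE.py actually use.

```python
import csv
import io

# Hypothetical results table: one row per trained model. Column names and
# values are placeholders, not output from this repository.
RESULTS_CSV = """model_id,n_estimators,max_depth,mse_test
0,100,4,0.052
1,200,4,0.047
2,100,8,0.039
3,200,8,0.041
"""


def find_min_mse(csv_file, mse_column="mse_test"):
    """Return the row (as a dict) with the smallest value in mse_column."""
    rows = list(csv.DictReader(csv_file))
    if not rows:
        raise ValueError("results table is empty")
    # CSV values are strings, so convert to float before comparing.
    return min(rows, key=lambda row: float(row[mse_column]))


best = find_min_mse(io.StringIO(RESULTS_CSV))
print(best["model_id"], best["mse_test"])
```

With a real results file, `io.StringIO(RESULTS_CSV)` would be replaced by an opened file handle, and the returned row would identify which parameter combination to retrain or load.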