Root Storage of Deep Learning Models in TMVA


This project was part of Google Summer of Code 2021 under the organization CERN-HSF.
Link to Project Page

Project Details

  • Student's Name: Sanjiban Sengupta
  • Mentors: Lorenzo Moneta, Sitong An, Anirudh Dagar
  • Organization: Root-Project (CERN-HSF)
  • Organization Code Repository: https://github.com/root-project/root
  • Final Report: https://github.com/sanjibansg/GSoC21-RootStorage/wiki
  • Code Implementations: https://github.com/root-project/root/pulls?q=author:sanjibansg
  • Project Proposal: https://docs.google.com/document/d/1MVKpGP9lr0tUhrxB59nrNlZfAtnO_Dgkx8ddw1k26Yk/edit?usp=sharing
  • Documentation Blog: https://blog.sanjiban.ml/series/gsoc

About Project

The Toolkit for Multivariate Data Analysis (TMVA) is a sub-module of ROOT that provides a machine learning environment for training, testing, and evaluating multivariate methods, particularly those used in high-energy physics. Recently, the TMVA team introduced SOFIE (System for Optimized Fast Inference code Emit), which defines its own intermediate representation of deep learning models following the ONNX standard. To make these models easier to use, store, and exchange, this project aimed to add support for storing deep learning models in the `.root` format, which is popular in the high-energy physics community.

Project Contents

  1. Functionality for serialization of RModel for storing a trained deep learning model in `.root` format.
  2. Functionality for parsing a Keras `.h5` file into a RModel object for generation of inference code.
  3. Functionality for parsing a PyTorch `.pt` file into a RModel object for generation of inference code.
  4. Tests, tutorials, and documentation for the various parsers of TMVA SOFIE's RModel object.
  5. Functionality for an intermediate representation of BDT models and for parsing TMVA-trained BDT models.
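The phrase "generation of inference code" in the list above can be illustrated with a toy emitter. The sketch below is not SOFIE code: `EmitDenseLayer` is a hypothetical helper, and it assumes a single dense layer computing y = W * x + b with the weights stored row by row. It only shows the general idea of writing trained parameters into generated C++ source instead of interpreting the model at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of SOFIE): returns the C++ source of a
// function that evaluates y = W * x + b for one dense layer, with the
// trained parameters written into the generated code as float literals.
// Non-finite parameter values are not handled in this sketch.
std::string EmitDenseLayer(const std::string &name,
                           const std::vector<std::vector<float>> &weights,
                           const std::vector<float> &bias) {
    const std::size_t nOut = weights.size();
    const std::size_t nIn = nOut > 0 ? weights[0].size() : 0;
    assert(bias.size() == nOut);

    std::ostringstream out;
    // showpoint keeps the decimal point so that the "f" suffix stays valid;
    // max_digits10 keeps enough digits to round-trip a float.
    out << std::showpoint
        << std::setprecision(std::numeric_limits<float>::max_digits10);

    out << "void " << name << "(const float *x, float *y) {\n";
    for (std::size_t i = 0; i < nOut; ++i) {
        assert(weights[i].size() == nIn);
        out << "    y[" << i << "] = " << bias[i] << "f";
        for (std::size_t j = 0; j < nIn; ++j)
            out << " + " << weights[i][j] << "f * x[" << j << "]";
        out << ";\n";
    }
    out << "}\n";
    return out.str();
}
```

Calling `EmitDenseLayer("Dense0", W, b)` with a 2x2 weight matrix returns a string containing a function `Dense0` with one assignment per output, which could then be written to a header and compiled.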

Tech Stack

  • Languages: C/C++, Python
  • Deep Learning Libraries: Keras, PyTorch
  • API: C-Python API
  • Build: CMake
  • Tests: GTest Framework
  • Documentation: Doxygen

Installation

Instructions for building ROOT from source are available at:

https://root.cern/install/build_from_source/

Alternatively, the provided install.sh builds the repository directly and merges in the implemented code files:

git clone https://github.com/sanjibansg/GSoC21-RootStorage.git
cd GSoC21-RootStorage
./install.sh

Interface

  • Serialization of RModel
    //Writing ROOT File
    TFile file("model.root","CREATE");
    using namespace TMVA::Experimental;
    SOFIE::RModel model = SOFIE::PyKeras::Parse("trained_model_dense.h5");
    model.Write("model");
    file.Close();
    
    //Reading ROOT File
    TFile file("model.root","READ");
    using namespace TMVA::Experimental;
    SOFIE::RModel *model;
    file.GetObject("model",model);
    file.Close();
    

  • Keras Converter for RModel
    //Parser returns a RModel object
    using namespace TMVA::Experimental::SOFIE;
    RModel model = PyKeras::Parse("trained_model_dense.h5");
    
    //Converter writes a ROOT file directly
    PyKeras::ConvertToRoot("trained_model_dense.h5");
    

  • PyTorch Converter for RModel
    //Parser returns a RModel object
    using namespace TMVA::Experimental::SOFIE;
    
    //Building the vector for input shapes
    std::vector<size_t> s1{120,1};
    std::vector<std::vector<size_t>> inputShape{s1};
    RModel model = PyTorch::Parse("trained_model_dense.pt",inputShape);
    
    //Alternatively, the converter writes a ROOT file directly
    std::vector<size_t> s1{120,1};
    std::vector<std::vector<size_t>> shape{s1};
    PyTorch::ConvertToRoot("trained_model_dense.pt",shape);
    

  • Root Storage of BDT
    //Parser loads the BDT model from .xml to RootStorage::BDT object
    TMVA::Experimental::RootStorage::BDT model;
    bool usePurity = true;
    model.Parse("TMVA_CNN_Classification_BDT.weights.xml",usePurity);
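The PyTorch snippets above pass the input shapes as a `std::vector<std::vector<size_t>>`, with one inner vector per model input. The sketch below uses only the standard library to show that nesting for a model with two inputs. `Shape`, `ElementCount`, and `MakeInputShapes` are hypothetical names, and the second input shape `{120, 4}` is invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

using Shape = std::vector<std::size_t>;

// Hypothetical helper: number of elements in a tensor of the given shape,
// i.e. the product of all its dimensions.
std::size_t ElementCount(const Shape &shape) {
    assert(!shape.empty());
    return std::accumulate(shape.begin(), shape.end(), std::size_t{1},
                           std::multiplies<std::size_t>());
}

// Hypothetical helper: builds the nested vector, one shape per model input.
// Assumption: shapes are listed in the order in which the model takes its
// inputs.
std::vector<Shape> MakeInputShapes() {
    Shape first{120, 1};   // same shape as in the README snippet
    Shape second{120, 4};  // invented second input, for illustration only
    return {first, second};
}
```

A quick check such as `ElementCount(shape)` for every entry can catch a zero or misplaced dimension before the shapes are handed to a parser.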
    

Future Plan

  • Development of Root Storage of BDT
    • Develop the mapping interface for inference code generation from class RootStorage::BDT
    • Research the conversion of scikit-learn based BDT models to class RootStorage::BDT for subsequent inference
    • Add tests and tutorials for BDT
  • Add support for converting convolution layers from Keras and PyTorch models.
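For the BDT items in the plan above, it may help to picture what an intermediate representation of a tree ensemble can look like and how inference walks it. The sketch below is an assumption for illustration only: `TreeNode`, `EvaluateTree`, and `EvaluateForest` are hypothetical and do not describe the actual layout of RootStorage::BDT. It also sums the tree responses without boost weights, which is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat node layout for one decision tree. All nodes of a tree
// live in a single vector and refer to their children by index.
struct TreeNode {
    int feature;      // index of the input variable tested; -1 marks a leaf
    float threshold;  // go to the left child if x[feature] < threshold
    int left;         // index of the left child (unused for leaves)
    int right;        // index of the right child (unused for leaves)
    float value;      // response returned when this node is a leaf
};

// Walks one tree from the root (index 0) down to a leaf.
float EvaluateTree(const std::vector<TreeNode> &nodes,
                   const std::vector<float> &x) {
    assert(!nodes.empty());
    std::size_t i = 0;
    while (nodes[i].feature >= 0) {
        const TreeNode &n = nodes[i];
        const bool goLeft = x[static_cast<std::size_t>(n.feature)] < n.threshold;
        i = static_cast<std::size_t>(goLeft ? n.left : n.right);
    }
    return nodes[i].value;
}

// Combines the trees by an unweighted sum of their responses.
float EvaluateForest(const std::vector<std::vector<TreeNode>> &trees,
                     const std::vector<float> &x) {
    float sum = 0.f;
    for (const auto &tree : trees)
        sum += EvaluateTree(tree, x);
    return sum;
}
```

A flat, index-based layout like this is straightforward to serialize and to translate into generated code, which is why it is used here as the illustration.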

Contributions

To report a bug or request a feature, open an issue here.