A repository for the winners of the NASA Mars Spectrometry challenge

drivendataorg/mars-spectrometry

Mars Spectrometry: Detect Evidence for Past Habitability

Goal of the Competition

In this challenge, competitors built models to automatically analyze mass spectrometry data collected for Mars exploration, helping scientists assess the past habitability of Mars.

Their models detect the presence of certain families of chemical compounds in data collected by performing evolved gas analysis (EGA) on a set of analog samples. The winning techniques in this repo may be used to help analyze data from Mars, and could even inform future designs for planetary mission instruments performing in-situ analysis.

What's in this Repository

This repository contains code from winning competitors in the Mars Spectrometry: Detect Evidence for Past Habitability DrivenData challenge. Code for all winning solutions is open source under the MIT License.

Winning code for other DrivenData competitions is available in the competition-winners repository.

Winning Submissions

| Place | Team or User | Private Score | Summary of Model |
| --- | --- | --- | --- |
| 1 + Bonus | dmytro | 0.092 | Represented each mass spectrogram as a 2D image (temperature vs. m/z values) and used it as input to CNN, RNN, or transformer-based models. Varying the preprocessing configurations and model architectures produced a diverse ensemble. This model also won the Bonus prize for its strong performance on SAM testbed data and its promise for application, as judged by a panel of NASA scientists. Further details can be found in the write-up in the winner's repo. |
| 2 | _NQ_ | 0.116 | Engineered features by scaling the m/z channels and computing area under the curve, peak value, peak width, and others. The best performance came from ensembling an LGBM model trained on these features with a neural network consisting of 2 Conv1d modules operating over temperature, followed by a linear layer across m/z channels and a multi-target classifier. |
| 3 | devnikhilmishra | 0.119 | Converted the multilabel problem into a binary classification problem. Used a k-fold LightGBM ensemble to produce initial predictions, then fed these predictions, along with the top 5k features, to a 31-fold CatBoost ensemble that acted as a meta-model. |
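
To make two of the ideas in the summaries concrete, here is a minimal sketch of (a) binning readings into a temperature-by-m/z grid and (b) computing per-channel peak and area-under-the-curve features. It is not taken from any winner's code: the function names, the `(temp, m/z, abundance)` tuple layout, the bin sizes, and the sample values are all assumptions made for illustration. It uses plain Python to stay dependency-free; the actual solutions in this repository should be consulted for the real preprocessing.

```python
from collections import defaultdict


def to_image(rows, temp_min=0.0, temp_max=1000.0, temp_step=100.0, max_mz=5):
    """Bin (temp, m/z, abundance) readings into a temperature-by-m/z grid.

    Hypothetical helper: each cell keeps the maximum abundance seen in
    that temperature bin for that (rounded) m/z channel.
    """
    n_bins = int((temp_max - temp_min) / temp_step)
    image = [[0.0] * (max_mz + 1) for _ in range(n_bins)]
    for temp, mz, abundance in rows:
        mz = int(round(mz))
        # Skip readings that fall outside the grid.
        if not (temp_min <= temp < temp_max) or not (0 <= mz <= max_mz):
            continue
        b = int((temp - temp_min) // temp_step)
        image[b][mz] = max(image[b][mz], abundance)
    return image


def channel_features(rows):
    """Per-m/z peak abundance, temperature at the peak, and trapezoidal area.

    Hypothetical helper illustrating "peak value" and "area under the curve".
    """
    by_mz = defaultdict(list)
    for temp, mz, abundance in rows:
        by_mz[int(round(mz))].append((temp, abundance))
    features = {}
    for mz, points in by_mz.items():
        points.sort()  # order by temperature before integrating
        peak_temp, peak = max(points, key=lambda p: p[1])
        area = sum(
            (t1 - t0) * (a0 + a1) / 2.0
            for (t0, a0), (t1, a1) in zip(points, points[1:])
        )
        features[mz] = {"peak": peak, "peak_temp": peak_temp, "area": area}
    return features


# Made-up readings: (temperature, m/z, abundance).
rows = [
    (100.0, 2, 1.0),
    (200.0, 2, 3.0),
    (300.0, 2, 1.0),
    (150.0, 4, 0.5),
    (250.0, 4, 0.5),
]
image = to_image(rows)
features = channel_features(rows)
```

In this sketch, `image` could serve as the 2D input to an image-style model, while `features` could be flattened into a row for a gradient-boosted model. In practice an array library such as NumPy would be the more typical choice for both steps.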

Additional solution details can be found in the reports folder inside the directory for each submission.

Benchmark Blog Post: Mars Spectrometry: Detect Evidence for Past Habitability