
R code used to construct prediction models and prepare figures


grp-bork/CellCount_Nishijima_2024


CellCount_Nishijima_2024

Overview

This repository contains the R code used in Nishijima et al., Cell (2024). The scripts include:

  1. Prediction model construction: R scripts to build prediction models for fecal microbial load (i.e., the total number of microbial cells per gram of feces, or cell density) using the XGBoost algorithm.
  2. Figure generation: R scripts to produce figures used in the manuscript.
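To illustrate the modelling idea in item 1, here is a minimal sketch of an XGBoost regression that maps a samples × species relative-abundance matrix to a load value. It is not the code in construct_models.R. It assumes the xgboost R package is installed, uses synthetic data in place of the real profiles, and treats the target as a stand-in for log10 cells per gram (the log scale is an assumption here). The hyperparameters (eta, max_depth, nrounds) are arbitrary choices, not the study's settings.

```r
# Minimal sketch (not the authors' pipeline): XGBoost regression of a
# load-like target on a samples x species relative-abundance matrix.
library(xgboost)

set.seed(1)
n_samples <- 300
n_species <- 40

# Synthetic relative abundances: each row (sample) sums to 1.
raw <- matrix(rgamma(n_samples * n_species, shape = 0.5), nrow = n_samples)
x <- raw / rowSums(raw)
colnames(x) <- paste0("species_", seq_len(n_species))

# Synthetic target standing in for log10 cells per gram (assumption):
# driven by the first five species plus a little noise.
y <- 11 + 5 * rowSums(x[, 1:5]) + rnorm(n_samples, sd = 0.1)

# Hold out 60 samples (20%) for evaluation.
test_idx <- sample(n_samples, size = 60)
dtrain <- xgb.DMatrix(data = x[-test_idx, ], label = y[-test_idx])

# Arbitrary illustrative hyperparameters.
params <- list(objective = "reg:squarederror", eta = 0.1, max_depth = 3)
model <- xgb.train(params = params, data = dtrain, nrounds = 200, verbose = 0)

# Predict on held-out samples and compare with the true values.
pred <- predict(model, x[test_idx, ])
r <- cor(pred, y[test_idx])
cat(sprintf("Held-out Pearson r: %.2f\n", r))
```

The held-out correlation is only one simple check of such a model; the validation scripts in this repository may report different metrics.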

The model construction scripts use species-level taxonomic profiles (mOTUs v2.5) and fecal microbial load measurements from the GALAXY/MicrobLiver (n = 1,894) and MetaCardis (n = 1,812) study populations; these data are located in the data folder. Because access to the Japanese 4D and Estonian Microbiome datasets is restricted, the code based on these datasets is provided together with the HTML output generated by R Markdown.
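Before a model can be fitted, the taxonomic profiles and the load measurements must refer to the same samples in the same order. The helper below is a hypothetical sketch of that step. It assumes a species × sample matrix (species in rows, sample IDs as column names) and a data frame with hypothetical columns sample_id and cells_per_g; the actual file formats in the data folder may differ.

```r
# Hypothetical helper: pair a species x sample profile matrix with per-sample
# load measurements, keeping only samples present in both inputs.
align_inputs <- function(profiles, load) {
  # Samples found in both inputs, in the column order of `profiles`
  shared <- intersect(colnames(profiles), load$sample_id)
  # Transpose to samples x species, the orientation model-fitting code expects
  x <- t(profiles[, shared, drop = FALSE])
  # Look up each shared sample's load in the same order as the rows of x
  y <- load$cells_per_g[match(shared, load$sample_id)]
  list(x = x, y = y)
}
```

Samples that appear in only one of the two inputs are silently dropped, so it is worth comparing nrow(x) with the expected cohort size.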

Data downloading

Some input files are too large to include in this repository; they are available on Zenodo at https://zenodo.org/records/14243685. After downloading and unpacking them:

  1. Place the data folder in the main directory.
  2. Place the model folder in the out/ directory.
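After unpacking, a quick check that the folders are in place can save a failed run. This hypothetical helper assumes the Zenodo archive yields a folder named data (placed in the main directory) and that the models end up under out/model/, the location mentioned under Using Pre-Constructed Models; adjust the paths if the archive is organized differently.

```r
# Hypothetical helper: confirm the downloaded folders are where the
# scripts are assumed to expect them, relative to the repository root.
check_layout <- function(root = ".") {
  # Assumed locations: data/ in the main directory, models in out/model/
  required <- c("data", file.path("out", "model"))
  missing <- required[!dir.exists(file.path(root, required))]
  if (length(missing) > 0) {
    stop("Missing directories: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}
```

Call check_layout() from the repository root; it stops with the names of any missing directories.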

Instructions

Before generating figures with the figure scripts (e.g., Figure_XXXX.R), you must first construct and validate the prediction models by running the following scripts in this order:

  1. Model Construction: construct_models.R
  2. Internal Validation: internal_validation.R
  3. External Validation: external_validation.R
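The three steps can be run in sequence from the repository root. The helper below is a hypothetical sketch that runs each script in its own Rscript process, on the assumption that the scripts exchange results through files (for example under out/) rather than through objects in a shared R session; if they do rely on a shared session, calling source() on each script in order is the simpler alternative. The runner argument exists only so the order can be checked without executing the real scripts.

```r
# Hypothetical helper: run the pipeline scripts in order, each in a fresh
# R process, stopping at the first script that exits with an error.
run_pipeline <- function(scripts = c("construct_models.R",
                                     "internal_validation.R",
                                     "external_validation.R"),
                         runner = NULL) {
  if (is.null(runner)) {
    rscript <- file.path(R.home("bin"), "Rscript")
    runner <- function(path) {
      # system2() returns the exit status when output is not captured
      status <- system2(rscript, shQuote(path))
      if (status != 0) stop(path, " failed with exit status ", status)
    }
  }
  for (script in scripts) {
    message("Running ", script)
    runner(script)
  }
  invisible(scripts)
}
```

Running each script in a separate process keeps one step's leftover objects from silently affecting the next.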

Using Pre-Constructed Models

Because constructing the models can take several hours or even days, pre-constructed models are available in the out/model/ directory. To use them directly, skip construct_models.R and run internal_validation.R and external_validation_of_models.R to validate the models, then generate the figures.
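The decision to skip model construction can be automated. In this hypothetical sketch, construct_models.R is dropped from a caller-supplied script list whenever out/model/ exists and is not empty; it does not check that the files there are valid models.

```r
# Hypothetical helper: drop construct_models.R when pre-constructed
# models are already present in the model directory.
select_scripts <- function(scripts, model_dir = file.path("out", "model")) {
  # Treat a non-empty model directory as "models available"
  has_models <- dir.exists(model_dir) && length(list.files(model_dir)) > 0
  if (has_models) setdiff(scripts, "construct_models.R") else scripts
}
```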


Fecal microbial load is a major determinant of gut microbiome variation and a confounder for disease associations

Published in Cell, available online 13 November 2024

Suguru Nishijima, Evelina Stankevic, Oliver Aasmets, Thomas S. B. Schmidt, Naoyoshi Nagata, Marisa Isabell Keller, Pamela Ferretti, Helene Bæk Juel, Anthony Fullam, Shahriyar Mahdi Robbani, Christian Schudoma, Johanne Kragh Hansen, Louise Aas Holm, Mads Israelsen, Robert Schierwagen, Nikolaj Torp, Manimozhiyan Arumugam, Flemming Bendtsen, Charlotte Brøns, Cilius Esmann Fonvig, Jens-Christian Holm, Trine Nielsen, Julie Steen Pedersen, Maja Sofie Thiele, Jonel Trebicka, Elin Org, Aleksander Krag, Torben Hansen, Michael Kuhn, and Peer Bork, on behalf of the GALAXY and MicrobLiver Consortia
