Model-based Genomically Informed High-dimensional Predictor of Microbial Community Metabolic Profiles
MelonnPan is a computational method for predicting metabolite composition from microbiome sequencing data.
For more information on the technical aspects:
User manual || Tutorial || Forum
Citation:
Himel Mallick, Eric A. Franzosa, Lauren J. McIver, Soumya Banerjee, Alexandra Sirota-Madi, Aleksandar D. Kostic, Clary B. Clish, Hera Vlamakis, Ramnik Xavier, Curtis Huttenhower (2019). Predictive metabolomic profiling of microbial communities using amplicon or metagenomic sequences. Nature Communications 10(1):3136-3146.
MelonnPan is composed of two high-level workflows: MelonnPan-Predict and MelonnPan-Train.
The MelonnPan-Predict workflow takes a table of microbial sequence features as input (i.e. taxonomic or functional abundances on a per-sample basis) and outputs a predicted metabolomic table (i.e. relative abundances of metabolite compounds across samples).
The MelonnPan-Train workflow creates a weight matrix that links an optimal set of sequence features to a subset of predictable metabolites, both selected through rigorous internal validation. This weight matrix is then used to generate a table of predicted metabolite compounds (i.e. relative abundances of metabolite compounds per sample). When sufficiently accurate, these predicted metabolite relative abundances can be used for downstream statistical analysis and end-to-end biomarker discovery.
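To make the role of the weight matrix concrete, here is a minimal sketch of applying one to a sample-by-feature table. It assumes prediction is a plain linear combination (predicted abundance of metabolite m in a sample = sum over features f of abundance(f) × weight(f, m)) with no intercept and no feature transformation. MelonnPan's actual model and preprocessing may differ. The function name `apply_weights` and the `weights.txt` layout (features in rows, metabolites in columns) are hypothetical and are not the package's output format.

```shell
#!/bin/sh
# Minimal sketch: apply a feature-by-metabolite weight matrix to a
# sample-by-feature table. ASSUMPTION: plain linear combination, no
# intercept or transformation; MelonnPan's real preprocessing may differ.
set -eu

# apply_weights WEIGHTS METAG  (hypothetical helper)
#   WEIGHTS: header "feature<TAB>m1<TAB>m2...", then one row per feature.
#   METAG:   header "sample<TAB>f1<TAB>f2...", then one row per sample.
# Features absent from WEIGHTS get weight 0 (i.e. not selected).
apply_weights() {
  awk -F'\t' -v OFS='\t' '
    # First file: load the weight matrix into w[feature, metabolite_index].
    NR == FNR {
      if (FNR == 1) { nm = NF - 1; for (j = 2; j <= NF; j++) met[j - 1] = $j }
      else          { for (j = 2; j <= NF; j++) w[$1, j - 1] = $j }
      next
    }
    # Second file, header row: remember feature names, print metabolite names.
    FNR == 1 {
      for (i = 2; i <= NF; i++) feat[i] = $i
      printf "sample"
      for (j = 1; j <= nm; j++) printf "%s%s", OFS, met[j]
      print ""
      next
    }
    # Each sample row: predicted[m] = sum over f of abundance[f] * w[f, m].
    {
      printf "%s", $1
      for (j = 1; j <= nm; j++) {
        s = 0
        for (i = 2; i <= NF; i++) s += $i * w[feat[i], j]
        printf "%s%g", OFS, s
      }
      print ""
    }' "$1" "$2"
}

# Toy data (hypothetical values).
tmp=$(mktemp -d)
printf 'sample\tf1\tf2\tf3\nS1\t0.25\t0.75\t0\nS2\t0.5\t0.25\t0.25\n' > "$tmp/metag.txt"
printf 'feature\tm1\tm2\nf1\t2\t0\nf2\t0\t4\n' > "$tmp/weights.txt"

apply_weights "$tmp/weights.txt" "$tmp/metag.txt"
# Expected output (tab-separated):
# sample  m1   m2
# S1      0.5  3
# S2      1    1
```

awk reads the weight file first (`NR == FNR`), so the whole matrix is in memory before any sample is scored. Feature `f3` has no row in the weight table and therefore contributes nothing, which mirrors the idea that only the selected subset of sequence features enters the model.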
Requirements
R software (version >= 3.5.0)
There are two options for installing MelonnPan:
- Within R
- Directly from GitHub
From Within R
You can install melonnpan using the devtools package in R:
devtools::install_github("biobakery/melonnpan")
From GitHub (Directly)
Clone the repository using git clone, which downloads the package into its own directory called melonnpan:
git clone https://github.com/biobakery/melonnpan.git
Then, install MelonnPan using R CMD INSTALL:
R CMD INSTALL melonnpan
How to Run
Basic Usage
- MelonnPan-Predict:
To predict metabolite composition from metagenomes (stored in “metag.txt”) using the default model, enter the following in the command prompt:
$ Rscript predict_metabolites.R --metag="metag.txt" -o $OUTPUT_DIR
$OUTPUT_DIR = the output directory
The output will be:
Predicted metabolite relative abundances, along with a Representative Training Similarity Index (RTSI) score for each sample.
- MelonnPan-Train:
If you want to train a new MelonnPan model (i.e. one different from the default trained model) using your own paired metabolite and microbial sequencing data (ideally measured from the same biospecimens), stored in “metab.txt” and “metag.txt”, respectively, run the following command:
$ Rscript train_metabolites.R --metab="metab.txt" --metag="metag.txt" -o $OUTPUT_DIR
This workflow takes the two tab-delimited text files above as inputs. In each file, each column is a feature (a metabolite compound in “metab.txt”, a microbial sequence feature in “metag.txt”) and each row is a sample. The two tables must contain exactly the same samples (rows), and their values should be normalized relative abundances (i.e. proportions between 0 and 1).
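Because MelonnPan-Train expects proportions rather than raw counts, and the same samples in both tables, a quick pre-flight check before a computationally intensive training run can be useful. The following is a minimal POSIX shell/awk sketch, assuming (beyond what this README states) that the first column of each file holds sample IDs and the first row holds feature names. The function names `normalize_rows` and `same_samples` and the file `metag_counts.txt` are hypothetical. The sample check also requires the same row order, which may be stricter than MelonnPan needs.

```shell
#!/bin/sh
# Minimal sketch for preparing MelonnPan-Train inputs. ASSUMPTION (not stated
# in the README): column 1 holds sample IDs and row 1 holds feature names.
set -eu

# normalize_rows TABLE (hypothetical helper): rescale each sample (row)
# so that its feature values sum to 1.
normalize_rows() {
  awk -F'\t' -v OFS='\t' '
    FNR == 1 { print; next }   # pass the header row through unchanged
    {
      total = 0
      for (i = 2; i <= NF; i++) total += $i
      printf "%s", $1
      for (i = 2; i <= NF; i++) {
        v = (total > 0) ? $i / total : 0   # all-zero rows stay all zero
        printf "%s%g", OFS, v
      }
      print ""
    }' "$1"
}

# same_samples A B (hypothetical helper): succeed only if both tables list
# the same sample IDs in the same order (header row skipped).
same_samples() {
  [ "$(tail -n +2 "$1" | cut -f1)" = "$(tail -n +2 "$2" | cut -f1)" ]
}

# Toy data (hypothetical values): raw metagenomic counts, metabolite proportions.
tmp=$(mktemp -d)
printf 'sample\tf1\tf2\nS1\t10\t30\nS2\t5\t5\n' > "$tmp/metag_counts.txt"
printf 'sample\tm1\tm2\nS1\t0.2\t0.8\nS2\t0.6\t0.4\n' > "$tmp/metab.txt"

normalize_rows "$tmp/metag_counts.txt" > "$tmp/metag.txt"
cat "$tmp/metag.txt"   # S1 -> 0.25 0.75, S2 -> 0.5 0.5
if same_samples "$tmp/metab.txt" "$tmp/metag.txt"; then
  echo "sample IDs match"
else
  echo "sample IDs differ" >&2
fi
```

Dividing each row by its own total is the usual way to turn per-sample counts into proportions between 0 and 1; whether MelonnPan applies any further transformation internally is not covered by this sketch.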
Note: This step is computationally intensive, so we recommend running it with multiple cores to reduce run time.
$OUTPUT_DIR = the output directory
The outputs will be:
Predicted metabolite relative abundances, along with the trained weights and a predictability score for each metabolite.