Melonnpan – The Huttenhower Lab

MelonnPan

Model-based Genomically Informed High-dimensional Predictor of Microbial Community Metabolic Profiles

MelonnPan is a computational method for predicting metabolite composition from microbiome sequencing data.

For more information on the technical aspects, see the publication cited below.

Citation:
Himel Mallick, Eric A. Franzosa, Lauren J. McIver, Soumya Banerjee, Alexandra Sirota-Madi, Aleksandar D. Kostic, Clary B. Clish, Hera Vlamakis, Ramnik Xavier, Curtis Huttenhower (2019). Predictive metabolomic profiling of microbial communities using amplicon or metagenomic sequences. Nature Communications 10(1):3136-3146.

OVERVIEW

MelonnPan is composed of two high-level workflows: MelonnPan-Predict and MelonnPan-Train.

The MelonnPan-Predict workflow takes a table of microbial sequence features as input (i.e. taxonomic or functional abundances on a per-sample basis) and outputs a predicted metabolomic table (i.e. relative abundances of metabolite compounds across samples).

The MelonnPan-Train workflow creates a weight matrix that links an optimal set of sequence features to the subset of metabolites found to be predictable under rigorous internal validation. This weight matrix is then used to generate a table of predicted metabolite compounds (i.e. relative abundances of metabolite compounds per sample). When sufficiently accurate, these predicted metabolite relative abundances can be used for downstream statistical analysis and end-to-end biomarker discovery.

GETTING STARTED

Requirements

R software (version >= 3.5.0)
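
The version requirement above can be checked from the command line before installing. The following is a minimal sketch, not part of MelonnPan itself: the helper name version_ge is hypothetical, and it assumes a sort that supports the -V (version sort) option, as GNU coreutils does.

```shell
# Sketch: confirm that the installed R meets the stated minimum (3.5.0).
# Assumption: sort supports -V (version sort), as in GNU coreutils.

# Succeed when version $1 is greater than or equal to version $2.
# (version_ge is a hypothetical helper name, not a MelonnPan command.)
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v Rscript >/dev/null 2>&1; then
  # getRversion() reports the version of the running R installation.
  r_version="$(Rscript -e 'cat(as.character(getRversion()))')"
  if version_ge "$r_version" "3.5.0"; then
    echo "R $r_version meets the requirement"
  else
    echo "R $r_version is older than 3.5.0" >&2
  fi
else
  echo "Rscript not found on PATH" >&2
fi
```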

There are two options for installing MelonnPan:

Within R
Directly from GitHub

From Within R

You can install MelonnPan using the devtools package in R (if devtools is not yet installed, run install.packages("devtools") first):

 devtools::install_github("biobakery/melonnpan")

From GitHub (Directly)

Clone the repository using git clone, which downloads the package as its own directory called melonnpan.

git clone https://github.com/biobakery/melonnpan.git

Then, install MelonnPan using R CMD INSTALL.

R CMD INSTALL melonnpan

How to Run

Basic Usage

MelonnPan-Predict:
To predict metabolite composition from metagenomes (stored in “metag.txt”) using the default model, enter the following at the command prompt:

$ Rscript predict_metabolites.R --metag="metag.txt" -o $OUTPUT_DIR

$OUTPUT_DIR = the output directory
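
The call above can be wrapped in a few guards so that a missing input or script is reported before R starts. This is a sketch under stated assumptions: predict_metabolites.R and metag.txt are taken to be in the current directory, and the default directory name melonnpan_output is a hypothetical placeholder. Whether the script creates the output directory on its own is not stated here, so creating it first is only a precaution.

```shell
# Sketch of a guarded MelonnPan-Predict run.
# Assumptions: predict_metabolites.R and metag.txt sit in the current
# directory; "melonnpan_output" is a hypothetical default directory name.
OUTPUT_DIR="${OUTPUT_DIR:-melonnpan_output}"

# Create the output directory up front (harmless if it already exists).
mkdir -p "$OUTPUT_DIR"

if [ -f metag.txt ] && [ -f predict_metabolites.R ] \
   && command -v Rscript >/dev/null 2>&1; then
  # Same command as shown above, with the directory variable quoted.
  Rscript predict_metabolites.R --metag="metag.txt" -o "$OUTPUT_DIR"
else
  echo "metag.txt, predict_metabolites.R, or Rscript is missing; nothing was run" >&2
fi
```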

The output file will contain:

Predicted metabolite relative abundances, along with a Representative Training Similarity Index (RTSI) score for each sample.

MelonnPan-Train:
To train a new MelonnPan model (i.e. one different from the default trained model) using your own paired metabolite and microbial sequencing data (possibly measured from the same biospecimen), stored in “metab.txt” and “metag.txt” respectively, run the following command:

$ Rscript train_metabolites.R --metab="metab.txt" --metag="metag.txt" -o $OUTPUT_DIR

This workflow takes two tab-delimited text files as inputs (as above). In each file, each column describes a feature (i.e. a metabolite compound or a metagenomic sequence feature) and each row represents a sample. The two tables should contain exactly the same samples (rows), with values given as normalized relative abundances (i.e. proportional data between 0 and 1).
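
Because training is expensive, it can help to check both input tables against the requirements above before starting. The sketch below is illustrative and not part of MelonnPan: the function names are hypothetical, and it assumes that the first row of each file is a header of feature names and the first column holds sample IDs, neither of which is stated here. It also compares sample IDs in file order, which may be stricter than required.

```shell
# Sketch: sanity-check MelonnPan-Train inputs before launching training.
# Assumptions (not stated in the text): row 1 is a header of feature
# names and column 1 holds sample IDs. All function names are hypothetical.

# Print the sample IDs (first column, header skipped) in file order.
sample_ids() {
  awk -F '\t' 'NR > 1 { print $1 }' "$1"
}

# Succeed only if every data value is a number between 0 and 1.
values_in_unit_interval() {
  awk -F '\t' '
    NR > 1 {
      for (i = 2; i <= NF; i++) {
        if ($i !~ /^[0-9.eE+-]+$/ || $i + 0 < 0 || $i + 0 > 1) {
          printf("%s: row %d, column %d: bad value [%s]\n", FILENAME, NR, i, $i) > "/dev/stderr"
          bad = 1
        }
      }
    }
    END { exit bad }
  ' "$1"
}

# Succeed only if both tables pass the range check and list the same
# sample IDs in the same order. $1 = metabolite table, $2 = metagenomic table.
check_training_inputs() {
  values_in_unit_interval "$1" || return 1
  values_in_unit_interval "$2" || return 1
  if [ "$(sample_ids "$1")" != "$(sample_ids "$2")" ]; then
    echo "sample IDs differ between $1 and $2" >&2
    return 1
  fi
}

# Example use:
#   check_training_inputs metab.txt metag.txt && \
#     Rscript train_metabolites.R --metab="metab.txt" --metag="metag.txt" -o "$OUTPUT_DIR"
```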

Note: This step is computationally intensive, and we recommend running it with multiple cores for best results.

$OUTPUT_DIR = the output directory

The output file will contain:

Predicted metabolite relative abundances, along with the trained weights and a predictability score for each metabolite.