FUGAsseM

FUGAsseM (Function predictor of Uncharacterized Gene products by Assessing high-dimensional community data in Microbiomes) is a computational tool based on a “guilt by association” approach to predict functions of novel gene products in the context of microbial communities. It uses machine learning methods to predict functions of microbial proteins by integrating multiple types of community-wide data.

Citation:

Yancong Zhang, Amrisha Bhosle, Sena Bae, Kelly Eckenrode, Xueying Huang, Jingjing Tang, Danylo Lavrentovich, Lana Awad, Ji Hua, Ya Wang, Xochitl C. Morgan, Bin Li, Andy Krueger, Wendy S. Garrett, Eric A. Franzosa, Curtis Huttenhower. "Predicting functions of uncharacterized gene products from microbial communities" [In submission].

Until the paper is published, please include the software link in your Methods section when citing FUGAsseM:

http://huttenhower.sph.harvard.edu/fugassem

For more detailed information about the software, see the FUGAsseM User Manual and the FUGAsseM Tutorial.

Overview

FUGAsseM uses a “guilt-by-association” approach organized in two layers. In the first layer, it builds an individual classifier for each data type, which in effect assigns a weight to each type of evidence. In the second layer, it builds an ensemble classifier by integrating the weighted learning results from the first layer. This layered learning and prediction process integrates different sources of functional information while simultaneously assigning weights to each data type for the final predictions.
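The layered idea can be illustrated with a toy sketch. This is not FUGAsseM's implementation: the first layer here is a plain 1-nearest-neighbour predictor per evidence type, and the second layer is an accuracy-weighted average rather than a trained ensemble classifier. All function names, evidence names, and data values are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def layer1_predict(features, labels, query, exclude=None):
    """Layer 1: return the label of the nearest annotated protein."""
    candidates = [p for p in labels if p != exclude and p in features]
    nearest = min(candidates, key=lambda p: sq_dist(features[p], features[query]))
    return labels[nearest]

def evidence_weight(features, labels):
    """Weight one evidence type by its leave-one-out accuracy on annotated proteins."""
    hits = sum(layer1_predict(features, labels, p, exclude=p) == labels[p]
               for p in labels)
    return hits / len(labels)

def ensemble_score(evidence, labels, query):
    """Layer 2 (simplified): accuracy-weighted average of the layer-1 outputs."""
    weights = {name: evidence_weight(f, labels) for name, f in evidence.items()}
    total = sum(weights.values())
    if total == 0:
        return 0.5  # no evidence type is informative; stay undecided
    return sum(weights[name] * layer1_predict(f, labels, query)
               for name, f in evidence.items()) / total

# Hypothetical toy data: proteins A-D are annotated (1 = has the function),
# X is the uncharacterized protein to score.
labels = {"A": 1, "B": 1, "C": 0, "D": 0}
evidence = {
    "coexpression": {"A": [1.0, 0.9], "B": [0.9, 1.0], "C": [0.1, 0.0],
                     "D": [0.0, 0.1], "X": [0.8, 0.8]},
    "uninformative": {"A": [0.0], "B": [1.0], "C": [0.1], "D": [0.9], "X": [0.12]},
}
score_x = ensemble_score(evidence, labels, "X")
```

In this toy data the coexpression evidence separates the annotated proteins perfectly while the second evidence type does not, so the final score for X is driven by coexpression alone.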

Installation

Requirements

  1. Python (version >= 3.7; tested with 3.7), requiring the numpy, pandas, multiprocessing, sklearn, matplotlib, scipy, goatools, and statistics Python packages
  2. AnADAMA2 (version >= 0.8.0; tested with 0.8.0)

Install FUGAsseM

Use any one of the following options to install the FUGAsseM package.

Option 1: Installing with conda

  • $ conda install -c biobakery fugassem

Option 2: Installing with pip

  • $ pip install fugassem
  • If you do not have write permissions to /usr/lib/, add the --user option to the install command. This installs the Python package into subdirectories of ~/.local/. On some platforms you may also need to add ~/.local/bin/ to your $PATH, as it might not be included by default. You will know it needs to be added if you see the message fugassem: command not found when trying to run FUGAsseM after installing with the --user option.
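A quick way to check whether the installed command is reachable is sketched below. It uses only the Python standard library; the function name diagnose_cli is hypothetical, and the ~/.local/bin location is the one mentioned above for --user installs.

```python
import os
import shutil

def diagnose_cli(command="fugassem", environ=None):
    """Report whether `command` is on PATH and, if not, whether ~/.local/bin is missing."""
    environ = os.environ if environ is None else environ
    path = environ.get("PATH", "")
    found = shutil.which(command, path=path)
    if found:
        return f"{command} found at {found}"
    # Directory where `pip install --user` typically places console scripts
    user_bin = os.path.join(os.path.expanduser("~"), ".local", "bin")
    if user_bin not in path.split(os.pathsep):
        return f"{command} not found; consider adding {user_bin} to PATH"
    return f"{command} not found even though {user_bin} is on PATH"
```

Calling print(diagnose_cli()) after installation gives a one-line summary for the current environment.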
How to run

FUGAsseM is typically run once per dataset.

  • For a list of command line options, run:
    $ fugassem --help
    This command yields:
    usage: fugassem_workflow.py [-h] [--version]
                        [--taxon-level {MSP,Terminal,Species,Genus,Family,
                        Order,Class,Phylum}]
            ...
  • Run the canonical function prediction of FUGAsseM, which requires MTX abundance data and annotation of protein families for function prediction.
    • Input files:
    • Demo run of FUGAsseM-MTX model
      $ fugassem --basename $BASENAME \
      --input $INPUT_MTX \
      --input-annotation $INPUT_annotation \
      --output $OUTPUT_DIR
      • $INPUT_MTX = the protein families MTX abundances file (TSV format)
      • $INPUT_annotation = raw GO annotations for some of these protein families (TSV format)
      • $OUTPUT_DIR = the output folder
      • Output files will be created named with $BASENAME:
        1. $OUTPUT_DIR/merged/$BASENAME.finalized_ML.prediction.tsv: this file combines the finalized predictions from all taxa by using machine learning approaches based on MTX coexpression patterns (TSV format).
        2. Prediction files for each taxon will also be created. For example, the finalized predictions using MTX-coexpression evidence per taxon are in the file: $OUTPUT_DIR/main/$TAXON_NAME/prediction/finalized/$BASENAME.$TAXON_NAME.finalized_ML.prediction.tsv
  • Run the integrated function prediction workflow of FUGAsseM. When other community-wide data are available, FUGAsseM can predict functions by integrating multiple pieces of evidence. The additional steps in this workflow are 1) building individual machine learning classifiers for each type of evidence, including coexpression as discussed above, and 2) integrating them to generate an ensemble classifier for final function prediction.
    • Input files
    • Demo run of FUGAsseM-full model
      $ fugassem --basename $BASENAME \
      --input $INPUT_MTX \
      --input-annotation $INPUT_annotation \
      --vector-list $VECTOR_list --matrix-list $METRIX_list \
      --output $OUTPUT_DIR
      • $INPUT_MTX = the protein families MTX abundances file (TSV format)
      • $INPUT_annotation = raw GO annotations for some of these protein families (TSV format)
      • $VECTOR_list = file names of vector-based evidence data, provided as a string of 'file1,file2', semi-colon delimited for multiple files.
      • $METRIX_list = file names of matrix-based evidence data, provided as a string of 'file1,file2', semi-colon delimited for multiple files.
      • $OUTPUT_DIR = the output folder
      • Output files will be created named with $BASENAME:
        1. $OUTPUT_DIR/merged/$BASENAME.finalized_ML.prediction.tsv: this file combines the finalized predictions from all taxa by using machine learning approaches based on MTX coexpression patterns (TSV format).
        2. $OUTPUT_DIR/merged/$BASENAME.$EVIDENCE_TYPE_ML.prediction.tsv (where $EVIDENCE_TYPE = the basename of each piece of evidence): this file includes the combined predictions based on each individual type of evidence (TSV format).
        3. Prediction files for each taxon will also be created:
          • FUGAsseM predicts functions based on input evidence data.
          • The finalized prediction results using integrated evidence per taxon are in the file: $OUTPUT_DIR/main/$TAXON_NAME/prediction/finalized/$BASENAME.$TAXON_NAME.finalized_ML.prediction.tsv.
          • The prediction results using individual evidence per taxon are in the file: $OUTPUT_DIR/$TAXON_NAME/prediction/$EVIDENCE_TYPE/$BASENAME.$TAXON_NAME.$EVIDENCE_TYPE_ML.prediction.tsv (where $EVIDENCE_TYPE = the basename of each piece of evidence).
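For scripted runs, the commands and output locations above can be assembled programmatically. The sketch below only builds the argument list and the expected paths of the finalized predictions; it uses the flags and path patterns shown above, while the helper names are hypothetical. The evidence list strings are passed through unchanged, so their delimiter convention should be checked against fugassem --help.

```python
from pathlib import Path

def build_command(basename, mtx, annotation, output_dir,
                  vector_list=None, matrix_list=None):
    """Assemble the fugassem argument list; evidence lists are optional."""
    cmd = ["fugassem", "--basename", basename,
           "--input", str(mtx),
           "--input-annotation", str(annotation)]
    if vector_list:
        cmd += ["--vector-list", vector_list]
    if matrix_list:
        cmd += ["--matrix-list", matrix_list]
    cmd += ["--output", str(output_dir)]
    return cmd

def merged_prediction(output_dir, basename):
    """Expected path of the finalized predictions merged across all taxa."""
    return Path(output_dir) / "merged" / f"{basename}.finalized_ML.prediction.tsv"

def taxon_prediction(output_dir, basename, taxon):
    """Expected path of the finalized predictions for a single taxon."""
    return (Path(output_dir) / "main" / taxon / "prediction" / "finalized"
            / f"{basename}.{taxon}.finalized_ML.prediction.tsv")
```

The returned list can be passed to subprocess.run(cmd, check=True), after which merged_prediction(...).exists() is a simple check that the run produced its main output.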
Download FUGAsseM results

Download precomputed annotations predicted by FUGAsseM for microbial communities. For more information, see our paper "Predicting functions of uncharacterized gene products from microbial communities" [In submission].

Applying FUGAsseM to 1,595 HMP2 metagenomes and 800 metatranscriptomes

We provide FUGAsseM's predicted functions for the protein families assembled from the Integrative Human Microbiome Project (HMP2), Inflammatory Bowel Disease Multi'omics Database (IBDMDB). The predicted functions cover all categories of the Gene Ontology (i.e., Biological Process, Molecular Function, and Cellular Component).