HUMAnN: The HMP Unified Metabolic Analysis Network

You can obtain the HUMAnN software here:


This is the latest version, which provided the analysis for all metagenomic shotgun data from the Human Microbiome Project. If you find the software or data useful, please cite our manuscript:

Abubucker S, Segata N, Goll J, Schubert AM, Izard J, Cantarel BL, Rodriguez-Mueller B, Zucker J, Thiagarajan M, Henrissat B, White O, Kelley ST, Methé B, Schloss PD, Gevers D, Mitreva M, Huttenhower C. "Metabolic reconstruction for metagenomic data and its application to the human microbiome." PLoS Comput Biol. 2012 Jun;8(6):e1002358

Please contact us at the HUMAnN Google Group if you have any comments, suggestions, or bug reports for the software. Code is also available directly from our Mercurial source code repository using the hg clone command.

If you would like to be notified about new versions, new features, or any other news related to HUMAnN please join our mailing list: the HUMAnN Google Group.

HUMAnN is a pipeline for efficiently and accurately determining the presence/absence and abundance of microbial pathways in a community from metagenomic data. Sequencing a metagenome typically produces millions of short DNA/RNA reads. HUMAnN takes these reads as inputs and produces gene and pathway summaries as outputs:

  • The abundance of each orthologous gene family in the community. Orthologous families are groups of genes that perform roughly the same biological roles. HUMAnN uses the KEGG Orthology (KO) by default, but any catalog of orthologs (COG, NOG, etc.) can be employed with minor changes.
  • The presence/absence of each pathway in the community. HUMAnN refers to pathway presence/absence as "coverage," and defines a pathway as a set of two or more genes. HUMAnN uses KEGG pathways and modules by default, but again can easily be modified to use GO terms or other gene sets.
  • The abundance of each pathway in the community, i.e. how many "copies" of that pathway are present.

HUMAnN can thus be used in tandem with any translated BLAST program, with out-of-the-box support for NCBI BLAST, USEARCH, MBLASTX, and MAPX. The pipeline converts sequence reads into coverage and abundance tables summarizing the gene families and pathways in one or more microbial communities. This lets you analyze a collection of metagenomes as a matrix of gene/pathway abundances, just like you might analyze a collection of microarrays.
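As a rough illustration of the "matrix of abundances" view, the sketch below merges per-sample gene family tables into one feature-by-sample matrix and rescales each sample to relative abundance. It is not part of HUMAnN: the function names, the sample names, and the KO values are hypothetical, and real HUMAnN output files would first need to be parsed into dictionaries of this shape.

```python
def merge_abundance_tables(per_sample):
    """Combine {sample: {feature: abundance}} into (features, samples, matrix).

    Rows are features (e.g. KO identifiers), columns are samples.
    Features missing from a sample are filled with 0.0.
    """
    samples = sorted(per_sample)
    features = sorted({f for table in per_sample.values() for f in table})
    matrix = [[per_sample[s].get(f, 0.0) for s in samples] for f in features]
    return features, samples, matrix


def to_relative(matrix):
    """Scale each column (sample) so that its entries sum to 1."""
    n_cols = len(matrix[0]) if matrix else 0
    totals = [sum(row[j] for row in matrix) for j in range(n_cols)]
    return [
        [row[j] / totals[j] if totals[j] else 0.0 for j in range(n_cols)]
        for row in matrix
    ]


# Hypothetical per-sample abundances, invented for illustration only.
per_sample = {
    "stool": {"K00001": 3.0, "K00002": 1.0},
    "tongue": {"K00002": 2.0, "K00003": 2.0},
}

features, samples, matrix = merge_abundance_tables(per_sample)
relative = to_relative(matrix)

for feature, row in zip(features, relative):
    print(feature, "\t".join(f"{value:.2f}" for value in row), sep="\t")
```

Once the data are in this shape, each column can be treated like one array in a microarray study, so standard clustering or differential-abundance methods apply.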

We are aware that KEGG is now commercial, and we have updated HUMAnN accordingly. In brief, we include the derived files and information needed for normal HUMAnN operation, but creation and evaluation of synthetic metagenomes is impeded without a KEGG license. Please contact the KEGG developers if this is an inconvenience for you, or contact us at the HUMAnN Google Group for assistance in evaluating HUMAnN output if necessary.

Many thanks to the NIH and to the entire Human Microbiome Project team for making the HMP possible, and to the many collaborators who helped to make HUMAnN a reality. Sahar Abubucker and Makedonka Mitreva (Washington University) co-led the Metabolic Reconstruction group, and Nicola Segata (Harvard School of Public Health) performed many HMP-specific analyses. The pipeline incorporates software from Yuzhen Ye (Indiana University), Beltran Rodriguez-Mueller (SDSU), and Pat Schloss (University of Michigan). Specific contributors include Alyx Schubert (University of Michigan), Jeremy Zucker (Broad Institute), Brandi Cantarel (UMD), Qiandong Zeng (Broad Institute), Johannes Goll (JCVI), and many others.

An overview of HUMAnN


Metabolic modules differentially abundant in one or more body sites of the human microbiome


Synthetic mock communities for validation

We generated four synthetic metagenomes to aid in evaluating HUMAnN's predictive accuracy: two high-complexity (HC, 100 organisms) metagenomes called HC1 and HC2, and two low-complexity (LC, 20 organisms) metagenomes called LC1 and LC2. HC1 and LC1 have even distributions (all organisms present at equal abundance), while HC2 and LC2 have staggered distributions (organisms have random, log-normally distributed abundances). Organisms included in the LC metagenomes were manually selected from KEGG v54-curated reference genomes associated with the human microbiome, while organisms included in the HC metagenomes were randomly selected from all manually curated bacterial genomes.
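The difference between the even and staggered designs can be sketched as follows. This is a minimal illustration, not the code used to build the actual communities: the function names are hypothetical, and the log-normal parameters (mu, sigma) and random seeds are assumptions, since the parameters used for HC2 and LC2 are not stated here.

```python
import random


def even_abundances(n):
    """Every organism is present at the same relative abundance."""
    return [1.0 / n] * n


def staggered_abundances(n, mu=0.0, sigma=1.0, seed=None):
    """Relative abundances drawn from a log-normal distribution.

    mu and sigma are illustrative defaults (an assumption), not the
    values used for the published synthetic metagenomes.
    """
    rng = random.Random(seed)
    draws = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


# Community sizes follow the text: HC = 100 organisms, LC = 20 organisms.
designs = {
    "HC1": even_abundances(100),
    "HC2": staggered_abundances(100, seed=1),
    "LC1": even_abundances(20),
    "LC2": staggered_abundances(20, seed=2),
}

for name, abundances in designs.items():
    print(name, len(abundances), f"max={max(abundances):.4f}", f"min={min(abundances):.4f}")
```

In the even designs every organism contributes the same share of reads, while in the staggered designs a few organisms dominate and many are rare, which is the harder case for detecting low-abundance pathways.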

Latest Versions

    v0.99, 10-2-2013

    • Completely revamped pathway coverage calculation, much more accurate for low-abundance events (thanks to Kat Huang, Sean Sykes!)
    • Fixed handling of empty hits files (thanks to Pavan Kumar!)
    • Fixed missing KO gloss annotations in merged 01b*.txt per-gene tabular abundance quantification output files.
    • Added tab-delimited input file formats.
    • Added GraPhlAn tree output files to enable visualization of abundance overlays on KEGG hierarchies (thanks to Jovian Yu, Morgan Paull!).
    • Added preliminary organism-specific output generation.

    v0.98, 12-06-11

    • Allow removal of unusual duplicate enzymes from KEGG's files
    • Allow input filenames to contain underscores
    • Fix module size calculation in
    • Fix a bug in to allow a wider range of KEGG gene name detection

    v0.971, 10-17-11

    • Fix missing (thanks to Brandi Cantarel!)

    v0.97, 10-17-11

    • Add several internal evaluation pipelines in response to initial reviews
    • Fix hits2*.py handling of zero/very small e-values (thanks to Fah Sathira!)

    v0.96, 07-28-11

    • MAJOR CHANGE: KEGG is now defunct, and HUMAnN has been updated accordingly
      • KEGG derived information needed for normal operation is included
      • KEGG files needed for synthetic metagenome construction are _not_ included
      • "Frozen" synthetic metagenome evaluation is still possible
      • Please contact us directly for more information if needed
    • Add documentation on potential maq issues (thanks to Shinichi Sunagawa!)
    • Fix a typo in formatting (thanks to Shinichi Sunagawa!)
    • Fix a typo in formatting (thanks to Kathryn Iverson!)
    • Fix a typo in for overly sparse input files (thanks to Jeffrey Werner!)
    • Work around Mac OS X zcat issues (thanks to Jeffrey Werner!)

    v0.95, 05-18-11

    • Fix a typo in (only affected unused filter option)
    • Add complete parameter evaluation process to HMP pipeline