humann1 – The Huttenhower Lab

HUMAnN 1.0 The HMP Unified Metabolic Analysis Network

You can obtain the HUMAnN 1.0 software here:

0.99b.tar.gz

This is the latest version, which provided the analysis for all metagenomic shotgun data from the Human Microbiome Project. If you find the software or data useful, please cite our manuscript:

Abubucker S, Segata N, Goll J, Schubert AM, Izard J, Cantarel BL, Rodriguez-Mueller B, Zucker J, Thiagarajan M, Henrissat B, White O, Kelley ST, Methé B, Schloss PD, Gevers D, Mitreva M, Huttenhower C. "Metabolic reconstruction for metagenomic data and its application to the human microbiome." PLoS Comput Biol. 2012 Jun;8(6):e1002358.

Please contact us at the HUMAnN category in the bioBakery Forum if you have any comments, suggestions, or bug reports for the software. The code is also available directly from our GitHub source code repository at https://github.com/biobakery/humann_legacy using the git clone command.

If you would like to be notified about new versions, new features, or any other news related to HUMAnN, please refer to the HUMAnN category in the bioBakery Forum.
Additionally, a read-only HUMAnN Google Group is available here.

HUMAnN 1.0 is a pipeline for efficiently and accurately determining the presence/absence and abundance of microbial pathways in a community from metagenomic data. Sequencing a metagenome typically produces millions of short DNA/RNA reads. HUMAnN 1.0 takes these reads as inputs and produces gene and pathway summaries as outputs:

The abundance of each orthologous gene family in the community. Orthologous families are groups of genes that perform roughly the same biological roles. HUMAnN 1.0 uses the KEGG Orthology (KO) by default, but any catalog of orthologs (COG, NOG, etc.) can be employed with minor changes.
The presence/absence of each pathway in the community. HUMAnN 1.0 refers to pathway presence/absence as "coverage," and defines a pathway as a set of two or more genes. HUMAnN 1.0 uses KEGG pathways and modules by default, but again can easily be modified to use GO terms or other gene sets.
The abundance of each pathway in the community, i.e. how many "copies" of that pathway are present.
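
To make the three outputs concrete, here is a minimal Python sketch computed on toy data. It is an illustration only and not HUMAnN's actual method: the read identifiers, the pathway names `pathA` and `pathB`, the pathway memberships, and the simple counting and averaging rules are all assumptions made for this example. The real pipeline's coverage calculation in particular is more involved (it was completely revamped in v0.99).

```python
from collections import defaultdict

# Hypothetical toy input: (read_id, KO_id) pairs from a translated search.
hits = [
    ("read1", "K00001"),
    ("read2", "K00001"),
    ("read3", "K00002"),
    ("read4", "K00003"),
]

# Hypothetical pathway definitions: each pathway is a set of two or more KOs.
pathways = {
    "pathA": {"K00001", "K00002"},
    "pathB": {"K00003", "K00004", "K00005"},
}


def gene_family_abundance(hits):
    """Count hits per gene family and normalize to relative abundance."""
    counts = defaultdict(float)
    for _read, ko in hits:
        counts[ko] += 1.0
    total = sum(counts.values())
    return {ko: n / total for ko, n in counts.items()}


def pathway_coverage(abundance, pathways):
    """Fraction of each pathway's gene families that were observed at all."""
    return {
        name: sum(1 for ko in kos if abundance.get(ko, 0.0) > 0) / len(kos)
        for name, kos in pathways.items()
    }


def pathway_abundance(abundance, pathways):
    """Mean abundance over a pathway's gene families (absent ones count as 0)."""
    return {
        name: sum(abundance.get(ko, 0.0) for ko in kos) / len(kos)
        for name, kos in pathways.items()
    }


genes = gene_family_abundance(hits)
coverage = pathway_coverage(genes, pathways)
abundance = pathway_abundance(genes, pathways)
```

In this toy case `pathA` is fully covered because both of its gene families received hits, while `pathB` is only one-third covered. This shows why coverage and abundance are reported separately: a pathway can be partially present yet still accumulate reads on the genes that are observed.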

HUMAnN 1.0 can thus be used in tandem with any translated BLAST program, with out-of-the-box support for NCBI BLAST, USEARCH, MBLASTX, and MAPX. The pipeline converts sequence reads into coverage and abundance tables summarizing the gene families and pathways in one or more microbial communities. This lets you analyze a collection of metagenomes as a matrix of gene/pathway abundances, just like you might analyze a collection of microarrays.
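
The "matrix" view can be sketched in a few lines of Python. This is a hedged illustration rather than HUMAnN's own merging code: the function `build_matrix`, the sample names, and the abundance values are hypothetical. It simply aligns per-sample tables on a shared feature list and fills in zero where a sample lacks a feature.

```python
def build_matrix(samples):
    """Merge per-sample {feature: abundance} dicts into a feature-by-sample matrix."""
    sample_names = sorted(samples)
    # Union of all features seen in any sample, in a stable order.
    features = sorted({f for table in samples.values() for f in table})
    rows = [
        [samples[s].get(f, 0.0) for s in sample_names]  # 0.0 if feature absent
        for f in features
    ]
    return features, sample_names, rows


# Hypothetical per-sample gene family abundances.
samples = {
    "sampleA": {"K00001": 0.5, "K00002": 0.5},
    "sampleB": {"K00001": 0.2, "K00003": 0.8},
}

features, sample_names, matrix = build_matrix(samples)

# Print a tab-delimited table: one row per feature, one column per sample.
print("feature", *sample_names, sep="\t")
for feature, row in zip(features, matrix):
    print(feature, *row, sep="\t")
```

Rows are features and columns are samples, the same layout commonly used for expression matrices, so standard downstream tools for clustering or differential analysis can be applied to the result.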

We are aware that KEGG is now commercial, and we have updated HUMAnN 1.0 accordingly. In brief, we include the derived files and information needed for normal HUMAnN 1.0 operation, but creation and evaluation of synthetic metagenomes is impeded without a KEGG license. Please contact the KEGG developers if this is an inconvenience for you, and contact us at the HUMAnN category in the bioBakery Forum for assistance in evaluating HUMAnN output if necessary.

Many thanks to the NIH and to the entire Human Microbiome Project team for making the HMP possible, and to the many collaborators who helped make HUMAnN a reality. Sahar Abubucker and Makedonka Mitreva (Washington University) co-led the Metabolic Reconstruction group, and Nicola Segata (Harvard School of Public Health) performed many HMP-specific analyses. The pipeline incorporates software from Yuzhen Ye (Indiana University), Beltran Rodriguez-Mueller (SDSU), and Pat Schloss (University of Michigan). Specific contributors include Alyx Schubert (University of Michigan), Jeremy Zucker (Broad Institute), Brandi Cantarel (UMD), Qiandong Zeng (Broad Institute), Johannes Goll (JCVI), and many others.

Synthetic mock communities for validation

We generated four synthetic metagenomes to aid in evaluating HUMAnN's predictive accuracy: two high-complexity (HC, 100 organisms) metagenomes called HC1 and HC2, and two low-complexity (LC, 20 organisms) metagenomes called LC1 and LC2. HC1 and LC1 have even distributions (all organisms present at equal abundance), while HC2 and LC2 have staggered distributions (organisms have random, log-normally distributed abundances). Organisms included in the LC metagenomes were manually selected from KEGG v54-curated reference genomes associated with the human microbiome, while organisms included in the HC metagenomes were randomly selected from all manually curated bacterial genomes.
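
The difference between the even and staggered designs can be illustrated with a short Python sketch. This is not the procedure used to build HC1, HC2, LC1, or LC2: the log-normal parameters `mu` and `sigma` and the random seed are assumptions, since the values used for the real synthetic communities are not stated here.

```python
import random


def even_abundances(n):
    """Every organism present at the same relative abundance."""
    return [1.0 / n] * n


def staggered_abundances(n, mu=0.0, sigma=1.0, seed=0):
    """Random log-normally distributed abundances, normalized to sum to 1.

    mu, sigma, and seed are illustrative assumptions, not documented values.
    """
    rng = random.Random(seed)
    raw = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    total = sum(raw)
    return [x / total for x in raw]


# Low-complexity-like communities (20 organisms).
lc_even = even_abundances(20)
lc_staggered = staggered_abundances(20)

# High-complexity-like communities (100 organisms).
hc_even = even_abundances(100)
hc_staggered = staggered_abundances(100)
```

In a staggered community a few organisms dominate while many are rare, which makes it a harder test of how well low-abundance genes and pathways are recovered.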

Latest Versions

v0.99, 10-2-2013

Completely revamped pathway coverage calculation, much more accurate for low-abundance events (thanks to Kat Huang, Sean Sykes!)
Fixed hits2metarep.py handling of empty hits files (thanks to Pavan Kumar!)
Fixed missing KO gloss annotations in merged 01b*.txt per-gene tabular abundance quantification output files.
Added tab-delimited input file formats.
Added GraPhlAn tree output files to enable visualization of abundance overlays on KEGG hierarchies (thanks to Jovian Yu, Morgan Paull!).
Added preliminary organism-specific output generation.

v0.98, 12-06-11

Allow module2modulec.py to remove unusual duplicate enzymes from KEGG's files
Allow input filenames to contain underscores
Fix module size calculation in filter.py
Fix a bug in hits2enzymes.py to allow a wider range of KEGG gene name detection

v0.971, 10-17-11

Fix missing exclude.py (thanks to Brandi Cantarel!)

v0.97, 10-17-11

Add several internal evaluation pipelines in response to initial reviews
Fix hits2*.py handling of zero/very small e-values (thanks to Fah Sathira!)

v0.96, 07-28-11

MAJOR CHANGE: KEGG is now commercial (no longer freely available), and HUMAnN 1.0 has been updated accordingly
- KEGG derived information needed for normal operation is included
- KEGG files needed for synthetic metagenome construction are _not_ included
- "Frozen" synthetic metagenome evaluation is still possible
- Please contact us directly for more information if needed
Add documentation on potential maq issues (thanks to Shinichi Sunagawa!)
Fix a typo in fastq2fasta.py formatting (thanks to Shinichi Sunagawa!)
Fix a typo in module2modulec.py formatting (thanks to Kathryn Iverson!)
Fix a typo in eco.py for overly sparse input files (thanks to Jeffrey Werner!)
Work around Mac OS X zcat issues (thanks to Jeffrey Werner!)

v0.95, 05-18-11

Fix a typo in hits2enzymes.py (only affected unused filter option)
Add complete parameter evaluation process to HMP pipeline