MetaPhlAn4 – The Huttenhower Lab

MetaPhlAn 4.0

MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea and Eukaryotes) from metagenomic shotgun sequencing data (i.e., not 16S) at species-level resolution. With StrainPhlAn, it is also possible to perform accurate strain-level microbial profiling. MetaPhlAn 4 relies on ~5.1M unique clade-specific marker genes identified from ~1M microbial genomes (~236,600 reference genomes and 771,500 metagenome-assembled genomes) spanning 26,970 species-level genome bins (SGBs, http://segatalab.cibio.unitn.it/data/Pasolli_et_al.html), 4,992 of which are taxonomically unidentified at the species level (the latest marker information file can be found here). This allows:

unambiguous taxonomic assignments
an accurate estimation of organismal relative abundance
SGB-level resolution for bacteria, archaea and eukaryotes
strain identification and tracking
orders-of-magnitude speedups compared to existing methods
metagenomic strain-level population genomics

For more information on the technical aspects, see:

User manual || Tutorial || Forum

Citation:

Extending and improving metagenomic taxonomic profiling with uncharacterized species with MetaPhlAn 4.

Aitor Blanco-Miguez, Francesco Beghini, Fabio Cumbo, Lauren J. McIver, Kelsey N. Thompson, Moreno Zolfo, Paolo Manghi, Leonard Dubois, Kun D. Huang, Andrew Maltez Thomas, Gianmarco Piccinno, Elisa Piperni, Michal Punčochář, Mireia Valles-Colomer, Adrian Tett, Francesca Giordano, Richard Davies, Jonathan Wolf, Sarah E. Berry, Tim D. Spector, Eric A. Franzosa, Edoardo Pasolli, Francesco Asnicar, Curtis Huttenhower, Nicola Segata. Preprint (2022)

If you use StrainPhlAn, please cite the MetaPhlAn paper and the following StrainPhlAn paper:

Microbial strain-level population structure and genetic diversity from metagenomes. Duy Tin Truong, Adrian Tett, Edoardo Pasolli, Curtis Huttenhower, & Nicola Segata. Genome Research 27:626-638 (2017)

Major updates in version 4.0

Adoption of the species-level genome bins system (SGBs, http://segatalab.cibio.unitn.it/data/Pasolli_et_al.html)
New MetaPhlAn marker genes identified from ~1M microbial genomes
Ability to profile 21,978 known (kSGBs) and 4,992 unknown (uSGBs) microbial species
Better representation not only of the human gut microbiome but also of many other animal and environmental habitats
Estimation of the fraction of the metagenome composed of microbes not included in the database, with the parameter --unclassified_estimation
Compatibility with MetaPhlAn 3 databases, with the parameter --mpa3
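A minimal sketch of how the unclassified-fraction option might be used from the shell, assuming single-end FASTQ input. The function names `profile_sample` and `sgb_rows` and the file names are hypothetical. The parsing step assumes the MetaPhlAn 4 output layout in which comment lines start with `#`, the third tab-separated column holds the relative abundance, SGB-level clades end in a `t__SGB<id>` element, and an `UNCLASSIFIED` row is added when `--unclassified_estimation` is set; check `metaphlan --help` and the user manual for your installed version.

```shell
#!/usr/bin/env bash
# Sketch only: function and file names are hypothetical placeholders.

# Profile one sample, also estimating the fraction of the metagenome
# coming from microbes not in the database (--unclassified_estimation).
profile_sample() {
  local reads="$1" out="$2"
  metaphlan "$reads" --input_type fastq --nproc 4 \
    --unclassified_estimation -o "$out"
}

# Print clade name and relative abundance for SGB-level rows and for the
# UNCLASSIFIED row. Assumes tab-separated output, '#' comment lines and
# relative abundance in column 3.
sgb_rows() {
  awk -F'\t' '
    /^#/ { next }
    $1 == "UNCLASSIFIED" || $1 ~ /[|]t__SGB[^|]*$/ { print $1 "\t" $3 }
  ' "$1"
}

# Example usage:
#   profile_sample sample.fastq.gz sample_profile.txt
#   sgb_rows sample_profile.txt
```

Filtering on the `t__SGB` suffix keeps both known (kSGB) and unknown (uSGB) species bins, since both are reported at the SGB level.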

Full list of changes here.

Pre-requisites

MetaPhlAn requires Python 3 or newer with the numpy and Biopython libraries installed; pip installs these Python libraries automatically. MetaPhlAn relies on Bowtie2 (version 2.3 or higher) to map reads against the marker genes. Check that bowtie2 is present in the system path with execute and read permissions.
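The checks described above can be scripted before a first run. This is a minimal sketch, assuming bash, a `sort` that supports `-V` (version sort, as in GNU coreutils), and that the first line of `bowtie2 --version` ends with the version number; `version_ge` and `check_prereqs` are hypothetical helper names, not part of MetaPhlAn.

```shell
#!/usr/bin/env bash
# Sketch of a pre-flight check for the MetaPhlAn prerequisites.

# Succeeds if version $1 >= version $2 (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

check_prereqs() {
  local bt2 ver
  # Python 3 with numpy and Biopython importable
  if ! python3 -c 'import numpy, Bio' 2>/dev/null; then
    echo "python3 with numpy and Biopython not found" >&2
    return 1
  fi
  # bowtie2 must be on PATH ...
  bt2=$(command -v bowtie2) || { echo "bowtie2 not on PATH" >&2; return 1; }
  # ... and readable and executable
  if [ ! -r "$bt2" ] || [ ! -x "$bt2" ]; then
    echo "bowtie2 at $bt2 lacks read/execute permission" >&2
    return 1
  fi
  # Assumes the version number is the last field of the first line
  ver=$(bowtie2 --version | head -n1 | awk '{print $NF}')
  if ! version_ge "$ver" 2.3; then
    echo "bowtie2 $ver found; 2.3 or higher is required" >&2
    return 1
  fi
  echo "OK: bowtie2 $ver at $bt2"
}

# Example usage:
#   check_prereqs || exit 1
```

Version sorting is used instead of a plain string comparison so that, for example, 2.10 is correctly treated as newer than 2.3.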

If MetaPhlAn is installed using conda, no prerequisites need to be installed separately.

MetaPhlAn is integrated with advanced heatmap plotting via hclust2 and cladogram visualization via GraPhlAn. If you use these visualization tools, please refer to their own prerequisites.

Installation

The best way to install MetaPhlAn is through conda via the Bioconda channel. If you have not configured your Anaconda installation to fetch packages from Bioconda, please follow these steps to set up the channels.

You can install MetaPhlAn by running

$ conda install -c bioconda metaphlan
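If the Bioconda channels are not configured yet, the setup recommended by the Bioconda project looks roughly like the sketch below; please check the Bioconda documentation for its current recommendation. Installing into a dedicated environment, here given the hypothetical name `mpa`, keeps MetaPhlAn's dependencies separate from other tools.

```shell
# Channel setup following the Bioconda documentation. Each --add puts the
# channel at the top, so conda-forge ends up with the highest priority.
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict

# Optional: install into a dedicated environment (name is hypothetical)
conda create -n mpa metaphlan
conda activate mpa
metaphlan --version
```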

For installing from source code and for further installation instructions, please see the Installation section of the Wiki.