Prioritization and Prediction of functional Annotations for Novel and Important genes via automated data Network Integration

PPANINI (Prioritization and Prediction of functional Annotations for Novel and Important genes via automated data Network Integration) provides a computational pipeline to prioritize microbial genes based on their metagenomic properties (e.g. prevalence and abundance). The resulting prioritized list of gene candidates can then be analyzed further using our visualization tools.

If a final gene abundance table already exists, PPANINI can start from it; otherwise, PPANINI can generate the table from short reads.

For more information on the technical aspects:

User Manual || User Tutorial || Forum

Gholamali Rahnavard, Afrah Shafquat, Himel Mallick, Jason Lloyd-Price, Kevin Bonham, Bahar Sayoldin, Eric A. Franzosa, Curtis Huttenhower, Identifying important uncharacterized genes using metagenomes and metatranscriptomes. 

PPANINI prioritizes important genes in a microbial community based on presence/absence and abundance from metagenomic data. Sequencing a metagenome typically produces millions of short DNA/RNA reads. PPANINI takes a gene abundance table covering all the samples in a study, ranks the important genes, and summarizes the output as:

    • A table of prioritized important genes with prevalence, abundance, and a score (the PPANINI score). The genes are ranked by their PPANINI score. In the genes column, PPANINI uses UniRef90 IDs, or an artificial centroid for clusters of similar genes.
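As a rough illustration of ranking genes by prevalence and abundance, the sketch below computes both quantities from a small in-memory table and combines their percentile ranks. The combination rule (a harmonic mean of the two percentile ranks), the function names, and the example values are assumptions made for illustration only; this text does not define how the PPANINI score is actually calculated.

```python
def prevalence_and_abundance(table):
    """table maps gene ID -> list of per-sample abundances."""
    stats = {}
    for gene, values in table.items():
        # Prevalence: fraction of samples in which the gene is present.
        prevalence = sum(1 for v in values if v > 0) / len(values)
        # Abundance: mean abundance across all samples.
        mean_abundance = sum(values) / len(values)
        stats[gene] = (prevalence, mean_abundance)
    return stats


def percentile_ranks(values):
    """Map each key to the fraction of entries whose value is <= its own."""
    n = len(values)
    return {
        key: sum(1 for other in values.values() if other <= value) / n
        for key, value in values.items()
    }


def rank_genes(table):
    """Return (gene, prevalence, abundance, score) rows, best score first."""
    stats = prevalence_and_abundance(table)
    prev_rank = percentile_ranks({g: s[0] for g, s in stats.items()})
    abund_rank = percentile_ranks({g: s[1] for g, s in stats.items()})
    rows = []
    for gene, (prevalence, abundance) in stats.items():
        p, a = prev_rank[gene], abund_rank[gene]
        # Assumed scoring rule: harmonic mean of the two percentile ranks.
        score = 2 * p * a / (p + a)
        rows.append((gene, prevalence, abundance, score))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Hypothetical three-gene, four-sample table.
example_table = {
    "geneA": [5.0, 3.0, 4.0, 6.0],
    "geneB": [0.0, 0.0, 1.0, 0.0],
    "geneC": [0.0, 9.0, 0.0, 8.0],
}
ranked = rank_genes(example_table)
for gene, prevalence, abundance, score in ranked:
    print(gene, prevalence, abundance, round(score, 3))
```

A rank-based combination keeps either quantity from dominating purely because of its scale, which is one reason a sketch like this uses percentile ranks rather than raw values.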

Getting Started with PPANINI


Requirements

  1. matplotlib
  2. Python 2.7
  3. Biopython
  4. NumPy 1.9


Installation

  1. Install PPANINI
    • $ pip install ppanini
  2. Test the PPANINI install (Optional)
    • $ ppanini_test
  3. Download the PPANINI demos for clustering unannotated genes (Optional)

How to Run

Basic Usage

$ ppanini -i genetable.txt --gene-catalog samples.fasta -o $OUTPUT_DIR --vsearch /path/to/vsearch

$OUTPUT_DIR = the output directory

The clustering step can be bypassed with:

$ ppanini -i genetable.txt --bypass-clustering -o $OUTPUT_DIR

The output file is:

A list of important genes (centroids) with prevalence, abundance, and PPANINI score
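When PPANINI is run over several tables from a script, the command lines in this section can be assembled programmatically. The sketch below is a minimal example that uses only the flags shown in this document (`-i`, `-o`, `--gene-catalog`, `--uc`, `--vsearch`, `--bypass-clustering`); the helper name `build_ppanini_command` and its parameters are hypothetical, and no other options are assumed.

```python
import shlex
import subprocess


def build_ppanini_command(gene_table, output_dir, gene_catalog=None,
                          uc_file=None, vsearch=None, bypass_clustering=False):
    """Build an argument list for the ppanini command (hypothetical helper)."""
    cmd = ["ppanini", "-i", gene_table, "-o", output_dir]
    if bypass_clustering:
        # Skip clustering of unannotated genes entirely.
        cmd.append("--bypass-clustering")
    else:
        if gene_catalog is not None:
            cmd += ["--gene-catalog", gene_catalog]
        if uc_file is not None:
            cmd += ["--uc", uc_file]
        if vsearch is not None:
            cmd += ["--vsearch", vsearch]
    return cmd


def run_ppanini(cmd):
    """Run the command and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


bypass_cmd = build_ppanini_command("genetable.txt", "ppanini_output",
                                   bypass_clustering=True)
catalog_cmd = build_ppanini_command("genetable.txt", "ppanini_output",
                                    gene_catalog="samples.fasta",
                                    vsearch="/path/to/vsearch")
# Show the commands as they would be typed in a shell.
print(shlex.join(bypass_cmd))
print(shlex.join(catalog_cmd))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces; `run_ppanini` is defined but not called here, so the sketch can be read without PPANINI installed.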

Example communities from the Human Microbiome Project (HMP)

A gene abundance table for 93 stool samples, with a
UCLUST file containing the gene centroids. The UCLUST file is used to collapse unannotated
genes into artificial clusters.
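To make the collapsing step concrete, the sketch below reads a UCLUST-format (.uc) file and merges the abundances of member genes into their centroid. It relies on the standard tab-separated .uc layout (record type in column 1, query label in column 9, target/centroid label in column 10). Summing member abundances per centroid is an assumption made for illustration and is not confirmed here as PPANINI's own behavior; the gene names and values are made up.

```python
import csv
import io


def read_uc_clusters(handle):
    """Map each gene label to the label of its cluster centroid."""
    gene_to_centroid = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        record_type, query, target = row[0], row[8], row[9]
        if record_type == "S":
            # Seed record: the query is the centroid of a new cluster.
            gene_to_centroid[query] = query
        elif record_type == "H":
            # Hit record: the query belongs to the target centroid's cluster.
            gene_to_centroid[query] = target
        # Other record types (e.g. "C" cluster summaries) are skipped.
    return gene_to_centroid


def collapse_abundances(table, gene_to_centroid):
    """Sum per-sample abundances of genes sharing a centroid (assumed rule)."""
    collapsed = {}
    for gene, values in table.items():
        centroid = gene_to_centroid.get(gene, gene)
        if centroid in collapsed:
            collapsed[centroid] = [a + b for a, b in zip(collapsed[centroid], values)]
        else:
            collapsed[centroid] = list(values)
    return collapsed


# Hypothetical .uc content: gene2 clusters with centroid gene1; gene3 is alone.
uc_text = (
    "S\t0\t300\t*\t*\t*\t*\t*\tgene1\t*\n"
    "H\t0\t295\t97.0\t+\t0\t0\t295M\tgene2\tgene1\n"
    "S\t1\t250\t*\t*\t*\t*\t*\tgene3\t*\n"
    "C\t0\t2\t*\t*\t*\t*\t*\tgene1\t*\n"
)
clusters = read_uc_clusters(io.StringIO(uc_text))
abundances = {"gene1": [1.0, 2.0], "gene2": [0.5, 0.0], "gene3": [0.0, 4.0]}
collapsed = collapse_abundances(abundances, clusters)
print(collapsed)
```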

$ ppanini -i stool_gene_centroids_table.txt --uc stool_gene_clusters.uc -o $OUTPUT_DIR

To bypass clustering of unannotated genes:

$ ppanini -i stool_gene_centroids_table.txt --bypass-clustering -o $OUTPUT_DIR

A gene abundance table for 70 anterior nares samples, with a
gene catalog FASTA file that is used to cluster unannotated genes.

$ ppanini -i AN_gene_table.txt --gene-catalog AN_centroids_for_clustering.fasta -o $OUTPUT_DIR

To bypass clustering of unannotated genes:

$ ppanini -i AN_gene_table.txt --bypass-clustering -o $OUTPUT_DIR