PPANINI
Prioritization and Prediction of functional Annotation for Novel and Important genes via automated data Network Integration
PPANINI (Prioritization and Prediction of functional Annotations for Novel and Important genes via automated data Network Integration) provides a computational pipeline to prioritize microbial genes based on their metagenomic properties (e.g. prevalence and abundance). The resulting prioritized list of gene candidates can then be analyzed further using our visualization tools.
If a final gene abundance table already exists, PPANINI can start from it; otherwise, PPANINI can generate the table from short reads.
For more information on the technical aspects:
User Manual || User Tutorial || Forum
Citation:
Gholamali Rahnavard, Afrah Shafquat, Himel Mallick, Jason Lloyd-Price, Kevin Bonham, Bahar Sayoldin, Eric A. Franzosa, Curtis Huttenhower, Identifying important uncharacterized genes using metagenomes and metatranscriptomes.
PPANINI prioritizes important genes in a microbial community based on presence/absence and abundance from metagenomic data. Sequencing a metagenome typically produces millions of short DNA/RNA reads. PPANINI takes a gene abundance table covering all the samples in a study, ranks the important genes, and summarizes the output as:
- A table of prioritized important genes with prevalence, abundance, and a score (the PPANINI score). Genes are ranked by their PPANINI score. In the genes column, PPANINI uses UniRef90 IDs, or an artificial centroid for clusters of similar genes.
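To make the terms prevalence and abundance concrete, here is a minimal sketch of how they could be computed from a gene abundance table and turned into a ranking. It is not PPANINI's implementation: the function names, the detection threshold, and in particular the scoring rule (a harmonic mean of two percentile ranks) are assumptions chosen for illustration only.

```python
from statistics import mean


def summarize_genes(table, detection_threshold=0.0):
    """Return {gene: (prevalence, mean_abundance)}.

    `table` maps a gene ID to its per-sample abundances.
    Prevalence is the fraction of samples where the gene is detected;
    mean abundance is taken over the samples where it is detected.
    (Both definitions are assumptions made for this sketch.)
    """
    summary = {}
    for gene, values in table.items():
        present = [v for v in values if v > detection_threshold]
        prevalence = len(present) / len(values)
        mean_abundance = mean(present) if present else 0.0
        summary[gene] = (prevalence, mean_abundance)
    return summary


def percentile_ranks(values):
    """Fraction of entries less than or equal to each value."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


def rank_genes(table):
    """Rank genes by an illustrative score (hypothetical, not the PPANINI score)."""
    summary = summarize_genes(table)
    genes = list(summary)
    prev_rank = percentile_ranks([summary[g][0] for g in genes])
    abund_rank = percentile_ranks([summary[g][1] for g in genes])
    # Harmonic mean rewards genes that are high on BOTH axes.
    scores = {
        g: 2 * p * a / (p + a)
        for g, p, a in zip(genes, prev_rank, abund_rank)
    }
    ranked = sorted(genes, key=lambda g: scores[g], reverse=True)
    return ranked, scores


# Toy table: three genes across four samples (made-up numbers).
toy_table = {
    "geneA": [5.0, 3.0, 4.0, 6.0],
    "geneB": [0.0, 0.0, 1.0, 0.0],
    "geneC": [2.0, 0.0, 2.0, 2.0],
}
ranked, scores = rank_genes(toy_table)
print(ranked)
```

In this toy table, a gene that is both widespread and abundant (geneA) ranks first, while a gene seen in a single sample (geneB) ranks last.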
Getting Started with PPANINI
Requirements
Installation
- Install PPANINI
$ pip install ppanini
- Test the PPANINI install (Optional)
$ ppanini_test
- Download the PPANINI demos for clustering unannotated genes (Optional)
- You can obtain a copy by right-clicking the link and selecting “save link as”:
- Gene abundances table
- FASTA file
How to Run
Basic Usage
$ ppanini -i genetable.txt --gene-catalog samples.fasta -o $OUTPUT_DIR --vsearch /path/to/vsearch
$OUTPUT_DIR = the output directory
The clustering step can be bypassed with:
$ ppanini -i genetable.txt --bypass-clustering -o $OUTPUT_DIR
The output file will be:
A list of important genes (centroids) with prevalence, abundance, and PPANINI score
Communities from the Human Microbiome Project (HMP) to start with
A gene abundance table for 93 stool samples with a
UCLUST file containing gene centroids. The UCLUST file is used to collapse unannotated
genes into artificial clusters.
$ ppanini -i stool_gene_centroids_table.txt --uc stool_gene_clusters.uc -o $OUTPUT_DIR
To bypass clustering of unannotated genes:
$ ppanini -i stool_gene_centroids_table.txt --bypass-clustering -o $OUTPUT_DIR
A gene abundance table for 70 anterior nares samples with a
gene catalog FASTA file, which is used to cluster unannotated genes.
$ ppanini -i AN_gene_table.txt --gene-catalog AN_centroids_for_clustering.fasta -o $OUTPUT_DIR
To bypass clustering of unannotated genes:
$ ppanini -i AN_gene_table.txt --bypass-clustering -o $OUTPUT_DIR
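When several datasets are processed in a row, the invocations above can be assembled from a script. The helper below is a hypothetical convenience wrapper, not part of PPANINI; it only uses the command-line flags that appear in the examples on this page (`-i`, `-o`, `--gene-catalog`, `--uc`, `--vsearch`, `--bypass-clustering`), and the output directory name is a placeholder.

```python
import shlex


def build_ppanini_command(gene_table, output_dir, gene_catalog=None,
                          uc_file=None, vsearch_path=None,
                          bypass_clustering=False):
    """Build a ppanini argument list (hypothetical helper, not a PPANINI API)."""
    cmd = ["ppanini", "-i", gene_table]
    if bypass_clustering:
        # Skip clustering of unannotated genes entirely.
        cmd.append("--bypass-clustering")
    else:
        if gene_catalog:
            cmd += ["--gene-catalog", gene_catalog]
        if uc_file:
            cmd += ["--uc", uc_file]
        if vsearch_path:
            cmd += ["--vsearch", vsearch_path]
    cmd += ["-o", output_dir]
    return cmd


# "ppanini_out" is a placeholder output directory.
stool_cmd = build_ppanini_command(
    "stool_gene_centroids_table.txt", "ppanini_out",
    uc_file="stool_gene_clusters.uc",
)
nares_cmd = build_ppanini_command(
    "AN_gene_table.txt", "ppanini_out", bypass_clustering=True,
)
print(shlex.join(stool_cmd))
print(shlex.join(nares_cmd))
# To actually run a command, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string avoids shell-quoting problems when file paths contain spaces.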