ShortBRED is a system for profiling protein families of interest at very high specificity in shotgun meta’omic sequencing data.
ShortBRED-Identify collapses proteins of interest into families, and then screens these families (as represented by consensus sequences) against 1) each other and 2) a comprehensive protein reference.
ShortBRED-Identify then identifies short, distinguishing peptide sequences (“markers”) for each protein family. This process is performed once for a given set of proteins of interest to produce a reusable marker set (see below for examples of pre-computed markers).
ShortBRED-Quantify then screens a metagenome or metatranscriptome against a given marker set to profile the presence/absence and relative abundance of the associated proteins.
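The idea of a "marker" described above can be illustrated with a deliberately simplified sketch. This is not ShortBRED's algorithm: exact substring matching stands in here for the sequence-similarity screening that ShortBRED-Identify performs, and the function name `unique_windows`, the window length `k`, and the toy sequences are all hypothetical. The sketch only shows what it means for a short peptide to distinguish one family from a background reference.

```python
def unique_windows(target, background, k=8):
    """Return (start, peptide) for each length-k window of `target`
    that does not occur verbatim in any background sequence."""
    seen = set()
    for seq in background:
        for i in range(len(seq) - k + 1):
            seen.add(seq[i:i + k])
    return [
        (i, target[i:i + k])
        for i in range(len(target) - k + 1)
        if target[i:i + k] not in seen
    ]


# Hypothetical toy sequences, not real proteins.
family = "MKTAYIAKQRQISFVKSHFSRQ"
background = ["MKTAYIAKQRAAAA", "GGGGSHFSRQ"]

# Windows shared with the background (the two ends of `family`) are dropped;
# only the distinguishing middle region yields candidate peptides.
markers = unique_windows(family, background, k=6)
for start, peptide in markers:
    print(start, peptide)
```

In this toy case the start and end of the family sequence overlap the background, so only windows from the middle region survive, which is the intuition behind screening families against a comprehensive reference.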
For more information on the technical aspects:
Kaminski J, Gibson MK, Franzosa EA, Segata N, Dantas G, Huttenhower C.
High-specificity targeted functional profiling in microbial communities with ShortBRED.
PLoS Comput Biol. 2015 Dec 18;11(12):e1004557.
Markers used in the 2015 paper:
Other frequently-used markers:
- An updated marker collection (mid-2017) for microbial Virulence Factors based on input protein sequences compiled from Victors, VFDB, and MvirDB.
- An updated marker collection (mid-2017) for Antibiotic Resistance Factors based on The Comprehensive Antibiotic Resistance Database (CARD).
ShortBRED-Identify uses a comprehensive (ideally non-redundant) background protein reference database to screen for protein family-specific peptide sequences. The custom BLAST database used in the 2015 paper can be downloaded here.
A convenient (and continuously updated) alternative background database is UniRef90. You can download UniRef90 as a FASTA file and provide it as input to ShortBRED-Identify with “--ref uniref90.fasta”. This will format UniRef90 as a BLAST database for the current run; the BLAST database can be reused in later runs.
To create markers for the sample data included with ShortBRED, set your current working directory to the folder where you unpacked ShortBRED and type:
$ ./shortbred_identify.py --goi example/input_prots.faa --ref example/ref_prots.faa --markers mytestmarkers.faa --tmp example_identify
The sample data included with ShortBRED is quite small, so this command should run in less than a minute on a typical machine. It will create a set of markers (“mytestmarkers.faa”) that you can open and explore to get a sense of what typical ShortBRED-Identify output looks like.
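One way to explore a marker file is to list each record with its sequence length. The sketch below is a minimal FASTA reader that assumes only that the file is standard FASTA (header lines starting with ">"); the function names and the in-memory example records are hypothetical, and the header text in a real marker file will differ.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record begins; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def marker_lengths(lines):
    """Return a list of (header, sequence length) for each record."""
    return [(header, len(seq)) for header, seq in read_fasta(lines)]


# Hypothetical records standing in for the contents of a marker file.
demo_lines = [
    ">fam1_marker",
    "MKTAY",
    "IAKQR",
    ">fam2_marker",
    "GSHFSR",
]
demo = marker_lengths(demo_lines)
for name, length in demo:
    print(name, length)
```

Because `marker_lengths` accepts any iterable of lines, an open file handle such as `open("mytestmarkers.faa")` can be passed in place of `demo_lines`.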
If you would like to test ShortBRED-Quantify using your new markers, enter the following command:
$ ./shortbred_quantify.py --markers mytestmarkers.faa --wgs example/wgs.fna --results results.txt --tmp example_quantify
This command should also run quickly, as there are only 100 nucleotide reads in example/wgs.fna. You can then open results.txt to see the ShortBRED counts for each protein family, which give the relative abundance of each family in the WGS data.
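To look at the results programmatically rather than by eye, a short script can rank families by their count. The sketch below assumes the results file is tab-delimited with a header row, and that the family and count columns are named "Family" and "Count"; those column names are an assumption here and should be checked against the header of your own results.txt. The function names and example values are hypothetical.

```python
import csv
import io


def read_counts(handle, family_col="Family", count_col="Count"):
    """Map each family name to its count from a tab-delimited table.

    The default column names are assumptions; pass the names that
    actually appear in the header of your results file.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[family_col]: float(row[count_col]) for row in reader}


def rank_families(counts):
    """Return (family, count) pairs sorted from highest to lowest count."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical table standing in for a real results file.
example = io.StringIO("Family\tCount\nfamA\t0.0\nfamB\t12.5\nfamC\t3.25\n")
ranked = rank_families(read_counts(example))
for family, count in ranked:
    print(family, count)
```

For a real run, replace the `io.StringIO` example with `open("results.txt", newline="")`.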