ShortBRED: Short, Better Representative Extract Dataset



Introduction to ShortBRED

ShortBRED is a system for profiling protein families of interest at very high specificity in shotgun meta'omic sequencing data. ShortBRED-Identify collapses the proteins of interest into families and then screens these families (each represented by a consensus sequence) against 1) each other and 2) a comprehensive protein reference. It then identifies short, distinguishing peptide sequences ("markers") for each protein family. This process is performed once for a given set of proteins of interest to produce a reusable marker set (see below for examples of pre-computed markers). ShortBRED-Quantify then screens a metagenome or metatranscriptome against a given marker set to profile the presence/absence and relative abundance of the associated protein families.
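To make the marker idea concrete, here is a toy Python sketch of the core principle: keep only those pieces of a family's consensus sequence that appear in no other family and in no background protein. This is a simplification, not ShortBRED's algorithm. The real tool uses sequence-similarity searches rather than exact matching of fixed-length peptide k-mers, and the function names `kmers` and `find_markers` and the default `k=8` are hypothetical. The sketch also assumes the background reference does not contain the proteins of interest themselves.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return every length-k substring (peptide k-mer) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def find_markers(families: Dict[str, str], reference: List[str], k: int = 8) -> Dict[str, List[str]]:
    """Toy marker finder (hypothetical, exact k-mer matching only).

    For each family consensus, keep the k-mers that occur in no other
    family and in no background reference protein.
    """
    # Collect all k-mers from the background reference once.
    background: Set[str] = set()
    for prot in reference:
        background |= kmers(prot, k)

    markers: Dict[str, List[str]] = {}
    for fam, consensus in families.items():
        # k-mers shared with any *other* family cannot distinguish this one.
        others: Set[str] = set()
        for other_fam, other_seq in families.items():
            if other_fam != fam:
                others |= kmers(other_seq, k)
        unique = kmers(consensus, k) - others - background
        markers[fam] = sorted(unique)
    return markers
```

For example, two families sharing the N-terminal stretch `MKTAYIAKQ` lose the k-mers covering that stretch, and any k-mer also present in the background reference is discarded as well.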

For more information on the technical aspects of ShortBRED, or to cite ShortBRED in your work, please refer to:

Kaminski J, Gibson MK, Franzosa EA, Segata N, Dantas G, Huttenhower C.
High-specificity targeted functional profiling in microbial communities with ShortBRED.
PLoS Comput Biol. 2015 Dec 18;11(12):e1004557.


How ShortBRED works


ShortBRED Dependencies


Pre-computed ShortBRED Markers

Markers used in the 2015 paper:

Other frequently-used markers:


ShortBRED Reference Databases

ShortBRED-Identify uses a comprehensive (ideally non-redundant) background protein reference database to screen for protein family-specific peptide sequences. The custom BLAST database used in the 2015 paper can be downloaded here.

A convenient (and continuously updated) alternative background database is UniRef90. You can download UniRef90 as a FASTA file and provide it as input to ShortBRED-Identify with "--ref uniref90.fasta". This will format UniRef90 as a BLAST database for the current run; the BLAST database can be reused in later runs.
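If you run ShortBRED-Identify from a Python pipeline, the command can be assembled programmatically. This minimal sketch uses only the flags shown in this README (`--goi`, `--ref`, `--markers`, `--tmp`). The helper names `build_identify_cmd` and `run_identify` and the input file names are hypothetical, and the script is assumed to be invoked from the ShortBRED folder, as in the Quickstart below.

```python
import subprocess
from typing import List


def build_identify_cmd(goi: str, ref: str, markers: str, tmp: str) -> List[str]:
    """Assemble a shortbred_identify.py call (hypothetical helper)."""
    return [
        "./shortbred_identify.py",
        "--goi", goi,          # FASTA of proteins of interest
        "--ref", ref,          # FASTA background reference, e.g. UniRef90
        "--markers", markers,  # output marker file
        "--tmp", tmp,          # working directory for intermediate files
    ]


def run_identify(cmd: List[str]) -> None:
    """Run the command, raising CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)
```

Usage: `run_identify(build_identify_cmd("my_prots.faa", "uniref90.fasta", "my_markers.faa", "tmp_identify"))`. Passing an argument list rather than a single shell string avoids quoting problems with unusual file names.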


ShortBRED Quickstart

To create markers for the sample data included with ShortBRED, set your current working directory to the folder where you unpacked ShortBRED and type:

$ ./shortbred_identify.py --goi example/input_prots.faa --ref example/ref_prots.faa --markers mytestmarkers.faa --tmp example_identify

The sample data included with ShortBRED is quite small, so this command should run in less than a minute on a typical machine. It will create a set of markers ("mytestmarkers.faa") that you can open up and explore to get a sense of what typical ShortBRED-Identify output looks like.
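Because the marker file is a protein FASTA file, a quick way to explore it is to list each marker and its length. The sketch below is a plain FASTA reader. It makes no assumptions about how ShortBRED names its markers, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA lines (e.g. an open file)."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def summarize_markers(lines: Iterable[str]) -> Dict[str, int]:
    """Map each marker header to its length in amino acids."""
    return {name: len(seq) for name, seq in read_fasta(lines)}
```

Usage: `with open("mytestmarkers.faa") as fh: lengths = summarize_markers(fh)`, then inspect `len(lengths)` and the individual marker lengths.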

There are many settings available. Please see the ShortBRED manual for more details.

If you would like to test ShortBRED-Quantify using your new markers, enter the following command:

$ ./shortbred_quantify.py --markers mytestmarkers.faa --wgs example/wgs.fna --results results.txt --tmp example_quantify

This command should also run quickly, as there are only 100 nucleotide reads in example/wgs.fna. You can then open results.txt to see the ShortBRED count for each protein family, which reflects that family's relative abundance in the WGS data.
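Raw hit counts are not directly comparable, because families with longer markers attract more hits and deeper samples yield more reads overall. The sketch below illustrates an RPKM-style normalization (reads per kilobase of marker per million sample reads) that corrects for both. Treat it as an assumption-labeled illustration rather than ShortBRED-Quantify's exact formula; see the ShortBRED manual for how counts are actually computed. The function names and the use of nucleotide marker lengths are hypothetical.

```python
from typing import Dict


def rpkm(hits: int, marker_length_nt: int, total_reads: int) -> float:
    """Reads per kilobase of marker per million sample reads (assumed normalization)."""
    if marker_length_nt <= 0 or total_reads <= 0:
        raise ValueError("marker length and total reads must be positive")
    return hits / (marker_length_nt / 1_000) / (total_reads / 1_000_000)


def normalize(hits_by_family: Dict[str, int],
              marker_len_by_family: Dict[str, int],
              total_reads: int) -> Dict[str, float]:
    """Apply rpkm() to every family so abundances can be compared."""
    return {
        fam: rpkm(hits, marker_len_by_family[fam], total_reads)
        for fam, hits in hits_by_family.items()
    }
```

With equal raw hits, a family whose markers are twice as long ends up with half the normalized value, which is the point of the correction.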

As with ShortBRED-Identify, there are many settings available, which are described in more detail in the ShortBRED manual.