GraPhlAn

The Huttenhower Lab > GraPhlAn

 

GraPhlAn

GraPhlAn is a software tool for producing high-quality circular representations of taxonomic and phylogenetic trees. GraPhlAn focuses on concise, integrative, informative, and publication-ready representations of phylogenetically- and taxonomically-driven investigation.

User manual || Tutorial || Forum

Requirements

    1. python (version >= 2.7)
    2. the biopython python library ( version >=1.6)
    3. the matplotlib python library (version >=1.1)

Getting started

Installation

GraPhlAn is available in bitbucket and should be obtained using Github.
download the GraPhlAn software from the repository:

$ git clone https://github.com/biobakery/graphlan.git
or

This will download the GraPhlAn repository locally in the “graphlan” subfolder. You then have to put this subfolder into the system path so that you can use GraPhlAn from any location in your system:

$ export PATH=`pwd`/graphlan/:$PATH

Adding the above line into the bash configuration file will make the path addition permanent.

How to Run

Basic usage

The GraPhlAn package consists in two main scripts:
1- graphlan.py
2- graphlan_annotate.py

The first produces graphical output of an input tree in any of the three most popular format: Newick, PhyloXML, or text format. The second modifies any input tree (in any of the three standard format) adding additional information regarding structural or graphical aspects of the tree (like colors and style of the taxa, labels, shadows, heatmaps, …); graphlan_annotate.py generates PhyloXML files that can be converted into images by graphlan.py.

………..

More specifically, here are all the options one can set for graphlan.py:

usage: graphlan.py [-h] [–format [‘output_image_format’]]
[–warnings WARNINGS] [–dpi image_dpi] [–size image size]
[–pad pad_in] [-v]
input_tree output_image

GraPhlAn 0.9 (22 August 2012)
AUTHORS: Nicola Segata (nsegata@hsph.harvard.edu)

positional arguments:
input_tree the input tree in PhyloXML format
output_image the output image (the format is guessed from the
extension unless –format is given. (png, pdf, ps,
eps, svg are the available file formats

optional arguments:
-h, –help show this help message and exit
–format [‘output_image_format’]
set the format of the output image (default none
meaning that the format is guessed from the output
file extension)
–warnings WARNINGS set whether warning messages should be reported or not
(default 1)
–dpi image_dpi the dpi of the output image for non vectorial formats
–size image size the size of the output image (in inches, default 7.0)
–pad pad_in the distance between the most external graphical
element and the border of the image
-v, –version Prints the current GraPhlAn version and exit

………..

Input tree files for graphlan.py can be generated, personalized, and annotated using the graphlan_annotate.py module. In addition to the tree topology and (possibly) branch lengths, graphlan_annotate.py reads an “annotation file” (–annot option) which specifies the graphical options for the tree. The syntax of the annotation file is described comprehensively below. Here is the command line invocation syntax.

usage: graphlan_annotate.py [-h] [–annot the annotation file] [-v]
input_tree [output_tree]

GraPhlAn annotate module 0.9 (22 August 2012) AUTHORS: Nicola Segata
(nsegata@hsph.harvard.edu)

positional arguments:
input_tree the input tree (in Newick, Nexus, PhyloXML or plain
text format
output_tree the output tree in PhyloXML format containing the
newly added annotations. If not specified, the input
tree file will be overwritten

optional arguments:
-h, –help show this help message and exit
–annot the annotation file
specify the annotation file
-v, –version Prints the current MetaPhlAn version and exit

Command and syntax of the annotation file

The annotation file is a tab-delimited file listing the graphical options for clades. Usually each line has three fields: the name of the clade, the name of the option, and the value to assign to the option. Lines can however have two fields (typically for “global” option not referred to a specific clade) or four fields when the external rings (a sort of circular heatmap) is specified.

Below we report and describe all available options and their syntax subdivided by option types.

Global tree options

Global structural and visual characteristics affecting the entire tree are specified in the annotation file with a two-column tab separated syntax with the following pattern

  • global_tree_option
  • global_tree_option_value

where global_tree_option can be any of the following:

  1. ignore_branch_len [def. 0 = False]
    • specify whether to display the tree with fixed branch length (i.e. 0) or with the values specified in the input tree. If the input tree is not containing branch length information, branch lengths will not be showed regardless of this option
  2. total_plotted_degrees [def. 360]
    • the total circular portion used in plotting the tree. 360 means that the tree uses the full rotational space. Small trees are usually best displayed with a limited total_plotted_degrees value.
  3. start_rotation [def 0]
    • the default starting rotational position for the first leaf of the tree
  4. clade_separation [def 0.0]
    • specify a fractional separation between clades which is proportional to the branch distance between subtrees. It option can be used to visually separate more clades that are reciprocally deep branching.
  5. branch_bracket_depth [def 0.25]
    • the relative position of the branch bracket which is the radial segment from which the child taxa branches originate.
  6. branch_bracket_width [def 1.0]
    • the width of the branch bracket relative to the position of the most separated child roots

Graphical tree options

The graphical tree options are the most common way of personalizing the trees. They can be referred to specific clade, to set of clades, or to all clade. The syntax is the following:

  • [clade_name{+|*|^}]
  • graphical_tree_option
  • graphical_tree_option_value

If the clade name is omitted the option is applied to ALL clades. The clade can be specified with the full label comprising all names from the root of the tree or with the last level only (if last level names are not unique, multiple matching clades will be affected by the command). Optionally, at the end of the clade name, one of the following character can be added (see below for the meaning of these symbols): +, *, ^

The “graphical_tree_option” can be:

  1. clade_marker_size [def. 20.0]
    • the size of the marker representing the root of the clade inside the tree
  2. clade_marker_color [def. #FFFFFF, i.e. white]
    • the fill color of the marker representing the root of the clade inside the tree
  3. clade_marker_shape [def. ‘o’, i.e. circle]
    • the shape of the clade marker. See the Marker Shapes table below for the allowed options
  4. clade_marker_edge_width [def. 0.5]
    • the thickness of the border for clade markers
  5. clade_marker_edge_color [def. #000000, i.e. black]
    • the color of the markers’ border

When added after the name of a valid clade, the following three characters can be used to apply the same property to multiple parts of the clade’ subtree

  • * : the specified clade and all its descendants are affected by the property
  • + : the specified clade and all its terminal nodes are affected
  • ^ : all (an only) the terminal nodes of the specified clade are affected

Annotation options

  • [clade_name]
  • annotation_option
  • graphical_tree_option_value

We call annotations the shadings highlighting clades and the corresponding subtree. Annotations can be colored, their alpha-channel can be globally regulated, and have a label associated with them. Specifically, the options available for annotations are:

  1. annotation_font_size [def. 7]
    • the font size of the annotation label
  2. annotation_font_stretch [def. 0]
    • horizontal font compactness (0 is minimal)
  3. annotation_rotation [def 0]
      the rotation of the label. As default the rotation is perpendicular to the radial position of the label. It can be changed to 90 so that the labels are less likely to overlap
  4. annotation_background_color [def. grey]
    • the color of the annotation background
  5. annotation_background_edge_color [def annotation_background_color]
    • the color of the edge for the annotation background
  6. annotation [def. no annotation]
    • the label the be associated and displayed for the annotation. This can assume several formats:
    • str (a string not containing ‘:’): the string to be displayed entirely (an only) on the shading
    • key:str : the (supposedly short) key will be displayed on the annotation shading, whereas the full key:string label will be reported as external legend
    • *:str : a key will be generated and used as the 2. “key:str” case
    • * : the name of the clade (specifically the last taxonomic level only) will be used as the ‘str’ in the 1. case above
    • *:* : the combination of the 3. and 4. cases above

Ring options

We call rings the graphical elements external to the tree itself that can be seen as “circular heatmaps”, “circular barplots”, and actually more (like indicator elements). These “rings” are linked directly to the internal tree as each segment of the rings correspond to a tree leaf (and potentially to internal nodes as well). Multiple rings can be specified for the same image and each must have a progressive associated number (level “1” being the most internal ring).

The general syntax for rings is:

[clade_name]ring_option

ring_level

ring_option_value

If clade_name is not present or if it is “*” the ring option is applied to all the ring sectors in the “ring_level”. The “ring_level” is a integer number that must always be specified.

Here are the possible ring options:

    1. ring_color [def. black]
      • the color of the ring segment
    2. ring_width [def. 1.0]
      • the width of the ring segment a fraction of the total circular width available for the specific clade
    3. ring_height [def. highest height for the rings in the same level, or 0.1 if no heights are specific]
      • the height of the circular segment. If not specific the same default height (0.1*size of the tree) is applied for all ring segment in the level, otherwise the height is equal to the biggest height value in the level.
    4. ring_alpha [def. 1.0]

the transparency value. 0.0 means completely transparent (thus invisible), 1.0 means completely opaque (no transparencies)

  1. ring_shape [def. R]
    • the shape of the ring. Default is ‘R’ for rectangular which means that the whole available area is used. The alternative is currently ‘v’ only which means a triangular shape that can be used as pointing arrow for highlighting specific clades.
  2. ring_edge_width [def 0.1]
    • the width of the border of the ring segment
  3. ring_edge_color [def None, which means ‘ring_color’]
    • the color of the border of the ring segment

Some additional ring options refer to non clade-specific aspects like the label of the ring itself or the graphical separation between rings. These options are specified without a clade name in the following tree-column format:

global_ring_option ring_level global_ring_option_value

Specifically, the ring options can be:

  1. ring_label [def. None]
    the label to be displayed at “stat_rotation” position for the rings. total_plotted_degrees should be less than 360 to make space for these labels.
  2. ring_label_font_size [def. 11]
    • the font size of the ring labels.
  3. ring_internal_separator_thickness [def. 0.0 which means absent]
    • the thickness of the circular line separating different ring levels. This is referred to the most internal of the two sides of each ring.
  4. ring_external_separator_thickness [def. 0.0 which means absent]
    • the thickness of the circular line separating different ring levels. This is referred to the most external of the two sides of each ring.
  5. ring_separator_color [def. ‘k’ for black]
    • the color of the circular line separating different ring levels.

………..

Colors

Colors are strings that can be:

  1. one of the following ‘default’ colors
    • blue, green, red, cyan, magenta, yellow, black, white
  2. a one-letter shortcut for the above colors
    • ‘b’ (blue), ‘g’ (green), ‘r’ (red), ‘c’ (cyan), ‘m’ (magenta), ‘y’ (yellow), ‘k’ (black), ‘w’ (white)
  3. a RGB color code in the hexadecimal format
    • #rrggbb, for example #FF0000 corresponds to (full) red

Marker shapes

As of August 2012 we support the marker types available in matplotlib version 1.1.1. Specifically here are the codes for the markers. Note that some of them are shapes with internal color-filled space, other are edge- or point-only markers.

  • ‘.’ : point marker
  • ‘,’ : pixel marker
  • ‘o’ : circle marker
  • ‘v’ : triangle_down marker
  • ‘^’ : triangle_up marker
  • ‘<‘ : triangle_left marker
  • ‘>’ : triangle_right marker
  • ‘1’ : tri_down marker
  • ‘2’ : tri_up marker
  • ‘3’ : tri_left marker
  • ‘4’ : tri_right marker
  • ‘s’ : square marker
  • ‘p’ : pentagon marker
  • ‘*’ : star marker
  • ‘h’ : hexagon1 marker
  • ‘H’ : hexagon2 marker
  • ‘+’ : plus marker
  • ‘x’ : x marker
  • ‘D’ : diamond marker
  • ‘d’ : thin_diamond marker
  • ‘|’ : vline marker
  • ‘_’ : hline marker

Examples

Here are some examples of graphical tree representations generated by GraPhlAn. These example are included in the software package (under /examples). At the end of the page can find the GraPhlAn manual.

hmptree800.png