MaAsLin: Multivariate Association with Linear Models

Please note: as of April 1, 2019, MaAsLin is no longer officially supported. Development and support efforts are now focused on its successor, MaAsLin2.

MaAsLin is a multivariate statistical framework that finds associations between clinical metadata and microbial community abundance or function. The clinical metadata can be of any type: continuous (for example, age and weight), boolean (sex, stool/biopsy), or discrete/factor (cohort groupings and phenotypes). MaAsLin is best used when you are associating many metadata with microbial measurements, and in that case each metadatum can be of a different type. For example, you could include age, weight, sex, cohort, and phenotype in the same input file to be analyzed in the same MaAsLin run. The microbial measurements are expected to be normalized before using MaAsLin, so they should be proportional data ranging from 0 to 1.0.

The results of a MaAsLin run are associations between specific microbial community members and metadata. Each association is reported without the influence of the other metadata in the study. Certain factors are known to influence the microbiome (for example, diet, age, geography, and fecal or biopsy sample origin). MaAsLin allows one to detect the effect of a metadatum, possibly a phenotype, while deconfounding the effects of diet, age, sample origin, or any other metadata captured in the study.

Figure (maaslin_overview.png): MaAsLin analysis overview.

MaAsLin performs boosted, additive general linear models between one group of data (metadata, the predictors) and another group (in our case microbial abundance, the response). Because metagenomic data is sparse, boosting is used to select metadata that show some potential to be associated with microbial abundances. Boosting of metadata and selection of a model occur per OTU. The metadata selected by boosting are then used as predictors in a general linear model, with the arcsine square-root transformed OTU abundance as the response.
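The arcsine square-root transform mentioned above can be illustrated with a few lines of Python. This is only a sketch of the transform applied to proportional data, not MaAsLin's own code (MaAsLin itself is an R package); the function name arcsine_sqrt and the example values are hypothetical.

```python
import math

def arcsine_sqrt(proportion):
    """Return asin(sqrt(p)) for a proportion p in the range [0, 1]."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("expected a proportion between 0 and 1")
    return math.asin(math.sqrt(proportion))

# Hypothetical relative abundances of one OTU across four samples.
abundances = [0.0, 0.01, 0.25, 1.0]
transformed = [arcsine_sqrt(p) for p in abundances]

for p, t in zip(abundances, transformed):
    print(f"{p:.2f} -> {t:.4f}")
```

The transform maps proportions in [0, 1] onto [0, pi/2] and stretches values near 0, which is where most sparse abundance data sits.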


If you use MaAsLin, please cite the paper in which the MaAsLin methodology was first presented:
Morgan XC, Tickle TL, Sokol H, Gevers D, Devaney KL, Ward DV, Reyes JA, Shah SA, LeLeiko N, Snapper SB, Bousvaros A, Korzenik J, Sands BE, Xavier RJ, Huttenhower C. Dysfunction of the intestinal microbiome in inflammatory bowel disease and treatment. Genome Biol. 2012 Apr 16;13(9):R79.




Use MaAsLin in Galaxy

MaAsLin is currently being implemented as a Galaxy module; a preliminary version can be found HERE.

Install MaAsLin (preliminary version)

MaAsLin requires the following R packages: agricolae, gam, gamlss, gbm, glmnet, inlinedocs, logging, MASS, nlme, optparse, outliers, penalized, pscl, robustbase

Please install these packages before installing MaAsLin.

To install MaAsLin:

  1. Download the latest version of MaAsLin.
  2. Install MaAsLin (where X.Y.Z is the version number)
$ R CMD INSTALL Maaslin_X.Y.Z.tar.gz

Please note MaAsLin is currently under development.




Updates and mailing list

Software updates will be available through the Bitbucket repository. You are welcome to use the issue tracker on Bitbucket to provide feedback, report bugs, and suggest or request new features.

If you would like to be notified about new features, or have comments or questions related to MaAsLin, please join our mailing list:

MaAsLin Google group

We have also started a FAQ page.



Expected input files

MaAsLin takes three input files: a data file (PCL), a file detailing what information to read from the PCL file (read.config), and an optional R script for custom data processing or visualization (.R). All files should share the same base name and differ only in extension (.pcl, .read.config, and .R).

A. Input data file
This required input file, which we call the PCL file, contains all the data and metadata. It is formatted so that metadata and data (OTUs or bugs) are rows and samples are columns. All metadata rows should come before any abundance data. The file should be a tab-delimited text file. A demo PCL file is found in the MaAsLin download in maaslin/inst/extdata/.

1. Rows represent metadata and features (bugs); columns represent samples.
2. The first row by default should be the sample IDs.
3. Metadata rows should be next.
4. Lastly, rows containing feature (bug) measurements (such as abundance) should come after the metadata rows.
5. The first column should contain the ID describing the row. For metadata this may be, for example, "Age" for a row containing the age of the patients donating the samples. For measurements, this should be the feature name (bug name).
6. By default the file is expected to be TAB delimited.
7. If a consensus lineage or hierarchy of taxonomy is contained in the feature name, the default delimiter between clades is the pipe ("|").
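The layout rules above can be sketched in Python by assembling a tiny PCL table in memory. All row names, sample names, and values below are hypothetical and chosen only to show the expected ordering; the helper to_pcl_text is not part of MaAsLin.

```python
import csv
import io

# Hypothetical example table: every name and value is illustrative only.
rows = [
    ["sample", "Sample1", "Sample2", "Sample3"],    # first row: sample IDs
    ["Age", "34", "51", "27"],                      # metadata rows come next
    ["Weight", "70.5", "82.0", "64.3"],
    ["Bacteria", "1.0", "1.0", "1.0"],              # feature rows come last
    ["Bacteria|Firmicutes", "0.6", "0.3", "0.8"],   # clades separated by "|"
    ["Bacteria|Bacteroidetes", "0.4", "0.7", "0.2"],
]

def to_pcl_text(table):
    """Serialize a list of rows as tab-delimited text, one row per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(table)
    return buffer.getvalue()

pcl_text = to_pcl_text(rows)
print(pcl_text)
```

Writing pcl_text to a file with a .pcl extension would give a file in the shape described above; the abundance values are proportions, as MaAsLin expects normalized input.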

*It is our goal to make using MaAsLin as easy as possible. If you are starting with a separate metadata file and a Qiime-produced table, the Bitbucket project "QiimeToMaAsLin" can create a PCL file for you in an automated way. Click here for information on obtaining QiimeToMaAsLin.

B. Read Config File
A .read.config file allows one to selectively read rows and columns from the PCL file. This makes it possible to configure a MaAsLin run by selecting different metadata to associate with microbial measurements. Although .read.config files have many potential uses, a minimal example of a .read.config file is as follows:
Matrix: Metadata
Read_PCL_Rows: -Weight

Matrix: Abundance
Read_PCL_Rows: Bacteria-

'Matrix: Metadata' defines the first block (of two lines in this case) to specify which metadata to read. 'Read_PCL_Rows: -Weight' indicates that the rows read from the PCL file start at the first metadata row (row 2) and continue through the metadata row 'Weight' (inclusive).

'Matrix: Abundance' defines the second block (separated from the first block by an empty line) to specify which abundance data to read. 'Read_PCL_Rows: Bacteria-' indicates that the rows read from the PCL file start at the 'Bacteria' row and continue to the end of the file.

Samples or columns can be excluded from the MaAsLin run in a similar fashion with a third line using the keyword 'Read_PCL_Columns:'. More details on how to use the .read.config file can be found in the README. A demo .read.config file is found in the MaAsLin download in maaslin/inst/extdata/.
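To make the open-ended row ranges concrete, here is a minimal Python sketch of how a range such as '-Weight' or 'Bacteria-' could be resolved against the row IDs of a PCL file. It is not MaAsLin's parser: the function select_rows and the row IDs are hypothetical, and the sketch assumes that row names contain no "-" themselves.

```python
def select_rows(row_ids, spec):
    """Resolve a range like '-Weight' or 'Bacteria-' into a list of row IDs.

    Assumptions of this sketch: row_ids[0] is the sample-ID row, an open
    start means "from the first row after the sample IDs", and an open end
    means "through the last row". Both ends are inclusive.
    """
    start_id, separator, end_id = spec.partition("-")
    if not separator:
        start_id = end_id = spec            # a single named row
    start = row_ids.index(start_id) if start_id else 1
    end = row_ids.index(end_id) if end_id else len(row_ids) - 1
    return row_ids[start:end + 1]

# Hypothetical row IDs, in file order.
row_ids = ["sample", "Age", "Weight", "Bacteria",
           "Bacteria|Firmicutes", "Bacteria|Bacteroidetes"]

metadata_rows = select_rows(row_ids, "-Weight")
abundance_rows = select_rows(row_ids, "Bacteria-")
print(metadata_rows)
print(abundance_rows)
```

With these row IDs, '-Weight' selects the Age and Weight rows, and 'Bacteria-' selects the Bacteria row and everything after it.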

C. Optional .R Script
A demo R script is found in the MaAsLin download in maaslin/inst/extdata/. In this version of MaAsLin, the data transformation for microbial measurements is defined in the .R script. The demo .R script can be used for any project by copying it and renaming it projectName.R. The .R script can also be used to add custom data analysis or to manipulate the output MFA figures. Documentation and examples are in development to show how to fully use this file.

Running MaAsLin

A. Run the MaAsLin demo:

Run the demo included in the MaAsLin install.
$ R
> library(Maaslin)
> example(Maaslin)

B. Run MaAsLin:

If starting with a PCL file (input.pcl), first transpose it to a TSV file (input.tsv).

$ ./Maaslin/exec/transpose.py < input.pcl > input.tsv
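Transposing swaps rows and columns, so that samples become rows and metadata and features become columns. The Python sketch below shows the idea on a small in-memory table; it is not the transpose.py script shipped with MaAsLin, the function transpose_tsv is hypothetical, and it assumes every row has the same number of tab-separated fields.

```python
import csv
import io

def transpose_tsv(text):
    """Return tab-delimited text with rows and columns swapped.

    Assumes a rectangular table: zip() would silently drop trailing
    fields if some rows were longer than others.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    columns = zip(*rows)
    return "\n".join("\t".join(column) for column in columns) + "\n"

# Hypothetical PCL content: samples are columns.
pcl_text = (
    "sample\tS1\tS2\n"
    "Age\t34\t51\n"
    "Bacteria\t0.6\t0.3\n"
)

tsv_text = transpose_tsv(pcl_text)
print(tsv_text)
```

After transposing, each line describes one sample, with its ID in the first field followed by its metadata and abundance values.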

Run MaAsLin.

$ R
> library(Maaslin)
> Maaslin('input.tsv','maaslin_output',strInputConfig='input.read.config')

Please see the FAQs if you need information on running MaAsLin from the command line.

C. MaAsLin options:

Run the following for a full listing of MaAsLin options

> help(Maaslin)




MaAsLin Output Files

1. Analysis (These files are useful for analysis):
projectname-metadata.txt: Each metadatum has its own file of associations. Any association selected for testing by the initial boosting is recorded here. Included are the results from the final general linear model (performed after the boosting) and the FDR-corrected p-value (q-value). Can be opened as a text file or spreadsheet.

projectname-metadata.pdf: Any association with a q-value less than or equal to the significance threshold is plotted here. If this file does not exist, projectname-metadata.txt should not contain an entry with a q-value less than or equal to the threshold. Factor and boolean data are plotted as notched box plots; continuous data are plotted as a scatter plot with a line of best fit.

Figure (maaslin_output.png): Example of the projectname-metadata.pdf file. Significant associations are combined in one file of associations per metadatum. Factor and boolean data are plotted as notched box plots; continuous data are plotted as a scatter plot with a line of best fit. Plots show raw data; header data show information from the reduced model.


projectname_Summary.txt: All entries plotted in the projectname-metadata.pdf files are collected here. Can be opened as a text file or spreadsheet.
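The output files report FDR-corrected p-values (q-values) and compare them to a significance threshold. The text above does not state which FDR procedure is used, so the Python sketch below assumes the Benjamini-Hochberg adjustment, a common choice. The function benjamini_hochberg, the p-values, and the 0.05 threshold are all hypothetical and serve only to show how q-values relate to p-values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Visit p-values from largest to smallest so that a running minimum
    # keeps the adjusted values monotone.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    q_values = [0.0] * m
    running_min = 1.0
    for position, index in enumerate(order):
        rank = m - position                 # 1-based rank in ascending order
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values

# Hypothetical p-values for four feature/metadata associations.
p_values = [0.01, 0.04, 0.03, 0.5]
q_values = benjamini_hochberg(p_values)

threshold = 0.05                            # assumed significance threshold
significant = [i for i, q in enumerate(q_values) if q <= threshold]
print(q_values)
print(significant)
```

Under these assumptions, an association is kept only when its q-value, not its raw p-value, is less than or equal to the threshold.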


2. Troubleshooting (These files are typically not used for analysis but are there for documenting the process and troubleshooting.):

projectname.txt: Contains detailed output from the statistical engine. Useful for detailed troubleshooting.

data.tsv: The data matrix that was read in (transposed). Useful for making sure the correct data was read in.

data.read.config: Can be used to read in data.tsv.

metadata.tsv: The metadata that was read in (transposed). Useful for making sure the correct metadata was read in.

metadata.read.config: Can be used to read in metadata.tsv.

read_merged.tsv: The data and metadata merged (transposed). Useful for making sure the merging occurred correctly.

read_merged.read.config: Can be used to read in read_merged.tsv.

read_cleaned.tsv: The data read in, merged, and then cleaned. After this process the data is written to this file for reference if needed.

read_cleaned.read.config: Can be used to read in read_cleaned.tsv.

ProcessQC.txt: Contains quality control for the MaAsLin analysis. This includes information on the magnitude of outlier removal.


This covers how to use MaAsLin. Thank you for using this software and good luck with all your endeavors!




Related projects

QiimeToMaAsLin is a set of scripts that create PCL files for MaAsLin in an automated way. The program takes a Qiime table and a metadata file as input, merges the files, and optionally hierarchically sums and normalizes OTUs at each clade level (based on their consensus lineage). QiimeToMaAsLin code and documentation can be found on Bitbucket. Click here to download QiimeToMaAsLin.

You can also obtain the complete analysis package using the following Mercurial command.

$ hg clone https://bitbucket.org/biobakery/qiimetomaaslin

or by downloading the compressed archive in zip, gz, or bz2 format.