Available courses

Summary

The last decade has seen an explosion in high-throughput technologies, making it relatively straightforward to generate biomedical datasets of billions of data points in a short time. Such datasets often require more time to be analyzed than to be generated. Since large-scale datasets will only become larger and more common in the future, it is essential for biomedical researchers today to possess the skills to manipulate and analyze them and for new computational and statistical methodology to be developed to perform these analyses.

This tutorial will survey statistical methods and software to analyze diverse experimental data from a network point of view. We will focus on gene expression data and single gene perturbation screens, describing approaches to make the step from the parts list to the wiring diagram by using phenotypes for network inference and integrating them with complementary data sources. We will discuss scaling these techniques to very large data compendia (e.g. thousands of expression arrays) and statistical methods for mining the resulting inferred biological networks.

Performance objectives

Upon completion of this course, participants should be able to:
  • Perform detailed analyses of single microarray experiments using R/Bioconductor.
  • Construct nested effects and other statistical models from single gene perturbation screens.
  • Integrate large biological data collections, including hundreds or thousands of gene expression conditions or other experimental results.
  • Perform basic network mining and functional mapping to interpret the resulting biological networks.

Content and instructional methods

Participants will be made familiar with statistical challenges arising in expression data analysis, gene perturbation screens, large-scale data integration, and biological network mining. Algorithms will be introduced and discussed briefly, and we will give pointers to available implementations and software. The course will aim for breadth rather than depth, providing links to additional information where possible, and will focus on concepts that can be applied in the participants' own biomedical or statistical research.

Outline

The following is a list of topics covered in the course. Topics will include a mix of theory, examples, data analysis, and brief computing demonstrations.

  1. Introduction to gene expression microarrays;
  2. Introduction to expression data analysis in R/Bioconductor;
  3. Gene perturbation screens;
  4. Nested effects models;
  5. Introduction to experimental and functional data types/sources;
  6. Biological network models;
  7. Biological network and data integration;
  8. Functional mapping and large-scale network mining.

Brief Biographies

Curtis Huttenhower is an Assistant Professor in the Biostatistics Department at the Harvard School of Public Health in Boston, MA. Dr. Huttenhower's methodological research focuses on large-scale data mining and network models, with biological applications in microbial population and metagenome characterization. Dr. Huttenhower's collaborations include experimentalists from the Human Microbiome Project, Nurses' Health Study, and Health Professionals Follow-up Study, providing a diversity of practical large-scale genomic data.

Florian Markowetz is a Group Leader at the Cancer Research UK Cambridge Research Institute in Cambridge, UK. Dr. Markowetz has extensive experience in analyzing gene perturbation screens and developing novel methodology to address the unique features of gene perturbation data. Dr. Markowetz works in several close collaborations to analyze gene perturbation screens, which have given him a broad overview of the challenges that arise in real-world studies and how to address them computationally.
The last decade has seen an explosion in high-throughput technologies, making it relatively straightforward to generate biomedical datasets of millions or even billions of data points in a short time. Such datasets pose a significant analysis challenge, and researchers often spend more time analyzing data than generating it. Since large-scale datasets will become larger and more common in the future, it is essential for biomedical researchers today to possess the skills to manipulate and analyze them. To enable students to analyze and understand their own data, this fast-paced tutorial will provide hands-on experience that complements lectures with analysis exercises, as well as discussion sections exploring relevant recent publications. For microarray data analysis, we will cover normalization, hierarchical clustering, k-means clustering, bi-clustering, GO analysis, and Gene Set Enrichment Analysis, as well as identification of genes with significant changes in expression. For high-throughput sequencing applications, we will cover tools and techniques for read mapping, analysis of RNA-Seq and ChIP-Seq data, assembly of short reads into contigs, and SNP analysis, as well as emerging file formats.

This course was offered in May 2010 through the South African National Bioinformatics Institute at the University of the Western Cape by Gavin Sherlock and Catherine Ball of Stanford University and Curtis Huttenhower of the Harvard School of Public Health.
This tutorial will provide an overview of computational methods for mining very large data collections, both for single assays generating sizeable outputs and for integrating thousands of diverse experimental results. We will focus on microbial and metagenomic community characterization as a biological application.

Requires an introductory knowledge of statistics, basic familiarity with experimental data types (e.g. expression arrays, interactions, short read sequencing), and basic familiarity with bioinformatic software systems. A laptop will not be required, but pointers and information will be provided for computational materials that the participants can follow up on independently.
Introduction to genomic data, computational methods for interpreting these data, and a survey of current functional genomics research. Covers biological data processing, programming for large datasets, high-throughput data (sequencing, proteomics, expression, etc.), and related publications. This course is targeted at students in experimental biology programs with an interest in understanding how available genomic techniques and resources can be applied in their research.