Topic outline
Topic 1
An overview of functional genomic data.
Genomic Data Resources provides an overview of public (meta-)databases and primary data resources, as well as a host of services that provide functional analysis of these primary data. A thorough understanding of the most popular data types, and of how they interconnect, is immensely helpful when annotating your own results and integrating them with existing public knowledge.
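As a small illustration of programmatic access to such a resource (not part of the tutorial material itself), the sketch below looks up a gene record via the Ensembl REST API; the gene ID used (ENSG00000139618, BRCA2) is only an example.

```python
# Minimal sketch: fetch basic annotation for one gene from the Ensembl REST API.
import json
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"

def lookup_gene(ensembl_id):
    """Return the Ensembl lookup record for a gene ID as a Python dict."""
    url = f"{ENSEMBL_REST}/lookup/id/{ensembl_id}?content-type=application/json"
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))

if __name__ == "__main__":
    gene = lookup_gene("ENSG00000139618")  # BRCA2, used purely as an example
    print(gene.get("display_name"), gene.get("biotype"), gene.get("description"))
```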
Topic 2
Scalable machine learning.
Machine Learning lists well-established frameworks and resources in machine learning, from high-level systems such as Weka and Orange to libraries and code that can be easily embedded in your own scripts. A short tutorial also guides you through a toy example using the Weka system along with some basic code.
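The tutorial's toy example uses Weka; as a rough Python analogue (not the tutorial's own code), the sketch below runs the same kind of exercise with scikit-learn: train a decision tree on the classic iris data set and estimate accuracy by cross-validation.

```python
# Toy classification example, scikit-learn stand-in for the Weka exercise.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A decision tree is roughly comparable to Weka's J48 classifier.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)

# 10-fold cross-validation, the same evaluation scheme Weka uses by default.
scores = cross_val_score(clf, X, y, cv=10)
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```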
Topic 3
Sequencing.
(Second-generation) Sequencing summarizes standard workflows for handling second-generation sequencing data, along with strategies for staying informed in a rapidly changing field. Pointers to community-accepted standards and algorithms will get you started in understanding different mapping, assembly and variant-calling approaches.
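As a small taste of working with alignment output from such a workflow, here is a minimal Python sketch (assuming a hypothetical BAM file "sample.bam" produced by an earlier mapping step, e.g. with bwa and samtools) that uses pysam to count mapped reads and summarize mapping quality.

```python
# Minimal sketch: summarize mapped/unmapped reads and mean MAPQ in a BAM file.
import pysam

bam_path = "sample.bam"  # hypothetical file name

mapped = unmapped = 0
quality_sum = 0

with pysam.AlignmentFile(bam_path, "rb") as bam:
    for read in bam.fetch(until_eof=True):
        if read.is_unmapped:
            unmapped += 1
        else:
            mapped += 1
            quality_sum += read.mapping_quality

total = mapped + unmapped
print(f"{mapped}/{total} reads mapped "
      f"(mean MAPQ {quality_sum / max(mapped, 1):.1f})")
```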
Topic 4
An introduction to metagenomics.
Metagenomics summarizes current research areas, from basic 16S rRNA analysis to shotgun sequencing, assembly and community analysis. An overview of current databases and tools is provided, along with a sample analysis of a human gut microbiome using MG-RAST.
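Independent of MG-RAST itself, the short sketch below illustrates the kind of community analysis covered here: relative abundances and Shannon diversity computed from a hypothetical table of taxon counts for one sample (the counts are illustrative, not real data).

```python
# Minimal sketch: relative abundance and Shannon diversity for one sample.
import math

taxon_counts = {                 # illustrative counts, not real data
    "Bacteroides": 4200,
    "Faecalibacterium": 3100,
    "Escherichia": 450,
    "Lactobacillus": 120,
}

total = sum(taxon_counts.values())
relative_abundance = {taxon: n / total for taxon, n in taxon_counts.items()}

# Shannon diversity index H' = -sum(p_i * ln(p_i))
shannon = -sum(p * math.log(p) for p in relative_abundance.values() if p > 0)

for taxon, p in sorted(relative_abundance.items(), key=lambda kv: -kv[1]):
    print(f"{taxon:20s} {p:6.1%}")
print(f"Shannon diversity: {shannon:.2f}")
```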
Topic 5
Genomic data integration.
Data Integration lists resources and tools for genomic data integration, including basic methods as well as applied tools. The HEFalMp system is demonstrated, along with two simpler examples on host/pathogen interaction and GWAS analysis, which use previously highlighted genomic resources to identify functions and molecules of interest in large data sets.
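A schematic example of one basic integration step of the kind used in the GWAS example (with hypothetical coordinates, not the tutorial's data) is annotating significant SNPs with the nearest gene from an annotation table:

```python
# Minimal sketch: annotate GWAS hits with the nearest gene (hypothetical data).
gwas_hits = [                         # (SNP id, chromosome, position, p-value)
    ("rs0001", "chr1", 1_150_000, 2e-9),
    ("rs0002", "chr2", 5_400_000, 4e-8),
]

genes = [                             # (gene symbol, chromosome, start, end)
    ("GENE_A", "chr1", 1_100_000, 1_200_000),
    ("GENE_B", "chr2", 5_000_000, 5_050_000),
]

def nearest_gene(chrom, pos):
    """Return the gene on the same chromosome closest to pos, with its distance."""
    candidates = [g for g in genes if g[1] == chrom]
    if not candidates:
        return None, None
    def distance(g):
        _, _, start, end = g
        return 0 if start <= pos <= end else min(abs(pos - start), abs(pos - end))
    best = min(candidates, key=distance)
    return best[0], distance(best)

for snp, chrom, pos, pval in gwas_hits:
    gene, dist = nearest_gene(chrom, pos)
    print(f"{snp}\t{chrom}:{pos}\tp={pval:.1e}\t{gene} ({dist} bp away)")
```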
Topic 6
Scalability.
Scalability wraps up the tutorial with information on how to scale the listed examples to ever-larger datasets, both with regard to storage and computational resources and by maintaining a consistent research computing environment.
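As a minimal illustration of one such strategy (not part of the tutorial material itself), the sketch below streams a large tab-separated file one record at a time instead of loading it into memory, so the same script keeps working as inputs grow.

```python
# Minimal sketch: process a large TSV file with constant memory use.
import csv
import sys

def stream_records(path):
    """Yield one parsed record at a time; memory use stays constant."""
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            yield row

if __name__ == "__main__":
    count = 0
    for record in stream_records(sys.argv[1]):   # path supplied on the command line
        count += 1                               # replace with real per-record work
    print(f"processed {count} records")
```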