Computational reconstruction of NFkB pathway interaction mechanisms during prostate cancer

It remains challenging to understand biomolecular mechanisms in cancer from high-throughput data. This is particularly true for the Nuclear-factor-kappa-B (NFκB) pathway, which is linked to the inflammatory response and cell proliferation in prostate and other cancers. Despite close scrutiny and a deep understanding of many members' activities, the current list of pathway members and the systems-level perspective on their interactions remain incomplete.

Here, we provide a computational reconstruction of NFκB pathway interaction mechanisms in prostate cancer. We identified novel roles for 8 genes and new mechanistic interactions between them and 10 known pathway members. This was carried out by bioinformatic integration of 653 gene expression datasets with 1.4M interactions to predict the inclusion of 40 additional genes. Molecular mechanisms of interaction were inferred using recent advances in Bayesian data integration to simultaneously provide information specific to biological contexts and individual biomolecular activities, resulting in 112 interactions in the fully reconstructed pathway. This method is generalizable, and new NFκB pathway information may enable better understanding of prostate cancer and development of more effective prevention and treatment strategies.

Reference: Börnigen D, Tyekucheva S, Wang X, Rider J, Lee GS, Mucci LA, Sweeney C, Huttenhower C. "Computational reconstruction of NFkB pathway interaction mechanisms during prostate cancer". In Revision.

Input Data
We integrated high-throughput and heterogeneous genomics data using a naïve Bayesian approach with regularization. We trained one naïve Bayesian classifier for each biological context and for each interaction mechanism individually, using the following gold standards as the underlying ground truth for training:
Gold standard
Biological contexts
Interaction mechanisms
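
As a rough illustration of the training step described above, the sketch below fits one naïve Bayesian classifier to a set of discretized datasets and a gold standard of gene pairs, then scores an unlabeled pair. It is not the Sleipnir implementation: the function names, the data layout (dictionaries keyed by gene pair), and the use of a simple pseudocount in place of the regularization mentioned above are all assumptions made for this example.

```python
import math

def train_naive_bayes(datasets, gold_standard, n_bins, pseudocount=1.0):
    """Fit class priors and per-dataset conditional probability tables.

    datasets:      {dataset_name: {gene_pair: bin_index}} (hypothetical layout)
    gold_standard: {gene_pair: 1 (related) or 0 (unrelated)}
    The pseudocount is a stand-in for regularization, not the authors' scheme.
    """
    class_counts = {0: 0, 1: 0}
    for label in gold_standard.values():
        class_counts[label] += 1
    total = sum(class_counts.values())
    # Both classes must appear in the gold standard for the priors to be usable.
    prior = {c: class_counts[c] / total for c in (0, 1)}

    tables = {}
    for name, values in datasets.items():
        counts = {c: [pseudocount] * n_bins for c in (0, 1)}
        for pair, label in gold_standard.items():
            if pair in values:  # pairs missing from a dataset are skipped
                counts[label][values[pair]] += 1
        tables[name] = {
            c: [v / sum(counts[c]) for v in counts[c]] for c in (0, 1)
        }
    return prior, tables

def posterior(pair, datasets, prior, tables):
    """Posterior probability that a gene pair is functionally related."""
    log_odds = math.log(prior[1]) - math.log(prior[0])
    for name, values in datasets.items():
        if pair in values:
            b = values[pair]
            log_odds += math.log(tables[name][1][b]) - math.log(tables[name][0][b])
    return 1.0 / (1.0 + math.exp(-log_odds))

# Toy example with placeholder gene names and a single two-bin dataset.
gold = {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 0, ("C", "D"): 0}
data = {"toy_dataset": {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 0,
                        ("C", "D"): 0, ("A", "D"): 1}}
prior, tables = train_naive_bayes(data, gold, n_bins=2)
score = posterior(("A", "D"), data, prior, tables)
```

Under this layout, training one classifier per biological context or per interaction mechanism would amount to calling `train_naive_bayes` once per gold standard while reusing the same datasets.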

We collected 633 non-disease microarray expression datasets from the NCBI Gene Expression Omnibus repository (GEO) comprising 14,617 conditions, as well as 18 manually curated human gene expression datasets chosen to be particularly informative for functional relationships in prostate cancer:
List of 633 non-disease GEO datasets
List of 18 prostate cancer specific GEO datasets

Additionally, we collected 225 non-microarray datasets from the protein interaction databases BioGRID, IntAct, STRING, Prosite, Domine, Transfac, and ORegAnno, yielding 1,351,782 pairwise gene interactions (Supplementary Table 6) and 878 datasets in total.

All input datasets were downloaded, processed, normalized and standardized using ARepA (http://199.94.60.28/arepa, PMID: 26157642).

Python Code
We implemented the network and pathway reconstruction in Python using the sfle framework (http://199.94.60.28/sfle):
Download Python code within sfle framework

Within this framework, we applied the Counter function from the Sleipnir library to train the naïve Bayesian classifiers and to infer both the context-specific networks and the interaction networks:
Sleipnir
Counter function

Results: Network graphs
NFκB pathway (sif format) - Figure 2a
NFκB complex (sif format) - Figure 2b
Cap vs. global prediction (sif format) - Figure 3a
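
The network graphs above are distributed in SIF (simple interaction format), where each line gives a source node, an interaction type, and one or more target nodes, and a line with a single name is an isolated node. A minimal sketch of a reader for such files follows. The function name `parse_sif` and the example node and interaction names are placeholders for illustration, not contents of the actual result files.

```python
def parse_sif(lines):
    """Parse SIF lines into a set of nodes and a list of edges.

    Each edge is a (source, interaction, target) tuple. Fields are split
    on tabs when a tab is present, otherwise on whitespace.
    """
    nodes = set()
    edges = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t") if "\t" in line else line.split()
        source = fields[0]
        nodes.add(source)
        if len(fields) < 3:
            continue  # isolated node, or a line without any target
        interaction = fields[1]
        for target in fields[2:]:
            nodes.add(target)
            edges.append((source, interaction, target))
    return nodes, edges

# Placeholder content; real files would be read with open(path) instead.
example = [
    "geneA\tbinding\tgeneB\tgeneC",
    "geneB\tregulation\tgeneD",
    "geneE",
]
nodes, edges = parse_sif(example)
```

Passing an open file handle, for example `parse_sif(open("network.sif"))` with a hypothetical file name, works the same way because the function only iterates over lines.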
Supplementary Material
Supplementary Material