Supplementary Figure 1. Properties and preprocessing of input datasets. The row labeled ‘Genes’ shows the results of redundancy removal and orthology mapping between probe sets, with the overlap charted as a Venn diagram in the middle. The row labeled ‘Drugs’ shows the result of restricting the CMap data set to compounds tested in all three cell lines used here and how these overlap with compounds tested in rat liver (Venn diagram in the middle). Pie charts give the composition of the drug space profiled in each resource.
Supplementary Figure 2. Non-redundant drug-induced modules can be detected across a wide range of ISA parameter settings. The final non-redundant set of drug-induced modules from each cell line was compared to the whole set of raw drug-induced modules before redundancy removal (MCF7 n=2255, PC3 n=1661, HL60 n=1783). The raw modules were identified in multiple ISA runs that explored an extensive parameter grid (gene s.d. cutoff [from 5 to 2 in steps of 0.2], drug s.d. cutoff [from 4 to 1 in steps of 0.2]). The heatmap indicates the significance of gene overlap (Fisher’s exact test, negative log10 of the p-value, see color key) between each of the final non-redundant modules (columns) and the most similar module from a particular ISA parameter set (rows). In total, more than half of the final modules significantly overlapped (FDR-corrected p-value < 0.01) with raw modules from more than 100 distinct parameter settings.
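The gene-overlap score described above can be illustrated with a minimal sketch. It assumes a one-sided Fisher's exact test (the upper tail of the hypergeometric distribution) against a shared gene background; the function name `overlap_pvalue` and the toy gene labels are hypothetical and are not taken from the original analysis.

```python
from math import comb, log10

def overlap_pvalue(module_a, module_b, background_size):
    """One-sided Fisher's exact test for the gene overlap of two modules.

    Returns P(overlap >= observed) when both gene sets are drawn from a
    background of `background_size` genes (hypergeometric upper tail).
    """
    a, b = set(module_a), set(module_b)
    k = len(a & b)                      # observed overlap
    n_a, n_b = len(a), len(b)
    total = comb(background_size, n_b)  # all ways to draw module_b
    tail = sum(
        comb(n_a, i) * comb(background_size - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return tail / total

# Toy example: two modules sharing 3 of 3 genes in a background of 10 genes.
p = overlap_pvalue({"g1", "g2", "g3"}, {"g1", "g2", "g3"}, 10)
score = -log10(p)  # the heatmap in the figure reports this kind of value
```

For each final module and each parameter setting, the most similar raw module would then be the one with the smallest p-value.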
Supplementary Figure 3. Network of conserved drug-induced transcriptional modules (CODIM). Using a reciprocal best hit approach, drug-induced transcriptional modules across three cell lines and rat liver were linked to each other based on the significance of the overlap between their gene and drug members (FDR-corrected p-value < 0.01). Taken together, these modules covered a substantial fraction of the gene and drug space of the underlying expression data: on average 1,857 genes (20% of the input gene set) and 555 drugs (56%) per cell line, and for rat liver 1,587 genes (44%) and 311 drugs (90%) were represented within modules. All drug-induced transcriptional modules were numbered in the order in which they were identified by the ISA procedure.
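The reciprocal best hit linking can be sketched as follows. This is an illustration only: it assumes a single corrected p-value is already available for every pair of modules from two datasets (how gene and drug overlap are combined is not specified in this caption), and the function name and module labels are hypothetical.

```python
def reciprocal_best_hits(pvals, cutoff=0.01):
    """Link modules of two datasets that are each other's best hit.

    pvals: dict mapping (module_a, module_b) -> corrected overlap p-value.
    Returns a sorted list of (module_a, module_b) pairs below `cutoff`.
    """
    best_for_a, best_for_b = {}, {}
    for (a, b), p in pvals.items():
        # Track the partner with the smallest p-value for each module.
        if a not in best_for_a or p < pvals[(a, best_for_a[a])]:
            best_for_a[a] = b
        if b not in best_for_b or p < pvals[(best_for_b[b], b)]:
            best_for_b[b] = a
    return sorted(
        (a, b)
        for a, b in best_for_a.items()
        if best_for_b[b] == a and pvals[(a, b)] < cutoff
    )

toy = {
    ("A1", "B1"): 1e-6, ("A1", "B2"): 1e-3,
    ("A2", "B1"): 1e-4, ("A2", "B2"): 0.5,
}
links = reciprocal_best_hits(toy)
```

In the toy input, A2 prefers B1, but B1 prefers A1, so only the A1/B1 pair is linked.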
Supplementary Figure 4. Module conservation across model systems is significantly higher than expected for random independent biclusters. For each pair-wise comparison of 4 datasets (3 cancer cell lines and rat liver), the number of conserved modules was compared against the conservation between biclusters that were randomly generated, independently for each data set. Properties of 10,000 randomly generated biclusters resembled those of real ones (same distributions of gene members). In each pair-wise comparison between data sets, we observed significantly more conserved real drug-induced modules as assessed by their gene overlap (empirical p-value < 1E-4) than between independent randomly generated data sets.
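An empirical p-value of this kind can be computed as in the sketch below. It assumes the null distribution is a list of conservation counts obtained from random biclusters and uses the common add-one correction so that the p-value is never exactly zero; whether the original analysis applied this correction is an assumption, and the function name is hypothetical.

```python
def empirical_pvalue(observed, null_values):
    """Fraction of null values at least as large as the observed value.

    Uses the add-one correction: (exceed + 1) / (n + 1).
    """
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)

# Toy example: the observed count exceeds all 10,000 random values.
p = empirical_pvalue(12, [0] * 10_000)
```

With 10,000 null values and no exceedances, the result is 1/10,001, which falls just below the 1E-4 threshold quoted in the caption.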
Supplementary Figure 5. Assessing the impact of drug space on module conservation. The full set of treatments (990 drugs) in the CMap dataset was downsampled to smaller drug spaces comprising 10% to 100% (raw data) in increments of 10% (always using the same drug sample for all three cell lines). Five replicates were analyzed for each downsampled drug space. Drug-induced transcriptional modules were identified using exactly the same pipeline and parameters except for the number of seeds, which was reduced to 2,000 to reduce computation time. The total number of identified drug-induced modules and their conservation across cell lines and rat liver are displayed as stacked bar plots with error bars representing the standard error. These results indicate that a drug space of 200-400 drugs (~20-40%, assessed in three cell lines) appears to be a lower limit for robust conservation estimation.
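The downsampling scheme can be sketched as below. This is an illustrative assumption of how the drug subsets might be drawn: each replicate is a single random sample that would be reused for all three cell lines. The function name, the seed, and the placeholder drug labels are hypothetical.

```python
import random

def downsample_drugs(drugs, percents=range(10, 101, 10), n_replicates=5, seed=0):
    """Draw replicate drug subsets for each target percentage.

    Returns {percent: [subset_1, ..., subset_n]}; each subset is meant to be
    applied identically to every cell line.
    """
    rng = random.Random(seed)
    drugs = sorted(drugs)
    samples = {}
    for pct in percents:
        k = len(drugs) * pct // 100  # subset size for this percentage
        samples[pct] = [sorted(rng.sample(drugs, k)) for _ in range(n_replicates)]
    return samples

all_drugs = [f"drug{i}" for i in range(990)]  # placeholder labels
subsets = downsample_drugs(all_drugs)
```

Module detection would then be rerun on each subset, and the counts summarized per percentage as mean and standard error over the five replicates.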
Supplementary Figure 6. Functional coherence of transcriptional modules as benchmarked against STRING. Functionally related protein pairs were extracted from the STRING protein-protein association network for human and rat. Drug-induced transcriptional modules from human cell lines and rat liver were found to have a higher ratio of functionally associated gene pairs in comparison to random modules (1,000 gene sets with the same size and background). 30 out of 82 modules from human cell lines (35%), 12 out of 23 CODIMs (52%) and 13 out of 43 modules from rat liver (30%) clustered together more functionally related genes than expected by chance (at an empirical, permutation-based p-value cutoff < 0.05).
Supplementary Figure 7. CODIM2 is enriched in genes known to function in unfolded protein response (UPR) and cholesterol biosynthesis pathways. (A) The modulatory effect of CODIM2 on three branches of UPR signaling. Shown are genes known to be modulated by one of the key UPR sensors (IRE1, ATF6, PERK) and how their expression changes in response to the drugs contained in CODIM2 (enriched for antipsychotics). Overall mean fold changes ±SEM in PC3 module 14 (n=28) are given in parentheses: for IRE1 targets, HERPUD1 (1.75±0.11), EDEM1 (0.32±0.04), DNAJB9 (1.68±0.18), DNAJC3 (0.75±0.11), and HSPA5 (0.89±0.07); for the ATF6 branch, ATF6 (0.2±0.03), GADD45 (0.68±0.07), HSP90B1 (0.48±0.06) and XBP1 (0.79±0.06); for PERK targets, ATF4 (0.82±0.07), ATF3 (1.66±0.24), DDIT3 (1.92±0.14), DDIT4 (1.92±0.21), and TRIB3 (1.75±0.13). For genes with multiple probe sets, the one with the highest mean FC was selected. In contrast to reported effects of antipsychotics specifically on PERK signaling (Canfran-Duque et al, 2012), CODIM2 drugs were found to stimulate all branches of UPR signaling, although to a lesser extent in the ATF6 branch. Target genes were extracted from the literature (Canfran-Duque et al, 2012; Lee et al, 2003; Szegezdi et al, 2006). (IRE1, inositol required 1; PERK, PKR-like endoplasmic reticulum localized kinase; ATF6, activating transcription factor 6.) (B) The curated pathway of cholesterol biosynthesis as illustrated in WikiPathways (Kelder et al, 2012). Genes in this pathway that were also contained in CODIM2 are labeled in green.
Supplementary Figure 8. Negative controls for assessing anti-proliferative effects of CODIM1 predictions. Three predicted candidate cell-cycle inhibitors were chosen for validation experiments: sulconazole (anti-fungal) and vinburnine (vasodilator) were successfully validated, whereas validation of mephentermine (cardiac stimulant) failed (see Fig. 3 in the main text). Butoconazole and moxisylate (both present in CMap) were chosen as negative controls for validation experiments with results shown here.
Supplementary Figure 9. Cell cycle analysis using propidium iodide staining and FACS, with the FL3-A signal used to determine the cell cycle phases of cells. (A) Treatment of HL-60 cells with the reference compound nocodazole (200 ng/mL) resulted in G2/M arrest, most prominent after 24h of treatment. (B) Using “Flowing software”, the FL3-A signal indicating DNA content was used to assign cells to different cell cycle phases: subG1, G1, G2/M and endoR (endoreduplicated) (see Fig. 3 and Materials and Methods for details).
Supplementary Figure 10. Detailed results of experimentally tested drug-target interactions. Ten newly predicted drug-target interactions were experimentally evaluated with in vitro binding or cellular assays as indicated. Two drugs from the PC3-9 module, zaprinast (phosphodiesterase inhibitor) and raubasine (anti-hypertensive), were tested for PPARγ binding. Four drug repositioning candidates from the MCF7-9 module were screened for binding affinity to estrogen receptor alpha (nitrendipine, antihypertensive; theobromine, alkaloid of the cacao plant, present at high levels in chocolate; bendroflumethiazide, thiazide diuretic) or for an antagonistic effect in a cellular assay (dilazep, vasodilator). Predictions based on the HL60-17 module (hexetidine, anti-bacterial; (+)-chelidonine, alkaloid; vigabatrin, antiepileptic; podophyllotoxin, antiviral) were each tested against both ADRA2C and ADRA1B, as it was not possible to determine a priori from cell line data the exact target associated with the drugs from the HL60-17 module. Ki values were obtained for predictions with 40% or more activity at 50 µM concentration (nh: Hill coefficient).
Supplementary Table 1. Robustness of drug-induced modules to redundancy removal parameters. Redundancies among modules were eliminated within each cell line using the recommended standard procedure ISAUnique (R ‘eisa’ package). We used a correlation (cor.) threshold of 0.3. Subsequently, modules were subjected to a second round of redundancy removal based on gene overlap between modules (Fisher test, p-value < 1E-5, using the same priority as pre-defined by ISA). Additionally, we explored a wide range of parameters here (cor. threshold: from 1 to 0.1 in decrements of 0.1, and overlap p-value cutoffs of 1E-2, 1E-5, and 1E-10) and assessed the effect on the resulting non-redundant set of modules (percentages in parentheses indicate the coverage of the non-redundant set of modules described in the main text). In conclusion, the combination of these redundancy removal routines yields similar results across a broad range of parameter settings (shaded in gray) and is thus quite robust to the actual parameters used.
Supplementary Table 2. Robustness of constitutive co-expression module detection to the filtering parameter. To obtain a final set of drug-induced modules, we discarded those ‘constitutive’ co-expression modules in which 10% or more of the gene pairs were correlated (Pearson coefficient > 0.6) in untreated control samples. Here we additionally explored a wider parameter grid by varying the percentage of correlated gene pairs between 1 and 50% (see Table). When we assessed how many of the non-redundant modules filtered with these parameters overlapped with the final modules presented in the main text, we found that for any cutoff, the majority of final drug-induced modules (in most cases the vast majority) were consistently detected.
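The constitutive co-expression filter can be sketched as below, assuming control expression is available as one list of values per gene across untreated samples. The function names, the dictionary layout, and the toy values are hypothetical; only the two thresholds (Pearson coefficient > 0.6, 10% of gene pairs) come from the caption.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return sxy / sqrt(sxx * syy)

def is_constitutive(control_expr, genes, r_cutoff=0.6, pair_fraction=0.10):
    """True if at least `pair_fraction` of gene pairs exceed `r_cutoff`
    in untreated control samples."""
    pairs = list(combinations(genes, 2))
    if not pairs:
        return False
    correlated = sum(
        1 for g, h in pairs if pearson(control_expr[g], control_expr[h]) > r_cutoff
    )
    return correlated / len(pairs) >= pair_fraction

controls = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1]}
flag = is_constitutive(controls, ["g1", "g2", "g3"])
```

Varying `pair_fraction` between 0.01 and 0.50 corresponds to the parameter grid explored in the table.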
Supplementary Table 3 (xlsx). Comparison of gene and drug members of drug-induced modules across human cell lines and organisms (separate file).
Supplementary Table 4. Characterization of gene and drug members of conserved drug-induced modules (CODIMs), CODIM-associated modules of rat liver and selected cell line-specific modules linked to a certain MOA. Functional enrichment of genes (biological process) and MOA enrichment of drugs were labeled in green if identified in the automated annotation process. For drug annotation, manually curated drug information was additionally gathered for CODIMs (colored in orange). Moreover, we checked the consistency of gene and drug annotations and categorized these associations in terms of their agreement with the current literature: green labels for previously reported associations, yellow for less obvious cases with poorly characterized drug mechanism of action, and grey for novel associations for which no clear understanding could be gathered from the literature (cases without detectable and interpretable enrichment in white).
Supplementary Table 5. Detailed information on the characterization of gene and drug members of unified conserved drug-induced modules (CODIMs), CODIM-associated modules of rat liver and selected cell line-specific modules linked to a certain MOA.
Supplementary Table 6. Experimental results of the RNAi screen for selected genes from CODIM2. We found that siRNA-mediated knock-down of 10 out of 24 poorly characterized genes (not previously linked to cellular cholesterol regulation) had a significant effect (labeled in orange) in the filipin and/or the DiI-LDL uptake assays, indicating an inhibitory (yellow) or stimulatory (blue) role. An effect was deemed significant if 2 or more siRNAs had a consistent effect with absolute (average) Z-score > 1 and FDR-corrected p-value < 0.01.
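The significance criterion can be sketched as follows. The interpretation of "consistent" used here (at least two siRNAs pass both thresholds in the same direction and none pass in the opposite direction) is an assumption, as are the function name and the direction labels, which are deliberately not mapped to the inhibitory or stimulatory roles shown in the table.

```python
def call_hit(sirnas, z_cutoff=1.0, fdr_cutoff=0.01, min_sirnas=2):
    """Classify a gene from its per-siRNA results.

    sirnas: list of (average_z_score, fdr_corrected_p) tuples, one per siRNA.
    Returns "increase", "decrease", or None if the criterion is not met.
    """
    up = sum(1 for z, q in sirnas if z > z_cutoff and q < fdr_cutoff)
    down = sum(1 for z, q in sirnas if z < -z_cutoff and q < fdr_cutoff)
    if up >= min_sirnas and down == 0:
        return "increase"
    if down >= min_sirnas and up == 0:
        return "decrease"
    return None

result = call_hit([(1.5, 0.001), (2.0, 0.005), (0.2, 0.5)])
```

A gene with only one passing siRNA, or with passing siRNAs pointing in opposite directions, is not called a hit under this reading.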
Bundled supplementary materials file with captions and references (PDF document).