High-grade gliomas are essentially incurable, with a median survival of less than 20 months for glioblastoma multiforme (GBM). Standard-of-care treatments aggressively combine maximal surgical resection, radiation, and chemotherapy, but all typically fail due to the complexity of both the disease and the brain itself (1). Even the more novel targeted treatments, including immunotherapy, have yet to show notable progress (2). Experimental and clinical studies indicate that tumor cell population heterogeneity, its spatiotemporal dynamic adaptation, and cellular-level phenotypic plasticity are key, interrelated contributors to treatment failure (1, 3). Understanding and modulating these adaptive mechanisms is thus a critical step toward improved treatments and patient outcomes.
A fundamental but still poorly understood example of GBM cellular state plasticity is the apparent “Grow” versus “Go” switch between relatively exclusive phenotypes of cell cycle progression and directed motility (4). Standard-of-care treatments target proliferating cells, i.e., the Grow phenotype, but GBM tumor cells may resist the treatment by transitioning to Go or “Dormant” phenotypes that enable widespread dissemination and eventual recurrence (5). In addition to the phenotype transition capability, a GBM intrinsically has a very complex Go phenotype. Numerous studies have shown that diverse mechanisms of invasion can be activated in GBM, depending on the cell of origin (such as stem-like or tumor cell), genomic alteration state, and microenvironment stimuli [such as hypoxia or stiff extracellular matrix (ECM)] (6–9). However, all clinical trials that have targeted these experimentally developed mechanisms have been unsuccessful in controlling the motile GBM cells in patients (1). A likely limiting factor was the presumption that cells intrinsically have only a few restricted mechanisms to induce invasion, with further assumption of a fairly static behavior for those phenotypes.
In this study, we argue that the Go phenotype and phenotype transition map are more complex than what has been proposed thus far. We hypothesize that each GBM cell already has the ability to use any of several invasion mechanisms. Many of these are innate, being available during developmental or tissue repair programs, and are reacquired and adapted through epigenetic and other alterations during initial cancer development. From this perspective, there are many possible Go phenotype states per cell; however, they may not share the same phenotype transition probability. That is, even if all GBM cells in a given patient were synchronized to a Go phenotype, these cells would still vary in their susceptibility to different treatments, depending on their type of Go state. We further argue that the most clinically relevant tumor cell attribute is not motility in a narrow, current sense, but rather the propensity to reach or remain in a motile phenotype.
We accordingly developed new integrative analyses, combining dynamical modeling and patient data analyses. These probe the structure and breadth of phenotypic states to address essential limitations of more narrowly focused experimental studies. By interrelating the many potentially accessible determinants of GBM cell states that lead to different phenotypes [motility (Go), proliferation (Grow), dormancy, and apoptosis], we provide an extensible, generalizable analysis framework for more robust targeting of GBM recurrence. GBM research has emphasized transitions between Go and Grow states (4). Cell death and dormancy (as an absence of active proliferation or motility) were chosen to complete a foundational set of mutually exclusive, general phenotypic states accessible to a GBM tumor cell. More granular states are possible and are captured, in part, by the stable (attractor) state space generated by our model. We specifically assess a Go state propensity or stability measure, which we term p_Go, with survival implications. This analysis integrates diverse regulatory mechanisms to estimate the likelihood of reaching or remaining in the Go state, under physiologically plausible perturbations of external stimuli, such as mitogens and oxygen. Our analyses show that motile tumor cells that partially share pathway activity states may still differ substantially in their predicted response to external stimuli and ultimate phenotype transition. Moreover, the initial, protein-level cellular state of a tumor cell also plays an important role in some cases, as certain input stimuli can induce alternative tumor states depending on such initial conditions. Highly stable motile states, even if rare, still have the potential to drive GBM recurrence in the context of a large tumor cell population. Our simulations show that transitions between Go and Dormant states are particularly likely, relative to the more discussed Go-Grow switches, and may also enable recurrence. These transitions are notably driven by stimuli of the brain and tumor microenvironment and would require the integrative framework that we have introduced to effectively target and modulate. In particular, we show that p_Go is not a simple correlate of a more transparent principal components analysis (PCA)–derived expression measure that is reflective of established GBM transcriptional subtype status. In estimating patient survival probability, p_Go and related phenotype stability measures provide complementary information beyond GBM tumor subtype and age.
In this section, we outline details of our model and its resulting landscape of attractors representing accessible tumor cell states. To facilitate comparison with available GBM patient tumor sample expression data, these attractors are mapped to both phenotypes and reduced pathway-level state representations. We term the propensity to reach and remain in a phenotype after input perturbations p_phenotype or phenotype stability. We focused on p_Go; however, the stability of other phenotypes can also be calculated. Starting with homogeneous tumor subsets, made available by the IVY-GAP dataset (10), we computed sample-specific p_Go values. We then deconvoluted the more complex mixtures characteristic of the TCGA GBM tumor sample expression using the IVY-GAP as a template set and derived the heterogeneous sample p_Go values from the decomposition. We further examined the association between p_Go and patient survival, and with several key molecular processes. Our source code is freely available.
Modeling GBM phenotype dynamics from a high-dimensional mechanistic system view
To describe and integrate the most current understanding of GBM tumor cell plasticity, concerning growth, dormancy, apoptosis, and motility regulation, we developed a high-dimensional mechanistic model in the form of a Boolean network (11). Boolean network representations simplify reality. But they offer a practical first step toward dynamical, mechanistic models for systems whose kinetic parameters have not been determined. These include the determinants of GBM cell fate, and most other areas of tumor biology. The binarization of non–switch-like network element states remains an actively studied area, with no consensus as far as preferred methods. Still, different approaches have yielded experimentally validated results (12–14), even for systems likely to have non–switch-like components. In part, this may reflect the fact that many such components may be approximated as requiring an activity level sufficient to induce downstream process activity, which may be rendered more switch-like through integration of feedback-stabilized components. Where these qualitative models pass tests of sensibility, they could at least suggest components that are more critical to the dynamics and stability of key processes. These could focus experimental studies to resolve details required for more quantitative models (15–17).
Four key features are included in our network structure. First is the distinction between external input stimuli, serving as bifurcation parameters, and internal signaling machinery (Fig. 1, A and B, and table S1), where both can affect phenotype transition. Second, GBM cells are highly stimulated by the heterogeneous brain environment. Thus, our approach includes three fundamental GBM spatial environments: brain parenchyma (BP), perivascular space (PS), and white matter (WM) tracts (7). Third, several experimentally validated mechanisms were elaborated only in specific GBM cell types. By situating these within a broader GBM molecular network, we enabled discovery of additional determinants that may affect their impact across a wider range of GBM tumor types. Fourth, we make explicit the fact that a single phenotype may well be driven by distinct attractors and their associated stable gene expression/pathway activity states. Together, our GBM network includes 77 nodes (10 inputs, 67 variables), 139 interactions, and 790 feedback loops (see fig. S1 and all detailed interactions, mechanisms, and their references in table S1). The main molecular mechanisms are described below.
Determinants of the Go phenotype include components leading to directed motility, beginning with ECM-integrin interactions and activation of focal adhesion kinase (FAK) and the RHO family guanosine triphosphatases (GTPases) (RHOA, CDC42, and RAC1) (7). The Grow phenotype–related model components include elements of the mitogen-activated protein kinase (MAPK) pathway and components critical to cell cycle progression (18). Cellular states are certainly not limited to Go and Grow. Outside this dichotomy are non-motile quiescent states (Dormant), as well as states leading to programmed cell death (“Apoptosis”). To address these, we model critical elements of the PI3K/AKT, WNT, NFKB (nuclear factor kappa B), and p53 pathways associated with cell cycle control and cell survival decisions (13, 18–22).
Within this fundamental framework, we also overlay experimentally established, glioma-specific motility switches and pathway crosstalk. Hypoxia is a well-established inducer of GBM tumor cell motility, often beginning with an organized front of invasive, pseudopalisading cells advancing beyond an oxygen-depleted, necrotic tumor core (23). We represent the interactions leading from hypoxia to cellular motility, beginning with stabilization of hypoxia-inducible factor (HIF) transcription factors (TFs), followed by induction of cell cycle arrest through CDKN1A (p21) and activation of EMT (epithelial-mesenchymal-transition) TFs ZEB1 and TWIST1. Interactions of these regulators with motility effectors such as fibronectin and RHOA are additionally detailed. We also connect these hypoxia-related pathways with prosurvival NFKB signaling, which can sustain cells escaping adverse, hypoxic environments (20).
A more recently elaborated motility switch involves the CNS (central nervous system)–specific TF OLIG2, whose phosphorylation status has been shown to regulate transitions between proliferative and motile states in GBM stem-like contexts (6, 24). Our model includes complex feedback loops with other genes and pathways, such as TGFB (Transforming growth factor beta) pathway signaling through the SMAD2/3/4 complex, the EMT-related TF ZEB1, MMP2/ECM remodeling, cyclin-dependent kinase (CDK) inhibitor activity (p15/CDKN2B, p21/CDKN1A), and MYC inhibition. It also includes feedback loop–sustained platelet-derived growth factor (PDGF) pathway activation and repression of the cell cycle inhibitor p21/CDKN1A by phosphorylated OLIG2. This phosphorylation requires three distinct kinase types, CDK1/2, GSK3B, and CK2 (24), which we indicate, together with their known regulators in the context of GBM.
In addition to TGFB-driven motility, we include motility drivers specific to particular spatial environments. In the PS, extracellular bradykinin acts through a G protein–coupled receptor to increase intracellular Ca2+, leading to Ca2+-dependent efflux of Cl− and K+ through ion channels. This leads to water loss from the cell by osmosis and consequent volume reduction, which facilitates movement amidst astrocytic end feet and other local barriers in the PS (9). Along WM, a range of axonal guidance molecules are known to direct movement of neural precursor cells during development. GBM tumor cells co-opt some of these mechanisms, and we included a representative, experimentally established pathway leading from ephrins B1 and B2 to directed motility, via activation of the EphB2 receptor, FAK, and RHOA (25).
Phenotype transition is induced by spatial brain and tumor microenvironment stimuli
To explore how the spatial brain and tumor microenvironment stimuli (inputs) affect stable cell behaviors (i.e., attractors) and are integrated by all mentioned mechanisms, we simulated phenotype dynamics given various input conditions. Simulations from 320 spatial context–associated input states yielded 272 distinct attractor states. Each attractor state is associated with one of the four major phenotypes (Grow, Go, Dormant, or Apoptosis) (table S2). Note that certain attractors can be reached from multiple initial input states. Moreover, most attractor states are reached exclusively from inputs within one spatial environment (Fig. 1C). Most input states—284 of 320—lead exclusively to a single attractor state. For the remainder, only two possible attractor states are reached. A higher proportion of inputs (31.3%) lead to Grow or Apoptosis phenotypes in the PS environment (versus 15.6% in BP and WM), whereas inputs leading to Dormancy, rather than growth or motility, are far more prevalent in the BP (70.3% versus 40.6% in WM and 25% in PS) .
To assess how the specific cell’s initial protein-level cellular state may affect phenotypic convergence, we simulated 10,000 initial conditions for each input state. These initial conditions fixed the input state, but sampled alternative cellular network states. Simulations from all input cases yielded a predominant attractor, reached from >95% of all sampled initial conditions. Rare cases (36 of 320, 11.25%) of a single input state allowing two possible attractors, depending on alternative starting cellular network states, were observed. These attractors often shared the same phenotype, but in a few cases, attractors with distinct phenotypes were accessible. In particular, from eight input state vectors, a Go attractor state was predominant, with a less frequently reached Dormant state observed in the remaining cases. No Go-Grow switch based only on initial cellular state was found to occur (Fig. 1D). A Go-Grow phenotypic switch can only occur due to changes through the external inputs. An example of a phenotype switch due to external input changes is given in Fig. 1E. For a full map of converged attractors based on the different input states, see table S2.
Drug treatments may change tumor cell behavior by perturbing the function of key molecular elements. We explored the impact of several representative examples of experimentally and clinically relevant drugs for GBM on the predicted cell phenotype dynamics. These included CK2 (26), epidermal growth factor receptor (EGFR) (27), MAPK kinase (MEK) (28), SRC (29), TGFBR (30), the combination of MEK and TGFBR (28, 30), and the combination of RAF and CK2 (31, 32). Details of the targeted network analyses and results are listed in file S2 and tables S2 and S3. Several notable results demonstrate the stability of the phenotype transition map due to internal network perturbations, which simulate functional inactivation due to drug treatment or gene mutation. For instance, in the MEK-targeted network and the combined RAF- and CK2-targeted network, a single input state could lead to alternative attractors with distinct phenotypes: a Go-Grow switch in the PS environment, and a Dormant-Grow switch in WM and BP environments. As mentioned, these switches did not occur in the untargeted system. Moreover, there are cases of convergence to novel attractors due to internal network perturbations. We provide a detailed example that includes complex dynamics for the intrinsic and targeted networks (see files S1 and S2).
Phenotype stability due to dynamic microenvironmental stimuli (p_Go)
The model and the presented attractor state space focus on protein-level components whose mechanistic behavior is typically analyzed in experimental studies. Cellular states in patient tumor samples are more practically assessed at the level of mRNA expression. To bridge the gap between our literature-based model and translationally relevant patient data, we mapped model-associated attractors (with 67 elements) to more concise pathway-level binary attractors (with 14 pathways) (Fig. 2A), applying direct rules indicated in table S4. The use of pathway-level attractors invariably leads to some compression of the model cellular state space, as we go from 272 distinct full-length attractors to 38 distinct pathway-level attractors (table S4). Some pathway-level attractors were associated with both Go and Dormant full-length attractors and are termed “Go-Dormant.” However, the pathway representation generally preserved phenotypic distinctions, as most pathway-level attractors matched full-length attractors of a single phenotype (see table S4).
The Go phenotype is known to drive GBM recurrence. In view of the cellular state plasticity captured in our model, we hypothesized that the most relevant determinant of tumor cell aggressiveness is not simply being in a Go state. Instead, it is the likelihood of either transitioning to or remaining in the Go state, under modest perturbation of the input state, as cells move in brain space and the tumor changes its environment. The precise estimation of this Go state tendency/stability measure, which we term “p_Go,” is described in Materials and Methods and is presented schematically in Fig. 2A. For non-Go attractor states, p_Go is the probability of reaching the Go state with small input changes. For Go states, it is the probability of remaining in Go under such changes. Similar phenotypic tendency/stability measures (which we will refer to more succinctly as stability measures) can be computed for the remaining general phenotypes, as shown in Fig. 2B . The resulting values are shown in table S4, which indicates a broad range of p_Go (0 to 0.65). Strictly Dormant- or Grow-associated pathway attractors tend to have lower p_Go values, whereas Go-associated states are at the opposite extreme. But among Go-associated states, there is still a wide range of p_Go, from 0.36 (pathway states 26 and 34) to 0.66 (pathway states 9 and 20). Our results thus predict that certain Go states are inherently more stable to perturbation of stimuli than others, and may thereby confer greater aggressiveness to associated tumor cells.
Matching spatially homogeneous GBM samples with p_Go-associated attractor states
Having derived a set of model-based, pathway-level attractors representing GBM tumor cell states, we then set out to match these in tumor sample gene expression data. Bulk sample studies, such as the TCGA, average gene expression from a complex mixture of tumor and non-tumor cell types. To initially target a more homogeneous set of samples, we selected location-stratified gene expression data from the IVY-GAP study (10). These data were derived from tumor samples microdissected from defined GBM spatial environments, including core cellular tumor (CT), perinecrotic zone (PZ), infiltrating tumor (IT), and leading edge (LE) (Fig. 3A). We mapped IVY-GAP tumor sample expression to attractor-associated pathway activity. Applying an approach elaborated in Materials and Methods, we then matched 89 of 270 IVY-GAP samples to 16 of 38 model-based attractors, and thereby assigned each one a p_Go value (table S5). To address the question of how representative the selected 89 samples are, particularly with respect to location, we clustered all sample pathway activities within each location and mark attractor-matched samples (in red, in Fig. 3B). All locations, except microvascular proliferation, are well represented. These attractor-matched samples span the set of IVY-GAP locations, and their assigned p_Go values broadly align with expected, location-associated phenotypes. In particular, IT and LE samples have a higher p_Go range relative to other location-matched samples, such as those from the growth-focused core CT (Fig. 3C). PCA projection of the attractor-matched IVY-GAP sample expression data shows a clear separation by location, providing further corroboration of the association between high p_Go and infiltrative tumor contexts. The pathway attractor–matched IVY-GAP samples can be assigned the phenotype of their associated attractor. By this analysis, 11 of 12 IT samples and 18 of 18 LE samples are in Go/Dormant, 11 of 25 CT samples are in Grow, and 11 of 11 PZ samples are in Apoptosis (Fig. 3, C to F, and table S5). We examined in detail all four phenotype stabilities. Phenotype stability of pathway-level attractors is given in Fig. 2B, and phenotype stability by location is given in Fig. 3E, for stem and non-stem cells. It is interesting to note that our analysis reveals an inverse relationship between p_Apoptosis and p_Go in IT and LE locations. Figure 3F shows that IVY-GAP samples from hyperplastic blood vessels, microvascular proliferation, PZ, and pseudopalisading cells are matched to apoptosis-associated attractors. From the IVY-GAP study site, one can extract the expression of apoptosis-related genes and observe higher expression in the aforementioned tumor regions (fig. S6). These tumor regions are well documented in the literature with higher apoptotic activity due to the connections between vascular pathology, hypoxia, and angiogenesis and their relation to survival and invasion (33–38).
P_Go association with patient survival and tumor biology
We next assigned p_Go values to more complex, bulk GBM tumor samples. We hypothesized that such samples should include a mixture of canonical, location-associated GBM subtypes. We accordingly reconstructed their expression using the attractor-matched and p_Go-assigned IVY-GAP samples as an expression template set. We worked with two GBM tumor sample expression datasets, GSE16011 (39) and the TCGA GBM study set (40, 41), each including matched patient survival data. The former study set was used for methodology development, whereas the latter TCGA set was used for model validation. In both cases, each tumor sample expression profile was deconvolved with respect to the expression profiles of the attractor-matched IVY-GAP samples. This reconstructed the target sample as a positive coefficient-weighted sum of the template samples. A p_Go value for the target sample was then derived by computing a corresponding, relative match strength-weighted sum of the template sample–specific p_Go values, with certain details elaborated in Materials and Methods and full results in table S7. Higher p_Go-associated samples would be expected to exhibit a more aggressive phenotype, and a significant survival difference was evident when tumor samples were stratified into low and high p_Go subsets (Fig. 4A).
To consider mechanistic bases for p_Go, we correlated this with gene expression across the TCGA GBM samples, obtaining an ordered vector of gene-specific p_Go versus expression correlations. With these, we performed gene set enrichment analysis (GSEA) (42) with respect to the REACTOME pathways, as described in Materials and Methods. Negative and positive normalized enrichment scores (NESs) respectively indicate pathways up-regulated when p_Go is low and high. The strongest associations are shown in Fig. 4B (see also table S8 for full results). With low p_Go values, we observe activation of several cell cycle– and DNA replication–related pathways, consistent with a relatively reduced tendency to transition to a Go state from most Grow-like states.
In cases with high p_Go values, we observe a broader set of pathways. In considering these, it is important to note that p_Go represents a tendency to remain in or transition to a Go state, rather than that of simply being in such a state. The association of the electron transport chain–related pathways with high p_Go is consistent with studies showing that more invasive, slow-cycling GBM tumor cells favor mitochondrial oxidative phosphorylation for their energy needs (43). Several overlapping immune response–related pathways are also associated with higher p_Go values. The “REACTOME_CHEMOKINE_RECEPTORS_BIND_CHEMOKINES” pathway association is particularly driven by expression of the chemokine/receptor combination CXCL16/CXCR6 (see table S8 summary of leading-edge gene sets). Glioma cell–released CXCL16 has been shown to directly promote tumor cell migration and polarize microglia/macrophages toward an anti-inflammatory/protumor phenotype, which drives proinvasive microenvironmental changes (44).
Further support for an association between p_Go and a proinvasive tumor microenvironment is provided by specific p_Go versus gene expression correlations (Fig. 4C). These include those with respect to CCL2 (Spearman rho = 0.441, P = 2.68 × 10−9) and GDF15/MIC-1 (Spearman rho = 0.201, P = 9.59 × 10−3), which encode molecules secreted by glioma stem cells to recruit and drive polarization of protumor/invasive M2 macrophages (45). Concordantly, p_Go is also correlated with expression of M2 macrophage–specific cytokines IL23A (Spearman rho = 0.262, P = 6.65 × 10−4) and IL10 (Spearman rho = 0.29, P = 1.50 × 10−4) (45). These associations are noteworthy, given their putative origin in minor subpopulations of the bulk tumor sample.
Last, we examined the distribution of p_Go within established GBM subtypes (Fig. 4D). GBM gene expression subtypes were described by several studies (46, 47) and agree on three subtypes designated as proneural, mesenchymal, and classical. Still, there is no clear association between these transcriptional subtypes and clinical outcome, in part because tumor heterogeneity gives rise to bulk tumors including several subtypes (48). We accordingly derived the new Go stability measure through deconvolution with respect to more homogeneous samples (from the IVY-GAP dataset with spatial information). Each bulk tumor can also be assigned the previous transcriptional subtype description. These analyses show that proneural samples tend to have lower p_Go values than classical and notably invasive mesenchymal samples. The p_Go distributions, particularly of the classical and mesenchymal subtypes, overlap substantially, consistent with the p_Go’s integration of a broader, dynamically modulated set of features favoring the Go state.
Although we show that higher p_Go values correspond to samples from mesenchymal and classical subtypes (Fig. 4D), one may wonder if p_Go brings any new information or is a redundant measure. We note that while the IVY-GAP PC1 gene expression weighting does separate, to an extent, mesenchymal, classical, and proneural TCGA GBM samples, p_Go is not a mere correlate of this measure (Fig. 4E). Moreover, in contrast with p_Go, mesenchymal and classical TCGA GBM samples do not show a significant survival difference when compared to proneural TCGA GBM samples (Fig. 4F). If p_Go and increased Go phenotype propensity/stability contribute to a more aggressive phenotype, somewhat independently of subtype, one would expect to see a survival difference with p_Go-based stratification within subtype cohorts. A significant survival difference is evident when putatively more invasive mesenchymal and classical TCGA GBM samples are stratified into low and high p_Go subsets (Fig. 4F). In a multivariate Cox proportional hazards model, including age and subtype, p_Go (and similarly defined p_Phenotype measures) adds complementary information to age and subtype (fig. S7). Thus, p_Go and the entire phenotype transition map include additional, complementary information that cannot simply be explained by previous work.
Numerous studies have revealed molecular mechanisms enabling certain types of GBM tumor cells to transition between proliferative (Grow) and motile (Go) phenotypes, or even to Dormancy. These necessarily focus on specific cell types and contexts, such as glioma stem cells in associated microenvironmental niches (6, 24). Although valuable, they still have not been integrated into a larger study of glioma cell growth and motility to account for known redundant, compensatory mechanisms and complex brain environmental stimuli that affect system dynamics. In actual patients, adaptive, motile tumor cell states emerge amidst a balance of potentially conflicting inputs, integrated by dynamic, feedback-stabilized molecular networks, all subject to stochastic fluctuations that would be next to impossible to mimic in experimental settings.
Our analysis captures these translationally critical features to reveal a “GBM attractor landscape” of potential molecular tumor cell states, focusing on the main known phenotypes of Go, Grow, Dormancy, and Apoptosis. By integrating leading current mechanistic studies, we reveal plausible emergent properties of clinical relevance. In particular, we found that most external input states invariably lead to a single tumor phenotype, but certain inputs strongly favor a dominant phenotype, while allowing an alternative depending on random fluctuations in the initial protein-level cellular state. Even if a highly motile state can emerge only within very restricted environmental conditions, this is potentially sufficient to drive clinical recurrence in the context of a large tumor cell population. These types of probabilistic descriptions more generally provide a bridge between our model’s single-cell “state choice” focus and clinically observed, population-level behaviors. They also enabled our formulation of the Go phenotype stability (p_Go), which represents the propensity of a tumor cell state to reach or remain in the motile Go state under physiologically likely input state perturbations. While molecular profiling can indicate currently proliferative or motile tumor cell types, the key attribute to understand and ultimately modulate is the dynamic stability of the Go state. Our work predicts that not all overtly motile states are equal in this most translationally relevant sense.
Higher p_Go values should increase tumor aggressiveness and recurrence, and are associated with reduced patient survival, as we show in this study. In the context of these survival probability estimates, our phenotype stability measure adds complementary information, relative to the known GBM tumor subtypes and age. Moreover, we associated higher p_Go states with several molecular pathways and processes. These include tumor cell intrinsic up-regulation of oxidative phosphorylation, shown to favor motility (43), as well as expression signatures of immune subpopulations known to remodel the tumor microenvironment to facilitate invasion. These molecular correlates support our broader view of GBM invasiveness, as p_Go is defined by a dynamical model-derived measure integrating diverse features that shape the propensity for tumor cell motility, given a specific brain and tumor environment.
It is important to note the clinically consequential likelihood of dynamic transition between Dormant and Go phenotypes. We highlight, more generally, the importance of assessing dynamic tumor behavior within its full potential map of phenotype transitioning, due to input and/or internal system perturbations. A larger contribution of this study is its extensible and more broadly applicable framework for combining high-dimensional mechanistic modeling and patient data–driven analyses. We also detail an essentially general methodology for uniform pathway state representation, matching of model-based and observed tumor expression states, and deconvolution of multi-subpopulation–averaged expression states. Together, these provide a new analysis framework for further development in GBM, enabling assessment of new targets within the tumor-brain environment (49, 50), based on their impact on GBM tumor phenotype plasticity and redundant mechanisms.
MATERIALS AND METHODS
The methods of our study can be divided into two main parts, as they integrate a theory-driven method with a data-driven method using GBM patient datasets. First, we elaborate on the phenotype transition model, which includes details of model construction, simulations to identify attractors, attractor phenotypes, phenotype map, and phenotype stability. Second, we describe the integrated patient data analyses used to validate our model. These include several steps, with the end goal of assigning phenotype stability values to the complex mixtures characteristic of TCGA GBM tumor sample expression. Together, these analyses allow us to examine the association between theoretically formulated phenotype stability and patient survival. We also connect phenotype stability with patient gene expression data and several key molecular processes. Throughout the study, we attempt to use standard, well-established bioinformatics approaches in an effort to increase the reproducibility of the results and the generality of our framework.
Seven types of data and other resources were used in this study. First, 62 research articles and leading reviews were used to identify experimentally validated mechanisms and interactions. Those used in support of specific Boolean network model rules are indicated in table S1. Second, model simulation data, both with and without drug perturbation, were used for model validation, steady-state (attractor) discovery, and derivation of theoretical phenotype distributions. Third, the Broad/MSigDB Hallmark Pathway and other gene sets were used to map tumor sample gene expression to theoretical attractor-matched pathway activity. Hallmark gene sets summarize and represent specific well-defined biological states or processes and display coherent expression. These gene sets were generated by a computational methodology based on identifying overlaps between gene sets in other MSigDB collections and retaining genes that display coordinate expression (51). Fourth, the spatially homogeneous IVY-GAP tumor sample expression dataset (10) was used for matching with theoretical attractor states, and also for deconvolution of more heterogeneous TCGA GBM tumor samples. The IVY-GAP dataset includes RNA sequencing (RNA-Seq) profiles for 270 laser-capture microdissected samples from 42 GBM tumors, sampled from distinct anatomical regions of the tumor. They were downloaded from the Allen Institute’s Ivy Glioblastoma Atlas Project 5 (IVY-GAP). The expression data are represented as fragments per kilobase of transcript per million reads mapped (FPKM) values and further adjusted with TbT normalization. One hundred and twenty-two samples from the Anatomic Structures (AS) study were analyzed. The locations and number of samples are as follows: CT (30 samples), LE (19), IT (24), PZ (CTpnz, 26), and pseudopalisading cells around necrosis (CTpan, 24). Fifth, two GBM tumor sample expression datasets were used, GSE16011 (39) and the TCGA GBM study set (40, 41), each including matched patient survival data. Sixth, the REACTOME pathways (52) were used for GSEA (42). Seventh, a previously derived mapping of TCGA samples to the three GBM tumor-intrinsic transcriptional subtypes (proneural, mesenchymal, and classical) (46, 47) was used to evaluate the subtype-specific p_Go phenotype stability measure distributions.
Step 1. Dynamical model
Boolean network. We developed a Boolean network model to integrate established cancer biology and GBM-specific mechanisms and ultimately characterize the distribution and stability of converged phenotypes. Boolean network models provide a qualitative description of a dynamical system, where each node can exist in either an active (“on”) or inactive (“off”) state (11). Activity in this context implies reaching a threshold necessary for biological function, and a binary value represents the state of a node. An extended node state vector represents the overall state of the system at a given time point. Node update rules, illustrated in Fig. 1B, indicate each node’s state as a logical function (Boolean rule) of the states of its network input (“parent”) nodes. A Boolean rule (function) is usually expressed via the logic operators AND, OR, and NOT. Collectively, these update rules allow precise simulations from given starting states. The simulations ultimately settle in stable “attractor” states, which, in appropriately constructed models, match observed cellular states and phenotypes (53).
Network structure. The structure of the network, as nodes, edges, and Boolean rules, is detailed in table S1, together with supporting mechanistic studies. Together, our GBM network includes 77 nodes (10 inputs, 67 variables), 139 interactions, and 790 feedback loops (see fig. S1 and table S1). At the level of individual rules, we used AND to combine elements operating in a complex or otherwise shown experimentally to be required strictly together. We used OR for components or processes that could be driven by one of several entities. Certain behaviors were expected. Given that we are dealing with Boolean approximations indicative of activity above a threshold, we developed the model to ensure that states like Go, Grow, and Apoptosis are mutually exclusive. We also aimed to match the Go-Grow switching described in the GBM literature so that targeting of nodes leading to cell cycle activity increases the likelihood of motile Go attractor states. After representing fundamental machinery for cell proliferation, motility, and death, we aimed to elaborate at least exemplars of known processes leading to GBM motility (as outlined in the initial results section). We finally aimed to connect these processes with one another, and the more fundamental model components, while preserving dynamic behaviors described in the literature and expected under particular node perturbations.
Spatial brain and tumor microenvironment stimuli (inputs). Ten nodes represent inputs, including mitogens (EGF and PDGF), other key pathway drivers (TIMP and WNT canonical, affecting TGFB and WNT signaling), and various attributes of the internal and external environment (DNA damage, oxygen, bradykinin, ephrin B1/B2, stiff ECM components, hyaluronan). The brain environment is far from homogeneous. Through specific input parameter settings, our model captures three fundamental spatial environments: the BP, the PS, and the WM. Note that input states are length 10 binary vectors with all possible combinations of input node settings, up to constraints on spatial environment–specific inputs. In our presentation, we use the term “inputs” to refer to these input state vectors and “input node(s)” to refer to their specific components. Under the described spatial constraints, there are 7, 6, and 7 free binary input nodes in the BP, PS, and WM, respectively. These give rise to 27 + 26 + 27 = 320 distinct inputs, from which the resulting attractor state properties can be evaluated.
Simulations to reach attractors. Using the integrative, dynamic network model outlined above, we performed simulations. The complete state of the network can be formally described by a row vector (x, y) = (x1, …, x10, y1, …, y67), where xi, yi ∈ [0, 1] indicate the states of input and non-input network nodes, respectively. As noted above, some of the xi are constrained within particular spatial environments, giving rise to 320 distinct input state vectors, each associated with a specific environment.
For each input state, we ran the model 10,000 times, starting in each case from a different, random initial condition, i.e., setting of the non-input nodes representing the binary, largely protein-level network states. Running the model entails selecting, at each time step, a random non-input node yi and updating its value, based on its Boolean function and the current values of its function-specific input nodes. This process, described in the literature as “asynchronous updating” (52), is repeated until a fixed, steady-state, or attractor yk is reached (where the index k denotes one of a set of accessible attractors). Concretely, this attractor is a vector of binary network node values, which remains the same no matter which node update rule is selected and run. The mapping from input states to reached attractors is in table S2.
Matching attractors to phenotypes. Each attractor state is associated with a single high-level phenotype selected from Grow, Go, Dormant, or Apoptosis, based on the steady-state activity of cell cycle (“Cell_Cycle”), motility (“Directed_Motility”), and apoptosis (Apoptosis) nodes. These states are mutually exclusive, based on the model structure and dynamic attributes.
Pathway-level attractor representation. To facilitate comparison with tumor sample expression data, we mapped model-associated attractors (N = 67) to more concise pathway-level binary attractors (N = 14). The selected pathways were chosen from the Broad/MSigDB Hallmark Pathway, BioCarta, and Pathway Interaction Database sets to cover key features of attractor states, particularly those critical to attractor phenotypes. Indicated in table S4 (sheet 1) are the pathways and their state assignment based on specific attractor component states. The resulting pathway-level attractor set is also provided in table S4, sheet 2.
Phenotype stability. We define phenotype stability as the likelihood of transitioning or remaining in a phenotype, under perturbations of the input state. This stability measure could, in principle, be calculated for full-length (mostly protein-level) attractors. However, because the end goal is to work with patient data, the pathway level was found to be more translationally accurate. Thus, here, we focus on describing phenotype stability with respect to pathway-level attractors. For each pathway-level tumor cell state attractor, we can work backward to its associated set of input states (via the full-length attractor states from which the pathway attractor state is derived). Each pathway steady-state associated input state set can then be expanded to include nearby input state vectors. For our main analyses, we expanded the input state set to also include vectors that only differed at one input setting. The intuition is that these sorts of changes, for example, with respect to a mitogen- or location-associated input, are expected to arise for motile tumor cells in the dynamic GBM tumor microenvironment. We accordingly computed p_Go for each attractor by aggregating its expanded input set input-to-Go phenotype probabilities, as derived from our model simulations (tables S3 and S4).
For additional theoretical analyses (Fig. 2B and fig. S4), we applied a variation of this methodology to compute specific phenotype transition probabilities from the Go state. For a given brain spatial environment (either with or without a particular simulated drug treatment), we identified input states leading to the Go state. This Go-associated input state set was expanded to include perturbations of up to two inputs (excluding changes to inputs defining the spatial environment, to focus on phenotype transitions within a spatial environment). Phenotype transition probabilities from the Go state were then computed by aggregating the input-to-phenotype probabilities (table S3) over the above, expanded input set. The larger perturbation (k = 2 versus k = 1) was allowed because these analyses expanded “backward” to input states from full-length attractors in the Go state, rather than from pathway-level attractors. In the latter case, allowing larger input perturbations would lead to much larger expanded input states, due to the data compression associated with the mapping from full-length attractors to pathway-level attractors.
Step 2. Data analyses
Pathway activity per sample. For both the IVY-GAP and TCGA GBM datasets, R’s GSVA (54) library was used to compute the pathway activity with respect to the 14 network model–matched pathways indicated above. Pathway activity levels for each pathway were scaled to the unit interval (by subtracting the minimum value across samples and then dividing by the corresponding range). The resulting scaled pathway activity levels were then binarized by setting values below and above the (pathway-specific) median value to 0 and 1, respectively.
Binarization. We plotted the distribution of scaled single sample GSEA pathway activities for IVY-GAP samples (fig. S5). Many pathway activities were well matched by a single Gaussian. For those that were not, we fitted two-component Gaussian mixtures and binarized the pathway activities (using a point between the two means, and equidistant after standardization relative to the fit SDs). The resulting binarization did not improve the results as far as IVY-GAP sample matching, relative to the median-based thresholding.
Matching IVY-GAP samples to theoretical attractors. Each IVY-GAP sample-derived, binary pathway state vector was then matched, if possible, to a theoretical attractor. We required exact (binary state) matching at the level of the HALLMARK_E2F_TARGETS, HALLMARK_APOPTOSIS, and at least two of three motility effector pathways (BIOCARTA_RAC1, BIOCARTA_RHO, and PID_CDC42). Among candidate attractors satisfying these criteria, we selected the attractor with the best binary state match with the sample pathway state vector (as measured by the Jaccard similarity coefficient). The matched attractor p_Go value was then assigned to the IVY-GAP sample (table S5).
GSE16011 and TCGA GBM tumor sample expression deconvolution and p_Go computation. Tumor sample expression profiles from the GSE160 and TCGA GBM study sets were deconvolved with respect to the expression profiles of the attractor-matched IVY-GAP samples. All expression profiles were restricted to 76 genes selected to match essential nodes in the network model (table S6). We will refer to the restricted, attractor-matched IVY-Gap samples as “template samples.” Each restricted study set sample was reconstructed as a positive coefficient-weighted sum of the template samples using the nnls function of the nnls R package. The resulting regression coefficients were then scaled by dividing each one by the l1 norm of the coefficient vector (sum of coefficient absolute values). This allows each scaled coefficient to reflect the relative contribution of a particular template sample to the reconstruction. A p_Go value for the target sample was then derived by computing a scaled coefficient-weighted sum of the template sample–associated p_Go values (table S7).
Survival analysis. p_Go-assigned samples from the GSE160 and TCGA GBM study sets were stratified into low and high p_Go groups, based on p_Go values below and above the 66th percentile value, respectively. Using the above stratification and provided survival data, Kaplan-Meier curves and log-rank test P values were computed using R survival package functions survfit and survdiff.
Gene set enrichment analyses with respect to p_Go versus gene expression correlations. Spearman rank correlations were computed between p_Go-matched TCGA sample gene expression profiles and the corresponding vector of sample p_Go values. GSEA with respect to the REACTOME pathway set was performed using the resulting vector of correlations and the fgsea R package function fgsea.
p_Go versus gene expression correlations. We computed correlations with p_Go for use in the pre-ranked GSEA, whose results were adjusted for multiple comparisons. We then reviewed the GBM-specific literature related to significantly enriched processes [false discovery rate (FDR) < 0.05]. Where cited experimental studies indicated process-related genes implicated in GBM tumor biology, we checked their correlations. The P values for the presented correlations are unadjusted, and are intended to show plausible connections between p_Go and tumor biology, with the particular studies cited for readers to consider further.
Model testing and validation
We aimed for two levels of validation. With the network model itself, we elaborated the model enough to capture key determinants of GBM tumor cell fate and appropriate dynamic behaviors shown through experimental studies (increased probability of Go with perturbation of Grow-related pathways, etc.). These integrative dynamic features need not follow from rules that are individually consistent with particular studies. This internal validation allowed us to determine when our model was fundamentally wrong or incomplete.
We then hypothesized that a measure of stability (p_Go) for a deleterious phenotype, necessarily derived from a dynamic model, would predict reduced patient survival, consistent with its conferring a more durably aggressive tumor phenotype. To assign p_Go values to tumor samples, we developed the two-step procedure of associating theoretical pathway–level attractors with location-stratified IVY-GAP samples and then deconvolving the expression of bulk tumor samples with respect to p_Go-assigned IVY-GAP samples. To develop this methodology, we used the GSE16011 GBM tumor sample expression dataset. The TCGA GBM data were set aside and not used to adjust the Boolean network model or the tumor sample processing methodology. Using this validation dataset, we show that p_Go provides complementary information, relative to age and GBM subtype status, for predicting patient survival, and that it is not a simple correlate of existing gene expression–based measures.
|Cellular network state||The (binary) activity states of cell-intrinsic nodes in the Boolean network model. These represent molecular components or processes within or directly generated (e.g., secreted) by the tumor cell. For fundamentally adaptive cells, dynamical modeling is essential to study the evolution and potential stabilization of the cellular network state over time.|
|Initial conditions||The value of a cellular state at starting point (t = 0).|
|Spatial brain and tumor environmental stimuli (i.e., inputs)||In our model, a GBM cell is exposed to two types of stimuli: (i) those related to the physical location within the organ, such as the BP, the PS, and WM, which include different cell diffusion properties, and (ii) molecular inputs within the tumor microenvironment, including mitogens (EGF and PDGF), other key pathway drivers (TIMP and WNT canonical, affecting TGFB and WNT signaling), and various attributes of the internal and external environment (DNA damage, oxygen, bradykinin, ephrin B1/B2, stiff ECM components, hyaluronan). The input space is the set of all 320 input state vectors (which we refer to as inputs). These length 10 binary vectors are derived from all possible combinations of input node settings, up to constraints on spatial environment–specific inputs.|
|Dynamical system||A dynamical system is one that evolves in state over time, as a function of the system’s elements, their interactions, and input parameters. In the case of a network representation, the evolution refers to the changes in node values, and/or edges over time.|
|Boolean network||A Boolean network model provides a qualitative representation of a system in which each node can take two possible values denoted by 1 (ON) or 0 (OFF). This binary value represents the state of a node. In Boolean models, the future state of a node is determined based on a logic statement concerning the current states of its regulators. This statement, called a Boolean rule (function), is usually expressed via the logic operators AND, OR, and NOT.|
|Attractor||An attractor is stable cellular network state, representing the long-term behavior of a system.|
|Phenotype||Observable characteristics of a cell resulting from interactions between external stimuli, the intrinsic cellular system, and its ability to change. We focus on four GBM relevant phenotypes: motility (GO), proliferation (Grow), dormancy, and apoptosis. Note, many cellular network states can have the same phenotype.|
|Attractor landscape||The attractor landscape includes all attractors that the system could converge to and could also include their stability and transition probabilities under particular input perturbations. The attractor space is the set of 272 distinct attractor states reached by model simulation starting from the 320 input states.|
|Phenotype transition map||Phenotype landscape or phenotype transition map provides the likelihood of paths to transition from one phenotype to another. As a phenotype may have several attractors, each with a different sensitivity to input state conditions, this map becomes very complex to compute. As brain environment and GBM cells are dynamic, predicting such a map is even more challenging.|
|Bifurcation parameter||A bifurcation occurs when a small change made to the parameter values of a system causes a sudden “qualitative” or topological change in the system’s behavior. In our case, certain input changes could change the cellular phenotype.|
|Complex dynamical system||A complex dynamical system includes scenarios that are challenging to predict due to the complexity of the system and its dynamics. For instance, (i) a system with two attractors yet sharing a single environment input state; (ii) bifurcation, changing the system’s behavior by a small perturbation of the input state; (iii) several input states share the same attractor; (iv) an internal system perturbation, such as a mutation, could change the system’s behavior.|
|Plasticity||Changes that remain even after removing a particular stress. In our case, such changes may occur at the level of Boolean functions, or the activity threshold of a node.|