Epigenetic posttranslational modifications on histones are highly dynamic; changes in these histone states can affect the recruitment of effector proteins that fundamentally shape the gene expression programs governing development and disease (1–6). Methylation of histone lysine residues, in particular, has received considerable attention owing to its important role in transcriptional regulation. One such mark, histone H3 lysine 4 trimethylation (H3K4me3), is an evolutionarily conserved modification consistently found at transcription start sites (TSSs) and serves as a hallmark of active gene promoters (6–8). Studies have shown that H3K4me3 interacts with chromatin remodelers (9, 10) and promotes the recruitment of basal transcription factors to facilitate transcriptional activation (11, 12). Other functions ascribed to H3K4me3 include the DNA damage repair response (13, 14), marking of bivalent genes that also carry the repressive H3K27me3 modification (15), and cell identity specification (16, 17). Thus, there is a consensus that H3K4me3 is critically linked to transcriptional output and cellular response.
Deposition of methylation on H3K4 is catalyzed by lysine methyltransferases (KMTs), namely, the family of COMPASS (COMplex of Proteins ASsociated with Set1) complexes, which is widely conserved from yeast to human (18–20). While Set1/COMPASS is solely responsible for implementing all three methylation states in yeast (19–21), our laboratory and others have described a division of labor in H3K4 methylation among COMPASS members in higher organisms (21–28). In mammals, six Set1-related enzymes reside in COMPASS-like complexes: Set1A, Set1B, Mll1, Mll2, Mll3, and Mll4 (18, 29, 30). The evolutionary expansion of the COMPASS family in higher metazoans reflects functional diversification of H3K4 methylation and underscores the complexity of epigenetic regulation (31). Mammalian Set1A/Set1B are considered mainly responsible for bulk H3K4me3 genome-wide (6, 28, 30, 32, 33); Mll1/Mll2 catalyze H3K4me3 in a locus-specific manner (e.g., at Hox genes) (25, 27, 34–36); and Mll3/Mll4 are the key H3K4 monomethyltransferases at enhancers (24, 29, 37). Recent studies have begun to reassess whether catalytic activity is the primary biologically significant function of COMPASS. For instance, the enzymatic SET domain of Set1A was shown to be nonessential for embryonic stem cell (ESC) self-renewal (38), although deletion of the full-length protein impairs viability (39, 40). Likewise, inactivating mutations in the SET domain of Trr, the Drosophila homolog of Mll3/Mll4, do not result in clear developmental defects, and Mll3/Mll4 catalytically deficient ESCs show fewer transcriptional aberrations than cells with total protein loss (41, 42). Our laboratory also found that Mll4/COMPASS can regulate enhancers independently of its enzymatic activity (43). These studies suggest that the functions of COMPASS family members that are independent of H3K4 methylation are context dependent.
However, deletion of any individual COMPASS member in mice results in embryonic or prenatal lethality with distinct phenotypes (37, 39, 44–46), indicating that these proteins have partially nonredundant functions. Although research has provided insight into how each COMPASS member operates in specific contexts, the mechanisms underlying their roles in cellular and developmental regulation remain elusive.
We previously reported that ablating the SET domain of Set1A (Set1AΔSET) does not disrupt bulk H3K4me3 in ESCs (38), suggesting that other COMPASS members act redundantly to sustain global H3K4me3 in Set1AΔSET ESCs. Multiple lines of evidence suggest that both Set1A and its paralog Set1B contribute to genome-wide H3K4me3 deposition (18, 28, 32, 33). However, recent studies indicate that Set1A and Set1B have functionally distinct roles: Set1B overexpression cannot rescue the proliferation defects caused by loss of Set1A protein in ESCs (39), and Set1B is localized mostly in the cytoplasm (47). Given this uncertainty about their functions, in the current study we sought to determine the extent to which Set1A and Set1B, as well as their relatives Mll1 and Mll2, have distinct versus overlapping responsibilities in H3K4me3 regulation in ESCs. By generating an array of ESC lines carrying compounding mutations of COMPASS family members, we found that Set1A, Set1B, and Mll2 engage in a collaborative epigenetic circuit that modulates the H3K4me3 signature and breadth in ESCs. Our findings provide evidence for both functional specialization and redundancy among mammalian COMPASS family members in directing transcriptional regulation and cell identity in a context-specific manner, offering new insight into the mechanisms of disease pathogenesis associated with mutations in these critical chromatin modifiers.
RESULTS AND DISCUSSION
Our previous study demonstrated that deleting the C-terminal catalytic SET domain of Set1A does not disrupt global H3K4 methylation in ESCs, suggesting that other enzymes implement bulk H3K4 methylation in the absence of Set1A activity (38). Because Set1B is structurally homologous to Set1A (fig. S1A), we investigated whether Set1B could compensate for the loss of Set1A catalytic function to sustain global H3K4 methylation in ESCs. We therefore generated Set1B knockout (Set1BKO) ESCs by deleting the first four exons of the Set1B locus via CRISPR-Cas9 gene editing. Polymerase chain reaction (PCR) genotyping and RNA sequencing (RNA-seq) confirmed deletion of the intended genomic region and loss of the Set1B transcript (fig. S1, B and C), and Western blotting confirmed complete loss of Set1B protein in ESCs (Fig. 1A). Ablating Set1B did not alter bulk H3K4 methylation as assessed by Western blotting (Fig. 1B and fig. S1D), and ChIP-seq [chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing] revealed similar H3K4me3 levels in wild-type (WT) and Set1BKO ESCs (Fig. 1C). In addition, expression of the key pluripotency factors Sox2, Nanog, and Oct4 was unchanged between WT and Set1BKO cells (fig. S1E). These findings agree with prior studies demonstrating that Set1B is not essential for ESC self-renewal and that Set1B removal does not affect bulk H3K4 methylation (39).
Our laboratory recently established that Set1B, unlike the other COMPASS family members, resides predominantly in the cytoplasm (47); however, we found that a small fraction of Set1B still binds chromatin in ESCs (Fig. 1D), prompting us to examine its potential role in H3K4 methylation. Because loss of either Set1B or the catalytic SET domain of Set1A alone did not change overall H3K4 methylation levels (38), we used CRISPR-Cas9 to establish Set1BKO-Set1AΔSET ESCs. Successful generation of homozygous double-mutant ESCs was verified by PCR genotyping and RNA-seq (fig. S1F and Fig. 1E). Expression of the pluripotency markers Oct4, Nanog, and Sox2 did not differ significantly between WT and Set1BKO-Set1AΔSET ESCs at a significance threshold of P < 0.01 (fig. S1H). Comparing H3K4 methylation in the double-mutant and WT ESCs by Western blotting, we found that bulk H3K4me1/me2 levels did not differ, whereas H3K4me3 was modestly decreased in Set1BKO-Set1AΔSET ESCs (fig. S1G and Fig. 1F). ChIP-seq consistently detected this reduction in H3K4me3 in the double mutant relative to WT ESCs (Fig. 1G). Because the combined deletion of the Set1A SET domain and Set1B reduced H3K4me3 whereas either mutation alone did not, the two Set1 paralogs appear to act redundantly in regulating H3K4 methylation in ESCs.
Several studies have implicated H3K4me3 in transcriptional activation by RNA polymerase II (RNAP II) (11, 12). Given the noticeable decrease in H3K4me3 in Set1BKO-Set1AΔSET compared with WT cells, we used ChIP-seq to evaluate RNAP II occupancy genome-wide. Representative tracks show reduced RNAP II at sites with H3K4me3 loss, accompanied by reduced gene expression (Fig. 2A). This decrease in H3K4me3 and RNAP II is not caused by loss of the Set1A SET domain or of Set1B alone (38), further supporting functionally redundant roles for the two paralogs in H3K4me3 implementation and transcriptional regulation at these regions (fig. S2, A and B). Genome-wide analyses revealed that H3K4me3 is lost across TSS-proximal regions in Set1BKO-Set1AΔSET double-mutant cells, with a corresponding decrease in RNAP II occupancy compared with WT ESCs (fig. S2C). Partitioning these H3K4me3 peaks by K-means clustering revealed that H3K4me3 loss in the double mutant occurs in the second and third clusters, where Set1A binding is strongest (Fig. 2, B and C). Both clusters also exhibit modestly diminished RNAP II levels in Set1BKO-Set1AΔSET compared with WT cells (Fig. 2C). The accompanying RNA-seq data show moderate differences in gene expression, with a stronger decrease in expression of cluster 3 genes, which have a relatively more severe H3K4me3 loss (Fig. 2C). Cluster 3 also contains genes that are more highly expressed in WT ESCs (fig. S2D) and, as shown by gene ontology (GO) annotation, is especially enriched for housekeeping factors such as ribosome-related proteins (fig. S2E). Collectively, these data indicate that H3K4me3 at the promoters of more highly expressed genes is more perturbed by the combined deletion of Set1B and the Set1A SET domain in ESCs.
These findings are consistent with previous studies reporting that H3K4me3 at more highly expressed genes is more sensitive to loss of CXXC1, a key subunit of the Set1/COMPASS complexes (48, 49).
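The cluster partitioning above can be sketched in code. The following is a minimal, hypothetical illustration rather than the pipeline used to produce Fig. 2: it assumes each TSS-proximal H3K4me3 peak has already been summarized as a vector of binned ChIP-seq signal around the TSS, runs plain Lloyd's K-means with a deterministic initialization, and renumbers clusters by ascending total centroid signal, a crude stand-in for breadth so that cluster 1 holds the narrowest profiles. The function name `kmeans_profiles` and the toy profiles are illustrative only.

```python
from typing import List, Sequence

Profile = Sequence[float]  # hypothetical: binned H3K4me3 signal around one TSS


def _sq_dist(a: Profile, b: Profile) -> float:
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _column_mean(rows: List[Profile]) -> List[float]:
    """Bin-wise mean of a non-empty list of equal-length profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def kmeans_profiles(profiles: List[Profile], k: int, max_iter: int = 100) -> List[int]:
    """Cluster binned profiles with Lloyd's algorithm (requires 1 <= k <= len(profiles)).

    Returns a 1-based label per profile; cluster 1 has the smallest total
    centroid signal, used here only as a rough proxy for the narrowest peaks.
    """
    # Deterministic seeding: pick k profiles evenly spaced along the ranking
    # by total signal, so the result does not depend on a random seed.
    order = sorted(range(len(profiles)), key=lambda i: sum(profiles[i]))
    step = (len(order) - 1) / (k - 1) if k > 1 else 0
    centroids = [list(profiles[order[round(q * step)]]) for q in range(k)]

    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in profiles]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an emptied cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = _column_mean(members)

    # Renumber clusters 1..k by ascending total centroid signal.
    rank = sorted(range(k), key=lambda c: sum(centroids[c]))
    new_id = {c: i + 1 for i, c in enumerate(rank)}
    return [new_id[lab] for lab in labels]
```

For example, six toy profiles forming two narrow, two intermediate, and two broad shapes are labeled `[1, 1, 2, 2, 3, 3]` with `k=3`. A real analysis would more likely rely on an established implementation, such as scikit-learn's `KMeans` with several random restarts, than on this bare loop.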
We previously found that Mll2, another member of the COMPASS family of H3K4 methyltransferases, also catalyzes H3K4me3 in ESCs (25); we therefore deleted the SET domain of Mll2 using CRISPR-Cas9 (fig. S2F). Western blotting showed that disrupting the Mll2 SET domain destabilizes the protein (fig. S2G), such that Mll2 protein levels in Mll2ΔSET ESCs are comparable to those in Mll2KO cells previously generated in our laboratory (34). H3K4me3 ChIP-seq in Mll2ΔSET ESCs revealed loss of H3K4me3 at sites distinct from those affected in Set1BKO-Set1AΔSET cells when compared with WT cells (fig. S2, C and H). In particular, the greatest H3K4me3 reduction in Mll2ΔSET cells is confined mainly to cluster 1 peaks, which also show a corresponding decrease in RNAP II levels and in expression of the nearest genes (Fig. 2, D and E). Cluster 1 contains the TSSs of genes that are typically less transcriptionally active and are linked to developmental functions (fig. S2, D and E), concordant with previous findings that Mll2/COMPASS implements H3K4me3 primarily at specific loci such as bivalent genes (25). Analysis of H3K4me3 ChIP-seq in ESCs lacking Mll1 (50), which is also known to deposit H3K4me3 in mammals, showed comparable global H3K4me3 levels in Mll1KO and WT cells (fig. S2I), indicating that Mll1/COMPASS is not a crucial regulator of bulk H3K4me3 in ESCs.
The K-means clustering of H3K4me3 in Fig. 2 effectively separated peaks by width, which also correlated with gene expression: broader H3K4me3 coincided with higher expression of the nearest gene, and narrower H3K4me3 with lower expression (Fig. 2, B to E, and fig. S2D), suggesting that H3K4me3 breadth has biological significance. Several recent studies have explored this possibility, reporting a positive relationship between H3K4me3 breadth and gene expression (17, 51) and implicating breadth in specifying cell identity during development and disease (16, 52, 53). We noted that the Set1/COMPASS mutant cells tended to lose H3K4me3 in clusters 2 and 3, which have greater H3K4me3 breadth, whereas the Mll2/COMPASS mutant primarily affected cluster 1, which has the narrowest breadth (Fig. 2, B to E). To investigate the function of H3K4me3 breadth further, we annotated WT H3K4me3 peaks to the nearest TSS and ranked them from broadest to narrowest. We then retrieved the 500 sites with the broadest peaks and the 500 with the narrowest peaks and examined H3K4me3 levels at these sites in WT, Set1BKO-Set1AΔSET, and Mll2ΔSET ESCs. Compared with WT, Set1BKO-Set1AΔSET ESCs show diminished H3K4me3 primarily at genes with broader peaks, whereas Mll2ΔSET cells show reduced H3K4me3 mainly at genes with narrower peaks (Fig. 3A). These clear differences in ChIP density by peak width between the two mutant lines are further illustrated quantitatively in composite profiles (Fig. 3B) and in representative tracks (Fig. 3C). GO term analyses indicate that genes with broader H3K4me3 peaks are enriched for biological processes that are relatively more important for maintaining stem cell identity (fig. S3A), supporting previous studies associating H3K4me3 breadth with cell identity specification (16, 17).
In addition, narrow peaks are enriched for processes more pertinent to neuronal development (fig. S3A), consistent with published studies showing that Mll2 implements H3K4me3 at development-related genes in ESCs (25, 54, 55).
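The annotation and ranking step described above can be sketched as follows. This is a hypothetical illustration, not the authors' code: it assumes peaks and TSSs are given as plain tuples in 0-based, half-open BED-style coordinates, assigns each peak to the TSS nearest its midpoint, and returns the genes for the `n` broadest and `n` narrowest peaks (the text uses 500). The names `nearest_gene` and `broadest_and_narrowest` are illustrative; in practice, tools such as `bedtools closest` handle the nearest-feature step.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED
TSS = Tuple[str, int, str]   # (chrom, position, gene)


def nearest_gene(peak: Peak, tss_by_chrom: Dict[str, List[Tuple[int, str]]]) -> str:
    """Gene whose TSS lies closest to the peak midpoint (chromosome must have >= 1 TSS)."""
    chrom, start, end = peak
    sites = tss_by_chrom[chrom]  # sorted (position, gene) pairs
    mid = (start + end) // 2
    i = bisect_left(sites, (mid, ""))
    # The nearest TSS sits immediately left or right of the insertion point.
    candidates = [sites[j] for j in (i - 1, i) if 0 <= j < len(sites)]
    return min(candidates, key=lambda site: abs(site[0] - mid))[1]


def broadest_and_narrowest(peaks: List[Peak], tss: List[TSS], n: int) -> Tuple[List[str], List[str]]:
    """Annotate peaks to the nearest TSS, rank by width, and return the genes
    for the n broadest (broadest first) and n narrowest (narrowest first) peaks."""
    tss_by_chrom: Dict[str, List[Tuple[int, str]]] = {}
    for chrom, pos, gene in tss:
        tss_by_chrom.setdefault(chrom, []).append((pos, gene))
    for sites in tss_by_chrom.values():
        sites.sort()
    # Rank peaks from broadest to narrowest by width (end - start).
    ranked = sorted(peaks, key=lambda p: p[2] - p[1], reverse=True)
    genes = [nearest_gene(p, tss_by_chrom) for p in ranked]
    return genes[:n], genes[::-1][:n]
```

Peaks of equal width keep their input order because Python's sort is stable; `n` should not exceed half the number of peaks if the broad and narrow sets are meant to be disjoint.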
To further investigate the consequences of H3K4me3 peak width, we evaluated the top and bottom breadth quintiles. For the broadest 20% of H3K4me3 peaks, the Set1BKO-Set1AΔSET mutant showed a significant and marked decrease in H3K4me3 coverage compared with WT, whereas the Mll2ΔSET mutant showed H3K4me3 occupancy comparable to WT (Fig. 3D). In contrast, for the narrowest 20% of peaks, Mll2ΔSET ESCs showed a significant reduction in H3K4me3 compared with WT, whereas H3K4me3 in the Set1BKO-Set1AΔSET mutant was much less altered at these regions (Fig. 3D). Furthermore, when we assessed the widths of differential TSS-proximal H3K4me3 peaks in each mutant, the Set1/COMPASS mutations mainly affected wider peaks, whereas the Mll2ΔSET mutation primarily affected narrower peaks (fig. S3B). Examining RNAP II levels at TSSs proximal to H3K4me3 peaks in the Set1BKO-Set1AΔSET and Mll2ΔSET mutants relative to WT, we observed decreased RNAP II at broader peaks in the Set1BKO-Set1AΔSET mutant and at narrower peaks in the Mll2ΔSET mutant (Fig. 3E). On the basis of these findings, we speculate that (i) transcription of cell identity genes, which are marked by broad H3K4me3 (16), is regulated by Set1/COMPASS, which underscores the importance of Set1/COMPASS in maintaining stem cell viability (38, 39), and (ii) the preferential effect of Mll2 on narrow H3K4me3 peaks, which mark developmental genes, is consistent with the critical role of Mll2 in development (34, 56, 57). Several explanations are plausible for the differential recruitment and activity of these COMPASS methyltransferases at distinct groups of loci. Studies have shown that Set1, the yeast homolog of mammalian Set1A and Set1B, is recruited to chromatin through its association with elongating RNAP II (58–60).
One recent study proposed that repeated passage of elongation complexes containing Set1/COMPASS contributes to the broadening of H3K4me3 domains, a broadening that correlates with increased transcription frequency (61). Although this model has yet to be demonstrated for Mll2/COMPASS, Mll2 may be recruited to chromatin by other factors, including Menin, which also resides in the Mll/COMPASS complexes, and LEDGF (27, 62–65). In addition, different DNA-binding specificities of the CXXC motifs found in CXXC1, a key subunit of the Set1/COMPASS complex, and in Mll2 have been reported to contribute to selective targeting of the various COMPASS complexes to their respective genomic loci (66–68). In sum, these findings indicate that the distinct H3K4me3 patterns established by Set1/COMPASS and Mll2/COMPASS are functionally significant in stem cells.
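A minimal sketch of the quintile comparison follows, under stated assumptions: per-peak H3K4me3 coverage has already been normalized across samples, and the summary statistic is the mean log2(mutant/WT) ratio with a pseudocount of 1. Both choices, and the name `quintile_log2_change`, are ours for illustration; the statistic underlying Fig. 3D may differ.

```python
import math
from typing import Dict, List, Sequence


def quintile_log2_change(
    widths: Sequence[float],
    wt: Sequence[float],
    mutant: Sequence[float],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Mean log2((mutant + pc) / (wt + pc)) in the broadest and narrowest 20% of peaks.

    widths, wt, and mutant are parallel per-peak sequences: peak width and
    normalized H3K4me3 coverage in WT and mutant cells (assumed inputs).
    """
    n = len(widths)
    per_quintile = max(1, n // 5)  # at least one peak per group
    # Peak indices ordered from broadest to narrowest.
    order = sorted(range(n), key=lambda i: widths[i], reverse=True)

    def mean_log2(indices: List[int]) -> float:
        ratios = [math.log2((mutant[i] + pseudocount) / (wt[i] + pseudocount)) for i in indices]
        return sum(ratios) / len(ratios)

    return {
        "top20": mean_log2(order[:per_quintile]),      # broadest quintile
        "bottom20": mean_log2(order[-per_quintile:]),  # narrowest quintile
    }
```

Under this convention, a mutant that behaves like Set1BKO-Set1AΔSET would give a negative `top20` value and a `bottom20` value near zero, and a mutant that behaves like Mll2ΔSET the reverse.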
We noted that Set1A and Mll2 are both highly enriched at the TSS-proximal H3K4me3 peaks depicted in the K-means clustered heatmaps, indicating that Set1A and Mll2 colocalize on chromatin (Fig. 2, B and D). In addition, at annotated TSSs, there is >98% overlap between Mll2 and Set1A binding regions (fig. S4A). This colocalization may be explained by the fact that both CXXC1, a key subunit of the Set1A/COMPASS complex as discussed above, and Mll2 harbor a CXXC motif that recognizes and binds unmethylated CpG-containing DNA (34). It is therefore possible that Mll2 compensates for Set1/COMPASS loss in H3K4me3 deposition. To test this, we depleted Mll2 using a short hairpin RNA (shRNA) in Set1BKO-Set1AΔSET cells and compared them with WT ESCs (fig. S4B). Knockdown of Mll2 markedly decreased H3K4me3 in Set1BKO-Set1AΔSET ESCs (Fig. 4, A and B). To investigate this observation further, we generated a triple-mutant cell line by deleting the SET domain of Mll2 in Set1BKO-Set1AΔSET ESCs with CRISPR-Cas9 (fig. S4, C and D). In parallel, we removed Mll1 in Set1BKO-Set1AΔSET ESCs using previously reported guide RNAs (gRNAs) targeting Mll1 (50), in case Mll1 plays a compensatory role in H3K4me3 deposition when Set1/COMPASS is mutated (fig. S4, C and D). We successfully obtained homozygous triple-mutant ESCs harboring the intended Set1BKO-Set1AΔSET–Mll2ΔSET and Set1BKO-Set1AΔSET–Mll1KO mutations, as validated by PCR genotyping and RNA-seq (fig. S4, C and D). However, the Set1BKO-Set1AΔSET–Mll2ΔSET cells proliferated more slowly than WT, Set1BKO-Set1AΔSET, and Mll2ΔSET cells (data not shown). To determine whether the additional deletion of the Mll2 SET domain or of Mll1 affected H3K4 methylation, we evaluated H3K4me3 levels by Western blotting and ChIP-seq. Consistent with the effects of Mll2 knockdown in the double-mutant ESCs (Fig. 4, A and B), bulk H3K4me3, including H3K4me3 at TSS-proximal regions, was substantially reduced in Set1BKO-Set1AΔSET–Mll2ΔSET cells (fig. S4, E to G). However, the Set1BKO-Set1AΔSET–Mll1KO triple mutant showed no additional perturbation of H3K4me3 relative to the double mutant (fig. S4, E to G), consistent with a minimal contribution of Mll1 to H3K4me3 deposition in ESCs.
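The overlap statistic reported above can be computed in several ways; one simple, assumed definition is the fraction of Set1A-bound regions that overlap at least one Mll2-bound region by 1 bp or more. The sketch below implements that definition with a merged, sorted interval index and binary search. The function names `fraction_overlapping` and `_merged_index` and the coordinate convention (0-based, half-open, as in BED) are assumptions, not necessarily how fig. S4A was generated.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED


def _merged_index(regions: List[Interval]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Per chromosome, merge overlapping intervals and return sorted starts and ends."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    index: Dict[str, Tuple[List[int], List[int]]] = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts: List[int] = []
        ends: List[int] = []
        for start, end in ivs:
            if ends and start <= ends[-1]:
                ends[-1] = max(ends[-1], end)  # extend the current merged block
            else:
                starts.append(start)
                ends.append(end)
        index[chrom] = (starts, ends)
    return index


def fraction_overlapping(query: List[Interval], reference: List[Interval]) -> float:
    """Fraction of query intervals overlapping at least one reference interval by >= 1 bp."""
    if not query:
        return 0.0
    index = _merged_index(reference)
    hits = 0
    for chrom, start, end in query:
        starts, ends = index.get(chrom, ([], []))
        # First merged block ending after the query start; blocks are disjoint and
        # sorted, so if this one does not overlap the query, no later block can.
        i = bisect_right(ends, start)
        if i < len(starts) and starts[i] < end:
            hits += 1
    return hits / len(query)
```

Called with Set1A regions as `query` and Mll2 regions as `reference`, it reports how much of Set1A binding is shared with Mll2; swapping the arguments gives the reciprocal fraction, which need not be equal.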
Extending our analyses to the previously defined clusters, we observed a robust and synergistic decrease in H3K4me3 occupancy in all three clusters in the Set1BKO-Set1AΔSET–Mll2ΔSET triple-mutant cells (Fig. 4C), indicating that Mll2/COMPASS is functionally redundant with Set1/COMPASS in sustaining global H3K4me3 levels in ESCs. When we compared H3K4me3 changes at Set1-dependent versus Mll2-dependent regions in Set1BKO-Set1AΔSET, Mll2ΔSET, and Set1BKO-Set1AΔSET–Mll2ΔSET cells relative to WT, we noted that the significant decrease in H3K4me3 in the triple mutant relative to the double mutant at Set1-dependent sites is substantially greater than the significant reduction in H3K4me3 in the triple mutant relative to Mll2ΔSET cells at Mll2-dependent sites (fig. S4, H and I). This supports the interpretation that although Mll2 is functionally redundant with Set1 at Set1-controlled regions, Set1 does not appear to compensate reciprocally for Mll2 at Mll2-controlled sites, suggesting a unidirectional compensatory relationship between Mll2 and Set1. The additional Mll2ΔSET mutation in Set1BKO-Set1AΔSET ESCs also further perturbed H3K4me3 breadth: H3K4me3 density was reduced at genes with both broad and narrow peaks, with a more drastic decrease at broader peaks in the triple mutant (Fig. 4, D to G, and fig. S4J; also compare Fig. 3A with Fig. 4E). These data identify Mll2/COMPASS as a key compensator for Set1/COMPASS function in sustaining overall H3K4me3 levels and breadth in ESCs.
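The asymmetry argument above compares two conditional losses. The hypothetical sketch below makes that comparison explicit using median per-site log2 ratios: the additional loss in the triple mutant relative to the double mutant at Set1-dependent sites, versus the additional loss in the triple mutant relative to Mll2ΔSET at Mll2-dependent sites. The genotype keys (`"double"`, `"mll2_dset"`, `"triple"`), the pseudocount, and the use of medians are illustrative assumptions rather than the statistics behind fig. S4, H and I.

```python
import math
from statistics import median
from typing import Dict, Sequence


def median_log2_ratio(num: Sequence[float], den: Sequence[float], pseudocount: float = 1.0) -> float:
    """Median per-site log2((num + pc) / (den + pc)) over parallel site lists."""
    return median(math.log2((a + pseudocount) / (b + pseudocount)) for a, b in zip(num, den))


def compensation_asymmetry(
    set1_sites: Dict[str, Sequence[float]],
    mll2_sites: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Compare the extra H3K4me3 loss caused by adding the third mutation.

    set1_sites maps hypothetical genotype keys ("double", "triple") to per-site
    signal at Set1-dependent regions; mll2_sites maps ("mll2_dset", "triple")
    to per-site signal at Mll2-dependent regions.
    """
    # Extra loss from removing Mll2 activity on top of the Set1 double mutant.
    at_set1 = median_log2_ratio(set1_sites["triple"], set1_sites["double"])
    # Extra loss from removing Set1 activity on top of the Mll2ΔSET mutant.
    at_mll2 = median_log2_ratio(mll2_sites["triple"], mll2_sites["mll2_dset"])
    return {
        "set1_sites_triple_vs_double": at_set1,
        "mll2_sites_triple_vs_mll2_dset": at_mll2,
        # Negative gap: Set1-dependent sites lose more when Mll2 is also removed
        # than Mll2-dependent sites lose when Set1 activity is also removed.
        "gap": at_set1 - at_mll2,
    }
```

A clearly negative `gap` corresponds to the unidirectional pattern described in the text, in which Mll2 backs up Set1 but not the reverse.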
On the basis of global analysis, we noted an additional decrease in RNAP II at both broader and narrower H3K4me3 peaks in Set1BKO-Set1AΔSET–Mll2ΔSET versus Set1BKO-Set1AΔSET cells (Fig. 4H). The severe loss of H3K4me3 at these regions in the Set1BKO-Set1AΔSET–Mll2ΔSET mutant correlates with a significant decrease in transcription of cell identity genes compared with WT and Set1BKO-Set1AΔSET cells and may therefore contribute to the proliferation defect of these triple-mutant cells. In retrospect, the milder decrease in RNAP II levels at broader peak regions in the double mutant (Fig. 3E) may indicate that the H3K4me3 remaining at these loci is sufficient to maintain expression of cell identity genes at levels that do not impair proliferation. Furthermore, we previously demonstrated that proper ESC differentiation requires the catalytic activity of Set1A (38), a key contributor to the formation of broad H3K4me3. We therefore propose that loss of H3K4me3 at cell identity genes in Set1-mutant cells directly affects cellular differentiation. Because self-renewal and differentiation are hallmarks of pluripotent stem cells, the loss of differentiation potential in Set1-mutant ESCs reflects an impairment of their identity. Together, these data suggest that H3K4me3 levels at broad domains play an instructive role in maintaining ESC identity.
In summary, we present findings delineating the functionally redundant roles of COMPASS family members in H3K4me3 implementation in mammalian stem cells. By examining H3K4me3 enrichment and breadth in a series of ESC lines harboring compounding mutations of COMPASS enzymes relative to WT cells, we show that the COMPASS members Set1A, Set1B, and Mll2 coordinately regulate H3K4me3 levels and peak breadth across the mammalian genome. We also establish that Mll2 plays an important compensatory role in sustaining global H3K4me3 levels and breadth in the absence of Set1/COMPASS. This study extends our current knowledge and sheds new light on the remarkable capacity of cells to adapt to contextual changes. Moreover, given the numerous studies implicating COMPASS in disease pathogenesis, including Set1A/Set1B in cancer (69, 70), Set1A in schizophrenia (71, 72), and Mll2 in childhood-onset dystonia (73, 74), characterization of the broad/narrow H3K4me3 epigenetic signature may help identify potential therapeutic targets for these diseases.
Acknowledgments: We express gratitude to members of the Shilatifard laboratory for constructive criticism of this manuscript. We thank E. R. Smith for the critical reading and editing of this manuscript. We thank M. Iwanaszko for insightful feedback on statistical tests and bioinformatics analyses in this manuscript. We are thankful to K. Chen of The Methodist Hospital Research Institute in Houston, Texas, for instructive advice on performing breadth analyses. We also thank B. K. Cenik for an inspiring scientific quip. Funding: C.C.S. was supported, in part, by NIH/NCI Kirschstein-NRSA F31CA228149 and is currently supported, in part, by NIH/NCI Predoctoral to Postdoctoral Transition Award F99CA234945. K.C. is supported, in part, by NIH/NICHD Pathway to Independence Award K99HD094906. Studies in the Shilatifard laboratory regarding the role of the COMPASS family in development and cancer are supported by the NCI's Outstanding Investigator Award R35CA197569. Author contributions: C.C.S. and A.S. conceived and initiated the project. C.C.S., M.U., L.W., C.A.R., S.A.M., E.J.R., D.Z., and D.D. performed the experiments. C.C.S., P.A.O., K.C., F.X.C., and S.D. analyzed the data. C.C.S. and A.S. wrote the manuscript. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Next-generation sequencing data have been deposited in the Gene Expression Omnibus database under accession number GSE152595. Additional data related to this paper may be requested from the authors.