Histone H3 lysine 4 (H3K4) methylation, MLL family H3K4 methyltransferase, and transcription
Chromatin within eukaryotic nuclei is organized around ~10 nm wide multiprotein complexes known as nucleosomes (
Kornberg, 1977). Each nucleosome consists of ~146 base pairs of DNA wrapped around an octameric arrangement of the highly conserved histone proteins H2A, H2B, H3, and H4 (
Luger et al., 1997). Additionally, histone H1 and related linker histones bind to nucleosomes and stabilize higher-order chromatin structure (
Thoma and Koller, 1977;
Thoma et al., 1979;
Harshman et al., 2013). Post-translational covalent modification (PTM) of histones, particularly within their unstructured N-terminal regions, plays a critical role in genome-wide transcriptional regulation by altering the density of chromatin packing or the recruitment of chromatin-associated factors (
Cheung et al., 2000;
Rea et al., 2000;
Jenuwein and Allis, 2001;
Ruthenburg et al., 2007;
Clausell et al., 2009;
Ernst et al., 2011;
Khare et al., 2012). Among all histone proteins, the N terminus of histone H3 contains the highest density of known PTM sites (Fischle et al., 2003; Khare et al., 2012), which include serine, threonine, and tyrosine phosphorylation, lysine mono-, di-, and tri-methylation, lysine acetylation, lysine ADP-ribosylation, arginine mono- and di-methylation, arginine citrullination, proline isomerization, and tail clipping (
Wang et al., 2004;
Messner et al., 2010;
Bannister and Kouzarides, 2011;
Khare et al., 2012). A major challenge in chromatin biology has been to determine the individual functions of these numerous modifications, and furthermore to understand how crosstalk between modifications affects their functions (
Jenuwein and Allis, 2001;
Fischle et al., 2003;
Khare et al., 2012).
Many modifications of histone H3 are now known to correlate with specific chromatin states. Trimethylation of histone H3 at lysine 4 (H3K4me3) is enriched on nucleosomes in the 5′ regions of transcriptionally active genes in yeast, mouse, chicken, and multiple human cell lines, and H3K4me3 is positively correlated with transcriptional activity and RNA Polymerase II occupancy (
Santos-Rosa et al., 2002;
Ng et al., 2003;
Schneider et al., 2004;
Bernstein et al., 2005;
Pokholok et al., 2005;
Ernst et al., 2011). In vertebrates but not
S. cerevisiae, H3K4 di-methylation (H3K4me2) is predominantly colocalized with H3K4me3 at these 5′ regions (
Bernstein et al., 2005;
Martin and Zhang, 2005;
Ruthenburg et al., 2007;
Ernst et al., 2011), while H3K4 mono-methylation (H3K4me1) is found at both enhancers and promoters (
Ernst et al., 2011). The emerging consensus is that H3K4me3 epigenetically marks sites of active transcription in vertebrates, although it remains unclear whether H3K4me3 is sufficient to stimulate transcription. Interestingly, whereas H3K4me1 is important for enhancer-mediated gene activation (
Herz et al., 2012;
Hu et al., 2013;
Lee et al., 2013), the same modification at the promoter regions inhibits gene activity prior to induction by maintaining a repressed chromatin state (
Cheng et al., 2014).
Catalysis of histone H3 lysine 4 methylation is performed by the Mixed Lineage Leukemia (MLL) enzyme family, which contains six known members in mammals: SET1A, SET1B, MLL1, MLL2, MLL3, and MLL4, as well as the unconventional member MLL5 (
Allis et al., 2007;
Ruthenburg et al., 2007;
Shilatifard, 2008;
Zhou et al., 2013). All are large (~1700-5500 amino acid) proteins that contain a C-terminal SET domain, which facilitates methyl transfer from S-adenosylmethionine to the ϵ-amine of the histone H3 lysine 4 side chain. With the exception of MLL5 (
Sebastian et al., 2009;
Zhou et al., 2013), MLL catalytic subunits associate in vivo with a conserved ‘core complex’ of regulatory subunits often abbreviated as WRAD: WDR5, RbBP5, ASH2L, and DPY-30 (
Miller et al., 2001;
Shilatifard, 2008;
Ernst and Vakoc, 2012;
van Nuland et al., 2013). Interaction of MLL catalytic subunits with WRAD greatly enhances their catalytic efficiency
in vitro (
Southall et al., 2009;
Cao et al., 2010;
Odho et al., 2010;
Zhang et al., 2012) (however, see (
Shinsky and Cosgrove, 2015; Li et al., 2016)) and depletion of any single WRAD subunit causes defects in H3K4 di- and tri-methylation
in vivo (Dou et al., 2006). The mechanism by which WRAD increases the catalytic activity of MLL, as well as the high-resolution structure of the assembled MLL-WRAD complex, are important active areas of investigation (
Takahashi et al., 2011;
Patel et al., 2014;
Shinsky et al., 2014; Shinsky and Cosgrove, 2015). The interactions between MLL-WRAD components and the effects of each component on methyltransferase activity have been extensively characterized and reviewed elsewhere (
Ruthenburg et al., 2007;
Trievel and Shilatifard, 2009;
Ernst and Vakoc, 2012;
Couture and Skiniotis, 2013;
Zhang et al., 2013).
Although it remains incompletely understood why mammals require six different MLL catalytic subunits, knockout mouse models have illustrated the essential roles that SET1A, SET1B, MLL1, MLL2, and MLL4 play in mammalian development. Null alleles cause embryonic lethality for all MLL family members except MLL3 (
Lee et al., 2008), suggesting that SET1A, SET1B, MLL1, MLL2, and MLL4 each perform non-redundant functions required for prenatal organism viability (
Yu et al., 1995;
Glaser et al., 2006;
Terranova et al., 2006;
Lee et al., 2008,
2013;
Bledau et al., 2014). SET1A and SET1B maintain the bulk of genomic H3K4me3 (
Wu et al., 2008;
Bledau et al., 2014). Knockout of SET1A causes gastrulation failure, while SET1B knockout embryos die at E11.5 with a severe slow growth phenotype (
Bledau et al., 2014). MLL1 null embryos die at E10.5-11.5 and display body segmentation defects due to loss of
Hox gene expression, which depends on H3K4 methylation (
Yu et al., 1995;
Terranova et al., 2006), as well as deficient hematopoiesis (
Yagi et al., 1998). Knockout of MLL2, which is closely related to MLL1, is lethal at E9.5 and decreases the expression of a set of
Hox genes distinct from those regulated by MLL1 (
Glaser et al., 2006). MLL3 and MLL4, which are closely related and are a major source of H3K4me1 at enhancers (
Hu et al., 2013), appear to be partially functionally redundant: loss of either gene individually disrupts normal adipogenesis, but only MLL4 null mice display complete embryonic lethality (
Lee et al., 2008,
2013). The essential role of WRAD in mammalian MLL complexes is also illustrated by the embryonic lethality phenotype of ASH2L and DPY-30 null mice (
Stoller et al., 2010;
Skarnes et al., 2011), and may explain the current absence of RbBP5 or WDR5 null mice despite significant interest.
Novel interactions and functions of MLL-WRAD or WRAD components in the nucleus
Proteins in the MLL-WRAD complex are well-known regulators of genome-wide H3K4me1-3 deposition. However, recent work suggests they perform a variety of other nuclear functions beyond this simple enzymatic activity, which are summarized in the paragraphs below (also see Fig. 1). One category of such functions involves the direct or indirect interaction of MLL-WRAD with transcription factors such as MYC, Oct4, and C/EBPα to coordinate H3K4 methylation with the expression of specific target genes under specific conditions such as stem cell differentiation and tumorigenesis. Another category of new nuclear functions for MLL-WRAD involves the formation of complexes with other nuclear proteins and long non-coding RNAs (lncRNAs). These novel interacting partners include but are not limited to distinct H3K4MT complex members (such as PTIP and PA1), Nonspecific lethal proteins (KANSL1/KANSL2), the DNA damage repair ubiquitin ligase CUL4-DDB1, an influenza virus histone mimic (NS1), the Dam1 kinetochore protein in yeast, and the lncRNAs HOTTIP and NeST. While the scope and implications of these numerous interactions are under active investigation, these new nuclear functions extend beyond H3K4me1-3 deposition; studying them should deepen our understanding of how cells interpret the H3K4me1-3 mark and may uncover novel functions of these proteins.
Two prominent examples of the interplay between MLL-WRAD and transcription factors are the recently described interactions between WDR5, SET1A, and Oct4, as well as between WDR5 and MYC. WDR5 and SET1A both interact with the transcription factor Oct4, and depletion of either protein in embryonic stem cells decreases the expression of Oct4-targeted self-renewal genes (
Ang et al., 2011;
Fang et al., 2016). Consistent with this finding, WDR5 or SET1A depletion also greatly reduces the efficiency of iPS colony formation during somatic cell reprogramming using the Oct4, Sox2, KLF4, and c-MYC factors (
Ang et al., 2011;
Fang et al., 2016). WDR5 and Oct4 occupy many of the same promoters, and it is thought that the locus specificity of WDR5-directed H3K4 methylation in embryonic stem cells may be conferred in part by its interaction with Oct4 (
Ang et al., 2011). Interestingly, WDR5 directly interacts with the transcription factor MYC through the evolutionarily conserved “MYC box IIIb” motif on MYC, and the two proteins display a high degree of genome-wide colocalization on chromatin (
Thomas et al., 2015). MYC box IIIb point mutants that cannot interact with WDR5 are also unable to occupy ~80% of normal genome-wide MYC binding sites, do not induce tumorigenic transformation of cultured fibroblasts, and are inactive for cellular reprogramming when expressed with Oct4, Sox2, and KLF4 (
Thomas et al., 2015a). Thus, WDR5 appears to influence the stable association between the MYC/MAX heterodimer and its target genes in a biologically relevant manner (
Thomas et al., 2015a,
2015b). Additionally, WDR5 interacts with the p30 mutant isoform of the transcription factor C/EBPα, which is a key regulator of hematopoietic gene expression and is frequently mutated in acute myeloid leukemia (
Grebien et al., 2015). However, binding between C/EBPα and WDR5 is likely indirect and requires cell type specific factors, as it is not recapitulated in HEK293 cells co-expressing the two proteins (
Grebien et al., 2015). Instead, expression of p30-C/EBPα target genes appears to depend primarily on H3K4me3 deposition, which is disrupted by a novel small molecule inhibitor of the MLL1-WDR5 interaction (
Grebien et al., 2015). Thus the gain-of-function oncogenic properties of the C/EBPα p30 truncation critically depend on WDR5 and MLL1, although the precise mechanism remains unclear. Together, these findings and others support a model in which high levels of WDR5 promote cell proliferation and a pluripotent state in stem cells and cancer cells, while decreased levels of WDR5 or other MLL-WRAD proteins induce differentiation (with the exception of osteoblasts (
Gori et al., 2006;
Zhu et al., 2008)) and inhibit cell proliferation to varying degrees in different cell types (
Ang et al., 2011;
Jiang et al., 2011;
Thomas et al., 2015a;
Chen et al., 2015;
Dai et al., 2015;
Grebien et al., 2015).
Extensive crosstalk between MLL-WRAD proteins and members of other chromatin modifying complexes has also become increasingly apparent. While the WRAD core module is common to all SET/MLL H3K4MT complexes, accessory factors including PTIP, PA1, HCFC1/2, WDR82, and MENIN form multiple MLL-WRAD-containing complexes of distinct composition in a manner that appears to depend on the particular MLL/SET catalytic subunit involved (
van Nuland et al., 2013). Interestingly, WDR5 and DPY-30, the core MLL-WRAD proteins with the highest absolute abundance in HeLa cells, participate in several non-H3K4MT complexes with diverse functional roles (
van Nuland et al., 2013). In one instance, WDR5 is recruited into the Nonspecific lethal (NSL) complex, which contains the histone H4 lysine 16 acetyltransferase MOF as well as the KANSL1 and KANSL2 scaffolding proteins (
Dias et al., 2014). Structural analysis revealed that WDR5 interacts directly with KANSL1 and KANSL2 using the same binding sites that are recognized by MLL and RbBP5, respectively, when WDR5 participates in the MLL-WRAD complex (
Dias et al., 2014). Accordingly, incorporation of WDR5 into NSL or MLL-WRAD complexes was mutually exclusive
in vitro and
in vivo (68). In another case, the CUL4-DDB1 ubiquitin E3 ligase, which responds to DNA damage and regulates H3K4me3 levels, interacts with WDR5 and RbBP5, perhaps through its association with H3K4me3 nucleosomes, although the precise function of WDR5 and RbBP5 in this complex remains unknown (
Higa et al., 2006). Furthermore, a recent study identified a hydrophobic patch on the ubiquitin protein (centered on residues I44, L8, and V70) that mediates binding to a diverse set of WD40 repeat proteins, including WDR5 (
Pashkova et al., 2010). This raises the intriguing possibility that WDR5 may integrate ubiquitin and methylation post-translational modifications in the context of histones as well as in non-histone proteins.
At least two lines of evidence suggest that MLL-WRAD could be involved in the methylation of non-histone proteins. One example can be found in studies of the yeast protein Dam1, a member of the DASH complex that regulates attachment of the mitotic spindle microtubule ends to chromosomes and is required for proper chromosome segregation during cell division in
S. cerevisiae (
Nogales and Ramey, 2009). Set1, the SET/MLL homolog responsible for H3K4 methylation in yeast, also di-methylates Dam1 at K233
in vivo (
Zhang et al., 2005). Furthermore, Dam1 K233 di-methylation is regulated by histone H2B K123 ubiquitination and alters phosphorylation of adjacent serine residues within the ‘SKSS’ motif by the aurora kinase Ipl1 (
Zhang et al., 2005;
Latham et al., 2011). While these findings raise the possibility that non-histone substrates exist for MLL-WRAD in mammals, Dam1 is nonessential for viability in fission yeast (
S. pombe) and no clear Dam1 homolog exists in metazoans (
Thakur and Sanyal, 2011). Another intriguing example is the influenza virus protein NS1, which contains an N-terminal RNA binding domain and a C-terminal ‘ARSK’ sequence that functions as a histone H3 mimic (
Marazzi et al., 2012;
Qin et al., 2014). NS1 directly engages the arginine binding cleft of WDR5 in a manner similar to histone H3, and methylation of the NS1 protein in virus-infected cells leads to recruitment of the human PAF1 transcription elongation complex and promotes viral gene expression (
Marazzi et al., 2012;
Qin et al., 2014). However, it remains to be seen whether the MLL-WRAD complex can methylate other histone-like motifs in human proteins or those of pathogens in a similar manner.
In addition to the many protein–protein interactions that involve WDR5, an extensive list of interactions has recently been described between WDR5 and RNA. Initial studies have focused on the interactions between WDR5 and long non-coding RNAs (lncRNAs), including HOTTIP and NeST (
Wang et al., 2011;
Gomez et al., 2013;
Yang et al., 2014). HOTTIP is transcribed from the HoxA locus and forms an enhancer-like binding site for MLL-WRAD, which promotes H3K4me3 and increased expression of genes in close proximity to HOTTIP (
Wang et al., 2011). Alanine scanning mutagenesis of WDR5 identified a binding site for HOTTIP that shares partial overlap with the RbBP5 binding pocket, although point mutations were identified that disrupt WDR5-HOTTIP binding without affecting the WDR5-RbBP5 interaction (
Yang et al., 2014). Remarkably, WDR5 appears to bind ~1500 cellular RNAs through this HOTTIP binding surface, including a large number of mRNAs (
Yang et al., 2014). A separate study found that WDR5 could also interact with two PIWI proteins involved in piRNA-mediated activation of gene expression, most likely in a manner whereby piRNAs recruit piRNA-interacting PIWI proteins and subsequently the MLL-WRAD complex to specific genomic loci (
He et al., 2015). It will be interesting in future studies to address how direct interactions between WDR5 and RNA affect the activity and localization of the MLL-WRAD complex, both inside and outside of the cell nucleus.
Emerging functions of MLL-WRAD beyond the nucleus
Most work on MLL and WRAD proteins has focused on their important role in H3K4 methylation and other nuclear events. However, several recent studies have demonstrated that WRAD subunits accumulate in multiple locations outside of the cell nucleus, which may have important implications for the function of these proteins in cells and organisms (Fig. 1). In 2009, our group reported that in addition to its nuclear localization, DPY-30 is also found at the
trans-Golgi network (TGN) (
Xu et al., 2009). DPY-30 is recruited to the TGN by binding to the large guanine nucleotide exchange factor BIG1, a resident TGN protein (
Xu et al., 2009;
Xia et al., 2010). Although localization of other MLL-WRAD subunits to the Golgi was not observed, knockdown of DPY-30, ASH2L, or RbBP5 still resulted in the accumulation of recycling endosomes near cell protrusions (
Xu et al., 2009). More recently, overexpression of PAQR3, a Golgi-localized GPCR-like receptor, was shown to cause accumulation of WDR5, ASH2L, RbBP5, and DPY-30 at the Golgi apparatus as well as their depletion from the nucleus and a concomitant decrease in global H3K4me3 (
Liu et al., 2015). It is currently unclear whether the abilities of BIG1 and PAQR3 to recruit WRAD proteins to the Golgi are mechanistically linked. It also remains to be determined whether endogenous WRAD proteins can assemble into a complex at the Golgi and, if so, whether this assembly is regulated by specific signals.
WDR5 has also been observed in other cytoplasmic structures. Wang
et al. (
2010) have reported that upon infection of cells with Sendai virus, WDR5 translocates from the nucleus to the mitochondria, where it induces the host anti-viral innate response via its interaction with signaling proteins such as VISA, TRAF3, and TRAF6. Depletion of WDR5 inhibited virus-induced expression of IRF3, IFN-β, and NF-κB and impaired assembly of the VISA complex after virus infection, although a role for other MLL-WRAD subunits in viral infection was not investigated (
Wang et al., 2010).
In addition to the Golgi and mitochondria, we recently found that WDR5 also localizes to the midbody, a transient structure that forms between two daughter cells during cytokinesis and orchestrates the final events of cell division (
Bailey et al., 2015). Interestingly, localization of WDR5 to the midbody depends on the integrity of its central arginine binding cavity, although the factor(s) responsible for recruiting WDR5 to the midbody remain to be identified (
Bailey et al., 2015). WDR5 was shown to interact with several midbody-localized microtubule binding proteins including PRC1, KIF4, and MKLP1, and depletion of WDR5 impaired cytokinesis progression by affecting midbody microtubule disassembly (
Bailey et al., 2015). These findings are consistent with a previous report that knockdown of several MLL-WRAD components increases the number of multinucleated cells (
Ali et al., 2014). This multinucleation phenotype was initially attributed to mitotic aberrations in WRAD-depleted cells, and the role of cytokinesis was not examined (
Ali et al., 2014). Further work will be necessary to identify the mechanism by which WDR5 promotes cytokinesis and whether other MLL-WRAD components localize to the midbody during cytokinesis.
A final possibility worth noting is that some cytoplasmic and/or nuclear functions currently attributed to WDR5 may instead be performed by WDR5B, a seldom-studied WDR5 homolog. In humans, the parental WDR5 gene is located on chromosome 9q34, while an intron-less retrotransposed copy known as WDR5B is present on chromosome 3q21 and encodes a protein with ~86% amino acid homology to WDR5 (
Vinckenbosch et al., 2006;
Okamura and Nakai, 2008). In
Arabidopsis thaliana, loss of function mutants in WDR5b (AT4G02730) had no apparent phenotype, while RNAi-mediated depletion of WDR5a (AT3G49660) accelerated the floral transition in a manner that depended on H3K4 methylation (
Jiang et al., 2009,
2011). Although
Arabidopsis WDR5a displays slightly higher amino acid homology to human WDR5 (63% vs. 58% for
Arabidopsis WDR5b), human WDR5 and WDR5B are substantially more similar to each other than to either
Arabidopsis protein, and it remains unclear whether human WDR5B and
Arabidopsis WDR5b are functionally related (
Jiang et al., 2009). In mammalian cells, both WDR5 and WDR5B interact with the CUL4-DDB1 ubiquitin ligase complex, while only WDR5B was detected in a screen for interacting proteins of the lysosomal transmembrane protein ATP13A2 (PARK9), suggesting that WDR5 and WDR5B may perform both redundant and independent functions (
Higa et al., 2006;
Usenovic et al., 2012). Further work will be needed to clearly delineate the cellular functions of WDR5 from those of WDR5B.
Concluding remarks
Since the discovery of a relationship between H3K4 methylation and transcriptional activation in the late 1990s, a wealth of information has been generated regarding deposition of the H3K4me1-3 marks by MLL-WRAD complex members. Although the primary function ascribed to MLL-WRAD proteins involves the direct correlation between H3K4 methylation and gene transcription, recent work has revealed interactions between MLL-WRAD and transcription factors, lncRNAs, and other nuclear components that play important roles in this process. There is also evidence that individual WRAD proteins or the subcomplexes formed among them may be involved in nuclear events independently of H3K4 methylation. Finally, at least DPY-30 and WDR5 can be recruited to cytoplasmic organelles. It will be important to determine whether their functions in the cytoplasm require the assembly of a local MLL-WRAD complex or subcomplex, and whether these functions are linked to histone methylation in the nucleus.
Higher Education Press and Springer-Verlag Berlin Heidelberg