1 Introduction
In eukaryotes, genomic DNA is compacted together with histones to form highly ordered chromatin. To exert its functions such as transcription and replication, chromatin is modulated by a variety of regulatory proteins, including several families of enzymes that catalyze covalent chemical modifications on DNA and histones. These chromatin modifications contribute to establishment and maintenance of relatively stable states of gene expression, thus representing an important epigenetic regulatory mechanism [
1–
3]. Notably, although the catalytic activities of these epigenetic modifiers are usually considered to be essential for their functions, recent studies have revealed that many of them possess important noncatalytic functions (for reviews, see [
4–
6]). Although somewhat surprising, this notion is plausible considering that these enzymes have acquired many additional functional domains, have been incorporated into multi-protein complexes, and have expanded their families so that their functions have become redundant in many regulatory processes.
The human
SETD2 (also known as
HYPB) gene, orthologous to yeast
Set2 [
7], encodes the only histone methyltransferase responsible for the transcription-coupled trimethylation of histone H3 lysine 36 (H3K36me3) [
8]. The catalytic activity of the SETD2 protein is undertaken by the evolutionarily conserved SET domain (named after the
Drosophila genes
Su(var)3-9,
E(z) and
trx) together with the adjacent N-terminal AWS (associate with SET) and C-terminal PostSET domains [
8]. The direct coupling of SETD2 with gene transcription is determined by its SRI (Set2-Rpb1 interacting) domain that specifically binds to the hyperphosphorylated, elongating form of RNA polymerase II (Pol II) [
8–
11]. Besides these domains, SETD2 also contains a WW domain (named for two conserved W residues) which may mediate protein–protein interactions [
12–
15], a lowly charged region which shows transcriptional activation activity [
8], a SHI (SETD2-hnRNP interacting) domain which interacts with heterogeneous nuclear ribonucleoprotein L (hnRNP L) [
16–
18], and several potentially functional disordered regions [
19]. Although SETD2 has several homologous family members, including ASH1L, NSD1, NSD2 (also known as WHSC1), and NSD3 (also known as WHSC1L1), SETD2 serves its unique role as the only transcription-coupled H3K36me3 methyltransferase, whereas its other family members tend to catalyze H3K36me1/2 but not H3K36me3 [
20–
27]; this is unlike the situation of many other epigenetic modifiers that share redundant activities and functions with their family members. Notably, because of its direct binding to elongating Pol II, SETD2 seems to indiscriminately leave the H3K36me3 mark on virtually all protein-coding genes that are actively transcribed by Pol II [
28]. Thus, it remains unclear whether SETD2 acts equally on all of these genes or, under certain circumstances, preferentially regulates specific target genes. In this regard, an objective approach to dissecting the biological functions of SETD2 is to generate specific SETD2 mutants and determine which functions depend on which domains or activities.
Since the first identification of recurrent mutations in the human
SETD2 gene in clear cell renal cell carcinoma (ccRCC) [
29], it has now been well documented that
SETD2 mutations are associated with various types of cancers and some developmental defects [
29–
32]. To study the physiologic functions of
SETD2, we have previously generated constitutive
Setd2 knockout mouse and zebrafish models [
27,
33]. Our studies show that loss of
Setd2 in mice causes embryonic lethality at embryonic day (E) 10.5 due to severe defects in vascular remodeling [
27] and, subsequently, our cross-species comparative studies between the mouse and zebrafish models suggest that this vascular phenotype is likely related to metabolic stress that is withstood by the mouse but not the zebrafish embryos [
33]. Meanwhile, we and other groups have also generated several conditional
Setd2 knockout mouse models [
34–
42]. Some of these models, upon crossing with various tissue-specific Cre recombinase expressing mice, have been widely used to study tumorigenesis and the results have recapitulated many aspects of human cancers caused by
SETD2 loss-of-function [
38,
42–
51]. However, an animal model harboring a patient-derived
SETD2 mutation is still lacking.
In this study, we first identified a ccRCC patient-derived single-nucleotide mutation in
SETD2 (C1685F) [
29] that produces a catalytically dead (CD) SETD2 protein. This SETD2-CD protein was confirmed to lose only its histone methyltransferase activity while retaining its specific interaction with hyperphosphorylated Pol II. We then introduced the corresponding mutation in the mouse
Setd2 gene (C1659F) into mouse embryonic stem (ES) cells using homologous recombination technology and subsequently generated a new
Setd2-CD knockin mouse model. A side-by-side comparative study between this
Setd2-CD and our original
Setd2 constitutive knockout mouse models was performed to clarify the catalytically dependent and independent Setd2 functions.
2 Materials and methods
2.1 Plasmids, antibodies, and mouse strains
The pGEX-5X-1 based plasmid containing glutathione S-transferase (GST)-fused human SETD2 fragment was the same one used in our previous study [
8]. The corresponding fragment of mouse Setd2 was cloned into the same vector. The mammalian expression plasmids of human SETD2 were generated by cloning the C-terminal fragments of SETD2 into a customized lentiviral vector (OBiO Tech), in which the expression of FLAG-tagged SETD2 is driven by a CMV promoter. Point mutations in these plasmids were generated by site-directed mutagenesis.
The following antibodies that can distinguish different modification states of histone H3 and Pol II (represented by its large subunit Rpb1) were used: H3K36me1 (Abcam, ab9048), H3K36me2 (Cell Signaling Technology, 2901), H3K36me3 (Abcam, ab9050), total H3 (Cell Signaling Technology, 4499S), Ser2-phosphorylated Rpb1 (Cell Signaling Technology, 13499), Ser5-phosphorylated Rpb1 (Cell Signaling Technology, 13523), and total Rpb1 (Cell Signaling Technology, 14958). A customized polyclonal rabbit antibody for SETD2 was developed by immunizing rabbits with a peptide spanning amino acids 916–1038 (according to GenBank protein accession number NP_054878.5) expressed and purified from bacteria (ABclonal Biotechnology). This antibody can also detect mouse Setd2. The GAPDH antibody was purchased from Proteintech (60004-1-Ig).
The C57BL/6J mice were purchased from Shanghai Model Organisms Center. The mice were housed and used according to animal care standards, and the animal studies were approved by the Committee of Animal Use at Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences.
2.2 In vitro histone methyltransferase activity assay
The GST-fusion enzyme proteins were expressed in Escherichia coli strain BL21 and purified with Glutathione-Sepharose 4B according to the manufacturer’s protocol (Cytiva). Protein concentration was determined by Coomassie blue staining (Yeasen) of SDS-PAGE gels, using bovine serum albumin as a standard. Recombinant polynucleosomes were purchased from Active Motif (31466) and were used as substrates. The in vitro histone methyltransferase activity assay was performed by combining 1.2 μg purified GST-fusion enzymes, 1.2 μg recombinant polynucleosomes, and 10 μmol/L S-adenosylmethionine in a final volume of 30 μL of methylation buffer (50 mmol/L Tris, pH 8.0, 50 mmol/L NaCl, 1 mmol/L MgCl2, 2 mmol/L DTT, 5% glycerol). The reaction system was incubated at 30 °C for 1.5 h and stopped by adding 7.5 μL 5× SDS sample buffer. One portion of the reaction products was separated by SDS-PAGE and visualized by Coomassie blue staining as a loading control, and the remainder was subjected to immunoblot analysis with specific antibodies for H3K36me1, H3K36me2, and H3K36me3.
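The quantities in the reaction set-up above can be checked with a short calculation. The following Python sketch uses only the numbers stated in the protocol; the function and variable names are illustrative and are not part of the original method.

```python
# Worked arithmetic for the in vitro methyltransferase reaction described above.
# All names are illustrative; the numbers are those stated in the protocol.

REACTION_VOLUME_UL = 30.0   # final reaction volume (uL)
ENZYME_UG = 1.2             # purified GST-fusion enzyme (ug)
NUCLEOSOME_UG = 1.2         # recombinant polynucleosomes (ug)
SAM_UMOL_PER_L = 10.0       # S-adenosylmethionine concentration (umol/L)
STOP_BUFFER_UL = 7.5        # volume of 5x SDS sample buffer added (uL)
STOP_BUFFER_FOLD = 5.0      # stock strength of the SDS sample buffer


def mass_conc_ng_per_ul(mass_ug: float, volume_ul: float) -> float:
    """Mass concentration in ng/uL for a given mass (ug) and volume (uL)."""
    return mass_ug * 1000.0 / volume_ul


def amount_pmol(conc_umol_per_l: float, volume_ul: float) -> float:
    """Amount in pmol; 1 umol/L is the same as 1 pmol/uL."""
    return conc_umol_per_l * volume_ul


def final_fold(stock_fold: float, stock_ul: float, sample_ul: float) -> float:
    """Strength of a concentrated buffer after mixing it with a sample."""
    return stock_fold * stock_ul / (stock_ul + sample_ul)


enzyme_ng_per_ul = mass_conc_ng_per_ul(ENZYME_UG, REACTION_VOLUME_UL)
nucleosome_ng_per_ul = mass_conc_ng_per_ul(NUCLEOSOME_UG, REACTION_VOLUME_UL)
sam_pmol = amount_pmol(SAM_UMOL_PER_L, REACTION_VOLUME_UL)
sds_fold = final_fold(STOP_BUFFER_FOLD, STOP_BUFFER_UL, REACTION_VOLUME_UL)

print(f"enzyme: {enzyme_ng_per_ul:.1f} ng/uL")
print(f"nucleosomes: {nucleosome_ng_per_ul:.1f} ng/uL")
print(f"SAM: {sam_pmol:.0f} pmol per reaction")
print(f"SDS sample buffer after stopping: {sds_fold:.1f}x")
```

With these numbers, enzyme and substrate are each present at about 40 ng/μL, the reaction contains about 300 pmol of S-adenosylmethionine, and adding 7.5 μL of 5× buffer to the 30 μL reaction brings the sample buffer to 1×.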
2.3 Generation of SETD2 knockout cell lines
Monoclonal SETD2 knockout HEK293T cell lines were generated using CRISPR/Cas9 technology. Two sgRNAs (5′-TTAAAGAACCAGTTGATACGAGG-3′; 5′-GTTGTGTATGATCGAACTCAAGG-3′) were designed to target exon 3 of human SETD2, and the targeting plasmids were constructed using the pSpCas9-Puro (PX459) vector. HEK293T cells were transfected with the targeting plasmids and selected with puromycin. Monoclonal cell clones were isolated using the limiting dilution method, and the knockout of SETD2 in the cell lines was validated by sequencing the genomic PCR products and TA clones, as well as by immunoblot analysis of protein expression.
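Both target sequences listed above are 23 nt long and end in a GG dinucleotide, which is consistent with a 20-nt protospacer followed by an NGG protospacer-adjacent motif (PAM). The text does not state this split explicitly, so the Python sketch below treats it as an assumption and simply checks it; the function name is hypothetical.

```python
import re

# Target sequences exactly as listed in the text (5' to 3').
SGRNA_TARGET_SITES = [
    "TTAAAGAACCAGTTGATACGAGG",
    "GTTGTGTATGATCGAACTCAAGG",
]


def split_protospacer_and_pam(site: str) -> tuple[str, str]:
    """Split a target site into protospacer and a trailing 3-nt PAM.

    Assumption (not stated in the text): the last three bases are an
    NGG PAM and the preceding bases are the protospacer.
    """
    site = site.upper()
    if not re.fullmatch(r"[ACGT]+", site):
        raise ValueError(f"unexpected characters in {site!r}")
    protospacer, pam = site[:-3], site[-3:]
    if not re.fullmatch(r"[ACGT]GG", pam):
        raise ValueError(f"{site!r} does not end in an NGG motif")
    return protospacer, pam


split_sites = [split_protospacer_and_pam(s) for s in SGRNA_TARGET_SITES]
for protospacer, pam in split_sites:
    print(f"protospacer={protospacer} ({len(protospacer)} nt), PAM={pam}")
```

Under this assumption, both sites split into a 20-nt protospacer and an AGG PAM.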
2.4 Cell transfection and coimmunoprecipitation (co-IP)
SETD2 knockout HEK293T cells were transfected with FLAG-tagged wild-type (WT) or mutant C-terminal
SETD2 plasmids using Polyethylenimine Linear (PEI) MW40000 (Yeasen, 40816ES03). The transfected cells were harvested 48 h after transfection and lysed with T/G lysis buffer (20 mmol/L Tris HCl, pH 7.5, 300 mmol/L NaCl, 50 mmol/L NaF, 2 mmol/L EDTA, 1% Triton X-100, and 20% glycerol) [
52]. Cell lysates were incubated with anti-FLAG magnetic beads (MedChemExpress, HY-K0207) and rotated for 4 h at 4 °C. The beads were precipitated and washed 3 times with T/G lysis buffer. The bound proteins were eluted with SDS sample buffer at 95 °C for 10 min and subjected to immunoblot analysis with antibodies for different modification states of Pol II. Meanwhile, whole cell lysates of these transfected cells were also prepared with SDS sample buffer for immunoblot analysis of the H3K36me1, H3K36me2, and H3K36me3 levels.
2.5 Generation of Setd2 knockin mice
The targeting vector was electroporated into 129/Sv mouse ES cells. The targeted ES cell clones were selected with neomycin and validated by genomic PCR with a pair of primers (forward, 5′-TACTCATTATACTGCTTTTC-3′; reverse, 5′-AAAAAGAATTCTGACTTAAGG-3′) and sequencing of the PCR products. The heterozygous ES cells were microinjected into C57BL/6J blastocysts, followed by implantation into pseudopregnant foster mothers. Male chimeras were mated with C57BL/6J females to generate F1 mice, which were further mated with
Ddx4-Cre (Shanghai Model Organisms Center, NM-KI-225028) C57BL/6J females to generate Neo deleted F2 progeny. The F2 mice were backcrossed to the C57BL/6J strain for eight generations to examine the persistence of the phenotypes. Genotypes of the mice were determined by genomic PCR with a pair of primers spanning the LoxP site (forward, 5′-GCAGTGATGCTGTCTGTTTCA-3′; reverse, 5′-TGCCCTCAGAAGGGCTAATA-3′) and a pair of primers spanning exon 9 (forward, 5′-GGGGCTGAAAATGAAGCATA-3′; reverse, 5′-GATGATCTGGGTTTGATCCCTG-3′). The PCR products were further validated by sequencing. The
Setd2 constitutive knockout C57BL/6J mice were obtained from our previous work [
27].
2.6 Embryo dissection, immunoblot, and histology analysis
Embryos were collected at multiple stages for genotyping, RNA-seq, immunoblot, and histology analyses. For immunoblot analysis, the embryos were lysed in SDS sample buffer with the aid of a homogenizer. The lysates were incubated at 95 °C for 10–15 min and centrifuged at 12 000 rpm for 10 min, and the supernatants were collected for immunoblot. For histology analysis, the embryos together with yolk sacs were fixed in 4% paraformaldehyde overnight and then subjected to paraffin embedding. The embedded embryos and yolk sacs were cut into 5 μm sections. The sections were dewaxed by immersion in xylene and a graded series of decreasing alcohol concentrations, stained with hematoxylin and eosin (H&E) (Servicebio), and imaged under a microscope.
2.7 RNA-seq and data analysis
Total RNA was extracted from whole embryos or yolk sacs using TRIzol reagent (Invitrogen, 15596018). RNA-seq libraries were constructed with KAPA RNA HyperPrep Kits (Roche) and sequenced using NovaSeq 6000 (Illumina). Raw reads were filtered by fastp (0.22.0) with default settings. Clean reads were aligned to the mm10 reference genome by HISAT2 (2.2.1). Aligned reads were counted by featureCounts (2.0.1) with default parameters and normalized by DESeq2 (1.32.0). Fragments per kilobase of exon model per million mapped reads (FPKM) were calculated using a protocol as described previously [
53]. Principal component analysis (PCA) and hierarchical clustering analysis were performed with Cluster (3.0) and the heatmap was drawn with Java TreeView (1.1.6r4) as described previously [
54]. Gene set enrichment analysis (GSEA) was performed with the version 2023.1 of the Molecular Signatures Database (MSigDB) [
55]. The RNA-seq data have been deposited in the Gene Expression Omnibus with accession number GSE241154.
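The FPKM normalization mentioned above can be written out as a short calculation. The Python sketch below follows the standard definition (fragments per kilobase of exon model per million mapped fragments); it is not necessarily identical to the protocol of reference [53], and the gene names, lengths, counts, and library size are invented purely for illustration.

```python
def fpkm(fragment_count: float, exon_length_bp: float,
         total_mapped_fragments: float) -> float:
    """Standard FPKM: count * 1e9 / (exon length in bp * total mapped fragments)."""
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragment_count * 1e9 / (exon_length_bp * total_mapped_fragments)


# Hypothetical toy data: not taken from the study.
toy_counts = {"gene_a": 500, "gene_b": 1500, "gene_c": 0}
toy_exon_lengths_bp = {"gene_a": 2000, "gene_b": 6000, "gene_c": 1000}
TOY_TOTAL_MAPPED_FRAGMENTS = 20_000_000

toy_fpkm = {
    gene: fpkm(toy_counts[gene], toy_exon_lengths_bp[gene],
               TOY_TOTAL_MAPPED_FRAGMENTS)
    for gene in toy_counts
}

for gene, value in toy_fpkm.items():
    print(f"{gene}: {value:.2f} FPKM")
```

In this toy example, gene_b has three times the counts of gene_a but is also three times as long, so both receive the same FPKM; this length normalization is what makes the values comparable across genes within a sample.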
2.8 Single-cell RNA sequencing (scRNA-seq) and data analysis
Embryos were dissociated, and single-cell suspensions were prepared following a previously established protocol [
56]. In brief, embryos were incubated with TrypLE Express dissociation reagent (Gibco) at 37 °C for 15 min, followed by quenching with heat-inactivated fetal bovine serum (FBS). The resulting single-cell suspension was then washed and filtered through a 40-μm cell strainer before being resuspended in PBS containing 1% FBS. Cell concentration was assessed using a Countstar, and cells were subsequently processed for single-cell RNA sequencing using the 10x Genomics Chromium Single Cell 3′ v3 platform. Sequencing data were aligned to the mm10 reference genome using Cell Ranger to generate gene expression matrices, which were then subjected to downstream analysis using Seurat (4.1.0). Data preprocessing involved filtering cells expressing between 500 and 5000 genes, with mitochondrial content less than 15%. The expression data were log-normalized and scaled, and the variable genes were identified based on their expression patterns. Next, these variable genes were utilized to conduct PCA. Subsequently, cells were clustered based on a shared nearest neighbor graph constructed using the first 30 principal components, with a resolution set at 0.8. Integration of different samples was achieved using the “FindIntegrationAnchors” and “IntegrateData” functions in Seurat. Identification of specific markers for each cluster was conducted using the “FindAllMarkers” function. Finally, cell types were assigned to each cluster through manual review of gene expression profiles, focusing on classic markers.
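The cell-level quality filter described above (500 to 5000 detected genes and mitochondrial content below 15%) can be illustrated with a minimal Python sketch. This is not the Seurat code used in the study; the per-cell records are invented, and treating the gene-count bounds as inclusive is an assumption, since the text does not specify it.

```python
# Thresholds as stated in the text; inclusive gene-count bounds are an assumption.
MIN_GENES = 500
MAX_GENES = 5000
MAX_MITO_PERCENT = 15.0


def passes_qc(cell: dict) -> bool:
    """Return True if a cell meets the gene-count and mitochondrial thresholds."""
    genes_ok = MIN_GENES <= cell["n_genes"] <= MAX_GENES
    mito_ok = cell["mito_percent"] < MAX_MITO_PERCENT
    return genes_ok and mito_ok


# Hypothetical per-cell QC metrics, for illustration only.
toy_cells = [
    {"barcode": "cell_1", "n_genes": 2500, "mito_percent": 3.2},   # passes
    {"barcode": "cell_2", "n_genes": 320, "mito_percent": 2.0},    # too few genes
    {"barcode": "cell_3", "n_genes": 6100, "mito_percent": 4.5},   # too many genes
    {"barcode": "cell_4", "n_genes": 1800, "mito_percent": 22.0},  # high mitochondrial content
    {"barcode": "cell_5", "n_genes": 500, "mito_percent": 14.9},   # passes at the boundary
]

kept_barcodes = [cell["barcode"] for cell in toy_cells if passes_qc(cell)]
print(kept_barcodes)
```

Cells with very few detected genes are typically empty droplets or damaged cells, cells with very many are often doublets, and a high mitochondrial fraction usually indicates stressed or dying cells, which is why all three criteria are applied together.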
3 Results
3.1 Identification of the cancer patient-derived SETD2 CD mutation
While many types of human cancers have been documented to be associated with
SETD2 mutations, ccRCC remains at the top of the list [
30,
57,
58] and the contribution of the
SETD2 mutations to ccRCC pathology has been relatively better studied [
47,
48,
59–
64]. We therefore focused on analyzing the
SETD2 mutations in ccRCC. From the Catalogue of Somatic Mutations in Cancer (COSMIC) database, we retrieved 360 ccRCC cases containing
SETD2 mutations. Relatively high numbers of nonsense mutations (100 cases; 27.78%) and frame-shift mutations (127 cases; 35.28%) suggest that
SETD2 loss-of-function may serve as an important mechanism in tumorigenesis (Fig.1). Meanwhile, single-nucleotide missense mutations in SETD2 were found in 124 ccRCC cases (34.44%). Notably, although these mutations were seemingly distributed throughout the whole protein, they showed a considerable enrichment in the AWS-SET-PostSET (abbreviated as ASP) catalytic domains of SETD2 (Fig.1), thus suggesting that the catalytic activity of SETD2 is probably particularly important for its tumor suppressive functions.
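The reported proportions can be reproduced from the case counts. The short Python sketch below uses only the numbers given above and assumes that each case is counted in a single mutation category; the variable names are illustrative.

```python
TOTAL_CASES = 360  # ccRCC cases with SETD2 mutations, as stated in the text

# Case counts per mutation category, as stated in the text.
category_counts = {
    "nonsense": 100,
    "frame-shift": 127,
    "missense": 124,
}

percentages = {
    name: round(100.0 * count / TOTAL_CASES, 2)
    for name, count in category_counts.items()
}

# Cases not covered by the three listed categories
# (assumes one category per case).
unlisted_cases = TOTAL_CASES - sum(category_counts.values())

for name, pct in percentages.items():
    print(f"{name}: {category_counts[name]} cases ({pct}%)")
print(f"not itemized: {unlisted_cases} cases")
```

The three categories account for 351 of the 360 cases, so under this assumption 9 cases fall into mutation types that are not itemized in the text.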
We then screened several
SETD2 missense mutations within the ASP domains using
in vitro histone methyltransferase activity assays [
8] and identified the C1685F mutation [
29] for further analysis because this mutation could abolish the catalytic activity of SETD2 while barely affecting the stability of the GST-fused SETD2(ASP) protein (see below). The C1685 of human SETD2 is equivalent to C1659 of mouse Setd2. It is located within the PostSET domain of SETD2 and is highly conserved from yeast to humans, as well as in the SETD2 paralogs including ASH1L, NSD1, NSD2, and NSD3 (Fig.1). According to the structural studies of human SETD2 [
65,
66], the C1685 combines with C1631, C1678, and C1680 to tetrahedrally coordinate a zinc ion near the active site (Fig.1), and the C1685F mutation is thus expected to affect the catalytic activity of SETD2. We then purified both human and mouse versions of the GST-SETD2(ASP) WT and mutant proteins for
in vitro histone methyltransferase activity assays, and the results showed that, while the WT proteins could catalyze mono-, di-, and trimethylation of H3K36 on recombinant nucleosomes, the C1685F/C1659F mutations completely abolished all these catalytic activities but barely affected the stability of the human and mouse GST-SETD2(ASP) proteins (Fig.1–1F).
Next, we asked whether the WT and mutant SETD2 could equally bind to the hyperphosphorylated elongating Pol II and, if so, whether they indeed show a dramatic difference in catalyzing the transcription-coupled H3K36me3 in cells. To answer these questions, we first generated a
SETD2 knockout HEK293T cell line using the CRISPR/Cas9 technology. Compared with the control cells, the
SETD2 knockout cells showed dramatically decreased H3K36me3 but not H3K36me1/2 levels. Given the difficulty in ectopically expressing the full-length SETD2 protein [
8,
67,
68], we transfected these cells with the C-terminal part of the WT and C1685F mutant SETD2, which contained the ASP, WW, and SRI domains (abbreviated as ASPWS) (Fig.1), and results of the co-IP assays showed that both the WT and mutant SETD2(ASPWS) proteins could bind to the hyperphosphorylated, but not the unphosphorylated, Pol II (Fig.1). Meanwhile, we performed immunoblot analysis of the H3K36 methylation levels in these cells. The results showed that only the WT, but not the mutant, SETD2(ASPWS) protein could substantially restore the H3K36me3 level in the
SETD2 knockout cells; in contrast, the H3K36me1/2 levels showed no difference upon expression of either WT or mutant SETD2(ASPWS) proteins (Fig.1). Collectively, these results indicate that this ccRCC patient-derived
SETD2 mutation can specifically abolish its catalytic activity while retaining its Pol II-interaction, thus providing a rational tool for dissecting the functions of SETD2 dependent or independent of its catalytic activity.
3.2 Generation of the site-specific Setd2-CD knockin mouse model
We employed the targeted homologous recombination technique to generate a mouse model harboring the ccRCC patient-derived
SETD2 mutation. Both the C1685 of human SETD2 and the C1659 of mouse Setd2 proteins are encoded by a TGC codon, in which substitution of the second letter G by T causes a missense amino acid mutation from C to F. As the C1659 of mouse Setd2 is located in exon 9, we cloned an XbaI/
KpnI-digested genomic fragment encompassing exons 7–9 into the targeting vector, while making the G-to-T mutation in exon 9 and inserting a LoxP-Neo-LoxP cassette in intron 8 and a Tk cassette downstream of the 3′ homology arm as positive and negative selection markers, respectively (Fig.2). Besides the G-to-T mutation, the targeting strategy only leaves a single LoxP site in the middle of intron 8 upon Cre-Lox recombination, so that it provides a marker for genotyping but barely affects the expression of the
Setd2 gene. After introducing the targeting vector into mouse ES cells by electroporation, we selected the neomycin-resistant ES cell clones and validated them by genomic PCR and sequencing (Fig.2 and 2C). Subsequently, we chose clone #9 for blastocyst microinjection and transferred the injected blastocysts into pseudopregnant mice. Chimeras were bred to obtain germline-transmitted heterozygous knockin mice, and their Neo cassettes in the genome were removed by mating with a
Ddx4 (also known as
Vasa) promoter-driven Cre transgenic mouse line, in which the Cre recombinase is specifically expressed in germ cells [
69]. The WT and heterozygous
Setd2 C1659F knockin (
Setd2+/CD) mice were genotyped with PCR primers flanking the remaining single LoxP site in the genome (Fig.2), and the C1659F mutation site was validated by sequencing the PCR products of the genomic region containing exon 9 (Fig.2 and 2F). We then crossed the
Setd2+/CD mice to obtain homozygous
Setd2 C1659F knockin (
Setd2CD/CD) offspring. However, genotyping of multiple litters showed that, while the numbers of the WT and
Setd2+/CD littermates showed a normal ratio of roughly 1:2, no
Setd2CD/CD mouse was born (Fig.2). This finding suggested that the
Setd2CD/CD mice might die
in utero, which was reminiscent of the embryonic lethality phenotype of our
Setd2−/− model [
27]. Therefore, we sought to perform a side-by-side comparative study between these two mouse models.
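The codon change underlying the knockin design in this section (TGC to TTC, cysteine to phenylalanine) can be verified against the standard genetic code. The Python sketch below builds the standard codon table from its conventional compact representation; the helper name is illustrative and is not part of the targeting strategy itself.

```python
# Standard genetic code in the conventional compact form:
# bases ordered T, C, A, G at each codon position ('*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def substitute_base(codon: str, position: int, new_base: str) -> str:
    """Return the codon with the base at a 1-based position replaced."""
    if position not in (1, 2, 3) or new_base not in BASES:
        raise ValueError("position must be 1-3 and new_base one of T, C, A, G")
    return codon[: position - 1] + new_base + codon[position:]


wild_type_codon = "TGC"                                   # codon stated in the text
mutant_codon = substitute_base(wild_type_codon, 2, "T")   # G-to-T at the second letter

wild_type_residue = CODON_TABLE[wild_type_codon]
mutant_residue = CODON_TABLE[mutant_codon]
print(f"{wild_type_codon} ({wild_type_residue}) -> {mutant_codon} ({mutant_residue})")
```

The substitution yields TTC, which encodes phenylalanine, so a single G-to-T change at the second codon position is sufficient to produce the C-to-F missense mutation in both the human and the mouse gene.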
3.3 Comparable embryonic lethality phenotypes and reduced H3K36me3 levels in the Setd2CD/CD and Setd2−/− mice
To determine whether the embryonic lethality of the Setd2CD/CD and Setd2−/− mice occurred at the same developmental stage, we collected and analyzed their embryos at multiple time points. The results revealed that both Setd2CD/CD and Setd2−/− embryos died at E10.5, as no viable embryo was found beyond this stage. At E10.5, the Setd2CD/CD and Setd2−/− embryos exhibited comparable overall phenotypes, including pale and shriveled yolk sacs and severely retarded growth of the embryos proper; in contrast, the heterozygous embryos of both models (Setd2+/CD and Setd2+/−) showed no apparent defect compared with their WT littermates (Fig.3 and 3B). To exclude the possibility that the Setd2 null-like phenotypes of the Setd2CD/CD embryos were simply caused by complete elimination of Setd2 protein by the C1659F mutation, we performed immunoblot analysis of the WT, heterozygous and homozygous Setd2-CD and Setd2 knockout embryos at E10.5. The results showed that, although the Setd2-CD protein level was relatively lower, the Setd2CD/CD embryos still contained a considerable amount of Setd2-CD protein (Fig.3); in contrast, Setd2 protein was entirely absent in the Setd2−/− embryos (Fig.3). This observation indicates that the C1659F mutation does not eliminate Setd2 protein in the embryos, and that the limited decrease in protein level is unlikely to cause such severe Setd2 null-like phenotypes. We also examined whether the specific loss of Setd2 catalytic activity in the Setd2CD/CD embryos could reduce H3K36me3 to a level comparable to that in the Setd2−/− embryos. Immunoblot and quantification analysis showed that the H3K36me3 levels in the Setd2CD/CD and Setd2−/− embryos were similarly reduced to 20%–30% of those in their WT littermates, whereas their levels of H3K36me1 and H3K36me2, which might be catalyzed by other methyltransferases, were not reduced (Fig.3–3H).
Collectively, these results demonstrated that the loss of H3K36me3 methyltransferase activity of Setd2 per se could cause embryonic lethality phenotypes similar to those of the complete knockout of Setd2, thus implying that the catalytic activity of Setd2 is essential for its functions in supporting embryonic development.
3.4 Vascular remodeling defects and gene expression profiling of the Setd2CD/CD and Setd2−/− yolk sac
To further characterize the developmental defects of the Setd2CD/CD and Setd2−/− mice and to understand the molecular mechanisms, we performed a comparative gene expression profiling based on RNA-seq analysis of their yolk sacs at E9.5. This relatively early developmental stage was chosen because the Setd2CD/CD and Setd2−/− yolk sacs at E9.5 were still apparently normal and thus would be more suitable for identifying earlier molecular mechanisms underlying the phenotypes. PCA of the highly variable genes among the 6 groups of yolk sac samples (i.e., the yolk sacs of the Setd2CD/CD and Setd2−/− mice, as well as those of their heterozygous and WT littermates; 2 samples in each group) showed that the Setd2CD/CD and Setd2−/− yolk sacs (4 samples) were closely clustered, whereas all the heterozygous and WT samples were clustered into another group (Fig.4). This result indicates that, consistent with their comparable phenotypes, the Setd2CD/CD and Setd2−/− yolk sacs also share considerable similarities at the transcriptomic level.
Relevant to the fact that the
Setd2−/− embryos die of vascular remodeling defects, our GSEA analysis showed that the angiogenesis hallmark genes were similarly downregulated in both
Setd2CD/CD and
Setd2−/− yolk sacs compared with their WT littermates (Fig.4), and these results were further validated by an unsupervised hierarchical clustering analysis of the above-mentioned 6 groups of yolk sac samples (Fig.4). Furthermore, since we previously used cDNA microarray techniques to identify a transcriptomic signature including several differentially expressed genes in
Setd2−/− yolk sacs compared with their WT littermates [
27], we herein used RNA-seq to re-analyze these genes. As a result, this signature was largely recapitulated by the RNA-seq analysis of both
Setd2CD/CD and
Setd2−/− yolk sacs, as the majority of the previously identified genes showed similar downregulation (
Gja4,
Plg,
Angptl6, and
Vegfb) and upregulation (
Ccn2,
Ccn1,
Lama1, and
Foxo3); meanwhile, probably due to technical differences between the RNA-seq and microarray platforms, several genes previously identified as differentially expressed by microarray showed no significant change in the RNA-seq results (Fig.4 and 4E). Nevertheless, the side-by-side comparison of gene expression patterns of
Setd2CD/CD and
Setd2−/− yolk sacs demonstrated clear similarity between these two mouse models. To validate their vascular remodeling defects, we performed histological section analysis of the
Setd2CD/CD and
Setd2−/− yolk sacs. The results showed that both
Setd2CD/CD and
Setd2−/− yolk sacs contained a number of widely enlarged cavities between the visceral endoderm and mesoderm, which were very rarely observed in the yolk sacs of their WT littermates (Fig.4 and 4G).
Apart from the above-mentioned angiogenic genes, we also performed a broader analysis of the gene expression profiles to gain more insights into the potential mechanisms underlying the vascular remodeling defects of the
Setd2CD/CD and
Setd2−/− yolk sacs. Notably, our previous cross-species comparative studies between
Setd2 knockout mouse and zebrafish models have suggested that the embryonic lethal vascular remodeling phenotypes are likely related to metabolic stress that is withstood by the mouse but not the zebrafish embryos [
33]. Therefore, it would be important to continue searching for stress-related genes and more upstream differentially expressed genes that could be directly regulated by Setd2. In particular, GSEA analysis of the RNA-seq data against the hallmark gene sets [
70] suggested that the p53 pathway and the heme metabolism were activated, whereas the coagulation and epithelial-mesenchymal transition pathways were suppressed, in both
Setd2CD/CD and
Setd2−/− yolk sacs (Fig.5). These observations imply that the tissues might suffer certain types of stress (e.g., DNA damage and hypoxia) and related dysfunctions. Notably, a comparison between the gene expression profiles of the homozygous and heterozygous
Setd2-CD and
Setd2 knockout mice can further exclude the possibility that the phenotypes of the
Setd2CD/CD mice were caused by decreased Setd2 protein levels rather than loss-of-activity, because, if this were the case, the gene expression pattern of the
Setd2CD/CD mice, which still contain a considerable amount of Setd2 protein, would more closely resemble that of
Setd2+/−. However, the gene expression profiling showed a close similarity between
Setd2CD/CD and
Setd2−/−, but not with
Setd2+/− (Fig.5), thus suggesting that the
Setd2 null-like phenotypes of the
Setd2CD/CD mice are caused by Setd2 loss-of-activity rather than protein instability.
Furthermore, GSEA analysis against the Gene Ontology gene sets [
71] showed that collagen assembly related genes were downregulated in the
Setd2CD/CD and
Setd2−/− yolk sacs (Fig.5), and this result was confirmed by an unsupervised hierarchical clustering analysis of all samples of the homozygous, heterozygous and WT yolk sacs (Fig.5). Notably, the immediate relevance of collagen proteins to angiogenesis [
72] and the special genomic features of the collagen genes (i.e., most collagen genes are very long and highly interrupted) [
73] imply that these genes may serve as candidate genes preferentially regulated by Setd2 and H3K36me3 in the developmental processes (for more detailed historical and logical discussions on this point, see
Discussion section). Taken together, these results suggest that certain stress response, metabolism and cellular communication pathways might play important roles in the vascular remodeling defects of the
Setd2CD/CD and
Setd2−/− yolk sacs.
3.5 Setd2CD/CD embryos exhibit slightly milder developmental defects than Setd2−/−
To investigate the developmental defects of the Setd2CD/CD and Setd2−/− embryos proper, we isolated the embryos at E9.5 and performed comparative gene expression profiling together with the embryos of their heterozygous and WT littermates. Notably, relative to the closely comparable gene expression patterns between the Setd2CD/CD and Setd2−/− yolk sacs, their embryos showed larger differences in gene expression alterations. For example, in the Setd2−/− embryos, we found that 293 and 257 genes were up- and downregulated, respectively, compared with those of their WT littermates; in contrast, only 88 and 144 genes were found to be up- and downregulated, respectively, in the Setd2CD/CD embryos (Fig.6). Nonetheless, the similarity between the Setd2CD/CD and Setd2−/− embryos was still evident because their genes were largely regulated in the same direction, i.e., the upregulated genes in the Setd2CD/CD embryos overlapped with the upregulated genes in the Setd2−/− embryos, and the same held for the downregulated genes (Fig.6). To further evaluate the level of change of each gene, we ranked the shared upregulated genes by their fold changes in Setd2−/− and found that the fold changes in Setd2CD/CD were generally less dramatic (Fig.6, left). The same trend was also observed in the downregulated genes (Fig.6, right). These results suggested that gene expression alterations in the Setd2CD/CD embryos were less dramatic than those in the Setd2−/− embryos.
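The comparison described above (overlap of same-direction gene sets, followed by ranking shared genes by their fold change in the knockout) can be sketched in Python as follows. The gene names, log2 fold changes, and the cutoff of 1.0 are all hypothetical and chosen only for illustration; the text does not state the thresholds used to call differentially expressed genes.

```python
def split_by_direction(log2_fold_changes: dict, threshold: float = 1.0):
    """Return (upregulated, downregulated) gene sets for a |log2FC| cutoff.

    The cutoff is a hypothetical choice, not the criterion used in the study.
    """
    up = {g for g, fc in log2_fold_changes.items() if fc >= threshold}
    down = {g for g, fc in log2_fold_changes.items() if fc <= -threshold}
    return up, down


# Hypothetical log2 fold changes versus WT littermates (illustration only).
toy_knockout = {"gene_a": 3.0, "gene_b": 2.0, "gene_c": 1.5,
                "gene_d": -2.5, "gene_e": -1.2, "gene_f": 0.1}
toy_cd = {"gene_a": 1.8, "gene_b": 1.1, "gene_c": 0.4,
          "gene_d": -1.4, "gene_e": -0.3, "gene_f": 0.0}

ko_up, ko_down = split_by_direction(toy_knockout)
cd_up, cd_down = split_by_direction(toy_cd)

# Genes changed in the same direction in both models.
shared_up = ko_up & cd_up
shared_down = ko_down & cd_down

# Rank shared upregulated genes by their fold change in the knockout.
ranked_shared_up = sorted(shared_up, key=lambda g: toy_knockout[g], reverse=True)

# Shared genes whose change is smaller in magnitude in the CD model.
milder_in_cd = sorted(
    g for g in shared_up | shared_down
    if abs(toy_cd[g]) < abs(toy_knockout[g])
)

print("shared up:", ranked_shared_up)
print("shared down:", sorted(shared_down))
print("milder in CD:", milder_in_cd)
```

In this toy example the catalytically dead model has fewer genes passing the cutoff than the knockout, every gene it does share with the knockout changes in the same direction, and each shared gene changes less strongly, which mirrors the pattern reported for the embryos.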
We therefore further compared the phenotypes of the
Setd2CD/CD and
Setd2−/− embryos in detail, especially taking their heterogeneities into consideration. As reported in our previous study [
27], the
Setd2−/− embryos showed a considerable degree of heterogeneity in their growth retardation phenotypes, which could be measured by their asynchronous completion of developmental milestone events. In particular, at E9.5, close to half of the
Setd2−/− embryos showed incomplete attachment of their allantoides to the chorion; in contrast, although incomplete chorioallantoic attachment was also observed in some
Setd2CD/CD embryos, the majority of them had completed this milestone event (Fig.6). Furthermore, histological section analysis showed different levels of developmental defects in the
Setd2CD/CD and
Setd2−/− placentas. Compared with the WT placentas, in which the labyrinthine layer was well developed, with properly invading and interdigitating blood vessels, the majority of
Setd2CD/CD placentas contained a much thinner labyrinthine layer with blood vessels remaining at the periphery, whereas the placentas of those
Setd2−/− embryos with incomplete chorioallantoic attachment showed no labyrinthine layer (Fig.6 and 6E). Therefore, this difference in the severity of developmental defects between the
Setd2CD/CD and
Setd2−/− placentas, though only notable when considering their heterogeneities, suggests that the Setd2-CD protein could still retain minor functions in the regulation of mouse embryonic development.
3.6 scRNA-seq analysis reveals differential regulation of allantois-specific 5′ Hoxa cluster genes in Setd2CD/CD and Setd2−/− embryos
To further characterize the differences between the Setd2CD/CD and Setd2−/− embryos at the single-cell level, we isolated the embryos at E8.5 and performed scRNA-seq analysis. At this stage, the allantoides of the Setd2CD/CD and Setd2−/− embryos had not yet attached to the chorion, and thus there was no apparent phenotypic difference between them. Our uniform manifold approximation and projection (UMAP) analysis of the scRNA-seq data led to the identification of 20 distinct cell clusters (Fig.7), whose identities were annotated by specific marker genes (Fig.7). All of these clusters, including the allantois (annotated by marker genes including Pitx1, Slc38a4, Amot, Hoxa9, Plac1, and Sgce), were similarly observed in the Setd2CD/CD and Setd2−/− embryos (Fig.7), and the frequencies of the individual clusters were also correlated (Fig.7). These results suggested that the cellular identities and compositions in the Setd2CD/CD and Setd2−/− embryos were closely comparable.
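As a rough illustration of how cluster frequencies can be compared between two genotypes, the sketch below computes the fraction of cells assigned to each cluster and the Pearson correlation between the two frequency vectors. The cluster labels other than the allantois, the cell counts, and the helper names `cluster_frequencies` and `pearson` are hypothetical; the actual analysis in the study may have used a different statistic or toolkit.

```python
from collections import Counter
from math import sqrt


def cluster_frequencies(labels, clusters):
    """Fraction of cells assigned to each cluster, in a fixed cluster order."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in clusters]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical per-cell cluster assignments for two genotypes (toy numbers).
clusters = ["allantois", "endothelium", "mesoderm"]
cd_labels = ["allantois"] * 2 + ["endothelium"] * 3 + ["mesoderm"] * 5
ko_labels = ["allantois"] * 4 + ["endothelium"] * 6 + ["mesoderm"] * 10

cd_freq = cluster_frequencies(cd_labels, clusters)
ko_freq = cluster_frequencies(ko_labels, clusters)
r = pearson(cd_freq, ko_freq)
```

Normalizing to fractions before correlating keeps the comparison independent of how many cells were captured per genotype, which is why the toy example uses different total cell numbers for the two samples.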
To explore the potential mechanism underlying the different severity of the allantois developmental defects observed at a later stage (E9.5), we then focused on analyzing the genes that were enriched and known to play important roles in the allantois of the Setd2CD/CD and Setd2−/− embryos. Notably, previous studies have shown that the spatiotemporal expression of the 5′ Hoxa cluster genes in the allantois is important for placental labyrinth development in mice [74, 75]. Indeed, our scRNA-seq data showed that Hoxa9, Hoxa10, Hoxa11, and Hoxa13 were relatively highly expressed in the allantois cells (Fig.7). Comparison between the Setd2CD/CD and Setd2−/− embryos showed that, while these 5′ Hoxa cluster genes were dramatically downregulated in the Setd2−/− allantois cells compared with those of the WT littermates, their changes in the Setd2CD/CD allantois cells were minimal (Fig.7). These results suggest that the inactivation of the 5′ Hoxa cluster genes specifically in the Setd2−/− allantois cells may provide a mechanism underlying the more severe allantois developmental defects in the Setd2−/− embryos, and that the spatiotemporal activation of the 5′ Hoxa cluster genes may require functions of Setd2 in addition to its catalytic activity.
4 Discussion
In this study, we have generated the first Setd2-CD mouse model by knocking in a single point mutation derived from cancer patients. This mutation has been verified to specifically abolish the catalytic activity but not the transcriptional coupling of Setd2. This model provides a vital tool for determining whether the physiologic function of Setd2 is dependent on its catalytic activity. A side-by-side comparative study of the Setd2CD/CD and Setd2−/− mice at both phenotypic and molecular levels is important for properly interpreting the phenotypes and for drawing unbiased conclusions. Our results demonstrate both similarities and differences between these two models. On the one hand, they show very similar phenotypes, including embryonic lethality at E10.5, vascular defects, and developmental retardation, all of which are underlain by closely comparable transcriptomic alterations. On the other hand, they also show certain differences, albeit relatively subtle, in specific developmental events such as the chorioallantoic developmental defects, which could be explained by differential regulation of the allantois-specific 5′ Hoxa cluster genes. Taken together, these results suggest that the essential functions of Setd2 in supporting mouse embryonic development are largely dependent on its catalytic activity, whereas in specific circumstances the Setd2-CD protein may still be able to exert minor noncatalytic functions.
Although epigenetic modifiers are usually thought to function mainly through their catalytic activities, their noncatalytic functions have recently attracted much attention owing to several elegant studies comparing specific CD mutants with complete knockouts of the whole proteins. For example, in a mouse ES cell model harboring CD mutants of the H3K4me1 methyltransferases Mll3 and Mll4, it was shown that, despite the loss of H3K4me1 on the enhancers of the Mll3/4-target genes, the transcriptional levels of these genes changed much less in the Mll3/4-CD cells than in the Mll3/4 double knockout cells, thus suggesting an important noncatalytic function of Mll3/4 [76]. This notion was further explored by a series of studies on a Drosophila model harboring the CD mutants of Trr (ortholog of mammalian Mll3/4), in which Trr-CD could rescue the embryonic lethality caused by Trr knockout, indicating that the viability of the embryos requires the noncatalytic functions, but not the catalytic activity, of Trr [77]. Subsequently, a new domain within Trr was identified to mediate this noncatalytic function by interacting with, and stabilizing, the H3K27 demethylase Utx [78]. Furthermore, noncatalytic functions of several histone acetyltransferases, such as Drosophila Nejire (ortholog of mammalian Cbp/p300) and Gcn5 (ortholog of mammalian Gcn5/Pcaf), have also been reported [79]. Based on these studies, it has been proposed that the noncatalytic activities of these epigenetic modifiers are related to their involvement in multiprotein complexes through protein–protein interactions and/or to the functional redundancy among different chromatin modifications [6]. In contrast, however, our study provides an example of the opposite case, because the major physiologic function of Setd2 in mouse embryonic development is largely dependent on its catalytic activity. Notably, another example of this case is that a mouse model harboring a CD mutation in the DNA methyltransferase Dnmt1 showed embryonic lethal phenotypes similar to those of the Dnmt1 complete knockout mice [80]. Therefore, given that both Setd2 and Dnmt1 play relatively nonredundant roles in H3K36me3 and DNA methylation maintenance, respectively, these studies jointly suggest that Setd2 and Dnmt1 may represent a class of epigenetic modifiers whose functions are heavily dependent on their catalytic activities because they themselves, and the chromatin modifications they catalyze, are basically irreplaceable.
Setd2 and H3K36me3 have been implicated in many aspects of genomic regulation, including transcriptional elongation, repression of intragenic cryptic transcription, repair of DNA damage, and mRNA splicing (for reviews, see [81, 82]). However, as SETD2 catalyzes H3K36me3 on virtually all actively transcribed protein-coding genes, it remains unclear whether Setd2 indiscriminately regulates all target genes or preferentially regulates specific genes under certain circumstances. If the latter is the case, the target genes preferentially regulated by SETD2 would be expected to have special structural features (e.g., gene length, genomic structure, or regulatory elements). Interestingly, previous studies have shown that, in yeast, Set2-mediated H3K36 methylation preferentially regulates longer and infrequently transcribed genes [83], although this finding remains controversial owing to differences in statistical methodology [84]. Potentially related to this notion, we found here that many collagen genes, which are highly interrupted and thus contain large numbers of exons [73], are downregulated in the Setd2CD/CD and Setd2−/− yolk sacs at a very early developmental stage, suggesting that these collagen genes could serve as model genes in mammals to study whether long and highly interrupted genes tend to be more dependent on Setd2 and H3K36me3. Furthermore, regarding the potential catalytically independent functions of Setd2, we found that the 5′ Hoxa cluster genes are differentially regulated between the Setd2CD/CD and Setd2−/− embryos. This observation is possibly related to previous studies showing that, in Drosophila, the Hox cluster genes are more sensitive to the H3K36R and H3K36A mutations, which could affect the crosstalk between H3K36me3 and PRC2-mediated H3K27me3 [85]. Besides these mechanisms, it has also been found that, in yeast, overlapping genes (including those containing antisense RNAs within the gene body) can be regulated by Set2 in a special way because of the possible promoter-located H3K36me3 left by Set2 from the cDNA strand [86, 87]. In this regard, considering that mammalian genomes also contain some overlapping genes [88–90], it would be interesting to investigate whether these genes rely more on Setd2, in a manner either dependent on or independent of its catalytic activity.
Since the discovery of SETD2 mutations in cancer, to our knowledge, at least 10 Setd2 knockout mouse models have been independently generated by different research groups (including 1 constitutive and 9 conditional Setd2-knockout models) [27, 34–42]. In the present study, we generated the first Setd2 point mutation knockin mouse model. In addition to determining the catalytically dependent and independent functions of Setd2, we have also characterized this new model by comparing the heterozygous Setd2-CD and Setd2 null embryos. The significant difference between the Setd2+/CD and Setd2−/− mice suggests that the mutant Setd2 protein studied here does not function as a dominant-negative factor against the WT Setd2 protein, while the similarity of Setd2CD/CD to Setd2−/−, but not to Setd2+/−, suggests that the loss of catalytic activity, rather than protein instability, represents the key mechanism by which this mutant Setd2 protein causes the phenotypes in the mice. Lastly, although the embryonic lethality restricts the immediate use of this model to study tumorigenesis, crossing this model with the conditional Setd2 knockout models would be an option to create a window for evaluating the tissue-specific role of this patient-derived mutation, and the same approach would be useful for studying many other patient-derived SETD2 mutations in the future.