1 Introduction
Genomic alteration emerges as a fundamental driver of tumorigenesis. Recent compelling pieces of evidence offer new insights into chromatin structural reorganization and functional changes that regulate gene expression and activation during cancer development [
1]. The three-dimensional (3D) analysis of chromatin architecture defines a functional DNA structure that orchestrates chromatin loops in which promoters (P) and enhancers (E) coordinately regulate the expressions of a variety of genes. Chromatin loops develop when CCCTC binding factor (CTCF) and cohesin located at distant loci on a linear chromosome interact to form a topological close-up structure [
2]. The disruption of CTCF binding sites or their orientation in loop anchors can disrupt enhancer–promoter (E-P) interactions and impair gene expressions, which ultimately gives rise to an abnormal phenotype [
3,
4]. Individual chromatin CTCF loops spatially compact together to form submegabase regions, known as topologically associated domains (TADs), which are highly self-interacting and reflect genomic signatures [
5]. The DNA genome is glossily designated into large-scale compartments A and B, in which compartment A involves an accessible and transcriptionally active genome, whereas compartment B is compact and relatively silent. Reorganization and conversion between these compartments frequently occur in cancer [
6–
9]; however, the molecular mechanisms underpinning topological reorganization and carcinogenesis remain to be established.
Gallbladder cancer (GBC) is the most common malignancy of the biliary tract system, with a median survival of less than one year [
10]. The poor prognosis of this disease, which lacks notable symptoms at the early stages, is largely attributed to the delayed diagnosis [
11]. A total of 25% of patients are considered acceptable for curative surgery, and less than 16% can survive for more than five years [
12]. Thus, improvements in early cancer detection and prevention are eminent. GBC prevention requires the decreased levels of cancer risk factor, including cholecystitis, cholelithiasis, and gallbladder polyps [
12,
13]. In addition, a large body of research evidence has revealed a variety of key molecules that govern cancer development. Our previous work primarily focused on the molecular and genetic signatures of GBC and demonstrated that genomic dysfunctions promoted the malignant transformation of GBC [
14–
18]. However, topological changes in genome and chromatin organization in GBC are poorly understood.
In this study, we employed in situ Hi-C, RNA-seq, and ChIP-seq to systemically investigate the hierarchical layers of cancer genome reorganization and subsequent functional changes in GBC tissues and cell lines. We observed that 3D reconfiguration and functional changes in compartments, TADs, and CTCF-mediated loops that harbor genes (e.g., BMP4, KRT19, and FOXA2) driven by E-P interaction coordinately contributed to the genotypic and phenotypic changes in GBC cells, which promoted cancer cell growth. This study sheds light on the newly discovered signature of chromatin reorganization and activity that regulates the expression of cancer-related genes and drives the development of GBC.
2 Materials and methods
2.1 Patients and clinical specimens
Gallbladder cancerous and paired paracancerous tissues were obtained from patients who underwent cholecystectomy without neoadjuvant chemoradiotherapy or endocrinotherapy at the Department of General Surgery, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, China. All the patients were subjected to accept the written informed consent before enrollment. The study was approved by the ethics committee of Xinhua hospital (No. XHEC-D-2021-071) and the study was performed in accordance with the ethical standards as laid down in the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. Informed consent was obtained from all patients for being included in the study.
2.2 Cell culture and reagents
The GBC-SD cell line was purchased from the cell bank of the Type Culture Collection of the Chinese Academy of Sciences (Shanghai, China). The NOZ cell line was obtained from the Health Science Research Bank (Osaka, Japan). Both cell lines were maintained in high-glucose Dulbecco’s Modified Eagle Medium supplemented with 10% fetal bovine serum, penicillin G (100 U/mL), and streptomycin (100 g/mL). The cells were maintained as monolayer cultures at 37 °C in humidified air with 5% CO2 and 95% air. Human biliary epithelial cells (HBECs) were purchased from ScienCell (#5100, USA) and characterized by immunostaining (CK18).
2.3 In situ Hi-C library construction
An
in situ Hi-C library was constructed as described by Rao
et al. with minor modifications [
19]. Briefly, approximately two million cells from gallbladder cancerous and paracancerous tissues were prepared as a single-cell suspension and readied for crosslinking. The cell samples were incubated in 1% formaldehyde for 10 min at room temperature (RT) and quenched by the addition of proper volumes of glycine (final concentration: 0.2 mol/L). Then, the samples were lysed with an ice-cold Hi-C lysis buffer containing 10 mmol/L Tris-HCl (pH 8.0), 10 mmol/L NaCl, 0.2% Igepal-CA630, and 1× protease inhibitors. After centrifugation, the pelleted nuclei were resuspended in 0.5% sodium dodecyl sulphate (SDS) and incubated at 62 °C for 10 min, followed by the addition of 10% Triton X-100 to quench SDS. A restriction enzyme of 100 U
MboI (R0147, NEB, USA) was used to digest the chromatin overnight at 37 °C, followed by incubation at 62 °C for 20 min to inactivate
MboI. The DNA ends were filled with Klenow enzyme (M0210, NEB, USA) and simultaneously labeled with biotin-14-Datp (#19524016, Thermo, USA). Chromatin was ligated again using 2000 U T4 ligase (M0202, NEB, USA) for 4 h at RT. Then, the samples were digested with proteinase K (20 mg/mL) for 30 min at 55 °C, followed by overnight incubation at 68 °C to reverse the crosslinks. DNA was precipitated using ethanol and sodium acetate and fragmented from 300 bp to 500 bp with sonication. Biotin-labeled DNA was precipitated using streptavidin-conjugated beads (#65602, Life Technologies, USA), and the beads were washed thrice with 1× Tween wash buffer containing 5 mmol/L Tris-HCl (pH 7.5), 0.5 mmol/L ethylenediaminetetraacetic acid, 1 mol/L NaCl, and 0.05% Tween-20 at 55 °C. A DNA library was established through DNA end repair, dA-Tailing, and adaptor ligation using the NEB DNA library prep kit (E7370, NEB, USA). After washing, DNA samples were amplified via 10 cycles of polymerase chain reaction (PCR). The 300–500 bp DNA fragments were selected with Ampure XP beads (A63880, Beckman Coulter, USA) and sequenced using the Illumine NovaSeq platform.
2.4 ChIP-seq library construction
ChIP was performed using the Enzymatic Chromatin IP Kit, in accordance with the manufacturer’s instructions (#9003, CST, USA). Briefly, cells were cross-linked with 1% formaldehyde for 10 min at RT, and chromatin was fragmented by digestion with micrococcal nuclease. The lysate was then immunoprecipitated using antibodies against CTCF (#3418, CST, USA), H3K27ac (#8173, CST, USA), and H3K4me3 (#9751, CST, USA). Next, DNA was purified using DNA purification spin columns. Finally, a DNA library was constructed using the NEB DNA library prep kit and sequenced by the Illumina NovaSeq platform.
2.5 RNA-seq library construction
Total RNA was extracted from the cell lines and tissues using TRIzol reagent (#15596026, Invitrogen, USA). The resulting libraries were generated using the NEB RNA Library Prep Kit for Illumina (E7770, NEB, USA) following the manufacturer’s instructions, and barcodes were added to the attribute sequences of each sample. RNA-seq was performed with two biological replicates for GBC cell lines, normal epithelial cells, and GBC-1 tissues. No biological replicate was performed for GBC-2 tissues. The pooled libraries were sequenced on the Illumina NovaSeq platform, and paired-end reads were obtained.
2.6 Contact heatmap analysis
Adapters of raw Hi-C data were trimmed by cutadapt (version 3.2). HiC-Pro [
20] was used to map the previously generated clean data to the human reference genome (hg19). Contact matrices were normalized by iterative correction and eigenvector decomposition (ICE-normalized) [
21] and further normalized using 100 million contacts per sample. The Matplotlib plotting package was used to generate whole-genome heatmaps at 1 Mb resolution.
2.7 Whole-genome sequencing (WGS) and translocation analysis
Whole-genome DNA of GBC-SD and NOZ cell lines were extracted and sequenced at 80× depth. After quality control and preprocessing, the clean data were aligned to hg19 using BWA (version 0.7.17). Translocation events were identified by Delly [
22] software (version: 1.0.3; command: delly call -t TRA -g hg19) and further filtered with the number of paired-end supporting the structural variant greater than or equal to 10. The resulting high-quality events were used for further analyses. We then determined the highest 100 interchromosomal interactions using the following process. Intrachromosomal bin pairs were first deleted from the matrices and then sorted based on the interaction count. Then, adjacent bin pairs were merged. When the merged adjacent bin pairs reached 100, the number “N” of bin pairs was determined. “N” was 222 in GBC-SD cells and 278 in NOZ cells. In addition, the bin pairs from Hi-C data were compared with translocations identified using the WGS data. NeoLoopFinder [
23] software was used for genome reconstruction around the breakpoints of translocation events found in the WGS and Hi-C data sets (command: assemble-complex SVs –minimum-size 5000 –balance-type ICE).
2.8 Analysis of compartments A/B
Juicertools [
24] was used to analyze compartmentalization through the first principal component (PC1 values, eigenvector) of Pearson’s matrix at 500 kb resolution. The whole genome was divided into two groups based on the eigenvector sign. The gene expression density with fragments per kilobase of transcript per million mapped reads per unit distance was calculated for high and low expression densities, in which the high density as a positive value was designated compartment A, and the low density as a negative value was designated compartment B.
2.9 RNA-seq analysis
Trimmomatic (version 0.39) [
25] was used to trim universal Illumina adaptors and filter low-quality reads. High-quality reads were aligned to h19 using STAR (version 2.7.8a) [
26]. Read counts were generated by HTseq [
27] (version 0.13.5, command: -s no -a 10 -t exon -i gene_id -m intersection-nonempty -f bam). A set of R packages was used to perform downstream processing. We collected differentially expressed genes (DEGs) using DESeq2 (version 1.30.1) [
28]. For samples without biological replication, DEGs were generated by edgeR [
29].
2.10 ChIP-seq analysis
ChIP-seq raw data were truncated to 50 bp and aligned to hg19 using bowtie2 [
30] with default parameters. Peaks were called using MACS2 [
31], including H3K27ac, H3K4me3 (-B -g hs –shift 73 –broad -f BAMPE -q 0.05), and CTCF (-B -g hs -f BAMPE -q 0.05), where parameter q represents the q-value with a cutoff of 0.05. DeepTools [
32] was used to calculate genome coverage with reads per kilobase of exon per million reads mapped (RPKM) normalization and generate bigWig files. For the joint analysis with A/B compartments, we merged the peak file of the tumor and control groups using the merge function of bedtools [
33]. Next, we applied the multicov function to calculate the read count. Finally, edgeR package was applied in differential analysis to calculate the fold change (FC)of the Chip-seq signal.
2.11 CTCF analysis
Loops generated from juicer_tools were annotated by CTCF peaks, and the proportions of CTCF loops and non-CTCF loops were distinguished. Enrichment of the CTCF signal in loop anchors was computed and displayed using computeMatrix and plotheatmap scripts from DeepTools. The enrichment of the CTCF signal at the left and right anchors was further analyzed separately. For CTCF motif orientation analysis, we first determined the direction of each CTCF peak. The CTCF peak region sequence in hg19 was extracted using bedtools (version 2.30.0, command: getfasta). The human CTCF motif matrix (MA0139.1) was obtained from the JASPAR website, and the sequence of each peak was compared with the motif matrix using fimo software (version 5.3.3, command: –max-strand) to search for candidate binding sites with a P value cutoff equal to 1e–4. If multiple CTCF binding sites were present in one peak, the CTCF with the highest score was selected to determine the orientation of the peak. Second, we excluded loop anchors without CTCF peaks to conduct the joint analysis of loops and CTCF orientation. We also filtered out loops in which one anchor point was enriched by multiple peaks whose directions were conflicted.
2.12 Boundary analysis (identification of TADs)
Boundaries were detected by the index of insulation score as previously described [
34]. Briefly, the sparse matrices generated by HiC-Pro were converted into dense ones. The total number of interactions within a square reading frame that slid along the diagonal of the matrix was assigned to the 10 kb diagonal bin. Then, the insulation score was normalized by comparison of the total number of interactions with the corresponding chromosome. Then, this score was normalized relative to the total number of interactions on the corresponding chromosome. The minima of the normalized insulation score curve represented TAD boundaries, and the bins on minima were extended by 30 kb to both sides to finally determine the range of boundaries. The insulation score of each 10 kb bin was calculated using Perl script (matrix2insulation.pl, version: 1.0.0, command: -b 10000 -im mean -bmoe 3 -nt 0.1 -v). To identify differences in the boundary strength between samples, we aligned all boundaries of each sample at the midpoint with an extension of 50 bins to both ends and compared the average insulation scores. TADs were defined as regions between two boundaries. To detect different inter/intra-TAD interactions between samples, we defined an 80 × 80 bin window with the boundary at the center. The average interactions of all windows in each sample were calculated and compared.
2.13 Loop analysis
HICCUPS [
19] was used to call loops at 5, 10, and 25 kb resolutions, and different loops between samples were detected using juicer_tools. Venn diagrams were generated by R (version 3.6.1, package: eulerr). Loops were then annotated using the P and E table from the ChIP-seq analysis results. P were defined with +/−2 kb to the transcript start sites (TSS). E were defined as regions with H3K27ac peaks except P regions using bedtools (version 2.30.0, command: substract -A). The length of loops was determined by the distance between the midpoints of the left and right anchors.
2.14 Gene Ontology (GO) analysis
The loops were annotated to genes based on the genomic locations of loop ends and genes using bedtools. Genes located at cancer or normal unique loop ends were imported to the R package ClusterProfiler v4.0.5 [
35] for GO analysis using the enrichGO function with the following parameters: pvalueCutoff = 0.05, qvalueCutoff = 0.05, and pAdjustMethod = BH. The default background parameters of this package were applied in which whole human reference genes (R package: org.Hs.eg.db) were used as background.
2.15 Transcription factor (TF) motif enrichment analysis
Scripts from Homer [
36] were used for motif enrichment analysis. Loops with lengths greater than 100 kb were defined as long loops, and those with a length of less than 100 kb were short loops. Then, the Perl script (findMotifsGenome.pl) was used in different motif enrichment analyses with long loops as the background (command: size 2000). For E-P loop-related E and P TF enrichment analysis, E and P were classified into two groups, that is, those that were related and not related to E-P loops, with the latter used as the background.
2.16 Gene expression and E-P loop interaction analysis
Loop files of two compared cell lines were concatenated (HBEC with GBC-SD; HBCE with NOZ) and annotated in the E and P table. Interactions were evaluated as the interaction counts between bins where the left and right anchors were located. Heatmaps were generated using the R package pheatmap.
2.17 Gene expression with CTCF and H3K27ac signal
The comparisons of CTCF and H3K27ac signals at the P regions of DEGs were performed using bigWigCompare 3.5.1 (command: –skipNAs). A heatmap was plotted with computeMatrix (command: reference-point –reference Point center –sortRegions keep –missingDataAsZero –skipZeros -a 2000 -b 2000) and plotHeatmap (–sortRegions keep).
2.18 Genes selected for CTCF loop–gene expression model
We selected genes that met the following three criteria: (1) highly expressed (
≥ 1 and
P value ≤0.01) in GBC cell lines compared with HBEC; (2) highly expressed (
≥ 1 and
P value ≤ 0.05) in nine GBC tissues compared with the matched normal tissues obtained from the RNA microarray data in the Gene Expression Omnibus (GEO) database (GSE76633) [
37]; (3) expression regulated via cancer-specific E-P loops. More specifically, different loops between normal cells and GBC cell lines obtained from Juicer_tools were annotated based on the P and E information. To further identify putative E, we defined genome loci with H3K27ac peaks and without H3Kme3 peaks as candidate E. Then, genes regulated by cancer-specific E-P loops and with higher expression in GBC cell lines than in HBEC cell lines were selected. Seven genes were finally identified for further analysis.
2.19 Western blot
GBC-SD and NOZ cells were lysed with radioimmunoprecipitation assay buffer (P0013C, Beyotime, China). Protein lysates were separated on a 10% polyacrylamide gel and transferred to a polyvinylidene difluoride membrane (IPVH00010, Millipore, Germany). Then, the membrane was incubated overnight at 4 °C with BMP4 (#4680, CST, USA) or GAPDH antibody (#5174, CST, USA) followed by incubation with horseradish peroxidase-conjugated IgG antibody for 1 h at RT. Finally, the chemiluminescence signal was detected by a ChemiDoc Touch System (Bio-Rad).
2.20 Quantitative real-time PCR
RNA was extracted with TRIzol reagent (#15596026, Invitrogen, USA) and reverse transcribed to cDNA using an reverse transcription reagent kit (RR047, Takara, Japan). For BMP4 mRNA expression analysis, the 2-ΔΔCt method was applied, and GAPDH was used as an endogenous control.
2.21 Transient transfection
SiRNAs were transfected into cells using Lipofectamine 2000 transfection reagent (#11668019, Invitrogen, USA), in accordance with the manufacturer’s instructions. For BMP4 plasmid transfection, 2 µg plasmid was transfected into six-well plates using a Viafect transfection reagent (E4981, Promega, USA). After 48 h, the cells were harvested for subsequent experiments.
2.22 Cell proliferation analysis
Cell Counting Kit-8 (CCK8) assay was performed to evaluate cell proliferation. Each 96-well plate was seeded with 1000 cells. The optical density (OD) after the addition of CCK8 reagent (#40203ES60, Yeasen, China) was determined for five consecutive days.
2.23 E knockout
The templates for producing target sgRNAs were constructed in the pSpCas9(BB)-2A-Puro (PX459) V2.0 plasmid (Addgene: #62988, a gift from Feng Zhang) and confirmed by sequencing [
38]. At 24 h after transfection, puromycin was added to a final concentration of 2 µg/mL for NOZ cells and 20 µg/mL for GBC-SD cells. Three days later, the DNA was extracted for PCR analysis to confirm the effectiveness of gene editing.
2.24 CTCF motif disruption
For CTCF binding site disruption, a sgRNA targeting sequence close to the CTCF motif was designed. The lentivirus was produced by cotransfection of CRISPR plasmids with VSVG and Δ8.9 plasmids into HEK293T cells using polyethylenimine. The CRISPR lentivirus was used to infect GBC-SD cells. After puromycin selection, editing efficiency was confirmed by TA cloning.
2.25 Statistical analysis
The bootstrap method was used to determine the relationship between interchromosomal interactions and translocation events identified by WGS data. Briefly, random interchromosomal interactions were selected to calculate the overlap with translocation 1000 times. The P value is equal to (the number of times random sites exceeded the top 100 sites)/1000. The same method was also used to determine the relationship between CTCF peaks and loop ends. For gene expression, CTCF, and H3K27ac signal changes within compartment switching, the Wilcoxon rank-sum test was used to compare the . The Wilcoxon rank-sum test was also used for TAD sizes, loop length, and insulation score comparison. The significance of different interactions between up/down expression genes was calculated using the Wilcoxon rank-sum test on the interaction count ratio. For cell proliferation, significance was determined using an unpaired two-sided t test on the OD values on day 5. For quantitative PCR (qPCR), an unpaired two-sided t test was used to compare delta cycle threshold (CT) values.
2.26 Data availability statement
All data can be viewed in the National Omics Data Encyclopedia datasets by pasting the accession number OEP002892.
3 Results
3.1 Global changes in genome architecture in GBC
To understand the reorganization of chromatin structure in GBC, we performed
in situ Hi-C on two pairs of cancerous and benign gallbladder tissues derived from patients, HBECs, and two GBC cell lines (GBC-SD and NOZ). We combined these
in situ Hi-C data sets with data from CTCF binding state (CTCF ChIP-seq), E/P activity (H3K27ac/H3K4me3 ChIP-seq), and gene expression (RNA-seq) to explore the molecular mechanisms underpinning hierarchical layers of chromatin reorganization, including compartments, TADs, and loops (Fig.1). The whole-genome contact heatmaps generated using the ICE-normalized matrices revealed stronger interchromosomal interactions in tissues than in cell lines (Fig.1 and S1A). In benign and cancer tissues, interchromosomal interactions accounted for 47%–52% of total chromatin interactions, but they decreased to 11%–17% in HBECs and cancer cells (Table S1), which suggest that tissue heterogeneity may account for strong interchromosomal interactions. Moreover, a number of strong specific interchromosomal interactions were observed in the GBC cell lines (Fig.1 and S1A). Given the previous reports that strong interchromosomal interactions are involved chromosomal translocation [
9], we compared these strong interchromosomal interactions with the corresponding translocation events identified using WGS data. We found 30 and 32 interchromosomal interactions from the 100 highest interchromosomal interactions, and they were ascribed to translocation events in GBC-SD and NOZ cell lines, respectively (bootstrap
P value < 0.001, Fig.1 and S1B, Table S2). To further explore the specific genes that may be influenced by aberrant chromosomal interactions, we rearranged genome translocation events using NeoLoopFinder software and found strong interactions within the
PTPRD locus on chromosome 9 joined with chromosome 2 and the
ZNF423 locus on chromosome 16 joined with chromosome 3 (Fig.1 and S1C, Table S3).
PTPRD is a member of the protein tyrosine phosphatase family and regulates a variety of cellular processes, including cancer progression [
39]. ZNF423 and its mouse ortholog Zfp423 are critical transcriptional modulators in neuroblastoma and leukemia [
40]. Altogether, we integrated high-resolution chromatin interaction maps and epigenetic data on gallbladder tissues and cell lines and observed the association of strong interchromosomal interactions with chromosomal translocations.
3.2 Compartments and TADs are reorganized in GBC
To determine whether changes in large-scale compartment organization occur in GBC, we quantified the A/B compartments using an eigenvector-based method at 500 kb resolution. The comparisons revealed a significant difference between GBC and normal controls in A/B compartmentalization. Although most of the genome regions retained their compartment identities, a significant fraction of the genome switched from compartment A to B (5%–12%) or from compartment B to A (6%–15%) between normal and cancer (Fig.2 and 2B, Table S4). In the region of chromosome 4, a 30 Mb region switched compartments between normal and cancer cells (Fig. S2A). Notably, when compartments A/B switched, the expression levels of genes decreased considerably relative to the genes in the A/A compartment in GBC-SD or NOZ cells versus HBECs (Fig.2 and S2B). Conversely, when compartment B/A switched, the gene expression levels increased. CTCF and H3K27ac signals exhibited similar changes with compartment switching in these cells (Fig.2, 2E, and S2C). In addition, a majority of CTCF peaks (~70%) were located in compartment A, and the signal of CTCF peaks in compartment A was higher than that in compartment B (Fig. S2D and S2E). However, DEGs were not always associated with compartment switching, with only 11%–27% of DEGs located in the compartment-switching region (Fig. S2F). To further elucidate the relationship between chromatin structural changes and gene expression changes in GBC, we analyzed the TAD organization in normal and cancer tissues and cell lines. Larger but slightly fewer TADs were identified in cancer lines than in HBECs. However, the opposite relationship was observed in cancer and normal tissues (Fig.2 and 2G, Table S5). Notably, the insulation scores for the TAD boundaries were higher in one GBC tissue and two GBC cell lines than in benign tissue and HBECs (Fig.2 and S2G, Table S6), which indicates weakened boundaries in GBC. To further interpret the functional implications of these alterations in insulation scores, we generated pile-up heatmaps of all TADs from cancer and normal cells by aligning TAD boundaries and then analyzed the differences in their inter/intra-TAD interactions. The results revealed that inter-TAD interactions increased but intra-TAD interactions decreased in GBC cell lines versus HBECs (Fig.2). Similarly, GBC-1 of two cancerous tissues showed higher inter-TAD interactions but lower intra-TAD interactions than benign tissues (Fig. S2H). Altogether, compartment and TAD organization exhibited considerable differences between GBC and normal samples. The TAD boundaries in cancer tissues and cell lines were weakened relative to their counterparts in normal tissues and cell lines, and this result may have profound implications for gene expressions.
3.3 Altered chromatin loops are associated with genes involved in cancer progression
Given that TADs are established by large-scale chromatin loops [
41,
42], we subsequently analyzed whether the different loop structures contribute to the alteration of TADs. The majority of loops over 50% were shared between HBECs and GBC cell lines, and 21%–42% of loops were unique to each cell line. Similarly, more than 50% of loops were common between tumor and normal tissues (Fig.3 and S3A, Table S7). In addition, as shown in Fig.3 and S3B, cancer-specific loops were significantly shorter than normal-specific or common loops in cancer cells/tissue versus normal cells/tissue. These loop features differed from the TAD size characteristics between tissues and cell lines. By analyzing the distribution of cancer-specific, normal-specific, and common loops with different lengths in tissues and cells, we discovered that a higher proportion of loops < 100 kb was located in cancer-specific loops than in other loops (Fig.3 and S3C). These data suggest that these short loops may be primarily associated with cancer progression. To assess whether cancer-specific short loops harbor distinct transcription factor binding motifs, we analyzed the enrichment of transcription factor motifs in the anchor locus of short loops and observed that six transcription factor motifs were enriched in these short loops compared with those in long loops in cancers versus normal controls, in which Sp1 transcription factor (
SP1) [
43] and activating transcription factor 1 (
ATF1) [
44] are known as the activated transcription factors (Fig. S3D). To further explore the potential biological process possibly ascribed to cancer-specific loops, we performed GO enrichment analysis of genes associated with different groups of loops. Biological processes related to development were enriched in normal-specific loops, whereas cancer-associated biological processes, including epithelial–mesenchymal transition (EMT), chromatin remodeling, and regulation of MAP kinase activity, were identified in cancer-specific loops [
45–
47] (Fig.3 and S3E, Table S8). SRY-box transcription factor 9 (
SOX9) was a typical gene example in the chromatin remodeling process, and it was located at the anchor of a loop unique to NOZ cells, rather than in HBECs (Fig.3). Altogether, we discovered that the distinct chromatin loop landscapes between GBC and normal controls may be associated with genotypic alterations in the occurrence and development of GBC.
3.4 CTCF-mediated loop formation
Next, we explored the underlying mechanism for the differences in loop structures. Given that CTCF blocks the cohesion-mediated loop extrusion process at chromatin loop anchors, we analyzed the CTCF ChIP-seq data to define the CTCF signature in the loops in GBC and normal cells. Consistent with previous studies [
19,
48], we found that the majority of loops were mediated by CTCF in HBEC and GBC cell lines, of which 83%–91% loops contained CTCF binding sites in at least one anchor, and 42%–60% loops had CTCF binding sites in both anchors (bootstrap
P value < 0.001, Fig.4). In addition, compared with the random region of the chromosome, CTCF signals were enriched at the loop anchors to similar extents in normal and cancer cells (Fig.4). We further classified the loops into four types based on the polarity of CTCF binding (forward-reverse, reverse-forward, forward-forward, and reverse-reverse) [
4,
49]. In GBC and HBEC cells, ~79% of loops were anchored by CTCF binding sites in a convergent orientation, ~20% of loops in the same orientation, and ~1% of loops in divergent direction (Fig.4). Interestingly, CTCF was more likely to bind to either the left side of the left anchor midpoint or the right side of the right anchor midpoint (Fig.4). In an example of chromatin loops on chr10, in addition to two loops commonly located in normal cells and NOZ cells, a specific loop connected by two convergent CTCF binding sites was identified only in NOZ cells (Fig.4). This cancer-specific loop, which was associated with a newly emerged CTCF peak in one of the loop anchors, suggests that changes in CTCF genome-wide occupancy may underlie the reorganization of chromatin loop landscapes in GBC cells. Collectively, our data suggest that the cancer-specific chromatin loops associated with the convergent CTCF binding sites may provide a structural basis for cancer-associated gene regulation.
3.5 CTCF-mediated E-P interaction loop regulates cancer-related gene expression
To identify CTCF-mediated loops as gene regulatory elements, we specifically focused on the relationship between CTCF, E-P loop interactions, and gene expression. We analyzed E-P interaction levels using H3K27ac peaks (excluding the +/− 2 kb region of the TSS) as the putative E region in HBECs and GBC-SD/NOZ cells. First, we analyzed the proportion of E-P loops in the total loops and observed that ~20% of loops harbored E and P elements in all cell lines (Fig. S4A). In addition, gene expression, CTCF occupancy, and H3K27ac signal were positively correlated with gene expression levels in cancer cells (Fig.5 and S4B). Furthermore, CTCF and CTCFL motifs were enriched in the E and P loci of E-P loops compared with other E and P in GBC cells and HBECs (Fig. S4C). In addition, the gene expression levels in cancer cells were positively correlated with E-P interaction frequencies (Fig.5 and S4D), which suggests that E-P interactions may play important roles in directing cancer-specific gene expression programs. To further define the cancer-related genes potentially controlled by E-P loops, we analyzed the gene expression profiles of the GBC cohort (nine GBC samples, GEO accession number: GSE76633), GBC-SD, and NOZ cells. As a result, cross-combined assessment of these gene data sets revealed seven altered genes, including
BMP4,
KRT19, and others (Fig.5 and 5D, Table S9).
FOXA2 gene was elevated in two GBC cell lines.
BMP4 is a member of the TGF-β superfamily that mediates cell proliferation and EMT [
50], and
KRT19 is upregulated and correlated with the overall survival in lung cancer patients [
51].
FOXA2 is elevated in patients with Barrett’s metaplasia, dysplasia, and adenocarcinoma [
52]. By integrating RNA-seq, ChIP-seq (CTCF/H3K27ac/H3K4me3), loop interactions, and Hi-C heatmap, we observed that all three genes (
BMP4,
KRT19, and
FOXA2) were associated with cancer-unique E-P loops in GBC-SD and NOZ cell lines (Fig.5 and S4E). Only the H3K27ac signal enriched at the end of the loops was annotated as the putative E of
BMP4,
KRT19,
or FOXA2. H3K27ac and H3K4me3 signals enriched at one loop end were defined as P. Moreover, a unique CTCF peak downstream of the right end was found in GBC-SD cells. When we knocked down
BMP4 levels using siRNA in two GBC cell lines, the proliferation of cancer cells was substantially inhibited (Fig.5 and 5G). These results suggest the need to develop a model in which the CTCF-mediated E-P interaction loop in cancer regulates gene expression for cancer cell development.
3.6 Deletion of the E or CTCF binding site downregulates gene expression
To further verify that BMP4, KRT19, and FOXA2 are regulated by cancer-specific E-P loop interactions, we deleted the putative E in GBC-SD and NOZ cells using the CRISPR/Cas9 system. The efficiency of E deletion via two sgRNAs was confirmed by Sanger sequencing and nucleic acid electrophoresis (Fig.6 and S5A). Compared with the wild-type (WT) GBC-SD and NOZ cells, the expressions of all three genes in edited cells decreased (Fig.6, 6C and S5B). To subsequently evaluate the cellular functional changes, we assessed cell proliferation in BMP4-associated putative E-edited and WT cells by CCK8 assay. Concordant with the knockdown of BMP4, the E-deleted cells also exhibited defects in cell proliferation. By contrast, the overexpression of BMP4 in the edited cells fully rescued the cell proliferation defects (Fig.6). Altogether, these data show that the E-P loop interaction can induce cancer-related gene expression and promote tumor cell proliferation in GBC. In addition, to further investigate whether BMP4 is regulated by CTCFs that mediate the formation of the E-P loop, we reconfirmed the specific CTCF binding downstream of the BMP4-associated E by ChIP‒qPCR and disrupted the CTCF binding site using CRISPR/Cas9 (Fig.7 and S5C). CTCF-edited cells with 91% indels in CTCF binding sites were obtained (Fig.7). The disruption of CTCF binding sites decreased the expression of BMP4 at the mRNA and protein levels (Fig.7 and 7D, respectively). Overall, our data support the notion that the CTCF-mediated E-P loop regulates the abnormal expression of the BMP4 gene and contributes to cancer progression.
4 Discussion
Structurally altered genomes can affect gene expression and functions, which leads to aberrant individual genotypes and phenotypes in a broad spectrum of human diseases [
53–
55]. Growing evidence has demonstrated that genomic organization is a multistep process involving CTCF-loop formation by distant locus contact in linear chromosomes [
19,
56–
58], high self-interaction of TADs composed of multiple loops [
41,
59], and large-scale A/B compartments compacted with interactive chromatin [
8,
60,
61]. Despite the previous intense focus on TAD reorganization, the structural and functional relationships among chromosomal compartments, CTCF loops, and gene expressions remain to be established in cancer. Hence, we systematically investigated the topological landscape of individual genomic components and functional changes in GBC tissues and cell lines by means of
in situ Hi-C, RNA-seq, and ChIP-seq methodologies. In the comparison between GBC tissues/cancer cell lines and benign tissues/normal epithelial cells, we observed that GBC harbored unique cancer loops that were associated with chromatin remodeling and EMT activation, whereas controls consisted of loops regulating normal development. In addition, gene expression was correlated with CTCF levels and E-P interactions, which highlights the importance of CTCF-mediated E-P loops for gene expression in cancer.
BMP4 gene expression was upregulated by the CTCF-mediated E-P loop in GBC cell lines, which was absent in normal controls. Furthermore, in GBC, increased inter-TAD interactions with weaker TAD boundaries may facilitate proximal TADs to create new loops. In addition, CTCF peaks were mostly distributed in compartment A, and increased CTCF levels and gene expressions were correlated with the switching of compartment B to A. Collectively, our findings suggest that reorganization of the 3D genome architecture at multiple levels contributes to genotypic and phenotypic abnormalities during the development of GBC.
In the analysis of inter- and intra-chromosomal interaction ratios, the ratios in our patient tissues were 47%–52% and 48%–53%, respectively, and the values in cell lines were 11%–17% and 83%–89%, which suggest that tissue heterogeneity may account for the strong interchromosomal interactions. In concert with our findings, a number of tissue and cell line studies exhibited the elevated intrachromosomal interaction ratio in tissues. The average inter/intrachromosomal interaction ratio is 37%/63% in cardiac tissue [
62] and 48%/52% in human lung tissues [
63]. However, the ratio reaches 14%/86% in pancreatic cell lines [
64] and 17%/83% in prostate cancer cell lines [
65]. Thus, all the studies should explore the molecular mechanisms underlying the different inter/intrachromosomal interaction ratios between tissues and cultured cells.
A recent study on colorectal cancer (CRC) reported that changes in the DNA loops between MYC and its E can promote the development of colon cancer [
66]. In addition, a CRC risk variant approximately 300 kb away within an E of MYC facilitated loop formation [
67]. In line with these findings, we defined the regulatory mechanisms of the cancer-related genes, namely,
BMP4,
KRT19, and
FOXA2, driven by an E-P loop in GBC-SD and NOZ cell lines. The disruption of individual putative E via the CRISPR/Cas9 system resulted in downregulation of gene expressions and inhibition of cell proliferation in
BMP4-E-edited cells. The individual anchors enriched in the binding of two convergent CTCF proteins at both ends of the loop secured loop integrity and function. This result was in line with a reported evidence indicating that 46% (4322/9948) of loops were mediated by two CTCF binding sites and 92% of motif pairs were convergent [
19]. Disruption of the specific CTCF motif around the
BMP4-related E in GBC-SD cells resulted in the downregulation of
BMP4 expression. However, only one CTCF binding site was identified at the cancer-specific E-P loop at the
BMP4 locus in NOZ cells. A recent study may explain this discrepancy, which revealed that P exhibited a higher CTCF binding affinity than E and that deletion of this CTCF peak had a role in determining finer-scale E-P choice and engagement [
68]. Another report suggested that DNA methylation of the CTCF site can alter CTCF binding and loop formation [
69]. In our system, gene modification may participate in this event but not the genetic mutation that has been excluded (Fig. S5D). Notably, we observed that CTCF peaks were not always at loop ends; instead, they were located on both sides of the ends. This phenomenon may be related to the limited resolution of the interaction matrix. In addition, CTCF was more likely to bind to either the left side of the left anchor midpoint or the right side of the right anchor midpoint. In summary, we have offered substantial pieces of evidence that reveal the gene expression mechanisms regulated by CTCF-mediated E-P interaction loops in GBC.
Our current study focused on GBC cell lines and two GBC patients and unveiled the topological and functional alterations of the cancer genome. Although the overall outcomes support our gene regulatory model driven by CTCF-mediated E-P loops, two human samples yielded several inconsistencies that limit our knowledge. The insulation score of boundaries changed in one GBC patient but not in another. Similarly, one patient possessed more cancer-specific short loops, but the other did not. Therefore, the enrollment of a substantially large number of GBC patients is essential to eliminate divergence and confirm our findings.
Given the rapid development of detection systems, such as in situ Hi-C and ChIP-seq, the 3D spatial genome organization with gene regulatory elements, including E, P, and insulators, in diseases has increasingly received remarkable attention, with the expectation of offering novel insights into the topological organization of individual genomic components. To date, a number of molecular mechanisms, such as the regulation of short loop formations, differences in inter/intra-TAD interactions, and compartment A/B switching, remain to be clarified. Nevertheless, the current study has demonstrated that topological changes in distinct genomic layers give rise to the oncogenic signature and drive cancer-specific gene expression in cancer development.