INTRODUCTION
Derived from an adaptive bacterial immune system, the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) system is a versatile and powerful tool for mammalian genome modification. As an experimental tool, the system requires two components: a single guide RNA (sgRNA) that recognizes a specific genomic site through Watson-Crick base pairing, and a Cas9 nuclease that binds the sgRNA and generates a DNA double-strand break. Because the CRISPR/Cas9 system can be retargeted simply by altering the short sgRNA sequence, it is used for a wide range of applications, such as genome editing [1–3], transcriptional regulation [4–8] and interrogation of genomic function [9–18].
The CRISPR/Cas9 system can be applied in cell-based functional screens that interrogate hundreds of coding genes in parallel [9–14], using pooled sgRNA libraries that target the coding regions of genes. sgRNA-guided DNA cleavage often produces short insertions or deletions (indels) that cause frameshifts in coding genes, allowing gene function to be linked to phenotypes such as cell growth. Indels generated by non-homologous end joining (NHEJ) have been reported to substantially affect miRNA binding sites in 3′ UTRs [19]; however, indels may not be sufficient to produce loss-of-function phenotypes for large non-coding genomic elements, such as gene/miRNA clusters or long non-coding RNAs (lncRNAs), for which specific chromosomal deletions are more suitable. Although paired-sgRNAs have been used successfully to generate genomic deletions of lncRNAs [20–24], enhancers [25], microRNAs [26] and gene clusters [27], a systematic strategy for designing a CRISPR-based paired-sgRNA library for high-throughput screening of non-coding regions has rarely been described in detail.
In this study, we present designs for genomic deletion with paired-sgRNAs, provide methods to generate two types of pooled paired-sgRNA plasmids, and validate the efficiency of the pooled library. Because lncRNAs are abundant in vivo [28] and play diverse regulatory roles in chromosome silencing [29], chromatin modification and remodeling [30], transcriptional regulation [22, 23] and nuclear transport [31], we targeted highly expressed lncRNAs in non-small-cell lung cancer (NSCLC) for chromosomal deletion to demonstrate the utility of this approach.
RESULTS
Repurposing microarray data for lncRNA selection in NSCLC
Inspired by a previous study in which cancer type-specific dysregulation of lncRNAs was identified through integrative analyses [32], we searched for lncRNAs overexpressed in NSCLC. lncRNA expression profiles can be extracted from microarray data by reannotating the probes, even though lncRNAs were not the intended targets of these arrays [33–37]. We therefore retrieved 27 microarray datasets generated from H1650, a cell line commonly used in NSCLC research, from the Gene Expression Omnibus database (Table 1).
To repurpose these array-based data, we designed a method to reannotate the probes of Affymetrix Human Genome Arrays, excluding low-quality and ambiguous probes and retaining lncRNA transcripts with at least 3 matched probes (Fig. 1A). We then developed a pipeline to derive lncRNA expression, which preprocessed the raw microarray data with two different methods, MAS5.0 and GCRMA, and summarized the expression values of uniquely mapped lncRNAs according to the reannotation file. After ranking the lncRNAs by expression value in descending order, we selected the top 600 highly expressed lncRNAs shared by both preprocessing methods (Fig. 1B, Supplementary Table S1). In addition, we chose 18 well-characterized lncRNAs associated with the occurrence and development of NSCLC as positive controls [38–57] (Supplementary Table S1).
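The reannotation and selection steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the probe-to-transcript mapping is taken as already computed (with low-quality probes already removed), per-transcript expression is summarized as the median of its probes, and "top 600 overlapped" is read as the intersection of the top-N lists derived from MAS5.0 and GCRMA values. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import median

def reannotate(probe_to_transcripts, min_probes=3):
    """Keep probes mapping to exactly one lncRNA transcript, then keep
    transcripts supported by at least `min_probes` such probes."""
    by_transcript = defaultdict(list)
    for probe, transcripts in probe_to_transcripts.items():
        if len(transcripts) == 1:  # drop ambiguous (multi-mapping) probes
            by_transcript[next(iter(transcripts))].append(probe)
    return {t: ps for t, ps in by_transcript.items() if len(ps) >= min_probes}

def summarize(probe_values, annotation):
    """Per-transcript expression as the median of its probes (assumed summary)."""
    return {t: median(probe_values[p] for p in ps) for t, ps in annotation.items()}

def top_n(expr, n):
    """The n transcripts with the highest expression values."""
    return set(sorted(expr, key=expr.get, reverse=True)[:n])

def shared_top(expr_mas5, expr_gcrma, n):
    """lncRNAs ranked in the top n by both preprocessing methods."""
    return top_n(expr_mas5, n) & top_n(expr_gcrma, n)
```

With n = 600 this returns the lncRNAs ranked highly by both methods; the text does not say whether N was fixed or increased until 600 shared lncRNAs were obtained.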
Design of paired-sgRNAs for lncRNA deletion
We designed two schemes for lncRNA deletion to disrupt the function of the selected lncRNAs: one targeting the promoter plus the lncRNA gene body (pg-type) and the other targeting only the gene body (gg-type) (Fig. 1C). We defined the 300 bp upstream of each transcription start site (TSS) as the putative promoter, because this region covers the core and proximal promoters, which contain many transcription factor binding sites mediating basal transcription and transcriptional activation or repression, respectively [58–61]. After defining the target regions, we identified all candidate sgRNAs adjacent to an NGG protospacer adjacent motif (PAM) using the online tool CRISPR-ERA [62].
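The two deletion schemes differ only in the interval they aim to remove. A minimal sketch, assuming 0-based half-open transcript coordinates and a hypothetical `Transcript` record, shows how the 300-bp putative promoter is added on the upstream side of the TSS, which side depends on the strand:

```python
from typing import NamedTuple

class Transcript(NamedTuple):
    start: int   # 0-based, half-open genomic interval of the transcript (assumed)
    end: int
    strand: str  # "+" or "-"

PROMOTER_BP = 300  # putative promoter length upstream of the TSS (from the text)

def target_window(tx: Transcript, scheme: str) -> tuple[int, int]:
    """Interval targeted for deletion: 'gg' = gene body, 'pg' = promoter + gene body."""
    if scheme == "gg":
        return tx.start, tx.end
    if scheme == "pg":
        if tx.strand == "+":  # TSS at tx.start, so upstream lies to the left
            return max(0, tx.start - PROMOTER_BP), tx.end
        return tx.start, tx.end + PROMOTER_BP  # minus strand: TSS at tx.end
    raise ValueError(f"unknown scheme: {scheme}")
```

Clamping at 0 only guards against transcripts near the start of a contig; real pipelines would also clamp at the chromosome end.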
We retained sgRNAs only if (i) their sequences mapped to the intended loci with up to two mismatches, (ii) the sum of the efficacy score (E-score) and specificity score (S-score) was greater than 0, to ensure cutting efficiency, and (iii) they did not contain a UUUU/TTTT stretch, which acts as a termination signal for the Pol III promoters (U6 and H1) used to express the sgRNAs.
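Candidate sgRNAs in this study came from CRISPR-ERA, whose E- and S-scores are not reproduced here. The sketch below only illustrates the two mechanical parts of this step: locating 20-nt protospacers adjacent to an NGG PAM on either strand of a target region, and applying the three retention rules to precomputed records. The record keys are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_protospacers(seq: str, spacer_len: int = 20):
    """Yield (spacer, strand, pam_start) for every spacer followed by an NGG PAM.

    A zero-width lookahead lets overlapping sites be reported. pam_start is
    given in 0-based forward-strand coordinates of `seq`.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % spacer_len)
    for m in pattern.finditer(seq):
        yield m.group(1), "+", m.start() + spacer_len
    rc = revcomp(seq)
    for m in pattern.finditer(rc):
        pam_rc = m.start() + spacer_len                 # PAM start on the reverse complement
        yield m.group(1), "-", len(seq) - (pam_rc + 3)  # same PAM in forward coordinates

def passes_filters(cand: dict) -> bool:
    """Retention rules (i)-(iii) applied to a precomputed, hypothetical record.

    Keys: 'spacer', 'maps_to_intended_locus' (result of the up-to-two-mismatch
    mapping check), 'e_score' and 's_score' (as reported by CRISPR-ERA).
    """
    return (cand["maps_to_intended_locus"]
            and cand["e_score"] + cand["s_score"] > 0
            and "TTTT" not in cand["spacer"].upper())
```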
We ranked all candidate sgRNAs by their scores (the sum of E-score and S-score), enumerated all possible sgRNA pairs and scored each pair by the product of its two sgRNA scores. From the highest-scoring pairs, we chose approximately 10 pairs per lncRNA for each deletion scheme, subject to the following requirements: (i) the distance between the two cleavage sites was between 500 bp and 10 kb, (ii) no sgRNA was used in more than one pair and (iii) the deleted region did not overlap other genes. This distance range was chosen to optimize deletion efficiency, which drops rapidly for deletions larger than 10 kb [63]. For lncRNAs with too few eligible pairs, we obtained additional pairs by reusing high-scoring sgRNAs. In total, we obtained 5253 pg-type and 5420 gg-type paired-sgRNAs (Supplementary Table S1).
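The pair-selection rules can be expressed as a greedy procedure: score every pair by the product of its two sgRNA scores, discard pairs whose cut-site distance falls outside 500 bp to 10 kb or whose deletion overlaps another gene, then take the best pairs without reusing an sgRNA, falling back to reuse only if fewer than the target number result. This is a sketch of one plausible implementation; the input tuples and the greedy ordering are assumptions, not the authors' code.

```python
from itertools import combinations

def select_pairs(sgrnas, other_genes, n_pairs=10, min_offset=500, max_offset=10_000):
    """Greedy paired-sgRNA selection for one lncRNA and one deletion scheme.

    sgrnas: list of (name, cut_position, score), score = E-score + S-score
            (> 0 after filtering, so products rank pairs consistently).
    other_genes: list of (start, end) half-open intervals a deletion must avoid.
    """
    pairs = []
    for a, b in combinations(sgrnas, 2):
        lo, hi = sorted((a[1], b[1]))
        if not (min_offset <= hi - lo <= max_offset):
            continue  # cleavage offset outside the allowed range
        if any(s < hi and lo < e for s, e in other_genes):
            continue  # deletion would overlap another gene
        pairs.append((a[2] * b[2], a[0], b[0]))
    pairs.sort(reverse=True)  # highest score product first

    chosen, used = [], set()
    for score, x, y in pairs:  # first pass: each sgRNA used at most once
        if len(chosen) == n_pairs:
            break
        if x not in used and y not in used:
            chosen.append((x, y, score))
            used.update((x, y))
    for score, x, y in pairs:  # fallback: allow reuse of high-scoring sgRNAs
        if len(chosen) == n_pairs:
            break
        if (x, y, score) not in chosen:
            chosen.append((x, y, score))
    return chosen
```

Because reuse happens only in the second pass, pairs with distinct sgRNAs are always preferred, which mirrors the order of the rules described above.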
For the 18 well-studied lncRNAs, we designed 10 paired-sgRNAs each using the gg-type scheme (targeting only the gene body) to serve as positive controls in subsequent functional screens. Additionally, we designed 45 sgRNAs that do not target any locus in the human genome; randomly combining these sgRNAs in pairs generated 2025 paired-sgRNAs as negative controls. In total, the library contained 12,878 sgRNA pairs.
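The library totals can be checked from the numbers given above. The 2025 negative controls correspond to 45 × 45, which assumes that every non-targeting sgRNA in the first position can be combined with every one in the second position, including itself:

```python
# Library composition; all counts are taken from the text.
PG_PAIRS = 5253                # promoter + gene body deletions
GG_PAIRS = 5420                # gene body deletions
POSITIVE = 18 * 10             # 18 control lncRNAs x 10 pairs each
NON_TARGETING = 45
NEGATIVE = NON_TARGETING ** 2  # all ordered combinations, self-pairs included

total = PG_PAIRS + GG_PAIRS + POSITIVE + NEGATIVE
print(POSITIVE, NEGATIVE, total)  # 180 2025 12878
```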
Construction of a paired-sgRNA library
To facilitate pooled functional screening, we cloned two distinct sgRNAs into a single lentiviral vector. For the selected and positive-control lncRNAs, library construction comprised three cloning steps (Fig. 2A). First, the synthesized oligonucleotides, each containing a pair of sgRNA sequences separated by two Esp3I restriction sites, were inserted into donor vectors carrying the human U6 promoter. Next, fragments containing the scaffold for the first sgRNA and the H1 promoter were cloned between the two sgRNAs by the Golden Gate cloning method, producing intermediate vectors. Finally, the chloramphenicol resistance marker was excised by Cre/loxP recombination. The pooled paired-sgRNAs for these lncRNAs were divided into nine subpools containing 1356, 1356, 1352, 1356, 1319, 1311, 1314, 1309 and 180 pairs, respectively.
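As a consistency check, the nine subpool sizes listed above sum to the number of pg-type, gg-type and positive-control pairs designed in the previous section:

```python
# Subpool sizes for the selected and positive-control lncRNAs (from the text).
SUBPOOLS = [1356, 1356, 1352, 1356, 1319, 1311, 1314, 1309, 180]

nine_subpool_total = sum(SUBPOOLS)
designed = 5253 + 5420 + 18 * 10  # pg-type + gg-type + positive controls
print(nine_subpool_total, nine_subpool_total == designed)  # 10853 True
```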
A two-step cloning method was used to construct the negative-control paired-sgRNAs (Fig. 2B). The synthesized 24-bp targeting oligonucleotides were annealed and inserted into donor vectors at BsaI sites by conventional ligation, and the two resulting sgRNA cassettes were then ligated in tandem into the final vector by Golden Gate cloning, forming the tenth subpool.
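The 24-nt oligonucleotides for the negative controls can be read as a 20-nt spacer plus a 4-nt overhang for ligation into the BsaI-cut donor vector. The sketch below builds such a top/bottom pair. The overhangs used here (CACC and AAAC) are placeholders borrowed from a common U6 cloning convention and are an assumption, since the text does not give the actual BsaI sticky ends of the donor vectors.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def annealing_oligos(spacer: str, top_overhang: str = "CACC", bottom_overhang: str = "AAAC"):
    """Return (top, bottom) 24-nt oligos for a 20-nt spacer.

    After annealing, the spacer pairs with its reverse complement and each
    strand leaves a 4-nt 5' overhang. The default overhangs are placeholders
    and must match the sticky ends of the actual BsaI-digested vector.
    """
    if len(spacer) != 20:
        raise ValueError("expected a 20-nt spacer")
    top = top_overhang + spacer.upper()
    bottom = bottom_overhang + revcomp(spacer)
    return top, bottom
```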
Quality analyses of pooled paired-sgRNA library
We digested the pooled plasmids from each subpool with XhoI. All 10 subpools yielded three bands of the expected sizes (Fig. 3A), confirming that the plasmids were correctly cloned. Further, we picked approximately 20 bacterial colonies from each subpool for Sanger sequencing, and approximately 80% of the colonies mapped to the designed library (Table 2). We therefore combined the 10 subpools in proportion to the number of pairs in each to obtain the final pooled library. The 576-bp fragments harboring the paired-sgRNA sequences were PCR-amplified from the final pooled library and sequenced on a HiSeq 2500 in paired-end 250-bp (PE250) mode. Approximately 80% of the sequenced paired-sgRNAs mapped to the designed library (Fig. 3B), and their distribution was relatively even, with 73.01% of pairs covered within a 64-fold range of abundance (Fig. 3C).
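The text does not define how the 64-fold-range coverage was computed. One common way to express library uniformity, shown here as an assumption, is the fraction of designed pairs whose read counts fall within a 64-fold window centred (on a log scale) on the median count of detected pairs, with undetected pairs counted as zero:

```python
from statistics import median

def fraction_within_fold_range(counts, designed, fold=64.0):
    """Fraction of designed pairs whose read count lies within a `fold`-wide
    window centred on the median count of detected pairs (assumed metric).

    counts: dict pair_id -> read count; pairs absent from `counts` count as 0.
    designed: iterable of all designed pair ids.
    """
    designed = list(designed)
    detected = [counts[p] for p in designed if counts.get(p, 0) > 0]
    if not detected:
        return 0.0
    m = median(detected)
    half = fold ** 0.5  # a 64-fold window spans 8x below to 8x above the median
    lo, hi = m / half, m * half
    inside = sum(1 for p in designed if lo <= counts.get(p, 0) <= hi)
    return inside / len(designed)
```

Counting undetected pairs in the denominator makes the metric penalize both dropout and skew, which is why it is lower than the raw mapping rate.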
Validation of the library cleavage performance
To validate the chromosomal cleavage performance of the pooled library, we randomly selected 12 paired-sgRNAs, six from each of the two targeting schemes. The deletion sizes of the selected pairs ranged from 500 to 3000 bp, the range covering most pairs in our library. HEK293-FT cells were co-transfected with a Cas9 expression plasmid and a plasmid expressing either a targeting paired-sgRNA or, as a negative control, a paired-sgRNA that does not target any genomic locus. After 5 days of puromycin selection, cells were harvested and genomic DNA was extracted for deletion detection. PCR revealed variable deletion rates, ranging from 1.3% to 87.8% for the pg-type scheme (Fig. 4A) and from 10.8% to 78.6% for the gg-type scheme (Fig. 4B). Across three independent experiments, there was no significant difference between the two targeting schemes (Fig. 4C). Although CRISPR-ERA accounts for GC content, poly-thymidine stretches and specificity when selecting sgRNAs, the cutting efficiency of paired-sgRNAs might still be influenced by the deletion size and by the accessibility of the target sites. Within the 500–3000 bp range, larger deletions showed higher efficiency. Additionally, we noticed weak bands between the deletion and non-deletion bands for some lncRNAs (Fig. 4A, B), which we speculate could correspond to large indels ranging from about −200 to +500 bp [63].
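Because the PCR primers flank both cut sites, a deletion allele yields a product shorter than the wild-type amplicon by the deletion size. The method used to convert band intensities into the reported deletion rates is not stated; the sketch below assumes densitometry values normalized by product length, one common convention, and the function names are hypothetical. PCR also favors the shorter deletion product, so such estimates are semi-quantitative.

```python
def expected_bands(amplicon_bp: int, deletion_bp: int) -> tuple[int, int]:
    """(wild-type, deletion) product sizes when primers flank both cut sites."""
    if not 0 < deletion_bp < amplicon_bp:
        raise ValueError("deletion must be shorter than the amplicon")
    return amplicon_bp, amplicon_bp - deletion_bp

def deletion_rate(del_intensity: float, wt_intensity: float, del_bp: int, wt_bp: int) -> float:
    """Percent of alleles carrying the deletion, estimated from band intensities.

    Each intensity is divided by its product length so that values approximate
    molar amounts (intercalating-dye signal scales with DNA length); this
    normalization is an assumption, not the authors' stated method.
    """
    d = del_intensity / del_bp
    w = wt_intensity / wt_bp
    return 100.0 * d / (d + w)
```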
Because gg-type pairs showed more stable cleavage performance than pg-type pairs, we used three well-performing gg-type pairs targeting lncRNA-4, lncRNA-5 and lncRNA-6 to produce lentiviral particles and infect H1650-Cas9 cells (H1650 cells engineered to stably express Cas9). After puromycin selection and genomic DNA extraction, we again performed PCR for deletion detection. These three pairs showed deletion rates of 7%–21% (Fig. 4D); as in the transient transfections, the pair for lncRNA-6 was the most efficient and the other two performed similarly.
DISCUSSION
Because known protein-coding sequences make up only a minority of the human genome [64], functional characterization of non-coding genomic regions such as lncRNAs is needed. Here we described a method to design and construct a CRISPR-based paired-sgRNA library for genomic deletion. We detailed the selection of overexpressed lncRNAs in NSCLC, the criteria for paired-sgRNA design and the methods used to construct the final pooled library, whose cleavage efficiency was confirmed with 12 selected plasmids.
Our strategy has several limitations that could be addressed by further optimization. First, although we used two different promoters to express the sgRNAs to prevent recombination, not all designed pairs were recovered in our deep-sequencing results (Fig. 3B, C). This could be improved by careful design of the constructs and primers to lower recombination rates and increase sequencing quality [65]. Second, because the cleavage efficiency of paired-sgRNAs varied with deletion size (Fig. 4), the library could be optimized to achieve better cleavage performance for functional screening. Finally, because lncRNAs occupy diverse locations in the genome [66], genomic regions that overlap other functional elements should be excluded to avoid misleading results.
In conclusion, we have demonstrated a strategy to design and construct pooled paired-sgRNAs that generate genomic deletions in lncRNA regions, explored the relationship between deletion efficiency and deletion size, and observed better performance with larger deletions within the range of 500 to 3000 bp. This method should also be suitable for investigating other uncharacterized long non-coding genomic regions in mammalian cells in an efficient and cost-effective manner.
MATERIALS AND METHODS
Construction of paired-sgRNA subpools for lncRNA deletion
Chip synthesis of oligonucleotide sequences
Oligonucleotides, each containing a pair of sgRNA sequences separated by a short spacer harboring two Esp3I sites, were designed and synthesized on a chip. The 5′ and 3′ ends of each oligonucleotide carried a BsaI site and a flanking fragment used to distinguish the subpools. The schematic construct was fragment1-BsaI-sgRNA1-Esp3I-Esp3I-sgRNA2-BsaI-fragment2.
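Because both BsaI (recognition sequence GGTCTC) and Esp3I (CGTCTC) are used to release and assemble the oligonucleotides, a spacer carrying either recognition site on either strand would be cut internally during cloning. The sketch below screens candidate spacer pairs for such sites before they are placed into the layout above. Whether and how the authors performed this screen is not stated, and sites created at the junctions between parts are not checked here.

```python
BSAI = "GGTCTC"   # BsaI recognition sequence
ESP3I = "CGTCTC"  # Esp3I recognition sequence (BsmBI isoschizomer)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def internal_sites(seq: str) -> list[str]:
    """Names of the enzymes whose recognition site occurs in `seq` on either strand."""
    seq = seq.upper()
    return [name for name, site in (("BsaI", BSAI), ("Esp3I", ESP3I))
            if site in seq or revcomp(site) in seq]

def usable_pair(sg1: str, sg2: str) -> bool:
    """True if neither spacer of a pair would be cut internally during cloning."""
    return not internal_sites(sg1) and not internal_sites(sg2)
```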
Step one of construction
For the construction of each subpool, primers annealing to the BsaI sites and flanking fragments of the oligonucleotides were used to amplify 127-bp dsDNA products. The primers for the nine subpools were as follows:
Subpool-1: TGGTGATAGGTAAGGATGGC; CGGCTCAGTATTGCGATTAC
Subpool-2: TCGACACCACTATACACCAC; GGCCCGTGAGAGTATAAAGA
Subpool-3: CATGTAGTGCAGCCATTCTC; GGGCACAGCAATCAAAAGTA
Subpool-4: TCTAGGTTTCGGCTTCATGT; GGTGCATGGGAGGAACTATA
Subpool-5: ATACTGCTGGGCTGGATATC; TCCTGAGAGAATACGGATGC
Subpool-6: ACCCAAAGAACTCGATTCCT; GCTAAATGGAGTGAGGAGGT
Subpool-7: AGTCTTAGGCTTGGAGTGTC; GTAGGCTGAGTAGTGATCCC
Subpool-8: GCTCTCCGCTATCAGTAACA; GACGAAGTTCACTAGACCCA
Subpool-9: GCCTATCCTCTAGTTCTGCC; TCGAGTTAGATTGTCACCCC
The amplicons were gel-purified and ligated into pDonor 1 (U6-BsaI-ccdB-BsaI-sgRNA_scaffold-hEF1a-EGFP-2A-Puro) using the Golden Gate cloning method. To ensure no loss of representation, 10 parallel transformations were performed and plated onto 15-cm plates with ampicillin selection. The resulting intermediate vectors were named pDonor 2, with the layout U6-sgRNA1-Esp3I-Esp3I-sgRNA2-sgRNA_scaffold-hEF1a-EGFP-2A-Puro.
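The reason for pooling parallel transformations is colony coverage: each colony carries one library member, so enough colonies must be recovered for nearly every pair to appear at least once. Under the simplifying assumption that all members are equally abundant, the expected represented fraction follows the Poisson approximation 1 − exp(−colonies/library size). Colony counts were not reported, so this only illustrates the reasoning; the function names are hypothetical.

```python
import math

def expected_fraction_represented(n_colonies: int, library_size: int) -> float:
    """Expected fraction of members seen at least once, assuming each colony
    carries a uniformly random member (Poisson approximation)."""
    return 1.0 - math.exp(-n_colonies / library_size)

def colonies_needed(library_size: int, target_fraction: float = 0.99) -> int:
    """Smallest colony count whose expected represented fraction reaches the target."""
    return math.ceil(-library_size * math.log(1.0 - target_fraction))
```

For a subpool of 1356 pairs, `colonies_needed(1356)` gives roughly 4.6 colonies per pair for 99% expected representation; skewed libraries need more.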
Step two of construction
The gel-purified fragments generated by Esp3I digestion of pDonor 3 (Esp3I-sgRNA_scaffold-loxp-Chl-loxp-H1-Esp3I) were ligated into pDonor 2 using the Golden Gate cloning method. To ensure no loss of representation, 10 parallel transformations were performed and plated onto 15-cm plates with chloramphenicol and ampicillin selection. The resulting intermediate vectors were named pDonor 4, with the following configuration: U6-sgRNA1-sgRNA_scaffold-loxp-Chl-loxp-H1-sgRNA2-sgRNA_scaffold-hEF1a-EGFP-2A-Puro.
Step three of construction
The pDonor 4 plasmids were transformed into competent cells expressing Cre recombinase and plated onto 15-cm plates with ampicillin selection. To ensure no loss of representation, 10 parallel transformations were performed. In this step, Cre-mediated recombination between the two loxP sites deleted the chloramphenicol resistance cassette, generating the final constructs (U6-sgRNA1-sgRNA_scaffold-loxp-H1-sgRNA2-sgRNA_scaffold-hEF1a-EGFP-2A-Puro).
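The three cloning steps can be traced at the level of named elements, treating each Golden Gate reaction as replacing the span between two copies of a Type IIS site, and Cre recombination as collapsing two directly repeated loxP sites into one. This element-level model (not a sequence-level simulation, and the helper names are hypothetical) reproduces the intermediate and final layouts given in the text:

```python
def golden_gate(vector, site, insert):
    """Replace the span between the first two copies of `site` (sites included)
    with `insert`: an element-level stand-in for Type IIS digestion + ligation."""
    i = vector.index(site)
    j = vector.index(site, i + 1)
    return vector[:i] + insert + vector[j + 1:]

def cre_excise(vector, site="loxp"):
    """Recombination between two direct loxP repeats removes the interval
    between them and leaves a single loxP site."""
    i = vector.index(site)
    j = vector.index(site, i + 1)
    return vector[:i + 1] + vector[j + 1:]

def parse(layout):
    return layout.split("-")

pdonor1 = parse("U6-BsaI-ccdB-BsaI-sgRNA_scaffold-hEF1a-EGFP-2A-Puro")
oligo_insert = parse("sgRNA1-Esp3I-Esp3I-sgRNA2")
pdonor3_fragment = parse("Esp3I-sgRNA_scaffold-loxp-Chl-loxp-H1-Esp3I")[1:-1]

pdonor2 = golden_gate(pdonor1, "BsaI", oligo_insert)       # step one
pdonor4 = golden_gate(pdonor2, "Esp3I", pdonor3_fragment)  # step two
final = cre_excise(pdonor4)                                 # step three
print("-".join(final))
# U6-sgRNA1-sgRNA_scaffold-loxp-H1-sgRNA2-sgRNA_scaffold-hEF1a-EGFP-2A-Puro
```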
Construction of paired-sgRNA subpools for negative controls
Step one of construction
We purchased 45 pairs of complementary oligonucleotides synthesized in 96-well plates. Each pair of oligonucleotides was mixed and annealed. The annealed products were pooled in equal ratios and ligated separately into BsaI-digested pDonor 5 (Esp3I-U6-BsaI-sgRNA_scaffold-Esp3I) and pDonor 6 (Esp3I-H1-BsaI-sgRNA_scaffold-Esp3I). To ensure no loss of representation, 2 parallel transformations were performed and plated onto 15-cm plates with kanamycin selection. The resulting intermediate vectors were named pDonor 7 (Esp3I-U6-sgRNA-sgRNA_scaffold-Esp3I) and pDonor 8 (Esp3I-H1-sgRNA-sgRNA_scaffold-Esp3I).
Step two of construction
The pDonor 7 and pDonor 8 intermediate vectors were mixed in equal amounts, digested with Esp3I and inserted into pDonor 9 (Esp3I-ccdB-Esp3I-hEF1a-EGFP-2A-Puro) by the Golden Gate cloning method, generating tandem paired-sgRNAs. To ensure no loss of representation, two parallel transformations were performed and plated onto 15-cm plates with spectinomycin selection. The resulting randomly paired negative-control subpool had the plasmid structure U6-sgRNA1-sgRNA_scaffold-H1-sgRNA2-sgRNA_scaffold-hEF1a-EGFP-2A-Puro.
Transfection, lentivirus production and cell infection
HEK293-FT cells were seeded in four 6-well plates (1 mL per well at a density of 1 × 10⁶ cells/mL) the day before transfection. Each well was co-transfected with 1 µg of Cas9 plasmid and 1 µg of either a targeting paired-sgRNA plasmid or the negative-control paired-sgRNA plasmid using Lipofectamine 3000 according to the manufacturer’s protocol. Puromycin (3 µg/mL) was added 24 hours after transfection and maintained for 5 days, after which genomic DNA was harvested.
For lentivirus production, HEK293-FT cells were seeded in three 10-cm plates (1 mL at a density of 3 × 10⁶ cells/mL) the day before transfection. Each plate was transfected with 2.5 µg of the paired-sgRNA plasmid, 5 µg of pCMV-dR8.2-dvpr and 2.5 µg of pCMV-VSV-G using Lipofectamine 3000 according to the manufacturer’s protocol. Lentiviruses were harvested and filtered 48 hours later. H1650-Cas9 cells were seeded in 6-well plates (1 mL per well at a density of 1 × 10⁶ cells/mL) the day before infection and were infected the next day with lentiviruses in the presence of 8 µg/mL polybrene. Puromycin (1 µg/mL) was added two days after infection and maintained for 7 days, after which genomic DNA was harvested.
Validation of the library cleavage performance
Twelve plasmids were selected from the library to validate the cleavage efficiency of the paired-sgRNAs. The six lncRNA transcripts targeted by pg-type paired-sgRNAs were ENST00000442823, ENST00000438436, ENST00000425412, ENST00000445681, ENST00000500612 and ENST00000520348, and the six targeted by gg-type paired-sgRNAs were ENST00000317114, ENST00000448587, ENST00000439321, ENST00000428008, ENST00000458653 and ENST00000415590. After genomic DNA was harvested from cultured cells in both the transfection and infection experiments, PCR was performed to detect lncRNA deletions. PCR primers were designed upstream and downstream of the target loci; the 12 primer pairs were as follows:
pg-type:
LncRNA-1: 5′-CTGGAGCATAGTAAGTGCTG-3′, 5′-GTGAGGTAGGCTTTATGGC-3′;
LncRNA-2: 5′-CATAGAACATCCCGAACCC-3′, 5′-GTTCTTCGATTTCACAGAGG-3′;
LncRNA-3: 5′-CTGGTAGATCAGACGTCAC-3′, 5′-CCTTAGAGGCTTTCTCCGC-3′;
LncRNA-4: 5′-CATGTCTACTGATCGGAATG-3′, 5′-GCTCTCCTTAAACTCTGTGC-3′;
LncRNA-5: 5′-CATGACCCTATGTCAGGAG-3′, 5′-GCCTTGAACTCCTGGAATG-3′;
LncRNA-6: 5′-CTCGAACTCCTGACATCGG-3′, 5′-CAGCTGTCAGCCTCAATGAG-3′;
gg-type:
LncRNA-1: 5′-CACGGATGTAACCACAGCAC-3′, 5′-ACGCCTGCTTTCCAGATCC-3′;
LncRNA-2: 5′-ATCCGGATGCCTCGTCTTG-3′, 5′-TGTGGCTGTGGGACCTTAG-3′;
LncRNA-3: 5′-GGTTAGGCCCCTTGGAAG-3′, 5′-GTGGTTGAGAAGTGGAGCAC-3′;
LncRNA-4: 5′-CGGTTTGGTGCGTGTGAAGC-3′, 5′-CCCAACTTGGAAATGGGTC-3′;
LncRNA-5: 5′-GTTTCCGTTCCCCGCAGAC-3′, 5′-ACAGGCCAATGTCAGTCC-3′;
LncRNA-6: 5′-CTCACCACAGTGGGAAGTAC-3′, 5′-GCCTTGTTCAAAACTGGGC-3′.
Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature