1 BACKGROUND
With the boom in sequencing technology, the relationship between genes and phenotypes can be revealed through a variety of experimental techniques. CRISPR-mediated gene editing is currently the most convenient and rapid technique for observing phenotypic effects by knocking out (knocking down) or activating genes to regulate gene expression
[1,
2]. Guided by a single guide RNA (sgRNA), CRISPR-associated (Cas) nucleases bind target DNA that is complementary to the sgRNA and lies next to a protospacer adjacent motif (PAM)
[1]. At the targeted genomic locus, the Cas protein introduces a DNA double-strand break (DSB), and cellular DNA repair pathways then generate insertions or deletions
[3,
4]. Since the first discovery of the CRISPR/Cas editing system, the CRISPR toolbox has continued to expand for better application in various cell types and organisms
[5]. Cas9 is the major nuclease in CRISPR-based gene editing, and mutants of this enzyme offer additional application scenarios as well as improved editing efficiency
[3]. Cas9 nickase is a mutant form of Cas9 created by mutating one of the two nuclease domains, RuvC1 and HNH. This mutant produces a single-strand nick rather than a DSB at the target DNA locus. Using Cas9 nickase, the prime editor efficiently generates accurate base conversions, insertions and deletions without DSBs or exogenous DNA templates
[6]. Dead Cas9 (dCas9) carries mutations in both the RuvC1 and HNH nuclease domains of Cas9. As a result, dCas9 retains the ability to be guided to a genomic site by the sgRNA but loses its cleavage activity. By fusing dCas9 with a base modification enzyme that operates on single-stranded DNA, the base editor enables the precise substitution of a single base
[7]. In addition, CRISPR interference and activation editors can be generated for transcriptional downregulation and upregulation by integrating dCas9 and transcriptional regulators
[2]. Also, CRISPR off/on editing systems were developed to regulate targeted gene expression with long-term memory by adjusting DNA methylation and modifying histone proteins
[8]. Likewise, other Cas nuclease families offer additional application scenarios to facilitate their development in medicine and other fields
[9,
10].
The on-target specificity of all these CRISPR-based editing systems is mainly determined by the guiding component, guide RNA
[11]. Since a segment of 20 nucleotides can occur multiple times in a given genome, and some mismatches may be tolerated by the CRISPR/Cas system, off-target editing can occur
[12]. Meanwhile, sgRNAs targeting distinct locations of the same gene differ in editing efficiency; hence, maximizing on-target activity and minimizing off-target activity is essential for the application of the CRISPR/Cas system
[13]. One of the most accurate methods is to screen candidate sgRNAs experimentally one by one. However, each step is costly in terms of time, funding and labor. As the technology has been applied and developed, a variety of experimental data on CRISPR/Cas editing have become available, which can be used for
in silico analysis for sgRNA design
[11,
13]. Dozens of predictive tools have been devised in recent years, either as web servers or as stand-alone programs
[14–
17]. Web-based methods are user-friendly, especially for those without a deep understanding of computers. Even so, the number of predictive tools with distinct design purposes and frameworks can confuse users
[17–
20]. In addition, some tools no longer work because their developers have not continued to maintain and update them. Here, we characterized the currently available on-target design algorithms in web form, and developed a web-based selection tool, named Aid for Target Guide RNA design (
Aid-TG), to help users quickly select a system suitable for their purpose
[21].
2 sgRNA DESIGN FLOW
When conducting CRISPR-related experiments, there are several key points to note during sgRNA design (Fig.1). The first step is to query the database for information about the target gene. It is important to select the correct species and to determine the accession number of the target gene in the database, in order to avoid retrieving the wrong gene. Once the target gene information is acquired, further attention should be given to the upstream and downstream sequence context of the targeted locus, the number of transcripts, the number and length of exons, and the transcriptional start and stop sites. This information is then taken into account during sgRNA design. The next step is to pick appropriate target regions. For efficient editing of the targeted gene, the following tips should be considered. (1) Avoid selecting regions that overlap with other genes. (2) Cover as many transcripts as possible and avoid the promoter region, with the target site preferably in the first 50% of the coding region
[4]. (3) Target a functional domain of the protein. The third step is to perform sgRNA design, in which the PAM sequence, GC content, positional information, strand and potential off-target sites are considered
[13]. Following these steps, experiments are performed with the designed sgRNAs to evaluate their efficiency and to select one or more sgRNAs with maximum on-target efficiency and minimum off-target activity. It is worth noting that each step in the experimental screening of sgRNAs is time-consuming, costly and labor-intensive. These drawbacks prompted the emergence of software tools based on experimental data sets.
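As an illustration of the third step, the following minimal Python sketch enumerates candidate target sites in a short input sequence and reports the strand, position, PAM and GC content of each candidate. It assumes a 20-nt protospacer followed by an NGG PAM (the SpCas9 convention); the function names and output fields are hypothetical and do not correspond to any tool reviewed here. Searching a genome for potential off-target sites is deliberately left out.

```python
import re

# Translation table used to complement a DNA string.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_content(seq):
    """Return the fraction of G and C bases in seq (0.0 to 1.0)."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_candidates(seq, spacer_len=20):
    """List protospacers that are immediately followed by an NGG PAM.

    Both strands are scanned. 'start' is the 0-based position on the
    scanned strand, so for the '-' strand it refers to the
    reverse-complemented sequence.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))")
    candidates = []
    for strand, scanned in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(scanned):
            spacer, pam = match.group(1), match.group(2)
            candidates.append({
                "strand": strand,
                "start": match.start(),
                "spacer": spacer,
                "pam": pam,
                "gc": gc_content(spacer),
            })
    return candidates


if __name__ == "__main__":
    for hit in find_candidates("GACGTTACCGGATTCAGCTAAGG"):
        print(hit)
```

In practice, the candidates from such a scan would still need to be filtered by exon position and by off-target analysis, as described in the steps above.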
3 OVERVIEW OF IN SILICO sgRNA DESIGN TOOLS
With the investigation of CRISPR-mediated editing tools,
in silico design methods based on various frameworks and algorithms have been developed. Depending on different design principles, these sgRNA designers can be divided into three categories (Fig.1)
[12,
22]. (1) Sequence pairing-based (Tab.1): Cas protein binding is confined to a DNA target site adjacent to the PAM, which differs among species and nucleases. The better-performing candidate sgRNAs often have fewer mismatches. The type of promoter is also an influencing factor, as the U6 and T7 promoters require G and GG at the 5′ end of the sgRNA, respectively
[38,
39]. As indicated in previous studies, Cas-OFFinder is mainly designed for the prediction of potential off-target sites using Bowtie2, while flyCRISPR is designed for
Drosophila research with an alignment design purpose
[2,
26,
32]. (2) Feature scoring-based (Tab.2): editing activity has been found to vary across target loci, suggesting inherent differences in the sensitivity of certain targets to cleavage. This has led to a series of explorations to find key features that influence targeting effectiveness
[11,
48]. Examples include the percentage of GC in candidate sgRNA, position-dependent nucleotide features, position-independent nucleotide motifs and exon position
[13,
49,
50]. (3) Machine learning-based (Tab.3): the system can learn the weights of multiple features from an existing data set. However, the performance of sgRNA design tools based on different frameworks and algorithms varies considerably, especially on training sets from diverse sources
[12]. For example, sgRNA Scores v2.0 uses a support vector machine as its backend, trained on sequencing data from human HEK293T cells, while the developers of DeepCRISPR chose a convolutional neural network for both on-target and off-target editing prediction
[55]. In addition to the various algorithms on which they are based, the range of editing systems and the features considered contribute to the diversity of sgRNA design tools
[65]. pgRNAFinder is a web tool designed specifically for the guide RNAs of prime editing, while BE-Hive is a deep learning-based tool for sgRNA design in base editing
[54,
61]. In addition, these tools are available as web servers, stand-alone programs or both; the advantage of online tools is their ease of use for those who lack coding skills.
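To make the first two categories more concrete, the short Python sketch below shows the kind of quantities they rely on: a mismatch count between a guide and a candidate site (sequence pairing), and simple sequence features such as GC percentage and position-dependent nucleotides (feature scoring). The function names and the feature encoding are illustrative assumptions and are not taken from any of the tools listed in Tab.1–Tab.3; a machine learning-based tool would typically learn weights for features of this kind from experimental data.

```python
def count_mismatches(guide, site):
    """Count positions at which the guide and a same-length site differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(a != b for a, b in zip(guide.upper(), site.upper()))


def sequence_features(guide):
    """Return GC percentage plus one indicator per (position, base) pair.

    Positions are 1-based, counted from the 5' end of the guide.
    """
    guide = guide.upper()
    gc_percent = 100.0 * sum(base in "GC" for base in guide) / len(guide)
    features = {"gc_percent": gc_percent}
    for position, base in enumerate(guide, start=1):
        # Position-dependent nucleotide feature, e.g. "pos20_G".
        features[f"pos{position}_{base}"] = 1
    return features


if __name__ == "__main__":
    print(count_mismatches("GACGTTACCGGATTCAGCTA", "GACGTTACCGGATTCAGGTA"))
    print(sequence_features("GACGTTACCGGATTCAGCTA")["gc_percent"])
```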
3.1 Previous benchmarking
Nearly 60 predictive tools have been developed in recent years, and a number of them offer both website and stand-alone programs, which makes it challenging to select appropriate tools for guide RNA design
[4]. Thus, benchmarking the performance of existing tools and highlighting their applicability scenarios is important for their application
[22]. In an attempt to evaluate the performance of various tools, several benchmarking studies have been carried out with diverse methods. Hanna and Doench used the human gene
HPRT1 (hypoxanthine phosphoribosyltransferase 1) to compare the on-target and off-target prediction of sgRNAs by four methods, CHOPCHOP, CRISPick, E-CRISP and GUIDES, and found that these methods gave virtually no matching output
[4,
13,
18,
34,
66]. They also compared guides predicted by CHOPCHOP, E-CRISP and CRISPick for six protein-coding genes, and found that the sgRNA rankings predicted by these methods varied considerably. Another benchmarking study was conducted on 17 available
in silico tools for genome-wide off-target prediction
[22]. Through a fair comparison, they found CRISPRoff to provide the best performance and then developed a one-stop integrated genome-wide off-target cleavage search platform (iGWOS), which has demonstrated improved predictive performance
[67]. Another study evaluated nine typical targeting design tools using six data sets across five separate cell types
[68]. In the end, they recommended different CRISPR sgRNA design tools for diverse application scenarios. They also recommended that users choose E-CRISP and CRISPick first for sgRNA targeting design, as they are well balanced in terms of prediction accuracy, prediction coverage, tool usability and adaptability to different cell types
[13,
34]. These case studies highlight a common phenomenon: the predictive performance of these tools varies because of their divergent design principles.
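One simple way to quantify how far the output of two designers agrees is the overlap between their top-ranked guides. The Python sketch below computes a Jaccard index over the top k guides of two rankings. This is an illustrative measure chosen here for its simplicity; it is not claimed to be the metric used in the benchmarking studies cited above, and the guide identifiers in the example are placeholders.

```python
def top_k_overlap(ranking_a, ranking_b, k):
    """Jaccard index of the top-k guides from two ranked lists.

    Each ranking is a list of guide identifiers ordered from best to
    worst. Returns 0.0 when both top-k sets are empty.
    """
    top_a = set(ranking_a[:k])
    top_b = set(ranking_b[:k])
    union = top_a | top_b
    if not union:
        return 0.0
    return len(top_a & top_b) / len(union)


if __name__ == "__main__":
    tool_1 = ["g1", "g2", "g3", "g4"]  # placeholder guide identifiers
    tool_2 = ["g3", "g4", "g1", "g5"]
    for k in (2, 3):
        print(k, top_k_overlap(tool_1, tool_2, k))
```

A value close to 0 for small k would correspond to the situation described above, in which different designers return almost no matching guides for the same gene.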
3.2 Criteria for selecting a web server
Given these predictive tools with different purposes, a list of criteria is needed to help select a tool that matches a particular experiment. The first considerations are the range of genomes supported, the type of Cas effector and the function of the editing system. The majority of tools offer sgRNA design mainly for the human and mouse genomes, which significantly limits those intending to target other genomes
[18,
24,
69]. CRISPR-PLANT v2 is a better choice for targeting plant genomes, while flyCRISPR is similarly suitable for those targeting
Drosophila [31,
32].
Additionally, several tools support hundreds of species genomes, and some even allow users to provide their own genome
[70]. PAM recognition sites differ according to the type of Cas enzyme. Although the options provided by most tools for Cas9 or Cas12a and their mutants are likely to be sufficient for most users, more comprehensive PAM options will become more helpful as the CRISPR toolkit develops. The function of the editing system under consideration is fundamental to the choice of a guide RNA designer. For example, if the conversion of a specific base is needed, BE-Hive or BE-smart is recommended
[13,
54]. Also, the input and output provided by the website are important criteria. Some websites only support sequence input whereas others provide gene symbols and/or coordinates. The major output of these websites is often a table with the corresponding analysis values but some offer additional visualization of the results that may be more intuitively interpreted
[19,
41]. Some users are more interested in machine learning-based tools for prediction; consequently, the design principle is also worth considering.
4 PLATFORM FOR SELECTING THE OPTIMAL sgRNA DESIGN TOOL
Notably, constantly updating and maintaining a website can be onerous for developers, so certain tools are no longer maintained, probably because they have few users. For example, CrispRGold posted a notice that it went offline in March 2021 due to server-side issues
[71]. Given the advantages of web-based design tools described above and the wide range of tools currently available, we propose a solution to help users choose the optimal tool. First, we tested almost all web servers for sgRNA design that could be found at the time, mainly by using the test data given on each website, and excluded those that were not working properly. According to our previously summarized criteria for choosing an sgRNA designer, we carefully characterized the 43 web tools that remained after this selection. A detailed comparison is given in three tables (Tab.1–Tab.3), which provide a relatively comprehensive reference. As no one tool is a panacea, it is critical to fully consider the prerequisites and intended purpose before selecting an sgRNA designer. To obtain accurate results, users will sometimes need to combine the results of multiple tools, and our summary could be useful in such cases.
In the end, we provide a platform, Aid-TG, which integrates the features of 43 web servers, including species genomes, Cas effectors and functions, to help users find the optimal guide RNA design tool easily and quickly (Fig.2). The user-friendly interface of Aid-TG offers a simple selection of options with buttons and outputs the most recommended web server tools with their introduction and address. In brief, users can choose their target genome, PAM sequence, Cas enzyme and the function of the gene-editing system according to their experimental purpose from a series of options that integrate the main information of the 43 web designers. Another advantage of Aid-TG is that it covers a wide range of information, which is likely sufficient for most application scenarios, and thus largely avoids the hassle of searching for matching tools one by one. For example, a user who wants to design an sgRNA that targets the human Tyr gene for knockdown based on the Cas9 enzyme just needs to open Aid-TG and click the relevant selections, and a recommended designer with its address will then be output. Overall, Aid-TG provides a convenient web page for matching users with a suitable sgRNA designer.
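The selection logic described above can be pictured as a filter over a catalogue of designers. The following Python sketch is a simplified illustration under stated assumptions and is not the actual implementation of Aid-TG: the catalogue entries, their names and their URLs are hypothetical placeholders, and only three of the selection criteria (genome, Cas effector and function) are modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Designer:
    """One web-based sgRNA designer and the options it supports."""
    name: str
    url: str
    genomes: frozenset
    cas_effectors: frozenset
    functions: frozenset


# Hypothetical placeholder entries; they do not describe real tools.
CATALOGUE = [
    Designer("DesignerA", "https://example.org/designer-a",
             frozenset({"human", "mouse"}),
             frozenset({"Cas9"}),
             frozenset({"knockout", "knockdown"})),
    Designer("DesignerB", "https://example.org/designer-b",
             frozenset({"human"}),
             frozenset({"Cas9", "Cas12a"}),
             frozenset({"base editing"})),
    Designer("DesignerC", "https://example.org/designer-c",
             frozenset({"Drosophila"}),
             frozenset({"Cas9"}),
             frozenset({"knockout"})),
]


def recommend(catalogue, genome, cas, function):
    """Return the designers that support all three selected options."""
    return [
        designer for designer in catalogue
        if genome in designer.genomes
        and cas in designer.cas_effectors
        and function in designer.functions
    ]


if __name__ == "__main__":
    for designer in recommend(CATALOGUE, "human", "Cas9", "knockdown"):
        print(designer.name, designer.url)
```

Representing each criterion as a set keeps the matching rule a plain membership test, so further criteria such as PAM sequence or input type could be added as extra fields in the same way.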
5 CONCLUSIONS
The CRISPR-mediated editing system is a powerful toolkit for gene engineering and has been applied to research in a number of areas including medicine, agriculture and basic life science. However, its on-target efficiency, which needs to be improved, and its potential off-target effects hinder application in the clinic. Choosing an appropriate sgRNA is one of the effective strategies to increase on-target efficiency while minimizing off-target effects. A large number of in silico designers have been developed based on various algorithms and frameworks, but their results and application scenarios vary dramatically, which makes it confusing for users to choose the optimal designer for their project. In this study, we provide an overview of the conventional design of sgRNAs and the major genres of in silico tools. We also summarized benchmarking studies of sgRNA designers and provided principles to follow for the selection of a guide RNA design tool. After testing 43 sgRNA design algorithms, we present here a table with key information on 43 web designers. We also developed a web server platform that lets users choose the optimal designer for their particular experiments in a simple and convenient way and provides helpful guidance for sgRNA design.
The Author(s) 2022. Published by Higher Education Press. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0)