INTRODUCTION
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) has been extensively used to determine the binding sites of chromatin associated proteins and the enrichments for specific histone modifications on a genome-wide scale [
1,
2]. One of the most important downstream analyses of ChIP-seq data is to identify genomic regions with a significant change in ChIP-seq signal across biological conditions [
3]. This analysis is a critical step towards understanding the mechanism regulating dynamic changes of gene expression during tissue development [
4–
6] and the onset of diseases [
7]. Specifically, identifying the genomic loci differentially marked by histone modifications across cell types has been widely used to search for cell type specific
cis-elements as well as the regulators associated with these elements [
4,
8,
9]. In addition, it has been revealed that lineage master regulators can cooperate with cohesin proteins and transcription factors in signaling pathways to modulate their chromatin occupancy and establish lineage specific gene expression programs [
9–
11].
In many previous studies, people simply used the overlap between the peaks identified from different ChIP-seq samples to define common and specific peaks [
12–
14]. However, it has been suggested that the cell type specific peaks defined by this approach may contain a considerable fraction of false positives, and a rigorous comparison based on statistical models specifically designed for this purpose is more recommended [
8,
15]. In particular, it usually gives more reliable results to perform ChIP-seq data analysis in a quantitative manner, especially for cross-condition comparisons [
8,
9,
16]. For example, several recent studies suggested to quantitatively combine the ChIP-seq signal intensity of a peak, sometimes called peak height, as well as the distance from this peak to a candidate target gene to represent its regulatory potential to the gene [
16–
18]. It has been shown that such a quantitative measure can integrate with other observations to better infer the functional impact of the protein’s chromatin binding being studied. Another example is that Shao
et al. performed a systematic comparison of ChIP-seq data between different cell types, and found that, for histone modifications like H3K4me3 and H3K27ac, quantitative changes of ChIP-seq signals strongly correlate with the expression changes of target genes as well as the binding of cell type-specific regulators [
8]. Following this direction, Xu
et al. suggested that, although in mammalian genomes distal enhancer elements often have clearly higher cell type specificity than proximal promoters, it is still important to use the quantitative changes of associated histone modifications to define a high-confidence set of cell type specific enhancers, especially when the difference between the cell types under comparison is mild [
9].
Despite the importance of differential binding analysis with ChIP-seq data and the increasing need of methods for this analysis, it is still computationally challenging to reliably assess the statistical significance of changes in signal intensity on a genome-wide scale, due to the high level of noise and variability intrinsic to ChIP-seq data. More specifically, ChIP-seq is a multi-step experiment where biases may be introduced at each step [
19,
20], leading to a generally limited data reproducibility [
21,
22]. Among others, the amount of input material, efficiency of antibody, sequencing quality and depth may vary considerably from an experiment to another [
19,
20]. As a result, ChIP-seq samples typically have quite different signal-to-noise ratios, especially for those generated from different batches and/or labs, which makes it extremely difficult to quantitatively compare the signal intensities between samples [
8].
In recent years, quite a number of computational tools have been developed to address the problem [
3,
15]. These tools take advantage of different statistical techniques and vary in the range of applicability. This review aims to summarize existing computational tools for differential binding analysis with ChIP-seq data, according to their scope of application as well as strategy of statistical modeling (Figures 1 and 2). For simplicity, we primarily focus on the comparison of ChIP-seq samples between two biological conditions, with or without replicates. Besides, we assume the sequencing reads have been appropriately mapped to a reference genome [
23].
STRATEGIES FOR DIFFERENTIAL BINDING ANALYSIS WITH ChIP-SEQ DATA
Peak calling for ChIP-seq data
Typically, a considerable proportion of the mapped reads of a ChIP-seq sample are dispersed throughout the genome, while the others cluster together constituting reads-enriched regions, termed peaks (Figure 1, top) [
24,
25]. To be noted, ChIP-seq reads falling outside of peak regions are predominately contributed by background noise or non-specific binding, while the peaks with significantly elevated ChIP-seq signal intensities generally represent stable binding sites of the protein being ChIPed or genomic regions heavily marked by a specific histone modification. A number of algorithms have been developed to identify significant peaks on a genome-wide scale [
25–
27]. Here, we provide a brief overview of the available tools for ChIP-seq peak calling. One of the reasons is that a large number of computational tools for differential binding analysis require users to provide a set of pre-defined peaks for the ChIP-seq samples under comparison and then focus on modeling the ChIP-Seq signals at peak regions [
8,
28,
29]. Another reason is that quite several peak calling programs claimed that they could also be applied to identify differential peaks between two ChIP-seq samples by taking one of them as input [
25,
26]. Thus, the basic concept and strategies of ChIP-seq peak calling can serve as a valuable reference for differential binding analysis.
To be noted, the characteristics of peaks, especially their size, can differ substantially depending on the protein or histone modification targeted in the experiment. For example, the majority of transcription factors and many histone modifications like H3K4me3 and H3K27ac tend to have narrow peaks, with a size ranging from several hundred to a few thousand base pairs [
25]. MACS has been widely used to perform peak calling on such ChIP-seq samples, especially those for transcription factors which are usually associated with sharp and isolated peaks [
24,
25]. For some other histone modifications such as H3K9me3 and H3K36me3, they tend to form broad genomic domains with diffusive ChIP-seq signals, which can span up to hundreds or even thousands of kilo base pairs [
30,
31]. There are computational tools that are specifically devised to handle such situations. For example, SICER is developed to identify large spatial clusters of ChIP-seq reads [
26], while RSEG utilizes a hidden Markov model (HMM) to detect broad epigenomic domains with consecutively elevated ChIP-seq signals [
32]. Notably, both MACS and SICER accept a treatment ChIP-seq sample and an optional input sample as the negative control for peak calling. The latter is highly recommended for a practical ChIP-seq experiment design and can be used to account for local biases resulting from read mappability, DNA repeats, local GC content and so on [
25].
Many studies choose to generate multiple ChIP-seq samples for the same biological condition, with the aim to assess the variability and reproducibility of ChIP-seq signals. To make a full use of the replication, two strategies are usually employed to integrate the ChIP-seq replicates and derive a single list of peak regions. One of them directly performs a joint analysis over the replicates, which has been shown to detect peak boundaries with high precision [
33]. The other strategy first calls peaks on each individual sample, and then uses measurements such as IDR (irreproducible discovery rate) to select the peak regions with high reproducibility across replicates [
21,
34].
Differential binding analysis based on pre-defined peaks
Since peak regions with significantly elevated ChIP-seq signals are often of the highest interest across the whole genome, especially for the factors with sharp binding peaks, a lot of computational tools choose to perform differential binding analysis only on the peaks identified from the ChIP-seq samples under comparison [
8,
28,
29,
35]. To exploit such tools, users typically need to start with peak calling on each ChIP-seq sample involved. But, it should be noted that peak calling on each individual sample is usually a fundamental step of ChIP-seq data analysis and the obtained peaks could also be used for other analyses [
9]. Thus, it will be easy to integrate the results of differential binding analysis with pre-defined peaks with the other analyses. In the start of a differential binding analysis with pre-defined peaks, usually the peak regions of all the samples under comparison are first merged into a consensus set of peaks, which defines the search space where differential ChIP-seq signals are expected to find (Figure 1, lower left). In principle, these peaks serve as the reference genes in a typical differential expression analysis with RNA-seq data [
36,
37]. Therefore, many solid statistical models initially developed for calling differentially expressed genes with RNA-seq data can then be adapted to ChIP-seq data [
38].
Biological replicates may not be available in practical studies. In the extreme case, only one ChIP-seq sample is available for each of two conditions. In this case, many peak calling tools such as MACS and SICER can also be used to identify differential peaks, by taking one of the two samples as treatment and the other as negative control [
3,
25]. Although these methods come up with a
P value to assess the statistical significance of ChIP-Seq signal change at each detected differential peak, it should be strongly emphasized that, without replication, there is no way in principle to estimate biological variation in the measured signal intensities and, hence, no meaningful inference regarding the population can be made [
39]. Therefore, any
P value deduced in this context has only exploratory value. On this account, it may make more sense to measure the practical significance of signal changes. MACS and SICER calculate a fold change of ChIP-seq signal intensity for each candidate differential peak by normalizing each sample on basis of its library size, which is often inappropriate for ChIP-seq data as different samples may have highly distinct signal-to-noise ratios. Previously, MAnorm proposed to normalize two ChIP-seq samples based on their common peak regions [
8]. The method introduces a hypothesis that, when two ChIP-seq samples share a significantly larger number of common peaks than expected by chance, the binding of the targeted protein in the experiment at these common peaks is very likely to be mediated by largely the same mechanism. Hence, no global binding changes should be expected at these peak regions. Based on this hypothesis, MAnorm utilizes the traditional M-A plot, in which the log2 fold ratios of signal intensities are plotted against the average log2-transformed intensities between two samples, and fits a linear model between the M and A values in their common peaks. Then, the linear model is used as a reference to correct the M and A values of all peak regions. Through that, the variation in signal-to-noise ratio across samples is largely moderated, leading to a more robust estimation of the fold change of ChIP-seq signal intensity [
3,
8]. Besides MAnorm, in the field of RNA-seq data analysis there are methods that propose to improve the estimation of fold change by taking into account the large variance of M values calculated from small read counts [
37,
40]. For example, DESeq applies a variance-stabilizing transformation on RNA-seq count data [
37], making the M value comparable between transcripts with different expression levels and, thus, providing a more reasonable ranking of differentially expressed genes. Another example is GFOLD, which shrinks the M values calculated from small read counts to 0 under a Bayesian framework [
40]. In principle, these methods can be easily adapted to the differential binding analysis with ChIP-seq data.
When ChIP-seq replicates are available, it becomes feasible to model the biological variation at each peak region besides the technical variation introduced during sample preparation and sequencing. This strategy is widely used in the computational tools developed for differential expression analysis of RNA-seq data such as edgeR and DESeq [
36,
37], which are inherited by DiffBind and DBChIP by adapting their statistical models to ChIP-seq data [
28,
38]. These tools introduce potential sources of bias like sequencing depth into the model and perform a normalization as well as a differential binding analysis simultaneously. For the majority of other methods, however, normalization between ChIP-seq samples is required prior to conducting a differential binding analysis (Figure 1, lower left). In principle, any normalization approach can be adapted to these methods. To normalize away the most concerning factor across samples, which is sequencing depth, some methods rely on the total or effective read counts [
35]. Such approaches tend to perform poorly when peak regions are highly heterogeneous across samples [
8]. Besides, peaks associated with very large read counts may considerably bias the normalization result. Normalization methods that avoid using total read counts include those implemented in THOR, MAnorm and DBChIP [
8,
15,
28]. Basically, these methods perform a normalization on basis of peak regions that are supposed to be invariant across samples, such as the promoter regions of house-keeping genes [
15] and the observed common peaks [
8]. In general, ChIP-seq signals in these regions are more reliable and stable than in the others, and the methods are therefore resistant to individual highly-represented peaks. Moreover, some computational tools can handle additional sources of bias by considering local GC-content, input subtraction and sequence mappability along the genome [
15,
28,
29,
32]. In spite of these bias correction and signal normalization procedures at the level of individual samples, ChIP-seq data may still be associated with serious batch effects, especially in the large-scale studies where plenty of samples are involved. COMBAT and ARSyN, which are initially developed to remove batch effects in microarray data [
41,
42], have been shown to work well with normalized sequencing data [
39].
The first computational tools for differential analysis with sequencing count data have used discrete distributions such as the Poisson and negative binomial [
28,
35,
37,
43]. The negative binomial distribution can be viewed as a gamma-Poisson hierarchy. From this perspective, it explicitly models the underlying biological variance, which is believed to be the primary cause of the observed over-dispersion in sequencing data, in addition to the technical variance expected from randomly sampling from a pool of molecules [
37]. On the other hand, though it follows the nature of count data, the use of discrete distributions is not a requisite for an accurate differential binding analysis. In principle, applying continuous distributions such as the normal distribution to sequencing data analysis is valid as long as the mean-variance relationship for counts is carefully modeled [
44]. voom applies a log-transformation to the normalized read counts while learning the global variance structure for transformed values [
44]. The method unlocks a repository of statistical methodologies originally devised for microarray data, and has been shown to work well compared with many approaches based on discrete distributions [
44,
45]. In either case, nearly all the methods have made an effort to reduce the uncertainty associated with peak-specific variance estimates, considering that a highly limited number of replicates is often the case. Among others, a widely adopted strategy is to borrow information between peaks [
36,
37,
44], given the parallel structure in a typical differential binding analysis in which the same model is fitted to each peak. For example, DESeq improves the variance estimates by fitting a mean-variance curve and, hence, sharing information between peaks with close signal levels [
37]. voom adapts count data to the empirical Bayes framework implemented in limma and integrates information from all peak regions into a common prior distribution of the variance associated with each peak [
44,
46].
One-step differential binding analysis without the requirement for pre-defined peaks
Sometimes users are mainly interested in the genomic regions with significant ChIP-seq signal changes across conditions. Thus, they may prefer to perform a one-step differential binding analysis without doing peak calling for each ChIP-seq sample in advance. Following this direction, an alternative strategy of differential binding analysis directly seeks for the changes in ChIP-seq signal intensity throughout the whole genome, without the need of calling peaks beforehand [
15,
32,
47–
50]. Approaches following this strategy address an obvious issue arising from using peak calling based methods, in which the search for differential binding sites is restricted to pre-defined peak regions and artefacts may be introduced when applying a certain cutoff to define peaks. These approaches can be further classified into two categories. One of them scans the whole genome with a sliding window and consecutively performs the same statistical test on the ChIP-seq signals at each window [
49,
50], where the window size is usually selected to match the typical size of a ChIP-seq signal enriched region. The other class takes advantage of more sophisticated segmentation techniques such as HMM [
15,
32,
47,
48], where the genome is fragmented into sequences of bins and a putative hidden state is then inferred for each bin to indicate whether it is associated with differential ChIP-seq signal (Figure 1, lower right). One of the reasons accounting for the superiority of using an HMM is that, for a target bin, it incorporates ChIP-seq signals lying in the vicinity to improve the inference made for the bin. These methods are therefore robust to the selection of bin size and can achieve a wide range of resolution in identifying differential binding sites.
A famous application of HMM in ChIP-seq data analysis is ChromHMM, which takes a set of ChIP-seq samples of multiple chromatin marks generated under the same cellular condition as input, and systematically detects their combinatorial binding patterns as representation of local chromatin states [
51,
52]. It has been applied to a large panel of human cell types to derive a systematic annotation of chromatin states along the whole genome for each of them [
53]. The method leverages the correlation between different marks and, thus, significantly improves the interpretation of observed ChIP-seq signals. However, employing ChromHMM requires the availability of multiple ChIP-seq samples for various marks in a single condition, which seriously limits its applicability. Other HMM based methods, such as THOR [
15], are specifically devised to call genomic regions of differential ChIP-seq signals between a pair of biological conditions. THOR accepts ChIP-seq samples for a single mark from two biological conditions, and encodes the significance as well as the direction of ChIP-seq signal changes into the underlying states of an HMM (Figure 1, lower right).
Despite the clear advantages of HMM based methods, they usually make a stronger assumption on the observed data than the methods focusing on peak regions. More specifically, most of these methods train an HMM with a very limited number of hidden states (typically 3 for a comparison between two conditions; Figure 1, lower right) to model the observed ChIP-seq signals [
15,
32,
48], which could be impractical for a real ChIP-seq dataset and may result in a loss of flexibility. In particular, compared with peak calling based methods, HMM based methods may be less sensitive to quantitative changes in ChIP-seq signal intensity between a pair of closely related conditions (e.g., in studies of personal epigenomes [
54]).
Practices for selecting the appropriate tools for a custom differential binding analysis
In Figure 2, we use a diagram to depict the major characteristics and applicability of the computational tools introduced in this review. The diagram also serves as a practical decision tree for researchers to choose proper methods depending on their own dataset. Besides, there are several points that we think are necessary to highlight.
Firstly, the computational tools based on pre-defined peaks typically focus on modeling the ChIP-seq signals at peak regions, and, thus, tend to be less sensitive to the variation in signal-to-noise ratio across samples compared to the methods involving background signals in modeling, e.g., the HMM based methods. Hence, for most transcription factors and the histone modifications associated with narrow peaks (e.g., H3K4me3 and H3K9/27ac), methods based on pre-defined peaks should take the priority. On the other hand, HMM based methods may better fit for analysis with the histone modifications typically constituting broad domains (e.g., H3K9me3 and H3K36me3), as the variations of local ChIP-seq signals for these marks may not be quite informative and the HMM based methods can borrow information from flanking genomic regions to help identifying large chromatin domains with a continuous change [
47]. Besides, some HMM based methods can also be utilized to detect subtle ChIP-seq signal changes within a broad domain, such as the partial losses/gains of histone modifications [
15,
48].
Secondly, for the peak calling procedure, JAMM is recommended in the cases where biological replicates are available [
33]. It can integrate information from replicates and determine peak boundaries with high precision. Particularly, it could resolve neighboring narrow peaks and, thus, increase the resolution of the downstream differential binding analysis.
Finally, tools such as DESeq and edgeR are originally designed for RNA-seq data, which are expected to have less variability between samples than ChIP-seq data. These methods believe sequencing depth is the only concerning factor that needs to be normalized between samples, and integrate the normalization procedure into the differential analysis [
36,
37]. In the case where this assumption does not hold (e.g., when the ChIP-seq samples under comparison are generated from different batches or labs), it is wiser to first extensively correct for confounding factors and normalize ChIP-seq signal intensities, prior to performing the differential binding analysis. The normalization approaches implemented in MAnorm and THOR are recommended, as they are robust to the variation of signal-to-noise ratios across samples [
8,
15]. For reference, Table 1 summarizes the main features and practical utility of each method shown in Figure 2.
OUTLOOK
ChIP-seq has become the standard technology for determining transcription factor binding sites and histone modification enrichments on a genome wide scale, and the tools for its differential binding analysis are continuing to evolve. Despite the importance of this analysis, the agreement between the results obtained by applying different tools could be surprisingly low [
3]. This observation strongly stresses the importance of choosing proper methods based on the specific experimental setting and application scenario. In addition, analytic challenges can emerge under certain contexts. Here we take two scenarios as examples to illustrate the point.
Methods for ChIP-seq data normalization are usually based on the genomic loci with signal levels that are expected to be invariant across samples [
8,
15], or the assumption that the majority of the loci being analyzed are not differentially bound [
28,
37]. Such methods, however, may not be appropriate for the case where a global change of chromatin binding takes place (e.g., when an enzyme catalyzing the histone modification under comparison is functionally depleted from the cells) [
55]. ChIP-seq data normalization in such cases is difficult to accomplish, due to the lack of “invariant” loci. To our knowledge, no computational tools are currently available to resolve this problem. Recently, spiking experiment was proposed to specifically deal with this problem [
55]. In the experimental procedure, a constant, low amount of chromatin sample from a foreign species is added to the chromatin samples of interest prior to the immunoprecipitation step. Then, this ‘‘spike’’ genome is used as an internal reference for adjusting ChIP-seq signal levels across samples.
A universal technical problem associated with analyzing count data is that the mathematical theory of discrete distributions (e.g., the Poisson and negative binomial distributions) is far less tractable than that of the normal distribution, which seriously limits the type of analysis that can be performed on sequencing data. For example, a large number of statistical methods based on the normal distribution have been developed to analyze intensity data from microarrays [
44], including those for detecting differential expressions [
46], modeling random effects [
56], testing gene sets [
57,
58] and so on. However, most of the tools developed for RNA-seq and ChIP-seq data aim at the differential analysis, and only a few of them can handle complicate experiment designs (see Table 1). To re-use the statistical models originally devised for microarray data analysis, voom is proposed to transform sequencing tag counts into the values showing a continuous manner that can be modeled by the normal distribution [
44]. These transformed values can then be input to the models for analyzing microarray signal intensities. The normalization and variance structure learning procedures implemented in voom, however, are specifically tailored for RNA-seq data, which are supposed to have less variability and lower noise level than ChIP-seq data. For future methodology studies, a similar method suited for ChIP-seq data is under expectation.
Higher Education Press and Springer-Verlag Berlin Heidelberg