Introduction
Branchio-oto-renal syndrome or BOR (OMIM 113650, 602588, 601653) (
Kumar et al., 2000;
Kochhar et al., 2007) is a phenotypic disorder characterized by hearing loss, branchial arch and urinary tract malformations. This is known as Melnick-Fraser syndrome, an autosomal dominant developmental disorder which creates hearing impairment and renal failures in the affected individuals (
Fraser et al., 1978;
Fraser et al., 1980;
Fraser et al., 1983;
Abdelhak et al., 1997;
Kumar et al., 2000;
Kawakami et al., 2000;
Ruf et al., 2004;
Kochhar et al., 2007;
Christensen et al., 2008;
Patrick et al., 2009). The mutated and faulty genes end up in to faulty protein function during organogenesis, thus causing structural defects. The recent studies emerge that functionally defective missense mutations responsible in disease are located in
SIX1,
EYA1 and
EYA2 genes (
Kumar et al., 2000;
Ruf et al., 2004;
Kochhar et al., 2007).
We take a brief review of BOR syndrome and SIX1-EYA protein regulatory system. Roughly these genes were reported on broad regions of chromosome 8q
1 in 1994. The relationship of syndrome and
SIX1-EYA regulatory system was established around 2004 (
Ruf et al., 2004;
Christensen et al., 2008). Mutations in EYA proteins are the cause of 40 percent of BOR related cases; fact established around 2002. Later studies in 2004–2005 established that the SIX family is another contender. Kochhar et al. (
2007) gave a concise review on the pre-2006 stands up. The confirmatory remark on the SIX1-EYA-DNA complex disruption and BOR link was pronounced in the literature (
Ruf et al., 2004). Recently author Patrick et al. (
Patrick et al., 2009) studied all contemporary SIX1 mutations, their binding with EYA protein and DNA. The study demonstrated that only V17E mutation completely abolished the SIX1-EYA1 complex formation; whereas other mutations were able to form stable complexes. Even so, the remaining five were unable to keep the DNA complex and the V17E was able to bind the same. Concluding that, all of the
SIX1 mutants were defective in their transcriptional activity (
Patrick et al., 2009).
Let us zoom in to the protein−protein interaction. SIX family proteins are characterized by two major domains first is a DNA binding homeodomain (HD) and second is for protein interaction, SIX domain (SD) (
Kumar et al., 2000;
Ruf et al., 2004;
Kochhar et al., 2007;
Christensen et al., 2008;
Patrick et al., 2009). EYA protein belongs to the EYA gene family (transcriptional cofactors). They contain N-terminal transactivation domain and highly conserved EYA domain (ED) (
Patrick et al., 2009;
Kumar et al., 2009). ED shows phosphatase activity and involves in protein−protein interaction with SIX family homeo-domain proteins.
SIX1 protein attaches to EYA to form a transcriptional complex, where
SIX1 confers DNA binding and EYA confers transactivation activity respectively. SIX domain (SD) is important for interaction with EYA and a homeodomain (HD) is responsible for DNA binding (
Christensen et al., 2008;
Patrick et al., 2009). Thus overall a transcription activity is carried out on a specific DNA site. All of these records come from molecular biology laboratories, and there has been no computational studies performed in near instance.
Methodology
Following methodological algorithms and tools were employed.
Data set
Protein sequences corresponding to SIX1 and EYA2 protein were obtained from the Uniprot website (http://www.uniprot.org) (
Bairoch and Apweiler, 1996). Protein structures extracted from the Protein Data Bank; Protein IDs (for EYA2 protein) are as follows: 3HB0, 3HB1, 3GEB, 4EGC. All of these proteins except 4EGC were found to be homo-tetramers; 4EGC is a Hetero-dimer; confirmed by pairwise sequence alignment (
Larkin et al., 2007). For the analysis of SNPs on the human
SIX1 gene, we have retrieved the sequence data from the NCBI (DBSNP) and SWISS-Prot databases. SWISS-MODEL is used for modeling; it is a fully automated protein structure homology-modeling server. The amino acid sequence of a protein is submitted to construct a 3D model (
Schwede et al., 2003). Afterwards NOMADRef program (http://lorentz.immstr.pasteur.fr) has been employed to generate the energy minimized structure files of wild and mutant proteins.
SIFT
SIFT (Sorting Intolerant from Tolerant) is a sequence homology based tool, which presumes that important amino acids will be conserved in the protein family. Hence, changes at well-conserved positions tend to be deleterious. SIFT takes a query sequence and uses multiple alignment information to predict tolerated and deleterious substitutions for every position of the query sequence. Substitutions with normalized probabilities less than cutoff are predicted to be deleterious. The cutoff value in SIFT program is>0.05 below which mutations are predicted deleterious (
Ng and Henikoff, 2003).
PolyPhen2
PolyPhen2 tool uses eight sequence-based and three structure-based features to predict the damaging effects of missense mutations. It analyzes the profiles of homologous sequences and substitution site for a known protein's 3-dimensional structures. In turn they are the parameters taken into account by PolyPhen server to calculate the Position Specific Independent Counts (PSIC) score for each of the two variants, and then compute the PSIC score difference between them. Higher the PSIC score difference, higher is the functional impact a particular amino acid substitution (
Adzhubei et al., 2010).
I Mutant 2.0
I Mutant 2.0 is a support vector machine (SVM)-based tool for prediction of protein stability changes upon single point mutations, starting either from the protein structure or sequence. SVM was trained and tested on a data set derived from ProTherm. The predicted free-energy change value (∆∆G) is calculated from the unfolding Gibbs free energy value, of the mutated protein minus the native type (kcal/mol). Positive ∆∆G values mean that the mutated protein posses high stability and vice versa (
Capriotti et al., 2005).
SNPs&GO
SNPs&GO program predicts the disease related mutations from the input of protein sequences with a scoring accuracy and Matthews correlation coefficient of 82% and 0.63 respectively. It is a SVM-based classifier consisting of a single SVM that takes in an input protein sequence, profile and functional information. Of note, we have employed two different SVM based algorithms mainly to eliminate the false positive data in computational prediction. The server predicts if given variation can be classified disease-related or neutral (
Calabrese et al., 2009).
PHD-SNP
PHD-SNP is another SVM trained and tested on protein sequence and profile information. It is optimized to predict point mutation to classify as disease-related or as neutral polymorphism (
Capriotti et al., 2006,
2013).
PANTHER
PANTHER algorithm is based on a library of Hidden Markov Models (HMMs) obtained from the multiple sequence alignments of different protein families. PANTHER estimates the likelihood of a particular nsSNP causing a functional impact. The software computes the predicted free energy change value (∆∆G) similar to I Mutant. A positive ∆∆G value is a measure of protein stability (
Thomas et al., 2003a;
2003b).
IUPred server
The basis of predicting protein disorder is the difference in sequence characteristics between folded and disordered proteins. Typically, Interensically Unstructured Proteins (IUPs) exhibit a strong bias in their amino acid composition. It uses a quadratic expression in the amino acid composition. Applying this calculation for IUP sequences, their estimated energies are clearly shifted toward less favorable energies compared to globular proteins, enabling the prediction of protein disorder on this ground. This tested criterion works out in the algorithm and it provides a probability to amino acids individually (
Dosztányi et al., 2005a,
2005b;
Dosztányi et al., 2009;
Mészáros et al., 2009).
Molecular docking
The docking study was utilized in order to understand the binding affinity. Initially the complex structures were formed by using GRAMM-X docking algorithm (
Tovchigrechko and Vakser, 2006). This server is the prolongation of the original GRAMM Fast Fourier Transformation methodology by employing smoothed potentials, refinement stage, and knowledge-based scoring. The server was extensively tested by benchmarking, several months of public employment. Subsequently we have used DFIRE2 server (
Tovchigrechko and Vakser, 2005) for the calculation of free energy scores. The PyMOL view of the docked complex is shown in Fig. 1.
PIC server
Protein Interactions Calculator (PIC) is a server, which recognizes several kinds of molecular interactions at protein−protein interaction interface; such as disulphide bonds, hydrophobic interactions, ionic interactions, hydrogen bonds, aromatic- aromatic interactions, aromatic-sulfur interactions and cation- π interactions within a protein or between proteins in a complex. It reads a PDB file of docked protein as input (Tina et al., 2007).
Results
In the original data set obtained from Uniprot and NCBI SNP database we have achieved 27 natural variations out of which 23 were unique; They are as follows V17E, E28D, H44Q, H73P, A95V, R99C, V106G, R110Q, R110W, R112C, W122R, Y129C, T165I, V168I, N193I, P211L, E217K, D227E, D227Y, P249Q, P249L, T252A, L260Q and the wild type. We have performed computational analysis of deleterious missense mutations with 6 major servers; they are SIFT, Polyphen2, I Mutant2, PHDSNP, PANTHER and SNP&GO. We started with putting them on mutation analysis servers; this has reduced the number to one third of the original figure.
SNP data analysis results
SIFT gave 15 out 23 mutations; Similarly Polyphen2 yielded 13 out of 23 mutations; I Mutant2 predicted 14 probably damaging mutations (Table 1). Only 7 mutations are consistent with our strategy, that they should show apparent loss of function, we took them for further analysis. Then PHD-SNP, PANTHER and SNPs&GO servers calculated similar reliability index of 9 or above for all of the 7 mutations, with the minimum probability score>0.9 (Table 2). Conclusively the remaining mutations are: V17E, R99C, V106G, R110Q, R110W, R112C and W122R.
IUPred results
The original wild type protein and all the seven mutants were analyzed on ANCHOR and IUPred servers for their disorder (
Dosztányi et al., 2005a,
2005b;
Dosztányi et al., 2009;
Mészáros et al., 2009). The study reveals that a single amino acid change in a protein structure here has significant impact in protein disorder profile (Figs. 2–4). This elaborates probability of a domain being disordered, based on IUPred and ANCHOR algorithm. The difference is not visible at first sight, however when we overlap these mutant graphs with the wild type we can identify the dissimilarity. Just a single amino acid change from the original structure leads to the distortion of the disorder probability graph at various places. All these profiles show a commonality, as in all the mutation there is just a single change in amino acid, whereas the probability change is observed in nearer amino acids as well as that are far away from sight of change. Following inference can be taken out:
1) V17E shows no affect of disturbance around 120th amino acid and onwards, this is fairly visible; thus the DNA binding site does not affect much, we predict usual DNA binding.
2) R99C, V106G, R110Q, R110W may have similar in binding pattern as their IUPred disorder profile is distinctly similar. Regions of both binding sites are affected by disturbance.
3) R112C and W122R were found different among all, the comparative disorder spread was observed to be of long range in both. Regions of both binding sites are affected by disturbance.
4) Except V17E all of them show major fluctuation near around 100 to 140th amino acid, which is the starting of a putative DNA binding region near 124th amino acid.
Molecular docking
In the beginning we modeled SIX1 protein and EYA2 protein structure. All the modeling has been done using SWISS-MODEL service on the ExPASy server in automatic mode. 4EGC protein was employed as template to model SIX1 protein. Protein 3HB1 chain A was taken as template to model EYA2 protein structure. SIX1 mutations at V17E, R99C, V106G, R110Q, R110W, R112C and W122R positions were then modeled. 3D model were energy minimized using NOMADRef server. Consequently we had 9 structural files ready at hands; original wild type SIX1, EYA2 protein and 7 SIX1 mutant proteins. Further these files were uploaded to GRAMM-X docking server to proceed with docking. We had 8 combinations of docking in between EYA2 and SIX1 wild type and its 7 analog mutants. As a result a docked file has been generated in each experiment. The first result out of 10 output structures was taken in the interpretation. To gain understanding of the types of molecular interactions occurring at the interface we have used PIC server. The Table 3 briefly shows the number of ionic interactions taking place at the interface and the DEFIRE free energy score. To compute free energy scores DFIRE server has been employed. The results are shown in Table 4.
Discussion
Figure 5 depicts the relation between the numbers of ionic interactions and the normalized free energy. For the sake of scaling free energy score divided (normalized) by a factor of 100. The Figure clearly indicates there is one to one relationship between the two variables; therefore we deduce out reasoning that ionic interactions play a major role in protein−protein interaction. Looking at the Fig. 5 and Table 5, originally only one ionic interaction was present in native protein while docking (41 ASP of SIX1 and 210 LYS of EYA2). This alone interaction is visible in V17E and R110W mutant proteins. While in all the other mutant (docked complexes), we can witness five or six ionic interactions. All other mutations are interacting out of their usual places as compared with wild type interactions. Original SIX1-EYA2 complex has 4 hydrophobic interactions in the former region at 35-39 position that was seen absent from all mutant proteins except V17E (Fig. 6). Besides these, there is a significant increase in the number of hydrogen bonds in all SIX1 mutants except V17E and number of hydrogen bonds in the interface has almost doubled. V17E has been an exception, where hydrogen bonds and free energy has decreased. We bother mentioning one more interaction has been established, that was originally not present, its cation- π interaction. Two cation-π interactions have been found in every mutant except V17E. We can also conclude that the binding in all the mutant cases is stronger than its wild type relative.
Conclusion
The present study aims to report the SIX1 mutations with their disease expressions at structural level. Generally this has been observed that whenever we find a critical mutation in a protein, we shall find it in vicinity of the active sites and active site structure distort such that the binding force loosens up hence its binding energy decreases. In the present investigation of SIX1 protein particularly, we report following three revelations.
First, we state with evidence that the mutations far away from active site or binding regions can also affect the binding region. IUpred results clarify that impact of single point mutation can be of long range.
Secondly, we report that the free energy or the binding energy of docked complex which is supposed to decrease in a general trend is increasing in many of the mutations (Table 4); thus indicating stronger binding. This also endorses the results from wet laboratory observations empirically.
Thirdly, the mutant proteins bind EYA2 strangely away from its original active site but occupying the part of protein sites responsible for binding the DNA in native state.
The key finding of the work is that, the binding ability in the SIX1 mutant has been heightened; however, it may get the DNA binding side functionally affected. Our results are matching and vindicating wet laboratory work; consequently providing the proof at molecular level. Therefore we can predict that all these mutations may wind up as nonfunctional to activate a transcriptional regulator mechanism.
Higher Education Press and Springer-Verlag Berlin Heidelberg