Introduction
Niemann-Pick disease type C (NPC) (OMIM 257220) is an autosomal recessive genetic disorder that causes an abnormal accumulation of cholesterol and other lipids in many cell types (
Morris et al., 1999;
Beltroy et al., 2005). The incidence is estimated at 1:120,000 live births (
Garver and Heidenreich, 2002). NPC is caused by mutations in either the NPC1 gene (95% of families) or the NPC2 gene in humans, although the precise mechanisms of action of these proteins are still under investigation. The identification of mutations within the NPC1 gene is challenging. To date, 252 mutations of NPC1 and 18 mutations of NPC2 have been reported in the NPC database (http://npc.fzk.de/) (
Runz et al., 2008). The NPC1 gene, mapped to chromosome 18q11-q12, spans 56 kb and contains 25 exons (
Tamura et al., 2006;
Xiong et al., 2012). NPC1 is a multispan membrane protein that is typically associated with late endosomes or lysosomes (
Vanier and Suzuki, 1998), degradative organelles that hydrolyze the cholesteryl esters brought into the cell by lipoproteins (
Goldstein and Brown, 1992;
Garver and Heidenreich, 2002). NPC1 has a sterol-sensing transmembrane domain similar to that of endoplasmic reticulum proteins that respond to alterations in cellular cholesterol (
Carstea et al., 1997). The NPC1 protein assists the transbilayer transport of some hydrophobic molecules, but it does not appear to transport cholesterol directly
(Puri et al., 1999;
Liscum, 2000;
Ioannou, 2000;
Scott and Ioannou, 2004). Loss of functional NPC1 or NPC2 causes the accumulation of free cholesterol (FC) in endocytic organelles that have the characteristics of late endosomes and/or lysosomes. These abnormal organelles are referred to here as lysosome-like storage organelles (LSOs) (
Pipalia et al., 2006). The LSOs associated with NPC are quite similar to those associated with other hereditary glycosphingolipid storage disorders (often caused by the inability to metabolize a particular lipid). The storage organelles contain multilayered internal whorls of membrane bilayers, which contain cholesterol, sphingomyelin, and high amounts of bis-(monoacylglycero)-phosphate (BMP), also known as lyso-bisphosphatidic acid (LBPA) (
Kobayashi et al., 1999;
Sun et al., 2001). The clinical features of NPC are vertical supranuclear gaze palsy, cerebellar ataxia, dysarthria, dysphagia, progressive dementia, cataplexy, seizures and dystonia (
Garver et al., 2007,
2010), which severely affect the patient’s development and quality of life.
Computational prediction of disease-causing nsSNPs has become a prominent methodology. Several research articles report its effectiveness in identifying deleterious, disease-associated mutations, and thus in predicting pathogenic alleles from their functionally and structurally damaging properties (
Carvalho et al., 2007,
2009;
Goldgar et al., 2004;
Karchin, 2009;
Kumar and Purohit, 2012a, 2012b,
2012c;
Kumar et al., 2012,
2013a,
2013b;
Balu and Purohit, 2013;
Kamaraj and Purohit, 2013,
2014). An in-depth study of genetic mutations and of the molecular basis by which they invoke disease-related pathways has become an important aspect of genomic research (
Steward et al., 2003;
Mooney, 2005;
Ng and Henikoff, 2006;
Kumar et al., 2014). The future of SNP analysis lies largely in the development of personalized medicines that can aid the treatment of disorders induced by genomic variation.
The main objective of this study is to identify the most deleterious and disease-associated nsSNPs in the NPC1 gene and their structural effects at the molecular level. In this analysis we used the SIFT, PolyPhen 2.0, PANTHER, PhD-SNP, Pmut and MutPred tools to rank the most deleterious and disease-associated non-synonymous SNPs (nsSNPs) in the SNP data set obtained from the dbSNP database. The disease-associated mutations showed substantial structural damage, which is likely to affect the function of the NPC1 protein. Conformational changes in the 3D structure of a protein account for its time-dependent physiological affinities and for alterations in various biochemical pathways (
Purohit and Sethumadhavan, 2009;
Purohit et al., 2011a;
2011b;
Rajendran et al., 2012;
Balu et al., 2013; Keener et al., 2014;
Kumar and Purohit, 2014). I-TASSER (
Zhang, 2008), a threading-based approach, was used to model the native and mutant NPC1 protein structures. The PROCHECK (
Laskowski et al., 1996) and PROSA (
Wiederstein and Sippl, 2007) programs were applied to evaluate the models of the native and mutant NPC1 structures. We further applied quantitative assessment and flexibility analysis to observe the structural consequences of mutation in the NPC1 protein.
Materials and methods
Data set
The data on the human NPC1 gene were collected from OMIM (
Amberger et al., 2009) and Entrez Gene on the National Center for Biotechnology Information (NCBI) website. The SNP information for the NPC1 gene was obtained from the dbSNP database (http://www.ncbi.nlm.nih.gov/snp/) (
Sherry et al., 2001). The amino acid sequence of the NPC1 protein was retrieved from the UniProt database (UniProt ID: O15118).
Disease associated SNP prediction
A single nucleotide polymorphism occurring in a protein coding region may be deleterious and affect the 3D structure of the protein, which in turn may be associated with disease. Here we used SIFT (http://sift.jcvi.org/) (
Kumar et al., 2009), PolyPhen 2.0 (http://genetics.bwh.harvard.edu/pph2/) (
Adzhubei et al., 2010), PANTHER (http://www.pantherdb.org/) (
Thomas et al., 2003), PhD-SNP (http://snps.biofold.org/phd-snp/) (
Capriotti et al., 2006), Pmut (http://mmb2.pcb.ub.es:8080/PMut/) (
Ferrer-Costa et al., 2005) and MutPred (http://mutpred.mutdb.org/) (
Li et al., 2009) tools to examine the disease-associated nsSNPs occurring in the NPC1 protein coding region. SIFT uses a sequence homology-based approach to classify amino acid substitutions (
Kumar et al., 2009). A prediction score <0.05 is considered deleterious. PolyPhen 2.0 is based on a combination of sequence- and structure-based attributes and uses a naive Bayesian classifier to assess the impact of an amino acid substitution. The output levels "probably damaging" and "possibly damaging" were classified as functionally significant (score ≥0.5) and the "benign" level was classified as tolerated (score <0.5) (
Adzhubei et al., 2010). PANTHER is a protein family and subfamily database that estimates the frequency of occurrence of an amino acid at a particular position in evolutionarily related protein sequences. A threshold subPSEC score of -3 was assigned, at or below which predictions are considered deleterious (
Thomas et al., 2003). We retained the nsSNPs that were predicted to be deleterious or damaging by all three of these servers. We then used the PhD-SNP, Pmut and MutPred tools to examine the disease-associated nsSNPs. PhD-SNP is an SVM-based classifier trained on large amino acid polymorphism data sets using a supervised training algorithm (
Capriotti et al., 2006). It predicts whether a given amino acid substitution is disease-associated or neutral, along with a reliability index score (
Capriotti et al., 2006). Pmut is a neural network-based program trained on a large database of neutral and pathological mutations (
Ferrer-Costa et al., 2005). Pmut uses three kinds of parameters (mutation descriptors, solvent accessibility, and residue and sequence properties) to calculate a pathogenicity index ranging from 0 to 1 for each input mutation. Mutations with an index score >0.5 are predicted to be pathologically significant (
Ferrer-Costa et al., 2005). MutPred is a web-based tool used to predict the molecular cause of disease-related amino acid substitutions (
Li et al., 2009). It uses SIFT, PSI-BLAST and Pfam profiles along with structural disorder prediction algorithms, including TMHMM, MARCOIL, I-Mutant 2.0, B-factor prediction, and DisProt (
Li et al., 2009). Functional analysis includes the prediction of DNA-binding sites, catalytic domains, calmodulin-binding targets, and post-translational modification sites (
Li et al., 2009). Combining the scores of all three servers increases the confidence of the prediction, and we finally retained the mutations predicted to be disease-associated by all of them.
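The first filtering step described above can be illustrated with a minimal Python sketch. Only the cut-offs follow the text (SIFT score below 0.05, PolyPhen 2.0 score of 0.5 or higher as in the Results, PANTHER subPSEC of -3 or lower); the class, the function names and the example scores are hypothetical and are not taken from Table 1.

```python
from dataclasses import dataclass

SIFT_CUTOFF = 0.05      # SIFT score below this is treated as deleterious
POLYPHEN_CUTOFF = 0.5   # PolyPhen 2.0 score at or above this is treated as damaging
SUBPSEC_CUTOFF = -3.0   # PANTHER subPSEC at or below this is treated as deleterious


@dataclass(frozen=True)
class VariantScores:
    """Scores for one amino acid substitution (hypothetical container)."""
    name: str
    sift: float
    polyphen: float
    subpsec: float


def is_consensus_deleterious(v: VariantScores) -> bool:
    """True only if all three tools flag the substitution."""
    return (
        v.sift < SIFT_CUTOFF
        and v.polyphen >= POLYPHEN_CUTOFF
        and v.subpsec <= SUBPSEC_CUTOFF
    )


def filter_consensus(variants):
    """Return the names of variants flagged by SIFT, PolyPhen 2.0 and PANTHER."""
    return [v.name for v in variants if is_consensus_deleterious(v)]


# Illustrative values only; these are not scores from the study.
example = [
    VariantScores("variant_A", sift=0.00, polyphen=1.000, subpsec=-8.5),
    VariantScores("variant_B", sift=0.30, polyphen=0.950, subpsec=-4.1),
    VariantScores("variant_C", sift=0.01, polyphen=0.120, subpsec=-3.6),
]
print(filter_consensus(example))
```

Requiring agreement among all three tools trades sensitivity for specificity: a variant that a single tool scores as tolerated is dropped even if the other two flag it.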
Modeling of NPC1 protein
According to the annotated information available in UniProt entry O15118, the predicted deleterious mutation sites of the NPC1 protein lie in the region 920–1200. Hence, this protein segment, which consists of 281 amino acid residues, was submitted to the I-TASSER server for 3D structure prediction based on a threading approach (
Zhang, 2008). This program works by combining folds and secondary structure through profile-profile alignment threading techniques for non-aligned regions. For the submitted sequence, five 3D models were obtained and the best model was selected based on the lowest energy. The native structure was then mutated with the most deleterious substitutions predicted in this study. To build the mutant structures, we introduced the point mutations R1186C (arginine to cysteine), S940L (serine to leucine), R958Q (arginine to glutamine) and I1061T (isoleucine to threonine) into the native NPC1 protein using SPDB viewer (
Kaplan and Littlejohn, 2001). These structures were energetically optimized using the GROMACS package 4.5.5 (
Hess et al., 2008). During energy minimization both native and mutant structures were solvated in a cubic box of simple point charge (SPC) water molecules with a 10 Å margin. Initially the solvent molecules were relaxed while all solute atoms were harmonically restrained to their original positions with a force constant of 100 kcal/mol for 5000 steps. The whole molecular system was then subjected to energy minimization for 5000 iterations using the steepest descent algorithm with the all-atom OPLS force field.
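As a small worked check of the residue numbering used above, the following Python sketch parses the four mutation labels and maps each full-length NPC1 position onto the 920–1200 segment. It assumes, hypothetically, that the modeled segment is renumbered from 1; the function names are illustrative, and the actual numbering of the I-TASSER output is not stated in the text.

```python
import re

SEGMENT_START = 920   # first residue of the modeled region (from the text)
SEGMENT_END = 1200    # last residue of the modeled region (from the text)

# One-letter wild-type residue, position, one-letter mutant residue, e.g. "R1186C".
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label such as 'R1186C' into ('R', 1186, 'C')."""
    match = MUTATION_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a valid mutation label: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def segment_index(position):
    """Map a full-length position to a 1-based index within the segment."""
    if not SEGMENT_START <= position <= SEGMENT_END:
        raise ValueError(f"position {position} is outside the modeled segment")
    return position - SEGMENT_START + 1


# Inclusive range, so 1200 - 920 + 1 residues.
segment_length = SEGMENT_END - SEGMENT_START + 1
print("segment length:", segment_length)

for label in ("R1186C", "S940L", "R958Q", "I1061T"):
    wild_type, position, mutant = parse_mutation(label)
    print(label, wild_type, "->", mutant, "segment index", segment_index(position))
```

All four selected mutation sites fall inside the modeled segment, and the inclusive count reproduces the 281 residues quoted above.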
Quantitative assessment and flexibility analysis
VADAR (Volume Area Dihedral Angle Reporter) is a comprehensive web server for quantitative protein structure evaluation. It accepts the 3D coordinates of a protein as input and calculates key structural parameters both for individual residues and for the entire protein. These derived parameters can be used to rapidly identify both general and residue-specific problems within newly determined protein structures (
Willard et al., 2003). The VADAR web server is accessible at http://redpoll.pharmacy.ualberta.ca/vadar. The 3D coordinates of the native and mutant NPC1 structures were given as input to the server for quantitative analysis. Normal mode analysis (NMA) is a powerful tool for predicting the possible movements of a given macromolecule. It has been shown recently that half of the known protein movements can be modeled by using two low-frequency normal modes. NMA provides an alternative to molecular dynamics for the study of motions of macromolecules. A quantitative measure of the atomic motions in proteins can be obtained from the mean square fluctuations of the atoms relative to their average positions. Understanding the structural dynamics of proteins is essential for gaining greater insight into their biological functions (
Kumar and Purohit, 2013;
Kumar et al., 2013;
Purohit, 2014;
Rajendran and Sethumadhavan, 2014). Because protein flexibility is important for protein function and for rational drug design, the flexibility of individual amino acids is relevant to various types of interactions; it can be analyzed through B-factors, which are computed from the mean square displacement. We used WEBnm (http://www.bioinfo.no/tools/normalmodes) (
Hollup et al., 2005) to calculate the slowest modes of the native and mutant NPC1 proteins. The 3D coordinates of the modeled native and mutant NPC1 structures were given as input to the server.
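The relation between B-factors and mean square fluctuations mentioned above can be sketched in Python using the standard isotropic formula B = (8π²/3)⟨Δr²⟩. This is a generic illustration, not the WEBnm implementation, whose internal calculation is not described here; the function names and the toy coordinates are hypothetical.

```python
import math


def mean_square_fluctuation(positions):
    """Mean square deviation of one atom from its average position.

    positions: list of (x, y, z) tuples in angstroms, one per snapshot.
    """
    n = len(positions)
    mean = [sum(p[k] for p in positions) / n for k in range(3)]
    return sum(
        sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in positions
    ) / n


def b_factor_from_msf(msf):
    """Isotropic B-factor (in square angstroms) from a mean square fluctuation."""
    return (8.0 * math.pi ** 2 / 3.0) * msf


# Toy data: an atom oscillating by 1 angstrom around x = 1 (illustrative only).
snapshots = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
msf = mean_square_fluctuation(snapshots)
print(msf, b_factor_from_msf(msf))
```

A residue with a larger fluctuation therefore has a proportionally larger B-factor, which is why per-residue fluctuation profiles are commonly read as flexibility profiles.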
Results and discussion
Prediction of deleterious nsSNPs using the SIFT, PolyPhen 2.0 and PANTHER programs
SNP data for the NPC1 gene were retrieved from the dbSNP database for our investigation. The SNPs in regulatory regions and coding non-synonymous regions were selected for our study. In total, 103 nsSNPs were computationally analyzed to identify the deleterious and damaging nsSNPs in the NPC1 gene. The SIFT server was used to calculate the tolerance index of all 103 collected nsSNPs by evolutionary conservation analysis. A SIFT score of <0.05 was considered deleterious. Out of the 103 input polymorphisms, 35 mutations (Y634C, N490T, P471L, L737V, R1186H, R1272C, V780M, F842L, R1183H, A1018T, R1186C, V1141G, S652R, S940L, R161W, R404Q, S1169I, A851T, C113R, R958Q, L1213F, V706A, I1061T, A1054T, P1007A, Q775P, C177Y, P434S, S1200G, R978C, A1035V, Y1088C, N1156S, T1036M and T511M) were predicted to be deleterious with a tolerance index ≤0.05 (Table 1). Among these, 18 mutations (Y634C, P471L, R1186H, F842L, R1183H, R1186C, V1141G, S652R, R161W, R404Q, S1169I, C113R, R958Q, L1213F, I1061T, Q775P, S1200G, Y1088C, N1156S and T1036M) were reported to be highly deleterious with a SIFT score of 0.00 (Table 1). The PolyPhen 2.0 server was used to predict the possible impact of an amino acid substitution on the structure and function of the protein. Based on the PolyPhen score, 49 nsSNPs were found to be “damaging” (0.5 to 1.000) to protein structure and function and the remaining 53 nsSNPs were characterized as benign. Among these 49 damaging nsSNPs, 15 mutations (Y634C, P471L, R1186H, F842L, R1186C, R404Q, S1169I, C113R, R958Q, L1213F, I1061T, Q775P, Y1088C, N1156S and T1036M) were reported to be highly deleterious with a PolyPhen score of 1.000; all scores are listed in Table 1. A total of 34 mutations were identified as deleterious by both the SIFT and PolyPhen servers (Table 1), indicating good agreement between these two servers.
To further validate these results, we applied an HMM-based statistical prediction method to identify functionally significant point mutations using the PANTHER server. Mutations with a subPSEC score less than -3 have been reported to be probably deleterious. In our data set, 78 mutations with a subPSEC score less than or equal to -3 were characterized as deleterious. Two mutations, R958Q and I1061T, were predicted to be highly deleterious with subPSEC scores of -8.09327 and -8.53429, respectively. Because the result of PANTHER depends entirely on the MSA profile, priority was given to the SIFT and PolyPhen scores. We identified 32 mutations (Y634C, N490T, P471L, L737V, R1186H, R1272C, V780M, F842L, R1183H, A1018T, R1186C, V1141G, S652R, S940L, R404Q, S1169I, A851T, C113R, R958Q, L1213F, V706A, I1061T, A1054T, P1007A, Q775P, S1200G, R978C, A1035V, Y1088C, N1156S, T1036M and T511M) that were commonly predicted to be deleterious by the SIFT, PolyPhen 2.0 and PANTHER servers; these are highlighted in bold in Table 1.
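The relationship between the 35 SIFT-deleterious mutations and the 32 consensus mutations listed above can be checked with a short Python sketch. The two lists are copied from the text; the variable names are illustrative, and the sketch does not indicate which server excluded each dropped variant, since that is not stated here.

```python
# Mutations predicted deleterious by SIFT (35, as listed in the text).
sift_deleterious = [
    "Y634C", "N490T", "P471L", "L737V", "R1186H", "R1272C", "V780M",
    "F842L", "R1183H", "A1018T", "R1186C", "V1141G", "S652R", "S940L",
    "R161W", "R404Q", "S1169I", "A851T", "C113R", "R958Q", "L1213F",
    "V706A", "I1061T", "A1054T", "P1007A", "Q775P", "C177Y", "P434S",
    "S1200G", "R978C", "A1035V", "Y1088C", "N1156S", "T1036M", "T511M",
]

# Mutations predicted deleterious by SIFT, PolyPhen 2.0 and PANTHER (32).
consensus = [
    "Y634C", "N490T", "P471L", "L737V", "R1186H", "R1272C", "V780M",
    "F842L", "R1183H", "A1018T", "R1186C", "V1141G", "S652R", "S940L",
    "R404Q", "S1169I", "A851T", "C113R", "R958Q", "L1213F", "V706A",
    "I1061T", "A1054T", "P1007A", "Q775P", "S1200G", "R978C", "A1035V",
    "Y1088C", "N1156S", "T1036M", "T511M",
]

# Every consensus mutation should also be SIFT-deleterious.
is_subset = set(consensus) <= set(sift_deleterious)

# SIFT-deleterious mutations not confirmed by all three servers.
dropped = sorted(set(sift_deleterious) - set(consensus))

print(len(sift_deleterious), len(consensus), is_subset, dropped)
```

The consensus set is a strict subset of the SIFT set, and the three variants that drop out are R161W, C177Y and P434S.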
Prediction of disease-associated nsSNPs
We applied the support vector machine-based PhD-SNP tool to further classify the predicted deleterious nsSNPs as disease-associated. The 32 nsSNPs commonly predicted by the SIFT, PolyPhen 2.0 and PANTHER tools (32.96%) were submitted to the PhD-SNP server. PhD-SNP predictions rely on intensive supervised training on over a million amino acid polymorphism data sets, and hence the prediction efficiency is expected to be high.
Out of the 32 nsSNPs, 25 mutations (Y634C, N490T, P471L, R1186H, R1272C, F842L, R1183H, A1018T, R1186C, V1141G, S652R, S940L, R404Q, S1169I, C113R, R958Q, L1213F, I1061T, A1054T, P1007A, Q775P, A1035V, Y1088C, N1156S and T1036M) were predicted to be disease-related (84%) and are listed in Table 2. To verify this prediction, we further employed the artificial neural network (ANN)-based Pmut tool, and the same 32 nsSNPs were submitted as input. Out of the 32 nsSNPs, 21 mutations were predicted to be pathological and the remaining 11 nsSNPs were predicted to be neutral (Table 2).
Seven mutations (V1141G, S652R, R958Q, I1061T, R978C and Y1088C) showed a high pathological phenotype with a pathogenicity index greater than 0.8. Two mutations, R958Q and I1061T, were predicted to be highly disease-associated with pathogenicity indices of 0.8995 and 0.9486, respectively (Table 2). The MutPred tool was used to predict the probability of SNP disease association and the probable change in molecular mechanism in the mutant. We found 12 disease-associated mutations (16%), of which four mutations (R1186C, S940L, R958Q and I1061T) were commonly predicted by all three servers (PhD-SNP, Pmut and MutPred) (Table 2). Mutations R1186C, S940L, R958Q and I1061T were found to be highly deleterious with actionable, confident and highly confident hypotheses, respectively. Finally, we selected the four most disease-associated mutations, R1186C, S940L, R958Q and I1061T, which are highlighted in bold in Table 2. Our predictions are also supported by experimental evidence (
Greer et al., 1999;
Millat et al., 1999;
Yamamoto et al., 2000;
Millat et al., 2001;
Ribeiro et al., 2001;
Sun et al., 2001;
Bauer et al., 2002;
Tarugi et al., 2002;
Fernandez-Valero et al., 2005;
Millat et al., 2005).
Molecular modeling
The human NPC1 protein (domain region 920–1200) was modeled with the I-TASSER program using a threading approach. I-TASSER used more than ten templates to model the protein. The top template (PDB ID: 1b3u) covered 97% of the NPC1 query sequence. The best structure, with a high confidence score, was selected and used for further analysis. The deleterious mutations R1186C, S940L, R958Q and I1061T can possibly alter the native conformation of the NPC1 protein. Hence we introduced the point mutations R1186C (arginine to cysteine), S940L (serine to leucine), R958Q (arginine to glutamine) and I1061T (isoleucine to threonine) into the native NPC1 protein to build the mutant structures. The quality of the modeled native and mutant NPC1 structures was evaluated independently with the PROCHECK (Laskowski et al., 1996) and PROSA (
Wiederstein and Sippl, 2007) programs, which indicated good stereochemical properties of the modeled proteins. In native NPC1, 94.7% of residues were in the most favored and allowed regions, with a z-score of -2.04. Mutant R1186C showed 92.4% of residues in the most favored and allowed regions and a z-score of -1.94. Mutant S940L showed 91.6% of residues in the most favored and allowed regions and a z-score of -1.81. Mutant R958Q showed 87.6% of residues in the most favored and allowed regions and a z-score of -1.96. Mutant I1061T showed 97.6% of residues in the most favored and allowed regions and a z-score of -6.2. The overall G-factors of the native and mutant NPC1 structures (acceptable between 0 and 0.5) produced by PROCHECK were in the range 0.09–0.38. These scores indicate a high confidence level, and the structures were selected for further quantitative and flexibility analysis.
Quantitative assessment and flexibility analysis
VADAR (
Willard et al., 2003) was used to evaluate different quantitative measures with default parameters. The modeled and validated native and mutant NPC1 coordinate files were submitted as input to the server. We calculated the number of hydrogen bonds, the total accessible surface area (ASA) and the free folding energy of the native and mutant NPC1 structures; the results are shown in Table 3. Native NPC1 showed an ASA of 19809.7 Å², while the mutant structures (R1186C, S940L, R958Q and I1061T) showed 19879.1, 19813.4, 19805.7 and 19973.8 Å², respectively (Table 3). ASA is the exposed surface area of the protein that a water molecule could access or touch, and it is regarded as a good measure of the structural geometry of a protein: an expanded structure has a larger ASA than a compact one. In our analysis the native structure showed a lower ASA than the R1186C, S940L and I1061T mutant structures, while the ASA of R958Q was marginally lower than that of the native structure. This indicates that these mutations cause the NPC1 protein to adopt a more expanded structural geometry and thus a higher ASA. Mutant I1061T showed the highest ASA of all structures.
The native structure has a free folding energy of -204.52 kcal/mol, while mutant I1061T has -181.33 kcal/mol. The other mutants have intermediate free folding energies (FFE) (Table 3). The native NPC1 structure thus showed the lowest FFE of all structures. The lowest FFE indicates correct folding, which is essential for NPC1 protein function, while a higher FFE tends toward misfolding and functional inactivity. The higher FFE acquired by NPC1 upon mutation is therefore consistent with a loss of activity. Compared with the native structure and the other mutants, I1061T showed the highest FFE (-181.33 kcal/mol), which points to destabilization of the NPC1 structure. This indicates that the I1061T mutation has a stronger deleterious structural effect than the other mutations.
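The ASA and FFE comparisons above amount to simple differences relative to the native model, which can be made explicit with a short Python sketch. The numerical values are those quoted in the text; the variable and function names are illustrative only.

```python
# Values quoted in the text (ASA in square angstroms, FFE in kcal/mol).
NATIVE_ASA = 19809.7
MUTANT_ASA = {
    "R1186C": 19879.1,
    "S940L": 19813.4,
    "R958Q": 19805.7,
    "I1061T": 19973.8,
}
NATIVE_FFE = -204.52
I1061T_FFE = -181.33


def delta(mutant_value, native_value, ndigits):
    """Mutant minus native, rounded to the precision of the reported data."""
    return round(mutant_value - native_value, ndigits)


# Change in accessible surface area for each mutant relative to native.
asa_change = {name: delta(value, NATIVE_ASA, 1) for name, value in MUTANT_ASA.items()}

# Change in free folding energy for I1061T relative to native.
ffe_change_i1061t = delta(I1061T_FFE, NATIVE_FFE, 2)

for name, change in asa_change.items():
    print(f"{name}: ASA change {change:+.1f}")
print(f"I1061T: FFE change {ffe_change_i1061t:+.2f} kcal/mol")
```

Positive values indicate a larger exposed surface, or a less favorable folding energy, than the native model; I1061T shows the largest shift in both quantities.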
In the superimposition analysis, mutants R1186C, S940L, R958Q and I1061T showed structural deviations of 0.077 Å, 0.082 Å, 0.076 Å and 0.088 Å, respectively, when compared with the native structure. Superimposed images of the native and mutant structures are shown in Fig. 1(A)–(D). Superimposition of the native and mutant structures shows deviation at the Cα atoms. Mutant I1061T showed a higher structural deviation than the other analyzed mutations. Estimation of the contribution of hydrogen bonding to protein stability has been made by a combination of experiments on model compounds and site-directed mutational studies (
Harpaz et al., 1994). A change in amino acid residues is often accompanied by alterations in the interaction pattern, especially the hydrogen bond formation, of the corresponding protein (
Sunyaev et al., 2000;
Chasman and Adams, 2001). The number of hydrogen bonds was found to be altered upon mutation. The formation of hydrogen bonds in a protein structure affects its conformational flexibility (
Purohit and Sethumadhavan, 2009;
Purohit et al., 2011a, 2011b;
Rajendran et al., 2012;
Balu et al., 2013;
Kumar et al., 2014;
Kumar and Purohit, 2014). More hydrogen bonds lead to a more rigid structure, and vice versa. The mutant I1061T structure has 215 hydrogen bonds, while the native structure has 221 hydrogen bonds, as shown in Table 3. All mutant structures showed a decreased number of hydrogen bonds compared with the native structure. Upon mutation the structure expanded, which is supported by the ASA analysis. We conducted flexibility analysis to confirm the above observations, analyzing the flexibility of the native and mutant NPC1 proteins with WEBnm. Normal mode analysis (NMA) has become the method of choice for investigating the slowest motions in macromolecular systems. NMA relies on the hypothesis that the vibrational normal modes with the lowest frequencies (also called soft modes) describe the largest movements in a protein and are the ones that are functionally relevant. In the WEBnm analysis we examined mode 7, which has a lower deformation energy than the other modes. The native and mutant fluctuations in mode 7 are shown in Fig. 2(A)–(E). The mutant structures showed more motion and flexibility than the native structure. This suggests that, upon mutation, the NPC1 structure becomes more flexible and that, because of this structural flexibility, it may lose its correct function and contribute to Niemann-Pick disease type C1. Our analysis identifies the most disease-associated mutations of the NPC1 gene and their structural consequences for the NPC1 protein.
Conclusions
Computational investigation has become a roadmap for characterizing disease-specific SNPs at the molecular level. In this study we identified the four most disease-associated mutations (R1186C, S940L, R958Q and I1061T) related to Niemann-Pick disease type C1. A threading-based approach was used to model the native and mutant NPC1 proteins. Quantitative and structural approaches were used to report the structural consequences of the predicted deleterious point mutations. Upon mutation the structure became more flexible, and mutation I1061T showed the most deleterious effect of all the NPC1 mutants, which is supported by the ASA, FFE and hydrogen bond analyses. This may affect the structural and functional behavior of the NPC1 protein. Our analysis provides a way to detect Niemann-Pick disease type C1-associated SNPs in a large SNP data set. It also gives researchers a clear indication of how the disease-related SNPs R1186C, S940L, R958Q and I1061T in the NPC1 gene affect the structure of the NPC1 protein.
Higher Education Press and Springer-Verlag Berlin Heidelberg