Introduction
Proteins are the most abundant biomolecules in living organisms. About 100 000 different types of proteins take part in nearly every chemical process of life (
Brändén and Tooze, 1999). The majority of proteins must fold into compact structures to perform their biological functions. Because biochemical reactions require the reacting atoms to approach each other on the angstrom scale, protein structures are extremely specific, as if each conformation were an elaborate scaffold for the corresponding reaction. Misfolding can therefore alter a protein’s biological properties and consequently result in disease.
As a general cause of illness, protein misfolding is responsible for various types of diseases. Some proteins that undergo distinct structural change have been reported as the pathogenic molecules in classical cases of conformational diseases (CDs), such as prion in transmissible spongiform encephalopathy (TSE) and β2-microglobulin in dialysis-related amyloidosis. More than 30 different human diseases are related to conformational conversion (
Thomas et al., 1995;
Kelly, 1996;
Carrell and Lomas, 1997;
Carrell and Gooptu, 1998;
Soto, 2001). Beyond these classical cases, many other diseases can be investigated in the context of structural change, such as H5N1 avian influenza (
Bornholdt and Prasad, 2008) and the A(H1N1) global pandemic of 2009 (
Garten et al., 2009;
Liu and Zhao, 2010c). CDs are therefore not rare; they are responsible for the development of a wide range of diseases.
Several factors are related to the pathogenic structural change of proteins, including multiple stable states, lifespan, molecular environment, and evolution. Each of them is significant, so it is hard to rank their importance.
Multiple stable states
From a physical point of view, a sequence folds into a native structure that corresponds to a free energy minimum. During folding, a polypeptide chain makes a stochastic search of the many conformations that are accessible to it (
Wolynes et al., 1995;
Dill and Chan, 1997;
Karplus, 1997;
Dobson et al., 1998;
Dobson, 2004). Consequently, the free energy of the polypeptide chain can be described as a function of its conformational properties. As shown by the energy landscape in Fig. 1, a protein related to pathogenic structural change usually has several metastable states. The protein can occupy any of these states, but with different probabilities. Once the protein folds into a conformation other than the ‘healthy’ state, disease can potentially develop.
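The statement that a protein occupies each metastable state with a different probability can be made concrete with a small sketch. It assumes thermal equilibrium, so that populations follow Boltzmann weights; that assumption, the function name state_populations, and the free energy values are all illustrative and are not taken from Fig. 1 or from any cited work.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def state_populations(free_energies, temperature=310.0):
    """Return equilibrium Boltzmann populations for a set of states.

    free_energies maps a state name to its free energy in kcal/mol.
    Equilibrium is an assumption of this sketch.
    """
    rt = R_KCAL * temperature
    g_min = min(free_energies.values())  # shift energies for numerical stability
    weights = {name: math.exp(-(g - g_min) / rt)
               for name, g in free_energies.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical free energies (kcal/mol), chosen only for illustration.
landscape = {"native": 0.0, "intermediate": 2.0, "misfolded": 4.0}
populations = state_populations(landscape)
for name, p in populations.items():
    print(f"{name}: {p:.4f}")
```

Under these assumptions, a state only a few kcal/mol above the native minimum is rarely populated but never has zero probability, which is the sense in which a low-probability conversion can still matter over a long lifespan.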
Lifespan
Some wild-type proteins, such as α-synuclein, which is associated with Parkinson’s disease (
Hardy et al., 2006), can induce conformational disease in old age. On the other hand, some fatal diseases, such as hereditary cerebral angiopathy (
Abrahamson, 1996;
Ólafsson and Grubb, 2000) and TSE, are reported as sporadic events. There is evidence that conformational disease is tightly correlated with human lifespan (
Liu and Zhao, 2010b). All proteins can fold into a pathogenic state and have an inherent tendency to aggregate (
Dobson, 1999). The selection pressure of evolution has acted as a filter, resulting in proteins that can resist aggregation and other pathogenic states during a normal lifespan. The incidence of fatal diseases is kept low by the low probability of structural change from the native to the morbid state (
Liu and Zhao, 2010b). However, evolution can do no better than is necessary to enable us to transmit our genes to our offspring (
Dobson, 2002); thus, to prolong life is to cope with the proliferation of these diseases, and to challenge the nature of evolution.
Molecular environment
The folding process depends on the environment in which it takes place. Protein folding begins while the nascent chain is still attached to the ribosome (
Hardesty and Kramer, 2001), and the major part is completed step by step after release from the ribosome. While a protein is only partially folded, some regions that are buried in the native state can be exposed. Such structures are prone to inappropriate contacts with other molecules (
Hore et al., 1997;
Capaldi et al., 2002). Living systems have nevertheless developed a range of elaborate strategies to ensure that correct folding is completed before erroneous interactions occur (
Gething and Sambrook, 1992;
Hartl and Hayer-Hartl, 2002;
Dobson, 2003), such as molecular chaperones and folding catalysts. A typical example is the chaperonin GroEL, which aids folding by providing a cavity in which incompletely folded polypeptide chains can be held and protected from the outside world. Large numbers of molecular chaperones are present in all types of cells and cellular compartments (
Dobson, 2004); they nurse proteins by interacting with nascent chains as they emerge from the ribosome, by binding non-specifically to protect aggregation-prone regions, and in other ways. Errors in such quality-control mechanisms can cause disease.
Evolution
Conformational disease is a phenomenon of evolution. Evolution is therefore a major factor in protein misfolding, whether in the past, the present, or the future. Modern life involves many practices that were not experienced during previous evolution, such as new agricultural practices (
Prusiner, 1997), a changing diet associated with type-II diabetes (
Höppener et al., 2002), and new medical procedures associated with iatrogenic Creutzfeldt-Jakob disease (
Prusiner, 1997). Because these practices have been introduced much more rapidly than evolution proceeds, there has not been enough time for effective protective mechanisms to arise (
Dobson, 2002).
Many efforts have been made to investigate protein misfolding with computational approaches. Owing to their increasing power and accuracy, such approaches have attracted considerable attention. To give readers an outline of this field, I review recent developments and suggest several points for further study.
The pathological structure of disease-related protein
Clinical reports
Several features of the pathological structure of disease-related proteins have been reported clinically.
Aggregation
One of the typical features of conformational disease is the formation of amyloid, an insoluble protein aggregate that deposits in tissues. For example, the normal prion protein is found on the membranes of cells throughout the body, even in healthy people and animals. In TSE, the soluble cellular isoform of prion (PrPC) folds inappropriately into the scrapie isoform (PrPSc), which accumulates and forms fibrils in brain tissue, causes tissue damage and cell death, and leads to degeneration of the nervous system. In this process, the normal α-helix-rich PrPC changes its structure into the pathological β-sheet-rich PrPSc (Kuwata et al., 2007). Similarly, there are obvious structural changes in many other classical disease-related proteins, such as insulin and serpins. However, there are also amyloidogenic proteins that form fibrils while in their native globular form, in which local structural change is significant, e.g. β2-microglobulin, transthyretin, and lysozyme (
Thomas et al., 1995;
Kelly, 1996;
Carrell and Lomas, 1997;
Carrell and Gooptu, 1998;
Soto, 2001).
‘Steric zipper’ β-sheets amyloid architecture
It was found that the fibrillar structures of various proteins have very similar morphologies, whereby pairs of parallel or antiparallel β-sheets, whose strands run perpendicular to the fibril axis, form a dry interface (
Nelson et al., 2005;
Sawaya et al., 2007). It is clear that the core structure of the fibrils is stabilized primarily by interactions, particularly hydrogen bonds, involving the polypeptide main chain. As the main chain is common to all polypeptides, this observation explains why fibrils formed by polypeptides of very different amino acid sequences are similar in appearance (
Dobson, 2004).
High toxicity of the early pre-fibrillar aggregate
The insoluble protein mass can disrupt the functioning of specific organs (
Pepys, 1995), or result in the loss of functional protein that leads to the failure of some crucial cellular process (
Thomas et al., 1995). It has been suggested that the early pre-fibrillar aggregates of proteins are highly damaging to cells (
Koo et al., 1999;
Caughey and Lansbury, 2003). Moreover, it has become clear that the pre-fibrillar aggregates are toxic through a less specific mechanism, such as the exposure of non-native hydrophobic surfaces (
Polverino et al., 2003;
Stefani and Dobson, 2003). For example, the pre-fibrillar aggregates of several non-disease-related proteins can be as cytotoxic as those of amyloid β-protein (
Bucciantini et al., 2002). By contrast, mature fibrils are much less toxic than their precursors (
Walsh et al., 2002;
Caughey and Lansbury, 2003).
Folding nucleus is distinct from aggregation nucleus
The roles of individual residues in the folding process have been investigated by site-directed mutagenesis. A wide range of studies suggest that there are a small number of key residues, which form the folding nucleus of a protein (
Matouschek et al., 1989;
Fersht, 1999, 2000); the collapse of the polypeptide chain to a stable compact structure can occur only after the majority of the folding nucleus has formed, i.e., the native structure is a consequence of the formation of the folding nucleus. If these key interactions are not formed, the protein usually cannot fold directly to a stable globular structure. This requirement guards against misfolding by prolonging the unfolded state. As a result of such a ‘quality control’ process, the native structure is formed in preference to incorrect ones (
Vendruscolo et al., 2001;
Davis et al., 2002;
Makarov and Plaxco, 2003). On the other hand, investigations of the mechanism of amyloid formation suggest that there are aggregation-prone regions, ‘hot spots’ of fibril formation, that are considered to be responsible for aggregation (
Ivanova et al., 2004;
Ventura et al., 2004). An important observation is that the residues of the folding nucleus are distinct from those of the aggregation nucleus (
Chiti et al., 2002). This suggests that evolutionary pressure may select sequences that favor folding rather than aggregation.
Besides the aforementioned points, classical disease-related proteins show other pathogenic features, such as protein instability in haemoglobin and serpins and topological structural change in apolipoprotein AI. Because numerous efforts have focused on the classical conformational diseases, the features of their pathological structures are better understood than those of non-classical cases. For the latter, it has been suggested that the pathological structure of a disease-related protein can present new targets to the human immune system, and that this is associated with the highly pathogenic H5N1 avian influenza and the A(H1N1) 2009 global pandemic (
Liu and Zhao, 2010c).
Computational approaches
Since aggregation appears in the majority of classical conformational diseases, the amyloidogenic mechanism is the main focus of computational approaches.
Prediction of amyloid core
The discovery of aggregation-prone regions has promoted the development of a number of algorithms and models for predicting the aggregation propensity of proteins (
Fernandez-Escamilla et al., 2004;
López and Serrano, 2004;
Yoon and Welsh, 2004;
Pawar et al., 2005;
Sánchez et al., 2005;
Bemporad et al., 2006;
Caflisch, 2006;
Galzitskaya et al., 2006;
Saiki et al., 2006;
Zhang et al., 2007). Some of them are quite good at identifying residues that are buried in the amyloid core and have become convenient tools for the analysis of pathological structure. Because the amyloid core observed experimentally often contains a large number of residues, it is difficult to rank the significance of individual residues in the amyloidogenic mechanism. The per-residue scores produced by these algorithms may provide such information and aid the understanding of amyloid in future work.
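The idea of a per-residue score that is then grouped into aggregation-prone regions can be sketched as follows. This is not the scoring function of any of the algorithms cited above. As a stand-in, it uses the Kyte-Doolittle hydropathy scale averaged over a sliding window; the window size, threshold, and minimum region length are arbitrary assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in score.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def per_residue_scores(sequence, window=5):
    """Average the raw score over a window centred on each residue."""
    half = window // 2
    raw = [KD[aa] for aa in sequence]
    scores = []
    for i in range(len(raw)):
        lo, hi = max(0, i - half), min(len(raw), i + half + 1)
        scores.append(sum(raw[lo:hi]) / (hi - lo))
    return scores


def hot_spots(scores, threshold=1.5, min_length=4):
    """Return (start, end) half-open index ranges where the score stays above threshold."""
    regions, start = [], None
    # The sentinel closes a region that runs to the end of the sequence.
    for i, s in enumerate(scores + [float("-inf")]):
        if s > threshold and start is None:
            start = i
        elif s <= threshold and start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    return regions


# Hypothetical sequence: a hydrophobic stretch flanked by charged residues.
toy_sequence = "DKDKDKVIVIVIDKDKDK"
toy_scores = per_residue_scores(toy_sequence)
toy_regions = hot_spots(toy_scores)
print(toy_regions)
```

Keeping the per-residue scores separate from the region call makes it possible to rank individual residues inside a predicted core, which is the use suggested in the text.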
Prediction of pathological structure
Because amyloid usually contains a mass of aggregated protein, the experimental purification of pathological structures is quite difficult, and high-quality coordinates are therefore scarce. Computational approaches have been an efficient way of uncovering such information. For example, in contrast to the high-resolution data for PrPC, the structure of PrPSc is largely unknown. Huang, Prusiner, and Cohen developed a three-dimensional model of PrPSc by combining computational techniques with experimental data (Huang et al., 1996). However, many problems remain unsolved, and the topic remains a focus of recent studies (
Smirnovas et al., 2009).
Investigation of interactions in fibril
Computational approaches have achieved much in the study of the molecular architecture of protein amyloid, particularly in cases that lack a pathological structure. For example, the molecular architecture of PrPSc amyloid has been investigated with theoretical models by Govaerts et al. and DeMarco et al. (
DeMarco and Daggett, 2004;
Govaerts et al., 2004). Features of the interaction surface in amyloidogenic regions can also be investigated by a combination of several different algorithms (
Castillo and Ventura, 2009).
Investigation of the early steps of aggregation
Since the early oligomers formed during aggregation are the primary toxic species in amyloidosis, the initial assembly of oligomers is currently one of the most active topics. However, the atomic details of these transient, immature species are difficult to characterize using biophysical methods (
Hardy and Selkoe, 2002;
Bitan et al., 2003;
Mastrangelo et al., 2006). Computational approaches therefore contribute greatly to this field. Although only very short polypeptides (usually fewer than ten residues) can be simulated, such efforts have revealed some patterns of self-assembly, which could be useful for studies of true amyloidosis (
Wei et al., 2007).
Computer-assisted treatment
Scientists have attempted to test their theories of amyloidosis experimentally. In 2002, López et al. designed amyloid hexapeptide sequences using a computational design algorithm. Sequences with a high propensity to form homopolymeric β-sheets were validated experimentally. The de novo designed peptide was shown to self-associate efficiently into β-sheets, whereas some point mutations that were predicted to be unfavorable for fibrils inhibited the polymerization. The delicate balance between the interactions involved in fibril formation and those in more disordered aggregates was uncovered (
López et al., 2002). Other evidence suggests that increased aggregation propensity and decreased stability of amyloid protein are significantly correlated with decreased patient survival (
Meiering, 2008). Computer-assisted analysis and treatment may therefore lead to new clinical therapeutics for amyloidosis.
The switch of pathogenic structural change
Clinical reports
Both the preliminary and the later stages of pathological conformational conversion, e.g. protein aggregation, result from switching on the misfolding pathway of the native state. Ideally, a treatment would cure a disease by stopping proteins from entering the misfolding pathway; that is, by preventing the disease-related misfolding rather than coping with the mass of pathological changes that follow it. Because sites that are significant for switching on the misfolding pathway can be ideal binding targets for drugs, many efforts have been made to identify them experimentally, using information such as disease-related point mutations observed in clinical practice, sites involved in the inhibition of the pathogenic change process, and the abnormal cleavage sites responsible for amyloidosis (
Liu and Zhao, 2010c).
A typical study dealing with switch sites was reported by Kuwata et al. (
Kuwata et al., 2007). Based on a series of relevant works, the authors selected 14 amino acid residues for an in-depth investigation. The switch region responsible for the pathogenic structural conversion of prion protein was identified as a pocket formed by five residues. Intercalation of the anti-prion compound GN8 into this pocket can inhibit the pathogenic change of the prion protein and prolong the survival of TSE-infected mice.
There is evidence that amyloid-related mutations do not necessarily lie in aggregation-prone regions. It is therefore still unclear whether switch sites occur in hot spots of aggregation (
Ivanova et al., 2004;
Ventura et al., 2004;
Liu and Zhao, 2010b). It is also an open question whether such a difference is related to the distinction between folding nucleus and aggregation nucleus.
Prediction of switch region responsible for pathogenic structural change
The identification of switch sites usually requires a large number of experiments, and the published reports may even conflict with each other. The long duration of such work is another important restriction. An algorithm that predicts switch regions can aid the interpretation of clinical reports, speed up clinical investigation, and lower the difficulty and the knowledge threshold, especially for non-experts and researchers in interdisciplinary fields. However, such prediction is difficult and was long unavailable because multiple factors must be considered jointly.
Based on a joint consideration of protein stability and the selection pressure of protein evolution, we have developed the first algorithm for predicting the switch region responsible for pathogenic structural change (
Liu and Zhao, 2010c). Remote homologous relationships among polypeptides can be identified with high accuracy, based on the discovery of the significant role of molecular mechanics properties in protein evolution (
Liu and Zhao, 2009a, 2010a). Using this highly accurate algorithm, it was revealed that there are only two major clusters in the phase space of polypeptides: the helix-donut zone, which consists of helix segments and the N/C-terminal helix caps, and the strand-arc part, which mainly comprises β-sheet segments and the N/C-terminal strand caps. A query protein is treated as successive residue segments (
Liu and Zhao, 2009b). In the native fold of a protein, each segment belongs to one of the clusters in the phase space. The probability that a segment belongs to the other cluster determines its capacity to trigger pathogenic structural change.
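The logic of the last step, ranking segments by their probability of lying in the non-native cluster, can be sketched in a few lines. This is only an illustration of the bookkeeping, not the published algorithm: the cluster probabilities are supplied as hypothetical inputs, and the names Segment, successive_segments, switch_propensity, and rank_segments, as well as the segment length, are assumptions made here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int            # index of the first residue in the query protein
    residues: str         # residues covered by the segment
    native_cluster: str   # "helix" or "strand", the cluster of the native fold
    p_helix: float        # hypothetical probability of lying in the helix cluster


def successive_segments(sequence, length):
    """Split a sequence into successive overlapping windows of fixed length."""
    return [(i, sequence[i:i + length])
            for i in range(len(sequence) - length + 1)]


def switch_propensity(segment):
    """Probability that the segment lies in the cluster other than its native one."""
    if segment.native_cluster == "helix":
        return 1.0 - segment.p_helix
    return segment.p_helix


def rank_segments(segments):
    """Order segments from most to least likely to start a structural change."""
    return sorted(segments, key=switch_propensity, reverse=True)


# Hypothetical values chosen only to show the ranking.
toy_segments = [
    Segment(0, "MKVLA", "helix", 0.95),
    Segment(1, "KVLAG", "helix", 0.55),
    Segment(2, "VLAGT", "strand", 0.20),
]
ranked = rank_segments(toy_segments)
print([(s.start, round(switch_propensity(s), 2)) for s in ranked])
```

With only two clusters, a single probability per segment is enough, because the probability of the other cluster is its complement.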
The algorithm can identify the residue segments that are responsible for the start of pathogenic structural changes with an accuracy of 94%, and finds the residues that are tightly associated with conformational diseases about eight times as effectively as random selection (
Liu and Zhao, 2010b). It should be a useful tool for identifying the riskiest region of a protein and can form a foundation for further investigation.
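A comparison with a random baseline of this kind is often expressed as a fold enrichment: the fraction of predicted residues that are disease-associated, divided by the fraction expected if the same number of residues were picked at random. The sketch below shows that arithmetic under this assumption; it is not necessarily how the cited study computed its figure, and the residue sets are invented for illustration.

```python
def fold_enrichment(predicted, disease_sites, total_residues):
    """Ratio of the hit rate among predicted residues to the random hit rate.

    hit rate among predictions = |predicted & disease_sites| / |predicted|
    random hit rate            = |disease_sites| / total_residues
    """
    if not predicted or not disease_sites:
        raise ValueError("both residue sets must be non-empty")
    hits = len(set(predicted) & set(disease_sites))
    # Written as a single division to avoid compounding rounding errors.
    return (hits * total_residues) / (len(predicted) * len(disease_sites))


# Hypothetical example: a 100-residue protein, 10 predicted residues,
# 5 known disease-associated residues, 3 of which fall in the prediction.
toy_predicted = set(range(10, 20))
toy_disease = {12, 15, 18, 40, 70}
toy_ratio = fold_enrichment(toy_predicted, toy_disease, 100)
print(toy_ratio)
```

A value of 1 means the prediction does no better than chance, so a ratio near eight would correspond to the level of improvement quoted in the text.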
Summary and outlook
Here I review investigations of protein misfolding across the full process of pathogenic change, including studies of the switching on of the misfolding pathway, the early steps of aggregation, the final pathological structure of each molecule, hot spots of amyloid, the interactions in the morbid polymer, and the architecture of amyloid. This review provides typical references for clinical and computational analysis of each step and singles out some important concepts for understanding protein misfolding, such as multiple stable states, the energy landscape, the contribution of protein evolution, the high toxicity of early pre-fibrillar aggregates, amyloid architecture, the difference between the folding nucleus and the aggregation nucleus, and the difference between the switch region and the amyloid core. I hope it helps readers to grasp the outline of pathogenic structural change and conformational disease quickly.
As shown above, there are computational efforts for each step of the misfolding process. Although the present letter mainly focuses on the investigation of disease-related proteins, there has also been notable progress in algorithms for protein-protein interaction (
Bonvin, 2006;
Gray, 2006), drug design (Available Chemicals Directory, MDL Information System, San Leandro, CA), toxicity evaluation (Toxicity, www.symyx.com), and so on. One bottleneck of the computational approach is that, because an all-around study cannot be accomplished with present computational techniques, the significant sites must be selected according to experimental reports. With the development of algorithms for switch region prediction, some significant sites can now be identified computationally. As conformational change is responsible for many diseases, this would benefit a large amount of research. Investigating diseases from the perspective of structural change can be a promising methodology in pathology (
Liu and Zhao, 2010c).
Higher Education Press and Springer-Verlag Berlin Heidelberg