Introduction
The impact of computational methods in modern drug discovery process cannot be over emphasized. Early stages of drug discovery routinely use bioinformatics methods to efficiently analyze genomic data (
Yao et al., 2009). Biomolecular modeling methods are being successfully applied to lead identification and optimization (
Coumar et al., 2010;
Jamieson et al., 2010;
Muddassar et al., 2010;
Murumkar et al., 2010;
Talele et al., 2010;
Ogata et al., 2010;
Postigo et al., 2010). Pharmacokinetic/pharmacodynamic/pharmacometric methods model biological/ physiological responses of drugs and play an important role in the decision making part of the drug development phase (
Rajman, 2008). Last but not the least, computational methods help design optimized clinical trials (
Holford et al., 2010). Computational methods are increasingly used to complement experimental methods (
Phatak et al., 2009) and have proved to reduce costs and time without compromising the quality of the drug discovery process. Such computational methods differ in their genesis (e.g. computer science, physics, mathematics) but share a common goal: solving biological problems. This commonality allowed researchers to encompass these methods in an interdisciplinary field termed computational biology. Computational biologists develop theoretical and/or knowledge-based predictive models of the biological system of interest among other applications (
Kiran et al., 2008;
Laursen, 2009). Protein modeling is one such important method that attempts to construct near-native 3D models of proteins that do not have an experimentally solved structure. The development of protein modeling methods is an actively pursued area of research because: (a) experimental methods may never provide structures for every protein (
Baker and Sali, 2001), and (b) such models are required for the increasingly applied structure-based methods for drug discovery (
Barcellos et al., 2008;
Cavasotto and Phatak, 2009;
Daga et al., 2010). However, these structures (modeled or experimental) are mere snapshots. The native state of a protein is not a static structure, but consists of an ensemble of structures that interconvert in a dynamic equilibrium. These spontaneous interconversions range from vibrations of individual bonds to domain rearrangements. In addition to passive action, there is also evidence that these fluctuations actively drive processes including substrate binding and ion channel gating (
Bahar et al., 2010). The widespread use of molecular dynamics in the study of biological molecules represents an acknowledgment of the critical importance of molecular motions in biological function. This review primarily provides a succinct background of homology modeling and molecular dynamics methods, their recent developments and selected applications in the drug discovery field.
Protein modeling
Overview
3D protein structures potentially rationalize the biology (molecular and system level) of a validated drug target (
Cozzetto and Tramontano, 2008), and have been successfully applied to diverse structure-based drug design projects (
Dessailly et al., 2009;
Grant, 2009;
Myler et al., 2009). However, the key to their applicability is their availability. Pragmatic approaches (e.g., focusing on solving representative protein structure per therapeutically important target family (
Dessailly et al., 2009) and technological innovations (X-ray crystallography and NMR techniques (
Montelione and Szyperski, 2010) from several structural genomics projects (
Weigelt, 2010), though successful, have failed to narrow the ever increasing available-sequence:available-structure gap (
Malmström and Goodlett, 2010). Computational modeling of proteins offers an alternative vista in the pursuit of 3D protein structures and may be considered reasonable substitutes until a corresponding experimental structure is determined.
Protein modeling can be broadly divided into two main categories: template-based and template-free modeling. Figure 1 demonstrates a representative workflow for protein modeling methods. Template-based models exploit the rationale that similar sequences share similar protein structures. Homology modeling (
Xiang, 2006;
Cavasotto and Phatak, 2009;
Daga et al., 2010) is an avant-garde method in template-based modeling though threading (
Peng and Xu, 2010;
Trojanowski et al., 2010;
Zhou and Skolnick, 2010)/fold-recognition approaches (
Deschavanne and Tufféry, 2009;
Lampros et al., 2009;
Lobley et al., 2009;
Yan et al., 2009;
Ying et al., 2009) are being increasingly developed and applied. Homology modeling follows a well-established protocol to develop models for proteins with unknown structures (targets): identifying homologous protein structure(s) (template(s)), aligning the target and template sequences, building and subsequently refining models. Threading or fold recognition approaches exploit the finite number of protein structure folds in nature. Scoring functions are developed based on the known relationships between sequences and structures. Subsequently, models are built based on that template which provides a maximum score post sequence alignment. This approach has shown promising results in critical assessment of techniques for protein structure prediction (CASP) competitions, particularly in its ability to detect remote homologs and target-sequence alignment (
Kryshtafovych and Fidelis, 2009). It is routine to obtain models with 1-2 Å deviation from the experimental structure for closely homologous sequences (
Ginalski, 2006;
Burley et al., 2008). Automated servers based on template-based modeling methods now provide good quality models with minimal human intervention for simple cases (
Kim et al., 2004;
Grünberg et al., 2007;
Bordoli et al., 2009;
Hildebrand et al., 2009;
Roy et al., 2010).
In the absence of any template (homologs or not),
de novo methods using knowledge and/or physics-based potentials are used to construct 3D structural models. Free,
ab initio or
de novo modeling is the umbrella terms for structure prediction methods that attempt to construct protein models using knowledge and/or physics-based approaches when no structural homolog exists. Rosetta (
Das and Baker, 2008;
Kaufmann et al., 2010) uses a knowledge-based approach to construct initial models, which in turn are refined by all-atom physics-based approaches (
Bradley et al., 2005;
Das et al., 2007). Threading assembling refinement (TASSER) from Wu et al. (
2007) is another promising method which uses only knowledge-based approaches for
de novo protein modeling. Presently,
de novo model building exercises are limited to smaller proteins (120-150 amino acids) and rarely obtain models<2 Å resolution (
Zhang, 2008). However in either case the caveat is that protein modeling, by definition, is error-prone. This review will highlight selected recent comparative modeling and refinement methods.
Refinement and optimization of protein models
Comparative modeling often introduces errors due to: e.g., (a) choice of template(s) and sequence alignment (
Larsson et al., 2008); (b) low sequence identity (<30%) between the target and template sequences (G- Protein coupled receptors are an exception due to their structural signatures); and (c) flexible loop regions, particularly when involved in ligand binding. A plethora of bioinformatics approaches have been developed to minimize errors associated with the identification of homologous proteins from protein databases such as the Protein Data Bank (PDB) and subsequent sequence alignment. These include techniques based on pairwise sequence comparison (
Dunbrack, 2006;
Lu et al., 2008;
Jayalakshmi et al., 2010), machine learning methods (
Lee et al., 2008) e.g., hidden Markov models (
Bystroff and Krogh, 2008;
Karplus, 2009), neural networks (
Mooney and Pollastri, 2009;
Walsh et al., 2009), and support vector machines (
Fariselli et al., 2007) among others. They attempt to maximize the sensitivity (i.e., the ability to identify distant relationships) and selectivity (i.e., to return a minimal set of false positives). Even then the constructed comparative model may be inaccurate.
Modeling errors, particularly incorrect side chain orientations in binding pockets, can negate their applicability in drug discovery applications. Two themes dominate the current methodologies of refining comparative models. In the first, known ligand information is used to optimize the pharmaceutically relevant binding pocket. In the MOBILE approach developed by Evers et al. (
2003), spatial knowledge of ligands placed into an averaged binding site of crude ensemble of protein models is used as restraints to generate a second generation of refined homology models. The method was successfully applied to identify sub-micromolar inhibitors for the neurokinin-1 receptor (
Evers and Klebe, 2004). Moro and colleagues used a soft-docking approach to generate initial ligand poses within the binding pocket and reconstructed the homology model of A3 adenosine receptors with optimized side-chain orientations to accommodate the ligand pose without the soft-docking approach (
Moro et al., 2006). Continuing on this theme, but by incorporating a docking-based Monte Carlo approach and full receptor:ligand flexibility, Cavasotto et al. developed and applied the ligand-steered modeling method in multiple structure-based drug discovery projects targeting melanin concentrating hormone receptor 1 (
Cavasotto et al., 2008) and cannabinoid receptor 2 (
Diaz et al., 2009a, 2009b) (Fig. 2). Recently, Phatak et al. (
2010) extended and benchmarked the ligand-steered modeling method by retrospectively cross modeling Class A G-protein coupled receptors of Beta2-adrenergic and adenosine-2A. The models obtained near-native co-crystallized ligand poses, showed promising selectivity profiles and enrichment rates as compared to crystal structures and crude homology models respectively. Radestock et al. (
2008) manually refined the side-chain orientations to map the corresponding receptor:ligand interactions in the metabotropic glutamate receptor 5. These were then encoded as a simple binary scoring function and compared against conventional scoring functions in a retrospective virtual screening exercise. Neves et al. (
2010) optimized receptor binding pockets of CXCR4, an anti HIV target, using diverse chemotype ligands in a cyclic combination of elastic network normal mode analysis for backbone/side chain sampling and full flexible receptor:ligand docking for model selection.
The second theme for refinement uses molecular dynamics (MD) methods. Michino et al. (
2010) developed a two-step method for the seven transmembrane (7-TM) GPCRs termed foldGPCR. Using tertiary restraints from the template structure, and simulated annealing process the 7-TMs of beta-2 adrenergic receptor and its known ligand carazolol, is assembled in the first step. An ensemble of complexes is generated by random rotation of the helices and
x-axis translation of the ligand. The averaged structure is then refined using a combination of generalized Born (GB) implicit membrane model and replica exchange sampling methods. The method retrospectively predicted low RMSD beta2-adrenergic model derived from a rhodopsin template (TM RMSD ~2 Å and ligand RMSD ~1 Å). Zhu et al. (
2008) applied temperature-based replica exchange molecular dynamics (REMD) method to refine homology models. First, structural errors in side chains and loop regions of models were corrected using a variety of toolkits. Next, the models were refined using REMD protocol implemented in the GROMACS package, where the replicas explored a range of temperature space to potentially allow the system to access conformational space that would be prohibited at standard temperatures. Finally, a combination of statistical potentials and scoring functions were used to identify near-native conformations. To offset the disadvantage of using a large number of replicas to cover a given temperature range, Kannan and Zacharias (
2010) proposed a biased potential REMD flavor to refine protein models. The biasing potential for the phi and psi angles of the backbone dihedrals permits transitions and thus flexibility of the backbone by lowering energy barriers. This explicit solvent BP-REMD approach showed promising results for loop prediction for structures from the Rosetta decoy set at considerably shorter simulation time frames (~10 ns). Kimura et al. (
2008) developed and validated protein models for chemokine receptor 2 (CCR-2 of the GPCR family) using a ligand-free pressure-based steered molecular dynamics method. The binding pocket was optimized by gradually increasing the radii of pre-populated Lennard-Jones particles during the course of the molecular dynamics simulation. To prevent any unreasonable shape of the pocket, the particles were tethered via harmonic bonds to four nearest neighbors and the backbone dihedrals of the transmembrane helices were restrained. Other approaches include the use of: low-frequency normal modes (
Rai et al., 2010), a combination of normal modes and molecular dynamics (
Stumpff-Kane et al., 2008), knowledge-based refinement methods (
Chopra et al., 2010), and restraints derived from cryo electron microscopy maps (
Zhu et al., 2010).
Applications of protein models to drug discovery projects
Homology models have been widely applied to a variety of structure-based drug discovery projects. Here, we list a few selected projects that have resulted in active hits against therapeutically relevant proteins. Table 1 shows other applications of homology models to drug discovery.
Emeria tenella CDK-related kinase 2 (EtCRK2) is a validated target against coccidiosis. Using an EtCRK2 homology model and chemoinformatic methods, Engels et al. (
2010) identified 293 hits via a virtual screening exercise (hit rate ~6%). Four compounds were classified as leads and one of them belonged to the benzimidazole carbonitrile class of compounds which were not previously documented as cyclin-dependent kinase activities. Sisay et al. (
2010) used a comparative model of matriptase-2, a target for the treatment of systemic iron overload. Side chain conformations of the non conserved residues in the active pocket were manually adjusted by using low energy rotamers and loop regions were modeled to mimic the template protein matriptase-1. This rational medicinal chemistry effort based on pre-existing knowledge of known inhibitors and crystal structure complexes of other matriptase family proteins resulted in two small molecule inhibitors with
Ki values of 170 and 460 nmol/L, respectively. NF-kappaB inducing kinase (NIK) is an understudied enzyme involved in the downstream of several tumor necrosis factor receptors implicated in the pathogenesis of rheumatoid arthritis. The lack of potent and selective inhibitors for NIK drove this study, where a previously built homology model of NIK by the same group (
Mortier et al., 2010) was used to virtually screen compounds from the ZINC database (
Irwin and Shoichet, 2005). The compounds were scored using three scoring functions and ranked based on a consensus ligand efficiency function. Top ranking compounds were visually inspected for the validated hydrogen bond to Leu472 and subsequently tested. One 4
H-isoquinoline-1,3-dione scaffold-based compound and one of its 16 analogs were found to be micromolar hits (51-91 µmol/L) (
Mortier et al., 2010). Medina-Franco’s group screened natural products from the ZINC library (
Irwin and Shoichet, 2005) against the homology model of the catalytic domain of DNA methyltransferase, an anti-cancer target. A multi-step docking approach using three docking programs (AutoDock (
Goodsell, Morris et al. 1996), Glide (www.schrodinger.com) and Gold (
Verdonk et al., 2003)). 58 hits were identified from ~89K compounds. This was the first reported study involving the virtual screening of a large database of natural products (
Medina-Franco et al., 2010). Homology models of histone deacetylase (HDAC) 1 and 6, another oncogenic target, were used to develop potent and selective inhibitors for HDAC 6. The models were developed using iTASSER platform. The structural differences between the two isozymes (e.g. wider and shallower catalytic channel rim) were used to design carbazole hydroxamic acid based compounds with bulkier and short aromatic moieties to selectively target HDAC 6 (
Butler et al., 2010). Several other examples of lead identification on a wide variety of protein targets have been reported elsewhere (
Hu et al., 2009;
Ostrov et al., 2009;
Odell et al., 2010;
Sierecki et al., 2010).
Molecular dynamics
Overview
In molecular dynamics simulations, a set of points in 3-dimensional space represent the configuration of the simulated system (e.g. a protein in water). The positions of these points, which represent the molecular conformations, are propogated in time. Snapshots of the system may be taken to form an ensemble of conformations representing possible molecular arrangements. An example of such an ensemble is illustrated in Fig. 3. These points are usually positioned on atomic sites or functional groups, and represent their molecular motions. The interactions between sites are represented by a function which gives the system energy as a function of all the site positions, and is usually called a “force field” . Force field implementations typically define parameters for each site. For instance, a site representing an atom may be described by a charge and van der Waals radius. These parameters must be determined, either using experimental data, or resulting from high-level quantum mechanics simulations. Methods for deriving more accurate force fields are an intense area of current research. Of note for drug discovery is the development of “polarizable” force fields (
Jiao et al., 2008;
Jiao et al., 2009), in which atomic parameters (specifically the charges) may change in response to changes in environment. This can be expected to improve predictions of binding free energy, where the molecular environment of a ligand may change greatly during binding. For the interested readers, more details of this procedure and other aspects of molecular dynamics can be found in various references (
Allen and Tildesley, 1989;
Leach, 2001;
Frenkel and Smit, 2002). In this section, we describe a representative set of applications of molecular dynamics for therapeutic drug design.
Targeting receptor flexibility
Molecular dynamics may be useful for tackling the challenge of receptor flexibility in structure-based drug design (
Marco and Gago, 2007;
Morra et al., 2008). Virtual screening using rigid receptor docking potentially overlooks an abundance of drug candidates, so flexible docking is currently an area of intense research (
Cozzini et al., 2008;
B-Rao et al., 2009). Since molecular dynamics samples conformational flexibility, a number of algorithms based on molecular dynamics have been naturally developed. One such example is from McCammon et al., who have shown that incorporating a short molecular dynamics refinement step into an iterative docking method gives improved results and can find alternative docking poses which otherwise would be missed (
Lin et al., 2002). A typical protocol for incorporating molecular dynamics into the docking procedure to obtain multiple poses (of receptor and ligand) is shown in Fig. 4. As a note, even if molecular dynamics simulation is not directly used in the docking process for virtual screening, concepts from molecular dynamics force field development have shown their influence on docking techniques. The problem of calculating the interaction free energy between ligand and receptor is a natural fit for concepts and techniques of force field development. This can be seen in the development of programs such as AutoDock (
Goodsell et al., 1996) and APBS (
Baker et al., 2001) showing the wide influence of molecular dynamics on the field of molecular modeling.
Performing standard molecular dynamics on a receptor may still be too computationally expensive for virtual screening, but variants of molecular dynamics have been developed to reduce the computational cost. One often-used technique is the temperature replica exchange method (
Sugita and Okamoto, 1999). Briefly, several replicas of the system of interest are run in parallel using molecular dynamics at different temperatures. Periodic exchanges are performed between replicas such that the correct sampling is achieved at each temperature. This allows for a faster search of configurational space, and the parallel nature of this algorithm is well suited to modern cluster computing architecture. Examples of the application of replica exchange molecular dynamics include the work by Gerek and Ozkan (
2010) who, in a ligand binding replica exchange simulation study of postsynaptic density-95/Dlg/ZO-1 PDZ domains, used dihedral restraints on the receptor to induce a bias toward binding site conformations, further reducing the simulation time while still allowing for binding site flexibility. Of course, replica exchange can also be used to efficiently sample ligand populations as well (
Okumura et al., 2010). However, due to the computational expense of standard molecular dynamics, its practical use in high-throughput virtual screening is limited to detailed studies of a limited number of selected compounds, late in the screening pipeline (
Okimoto et al., 2010). This may change in the future with improvements in computing hardware.
An alternative approach to achieve biological timescale simulations is the development of MD specific computing hardware (
Klepeis et al., 2009;
Zwier and Chong, 2010). MDGRAPE-3 developed by RIKEN research is one of the first examples of such hardware. It reportedly performed up to 40 times faster as compared to single conventional processors (
Kikugawa et al., 2009). Recently, the group of Shaw et al. (
2007) developed specialized parallel computing hardware (Anton) to conduct microsecond simulations at greatly increased speeds. Of late, the inherent architectural features of general-purpose graphical processing units (e.g. high bandwidth memory systems, hardware multithreading, parallel data handling) are exploited for molecular dynamics and other modeling problems (
Stone et al., 2010).
Membrane protein simulations
Approximately 30%-50% of current drug targets are membrane proteins (
Gruber et al., 2010). However, the experimental structure determination of membrane proteins presents technical challenges (
Cherezov et al., 2010) and the current number of unique structures determined is ~300, far behind the number for soluble proteins (
White, 2004;
Cherezov et al., 2010). Thus, post G-protein coupled receptor (GPCR) structure modeling with molecular dynamics simulations takes on a vital role in structure-function studies of membrane proteins. Molecular dynamics simulations of membrane proteins, e.g., GPCRs, offers an exciting avenue to elucidate details of protein-membrane interactions (
Lindahl and Sansom, 2008;
Khalili-Araghi et al., 2009). The potential barrier to these atomistic MD studies is the size of such systems and limited timescales (
Lindahl and Sansom, 2008). Lately, coarse-grained approaches that aim to reduce the magnitude of interactions, increase the timesteps and speed up the simulation appear to be in vogue (
Bond et al., 2008;
Sansom et al., 2008). Specially developed interaction parameters and force fields (e.g. MARTINI(
Marrink et al., 2007)), hardware and software(
Phillips et al., 2005) are steadily contributing to the advances in membrane simulations. Please refer to the excellent reviews on this subject by Khalili-Araghi et al.(
2009) and Pandit and Scott (
2009).
Protein structure-function analysis
Despite the ever-increasing number of available protein structures, many systems of interest lack structures, either wholly or partially, limiting our ability to perform structure-function analysis. In particular, molecular docking depends heavily on the availability of accurate receptor structures. Perhaps not all residues are seen in a crystallography experiment, or only the unbound form of a receptor is known. In these cases, molecular dynamics is a useful tool, and often the only reasonable choice, for obtaining a suitable receptor structure for docking (
Marco and Gago, 2007). In this context, molecular dynamics is often combined with homology modeling to generate a reasonable structure suitable for further study. A standard protocol may involve, for instance, first generating a homology model followed by energy minimization and molecular dynamics using the GROMACS simulation package (
Kutzner et al., 2007). For a good example of the application of this procedure, see the work by Kubarenko et al. (
2007) on the generation of structures for toll-like receptor homologs and orthologs. Researchers at Merck used homology modeling, induced fit docking and molecular dynamics to design novel therapeutic agents for steroid hormone binding receptors and performed structure-function relationship studies (
Cornell and Nam, 2009).
Free energy calculations
Free energy calculations can determine the difference of free energy between different states (such as the free and bound forms of a receptor-ligand system) or along a reaction path (such as the path for ligand binding). The theoretical foundations are well-established (
Shirts and Pande, 2005;
Deng and Roux, 2009) and the techniques have been implemented in molecular dynamics packages such as GROMACS (
Kutzner et al., 2007). In principle, it is possible to directly calculate free energies of binding for ligand-receptor interactions, but only at great computational expense. One example is the determination of the free energies of binding of eight ligands to immunophilin FKBP using a massively parallel computing platform (
Fujitani et al., 2005). The main difficulty lies in the entropic contribution to binding free energies, which requires heavy computational sampling of a large configurational space. Much of the size of this configurational space is due to the solvent (water), so improvement in computational efficiency of free energy calculations can be obtained by using continuum solvation models (which treat water only implicitly) combined with molecular dynamics (
Hou et al., 2011). The computational expense has discouraged extensive application to drug design, but we may anticipate these methods will become increasingly important as computational power increases. In fact, this may already be the case, for example the study by Ge and Roux (
2010) using all-atom molecular dynamics to calculate free energies of binding for several sparsomycin analogs to the bacterial ribosome.
Determination of binding pathways
It is often fruitful to study the action of known inhibitors to gain insight that can be used to design new inhibitors. This includes not only the binding mode of the ligand, but also the specific pathway of binding. Such detailed studies can be easily performed with molecular dynamics simulations. For example, Yang et al. (
2009) used steered molecular dynamics to directly compare two pathways for the dissociation of a type II kinase inhibitor Imatinib and its receptors c-Kit and Abl. In steered molecular dynamics, external forces are applied to the system, driving it along a pathway of interest. In this case, the ligand was forced to dissociate along two different pathways. Equilibrium information, in particular, the free energies for each pathway, can be recovered using the Jarzynski relation (
Liphardt et al., 2002). In this case, the authors were able to determine that the ATP-channel pathway, even though it appears sterically restricted, was more favorable than the allosteric-pocket-channel pathway, due to favorable van der Waals interactions facilitated by the flexibility of residues lining the ATP channel. This information would not be possible to obtain by examination of static structures. In addition, this method has been extended to the comparison of the binding of a series of similar ligands (
Colizzi et al., 2010) where the force profiles were shown to distinguish between active and inactive compounds.
Conclusions
Absence of an experimentally determined structure of a therapeutic target does not necessarily rule out structure-based drug discovery efforts. Protein modeling methods can now construct reasonably accurate models or at least the relevant binding sites of yet uncrystallized targets. Their increasing success in the drug discovery and design process provides further impetus to the protein modeling field. Future developments will aim at cases where there is low template:target sequence identity, loop modeling, and also modeling proteins based on just the sequence details. As computational power continues to rise, we may expect molecular dynamics to take on an increasing role in the development of therapeutic agents. Molecular dynamics is a very general simulation method that can potentially provide detailed information about a wide variety of dynamic systems, but can also be easily adapted for specific needs. The on-going development and testing of sophisticated algorithms for molecular dynamics promises to provide increasing and successful returns in the discovery and design of next generation therapeutics.
Higher Education Press and Springer-Verlag Berlin Heidelberg