As their name indicates (from the Greek proteios, "primary"), proteins are macromolecules of primary importance to the structure and function of living cells. Understanding and describing their biological function is essential to understanding their role in the cellular machinery. Proteins are flexible macromolecules that often change shape as needed to interact with other molecules, and these structural fluctuations are related to their function. All the different structures that a protein assumes under physiological conditions (often referred to as equilibrium conditions, corresponding to room temperature of 300 K) contribute to its biological function. For example, the equilibrium fluctuations of a protein such as ubiquitin allow it to bind multiple partner molecules and thus carry out a vast array of biological functions. Therefore, to describe and understand the biological function of a protein, it is important to model all the different structures that the protein can assume at equilibrium.
Experimental techniques such as X-ray crystallography, Nuclear Magnetic Resonance (NMR), and cryo-Electron Microscopy (cryoEM) offer a limited view into one or a few structures available to a protein at equilibrium. Traditionally, obtaining a thorough picture of all equilibrium structures of a protein has been the task of computational techniques. Currently, Molecular Dynamics (MD) and Monte Carlo (MC) simulations are limited in their ability to model structural fluctuations that occur on timescales beyond nanoseconds. This limitation is severe: fluctuations that are important for understanding and describing function may occur on much longer timescales, such as microseconds or even milliseconds.
Therefore, we address the following problem: Given an experimentally determined structure of a protein, model equilibrium fluctuations around this structure with no timescale limitations. The goal is to obtain an ensemble of structures that a protein assumes at equilibrium. The obtained ensemble needs to be representative of all the possible equilibrium structural fluctuations of a protein.
We take an approach that is complementary to current simulation techniques. Drawing from our expertise in sampling robot configurations to obtain a representation of their free configuration space, we:

- model proteins as articulated manipulators
- sample manipulator configurations corresponding to protein structures
- refine the energy of each obtained structure
- weight each structure by its Boltzmann probability
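The last step, Boltzmann weighting, can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names, the use of kcal/mol as the energy unit, and the example energies are all assumptions made here for the sake of the example.

```python
import numpy as np

# Boltzmann constant in kcal/(mol*K); the energy unit is an assumption of this sketch.
K_B = 0.0019872041

def boltzmann_weights(energies, temperature=300.0):
    """Return normalized Boltzmann probabilities for energies given in kcal/mol."""
    e = np.asarray(energies, dtype=float)
    beta = 1.0 / (K_B * temperature)
    # Shift by the minimum energy for numerical stability; the shift cancels on normalization.
    w = np.exp(-beta * (e - e.min()))
    return w / w.sum()

def ensemble_average(values, energies, temperature=300.0):
    """Boltzmann-weighted average of a per-structure quantity over the ensemble."""
    p = boltzmann_weights(energies, temperature)
    return float(np.dot(p, np.asarray(values, dtype=float)))

# Hypothetical energies for three sampled structures (illustrative values only).
example_energies = [-120.0, -119.5, -118.0]
example_weights = boltzmann_weights(example_energies)
```

Lower-energy structures receive larger weights, so any quantity averaged over the ensemble is dominated by the conformations that are most probable at 300 K.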
Consider one fragment of the polypeptide chain from amino acid a to amino acid b. This fragment can correspond, for example, to a loop. Loops on the surface of proteins are especially mobile. Modeling the equilibrium fluctuations of the fragment corresponding to the loop requires finding configurations of the fragment in which amino acids a and b remain connected to the rest of the polypeptide chain. This is illustrated in the following figure, left, for the 12 amino acid loop of cytochrome inhibitor 2. We have developed a Fragment Ensemble Method (FEM) to obtain an ensemble of physical configurations of a particular fragment of a protein polypeptide chain.
![]()
![]()
Left: The fragment corresponding to the loop, in grey, needs to connect to the rest of the protein structure, in blue. This introduces spatial constraints on the end points of the fragment. Right: The main steps of FEM, described below, are illustrated.
FEM first strips a fragment of its side chains. This coarse graining makes it possible to model the resulting backbone of the fragment as an open kinematic chain. Analogies between the backbone and an open kinematic chain are then exploited to sample backbone conformations in the same way that configurations of the kinematic chain are sampled. Cyclic Coordinate Descent (CCD), an inverse kinematics technique developed in the context of kinematic chains, is applied to each chain configuration to satisfy the end point constraints. Finally, the side chains are put back on the backbone and optimal dihedral angles are sampled for the side chains. Energetic refinement of the entire fragment minimizes unfavorable interactions. The refinement focuses mostly on the fragment while allowing small, necessary fluctuations in the rest of the protein structure. In a statistical mechanics framework, each resulting configuration is weighted by its Boltzmann probability, which measures the feasibility of the configuration at equilibrium.
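To make the CCD step concrete, here is a minimal sketch on a simplified analog: a planar chain of revolute joints whose free end must reach a fixed target point. This is an assumption-laden toy, not the FEM code. A protein backbone is three-dimensional, its degrees of freedom are dihedral angles, and loop closure constrains several end atoms at once. The function names, link lengths, and target below are hypothetical.

```python
import numpy as np

def forward_kinematics(angles, lengths):
    """Joint positions of a planar chain; angles are relative joint angles in radians."""
    points = [np.zeros(2)]
    heading = 0.0
    for angle, length in zip(angles, lengths):
        heading += angle
        step = length * np.array([np.cos(heading), np.sin(heading)])
        points.append(points[-1] + step)
    return np.array(points)

def ccd(angles, lengths, target, tol=1e-6, max_sweeps=500):
    """Adjust one joint at a time so the chain's free end approaches the target."""
    angles = np.array(angles, dtype=float)
    target = np.asarray(target, dtype=float)
    for _ in range(max_sweeps):
        # Sweep from the joint nearest the free end back to the base.
        for i in reversed(range(len(angles))):
            points = forward_kinematics(angles, lengths)
            to_end = points[-1] - points[i]
            to_target = target - points[i]
            # Signed angle that rotates the joint-to-end vector onto the joint-to-target vector.
            cross = to_end[0] * to_target[1] - to_end[1] * to_target[0]
            dot = to_end[0] * to_target[0] + to_end[1] * to_target[1]
            angles[i] += np.arctan2(cross, dot)
        error = np.linalg.norm(forward_kinematics(angles, lengths)[-1] - target)
        if error < tol:
            break
    error = np.linalg.norm(forward_kinematics(angles, lengths)[-1] - target)
    return angles, float(error)

# Hypothetical six-link chain with unit link lengths and a reachable target.
start_angles = [0.3] * 6
link_lengths = [1.0] * 6
closed_angles, closure_error = ccd(start_angles, link_lengths, target=[2.0, 3.0])
```

Each single-joint update is the rotation that brings the free end as close to the target as that joint alone allows, so the end-to-target distance never increases from one update to the next.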
FEM models the equilibrium fluctuations of a particular fragment of a polypeptide chain. To model equilibrium fluctuations of an entire protein, we have developed a Protein Ensemble Method (PEM). PEM first divides the protein polypeptide chain into consecutive fragments with significant overlap. This is illustrated below on the 123 amino acid sequence of alpha-lactalbumin.
![]()
![]()
Left: The main steps of PEM, described below, are illustrated. Right: PEM-measured RMSD values of the amino acids of the entire polypeptide chain of alpha-lactalbumin are shown.
Sliding a window of 30 amino acids over the sequence defines 19 fragments, where neighboring fragments overlap by 25 amino acids. For each fragment, FEM is applied to obtain an ensemble of low-energy fragment conformations. These are pictorially illustrated by the ensembles inside each window. The final step of PEM combines fluctuations of neighboring fragments to obtain equilibrium fluctuations of the entire chain. The end result of PEM is illustrated for a measurement of the root-mean-squared deviation (RMSD) of each amino acid of the polypeptide chain of alpha-lactalbumin. RMSD values of amino acids belonging to different fragments are colored differently. PEM can measure different quantities over the fragment ensembles. We typically measure thermodynamic quantities such as order parameters, scalar couplings, and residual dipolar couplings to compare the PEM-obtained equilibrium fluctuations to NMR dynamics data.
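The fragment definition can be sketched in a few lines. Two points here are assumptions rather than statements from the text: the step of 5 residues is inferred from the window length (30) minus the overlap (25), and the last window is extended to the end of the chain so that the 19 windows cover all 123 residues. The function name is hypothetical.

```python
def sliding_fragments(n_residues, window=30, overlap=25):
    """Return (start, end) residue index pairs, 0-based with end exclusive."""
    step = window - overlap
    starts = range(0, n_residues - window + 1, step)
    fragments = [(s, s + window) for s in starts]
    # Assumption: stretch the final window so trailing residues are not left uncovered.
    if fragments and fragments[-1][1] < n_residues:
        fragments[-1] = (fragments[-1][0], n_residues)
    return fragments

# The 123 amino acid chain of alpha-lactalbumin.
fragments = sliding_fragments(123)
```

With these settings every residue away from the chain termini falls inside several windows, which is what lets the fluctuations of neighboring fragments be combined into fluctuations of the entire chain.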
We have applied FEM to model equilibrium fluctuations of loops in proteins such as cytochrome inhibitor 2 and variable surface antigen. The obtained ensembles are shown in the figures below.
![]()
![]()
Left: The obtained loop conformations are shown in transparent, superimposed over the lowest energy structure generated, shown in opaque. The X-ray structure of the variable surface antigen misses this loop due to the mobility of the loop causing disorder in the crystal. The heterogeneity of the obtained ensemble agrees well with the hypothesized high equilibrium mobility of the loop. Right: Measured fluctuations of the loop are compared to PONDR scores that measure mobility given sequence information alone. The datasets are normalized for the comparison since they are of different magnitudes. The agreement is significant, even though the purpose of the comparison of the datasets is mostly qualitative.
We have applied PEM to model equilibrium fluctuations of entire proteins such as ubiquitin and protein G. For each protein, the obtained ensembles are compared to available NMR data that measure equilibrium fluctuations over a broad range of timescales, from picoseconds to milliseconds. In each case we obtain very high correlations, as the figures below indicate.
![]()
![]()
Left: The obtained conformations for protein G are shown in transparent, superimposed over the experimentally available protein G native structure, shown in opaque. Right: Amide order parameters measured over the generated ensemble are compared to order parameters that quantify reorientations of the amide bond that occur on slow timescales. The agreement is high, with a Pearson correlation of 83%.
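For readers unfamiliar with this kind of comparison, the sketch below shows one standard way to compute a generalized order parameter S² from an ensemble of bond vectors and to correlate computed values with experimental ones. It is an illustration under assumptions, not the procedure used to produce the figures: it presumes the structures are already superimposed on a common frame, the optional weights stand in for Boltzmann probabilities, and all function names are hypothetical.

```python
import numpy as np

def order_parameter(bond_vectors, weights=None):
    """Generalized order parameter S^2 = (3/2) * sum_ij <mu_i mu_j>^2 - 1/2."""
    v = np.asarray(bond_vectors, dtype=float)
    # Normalize each bond vector (for example, an amide N-H vector) to unit length.
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    n = len(v)
    if weights is None:
        w = np.full(n, 1.0 / n)
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()
    # Weighted ensemble average of the outer products mu_i * mu_j.
    m = np.einsum("k,ki,kj->ij", w, v, v)
    return float(1.5 * np.sum(m ** 2) - 0.5)

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    return float(np.corrcoef(x, y)[0, 1])

# A bond that never reorients gives S^2 = 1.
rigid_s2 = order_parameter([[0.0, 0.0, 1.0]] * 4)
```

S² ranges from 0 for a bond that samples all orientations uniformly to 1 for a bond that stays fixed in the molecular frame, so one value per residue gives a profile that can be correlated directly with NMR-derived order parameters.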
![]()
![]()
![]()
Left: The obtained conformations for ubiquitin are shown in transparent, superimposed over the experimentally available ubiquitin native structure, shown in opaque. Middle: Amide and methyl order parameters measured over the generated ensemble correlate with order parameters available from NMR with a Pearson correlation of 97%. Right: Residual dipolar couplings measured over the generated ensemble correlate with residual dipolar couplings available from NMR with a Pearson correlation of 95%. The agreement of the obtained ensemble with these NMR data is significant: long MD simulations of up to 6 ns in explicit water fail to capture these NMR data.
Our applications of PEM show that the equilibrium fluctuations modeled by the method agree very well with available experimental data. Obtaining a good agreement with NMR data that span multiple timescales is highly non-trivial. NMR data such as methyl order parameters, scalar couplings, and residual dipolar couplings may report equilibrium fluctuations that occur on timescales as slow as milliseconds. These results have prompted us to validate our method even more and investigate possible extensions.
Collaborations: This work has been conducted in collaboration with Dr. Cecilia Clementi. It is part of a larger project that seeks to computationally characterize and analyze conformational changes of molecules at different time-scales.