### abstract ###
Comparison of elastic network model predictions with experimental data has provided important insights on the dominant role of the network of inter-residue contacts in defining the global dynamics of proteins.
Most of these studies have focused on interpreting the mean-square fluctuations of residues, or deriving the most collective, or softest, modes of motion, which are known to be insensitive to structural and energetic details.
However, with increasing structural data, we are in a position to perform a more critical assessment of the structure-dynamics relations in proteins, and gain a deeper understanding of the major determinants of not only the mean-square fluctuations and lowest frequency modes, but also the covariance or cross-correlations between residue fluctuations and the shapes of higher modes.
A large set of NMR-determined protein structures is systematically analyzed using a novel method based on entropy maximization to demonstrate that the next level of refinement in the elastic network model description of proteins ought to take into consideration properties such as contact order and the secondary structure types of the interacting residues, whereas the types of amino acids do not play a critical role.
Most importantly, an optimal description of observed cross-correlations requires the inclusion of destabilizing, as opposed to exclusively stabilizing, interactions, pointing to the functional significance of local frustration in imparting native-like dynamics.
This study provides us with a deeper understanding of the structural basis of experimentally observed behavior, and opens the way to the development of more accurate models for exploring protein dynamics.
### introduction ###
Associated with each protein fold is a set of intrinsically accessible global motions that arise solely from the 3-dimensional geometry of the fold and involve the entire architecture.
For a number of systems it has been shown that these intrinsic motions play an important role in protein function CITATION, facilitating events such as recognition and binding CITATION, CITATION, catalysis CITATION, CITATION and allosteric regulation CITATION, CITATION, CITATION.
The time scales of these cooperative motions are usually beyond the reach of conventional molecular dynamics (MD) simulations.
They are modeled instead with coarse-grained techniques that omit the finer details of atomic interactions.
The elastic network model (ENM) is an example of a coarse-grained model that has enjoyed considerable success in predicting global dynamics of proteins and other macromolecules.
The central idea behind the ENM is that, in the vicinity of a minimum, the potential energy landscape of a biomolecular system can be approximated by the sum of pairwise harmonic potentials that stabilize the native contacts.
In the simplest ENM, the Gaussian network model (GNM) CITATION, each node of the network represents an amino acid, and each edge is a spring that provides a linear restoring force to deviations from the minimum-energy structure.
The system's dynamics is therefore expressed in terms of the normal modes of vibration of the many-body system about its equilibrium state; and dynamical information about the protein, such as the expectation values of residue fluctuations or cross-correlations, is uniquely defined by the network topology.
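The GNM description above can be illustrated with a minimal sketch. It assumes one node per residue placed at a representative atom, a uniform spring constant, and an illustrative 7.0 Å cutoff; the function name `gnm_covariance` and all parameter values are hypothetical and are not taken from the study. The residue covariance is obtained, up to a constant prefactor, by inverting the connectivity (Kirchhoff) matrix over its non-zero modes.

```python
import numpy as np

def gnm_covariance(coords, cutoff=7.0, tol=1e-8):
    """Return the Kirchhoff matrix and its pseudo-inverse for a simple GNM.

    coords : (N, 3) array of node positions, one node per residue.
    cutoff : contact distance in Angstroms (illustrative value, an assumption).
    tol    : eigenvalues below this threshold are treated as zero modes.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # Pairwise distances between all nodes.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Two distinct nodes are connected by a spring if closer than the cutoff.
    contact = (dist < cutoff) & ~np.eye(n, dtype=bool)
    # Kirchhoff matrix: -1 for each contact, node degree on the diagonal.
    kirchhoff = -contact.astype(float)
    kirchhoff[np.diag_indices_from(kirchhoff)] = contact.sum(axis=1)
    # Normal modes: eigenvectors of the Kirchhoff matrix.
    vals, vecs = np.linalg.eigh(kirchhoff)
    keep = vals > tol  # discard the zero mode(s)
    # Pseudo-inverse built from the non-zero modes; slow modes contribute most.
    cov = (vecs[:, keep] / vals[keep]) @ vecs[:, keep].T
    return kirchhoff, cov

# Usage: four nodes on a straight line, 3.8 Angstroms apart.
chain = [[0.0, 0, 0], [3.8, 0, 0], [7.6, 0, 0], [11.4, 0, 0]]
kirchhoff, cov = gnm_covariance(chain)
```

The diagonal of `cov` is proportional to the predicted mean-square fluctuations and the off-diagonal elements to the cross-correlations, so both follow from the network topology alone in this simplified picture.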
A few prevalent methods are used for constructing ENMs, but most have at their core two underlying assumptions: the springs are all at their rest lengths in the equilibrium conformation, and the force constants decrease with the distance between nodes, among other variables.
In the earliest models CITATION, CITATION and the anisotropic network model (ANM) CITATION, CITATION, force constants were taken to be uniform for all nodes separated by a distance less than a specified cutoff distance and zero for greater distances.
In parallel, models were proposed in which the force constants decay exponentially CITATION, CITATION or as an inverse power of distance CITATION, CITATION, or where stronger interactions are assigned to sequentially adjacent residues CITATION, CITATION, CITATION.
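The force-constant schemes mentioned above can be summarized in a short sketch. The functional forms follow the verbal descriptions in the text; the function names and every parameter value (`r_cut`, `r0`, `exponent`, `bonded_factor`) are hypothetical placeholders chosen for illustration, not values taken from the cited models.

```python
import math

def cutoff_spring(r, r_cut=7.0, gamma=1.0):
    """Uniform force constant below a cutoff distance, zero beyond it."""
    return gamma if r < r_cut else 0.0

def exponential_spring(r, r0=3.0, gamma=1.0):
    """Force constant that decays exponentially with distance."""
    return gamma * math.exp(-r / r0)

def power_law_spring(r, exponent=2.0, gamma=1.0):
    """Force constant that decays as an inverse power of distance."""
    return gamma / r ** exponent

def chain_weighted_spring(r, i, j, r_cut=7.0, gamma=1.0, bonded_factor=10.0):
    """Cutoff model with stronger springs between sequence neighbors."""
    base = cutoff_spring(r, r_cut, gamma)
    return base * bonded_factor if abs(i - j) == 1 else base

# Usage: compare the schemes for a pair of residues 5.0 Angstroms apart.
for scheme in (cutoff_spring, exponential_spring, power_law_spring):
    print(scheme.__name__, scheme(5.0))
```

Writing each scheme as a function of distance (and, where relevant, residue indices) makes it straightforward to swap one for another when building the network.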
Although such modifications can lead to modest improvements in the agreement between ENM predictions and certain experimental data, there is still no clear best method for assigning force constants in an ENM.
A common approach for assessing the performance of ENMs or estimating their force constants has been to compare the ENM-derived autocorrelations of residue motions to the corresponding X-ray crystallographic B-factors or the mean-square fluctuations in residue coordinates observed between NMR models.
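A comparison of this kind might be sketched as follows. The sketch assumes the standard isotropic relation between B-factors and mean-square fluctuations, B = (8π²/3)⟨ΔR²⟩, and uses the Pearson correlation coefficient as the agreement measure; the choice of measure and the function names are assumptions for illustration and may differ from what a given study uses.

```python
import math

def msf_to_bfactor(msf):
    """Convert a mean-square fluctuation (A^2) to an isotropic B-factor (A^2)."""
    return 8.0 * math.pi ** 2 / 3.0 * msf

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Usage with made-up numbers: predicted fluctuations vs. observed B-factors.
predicted_msf = [0.9, 0.4, 0.3, 0.5, 1.1]
observed_b = [30.0, 14.0, 12.0, 15.0, 33.0]
agreement = pearson([msf_to_bfactor(m) for m in predicted_msf], observed_b)
```

Because the correlation coefficient is insensitive to a uniform scale factor, such a comparison tests the profile of the fluctuations rather than the absolute value of the force constant.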
Because the slow modes have the largest amplitudes, often the focus of study has been a narrow band of the slowest modes.
The ENM slow modes have indeed been shown to agree well with those predicted by detailed atomic-level force fields and with experimentally determined dynamics CITATION, CITATION.
However, the majority of the dynamical information conveyed by the ENM is contained in the residue cross-correlations, and this information has been largely overlooked during comparisons of ENM results to experimental data.
Further, the subtle and complex dynamics of the structures that lie beneath the gross global motions are ignored when only the slowest modes are considered.
Mid- and high-frequency modes are predicted with relatively lower confidence by ENMs, but these modes may be important for coordinating the finer motions of the molecule while the slower modes orchestrate its global rearrangements CITATION.
Finally, while the ENM-based studies have shown that the network topology is the dominant factor that defines the collective modes, especially those in the low frequency regime, there may be other structural properties that are not accounted for by ENMs but which may provide a more realistic description of equilibrium dynamics, if accurately modeled.
Here we examine the ensembles of structural models determined by NMR for 68 proteins and evaluate for each ensemble the covariance in the deviations of residue positions from their mean values.
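As a rough illustration of this step, the sketch below computes a residue-by-residue covariance matrix from an ensemble of models. It assumes the models have already been optimally superposed and that each residue is represented by a single position; the array layout and function names are hypothetical.

```python
import numpy as np

def ensemble_covariance(models):
    """Covariance of residue deviations across an ensemble.

    models : (M, N, 3) array holding M superposed models of N residues.
    Returns an (N, N) matrix with entries <dR_i . dR_j>.
    """
    models = np.asarray(models, dtype=float)
    mean_structure = models.mean(axis=0)
    deviations = models - mean_structure  # shape (M, N, 3)
    # Dot product of deviations for each residue pair, averaged over models.
    return np.einsum("mik,mjk->ij", deviations, deviations) / models.shape[0]

def normalized_correlations(cov):
    """Scale a covariance matrix so that its diagonal equals one.

    Assumes every residue has a non-zero mean-square fluctuation.
    """
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

# Usage: two toy models in which both residues shift together along x.
toy = [[[0.0, 0, 0], [1.0, 0, 0]],
       [[2.0, 0, 0], [3.0, 0, 0]]]
cov = ensemble_covariance(toy)
```

The diagonal of the result holds the mean-square fluctuations (autocorrelations) and the off-diagonal elements hold the cross-correlations between residue pairs.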
We present a technique for optimizing ENM force constants within a pre-defined network topology so as to provide the most accurate representation of the experimentally observed covariance data.
Our method is based on the concept of entropy maximization: Briefly, when inferring the form of an unknown probability distribution, the one that is least reliant on the form of missing data is that which maximizes the system's entropy subject to constraints imposed by the available data CITATION, CITATION.
This method has been applied to a variety of biological problems, including neural networks CITATION, gene interaction networks CITATION, and protein folding CITATION.
The resulting auto- and cross-correlations in residue fluctuations are used to build an ENM-based model with optimal force constants.
It can be shown that when the constraints of the maximization are pair correlations, the probability distribution takes a Gaussian form.
Further, the only terms that contribute to the probability distribution are those that correspond to pairs with correlations that are explicitly considered as constraints on the entropy maximization.
In terms of the ENM, this means that for a given network topology, there exists a unique set of force constants that exactly reproduces the experimentally observed cross-correlations between all pairs of interacting residues, along with their autocorrelations.
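The argument can be sketched in equations. The notation here is an assumption introduced for illustration: x_i is a scalar fluctuation of node i, E is the symmetric set of constrained pairs (the network edges plus all i = j), C_ij are the observed covariances, and Γ_ij are the Lagrange multipliers that enforce the constraints.

```latex
% Sketch under stated assumptions; the symbols x_i, E, C_{ij}, \Gamma_{ij}
% are illustrative and not necessarily the notation of the study.
\begin{align}
  S[P] &= -\int P(\mathbf{x}) \ln P(\mathbf{x})\, d\mathbf{x},
  \qquad \langle x_i x_j \rangle = C_{ij} \quad \text{for } (i,j) \in E \\
  P(\mathbf{x}) &= \frac{1}{Z} \exp\!\Big( -\tfrac{1}{2} \sum_{(i,j) \in E} \Gamma_{ij}\, x_i x_j \Big),
  \qquad \Gamma_{ij} = 0 \quad \text{for } (i,j) \notin E \\
  \big( \Gamma^{-1} \big)_{ij} &= C_{ij} \quad \text{for } (i,j) \in E
\end{align}
```

Maximizing S subject to the pair constraints gives the Gaussian form in the second line, and the third line fixes the multipliers. Read as an ENM in units where k_B T = 1, the off-diagonal element Γ_ij plays the role of minus the force constant of the spring between nodes i and j, so a positive off-diagonal Γ_ij would correspond to a destabilizing interaction. For a network whose connectivity matrix has a zero mode, the inverse is understood as a pseudo-inverse over the non-zero modes.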
Notably, our technique captures the physical significance of factors such as sequence separation and spatial distance which have been empirically found to influence force constant strengths.
Sequence separation is expressed in terms of contact order, i.e., the number of residues along the sequence between two residues that are connected by a spring in the ENM.
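A minimal sketch of how network springs could be grouped by contact order is shown below. It takes contact order to be the sequence separation |i − j| between the two residue indices, which is one reading of the definition above and is stated here as an assumption; the function name is hypothetical.

```python
from collections import defaultdict

def group_by_contact_order(edges):
    """Group network edges (i, j) by sequence separation |i - j|."""
    groups = defaultdict(list)
    for i, j in edges:
        groups[abs(i - j)].append((i, j))
    return dict(groups)

# Usage: two springs between sequence neighbors and one longer-range contact.
grouped = group_by_contact_order([(0, 1), (0, 4), (3, 2)])
```

Grouping edges this way allows force constants to be averaged or compared within each contact-order class.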
Further, our analysis, benchmarked against a test set of 41 NMR ensembles of proteins, suggests additional factors, including hydrogen bond formation and secondary structure type, which should also be incorporated in the ENMs for a more accurate description of experimental data.
It also identifies factors that are of little consequence insofar as the collective dynamics near equilibrium conditions are concerned.
Amino acid specificity turns out to be one of them; diffuse, overlapping distributions of optimal force constants (OFCs) are obtained for different types of amino acids, precluding the assignment of residue-specific OFCs.
A modified version of the GNM, mGNM, that accounts for these factors is proposed and is verified to perform better than existing models, especially in reproducing cross-correlations.
Finally, the study highlights the importance of higher modes and the role of frustration in protein dynamics, the implications of which are discussed with regard to model development and protein design.
