Structural genomics projects such as the Protein Structure Initiative (PSI) yield many new
structures, but often these have no known molecular functions. One approach to recover this
information is to use 3D templates—structure-function motifs that consist of a few functionally
critical amino acids and may suggest functional similarity when geometrically matched to other
structures. Since experimentally determined functional sites are not common enough to define
3D templates on a large scale, this work tests a computational strategy to select relevant residues
for 3D templates.
Based on evolutionary information and heuristics, an Evolutionary Trace Annotation (ETA)
pipeline built templates for 98 enzymes, half taken from the PSI, and sought matches in a non-
redundant structure database. On average each template matched 2.7 distinct proteins, of which
2.0 share the first three Enzyme Commission digits with the template’s enzyme of origin. In many
cases (61%) a single most likely function could be predicted as the annotation with the most
matches, and in these cases such a plurality vote identified the correct function with 87%
accuracy. ETA was also found to be complementary to sequence homology-based annotations.
When matches are required to both geometrically match the 3D template and to be sequence
homologs found by BLAST or PSI-BLAST, the annotation accuracy is greater than either
method alone, especially in the region of lower sequence identity where homology-based
annotations are least reliable.
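
The plurality vote and the combination with sequence homology described above can be illustrated with a minimal sketch. It is not the ETA pipeline itself: the function names, the input layout (a mapping from protein identifier to EC number), and the choice to return no prediction on a tie are assumptions made for illustration only.

```python
from collections import Counter


def ec3(ec_number):
    """Truncate an EC number to its first three digits, e.g. '3.4.21.4' -> '3.4.21'."""
    return ".".join(ec_number.split(".")[:3])


def plurality_vote(match_ecs):
    """Return the three-digit EC with the most matches.

    Returns None when there are no matches or when the top count is tied
    (assumption: a tie means no single most likely function).
    """
    counts = Counter(ec3(ec) for ec in match_ecs)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def combined_matches(template_matches, homologs):
    """Keep only proteins that are both geometric template matches and
    sequence homologs (hypothetical inputs: dict of id -> EC, set of ids)."""
    return {pid: ec for pid, ec in template_matches.items() if pid in homologs}


# Hypothetical example data, not taken from the study.
matches = {"protA": "3.4.21.4", "protB": "3.4.21.5", "protC": "1.1.1.1"}
blast_hits = {"protA", "protB"}

print(plurality_vote(matches.values()))
print(plurality_vote(combined_matches(matches, blast_hits).values()))
```

Voting at three-digit EC resolution mirrors the level at which matches are compared with the template's enzyme of origin in the results above. Requiring both criteria simply intersects the two match sets before voting.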
These data suggest that knowledge of evolutionarily important residues improves functional
annotation among distant enzyme homologs. Since, unlike other 3D template approaches, the
ETA method bypasses the need for experimental knowledge of the catalytic mechanism, it
should prove a useful, large scale, and general adjunct to combine with other methods to
decipher protein function in the structural proteome.