If you use the LabelHash web server or the LabelHash command line version, we kindly ask that you acknowledge us. You can use the first citation below. This paper describes the LabelHash algorithm in detail, but does not include a detailed description of the web interface and the visualization front-end. Check back in a few months to see if anything has changed.
[1] M. Moll and L.E. Kavraki. Matching of Structural Motifs Using Hashing on Residue Labels and Geometric Filtering for Protein Function Prediction. The Seventh Annual International Conference on Computational Systems Bioinformatics (CSB2008), Stanford, CA, 2008.
If you have any questions about LabelHash, please contact Mark Moll by email at “mmoll” AT cs.rice.edu.
The full PDB had 54,076 entries on November 11, 2008. Typically, each entry consists of more than one chain. Each individual chain is inserted into a LabelHash table. The total number of chains is 130,992. In the non-redundant PDB with less than 95% sequence identity there are 20,655 chains. The LabelHash tables contain partial matches of three residues that are somewhat close together and near the molecular surface (see the CSB2008 paper for details). The LabelHash tables for the full PDB contain 15.6 billion such 3-tuples, whereas the tables for the non-redundant PDB contain 2.4 billion 3-tuples. The total file sizes of the LabelHash tables for the full PDB and the non-redundant PDB are 236 GB and 36 GB, respectively. The matching program runs in parallel on a dual quad-core 2.33 GHz machine.
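To put these figures in perspective, the short Python sketch below derives a few averages from them: chains per PDB entry, 3-tuples per chain, and bytes of table storage per 3-tuple. It is only back-of-the-envelope arithmetic on the numbers quoted above, not part of LabelHash. The class and variable names are made up for illustration, and it assumes that "GB" means 10^9 bytes, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """Summary figures for one set of LabelHash tables (hypothetical helper)."""
    chains: int        # number of chains inserted into the tables
    tuples: float      # number of stored 3-tuples
    size_bytes: float  # total size of the table files on disk

    @property
    def tuples_per_chain(self) -> float:
        # Average number of 3-tuples contributed by one chain.
        return self.tuples / self.chains

    @property
    def bytes_per_tuple(self) -> float:
        # Average on-disk cost of one 3-tuple, including any table overhead.
        return self.size_bytes / self.tuples


GB = 1e9  # assumption: decimal gigabytes; the text does not say which unit is meant

# Figures quoted in the text (PDB snapshot of November 11, 2008).
full_pdb = TableStats(chains=130_992, tuples=15.6e9, size_bytes=236 * GB)
nonredundant = TableStats(chains=20_655, tuples=2.4e9, size_bytes=36 * GB)

# Average number of chains per entry in the full PDB.
chains_per_entry = 130_992 / 54_076

if __name__ == "__main__":
    print(f"chains per entry (full PDB): {chains_per_entry:.2f}")
    for name, stats in (("full PDB", full_pdb), ("non-redundant PDB", nonredundant)):
        print(f"{name}: {stats.tuples_per_chain:,.0f} 3-tuples per chain, "
              f"{stats.bytes_per_tuple:.1f} bytes per 3-tuple")
```

Under these assumptions, both table sets come out at roughly 15 bytes per 3-tuple and a little under 120,000 3-tuples per chain, so the non-redundant tables are smaller mainly because they cover fewer chains. These are averages only; individual chains can differ widely.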
This project was performed pursuant to Baylor College of Medicine Grant No. DBI-054795 from the National Science Foundation. Lydia Kavraki has also been supported by a Sloan Fellowship. The computers used to carry out experiments for this project were funded by NSF CNS-0454333 and NSF CNS-0421109 in partnership with Rice University, AMD and Cray.
We are indebted to Dr. Slava Fofanov and Dr. Marek Kimmel from the Statistics Department at Rice University for their contributions to the statistical analysis and for their comments on LabelHash. We are also deeply grateful for the help of Dr. Brian Chen with MASH and for the earlier contributions of Dr. Olivier Lichtarge, Dr. David Kristensen and Dr. Andreas Martin Lisewski within the context of the above-mentioned NSF-funded project.
LabelHash includes parts of the OOPSMP package related to proximity data structures, coordinate transformations and random number generation. This code was written by Erion Plaku in the Physical and Biological Computing Group. LabelHash also relies on several other external programs and libraries: CMake, MPI, msms, fftw3, LAPACK, libxml2, zlib, Chimera, and Python. The ViewMatch plugin uses the PDB-to-GO mapping from the Jena Library and the PDB-to-EC mapping from PDBSProtEC, and relies on the PDBsum server for additional protein information.