Protein Classification

Protein Classification and Search Based on 3D Structure

The proposed protein classification method is based primarily on the 3D shape of a protein and secondarily on its structural characteristics (primary and secondary structure). Taking PDB files as input, the method uses the 3D coordinates of the main atoms of the amino acids to construct a 3D model of the protein. These 3D models are then processed so that the Spherical Trace Transform can be applied to them. This yields descriptor vectors that are completely rotation invariant and capture the 3D shape of each protein in detail. In addition, characteristics describing the primary and secondary structure of each protein are extracted from the PDB files. The geometric descriptors and the structural descriptors together form a compound descriptor vector, which serves as input to a classifier that assigns unclassified protein molecules to categories. A nearest-neighbour classifier is used for this step.
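The pipeline described above (reading PDB coordinates, computing a rotation-invariant geometric descriptor, extracting structural descriptors, concatenating them into a compound vector, and classifying with a nearest-neighbour rule) can be illustrated with a toy Python sketch. This is not the authors' implementation. The Spherical Trace Transform is replaced by a much simpler stand-in (a histogram of C-alpha distances from the centroid, which is also unchanged by rotation and translation), amino-acid composition stands in for the primary/secondary-structure features, and only C-alpha atoms are used. All function names, the bin settings, the weight `w_struct` and the Euclidean distance are hypothetical choices made for illustration.

```python
import math
from collections import Counter

# Assumption: the 20 standard residues; nonstandard residues count toward the total only.
AMINO_ACIDS = ["ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
               "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL"]


def parse_ca_atoms(pdb_text):
    """Return C-alpha coordinates and residue names from PDB ATOM records.

    Simplification: only C-alpha atoms are kept; alternate locations are not filtered.
    """
    coords, residues = [], []
    for line in pdb_text.splitlines():
        # Fixed PDB columns: atom name 13-16, residue name 18-20, x/y/z 31-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residues.append(line[17:20].strip())
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError("no C-alpha atoms found")
    return coords, residues


def radial_histogram(coords, bins=8, r_max=40.0):
    """Stand-in geometric descriptor (NOT the Spherical Trace Transform):
    a normalised histogram of distances from the centroid, which does not
    change when the structure is rotated or translated."""
    n = len(coords)
    centroid = tuple(sum(p[i] for p in coords) / n for i in range(3))
    hist = [0.0] * bins
    for p in coords:
        d = math.dist(p, centroid)
        # Distances at or beyond r_max are clamped into the last bin.
        hist[min(int(d / r_max * bins), bins - 1)] += 1.0 / n
    return hist


def composition(residues):
    """Stand-in structural descriptor: amino-acid composition (primary structure only)."""
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]


def compound_descriptor(pdb_text, w_struct=0.5):
    """Concatenate the geometric and the (weighted) structural descriptors."""
    coords, residues = parse_ca_atoms(pdb_text)
    return radial_histogram(coords) + [w_struct * v for v in composition(residues)]


def nearest_neighbour(query, labelled):
    """1-NN: return the label of the closest (descriptor, label) pair by Euclidean distance."""
    _, label = min(labelled, key=lambda item: math.dist(query, item[0]))
    return label
```

Because distances from the centroid are preserved under rigid rotation, rotating a structure leaves this toy descriptor unchanged, which mirrors the rotation-invariance property the text attributes to the Spherical Trace Transform descriptors; the real descriptors capture far more shape detail than a single radial histogram.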

The dataset used for the performance assessment of our method consists of 2631 protein structures classified into 27 categories according to the FSSP classification scheme, which is maintained on the DALI server at the European Bioinformatics Institute. The 3D structures are taken from the Protein Data Bank (PDB).

Relevant papers

P. Daras, D. Zarpalas, A. Axenopoulos, D. Tzovaras and M. G. Strintzis: “3D Shape-Structure Comparison Method for Protein Classification”, IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 3, No. 3, pp. 193-207, July 2006.