Alignment scoring of protein sequences

To study the phylogenetic relationships between proteins, their amino acid sequences are aligned so that similarity is maximized. Each amino acid replacement is then scored according to its likelihood. For example, a Gly-Ala change is more likely than a Gly-Pro change, which would alter the 3D structure of the protein. From these scores the degree of homology can be derived. The resulting evolutionary distances are usually depicted as a phylogenetic tree. To validate the result, the analysis is repeated many times on resampled versions of the alignment (bootstrapping) and the resulting trees are compared with one another; only the most likely phylogenetic tree is shown. Each node is assigned a value based on the proportion of bootstrapped trees showing that same clade. Nodes with a value greater than 70% are generally considered well supported.
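The scoring and validation steps can be illustrated with a minimal Python sketch. It rests on several assumptions that are not part of the text above: the substitution scores are the BLOSUM62 entries for Gly, Ala and Pro (the text does not name a particular matrix), the gap penalty of -4 is an arbitrary illustrative value, a bootstrap replicate is built by drawing alignment columns with replacement, and each replicate tree is represented simply as a set of clades. All function names are hypothetical, and tree building itself is left out.

```python
import random
from collections import Counter

# Assumption: BLOSUM62 scores for the three residues in the example.
# The matrix is symmetric, so each unordered pair is stored once.
BLOSUM62_SUBSET = {
    ("G", "G"): 6, ("A", "A"): 4, ("P", "P"): 7,
    ("G", "A"): 0, ("G", "P"): -2, ("A", "P"): -1,
}


def pair_score(a, b):
    """Look up the score for aligning residue a with residue b."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def alignment_score(seq1, seq2, gap_penalty=-4):
    """Sum the per-column scores of two aligned sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for a, b in zip(seq1, seq2):
        if a == "-" or b == "-":
            total += gap_penalty  # illustrative linear gap cost
        else:
            total += pair_score(a, b)
    return total


def resample_columns(alignment, rng):
    """Build one bootstrap replicate by drawing columns with replacement."""
    n_cols = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in picks)
            for name, seq in alignment.items()}


def clade_support(replicate_trees):
    """Percentage of replicate trees that contain each clade.

    Each tree is a set of clades; each clade is a frozenset of leaf names.
    """
    counts = Counter(clade for tree in replicate_trees for clade in tree)
    n = len(replicate_trees)
    return {clade: 100.0 * c / n for clade, c in counts.items()}


print(alignment_score("GAP", "GAP"))  # 17: identical sequences
print(alignment_score("GAP", "AAP"))  # 11: Gly-Ala replacement
print(alignment_score("GAP", "PAP"))  # 9: Gly-Pro replacement
```

With these scores the Gly-Ala replacement costs less than the Gly-Pro replacement, matching the example above. In a full analysis, a tree would be inferred from each replicate returned by `resample_columns`, and `clade_support` would then give the node values that are compared against the 70% threshold.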

The resulting phylogenetic tree provides insight into the taxonomy of the different species. In the case of proteins, close homologs usually carry out similar functions. When well-studied proteins are included in the phylogenetic analysis, the biological function of closely related enzymes can be inferred.
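As a rough illustration of this kind of inference, the sketch below assigns an uncharacterized protein the function of its closest annotated homolog. Everything in it is hypothetical: the protein names, the distances, the annotations and the distance cutoff of 0.5 are invented for the example, and a real analysis would also weigh the tree topology and the node support rather than distance alone.

```python
def infer_function(query, distances, annotations, max_distance=0.5):
    """Return (function, source protein) of the closest annotated homolog.

    distances maps unordered protein pairs (as tuples) to evolutionary
    distances; annotations maps well-studied proteins to their function.
    Returns None if no annotated protein lies within max_distance.
    """
    best = None  # (distance, protein name) of the best candidate so far
    for (a, b), d in distances.items():
        if query not in (a, b):
            continue
        other = b if a == query else a
        if other in annotations and d <= max_distance:
            if best is None or d < best[0]:
                best = (d, other)
    if best is None:
        return None
    return annotations[best[1]], best[1]


# Hypothetical toy data: two characterized proteins, one unknown.
distances = {
    ("unknownX", "enzymeA"): 0.12,
    ("unknownX", "enzymeB"): 0.80,
    ("enzymeA", "enzymeB"): 0.75,
}
annotations = {"enzymeA": "kinase", "enzymeB": "protease"}

print(infer_function("unknownX", distances, annotations))
```

The cutoff is a design choice: without it, every protein would inherit some function even when its nearest characterized relative is only distantly related.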