The tool works in two steps:
(a) extraction of position-specific, structure-based and amino acid features for the query protein (input processing), and
(b) prediction of deleterious mutations, classifying each mutation as Disease or Neutral (benign).
Input processing: the input to the tool is the amino acid sequence of the query protein onto which the mutations to be classified are mapped. From this sequence, 10 non-synonymous SNP (nsSNP) neutral-disease discriminatory features (nsSNPND) are calculated for each mutation. They are:
| Features | Description |
| --- | --- |
| Position-specific features | Position-specific probability score of the wild-type amino acid residue (pab_WT) |
| | Position-specific probability score of the mutant amino acid residue (pab_MT) |
| | Difference between the position-specific probability scores of the wild-type and mutant residues (pab_WT - pab_MT) |
| | Gribskov’s score of the wild-type residue (Gab_WT) |
| | Gribskov’s score of the mutant residue (Gab_MT) |
| | Difference between the Gribskov’s scores of the wild-type and mutant residues (Gab_WT - Gab_MT) |
| Sequence-based structure features | Solvent accessibility of the residue: buried (1) or exposed (0) $ |
| | Predicted secondary structure: alpha-helix (1), extended strand (2) or other (0) # |
| Amino acid residue-based features | Difference between the free energies of transfer of the wild-type and mutant residues from the inside to the surface of the protein @ |
| | BLOSUM62 substitution score for the wild-type to mutant substitution |
$ Solvent accessibility calculated with ACCpro4.0 (Cheng et al., 2005).
# Secondary structure predicted with SSpro v4.5 (Cheng et al., 2005).
@ Free energy of transfer from the inside to the outside of a globular protein (Janin, 1979).
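The feature table can be turned into one input vector per mutation, as in the minimal sketch below. It is an illustration under stated assumptions, not the tool's own code: the function name `nssnpnd_features`, the 0-based position, and the input layout (per-position dictionaries of probability and Gribskov’s scores, a 0/1 burial list from ACCpro, a three-state string using `H` for helix and `E` for strand from SSpro, and lookup tables for transfer free energies and BLOSUM62) are all hypothetical. The wild-type minus mutant sign convention follows the table.

```python
from typing import Dict, List, Sequence, Tuple

# Secondary-structure code from the table: helix -> 1, strand -> 2, anything else -> 0.
# Assumption: SSpro-style letters 'H' (helix) and 'E' (extended strand).
SS_CODE = {"H": 1, "E": 2}


def nssnpnd_features(
    sequence: str,
    position: int,
    mutant: str,
    pab: Sequence[Dict[str, float]],
    gab: Sequence[Dict[str, float]],
    buried: Sequence[int],
    ss: str,
    transfer: Dict[str, float],
    blosum62: Dict[Tuple[str, str], int],
) -> List[float]:
    """Return the 10 features for one substitution, in table order (hypothetical helper).

    position is 0-based; pab[i] and gab[i] map residue letters to the
    position-specific probability score and Gribskov's score at position i.
    """
    wild = sequence[position]
    if wild == mutant:
        raise ValueError("mutant residue must differ from the wild-type residue")
    pab_wt, pab_mt = pab[position][wild], pab[position][mutant]
    gab_wt, gab_mt = gab[position][wild], gab[position][mutant]
    # BLOSUM62 is symmetric, so accept the pair stored in either order.
    blosum = blosum62.get((wild, mutant), blosum62.get((mutant, wild)))
    if blosum is None:
        raise KeyError(f"no BLOSUM62 entry for {wild}/{mutant}")
    return [
        pab_wt, pab_mt, pab_wt - pab_mt,       # position-specific probability features
        gab_wt, gab_mt, gab_wt - gab_mt,       # Gribskov's score features
        buried[position],                      # 1 = buried, 0 = exposed
        SS_CODE.get(ss[position], 0),          # 1 = helix, 2 = strand, 0 = other
        transfer[wild] - transfer[mutant],     # transfer free energy, wild type minus mutant
        blosum,                                # BLOSUM62 substitution score
    ]
```

Looking up BLOSUM62 in both orders means only one triangle of the symmetric matrix has to be stored; any numbers passed in for testing should be treated as placeholders rather than real Janin or profile values.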
Prediction of deleterious mutations:
Each mutation mapped onto the query protein is classified as “Disease” or “Neutral” by our method, which is based on the Support Vector Machine (SVM), a supervised machine-learning method first developed by Vapnik (1995). All SVM computations are carried out with LIBSVM (Chang and Lin, 2001) using the radial basis function (RBF) kernel, with the cost parameter C and the kernel parameter g optimized by us. Before actual predictions are made, the SVM is trained on the HumVar dataset using 5-fold cross-validation.
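To make the role of the kernel parameter g concrete, the sketch below evaluates a trained binary RBF model in the way LIBSVM computes its prediction: the kernel is K(x, z) = exp(-g·||x - z||²), and the decision value is the sum of sv_coef·K over the support vectors minus rho. It assumes the support vectors, coefficients, rho and g have already been read from a trained model; the function names are hypothetical, and mapping a positive decision value to "Disease" is an assumption about label order.

```python
import math
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2); gamma is LIBSVM's -g parameter."""
    if len(x) != len(z):
        raise ValueError("feature vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision_value(
    x: Sequence[float],
    support_vectors: List[Sequence[float]],
    sv_coef: Sequence[float],
    rho: float,
    gamma: float,
) -> float:
    """Binary decision value: sum_i sv_coef[i] * K(sv_i, x) - rho.

    As in a LIBSVM model file, sv_coef[i] holds y_i * alpha_i for support vector i.
    """
    total = sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, sv_coef))
    return total - rho


def classify(
    x: Sequence[float],
    support_vectors: List[Sequence[float]],
    sv_coef: Sequence[float],
    rho: float,
    gamma: float,
) -> str:
    # Assumption: "Disease" is the first label in the model, i.e. the class
    # LIBSVM predicts when the decision value is positive.
    dv = decision_value(x, support_vectors, sv_coef, rho, gamma)
    return "Disease" if dv > 0 else "Neutral"
```

The cost parameter C does not appear at prediction time; during training it bounds each alpha_i to the interval [0, C], trading margin width against training errors. With LIBSVM's command-line tools, the 5-fold cross-validation accuracy for one pair of values can be obtained with `svm-train -t 2 -c <C> -g <g> -v 5 <training_file>`, where the angle-bracket values are placeholders; the `grid.py` script distributed with LIBSVM automates the search over C and g.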