We describe and empirically evaluate machine learning methods for the prediction of zinc binding sites from protein sequences. We start by observing that a data set consisting of single residues as examples is affected by autocorrelation and we propose an ad-hoc remedy in which sequentially close pairs of candidate residues are classified as being jointly involved in the coordination of a zinc ion. We develop a kernel for this particular type of data that can handle variable length gaps between candidate coordinating residues. Our empirical evaluation on a data set of non redundant protein chains shows that explicit modeling the correlation between residues close in sequence allows us to gain a significant improvement in the prediction performance. © Springer-Verlag Berlin Heidelberg 2006.
Improving prediction of zinc binding sites by modeling the linkage between residues close in sequence
Passerini, Andrea;
2006-01-01
Abstract
We describe and empirically evaluate machine learning methods for the prediction of zinc binding sites from protein sequences. We start by observing that a data set consisting of single residues as examples is affected by autocorrelation and we propose an ad-hoc remedy in which sequentially close pairs of candidate residues are classified as being jointly involved in the coordination of a zinc ion. We develop a kernel for this particular type of data that can handle variable length gaps between candidate coordinating residues. Our empirical evaluation on a data set of non redundant protein chains shows that explicit modeling the correlation between residues close in sequence allows us to gain a significant improvement in the prediction performance. © Springer-Verlag Berlin Heidelberg 2006.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione



