Abstract
We attempt to establish geometrical methods for amino acid sequences. To measure the similarity of these sequences, we define a kernel on strings using only the sequence structure and a good amino acid substitution matrix (e.g. BLOSUM62). The kernel is used in learning machines to predict binding affinities of peptides to human leukocyte antigen DR (HLA-DR) molecules. On both fixed-allele (Nielsen and Lund in BMC Bioinform. 10:296, 2009) and pan-allele (Nielsen et al. in Immunome Res. 6(1):9, 2010) benchmark databases, our algorithm achieves state-of-the-art performance. The kernel is also used to define a distance on a set of HLA-DR alleles; a clustering analysis based on this distance precisely recovers the serotype classifications assigned by WHO (Holdsworth et al. in Tissue Antigens 73(2):95–170, 2009; Marsh et al. in Tissue Antigens 75(4):291–455, 2010). These results suggest that our kernel relates the sequence structure of both peptides and HLA-DR molecules well to their biological functions, and that it offers a simple, powerful and promising methodology for immunology and amino acid sequence studies.
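
The abstract does not state the kernel's formula, so the following is only a minimal sketch of one common way to build a string kernel from a residue-level similarity and to turn it into a distance. Everything in it is an assumption made for illustration: the function names are hypothetical, the residue similarity `base_kernel` is a toy stand-in (it does not use real BLOSUM62 values), and the construction (product over aligned residues, sum over all pairs of equal-length substrings, then normalization) is a generic one that may differ from the kernel defined in the paper.

```python
import math


def base_kernel(a: str, b: str) -> float:
    """Toy residue similarity (assumption): 1.0 for identical residues, 0.1 otherwise.

    A real application would derive this from a substitution matrix such as
    BLOSUM62; the values here are placeholders, not BLOSUM62 entries.
    """
    return 1.0 if a == b else 0.1


def segment_kernel(u: str, v: str) -> float:
    """Similarity of two equal-length segments: product of residue similarities."""
    result = 1.0
    for a, b in zip(u, v):
        result *= base_kernel(a, b)
    return result


def string_kernel(s: str, t: str) -> float:
    """Sum of segment similarities over all pairs of equal-length contiguous substrings."""
    total = 0.0
    for k in range(1, min(len(s), len(t)) + 1):
        for i in range(len(s) - k + 1):
            for j in range(len(t) - k + 1):
                total += segment_kernel(s[i:i + k], t[j:j + k])
    return total


def normalized_kernel(s: str, t: str) -> float:
    """Rescale so that every string has similarity 1 with itself."""
    return string_kernel(s, t) / math.sqrt(string_kernel(s, s) * string_kernel(t, t))


def kernel_distance(s: str, t: str) -> float:
    """Distance induced by the normalized kernel: sqrt(K(s,s) + K(t,t) - 2 K(s,t))."""
    return math.sqrt(max(0.0, 2.0 - 2.0 * normalized_kernel(s, t)))
```

Under these assumptions, `normalized_kernel` could be used to fill a precomputed Gram matrix for a kernel-based learner, and `kernel_distance` could be used to fill a pairwise distance matrix for clustering. The triple loop is written for readability rather than speed.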
Original language | English |
---|---|
Pages (from-to) | 951-984 |
Number of pages | 34 |
Journal | Foundations of Computational Mathematics |
Volume | 14 |
Issue number | 5 |
Publication status | Published - 1 Sept 2014 |
Externally published | Yes |
Keywords
- HLA DRB allele classification
- Major histocompatibility complex
- Peptide binding prediction
- Reproducing kernel Hilbert space
- String kernel
ASJC Scopus subject areas
- Analysis
- Computational Theory and Mathematics
- Computational Mathematics
- Applied Mathematics