Peptide-MHC Binding Prediction Task Overview

Definition: In the human body, T cells monitor the peptides displayed on cells and trigger an immune response if a peptide is foreign. For a T cell to recognize a peptide at all, the peptide must first be bound to a major histocompatibility complex (MHC) molecule. Predicting peptide-MHC binding affinity is therefore pivotal for determining immunogenicity. There are two classes of MHC molecules, MHC Class I and MHC Class II, which are closely related in overall structure but differ in their subunit composition. The task is to predict the binding affinity between a peptide and an MHC molecule, where the MHC is represented by a pseudo sequence: the MHC residues that are in contact with the peptide.
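Because both inputs are amino acid strings, a common baseline is to one-hot encode the peptide and the MHC pseudo sequence and concatenate them into one feature vector for a regressor. The sketch below is illustrative and not how TDC or NetMHCpan represent inputs; the padding lengths (15 for peptides, 34 for the pseudo sequence) and the example sequences are assumptions chosen for the demo.

```python
# Minimal sketch: one-hot encode a peptide and an MHC pseudo sequence and
# concatenate them into a single feature vector for a regression model.
# max_pep_len / max_mhc_len defaults and the example strings are assumptions.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str, max_len: int) -> list[float]:
    """Encode seq as max_len rows of 20 one-hot slots, zero-padded on the right."""
    if len(seq) > max_len:
        raise ValueError(f"sequence longer than max_len={max_len}: {seq}")
    vec = [0.0] * (max_len * len(AMINO_ACIDS))
    for pos, aa in enumerate(seq):
        idx = AA_INDEX.get(aa)
        # Non-standard residues (e.g. 'X') are left as an all-zero row.
        if idx is not None:
            vec[pos * len(AMINO_ACIDS) + idx] = 1.0
    return vec


def encode_pair(peptide: str, pseudo_seq: str,
                max_pep_len: int = 15, max_mhc_len: int = 34) -> list[float]:
    """Concatenate the peptide encoding and the MHC pseudo-sequence encoding."""
    return one_hot(peptide, max_pep_len) + one_hot(pseudo_seq, max_mhc_len)


# Illustrative inputs: an 8-residue peptide and a 34-residue pseudo sequence.
features = encode_pair("SIINFEKL", "YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY")
print(len(features))  # (15 + 34) * 20 = 980
print(sum(features))  # one 1.0 per residue: 8 + 34 = 42.0
```

Zero-padding to a fixed length keeps the vector size constant across peptides of different lengths, at the cost of ignoring alignment between peptides of different lengths.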

Impact: Identifying peptides that can bind to MHC molecules allows us to engineer peptide-based therapeutics such as vaccines and cancer-specific peptides.

Generalization: Models are expected to generalize to unseen peptide-MHC pairs.
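With a random split over pairs, each test pair is a new combination (assuming pairs are not duplicated across rows), but its peptide or its MHC molecule may still appear elsewhere in the training set, so an unseen pair is not necessarily an unseen peptide. The sketch below, a hypothetical helper that is not part of TDC, reports how novel a test set is along each axis; the peptides and the MHC names `MHC_A` / `MHC_B` are toy placeholders.

```python
# Minimal sketch: quantify how "unseen" test pairs are relative to training.
# Pairs are (peptide, mhc_name) tuples; novelty_report is a hypothetical helper.


def novelty_report(
    train_pairs: list[tuple[str, str]],
    test_pairs: list[tuple[str, str]],
) -> dict[str, float]:
    """Fraction of test rows whose pair, peptide, or MHC is absent from training."""
    if not test_pairs:
        raise ValueError("test_pairs must be non-empty")
    train_set = set(train_pairs)
    train_peptides = {pep for pep, _ in train_pairs}
    train_mhcs = {mhc for _, mhc in train_pairs}
    n = len(test_pairs)
    return {
        "unseen_pairs": sum(pair not in train_set for pair in test_pairs) / n,
        "unseen_peptides": sum(pep not in train_peptides for pep, _ in test_pairs) / n,
        "unseen_mhcs": sum(mhc not in train_mhcs for _, mhc in test_pairs) / n,
    }


train = [("SIINFEKL", "MHC_A"), ("GILGFVFTL", "MHC_A"), ("SIINFEKL", "MHC_B")]
test = [("GILGFVFTL", "MHC_B"), ("NLVPMVATV", "MHC_A")]
print(novelty_report(train, test))
# {'unseen_pairs': 1.0, 'unseen_peptides': 0.5, 'unseen_mhcs': 0.0}
```

A test set where `unseen_peptides` is low mostly measures generalization to new combinations of known peptides and MHCs, which is an easier setting than generalizing to entirely new peptides.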

Product: Immunotherapy.

Pipeline: Activity - peptide design.

MHC Class I, IEDB-IMGT, Nielsen et al.

Dataset Description: Binding of peptides to MHC class I molecules (MHC-I) is essential for antigen presentation to cytotoxic T cells. This dataset was organized by the NetMHCpan authors from MHC class I data collected from the IEDB and IMGT/HLA databases.

Task Description: Regression. Given the amino acid sequence of a peptide and the pseudo amino acid sequence of an MHC molecule, predict their binding affinity.
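Peptide-MHC affinities are often reported not as raw IC50 values in nM but on the NetMHCpan-style scale score = 1 - log(IC50) / log(50000), which maps strong binders toward 1 and non-binders toward 0. Whether the labels in this dataset use that scale is an assumption to check against the data before relying on it; the sketch below only shows the transform and its inverse.

```python
import math

# Assumed upper bound of the IC50 scale (nM), following the NetMHCpan convention.
MAX_IC50_NM = 50_000.0


def ic50_to_score(ic50_nm: float) -> float:
    """Map an IC50 in nM to [0, 1]; higher scores mean stronger binding."""
    # Clip to [1, MAX_IC50_NM] so the score stays within [0, 1].
    clipped = min(max(ic50_nm, 1.0), MAX_IC50_NM)
    return 1.0 - math.log(clipped) / math.log(MAX_IC50_NM)


def score_to_ic50(score: float) -> float:
    """Invert ic50_to_score for a score in [0, 1]."""
    return MAX_IC50_NM ** (1.0 - score)


# The commonly used 500 nM binder threshold on the transformed scale.
print(round(ic50_to_score(500.0), 3))  # 0.426
```

Working on the log-transformed scale keeps the regression target bounded and compresses the many orders of magnitude that raw IC50 values span.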

Dataset Statistics: 185,985 pairs, 43,018 peptides and 150 MHC class I molecules
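The three counts above are related: one peptide can be measured against several MHC molecules, so pairs exceed peptides (185,985 / 43,018 is roughly 4.3 pairs per peptide, assuming one row per pair). The sketch below shows how such counts are derived from a list of records; the `Record` type and its field names are hypothetical, not the TDC column names.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """One measured peptide-MHC pair (hypothetical field names)."""
    peptide: str
    mhc: str
    affinity: float


def dataset_stats(records: list[Record]) -> dict[str, int]:
    """Count rows, distinct peptides, and distinct MHC molecules."""
    return {
        "pairs": len(records),  # assumes one row per peptide-MHC pair
        "peptides": len({r.peptide for r in records}),
        "mhcs": len({r.mhc for r in records}),
    }


toy = [
    Record("SIINFEKL", "MHC_A", 0.8),
    Record("SIINFEKL", "MHC_B", 0.3),
    Record("GILGFVFTL", "MHC_A", 0.6),
]
print(dataset_stats(toy))  # {'pairs': 3, 'peptides': 2, 'mhcs': 2}
```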

Dataset Split: Random Split


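from tdc.multi_pred import PeptideMHC

# Load the MHC class I dataset (downloaded on first use)
data = PeptideMHC(name='MHC1_IEDB-IMGT_Nielsen')
# Random split into 'train', 'valid' and 'test' DataFrames
split = data.get_split()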
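The `get_split()` call above performs the random split inside TDC. To make the idea concrete, the plain-Python sketch below shuffles rows with a fixed seed and cuts them into train/valid/test; the 70/10/20 fractions and the seed are assumptions for illustration, and this is not TDC's implementation.

```python
import random


def random_split(rows, frac=(0.7, 0.1, 0.2), seed=42):
    """Shuffle rows reproducibly and cut them into train/valid/test by fraction."""
    if len(frac) != 3 or abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("frac must contain three fractions summing to 1")
    shuffled = list(rows)
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)
    n_train = int(frac[0] * len(shuffled))
    n_valid = int(frac[1] * len(shuffled))
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        # Any rounding remainder goes to the test set.
        "test": shuffled[n_train + n_valid:],
    }


toy_split = random_split(range(100))
print({k: len(v) for k, v in toy_split.items()})
# {'train': 70, 'valid': 10, 'test': 20}
```

Because rows are peptide-MHC pairs, this splits pairs rather than peptides or MHC molecules, which is why the same peptide can land in both train and test.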

References:

[1] Nielsen, Morten, and Massimo Andreatta. “NetMHCpan-3.0; improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length datasets.” Genome Medicine 8.1 (2016): 1-9.

[2] Vita, Randi, et al. “The immune epitope database (IEDB): 2018 update.” Nucleic Acids Research 47.D1 (2019): D339-D343.

[3] Zeng, Haoyang, and David K. Gifford. “Quantification of uncertainty in peptide-MHC binding prediction improves high-affinity peptide selection for therapeutic design.” Cell Systems 9.2 (2019): 159-166.

Dataset License: CC BY 4.0.


MHC Class II, IEDB, Jensen et al.

Dataset Description: Major histocompatibility complex class II (MHC-II) molecules are found on the surface of antigen-presenting cells, where they present peptides derived from extracellular proteins to T helper cells. Predicting MHC-II binding is useful for identifying T-cell epitopes. This dataset was organized by the NetMHCIIpan authors from MHC class II data collected from the IEDB database.
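Unlike the class I groove, the MHC-II binding groove is open at both ends, so bound peptides are typically longer and only a 9-residue binding core sits inside the groove. Predictors such as NetMHCIIpan must therefore consider which core a peptide uses. The sketch below only enumerates the candidate 9-residue windows of a peptide and does not score them (that would need a trained model); the example peptide is illustrative.

```python
# Minimal sketch: enumerate candidate 9-residue binding cores of an MHC-II
# peptide. Scoring the candidates is out of scope and would need a model.


def candidate_cores(peptide: str, core_len: int = 9) -> list[tuple[int, str]]:
    """Return (offset, window) for every contiguous window of core_len residues."""
    if len(peptide) < core_len:
        return []
    return [
        (offset, peptide[offset:offset + core_len])
        for offset in range(len(peptide) - core_len + 1)
    ]


# A 13-residue peptide has 13 - 9 + 1 = 5 candidate cores.
for offset, core in candidate_cores("PKYVKQNTLKLAT"):
    print(offset, core)
```

The residues outside the chosen core (the peptide flanking regions) can also influence binding, which is one reason MHC-II prediction is generally considered harder than MHC-I prediction.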

Task Description: Regression. Given the amino acid sequence of a peptide and the pseudo amino acid sequence of an MHC molecule, predict their binding affinity.

Dataset Statistics: 134,281 pairs, 17,003 peptides and 75 MHC class II molecules

Dataset Split: Random Split


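from tdc.multi_pred import PeptideMHC

# Load the MHC class II dataset (downloaded on first use)
data = PeptideMHC(name='MHC2_IEDB_Jensen')
# Random split into 'train', 'valid' and 'test' DataFrames
split = data.get_split()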

References:

[1] Jensen, Kamilla Kjaergaard, et al. “Improved methods for predicting peptide binding affinity to MHC class II molecules.” Immunology 154.3 (2018): 394-406.

[2] Vita, Randi, et al. “The immune epitope database (IEDB): 2018 update.” Nucleic Acids Research 47.D1 (2019): D339-D343.

[3] Zeng, Haoyang, and David K. Gifford. “Quantification of uncertainty in peptide-MHC binding prediction improves high-affinity peptide selection for therapeutic design.” Cell Systems 9.2 (2019): 159-166.

Dataset License: CC BY 4.0.