Outline

1. Background

2. Workflow

3. Demonstration

    3.1 A brief introduction to the input page

    3.2 A brief introduction to the result page



1. Background

The adaptive immune system processes self and foreign proteomes into a wide variety of peptides. Some of these peptides can be loaded onto the antigen-binding grooves of major histocompatibility complex (MHC) proteins and then detected by T cells. Such naturally processed and presented peptides, called NPPs for short, are in fact natural T cell epitopes. Among them, peptides presented by MHC class I molecules on the cell surface are recognized by CD8+ T cells and are therefore termed cytotoxic T cell epitopes. These epitopes are typically 8-11 amino acids long, most often 9, and are generated by proteolysis of intracellular proteins such as viral and tumor antigens.

Because binding to MHC class I proteins is a prerequisite for being a cytotoxic T cell epitope, various ligand binding assays have been used to identify MHC-binding peptides (MHC binders) among chemically synthesized overlapping peptides of protein antigens. However, a peptide that binds MHC in vitro is not necessarily processed and presented naturally in vivo. For this reason, immunoproteomic methods, which identify NPPs directly from cells or tissues using high-performance liquid chromatography and mass spectrometry, have become increasingly popular in recent years. Nevertheless, both experimental approaches are expensive and time-consuming. Reliable bioinformatics tools are therefore urgently needed to reduce the number of necessary experiments.

With data on MHC binders, non-binders, and NPPs accumulated and curated in immune databases such as the Immune Epitope Database (IEDB), many machine learning tools for predicting cytotoxic T cell epitopes have been developed. To make the available tools more comparable, the Brusic team at the Dana-Farber Cancer Institute designed a repository for machine learning in immunology in 2009, aiming to provide standardized data sets, consistent measurement methods, and uniform scales. After constructing the repository, they proposed and hosted the first Machine Learning Competition in Immunology (MLI), a good attempt to bridge the gap between the immunology and machine learning communities. Like most available tools, the first MLI competition focused on predicting MHC class I binding peptides, which is only natural since such binding is a requisite for any cytotoxic T cell epitope.

However, it has been reported that only 5-15% of MHC binders are naturally processed, although all NPPs are known to bind the relevant MHC (see Figure 1). To predict a cytotoxic T cell epitope, it is therefore better to predict whether a peptide is an NPP rather than whether it is an MHC binder, and this has become the new trend in cytotoxic T cell epitope prediction. In 2012, the theme of the 2nd MLI competition was to distinguish NPPs from MHC binders. Winners such as MHC-NP, NetMHC 3.2, and ElutedPPred showed top performance on one or several alleles. Nonetheless, they are not always the best choice for peptides of every length and on every allele, suggesting the need for complementary tools such as NIEluter.

Figure 1. Different categories of peptides contained in the datasets.


2. Workflow

NIEluter can currently predict peptides of 8-11 amino acids eluted from six HLA alleles (A0201, B0702, B3501, B4403, B5301, and B5701). The tool is an ensemble predictor that combines five SVM models, each trained with one of the following features: position-specific amino acid composition, position-specific dipeptide composition, hidden Markov model, binary encoding, and BLOSUM62. For each length and each allele, there is a separate ensemble predictor consisting of five SVM models. Each model computes the probability that the peptide is an NPP. If the value is equal to or greater than 0.5, the model votes NPP. All votes are then counted by the ensemble predictor. For each peptide, the final result is positive (+) if three or more models vote NPP; otherwise it is labelled negative (-), which means it is unlikely to be an NPP. The workflow of NIEluter is shown below in Figure 2.

Figure 2. The workflow of NIEluter.
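
The voting rule described in this section can be sketched in a few lines of Python. This is only an illustration of the rule as stated (threshold 0.5 per model, three or more votes for a positive call), not the actual NIEluter code; the function name ensemble_vote and the constants are hypothetical, and the five probabilities are assumed to come from already trained SVM models.

```python
from statistics import mean

THRESHOLD = 0.5  # per-model cut-off stated in the text
MIN_VOTES = 3    # three or more of the five models must vote NPP


def ensemble_vote(probabilities):
    """Combine five per-model NPP probabilities into (label, votes, mean probability)."""
    if len(probabilities) != 5:
        raise ValueError("expected one probability per SVM model (five in total)")
    # A model votes NPP when its probability is equal to or greater than 0.5.
    votes = sum(1 for p in probabilities if p >= THRESHOLD)
    label = "+" if votes >= MIN_VOTES else "-"
    return label, votes, mean(probabilities)


# Hypothetical probabilities for one peptide, for illustration only.
label, votes, avg = ensemble_vote([0.9, 0.6, 0.5, 0.2, 0.1])
```

With these made-up inputs, three models vote NPP, so the label is +, even though the average probability (about 0.46) is below 0.5. This shows why the vote-based label and the average probability can disagree.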


3. Demonstration

3.1 A brief introduction to the input page

Figure 3. A brief introduction to the input page of NIEluter.

Step 1: Input sequences

NIEluter provides a flexible and convenient way to input sequences. Users can either paste sequences directly into the text box or upload a file from the local computer. The file should be plain text in raw or FASTA format. NIEluter treats all lines between two adjacent lines beginning with > as a single sequence. If no line in the file begins with >, each line of the plain text is taken as a separate sequence. FASTA format is therefore recommended for predicting NPPs from a batch of proteins, whereas raw format is convenient for finding NPPs among a series of MHC binders.
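
The two input conventions can be illustrated with a small Python sketch. It is not the server's parser: the function name parse_sequences is hypothetical, and it assumes that blank lines are ignored, that lines before the first > header are skipped, and that the last record runs to the end of the input.

```python
def parse_sequences(text):
    """Split pasted or uploaded text into sequences (FASTA or raw format)."""
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line]  # assumption: blank lines are ignored

    # Raw format: no header line, so every line is a separate sequence.
    if not any(line.startswith(">") for line in lines):
        return lines

    # FASTA format: join all lines that follow a ">" header into one sequence.
    sequences, current = [], None
    for line in lines:
        if line.startswith(">"):
            if current:
                sequences.append("".join(current))
            current = []
        elif current is not None:
            current.append(line)
        # assumption: lines before the first ">" header are skipped
    if current:
        sequences.append("".join(current))
    return sequences
```

For example, a FASTA record whose sequence is wrapped over several lines is returned as one string, while three peptides pasted on three lines without any header are returned as three separate sequences.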

Step 2: Select parameters

Users also need to specify the HLA allele and the peptide length. The default peptide length is 9 and the default allele is HLA-A0201. If a sequence is longer than the selected peptide length, it will be cut into overlapping peptides of the selected length. If a sequence is shorter than the selected peptide length, an error page will be displayed.
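
Cutting a sequence into overlapping peptides can be pictured as a sliding window. The Python sketch below assumes a window step of one residue and 1-based positions, neither of which is stated explicitly here; the function name cut_peptides is hypothetical, and the ValueError merely stands in for the error page.

```python
def cut_peptides(sequence, length=9):
    """Return (peptide, start, end) tuples for every window of the given length."""
    if len(sequence) < length:
        # Stands in for the error page shown for sequences that are too short.
        raise ValueError("sequence is shorter than the selected peptide length")
    return [
        (sequence[i:i + length], i + 1, i + length)  # assumption: 1-based positions
        for i in range(len(sequence) - length + 1)
    ]


# An 11-residue placeholder sequence yields three overlapping 9-mers.
peptides = cut_peptides("ABCDEFGHIJK", 9)
```

Under these assumptions, a sequence of N residues yields N - length + 1 peptides, and a sequence whose length equals the selected peptide length yields exactly one.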


3.2 A brief introduction to the result page


Figure 4. A brief introduction to the result page of NIEluter.

The meaning of each column of the above table is described below.

Number: the serial number of each (cut) peptide, starting from 1.
Sequence: the amino acid sequence of the (cut) peptide.
Location: Pi: n-m, meaning the peptide is located at positions n to m of the i-th input sequence.
+/-: + means NPP and - means not NPP.
Votes: the number of NPP votes, from 0 to 5; it can be used as an indicator of predictive confidence.
+ Probability: the average probability of being an NPP over the five SVM models; users can make their own prediction from this average, although the prediction given by NIEluter is based on votes.
Length: the length of the (cut) peptide.
Allele: the MHC allele from which the peptide is eluted.
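
As one way of using these columns, the Python sketch below ranks result rows by votes and average probability and applies a user-chosen cut-off to the average probability instead of the vote-based label. The dictionary keys, the function name rank_and_relabel, and the example rows are all hypothetical; NIEluter's own export format is not described here.

```python
def rank_and_relabel(rows, cutoff=0.5):
    """Sort rows by confidence and add a label based on the average probability."""
    ranked = sorted(rows, key=lambda r: (r["votes"], r["probability"]), reverse=True)
    # "own_label" is the user's call; NIEluter's own +/- column is based on votes.
    return [
        dict(row, own_label="+" if row["probability"] >= cutoff else "-")
        for row in ranked
    ]


# Made-up rows for illustration only; they are not real predictions.
rows = [
    {"sequence": "PEPTIDEAA", "votes": 3, "probability": 0.46},
    {"sequence": "PEPTIDEBB", "votes": 5, "probability": 0.91},
    {"sequence": "PEPTIDECC", "votes": 1, "probability": 0.22},
]
ranked = rank_and_relabel(rows, cutoff=0.5)
```

In this made-up example, the first input row has three votes and would be labelled + by the voting rule, but it receives the user label - because its average probability is below the chosen cut-off.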