Rui Borges

Tutorial 3

Detecting positive selection in molecular sequences

In this tutorial, we will investigate the mammalian C7 protein, which is part of the complement immunity pathway, for signatures of positive selection. The data were taken from Kosiol, Vinar, da Fonseca, Hubisz, Bustamante, Nielsen, and Siepel (2008), Patterns of positive selection in six mammalian genomes. You will test the mammalian C7 protein for positive selection by comparing the null model M7 (beta) against the selection model M8 (beta&w).

We will use codeml, a part of the PAML package. codeml includes several models of codon and protein evolution.

CODEML operates similarly to BASEML. Rather than specifying the options with which the program should be executed through an interactive menu, these options are specified in a so-called control file.

This control file specifies, amongst other things, the input files and the model to be fitted.

To set the name of the alignment file and tree file needed by the analysis you would include the following two lines in the control file:

seqfile  = C7.phy   * name of sequence data file
treefile = C7.tree  * name of tree structure file

To test the positive selection hypothesis using CODEML, you will need to execute the program twice. The first time, you will run the analysis under a null model that does not allow positive selection; for example, the neutral model (NSsites = 1) has two classes of sites, one under purifying selection and one under neutral evolution. The second time, you will run the analysis under a selection model; for example, the selection model (NSsites = 2) has three classes of sites, the additional class being sites under positive selection.

These models can be set using the NSsites line.

      NSsites = 1   * 0:one w;1:neutral;2:selection; 3:discrete;4:freqs;
                    * 5:gamma;6:2gamma;7:beta;8:beta&w;9:beta&gamma;
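Since the two runs differ only in the value of NSsites, the control file for each run can be derived from a single template. The following is a minimal sketch in Python, not part of PAML: the helper `set_option` and the variable names are hypothetical, and the template contains only the lines shown in this tutorial, whereas a real control file has more options.

```python
import re


def set_option(ctl_text: str, key: str, value: str) -> str:
    """Return ctl_text with the first 'key = ...' line set to value.

    Anything after the old value (such as a '*' comment) is kept as-is.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*)(\S+)(.*)$", re.MULTILINE
    )
    new_text, n = pattern.subn(
        lambda m: f"{m.group(1)}{value}{m.group(3)}", ctl_text, count=1
    )
    if n == 0:
        raise KeyError(f"option {key!r} not found in control file")
    return new_text


# Template built only from the control-file lines shown in this tutorial.
template = """\
seqfile  = C7.phy   * name of sequence data file
treefile = C7.tree  * name of tree structure file
      NSsites = 1   * 0:one w;1:neutral;2:selection; 3:discrete;4:freqs;
                    * 5:gamma;6:2gamma;7:beta;8:beta&w;9:beta&gamma;
"""

ctl_m7 = set_option(template, "NSsites", "7")  # null model (beta)
ctl_m8 = set_option(template, "NSsites", "8")  # selection model (beta&w)
```

Each resulting text can be saved to its own control file. If your control file contains an outfile line, it is worth changing it in the same way so that the second run does not overwrite the results of the first.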

In the NSsites option, choose either 7 (beta) or 8 (beta&w); you will have to run codeml once with each value to obtain the likelihood of each model. Alternatively, one could test for selection by using the pair 1 (neutral model) and 2 (selection model). To run the CODEML program, type ./codeml at the command line.
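Once a run has finished, the log-likelihood of the fitted model has to be read from the codeml output. The sketch below assumes that the output contains a line of the form `lnL(ntime: ...  np: ...):  value`; please check this against your own output file, since the exact layout may differ between PAML versions. The function name `parse_lnl` is hypothetical, and the example line is made up for illustration and is not a result for C7.

```python
import re

# Assumed layout of the log-likelihood line in the codeml output.
LNL_PATTERN = re.compile(r"lnL\(ntime:\s*(\d+)\s+np:\s*(\d+)\):\s*(-?\d+\.\d+)")


def parse_lnl(output_text: str) -> tuple[int, float]:
    """Return (number of parameters, log-likelihood) from codeml output text."""
    match = LNL_PATTERN.search(output_text)
    if match is None:
        raise ValueError("no lnL line found in the output text")
    return int(match.group(2)), float(match.group(3))


# Made-up line in the assumed format; NOT a real result for the C7 data.
example = "lnL(ntime: 11  np: 14):  -1234.567890      +0.000000"
n_params, lnl = parse_lnl(example)
print(n_params, lnl)
```

The np value counts all estimated parameters, including branch lengths, so when both runs use the same tree, the difference in np between the two runs equals the difference in model parameters.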

With CODEML, you need to calculate the likelihood ratio test (LRT) statistic yourself. It is twice the difference between the log-likelihoods estimated under the two models, i.e., LRT = 2(lnL1 - lnL0), where lnL0 is the log-likelihood of the null model and lnL1 that of the selection model. Significance is assessed against a chi-square distribution whose degrees of freedom equal the difference in the number of parameters between the two models. Critical values can be looked up in a chi-square table.
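The test can also be computed directly rather than read from a table. The sketch below uses only the Python standard library and relies on the closed form of the chi-square tail probability for an even number of degrees of freedom. It assumes two degrees of freedom, which is the usual choice for M7 versus M8 because M8 adds two parameters. The function names are hypothetical, and the log-likelihood values are placeholders, not results for C7.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Likelihood ratio statistic: 2 * (lnL1 - lnL0)."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X >= x) for a chi-square variable with an even number of df.

    Closed form: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Placeholder log-likelihoods; replace with the values from your own runs.
lnl_m7 = -105.0  # null model (assumed value)
lnl_m8 = -100.0  # selection model (assumed value)

lrt = lrt_statistic(lnl_m7, lnl_m8)
p_value = chi2_sf_even_df(lrt, df=2)  # df = 2 is an assumption, see above
print(f"LRT = {lrt:.3f}, p = {p_value:.4f}")
```

A statistic of zero or below, which can occur when one of the runs has not converged well, is reported here as p = 1.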

Is there any evidence for positive selection in the mammalian C7 protein? If so, what is the estimated proportion of positively selected sites, and what is the dN/dS ratio of those sites?