Monday, July 18, 2011

High rate of false positives in the estimates of positive selection due to faulty alignments


In a paper published in Genome Research, Penka Markova (who successfully graduated this year) and Dmitri continue to shed light on an often underappreciated step in studying natural selection in protein and DNA sequences. This step, the alignment of homologous sequences, is key because it determines which positions in proteins are the "same" and thus can be meaningfully compared across species or individuals. Because it is often hard to assess the error at this step, the common practice is to accept the alignments as if they were true and to investigate all other possible sources of error. Unfortunately, as this paper shows, this assumption can be woefully wrong, especially in studies of positive selection. After all, we often define possible cases of positive selection by detecting patterns of evolution that are faster than, or different from, those predicted by a model of unchanging constraint. It is hard to generate a more unusual pattern than the one produced by misalignments. Our paper suggests that 50-80% (!) of all cases of detected positive selection in the alignments of Drosophila proteins are due to misalignments. The problem is very severe and calls for computational and statistical solutions, manual curation of candidates, and above all caution in interpreting scans for positive selection based on massive, genome-level alignments of proteins. Our paper has been positively reviewed by Faculty of 1000 (two evaluations can be found here: http://f1000.com/11045956 and here is the PDF).