BRAliBase II: Benchmarking alignment algorithms.


Supplementary data for BRAliBase II:

Gardner PP, Wilm A & Washietl S (2005) A benchmark of multiple sequence alignment programs upon structural RNAs. Nucleic Acids Research. 33(8):2433-2439. Supp. Mat.


Data-set 1: tar.gz


Data-set 1 (with structures): tar.gz NEW: 08/03/2007

Group II Intron: Unaligned sequences, Structural alignment

5S rRNA: Unaligned sequences, Structural alignment

SRP*: Unaligned sequences, Structural alignment

tRNA: Unaligned sequences, Structural alignment

U5: Unaligned sequences, Structural alignment

* The SRP alignments were not used in this study; see Erratum2 below.

Due to many requests we've added all the accuracy values (SPS & SCI) used to compute table 1 of the above publication. These are available here. The R-scripts used to generate figure 2 and figure 3 are also available.

Erratum1: Some of the pairwise sequence identities have been scrambled (due to a combination of mis-formatting and alistat's sensitivity to mis-formatted alignments). An updated set of identities can be found here. Note that the comparison of alignment methods in our paper remains valid; only the homology groups are affected by this error.
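For readers who want to re-check identities on their own copies of the alignments, the following is a minimal sketch of a pairwise percent-identity calculation. It is not the alistat implementation: the function name `pairwise_identity`, the choice of gap characters, and the denominator (columns where both sequences have a residue) are assumptions made for illustration, and alistat may use a different convention. The length check guards against the kind of mis-formatted input that caused the problem described above.

```python
def pairwise_identity(row_a, row_b, gap_chars="-."):
    """Percent identity between two rows of the same alignment.

    Assumption: identity is counted over columns in which both
    sequences have a residue (gap columns are skipped).
    """
    if len(row_a) != len(row_b):
        # Rows of one alignment must have equal length; anything else
        # suggests a mis-formatted file.
        raise ValueError("aligned rows differ in length")

    matches = 0
    compared = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a == b:
            matches += 1

    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    print(pairwise_identity("ACGU", "AGGU"))
```

Failing loudly on unequal row lengths is a deliberate choice here: silently truncating to the shorter row would hide exactly the formatting errors that scrambled the original values.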

Erratum2: Due to the mis-formatting problem mentioned in "Erratum1" the SRP alignments appeared to be mis-aligned. Further testing has shown these alignments to be perfectly reliable. Our sincere apologies to the SRPDB people whom we've unnecessarily slandered here.


Data-set 2: tar.gz


Data-set 2 (with structures): tar.gz NEW: 08/03/2007

tRNA pairwise: Unaligned sequences, Structural alignment

The accuracy values (SPS & SCI) used to compute figure 3 of the above publication are available from here.

Software:

Use bali_score.c to compute the SPS and RNAz to compute the SCI reported in the above paper.
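To illustrate what the SPS measures, here is a minimal sketch of a sum-of-pairs score: the fraction of residue pairs aligned in the reference alignment that are also aligned in the test alignment. This is not a port of bali_score.c and will not necessarily reproduce its numbers (for instance, it scores every column rather than any restricted set of columns). The function names and the dictionary-based input format are assumptions made for this example.

```python
from itertools import combinations


def aligned_pairs(row_a, row_b, gap_chars="-."):
    """Set of (i, j) residue-index pairs that share an alignment column."""
    pairs = set()
    i = j = 0  # running residue indices within each ungapped sequence
    for a, b in zip(row_a, row_b):
        a_is_residue = a not in gap_chars
        b_is_residue = b not in gap_chars
        if a_is_residue and b_is_residue:
            pairs.add((i, j))
        if a_is_residue:
            i += 1
        if b_is_residue:
            j += 1
    return pairs


def sum_of_pairs_score(reference, test):
    """SPS of `test` against `reference`.

    Both arguments map sequence name -> aligned row and must contain
    the same sequence names.
    """
    matched = 0
    total = 0
    for x, y in combinations(sorted(reference), 2):
        ref_pairs = aligned_pairs(reference[x], reference[y])
        test_pairs = aligned_pairs(test[x], test[y])
        total += len(ref_pairs)
        matched += len(ref_pairs & test_pairs)
    return matched / total if total else 0.0


if __name__ == "__main__":
    reference = {"a": "ACGU", "b": "AC-U"}
    test = {"a": "ACGU", "b": "A-CU"}
    print(sum_of_pairs_score(reference, test))
```

In the small example, two of the three residue pairs aligned in the reference are recovered by the test alignment, so the score is 2/3. Comparing residue indices rather than columns means the two alignments may have different lengths.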



The following algorithms were compared in this study:


Multiple Sequence Alignment

Clustal

A multiple sequence alignment program. Clustal, despite its great age (or maybe because of it), is also superior to many alternative alignment tools (for structured RNA alignment).

DIALIGN

DIALIGN constructs pairwise and multiple alignments by comparing whole segments of the sequences. No gap penalty is used. This approach is especially efficient where sequences are not globally related but share only local similarities, as is the case with genomic DNA and with many protein families.

MUSCLE

Public domain multiple alignment software for protein and nucleotide sequences. MUSCLE stands for multiple sequence comparison by log-expectation. Recent (since 21/05/05) updates to the MUSCLE (v3.52) code have improved this algorithm's performance on ncRNA data since our published benchmark.

PCMA

A progressive multiple sequence alignment program that combines two different alignment strategies. Highly similar sequences are aligned in a fast way as in ClustalW, forming pre-aligned groups. The T-Coffee strategy is applied to align the relatively divergent groups based on profile-profile comparison and consistency. The scoring function for local alignments of pre-aligned groups is based on a novel profile-profile comparison method that is a generalization of the PSI-BLAST approach to profile-sequence comparison.

POA

Multiple sequence alignment using partial order graphs. Produces (comparatively) reasonable sequence-based alignments in an extremely short amount of time! Recommended if speed is an issue. NB. Using both the global and progressive modes is recommended.

ProAlign

A probabilistic multiple alignment program. Our recommended method for sequence-based structured RNA alignment!

prrn

Global multiple alignment of a set of protein or DNA sequences by a doubly nested iterative refinement method.

T-COFFEE

Combines a collection of multiple/pairwise, global/local alignments into a single one. Your alignments may come from any source. T-Coffee also makes it possible to estimate the level of consistency of each position within the new alignment with the rest of the alignments. This consistency is usually an indicator of alignment accuracy.

Handel

A probabilistic multiple alignment creation and annotation tool.

MAFFT

MAFFT ver.5.6 - Multiple alignment program for amino acid or nucleotide sequences. The updated versions of MAFFT now perform very well on our benchmark data.

Structural Alignment

Dynalign*

Uses a "full energy model" and comparative information to align and fold 2 sequences. Restricts the 'span' of base-pairs to improve CPU time.

Foldalign 2

New: Structurally align two sequences using a lightweight energy model in combination with RIBOSUM-like score matrices.

PMcomp

A variant of the Sankoff algorithm from the Vienna group.

Stemloc

Comparative RNA structure-finder using accelerated pairwise stochastic context-free grammars. Ships with the 'dart' package by Ian Holmes.




Paul Gardner, <pg5@sanger.ac.uk>
Dept. of Evolutionary Biology, University of Copenhagen,
Universitetsparken 15, 2100 Copenhagen Ø, Denmark.

Time-stamp: 2005-07-19 10:21:24 pgardner