BRaliBase III.


Freyhult E, Bollback JP and Gardner PP (2007) Exploring genomic dark matter: a critical assessment of the performance of homology search methods on non-coding RNA. Genome Research 17(1):117-25. Supplementary materials.

Supplementary Information.

A tarball of the datasets.


The 5 and 20 sequence query and database datasets can be found here.

The scripts used to run and analyse homology search software.

The tab-delimited results files from the aforementioned homology search software.

The README giving details of the datasets and summarising scripts and results.



The following algorithms were compared in this study:


Single sequence methods

Description

NCBI-BLAST

Finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.

WU-BLAST

The original gapped BLAST with statistics, promoted as faster at any sensitivity and more sensitive at any speed, with the performance, features and reliability demanded by technical professionals.

FASTA

Compares a protein sequence to another protein sequence or to a protein database, or a DNA sequence to another DNA sequence or a DNA library.

PARALIGN

Rapid and sensitive sequence similarity searches powered by parallel computing technology.

SSEARCH

Performs a rigorous Smith-Waterman alignment between a protein sequence and another protein sequence or a protein database, or between a DNA sequence and another DNA sequence or a DNA library (very slow).
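As an illustration of what a Smith-Waterman alignment computes, here is a minimal sketch of the local alignment score between two sequences. It is not SSEARCH's implementation: the function name and the match, mismatch and gap values are assumptions chosen for the example, and it uses a simple linear gap penalty.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    The scoring values are illustrative assumptions, not the defaults
    of SSEARCH or any other tool. Gaps use a linear penalty.
    """
    # Only the previous row of the dynamic-programming matrix is needed
    # to compute the score, so memory use is proportional to len(b).
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment here
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

Every cell of the matrix is filled, so the running time grows with the product of the two sequence lengths. That exhaustive fill is why a rigorous search of this kind is slow compared with heuristic methods.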

Profile HMM methods

HMMer

Profile hidden Markov models (profile HMMs) can be used to do sensitive database searching using statistical descriptions of a sequence family's consensus. HMMER is a freely distributable implementation of profile HMM software for protein sequence analysis.

SAM

A collection of software tools for creating, refining, and using a type of statistical model called a linear hidden Markov model for biological sequence analysis.

Structural methods

ERPIN

ERPIN (Easy RNA Profile IdentificatioN) is an RNA motif search program developed by Daniel Gautheret and André Lambert.

Infernal

cmalign aligns the RNA sequences in seqfile to the covariance model (CM) in cmfile, and outputs a multiple sequence alignment.

RaveNnA

A software package for faster covariance model searches. Its rigorous filters provably sacrifice no accuracy. Also includes heuristic filters.

RSEARCH

RSEARCH aligns an RNA query to target sequences, using SCFG algorithms to score both secondary structure and primary sequence alignment simultaneously. It's slow, but somewhat more capable of finding significant remote RNA structure homologies than sequence alignment methods like BLAST.

RSmatch

RNA Secondary Structure Matcher. Provides four functions: (1) regular database search, (2) multiple structure alignment, (3) iterative database search, and (4) pairwise sequence alignment.




Paul Gardner, <pg5@sanger.ac.uk>
Dept. of Evolutionary Biology, University of Copenhagen,
Universitetsparken 15, 2100 Copenhagen, Denmark.

Last changed: 2006-06-19 13:01:21 pgardner