BlastZ-net/LastZ-net Pairwise Alignment Analysis
BlastZ-net alignments (Schwartz S et al., Genome Res., 2003;13(1):103-7; Kent WJ et al., Proc Natl Acad Sci U S A., 2003;100(20):11484-9), or alignments from the newer LastZ-net pipeline, are provided for closely related pairs of species. These alignments are produced by post-processing the raw BlastZ or LastZ output. First, the original alignment blocks are chained according to their locations in both genomes. Then the netting process selects, for the reference species, the best sub-chain in each region.
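The chaining and netting steps can be illustrated with a minimal Python sketch. It assumes a hypothetical `Block` record holding reference and query coordinates plus a score, ignores strand and gap penalties, and reduces netting to a greedy choice of non-overlapping chains on the reference genome. The production chain/net tools are considerably more elaborate (for example, they score the gaps between blocks), so treat this only as an outline of the idea.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    """A gapless alignment block (hypothetical representation, half-open coordinates)."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int
    score: float

def best_chain(blocks):
    """Highest-scoring chain of blocks that is colinear in both genomes.

    O(n^2) dynamic programming; gap penalties are ignored (simplifying assumption).
    """
    blocks = sorted(blocks, key=lambda b: (b.ref_start, b.qry_start))
    if not blocks:
        return []
    best = [b.score for b in blocks]   # best chain score ending at each block
    prev = [None] * len(blocks)        # back-pointers for reconstruction
    for j, bj in enumerate(blocks):
        for i in range(j):
            bi = blocks[i]
            # bi may precede bj only if it ends before bj starts in BOTH genomes
            if bi.ref_end <= bj.ref_start and bi.qry_end <= bj.qry_start:
                if best[i] + bj.score > best[j]:
                    best[j] = best[i] + bj.score
                    prev[j] = i
    j = max(range(len(blocks)), key=best.__getitem__)
    chain = []
    while j is not None:
        chain.append(blocks[j])
        j = prev[j]
    return chain[::-1]

def net(chains):
    """Greedy netting: keep the best-scoring chains whose reference spans do not overlap."""
    taken = []
    for chain in sorted(chains, key=lambda c: sum(b.score for b in c), reverse=True):
        start, end = chain[0].ref_start, chain[-1].ref_end
        if all(end <= s or start >= e for s, e, _ in taken):
            taken.append((start, end, chain))
    return [c for _, _, c in sorted(taken, key=lambda t: t[0])]
```

For example, with blocks at reference positions 0-100 and 150-250 that are also ordered in the query, plus a third block that maps far away in the query, `best_chain` keeps the first two and drops the third.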
There are no BlastZ-net data available for this clade.
Translated Blat Pairwise Alignment Analysis
Translated Blat (Kent WJ, Genome Res., 2002;12(4):656-64) is used to look for homologous regions between more distantly related pairs of species. We expect to find homologies mainly in coding regions. The raw results are passed through a chaining and netting procedure similar to that used for the BlastZ-net analyses, producing the best sub-chain for the reference species (Translated Blat Net).
Currently, tblat-net analysis is carried out within the Aspergillus clade and, separately, on the remaining pair of fungi (S. cerevisiae and S. pombe).
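Translated searches compare sequences at the protein level, which is why homologies are expected mainly in coding regions. The step that makes this possible on genomic DNA is translation in all six reading frames (three per strand). The Python sketch below shows only that step, assuming the standard genetic code; it is not BLAT's code, and the function names are hypothetical.

```python
CODON_BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(CODON_BASES)
    for j, b in enumerate(CODON_BASES)
    for k, c in enumerate(CODON_BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate complete codons only; codons containing non-ACGT bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[p:p + 3], "X")
                   for p in range(0, len(seq) - 2, 3))

def six_frame_translation(seq):
    """Return the protein translation of all six frames, keyed '+1'..'+3', '-1'..'-3'."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For instance, `six_frame_translation("ATGGCCTAA")["+1"]` gives `"MA*"`, with `*` marking the stop codon.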
Pecan Multiple Alignment Analysis
Pecan is used to produce global multiple genomic alignments. First, Mercator builds a synteny map between the genomes; Pecan then builds alignments within these syntenic regions.
Pecan is a global multiple sequence alignment program that makes the probabilistic consistency methodology practical for large numbers of sequences of practically arbitrary length. It takes as input a set of sequences and a phylogenetic tree. Its parameters and heuristics are highly user-configurable; it is written entirely in Java and requires Exonerate to be installed.
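To illustrate what probabilistic consistency means, the following minimal Python sketch applies the consistency transformation in the form popularised by ProbCons: the probability that residues x_i and y_j are aligned is re-estimated by averaging over every sequence z in the set S, P'_xy = (1/|S|) * sum over z of P_xz · P_zy, with P_xx taken as the identity. The input posterior matrices are assumed to be given (typically they come from a pair HMM), and all function names are hypothetical. Pecan adds its own heuristics to keep this tractable for long genomic sequences, which the sketch does not attempt.

```python
def identity(n):
    """n x n identity matrix: residue i of a sequence is aligned to itself."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain-Python product of an (m x k) and a (k x n) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def consistency_transform(lengths, posteriors):
    """Re-estimate every pairwise match-probability matrix through all sequences.

    lengths:    {name: sequence length}
    posteriors: {(x, y): len(x) x len(y) matrix of P(x_i ~ y_j)} for every ordered pair x != y
    Returns {(x, y): P'_xy} with P'_xy = (1/|S|) * sum_z P_xz . P_zy and P_xx = I.
    """
    names = list(lengths)

    def P(x, y):
        return identity(lengths[x]) if x == y else posteriors[(x, y)]

    updated = {}
    for x in names:
        for y in names:
            if x == y:
                continue
            total = [[0.0] * lengths[y] for _ in range(lengths[x])]
            for z in names:  # route the x-y evidence through every sequence z
                prod = matmul(P(x, z), P(z, y))
                for i in range(lengths[x]):
                    for j in range(lengths[y]):
                        total[i][j] += prod[i][j]
            updated[(x, y)] = [[v / len(names) for v in row] for row in total]
    return updated
```

With only two sequences the transformation leaves the matrix unchanged; with three, a weak direct x-y match is reinforced when both x and y align confidently to the third sequence.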
There are currently no Pecan data for this clade.
EPO Multiple Alignment Analysis
The new EPO (Enredo, Pecan, Ortheus) pipeline is a three-step pipeline for whole-genome multiple alignments. Enredo produces colinear segments from extant genomes, handling rearrangements, deletions and duplications. Pecan, as described above, is used to align these segments. Finally, Ortheus is used to create genome-wide ancestral sequence reconstructions. Further details on these methods can be found in:
- Enredo and Pecan: Genome-wide mammalian consistency-based multiple alignment with paralogs
- Genome-wide nucleotide-level mammalian ancestor reconstruction
The high-coverage eutherian mammal alignments were generated using the EPO pipeline.
Each alignment set can be retrieved through the Compara API with Bio::EnsEMBL::Compara::DBSQL::MethodLinkSpeciesSetAdaptor, given the method_link_type and either the list of species or the species_set_name.
There is currently no EPO analysis for this clade.
Ancestral sequences are inferred from the EPO multiple alignments using Ortheus, a probabilistic method for inferring ancestor (a.k.a. tree) alignments. The main contribution of Ortheus is a phylogenetic model that incorporates gaps, which allows insertion and deletion events to be inferred. An ancestral sequence is predicted for each internal node of the phylogenetic tree relating the sequences, and each one is named after the extant species descended from it. For example, a sequence named Hsap, Ptro, Mmul corresponds to the ancestor of the Homo sapiens, Pan troglodytes and Macaca mulatta genomes.
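The naming scheme can be sketched in Python. The abbreviation rule (genus initial plus the first three letters of the species epithet) is inferred from the Hsap/Ptro/Mmul example, and the `_` separator, the nested-tuple tree format and the function names are all assumptions; the actual names stored in Ensembl may be formatted differently.

```python
def abbreviate(species):
    """'Homo sapiens' -> 'Hsap' (assumed rule: genus initial + first 3 letters of epithet)."""
    genus, epithet = species.split()[:2]
    return genus[0].upper() + epithet[:3].lower()

def leaves(tree):
    """Extant species below a node; a tree is a species string or a tuple of subtrees."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]

def name_ancestors(tree, sep="_"):
    """Name every internal node after its descendant species, in post-order."""
    names = []

    def visit(node):
        if isinstance(node, str):
            return  # leaves are extant genomes, not ancestors
        for child in node:
            visit(child)
        names.append(sep.join(abbreviate(s) for s in leaves(node)))

    visit(tree)
    return names
```

For the tree `(("Homo sapiens", "Pan troglodytes"), "Macaca mulatta")` this yields one name for the human-chimpanzee ancestor and one for the root, `Hsap_Ptro_Mmul`.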
Additionally, we use GERP (Cooper GM et al., Genome Res., 2005;15:901-913) to calculate conservation scores and to call constrained elements on the 31-way and 12-way multiple alignments. Conservation scores are estimated column by column. Constrained elements are stretches of the multiple alignment in which the sequences are highly conserved according to these scores.
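GERP's per-column score measures "rejected substitutions": the number of substitutions expected under neutral evolution minus the number actually observed in that column. Turning such scores into constrained elements can be sketched, in a deliberately simplified form, as finding maximal runs of high-scoring columns. The threshold and minimum length below are hypothetical values, and GERP's own element-calling procedure is more sophisticated than this threshold rule.

```python
def constrained_elements(scores, threshold=2.0, min_length=4):
    """Half-open (start, end) column ranges where every score exceeds `threshold`
    and the run is at least `min_length` columns long (simplified sketch)."""
    elements = []
    start = None
    # A -inf sentinel guarantees that a run reaching the last column is closed.
    for i, s in enumerate(list(scores) + [float("-inf")]):
        if s > threshold:
            if start is None:
                start = i  # a conserved run begins
        elif start is not None:
            if i - start >= min_length:
                elements.append((start, i))
            start = None
    return elements
```

For example, the scores `[0.1, 3, 3.5, 2.5, 4, -1, 3, 3, 0, 5, 5, 5, 5]` give two elements, columns 1-4 and 9-12; the two-column run at 6-7 is too short.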
There is currently no conservation data available for this clade.
Synteny in Ensembl Genomes is calculated using a modified form of the Ensembl algorithm, adapted to the size of the genomes displayed. The source for synteny generation can be a pairwise analysis (such as tblat-net) or a multiple sequence alignment (as generated by Pecan).
The algorithm looks for stretches in which the alignment blocks are in synteny, in two phases. In the first phase, syntenic alignments that are less than 1.5 kbp apart are grouped. In the second phase, groups that are in synteny are linked, provided that no more than two non-syntenic groups lie between them and they are less than 3 kbp apart.
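The two phases can be sketched in Python. This is a minimal illustration under several assumptions: alignment hits are described by a hypothetical `Hit` record on the reference genome, two hits are treated as "in synteny" when they map to the same chromosome and strand of the other genome (the real algorithm may also check ordering), and non-syntenic groups skipped during linking are simply dropped as noise. Only the 1.5 kbp, 3 kbp and two-group limits come from the description above.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """An alignment hit on the reference genome (hypothetical representation)."""
    ref_start: int
    ref_end: int
    target: str   # chromosome in the other genome
    strand: int   # +1 or -1

def same_synteny(a, b):
    """Assumed definition: same target chromosome and same strand."""
    return a.target == b.target and a.strand == b.strand

def group_hits(hits, max_gap=1500):
    """Phase 1: group consecutive syntenic hits that are less than max_gap apart."""
    groups = []
    for h in sorted(hits, key=lambda h: h.ref_start):
        if (groups and same_synteny(groups[-1][-1], h)
                and h.ref_start - groups[-1][-1].ref_end < max_gap):
            groups[-1].append(h)
        else:
            groups.append([h])
    return groups

def link_groups(groups, max_gap=3000, max_skipped=2):
    """Phase 2: link syntenic groups separated by at most max_skipped
    non-syntenic groups and less than max_gap on the reference."""
    regions = []
    i = 0
    while i < len(groups):
        region = list(groups[i])
        nxt = i + 1
        while True:
            tail, linked = region[-1], None
            for k in range(nxt, min(nxt + max_skipped + 1, len(groups))):
                head = groups[k][0]
                if head.ref_start - tail.ref_end >= max_gap:
                    break  # groups are sorted, so later ones are even farther away
                if same_synteny(tail, head):
                    linked = k
                    break
            if linked is None:
                break
            region.extend(groups[linked])  # skipped non-syntenic groups are dropped
            nxt = linked + 1
        regions.append(region)
        i = nxt
    return regions
```

In this sketch a short hit to another chromosome sitting between two blocks on the same target chromosome does not break the syntenic region, while a syntenic hit more than 3 kbp away starts a new one.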
There is currently no synteny data available for this clade.