Way of representing sequence similarity without DP Make search matrix as for DP, but locally represent sequence similarity by averaging using a sliding window. A dot is placed in every place where row and column contain same character.

Home / Summaries / Class notes - Algorithms in Sequence Analysis / alignment-gap-penalty

Pairwise alignment II

Q: What is a problem with the simple dynamic programming approach?

It can only do linear gap penalties The algorithm can be extended to deal with affine gap penalties

13 important questions on Pairwise alignment II

What is a problem with the simple dynamic programming approach?

It can only do linear gap penalties
The algorithm can be extended to deal with affine gap penalties

How can the simple dynamic programming approach be extended to deal with affine gap penalties?

Affine gap penalty:
g(k) = b + ak

b = penalty for opening a gap
a = penalty for extending a gap

We look back to not only the standard three cells (above, diagonal, left), but to four more.
With affine, it matters if the gap is the starter of one or more gaps next to each other, or if it follows after an already existing gap.

(schermopname 20)

What is the time and memory complexion of DP?

Time complexity: O (n^2) for aligning two sequences of n residues, you need to perform n^2 algorithmic steps: matrix size

different seq lenghts: (m * n)

Memory space complexity: also O(n^2): you need a square search matrix of n by n containing n^2 cells

How does an alternative DP approach work, suitable for all types of gap penalties? This alternative approach is called general DP algorithm.

Schermopname 21.

Complexity is increased from O(n^2) to O(n^3).
The gap length is known exactly, therefore any gap penalty regime can be applied.
*check this again. Slide 30, 31

Name three situations when you would want to use semiglobal alignment. Also name a risk.

Finding a gene in a genome
Placing marker onto a chromosome
When one sequence is much longer than the other one

Risk: if the gap penalty is high, really bad alignment for divergent sequences.

Where do you find the alignment score and start of the traceback for global, semiglobal and local alignment?

Global: bottom right cell
Semiglobal: max value of bottom row or most right column
Local: max value of the whole matrix

Why/in which cases would you use local alignment? (2)

Parts of sequence diverge faster

evolutionary pressure is not present on whole sequence
consider high vs low conserved regions

Proteins have modular construction:

sharing domains between sequences
so eg aligned region occurs twice in sequence x, once in seq y

How is the equation changed for local alignment?

Fourth option: 0. So instead of a negative value, a hard zero is introduced. When you reach this during traceback, stop the alignment.
Adding the 0 also works for alternative DP for all gap penalties: also add 0 as option for the equation (see slide 57).

How can you find the second best alignment?

Best alignment surrounded by clump: clump of alignment = alignments sharing at least one aligned pair.
Don't let any matched pair contribute to the next alignment:

clear the best alignment (set all values of this path to 0)
recalculate the clump

Now find the next best alignment and repeat this
This is called Waterman-Eggert algorithm

Useful for eg protein with repeats, those domains may differ slightly but you want to align them all.

What are two pitfalls of alignments?

Alignment is often not a reconstruction of evolution: common ancestor is usally extinct
Repeats: matches to the same fragment!

What does protein research tell us about requirements for alignment techniques? (4)

Alignments need to be able to skip secondary structural elements to complete domains (ie putting gaps opposite these features in shorter sequences).
Gap situations can be even more complex in cases of alternatively spliced protein variants.
Depending on chosen gap penalty, algorithm might have difficulty with making such lang gaps --> incorrent alignment.
Alignments are only meaningful for homologous sequences.

What are dot plots?

Way of representing sequence similarity without DP
Make search matrix as for DP, but locally represent sequence similarity by averaging using a sliding window.
A dot is placed in every place where row and column contain same character.

Name 4 features to quantify sequence similarity.

Sequence identity: number of identical exchanges per unit length
Raw alignment score
Sequence similarity (alignment score normalized to a maximum possible)
Alignment score normalised to a randomly expected situation

Z-score

The question on the page originate from the summary of the following study material:

Algorithms in Sequence Analysis

View summary

A unique study and practice tool
Never study anything twice again
Get the grades you hope for
100% sure, 100% understanding

Remember faster, study better. Scientifically proven.

Pairwise alignment II

13 important questions on Pairwise alignment II

What is a problem with the simple dynamic programming approach?

How can the simple dynamic programming approach be extended to deal with affine gap penalties?

What is the time and memory complexion of DP?

How does an alternative DP approach work, suitable for all types of gap penalties? This alternative approach is called general DP algorithm.

Name three situations when you would want to use semiglobal alignment. Also name a risk.

Where do you find the alignment score and start of the traceback for global, semiglobal and local alignment?

Why/in which cases would you use local alignment? (2)

How is the equation changed for local alignment?

How can you find the second best alignment?

What are two pitfalls of alignments?

What does protein research tell us about requirements for alignment techniques? (4)

What are dot plots?

Name 4 features to quantify sequence similarity.

Summaries related to Intro + Pairwise alignment

Class notes - Algorithms in Sequence Analysis

Syllabus Introduction to systems biology

Structural Bioinformatics

Class notes - Biosystems Data Analysis

Indian Economics

Global politics

Essentials of international relations

Behavioral genetics

Management and organisational behaviour

Follow Up Engels idioom 4/5 H

International Business

Marketing fundamentals