Pairwise alignment II

13 important questions on Pairwise alignment II

What is a problem with the simple dynamic programming approach?

  • It can only do linear gap penalties
  • The algorithm can be extended to deal with affine gap penalties

How can the simple dynamic programming approach be extended to deal with affine gap penalties?

Affine gap penalty:
g(k) = b + ak

b = penalty for opening a gap
a = penalty for extending a gap

We look back to not only the standard three cells (above, diagonal, left), but to four more.
With affine, it matters if the gap is the starter of one or more gaps next to each other, or if it follows after an already existing gap.  

(schermopname 20)

What is the time and memory complexion of DP?

  • Time complexity: O (n^2) for aligning two sequences of n residues, you need to perform n^2 algorithmic steps: matrix size
    • different seq lenghts: (m * n)
  • Memory space complexity: also O(n^2): you need a square search matrix of n by n containing n^2 cells
  • Higher grades + faster learning
  • Never study anything twice
  • 100% sure, 100% understanding
Discover Study Smart

How does an alternative DP approach work, suitable for all types of gap penalties? This alternative approach is called general DP algorithm.

Schermopname 21.

Complexity is increased from O(n^2) to O(n^3).
The gap length is known exactly, therefore any gap penalty regime can be applied.
*check this again. Slide 30, 31

Name three situations when you would want to use semiglobal alignment. Also name a risk.

  • Finding a gene in a genome
  • Placing marker onto a chromosome
  • When one sequence is much longer than the other one

Risk: if the gap penalty is high, really bad alignment for divergent sequences.

Where do you find the alignment score and start of the traceback for global, semiglobal and local alignment?

  • Global: bottom right cell
  • Semiglobal: max value of bottom row or most right column
  • Local: max value of the whole matrix

Why/in which cases would you use local alignment? (2)

  • Parts of sequence diverge faster
    • evolutionary pressure is not present on whole sequence
    • consider high vs low conserved regions
  • Proteins have modular construction:
    • sharing domains between sequences
    • so eg aligned region occurs twice in sequence x, once in seq y

How is the equation changed for local alignment?

Fourth option: 0. So instead of a negative value, a hard zero is introduced. When you reach this during traceback, stop the alignment.
Adding the 0 also works for alternative DP for all gap penalties: also add 0 as option for the equation (see slide 57).

How can you find the second best alignment?

  • Best alignment surrounded by clump: clump of alignment = alignments sharing at least one aligned pair.
  • Don't let any matched pair contribute to the next alignment:
    • clear the best alignment (set all values of this path to 0)
    • recalculate the clump
  • Now find the next best alignment and repeat this
  • This is called Waterman-Eggert algorithm


Useful for eg protein with repeats, those domains may differ slightly but you want to align them all.

What are two pitfalls of alignments?

  • Alignment is often not a reconstruction of evolution: common ancestor is usally extinct
  • Repeats: matches to the same fragment!

What does protein research tell us about requirements for alignment techniques? (4)

  • Alignments need to be able to skip secondary structural elements to complete domains (ie putting gaps opposite these features in shorter sequences).
  • Gap situations can be even more complex in cases of alternatively spliced protein variants.
  • Depending on chosen gap penalty, algorithm might have difficulty with making such lang gaps --> incorrent alignment.
  • Alignments are only meaningful for homologous sequences.

What are dot plots?

  • Way of representing sequence similarity without DP
  • Make search matrix as for DP, but locally represent sequence similarity by averaging using a sliding window.
  • A dot is placed in every place where row and column contain same character.

Name 4 features to quantify sequence similarity.

  • Sequence identity: number of identical exchanges per unit length
  • Raw alignment score
  • Sequence similarity (alignment score normalized to a maximum possible)
  • Alignment score normalised to a randomly expected situation
    • Z-score

The question on the page originate from the summary of the following study material:

  • A unique study and practice tool
  • Never study anything twice again
  • Get the grades you hope for
  • 100% sure, 100% understanding
Remember faster, study better. Scientifically proven.
Trustpilot Logo