Intro + Pairwise alignment

10 important questions on Intro + Pairwise alignment

If we want to find the function of a newly sequenced gene with a 'lazy approach' (only bioinformatics, no biological experiments), how would we do this?

  • Find a set of protein sequences similar to the unknown sequence.
  • Identify similarities and differences.
  • For long protein sequences: first identify domains and then use corresponding subsequences.

Name 3 things we look at for reconstructing evolutionary and functional relationships.

  • Based on sequence
    • identity (simplest)
    • similarity
  • Homology (ultimate goal)
  • Other information such as 3D structure
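
As a rough illustration of the difference between identity and similarity, the sketch below scores two already-aligned protein sequences. The amino acid grouping in GROUPS and the function name are assumptions made for this example only; real tools derive similarity from a substitution matrix rather than from fixed groups.

```python
# Hypothetical grouping of amino acids by physicochemical class.
# This is an assumption for illustration, not a standard definition.
GROUPS = ["AVLIMC", "FWY", "ST", "NQ", "DE", "KRH", "G", "P"]
GROUP_OF = {aa: idx for idx, group in enumerate(GROUPS) for aa in group}


def identity_and_similarity(a: str, b: str) -> tuple[float, float]:
    """Return (fraction identical, fraction similar) over ungapped columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not cols:
        return 0.0, 0.0
    identical = sum(x == y for x, y in cols)
    similar = sum(
        x == y or (x in GROUP_OF and y in GROUP_OF and GROUP_OF[x] == GROUP_OF[y])
        for x, y in cols
    )
    return identical / len(cols), similar / len(cols)


ident, simil = identity_and_similarity("ACDE-GK", "AVDDRPK")
print(f"identity={ident:.2f} similarity={simil:.2f}")
```

Identical columns always count as similar, so similarity is never lower than identity. Neither number by itself establishes homology.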

What did a study on 3D structure and protein evolution show?

The distance from the active site determines the rate of evolution. Residues close to the active site evolve slowly; residues far from it evolve quickly.

What is a frame shift mutation?

An insertion or deletion that changes the reading frame, shifting all downstream codons. It often results in a shortened protein, which is often nonfunctional.
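
A small sketch of how a single inserted base shifts every downstream codon. The DNA string and helper names are made up for this example; here the insertion happens to create a premature stop codon, which is one way a frame shift can shorten a protein.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(dna: str) -> list[str]:
    """Split a DNA string into complete triplets, reading from position 0."""
    return [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]


def coding_length(dna: str) -> int:
    """Number of codons read before the first stop codon."""
    count = 0
    for codon in codons(dna):
        if codon in STOP_CODONS:
            break
        count += 1
    return count


original = "ATGGCCAAGTTTGGCTAA"            # invented example sequence
mutant = original[:6] + "T" + original[6:]  # insert one base after codon 2

print(codons(original))  # ['ATG', 'GCC', 'AAG', 'TTT', 'GGC', 'TAA']
print(codons(mutant))    # ['ATG', 'GCC', 'TAA', 'GTT', 'TGG', 'CTA']
print(coding_length(original), coding_length(mutant))  # 5 2
```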

What is a DNA expression mutation?

A mutation that does not change the protein itself but its expression, e.g. where a protein is made and how much of it is made. This can lead to proteins being made at the wrong time or in the wrong cell type, or to under- or overproduction.

What can you encounter when reconstructing "evolution" with sequences?

See slide.

Name conditions for aligning sequences.

  • Sequences should be related through divergent evolution
    • so they should be homologous
    • and preferably orthologous:
    • paralogous sequences can become too distant for correct alignment (think of BAD)
  • Analogous sequences should not be aligned!
    • Sometimes a short functional motif can be detected.

What should an alignment scoring method do? How is the alignment score defined?

  • Produce reasonable alignments
  • Must assign scores to:
    • substitutions (match/mismatch)
      • DNA
      • Proteins
    • Gap penalties
      • linear
      • affine
      • concave


Alignment score is defined as the summed score of all alignment columns.
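
A minimal sketch of scoring a given alignment with the three gap penalty shapes named above. All parameter values, the match/mismatch scores, and the function names are assumptions chosen for illustration. With affine and concave penalties the cost of a gap depends on its length, so the sketch scores each run of gap columns as a unit rather than column by column.

```python
import math


def gap_linear(k: int, d: float = 2.0) -> float:
    """Linear: every gap position costs the same."""
    return -d * k


def gap_affine(k: int, gap_open: float = 5.0, gap_extend: float = 1.0) -> float:
    """Affine: opening a gap is expensive, extending it is cheaper."""
    return -(gap_open + gap_extend * (k - 1))


def gap_concave(k: int, gap_open: float = 5.0, scale: float = 1.0) -> float:
    """Concave: each extra position costs less than the one before (log form assumed)."""
    return -(gap_open + scale * math.log(k))


def alignment_score(a: str, b: str, gap_cost, match: float = 1.0,
                    mismatch: float = -1.0) -> float:
    """Sum substitution scores over columns and gap costs over gap runs."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score, run, run_row = 0.0, 0, None
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            raise ValueError("column with two gaps")
        if x == "-" or y == "-":
            row = 0 if x == "-" else 1
            if run and row != run_row:   # gap switches sequence: close the run
                score += gap_cost(run)
                run = 0
            run_row = row
            run += 1
        else:
            if run:                      # a gap run just ended
                score += gap_cost(run)
                run = 0
            score += match if x == y else mismatch
    if run:
        score += gap_cost(run)
    return score


for name, fn in [("linear", gap_linear), ("affine", gap_affine), ("concave", gap_concave)]:
    print(name, alignment_score("ACGTTA", "AC--TG", fn))
```

The same alignment gets a different score under each penalty, which is why the choice of gap model changes which alignment comes out as best.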

Explain the concept of combinatorial explosion and the solution we use.

  • 1 gap in 1 seq: n+1 possibilities for alignment
  • 2 gaps in 1 seq: (n + 1)n
  • 3 gaps in 1 seq: (n + 1)n(n - 1)
  • *check formula later

explodes!
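
To get a feel for how quickly the search space grows, the sketch below counts every possible gapped global alignment of two sequences, where each column is either a pair of residues, a gap in the first sequence, or a gap in the second. This is a different quantity from the per-gap counts listed above and does not verify those formulas; the function name is an assumption for this example.

```python
def count_alignments(m: int, n: int) -> int:
    """Count gapped global alignments of sequences with lengths m and n.

    Each alignment ends in one of three column types, which gives the
    recurrence C(i, j) = C(i-1, j) + C(i, j-1) + C(i-1, j-1).
    """
    table = [[1] * (n + 1) for _ in range(m + 1)]  # an empty sequence aligns one way
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = table[i - 1][j] + table[i][j - 1] + table[i - 1][j - 1]
    return table[m][n]


for length in (1, 2, 3, 4, 10):
    print(length, count_alignments(length, length))
```

Two sequences of length 10 already allow more than eight million alignments, so enumerating and scoring all of them is not feasible for realistic lengths.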

Solution = dynamic programming:
  • Breaks the alignment problem up into smaller subproblems and solves them iteratively.
  • Alignment is modeled as a Markov process. All sequence positions are treated as independent and identically distributed.
  • Chances of sequence events are independent
    • Therefore probabilities per aligned position are multiplied
    • Amino acid (AA) substitution matrices contain log odds, so the scores are summed instead
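
The dynamic programming idea can be sketched with a Needleman-Wunsch style global alignment score. The match, mismatch, and gap values below are assumptions for illustration (a real protein alignment would look up log-odds scores in a substitution matrix), and the sketch returns only the optimal score, not the alignment itself.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # F[i][j] holds the best score for aligning a[:i] with b[:j].
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, cols):
        F[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                F[i - 1][j] + gap,               # gap in b
                F[i][j - 1] + gap,               # gap in a
            )
    return F[-1][-1]


print(needleman_wunsch_score("ACGT", "AGT"))
```

Each cell reuses three already-solved subproblems, so the work grows with the product of the two sequence lengths instead of with the number of possible alignments. Because column scores are summed, the best score for a prefix pair does not depend on how the rest of the sequences are aligned.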

Name 2 alternative alignment methods (so not global, semiglobal, or local).

  • De novo sequencing
    • Tracks overlaps between millions of short sequence reads coming from a sequencing experiment. With N reads, on the order of N^2 overlap comparisons are required.
  • Reference-based sequencing
    • Aligns short reads against a reference genome.


These algorithms are not based on evolutionary considerations per se, but match (near-)identical fragments.
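
Both ideas can be sketched naively with exact string matching. The reads, the reference string, and the function names are invented for this example; real assemblers and read mappers use indexed data structures and tolerate mismatches, which this sketch does not attempt.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def all_overlaps(reads: list[str], min_len: int = 3) -> dict[tuple[int, int], int]:
    """Compare every ordered pair of reads: N * (N - 1) comparisons for N reads."""
    overlaps = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i == j:
                continue
            k = suffix_prefix_overlap(a, b, min_len)
            if k > 0:
                overlaps[(i, j)] = k
    return overlaps


def map_read(read: str, reference: str) -> list[int]:
    """All start positions where the read matches the reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


reference = "ATGGCGTACGTTAGC"                  # invented reference
reads = ["ATGGCGTA", "CGTACGTT", "CGTTAGC"]    # invented reads

print(all_overlaps(reads))                     # {(0, 1): 4, (1, 2): 4}
print([map_read(r, reference) for r in reads]) # [[0], [4], [8]]
```

The all-pairs loop is where the quadratic cost of the de novo approach comes from, while mapping against a reference needs only one lookup per read.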
