Intro + Pairwise alignment
10 important questions on Intro + Pairwise alignment
If we want to find the function of a newly sequenced gene with a 'lazy approach' (only bioinformatics, no biological experiments), how would we do this?
- Find a set of protein sequences similar to the unknown sequence.
- Identify similarities and differences.
- For long protein sequences: first identify domains and then use corresponding subsequences.
Name 3 things we look at for reconstructing evolutionary and functional relationships.
- Based on sequence
- identity (simplest)
- similarity
- Homology (ultimate goal)
- Other information such as 3D structure
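As a rough illustration of the difference between identity and similarity, the sketch below computes both for two already-aligned protein sequences. The residue groups in SIMILAR_GROUPS are a hypothetical, simplified grouping chosen for this example (not a standard substitution matrix), and the choice to ignore gap columns in the denominator is one of several conventions.

```python
# Hypothetical, simplified groups of "similar" amino acids (an assumption
# for illustration only; real tools use substitution matrices instead).
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "STNQ"]


def _aligned_pairs(a, b):
    """Return residue pairs from columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]


def percent_identity(a, b):
    """Percentage of ungapped columns with identical residues."""
    pairs = _aligned_pairs(a, b)
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def percent_similarity(a, b):
    """Percentage of ungapped columns with identical or same-group residues."""
    pairs = _aligned_pairs(a, b)
    if not pairs:
        return 0.0

    def similar(x, y):
        return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)

    return 100.0 * sum(similar(x, y) for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    print(percent_identity("MKV-LD", "MRVALE"))
    print(percent_similarity("MKV-LD", "MRVALE"))
```

Neither number proves homology: identity and similarity are measurements, while homology is a conclusion about shared ancestry.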
What did a study on 3D structure and protein evolution show?
What is a frame shift mutation?
What is a DNA expression mutation?
What can you encounter when reconstructing "evolution" with sequences?
Name conditions for aligning sequences.
- Sequences should be related through divergent evolution
- so they should be homologous
- and preferably orthologous:
- paralogous sequences can become too distant for correct alignment (think of BAD)
- Analogous sequences should not be aligned!
- Sometimes a short functional motif can be detected.
What should an alignment scoring method do? How is the alignment score defined?
- Produce reasonable alignments
- Must assign scores to:
- substitutions (match/mismatch)
- DNA
- Proteins
- Gap penalties
- linear
- affine
- concave
Alignment score is defined as the summed score of all alignment columns.
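The definition above can be sketched as a small function that sums a score per alignment column. The match, mismatch, and gap values are arbitrary assumptions for illustration; a real protein scoring scheme would look substitutions up in an amino acid matrix. The gap handling is affine (opening a gap costs gap_open, each further gap position costs gap_extend); setting both to the same value gives a linear penalty.

```python
def score_alignment(a, b, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Sum the scores of all columns of a pairwise alignment.

    a and b are aligned strings of equal length that use '-' for gaps.
    All scoring values are illustrative assumptions.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    previous_gap_row = None  # which sequence had a gap in the previous column
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot consist of two gaps")
        if x == "-" or y == "-":
            row = "a" if x == "-" else "b"
            # Continuing a gap in the same sequence is cheaper than opening one.
            total += gap_extend if previous_gap_row == row else gap_open
            previous_gap_row = row
        else:
            total += match if x == y else mismatch
            previous_gap_row = None
    return total


if __name__ == "__main__":
    print(score_alignment("AC--GT", "ACTTGT"))
    # Linear gap penalty: opening and extending cost the same.
    print(score_alignment("GCATG-CG", "G-ATTACA", gap_open=-1, gap_extend=-1))
```

A concave penalty would need the full gap length before scoring, so it does not fit this simple column-by-column loop.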
Explain the concept of combinatorial explosion and the solution we use.
- 1 gap in 1 seq: n+1 possibilities for alignment
- 2 gaps in 1 seq: (n + 1)n
- 3 gaps in 1 seq: (n + 1)n(n - 1)
- *check formula later
- The number of possible alignments explodes!
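To get a feeling for how fast this grows, the sketch below counts all possible global alignments of two sequences of lengths m and n. Note that this is a different way of counting than the per-gap formulas above: it uses the standard recurrence in which every alignment ends in either a residue pair, a gap in the first sequence, or a gap in the second sequence, and it treats alignments that differ only in the order of adjacent gaps as distinct.

```python
def count_alignments(m, n):
    """Count global alignments of sequences with lengths m and n.

    The last column is a residue pair, a gap in sequence 1, or a gap in
    sequence 2, so f(m, n) = f(m-1, n-1) + f(m-1, n) + f(m, n-1),
    with f(0, k) = f(k, 0) = 1.
    """
    table = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = table[i - 1][j - 1] + table[i - 1][j] + table[i][j - 1]
    return table[m][n]


if __name__ == "__main__":
    for length in (1, 2, 3, 10, 50):
        print(length, count_alignments(length, length))
```

Even for two sequences of length 10 the count is already in the millions, which is why enumerating all alignments is not an option.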
Solution = dynamic programming:
- breaks the alignment problem up into smaller subproblems and solves them iteratively.
- Alignment is modeled as a Markov process. All sequence positions are seen as independent and identically distributed.
- Chances of sequence events are independent
- Therefore probabilities per aligned position are multiplied
- Amino acid matrices contain log odds, so the scores are summed instead of multiplied
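A minimal sketch of the dynamic programming idea, assuming a Needleman-Wunsch style global alignment with a linear gap penalty and arbitrary match/mismatch values (these numbers are illustrative, not taken from the notes). Each cell holds the best score for aligning two prefixes, and is computed from three already-solved smaller subproblems. Because column scores are added, the table works directly with log-odds style scores.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, cols):
        dp[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(
                dp[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                dp[i - 1][j] + gap,               # gap in b
                dp[i][j - 1] + gap,               # gap in a
            )
    return dp[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("GCATGCG", "GATTACA"))
```

The table has (len(a) + 1) x (len(b) + 1) cells, so the work grows with the product of the sequence lengths rather than with the number of possible alignments.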
Name 2 alternative alignment methods (so not global, semiglobal, or local).
- De Novo sequencing
- tracks overlaps between millions of short sequence reads coming from a sequencing experiment. With N reads, on the order of N^2 overlap comparisons are required.
- Reference based sequencing
- aligns short reads against reference genome
These algorithms are not based on evolutionary considerations per se, but match (near-)identical fragments.
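Both ideas can be sketched naively, assuming exact matches only. The function names, the toy reads, and the min_length threshold are hypothetical choices for this example; real assemblers and read mappers use indexed data structures and tolerate mismatches. The all-against-all loop makes the N^2 cost of overlap detection visible.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if none)."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def all_overlaps(reads, min_length=3):
    """Compare every ordered pair of reads: N * (N - 1) comparisons."""
    found = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                length = overlap(a, b, min_length)
                if length:
                    found[(i, j)] = length
    return found


def map_read(read, reference):
    """Return all start positions where read matches the reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


if __name__ == "__main__":
    reads = ["ACGTAC", "TACGGA", "GGATTT"]
    print(all_overlaps(reads))
    print(map_read("TACG", "ACGTACGGATTT"))
```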