Structure alignment
12 important questions on Structure alignment
Explain the difference between structural superposition and structural alignment.
Structural superposition:
- input = 2 proteins with their atomic coordinates (2 PDB files) + a mapping indicating which residues correspond (an alignment)
- output = 2 superimposed proteins (typically given as a rotation and translation of the coordinate frame)
Structural alignment:
- input = 2 proteins with their atomic coordinates (2 PDB files)
- output = An alignment between two protein structures, based on the structure alone
What is the goal in superposition? And how is this achieved?
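--> minimize the RMSD between corresponding atoms by translating and rotating one structure onto the other
---> translate so that the centers of mass fall onto each other, then find the rotation that minimizes the RMSD (an eigenvalue problem, solvable with the Jacobi algorithm).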
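A minimal pure-Python sketch of these two steps, assuming equal atom weights (so the "center of mass" is the plain centroid). It casts the optimal rotation as a 4x4 symmetric eigenvalue problem using Horn's quaternion formulation, which is one possible formulation (the course may use another, such as Kabsch's), and solves it with a hand-written cyclic Jacobi routine. The function names are hypothetical. Only the minimal RMSD is returned; the rotation itself would follow from the eigenvector of the largest eigenvalue, which this sketch does not construct.

```python
import math


def jacobi_eigenvalues(a, max_sweeps=50):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [list(row) for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q)
        total = sum(v * v for row in a for v in row)
        if off <= 1e-24 * total:  # off-diagonal part negligible: converged
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # choose the rotation that zeroes a[p][q]
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def centered(points):
    """Translate points so their (unweighted) centroid lies at the origin."""
    n = len(points)
    c = [sum(p[d] for p in points) / n for d in range(3)]
    return [[p[d] - c[d] for d in range(3)] for p in points]


def superposition_rmsd(moving, target):
    """Minimal RMSD between corresponding points after optimal translation + rotation."""
    if len(moving) != len(target) or not moving:
        raise ValueError("need the same, non-zero number of corresponding points")
    x, y = centered(moving), centered(target)  # step 1: centroids coincide
    # correlation matrix s[a][b] = sum_i x_i[a] * y_i[b]
    s = [[sum(xi[a] * yi[b] for xi, yi in zip(x, y)) for b in range(3)] for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric 4x4 matrix; its largest eigenvalue corresponds to the best rotation
    horn = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam_max = max(jacobi_eigenvalues(horn))  # step 2: the eigenvalue problem
    g = sum(v * v for p in x + y for v in p)  # sum of squared norms of both sets
    return math.sqrt(max(g - 2.0 * lam_max, 0.0) / len(x))
```

For a rigidly moved copy the result is essentially 0; for the mirror image of a non-planar point set it stays above 0, because no rotation can undo a reflection.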
What if we want to superimpose two proteins with different sequences?
- We 'only' need an alignment between the two structures.
- Problem: find an optimal alignment of residues, using the structures (coordinates) of two proteins.
What is the complexity of superposition, and of structural alignment?
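- Superposition: O(N) in the number of aligned atoms, i.e. polynomial (class P)
- Structural alignment: NP-hard for common scoring functions (no polynomial-time algorithm is known)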
For structural alignment, what are the problems of representation, optimization and scoring?
- Representation: How to represent the input structures in a coordinate-independent space suitable for alignment.
- Optimization: How to sample the space of possible alignment solutions between the structures.
- Scoring: How to score a given alignment and determine its statistical significance.
For sequence alignment, what are the solutions to representation, optimization and scoring?
- Representation: Sequence (+ scoring matrix indicating sequence similarity)
- Optimization (finding the maximal alignment score): dynamic programming, Needleman-Wunsch (global) / Smith-Waterman (local)
- Scoring: Scoring Matrix → alignment score → e-value (BLAST)
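To make the dynamic-programming step concrete, a minimal Needleman-Wunsch sketch. The scoring (match +1, mismatch -1, linear gap -1) is a hypothetical stand-in for a real substitution matrix such as BLOSUM62, and only the score is returned, without traceback.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of strings a and b with a linear gap penalty."""
    n, m = len(a), len(b)
    # f[i][j] = best score for aligning a[:i] with b[:j]
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        f[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = f[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # keep only the best of the three ways to reach cell (i, j)
            f[i][j] = max(diag, f[i - 1][j] + gap, f[i][j - 1] + gap)
    return f[n][m]
```

For example, `needleman_wunsch("ACGT", "AGT")` aligns A-C-G-T with A-(gap)-G-T: three matches and one gap give a score of 2.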
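Explain the concept of SSAP.
- SSAP describes each residue by its "view": the vectors from its C-beta atom to the C-beta atoms of all other residues
- These vectors are expressed in a local reference frame defined by the residue's backbone, so the view does not depend on the global coordinates
- Using vectors instead of only distances also adds directional information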
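A sketch of how such a view could be computed. Each residue is assumed to be a hypothetical (N, CA, C, CB) tuple of coordinates, and the local frame (x along CA→C, z normal to the N-CA-C plane) is an illustrative choice, not necessarily the exact frame SSAP defines.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def local_frame(n_atom, ca, c_atom):
    """Hypothetical right-handed orthonormal frame built from one residue's backbone."""
    x = unit(sub(c_atom, ca))            # along CA -> C
    z = unit(cross(x, sub(n_atom, ca)))  # normal to the N-CA-C plane
    y = cross(z, x)                      # completes the frame
    return x, y, z


def view(residues, i):
    """Vectors from residue i's C-beta to every other C-beta, in residue i's local frame.

    residues: list of (N, CA, C, CB) coordinate tuples (assumed input format).
    """
    n_atom, ca, c_atom, cb_i = residues[i]
    axes = local_frame(n_atom, ca, c_atom)
    result = []
    for k, (_, _, _, cb_k) in enumerate(residues):
        if k == i:
            continue
        v = sub(cb_k, cb_i)
        result.append(tuple(dot(v, axis) for axis in axes))  # coordinates in the local frame
    return result
```

Because every vector is expressed in its residue's own frame, rotating or translating the whole protein leaves each view unchanged, while the vectors still record direction and not just distance.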
Explain the concept of double dynamic programming.
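- Low level matrix: keep one pair of residues (i, j) fixed as aligned and compare the views of i and j with dynamic programming
- High level matrix: element (i, j) holds the resulting score of that low level DP; a final DP over the high level matrix gives the structural alignment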
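A toy double dynamic programming sketch under simplifying assumptions: structures are lists of CA coordinates, a residue's view is its list of distances to the other residues (not SSAP's vectors), and `residue_score`, its constants `a` and `b`, and the gap penalties are hypothetical. The low-level path is forced through (i, j) by aligning the residues before and after that pair separately.

```python
import math


def residue_score(d_a, d_b, a=10.0, b=1.0):
    """Hypothetical SSAP-like similarity of two distances: high when they agree."""
    return a / (b + abs(d_a - d_b))


def global_dp(n, m, score, gap):
    """Needleman-Wunsch on an n x m grid; score(k, l) rewards aligning k with l.

    Returns (best score, list of aligned index pairs)."""
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        f[k][0] = -gap * k
    for l in range(1, m + 1):
        f[0][l] = -gap * l
    for k in range(1, n + 1):
        for l in range(1, m + 1):
            f[k][l] = max(f[k - 1][l - 1] + score(k - 1, l - 1),
                          f[k - 1][l] - gap, f[k][l - 1] - gap)
    pairs, k, l = [], n, m  # traceback from the bottom-right corner
    while k > 0 and l > 0:
        if f[k][l] == f[k - 1][l - 1] + score(k - 1, l - 1):
            pairs.append((k - 1, l - 1))
            k, l = k - 1, l - 1
        elif f[k][l] == f[k - 1][l] - gap:
            k -= 1
        else:
            l -= 1
    return f[n][m], pairs[::-1]


def low_level_score(A, B, i, j, gap):
    """Best match of the views of residue i (in A) and j (in B), with i and j kept aligned."""
    def s(k, l):
        return residue_score(math.dist(A[i], A[k]), math.dist(B[j], B[l]))
    before, _ = global_dp(i, j, s, gap)  # residues preceding i and j
    after, _ = global_dp(len(A) - i - 1, len(B) - j - 1,
                         lambda k, l: s(i + 1 + k, j + 1 + l), gap)  # residues following
    return before + s(i, j) + after


def double_dp(A, B, gap_low=1.0, gap_high=1.0):
    """High-level DP over the matrix of low-level scores; returns (score, alignment)."""
    high = [[low_level_score(A, B, i, j, gap_low) for j in range(len(B))]
            for i in range(len(A))]
    return global_dp(len(A), len(B), lambda i, j: high[i][j], gap_high)
```

Aligning a structure with itself (or with a rotated copy) recovers the identity alignment. Filling the high-level matrix costs O(n^4) time here, which is acceptable for a toy example only.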
How is scoring significance determined?
- Scores of optimal alignments between unrelated structures follow (approximately) an extreme value distribution
- Typically one needs a p-value or z-score to indicate the relevance of a structural alignment
- A z-score indicates how many standard deviations an element is from the mean
- Note: this is the same approach BLAST uses
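A sketch of the two statistics, assuming a sample of scores from aligning against unrelated structures is available (the sample itself is hypothetical here). The Gumbel (extreme value) parameters are fitted by the method of moments, using mean = mu + gamma/lambda and variance = pi^2 / (6 lambda^2); tools like BLAST use their own, more careful parameter estimates.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def z_score(score, random_scores):
    """Number of standard deviations the score lies above the mean of random scores."""
    return (score - statistics.mean(random_scores)) / statistics.stdev(random_scores)


def gumbel_fit(random_scores):
    """Method-of-moments fit of a Gumbel distribution; returns (mu, lam)."""
    mean = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)
    lam = math.pi / (sd * math.sqrt(6.0))
    mu = mean - EULER_GAMMA / lam
    return mu, lam


def evd_p_value(score, mu, lam):
    """P(S >= score) = 1 - exp(-exp(-lam * (score - mu))) under the fitted Gumbel."""
    x = -lam * (score - mu)
    if x > 700.0:  # far below mu: p-value is 1 (avoids overflow in math.exp)
        return 1.0
    return -math.expm1(-math.exp(x))
```

A score far above the random scores gets a large z-score and a tiny p-value; at `score == mu` the p-value is 1 - 1/e, about 0.63.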
Structural alignments are often used as a “gold standard” for sequence alignment. Is this problematic?
How does multiple structure alignment work?
Why is sequence alignment easier?
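- Dynamic programming: take the maximum at each step
- Why are we allowed to do this?
- Optimal substructure: if a cell B lies on the optimal path from A to C, then the part of that path from A to B is itself an optimal path from A to B, so optimal scores can be built from the optimal scores of smaller subproblems.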
Structural alignment lacks this property: realigning residues that are close together in sequence may affect the alignment score of pairs much further along the sequence, since such residues may be close in space. We would need to try all (or an exponentially large number of) possible combinations to find the optimal alignment.
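To see why the score does not decompose per cell, consider a hypothetical distance-consistency score (the function name and the 1.0 Å tolerance are illustrative assumptions): it adds 1 for every two aligned pairs whose intra-structure distances agree. Because each term involves two aligned pairs, changing how one residue is aligned changes terms involving residues anywhere along the chain.

```python
import math
from itertools import combinations


def pair_score(A, B, alignment, tol=1.0):
    """+1 for every two aligned pairs (i, j), (k, l) with |d_A(i, k) - d_B(j, l)| <= tol.

    A, B: lists of 3D coordinates; alignment: list of (index in A, index in B) pairs.
    """
    score = 0
    for (i, j), (k, l) in combinations(alignment, 2):
        if abs(math.dist(A[i], A[k]) - math.dist(B[j], B[l])) <= tol:
            score += 1
    return score
```

With `A = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (3.8, 3.8, 0)]` aligned to itself, moving residue 0's partner from position 2 to position 1 (keeping residue 3 aligned to 3) turns the term involving residue 3 from a hit into a miss; a dynamic programming cell for residue 3 could not know about this in advance.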