BIBA - Naïve Bayes - Introduction to K Nearest Neighbors
6 important questions on BIBA - Naïve Bayes - Introduction to K Nearest Neighbors
Give an example for the classification with 5 nearest neighbors in the angry birds plot for the white angry bird.
Blue. With K = 5, the majority of the five closest neighbors are blue, so blue is the predominant class and the white bird is assigned to the blue class.
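The majority vote described above can be sketched in a few lines of Python. The coordinates, the color labels, and the function name `knn_classify` are made up for illustration (the actual plot is not reproduced here), and Euclidean distance is assumed as the closeness measure.

```python
import math
from collections import Counter

def knn_classify(query, points, labels, k=5):
    """Return the majority label among the k points closest to query."""
    # Rank all training points by their Euclidean distance to the query.
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    nearest_labels = [labels[i] for i in order[:k]]
    # The most frequent label among the k nearest neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical bird positions and colors (not taken from the original plot).
birds = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (3.0, 3.0), (3.5, 2.5), (6.0, 6.0), (6.5, 5.5)]
colors = ["blue", "blue", "blue", "red", "red", "red", "red"]
white_bird = (2.0, 2.0)

# The 5 nearest neighbors are 3 blue and 2 red, so the vote goes to blue.
print(knn_classify(white_bird, birds, colors, k=5))
```

An odd K is a common choice with two classes because it avoids tied votes.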
How can you determine the best value for K?
- Based on the similarity / closeness between records
- → Measure the distance between records based on their values; the distance between records ri and rj is written dij
- Distances can be defined in multiple ways
What typical properties are required when determining the best value for K?
- P.1. Non-negative: dij ≥ 0
- P.2. Self-proximity: dii = 0 (dist. from a record to itself)
- P.3. Symmetry: dij = dji
- P.4. Triangle inequality: dij ≤ dik + dkj
- I.e., the distance between any pair cannot exceed the sum of distances between the other two pairs
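As a rough illustration, the four properties can be checked numerically for a candidate distance function on a handful of records. The helper name `satisfies_distance_properties`, the sample records, and the tolerance are assumptions for this sketch. Passing on a sample does not prove the properties hold in general.

```python
import math
from itertools import product

def satisfies_distance_properties(records, dist, tol=1e-9):
    """Check P.1 to P.4 for dist on every combination of the given records."""
    for ri in records:
        # P.2 Self-proximity: the distance from a record to itself is 0.
        if abs(dist(ri, ri)) > tol:
            return False
    for ri, rj in product(records, repeat=2):
        # P.1 Non-negative.
        if dist(ri, rj) < 0:
            return False
        # P.3 Symmetry.
        if abs(dist(ri, rj) - dist(rj, ri)) > tol:
            return False
    for ri, rj, rk in product(records, repeat=3):
        # P.4 Triangle inequality: dij <= dik + dkj.
        if dist(ri, rj) > dist(ri, rk) + dist(rk, rj) + tol:
            return False
    return True

def squared_euclidean(a, b):
    # Squared differences without the square root; used here as a counterexample.
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical records with two numerical values each.
records = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 4.0)]

print(satisfies_distance_properties(records, math.dist))          # Euclidean distance
print(satisfies_distance_properties(records, squared_euclidean))  # breaks P.4
```

The squared Euclidean distance fails because going from (0, 0) to (2, 0) costs 4, while the detour via (1, 0) costs only 1 + 1 = 2.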
What is the Euclidean Distance and what is the formula to calculate it?
- Most popular distance measure for numerical values
- Record ri has values xi1, xi2, ..., xip
- Euclidean distance between ri and rj is: dij = √[(xi1 − xj1)² + (xi2 − xj2)² + ... + (xip − xjp)²]
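A minimal sketch of the Euclidean distance between two records, assuming each record is a tuple of p numerical values. The function name `euclidean_distance` and the sample records are illustrative only.

```python
import math

def euclidean_distance(ri, rj):
    """Square root of the summed squared differences over all p values."""
    if len(ri) != len(rj):
        raise ValueError("records must have the same number of values")
    return math.sqrt(sum((xi - xj) ** 2 for xi, xj in zip(ri, rj)))

# Hypothetical records with p = 3 values each.
r1 = (1.0, 2.0, 3.0)
r2 = (4.0, 6.0, 3.0)

# Differences are 3, 4 and 0, so the distance is sqrt(9 + 16 + 0) = 5.
print(euclidean_distance(r1, r2))
```

In Python 3.8 and later, the standard library function `math.dist` computes the same quantity.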
What is the disadvantage of the Euclidean Distance and what is the solution?
- Highly scale dependent
- I.e., the units of one variable can have a huge influence on the results, for example when switching from cents to dollars
- Solution is normalizing the values before computing
- This converts all measurements to the same scale
- → Subtract average and divide by standard deviation
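The effect of normalizing can be sketched as follows. The records, the column meanings (price and weight), and the function names are hypothetical, and the sample standard deviation (`statistics.stdev`) is assumed; the population version would work equally well for this purpose.

```python
import math
from statistics import mean, stdev

def z_scores(values):
    """Subtract the average and divide by the standard deviation."""
    avg = mean(values)
    sd = stdev(values)  # assumption: sample standard deviation
    return [(v - avg) / sd for v in values]

def normalize_columns(records):
    """Normalize every variable (column) of a list of records separately."""
    columns = list(zip(*records))
    scaled = [z_scores(col) for col in columns]
    return [tuple(row) for row in zip(*scaled)]

# The same hypothetical records, with the price in dollars and in cents.
in_dollars = [(1.0, 10.0), (2.5, 30.0), (4.0, 20.0)]
in_cents = [(100.0, 10.0), (250.0, 30.0), (400.0, 20.0)]

# Raw distances between the first two records depend heavily on the unit.
raw_dollars = math.dist(in_dollars[0], in_dollars[1])
raw_cents = math.dist(in_cents[0], in_cents[1])

# After normalizing, the unit of the price no longer matters.
norm_dollars = normalize_columns(in_dollars)
norm_cents = normalize_columns(in_cents)
print(raw_dollars, raw_cents)
print(math.dist(norm_dollars[0], norm_dollars[1]), math.dist(norm_cents[0], norm_cents[1]))
```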
What is the Manhattan distance and what is the formula?
- It looks at the absolute differences rather than squared differences
- Manhattan distance between ri and rj is: dij = |xi1 − xj1| + |xi2 − xj2| + ... + |xip − xjp|
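A minimal sketch of the Manhattan distance, again assuming each record is a tuple of p numerical values. The function name `manhattan_distance` and the sample records are illustrative only.

```python
def manhattan_distance(ri, rj):
    """Sum of the absolute differences over all p values."""
    if len(ri) != len(rj):
        raise ValueError("records must have the same number of values")
    return sum(abs(xi - xj) for xi, xj in zip(ri, rj))

# Hypothetical records with p = 3 values each.
r1 = (1.0, 2.0, 3.0)
r2 = (4.0, 6.0, 3.0)

# Absolute differences are 3, 4 and 0, so the distance is 3 + 4 + 0 = 7.
print(manhattan_distance(r1, r2))
```

For the same two records the Euclidean distance is 5, which is smaller because squaring and taking the root measures the straight-line path rather than the axis-by-axis path.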