K-Nearest Neighbors & Performance measures
38 important questions on K-Nearest Neighbors & Performance measures
What is the idea of k-Nearest Neighbors?
Find the k records in the training data that are most similar to the new record, and use these neighbors (i.e., these k records) to classify it.
Assign the new record to the predominant class among these neighbors.
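A minimal sketch of this idea in Python, assuming standardized numeric predictors; the function name and the toy data are made up for illustration:

```python
import numpy as np

def knn_classify(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training records."""
    # Euclidean distance from the new record to every training record
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k closest training records
    nearest = np.argsort(distances)[:k]
    # Predominant class among these k neighbors
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data: two predictors, two classes
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_classify(X_train, y_train, np.array([1.2, 1.9])))  # -> "A"
```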
What are the different parts of kNN?
- Choosing the number of neighbors, i.e., value k
- Computing classification (for a categorical outcome) or prediction (for a numerical outcome)
What is triangle inequality?
A requirement for a proper distance measure: for any three records a, b, and c, the distance satisfies d(a, c) ≤ d(a, b) + d(b, c), i.e., the direct distance between two records can never exceed the distance via a third record.
What is the Euclidean Distance?
The straight-line distance between two records x and y with p predictors: d(x, y) = sqrt((x_1 - y_1)^2 + (x_2 - y_2)^2 + ... + (x_p - y_p)^2).
What is meant when it is said that the Euclidean Distance is highly scale dependent?
Predictors measured on larger scales (e.g., income in dollars versus age in years) dominate the distance, so predictors should be standardized (e.g., converted to z-scores) before distances are computed.
What is a good option when there are many outliers with the Euclidean Distance?
Using the Manhattan distance: because the Euclidean distance squares the differences, it is very sensitive to outliers, whereas the Manhattan distance uses absolute differences and is more robust.
What is the difference between the Manhattan distance and the Euclidean distance?
The Manhattan distance sums the absolute differences across predictors, |x_1 - y_1| + ... + |x_p - y_p|, while the Euclidean distance sums the squared differences and takes the square root.
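A short sketch contrasting the two distances and showing the scale effect; the numbers are made up for illustration:

```python
import numpy as np

def euclidean(x, y):
    # Square root of the sum of squared differences
    return np.sqrt(((x - y) ** 2).sum())

def manhattan(x, y):
    # Sum of absolute differences
    return np.abs(x - y).sum()

# Age in years vs. income in dollars: income dominates the raw distance
a = np.array([25.0, 50_000.0])
b = np.array([60.0, 51_000.0])
print(euclidean(a, b))  # ~1000.6, driven almost entirely by income
print(manhattan(a, b))  # 1035.0

# Standardizing to z-scores puts both predictors on comparable scales
X = np.array([[25.0, 50_000.0], [60.0, 51_000.0], [40.0, 30_000.0]])
Z = (X - X.mean(axis=0)) / X.std(axis=0)
print(euclidean(Z[0], Z[1]))  # both predictors now contribute
```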
What is the problem when k is too low?
The classification becomes very sensitive to noise: with k = 1, a single mislabeled or atypical training record can determine the prediction (overfitting).
What is the problem when k is too high?
The neighborhood becomes so broad that the local structure in the data is smoothed away, and the method misses exactly the patterns it is meant to capture.
What is the problem when k is the number of records in the training dataset?
Every new record then has the same neighbors (all training records), so kNN degenerates into the naive rule: assign every record to the majority class of the training data.
How to choose the value for k?
Try a range of values for k, classify the records in a validation set with each, and pick the k that gives the best validation performance (e.g., the lowest classification error).
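A sketch of this procedure with scikit-learn, assuming a simple train/validation split; the dataset is synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Score each candidate k on the validation set and keep the best one
scores = {k: KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_val, y_val)
          for k in range(1, 21)}
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```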
What is the validation set?
A portion of the data, held out from training, that is used to assess performance and tune parameters (such as k) during model building; it is separate from the test set used for the final evaluation.
What is the difference in the approach for kNN with numerical outcomes versus categorical outcomes?
For a categorical outcome, the new record is assigned the majority class among its k neighbors; for a numerical outcome, it is assigned the (possibly distance-weighted) average of the neighbors' outcome values.
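For the numerical case, a minimal sketch of neighbor averaging (hypothetical toy data):

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict a numerical outcome as the average of the k nearest neighbors."""
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    return y_train[nearest].mean()

X_train = np.array([[1.0], [2.0], [3.0], [10.0]])
y_train = np.array([100.0, 110.0, 120.0, 400.0])
print(knn_predict(X_train, y_train, np.array([2.5])))  # mean of 100, 110, 120 = 110.0
```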
What are the advantages of kNN?
- Simplicity of the method
- Lack of parametric assumptions
In what situations does kNN work especially well?
- When there is a large enough training set present
- When each class is characterized by multiple combinations of predictor values
What are the shortcomings of kNN?
- Computing the nearest neighbors can be time consuming
- For every record to be predicted, we compute its distance from the entire set of training records only at the time of prediction ("lazy learner")
- The number of records required in the training set to qualify as "large" increases exponentially with the number of predictors (the curse of dimensionality)
What are possible solutions for the first shortcoming?
- Reduce the time taken to compute distances by working in fewer dimensions, generated using dimension reduction techniques
- Speed up the identification of the nearest neighbors by using specialized data structures (e.g., search trees)
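A sketch of the second idea with scikit-learn, which can index the training records in a k-d tree or ball tree so that neighbor lookups avoid scanning every record:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_train = rng.normal(size=(10_000, 5))

# Index the training records once; queries then avoid a full linear scan
index = NearestNeighbors(n_neighbors=5, algorithm="ball_tree").fit(X_train)
distances, neighbor_ids = index.kneighbors(rng.normal(size=(1, 5)))
print(neighbor_ids)  # indices of the 5 nearest training records
```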
What is the biggest evaluation issue?
Error on the training data is not a good indicator of performance on future data, because the model was fitted to exactly those records; evaluation must use data the model has not seen.
What are the three steps for classification?
- Split data into train and test sets
- Build a model on the training set
- Evaluate it on the test set
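These three steps, sketched with scikit-learn on synthetic data (illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, random_state=1)

# 1. Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
# 2. Build a model on the training set
model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
# 3. Evaluate it on the test set
print(model.score(X_test, y_test))  # accuracy on unseen records
```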
What is parameter tuning?
Some training schemes operate in two stages:
- build the basic structure
- optimize parameter settings
The test data cannot be used for parameter tuning. Proper procedure uses three sets:
- training data
- validation data
- test data
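A sketch of the three-set procedure, splitting twice with scikit-learn (the split sizes are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=2)

# Carve off the test set first; it is used only once, for the final evaluation
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=2)
# Split the remainder into training data (fitting) and validation data (tuning)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=2)
print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```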
Why is a validation set created?
So that parameters can be tuned and models compared without touching the test data; this keeps the test-set error an unbiased estimate of performance on new data.
What is supervised learning?
Learning a model from records whose outcome values (labels) are known, so that the model can predict the outcome for new records where it is unknown.
What are the three main types of outcomes?
- Predicted numerical value, e.g., house price
- Predicted class membership, e.g., cancer or not
- Probability of class membership, e.g., Naive Bayes
How do we measure accuracy when generating numeric predictions?
Compare the predicted values with the actual values on the validation set and summarize the prediction errors e_i = y_i - ŷ_i with an aggregate measure.
Which measures of prediction accuracy are commonly used?
- Mean absolute error: MAE = (1/n) Σ |e_i|
- Mean error: ME = (1/n) Σ e_i (signed, so it reveals systematic over- or underprediction)
- Mean percentage error: MPE = (100/n) Σ e_i / y_i
- Mean absolute percentage error: MAPE = (100/n) Σ |e_i / y_i|
- Root mean squared error: RMSE = sqrt((1/n) Σ e_i^2)
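A sketch computing these measures with NumPy (toy arrays):

```python
import numpy as np

y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 210.0, 240.0])
e = y_true - y_pred  # prediction errors

mae = np.mean(np.abs(e))                  # mean absolute error
me = np.mean(e)                           # mean error (signed; reveals bias)
mpe = 100 * np.mean(e / y_true)           # mean percentage error
mape = 100 * np.mean(np.abs(e / y_true))  # mean absolute percentage error
rmse = np.sqrt(np.mean(e ** 2))           # root mean squared error
print(mae, me, mpe, mape, rmse)
```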
What is a lift chart?
The goal is to search for a subset of records that gives the highest cumulative predicted values.
It compares the model's predictive performance to a baseline model that has no predictive power (random selection).
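A sketch of how a cumulative lift (gains) curve can be computed, assuming predicted probabilities and 0/1 actual outcomes (toy data):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])  # actual responses
y_prob = np.array([0.9, 0.8, 0.75, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1])

# Sort records from highest to lowest predicted probability
order = np.argsort(-y_prob)
cumulative_hits = np.cumsum(y_true[order])                 # the lift (gains) curve
baseline = np.arange(1, len(y_true) + 1) * y_true.mean()   # no-model diagonal
print(cumulative_hits)  # model: e.g., 2 of the 4 responders in the top 3 records
print(baseline)         # random selection expects 0.4 responders per record
```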
What is the lift factor?
The ratio between the results obtained with the model and the results obtained without it; e.g., if the top decile selected by the model contains three times as many responders as a randomly chosen decile, the lift factor is 3.
What is a natural criterion for judging the performance of a classifier?
The misclassification rate: the proportion of records it assigns to the wrong class.
What is a confusion/classification matrix?
A table whose rows and columns correspond to the predicted and true (actual) classes; each cell counts the records with that combination, so correct classifications lie on the diagonal.
What is the overall success rate / accuracy?
The number of correct classifications divided by the total number of classifications; for two classes: (TP + TN) / (TP + TN + FP + FN).
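A sketch with scikit-learn (hypothetical labels); note that the row/column convention varies by tool, and scikit-learn puts the actual classes in the rows:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))  # [[3 1], [1 3]]
print(accuracy_score(y_true, y_pred))    # 6 correct / 8 total = 0.75
```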
What are ROC curves?
They characterize the trade-off between positive hits and false alarms.
An ROC curve plots the true positive rate against the false positive rate for the different possible thresholds of a diagnostic test.
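A sketch with scikit-learn, assuming predicted probabilities for the positive class (toy data):

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.75, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]

fpr, tpr, thresholds = roc_curve(y_true, y_prob)  # one point per threshold
print(list(zip(fpr, tpr)))
print(roc_auc_score(y_true, y_prob))  # area under the ROC curve
```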
What are the limitations of accuracy?
It assumes equal costs for all types of misclassification error, and with imbalanced classes it is misleading: always predicting the majority class already yields high accuracy.
How is this solved? --> With a cost matrix
Each type of error is assigned its own cost (e.g., a false negative in cancer screening is costlier than a false positive), and the classifier is judged by its total or expected cost rather than by raw accuracy.
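A sketch of scoring with a cost matrix; the costs here are made up for illustration:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# cost[actual][predicted]: missing a positive (1 -> 0) is five times worse here
cost = np.array([[0, 1],
                 [5, 0]])
cm = confusion_matrix(y_true, y_pred)  # rows: actual, columns: predicted
print((cm * cost).sum())  # 1 false positive * 1 + 1 false negative * 5 = 6
```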
What is multi class prediction?
Classification with more than two possible classes; the confusion matrix then has one row and one column per class, and accuracy is still the sum of the diagonal cells divided by the total.
What is the Kappa statistic?
A measure of agreement between predicted and actual classes that corrects for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed accuracy and p_e is the accuracy a chance classifier with the same class distribution would achieve.
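A sketch computing kappa both manually and with scikit-learn's cohen_kappa_score (toy labels):

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
n = cm.sum()
p_o = np.trace(cm) / n                                 # observed accuracy
p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2   # agreement expected by chance
print((p_o - p_e) / (1 - p_e))            # 0.5
print(cohen_kappa_score(y_true, y_pred))  # same value via scikit-learn
```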
What does precision measure?
The fraction of records predicted to be relevant (positive) that actually are: precision = TP / (TP + FP).
Why is determining recall difficult?
Recall = TP / (TP + FN) requires knowing the total number of relevant items, including those the model missed; in many settings (e.g., retrieving relevant documents from a large collection) that total is unknown.
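When the true labels are fully known, both measures are straightforward to compute; a sketch with scikit-learn (hypothetical labels):

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 3/4
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 3/4
```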
What is the solution when determining recall is too difficult?
- Sample across the dataset and perform relevance judgement on these items
- Apply different models to the same dataset and then use the aggregate of relevant items as the total relevant set
Why would 170 not be a suitable value for k?
- Because it is not an odd number and we might get ties
- Because it will assign all records to the majority class of the training data
- Because it is too high and we might not capture the local data structure