Of textbook B - weblectures quiz

12 important questions on Of textbook B - weblectures quiz

What is an input vector?

  • Y is the dependent variable, also referred to as the response or target.
  • Independent variables are denoted as features, inputs, or predictors.
  • You usually have a set of predictors: X1, X2, etc.
  • This set can be collectively referred to as the input vector: a vector containing all the features.
  • Capital X's denote variables, while lowercase x's denote observed values (data points).
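
As a minimal illustration (all names and values below are made up, not from the text), an input vector can be represented as one row of a feature matrix:

```python
import numpy as np

# Made-up dataset: each row of X is one observation's input vector
# (its x1, x2, x3 values); y holds the corresponding responses.
X = np.array([[1.0, 2.0, 0.5],
              [0.8, 1.9, 0.7],
              [1.2, 2.1, 0.4]])
y = np.array([3.1, 2.9, 3.3])

x_first = X[0]  # the input vector of the first observation
```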

How can you write the regression formula for three regressors?

  • f(x) = f(x1, x2, x3) = E(Y | X1 = x1, X2 = x2, X3 = x3)
  • This just means that f(x) returns the average of Y over all observations for which X1 = x1, X2 = x2, and X3 = x3.
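
A minimal sketch of this definition on made-up data with a single discrete predictor: f(x) is simply the mean of y over the observations where X equals x.

```python
import numpy as np

# Made-up data with a single discrete predictor.
X = np.array([1, 1, 2, 2, 2, 3])
y = np.array([2.0, 4.0, 5.0, 7.0, 6.0, 9.0])

# f(x) = E(Y | X = x): the average of y over all observations with X = x.
def f_hat(x):
    return y[X == x].mean()

print(f_hat(2))  # (5.0 + 7.0 + 6.0) / 3 = 6.0
```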

What is a problem with estimating f, and how is it solved?

  • Particularly in small datasets, it might be the case that we have no data points with X = x.
  • We thus cannot compute E(Y | X = x), the average of Y for X = x.
  • We can still compute f^(x) as follows:
  • f^(x) = Ave(Y | X ∊ N(x)), where N(x) is a neighbourhood of x
  • which means: estimate f^(x) on the basis of neighbouring points.
  • This is called nearest neighbor averaging (see the sketch below).
  • It works best with a small number of variables and a fairly large number of data points.
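
A minimal sketch of nearest neighbor averaging on made-up one-dimensional data (the function nn_average and all values here are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 50)              # one predictor, 50 data points
y = np.sin(X) + rng.normal(0, 0.2, 50)  # noisy responses

def nn_average(x, k=5):
    # Estimate f^(x) as the mean of y over the k nearest neighbours of x.
    nearest = np.argsort(np.abs(X - x))[:k]
    return y[nearest].mean()

print(nn_average(3.0))  # works even if no training point has X exactly 3.0
```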

Why does nearest neighbor averaging not work for datasets with more variables?

  • If you want to include a specific proportion of the sample to find f^(x), say 10%:
  • you need a bigger radius to achieve this with a second variable than with one variable.
  • The more dimensions, the further you have to look to capture ten percent of the sample.
  • The further you have to look for neighbors, the less similar these neighbors are going to be, which means less accurate predictions.
  • This concept is called the curse of dimensionality (the sketch below makes the growth concrete).
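
A small worked example, assuming the data are uniformly distributed on the unit cube: the side length of a hypercube that captures 10% of the sample is 0.1^(1/p), which grows quickly with the dimension p.

```python
# Side length of a hypercube that covers 10% of data uniformly
# distributed on the unit cube [0, 1]^p, for growing dimension p.
fraction = 0.10
for p in [1, 2, 5, 10]:
    side = fraction ** (1 / p)
    print(f"p = {p:2d}: side length = {side:.2f}")
# p =  1: 0.10   p =  2: 0.32   p =  5: 0.63   p = 10: 0.79
```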

How can we solve the curse of dimensionality?

  • By adding more structure to our model: using a pre-specified formula, such as the linear regression formula, we can predict Y for X = x without using nearest neighbor averaging (see the sketch below).
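
A minimal sketch of this idea on made-up data: fitting a pre-specified linear formula by least squares lets us predict at an x even when it has no nearby training points.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, (30, 3))                   # 30 points, 3 predictors
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.3, 30)

# Fit the pre-specified form f(x) = b0 + b1*x1 + b2*x2 + b3*x3 by least squares.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

x_new = np.array([1.0, 3.0, 7.0])                 # no neighbours required
print(coef[0] + coef[1:] @ x_new)
```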

What is meant by flexibility of a model/method?

  • Flexibility is the degree to which the model has freedom to fit the data.
  • Making our model the mean of Y (so for every x value the predicted y value is its mean) is not very flexible.
  • A linear regression is slightly more flexible.
  • A polynomial is even more flexible.
  • As models become more flexible they have the opportunity to fit the data better, but they also have a larger tendency to overfit to the training data (compare the training errors in the sketch below).
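
An illustrative comparison on made-up data (degrees and values chosen arbitrarily): training error shrinks as the fitted model becomes more flexible.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 30)
y = np.sin(x) + rng.normal(0, 0.3, 30)

def train_mse(pred):
    return np.mean((y - pred) ** 2)

mean_fit = np.full_like(y, y.mean())            # least flexible: a constant
lin_fit  = np.polyval(np.polyfit(x, y, 1), x)   # linear regression
poly_fit = np.polyval(np.polyfit(x, y, 10), x)  # flexible degree-10 polynomial

for name, fit in [("mean", mean_fit), ("linear", lin_fit), ("poly-10", poly_fit)]:
    print(name, train_mse(fit))
# Training error shrinks with flexibility; the degree-10 fit is
# likely chasing noise (overfitting) rather than the true pattern.
```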

What is smoothing and how is K nearest neighbor averaging an example of it?

  • Smoothing is the collective term for methods that form a model not by fitting a pre-specified formula, like a regression, to the data, but by basing each predicted point on what the data in that location indicate.
  • K nearest neighbor averaging forms a line through the data cloud by taking, at each point, the average of the values of the k nearest neighbors. y^i is thus based on what the data around xi look like.
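
A minimal sketch using scikit-learn's KNeighborsRegressor on made-up data; the fitted curve is the smoothed line through the data cloud described above.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, (100, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, 100)

# Each prediction is the average y of the k nearest x's, so the
# fitted curve follows whatever the data in that region indicate.
knn = KNeighborsRegressor(n_neighbors=7).fit(X, y)

grid = np.linspace(0, 10, 200).reshape(-1, 1)
curve = knn.predict(grid)  # a smoothed line through the data cloud
```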

How does K nearest neighbor classification work for a qualitative Y?

  • KNN classification assigns individual cases to a class by looking at which classes the K nearest neighbors belong to.
  • It estimates the conditional probability for each class j as the proportion of class-j members in the neighborhood.
  • It assigns the point to the class to which the largest number of neighbors belong.
  • The curse of dimensionality is also present here, just as in regression problems.
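
A minimal sketch using scikit-learn's KNeighborsClassifier on made-up two-dimensional data; predict_proba returns exactly the neighborhood proportions described above.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
X = rng.normal(0, 1, (60, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # two classes, made-up rule

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

point = np.array([[0.2, -0.1]])
print(knn.predict_proba(point))  # proportion of each class among the 5 neighbours
print(knn.predict(point))        # the class with the most neighbours wins
```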

How is performance of a classification system assessed?

  • The performance of C^(x) is assessed by looking at the misclassification error rate:
  • Err_te = Ave_{i ∊ te} I[y_i ≠ C^(x_i)]
  • which states that the misclassification error rate is the average of instances in a test set where the predicted class did not match the actual class.
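
A minimal sketch of this error rate on made-up predictions:

```python
import numpy as np

# y_test: true classes in the test set; y_pred: classes assigned by C^(x).
y_test = np.array([0, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1])

err_te = np.mean(y_test != y_pred)  # average of I[y_i != C^(x_i)] over the test set
print(err_te)                       # 2 of 6 misclassified -> 0.333...
```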

How can the dimension curse be solved for KNN classifiers?

  • Again, use structured models, such as support vector machines.
  • Structured models for obtaining conditional class probabilities, like logistic regression, can also be used.
  • The curse of dimensionality is a bigger problem for modelling the probabilities than for merely classifying (a probability-model sketch follows below).
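
A minimal sketch of the logistic-regression option on made-up data with ten predictors:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
X = rng.normal(0, 1, (200, 10))  # ten predictors
y = (X[:, 0] - X[:, 1] + rng.normal(0, 1, 200) > 0).astype(int)

# A structured model for P(Y = j | X = x); no neighbourhood is needed,
# so it copes better with many predictors than KNN does.
logit = LogisticRegression().fit(X, y)
print(logit.predict_proba(X[:1]))  # conditional class probabilities
```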

What is the decision boundary?

The boundary on one side of which an observation is classified as class A, and on the other side as class B.
(In the accompanying figure: the black or purple dotted line.)

How do the size of K (the number of neighbors evaluated) and the error rate relate?

  • As K becomes smaller, the model becomes more flexible (in the referenced figure, K decreases towards the right side).
  • Again we see that as the model becomes more flexible, training errors become smaller.
  • Test errors, however, benefit from a certain degree of flexibility, but increasing flexibility past this optimum causes more test error, since the model is overfitting to the training data (see the sketch below).
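
An illustrative sketch on made-up data, sweeping K from flexible (small K) to rigid (large K); the exact error values depend on the random data, but the training error at K = 1 will be zero:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(6)
X = rng.normal(0, 1, (400, 2))
y = (X[:, 0] ** 2 + X[:, 1] > 0.5).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in [1, 5, 25, 100]:  # smaller k = more flexible model
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    tr_err = np.mean(knn.predict(X_tr) != y_tr)
    te_err = np.mean(knn.predict(X_te) != y_te)
    print(f"k = {k:3d}  train error = {tr_err:.3f}  test error = {te_err:.3f}")
```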
