Cross-validation

5 important questions on Cross-validation

What are folds? What are they used for?

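Folds are disjoint, equally sized subsets of the data set, usually 5 or 10 of them, each containing approximately the same class distribution. They are used to loop over the training data: in each round one fold serves as the validation set and the remaining folds are used for training, so the validation data is never seen during training and the validation performance is unbiased.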
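
A minimal, standard-library-only sketch of how such folds could be built, assuming the labels are given as a plain list. The function names `stratified_folds` and `train_val_splits` are hypothetical. Indices of each class are shuffled and dealt round-robin over the folds, which keeps fold sizes and class proportions roughly equal (scikit-learn's `StratifiedKFold` offers a ready-made version of this idea).

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Return k disjoint lists of sample indices of roughly equal size, each
    with roughly the same class proportions as the full data set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's indices round-robin over the folds, continuing
        # where the previous class stopped so fold sizes stay balanced.
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset += len(indices)
    return folds


def train_val_splits(folds):
    """Yield (train, val) index lists: each fold is the validation set once,
    and all other folds together form the training set."""
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val
```

With `labels = [0] * 90 + [1] * 10` and `k=5`, every fold holds 18 items of class 0 and 2 of class 1, mirroring the 90/10 split of the whole set.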

What is problematic about a biased validation set in the process of training a model?

It gives a biased estimate of the validation performance. Hyperparameters tuned on that estimate will be chosen incorrectly, which results in a suboptimal trained model.

How may bias occur in the validation process? Name 2 possibilities.


  1. The validation data is very similar to the training data.
  2. The occurrence of the different classes is not representative of the real world: there is bias in the class distribution.
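
Both sources of bias can be checked roughly before trusting a validation score. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, the reference distribution stands in for the real-world class frequencies (which must be known or estimated separately), and exact duplicates are only a crude proxy for "very similar" items, since near-duplicates are not detected.

```python
from collections import Counter


def class_distribution(labels):
    """Fraction of items per class."""
    counts = Counter(labels)
    total = len(labels)
    return {c: n / total for c, n in counts.items()}


def max_distribution_gap(val_labels, reference):
    """Largest absolute difference between the validation class fractions and
    an assumed reference (real-world) distribution; a large value hints at
    bias in the class distribution (possibility 2)."""
    observed = class_distribution(val_labels)
    classes = set(observed) | set(reference)
    return max(abs(observed.get(c, 0.0) - reference.get(c, 0.0)) for c in classes)


def overlap_fraction(train_items, val_items):
    """Fraction of validation items that also occur verbatim in the training
    data; a crude proxy for possibility 1 (it misses near-duplicates)."""
    train_set = set(train_items)
    return sum(item in train_set for item in val_items) / len(val_items)
```

For example, a validation set of 8 cats and 2 dogs, compared with an assumed 50/50 real-world split, gives a gap of about 0.3.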

How is the best parameter setting determined in cross-validation?

It is the setting with the highest average validation performance over all folds.
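
A small sketch of this selection rule, assuming a list of candidate settings and a list of (train, val) splits. `train_and_score` is a hypothetical user-supplied function that trains a model with one setting on the training part and returns its validation score (higher is better); it is not part of any library.

```python
from statistics import mean


def cross_validate(setting, splits, train_and_score):
    """Per-fold validation scores for one hyperparameter setting.

    `splits` is a list of (train, val) pairs; `train_and_score` is a
    hypothetical callback that trains with `setting` on `train` and returns
    the validation score on `val`.
    """
    return [train_and_score(setting, train, val) for train, val in splits]


def select_best_setting(settings, splits, train_and_score):
    """Return the setting with the highest mean validation score over all
    folds, together with the mean score of every setting."""
    averages = {s: mean(cross_validate(s, splits, train_and_score)) for s in settings}
    best = max(averages, key=averages.get)
    return best, averages
```

For instance, with per-fold scores [0.7, 0.8, 0.9] for setting A and [0.95, 0.6, 0.55] for setting B, A is selected (mean 0.8 versus 0.7) even though B has the best single fold. Averaging over folds keeps one lucky fold from deciding the choice.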

How can you ensure balance between the classes when training a supervised classifier?


    1. Make the class distribution reflect how often each class occurs in real life.
    2. Apply stratification: have an equal number of items per class and afterwards …
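
One common way to get an equal number of items per class is random undersampling, sketched below. This is an assumption about how option 2 could be realised (the step after "afterwards" is missing from the source), and `undersample_to_equal` is a hypothetical name. Larger classes are randomly reduced to the size of the smallest class.

```python
import random
from collections import defaultdict


def undersample_to_equal(items, labels, seed=0):
    """Randomly drop items from larger classes until every class has as many
    items as the smallest class. Returns (items, labels) as two lists."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item, label in zip(items, labels):
        by_class[label].append(item)

    n_min = min(len(group) for group in by_class.values())
    out_items, out_labels = [], []
    for label, group in by_class.items():
        # Keep a random subset of size n_min from each class.
        for item in rng.sample(group, n_min):
            out_items.append(item)
            out_labels.append(label)
    return out_items, out_labels
```

Undersampling discards data; the alternative of option 1 keeps all items but changes what the classifier sees as "typical". Either way, one would usually resample only the training folds, so that the validation folds keep their original distribution.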
