Imbalanced data

8 important questions on Imbalanced data

How can you spot overfitting in an accuracy plot?

Look for a large gap between the training and validation accuracy curves: training accuracy keeps improving while validation accuracy stalls or drops.
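The gap check can be sketched in a few lines. The accuracy values and the 0.1 threshold below are illustrative assumptions, not taken from any real model:

```python
# Minimal sketch: detecting overfitting from accuracy curves.
# The accuracy values are made up for illustration.
train_acc = [0.70, 0.80, 0.88, 0.95, 0.99]
val_acc = [0.68, 0.75, 0.78, 0.77, 0.75]

# Gap between training and validation accuracy at each epoch.
gaps = [t - v for t, v in zip(train_acc, val_acc)]

# A widening gap signals overfitting; the 0.1 threshold is an
# arbitrary choice for this sketch.
overfitting = gaps[-1] > 0.1
```

Here the gap grows from 0.02 to 0.24 over training, which is exactly the pattern to look for in the plot.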

How can you avoid excessive flexibility in a classifier?

Use regularization techniques, which penalize large model weights and thereby constrain the model's flexibility.
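The shrinking effect of regularization can be shown with a tiny example. This is a sketch of L2 (ridge) regularization on a one-dimensional least-squares fit; the data points are invented for illustration:

```python
# Minimal sketch of L2 (ridge) regularization on a 1-D fit y = w*x.
# Minimizing sum((y - w*x)**2) + lam * w**2 has the closed form:
#   w = sum(x*y) / (sum(x**2) + lam)
# Larger lam shrinks w toward 0, reducing the model's flexibility.
def ridge_weight(xs, ys, lam):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]  # roughly y = 2x, illustrative data

w_unreg = ridge_weight(xs, ys, lam=0.0)   # ordinary least squares
w_reg = ridge_weight(xs, ys, lam=10.0)    # penalized fit, smaller |w|
```

The same idea underlies the regularization parameters of real classifiers (e.g. the inverse penalty strength in logistic regression): stronger penalties give smaller weights and a less flexible decision boundary.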

Splitting the data into two or three parts lowers the sample size available for training; what can we do to counteract this?

Cross-validation: it allows you to simulate testing on the entire data set.

How does cross validation work?

Per iteration, hold out 1/k of the data for testing; train the model on the remaining data and evaluate on the held-out part. Repeat k times so that every sample is used for testing exactly once.
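The splitting scheme described above can be sketched in pure Python (no libraries assumed):

```python
# Minimal sketch of k-fold index splitting.
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    # Distribute n samples over k folds as evenly as possible.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i not in set(test)]
        yield train, test
        start += size

folds = list(k_fold_indices(n=10, k=5))
# 5 folds of 2 test samples each; together the test sets cover all 10 indices.
```

In practice the indices would be shuffled first, and the model would be retrained from scratch on each training set.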

What is nested cross validation?

Within each fold of the outer cross-validation loop, we run another cross-validation loop (the inner loop) for hyperparameter tuning; the outer test fold is then used only to estimate the performance of the tuned model.
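The two-loop structure can be sketched with a deliberately trivial "model" that predicts alpha times the mean of its training targets, with alpha as the hyperparameter tuned in the inner loop. Everything here (the data, the alphas, the model) is an illustrative assumption:

```python
# Minimal sketch of nested cross-validation (pure Python).
def k_fold_indices(n, k):
    fold = n // k
    for i in range(k):
        test = list(range(i * fold, (i + 1) * fold))
        train = [j for j in range(n) if j not in set(test)]
        yield train, test

def mse(ys, pred):
    return sum((v - pred) ** 2 for v in ys) / len(ys)

def fit_predict(train_y, alpha):
    # Toy model: predict a scaled training mean.
    return alpha * sum(train_y) / len(train_y)

y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
alphas = [0.5, 1.0]
outer_scores = []
for outer_train, outer_test in k_fold_indices(len(y), k=3):
    # Inner loop: tune alpha using only the outer training data.
    best_alpha, best_err = None, float("inf")
    for alpha in alphas:
        errs = []
        for it, iv in k_fold_indices(len(outer_train), k=2):
            inner_train = [y[outer_train[i]] for i in it]
            inner_val = [y[outer_train[i]] for i in iv]
            errs.append(mse(inner_val, fit_predict(inner_train, alpha)))
        err = sum(errs) / len(errs)
        if err < best_err:
            best_alpha, best_err = alpha, err
    # Outer loop: evaluate the tuned model on the untouched outer test fold.
    pred = fit_predict([y[i] for i in outer_train], best_alpha)
    outer_scores.append(mse([y[i] for i in outer_test], pred))
# outer_scores now holds 3 performance estimates from data never
# seen during tuning.
```

The key point is that the outer test fold is never touched during tuning, so the outer scores are not biased by the hyperparameter search.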

What strategy would make the training set as large as possible?

Use all data except one sample for training, then evaluate on that single sample; repeat for every sample. This is leave-one-out cross-validation (k-fold with k equal to the number of samples).
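As a sketch, leave-one-out is just the k = n special case of the fold generator:

```python
# Minimal sketch: leave-one-out CV holds out exactly one sample per split.
def leave_one_out(n):
    for i in range(n):
        test = [i]
        train = [j for j in range(n) if j != i]
        yield train, test

splits = list(leave_one_out(5))
# 5 splits; each training set has 4 samples, each test set exactly 1.
```

The training set is as large as possible (n - 1 samples), at the cost of training the model n times.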

If the train/test split and the cross-validation folds are random, what does this imply for the results?

The results depend on the randomness of the split: repeating the experiment may change them. The stability of your results under this randomness is an indication of the generalizability of the method.
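A toy sketch of this point: shuffling the data with different seeds produces different test sets, so an experiment repeated with different seeds can yield different scores. The function and seeds below are illustrative assumptions:

```python
# Minimal sketch: random splits vary with the seed, so results can too.
import random

def random_test_half(n, seed):
    # Shuffle indices deterministically per seed; take half as the test set.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return set(idx[: n // 2])

splits = [random_test_half(10, seed) for seed in range(5)]
# Different seeds generally give different test sets; a method whose
# score is stable across all of them is more likely to generalize.
```

In practice one reports the mean and standard deviation of the metric over several repetitions with different seeds.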

What is the difference between validation and test data?

Validation data: the test data of the inner experiments, used for model selection and hyperparameter tuning.

Test data: the test data of the outer experiment, used only for the final, unbiased performance estimate.
