Dataset complexity
15 important questions on Dataset complexity
The complexity of your problem consists mainly of two parts:
- Is your dataset representative enough of your problem?
What do you check to see whether your problem is high-dimensional?
What can we say about the dimensionality when you have more features than samples?
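A minimal sketch of this check, using a hypothetical random data matrix: when there are more features than samples, the data matrix has rank at most the number of samples, so the samples span only a low-dimensional subspace of feature space.

```python
import numpy as np

# Hypothetical data matrix: 10 samples, 50 features (p > n).
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 50))

n_samples, n_features = X.shape
high_dimensional = n_features > n_samples  # more features than samples
print(high_dimensional)

# With p > n, rank(X) <= n_samples: the 10 samples span at most a
# 10-dimensional subspace of the 50-dimensional feature space.
print(np.linalg.matrix_rank(X))
```

In this regime many classifiers can separate the training data perfectly, which is exactly why representativeness of the dataset becomes the dominant concern.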
How do we measure complexity?
What do learning curves tell us?
When only a small number of training objects is available, you overtrain
Use a simple classifier when you don't have many training examples
Is there something more general to quantify complexity, given that learning curves are specific to each dataset?
What does a high bias mean?
What does high variance mean?
What is L2 regularization?
What does L2 regularization do to the weights?
What is Ridge regression?
What does L1 regularization do to the weights?
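A hedged sketch of both effects on a synthetic regression problem (the weights and penalty strengths are illustrative): L2 (ridge) shrinks all weights toward zero, while L1 drives small weights to exactly zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.standard_normal((n, p))
w_true = np.array([3.0, -2.0, 0.0, 0.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(n)

def ridge(X, y, lam):
    # Ridge (L2) closed form: w = (X^T X + lam * I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_ols = ridge(X, y, 0.0)    # no regularization
w_l2 = ridge(X, y, 100.0)   # L2 shrinks every weight toward zero

# L1: for an (approximately) orthonormal design the solution reduces to
# soft-thresholding, which sets small weights exactly to zero
# (implicit feature selection).
w_l1 = np.sign(w_ols) * np.maximum(np.abs(w_ols) - 1.0, 0.0)

print(np.round(w_l2, 2))  # smaller but nonzero weights
print(np.round(w_l1, 2))  # some weights exactly zero
```

This contrast is the usual answer to both questions: L2 gives smooth shrinkage (ridge regression is exactly least squares with an L2 penalty), L1 gives sparsity.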
Which hyperparameters to optimize in SVM?
- Kernel parameters
- Slack
How to tune hyperparameters?
- Randomized search
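A minimal sketch of randomized search over SVM-style hyperparameters. `validation_score` is a hypothetical stand-in for cross-validated accuracy as a function of C (slack) and gamma (a kernel parameter); its shape is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def validation_score(C, gamma):
    # Hypothetical smooth surrogate for CV accuracy, peaked near
    # C = 10, gamma = 0.1 (illustrative only).
    return float(np.exp(-(np.log10(C) - 1) ** 2 - (np.log10(gamma) + 1) ** 2))

best_params, best_score = None, -np.inf
for _ in range(50):
    # Sample log-uniformly: parameters like C and gamma span several
    # orders of magnitude, so a log scale is the natural search space.
    C = 10 ** rng.uniform(-3, 3)
    gamma = 10 ** rng.uniform(-4, 1)
    score = validation_score(C, gamma)
    if score > best_score:
        best_params, best_score = (C, gamma), score

print(best_params, best_score)
```

With a real classifier, each `validation_score` call would be a cross-validation run, which is where the computational cost discussed below comes from.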
What are the issues with hyperparameter optimization?
- Computationally expensive
- Randomness
- Overfitting
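The overfitting issue can be sketched numerically (the accuracy and noise values are illustrative assumptions): when many configurations are compared on a noisy validation estimate, the selected "winner" looks better than it truly is.

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 hyperparameter configurations, all with the same true accuracy.
true_accuracy = 0.70
n_configs = 100

# Each validation estimate = true accuracy + estimation noise.
val_estimates = true_accuracy + 0.05 * rng.standard_normal(n_configs)

# Selecting the maximum validation score gives an optimistic estimate.
best_val = val_estimates.max()
print(best_val)

# Remedy: re-evaluate the selected configuration on a held-out test set
# (or use nested cross-validation) for an unbiased estimate.
```

This selection bias is why tuned validation scores should never be reported as final performance.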