Data Science Foundations - Preprocessing
6 important questions on Data Science Foundations - Preprocessing
What are decision boundaries?
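A decision tree cannot learn and incorporate linear relations directly, because each split is parallel to one of the axes. Decision trees will therefore sometimes perform well and sometimes not, depending on the shape of the relation: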
- (X)OR-style: yes
- linear relations: no
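The two bullets above can be checked on toy data. The sketch below is only an illustration: the hand-built tree `xor_tree`, the helper `best_stump_accuracy`, and the 5x5 grid are all made up for this example. It shows that two nested axis-parallel splits reproduce XOR exactly, while no single axis-parallel split can reproduce a diagonal (linear) boundary.

```python
def xor_tree(x1, x2):
    """Hand-built depth-2 tree: two nested axis-parallel splits."""
    if x1 <= 0.5:
        return 0 if x2 <= 0.5 else 1
    return 1 if x2 <= 0.5 else 0


def accuracy(predict, points, labels):
    """Share of points whose predicted label matches the true label."""
    hits = sum(predict(*p) == y for p, y in zip(points, labels))
    return hits / len(points)


def best_stump_accuracy(points, labels):
    """Best accuracy reachable with one axis-parallel split (a depth-1 tree)."""
    best = 0.0
    for feature in (0, 1):
        values = sorted({p[feature] for p in points})
        # Candidate thresholds: midpoints between consecutive distinct values.
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for left_label in (0, 1):
                preds = [left_label if p[feature] <= t else 1 - left_label
                         for p in points]
                acc = sum(pr == y for pr, y in zip(preds, labels)) / len(points)
                best = max(best, acc)
    return best


# XOR-style relation: the label is 1 when exactly one input is 1.
xor_points = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [0, 1, 1, 0]
xor_accuracy = accuracy(xor_tree, xor_points, xor_labels)

# Linear relation: the label is 1 above the diagonal x2 > x1 (toy grid).
grid = [(i, j) for i in range(5) for j in range(5)]
diag_labels = [1 if j > i else 0 for i, j in grid]
stump_accuracy = best_stump_accuracy(grid, diag_labels)

print("XOR tree accuracy:", xor_accuracy)
print("best single split on diagonal boundary:", stump_accuracy)
```

A deeper tree can approximate the diagonal with a staircase of splits, but it never represents the linear relation itself.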
When to stop the decision tree process?
- misclassification error: % of observations incorrectly predicted
- classification accuracy: % of observations correctly predicted
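Both measures are easy to compute by hand. The sketch below assumes, as is common practice but not stated in these notes, that tree growth is stopped at the size where the error on a separate validation set is lowest. The helper `best_tree_size` and all numbers are hypothetical and serve only as illustration.

```python
def misclassification_error(y_true, y_pred):
    """Fraction of observations that were predicted incorrectly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


def classification_accuracy(y_true, y_pred):
    """Fraction of observations that were predicted correctly."""
    return 1.0 - misclassification_error(y_true, y_pred)


def best_tree_size(validation_errors):
    """Return the tree size (1-based) with the lowest validation error.

    Assumption: validation_errors[i] is the error of the tree with i + 1 leaves.
    On ties, min() keeps the first hit, so the smaller tree wins.
    """
    best_index = min(range(len(validation_errors)),
                     key=lambda i: validation_errors[i])
    return best_index + 1


# Toy labels, made up for illustration.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
error = misclassification_error(y_true, y_pred)
acc = classification_accuracy(y_true, y_pred)

# Hypothetical validation errors for trees with 1 to 5 leaves.
stop_at = best_tree_size([0.30, 0.22, 0.18, 0.19, 0.21])

print(error, acc, stop_at)
```

The two measures always sum to 1, so monitoring either one gives the same stopping point.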
What are the pros and cons of decision trees?
Advantages
- interpretable
- non-parametric
- robust with respect to input data
Disadvantages
- sensitive to changes in the training data
- sensitive to imbalanced class distribution
- predictive power: weak classifier
What is the use of decision trees?
Data exploration: no preprocessing required
Variable selection in preprocessing
Segmentation prior to development of predictive models
Coarse classification
Ensembles
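As an illustration of coarse classification, the split points of a fitted tree can be reused as bin boundaries for a continuous variable. The sketch below is a minimal example under that assumption; the function `coarse_class` and the split points on an age variable are hypothetical, not taken from any real tree.

```python
from bisect import bisect_left


def coarse_class(value, split_points):
    """Map a continuous value to a bin index using sorted tree split points.

    Follows the usual tree rule "value <= threshold goes left", so bin 0 holds
    values up to and including the first split point.
    """
    return bisect_left(split_points, value)


# Hypothetical split points a tree might have found on an age variable.
age_splits = [25, 40, 60]

ages = [18, 25, 26, 40, 59, 61]
bins = [coarse_class(a, age_splits) for a in ages]
print(bins)
```

Each bin then acts as one category of the coarsened variable in a later model.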
When to use regression trees?
When the target variable is continuous, for example:
- loss given default
- customer lifetime value
How to make splitting decisions for regression trees?
Mean squared error (MSE)
- computed over the N observations in a node; a low MSE indicates a good split
ANOVA/F-test
- a low p-value indicates a good split
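Both criteria can be sketched in a few lines. The example assumes the impurity measure is the mean squared error of the target around the node mean (a common choice for regression trees) and that the F-test is a one-way ANOVA comparing the two child nodes. All function names and the toy data are made up for illustration.

```python
from statistics import mean


def mse(values):
    """Mean squared deviation of the target from the node mean (N = len(values))."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def weighted_child_mse(left, right):
    """MSE after a split: child MSEs weighted by the number of observations."""
    n = len(left) + len(right)
    return (len(left) * mse(left) + len(right) * mse(right)) / n


def f_statistic(left, right):
    """One-way ANOVA F statistic: between-node over within-node variance."""
    groups = [left, right]
    all_values = left + right
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def best_split(xs, ys):
    """Try every midpoint between consecutive distinct x values; keep lowest MSE."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        score = weighted_child_mse(left, right)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Toy data: the target jumps between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [10.0, 12.0, 11.0, 30.0, 32.0, 31.0]

threshold, score = best_split(xs, ys)
f_value = f_statistic(ys[:3], ys[3:])
print(threshold, score, f_value)
```

A large F value corresponds to a low p-value; turning it into a p-value requires the F distribution with (1, N - 2) degrees of freedom, which is not in the Python standard library.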