Data Science Foundations - Preprocessing

6 important questions on Data Science Foundations - Preprocessing

What are decision boundaries?

Interaction effects: when the relation between predictor A and target Y depends on the value of another predictor B.

A decision tree cannot learn and incorporate linear relations. Decision trees will therefore perform well on some problems and poorly on others.
- (X)OR-style: yes
- linear relations: no
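
The contrast above can be sketched in a few lines of plain Python. This is only an illustration under assumptions that are not in the notes: the function names, the toy XOR points and the 10 x 10 grid are all made up. Nested axis-aligned splits reproduce XOR exactly, while a single axis-aligned split cannot reproduce a diagonal (linear) boundary such as "class 1 when x1 > x2". A deeper tree could only approximate that boundary with a staircase of splits.

```python
# Hand-written depth-2 tree for XOR: split on a first, then on b in each branch.
def xor_tree(a, b):
    if a < 0.5:
        return 1 if b >= 0.5 else 0
    return 0 if b >= 0.5 else 1


# Accuracy of the best single axis-aligned split (a "stump").
def best_stump_accuracy(points, labels):
    best = 0.0
    for feature in (0, 1):
        for threshold in sorted({p[feature] for p in points}):
            preds = [1 if p[feature] >= threshold else 0 for p in points]
            acc = sum(pr == y for pr, y in zip(preds, labels)) / len(labels)
            # 1 - acc covers the stump with its two leaf labels swapped.
            best = max(best, acc, 1 - acc)
    return best


# XOR-style target (hypothetical toy data): captured exactly by nested splits.
xor_points = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [0, 1, 1, 0]
xor_correct = all(xor_tree(a, b) == y for (a, b), y in zip(xor_points, xor_labels))

# Linear target (hypothetical grid): class 1 when x1 > x2, a diagonal boundary.
grid = [(i, j) for i in range(10) for j in range(10) if i != j]
diag_labels = [1 if x1 > x2 else 0 for x1, x2 in grid]
stump_acc = best_stump_accuracy(grid, diag_labels)

print(xor_correct)   # nested splits classify every XOR point correctly
print(stump_acc)     # best single split stays below 1.0 on the diagonal
```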

When to stop the decision tree process?

Simulate behavior on new data. Use a validation set: a random subsample of 30% of the data. Evaluate:
- misclassification error: % of observations incorrectly predicted
- classification accuracy: % of observations correctly predicted
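
A minimal sketch of the validation step, assuming plain Python lists and made-up labels. The helper names, the fixed seed and the example predictions are hypothetical and only serve to show the 30% hold-out and the two measures.

```python
import random


def train_validation_split(rows, validation_fraction=0.3, seed=0):
    """Randomly hold out a fraction of the rows as a validation set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def accuracy(y_true, y_pred):
    """Share of observations that were predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def misclassification_error(y_true, y_pred):
    """Share of observations that were predicted incorrectly."""
    return 1.0 - accuracy(y_true, y_pred)


# 100 hypothetical row ids: 70 go to training, 30 to validation.
rows = list(range(100))
train, validation = train_validation_split(rows)

# Made-up validation labels and predictions with two mistakes out of ten.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
acc = accuracy(y_true, y_pred)
err = misclassification_error(y_true, y_pred)
print(len(train), len(validation), acc, err)
```

One common way to use these numbers, stated here as an assumption rather than as the notes' method, is to keep growing the tree while the validation error still decreases and to stop once it starts to rise.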

What are the pros and cons of decision trees?

Advantages
- interpretable
- non-parametric
- robust with respect to input data

Disadvantages
- sensitive to changes in the training data
- sensitive to imbalanced class distribution
- predictive power: weak classifier

What is the use of decision trees?

Predictive model: careful, depending on the setting
Data exploration: no preprocessing required
Variable selection in preprocessing
Segmentation prior to development of predictive models
Coarse classification
Ensembles

When to use regression trees?

If the target variable is continuous
- loss given default
- customer lifetime value

How to make splitting decisions for regression trees?

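Mean squared error: $I(N) = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
- where n is the number of observations in node N and $\bar{Y}$ is the mean of the target over those observations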
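
The impurity above can be turned into a split search as sketched below. This is an illustrative sketch, not the notes' own procedure: the function names and the toy data are hypothetical, and scoring a split by the size-weighted average of the two child impurities is an assumption (it is a common choice, but the notes do not specify it).

```python
def mse_impurity(values):
    """I(N): mean squared deviation of the targets from their node mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / n


def best_split(xs, ys):
    """Return (threshold, weighted_mse) for the split 'x < threshold' that
    minimises the size-weighted impurity of the two child nodes.
    Returns (None, parent_mse) if no split improves on the parent node."""
    n = len(ys)
    best = (None, mse_impurity(ys))
    # Skip the smallest value so that both children are non-empty.
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        weighted = (len(left) * mse_impurity(left)
                    + len(right) * mse_impurity(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Made-up predictor and continuous target with an obvious jump at x = 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
threshold, weighted = best_split(xs, ys)
print(threshold, weighted)
```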

ANOVA/F-Test
- a low p-value indicates a good split
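
As a rough sketch of the ANOVA idea, the one-way F statistic for the two child nodes of a candidate split can be computed by hand. The function name and the example values are hypothetical. Only the F statistic is computed here: turning it into a p-value needs the F distribution with (k - 1, n - k) degrees of freedom, for example via `scipy.stats.f.sf`, which is left out to keep the sketch dependency-free. A larger F corresponds to a lower p-value.

```python
def f_statistic(left, right):
    """One-way ANOVA F statistic for the two child nodes of a split.
    Assumes some within-node variation (otherwise the division fails)."""
    groups = [left, right]
    n = sum(len(g) for g in groups)   # total number of observations
    k = len(groups)                   # number of groups (child nodes)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the node means around the overall mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own node mean.
    ss_within = sum(sum((y - m) ** 2 for y in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up targets: one split separates low from high values, one mixes them.
f_good = f_statistic([5.0, 6.0, 7.0], [20.0, 21.0, 22.0])
f_poor = f_statistic([5.0, 21.0, 7.0], [20.0, 6.0, 22.0])
print(f_good, f_poor)
```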
