Decision trees
15 important questions on Decision trees
Why are decision trees a popular classification technique?
- Performs well across a wide range of situations
- Does not require much effort from the analyst
- Easily understood by consumers
  - At least when the trees are not too large
- Can be used for both:
  - Classification, called classification trees
  - Prediction of a numeric value, called regression trees
What is the main processing with decision trees?
How is the tree constructed in induction?
What issues occur with induction?
- Determine how to split the records
  - How to specify the attribute test condition?
  - How to determine the best split?
- Determine when to stop splitting
How can the splitting take place based on nominal attributes?
- Multi-way split - use as many partitions as distinct values
- Binary split - divides values into two subsets
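As a rough illustration of the two options above, the sketch below lists the partitions each one would consider for a nominal attribute. The function names `multiway_split` and `binary_splits` and the example values are assumptions made for this illustration, not part of the study material.

```python
from itertools import combinations


def multiway_split(values):
    """Multi-way split: one partition per distinct value."""
    return [{v} for v in sorted(set(values))]


def binary_splits(values):
    """Binary split: every way to divide the distinct values into two non-empty subsets."""
    distinct = sorted(set(values))
    if len(distinct) < 2:
        return []  # nothing to split on
    first, rest = distinct[0], distinct[1:]
    splits = []
    # Pin the first value to the left subset so each split is listed only once.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = {first, *extra}
            right = set(distinct) - left
            splits.append((left, right))
    return splits


# Hypothetical attribute values for illustration.
car_types = ["family", "sports", "luxury", "sports"]
print(multiway_split(car_types))
print(binary_splits(car_types))
```

With k distinct values there are 2^(k-1) - 1 candidate binary splits, so the binary option becomes expensive for attributes with many values.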
How can the splitting take place based on continuous attributes? (numbers)
- Discretization to form an ordinal categorical attribute
  - Static: discretize once at the beginning
  - Dynamic: ranges can be found by equal-interval bucketing, equal-frequency bucketing (percentiles), or clustering
- Binary decision (i.e., Xi < v or Xi ≥ v)
  - Finds the best cut among all possible splits
  - Can be more compute-intensive
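The binary-decision search can be sketched as follows. This is a minimal example under stated assumptions: candidate cuts are the midpoints between consecutive distinct sorted values, and each cut is scored with the weighted Gini index (lower is better). The names `gini` and `best_threshold` and the sample data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(xs, ys):
    """Return (v, score) for the best split of the form x < v versus x >= v."""
    pairs = sorted(zip(xs, ys), key=lambda pair: pair[0])
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no cut point between identical values
        v = (low + high) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        # Impurity of the split: child impurities weighted by record counts.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (v, score)
    return best


# Hypothetical data: an income-like attribute and a yes/no class.
income = [60, 70, 75, 85, 90, 95]
label = ["no", "no", "no", "yes", "yes", "yes"]
print(best_threshold(income, label))  # (80.0, 0.0)
```

Recomputing the class counts for every candidate cut is what makes this approach more compute-intensive than discretizing once up front.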
How can the impurity of a node be measured?
- Gini Index
- Entropy measure
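Both measures are computed from the class proportions at a node. The sketch below assumes the standard definitions, Gini = 1 - sum(p_i^2) and entropy = -sum(p_i * log2(p_i)), where p_i is the fraction of records in class i; the function names are chosen for this illustration only.

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    """Fraction of records in each class; assumes a non-empty list of labels."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini_index(labels):
    """Gini index: 0.0 for a pure node, larger when classes are mixed."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Entropy in bits: 0.0 for a pure node, 1.0 for an even two-class mix."""
    return sum(-p * log2(p) for p in class_proportions(labels))


pure = ["yes", "yes", "yes", "yes"]
mixed = ["yes", "yes", "no", "no"]
print(gini_index(pure), entropy(pure))    # 0.0 0.0
print(gini_index(mixed), entropy(mixed))  # 0.5 1.0
```

Both measures are zero for a pure node and reach their maximum when the classes are evenly mixed.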
What is information gain?
What is calculated with the Gini index?
What is calculated with the Entropy measure?
How is the combined impurity calculated?
- As the sum of the child nodes' impurities, each weighted by the fraction of records that falls into that child.
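A minimal sketch of this weighting, using entropy as the impurity measure. It also shows information gain as it is usually defined: the parent's impurity minus the combined impurity of the children. The names `combined_impurity` and `information_gain` and the sample records are assumptions for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy in bits of a non-empty list of class labels."""
    n = len(labels)
    return sum(-(count / n) * log2(count / n) for count in Counter(labels).values())


def combined_impurity(partitions, impurity=entropy):
    """Impurity of a split: each child's impurity weighted by its share of records."""
    total = sum(len(part) for part in partitions)
    # Empty children carry zero weight, so they are skipped.
    return sum(len(part) / total * impurity(part) for part in partitions if part)


def information_gain(parent, partitions, impurity=entropy):
    """Reduction in impurity achieved by splitting the parent into the partitions."""
    return impurity(parent) - combined_impurity(partitions, impurity)


# Hypothetical node with 5 "yes" and 5 "no" records, split into two children.
parent = ["yes"] * 5 + ["no"] * 5
children = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
print(combined_impurity(children))
print(information_gain(parent, children))
```

The split with the largest gain (equivalently, the lowest combined impurity) is the one that would be chosen.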
What are the stopping criteria for tree induction?
- Stop expanding a node when all the records belong to the same class
- Stop expanding a node when all the records have similar attribute values
- Early termination
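The three criteria above could be checked along these lines. This is a sketch under assumptions: "similar attribute values" is simplified to identical values, and early termination is modeled with two hypothetical limits, `max_depth` and `min_records`, which do not come from the study material.

```python
def should_stop(rows, labels, depth, max_depth=5, min_records=2):
    """Decide whether a node should become a leaf instead of being split further."""
    # 1. All records belong to the same class.
    if len(set(labels)) <= 1:
        return True
    # 2. All records have the same attribute values, so no test can separate them.
    if all(row == rows[0] for row in rows):
        return True
    # 3. Early termination: hypothetical limits on tree depth and node size.
    return depth >= max_depth or len(rows) < min_records


rows = [(1, "a"), (2, "b")]
print(should_stop(rows, ["yes", "yes"], depth=0))  # True: node is pure
print(should_stop(rows, ["yes", "no"], depth=0))   # False: still worth splitting
```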
How can we address overfitting?
- Pre-pruning
- Post-pruning
What are the advantages of decision trees?
- Easy to generate rules
What are the disadvantages of decision trees?
- May suffer from overfitting
- Classifies by rectangular partitioning, so it does not handle correlated features very well
- Can be quite large, so pruning is necessary
- Does not handle streaming data easily, although a few successful techniques exist