Decision trees

15 important questions on Decision trees

Why are decision trees a popular classification technique?

  • Perform well across a wide range of situations
  • Do not require much effort from the analyst
  • Easily understood by the consumers
    • At least when the trees are not too large
  • Can be used for both:
    • Classification, called classification trees
    • Prediction, called regression trees

What is the main processing with decision trees?

Separate records into subgroups by creating splits on predictors

How is the tree constructed in induction?

In a top-down, recursive, divide-and-conquer manner. At the start, all the training instances are at the root of the tree. Instances are then partitioned recursively based on selected attributes to obtain homogeneous subgroups.
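A minimal sketch of this procedure, assuming records are represented as dictionaries of nominal attribute values and that the splitting attribute is chosen by lowest weighted child entropy (these representation choices are illustrative, not from the original):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Impurity of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_attribute(records, labels, attributes):
    """Pick the attribute whose split gives the lowest weighted child entropy."""
    def weighted_child_entropy(attr):
        total = 0.0
        for value in set(r[attr] for r in records):
            child = [y for r, y in zip(records, labels) if r[attr] == value]
            total += len(child) / len(labels) * entropy(child)
        return total
    return min(attributes, key=weighted_child_entropy)

def build_tree(records, labels, attributes):
    """Top-down recursive partitioning into more homogeneous subgroups."""
    if len(set(labels)) == 1:
        return labels[0]                              # pure node: return a leaf
    if not attributes:
        return Counter(labels).most_common(1)[0][0]   # no attributes left: majority class
    attr = best_attribute(records, labels, attributes)
    node = {attr: {}}
    for value in set(r[attr] for r in records):
        sub_records = [r for r in records if r[attr] == value]
        sub_labels = [y for r, y in zip(records, labels) if r[attr] == value]
        node[attr][value] = build_tree(sub_records, sub_labels,
                                       [a for a in attributes if a != attr])
    return node
```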

What are issues that occur with induction?

  • Determine how to split the records
    • How to specify the attribute test condition?
    • How to determine the best split?
  • Determine when to stop splitting

How can the splitting take place based on nominal attributes?

  • Multi-way split - uses as many partitions as there are distinct values
  • Binary split - divides the values into two subsets
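As a small illustration (the attribute and its values here are hypothetical), a nominal attribute Marital Status with values Single, Married, and Divorced could be split either way:

```python
# Hypothetical nominal attribute values for illustration
values = ["Single", "Married", "Divorced"]

# Multi-way split: one branch per distinct value
multi_way = [{"Single"}, {"Married"}, {"Divorced"}]

# Binary split: the values are grouped into two subsets
# (one of several possible groupings)
binary = [{"Single", "Divorced"}, {"Married"}]
```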

How can the splitting take place based on continuous attributes? (numbers)

  • Discretization to form an ordinal categorical attribute
    • Static: discretize once at the beginning
    • Dynamic: ranges can be found by equal-interval bucketing, equal-frequency bucketing (percentiles), or clustering
  • Binary decision (i.e., Xi < v or Xi ≥ v)
    • Finds the best cut among all possible splits
    • Can be more compute-intensive
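A minimal sketch of the binary-decision approach, assuming the Gini index is used to score candidate cut points (the function name and details are illustrative, not from the original):

```python
def best_numeric_split(values, labels):
    """Try every midpoint between sorted distinct values of one continuous
    attribute and return the cut with the lowest weighted Gini index."""
    def gini(ys):
        n = len(ys)
        if n == 0:
            return 0.0
        return 1.0 - sum((ys.count(c) / n) ** 2 for c in set(ys))

    pairs = sorted(zip(values, labels))
    best_cut, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut between identical values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x < cut]
        right = [y for x, y in pairs if x >= cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score
```

For example, best_numeric_split([1, 2, 8, 9], ["a", "a", "b", "b"]) returns (5.0, 0.0): cutting at 5.0 separates the two classes perfectly.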

How can the impurity of a node be measured?

  • Gini Index
  • Entropy measure

What is information gain?

Information gain is used to determine which feature/attribute provides the most information about the class; records are split on the attribute test that maximizes this gain.
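One standard way to write it, with entropy as the impurity measure, is:

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

where $S_v$ is the subset of records for which attribute $A$ takes value $v$; the attribute with the highest gain is chosen for the split.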

What is calculated with the Gini index?

The Gini index measures the probability that a randomly chosen record would be classified incorrectly if it were labeled at random according to the class distribution at the node.
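For a node $t$ with classes $i = 1, \dots, c$ occurring with relative frequencies $p_i$, the Gini index is typically defined as:

$$\mathrm{Gini}(t) = 1 - \sum_{i=1}^{c} p_i^2$$

It is 0 for a pure node and largest when the classes are evenly mixed.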

What is calculated with the Entropy measure?

Entropy measures the degree of uncertainty, impurity, or disorder at a node. Tree induction aims to reduce entropy going from the root to the leaf nodes.
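Using the same class frequencies $p_i$ at a node $t$, entropy is typically defined as:

$$\mathrm{Entropy}(t) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

It is 0 for a pure node and maximal ($\log_2 c$) when all classes are equally likely.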

How is the combined impurity calculated?

The combined impurity created by a split is a weighted average of the impurity measures.
  • Weighted by the number of records in each resulting child node.
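If a split produces $k$ child nodes with $n_j$ of the $n$ records falling into child $j$, the combined impurity is:

$$\mathrm{Impurity}_{\mathrm{split}} = \sum_{j=1}^{k} \frac{n_j}{n}\,\mathrm{Impurity}(\mathrm{child}_j)$$

For instance (illustrative numbers), a split sending 40 of 100 records to a child with Gini 0.10 and 60 records to a child with Gini 0.30 has combined impurity $0.4 \times 0.10 + 0.6 \times 0.30 = 0.22$.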

What are the stopping criteria for tree induction?

  • Stop expanding a node when all the records belong to the same class
  • Stop expanding a node when all the records have similar attribute values
  • Early termination

How can we address overfitting?

  • Pre-pruning
  • Post-pruning

What are the advantages of decision trees?

  • Easy to understand
  • Easy to generate rules

What are the disadvantages of decision trees?

  • May suffer from overfitting
  • Classifies by rectangular partitioning, so it does not handle correlated features very well
  • Can be quite large, so pruning is necessary
  • Does not handle streaming data easily, although a few successful techniques exist
