Weblectures quiz
19 important questions on Weblectures quiz
What are pros and cons of tree-based methods?
- Tree-based methods are simple and useful for interpretation
- trees are very easy to explain
- decision trees closely mirror human decision making
- trees can handle qualitative predictors without the need to create dummy variables.
- however, they are not competitive with the best supervised learning approaches in terms of prediction accuracy
- techniques like bagging, boosting and random forests allow you to combine multiple trees, which increases predictive performance at the cost of interpretability.
Describe some tree terminology according to this figure.
- The regions or bins of the feature space are known as terminal nodes, in this case R1, R2, R3
- these terminal nodes are also referred to as leaves. Terminal nodes are not split further.
- the tree is typically drawn upside down, so the leaves are at the bottom of the tree.
- internal nodes are points along the tree where the feature space is split.
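As a concrete illustration of this terminology, below is a minimal sketch using the rpart package and R's built-in iris data (both are just an example, not part of the lecture material).

```r
library(rpart)

# Fit a small classification tree (iris used purely as an example data set)
fit <- rpart(Species ~ Petal.Length + Petal.Width, data = iris)

# Internal nodes are the points where the feature space is split; the
# terminal nodes (leaves) at the bottom correspond to the regions
# R1, R2, ... and are not split further.
plot(fit, margin = 0.1)
text(fit, use.n = TRUE)
```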
How are predictions made in a decision tree?
- For a given observation, follow the splits from the top of the tree down to determine the region (terminal node) it falls into.
- for a regression tree the prediction is the mean of the training responses in that region; for a classification tree it is the most common class in that region (see the sketch below).
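A minimal sketch of this prediction mechanism, again using rpart and the iris data purely as an example:

```r
library(rpart)

# Fit a classification tree (iris used purely as an example)
fit <- rpart(Species ~ Petal.Length + Petal.Width, data = iris)

# A new observation is passed down the tree until it reaches a leaf;
# the predicted class is the most common class in that leaf.
new_obs <- data.frame(Petal.Length = 4.5, Petal.Width = 1.4)
predict(fit, newdata = new_obs, type = "class")

# For a regression tree the same mechanism applies, but the prediction
# is the mean response of the training observations in the leaf.
```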
What information is provided by the lengths of the arms?
- The longer the arms, the higher the drop in RSS was due to that split.
- this shows that later splits typically cause a smaller decrease in RSS than earlier splits.
What is the definition of a regression tree?
- A tree for which the response variable is continuous
- despite the name, the tree itself does not involve fitting a regression model.
In a classification tree, what is the predicted outcome of the tree for a given observation?
- For a given observation, the tree returns the class that occurs the most in the region to which the observation belongs.
What is the most basic strategy to fitting a classification tree?
- Just as in the regression setting, recursive binary splitting is used to grow a classification tree.
- an alternative to RSS for classification is the classification error rate.
- this is simply the fraction of the training observations in a region that do not belong to the most common class of that region (which is the prediction for observations in that region).
- however, the misclassification error rate is not sensitive enough for growing trees, so two other measures are preferred: the Gini index and the deviance (cross-entropy).
What is the deviance or cross-entropy?
- An alternative to misclassification rate to assess classification performance.
- The deviance within a region is given by:
- - D = -\sum_{k=1}^{K} \hat{p}_{mk} \log(\hat{p}_{mk}), where \hat{p}_{mk} is the proportion of training observations in region m that belong to class k
- the deviance is numerically very similar to the Gini index: values near zero indicate that the region is dominated by one class, and large values indicate that the classes are more equally represented in the region.
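A small base-R sketch comparing the three impurity measures for hypothetical class proportions in a single region:

```r
# Hypothetical class proportions in one region (K = 3 classes)
p <- c(0.7, 0.2, 0.1)

misclass_error <- 1 - max(p)          # classification error rate
gini           <- sum(p * (1 - p))    # Gini index
entropy        <- -sum(p * log(p))    # deviance / cross-entropy

# A nearly pure region gives Gini and entropy values close to zero
p_pure <- c(0.98, 0.01, 0.01)
-sum(p_pure * log(p_pure))
```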
Why don't you have to prune trees when you use bagging?
- The only reason to prune a tree is to reduce the variance of the method; high variance leads to overfitting.
- in bagging you already reduce variance because you average your predictions over many trees.
- since variance of the method is already reduced through averaging you can make the individual trees very large and don't need to prune them.
How would you use bagging for classification?
- Exactly the same as with regression trees:
- build a tree for each bootstrapped sample and obtain the prediction for observation x of each tree.
- the eventual prediction will in this case be the class that most trees predict observation x to belong to.
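A minimal sketch of this majority-vote idea, using rpart and the iris data purely as an example (in practice the randomForest package does this for you):

```r
library(rpart)

set.seed(1)
B <- 100
new_obs <- iris[1, ]                       # observation to predict

votes <- character(B)
for (b in 1:B) {
  # Fit one tree per bootstrap sample of the training data
  boot_idx <- sample(nrow(iris), replace = TRUE)
  fit_b <- rpart(Species ~ ., data = iris[boot_idx, ])
  votes[b] <- as.character(predict(fit_b, newdata = new_obs, type = "class"))
}

# Majority vote: the class predicted by most of the B trees
names(which.max(table(votes)))
```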
In the following situation: the dataset originally contained 4000 features. To do our analysis, we pick the 500 features that have the highest variance. Is this going to introduce bias into our model?
- In this case we won't introduce bias into our model because we don't use the response variable to select predictors.
- if we only look at the overall variance of the features, without taking the response variable into account, this is allowed.
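A minimal sketch of such unsupervised filtering, assuming a hypothetical n x 4000 feature matrix X; the response y is never used:

```r
# Hypothetical feature matrix: 100 observations, 4000 features
X <- matrix(rnorm(100 * 4000), nrow = 100, ncol = 4000)

# Keep the 500 features with the highest variance, ignoring the response
feature_var <- apply(X, 2, var)
keep <- order(feature_var, decreasing = TRUE)[1:500]
X_filtered <- X[, keep]

dim(X_filtered)   # 100 x 500
```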
How can you make a function in R that performs random forests perform bagging?
- Set the number of predictors considered at each split equal to the total number of predictors p: a random forest that considers all predictors at every split is exactly bagging (see the sketch below).
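A minimal sketch using the randomForest package, with the iris data (p = 4 predictors) purely as an example:

```r
library(randomForest)

set.seed(1)
# mtry = p: all predictors considered at each split, i.e. bagging
bag_fit <- randomForest(Species ~ ., data = iris, mtry = 4, ntree = 500)

# mtry < p: only a random subset considered at each split, i.e. a random forest
rf_fit  <- randomForest(Species ~ ., data = iris, mtry = 2, ntree = 500)
```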
What advantages do simple trees, bagging and random forests have over each other?
- The advantage of bagging over simple trees is that the variance is reduced by averaging unbiased predictions rather than increasing the bias by pruning the tree.
- The advantage of random forests over bagging is that the bootstrapped trees in a random forest tend to have more dissimilar branches than the bootstrapped trees in bagging.
What are the tuning parameters for boosting?
- The number of trees B.
- unlike bagging and random forests, boosting can overfit if B is too large. Cross validation can be used to select B.
- the shrinkage parameter lambda
- this is the rate at which the boosting learns; it is typically set to a very small value, such as 0.01 or 0.001.
- the number of splits d
- often d = 1 works very well; in that case each tree consists of a single split and is called a stump.
- d is also referred to as the interaction depth. Since d splits can involve at most d variables it controls how many variables are used in the creation of each tree.
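A minimal sketch of these tuning parameters using the gbm package (mtcars used purely as an example data set):

```r
library(gbm)

set.seed(1)
boost_fit <- gbm(mpg ~ ., data = mtcars,
                 distribution = "gaussian",
                 n.trees = 5000,          # B: number of trees
                 interaction.depth = 1,   # d: stumps
                 shrinkage = 0.01,        # lambda: learning rate
                 cv.folds = 5)

# Cross-validation can be used to select B and guard against overfitting
best_B <- gbm.perf(boost_fit, method = "cv")
```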
How can you indicate how important a variable is in a tree-based method?
- For bagged/RF regression trees you can record the total amount that the RSS dropped due to a split performed on the variable, averaged over all B trees.
- for classification you can add up the total amount that the Gini index is decreased by splits over a given predictor, averaged over all B trees.
- in both cases, a large value indicates an important predictor.
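A minimal sketch of these importance measures for a random forest, using the randomForest package and the iris data purely as an example:

```r
library(randomForest)

set.seed(1)
rf_fit <- randomForest(Species ~ ., data = iris, importance = TRUE)

# Mean decrease in the Gini index per predictor, averaged over all trees;
# larger values indicate more important predictors.
importance(rf_fit)
varImpPlot(rf_fit)
```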
How does Bayesian additive regression trees (BART) work?
- K trees are created
- each of these K trees undergoes B iterations, where in each iteration a perturbation (adjustment) is performed on the tree.
- the final predictions are obtained by averaging the predictions of each of the K trees at each of the B iterations.
What are possible perturbations for trees in BART?
- Altering the structure of the tree
- moving or deleting branches at any location
- altering the predictions of each of the terminal nodes
What is the output of BART?
- BART provides B * K tree models
- to obtain a single prediction we take the average of these models.
- first, however, we remove the L burn-in iterations: these are the models from the first iterations of each of the K trees, because those early models are not very good.
- besides averaging, other quantities about the final prediction can be calculated; for example, percentiles of the predictions provide a measure of uncertainty of the final prediction.
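A minimal sketch assuming the BART package and its gbart() interface, with mtcars used purely as an example; the argument names and output fields are assumptions based on that package:

```r
library(BART)

x <- as.matrix(mtcars[, -1])   # predictors
y <- mtcars$mpg                # continuous response

set.seed(1)
bart_fit <- gbart(x, y,
                  ntree  = 200,   # K: number of trees
                  ndpost = 1000,  # B: iterations kept after burn-in
                  nskip  = 100)   # L: burn-in iterations discarded

# yhat.train holds one prediction per kept iteration and observation;
# averaging over iterations gives the final prediction, and quantiles
# of these draws give a measure of uncertainty.
yhat_mean <- colMeans(bart_fit$yhat.train)
```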
What is the advantage of BART over boosting?
- In boosting, each new tree tries to capture signal not yet accounted for by the current fit, which can lead to overfitting; in BART, each tree is only perturbed slightly at every iteration, which limits how hard the data are fit and guards against overfitting.