Pre-processing and Performance Evaluation

5 important questions on Pre-processing and Performance Evaluation

Why is a test set not used for validation?

To avoid the peeking effect, where information from the test set is leaked into the learning algorithm.
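
As an illustration, the sketch below keeps a separate validation part for tuning so that the test part is only touched once, at the very end. The function name `three_way_split` and the 60/20/20 proportions are assumptions made for this example, not something the study material prescribes.

```python
import random


def three_way_split(records, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the records and cut them into train, validation and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test


train, validation, test = three_way_split(range(100))
# Tune parameters (for example the amount of pruning) on `validation`;
# evaluate on `test` only once, after all choices have been made.
print(len(train), len(validation), len(test))  # 60 20 20
```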

Why is postpruning more successful than early stopping?

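Early stopping (prepruning) judges one attribute at a time and may halt before a useful split is reached. Certain combinations of attribute values are very effective even when each attribute looks uninformative on its own, while others are not; postpruning grows the full tree first, so it can discover this.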
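
A small worked example may help. It assumes an XOR-style toy dataset (made up for this illustration) and information gain as the split criterion: each attribute alone has zero gain, so an early-stopping rule would halt at the root, yet the two attributes together predict the label perfectly.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute):
    """rows is a list of (features_dict, label) pairs."""
    gain = entropy([label for _, label in rows])
    for value in {features[attribute] for features, _ in rows}:
        subset = [label for features, label in rows if features[attribute] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain


# Toy data: the label is "a XOR b".
rows = [({"a": a, "b": b}, a != b) for a in (0, 1) for b in (0, 1)]

gain_a = information_gain(rows, "a")  # 0.0: "a" alone looks useless
gain_b = information_gain(rows, "b")  # 0.0: "b" alone looks useless
# After splitting on "a" anyway, "b" separates the classes perfectly.
branch = [row for row in rows if row[0]["a"] == 0]
gain_b_after_a = information_gain(branch, "b")  # 1.0
print(gain_a, gain_b, gain_b_after_a)
```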

What are two pruning operations used in postpruning?

1. Subtree replacement: uses the approach as described on slide 12 while considering all nodes in the tree
2. Subtree raising (used in the famous C4.5 algorithm, J48 in WEKA): an entire subtree is raised toward the root
     • the node above it is replaced by this child subtree
     • the examples from the replaced node's other branches need to be reclassified into the raised subtree
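
Subtree replacement can be sketched as a bottom-up pass over the tree. The criterion from slide 12 is not reproduced on this page, so the sketch below assumes a different, clearly labeled stand-in: a subtree is replaced by a majority-class leaf when that leaf makes no more errors on a held-out pruning set. This is not the criterion C4.5 uses, and the tree encoding (nested dicts) and all names are hypothetical.

```python
from collections import Counter


def classify(tree, record):
    """A leaf is a plain label; an internal node is a dict."""
    while isinstance(tree, dict):
        tree = tree["branches"][record[tree["attribute"]]]
    return tree


def errors(tree, rows):
    return sum(classify(tree, features) != label for features, label in rows)


def replace_subtrees(tree, rows):
    """Bottom-up subtree replacement using a held-out pruning set (assumed criterion)."""
    if not isinstance(tree, dict) or not rows:
        return tree  # leaf, or no evidence to decide on
    attribute = tree["attribute"]
    pruned = {"attribute": attribute, "branches": {}}
    for value, child in tree["branches"].items():
        subset = [(f, l) for f, l in rows if f[attribute] == value]
        pruned["branches"][value] = replace_subtrees(child, subset)
    majority = Counter(label for _, label in rows).most_common(1)[0][0]
    # Replace the subtree by a single leaf if that leaf does at least as well.
    if errors(majority, rows) <= errors(pruned, rows):
        return majority
    return pruned


tree = {"attribute": "a", "branches": {
    0: {"attribute": "b", "branches": {0: "no", 1: "yes"}},
    1: "yes",
}}
pruning_rows = [
    ({"a": 0, "b": 0}, "no"), ({"a": 0, "b": 1}, "no"), ({"a": 0, "b": 1}, "no"),
    ({"a": 1, "b": 0}, "yes"), ({"a": 1, "b": 1}, "yes"),
]
pruned_tree = replace_subtrees(tree, pruning_rows)
print(pruned_tree)  # {'attribute': 'a', 'branches': {0: 'no', 1: 'yes'}}
```

In this toy run the test on "b" is replaced by the leaf "no", while the root test on "a" survives because a single leaf would misclassify two pruning records.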

Explain True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN).

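• Example: test record with “Patrons” = “Some” and “Wait” = False
• Known label: False
• Predicted label: True
• The classifier labels a negative example as positive, so this record counts as a false positive (FP).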


Test results (for binary classification)
• True positives (TP): positive test examples that are correctly labeled by the classifier / concept description
• True negatives (TN): negative test examples that are correctly labeled by the classifier / concept description
• False positives (FP): negative test examples that are incorrectly labeled as positive by the classifier / concept description
• False negatives (FN): positive test examples that are incorrectly labeled as negative by the classifier / concept description
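
The four counts can be tallied directly from the known and predicted labels. The following is a minimal sketch for binary labels encoded as booleans; the function name `confusion_counts` and the sample labels are made up for illustration.

```python
def confusion_counts(known, predicted):
    """Count TP, TN, FP and FN for boolean labels (True = positive)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, guess in zip(known, predicted):
        if guess:
            # Predicted positive: correct if the example really is positive.
            counts["TP" if actual else "FP"] += 1
        else:
            # Predicted negative: correct if the example really is negative.
            counts["FN" if actual else "TN"] += 1
    return counts


known = [True, True, False, False, False]
predicted = [True, False, True, False, False]
print(confusion_counts(known, predicted))
# {'TP': 1, 'TN': 2, 'FP': 1, 'FN': 1}
```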

Why can a One Rule be a good method to use?

One Rule (1R) learns rules that all test a single attribute, so it is simple and cheap to compute:

• For each attribute:
     • For each value of that attribute, make a rule, i.e.:
          count how often each class appears;
          find the most frequent class;
          make the rule assign that class to this attribute value;
     • Calculate the error rate of the rules;
• Choose the rules of the attribute with the smallest error rate.
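
The procedure above can be sketched in a few lines of Python. The data layout (a list of `(features, label)` pairs), the function name `one_rule` and the toy records are assumptions for this example; errors are counted as absolute numbers, which ranks attributes the same way as the error rate on a fixed dataset.

```python
from collections import Counter, defaultdict


def one_rule(rows, attributes):
    """Return (attribute, rules, errors) for the attribute with the fewest errors."""
    best = None
    for attribute in attributes:
        # Count how often each class appears for each value of this attribute.
        class_counts = defaultdict(Counter)
        for features, label in rows:
            class_counts[features[attribute]][label] += 1
        # Each value predicts its most frequent class.
        rules = {value: counts.most_common(1)[0][0]
                 for value, counts in class_counts.items()}
        # Errors: records that do not belong to the predicted class.
        errors = sum(sum(counts.values()) - counts[rules[value]]
                     for value, counts in class_counts.items())
        if best is None or errors < best[2]:
            best = (attribute, rules, errors)
    return best


# Toy records, invented for illustration only.
rows = [
    ({"patrons": "Some", "hungry": "Yes"}, True),
    ({"patrons": "Some", "hungry": "No"}, True),
    ({"patrons": "Full", "hungry": "Yes"}, False),
    ({"patrons": "Full", "hungry": "No"}, False),
    ({"patrons": "None", "hungry": "No"}, False),
    ({"patrons": "Full", "hungry": "Yes"}, True),
]
attribute, rules, errors = one_rule(rows, ["patrons", "hungry"])
print(attribute, rules, errors)
# patrons {'Some': True, 'Full': False, 'None': False} 1
```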
