Predictive Analytics I: Data Mining Process, Methods, and Algorithms - Data Mining Methods

9 important questions on Predictive Analytics I: Data Mining Process, Methods, and Algorithms - Data Mining Methods

Give examples of situations in which classification would be an appropriate data mining technique. Give examples of situations in which regression would be an appropriate data mining technique.

Classification is appropriate when the value being predicted is a class label:
  • Weather forecasting: whether a particular day will be "sunny", "rainy", or "cloudy"
  • Credit approval: good or bad credit risk
  • Store location: e.g. good, moderate, or bad
Regression is appropriate when the value being predicted is numeric:
  • Weather forecasting: e.g. the temperature on a particular day

List 2 popular estimation methodologies used for classification-type data mining models.

Simple split: the data is randomly partitioned into a training set (typically 2/3 of the data set) and a test set (the remaining 1/3).
K-fold cross-validation (also called rotation estimation): the data set is randomly split into k mutually exclusive subsets of approximately equal size. The model is trained on all but one subset and tested on the held-out subset, and this is repeated k times so that each subset serves as the test set exactly once.
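
The two estimation methodologies above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `simple_split` and `k_fold_splits`, the fixed seed, and the toy data set of 12 records are assumptions made for the example, not part of the study material.

```python
import random

def simple_split(records, train_fraction=2 / 3, seed=0):
    """Randomly partition records into a training set and a test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def k_fold_splits(records, k, seed=0):
    """Yield (training, test) pairs; each fold is the test set exactly once."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Fold i takes every k-th record starting at position i,
    # so the folds are mutually exclusive and differ in size by at most one.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test

# Hypothetical data set: 12 record identifiers.
data = list(range(12))
train, test = simple_split(data)
splits = list(k_fold_splits(data, k=4))
```

In practice a model would be fitted on each `train` list and scored on the matching `test` list, and the k scores would then be averaged to estimate accuracy.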

List and briefly define at least 2 classification techniques.

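Decision trees: among the most popular classification techniques. A record is classified by applying a sequence of tests on its attribute values, branching until a class label is reached.
Statistical analysis: techniques such as logistic regression, ...
Neural networks: among the most popular techniques for classification-type problems.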

What are some of the criteria for comparing and selecting the best classification technique?

  • Predictive accuracy
  • Speed
  • Robustness
  • Scalability
  • Interpretability   

Define Gini Index. What does it measure?

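The Gini index measures the diversity (impurity) of a population. In decision tree induction it is used to evaluate the goodness of a split: the purer the subsets a split produces, the better the split.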
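
As a rough illustration, the Gini index of a set of class labels is commonly computed as 1 minus the sum of the squared class proportions, and a split is scored by the size-weighted average of the Gini values of its subsets. The sketch below assumes that common definition; the function names and the "good"/"bad" credit labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini index of a population: 0 means pure, higher means more mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_of_split(left, right):
    """Size-weighted Gini index of the two subsets produced by a split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical population: 5 good and 5 bad credit risks.
parent = ["good"] * 5 + ["bad"] * 5

# A split that separates the classes perfectly.
pure_split = gini_of_split(["good"] * 5, ["bad"] * 5)

# A split that leaves both subsets mixed.
mixed_split = gini_of_split(
    ["good", "good", "bad"],
    ["good", "good", "good", "bad", "bad", "bad", "bad"],
)
```

Here the evenly mixed parent has a Gini index of 0.5, the perfect split scores 0, and the mixed split falls in between, so a tree-building algorithm that uses this criterion would prefer the perfect split.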

What is an ensemble model in data mining? What are the pros and cons of ensemble models?

An ensemble model is essentially the result of intelligently combining the information created and provided by two or more information sources. In data mining, this means combining the outputs of two or more prediction models into a single composite score.
Pros:
improved accuracy
improved robustness
Cons:
increase of the model complexity
lack of interpretability (i.e. transparency)
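
Two simple ways of combining model outputs are majority voting for class labels and averaging for numeric scores. The sketch below shows both under the assumption that three hypothetical models have already produced their predictions; the function names and the sample values are illustrative only.

```python
from collections import Counter
from statistics import mean

def majority_vote(predictions):
    """Return the class label predicted by the most models."""
    return Counter(predictions).most_common(1)[0][0]

def average_score(scores):
    """Return the mean of the numeric scores produced by the models."""
    return mean(scores)

# Hypothetical outputs of three separate prediction models for one applicant.
class_votes = ["good", "bad", "good"]
risk_scores = [0.75, 0.5, 0.25]

combined_label = majority_vote(class_votes)
combined_score = average_score(risk_scores)
```

The combined label is whichever class most models agree on, which illustrates the trade-off listed above: the composite can be more robust than any single model, but it is harder to explain why a given record received its final score.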

Give examples of situations in which cluster analysis would be an appropriate data mining technique.

Fraud detection, e.g. for credit cards
Market segmentation of customers (for CRM systems)

What is the major difference between cluster analysis and classification?


Classification learns the function between the independent (input) variables and the output variable through a supervised learning process, in which the class labels are known. With cluster analysis, class labels are unknown. Cluster analysis is an exploratory data analysis tool, and the membership of objects is learned through an unsupervised learning process in which only the input variables are presented.

What are some of the methods for cluster analysis?

  • Statistical methods: k-means, k-modes
  • Neural networks (with self-organizing map architecture)
  • Fuzzy logic
  • Genetic algorithms
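  • Divisive: all items start in one cluster, which is repeatedly split
  • Agglomerative: each item starts in its own cluster, and clusters are repeatedly merged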
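
To make the first method in the list concrete, here is a minimal k-means sketch in plain Python. It assumes squared Euclidean distance, a fixed number of iterations, and random initial centroids drawn from the data; the function name `k_means` and the six sample points are hypothetical.

```python
import random

def k_means(points, k, iterations=20, seed=0):
    """Group points into k clusters by alternating assignment and update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids picked from the data
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers described by two numeric attributes.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)
```

No class labels are supplied anywhere, which is what makes this unsupervised: the two groups emerge only from the distances between the input values.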
