Predictive Analytics I: Data Mining Process, Methods, and Algorithms - Data Mining Methods
9 important questions on Predictive Analytics I: Data Mining Process, Methods, and Algorithms - Data Mining Methods
Give examples of situations in which classification would be an appropriate data mining technique. Give examples of situations in which regression would be an appropriate data mining technique.
Classification: if what is being predicted is a class label (a categorical value), e.g. whether the weather on a particular day will be "sunny", "rainy" or "cloudy".
Credit approval: good or bad credit risk
Store location: e.g. good, moderate, bad
Regression: if what is being predicted is a numeric value, e.g. predicting the temperature.
List 2 popular estimation methodologies used for classification-type data mining models.
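K-fold cross-validation (also called rotation estimation): the dataset is randomly split into k mutually exclusive subsets of approximately equal size. The model is trained on all but one subset and tested on the remaining one; this is repeated k times, so that each subset serves as the test set exactly once.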
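The splitting procedure can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `k_fold_splits` and `cross_validate`, and the `fit_and_score` callback, are assumptions made for this example and do not come from the study material.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random assignment of records to folds
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1
    # and the folds are mutually exclusive.
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


def cross_validate(n_samples, k, fit_and_score, seed=0):
    """Average the score returned by fit_and_score(train, test) over k folds.

    fit_and_score is a hypothetical callback: it should train a model on the
    records at the `train` indices and return its accuracy on `test`.
    """
    scores = [fit_and_score(train, test)
              for train, test in k_fold_splits(n_samples, k, seed)]
    return sum(scores) / len(scores)
```

Averaging over the k test folds means every record is used for testing once and for training k-1 times.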
List and briefly define at least 2 classification techniques.
Statistical analysis: logistic regression, ...
Neural networks: among the most popular techniques for classification-type problems
What are some of the criteria for comparing and selecting the best classification technique?
- Predictive accuracy
- Speed
- Robustness
- Scalability
- Interpretability
Define Gini Index. What does it measure?
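As commonly defined for decision-tree induction (for example in CART), the Gini index measures the impurity of a set of class labels: it is 0 when all records belong to one class and grows as the classes become more evenly mixed. A minimal sketch under that definition follows; the function names are illustrative assumptions.

```python
from collections import Counter


def gini_index(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(groups):
    """Impurity of a candidate split: size-weighted average over its branches."""
    total = sum(len(group) for group in groups)
    return sum(len(group) / total * gini_index(group) for group in groups)


pure = gini_index(["good", "good", "good"])  # 0.0, a single class
mixed = gini_index(["good", "bad"])          # 0.5, two classes evenly mixed
```

A tree-building algorithm would typically prefer the split whose branches have the lowest weighted impurity.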
What is an ensemble model in data mining? What are the pros and cons of ensemble models?
Pros:
improved accuracy
improved robustness
Cons:
increase of the model complexity
lack of interpretability (i.e. transparency)
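One simple way to combine several models is majority voting, sketched below. The three credit-risk rules and their thresholds are invented for illustration only; in practice each member would be a trained classifier.

```python
from collections import Counter


def majority_vote(models, record):
    """Return the class label predicted by most of the models."""
    votes = [model(record) for model in models]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical base classifiers (thresholds are made up for this example).
def rule_income(applicant):
    return "good" if applicant["income"] >= 40000 else "bad"


def rule_debt(applicant):
    return "good" if applicant["debt_ratio"] <= 0.4 else "bad"


def rule_history(applicant):
    return "good" if applicant["late_payments"] == 0 else "bad"


applicant = {"income": 52000, "debt_ratio": 0.55, "late_payments": 0}
decision = majority_vote([rule_income, rule_debt, rule_history], applicant)
# Two of the three rules vote "good", so the ensemble outputs "good".
```

The sketch also shows the trade-off: one rule's mistake can be outvoted, but the final decision can no longer be explained by a single rule.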
Give examples of situations in which cluster analysis would be an appropriate data mining technique.
market segmentation of customers (for CRM systems)
What is the major difference between cluster analysis and classification?
Classification learns the function between the independent (input) variables and the output variable through a supervised learning process. With cluster analysis, class labels are unknown. Cluster analysis is an exploratory data analysis tool in which the membership of objects is learned through an unsupervised learning process, where only input variables are presented.
What are some of the methods for cluster analysis?
- Statistical methods: k-means, k-modes
- Neural networks (with self-organizing map architecture)
- Fuzzy logic
- Genetic algorithms
- Divisive (top-down hierarchical clustering)
- Agglomerative (bottom-up hierarchical clustering)
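Of the methods listed, k-means is the easiest to sketch. The version below is a minimal illustration using only the standard library; the function name, the random initialization from the data points, and the fixed iteration cap are assumptions of this example, not details from the study material.

```python
import random


def k_means(points, k, iterations=20, seed=0):
    """Cluster tuples of numbers into k groups; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members))
            if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters
```

No class labels are passed in: the groups emerge from the input variables alone, which is what makes this an unsupervised method.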