Predictive Analytics I: Data Mining Process, Methods, and Algorithms - Data Mining Process
3 important questions on Predictive Analytics I: Data Mining Process, Methods, and Algorithms - Data Mining Process
Why do you think the early phases (understanding of the business and understanding of the data) take the longest in data mining projects?
What are the main data preprocessing steps? Briefly describe each step, and provide relevant examples.
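Data Cleaning: impute missing values, reduce noise, and eliminate duplicates. This addresses data that lacks attribute values, contains errors or outliers, or is inconsistent.
Data Transformation: map values onto a standard scale or a smaller set of standard values, e.g. normalize numeric attributes, or discretize/categorize a continuous attribute into ranges.
Data Reduction: reduce the volume of data, e.g. by dropping irrelevant attributes or sampling records.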
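The preprocessing steps listed above can be illustrated with a minimal sketch in plain Python. The records, the attribute names (age, income), the age cut-offs, and the helper names clean, transform, and reduce_attributes are all hypothetical and chosen only for illustration. Mean imputation, min-max scaling, and attribute selection are one possible choice per step, not the only one.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing attribute value.
records = [
    {"id": 1, "age": 25, "income": 30000},
    {"id": 2, "age": None, "income": 52000},
    {"id": 2, "age": None, "income": 52000},  # exact duplicate of the row above
    {"id": 3, "age": 47, "income": 88000},
    {"id": 4, "age": 63, "income": 41000},
]


def clean(rows):
    """Data cleaning: drop exact duplicates, then impute missing ages with the mean."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))
    fill = mean(r["age"] for r in unique if r["age"] is not None)
    for r in unique:
        if r["age"] is None:
            r["age"] = fill
    return unique


def transform(rows):
    """Data transformation: min-max scale income to [0, 1] and discretize age."""
    lo = min(r["income"] for r in rows)
    hi = max(r["income"] for r in rows)
    span = (hi - lo) or 1  # avoid division by zero if all incomes are equal
    out = []
    for r in rows:
        new = dict(r)
        new["income_scaled"] = (r["income"] - lo) / span
        # Assumed cut-offs (30 and 60), purely for illustration.
        if r["age"] < 30:
            new["age_group"] = "young"
        elif r["age"] < 60:
            new["age_group"] = "middle"
        else:
            new["age_group"] = "senior"
        out.append(new)
    return out


def reduce_attributes(rows, keep=("age_group", "income_scaled")):
    """Data reduction: keep only the attributes needed for modeling."""
    return [{k: r[k] for k in keep} for r in rows]


prepared = reduce_attributes(transform(clean(records)))
for row in prepared:
    print(row)
```

The three functions are chained in the same order as the steps above, so each one receives the output of the previous step.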
How does CRISP-DM differ from SEMMA?
SEMMA (developed by SAS): Sample, Explore, Modify, Model, Assess.
CRISP-DM takes a more comprehensive approach to data mining projects, including explicit phases for understanding the business and the data. SEMMA implicitly assumes that the project's goals and objectives, along with the appropriate data sources, have already been identified and understood.