Predictive Analytics I: Data Mining Process, Methods, and Algorithms - Data Mining Process

3 important questions on Predictive Analytics I: Data Mining Process, Methods, and Algorithms - Data Mining Process

Why do you think the early phases (understanding of the business and understanding of the data) take the longest in data mining projects?

Understanding the business and the data is essential, and it must be thorough, because every later step builds on the earlier ones. A mistake in the early phases can put the whole study on an incorrect path.

What are the main data preprocessing steps? Briefly describe each step, and provide relevant examples.

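Data consolidation: collect, select, and integrate data from the relevant sources. e.g. combining customer records held in several databases into one data set
Data cleaning: impute missing values, reduce noise, and eliminate duplicates and inconsistencies. e.g. filling in a missing attribute value, or correcting errors and outliers
Data transformation: normalize values to a standard range, and discretize or categorize numeric attributes. e.g. converting an age in years into the bands young, middle, and senior
Data reduction: reduce the number of attributes or records to what the analysis needs. e.g. dropping attributes that are irrelevant to the question being studied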
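
The four steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records, field names, age bands, and helper function names are all made up for the example, and mean imputation and min-max normalization are just one possible choice for each step.

```python
from statistics import mean

# Hypothetical toy data; every field name and value is invented for illustration.
customers = [
    {"id": 1, "age": 25, "city": "Austin"},
    {"id": 2, "age": None, "city": "Boston"},  # missing attribute value
    {"id": 3, "age": 47, "city": "Austin"},
    {"id": 3, "age": 47, "city": "Austin"},    # duplicate record
]
purchases = {1: 120.0, 2: 80.0, 3: 300.0}      # id -> total spend


def consolidate(customers, purchases):
    """Data consolidation: integrate two sources into one record set."""
    return [{**c, "spend": purchases.get(c["id"])} for c in customers]


def clean(records):
    """Data cleaning: eliminate duplicates, then impute missing ages with the mean."""
    seen, unique = set(), []
    for r in records:
        key = tuple(r[k] for k in sorted(r))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    fill = mean(r["age"] for r in unique if r["age"] is not None)
    return [{**r, "age": fill if r["age"] is None else r["age"]} for r in unique]


def transform(records):
    """Data transformation: min-max normalize spend and discretize age into bands."""
    spends = [r["spend"] for r in records]
    lo, hi = min(spends), max(spends)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    out = []
    for r in records:
        band = "young" if r["age"] < 30 else "middle" if r["age"] < 50 else "senior"
        out.append({**r, "spend_norm": (r["spend"] - lo) / span, "age_band": band})
    return out


def reduce_attributes(records, keep=("id", "age_band", "spend_norm")):
    """Data reduction: keep only the attributes assumed to matter for modeling."""
    return [{k: r[k] for k in keep} for r in records]


prepared = reduce_attributes(transform(clean(consolidate(customers, purchases))))
for row in prepared:
    print(row)
```

Each function takes the output of the previous one, which mirrors the way the preprocessing steps are listed in order. In a real project the steps are usually revisited several times rather than run once.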

How does CRISP-DM differ from SEMMA?

CRISP-DM: Cross-Industry Standard Process for Data Mining (developed by an industry consortium).
SEMMA: Sample, Explore, Modify, Model, Assess (developed by SAS).

CRISP-DM takes a more comprehensive approach to data mining projects, including understanding of the business and of the data. SEMMA implicitly assumes that the project's goals and objectives, along with the appropriate data sources, have already been identified and understood.
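
The difference in scope can be made concrete by listing the phases of each process side by side. The sketch below assumes the usual six CRISP-DM phase names and uses a rough, hypothetical alignment between the two processes. The alignment is an assumption made for illustration, not something either methodology defines, and the overlap between SEMMA's early steps and CRISP-DM's data understanding phase is only partial.

```python
CRISP_DM = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
]
SEMMA = ["Sample", "Explore", "Modify", "Model", "Assess"]

# Assumed, approximate alignment (illustrative only).
ALIGNMENT = {
    "Business Understanding": [],
    "Data Understanding": ["Sample", "Explore"],
    "Data Preparation": ["Modify"],
    "Modeling": ["Model"],
    "Evaluation": ["Assess"],
    "Deployment": [],
}

# CRISP-DM phases with no SEMMA counterpart under this alignment.
uncovered = [phase for phase in CRISP_DM if not ALIGNMENT[phase]]
print(uncovered)
```

Under this alignment, the phases left without a SEMMA counterpart are the ones at the start and end of the project, which is consistent with the point that SEMMA takes the business goals as given.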
