Summary: Knowledge Management And Business Intelligence


Read the summary and the most important questions on Knowledge Management and Business Intelligence

  • 1 Data Science Foundations

  • 1.1 Preprocessing


  • What are the steps in preprocessing?

    Identify data sources
    Select data
    Clean data
    Transform data
  • What types of sample bias are there?

    Sample selection bias: consider the selection mechanism
    Seasonality effects: consider the handling of time
  • How to treat missing values?

    1. Remove
    Eliminate the affected rows or columns. This may delete useful information.

    2. Replace
    - acquire the true values: contact the source, purchase the data
    - imputation techniques: replace by the mean, or by a predicted value

    3. Keep
    - add an indicator variable called "missing", or introduce a dummy category

    4. Weight-of-evidence
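
    Options 2 and 3 above can be combined. The following is a minimal sketch, assuming missing values are encoded as None and the variable is numeric; the function name and the example data are illustrative only.

```python
from statistics import mean

def impute_with_indicator(values):
    """Replace None by the mean of the observed values and flag each replacement."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)  # assumption: at least one value is observed
    imputed = [fill if v is None else v for v in values]
    missing_flag = [1 if v is None else 0 for v in values]
    return imputed, missing_flag

# Hypothetical example data
ages = [25, None, 35, 40, None]
imputed_ages, age_missing = impute_with_indicator(ages)
```

    Keeping the flag next to the imputed column lets a model still use the fact that a value was missing.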
  • How to detect and treat outliers?

    Z = (x - mean) / st.dev
    If |Z| > 3, the observation could be an outlier.

    Reduce the impact by capping values at z = 3, or by replacing them with the 99th percentile.

    Multivariate outliers: only visible when multiple dimensions are considered simultaneously. Often they are simply ignored.
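
    The univariate rule can be sketched as follows. This assumes the population standard deviation and a non-constant variable; the function names and the data are illustrative.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Z = (x - mean) / st.dev for every value."""
    mu, sigma = mean(values), pstdev(values)  # assumption: sigma > 0
    return [(v - mu) / sigma for v in values]

def cap_at_z(values, z_max=3.0):
    """Pull values beyond |z| = z_max back to the boundary."""
    mu, sigma = mean(values), pstdev(values)
    lower, upper = mu - z_max * sigma, mu + z_max * sigma
    return [min(max(v, lower), upper) for v in values]

# Hypothetical example data: 19 ordinary values and one extreme value
amounts = [10] * 19 + [100]
flags = [abs(z) > 3 for z in z_scores(amounts)]  # only the last value is flagged
capped = cap_at_z(amounts)                       # the last value is pulled back to z = 3
```

    Note that the extreme value itself inflates the mean and the standard deviation, so in small samples a single outlier may never reach |Z| > 3.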
  • What is feature engineering?

    Enriching the data set in order to increase predictive performance.

    For instance, time-flattening: removing the time dimension by defining features that summarize the performance period.
    Or transforming unstructured data into structured data.
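
    A minimal sketch of time-flattening, assuming the raw data is a list of (customer, month, amount) transactions. The chosen summary features are examples, not a prescribed set.

```python
from collections import defaultdict

def time_flatten(transactions):
    """Collapse (customer_id, month, amount) rows into one feature row per customer."""
    per_customer = defaultdict(list)
    for customer_id, month, amount in transactions:
        per_customer[customer_id].append((month, amount))

    features = {}
    for customer_id, rows in per_customer.items():
        rows.sort()  # chronological order, so the last row is the most recent
        amounts = [amount for _, amount in rows]
        features[customer_id] = {
            "n_transactions": len(amounts),
            "total_amount": sum(amounts),
            "mean_amount": sum(amounts) / len(amounts),
            "last_amount": amounts[-1],
        }
    return features

# Hypothetical example data
transactions = [("a", 1, 100.0), ("a", 2, 50.0), ("b", 1, 20.0), ("a", 3, 150.0)]
flat = time_flatten(transactions)
```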
  • What is variable transformation?

    Normalization: rescale variables to a fixed range, typically [0,1]
    Standardisation: rescale data to have a mean of zero and a st.dev of one
    Transformation: towards a normal distribution

    Advanced transformations: Box-Cox, Yeo-Johnson, Principal Component Analysis
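
    The first two rescalings can be sketched in a few lines. This assumes a non-constant numeric variable (otherwise the divisions fail) and the population standard deviation; the names and data are illustrative.

```python
from statistics import mean, pstdev

def min_max_normalize(values):
    """Rescale to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale to mean zero and st.dev one: (x - mean) / st.dev."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical example data
incomes = [1000, 2000, 3000, 4000, 5000]
normalized = min_max_normalize(incomes)
standardized = standardize(incomes)
```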
  • How to handle coarse classification?

    Pivoting tables and regrouping categories in order to create more distinction between groups. Candidate groupings are compared via the Chi-squared test: the bigger its value, the better.
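
    As an illustration, the Chi-squared statistic can be computed for two candidate groupings of the same four categories. The counts are made up, and the comparison assumes both groupings have the same number of groups.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a table with one [goods, bads] row per group."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical categories A..D with (goods, bads): A (90, 10), B (85, 15), C (50, 50), D (45, 55)
grouping_ab_cd = [[175, 25], [95, 105]]  # {A, B} versus {C, D}
grouping_ac_bd = [[140, 60], [130, 70]]  # {A, C} versus {B, D}

# The grouping with the larger statistic separates goods from bads better
best = max([grouping_ab_cd, grouping_ac_bd], key=chi_squared)
```

    Here merging A with B and C with D scores far higher, because it keeps categories with similar good/bad ratios together.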
  • Why change a continuous variable to categorical?

    Interpretability: some prefer age segments

    Allows non-linear relations to be incorporated within a linear model, and thus improves performance.

    Sometimes for anonymization, or different applications.
  • How does weight-of-evidence work?

    WOE = ln(Distr. Good / Distr. Bad) * 100
    Why take the ln of the "relative odds" and not the absolute odds? This way WOE is independent of the class distribution and permits easy interpretation.

    Information Value: IV = Sum((distr.good.cat - distr.bad.cat) * woe.cat)

    Category boundaries can be chosen so as to maximise the predictive power in terms of IV.

    # of categories is a trade-off: fewer is simpler, more keeps predictive power.

    Binning: the question is whether it is done with or without interaction.
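
    A minimal sketch of the WOE and IV calculation for one categorical variable. It assumes every category contains at least one good and one bad (otherwise the logarithm is undefined). As a further assumption, IV is computed from the unscaled log ratio; if the WOE scaled by 100 is used instead, IV is simply 100 times larger.

```python
from math import log

def woe_and_iv(counts, scale=100):
    """counts maps category -> (n_good, n_bad); returns ({category: WOE}, IV)."""
    total_good = sum(good for good, _ in counts.values())
    total_bad = sum(bad for _, bad in counts.values())
    woe = {}
    iv = 0.0
    for category, (good, bad) in counts.items():
        distr_good = good / total_good  # share of all goods in this category
        distr_bad = bad / total_bad     # share of all bads in this category
        log_ratio = log(distr_good / distr_bad)
        woe[category] = log_ratio * scale
        iv += (distr_good - distr_bad) * log_ratio  # assumption: unscaled WOE in IV
    return woe, iv

# Hypothetical example data: (goods, bads) per age segment
counts = {"young": (200, 100), "middle": (300, 60), "old": (500, 40)}
woe, iv = woe_and_iv(counts)
```

    A negative WOE marks a category with relatively more bads than goods, a positive WOE the opposite, and a WOE near zero a category that matches the overall mix.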
  • What are the pros and cons of WOE?

    All-in-one solution:
    - categorical to continuous
    - continuous to categorical
    - missing values
    - outliers
    - assessment of predictive strength
    - nonlinear relations in a linear, interpretable model

    Drawbacks: some loss of predictive power?
