BIBA - Naïve Bayes - Choosing K

7 important questions on BIBA - Naïve Bayes - Choosing K

What happens when you choose a value for k that is too low, too high, or equal to the number of records in the training dataset?

k is too low:

The model may fit the noise in the dataset

k is too high:

The model misses out on the method's ability to capture the local structure in the dataset, one of its main advantages

k equals the number of records in the training dataset:

All records are assigned to the majority class in the training data
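The three cases above can be illustrated with a minimal sketch. The helper knn_predict and the toy records are assumptions made up for this example and are not part of the course material; one "B" record is deliberately placed inside the "A" cluster to act as noise.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training records."""
    # Rank training records by Euclidean distance to the query.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: four "A" records, two "B" records,
# and one noisy "B" record sitting inside the "A" cluster.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.0),
           (5.0, 5.0), (5.2, 4.9), (1.05, 0.95)]
train_y = ["A", "A", "A", "A", "B", "B", "B"]

near_noise = (1.04, 0.96)   # lies in the "A" cluster, right next to the noisy record
in_b_cluster = (5.1, 5.0)   # lies in the "B" cluster

low_k = knn_predict(train_X, train_y, near_noise, k=1)        # follows the noisy record
mid_k = knn_predict(train_X, train_y, near_noise, k=3)        # noise is outvoted
all_k = knn_predict(train_X, train_y, in_b_cluster, k=len(train_X))  # majority class only

print(low_k, mid_k, all_k)
```

With k=1 the query copies the label of the single noisy neighbor, with k=3 the two genuine "A" neighbors outvote it, and with k equal to the number of training records every query receives the majority class "A", even one in the middle of the "B" cluster.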

How do you balance the choice of k, what happens to the optimum k when the structure of the data is complex and irregular, and what are typical values of k?

The balanced choice of k depends on the nature of the data
The more complex and irregular the structure of the data, the lower the optimum value of k

Typically:

Values of k fall in the range 1 to 20
Use odd numbers to avoid ties
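A short sketch of why odd values help, assuming a two-class problem (with three or more classes an odd k can still produce a tie). The helper has_tie and the neighbor labels are hypothetical and only serve the illustration.

```python
from collections import Counter

def has_tie(neighbor_labels):
    """Return True when the two most frequent classes get the same number of votes."""
    counts = Counter(neighbor_labels).most_common()
    return len(counts) > 1 and counts[0][1] == counts[1][1]

# Hypothetical labels of the five nearest neighbors, ordered nearest first.
neighbors_sorted = ["Yes", "No", "No", "Yes", "Yes"]

# Check the vote for several values of k.
tie_by_k = {k: has_tie(neighbors_sorted[:k]) for k in (2, 3, 4, 5)}
print(tie_by_k)
```

Here the even values k=2 and k=4 split the vote evenly, while k=3 and k=5 always leave one class ahead.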

How do you make a validation dataset?

Validation dataset:

Take a subset of the training dataset
Use it for selecting the model

Predict the class for the records in the validation dataset
Use different values of k, e.g., 3, 4, 5, etc.
Choose the k that minimizes the validation error
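The selection procedure above can be sketched as follows. This is a minimal illustration, not the course's implementation: the helpers knn_predict, validation_error and choose_k, the toy records, and the candidate values of k are all assumptions made for the example.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training records."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def validation_error(train_X, train_y, val_X, val_y, k):
    """Fraction of validation records that receive an incorrect class."""
    mistakes = sum(knn_predict(train_X, train_y, x, k) != y
                   for x, y in zip(val_X, val_y))
    return mistakes / len(val_y)

def choose_k(train_X, train_y, val_X, val_y, candidates):
    """Return the candidate k with the lowest validation error, plus all errors."""
    errors = {k: validation_error(train_X, train_y, val_X, val_y, k)
              for k in candidates}
    best_k = min(errors, key=errors.get)  # the first candidate wins on equal error
    return best_k, errors

# Hypothetical records; the last training record is a noisy "B" in the "A" cluster.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.0),
           (5.0, 5.0), (5.2, 4.9), (1.05, 0.95)]
train_y = ["A", "A", "A", "A", "B", "B", "B"]
# Validation records held back from training.
val_X = [(1.03, 0.97), (5.1, 5.0), (0.95, 1.02)]
val_y = ["A", "B", "A"]

best_k, errors = choose_k(train_X, train_y, val_X, val_y, candidates=(1, 3, 7))
print(best_k, errors)
```

On this toy data k=1 is misled by the noisy record and k=7 assigns everything to the majority class, so k=3 has the lowest validation error.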

How do you calculate the error rate of a validation dataset?

Error rate:

Percentage of mistakes, i.e., records that were assigned an incorrect class
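As a small worked example, the error rate can be computed by comparing predicted and actual classes record by record. The function name error_rate and the label lists are hypothetical.

```python
def error_rate(y_true, y_pred):
    """Percentage of records whose predicted class differs from the actual class."""
    mistakes = sum(actual != predicted for actual, predicted in zip(y_true, y_pred))
    return 100 * mistakes / len(y_true)

# Hypothetical actual and predicted classes for five validation records.
y_true = ["Yes", "No", "Yes", "Yes", "No"]
y_pred = ["Yes", "Yes", "Yes", "No", "No"]

rate = error_rate(y_true, y_pred)
print(rate)  # 2 mistakes out of 5 records -> 40.0
```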

How can you extend the algorithm to predict continuous values instead of categorical values?

The first step remains unchanged, i.e., determining the neighbors by computing distances
The second step, i.e., determining the class through majority voting, must be modified
Instead, determine the prediction by taking the average outcome value of the k nearest neighbors
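A minimal sketch of this modification, assuming Euclidean distance and an unweighted average. The function knn_regress and the one-predictor toy data are made up for the illustration.

```python
import math

def knn_regress(train_X, train_y, query, k):
    """Predict a continuous value as the average outcome of the k nearest records."""
    # Step 1 (unchanged): find the neighbors by computing distances.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    nearest = order[:k]
    # Step 2 (modified): average the outcomes instead of taking a majority vote.
    return sum(train_y[i] for i in nearest) / k

# Hypothetical records with a single predictor and a numeric outcome.
train_X = [(1.0,), (2.0,), (3.0,), (10.0,)]
train_y = [10.0, 20.0, 30.0, 100.0]

prediction = knn_regress(train_X, train_y, (2.2,), k=3)
print(prediction)  # neighbors at 2.0, 3.0 and 1.0 -> (20 + 30 + 10) / 3 = 20.0
```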

What are advantages of K Nearest Neighbors?

Simplicity of the method
Lack of parametric assumptions
Performs surprisingly well, especially when:

There is a large enough training set present
Each class is characterized by multiple combinations of predictor values

How do you code a classification in Python?

Load data:

import pandas as pd

# load data from the source
DS = pd.read_csv(r'C:\......\File.csv')

Then, split columns into dependent and independent variables:

predictors = ['Outlook', 'Temperature', 'Humidity', 'Wind']
X = pd.get_dummies(DS[predictors])  # one-hot encode the categorical predictors
y = DS['PlayTennis'].values  # dependent variable, taken from the same DataFrame

Use the train_test_split method to separate the records into training and testing sets:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=109)

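Train the model:

from sklearn.neighbors import KNeighborsClassifier

# assumption: the notes never define `model`; a k-nearest-neighbors classifier with k=3 is used here for illustration
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)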

Predict values for dependent variable:

y_pred = model.predict(X_test)

The questions on this page originate from the summary of the following study material:

Business Intelligence & Business Analytics
