K-means clustering

12 important questions on K-means clustering

What is the semantic meaning of a distance in the context of clustering?

It expresses how similar two items are: the smaller the distance, the more similar the items
It is the main criterion for deciding which items belong together in a cluster
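A minimal sketch, assuming Euclidean distance (a common choice for K-means; the text does not fix a metric) and items already encoded as numeric tuples. The helper name `euclidean_distance` and the sample points are illustrative:

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical 2-D items: a and b lie close together, c is far away.
a, b, c = (1.0, 2.0), (1.5, 2.5), (8.0, 9.0)
print(euclidean_distance(a, b) < euclidean_distance(a, c))  # True: a is more similar to b
```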

What is a feature vector?

An n-dimensional vector that captures the relevant measurements of an item (its features), which are used to tell items apart in a clustering task
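A minimal sketch of building feature vectors, assuming items arrive as dictionaries. The fruit records and the attribute names `weight_g` and `diameter_cm` are hypothetical; the `name` field is deliberately left out because it does not help to tell items apart numerically:

```python
# Hypothetical raw records; the attribute names are made up for illustration.
fruits = [
    {"name": "apple", "weight_g": 150.0, "diameter_cm": 7.5},
    {"name": "cherry", "weight_g": 8.0, "diameter_cm": 2.0},
    {"name": "melon", "weight_g": 1200.0, "diameter_cm": 18.0},
]

# The features chosen to discriminate the items; here n = 2.
FEATURES = ("weight_g", "diameter_cm")

def to_feature_vector(record, features=FEATURES):
    """Turn a record into an n-dimensional tuple of its selected features."""
    return tuple(float(record[f]) for f in features)

vectors = [to_feature_vector(r) for r in fruits]
print(vectors)  # [(150.0, 7.5), (8.0, 2.0), (1200.0, 18.0)]
```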

What is a similarity function?

A function that computes the similarity between two items based on their feature vectors
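One possible similarity function, sketched under the assumption that similarity is derived from Euclidean distance via `1 / (1 + d)`; this mapping is an illustrative choice, not one the text prescribes. It turns a distance of 0 into a similarity of 1 and pushes large distances toward 0:

```python
import math

def similarity(u, v):
    """Map the Euclidean distance of two feature vectors to a score in (0, 1].

    1.0 means identical vectors; the score shrinks toward 0 as they move apart.
    """
    distance = math.dist(u, v)  # Euclidean distance (Python 3.8+)
    return 1.0 / (1.0 + distance)

print(similarity((1.0, 2.0), (1.0, 2.0)))  # 1.0
print(similarity((0.0, 0.0), (3.0, 4.0)))  # 1/6, since the distance is 5
```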

What is the target/goal a K-means clustering algorithm aims to achieve?

Minimize the total distance (in standard K-means, the sum of squared Euclidean distances) between the data points and their closest cluster centers
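A sketch of the quantity being minimized, assuming the standard K-means objective (sum of squared Euclidean distances to the closest center). The function name and sample data are hypothetical; the two candidate center sets show that a better clustering yields a lower total:

```python
import math

def kmeans_objective(points, centers):
    """Sum of squared Euclidean distances from each point to its closest center.

    This within-cluster sum of squares is the quantity K-means tries to minimize.
    """
    total = 0.0
    for p in points:
        total += min(math.dist(p, c) ** 2 for c in centers)
    return total

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = [(0.0, 1.0), (10.0, 1.0)]  # one center per natural group
bad = [(5.0, 0.0), (5.0, 2.0)]    # centers split the groups the wrong way
print(kmeans_objective(points, good))  # 4.0
print(kmeans_objective(points, bad))   # 100.0
```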

Why use heuristic functions when computing a K-Means clustering?

Finding the exact optimum is computationally very expensive (NP-hard in general), and a heuristic reduces the computation cost
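To illustrate why an exact search is expensive: checking every possible clustering means enumerating all partitions of n points into k non-empty groups, a count given by the Stirling numbers of the second kind, which grows roughly like k^n / k!. This sketch only counts the candidates; it is not a clustering algorithm, and `num_partitions` is a hypothetical helper:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def num_partitions(n, k):
    """Number of ways to split n points into exactly k non-empty clusters
    (Stirling number of the second kind)."""
    if n == k:
        return 1  # includes n == k == 0
    if n == 0 or k == 0:
        return 0
    # The n-th point either opens a new cluster or joins one of the k others.
    return num_partitions(n - 1, k - 1) + k * num_partitions(n - 1, k)

print(num_partitions(10, 3))   # 9330
print(num_partitions(100, 3))  # astronomically large
```

Even for only 100 points and 3 clusters, exhaustive evaluation is out of reach, which is why iterative heuristics are used instead.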

What is the trade-off for applying a heuristic function when computing a clustering with a K-Means algorithm?

Solution quality versus computation cost: a heuristic is cheaper, but it may settle on a local optimum instead of the global one
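A sketch of how the trade-off shows up in practice, assuming Lloyd's algorithm as the heuristic (the text does not name one) and restarting it from several random initializations. Each extra restart costs another full run but can only keep or improve the best objective found so far; `lloyd` and `best_of_restarts` are hypothetical names:

```python
import math
import random

def lloyd(points, k, rng, max_iter=100):
    """One heuristic K-means run (Lloyd-style): random data points as initial
    centers, then alternate assignment and update until labels stop changing."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    # Objective: sum of squared distances to the nearest final center.
    cost = sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)
    return cost, centers

def best_of_restarts(points, k, n_restarts, seed=0):
    """More restarts cost more computation but can only keep or lower the best cost."""
    rng = random.Random(seed)
    runs = [lloyd(points, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[0])

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
        (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]
print(best_of_restarts(data, k=3, n_restarts=1)[0])
print(best_of_restarts(data, k=3, n_restarts=10)[0])
```

Because both calls use the same seed, the single-restart run is also the first of the ten, so the second printed cost can never exceed the first.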

What are the three steps in the K-Means clustering algorithm?

Initialization
Assignment
Update
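The three steps can be sketched as a small, self-contained implementation. It assumes Euclidean distance, random data points as initial centers, and stopping once the labels no longer change; all function names are illustrative:

```python
import math
import random

def initialize(points, k, rng):
    """Initialization: pick k of the data points at random as starting centers."""
    return [tuple(p) for p in rng.sample(points, k)]

def assign(points, centers):
    """Assignment: label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

def update(points, labels, centers):
    """Update: move each center to the mean of the points labelled with it."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
        else:
            new_centers.append(old)  # empty cluster: leave its center in place
    return new_centers

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centers = initialize(points, k, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centers)
        if new_labels == labels:  # converged: no label changed, so centers stay put
            break
        labels = new_labels
        centers = update(points, labels, centers)
    return labels, centers

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
labels, centers = kmeans(data, k=2)
print(labels)
print(centers)
```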

What happens in the assignment-step in the K-Means algorithm?

Each data point is assigned the label of the cluster center nearest to it
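A minimal sketch of the assignment step, assuming Euclidean distance. `assign_labels` is a hypothetical name; ties are resolved toward the lower center index because `min()` returns the first minimal element:

```python
import math

def assign_labels(points, centers):
    """Return, for every point, the index of the nearest center (Euclidean)."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

centers = [(0.0, 0.0), (10.0, 0.0)]
points = [(1.0, 1.0), (9.0, -1.0), (4.0, 0.0), (6.0, 3.0)]
print(assign_labels(points, centers))  # [0, 1, 0, 1]
```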

What happens in the update-step in the K-Means algorithm?

Each cluster center is moved to the centroid (the mean, or center of mass) of the data points currently assigned to it
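A minimal sketch of the update step in its standard form, where each center moves directly to the centroid of its assigned points. `update_centers` is a hypothetical name, and raising an error for an empty cluster is an assumption made for brevity; implementations differ in how they handle that case:

```python
def update_centers(points, labels, k):
    """Move each center to the centroid (component-wise mean) of its points."""
    centers = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            raise ValueError(f"cluster {j} has no points")  # handling this is a design choice
        centers.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
    return centers

points = [(1.0, 1.0), (3.0, 1.0), (2.0, 4.0), (9.0, 0.0), (11.0, 2.0)]
labels = [0, 0, 0, 1, 1]
print(update_centers(points, labels, k=2))  # [(2.0, 2.0), (10.0, 1.0)]
```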

When do we achieve convergence in the iterative process of the K-Means algorithm?

Once no data point changes its label anymore
And, as a result, the cluster centers no longer move either
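A sketch of the convergence test, with a hypothetical `has_converged` helper. The `tol` parameter is an assumption added for illustration; with exact centroid updates, unchanged labels already imply unchanged centers, so the default tolerance of 0 suffices here:

```python
import math

def has_converged(old_labels, new_labels, old_centers, new_centers, tol=0.0):
    """True when no label changed and no center moved by more than tol."""
    if old_labels != new_labels:
        return False
    return all(math.dist(a, b) <= tol for a, b in zip(old_centers, new_centers))

same = [(0.0, 0.0), (5.0, 5.0)]
print(has_converged([0, 1, 1], [0, 1, 1], same, same))  # True
print(has_converged([0, 1, 1], [0, 0, 1], same, same))  # False: a label changed
```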

Name three approaches to initializing K-Means cluster centers.

Randomly in space (within range bounds)
Random data points
Randomly label points and derive initial cluster centers from that
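The three initialization strategies can be sketched as follows, assuming numeric tuples and a seeded `random.Random` instance. The function names are hypothetical, and falling back to a random data point when a randomly labelled group ends up empty is an assumption, not part of the listed approach:

```python
import random

def init_random_in_bounds(points, k, rng):
    """Random positions inside the bounding box spanned by the data."""
    lows = [min(c) for c in zip(*points)]
    highs = [max(c) for c in zip(*points)]
    return [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in range(k)]

def init_random_points(points, k, rng):
    """k randomly chosen data points."""
    return rng.sample(points, k)

def init_random_labels(points, k, rng):
    """Random label per point, then the centroid of each labelled group."""
    labels = [rng.randrange(k) for _ in points]
    centers = []
    for j in range(k):
        # Assumed fallback: an empty group borrows a random data point.
        members = [p for p, lab in zip(points, labels) if lab == j] or [rng.choice(points)]
        centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return centers

rng = random.Random(42)
data = [(0.0, 0.0), (1.0, 3.0), (4.0, 1.0), (5.0, 5.0)]
for init in (init_random_in_bounds, init_random_points, init_random_labels):
    print(init.__name__, init(data, 2, rng))
```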

Neglecting the possibility of local minima, what are two guaranteed shortcomings of a clustering result with the K-Means algorithm?

Data patterns, i.e. the structure formed by the data points, cannot be sensed
Data points are strictly categorized with a single label, even though they may lie between clusters
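A small illustration of the single-label shortcoming, assuming Euclidean distance and ties broken toward the lower index: a point exactly halfway between two centers still receives exactly one label, and nothing in the result records its ambiguous position:

```python
import math

centers = [(0.0, 0.0), (4.0, 0.0)]
midpoint = (2.0, 0.0)  # exactly as far from one center as from the other

distances = [math.dist(midpoint, c) for c in centers]
label = min(range(len(centers)), key=lambda j: distances[j])

print(distances)  # [2.0, 2.0]
print(label)      # 0: a single hard label; the tie is broken silently
```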

The questions on this page originate from a summary of study material.
