K-Nearest Neighbor
The K-Nearest Neighbors (KNN) algorithm is a powerful supervised machine learning algorithm typically used for classification, though it can also perform regression. The central idea is that data points with similar attributes tend to fall into similar categories.
K-Nearest Neighbors in scikit-learn:
from sklearn.neighbors import KNeighborsClassifier
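A minimal usage sketch of that classifier; the training points, labels, and choice of k = 3 below are made up purely for illustration:

from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: each row is one point, each label its category.
training_points = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
training_labels = [0, 0, 1, 1]

classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(training_points, training_labels)

# The 3 nearest neighbors vote on the class of the new point.
print(classifier.predict([[0.15, 0.15]]))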
Three steps of the K-Nearest Neighbor Algorithm:
- Normalize the data
- Find the k nearest neighbors
- Classify the new point based on those neighbors
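A from-scratch sketch of those three steps follows; the helper names, the min-max normalization choice, and the use of Euclidean distance are assumptions made for illustration:

from collections import Counter

def normalize(dataset):
    # Step 1: min-max normalize each dimension into the range [0, 1].
    mins = [min(col) for col in zip(*dataset)]
    maxs = [max(col) for col in zip(*dataset)]
    return [
        [(value - lo) / (hi - lo) if hi != lo else 0.0
         for value, lo, hi in zip(point, mins, maxs)]
        for point in dataset
    ]

def euclidean_distance(pt1, pt2):
    return sum((a - b) ** 2 for a, b in zip(pt1, pt2)) ** 0.5

def classify(unknown, dataset, labels, k):
    # Step 2: find the k nearest neighbors of the unknown point.
    distances = [(euclidean_distance(point, unknown), label)
                 for point, label in zip(dataset, labels)]
    distances.sort(key=lambda pair: pair[0])
    neighbors = distances[:k]
    # Step 3: classify by majority vote among those neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]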
Euclidean Distance
To find the Euclidean distance between two points, we first calculate the squared difference along each dimension. If we add up all of these squared differences and take the square root, we’ve computed the Euclidean distance.
[Figure: Euclidean distance being calculated between two points]
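A small sketch of that calculation; the example points are made up for illustration:

def euclidean_distance(pt1, pt2):
    # Sum the squared difference along each dimension, then take the square root.
    squared_differences = sum((a - b) ** 2 for a, b in zip(pt1, pt2))
    return squared_differences ** 0.5

print(euclidean_distance([1, 2], [4, 0]))        # two dimensions
print(euclidean_distance([5, 4, 3], [1, 7, 9]))  # works for any number of dimensions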
Manhattan Distance
Manhattan Distance is extremely similar to Euclidean distance. Rather than summing the
squared difference between each dimension, we instead sum the absolute
value of the difference between each dimension.
The equation is shown below:
distance = |a1 - b1| + |a2 - b2| + ... + |an - bn|
Note that Manhattan distance will
always be greater than or equal to Euclidean distance.
[Figure: Manhattan Distance visualized]
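A matching sketch for Manhattan distance, reusing the same made-up points:

def manhattan_distance(pt1, pt2):
    # Sum the absolute difference along each dimension.
    return sum(abs(a - b) for a, b in zip(pt1, pt2))

print(manhattan_distance([1, 2], [4, 0]))  # |1 - 4| + |2 - 0| = 5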
Hamming Distance
Hamming Distance
is another slightly different variation on the distance formula.
Instead of finding the difference of each dimension, Hamming distance
only cares about whether the dimensions are exactly equal. When finding
the Hamming distance between two points, add one for every dimension
that has different values.
Hamming distance is used in spell
checking algorithms. For example, the Hamming distance between the word
“there” and the typo “thete” is one. Each letter is a dimension, and
each dimension has the same value except for one.
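The spell-checking example above, sketched in code:

def hamming_distance(pt1, pt2):
    # Add one for every dimension whose values differ.
    return sum(1 for a, b in zip(pt1, pt2) if a != b)

print(hamming_distance("there", "thete"))  # 1 -- only the fourth letter differs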
SciPy Distances
Python’s SciPy library provides each of these distances in scipy.spatial.distance:
- Euclidean Distance: .euclidean()
- Manhattan Distance: .cityblock()
- Hamming Distance: .hamming()
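A quick usage sketch, reusing the earlier made-up points; note that SciPy's .hamming() returns the proportion of differing dimensions rather than the raw count:

from scipy.spatial import distance

print(distance.euclidean([1, 2], [4, 0]))              # sqrt(3**2 + 2**2), about 3.61
print(distance.cityblock([1, 2], [4, 0]))              # |1 - 4| + |2 - 0| = 5
print(distance.hamming(list("there"), list("thete")))  # 1 differing letter out of 5 = 0.2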