K-Nearest Neighbor
The K-Nearest Neighbors (KNN) algorithm is a powerful supervised machine learning algorithm typically used for classification, though it can also perform regression. The central idea is that data points with similar attributes tend to fall into similar categories.
K-Nearest Neighbors in scikit-learn:
from sklearn.neighbors import KNeighborsClassifier
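A minimal usage sketch of that classifier; the training points, labels, and choice of k = 3 below are made up purely for illustration:

from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: each row is one point, each label its category.
training_points = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
training_labels = [0, 0, 1, 1]

classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(training_points, training_labels)

# The 3 nearest neighbors vote on the class of the new point.
print(classifier.predict([[0.15, 0.15]]))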
Three steps of the K-Nearest Neighbor Algorithm:
- Normalize the data
- Find the k nearest neighbors
- Classify the new point based on those neighbors
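A from-scratch sketch of those three steps follows; the helper names, the min-max normalization choice, and the use of Euclidean distance are assumptions made for illustration:

from collections import Counter

def normalize(dataset):
    # Step 1: min-max normalize each dimension into the range [0, 1].
    mins = [min(col) for col in zip(*dataset)]
    maxs = [max(col) for col in zip(*dataset)]
    return [
        [(value - lo) / (hi - lo) if hi != lo else 0.0
         for value, lo, hi in zip(point, mins, maxs)]
        for point in dataset
    ]

def euclidean_distance(pt1, pt2):
    return sum((a - b) ** 2 for a, b in zip(pt1, pt2)) ** 0.5

def classify(unknown, dataset, labels, k):
    # Step 2: find the k nearest neighbors of the unknown point.
    distances = [(euclidean_distance(point, unknown), label)
                 for point, label in zip(dataset, labels)]
    distances.sort(key=lambda pair: pair[0])
    neighbors = distances[:k]
    # Step 3: classify by majority vote among those neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]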
Euclidean Distance
To find the Euclidean distance between two points, we first calculate the squared difference along each dimension. If we add up all of these squared differences and take the square root, we’ve computed the Euclidean distance.
[Figure: Euclidean distance being calculated between two points]
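A small sketch of that calculation; the example points are made up for illustration:

def euclidean_distance(pt1, pt2):
    # Sum the squared difference along each dimension, then take the square root.
    squared_differences = sum((a - b) ** 2 for a, b in zip(pt1, pt2))
    return squared_differences ** 0.5

print(euclidean_distance([1, 2], [4, 0]))        # two dimensions
print(euclidean_distance([5, 4, 3], [1, 7, 9]))  # works for any number of dimensions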
Manhattan Distance
Manhattan Distance is extremely similar to Euclidean distance. Rather than summing the
squared difference between each dimension, we instead sum the absolute
value of the difference between each dimension.
The equation is shown below:
distance = |a1 - b1| + |a2 - b2| + ... + |an - bn|
Note that Manhattan distance will
always be greater than or equal to Euclidean distance.
[Figure: Manhattan Distance visualized]
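A matching sketch for Manhattan distance, reusing the same made-up points:

def manhattan_distance(pt1, pt2):
    # Sum the absolute difference along each dimension.
    return sum(abs(a - b) for a, b in zip(pt1, pt2))

print(manhattan_distance([1, 2], [4, 0]))  # |1 - 4| + |2 - 0| = 5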
Hamming Distance
Hamming Distance
is another slightly different variation on the distance formula.
Instead of finding the difference of each dimension, Hamming distance
only cares about whether the dimensions are exactly equal. When finding
the Hamming distance between two points, add one for every dimension
that has different values.
Hamming distance is used in spell
checking algorithms. For example, the Hamming distance between the word
“there” and the typo “thete” is one. Each letter is a dimension, and
each dimension has the same value except for one.
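The spell-checking example above, sketched in code:

def hamming_distance(pt1, pt2):
    # Add one for every dimension whose values differ.
    return sum(1 for a, b in zip(pt1, pt2) if a != b)

print(hamming_distance("there", "thete"))  # 1 -- only the fourth letter differs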
SciPy Distances
Python’s SciPy library provides each of these distances in scipy.spatial.distance:
- Euclidean Distance: .euclidean()
- Manhattan Distance: .cityblock()
- Hamming Distance: .hamming()
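A quick usage sketch, reusing the earlier made-up points; note that SciPy's .hamming() returns the proportion of differing dimensions rather than the raw count:

from scipy.spatial import distance

print(distance.euclidean([1, 2], [4, 0]))              # sqrt(3**2 + 2**2), about 3.61
print(distance.cityblock([1, 2], [4, 0]))              # |1 - 4| + |2 - 0| = 5
print(distance.hamming(list("there"), list("thete")))  # 1 differing letter out of 5 = 0.2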