The k-nearest neighbors (KNN) technique is a simple supervised machine learning approach that can be used for both classification and regression problems. It is easy to implement and understand, but it has a significant drawback: it becomes substantially slower as the size of the dataset grows. KNN works by calculating the distances between a query and all of the examples in the data, selecting the K examples closest to the query, and then voting for the most frequent label (in the case of classification) or averaging the labels (in the case of regression).
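The steps just described (compute all distances, keep the K closest, take a majority vote) can be sketched in a few lines of plain Python. The function and variable names here are illustrative, not from any library:

```python
import math
from collections import Counter

def knn_classify(query, examples, labels, k=3):
    """Classify `query` by majority vote among its k nearest examples.

    `examples` is a list of feature tuples and `labels` holds the
    matching class labels. Euclidean distance is used throughout.
    """
    # 1. Distance from the query to every stored example
    distances = [math.dist(query, x) for x in examples]
    # 2. Indices of the k closest examples
    nearest = sorted(range(len(examples)), key=lambda i: distances[i])[:k]
    # 3. Majority vote over their labels
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two small clusters of 2-D points
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_classify((1.5, 1.5), X, y, k=3))  # → A
```

For regression, step 3 would instead average the labels of the k nearest examples.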

The figure above illustrates the saying "Birds of a feather flock together." Observe how, in most cases, similar data points lie close to each other. The KNN algorithm rests on the assumption that this holds true often enough for the algorithm to be useful. KNN captures the idea of similarity (also called distance, proximity, or closeness) with mathematics many of us learned as children, such as calculating the distance between points on a graph. Before proceeding, it is important to understand how to compute the distance between points on a graph. If you are unfamiliar with this calculation or need a refresher, read "Distance Between 2 Points" in its entirety and then return.
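As a quick refresher, the straight-line (Euclidean) distance between two points follows from the Pythagorean theorem, and generalizes to any number of dimensions:

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between points p and q (any dimension)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# A classic 3-4-5 right triangle
print(euclidean_distance((0, 0), (3, 4)))  # → 5.0
```

Other distance measures (Manhattan, Hamming, and so on) can be swapped in, but Euclidean distance is the most common choice for KNN.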


  • from sklearn.neighbors import KNeighborsClassifier

Full Code Exercise of K-Neighbors Classifier

Scikit-learn is a very popular machine learning library for Python. In this kernel, we will use it to build a machine learning model with the k-nearest neighbors algorithm to predict whether the patients in the "Pima Indians Diabetes Dataset" have diabetes or not.

Download the diabetes.csv file here and see the full code below: