k-Nearest Neighbors Algorithm

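Principle: compute the distance between the current (unlabeled) point and every other (labeled) point, sort the distances in ascending order, and take the k points with the smallest distances. These k points vote with their class labels, and the class with the most votes is assigned to the current point.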
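To make the voting step concrete, here is a minimal sketch on made-up toy data, separate from the iris code in this post. The function name `knn_vote` and the five 2-D points are hypothetical and chosen only for illustration. It uses only the standard library (`math.dist` needs Python 3.8 or later).

```python
import math
from collections import Counter

def knn_vote(query, points, labels, k):
    """Return the majority label among the k points nearest to query."""
    # Distance from the query to every labeled point, paired with that point's label.
    pairs = [(math.dist(query, p), label) for p, label in zip(points, labels)]
    # Sort ascending by distance and keep the k closest.
    nearest = sorted(pairs, key=lambda pair: pair[0])[:k]
    # Count the votes per label and pick the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two loose clusters in 2-D.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.0)]
labels = ['A', 'A', 'A', 'B', 'B']

print(knn_vote((1.5, 1.5), points, labels, k=3))  # A (all three neighbors are A)
print(knn_vote((6.5, 6.5), points, labels, k=3))  # B (two B neighbors outvote one A)
```

In the second query the third-nearest point belongs to class A, but it is outvoted 2 to 1, which is exactly the majority rule described above.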
Code implementation:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn import neighbors
from sklearn.metrics import accuracy_score

def get_iris():
    iris_data = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(iris_data.data, iris_data.target, test_size=0.4, random_state=0)
    return X_train, X_test, y_train, y_test

def knn_classify(self_point, dataset, labels, k):
    # Euclidean distance from self_point to every training sample
    distance = [np.sqrt(np.sum((self_point - d) ** 2)) for d in dataset]
    # Pair each distance with its label, sort ascending, keep the k nearest
    train_data = sorted(zip(distance, labels), key=lambda x: x[0])[:k]
    # Count the votes for each class among the k nearest neighbors
    votes = {}
    for _, label in train_data:
        votes[label] = votes.get(label, 0) + 1
    # Return the class with the most votes (ties go to the class seen first)
    return max(votes, key=votes.get)


X_train, X_test, y_train, y_test = get_iris()
size = len(y_test)
count = 0
for t in range(size):
    y_pre = knn_classify(X_test[t], X_train, y_train, 5)
    if y_pre == y_test[t]:
        count += 1
print('custom accuracy:', count / size)

# Use sklearn's built-in KNN
knn = neighbors.KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
pre = knn.predict(X_test)
print('sklearn accuracy:', accuracy_score(y_test, pre))

Comparison of results:
custom accuracy: 0.95
sklearn accuracy: 0.95
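
The custom version loops over the test points one at a time. As a rough sketch of an alternative, the same distance-and-vote idea can be applied to all query points at once with NumPy broadcasting. The name `knn_predict_batch` is hypothetical and not part of this post or of sklearn. Note one assumption that differs from the loop version: a tied vote goes to the smallest label here, so predictions could differ when a tie occurs.

```python
import numpy as np

def knn_predict_batch(X_query, X_train, y_train, k):
    """Predict a label for every row of X_query by majority vote of its k nearest training rows."""
    X_query = np.asarray(X_query, dtype=float)
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # Pairwise differences via broadcasting: shape (n_query, n_train, n_features).
    diff = X_query[:, None, :] - X_train[None, :, :]
    # Squared Euclidean distances; the square root is skipped because it does not change the ordering.
    dist_sq = np.sum(diff ** 2, axis=2)
    # Indices of the k nearest training rows for each query row.
    nearest = np.argsort(dist_sq, axis=1, kind='stable')[:, :k]
    preds = []
    for row in y_train[nearest]:
        values, counts = np.unique(row, return_counts=True)
        # argmax returns the first maximum, so a tied vote goes to the smallest label.
        preds.append(values[np.argmax(counts)])
    return np.array(preds)

# Hypothetical toy data: two loose clusters in 2-D.
toy_X = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [7, 6]])
toy_y = np.array([0, 0, 0, 1, 1])
toy_queries = np.array([[1.5, 1.5], [6.5, 6.5]])

print(knn_predict_batch(toy_queries, toy_X, toy_y, k=3))  # [0 1]
```

With the iris split from this post, it would be called as `knn_predict_batch(X_test, X_train, y_train, 5)`, and the result could be passed to `accuracy_score(y_test, ...)` in the same way as the sklearn predictions. The broadcasted array holds n_query × n_train × n_features values, so this approach trades memory for speed and suits small datasets like iris.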
