Showing 5 changed files with 73 additions and 6 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
--- | ||
comments: true | ||
header-includes: | ||
- \usepackage[ruled,vlined,linesnumbered]{algorithm2e} | ||
--- | ||
# K-Nearest Neighbors | ||
|
||
## Problem and Algorithm Description | ||
|
||
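Given a dataset $\{(x_i,y_i)\}$ in which each label $y_i \in \{c_1, c_2, \ldots, c_K\}$ takes one of $K$ classes (the number of classes is independent of the number of neighbors used below), determine the class $y$ of a new data point $x$. | ||
The idea of the algorithm: | ||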
|
||
- Find the $k$ nearest neighbors of $x$, denoted $N_{k}(x)$ | ||
- Let these $k$ neighbors decide the class by majority vote: | ||
|
||
$$ | ||
y = \underset{c_j}{\operatorname{argmax}} \sum_{x_i \in N_{k}(x)} I(y_i = c_j) | ||
$$ | ||
|
||
Here $I(y_i = c_j)$ is the indicator function: it equals 1 when $y_i = c_j$ and 0 otherwise. | ||
|
||
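The voting rule above can be sketched by brute force. This is a minimal illustration rather than the note's own implementation: the function name `knn_predict`, the sample data, and the use of Euclidean distance are all assumptions made for the example.

```python
from collections import Counter
import math


def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest training points.

    train is a list of (point, label) pairs; Euclidean distance is assumed.
    """
    # Sort the training pairs by distance to x and keep the k closest: N_k(x).
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    # Count the labels in N_k(x); the most frequent label is the argmax over c_j.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three points of class "a", two of class "b".
train = [((1.0, 1.0), "a"), ((1.5, 2.0), "a"), ((2.0, 1.0), "a"),
         ((6.0, 6.0), "b"), ((7.0, 5.5), "b")]
print(knn_predict(train, (1.2, 1.4), k=3))  # a
```

Sorting the whole training set costs $O(n \log n)$ per query, which is what motivates a spatial index such as a kd-tree. Tie-breaking between equally frequent labels is left to `Counter` in this sketch.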
## The Three Elements of K-Nearest Neighbors | ||
|
||
Within the framework above, a k-nearest-neighbor model is determined by three elements: | ||
|
||
=== "Distance metric" | ||
The $L_p$ distance is the usual choice: | ||
|
||
$$ | ||
L_p(x_i,x_j) = (\sum_{l=1}^{n}|x_i^{(l)} - x_j^{(l)}|^p)^{\frac{1}{p}} | ||
$$ | ||
|
||
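When $p = 2$, this is the Euclidean distance. | ||
=== "Choice of K" | ||
When K is small, the model is more complex and prone to overfitting; | ||
when K is large, the model is simpler and prone to underfitting. | ||
In practice K is usually kept fairly small. | ||
=== "Classification decision rule" | ||
Majority voting is the usual rule; in this setting (with 0-1 loss), majority voting is equivalent to empirical risk minimization. | ||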
|
||
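As a small illustration of the $L_p$ distance formula, the sketch below evaluates it for two values of $p$. The function name `lp_distance` and the sample points are assumptions chosen for the example; $p = 1$ (often called the Manhattan distance) is included only for contrast with the Euclidean case.

```python
def lp_distance(a, b, p=2.0):
    """L_p distance between two equal-length coordinate sequences (p >= 1)."""
    # Sum |a_l - b_l|^p over all coordinates, then take the p-th root.
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)


print(lp_distance((0, 0), (3, 4), p=2))  # Euclidean distance: 5.0
print(lp_distance((0, 0), (3, 4), p=1))  # Manhattan distance: 7.0
```

Different values of $p$ can rank the same neighbors differently, so the distance metric is a genuine modelling choice rather than a detail.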
|
||
## Finding K Nearest Neighbors by Partitioning Space with a kd-Tree | ||
|
||
### Constructing a kd-Tree | ||
|
||
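- Input: $\{(x_i,y_i)\}$ | ||
- Start: take the first coordinate $x_i^{(1)}$ of every point and find its median $x_j^{(1)}$. Split the dataset at the median into two parts, the left subtree and the right subtree. The point $x_j$ that holds the median becomes the root node, and its left and right children have depth 1 | ||
- Repeat: keep splitting the left and right subtrees. A subtree at depth $j$ is split on coordinate $j+1$, that is, $x_i^{(j+1)}$, cycling back to the first coordinate once all $n$ coordinates have been used (coordinate $(j \bmod n) + 1$) | ||
- Stop: when neither subregion contains any more points | ||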
|
||
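The construction steps above can be sketched as a short recursive function. This is one possible reading, not the lecture's code: the `Node` class, the name `build_kdtree`, and the six sample points are assumptions for illustration, labels $y_i$ are omitted for brevity, and with an even number of points the upper of the two middle points is taken as the median.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point                    # the median point stored at this node
    axis: int                       # 0-based index of the splitting coordinate
    left: Optional["Node"] = None   # points with a smaller value on `axis`
    right: Optional["Node"] = None  # points with a larger or equal value


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Recursively split `points` at the median of one coordinate per level."""
    if not points:
        return None
    # Depth j splits on coordinate (j mod n) + 1, i.e. 0-based index j mod n.
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(
        point=pts[mid],
        axis=axis,
        left=build_kdtree(pts[:mid], depth + 1),
        right=build_kdtree(pts[mid + 1:], depth + 1),
    )


# Hypothetical sample points in the plane.
root = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(root.point, root.left.point, root.right.point)  # (7, 2) (5, 4) (9, 6)
```

Re-sorting at every level keeps the sketch short at the cost of $O(n \log^2 n)$ construction time; a production implementation would more likely use a linear-time median selection.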
??? example "Example from the lecture slides" | ||
![](images/KNN/2023-11-21-15-29-48.png#pic) | ||
|
||
### Searching the kd-Tree for K Nearest Neighbors | ||
|
||
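- Input: the kd-tree and a target point $x$ (note that $x$ is generally not a point in the dataset) | ||
- Start: from the root, descend the kd-tree recursively. At each node, compare the target's coordinate on that node's splitting dimension with the node's value to decide whether to go left or right, until a leaf is reached | ||
- Repeat: maintain the current best distance, initialized to the distance from $x$ to the leaf just reached. A subtree is visited only if it could contain a point closer than this distance | ||
- Backtrack upward: first try to update the best distance with the parent node, then check whether the sibling's region intersects the hypersphere centered at $x$ whose radius is the best distance. If it does, run the kd-tree search on the sibling subtree | ||
- Stop: when backtracking reaches the root. The best point found so far is the global nearest neighbor | ||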
|
||
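A single search returns one nearest neighbor, with an average time complexity of $O(\log n)$ (the worst case can degrade to $O(n)$). Finding K nearest neighbors takes $O(K \log n)$ on average. | ||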
|
||
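The descent-and-backtrack search can be sketched recursively, where returning from a recursive call plays the role of backtracking. This is a minimal single-nearest-neighbor sketch under stated assumptions: the helper names `build` and `nearest`, the nested-tuple tree layout, Euclidean distance, and the sample points are all chosen for illustration.

```python
import math


def build(points, depth=0):
    """Compact kd-tree builder returning nested tuples (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], axis,
            build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))


def nearest(node, target, best=None):
    """Return (distance, point) for the stored point closest to `target`."""
    if node is None:
        return best
    point, axis, left, right = node
    # Descend first into the side of the splitting plane that contains target.
    near, far = (left, right) if target[axis] < point[axis] else (right, left)
    best = nearest(near, target, best)
    # On the way back up, try to improve the best distance with this node.
    d = math.dist(point, target)
    if best is None or d < best[0]:
        best = (d, point)
    # Visit the sibling only if the hypersphere around target, with radius
    # equal to the best distance, crosses this node's splitting plane.
    if abs(target[axis] - point[axis]) < best[0]:
        best = nearest(far, target, best)
    return best


# Hypothetical sample points and query.
points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
dist, point = nearest(tree, (3.0, 4.5))
print(point)  # (2, 3)
```

To return K neighbors instead of one, a common variation keeps the K best candidates in a bounded max-heap and prunes against the K-th best distance; that extension is not shown here.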
??? example "Example from the slides" | ||
![](images/KNN/2023-11-21-15-39-15.png#pic) | ||
|