Adaptively Boosted Sparse Subspace Clustering via Orthogonal Matching Pursuit
Practical applications often require processing high-dimensional data. Such data typically have low-dimensional structure and can be approximated by a union of low-dimensional subspaces. The task of clustering high-dimensional data points according to the low-dimensional subspaces they belong to is called subspace clustering. Under fairly broad conditions, the self-representation matrix computed by sparse subspace clustering has the subspace-preserving property, which ensures that the self-expression of a data point does not use data points from other subspaces. However, if the self-representation coefficient matrix is too sparse, there is no guarantee that all points from the same subspace form a single connected component in the resulting affinity graph; this causes over-segmentation and lowers clustering accuracy. Ensemble learning is introduced into sparse subspace clustering, and a new algorithm is proposed, called Adaptively Boosted Sparse Subspace Clustering via Orthogonal Matching Pursuit (AdaBoost SSCOMP). AdaBoost SSCOMP changes the weights of the data points in the dictionary across iterations, resamples repeatedly, and ensembles the results into a denser self-representation coefficient matrix. It is shown that this strategy lets more data points from the same subspace take part in the representation, favors a single connected component per subspace, improves connectivity, and significantly improves accuracy while keeping the subspace-preserving error acceptable. Extensive experiments on synthetic data and on the real datasets Extended Yale B and MNIST demonstrate the effectiveness of the method.

Sparse subspace clustering algorithms have attracted much attention because of their broad theoretical guarantees and excellent empirical performance. However, the self-expression coefficients they compute are often too sparse to ensure that all data points from the same subspace form a connected component in the induced affinity graph. This causes serious over-segmentation in the subsequent spectral clustering and a significant drop in clustering accuracy. This paper introduces the idea of ensemble learning into sparse subspace clustering in order to increase the number of non-zero self-expression coefficients. Specifically, the sparse self-expression coefficients are solved multiple times with the Orthogonal Matching Pursuit algorithm, and each solve automatically suppresses the data points that are already associated with non-zero self-expression coefficients; the method is therefore called Adaptively Boosted Sparse Subspace Clustering via Orthogonal Matching Pursuit (AdaBoost SSCOMP). AdaBoost SSCOMP computes the coefficients repeatedly under this resampling strategy and then integrates them, which yields more non-zero self-expression coefficients and an affinity graph with improved connectivity. Extensive experiments on synthetic data and on the real-world datasets Extended Yale B and MNIST demonstrate the effectiveness of the proposed method.
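The repeated-solve-and-suppress idea in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact weighting or resampling rule, so everything specific here is an assumption made for illustration: the names `omp` and `boosted_sscomp`, the per-point weight matrix, the `decay` factor (0 means hard suppression), and averaging the absolute coefficients as the ensemble step. It should not be read as the authors' implementation.

```python
import numpy as np

def omp(X, j, k, weights, tol=1e-6):
    """Weighted OMP: express column j of X with at most k other columns.
    `weights` rescales the selection scores (an assumed suppression rule)."""
    x = X[:, j]
    residual = x.copy()
    support, coef = [], np.zeros(0)
    for _ in range(k):
        scores = np.abs(X.T @ residual) * weights
        scores[j] = 0.0                  # never use the point itself
        if support:
            scores[support] = 0.0        # never select an atom twice
        i = int(np.argmax(scores))
        if scores[i] <= tol:             # nothing useful left to select
            break
        support.append(i)
        # Least-squares fit on the current support, then update the residual.
        coef, *_ = np.linalg.lstsq(X[:, support], x, rcond=None)
        residual = x - X[:, support] @ coef
        if np.linalg.norm(residual) <= tol:
            break
    c = np.zeros(X.shape[1])
    if support:
        c[support] = coef
    return c

def boosted_sscomp(X, k=3, rounds=3, decay=0.0):
    """Run OMP `rounds` times per point, down-weighting atoms already used,
    and return a symmetric affinity built from the averaged coefficients."""
    n = X.shape[1]
    weights = np.ones((n, n))            # weights[j] = atom weights for point j
    C = np.zeros((n, n))
    for _ in range(rounds):
        for j in range(n):
            c = omp(X, j, k, weights[j])
            C[:, j] += np.abs(c)
            weights[j, c != 0] *= decay  # suppress atoms with non-zero coefficients
    C /= rounds
    return C + C.T

# Toy data: two orthogonal 3-dimensional subspaces of R^6, 10 unit-norm points each.
rng = np.random.default_rng(0)
X = np.zeros((6, 20))
X[:3, :10] = rng.standard_normal((3, 10))
X[3:, 10:] = rng.standard_normal((3, 10))
X /= np.linalg.norm(X, axis=0)

A_single = boosted_sscomp(X, k=2, rounds=1)   # plain single-pass OMP baseline
A_boosted = boosted_sscomp(X, k=2, rounds=3)  # three suppressed passes, ensembled
print(np.count_nonzero(A_single), np.count_nonzero(A_boosted))
```

In this toy setting the boosted affinity has more non-zero entries than the single-pass one while still containing no edges between the two subspaces. The resulting matrix would then be passed to spectral clustering, which is omitted here.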
黄致远、李春光、范晓翰
Computing technology; computer technology
Sparse Subspace Clustering; Self-expression Model; Ensemble Learning
黄致远, 李春光, 范晓翰. Adaptively Boosted Sparse Subspace Clustering via Orthogonal Matching Pursuit [EB/OL]. (2023-05-19)[2025-08-18]. http://www.paper.edu.cn/releasepaper/content/202305-177.