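[Keywords: dimensionality reduction, PCA]
PCA (Principal Component Analysis): locating the most important structure in the data:
Purpose: keep the few components that best represent the trend in the data.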
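Method:
Find the directions that best capture the variance of the data, and ignore the rest.
- Draw the fitted lines for the two dimensions, L1 and L2.
- Analyze the variance of the data along L1 and L2 (this is simple to compute).
- If a fitted line contributes little to the variance, that dimension can be dropped.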
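The two-dimensional steps above can be sketched as follows. This is only an illustration under assumptions: the sample points are made up, L1 is taken to be the direction of largest variance through the mean, and L2 is the line perpendicular to it. The function name `variance_along_lines` is hypothetical.

```python
import math

def variance_along_lines(points):
    """Return the variance of 2-D points along L1 (main direction) and L2 (perpendicular)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix (population form, divide by n).
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Angle of the direction with the largest variance (L1).
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    # Project the centered points onto L1 and onto L2, then take the variance.
    var_l1 = sum(((x - mx) * c + (y - my) * s) ** 2 for x, y in points) / n
    var_l2 = sum((-(x - mx) * s + (y - my) * c) ** 2 for x, y in points) / n
    return var_l1, var_l2

# Made-up points lying close to a straight line.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
var_l1, var_l2 = variance_along_lines(points)
share_l1 = var_l1 / (var_l1 + var_l2)  # fraction of total variance along L1
print(var_l1, var_l2, share_l1)
```

When `share_l1` is close to 1, as it is for points that nearly fall on a line, L2 carries almost no variance and that dimension is a candidate for dropping.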
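Generalization (PCA reduction in general): a method that ranks new variables by how much of the data's variation they carry, and uses that ranking to reduce dimension.
X1, X2, X3, …, Xp: the original p variables
Z1, Z2, Z3, …, Zp: weighted combinations (linear combinations) of the original variables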
All pairs of Z variables have 0 correlation
Order the Z's by variance (Z1 largest, Zp smallest)
Usually the first few Z variables contain most of the information, and so the rest can be dropped.
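A minimal sketch of the general case, assuming NumPy is available and using synthetic data in which the third variable is almost a combination of the first two. The weights for the Z variables are taken from the eigenvectors of the covariance matrix, which is one standard way to obtain them. The function name `pca` and the data are illustrative only.

```python
import numpy as np

def pca(X):
    """Return (Z, weights, variances), with components sorted by decreasing variance."""
    Xc = X - X.mean(axis=0)                 # center each original variable
    cov = np.cov(Xc, rowvar=False)          # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # reorder so Z1 has the largest variance
    variances = eigvals[order]
    weights = eigvecs[:, order]             # column k holds the weights that define Z(k+1)
    Z = Xc @ weights                        # each Z is a weighted combination of the X's
    return Z, weights, variances

# Synthetic data: X3 is nearly X1 + X2, so only two directions carry real information.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 2))
X = np.column_stack([base[:, 0], base[:, 1],
                     base[:, 0] + base[:, 1] + 0.05 * rng.normal(size=500)])

Z, weights, variances = pca(X)
corr = np.corrcoef(Z, rowvar=False)         # off-diagonal entries should be about 0
explained = variances / variances.sum()     # share of total variance per Z
Z_reduced = Z[:, :2]                        # keep the first two Z's, drop the rest
print(explained)
```

How many Z variables to keep is a judgment call. A common rule of thumb is to keep enough of them to reach a chosen share of the total variance.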
Normalizing data
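Because PCA ranks directions by variance, a variable measured in large units can dominate simply because of its scale. One common form of normalization, assumed here since the notes do not say which one is meant, is to rescale every variable to mean 0 and standard deviation 1. The column names and values below are made up.

```python
import statistics

def standardize(columns):
    """Rescale each column to mean 0 and standard deviation 1 (z-scores)."""
    result = []
    for col in columns:
        mean = statistics.fmean(col)
        sd = statistics.pstdev(col)  # population standard deviation
        result.append([(v - mean) / sd for v in col])
    return result

# Made-up columns on very different scales.
height_m = [1.60, 1.75, 1.82, 1.68]
income = [30000.0, 52000.0, 61000.0, 45000.0]
z_height, z_income = standardize([height_m, income])
print(z_height)
print(z_income)
```

After this step both columns contribute on the same scale, so neither dominates the variance just because of its units.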
Regression-Based Dimension Reduction
- Multiple Linear Regression or Logistic Regression
- Use subset selection
- Algorithm chooses a subset of variables
- This procedure is integrated directly into the predictive task
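As a rough illustration of subset selection inside a regression task, the sketch below uses greedy forward selection with ordinary least squares. Forward selection is only one of several subset-selection strategies and is an assumption here, since the notes do not name a specific algorithm. The function name `forward_select` and the synthetic data are hypothetical.

```python
import numpy as np

def forward_select(X, y, k):
    """Greedily pick k columns of X, each time adding the one that most lowers the residual sum of squares."""
    n, p = X.shape
    chosen = []
    for _ in range(k):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in chosen:
                continue
            cols = chosen + [j]
            A = np.column_stack([np.ones(n), X[:, cols]])  # intercept plus candidate columns
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares fit
            rss = float(np.sum((y - A @ coef) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        chosen.append(best_j)
    return chosen

# Synthetic data: only columns 1 and 3 actually drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 1] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)

selected = forward_select(X, y, 2)
print(sorted(selected))
```

Unlike PCA, this keeps a subset of the original variables instead of building new combined ones, and the choice is driven by how well the variables predict the response.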