Problem: A single-variable regression run with statsmodels produces a warning about possible multicollinearity.
Warning output:
                            OLS Regression Results
==============================================================================
Dep. Variable:                eci_mid   R-squared:                       0.197
Model:                            OLS   Adj. R-squared:                  0.195
Method:                 Least Squares   F-statistic:                     82.93
Date:                Wed, 23 Jun 2021   Prob (F-statistic):           7.68e-18
Time:                        15:27:28   Log-Likelihood:                -441.98
No. Observations:                 339   AIC:                             888.0
Df Residuals:                     337   BIC:                             895.6
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.3352      0.060     -5.592      0.000      -0.453      -0.217
in_degree   4.736e-05    5.2e-06      9.107      0.000    3.71e-05    5.76e-05
==============================================================================
Omnibus:                       32.398   Durbin-Watson:                   1.613
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               12.251
Skew:                           0.204   Prob(JB):                      0.00219
Kurtosis:                       2.163   Cond. No.                     1.42e+04
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.42e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
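Cause: The warning has nothing to do with collinearity (there is only one explanatory variable). It comes from the scaling of the explanatory variable: in_degree is on a much larger scale than the constant column, which inflates the condition number of the design matrix.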
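Fix: Center the explanatory variable (subtract its mean). If its spread is still large after centering, also rescale it, for example by dividing by its standard deviation. See the small example in the linked answer.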
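To see the scaling effect directly, the following minimal sketch compares the condition number of the design matrix [1, x] for a raw, a centered, and a standardized regressor. The data are synthetic (a stand-in for in_degree, not the original dataset), and the helper design_cond is a hypothetical name introduced here. It uses np.linalg.cond on the assumption that this matches the 2-norm condition number that statsmodels reports as "Cond. No.".

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for in_degree: one regressor with values in the thousands.
x = rng.uniform(0, 30000, size=339)

def design_cond(col):
    """Condition number of the design matrix [1, col] (hypothetical helper)."""
    X = np.column_stack([np.ones_like(col), col])
    return np.linalg.cond(X)

cond_raw = design_cond(x)
cond_centered = design_cond(x - x.mean())
# np.std uses the population formula (ddof=0) by default.
cond_standardized = design_cond((x - x.mean()) / x.std())

print(f"raw:          {cond_raw:.3g}")
print(f"centered:     {cond_centered:.3g}")
print(f"standardized: {cond_standardized:.3g}")
```

In this sketch centering lowers the condition number, but it stays roughly equal to the standard deviation of x. Only after rescaling do the two columns have the same norm, which brings the condition number down to about 1.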
Other ways to check for collinearity: You can detect high multicollinearity by inspecting the eigenvalues of the correlation matrix. A very low eigenvalue shows that the data are collinear, and the corresponding eigenvector shows which variables are collinear. If there is no collinearity in the data, you would expect that none of the eigenvalues are close to zero. (In regression analysis, the variance inflation factor (VIF) is also commonly used as a measure.)
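The eigenvalue check and the VIF can be sketched with NumPy alone. The regressors x1 to x4 below are made-up data in which x3 is nearly x1 + x2. The VIF line relies on the identity that the VIF of variable j equals the j-th diagonal entry of the inverse correlation matrix. This is an illustration under those assumptions, not the method used in the linked answer.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical regressors: x3 is almost a linear combination of x1 and x2.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.01, size=n)
x4 = rng.normal(size=n)  # unrelated to the others
X = np.column_stack([x1, x2, x3, x4])

corr = np.corrcoef(X, rowvar=False)

# eigh returns the eigenvalues of a symmetric matrix in ascending order.
eigvals, eigvecs = np.linalg.eigh(corr)
smallest = eigvals[0]
direction = eigvecs[:, 0]  # large entries mark the collinear variables

# VIF_j is the j-th diagonal entry of the inverse correlation matrix.
vif = np.diag(np.linalg.inv(corr))

print("eigenvalues:", np.round(eigvals, 5))
print("eigenvector of smallest eigenvalue:", np.round(direction, 3))
print("VIF:", np.round(vif, 1))
```

Here the smallest eigenvalue is close to zero, its eigenvector has sizeable weights on x1, x2 and x3 and almost none on x4, and the VIFs of the first three variables are large while that of x4 stays near 1. statsmodels also provides variance_inflation_factor in statsmodels.stats.outliers_influence for the same purpose.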
For a detailed explanation, see: https://stackoverflow.com/questions/25676145/capturing-high-multi-collinearity-in-statsmodels