Our goal was to write a practical guide to multivariate analysis, visualization and interpretation, focusing on principal component methods.
This book was built with:
- R 3.3.2
- factoextra 1.0.5
- FactoMineR 1.36
- ggpubr 0.1.5
- dplyr 0.7.2
- bookdown 0.4.3
There are a number of R packages implementing principal component methods. These packages include: FactoMineR, ade4, stats, ca, MASS and ExPosition.
4 Principal Component Analysis
4.1 Introduction
PCA reduces the dimensionality of multivariate data to two or three principal components, which can be visualized graphically, with minimal loss of information.
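As a rough illustration of this idea, the sketch below runs a PCA with the base R function prcomp() (from the stats package) and checks how much of the total variance the first two components retain. The built-in USArrests data set and the object names pca_fit and var_explained are assumptions chosen for the example; they do not come from the text.

```r
# Illustrative sketch: the USArrests data and the object names are assumptions
data(USArrests)

# PCA on standardized variables (each column centered and scaled)
pca_fit <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Proportion of the total variance carried by each principal component
var_explained <- pca_fit$sdev^2 / sum(pca_fit$sdev^2)
round(var_explained, 3)

# Cumulative share of variance retained by the first two components
sum(var_explained[1:2])

# Coordinates of the observations on the first two components,
# i.e. the values that a two-dimensional plot would display
head(pca_fit$x[, 1:2])
```

If the cumulative share is high, a two-dimensional plot of these coordinates loses little information.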
4.2 Basics
Taken together, the main purpose of principal component analysis is to:
- identify hidden patterns in a data set,
- reduce the dimensionality of the data by removing the noise and redundancy in the data,
- identify correlated variables.
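The points above can be sketched in base R. The example below is a minimal illustration only: the built-in mtcars data, the four selected columns, and the object names are all assumptions made here. It first inspects pairwise correlations, then looks at how the variables load on the first principal component.

```r
# Illustrative sketch on the built-in mtcars data (an assumed example)
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Pairwise correlations: large absolute values flag redundant variables
cor_mat <- cor(vars)
round(cor_mat, 2)

# PCA on the standardized variables
pca_fit <- prcomp(vars, scale. = TRUE)

# Loadings on the first component: strongly correlated variables
# tend to load heavily on the same component
round(pca_fit$rotation[, 1], 2)
```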
4.3 Computation
4.3.1 R packages
Here, we'll use the two packages FactoMineR (for the analysis) and factoextra (for ggplot2-based
visualization).
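# Install FactoMineR from CRAN
install.packages("FactoMineR")
# Install factoextra from GitHub (requires the devtools package)
devtools::install_github("kassambara/factoextra")
# Load both packages
library(FactoMineR)
library(factoextra)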
4.3.3 Data standardization
When scaling variables, the data can be transformed as follows:
$$\frac{x_i - mean(x)}{sd(x)}$$
Where mean(x) is the mean of x values, and sd(x) is the standard deviation (SD).
The R base function scale() can be used to standardize the data. It takes a numeric matrix as an input and performs the scaling on the columns.
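A minimal sketch of this step is shown below. The small matrix m is made up for the example, and the hand-written version is included only to check that scale() applies the formula (x_i - mean(x)) / sd(x) to each column.

```r
# Illustrative sketch: the small matrix m is made up for this example
m <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 40),
            ncol = 2,
            dimnames = list(NULL, c("a", "b")))

# Standardize each column with the base function
scaled <- scale(m, center = TRUE, scale = TRUE)

# The same transformation written out by hand: (x_i - mean(x)) / sd(x)
manual <- apply(m, 2, function(x) (x - mean(x)) / sd(x))

# Compare both results (attributes added by scale() are ignored)
all.equal(scaled, manual, check.attributes = FALSE)

# After scaling, each column has mean 0 and standard deviation 1
colMeans(scaled)
apply(scaled, 2, sd)
```

Note that scale() stores the column means and standard deviations it used as attributes of the result ("scaled:center" and "scaled:scale"), which is why the comparison ignores attributes.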
4.3.4 R code
4.4 Visualization and Interpretation
4.4.1 Eigenvalues / Variances
4.4.2 Graph of variables
4.4.2.1 Results
4.4.2.2 Correlation circle
4.4.2.3 Quality of representation
4.4.2.4 Contributions of variables to PCs
4.4.2.5 Color by a custom continuous variable
4.4.2.6 Color by groups
He is the author of many popular R packages for:
- multivariate data analysis (factoextra),
- survival analysis (survminer),
- correlation analysis (ggcorrplot),
- creating publication-ready plots in R (ggpubr).
Recently, he published three books on data analysis and visualization:
- Practical Guide to Cluster Analysis in R (https://goo.gl/DmJ5y5)
- Guide to Create Beautiful Graphics in R (https://goo.gl/vJ0OYb)
- Complete Guide to 3D Plots in R (https://goo.gl/v5gwl0)