---
title: "Nonlinear Principal Components Analysis: Multivariate Analysis with Optimal Scaling (MVAOS)"
author: "D G Rossiter"
date: "`r Sys.Date()`"
output:
  html_document:
    code_folding: show
    fig_align: center
    fig_height: 4.5
    fig_width: 4.5
    number_sections: yes
    theme: spacelab
    toc: yes
    toc_float: yes
    toc_depth: 4
---
# Introduction
This note is for those familiar with principal components analysis (PCA) on a multivariate set of continuous variables. The terms *loadings*, *scores*, *eigenvalues*, *screeplot* etc. should be familiar.
The concept of PCA can be extended to nominal or ordinal categorical variables. That is, a large set of variables is transformed into another set, ordered by decreasing amount of variability explained. The original variables must first be *transformed* to a numerical scale for this to work. This is referred to as Multivariate Analysis with Optimal Scaling (MVAOS) (De Leeuw _et al._, 2016).
MVAOS techniques quantify factors, replacing categorical values (in R terms, levels of factors) with real numbers, so that PCA can then be applied. However, the transformations are computed along with the PCA, in order to maximize the variance explained by each component in turn, as explained in the references.
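To see why the joint computation matters, contrast it with a naive "quantify, then PCA" approach. The sketch below (with made-up variables `q1` and `q2`) simply replaces each factor level by its integer code and runs an ordinary PCA; this is NOT optimal scaling, which chooses the quantifications along with the PCA to maximize the variance explained:

```{r}
# Naive quantification: replace factor levels by their integer codes,
# then run standard linear PCA on the codes. MVAOS instead *optimizes*
# the quantifications jointly with the PCA.
df <- data.frame(q1 = factor(c("low", "mid", "high", "mid", "low")),
                 q2 = factor(c("no", "yes", "yes", "no", "yes")))
num <- sapply(df, as.integer)     # crude quantification: level codes
pc <- prcomp(num, scale. = TRUE)  # ordinary PCA on the quantified data
summary(pc)
```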
There are two kinds of categorical variables (R factors):
* *Ordinal* variables: the order is meaningful, but the intervals between categories are not necessarily equal. This is typical of the Likert scale [^1], in which respondents to a survey question indicate their level of agreement or disagreement with a statement on a symmetric scale. In R, `is.ordered(...)` returns `TRUE` for these.
[^1]: https://en.wikipedia.org/wiki/Likert_scale
* *Nominal* variables: the categories are not ordered. The nonlinear PCA procedure finds a transformation to best represent an ordering, based on the stepwise PCA; see the references for details. In R, `is.ordered(...)` returns `FALSE` for these. Logical variables, i.e., those for which `is.logical(...)` returns `TRUE`, are a special (binary) case of nominal variables.
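The two kinds can be illustrated with small made-up factors:

```{r}
# Ordinal: a Likert-type response, with an explicit, meaningful order
likert <- factor(c("disagree", "neutral", "agree"),
                 levels = c("disagree", "neutral", "agree"),
                 ordered = TRUE)
# Nominal: no order implied among the levels
colour <- factor(c("red", "blue", "red"))
is.ordered(likert)  # TRUE
is.ordered(colour)  # FALSE
```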
In this technical note we use the `Gifi` R package for MVAOS. There are several other packages with somewhat different approaches; see the References.
## The Gifi procedure
In the words of De Leeuw _et al._ (2016): "MVAOS is a *linear* multivariate technique in the sense that it makes linear combinations of transformed variables, and it is a *nonlinear* multivariate technique in the sense that these transformations are generally nonlinear."
The aim is to minimize a *loss function*:
$$
\sigma(H, A) = \sum_{\ell = 1}^L \mathrm{SSQ} (\sum_{j=1}^m H_j A_{j \ell})
$$
where $\mathrm{SSQ}$ is the unweighted sum-of-squares for each of the $L$ equation blocks, $H$ is the matrix of transformed variables, and $A$ is the _pattern_, containing the coefficients of the linear combinations.
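As a toy illustration, with made-up $H$ and $A$ (here using simple numeric vectors for the $H_j$, and ignoring the normalization constraints discussed in the references), the loss for a single candidate solution can be evaluated directly; the fitting algorithm alternates updates of $H$ and $A$ to minimize it:

```{r}
# Evaluate sigma(H, A) once for arbitrary, made-up H and A.
set.seed(42)
n <- 10; m <- 3; L <- 2
H <- replicate(m, rnorm(n), simplify = FALSE)  # transformed variables h_j
A <- matrix(rnorm(m * L), m, L)                # pattern: coefficients a_{jl}
ssq <- function(x) sum(x^2)                    # unweighted sum of squares
sigma <- sum(sapply(1:L, function(l)
    ssq(Reduce(`+`, Map(`*`, H, A[, l])))))    # sum over the L blocks
sigma
```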
# Package
The `Gifi` "Multivariate Analysis with Optimal Scaling" package follows the standard reference (Gifi, 1990); it re-implements the original Gifi FORTRAN programs in R.
```{r}
require("Gifi")
```
# Synthetic dataset
The sample dataset is the "ABC Customer Satisfaction" survey from Kenett & Salini (2012). All questions are ordinal, on a Likert scale.
Load the example data and display the variable names:
```{r}
data(ABC)
names(ABC)
str(ABC)
```
Notice that all are unordered R factors, i.e., nominal. They are in fact ordinal, since they come from Likert scales. We will see how to treat them both as ordinal and as nominal.
Make a subset with fewer variables, for demonstration:
```{r}
ABC6 <- ABC[,6:11]
```
## Ordinal MVAOS
We compute the MVAOS PCA using `Gifi::princals`. The default is two components, with all factors treated as *ordinal* regardless of the R data type, as is typical of a Likert scale.
### PCA
We perform the MVAOS PCA asking for two components.
```{r}
(fitord <- princals(ABC6)) ## default is ordinal=TRUE
summary(fitord)
```
Here we see the proportion of transformed variance explained by each PC, in this case about 49% for PC1 and 14% for PC2; about 37% remains unexplained.
### Diagnostic plots
#### Transformations
These show how the original variable is transformed for use in the PCA.
```{r fig.width=8, fig.height=8}
plot(fitord, plot.type = "transplot")
```
These are all step functions, because the default is `ordinal=TRUE`.
We can see the transformations for the first few objects:
```{r}
ABC6[1:6,]
fitord$transform[1:6,]
```
#### Component loadings
We can see all the loadings, including those $< 0.1$ not shown in the model summary:
```{r}
round(fitord$loadings,3)
```
And then plot them; these loading plots are similar to those we would see for continuous variables.
```{r}
plot(fitord, "loadplot", main = "Loadings Plot ABC Data") ## aspect ratio = 1
```
Here, PC1 is a "size" component, i.e., more or less satisfaction with all measures.
Since the arrows point the same way in the loadings plot, satisfactions on all items are positively correlated.
Here PC2 contrasts sales/pricing (positive PC2) with training/technical (negative PC2). That is, after accounting for overall satisfaction, these two aspects of satisfaction are opposed. Satisfaction with "Equipment" has no loading on PC2; it is not part of this contrast.
#### Biplot
The biplot shows the loading vectors along with the PC scores in the space of the first two PCs. This helps find unusual observations that do not fit the overall pattern of inter-variable relations.
```{r fig.height=7, fig.width=7}
plot(fitord, "biplot",
main = "Biplot ABC Data")
```
This shows where the individual cases fall in the PC space. For example, respondent 135 is much more satisfied with training/technical than with sales/pricing, and is overall quite dissatisfied (far negative score on PC1).
For example:
```{r}
ABC6[135,]
```
This respondent gave the minimum score to all variables except `training`, which received a neutral score.
```{r}
ABC6[51,]
```
This respondent was quite satisfied with `technical`, `training` and `purchase`, which go together in PC1/2 space, but not at all satisfied with `sales` and `pricing`, which also go together but are opposed to the other three in PC2 space.
#### Screeplot
This shows the relative amount of variance explained.
```{r fig.width=7, fig.height=4}
plot(fitord, "screeplot")
```
This shows that two components are probably sufficient.
### Cross-classification
We can check the correlation between two factors with cross-classification tables. Here we pick two that seem highly correlated in PC1/2 space.
```{r}
(t.actual <- table(ABC6$purchase, ABC6$technical))
t.purchase <- table(ABC6[,"purchase"])
t.technical <- table(ABC6[,"technical"])
(t.expected <- round(outer(t.purchase, t.technical/sum(t.purchase))))
```
Difference between actual and expected:
```{r}
print(t.actual - t.expected)
```
Very close agreement, although there is some discrepancy in class 5, "highly satisfied".
## Nominal MVAOS
We use this if we cannot assume a rank order of the categories.
### PCA
```{r}
(fitnom <- princals(ABC6, ordinal=FALSE)) ## nominal PCA
summary(fitnom)
summary(fitord)
```
The loadings, eigenvalues and proportion of variance accounted for (`VAF`) are not the same as for the ordinal case, although very close. Notice that the signs of the loadings for PC2 are reversed. The loss value is the same.
Notice that many more iterations were required to find a solution than were required for the ordinal case.
### Diagnostic plots
#### Transformations
```{r fig.width=8, fig.height=8}
plot(fitnom, plot.type = "transplot", stepvec=rep(FALSE,6))
```
These are no longer step functions, and they are not necessarily monotonic (see "equipment").
Look at the transformations of two of the variables:
"Technical": "Overall satisfaction level from technical support. (1 ... very low, 5 ... very high)". Here we treat it as nominal, although it is really ordinal. The transformation is now reversed, but the sequence remains the same.
"Equipment": Overall satisfaction level from the equipment. (1 ... very low, 5 ... very high). Somehow the transformation rates "low" worse than "very low".
#### Component loadings
We can see all the loadings, including those $< 0.1$ not shown in the model summary:
```{r}
round(fitnom$loadings,3)
```
And plot them:
```{r}
plot(fitnom, "loadplot", main = "Loadings Plot ABC Data") ## aspect ratio = 1
```
PC1 is a "size" component (overall satisfaction), but "technical" is opposite to this because of its transformation, which no longer preserves the order; PC2 contrasts sales/pricing with training/purchase.
#### Biplot
```{r fig.width=7, fig.height=7}
plot(fitnom, "biplot", main = "Biplot ABC Data")
```
Again we see the anomalies, such as observations 51 and 135.
#### Screeplot
```{r fig.width=7, fig.height=4}
plot(fitnom, "screeplot")
```
This is very similar to the ordinal case.
# A real dataset
This is a messier dataset: a random sample from a real study, of which only a few variables are reported here. We do not say where they came from, and sensitive information has been recoded so that it cannot be related to the original values. The variables are five demographic indicators (age group, education level, employment type, marital status, and self-reported ethnicity) plus car ownership status.
```{r}
load("NonlinearPCA_example.Rdata", verbose=TRUE)
dim(ds.test)
names(ds.test)
summary(ds.test)
```
There are `r dim(ds.test)[1]` survey responses and `r dim(ds.test)[2]` variables.
We have arranged `AgeGroup` and `Education` to have a logical order, but not the others, which are just nominal categories.
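As a sketch of how such an arrangement is done (the level labels `A1`–`A3` here are hypothetical, chosen only to echo the age-group coding), an R factor is given its logical level order like this:

```{r}
# Hypothetical age-group labels; the point is the explicit level order.
age <- factor(c("A2", "A1", "A3", "A1"),
              levels = c("A1", "A2", "A3"), ordered = TRUE)
is.ordered(age)  # TRUE: this column can then be treated as ordinal
min(age)         # comparisons are defined for ordered factors
```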
We use nonlinear PCA to see how much redundancy there is in these variables, and their inter-relation.
## PCA
With so many variables we choose to start with three components.
We pass a logical vector as the `ordinal` argument of `princals`, marking the two factors `AgeGroup` and `Education` as ordinal and the others as nominal.
```{r}
names(ds.test)
(fitreal <- princals(ds.test, ndim=3, ordinal=c(T,F,T,F,F,F)))
summary(fitreal)
```
About 70% of the variance of the transformed variables is explained. The loadings show which variables are represented by each component. For the two ordinal variables, we can interpret the loadings as "more" or "less", although the sign is arbitrary. For the nominal variables we cannot; we can only say that the variables (after transformation) go together in this PC space.
So in PC1 `AgeGroup`, `Education` and `Employment` go together, opposite to the transformed `Mari`.
PC2 is dominated by `Ethnicity`, which is absent from PC1; this has some correlation with `Education` in this component.
PC3 is dominated by `Car`, which is negatively related to `Employment` and `AgeGroup`.
## Diagnostic plots
Examine the details of the transformations and PCA.
### Transformations
```{r fig.width=8, fig.height=8}
plot(fitreal, plot.type = "transplot", stepvec=c(T,F,T,F,F,F))
```
Note the major jump in transformation of `AgeGroup` after `A1` (the youngest group).
Note also the large discrepancy between the transformed value of `Student` and the other categories of `Employment`.
### Screeplot
```{r fig.width=7, fig.height=4}
plot(fitreal, plot.type = "screeplot")
```
It is clear that more components would be needed to account for the majority of the variance. This shows that these variables are fairly independent -- which would be good if they were used as predictors in a multivariate regression.
However, with three PCs we have reached the "knickpoint" (elbow) of the screeplot.
### Component loadings
We can see all the loadings, including those $< 0.1$ not shown in the model summary:
```{r}
round(fitreal$loadings,3)
```
Then plot them:
```{r}
plot(fitreal, "loadplot", main = "Loadings Plot") ## aspect ratio = 1
plot(fitreal, "loadplot", main = "Loadings Plot", plot.dim=2:3) ## aspect ratio = 1
```
These show graphically the numeric values of the loadings from the model summary.
### Biplot
```{r fig.width=8, fig.height=8}
plot(fitreal, "biplot", main = "Biplot transit data")
plot(fitreal, "biplot", main = "Biplot transit data", plot.dim=c(2,3))
```
The numbers are individuals, too many to see all of them clearly.
Some are "unusual" in the PC space, for example:
```{r}
ds.test[92,]
```
A young (group `A1`) single student with the least education and no car.
### Transformation details
We can examine details of the transformation. For some interesting individuals:
The original values:
```{r}
cases <- c(92, 208, 37, 207, 205, 126)
fitreal$data[cases,]
```
Converted to numbers (R factor levels):
```{r}
fitreal$datanum[cases,]
```
The transformed values for these individuals:
```{r}
fitreal$transform[cases,]
```
The scores of each object in PC space:
```{r}
fitreal$objectscores[cases,]
```
These scores could be used as transformed variables in a regression analysis:
```{r}
ds.test$D1 <- fitreal$objectscores[,"D1"]
ds.test$D2 <- fitreal$objectscores[,"D2"]
ds.test$D3 <- fitreal$objectscores[,"D3"]
```
Note that by design these are uncorrelated:
```{r}
round(cor(ds.test$D1, ds.test$D2), 10)
round(cor(ds.test$D1, ds.test$D3), 10)
round(cor(ds.test$D2, ds.test$D3), 10)
```
## Cross-classification
To see which levels of the predictors are correlated in bivariate space, we use a cross-classification matrix.
For example: ethnicity vs. education level:
```{r}
# actual
(t.ee <- table(ds.test[,"Ethnicity"], ds.test[,"Education"]))
```
The expected values are computed from the marginal totals:
```{r}
# expected
t.eth <- table(ds.test[,"Ethnicity"])
t.edu <- table(ds.test[,"Education"])
(t.ee.expected <- round(outer(t.eth, t.edu)/sum(t.edu)))
```
Difference between actual and expected:
```{r}
(t.ee - t.ee.expected)
```
There is quite some difference: the actual values show a lack of independence. Specifically, `Eth1` has more respondents than expected with education status `E1`, whereas `Eth3` has fewer. So these variables are not independent. We saw in the loadings plots that they went together in PC2, but not the reason for it.
Test whether the expected is different from the actual by a $\chi^2$ test:
```{r}
# different?
chisq.test(t.ee)
```
As expected, the null hypothesis of independence is rejected, with a vanishingly small probability that this rejection is in error.
# References
Gifi, A. (1990). Nonlinear Multivariate Analysis. Wiley.
De Leeuw, J., Mair, P., & Groenen, P. (2016, September 26). Multivariate Analysis with Optimal Scaling. https://bookdown.org/jandeleeuw6/gifi/
Kenett, R. S., & Salini, S. (2012). Modern Analysis of Customer Surveys with Applications in R. New York: Wiley.
Linting, M., Meulman, J. J., Groenen, P. J. F., & van der Kooij, A. J. (2007). Nonlinear principal components analysis: Introduction and application. Psychological Methods, 12(3), 336–358. https://doi.org/10.1037/1082-989X.12.3.336
Some packages not used here:
Le, S., Josse, J., & Husson, F. (2008). `FactoMineR`: An R package for multivariate analysis. Journal of Statistical Software, 25(1), 1–18.
Mair, P., & de Leeuw, J. (2010). A general framework for multivariate analysis with optimal scaling: The R package `aspect`. Journal of Statistical Software, 32(9), 1–23.
de Leeuw, J., & Mair, P. (2009). Gifi Methods for Optimal Scaling in R: The Package `homals`. Journal of Statistical Software, 31(4), 1–21.