Discriminant Analysis (DA) predicts the probability of belonging to a given class based on a set of original variables.
Let’s focus here on the linear algebra behind the method.
The goal of LDA is to find directions (basis vectors) in the data space that maximally separate the categories of data. The objective function is defined as: \[ \begin{equation} \tag{1} \lambda=\frac{||\mathbb{X_B}\vec{w}||^2}{||\mathbb{X_W}\vec{w}||^2} \end{equation} \] In other words: we want to find a set of feature weights \(\vec{w}\) that maximizes the ratio of the between-class variance (carried by \(\mathbb{X_B}\)) to the within-class variance (carried by \(\mathbb{X_W}\)).
Given \(||\mathbb{X_B}\vec{w}||^2=\vec{w}^T\mathbb{X_B}^T\mathbb{X_B}\vec{w}\) and \(\mathbb{X_B}^T\mathbb{X_B}=\mathbb{C_B}\) (and likewise \(\mathbb{X_W}^T\mathbb{X_W}=\mathbb{C_W}\)), equation (1) can be written algebraically and generally, for all components at once: \[ \begin{equation} \tag{2} \Lambda=\frac{\mathbb{W}^T\mathbb{X_B}^T\mathbb{X_B}\mathbb{W}}{\mathbb{W}^T\mathbb{X_W}^T\mathbb{X_W}\mathbb{W}}=\frac{\mathbb{W}^T\mathbb{C_B}\mathbb{W}}{\mathbb{W}^T\mathbb{C_W}\mathbb{W}} \;\Rightarrow\; \Lambda=\mathbb{W}^{-1}\mathbb{C_W}^{-1}\mathbb{C_B}\mathbb{W} \;\Rightarrow\; \mathbb{C_B}\mathbb{W}=\mathbb{C_W}\mathbb{W}\Lambda \end{equation} \] That is, the solution to LDA comes from a generalized eigendecomposition, i.e., the simultaneous diagonalization of two matrices, applied to the two covariance matrices. The eigenvectors (the columns of \(\mathbb{W}\)) are the weights, and the generalized eigenvalues (the diagonal of \(\Lambda\)) are the variance ratios of the components.
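One way to see why the eigenvalue equation in (2) maximizes the ratio in (1): for a single weight vector \(\vec{w}\), set the gradient of the generalized Rayleigh quotient to zero. \[ \frac{\partial}{\partial\vec{w}}\,\frac{\vec{w}^T\mathbb{C_B}\vec{w}}{\vec{w}^T\mathbb{C_W}\vec{w}}=0 \;\Rightarrow\; (\vec{w}^T\mathbb{C_W}\vec{w})\,\mathbb{C_B}\vec{w}=(\vec{w}^T\mathbb{C_B}\vec{w})\,\mathbb{C_W}\vec{w} \;\Rightarrow\; \mathbb{C_B}\vec{w}=\lambda\,\mathbb{C_W}\vec{w},\quad \lambda=\frac{\vec{w}^T\mathbb{C_B}\vec{w}}{\vec{w}^T\mathbb{C_W}\vec{w}} \] The stationary points are exactly the generalized eigenvectors, and the largest eigenvalue attains the maximal ratio.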
Although LDA is widely used as a data reduction technique, it has well-known limitations: it assumes roughly Gaussian classes with a common covariance structure, it can produce at most \(c-1\) discriminant axes for \(c\) classes, and the within-class covariance matrix becomes singular (non-invertible) when there are more features than observations.
To project the original data matrix onto a lower-dimensional space, three steps need to be performed:
1) compute the between-class variance;
2) compute the within-class variance;
3) construct the lower-dimensional space that maximizes the between-class variance relative to the within-class variance, via the generalized eigendecomposition described above.
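The code chunks below assume a few packages are attached; here is a minimal setup, with the package choices inferred from the functions used later (the pipe and rename() from dplyr, createDataPartition() and preProcess() from caret, lda() from MASS, ggplot() from ggplot2):
# packages assumed by the examples in this section
library(dplyr)    # %>% pipe and rename()
library(caret)    # createDataPartition(), preProcess()
library(MASS)     # lda()
library(ggplot2)  # ggplot() for the final plot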
# example data classes
w1 <- matrix(c(1.00, 2.00, 3.00, 4.00, 5.00, 2.00, 3.00, 3.00, 5.00, 5.00), ncol = 2)
w2 <- matrix(c(4.00, 5.00, 5.00, 3.00, 5.00, 6.00, 2.00, 0.00, 2.00, 2.00, 3.00, 3.00), ncol = 2)
w1
## [,1] [,2]
## [1,] 1 2
## [2,] 2 3
## [3,] 3 3
## [4,] 4 5
## [5,] 5 5
w2
## [,1] [,2]
## [1,] 4 2
## [2,] 5 0
## [3,] 5 2
## [4,] 3 2
## [5,] 5 3
## [6,] 6 3
# calculate class means and total mean
u1 <- colMeans(w1) %>% t()
u2 <- colMeans(w2) %>% t()
u <- u1*nrow(w1)/(nrow(w1)+nrow(w2))+u2*nrow(w2)/(nrow(w1)+nrow(w2))
u1
## [,1] [,2]
## [1,] 3 3.6
u2
## [,1] [,2]
## [1,] 4.666667 2
u
## [,1] [,2]
## [1,] 3.909091 2.727273
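As a sanity check, this size-weighted combination of the class means equals the column means of the pooled data:
colMeans(rbind(w1, w2))  # should match u: 3.909091 2.727273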
sb1 <- 5*t(u1-u) %*% (u1-u)  # n1 = nrow(w1) = 5
sb1
## [,1] [,2]
## [1,] 4.132231 -3.966942
## [2,] -3.966942 3.808264
\[ S_{B_1}=n_1(\mu_1-\mu)^T(\mu_1-\mu)=5\begin{pmatrix}-0.91 & 0.87\end{pmatrix}^T\begin{pmatrix}-0.91 & 0.87\end{pmatrix}=\begin{pmatrix}4.13 & -3.97\\ -3.97 & 3.81\end{pmatrix} \]
sb2 <- 6*t(u2-u) %*% (u2-u)  # n2 = nrow(w2) = 6
sb2
## [,1] [,2]
## [1,] 3.443526 -3.305785
## [2,] -3.305785 3.173554
sb <- sb1 + sb2
sb # between-class covariance matrix
## [,1] [,2]
## [1,] 7.575758 -7.272727
## [2,] -7.272727 6.981818
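The factors 5 and 6 above are simply the class sizes \(n_1\) and \(n_2\). A small helper (between_scatter() is a made-up name for illustration) makes the general formula \(\mathbb{S_B}=\sum_{j} n_j(\mu_j-\mu)^T(\mu_j-\mu)\) explicit for any number of classes:
# between-class scatter for an arbitrary list of class matrices (illustrative helper)
between_scatter <- function(classes) {
  n  <- vapply(classes, nrow, integer(1))        # class sizes
  mu <- lapply(classes, colMeans)                # class means
  grand <- Reduce(`+`, Map(`*`, mu, n)) / sum(n) # size-weighted grand mean
  Reduce(`+`, Map(function(m, k) k * tcrossprod(m - grand), mu, n))
}
between_scatter(list(w1, w2))  # should reproduce sb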
# mean-center
d1 <- sweep(w1, 2, u1)
d2 <- sweep(w2, 2, u2)
d1
## [,1] [,2]
## [1,] -2 -1.6
## [2,] -1 -0.6
## [3,] 0 -0.6
## [4,] 1 1.4
## [5,] 2 1.4
d2
## [,1] [,2]
## [1,] -0.6666667 0
## [2,] 0.3333333 -2
## [3,] 0.3333333 0
## [4,] -1.6666667 0
## [5,] 0.3333333 1
## [6,] 1.3333333 1
The within-class variance for each class (\(S_{W_j}\)) is calculated as follows \[ \mathbb{S_{W_j}}=\mathbb{d_j}^T\mathbb{d_j}=\sum_{i=1}^{n_j}(\vec{x}_{ij}-\vec{\mu}_j)^T(\vec{x}_{ij}-\vec{\mu}_j) \] The total within-class matrix (\(S_W\)) is then calculated as the sum over the \(c\) classes \[ \mathbb{S_W}=\sum_{j=1}^{c}\mathbb{S_{W_j}} \]
sw1 <- t(d1) %*% d1
sw2 <- t(d2) %*% d2
sw <- sw1 + sw2
sw1
## [,1] [,2]
## [1,] 10 8.0
## [2,] 8 7.2
sw2
## [,1] [,2]
## [1,] 5.333333 1
## [2,] 1.000000 6
sw # within-class covariance matrix
## [,1] [,2]
## [1,] 15.33333 9.0
## [2,] 9.00000 13.2
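These within-class scatter matrices are just unnormalized covariances: cov() divides by \(n-1\), so the same matrices can be obtained directly from cov() (a quick equivalence check):
(nrow(w1) - 1) * cov(w1)  # equals sw1
(nrow(w2) - 1) * cov(w2)  # equals sw2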
Now do the generalized eigendecomposition with the between- and within-class covariance matrices. Because there are only two classes, at most \(c-1=1\) discriminant has a nonzero eigenvalue, which is why the second eigenvalue below is zero.
s <- eigen(solve(sw) %*% sb)  # eigendecomposition of sw^-1 %*% sb
s$values
## [1] 2.783885 0.000000
s$vectors
## [,1] [,2]
## [1,] 0.6773521 -0.6925318
## [2,] -0.7356590 -0.7213873
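A quick check that the first eigenvector really satisfies equation (2), and that plugging it back into the ratio in (1) returns the first eigenvalue:
v <- s$vectors[, 1]
sb %*% v - s$values[1] * (sw %*% v)      # approximately the zero vector
t(v) %*% sb %*% v / (t(v) %*% sw %*% v)  # approximately 2.783885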
# class 1 projection coordinate on v1
y11 <- w1 %*% s$vectors[,1]
y11
## [,1]
## [1,] -0.7939658
## [2,] -0.8522727
## [3,] -0.1749206
## [4,] -0.9688864
## [5,] -0.2915343
# class 2 projection coordinate on v1
y21 <- w2 %*% s$vectors[,1]
y21
## [,1]
## [1,] 1.2380905
## [2,] 3.3867605
## [3,] 1.9154426
## [4,] 0.5607384
## [5,] 1.1797836
## [6,] 1.8571357
# class 1 projection coordinate on v2
y12 <- w1 %*% s$vectors[,2]
y12
## [,1]
## [1,] -2.135306
## [2,] -3.549226
## [3,] -4.241757
## [4,] -6.377064
## [5,] -7.069596
# class 2 projection coordinate on v2
y22 <- w2 %*% s$vectors[,2]
y22
## [,1]
## [1,] -4.212902
## [2,] -3.462659
## [3,] -4.905434
## [4,] -3.520370
## [5,] -5.626821
## [6,] -6.319353
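Equivalently, all eleven observations can be projected onto both discriminant axes in a single matrix product; the data frame assembled below contains the same values, just stacked class by class:
y <- rbind(w1, w2) %*% s$vectors  # 11 x 2 matrix of projection coordinates
head(y, 3)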
class <- c(rep("1", 5), rep("2",6))
df <- rbind(w1,w2) |> as.data.frame() |> cbind(class) |> rename(ft1=V1, ft2=V2)
y1 <- rbind(y11, y21)
y2 <- rbind(y12, y22)
df_p <- cbind(y1,y2) |> as.data.frame() |> cbind(class) |> rename(ft1=V1, ft2=V2)
# plot
par(mfrow=c(1,2))  # left: original feature space; right: projection onto the discriminant axes
plot(df$ft1,df$ft2, type="p", pch=19, xlab = "Axis 1", ylab = "Axis 2")
plot(df_p$ft1,df_p$ft2, type="p", pch=19, xlab = "LDAxis 1", ylab = "LDAxis 2")
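As a cross-check (a sketch, not part of the original walk-through), fitting MASS::lda() to the same toy data frame df should give LD1 coefficients proportional to the first eigenvector found above, up to sign and scale:
toy_lda <- MASS::lda(class ~ ft1 + ft2, data = df)
toy_lda$scaling                   # LD1 coefficients
toy_lda$scaling / s$vectors[, 1]  # both entries should give (roughly) the same constant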
# Load the data
data("iris")
# Split the data into training (80%) and test set (20%)
set.seed(123)
training.samples <- iris$Species %>%
  createDataPartition(p = 0.8, list = FALSE)
train.data <- iris[training.samples, ]
test.data <- iris[-training.samples, ]
# Estimate preprocessing parameters
preproc.param <- train.data %>%
  preProcess(method = c("center", "scale"))
# Transform the data using the estimated parameters
train.transformed <- preproc.param %>% predict(train.data)
test.transformed <- preproc.param %>% predict(test.data)
# fit LDA
model <- lda(Species~., data = train.transformed)
model
## Call:
## lda(Species ~ ., data = train.transformed)
##
## Prior probabilities of groups:
## setosa versicolor virginica
## 0.3333333 0.3333333 0.3333333
##
## Group means:
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## setosa -1.0112835 0.78048647 -1.2900001 -1.2453195
## versicolor 0.1014181 -0.68674658 0.2566029 0.1472614
## virginica 0.9098654 -0.09373989 1.0333972 1.0980581
##
## Coefficients of linear discriminants:
## LD1 LD2
## Sepal.Length 0.6794973 0.04463786
## Sepal.Width 0.6565085 -1.00330120
## Petal.Length -3.8365047 1.44176147
## Petal.Width -2.2722313 -1.96516251
##
## Proportion of trace:
## LD1 LD2
## 0.9902 0.0098
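The "Proportion of trace" row is the share of between-class variance captured by each discriminant; it can be recomputed from the singular values that MASS stores in the fitted object (a quick check using model$svd):
model$svd^2 / sum(model$svd^2)  # ~ 0.9902 0.0098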
The lda() output contains the following elements: the prior probabilities of the groups, the group means (the center of each class on the standardized predictors), the coefficients of the linear discriminants (the linear combinations of predictors forming LD1 and LD2), and the proportion of trace (the share of between-class variance captured by each discriminant).
To inspect the model results, make predictions on the transformed test set. The predict() function returns the following elements: class (the predicted class of each observation), posterior (a matrix of posterior probabilities of class membership), and x (the observation scores on the linear discriminants).
# Make predictions
predictions <- model %>% predict(test.transformed)
names(predictions)
## [1] "class" "posterior" "x"
# Model accuracy
mean(predictions$class==test.transformed$Species)
## [1] 0.9666667
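Overall accuracy alone hides which species get confused; a quick cross-tabulation of predicted versus actual labels with base R's table() gives that detail:
table(Predicted = predictions$class, Actual = test.transformed$Species)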
# Predicted classes
head(predictions$class, 6)
## [1] setosa setosa setosa setosa setosa setosa
## Levels: setosa versicolor virginica
# Predicted probabilities of class membership
head(predictions$posterior, 6)
## setosa versicolor virginica
## 1 1 3.978425e-22 1.319337e-43
## 2 1 1.038098e-17 3.967605e-38
## 6 1 2.882148e-21 2.041612e-41
## 16 1 8.381782e-28 7.486309e-50
## 23 1 6.531615e-25 3.414098e-47
## 34 1 1.089899e-28 7.865614e-52
# Linear discriminants
head(predictions$x, 3)
## LD1 LD2
## 1 8.162939 -0.5052768
## 2 7.202713 0.7111062
## 6 7.816243 -1.7327151
Finally, plot the training data in the space of the two linear discriminants:
lda.data <- cbind(train.transformed, predict(model)$x)
# plot
ggplot(lda.data, aes(LD1, LD2)) +
  geom_point(aes(color = Species))