Linear Discriminant Analysis (LDA)

Discriminant Analysis (DA) predicts the probability of belonging to a given class based on a set of predictor variables.

Here, let's focus on how linear algebra is applied to this statistical method.

Mathematics & Objective function

The goal of LDA is to find a direction (basis vector) in the data space that maximally separates the categories of data. The objective function is defined as: \[ \lambda=\frac{\|\mathbb{X_B}\vec{w}\|^2}{\|\mathbb{X_W}\vec{w}\|^2} \tag{1} \] In other words: we want to find a set of feature weights \(\vec{w}\) that maximizes the ratio of the between-class variance (computed from \(\mathbb{X_B}\), the class means relative to the grand mean) to the within-class variance (computed from \(\mathbb{X_W}\), the observations relative to their class means).

Given \(\|\mathbb{X_B}\vec{w}\|^2=\vec{w}^T\mathbb{X_B^T}\mathbb{X_B}\vec{w}\) and \(\mathbb{X_B^T}\mathbb{X_B}=\mathbb{C_B}\) (and likewise \(\mathbb{X_W^T}\mathbb{X_W}=\mathbb{C_W}\)), equation (1) can be written algebraically and generalized from a single weight vector \(\vec{w}\) to a weight matrix \(\mathbb{W}\): \[ \Lambda=\frac{\mathbb{W^TX_B^T}\mathbb{X_B}\mathbb{W}}{\mathbb{W^TX_W^T}\mathbb{X_W}\mathbb{W}} \Rightarrow \Lambda=\mathbb{W^{-1}C_W^{-1}C_BW} \Rightarrow \mathbb{C_BW}=\mathbb{C_WW}\Lambda \tag{2} \] Equation (2) is a generalized eigenvalue problem: the solution to LDA comes from a generalized eigendecomposition, i.e., the simultaneous diagonalization of the two covariance matrices. The eigenvectors (the columns of \(\mathbb{W}\)) are the weights, and the generalized eigenvalues (the diagonal of \(\Lambda\)) are the between/within variance ratios of the components.
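As a quick numerical sanity check (a minimal sketch with made-up covariance matrices, not part of the original derivation), the eigenvectors returned by eigen() applied to \(\mathbb{C_W^{-1}C_B}\) indeed satisfy equation (2):

# made-up symmetric, positive-definite covariance matrices for illustration
Cb <- matrix(c(4, -2, -2, 3), ncol = 2)  # plays the role of C_B
Cw <- matrix(c(5,  1,  1, 2), ncol = 2)  # plays the role of C_W

# generalized eigendecomposition via ordinary eigendecomposition of C_W^{-1} C_B
ge <- eigen(solve(Cw) %*% Cb)
W  <- ge$vectors
L  <- diag(ge$values)

# residual of C_B W - C_W W Lambda; should be numerically zero
max(abs(Cb %*% W - Cw %*% W %*% L))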

Cons

Although LDA is widely used as a data reduction technique, it suffers from a number of problems, most notably the linearity problem (classes that are not linearly separable cannot be split by a linear combination of the features) and the small sample size (SSS) problem, where the within-class matrix becomes singular when the number of features exceeds the number of samples (Tharwat et al. 2017).

Tutorial

To project the original data matrix onto a lower-dimensional space, three steps need to be performed: (1) compute the between-class and within-class variance matrices, (2) perform the generalized eigendecomposition, and (3) project the data onto the new space.

Between & Within class variance

  • Class means \[ \mu_j=\frac{1}{n_j}\sum_{x_i \in w_j}{x_i} \]
# load dplyr for the %>% pipe and rename() used throughout the tutorial
library(dplyr)

# example data classes
w1 <- matrix(c(1.00, 2.00, 3.00, 4.00, 5.00, 2.00, 3.00, 3.00, 5.00, 5.00), ncol = 2)
w2 <- matrix(c(4.00, 5.00, 5.00, 3.00, 5.00, 6.00, 2.00, 0.00, 2.00, 2.00, 3.00, 3.00), ncol = 2)
w1
##      [,1] [,2]
## [1,]    1    2
## [2,]    2    3
## [3,]    3    3
## [4,]    4    5
## [5,]    5    5
w2
##      [,1] [,2]
## [1,]    4    2
## [2,]    5    0
## [3,]    5    2
## [4,]    3    2
## [5,]    5    3
## [6,]    6    3
  • Total mean \[ \mu=\frac{1}{N}\sum_{i=1}^{N}{x_i}=\sum_{i=1}^{c}\frac{n_i}{N}\mu_i \]
# calculate class means and total mean
u1 <- colMeans(w1) %>% t()
u2 <- colMeans(w2) %>% t()
u <- u1*nrow(w1)/(nrow(w1)+nrow(w2))+u2*nrow(w2)/(nrow(w1)+nrow(w2)) 
u1 
##      [,1] [,2]
## [1,]    3  3.6
u2
##          [,1] [,2]
## [1,] 4.666667    2
u
##          [,1]     [,2]
## [1,] 3.909091 2.727273
  • Between-class variance of each class \(S_{B_i}\) & total between-class variance \(S_B\) \[ S_{B_i}=n_i(\mu_i-\mu)^T(\mu_i-\mu), \qquad S_B=\sum_{i=1}^{c}S_{B_i} \]
sb1 <- 5*t(u1-u) %*% (u1-u)  # n1 = nrow(w1) = 5
sb1
##           [,1]      [,2]
## [1,]  4.132231 -3.966942
## [2,] -3.966942  3.808264

\[ S_{B_1}=n_1(\mu_1-\mu)^T(\mu_1-\mu)=5\begin{pmatrix}-0.91 & 0.87\end{pmatrix}^T\begin{pmatrix}-0.91 & 0.87\end{pmatrix}=\begin{pmatrix}4.13 & -3.97\\ -3.97 & 3.81\end{pmatrix} \]

sb2 <- 6*t(u2-u) %*% (u2-u)  # n2 = nrow(w2) = 6
sb2
##           [,1]      [,2]
## [1,]  3.443526 -3.305785
## [2,] -3.305785  3.173554
sb <- sb1 + sb2
sb # between-class covariance matrix
##           [,1]      [,2]
## [1,]  7.575758 -7.272727
## [2,] -7.272727  6.981818
  • Within-class variance
# mean-center
d1 <- sweep(w1, 2, u1)
d2 <- sweep(w2, 2, u2)
d1
##      [,1] [,2]
## [1,]   -2 -1.6
## [2,]   -1 -0.6
## [3,]    0 -0.6
## [4,]    1  1.4
## [5,]    2  1.4
d2
##            [,1] [,2]
## [1,] -0.6666667    0
## [2,]  0.3333333   -2
## [3,]  0.3333333    0
## [4,] -1.6666667    0
## [5,]  0.3333333    1
## [6,]  1.3333333    1

Class-Independent Method

In the class-independent method, a single transformation is computed for the whole data set using the total (pooled) within-class matrix. The within-class variance of each class (\(S_{W_j}\)) is calculated as follows \[ \mathbb{S_{W_j}}=\mathbb{d_j}^T\mathbb{d_j}=\sum_{i=1}^{n_j}(x_{ij}-\mu_j)^T(x_{ij}-\mu_j) \] The total within-class matrix (\(S_W\)) is then calculated as follows \[ \mathbb{S_W}=\sum_{j=1}^{c}\mathbb{S_{W_j}} \]

sw1 <- t(d1) %*% d1
sw2 <- t(d2) %*% d2
sw <- sw1 + sw2
sw1
##      [,1] [,2]
## [1,]   10  8.0
## [2,]    8  7.2
sw2
##          [,1] [,2]
## [1,] 5.333333    1
## [2,] 1.000000    6
sw # within-class covariance matrix
##          [,1] [,2]
## [1,] 15.33333  9.0
## [2,]  9.00000 13.2

Generalized Eigendecomposition

Perform the generalized eigendecomposition of the between- and within-class covariance matrices, i.e., the ordinary eigendecomposition of \(\mathbb{S_W^{-1}S_B}\):

s <- eigen(solve(sw) %*% sb)
s$values
## [1] 2.783885 0.000000
s$vectors
##            [,1]       [,2]
## [1,]  0.6773521 -0.6925318
## [2,] -0.7356590 -0.7213873
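
As a quick check (not part of the original walkthrough), the leading eigenvalue can be tied back to the objective in equation (1): it equals the ratio of between- to within-class variance along the first eigenvector.

# between/within variance ratio along the first eigenvector;
# should match s$values[1] (about 2.78)
v1 <- s$vectors[, 1]
as.numeric((t(v1) %*% sb %*% v1) / (t(v1) %*% sw %*% v1))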

Projection onto the new space

Each class is projected onto a discriminant axis by multiplying its data matrix by the corresponding eigenvector, \(y = \mathbb{X}\vec{v}\).

# class 1 projection coordinate on v1
y11 <- w1 %*% s$vectors[,1]
y11
##            [,1]
## [1,] -0.7939658
## [2,] -0.8522727
## [3,] -0.1749206
## [4,] -0.9688864
## [5,] -0.2915343
# class 2 projection coordinate on v1
y21 <- w2 %*% s$vectors[,1]
y21
##           [,1]
## [1,] 1.2380905
## [2,] 3.3867605
## [3,] 1.9154426
## [4,] 0.5607384
## [5,] 1.1797836
## [6,] 1.8571357
# class 1 projection coordinate on v2
y12 <- w1 %*% s$vectors[,2]
y12
##           [,1]
## [1,] -2.135306
## [2,] -3.549226
## [3,] -4.241757
## [4,] -6.377064
## [5,] -7.069596
# class 2 projection coordinate on v2
y22 <- w2 %*% s$vectors[,2]
y22
##           [,1]
## [1,] -4.212902
## [2,] -3.462659
## [3,] -4.905434
## [4,] -3.520370
## [5,] -5.626821
## [6,] -6.319353
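
Equivalently (a small addition, not in the original walkthrough), the whole data set can be projected onto both discriminant axes in one step:

# project all observations onto both eigenvectors at once;
# the columns of Y are the coordinates on LDAxis 1 and LDAxis 2
Y <- rbind(w1, w2) %*% s$vectors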

plot

class <- c(rep("1", 5), rep("2",6))
df <- rbind(w1,w2) |> as.data.frame() |> cbind(class) |> rename(ft1=V1, ft2=V2)

y1 <- rbind(y11, y21)
y2 <- rbind(y12, y22)
df_p <- cbind(y1,y2) |> as.data.frame() |> cbind(class) |> rename(ft1=V1, ft2=V2)


# plot original space vs. discriminant space, colored by class
par(mfrow=c(1,2))
plot(df$ft1, df$ft2, type="p", pch=19, col=as.integer(factor(df$class)), xlab = "Axis 1", ylab = "Axis 2")
plot(df_p$ft1, df_p$ft2, type="p", pch=19, col=as.integer(factor(df_p$class)), xlab = "LDAxis 1", ylab = "LDAxis 2")

Implementation in R

# packages used in this section
library(MASS)     # lda()
library(caret)    # createDataPartition(), preProcess()
library(ggplot2)  # ggplot()

# Load the data
data("iris")

# Split the data into training (80%) and test set (20%)
set.seed(123)
training.samples <- iris$Species %>%
  createDataPartition(p = 0.8, list = FALSE)
train.data <- iris[training.samples, ]
test.data <- iris[-training.samples, ]

# Estimate preprocessing parameters
preproc.param <- train.data %>% 
  preProcess(method = c("center", "scale"))

# Transform the data using the estimated parameters
train.transformed <- preproc.param %>% predict(train.data)
test.transformed <- preproc.param %>% predict(test.data)

# fit LDA
model <- lda(Species~., data = train.transformed)
model
## Call:
## lda(Species ~ ., data = train.transformed)
## 
## Prior probabilities of groups:
##     setosa versicolor  virginica 
##  0.3333333  0.3333333  0.3333333 
## 
## Group means:
##            Sepal.Length Sepal.Width Petal.Length Petal.Width
## setosa       -1.0112835  0.78048647   -1.2900001  -1.2453195
## versicolor    0.1014181 -0.68674658    0.2566029   0.1472614
## virginica     0.9098654 -0.09373989    1.0333972   1.0980581
## 
## Coefficients of linear discriminants:
##                     LD1         LD2
## Sepal.Length  0.6794973  0.04463786
## Sepal.Width   0.6565085 -1.00330120
## Petal.Length -3.8365047  1.44176147
## Petal.Width  -2.2722313 -1.96516251
## 
## Proportion of trace:
##    LD1    LD2 
## 0.9902 0.0098

The lda() output contains the following elements: the prior probabilities of groups (the proportion of training observations in each class), the group means (the class centroids on each predictor), the coefficients of linear discriminants (the weights that form LD1 and LD2), and the proportion of trace (the share of between-class variance captured by each discriminant).

inspect model results
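
These pieces can also be pulled directly from the fitted object; a minimal sketch, assuming the standard components of a MASS::lda fit:

model$prior    # prior probabilities of groups
model$means    # group means on the centered/scaled predictors
model$scaling  # coefficients of the linear discriminants
model$svd^2 / sum(model$svd^2)  # proportion of trace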

The predict() function returns the following elements: class (the predicted classes), posterior (a matrix of posterior probabilities of class membership), and x (the linear discriminant scores).

# Make predictions
predictions <- model %>% predict(test.transformed)
names(predictions)
## [1] "class"     "posterior" "x"
# Model accuracy
mean(predictions$class==test.transformed$Species)
## [1] 0.9666667
# Predicted classes
head(predictions$class, 6)
## [1] setosa setosa setosa setosa setosa setosa
## Levels: setosa versicolor virginica
# Predicted probabilities of class membership.
head(predictions$posterior, 6) 
##    setosa   versicolor    virginica
## 1       1 3.978425e-22 1.319337e-43
## 2       1 1.038098e-17 3.967605e-38
## 6       1 2.882148e-21 2.041612e-41
## 16      1 8.381782e-28 7.486309e-50
## 23      1 6.531615e-25 3.414098e-47
## 34      1 1.089899e-28 7.865614e-52
# Linear discriminants
head(predictions$x, 3)
##        LD1        LD2
## 1 8.162939 -0.5052768
## 2 7.202713  0.7111062
## 6 7.816243 -1.7327151
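
Beyond the overall accuracy, a confusion matrix (a small addition, not in the original tutorial) shows which species, if any, get mixed up on the test set:

# cross-tabulate predicted vs. observed classes
table(Predicted = predictions$class, Observed = test.transformed$Species)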

plot

lda.data <- cbind(train.transformed, predict(model)$x)

# plot
ggplot(lda.data, aes(LD1, LD2)) +
  geom_point(aes(color = Species))

Reference

Tharwat, Alaa, Tarek Gaber, Abdelhameed Ibrahim, and Aboul Ella Hassanien. 2017. “Linear Discriminant Analysis: A Detailed Tutorial.” AI Communications 30 (May): 169–90. https://doi.org/10.3233/AIC-170729.