
Saturday, November 29, 2014

Dimension Reduction using spectral decomposition (1) - LDA


Introduction

Most texts, including textbooks and web articles, describe FLD (Fisher Linear Discriminant) as LDA instead of the classical LDA. However, if you survey academic papers released 10 years ago, you will find the simpler LDA more often than FLD.

The issues LDA addresses are quite different from PCA's. LDA adds the concept of a "group" or "class": PCA deals with individual samples, while LDA focuses on the classification of sample groups.

PCA can be used as a recognition/classification approach by itself, but in practice PCA is used as a dimension-reduction preprocessing step for more complex classification or recognition methods. And some of the strategy used in the PCA process is already baked into LDA.

My references on the web are:
  1. https://onlinecourses.science.psu.edu/stat557/node/37
  2. http://en.wikipedia.org/wiki/Linear_discriminant_analysis
Before discussing LDA, I will explain Mahalanobis Distance and the Multivariate Normal Distribution, then QDA, the more complicated version of LDA, and finally LDA. Everything in this article is based on a similar process using the covariance matrix.

Unfortunately, this classical LDA needs two strict assumptions: identical covariance matrices and multivariate normal distributions. These make it hard to apply LDA to real-world problems. But its strategy is simple, and through this journey to LDA we can understand the relationships between many concepts that look similar but are rarely explained in one shot.


Euclidean Distance and Mahalanobis Distance

There are many types of distance metrics between two vectors or features. Euclidean Distance is the simplest.
$$D_{\text{Euclidean}}(x, y) = \sqrt{(x-y)^T (x-y)}$$

As described in the figure above, if we select Euclidean distance, we do not care about the distribution of the sample group and its covariances.

 
In the calculation of Mahalanobis Distance, a normalizing transform onto new bases is applied just before the Euclidean distance calculation.
$$D_{\text{Mahalanobis}}(x, y) = \sqrt{(x-y)^T \Sigma^{-1} (x-y)}$$


Let's dive deeper into $\Sigma^{-1}$.
We already looked over the covariance matrix $\Sigma$ in the previous article on PCA.

$$\Sigma = \begin{bmatrix} \sigma_{11}^2 & \cdots & \sigma_{1D}^2 \\ \vdots & \ddots & \vdots \\ \sigma_{D1}^2 & \cdots & \sigma_{DD}^2 \end{bmatrix} \xrightarrow{\text{diagonalization}} Q \Lambda Q^T = \begin{bmatrix} e_1 & e_2 & \cdots & e_D \end{bmatrix} \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_D \end{bmatrix} \begin{bmatrix} e_1^T \\ e_2^T \\ \vdots \\ e_D^T \end{bmatrix}$$

The $\lambda$'s can be replaced with "new" variances, $\sigma^2$'s.

$$\Sigma = Q \Lambda Q^T = \begin{bmatrix} e_1 & e_2 & \cdots & e_D \end{bmatrix} \begin{bmatrix} \sigma_{11}^2 & & 0 \\ & \ddots & \\ 0 & & \sigma_{DD}^2 \end{bmatrix} \begin{bmatrix} e_1^T \\ e_2^T \\ \vdots \\ e_D^T \end{bmatrix}$$
 
The matrix consisting of the rows $e_d^T$ represents the transformation onto the new bases $e_d$.



$$\Sigma^{-1} = Q \Lambda^{-1} Q^T = \begin{bmatrix} e_1 & e_2 & \cdots & e_D \end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_{11}^2} & & 0 \\ & \ddots & \\ 0 & & \frac{1}{\sigma_{DD}^2} \end{bmatrix} \begin{bmatrix} e_1^T \\ e_2^T \\ \vdots \\ e_D^T \end{bmatrix} = \left( \begin{bmatrix} e_1 & e_2 & \cdots & e_D \end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_{11}} & & 0 \\ & \ddots & \\ 0 & & \frac{1}{\sigma_{DD}} \end{bmatrix} \right) \left( \begin{bmatrix} \frac{1}{\sigma_{11}} & & 0 \\ & \ddots & \\ 0 & & \frac{1}{\sigma_{DD}} \end{bmatrix} \begin{bmatrix} e_1^T \\ e_2^T \\ \vdots \\ e_D^T \end{bmatrix} \right)$$

The $\frac{1}{\sigma}$ factors mean unit-variance normalization.
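
As a small numerical sketch of this idea (the toy covariance matrix and the variable names below are my own, just for illustration), the Mahalanobis distance can be computed either directly with $\Sigma^{-1}$ or as a plain Euclidean distance after applying the whitening transform $\Lambda^{-1/2} Q^T$ built from the decomposition above:

```python
import numpy as np

# toy 2-D covariance matrix (illustrative values only)
Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])

x = np.array([2.0, 1.0])
y = np.array([0.0, 0.0])
d = x - y

# direct form: sqrt((x - y)^T Sigma^{-1} (x - y))
maha_direct = np.sqrt(d @ np.linalg.inv(Sigma) @ d)

# spectral decomposition Sigma = Q Lambda Q^T
lam, Q = np.linalg.eigh(Sigma)

# whitening transform: rotate onto the eigenbases e_d, then scale by 1/sigma = 1/sqrt(lambda)
W = np.diag(1.0 / np.sqrt(lam)) @ Q.T

# Euclidean distance in the whitened space equals the Mahalanobis distance
maha_whitened = np.linalg.norm(W @ d)

print(maha_direct, maha_whitened)  # the two values agree
```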


Multivariate Normal Distribution

The Multivariate Normal Distribution (MND) is:

$$\mathcal{N}_D(\mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^D |\Sigma|}} \exp\left( -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right)$$

where D=dim(x).

We can see a squared Mahalanobis Distance term: $(x-\mu)^T \Sigma^{-1} (x-\mu)$ is the squared Mahalanobis Distance from $\mu$. The determinant of $\Sigma$ is a product of variances, $\prod_d \sigma_d^2$.
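
Here is a minimal sketch (my own helper function, not any standard API) that evaluates the MND density from the squared Mahalanobis distance and $|\Sigma|$, and checks it against scipy.stats.multivariate_normal:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mnd_pdf(x, mu, Sigma):
    """MND density written in terms of the squared Mahalanobis distance."""
    D = x.shape[0]
    diff = x - mu
    maha_sq = diff @ np.linalg.inv(Sigma) @ diff            # (x - mu)^T Sigma^{-1} (x - mu)
    norm_const = np.sqrt((2.0 * np.pi) ** D * np.linalg.det(Sigma))
    return np.exp(-0.5 * maha_sq) / norm_const

mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
x = np.array([1.5, 1.0])

print(mnd_pdf(x, mu, Sigma))
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))       # should print the same value
```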


QDA (Quadratic Discriminant Analysis)

QDA is a more general form of LDA. For QDA we need an important assumption: every class has a normal distribution (MND).



If we want to discriminate class k from class h, we have the following discriminant system.

$$\frac{g_k(x)}{g_h(x)} > T_{k\text{ against }h}$$

$$g_k(x) = p(x|C_k)\,p(C_k) = \frac{1}{\sqrt{(2\pi)^D |\Sigma_k|}} \exp\left( -\frac{1}{2}(x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k) \right) p(C_k)$$

$$\frac{g_k(x)}{g_h(x)} = \frac{\frac{1}{\sqrt{(2\pi)^D |\Sigma_k|}} \exp\left( -\frac{1}{2}(x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k) \right) p(C_k)}{\frac{1}{\sqrt{(2\pi)^D |\Sigma_h|}} \exp\left( -\frac{1}{2}(x-\mu_h)^T \Sigma_h^{-1} (x-\mu_h) \right) p(C_h)} = \frac{|\Sigma_k|^{-\frac{1}{2}} \exp\left( -\frac{1}{2}(x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k) \right) p(C_k)}{|\Sigma_h|^{-\frac{1}{2}} \exp\left( -\frac{1}{2}(x-\mu_h)^T \Sigma_h^{-1} (x-\mu_h) \right) p(C_h)}$$

It is based on a simple Bayesian decision. Let's refine the formula by using the log scale and the common notation $\pi_k = p(C_k)$:

$$\ln\left( \frac{g_k(x)}{g_h(x)} \right) = -\frac{1}{2}\ln|\Sigma_k| - \frac{1}{2}(x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k) + \ln\pi_k + \frac{1}{2}\ln|\Sigma_h| + \frac{1}{2}(x-\mu_h)^T \Sigma_h^{-1} (x-\mu_h) - \ln\pi_h$$

The discriminant function is quadratic in the vector x. This shows why its name is "Q"DA.
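
A minimal sketch of this quadratic discriminant (the function name is mine, and the class parameters $\mu$, $\Sigma$, $\pi$ are assumed to have been estimated already):

```python
import numpy as np

def qda_log_ratio(x, mu_k, Sigma_k, pi_k, mu_h, Sigma_h, pi_h):
    """ln(g_k(x) / g_h(x)) for two classes, each with its own covariance matrix."""
    def log_g(mu, Sigma, pi):
        diff = x - mu
        return (-0.5 * np.linalg.slogdet(Sigma)[1]            # -1/2 ln|Sigma|
                - 0.5 * diff @ np.linalg.inv(Sigma) @ diff    # -1/2 (x-mu)^T Sigma^{-1} (x-mu)
                + np.log(pi))                                  # ln pi
    return log_g(mu_k, Sigma_k, pi_k) - log_g(mu_h, Sigma_h, pi_h)

# pick class k over class h when the log-ratio exceeds the threshold
# (0 when misclassification costs are equal)
```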


LDA (Linear Discriminant Analysis)

LDA needs one more assumption: the $\Sigma$'s of all classes are the same.

This looks unrealistic, but let's go ahead.

$$\begin{aligned}
\ln\left( \frac{g_k(x)}{g_h(x)} \right) &= -\frac{1}{2}\ln|\Sigma| - \frac{1}{2}(x-\mu_k)^T \Sigma^{-1} (x-\mu_k) + \ln\pi_k + \frac{1}{2}\ln|\Sigma| + \frac{1}{2}(x-\mu_h)^T \Sigma^{-1} (x-\mu_h) - \ln\pi_h \\
&= -\frac{1}{2}(x-\mu_k)^T \Sigma^{-1} (x-\mu_k) + \frac{1}{2}(x-\mu_h)^T \Sigma^{-1} (x-\mu_h) + \ln\pi_k - \ln\pi_h \\
&= -\frac{1}{2}\left( x^T\Sigma^{-1}x - 2\mu_k^T\Sigma^{-1}x + \mu_k^T\Sigma^{-1}\mu_k \right) + \frac{1}{2}\left( x^T\Sigma^{-1}x - 2\mu_h^T\Sigma^{-1}x + \mu_h^T\Sigma^{-1}\mu_h \right) + \ln\pi_k - \ln\pi_h \\
&= -\frac{1}{2}\left( -2\mu_k^T\Sigma^{-1}x + \mu_k^T\Sigma^{-1}\mu_k \right) + \frac{1}{2}\left( -2\mu_h^T\Sigma^{-1}x + \mu_h^T\Sigma^{-1}\mu_h \right) + \ln\pi_k - \ln\pi_h \\
&= (\mu_k - \mu_h)^T\Sigma^{-1}x - \frac{1}{2}(\mu_k + \mu_h)^T\Sigma^{-1}(\mu_k - \mu_h) + \ln\pi_k - \ln\pi_h
\end{aligned}$$

As you can see, the discriminant boundary changes from a quadratic curve to a straight line (hence "Linear").
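
And a corresponding sketch for the linear case (again with assumed, pre-estimated parameters), showing that the discriminant reduces to $w^T x + b$:

```python
import numpy as np

def lda_log_ratio(x, mu_k, mu_h, Sigma, pi_k, pi_h):
    """ln(g_k(x) / g_h(x)) when both classes share the same covariance Sigma."""
    Sigma_inv = np.linalg.inv(Sigma)
    w = Sigma_inv @ (mu_k - mu_h)                                  # linear coefficient
    b = (-0.5 * (mu_k + mu_h) @ Sigma_inv @ (mu_k - mu_h)
         + np.log(pi_k) - np.log(pi_h))                            # constant term
    return w @ x + b                                               # linear in x

# the decision boundary { x : w^T x + b = 0 } is a straight line (a hyperplane in general)
```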





Conclusion

The covariance matrix is everywhere. The material described here is so familiar that most researchers and developers never feel the need to check it again. But someone without concrete mathematical or statistical fundamentals cannot catch these common pieces and can be confused easily.
