EECS205002 Assignment 4: Everyone Has A Number

Suppose we want to give each student a number based on his/her personal data, such as weight and height. How can we do that? One simple strategy is to find a weighted sum,

$$ s = \alpha_1 w + \alpha_2 h, $$

that combines both factors. The question is how to decide the ratio between $\alpha_1$ and $\alpha_2$.

One solution is to perform Principal Component Analysis (PCA) on the given data. Suppose we have $N$ students, whose weights and heights are $(w_1, h_1), (w_2, h_2), \ldots, (w_N, h_N)$. PCA computes a vector (a 1-D subspace) such that the data projected onto it have the maximum variance, as shown in Figure 1. That vector is called the principal component of the given data.

Figure 1: The principal components of the given data.

Now let’s translate this problem into the language of linear algebra. Let $x_i = (w_i, h_i)^T$ and let the principal component be $y = (\alpha_1, \alpha_2)^T$, where $\|y\| = 1$. Recall that the projection of $x_i$ onto $y$ is (see Section 5.1)

$$ P x_i = (y y^T) x_i = (y^T x_i)\, y = a_i y, $$

where $P = yy^T$ is the projection matrix, and $a_i = y^T x_i$ is the coordinate of $x_i$’s projection onto $y$. Now the problem of PCA becomes a one-dimensional variance maximization problem. Recall from basic statistics that the mean and the variance of $a_1, a_2, \ldots, a_N$ are

$$ \bar{a} = \frac{1}{N} \sum_{i=1}^{N} a_i \quad \text{and} \quad \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (a_i - \bar{a})^2. $$
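As a small numerical illustration of the projection coordinates and their statistics, here is a minimal sketch. The weight/height values and the unit vector `y` are made up for illustration (they are not the course data, and `y` is not yet the principal component), and the variance uses the 1/N normalization.

```python
import numpy as np

# Hypothetical (weight in kg, height in cm) rows; NOT the course data.
X = np.array([[60.0, 170.0],
              [72.0, 180.0],
              [55.0, 160.0],
              [80.0, 175.0]])

# An arbitrary unit vector y, chosen only for illustration.
y = np.array([3.0, 4.0]) / 5.0

# Coordinates a_i = y^T x_i of each point's projection onto y.
a = X @ y

# Projection matrix P = y y^T; row i of `projected` is P x_i = a_i y.
P = np.outer(y, y)
projected = X @ P.T

# Mean and variance of the coordinates (1/N normalization assumed).
a_mean = a.mean()
a_var = np.mean((a - a_mean) ** 2)
```

Each row of `projected` is a multiple of `y`, so the 2D points collapse onto a line and only the scalar coordinates `a` matter for the variance.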

So the problem of PCA can be expressed as

$$ \max_{y,\ \|y\| = 1} \ \sum_{i=1}^{N} (a_i - \bar{a})^2. \tag{1} $$

It can be shown that (1) is equivalent to the following expression (why?)

$$ \max_{y,\ \|y\| = 1} \ \sum_{i=1}^{N} \|P(x_i - \bar{x})\|^2, \tag{2} $$

where $\bar{x}$ is the mean of the $x_i$.
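Before attempting the proof, it can help to check numerically that the two objectives agree. The sketch below is a sanity check only, not a proof; the synthetic data and the direction angle `theta` are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))        # synthetic data; row i is x_i^T

theta = 0.7                         # arbitrary direction (assumption)
y = np.array([np.cos(theta), np.sin(theta)])   # unit vector

# Objective (1): sum of squared deviations of the coordinates a_i.
a = X @ y
obj1 = np.sum((a - a.mean()) ** 2)

# Objective (2): sum of squared norms of the projected, centered points.
P = np.outer(y, y)
centered = X - X.mean(axis=0)
obj2 = np.sum(np.linalg.norm(centered @ P.T, axis=1) ** 2)

print(obj1, obj2)                   # the two values should agree
```

Trying several values of `theta` shows that the agreement does not depend on the chosen direction.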

Let’s continue to work on the details of (2). The squared norm of a vector equals its inner product with itself, and $P = yy^T$ is the projection matrix. So

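$$
\begin{aligned}
\|P(x_i - \bar{x})\|^2 &= (x_i - \bar{x})^T P^T P (x_i - \bar{x}) \\
&= (x_i - \bar{x})^T y y^T y y^T (x_i - \bar{x}) \\
&= (x_i - \bar{x})^T y y^T (x_i - \bar{x}).
\end{aligned}
$$

We can do that because $y^T y = \|y\|^2 = 1$.

Next, since $(x_i - \bar{x})^T y$ and $y^T (x_i - \bar{x})$ are scalars, we can exchange those two terms in their product and rewrite the above equation as

$$ y^T (x_i - \bar{x})(x_i - \bar{x})^T y. $$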

Let’s define the covariance matrix for the vector data $x_i$ as

$$ \Sigma = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T. \tag{3} $$

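For $x_i = (w_i, h_i)^T$, the covariance matrix is

$$ \Sigma = \frac{1}{N} \sum_{i=1}^{N} \begin{pmatrix} (w_i - \bar{w})^2 & (w_i - \bar{w})(h_i - \bar{h}) \\ (h_i - \bar{h})(w_i - \bar{w}) & (h_i - \bar{h})^2 \end{pmatrix}. $$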
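The covariance matrix can be built directly from the centered data, as in the minimal sketch below. Two assumptions are worth flagging: the data are synthetic, and the 1/N normalization is used (NumPy's `np.cov` defaults to 1/(N-1), which rescales the eigenvalues but leaves the eigenvectors unchanged). The sketch also relies on a standard linear-algebra fact not derived in the text above: a unit vector maximizing $y^T \Sigma y$ is an eigenvector of $\Sigma$ for its largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic, correlated (weight, height) data; NOT the course data.
w = rng.normal(65.0, 10.0, size=100)
h = 100.0 + 1.0 * w + rng.normal(0.0, 5.0, size=100)
X = np.column_stack([w, h])

# Center the data and form the covariance matrix (1/N assumed).
x_bar = X.mean(axis=0)
D = X - x_bar
Sigma = D.T @ D / len(X)

# eigh handles symmetric matrices and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(Sigma)
y = eigvecs[:, -1]                  # eigenvector of the largest eigenvalue

# Variance of the coordinates along y; equals y^T Sigma y.
a = D @ y
var_along_y = np.mean(a ** 2)
```

The sign of `y` returned by `eigh` is arbitrary, so `-y` describes the same 1-D subspace.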

(20%) For each student, we have collected his/her personal data $x_i = (w_i, h_i, z_i)$. https://forms.gle/3LZ1LKrAaJCZ3C1z6

(10%) Remove your own data from the list, and use the other data to compute the PCA. Then use the mean and the principal component of the other data to calculate your number.

(10%) Remove your own data from the list, and use the other data to compute the PCA of the 2D data $(w_i, h_i)$ first, and find the weighted sum $t_i = \alpha_1 w_i + \alpha_2 h_i$. Then compute the PCA of $(t_i, z_i)$. Plot $t_i$, $z_i$, and their principal components in a figure like Figure 1. Compare the results with (b). Will they give the same numbers? Discuss the reasons. (If they are the same, why? And if they are different, why?)

(20%) How can the 3D data be projected onto a 2D subspace so that the variance of the projected data is maximized? Design an algorithm, prove its correctness, and write Python code using the given data to show pictorially that your algorithm works. Your algorithm needs to find an orthonormal basis of the 2D subspace, $Y = [y_1\ y_2]$ with $Y^T Y = I_2$. The definition of 2D variance is

$$ \bar{a} = \frac{1}{N} \sum_{i=1}^{N} a_i, \qquad \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \|a_i - \bar{a}\|^2, \tag{6} $$

where $a_i$ is the projection of $x_i$ from 3D onto the 2D subspace, and $\bar{a}$ is their center (mean). The problem is to maximize $\sigma^2$.
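One candidate approach, sketched below under stated assumptions, is to take the eigenvectors of the 3D covariance matrix belonging to its two largest eigenvalues as $y_1$ and $y_2$. This is only an illustration on synthetic data with the 1/N normalization; it does not replace the correctness proof the question asks for, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic 3D data with unequal spreads, rotated by a random orthogonal
# matrix Q; NOT the course data.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
X = (rng.normal(size=(200, 3)) * np.array([3.0, 1.5, 0.3])) @ Q.T

# Center the data and form the 3x3 covariance matrix (1/N assumed).
x_bar = X.mean(axis=0)
D = X - x_bar
Sigma = D.T @ D / len(X)

# Eigenvalues come back in ascending order; keep the two largest.
eigvals, eigvecs = np.linalg.eigh(Sigma)
Y = eigvecs[:, [2, 1]]              # columns are y_1 and y_2

# Row i of A2 is a_i - a_bar = Y^T (x_i - x_bar) in the 2D basis.
A2 = D @ Y
var_2d = np.mean(np.sum(A2 ** 2, axis=1))

print(var_2d, eigvals[2] + eigvals[1])   # these two values should agree
```

A scatter plot of the two columns of `A2` (for example with matplotlib) would give the pictorial view of the projected data.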

Submissions

Write a report in a PDF file that includes (1), (2), (3), (4), and (5). For question (4), attach the plots and your discussion. For question (5), give your algorithm and proofs.

Python code of your implementation of (4) and (5).

Zip them and submit to the iLMS system.
