\[ \mathbb{A}+\mathbb{B}=(a_{ij}+b_{ij}) \]
\[ s\mathbb{A}=(s \, a_{ij}) \]
\[ \tag{1} \vec{a}^T\vec{b}=\vec{b}^T\vec{a}=\begin{pmatrix} a_1 & \dots & a_m\end{pmatrix}\begin{pmatrix} b_1 \\ \vdots \\ b_m\end{pmatrix}=a_1b_1+ \dots + a_mb_m=\sum_{i=1}^{m}a_ib_i \] \[ \tag{2} ||\vec{a}||^2=\vec{a}^T\vec{a}=\sum_{i=1}^{m}a_i^2 \] \(||\vec{a}||\) is called the norm of \(\vec{a}\).
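As a quick check of (1) and (2), here is a minimal NumPy sketch; the vector values are arbitrary illustrative data.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# (1) the inner product is symmetric: a^T b = b^T a
inner_ab = a @ b          # 1*4 + 2*5 + 3*6 = 32
inner_ba = b @ a
assert np.isclose(inner_ab, inner_ba)

# (2) squared norm: ||a||^2 = a^T a
sq_norm = a @ a           # 1 + 4 + 9 = 14
assert np.isclose(np.linalg.norm(a) ** 2, sq_norm)
```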
A matrix with as many rows as columns is said to be square. For a square matrix, the elements on the diagonal are called its diagonal elements, and their sum is called the trace. \[ tr(\mathbb{AB})=tr(\mathbb{BA}) \] \[ ||\mathbb{A}||^2=tr(\mathbb{AA^T})=tr(\mathbb{A^TA})=\sum_{i=1}^{n}\sum_{j=1}^{m}a_{ij}^2 \] This is called the squared norm of \(\mathbb{A}\).
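A small NumPy sketch of the trace identity and the squared norm; the matrices are random illustrative data.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))   # an n-by-m example matrix
B = rng.normal(size=(4, 3))   # m-by-n, so both AB and BA are square

# tr(AB) = tr(BA) even though AB (3x3) and BA (4x4) have different sizes
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# squared norm of A: tr(A A^T) = tr(A^T A) = sum of squared elements
sq_norm = np.trace(A @ A.T)
assert np.isclose(sq_norm, np.trace(A.T @ A))
assert np.isclose(sq_norm, np.sum(A ** 2))
```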
- Symmetric matrix: a square matrix \(\mathbb{A}\) satisfying \(\mathbb{A}^T=\mathbb{A}\).
- Diagonal matrix: a square matrix whose off-diagonal elements are all zeros. Its \(t\)-th power is \[ \mathbb{D}^t=\mathbb{DD \dots D} \]
- Identity matrix: the diagonal matrix whose diagonal elements are all ones.
Let \(\mathbb{X}\) denote the n by p data matrix whose jth column \(\vec{x_j}\) contains the scores for variable j: \[ \mathbb{X}=\begin{pmatrix} \vec{x_1} & \dots & \vec{x_j} & \dots & \vec{x_p}\end{pmatrix} \]
The mean of variable j is \[ \bar{x_j}=\frac{1}{n}\vec{1}_n^T\vec{x_j} \]
The centered score vector for variable j is \[ \vec{y_j}=\vec{x_j}-\begin{pmatrix} \bar{x_j} \\\vdots \\\bar{x_j} \\\vdots \\\bar{x_j}\end{pmatrix}=(\mathbb{I_n}-\frac{1}{n}\vec{1}_n\vec{1}_n^T)\vec{x_j}=\mathbb{J}\vec{x_j} \] Note the centering matrix \(\mathbb{J}=\mathbb{I_n}-\frac{1}{n}\vec{1}_n\vec{1}_n^T\); it has the special properties \(\mathbb{J^2}=\mathbb{JJ}=\mathbb{J}\) (idempotent) and \(\mathbb{J}=\mathbb{J}^T\) (symmetric).
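These properties can be verified numerically; here is a minimal NumPy sketch with an arbitrary example vector.

```python
import numpy as np

n = 5
J = np.eye(n) - np.ones((n, n)) / n   # centering matrix J = I_n - (1/n) 1 1^T

# J is idempotent and symmetric
assert np.allclose(J @ J, J)
assert np.allclose(J, J.T)

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # arbitrary raw scores
y = J @ x                                   # centered scores
assert np.isclose(y.mean(), 0.0)            # centering removes the mean
```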
\[ var(\vec{x_j})=\frac{1}{n}\sum_{i=1}^{n}(x_{ij}-\bar{x_j})^2 \] It is useful to rewrite this in matrix algebra terms: \[ var(\vec{x_j})=\frac{1}{n}(\mathbb{J}\vec{x_j})^T(\mathbb{J}\vec{x_j})=\frac{1}{n}\vec{x_j}^T\mathbb{J}\vec{x_j} \] Using \(\mathbb{J}^T\mathbb{J}=\mathbb{J}\), \[ \tag{3} var(\vec{x_j})=\frac{1}{n}\vec{x_j}^T\mathbb{J}\vec{x_j}=\frac{1}{n}\vec{x_j}^T\mathbb{J}^T\mathbb{J}\vec{x_j}=\frac{1}{n}\vec{y_j}^T\vec{y_j}=\frac{1}{n}||\vec{y_j}||^2 \] That is, the variance of the raw scores is expressed through their centered score vector simply as \(n^{-1}||\vec{y_j}||^2\); the variance of the centered scores equals that of the raw scores.
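A short NumPy check that formula (3) agrees with the elementwise definition of variance; the example vector is arbitrary.

```python
import numpy as np

n = 5
J = np.eye(n) - np.ones((n, n)) / n
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# (3) var(x) = (1/n) x^T J x = (1/n) ||Jx||^2
var_quadratic = (x @ J @ x) / n
var_centered  = np.sum((J @ x) ** 2) / n
assert np.isclose(var_quadratic, var_centered)
assert np.isclose(var_quadratic, np.var(x))   # np.var uses the 1/n convention by default
```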
The square root of the variance is called the standard deviation; it equals the length of the vector \(\vec{y_j} = \mathbb{J}\vec{x_j}\) divided by \(n^{1/2}\).
Let the standard score vector for variable \(j\) be denoted by \(\vec{z_j}=[z_{1j}, \dots, z_{nj}]^T\), \[ \vec{z_j}=\frac{1}{\sqrt{var(\vec{x_j})}}\mathbb{J}\vec{x_j}=\frac{1}{\sqrt{var(\vec{x_j})}}\vec{y_j} \] where \(\vec{y_j}\) is the centered \(\vec{x_j}\).
\[ \frac{1}{n}||\vec{z_j}||^2=\frac{1}{n\,var(\vec{x_j})}||\vec{y_j}||^2=1 \] This also implies that the length of every standard score vector is always \(||\vec{z_j}|| = n^{1/2}\).
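A small NumPy sketch confirming that standard scores have unit variance and length \(n^{1/2}\); the raw scores are again arbitrary.

```python
import numpy as np

n = 5
J = np.eye(n) - np.ones((n, n)) / n
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

y = J @ x                       # centered scores
z = y / np.sqrt(np.var(x))      # standard scores z_j = y_j / sqrt(var(x_j))

assert np.isclose(np.mean(z), 0.0)                 # zero mean
assert np.isclose(np.sum(z ** 2) / n, 1.0)         # (1/n)||z||^2 = 1
assert np.isclose(np.linalg.norm(z), np.sqrt(n))   # ||z|| = n^{1/2}
```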
A property of matrix products: if \(\mathbb{A}\) is an n by m matrix and \(\vec{b_1}, \dots, \vec{b_k}\) are m by 1 vectors, then
\[ \tag{4} \begin{pmatrix} \mathbb{A}\vec{b_1} & \dots & \mathbb{A}\vec{b_k} \end{pmatrix}= \mathbb{A} \begin{pmatrix} \vec{b_1} & \dots & \vec{b_k} \end{pmatrix} \]
From (4), let \(\mathbb{Y} = [\vec{y_1} \cdots \vec{y_p}]\) denote the n by p matrix of centered scores, whose jth column is the centered jth column of \(\mathbb{X}\). \[ \tag{5} \mathbb{Y}= \begin{pmatrix} \mathbb{J}\vec{x_1} & \dots & \mathbb{J}\vec{x_p} \end{pmatrix}= \mathbb{J} \begin{pmatrix} \vec{x_1} & \dots & \vec{x_p} \end{pmatrix}= \mathbb{JX} \]
\[ \tag{6} \mathbb{Z}= \begin{pmatrix} \frac{1}{\sqrt{var(\vec{x_1})}}\vec{y_1} & \dots & \frac{1}{\sqrt{var(\vec{x_p})}}\vec{y_p} \end{pmatrix}= \begin{pmatrix} \vec{y_1} & \dots & \vec{y_p} \end{pmatrix} \begin{bmatrix} \frac{1}{\sqrt{var(\vec{x_1})}} & & \\ & \ddots & \\ & & \frac{1}{\sqrt{var(\vec{x_p})}} \end{bmatrix}= \mathbb{YD^{-1}} \] where \[ \mathbb{D}= \begin{bmatrix} \sqrt{var(\vec{x_1})} & & \\ & \ddots & \\ & & \sqrt{var(\vec{x_p})} \end{bmatrix} \]
is the p by p diagonal matrix whose diagonal elements are the standard deviations of the p variables.
The standard score matrix \(\mathbb{Z}\) can also be expressed as \[ \tag{7} \mathbb{Z=JXD^{-1}} \]
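A minimal NumPy sketch of (5)-(7) on a random illustrative data matrix, checking that \(\mathbb{Z=JXD^{-1}}\) reproduces column-wise standardization.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 3
X = rng.normal(size=(n, p))                 # n-by-p data matrix (illustrative)

J = np.eye(n) - np.ones((n, n)) / n         # centering matrix
Y = J @ X                                   # (5) centered score matrix
D = np.diag(np.sqrt(X.var(axis=0)))         # diagonal matrix of standard deviations
Z = Y @ np.linalg.inv(D)                    # (6)/(7) standard score matrix Z = J X D^{-1}

# same result as subtracting each column mean and dividing by each column std
assert np.allclose(Z, (X - X.mean(axis=0)) / X.std(axis=0))
```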
Covariance is a measure of dependence between random variables. The association between two variables j and k can be quantified by their covariance, defined as \[ \sigma_{jk}=Cov_{jk}=E[(x_{ij}-\bar{x_j})(x_{ik}-\bar{x_k})] \]
To express this in vector form, \[ \tag{8} \sigma_{jk}=\frac{1}{n}\begin{pmatrix} x_{1j}-\bar{x_j} & \dots & x_{nj}-\bar{x_j}\end{pmatrix} \begin{pmatrix} x_{1k}-\bar{x_k} \\ \vdots \\ x_{nk}-\bar{x_k}\end{pmatrix}= \frac{1}{n}(\mathbb{J}\vec{x_j})^T\mathbb{J}\vec{x_k}=\frac{1}{n}\vec{y_j}^T\vec{y_k} \] That is, the covariance between variables j and k is the inner product of the centered score vectors \(\vec{y_j} = \mathbb{J}\vec{x_j}\) and \(\vec{y_k}=\mathbb{J}\vec{x_k}\) divided by n.
The covariance of centered scores equals that of their raw scores.
The true covariance is estimated by the empirical covariance \(s_{jk}\): \[ s_{jk}=\frac{1}{n}\sum_{i=1}^{n}(x_{ij}-\bar{x_j})(x_{ik}-\bar{x_k}) \] For small n, say \(n \le 20\), the factor \(\frac{1}{n}\) should be replaced by \(\frac{1}{n-1}\) to correct for a small bias. In addition, covariance measures only linear dependence: if X and Y are independent, then \(\rho_{X,Y}=Cov_{X,Y}=0\), but in general the converse is not true, as exemplified by the covariance between X and its square.
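A brief NumPy illustration of both points, using simulated data: the covariance between X and X² is close to zero despite the perfect nonlinear dependence, and dividing by n-1 matches NumPy's unbiased estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=10_000)   # symmetric around zero
y = x ** 2                    # perfectly (but nonlinearly) determined by x

# empirical covariance with the 1/n factor: close to 0 despite the dependence
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
print(cov_xy)

# for small n, divide by n-1 instead of n to correct the bias
x_small = x[:15]
var_unbiased = np.sum((x_small - x_small.mean()) ** 2) / (len(x_small) - 1)
assert np.isclose(var_unbiased, np.var(x_small, ddof=1))
```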
Though the covariance is a theoretically important statistic, an inconvenient property is that its value does not let us easily judge how strong the positive or negative association between variables is.
The correlation between two variables j and k is defined from the covariance: \[ \rho_{jk}=\frac{\sigma_{jk}}{\sqrt{var(\vec{x_j})}\sqrt{var(\vec{x_k})}} \] The true correlation \(\rho\) is estimated by the empirical correlation \(r\). The empirical correlation coefficient \(r_{jk}\) between variables j and k is obtained by dividing their covariance by the square roots of the variances of variables j and k (i.e., by the standard deviations of j and k).
\[ r_{jk}=\frac{s_{jk}}{\sqrt{var(\vec{x_j})}\sqrt{var(\vec{x_k})}}=\frac{\vec{y_j}^T\vec{y_k}}{||\vec{y_j}||\,||\vec{y_k}||} \]
The vector \(\vec{y_j} = [y_{1j}, \dots , y_{nj}]^T\) can be regarded as the arrow extending from the origin \([0,\dots, 0]^T\) to \([y_{1j}, \dots , y_{nj}]^T\). The correlation is then the cosine of the angle \(\theta_{jk}\) between the two centered score vectors: \[ r_{jk}=\frac{\vec{y_j}^T\vec{y_k}}{||\vec{y_j}||\,||\vec{y_k}||}=\cos\theta_{jk} \]
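A minimal NumPy check that the cosine formula reproduces the usual correlation coefficient; the example vectors are arbitrary.

```python
import numpy as np

n = 6
J = np.eye(n) - np.ones((n, n)) / n
x_j = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
x_k = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])

y_j, y_k = J @ x_j, J @ x_k   # centered score vectors

# r_jk = cos(theta) between the centered vectors
r = (y_j @ y_k) / (np.linalg.norm(y_j) * np.linalg.norm(y_k))
assert np.isclose(r, np.corrcoef(x_j, x_k)[0, 1])
```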
The covariance of standard scores is equivalent to the correlation coefficient of raw scores.
The p by p covariance matrix \(\mathbb{V} = (v_{jk})\) for the data matrix \(\mathbb{X}\) can be expressed as \[ \tag{9} \mathbb{V}=\frac{1}{n} \begin{bmatrix} \vec{x_1}^T\mathbb{J}^T \\ \vdots \\ \vec{x_j}^T\mathbb{J}^T \\ \vdots \\ \vec{x_p}^T\mathbb{J}^T \end{bmatrix} \begin{bmatrix} \mathbb{J}\vec{x_1} & \dots & \mathbb{J}\vec{x_j} & \dots & \mathbb{J}\vec{x_p} \end{bmatrix} = \frac{1}{n} \begin{bmatrix} \vec{x_1}^T \\ \vdots \\ \vec{x_j}^T \\ \vdots \\ \vec{x_p}^T \end{bmatrix} \mathbb{J}^T\mathbb{J} \begin{bmatrix} \vec{x_1} & \dots & \vec{x_j} & \dots & \vec{x_p} \end{bmatrix}=\frac{1}{n}\mathbb{X^TJ^TJX}=\frac{1}{n}\mathbb{Y^TY} \]
Let \(\mathbb{R} = (r_{jk})\) denote the p by p correlation matrix for \(\mathbb{X}\). Using (9): \[ \tag{10} \mathbb{R}=\frac{1}{n}\mathbb{Z^TZ}=\frac{1}{n}\mathbb{D^{-1}X^TJ^TJXD^{-1}}=\mathbb{D^{-1}VD^{-1}} \]
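A short NumPy sketch of (9) and (10) on random illustrative data, compared against NumPy's built-in covariance and correlation routines.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 4
X = rng.normal(size=(n, p))                 # illustrative data matrix

J = np.eye(n) - np.ones((n, n)) / n
Y = J @ X
V = Y.T @ Y / n                             # (9) covariance matrix, 1/n convention
D_inv = np.diag(1.0 / np.sqrt(np.diag(V)))  # D^{-1}: reciprocal standard deviations
R = D_inv @ V @ D_inv                       # (10) correlation matrix

assert np.allclose(V, np.cov(X, rowvar=False, bias=True))  # bias=True uses 1/n
assert np.allclose(R, np.corrcoef(X, rowvar=False))
```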
Replacing the factor \(\frac{1}{n}\) by \(\frac{1}{n-1}\) gives the unbiased covariance matrix \[ \mathbb{V}^{\star}=\frac{1}{n-1}\mathbb{Y^TY} \] Its off-diagonal and diagonal elements are called unbiased covariances and unbiased variances, respectively.
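A quick NumPy check of the unbiased covariance matrix against np.cov, which uses the 1/(n-1) convention by default; the data are random illustrative values.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 12, 3
X = rng.normal(size=(n, p))

J = np.eye(n) - np.ones((n, n)) / n
Y = J @ X
V_star = Y.T @ Y / (n - 1)     # unbiased covariance matrix, 1/(n-1) convention

assert np.allclose(V_star, np.cov(X, rowvar=False))   # np.cov defaults to 1/(n-1)
```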
Note: correlations and covariances measure only linear dependence. The quadratic dependence of \(Y = X^2\) on \(X\) is not reflected by these measures. If \(X\) and \(Y\) are independent, then \(\rho_{XY}=v_{XY}=0\); but the converse is not true, as exemplified by the case of quadratic dependence.