Principal Component Analysis (PCA) is a statistical method commonly used to reduce the dimensionality of data. "It involves replacing a group of series with a weighted average of those series, where the weights [are] chosen so that the new vector (called the principal component or PC) explains as much of the variance of the original series as possible. This leaves a matrix of unexplained residuals, but this matrix can be reduced to a PC as well. In that case the original PC is called the first PC (PC1), and the PC of the residuals is called the second PC, or PC2. And there will be residuals from it too, yielding PC3, PC4, etc. The higher the number of the PC, the less important is the pattern it explains in the original data. PC1 is the dominant pattern, PC2 is the secondary pattern, etc. In many cases a large number of data series can be summarized with relatively few PCs." —Ross McKitrick, "What is the ‘Hockey Stick’ Debate About?"
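The residual-by-residual construction described in the quotation can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the data are synthetic, the helper name first_pc is hypothetical, and the weights are scaled to unit length (so each PC is a weighted combination rather than a literal average).

```python
import numpy as np

def first_pc(X):
    """Return (weights, scores) of the leading principal component of X.

    X is an (n_obs, n_series) array whose columns are already centered.
    """
    # The rows of vt (right singular vectors of X) are the PCA weight vectors.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    w = vt[0]
    return w, X @ w

rng = np.random.default_rng(0)
# Hypothetical data: 200 observations of 5 series sharing one common signal.
signal = rng.normal(size=200)
X = np.outer(signal, [1.0, 0.8, 0.6, 0.4, 0.2]) + 0.3 * rng.normal(size=(200, 5))
X = X - X.mean(axis=0)

w1, pc1 = first_pc(X)
residuals = X - np.outer(pc1, w1)   # what PC1 leaves unexplained
w2, pc2 = first_pc(residuals)       # PC2 is the leading PC of the residuals

total = (X ** 2).sum()
share1 = (pc1 ** 2).sum() / total   # share of total variance explained by PC1
share2 = (pc2 ** 2).sum() / total   # share explained by PC2
print(f"PC1 explains {share1:.1%}, PC2 explains {share2:.1%}")
```

Repeating the last two steps on the new residuals would yield PC3, PC4 and so on, each explaining no more variance than the one before it.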
PCA offers several benefits:
1) Variable reduction: Perhaps the most common reason for using PCA is that the first k PCs, starting at PC1, capture at least as much of the total variance as any k of the original variables. This allows one to isolate redundant information in the independent variables and remove it (i.e. drop the last PCs, which have minimal variance) from the statistical model. This in turn reduces the number of degrees of freedom the model uses and can improve predictive power, especially when the number of observations of the target variable is relatively small.
2) Uncorrelated variables: The PCs are mutually uncorrelated, a property that is desirable for, and sometimes required by, the assumptions of many statistical tests and procedures.
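Both benefits can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming synthetic data (six correlated variables driven by two underlying factors) and an arbitrary 95% variance threshold; neither comes from the studies discussed here.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 6 correlated variables driven by 2 underlying factors.
factors = rng.normal(size=(300, 2))
loadings = rng.normal(size=(2, 6))
X = factors @ loadings + 0.1 * rng.normal(size=(300, 6))
Xc = X - X.mean(axis=0)

u, s, vt = np.linalg.svd(Xc, full_matrices=False)
scores = u * s                          # PC scores, one column per PC
explained = s**2 / np.sum(s**2)         # share of total variance per PC
cumulative = np.cumsum(explained)

# Benefit 1: keep the smallest number of PCs reaching 95% of the variance.
k = int(np.searchsorted(cumulative, 0.95) + 1)
reduced = scores[:, :k]

# The first k PCs carry at least as much variance as the k
# highest-variance original variables.
pc_var = scores[:, :k].var(axis=0).sum()
best_original_var = np.sort(Xc.var(axis=0))[::-1][:k].sum()

# Benefit 2: PC scores are mutually uncorrelated.
corr = np.corrcoef(scores, rowvar=False)
max_off_diag = np.abs(corr - np.diag(np.diag(corr))).max()
print(k, pc_var >= best_original_var, max_off_diag)
```

Here reduced would replace the six original columns as regressors in a downstream model.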
PCA was used in many of the Hockey Stick studies, initially in MBH98 and MBH99 (MBH9X). It is used in two ways in the MBH algorithm:
For reducing the dimensionality of the tree ring proxy networks: After Michael E. Mann released part of the source code used in MBH9X, Steve McIntyre and Ross McKitrick found that an incorrect version of PCA had been used. At Climate Audit this is usually called "Mannian PCA".
For decomposition of the instrumental temperature record: To be precise, an extension of PCA called empirical orthogonal function (EOF) analysis is used.
Problems with the Mann et al. PCA
"In all our discussions, a principal component series is [a] weighted combination of up to 70 individual tree ring series. Some readers may find it helpful to think of the Dow-Jones Index, which is a weighted average of individual stock prices. Principal component series can include negative weights, which result in showing a contrast between different series - picture a series with positive weights for finance stocks and negative weights for tech stocks.
In principal components discussions, the weights have forbidding names like eigenvectors or empirical orthogonal functions, but, at the end of the day, these are just weights. The decomposition is prescribed by the matrix algebra. There are canned programs in high level languages so that the principal components decomposition of a matrix X of time series can be obtained in one line. As we discuss in our articles, these decompositions can be highly sensitive to transformations of the data - even if the transformation only seems to be a “standardization”." — Steve McIntyre, http://www.climateaudit.org/?page_id=1002
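The sensitivity mentioned in the quotation can be illustrated with a toy example in Python with NumPy. Everything in it is an assumption made for illustration: the data are synthetic, the helper name leading_weights is hypothetical, and the transformation compared (subtracting the mean of a late subperiod instead of the mean of the whole period) is simply one example of a seemingly harmless change. It is not the MBH9X code or data.

```python
import numpy as np

def leading_weights(X, center_rows):
    """Weights of PC1 after subtracting column means taken over center_rows."""
    Xc = X - X[center_rows].mean(axis=0)
    # The "one line" decomposition: rows of vt are the PC weight vectors.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]

rng = np.random.default_rng(2)
n, late = 500, 100
common = rng.normal(size=n)

X = np.empty((n, 6))
# Columns 1-5: hypothetical series sharing a common signal plus noise.
X[:, 1:] = common[:, None] + rng.normal(size=(n, 5))
# Column 0: pure noise, except for an upward shift in the last `late` rows.
X[:, 0] = rng.normal(size=n)
X[-late:, 0] += 4.0

w_full = leading_weights(X, slice(None))           # mean over all rows
w_short = leading_weights(X, slice(n - late, n))   # mean over late rows only

print("full centering :", np.round(np.abs(w_full), 2))
print("late centering :", np.round(np.abs(w_short), 2))
```

Comparing the two printed weight vectors shows how much weight PC1 gives to column 0 under each centering choice, even though the underlying series are identical.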
Ross McKitrick gives a detailed description of the problems with Mann et al.'s PCA, written for the general reader, in his paper "What is the ‘Hockey Stick’ Debate About?". The paper is recommended reading: it is still the best explanation of what went wrong with the Hockey Stick, and it goes well beyond the statistical problems. For another viewpoint, see Hockey stick controversy at Wikipedia. Unfortunately, that article does not, in our opinion, give an objective account of the debate. Efforts to edit the article to meet Wikipedia's neutral point of view standard have been thwarted by William M. Connolley (see post 17), a Hockey Team member and Wikipedia administrator, and other AGW partisans.