me nugget: SVD

Showing posts with label SVD. Show all posts

Thursday, February 25, 2016

RSpectra::svds function now default method for sinkr::dineof

The summary blog post describing the RSpectra (formerly rARPACK) package made a convincing case for the improved decomposition speed of the "svds" function for partial SVD (Singular Value Decomposition) over several other R packages. Until now, the sinkr package has relied on irlba for it's dineof function (Data Interpolating Empirical Orthogonal Functions). Since the routine can be quite computationally intensive, I wanted to test the performance of svds as an alternative method.

In a simple example that performs SVD on a field of sea level pressure of the equatorial Pacific, svds outperforms irlba both in speed and correlation of singular vector (e.g. $u) to the output of the base svd function. One can see in the following graph, that trailing vectors break down in their correlation while svds maintains nearly perfect correlation. Interestingly, this artifact is removed by first centering the data field.

While the effect looks dramatic above, it should be noted that the trailing vectors usually carry only a small fraction of information, and thus contribute only marginally to errors in field reconstruction. Below is a figure showing the reconstruction error of svd, svds, and irlba with increasing levels of truncation.

Finally, both methods were compared in their performance within dineof. With the non-centered field both approaches arrive to a similar RMS, but svds converges with less iterations and EOFs than irlba. With the centered data both methods produce nearly identical results.

So, even though the differences are small, the rSpectra::svds method will now be the default method for both the eof and dineof functions within sinkr. For the moment, the irlba method is maintained for compatibility with previous versions.

Code to reproduce:

DINEOF (Data Interpolating Empirical Orthogonal Functions)

I finally got around to reproducing the DINEOF method (Beckers and Rixon, 2003) for optimizing EOF analysis on gappy data fields - it is especially useful for remote sensing data where cloud cover can result in large gaps in data. Their paper gives a nice overview of some of the various methods that have been used for such data sets. One of these approaches, which I have written about before, involves deriving EOFs from a covariance matrix as calculated from available data. Unfortunately, as the author's point out, such covariance matrices are no longer positive-definite, which can lead to several problems. The DINEOF method seems to overcome several of these issues.

Canonical Correlation Analysis for finding patterns in coupled fields

First CCA pattern of Sea Level Pressure (SLP) and Sea Surface Temperature (SST) monthly anomalies for the region between -180 °W to -70 °W and +30 °N to -30 °S.

The following post demonstrates the use of Canonical Correlation Analysis (CCA) for diagnosing coupled patterns in climate fields. The method produces similar results to that of Maximum Covariance Analysis (MCA), but patterns reflect maximum correlation rather than maximum covariance. Furthermore, the output of the model is a combination of linear models that can be used for field prediction.

This particular method was illustrated by Barnett and Preisendorfer (1997) - it constructs a CCA model based on a truncated subset of EOF coefficients (i.e. "principle components") instead of using the original field (as with MCA). This truncation has several benefits for the fitting of the model - First, one reduces the amount of noise in the problem by eliminating the higher EOF modes, which represent poorly organized, small-scale features of the fields. Second, by using orthogonal functions, the algebra of the problem is simplified (see von Storch and Zweiers 1999 for details). Bretherton etal. (1992) reviewed several techniques for diagnosing coupled patterns and found the Barnett and Preisendorfer method (hereafter "BPCCA") and MCA to be the most robust.

Exponentiation of a matrix (including pseudoinverse)

The following function "exp.mat" allows for the exponentiation of a matrix (i.e. calculation of a matrix to a given power). The function follows three steps:
1) Singular Value Decomposition (SVD) of the matrix
2) Exponentiation of the singular values
3) Re-calculation of the matrix with the new singular values

The most common case where the method is applied is in the calculation of a "Moore–Penrose pseudoinverse", or a matrix raised to the power of -1. In this case, the function returns the same result as the "pseudoinverse" function from the corpcor package (although maybe not as nicely implemented). This should also be analogous to the pinv function in Matlab.

Less common is the need to calculate other powers of matrices. For example, I use this function to calculate the square root of a matrix (and square root of the inverse of a matrix) within Canonical Correlation Analysis (CCA). I hope to write a post shortly on the use of CCA in identifying coupled patterns in climate data.

Below is the code for the exp.mat function as well as some demonstrations of its use in R.

me nugget

Thursday, February 25, 2016

RSpectra::svds function now default method for sinkr::dineof

Tuesday, October 30, 2012

DINEOF (Data Interpolating Empirical Orthogonal Functions)

Sunday, March 25, 2012

Canonical Correlation Analysis for finding patterns in coupled fields

Thursday, March 22, 2012

Exponentiation of a matrix (including pseudoinverse)

NPR Topics: Science

WWF - Environmental News